The change to print_slot() in cdc6b2f8e7a8d43dcfe0475a9d3498333ea686b8 made
the function return correct errno for read errors while ignoring write errors,
which is not what was intended. This patch tries to rectify things.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
If print_slot() fails, the client will be served an inferior response.
This patch makes sure that such an error will be returned to main(), which
in turn will try to inform about the error in the response itself.
The error is also printed to the cache_log, i.e. stderr, which will make
the error message appear in error_log (atleast when httpd==apache).
Noticed-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
These functions handles EINTR/EAGAIN errors during read/write operations,
which is something cache.c didn't.
While at it, fix a bug in print_slot() where errors during reading from the
cache slot might go by unnoticed.
Noticed-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
We'll need proper return-values from these functions to make the cache
behave correctly (which includes giving proper error messages).
Noticed-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
This new page will list all entries found in the current cache, which is
useful when reviewing the new cache implementation. There are no links to
the new page, but it's reachable by adding 'p=ls_cache' to any cgit url.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
The original caching layer in cgit has no upper bound on the number of
concurrent cache entries, so when cgit is traversed by a spider (like the
googlebot), the cache might end up filling your disk. Also, if any error
occurs in the cache layer, no content is returned to the client.
This patch redesigns the caching layer to avoid these flaws by
* giving the cache a bound number of slots
* disabling the cache for the current request when errors occur
The cache size limit is implemented by hashing the querystring (the cache
lookup key) and generating a cache filename based on this hash modulo the
cache size. In order to detect hash collisions, the full lookup key (i.e.
the querystring) is stored in the cache file (separated from its associated
content by ascii 0).
The cache filename is the reversed 8-digit hexadecimal representation of
hash(key) % cache_size
which should make the filesystem lookup pretty fast (if directory content
is indexed/sorted); reversing the representation avoids the problem where
all keys have equal prefix.
There is a new config option, cache-size, which sets the upper bound for
the cache. Default value for this option is 0, which has the same effect
as setting nocache=1 (hence nocache is now deprecated).
Included in this patch is also a new testfile which verifies that the
new option works as intended.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
The functions found in cache.c are only used by cgit.c, so there's no
point in rebuilding all object files when the cache interface is changed.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
This removes the global variable which is used to keep track of the
currently selected repository, and adds a new variable in the cgit_context
structure.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
This removes another big set of global variables, and introduces the
cgit_prepare_context() function which populates a context-variable with
compile-time default values.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
This struct will hold all the cgit runtime information currently found in
a multitude of global variables.
The first cleanup removes all querystring-related variables.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
The single static buffer makes it impossible to use the result of two
different calls to this function simultaneously. Fix it by using 4
buffers.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
This makes is possible to use repo-urls like '/pub/scm/git/git.git' and
even add path specifications, like '/pub/scm/git/git.git/log/documentation'.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
Make sure we chdir(2) back to the original getcwd(2) when a page
has been generated. Also, if the cgit_cache_root do not exist,
try to create it.
This is a feature intended to ease testing/debugging.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
Since fmt() uses 8 alternating static buffers, and cache_lock might call
cache_create_dirs() multiple times, which in turn might call fmt() twice,
after four iterations lockfile would be overwritten by a cachedirectory
path.
In worst case, this could cause the cachedirectory to be unlinked and replaced
by a cachefile.
Fix: use xstrdup() on the result from fmt() before assigning to lockfile, and
call free(lockfile) before exit.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
An embarrassing thinko in cgit_check_cache() would truncate valid cachefiles
in the following situation:
1) process A notices a missing/expired cachefile
2) process B gets scheduled, locks, fills and unlocks the cachefile
3) process A gets scheduled, locks the cachefile, notices that the cachefile
now exist/is not expired anymore, and continues to overwrite it with an
empty lockfile.
Thanks to Linus for noticing (again).
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
Add a global variable, cgit_max_lock_attemps, to avoid the possibility of
infinite loops when failing to acquire a lockfile. This could happen on
broken setups or under crazy server load.
Incidentally, this also fixes a lurking bug in cache_lock() where an
uninitialized returnvalue was used.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
This enables internal caching of page output.
Page requests are split into four groups:
1) repo listing (front page)
2) repo summary
3) repo pages w/symbolic references in query string
4) repo pages w/constant sha1's in query string
Each group has a TTL specified in minutes. When a page is requested, a cached
filename is stat(2)'ed and st_mtime is compared to time(2). If TTL has expired
(or the file didn't exist), the cached file is regenerated.
When generating a cached file, locking is used to avoid parallell processing
of the request. If multiple processes tries to aquire the same lock, the ones
who fail to get the lock serves the (expired) cached file. If the cached file
don't exist, the process instead calls sched_yield(2) before restarting the
request processing.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>