36 Commity

Autor SHA1 Zpráva Datum
John Keeping
db9a70b159 cache: use F_SETLK to avoid stale lock files
If CGit is killed while it holds a lock on a cache slot (for example
because it is taking too long to generate a page), the lock file will be
left in place.  This prevents any future attempt to use the same slot
since it will fail to exclusively create the lock file.

Since CGit is the only program that should be manipulating lock files,
we can use advisory locking to detect whether another process is
actually using the lock file or if it is now stale.

I have confirmed that this works on Linux by setting a short TTL in a
custom cgitrc and running the following with CGit patched to print a
message to stderr if the fcntl(2) fails:

	$ export CGIT_CONFIG=$PWD/cgitrc
	$ export QUERY_STRING=url=cgit/tree/ui-shared.c
	$ ./cgit |
		grep -v -e '^<div class=.footer.>' \
			-e '^Last-Modified: ' \
			-e ^'Expires: ' >expect
	$ seq 50000 | dd bs=8192 |
		parallel -j200 "diff -u expect <(./cgit |
			grep -v -e '^<div class=.footer.>' \
				-e '^Last-Modified: ' \
				-e ^'Expires: ') || echo BAD"

This printed the fail message several times without ever printing "BAD".

Signed-off-by: John Keeping <john@keeping.me.uk>
2015-03-03 23:55:27 +01:00
Lukas Fleischer
6ceba453a2 Skip cache slot when time-to-live is zero
If time-to-live is set to zero, we don't need to regenerate the cache
slots on every request. Instead, just skip the caching process and
immediately provide the dynamically generated version of the page.
Setting time-to-live to zero is useful when you want to disable caching
for certain pages.

Signed-off-by: Lukas Fleischer <cgit@cryptocrack.de>
2014-02-21 01:19:45 +01:00
Sebastian Andrzej Siewior
d3581b5889 cache: use sendfile() instead of a pair of read() + write()
sendfile() does the same job and avoids to copy the content into userland
and back. One has to define NO_SENDFILE in case the OS (kernel / libc)
does not supported. It is disabled by default on non-linux environemnts.
According to the glibc, sendfile64() was added in Linux 2.4 (so it has
been there for a while) but after browsing over the mapage of FreeBSD's I
noticed that the prototype is little different.

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
2014-01-19 15:08:49 +01:00
Lukas Fleischer
f60ffa143c Switch to exclusively using global ctx
Drop the context parameter from the following functions (and all static
helpers used by them) and use the global context instead:

* cgit_print_http_headers()
* cgit_print_docstart()
* cgit_print_pageheader()

Remove context parameter from all commands

Drop the context parameter from the following functions (and all static
helpers used by them) and use the global context instead:

* cgit_get_cmd()
* All cgit command functions.
* cgit_clone_info()
* cgit_clone_objects()
* cgit_clone_head()
* cgit_print_plain()
* cgit_show_stats()

In initialization routines, use the global context variable instead of
passing a pointer around locally.

Remove callback data parameter for cache slots

This is no longer needed since the context is always read from the
global context variable.

Signed-off-by: Lukas Fleischer <cgit@cryptocrack.de>
2014-01-17 00:44:54 +01:00
John Keeping
382ecf152e cache: don't leave cache_slot fields uninitialized
Valgrind says:

==18344== Conditional jump or move depends on uninitialised value(s)
==18344==    at 0x406C83: open_slot (cache.c:63)
==18344==    by 0x407478: cache_ls (cache.c:403)
==18344==    by 0x404C9A: process_request (cgit.c:639)
==18344==    by 0x406BD2: fill_slot (cache.c:190)
==18344==    by 0x4071A0: cache_process (cache.c:284)
==18344==    by 0x404461: main (cgit.c:952)
==18344==  Uninitialised value was created by a stack allocation
==18344==    at 0x40738B: cache_ls (cache.c:375)

This is caused by the keylen field being used to calculate whether or
not a slot is matched.  We never then check the value of this and the
length of data read depends on the key length read from the file so this
isn't dangerous, but it's nice to avoid branching based on uninitialized
data.

Signed-off-by: John Keeping <john@keeping.me.uk>
2014-01-12 19:32:20 +01:00
Lukas Fleischer
f7f26f8875 Update copyright information
* Name "cgit Development Team" as copyright holder to avoid listing
  every single developer.

* Update copyright ranges.

Signed-off-by: Lukas Fleischer <cgit@crytocrack.de>
2014-01-08 15:10:49 +01:00
John Keeping
f32a2da636 cache.c: cache ls_cache output properly
By using the standard library's printf, cache_ls does not redirect its
output to the cache when we change the process' stdout file descriptor
to point to the cache file.  Fix this by using "htmlf" in the same way
that we do for writing HTTP headers.

Signed-off-by: John Keeping <john@keeping.me.uk>
2013-05-22 12:53:06 +02:00
John Keeping
f75900b04f cache.c: fix cache_ls
Commit fb3655d (use struct strbuf instead of static buffers, 2013-04-06)
broke the logic in cache.c::cache_ls by failing to set slot->cache_name
before calling open_slot.

While fixing this, also free the strbufs added by that commit once we're
done with them.

Signed-off-by: John Keeping <john@keeping.me.uk>
2013-05-18 19:45:28 +02:00
John Keeping
fb3655df3b use struct strbuf instead of static buffers
Use "struct strbuf" from Git to remove the limit on file path length.

Notes on scan-tree:
This is slightly involved since I decided to pass the strbuf into
add_repo() and modify if whenever a new file name is required, which
should avoid any extra allocations within that function.  The pattern
there is to append the filename, use it and then reset the buffer to its
original length (retaining a trailing '/').

Notes on ui-snapshot:
Since write_archive modifies the argv array passed to it we
copy the argv_array values into a new array of char* and then free the
original argv_array structure and the new array without worrying about
what the values now look like.

Signed-off-by: John Keeping <john@keeping.me.uk>
2013-04-08 16:12:52 +02:00
Lukas Fleischer
bafab423f2 Mark several functions/variables static
Spotted by parsing the output of `gcc -Wmissing-prototypes [...]`.

Signed-off-by: Lukas Fleischer <cgit@cryptocrack.de>
2013-03-04 19:50:39 -05:00
Jason A. Donenfeld
bdae1d8a8d White space around control verbs.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2013-03-04 09:12:54 -05:00
Lukas Fleischer
53bc747d31 Fix several whitespace errors
* Remove whitespace at the end of lines.
* Replace space indentation by tabs.
* Add whitespace before/after several operators ("+", "-", "*", ...)
* Add whitespace to assignments ("foo = bar;").
* Fix whitespace in parameter lists ("foobar(foo, bar, 42)").

Signed-off-by: Lukas Fleischer <cgit@cryptocrack.de>
2013-03-04 01:12:48 +01:00
Ramsay Jones
bdd4a56ad5 Fix some warnings to allow -Werror
The type used to declare the st_size field of a 'struct stat' can
be a 32- or 64-bit sized type, which can vary from one platform to
another, or even from one compilation to another.  In particular,
on linux, if you include the following define:

    #define _FILE_OFFSET_BITS 64

prior to including certain system header files, then the type used
for the st_size field will be __off64_t, otherwise it will be an
__off_t.  Note that the above define is included at the top of
git-compat-util.h.

In cache.c, the "%zd" format specifier expects a "signed size_t",
another type which can vary, when an __off64_t or a __off_t is
provided.  To supress the warning, use the PRIuMAX format specifier
and cast the st_size field to uintmax_t.  This should work an any
platform for which git currently compiles.

In ui-plain.c, the size parameter of sha1_object_info() and
read_sha1_file() is defined to be "unsigned long *" not "size_t *".
So, to supress the warning, simply declare size with the correct type.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2008-11-06 19:18:07 +01:00
Lars Hjemli
2cecf839a0 cache.c: use %zd for off_t argument
Signed-off-by: Lars Hjemli <hjemli@gmail>
2008-09-01 22:39:00 +02:00
Lars Hjemli
dd7c172542 cache.c: fix error checking in print_slot()
The change to print_slot() in cdc6b2f8e7a8d43dcfe0475a9d3498333ea686b8 made
the function return correct errno for read errors while ignoring write errors,
which is not what was intended. This patch tries to rectify things.

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2008-05-20 17:56:47 +02:00
Lars Hjemli
af2e75616d cache.c: do not ignore errors from print_slot()
If print_slot() fails, the client will be served an inferior response.
This patch makes sure that such an error will be returned to main(), which
in turn will try to inform about the error in the response itself.

The error is also printed to the cache_log, i.e. stderr, which will make
the error message appear in error_log (atleast when httpd==apache).

Noticed-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2008-05-18 23:59:11 +02:00
Lars Hjemli
cdc6b2f8e7 cache.c: use xread()/xwrite() from libgit
These functions handles EINTR/EAGAIN errors during read/write operations,
which is something cache.c didn't.

While at it, fix a bug in print_slot() where errors during reading from the
cache slot might go by unnoticed.

Noticed-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2008-05-18 23:34:55 +02:00
Lars Hjemli
d402811bd2 cache.c: make all io-related functions return errno on error
We'll need proper return-values from these functions to make the cache
behave correctly (which includes giving proper error messages).

Noticed-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2008-05-18 23:16:50 +02:00
Lars Hjemli
6102bcfce4 cache.c: read(2) returns -1 on error, not 0
Noticed-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2008-05-18 23:10:05 +02:00
Lars Hjemli
9000bbf865 Add page 'ls_cache'
This new page will list all entries found in the current cache, which is
useful when reviewing the new cache implementation. There are no links to
the new page, but it's reachable by adding 'p=ls_cache' to any cgit url.

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2008-04-28 12:10:13 +02:00
Lars Hjemli
939d32fda7 Redesign the caching layer
The original caching layer in cgit has no upper bound on the number of
concurrent cache entries, so when cgit is traversed by a spider (like the
googlebot), the cache might end up filling your disk. Also, if any error
occurs in the cache layer, no content is returned to the client.

This patch redesigns the caching layer to avoid these flaws by
* giving the cache a bound number of slots
* disabling the cache for the current request when errors occur

The cache size limit is implemented by hashing the querystring (the cache
lookup key) and generating a cache filename based on this hash modulo the
cache size. In order to detect hash collisions, the full lookup key (i.e.
the querystring) is stored in the cache file (separated from its associated
content by ascii 0).

The cache filename is the reversed 8-digit hexadecimal representation of

  hash(key) % cache_size

which should make the filesystem lookup pretty fast (if directory content
is indexed/sorted); reversing the representation avoids the problem where
all keys have equal prefix.

There is a new config option, cache-size, which sets the upper bound for
the cache. Default value for this option is 0, which has the same effect
as setting nocache=1 (hence nocache is now deprecated).

Included in this patch is also a new testfile which verifies that the
new option works as intended.

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2008-04-28 11:32:42 +02:00
Lars Hjemli
ee4056bd2c Add cache.h
The functions found in cache.c are only used by cgit.c, so there's no
point in rebuilding all object files when the cache interface is changed.


Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2008-03-27 09:22:13 +01:00
Lars Hjemli
d1f3bbe9d2 Move cgit_repo into cgit_context
This removes the global variable which is used to keep track of the
currently selected repository, and adds a new variable in the cgit_context
structure.

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2008-02-16 13:56:09 +01:00
Lars Hjemli
b228d4ff82 Add all config variables into struct cgit_context
This removes another big set of global variables, and introduces the
cgit_prepare_context() function which populates a context-variable with
compile-time default values.

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2008-02-16 13:10:50 +01:00
Lars Hjemli
d14d77fe95 Introduce struct cgit_context
This struct will hold all the cgit runtime information currently found in
a multitude of global variables.

The first cleanup removes all querystring-related variables.

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2008-02-16 12:07:28 +01:00
Lars Hjemli
72fa5c63f8 cache_safe_filename() needs more buffers
The single static buffer makes it impossible to use the result of two
different calls to this function simultaneously. Fix it by using 4
buffers.

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2007-05-18 22:51:02 +02:00
Lars Hjemli
30ccdcaa74 Enable url=value querystring parameter
This makes is possible to use repo-urls like '/pub/scm/git/git.git' and
even add path specifications, like '/pub/scm/git/git.git/log/documentation'.

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2007-05-18 22:51:01 +02:00
Lars Hjemli
2c2047ff67 Remove troublesome chars from cachefile names
Add a funtion cache_safe_filename() which replaces possibly bad filename
characters with '_'.

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2007-01-12 00:24:35 +01:00
Lars Hjemli
83a5f35a27 Move cache_prepare() to cgit
This moves some cgit-specific stuff away from cache.c

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2007-01-12 00:00:15 +01:00
Lars Hjemli
7c849d94ec Allow relative paths for cgit_cache_root
Make sure we chdir(2) back to the original getcwd(2) when a page
has been generated. Also, if the cgit_cache_root do not exist,
try to create it.

This is a feature intended to ease testing/debugging.

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2006-12-16 13:55:58 +01:00
Lars Hjemli
58d04f6523 cache_lock: do xstrdup/free on lockfile
Since fmt() uses 8 alternating static buffers, and cache_lock might call
cache_create_dirs() multiple times, which in turn might call fmt() twice,
after four iterations lockfile would be overwritten by a cachedirectory
path.

In worst case, this could cause the cachedirectory to be unlinked and replaced
by a cachefile.

Fix: use xstrdup() on the result from fmt() before assigning to lockfile, and
call free(lockfile) before exit.

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2006-12-12 10:16:41 +01:00
Lars Hjemli
fbaf1171b4 Don't truncate valid cachefiles
An embarrassing thinko in cgit_check_cache() would truncate valid cachefiles
in the following situation:
  1) process A notices a missing/expired cachefile
  2) process B gets scheduled, locks, fills and unlocks the cachefile
  3) process A gets scheduled, locks the cachefile, notices that the cachefile
     now exist/is not expired anymore, and continues to overwrite it with an
     empty lockfile.

Thanks to Linus for noticing (again).

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2006-12-11 22:53:50 +01:00
Lars Hjemli
318d106300 Avoid infinite loops in caching layer
Add a global variable, cgit_max_lock_attemps, to avoid the possibility of
infinite loops when failing to acquire a lockfile. This could happen on
broken setups or under crazy server load.

Incidentally, this also fixes a lurking bug in cache_lock() where an
uninitialized returnvalue was used.

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2006-12-11 12:10:12 +01:00
Lars Hjemli
f5069d88df Fix cache algorithm loophole
This closes the door for unneccessary calls to cgit_fill_cache().

Noticed by Linus.

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2006-12-11 09:57:58 +01:00
Lars Hjemli
7640d90b73 Add license file and copyright notices
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2006-12-10 22:41:14 +01:00
Lars Hjemli
25105d7eca Add caching infrastructure
This enables internal caching of page output.

Page requests are split into four groups:
  1) repo listing (front page)
  2) repo summary
  3) repo pages w/symbolic references in query string
  4) repo pages w/constant sha1's in query string

Each group has a TTL specified in minutes. When a page is requested, a cached
filename is stat(2)'ed and st_mtime is compared to time(2). If TTL has expired
(or the file didn't exist), the cached file is regenerated.

When generating a cached file, locking is used to avoid parallell processing
of the request. If multiple processes tries to aquire the same lock, the ones
who fail to get the lock serves the (expired) cached file. If the cached file
don't exist, the process instead calls sched_yield(2) before restarting the
request processing.

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
2006-12-10 22:31:36 +01:00