Commit gráf

100 Commit-ok

Szerző SHA1 Üzenet Dátum
55b43fdaac Rename our 'pledge' mechanism to 'vow'
Among other differences, pledge() from OpenBSD takes a string
and has exec promises. We don't.

Using the same name yet providing a different interface does not
appear reasonable.
2021-12-28 11:05:24 +01:00
6420ca1b40 Add landlock runtime detection
We cannot assume that landlock is enabled if we can compile it.
Even if it's enabled in the kernel it may still not be loaded.

We fill fallback to chroot/bind-mounts if we can.

If we can't (because path policies have landlock-specific options),
we can't do that either.

Closes: #21
2021-12-27 16:51:08 +01:00
98c76089de Handle new 5.16 syscall: futex_waitv 2021-12-27 14:26:37 +01:00
631980b775 Include linux/capability.h instead of sys/capability.h
Some distros put sys/capability.h into libcap-dev or
similiar, which is a bit unforunate, we don't need
libcap-dev or anything like that.

Since we anyway only used the capget()/capset(), we can
just define a simple wrapper and call the syscall directly
and therefore avoid above mentioned issue.
2021-12-27 14:15:50 +01:00
0be081c55d Merge get_pledge_argfilter() with get_pledge_argfilter() 2021-12-27 14:11:58 +01:00
ca0f82790c Use some macros to increase readabiltiy of BPF rules 2021-12-27 12:35:54 +01:00
77adf09d34 test: Add tests for exile_pledge() 2021-12-27 12:35:54 +01:00
bcab0377f1 Add exile_pledge(): A convenience wrapper
exile_pledge() adds seccomp filters derived from the
promises.
2021-12-27 12:35:54 +01:00
b469a82eec pledge: Allow NO_NEW_PRIVS prctls
Retreiving it does no harm. It cannot be unset once set, thus
no harm in allowing to set it either.
2021-12-27 12:35:54 +01:00
6711b394d9 pledge: Add EXILE_SYSCALL_PLEDGE_SECCOMP_INSTALL to allow adding further seccomp filters 2021-12-27 12:35:54 +01:00
9abbc7510c Introduce exile_create_policy(): Creates an clean/empty policy.
exile_create_policy() Creates an empty policy that can be
used by the exile.h API.

exile_init_policy() sets opinionated default values.
2021-12-27 12:35:54 +01:00
029762e894 pledge: Add EXILE_SYSCALL_PLEDGE_IOCTL to allow ioctl() without argfilters 2021-12-27 12:35:54 +01:00
6b513f8339 pledge: Add prctl() default filter 2021-12-27 12:35:54 +01:00
d2357ac676 pledge: Introduce clone() filter and EXILE_SYSCALL_PLEDGE_THREAD 2021-12-27 12:35:54 +01:00
0b0dda0de1 pledge: Begin filter for setsockopt() args 2021-12-27 12:35:54 +01:00
7115ef8b4d Begin an pledge()-like implementation
This begins a pledge() implementation. This also
retires the previous syscall grouping approach,
as pledge() is the superior mechanism.

Squashed:
test: Begin basic pledge test
pledge: Begin EXILE_SYSCALL_PLEDGE_UNIX/EXILE_SYSCALL_PLEDGE_INET
test: Add pledge socket test
Introduce EXILE_SYSCALL_PLEDGE_DENY_ERROR, remove exile_policy->pledge_policy
pledge: Add PROT_EXEC
2021-12-27 12:35:54 +01:00
15a6850023 Begin low-level seccomp arg filter interface
Squashed:
test: Adjust existing to new API with arg filters
test: Add tests for low-level seccomp args filter API
test: Add seccomp_filter_mixed()
test: Switch to syscall() everywhere
append_syscall_to_bpf(): Apply EXILE_SYSCALL_EXIT_BPF_NO_MATCH also for sock_filter.jt
2021-12-27 12:35:54 +01:00
48deab0dde exile_enable_policy(): Only chdir() post chroot() 2021-12-27 12:35:35 +01:00
ce7eb57998 enter_namespaces(): Fix error message 2021-12-27 12:35:35 +01:00
3407fded04 Add EXILE_FS_ALLOW_ALL_{READ,WRITE}
Issue: #19
2021-12-27 00:30:52 +01:00
1b4c5477a5 rename to exile.h
qssb.h was a preliminary name and can't be pronounced smoothly.

exile.h is more fitting and it's also short. Something exiled is essentially
something isolated, which is pretty much what this library does (isolation from
resources such as file system, network and others accessible by system calls).
2021-11-30 18:19:15 +01:00
756b0fb421 rename qssb.h to exile.h 2021-11-30 17:40:36 +01:00
d150c2ecd9 Don't add any seccomp rules by default
Cannot be done properly on a pure syscall basis at this point.

A whitelist is almost certainly too restrictive, which means user
has to manually adjust the policy anyway. Then the default is not
of much use. Or too permissive.

A blacklist has to play catchup with new kernel versions. This may
be be improved upon by blocking all unknown (too new) syscall
numbers. However, in light of the fact we drop caps and set no_new_privs,
it's debtable how much we can gain from a blacklist anyway.

So best to leave it to the user. We also need to allow checking args
too in order to make it easier to build policies. Perhaps get
inspiration from pledge() in OpenBSD.
2021-11-20 20:54:28 +01:00
435bcefa48 test: Skip landlock specific tests if unavailble during compile time 2021-11-20 19:25:30 +01:00
2a4cee2ece test: Use xqssb_enable_policy() throughout where reasonable 2021-11-20 16:56:19 +01:00
d847d0f996 qssb_append_group_syscall_policy(): Make QSSB_SYSCGROUP_NONE an invalid group 2021-11-14 21:46:47 +01:00
1a2443db18 qssb_append_syscalls_policy(): Fix mem leak on failure 2021-11-14 21:46:47 +01:00
db17e58deb Assign syscalls into groups. Add whitelist mode (default).
Classify syscalls into groups, for x86_64 only for now.
Up to date for 5.15, generate some #ifndef for syscalls
introduced since 5.10. Only support x86_64 therefore at this point.

Switch from blacklisting to a default whitelist.
2021-11-14 21:46:47 +01:00
0d7c5bd6d4 append_syscall_to_bpf(): Explicit type cast to fix (C++) warnings 2021-10-25 18:18:31 +02:00
55e1f42ca8 check_policy_sanity(): Initialize last_policy 2021-10-03 21:25:37 +02:00
11d64c6fcf enter_namespaces(): Check fopen/fprintf errors 2021-09-12 20:00:03 +02:00
ebe043c08d Fix missing \n in some error outputs 2021-09-12 19:50:05 +02:00
8bc0d1e73a Use overflow-safe operator builtins
As a precaution as it does not hurt
2021-09-12 19:47:45 +02:00
215032f32c enable_no_fs(): Fix corresponding test by adding missing default policy 2021-09-06 21:43:50 +02:00
411e00715d Rename qssb_append_default_syscall_policy() to better distinguish it from qssb_append_syscall_default_policy() 2021-09-05 17:24:42 +02:00
8a9b1730de test: Remove argc,argv from tests as there was no use for them 2021-09-05 17:12:25 +02:00
b2b501d97e test: Refactor: Put seccomp tests into child processes ; Simplfy .sh
Refactor the test logic. Seccomp tests that can be
killed run in their own subprocess now.

All test functions now return 0 on success. Therefore,
the shell script can be simplified.
2021-09-05 17:12:25 +02:00
26f391f736 test: implement test_seccomp_errno() 2021-09-05 17:12:25 +02:00
68fd1a0a87 test: test_seccomp_blacklisted_call_permitted(): Add missing default policy 2021-09-05 17:12:25 +02:00
b0d0beab22 README.md: Update 2021-09-05 17:12:25 +02:00
c44ce85628 test: Add test ensuring seccomp ends with default rule, minor fixes 2021-09-05 17:12:25 +02:00
25d8ed9bca check_policy_sanity(): Add syscall policy checks 2021-09-05 17:12:25 +02:00
e389140436 test.sh: Log exit code, print yes/no instead of 1/0 2021-09-05 17:12:25 +02:00
f6af1bb78f policy: Add disable_syscall_filter policy. Add defaults only on enable.
Only add default syscall policy when disable_syscall_filter is 0 (default)
and no user-custom policy has been added.
2021-09-05 17:12:25 +02:00
9192ec3aa4 Rewrite syscall policy logic
Instead of having a blacklist and whitelist, we now allow
setting a policy that runs as a chain.

This adds qssb_append_syscalls_policy()

Furthermore, add a feature to decide per syscall which action to take.
This allows now to return an error instead of just killing the process.

In the future, it may allow us to set optimize/shrink the BPF filter.
2021-09-05 17:12:03 +02:00
51844ea3ab bpf: Deny x32 system calls for now
The arch field is the same for x86_64 and x32, thus checking it
is not enough.

Simply using x32 system calls would allow a bypass. Thus,
we must check whether the system call number is in __X32_SYSCALL_BIT.

This is of course a lazy solution, we could also add the
same system call number + _X32_SYSCALL_BIT to our black/whitelists.

For now however, this however will do.
2021-08-12 12:25:12 +02:00
66c6d28dcd bpf: Check arch value
The filter was missing this check for arch, allowing bypasses
by using different calling conventions of other architectures.

A trivial example is execve() of x86 from and x86_64 process.
2021-08-12 11:57:13 +02:00
5cd45c09b7 bpf: Use SECCOMP_RET_KILL_PROCESS instead SECCOMP_RET_KILL
We generally want to kill the process not the thread.
2021-08-12 11:40:29 +02:00
fa06287b13 Use new qssb_append_*_syscall functions, remove old fields 2021-08-12 11:37:19 +02:00
68694723fe Begin qssb_append_*_syscall family of functions
The purpose of these new functions is to make it simpler for users
to add new syscalls to the whitelist and blacklist.

The current approach uses a user-supplied pointer which however
was difficult to manage with "no_fs", which may add systemcalls
to the blacklist. Then we must resize arrays, and suddenly
it's our job to free them.

As a bonus, implementing them here allows easier data structure
changes and decreases the chances tgat users of this API
do something wrong, like forgetting -1 at then end, etc.
2021-08-12 11:37:19 +02:00