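clone3() is used more and more, but we cannot filter it: its flags live in a
struct clone_args in user memory, and seccomp BPF cannot dereference pointer
arguments. We can either allow it fully or return ENOSYS. Some libraries fall
back to the older clone() in that case, whose flags are passed in a register
and can thus be filtered again.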
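The ENOSYS option can be illustrated with a minimal seccomp sketch. It uses the raw BPF macros from <linux/filter.h>, not exile's own API. It assumes x86_64 or aarch64, where clone3 is syscall 435, and install_clone3_enosys_filter() is a hypothetical name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef __NR_clone3
#define __NR_clone3 435 /* same number on x86_64 and aarch64 */
#endif

#if defined(__x86_64__)
#define EXAMPLE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define EXAMPLE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64"
#endif

/* Hypothetical helper: make clone3() fail with ENOSYS, allow everything
 * else. Returns 0 on success, -1 on error. */
static int install_clone3_enosys_filter(void)
{
    struct sock_filter filter[] = {
        /* Kill on an unexpected architecture. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, EXAMPLE_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* clone3() -> ENOSYS, so callers can fall back to clone(). */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_clone3, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    /* Required to install a filter without CAP_SYS_ADMIN. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

ENOSYS rather than EPERM is the useful choice here, since callers typically fall back to clone() only when clone3() reports that it does not exist.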
Especially with exile_launch(), exile.h will be included
from more than one translation unit. Thus, duplicate definitions
(ODR violations at link time) become a headache now.
So move the definitions to exile.c.
Certain functions can fail before we execute exile_enable_policy().
While their return codes should be checked, this is easily forgotten.
Therefore, such failures are now recorded in the policy and make
exile_enable_policy() fail as well. For most users, checking just the
exile_enable_policy() return code should suffice.
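The sticky error flag pattern can be sketched as follows. The struct and function names are hypothetical stand-ins, not exile's actual policy type: a setup call records its failure in the policy, and the enable step refuses to proceed even if the caller ignored the earlier return value.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified policy object. */
struct example_policy {
    int error;        /* sticky: once set, it stays set */
    char *paths[8];
    size_t npaths;
};

/* Setup call that may fail; the caller may ignore the return value. */
static int example_append_path(struct example_policy *p, const char *path)
{
    if (p->npaths >= sizeof(p->paths) / sizeof(p->paths[0])) {
        p->error = 1;
        return -1;
    }
    char *copy = strdup(path);
    if (copy == NULL) {
        p->error = 1;
        return -1;
    }
    p->paths[p->npaths++] = copy;
    return 0;
}

/* Refuses to activate anything if an earlier setup call failed. */
static int example_enable_policy(const struct example_policy *p)
{
    if (p->error)
        return -1;
    /* ... the actual sandbox would be applied here ... */
    return 0;
}

static void example_free_policy(struct example_policy *p)
{
    for (size_t i = 0; i < p->npaths; i++)
        free(p->paths[i]);
    p->npaths = 0;
}
```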
exile_append_path_policies(): Add a check whether the path exists. If it
does not, set the error flag.
This also allows an early exit, so the case of a nonexistent path can be
handled cleanly. Previously, this was only caught during activation, and
the state after a failure there is generally undefined.
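A minimal sketch of such an existence check, assuming stat(2) is an acceptable test for existence. example_check_path() is a hypothetical helper, not the body of exile_append_path_policies().

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 0 if path exists. Otherwise returns -1
 * (errno says why, e.g. ENOENT) and sets *error_flag, so that a later
 * enable step can refuse to proceed. */
static int example_check_path(const char *path, int *error_flag)
{
    struct stat st;
    if (stat(path, &st) != 0) {
        *error_flag = 1;
        return -1; /* early exit, before anything is activated */
    }
    return 0;
}
```

The check happens at append time, so it cannot rule out a path disappearing before activation; it only moves the common failure case to a point where it is still cheap to handle.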
Among other differences, pledge() from OpenBSD takes a string of promises
and a separate execpromises argument. We don't.
Using the same name while providing a different interface does not
appear reasonable.
We cannot assume that Landlock is enabled just because we can compile it.
Even if it is built into the kernel, it may still not be active.
In that case, we will fall back to chroot/bind mounts if we can.
If we can't (because the path policies use Landlock-specific options),
no fallback is possible.
Closes: #21
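Whether Landlock is usable can be probed at runtime. This sketch uses the documented LANDLOCK_CREATE_RULESET_VERSION query of landlock_create_ruleset(2) through syscall(), since glibc has no wrapper. The fallback numbers assume the unified syscall table (444 on x86_64 and aarch64), and example_landlock_abi() is a hypothetical name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_landlock_create_ruleset
#define __NR_landlock_create_ruleset 444
#endif
#ifndef LANDLOCK_CREATE_RULESET_VERSION
#define LANDLOCK_CREATE_RULESET_VERSION (1U << 0)
#endif

/* Returns the Landlock ABI version (>= 1) if Landlock is usable right
 * now, or -1 otherwise. errno is typically ENOSYS (not built in) or
 * EOPNOTSUPP (built in but not enabled at boot). */
static int example_landlock_abi(void)
{
    long abi = syscall(__NR_landlock_create_ruleset, NULL, (size_t)0,
                       LANDLOCK_CREATE_RULESET_VERSION);
    return abi < 0 ? -1 : (int)abi;
}
```

A caller would take the chroot/bind-mount path when this returns -1, and fail instead if the policy contains Landlock-only options.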
This begins a pledge() implementation. This also
retires the previous syscall grouping approach,
as pledge() is the superior mechanism.
Squashed:
test: Begin basic pledge test
pledge: Begin EXILE_SYSCALL_PLEDGE_UNIX/EXILE_SYSCALL_PLEDGE_INET
test: Add pledge socket test
Introduce EXILE_SYSCALL_PLEDGE_DENY_ERROR, remove exile_policy->pledge_policy
pledge: Add PROT_EXEC
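A UNIX-only socket promise boils down to inspecting socket()'s domain argument. The following is a raw-BPF sketch, not exile's implementation: only AF_UNIX sockets are allowed, and other domains get EACCES (an assumed stand-in for a "deny with error" action). It assumes a little-endian x86_64 or aarch64 target, so the low 32 bits of args[0] sit at its start; install_unix_socket_only_filter() is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define EXAMPLE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define EXAMPLE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64"
#endif
#if __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "loads the low 32 bits of args[0] at offset 0; little-endian only"
#endif

static int install_unix_socket_only_filter(void)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, EXAMPLE_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* Not socket()? Jump to the final ALLOW. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socket, 0, 3),
        /* socket(domain, ...): load the low word of args[0]. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AF_UNIX, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EACCES & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```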
Squashed:
test: Adjust existing to new API with arg filters
test: Add tests for low-level seccomp args filter API
test: Add seccomp_filter_mixed()
test: Switch to syscall() everywhere
append_syscall_to_bpf(): Apply EXILE_SYSCALL_EXIT_BPF_NO_MATCH also for sock_filter.jt
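The jt fix concerns argument checks whose true branch must lead to the "no match" exit. A raw-BPF sketch of the PROT_EXEC case shows this, under assumed semantics: mprotect() matches an allow rule only if PROT_EXEC is not set. So when BPF_JSET finds the bit (jt), control goes to the no-match exit, modeled here as EACCES. The names are hypothetical; mmap() with PROT_EXEC is not covered. It assumes little-endian x86_64 or aarch64.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define EXAMPLE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define EXAMPLE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64"
#endif

static int install_no_prot_exec_filter(void)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, EXAMPLE_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* Not mprotect()? Jump to the final ALLOW. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mprotect, 0, 3),
        /* mprotect(addr, len, prot): low word of args[2]. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[2])),
        /* jt: PROT_EXEC set -> rule does not match -> no-match exit.
         * jf: PROT_EXEC clear -> rule matches -> ALLOW. */
        BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, PROT_EXEC, 0, 1),
        /* No-match exit. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EACCES & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

If jt pointed at ALLOW instead, setting PROT_EXEC would silently pass the check, which is exactly the kind of bug the commit above addresses.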
qssb.h was a preliminary name and can't be pronounced smoothly.
exile.h is more fitting and also short. Something exiled is essentially
something isolated, which is pretty much what this library does (isolation from
resources such as the file system, network and others accessible through system calls).
Classify syscalls into groups, for x86_64 only for now.
Up to date for 5.15; generate #ifndef fallbacks for syscalls
introduced since 5.10. Therefore, only x86_64 is supported at this point.
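The #ifndef fallbacks and a grouping table might look roughly like this. The syscall numbers are from the x86_64 table (newer syscalls share them across most architectures). The group names and the assignment of syscalls to groups are hypothetical, not exile's actual classification.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>

/* Fallbacks for headers older than the running kernel (x86_64 numbers). */
#ifndef __NR_epoll_pwait2
#define __NR_epoll_pwait2 441             /* 5.11 */
#endif
#ifndef __NR_mount_setattr
#define __NR_mount_setattr 442            /* 5.12 */
#endif
#ifndef __NR_landlock_create_ruleset
#define __NR_landlock_create_ruleset 444  /* 5.13 */
#endif
#ifndef __NR_landlock_add_rule
#define __NR_landlock_add_rule 445        /* 5.13 */
#endif
#ifndef __NR_landlock_restrict_self
#define __NR_landlock_restrict_self 446   /* 5.13 */
#endif
#ifndef __NR_process_mrelease
#define __NR_process_mrelease 448         /* 5.15 */
#endif

/* Hypothetical group bits; a syscall may belong to several groups. */
enum {
    EXAMPLE_GROUP_FS      = 1 << 0,
    EXAMPLE_GROUP_MOUNT   = 1 << 1,
    EXAMPLE_GROUP_POLL    = 1 << 2,
    EXAMPLE_GROUP_SANDBOX = 1 << 3,
    EXAMPLE_GROUP_PROCESS = 1 << 4,
};

struct example_syscall_group {
    long nr;
    unsigned groups;
};

static const struct example_syscall_group example_groups[] = {
    { __NR_epoll_pwait2,            EXAMPLE_GROUP_POLL },
    { __NR_mount_setattr,           EXAMPLE_GROUP_MOUNT | EXAMPLE_GROUP_FS },
    { __NR_landlock_create_ruleset, EXAMPLE_GROUP_SANDBOX },
    { __NR_landlock_add_rule,       EXAMPLE_GROUP_SANDBOX },
    { __NR_landlock_restrict_self,  EXAMPLE_GROUP_SANDBOX },
    { __NR_process_mrelease,        EXAMPLE_GROUP_PROCESS },
};

/* Returns the group bits of syscall nr, or 0 if it is unclassified. */
static unsigned example_syscall_groups(long nr)
{
    for (size_t i = 0; i < sizeof(example_groups) / sizeof(example_groups[0]); i++)
        if (example_groups[i].nr == nr)
            return example_groups[i].groups;
    return 0;
}
```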
Switch from blacklisting to a default whitelist.
Refactor the test logic. Seccomp tests that may get the process
killed now run in their own subprocess.
All test functions now return 0 on success. Therefore,
the shell script can be simplified.
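A minimal sketch of running a test in its own subprocess, so that a seccomp kill only takes down the child. example_run_isolated() and the two sample tests are hypothetical; killed children are reported shell-style as 128 + signal number.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs test_fn in a child process. Returns 0 if it returned 0, 1 if it
 * returned anything else, 128 + signo if it was killed, -1 on error. */
static int example_run_isolated(int (*test_fn)(void))
{
    fflush(NULL); /* avoid duplicated buffered output in the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(test_fn() == 0 ? 0 : 1);
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}

/* Sample tests following the "0 on success" convention. */
static int example_passing_test(void) { return 0; }
static int example_killed_test(void) { raise(SIGKILL); return 0; }
```

With every test reduced to an exit status, the driving shell script only has to compare that status against 0.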
Instead of having a blacklist and a whitelist, we now allow
setting a policy that is evaluated as a chain.
This adds qssb_append_syscalls_policy().
Furthermore, add a feature to decide per syscall which action to take.
This now allows returning an error instead of just killing the process.
In the future, it may allow us to optimize/shrink the BPF filter.
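The chain semantics can be modeled in plain C, outside of BPF. This sketch assumes "first matching rule wins, otherwise the default action applies". The rule struct and action names are hypothetical, not the signature of qssb_append_syscalls_policy().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>

/* Hypothetical per-syscall actions. */
enum example_action { EXAMPLE_ALLOW, EXAMPLE_DENY_ERRNO, EXAMPLE_DENY_KILL };

struct example_rule {
    long nr;                           /* syscall number this rule matches */
    enum example_action action;        /* what to do when it matches */
    const struct example_rule *next;   /* next rule in the chain */
};

/* Walks the chain; the first rule matching nr decides. If none matches,
 * the default action applies (e.g. deny, for a default whitelist). */
static enum example_action example_evaluate(const struct example_rule *chain,
                                            long nr,
                                            enum example_action default_action)
{
    for (const struct example_rule *r = chain; r != NULL; r = r->next)
        if (r->nr == nr)
            return r->action;
    return default_action;
}
```

Each rule corresponds to one compare-and-return in a linear BPF program, which is also where merging rules with the same action could later shrink the filter.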
The arch field is the same for x86_64 and x32, thus checking it
is not enough.
Simply using x32 system calls would allow a bypass. Thus,
we must check whether __X32_SYSCALL_BIT is set in the system call number.
This is of course a lazy solution; we could also add the
same system call number + __X32_SYSCALL_BIT to our black/whitelists.
For now, however, this will do.
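The x32 check can be expressed as a BPF_JSET on the syscall number right after the arch check. This is a raw-BPF sketch for x86_64 only. It returns EPERM so the effect is observable (the action a real policy uses may differ), and install_x32_guard_filter() is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef __x86_64__
#error "x32 only exists on x86_64"
#endif
#ifndef __X32_SYSCALL_BIT
#define __X32_SYSCALL_BIT 0x40000000
#endif

static int install_x32_guard_filter(void)
{
    struct sock_filter filter[] = {
        /* arch is AUDIT_ARCH_X86_64 for both x86_64 and x32 syscalls... */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* ...so also reject any syscall number with the x32 bit set. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, __X32_SYSCALL_BIT, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        /* A real policy would continue with its per-syscall rules here. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Because the filter sees the raw number, it rejects the x32 variant even on kernels built without x32 support, where the call would otherwise fail with ENOSYS.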