Instead of having a blacklist and whitelist, we now allow
setting a policy that runs as a chain.
This adds qssb_append_syscalls_policy().
Furthermore, add a feature to decide per syscall which action to take.
This makes it possible to return an error instead of just killing the
process.
In the future, it may also allow us to optimize/shrink the BPF filter.
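
As a rough userspace model of the chain idea (the real policy is compiled
into a BPF filter, which this sketch does not do), the first link that
lists a syscall decides its action. All names below (policy_action,
syscall_policy, evaluate_chain) are hypothetical and are not taken from
the library:

```c
#include <stddef.h>

/* Hypothetical actions; the library may name and number these differently. */
enum policy_action { ACTION_ALLOW, ACTION_DENY_ERRNO, ACTION_KILL };

/* One link of the chain: a set of syscall numbers plus the action for them. */
struct syscall_policy {
	const long *syscalls;
	size_t count;
	enum policy_action action;
	struct syscall_policy *next;
};

/* Walk the chain in order; the first link that lists nr decides.
 * If no link matches, fall back to default_action. */
static enum policy_action evaluate_chain(const struct syscall_policy *head,
					 long nr,
					 enum policy_action default_action)
{
	for (const struct syscall_policy *p = head; p != NULL; p = p->next) {
		for (size_t i = 0; i < p->count; i++) {
			if (p->syscalls[i] == nr)
				return p->action;
		}
	}
	return default_action;
}
```

With a first-match rule, an earlier "return an error" link takes precedence
over a later "kill" link that lists the same syscall.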
The arch field is the same for x86_64 and x32, thus checking it
is not enough.
Simply using x32 system calls would allow a bypass. Thus,
we must check whether the system call number has __X32_SYSCALL_BIT set.
This is of course a lazy solution; we could also add the
same system call number + __X32_SYSCALL_BIT to our black/whitelists.
For now, however, this will do.
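
The test itself is a single bit check on the system call number. A minimal
sketch in plain C, assuming the usual value of __X32_SYSCALL_BIT from the
kernel headers as a fallback (is_x32_syscall is a hypothetical helper; the
actual filter performs the comparison in BPF):

```c
#include <stdbool.h>

/* Normally provided by <asm/unistd.h> on x86; fallback for this sketch. */
#ifndef __X32_SYSCALL_BIT
#define __X32_SYSCALL_BIT 0x40000000
#endif

/* True if nr carries the x32 marker bit, i.e. it is an x32 system call
 * even though the arch field reports x86_64. */
static bool is_x32_syscall(unsigned int nr)
{
	return (nr & __X32_SYSCALL_BIT) != 0;
}
```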
The filter was missing a check for the arch field, allowing bypasses
by using the calling conventions of other architectures.
A trivial example is calling the x86 execve() from an x86_64 process.
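
A minimal sketch of such an arch check as a classic BPF prologue, using the
macros from <linux/filter.h>. It assumes an x86_64-only filter and is not
the library's actual filter; the final allow stands in for the syscall
checks that would normally follow:

```c
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

static const struct sock_filter arch_check[] = {
	/* Load seccomp_data.arch into the accumulator. */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
	/* If it equals AUDIT_ARCH_X86_64, skip the kill instruction. */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
	/* Any other architecture: kill. */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
	/* Placeholder for the per-syscall checks; this sketch just allows. */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};
```

Without the first three instructions, the filter would compare syscall
numbers without knowing which architecture's numbering they belong to.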
The purpose of these new functions is to make it simpler for users
to add new syscalls to the whitelist and blacklist.
The current approach uses a user-supplied pointer, which was
difficult to manage with "no_fs", since that option may add system calls
to the blacklist. Then we must resize the arrays, and suddenly
it's our job to free them.
As a bonus, implementing them here allows easier data structure
changes and decreases the chances that users of this API
do something wrong, like forgetting the -1 at the end, etc.
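
To illustrate the idea, here is a minimal sketch of an append helper that
owns the array and maintains the -1 terminator itself. The names
(syscall_list, syscall_list_append, syscall_list_free) are hypothetical
and do not reflect the library's actual functions or data layout:

```c
#include <stdlib.h>

/* Library-owned list; callers never touch the terminator themselves. */
struct syscall_list {
	long *numbers;  /* terminated by -1 once non-NULL */
	size_t count;   /* number of entries, excluding the terminator */
};

/* Append n syscall numbers and keep the list terminated by -1.
 * Returns 0 on success, -1 if memory could not be allocated
 * (the list is left unchanged in that case). */
static int syscall_list_append(struct syscall_list *list,
			       const long *nrs, size_t n)
{
	long *grown = realloc(list->numbers,
			      (list->count + n + 1) * sizeof(long));
	if (grown == NULL)
		return -1;
	for (size_t i = 0; i < n; i++)
		grown[list->count + i] = nrs[i];
	list->count += n;
	grown[list->count] = -1;
	list->numbers = grown;
	return 0;
}

/* Release the list; ownership stays with the library side. */
static void syscall_list_free(struct syscall_list *list)
{
	free(list->numbers);
	list->numbers = NULL;
	list->count = 0;
}
```

Because allocation, resizing and freeing all happen in one place, an option
such as "no_fs" can add entries without the caller having to care who owns
the memory.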
Landlock can handle write access without it implying read access,
in contrast to the existing bind mounts solution. Hence, remove
ALLOW_READ from ALLOW_WRITE bitmask.
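
A minimal sketch of what the separated bitmask could look like when mapped
to Landlock access rights. The SKETCH_ALLOW_* flags and to_landlock_access()
are hypothetical stand-ins, not the library's definitions; the fallback
values mirror <linux/landlock.h> for systems whose headers lack it:

```c
#include <stdint.h>

#if defined(__has_include)
# if __has_include(<linux/landlock.h>)
#  include <linux/landlock.h>
# endif
#endif
/* Fallback for older kernel headers; values as in <linux/landlock.h>. */
#ifndef LANDLOCK_ACCESS_FS_WRITE_FILE
#define LANDLOCK_ACCESS_FS_WRITE_FILE (1ULL << 1)
#define LANDLOCK_ACCESS_FS_READ_FILE  (1ULL << 2)
#endif

/* Hypothetical policy flags: write no longer includes the read bit. */
#define SKETCH_ALLOW_READ  (1u << 0)
#define SKETCH_ALLOW_WRITE (1u << 1)

/* Translate the policy flags into Landlock file access rights. */
static uint64_t to_landlock_access(unsigned int flags)
{
	uint64_t access = 0;
	if (flags & SKETCH_ALLOW_READ)
		access |= LANDLOCK_ACCESS_FS_READ_FILE;
	if (flags & SKETCH_ALLOW_WRITE)
		access |= LANDLOCK_ACCESS_FS_WRITE_FILE;
	return access;
}
```

A caller who wants both kinds of access would then pass both flags
explicitly.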
Previously, we needed chroot and bind mounts to enforce path policies.
Therefore, in the presence of path policies, we had to explicitly create
a chroot dir.
With the coming Landlock support, this is no longer required.
However, one might still want to use chroot and the bind mount flags. But
path policies don't dictate that anymore.