Compare commits
No commits in common. "769f729dc51f2feb8bc3cbb2a48ed91ff2d56bf3" and "8f38dc4480d51e2bf737ef87dd4a4f408d90a8a6" have entirely different histories.
769f729dc5...8f38dc4480
65
README.md
@@ -1,12 +1,13 @@
# exile.h
`exile.h` provides an API for processes on Linux to easily isolate themselves in order
to mitigate the effect of exploited vulnerabilities, i. e. when attacker has achieved
arbitrary code execution. exile.h makes it simpler for developers to use existing technologies such as Seccomp and Linux Namespaces. Those generally require knowledge of details and are not trivial for developers to employ, which prevents a more widespread adoption.
`exile.h` is a header-only library, enabling processes to easily isolate themselves on Linux for exploit mitigation purposes. exile.h wants to make existing technologies, such as Seccomp and Linux Namespaces, easier to use. Those generally
require knowledge of details and are not trivial for developers to employ, which prevents a more widespread adoption.

The following section offers small examples. Then the motivation is explained in more detail. Proper API documentation will be maintained in other files.
The following section offers small examples. Then the motivation is explained in more detail.
Proper API documentation will be maintained in other files.

## Quick demo
This section quickly demonstrates the simplicity of the API. It serves as an overview to get a first impression.
This section quickly demonstrates the simplicity of the API. It serves as an overview to get
a first impression.

system() is used to keep the example C code short. It also demonstrates that subprocesses are also subject to restrictions imposed by exile.h.

@@ -39,12 +40,12 @@ int main(void)
}
```

The assert() calls won't be fired, consistent with the policy that allows only reading
from /home/user. We can write to /tmp/ though as it was specified in the policy.
The assert() calls won't be fired, consistent with the policy.

### vows(): pledge()-like API / System call policies
### System call policies / vows
exile.h allows specifying which syscalls are permitted or denied. In the following example,
'ls' is never executed, as the specified "vows" do not allow the execve() system call. The process will be killed.
ls is never executed, as the specified "vows" do not allow the execve() system call. The
process will be killed.

```c
#include "exile.h"
@@ -80,7 +81,7 @@ int main(void)
Produces ```curl: (6) Could not resolve host: evil.tld```. For example, this is useful for subprocesses which do not need
network access, but perform tasks such as parsing user-supplied file formats.

### Isolation of single functions (EXPERIMENTAL)
### Isolation of single functions
Currently, work is being done that hopefully will allow isolation of individual function calls in a mostly pain-free manner.

Consider the following C++ code:
@@ -127,28 +128,23 @@ We execute "cat()". The first call succeeds. In the second, we get an exception,
the subprocess "cat()" was launched in violated the policy (missing "rpath" vow).

Naturally, there is a performance overhead. Certain challenges remain, such as the fact
that being executed in a subproces, we operate on copies, so handling references
that being executed in a subprocess, we operate on copies, so handling references
is not something that has been given much thought. There is also the fact
that clone()ing from threads opens a can of worms, particularly with locks. Hence, exile_launch() is best avoided in multi-threaded contexts.
that clone()ing from threads opens a can of worms, particularly with locks. Hence, exile_launch()
is best avoided in multi-threaded contexts.

## Status
No release yet, experimental, API is unstable, builds will break on updates of this library.

Currently, it's mainly evolving from the needs of my other projects which use exile.h.
Currently, it's mainly evolving from the needs of my other projects.


### Real-world usage
- looqs: https://github.com/quitesimpleorg/looqs
- qswiki: https://gitea.quitesimple.org/crtxcr/qswiki


## Motivation and Background
exile.h unlocks existing Linux mechanisms to facilitate isolation of processes from resources. Limiting the scope of what programs can do helps defending the rest of the system when a process gets under attacker's control (when classic mitigations such as ASLR etc. failed). To this end, OpenBSD has the pledge() and unveil() functions available. Those functions are helpful mitigation mechanisms, but such accessible ways are unfortunately not readily available on Linux. This is where exile.h steps in.

Seccomp allows restricting the system calls available to a process and thus decrease the systems attack surface, but it generally is not easy to use. Requiring BPF filter instructions, you generally just can't make use of it right away without learning
about BPF. exile.h provides an API inspired by pledge(), building on top of seccomp. It also provides an interface to manually restrict the system calls that can be issued.
Seccomp allows restricting the system calls available to a process and thus decrease the systems attack surface, but it generally is not easy to use. Requiring BPF filter instructions, you generally just can't make use of it right away. exile.h provides an API inspired by pledge(), building on top of seccomp. It also provides an interface to manually restrict the system calls that can be issued.

Traditional methods employed to restrict file system access, like different uids/gids, chroot, bind-mounts, namespaces etc. may require administrator intervention, are perhaps only suitable for daemons and not desktop applications, or are generally rather involved. As a positive example, Landlock since 5.13 is a vast improvement to limit file system access of processes. It also greatly simplifies exile.h' implementation of fs isolation.
Traditional methods employed to restrict file system access, like different uids/gids, chroot, bind-mounts, namespaces etc. may require administrator intervention, are perhaps only suitable
for daemons and not desktop applications, or are generally rather involved. As a positive example, Landlock since 5.13 is a vast improvement to limit file system access of processes. It also greatly simplifies exile.h' implementation of fs isolation.

Abstracting those details may help developers bring sandboxing into their applications.

@@ -174,24 +170,18 @@ It's recommended to start with [README.usage.md] to get a feeling for exile.h.
API-Documentation: [README.api.md]

## Limitations
Built upon kernel technologies, exile.h naturally inherits their limitations:

- New syscalls can be introduced by new kernel versions. exile.h must keep in sync, and users must keep the library up to date.
- seccomp has no deep argument inspection (yet), particularly new syscalls
cannot be reasonably filtered, such as clone3(), or io_uring.
- You can't know what syscalls libraries will issue. An update to existing
libraries may cause them to use different syscalls not allowed by a policy. However, using vows and keeping up to date with exile.h should cover that.
- Landlock, currently, does not apply to syscalls such as stat().

TODO:
TODO:
- seccomp must be kept up to date syscalls kernel
- ioctl does not know the fd, so checking values is kind of strange
- redundancies: some things are handled by capabilties, other by seccomp or both
- seccomp no deep argument inspection
- landlock: stat() does not apply
- no magic, be reasonable, devs should not get sloppy, restrict IPC.

## Requirements
Kernel >=3.17

While mostly transparent to users of this API, kernel >= 5.13 is required to take advantage of Landlock. Furthermore, it depends on distro-provided kernels being reasonable and enabling it by default. In practise, Landlock maybe won't be used in some cases so exile.h will use a combination of namespaces, bind mounts and chroot as fallbacks.
While mostly transparent to users of this API, kernel >= 5.13 is required to take advantage of Landlock. Furthermore, it depends on distro-provided kernels being reasonable and enabling it by default. In practise, this means that Landlock probably won't be used for now, and exile.h will use a combination of namespaces, bind mounts and chroot as fallbacks.


## FAQ
@@ -207,8 +197,13 @@ You can thank a Debian-specific kernel patch for that. Execute

Note that newer releases should not cause this problem any longer, as [explained](https://www.debian.org/releases/bullseye/amd64/release-notes/ch-information.en.html#linux-user-namespaces) in the Debian release notes.

### Why "vows"?
pledge() cannot be properly implemented using seccomp. The "vow" concept here may look similiar, and it is, but it's not pledge().
### Real-world usage
- looqs: https://gitea.quitesimple.org/crtxcr/looqs
- qswiki: https://gitea.quitesimple.org/crtxcr/qswiki

Outdated:
- cgit sandboxed: https://gitea.quitesimple.org/crtxcr/cgitsb
- qpdfviewsb sandboxed (quick and dirty): https://gitea.quitesimple.org/crtxcr/qpdfviewsb

### Other projects
- [sandbox2](https://developers.google.com/code-sandboxing/sandbox2/)

53
exile.c
@@ -361,11 +361,10 @@ inline int exile_landlock_is_available()
{
#if HAVE_LANDLOCK == 1
int ruleset = landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION);
return ruleset > 0;
return ruleset == 1;
#endif
return 0;
}

int exile_append_syscall_policy(struct exile_policy *exile_policy, long syscall, unsigned int syscall_policy, struct sock_filter *argfilters, size_t n)
{
struct exile_syscall_policy *newpolicy = (struct exile_syscall_policy *) calloc(1, sizeof(struct exile_syscall_policy));
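Review note on the `exile_landlock_is_available()` hunk above: when called with `LANDLOCK_CREATE_RULESET_VERSION`, the `landlock_create_ruleset` system call returns the highest Landlock ABI version the kernel supports (1 or greater) and -1 on error. A `> 0` comparison therefore keeps working on kernels that report ABI 2 or later, whereas `== 1` would treat those kernels as lacking Landlock. A minimal sketch of such a check, assuming a raw `syscall()` invocation; `landlock_available_sketch` is a hypothetical name that is not part of exile.h, and the fallback `#define` only mirrors the value from `<linux/landlock.h>` so the sketch builds with older headers:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same value as in <linux/landlock.h>; defined here as a fallback for older headers. */
#ifndef LANDLOCK_CREATE_RULESET_VERSION
#define LANDLOCK_CREATE_RULESET_VERSION (1U << 0)
#endif

/* Hypothetical helper: returns 1 if the kernel reports any Landlock ABI version, else 0. */
int landlock_available_sketch(void)
{
#ifdef SYS_landlock_create_ruleset
	/* With the VERSION flag, attr must be NULL and size 0; the result is the ABI version. */
	long abi = syscall(SYS_landlock_create_ruleset, NULL, (size_t) 0, LANDLOCK_CREATE_RULESET_VERSION);
	return abi > 0;
#else
	/* Headers predate Landlock, so treat it as unavailable. */
	return 0;
#endif
}
```

The function never fails hard: on kernels without Landlock, or where the call is blocked, the syscall returns -1 and the helper reports 0.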
@@ -382,7 +381,6 @@ int exile_append_syscall_policy(struct exile_policy *exile_policy, long syscall,
{
EXILE_LOG_ERROR("Too many argfilters supplied\n");
exile_policy->exile_flags |= EXILE_FLAG_ADD_SYSCALL_POLICY_FAIL;
free(newpolicy);
return -1;
}
for(size_t i = 0; i < n; i++)
@@ -390,10 +388,10 @@ int exile_append_syscall_policy(struct exile_policy *exile_policy, long syscall,
newpolicy->argfilters[i] = argfilters[i];
}
newpolicy->next = NULL;


*(exile_policy->syscall_policies_tail) = newpolicy;
exile_policy->syscall_policies_tail = &(newpolicy->next);


exile_policy->disable_syscall_filter = 0;
return 0;
}
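Review note on the append logic in the hunk above: `syscall_policies_tail` is a pointer to the pointer that the next node has to be stored in, which makes appending constant-time and removes the special case for an empty list. A reduced sketch of that pattern, assuming simplified stand-in types; `sketch_node`, `sketch_list`, `sketch_list_init` and `sketch_list_append` are hypothetical names, not exile.h structures or functions:

```c
#include <stdlib.h>

/* Hypothetical stand-ins, reduced to what the append needs. */
struct sketch_node
{
	long syscall_nr;
	struct sketch_node *next;
};

struct sketch_list
{
	struct sketch_node *head;
	struct sketch_node **tail; /* address of the pointer the next node is stored in */
};

void sketch_list_init(struct sketch_list *list)
{
	list->head = NULL;
	list->tail = &list->head; /* empty list: the slot to fill is head itself */
}

/* Appends in O(1). Returns 0 on success, -1 if the allocation fails. */
int sketch_list_append(struct sketch_list *list, long syscall_nr)
{
	struct sketch_node *node = calloc(1, sizeof(*node));
	if(node == NULL)
	{
		return -1;
	}
	node->syscall_nr = syscall_nr;
	node->next = NULL;
	*(list->tail) = node;        /* links into head or into the previous node's next */
	list->tail = &(node->next);  /* the new node's next becomes the slot to fill */
	return 0;
}
```

Because the tail always points at a valid slot after initialization, the append path needs no `if(head == NULL)` branch.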
@@ -816,13 +814,11 @@ char *concat_path(const char *first, const char *second)
if(written < 0)
{
EXILE_LOG_ERROR("Error during path concatination\n");
free(result);
return NULL;
}
if(written >= PATH_MAX)
{
EXILE_LOG_ERROR("path concatination truncated\n");
free(result);
return NULL;
}
return result;
@@ -873,18 +869,18 @@ static int perform_mounts(const char *chroot_target_path, struct exile_path_poli
{
while(path_policy != NULL)
{
int mount_flags = get_policy_mount_flags(path_policy);

char *path_inside_chroot = concat_path(chroot_target_path, path_policy->path);
if(path_inside_chroot == NULL)
{
return 1;
}
//all we do is bind mounts
mount_flags |= MS_BIND;

if(path_policy->policy & EXILE_FS_ALLOW_ALL_READ || path_policy->policy & EXILE_FS_ALLOW_ALL_WRITE)
{
int mount_flags = get_policy_mount_flags(path_policy);

char *path_inside_chroot = concat_path(chroot_target_path, path_policy->path);
if(path_inside_chroot == NULL)
{
return 1;
}
//all we do is bind mounts
mount_flags |= MS_BIND;

int ret = mount(path_policy->path, path_inside_chroot, NULL, mount_flags, NULL);
if(ret < 0 )
{
@@ -901,10 +897,9 @@ static int perform_mounts(const char *chroot_target_path, struct exile_path_poli
free(path_inside_chroot);
return ret;
}

path_policy = path_policy->next;
free(path_inside_chroot);
}
path_policy = path_policy->next;
}
return 0;
}
@@ -1451,20 +1446,7 @@ static void close_file_fds()
long max_files = sysconf(_SC_OPEN_MAX);
for(long i = 3; i <= max_files; i++)
{
struct stat statbuf;
int fd = (int) max_files;
int result = fstat(i, &statbuf);
if(result == -1 && errno != EBADF && errno != EACCES)
{
EXILE_LOG_ERROR("Could not fstat %i: %s\n", fd, strerror(errno));
abort();
}
int type = statbuf.st_mode & S_IFMT;
if(type != S_IFIFO && type != S_IFSOCK)
{
/* No error check, retrying not recommended */
close(fd);
}
close((int)i);
}
}

@@ -1527,11 +1509,6 @@ int exile_enable_policy(struct exile_policy *policy)
return -EINVAL;
}

if(policy->keep_fds_open != 1)
{
close_file_fds();
}

if(enter_namespaces(policy->namespace_options) < 0)
{
EXILE_LOG_ERROR("Error while trying to enter namespaces\n");