From a6331b5144a0ae648772f04d1b1b50ee1e90c115 Mon Sep 17 00:00:00 2001
From: Ben Burwell
Date: Sun, 28 Jun 2020 10:50:43 -0400
Subject: Make some minor edits to seccomp post

---
 ...earning-about-syscall-filtering-with-seccomp.md | 50 +++++++++++-----------
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/_posts/2020-06-27-learning-about-syscall-filtering-with-seccomp.md b/_posts/2020-06-27-learning-about-syscall-filtering-with-seccomp.md
index 226928a..e0462fc 100644
--- a/_posts/2020-06-27-learning-about-syscall-filtering-with-seccomp.md
+++ b/_posts/2020-06-27-learning-about-syscall-filtering-with-seccomp.md
@@ -2,8 +2,8 @@
 title: Learning About Syscall Filtering With Seccomp
 ---
 
-I'd heard about being able to [run Docker containers with a custom "security
-profile"][docker-seccomp], but wasn't really sure what that meant, or what was
+I'd heard about being able to [run Docker containers with a custom security
+profile][docker-seccomp], but wasn't really sure what that meant or what was
 happening behind the scenes, so I decided to do some experimentation to find
 out.
 
@@ -16,17 +16,15 @@ should kill your program.
 
 But why would you want to do this? I think if you had a pretty simple program,
-using `seccomp` might be overkill. But if your program does different things on
-the system depending on some possibly untrustworthy user input, it might make
-sense to use. For example, if your program runs user-specified commands, you
-might want to make sure that an approved subset of system functionality is
-available. Looking at [a list of software using `seccomp` on Wikipedia][wiki]
-backs this up: the software listed are mostly hypervisors/container runners
-(like Docker), web browsers, etc.
+using `seccomp` might be overkill. But if your program makes different system
+calls depending on possibly-untrustworthy user input, it might make sense to try
+to limit what the program is allowed to do. Looking at [a list of software using
+`seccomp` on Wikipedia][wiki] backs this up: the software listed are mostly
+hypervisors/container runners (like Docker), web browsers, etc.
 
 By reading [the manual page for the `seccomp(2)` system call][man-seccomp], we
 can learn how to write a program to try this out. The simplest action is to
-enter "strict mode", which prevents all system calls except for `read(2)`,
+enter "strict mode," which prevents all system calls except for `read(2)`,
 `write(2)`, `_exit(2)`, and `sigreturn(2)` --- in other words, what I think
 should be just enough to write hello world! Let's give it a shot:
 
@@ -49,9 +47,9 @@ main()
 
 When I compile and run my program, I just see **Killed** being printed, not
 **hello, world!**. Well, this is pretty good evidence that `seccomp` is doing
-_something_ --- it's at least killing my program! Let's try to find out why
-using [`strace`, a program that shows you all of the system calls being
-made][strace]:
+_something_ --- it's at least killing my program! Let's try to find out why it's
+being killed using [`strace`, a program that shows you all of the system calls
+being made][strace]:
 
 ```
 $ strace ./hello
@@ -89,9 +87,9 @@ fstat(1, ) = ?
 Killed
 ```
 
-There's a lot of stuff in here that I don't fully understand about loading
-dynamically linked libraries, reading the program binary, and mapping it into
-memory, but the last few syscalls provide some clues: right after `prctl` is
+There's a lot at the beginning about loading dynamically linked libraries,
+reading the program binary, and mapping it into memory that I don't fully
+understand. But the last few syscalls provide some clues: right after `prctl` is
 called, we see `fstat` being called! `fstat` is a system call for getting the
 status of a file, and `1` happens to be the file descriptor for standard output.
 It makes sense that calling `printf` might involve checking the status of
@@ -115,16 +113,16 @@ manual page for `seccomp`, it said:
 
 It looks like I'll need to actually do some real filtering if I want to run my
 hello world program and not just use strict mode. To do this, we need to use
-`SECCOMP_MODE_FILTER` and pass a pointer to a `struct sock_fprog`, which is "a
-Berkeley Packet Filter program designed to filter arbitrary system calls and
-system call arguments."
+`SECCOMP_MODE_FILTER` and pass a pointer to a `struct sock_fprog`, which
+according to the manpage is "a Berkeley Packet Filter program designed to filter
+arbitrary system calls and system call arguments."
 
-While we could construct a BPF program using an array of `struct sock_filter`
-instructions, looking at the chain of instructions we'd need to set my made me
-think it would be much easier to enlist the services of
-[`libseccomp`][libseccomp], a library designed for just this purpose. Let's try
-rewriting `hello.c` to use `libseccomp` and allowing those three syscalls we saw
-before (`fstat`, `write`, and `exit_group`):
+While we could construct a BPF program using an array of `struct sock_filter`s,
+looking at the chain of instructions we'd need made me think it would be much
+easier to enlist the services of [`libseccomp`][libseccomp], a library designed
+for just this purpose. Let's try rewriting `hello.c` to use `libseccomp` and
+allowing those three syscalls we saw before (`fstat`, `write`, and
+`exit_group`):
 
 ```
 #include
@@ -226,7 +224,7 @@ Now[^1] let's go back to how this all fits in to Docker. Looking at [Docker's
 default `seccomp` profile][docker-default], a lot of it starts to make more
 sense. In fact, it looks like they're using the exact same names from
 `libseccomp` that we used in our program! If we search [the moby source code for
-`"libseccomp"`][moby], we can see that it is indeed being used (via Go
+`libseccomp`][moby], we can see that it is indeed being used (via Go
 bindings). Let's try to use a custom `seccomp` profile to prohibit programs in
 our Docker
--
cgit v1.2.3