CVE-2016-0728 just came out. The vulnerability has been present in the kernel code since 2012, and it was discovered by Perception Point. Sample exploit code is available.
“It’s pretty bad because a user with legitimate or lower privileges can gain root access and compromise the whole machine […]. Every Linux server needs to be patched as soon as the patch is out.” (Yevgeny Pats, cofounder and CEO of Perception Point)
A patch is already out, and a fix is available in Debian. Before running “apt-get update && apt-get upgrade”, let’s see what it is all about. I grab the sample code, compile it and try it out. The exploit program runs for a long time:

    $ ./cve_2016_0728 PP1
    uid=1000, euid=1000
    Increfing...

How long it runs depends on how powerful the CPU is; it could easily stay like this for an hour. On my computer (Debian Jessie, AMD K10) the exploit gets killed after 20 minutes by the disk quota limit system. Assuming you didn’t change the defaults coming with Jessie, everything should be fine. But it could be a problem on other systems, where the exploit could run to completion, elevate privileges, and become root.
Now, let’s try the same exploit in a Firejail sandbox:
    $ firejail ./cve_2016_0728 PP1
    Reading profile /etc/firejail/generic.profile
    Reading profile /etc/firejail/disable-mgmt.inc
    Reading profile /etc/firejail/disable-secret.inc
    Reading profile /etc/firejail/disable-common.inc
    ** Note: you can use --noprofile to disable generic.profile **
    Parent pid 3987, child pid 3988
    Child process initialized
    uid=1000, euid=1000
    Command terminated by signal 31
    parent is shutting down, bye...
    $

The program is killed immediately by the seccomp-bpf filter. Syslog gives us more information:

    $ sudo tail -f /var/log/syslog
    [...]
    Jan 19 19:47:21 debian kernel: [11199.682124] audit: type=1326 audit(1453250841.513:2): auid=1000 uid=1000 gid=1000 ses=9 pid=3990 comm="cve_2016_0728" exe="/home/netblue/work/cve/cve_2016_0728" sig=31 syscall=250 compat=0 ip=0x7fed999f9fd9 code=0x0
Syscall 250 on this machine is keyctl and is blacklisted by default by Firejail. add_key and request_key are also disabled, but the exploit didn’t get that far.
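To make an audit record like the one above easier to read, here is a small Python sketch that splits a type=1326 (seccomp) record into its fields and looks up the syscall number. The function names and the lookup table are my own illustration, not part of Firejail or the audit tools, and the syscall numbers apply to x86_64 only.

```python
import re

# Key-management syscall numbers on x86_64 only; other architectures differ.
X86_64_KEY_SYSCALLS = {248: "add_key", 249: "request_key", 250: "keyctl"}

# The kernel audit line from the syslog output above.
SAMPLE = (
    'Jan 19 19:47:21 debian kernel: [11199.682124] audit: type=1326 '
    'audit(1453250841.513:2): auid=1000 uid=1000 gid=1000 ses=9 pid=3990 '
    'comm="cve_2016_0728" exe="/home/netblue/work/cve/cve_2016_0728" '
    'sig=31 syscall=250 compat=0 ip=0x7fed999f9fd9 code=0x0'
)


def parse_seccomp_audit(line):
    """Return the key=value fields of a seccomp audit record, or None."""
    if "type=1326" not in line:
        return None
    # Values are either double-quoted strings or runs of non-space characters.
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))
    return {key: value.strip('"') for key, value in fields.items()}


def describe(line):
    """Summarize which program was stopped on which syscall (hypothetical helper)."""
    rec = parse_seccomp_audit(line)
    if rec is None:
        return None
    nr = int(rec["syscall"])
    name = X86_64_KEY_SYSCALLS.get(nr, "unknown")
    return f'{rec["comm"]} got signal {rec["sig"]} on syscall {nr} ({name})'


if __name__ == "__main__":
    print(describe(SAMPLE))
```

Signal 31 is SIGSYS on x86_64, which is what the kernel delivers when a seccomp filter rejects a syscall.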
According to Yevgeny Pats the exploitation is straightforward, but it’s unknown whether it’s been used to date. It could definitely give some people new ideas. Although they mention web servers, the exploit is more likely to turn up in user-space programs. The solution is a simple seccomp filter.
About Seccomp
seccomp-bpf is an application sandboxing mechanism in the Linux kernel that filters system calls (syscalls) according to a configurable policy written as Berkeley Packet Filter rules.
The Linux kernel supports over 300 syscalls. To function normally, applications usually need only a small subset of them. Using the seccomp-bpf kernel feature we can disable the unused syscalls for a particular application, thus limiting the attack surface of the kernel. It works like a tripwire: if the application suddenly starts making unusual syscalls, it is killed immediately.
seccomp-bpf was introduced in Linux kernel 3.5. It is compiled by default (CONFIG_SECCOMP_FILTER) on most Linux distributions. Application authors can use the API exposed by the kernel, or they can use an external library, libseccomp.
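As a rough illustration of the kernel API route, the Python sketch below builds a small BPF blacklist that kills the caller on add_key, request_key or keyctl and allows everything else, then shows how it could be installed with prctl through ctypes. This is not Firejail's implementation. The function names are hypothetical, the syscall numbers and the architecture check assume x86_64, and the sketch ignores the x32 ABI, which a real filter would also have to handle.

```python
import ctypes
import struct

# Constants from <linux/filter.h>, <linux/seccomp.h>, <linux/audit.h>, <linux/prctl.h>.
BPF_LD_W_ABS = 0x20            # BPF_LD | BPF_W | BPF_ABS
BPF_JEQ_K = 0x15               # BPF_JMP | BPF_JEQ | BPF_K
BPF_RET_K = 0x06               # BPF_RET | BPF_K
SECCOMP_RET_KILL = 0x00000000
SECCOMP_RET_ALLOW = 0x7FFF0000
AUDIT_ARCH_X86_64 = 0xC000003E
PR_SET_SECCOMP = 22
PR_SET_NO_NEW_PRIVS = 38
SECCOMP_MODE_FILTER = 2

# x86_64 numbers for add_key, request_key and keyctl.
KEY_SYSCALLS = [248, 249, 250]


def build_deny_filter(denied):
    """Return (code, jt, jf, k) tuples: kill on denied syscalls, allow the rest."""
    prog = [
        (BPF_LD_W_ABS, 0, 0, 4),                  # load seccomp_data.arch
        (BPF_JEQ_K, 1, 0, AUDIT_ARCH_X86_64),     # expected arch: skip the kill
        (BPF_RET_K, 0, 0, SECCOMP_RET_KILL),      # unexpected arch: kill
        (BPF_LD_W_ABS, 0, 0, 0),                  # load seccomp_data.nr
    ]
    count = len(denied)
    for i, nr in enumerate(denied):
        # On a match, jump over the remaining checks and the ALLOW to the final KILL.
        prog.append((BPF_JEQ_K, count - i, 0, nr))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_KILL))
    return prog


def pack_filter(prog):
    """Serialize instructions as an array of struct sock_filter (8 bytes each)."""
    return b"".join(struct.pack("=HBBI", *insn) for insn in prog)


class SockFprog(ctypes.Structure):
    """Mirror of struct sock_fprog."""
    _fields_ = [("len", ctypes.c_ushort), ("filter", ctypes.c_void_p)]


def install_filter(prog):
    """Install the filter on the calling thread. This cannot be undone."""
    raw = pack_filter(prog)
    buf = ctypes.create_string_buffer(raw, len(raw))
    fprog = SockFprog(len(prog), ctypes.cast(buf, ctypes.c_void_p))
    libc = ctypes.CDLL(None, use_errno=True)
    libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    # Needed so that an unprivileged process is allowed to load a filter.
    if libc.prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "PR_SET_NO_NEW_PRIVS failed")
    if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER,
                  ctypes.addressof(fprog), 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "PR_SET_SECCOMP failed")


if __name__ == "__main__":
    program = build_deny_filter(KEY_SYSCALLS)
    print(len(program), "instructions,", len(pack_filter(program)), "bytes")
```

Building the program and installing it are kept separate on purpose: once install_filter() succeeds, the filter stays in place for the life of the thread and is inherited by its children, so it is best tried in a throwaway child process.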
Most applications combine seccomp-bpf with other security techniques implemented in the Linux kernel, among them chroot and Linux namespaces. This is a short list of programs using seccomp-bpf:
- vsftpd – vsftpd FTP server was one of the first applications to use a whitelist seccomp to boost security. It also uses chroot and Linux namespaces.
- sftp (OpenSSH) – sftp component of OpenSSH follows closely in the footsteps of vsftpd. It uses a whitelist seccomp filter on top of a chroot.
- BIND – BIND is by far the most widely used DNS server software on the Internet. A whitelist seccomp filter was introduced in version 9.10.1.
- Google Chrome/Chromium – Google was playing with sandboxes in the Chromium browser long before seccomp-bpf was introduced in Linux kernel 3.5. It currently uses an SUID sandbox to restrict the worker processes using PID and network namespaces and seccomp-bpf.
- Opera Web Browser – Some time ago Opera browser internals have been switched to a fork of Google Chromium. The SUID sandbox, Linux namespaces and seccomp-bpf filters survived the porting, and are currently used by the browser.
- QEMU – QEMU (Quick EMUlator) is a generic machine emulator and virtualizer. It is used often in conjunction with acceleration in the form of a Type-I hypervisor such as KVM or Xen. Recently, QEMU introduced seccomp-bpf support. This enables kernel filtering of system calls to prevent malicious guests from doing damage.
- LXC – LXC is a generic sandbox for running containers. Unlike other sandboxes available, the focus is running full distro images, also known as system containers. It uses Linux namespaces, chroot and seccomp. By default the syscall list is empty; the user has to build her own list.
This is a very small list of programs. I only hope more and more developers will consider using these types of security technologies. Adding seccomp, chroot and Linux namespaces support to an existing application is easy. The heavy lifting is implemented in the Linux kernel, and no external dependencies are required. Most of the time all it takes is a small number of simple system calls.
About Firejail
Firejail is a security sandbox similar to the sandbox currently running internally in Google Chrome. Originally intended to secure Firefox, the sandbox supports by default a large number of desktop programs, including proprietary programs such as Skype, Steam and Spotify. For more information please visit the project page.