The more complex a program or application, the more likely it is to contain exploitable or otherwise dangerous faults. Containers are a way of limiting the damage by restricting an application’s access to the bare minimum. Ideally, we would have a separate and instantly replaceable computer for every little daemon and service we run. Sadly, even with virtual machines this would hardly be an efficient use of resources, so containers try to find a middle ground: they let us separate applications almost as if they were running on different machines, while actually sharing the same hardware and operating system kernel.
Several features come together to make this possible:
And it is a good idea to augment them with others:
- seccomp-bpf syscall filter
- packet filtering (ebtables)
- virtual network devices
Armed with these keywords, your week should now be filled with interesting and productive reading. :-)
If all you want are some opinionated basics, however, read on:
How to use a container?
The more containment and separation between applications, the more secure a system gets.
Unfortunately, few applications come with a ready-to-use definition of the access and the other applications they require to function at a minimum. There are ways of finding this out (makejail, documentation, strace, AppArmor profiles), but it will always involve some work.
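As a rough illustration of the strace route, here is a minimal Python sketch that pulls the absolute paths a program touched out of a trace recorded with something like strace -f -e trace=file -o trace.txt -- <cmd>. The function name accessed_paths is made up for this example, and the regular expression only assumes strace’s usual syscall("path", ...) = result line layout; treat the result as a starting point for deciding what to expose, not as a complete access profile.

```python
import re

# First quoted argument of a syscall, optionally after AT_FDCWD, e.g.
#   openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 3
#   execve("/usr/bin/cat", ["cat"], 0x7ffc /* 20 vars */) = 0
CALL_RE = re.compile(r'\b(\w+)\((?:AT_FDCWD, )?"([^"]*)"')
# strace ends failed calls with ") = -1 ERRNO (description)"
FAILED_RE = re.compile(r'\)\s+= -1 [A-Z]')


def accessed_paths(strace_lines):
    """Split the absolute paths seen in strace output into (succeeded, failed)."""
    ok, failed = set(), set()
    for line in strace_lines:
        match = CALL_RE.search(line)
        if not match or not match.group(2).startswith("/"):
            continue  # no path argument, or a path relative to some fd/cwd
        path = match.group(2)
        (failed if FAILED_RE.search(line) else ok).add(path)
    return sorted(ok), sorted(failed)
```

Used as ok, failed = accessed_paths(open("trace.txt")), the first list gives candidates for ReadOnlyPaths= or for copying into a chroot; failed lookups often just show a program probing optional configuration files.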
You have to find your own balance between the security you require and the effort needed to attain it.
Furthermore, what even is an application? Is it a single program or daemon? Or a bunch of programs running in parallel and working in concert towards a single purpose?
How to NOT use a container?
Sometimes (web) developers see the Linux systems that run their (web) applications as a house of cards they had better not touch, or they don’t want to deal with the security and quality policies of the various Linux distributions. Their reasoning: if it works for them, it should work exactly the same way for the end user. Conveniently, containers now allow them to package an application plus the whole Linux ecosystem around it. Unfortunately, this just makes everything worse: it becomes harder to actually secure applications and separate them from each other, while providing a false semblance of security.
What is chroot and what can and can’t it do?
A chroot is a root file system (e.g. a whole Linux distribution) inside your host’s root file system, in which a program runs.
The idea is that a chrooted application cannot access any file outside its chroot.
Obviously, you then have to provide (duplicate) all the tools, libraries and applications that will be needed inside the chroot file system.
Tools like debootstrap help with installing a Debian system into a directory or image.
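When you only need a handful of binaries rather than a whole distribution, copying them by hand quickly turns into chasing shared libraries, which is the kind of job tools like makejail automate. Below is a minimal Python sketch of that idea, assuming a glibc system where ldd prints lines such as "libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)". The helper names shared_libs, copy_into_chroot and populate_chroot are hypothetical. Note that the ldd(1) manual warns against running ldd on untrusted executables.

```python
import os
import shutil
import subprocess


def shared_libs(ldd_output):
    """Extract absolute library paths from the text that ldd prints."""
    paths = []
    for line in ldd_output.splitlines():
        entry = line.strip()
        # "libc.so.6 => /lib/.../libc.so.6 (0x...)": keep the part after "=>"
        if "=>" in entry:
            entry = entry.split("=>", 1)[1].strip()
        # drop the load address "(0x...)"
        path = entry.split(" (", 1)[0]
        # vdso entries and "not found" libraries carry no path; skip them
        if path.startswith("/"):
            paths.append(path)
    return paths


def copy_into_chroot(root, paths):
    """Copy each absolute path to the same location below `root`."""
    for path in paths:
        dest = os.path.join(root, path.lstrip("/"))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(path, dest)  # follows symlinks, copies the real file


def populate_chroot(root, binary):
    """Copy an absolute `binary` path plus the libraries ldd reports for it."""
    # check=True raises CalledProcessError, e.g. for static binaries
    ldd = subprocess.run(["ldd", binary], capture_output=True, text=True, check=True)
    copy_into_chroot(root, [binary] + shared_libs(ldd.stdout))
```

For example, populate_chroot("/srv/jail", "/bin/sh") would copy the shell and its libraries below /srv/jail. Data files, device nodes and anything loaded at runtime via dlopen() are not covered and still need the strace treatment.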
A chroot does not restrict access to the network, to syscalls, or to hardware devices such as USB.
For example, the root user inside a chroot can easily read a hard disk device directly, or mount it again inside the chroot and do whatever they please.
Links to some great introductions to cgroups I came across
The Red Hat blog has some very good hands-on introductory articles on cgroups, titled „World domination with Cgroups“. I suggest you read them:
Restrict services using systemd unit files
Sometimes you just want to contain a single executable. In that case you might not need a full machine: you can chroot a service (RootDirectory=) and restrict it using just its .service file.
My favorite: SystemCallFilter! It filters which syscalls a process is allowed to use.
For example, you could run a program (or a bunch of programs) with strace -c -- <cmd>, adding -f to also follow child processes, and then put the resulting list of syscalls into the systemd unit file.
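Turning that summary into a unit-file line is mostly text processing. The following Python sketch assumes the default strace -c table layout (syscall name in the last column, a final "total" row) and that the summary was written to a file with -o; syscalls_from_summary and system_call_filter are made-up names. Keep in mind that one run only records the code paths it actually exercised.

```python
def syscalls_from_summary(summary):
    """Return the syscall names listed in a strace -c summary table."""
    names = set()
    for line in summary.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            float(fields[0])  # data rows start with the "% time" number
        except ValueError:
            continue  # header ("% time ...") and separator ("------") rows
        if fields[-1] != "total":
            names.add(fields[-1])
    return sorted(names)


def system_call_filter(summary):
    """Render an allow-list directive for the [Service] section."""
    return "SystemCallFilter=" + " ".join(syscalls_from_summary(summary))
```

Something like print(system_call_filter(open("summary.txt").read())) prints a line to paste into the [Service] section. Test the service thoroughly afterwards: by default a process that calls a syscall outside the list is terminated with SIGSYS.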
Other extremely useful directives you should read up on:
ProtectHome=read-only (or: true)
ProtectKernelTunables=yes
ProtectSystem=strict (or: full, true)
ProtectControlGroups=yes
PrivateDevices=true
#DevicePolicy=closed
#DeviceAllow=/dev/urandom r
#DeviceAllow=/dev/random r
NoNewPrivileges=true
ReadWritePaths=
ReadOnlyPaths=
InaccessiblePaths=
PrivateNetwork=
MountFlags=
AmbientCapabilities=
CapabilityBoundingSet=
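To see which of these directives a given service already sets, a small checker is enough. This is a hedged Python sketch: it assumes a plain unit file without backslash line continuations or drop-in overrides, and RECOMMENDED, service_settings and missing_hardening are hypothetical names. systemd itself ships a far more thorough check, systemd-analyze security <unit>.

```python
# Hypothetical selection of directives from the list above
RECOMMENDED = ("ProtectHome", "ProtectKernelTunables", "ProtectSystem",
               "ProtectControlGroups", "PrivateDevices", "NoNewPrivileges",
               "CapabilityBoundingSet")


def service_settings(unit_text):
    """Collect key=value pairs of the [Service] section (repeated keys kept)."""
    settings, section = {}, None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == "Service" and "=" in line:
            key, value = line.split("=", 1)
            settings.setdefault(key.strip(), []).append(value.strip())
    return settings


def missing_hardening(unit_text, wanted=RECOMMENDED):
    """List the wanted directives that the [Service] section does not set."""
    settings = service_settings(unit_text)
    return [key for key in wanted if key not in settings]
```

Repeated keys such as DeviceAllow= are kept as lists, because systemd accepts them several times.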
What about containers as a replacement for virtual machines? Enter systemd-nspawn
systemd-nspawn is systemd’s tool for running containers the way you would run a virtual machine. Accordingly, nspawn containers are called „machines“ and are managed with the tool machinectl. In fact, machinectl interfaces not only with systemd-nspawn but also with rkt, docker, libvirt and others, providing a convenient place to manage all your running virtualisations at once.
Putting our application into a container
Easy! Create a chroot under /var/lib/machines/examplecontainer/ (e.g. with debootstrap) and a permission/network/container configuration file /etc/systemd/nspawn/examplecontainer.nspawn such as the one below. Then boot it using machinectl start examplecontainer. Done.
[Exec]
Boot=yes
PrivateUsers=no

[Network]
## put the container (together with other containers)
## into a virtual private network zone called "examplevlan";
## systemd-nspawn attaches it to the bridge device vz-examplevlan
Zone=examplevlan