
Isolate workloads from Docker and Kubernetes with Kata Containers
Sealed Off
If DevOps adherents want to package and run applications separately, as the paradigm of continuous delivery dictates, they encounter a problem. Either they host their workloads on physical or virtualized servers in the old-fashioned way, or they entrust data and programs to the kind of containers that have recently become popular under the Docker label. Both approaches have advantages and disadvantages.
The advantage of physical servers is the high degree of isolation. In the case of a bare metal device, this is immediately understandable; for a virtual machine (VM), the hypervisor handles separation with the help of the processor's virtualization features. It's a peculiar twist of fate that the Meltdown and Spectre vulnerabilities have shown that even hardware is not infallible, but disregarding this unpleasant chapter on the limits of engineering skill, both servers and VMs isolate workloads effectively, because apart from the CPU they share no resources.
In contrast, all containers running on one host use the same kernel. When the individual workloads are small and plentiful, the savings potential is considerable, particularly with modern cloud-native applications that run as microservices.
Isolation of Resources
Data isolation is handled by a relatively new mechanism in the Linux kernel. Namespaces keep a record within the kernel of which process is allowed to access which resources (e.g., processes, network interfaces, or hard disk directories). Docker builds on these namespaces and controls them, along with a few other mechanisms, from a convenient command-line tool.
DevOps teams thus effectively have the choice between heavyweight VMs that take a long time to boot and lightweight containers that rely on cleverly implemented, but limited, kernel-based isolation.
Starting from this situation, several initiatives have tried to combine the two approaches. The startup Hyper.sh developed the runV project as a by-product of its HyperContainer technology; runV can be understood as a very lightweight, hypervisor-based run time [1]. Intel has taken a similar direction with its Clear Containers project [2], which uses the VT-x technology built into most modern Intel CPUs.
Under the umbrella of the OpenStack Foundation, which for some time now has not dealt exclusively with the widespread cloud software or other infrastructure projects, developers have brought together the ingredients from Hyper.sh and Intel and created the Kata Containers project under the Apache license [3].
In May 2018, the project team released version 1.0; now, version 1.5-rc2 is available for download and promotes the software with the slogan "The speed of containers, the security of VMs."
Compatible Run-Time Environment
Kata Containers avoids a new application model and jumps on the Docker bandwagon. Under pressure from competitors and the community, Docker Inc., the company behind the software of the same name, extracted the run time from Docker some time ago and handed it over to the Open Container Initiative (OCI), which operates under the auspices of the Linux Foundation. The OCI reference implementation runC is the default run time in Docker.
In addition to runC, a number of alternatives exist, such as CRI-O (originally known as OCID), developed by Red Hat, or rkt (say "rocket"), originally driven by CoreOS. Not all run times fully meet the OCI specification, but they use conceptually similar techniques. The real trick with Kata Containers is that a hypervisor-based run time is available that differs only under the hood from the classic runC in Docker. The Docker commands, their meaning, and even the image formats and command-line parameters remain the same.
For this to work, a few prerequisites must be met: Kata Containers only works on the x86 platform and requires the presence of the VT-x function. The system on which Kata Containers runs must allow it to launch its own hypervisor, which may well be an issue on virtualized hosts – for example, if they run as VMs in a public cloud and nested virtualization is not enabled there.
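Before installing anything, you can check whether the CPU exposes hardware virtualization at all. This is a minimal sketch; the flag names are the CPU-vendor-specific part:

```shell
# Count the CPU threads that advertise hardware virtualization
# (vmx = Intel VT-x, svm = AMD-V). grep -c exits nonzero when the
# count is 0, hence the || true; a count of 0 means Kata Containers
# cannot launch its VMs on this machine.
grep -cE 'vmx|svm' /proc/cpuinfo || true
```

The kata-runtime package discussed below also ships a kata-check subcommand that performs a more thorough version of this test.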
Nested Virtualization
If you have access to the host system for your own virtualization setup, you can run

cat /sys/module/kvm_*/parameters/nested

to check whether this function is enabled. If so, the command displays Y, and N otherwise. The shell wildcard is necessary because either one of the kernel modules – kvm_intel or kvm_amd – must be loaded to provide the function. The hypervisor must additionally have mode=host-model set in the <cpu> tag of the XML configuration.
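The check can be wrapped in a small helper. This is only a sketch: the directory argument is not part of any standard tool but makes the function testable on machines without KVM, and the 1 case covers kvm_amd, which reports 1/0 where kvm_intel reports Y/N:

```shell
# Report whether KVM nested virtualization is enabled. The optional
# directory argument defaults to the real sysfs location and exists
# only so the function can be exercised without loaded KVM modules.
nested_status() {
  dir=${1:-/sys/module}
  for f in "$dir"/kvm_*/parameters/nested; do
    [ -e "$f" ] || continue   # wildcard unmatched: no KVM module loaded
    case $(cat "$f") in
      Y|1) echo "nested virtualization: enabled" ;;
      *)   echo "nested virtualization: disabled" ;;
    esac
  done
}

# On a real virtualization host:
#   nested_status
```

In a libvirt setup, the mode=host-model requirement corresponds to a `<cpu mode='host-model'/>` element in the domain XML, which can be edited with virsh edit.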
If you do not have access to the host system and nested virtualization is disabled, you might be able to use a bare metal service that some clouds offer. From a security point of view, this may be a good idea anyway, because cloud customers can then be sure that no neighbors in the cloud will have access to their data, even in the event of incidents such as Meltdown or Spectre. Nevertheless, this approach is only suitable for really sensitive data, because the shared economy advantages of the cloud are of course forfeit with this kind of server.
Docker Tuning
Now that the prerequisites have been met, the container framework needs to be equipped with the new run time. Of course, the classic Docker first has to be installed. Listing 1 shows how to install the software on Ubuntu 18.04 LTS from the vendor repositories. The script should be called by a normal user, who uses sudo later on to escalate privileges for the install.
Listing 1: Installing Docker
#!/bin/sh
echo "Configure Docker repo ..."
sudo -E apt-get -y install apt-transport-https ca-certificates wget software-properties-common
curl -sL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
arch=$(dpkg --print-architecture)
sudo -E add-apt-repository "deb [arch=${arch}] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo -E apt-get update
echo "Install Docker ..."
sudo -E apt-get -y install docker-ce
echo "Enable user 'ubuntu' to use Docker ..."
sudo usermod -aG docker ubuntu
Conscientious admins will want to compare the GPG checksum output from the first curl command with the reference from the Docker website. The script grants the ubuntu user the right to use Docker from that point on, which indirectly corresponds to assigning root privileges. If you want other users to be able to use Docker, they also need to be included in the docker group; then, the users have to log off and back on again or update their group memberships with newgrp -.
If docker info does not report any errors, the container software is installed with the classic runC run time. Kata Containers can be installed in a similar way: The major distributions have installation packages, available from the openSUSE Build Service.
Listing 2 shows the installation steps. Again you need to compare the GPG signature with the information on the Kata website to be on the safe side.
Listing 2: Installing Kata Containers
#!/bin/sh
echo "Configure Kata repo ..."
sudo sh -c "echo 'deb http://download.opensuse.org/repositories/home:/katacontainers:/release/xUbuntu_$(lsb_release -rs)/ /' > /etc/apt/sources.list.d/kata-containers.list"
curl -sL http://download.opensuse.org/repositories/home:/katacontainers:/release/xUbuntu_$(lsb_release -rs)/Release.key | sudo apt-key add -
sudo -E apt-get update
echo "Install Kata components ..."
sudo -E apt-get -y install kata-runtime kata-proxy kata-shim
echo "Create new systemd unit ..."
sudo mkdir -p /etc/systemd/system/docker.service.d/
cat <<EOF | sudo tee /etc/systemd/system/docker.service.d/kata-containers.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -D --add-runtime kata-runtime=/usr/bin/kata-runtime --default-runtime=kata-runtime
EOF
echo "Restart Docker with new OCI driver ..."
sudo systemctl daemon-reload
sudo systemctl restart docker
Kata Containers comes in three packages: kata-runtime, kata-proxy, and kata-shim (Figure 1). The first handles communication with the container manager; kata-proxy translates commands so that a VM also understands them; and kata-shim is an intermediate layer that is responsible for many of the convenience features built into Docker, including, for example, managing standard output as a logging mechanism or passing on signals (e.g., docker stop or docker kill).
![An overview of the functional principle of Kata Containers (Kata Containers, CC BY 4.0 [4]).](images/infografik-kata.png)
Everything necessary for the use of Kata Containers is now present; only the Docker subsystem has to be restarted, which can be done quite brutally by rebooting or in a targeted way via the system daemon (i.e., systemd in almost all current distributions). The lines in Listing 2 after [Service] add a user-defined drop-in unit that overrides the run-time default with kata-runtime. The script then restarts the Docker unit in its last two lines.
To check whether the new run time is set up, simply enter:
# docker info | grep Runtime
Runtimes: kata-runtime runc
Default Runtime: kata-runtime
If you now start a new container (e.g., with docker run -it ubuntu), it runs under Kata Containers. If you want to start a container with the classic run time again,

docker run -it --runtime runc ubuntu

is all it takes.
When the runC driver starts a new container, it prepares a directory from the image, creates an overlay filesystem, creates a new process, and applies new namespaces. The new container then starts up.
Kata, on the other hand, boots a separate kernel for each container. For this purpose, the installation packages contain a kernel and a VM image, both of which are reduced to the max and ultimately start the kata-agent process in the new VM, which in turn uses CMD or ENTRYPOINT to run the desired command defined in the image.
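A quick way to see this isolation is to compare the kernel release inside a container with the one on the host. The helper below is only an illustrative sketch; the docker calls in the comment assume a host with both run times installed:

```shell
# Classify a container's kernel relative to the host: with runC both
# `uname -r` strings match (shared kernel); a Kata container boots its
# own guest kernel, so the strings normally differ.
kernel_isolation() {
  if [ "$1" = "$2" ]; then
    echo "shared host kernel"
  else
    echo "separate guest kernel"
  fi
}

# Hypothetical session on a host with both run times:
#   host=$(uname -r)
#   kernel_isolation "$host" "$(docker run --rm --runtime runc ubuntu uname -r)"
#   kernel_isolation "$host" "$(docker run --rm --runtime kata-runtime ubuntu uname -r)"
```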
Performance Comparison
Startup performance is remarkably fast. When I called dmesg in a Kata container while writing this article, it showed that kernel version 4.14.51 had started in significantly less than one second.
I ran the test on the bare metal flavor physical.o2.medium on the Open Telekom Cloud, which provides two 8-core Broadwell EP Xeon E5-2667 v4 CPUs at 3.2GHz and 256GB of RAM. This quite decent server can manage many containers, which reduces rounding effects during the measurement.
Important system parameters are memory requirements and CPU time. To compare how Kata Containers performs against the runC run time, the script in Listing 3 starts 100 containers from the nginx image and stores a short, individual, static file in each, containing the number of the respective container. The time required for this is measured by the script. The nginx image should be loaded into the cache in advance, so a test run is recommended before the actual measurement to warm up both the image cache and the Linux cache.
Listing 3: Benchmark
#!/bin/bash
N=100
time for i in $(seq 1 $N); do
  CID=$(docker run --name server-$i -d nginx)
  docker exec server-$i /bin/sh -c "echo I am number $i > /usr/share/nginx/html/index.html"
done
# Check every container once:
time for i in $(seq 1 $N); do
  IP=$(docker inspect --format '{{.NetworkSettings.IPAddress}}' server-$i)
  curl http://${IP}/ &
done
After starting the containers, the script can deliver the stored file from each instance over HTTP to prove that the services in the containers really are available. The time is also measured by the script. If you add both measured values and divide them by the number of containers, you get an average provisioning time per web server in one container.
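The arithmetic behind this average can be written down directly. The numbers in the example call are placeholders, not measurements from this article:

```shell
# Average provisioning time per container: the sum of the two measured
# wall-clock times (start phase and check phase) divided by the number
# of containers started.
avg_time() {
  awk -v start="$1" -v check="$2" -v n="$3" \
      'BEGIN { printf "%.2f\n", (start + check) / n }'
}

avg_time 120 40 100   # hypothetical: 120s starting, 40s checking, 100 containers -> 1.60
```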
For the Kata Containers, this was 1.6 seconds. The classic runC run time needs 0.6 seconds for the same task, and this value is stable even if the number of passes, which can be configured in the script by way of the N variable, varies.
Upper limits for N are about 1,000 containers, because more than 1,024 IP addresses will not work with the virtual bridge interface that Docker uses. In the case of Kata Containers, 700 instances are the end of the line, because their lightweight VMs with 160MB of RAM have a far larger main memory requirement than a normal Linux process – no wonder, since each Kata container starts its own kernel.
If you want to recreate such a test yourself, you should do so on a dedicated system because it requires considerable resources. However, the load and the response behavior were no problem. Afterward, the containers are best discarded with a similar script.
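Such a cleanup script could look like the following sketch, which reuses the server-$i naming pattern from Listing 3:

```shell
# Print the container names that the benchmark in Listing 3 created
# (server-1 through server-N).
benchmark_names() {
  for i in $(seq 1 "$1"); do
    echo "server-$i"
  done
}

# Remove all benchmark containers in one go (requires Docker):
#   benchmark_names 100 | xargs -r docker rm -f
```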
Special Cases
Installing Kata Containers is easy. Even if you already have a Docker environment in use, you can simply extend it with the additional run time and select it as required with the --runtime
option. If you change the default permanently, all new containers automatically run on VMs.
Whether this makes sense for usual operations is a different matter. The additional memory requirement is noticeable, but the slightly longer startup time will only have an effect in very highly frequented microservice architectures. From a security point of view, VM isolation of course offers a completely different level of protection from that provided by namespaces and the like.
Kata Containers in the Docker framework is a good alternative when an additional isolation layer has to be introduced, which can be useful, for example, when processing particularly sensitive personal or business data.