Lead Image © Kirsty Pargeter, 123RF.com

Layer 3 SDN

Package Delivery

Calico chooses an unusual approach for software-defined networking, relying on open standards like BGP. We look at the distinctions and advantages of Calico. By Martin Loschwitz

In the past, duties in the data center were strictly separated: system admins took care of the Linux systems, storage admins of the network storage, and network admins of the routers and switches. For individual customers, specific configurations were built on the data center's own hardware; everything was set up once, and the job was done.

Recently, however, responsibilities have shifted considerably – largely because of the cloud, which has made IT services far more dynamic. Customers no longer want to wait weeks or months for a custom infrastructure to be set up; they want storage and compute immediately. As a result, IT providers have become platform operators. Instead of customer-specific setups, a generic platform of compute and storage resources greets customers, who then consume bit by bit according to their needs.

Now, the network magic happens on the customer side with software-defined networking (SDN). Calico [1] steps in on the software side to replace the functions of classic network devices in virtual environments. Admittedly, the project is not at the forefront of the SDN movement. Other approaches (e.g., Open vSwitch) claim seniority. However, Calico is fundamentally different from most other solutions: It comes without an encapsulation technique and relies instead on standard Layer 3 protocols.

Calico makes some bold claims. The developers of the solution promise special security and native kernel speed on Linux systems, which is reason enough to put the product through its paces.

SDNs

A short excursion into the terms and functions of SDN helps to understand the central unique selling proposition behind Calico. SDN is no longer a homogeneous approach, and different implementations compete for admins' favor. The core idea of SDN is to shift network functions from typical network hardware to the software level and thus to individual systems.

One example is the separation of traffic for different systems. Virtual LANs (VLANs) are used for this purpose on switches; they implement the division in Layer 2. SDN is therefore basically making up for a part of the virtualization that was missing until the first SDN concepts emerged. RAM and CPUs had long been virtualized when physical networks were still routed into virtual machines over bridge interfaces.

In large environments, such as those operated by platform providers, this approach is not useful. It would simply be impossible to reconfigure all the switches in the system to create a new customer and that customer's virtual networks. Instead, in a completely flat Layer 2 segment, the SDN solution takes care of the problem.

Underlay and Overlay

Conventional SDN implementations are divided into an underlay and an overlay. The underlay is the Layer 2 segment referred to earlier, whereas the overlay describes the virtual environment within which the packets with payload traffic find their way.

Almost all common SDN approaches use encapsulation solutions in the underlay, such as generic routing encapsulation (GRE), virtual extensible LAN (VXLAN), or generic network virtualization encapsulation (Geneve), to separate data traffic in the overlay from that in the underlay. Encapsulation is the process by which a packet finds its way from virtual machine (VM) A on host 1 to VM B on host 2, but it is precisely this encapsulation that regularly causes trouble in the context of SDN. In theory, encapsulation should not have any effect on performance, but in practice, it has a significant effect.
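
To put a rough number on one part of that cost, the following sketch estimates how much of each packet is taken up by VXLAN headers. The header sizes are the usual figures for VXLAN over an IPv4 underlay without an inner VLAN tag; the 1500-byte MTU and the function names are assumptions chosen for illustration. The calculation covers header bytes only, not the CPU cost of wrapping and unwrapping packets.

```python
# Back-of-the-envelope estimate of VXLAN header overhead (IPv4 underlay).
# The inner Ethernet header travels inside the tunnel, so it also counts
# against the underlay MTU.
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14


def inner_mtu(underlay_mtu: int) -> int:
    """Largest IP packet a VM can send without fragmenting in the underlay."""
    headers = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET
    return underlay_mtu - headers


def overhead_percent(underlay_mtu: int) -> float:
    """Share of the underlay MTU consumed by encapsulation headers."""
    return 100 * (underlay_mtu - inner_mtu(underlay_mtu)) / underlay_mtu


if __name__ == "__main__":
    print(inner_mtu(1500))                    # 1450
    print(round(overhead_percent(1500), 2))   # 3.33
```

A routed Layer 3 design without encapsulation leaves the full MTU to the workload.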

One well-known example is Open vSwitch, earlier versions of which built a kind of packet Multitron and were happy to drown hosts in Address Resolution Protocol (ARP) requests. In the meantime, such weaknesses have been eliminated by the developers, but the separation into overlay and underlay, including the encapsulation that goes along with it, still gets many admins' backs up.

On Layer 3

Calico promises to get rid of this separation. Instead, it relies completely on Layer 3, by employing a network concept that has been around for some time but has hardly established itself: routing in Layer 3.

To recap: Layer 2, the Data Link Layer, has, among other things, the task of switching frames within a physical network. It addresses the communication partners by the Media Access Control (MAC) addresses of their network interface cards. Layer 3, the Network Layer, on the other hand, takes care of packet forwarding across the boundaries of local networks. This layer is the home of the Internet Protocol, whose addresses ensure that certain network nodes can be addressed worldwide on completely different networks. The device for forwarding in Layer 2 is the simple switch, and in Layer 3 the router.

When this division of labor was introduced into the Open Systems Interconnection (OSI) model, however, huge Layer 2 networks, as found today in clouds with thousands of servers, were not yet conceivable. Inventions that were helpful at the time, such as the Spanning Tree Protocol, are now a problem. One way out is Layer 3 routing, which determines the path of a packet from host to host solely on the basis of IP addresses.

The engine that drives such a setup is the Border Gateway Protocol (BGP). Every router that speaks BGP knows the routes to all other hosts in the network. A router no longer needs to be a separate box. Services such as Quagga, BIRD, and FRRouting (FRR) can run on any Linux host, and now every standard switch from major vendors also supports BGP.

Every Linux server mutates into a small router; accordingly, every server on the network knows the route to every other host. The relevance of the physical infrastructure is in fact dwindling. Because all traffic is routed in Layer 3, physical network boundaries no longer matter.
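
How a host-turned-router picks the next hop can be sketched as a longest-prefix match over its routing table. The example uses Python's standard ipaddress module; the prefixes, the next-hop addresses, and the next_hop helper are made up for illustration and do not reflect how BIRD or the Linux kernel store routes internally.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table as BGP might have populated it on one host:
# each prefix maps to the next hop that announced it.
ROUTES = {
    ipaddress.ip_network("10.0.0.0/8"): "192.0.2.1",     # aggregate route
    ipaddress.ip_network("10.1.2.0/24"): "192.0.2.20",   # workloads on host 2
    ipaddress.ip_network("10.1.2.7/32"): "192.0.2.30",   # one workload that moved
}


def next_hop(destination: str) -> Optional[str]:
    """Return the next hop of the most specific matching route, if any."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


if __name__ == "__main__":
    print(next_hop("10.1.2.7"))     # 192.0.2.30
    print(next_hop("10.1.2.8"))     # 192.0.2.20
    print(next_hop("10.9.9.9"))     # 192.0.2.1
    print(next_hop("172.16.0.1"))   # None
```

The /32 entry shows why physical location stops mattering: a single workload address can be announced from whichever host currently runs it, and the most specific route wins.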

Thinking Ahead

In the meantime, several solutions use or combine parts of the concept. The Ethernet virtual private network (EVPN), for example, is also based on BGP but shifts the routing completely to the switch level. The individual hosts no longer need BGP daemons.

Even genuine Layer 3 routing up to the host has ready-made instructions and solutions: The hosts then use the unnumbered BGP extension, meaning not every host needs an internal autonomous system number (ASN). The configuration is largely automatic. Routing on the host, however, requires the SDN solution used by the particular environment to provide support.

Calico expands on this principle. Basically, Calico sees itself as an SDN solution that implements the classic SDN overlay – not in the form of encapsulation, but by BGP. Specifically, Calico docks with the respective virtualization solution on one side (e.g., to OpenStack (Figure 1) or Kubernetes (Figure 2)), and on the other side, it speaks BGP to connect the instances in the virtual environment.

Figure 1: As an SDN solution, Calico supports both OpenStack … © OpenStack, Apache-2.0 [2]

Figure 2: … and Kubernetes, plus several other solutions. © Kubernetes, CC BY 4.0 [3]

Security plays a prominent role in this context. Because traffic at the Layer 2 level cannot be isolated (encapsulation is not used), Calico has to solve this problem differently. The developers strictly adhere to the requirement of using existing technology to the extent possible. Accordingly, they fall back on the filter mechanisms that Linux includes out of the box: network namespaces, groups, and packet filters.

Calico Architecture

Calico now faces a common challenge in the cloud: The configuration is not where it is needed. Most cloud concepts assume the existence of controllers that contain the statuses of all resources in the environment. However, the virtual instances, whether containers or real VMs, run on separate systems, where the configuration stored on the controllers must somehow be put into practice.

Calico is accordingly based on a control plane (on the controllers) and agent (on the target systems) architecture. Several plugins are at hand for external solutions such as OpenStack or Kubernetes.

Central Datastore

The Calico developers require their software, as an application of the cloud age, to follow the common maxims of modern development. Therefore, Calico stores its configuration data in a central database.

Data storage is of great importance in a distributed solution, providing a single source of truth (i.e., an instance in the cluster whose database is considered correct and binding for all cluster-wide actions). The Calico developers could have reinvented the wheel at this point, but that would not have made much sense. The best-known implementations of such stores include Consul and Etcd; the latter is one of the two datastore variants that the Calico developers have decided to support. If desired, Calico stores its complete configuration in Etcd.

Etcd has several advantages. First, it has proven robust: It relies on a consensus algorithm for distributed systems (Raft) and copes with the failure of individual nodes. Second, Etcd itself is distributed: The service is written in Go, does not require much in terms of resources, and runs on the hosts in the environment.
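
The tolerance for node failures follows from majority voting: A cluster that decides by majority stays writable as long as more than half of its members can reach each other. A small sketch of that arithmetic follows; the function names are illustrative and not part of any Etcd API.

```python
def quorum(members: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Nodes that may fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(n, quorum(n), tolerated_failures(n))
```

An even number of members adds no extra fault tolerance: Four nodes survive one failure, just like three.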

Because a configuration change in Etcd is automatically propagated to all other Etcd instances, the complete configuration database with all relevant parameters is always available on each host, saving the agent on the target systems a great deal of network traffic for queries to a central database like MySQL to retrieve values. The connection to Etcd is particularly suitable for setups with conventional virtualizers like OpenStack or without a central control mechanism.

Confd as an Add-On

Calico comes with a specially tuned version of Confd, which docks with Etcd in the background and fetches configuration data there. From the data, it generates configuration files on the respective systems for those services that cannot retrieve their configurations directly from Etcd. In fact, Confd replaces a part of the functionality of Puppet, Ansible, and the like, but it is far more dynamic.

Moreover, the configuration of an individual system or all systems can change continuously in Calico. If you wanted to map this manually, you would have to use solutions such as the inotify API to monitor the relevant configuration files, rewrite them, and then send a SIGHUP to the service that uses them. Confd in Calico relieves you of all these tasks.
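
The chore that Confd automates can be sketched roughly as follows: Render a configuration file from key-value data, rewrite it only when the content changes, and only then signal the service. This is a simplified illustration, not Confd's actual implementation; the keys, the template, and the render_if_changed helper are hypothetical, and a plain dictionary stands in for Etcd.

```python
import os
import signal
import tempfile
from pathlib import Path
from string import Template
from typing import Mapping, Optional

# Hypothetical template for a routing daemon; the keys are invented.
TEMPLATE = Template("router id $router_id;\nlocal as $asn;\n")


def render_if_changed(values: Mapping[str, str], target: Path,
                      pid: Optional[int] = None) -> bool:
    """Write the rendered config if it differs; return True on change."""
    rendered = TEMPLATE.substitute(values)
    if target.exists() and target.read_text() == rendered:
        return False  # nothing to do, so the service is left alone
    # Write to a temporary file and rename, so readers never see half a file.
    tmp = target.with_suffix(".tmp")
    tmp.write_text(rendered)
    os.replace(tmp, target)
    if pid is not None:
        os.kill(pid, signal.SIGHUP)  # ask the service to reload its config
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        conf = Path(workdir) / "daemon.conf"
        data = {"router_id": "192.0.2.10", "asn": "64512"}
        print(render_if_changed(data, conf))   # True: file created
        print(render_if_changed(data, conf))   # False: unchanged
        data["asn"] = "64513"
        print(render_if_changed(data, conf))   # True: value changed
```

Skipping the write when nothing changed matters in a dynamic environment, because every unnecessary reload is a chance to disturb a running service.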

Variant B: Kubernetes

Although Calico is also explicitly designed for Kubernetes, the container manager comes with its own controller services, which already include the required configuration data. Here, the Calico developers have built the connection to the datastore in a modular way. Instead of the module for Etcd, Calico can load a module for Kubernetes and communicate directly with the Kubernetes API.

If you want to supply many Kubernetes instances with Calico, Typha lets you scale an instance of Calico, including the datastore, horizontally. Kubernetes then only talks indirectly to Calico through Typha, which acts as a kind of cache. This architecture means that you can use a Calico instance to populate huge environments on a Kubernetes basis without sacrificing the benefits of a single point of administration.

Intelligence in the Controller

Calico's controller is of central importance. A conceptual distinction is necessary: The datastore contains information on the configuration – but essentially on the configuration that must ultimately be implemented on the target systems.

The Calico controller is responsible for translating this configuration into Calico-speak. Only then is the interaction of interfaces, namespaces, iptables, and the BGP daemon that the Calico configuration requires created on the target systems. The control plane therefore reads the desired configuration on the one hand and converts it into a specific system configuration on the other.

The counterpart to the Calico API on the target systems is Felix, the Calico agent. It receives instructions from the Calico control plane and creates the corresponding idempotent configuration on the system. As long as the system is stored with the same configuration in the control plane, Felix reliably creates the same configuration on the server and defends it against any external intervention, if necessary.
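
The idempotent behavior described here resembles a reconciliation loop: Compare the desired state with what is actually on the host and apply only the difference. The sketch below models rules as plain strings in a set; it illustrates the pattern under that assumption and is not Felix's code. The rule strings and the function names are invented.

```python
from typing import Set, Tuple


def reconcile(desired: Set[str], actual: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (rules to add, rules to remove) to turn actual into desired."""
    return desired - actual, actual - desired


def converge(desired: Set[str], actual: Set[str]) -> Set[str]:
    """Apply the difference and return the resulting state."""
    to_add, to_remove = reconcile(desired, actual)
    return (actual | to_add) - to_remove


if __name__ == "__main__":
    desired = {"allow tcp 80", "allow tcp 443"}
    # Someone added a rule by hand and deleted one that should exist.
    actual = {"allow tcp 80", "allow tcp 22"}

    state = converge(desired, actual)
    print(state == desired)            # True
    print(reconcile(desired, state))   # (set(), set())
```

Running the loop a second time finds nothing to do, which is what makes the configuration idempotent; a manual change on the host simply shows up as a difference and is reverted on the next pass.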

BIRD and Iptables

Felix handles several jobs on the target systems. Of course, for real routing on the host, you need a BGP daemon. The developers chose BIRD, which is a tad more modern than Quagga and implements the BGP standard along with some extensions. Additionally, Felix configures the iptables packet filter with its built-in policy engine.

The Calico website explicitly advertises that Calico works with a variety of fleet virtualizers. Like almost everywhere in the hip IT world, the primary focus of the Calico project lately has been on Kubernetes. Kubernetes is given much more space in the Calico documentation than deployment on other platforms, and the components in Calico speak a clear language that explicitly targets its use with Kubernetes.

Accordingly, Calico performs well in the Kubernetes context. In particular, it implements a Container Network Interface (CNI) plugin that lets containers obtain network interfaces according to defined criteria.

Calico as a Profiler

A core aspect of Calico is establishing a connection between several communication partners. A second, very important complementary topic is security. Because Calico has no separation into overlay and underlay, even the basic encapsulation techniques for separating traffic of different origins are missing.

Everything takes the same paths over the same network interfaces. To secure the connection, the on-board toolset available in Linux must be used accordingly. Here, the developers have come up with something clever: Calico uses profiles (i.e., previously created standard configurations for certain security settings) that are deployed as needed according to circumstances (Figure 3).

Figure 3: Calico implements its security guidelines in profiles and cooperates directly with solutions like Kubernetes.

A simple example might be a Calico profile for a web server running in a container. Ports like 80 and 443 would be allowed, but no other ports would be open. The approach also reveals its full potential when Calico adheres to the security requirements of the orchestrator with which it works. For example, Kubernetes can automatically tell Calico what type of container is being deployed, and Calico then automatically selects the appropriate profile from this information.
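
The web server example could be modeled as follows. This is a conceptual sketch in Python, not Calico's policy format: The Profile class, the label names, and the selection logic are assumptions made up to show the idea of choosing a profile from orchestrator metadata and denying everything that is not explicitly allowed.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Profile:
    name: str
    allowed_tcp_ports: FrozenSet[int]

    def permits(self, port: int) -> bool:
        """Default deny: only explicitly listed ports are allowed."""
        return port in self.allowed_tcp_ports


# Hypothetical profiles keyed by a workload label the orchestrator supplies.
PROFILES: Dict[str, Profile] = {
    "webserver": Profile("webserver", frozenset({80, 443})),
    "database": Profile("database", frozenset({5432})),
}


def select_profile(labels: Dict[str, str]) -> Optional[Profile]:
    """Pick a profile from the workload's 'role' label, if one is defined."""
    return PROFILES.get(labels.get("role", ""))


if __name__ == "__main__":
    profile = select_profile({"role": "webserver"})
    print(profile.permits(443))   # True
    print(profile.permits(22))    # False
```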

Automatic Encryption

In the security context, Calico also offers a particularly interesting feature that is only available as a Technology Preview and has not yet been released by the developers for production: automatic transport encryption.

The Calico developers are very critical of those who do not use SSL within their setups and simply assume that their network is secure and protected against intruders – with good reason: It is becoming increasingly common for attackers to infiltrate networks and sniff traffic unnoticed for months; therefore, Calico has declared war on the concept of the local network as a secure domain.

The idea is, if IP and BGP can be used to control the path that packets take between hosts anyway, transport encryption can also be integrated transparently into this process. This practice reminds me of solutions like Istio, which implements similar features in the cloud for containers. But in Calico's case, the function also extends to the physical Layer 3.

In such situations, you would no longer have to make sure your services can handle SSL and would still have continuous transport encryption on the network. Although still a dream of the future, sooner or later Calico will make the feature available in the production version.

Extended Interaction

All in all, Calico proves to be a versatile SDN solution that takes a refreshingly new approach when compared with Open vSwitch and similar solutions. Another feature makes the solution appealing: Calico offers direct links to other network solutions through its API. Istio [4], which works hand in hand with Calico, is a good example. Calico establishes the Layer 3 connection, and Istio uses it dynamically.

Migration to the cloud has made networking far more complex for more than just administrators. Admittedly, it is the admin who has to deal with SDN and the like and prepare the cloud in such a way that appropriate services are available. However, once this initial setup is in place, the admin only needs to give the topic more attention if something does not work as desired. The developer, on the other hand, has to deal with a far more complex network than ever before.

Complicated Microarchitecture

The physical cloud is accompanied by the desire to make the best use of its services. The principle of cloud-ready applications therefore dictates that programs for the cloud are always also distributed programs.

According to common doctrine, distribution can ideally be handled by a microservices architecture. Instead of monoliths, today's developers build applications that comprise various individual parts and communicate with each other over defined APIs. From the developer's point of view, it is no longer sufficient to roll out an application. Instead, developers must think about how to harden communication between the components of the application – both in terms of reliable functionality and security.

Istio specifically addresses developers with this problem and promises to build a full mesh network between the components of an application following the microservices doctrine. Load balancers, firewall rules, and routing are the key features, so it is only natural that Istio and Calico developers work in close collaboration.

Access to All Details

In practice, that means that Istio connects directly to the Calico services and retrieves from them most of the parameters it needs for its own configuration. If the network changes during operation, Istio automatically adapts according to the received data.

It stands to reason that it is a good idea to create a direct link between the SDN application in the cloud and mesh solutions like Istio. Calico and Istio impressively show us what this can look like.

Conclusions

Calico proves that the router-on-the-host approach is valid and works well, conveniently implementing networks for cloud VMs and containers without the overhead of encapsulation or other techniques. Calico can't hide its preference for Kubernetes, but it works well with other orchestrators, too. If you are looking for a versatile SDN solution well removed from Open vSwitch and the like, you would do well to take a closer look at Calico.