Lead image: Photo by Damon Lam on Unsplash

Open source multipoint VPN with VyOS

Connected Mesh

The VyOS Linux distribution puts network routing, firewall, and VPN functionality together and presents a fully working dynamic multipoint VPN router as an alternative or addition to a Cisco DMVPN mesh. By Markus Stubbig

Virtual private networks (VPNs) connect remote offices over the Internet. However, when the number of offices increases, so does the number of VPN tunnels. Scaling becomes important when connecting more than 10 offices, because many single tunnels result in a long and confusing configuration. Dynamic multipoint VPN (DMVPN) is a well-known Cisco solution that solves the scalability issue when building large VPNs.

Luckily, all DMVPN components have been open sourced. In this article, I show you how to set up a DMVPN with the VyOS Linux router distribution, which also can be used to improve, secure, or reduce the cost of an existing DMVPN network.

Intro to VPN

The collection of VPN software is large, and many implementations are open source, free of charge, and available for virtually every operating system. Usable bandwidth is much higher compared with a leased line or a multiprotocol label switching (MPLS) link at the same price, and big keys or certificates can achieve a high level of security.

This setup sounds great until it comes to scalability. Every VPN tunnel has two endpoints that need configuration – and don't forget the backup tunnel, which also needs to be prepared and tested.

When talking about six remote offices, the level of hands-on activity is acceptable. If every office needs direct communication with every other office, you would need 15 tunnels. If the business has many smaller sites (e.g., sales offices or warehouses), the configuration becomes complex, because the number of tunnels grows quadratically with the number of locations: n sites need n(n-1)/2 tunnels. A full mesh of 30 sites requires 435 tunnels and, most likely, some kind of automation or intelligent VPN solution.
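The tunnel counts quoted above follow from counting every pair of sites once. The short sketch below reproduces them; the function names are made up for this illustration, and the hub-and-spoke figure is added only for comparison.

```python
def full_mesh_tunnels(sites: int) -> int:
    """Point-to-point tunnels needed so that every pair of sites is connected."""
    if sites < 0:
        raise ValueError("sites must be non-negative")
    return sites * (sites - 1) // 2


def hub_and_spoke_tunnels(sites: int) -> int:
    """Static tunnels when every remote site only connects to one hub."""
    return max(sites - 1, 0)


for n in (6, 10, 30):
    print(f"{n} sites: {full_mesh_tunnels(n)} full-mesh tunnels, "
          f"{hub_and_spoke_tunnels(n)} hub-and-spoke tunnels")
```

The gap between the two columns is the configuration effort that a dynamic mesh is meant to take off the administrator's desk.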

Partly Meshed

In a full mesh network, every site can communicate directly with any other site. Voice over IP is a good example of an application that benefits from a full mesh wide-area network (WAN): Without the mesh, the packets would travel through a transit site, and the added delay is precisely what degrades speech quality.

DMVPN

To cut a long story short, Cisco understood the challenge and implemented DMVPN in its products years ago. The designers use generic routing encapsulation (GRE) as the tunnel mechanism and IPsec for the security aspect. The idea is to define a central site (hub) – usually the corporate headquarters – that knows all included VPN gateways in the remote sites (spokes).

Imagine that site A wants to reach site B: Router A will ask the central router for router B; with that information in hand, router A sets up a new VPN tunnel between A and B, so traffic can start flowing (Figure 1). All of this happens automatically and without anyone typing conf term.

Figure 1: In a multipoint VPN, every router creates a tunnel connection with every other router.

Cisco relies on the Next Hop Resolution Protocol (NHRP), specified in RFC 2332 [1], as the way for a router to get details about its peer. The open source community has built its own implementation, OpenNHRP, and provides the code on SourceForge [2].
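The register-and-resolve idea behind NHRP can be pictured with a toy model. The sketch below is not the OpenNHRP API and does not speak the RFC 2332 wire format; the class names, method names, and addresses are all invented for illustration (the public addresses come from documentation ranges).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Hub:
    """Toy stand-in for the hub: remembers which public IP hides behind a tunnel IP."""
    registrations: Dict[str, str] = field(default_factory=dict)

    def register(self, tunnel_ip: str, public_ip: str) -> None:
        self.registrations[tunnel_ip] = public_ip

    def resolve(self, tunnel_ip: str) -> Optional[str]:
        return self.registrations.get(tunnel_ip)


@dataclass
class Spoke:
    tunnel_ip: str
    public_ip: str
    shortcuts: Dict[str, str] = field(default_factory=dict)  # learned peers

    def join(self, hub: Hub) -> None:
        """Announce this spoke to the hub when the tunnel comes up."""
        hub.register(self.tunnel_ip, self.public_ip)

    def peer_endpoint(self, hub: Hub, target_tunnel_ip: str) -> Optional[str]:
        """Ask the hub once, then reuse the cached answer for direct traffic."""
        if target_tunnel_ip not in self.shortcuts:
            public_ip = hub.resolve(target_tunnel_ip)
            if public_ip is None:
                return None  # peer unknown to the hub
            self.shortcuts[target_tunnel_ip] = public_ip
        return self.shortcuts[target_tunnel_ip]


hub = Hub()
router_a = Spoke(tunnel_ip="172.16.0.7", public_ip="192.0.2.10")
router_b = Spoke(tunnel_ip="172.16.0.8", public_ip="198.51.100.20")
router_a.join(hub)
router_b.join(hub)
print(router_a.peer_endpoint(hub, "172.16.0.8"))
```

Once router A has the public address of router B, the hub is no longer in the data path for traffic between the two.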

Now all parts of the puzzle are freely available and usable on Linux. The Vyatta router distribution combines all pieces into the formula:

DMVPN = GRE + OpenNHRP + IPsec

The developers have even added a command-line interface (CLI) with the feel of a commercial router, completing the free DMVPN router. Unfortunately, Brocade acquired Vyatta in 2012 and put it under a commercial license. Vyatta quickly became Brocade Vyatta 5400 vRouter and is now available for a price.

VyOS

Naturally, the Linux community didn't like this strategy, and several projects emerged from the last open source version of Vyatta, most notably VyOS [3]. The fork was successful, because VyOS includes all the DMVPN components, with dynamic routing and high availability on top.

EdgeOS

Another player in the Vyatta market is Ubiquiti, which used the sources to build its own operating system. EdgeOS runs perfectly on Ubiquiti's hardware boards and resulted in the excellent EdgeRouter. Unfortunately, Ubiquiti forked Vyatta before it had the DMVPN code, although perhaps Ubiquiti will implement it in a future version.

OpenVPN

The well-known OpenVPN is also capable of multipoint tunnels. Its client-to-client and topology subnet options establish communication between two endpoints, but below the surface, the packets flow through the hub. This detour of packets increases latency and needs more bandwidth at the hub site for transit traffic.

Requirements

A large VPN cloud with a multipoint setup isn't enough, because the GRE tunnel is unencrypted. In this example, I'll use IPsec to secure the traffic and prevent anybody from eavesdropping on the data or injecting packets.

When the transport is secure, you should focus on routing. All VPN gateways must announce their connected IP subnets. Static routing is possible, but very time consuming. A better approach is dynamic routing, which also includes automatic rerouting in case of link failures. With the OSPF (open shortest path first) option, every router sends its IP information to all other routers, which then calculate and build their own routing tables (see the "Dynamic Routing: RIP vs. OSPF" box).

If DMVPN is already in place, VyOS could be a cheap backup router. Cisco prefers proprietary technology, but in this case, all components have a good chance of playing well together. If everyone sticks to the RFC, it should work. However, it is best to ensure that Cisco and VyOS integrate well when choosing additional features, the cryptographic algorithm, and protocol variants.

VyOS performs on virtually anything, so the hardware platform is up to you. If a commodity server is not at hand, several recommendations are listed below.

Lab Network

When reviewing a technology like DMVPN, the use of real Internet links is mostly impossible. Thus, the features are tested in a lab network with simulated links and virtual machines. The demonstration network here (Figure 2) contains several sites with one or two VyOS VPN routers mixed with Cisco gear to test interoperability. Even the DMVPN hub is a VyOS-Cisco pair. All routers are connected by WAN emulators running WANem [4], which can introduce link instability, delay, and packet loss, so you can monitor network quality while WANem is doing bad things to your packets.

Figure 2: The lab prototype of a corporate network with Internet access.

Availability is always a concern in corporate networks, and designers use two different DMVPN networks to meet the requirement. Sites with two Internet uplinks obtain redundancy for both cabling and the provider. Sites with a single Internet link and two routers get at least some hardware redundancy. Small field offices normally have only one router, resulting in no redundancy even when both DMVPN tunnels are configured.

Hardware

VyOS needs a hardware platform with an Intel i386/x86_64 CPU or compatible. Also, it is fully supported when running as a virtual machine on VMware or VirtualBox. However, when using embedded hardware or single-board computers, double-check the specs first.

The test scenarios and lab network put VyOS on an apu system board by the Swiss manufacturer PC Engines [5]. The board comes with three Gigabit Ethernet adapters, low power consumption, and no fans. The chipset is Realtek, so please don't take "gigabit" literally: Serious research and testing show at least 250Mbps throughput [6].

OSPF on Top

When the DMVPN connection between all peers is running smoothly, it is time for OSPF to manage the IP stuff. OSPF builds a neighbor relationship between the spoke routers and the hub router through the tunnel interface; then, the spoke tells the hub about its local IP networks. The hub starts relaying this information to all other OSPF neighbors.

Spoke routers of the same site must also know each other and build an additional OSPF neighborship over the LAN adapter. This link will be used for the traffic when one of the VPN tunnels is unavailable.

OSPF needs to know the bandwidth of the network adapters to work properly. Best practices use the exact same value on all adapters pointing to the primary DMVPN cloud. The same applies to the secondary DMVPN adapter, but the value must be smaller to indicate the lower preference. For the LAN adapter, it makes sense to use values that represent the physical speed of the interface, especially when more OSPF neighbors are present in the local area network.
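How the bandwidth value turns into a preference can be shown with the conventional OSPF cost formula, which divides a reference bandwidth (commonly 100Mbps by default) by the interface bandwidth. The sketch below assumes that convention; the bandwidth figures for the two tunnels and the LAN are made up for illustration and are not taken from the lab network.

```python
def ospf_cost(bandwidth_mbps: float, reference_mbps: float = 100.0) -> int:
    """Conventional OSPF cost: reference bandwidth / interface bandwidth, at least 1."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    return max(1, int(reference_mbps / bandwidth_mbps))


# Hypothetical values: the secondary tunnel is given a smaller bandwidth on
# purpose, so it ends up with a higher cost and is therefore less preferred.
interfaces = {
    "primary DMVPN tunnel": 10,
    "secondary DMVPN tunnel": 5,
    "LAN adapter": 1000,
}

for name, bandwidth in interfaces.items():
    print(f"{name}: bandwidth {bandwidth}Mbps -> cost {ospf_cost(bandwidth)}")
```

Because OSPF adds up the costs along a path, using the same value on every adapter of one DMVPN cloud keeps the comparison between the primary and secondary cloud predictable.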

WAN Failure

The OSPF routers have learned all IP subnets from their peers over two different paths. The primary path uses DMVPN tunnel 1 and gets its place in the routing table. The less preferred path over tunnel 2 won't be discarded: It stays in the local OSPF database and waits for a tunnel 1 outage and is then promoted in the routing table.

During normal operation, all OSPF routers send keepalive packets at regular intervals. If the main Internet link is lost (e.g., cut by a construction worker), both the local OSPF router and the DMVPN hub will learn about this situation because of the missing keepalives. In this scenario, the routers remove all routes using the unavailable neighbor from the routing table and check the OSPF database for alternatives. They are lucky, because all missing routes are present, with the backup tunnel as the destination. Finally, the routes over tunnel 2 move into the routing table, and availability to the other sites is restored.

This automatic method is hidden from the applications, but traceroute (Linux/macOS) or tracert (Windows) reveals the rerouting. Listing 1 shows a client at site 3 reaching site 4 with and without the primary VPN tunnel.

Listing 1: Solving a Single-Link Outage

# Normal state: all links in working condition
traceroute -In 10.4.1.25
 1  10.3.1.21    # primary VPN router, site 3
 2  172.16.0.8   # primary VPN router, site 4
 3  10.4.1.25    # target host in site 4
# Problem: first link broken and network has converged
traceroute -In 10.4.1.25
 1  10.3.1.22    # backup VPN router, site 3
 2  172.16.1.7   # backup VPN router, site 4
 3  10.4.1.25    # target host in site 4
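The promotion of the backup route can be modeled in a few lines. This is only a sketch of the selection logic, not of a real OSPF daemon: the Route type, the interface names tun1 and tun2, the prefix, and the cost values are assumptions chosen to mirror the two tunnels described above.

```python
from typing import FrozenSet, Iterable, NamedTuple, Optional


class Route(NamedTuple):
    prefix: str
    via: str   # outgoing tunnel interface (hypothetical names)
    cost: int


def best_route(database: Iterable[Route], prefix: str,
               down: FrozenSet[str] = frozenset()) -> Optional[Route]:
    """Pick the cheapest route to prefix whose tunnel is still up."""
    candidates = [r for r in database if r.prefix == prefix and r.via not in down]
    return min(candidates, key=lambda r: r.cost, default=None)


# Both paths stay in the database; only the winner enters the routing table.
database = [
    Route("10.4.1.0/24", "tun1", 20),  # primary DMVPN cloud
    Route("10.4.1.0/24", "tun2", 40),  # backup DMVPN cloud
]

print(best_route(database, "10.4.1.0/24"))
print(best_route(database, "10.4.1.0/24", down=frozenset({"tun1"})))
```

With both tunnels up, the first call returns the tun1 route; once tun1 is marked as down, the tun2 route takes its place without any change to the database.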

LAN Failure

If the primary tunnel fails and the LAN is also using OSPF, OSPF will tell all neighbors (Figure 3); otherwise, you need a first hop redundancy protocol, like the Hot Standby Router Protocol (HSRP), Virtual Router Redundancy Protocol (VRRP), Common Address Redundancy Protocol (CARP), or Gateway Load Balancing Protocol (GLBP). The lowest common denominator between VyOS and Cisco is VRRP.

Figure 3: A day in the life of VyOS.

VRRP uses an additional virtual IP address that is shared by the routers. The clients use this IP address as their "default gateway." To function, the VRRP routers must know which device is responsible for the virtual address. The routers elect a master and a backup. The master works on routing and sends heartbeat packets to its backup router. The backup router stays passive and listens to the heartbeat. If the packets stop arriving, it assumes the master has died and takes over the virtual address. Clients have no need to make any changes during failover or failback.

The router holding the primary VPN must win and become VRRP master. The VRRP's priority values manipulate the VRRP election and determine the correct router as master. If this doesn't happen, the routing will become asymmetric, and troubleshooting gets really messy.
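The effect of the priority values can be sketched as follows. The election rule (highest priority wins, a higher IP address breaks ties) and the master-down timer (three advertisement intervals plus a priority-dependent skew) follow the VRRP specification; the router names and priority values are assumptions for illustration, and the addresses are borrowed from Listing 1.

```python
import ipaddress
from typing import Iterable, NamedTuple


class VrrpRouter(NamedTuple):
    name: str
    priority: int   # 1-254 for routers that do not own the virtual address
    address: str    # primary IP address on the LAN


def elect_master(routers: Iterable[VrrpRouter]) -> VrrpRouter:
    """Highest priority wins; the higher IP address breaks a tie."""
    return max(routers, key=lambda r: (r.priority, ipaddress.ip_address(r.address)))


def master_down_interval(priority: int, advert_interval: float = 1.0) -> float:
    """Seconds a backup waits without heartbeats before it takes over."""
    skew_time = (256 - priority) / 256
    return 3 * advert_interval + skew_time


primary = VrrpRouter("primary-vpn-router", priority=200, address="10.3.1.21")
backup = VrrpRouter("backup-vpn-router", priority=100, address="10.3.1.22")

print(elect_master([primary, backup]).name)
print(master_down_interval(backup.priority))
```

Giving the router with the primary VPN tunnel the higher priority makes the outcome of the election deterministic, which is what keeps the routing symmetric.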

VyOS Compatibility

VyOS uses OpenNHRP [2], which implements DMVPN phase 1 (hub-to-spoke) and phase 2 (spoke-to-spoke). Phase 3 is proprietary to Cisco and takes care of scalability of up to thousands of sites.

A large number of routers or IP networks require a routing protocol. If the Cisco-style Enhanced Interior Gateway Routing Protocol (EIGRP) is already in place, then VyOS must fold. Cisco opened up EIGRP in 2013, and the specification is available as RFC 7868 [7], but the open source community does not yet have a stable implementation. VyOS can only play along with OSPF or RIP.

VyOS can translate network addresses for site-to-site VPNs, but not multipoint VPNs. If NAT is really required, you have to dig deep on the Linux command line. Convince the VPN software OpenSwan to authenticate the peer, even if the IP address inside the Internet Key Exchange (IKE) header mismatches the source address in the IP header. A good knowledge of Linux and IPsec is recommended; you should not mess with configuration files directly in a production environment. One hopes a future release of VyOS will master this special case.

Last, but not least, VyOS does not have a web interface. Life takes place on the command line with show, set, and config. If you are familiar with Juniper routers, then VyOS won't look too different. Fans of Cisco and its IOS networking software need a little training, and the remainder is similar.

Security First: Firewall

Because VyOS does not support IPv4 address translation for multipoint VPNs, the VPN router needs direct Internet access with a public address of its own. The device must take care of its own security, but a firewall ruleset for the public interface is straightforward.

These rules do not apply to network traffic traveling through the tunnel. Inside the tunnel everything is permitted. If you want to filter inside the DMVPN, set up an additional firewall policy and apply it to the tunnel interface.
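To give a feel for how short such a public-interface policy can be, the following sketch models it as a first-match decision. It is not the ruleset used in the lab and not VyOS syntax; it assumes that only IKE (UDP port 500), ESP, and replies to the router's own connections are welcome, and the Packet type is invented for this example. Whether a real platform needs further entries, for example for decapsulated GRE, is outside the scope of this toy model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    protocol: str                    # "esp", "udp", "tcp", "icmp", ...
    dst_port: Optional[int] = None   # only meaningful for TCP and UDP
    established: bool = False        # reply to a connection the router opened


def public_interface_action(pkt: Packet) -> str:
    """Assumed policy: allow replies, IKE, and ESP; drop everything else."""
    if pkt.established:
        return "accept"
    if pkt.protocol == "esp":
        return "accept"              # encrypted tunnel payload
    if pkt.protocol == "udp" and pkt.dst_port == 500:
        return "accept"              # IKE key exchange
    return "drop"                    # default action


print(public_interface_action(Packet("udp", 500)))
print(public_interface_action(Packet("tcp", 22)))
```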

Authentication

Both OSPF and VRRP have their own security methods to prevent an unknown device from becoming an OSPF neighbor or a VRRP master. VyOS and Cisco only become friends under OSPF if both enable authentication and share a matching MD5 key. A hostile OSPF router may announce itself to the network, but neighborship will fail. This failure keeps out unwanted routers and prevents well-known routes from pointing to wrong destinations.
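Why a router without the shared key cannot become a neighbor can be illustrated with a simplified version of OSPF's cryptographic authentication, in which the sender appends the key (padded to 16 bytes) to the packet and transmits the MD5 digest of the result. The sketch leaves out sequence numbers and the real packet layout; the packet bytes and keys are placeholders.

```python
import hashlib
import hmac


def ospf_md5_digest(packet: bytes, key: bytes) -> bytes:
    """MD5 over the packet followed by the key, zero-padded to 16 bytes."""
    if len(key) > 16:
        raise ValueError("key must not exceed 16 bytes")
    return hashlib.md5(packet + key.ljust(16, b"\x00")).digest()


def neighbor_accepts(packet: bytes, digest: bytes, local_key: bytes) -> bool:
    """The receiver recomputes the digest with its own key and compares."""
    return hmac.compare_digest(ospf_md5_digest(packet, local_key), digest)


hello = b"placeholder for an OSPF hello packet"
digest = ospf_md5_digest(hello, b"shared-secret")

print(neighbor_accepts(hello, digest, b"shared-secret"))  # same key on both sides
print(neighbor_accepts(hello, digest, b"wrong-secret"))   # hostile or misconfigured router
```

The key itself never travels over the wire, which is the main difference from the cleartext password discussed next.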

The strongest authentication method in VRRP that both vendors implement is a cleartext password. Although it helps prevent some unintended peering, it will fail when an attacker knows how to operate Wireshark.

At least the VPN tunnel knows how to do strong encryption. Pick AES and a 256-bit key for the best security. The strongest form of authentication in VyOS for DMVPN is a pre-shared key, and it is best to build a key out of many different letters, numbers, and symbols. Unfortunately, VyOS can only use RSA keys or X.509 certificates for site-to-site VPNs.
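One way to obtain a pre-shared key built from many different letters, numbers, and symbols is to let a cryptographic random number generator pick the characters. The sketch below uses Python's secrets module; the key length and the symbol set are assumptions (quotes and spaces are left out to avoid quoting trouble on a command line).

```python
import secrets
import string

# Assumed alphabet: letters, digits, and a few symbols that are easy to paste.
ALPHABET = string.ascii_letters + string.digits + "!#$%&*+-=?@_"


def generate_psk(length: int = 32) -> str:
    """Return a random pre-shared key of the requested length."""
    if length < 16:
        raise ValueError("use at least 16 characters for a pre-shared key")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_psk())
```

The same key has to be entered on every DMVPN router, so it makes sense to generate it once and distribute it over a secure channel.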

IPv6

The list of limitations grows: VRRP on VyOS hates IPv6 addresses. Also, the VPN tunnel accepts an IPv6 address only if it doesn't operate in multipoint mode. In summary, IPv6 in VyOS is absolutely not ready for prime time.

Optimization: Timer Tuning

Keep the time to recover from a failure at a minimum by fine-tuning timers and thresholds. Low values for a keepalive interval should only be used for a stable Internet link; otherwise, every lost packet will trigger a failover.

All values for VRRP, OSPF, and the Dead Peer Detection (DPD) for VPN must work hand in hand. For VRRP, a small timeout is acceptable because the LAN has virtually no packet loss. The idea behind DPD is to detect an inactive or faulty tunnel and to rebuild the tunnel before OSPF notices and starts a failover.

DPD and OSPF operate in the WAN and require higher timeouts. A good start is 30 seconds for DPD and 40 seconds for OSPF. If the DMVPN environment is running smoothly, try to lower the values. If the WAN is flappy and unstable, try timeouts greater than a minute instead. Sometimes it is just a matter of finding out which values work best.
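The relationship between the two WAN timers can be written down as a small sanity check. The rule that DPD should fire before OSPF declares the neighbor dead comes from the text above; the helper names are invented, and the hello interval of one quarter of the dead interval is a common convention rather than a requirement.

```python
from typing import List


def check_wan_timers(dpd_timeout_s: int, ospf_dead_s: int) -> List[str]:
    """Return a list of problems; an empty list means the timers fit together."""
    problems = []
    if dpd_timeout_s <= 0 or ospf_dead_s <= 0:
        problems.append("timeouts must be positive")
    if dpd_timeout_s >= ospf_dead_s:
        problems.append("DPD should expire before the OSPF dead interval, "
                        "so the tunnel can be rebuilt before OSPF fails over")
    return problems


def suggested_hello_interval(ospf_dead_s: int) -> int:
    """Common convention: send four hellos per dead interval."""
    return max(1, ospf_dead_s // 4)


print(check_wan_timers(dpd_timeout_s=30, ospf_dead_s=40))   # starting values from the text
print(check_wan_timers(dpd_timeout_s=60, ospf_dead_s=40))   # DPD too slow
print(suggested_hello_interval(40))
```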

Which MTU Is the Best?

An IPsec VPN has lots of headers (Table 1; Figure 4). Their size depends on the WAN technology and the chosen cryptographic algorithm. Whatever their size, they have one thing in common: They reduce the maximum transmission unit (MTU). Don't ignore the MTU setting, because OSPF expects the same MTU value on both ends of a link, and an inappropriate MTU can lower the throughput of the VPN tunnel.

Table 1: GRE Tunnel with IPsec Headers

Header             Size (bytes)
TCP/UDP            20
IPv4               20
   GRE             8
   IPv4            20
       ESP         40
       IPv4        20
           PPPoE   8
Total              136

Figure 4: A small packet contains more header than data.

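To pick an MTU value, you can use one of two ways: (1) choose a low but safe value of 1,400 bytes or (2) calculate the MTU with a web-based MTU calculator [8] and test it. Once the value is applied to the tunnel, validate it from a Windows client with a ping that forbids fragmentation (-f) and sets the payload size (-l):

ping IP -l 1422 -f

The -l option counts only the ICMP payload; the IPv4 and ICMP headers add another 28 bytes, so a payload of 1,422 bytes yields a packet of 1,450 bytes. The demonstration network uses an MTU of 1,450 bytes.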

A third option, detecting the MTU automatically with Path MTU Discovery, was not reliable during lab testing and introduced issues when forming OSPF neighborships.
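The header sizes in Table 1 can be turned into a rough estimate by hand. The sketch below assumes a 1,500-byte Ethernet link and treats only the encapsulation rows (GRE, ESP, the two additional IPv4 headers, and PPPoE) as overhead, because the TCP/UDP header and the innermost IPv4 header belong to the packet that travels through the tunnel. The labels in parentheses are an interpretation of the table, and the ESP size varies with the chosen algorithm.

```python
# Encapsulation headers with sizes in bytes, as listed in Table 1.
ENCAPSULATION = [
    ("GRE", 8),
    ("IPv4 (GRE tunnel)", 20),
    ("ESP", 40),
    ("IPv4 (IPsec)", 20),
    ("PPPoE", 8),
]


def tunnel_mtu(link_mtu: int = 1500, headers=ENCAPSULATION) -> int:
    """Largest IP packet that fits into the tunnel without fragmentation."""
    overhead = sum(size for _, size in headers)
    return link_mtu - overhead


print(tunnel_mtu())
```

With these figures the result lands just above the low but safe value of 1,400 bytes; a link without PPPoE or a cipher with a smaller ESP footprint leaves more room.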

Traffic Shaping

A traffic shaper reduces the outgoing packet rate to match the rate of the link. Packets that arrive too fast are delayed so that they are not dropped at the next hop. This leads to somewhat higher throughput, because a delayed packet is better than a dropped packet.

The correct value for the traffic shaper matters. Unlike the OSPF bandwidth, which only expresses a preference, the shaper value must match the real outgoing bandwidth of the Internet link, because the shaper acts on packets that would exceed it. A lower value wastes bandwidth, and a higher value renders the shaper useless. You could even limit incoming traffic with a policer, but that makes no sense in this setup.
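The principle of delaying instead of dropping can be demonstrated with a token bucket, a classic shaping algorithm. This is a generic illustration and not a description of how VyOS implements its shaper; the class, the rate, and the burst size are assumptions.

```python
class TokenBucket:
    """Minimal token bucket that reports how long each packet has to wait."""

    def __init__(self, rate_bps: float, burst_bytes: int) -> None:
        if rate_bps <= 0:
            raise ValueError("rate must be positive")
        self.rate = rate_bps / 8          # refill rate in bytes per second
        self.capacity = burst_bytes       # maximum number of saved-up bytes
        self.tokens = float(burst_bytes)  # the bucket starts full
        self.last = 0.0                   # time of the previous packet

    def delay_for(self, size_bytes: int, now: float) -> float:
        """Seconds a packet arriving at time now is held back before sending."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        self.tokens -= size_bytes         # a negative balance means queued bytes
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate


# Hypothetical link: 8,000 bits per second, burst of one 1,500-byte packet.
shaper = TokenBucket(rate_bps=8000, burst_bytes=1500)
delays = [shaper.delay_for(1500, now=0.0) for _ in range(3)]
print(delays)
```

Three packets that arrive at the same instant leave one after the other at the configured rate; none of them is dropped.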

Limited Perspective

The DMVPN cloud now offers communication between all peers on OSI Layer 3. Every client can address its target by IP address.

In some cases, or even to satisfy curiosity, an end-to-end communication on OSI Layer 2 is required. The hosts see each other's MAC address, and the WAN becomes one large Ethernet switch. Normally this kind of setup is typical for a data center, when merging virtual environments, or when interconnecting multiple data centers.

The solution for this sounds simple: Just bridge the LAN adapter and the tunnel interface together. However, the underlying TUN kernel module in VyOS refuses this action. Bridging is not supported for multipoint tunnels.

Surprisingly VyOS supports the virtual extensible LAN (VXLAN), which is the perfect match for this setup. The name indicates a LAN environment, but using it in the WAN is possible. VXLAN puts an Ethernet-like layer over the existing DMVPN. In correct terms, the VXLAN is the overlay network, and the DMVPN cloud is the underlay network.

If you really think about using this approach, here are the limitations: Even when it feels like Ethernet to a client, it is actually a WAN environment with packet loss, delay, jitter, and a smaller MTU than most applications would expect from a LAN.
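How much smaller the MTU gets can be estimated from the VXLAN encapsulation, which wraps each Ethernet frame in an outer IPv4, UDP, and VXLAN header. The sketch assumes an IPv4 underlay without VLAN tags and reuses the 1,450-byte tunnel MTU of the demonstration network; the function name is invented.

```python
# Bytes added in front of the inner IP packet (IPv4 underlay, no VLAN tag).
VXLAN_OVERHEAD = {
    "outer IPv4": 20,
    "UDP": 8,
    "VXLAN": 8,
    "inner Ethernet": 14,
}


def vxlan_inner_mtu(underlay_mtu: int) -> int:
    """Largest IP packet a client can send across the VXLAN overlay."""
    return underlay_mtu - sum(VXLAN_OVERHEAD.values())


print(vxlan_inner_mtu(1450))
```

A client that expects the usual 1,500 bytes of a LAN therefore has to live with noticeably less.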

Moreover, spanning tree enters the picture. A redundant path in the LAN (even if it is a disguised WAN) needs loop prevention, so the complexity of NHRP, IPsec, OSPF, and VRRP is extended by some form of spanning tree protocol.

DMVPN is flexible enough to host an OSI Layer 2 network like VXLAN, although that's not a recommended design. VXLAN on top of DMVPN is more of a workaround when Ethernet connectivity is the main goal.

Show Me the Money

Now that the pros and cons of the alternative DMVPN are exposed, what kind of investment should you expect? Cisco's smallest router for DMVPN is the C881 series and starts at $250. Although this might sound feasible for a home office with limited bandwidth, if you need to saturate an Internet link of 100Mbps, pick a Cisco 1921, which needs a budget of $600. For higher bandwidth, Cisco asks for four digits.

Clearly, open source software will win the race when it comes to nonrecurring costs, but you must also keep a close eye on time and business risk. The old catch phrase, "Nobody ever got fired for buying IBM," might be true for Cisco, but not for VyOS.

Graphical Interface?

The chances for a VyOS web interface are low. Brocade does offer a Vyatta web UI for paying customers, and Ubiquiti ships its EdgeOS with a wonderful web-based interface that includes most areas of configuration; however, Ubiquiti binds the web UI to its own hardware by license.

From a technical perspective, a browser front end can communicate through web sockets with the back end (Ubiquiti EdgeRouter). The daemon /usr/sbin/ubnt-util receives the queries and performs the reconfiguration. Unfortunately, this Ubiquiti element is closed source. The software is a MIPS64 binary, which won't run on Intel architecture without an emulator and many dirty tricks.

Conclusions

When the number of remote offices grows faster than the IT team can set them up, it is time for a dynamic VPN mesh. Dynamic multipoint VPN is Cisco's all-purpose solution for scalability in VPN clouds that allows every participating router to establish a direct connection to every other router without additional configuration. This solution truly saves setup effort and reduces delay times.

The free VyOS Linux distribution offers all the protocols needed to create a new DMVPN landscape or to extend the existing Cisco world. VyOS does a pretty good job at hiding the many complicated Linux tools and routing daemons behind well-known CLI commands. Before deploying, however, pay attention to the limitations that crop up when playing together with Cisco, IPv6, or network address translation. Finally, your DMVPN can reside on hardware or a virtual infrastructure.