Lead Image © sarah maher, 123RF.com

OPNids: Suricata with built-in machine learning

Packet Checker

Does OPNids combine the Suricata IDS with machine learning to detect attack threats automatically, as advertised? By Martin Loschwitz

Intrusion detection systems (IDSs) and intrusion prevention systems (IPSs) are some of the classic tools in the administrator's toolbox to counter sophisticated attacks. One popular Linux candidate is the highly functional Suricata [1] (Figure 1).

Figure 1: Suricata is a comprehensive tool for detecting digital attacks. © Linux Screenshots (USA)

Various databases house online attack signatures for Suricata, which the tool uses to examine the data traffic and detect attacks. Much like antivirus programs, however, Suricata can only detect attacks with identified signatures. With the progress made in machine learning over the past few years, efforts have been made to generate new signatures automatically with artificial intelligence (AI) and to enter them into Suricata.

In this article, I first look into Suricata in detail and then introduce the Dragonfly machine learning engine (MLE) [2] specifically designed for Suricata. Finally, I look at OPNids [3], a fork of the OPNsense firewall and routing software that integrates Suricata and Dragonfly.

IDS and IPS

First, however, I want to focus briefly on terminology. An IDS merely examines data traffic for known patterns as it passes by, whereas an IPS also can manipulate the traffic and, if necessary, flip a kill switch as soon as it detects an attack pattern. The Suricata presented here offers both functions (i.e., it can act both as an IDS and an IPS). For the sake of simplicity, I will be filing Suricata under IDS in this article, but this does not exclude the IPS part of the tool.

How IDS Systems Work

For an IDS to check incoming traffic for known signatures, it must first see the traffic. The same is true for outbound traffic: A common misconception is that attacks are identified primarily by suspicious incoming network traffic. Active malware such as unwanted Bitcoin miners, for example, can only be identified by a large volume of unwanted traffic suddenly trying to find its way into the outside world. One way or another, for an IDS to have its full effect, it must be able to see the packets passing in both directions.

One idea is to roll out Suricata on load balancers, because they see most of the traffic in an environment. The IDS would then be a simple loop, so to speak, that would be attached to the respective load-balancing software and examine the data traffic before it was forwarded to the target systems. However, load balancers and corresponding appliances usually do not have the resources for comprehensive analysis of network traffic, and having an IDS as a proxy between the inside and the outside of a setup would inevitably create a bottleneck that would become a problem over time.

IDSs therefore adopt a different approach. Virtually every switch operating system (e.g., Junos OS by Juniper or Nexus by Cisco) offers the ability to set up a mirror port, wherein you use rules to tell the switch which traffic to copy. The devices then copy this traffic to a separate port declared up front as a mirror. Then, you attach an IDS to this switch port, and the IDS sees the mirrored data traffic for the other nodes in the system. Of course, this functional principle rules out using IPS functions directly, because the system only sees a copy of the traffic and cannot touch the original packets; to intervene, the IPS would need to reconfigure the firewalls. At least attacks can be reliably detected.

What Suricata Can Do

Suricata is justifiably considered a prime example of a comprehensive and well-functioning IDS. The open source software, licensed under the GPL, has a long history and is considered stable and mature. Suricata's distribution matches its reputation: Packages are available for all relevant distributions, mostly from official repositories. If no suitable package can be found there, third-party providers maintain Suricata PPAs (e.g., for Ubuntu). After installation, you simply build a configuration and adapt it to your local environment.

As already mentioned, Suricata has an internal engine that manages all rules for the IDS. With a separate tool, known as the Oinkmaster [4], you could even create a set of rules specific to a particular application. A number of authors have published their personal rules in online directories of templates.

However, you should not install all rules blindly; instead, make a meaningful selection up front. It would make no sense at all to let Suricata search for SMTP packets in the data traffic if you don't run an SMTP server yourself. Every rule that Suricata has to apply to packets costs system resources. At the end of the day, most admins build a subset that contains a set of Suricata rules that perfectly fits the respective usage scenario.
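The selection step can be sketched in a few lines of Python. This is only an illustration, assuming the usual Suricata rule layout in which the action (e.g., alert) is followed by the protocol keyword; the helper select_rules and the sample rules, including their messages and SIDs, are made up for this example and do not come from any published ruleset.

```python
import re

# A Suricata rule starts with an action followed by the protocol keyword.
RULE_HEADER = re.compile(r"^(?P<action>\w+)\s+(?P<proto>[\w-]+)\s+")


def select_rules(lines, wanted_protocols):
    """Return only the active rules whose protocol is in wanted_protocols."""
    selected = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and rules that are commented out.
        if not stripped or stripped.startswith("#"):
            continue
        match = RULE_HEADER.match(stripped)
        if match and match.group("proto").lower() in wanted_protocols:
            selected.append(stripped)
    return selected


# Hypothetical sample rules, written only for this illustration.
SAMPLE_RULES = [
    'alert http any any -> $HOME_NET any (msg:"Example HTTP rule"; sid:1000001; rev:1;)',
    'alert smtp any any -> $HOME_NET 25 (msg:"Example SMTP rule"; sid:1000002; rev:1;)',
    '# alert dns any any -> any 53 (msg:"Disabled DNS rule"; sid:1000003; rev:1;)',
    'alert tls any any -> $HOME_NET 443 (msg:"Example TLS rule"; sid:1000004; rev:1;)',
]

if __name__ == "__main__":
    # No SMTP server runs in this setup, so SMTP rules are left out.
    for rule in select_rules(SAMPLE_RULES, {"http", "tls", "dns"}):
        print(rule)
```

In practice, a rule management tool handles this kind of selection; the sketch only shows why a rule for a protocol you do not serve can be dropped before Suricata ever has to evaluate it.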

Verbose Logging

As mentioned earlier, the issue of rigid rules for the IPS and IDS is somewhat problematic. On the one hand, such a system forces you to update the signatures continuously; on the other, attacks may already be taking place in the wild that have not yet been systematically described and for which no signatures are available.

It would be practical if Suricata could automatically teach itself to recognize attacks on the basis of incoming data. The idea is not completely absurd – it works with spam, for example. One of the central features of tools like SpamAssassin is that they self-adapt to incoming messages and automatically adjust their rules to reflect new situations on the basis of statistics and with user support. Nobody would talk about AI and machine learning in spam filters yet, but the principle is the same.

The bad news is that Suricata does not currently provide such functions, even though the program at least provides the basis to build such a system. The magic word is EVE (extensible event format) [5], which describes Suricata's extremely flexible log function.

EVE Lists Most Details

As a rule, when you enable EVE output in Suricata (Figure 2), the log entries in JSON format end up in a central file named eve.json. EVE is not easily fooled. It defines the traffic type or central details like the size of a request in a JSON field for different types of incoming traffic. Suricata is also able to identify and interpret the most important protocols. If someone tries to take control of a website with an HTTP request, Suricata extracts this information from the packet stream and records it in a logfile.
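Because EVE writes one JSON object per line, the log is easy to process with a few lines of code. The following Python sketch counts records per event type and collects the requested HTTP URLs. The sample records are invented and trimmed to a handful of fields; the field names used (event_type, http, hostname, url) follow the EVE format as I understand it, so check them against the records your own Suricata version writes.

```python
import json
from collections import Counter


def read_eve(lines):
    """Yield one dict per EVE record; each line holds one JSON object."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # A line can be incomplete while Suricata is still writing it.
            continue


def summarize(events):
    """Count records per event_type and collect the requested HTTP URLs."""
    counts = Counter()
    urls = []
    for event in events:
        counts[event.get("event_type", "unknown")] += 1
        if event.get("event_type") == "http":
            http = event.get("http", {})
            urls.append(http.get("hostname", "") + http.get("url", ""))
    return counts, urls


# Hypothetical records, trimmed to a few fields for illustration.
SAMPLE_LINES = [
    '{"timestamp": "2019-01-01T12:00:00.000000+0000", "event_type": "http", '
    '"src_ip": "192.0.2.10", "dest_ip": "198.51.100.5", '
    '"http": {"hostname": "www.example.com", "url": "/login.php"}}',
    '{"timestamp": "2019-01-01T12:00:01.000000+0000", "event_type": "dns", '
    '"src_ip": "198.51.100.5", "dest_ip": "192.0.2.53"}',
    '{"timestamp": "2019-01-01T12:00:02.000000+0000", "event_type": "http", '
    '"src_ip": "192.0.2.11", "dest_ip": "198.51.100.5", '
    '"http": {"hostname": "www.example.com", "url": "/"}}',
    '{"timestamp": "2019-01-01T12:00:03',  # truncated line, skipped
]

if __name__ == "__main__":
    counts, urls = summarize(read_eve(SAMPLE_LINES))
    print(dict(counts))
    print(urls)
```

To read a real log, pass an open file object instead of the sample list, for example open("eve.json") with the path adjusted to wherever your installation writes the file.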

Figure 2: An EVE interface lets you transfer Suricata events to other systems. © Elastic

However, EVE is by no means the only logging function Suricata offers. The program can also record specific information about packet flows in a flow logfile, and it maintains a separate file containing DNS requests to name servers. In other words, it has all the details that an MLE needs to train Suricata with defined parameters. This is where Dragonfly MLE for Suricata comes into play.
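Flow records are what make the outbound case mentioned earlier visible: a host that suddenly pushes large volumes of data to the outside world stands out once you add up its flows. The following Python sketch sums the bytes that internal clients send to external servers. The network range, the limit, and the sample records are assumptions made for this example; the field names (event_type, src_ip, dest_ip, flow.bytes_toserver) follow the EVE flow records as I understand them.

```python
import ipaddress
from collections import defaultdict

# Assumption: the internal network; adjust to the local environment.
HOME_NET = ipaddress.ip_network("10.0.0.0/8")


def outbound_bytes(events, home_net=HOME_NET):
    """Sum bytes sent from internal clients to external servers, per client."""
    totals = defaultdict(int)
    for event in events:
        if event.get("event_type") != "flow":
            continue
        src = ipaddress.ip_address(event["src_ip"])
        dst = ipaddress.ip_address(event["dest_ip"])
        # Only count flows that leave the internal network.
        if src in home_net and dst not in home_net:
            totals[str(src)] += event.get("flow", {}).get("bytes_toserver", 0)
    return dict(totals)


def noisy_hosts(totals, limit):
    """Return the hosts that sent more than limit bytes, sorted by address."""
    return sorted(host for host, sent in totals.items() if sent > limit)


# Hypothetical, already parsed flow records for illustration.
SAMPLE_FLOWS = [
    {"event_type": "flow", "src_ip": "10.0.0.5", "dest_ip": "203.0.113.7",
     "flow": {"bytes_toserver": 4_000_000, "bytes_toclient": 2_000}},
    {"event_type": "flow", "src_ip": "10.0.0.5", "dest_ip": "203.0.113.8",
     "flow": {"bytes_toserver": 6_000_000, "bytes_toclient": 3_000}},
    {"event_type": "flow", "src_ip": "10.0.0.6", "dest_ip": "203.0.113.7",
     "flow": {"bytes_toserver": 1_200, "bytes_toclient": 50_000}},
    {"event_type": "flow", "src_ip": "10.0.0.6", "dest_ip": "10.0.0.5",
     "flow": {"bytes_toserver": 9_000_000, "bytes_toclient": 1_000}},
    {"event_type": "dns", "src_ip": "10.0.0.6", "dest_ip": "203.0.113.53"},
]

if __name__ == "__main__":
    totals = outbound_bytes(SAMPLE_FLOWS)
    print(totals)
    print(noisy_hosts(totals, limit=5_000_000))
```

A fixed limit is the crudest possible heuristic; it is used here only to show that the flow log already contains the raw material an analyzer needs.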

Dragonfly MLE as Quasi-AI

Dragonfly MLE was created by the authors of OPNids. It works in three phases: First, it attaches itself to one or more data sources, such as EVE logs from Suricata. Second, it analyzes the events found in these logs and correlates them. Third, it outputs the result of its deliberations to a sink, which can be a simple file or socket in Lua. The trick is that other tools that also use Lua can tap into those sinks and derive their sets of rules from them. This functionality is exactly what Suricata offers. If you throw a Lua sink [6] to the tool, it automatically adopts it, including the ruleset.

The process I just described sounds a bit theoretical, but a practical example will clarify what I mean. To begin, assume that you have a running Suricata system that writes incoming events to an EVE log at regular intervals. At the same time, you install an instance of the Dragonfly MLE system, which runs as a daemon in the background, taps directly into the EVE stream in the Suricata log, and discovers everything Suricata is doing.

Once Dragonfly MLE has assimilated the incoming data, it passes through the analyzer layer. The analyzer's central task is to investigate the information provided by Suricata and create a weighting according to the results. Dragonfly MLE already comes with some analyzers, but admins and users are encouraged to write and use their own variants. The factory-supplied analyzers evaluate traffic by unusual countries of origin or strange time stamps.

In principle, it works like SpamAssassin. A specific number of points is assigned for individual criteria. If an event has a sufficiently high score at the end of the calculation, Dragonfly forwards this information to the Lua sinks mentioned above. If you configure Suricata in the intended way, it retrieves the information from the sink provided by Dragonfly MLE, creating a kind of cycle of events and resulting in actions and extended rules on the Suricata side.
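The points principle is easy to illustrate. The following Python sketch is not Dragonfly MLE code and does not reproduce its analyzer interface; it only mimics the idea of adding up points per criterion and forwarding an event to a sink once a threshold is reached. The weights, the threshold, the list of expected countries, and the country field (which would have to be filled in by a GeoIP lookup) are all assumptions made for this example.

```python
from datetime import datetime

# Hypothetical values, chosen only for this illustration.
EXPECTED_COUNTRIES = {"DE", "US"}
THRESHOLD = 4
EVE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def unusual_country(event):
    """Fire if the (hypothetical) country field is known but not expected."""
    country = event.get("country")
    return country is not None and country not in EXPECTED_COUNTRIES


def odd_hour(event):
    """Fire if the event was recorded between 22:00 and 06:00."""
    hour = datetime.strptime(event["timestamp"], EVE_TIME_FORMAT).hour
    return hour < 6 or hour >= 22


# Each analyzer contributes a fixed number of points when it fires.
ANALYZERS = [(unusual_country, 3), (odd_hour, 2)]


def score(event):
    """Add up the points of all analyzers that fire for this event."""
    return sum(points for check, points in ANALYZERS if check(event))


def forward(events, sink, threshold=THRESHOLD):
    """Pass every event that reaches the threshold to the sink callable."""
    for event in events:
        if score(event) >= threshold:
            sink(event)


# Hypothetical events for illustration.
SAMPLE_EVENTS = [
    {"timestamp": "2019-01-01T03:15:00.000000+0000", "country": "XX"},
    {"timestamp": "2019-01-01T14:00:00.000000+0000", "country": "XX"},
    {"timestamp": "2019-01-01T23:30:00.000000+0000", "country": "DE"},
]

if __name__ == "__main__":
    flagged = []
    forward(SAMPLE_EVENTS, flagged.append)
    for event in flagged:
        print(score(event), event)
```

Here the sink is simply a list; in a real setup, it would be whatever interface the consumer reads from. Only the first sample event collects points from both criteria and therefore crosses the threshold.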

Referring to this process as machine learning would be inaccurate. Unlike complex neural networks, the system does not improve its results autonomously from feedback, which is the core of a learning process. It is more about filters that work a bit more intelligently because the user weights them.

Not Off the Rack

Although Suricata is ready to be installed out of the box, Dragonfly MLE is a different matter. The product is available as free software on GitHub, not from the repositories of the major distributors, which means work on your end. You need to install Dragonfly MLE manually, build a package, or put it into a Docker container.

A container would be the most elegant and cleanest way to run the MLE, because the underlying system remains untouched. However, this approach is not very convenient, either, because the whole process of integrating Suricata and Dragonfly MLE has to be completed manually.

OPNids to the Rescue

OPNids, which can be found on GitHub [3], could come in handy now. Although not written by the OPNsense developers, it is a direct fork of OPNsense; therefore, it makes sense to take a brief look at the starting point.

OPNsense (Figure 3) modestly describes itself as a high-end open source security firewall. At its core, OPNsense operates as a stateful firewall with on-board packet filtering. However, it is enriched with all kinds of functions. OPNsense comes with its own graphical interface (dashboard) and its own management tool for active firewall rules, as well as two-factor authentication and a traffic shaper, which can kill the data flow if necessary. OPNsense can act as an endpoint for a classic VPN connection and as a proxy with a cache function for outgoing traffic.

Figure 3: OPNsense is a popular firewall appliance based on free software. © OPNsense

The OPNsense component list also includes an IDS in the form of Suricata (Figure 4) that can be controlled and administered from a GUI, bringing Suricata into action even faster than on standard systems. The program has a good set of standard signatures, so you can save the trouble of trial and error.

Figure 4: OPNsense includes Suricata out of the box, but the Dragonfly MLE component for automatic learning is missing. © OPNsense

OPNids Derivative

OPNsense is available under a completely free license, so it was no problem for the OPNids developers to use it as a basis for new software. If the OPNsense developers had addressed the issue of machine learning, they would probably have integrated that functionality into OPNsense right away. The fact that this did not happen could still become a problem for OPNids – but more about that later.

If you want to use OPNids instead of OPNsense, you might encounter problems: OPNids offers a single feature that OPNsense lacks, whereas OPNsense offers a powerful feature set but not the MLE functionality. Neither solution is a true drop-in replacement for the other.

Complex Build

Unlike the developers of OPNsense, the OPNids developers do not currently provide ISO images of their product. You cannot download a single file from which to install the OPNids appliance. However, the OPNsense tools and the OPNids GitHub directory can be used to build ISO images – as mentioned in a manual [7] – although not very conveniently.

Once you've fought your way through the process, the basic OPNsense framework of OPNids is ready for action. Additionally, the Dragonfly MLE-specific entries are immediately available in the web GUI, but for no apparent reason they are disabled in the default OPNids installation, so you have to enable them first. You also have to remember to define the port on the machine running OPNids as a mirror port on the switch; otherwise, OPNids will not see all the packets in its environment.

Annoyingly, although OPNids integrates Dragonfly MLE, you still have to configure central settings like teaming up Suricata and Dragonfly MLE manually. This process simply feels as if the OPNids developers have not completed their work (Figure 5).

Figure 5: Although OPNids comes with a GUI plugin for Dragonfly MLE, it turns out not to be very useful. © OPNids

Uncertain Future?

This awkward impression is also backed by something that came to light while I was working on this article: The OPNids website [8] suddenly disappeared from the web. When I called it up at press time, all I got back was a Connection Refused, which cannot be the result of a lack of popularity for OPNids, because many other sites link to it. This made me suspect that the overhead of maintaining what is in part an OPNsense fork is simply too much for the OPNids developers. Although diversity is one of the key factors in the FL/OSS world, sometimes it makes more sense to join forces than to do your own thing.

The amount of work that an admin has to invest to use OPNids in a meaningful way by far exceeds the effort needed to configure Suricata and Dragonfly MLE manually on an off-the-shelf Linux system. Admittedly, on a plain Linux system you have to do without the GUI that takes care of the Suricata and Dragonfly MLE configuration in OPNids, but that GUI only really helps you with the Suricata part.

The part that deals with Dragonfly MLE, on the other hand, offers little added value. Whether you type the names of the Suricata logfiles to be monitored in a configuration file or in a path box in the web interface doesn't matter in the end. The Suricata in OPNids is a standard configuration, so the software could even determine automatically where the EVE log resides and how Dragonfly MLE can access it. Instead, the OPNids interface forces you into a bout of pointless mouse pushing.

Dragonfly MLE is still under active development, but the last commit in the OPNids GitHub repo was quite a while ago. I can't rule out hearing the OPNids swan song some time soon. Given the current state of the project, this would not be a tragedy either.

Conclusions

The idea of automatically detecting attacks on the basis of various factors and rating them with a points system seems clever and makes a lot of sense. If it works for SpamAssassin, for example, it can also be applied to an IDS: When incoming traffic shows certain characteristics, the probability increases that something undesirable is lurking out there. If you train an engine for IDS rules to recognize these patterns and adapt its heuristics in a meaningful way, it can actually detect attacks autonomously.

Dragonfly MLE was explicitly designed for use with Suricata, which means that its developers have undoubtedly put their money on the best horse in the open source IDS systems stable. Combined with OPNids, however, it is not very exciting. OPNids seems more dead than alive. It remains to be seen whether it really makes sense to use a half-baked fork of a well-functioning solution in the production environment of a setup. Most admins would intuitively say no to this question anyway.

Luckily, Dragonfly MLE development is still ongoing. If you run Suricata and the MLE on Linux, you get a functional combination for automatic threat detection, which is precisely the setup I would advise you to use. OPNids is certainly a neat idea, but it is currently not possible to use it in a meaningful way.