Management Huginn Lead image: Lead Image © it Ilka Burckhardt, Fotolia.com

Aggregating information with Huginn

Smart Collection

Huginn collects information and data from websites, processes it, and mails it to a user. Huginn also executes predefined actions automatically for certain events; however, setting up the small IFTTT alternative requires some work.

By Tim Schürmann

The web application Huginn [1] continuously monitors several user-defined websites and Internet services. On the basis of the information published there, it draws up a summary, which it then mails to your breakfast table. In its report, Huginn might refer to the weather for your location or send an alert when Twitter sees many posts on a keyword such as net neutrality. You might receive an abstract like, "Twitter yesterday had an unusually high number of posts on net neutrality, the new XKCD comic published last night is all about drones, and the weather will be rainy today."

Much like the Internet service IFTTT [2], Huginn automatically triggers actions for certain events (e.g., if the text on a news page changes or the online shop reduces the price of the laptop you have your sights on). It is also useful for Internet of Things (IoT) projects. With a few exceptions, users access Huginn through a web interface.

Sophisticated

Huginn is available under the MIT license and runs on your server. It needs at least 2GB of RAM and a dual-core processor. If you want to run Huginn on a low-power computer like the Raspberry Pi, you need to manage the software's zest for action through configuration. The required modifications are explained in the Huginn wiki [3].

The software is based on Ruby on Rails [4] and therefore requires an installed Ruby environment; in fact, it needs version 2.2 or 2.3 of the reference implementation. The alternative Ruby implementations JRuby and Rubinius will not do the job.
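A quick shell check can tell you whether a given Ruby version string falls into this range. The following is only a minimal sketch: the function name ruby_supported is made up for this example, and the check looks at the version string alone, not at whether the interpreter is the reference implementation.

```shell
# ruby_supported VERSION: succeeds for Ruby 2.2.x and 2.3.x version strings.
# (Hypothetical helper for illustration; not part of Huginn.)
ruby_supported() {
  case $1 in
    2.2|2.2.*|2.3|2.3.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example with fixed version strings:
for v in 2.1.5 2.2.7 2.3.4 2.4.1; do
  if ruby_supported "$v"; then
    echo "$v: supported"
  else
    echo "$v: not supported"
  fi
done

# On a real system you could pass in the installed version instead:
#   ruby_supported "$(ruby -e 'print RUBY_VERSION')"
```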

Huginn stores the data it collects in a MySQL or PostgreSQL database. Nginx [5] delivers the web interface, but it should also work with an Apache server without any problems [6]. The developers only guarantee smooth operation on Debian 7 or 8 and Ubuntu 14.04 or 16.04. In principle, the installation also works on other Linux distributions, as well as OS X and FreeBSD.

Installing Huginn requires some work (see also the "Fast, Faster, Docker" box), and Ubuntu 16.04 has a nasty stumbling block: Huginn looks for runit, which resides in the repositories, but requires Upstart as the init system. However, Canonical replaced Upstart with systemd, causing Ubuntu 16.04 to report an error when installing runit.

If you still want to use Ubuntu 16.04, build runit yourself or ignore the error message; in our lab, runit ran flawlessly despite the error. Huginn also fails on the current Ubuntu 17.04, where the Foreman tool crashes reproducibly when creating the init scripts. In the end, Huginn only ran smoothly on Ubuntu 14.04 and Debian 8 (Jessie).

Package Services

In Debian, you first log in as root and install the sudo command-line tool, which lets you run commands under a different user account:

su root
apt install sudo

You will also need a text editor, such as Nano, which is used in this example. The commands from Listing 1 then install all remaining dependencies.

Listing 1: Resolving Dependencies

sudo apt update
sudo apt upgrade
sudo apt install runit build-essential git zlib1g-dev libyaml-dev libssl-dev libgdbm-dev libreadline-dev libncurses5-dev libffi-dev curl openssh-server checkinstall libxml2-dev libxslt-dev libcurl4-openssl-dev libicu-dev logrotate python-docutils pkg-config cmake nodejs graphviz

Besides Ruby, Huginn also needs Bundler, Rake, and Foreman. On Ubuntu 16.04, the following command installs these tools:

sudo apt install ruby2.3 ruby-bundler ruby-foreman ruby2.3-dev rake

Debian 8 and Ubuntu 14.04 only offer Ruby 2.1 and 2.0 in their repositories, so you need to build Ruby and the required helpers from the source code (Listing 2); however, this step also places the responsibility for updating Ruby on your shoulders. The commands in Listing 2 create a temporary directory, fetch the source code for Ruby 2.3.4, compile the code, and install the Ruby environment. The final command installs Rake, Bundler, and Foreman.

Listing 2: Installing Ruby 2.3

mkdir /tmp/ruby && cd /tmp/ruby
curl -L --progress-bar http://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.4.tar.bz2 | tar xj
cd ruby-2.3.4
./configure --disable-install-rdoc
make -j`nproc`
sudo make install
sudo gem install rake bundler foreman --no-ri --no-rdoc

Huginn does not support Ruby version managers such as RVM, rbenv, or chruby. If you use one anyway, expect hard-to-identify problems.

Huginn comprises the web interface and a matching background service that retrieves the information. For security reasons, both should run under a dedicated account; the command

sudo adduser --disabled-login --gecos 'Huginn' huginn

creates the huginn user for this purpose.

Storage Space

The next missing ingredient is the database. The following command installs MySQL on your computer:

sudo apt install -y mysql-server mysql-client libmysqlclient-dev

MySQL has an all-powerful root database user who can modify the complete database. During the installation of MySQL, choose the strongest possible password for this user and type it twice in succession. Once the MySQL packages have been installed, run the command:

sudo mysql_secure_installation

After a few prompts, the script that is launched secures the database. After entering the password for the root MySQL user, answer "no" to the first prompt (about changing the password) by pressing n, but confirm all other prompts by pressing y. When the script is done, you need to grant Huginn access to MySQL in the MySQL command-line client, which you launch with the command:

mysql -u root -p

To log in, again use the password for the root MySQL user, and at the prompt, enter the commands from Listing 3, which create a user account for Huginn in MySQL; select InnoDB as the storage engine; and allow Huginn to access a database named huginn_production. Replace the <secret> in the first command with a password that is as difficult to guess as possible; then, check whether MySQL has created the new user account correctly by entering:

sudo -u huginn -H mysql -u huginn -p -D huginn_production

Listing 3: Creating the Database User

CREATE USER 'huginn'@'localhost' IDENTIFIED BY '<secret>';
SET default_storage_engine=INNODB;
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER, LOCK TABLES ON `huginn_production`.* TO 'huginn'@'localhost';
exit

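When you type the password for the huginn user (<secret> in this case), you should see the error message ERROR 1049: Unknown database. This error is expected, because the huginn_production database does not exist yet. Any other response means that something has gone wrong creating the account.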

Getting Huginn

Once the database is running, run the following commands as the huginn user to load the source code for the software into the matching home directory:

cd /home/huginn
sudo -u huginn -H git clone https://github.com/cantino/huginn.git -b master huginn

The Huginn developers do not release stable versions; they only provide the current source code on GitHub (in the master branch). The second command therefore always automatically downloads the current Huginn version.

Two sample configuration files included with the source code simplify the setup; you simply copy them to the appropriate locations. Listing 4 shows the required commands, which also create some subdirectories needed by Huginn – including directories for the logs – and then modify the access permissions. The command

sudo -u huginn -H bundle install --deployment --without development test

finally installs the additional packages required for Ruby (the Gems).

Listing 4: Creating Subdirectories for Huginn

cd huginn
sudo -u huginn -H cp .env.example .env
sudo -u huginn -H cp config/unicorn.rb.example config/unicorn.rb
sudo -u huginn mkdir -p log tmp/pids tmp/sockets
sudo chown -R huginn log/ tmp/
sudo chmod -R u+rwX,go-w log/ tmp/
sudo chmod -R u+rwX,go-w log/
sudo chmod -R u+rwX tmp/
sudo -u huginn -H chmod o-rwx .env

Well Met

After completing the installation marathon, it's time to configure Huginn. To begin, determine the MySQL version number by typing mysql --version. Note whether the number displayed here is greater than or equal to 5.5.3; then, call

sudo -u huginn -H rake secret

This command returns a long cryptic string; copy it to the clipboard for later use. Now, working as the huginn user, open the .env configuration file in a text editor:

sudo -u huginn -H nano .env

At the start of the file, replace REPLACE_ME_NOW! with the cryptic string mentioned above. In the Database Setup section, replace huginn_development with huginn_production:

DATABASE_NAME=huginn_production

Additionally, the Huginn operator needs to replace root with huginn:

DATABASE_USERNAME=huginn

To the right of DATABASE_PASSWORD=, type, in quotes, the password that the Huginn account uses to log in to MySQL (i.e., <secret> in this example).

If the MySQL version you are using is version 5.5.3 or newer, then replace utf8 with utf8mb4:

DATABASE_ENCODING=utf8mb4
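If you would rather not compare version numbers by eye, a few lines of shell can do it. This is only a sketch under stated assumptions: version_at_least is a hypothetical helper, and it expects a bare version number such as 5.7.18 rather than the full output of mysql --version, whose format differs between distributions.

```shell
# version_at_least HAVE NEED: succeeds if the dotted version HAVE is >= NEED.
# Hypothetical helper: compares three numeric components and ignores
# suffixes such as "-log" on the last one.
version_at_least() (
  have=$1 need=$2
  set -f          # disable globbing while the strings are split on dots
  IFS=.
  set -- $have; h1=${1:-0} h2=${2:-0} h3=${3:-0}
  set -- $need; n1=${1:-0} n2=${2:-0} n3=${3:-0}
  h3=${h3%%[!0-9]*}; n3=${n3%%[!0-9]*}
  for pair in "$h1:$n1" "$h2:$n2" "${h3:-0}:${n3:-0}"; do
    [ "${pair%%:*}" -gt "${pair##*:}" ] && exit 0
    [ "${pair%%:*}" -lt "${pair##*:}" ] && exit 1
  done
  exit 0          # all components are equal
)

# Example with fixed version numbers:
for v in 5.1.73 5.5.3 5.7.18-log; do
  if version_at_least "$v" 5.5.3; then
    echo "$v: use DATABASE_ENCODING=utf8mb4"
  else
    echo "$v: keep DATABASE_ENCODING=utf8"
  fi
done
```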

The results should look like Figure 1. At the bottom of the config file, remove the leading hash (#) in the line:

# RAILS_ENV=production

Figure 1: These settings give Huginn access to the huginn_production database.
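If you prefer to script these edits rather than make them by hand in Nano, sed can apply them in one pass. The sketch below is an assumption-laden illustration, not an official procedure: configure_env is a made-up helper, the sample file only imitates the relevant lines of .env, and the token and password must not contain the characters |, &, or \.

```shell
# configure_env FILE TOKEN PASSWORD: apply the database edits described above.
# (Hypothetical helper; assumes FILE still contains the default values.)
# Drop the DATABASE_ENCODING line if your MySQL is older than 5.5.3.
configure_env() (
  file=$1 token=$2 password=$3
  tmp=$(mktemp) || exit 1
  sed \
    -e 's|REPLACE_ME_NOW!|'"$token"'|' \
    -e 's|^DATABASE_NAME=huginn_development|DATABASE_NAME=huginn_production|' \
    -e 's|^DATABASE_USERNAME=root|DATABASE_USERNAME=huginn|' \
    -e 's|^DATABASE_PASSWORD=.*|DATABASE_PASSWORD="'"$password"'"|' \
    -e 's|^DATABASE_ENCODING=utf8$|DATABASE_ENCODING=utf8mb4|' \
    -e 's|^# *RAILS_ENV=production|RAILS_ENV=production|' \
    "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file's permissions
  status=$?
  rm -f "$tmp"
  exit "$status"
)

# Demonstration on a throwaway sample file (not your real .env):
sample=$(mktemp)
cat > "$sample" <<'EOF'
APP_SECRET_TOKEN=REPLACE_ME_NOW!
DATABASE_NAME=huginn_development
DATABASE_USERNAME=root
DATABASE_PASSWORD=""
DATABASE_ENCODING=utf8
# RAILS_ENV=production
EOF
configure_env "$sample" 0123456789abcdef 'example-password'
cat "$sample"
rm -f "$sample"
```

On a real installation, you would run such a helper as the huginn user against /home/huginn/huginn/.env and check the result in an editor afterward.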

For Huginn to send email, you need to store the access credentials for an SMTP server: Search for the Email Configuration section in the lower part of the configuration file. Enter the domain name of an SMTP server after SMTP_SERVER= and the port after SMTP_PORT=.

If Huginn needs a username and password to log in to the SMTP server, enter these after SMTP_USER_NAME= and SMTP_PASSWORD=. If necessary, adjust the authentication method after SMTP_AUTHENTICATION= and the encryption after SMTP_ENABLE_STARTTLS_AUTO=. The comments in the file that describe the settings can be helpful. Finally, add the sender address after EMAIL_FROM_ADDRESS=.
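For orientation, a filled-in email section might look like the following fragment. Every value here is a placeholder invented for this example (server name, port, account, and address), so substitute the data for your own mail provider. Note that the last variable is spelled EMAIL_FROM_ADDRESS in the file.

```shell
# Example values only; replace all of them with your provider's settings.
SMTP_SERVER=smtp.example.com
SMTP_PORT=587
SMTP_USER_NAME=huginn@example.com
SMTP_PASSWORD="example-smtp-password"
SMTP_AUTHENTICATION=plain
SMTP_ENABLE_STARTTLS_AUTO=true
EMAIL_FROM_ADDRESS=huginn@example.com
```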

In Nano, the keyboard shortcut Ctrl+O saves the changes, and Ctrl+X takes you back to the command line. The next step is to set up the database with the commands from Listing 5. The first line creates the database, and the second line runs the migrations that create its tables. So that you can log in to Huginn's web interface right away, the third command sets up an admin account with the password supersecret. The final command in Listing 5 prepares all the files required by the web interface, including the JavaScript code.

Listing 5: Setting Up the Database

sudo -u huginn -H bundle exec rake db:create RAILS_ENV=production
sudo -u huginn -H bundle exec rake db:migrate RAILS_ENV=production
sudo -u huginn -H bundle exec rake db:seed RAILS_ENV=production SEED_USERNAME=admin SEED_PASSWORD=supersecret
sudo -u huginn -H bundle exec rake assets:precompile RAILS_ENV=production

If the first command produces an error message or asks for the password of the root MySQL user, the access credentials in the .env configuration file are incorrect. In this case, check whether the password after DATABASE_PASSWORD= is correct and enclosed in quotes.
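The quoting is easy to verify with grep. The helper below is a hypothetical sketch that only checks the form of the line (a non-empty value in double quotes); it cannot tell whether the password itself is the right one.

```shell
# password_is_quoted FILE: succeeds if FILE has a DATABASE_PASSWORD line whose
# non-empty value is wrapped in double quotes. (Hypothetical helper.)
password_is_quoted() {
  grep -Eq '^DATABASE_PASSWORD="[^"]+"[[:space:]]*$' "$1"
}

# Demonstration on a throwaway sample file:
sample=$(mktemp)
printf '%s\n' 'DATABASE_USERNAME=huginn' 'DATABASE_PASSWORD=example-password' > "$sample"
if password_is_quoted "$sample"; then
  echo "password line looks fine"
else
  echo "password is missing, empty, or not in quotes"
fi
rm -f "$sample"
```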

Init Scripts

The next step is to create a number of init scripts. To begin, type

sudo -u huginn -H nano Procfile

to open the Procfile configuration file. Comment out the following two lines in the file by adding hash signs at the start, so that they read:

# web: bundle exec rails server -p ${PORT-3000} -b ${IP-0.0.0.0}
# jobs: bundle exec rails runner bin/threaded.rb

Conversely, remove the hash signs from the following two lines in the PRODUCTION section, so that they read:

web: bundle exec unicorn -c config/unicorn.rb
jobs: bundle exec rails runner bin/threaded.rb
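The same switch can be scripted with sed. The following sketch makes two assumptions: switch_to_unicorn is a made-up helper name, and the Procfile contains the four lines exactly as printed above. Because the two jobs lines are textually identical, swapping their comment markers changes nothing, so the sketch only touches the web lines.

```shell
# switch_to_unicorn FILE: comment out the "rails server" web line and
# activate the "unicorn" web line. (Hypothetical helper for illustration.)
switch_to_unicorn() (
  file=$1
  tmp=$(mktemp) || exit 1
  sed -E \
    -e 's|^(web: bundle exec rails server .*)$|# \1|' \
    -e 's|^# *(web: bundle exec unicorn .*)$|\1|' \
    "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  exit "$status"
)

# Demonstration on a throwaway copy of the relevant lines:
sample=$(mktemp)
cat > "$sample" <<'EOF'
web: bundle exec rails server -p ${PORT-3000} -b ${IP-0.0.0.0}
jobs: bundle exec rails runner bin/threaded.rb
# web: bundle exec unicorn -c config/unicorn.rb
# jobs: bundle exec rails runner bin/threaded.rb
EOF
switch_to_unicorn "$sample"
cat "$sample"
rm -f "$sample"
```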

After saving the file, generate the scripts, set up log rotation, and check whether Huginn is running:

sudo bundle exec rake production:export
sudo cp deployment/logrotate/huginn /etc/logrotate.d/huginn
sudo bundle exec rake production:status

If you want to change one of the Huginn configuration files – .env, unicorn.rb, or Procfile – you need to regenerate the init scripts with:

sudo bundle exec rake production:export

Nginx delivers the web interface, and Huginn at least comes with a prebuilt configuration file, which you only need to copy into place and enable for the web server (Listing 6). If Huginn is the only website served up by Nginx, the fourth command disables the default Nginx page.

Listing 6: Installing and Configuring Nginx

sudo apt install nginx
sudo cp deployment/nginx/huginn /etc/nginx/sites-available/huginn
sudo ln -s /etc/nginx/sites-available/huginn /etc/nginx/sites-enabled/huginn
sudo rm /etc/nginx/sites-enabled/default

Just one small setting is missing in the Nginx configuration file. In Nano, open the file

sudo nano /etc/nginx/sites-available/huginn

and replace the YOUR_SERVER_FQDN placeholder with the domain name of the host running Huginn. When in doubt, or on a test system, this is usually localhost. In any case, make sure you close the line with a semicolon. After saving, entering

sudo nginx -t

checks the configuration file for typos. If everything is correct, type

sudo service nginx restart

to restart Nginx.
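The placeholder replacement can also be done without an editor. The sketch below uses a hypothetical helper, set_server_name, and demonstrates it on a sample line that is assumed to resemble the one in Huginn's Nginx file; the hostname must not contain | or &.

```shell
# set_server_name FILE FQDN: replace the YOUR_SERVER_FQDN placeholder.
# (Hypothetical helper for illustration.)
set_server_name() (
  file=$1 fqdn=$2
  tmp=$(mktemp) || exit 1
  sed "s|YOUR_SERVER_FQDN|$fqdn|g" "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  exit "$status"
)

# Demonstration on a throwaway sample file:
sample=$(mktemp)
printf '%s\n' '  server_name YOUR_SERVER_FQDN;' > "$sample"
set_server_name "$sample" localhost
cat "$sample"
rm -f "$sample"

# On a real system, the file belongs to root, so you would run the sed
# command with sudo and then validate the result with: sudo nginx -t
```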

Point and Click

Installation is finally complete. In the future, Huginn and Nginx will start automatically at system bootup. To manage Huginn, open it in the browser. If it is running on the local system, you should see the page shown in Figure 2 after typing localhost into the address bar. Clicking Login and entering the username admin and the password supersecret puts you in the user interface, where you should change the password under Account | Account and enter the correct email address to which Huginn later sends the summary.

Figure 2: This page appears if the Huginn installation worked.

In Huginn, agents collect and process data from websites, and users manage these with the Agents menu option (Figure 3). For example, the XKCD Source agent regularly checks for a new XKCD comic, and the Afternoon Digest agent sends email to the user every evening. The arrows in the first column indicate whether the agent collects data (left arrow) or outputs data (right arrow). A double arrow means that the agent both receives and outputs data.

Figure 3: After the install, Huginn already has seven sample agents.

The Schedule column tells you when the agent is active, and the Working? column must show a green Yes for the agent to work. You can launch an agent manually using Actions | Run at the right end of the agent line.

When an agent has collected data, it generates an event that is typically the raw data (i.e., the title and description of the current XKCD comic, among other things in the example). The Events menu item lists all the events (Figure 4). The list is initially empty, but later, the content of the respective event and the information collected can be displayed by selecting Show.

Figure 4: Each agent generates one or multiple events, which can lead to a pretty long list.

The events serve as a data source for other agents. For example, the Comic Formatter, which wraps the title of the comic in HTML tags, converts the information retrieved from XKCD Source. Because it waits for the events from XKCD Source, it automatically processes the delivered data. If the agent fails to react to an event, you can resend the event with the Re-emit button. Clicking Agents | View diagram reveals which agent passes its data to another agent (Figure 5).

Figure 5: The XKCD Source agent passes the XKCD comic description to the Comic Formatter, which hands over the neatly formatted text to the Afternoon Digest email agent.

Because the view can become cluttered if you have a large number of agents, Huginn groups them, describing the groups as scenarios. For example, Huginn groups all the agents that tap into Twitter into a scenario named Twitter, whereas the agents in the second scenario, Weather, unsurprisingly process the weather data. You can freely decide how many scenarios to set up and how to distribute the agents across them. The Scenarios menu option lists the existing scenarios. Huginn comes with a standard default-scenario. Clicking on a scenario displays the agents it contains, and clicking New Scenario creates a new one. Anyone who thinks that scenarios are too complicated can simply ignore them.

007 Colleagues

To create a new agent, select Agents | New Agent. Below Type, decide which data the agent should fetch or process. In addition to specialized agents (e.g., that access Twitter), agents can be used for general tasks. For example, Website Agent cuts text from any web page, and Rss Agent taps into a newsfeed. Finding the right type for a desired action is not easy, because Huginn does not sort the list alphabetically. You can use the input field to search for the name or activity of the targeted service.

After selecting the type of agent, a description of the agent appears in the gray box on the right (Figure 6). On the left side, you add a short description in the Name box. Schedule determines when, or at what intervals, the agent does its work. Huginn remembers generated events forever. If there are many, it can quickly use up the available disk space. An agent can thus delete older events, if necessary. The setting under Keep events determines when this happens. Make sure not to choose too short a time interval, because this is the only way to give other agents a chance to keep processing events.

Figure 6: Creating an agent that taps into the RSS feed provided by Linux Magazine online.

If an agent adopts another agent's data, you need to specify that agent as a data source below Sources. Similarly, Receivers lists the agents to which this agent passes its data.

To select an agent, click on a free area of the input field and select a suitable one. The list only offers you the existing agents. If you want to add additional agents as sources or receivers, click again in an empty area of the input field. Checking the box next to an agent's name lets you remove it. Along the same principle, you can choose the desired scenario under Scenarios. In case of doubt, just go for default-scenario.

The settings under Options are a function of the respective agent. For example, if the agent taps into an RSS feed, the url field shows the Internet address of the feed. To change a setting, simply click on its value. The text in the gray box on the right-hand side explains the meaning of the individual settings. You will find some other settings that go beyond those in Options. Pressing the small plus symbol adds a new setting.

Clicking Dry Run checks whether the agent works as desired. The generated event appears in a new window, and the Logs tab collects any errors that occur. If everything works in the dry run, a click on Save finally creates the agent. If something goes wrong in spite of a successful dry run, you can edit the settings retroactively by selecting Agents (Figure 7) and then Actions | Edit Agent.

Figure 7: If you click on the name of an agent under Agents, you are treated to an overview of all your settings and the generated events.

Pass Through

Events generated by RSS agents and others typically contain unformatted raw data. The EventFormattingAgent promises to beautify this data, but first it has to learn how the events it is supposed to process appear. To do this, go to the list to the right of Events and perform a Dry Run as described before. To experiment, you can trigger the latest run by pressing the Actions button in Agents at any time. For example, the Rss Agent produces an event like that in Listing 7 (excerpt) for each message in the RSS feed.

Listing 7: Event from an RSS Feed

[...]
{
    [...]
    "url": "http://www.linux-magazine.com/NEWS/Gartner-worldwide-server_sales_drop",
    "links": [
        {
            "href": "http://www.linux-magazine.com/NEWS/Gartner-worldwide-server_sales_drop"
        }
    ],
    "title": "Gartner: Worldwide server sales drop",
    "content": "        <p>\n\tAccording to Gartner, fewer servers were sold in the first quarter of 2017 than in the previous year. Bucking the trend, two Chinese manufacturers were able to substantially boost the number of units sold.    <\/p>",
    [...]
}
[...]

With this knowledge in mind, create a new EventFormattingAgent – I am using the RSS Agent as an example. The idea is for the formatted information from the RSS feed to appear in the afternoon email.

The Afternoon Digest Agent would be the right choice under Receivers. The EventFormattingAgent later outputs the text stored after "message": in the Options box. You need to paste the information from the event into this text box using placeholders. Each placeholder uses the same name as the matching information in the event, but you need to wrap this name in double curly brackets.

In the example, the agent would swap the {{title}} placeholder with the Gartner: Worldwide server sales drop text. In other words, if you add the text Linux Magazine reports: {{title}} in the message line, the email sent in the afternoon reads Linux Magazine reports: Gartner: Worldwide server sales drop.
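To make the substitution concrete, the following shell sketch imitates it for a single field. It is not how Huginn works internally: render_message is a hypothetical helper that merely replaces every {{title}} in a template with a given string.

```shell
# render_message TEMPLATE TITLE: print TEMPLATE with each {{title}}
# placeholder replaced by TITLE. (Hypothetical helper; it only imitates
# the substitution for one field.)
render_message() (
  template=$1 title=$2
  out=
  while :; do
    case $template in
      *'{{title}}'*)
        # Append the text before the first placeholder, then the title.
        out=$out${template%%'{{title}}'*}$title
        # Continue with the text after that placeholder.
        template=${template#*'{{title}}'}
        ;;
      *) break ;;
    esac
  done
  printf '%s\n' "$out$template"
)

render_message 'Linux Magazine reports: {{title}}' \
  'Gartner: Worldwide server sales drop'
# prints: Linux Magazine reports: Gartner: Worldwide server sales drop
```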

After successfully creating the agent, ensure that the Rss Agent delivers its data exclusively to the EventFormattingAgent. If several agents are linked, it takes a while for Huginn to activate them. By default, the software forwards events once a minute.

Following the same principle, you can set up any number of additional agents and link them. The so-called trigger agent is particularly useful: It executes a freely selectable action as soon as a user-defined event occurs. Other interesting receivers besides the morning and afternoon digests include the Shell Command Agent, which launches a command-line command, and the Twitter Publish Agent, which outputs a tweet.

That said, some agents only work if you modify the accompanying .env configuration file up front – including, for example, all the Twitter agents. To get these up and running, you first need to register a new app on Twitter and add the credentials for the app to the .env file after TWITTER_OAUTH_KEY= and TWITTER_OAUTH_SECRET=. By the way, if you are familiar with Ruby, you can add your own custom-made agents to the existing collection [9].

Conclusions

Make sure you schedule sufficient time for installing and setting up Huginn. The "documentation" is pretty much an incomplete and partly obsolete wiki [10]. However, once the software is running, it reliably aggregates information and triggers actions, and the data remains on your own server.