
Protecting the production environment

Methuselah

Puppet, the ancient rock of configuration management, is not easy to learn, but it rewards admins who are willing to tackle the learning curve with flexibility and security. By Lennart Betz

Puppet is the Methuselah among solutions for configuration management, matured for a proud 15 years and currently at version 7. In contrast to Ansible, Puppet takes a declarative approach (i.e., it describes the state of a resource and not how to achieve it).

Listing 1 declares the kermit account, which must exist and must belong to the muppets primary group. A gonzo user must not exist at the same time. Puppet must therefore be able to determine the current state and independently change it to the declared, desired state.

Listing 1: Resource Declarations

user { 'kermit':
 ensure => present,
 gid    => 'muppets',
}
user { 'gonzo':
 ensure => absent,
}
group { 'muppets':
 ensure => present,
}

Resource Abstraction Layer

A major role in how Puppet accomplishes this task is played by the resource abstraction layer (RAL). This core element in Puppet is also responsible for platform independence. To do this, the RAL distinguishes between types and providers. A type defines the properties of a resource, such as a user. These properties include parameters such as gid, home, or shell. Each type must have at least one provider that describes how the current state is determined and how the desired state can be achieved. The provider attribute behaves like a metaparameter in that it is always available with every resource.

Figure 1 also shows a package type, which is used to take care of various software packages. If more than one provider is assigned to a type, there is always a default provider, which can differ depending on the platform. For example, on Red Hat-based systems, the Yum package manager is the default for package, whereas Debian derivatives use Apt instead. If you also want to manage Ruby gems, you need to specify the provider explicitly.
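As a minimal sketch of an explicit provider choice, the following declarations install one package with the platform default and one as a Ruby gem. Both package names are placeholders chosen for illustration and do not come from the article.

```puppet
# Uses the default provider of the platform (e.g., yum or apt).
package { 'git':
  ensure => installed,
}

# The provider attribute overrides the default; 'r10k' is only an
# example gem name.
package { 'r10k':
  ensure   => installed,
  provider => 'gem',
}
```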

Figure 1: Architecture of the Puppet resource abstraction layer (RAL).

Providers always use the standard system tools for their work. To manage a service on RHEL version 7 or higher, Puppet uses systemctl; for users, it relies on the useradd, usermod, and userdel tools. Puppet's behavior always adapts internally to match the underlying system. Because you work at a level that abstracts this complexity, you do not need to adjust your code, but you do need to check how a system that is new to you reacts to the various requirements.

Develop and Test

Once the Puppet agent is installed on the target workstation (preferably in a virtual machine), you need to test the code – the manifest in Puppet-speak – with the command:

$ sudo puppet apply ./test.pp

Vagrant [1] has proven its value as a management framework for virtual machines (VMs). In addition to fast provisioning of test VMs at the command line, it offers an uncomplicated option for mounting local directories as a filesystem on the VM, which means you can work on your own workstation with your favorite development tools and quickly test the manifest with puppet apply.

Dependencies

In the context of resources, the order of processing is of particular importance in Puppet. A second look at Listing 1 raises the issue of why the muppets group is at the end, although the kermit declaration refers to this group earlier on. Also, although I just talked about Puppet running useradd on Linux, if you have ever tried to specify a group that does not exist on the system with the -g switch when creating a user, you will know that the attempt throws an error.

Puppet does not process resources according to their order in the manifest but determines the sequence itself. In doing so, dependencies between certain resource types are implicit. In this specific case, this means that the muppets group must be processed before the kermit user, which requires you to manage both the group and the user. Failure to do so will also cause Puppet to throw an error. Another example is the relationship between a file and the directory in which it resides. If both are in the manifest, Puppet always takes care of the directory first.
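The implicit dependency from Listing 1 can also be written out by hand. The following sketch is equivalent to what Puppet derives on its own when both resources are managed; the require line is redundant here and only serves to make the relationship visible.

```puppet
group { 'muppets':
  ensure => present,
}

user { 'kermit':
  ensure  => present,
  gid     => 'muppets',
  # Redundant in this case: Puppet adds this relationship automatically
  # because the user and its primary group are both in the manifest.
  require => Group['muppets'],
}
```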

In addition to implicit dependencies, there are also dependencies that Puppet cannot detect. A typical example in the Unix environment is the sequence for setting up a service. You have to (1) install the package, (2) adjust its configuration file, and (3) start the associated service. If step 3 is done before step 2, the daemon will run, but not with the desired configuration. Starting with step 3 results in a fatal error. Both scenarios could be fixed with a second Puppet run, but this contradicts the paradigm of idempotency: The state after one run has to be the same as after any number of runs.

The before and notify metaparameters ensure that Puppet applies the resource that carries them before the referenced resources. In addition, you can use notify to cause the service to restart if changes are made to the configuration file, as shown in Listing 2. Conversely, require and subscribe work in the opposite direction: The resource that carries them is applied after the referenced resources. The latter also detects changes to the referenced resource and triggers a restart of its own resource.

Listing 2: Puppet apache Class

class apache(
  String                  $package_name,
  Stdlib::Absolutepath    $config_file,
  String                  $service_name,
  Stdlib::Ensure::Service $ensure = 'running',
  Boolean                 $enable = true,
) {
  package { $package_name:
    ensure => installed,
    before => File[$config_file],
  }
  file { $config_file:
    ensure  => file,
    content => template('apache/httpd.conf'),
    notify  => Service[$service_name],
  }
  service { $service_name:
    ensure => $ensure,
    enable => $enable,
  }
}
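For comparison, the same relationships as in Listing 2 can be expressed from the other side with require and subscribe. This is only a sketch with the Red Hat values from Listing 3 hard-coded; it assumes the apache/httpd.conf template from Listing 2 exists.

```puppet
package { 'httpd':
  ensure => installed,
}

file { '/etc/httpd/httpd.conf':
  ensure  => file,
  content => template('apache/httpd.conf'),
  # Same ordering as "before" on the package, seen from the file.
  require => Package['httpd'],
}

service { 'httpd':
  ensure    => running,
  enable    => true,
  # Same effect as "notify" on the file: restart after a change.
  subscribe => File['/etc/httpd/httpd.conf'],
}
```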

Not every resource type offers a restart feature. The resource types that do offer one include service and exec. The exec type lets you run arbitrary commands or scripts and offers parameters that help to maintain idempotency.
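The following sketch shows two of the exec parameters that help keep a command idempotent: refreshonly and creates. The file names, the snippet content, and the commands are assumptions made for illustration, and the directories involved are assumed to exist.

```puppet
# Hypothetical configuration snippet that triggers the check below.
file { '/etc/httpd/conf.d/example.conf':
  ensure  => file,
  content => "ServerTokens Prod\n",
  notify  => Exec['apache-configtest'],
}

# refreshonly: the command runs only when another resource notifies it.
exec { 'apache-configtest':
  command     => '/usr/sbin/apachectl configtest',
  refreshonly => true,
}

# creates: the command is skipped as soon as the named file exists.
exec { 'create-dhparams':
  command => '/usr/bin/openssl dhparam -out /etc/pki/tls/dhparams.pem 2048',
  creates => '/etc/pki/tls/dhparams.pem',
}
```

The unless and onlyif parameters serve the same purpose by making the run depend on the exit code of a test command.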

Templates, Functions, Classes

One of the most important operations is managing and manipulating configuration files. Puppet, written in Ruby, makes use of the Ruby template engine [2] to assemble the contents of files dynamically. You call it from within Puppet with the template function. Puppet comes with a variety of functions out of the box, but it can also be extended with user-defined functions.
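A user-defined function can be written in the Puppet language itself. The following sketch assumes a hypothetical helper named apache::listen_line, stored as functions/listen_line.pp in the apache module; neither the name nor the purpose comes from the article.

```puppet
# apache/functions/listen_line.pp (hypothetical helper)
# Takes a valid TCP port and returns a "Listen" directive as a string.
function apache::listen_line(Integer[1, 65535] $port) >> String {
  "Listen ${port}"
}

# Usage elsewhere in the module:
# $line = apache::listen_line(8080)
```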

Listing 2 demonstrates yet another important construct, the class. A Puppet class groups multiple resources, which are then applied jointly. A class can be used only once per system or node. If, on the other hand, you need to apply the same group of resources several times on one node, you can create a defined resource type for this purpose. In both cases, the distinction between the definition and declaration is important. Listing 2 shows a class definition that does nothing on its own. Only a declaration like that in Listing 3 leads to the application of the desired actions.
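To illustrate the difference from a class, here is a minimal sketch of a defined resource type. The name apache::vhost, its parameters, and the generated file content are hypothetical; the sketch also assumes that Service['httpd'] is declared elsewhere in the catalog and that the stdlib module is installed.

```puppet
# apache/manifests/vhost.pp (hypothetical defined resource type)
define apache::vhost (
  Stdlib::Absolutepath $docroot,
  Integer              $port = 80,
) {
  file { "/etc/httpd/conf.d/${title}.conf":
    ensure  => file,
    content => "<VirtualHost *:${port}>\n  ServerName ${title}\n  DocumentRoot ${docroot}\n</VirtualHost>\n",
    notify  => Service['httpd'],
  }
}

# Unlike a class, the type can be declared several times per node,
# as long as every title is unique.
apache::vhost { 'www.example.com':
  docroot => '/var/www/www',
}
apache::vhost { 'blog.example.com':
  docroot => '/var/www/blog',
  port    => 8080,
}
```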

Listing 3: Class Declaration

class { 'apache':
 package_name => 'httpd',
 config_file  => '/etc/httpd/httpd.conf',
 service_name => 'httpd',
}

Variables

Classes and defined resources can be parameterized by variables, which always use a $ as a prefix. The = operator assigns a value to a variable. In a parameter list, the equals sign sets a default. Variables, as shown in the example, contribute to platform independence. Paths and names differ from distribution to distribution. Although the scope of a variable is limited to its class, it is possible to access a variable of another class if necessary (Listing 4, line 15).

Listing 4: Multiple Classes

01 # ./apache/manifests/init.pp
02 class apache(
03   String $package_name,
04 ) {
05   class { 'apache::install': }
06   -> class { 'apache::config': }
07   ~> class { 'apache::service': }
08   contain apache::install
09   contain apache::config
10   contain apache::service
11 }
12
13 # ./apache/manifests/install.pp
14 class apache::install {
15   $package_name = $apache::package_name
16
17   if $facts['os']['family'] == 'RedHat' {
18     package { 'mod_ssl':
19       ensure => installed,
20     }
21   }
22
23   package { $package_name:
24     ensure => installed,
25   }
26 }

In the context of variables, Puppet also provides conditions such as if-else and case to control the flow depending on the value of a variable. For example, on a Red Hat system, for HTTPS on the Apache web server, you also need the RPM package mod_ssl (Listing 4, lines 17-21), which other distributions do not require.
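A case statement covers several platforms at once. The following sketch reuses the package names from the listings in this article; the variable names and the error message are assumptions.

```puppet
case $facts['os']['family'] {
  'RedHat': {
    $package_name   = 'httpd'
    $extra_packages = ['mod_ssl']
  }
  'Debian': {
    $package_name   = 'apache2'
    $extra_packages = []
  }
  default: {
    # Abort catalog compilation on platforms that are not covered.
    fail("Unsupported OS family: ${facts['os']['family']}")
  }
}

# Arrays can be concatenated with +; an array title declares
# one package resource per element.
$packages = [$package_name] + $extra_packages

package { $packages:
  ensure => installed,
}
```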

Since Puppet 4, you can and should declare data types for parameters to eliminate the need for validations and to standardize error messages. Two of the types used in Listing 2, for $config_file and $ensure, are not included in Puppet, but are provided by the standard library (stdlib) module, which extends the language scope.

Modules

In addition to data types, modules also include functions, classes, defined resources, templates, and much more, all of which must be located in certain subdirectories of the module directory [3] so that Puppet always finds them (autoloading).

Put simply, a module is a collection of all the resources needed to configure a piece of software. Puppet Forge [4] serves as a community portal for modules. In addition to many modules from hobbyists, a large number of professionally maintained modules can be found here, some from Puppet itself (e.g., for Apache) or from the Vox Pupuli project [5].

Each module has its own namespace, which always corresponds to its name. Class or defined resource names start with the module name; the same applies to data types (see Listing 2). Listing 4 shows the approach to further subdivide a class in a module by dividing the tasks into installation, configuration, and service.

The arrows between the class declarations are an alternative notation for setting the order, wherein -> corresponds to a before and ~> to a notify. Therefore, every resource of one class is processed before those of the next. The reverse notation also exists (<- and <~). Any resource that needs to trigger a service restart when a change is made goes into the apache::config class, and only the resource affecting the service goes into apache::service, after apache::install. Now you only have to consider the dependencies of resources within the respective class.

The contain statement, unlike include, declares classes whose resources also belong to the parent class. It is the only way to make a dependency defined on the parent class (e.g., on the apache class) meaningful, because the dependency then also covers the contained classes [6]. Because include and contain are idempotent, a class can be declared with class and then again with include or contain, but not vice versa.

As you will see later, these modules are a major strength of Puppet. First, however, look at how Puppet communicates with the systems to be configured in the first place and decides which resources belong to which host with which parameters.

The Configuration Cycle

The controlling centralized server is called the Puppet master. On each of the servers to be configured (the nodes), you then install a Puppet agent, which runs as a daemon and connects to the master every 30 minutes (configurable). This TLS-encrypted connection is authenticated by a certificate on both sides. The master usually also assumes the certification authority (CA) role for certificate management.

After the connection is established, the agent transmits the facts it has determined to the server (Figure 2), which are then available there as variables in a scope superordinate to the classes, the top scope. Facts contain important information about the type and version of the operating system (e.g., Listing 4, line 17), the hardware, mounted filesystems, or the content of custom facts.

Figure 2: The data flow while processing a Puppet run.

The server uses the facts to determine the manifest for the querying node. This information can reside on the server as:

node "certname.node.one" {
 include role::cms
}
node "certname.node.two" {
 include role::webserver
}

In most cases, though, it is obtained over an interface from an external node classifier (ENC) such as Foreman or a configuration management database (CMDB). Puppet compiles the manifest of classes, variables, templates, and resources determined in this way on the server to create a catalog. The catalog no longer contains classes, templates, or variables that still need to be evaluated, only the resulting resources, and is sent back to the agent.

The agent receives the desired state for its node in the form of the catalog, uses the RAL to check the current state of resources contained in the catalog, and if necessary, converts them to the desired state. Finally, the agent sends the log of this Puppet run as a report to the master, which passes it on with a corresponding handler. Possible destinations for a report include logfiles, PuppetDB, or Foreman.

Maintaining Your Code

Your code, which will usually be a mixture of your own modules and many upstream modules from Puppet Forge, is best maintained in a Git repository. The default is to run a control repository that stores a list of modules (in the Puppetfile file) along with the module versions to use. These modules can come from the Forge or from other Git repositories.

Self-written or patched upstream modules are usually maintained on the same Git server, each as a separate repository with its own versions. If you don't use an ENC, the manifests/site.pp file in the control repository contains the declaration of the individual nodes, as shown above.

In a branch of the control repository, you can store other module versions, such as newer, untested versions. When you push to one of the branches, the r10k software [7] (in the puppet-bolt package) assembles a separate Puppet environment on the server from each branch and the module information it contains. The ENC controls which environment the agent requests. In this way, any number of different test scenarios can be set up and tested extensively before being transferred to production. A push into a Git repository belonging to a module also triggers an r10k call, which then only updates the version of that module – but in all environments. Module versions are typically referenced as tags, branches, or commits.

The integration of r10k with Git and the necessary hooks are not part of the open source variant of Puppet but can be easily replicated by experienced admins.

Hiera: Separating Code and Data

Hiera is also usually maintained directly in the control repo. This hierarchical key-value store queries its dataset according to search parameters. The idea is to determine different values for $package_name, $config_file, and $service_name on different platforms (e.g., for the code in Listing 2), depending on facts submitted by the agent, such as the operating system.

To do this, first store a hierarchy in the hiera.yaml file in the control repository (Listing 5). In the simplest case, the data is also in YAML format in the control repository – in ./data/. Puppet's automatic parameter lookup feature automatically performs a Hiera lookup for values for its parameters each time a class is declared. The names of the keys in Hiera must match the namespace of the respective class (Listing 6).

Listing 5: Defining a Hiera Hierarchy

version: 5
defaults:
  datadir: data
  data_hash: yaml_data
hierarchy:
  - name: 'Node specific'
    path: 'nodes/%{trusted.certname}.yaml'
  - name: 'Operating System Family'
    path: '%{facts.os.family}.yaml'
  - name: 'common'
    path: 'common.yaml'

Listing 6: Hiera Files for Red Hat and Debian

$ cat ./data/RedHat.yaml
apache::package_name: httpd
apache::config_file: /etc/httpd/httpd.conf
apache::service_name: httpd
$ cat ./data/Debian.yaml
apache::package_name: apache2
apache::config_file: /etc/apache2/apache2.conf
apache::service_name: apache2

Because a value can also be assigned in a class declaration or set as a default, the question arises as to the order of evaluation. The default is the weakest link in the chain and applies only if neither an explicit declaration was made nor a Hiera lookup returns a value. The assignment in a class declaration is strongest, leaving the place in the middle for the automatic parameter lookup.

The automatic parameter lookup allows a class to be declared with include or contain (e.g., with include apache), as long as the lookup provides a value for every parameter that has no default. Although it can override defaults, it does not necessarily have to. Hiera can also be used within a module, but there it only supplies values for keys in the namespace of the module itself. All values there can be overwritten again in the Hiera environment of the control repository if necessary.
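The order of evaluation can be sketched as follows, assuming the Hiera data from Listing 6 is in place. The package name httpd24 in the commented-out declaration is hypothetical and only shows how an explicit assignment would win over Hiera.

```puppet
# All three parameters come from Hiera through automatic parameter
# lookup (keys apache::package_name, apache::config_file, and
# apache::service_name).
include apache

# Alternative to the include above, not in addition to it: a value set
# in the declaration beats both Hiera data and defaults, while the
# remaining parameters are still looked up in Hiera.
# class { 'apache':
#   package_name => 'httpd24',  # hypothetical package name
# }

# An explicit lookup returns the value Hiera supplies for a key.
$pkg = lookup('apache::package_name', String)
notice("Apache package on this node: ${pkg}")
```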

More Dependencies

Some dependencies between resources are not limited to just one host. A web cluster with an upstream load balancer can only accept a new cluster node once it has been configured. Puppet supports exported resources for such a case. Cluster membership is declared as such an exported resource for the new cluster node, but instead of applying it there, the master stores it in PuppetDB.

During the next Puppet run on the load balancer, all cluster members, including the new one, are entered into the configuration there and the cluster then has a new, functional member. In the manifest for the load balancer, this is done by a collector that queries PuppetDB on the server for appropriate resources and passes them to the agent in the catalog. The otherwise optional PuppetDB is therefore a mandatory requirement for exported resources.
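In code, exporting and collecting could look like the following sketch. The defined type profile::lb_member, its parameters, the tag, and the target directory (which is assumed to exist) are all hypothetical; a real setup would more likely use the matching type from a load balancer module.

```puppet
# Hypothetical defined type that describes one back end.
define profile::lb_member (
  String  $address,
  Integer $port = 80,
) {
  file { "/etc/lb/members.d/${title}.conf":
    ensure  => file,
    content => "server ${title} ${address}:${port}\n",
  }
}

# On every web node: "@@" exports the resource to PuppetDB instead
# of applying it locally.
@@profile::lb_member { $facts['networking']['fqdn']:
  address => $facts['networking']['ip'],
  tag     => 'web_cluster',
}

# On the load balancer: the collector adds all exported members
# with a matching tag to the catalog.
Profile::Lb_member <<| tag == 'web_cluster' |>>
```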

Roles and Profiles

Dependencies, especially those limited to one host, quickly led to problems in the early days of Puppet. The desire to maintain an overview and not get tangled up in dependencies led to the concept of roles and profiles.

Modules are divided into three levels. The lowest are the component modules, which generically take care of configuring software. They can be obtained to a large extent from Puppet Forge. Component modules only take care of specific software – the Apache module should only take care of the web server and not handle log rotation or firewall configuration, which are done by separate component modules.

At the second level, the implementation layer, individual classes of a module bring together the required component modules. You have to create the corresponding profile module yourself, in which you merge the configuration for log rotation and the firewall configuration for your own web server, for example.

Classes on both levels can use parameters and be configured by Hiera. In the role, which is the top layer, you only declare profile classes if you follow the strategy. For example, a CMS role combines the profiles for web server, application, and database. Such a class of the role module is also referred to as a business class or role.
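The three levels can be sketched in a few lines. All class names except apache and role::cms are hypothetical and assumed to be defined elsewhere in the profile module; in a real module, each class would also live in its own file.

```puppet
# Profile: brings together the component modules for a web server.
class profile::webserver {
  include apache              # component module
  include profile::logrotate  # hypothetical profile
  include profile::firewall   # hypothetical profile
}

# Role: declares profiles only.
class role::cms {
  include profile::webserver
  include profile::application  # hypothetical profile
  include profile::database     # hypothetical profile
}
```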

This abstraction drastically reduces dependencies between resources that need to be defined, and reported dependency cycles occur far less frequently.

What Puppet Can't Do

Puppet is designed for configuration management, not for managing or updating software. Resources of type package can be pinned to a specific version of the package. Because that version could change in the course of time, pinning should be avoided if possible. Besides the inevitable problems with package managers, version jumps cause difficulties with downgrades.

If something like this is allowed to happen, you might find Puppet attempting a downgrade during the next general system update. The clear recommendation is to establish some kind of software management, such as Spacewalk, Satellite, or Foreman/Katello, to provide different software states in different versions of the software repositories.

Because Puppet uses an asynchronous approach, it is also not suitable as an orchestration tool (e.g., in contrast to Ansible) to execute tasks simultaneously. One typical example is updates or reconfigurations on cluster nodes, for which Ansible is better suited. However, if you are already using Puppet, it is worth taking a look at Puppet Bolt [8], Puppet's orchestration tool. Bolt works with tasks that can be written in any language, including Puppet, in which case you benefit from the RAL. In this way, the same code can be used across platforms.
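How Puppet code and the RAL come into play in Bolt can be sketched with a plan. The plan name, the module it lives in, and the httpd service are assumptions; apply_prep, apply, and run_command are plan functions provided by Bolt.

```puppet
# profile/plans/ensure_web.pp (hypothetical plan)
plan profile::ensure_web (
  TargetSpec $targets,
) {
  # Makes sure the Puppet agent is available on the targets
  # and collects their facts.
  apply_prep($targets)

  # The block is compiled into a catalog and applied on the targets,
  # so ordinary resources and their providers are used.
  apply($targets) {
    service { 'httpd':
      ensure => running,
      enable => true,
    }
  }

  # Ad hoc command on all targets; its result is returned by the plan.
  return run_command('systemctl is-active httpd', $targets)
}
```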

Infrastructure and Scaling

Puppet requires a not inconsiderable infrastructure for professional use. You need the Puppet server including a CA, a controlling Git with r10k, optionally some kind of software management, and a GUI for reporting. Additionally, you implement PuppetDB as an application with a PostgreSQL database if you need exported resources.

I mostly use Foreman for reporting and as an ENC, extended by the Katello plugin for software management, which also requires a PostgreSQL server. PuppetDB is therefore a good choice and does not cause much additional work. Moreover, you need a Git server with a GUI and integrated issue management, such as GitLab Community Edition.

A Puppet server can serve around 500 hosts. If you have more, Puppet scales very well horizontally: Additional Puppet servers that also compile catalogs can be integrated easily.

Conclusions

Puppet is complex, heavyweight, and not easy to learn. However, once mastered, it proves to be flexible and secure. The many modules maintained on Puppet Forge are a massive advantage. They usually also require a training period but leave hardly any wishes unfulfilled.

Puppet protects your production environment with established processes. However, this also means saying goodbye to the idea that you can quickly write code for it. The code has to be tested conscientiously and transferred to production via staging. Configuration parameters for managed applications do not require such tests and can be easily adapted in Hiera.