Although there are a ton of resources on the internet about what Puppet is and about Puppet best practices, I’ve noticed that few of them simply detail what you need to do to bootstrap a barebones Puppet setup.

What is Puppet

A quick intro in case you need it:

Puppet is a configuration management tool. You write out what the server should look like/be configured as, and Puppet applies those configurations for you. More info can be found on Puppet’s own site.

Master vs Masterless

An important decision to make early on is whether you’re going to use a Puppet master to compile & distribute your catalogs, or compile each catalog locally on the node. Each approach comes with its benefits but also some downsides. For example, choosing to use a Puppet master allows you to use exported resources & mcollective to maintain a central source of truth of what exactly is in your fleet, but it also introduces a single point of failure (Puppet master goes down = no Puppet updates for you).

For now, let’s assume that each node is being administered through a local Puppet apply.

Getting started

Profiles and Roles

Before we go any further, a design pattern that I’ll be referring to quite often is the Roles and Profiles design pattern. Essentially, it’s a method of classifying nodes by assigning each one exactly one role, which describes its task/purpose. Each role is then composed of profiles, each of which represents a logical system/tech stack that belongs on that node.

An example that’s thrown around a lot is Jenkins.

The role would define the node as a Jenkins Master/Slave (role::jenkins_master)
The profiles attached could include profile::base, profile::jenkins, profile::kubernetes_worker, profile::nginx_frontrunner
These profiles would then call component classes which can then manage Jenkins, Kubernetes and nginx in their own respective manner.
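As a rough sketch of how that hangs together in code (the class bodies here are my own assumption, and the jenkins component class in particular is hypothetical, standing in for whichever Jenkins module you actually use):

```puppet
# Role: exactly one per node, and it does nothing but include profiles
class role::jenkins_master {
  include profile::base
  include profile::jenkins
  include profile::kubernetes_worker
  include profile::nginx_frontrunner
}

# Profile: wraps the component class(es) for one tech stack
class profile::jenkins {
  # Hypothetical component class provided by an external Jenkins module
  include jenkins
}
```

In a real tree each class would sit in its own file so the autoloader can find it, e.g. modules/role/manifests/jenkins_master.pp and modules/profile/manifests/jenkins.pp.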

More info is again available on Puppet’s own site.

Directory structure

Imagine that we have a basic Puppet setup like so:

puppet/
├── Puppetfile
├── environment.conf
├── forge
├── hiera
│   ├── common.yaml
│   ├── nodes
│   ├── os
│   ├── region
│   └── roles
├── hiera.yaml
├── manifests
│   └── site.pp
└── modules
    └── base
        ├── facts.d
        ├── files
        ├── lib
        │   └── facter
        ├── manifests
        │   ├── config.pp
        │   ├── init.pp
        │   ├── install.pp
        │   └── service.pp
        └── templates

15 directories, 9 files

Let’s go over what each of these folders and files does!

Puppetfile

The Puppetfile contains a list of modules you want to pull in externally. These come from the Puppet Forge by default, but you can specify git or svn sources if you wish. A sample of what you might see in a Puppetfile:

moduledir 'forge' # Folder to clone the below modules to

mod 'puppetlabs-stdlib', '5.0.0' # Pulls in a module from Puppet Forge

mod 'my-custom-module', # Pulls in a module from a git source @ tag 0.0.1
    :git => 'https://github.com/username/repo',
    :tag => '0.0.1'

The file is pretty straightforward and shouldn’t contain anything to trip you up. If you need some documentation the Puppetfile documentation should point you in the right direction.

environment.conf

This configuration file tells Puppet what to look for within this particular environment, and where to look for it (note: we’ll cover what an environment is later). For now, all that we care about is the modulepath. The modulepath is similar to the POSIX PATH variable in that it lists the directories, in order, where Puppet should look for modules and their classes. An example environment.conf:

# Each environment can have an environment.conf file. Its settings will only
# affect its own environment. See docs for more info:
# https://docs.puppetlabs.com/puppet/latest/reference/config_file_environment.html

# Any unspecified settings use default values; some of those defaults are based
# on puppet.conf settings.

# If these settings include relative file paths, they'll be resolved relative to
# this environment's directory.

modulepath = ./modules:./forge:$basemodulepath
# manifest = (default_manifest from puppet.conf, which defaults to ./manifests)
# config_version = (no script; Puppet will use the time the catalog was compiled)
# environment_timeout = (environment_timeout from puppet.conf, which defaults to 0)
    # Note: unless you have a specific reason, we recommend only setting
    # environment_timeout in puppet.conf.

The environment.conf above is very bare-bones, but it’s all you need for this setup. When resolving classes, it tells Puppet to look first in ./modules, then in ./forge, and finally in the default path set within Puppet.

forge/

This directory is where externally downloaded modules go (i.e. modules listed within your Puppetfile).

hiera/

This is where your hiera .yaml data files live. A common design pattern is shown above, where your structured data is either applied globally or applied only to nodes belonging to a particular OS, role, etc.

hiera.yaml

Contains the definition of how Puppet should look data up. A sample configuration:

---
version: 5
defaults:
  datadir: hiera
  data_hash: yaml_data

hierarchy:
  - name: "Per-node data"
    path: "nodes/%{trusted.certname}.yaml"
  - name: "Per-role data"
    path: "roles/%{facts.role}.yaml"
  - name: "Per-OS defaults"
    path: "os/%{facts.os.family}.yaml"
  - name: "Per-region defaults"
    path: "region/%{facts.region}.yaml"
  - name: "Common data"
    path: "common.yaml"

The above config instructs Puppet to resolve hiera queries by checking the following sources in order, with values from sources earlier in the list taking precedence:

Node specific hiera configuration
Role specific hiera configuration
OS specific hiera configuration
Region specific hiera configuration (this may not be relevant to most configurations; it only makes sense if you can segregate your infrastructure into logical regions, e.g. countries or backbones)
Global hiera data
manifests/

Contains the main manifests you’ll use to call Puppet. For this overview, we’ll have a single file: site.pp. It’ll serve as our entrypoint and call the relevant role class. An example site.pp file:

# site.pp
# =========
#
# Entrypoint into Puppet
#   - Determines which role the current node is and
#     from this configures it appropriately

include ::base

# Include the relevant role (each node must have a role), otherwise warn about something being wrong
if 'role' in $facts {
  class { "::role::${facts['role']}": }
} else {
  notify { "${facts['fqdn']} has no role": }
}

modules/

The main meat of Puppet. As the name suggests, this is where you put your modules. I’ll go into modules in more detail in the next section, including how modules can/should be structured.

Modules

Let’s have a closer look at the base module from before:

base/
├── facts.d
├── files
├── lib
│   └── facter
├── manifests
│   ├── config.pp
│   ├── init.pp
│   ├── install.pp
│   └── service.pp
└── templates

6 directories, 4 files

Facts

The notable directories for facts are facts.d and lib/facter. They exist as mechanisms that allow you to add custom facts to your Puppet installation.

facts.d is the preferred method of adding facts, and allows you to create facts with pretty much any interpreter as long as the shebang is valid. These facts are known as external facts.

lib/facter is used to store traditional custom facts written in Ruby. As far as I can see, unless your facts are written in Ruby and require some functionality only available to these types of facts, there is no real benefit to using them. These facts are known as custom facts (shocking, I know).

As always, more information is available on Puppet’s official documentation.

External fact example

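#!/usr/bin/env bash

# Assumes ${HOSTNAME} holds the fully qualified name of the node
REGION="unknown"

# The pattern is left unquoted so that == performs glob matching;
# a quoted pattern would only ever be compared as a literal string
if [[ "${HOSTNAME}" == *.apac.mycompany.tld ]]; then
	REGION="au"
elif [[ "${HOSTNAME}" == *.emea.mycompany.tld ]]; then
	REGION="eu"
fi

# External facts are emitted as key=value pairs on stdout
echo "region=${REGION}"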

NB: Ensure that the external facts that you want to run are marked executable, and that the mount that they’re placed on is not marked noexec. Pluginsync facts are placed in <vardir>/facts.d, so check whichever partition holds Puppet’s <vardir> (by default /opt/puppetlabs/puppet/cache on Puppet 4 and later, and /var/lib/puppet on older versions).

Custom fact example

Facter.add("region") do
  setcode do
    fqdn = Facter.value(:fqdn)
    # Ruby spells "else if" as the single keyword elsif
    if fqdn.end_with?('.apac.mycompany.tld')
      'au'
    elsif fqdn.end_with?('.emea.mycompany.tld')
      'eu'
    else
      'unknown'
    end
  end
end
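Once either version of the fact is in place, it can be read from the $facts hash in any manifest. A minimal sketch, assuming the region fact from the examples above (the timezone mapping is purely illustrative and not something you’d necessarily want):

```puppet
# Pick a timezone based on the region fact; nodes that report 'unknown'
# (or have no region fact at all) fall through to the default branch.
$timezone = $facts['region'] ? {
  'au'    => 'Australia/Sydney',
  'eu'    => 'Europe/London',
  default => 'Etc/UTC',
}

file { '/etc/localtime':
  ensure => 'link',
  target => "/usr/share/zoneinfo/${timezone}",
}
```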

Manifests

This is where the fun (or suffering) happens! Within each module, there are usually at least these four manifests:

init.pp
service.pp
config.pp
install.pp

init.pp

This manifest is responsible for orchestrating the module, and serves as an entry point into it. It describes how the module should behave at a higher level view and defines what parameters this module should take.

# base::init
# ==========
#
# This class defines what each server should have installed, hereby called the 'base' of the server.
#
class base (
  # This class takes no parameters.
) {
  contain ::base::config
  contain ::base::install
  contain ::base::service
  
  # Ensure that install happens first, then config, then service
  # Also ensure that if config changes, the service is notified
  Class['base::install']
  -> Class['base::config']
  ~> Class['base::service']
}

NB: If you look at puppetlabs modules, you’ll see that instead of defining parameters in init.pp, they actually define them in a separate file called params.pp. I choose not to go into this pattern just yet.

install.pp

This manifest is responsible for installing whatever software is relevant to the module. For the sake of demonstration, let’s say each system should come with vim installed by default, since it’s the only real text editor.

# base::install
# =============
#
class base::install (
  # This class takes no parameters.
) {
  package { 'vim-enhanced':
    ensure => 'latest',
  }
}

config.pp

This manifest is responsible for installing the configuration relevant to the software in the module. Continuing with the vim example, let’s populate the systemwide vim configuration.

# base::config
# ============
#
class base::config (
  # This class takes no parameters.
) {
  file { '/etc/vimrc':
    ensure => 'file',
    owner  => 'root',
    group  => 'root',
    mode   => '0644',
    source => 'puppet:///modules/base/vimrc',
  }
}

The code above pretty much just copies across the vimrc file contained within the module’s files directory to /etc/vimrc.

service.pp

This manifest is responsible for ensuring any relevant services are started/stopped. Since our example of vim doesn’t really fit in here, I’m not quite sure what to use as an example.
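Purely for illustration, one option is a time sync daemon, since that’s something you’d plausibly want on every server. This is a sketch under the assumption that chronyd is already installed on the node (nothing in the install.pp above takes care of that), so swap in whichever service actually fits your base:

```puppet
# base::service
# =============
#
class base::service (
  # This class takes no parameters.
) {
  # Assumption: the chronyd service exists on this node
  service { 'chronyd':
    ensure => 'running', # start it if it is stopped
    enable => true,      # and make sure it starts on boot
  }
}
```

Because init.pp notifies this class whenever the config class changes, a service declared here would also be restarted on config changes.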

Files and Templates

This part is pretty straightforward actually, for files at least. Any files you drop into a module’s files directory will be accessible to you from a file resource, as shown above.

With templates, it’s a little trickier. Templates are sourced from the templates directory, and can be written in either erb or epp. epp templates are written using the Puppet language itself, while erb is just traditional Ruby templating.
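As a small taste of what that looks like, this is roughly how the file resource from config.pp might change if it rendered a template instead of copying a static file. The vimrc.epp template is hypothetical and would need to exist at modules/base/templates/vimrc.epp:

```puppet
file { '/etc/vimrc':
  ensure  => 'file',
  owner   => 'root',
  group   => 'root',
  mode    => '0644',
  # 'base/vimrc.epp' resolves to modules/base/templates/vimrc.epp;
  # content replaces the source attribute used for static files
  content => epp('base/vimrc.epp'),
}
```

For an erb template you’d call template('base/vimrc.erb') in the same spot instead.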

We’ll cover templates in Part II, since this is getting a little lengthy.

Conclusion

With the contrived examples above, hopefully you have a better understanding of Puppet and what you need to actually create a barebones setup. The base module as it is right now isn’t too useful (unless you want vim on all your systems), but we’ll go over a more useful example in the next part.

Bootstrapping your Puppet: Part I

21 November 2018