Install PuppetDB on CentOS 7

What PuppetDB can do in your architecture is unbelievable… Yeah!! I started learning Puppet a few months ago and it’s quite interesting, and the most interesting part is PuppetDB. It’s a Puppet component that stores details about your agent nodes; we will cover that later. In this article I would like to walk through how PuppetDB works, from installation to use cases in different scenarios. Should be interesting for you guys. Here we go!!

In this blog, I will discuss / explain the following things:

What is Puppet? Why do we need Puppet?
How does Puppet work?
What other configuration management tools are available in the industry?
Advantages of Puppet over other available tools.
What is PuppetDB?
Why PuppetDB?
Install and configure PuppetDB on CentOS 7 Puppet master server.
Use cases.

Jump directly to installation steps –> Install and configure PuppetDB on CentOS 7 Puppet master server.

1. What is Puppet? Why do we need Puppet?

Puppet is a great configuration management tool available in the industry nowadays. I hope you already have an idea of what configuration management is. Simply put, it’s a centralised place to manage your configurations. From an architectural point of view, you have a Puppet master server and an agent running on every server whose configuration you want to manage with Puppet. The agent periodically contacts the master server and checks for changes in the configuration associated with its node. If it finds any changes on the master server, it applies them to the agent node.

Here your infrastructure is treated as code: IaC [Infrastructure as Code].

Puppet is a Ruby-based application with its own DSL [Domain Specific Language]. A puppetized infrastructure increases your productivity, saves time, and deploys changes accurately and without delay. Consider a scenario: you have 100+ LAMP servers that all use the same Apache configuration. Now you need to harden Apache on all of them because of some security patches. What will you do in this case? You need to SSH into every LAMP server, edit the Apache configuration file, and reload/restart the daemon. Dude, you will be screwed!!

If you have a centralised configuration management tool like Puppet, just update the configuration on the Puppet master and go for a beer!! The agents on your node servers will fetch the updated configuration and apply it. Check more details at https://puppet.com/

The main advantage of Puppet is that it scales very well. You can connect anything from a single agent node to 200k-plus agent nodes to your Puppet master server.

2. How does Puppet work?

Explaining the entire working of Puppet in detail would take many more pages. Here I am adding only the basic working principle; I will add more details later. See the image added below:

This is a basic working diagram of Puppet, which we can use to explain the concept simply. It illustrates how an agent node communicates with the Puppet master node and gets information from it. Based on that information, the Puppet agent daemon applies changes to the agent node.

As we know, all agent nodes connect to the master node at a regular interval, which is 30 minutes by default. The first step in this communication is that the agent node sends its facts to the master node. Facts are information about the agent node machine, for example hostname, IP address, architecture, network details, etc.

Based on these facts, the Puppet master identifies the metadata for that particular node and compiles the catalog from that information. This is a wide topic that needs to be discussed in a different thread. Once the catalog is compiled, the master sends it to the agent node.

Then the Puppet agent daemon applies that catalog to the agent node, creates a report, and sends it back to the master server. This is how I can explain the working at a basic level.

3. What other configuration management tools are available in the industry?

We have a separate thread for this discussion. Adding that thread here: Introduction to IAAC [Infrastructure As A Code] tools

Go through the thread above for more details.

4. Advantages of Puppet over other available tools.

Open source.
High performance.
Boosts productivity.
Cross platform.
Puppet uses its own language. It’s clean and simple.
Puppet has an active community.

5. What is PuppetDB?

Hey PuppetDB.. This is the actual topic we are focusing on in this article. PuppetDB is a great feature offered by Puppet Labs. It’s not installed or enabled by default; we need to set it up on the Puppet master server. Considering the use cases, it’s an awesome thing from Puppet Labs. It’s the data warehouse of Puppet.

PuppetDB is the fast, scalable, and reliable data warehouse for Puppet. It caches data generated by Puppet, and gives you advanced features at awesome speed with a powerful API. Using API calls, we can fetch all node facts easily. It’s a great tool for managing and understanding your infrastructure. PuppetDB collects data generated by Puppet. It enables advanced Puppet features like exported resources, and can be the foundation for other applications that use Puppet’s data. Yes, we can fetch the data using PuppetDB and use it in other applications. We will discuss this in more detail in upcoming articles.

What data does PuppetDB collect?

PuppetDB stores:

1. The most recent facts from every node

[Before requesting a catalog (or compiling one with puppet apply), Puppet will collect system information with Facter. Puppet receives this information as facts, which are pre-set variables you can use anywhere in your manifests. Puppet can access the following facts:

Facter’s built-in core facts [https://puppet.com/docs/facter/3.12/core_facts.html]
Any custom facts or external facts present in your modules

You can see the list of core facts to get acquainted with what’s available. You can also run “facter -p” at the command line to see real-life values, or browse facts on node detail pages in the Puppet Enterprise console.]

2. The most recent catalog for every node

[Puppet manifests can use conditional logic to describe many nodes’ configurations at once. Before configuring a node, Puppet compiles manifests into a catalog, which is only valid for a single node and which contains no ambiguous logic. Catalogs are static documents which contain resources and relationships. At various stages of a Puppet run, a catalog will be in memory as a Ruby object, transmitted as JSON, and persisted to disk as YAML.

In the standard agent/master architecture, nodes request catalogs from a Puppet master server, which compiles and serves them to nodes as needed. When running Puppet standalone with Puppet apply, catalogs are compiled locally and applied immediately. Agent nodes cache their most recent catalog. If they request a catalog and the master fails to compile one, they will re-use their cached catalog.]

3. Optionally, 14 days (configurable) of event reports for every node

6. Why PuppetDB?

The section above explains it well. Using different API calls we can fetch the facts of our agent nodes, so the data that Facter collects can be put to effective use.

7. Install and configure PuppetDB on CentOS 7 Puppet master server.

As I mentioned earlier, PuppetDB isn’t installed by default. We need to install and configure it on the Puppet master server. The installation and configuration are very simple. We need to complete the following things:

Install a PostgreSQL database server. If you already have PostgreSQL on your server you can use that.
Create a database and database user for PuppetDB.
Install PuppetDB and configure it to use that database.
Install a plugin (puppetdb-termini). It’s a set of Ruby plug-ins that lets the Puppet master save data to PuppetDB.
Configure your Puppet master to save data to PuppetDB.

Hardware requirements

Minimum 16 GB RAM is recommended.
*nix server with JDK 1.7+ (Debian) or JDK 1.8+ (RHEL-derived)
OS: Red Hat Enterprise Linux 6.6+ and 7 (and any derived distro that includes Java 1.8)
Debian 7 (Wheezy) and 8 (Jessie)
Ubuntu 12.04 LTS, 14.04 LTS

Step 1: Setting up the PostgreSQL server.

A PostgreSQL release newer than 9 is required. In this article we install PostgreSQL 11.

rpm -Uvh https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-7-x86_64/pgdg-centos11-11-2.noarch.rpm
yum install postgresql11-server postgresql11-contrib

Initialize PostgreSQL

/usr/pgsql-11/bin/postgresql-11-setup initdb

Start the PostgreSQL service

systemctl enable postgresql-11.service
systemctl start postgresql-11.service

Create puppetdb user and database

sudo -iu postgres
createuser -DRSP puppetdb
createdb -E UTF8 -O puppetdb puppetdb

As we are running PostgreSQL, we should install the RegExp-optimized index extension pg_trgm. This may require installing the postgresql-contrib (or equivalent) package, depending on your distribution:

psql puppetdb -c 'create extension pg_trgm'
exit

Enabling access

For this we need to modify the pg_hba.conf file to allow MD5 authentication from at least localhost. To locate the file you can run “locate pg_hba.conf”. In our case the configuration file is located at “/var/lib/pgsql/11/data/pg_hba.conf”.

Open this configuration file using your favourite editor and make changes as follows:

# TYPE  DATABASE   USER   CIDR-ADDRESS  METHOD
local   all        all                  md5
host    all        all    127.0.0.1/32  md5
host    all        all    ::1/128       md5

Restart PostgreSQL. That’s all for the PostgreSQL part. Now you have a PostgreSQL server running on the Puppet master server [you can also set up a remote PostgreSQL server], along with a puppetdb database and user.
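
Before restarting, it can help to confirm that no rule in pg_hba.conf still uses a non-password method such as peer or ident. The following is a minimal sketch, not part of the original setup: the function name check_hba_methods is made up for illustration, and it assumes each rule uses the CIDR address form shown above (so the method is the 4th field of a “local” rule and the 5th field of a “host” rule).

```shell
# check_hba_methods FILE
# Hypothetical helper: print every local/host rule whose auth method is not
# password-based (md5 or scram-sha-256). Exits with status 1 if any is found.
check_hba_methods() {
    awk '
        /^[ \t]*(#|$)/ { next }                # skip comments and blank lines
        $1 == "local"  { method = $4 }         # local DATABASE USER METHOD
        $1 ~ /^host/   { method = $5 }         # host  DATABASE USER ADDRESS METHOD
        $1 == "local" || $1 ~ /^host/ {
            if (method != "md5" && method != "scram-sha-256") {
                printf "%s:%d: %s\n", FILENAME, NR, $0
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, “check_hba_methods /var/lib/pgsql/11/data/pg_hba.conf” prints the offending lines with their line numbers, or prints nothing when every rule already uses password authentication. Replication rules in the stock file are reported too; whether to change those is up to you.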

Installing PuppetDB

This can be done in different ways: using the package manager directly, or using Puppet itself. Here I am following the second method and letting Puppet install the PuppetDB package.

sudo puppet resource package puppetdb ensure=latest

Before starting PuppetDB, you need to make some changes in the PuppetDB configuration to use the PostgreSQL database. You can find all the configuration files under “/etc/puppetlabs/puppetdb/conf.d/”.

To configure PuppetDB to use this database, put the following in the [database] section of “database.ini” in that directory:

classname = org.postgresql.Driver
subprotocol = postgresql
subname = //localhost:5432/puppetdb
username = puppetdb
password = password
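
If you prefer to script this step, the settings above can be written with a heredoc. This is only a sketch: the function name write_puppetdb_database_ini is hypothetical, and the host, port, database name and user are simply the values used in this article.

```shell
# write_puppetdb_database_ini FILE PASSWORD
# Hypothetical helper: write the [database] section shown above into FILE,
# using PASSWORD as the password of the puppetdb database user.
write_puppetdb_database_ini() {
    cat > "$1" <<EOF
[database]
classname = org.postgresql.Driver
subprotocol = postgresql
subname = //localhost:5432/puppetdb
username = puppetdb
password = $2
EOF
}
```

A call could look like “write_puppetdb_database_ini /etc/puppetlabs/puppetdb/conf.d/database.ini 'your-db-password'”. The file then contains the password in clear text, so it is worth checking afterwards that it is readable by the PuppetDB service account and not by everyone.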

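Uncomment the host setting in the “/etc/puppetlabs/puppetdb/conf.d/jetty.ini” configuration file if PuppetDB’s HTTP port should be reachable from other machines; by default it listens on localhost only.

Edit /etc/sysconfig/puppetdb and adjust the Java heap size for PuppetDB, if needed: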

JAVA_ARGS="-Xmx192m"

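That’s it for the configuration. If PuppetDB later fails to start with an SSL error, run “puppetdb ssl-setup” first; as a reader points out in the comments below, it copies the Puppet certificates into PuppetDB’s SSL directory. Now you can start the PuppetDB service with the following command: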

sudo puppet resource service puppetdb ensure=running enable=true

The above command starts the PuppetDB service and makes sure that it is enabled at server startup.
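
PuppetDB can take a little while to come up after the service starts, so a scripted install may want to wait for it before moving on. Below is a small generic retry helper as a sketch; the name retry and its argument order are my own choice, not something shipped with Puppet.

```shell
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between tries. Returns 0 on success, 1 otherwise.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

Assuming PuppetDB serves plain HTTP on localhost port 8080 (the port used in the API examples later in this article), a wait could look like “retry 30 2 curl -sf -o /dev/null http://localhost:8080/pdb/meta/v1/version”, which polls the version endpoint for up to about a minute.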

Configuring Puppet server

Now we have PuppetDB on the Puppet master server and a PostgreSQL database for storing the collected data. Next we need to configure Puppet to write data to PuppetDB. Currently, Puppet masters need additional Ruby plug-ins in order to use PuppetDB.

sudo puppet resource package puppetdb-termini ensure=latest

Make sure the PuppetDB DNS name is resolvable (for example via /etc/hosts). In our case we can also use localhost.

Edit /etc/puppetlabs/puppet/puppet.conf and add the following lines to the [master] section:

storeconfigs = true
storeconfigs_backend = puppetdb

Create /etc/puppetlabs/puppet/puppetdb.conf

[main]
server_urls = https://puppetdb.example.com:8081/

Create /etc/puppetlabs/puppet/routes.yaml

---
master:
  facts:
    terminus: puppetdb
    cache: yaml

When copying and pasting, please make sure the indentation of the above configuration is preserved: “facts” must be nested under “master”, and “terminus” and “cache” under “facts”. You can validate it at http://beautifytools.com/yaml-validator.php
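
Since copy and paste tends to lose the YAML indentation, one way around it is to generate puppetdb.conf and routes.yaml from a script. The sketch below assumes a hypothetical helper name (write_puppetdb_client_config) and takes the target directory and the PuppetDB host name as arguments; the file contents follow the Puppet documentation linked in the comments, including the [main] stanza.

```shell
# write_puppetdb_client_config CONFDIR PUPPETDB_HOST
# Hypothetical helper: create puppetdb.conf and routes.yaml in CONFDIR so the
# Puppet master sends its data to PuppetDB on PUPPETDB_HOST (SSL port 8081).
write_puppetdb_client_config() {
    cat > "$1/puppetdb.conf" <<EOF
[main]
server_urls = https://$2:8081/
EOF
    # Quoted delimiter: the YAML is written literally, two spaces per level.
    cat > "$1/routes.yaml" <<'EOF'
---
master:
  facts:
    terminus: puppetdb
    cache: yaml
EOF
}
```

For the setup in this article the call would be “write_puppetdb_client_config /etc/puppetlabs/puppet puppetdb.example.com”, with the host name replaced by your own.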

All set. Restart the Puppet server:

systemctl restart puppetserver

In a few minutes, your node details should be recorded in the puppetdb database on your PostgreSQL server.

How to verify that?

That’s simple – log into Postgres and search.

psql -h localhost puppetdb puppetdb
puppetdb=>\x
puppetdb=>select * from catalogs;

You can see all the tables using the command \dt

8. Use cases

You can fetch the facts of all nodes using simple API calls. These facts can be used as input for your other applications. I will create a separate article on these API calls soon. Here I am listing two examples:

Listing all agent nodes:

API call: /pdb/query/v4/nodes

# curl http://ip-172-31-7-7.ap-south-1.compute.internal:8080/pdb/query/v4/nodes | jq '.'

 [
   {
     "deactivated": null,
     "latest_report_hash": null,
     "facts_environment": "production",
     "cached_catalog_status": null,
     "report_environment": null,
     "latest_report_corrective_change": null,
     "catalog_environment": "production",
     "facts_timestamp": "2019-01-23T13:34:49.806Z",
     "latest_report_noop": null,
     "expired": null,
     "latest_report_noop_pending": null,
     "report_timestamp": null,
     "certname": "arunlal.crybit.com",
     "catalog_timestamp": "2019-01-23T13:34:49.876Z",
     "latest_report_job_id": null,
     "latest_report_status": null
   },
   {
     "deactivated": null,
     "latest_report_hash": null,
     "facts_environment": "production",
     "cached_catalog_status": null,
     "report_environment": null,
     "latest_report_corrective_change": null,
     "catalog_environment": "production",
     "facts_timestamp": "2019-01-23T13:28:13.076Z",
     "latest_report_noop": null,
     "expired": null,
     "latest_report_noop_pending": null,
     "report_timestamp": null,
     "certname": "ip-172-31-16-237.ap-south-1.compute.internal",
     "catalog_timestamp": "2019-01-23T13:28:13.124Z",
     "latest_report_job_id": null,
     "latest_report_status": null
   }
 ]

Listing a specific fact for all agent nodes:

API call: /pdb/query/v4/facts/$fact-name

# curl http://ip-172-31-7-7.ap-south-1.compute.internal:8080/pdb/query/v4/facts/kernelversion | jq '.'
 
 [
   {
     "certname": "ip-172-31-16-237.ap-south-1.compute.internal",
     "environment": "production",
     "name": "kernelversion",
     "value": "3.10.0"
   },
   {
     "certname": "arunlal.crybit.com",
     "environment": "production",
     "name": "kernelversion",
     "value": "3.10.0"
   }
 ]

More than 150 default facts are available. The full list is available here: https://puppet.com/docs/facter/3.12/core_facts.html

That’s about the installation and configuration of PuppetDB 🙂 Don’t forget to add your suggestions and questions as comments.
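
Both calls above share the same URL prefix, so a tiny wrapper keeps repeated queries short. This is a sketch under assumptions: the function names pdb_url and pdb_get and the PDB_HOST / PDB_PORT environment variables are invented here, and the defaults assume PuppetDB answers plain HTTP on localhost port 8080.

```shell
# pdb_url ENDPOINT
# Hypothetical helper: build the URL of a v4 query endpoint, e.g. "nodes" or
# "facts/kernelversion". Host and port come from PDB_HOST and PDB_PORT.
pdb_url() {
    printf 'http://%s:%s/pdb/query/v4/%s\n' \
        "${PDB_HOST:-localhost}" "${PDB_PORT:-8080}" "$1"
}

# pdb_get ENDPOINT
# Fetch the endpoint with curl and print the raw JSON response.
pdb_get() {
    curl -sS "$(pdb_url "$1")"
}
```

With these defined, “pdb_get nodes | jq -r '.[].certname'” prints one certname per line, and “pdb_get facts/kernelversion | jq '.'” reproduces the second example above.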

5 thoughts on “Install PuppetDB on CentOS 7”

Jamie says:

September 10, 2019 at 5:40 pm

Nice guide, really helped me out. But I noticed 2 mistakes which need to be fixed for the above configuration to work.

Create /etc/puppetlabs/puppet/routes.yaml (terminus and cache not indented properly)
---
master:
facts:
terminus: puppetdb
cache: yaml

Create /etc/puppetlabs/puppet/puppetdb.conf (needs to have the [main] stanza)
[main]
server_urls = https://puppetdb.example.com:8081/

Also people may need to update their certificate with dns alternative names 🙂

1. Jamie says:
  
  September 10, 2019 at 5:41 pm
  
  Ah comments don’t show the yaml indentation properly! https://puppet.com/docs/puppetdb/5.2/connect_puppet_master.html
  
Ashok Kumar says:

June 20, 2020 at 8:05 pm

I wasted many hours because of wrong Yaml indentation!

1. Arunlal Ashok says:
  
  September 6, 2020 at 3:16 pm
  
  Hey Ashok, sorry for that. I added a YAML validator just below the configuration. It looks some alignment issue.
  
Tony says:

February 24, 2021 at 9:02 pm

Very good guide. It helped me to install our puppeDB service. However It’s missing an important step.

You need to create SSL, otherwise the puppetDB service will fail to start.

I did the following:
puppet agent --test # This command will create the certificates
puppetdb ssl-setup # This will copy puppet certs over to puppetdb ssl folder.

After that I was able to start puppetdb services without getting any ssl error.

Thanks a lot for the guide.