Regex process check with Nagios/Adagios

Check Command:

define command {
 command_name check_nrpe_procs_regex
 command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_procs_regex -a $_SERVICE_WARNING$ $_SERVICE_CRITICAL$ $_SERVICE_USER$ $_SERVICE_EREG_ARG_ARRAY$
}

NRPE command:

command[check_procs_regex]=/usr/lib64/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ -u $ARG3$ --ereg-argument-array "$ARG4$"
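
To make the argument flow concrete, here is a minimal Python sketch of how the values passed after -a end up in the $ARG1$ to $ARG4$ placeholders of the NRPE command above. It only illustrates the idea and is not NRPE's actual implementation; the function name expand_nrpe_args is hypothetical, and expanding a missing argument to an empty string is an assumption of this sketch.

```python
import re


def expand_nrpe_args(command_template, args):
    """Replace $ARG1$, $ARG2$, ... with the matching positional argument."""

    def substitute(match):
        index = int(match.group(1)) - 1  # $ARG1$ is the first argument
        # Assumption of this sketch: a placeholder without a matching
        # argument expands to an empty string.
        return args[index] if 0 <= index < len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", substitute, command_template)


template = (
    "/usr/lib64/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ "
    '-u $ARG3$ --ereg-argument-array "$ARG4$"'
)
# Values taken from the apache2 service definition in this post.
expanded = expand_nrpe_args(
    template, ["1:100", "0:200", "www-data", "/usr/sbin/apache2"]
)
print(expanded)
```

The printed line is the check_procs call that the monitored host would run for the apache2 service.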

Nagios service:

define service {
 use okc-linux-check_proc
 host_name hostname.domain.com
 __NAME apache2
 __WARNING 1:100
 __CRITICAL 0:200
 service_description Process apache2
 check_command check_nrpe_procs_regex
 __EREG_ARG_ARRAY '/usr/sbin/apache2'
 __USER www-data
}
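
The __WARNING and __CRITICAL values are ranges. The following Python sketch shows one simplified reading of the usual Nagios plugin range convention (alert when the value falls outside min:max). It is an illustration under stated assumptions: the function names are hypothetical, the "@" and "~" forms are not handled, and check_procs has its own parser that may differ in edge cases.

```python
def outside_range(value, spec):
    """Return True when value falls outside a 'min:max' style range.

    Simplified reading of the Nagios plugin range convention:
    '10' means 0..10, '10:' means 10..infinity, ':20' means 0..20.
    The '@' (inverted) and '~' (negative infinity) forms are not handled.
    """
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
    else:
        low_text, high_text = "", spec
    low = float(low_text) if low_text else 0.0
    high = float(high_text) if high_text else float("inf")
    return value < low or value > high


def process_state(count, warning, critical):
    """Map a process count to a state name; critical wins over warning."""
    if outside_range(count, critical):
        return "CRITICAL"
    if outside_range(count, warning):
        return "WARNING"
    return "OK"


# Thresholds from the apache2 service above: warning 1:100, critical 0:200.
for count in (0, 8, 150, 250):
    print(count, process_state(count, "1:100", "0:200"))
```

Under this reading, zero matching processes only yields WARNING, because 0 is still inside the critical range 0:200. If a stopped Apache should be critical, a critical range that starts at 1 (for example 1:200) is probably what you want.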

Creating custom okconfig templates

For this example I have a host (google.com) with HTTP, HTTPS, DNS, and Ping checks.

I’ve customized some of the service checks and want to create a template called “Google Server” from this host and its services.

To do this I will have to combine the config files for all services into a template. Templates are by default stored in /etc/nagios/okconfig/examples/, and have the file extension .cfg-example.

Locate all config files for this host. This can be done in a few ways, but the easiest is probably with pynag:

[root@adagios ~]# pynag list --quiet filename where host_name=google.com and register=1 | sort | uniq
/etc/nagios/okconfig/hosts/default/google.com-dns.cfg
/etc/nagios/okconfig/hosts/default/google.com-host.cfg
/etc/nagios/okconfig/hosts/default/google.com-http.cfg
/etc/nagios/okconfig/hosts/default/google.com-https.cfg
[root@adagios ~]#

I can do this in the Adagios web interface too, by searching for the host, selecting all services and clicking the bulk edit button (I won’t actually be editing anything, bulk edit will just show me all file names).

Next I want to combine all services into a template file: /etc/nagios/okconfig/examples/google-server.cfg-example

[root@adagios ~]# cat /etc/nagios/okconfig/hosts/default/google.com-dns.cfg > /etc/nagios/okconfig/examples/google-server.cfg-example
[root@adagios ~]# cat /etc/nagios/okconfig/hosts/default/google.com-http.cfg >> /etc/nagios/okconfig/examples/google-server.cfg-example
[root@adagios ~]# cat /etc/nagios/okconfig/hosts/default/google.com-https.cfg >> /etc/nagios/okconfig/examples/google-server.cfg-example

NOTE: I didn’t include the host config because I haven’t defined any custom services in it. If there are any services in the host config you would like in the template (chances are there will be), add them to the template file yourself. To see what services are defined in the host config, use: pynag list where object_type=service and register=1 and filename=<location of host config>

[root@adagios ~]# pynag list where object_type=service and register=1 and filename=/etc/nagios/okconfig/hosts/default/google.com-host.cfg
object_type          shortname            filename
--------------------------------------------------------------------------------
service              google.com/Ping      /etc/nagios/okconfig/hosts/default/google.com-host.cfg
service              google.com/test      /etc/nagios/okconfig/hosts/default/google.com-host.cfg
----------2 objects matches search condition------------------------------------
[root@adagios ~]#

As seen above, I added a “test” service to the host through the Adagios web interface, and it was saved in the host config file. The define service {…} part is what you would add to the template config.

[root@adagios ~]# cat /etc/nagios/okconfig/hosts/default/google.com-host.cfg
...
define service {
        service_description     test
        use                     generic-service
        host_name               google.com
        check_command           okc-check_dummy
        __EXIT_CODE             0
        __MESSAGE               Cool!
}

To prepare the template so it can be applied to other hosts, replace the host name and group name with HOSTNAME and GROUP. Okconfig will substitute HOSTNAME and GROUP with the host (and if any, group) name you specify when adding the template. In my case the hostname was google.com and group was default.

sed -i 's/google\.com/HOSTNAME/g;s/default/GROUP/g' /etc/nagios/okconfig/examples/google-server.cfg-example
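
Note that the sed expression replaces every occurrence of the group string, including ones inside longer names. As an alternative, here is a minimal Python sketch of a stricter substitution that only replaces the group name when it stands alone. The function name make_template and the sample values (default-template, googleXcom) are hypothetical and only serve to show what is left untouched.

```python
import re


def make_template(config_text, host_name, group_name):
    """Swap a concrete host and group name for okconfig's placeholders."""
    # re.escape keeps the dot in the host name literal.
    text = re.sub(re.escape(host_name), "HOSTNAME", config_text)
    # Only replace the group when it stands alone as a value, so longer
    # names that merely contain it (e.g. "default-template") are untouched.
    group_pattern = r"(?<![\w-])" + re.escape(group_name) + r"(?![\w-])"
    return re.sub(group_pattern, "GROUP", text)


# Hypothetical sample input, not taken from a real config file.
sample = (
    "define service {\n"
    "    host_name        google.com\n"
    "    contact_groups   default\n"
    "    use              default-template\n"
    "    notes            googleXcom\n"
    "}\n"
)
result = make_template(sample, "google.com", "default")
print(result)
```

In the printed result, host_name and contact_groups carry the placeholders, while default-template and googleXcom are unchanged.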

Example of how a service definition should change:

Before:

define service {
        host_name               google.com
        contact_groups          default
        service_description     HTTPS google.com
        check_command           okc-check_https
        use                     okc-check_https
        __URI                   /
        __SEARCH_STRING
        __RESPONSE_WARNING      2
        __RESPONSE_CRITICAL     10
        __VIRTUAL_HOST          google.com
        __PORT                  443
}

After:

define service {
        host_name               HOSTNAME
        contact_groups          GROUP
        service_description     HTTPS HOSTNAME
        check_command           okc-check_https
        use                     okc-check_https
        __URI                   /
        __SEARCH_STRING
        __RESPONSE_WARNING      2
        __RESPONSE_CRITICAL     10
        __VIRTUAL_HOST          HOSTNAME
        __PORT                  443
}

Next, add the host and select the newly created Google Server template.

I deleted the google.com host, removed all config files, and added it again with only the template I created.

All this could probably be done in one (or a few) pynag copy commands, but I haven’t tested that.

PostgreSQL 9.2 monitoring with Adagios on CentOS 7

On the PostgreSQL server:

Note: You may need to deal with SELinux.

Install the required Perl modules, download the check script, and make it executable:

yum install perl-Data-Dumper perl-Digest-MD5 perl-Getopt-Long perl-File-Temp perl-Time-HiRes perl-TimeDate
cd /usr/lib64/nagios/plugins
wget https://raw.githubusercontent.com/bucardo/check_postgres/master/check_postgres.pl
chmod +x check_postgres.pl

Add the following to /usr/lib64/nagios/plugins/check_postgres_stats.sh:

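#!/bin/bash
# Usage: check_postgres_stats.sh <database>
DB="$1"
# dbstats prints name:value pairs; turn them into name=value for Nagios performance data.
STATS=$(/usr/lib64/nagios/plugins/check_postgres.pl --datadir /var/lib/pgsql/data/ -db "$DB" --action dbstats | sed 's/:/=/g')
echo "OK: Postgres stats collected | $STATS"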
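
Nagios treats everything after the | character in plugin output as performance data in label=value form, which is why the script rewrites the colons. The following Python sketch mirrors that conversion on a made-up sample line. The function name to_perfdata and the sample values are hypothetical, and unlike the sed expression it only rewrites the first colon of each pair.

```python
def to_perfdata(dbstats_line):
    """Turn 'name:value' pairs into Nagios-style 'name=value' pairs."""
    pairs = []
    for token in dbstats_line.split():
        # partition splits on the first colon only.
        name, _, value = token.partition(":")
        pairs.append(f"{name}={value}")
    return " ".join(pairs)


# Hypothetical sample line; real output depends on the database.
sample = "backends:1 commits:824 rollbacks:4"
output = f"OK: Postgres stats collected | {to_perfdata(sample)}"
print(output)
```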

Add the following to /etc/nrpe.d/check_postgres.cfg (NRPE only fills in the $ARG1$ style placeholders when dont_blame_nrpe=1 is set in nrpe.cfg):

command[check_postgres]=/usr/bin/sudo -u postgres /usr/lib64/nagios/plugins/check_postgres.pl --datadir /var/lib/pgsql/data/ -db '$ARG1$' --action '$ARG2$'
command[check_postgres_w]=/usr/bin/sudo -u postgres /usr/lib64/nagios/plugins/check_postgres.pl --datadir /var/lib/pgsql/data/ -db '$ARG1$' --action '$ARG2$' --warning '$ARG3$'
command[check_postgres_wc]=/usr/bin/sudo -u postgres /usr/lib64/nagios/plugins/check_postgres.pl --datadir /var/lib/pgsql/data/ -db '$ARG1$' --action '$ARG2$' --warning '$ARG3$' --critical '$ARG4$'
command[check_postgres_stats]=/usr/bin/sudo -u postgres /usr/lib64/nagios/plugins/check_postgres_stats.sh '$ARG1$'

Add the following to /etc/sudoers.d/nrpe using visudo:

visudo -f /etc/sudoers.d/nrpe
Defaults:nrpe !requiretty
nrpe ALL=(postgres) NOPASSWD: /usr/lib64/nagios/plugins/check_postgres.pl
nrpe ALL=(postgres) NOPASSWD: /usr/lib64/nagios/plugins/check_postgres_stats.sh

On the Nagios server:

Create the check commands:

pynag add command command_name="2ks-check_nrpe_postgres" command_line='$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_postgres -a "$_SERVICE_DATABASE$" "$_SERVICE_ACTION$"'
pynag add command command_name="2ks-check_nrpe_postgres_w" command_line='$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_postgres_w -a "$_SERVICE_DATABASE$" "$_SERVICE_ACTION$" "$_SERVICE_WARNING$"'
pynag add command command_name="2ks-check_nrpe_postgres_wc" command_line='$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_postgres_wc -a "$_SERVICE_DATABASE$" "$_SERVICE_ACTION$" "$_SERVICE_WARNING$" "$_SERVICE_CRITICAL$"'
pynag add command command_name="2ks-check_nrpe_postgres_stats" command_line='$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_postgres_stats -a "$_SERVICE_DATABASE$"'
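
The $_SERVICE_DATABASE$ style macros are filled from custom variables on each service, such as the __DATABASE and __ACTION entries in the template that follows. A custom variable starts with one underscore, so __DATABASE has the name _DATABASE and is referenced as $_SERVICE_DATABASE$. The Python sketch below illustrates that mapping. It is not how Nagios itself is implemented, the function name expand_service_macros is hypothetical, and the command line is simplified (quoting omitted for brevity).

```python
import re


def expand_service_macros(command_line, service):
    """Fill $_SERVICE<NAME>$ macros from a service's custom variables."""
    custom = {
        key[1:].upper(): value  # drop the '_' that marks a custom variable
        for key, value in service.items()
        if key.startswith("_")
    }

    def substitute(match):
        # Assumption of this sketch: unknown macros are left as-is.
        return custom.get(match.group(1).upper(), match.group(0))

    return re.sub(r"\$_SERVICE(\w+)\$", substitute, command_line)


# Values mirror the "PostgreSQL Database connection" service in the template.
service = {
    "host_name": "db-01.domain.com",
    "__DATABASE": "database_1",
    "__ACTION": "connection",
}
command_line = (
    "$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_postgres "
    "-a $_SERVICE_DATABASE$ $_SERVICE_ACTION$"
)
expanded = expand_service_macros(command_line, service)
print(expanded)
```

Standard macros such as $USER1$ and $HOSTADDRESS$ are deliberately left alone here; only the custom service macros are expanded.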

Create the okconfig template /etc/nagios/okconfig/examples/postgres.cfg-example:

define service {
    use                            okc-linux-check_proc
    __WARNING                      1:
    __NAME                         postgres
    host_name                      HOSTNAME
    service_description            Process postgres
    __CRITICAL                     :20
    check_command                  okc-check_nrpe!check_procs -a $_SERVICE_WARNING$ $_SERVICE_CRITICAL$ $_SERVICE_NAME$
}

define service {
    service_description            PostgreSQL Database connection
    use                            generic-service
    host_name                      HOSTNAME
    check_command                  2ks-check_nrpe_postgres
    __DATABASE                     database_1
    __ACTION                       connection
    notes                          Simply connects and returns version number.
}

define service {
    use                            generic-service
    __DATABASE                     database_1
    check_command                  2ks-check_nrpe_postgres_stats
    host_name                      HOSTNAME
    service_description            PostgreSQL Database statistics
    notes                          Reports information from the pg_stat_database view, and outputs as performance data.
}

define service {
    use                            generic-service
    __DATABASE                     database_1
    check_command                  2ks-check_nrpe_postgres_wc
    __ACTION                       bloat
    host_name                      HOSTNAME
    service_description            PostgreSQL Database bloat
    __CRITICAL                     50%
    __WARNING                      25%
    notes                          Checks the amount of bloat in tables and indexes. Bloat is generally the amount of dead unused space taken up in a table or index. This space is usually reclaimed by use of the VACUUM command.
}

define service {
    use                            generic-service
    __DATABASE                     database_1
    check_command                  2ks-check_nrpe_postgres_wc
    __ACTION                       locks
    host_name                      HOSTNAME
    service_description            PostgreSQL Database locks
    __CRITICAL                     300
    __WARNING                      150
    notes                          Check the total number of locks on one or more databases.
}

define service {
    use                            generic-service
    __DATABASE                     database_1
    check_command                  2ks-check_nrpe_postgres_wc
    __ACTION                       timesync
    host_name                      HOSTNAME
    service_description            PostgreSQL Database timesync
    __CRITICAL                     5
    __WARNING                      2
    notes                          Compares the local system time with the time reported by one or more databases.
}

define service {
    use                            generic-service
    __DATABASE                     database_1
    check_command                  2ks-check_nrpe_postgres_wc
    __ACTION                       last_vacuum
    host_name                      HOSTNAME
    service_description            PostgreSQL Database last vacuum
    __CRITICAL                     7d
    __WARNING                      3d
    notes                          Checks how long it has been since vacuum (or analyze) was last run on each table in one or more databases.
}

define service {
    use                            generic-service
    __DATABASE                     database_1
    check_command                  2ks-check_nrpe_postgres_wc
    __ACTION                       backends
    host_name                      HOSTNAME
    service_description            PostgreSQL Database backends
    __CRITICAL                     95
    __WARNING                      80
    notes                          Checks the current number of connections for one or more databases.
}

define service {
    use                            generic-service
    __DATABASE                     database_1
    check_command                  2ks-check_nrpe_postgres_wc
    __ACTION                       hitratio
    host_name                      HOSTNAME
    service_description            PostgreSQL Database hitratio
    __CRITICAL                     80%
    __WARNING                      90%
    notes                          Checks the hit ratio of all databases and complains when they are too low.
}

define service {
    use                            generic-service
    __DATABASE                     database_1
    check_command                  2ks-check_nrpe_postgres_wc
    __ACTION                       query_time
    host_name                      HOSTNAME
    service_description            PostgreSQL Database query time
    __CRITICAL                     10
    __WARNING                      5
    notes                          Checks the length of running queries on one or more databases.
}

define service {
    use                            generic-service
    __DATABASE                     database_1
    check_command                  2ks-check_nrpe_postgres_wc
    __ACTION                       txn_idle
    host_name                      HOSTNAME
    service_description            PostgreSQL Database connections idle in transaction
    __CRITICAL                     5 for 10 seconds
    __WARNING                      2 for 5 seconds
    notes                          Checks the number and duration of "idle in transaction" queries on one or more databases.
}

define service {
    use                            generic-service
    __DATABASE                     database_1
    check_command                  2ks-check_nrpe_postgres_w
    __ACTION                       disabled_triggers
    host_name                      HOSTNAME
    service_description            PostgreSQL Database disabled triggers
    __WARNING                      1
    notes                          Checks on the number of disabled triggers inside the database. In normal usage having disabled triggers is a dangerous event.
}

define service {
    use                            generic-service
    __DATABASE                     database_1
    check_command                  2ks-check_nrpe_postgres_wc
    __ACTION                       checkpoint
    host_name                      HOSTNAME
    service_description            PostgreSQL Database last checkpoint
    __CRITICAL                     600
    __WARNING                      400
    notes                          Determines how long since the last checkpoint has been run.
}

define service {
    use                            generic-service
    __DATABASE                     database_1
    check_command                  2ks-check_nrpe_postgres_w
    __ACTION                       settings_checksum
    host_name                      HOSTNAME
    service_description            PostgreSQL Database settings checksum
    __WARNING                      c6358648f0d06757a8311709be307f24
    notes                          Checks that all the Postgres settings are the same as last time you checked.
}

define service {
    use                            generic-service
    __DATABASE                     database_1
    __WARNING                      15GB
    check_command                  2ks-check_nrpe_postgres_wc
    __ACTION                       database_size
    host_name                      HOSTNAME
    service_description            PostgreSQL Database size
    __CRITICAL                     30GB
    notes                          Checks the size of all databases and complains when they are too big.
}

Add the template to a host:

okconfig addtemplate db-01.domain.com --template postgres

The values provided in the above configuration are examples. You should change them according to your needs.
Source: https://bucardo.org/check_postgres/check_postgres.pl.html
