Varnish 4 monitoring with Adagios on CentOS 7

On the Varnish server:

Install prerequisites:

yum install git automake libtool varnish-libs-devel

Clone the varnish-nagios repo, autogen, configure, and make:

git clone
cd varnish-nagios
./autogen.sh
./configure
make

Move the check_varnish binary to /usr/lib64/nagios/plugins/ and restore SELinux context:

mv check_varnish /usr/lib64/nagios/plugins/
restorecon /usr/lib64/nagios/plugins/check_varnish

Create the nrpe command and restart nrpe:

echo 'command[check_varnish]=/usr/lib64/nagios/plugins/check_varnish -p "$ARG1$" -w "$ARG2$" -c "$ARG3$"' > /etc/nrpe.d/check_varnish.cfg
systemctl restart nrpe.service
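
The command definition above passes $ARG1$ to $ARG3$ through to the plugin, and NRPE only accepts such arguments when dont_blame_nrpe=1 is set in its main configuration file. The sketch below checks for that setting. The path /etc/nagios/nrpe.cfg is an assumption (it is the usual location for the EPEL package) and the helper name nrpe_args_enabled is made up for this example.

```shell
# Succeeds if the given nrpe.cfg enables command arguments.
nrpe_args_enabled() {
    # $1: path to nrpe.cfg
    grep -Eq '^[[:space:]]*dont_blame_nrpe[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

# Assumed default location; override with NRPE_CFG=/path/to/nrpe.cfg
NRPE_CFG=${NRPE_CFG:-/etc/nagios/nrpe.cfg}

if [ -r "$NRPE_CFG" ]; then
    if nrpe_args_enabled "$NRPE_CFG"; then
        echo "NRPE accepts command arguments"
    else
        echo "Set dont_blame_nrpe=1 in $NRPE_CFG and restart nrpe"
    fi
fi
```

If the setting is missing, checks that pass arguments are refused by NRPE even though the plugin works when run locally.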

To see if the check works, run:

/usr/lib64/nagios/plugins/check_varnish -p MAIN.sess_dropped -w 0 -c 5
/usr/lib64/nagios/plugins/check_varnish -p MGT.child_panic -w 0 -c 2
/usr/lib64/nagios/plugins/check_varnish -p SMA.Transient.c_fail -c 0
/usr/lib64/nagios/plugins/check_varnish -p ratio -w 20:90 -c 10:98

It should return:

[root@varnish-host ~]# /usr/lib64/nagios/plugins/check_varnish -p MAIN.sess_dropped -w 0 -c 5
VARNISH OK: Sessions dropped for thread (0)|MAIN.sess_dropped=0
[root@varnish-host ~]# /usr/lib64/nagios/plugins/check_varnish -p MGT.child_panic -w 0 -c 2
VARNISH OK: Child process panic (0)|MGT.child_panic=0
[root@varnish-host ~]# /usr/lib64/nagios/plugins/check_varnish -p SMA.Transient.c_fail -c 0
VARNISH OK: Allocator failures (0)|SMA.Transient.c_fail=0
[root@varnish-host ~]# /usr/lib64/nagios/plugins/check_varnish -p ratio -w 20:90 -c 10:98
VARNISH OK: Cache hit ratio (26)|ratio=26
[root@varnish-host ~]#
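
The ratio check uses range thresholds such as 20:90. The following is a minimal sketch of how such ranges are usually read, assuming check_varnish follows the standard Nagios plugin convention that a value outside low:high raises the alert. The functions outside_range and evaluate are illustrative only, are not part of the plugin, and handle only integer low:high ranges.

```shell
# Succeeds if the integer value lies outside the low:high range.
outside_range() {
    # $1: integer value, $2: range in the form low:high
    value=$1
    low=${2%%:*}
    high=${2##*:}
    [ "$value" -lt "$low" ] || [ "$value" -gt "$high" ]
}

# Print the Nagios state for a value, checking the critical range first.
evaluate() {
    # $1: value, $2: warning range, $3: critical range
    if outside_range "$1" "$3"; then
        echo CRITICAL
    elif outside_range "$1" "$2"; then
        echo WARNING
    else
        echo OK
    fi
}

# The ratio from the sample output above: 26 lies inside both ranges.
evaluate 26 20:90 10:98   # prints OK
```

Under this reading a hit ratio of 15 would give WARNING and a ratio of 5 would give CRITICAL, as would values above 90 and 98 respectively.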

On the Nagios server:

Create a check command:

pynag add command command_name="2ks-check_nrpe_varnish_status" command_line='$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_varnish -a "$_SERVICE_PARAMETER$" "$_SERVICE_WARNING$" "$_SERVICE_CRITICAL$"'

NOTE: In my case pynag placed the cfg file in /etc/nagios/commands/, but it was not included as a cfg_dir in nagios.cfg. To fix that, run:

pynag config --append cfg_dir=/etc/nagios/commands/
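
To make the macros in the check command more concrete, here is a small sketch that prints the command line Nagios would end up running for one service, with the macros filled in by hand. The plugin directory /usr/lib64/nagios/plugins (the value assumed for $USER1$), the host name varnish-host, and the function name expand_check are assumptions for illustration.

```shell
# Print the check_nrpe call with $USER1$, $HOSTADDRESS$ and the
# $_SERVICE_*$ macros replaced by concrete values.
expand_check() {
    # $1: plugin dir, $2: host, $3: parameter, $4: warning, $5: critical
    printf '%s/check_nrpe -H %s -c check_varnish -a "%s" "%s" "%s"\n' \
        "$1" "$2" "$3" "$4" "$5"
}

expand_check /usr/lib64/nagios/plugins varnish-host MAIN.sess_dropped 0 5
```

Running the printed command by hand on the Nagios server is a quick way to test the NRPE path before any service is defined.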

Create an okconfig template:

echo 'define service {
    service_description            Varnish: Sessions dropped
    use                            generic-service
    host_name                      HOSTNAME
    __PARAMETER                   MAIN.sess_dropped
    check_command                 2ks-check_nrpe_varnish_status
    __CRITICAL                    5
    __WARNING                     0
    notes                         This counter shows the number of requests that had to be dropped because no threads were available to handle them.
}

define service {
    service_description            Varnish: Child process panic
    use                            generic-service
    host_name                      HOSTNAME
    __PARAMETER                   MGT.child_panic
    check_command                 2ks-check_nrpe_varnish_status
    __CRITICAL                    2
    __WARNING                     0
    notes                         This counter shows the number of times the child process has panicked. The master process restarts the child immediately when that happens, and the cache is flushed.
}

define service {
    service_description            Varnish: Allocator failures
    use                            generic-service
    host_name                      HOSTNAME
    __PARAMETER                   SMA.Transient.c_fail
    check_command                 2ks-check_nrpe_varnish_status
    __CRITICAL                    0
    __WARNING                     0
    notes                         A non-zero value indicates that the operating system was unable to allocate memory as requested.
}

define service {
    service_description            Varnish: Cache hit ratio
    use                            generic-service
    host_name                      HOSTNAME
    __PARAMETER                   ratio
    check_command                 2ks-check_nrpe_varnish_status
    __CRITICAL                    10:98
    __WARNING                     20:90
}

define service {
    use                            okc-linux-check_proc
    __WARNING                      1:
    __NAME                         varnishd
    host_name                      HOSTNAME
    service_description            Process varnishd
    __CRITICAL                     :10
    check_command                 okc-check_nrpe!check_procs -a $_SERVICE_WARNING$ $_SERVICE_CRITICAL$ $_SERVICE_NAME$

}' > /etc/nagios/okconfig/examples/varnish.cfg-example
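
Every define block in a Nagios object file needs its own closing brace, and that is easy to lose in a long echo. The sketch below is a rough sanity check that counts opening and closing lines in the template file. The function name template_balanced is made up for this example, and the check is a simple heuristic, not a parser.

```shell
# Succeeds if the file has as many "define ... {" lines as closing "}" lines.
template_balanced() {
    # $1: path to a Nagios object config file
    [ -r "$1" ] || return 2
    opens=$(grep -c '^[[:space:]]*define[[:space:]][^{]*{' "$1" || true)
    closes=$(grep -c '^[[:space:]]*}' "$1" || true)
    [ "$opens" -eq "$closes" ]
}

# Path taken from the echo command above; override with TEMPLATE=...
TEMPLATE=${TEMPLATE:-/etc/nagios/okconfig/examples/varnish.cfg-example}

if [ -r "$TEMPLATE" ]; then
    if template_balanced "$TEMPLATE"; then
        echo "braces balanced in $TEMPLATE"
    else
        echo "unbalanced braces in $TEMPLATE" >&2
    fi
fi
```

For a complete validation of the generated configuration, nagios -v /etc/nagios/nagios.cfg can be run before reloading.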

Add the template to a host:

okconfig addtemplate --template varnish

Reload Nagios and run the service checks from the Adagios web interface; they should all show green.