Keeping Naemon or Nagios running at all times with a systemd drop-in unit

Due to a silly bug in Naemon 1.0.4, I looked into ways to make sure it always restarts if it dies or is killed. Turns out it’s rather easy thanks to systemd.

Create a naemon.service.d directory in /etc/systemd/system/:

cd /etc/systemd/system/
mkdir naemon.service.d
cd naemon.service.d

Create the file 10-restart.conf with the following contents:

[Service]
RestartSec=10s
Restart=always
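The directory and the file can also be created in one go. This is a minimal sketch, assuming a POSIX shell; the function name write_restart_dropin and its optional base-directory argument are made up for illustration and are not part of systemd:

```shell
# Hypothetical helper: write the restart drop-in for a unit.
# $1 = unit name (e.g. naemon.service)
# $2 = base directory, defaults to /etc/systemd/system
write_restart_dropin() {
    unit="$1"
    base="${2:-/etc/systemd/system}"
    dir="$base/$unit.d"

    # -p: no error if the directory already exists
    mkdir -p "$dir" || return 1

    # Overwrite the drop-in so repeated runs give the same result
    cat > "$dir/10-restart.conf" <<'EOF'
[Service]
RestartSec=10s
Restart=always
EOF
}
```

Run as root, write_restart_dropin naemon.service should leave the same 10-restart.conf as the manual steps. The second argument only exists so the helper can be tried against a scratch directory first.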

Now reload systemd:

systemctl daemon-reload

And make sure the unit is overridden:

[root@manage naemon.service.d]# systemd-delta | grep naemon
[EXTENDED] /usr/lib/systemd/system/naemon.service -> /etc/systemd/system/naemon.service.d/10-restart.conf

Then try killing naemon and watch it restart:

killall naemon
watch systemctl status naemon

Upgrading from Naemon 1.0.3 to 1.0.4

Important: First of all, back up /etc/naemon/ before updating.

rsync /etc/naemon/ /etc/naemon-bak/ -av
yum update -y

Note: If you’ve already upgraded and don’t have a backup, you can copy the config from nagios (if installed):

Skip this step if you have a backup of /etc/naemon/!

cp /etc/nagios/objects/templates.cfg /etc/naemon/conf.d/templates/
cp /etc/nagios/objects/contacts.cfg /etc/naemon/conf.d/
cp /etc/nagios/objects/timeperiods.cfg /etc/naemon/conf.d/
cp /etc/nagios/objects/commands.cfg /etc/naemon/conf.d/

Verify the naemon config:

naemon -vp /etc/naemon/naemon.cfg
Error in configuration file '/etc/naemon/naemon.cfg' - Line 344 (Warning: Failed to open check_result_path '/var/cache/naemon/checkresults': No such file or directory)
 Error processing main config file!

check_result_path is deprecated and you can safely remove it from the config.

sed -i '/check_result_path=/d' /etc/naemon/naemon.cfg

Verify the config again:

naemon -vp /etc/naemon/naemon.cfg
Reading configuration data...
Warning: enable_environment_macros is deprecated and will be removed.
Warning: use_large_installation_tweaks is deprecated and will be removed. Naemon should always be fast
Warning: daemon_dumps_core is deprecated and will be removed. Use system facilities to control coredump behaviour instead
Warning: max_check_result_file_age is deprecated and will be removed. Support for processing check results from disk will be removed
Warning: max_check_result_reaper_time is deprecated and will be removed. Support for processing check results from disk will be removed
Warning: check_result_reaper_frequency is deprecated and will be removed. Support for processing check results from disk will be removed
Warning: naemon_group is deprecated and will be removed. Naemon is compiled to be run as naemon:naemon
Warning: naemon_user is deprecated and will be removed. Naemon is compiled to be run as naemon:naemon
 Read main config file okay...
Error: Template 'generic-host' specified in host definition could not be found (config file '/usr/share/okconfig/templates/misc/hosts.cfg', starting on line 3)
Error: Template 'generic-service' specified in service definition could not be found (config file '/usr/share/okconfig/templates/linux/services.cfg', starting on line 4)
Error: Template 'generic-service' specified in service definition could not be found (config file '/usr/share/okconfig/templates/http/services.cfg', starting on line 41)
Error: Template 'generic-service' specified in service definition could not be found (config file '/usr/share/okconfig/templates/nagios/services.cfg', starting on line 29)
Error: Template 'generic-service' specified in service definition could not be found (config file '/usr/share/okconfig/templates/nagios/services.cfg', starting on line 20)
Error: Template 'generic-service' specified in service definition could not be found (config file '/usr/share/okconfig/templates/nagios/services.cfg', starting on line 11)
Error: Template 'generic-service' specified in service definition could not be found (config file '/usr/share/okconfig/templates/nagios/services.cfg', starting on line 2)
Error: Template 'generic-service' specified in service definition could not be found (config file '/usr/share/okconfig/templates/misc/services.cfg', starting on line 53)
Error: Template 'generic-service' specified in service definition could not be found (config file '/usr/share/okconfig/templates/wmi/wmi.cfg', starting on line 6)
 Error processing object config files!

Rsync the templates directory from the backup and remove the deprecated attributes:

rsync /etc/naemon-bak/conf.d/templates/ /etc/naemon/conf.d/templates/ -av
sed -i '/enable_environment_macros/d' /etc/naemon/naemon.cfg
sed -i '/use_large_installation_tweaks/d' /etc/naemon/naemon.cfg
sed -i '/daemon_dumps_core/d' /etc/naemon/naemon.cfg
sed -i '/max_check_result_file_age/d' /etc/naemon/naemon.cfg
sed -i '/max_check_result_reaper_time/d' /etc/naemon/naemon.cfg
sed -i '/check_result_reaper_frequency/d' /etc/naemon/naemon.cfg
sed -i '/naemon_group/d' /etc/naemon/naemon.cfg
sed -i '/naemon_user/d' /etc/naemon/naemon.cfg
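The eight sed calls can be folded into a loop. This is a sketch under two assumptions: GNU sed (for -i, as used above), and a hypothetical function name strip_deprecated_options. Unlike the commands above, the pattern is anchored to the option assignment, so comment lines that merely mention an option are kept:

```shell
# Hypothetical helper: delete deprecated options from a Naemon main config.
# $1 = path to naemon.cfg
strip_deprecated_options() {
    cfg="$1"
    for key in enable_environment_macros use_large_installation_tweaks \
               daemon_dumps_core max_check_result_file_age \
               max_check_result_reaper_time check_result_reaper_frequency \
               naemon_group naemon_user; do
        # Match only "key=value" lines; comments mentioning the key survive
        sed -i "/^[[:space:]]*${key}[[:space:]]*=/d" "$cfg" || return 1
    done
}
```

Usage would be strip_deprecated_options /etc/naemon/naemon.cfg, followed by another naemon -vp run to confirm the warnings are gone.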

Verify the config again:

naemon -vp /etc/naemon/naemon.cfg
Reading configuration data...
 Read main config file okay...
Error: Could not find member group 'admins' specified in contactgroup 'default' (config file '/etc/naemon/okconfig//groups/default.cfg', starting on line 11)
 Error processing object config files!

Rsync the contacts config from the backup:

rsync /etc/naemon-bak/conf.d/contacts.cfg /etc/naemon/conf.d/ -av

Verify the config again:

naemon -vp /etc/naemon/naemon.cfg
Reading configuration data...
 Read main config file okay...
Error: Service notification period '24x7' specified for contact 'naemonadmin' is not defined anywhere!
Error: Could not register contact (config file '/etc/naemon/conf.d/contacts.cfg', starting on line 24)
 Error processing object config files!

Rsync the timeperiods config from the backup:

rsync /etc/naemon-bak/conf.d/timeperiods.cfg /etc/naemon/conf.d/ -av

Verify the config again:

naemon -vp /etc/naemon/naemon.cfg
Reading configuration data...
 Read main config file okay...
Error: Host check command 'check-host-alive' specified for host 'monitor-01' is not defined anywhere!
Error: Could not register host (config file '/etc/naemon/okconfig//hosts/default/monitor-01-host.cfg', starting on line 3)
 Error processing object config files!

Rsync the commands config from the backup:

rsync /etc/naemon-bak/conf.d/commands.cfg /etc/naemon/conf.d/ -av

Verify the config again:

naemon -vp /etc/naemon/naemon.cfg
Reading configuration data...
 Read main config file okay...
 Read object config files okay...
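If you already know which files went missing, the restore steps above can be done in a single pass instead of one verification error at a time. This is a rough sketch; the function name restore_object_configs is hypothetical, it uses cp -a rather than rsync, and it assumes the backup has the same layout as /etc/naemon/:

```shell
# Hypothetical helper: copy the missing object configs back from a backup.
# $1 = backup directory (e.g. /etc/naemon-bak)
# $2 = live directory   (e.g. /etc/naemon)
restore_object_configs() {
    bak="$1"
    live="$2"

    mkdir -p "$live/conf.d/templates" || return 1

    # Templates first: the host and service definitions inherit from them
    cp -a "$bak/conf.d/templates/." "$live/conf.d/templates/" || return 1

    # Then the single files, in the order the verification errors appeared
    for f in contacts.cfg timeperiods.cfg commands.cfg; do
        cp -a "$bak/conf.d/$f" "$live/conf.d/$f" || return 1
    done
}
```

After restore_object_configs /etc/naemon-bak /etc/naemon, verify with naemon -vp /etc/naemon/naemon.cfg as before.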

Finally, make sure naemon.cfg either defines broker_module=/usr/lib64/naemon/naemon-livestatus/livestatus.so[…] itself or includes the module-conf.d directory:

echo 'include_dir=module-conf.d' >> /etc/naemon/naemon.cfg
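Note that echo with >> appends the line every time it runs. A small guard avoids duplicates; this is only a sketch, and ensure_line is a hypothetical helper name:

```shell
# Hypothetical helper: append a line to a config file unless it is already there.
# $1 = config file, $2 = exact line to ensure
ensure_line() {
    cfg="$1"
    line="$2"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -- "$line" "$cfg" || printf '%s\n' "$line" >> "$cfg"
}
```

For this step the call would be ensure_line /etc/naemon/naemon.cfg 'include_dir=module-conf.d'.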

Next:

Due to a bug in naemon 1.0.4 you need to configure automatic restarting for the service with a drop-in config for the systemd unit. See: Keeping Naemon or Nagios running at all times with a systemd drop-in unit


Regex process check with Nagios/Adagios

Check Command:

define command {
 command_name check_nrpe_procs_regex
 command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_procs_regex -a $_SERVICE_WARNING$ $_SERVICE_CRITICAL$ $_SERVICE_USER$ $_SERVICE_EREG_ARG_ARRAY$
}

NRPE command (passing arguments requires dont_blame_nrpe=1 in nrpe.cfg):

command[check_procs_regex]=/usr/lib64/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ -u $ARG3$ --ereg-argument-array "$ARG4$"

Nagios service:

define service {
 use okc-linux-check_proc
 host_name hostname.domain.com
 __NAME apache2
 __WARNING 1:100
 __CRITICAL 0:200
 service_description Process apache2
 check_command check_nrpe_procs_regex
 __EREG_ARG_ARRAY '/usr/sbin/apache2'
 __USER www-data
}
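To see how the custom variables end up on the remote side, it can help to print the command line NRPE would build from the four arguments. This is a simplified sketch: nrpe_check_procs_cmd is a hypothetical helper that only mirrors the NRPE command definition above, and it does not run the plugin:

```shell
# Hypothetical helper: print the check_procs command line NRPE would run.
# $1..$4 stand in for $ARG1$..$ARG4$ (warning, critical, user, regex)
nrpe_check_procs_cmd() {
    printf '%s -w %s -c %s -u %s --ereg-argument-array "%s"\n' \
        /usr/lib64/nagios/plugins/check_procs "$1" "$2" "$3" "$4"
}

# Values taken from the service definition above
nrpe_check_procs_cmd 1:100 0:200 www-data /usr/sbin/apache2
```

Running the printed command by hand on the monitored host is a quick way to test the regex before wiring it into NRPE.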