I’m currently setting up some new clusters under RedHat, and each cluster is getting its own Nagios instance. Trying to use the web-based management interface, however, threw an error.
I’ve seen this error before on Ubuntu and was now getting it again under RedHat. Of course I revisited my Ubuntu solution and realised that it didn’t help at all: it relied on dpkg overrides, and the situation here was very different!
root@nagios:/var/log/nagios/rw# ls -al
total 8
drwxrwxr-x 2 nagios apache 4096 Oct 30 13:37 .
drwxrwxr-x 5 nagios apache 4096 Oct 30 13:40 ..
The command file, nagios.cmd, didn’t exist 🙁
A quick scan of a working system showed :
prw-rw---- 1 nagios nagcmd 0 2009-10-15 13:32 nagios.cmd
It’s a pipe! Woohoo.
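Before creating anything by hand, it can help to check what is actually sitting at that path. The following is a minimal sketch, not part of the original troubleshooting: `describe_cmd_file` is a hypothetical helper name, and the path is simply the one from the listing above, so adjust it for your install.

```shell
#!/bin/sh
# Hypothetical helper: report what kind of object sits at the given path.
describe_cmd_file() {
    path="$1"
    if [ -p "$path" ]; then
        # -p is true only for a named pipe (FIFO)
        echo "fifo"
    elif [ -e "$path" ]; then
        echo "not-a-fifo"
    else
        echo "missing"
    fi
}

# Path taken from the listing above; it may differ on your system.
describe_cmd_file /var/log/nagios/rw/nagios.cmd
```

On the broken box this would have reported `missing`; on the working system, `fifo`.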
So we need to create it
root@nagios:/var/log/nagios/rw# mknod nagios.cmd p
root@nagios:/var/log/nagios/rw# chown nagios:apache nagios.cmd
root@nagios:/var/log/nagios/rw# chmod 660 nagios.cmd
root@nagios:/var/log/nagios/rw# ls -la
total 8
drwxrwxr-x 2 nagios apache 4096 Oct 30 13:37 .
drwxrwxr-x 5 nagios apache 4096 Oct 30 13:43 ..
prw-rw---- 1 nagios apache 0 Oct 30 13:37 nagios.cmd
root@nagios:/var/log/nagios/rw# /etc/init.d/nagios restart
A quick check of the site and .. same error? Still broken? Noooooooooooo
Let’s have another look ..
root@mm2su0:/var/log/nagios/rw# ls -al
total 8
drwxrwxr-x 2 nagios apache 4096 Oct 30 13:45 .
drwxrwxr-x 5 nagios apache 4096 Oct 30 13:45 ..
prw-rw---- 1 nagios nagios 0 Oct 30 13:45 nagios.cmd
Nice, nagios reset the ownership to nagios:nagios on restart, so apache still can’t write to it. And I’m not setting apache to run as the nagios user 🙁
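If you’d rather not squint at `ls -al` output, the owner, group and mode can be pulled out directly. This is only a sketch: `show_access` is a hypothetical helper, it assumes GNU `stat` (as shipped on RedHat), and the path is the one used in this post.

```shell
#!/bin/sh
# Hypothetical helper: print "owner group octal-mode" for a path.
# Assumes GNU stat; the -c format option is not portable to BSD stat.
show_access() {
    stat -c '%U %G %a' "$1"
}

# Path from this post; adjust for your install.
cmd_file=/var/log/nagios/rw/nagios.cmd
if [ -e "$cmd_file" ]; then
    show_access "$cmd_file"
else
    echo "$cmd_file not found"
fi
```

For the listing above that would be `nagios nagios 660`: only the nagios user and members of the nagios group can write to the pipe.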
A look in the init file for nagios shows that it actually manages the file itself, so we didn’t need to create it by hand (strange that it wasn’t there when I looked, then ..). The init file handles both creation and removal of the file:
root@nagios:/var/log/nagios/rw# /etc/init.d/nagios stop
Stopping nagios: done.
root@nagios:/var/log/nagios/rw# ls
root@nagios:/var/log/nagios/rw# /etc/init.d/nagios start
Starting nagios: done.
root@nagios:/var/log/nagios/rw# ls
nagios.cmd
root@nagios:/var/log/nagios/rw# ls
nagios.cmd
root@nagios:/var/log/nagios/rw# ls -al
total 8
drwxrwxr-x 2 nagios apache 4096 Oct 30 13:52 .
drwxrwxr-x 5 nagios apache 4096 Oct 30 13:52 ..
prw-rw---- 1 nagios nagios 0 Oct 30 13:52 nagios.cmd
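To fix this I’m going to add the apache user to the nagios group. Note the -a: without it, -G replaces apache’s existing supplementary groups instead of appending to them.
[code]
root@nagios:/var/log/nagios/rw# usermod -a -G nagios apache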
root@nagios:/var/log/nagios/rw# /etc/init.d/httpd restart
[/code]
No more error, problem solved!
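To double-check the change rather than trusting the web page, you can confirm the group membership from the shell. A rough sketch, assuming the user and group are named apache and nagios as on my box; `has_group` is a hypothetical helper, not a standard command.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first argument appears among the rest.
has_group() {
    wanted="$1"
    shift
    for g in "$@"; do
        [ "$g" = "$wanted" ] && return 0
    done
    return 1
}

# id -Gn prints all group names for the user, separated by spaces.
# The unquoted $(...) is deliberate so each group becomes its own argument.
if has_group nagios $(id -Gn apache 2>/dev/null); then
    echo "apache is in the nagios group"
else
    echo "apache is NOT in the nagios group"
fi
```

Keep in mind that a running process only picks up new group membership when it is started, which is why httpd needed the restart.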
If you want to become a more advanced Nagios administrator, I recommend Nagios by O’Reilly. It’s full of best practice implementation advice.