I’m currently setting up some new clusters under RedHat, and each cluster is getting its own Nagios instance. Trying to use the web-based management interface, however, threw an error.
I’ve seen this error before on Ubuntu and was now getting it again under RedHat. Of course I revisited my Ubuntu solution and realised that it didn’t help at all: it relied on dpkg overrides, and the situation was very different!
[code]
root@nagios:/var/log/nagios/rw# ls -al
total 8
drwxrwxr-x 2 nagios apache 4096 Oct 30 13:37 .
drwxrwxr-x 5 nagios apache 4096 Oct 30 13:40 ..
[/code]
The command file, nagios.cmd, didn’t exist 🙁
A quick scan of a working system showed:
[code]
prw-rw---- 1 nagios nagcmd 0 2009-10-15 13:32 nagios.cmd
[/code]
It’s a pipe! Woohoo.
So we need to create it:
[code]
root@nagios:/var/log/nagios/rw# mknod nagios.cmd p
root@nagios:/var/log/nagios/rw# chown nagios:apache nagios.cmd
root@nagios:/var/log/nagios/rw# chmod 660 nagios.cmd
root@nagios:/var/log/nagios/rw# ls -la
total 8
drwxrwxr-x 2 nagios apache 4096 Oct 30 13:37 .
drwxrwxr-x 5 nagios apache 4096 Oct 30 13:43 ..
prw-rw---- 1 nagios apache 0 Oct 30 13:37 nagios.cmd
root@nagios:/var/log/nagios/rw# /etc/init.d/nagios restart
[/code]
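If you'd rather check the command file from a script than eyeball `ls` output, something like the following might do. It is only a rough sketch: `describe_cmd_file` is a name I made up for illustration, and the path in the usage comment is the one from this box, so adjust it for yours.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): report whether a path exists,
# whether it is a named pipe, and who owns it.
describe_cmd_file() {
    path="$1"
    if [ ! -e "$path" ]; then
        echo "missing: $path"
        return 1
    fi
    if [ -p "$path" ]; then
        # -p is true only for named pipes (FIFOs)
        echo "pipe: $path"
    else
        echo "not a pipe: $path"
        return 2
    fi
    # Long listing so the owner, group and mode are visible
    ls -l "$path"
}

# Usage on this setup (path is specific to my install):
#   describe_cmd_file /var/log/nagios/rw/nagios.cmd
```

The return codes are arbitrary; they just let a wrapper script tell "missing" apart from "exists but is a regular file".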
A quick check of the site and .. same error? Still broken? Noooooooooooo
Let’s have another look...
[code]
root@mm2su0:/var/log/nagios/rw# ls -al
total 8
drwxrwxr-x 2 nagios apache 4096 Oct 30 13:45 .
drwxrwxr-x 5 nagios apache 4096 Oct 30 13:45 ..
prw-rw---- 1 nagios nagios 0 Oct 30 13:45 nagios.cmd
[/code]
Nice, Nagios reset the group on the pipe to nagios, so apache can’t write to it. I’m not setting apache to run as the nagios user 🙁
A look in the init file for Nagios shows that it manages the file itself, so we didn’t actually need to make it (strange that it wasn’t there when I looked, then...). The init file handles both creation and removal of the file:
[code]
root@nagios:/var/log/nagios/rw# /etc/init.d/nagios stop
Stopping nagios: done.
root@nagios:/var/log/nagios/rw# ls
root@nagios:/var/log/nagios/rw# /etc/init.d/nagios start
Starting nagios: done.
root@nagios:/var/log/nagios/rw# ls
nagios.cmd
root@nagios:/var/log/nagios/rw# ls
nagios.cmd
root@nagios:/var/log/nagios/rw# ls -al
total 8
drwxrwxr-x 2 nagios apache 4096 Oct 30 13:52 .
drwxrwxr-x 5 nagios apache 4096 Oct 30 13:52 ..
prw-rw---- 1 nagios nagios 0 Oct 30 13:52 nagios.cmd
[/code]
To fix this I’m going to stick the apache user in the nagios group.
[code]
root@nagios:/var/log/nagios/rw# usermod -a -G nagios apache
root@nagios:/var/log/nagios/rw# /etc/init.d/httpd restart
[/code]
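To confirm the group change actually covers the pipe, you could compare the file's group against the user's groups. This is a minimal sketch, assuming GNU `stat` (as shipped with RedHat); `user_in_file_group` is a hypothetical helper name, and the user and path in the usage comment are the ones from this setup.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given user belongs to the group
# that owns the given file. Assumes GNU stat (-c %g prints the numeric GID).
user_in_file_group() {
    user="$1"
    file="$2"
    file_gid=$(stat -c %g "$file") || return 2
    user_gids=$(id -G "$user") || return 2
    # id -G prints all of the user's group IDs, space separated
    for gid in $user_gids; do
        if [ "$gid" = "$file_gid" ]; then
            return 0
        fi
    done
    return 1
}

# Usage on this setup (user and path are specific to my install):
#   user_in_file_group apache /var/log/nagios/rw/nagios.cmd && echo "apache is in the pipe's group"
```

Group membership alone isn't the whole story: the pipe also needs group write permission (the `rw` in the middle of `prw-rw----`), and apache needs to be restarted to pick up its new group.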
No more error, problem solved!
If you want to become a more advanced Nagios administrator, I recommend Nagios by O’Reilly. It’s full of best practice implementation advice.
