xend refusing to start

We recently had a few power outages at work, some scheduled and some not, and they played havoc with our Xen servers.

One of the problems was that xend would not start (and therefore neither would xendomains).

Checking /var/log/xen/xend.log gave us the following snippet:

```
    inst = XendNode()
  File "/usr/lib/python2.5/site-packages/xen/xend/XendNode.py", line 164, in __init__
    saved_pifs = self.state_store.load_state('pif')
  File "/usr/lib/python2.5/site-packages/xen/xend/XendStateStore.py", line 104, in load_state
    dom = minidom.parse(xml_path)
  File "xml/dom/minidom.py", line 1913, in parse
  File "xml/dom/expatbuilder.py", line 924, in parse
  File "xml/dom/expatbuilder.py", line 211, in parseFile
ExpatError: no element found: line 1, column 0
[2008-03-10 21:37:40 18122] INFO (__init__:1094) Xend exited with status 1.
```
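The bottom of that traceback is Python's standard `xml.dom.minidom` parser giving up on an XML file. As a quick sanity check, here is a minimal sketch (assuming a modern Python 3 rather than the Python 2.5 that xend runs under) showing that handing `minidom.parse` a completely empty file produces the same `ExpatError`. `parse_error_for_empty_file` is a hypothetical helper written just for this check.

```python
import os
import tempfile
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_error_for_empty_file():
    """Create a 0-byte file and return the error message minidom raises for it."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    os.close(fd)  # leave the file empty, like the damaged state file
    try:
        minidom.parse(path)
    except ExpatError as err:
        return str(err)
    finally:
        os.remove(path)  # always clean up the temporary file
    return None  # parsing unexpectedly succeeded


if __name__ == "__main__":
    print(parse_error_for_empty_file())
```

Running it should print the same `no element found` message seen in xend.log, a hint that the file xend tried to load was empty rather than just malformed.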

A quick Google search for that error turned up several people who had hit the same problem, but no actual answer!

It looked like xend was having trouble parsing an XML file, so a bit of quick mental inspiration and the find command turned up /var/lib/xend/state/pif.xml, which was a 0-byte file! Comparing it with a working server showed that it should (or at least could) contain this:

`

`
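Tracking down the damaged file by hand worked, but the check can also be scripted. The sketch below uses a hypothetical helper, `find_broken_state_files`, which is not part of xend. It assumes the state files sit directly in /var/lib/xend/state with a `.xml` extension, flags any that are 0 bytes, and tries to parse the rest with the same `minidom.parse` call that appears in the traceback. It only reports problems; it doesn't repair or modify anything.

```python
import os
import sys
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Directory taken from this post; other Xen versions may keep state elsewhere.
STATE_DIR = "/var/lib/xend/state"


def find_broken_state_files(state_dir=STATE_DIR):
    """Return (path, reason) pairs for .xml files that are empty or unparseable."""
    broken = []
    for name in sorted(os.listdir(state_dir)):
        if not name.endswith(".xml"):
            continue
        path = os.path.join(state_dir, name)
        if os.path.getsize(path) == 0:
            broken.append((path, "empty (0 bytes)"))
            continue
        try:
            minidom.parse(path)  # the same parser call seen in the traceback
        except ExpatError as err:
            broken.append((path, "parse error: %s" % err))
    return broken


if __name__ == "__main__":
    # Optionally pass a different state directory as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else STATE_DIR
    if os.path.isdir(target):
        for path, reason in find_broken_state_files(target):
            print("%s: %s" % (path, reason))
    else:
        print("no such directory: %s" % target)
```

Run on the affected Dom0, it would presumably have listed pif.xml as empty; a working server should produce no output.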

One copy and paste later, we had a working xend! However, it refused to create any of the xenlets:

```
root@xen0:/etc/xen# xm create server0.cfg
Using config file "./server0.cfg".
Error: The privileged domain did not balloon!
```

This was despite there being plenty of RAM!

```
root@xen0:/var/log/xen# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  7928     8     r-----    832.8
root@xen0:/var/log/xen# free
             total       used       free     shared    buffers     cached
Mem:       8119416     393028    7726388          0      11344      58832
-/+ buffers/cache:     322852    7796564
Swap:     15631224          0   15631224
```

An strace of the process revealed that Xen did think it had less memory available than it actually had:

```
[2008-03-10 21:47:48 18620] DEBUG (__init__:1094) Balloon: 131064 KiB free; 0 to scrub; need 524288; retries: 20.
```
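That Balloon line packs the key numbers together. Here is a small sketch that pulls them out and works out the shortfall, assuming the log format matches the single line quoted here and that the `need` figure is in KiB like the `free` figure beside it (the log line itself gives no unit for `need`). `balloon_shortfall` is a hypothetical helper, not part of xend.

```python
import re

# Pattern built from the one xend.log Balloon line quoted in this post.
BALLOON_RE = re.compile(
    r"Balloon: (?P<free>\d+) KiB free; (?P<scrub>\d+) to scrub;\s*need (?P<need>\d+);"
)


def balloon_shortfall(log_line):
    """Return (free_kib, need_kib, shortfall_kib) parsed from a Balloon debug line."""
    match = BALLOON_RE.search(log_line)
    if match is None:
        raise ValueError("not a Balloon line: %r" % log_line)
    free_kib = int(match.group("free"))
    need_kib = int(match.group("need"))  # assumption: 'need' is in KiB, like 'free'
    return free_kib, need_kib, max(need_kib - free_kib, 0)


if __name__ == "__main__":
    line = ("[2008-03-10 21:47:48 18620] DEBUG (__init__:1094) Balloon: "
            "131064 KiB free; 0 to scrub; need 524288; retries: 20.")
    free_kib, need_kib, short_kib = balloon_shortfall(line)
    print("free %.1f MiB, need %.1f MiB, short by %.1f MiB"
          % (free_kib / 1024, need_kib / 1024, short_kib / 1024))
```

For the line from our log it prints `free 128.0 MiB, need 512.0 MiB, short by 384.0 MiB`. So xend saw only about 128 MiB of unallocated memory, while `xm list` shows Domain-0 holding 7928 MiB.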

As we finally had a working xend, we decided to apply a technique we’d learned from working with Windows machines and rebooted the server. This magically fixed the memory issue, though it would have been nice to know what actually caused it and whether there was a proper fix.