Last Friday, failover and Stonith were evil…
Friday, end of the week, almost week-end, almost time to enjoy the warm weather of my hometown and yet…
As I said in the introduction, it was Friday and we were ready to go back to France for the weekend. But something went wrong… on a Friday… of course. First we noticed that one of the servers was down. Hum? This server got fenced, but why? I logged in to the server and first checked the available space, as I always do. That is how I discovered that
/ was full. But how in hell? I looked for the biggest files and saw something like:
-rw-r--r-- 1 root root 1.6G Nov 26 15:57 rmtab
What? But rmtab lives in
/var, which has its own partition. For some unknown reason, at the last reboot the system wasn't able to mount every partition and dumped everything on / instead.
How did everything collapse?
The exportfs resource got restarted one more time, but for this kind of incident, once is once too often:
- Pacemaker attempted to restart the exportfs resource
- The resource failed to stop because a function call took ages and exceeded the timeout value set in the configuration
- The node got fenced
- The resource migrated and started on the other node
- The `restore_rmtab` function was called and restored the whole content of the backup from the shared storage into the local rmtab
- The other node went full as well
- Eventually it got fenced by the other, equally chaotic, node
- Over and over again, until I disabled Stonith
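For reference, breaking such a fencing loop is a single cluster-wide property change with crmsh (and obviously something to re-enable once the storm is over):

```shell
# Disable Stonith cluster-wide to stop the fencing ping-pong
$ sudo crm configure property stonith-enabled=false
```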
Fortunately, a quick search on Google and the linux-ha mailing list led me to the solution.
Say thank you to the exportfs Pacemaker resource agent from the official Ubuntu 12.04 repository! The rmtab file contains the list of all the filesystems currently mounted by remote machines. When a filesystem is unmounted by a remote machine, the line in the rmtab is just commented out, not deleted.
rmtab is used by rpc.mountd. This file needs to be synchronized between both nodes to ensure a smooth failover and client reconnection. An entry looks like this when the filesystem is mounted:
When the filesystem is unmounted:
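The original example lines did not survive here, so the entries below are only illustrative (client name and export path are made up):

```
# while the filesystem is mounted:
client.example.com:/mnt/export:0x00000001

# after the client unmounts (same line, commented out, as described above):
#client.example.com:/mnt/export:0x00000001
```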
Pacemaker backs up (by default) the
rmtab at the root of the mount point, as a hidden file called `.rmtab`.
This can be changed via the `rmtab_backup` primitive parameter, but the backup always needs to stay on the shared export, otherwise the mechanism introduced by the RA doesn't make sense anymore. Since I already exposed who messed up everything, let's analyse the content of both functions.
Exportfs backup function from Official Ubuntu 12.04 repository:
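The snippet itself got lost here, so this is a sketch of the backup logic as I understand it from the agent (variable names follow the OCF conventions; the rmtab path is parameterised purely for illustration, the real agent hardcodes `/var/lib/nfs/rmtab`):

```shell
# Sketch of the backup logic, not the verbatim Ubuntu 12.04 function.
# OCF_RESKEY_directory:    the exported directory
# OCF_RESKEY_rmtab_backup: backup file name inside the export (".rmtab" by default)
RMTAB="${RMTAB:-/var/lib/nfs/rmtab}"

backup_rmtab() {
    local rmtab_backup="${OCF_RESKEY_directory}/${OCF_RESKEY_rmtab_backup}"
    # keep only the entries that belong to this export
    grep ":${OCF_RESKEY_directory}:" "$RMTAB" > "$rmtab_backup"
}
```

Nothing wrong so far: the backup file only ever contains the current entries of this export.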
Exportfs restore function from Official Ubuntu 12.04 repository:
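Again, the original snippet is missing; the important part of the Ubuntu 12.04 restore logic boiled down to a blind append (sketch, not verbatim, same illustrative conventions as above):

```shell
# Sketch of the buggy restore logic shipped in Ubuntu 12.04 (not verbatim).
RMTAB="${RMTAB:-/var/lib/nfs/rmtab}"

restore_rmtab() {
    local rmtab_backup="${OCF_RESKEY_directory}/${OCF_RESKEY_rmtab_backup}"
    if [ -r "$rmtab_backup" ]; then
        # blind append: every failover re-adds the whole backup,
        # nothing ever deduplicates, so the rmtab only grows
        cat "$rmtab_backup" >> "$RMTAB"
    fi
}
```

Run this once per failover and the rmtab can only grow.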
As you can see, the problem is in the `restore_rmtab` function: it appends the content of the backup stored on the shared storage to the local rmtab, and nothing ever deduplicates the entries. The upstream version of the RA (available on GitHub) prevents the rmtab from growing infinitely by merging the files with the `sort -u` command.
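A sketch of the upstream approach, with the same illustrative conventions as before (the actual function on GitHub differs in details such as logging and file permissions):

```shell
# Sketch of the upstream (ClusterLabs) restore logic: merge with sort -u
# instead of blindly appending, so duplicates never accumulate.
RMTAB="${RMTAB:-/var/lib/nfs/rmtab}"

restore_rmtab() {
    local rmtab_backup="${OCF_RESKEY_directory}/${OCF_RESKEY_rmtab_backup}"
    if [ -r "$rmtab_backup" ]; then
        local tmpf
        tmpf=$(mktemp)
        # union of the backup and the current rmtab, deduplicated
        sort -u "$rmtab_backup" "$RMTAB" > "$tmpf" && cat "$tmpf" > "$RMTAB"
        rm -f "$tmpf"
    fi
}
```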
Thanks to this added piece of code, we don't experience this issue anymore.
Simple Pacemaker setup:

```
============
Last updated: Mon Nov 26 21:43:28 2012
Last change: Mon Nov 26 15:37:25 2012 via cibadmin on c2-nfs-01
Stack: openais
Current DC: c2-nfs-01 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ c2-nfs-01 c2-nfs-02 ]

 Resource Group: g_exportfs
     p_vip_exportfs   (ocf::heartbeat:IPaddr2):     Started c2-nfs-01
     p_fs_exportfs    (ocf::heartbeat:Filesystem):  Started c2-nfs-01
     p_export         (ocf::heartbeat:exportfs):    Started c2-nfs-01
```
And then one client that mounts the NFS share.
It was fairly easy to reproduce the problem and to watch the `grep` action and the `>>` append from the backup and restore functions at work:
$ sudo cat /var/lib/nfs/rmtab
I had to stop the process because my
/var was almost full. In the end, I restarted the resource around 30 times and got a 1.2G file.
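For the record, the repeated restarts were nothing fancy, just a shell loop around crmsh (resource name taken from the setup above):

```shell
# restart the exportfs resource 30 times to make the rmtab balloon
$ for i in $(seq 1 30); do sudo crm resource restart p_export; done
```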
Upgrade the exportfs RA:
$ sudo wget https://raw.github.com/ClusterLabs/resource-agents/master/heartbeat/exportfs
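The downloaded agent then has to replace the packaged one under the standard OCF directory (path assumed from the usual Ubuntu layout; don't forget the executable bit):

```shell
$ sudo install -m 755 exportfs /usr/lib/ocf/resource.d/heartbeat/exportfs
```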
It took 4 seconds to clean up all the redundant entries in the rmtab.
Useful links:
It almost looked like Friday the 13th…