We had some trouble with our [FreeBSD][] systems over the holiday shutdown a few weeks ago. Our backup generator didn’t kick in during an extended power outage, and our UPSes didn’t provide enough runtime to see us through. Result: all our systems crashed. (Before anybody mentions it… Yes, I *know* I should have installed [NUT][] and configured our network to shut down in such cases.)
When I got to the office and started powering systems back up, I noticed the expected messages about the filesystems not being dismounted properly. After all the systems were up, I went back around and started logging in to survey the damage. Here’s an example of what happened when I tried to run [fsck][] on one of our filesystems:

    # fsck -y /tmp
    ** /dev/twed0s1e (NO WRITE)
    ** Last Mounted on /tmp
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    .
    .
    .

I had never seen “(NO WRITE)” show up in fsck output before. A little searching turned up [this post][], which explains that *(NO WRITE)* means the filesystem is mounted, so fsck cannot write to it. I went back and rebooted each system into single-user mode. Then I was able to fsck all the local filesystems and reboot cleanly.

    # init 1
    # fsck -y -t ufs
    # reboot

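Since *(NO WRITE)* shows up when the filesystem is mounted, a quick check before running fsck can tell you what to expect. The following is only a minimal sketch: `is_mounted` is a hypothetical helper (not part of FreeBSD), and it assumes `mount` prints lines of the form `device on mountpoint ...` and that the mount point contains no spaces.

```shell
# is_mounted MOUNTPOINT
# Reads `mount`-style output on stdin and succeeds (exit 0) if MOUNTPOINT
# appears as the third field of a "device on mountpoint ..." line.
# Hypothetical helper for illustration; mount points with spaces are not handled.
is_mounted() {
    awk -v mp="$1" '$2 == "on" && $3 == mp { found = 1 } END { exit !found }'
}

# Example use (commented out so the sketch has no side effects):
# if mount | is_mounted /tmp; then
#     echo "/tmp is mounted; expect fsck to report (NO WRITE)"
# fi
```

If the check succeeds, fsck will most likely only report problems rather than fix them, which is the situation described above.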
[FreeBSD]: http://www.freebsd.org/
[NUT]: http://networkupstools.org/ "Network UPS Tools"
[fsck]: http://www.FreeBSD.org/cgi/man.cgi?query=fsck&sektion=8&apropos=0&manpath=FreeBSD+7.1-RELEASE
[this post]: http://markmail.org/message/ijqcch4exhvcznmr "Message explaining (NO WRITE)"
The generator didn’t kick in? Impossible, it always works 🙂
This afternoon I found out what happened to the generator. We had a scheduled power outage on Dec 26. When the electrician came out here, he (helpfully) started up the generator before he cut the main power. But because the power was “on” (due to the generator), the cut-over circuit (from utility power to the generator) didn’t trip. The generator was effectively idling; meanwhile, none of the emergency circuits had any power. It took about 20 minutes to fix the problem, but none of our UPSes have enough battery capacity to run that long.
On the “plus” side, I finally have approval to implement NUT here! 🙂
Well, a long time has passed since your blog post, so this probably isn’t relevant anymore.
If fsck was fixing actual inconsistencies during the scan (not just removing unused blocks), then you might want to add background_fsck="NO" to your /etc/rc.conf.
The goal of SoftUpdates is to reorder writes so that any power failure results mostly in unnecessarily allocated blocks and no data is ever lost. The whole point of background fsck is to reclaim those blocks.
The problem is with drives that increase performance by reordering writes (most often through a write cache). When that happens, all the hard work done by SoftUpdates is lost, and the drive can end up in an inconsistent state.
If that happens and you have background fsck enabled, you might miss any indications that something is wrong with the filesystem, and by using an inconsistent FS you might increase the chances of data loss…
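For anyone who wants to apply the background_fsck setting suggested above without editing the file by hand, here is a rough sketch. `set_rc_var` is a hypothetical helper, not a FreeBSD tool; it assumes simple one-line `name="value"` assignments and a variable name without regex metacharacters.

```shell
# set_rc_var FILE NAME VALUE
# Sets NAME="VALUE" in an rc.conf-style FILE: any existing one-line assignment
# of NAME is dropped and the new assignment is appended at the end.
# Hypothetical helper for illustration; take a backup of the real file first.
set_rc_var() {
    file=$1 name=$2 value=$3
    tmp=$(mktemp "${TMPDIR:-/tmp}/rcconf.XXXXXX") || return 1
    # grep -v exits non-zero when it outputs nothing, so tolerate that case.
    grep -v "^${name}=" "$file" > "$tmp" || true
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    # Overwrite in place so the original file keeps its owner and mode.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example use (as root, commented out so the sketch has no side effects):
# set_rc_var /etc/rc.conf background_fsck NO
```

On FreeBSD releases that ship sysrc(8), `sysrc background_fsck=NO` should achieve the same result and is probably the better choice.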