Let’s do the Time Warp again

I know I should update this blog more often, but I keep having to deal with problems which are blog-worthy. There’s an irony. I have a lovely post coming up about Mail.app adding CRLFs to text files, for example. That episode is enough to have me looking at using Outlook for work email.

Anyway, this post is about the importance of backups. Plural. One backup is never enough, as I was reminded yesterday.

Our MSci iMacs are backed up to a QNAP NAS which offers Time Machine compatibility. The only officially supported network Time Machine targets are Apple’s Time Capsule and storage served up by OS X Server. Neither works well for us: Time Capsule is a home technology, and I’ve had enough problems with OS X Server (post Snow Leopard) that the proverbial wild horses would not get me back to using it again. I wanted networked Time Machine because we had a small issue with roof leaks, which meant that machines and their USB-attached backup drives were getting soaked (fair play to Apple, one iMac has been rained on twice and still works fine). The QNAP seemed like a reasonable choice.

On a local USB disk Time Machine works by copying files directly to the drive; simple and efficient. On a network disk, no matter what is serving it, it creates what’s called a sparse bundle disk image. This is a directory which emulates a single file, sort of like an ISO CD image. The directory contains multiple smaller files, called bands, which sort-of correspond to sectors on the virtual drive. These are 8 MB each, and the idea is that only the bands which are actually needed get created. The problem with this approach is that for large disk images, say around the TB level, you’re looking at maybe 120,000 of them, which might be a lot of overhead for the server to deal with.
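As a rough check on that band count, here is a minimal Python sketch. It assumes the 8 MB band size means 8 MiB and that the image is fully populated; the function name `band_count` is made up for illustration.

```python
def band_count(image_bytes, band_bytes=8 * 1024**2):
    """Number of band files needed if every band of the image is in use.

    Assumes 8 MiB bands; a sparse image only creates bands that hold data,
    so a real bundle will usually contain fewer files than this.
    """
    return -(-image_bytes // band_bytes)  # ceiling division on integers


if __name__ == "__main__":
    print(band_count(10**12))    # a decimal terabyte
    print(band_count(1024**4))   # a binary tebibyte
```

A decimal terabyte comes out at a little under 120,000 bands and a tebibyte at a little over 130,000, so the ballpark figure holds either way.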

A machine had a hard drive failure, so I brought up the spare and started to restore files from the Time Machine backup using Migration Assistant. All went OK apart from a 500 GB directory, which would only copy at around 100-200 kB/s, and that on a gigabit LAN capable of up to 100 MB/s. I tried many different options to get at the data, but no matter what I did the machine was intolerably slow, and it crashed entirely twice. At this point I was most unhappy, and found myself wishing for a second backup (which I had said we needed and been overruled on).

In the end, I managed to get the data off the machine by turning off all Apple file sharing, mounting the Time Machine partition via NFS on my iMac, opening the disk image and copying the files out from there. That worked OK, giving me the expected 50 MB/s transfers. So clearly the disk image itself was fine; the QNAP just could not serve it efficiently over the AFP protocol.
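To put the two speeds in perspective, here is a back-of-the-envelope sketch in Python. It assumes decimal units (500 GB = 500 × 10^9 bytes) and a constant transfer rate, which real copies never quite manage; `transfer_hours` is just an illustrative name.

```python
def transfer_hours(size_bytes, rate_bytes_per_s):
    """Hours needed to move size_bytes at a steady rate_bytes_per_s."""
    return size_bytes / rate_bytes_per_s / 3600


if __name__ == "__main__":
    size = 500 * 10**9  # the 500 GB directory, in decimal bytes
    rates = [
        ("AFP at 200 kB/s", 200 * 10**3),
        ("NFS at 50 MB/s", 50 * 10**6),
    ]
    for label, rate in rates:
        print(f"{label}: {transfer_hours(size, rate):.1f} hours")
```

At the better end of the AFP range that works out to roughly 29 days for the one directory, against a little under three hours over NFS.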

Lessons from this? Firstly, always have more than one backup. I’ve already ordered some USB hard drives and set up an rsync script to a remote server as a stopgap until a better networked solution is in place. Secondly, I don’t think that networked Time Machine is a good idea, however it’s done. On my home network I’ve had issues with disk images getting corrupted on my Time Capsule, or just failing without an error, and that’s with 100% Apple kit. Relying on an unapproved third-party work-alike for important things is not worth the risk. In the future I think I’ll be using locally attached hard drives for Time Machine, and some other network arrangement for disaster recovery – rsync and ChronoSync are top of my list. Along with some proper ‘enterprise’ grade storage…
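For anyone wanting a similar stopgap, a wrapper around rsync might look something like the sketch below. This is not the actual script mentioned above, just an illustration: the source path, the `backup.example.org` host and the function names are all hypothetical, and it assumes rsync and SSH key access are already set up. It defaults to a dry run, since `--delete` removes files on the destination that no longer exist at the source.

```python
import shlex
import subprocess


def build_rsync_command(source_dir, remote, dry_run=True):
    """Build an rsync command that mirrors source_dir to remote.

    The trailing slash on the source makes rsync copy the directory's
    contents rather than the directory itself.
    """
    src = source_dir.rstrip("/") + "/"
    cmd = ["rsync", "-a", "--delete"]  # archive mode, mirror deletions
    if dry_run:
        cmd.append("--dry-run")  # report what would change, copy nothing
    cmd += [src, remote]
    return cmd


def run_backup(source_dir, remote, dry_run=True):
    """Run the backup, raising CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(source_dir, remote, dry_run), check=True)


if __name__ == "__main__":
    # Hypothetical paths and host, for illustration only.
    cmd = build_rsync_command(
        "/Users/Shared/projects",
        "backup@backup.example.org:/srv/backups/imac01/",
    )
    print(shlex.join(cmd))  # show the command instead of running it
```

Calling `run_backup(..., dry_run=False)` from a scheduled job would do the real copy, once the dry-run output looks sensible.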

Now, back to playing with Outlook. Sigh.