Saturday, January 25, 2014

Adventures in the 512 Sector

I'm going to talk for a moment about the importance of testing your backups. When I say you should test your backups, I don't mean just checking the logs to see whether the jobs succeeded, or restoring a file or two to make sure you can actually read the backup files. That's a good start, don't get me wrong, but if you really want to find out how good your backups are, you need to try restoring everything and see what happens.

I did just that last week, though that wasn't the plan. I was putting together a virtual lab to track down some performance issues we're having with a line-of-business application at work; essentially, I needed to know whether the problem was our network hardware, our server software, our PCs, or the application itself. Incredibly mundane stuff. I also wanted to find out how hard it would be to migrate our existing VMware ESX 5 virtual machines to Hyper-V. We're already paying for Windows licensing through our educational Campus Agreement, and we're not running a data center, just a couple of servers, so continuing to pay VMware for maintenance and support is a little wasteful. On top of that, we recently rolled out a StorageCraft ShadowProtect-based backup regimen, and this seemed as good a time as any to find out whether I could restore the VMware virtual machines backed up with this software into a different virtualization platform, and what that would entail.

Now, I'm a big fan of ShadowProtect - been using it for years in the field and its hardware independent restore functionality has saved my bacon more times than I'd care to admit. Because of that, I have the system restore procedure pretty thoroughly burned into my retinas:
  1. Insert ShadowProtect Recovery CD into destination hardware. Boot to CD.
  2. Attach drive that contains backup files.
  3. Delete any partitions existing on the destination drive.
  4. Create source partitions on the destination hardware.
  5. Restore ShadowProtect backups to destination hardware, making sure the Hardware Independent Restore check box is checked.
  6. Reboot and, if you're restoring SQL or Active Directory, hit F8, get into DSRM or Safe Mode with Networking, and remove any old network adapters from the device list.
So, the plan was to restore a couple of domain controllers (one of which doubles as a DHCP server), our line of business database server, and then get a couple of virtual Windows 7 hosts to connect to the restored network and do some testing. 

Things went sideways almost immediately.

First, I overthought the restore: instead of using ShadowProtect's built-in drivers for the Hyper-V VMs, I added Hyper-V's drivers to the restore process. That didn't play well with the VMware Tools installation that was already on the source VMs, so, upon rebooting, they blue-screened.

Then I just gave up on thinking entirely. After repeating the restore with just the stock ShadowProtect driver package, I finally got the servers to boot into DSRM or Safe Mode with Networking. Showed hidden devices in Device Manager, removed the old network cards, installed Hyper-V Integration Services, reboot... blue screen. Turns out VMware Tools and the Hyper-V Integration Services don't play well together, in no small part because VMware Tools tells Windows to boot from SCSI, while Hyper-V only supports SCSI boot in very narrowly defined circumstances.

Okay, fine. Perform the restore one more time, this time uninstalling VMware Tools on the domain controllers while working in DSRM, and disabling VMware Tools using msconfig on the SQL server. Reboot. The SQL server comes back to life without any issues; the domain controllers, however, blue-screen again with the following error message:

STOP: c00002e2 Directory Services could not start because of the following error: 
A device attached to the system is not functioning. 

- Error Status: 0xc0000001

Well, that's new. I reboot the domain controllers back into DSRM and check the event log. There, in the Active Directory log*, was this:

Error: the log file sector size does not match the current volume's sector size (-546)

After a bit of sleuthing, and running across this article on Exchange 2010 throwing the same error along with this article on using VHDs on large-capacity drives, it dawned on me: I was restoring to VHDX files, which report a 4K (4096-byte) sector size, from backups taken of a virtual drive with 512-byte sectors. So I converted the domain controllers' VHDXs to VHDs, mounted the VHDs as boot volumes, and then started the virtual machines.
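If you want to check what sector sizes a VHDX will advertise before restoring into it, here's a minimal Python sketch that reads them straight out of the file's metadata region. It's based on my reading of Microsoft's published MS-VHDX file format specification, so treat the offsets and GUIDs as assumptions worth double-checking against the spec. It skips checksum validation and ignores the second (backup) region table, and the function name read_sector_sizes is just something I made up for illustration. On a Hyper-V host, the Get-VHD cmdlet's LogicalSectorSize and PhysicalSectorSize properties should give you the same information with less effort.

```python
import struct
import uuid

# Assumption: offsets and GUIDs below follow my reading of the MS-VHDX spec;
# verify them against the spec before relying on this sketch.
REGION_TABLE_OFFSET = 192 * 1024  # first region table copy (0x30000)
METADATA_REGION_GUID = uuid.UUID("8B7CA206-4790-4B9A-B8FE-575F050F886E")
LOGICAL_SECTOR_GUID = uuid.UUID("8141BF1D-A96F-4709-BA47-F233A8FAAB5F")
PHYSICAL_SECTOR_GUID = uuid.UUID("CDA348C7-445D-4471-9CC9-E9885251C556")


def read_sector_sizes(path):
    """Return (logical, physical) sector sizes advertised by a VHDX file.

    Hypothetical helper: checksums and the backup region table are NOT checked.
    """
    with open(path, "rb") as f:
        # Every VHDX starts with the 8-byte file type identifier "vhdxfile".
        if f.read(8) != b"vhdxfile":
            raise ValueError("not a VHDX file")

        # Region table header: signature, checksum, entry count, reserved.
        f.seek(REGION_TABLE_OFFSET)
        sig, _checksum, count, _reserved = struct.unpack("<4sIII", f.read(16))
        if sig != b"regi":
            raise ValueError("region table signature not found")

        # Each 32-byte entry: GUID, file offset, length, flags.
        meta_offset = None
        for _ in range(count):
            raw_guid, offset, _length, _flags = struct.unpack("<16sQII", f.read(32))
            if uuid.UUID(bytes_le=raw_guid) == METADATA_REGION_GUID:
                meta_offset = offset
        if meta_offset is None:
            raise ValueError("metadata region not found")

        # Metadata table header is 32 bytes: signature, reserved, entry count, reserved.
        f.seek(meta_offset)
        sig, _res, count = struct.unpack("<8sHH", f.read(12))
        if sig != b"metadata":
            raise ValueError("metadata table signature not found")
        f.seek(meta_offset + 32)

        # Each 32-byte entry: item GUID, offset (relative to region start), length, flags, reserved.
        items = {}
        for _ in range(count):
            raw_guid, item_offset, length, _flags, _res = struct.unpack(
                "<16sIIII", f.read(32)
            )
            items[uuid.UUID(bytes_le=raw_guid)] = (item_offset, length)

        # Both sector-size items are stored as little-endian 32-bit integers.
        sizes = []
        for guid in (LOGICAL_SECTOR_GUID, PHYSICAL_SECTOR_GUID):
            if guid not in items:
                raise ValueError(f"metadata item {guid} not found")
            item_offset, _length = items[guid]
            f.seek(meta_offset + item_offset)
            sizes.append(struct.unpack("<I", f.read(4))[0])
        return tuple(sizes)
```

Usage would look something like `logical, physical = read_sector_sizes(r"D:\VMs\dc01.vhdx")` (a hypothetical path). If the physical sector size comes back as 4096 and the machine you backed up was running on 512-byte sectors, you're looking at the same mismatch that bit my domain controllers.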

Success!

The moral of the story, long as it is, is this - if you're using ShadowProtect to back up your virtual machines and you're restoring into another virtual environment:
  • Uninstall or disable the previous virtual host's integration tools as soon as possible.
  • Make sure you're restoring to a virtual drive with the same sector size (512 bytes vs. 4K) as the source drive.
  • Test your backups before things go wrong so you can get your environment-specific wrinkles documented and sorted out.
* I'm running from memory here. Unfortunately, I don't have the server sitting in front of me, so I'm not entirely sure which error log I checked specifically.