Wednesday 24 March 2010

Table Storage Backup & Restore for Windows Azure

If you're using Table Storage in Windows Azure you're probably well aware of its real-time replication of data, which for me was a key factor in deciding to use the technology.

That said, I think the ability to perform a traditional database backup or restore (i.e. a snapshot of the database) would be a really nice feature, and it's one that Table Storage does not currently support. Here are my top reasons why:

  1. Data replication may protect you from disk faults, but it doesn't protect you from accidental or malicious deletion. You only get that protection by taking snapshots of your data and storing them elsewhere.
  2. From a testing perspective, it can be really handy (or sometimes imperative) to "copy back" your production DB to your UAT or development environment.

So I thought I could write my own backup tool that retrieves all data via queries and stores it in a file - and then restore it back again by performing inserts. What started as a small & quick project turned into something much bigger - so I'm releasing it as open source.

Download Table Storage Backup

The project consists of 3 components:

  1. Backup Server. The backup server can be installed in your existing Web Role or Worker Role. The backup server performs all backup & restore operations within the Windows Azure environment.
  2. Backup Client. The backup client provides a friendly way of performing a backup & restore from a Windows PC.
  3. Backup Library. You can use the backup library to implement your own backup system or automate your backup operations, e.g. perform backups on a schedule.

How does it work?

  1. Data is backed up by retrieving all entities from all tables. Each table service query returns as many entities as it can, stopping when a partition boundary is hit or 1000 entities have been returned.
  2. Data is restored by performing batch insert operations. Each batch inserts as many entities as it can (up to 100 entities, all within a single partition, and at most 4 MB per batch).
  3. All transactions are performed at the raw REST level for efficiency, and to ensure data is duplicated precisely.
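
To make the paging and batching rules above a little more concrete, here is a minimal sketch in Python. It is not the project's actual code (the tool itself works at the raw REST level). The names fetch_all, make_batches and query_page are hypothetical, entities are assumed to be plain dictionaries with a "PartitionKey" field, and the size check uses a JSON encoding as a rough stand-in for the real request payload.

```python
import json
from itertools import groupby
from operator import itemgetter

MAX_BATCH_ENTITIES = 100            # per-batch entity limit quoted above
MAX_BATCH_BYTES = 4 * 1024 * 1024   # 4 MB per-batch size limit quoted above


def fetch_all(query_page):
    """Yield every entity, following continuation tokens until none is left.

    query_page is a hypothetical callable: it takes a continuation token
    (None for the first page) and returns (entities, next_token).
    """
    token = None
    while True:
        entities, token = query_page(token)
        yield from entities
        if token is None:
            break


def make_batches(entities):
    """Group entities into batches that share one PartitionKey and stay
    within the entity-count and size limits."""
    ordered = sorted(entities, key=itemgetter("PartitionKey"))
    for _, group in groupby(ordered, key=itemgetter("PartitionKey")):
        batch, size = [], 0
        for entity in group:
            # Estimate the payload size from the JSON encoding; the real
            # wire format differs, so this is only an approximation.
            entity_size = len(json.dumps(entity).encode("utf-8"))
            if batch and (len(batch) == MAX_BATCH_ENTITIES
                          or size + entity_size > MAX_BATCH_BYTES):
                yield batch
                batch, size = [], 0
            batch.append(entity)
            size += entity_size
        if batch:
            yield batch
```

A restore would then walk over make_batches(fetch_all(...)) and send one batch insert per yielded list. Sorting by PartitionKey first keeps each partition's entities together, so a batch never mixes partitions.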

Please Contribute!

If you have any questions, feedback or bug reports please post them on the CodePlex site - and if you'd like to work on this project directly please contact me!

Cheers,
Anthony.