Wednesday 24 March 2010

Table Storage Backup & Restore for Windows Azure

If you're using Table Storage in Windows Azure, you're probably well aware of its real-time data replication, which for me was a key factor in deciding to use the technology.

That said, I think the ability to perform a traditional database backup or restore (i.e. a snapshot of the database) would be a really nice feature, and it's one that Table Storage does not currently support. Here are my top reasons why:

  1. Data replication may protect you from disk faults, but it doesn't protect you from accidental or malicious deletion, because a delete is replicated just as faithfully as an insert. The only protection against that is to take snapshots of your data and store them elsewhere.
  2. From a testing perspective, it can be really handy (or sometimes imperative) to "copy back" your production DB to your UAT or development environment.

So I thought I'd write my own backup tool that retrieves all data via queries and stores it in a file, then restores it by performing inserts. What started as a small & quick project turned into something much bigger, so I'm releasing it as open source.
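The snapshot file format isn't described in this post, so what follows is only a minimal Python sketch of the "store it in a file, then restore it" idea, not the tool's actual format. It assumes each entity is held as a dict of `property_name: (edm_type, value)` pairs and writes one JSON record per line (gzip-compressed when written to disk). The JSON-lines layout and every function name (`serialize_backup`, `parse_backup`, `write_backup`, `read_backup`) are hypothetical. What it illustrates is that each property keeps its Edm type tag, so binary, 64-bit and date values come back exactly as they went in.

```python
import base64
import gzip
import json
from datetime import datetime

# Hypothetical entity shape: {property_name: (edm_type, value)}, e.g.
# {"PartitionKey": ("Edm.String", "p1"), "Age": ("Edm.Int32", 42)}

def encode_value(edm_type, value):
    """Turn a typed property value into something JSON can store losslessly."""
    if edm_type == "Edm.Binary":
        return base64.b64encode(value).decode("ascii")
    if edm_type == "Edm.DateTime":
        return value.isoformat()
    if edm_type == "Edm.Int64":
        return str(value)  # stored as text so 64-bit values survive any JSON reader
    return value  # strings, Guid strings, Int32, Double, Boolean map directly onto JSON

def decode_value(edm_type, stored):
    """Inverse of encode_value."""
    if edm_type == "Edm.Binary":
        return base64.b64decode(stored)
    if edm_type == "Edm.DateTime":
        return datetime.fromisoformat(stored)
    if edm_type == "Edm.Int64":
        return int(stored)
    return stored

def serialize_backup(tables):
    """{table_name: [entity, ...]} -> JSON-lines text, one record per line."""
    lines = []
    for table, entities in tables.items():
        lines.append(json.dumps({"table": table}))  # marker so empty tables are kept
        for entity in entities:
            props = {name: [t, encode_value(t, v)] for name, (t, v) in entity.items()}
            lines.append(json.dumps({"table": table, "props": props}))
    return "\n".join(lines) + "\n"

def parse_backup(text):
    """JSON-lines text -> {table_name: [entity, ...]}, ready to be re-inserted."""
    tables = {}
    for line in text.splitlines():
        if not line:
            continue
        record = json.loads(line)
        rows = tables.setdefault(record["table"], [])
        if "props" in record:
            rows.append({name: (t, decode_value(t, v))
                         for name, (t, v) in record["props"].items()})
    return tables

def write_backup(path, tables):
    # gzip keeps the snapshot small; the compression choice is an assumption
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(serialize_backup(tables))

def read_backup(path):
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return parse_backup(f.read())
```

Restoring is then a matter of feeding each table's entities from `read_backup` back into insert operations.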

Download Table Storage Backup

The project consists of 3 components:

  1. Backup Server. The backup server can be installed in your existing Web Role or Worker Role. The backup server performs all backup & restore operations within the Windows Azure environment.
  2. Backup Client. The backup client provides a friendly way of performing a backup & restore from a Windows PC.
  3. Backup Library. You can use the backup library to implement your own backup system or automate your backup operations, e.g. perform backups on a schedule.
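As an example of the scheduling use case, a daily backup loop might look like the minimal Python sketch below. `backup` is a hypothetical callable standing in for whichever backup library call you wire up (the library's real API isn't shown here), and the UTC schedule and `tablebackup-YYYYMMDD-HHMMSS.dat` naming scheme are assumptions.

```python
import time
from datetime import datetime, timedelta, timezone

def next_daily_run(now, hour, minute=0):
    """Next hour:minute strictly after `now`, in now's timezone (use UTC to avoid DST)."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:  # today's slot has already passed, so use tomorrow's
        candidate += timedelta(days=1)
    return candidate

def snapshot_name(when, prefix="tablebackup"):
    """Timestamped file name so each run keeps its own snapshot."""
    return f"{prefix}-{when:%Y%m%d-%H%M%S}.dat"

def run_daily(backup, hour=2):
    """Call `backup(name)` once a day at hour:00 UTC.

    `backup` is a hypothetical callable standing in for the backup
    library entry point; it is not the library's actual API.
    """
    while True:
        now = datetime.now(timezone.utc)
        target = next_daily_run(now, hour)
        time.sleep((target - now).total_seconds())
        backup(snapshot_name(target))
```

Running the loop in UTC sidesteps daylight-saving jumps, and timestamped names mean a bad run never overwrites the previous good snapshot.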

How does it work?

  1. Data is backed up by retrieving all entities from all tables. Each table service query returns the maximum number of entities allowed per response (up to 1000, or fewer if a partition boundary is reached), and the remaining entities are fetched with follow-up queries.
  2. Data is restored by performing batch insert operations. Each batch inserts the maximum number of entities an entity group transaction allows: up to 100 entities sharing the same partition key, with a total payload of at most 4 MB.
  3. All transactions are performed at the raw REST level for efficiency, and to ensure data is duplicated precisely.
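The steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the project's code: `page_url`, `read_all`, `make_batches` and the `query_page`/`size_of` callables are hypothetical, and request signing and the batch's multipart body are omitted. What comes from the Table service REST API itself is the continuation mechanism (a `NextPartitionKey`/`NextRowKey` pair returned in the `x-ms-continuation-NextPartitionKey`/`x-ms-continuation-NextRowKey` response headers and passed back as query parameters) and the entity group transaction limits of 100 entities, one partition key and 4 MB.

```python
from urllib.parse import urlencode

MAX_BATCH_ENTITIES = 100               # entity group transaction limit
MAX_BATCH_BYTES = 4 * 1024 * 1024      # 4 MB total payload limit

def page_url(account, table, token=None):
    """Query URL for one page of a table; token is (NextPartitionKey, NextRowKey)."""
    url = f"https://{account}.table.core.windows.net/{table}()"
    if token is not None:
        next_pk, next_rk = token
        url += "?" + urlencode({"NextPartitionKey": next_pk, "NextRowKey": next_rk})
    return url

def read_all(query_page):
    """Yield every entity of a table by following continuation tokens.

    query_page(token) is a hypothetical callable that performs one REST query
    and returns (entities, next_token), with next_token None on the last page.
    A page may be empty yet still carry a token, so only the token ends the loop.
    """
    token = None
    while True:
        entities, token = query_page(token)
        yield from entities
        if token is None:
            return

def make_batches(entities, size_of):
    """Group entities for batch insert: one PartitionKey, <=100 entities, <=4 MB.

    size_of(entity) is a hypothetical estimate of an entity's serialized size;
    in practice, leave headroom for the batch's own MIME/HTTP envelope.
    """
    by_partition = {}
    for entity in entities:
        by_partition.setdefault(entity["PartitionKey"], []).append(entity)

    batches = []
    for group in by_partition.values():
        current, current_bytes = [], 0
        for entity in group:
            size = size_of(entity)
            full = (len(current) == MAX_BATCH_ENTITIES
                    or current_bytes + size > MAX_BATCH_BYTES)
            if current and full:  # close the batch before it breaks a limit
                batches.append(current)
                current, current_bytes = [], 0
            current.append(entity)
            current_bytes += size
        if current:
            batches.append(current)
    return batches
```

Grouping by partition key first is what makes the batch limits work: an entity group transaction can only touch one partition, so entities from different partitions always end up in separate batches.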

Please Contribute!

If you have any questions, feedback or bug reports, please post them on the CodePlex site, and if you'd like to work on this project directly, please contact me!

Cheers,
Anthony.

6 comments:

Shrey Chouhan said...

Hello Anthony,

I have just downloaded your Table Storage Backup and Recovery Application.

When I use my Windows Azure account to back up my table to my computer, I get the error "Could not connect to Backup Service, check your connection settings", even though I have provided all the information under Storage Account in your Options window.

But what is the Connection Information in the Options window? What should I provide there? Is this error caused by that information?

Please reply.
Shrey Chouhan
www.cerebrata.com

Anthony said...

Hi Shrey,

Before you can use the backup client, you need to install the backup server within an existing Windows Azure application - this is what the connection information relates to.

Refer to the documentation pages on the CodePlex site; there is an installation guide for the backup server.

Cheers,
Anthony.

Dale Anderson said...

G'day Anthony,

Looks like a promising tool from screen shots, but there are a couple of things:

- installation is a bit of a pain: when I try to install it, it claims it requires the SharpZipLib library in the GAC. The installation guide looks long!

- I don't understand the requirement for a server component. Do you realise you can access the table storage REST API outside of the Azure environment?

Anthony said...

Hi Dale,

This project has been replaced with a new project that backs up both tables & blobs:

http://azurestoragebackup.codeplex.com/

The new project is simply a DLL that can be installed within a hosted Azure project - it's then up to you how you want to invoke it.

The reason this is a server component and not a client application is that all backup traffic remains within the Azure datacentre. This gives you the most efficiency and saves on bandwidth costs.

Cheers,
Anthony.

Dale Anderson said...

Ahh nice, the new project looks good.

Fair call on the traffic side of things, though I would find it useful from a development perspective to be able to export / import all tables in a table storage account with the click of a button.

Cheers!

Anthony said...

There is a test harness included in the project which has a backup button & a restore button.

You can run it locally & point it to your dev storage, or to a remote storage account.

Anthony.