Thursday, December 2, 2010

A Lap Around SQL Azure Data Sync - Step By Step Example

SQL Azure Data Sync is a cloud-based service, hosted on Microsoft SQL Azure, which allows many SQL Azure databases to stay consistent with one another. This capability is ideal for a project I’m currently working on where the users need low latency data access across an entire company but they are geographically separated by great distances with one office in USA and one in Europe. SQL Azure Data Sync allows them to have consistently synchronized data physically close to both locations.

SQL Azure Data Sync is super simple to setup (try saying that five times fast). In this article, I’ll show you how to implement a data synchronization between two databases on the same logical SQL Azure server. We’ll be implementing the design in the diagram below.

STEP 1 – Signup for SQL Azure
If you need some guidance on how to do that, follow this link:
SQL Azure Signup Tutorial

As of this writing (AOTW), Microsoft is literally giving SQL Azure away (Web Edition – 1GB), so you really have no excuse on why you haven’t signed up yet.

STEP 2 – Create Two Databases
OK, here’s where you might get charged. For this example, we’ll create two databases that will sync with each other, and you might get charged when you add the second database. When we're finished, you may want to delete one or both databases to keep costs down. Database usage is metered per day, so your bill should be less than $1, but I haven’t confirmed that.

You create a database by following these steps:

a) Go to Click on the project you created in Step 1.

b) First you’ll see just the Master database. That doesn’t do us much good, since SQL Azure doesn’t allow us to add tables to the Master Database.

We need two more databases. Click on “Create Database.” The dialog below will pop up. Type in library1, and click “Create.” Do the same for the library2 database.

When you are finished, your console should look like this:

Now you have two blank databases. Let’s add some schema and data to library1, then we'll configure SQL Azure Data Sync to copy them over to library2.

STEP 3 – Add Schema and Data to Library1.

We’ll use SQL Server Management Studio (SSMS) to add the schema and data, though we could have easily used Visual Studio, SQLCMD, or even BCP.

Connect to SQL Azure using your server name, user name, and password. Don’t worry, it’s all encrypted when connecting to SQL Azure. Make sure your internal firewall allows port 1433 to pass through to SQL Azure. AOTW, SQL Azure’s TCP port cannot be adjusted, so don’t bother trying to look up how to change it. It’s a tough break that network admins at many organizations unilaterally block that port.

Your SSMS connection dialog will look something like this:

Once you’re connected, run this script to create two tables. You’ll notice that the TITLES table references the AUTHORS table. That will be important later.

, name VARCHAR(100) DEFAULT('')

, name VARCHAR(100) DEFAULT(11)
authorId INT REFERENCES authors(id))

('1', 'Stephen King')
, (
'2', 'George RR Martin')
, (
'3', 'Neal Stephenson')
, (
'4', 'Steig Larsson')

('1', 'A Game of Thrones', '2')
, (
'2', 'A Clash of Kings', '2')
, (
'3', 'The Song of Susannah', '1')
, (
'4', 'The Gunslinger', '1')

There are a couple of things to notice about this script. One, each table in SQL Azure needs a clustered index. I’m creating one by specifying a primary key in the create table statement. AOTW, SQL Azure will allow you to create a table without a clustered index, but it won’t allow you to insert data into it, so there’s no reason to bother to even create the heap table. Two, I like the insert syntax where we can insert multiple rows after the VALUES keyword. That’s not SQL Azure specific, I just think it’s cool.

We have two databases, one filled with tables and data and one empty. Let’s fix that and get to the meat of this demo.

STEP 4 – Setup SQL Azure Data Sync.
a) Goto and click on “SQL Azure Data Sync”

b) You’ll need to sign in to Windows Live ID.

c) The first time you go to this site, you’ll have to agree to a license agreement.

d) Click “Add New”.

e) Name the Sync Group "LibrarySync". I don’t know the limit of how long this name can be, but I’ve thrown in a lot of text in there and it took it. I wonder if it’s varchar(max). Then click “Next”.

f) Register your server by typing in your server name, user name, and password. Notice how in red it says “Your credentials will be stored in an encrypted format.” This is good news because it saves your credentials when registering other databases on the same server.

g) Then click “Add Hub”. The Hub database is similar to a publishing database in replication. For instance, it will win if there are any update conflicts.

h) Then choose the library2 database and click “Add Member”. Your screen should look something like this:

i) Then click “Next”.

j) You’ll get to a screen that looks like this:

The order you select the tables in this screen is the same order that the tables will be replicated. Remember that we have a foreign key constraint, so it’s real important that we add the authors table before we add the titles table. Click “Finish".

k) OK, now it seems like you’re done, but you’re not. Click on “LibrarySync” and then click “Schedule Sync”. Notice the options you have for scheduling synchronization. You can sync hourly, daily, etc. If you choose hourly, the “Minute” dropdown does not let you schedule at a minute interval, rather it allows you to choose the minute after the hour that the sync will begin. Click “OK”.

l) Technically, your sync is ready to go, but click “Sync Now” and wait a minute or two so we can examine the changes.

Step 5 – Examine the Changes
a) Notice the library2 database has all the schema and data from library1. It also has some other things that SQL Azure Data Sync added for us. BAM! We did it!

b) Look at the 3 tables that SQL Azure Data Sync added in both databases. These tables seem to track sync and schema information.
a. Schema_info
b. Scope_config
c. Scope_info

c) Each user table gets its own tracking table. For instance, we have authors_tracking and titles_tracking. This tells SQL Azure Data Sync which records need to be updated on the other members of the sync group. Notice how this is not an auditing tool like Change Data Capture. It works more like Change Tracking in SQL Server 2008. You won’t get all the changes that led up to the final state of the data.

d) Each user table gets three triggers that are used to keep the databases consistent.

e) There are many stored procedures added to both databases for the same reason. Feel free to poke around and examine the triggers and stored procedures. I found them to be cleanly written. I like how they’re using the MERGE statement, introduced in SQL Server 2008.

f) Feel free to add a record to Library2, and click “Sync Now”. You’ll see it in Library1 in no time, thus proving that the synchronization is indeed bi-directional.

g) On the SQL Azure Data Sync Page, check out the synchronization log by clicking on the Dashboard button.

Final Thoughts
At PDC 2010, Microsoft announced that this service will extend to on-premise SQL Servers. They demo’d how it’s done through an agent that’s installed on the on-premise SQL Server. It should be available to use in CTP 2, which is not publicly available AOTW.

Also, although the initial snapshot pushed the schema down to the library2 database, it will not keep schema in sync without tearing down the sync group and rebuilding it. I recommend you finalize your schema before setting up the sync group.

This is built on the Microsoft Sync Framework. I believe we’ll be seeing this used to synchronize all sorts of data from all sorts of data sources. I think it’s worth learning and I hope this gives you a fair introduction to the technology. Remember to delete at least one of your library databases to keep from being charged. Also, delete your data sync. Good luck!

No comments:

Post a Comment