Introducing Amazon DynamoDB
- It's schemaless ("NoSQL"). The database has "tables", but any item in a table can have its own unique columns that other rows may or may not share.
- It's auto-replicated, so a save to the database happens on a few different servers. If one dies, no problem, all the data is still there.
- It's easily scalable. Need to handle more requests? Add more servers, and they'll spread the data between them to lighten the load.
- In addition to common key/value functionality, there's a layer of secondary indexing. So while you still can't query it like MySQL and join tables (without a MapReduce operation, at least), you can still say "Get me all replies made to this certain blog post, in order by date." More powerful than "dumb" key/value stores.
- It's eventually consistent. You connect to a server, send it new data, and disconnect -- a super fast operation. Then the other servers in the cluster talk amongst themselves to get the data propagated through.
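To make the "all replies to this blog post, in order by date" idea concrete, here's a rough in-memory sketch in Python. It is a toy model only, not how Dynamo or DynamoDB works internally: the `ToyTable` class and the `hash_key` / `range_key` names are made up for illustration. It shows two things from the list above: items in one table can carry different attributes, and a lookup on one key can come back sorted by a second key.

```python
from collections import defaultdict


class ToyTable:
    """Toy stand-in for a schemaless table (hypothetical, illustration only)."""

    def __init__(self):
        # hash_key -> list of items; each item is a plain dict, so two items
        # in the same table are free to have different attributes.
        self._items_by_key = defaultdict(list)

    def put(self, hash_key, range_key, **attributes):
        item = {"hash_key": hash_key, "range_key": range_key, **attributes}
        self._items_by_key[hash_key].append(item)

    def query(self, hash_key):
        # Return every item stored under hash_key, ordered by range_key.
        items = self._items_by_key.get(hash_key, [])
        return sorted(items, key=lambda item: item["range_key"])


replies = ToyTable()
replies.put("post-42", "2012-01-20", author="alice", body="Nice write-up")
replies.put("post-42", "2012-01-18", author="bob", body="First!", edited=True)
replies.put("post-7", "2012-01-19", author="carol", body="Unrelated thread")

# "Get me all replies made to post-42, in order by date."
for reply in replies.query("post-42"):
    print(reply["range_key"], reply["author"])
# prints:
# 2012-01-18 bob
# 2012-01-20 alice
```

A real store keeps that ordering on disk and spreads the keys across servers; the sketch just sorts at read time to keep things short.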
This Dynamo paper is what kickstarted NoSQL databases like Cassandra, Voldemort, and Riak, which are all directly based on that paper. And many of the concepts in it have impacted other popular NoSQLs, too.
Well, last week, Amazon released a new AWS service called DynamoDB. It's their Dynamo database, released to the public as a hosted service -- much like their other AWS datastores, like S3 and SimpleDB. Programmers' disdain for SimpleDB is reportedly one of the things that drove DynamoDB, as everyone wanted a database with no storage limits, with secondary indexing functionality, and with a better pricing model.
Here are some highlights of the new service:
- It's dead fast, for a NoSQL. Most of my queries are coming back in the low-single-digit milliseconds.
- It's fully managed. You tell Amazon how many simultaneous writes and how many simultaneous reads you want to be able to do (on a per-table basis) and they provision and manage the hardware and replication necessary for it.
- You can scale your read/write capacity up or down at will without any data loss.
- 10 simultaneous writes and 50 simultaneous reads are $0.01 per hour. And if you're a new AWS user, that's in the free tier -- so you don't pay a thing. So even for a pretty darn popular application, even if you don't have access to the free stuff anymore, you're looking at $87 a YEAR for a fast database that's managed for you and can't go down (unless the whole DC goes down... which would suck).
- Automated backups to S3.
- Ties in with Amazon's Hadoop service, so you can MapReduce over your data. What's more, you can MapReduce over DynamoDB, SimpleDB, and S3, with the same query at the same time.
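For anyone who wants to check the $87-a-year figure, here's the arithmetic as a quick Python sketch. The only input taken from the post is the $0.01 per hour rate for 10 writes and 50 reads; the function name and the 365-day year are my own assumptions, and real bills would also include storage and data transfer, which this ignores.

```python
HOURLY_RATE_USD = 0.01      # rate quoted above for 10 writes + 50 reads
HOURS_PER_YEAR = 24 * 365   # assumes a non-leap year, running around the clock


def yearly_cost(hourly_rate_usd, hours=HOURS_PER_YEAR):
    """Throughput cost for a year at a flat hourly rate (hypothetical helper)."""
    return hourly_rate_usd * hours


print(f"${yearly_cost(HOURLY_RATE_USD):.2f} per year")  # prints "$87.60 per year"
```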
I considered posting about this last week, but I wanted to get my feet wet with it first. I finished a simple little Node.js library for it today at work, and I'm darn impressed. If you're interested in using this in Node.js (or any other language for which Amazon doesn't provide a library yet), then post here and I'll share all the undocumented stuff I had to figure out for myself. If you're in Java, .NET, or PHP, you don't have to worry about it.
So if I've piqued your interest, here's the overview page (complete with a little introduction video): http://aws.amazon.com/dynamodb/
And here are the API docs I've been swimming in for the past few days -- note that there are some inaccuracies in the JSON samples here: http://docs.amazonwe...ion.html?r=1036
All in all, this service looks like it might replace Riak for my project at work. Riak definitely has more features when it comes to things like secondary indexing, but you just can't beat DynamoDB's ease of use and price. If you decide to use it, speak up! We can talk about storage strategies and such.





