webdevRefinery Forum: Introducing Amazon DynamoDB



Kyek

  • Founder of wdR
  • Group: Administrators
  • Posts: 4833
  • Joined: 20-February 10
  • Location: Philadelphia, PA, USA
  • Expertise: HTML, CSS, PHP, Java, JavaScript, Node.js, SQL

Posted 25 January 2012 - 07:45 PM (#1)

Introducing Amazon DynamoDB


Back in 2007, Amazon released a paper introducing their database concept, called "Dynamo", to the world. It's what they built to handle the data storage for their sites. The software itself remained proprietary and unreleased, but they wanted to share the ideas behind it with the community. The main features were these:
  • It's schemaless ("NoSQL"). The database has "tables", but any item in a table can have its own unique columns that other items may or may not share.
  • It's auto-replicated, so a save to the database happens on a few different servers. If one dies, no problem, all the data is still there.
  • It's easily scalable. Need to handle more requests? Add more servers, and they'll spread the data between them to lighten the load.
  • In addition to common key/value functionality, there's a layer of secondary indexing. So while you still can't query it like MySQL and join tables (without a MapReduce operation, at least), you can still say "Get me all replies made to this blog post, in order by date." That's more powerful than a "dumb" key/value store.
  • It's eventually consistent. You connect to a server, send it new data, and disconnect -- a super fast operation. Then the other servers in the cluster talk amongst themselves to get the data propagated through.
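The Dynamo paper gets the replication and "just add servers" properties above from consistent hashing: servers are placed at several points on a hash ring, and each key is stored on the first N distinct servers found walking clockwise from the key's own hash. The sketch below models only that placement logic in Node.js. The server names, 16 virtual nodes per server, N = 3, and the use of MD5 are illustrative assumptions, not details of how the hosted DynamoDB service is actually built.

```javascript
const crypto = require("crypto");

// Map a string to a 32-bit position on the ring (first 8 hex chars of its MD5).
function ringPosition(value) {
  return parseInt(crypto.createHash("md5").update(value).digest("hex").slice(0, 8), 16);
}

class HashRing {
  constructor(servers, virtualNodes = 16) {
    // Each server owns several "virtual node" positions so load spreads evenly.
    this.points = [];
    for (const server of servers) {
      for (let i = 0; i < virtualNodes; i++) {
        this.points.push({ pos: ringPosition(`${server}#${i}`), server });
      }
    }
    this.points.sort((a, b) => a.pos - b.pos);
    this.serverCount = servers.length;
  }

  // Return the n distinct servers responsible for `key` (its replica list).
  replicasFor(key, n = 3) {
    const target = ringPosition(key);
    // First ring point at or after the key's position, wrapping around to 0.
    let start = this.points.findIndex((p) => p.pos >= target);
    if (start === -1) start = 0;
    const result = [];
    for (let i = 0; result.length < Math.min(n, this.serverCount); i++) {
      const { server } = this.points[(start + i) % this.points.length];
      if (!result.includes(server)) result.push(server);
    }
    return result;
  }
}

const ring = new HashRing(["server-a", "server-b", "server-c", "server-d"]);
console.log(ring.replicasFor("Comments_post164")); // three distinct server names
```

Adding a fifth server only changes placement for keys whose positions land near the new server's points on the ring; every other key stays where it was, which is what makes scaling out cheap.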


This Dynamo paper is what kickstarted NoSQL databases like Cassandra, Voldemort, and Riak, which are all directly based on it. And many of its concepts have influenced other popular NoSQL databases, too.

Well, last week Amazon released a new AWS service called DynamoDB. It's their Dynamo database, released to the public as a hosted service, much like their other AWS datastores such as S3 and SimpleDB. Programmers' disdain for SimpleDB is reportedly one of the things that drove DynamoDB: everyone wanted a database with no storage limits, with secondary indexing, and with a better pricing model.

Here are some highlights of the new service:
  • It's dead fast for a NoSQL database. Most of my queries are coming back in the low single-digit milliseconds.
  • It's fully managed. You tell Amazon how many writes per second and how many reads per second you want to be able to do (on a per-table basis), and they provision and manage the hardware and replication necessary for it.
  • You can scale your read/write capacity up or down at will without any data loss.
  • 10 writes per second and 50 reads per second are $0.01 per hour. And if you're a new AWS user, that's in the free tier, so you don't pay a thing. So even for a pretty darn popular application, and even once you no longer have access to the free tier, you're looking at $87 a YEAR for a fast database that's managed for you and can't go down (unless the whole data center goes down... which would suck).
  • Automated backups to S3.
  • Ties in with Amazon's Hadoop service (Elastic MapReduce), so you can run MapReduce jobs over your data. What's more, you can MapReduce over DynamoDB, SimpleDB, and S3 with the same query at the same time.
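The $87 figure in the pricing bullet is simply the hourly rate times the 8,760 hours in a non-leap year. A tiny sketch of that arithmetic, taking the post's $0.01/hour as the input; `yearlyCost` is a made-up helper name, and current AWS rates may well differ:

```javascript
// Hours in a non-leap year: 24 * 365 = 8760.
const HOURS_PER_YEAR = 24 * 365;

// Cost of keeping an hourly-billed resource running for a whole year,
// rounded to whole cents to hide floating-point noise.
function yearlyCost(dollarsPerHour) {
  return Math.round(dollarsPerHour * HOURS_PER_YEAR * 100) / 100;
}

console.log(yearlyCost(0.01).toFixed(2)); // 87.60
```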


I considered posting about this last week, but I wanted to get my feet wet with it first. I finished a simple little Node.js library for it today at work, and I'm darn impressed. If you're interested in using this from Node.js (or any other language Amazon doesn't provide a library for yet), post here and I'll share all the undocumented stuff I had to figure out for myself. If you're using Java, .NET, or PHP, you don't have to worry about it.

So if I've piqued your interest, here's the overview page (complete with a little introduction video): http://aws.amazon.com/dynamodb/
And here are the API docs I've been swimming in for the past few days -- note that there are some inaccuracies in the JSON samples here: http://docs.amazonwe...ion.html?r=1036

All in all, this service looks like it might replace Riak for my project at work. Riak definitely has more features when it comes to things like secondary indexing, but you just can't beat DynamoDB's ease of use and price. If you decide to use it, speak up! We can talk about storage strategies and such :)


Daniel15

  • dan.cx
  • Group: Moderators
  • Posts: 3038
  • Joined: 17-April 10
  • Location: Melbourne, Australia
  • Expertise: HTML, CSS, PHP, Java, JavaScript, SQL

Posted 25 January 2012 - 07:59 PM (#2)

Interesting :o. I haven't used any Amazon web services yet... I've got to try them out some time!

Quote

It's schemaless ("NoSQL"). The database has "tables", but any item in a table can have its own unique columns that other rows may or may not share.
It's auto-replicated, so a save to the database happens on a few different servers. If one dies, no problem, all the data is still there.
It's easily scalable. Need to handle more requests? Add more servers, and they'll spread the data between them to lighten the load.

This sounds exactly the same as Microsoft's Windows Azure table storage. Table storage has the concept of a "partition key" for every row. All rows with the same partition key are guaranteed to be on the same server. So if you had a table containing jobs for multiple clients, you could set the partition key to the client's ID or name. All data for the one client is guaranteed to be on one server, but clients could be spread across multiple servers. Seems to work pretty well. The primary key is a composite key consisting of the partition key and the row key (which is like a traditional ID).
Daniel15! :D
Repeat after me: jQuery is not JavaScript. It is not the answer to every JavaScript-related question. When you have to write some JavaScript, do not instantly react with "Oh, I'll do that with jQuery!"

javascript:alert((''+[][[]])[!+[]+!+[]]+(![]+[])[+!+[]]+(''+!+[]/[])[+!+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(''+!![])[+!+[]+!![]+!![]]+(![]+[])[!+[]+!+[]]+(+!+[])+(!+[]+!+[]+!+[]+!+[]+!+[]))

Kyek, on 16 November 2011 - 11:14 AM, said:

Daniel15 is ruining my life D:

morrison_levi, on 30 September 2011 - 04:10 PM, said:

They added more features to tables because. . . oh, yeah, they do have valid uses! Ever heard of data? We do still use that. :)


Kyek

  • Founder of wdR
  • Group: Administrators
  • Posts: 4833
  • Joined: 20-February 10
  • Location: Philadelphia, PA, USA
  • Expertise: HTML, CSS, PHP, Java, JavaScript, Node.js, SQL

Posted 25 January 2012 - 08:39 PM (#3)

Daniel15, on 25 January 2012 - 07:59 PM, said:

Interesting :o. I haven't used any Amazon web services yet... I've got to try them out some time!

They're pretty slick :) Some of the APIs can be a little asinine (case in point: you can't log into DynamoDB with your account credentials. You have to use your account credentials to get a temporary access key for DynamoDB, which expires after at most 36 hours, and use that instead. wtf?) but past that, they have some truly awesome services.
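Because those temporary keys expire (the post cites a 36-hour maximum), a client library has to notice when its key is about to run out and fetch a new one. A minimal sketch of that caching logic, assuming a hypothetical `fetchTemporaryCredentials` function that wraps whatever request your library makes to obtain the key and resolves to an object with an `expiresAt` timestamp in milliseconds; the five-minute refresh margin is an arbitrary choice:

```javascript
// Maximum lifetime the post mentions for the temporary key: 36 hours.
const MAX_SESSION_MS = 36 * 60 * 60 * 1000;

// Caches temporary credentials and refreshes them shortly before they expire.
class CredentialCache {
  // fetchTemporaryCredentials: hypothetical async function supplied by the caller.
  constructor(fetchTemporaryCredentials, refreshMarginMs = 5 * 60 * 1000, now = Date.now) {
    this.fetch = fetchTemporaryCredentials;
    this.margin = refreshMarginMs;
    this.now = now; // injectable clock, handy for testing
    this.current = null;
    this.pending = null;
  }

  async get() {
    if (!this.current || this.now() >= this.current.expiresAt - this.margin) {
      // Share one in-flight request among concurrent callers.
      this.pending = this.pending || this.fetch().finally(() => { this.pending = null; });
      this.current = await this.pending;
    }
    return this.current;
  }
}

let fetchCount = 0;
// Hypothetical stand-in for a real credential request.
async function fakeFetch() {
  fetchCount += 1;
  return {
    accessKeyId: "AKIDEXAMPLE",
    secretAccessKey: "secret",
    sessionToken: `token-${fetchCount}`,
    expiresAt: Date.now() + MAX_SESSION_MS,
  };
}

(async () => {
  const cache = new CredentialCache(fakeFetch);
  const first = await cache.get();
  const second = await cache.get(); // still fresh, so no second fetch
  console.log(first.sessionToken, second.sessionToken, fetchCount); // token-1 token-1 1
})();
```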

Quote

This sounds exactly the same as Microsoft's Windows Azure table storage. Table storage has the concept of a "partition key" for every row. All rows with the same partition key are guaranteed to be on the same server. So if you had a table containing jobs for multiple clients, you could set the partition key to the client's ID or name. All data for the one client is guaranteed to be on one server, but clients could be spread across multiple servers. Seems to work pretty well. The primary key is a composite key consisting of the partition key and the row key (which is like a traditional ID).

Interesting! I wonder if they based any of that on the Dynamo paper :). DynamoDB has a similar concept: you declare one of your table's fields as the Hash Key, which can be used as a standard 'id' for a stored item. The hash key determines which physical server(s) the data is stored on. Optionally, you can set the table up with a composite key by also specifying a Range field, and if you do that, the Hash Key no longer has to be unique. So you could have 10 rows with a hash key of "Comments_post164" and a range key that's a timestamp. All the comments for that post are then stored on the same physical device, and you can query them and retrieve them all as a group, in order. It sounds like the same basic concept, since you could technically make that Range key anything and it would work just like the partition key + row key you were referring to :).
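To make the hash-plus-range idea concrete, here is a local, in-memory model of it. This is not the DynamoDB API: the class name, the `put`/`query` methods, and the `PostId`/`Timestamp` attribute names are all made up for illustration. It only shows why a composite key makes "all comments for this post, in order" a single grouped lookup.

```javascript
// Order two range-key values (numbers or strings) ascending.
function compareKeys(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// In-memory stand-in for a table with a composite (hash, range) primary key.
class HashRangeTable {
  constructor(hashKeyName, rangeKeyName) {
    this.hashKeyName = hashKeyName;
    this.rangeKeyName = rangeKeyName;
    this.groups = new Map(); // hash key value -> items sorted by range key
  }

  put(item) {
    const hash = item[this.hashKeyName];
    const range = item[this.rangeKeyName];
    const group = this.groups.get(hash) || [];
    // The (hash, range) pair is the unique key, so overwrite any existing item.
    const existing = group.findIndex((it) => it[this.rangeKeyName] === range);
    if (existing !== -1) group.splice(existing, 1);
    group.push(item);
    group.sort((a, b) => compareKeys(a[this.rangeKeyName], b[this.rangeKeyName]));
    this.groups.set(hash, group);
  }

  // All items sharing `hash`, in range-key order, optionally bounded by [from, to].
  query(hash, from, to) {
    return (this.groups.get(hash) || []).filter((it) => {
      const r = it[this.rangeKeyName];
      return (from === undefined || r >= from) && (to === undefined || r <= to);
    });
  }
}

const comments = new HashRangeTable("PostId", "Timestamp");
comments.put({ PostId: "Comments_post164", Timestamp: 1327540000, Body: "second" });
comments.put({ PostId: "Comments_post164", Timestamp: 1327530000, Body: "first" });
comments.put({ PostId: "Comments_post99", Timestamp: 1327535000, Body: "other post" });
console.log(comments.query("Comments_post164").map((c) => c.Body)); // [ 'first', 'second' ]
```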


Daniel15

  • dan.cx
  • Group: Moderators
  • Posts: 3038
  • Joined: 17-April 10
  • Location: Melbourne, Australia
  • Expertise: HTML, CSS, PHP, Java, JavaScript, SQL

Posted 04 February 2012 - 06:30 AM (#4)

I read that DynamoDB uses SSDs for all the storage :o
http://www.paperplan...s-dynamodb.html


Kyek

  • Founder of wdR
  • Group: Administrators
  • Posts: 4833
  • Joined: 20-February 10
  • Location: Philadelphia, PA, USA
  • Expertise: HTML, CSS, PHP, Java, JavaScript, Node.js, SQL

Posted 04 February 2012 - 08:02 AM (#5)

Daniel15, on 04 February 2012 - 06:30 AM, said:

I read that DynamoDB uses SSDs for all the storage :o
http://www.paperplan...s-dynamodb.html

They do :) It's extremely fast. Not primary-key-select-on-MySQL-with-everything-in-RAM fast, but comparable.


NoizeMe

  • Group: Members
  • Posts: 591
  • Joined: 06-May 10
  • Location: Germany
  • Expertise: HTML, CSS, PHP, Java, JavaScript, Python, Node.js, SQL, MongoDB, CouchDB, Cassandra

Posted 04 February 2012 - 08:56 AM (#6)

This reminds me of a paper I read in which Microsoft claims to have broken the CAP theorem with Azure Storage.
I think this claim is bogus :D.
So grew Microsoft Azure...


Kyek

  • Founder of wdR
  • Group: Administrators
  • Posts: 4833
  • Joined: 20-February 10
  • Location: Philadelphia, PA, USA
  • Expertise: HTML, CSS, PHP, Java, JavaScript, Node.js, SQL

Posted 04 February 2012 - 09:04 AM (#7)

NoizeMe, on 04 February 2012 - 08:56 AM, said:

This reminds me of a paper I read in which Microsoft claims to have broken the CAP theorem with Azure Storage.
I think this claim is bogus :D.
So grew Microsoft Azure...

rofl, that's crazy xD. Everything I've been reading about Azure says that it's a Dynamo clone, without coming out and admitting it's a Dynamo clone. Most articles come out and call it eventually consistent. You can pass it a flag to ensure consistency when reading or writing (just like DynamoDB) which is nothing new, but you can't ensure availability if every individual request ties up every server in that partition.
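The trade-off behind that consistency flag is easiest to see through the quorum rule described in the Dynamo paper: with N replicas, a write acknowledged by W of them and a read that waits for R of them are guaranteed to share at least one replica whenever R + W > N. The sketch below models only that strict-quorum rule; Dynamo itself uses "sloppy" quorums with hinted handoff, and the N/R/W values are illustrative, not DynamoDB's internal settings.

```javascript
// Strict-quorum check: with n replicas, a write acknowledged by w replicas and
// a read that waits for r replicas must overlap in at least one replica (so the
// read sees the latest acknowledged write) exactly when r + w > n.
function readsSeeLatestWrite(n, r, w) {
  if (![n, r, w].every(Number.isInteger) || r < 1 || w < 1 || r > n || w > n) {
    throw new RangeError("need integers with 1 <= r, w <= n");
  }
  return r + w > n;
}

// Minimum number of replicas that any read set and any write set have in common.
function minimumOverlap(n, r, w) {
  return Math.max(0, r + w - n);
}

// The cost: with n = 3 and r = 2, a consistent read needs two reachable replicas.
// If two of the three are down, the read must fail or wait, which is the
// availability the CAP theorem says you give up for consistency.
console.log(readsSeeLatestWrite(3, 2, 2)); // true
console.log(readsSeeLatestWrite(3, 1, 1)); // false
console.log(minimumOverlap(3, 2, 2)); // 1
```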


NoizeMe

  • Group: Members
  • Posts: 591
  • Joined: 06-May 10
  • Location: Germany
  • Expertise: HTML, CSS, PHP, Java, JavaScript, Python, Node.js, SQL, MongoDB, CouchDB, Cassandra

Posted 05 February 2012 - 09:57 AM (#8)

Kyek, on 04 February 2012 - 09:04 AM, said:

rofl, that's crazy xD. Everything I've been reading about Azure says that it's a Dynamo clone, without coming out and admitting it's a Dynamo clone. Most articles come out and call it eventually consistent. You can pass it a flag to ensure consistency when reading or writing (just like DynamoDB) which is nothing new, but you can't ensure availability if every individual request ties up every server in that partition.


Yeah, we all know the CAP theorem can't be broken.
They say they provide availability because they have a 99% uptime guarantee, but over a 365-day year that means they can be down for 3.65 days.
That is not availability.
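The arithmetic above generalizes: the downtime an uptime guarantee still permits is the length of the period times (1 - uptime). A quick sketch with a made-up helper name; the 99.9% line is only there for comparison and is not a figure from this thread.

```javascript
// Downtime (in hours) that an uptime percentage still allows over a period.
function allowedDowntimeHours(uptimePercent, periodDays = 365) {
  return periodDays * 24 * (1 - uptimePercent / 100);
}

const hours = allowedDowntimeHours(99);
console.log(`${hours.toFixed(1)} hours = ${(hours / 24).toFixed(2)} days`); // 87.6 hours = 3.65 days
console.log(`${allowedDowntimeHours(99.9).toFixed(2)} hours`); // 8.76 hours
```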

