RDBMS vs SimpleDB Overview

Enter the key-value store, exit the RDBMS

Anyone who has worked directly or indirectly with a relational database will tell you that it would be foolish to build a business that didn’t use one to store its data.

One may argue over whether MySQL or Oracle is the better choice, but would anyone actually argue that an RDBMS (a.k.a. relational database) was not the best choice for storing your data?

Yes! There is a movement, the NoSQL movement, that is challenging the supremacy of RDBMSs for storing your data!


Now, 10 years after Eric Brewer’s game-changing introduction of the CAP theorem, a mass exodus is starting toward AP systems (availability + partition tolerance) and away from CP systems (strong consistency + partition tolerance).

Amazon’s SimpleDB is one such alternative to an RDBMS. Simply put, it is a distributed, replicated, eventually-consistent, always-available, key-value store owned and operated (i.e. hosted) by Amazon’s Web Services division.

In SimpleDB-speak, a domain is a table, attributes are columns, and items are records. So, you can speak about a domain named customer_addresses containing 1000 items (i.e. 1000 customer addresses), such that each item contains attributes like Street Name and Zip.
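To make the mapping concrete, here is a minimal sketch in plain Python (not the AWS client) of one way to picture a domain: a dictionary from item name (the key) to that item's attributes. It assumes two SimpleDB details not spelled out above: every value is stored as a string, and an attribute may hold more than one value. The item names, the values, and the Apt attribute are hypothetical.

```python
# A toy, in-memory picture of SimpleDB's data model (illustration only, not the AWS client).
from typing import Dict, Set

Item = Dict[str, Set[str]]  # attribute name ("column") -> one or more string values
Domain = Dict[str, Item]    # item name (the key) -> item ("record")

# Hypothetical item names and values; "Street Name" and "Zip" come from the example above.
customer_addresses: Domain = {
    "addr-0001": {"Street Name": {"Main St"}, "Zip": {"94105"}},
    # Items in one domain need not share attributes: this one also has a hypothetical "Apt".
    "addr-0002": {"Street Name": {"Elm St"}, "Zip": {"10001"}, "Apt": {"4B"}},
}

def attribute_names(domain: Domain) -> Set[str]:
    """Union of attribute names across all items: the columns a table would have to declare."""
    names: Set[str] = set()
    for attrs in domain.values():
        names.update(attrs)  # iterating a dict yields its keys (attribute names)
    return names

print(len(customer_addresses))                      # 2
print(sorted(attribute_names(customer_addresses)))  # ['Apt', 'Street Name', 'Zip']
```

In an RDBMS, every row in customer_addresses would need a (mostly NULL) Apt column; here only the item that has an apartment carries the attribute.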

SimpleDB (SDB) domains are schema-less, so items in the same domain can carry different attributes (i.e. the domain is sparse). SDB supports the following APIs:

  • select: a subset of SQL-92 that offers ORDER BY but not GROUP BY, joins, and so on
  • getAttributes(key, attributeNameList)
  • putAttributes(key, attributeName1=attributeValue1, attributeName2=attributeValue2, ...)
  • deleteAttributes(key, attributeNameList)

Except for putAttributes(), the attributeNameList may be omitted: getAttributes() then returns all of the item’s attributes, and the delete call removes the entire item.
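The optional-list behavior can be modeled in a few lines. This is a hypothetical in-memory stand-in, not Amazon's API: the class and method names are made up, and delete_attributes plays the role of the removal call (DeleteAttributes in Amazon's API). It also assumes SimpleDB's multi-valued attributes: by default a put adds a value alongside any existing ones, and the replace flag here mirrors SimpleDB's per-attribute Replace option.

```python
from typing import Dict, Iterable, Optional, Set

class ToyDomain:
    """Hypothetical in-memory stand-in for one SimpleDB domain (not the AWS API)."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Set[str]]] = {}

    def put_attributes(self, key: str, attrs: Dict[str, str], replace: bool = False) -> None:
        # The only call whose attribute list is mandatory: there is nothing to write without it.
        if not attrs:
            raise ValueError("put_attributes needs at least one attribute")
        item = self._items.setdefault(key, {})
        for name, value in attrs.items():
            if replace or name not in item:
                item[name] = {value}   # new attribute, or overwrite all existing values
            else:
                item[name].add(value)  # otherwise add another value (multi-valued attribute)

    def get_attributes(self, key: str, names: Optional[Iterable[str]] = None) -> Dict[str, Set[str]]:
        item = self._items.get(key, {})
        if names is None:              # list omitted: return every attribute
            return {n: set(v) for n, v in item.items()}
        return {n: set(item[n]) for n in names if n in item}

    def delete_attributes(self, key: str, names: Optional[Iterable[str]] = None) -> None:
        if names is None:              # list omitted: delete the whole item
            self._items.pop(key, None)
            return
        item = self._items.get(key)
        if item is None:
            return
        for n in names:
            item.pop(n, None)
        if not item:                   # an item with no attributes no longer exists
            del self._items[key]


d = ToyDomain()
d.put_attributes("addr-0001", {"Street Name": "Main St", "Zip": "94105"})
print(d.get_attributes("addr-0001", ["Zip"]))  # {'Zip': {'94105'}}
d.delete_attributes("addr-0001")
print(d.get_attributes("addr-0001"))           # {}
```

As in SimpleDB, there is no separate "delete item" operation in this model: removing an item's last attribute makes the item disappear.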

I’ve been transitioning several tables to SimpleDB from Oracle over the past year. Here are some issues I have with it.

  • Backup-recovery
    • Amazon provides no way for users to back up their domains. God forbid you should corrupt your data: you have no way to roll back to a previous good checkpoint.
  • High-availability
    • When a write is committed, you should be able to read your data within seconds. WHEN A WRITE IS COMMITTED. This is the catch. If the node that you are writing to is being hammered (by you or someone else), you will receive ‘503 - Service Unavailable’ responses. In other words, you won’t be able to commit your writes.
    • You may have to make several attempts to commit, during which your own site may become less available.
  • Inter-region replication
    • Currently, SimpleDB is shared among all availability zones within a region but is not shared across regions. I don’t have a major qualm with this missing functionality, but you might. Essentially, you choose regions in order to achieve fast access to local data. In other words, your east coast customers should access SDB-US-East, while your west coast customers should access SDB-US-West. You could partition your data according to where your customers live (i.e. where they registered), or you could keep your data in sync between the East and West regions and route each request to the nearest coast based on its IP address. Whichever you choose, you will need to solve this problem yourself.
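The usual mitigation for 503 responses is to retry with exponential backoff plus random jitter, so that retries don't pile more load onto an already overloaded node. The sketch below is generic Python, not a SimpleDB client: ServiceUnavailable is a hypothetical exception standing in for whatever your library raises on an HTTP 503, and the attempt count and delays are illustrative defaults, not AWS-prescribed values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class ServiceUnavailable(Exception):
    """Hypothetical stand-in for the error your SimpleDB client raises on an HTTP 503."""

def with_backoff(call: Callable[[], T], max_attempts: int = 5, base_delay: float = 0.1,
                 max_delay: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep,
                 rand: Callable[[], float] = random.random) -> T:
    """Run `call`, retrying on 503s with exponentially growing, randomly jittered pauses."""
    attempt = 0
    while True:
        try:
            return call()
        except ServiceUnavailable:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up; the caller decides how to degrade
            window = min(max_delay, base_delay * 2 ** (attempt - 1))  # 0.1, 0.2, 0.4, ... capped
            sleep(window * rand())  # pause a random fraction of the window ("jitter")


# Usage: a simulated write that hits two 503s before it commits.
responses = ["503", "503", "ok"]

def flaky_put() -> str:
    if responses.pop(0) == "503":
        raise ServiceUnavailable()
    return "committed"

delays = []
print(with_backoff(flaky_put, sleep=delays.append, rand=lambda: 1.0))  # committed
print(delays)                                                          # [0.1, 0.2]
```

Injecting sleep and rand keeps the function testable. Capping max_attempts matters because every retry is time your own site spends waiting on the write.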

Benefits of SimpleDB over RDBMS

Even with these deficits, I am quite pleased with SimpleDB. If you are starting a new company and need to choose an OLTP system, use SimpleDB. You won’t need to hire system administrators or DBAs. Your engineers won’t need to worry about connection pool management, SQL injection, prepared statements, query optimization, or other RDBMS-specific concerns.

Your developers will simply make web service calls. There are some best-practices to follow, but they are few and simple to understand.
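The post doesn't enumerate those practices. One that is widely cited follows from SimpleDB storing every attribute value as a string and comparing values lexicographically in select: encode numbers with a fixed width and a positive offset so that string order matches numeric order. A minimal Python sketch; the width of 10 digits and the offset of 10**9 are illustrative choices, not SimpleDB requirements.

```python
# Illustrative encoding for numbers stored as SimpleDB strings (width and offset are arbitrary choices).
WIDTH = 10
OFFSET = 10 ** 9  # shifts negatives into the non-negative range

def encode_int(n: int) -> str:
    """Zero-padded, offset string whose lexicographic order matches numeric order."""
    shifted = n + OFFSET
    if not 0 <= shifted < 10 ** WIDTH:
        raise ValueError(f"{n} does not fit in {WIDTH} digits with offset {OFFSET}")
    return str(shifted).zfill(WIDTH)

def decode_int(s: str) -> int:
    """Invert encode_int."""
    return int(s) - OFFSET


values = [42, -7, 1000, 5]
print(sorted(str(v) for v in values))  # ['-7', '1000', '42', '5']  (plain string order: wrong)
print([decode_int(s) for s in sorted(encode_int(v) for v in values)])  # [-7, 5, 42, 1000]
```

Dates are commonly handled the same way, by storing them in a sortable text format such as ISO 8601.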

If you are transitioning from an RDBMS to a key-value store, your decision is more complex. I’ll write about that at a later time.

rooksfury posted this
About Me

A blog describing my work in building websites that millions of people visit. I'm a senior member of LinkedIn's Distributed Data Systems team. I previously held technical and leadership roles at Netflix, Etsy, eBay & Siebel Systems.