The Oracle-SimpleDB Hybrid Part 3 : Defining the SimpleDB-Oracle translation

Preamble : See Part 2 : Solving the Eventual-Consistency Problem

When building a SimpleDB-Oracle (i.e. any key_value_store-RDBMS) hybrid system, translating between two very different data models presents a challenge. The challenge expands beyond the obvious ACID vs. BASE differences.

Most RDBMSs support the following features:

  • Triggers
  • Stored Procedures
  • Constraints (e.g. integrity, foreign key, unique, etc…)
  • Sequences
  • Sequences used as Primary Keys
  • Locks
  • Tables without Primary Keys or Unique Keys or both
  • Relationships between tables

These do not currently exist in SimpleDB. As we transition our applications to the cloud, the absence of some of these features are more problematic than others.

Unique Constraints and Primary Keys

Sometimes, a relational table might not have a Primary Key, but might have a Unique Key — the Unique Key might be a composite of several columns. In such a model, when translating the RDBMS data to SimpleDB, specify the Item Key to be the RDBMS Unique Key.

If an RDBMS table depends on a sequence, especially if the Unique Key or Primary Key contains a sequence, that can be a real problem in the cloud. A separate distributed sequence generator (e.g. Paxos) will be needed in the cloud as there is no way to generate sequences in SimpleDB. Paxos can potentially be a bottle-neck however.

Another option is to replace the Primary Key sequence (i.e. a number) with a GUID (i.e. a varchar2) in the RDBMS. We can generate GUIDs in the cloud easily as there is no need to gather consensus. This will then make it easy to insert new records in the cloud (i.e. SimpleDB) and data center (i.e. Oracle). It will also make the translation between the two by IR processes trivial.

Relationships between Tables

SimpleDB does not recognize relationships between domains. You have 2 options:

  • Option 1: Keep separate tables as separate domains and do joins in the cloud application
  • Option 2:¬†Collapse multiple-table relationships into a single domain

The second option is called denormalizing. SimpleDB, like most key-value stores, does not support join semantics, so one of these 2 options will be needed to overcome this deficit.

The option right for a particular set of data really winds down to replication. If data in the 2 RDBMS tables is altered as part of the same DB transaction, then denormalize (i.e. Option 1). If not, then keep the separate RDBMS tables in separate domains and do an application-level join.

Model All Deletes as Soft-deletes

This has more to do with multi-master replication than the translation between Oracle and SimpleDB. However, just keep it in mind. Hard-deletes can cause some confusion in the following quick succession of a chain of events:  create followed by a delete followed by a recreate. A soft-delete is one way to clear up the ambiguity this chain of events can cause when events are received out of order.

Discussion

These were the main pain points I encountered. I could work around many of the others. If you have any questions, feel free to note them here or DM me at @r39132 (Twitter).

  1. rooksfury posted this
blog comments powered by Disqus
About Me
A blog describing my work in building websites that hundreds of millions of people visit. I'm Chief Architect at ClipMine, an innovative video mining and search company. I previously held technical and leadership roles at LinkedIn, Netflix, Etsy, eBay & Siebel Systems. In addition to the nerdy stuff, I've included some stunning photography for your pure enjoyment!
Tumblelogs I follow: