The Oracle-SimpleDB Hybrid Part 3 : Defining the SimpleDB-Oracle translation

Preamble : See Part 2 : Solving the Eventual-Consistency Problem

When building a SimpleDB-Oracle hybrid system (or, more generally, any hybrid of a key-value store and an RDBMS), translating between two very different data models presents a challenge. The challenge extends beyond the obvious ACID vs. BASE differences.

Most RDBMSs support the following features:

  • Triggers
  • Stored Procedures
  • Constraints (e.g. integrity, foreign key, unique, etc…)
  • Sequences
  • Sequences used as Primary Keys
  • Locks
  • Tables without Primary Keys or Unique Keys or both
  • Relationships between tables


The Oracle-SimpleDB Hybrid Part 2 : Solving the Eventual Consistency Problem

Preamble: Read Part 1: Pulling Data out of Oracle Efficiently

Creating the Oracle-SimpleDB Hybrid system is a challenge. For one, it is a multi-homed system, accepting writes in both the cloud (i.e. SimpleDB) and in our data center (i.e. Oracle). Secondly, the data center and the AWS region (i.e. US-East) are on opposite coasts of the US, with network latency of ~50-100ms. Thirdly, clocks cannot be tightly synchronized across WANs with network protocols like NTP: NTP across a WAN introduces tens of milliseconds of inaccuracy, which may not be good enough to resolve all forms of conflict.
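To see why tens of milliseconds of clock error matter, here is a minimal sketch of one possible conflict-resolution rule, timestamp-based "last write wins". The rule itself, the `Write` class, and the offsets are assumptions made for illustration; nothing here is taken from the actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """A write as seen by one site (hypothetical structure, for illustration)."""
    site: str
    value: str
    true_time_ms: int      # when the write really happened
    clock_offset_ms: int   # error of the local clock at that site

    @property
    def stamped_ms(self) -> int:
        # The timestamp the site actually records: real time plus clock error.
        return self.true_time_ms + self.clock_offset_ms


def last_write_wins(a: Write, b: Write) -> Write:
    """Pick the write with the larger recorded timestamp (ties go to a)."""
    return a if a.stamped_ms >= b.stamped_ms else b


# The SimpleDB write really happens 20 ms AFTER the Oracle write,
# but the cloud-side clock runs 30 ms behind.
oracle_write = Write("oracle", "v1", true_time_ms=1000, clock_offset_ms=0)
simpledb_write = Write("simpledb", "v2", true_time_ms=1020, clock_offset_ms=-30)

winner = last_write_wins(oracle_write, simpledb_write)
```

With these numbers the older Oracle write wins, because the later SimpleDB write was stamped 990 ms against Oracle's 1000 ms. Any two conflicting writes that land closer together than the clock error can be ordered incorrectly this way.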

To build a multi-homed system, we needed to keep our Oracle DB in our data center in sync with our SimpleDB domains in the East Coast region.

This is a tough problem to solve. How consistent do we want the data to be? What if we shoot for strong consistency?

In order to build a strongly-consistent link between Oracle and SimpleDB, we could use dual-writes via a two-phase commit (a.k.a. 2PC) protocol. However, 2PC over a 50-100ms link would be an availability bottleneck, and hence 2PC is not a viable option. Any consensus protocol would suffer the same shortcoming.
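As a rough back-of-the-envelope sketch, assume the 50-100ms figure is a round-trip time and that a textbook 2PC needs two coordinator-to-participant round trips (prepare, then commit). The function names below are made up for this illustration, and the numbers only show the latency floor; they say nothing about what happens when the link is partitioned, which is the separate availability concern.

```python
def two_phase_commit_floor_ms(round_trip_ms: float, phases: int = 2) -> float:
    """Lower bound on commit latency: one WAN round trip per 2PC phase."""
    return phases * round_trip_ms


def max_serial_commits_per_sec(round_trip_ms: float) -> float:
    """Upper bound on back-to-back commits that must wait for each other
    (for example, successive writes to the same row)."""
    return 1000.0 / two_phase_commit_floor_ms(round_trip_ms)


for rtt in (50, 100):
    floor = two_phase_commit_floor_ms(rtt)
    rate = max_serial_commits_per_sec(rtt)
    print(f"RTT {rtt} ms: at least {floor:.0f} ms per commit, "
          f"at most {rate:.0f} serial commits/sec")
```

Under these assumptions every cross-coast write pays at least 100-200ms before it can be acknowledged, and any write that has to wait on the previous one is capped at roughly 5-10 commits per second.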

Since we cannot achieve Strong Consistency between Oracle and SimpleDB, can we achieve Eventual Consistency?


The Oracle-SimpleDB Hybrid Part 1 : Pulling data out of Oracle Efficiently

Preamble : See my previous post titled “Introducing the Oracle-MySQL Hybrid”

In my previous post, I provided an overview of an Oracle-SimpleDB Hybrid system that I am building. It supports writes to multiple masters, replicates data between masters in single-digit seconds (i.e. in the absence of long-term network partitions), is eventually-consistent, and is designed for optimal AP — it survives Network Partitions and is highly available.

My company already relies on Oracle databases. In order to transition to SimpleDB, we will need to move one application at a time into the cloud while keeping our service running. As this cannot happen overnight, we need to keep both SimpleDB and Oracle in sync.

In Part 1 : Pulling data out of Oracle Efficiently, I’m going to discuss one of 3 methods we have devised to replicate data out of an RDBMS. This method is called Trigger-oriented Incremental Replication (a.k.a. TIR) and is depicted below in the bottom gray-box.

Before the Oracle-SimpleDB Hybrid system could go live, we needed to copy a lot of data from Oracle to SimpleDB. There were 2 distinct goals:

  1. Copy historical data from Oracle to SimpleDB - i.e. a one-time data fork-lift
  2. Replicate incremental changes as they occur in the live system
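The details of TIR are not part of this excerpt, so the following is only a generic sketch of trigger-based change capture, the general idea the name suggests. It uses SQLite (via Python's standard `sqlite3` module) as a stand-in for Oracle, and the `members` and `change_log` tables, the trigger names, and `poll_changes` are all hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database whose triggers log every row change."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT);

        -- One row per change, in commit order within this database.
        CREATE TABLE change_log (
            seq        INTEGER PRIMARY KEY AUTOINCREMENT,
            table_name TEXT NOT NULL,
            row_id     INTEGER NOT NULL,
            op         TEXT NOT NULL      -- 'I', 'U' or 'D'
        );

        CREATE TRIGGER members_ins AFTER INSERT ON members
        BEGIN
            INSERT INTO change_log (table_name, row_id, op)
            VALUES ('members', NEW.id, 'I');
        END;

        CREATE TRIGGER members_upd AFTER UPDATE ON members
        BEGIN
            INSERT INTO change_log (table_name, row_id, op)
            VALUES ('members', NEW.id, 'U');
        END;

        CREATE TRIGGER members_del AFTER DELETE ON members
        BEGIN
            INSERT INTO change_log (table_name, row_id, op)
            VALUES ('members', OLD.id, 'D');
        END;
    """)
    return conn


def poll_changes(conn: sqlite3.Connection, last_seq: int) -> list:
    """Return changes recorded after last_seq, oldest first."""
    return conn.execute(
        "SELECT seq, table_name, row_id, op FROM change_log "
        "WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()


conn = make_db()
conn.execute("INSERT INTO members (id, name) VALUES (1, 'alice')")
conn.execute("UPDATE members SET name = 'alicia' WHERE id = 1")
conn.execute("DELETE FROM members WHERE id = 1")
changes = poll_changes(conn, last_seq=0)
```

A replicator built along these lines would remember the highest `seq` it has shipped and call `poll_changes` with that value on its next pass, so only new changes are read from the source database.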


RDBMS vs SimpleDB Overview

Enter the key-value store, exit the RDBMS

Anyone who has worked directly or indirectly with a relational database will tell you that it would be foolish to build a business that didn’t use one to store its data.

One may argue whether MySQL or Oracle is the better choice, but would someone actually argue that an RDBMS (a.k.a. relational database) is not the best choice for storing your data?

Yes! There is a movement, the NoSQL movement, that is challenging the supremacy of RDBMSs for storing your data!

Some are listed here

Now, 10 years after Eric Brewer’s game-changing introduction of the CAP theorem, a mass exodus is starting towards AP (availability + partition tolerance) and away from CP (strong consistency + partition tolerance).

Amazon’s SimpleDB is one such alternative to an RDBMS. Simply put, it is a distributed, replicated, eventually-consistent, always-available, key-value store owned and operated (i.e. hosted) by Amazon’s Web Services division.


Eventual Consistency Explained for Techies

Preamble: Have a look at my previous article titled “Eventually Consistency Explained for non-Techies”

Eventual and Weak Consistency

SimpleDB is eventually consistent. Eventual consistency is a version of weak consistency — you may not see the latest writes committed to the system.

Imagine that you have a system of N nodes. Of these, W nodes are involved in any write sent to the system and R nodes are contacted on any read from the system. Strong consistency can be achieved if R+W > N. In other words, if the read sets and write sets overlap, the read can discover the most recent write to the system.
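The overlap condition can be checked by brute force. The sketch below (function names are my own, for illustration) enumerates every possible write set of size W and read set of size R over N nodes and confirms that they always share at least one node exactly when R+W > N.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every read set of size r intersects every write set of size w."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


def satisfies_quorum_rule(n: int, r: int, w: int) -> bool:
    """The R + W > N condition from the text."""
    return r + w > n


# Compare the brute-force answer with the rule for a 3-node system.
for r in range(1, 4):
    for w in range(1, 4):
        print(f"N=3 R={r} W={w}: "
              f"overlap={quorums_always_overlap(3, r, w)} "
              f"rule={satisfies_quorum_rule(3, r, w)}")
```

For example, with N=3, W=2 and R=2 any read touches at least one node that took the latest write. With W=1 and R=1 a read can land on a node the write never reached, which is the weakly consistent case.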


About Me

A blog describing my work in building websites that millions of people visit. I'm a senior member of LinkedIn's Distributed Data Systems team. I previously held technical and leadership roles at Netflix, Etsy, eBay & Siebel Systems.