How Process Priority Inversion Can Burn CPU via “Waiting” Processes

For the past few weeks, we have been wrestling with an interesting bug in Oracle 11g at Netflix.

We are seeing high CPU usage associated with a high number of wait events for the following:

  • cursor: mutex S
  • latch: shared pool

I was perplexed as to why waits would result in high CPU so I hopped on to Google. I didn’t find an answer for 11g, but I did find something reported in 10.2 and reportedly fixed in 11g.

In Oracle 10.2, on operating systems (e.g. AIX, HP-UX) that support process priority decay (e.g. fair round robin, the default on AIX 5), priority inversion can cause “blocked threads” to burn CPU. The post linked below mentions “cursor: pin S” waits instead of “cursor: mutex S” waits, but the pattern is the same.

http://blog.tanelpoder.com/2010/04/21/cursor-pin-s-waits-sporadic-cpu-spikes-and-systematic-troubleshooting/

In Oracle 10.2, latches cause waits, while mutexes cause CPU spins.
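To make the inversion pattern concrete, here is a toy single-CPU simulation in Python. It is not Oracle's or AIX's actual scheduler: the strict-priority rule, the tick counts, and the names `simulate`, `waiters`, and `work_ticks` are all assumptions made for illustration. A mutex holder whose priority has decayed needs a few more ticks on the CPU to release the mutex, while higher-priority waiters either spin (stay runnable) or sleep (leave the run queue).

```python
def simulate(waiters: int, spin: bool, work_ticks: int = 3, max_ticks: int = 1000):
    """Toy single-CPU, strict-priority scheduler (an illustrative assumption).

    The mutex holder has the lowest priority and needs `work_ticks` ticks on
    the CPU before it releases the mutex. Waiters have higher priority.
    Returns (ticks_until_release or None, cpu_ticks_burned_by_waiters).
    """
    remaining = work_ticks
    burned = 0
    for tick in range(1, max_ticks + 1):
        if spin and waiters > 0:
            # A spinning waiter is always runnable and outranks the holder,
            # so it wins the CPU and the holder makes no progress.
            burned += 1
        else:
            # Sleeping waiters are off the run queue, so the holder runs.
            remaining -= 1
            if remaining == 0:
                return tick, burned
    return None, burned  # mutex never released within the tick budget


print(simulate(waiters=4, spin=True))   # holder starved; waiters burn every tick
print(simulate(waiters=4, spin=False))  # holder releases after work_ticks ticks
```

In the spinning case the sessions that are nominally "waiting" account for all of the CPU time, which is the shape of the symptom described above.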

The Oracle-SimpleDB Hybrid Part 3 : Defining the SimpleDB-Oracle translation

Preamble : See Part 2 : Solving the Eventual-Consistency Problem

When building a SimpleDB-Oracle hybrid system (or, more generally, any hybrid of a key-value store and an RDBMS), translating between two very different data models presents a challenge. The challenge extends beyond the obvious ACID vs. BASE differences.

Most RDBMSs support the following features:

  • Triggers
  • Stored Procedures
  • Constraints (e.g. integrity, foreign key, unique)
  • Sequences
  • Sequences used as Primary Keys
  • Locks
  • Tables without Primary Keys or Unique Keys or both
  • Relationships between tables

The Oracle-SimpleDB Hybrid Part 2 : Solving the Eventual Consistency Problem

Preamble: Read Part 1: Pulling Data out of Oracle Efficiently

Creating the Oracle-SimpleDB Hybrid system is a challenge. Firstly, it is a multi-homed system, accepting writes both in the cloud (i.e. SimpleDB) and in our data center (i.e. Oracle). Secondly, the data center and the AWS region (i.e. US-east) are on opposite coasts of the US, with network latency of ~50-100ms. Thirdly, clocks cannot be tightly synchronized across WANs via network protocols like NTP. NTP across a WAN introduces tens of milliseconds of inaccuracy, which may not be good enough to resolve all forms of conflict.
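A small Python sketch shows why tens of milliseconds of clock error matter. It assumes a last-writer-wins rule based on locally recorded timestamps, which is an assumption for illustration only (this excerpt does not say which conflict-resolution scheme the system uses). The `Write` class, the offsets, and the values are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Write:
    site: str
    value: str
    true_time_ms: int     # when the write really happened
    clock_offset_ms: int  # error of the local clock at that site

    @property
    def timestamp_ms(self) -> int:
        # The timestamp the site actually records.
        return self.true_time_ms + self.clock_offset_ms


def last_writer_wins(a: Write, b: Write) -> Write:
    """Pick the write with the larger recorded timestamp."""
    return a if a.timestamp_ms >= b.timestamp_ms else b


# The data-center write happens first, but its clock runs 30 ms fast.
dc = Write("oracle", "old", true_time_ms=1000, clock_offset_ms=30)
# The cloud write happens 10 ms later with an accurate clock.
cloud = Write("simpledb", "new", true_time_ms=1010, clock_offset_ms=0)

winner = last_writer_wins(dc, cloud)
print(winner.value)  # the earlier write wins because of the skew
```

Whenever two conflicting writes land closer together than the clock error, timestamp ordering alone can pick the wrong one.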

To build a multi-homed system, we needed to keep our Oracle DB in our data center in sync with our SimpleDB domains in the East Coast region.

This is a tough problem to solve. How consistent do we want the data to be? What if we shoot for strong consistency?

In order to build a strongly consistent link between Oracle and SimpleDB, we could use dual writes via a two-phase commit (a.k.a. 2PC) protocol. However, 2PC over a 50-100ms link would be an availability bottleneck, and hence 2PC is not a viable option. Any consensus protocol would suffer from the same shortcoming.
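The cost of 2PC over such a link can be roughed out with some back-of-the-envelope arithmetic. The sketch below makes several assumptions: the 50-100ms figure is treated as a round-trip time, 2PC is modelled as two round trips (prepare and commit), and the 99.9% availability numbers are made up purely for illustration.

```python
import math


def two_pc_commit_latency_ms(rtt_ms: float, round_trips: int = 2) -> float:
    """Lower bound on commit latency: prepare + commit, one round trip each."""
    return rtt_ms * round_trips


def max_serial_commits_per_sec(rtt_ms: float, round_trips: int = 2) -> float:
    """Commits per second if conflicting writes must commit one at a time."""
    return 1000.0 / two_pc_commit_latency_ms(rtt_ms, round_trips)


def joint_availability(*components: float) -> float:
    """A 2PC write succeeds only if every participant and the link are up."""
    return math.prod(components)


for rtt in (50, 100):
    print(rtt, two_pc_commit_latency_ms(rtt), max_serial_commits_per_sec(rtt))

# Hypothetical numbers: Oracle, SimpleDB, and the WAN link at 99.9% each.
print(joint_availability(0.999, 0.999, 0.999))
```

Under these assumptions every write pays at least 100-200ms, and the write path is less available than any single component, since a failure of either side or of the link blocks commits.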

Since we cannot achieve strong consistency between Oracle and SimpleDB, can we achieve eventual consistency?

The Oracle-SimpleDB Hybrid Part 1 : Pulling data out of Oracle Efficiently

Preamble : See my previous post titled “Introducing the Oracle-SimpleDB Hybrid”

In my previous post, I provided an overview of an Oracle-SimpleDB Hybrid system that I am building. It supports writes to multiple masters, replicates data between masters in single-digit seconds (in the absence of long-term network partitions), is eventually consistent, and is designed for optimal AP in CAP terms: it survives network partitions and is highly available.

My company already relies on Oracle databases. In order to transition to SimpleDB, we will need to move one application at a time into the cloud while keeping our service running. As this cannot happen overnight, we need to keep both SimpleDB and Oracle in sync.

In Part 1 : Pulling data out of Oracle Efficiently, I’m going to discuss one of the 3 methods we have devised to replicate data out of an RDBMS. This method is called Trigger-oriented Incremental Replication (a.k.a. TIR) and is depicted below in the bottom gray box.

Before the Oracle-SimpleDB Hybrid system could go live, we needed to copy a lot of data from Oracle to SimpleDB. There were 2 distinct goals:

  1. Copy historical data from Oracle to SimpleDB - i.e. a one-time data fork-lift
  2. Replicate incremental changes as they occur in the live system
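This excerpt names Trigger-oriented Incremental Replication but does not describe its internals, so the following is only a guess at the general shape of a trigger-based approach, not the design used here. It uses SQLite from Python's standard library as a stand-in for Oracle; the `customer` and `change_log` tables, the trigger names, and `pull_changes` are all hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
-- Change log filled by triggers; seq gives replicators a stable order.
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, row_id INTEGER
);
CREATE TRIGGER customer_ins AFTER INSERT ON customer
BEGIN INSERT INTO change_log (op, row_id) VALUES ('I', NEW.id); END;
CREATE TRIGGER customer_upd AFTER UPDATE ON customer
BEGIN INSERT INTO change_log (op, row_id) VALUES ('U', NEW.id); END;
CREATE TRIGGER customer_del AFTER DELETE ON customer
BEGIN INSERT INTO change_log (op, row_id) VALUES ('D', OLD.id); END;
""")


def pull_changes(conn, after_seq):
    """Return change-log entries newer than the last one already replicated."""
    return conn.execute(
        "SELECT seq, op, row_id FROM change_log WHERE seq > ? ORDER BY seq",
        (after_seq,),
    ).fetchall()


db.execute("INSERT INTO customer (id, name) VALUES (1, 'Ann')")
db.execute("UPDATE customer SET name = 'Anne' WHERE id = 1")
db.execute("DELETE FROM customer WHERE id = 1")
changes = pull_changes(db, after_seq=0)
print(changes)
```

In a sketch like this, a replicator would remember the highest `seq` it has shipped and ask only for newer entries, which is what would make the replication incremental.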

Introducing the Oracle-SimpleDB Hybrid

My company would like to migrate its systems to the cloud. As this will take several months, the engineering team needs to support data access in both the cloud and its data center in the interim. Also, the RDBMS system might be maintained until some functionality (e.g. Backup-Restore) is created in SimpleDB.

To this end, for the past 9 months, I have been building an eventually consistent, multi-master data store. This system consists of an Oracle replica and several SimpleDB replicas. As I near completion of this system, I’d like to share its design.

Here’s the system:

We plan on accepting reads and writes in our data center (Oracle) and in our AWS region (SimpleDB). There are 2 Incremental Replicators (IRs) that transmit the changes between Oracle and SimpleDB. One replicates data from Oracle to SimpleDB, the other replicates data back from SimpleDB to Oracle.
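The two-replicator arrangement can be sketched with in-memory stand-ins. Everything below is an assumption for illustration: the `Store` and `IncrementalReplicator` classes, the per-store change log, and the origin tag used to stop changes from echoing back are not taken from the actual system, and conflict resolution is deliberately left out.

```python
class Store:
    """In-memory stand-in for one master (Oracle or SimpleDB)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.log = []  # (key, value, origin) in commit order

    def write(self, key, value, origin=None):
        self.data[key] = value
        self.log.append((key, value, origin or self.name))


class IncrementalReplicator:
    """Ships changes from `source` to `target`, remembering its position."""

    def __init__(self, source, target):
        self.source, self.target, self.pos = source, target, 0

    def run_once(self):
        shipped = 0
        for key, value, origin in self.source.log[self.pos:]:
            # Skip changes that originated at the target, or they would
            # bounce between the two masters forever.
            if origin != self.target.name:
                self.target.write(key, value, origin=origin)
                shipped += 1
        self.pos = len(self.source.log)
        return shipped


oracle, simpledb = Store("oracle"), Store("simpledb")
to_cloud = IncrementalReplicator(oracle, simpledb)
to_dc = IncrementalReplicator(simpledb, oracle)

oracle.write("user:1", "Ann")
simpledb.write("user:2", "Bob")
print(to_cloud.run_once(), to_dc.run_once(), to_cloud.run_once())
print(oracle.data == simpledb.data)
```

Each replicator only moves data one way, so the pair together gives both masters the same contents once both have caught up.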

About Me

A blog describing my work in building websites that millions of people visit. I'm a senior member of LinkedIn's Distributed Data Systems team. I previously held technical and leadership roles at Netflix, Etsy, eBay & Siebel Systems.