December 2009
13 posts
4 tags
Website Performance - Why you should care and what...
Why Does Performance Matter?
People often use the terms web site performance, scalability, and availability interchangeably. Although the three are related, they are distinct. Here are their definitions:
availability - what is the total length of time that [some part of] a web site is available during an hour/day/year?
scalability - what is the largest number of concurrent...
3 tags
Denial of Service (DoS) : Some Thoughts
About a year ago, I had the opportunity to solve a class of Denial-of-Service attacks that were compromising our availability and scalability. During that investigation, I happened upon a revelation. That revelation led to a solution. I’ve since seen that learning applied to other systems, including Amazon’s SimpleDB, so I wanted to share it here.
Consider the following scenario (also...
5 tags
SimpleDB Recommended Reading List (12/23/09)
Below is a list of recommended reading to understand SimpleDB and other cloud-related topics. The reading list starts with distributed computing basics and ends with in-depth SimpleDB best-practices.
The CAP Theorem Distilled
The “Consistency, Not Accuracy” Principle
Eventual Consistency Explained for Non-techies
Eventual Consistency Explained for Techies
RDBMS vs. SimpleDB...
6 tags
The Oracle-SimpleDB Hybrid Part 3 : Defining the...
Preamble : See Part 2 : Solving the Eventual-Consistency Problem
When building a SimpleDB-Oracle (or, more generally, any key-value store and RDBMS) hybrid system, translating between two very different data models presents a challenge. The challenge extends beyond the obvious ACID vs. BASE differences.
Most RDBMSs support the following features:
Triggers
Stored Procedures
Constraints (e.g. integrity,...
6 tags
The Oracle-SimpleDB Hybrid Part 2 : Solving the...
Preamble: Read Part 1: Pulling Data out of Oracle Efficiently
Creating the Oracle-SimpleDB Hybrid system is a challenge. First, it is a multihomed system, accepting writes in both the cloud (i.e. SimpleDB) and in our data center (i.e. Oracle). Second, the data center and the AWS region (i.e. US-east) are on opposite coasts of the US, with a network latency of ~50-100 ms. Third, clocks are not...
5 tags
The Oracle-SimpleDB Hybrid Part 1 : Pulling data...
Preamble : See my previous post titled “Introducing the Oracle-SimpleDB Hybrid”
In my previous post, I provided an overview of an Oracle-SimpleDB Hybrid system that I am building. It supports writes to multiple masters, replicates data between masters in single-digit seconds (i.e. in the absence of long-term network partitions), is eventually-consistent, and is designed for optimal...
4 tags
Introducing the Oracle-SimpleDB Hybrid
My company would like to migrate its systems to the cloud. As this will take several months, the engineering team needs to support data access in both the cloud and its data center in the interim. Also, the RDBMS system might be maintained until some functionality (e.g. Backup-Restore) is created in SimpleDB.
To this end, for the past 9 months, I have been building an eventually-consistent,...
7 tags
Cloud Tips: How to Efficiently Forklift 1 Billion...
About 9 months ago, I was tasked with forklifting a massive amount of data into Amazon’s SimpleDB in a short amount of time. I achieved it. Here’s what you need to know.
If you read on, I’ll show you how to achieve data upload rates of around 10K items/second.
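As a rough sanity check on those figures, here is a minimal sketch of the arithmetic, assuming the quoted rate of about 10K items/second is sustained for the whole upload of 1 billion items. The function name `estimated_upload_hours` is hypothetical and used only for illustration.

```python
def estimated_upload_hours(total_items: int, items_per_second: float) -> float:
    """Return the hours needed to upload total_items at a sustained rate."""
    if items_per_second <= 0:
        raise ValueError("items_per_second must be positive")
    seconds = total_items / items_per_second
    return seconds / 3600


# Assumed figures: 1 billion items at a sustained 10K items/second.
hours = estimated_upload_hours(1_000_000_000, 10_000)
print(f"{hours:.1f} hours")  # prints "27.8 hours"
```

At that rate the full upload takes a little over a day, so any drop in the sustained rate has a large effect on the total time.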
SimpleDB Basics
First of all, if you have 1 billion rows to upload, you will need more than 1 domain. This is...
5 tags
RDBMS vs SimpleDB Overview
Enter the key-value store, exit the RDBMS
Anyone who has worked directly or indirectly with a relational database will tell you that it would be foolish to build a business that didn’t use one to store your business’s data.
One may argue whether MySQL or Oracle is the better choice, but would someone actually argue that an RDBMS (a.k.a. relational database) was not the best choice for...
6 tags
Eventual Consistency Explained for Techies
Preamble: Have a look at my previous article titled “Eventual Consistency Explained for Non-techies”
Eventual and Weak Consistency
SimpleDB is eventually consistent. Eventual consistency is a version of weak consistency — you may not see the latest writes committed to the system.
Imagine that you have a system of N nodes. Of these, W nodes are involved in any write sent to...
3 tags
Eventual Consistency Explained for Non-techies
If you work in the computer industry, especially the Internet industry, chances are good that you have encountered an eventually-consistent system.
For example, when managing an Internet or IT business, you might have considered one or all of the following DB architectures:
Use a single DB host e.g. MyHost
Use a single DB host for your writes, but several for your reads e.g. MyWriteHost...
5 tags
The "Consistency, Not Accuracy" Principle
Preamble: Read my post “The CAP Theorem Distilled”
In my previous post, I started talking about the “Consistency, Not Accuracy” Principle (a.k.a. The CNA Principle).
Essentially, in order to scale your web site and to keep running amidst unpredictable network and system outages, you need to have a replicated, fault-tolerant data store that accepts reads and writes in...
4 tags
The CAP Theorem Distilled
Anyone who has studied Distributed Computing in a graduate school computer science course appreciates that one of the hardest problems to solve is that of keeping writes in multiple locations synchronized. Are the writes synchronous or asynchronous? If the latter, what is the replication delay in the average and worst cases? How are write-inconsistencies resolved in the face of parallel...