December 2009
13 posts
4 tags
Website Performance - Why you should care and what...
Why Does Performance Matter? People often speak interchangeably about web site performance, scalability, and availability. Although these three terms are related, they are distinct. Here are their definitions: availability - what is the total length of time that [some part of] a web site is available during an hour/day/year? scalability - what is the largest number of concurrent...
Dec 31st
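To make the availability definition in the excerpt above concrete, here is a minimal sketch that treats availability as the fraction of a measurement window during which the site was reachable. The function name, its parameters, and the one-hour outage are illustrative assumptions, not taken from the post.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def availability(uptime_seconds: float, window_seconds: float) -> float:
    """Return the fraction of the window during which the site was available.

    Hypothetical helper: the post only defines availability in words.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    if not 0 <= uptime_seconds <= window_seconds:
        raise ValueError("uptime_seconds must lie within the window")
    return uptime_seconds / window_seconds


# Assumed example: one hour of downtime over a single day.
daily = availability(SECONDS_PER_DAY - 3600, SECONDS_PER_DAY)
print(f"Availability over the day: {daily:.2%}")
```

The same function works for an hour or a year; only the window length changes.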
3 tags
Denial of Service (DoS) : Some Thoughts
About a year ago, I had the opportunity to solve a class of Denial-of-Service attacks that were compromising our availability and scalability. During that investigation, I happened upon a revelation. That revelation led to a solution. I’ve since seen that learning applied to other systems, including Amazon’s SimpleDB, so I wanted to share it here. Consider the following scenario (also...
Dec 28th
3 notes
5 tags
SimpleDB Recommended Reading List (12/23/09)
Below is a list of recommended reading to understand SimpleDB and other cloud-related topics. The reading list starts with distributed computing basics and ends with in-depth SimpleDB best practices: The CAP Theorem Distilled; The “Consistency, Not Accuracy” Principle; Eventual Consistency Explained for Non-techies; Eventual Consistency Explained for Techies; RDBMS vs. SimpleDB...
Dec 24th
4 notes
6 tags
The Oracle-SimpleDB Hybrid Part 3 : Defining the...
Preamble: See Part 2: Solving the Eventual-Consistency Problem. When building a SimpleDB-Oracle (i.e. any key_value_store-RDBMS) hybrid system, translating between two very different data models presents a challenge. The challenge expands beyond the obvious ACID vs. BASE differences. Most RDBMSs support the following features: Triggers, Stored Procedures, Constraints (e.g. integrity,...
Dec 24th
73 notes
6 tags
The Oracle-SimpleDB Hybrid Part 2 : Solving the...
Preamble: Read Part 1: Pulling Data out of Oracle Efficiently. Creating the Oracle-SimpleDB Hybrid system is a challenge. First, it is a multihomed system, accepting writes in both the cloud (i.e. SimpleDB) and in our data center (i.e. Oracle). Secondly, the data center and the AWS region (i.e. US-east) are on opposite coasts of the US, with a network latency of ~50-100 ms. Thirdly, clocks are not...
Dec 23rd
12 notes
5 tags
The Oracle-SimpleDB Hybrid Part 1 : Pulling data...
Preamble: See my previous post titled “Introducing the Oracle-SimpleDB Hybrid”. In my previous post, I provided an overview of an Oracle-SimpleDB Hybrid system that I am building. It supports writes to multiple masters, replicates data between masters in single-digit seconds (i.e. in the absence of long-term network partitions), is eventually consistent, and is designed for optimal...
Dec 23rd
11 notes
4 tags
Introducing the Oracle-SimpleDB Hybrid
My company would like to migrate its systems to the cloud. As this will take several months, the engineering team needs to support data access in both the cloud and its data center in the interim. Also, the RDBMS might be maintained until some functionality (e.g. Backup-Restore) is created in SimpleDB. To this end, for the past 9 months, I have been building an eventually-consistent,...
Dec 16th
14 notes
7 tags
Cloud Tips: How to Efficiently Forklift 1 Billion...
About 9 months ago, I was tasked with fork-lifting a massive amount of data into Amazon’s SimpleDB in a short amount of time. I achieved it. Here’s what you need to know. If you read on, I’ll show you how to achieve data upload rates of around 10K items/second. SimpleDB Basics: First of all, if you have 1 billion rows to upload, you will need more than 1 domain. This is...
Dec 15th
11 notes
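The excerpt above says that 1 billion rows need more than one SimpleDB domain. One common way to spread items over several domains is to hash each item name and take the result modulo the domain count. The sketch below assumes that approach; the post excerpt does not say which scheme the author used, and `domain_for_item`, the `items` prefix, and the domain count of 16 are all hypothetical.

```python
import hashlib


def domain_for_item(item_name: str, domain_count: int, prefix: str = "items") -> str:
    """Map an item name to one of `domain_count` domain names.

    Assumption: hash-based sharding, not necessarily the author's method.
    """
    if domain_count < 1:
        raise ValueError("domain_count must be at least 1")
    # MD5 is used only to spread keys evenly, not for security.
    digest = hashlib.md5(item_name.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % domain_count
    return f"{prefix}_{shard:03d}"


# The same item name always lands in the same domain.
print(domain_for_item("user-42", 16))
```

Because the mapping is deterministic, readers and writers can locate an item's domain without a lookup table, at the cost of reshuffling items if the domain count changes.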
5 tags
RDBMS vs SimpleDB Overview
Enter the key-value store, exit the RDBMS. Anyone who has worked directly or indirectly with a relational database will tell you that it would be foolish to build a business that didn’t use one to store its data. One may argue whether MySQL or Oracle is the better choice, but would someone actually argue that an RDBMS (a.k.a. relational database) was not the best choice for...
Dec 14th
4 notes
6 tags
Eventual Consistency Explained for Techies
Preamble: Have a look at my previous article titled “Eventual Consistency Explained for Non-techies”. Eventual and Weak Consistency: SimpleDB is eventually consistent. Eventual consistency is a version of weak consistency: you may not see the latest writes committed to the system. Imagine that you have a system of N nodes. Of these, W nodes are involved in any write sent to...
Dec 14th
3 notes
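The excerpt above introduces N nodes, of which W take part in each write. In the usual quorum formulation, a read that contacts R nodes is guaranteed to overlap the latest write when R + W > N. The sketch below checks that condition; R and the function name are assumptions added for illustration, since the excerpt is cut off before it gets that far.

```python
def read_overlaps_latest_write(n: int, w: int, r: int) -> bool:
    """Return True when every read quorum must intersect every write quorum.

    n: total replicas, w: replicas written per write,
    r: replicas contacted per read (r is an assumed parameter).
    """
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must each be between 1 and n")
    return r + w > n


# With 3 replicas, writing to 2 and reading from 2 forces an overlap.
print(read_overlaps_latest_write(3, 2, 2))
# Writing to 1 and reading from 1 allows stale reads (weak consistency).
print(read_overlaps_latest_write(3, 1, 1))
```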
3 tags
Eventual Consistency Explained for Non-techies
If you work in the computer industry, especially the Internet industry, chances are good that you have encountered an eventually-consistent system. For example, when managing an Internet or IT business, you might have considered one or all of the following DB architectures: Use a single DB host, e.g. MyHost; Use a single DB host for your writes, but several for your reads, e.g. MyWriteHost...
Dec 14th
3 notes
5 tags
The "Consistency, Not Accuracy" Principle
Preamble: Read my post “The CAP Theorem Distilled”. In my previous post, I started talking about the “Consistency, Not Accuracy” Principle (a.k.a. The CNA Principle). Essentially, in order to scale your web site and to keep running amidst unpredictable network and system outages, you need to have a replicated, fault-tolerant data store that accepts reads and writes in...
Dec 12th
9 notes
4 tags
The CAP Theorem Distilled
Anyone who has studied Distributed Computing in a graduate school computer science course appreciates that one of the hardest problems to solve is that of keeping writes in multiple locations synchronized. Are the writes synchronous or asynchronous? If the latter, what is the replication delay in the average and worst cases? How are write-inconsistencies resolved in the face of parallel...
Dec 12th
10 notes