January 2012
1 post
6 tags
The State of NoSQL in 2012
Preamble Ramble If you’ve been working in the online (e.g. internet) space over the past 3 years, you are no stranger to terms like “the cloud” and “NoSQL”.  In 2007, Amazon published a paper on Dynamo. The paper detailed how Dynamo, employing a collection of techniques to solve several problems in fault-tolerance, provided a resilient solution to the on-line...
Jan 19th
6 notes
December 2011
1 post
2 tags
Netflix & LinkedIn
   After 4+ years at Netflix, I am about to embark on a new challenge. On Monday, I start at LinkedIn as a senior member of the Service Infrastructure team. It has been my privilege and honor to work at Netflix these many years. To date, Netflix is the best company that I have worked at. Netflix Central to the company’s success is its unique culture of “Freedom and...
Dec 11th
3 notes
November 2011
2 posts
Slides from my QCon SF 2011 talk on Fault Tolerant...
Abstract : http://qconsf.com/sf2011/speaker/Siddharth+Anand Keeping Movies Running Amid Thunderstorms! View more presentations from Sid Anand
Nov 14th
6 notes
2 tags
"Head in the Cloud" Article about Netflix in the...
Check out the latest Cornell Engineering Magazine Issue for an article on how Netflix leverages the cloud! The story starts on page 10.
Nov 4th
6 notes
September 2011
1 post
Sep 9th
2 notes
July 2011
3 posts
3 tags
"NoSQL @ Netflix, Part 2" Talk at OSCON 2011
I’ve been having a great week so far at OSCON. Excellent talks from Riak, Facebook, 10gen, DataStax, Acunu, StumbleUpon, etc…. Uploading slides from my talk here.. OSCON Data 2011 — NoSQL @ Netflix, Part 2 View more presentations from Sid Anand
Jul 27th
16 notes
NoSQL @ Netfix @ OSCON Data, July 25
During July 25-27, O’Reilly will be dedicating a special section of their OSCON open source conference to big data use-cases and technologies. Tracks cover a range of topics, from NoSQL to relational technologies. Additionally tracks dedicated to Hadoop and Visualization are also included. This section is aptly named OSCON Data. I’ll be speaking during the Monday, 10:40 am slot...
Jul 11th
2 notes
Jul 7th
12 notes
June 2011
5 posts
3 tags
Acunu - Cassandra optimized (SSDs, new data...
] For some workloads, they achieve 2 orders of magnitude better latency results relative to a vanilla Cassandra distribution. This is achieved through a combination of SSDs and a new data structure (Stratified B-tree). Acunu White paper. http://bit.ly/jQc423    Short Blog entries on Cassandra performance http://bit.ly/mUqWLO Cassandra under heavy write load, Part...
Jun 28th
4 notes
WatchWatch
Recent GigaOM Structure 2011 Cloud Guru  Panel Video (6/22/2011)
Jun 23rd
4 notes
4 tags
Netflix @ DataStax SF 2011 - Monday, July 11
Netflix will be sponsoring DataStax SF 2011 on Monday, July 11. Though Netflix does not often sponsor events or recruit at conferences, we hope to engage with others active in the Cassandra and Distributed Systems community. Stop by our booth to learn about what we are working on! Bring your resume as well! Sid, Cloud Systems, Netflix
Jun 13th
3 notes
4 tags
Some upcoming conferences that I will be speaking...
Jun 3rd
6 notes
4 tags
Netflix for Mobile in the Cloud
Last week, I attended Intuit’s annual CTOF (Create The OFfering) conference in San Diego. The focus this year was on mobile best practices. I was happy to represent Netflix at this venue and learn from other companies with a foot in either mobile, the cloud, or both. Netflix is an example of both with support on several mobile platforms (e.g. iOS, Android, Windows Mobile) and traffic...
Jun 1st
3 notes
May 2011
1 post
3 tags
Distributed Systems & Dev Ops Engineers, Netflix...
  Hi Folks! As many of you know, Netflix has been expanding its video streaming offering throughout the US and the world. We have been successful thanks to our move to Amazon Web Services’ cloud and our adoption of NoSQL solutions. Senior Software Engineer - Cloud Database Performance Senior Operations Engineer - Cloud Database -s
May 24th
3 notes
March 2011
1 post
3 tags
Talks, Videos, and White Papers
I thought that I would put up a list of previous and upcoming talks that I will be giving. These talks will tend to focus on my work in Distributed “Cloud” Computing and NoSQL. Recent & Upcoming Speaking Appearances QCon SF 2010 Cornell Silicon Valley 2010 Cloud Panel Silicon Valley Cloud Computing Group (Meetup) — “NoSQL @ Netflix” QCon London 2011 (2...
Mar 24th
5 notes
February 2011
2 posts
5 tags
NoSQL @ Netflix : Part 1
Hi Folks! I had a great time this past Thursday (Feb 17, 2011) speaking at the Silicon Valley Cloud Computing Group Meetup at Facebook. The talk covered Netflix’s move from RDBMS to NoSQL, specifically SimpleDB. Subsequent parts will provide our experiences with Cassandra, HBase, and other technologies. Video is now available. The first 10 minutes are from sponsors, VMWare, RackSpace,...
Feb 21st
5 notes
3 tags
Silicon Valley Meetup & QCon London
Hi Folks! I’ll be giving a 2-part lecture on NoSQL @ Netflix at the Silicon Valley Cloud Computing Meetup in Mountain View on Feb 17. The lectures will be a month apart. I will detail the challenges involved in going from an RDBMS in our Data Center to AWS’s SimpleDB and S3 in the Cloud. I was intimately involved in this transition. Now, my team at Netflix is investing in Cassandra...
Feb 2nd
3 notes
January 2011
1 post
Red Black Trees vs Skip Lists →
Jan 27th
3 notes
December 2010
1 post
3 tags
How Process Priority Inversion Can Burn CPU via...
For the past few weeks, we have been wrestling with an interesting bug in Oracle 11g at Netflix. We are seeing high CPU attributed with a high number of wait events for the following: cursor: mutex S latch: shared pool I was perplexed as to why waits would result in high CPU so I hopped on to Google. I didn’t find an answer for 11g, but I did find something reported in 10.2 and...
Dec 7th
4 notes
November 2010
2 posts
4 tags
Cassandra : Row Cache & Memtable Q&A
Below, you will find 2 sets of questions and answers regarding Cassandra’s (now v0.7) Row Cache and Memtable. These were answered by Matthew Dennis at Riptano, a company that is actively developing both Cassandra and a management suite known as RipCord. Questions       Hi!       I have a super column family. Writes either modify a column within a super column or add a super column to the...
Nov 11th
5 notes
5 tags
Netflix's Transition to High-Availability Storage...
I presented at QCon SF (2010) yesterday on Netlflix’s transition to high-availability storage. The slides are on slideshare. I expect that the presentation will be available on the QCon SF site in a few days. It’s (loosely) based on my previous white paper of the same title - see this post.  
Nov 6th
6 notes
October 2010
1 post
4 tags
Netflix's Transition to High-Availability Storage...
I just published a white paper titled Netflix’s Transition to High-Availability Storage Systems Feel free to email me your thoughts at siddharthanand@yahoo.com To download this paper as PDF, click on this .
Oct 8th
8 notes
June 2010
3 posts
3 tags
SimpleDB Essentials for High Performance Users :...
This is Part 3 of SimpleDB Essentials for High Performance Users. Check out Part 2 Work around Attribute Value Length Limits If you need to store data that is vastly larger than 1024 bytes in a SimpleDB attribute, consider storing that data in S3 and putting a pointer (i.e. bucket name + object key) to the data in the simpleDB attribute. However, the drawback from this approach is that you...
Jun 21st
5 notes
3 tags
SimpleDB Essentials for High Performance Users :...
This Part 2 of SimpleDB Essentials for High Performance Users. Check out Part 1   Beware of Case-senstivity Since domain names and attribute names are case-sensitive, for all domain and attribute names, use uppercase lettering and separate words with “_” When sharding domains, adopt zero-based index numbering and separate it from the root name with “_” e.g....
Jun 21st
4 notes
4 tags
SimpleDB Essentials for High Performance Users :...
Preamble I’ve been a heavy-user of SimpleDB since January 2009, storing, writing, and reading billions of items. Based on my experience, I’ve compiled a list of best practices and conventions to simplify working with SimpleDB.  I’ve divided this into multiple parts to ease readability. Details Since sorting is...
Jun 19th
7 notes
March 2010
1 post
6 tags
A Java Out-of-Memory Error involving GZIP, Typica,...
UPDATE I am providing an update here to the root cause. Overview I ran into an interesting Out of Memory bug this week. It occurs if you use gzip to send/receive data and under-utilize your Java Heap memory. This land-mine has existed since 2004, though hopefully you will not be bitten by it. Problem Stack A Java process was throwing the following Out-of-Memory Error. JVMDUMP013I Processed Dump...
Mar 13th
12 notes
January 2010
3 posts
2 tags
SimpleDB Performance : 5 Steps to Achieving High...
I was recently tasked with fork-lifting ~1 billion rows from Oracle into SimpleDB. I completed this forklift in November 2009 after many attempts. To make this as efficient as possible, I worked closely with Amazon’s SimpleDB folks to troubleshoot performance problems and create new APIs. I’d like to share some recommendations and observations. Although I have covered these...
Jan 3rd
11 notes
2 tags
"The cloud" Explained for Normal People
If you are like most people in software, you have heard the term “The cloud” but have no idea what it means. If you are industrious enough to buy a book, google the term, or troll twitter for related tweets, you are likely exasperated by the shear marketing buzzword blast you encounter. To make it easier on you, I am going to tell you what it means to me with a very specific example: ...
Jan 1st
5 notes
4 tags
HTML 5's Web Sockets explained
I’ve been reading a bit about HTML 5’s WebSockets lately. First, here are some definitions: Comet - an umbrella term referring to techniques that provide “server push” using standard browser functionality — i.e. without the aid of specialty browser plug-ins. In practice, Comet in most-often implemented via Ajax with long polling. Long polling - (from Wikipedia)...
Jan 1st
4 notes
December 2009
13 posts
4 tags
Website Performance - Why you should care and what...
Why Does Performance Matter? Oftentimes, people speak interchangeably about web site performance, scalability, and availability. Although these 3 terms are related, they are distinct and unique. Here are their definitions: availability - what is the total length of time that [some part of] a web site is available during a hour/day/year? scalability - what is the largest number of concurrent...
Dec 31st
3 tags
Denial of Service (DoS) : Some Thoughts
About a year ago, I had the opportunity to solve a class of Denial-of-Service attacks that were compromising our availability and scalability. During that investigation, I happened upon a revelation. That revelation led to a solution. I’ve since seen that learning applied to other systems, including Amazon’s SimpleDB, so I wanted to share it here. Consider the following scenario (also...
Dec 28th
3 notes
5 tags
SimpleDB Recommended Reading List (12/23/09)
Below is a list of recommended reading to understand SimpleDB and other cloud-related topics. The reading list starts with distributed computing basics and ends with in-depth SimpleDB best-practices. The CAP Theorem Distilled The “Consistency, Not Accuracy” Principle Eventual Consistency Explained for Non-techies Eventual Consistency Explained for Techies RDBMS vs. SimpleDB...
Dec 24th
4 notes
6 tags
The Oracle-SimpleDB Hybrid Part 3 : Defining the...
Preamble : See Part 2 : Solving the Eventual-Consistency Problem When building a SimpleDB-Oracle (i.e. any key_value_store-RDBMS) hybrid system, translating between two very different data models presents a challenge. The challenge expands beyond the obvious ACID vs. BASE differences. Most RDBMSs support the following features: Triggers Stored Procedures Constraints (e.g. integrity,...
Dec 24th
69 notes
6 tags
The Oracle-SimpleDB Hybrid Part 2 : Solving the...
Preamble: Read Part 1: Pulling Data out of Oracle Efficiently Creating the Oracle-SimpleDB Hybrid system is a challenge. For one, it is a multihomed system, accepting writes in both the cloud (i.e. SimpleDB) and in our data center (i.e. Oracle). Secondly, the data center and the AWS reqion (i.e. US-east) are on opposite coasts of the US with network latency ~50-100ms. Thirdly, clocks are not...
Dec 23rd
7 notes
5 tags
The Oracle-SimpleDB Hybrid Part 1 : Pulling data...
Preamble : See my previous post titled “Introducing the Oracle-MySQL Hybrid” In my previous post, I provided an overview of an Oracle-SimpleDB Hybrid system that I am building. It supports writes to multiple masters, replicates data between masters in single-digit seconds (i.e. in the absence of long-term network partitions), is eventually-consistent, and is designed for optimal...
Dec 23rd
6 notes
4 tags
Introducing the Oracle-SimpleDB Hybrid
My company would like to migrate its systems to the cloud. As this will take several months, the engineering team needs to support data access in both the cloud and its data center in the interim. Also, the RDBMS system might be maintained until some functionality (e.g. Backup-Restore) is created in SimpleDB. To this aim, for the past 9 months, I have been building an eventually-consistent,...
Dec 16th
9 notes
7 tags
Cloud Tips: How to Efficiently Forklift 1 Billion...
About 9 months ago, I was tasked with fork-lifting a massive amount of data into Amazon’s SimpleDB in a short amount of time. I achieved it. Here’s what you need to know. If you read-on, I’ll show you how to achieve data upload rates of around 10K items/second SimpleDB Basics First of all, if you have 1 billion rows to upload, you will need more than 1 domain. This is...
Dec 15th
11 notes
5 tags
RDBMS vs SimpleDB Overview
Enter the key-value store, exit the RDBMS Anyone who has worked directly or indirectly with a relational database will tell you that it would be foolish to build a business that didn’t use one to store your business’s data. One may argue whether MySQL or Oracle is the better choice, but would someone actually argue that an RDBMS (a.k.a. relation database) was not the best choice for...
Dec 14th
4 notes
6 tags
Eventual Consistency Explained for Techies
Preamble: Have a look at my previous article titled “Eventually Consistency Explained for non-Techies” Eventual and Weak Consistency SimpleDB is eventually consistent. Eventual consistency is a version of weak consistency — you may not see the latest writes committed to the system. Imagine that you have a system of N nodes. Of these, W nodes are involved in any write sent to...
Dec 14th
3 notes
3 tags
Eventual Consistency Explained for Non-techies
If you work in the Computer industry, especially the Internet industry, chances are good that you have encountered an eventually-consistent system. For example, when managing an internet or IT business, you might have considered one of all of the following DB architectures: Use a single DB host e.g. MyHost Use a single DB host for your writes, but several for your reads e.g. MyWriteHost...
Dec 14th
5 tags
The "Consistency, Not Accuracy" Principle
Preamble: Read my post “The CAP Theorem distilled” In my previous post, I started talking about the “Consistency, Not Accuracy” Principle (a.k.a. The CNA Principle) Essentially, in order to scale your web site and to keep running amidst unpredictable network and system outages, you need to have a replicated, fault-tolerant data store that accepts reads and writes in...
Dec 12th
4 notes
4 tags
The CAP Theorem Distilled
Anyone who has studied Distributed Computing in a graduate school computer science course appreciates that one of the hardest problems to solve is that of keeping writes in multiple locations synchronized. Are the writes synchronous or asynchronous? If the latter, what is the replication delay in the average and worst cases? How are write-inconsistencies resolved in the face of parallel...
Dec 12th
10 notes