Cassandra: Structured Key-Value based storage system

Cassandra is an open source distributed database management system. It was initially developed by Facebook for storing very large amounts of data. Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure.

As described on Wikipedia.org:

Cassandra provides a structured key-value store with eventual consistency. Keys map to multiple values, which are grouped into column families. The column families are fixed when a Cassandra database is created, but columns can be added to a family at any time. Furthermore, columns are added only to specified keys, so different keys can have different numbers of columns in any given family. The values from a column family for each key are stored together, making Cassandra a hybrid between a column-oriented DBMS and a row-oriented store.
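To make the description above concrete, here is a minimal in-memory sketch of that data model in Python. It is only an illustration, not Cassandra's API: the class name ToyKeyspace, its methods, and the sample family and key names are all made up for this example, and it ignores distribution, replication, and eventual consistency entirely.

```python
from collections import defaultdict


class ToyKeyspace:
    """Toy model of the column-family layout (hypothetical, not Cassandra's API)."""

    def __init__(self, column_families):
        # The set of column families is fixed when the store is created.
        self._families = {name: defaultdict(dict) for name in column_families}

    def insert(self, family, key, column, value):
        # Columns can be added at any time, but only to an existing family.
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        self._families[family][key][column] = value

    def get_row(self, family, key):
        # All columns of one key within a family are kept together.
        return dict(self._families[family].get(key, {}))


ks = ToyKeyspace(["profile", "posts"])
ks.insert("profile", "alice", "email", "alice@example.com")
ks.insert("profile", "alice", "city", "Paris")
ks.insert("profile", "bob", "email", "bob@example.com")

alice_row = ks.get_row("profile", "alice")  # two columns
bob_row = ks.get_row("profile", "bob")      # one column
```

Note that the two keys end up with different numbers of columns in the same family, and that inserting into a family that was not declared up front fails.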

Facebook released Cassandra as open source in July 2008; it is now an Apache Incubator project.

Digg announced that it started to roll out its use of Cassandra on Sep 9th, 2009. Twitter is also working on using Cassandra to replace its current storage for all tweets.

Twitter has a cluster in production that is being populated outside the user-critical path (i.e., the Cassandra writes are asynchronous). They also evaluated many other solutions, including a custom MySQL implementation, Voldemort, HBase, MongoDB, MemcacheDB, and Hypertable, before finally settling on Cassandra.

It’s really nice publicity for the NoSQL movement, and if you face similar issues processing large amounts of data, Cassandra is worth a try.
