This week Facebook should have completed its new photo storage system, designed to reduce the social network’s reliance on expensive proprietary solutions from NetApp and Akamai. The new large-blob storage system, named Haystack, is a custom-built file system for the more than 850 million photos uploaded to Facebook each month (500 GB per day!). Haystack stores photo data in 10 GB buckets, with 1 MB of metadata for every GB stored. Metadata is guaranteed to be memory-resident, so serving a photo requires on average only one disk seek.
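A quick back-of-the-envelope check shows why the metadata fits in RAM. The sizes are the ones quoted above; the variable names and the 32 GB figure (the high end of the hardware config below) are just illustrative:

```python
# Sizes quoted in the article: 10 GB buckets, 1 MB of metadata per GB stored.
BUCKET_GB = 10
METADATA_MB_PER_GB = 1

metadata_mb_per_bucket = BUCKET_GB * METADATA_MB_PER_GB  # 10 MB per bucket

# A server with 32 GB of RAM could keep metadata for this many buckets
# resident, ignoring all other memory overhead:
ram_mb = 32 * 1024
buckets_in_ram = ram_mb // metadata_mb_per_bucket
print(buckets_in_ram)  # 3276 buckets, i.e. roughly 32 TB of photo data
```

With all bucket metadata pinned in memory, a read never touches the disk for bookkeeping, which is where the one-seek-per-photo figure comes from.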
Haystack runs on commodity servers and disks assembled by Facebook to avoid the costs associated with proprietary systems. The typical hardware configuration of a 2U storage blade is:
- 2 x quad-core CPUs
- 16GB – 32GB memory
- hardware RAID controller with 256MB – 512MB of NVRAM cache
- 12+ 1TB SATA drives in RAID-6
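RAID-6 dedicates two drives’ worth of capacity to parity, so the usable space per blade with the quoted 12 × 1 TB configuration works out as follows (a simple calculation, not a figure from the article):

```python
# RAID-6 survives any two simultaneous drive failures by storing two
# drives' worth of parity, so usable capacity is (N - 2) drives.
drives = 12
drive_tb = 1
parity_drives = 2  # fixed by RAID-6

usable_tb = (drives - parity_drives) * drive_tb
print(usable_tb)  # 10 TB usable per blade
```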
The Haystack index stores the metadata needed to locate each needle within the haystack. Incoming requests for a given photo asset are interpreted as before, but now carry a direct reference to the storage offset of the appropriate data.
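The offset-based lookup can be sketched as follows. The class and field names, and the trivial on-disk layout, are my assumptions for illustration, not Facebook’s actual needle format; the point is that the in-memory index maps a photo key straight to a byte offset and length, so a read is one positioned read rather than a directory walk:

```python
class HaystackBucket:
    """Append-only blob file plus an in-memory index of needle locations.
    Illustrative sketch only; the real Haystack needle format differs."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # photo_key -> (offset, length), kept fully in RAM

    def put(self, key, data):
        with open(self.path, "ab") as f:
            f.seek(0, 2)             # position at end of file
            offset = f.tell()
            f.write(data)
        self.index[key] = (offset, len(data))

    def get(self, key):
        offset, length = self.index[key]  # no disk I/O to find the needle
        with open(self.path, "rb") as f:
            f.seek(offset)                # the single seek per photo
            return f.read(length)
```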
Cachr remains the first line of defense for Haystack lookups, quickly processing requests and serving images from memcached where appropriate. Haystack provides a fast and reliable file backing for these specialized requests.
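The cache-in-front-of-Haystack read path amounts to the usual look-aside pattern. A minimal sketch, with a plain dict standing in for memcached and all names my own:

```python
cache = {}  # stand-in for memcached

def fetch_photo(key, haystack_get):
    """Serve from cache when possible; fall back to a Haystack read."""
    data = cache.get(key)
    if data is not None:
        return data              # cache hit: no Haystack I/O at all
    data = haystack_get(key)     # cache miss: one-seek read from the bucket
    cache[key] = data            # populate for subsequent requests
    return data
```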
This new infrastructure implements an HTTP-based photo server that stores photos in a generic object store. The main requirement for the new tier was to eliminate unnecessary metadata overhead on photo reads, so that each read I/O operation fetches only actual photo data (instead of filesystem metadata). Haystack breaks down into these functional layers:
- HTTP server
- Photo Store
- Haystack Object Store
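The three layers above can be sketched as a toy request flow. Every name here is illustrative: the HTTP layer parses the request, the Photo Store resolves a photo id through the in-memory index, and the Object Store performs the raw positioned read:

```python
def object_store_read(f, offset, length):
    # Haystack Object Store: one positioned read, no filesystem metadata.
    f.seek(offset)
    return f.read(length)

def photo_store_get(index, f, photo_id):
    # Photo Store: resolve the photo id via the in-memory index.
    offset, length = index[photo_id]
    return object_store_read(f, offset, length)

def http_handler(path, index, f):
    # HTTP server: extract the photo id from the request path.
    photo_id = path.rsplit("/", 1)[-1]
    return photo_store_get(index, f, photo_id)
```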
The architecture is illustrated in the diagram accompanying Facebook’s post.
In the following sections we look closely at each of the functional layers from the bottom up.
Read more at Facebook’s photo storage rewrite.
Here’s another version in video at Flowgram.com.