15/12/2011
In describing Timeline’s software infrastructure, Piantino will only go so far. But he does say that one of the keys to the system was that Facebook locates the aggregation code — the stuff that sorts through a user’s Timeline information — on the same machine as the data. “If you can ship your aggregation code to the box itself, that’s easier than using a network link,” he says. “We’re using the CPU for aggregation and the disks and input-output system for MySQL.”
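The point about shipping aggregation code to the box can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `StorageBox`, its row layout, and both method names are hypothetical and not taken from Facebook's system. The sketch only contrasts pulling raw rows off a machine with running the aggregation where the rows live and returning a small summary.

```python
from collections import Counter


class StorageBox:
    """Hypothetical stand-in for one machine holding both data and aggregation code."""

    def __init__(self, rows):
        # Each row is (user_id, year, kind); this layout is assumed for illustration.
        self.rows = rows

    def fetch_rows(self, user_id):
        # Remote-style access: every matching row would have to cross the network.
        return [row for row in self.rows if row[0] == user_id]

    def aggregate_by_year(self, user_id):
        # Local aggregation: only the per-year counts would leave the box.
        return Counter(year for uid, year, _kind in self.rows if uid == user_id)


box = StorageBox([
    (1, 2009, "photo"),
    (1, 2009, "status"),
    (1, 2010, "photo"),
    (2, 2010, "status"),
])

shipped_rows = box.fetch_rows(1)      # three full rows
summary = box.aggregate_by_year(1)    # two small counters
```

In this toy case the summary is already smaller than the raw rows, and the gap would grow with the number of rows per user.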
Yes, Timeline uses MySQL, not Hadoop HBase (which Facebook uses in other parts of its site) or some other NoSQL database. Whereas NoSQL databases are meant to spread vast amounts of unstructured data across a vast array of machines, MySQL is a relational database designed to organize data in neat rows and columns on a single machine. But MySQL can be “sharded” across many machines, and that’s what Facebook does.
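As a rough illustration of what “sharding” means, the sketch below routes each user to one of several independent databases. The article does not describe Facebook's actual scheme, so the shard names and the modulo rule are assumptions chosen for simplicity.

```python
# Hypothetical shard names; each would be a separate MySQL server in practice.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(user_id, shards=SHARDS):
    """Pick the shard that holds all rows for this user (simple modulo routing)."""
    return shards[user_id % len(shards)]


# All of one user's rows land on the same machine, so a query for that
# user only ever needs to touch a single database.
assignments = {uid: shard_for(uid) for uid in range(8)}
```

Plain modulo routing is the simplest possible choice; it forces most users to move when the shard count changes, which is why real deployments often use a lookup table or consistent hashing instead.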
“A lot of people are surprised that for this shiny new thing for Facebook, we’re using MySQL,” Piantino says. “We treat [MySQL] as a generic engine for data manipulation. We use it as a storage engine. And it’s really efficient.”
In 2008, Piantino saw a presentation by an engineer from InnoDB — an outfit that does storage engines for MySQL. He remembers thinking that if he was ever trying to solve the problem of finding data on disk, there “wasn’t a chance” he’d come up with a better way than the engine InnoDB had built for MySQL.
Piantino points out that Timeline fundamentally deals with ordered data — where ordering is its most important quality. The connection to other Facebook “events” is secondary. This is different from “graphed data,” which lets you quickly traverse different kinds of information — from comments on a picture to geo-location. News Feed is a graphed product. Timeline is a log product.
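The idea of a “log product” can be sketched as a structure kept sorted by time, where the main query is a range scan over that order. This is an illustrative assumption, not Facebook's implementation: `TimelineLog` and its methods are hypothetical, and an in-memory sorted list stands in for an index ordered on disk.

```python
import bisect


class TimelineLog:
    """Hypothetical per-user log: events stay sorted by timestamp."""

    def __init__(self):
        self._events = []  # sorted list of (timestamp, description)

    def add(self, timestamp, description):
        # insort keeps the list ordered, so reads never need to sort.
        bisect.insort(self._events, (timestamp, description))

    def between(self, start, end):
        # Binary-search both ends, then slice: events with start <= timestamp < end.
        lo = bisect.bisect_left(self._events, (start,))
        hi = bisect.bisect_left(self._events, (end,))
        return self._events[lo:hi]


log = TimelineLog()
log.add(2010, "moved to a new city")
log.add(2008, "joined")
log.add(2011, "started a new job")

recent = log.between(2008, 2011)
```

Because ordering is the primary property, a time-range read is two binary searches and a contiguous slice; no links between events have to be followed, which is the contrast with “graphed data” drawn above.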
Facebook Pulls Back Curtain on ‘Timeline’ | Wired Enterprise | Wired.com
Quote posted at 14:01