Five years ago, LinkedIn was a shell of the technology company it is today. Here’s an inside look at where it came from, what it’s become and where it’s going.
Next up, Hadoop
Thus far, LinkedIn’s biggest push has been in improving its nearline and online systems (“Basically, we’ve hit the ball out of the park here,” Ghosh said), so its next big push is offline — Hadoop, in particular. The company already uses Hadoop for the usual gamut of workloads — ETL, model-building, exploratory analytics and pre-computing data for nearline applications — and Ghosh wants to take it even further.
He laid out a multipart vision, most of which centers around tight integration between the company’s Hadoop clusters and relational database systems. Among the goals: better ETL frameworks, ad-hoc queries, alternative storage formats and an integrated metadata framework — which Ghosh calls the holy grail — that will make it easier for various analytic systems to use each other’s data. He said LinkedIn has something half-built that should be finished this year.
See on gigaom.com