Large-scale machine learning at Twitter

The success of data-driven solutions to difficult problems, along with the dropping costs of storing and processing massive amounts of data, has led to growing interest in large-scale machine learning. This paper presents a case study of Twitter’s integration of machine learning tools into its existing Hadoop-based, Pig-centric analytics platform. We begin with an overview of this platform, which handles “traditional” data warehousing and business intelligence tasks for the organization. The core of this work lies in recent Pig extensions to provide predictive analytics capabilities that incorporate machine learning, focused specifically on supervised classification. In particular, we have identified stochastic gradient descent techniques for online learning and ensemble methods as being highly amenable to scaling out to large amounts of data. In our deployed solution, common machine learning tasks such as data sampling, feature generation, training, and testing can be accompl...
Jimmy Lin, Alek Kolcz
Added 27 Sep 2012
Updated 27 Sep 2012
Type Journal
Year 2012