Scaling the iHMM: Parallelization versus Hadoop

This paper compares parallel and distributed implementations of an iterative, Gibbs sampling, machine learning algorithm. Distributed implementations run under Hadoop on facility computing clouds. The probabilistic model under study is the infinite HMM [1], whose parameters are learnt using an instance of blocked Gibbs sampling, each step of which involves a dynamic program. We apply this model to learn part-of-speech tags from newswire text in an unsupervised fashion. However, our focus here is on runtime performance rather than NLP-relevant scores: iteration duration, ease of development, deployment, and debugging.
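For context, the dynamic program inside the blocked Gibbs step is forward filtering followed by backward sampling (FFBS) of the hidden state sequence. The sketch below is a minimal Python/NumPy illustration of that step, assuming a fixed truncation level K for the iHMM's state space; the function and variable names (ffbs, trans, emit, init) are illustrative, not the authors' implementation.

```python
import numpy as np

def ffbs(trans, emit, init):
    """Sample a hidden state path for one sequence.

    trans: (K, K) transition matrix, trans[i, j] = p(s_{t+1}=j | s_t=i)
    emit:  (T, K) emission likelihoods, emit[t, k] = p(obs_t | s_t=k)
    init:  (K,)   initial state distribution
    """
    T, K = emit.shape
    alpha = np.zeros((T, K))

    # Forward filtering: alpha[t, k] proportional to p(s_t=k | obs_{1..t})
    alpha[0] = init * emit[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emit[t]
        alpha[t] /= alpha[t].sum()

    # Backward sampling: draw the path jointly, from the last state back
    states = np.empty(T, dtype=int)
    states[-1] = np.random.choice(K, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        w = alpha[t] * trans[:, states[t + 1]]
        states[t] = np.random.choice(K, p=w / w.sum())
    return states
```

Because each sentence's state path can be resampled independently given the shared transition and emission parameters, this step parallelizes naturally, whether across threads in a shared-memory sampler or across Hadoop map tasks; that trade-off is what the paper measures.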
Type: Conference paper
Year: 2010
Where: IEEE CIT (IEEE International Conference on Computer and Information Technology)
Authors: Sebastien Bratieres, Jurgen Van Gael, Andreas Vlachos, Zoubin Ghahramani