Scaling the iHMM: Parallelization versus Hadoop

This paper compares parallel and distributed implementations of an iterative, Gibbs sampling, machine learning algorithm. Distributed implementations run under Hadoop on facility computing clouds. The probabilistic model under study is the infinite HMM [1], whose parameters are learnt using an instance of blocked Gibbs sampling, each step of which involves a dynamic program. We apply this model to learn part-of-speech tags from newswire text in an unsupervised fashion. However, our focus here is on runtime performance rather than NLP-relevant scores: iteration duration, ease of development, deployment, and debugging.
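For context, the dynamic program inside the blocked Gibbs step is forward filtering followed by backward sampling (FFBS) of the hidden state sequence. The sketch below is a minimal Python/NumPy illustration of that step, assuming a fixed truncation level K for the iHMM's state space; the function and variable names (ffbs, trans, emit, init) are illustrative, not the authors' implementation.

```python
import numpy as np

def ffbs(trans, emit, init):
    """Sample a hidden state path for one sequence.

    trans: (K, K) transition matrix, trans[i, j] = p(s_{t+1}=j | s_t=i)
    emit:  (T, K) emission likelihoods, emit[t, k] = p(obs_t | s_t=k)
    init:  (K,)   initial state distribution
    """
    T, K = emit.shape
    alpha = np.zeros((T, K))

    # Forward filtering: alpha[t, k] proportional to p(s_t=k | obs_{1..t})
    alpha[0] = init * emit[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emit[t]
        alpha[t] /= alpha[t].sum()

    # Backward sampling: draw the path jointly, from the last state back
    states = np.empty(T, dtype=int)
    states[-1] = np.random.choice(K, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        w = alpha[t] * trans[:, states[t + 1]]
        states[t] = np.random.choice(K, p=w / w.sum())
    return states
```

Because each sentence's state path can be resampled independently given the shared transition and emission parameters, this step parallelizes naturally, whether across threads in a shared-memory sampler or across Hadoop map tasks; that trade-off is what the paper measures.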
Type: Conference paper
Year: 2010
Where: IEEE CIT (IEEE International Conference on Computer and Information Technology)
Authors: Sebastien Bratieres, Jurgen Van Gael, Andreas Vlachos, Zoubin Ghahramani