ICML
2008
IEEE

Fully distributed EM for very large datasets

In EM and related algorithms, E-step computations distribute easily because data items are independent given the parameters. For very large data sets, however, even storing all of the parameters on a single node for the M-step can be impractical. We present a framework that fully distributes the entire EM procedure. Each node interacts only with the parameters relevant to its data, sending messages to other nodes along a junction-tree topology. We demonstrate improvements over a MapReduce topology on two tasks: word alignment and topic modeling.
Jason Wolfe, Aria Haghighi, Dan Klein
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 2008
Where ICML
Authors Jason Wolfe, Aria Haghighi, Dan Klein
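
The abstract's starting point, that E-step work splits across data shards and that each shard needs only the parameters its own data touches, can be illustrated with a small sketch. This is not the authors' framework and does not implement the junction-tree message passing: it is a toy IBM Model 1 style word-alignment EM with a single centralized M-step, closer to the MapReduce baseline the abstract mentions. All names (`relevant_pairs`, `e_step`, `m_step`, `run_em`) and the example sentence pairs are hypothetical.

```python
from collections import defaultdict


def relevant_pairs(shard):
    """Parameter keys (e, f) that this shard's sentence pairs can touch."""
    return {(e, f) for es, fs in shard for e in es for f in fs}


def e_step(shard, t):
    """Expected alignment counts for one shard, given its local parameters t."""
    counts = defaultdict(float)
    for es, fs in shard:
        for f in fs:
            # Posterior over which source word e generated target word f.
            z = sum(t[(e, f)] for e in es)
            for e in es:
                counts[(e, f)] += t[(e, f)] / z
    return counts


def m_step(all_counts):
    """Sum the shard counts, then renormalize so sum_f t(f | e) = 1."""
    total = defaultdict(float)
    for counts in all_counts:
        for key, c in counts.items():
            total[key] += c
    norm = defaultdict(float)
    for (e, _), c in total.items():
        norm[e] += c
    return {(e, f): c / norm[e] for (e, f), c in total.items()}


def run_em(shards, iterations=10):
    # Uniform (unnormalized) start over every co-occurring word pair.
    keys = set().union(*(relevant_pairs(s) for s in shards))
    t = {k: 1.0 for k in keys}
    for _ in range(iterations):
        all_counts = []
        for shard in shards:  # each loop body could run on a separate node
            local_t = {k: t[k] for k in relevant_pairs(shard)}
            all_counts.append(e_step(shard, local_t))
        t = m_step(all_counts)  # assumption: one central reducer
    return t


# Hypothetical toy data: two shards of (source words, target words) pairs.
shards = [
    [(["the", "house"], ["la", "maison"]), (["the", "book"], ["le", "livre"])],
    [(["a", "house"], ["une", "maison"])],
]
t = run_em(shards)
```

In this sketch the second shard never reads parameters such as ("the", "livre"), because `local_t` is restricted to the pairs in its own data. The central `m_step` still gathers every count in one place, which is the storage bottleneck the abstract describes for very large parameter sets.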