Sciweavers

WWW
2010
ACM

Distributed nonnegative matrix factorization for web-scale dyadic data analysis on mapreduce

13 years 11 months ago
Distributed nonnegative matrix factorization for web-scale dyadic data analysis on mapreduce
The Web abounds with dyadic data that keeps increasing by every single second. Previous work has repeatedly shown the usefulness of extracting the interaction structure inside dyadic data [21, 9, 8]. A commonly used tool in extracting the underlying structure is the matrix factorization, whose fame was further boosted in the Netflix challenge [26]. When we were trying to replicate the same success on real-world Web dyadic data, we were seriously challenged by the scalability of available tools. We therefore in this paper report our efforts on scaling up the nonnegative matrix factorization (NMF) technique. We show that by carefully partitioning the data and arranging the computations to maximize data locality and parallelism, factorizing a tens of millions by hundreds of millions matrix with billions of nonzero cells can be accomplished within tens of hours. This result effectively assures practitioners of the scalability of NMF on Web-scale dyadic data. Categories and Subject Descr...
Chao Liu, Hung-chih Yang, Jinliang Fan, Li-Wei He,
Added 14 May 2010
Updated 14 May 2010
Type Conference
Year 2010
Where WWW
Authors Chao Liu, Hung-chih Yang, Jinliang Fan, Li-Wei He, Yi-Min Wang
Comments (0)