MapReduce provides a parallel and scalable programming model for data-intensive business and scientific applications. MapReduce and its de facto open source project, called Hadoop...
Collection selection has been a research issue for years. Typically, in related work, precomputed statistics are employed in order to estimate the expected result quality of each ...
Matthias Bender, Sebastian Michel, Peter Triantafi...
Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop Sourcerer, an infrastructure for the automated c...
Erik Linstead, Paul Rigor, Sushil Krishna Bajracha...
We consider a network of autonomous peers forming a logically global but physically distributed search engine, where every peer has its own local collection generated by independe...
Josiane Xavier Parreira, Sebastian Michel, Gerhard...
One of the challenges facing the on-line gaming community is the delivery of new content to players. While the initial distribution of a game is typically done via large media for...