We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use ...
We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use...
A Focused crawler must use information gleaned from previously crawled page sequences to estimate the relevance of a newly seen URL. Therefore, good performance depends on powerfu...
Hongyu Liu, Evangelos E. Milios, Jeannette Janssen
The Web has the potential to become the world’s
largest knowledge base. In order to unleash this potential,
the wealth of information available on the Web needs to be
extracte...
Gjergji Kasneci, Fabian M. Suchanek, Georgiana Ifr...
Many real world datasets are represented in the form of graphs. The classical graph properties found in the data, like cliques or independent sets, can reveal new interesting info...