Collections are a fundamental tool for reproducible evaluation of information retrieval techniques. We describe a new method for distributing the document lengths and term counts ...
The high quality, structured data from Web structured sources is invaluable for many applications. Hidden Web databases are not directly crawlable by Web search engines and are on...
A Focused crawler must use information gleaned from previously crawled page sequences to estimate the relevance of a newly seen URL. Therefore, good performance depends on powerfu...
Hongyu Liu, Evangelos E. Milios, Jeannette Janssen
While the Web has been increasingly recognized as a culturally valuable social artifact, many nations endeavor to create national Web archives for long term preservation. However, ...
The study of the Web graph not only yields valuable insight into Web algorithms for crawling, searching and community discovery, and the sociological phenomena that characterize it...