An ever-increasing amount of information on the Web today is available only through search interfaces: the users have to type in a set of keywords in a search form in order to acc...
Many modern natural language-processing applications utilize search engines to locate large numbers of Web documents or to compute statistics over the Web corpus. Yet Web search e...
Two trends are converging to make the CPU cost of a table scan a more important component of database performance. First, table scans are becoming a larger fraction of the query p...
Allison L. Holloway, Vijayshankar Raman, Garret Sw...
Web masters usually place certain web pages such as home pages and index pages in front of others. Under such a design, it is necessary to go through some pages to reach the desti...
EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawl...