Tasks recognizing named entities such as products, people names, or locations from documents have recently received significant attention in the literature. Many solutions to thes...
The dominant method for evaluating search engines is the Cranfield paradigm, but the existing metrics do not consider some modern search engines features, such as document snippets...
Crawling the web is deceptively simple: the basic algorithm is (a) Fetch a page (b) Parse it to extract all linked URLs (c) For all the URLs not seen before, repeat (a)?(c). Howev...
More and more structured information in the form of semantic data is nowadays available. It offers a wide range of new possibilities especially for semantic search and Web data in...
Large-scale Parallel Web Search Engines (WSEs) needs to adopt a strategy for partitioning the inverted index among a set of parallel server nodes. In this paper we are interested ...