In prior work we have demonstrated that search engine caches and archiving projects like the Internet Archive’s Wayback Machine can be used to “lazily preserve” websites and...
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
Aggregated search is the task of incorporating results from different specialized search services, or verticals, into Web search results. While most prior work focuses on deciding...
Jaime Arguello, Fernando Diaz, Jamie Callan, Ben C...
In this paper, we propose a new system, called EPCI, for extracting potentially copyright-infringing texts from the Web. EPCI extracts them as follows: (1) generating a set...
Takashi Tashiro, Takanori Ueda, Taisuke Hori, Yu H...
The notion of searching a hypertext corpus has been around for some time, and is an especially important topic given the growth of the World Wide Web and the general dissatisfacti...