We present a new algorithm for finding large, dense subgraphs in massive graphs. Our algorithm is based on a recursive application of fingerprinting via shingles, and is extreme...
Today, Web pages are usually accessed using text search engines, whereas documents stored in the deep Web are accessed through domain-specific Web portals. These portals rely on e...
Contextual advertising (also called content match) refers to the placement of small textual ads within the content of a generic web page. It has become a significant source of rev...
Xuerui Wang, Andrei Z. Broder, Marcus Fontoura, Va...
Entity information management (EIM) is a nascent IR research area that investigates the management of information about entities rather than documents. It is motivated by the ...
We assess the current state of the art in speech summarization by comparing a typical summarizer on two different domains: lecture data and the SWITCHBOARD corpus. Our results ca...