Web spam research has been hampered by a lack of statistically significant collections. In this paper, we perform the first large-scale characterization of web spam using conten...
We introduce a novel infrastructure supporting automatic updates for dynamic content browsing on resource-constrained mobile devices. Currently, the client is forced to continuous...
In this paper we discuss several issues related to the influence of expansion of a Web document representation on the quality of topical categorization of Web pages. We consider a W...
We assess the current state of the art in speech summarization by comparing a typical summarizer on two different domains: lecture data and the SWITCHBOARD corpus. Our results ca...
A common practice in work groups is to share links to interesting web pages. Moreover, passages in these web pages are often cut-and-pasted, and used in various other contexts. In...