We present Opal, a lightweight framework for interactively locating missing web pages (HTTP status code 404). Opal is an example of “in vivo” preservation: harnessing the col...
Relevance feedback (RF) has been extensively studied in the content-based image retrieval community. However, no commercial Web image search engines support RF because of scalabil...
We consider the problem of partitioning, in a highly accurate and highly efficient way, a set of n documents lying in a metric space into k non-overlapping clusters. We augment th...
Filippo Geraci, Marco Pellegrini, Paolo Pisati, Fa...
We present an approach for detecting link spam common in blog comments by comparing the language models used in the blog post, the comment, and pages linked by the comments. In co...
Queries over XML documents challenge search engines to return the most relevant XML components that satisfy the query concepts. In previous work [6] we described an algorithm to ...