Text reuse occurs in many different types of documents and for many different reasons. One form of reuse, duplicate or near-duplicate documents, has been a focus of researchers be...
We present an approach for detecting link spam common in blog comments by comparing the language models used in the blog post, the comment, and pages linked by the comments. In co...
Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs rather than document collections, as self-contained sources o...
The paper describes the first version of the TextMOLE (Text Mining Operations Library and Environment) system for textual data mining. Currently TextMOLE acts as an advanced inde...
This paper describes our participation at the GeoCLEF 2005 task. We detail the main software components of our Geo-IR system, its adaptation for the participation at GeoCLEF and d...
Nuno Cardoso, Bruno Martins, Marcirio Silveira Cha...