Background: Document classification is a wide-spread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, ...
The analysis of the leading social video sharing platform YouTube reveals a high amount of redundancy, in the form of videos with overlapping or duplicated content. In this paper,...
This paper introduces a content-based information retrieval method inspired by the ideas of spreading activation models. In response to a given query, the proposed approach comput...
Language modeling is an effective and theoretically attractive probabilistic framework for text information retrieval. The basic idea of this approach is to estimate a language mo...
This paper details the participation of the XLDB group from the University of Lisbon at the GeoCLEF task of CLEF 2006. We tested text mining methods that make use of an ontology t...
Bruno Martins, Nuno Cardoso, Marcirio Silveira Cha...