The vocabulary of the TREC Legal OCR collection is noisy and huge. Standard techniques for improving retrieval performance such as content-based query expansion are ineffective fo...
Several information organization, access, and filtering systems can benefit from different kinds of document representations than those used in traditional Information Retrieval (I...
Our participation in TREC 2003 aims to adapt the use of the DFR (Divergence From Randomness) models with Query Expansion (QE) to the robust track and the topic distillation task o...
Giambattista Amati, Claudio Carpineto, Giovanni Ro...
Cross-language text classification (CLTC) aims to take advantage of existing training data from one language to construct a classifier for another language. In addition to the expe...
In this paper we show how approximate matrix factorisations can be used to organise document summaries returned by a search engine into meaningful thematic categories. We...
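The last snippet is cut off before it says which factorisation is used. As a hedged illustration only, the sketch below uses non-negative matrix factorisation (NMF) with the Lee and Seung multiplicative updates, one common approximate factorisation. It factors a term-by-summary count matrix V into W (term weights per theme) and H (theme weights per summary), then assigns each summary to its strongest theme. The choice of NMF, the toy summaries, and the helper names `term_document_matrix`, `nmf` and `cluster_summaries` are assumptions for illustration, not the paper's method.

```python
import random
from collections import Counter


def term_document_matrix(docs):
    """Build a term-by-document count matrix (rows: terms, columns: documents)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    V = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w, c in Counter(d.lower().split()).items():
            V[index[w]][j] = float(c)
    return vocab, V


def matmul(A, B):
    """Plain-Python matrix product of A (m x p) and B (p x n)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Approximate V ~ W H with W, H >= 0 via multiplicative updates (Frobenius loss)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    # Strictly positive random start; multiplicative updates keep entries non-negative.
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), using the freshly updated H
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H


def cluster_summaries(docs, k):
    """Hypothetical helper: group summaries into k themes and label each theme."""
    vocab, V = term_document_matrix(docs)
    W, H = nmf(V, k)
    # Each summary goes to the theme with the largest weight in its column of H.
    labels = [max(range(k), key=lambda t: H[t][j]) for j in range(len(docs))]
    # Each theme is described by its two highest-weighted terms in W.
    topics = [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -W[i][t])[:2]]
        for t in range(k)
    ]
    return labels, topics


if __name__ == "__main__":
    summaries = ["car engine speed", "car engine fuel", "cat jungle prey", "cat jungle habitat"]
    print(cluster_summaries(summaries, 2))
```

NMF is a natural fit for this kind of sketch because its non-negative factors read directly as "themes made of terms" and "summaries made of themes", which is what a labelled category display needs. Real search-result summaries would also need stop-word removal and term weighting, which are omitted here.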