We propose an unsupervised method for detecting spam documents in Web page data, based on equivalence relations on strings. We propose three measures for quantifying the alienness (...
The classical (ad hoc) document retrieval problem has traditionally been approached by ranking according to heuristically developed functions (such as tf-idf or BM25) or gene...
We approached the problem as learning how to order documents by estimated relevance to a user query. Our support vector machine-based classifier learns from the rele...
Dmitri Roussinov, Weiguo Fan, Fernando A. Das Neve...
Search engines that support structured documents typically support structure created by the author (e.g., title, section), and may also support structure added by an annotation pr...
This paper reports on work done for the Genomics Track at TREC 2004 by ConverSpeech LLC in conjunction with scientists at the Saccharomyces Genome Database (SGD), the model organi...
Colleen E. Crangle, Alex Zbyslaw, J. Michael Cherr...