This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
Context influences the search process, but to date research has not definitively identified which aspects of context are the most influential for information retrieval, and thus a...
Luanne Freund, Elaine G. Toms, Charles L. A. Clark...
Non-negative Matrix Factorization (NMF, [5]) and Probabilistic Latent Semantic Analysis (PLSA, [4]) have been successfully applied to a number of text analysis tasks such as docum...
Knowledge-mapping tools enable users to quickly identify relevant information and expertise. This paper discusses a number of natural-language phenomena that limit the performance...
Anjo Anjewierden, Willem-Olaf Huijsen, Marjan Groo...
Term-based representations of documents have found widespread use in information retrieval. However, one of the main shortcomings of such methods is that they largely disregard le...