Global term weights in distributed environments

This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view of the collection is not available. In particular, two options are compared using standard IR test collections: sampling documents, or using a reference corpus independent of the retrieval collection. In addition, the possibility of pruning term lists based on frequency is evaluated. The results show that very good retrieval performance can be reached when just the most frequent terms of a collection
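As a rough illustration of the idea in the abstract, the sketch below estimates IDF from whatever documents happen to be available (a sample of the collection, or an independent reference corpus) and optionally prunes the term list to the most frequent terms. It is not the paper's method: the function names, the whitespace tokenization, the plain log(N / df) formula, and the `default` fallback for pruned or unseen terms are all assumptions made for this example.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, how many documents contain it (hypothetical helper)."""
    df = Counter()
    for doc in docs:
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        df.update(set(doc.lower().split()))
    return df


def estimate_idf(docs, keep_top=None):
    """Estimate IDF as log(N / df) from the given documents.

    `docs` may be a sample of the retrieval collection or a separate
    reference corpus. If `keep_top` is set, only the `keep_top` terms with
    the highest document frequency are kept (frequency-based pruning).
    """
    n = len(docs)
    df = document_frequencies(docs)
    # most_common(None) returns every term, ordered by document frequency.
    kept = df.most_common(keep_top)
    return {term: math.log(n / count) for term, count in kept}


def lookup_idf(idf, term, default):
    """Return the estimated IDF, or `default` for pruned or unseen terms.

    Assumption: the caller chooses the fallback, e.g. math.log(n) to treat
    a missing term as if it occurred in a single document.
    """
    return idf.get(term.lower(), default)
```

A peer could call `estimate_idf(sample_docs, keep_top=1000)` on a local sample and use `lookup_idf` at query time. How unseen terms should be weighted is a design choice this sketch leaves to the caller.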
Hans Friedrich Witschel
Added 12 Dec 2010
Updated 12 Dec 2010
Type Journal
Year 2008
Where IPM
Authors Hans Friedrich Witschel