ICML
1997
IEEE

A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization

The Rocchio relevance feedback algorithm is one of the most popular and widely applied learning methods from information retrieval. Here, a probabilistic analysis of this algorithm is presented in a text categorization framework. The analysis gives theoretical insight into the heuristics used in the Rocchio algorithm, particularly the word weighting scheme and the similarity metric. It also suggests improvements which lead to a probabilistic variant of the Rocchio classifier. The Rocchio classifier, its probabilistic variant, and a naive Bayes classifier are compared on six text categorization tasks. The results show that the probabilistic algorithms are preferable to the heuristic Rocchio classifier not only because they are better founded, but also because they achieve better performance.
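To make the heuristics named in the abstract concrete (TFIDF word weighting and a cosine similarity metric), here is a minimal Python sketch of a heuristic Rocchio text classifier. It is not the paper's probabilistic variant. Several details are assumptions made for illustration: idf is taken as log(N / df), each class prototype is alpha times the mean of its length-normalized positive examples minus beta times the mean of its negative examples, and the weights alpha = 16 and beta = 4 are illustrative defaults. All function names are hypothetical.

```python
import math
from collections import Counter

def normalize(vec):
    """Scale a sparse vector (dict word -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {w: x / norm for w, x in vec.items()} if norm > 0 else {}

def vectorize(doc, idf):
    """TFIDF vector for a token list; words unseen in training are dropped."""
    tf = Counter(doc)
    return normalize({w: c * idf[w] for w, c in tf.items() if w in idf})

def tfidf_vectors(docs):
    """Assumed weighting: tf(w, d) * log(N / df(w)), then length-normalized."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency counts each word once per doc
    idf = {w: math.log(n / d) for w, d in df.items()}
    return [vectorize(doc, idf) for doc in docs], idf

def rocchio_prototypes(vectors, labels, alpha=16.0, beta=4.0):
    """One prototype per class: alpha * mean(positives) - beta * mean(negatives)."""
    protos = {}
    for c in set(labels):
        pos = [v for v, y in zip(vectors, labels) if y == c]
        neg = [v for v, y in zip(vectors, labels) if y != c]
        proto = Counter()
        for v in pos:
            for w, x in v.items():
                proto[w] += alpha * x / len(pos)
        for v in neg:
            for w, x in v.items():
                proto[w] -= beta * x / len(neg)  # negative weights are kept
        protos[c] = dict(proto)
    return protos

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(x * b.get(w, 0.0) for w, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(doc, protos, idf):
    """Assign the class whose prototype is most cosine-similar to the document."""
    vec = vectorize(doc, idf)
    return max(protos, key=lambda c: cosine(vec, protos[c]))

if __name__ == "__main__":
    train = [["stock", "market", "shares"], ["market", "trade", "shares"],
             ["football", "goal", "match"], ["match", "team", "goal"]]
    labels = ["finance", "finance", "sports", "sports"]
    vectors, idf = tfidf_vectors(train)
    protos = rocchio_prototypes(vectors, labels)
    print(classify(["shares", "stock"], protos, idf))  # finance
```

On this toy data, a query containing "shares" and "stock" lands in the finance class. Those words carry positive weight in the finance prototype and negative weight in the sports prototype, because the finance documents serve as negative examples for sports.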
Thorsten Joachims
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 1997
Where ICML
Authors Thorsten Joachims