On the Complexity of Rocchio's Similarity-Based Relevance Feedback Algorithm

In this paper, we prove for the first time that the learning complexity of Rocchio's algorithm is $O(d + d^2(\log d + \log n))$ over the discretized vector space $\{0, \ldots, n-1\}^d$ when the inner product similarity measure is used. The upper bound on the learning complexity for searching for documents represented by a monotone linear classifier $(q, 0)$ over $\{0, \ldots, n-1\}^d$ can be improved to $O(d + 2k(n-1)(\log d + \log(n-1)))$, where $k$ is the number of nonzero components in $q$. An $\Omega(\binom{d}{2} \log n)$ lower bound on the learning complexity is also obtained for Rocchio's algorithm over $\{0, \ldots, n-1\}^d$. In practice, Rocchio's algorithm often uses fixed query updating factors. When this is the case, the lower bound is strengthened to $2^{\Omega(d)}$ over the binary vector space $\{0, 1\}^d$. In general, if the query updating factors are bounded by $O(n^c)$ for some constant $c \ge 0$, an $\Omega(n^{d-1-c}/(n-1))$ lower bound is obtained over $\{0, \ldots, n-1\}^d$.
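To make the setting concrete, here is a minimal Python sketch of a similarity-based relevance feedback loop with the inner product as the similarity measure. The formulation is an assumption made for illustration and is not taken from the abstract: the query starts at the zero vector, a document x is judged relevant when q·x > 0, the user returns one misclassified document per round, and the query is moved by a fixed updating factor `alpha`. The names `rocchio_feedback`, `is_relevant`, `alpha`, and `max_rounds` are hypothetical. The returned count of updates is what "learning complexity" informally measures in this sketch.

```python
from itertools import product


def inner(u, v):
    """Inner product similarity between two integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def rocchio_feedback(docs, is_relevant, alpha=1, max_rounds=10_000):
    """Run a simplified relevance feedback loop (illustrative assumption).

    docs        : list of document vectors over {0, ..., n-1}^d
    is_relevant : callable giving the user's relevance judgment for a document
    alpha       : fixed query updating factor (hypothetical parameter)
    Returns the final query vector and the number of query updates made.
    """
    d = len(docs[0])
    q = [0] * d          # initial query: the zero vector
    updates = 0
    for _ in range(max_rounds):
        # Find a document whose predicted label (q.x > 0) disagrees with the user.
        counterexample = next(
            (x for x in docs if (inner(q, x) > 0) != is_relevant(x)), None
        )
        if counterexample is None:
            return q, updates  # the query now classifies every document correctly
        # Move the query toward a relevant document, away from an irrelevant one.
        sign = 1 if is_relevant(counterexample) else -1
        q = [qi + sign * alpha * xi for qi, xi in zip(q, counterexample)]
        updates += 1
    raise RuntimeError("no consistent query found within max_rounds")


# Usage: documents over {0, 1, 2}^3, target given by the monotone
# linear classifier (q*, 0) with q* = (1, 0, 2), so k = 2 nonzero components.
n, d = 3, 3
docs = list(product(range(n), repeat=d))
target = (1, 0, 2)
query, updates = rocchio_feedback(docs, lambda x: inner(target, x) > 0)
print(query, updates)
```

The learned query need not equal the target vector; it only has to agree with it on every document in the space, which is why the loop stops as soon as no counterexample remains.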

Zhixiang Chen, Bin Fu
