
ICML
2007
IEEE

Focused crawling with scalable ordinal regression solvers

In this paper we propose a novel, scalable, clustering-based Ordinal Regression formulation, which is an instance of a Second Order Cone Program (SOCP) with one Second Order Cone (SOC) constraint. The main contribution of the paper is a fast algorithm, CB-OR, which solves the proposed formulation more efficiently than general-purpose solvers. A second contribution is to pose focused crawling as a large-scale Ordinal Regression problem and solve it using the proposed CB-OR. Focused crawling is an efficient mechanism for discovering resources of interest on the web. Posing focused crawling as an Ordinal Regression problem avoids the need for a negative class and a topic hierarchy, which are the main drawbacks of existing focused crawling methods. Experiments on large synthetic and benchmark datasets show the scalability of CB-OR. Experiments also show that the proposed focused crawler outperforms the state of the art.
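To make the ordinal-regression view of crawling concrete, here is a minimal sketch of how an ordinal model could prioritize a crawl frontier. It is not the CB-OR algorithm and does not reproduce the SOCP formulation from the paper: it assumes a generic linear threshold model in which a page's score is compared against ordered thresholds to obtain a relevance rank. The weight vector, thresholds, feature vectors, and URL names are all hypothetical values chosen for illustration.

```python
import heapq
from bisect import bisect_right


def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def ordinal_rank(score, thresholds):
    """Map a real-valued score to a rank in 1..len(thresholds)+1.

    thresholds must be sorted in increasing order; a score that
    exceeds k of them receives rank k + 1.
    """
    return bisect_right(thresholds, score) + 1


# Hypothetical learned parameters (assumed for illustration only).
w = [0.8, -0.2, 0.5]
thresholds = [0.0, 1.0]  # two thresholds give three ordered ranks

# Hypothetical feature vectors for pages waiting in the frontier.
pages = {
    "url_a": [2.0, 0.0, 1.0],
    "url_b": [0.5, 1.0, 0.2],
    "url_c": [0.0, 2.0, 0.0],
}

# Push pages into a priority queue so that higher ranks are crawled first.
# heapq is a min-heap, so the rank is negated.
frontier = []
for url, x in pages.items():
    rank = ordinal_rank(dot(w, x), thresholds)
    heapq.heappush(frontier, (-rank, url))

crawl_order = [heapq.heappop(frontier)[1] for _ in range(len(frontier))]
print(crawl_order)
```

Because every page receives one of several ordered ranks rather than a binary relevant/irrelevant label, a scheme of this kind needs no explicit negative class, which is the property the abstract highlights.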
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 2007
Where ICML
Authors Rashmin Babaria, J. Saketha Nath, S. Krishnan, K. R. Sivaramakrishnan, Chiranjib Bhattacharyya, M. Narasimha Murty