Automatic Hidden Web Database Classification

This paper presents a method for the automatic classification of Hidden Web databases. In our approach, the classification tree for Hidden Web databases is constructed by tailoring the well-accepted classification tree of the DMOZ Directory. The features of each class are then extracted from randomly selected Web documents in the corresponding category. For each Web database, query terms are selected from the class features based on their weights, and the database is then probed by analyzing the results of these class-specific queries. To improve performance further, we also use Web pages that link to the hidden Web database (HW-DB) as another important source for representing the database. Our final classification solution combines link-based evaluation with query-based probing. Experiments show that the combined method yields much better performance in classifying Hidden Web databases.
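The pipeline described in the abstract (weighted query terms per class, probing the database, and combining the result with evidence from linking pages) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the scoring formulas, so the weighted hit-count score, the term-overlap link score, the linear mix controlled by `alpha`, and all names (`select_query_terms`, `probe_scores`, `link_scores`, `classify`, `hit_count`) are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List

# class name -> {term -> weight}; assumed to come from sampled category documents
ClassFeatures = Dict[str, Dict[str, float]]


def select_query_terms(features: ClassFeatures, k: int) -> Dict[str, List[str]]:
    """Pick the k highest-weighted terms of each class as probe queries."""
    return {
        cls: sorted(weights, key=weights.get, reverse=True)[:k]
        for cls, weights in features.items()
    }


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores so they sum to 1 (all zeros if there is no evidence)."""
    total = sum(scores.values())
    if total == 0:
        return {cls: 0.0 for cls in scores}
    return {cls: s / total for cls, s in scores.items()}


def probe_scores(
    features: ClassFeatures,
    queries: Dict[str, List[str]],
    hit_count: Callable[[str], int],
) -> Dict[str, float]:
    """Assumed scoring: sum of term weight times the database's hit count."""
    raw = {
        cls: sum(features[cls][term] * hit_count(term) for term in terms)
        for cls, terms in queries.items()
    }
    return normalize(raw)


def link_scores(features: ClassFeatures, linking_pages: List[str]) -> Dict[str, float]:
    """Assumed scoring: total weight of class terms found in linking pages."""
    raw = {cls: 0.0 for cls in features}
    for page in linking_pages:
        tokens = page.lower().split()
        for cls, weights in features.items():
            raw[cls] += sum(weights.get(tok, 0.0) for tok in tokens)
    return normalize(raw)


def classify(
    features: ClassFeatures,
    hit_count: Callable[[str], int],
    linking_pages: List[str],
    k: int = 2,
    alpha: float = 0.5,  # hypothetical mixing weight, not taken from the paper
) -> str:
    """Return the class with the highest combined probe and link score."""
    queries = select_query_terms(features, k)
    probe = probe_scores(features, queries, hit_count)
    link = link_scores(features, linking_pages)
    combined = {c: alpha * probe[c] + (1 - alpha) * link[c] for c in features}
    return max(combined, key=combined.get)


# Toy data standing in for a real hidden Web database and its linking pages.
features = {
    "Health": {"disease": 0.9, "therapy": 0.7, "clinic": 0.4},
    "Sports": {"league": 0.8, "score": 0.6, "coach": 0.5},
}
hits = {"disease": 120, "therapy": 80, "league": 3, "score": 10}
pages = ["Search our clinic and therapy records", "disease database portal"]
label = classify(features, lambda term: hits.get(term, 0), pages)
```

In this toy setup both sources of evidence point to the same class, so the choice of `alpha` does not matter; with real data the mixing weight would have to be tuned, and `hit_count` would wrap an actual query against the database's search interface.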
Zhiguo Gong, Jingbai Zhang, Qian Liu
Added 09 Jun 2010
Updated 09 Jun 2010
Type Conference
Year 2007
Where PKDD