Sciweavers

WEBI
2009
Springer

Learning Deep Web Crawling with Diverse Features

The key to Deep Web crawling is to submit promising keywords to a query form and retrieve Deep Web content efficiently. To select keywords, existing methods rely on statistics derived from term frequency (TF) and document frequency (DF) in the locally acquired records. They therefore work well only on textual databases that provide full-text search interfaces, and poorly on structured databases with multi-attribute or field-restricted search interfaces. This paper proposes a novel Deep Web crawling method. Each keyword is encoded as a tuple of its linguistic, statistical, and HTML features, so that a harvest rate evaluation model can be learned from the issued keywords and applied to keywords not yet issued. The method removes the assumption of plain-text search made by existing methods. Experimental results show that the method outperforms state-of-the-art methods.
Keywords: Hidden Web; Deep Web surfacing; machine learning
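The idea of encoding keywords as feature tuples and learning a harvest rate model from issued keywords can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not list the individual features or name the learning algorithm, so the features below (word length, alphabetic flag, TF, DF, a link-tag flag) are hypothetical and a simple nearest-neighbour average stands in for whatever model the paper actually uses. The names `encode` and `predict_harvest_rate` are likewise invented for this example.

```python
import math


def encode(keyword, records, html_tags):
    """Encode a keyword as a feature tuple.

    The specific features are illustrative assumptions, not the paper's list.
    `records` is a list of locally acquired record strings; `html_tags` maps
    a keyword to the HTML tag it was seen in (hypothetical input).
    """
    tokenized = [record.lower().split() for record in records]
    tf = sum(tokens.count(keyword) for tokens in tokenized)  # term frequency
    df = sum(keyword in tokens for tokens in tokenized)      # document frequency
    return (
        float(len(keyword)),                    # linguistic: word length
        float(keyword.isalpha()),               # linguistic: purely alphabetic
        float(tf),                              # statistical: TF
        float(df),                              # statistical: DF
        float(html_tags.get(keyword) == "a"),   # HTML: seen inside a link
    )


def predict_harvest_rate(features, training, k=2):
    """Estimate a harvest rate for an un-issued keyword.

    `training` is a list of (feature_tuple, observed_harvest_rate) pairs
    collected from keywords that were already issued. The estimate is the
    mean rate of the k nearest issued keywords (a stand-in model).
    """
    nearest = sorted(training, key=lambda pair: math.dist(pair[0], features))[:k]
    return sum(rate for _, rate in nearest) / len(nearest)
```

In use, a crawler would build `training` from the keywords it has already submitted, score each candidate keyword with `predict_harvest_rate`, and issue the highest-scoring one next. Because the model sees only feature tuples rather than raw text statistics alone, the same loop applies whether the target interface is full-text or field-restricted, which is the property the abstract emphasizes.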
Lu Jiang, Zhaohui Wu, Qinghua Zheng, Jun Liu
Added 25 May 2010
Updated 25 May 2010
Type Conference
Year 2009
Where WEBI
Authors Lu Jiang, Zhaohui Wu, Qinghua Zheng, Jun Liu