Combining classifiers to identify online databases

12 years 10 months ago
Combining classifiers to identify online databases
We address the problem of identifying the domain of online databases. More precisely, given a set F of Web forms automatically gathered by a focused crawler and an online database domain D, our goal is to select from F only the forms that are entry points to databases in D. Having a set of Web forms that serve as entry points to similar online databases is a requirement for many applications and techniques that aim to extract and integrate hidden-Web information, such as meta-searchers, online database directories, hidden-Web crawlers, and form-schema matching and merging. We propose a new strategy that automatically and accurately classifies online databases based on features that can be easily extracted from Web forms. By judiciously partitioning the space of form features, this strategy allows the use of simpler classifiers that can be constructed using learning techniques that are better suited for the features of each partition. Experiments using real Web data in a representative...
Luciano Barbosa, Juliana Freire
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2007
Where WWW
Authors Luciano Barbosa, Juliana Freire
Comments (0)