Using Latent-Structure to Detect Objects on the Web

An important requirement for emerging applications that aim to locate and integrate content distributed over the Web is to identify pages that are relevant for a given domain or task. In this paper, we address the problem of identifying pages that contain objects with a latent structure, i.e., the structure is implicitly represented in the page. We propose an algorithm that, given a set of instances of an object type, derives rules by automatically extracting statistically significant patterns present inside the objects. These rules can then be used to detect the presence of these objects in new, unseen pages. Our approach has several advantages over learning-based text classifiers. Because it relies only on positive examples, constructing accurate object detectors is simpler than constructing learning-based classifiers, which require both positive and negative examples. Also, besides providing a classification decision for the presence of an object, the derived detec...
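To make the positive-examples-only idea concrete, here is a minimal sketch, not the authors' algorithm. It assumes that a "pattern" is simply a lowercase word token and that "statistically significant" can be approximated by a document-frequency support threshold across the positive instances. The function names `derive_patterns` and `detect` and the parameters `min_support` and `min_fraction` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric word tokens; a deliberately simple tokenizer (assumption).
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def derive_patterns(positive_examples, min_support=0.8):
    """Return tokens present in at least `min_support` of the positive examples.

    Hypothetical stand-in: document-frequency support replaces the paper's
    notion of "statistically significant patterns". No negative examples are used.
    """
    counts = Counter()
    for example in positive_examples:
        counts.update(tokenize(example))  # each token counted once per example
    n = len(positive_examples)
    return {tok for tok, c in counts.items() if c / n >= min_support}


def detect(page, patterns, min_fraction=0.5):
    """Flag `page` if it contains at least `min_fraction` of the derived patterns.

    Returns (decision, matched_patterns) so the caller can see which rules fired.
    """
    if not patterns:
        return False, set()
    matched = patterns & tokenize(page)
    return len(matched) / len(patterns) >= min_fraction, matched


if __name__ == "__main__":
    # Hypothetical positive instances of a "book" object type.
    examples = [
        "Title: Dune Author: Frank Herbert Price: $9.99",
        "Title: Emma Author: Jane Austen Price: $4.50",
        "Title: Ulysses Author: James Joyce Price: $12.00",
    ]
    patterns = derive_patterns(examples)
    print(sorted(patterns))  # ['author', 'price', 'title']
    print(detect("Title: Walden Author: Thoreau Price: $7", patterns)[0])  # True
    print(detect("Weather today: sunny and warm", patterns)[0])  # False
```

With three examples and `min_support=0.8`, only tokens shared by all three instances survive, so the detector keys on the recurring field labels rather than on instance-specific values. Returning the matched patterns alongside the decision is a design choice of this sketch; a real system would likely use richer patterns (for example, tag paths or token sequences) and a proper significance test.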
Luciano Barbosa, Juliana Freire
Added 11 Jul 2010
Updated 11 Jul 2010
Type Conference
Year 2010