This paper presents a system that uses the domain name of a German business website to locate its information pages (e.g. company profile, contact page, imprint) and then identifi...
Recently, web mining that tries to find useful knowledge from the vast amount of web pages has attracted a lot of research interests. Besides, it is becoming an essential task to...
We propose a novel extraction approach that exploits content redundancy on the web to extract structured data from template-based web sites. We start by populating a seed database...
Pankaj Gulhane, Rajeev Rastogi, Srinivasan H. Seng...
This paper introduces the Book Structure Extraction competition run at ICDAR 2009. The goal of the competition is to evaluate and compare automatic techniques for deriving structu...
Antoine Doucet, Gabriella Kazai, Bodin Dresevic, A...
Regularity extraction is an important step in the design ow of datapath-dominated circuits. This paper outlines a new method that automatically extracts regular structures from th...