We propose a novel extraction approach that exploits content redundancy on the web to extract structured data from template-based web sites. We start by populating a seed database...
Pankaj Gulhane, Rajeev Rastogi, Srinivasan H. Seng...
We present an experimental framework for Entity Mention Detection in which two different classifiers are combined to exploit Data Redundancy attained through the annotation of a l...
A large number of web sites publish pages containing structured information about recognizable concepts, but these data are only partially used by current applications. Although s...
Paolo Papotti, Valter Crescenzi, Paolo Merialdo, M...
Abstract. In this document we describe our approach to a specific subtask of ontology population, the extraction of instances of relations. We present a generic approach with which...
Viktor de Boer, Maarten van Someren, Bob J. Wielin...
These days, billions of Web pages are created with HTML or other markup languages. They only have a few uniform structures and contain various authoring styles compared to traditi...