During the last years, significant attention has been paid to the problem of building wrappers for extracting data from semistructured web sources. Nevertheless, since web sources...
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
Abstract— In this work, web-based metrics for semantic similarity computation between words or terms are presented and compared with the state-of-the-art. Starting from the funda...
Tables on web pages contain a huge amount of semantically explicit information, which makes them a worthwhile target for automatic information extraction and knowledge acquisition...
The Text Analysis Conference (TAC) is a series of Natural Language Processing evaluation workshops organized by the National Institute of Standards and Technology. The Knowledge B...
Heather Simpson, Stephanie Strassel, Robert Parker...