Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs, rather than document collections, as self-contained sources o...
Existing techniques extract term candidates by looking for internal and contextual information associated with domain-specific terms. The algorithms always face the dilemma that f...
The IDEX system is a prototype of an interactive dynamic Information Extraction (IE) system. A user of the system expresses an information request for a topic description which is ...
XML is becoming a prevalent format for data exchange. Many XML documents have complex schemas that are not always known, and can vary widely between information sources and applica...
Eugene Agichtein, C. T. Howard Ho, Vanja Josifovsk...
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...