Sciweavers

368 search results - page 2 / 74
» Template-Based Information Mining from HTML Documents
Sort
View
ACL
2011
12 years 8 months ago
Template-Based Information Extraction without the Templates
Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an ...
Nathanael Chambers, Dan Jurafsky
ICADL
2007
Springer
112views Education» more  ICADL 2007»
13 years 11 months ago
Automated Template-Based Metadata Extraction Architecture
This paper describes our efforts to develop a toolset and process for automated metadata extraction from large, diverse, and evolving document collections. A number of federal agen...
Paul Flynn, Li Zhou, Kurt Maly, Steven J. Zeil, Mo...
SIGIR
2005
ACM
13 years 10 months ago
Title extraction from bodies of HTML documents and its application to web page retrieval
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Sh...
KDD
2002
ACM
148views Data Mining» more  KDD 2002»
14 years 5 months ago
Discovering informative content blocks from Web documents
In this paper, we propose a new approach to discover informative contents from a set of tabular documents (or Web pages) of a Web site. Our system, InfoDiscoverer, first partition...
Shian-Hua Lin, Jan-Ming Ho
WSDM
2012
ACM
252views Data Mining» more  WSDM 2012»
12 years 15 days ago
WebSets: extracting sets of entities from the web using unsupervised information extraction
We describe a open-domain information extraction method for extracting concept-instance pairs from an HTML corpus. Most earlier approaches to this problem rely on combining cluste...
Bhavana Bharat Dalvi, William W. Cohen, Jamie Call...