Sciweavers

IJCAI
2003

Information Extraction from Tree Documents by Learning Subtree Delimiters

Download www.isi.edu

Information extraction from HTML pages has conventionally treated them as plain text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHTML formats make it possible to treat Web pages as trees, to mine the rich structural context in those trees, and to learn accurate extraction rules. In this paper, we generalize the notion of delimiter developed for information extraction from strings to tree documents. Similar to delimiters in strings, we define delimiters in tree documents as subtrees surrounding the text leaves. We formalize wrapper induction for tree documents as learning classification rules based on subtree delimiters. We analyze a restricted case of subtree delimiters in the form of simple paths. We design an efficient data structure for storing candidate delimiters and an incremental algorithm for finding the most discriminative subtree delimiters for the wrapper.
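The restricted case described in the abstract, delimiters in the form of simple paths, can be illustrated with a small Python sketch. It is a deliberate simplification under stated assumptions, not the paper's data structure or incremental algorithm: each candidate delimiter here is a suffix of the tag path from the document root down to a text leaf, candidates are tallied in a plain dictionary, and the "most discriminative" delimiter is assumed to be the shortest suffix that matches every target leaf and no other leaf. The function names (`text_leaves`, `count_candidates`, `best_delimiter`), the sample page, and the convention of labeling target leaves by their (assumed unique) text are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def text_leaves(root):
    """Yield (path, text) for every element with non-empty direct text.

    path is the tuple of tag names from the root down to that element.
    Tail text is ignored in this simplified sketch.
    """
    def walk(elem, prefix):
        path = prefix + (elem.tag,)
        if elem.text and elem.text.strip():
            yield path, elem.text.strip()
        for child in elem:
            yield from walk(child, path)
    yield from walk(root, ())


def count_candidates(leaves, targets):
    """Tally each path suffix (a simple-path delimiter candidate).

    Returns suffix -> [number of target leaves, number of other leaves].
    Assumption: a leaf is a target iff its text is in `targets`.
    """
    counts = defaultdict(lambda: [0, 0])
    for path, text in leaves:
        idx = 0 if text in targets else 1
        for k in range(1, len(path) + 1):
            counts[path[-k:]][idx] += 1
    return counts


def best_delimiter(counts, n_pos):
    """Shortest suffix covering all n_pos targets and no other leaf, or None."""
    perfect = [s for s, (p, n) in counts.items() if p == n_pos and n == 0]
    return min(perfect, key=len) if perfect else None


# Hypothetical sample page: two product values plus distracting text leaves.
doc = """<html><body>
<h1>Products</h1>
<table>
  <tr><td><b>Name</b></td><td>Widget</td></tr>
  <tr><td><b>Price</b></td><td>9.99</td></tr>
</table>
<div><table><tr><td>Contact us</td></tr></table></div>
</body></html>"""

root = ET.fromstring(doc)
leaves = list(text_leaves(root))
targets = {"Widget", "9.99"}
counts = count_candidates(leaves, targets)
n_pos = sum(1 for _, text in leaves if text in targets)
print(best_delimiter(counts, n_pos))  # ('body', 'table', 'tr', 'td')
```

On this sample, the suffixes `td`, `tr/td` and `table/tr/td` also match the footer cell, so the sketch has to extend the path to `body/table/tr/td` before it separates the two product values from every other leaf. A practical wrapper learner would also need to cope with noisy labels and rank several partially discriminative candidates, which this sketch does not attempt.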

Boris Chidlovskii

Delimiters | IJCAI 2003 | IJCAI 2007 | Information Extraction | Tree Documents

Related Content

» Extracting Partial Structures from HTML Documents

» Information Extraction from Web Documents Based on Local Unranked Tree Automaton Inference

» Information extraction from structured documents using k-testable tree automaton inference

» Learning Logic Wrappers for Information Extraction from the Web

» Rule Learning for Feature Values Extraction from HTML Product Information Sheets

» Thresher: automating the unwrapping of semantic content from the World Wide Web

» IEPAD: information extraction based on pattern discovery

» Semantic Tree Kernels to Classify Predicate Argument Structures

» Integrating web directories by learning their structures

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	IJCAI
Authors	Boris Chidlovskii
