Information Extraction from Web Documents Based on Local Unranked Tree Automaton Inference

Information extraction (IE) aims at extracting specific information from a collection of documents. Much previous work on IE from semi-structured documents (in XML or HTML) uses learning techniques based on strings. Some recent work converts the document to a ranked tree and uses tree automaton induction. This paper introduces an algorithm that induces an automaton directly from unranked trees. Experiments show that this approach gives the best results obtained so far for learning-based IE from semi-structured documents.
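The abstract does not spell out the inference procedure. The following is a minimal Python sketch that gives a rough intuition for what a *local* tree language over unranked trees is. It is an illustrative assumption, not the authors' algorithm. From example trees, it learns the allowed root labels and the allowed "forks", where a fork is a node label paired with the exact sequence of its children's labels. It then accepts any tree built only from those forks. The names `Node`, `forks`, `infer_local_automaton`, and `accepts` are hypothetical. Because nodes in unranked trees can have any number of children, a practical learner would also need to generalize these child sequences, for example with a string automaton. This sketch omits that step and stores the sequences verbatim.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass(frozen=True)
class Node:
    """An unranked tree node: a label and any number of children."""
    label: str
    children: Tuple["Node", ...] = ()

# A fork is (node label, tuple of its children's labels).
Fork = Tuple[str, Tuple[str, ...]]

def forks(tree: Node) -> Set[Fork]:
    """Collect every fork occurring anywhere in the tree."""
    result: Set[Fork] = {(tree.label, tuple(c.label for c in tree.children))}
    for child in tree.children:
        result |= forks(child)
    return result

def infer_local_automaton(examples: List[Node]) -> Tuple[Set[str], Set[Fork]]:
    """Return the smallest local language (roots, forks) covering all examples."""
    roots = {t.label for t in examples}
    allowed: Set[Fork] = set()
    for t in examples:
        allowed |= forks(t)
    return roots, allowed

def accepts(model: Tuple[Set[str], Set[Fork]], tree: Node) -> bool:
    """Accept a tree iff its root label and every one of its forks were seen."""
    roots, allowed = model
    return tree.label in roots and forks(tree) <= allowed
```

Each fork is checked independently, so a model learned from a single page can accept unseen trees that recombine observed forks. For example, it can accept a `body` whose two `p` children both contain a `b`, even if only one of them did in the training example. This is the kind of generalization an automaton learner relies on. How the paper marks and locates the target field within such trees is not described in this listing.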

Raymond Kosala, Maurice Bruynooghe, Jan Van den Bussche, Hendrik Blockeel

IJCAI 2003 | IJCAI 2007 | Ranked Tree | Semi-structured Documents | Tree Automaton Induction

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	IJCAI
Authors	Raymond Kosala, Maurice Bruynooghe, Jan Van den Bussche, Hendrik Blockeel

Sciweavers
