Learning to Harvest Information for the Semantic Web

Abstract. In this paper we describe a methodology for harvesting information from large distributed repositories (e.g., large Web sites) with minimal user intervention. The methodology combines information extraction, information integration and machine learning techniques. Learning is seeded by extracting information from structured sources (e.g., databases and digital libraries) or from a user-defined lexicon. The retrieved information is then used to partially annotate documents. The annotated documents bootstrap learning for simple Information Extraction (IE) methodologies, which in turn produce further annotations; these annotate more documents, which are used to train more complex IE engines, and so on. In this paper we describe the methodology and its implementation in the Armadillo system, compare it with the current state of the art, and describe the details of an implemented application. Finally we draw some conclusions and highlight some challenges and future...
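
The bootstrapping cycle described in the abstract (seed lexicon, partial annotation, learning a simple extractor, more annotation) can be illustrated with a toy sketch. This is not the Armadillo implementation: the function names (`annotate`, `learn_contexts`, `extract`, `harvest`), the sample documents, and the "word to the left of a known name" heuristic are all assumptions made for illustration, standing in for the IE engines the paper refers to.

```python
import re
from collections import Counter


def annotate(documents, lexicon):
    """Mark every occurrence of a known term: (doc index, start, end, term)."""
    spans = []
    for i, doc in enumerate(documents):
        for term in lexicon:
            for m in re.finditer(re.escape(term), doc):
                spans.append((i, m.start(), m.end(), term))
    return spans


def learn_contexts(documents, spans, min_count=2):
    """Toy 'learner': keep words that precede annotations at least min_count times."""
    counts = Counter()
    for i, start, _end, _term in spans:
        left = documents[i][:start].split()
        if left:
            counts[left[-1]] += 1
    return {word for word, n in counts.items() if n >= min_count}


def extract(documents, contexts):
    """Toy 'extractor': a capitalised two-word name right after a learned context word."""
    found = set()
    for doc in documents:
        for ctx in contexts:
            pattern = r"\b" + re.escape(ctx) + r"\s+([A-Z][a-z]+ [A-Z][a-z]+)"
            found.update(re.findall(pattern, doc))
    return found


def harvest(documents, seed, rounds=3):
    """Repeat annotate -> learn -> extract until no new terms appear."""
    lexicon = set(seed)
    for _ in range(rounds):
        spans = annotate(documents, lexicon)
        contexts = learn_contexts(documents, spans)
        new_terms = extract(documents, contexts) - lexicon
        if not new_terms:
            break
        lexicon |= new_terms
    return lexicon


# Hypothetical sample data: two seed names, one name to be discovered.
docs = [
    "Prof. Ada Lovelace teaches logic.",
    "Prof. Alan Turing teaches computing.",
    "Prof. Grace Hopper teaches compilers.",
]
seed = {"Ada Lovelace", "Alan Turing"}
print(sorted(harvest(docs, seed)))
```

In this sketch the two seed names are enough to learn that "Prof." tends to precede a person name, which then lets the extractor pick up the third name without it ever being in the lexicon. A real system would replace the one-word context heuristic with trained IE models and would integrate evidence from several sources before accepting a new term.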

Fabio Ciravegna, Sam Chapman, Alexiei Dingli, Yorick Wilks

ESWS 2004 | Information | Information Extraction | Methodology |

» Semantic Relation Analysis and Its Application in Cognitive Profiling

» Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

» From information to knowledge harvesting entities and relationships from web sources

» An Architecture for a Semantic Portal

» eScience and the Semantic Web A Symbiotic Relationship

» NAGA harvesting searching and ranking knowledge

» Harvesting relational tables from lists on the web

» Harvesting Relational and Structured Knowledge for Ontology Building in the WPro Architect...

Post Info

Added	01 Jul 2010
Updated	01 Jul 2010
Type	Conference
Year	2004
Where	ESWS
Authors	Fabio Ciravegna, Sam Chapman, Alexiei Dingli, Yorick Wilks
