Machine-generated documents containing semi-structured text are rapidly forming the bulk of data being stored in an organisation. Given a feature-based representation of such data,...
In this paper, we describe research into the use of ontologies to integrate access to cultural heritage and photographic archives. The use of the CIDOC CRM and CRM Core ontologies...
Patrick A. S. Sinclair, Paul H. Lewis, Kirk Martin...
The Web consists of a large amount of unstructured information that hardly can be elaborated by automatic agents. In recent years, a considerable number of techniques for informat...
Leonardo Rigutini, Ernesto Di Iorio, Marco Ernande...
The discovery and extraction of general lists on the Web continues to be an important problem facing the Web mining community. There have been numerous studies that claim to autom...
Tim Weninger, Fabio Fumarola, Rick Barber, Jiawei ...
In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to...