Metadata plays an important role in discovering, collecting, extracting and aggregating Web data. This paper proposes a method of constructing metadata for a specific topic. The m...
The decomposition of a document into segments such as text regions and graphics is a significant part of the document analysis process. The basic requirement for rating and impro...
An ad hoc data format is any non-standard, semi-structured data format for which robust data processing tools are not available. In this paper, we present ANNE, a new kind of mark...
This demonstration paper presents a probabilistic XML data merging tool, that represents the outcome of semi-structured document integration as a probabilistic tree. The system is...
The aggregated structure of documents plays a key role in full-text, multimedia, and network Information Retrieval (IR). Considering aggregation provides new querying facilities a...