Structured documents, especially XML documents, are made up of a few logical components, such as a title, sections, subsections and paragraphs. The components in each structured...
Abstract. We have defined an XML structural index called the Structure Index Tree (SIT), which eliminates duplicate structures arising from the equivalent subtrees in an XML docume...
Although indexes may overlap, the output of an automatic indexer is generally presented as a flat and unstructured list of terms. Our purpose is to exploit term overlap and embedd...
Document clustering techniques mostly rely on single term analysis of the document data set, such as the Vector Space Model. To better capture the structure of documents, the unde...
During the last decade, national archives, libraries, museums and companies have started to make their records, books and files electronically available. In order to allow efficient ac...
Andreas Stoffel, David Spretke, Henrik Kinnemann, ...