Numerous approaches to detecting duplicate documents, including textual, structural and featural ones, have been investigated. Considering document images are usually stored and transm...
Text documents often embed data that is structured in nature. This structured data is increasingly exposed using information extraction systems, which generate structured relation...
Increasingly, companies recognize that most of their important information exists not in relational stores but in documents. For a long time, textual information has been rela...
In this paper, we present a document model which integrates the logical structure and hypertext link structure of hyperdocuments in order to manage structured documents with hyper...
Yong Kyu Lee, Seong-Joon Yoo, Kyoungro Yoon, P. Br...
Queries navigate semistructured data via path expressions, and can be accelerated using an index. Our solution encodes paths as strings, and inserts those strings into a special i...
Brian F. Cooper, Neal Sample, Michael J. Franklin,...
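
The idea in the abstract above, encoding paths as strings and looking them up in an index, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `/` and `=` separators, the `encode_path` helper, and the `PathIndex` class are hypothetical, and a plain sorted list stands in for whatever index structure the paper actually uses, which the truncated abstract does not show.

```python
from bisect import bisect_left, insort

# Hypothetical separators; assumed not to occur inside tag names.
TAG_SEP = "/"
VALUE_SEP = "="


def encode_path(tags, value):
    """Encode a root-to-leaf path plus its leaf value as one string key."""
    return TAG_SEP.join(tags) + VALUE_SEP + value


class PathIndex:
    """Sorted list of encoded paths (a stand-in for a real string index)."""

    def __init__(self):
        self._keys = []

    def insert(self, tags, value):
        # insort keeps the list sorted, so keys sharing a prefix stay contiguous.
        insort(self._keys, encode_path(tags, value))

    def prefix_search(self, prefix):
        """Return all keys that start with `prefix`, in sorted order."""
        matches = []
        # The first key >= prefix is the first possible match.
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            matches.append(self._keys[i])
            i += 1
        return matches


index = PathIndex()
index.insert(["invoice", "buyer", "name"], "ABC Corp")
index.insert(["invoice", "buyer", "address"], "1 Main St")
index.insert(["invoice", "seller", "name"], "XYZ Ltd")

# A path expression such as "everything under invoice/buyer" becomes a prefix lookup.
buyer_keys = index.prefix_search("invoice/buyer/")
```

Because a path query reduces to a string-prefix lookup, the index never needs to know the document schema in advance. A production structure would likely replace the sorted list with a trie or a B-tree to keep insertion cheap.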