Systems designed for efficient retrieval of conventional data can be very inefficient at retrieving documents. Documents have more complex structure than conventional data, and th...
Search engine technology plays an important role in Web information retrieval. However, with the explosion of information on the Internet, traditional searching techniques cannot provide satisfa...
Baile Shi, Guoyu Hao, Hongtao Xu, Mei Wang, Qi Zha...
An Embedded Media Marker (EMM) is a transparent mark printed on a paper document that signifies the availability of additional media associated with that part of the document. Use...
Qiong Liu, Chunyuan Liao, Lynn Wilcox, Anthony Dun...
Nowadays, structured data such as sales and business forms are stored in data warehouses for decision makers to use. Further, unstructured data such as emails, HTML texts, images,...
We present a model for complex documents possibly consisting of a hierarchically structured set of images or texts. Documents are represented both at the form level (as s...
Carlo Meghini, Fabrizio Sebastiani, Umberto Stracc...