Abstract. In this paper, INDIGO, an approach to infrastructures for digital libraries, is presented. It fulfills two crucial requirements of digital libraries: scalability and the ab...
This paper presents an efficient compression-oriented segmentation algorithm for computer-generated document images. In this algorithm, a document image is represented in a block-...
When scanning documents with a large number of pages such as books, it is often feasible to provide a minimal number of training samples to personalize the system to compensate fo...
Abstract. Most common feature selection techniques for document categorization are supervised and require lots of training data in order to accurately capture the descriptive and d...
A surrogate is an object that stands for a document and enables navigation to that document. Hypermedia is often represented with textual surrogates, even though studies have show...
Eunyee Koh, Daniel Caruso, Andruid Kerne, Ricardo ...