A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is r...
This paper presents a system that uses the domain name of a German business website to locate its information pages (e.g. company profile, contact page, imprint) and then identifi...
With large databases of document images available, a method for users to find keywords in documents will be useful. One approach is to perform Optical Character Recognition (OCR) ...
Movies segmentation into semantically correlated units is a quite tedious task due to ”semantic gap”. Low-level features do not provide useful information about the semantical...
Abstract. Social bookmarking has become an important web2.0 application recently, which is concerned with the dual user behavior to search - tagging. Although social bookmarking we...