The difficulty with information retrieval for OCR documents lies in the fact that OCR documents contain a significant number of erroneous words, and unfortunately most informat...
When one scans a document page from a thick bound volume, the curvature of the page to be scanned results in two kinds of distortion in the scanned document images: i) shade along...
We argue that expert finding is sensitive to multiple document features in an organization and therefore can benefit from incorporating these features. We propos...
In this paper, we argue that agglomerative clustering with the vector cosine similarity measure performs poorly for two reasons. First, the nearest neighbors of a document belo...
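To make the criticized baseline concrete, here is a minimal sketch of agglomerative clustering with cosine similarity. It illustrates only the baseline the abstract argues against, not the paper's proposed method, which the excerpt does not show. Several choices are assumptions: whitespace tokenization, raw term-frequency vectors (no tf-idf or stemming), group-average linkage, and the hypothetical function names `cosine` and `agglomerative_cosine`.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def agglomerative_cosine(docs: list[str], k: int) -> list[list[int]]:
    """Hypothetical group-average agglomerative clustering down to k clusters.

    Assumption: each document is a bag of lowercase whitespace tokens with
    raw term counts (no stop-word removal, stemming, or tf-idf weighting).
    Returns clusters as lists of document indices.
    """
    vectors = [Counter(d.lower().split()) for d in docs]
    # Precompute all pairwise document similarities once.
    sim = [[cosine(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > k:
        best, best_pair = -1.0, None
        for (i, a), (j, b) in combinations(enumerate(clusters), 2):
            # Group average: mean similarity over all cross-cluster pairs.
            avg = sum(sim[x][y] for x in a for y in b) / (len(a) * len(b))
            if avg > best:
                best, best_pair = avg, (i, j)
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

For example, `agglomerative_cosine(["ocr error correction", "ocr error rates", "expert finding in enterprises", "finding experts in enterprises"], 2)` groups the two OCR snippets together and the two expert-finding snippets together. Each merge is chosen greedily from document-to-document cosine scores alone, so the result depends entirely on how well those pairwise similarities reflect topical relatedness. This naive loop is roughly cubic in the number of documents and is meant only for illustration.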
The choices made by a user in processing a set of documents are related, in a broad sense, to the sum of influences coming from the documents in the user's situation, which does not...