This paper describes a complete system for reading typewritten lexicon words in noisy images - in this case museum index cards. The system is conceptually simple, and straightforw...
In this paper we present a novel model based approach to detect severely broken parallel lines in noisy textual documents. It is important to detect and remove these lines so the ...
Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
We propose a new webpage ranking algorithm which is personalized. Our idea is to rely on the attention time spent on a document by the user as the essential clue for producing the...