Sciweavers

ICDAR
2003
IEEE
Fast Lexicon-Based Word Recognition in Noisy Index Card Images
This paper describes a complete system for reading typewritten lexicon words in noisy images - in this case museum index cards. The system is conceptually simple, and straightforw...
Simon M. Lucas, Gregory Patoulas, Andy C. Downton
ICDAR
2003
IEEE
A Model-based Line Detection Algorithm in Documents
In this paper we present a novel model-based approach to detect severely broken parallel lines in noisy textual documents. It is important to detect and remove these lines so the ...
Yefeng Zheng, Huiping Li, David S. Doermann
WWW
2009
ACM
Extracting article text from the web with maximum subsequence segmentation
Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...
Jeff Pasternack, Dan Roth
DOCENG
2009
ACM
Object-level document analysis of PDF files
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Tamir Hassan
AAAI
2008
A User-Oriented Webpage Ranking Algorithm Based on User Attention Time
We propose a new webpage ranking algorithm which is personalized. Our idea is to rely on the attention time spent on a document by the user as the essential clue for producing the...
Songhua Xu, Yi Zhu, Hao Jiang, Francis C. M. Lau