Database » ICDE 2012
11 years 9 months ago
Horizontal Reduction: Instance-Level Dimensionality Reduction for Similarity Search in Large Document Databases
Dimensionality reduction is essential in text mining since the dimensionality of text documents could easily reach several tens of thousands. Most recent efforts on dimensionali...
Min-Soo Kim, Kyu-Young Whang, Yang-Sae Moon
13 years 2 months ago
Identification of scripts and orientations of degraded document images
This paper presents a pair of identification techniques that automatically detect scripts and orientations of document images suffering from various types of document degradation. ...
Shijian Lu, Linlin Li, Chew Lim Tan
PAMI 2002
13 years 7 months ago
Imaged Document Text Retrieval Without OCR
We propose a method for text retrieval from document images without the use of OCR. Documents are segmented into character objects. Image features, namely the Vertical Traverse D...
Chew Lim Tan, Weihua Huang, Zhaohui Yu, Yi Xu
Database » DATESO 2004
13 years 8 months ago
LSI vs. Wordnet Ontology in Dimension Reduction for Information Retrieval
Abstract. In the area of information retrieval, the dimension of document vectors plays an important role. Firstly, with higher dimensions index structures suffer the "curse o...
Pavel Moravec, Michal Kolovrat, Václav Sn...
13 years 11 months ago
Text Retrieval from Document Images based on N-Gram Algorithm
In this paper, we propose a method of text retrieval from document images using a similarity measure based on an N-Gram algorithm. We directly extract image features instead of us...
Chew Lim Tan, Sam Yuan Sung, Zhaohui Yu, Yi Xu
14 years 9 months ago
Similarity measure for CCITT Group 4 compressed document images
The similarity measure of document images plays a crucial role in the area of document image retrieval. A method of measuring the similarity of CCITT Group 4 compressed document images...
Yue Lu, Chew Lim Tan, Liying Fan, Weihua Huang