Sciweavers

MM
2006
ACM

Automatic document orientation detection and categorization through document vectorization

This paper presents an automatic orientation detection and categorization technique that detects the orientation of multilingual documents with arbitrary skew and categorizes document images according to their underlying languages. Both tasks are carried out through document vectorization, which encodes document orientation and language information by converting each document image into an electronic document vector based on the density and distribution of vertical component runs. For each language of interest, a pair of vector templates is first constructed through a training process. The orientation and category of a query image are then determined from the distances between the query document vector and the constructed vector templates. Experiments on 492 test document images show that the average orientation detection and categorization rates reach 97.56% and 99.59%, respectively. Categories and Subject Descript...
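The template-matching idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the vector layout, so the feature choice here (the share of vertical runs per horizontal band, plus runs per column) is an assumption, as are the names `vertical_runs`, `document_vector`, `build_templates` and `classify`, the `bands` parameter, the use of Euclidean distance, and the reading of the "pair of vector templates" as one template per orientation (upright and rotated by 180 degrees). The sketch also treats a whole binary image as a single block rather than working on skewed text lines.

```python
from math import dist


def vertical_runs(image):
    """Return (start_row, length) for every vertical run of ink pixels (1s)."""
    height = len(image)
    width = len(image[0]) if height else 0
    runs = []
    for x in range(width):
        y = 0
        while y < height:
            if image[y][x]:
                start = y
                while y < height and image[y][x]:
                    y += 1
                runs.append((start, y - start))
            else:
                y += 1
    return runs


def document_vector(image, bands=4):
    """Assumed feature layout: run share per horizontal band, then run density."""
    height = len(image)
    width = len(image[0]) if height else 0
    runs = vertical_runs(image)
    counts = [0] * bands
    for start, length in runs:
        midpoint = start + (length - 1) / 2
        band = min(int(midpoint * bands / height), bands - 1)
        counts[band] += 1
    total = len(runs) or 1
    distribution = [c / total for c in counts]
    density = len(runs) / width if width else 0.0  # runs per column
    return distribution + [density]


def build_templates(training_vectors):
    """Average the training vectors stored under each (language, orientation) key."""
    return {
        key: [sum(column) / len(vectors) for column in zip(*vectors)]
        for key, vectors in training_vectors.items()
    }


def classify(query_vector, templates):
    """Return the (language, orientation) key of the nearest template."""
    return min(templates, key=lambda key: dist(query_vector, templates[key]))
```

In use, each training image would be vectorized and stored under a key such as `("english", "upright")` or `("english", "rotated_180")`; a query image is vectorized the same way and assigned the key of the closest template, which gives both the language category and the orientation in one lookup.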
Shijian Lu, Chew Lim Tan
Added: 14 Jun 2010
Updated: 14 Jun 2010
Type: Conference
Year: 2006
Where: MM
Authors: Shijian Lu, Chew Lim Tan