Generation of ground-truths is of great importance for unbiased performance evaluation of document layout analysis methods. This is especially necessary because many methods are c...
For character recognition in document analysis, some classes are closely overlapped but are not necessarily to be separated before contextual information is exploited. For classifi...
This paper proposes an algorithm called Imprecise Spectrum Analysis (ISA) to carry out fast dimension reduction for document classification. ISA is designed based on the one-sided...
Hu Guan, Bin Xiao, Jingyu Zhou, Minyi Guo, Tao Yan...
Recognizing polarity requires a list of polar words and phrases. For the purpose of building such lexicon automatically, a lot of studies have investigated (semi-) unsupervised me...
In this paper, we describe how meta-data of indexation can be extracted from historical document images using an interactive process with a software called AGORA. The algorithms i...