Despite the increase in email and other forms of digital communication, the use of printed documents continues to increase every year. Many types of printed documents need to be &...
Aravind K. Mikkilineni, Gazi N. Ali, Pei-Ju Chiang...
Representing documents by vectors that are independent of language enhances machine translation and multilingual text categorization. We use discriminative training to create a pr...
Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through simulat...
Yongpeng Zhang, Frank Mueller, Xiaohui Cui, Thomas...
There is no blank to mark word boundaries in Chinese text. As a result, identifying words is difficult, because of segmentation ambiguities and occurrences of unknown words. Conve...
In this article, we propose a special type of decision tree, called a decision cascade, for binarizing document images. Such images are produced by cameras, resulting in varying de...