An approach to simultaneous document classification and word clustering is developed using a two-way mixture model of Poisson distributions. Each document is represented by a vect...
This research deals with the use of self-organising maps for the classification of text documents. The aim was to classify documents into separate classes according to their...
Kernel Canonical Correlation Analysis (KCCA) is a method of correlating linear relationships between two variables in a kernel-defined feature space. A machine learning algorithm b...
This paper describes an algorithm for determining the content type of a given zone within a document image. We take a statistics-based approach and represent each zone ...
Document image analysis is used to segment and classify regions of a document image into categories such as text, graphics, and background. In this paper we first review existing doc...