A two-stage approach for word-wise identification of English (Roman), Devnagari and Bengali (Bangla) scripts is proposed. This approach balances the tradeoff between recognition a...
Sukalpa Chanda, Srikanta Pal, Katrin Franke, Umapa...
An approach to simultaneous document classification and word clustering is developed using a two-way mixture model of Poisson distributions. Each document is represented by a vect...
This work presents the application of a first-order logic incremental learning system, INTHELEX, to learn rules for the automatic identification of a wide range of significant docu...
Teresa Maria Altomare Basile, Stefano Ferilli, Nic...
Abstract. This paper investigates a new extension of the Probabilistic Latent Semantic Analysis (PLSA) model [6] for text classification where the training set is partially labeled...
Many text mining applications, especially when investigating Text Classification (TC), require experiments to be performed using common textcollections, such that results can be co...
Yanbo J. Wang, Robert Sanderson, Frans Coenen, Pau...