A two-stage approach for word-wise identification of English (Roman), Devnagari and Bengali (Bangla) scripts is proposed. This approach balances the tradeoff between recognition a...
Sukalpa Chanda, Srikanta Pal, Katrin Franke, Umapa...
In this paper we present an algorithm for automatic extraction of textual elements, namely titles and full text, associated with news stories in news web pages. We propose a super...
We present a new machine learning framework called "self-taught learning" for using unlabeled data in supervised classification tasks. We do not assume that the unlabele...
Rajat Raina, Alexis Battle, Honglak Lee, Benjamin ...
Many important application areas of text classifiers demand high precision and it is common to compare prospective solutions to the performance of Naive Bayes. This baseline is us...