Abstract. The number of features to be considered in a text classification system is given by the size of the vocabulary and this is normally in the range of the tens or hundreds o...
David Vilar, Hermann Ney, Alfons Juan, Enrique Vid...
Classification of texts potentially containing a complex and specific terminology requires the use of learning methods that do not rely on extensive feature engineering. In this w...
Bayesian text classifiers face a common issue which is referred to as data sparsity problem, especially when the size of training data is very small. The frequently used Laplacian...
—We carried out a series of experiments on text classification using multi-word features. An automated method was proposed to extract the multi-words from text data set and two d...
The majority of text retrieval and mining techniques are still based on exact feature (e.g. words) matching and unable to incorporate text semantics. Many researchers believe that...