Top Down Induction of Decision Trees (TDIDT) is the most commonly used method of constructing a model from a dataset in the form of classification rules to classify previously unse...
Confidence measures for the generalization error are crucial when small training samples are used to construct classifiers. A common approach is to estimate the generalization err...
ct Computer generated academic papers have been used to expose a lack of thorough human review at several computer science conferences. We assess the problem of classifying such do...
Classification is a key problem in machine learning/data mining. Algorithms for classification have the ability to predict the class of a new instance after having been trained on...
Jerffeson Teixeira de Souza, Stan Matwin, Nathalie...
This paper studies the effects of training data on binary text classification and postulates that negative training data is not needed and may even be harmful for the task. Tradit...