Bayesian text classifiers face a common issue which is referred to as data sparsity problem, especially when the size of training data is very small. The frequently used Laplacian...
Abstract. One of the de ning challenges for the KDD research community is to enable inductive learning algorithms to mine very large databases. This paper summarizes, categorizes, ...
We introduce a novel approach to incremental e-mail categorization based on identifying and exploiting "clumps" of messages that are classified similarly. Clumping reflec...
In data publishing, anonymization techniques such as generalization and bucketization have been designed to provide privacy protection. In the meanwhile, they reduce the utility o...
—Typical information extraction (IE) systems can be seen as tasks assigning labels to words in a natural language sequence. The performance is restricted by the availability of l...
Yanjun Qi, Pavel Kuksa, Ronan Collobert, Kunihiko ...