Real world data mining applications must address the issue of learning from imbalanced data sets. The problem occurs when the number of instances in one class greatly outnumbers t...
Computing the pairwise semantic similarity between all words on the Web is a computationally challenging task. Parallelization and optimizations are necessary. We propose a highly...
Patrick Pantel, Eric Crestan, Arkady Borkovsky, An...
Abstract--Imbalanced data sets present a particular challenge to the data mining community. Often, it is the rare event that is of interest and the cost of misclassifying the rare ...
Discriminative learning methods are widely used in natural language processing. These methods work best when their training and test data are drawn from the same distribution. For...
We present a description of three different algorithms that use background knowledge to improve text classifiers. One uses the background knowledge as an index into the set of tra...