This paper introduces a collaborative project, OSSmole, which collects, shares, and stores comparable data and analyses of free, libre and open source software (FLOSS) development f...
We have been working on two different KDD systems for scientific data. One system involves comparative genomics, where the database contains more than 60,000 plant gene and protei...
Sentiment classification aims to automatically predict sentiment polarity (e.g., positive or negative) of users publishing sentiment data (e.g., reviews, blogs). Although traditio...
We present a new approach to semi-supervised anomaly detection. Given a set of training examples believed to come from the same distribution or class, the task is to learn a model ...
This paper uncovers a new phenomenon in web search that we call domain bias: a user's propensity to believe that a page is more relevant just because it comes from a particul...
Samuel Ieong, Nina Mishra, Eldar Sadikov, Li Zhang