The vast majority of earlier work has focused on graphs that are both connected (typically by ignoring all but the giant connected component) and unweighted. Here we study numer...
The success of popular algorithms such as k-means clustering or nearest neighbor searches depends on the assumption that the underlying distance functions reflect domain-specific n...
Mining user preferences plays a critical role in many important applications such as customer relationship management (CRM), product and service recommendation, and marketing camp...
Bin Jiang, Jian Pei, Xuemin Lin, David W. Cheung, ...
In large scale online systems like Search, eCommerce, or social network applications, user queries represent an important dimension of activities that can be used to study the imp...
We propose a family of novel cost-sensitive boosting methods for multi-class classification by applying the theory of gradient boosting to p-norm based cost functionals. We establ...
Naive Bayes and logistic regression perform well in different regimes. While the former is a very simple generative model which is efficient to train and performs well empirically...
Many time series prediction methods have focused on single step or short term prediction problems due to the inherent difficulty in controlling the propagation of errors from one ...
Traditional association mining algorithms use a strict definition of support that requires every item in a frequent itemset to occur in each supporting transaction. In real-life d...
Rohit Gupta, Gang Fang, Blayne Field, Michael Stei...
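The strict definition of support mentioned in the association mining abstract above can be illustrated with a small sketch. It counts only the transactions that contain every item of a candidate itemset. The function name `strict_support` and the toy transactions are hypothetical, chosen purely for illustration, and this is not the algorithm the paper proposes.

```python
from typing import Iterable, List, Set


def strict_support(itemset: Iterable[str], transactions: List[Set[str]]) -> int:
    """Count transactions that contain every item of `itemset` (strict support)."""
    items = frozenset(itemset)
    # A transaction supports the itemset only if the itemset is a subset of it.
    return sum(1 for transaction in transactions if items <= transaction)


# Hypothetical toy data, not taken from the paper.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
]

print(strict_support({"a", "b"}, transactions))
print(strict_support({"a", "b", "c"}, transactions))
```

Under this definition a transaction missing even one item of the itemset contributes nothing to its support.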
This paper presents a new algorithm for sequence prediction over long categorical event streams. The input to the algorithm is a set of target event types whose occurrences we wis...