Clustering suffers from the curse of dimensionality, and similarity functions that use all input features with equal relevance may not be effective. We introduce an algorithm that...
Retweeting is an important action (behavior) on Twitter, indicating the behavior that users re-post microblogs of their friends. While much work has been conducted for mining text...
Zi Yang, Jingyi Guo, Keke Cai, Jie Tang, Juanzi Li...
In distributed software development, two sorts of dependencies can arise. The structure of the software system itself can create dependencies between software elements, while the ...
Cleidson R. B. de Souza, Jon Froehlich, Paul Douri...
If the dataset available to machine learning results from cluster sampling (e.g. patients from a sample of hospital wards), the usual cross-validation error rate estimate can lead...
We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) op...