Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop Sourcerer, an infrastructure for the automated c...
Erik Linstead, Paul Rigor, Sushil Krishna Bajracha...
There is a huge wealth of sequence data available, for example, customer purchase histories, program execution traces, DNA, and protein sequences. Analyzing this wealth of data to ...
Abstract. Clustering high dimensional data with sparse features is challenging because pairwise distances between data items are not informative in high dimensional space. To addre...
Citizen scientists, who are volunteers from the community that participate as field assistants in scientific studies [3], enable research to be performed at much larger spatial and...
Multimedia Data Mining requires the ability to automatically analyze and understand the content. The Community of Multimedia Agents project (COMMA) is devoted to creating an open ...