Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop Sourcerer, an infrastructure for the automated c...
Erik Linstead, Paul Rigor, Sushil Krishna Bajracha...
Network traffic can be represented by a Traffic Dispersion Graph (TDG) that contains an edge between two nodes that send a particular type of traffic (e.g., DNS) to one another. T...
Marios Iliofotou, Michalis Faloutsos, Michael Mitz...
Most highly accurate predictive modeling techniques produce opaque models. When comprehensible models are required, rule extraction is sometimes used to generate a transparent mod...
: Background Clustering algorithms are widely used in the analysis of microarray data. In clinical studies, they are often applied to find groups of co-regulated genes. Clustering...
Background: Sequence comparisons make use of a one-letter representation for amino acids, the necessary quantitative information being supplied by the substitution matrices. This ...