Distributed Data Collector (DDC) is a framework to ease and automate repetitive executions of console applications (probes) over a set of LAN networked Windows personal computers....
Manually annotated corpora are valuable but scarce resources, yet for many annotation tasks such as treebanking and sequence labeling there exist multiple corpora with different a...
In this paper we present a method for clustering SAGE (Serial Analysis of Gene Expression) data to detect similarities and dissimilarities between different types of cancer on the...
A serious drawback of kernel methods, and Support Vector Machines (SVM) in particular, is the difficulty in choosing a suitable kernel function for a given dataset. One of the appr...
Huyen Do, Alexandros Kalousis, Adam Woznica, Melan...
We consider the problem of numerical stability and model density growth when training a sparse linear model from massive data. We focus on scalable algorithms that optimize certain...