Background: The continuous flow of EST data remains one of the richest sources for discoveries in modern biology. The first step in EST data mining is usually associated with EST ...
Protecting data privacy is an important problem in microdata distribution. Anonymization algorithms typically aim to protect individual privacy, with minimal impact on the quality...
Kristen LeFevre, David J. DeWitt, Raghu Ramakrishn...
In this paper, we propose a general framework for distributed boosting intended for efficient integrating specialized classifiers learned over very large and distributed homogeneo...
Automatically segmenting unstructured text strings into structured records is necessary for importing the information contained in legacy sources and text collections into a data ...
In this paper, we highlight the use of synthetic data sets to analyze learners behavior under bounded complexity. We propose a method to generate synthetic data sets with a specif...