Binary classification is a core data mining task. For large datasets or real-time applications, desirable classifiers are accurate, fast, and need no parameter tuning. We presen...
Abstract. We present INDUS (Intelligent Data Understanding System), a federated, query-centric system for knowledge acquisition from autonomous, distributed, semantically heterogen...
Doina Caragea, Jyotishman Pathak, Jie Bao, Adrian ...
Life science researchers frequently need to query large protein data sets in a variety of different ways. Protein data sets have a rich structure that includes their primary structu...
Similarity search and similarity join on strings are important for applications such as duplicate detection, error detection, data cleansing, or comparison of biological sequences....
In the life sciences, genomic databases for sequence search have been growing exponentially in size. As a result, faster sequence-search algorithms to search these databases contin...
Oystein Thorsen, Brian E. Smith, Carlos P. Sosa, K...