We propose a new algorithm for dimensionality reduction and unsupervised text classification. We use mixture models as the underlying process generating the corpus and utilize a novel,...
Identifying approximately duplicate records in databases is an essential step in data cleaning and data integration processes. Most existing approaches have relied...
Instance selection and feature selection are two orthogonal methods for reducing the amount and complexity of data. Feature selection aims at the reduction of redundant features i...
Ortholog detection methods present a powerful approach for finding genes that participate in similar biological processes across different organisms, extending our unde...
Fadi Towfic, M. Heather West Greenlee, Vasant Hona...
The Human Protein Atlas is a rich source of location proteomics data. In this work, we present an automated approach for processing and classifying major subcellular patterns in t...
Justin Newberg, Jieyue Li, Arvind Rao, Fredrik Pon...