Background: Clustering algorithms are widely used in the analysis of microarray data. In clinical studies, they are often applied to find groups of co-regulated genes. Clustering...
Distance metrics are widely used in similarity estimation. In this paper we find that the most popular metrics, the Euclidean and Manhattan distances, may not be suitable for all data distribution...
Background: Biological information is commonly used to cluster or classify entities of interest such as genes, conditions, species or samples. However, different sources of data c...
Abstract: Feature Selection (FS) is a technique for dimensionality reduction. Its aim is to select a subset of the original features of a dataset which are rich in the most use...
Identifying approximately duplicate records in databases is an essential step in data cleaning and data integration processes. Most existing approaches have relied...