Background: With the exponential increase in genomic sequence data there is a need to develop automated approaches to deducing the biological functions of novel sequences with hig...
—We consider approaches for similarity search in correlated, high-dimensional data-sets, which are derived within a clustering framework. We note that indexing by “vector appro...
Abstract. This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, ...
Scarcity and infeasibility of human supervision for large
scale multi-class classification problems necessitates active
learning. Unfortunately, existing active learning methods
...
Prateek Jain (University of Texas at Austin), Ashi...
This paper introduces a method for clustering complex and linearly non-separable datasets, without any prior knowledge of the number of naturally occurring clusters. The proposed ...