Current data repositories include a variety of data types, including audio, images and time series. State of the art techniques for indexing such data and doing query processing r...
Understanding high-dimensional real world data usually requires learning the structure of the data space. The structure maycontain high-dimensional clusters that are related in co...
Clustering problems often involve datasets where only a part of the data is relevant to the problem, e.g., in microarray data analysis only a subset of the genes show cohesive exp...
Some online algorithms for linear classification model the uncertainty in their weights over the course of learning. Modeling the full covariance structure of the weights can prov...
Justin Ma, Alex Kulesza, Mark Dredze, Koby Crammer...
We address the problem of designing data structures that allow efficient search for approximate nearest neighbors. More specifically, given a database consisting of a set of vecto...