A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic pro...
Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, T...
The core task of sponsored search is to retrieve relevant ads for the user’s query. Ads can be retrieved either by exact match, when their bid term is identical to the query, or...
Michael Bendersky, Evgeniy Gabrilovich, Vanja Josi...
When writing a paper, how often do you want to cite something at a particular point but are unsure which papers to cite? Do you wish to have a recommendation system which ca...
Qi He, Jian Pei, Daniel Kifer, Prasenjit Mitra, C....
In this paper we present a method to jointly optimise the relevance and the diversity of the results in image retrieval. Without considering diversity, image retrieval systems oft...
Thomas Deselaers, Tobias Gass, Philippe Dreuw, Her...