Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over many s...
We present an adaptive distributed query-sampling framework that is quality-conscious for extracting high-quality text database samples. The framework divides the query-based samp...
A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the text of the page, and this is a substantial source of rev...
Abstract. We show that eigenvector decomposition can be used to extract a term taxonomy from a given collection of text documents. So far, methods based on eigenvector decompositio...
Holger Bast, Georges Dupret, Debapriyo Majumdar, B...
In this article we present Supervised Semantic Indexing (SSI) which defines a class of nonlinear (quadratic) models that are discriminatively trained to directly map from the word...
Bing Bai, Jason Weston, David Grangier, Ronan Coll...