Sciweavers

367 search results - page 66 / 74
» Indexing Text Documents Based on Topic Identification
VLDB
2002
ACM
14 years 11 months ago
Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection
Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over many s...
Panagiotis G. Ipeirotis, Luis Gravano
SIGIR
2006
ACM
15 years 5 months ago
Distributed query sampling: a quality-conscious approach
We present an adaptive distributed query-sampling framework that is quality-conscious for extracting high-quality text database samples. The framework divides the query-based samp...
James Caverlee, Ling Liu, Joonsoo Bae
WWW
2006
ACM
16 years 10 days ago
Finding advertising keywords on web pages
A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the text of the page, and this is a substantial source of rev...
Wen-tau Yih, Joshua Goodman, Vitor R. Carvalho
EWMF
2005
Springer
15 years 5 months ago
Discovering a Term Taxonomy from Term Similarities Using Principal Component Analysis
We show that eigenvector decomposition can be used to extract a term taxonomy from a given collection of text documents. So far, methods based on eigenvector decompositio...
Holger Bast, Georges Dupret, Debapriyo Majumdar, B...
IR
2010
14 years 10 months ago
Learning to rank with (a lot of) word features
In this article we present Supervised Semantic Indexing (SSI), which defines a class of nonlinear (quadratic) models that are discriminatively trained to directly map from the word...
Bing Bai, Jason Weston, David Grangier, Ronan Coll...