Sciweavers

UAI
2008
13 years 5 months ago
Latent Topic Models for Hypertext
Latent topic models have been successfully applied as an unsupervised topic discovery technique in large document collections. With the proliferation of hypertext document collect...
Amit Gruber, Michal Rosen-Zvi, Yair Weiss
INFOSCALE
2007
ACM
13 years 5 months ago
Query-driven indexing for scalable peer-to-peer text retrieval
We present a query-driven algorithm for the distributed indexing of large document collections within structured P2P networks. To cope with bandwidth consumption that has been ide...
Gleb Skobeltsyn, Toan Luu, Ivana Podnar Zarko, Mar...
DAS
2008
Springer
13 years 6 months ago
HistoSketch: A Semi-Automatic Annotation Tool for Archival Documents
This article describes a sketch-based framework for semi-automatic annotation of historical document collections. It is motivated by the fact that fully automatic methods, while h...
Joan Mas, José A. Rodríguez, Dimosth...
CIKM
2008
Springer
13 years 6 months ago
Peer-to-peer similarity search over widely distributed document collections
This paper addresses the challenging problem of similarity search over widely distributed ultra-high dimensional data. Such an application is retrieval of the top-k most similar d...
Christos Doulkeridis, Kjetil Nørvåg, ...
DEXA
2006
Springer
13 years 8 months ago
Understanding and Enhancing the Folding-In Method in Latent Semantic Indexing
Latent Semantic Indexing (LSI) has been proved to be effective to capture the semantic structure of document collections. It is widely used in content-based text retrieval...
Xiang Wang 0002, Xiaoming Jin
VLDB
1994
ACM
13 years 8 months ago
Fast Incremental Indexing for Full-Text Information Retrieval
Full-text information retrieval systems have traditionally been designed for archival environments. They often provide little or no support for adding new documents to an existing...
Eric W. Brown, James P. Callan, W. Bruce Croft
CIKM
1997
Springer
13 years 8 months ago
The Need for Metrics in Visual Information Analysis
This paper explores several methods for visualizing the thematic content of large document collections. As opposed to traditional query-driven document retrieval, these methods ...
Nancy Miller, Elizabeth G. Hetzler, Grant Nakamura...
SIGMOD
2000
ACM
13 years 8 months ago
Finding Replicated Web Collections
Many web documents (such as JAVA FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times....
Junghoo Cho, Narayanan Shivakumar, Hector Garcia-M...
ERCIMDL
2001
Springer
13 years 8 months ago
A Combined Phrase and Thesaurus Browser for Large Document Collections
A hierarchical browsing interface to a document collection can be constructed by identifying the phrases that recur in the full text of the documents and structuring them into a h...
Gordon W. Paynter, Ian H. Witten
VLDB
2005
ACM
13 years 9 months ago
Hubble: An Advanced Dynamic Folder Technology for XML
A significant amount of information is stored in computer systems today, but people are struggling to manage their documents such that the information is easily found. XML is a de...
Ning Li, Joshua Hui, Hui-I Hsiao, Kevin S. Beyer