Sciweavers

241 search results - page 17 / 49
» Detecting Co-Derivative Documents in Large Text Collections
ICDE
2008
IEEE
15 years 4 months ago
A rank-rewrite framework for summarizing XML documents
With XML becoming a standard for data representation and exchange, we can expect to see large scale repositories and warehouses of XML data. In order for users to under...
Maya Ramanath, Kondreddi Sarath Kumar
IPM
2007
14 years 9 months ago
Using structural contexts to compress semistructured text collections
We describe a compression model for semistructured documents, called Structural Contexts Model (SCM), which takes advantage of the context information usually implicit in the stru...
Joaquín Adiego, Gonzalo Navarro, Pablo de l...
IPM
2006
14 years 9 months ago
Document clustering using nonnegative matrix factorization
A methodology for automatically identifying and clustering semantic features or topics in a heterogeneous text collection is presented. Textual data is encoded using a low rank no...
Farial Shahnaz, Michael W. Berry, V. Paul Pauca, R...
CIKM
2006
Springer
15 years 1 month ago
A document-centric approach to static index pruning in text retrieval systems
We present a static index pruning method, to be used in ad-hoc document retrieval tasks, that follows a document-centric approach to decide whether a posting for a given term shoul...
Stefan Büttcher, Charles L. A. Clarke
EMNLP
2009
14 years 7 months ago
Polylingual Topic Models
Topic models are a useful tool for analyzing large text collections, but have previously been applied in only monolingual, or at most bilingual, contexts. Meanwhile, massive colle...
David M. Mimno, Hanna M. Wallach, Jason Naradowsky...