Sciweavers

367 search results - page 22 / 74
» Indexing Text Documents Based on Topic Identification
TOIS
2010
15 years 2 months ago
Learning author-topic models from text corpora
We propose a new unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a...
Michal Rosen-Zvi, Chaitanya Chemudugunta, Thomas L...
TREC
2007
15 years 4 months ago
Parsimonious Language Models for a Terabyte of Text
The aims of this paper are twofold. Our first aim is to compare results of the earlier Terabyte tracks to the Million Query track. We submitted a number of runs using different ...
Djoerd Hiemstra, Rongmei Li, Jaap Kamps, Rianne Ka...
ADC
2007
Springer
15 years 9 months ago
Distributed Text Retrieval From Overlapping Collections
In standard text retrieval systems, the documents are gathered and indexed on a single server. In distributed information retrieval (DIR), the documents are held in multiple colle...
Milad Shokouhi, Justin Zobel, Yaniv Bernstein
CICLING
2009
Springer
15 years 7 months ago
Language Identification on the Web: Extending the Dictionary Method
Automated language identification of written text is a well-established research domain that has received considerable attention in the past. By now, efficient and effecti...
Radim Rehurek, Milan Kolkus
ICDAR
2009
IEEE
15 years 10 months ago
Finding Images and Line-Drawings in Document-Scanning Systems
The system presented in this paper finds images and line-drawings in scanned pages; it is a crucial processing step in the creation of a large-scale system to detect and index ima...
Shumeet Baluja, Michele Covell