Sciweavers

Indexing Text Documents Based on Topic Identification
TOIS
2010
14 years 10 months ago
Learning author-topic models from text corpora
We propose a new unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a...
Michal Rosen-Zvi, Chaitanya Chemudugunta, Thomas L...
TREC
2007
15 years 1 month ago
Parsimonious Language Models for a Terabyte of Text
The aims of this paper are twofold. Our first aim is to compare results of the earlier Terabyte tracks to the Million Query track. We submitted a number of runs using different ...
Djoerd Hiemstra, Rongmei Li, Jaap Kamps, Rianne Ka...
ADC
2007
Springer
15 years 6 months ago
Distributed Text Retrieval From Overlapping Collections
In standard text retrieval systems, the documents are gathered and indexed on a single server. In distributed information retrieval (DIR), the documents are held in multiple colle...
Milad Shokouhi, Justin Zobel, Yaniv Bernstein
CICLING
2009
Springer
15 years 3 months ago
Language Identification on the Web: Extending the Dictionary Method
Automated language identification of written text is a well-established research domain that has received considerable attention in the past. By now, efficient and effecti...
Radim Rehurek, Milan Kolkus
ICDAR
2009
IEEE
15 years 6 months ago
Finding Images and Line-Drawings in Document-Scanning Systems
The system presented in this paper finds images and line-drawings in scanned pages; it is a crucial processing step in the creation of a large-scale system to detect and index ima...
Shumeet Baluja, Michele Covell