Sciweavers

378 search results - page 35 / 76
» Finding document topics for improving topic segmentation
COLING
2010
14 years 11 months ago
Simultaneous Ranking and Clustering of Sentences: A Reinforcement Approach to Multi-Document Summarization
Multi-document summarization aims to produce a concise summary that contains salient information from a set of source documents. In this field, sentence ranking has hitherto been ...
Xiaoyan Cai, Wenjie Li, Ouyang You, Hong Yan
TREC
2004
15 years 5 months ago
Language Models for Searching in Web Corpora
We describe our participation in the TREC 2004 Web and Terabyte tracks. For the web track, we employ mixture language models based on document full-text, incoming anchortext, and...
Jaap Kamps, Gilad Mishne, Maarten de Rijke
LREC
2008
15 years 5 months ago
New Resources for Document Classification, Analysis and Translation Technologies
The goal of the DARPA MADCAT (Multilingual Automatic Document Classification Analysis and Translation) Program is to automatically convert foreign language text images into Englis...
Stephanie Strassel, Lauren Friedman, Safa Ismael, ...
ICDM
2008
IEEE
15 years 10 months ago
Collective Latent Dirichlet Allocation
In this paper, we propose a new variant of Latent Dirichlet Allocation (LDA): Collective LDA (C-LDA), for multiple corpora modeling. C-LDA combines multiple corpora during learning...
Zhiyong Shen, Jun Sun, Yi-Dong Shen
JASIS
2006
15 years 4 months ago
Learning to classify documents according to genre
Genre or style analysis can be used to improve results achieved using standard IR techniques. A genre class is a group of documents that are written in a similar style. Genre clas...
Aidan Finn, Nicholas Kushmerick