In this paper, we report our experiments in the TREC-2003 Web Track. We submitted five runs for the topic distillation task. Our goal was to evaluate the standard language modeli...
We present a simple method for language-independent and task-independent text categorization learning, based on character-level n-gram language models. Our approach uses simple in...
We propose a new unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a...
Michal Rosen-Zvi, Chaitanya Chemudugunta, Thomas L...
Identifying the most influential documents in a corpus is an important problem in many fields, from information science and historiography to text summarization and news aggregati...
Documents often inherently contain many concepts reflecting specific and generic aspects. To automatically generate a short summary of documents on similar topics, it is im...