Sciweavers

106 search results - page 11 / 22
» Document Representation and Dimension Reduction for Text Clu...
SIGIR
2008
ACM
14 years 9 months ago
Enhancing text clustering by leveraging Wikipedia semantics
Most traditional text clustering methods use the "bag of words" (BOW) representation, which is based on frequency statistics in a set of documents. BOW, however, ignores the ...
Jian Hu, Lujun Fang, Yang Cao, Hua-Jun Zeng, Hua L...
COLING
2008
14 years 11 months ago
A Framework for Identifying Textual Redundancy
The task of identifying redundant information in documents that are generated from multiple sources provides a significant challenge for summarization and QA systems. Traditional ...
Kapil Thadani, Kathleen McKeown
NIPS
2008
14 years 11 months ago
DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
Probabilistic topic models have become popular as methods for dimensionality reduction in collections of text documents or images. These models are usually treated as generative m...
Simon Lacoste-Julien, Fei Sha, Michael I. Jordan
RIAO
2004
14 years 10 months ago
Multilingual document clusters discovery
The Cross-Language Information Retrieval community has developed search engines over multilingual corpora, as well as multilingual text categorization systems. In this paper, we focus on th...
Benoît Mathieu, Romaric Besançon, Chr...
DAS
2004
Springer
15 years 2 months ago
Unity Is Strength: Coupling Media for Thematic Segmentation
This paper presents the evaluation methods and the preliminary results of a combined thematic segmentation of (a) meeting documents and (b) meeting speech transcript. Our ...
Dalila Mekhaldi, Denis Lalanne, Rolf Ingold