Combining Statistics and Semantics for Word and Document Clustering

Download sunsite.informatik.rwth-aachen.de

A new approach for constructing pseudo-keywords, referred to as Sense Units, is proposed. Sense Units are obtained by a word clustering process in which the underlying similarity reflects both statistical and semantic properties, detected through Latent Semantic Analysis and WordNet, respectively. Sense Units are used to recode documents and are evaluated by the performance gain they permit in classification tasks. Experimental results show that accounting for semantic information in fact decreases performance compared to LSI alone. The main weaknesses of the current hybrid scheme are discussed and several directions for improvement are sketched.
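
The pipeline outlined in the abstract (statistical similarity from a latent space, semantic similarity from a lexical resource, word clustering, document recoding) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the toy term-document counts, the hand-written `toy_pairs` scores standing in for a WordNet-derived measure, the convex weight `alpha`, and the simple threshold-based grouping used in place of whatever clustering the authors applied.

```python
import numpy as np

# Toy term-document counts (rows: words, columns: documents). Illustrative only.
words = ["car", "automobile", "engine", "apple", "banana", "fruit"]
X = np.array([
    [2, 1, 0, 0],   # car
    [1, 2, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 2, 1],   # apple
    [0, 0, 1, 2],   # banana
    [0, 0, 1, 1],   # fruit
], dtype=float)

# Hypothetical semantic scores standing in for a WordNet-based similarity.
toy_pairs = {
    ("car", "automobile"): 1.0, ("car", "engine"): 0.5,
    ("automobile", "engine"): 0.5, ("apple", "fruit"): 0.8,
    ("banana", "fruit"): 0.8, ("apple", "banana"): 0.6,
}

def lsa_similarity(X, k):
    """Cosine similarity between words in a rank-k latent space (truncated SVD)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    W = U[:, :k] * s[:k]                          # word vectors in the latent space
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    W = W / np.where(norms == 0, 1.0, norms)      # guard against all-zero rows
    return W @ W.T

def semantic_similarity(words, pairs):
    """Symmetric matrix built from the hand-written pair scores."""
    idx = {w: i for i, w in enumerate(words)}
    S = np.eye(len(words))
    for (a, b), score in pairs.items():
        S[idx[a], idx[b]] = S[idx[b], idx[a]] = score
    return S

def cluster_words(S, threshold):
    """Group words linked by a chain of similarities >= threshold."""
    n = S.shape[0]
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and S[i, j] >= threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

def recode(X, labels):
    """Replace word rows by one row per cluster (summed counts)."""
    units = np.zeros((max(labels) + 1, X.shape[1]))
    for row, label in zip(X, labels):
        units[label] += row
    return units

alpha = 0.5  # assumed weight between statistical and semantic similarity
S = alpha * lsa_similarity(X, k=2) + (1 - alpha) * semantic_similarity(words, toy_pairs)
labels = cluster_words(S, threshold=0.6)
sense_units = recode(X, labels)

print(dict(zip(words, labels)))
print(sense_units)
```

Setting `alpha = 1.0` in this sketch drops the semantic term and leaves the latent-space similarity alone, which corresponds to the LSI-only baseline the abstract compares against.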

Alexandre Termier, Michèle Sebag, Marie-Christine Rousset

IJCAI 2001 | IJCAI 2007 | Latent Semantic Analysis | Semantic | Sense Units |

» Biomedical concept extraction based on combining the contentbased and word order similarit...

» Combining Vector Space Model and Multi Word Term Extraction for Semantic Query Expansion

» An Hybrid Approach for Improving Word Sense Disambiguation and Text Clustering

» Combining Statistical Techniques and Lexicosyntactic Patterns for Semantic Relations Extra...

» Combining concept hierarchies and statistical topic models

» Combining Global and Local Semantic Contexts for Improving Biomedical Information Retrieva...

» Semantic Smoothing of Document Models for Agglomerative Clustering

» Word Segmentation of Handwritten Dates in Historical Documents by Combining Semantic APrio...

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2001
Where	IJCAI
Authors	Alexandre Termier, Michèle Sebag, Marie-Christine Rousset
