Sciweavers

LREC
2008

Unsupervised Parts-of-Speech Induction for Bengali

13 years 6 months ago
Unsupervised Parts-of-Speech Induction for Bengali
We present a study of the word interaction networks of Bengali in the framework of complex networks. The topological properties of these networks reveal interesting insights into the morpho-syntax of the language, whereas clustering helps in the induction of the natural word classes leading to a principled way of designing POS tagsets. We compare different network construction techniques and clustering algorithms based on the cohesiveness of the word clusters. Cohesiveness is measured against two gold-standard tagsets by means of the novel metric of tag-entropy. The approach presented here is a generic one that can be easily extended to any language.
Joy Deep Nath, Monojit Choudhury, Animesh Mukherje
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where LREC
Authors Joy Deep Nath, Monojit Choudhury, Animesh Mukherjee, Christian Biemann, Niloy Ganguly
Comments (0)