Monolingual and Bilingual Concept Visualization from Corpora

…e by placing terms in an abstract ‘information space’ based on their occurrences in text corpora, and then allowing a user to visualize local regions of this information space. Words are plotted in a two-dimensional picture so that related words are close together and whole classes of similar words occur in recognizable clusters, which sometimes clearly signify a particular meaning. As well as giving a clear view of which concepts are related in a particular document collection, this technique also helps a user to interpret unknown words. The main technique we will demonstrate is planar projection of word vectors from a vector space built using Latent Semantic Analysis (LSA) (Landauer and Dumais, 1997; Schütze, 1998), a method that can be applied multilingually if translated corpora are available for training. Following the method of Schütze (1998), we assign each word 1000 coordinates based on the number of times that word occurs in a 15-word window with one of 1000 ‘content-b...
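A minimal sketch of the counting-and-projection pipeline described above, in plain Python with no third-party libraries. It rests on assumptions the excerpt does not confirm: the 15-word window is taken to be centred on the target word, raw counts are used without any weighting, and the LSA reduction and the planar projection are collapsed into one step by projecting each word's count vector onto the top two right singular vectors of the count matrix (found here by power iteration). The authors' actual projection may differ. The names `cooccurrence_vectors`, `top_singular_directions` and `plot_coordinates` are hypothetical, and the 1000 context words are supplied by the caller because the abstract is cut off before saying how they are selected.

```python
import math


def cooccurrence_vectors(tokens, context_words, window=15):
    """Map each word type to its counts of the chosen context words in a window.

    Assumption (hypothetical): the window is centred on the target word, so an
    odd window of 15 covers 7 tokens on each side plus the word itself.
    """
    index = {w: i for i, w in enumerate(context_words)}
    half = window // 2
    vectors = {}
    for pos, word in enumerate(tokens):
        vec = vectors.setdefault(word, [0.0] * len(context_words))
        lo, hi = max(0, pos - half), min(len(tokens), pos + half + 1)
        for other in range(lo, hi):
            if other != pos and tokens[other] in index:
                vec[index[tokens[other]]] += 1.0
    return vectors


def top_singular_directions(rows, k=2, iters=200):
    """Approximate the top-k right singular vectors of the word-by-context
    count matrix M by power iteration on its Gram matrix M^T M."""
    n = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    directions = []
    for _axis in range(k):
        v = [1.0 + 0.01 * i for i in range(n)]  # arbitrary non-degenerate start
        for _step in range(iters):
            w = [sum(g * x for g, x in zip(gram_row, v)) for gram_row in gram]
            for u in directions:  # Gram-Schmidt: stay orthogonal to earlier axes
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0.0:
                v = [0.0] * n  # matrix rank < k: this axis carries no signal
                break
            v = [a / norm for a in w]
        directions.append(v)
    return directions


def plot_coordinates(tokens, context_words, window=15, iters=200):
    """Return {word: (x, y)}: each count vector projected onto the top two
    singular directions, i.e. a rank-2 truncated SVD used directly as the plane."""
    vectors = cooccurrence_vectors(tokens, context_words, window)
    words = list(vectors)
    rows = [vectors[w] for w in words]
    axes = top_singular_directions(rows, k=2, iters=iters)
    return {
        w: tuple(sum(a * b for a, b in zip(row, axis)) for axis in axes)
        for w, row in zip(words, rows)
    }


if __name__ == "__main__":
    tokens = "pet cat fur pet dog fur sep sep sep market stock price market bond price".split()
    coords = plot_coordinates(tokens, ["pet", "fur", "market", "price"], window=3)
    for word in ("cat", "dog", "stock", "bond"):
        print(word, coords[word])
```

On the toy token list in the `__main__` block, `cat` and `dog` occur next to `pet` and `fur` while `stock` and `bond` occur next to `market` and `price`, so the first pair lands closer together than either does to the second. Building the n-by-n Gram matrix in pure Python costs O(n²·V) time for n context words and V word types, so at n = 1000 a sparse SVD routine would be the practical choice.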
Dominic Widdows, Scott Cederberg
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003