Sciweavers

EMNLP
2010

A Latent Variable Model for Geographic Lexical Variation

The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as "sports" or "entertainment" are rendered differently in each geographic region, revealing topic-specific regional distinctions. Applied to a new dataset of geotagged microblogs, our model recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency. The model also enables prediction of an author's geographic location from raw text, outperforming both text regression and supervised topic models.
Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, Eric P. Xing
Added 11 Feb 2011
Updated 11 Feb 2011
Type Conference
Year 2010
Where EMNLP
Authors Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, Eric P. Xing