ACL
2011

Discovering Sociolinguistic Associations with Structured Sparsity

We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors’ geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ℓ1,∞ regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.
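To illustrate why a composite ℓ1,∞ penalty zeroes whole rows, here is a minimal NumPy sketch. It is not the authors' optimization procedure. It assumes a coefficient matrix B with one row per input feature and one column per output, and the function names (l1_inf_norm, project_l1_ball, prox_l1_inf) and the toy matrix are hypothetical, chosen only for this example. The penalty sums the largest absolute coefficient of each row. Its proximal operator, computed here row by row as the row minus its projection onto an ℓ1 ball, sets a row exactly to zero whenever the row's ℓ1 norm is at most the regularization strength.

```python
import numpy as np


def l1_inf_norm(B):
    """Sum over rows of the largest absolute coefficient in each row."""
    return float(np.abs(B).max(axis=1).sum())


def project_l1_ball(v, radius):
    """Euclidean projection of vector v onto the l1 ball of the given radius."""
    abs_v = np.abs(v)
    if abs_v.sum() <= radius:
        return v.copy()  # already inside the ball
    u = np.sort(abs_v)[::-1]            # magnitudes in descending order
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)
    return np.sign(v) * np.maximum(abs_v - theta, 0.0)


def prox_l1_inf(B, lam):
    """Row-wise proximal step for lam * (l1,inf penalty).

    Uses the Moreau decomposition: the prox of lam * ||.||_inf is the
    identity minus the projection onto the l1 ball of radius lam.
    """
    return np.vstack([row - project_l1_ball(row, lam) for row in B])


# Toy coefficient matrix (assumed values): 3 features x 3 outputs.
B = np.array([
    [3.0, -1.0, 2.0],   # strong feature: survives, largest entry is clipped
    [0.2, -0.1, 0.1],   # weak feature: l1 norm 0.4 <= lam, so the row vanishes
    [0.0, 0.0, 0.0],
])
shrunk = prox_l1_inf(B, lam=1.0)
print(l1_inf_norm(B))
print(shrunk)
```

In this toy case the weak second row is removed entirely while the first row keeps all of its entries, which is the row-level (rather than entry-level) sparsity the abstract describes.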
Jacob Eisenstein, Noah A. Smith, Eric P. Xing
Type Journal
Year 2011
Where ACL