LREC
2010

How Specialized are Specialized Corpora? Behavioral Evaluation of Corpus Representativeness for Maltese

In this paper we bring to light a novel intersection between corpus linguistics and behavioral data that can serve as an evaluation metric for resources for low-density languages, drawing on well-established psycholinguistic factors. Using the low-density language Maltese as a test case, we highlight the challenges facing researchers who develop resources for languages with sparsely available data, and we identify a key empirical link between corpus and psycholinguistic research as a tool for evaluating corpus resources. Specifically, we compare two robust variables identified in the psycholinguistic literature: word frequency (as measured in a corpus) and word familiarity (as measured in a rating task). We then use three statistical methods to evaluate these comparisons. This research provides a multidisciplinary approach to corpus development and evaluation, in particular for less-resourced languages that lack wide access to diverse language data.
Jerid Francom, Amy LaCross, Adam Ussishkin
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2010
Where LREC
Authors Jerid Francom, Amy LaCross, Adam Ussishkin
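
The abstract compares corpus word frequency with rated word familiarity but does not name the three statistical methods used. As a rough illustration only, the sketch below computes a Spearman rank correlation between log-transformed corpus counts and mean familiarity ratings. The choice of Spearman correlation, the log transform, the function names, and all of the data values are assumptions made for this example; none of them come from the paper.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical placeholder data, NOT taken from the paper:
# raw corpus counts and mean familiarity ratings (assumed 1-7 scale).
corpus_counts = {"word_a": 1200, "word_b": 340, "word_c": 15,
                 "word_d": 2, "word_e": 87}
familiarity = {"word_a": 6.8, "word_b": 6.1, "word_c": 3.9,
               "word_d": 4.2, "word_e": 5.5}

words = sorted(corpus_counts)
# Log-transform counts (add-one to tolerate zero counts); an assumed,
# commonly used preprocessing step for frequency data.
log_freq = [math.log10(corpus_counts[w] + 1) for w in words]
ratings = [familiarity[w] for w in words]

rho = spearman(log_freq, ratings)
print(f"Spearman rho between log frequency and familiarity: {rho:.2f}")
```

Under this reading, a high positive correlation would suggest that the corpus frequencies track speakers' familiarity judgments, while a weak correlation could point to a corpus that under-represents everyday usage. Because a rank correlation depends only on ordering, the log transform does not change the result here; it is included because it matters for linear methods such as Pearson correlation or regression.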