The Web contains vast amounts of linguistic data. One key issue for linguists and language technologists is how to access it. Commercial search engines give highly compromised acc...
Accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the na
Building NLG systems, in particular statistical ones, requires parallel data (paired inputs and outputs), which do not generally occur naturally. In this paper, we investigate the ...
In this paper we bring to light a novel intersection between corpus linguistics and behavioral data that can be employed as an evaluation metric for resources for low-density lang...
Number and date expressions are essential information items in corpora and therefore play a major role in various text mining applications. However, number expressions have so far been ...