Sciweavers

» Corpora and data preparation
EACL
2006
ACL Anthology
14 years 11 months ago
Large Linguistically-Processed Web Corpora for Multiple Languages
The Web contains vast amounts of linguistic data. One key issue for linguists and language technologists is how to access it. Commercial search engines give highly compromised acc...
Marco Baroni, Adam Kilgarriff
ACL
2006
14 years 11 months ago
Scaling Distributional Similarity to Large Corpora
Accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the na...
James Gorman, James R. Curran
INLG
2010
Springer
14 years 7 months ago
Extracting Parallel Fragments from Comparable Corpora for Data-to-text Generation
Building NLG systems, in particular statistical ones, requires parallel data (paired inputs and outputs) which do not generally occur naturally. In this paper, we investigate the ...
Anja Belz, Eric Kow
LREC
2010
14 years 11 months ago
How Specialized are Specialized Corpora? Behavioral Evaluation of Corpus Representativeness for Maltese
In this paper we bring to light a novel intersection between corpus linguistics and behavioral data that can be employed as an evaluation metric for resources for low-density lang...
Jerid Francom, Amy LaCross, Adam Ussishkin
GFKL
2007
Springer
15 years 4 months ago
Classifying Number Expressions in German Corpora
Number and date expressions are essential information items in corpora and therefore play a major role in various text mining applications. However, so far number expressions were ...
Irene M. Cramer, Stefan Schacht, Andreas Merkel