Sciweavers

» Corpora and data preparation
LREC
2008
14 years 11 months ago
AnCora: Multilevel Annotated Corpora for Catalan and Spanish
This paper presents AnCora, a multilingual corpus annotated at different linguistic levels consisting of 500,000 words in Catalan (AnCora-Ca) and in Spanish (AnCora-Es). At presen...
Mariona Taulé, Maria Antònia Martí...
LREC
2008
14 years 11 months ago
An Empirical Approach to a Preliminary Successful Identification and Resolution of Temporal Expressions in Spanish News Corpora
Dating of contents is relevant to multiple advanced Natural Language Processing (NLP) applications, such as Information Retrieval or Question Answering. These could be improved by...
Maria Teresa Vicente-Díez, Doaa Samy, Palom...
COLING
1996
14 years 11 months ago
Aligning More Words with High Precision for Small Bilingual Corpora
In this paper, we propose an algorithm for identifying each word with its translations in a sentence and translation pair. Previously proposed methods require enormous amounts of ...
Sur-Jin Ker, Jason J. S. Chang
COLING
2002
14 years 9 months ago
Learning Verb Argument Structure from Minimally Annotated Corpora
In this paper we investigate the task of automatically identifying the correct argument structure for a set of verbs. The argument structure of a verb allows us to predict the rel...
Anoop Sarkar, Woottiporn Tripasai
NAACL
2010
14 years 7 months ago
Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment
The quality of a statistical machine translation (SMT) system is heavily dependent upon the amount of parallel sentences used in training. In recent years, there have been several...
Jason R. Smith, Chris Quirk, Kristina Toutanova