Sciweavers

NAACL
1994
13 years 10 months ago
Multilingual Text Resources at the Linguistic Data Consortium
The Linguistic Data Consortium (LDC) is currently involved in a major effort to expand its multilingual text resources, in particular for machine translation, message understandin...
David Graff, Rebecca Finch
LREC
2008
13 years 10 months ago
Diacritic Annotation in the Arabic Treebank and its Impact on Parser Evaluation
The Arabic Treebank (ATB), released by the Linguistic Data Consortium, contains multiple annotation files for each source file, due in part to the role of diacritic inclusion in t...
Mohamed Maamouri, Seth Kulick, Ann Bies
LREC
2008
13 years 10 months ago
Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium
The Linguistic Data Consortium (LDC) creates a variety of linguistic resources
Kazuaki Maeda, Haejoong Lee, Shawn Medero, Julie M...
LREC
2008
13 years 10 months ago
New Resources for Document Classification, Analysis and Translation Technologies
The goal of the DARPA MADCAT (Multilingual Automatic Document Classification Analysis and Translation) Program is to automatically convert foreign language text images into Englis...
Stephanie Strassel, Lauren Friedman, Safa Ismael, ...
IJCNLP
2004
Springer
14 years 2 months ago
Building a Parallel Bilingual Syntactically Annotated Corpus
This paper describes a process of building a bilingual syntactically annotated corpus, the PCEDT (Prague Czech-English Dependency Treebank). The corpus is being created at Charles...
Jan Cuřín, Martin Čmejrek, Jiří Have...