Sciweavers

LREC
2008
Creating Sentence-Aligned Parallel Text Corpora from a Large Archive of Potential Parallel Text using BITS and Champollion
Parallel text is one of the most valuable resources for development of statistical machine translation systems and other NLP applications. The Linguistic Data Consortium (LDC) has...
Kazuaki Maeda, Xiaoyi Ma, Stephanie Strassel
LREC
2008
Evaluating Complement-Modifier Distinctions in a Semantically Annotated Corpus
We evaluate the extent to which the distinction between semantically core and non-core dependents as used in the FrameNet corpus corresponds to the traditional distinction between...
Mark McConville, Myroslava Dzikovska
LREC
2008
The PIT Corpus of German Multi-Party Dialogues
The PIT corpus is a German multi-media corpus of multi-party dialogues recorded in a Wizard-of-Oz environment at the University of Ulm. The scenario involves two human dialogue pa...
Petra-Maria Strauß, Holger Hoffmann, Wolfgan...
LREC
2008
Selection of Japanese-English Equivalents by Integrating High-quality Corpora and Huge Amounts of Web Data
As a first step to developing systems that enable non-native speakers to output near-perfect English sentences for given mixed English-Japanese sentences, we propose new approaches...
Qing Ma, Koichi Nakao, Masaki Murata, Hitoshi Isah...
LREC
2008
Investigating the Structure of Procedural Texts for Answering How-to Questions
This paper presents ongoing work dedicated to parsing the textual structure of procedural texts. We propose here a model for the instructional structure and criteria to identify it...
Estelle Delpech, Patrick Saint-Dizier
LREC
2008
A Grid of Regional Language Archives
About two years ago, the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands, started an initiative to install regional language archives in various places aro...
Paul Trilsbeek, Daan Broeder, Tobias Valkenhoef, P...
LREC
2008
Certification and Cleaning up of a Text Corpus: Towards an Evaluation of the "Grammatical" Quality of a Corpus
We present in this article the methods we used for obtaining measures to ensure the quality and well-formedness of a text corpus. These measures allow us to determine the compatib...
Cyril Grouin
LREC
2008
Sensitivity of Automated MT Evaluation Metrics on Higher Quality MT Output: BLEU vs Task-Based Evaluation Methods
We report the results of an experiment to assess the ability of automated MT evaluation metrics to remain sensitive to variations in MT quality as the average quality of the compa...
Bogdan Babych, Anthony Hartley
LREC
2008
Synchronizing Translated Movie Subtitles
This paper addresses the problem of synchronizing movie subtitles, which is necessary to improve alignment quality when building a parallel corpus out of translated subtitles. In ...
Jörg Tiedemann