Sciweavers

735 search results - page 64 / 147
Corpora and data preparation
IJCNLP
2005
Springer
15 years 3 months ago
A Chunking Strategy Towards Unknown Word Detection in Chinese Word Segmentation
This paper proposes a chunking strategy to detect unknown words in Chinese word segmentation. First, a raw sentence is pre-segmented into a sequence of word atoms using a maximum...
Guodong Zhou
AAAI
2008
15 years 13 days ago
Cross-lingual Propagation for Morphological Analysis
Multilingual parallel text corpora provide a powerful means for propagating linguistic knowledge across languages. We present a model which jointly learns linguistic structure for...
Benjamin Snyder, Regina Barzilay
LREC
2008
14 years 11 months ago
Thai Broadcast News Corpus Construction and Evaluation
Large speech and text corpora are crucial to the development of a state-of-the-art speech recognition system. This paper reports on the construction and evaluation of the first Th...
Markpong Jongtaveesataporn, Chai Wutiwiwatchai, Ko...
LREC
2008
14 years 11 months ago
The ATCOSIM Corpus of Non-Prompted Clean Air Traffic Control Speech
Air traffic control (ATC) is based on voice communication between pilots and controllers and uses a highly task- and domain-specific language. Due to this very reason, spoken langu...
Konrad Hofbauer, Stefan Petrik, Horst Hering
LREC
2008
14 years 11 months ago
CORP-ORAL: Spontaneous Speech Corpus for European Portuguese
Research activity on the Portuguese language for speech synthesis and recognition has suffered from a considerable lack of human and material resources. This has raised some obsta...
Fabíola Santos, Tiago Freitas