The Arabic Treebank (ATB) Project at the Linguistic Data Consortium (LDC) has embarked on a large corpus of Broadcast News (BN) transcriptions, and this has led to a number of new...
Mohamed Maamouri, Ann Bies, Seth Kulick, Wajdi Zag...
Short Messaging Service (SMS) texts differ markedly from standard written text and exhibit distinctive phenomena of their own. To translate SMS texts, traditional approaches model s...
We present a FrameNet-based semantic role labeling system for Swedish text. As training data for the system, we used an annotated corpus that we produced by transferring FrameNet ...
Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv family, Dynamic Markov Compression (D...
Fauzia S. Awan, Nan Zhang, Nitin Motgi, Raja ...
The output of a speech recognition system is not always ideal for subsequent downstream processing, in part because speakers themselves often make mistakes. A system would accompl...