This paper describes the collection and transcription of a large set of Arabic broadcast news speech data. In total, more than 2000 hours of data were transcribed. The transcription ...
The vast majority of medical computer-based training (CBT) systems aim at problem-oriented, case-based training. A crucial issue in the design of CBT systems is the selection of ap...
We present a novel mathematical formalism for the idea of a "local model" of an uncontrolled dynamical system, a model that makes only certain predictions in only certai...
Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpu...
We discuss an idea for collecting data in a relatively efficient manner. Our point of view is Bayesian and information-theoretic: on any given trial, we want to adaptively choose...