In this paper we discuss the design, acquisition and preprocessing of a Czech audio-visual speech corpus. The corpus is intended for training and testing of existing audio-visual ...
Entities -- people, organizations, locations and the like -- have long been a central focus of natural language processing technology development, since entities convey essential ...
We present the machine learning framework that we are developing, in order to support explorative search for non-trivial linguistic configurations in low-density languages (langua...
In this paper we present two large-scale verbal lexicons, AnCora-Verb-Ca for Catalan and AnCora-Verb-Es for Spanish, which are the basis for the semantic annotation with arguments...
We present new direct data analysis showing that dynamically-built context-dependent phrasal translation lexicons are more useful resources for phrase-based statistical machine tr...