Abstract. This paper proposes a novel method for speaker identification based on both speech utterances and their transcribed text. The transcribed text of each speaker's utte...
We present a new algorithm to measure domain-specific readability. It iteratively computes the readability of domain-specific resources based on the difficulty of domain-specific c...
The paper presents the Bulgarian National Corpus project (BulNC), a large-scale, representative corpus of Bulgarian that is available online. The BulNC is also a monolingual general corpus,...
The Sign Linguistics Corpora Network is a three-year network initiative that aims to collect existing knowledge and practices on the creation and use of signed language resources....
An analysis of the social video sharing platform YouTube reveals a large amount of community feedback through comments on published videos as well as through meta ratings for thes...
Stefan Siersdorfer, Sergiu Chelaru, Wolfgang Nejdl...