Abstract. This paper presents the final version of the Czech Broadcast Conversation Corpus released by the Linguistic Data Consortium (LDC). The corpus contains 72 recordings of a...
Structural metadata extraction (MDE) research aims to develop techniques for automatic conversion of raw speech recognition output to forms that are more useful to humans and to d...
We investigate genre effects on the task of automatic sentence segmentation, focusing on two important domains – broadcast news (BN) and broadcast conversation (BC). We employ a...
This paper presents the EPAC corpus, which consists of 100 hours of manually transcribed conversational speech together with the outputs of automatic tools (automatic segmenta...
This paper addresses speaker role recognition in TV broadcast news shows, with a particular focus on speaker turn role labeling. A mixed approach combining speaker clustering...