Design and Data Collection for Spoken Polish Dialogs Database

12 years 11 months ago
Design and Data Collection for Spoken Polish Dialogs Database
Spoken corpora provide a critical resource for research, development and evaluation of spoken dialog systems. This paper describes the telephone spoken dialog corpus for Polish created by Polish-Japanese Institute of Information Technology team within the LUNA project (IST 033549). The main goal of this project is to create a robust natural spoken language understanding (SLU) toolkit, which can be used to improve the speech-enabled telecom services in multilingual context (Italian, French and Polish). The corpus has been collected at the call center of Warsaw Transport Authority, manually transcribed and richly annotated on acoustic, syntactic and semantic levels. The most frequent users' requests concern city traffic information (public transportation stops, routes, schedules, trip planning etc.). The collected database consists of two parts: 500 human-human dialogs of approx. 670 minutes long with a vocabulary of ca. 8000 words and 500 human-machine dialogs recorded via the use...
Krzysztof Marasek, Ryszard Gubrynowicz
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where LREC
Authors Krzysztof Marasek, Ryszard Gubrynowicz
Comments (0)