An overwhelming number of legal documents is available in digital form. However, most of the texts are usually only provided in a semi-structured form, i.e. the documents are stru...
It has been shown that speech spectrograms can be read by trained experts. In this work, we regard the speech spectrogram image as a written text in some unknown language and perf...
A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segm...
Elizabeth Shriberg, Andreas Stolcke, Dilek Z. Hakk...
We describe our work on text-image alignment in context of building a historical document retrieval system. We aim at aligning images of words in handwritten lines with their text...
We are interested in diacritizing Semitic languages, especially Syriac, using only diacritized texts. Previous methods have required the use of tools such as part-of-speech tagger...