
ICDAR
2011
IEEE

HMM-Based Alignment of Inaccurate Transcriptions for Historical Documents

For historical documents, available transcriptions are typically inaccurate when compared with the scanned document images. Not only are the positions of the words and sentences unknown, but the transcription may also not match the text in the image exactly. An error-tolerant alignment is needed to make the document images amenable to browsing and searching in digital libraries. In this paper, we propose a novel multi-pass alignment method based on Hidden Markov Models (HMM) that combines text line recognition, string alignment, and keyword spotting to cope with word substitutions, deletions, and insertions in the transcription. In a segmentation-free approach, transcriptions of complete pages are aligned with sequences of text line images. On the Parzival data set, results are reported for several degrees of artificial distortions. Both the accuracy and the efficiency of the proposed system are promising for real-world applications. Keywords: Handwriting recognition; Hidden Markov mod...
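The string alignment component mentioned in the abstract can be illustrated with a generic word-level edit-distance alignment. The sketch below is not the authors' HMM-based multi-pass method. It only shows, under simplifying assumptions, how substitutions, deletions, and insertions in a transcription might be identified once a recognizer has produced a word sequence for the image. The function name `align_words`, the unit edit costs, and the operation labels are hypothetical choices made for this illustration.

```python
def align_words(recognized, transcription):
    """Align two word lists with Levenshtein dynamic programming.

    Returns a list of (op, recognized_word, transcription_word) tuples.
    op is "match", "substitute", "delete" (word present in the image but
    missing from the transcription), or "insert" (extra word in the
    transcription). Unit costs are an assumption of this sketch.
    """
    n, m = len(recognized), len(transcription)

    # cost[i][j] = edit distance between recognized[:i] and transcription[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if recognized[i - 1] == transcription[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub,  # match or substitution
                cost[i - 1][j] + 1,        # deletion in the transcription
                cost[i][j - 1] + 1,        # insertion in the transcription
            )

    # Trace back from the bottom-right corner to recover the operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if recognized[i - 1] == transcription[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                op = "match" if sub == 0 else "substitute"
                ops.append((op, recognized[i - 1], transcription[j - 1]))
                i -= 1
                j -= 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("delete", recognized[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", None, transcription[j - 1]))
            j -= 1
    ops.reverse()
    return ops
```

In this sketch, exact string equality decides whether two words match. A recognizer for historical handwriting will make errors of its own, so a practical system would need a more tolerant comparison than the one assumed here.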
Added 24 Dec 2011
Updated 24 Dec 2011
Type Journal
Year 2011
Where ICDAR
Authors Andreas Fischer, Emanuel Indermühle, Volkmar Frinken, Horst Bunke