Despite ubiquitous claims that optical character recognition (OCR) is a "solved problem," many categories of documents continue to break modern OCR software such as docu...
This paper presents a quantitative performance analysis of two approaches to the lemmatization of Czech text data. The first one is based on manually prepared diction...
This paper presents a complete system that historians and archivists can use to digitize whole collections of documents relating to personal information. The system integrates tools an...
A flexible method for storing XML documents in relational or object-relational databases is presented, based on adaptable fragmentation. Whereas most known approaches decom...