Image-based electronic editions enable researchers to view and study, in an electronic environment, historical manuscript images intricately linked to edition, transcript, glossary a...
Alex Dekhtyar, Ionut Emil Iacob, Jerzy W. Jaromczy...
This paper describes the National Research Council (NRC) Word Sense Disambiguation (WSD) system, as applied to the English Lexical Sample (ELS) task in Senseval-3. The NRC system ...
This paper presents a computationally efficient method for segmenting the text and graphics parts of document images based on textural cues. We assume that the graphic...
Regular expressions, or simply regex, have been widely used for decades as a powerful pattern-matching and text-extraction tool. Although they provide a powerful and f...
Author identification models fall into two major categories according to the way they handle the training texts: profile-based models produce one representation per author while in...