Existing Language Identification (LID) approaches do reach 100% precision, in most common situations, when dealing with documents written in just one language, and when those docu...
Abstract. Optical music recognition (OMR) enables librarians to digitise early music sources on a large scale. The cost of expert human labour to correct automatic recognition erro...
Laurent Pugin, John Ashley Burgoyne, Ichiro Fujina...
In this paper, we discuss tree kernels that can be applied for the classification of XML documents based on their DOM trees. DOM trees are ordered trees, in which every node might...
In this paper, we examine the performance of the two policies for keyword selection over standard document corpora of varying properties. While in corpus-based policy a single set ...
Counterfeiting of valuable documents is an increasingly serious problem. Banknotes, drivers licenses, passports, diplomas and stock certificates are all the subjects of increasing...