A method is presented for segmenting documents into conceptually related areas. Determining the equivalence of text is often based on the number of word repetitions. This approach...
In this paper, we present a new technique for estimating an arbitrary skew angle using a simple and efficient text row accumulation based on the statistics of the 1st and 2nd orde...
Reliable and generic methods for skew detection are a necessity for any large-scale digitization project. As one of the first processing steps, skew detection and correction has...
Iuliu Vasile Konya, Stefan Eickeler, Christoph Sei...
For more than a decade, research on OLAP and multidimensional databases has generated methodologies, tools and resource management systems for the analysis of numeric data. With...
Mapping documents into an interlingual representation can help bridge the language barrier of a cross-lingual corpus. Previous approaches use aligned documents as training data to...