Text extraction is an important phase in document recognition systems. To segment text from a document page, it is necessary to detect all the possible manuscript text reg...
Rodolfo P. dos Santos, Gabriela S. Clemente, Ing R...
This paper proposes the use of a new symmetry property based on proximity of the median moments in the wavelet domain. The method divides a given frame into 16 equally sized blocks...
Parallel corpora are indispensable resources for a variety of multilingual natural language processing tasks. This paper presents a technique for fully automatic construction of c...
This paper discusses the use of computational linguistic technology to extract definitions from a large corpus of German court decisions. We present a corpus-based survey of defin...
A new wrapper model for extracting text data from HTML documents is introduced. Kushmerick's wrapper class (Kushmerick 2000) may be unsuccessful in the case that suff...