Sciweavers

14 search results - page 3 / 3
» Feature-Based Method for Document Alignment in Comparable Ne...
LREC
2008
13 years 6 months ago
Experiments on Processing Overlapping Parallel Corpora
The number and sizes of parallel corpora keep growing, which makes it necessary to have automatic methods of processing them: combining, checking and improving corpora quality, et...
Mark Fishel, Heiki Jaan Kaalep
ACL
2011
12 years 8 months ago
An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment
We propose a language-independent method for the automatic extraction of transliteration pairs from parallel corpora. In contrast to previous work, our method uses no form of supe...
Hassan Sajjad, Alexander Fraser, Helmut Schmid
CLEF
2011
Springer
12 years 4 months ago
A Language-Independent Approach to Identify the Named Entities in Under-Resourced Languages and Clustering Multilingual Documents
This paper presents a language-independent Multilingual Document Clustering (MDC) approach on comparable corpora. Named entities (NEs) such as persons, locations, organiza...
N. Kiran Kumar, G. S. K. Santosh, Vasudeva Varma
ICDAR
2009
IEEE
13 years 11 months ago
Automated Ground Truth Data Generation for Newspaper Document Images
In document image understanding, public datasets with ground-truth are an important part of scientific work. They are not only helpful for developing new methods, but also provid...
Thomas Strecker, Joost van Beusekom, Sahin Albayra...