The lack of parallel corpora and linguistic resources for many languages and domains is one of the major obstacles for the further advancement of automated translation. A possible...
Marcis Pinnis, Radu Ion, Dan Stefanescu, Fangzhong...
In this paper, we describe a system by which the multilingual characteristics of Wikipedia can be utilized to annotate a large corpus of text with Named Entity Recognition (NER) t...
We present novel kernels based on structured and unstructured features for reranking the N-best hypotheses of conditional random fields (CRFs) applied to entity extraction. The fo...
Truc-Vien T. Nguyen, Alessandro Moschitti, Giusepp...
A prerequisite for all higher level information extraction tasks is the identication of unknown names in text. Today, when large corpora can consist of billions of words, it is of...
This paper discusses the problem of utilising multiply annotated data in training biomedical information extraction systems. Two corpora, annotated with entities and relations, an...