A number of recent articles in computational linguistics venues called for a closer examination of the type of noise present in annotated datasets used for benchmarking (Reidsma a...
In this paper, we investigate the adequate unit of analysis for Arabic Mention Detection. We experiment with different segmentation schemes using various feature sets. Results show that ...
Most work on language acquisition treats word segmentation--the identification of linguistic segments from continuous speech--and word learning--the mapping of those segments to me...
Although Wikipedia has emerged as a powerful collaborative encyclopedia on the Web, it is only partially multilingual, as most of the content is in English and a small number of ot...
This paper analyzes the topic identification stage of single-document automatic text summarization across four different domains, consisting of newswire, literary, scientific and ...
Hakan Ceylan, Rada Mihalcea, Umut Özertem, Elena ...