We present some novel machine learning techniques for the identification of subcategorization information for verbs in Czech. We compare three different statistical techniques app...
away concepts from the surface form of the text. The authors argue that while there has been research into automatic classification, general classification schemes are unsuitable f...
We present a general methodology for extracting multi-word expressions (of various types), along with their translations, from small parallel corpora. We automatically align the p...
In this paper we address the problem of discovering word semantic similarities via statistical processing of text corpora. We propose a knowledge-poor method that exploits the sen...
Aristomenis Thanopoulos, Nikos Fakotakis, George K...
This paper describes a new method for extracting open compounds (uninterrupted sequences of words) from text corpora of languages such as Thai, Japanese and Korean that exhibit un...