Fusional languages have rich inflection. As a consequence, tagsets capturing their morphological features are necessarily large. A natural way to make a tagset manageable is to us...
This paper describes a new flexible representation for the annotation of complex structures of metadata over heterogeneous data collections containing text and other types of medi...
We investigate the impact of input data scale on corpus-based learning using a study in the style of Zipf's law. In our research, Chinese word segmentation is chosen as the study ca...
This paper presents FOLKER, an annotation tool developed for the efficient transcription of natural, multi-party interaction in a conversation analysis framework. FOLKER is being ...
Text clustering is potentially very useful for exploration of text sets that are too large to study manually. The success of such a tool depends on whether the results can be expl...