Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation a...
We present the problem of categorizing web services according to a shallow ontology for presentation on a specialist portal, using their WSDL and associated textual documents foun...
Abstract--Statistical approaches to document content modeling typically focus either on broad topics or on discourselevel subtopics of a text. We present an analysis of the perform...
Leonhard Hennig, Thomas Strecker, Sascha Narr, Ern...
Naturally-occurring instances of linguistic phenomena are important both for training and for evaluating automatic text processing. When available in large quantities, they also p...
Often, the training procedure for statistical machine translation models is based on maximum likelihood or related criteria. A general problem of this approach is that there is on...