The latent semantic indexing (LSI) methodology for information retrieval applies the singular value decomposition to identify an eigensystem for a large matrix, in which cells re...
Compilation of a 100-million-word balanced corpus called the Balanced Corpus of Contemporary Written Japanese (BCCWJ) is underway at the National Institute for Japanese Langua...
Background: We study the adaptation of Link Grammar Parser to the biomedical sublanguage with a focus on domain terms not found in a general parser lexicon. Using two biomedical c...
Sampo Pyysalo, Tapio Salakoski, Sophie Aubin, Adel...
We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set ...
Word form normalization through lemmatization or stemming is a standard procedure in information retrieval because morphological variation needs to be accounted for and several la...