In this paper, we present a new co-training strategy that makes use of unlabelled data. It trains two predictors in parallel, with each predictor labelling the unlabelled data for...
Different from familiar clustering objects, text documents have sparse data spaces. A common way of representing a document is as a bag of its component words, but the semantic re...
Graph-based methods have gained attention in many areas of Natural Language Processing (NLP) including Word Sense Disambiguation (WSD), text summarization, keyword extraction and ...
We built a system for the automatic creation of a textbased topic hierarchy, meant to be used in a geographically defined community. This poses two main problems. First, the appea...
Text representation is a central task for any approach to automatic learning from texts. It requires a format which allows to interrelate texts even if they do not share content w...