Text classification poses some specific challenges. One such challenge is high dimensionality: the feature space is very large, yet each document (data point) contains only a small subset of the features. In this pap...
Extracting information automatically from texts for database representation requires that phrases first be grouped well, so that entities can be separated adequately. This pr...
Polysemy is one of the most difficult problems when dealing with natural language resources. Consequently, automated ontology learning from textual sources (such as web resources) ...
Leximancer is a software system for performing conceptual analysis of text data in a largely language-independent manner. The system is modelled on Content Analysis and provides u...
This paper presents a cluster-based text categorization system which uses class distributional clustering of words. We propose a new clustering model which considers the global in...