Abstract. A major problem encountered by text clustering practitioners is the difficulty of determining a priori which is the optimal text representation and clustering technique f...
Mining for outliers in sequential databases is crucial to support appropriate analysis of data. Therefore, many approaches for the discovery of such anomalies have been proposed. ...
The work described here builds on [1], where we presented a categorisation of norms or provisions in legislation. We claimed that the categories are characterized by the use of ty...
A wide spectrum of multilingual applications have aligned parallel corpora as their prerequisite. The aim of the project described in this paper is to build a multilingual corpus w...
This paper presents a new corpus project, aiming at building a national corpus of Polish. What makes it different from a typical YACP (Yet Another Corpus Project) is 1) the fact t...