Abstract. This study examines the contribution of sentence length to a quantitative text typology. To this end, 333 Slovenian texts are analyzed with regard to their sentence leng...
Emmerich Kelih, Peter Grzybek, Gordana Antic, Erns...
Abstract. We present a hybrid machine learning approach for information extraction from unstructured documents by integrating a learned classifier based on the Maximum Entropy Mod...
We describe how a feature-based semantic lexicon can be automatically extended using large, unstructured text corpora. Experiments are carried out using the lexicon HaGenLex and th...
One of the key challenges in large information systems such as online shops and digital libraries is to discover the relevant knowledge from the enormous volume of information. Rec...
Mining association rules is an important technique for discovering meaningful patterns in transaction databases. In the current literature, the properties of algorithms to mine ass...
In classification, with an increasing number of variables, the required number of observations grows drastically. In this paper we present an approach to put into effect the maxi...
Abstract. We consider the problem of finding communities in large linked networks such as web structures or citation networks. We review similarity measures for linked objects and...
Abstract. Existing methods for text plagiarism analysis are mainly based on “chunking”, a process of grouping a text into meaningful units, each of which gets encoded by an integer nu...