This paper presents the results of classifying Arabic text documents using the N-gram frequency statistics technique employing a dissimilarity measure called the "Manhattan di...
This paper is a comparative study of feature selection methods in statistical learning of text categorization. The focus is on aggressive dimensionality reduction. Five methods we...
Abstract- In order to prevent detection and evade signature-based scanning methods, which are normally exploited by antivirus softwares, metamorphic viruses use several various obf...
This work addresses the problem of classifying the genre of text, which is useful for a variety of language processing problems. We propose statistics of POS histograms as classiï...
Sergey Feldman, Marius A. Marin, Mari Ostendorf, M...
Abstract. Previous researches on advanced representations for document retrieval have shown that statistical state-of-the-art models are not improved by a variety of different ling...