In this paper we propose and test the use of hierarchical clustering for feature selection. The clustering method is Ward's with a distance measure based on GoodmanKruskal ta...
In machine learning problems with tens of thousands of features and only dozens or hundreds of independent training examples, dimensionality reduction is essential for good learni...
Consider the task of exploring the Web in order to find pages of a particular kind or on a particular topic. This task arises in the construction of search engines and Web knowled...
Semi-naive Bayesian classifiers seek to retain the numerous strengths of naive Bayes while reducing error by weakening the attribute independence assumption. Backwards Sequential ...
In this paper, we present the performance of machine learning-based methods for detection of phishing sites. We employ 9 machine learning techniques including AdaBoost, Bagging, S...