Sciweavers

JMLR
2010

Learning Dissimilarities for Categorical Symbols

12 years 11 months ago
Learning Dissimilarities for Categorical Symbols
In this paper we learn a dissimilarity measure for categorical data, for effective classification of the data points. Each categorical feature (with values taken from a finite set of symbols) is mapped onto a continuous feature whose values are real numbers. Guided by the classification error based on a nearest neighbor based technique, we repeatedly update the assignment of categorical symbols to real numbers to minimize this error. Intuitively, the algorithm pushes together points with the same class label, while enlarging the distances to points labeled differently. Our experiments show that 1) the learned dissimilarities improve classification accuracy by using the affinities of categorical symbols; 2) they outperform dissimilarities produced by previous data-driven methods; 3) our enhanced nearest neighbor classifier (called LD) based on the new space is competitive compared with classifiers such as decision trees, RBF neural networks, Na
Jierui Xie, Boleslaw K. Szymanski, Mohammed J. Zak
Added 19 May 2011
Updated 19 May 2011
Type Journal
Year 2010
Where JMLR
Authors Jierui Xie, Boleslaw K. Szymanski, Mohammed J. Zaki
Comments (0)