Non-parametric Jensen-Shannon Divergence

9 years 8 months ago

Download eda.mmci.uni-saarland.de

Quantifying the difference between two distributions is a common problem in many machine learning and data mining tasks. Equally common is that we only have empirical data. That is, we know neither the true distributions nor their form, and hence, before we can measure their divergence, we first need to assume a distribution or perform estimation. For exploratory purposes this is unsatisfactory, as we want to explore the data, not our expectations. In this paper we study how to non-parametrically measure the divergence between two distributions. In particular, we formalise the well-known Jensen-Shannon divergence using cumulative distribution functions. This allows us to calculate divergences directly and efficiently from data without the need for estimation. Moreover, empirical evaluation shows that our method performs very well in detecting differences between distributions, outperforming the state of the art in both statistical power and efficiency for a wide...
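To make the idea of a CDF-based divergence concrete, here is a minimal Python sketch. It is not taken from the paper: the function names `ecdf_on_grid` and `cumulative_divergence` are hypothetical, and the formula is an assumed cumulative analogue of one Jensen-Shannon term, the integral of P log(P/M) + M - P with M = (P + Q)/2, using natural logarithms and empirical CDFs P and Q. The paper's exact definition may differ. Because empirical CDFs are step functions, the integral reduces to a finite sum over the pooled sample points, so no density estimation or binning is involved.

```python
import numpy as np


def ecdf_on_grid(sample, grid):
    """Empirical CDF of `sample`, evaluated at every point of `grid`."""
    sample = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(sample, grid, side="right") / sample.size


def cumulative_divergence(x, y):
    """Assumed CDF-based analogue of one Jensen-Shannon term (illustrative only).

    Computes the integral of P*log(P/M) + M - P with M = (P + Q)/2, where P and Q
    are the empirical CDFs of x and y. Both CDFs are constant between consecutive
    pooled sample points, so the integral is an exact finite sum.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.unique(np.concatenate([x, y]))  # sorted pooled sample points
    widths = np.diff(grid)                    # lengths of the constant pieces
    P = ecdf_on_grid(x, grid)[:-1]            # CDF value on each piece
    Q = ecdf_on_grid(y, grid)[:-1]
    M = 0.5 * (P + Q)
    # Use the convention 0 * log(0) = 0; silence the warnings from masked entries.
    with np.errstate(divide="ignore", invalid="ignore"):
        log_term = np.where(P > 0, P * np.log(P / M), 0.0)
    return float(np.sum(widths * (log_term + M - P)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=500)
    b = rng.normal(0.0, 1.0, size=500)  # same distribution as a
    c = rng.normal(2.0, 1.0, size=500)  # shifted distribution
    print("same distribution:   ", cumulative_divergence(a, b))
    print("shifted distribution:", cumulative_divergence(a, c))
```

The integrand is non-negative pointwise and vanishes only where P equals M, so the score is zero for identical samples and grows as the two empirical CDFs separate. A symmetric score can be obtained by adding `cumulative_divergence(y, x)`.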

Hoang Vu Nguyen, Jilles Vreeken
