The output of a data mining algorithm is only as good as its inputs, and individuals are often unwilling to provide accurate data about sensitive topics such as medical history an...
Background: Large biological data sets, such as expression profiles, benefit from reduction of random noise. Principal component (PC) analysis has been used for this purpose, but ...
With the ubiquity of information networks and their broad applications, the issue of similarity computation between entities of an information network arises and draws extensive r...
In statistics, mixture models consisting of several component subpopulations are used widely to model data drawn from heterogeneous sources. In this paper, we consider maximum lik...
As the scope of data management shifts toward heterogeneous and distributed systems and applications, integration processes are gaining importance. T...