For categorical data, there is no similarity measure as straightforward and general as the numerical distance between numerical items. Because of this, it is ofte...
This paper finds and analyzes a formula for the total variation distance between iterations of riffle shuffles and iterations of "cut and then riffle shuffle". This allo...
Health information systems (HIS) in their current form are rarely sustainable. To sustain our health information systems, and with them our health systems, we need to focus ...
Sebastian Garde, Carola M. Hullin, Rong Chen, Thil...
Metric access methods (MAMs) serve as a tool for speeding up similarity queries. However, all MAMs developed so far are index-based; they need to build an index on a given database. T...
Record linkage, the problem of determining when two records refer to the same entity, has applications for both data cleaning (deduplication) and for integrating data from multipl...