Clustering algorithms typically operate on a feature vector representation of the data and find clusters that are compact with respect to an assumed (dis)similarity measure betwee...
Classification is a well-established operation in text mining. Given a set of labels A and a set DA of training documents tagged with these labels, a classifier learns to assign l...
Deduplication is a key operation in integrating data from multiple sources. The main challenge in this task is designing a function that can resolve when a pair of records refer t...
Computer-mediated communication systems can be used to bridge the gap between doctors in underserved regions with local shortages of medical expertise and medical specialists worl...
A novel measure for automatically quantifying the amount of interpersonal influence present in face-toface conversations is proposed based on the visualattention patterns of the p...