Deduplication is a key operation in integrating data from multiple sources. The main challenge in this task is designing a function that can resolve when a pair of records refer t...
Outlier detection has many important applications in sensor networks, e.g., abnormal event detection and animal behavior change. It is a difficult problem since global informati...
Clustering high-dimensional data with sparse features is challenging because pairwise distances between data items are not informative in high-dimensional space. To addre...
One common approach to active learning is to iteratively train a single classifier by choosing data points based on its uncertainty, but it is nontrivial to design uncertainty ...
In a data-mining approach, a model for estimation of Aerosol Optical Depth (AOD) from satellite observations is learned using collocated satellite and ground-based observations....