We give the first optimal algorithm for estimating the number of distinct elements in a data stream, closing a long line of theoretical research on this problem begun by Flajolet...
Addressed in this paper is the issue of 'email data cleaning' for text mining. Many text mining applications need to take emails as input. Email data is usually noisy and thus i...
Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can ser...
It is a challenging and important task to retrieve images from a large and highly varied image data set based on their visual contents. Problems like how to fill the semantic gap b...
Under the umbrella of buzzwords such as “Business Activity Monitoring” (BAM) and “Business Process Intelligence” (BPI) both academic (e.g., EMiT, Little Thumb, InWoLvE, Pro...
Boudewijn F. van Dongen, Ana Karla A. de Medeiros,...