Sequential pattern mining has been studied extensively in data mining community. Most previous studies require the specification of a minimum support threshold to perform the min...
Using visualization techniques to explore and understand high-dimensional data is an efficient way to combine human intelligence with the immense brute force computation power ava...
Record linkage is the problem of identifying similar records across different data sources. The similarity between two records is defined based on domain-specific similarity functi...
Abstract The application of data mining algorithms needs a goal-oriented preprocessing of the data. In practical applications the preprocessing task is very time consuming and has ...
The problem of similarity search (query-by-content) has attracted much research interest. It is a difficult problem because of the inherently high dimensionality of the data. The ...