Researchers in the data mining area frequently have to spend significant portion of their time on preprocessing the data in order to apply their algorithms to real-world datasets...
Zhaoqi Chen, Dmitri V. Kalashnikov, Sharad Mehrotr...
Duplicate detection is the process of identifying multiple representations of a same real-world object in a data source. Duplicate detection is a problem of critical importance in...
Melanie Weis, Felix Naumann, Ulrich Jehle, Jens Lu...
: Ideas from terminology management, the science of terms and definitions, can be used to improve the quality of software and data models, as well as to facilitate the achievement ...
Correlated or discriminative pattern mining is concerned with finding the highest scoring patterns w.r.t. a correlation measure (such as information gain). By reinterpreting corre...
In order to formulate a meaningful XML query, a user must have some knowledge of the schema of the XML documents to be queried. The query will succeed only if the schema of the ac...
Cindy X. Chen, George A. Mihaila, Sriram Padmanabh...