We present an unusual algorithm involving classification trees-CARTwheels--where two trees are grown in opposite directions so that they are joined at their leaves. This approach ...
The problem of identifying approximately duplicate records in databases is an essential step for data cleaning and data integration processes. Most existing approaches have relied...
Government regulations are semi-structured text documents that are often voluminous, heavily cross-referenced between provisions and even ambiguous. Multiple sources of regulation...
Understanding the differences between contrasting groups is a fundamental task in data analysis. This realization has led to the development of a new special purpose data mining t...
Geoffrey I. Webb, Shane M. Butler, Douglas A. Newl...
Large 0-1 datasets arise in various applications, such as market basket analysis and information retrieval. We concentrate on the study of topic models, aiming at results which in...