We apply a well-known Bayesian probabilistic model to textual information retrieval: the classification of documents based on their relevance to a query. This model was previously...
We have been working on two different KDD systems for scientific data. One system involves comparative genomics, where the database contains more than 60,000 plant gene and protei...
In data mining, similarity or distance between attributes is one of the central notions. Such a notion can be used to build attribute hierarchies, among other things. Similarity metrics can be us...
We consider the problem of finding rules relating patterns in a time series to other patterns in that series, or patterns in one series to patterns in another series. A simple examp...
Gautam Das, King-Ip Lin, Heikki Mannila, Gopal Ren...
We describe an industrial-strength data mining application in telecommunications. The application requires building a short (7-byte) profile for all telephone numbers seen on a large t...
WHIRL is an extension of relational databases that can perform "soft joins" based on the similarity of textual identifiers; these soft joins extend the traditional operation of j...
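To make the idea of a similarity-based "soft join" concrete, here is a minimal sketch. It is not WHIRL's implementation. It assumes TF-IDF weighted cosine similarity over whitespace-separated tokens as the similarity measure, and it keeps pairs above a fixed threshold instead of ranking the top answers. The function names (`tfidf_vectors`, `soft_join`) and the `threshold` parameter are hypothetical, chosen for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real systems may stem or drop stopwords.
    return text.lower().split()


def tfidf_vectors(docs):
    # Build one unit-length TF-IDF vector (dict term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per document
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: (1 + math.log(c)) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    # Vectors are already unit length, so the dot product is the cosine similarity.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def soft_join(left, right, threshold=0.1):
    # Hypothetical soft join: pair every left/right identifier whose similarity
    # reaches the threshold, sorted from most to least similar.
    vecs = tfidf_vectors(left + right)  # shared IDF statistics over both columns
    lv, rv = vecs[:len(left)], vecs[len(left):]
    pairs = []
    for i, a in enumerate(lv):
        for j, b in enumerate(rv):
            score = cosine(a, b)
            if score >= threshold:
                pairs.append((left[i], right[j], score))
    return sorted(pairs, key=lambda p: -p[2])


pairs = soft_join(["Bell Labs Research", "Microsoft Research Redmond"],
                  ["Bell Laboratories Labs", "Microsoft Corporation"])
for l, r, s in pairs:
    print(f"{l!r} ~ {r!r}: {s:.3f}")
```

Unlike an equijoin, the two "Bell" names match even though the strings differ, and rare shared terms contribute more to the score than common ones. The nested loop is quadratic and serves only to illustrate the idea; an indexed system would avoid scoring every pair.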
Practical clustering algorithms require multiple data scans to achieve convergence. For large databases, these scans become prohibitively expensive. We present a scalable clusteri...
Direct marketing response models seek to identify individuals most likely to respond to marketing solicitations. Such models are commonly evaluated on classification accuracy and so...