KDD
2005
ACM
Summarizing itemset patterns: a profile-based approach
Frequent-pattern mining has been studied extensively on scalable methods for mining various kinds of patterns including itemsets, sequences, and graphs. However, the bottleneck of...
Xifeng Yan, Hong Cheng, Jiawei Han, Dong Xin
KDD
2005
ACM
CLICKS: an effective algorithm for mining subspace clusters in categorical datasets
We present a novel algorithm called Clicks that finds clusters in categorical datasets based on a search for k-partite maximal cliques. Unlike previous methods, Clicks mines subs...
Mohammed Javeed Zaki, Markus Peters, Ira Assent, T...
KDD
2005
ACM
Formulating distance functions via the kernel trick
Tasks of data mining and information retrieval depend on a good distance function for measuring similarity between data instances. The most effective distance function must be for...
Gang Wu, Edward Y. Chang, Navneet Panda
KDD
2005
ACM
Pattern-based similarity search for microarray data
One fundamental task in near-neighbor search as well as other similarity matching efforts is to find a distance function that can efficiently quantify the similarity between two o...
Haixun Wang, Jian Pei, Philip S. Yu
KDD
2005
ACM
Web object indexing using domain knowledge
A Web object is defined as any meaningful object embedded in web pages (e.g., images, music) or pointed to by hyperlinks (e.g., downloadable files). Users usually search for...
Muyuan Wang, Zhiwei Li, Lie Lu, Wei-Ying Ma, Naiya...
KDD
2005
ACM
Finding partial orders from unordered 0-1 data
Antti Ukkonen, Mikael Fortelius, Heikki Mannila
KDD
2005
ACM
Regression error characteristic surfaces
This paper presents a generalization of Regression Error Characteristic (REC) curves. REC curves describe the cumulative distribution function of the prediction error of models an...
Luís Torgo
KDD
2005
ACM
Mining comparable bilingual text corpora for cross-language information integration
Integrating information in multiple natural languages is a challenging task that often requires manually created linguistic resources such as a bilingual dictionary or examples of...
Tao Tao, ChengXiang Zhai
KDD
2005
ACM
Email data cleaning
Addressed in this paper is the issue of 'email data cleaning' for text mining. Many text mining applications need to take emails as input. Email data is usually noisy and thus i...
Jie Tang, Hang Li, Yunbo Cao, ZhaoHui Tang
KDD
2005
ACM
A hybrid unsupervised approach for document clustering
We propose a hybrid, unsupervised document clustering approach that combines a hierarchical clustering algorithm with Expectation Maximization. We developed several heuristics to ...
Mihai Surdeanu, Jordi Turmo, Alicia Ageno