Sciweavers

Representations for category disambiguation
KDD
2007
ACM
Data Mining
Detecting anomalous records in categorical datasets
We consider the problem of detecting anomalies in high arity categorical datasets. In most applications, anomalies are defined as data points that are 'abnormal'. Quite ...
Kaustav Das, Jeff G. Schneider
KDD
2007
ACM
Data Mining
Generalized component analysis for text with heterogeneous attributes
We present a class of richly structured, undirected hidden variable models suitable for simultaneously modeling text along with other attributes encoded in different modalities. O...
Xuerui Wang, Chris Pal, Andrew McCallum
KDD
2007
ACM
Data Mining
Exploiting underrepresented query aspects for automatic query expansion
Users attempt to express their search goals through web search queries. When a search goal has multiple components or aspects, documents that represent all the aspects are likely ...
Daniel Crabtree, Peter Andreae, Xiaoying Gao
KDD
2007
ACM
Data Mining
BoostCluster: boosting clustering by pairwise constraints
Data clustering is an important task in many disciplines. A large number of studies have attempted to improve clustering by using the side information that is often encoded as pai...
Yi Liu, Rong Jin, Anil K. Jain
KDD
2006
ACM
Data Mining
GPLAG: detection of software plagiarism by program dependence graph analysis
Along with the blossom of open source projects comes the convenience for software plagiarism. A company, if less self-disciplined, may be tempted to plagiarize some open source pr...
Chao Liu 0001, Chen Chen, Jiawei Han, Philip S. Yu