Sciweavers

WG
2010
Springer
13 years 2 months ago
Generalized Graph Clustering: Recognizing (p, q)-Cluster Graphs
Cluster Editing is a classical graph theoretic approach to tackle the problem of data set clustering: it consists of modifying a similarity graph into a disjoint union of cliques,...
Pinar Heggernes, Daniel Lokshtanov, Jesper Nederlo...
PVLDB
2010
13 years 2 months ago
Data Auditor: Exploring Data Quality and Semantics using Pattern Tableaux
We present Data Auditor, a tool for exploring data quality and data semantics. Given a rule or an integrity constraint and a target relation, Data Auditor computes pattern tableau...
Lukasz Golab, Howard J. Karloff, Flip Korn, Divesh...
GEOINFORMATICA
2002
13 years 4 months ago
On the Generation of Time-Evolving Regional Data
Benchmarking of spatio-temporal databases is an issue of growing importance. In case large real data sets are not available, benchmarking requires the generation of arti...
Theodoros Tzouramanis, Michael Vassilakopoulos, Ya...
JIIS
2006
13 years 4 months ago
Holes in joins
A join of two relations in real databases is usually much smaller than their Cartesian product. This means that most of the combinations of tuples in the cross product of the respe...
Jarek Gryz, Dongming Liang
CIKM
2000
Springer
13 years 8 months ago
Vector Approximation based Indexing for Non-uniform High Dimensional Data Sets
With the proliferation of multimedia data, there is increasing need to support the indexing and searching of high dimensional data. Recently, a vector approximation based techniqu...
Hakan Ferhatosmanoglu, Ertem Tuncel, Divyakant Agr...
ICDM
2003
IEEE
13 years 9 months ago
Analyzing High-Dimensional Data by Subspace Validity
We are proposing a novel method that makes it possible to analyze high dimensional data with arbitrary shaped projected clusters and high noise levels. At the core of our method l...
Amihood Amir, Reuven Kashi, Nathan S. Netanyahu, D...
SIGMOD
2009
ACM
14 years 4 months ago
Ranking distributed probabilistic data
Ranking queries are essential tools to process large amounts of probabilistic data that encode exponentially many possible deterministic instances. In many applications where unce...
Feifei Li, Ke Yi, Jeffrey Jestes
VLDB
2009
ACM
14 years 4 months ago
Anytime measures for top-k algorithms on exact and fuzzy data sets
Top-k queries on large multi-attribute data sets are fundamental operations in information retrieval and ranking applications. In this article, we initiate research on the anytime ...
Benjamin Arai, Gautam Das, Dimitrios Gunopulos, Ni...
KDD
2003
ACM
14 years 4 months ago
Classifying large data sets using SVMs with hierarchical clusters
Support vector machines (SVMs) have been promising methods for classification and regression analysis because of their solid mathematical foundations which convey several salient ...
Hwanjo Yu, Jiong Yang, Jiawei Han
KDD
2003
ACM
14 years 4 months ago
Fragments of order
High-dimensional collections of 0-1 data occur in many applications. The attributes in such data sets are typically considered to be unordered. However, in many cases there is a n...
Aristides Gionis, Teija Kujala, Heikki Mannila