Sciweavers

HCI
2007
FPF-SB : A Scalable Algorithm for Microarray Gene Expression Data Clustering
Efficient and effective analysis of large datasets from microarray gene expression data is one of the keys to time-critical personalized medicine. The issue we address here is the ...
Filippo Geraci, Mauro Leoncini, Manuela Montangero...
EMNLP
2008
Scalable Language Processing Algorithms for the Masses: A Case Study in Computing Word Co-occurrence Matrices with MapReduce
This paper explores the challenge of scaling up language processing algorithms to increasingly large datasets. While cluster computing has been available in commercial environment...
Jimmy J. Lin
ADMA
2005
Springer
Finding All Frequent Patterns Starting from the Closure
Efficient discovery of frequent patterns from large databases is an active research area in data mining with broad applications in industry and deep implications in many areas of d...
Mohammad El-Hajj, Osmar R. Zaïane
DBVIS
1995
LadMan: A Large Data Management System
More and more of our customers have to deal with very large datasets like elevation data and digital roadmaps covering Europe or even the entire world, very large images e.g. from...
Walter Schmeing
EDBT
2000
ACM
Mining Classification Rules from Datasets with Large Number of Many-Valued Attributes
Decision tree induction algorithms scale well to large datasets thanks to their univariate, divide-and-conquer approach. However, they may fail in discovering effective knowledge when...
Giovanni Giuffrida, Wesley W. Chu, Dominique M. Ha...
CLADE
2004
IEEE
Grid Service for Visualization and Analysis of Remote Fusion Data
Simulations and experiments in the fusion and plasma physics community generate large datasets at remote sites. Visualization and analysis of these datasets are difficult because ...
Svetlana G. Shasharina, Nanbor Wang, John R. Cary
SIGMOD
1996
ACM
BIRCH: An Efficient Data Clustering Method for Very Large Databases
Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters...
Tian Zhang, Raghu Ramakrishnan, Miron Livny
KDD
1998
ACM
Large Datasets Lead to Overly Complex Models: An Explanation and a Solution
This paper explores unexpected results that lie at the intersection of two common themes in the KDD community: large datasets and the goal of building compact models. Experiments ...
Tim Oates, David Jensen
ACSC
2002
IEEE
Using Finite State Automata for Sequence Mining
We show how frequently occurring sequential patterns may be found from large datasets by first inducing a finite state automaton model describing the data, and then querying the m...
Philip Hingston
CLOUD
2010
ACM
Towards automatic optimization of MapReduce programs
Timely and cost-effective processing of large datasets has become a critical ingredient for the success of many academic, government, and industrial organizations. The combination...
Shivnath Babu