Sciweavers

ICMLA
2004
13 years 6 months ago
RAIN: data clustering using randomized interactions between data points
Abstract-- This paper introduces a generalization of the Gravitational Clustering Algorithm proposed by Gomez et al. in [1]. First, it is extended in such a way that not only the G...
Jonatan Gómez, Olfa Nasraoui, Elizabeth Leo...
ICAI
2004
13 years 6 months ago
Inductive System Health Monitoring
The Inductive Monitoring System (IMS) software was developed to provide a technique to automatically produce health monitoring knowledge bases for systems that are either difficu...
David L. Iverson
ICAD
2004
13 years 6 months ago
A Toolkit for Interactive Sonification
This paper describes work-in-progress on an Interactive Sonification Toolkit which has been developed in order to aid the analysis of general data sets. The toolkit allows the des...
Sandra Pauletto, Andy Hunt
ESANN
2006
13 years 6 months ago
Visualizing gene interaction graphs with local multidimensional scaling
Several bioinformatics data sets are naturally represented as graphs, for instance gene regulation, metabolic pathways, and protein-protein interactions. The graphs are often large ...
Jarkko Venna, Samuel Kaski
EMNLP
2006
13 years 6 months ago
Random Indexing using Statistical Weight Functions
Random Indexing is a vector space technique that provides an efficient and scalable approximation to distributional similarity problems. We present experiments showing Random Inde...
James Gorman, James R. Curran
DMIN
2006
13 years 6 months ago
Parallel Hybrid Clustering using Genetic Programming and Multi-Objective Fitness with Density (PYRAMID)
Clustering is the process of locating patterns in large data sets. It is an active research area that provides value to scientific as well as business applications. Practical clust...
Junping Sun, William Sverdlik, Samir Tout
EMNLP
2004
13 years 6 months ago
Automatic Paragraph Identification: A Study across Languages and Domains
In this paper we investigate whether paragraphs can be identified automatically in different languages and domains. We propose a machine learning approach which exploits textual a...
Caroline Sporleder, Mirella Lapata
SDM
2007
SIAM
13 years 6 months ago
On Privacy-Preservation of Text and Sparse Binary Data with Sketches
In recent years, privacy preserving data mining has become very important because of the proliferation of large amounts of data on the internet. Many data sets are inherently high...
Charu C. Aggarwal, Philip S. Yu
SODA
2008
ACM
13 years 6 months ago
On distributing symmetric streaming computations
A common approach for dealing with large data sets is to stream over the input in one pass, and perform computations using sublinear resources. For truly massive data sets, howeve...
Jon Feldman, S. Muthukrishnan, Anastasios Sidiropo...
NAACL
2007
13 years 6 months ago
Entity Extraction is a Boring Solved Problem - Or is it?
This paper presents empirical results that contradict the prevailing opinion that entity extraction is a boring solved problem. In particular, we consider data sets that resemble ...
Marc Vilain, Jennifer Su, Suzi Lubar