9 years 8 months ago
Identifying Health-Related Topics on Twitter - An Exploration of Tobacco-Related Tweets as a Test Topic
Public health-related topics are difficult to identify in large conversational datasets like Twitter. This study examines how to model and discover public health topics and themes ...
Kyle W. Prier, Matthew S. Smith, Christophe G. Gir...
9 years 8 months ago
Social Content Matching in MapReduce
Matching problems are ubiquitous. They occur in economic markets, labor markets, internet advertising, and elsewhere. In this paper we focus on an application of matching for soci...
Gianmarco De Francisci Morales, Aristides Gionis, ...
9 years 10 months ago
Searching in one billion vectors: re-rank with source coding
Recent indexing techniques inspired by source coding have proven successful at indexing billions of high-dimensional vectors in memory. In this paper, we propose an approach that ...
Hervé Jégou and Romain Tavenard and Matthijs Dou...
9 years 11 months ago
Selecting Features for Ordinal Text Classification
We present four new feature selection methods for ordinal regression and test them against four different baselines on two large datasets of product reviews.
Stefano Baccianella, Andrea Esuli, Fabrizio Sebast...
10 years 27 days ago
Parallel graphics and visualization
Parallel volume rendering is one of the most efficient techniques to achieve real time visualization of large datasets by distributing the data and the rendering process over a c...
Luís Paulo Santos, Bruno Raffin, Alan Heiri...
10 years 1 month ago
A Web-based and Grid-enabled dChip version for the analysis of large sets of gene expression data
Background: Microarray techniques are one of the main methods used to investigate thousands of gene expression profiles for enlightening complex biological processes responsible f...
Luca Corradi, Marco Fato, Ivan Porro, Silvia Scagl...
10 years 1 month ago
R-Gada: a fast and flexible pipeline for copy number analysis in association studies
Background: Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs have successfully provided target genom...
Roger Pique-Regi, Alejandro Cáceres, Juan R...
10 years 1 month ago
Data reduction for spectral clustering to analyze high throughput flow cytometry data
Background: Recent biological discoveries have shown that clustering large datasets is essential for better understanding biology in many areas. Spectral clustering in particular ...
Habil Zare, Parisa Shooshtari, Arvind Gupta, Ryan ...
10 years 2 months ago
Efficient Kernel Machines Using the Improved Fast Gauss Transform
The computation and memory required for kernel machines with N training samples is at least O(N^2). Such a complexity is significant even for moderate size problems and is prohibi...
Changjiang Yang, Ramani Duraiswami, Larry S. Davis
10 years 2 months ago
Hierarchical Representatives Clustering with Hybrid Approach
Clustering is a discovering process of meaningful information by grouping similar data into compact clusters. Most of traditional clustering methods are in favor of small datasets and hav...
Byung-Joo An, Eunju Kim, Yillbyung Lee