Clustering is a common problem in the analysis of large data sets. Streaming algorithms, which make a single pass over the data set using small working memory and produce a cluster...
Extraction of entities from ad creatives is an important problem that can benefit many computational advertising tasks. Supervised and semi-supervised solutions rely on labeled da...
Abstract—Gene expression data usually contains a large number of genes, but a small number of samples. Feature selection for gene expression data aims at finding a set of genes ...
An ensemble is a group of learners that work together as a committee to solve a problem. However, the existing ensemble training algorithms sometimes generate unnecessary large en...
Online communities like Flickr, del.icio.us and YouTube have established themselves as very popular and powerful services for publishing and searching contents, but also for ident...
Tom Crecelius, Mouna Kacimi, Sebastian Michel, Tho...