We propose a principled account of multiclass spectral clustering. Given a discrete clustering formulation, we first solve a relaxed continuous optimization problem by eigendecomp...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
With the explosive growth of digital cameras and online media, it has become crucial to design efficient methods that help users browse and search large image collections. The rec...
Liangliang Cao, Andrey Del Pozo, Xin Jin, Jiebo Lu...
We present a probabilistic model for a document corpus that combines many of the desirable features of previous models. The model is called “GaP” for Gamma-Poisson, the distri...
Given a set of model graphs D and a query graph q, containment search aims to find all model graphs g ∈ D such that q contains g (q ⊇ g). Due to the wide adoption of graph models, f...
Chen Chen, Xifeng Yan, Philip S. Yu, Jiawei Han, D...