We consider the problem of segmenting a webpage into visually and semantically cohesive pieces. Our approach is based on formulating an appropriate optimization problem on weighte...
We address the problem of measuring global quality metrics of search engines, like corpus size, index freshness, and density of duplicates in the corpus. The recently proposed est...
In this paper we study the privacy preservation properties of a specific technique for query log anonymization: tokenbased hashing. In this approach, each query is tokenized, and ...
Ravi Kumar, Jasmine Novak, Bo Pang, Andrew Tomkins
In this paper, we define the problem of topic-sentiment analysis on Weblogs and propose a novel probabilistic model to capture the mixture of topics and sentiments simultaneously....
Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, Che...
As part of a large effort to acquire large repositories of facts from unstructured text on the Web, a seed-based framework for textual information extraction allows for weakly sup...