Quantization index modulation (QIM) techniques have been gaining popularity in the data hiding community because of their robustness and information-theoretic optimality against a...
In this work, we present a new string similarity feature, the sparse spatial sample (SSS). An SSS is a set of short substrings at specific spatial displacements contained in the or...
In this paper, we propose a novel unsupervised approach to query segmentation, an important task in Web search. We use a generative query model to recover a query's underlyin...
It is widely believed that some queries submitted to search engines are by nature ambiguous (e.g., java, apple). However, few studies have investigated the questions of "how ...
In this paper, we describe a system that can extract record structures from web pages with no direct human supervision. Records are commonly occurring HTML-embedded data tuples th...