Duplicate URLs cause serious problems throughout the entire pipeline of a search engine, from crawling and indexing to result serving. URL normalization transforms duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
With the emergence of large wired and wireless sensor networks, many real-time applications need to operate on continuous unbounded data streams. At the same time, many of these s...
Yuan Wei, Vibha Prasad, Sang Hyuk Son, John A. Sta...
In this paper we study the Web forum crawling problem, a fundamental step in many Web applications such as search engines and Web data mining. As a typical user-crea...
Rui Cai, Jiang-Ming Yang, Wei Lai, Yida Wang, Lei ...
Abstract. This text is an informal review of several randomized algorithms that have appeared over the past two decades and have proved instrumental in efficiently extracting quant...
We propose a unified global entropy reduction maximization (GERM) framework for active learning and semi-supervised learning for speech recognition. Active learning aims to select...
Dong Yu, Balakrishnan Varadarajan, Li Deng, Alex A...