Sciweavers

103 search results - page 20 / 21
» Online Maintenance of Very Large Random Samples
WWW
2010
ACM
14 years 1 month ago
A pattern tree-based approach to learning URL normalization rules
Duplicate URLs have brought serious troubles to the whole pipeline of a search engine, from crawling, indexing, to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
RTSS
2006
IEEE
14 years 8 days ago
Prediction-Based QoS Management for Real-Time Data Streams
With the emergence of large wired and wireless sensor networks, many real-time applications need to operate on continuous unbounded data streams. At the same time, many of these s...
Yuan Wei, Vibha Prasad, Sang Hyuk Son, John A. Sta...
WWW
2008
ACM
14 years 7 months ago
iRobot: an intelligent crawler for web forums
We study in this paper the Web forum crawling problem, which is a very fundamental step in many Web applications, such as search engine and Web data mining. As a typical user-crea...
Rui Cai, Jiang-Ming Yang, Wei Lai, Yida Wang, Lei ...
ASIAN
2004
Springer
13 years 11 months ago
Counting by Coin Tossings
Abstract. This text is an informal review of several randomized algorithms that have appeared over the past two decades and have proved instrumental in extracting efficiently quant...
Philippe Flajolet
CSL
2010
Springer
13 years 6 months ago
Active learning and semi-supervised learning for speech recognition: A unified framework using the global entropy reduction maxi
We propose a unified global entropy reduction maximization (GERM) framework for active learning and semi-supervised learning for speech recognition. Active learning aims to select...
Dong Yu, Balakrishnan Varadarajan, Li Deng, Alex A...