This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with ...
Active learning may hold the key for solving the data scarcity problem in supervised learning, i.e., the lack of labeled data. Indeed, labeling data is a costly process, yet an ac...
Several algorithms have been proposed to learn to rank entities modeled as feature vectors, based on relevance feedback. However, these algorithms do not model network connections...
This paper presents a demand-driven, flow-insensitive analysis algorithm for answering may-alias queries. We formulate the computation of alias queries as a CFL-reachability probl...
Named entity recognition aims at extracting named entities from unstructured text. A recent trend of named entity recognition is finding approximate matches in the text with respe...
Wei Wang 0011, Chuan Xiao, Xuemin Lin, Chengqi Zha...