As the World Wide Web in China grows rapidly, mining knowledge from Chinese Web pages becomes increasingly important. Mining Web information usually relies on the machine learning ...
This paper describes the WebCLEF 2007 task. The task definition—which goes beyond traditional navigational queries and is concerned with undirected information search goals—c...
In this paper we present static and dynamic studies of duplicate and near-duplicate documents on the Web. The static and dynamic studies involve the analysis of similar c...
Similarity search, namely finding approximate nearest neighbors, is at the core of many large-scale machine learning and vision applications. Recently, many research results dem...
Trust is a necessary concept for realizing the Semantic Web. But how can we build a “Web of Trust”? We first argue that a small “Web of Trust” for each community is very esse...