Web 2.0 technologies have enabled more and more people to freely comment on different kinds of entities (e.g., sellers, products, services). The large scale of information poses th...
RSS is an XML-based format for syndicating Web content, and users aggregate RSS feeds with RSS feed aggregators. There are RSS aggregation policies that help aggregate RSS fe...
Young Geun Han, Sang Ho Lee, Jae Hwi Kim, Yanggon ...
Hierarchical topic taxonomies have proliferated on the World Wide Web [5, 18], and exploiting the output space decompositions they induce in automated classification systems is an...
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near-duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...
Researchers of commercial search engines often collect data using the application programming interface (API) or by "scraping" results from the web user interface (WUI),...