When integrating data from multiple sources, a key task that online communities often face is to match the schemas of the data sources. Today, such matching often incurs a huge wor...
Most template detection methods process web pages in batches that a newly crawled page can not be processed until enough pages have been collected. This results in large storage c...
Yu Wang, Binxing Fang, Xueqi Cheng, Li Guo, Hongbo...
Ambiguous person names are a problem in many forms of written text, including that which is found on the Web. In this paper we explore the use of unsupervised clustering techniques...
Querying and integrating sources of structured data from the Web in most cases requires similarity-based concepts to deal with data level conflicts. This is due to the often errone...
The Web of Data is growing at an ever increasing rate, with RDF datasets being produced in the order of billions of triples. The effect of this increase has meant that many entitie...