We present an approach to the discovery of semantically similar terms that utilizes a web search engine as both a source for generating related terms and a tool for estimating the...
Probabilistic data is coming as a new deluge along with the technical advances on geographical tracking, multimedia processing, sensor network and RFID. While similarity search is...
Generic concept detection has been a widely studied topic in recent research on multimedia analysis and retrieval, but the issue of how to exploit the structure of a multimedia on...
We consider the problem of dust: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
Set similarity join has played an important role in many real-world applications such as data cleaning, near duplication detection, data integration, and so on. In these applicati...