The explosion of user-generated content on the Web has led to new opportunities and significant challenges for companies, that are increasingly concerned about monitoring the disc...
Many websites have large collections of pages generated dynamically from an underlying structured source like a database. The data of a category are typically encoded into similar...
Heterogeneous entities or objects are very common and are usually interrelated with each other in many scenarios. For example, typical Web search activities involve multiple types...
Edit distance based string similarity join is a fundamental operator in string databases. Increasingly, many applications in data cleaning, data integration, and scientific compu...
Internet users regularly have the need to find biographies and facts of people of interest. Wikipedia has become the first stop for celebrity biographies and facts. However, Wik...
Xiaojiang Liu, Zaiqing Nie, Nenghai Yu, Ji-Rong We...