Popular entities often have thousands of instances on the Web. In this paper, we focus on the case where they are presented in table-like format, namely appearing with their attri...
Conglei Yao, Yongjian Yu, Sicong Shou, Xiaoming Li
Focused Web browsing activities such as periodically looking up headline news, weather reports, etc., which require only selective fragments of particular Web pages, can be made m...
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. Previous research studies in this area mostl...
Principal component analysis (PCA) has been extensively applied in data mining, pattern recognition and information retrieval for unsupervised dimensionality reduction. When label...
Shipeng Yu, Kai Yu, Volker Tresp, Hans-Peter Krieg...