We consider boosting algorithms that maintain a distribution over a set of examples. At each iteration a weak hypothesis is received and the distribution is updated. We motivate t...
This paper considers the problem of identifying on the Web compound documents (cDocs): groups of web pages that in aggregate constitute semantically coherent information entities...
Popular entities often have thousands of instances on the Web. In this paper, we focus on the case where they are presented in a table-like format, namely appearing with their attri...
Conglei Yao, Yongjian Yu, Sicong Shou, Xiaoming Li
Given its importance, the problem of predicting rare classes in large-scale multi-labeled data sets has attracted great attention in the literature. However, the rare-class probl...
It is now widely recognized that user interactions with search results can provide substantial relevance information on the documents displayed in the search results. In this pape...
Shihao Ji, Ke Zhou, Ciya Liao, Zhaohui Zheng, Gui-...