We consider the problem of template-independent news extraction. The state-of-the-art news extraction method is based on template-level wrapper induction, which has two serious li...
Junfeng Wang, Xiaofei He, Can Wang, Jian Pei, Jiaj...
Work on evaluating and improving the relevance of web search engines typically use human relevance judgments or clickthrough data. Both these methods look at the problem of learni...
Hao Ma, Raman Chandrasekar, Chris Quirk, Abhishek ...
An optimal sensor layout is attained when a limited number of sensors are placed in an area such that the cost of the placement is minimised while the value of the obtained inform...
X. Rosalind Wang, George Mathews, Don Price, Mikha...
Document clustering has long been an important problem in information retrieval. In this paper, we present a new clustering algorithm ASI1, which uses explicitly modeling of the s...
Social media are becoming increasingly popular and have attracted considerable attention from spammers. Using a sample of more than ninety thousand known spam Web sites, we found ...