Detection of template and noise blocks in web pages is an important step in improving the performance of information retrieval and content extraction. Of the many approaches propos...
The Web has been rapidly "deepened" by myriad searchable databases online, where data are hidden behind query interfaces. As an essential task toward integrating these m...
Abstract. The web with its rapid expansion has become an excellent resource for gathering information and people’s opinion. A company owner wants to know who is the competitor, a...
Rui Li, Shenghua Bao, Jin Wang, Yuanjie Liu, Yong ...
—We introduce a novel set of social network analysis based algorithms for mining the Web, blogs, and online forums to identify trends and find the people launching these new tren...
Peter A. Gloor, Jonas Krauss, Stefan Nann, Kai Fis...
Text classification systems on biomedical literature aim to select relevant articles to a specific issue from large corpora. Most systems with an acceptable accuracy are based o...