We consider the problem of template-independent news extraction. The state-of-the-art news extraction method is based on template-level wrapper induction, which has two serious li...
Junfeng Wang, Xiaofei He, Can Wang, Jian Pei, Jiaj...
A data-driven approach can be fruitfully used in the specification and automatic generation of data-intensive Web applications, i.e., applications which make large amounts of data ...
This paper aims to quantify two common assumptions about social tagging: (1) that tags are “meaningful” and (2) that the tagging process is influenced by tag suggestions. For...
Fabian M. Suchanek, Milan Vojnovic, Dinan Gunaward...
We present GoGetIt!, a tool for generating structure-driven crawlers that requires a minimum effort from the users. The tool takes as input a sample page and an entry point to a W...
Altigran Soares da Silva, Edleno Silva de Moura, J...
It is observed that a better approach to Web information understanding is to base it on the document framework, which mainly consists of (i) the title and the URL name of the pag...