Rule-based information extraction from text is increasingly being used to populate databases and to support structured queries on unstructured text. Specification of suitable info...
Bin Liu 0002, Laura Chiticariu, Vivian Chu, H. V. ...
Influenced by the linking model which is implicit in HTML, today’s publishing model on the Web is content-centered, with the emphasis of publishing on content rather than links....
In this paper, we address the problem of query formulation in the context of multi-domain integration of heterogeneous data on the Web. We argue that effectively tackling this pro...
This work aims to provide a novel, site-specific web page segmentation and section importance detection algorithm, which leverages structural, content, and visual information. The...
The present web exists in HTML and XML formats for people to browse. Recently there has been a trend towards the semantic web, where the information can be proc...