Accelerated by the technological advances in the domain, the size of the biomedical literature has been growing rapidly. As a result, it is not feasible for individual researchers...
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
Given a document repository, a search engine is very helpful for retrieving information. Currently, vertical search is a hot topic, and Google Scholar [4] is an example of academic se...
Ye Wang, Zhihua Geng, Sheng Huang, Xiaoling Wang, ...
In this paper, we investigate an approach for creating a comprehensive textual overview of a subject composed of information drawn from the Internet. We use the high-level structu...
Slavic languages are characterized by their relatively high degree of word order freedom. In the process of automatic generation from an underlying representation of the content, ...