Abstract. Recently it was shown that existing general-purpose inductive logic programming systems are useful for learning wrappers (known as L-wrappers) to extract data from HTML d...
Background: In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput t...
Topic models have been studied extensively in the context of monolingual corpora. Though there are some attempts to mine topical structure from cross-lingual corpora, they require ...
The growing availability of on-line textual sources and the potential number of applications of knowledge acquisition from textual data has lead to an increase in Information Extr...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...