Government regulations are semi-structured text documents that are often voluminous, heavily cross-referenced between provisions, and sometimes ambiguous. Multiple sources of regulation...
In this paper we propose a measure of visual similarity for comparing different pages in a corpus. This measure is based on the analysis of the visual layout saliency of th...
Accelerated by the technological advances in the domain, the size of the biomedical literature has been growing rapidly. As a result, it is not feasible for individual researchers...
In this work we consider ontologies as knowledge structures that specify terms, their properties and relations among them to enable knowledge extraction from texts. We represent o...
In this paper we present the prototype-based text matching methodology used in the Routing Sub-Task of the TREC 2001 Filtering Track. The methodology examines texts on word and senten...
Ari Visa, Jarmo Toivonen, Tomi Vesanen, Jarno M&au...