We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Background: The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data min...
The domain of Digital Libraries presents specific challenges for unsupervised information extraction to support both the automatic classification of documents and the enhancement ...
Mikalai Krapivin, Maurizio Marchese, Andrei Yadran...
There has been an explosion of hyperlinked data in many domains, e.g., the biological Web. Expressive query languages and effective ranking techniques are required to convert this...
In many areas of NLP reuse of utility tools such as parsers and POS taggers is now common, but this is still rare in NLG. The subfield of surface realisation has perhaps come clos...
Anja Belz, Mike White, Josef van Genabith, Deirdre...