Web pages are more than text and they contain much contextual and structural information, e.g., the title, the meta data, the anchor text, etc., each of which can be seen as a dat...
Web pages contain a combination of unique content and template material, which is present across multiple pages and used primarily for formatting, navigation, and branding. We stu...
This paper presents a flexible and effective examplebased approach for labeling title pages which can be used for automated extraction of bibliographic data. The labels of intere...
Joost van Beusekom, Daniel Keysers, Faisal Shafait...
The applications described here follow up the works performed in the recent last years by the Data Structures and Computational Linguistics Group at Las Palmas de Gran Canaria Univ...
Missing web pages, URIs that return the 404 “Page Not Found” error or the HTTP response code 200 but dereference unexpected content, are ubiquitous in today’s browsing exper...
Martin Klein, Jeffery L. Shipman, Michael L. Nelso...