Complex documents stored in a flat or partially marked-up file format require layout-sensitive preprocessing before any natural language processing can be carried out on their tex...
Easy access to texts in digital libraries and on the WWW has led to an increased number of plagiarism cases in recent years, which renders manual plagiarism detection infeasibl...
With the rise of community-generated web content, the need for automatic assessment of resource quality has grown, particularly in the realm of educational digital libraries. We d...
Philipp G. Wetzler, Steven Bethard, Kirsten R. But...
EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawl...
In this paper, we propose an alternative method for accessing the content of Greek historical documents printed during the 17th and 18th centuries by searching words directly in d...
Anastasios L. Kesidis, Eleni Galiotou, Basilios Ga...