This article is a revised and extended version of [VBG, 07]. We conjecture that the digitalization of historical text documents as a basis of data mining and information retrieva...
Text reuse occurs in many different types of documents and for many different reasons. One form of reuse, duplicate or near-duplicate documents, has been a focus of researchers be...
This report concerns the last edition of the XML Mining Track at INEX 2007. An earlier report on the two preceding editions of the track has already been published. We pr...
TextGrid is a new Grid project within the German D-Grid initiative that aims to deploy Grid technologies for humanities scholars working on historical (German) te...
We investigate the use of clustering methods for the task of grouping the text spans in a news article that refer to the same event. We provide evidence that the order in which eve...