: We are presenting a set of multilingual text analysis tools that can help analysts in any field to explore large document collections quickly in order to determine whether the do...
Camelia Ignat, Bruno Pouliquen, Ralf Steinberger, ...
With the new interest in historical documents insight grew that electronic access to these texts causes many specific problems. In the first part of the paper we survey the presen...
Andreas Hauser, Markus Heller, Elisabeth Leiss, Kl...
Where Information Retrieval (IR) and Text Categorization delivers a set of (ranked) documents according to a query, users of large document collections would rather like to receiv...
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
The use of blogs to track and comment on real world (political, news, entertainment) events is growing. Similarly, as more individuals start relying on the Web as their primary in...