We report on an empirical software engineering course for PhD students. We introduce its syllabus and two different pedagogical strategies. The first strategy is based on indiv...
An overwhelming number of legal documents are available in digital form. However, most of these texts are provided only in semi-structured form, i.e., the documents are stru...
The amount of information available in the MEDLINE database makes it very hard for a researcher to retrieve a reasonable number of relevant documents using a simple query language ...
Existing Language Identification (LID) approaches do reach 100% precision in most common situations, when dealing with documents written in just one language and when those docu...
Typographic and visual information is an integral part of textual documents. Most information extraction systems ignore most of this visual information, processing the text as a l...