Background: The development of text mining systems that annotate biological entities with their properties using scientific literature is an important recent research topic. These...
A better understanding of a document's logical components is crucial to many applications, e.g., document classification or data integration. As the development of digital libraries...
This paper presents PDF-TREX, a heuristic approach for table recognition and extraction from PDF documents. The heuristic starts from an initial set of basic content elements an...
Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not en...
The web is becoming one of the most important information and communication media. While the quantity of digital resources available through the web i...