Web search is challenging partly due to the fact that search queries and Web documents use different language styles and vocabularies. This paper provides a quantitative analysis ...
Information extraction is concerned with applying natural language processing to automatically extract the essential details from text documents. A great disadvantage of current ap...
In this paper we present a prototype system to enrich audiovisual contents with annotations, which exploits existing technologies for automatic extraction of metadata (such as OCR...
Giuseppe Amato, Paolo Bolettieri, Franca Debole, F...
Keyphrases are short phrases that reflect the main topic of a document. Because manually annotating documents with keyphrases is a time-consuming process, several automatic appro...
Katja Hofmann, Manos Tsagkias, Edgar Meij, Maarten...
— We present a general approach for the hierarchical segmentation and labeling of document layout structures. This approach models document layout as a grammar and performs a glo...