Nowadays, structured data such as sales and business forms are stored in data warehouses for decision makers to use. Further, unstructured data such as emails, html texts, images,...
Extracting entities (such as people, movies) from documents and identifying the categories (such as painter, writer) they belong to enable structured querying and data analysis ov...
We cope with the metadata recognition in layoutoriented documents. We address the problem as a classification task and propose a method for automatic extraction of relevant featu...
A large amount of handwritten documents exist in image form, as scanned documents. The supporting electronic media allows for better preservation, but to access their content they...
Antonio Clavelli, Luigi P. Cordella, Claudio De St...
The performance of document clustering systems depends on employing optimal text representations, which are not only difficult to determine beforehand, but also may vary from one ...