Sciweavers

DOCENG
2009
ACM
13 years 11 months ago
Web document text and images extraction using DOM analysis and natural language processing
HP Laboratories, HPL-2009-187. Web page text extraction,...
Parag Mulendra Joshi, Sam Liu
DOCENG
2009
ACM
13 years 11 months ago
Web article extraction for web printing: a DOM+visual based approach
Ping Luo, Jian Fan, Sam Liu, Fen Lin, Yuhong Xiong, Jerry Liu. HP Laboratories, HPL-2009-185. Article extrac...
Ping Luo, Jian Fan, Sam Liu, Fen Lin, Yuhong Xiong...
DOCENG
2009
ACM
13 years 11 months ago
Object-level document analysis of PDF files
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Tamir Hassan
DOCENG
2009
ACM
13 years 11 months ago
From rhetorical structures to document structure: shallow pragmatic analysis for document engineering
In this paper, we extend previous work on the automatic structuring of medical documents using content analysis. Our long-term objective is to take advantage of specific rhetoric ...
Gersende Georg, Hugo Hernault, Marc Cavazza, Helmu...
DOCENG
2009
ACM
13 years 11 months ago
Differential synchronization
This paper describes the Differential Synchronization (DS) method for keeping documents synchronized. The key feature of DS is that it is simple and well suited for use in both no...
Neil Fraser
DOCENG
2009
ACM
13 years 11 months ago
Deriving image-text document surrogates to optimize cognition
The representation of information collections needs to be optimized for human cognition. While documents often include rich visual components, collections, including personal coll...
Eunyee Koh, Andruid Kerne
DOCENG
2009
ACM
13 years 11 months ago
Annotations with EARMARK for arbitrary, overlapping and out-of order markup
In this paper we propose a novel approach to markup, called Extreme Annotational RDF Markup (EARMARK), using RDF and OWL to annotate features in text content that cannot be mapped...
Silvio Peroni, Fabio Vitali
DOCENG
2009
ACM
13 years 11 months ago
Creation and maintenance of multi-structured documents
In this article, we introduce a new problem: the construction of multi-structured documents. We first offer an overview of existing solutions to the representation of such docum...
Pierre-Edouard Portier, Sylvie Calabretto
DOCENG
2009
ACM
13 years 11 months ago
On the analysis of queries with counting constraints
We study the analysis problem of XPath expressions with counting constraints. Such expressions are commonly used in document transformations or programs in which they select porti...
Everardo Bárcenas, Pierre Genevès, N...
DOCENG
2009
ACM
13 years 11 months ago
Effect of copying and restoration on color barcode payload density
Steven J. Simske, Margaret Sturgill, Jason S. Aron...