Identifying the true type of a computer file can be a difficult problem. Previous methods of file type recognition include fixed file extensions, fixed “magic numbers” stored ...
The presence of replicas or near-replicas of documents is very common on the Web. Documents may be replicated completely or partially for different reasons (versions, mirrors, etc...
Ernesto Di Iorio, Michelangelo Diligenti, Marco Go...
Tables are a commonly used presentation scheme, especially for describing relational information. However, table understanding remains an open problem. In this paper, we consider th...
In this paper, we present a method for structuring a document according to the information present in its Table of Contents. The detection of the ToC as well as the determination ...
Today, mass-scale electronic content distribution systems embed forensic tracking watermarks primarily at the distribution server. For limiting the bandwidth usage and server compl...
Mehmet Utku Celik, Aweke N. Lemma, Stefan Katzenbe...