More and more documents on the World Wide Web are based on templates. On a technical level, this causes those documents to have quite similar source code and DOM tree structures. G...
Document similarity search (i.e. query by example) aims to retrieve a ranked list of documents similar to a query document in a text corpus or on the Web. Most existing approaches...
The primary objective of document annotation, in whatever form, manual or electronic, is to allow those who may not have control over the original document to provide a personal view on inf...
I consider the problems of process system architecture in the context of the Perry-Wolf model of software architecture: process elements are executed in process systems by both ma...
Noun phrases of a document are usually the main information bearers. Thus, the detection of these units is crucial in many applications related to information retrieval, such as co...