To embed a watermark into a binary document image, a subset of the image's pixels must be modified, which distorts the document image. Careful sel...
Effective document classification is a long-pursued goal in knowledge management. This paper proposes a novel hybrid approach of semantic representation and statistical measuremen...
Statistical language modeling has been successfully used for speech recognition, part-of-speech tagging, and syntactic parsing. Recently, it has also been applied to information r...
We consider the problem of querying XML documents that are not valid with respect to given DTDs. We propose a framework for measuring the invalidity of XML documents and compactly...
The problem of representing text documents within an Information Retrieval system is formulated as an analogy to the problem of representing the quantum states of a physical syste...
Alvaro Francisco Huertas-Rosero, Leif Azzopardi, C...