With the pervasive use of handheld digital devices such as camera phones and PDAs, people have started to capture images as a way of recording information. However, due to the non...
The proliferation of electronic content has notably led to the emergence of large corpora of interrelated structured documents (such as HTML and XML Web pages) and semantic annot...
This paper describes the THISL system that participated in the TREC-7 evaluation, Spoken Document Retrieval (SDR) Track, and presents the results obtained, together with some anal...
Dave Abberley, Steve Renals, Gary Cook, Anthony J....
Abstract. One major goal of text mining is to provide automatic methods to help humans grasp the key ideas in ever-increasing text corpora. To this end, we propose a statistica...
It is common for libraries to provide public access to historical and ancient document image collections. Such document images often require specialized processing i...