To support research and development on machine translation, we produced a test collection for Japanese-English machine translation at the seventh NTCIR Workshop. This paper desc...
In this paper, we present a segmentation-free word spotting method that is able to deal with heterogeneous document image collections. We propose a patch-based framework where p...
Good quality documentation is crucial for the effective reuse of object-oriented frameworks, and must be adaptable to the needs of different audiences. To satisfy these needs, fra...
This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. We first create a concept-based document representation b...
Anna Huang, David N. Milne, Eibe Frank, Ian H. Wit...
In this study I use statistical Natural Language Processing and adapted Controlled Language methods to preprocess individual documents before they are used as source documents for ...