This paper proposes a word segmentation method for machine-printed text lines. It utilizes gaps and special symbols as delimiters between words. A gap clustering technique is used...
Soo-Hyung Kim, Chang Bu Jeong, Hee K. Kwag, Ching ...
The Rocchio relevance feedback algorithm is one of the most popular and widely applied learning methods from information retrieval. Here, a probabilistic analysis of this algorith...
In this paper we describe the parallelization of two nearest neighbour classification algorithms. Nearest neighbour methods are well-known machine learning techniques. They have be...
We present BlogScope (www.blogscope.net), a system for online analysis of temporally ordered streaming text, currently applied to the analysis of the Blogosphere1 . The system cur...
The Block Sorting process of Burrows and Wheeler can be applied to any sequence in which symbols are (or might be) conditioned upon each other. In particular, it is possible to pa...
R. Yugo Kartono Isal, Alistair Moffat, A. C. H. Ng...