Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpu...
In this paper, we describe a new approach for retrieval in texts with non-standard spelling, which is important for historical texts in English or German. For this purpose, we presen...
We propose a computational model of text reuse tailored for ancient literary texts, which are often available to us only in small and noisy samples. The model takes into account source alte...
A new technique to enter text using a mobile phone keypad, Less-Tap, is described. The traditional touchtone phone keypad is ambiguous for text input because each button encodes 3...
This paper proposes a word segmentation method for machine-printed text lines. It utilizes gaps and special symbols as delimiters between words. A gap clustering technique is used...
Soo-Hyung Kim, Chang Bu Jeong, Hee K. Kwag, Ching ...
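The gap-based idea in the word segmentation abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes each text line has already been reduced to horizontal extents of connected components, and it swaps in a plain one-dimensional two-means clustering to separate within-word gaps from between-word gaps. The special-symbol delimiters mentioned in the abstract are not modeled. The names `two_means_threshold` and `segment_words` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int]  # (x_start, x_end) of one connected component; assumed input format


def two_means_threshold(gaps: List[float], iters: int = 50) -> float:
    """Split gap widths into two clusters (1-D two-means) and return the
    midpoint between the two cluster centers."""
    lo, hi = min(gaps), max(gaps)
    if lo == hi:
        # All gaps are equal: return that width, so no gap is strictly larger.
        return hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        small = [g for g in gaps if g <= mid]
        large = [g for g in gaps if g > mid]
        new_lo = sum(small) / len(small)
        new_hi = sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def segment_words(boxes: List[Box]) -> List[Box]:
    """Merge component extents into word extents; a new word starts wherever
    the gap to the previous component exceeds the clustered threshold."""
    if not boxes:
        return []
    boxes = sorted(boxes)
    if len(boxes) == 1:
        return [boxes[0]]
    # Gap between each component and its right neighbour (overlaps count as 0).
    gaps = [max(0, b[0] - a[1]) for a, b in zip(boxes, boxes[1:])]
    threshold = two_means_threshold(gaps)
    words = [list(boxes[0])]
    for gap, box in zip(gaps, boxes[1:]):
        if gap > threshold:
            words.append(list(box))  # large gap: start a new word
        else:
            words[-1][1] = max(words[-1][1], box[1])  # small gap: extend word
    return [(w[0], w[1]) for w in words]


if __name__ == "__main__":
    components = [(0, 5), (6, 10), (12, 17), (30, 35), (36, 41), (55, 60)]
    print(segment_words(components))
```

Two clusters are the simplest choice for a single line with uniform spacing. Lines that mix fonts or use justified spacing would likely need a more careful clustering step than this sketch provides.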