This paper describes a new method for extracting open compounds (uninterrupted sequences of words) from text corpora of languages, such as Thai, Japanese and Korean, that exhibit un...
In this paper we investigate the word extraction task in on-line recognition of cursively handwritten text lines. For segmentation, we propose a method based on the as...
Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs rather than document collections, as self-contained sources o...
We present a novel fine-grained semantic representation of text and an approach to constructing it. This representation is largely extractable by today's technologies and fac...
Rodney D. Nielsen, Wayne Ward, James H. Martin, Ma...
We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
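The Text-to-Tag Ratio idea in the abstract above can be illustrated with a minimal Python sketch. It assumes that each HTML line's ratio is its count of non-tag characters divided by its count of tags, falling back to the character count when a line has no tags. It also assumes that lines whose ratio exceeds the mean plus `c` standard deviations are kept as content. The function names `line_ratios` and `extract_content`, the parameter `c`, and this thresholding rule are hypothetical illustrations. The truncated abstract does not specify the paper's exact procedure.

```python
import re
import statistics

# Matches any HTML tag, e.g. <div class="x"> or </p>.
TAG_RE = re.compile(r"<[^>]*>")
# Assumption: script/style blocks carry no readable content, so drop them first.
NOISE_RE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)


def line_ratios(html: str) -> list[tuple[str, float]]:
    """Return (visible_text, text-to-tag ratio) for each non-blank line."""
    cleaned = NOISE_RE.sub("", html)
    result = []
    for line in cleaned.splitlines():
        if not line.strip():
            continue  # blank lines are skipped entirely
        tag_count = len(TAG_RE.findall(line))
        text = TAG_RE.sub("", line).strip()
        # A line with no tags gets its raw text length as its ratio.
        ratio = len(text) / tag_count if tag_count else float(len(text))
        result.append((text, ratio))
    return result


def extract_content(html: str, c: float = 0.0) -> list[str]:
    """Keep lines whose ratio exceeds mean + c * std dev (hypothetical threshold)."""
    pairs = line_ratios(html)
    if not pairs:
        return []
    ratios = [r for _, r in pairs]
    threshold = statistics.fmean(ratios) + c * statistics.pstdev(ratios)
    return [text for text, r in pairs if r > threshold and text]
```

On a page where navigation bars and ad containers are tag-heavy and article paragraphs are text-heavy, only the paragraph lines clear the threshold. This is why a ratio of this kind does not depend on site-specific HTML cues.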