This paper proposes a chunking strategy to detect unknown words in Chinese word segmentation. First, a raw sentence is pre-segmented into a sequence of word atoms using a maximum...
We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is orders of magnitude faster than typical web page classific...
We present a simple method for language-independent and task-independent text categorization learning, based on character-level n-gram language models. Our approach uses simple in...
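As a rough illustration of categorization with character-level n-gram language models, the sketch below trains one model per category and assigns a text to the category whose model gives it the highest log-likelihood. It is not the authors' implementation: the class name `CharNgramClassifier`, the add-k smoothing, the padding characters, and the uniform class prior are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


class CharNgramClassifier:
    """One add-k smoothed character n-gram model per category (hypothetical sketch)."""

    def __init__(self, n=3, k=1.0):
        self.n = n                            # n-gram order
        self.k = k                            # add-k smoothing constant (assumed)
        self.ngrams = defaultdict(Counter)    # category -> n-gram counts
        self.contexts = defaultdict(Counter)  # category -> (n-1)-gram context counts
        self.alphabet = set()                 # characters seen in training

    def _grams(self, text):
        # Pad with start/end markers so boundary characters are modeled too.
        padded = "\x02" * (self.n - 1) + text + "\x03"
        for i in range(len(padded) - self.n + 1):
            yield padded[i:i + self.n]

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            for gram in self._grams(text):
                self.ngrams[label][gram] += 1
                self.contexts[label][gram[:-1]] += 1
                self.alphabet.add(gram[-1])
        return self

    def log_likelihood(self, text, label):
        vocab = len(self.alphabet) + 1  # one extra slot for unseen characters
        total = 0.0
        for gram in self._grams(text):
            num = self.ngrams[label][gram] + self.k
            den = self.contexts[label][gram[:-1]] + self.k * vocab
            total += math.log(num / den)
        return total

    def predict(self, text):
        # Uniform prior over categories: pick the best-scoring model.
        return max(self.ngrams, key=lambda label: self.log_likelihood(text, label))


if __name__ == "__main__":
    clf = CharNgramClassifier(n=3).fit(
        ["the cat sat on the mat", "el gato se sienta en la alfombra"],
        ["en", "es"],
    )
    print(clf.predict("the mat"))
```

Because the model works on raw characters, it needs no tokenizer or stop-word list, which is what makes this style of classifier easy to apply across languages and tasks.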
This paper proposes a hierarchical model to parse both English and Chinese sentences. This is done by iteratively constructing simple constituents first, so that complex...
Automatic image annotation is a newly developed and promising technique to provide semantic image retrieval via text descriptions. It concerns a process of automatically labeling t...