Different from familiar clustering objects, text documents have sparse data spaces. A common way of representing a document is as a bag of its component words, but the semantic re...
This paper describes a hierarchical approach for fast audio stream segmentation and classification. With this approach, the audio stream is firstly segmented into audio clips by M...
Medical words exhibit a rich and productive morphology. Morphological knowledge is therefore very important for any medical language processing application. We propose a simple and...
We present a system for searching and classifying U.S. patent documents, based on Inquery. Patents are distributed through hundreds of collections, divided up by general area. The...
The development of a multilingual terminology is a very long and costly process. We present the creation of a multilingual terminological database called GRISP covering multiple t...