In recent years, privacy preserving data mining has become very important because of the proliferation of large amounts of data on the internet. Many data sets are inherently high...
Broad-coverage lexical resources such as WordNet are extremely useful. However, they often include many rare senses while missing domain-specific senses. We present a clustering a...
The classical probabilistic models attempt to capture the Ad hoc information retrieval problem within a rigorous probabilistic framework. It has long been recognized that the prim...
Recent text and speech processing applications such as speech mining raise new and more general problems related to the construction of language models. We present and describe in...
For the task of near-duplicated document detection, both traditional fingerprinting techniques used in database community and bag-of-word comparison approaches used in information...