We propose an online topic model for sequentially analyzing the time evolution of topics in document collections. Topics naturally evolve with multiple timescales. For example, so...
This paper presents a near real-time multilingual news monitoring and analysis system that forms the backbone of our research work. The system integrates technologies to address t...
Recent text and speech processing applications such as speech mining raise new and more general problems related to the construction of language models. We present and describe in...
Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...
Automatically categorizing documents into pre-defined topic hierarchies or taxonomies is a crucial step in knowledge and content management. Standard machine learning techniques ...