The induction of knowledge from a data set relies on the execution of multiple data mining actions: applying filters to clean and select the data, training different algorithms (...
Unlike familiar clustering objects, text documents have sparse data spaces. A common way of representing a document is as a bag of its component words, but the semantic re...
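As a minimal sketch of the bag-of-words idea, a document can be stored as a sparse word-to-count map, so only the words that actually occur take up space. This assumes simple lowercase tokenization and raw term counts; both are illustrative choices, not the weighting or tokenization scheme of the work summarized above. The helper names `bag_of_words` and `cosine` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map a document to a sparse term-frequency vector (word -> count).

    Assumption: lowercase alphabetic tokens only. Word order is discarded;
    words that never occur are simply absent, which keeps the vector sparse.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity computed only over the nonzero entries of a."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


d1 = bag_of_words("The car is fast. The car is red.")
d2 = bag_of_words("An automobile is fast.")
print(cosine(d1, d2))
```

In this toy example, "car" and "automobile" are separate dimensions, so they contribute nothing to the similarity score. Only the shared tokens "is" and "fast" do.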
Nowadays, huge amounts of information from different industrial processes are stored in databases, and companies can improve their production efficiency by mining some new kn...
Most queries in web search are ambiguous and multifaceted. Identifying the major senses and facets of queries from search log data, referred to as query subtopic mining in this pa...
Yunhua Hu, Ya-nan Qian, Hang Li, Daxin Jiang, Jian...
In distributed data mining models, adopting a flat node distribution model can limit scalability. To address the problems of modularity, flexibility, and scalability, we propose...