The primary purpose of news articles is to convey information about who, what, when and where. But learning and summarizing these relationships for collections of thousands to mil...
David Newman, Chaitanya Chemudugunta, Padhraic Smy...
Automated detection of the first document reporting each new event in temporally-sequenced streams of documents is an open challenge. In this paper we propose a new approach which...
Yiming Yang, Jian Zhang, Jaime G. Carbonell, Chun ...
In this paper we propose PARTfs which adopts a supervised machine learning algorithm, namely partial decision trees, as a method for feature subset selection. In particular, it is...
Clustering by document concepts is a powerful way of retrieving information from a large number of documents. This task in general does not make any assumption on the data distrib...
It is well known that many hard tasks considered in machine learning and data mining can be solved in an rather simple and robust way with an instance- and distance-based approach....