Abstract. A growing amounts of information are currently being generated and stored in the World Wide Web (WWW), in particular, researchers in any field can find a lot of publicati...
The paper proposes identifying relevant information sources from the history of combined searching and browsing behavior of many Web users. While it has been previously shown that...
The lack of a large scale Chinese test collection is an obstacle to the Chinese information retrieval development. In order to address this issue, we built such a collection compos...
The proliferation of digital libraries and the large amount of existing documents raise important issues in efficient handling of documents. Printed texts in documents need to be...
We introduce a multi-stage ensemble framework, ErrorDriven Generalist+Expert or Edge, for improved classification on large-scale text categorization problems. Edge first trains a ...