Traditional bag-of-words model and recent wordsequence kernel are two well-known techniques in the field of text categorization. Bag-of-words representation neglects the word orde...
Lei Zhang, Debbie Zhang, Simeon J. Simoff, John K....
Addressed in this paper is the issue of `email data cleaning' for text mining. Many text mining applications need take emails as input. Email data is usually noisy and thus i...
Searching bio-chemical structures is becoming an important application domain of information retrieval. This paper introduces a protein structure matching problem and formulates i...
In TREC-10, Microsoft Research Asia (MSRA) participated in the Web track (ad hoc retrieval task and homepage finding task). The latest version of the Okapi system (Windows 2000 ve...
Jianfeng Gao, Guihong Cao, Hongzhao He, Min Zhang,...
Databases are a key technology for molecular biology which is a very data intensive discipline. Since molecular biological databases are rather heterogeneous, unification and data...