Inverse document frequency (IDF) is one of the most useful and widely used concepts in information retrieval. There have been various attempts to provide theoretical justification...
Web Page segmentation is a crucial step for many applications in Information Retrieval, such as text classification, de-duplication and full-text search. In this paper we describe...
In automated text categorization, given a small number of labeled documents, it is very challenging, if not impossible, to build a reliable classifier that is able to achieve high...
Zenglin Xu, Rong Jin, Kaizhu Huang, Michael R. Lyu...
This paper experimentally studies approaches to the problem of ranking information resources w.r.t. user queries in peer-to-peer information retrieval. In distributed environments...
Today, valuable business information is increasingly stored as unstructured data (documents, emails, etc.). For example, documents exchanged between business partners capture info...