The cluster assumption is exploited by most semi-supervised learning (SSL) methods. However, if the unlabeled data is merely weakly related to the target classes, it becomes quest...
This paper describes our participation in the 2008 TREC Blog track. Our system consists of 3 components: data preprocessing, topic retrieval, and opinion finding. In the topic ret...
We investigated the dictionary-based query translation method combining the translation disambiguation process using statistic cooccurrence information trained from the provided c...
Background: We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document ...
An indexing model is the heart of an Information Retrieval (IR) system. Data structures such as term based inverted indices have proved to be very effective for IR using vector sp...