Evaluating text fragments for positive and negative subjective expressions and their strength can be important in applications such as single- or multi- document summarization, do...
A major obstacle to the construction of a probabilistic translation model is the lack of large parallel corpora. In this paper we first describe a parallel text mining system that...
This paper presents an unsupervised opinion analysis method for debate-side classification, i.e., recognizing which stance a person is taking in an online debate. In order to hand...
Improving the precision of information retrieval has been a challenging issue on Chinese Web. As exemplified by Chinese recipes on the Web, it is not easy/natural for people to us...
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...