Recently the problem of dimensionality reduction has received a lot of interests in many fields of information processing. We consider the case where data is sampled from a low d...
An unsupervised probabilistic learning framework for normalizing product records across different retailer Web sites is presented. Our framework decomposes the problem into two ta...
Multilingual parallel text corpora provide a powerful means for propagating linguistic knowledge across languages. We present a model which jointly learns linguistic structure for...
Wrapper is a traditional method to extract useful information from Web pages. Most previous works rely on the similarity between HTML tag trees and induced template-dependent wrap...
In this paper, we present methods to analyze dialog coherence that help us to automatically distinguish between coherent and incoherent conversations. We build a machine learning ...