Weblogs and message boards provide online forums for discussion that record the voice of the public. Woven into this mass of discussion is a wide range of opinion and commentary a...
Natalie S. Glance, Matthew Hurst, Kamal Nigam, Mat...
— Information extraction systems are traditionally implemented as a pipeline of special-purpose processing modules. A major drawback of such an approach is that whenever a new ex...
We develop a generic method for the review matching problem, which is to match unstructured text reviews to a list of objects, where each object has a set of attributes. To this e...
Nilesh N. Dalvi, Ravi Kumar, Bo Pang, Andrew Tomki...
Abstract— With XML becoming a standard for data representation and exchange, we can expect to see large scale repositories and warehouses of XML data. In order for users to under...
Most topic models, such as latent Dirichlet allocation, rely on the bag-of-words assumption. However, word order and phrases are often critical to capturing the meaning of text in...