In this paper we investigate a novel and important problem in multi-document summarization, i.e., how to extract an easy-to-understand English summary for non-native readers. Exist...
Weblogs (blogs) serve as a gateway to a large blog reader population, so blog authors can potentially influence a large reader population by expressing their thoughts and expertise...
A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is r...
Topics in prior-art patent search are typically full patent applications and relevant items are patents often taken from sources in different languages. Cross language patent retr...
As massive repositories of real-time human commentary, social media platforms have arguably evolved far beyond passive facilitation of online social interactions. Rapid analysis o...