This paper describes a novel approach to improving multi-document summarization based on cross-document information extraction (IE). We describe a method to automatically incorpora...
Existing HTML mark-up is used only to indicate the structure and layout of documents, not their semantics. As a result, web documents are difficult to be semantically p...
Document understanding techniques such as document clustering and multi-document summarization have received much attention in recent years. Current document clustering meth...
Dingding Wang, Shenghuo Zhu, Tao Li, Yun Chi, Yiho...
We are interested in retrieving information from conversational speech corpora, such as call-center data. This data comprises spontaneous speech conversations with low recording q...
In many domains there are specific attributes in documents that carry more weight than the general words in the document. This paper proposes the use of information extraction tec...