The importance of text mining stems from the availability of huge volumes of text databases holding a wealth of valuable information that needs to be mined. Text categorization is...
Document-centric XML collections contain text-rich documents, marked up with XML tags that add lightweight semantics to the text. Querying such collections calls for a hybrid quer...
The performance of web search engines may often deteriorate due to the diversity and noisy information contained within web pages. User click-through data can be used to introduce...
The data cube operator exemplifies two of the most important aspects of OLAP queries: aggregation and dimension hierarchies. In earlier work we presented Dwarf, a highly compress...
This paper addresses the issue of Web document summarization. As textual content of Web documents is often scarce or irrelevant and existing summarization techniques are based on ...