This paper considers the problem of identifying on the Web compound documents (cDocs): groups of web pages that in aggregate constitute semantically coherent information entities...
A proposal is presented for a standard ontology language defined as ElPub/SciX Ontology, based on the content of a web digital library of conference proceedings. This content, i.e...
The representation of information collections needs to be optimized for human cognition. While documents often include rich visual components, collections, including personal coll...
This paper focuses on spam blog (splog) detection. Blogs are a highly popular new-media mechanism for social communication. The presence of splogs degrades blog search results as well...
Yu-Ru Lin, Hari Sundaram, Yun Chi, Jun'ichi Tatemu...
Background: The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data min...