We analyse transaction logs for a large full-text document collection for Computer Science researchers. We report insights gained from this analysis and identify resulting search ...
Typographic and visual information is an integral part of textual documents. Most information extraction systems ignore most of this visual information, processing the text as a l...
Since the XML format became a de facto standard for structured documents, IT researchers and industry have developed a number of XML editors to help users produce structured docu...
In this paper, we propose a new similarity measure to compute the pairwise similarity of text-based documents based on the suffix tree document model. By applying the new suffix tree ...
We propose a visualization method based on a topic model for discrete data such as documents. Unlike conventional visualization methods based on pairwise distances such as multi-d...