Traditionally hypertexts have been limited in size by the manual effort required to create hypertext links. In addition, large hyper–linked collections may overwhelm users with ...
Tools for mining information from data can create added value for the Internet. As the majority of electronic documents available over the network are in unstructured textual form...
We are working on a project aimed at building next generation analyst support tools that focus analysts’ attention on the most critical and novel information found within the da...
Classification of documents by genre is typically done either using linguistic analysis or term frequency based techniques. The former provides better classification accuracy than...
This paper considers the problem of identifying on the Web compound documents (cDocs) ? groups of web pages that in aggregate constitute semantically coherent information entities...