We present an approach to document clustering based on winnowing fingerprints that achieved good values of effectiveness with considerable save in memory space and computation tim...
The aim of the Dublin City University’s participation in the CLEF 2005 ImageCLEF St Andrew’s Collection task was to explore an alternative approach to exploiting text annotatio...
With large databases of document images available, a method for users to find keywords in documents will be useful. One approach is to perform Optical Character Recognition (OCR) ...
PDF became a very common format for exchanging printable documents. Further, it can be easily generated from the major documents formats, which make a huge number of PDF documents...
Tools for mining information from data can create added value for the Internet. As the majority of electronic documents available over the network are in unstructured textual form...