We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
The MEDLINE database is the world's largest repository of bio-medical abstracts. It is a central information entry point for most biologists despite the growing availability of full-...
The sounds generated by a writing instrument provide a rich and under-utilized source of information for pattern recognition. We examine the feasibility of recognition of handwrit...
Intelligence analysts face the need for immediate, up-to-date information about individuals of interest. While biographies can be written and stored in text databases, we argue tha...
This paper argues that it is impossible to separate lexical and encyclopedic knowledge and describes an attempt to build a large lexical database that contains the range of informati...
Martha W. Evens, Joanne Dardaine, Yu-Feng Huang, S...