We present a method for testing subject’s performance in a realistic (end-to-end) information understanding task— rapid understanding of large document collections—and discu...
Term extraction relates to extracting the most characteristic or important terms (words or phrases) in a document. This information is commonly used for improving the accuracy of ...
A large annotated corpus is critical to the development of robust optical character recognizers (OCRs). However, creation of annotated corpora is a tedious task. It is laborious, ...
Computational resources for research in legal environments have historically implied remote access to large databases of legal documents such as case law, statutes, law reviews an...
Jack G. Conrad, Khalid Al-Kofahi, Ying Zhao, Georg...
We present an efficient algorithm called the Quadtree Heuristic for identifying a list of similar terms for each unique term in a large document collection. Term similarity is de...