The amount of text data on the Internet is growing at a very fast rate. Online text repositories for news agencies, digital libraries and other organizations currently store gigaan...
We present an efficient algorithm called the Quadtree Heuristic for identifying a list of similar terms for each unique term in a large document collection. Term similarity is de...
Traditionally, research in identifying structured entities in documents has proceeded independently of document categorization research. In this paper, we observe that these two t...
XML is rapidly emerging as a standard for exchanging business data on the World Wide Web. For the foreseeable future, however, most business data will continue to be stored in re...
Jayavel Shanmugasundaram, Eugene J. Shekita, Rimon...
Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel flocking-based approach for doc...