In a higher-level task such as clustering of web results or word sense disambiguation, knowledge of all possible distinct concepts in which an ambiguous word can be expressed woul...
Massive amounts of raw data are currently being generated by biologists while sequencing organisms. Outside of the largest, high-profile projects such as the Human Genome Project, ...
Content reuse on the Web is becoming even more common since the Web 2.0 "phenomenon". However, each time content is reused certain information is either completely lost...
Acronyms are widely used in many domains to abbreviate and stress important concepts. Due to their dynamicity and unbounded nature, manual attempts to compose a global scale reposito...
The SkyServer is an Internet portal to the Sloan Digital Sky Survey Catalog Archive Server. From 2001 to 2006, there were a million visitors in 3 million sessions generating 170 mi...
Vik Singh, Jim Gray, Ani Thakar, Alexander S. Szal...