We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table– a data structure that better lends itself to high-level...
Recent research has shown that developers spend significant amounts of time navigating around code. Much of this time is spent on redundant navigations to code that the developer ...
Michael J. Coblenz, Andrew Jensen Ko, Brad A. Myer...
Automatic keyword annotation is a promising solution to enable more effective image search by using keywords. In this paper, we propose a novel automatic image annotation method b...
The Web is a distributed network of information sources where the individual sources are autonomously created and maintained. Consequently, syntactic and semantic heterogeneity of ...
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...