In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
More and more documents on the World Wide Web are based on templates. On a technical level, this gives those documents very similar source code and DOM tree structures. G...
We present the design of Plurality, an interactive tagging system. Plurality's modular architecture allows users to automatically generate high-quality tags over Web content...
While complete understanding of arbitrary input text remains in the future, it is currently possible to construct natural language processing systems that provide a partial unders...
Peggy M. Andersen, Philip J. Hayes, Steven P. Wein...
Indexing file systems is a powerful means of helping users locate documents, software, and other types of data among large repositories. In environments that contain many differen...