It is observed that a better approach to Web information understanding is to build on its document framework, which consists mainly of (i) the title and the URL name of the pag...
Developing better systems for document image analysis requires understanding errors, their sources, and their effects. The interactions between various processing steps are comple...
Tinderbox, a hypertext tool for making, analyzing, and sharing notes, explores the use of collage to build and share linked conceptual structures. Adopting a simple, regular data ...
Documents, such as those on Wikipedia and in folksonomies, tend to be assigned multiple topics as metadata. Therefore, it is increasingly important to analyze a re...
York University evaluated a preprocessing approach for this year’s enterprise document search task. With different parsing tools, we created two data sets. Based on each data set,...