Large-volume public comment campaigns and web portals that encourage the public to customize form letters produce many near-duplicate documents, which increases processing and sto...
We describe a new paradigm for performing search in context. In the IntelliZap system we developed, search is initiated from a text query marked by the user in a document she view...
Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias...
Abstract. Textual reuse is an integral part of textual case-based reasoning (TCBR), which deals with solving new problems by reusing previous similar problem-solving experiences doc...
Ibrahim Adeyanju, Nirmalie Wiratunga, Juan A. Reci...
Documents such as those on Wikipedia and in folksonomies tend to be assigned multiple topics as metadata. Therefore, it is increasingly important to analyze a re...
A major difficulty in designing a document image segmentation methodology is selecting proper values for all the parameters involved. This is usually done after experimentation o...