In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
Background: The Gene Ontology (GO) is a well-known controlled vocabulary describing the biological process, molecular function and cellular component aspects of gene annotation. I...
We present an efficient algorithm called the Quadtree Heuristic for identifying a list of similar terms for each unique term in a large document collection. Term similarity is de...
Most methods for document image retrieval rely solely on text information to find similar documents. This paper describes a way to use layout information for document image retrie...
Joost van Beusekom, Daniel Keysers, Faisal Shafait...
We present in this paper an approach to assessing student paraphrases in the intelligent tutoring system iSTART. The approach is based on measuring the semantic similarity between ...
Vasile Rus, Mihai C. Lintean, Arthur C. Graesser, ...