Vector-based models of lexical semantics retrieve semantically related words automatically from large corpora by exploiting the property that words with a similar meaning tend to ...
We describe an algorithm for similar-image search which
is designed to be efficient for extremely large collections of
images. For each query, a small response set is selected by...
Lorenzo Torresani (Dartmouth College), Martin Szum...
Phishing is a significant security threat to the Internet, which causes tremendous economic loss every year. In this paper, we proposed a novel hybrid phish detection method based...
Abstract. Modern document collections often contain groups of documents with overlapping or shared content. However, most information retrieval systems process each document separa...
Andrei Z. Broder, Nadav Eiron, Marcus Fontoura, Mi...
We consider the coverage testing problem where we are given a document and a corpus with a limited query interface and asked to find if the corpus contains a near-duplicate of th...
Ali Dasdan, Paolo D'Alberto, Santanu Kolay, Chris ...