This paper describes the architecture of the fourth version of the Evolutionary Virtual Agent (EVA). This new lightweight Java-based implementation is based on a dynamical rule-b...
This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...
Although quite successful in providing keyword-based access to web pages, commercial search portals, such as Google, Yahoo, AltaVista, and AOL, still lack the ability to answer...
Numerous systems exist for mining the web in search of relevant information, but few exist for the discovery of interesting information. The discovery of interesting informat...
We present a novel, simple, unsupervised method for characterizing the semantic relations that hold between nouns in noun-noun compounds. The main idea is to discover pre...