Parallel corpus is a rich linguistic resource for various multilingual text management tasks, including crosslingual text retrieval, multilingual computational linguistics and mul...
Abstract. Malicious Web content poses a serious threat to the Internet, organizations and users. Current approaches to detecting malicious Web content employ high-powered honey cli...
The World Wide Web provides an increasingly powerful and popular publication mechanism. Web documents often contain a large number of images serving various different purposes. Th...
Abstract. In this paper we present static and dynamic studies of duplicate and near-duplicate documents in the Web. The static and dynamic studies involve the analysis of similar c...