Code clones in software increase maintenance cost and lower software quality. We have devised a new algorithm to detect duplicated parts of source code in large software. Our algo...
The retrieval of similar documents from large scale datasets has been the one of the main concerns in knowledge management environments, such as plagiarism detection, news impact a...
Felipe Bravo-Marquez, Gaston L'Huillier, Sebasti&a...
In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the...
We mine a large taxonomic dataset for subject classification rules. We then use these rules to perform an extensive analysis of the subject matter of the largest general purpose in...
The Web contains a large amount of documents and increasingly, also semantic data in the form of RDF triples. Many of these triples are annotations that are associated with docume...