COMPSAC
2002
IEEE

An Approach to Identify Duplicated Web Pages

The unceasing expansion of the Web and e-commerce has driven growing demand for new Web sites and Web applications. The software industry is meeting this opportunity under the pressure of very short time-to-market and extremely high competition. As a result, Web sites and applications are usually developed without a formalized process: Web pages are coded directly and incrementally, and new pages are obtained by duplicating existing ones. Duplicated Web pages, which share the same structure and differ only in the data they include, can be considered clones. Identifying clones may reduce the effort devoted to testing, maintaining and evolving Web sites and applications. Moreover, clone detection across different Web sites can reveal cases of possible plagiarism. In this paper we propose an approach, based on similarity metrics, to detect duplicated pages in Web sites and applications implemented with HTML and ASP technology. ...
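The abstract does not say which similarity metrics the authors use, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes that "same structure, different data" can be approximated by comparing the sequence of HTML tag names while ignoring text and attributes, and it uses the ratio from Python's standard `difflib.SequenceMatcher` as a stand-in metric. The function names and the 0.9 threshold are hypothetical choices made for this example, and server-side ASP code is not handled.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Collects opening and closing tag names in document order.

    Text content and attributes are ignored, so two pages that differ
    only in their data produce the same sequence (an assumption of this
    sketch, not a claim about the paper's approach).
    """

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def tag_sequence(html):
    """Return the list of tag names found in an HTML string."""
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    return parser.tags


def structural_similarity(html_a, html_b):
    """Return a ratio in [0, 1]; 1.0 means identical tag sequences."""
    matcher = SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b))
    return matcher.ratio()


def is_clone_candidate(html_a, html_b, threshold=0.9):
    """Flag a page pair as a possible clone (threshold is hypothetical)."""
    return structural_similarity(html_a, html_b) >= threshold
```

Under these assumptions, two product pages built from the same template, such as `<h1>Shoes</h1><p>Price: 10</p>` and `<h1>Hats</h1><p>Price: 25</p>`, yield identical tag sequences and a similarity of 1.0, while a page laid out with a table scores noticeably lower. A real study would need to validate both the metric and the threshold against manually inspected pages.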
Added 14 Jul 2010
Updated 14 Jul 2010
Type Conference
Year 2002
Where COMPSAC
Authors Giuseppe A. Di Lucca, Massimiliano Di Penta, Anna Rita Fasolino