Sciweavers

ICCSA
2005
Springer

On URL Normalization

Since syntactically different URLs can represent the same resource on the WWW, there are ongoing efforts in the standards communities to define URL normalization. This paper considers three additional URL normalization steps beyond those specified in the standard URL normalization. The idea behind our work is that URL normalization should further reduce false negatives (equivalent URLs that are treated as different) while allowing a limited number of false positives (different URLs that are treated as equivalent). Two metrics are defined to analyze the effect of each normalization step. We ran an experiment on over 170 million URLs collected from real web pages, and the statistical results are reported in this paper.
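To make the idea of standard URL normalization concrete, here is a minimal Python sketch of a few syntax-based steps of the kind described in RFC 3986: lowercasing the scheme and host, dropping the default port, removing dot segments, and uppercasing percent-encoding hex digits. It is an illustration under assumptions, not the paper's method: the abstract does not list the three additional steps, so none of them appear here. The names `normalize_url`, `remove_dot_segments`, and `DEFAULT_PORTS` are hypothetical, and the sketch ignores userinfo and IPv6 hosts.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these two schemes are handled for default-port removal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def remove_dot_segments(path: str) -> str:
    """Resolve '.' and '..' segments in a URL path."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            # Never pop the leading empty segment of an absolute path.
            if len(out) > 1:
                out.pop()
            continue
        out.append(seg)
    # A path ending in '.' or '..' refers to a directory: keep a trailing '/'.
    if segments[-1] in (".", ".."):
        out.append("")
    return "/".join(out)


def upper_percent_encoding(text: str) -> str:
    """Uppercase the hex digits of percent-encoded triplets (%7e -> %7E)."""
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), text)


def normalize_url(url: str) -> str:
    """Apply a few standard, syntax-based normalization steps."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # Keep the port only when it differs from the scheme's default.
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"

    path = remove_dot_segments(upper_percent_encoding(parts.path))
    if netloc and path == "":
        path = "/"  # an empty path with an authority is equivalent to '/'

    query = upper_percent_encoding(parts.query)
    return urlunsplit((scheme, netloc, path, query, parts.fragment))


if __name__ == "__main__":
    a = "HTTP://Example.COM:80/a/b/../c/./d.html"
    b = "http://example.com/a/c/d.html"
    # Two syntactically different URLs map to one normalized string.
    print(normalize_url(a) == normalize_url(b))
```

Comparing URLs by their raw strings would treat `a` and `b` as different, which is a false negative in the abstract's terms. More aggressive steps can remove more false negatives, at the risk of merging URLs that actually identify different resources.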
Sang Ho Lee, Sung Jin Kim, Seok-Hoo Hong
Added 27 Jun 2010
Updated 27 Jun 2010
Type Conference
Year 2005
Where ICCSA