Extracting spam blogs with co-citation clusters

This paper reports the estimated number of spam blogs in order to assess their current state in the blogosphere. To extract spam blogs, I developed a method that traverses co-citation clusters of blogs starting from a spam seed. Spam seeds were collected using two criteria: high out-degree and spam keywords. According to the experiment, a mixed seed set composed of high out-degree and spam keyword seeds is more effective than either individual seed set in terms of F-Measure. In conclusion, mixing seeds obtained by different methods improves the F-Measure of spam extraction with co-citation clusters.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Information filtering.

General Terms: Algorithms, Measurement, Experimentation

Keywords: Spam Blog Extraction, Co-citation Cluster, Advertisement Link
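
The abstract does not spell out the traversal procedure, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes that two blogs are co-cited when the same source links to both, that a pair is connected once at least `min_shared` sources co-cite it, and that extraction is a breadth-first traversal from the seed blogs. The function names, the `min_shared` threshold, and the toy link data are all hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def cocitation_graph(out_links, min_shared=2):
    """Connect two blogs when at least `min_shared` sources link to both.

    `min_shared` is an assumed threshold, not a value from the paper.
    """
    shared = defaultdict(int)
    for targets in out_links.values():
        for a, b in combinations(sorted(set(targets)), 2):
            shared[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), count in shared.items():
        if count >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def traverse_from_seeds(graph, seeds):
    """Breadth-first traversal: collect every blog reachable from a seed."""
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        blog = queue.popleft()
        for neighbour in graph.get(blog, ()):
            if neighbour not in found:
                found.add(neighbour)
                queue.append(neighbour)
    return found


def f_measure(extracted, actual_spam):
    """Harmonic mean of precision and recall of the extracted set."""
    true_positives = len(extracted & actual_spam)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(extracted)
    recall = true_positives / len(actual_spam)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): each source maps to the blogs it links to.
out_links = {
    "hub1": ["s1", "s2", "s3"],
    "hub2": ["s1", "s2", "s3"],
    "hub3": ["s3", "s4"],
    "hub4": ["s3", "s4"],
    "reader1": ["b1", "b2"],
    "reader2": ["b1", "b2", "s4"],
}
graph = cocitation_graph(out_links, min_shared=2)
extracted = traverse_from_seeds(graph, seeds={"s1"})
score = f_measure(extracted, actual_spam={"s1", "s2", "s3", "s4", "s5"})
print(sorted(extracted), round(score, 3))
```

In this toy example the single seed `s1` reaches `s2`, `s3`, and `s4` through co-citation edges, while `s5` is never co-cited and is missed, which lowers recall. A mixed seed set, as the abstract describes, would be passed as the union of the seeds found by each criterion.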
Kazunari Ishida
Added 21 Nov 2009
Updated 21 Nov 2009
Type Conference
Year 2008
Where WWW
Authors Kazunari Ishida