Abstract. Large document collections, such as those delivered by Internet search engines, are difficult and time-consuming for users to read and analyse. The detection of common an...
This paper describes a system called STeP_IN (standing for Socio-Technical Platform for in situ Networking) that assists software developers in finding and learning Java API libraries. I...
Cloaking and redirection are two possible search engine spamming techniques. In order to understand cloaking and redirection on the Web, we downloaded two sets of Web pages while ...
A Web spider is a widely used means of obtaining information for search engines. As the size of the Web grows, it becomes a natural choice to parallelize the spider’s crawling proc...
The number of vertical search engines and portals has rapidly increased in recent years, making the importance of a topic-driven (focused) crawler evident. In this paper, we de...