In this paper, we address the question of how we can identify hosts that will generate links to web spam. Detecting such spam link generators is important because almost all new s...
The problem of resolving lexical ambiguity, commonly referred to as Word Sense Disambiguation (WSD), seems to be stuck because of the knowledge acquisition bottle...
This paper proposes a method of crawling Web servers connected to the Internet without imposing a high processing load. We are using the crawler for a field survey of the digital ...
Katsuko T. Nakahira, Tetsuya Hoshino, Yoshiki Mika...
The number of vertical search engines and portals has increased rapidly in recent years, making the importance of a topic-driven (focused) crawler evident. In this paper, we de...
The TREC-2001 Web track evaluation experiments at the Justsystem site are described with a focus on the "aboutness"-based approach to text retrieval. In the web ad hoc t...