Crawling the client-side hidden web

There is a great amount of information on the web that cannot be accessed by conventional crawler engines. This portion of the web is usually called hidden web data. Dealing with this problem requires solving two tasks: crawling the client-side hidden web and crawling the server-side hidden web. In this paper we present an architecture and a set of related techniques for accessing the information placed in the client-side hidden web, dealing with aspects such as JavaScript technology, non-standard session maintenance mechanisms, client redirections, pop-up menus, etc. Our approach leverages current browser APIs and implements novel crawling models and algorithms.

KEYWORDS: Web-Crawler, Hidden Web, Client Side.

Manuel Álvarez, Alberto Pan, Juan Raposo, Ángel Viña

Hidden Web | Hidden Web Data | IADIS 2004 | IADIS 2008 | Server-side Hidden Web

» Crawling the Hidden Web

» Crawling the Content Hidden Behind Web Forms

» Probabilistic models for focused web crawling

» Sitemaps above and beyond the crawl of duty

» Learning Deep Web Crawling with Diverse Features

» Query Selection Techniques for Efficient Crawling of Structured Web Sources

» Searching for Hidden-Web Databases

» Google's Deep-Web Crawl

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2004
Where	IADIS
Authors	Manuel Álvarez, Alberto Pan, Juan Raposo, Ángel Viña
