Sciweavers

IADIS
2004

Crawling the client-side hidden web

A great deal of information on the web cannot be accessed by conventional crawler engines. This portion of the web is usually called the hidden web. Dealing with it requires solving two tasks: crawling the client-side hidden web and crawling the server-side hidden web. In this paper we present an architecture and a set of related techniques for accessing information in the client-side hidden web, dealing with aspects such as JavaScript, non-standard session maintenance mechanisms, client redirections, and pop-up menus. Our approach leverages current browser APIs and implements novel crawling models and algorithms.

Keywords: Web-Crawler, Hidden Web, Client Side.
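As an illustration of the problem the abstract describes (not the authors' architecture), the following minimal sketch shows how a conventional crawler that only reads static `href` attributes misses links that exist only once JavaScript runs. The page content, the `openSection` function, and all URLs are hypothetical and invented for this example.

```python
from html.parser import HTMLParser


class StaticLinkExtractor(HTMLParser):
    """Collects href values from <a> tags, as a conventional crawler would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page: one plain link, one link that only works by running
# JavaScript, and one link that is written into the page by a script.
PAGE = """
<html><body>
<a href="/about.html">About</a>
<a href="javascript:openSection('reports')">Reports</a>
<script>
  function openSection(name) { window.location = '/sections/' + name + '.html'; }
  document.write('<a href="/catalog.html">Catalog</a>');
</script>
</body></html>
"""


def extract_static_links(html):
    """Return every href found without executing any script."""
    parser = StaticLinkExtractor()
    parser.feed(html)
    return parser.links


def crawlable(links):
    """Keep only links a crawler can fetch without a JavaScript engine."""
    return [link for link in links if not link.lower().startswith("javascript:")]


if __name__ == "__main__":
    links = extract_static_links(PAGE)
    print("static hrefs:", links)
    print("fetchable without JavaScript:", crawlable(links))
```

Only `/about.html` is reachable this way. The script-generated `/catalog.html` link and the target of `openSection('reports')` stay invisible unless the crawler evaluates the page in a browser-like environment, which is the gap the paper's use of browser APIs is meant to address.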
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2004
Where IADIS
Authors Manuel Álvarez, Alberto Pan, Juan Raposo, Ángel Viña