DEXA
2011
Springer

Sampling the National Deep Web

A huge portion of today’s Web consists of web pages filled with information from a myriad of online databases. This part of the Web, known as the deep Web, remains relatively unexplored, and even major characteristics, such as the number of searchable databases on the Web or the subject distribution of those databases, are still disputed. In this paper, we revisit a problem of deep Web characterization: how to estimate the total number of online databases on the Web. We propose the Host-IP clustering sampling method to address the drawbacks of existing approaches to deep Web characterization, and we report our findings based on a survey of the Russian Web. The obtained estimates, together with the proposed sampling technique, could be useful for further studies on handling data in the deep Web.

Denis Shestakov

Database | DEXA 2011 | Sampling Method | Sampling Technique | Web Characterization

Related Content

» Host-IP clustering technique for deep web characterization

» Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration

» Acquisition of OWL DL Axioms from Lexical Resources

» Exploring the academic invisible web

» Word Sense Disambiguation by Web Mining for Word Co-occurrence Probabilities

» Autonomy for SOHO Ground Operations

» Functional Validation in Grid Computing

» Mining templates from search result records of search engines

» Fine-Grain Morphological Analyzer and Part-of-Speech Tagger for Arabic Text

Post Info

Added	18 Dec 2011
Updated	18 Dec 2011
Type	Journal
Year	2011
Where	DEXA
Authors	Denis Shestakov
