Sciweavers

43 search results - page 8 / 9
» Crawling the Content Hidden Behind Web Forms
KDD
2008
ACM
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
CIKM
2009
Springer
An empirical study on using hidden Markov model for search interface segmentation
This paper describes a hidden Markov model (HMM) based approach to perform search interface segmentation. Automatic processing of an interface is a must to access the invisible co...
Ritu Khare, Yuan An
SSDBM
2008
IEEE
Query Planning for Searching Inter-dependent Deep-Web Databases
Increasingly, many data sources appear as online databases, hidden behind query forms, thus forming what is referred to as the deep web. It is desirable to have systems that can pr...
Fan Wang, Gagan Agrawal, Ruoming Jin
IPM
2007
p2pDating: Real life inspired semantic overlay networks for Web search
We consider a network of autonomous peers forming a logically global but physically distributed search engine, where every peer has its own local collection generated by independe...
Josiane Xavier Parreira, Sebastian Michel, Gerhard...
TOIS
2008
Classification-aware hidden-web text database selection
Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over multip...
Panagiotis G. Ipeirotis, Luis Gravano