Sciweavers

» Crawling the Hidden Web
KDD
2008
ACM
16 years 8 days ago
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
ISW
2009
Springer
15 years 6 months ago
Automated Spyware Collection and Analysis
Various online studies on the prevalence of spyware attest overwhelming numbers (up to 80%) of infected home computers. However, the term spyware is ambiguous and can refer to anyt...
Andreas Stamminger, Christopher Kruegel, Giovanni ...
SIGIR
2005
ACM
15 years 5 months ago
Server selection methods in hybrid portal search
The TREC .GOV collection makes a valuable web testbed for distributed information retrieval methods because it is naturally partitioned and includes 725 web-oriented queries with ...
David Hawking, Paul Thomas
WWW
2008
ACM
16 years 16 days ago
Efficiently finding web services using a clustering semantic approach
Efficiently finding Web services on the Web is a challenging issue in service-oriented computing. Currently, UDDI is a standard for publishing and discovery of Web services, and U...
Jiangang Ma, Yanchun Zhang, Jing He
WWW
2001
ACM
16 years 16 days ago
Effective Web data extraction with standard XML technologies
We discuss the problem of Web data extraction and describe an XML-based methodology whose goal extends far beyond simple "screen scraping." An ideal data extraction proc...
Jussi Myllymaki