Sciweavers
Search results for "Deep web data extraction"
WWW 2010 (ACM)
Not so creepy crawler: easy crawler generation with standard XML queries
Web crawlers are increasingly used for focused tasks such as the extraction of data from Wikipedia or the analysis of social networks like last.fm. In these cases, pages are far m...
Franziska von dem Bussche, Klara A. Weiand, Benedi...
WWW 2005 (ACM)
Fully automatic wrapper generation for search engines
When a query is submitted to a search engine, the search engine returns a dynamically generated result page containing the result records, each of which usually consists of a link...
Hongkun Zhao, Weiyi Meng, Zonghuan Wu, Vijay Ragha...
KDD 2008 (ACM)
Information extraction from Wikipedia: moving down the long tail
Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-...
Fei Wu, Raphael Hoffmann, Daniel S. Weld
AIRWEB 2007 (Springer)
Extracting Link Spam using Biased Random Walks from Spam Seed Sets
Link spam deliberately manipulates hyperlinks between web pages in order to unduly boost the search engine ranking of one or more target pages. Link-based ranking algorithms such ...
Baoning Wu, Kumar Chellapilla
FGCS 2007
From bioinformatic web portals to semantically integrated Data Grid networks
We propose a semi-automated method for redeploying bioinformatic databases indexed in a Web portal as a decentralized, semantically integrated and service-oriented Data Grid. We g...
Adriana Budura, Philippe Cudré-Mauroux, Kar...