Search Sciweavers | Sciweavers

19 search results - page 4 / 4

» Incremental web page template detection

ICEIS
2005
IEEE

Change Detection and Maintenance of an XML Web Warehouse

The World Wide Web contains a huge and increasing volume of information. The web warehouse is an efficient and effective means to facilitate utilization of information on the Web,...

Ching-Ming Chao

AIRWEB
2006
Springer

Tracking Web Spam with Hidden Style Similarity

Automatically generated content is ubiquitous in the web: dynamic sites built using the three-tier paradigm are good examples (e.g. commercial sites, blogs and other sites powered...

Tanguy Urvoy, Thomas Lavergne, Pascal Filoche

CIKM
2008
ACM

Achieving both high precision and high recall in near-duplicate detection

To find near-duplicate documents, fingerprint-based paradigms such as Broder's shingling and Charikar's simhash algorithms have been recognized as effective approaches a...

Lian'en Huang, Lei Wang, Xiaoming Li

WWW
2009
ACM

Extracting article text from the web with maximum subsequence segmentation

Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...

Jeff Pasternack, Dan Roth
