Sciweavers

19 search results - page 4 / 4
» Incremental web page template detection
Sort
View
ICEIS
2005
IEEE
14 years 1 days ago
Change Detection and Maintenance of an XML Web Warehouse
The World Wide Web contains a huge and increasing volume of information. The web warehouse is an efficient and effective means to facilitate utilization of information on the Web,...
Ching-Ming Chao
AIRWEB
2006
Springer
13 years 10 months ago
Tracking Web Spam with Hidden Style Similarity
Automatically generated content is ubiquitous in the web: dynamic sites built using the three-tier paradigm are good examples (e.g. commercial sites, blogs and other sites powered...
Tanguy Urvoy, Thomas Lavergne, Pascal Filoche
CIKM
2008
Springer
13 years 8 months ago
Achieving both high precision and high recall in near-duplicate detection
To find near-duplicate documents, fingerprint-based paradigms such as Broder's shingling and Charikar's simhash algorithms have been recognized as effective approaches a...
Lian'en Huang, Lei Wang, Xiaoming Li
WWW
2009
ACM
14 years 7 months ago
Extracting article text from the web with maximum subsequence segmentation
Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...
Jeff Pasternack, Dan Roth