Sciweavers

Search query: Cleaning Web Pages for Effective Web Content Mining

WSDM 2010, ACM
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...

KDD 2007, ACM
Mining templates from search result records of search engines
Metasearch engines, comparison-shopping and Deep Web crawling applications need to extract search result records enwrapped in result pages returned from search engines in response ...
Hongkun Zhao, Weiyi Meng, Clement T. Yu

SIGSOFT 2008, ACM
Doloto: code splitting for network-bound web 2.0 applications
Modern Web 2.0 applications, such as GMail, Live Maps, Facebook and many others, use a combination of Dynamic HTML, JavaScript and other Web browser technologies commonly referred...
V. Benjamin Livshits, Emre Kiciman

ICDE 2004, IEEE
Improved File Synchronization Techniques for Maintaining Large Replicated Collections over Slow Networks
We study the problem of maintaining large replicated collections of files or documents in a distributed environment with limited bandwidth. This problem arises in a number of impo...
Torsten Suel, Patrick Noel, Dimitre Trendafilov

KDD 2006, ACM
Event detection from evolution of click-through data
Previous efforts on event detection from the web have focused primarily on web content and structure data ignoring the rich collection of web log data. In this paper, we propose t...
Qiankun Zhao, Tie-Yan Liu, Sourav S. Bhowmick, Wei...