Sciweavers

» Data quality in web archiving
ACL
2006
14 years 11 months ago
A DOM Tree Alignment Model for Mining Parallel Data from the Web
This paper presents a new web mining scheme for parallel data acquisition. Based on the Document Object Model (DOM), a web page is represented as a DOM tree. Then a DOM tree align...
Lei Shi, Cheng Niu, Ming Zhou, Jianfeng Gao
DEXAW
2002
IEEE
15 years 2 months ago
Data Warehouse Clustering on the Web
In collaborative e-commerce environments, interoperation is a prerequisite for data warehouses that are physically scattered along the value chain. Adopting system and information...
Aristides Triantafillakis, Panagiotis Kanellis, Dr...
WWW
2007
ACM
15 years 10 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
SDM
2008
SIAM
14 years 11 months ago
A Spamicity Approach to Web Spam Detection
Web spam, which refers to any deliberate actions bringing to selected web pages an unjustifiable favorable relevance or importance, is one of the major obstacles for high quality ...
Bin Zhou, Jian Pei, ZhaoHui Tang
BMCBI
2008
14 years 9 months ago
ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses
Background: A survey of microarray databases reveals that most of the repository contents and data models are heterogeneous (i.e., data obtained from different chip manufacturers)...
Todd H. Stokes, J. T. Torrance, Henry Li, May D. W...