Search Sciweavers | Sciweavers

» Data quality in web archiving

ACL 2006 (Computational Linguistics)

A DOM Tree Alignment Model for Mining Parallel Data from the Web

This paper presents a new web mining scheme for parallel data acquisition. Based on the Document Object Model (DOM), a web page is represented as a DOM tree. Then a DOM tree align...

Lei Shi, Cheng Niu, Ming Zhou, Jianfeng Gao

DEXAW 2002, IEEE (Database)

Data Warehouse Clustering on the Web

In collaborative e-commerce environments, interoperation is a prerequisite for data warehouses that are physically scattered along the value chain. Adopting system and information...

Aristides Triantafillakis, Panagiotis Kanellis, Dr...

WWW 2007, ACM (Internet Technology)

Detecting near-duplicates for web crawling

Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...

Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma

SDM 2008, SIAM (Data Mining)

A Spamicity Approach to Web Spam Detection

Web spam, which refers to any deliberate actions bringing to selected web pages an unjustifiable favorable relevance or importance, is one of the major obstacles for high quality ...

Bin Zhou 0002, Jian Pei, ZhaoHui Tang

BMCBI 2008

ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses

Background: A survey of microarray databases reveals that most of the repository contents and data models are heterogeneous (i.e., data obtained from different chip manufacturers)...

Todd H. Stokes, J. T. Torrance, Henry Li, May D. W...
