Search Sciweavers | Sciweavers

311 search results - page 10 / 63

» Cleaning Web Pages for Effective Web Content Mining

ACSW
2004

192 views, Security Privacy

Discovering Parallel Text from the World Wide Web

15 years 1 month ago

Download crpit.com

A parallel corpus is a rich linguistic resource for various multilingual text management tasks, including cross-lingual text retrieval, multilingual computational linguistics and mul...

Jisong Chen, Rowena Chau, Chung-Hsing Yeh

DOCENG
2009
ACM

139 views, Document Analysis

Web document text and images extraction using DOM analysis and natural language processing

15 years 6 months ago

Download www.hpl.hp.com

Web Document Text and Images Extraction using DOM Analysis and Natural Language Processing. Parag Mulendra Joshi, Sam Liu. HP Laboratories, HPL-2009-187. Web page text extraction,...

Parag Mulendra Joshi, Sam Liu

COLCOM
2008
IEEE

121 views, Distributed And Parallel Com...

Web Canary: A Virtualized Web Browser to Support Large-Scale Silent Collaboration in Detecting Malicious Web Sites

15 years 1 month ago

Download mason.gmu.edu

Abstract. Malicious Web content poses a serious threat to the Internet, organizations and users. Current approaches to detecting malicious Web content employ high-powered honey cli...

Jiang Wang, Anup K. Ghosh, Yih Huang

ICDAR
2003
IEEE

127 views, Document Analysis

Identifying Story and Preview Images in News Web Pages

15 years 4 months ago

Download www.cse.salford.ac.uk

The World Wide Web provides an increasingly powerful and popular publication mechanism. Web documents often contain a large number of images serving various different purposes. Th...

Jianying Hu, Amit Bagga

KDD
2006
ACM

185 views, Data Mining

Understanding Content Reuse on the Web: Static and Dynamic Analyses

16 years 7 hours ago

Download homepages.dcc.ufmg.br

Abstract. In this paper we present static and dynamic studies of duplicate and near-duplicate documents in the Web. The static and dynamic studies involve the analysis of similar c...

Ricardo A. Baeza-Yates, Álvaro R. Pereira J...
