Search Sciweavers | Sciweavers

219 search results - page 10 / 44

» Web page language identification based on URLs


AUSDM
2006
Springer

160 views, Data Mining

Extraction of Flat and Nested Data Records from Web Pages

15 years 9 months ago

Download crpit.com

This paper studies the problem of identification and extraction of flat and nested data records from a given web page. With the explosive growth of information sources ...

Siddu P. Algur, P. S. Hiremath



WWW
2003
ACM

139 views, Internet Technology

Detecting Near-replicas on the Web by Content and Hyperlink Analysis

16 years 6 months ago

Download nautilus.dii.unisi.it

The presence of replicas or near-replicas of documents is very common on the Web. Documents may be replicated completely or partially for different reasons (versions, mirrors, etc...

Ernesto Di Iorio, Michelangelo Diligenti, Marco Go...



LREC
2008

169 views, Education

A Large-Scale Web Data Collection as a Natural Language Processing Infrastructure

15 years 7 months ago

Download www.lrec-conf.org

In recent years, language resources acquired from the Web are released, and these data improve the performance of applications in several NLP tasks. Although the language resource...

Keiji Shinzato, Daisuke Kawahara, Chikara Hashimot...



ACISP
2009
Springer

119 views, Security Privacy

Measurement Study on Malicious Web Servers in the .nz Domain

15 years 10 months ago

Download homepages.ecs.vuw.ac.nz

Client-side attacks have become an increasing problem on the Internet today. Malicious web pages launch so-called drive-by-download attacks that are capable of gaining complete contro...

Christian Seifert, Vipul Delwadia, Peter Komisarcz...



KDD
2007
ACM

193 views, Data Mining

Joint optimization of wrapper generation and template detection

16 years 6 months ago

Download www.cse.psu.edu

Many websites have large collections of pages generated dynamically from an underlying structured source like a database. The data of a category are typically encoded into similar...

Shuyi Zheng, Ruihua Song, Ji-Rong Wen, Di Wu
