Sciweavers

219 search results - page 10 / 44
» Web page language identification based on URLs
AUSDM
2006
Springer
15 years 4 months ago
Extraction of Flat and Nested Data Records from Web Pages
This paper studies the problem of identification and extraction of flat and nested data records from a given web page. With the explosive growth of information sources ...
Siddu P. Algur, P. S. Hiremath
WWW
2003
ACM
16 years 1 month ago
Detecting Near-replicas on the Web by Content and Hyperlink Analysis
The presence of replicas or near-replicas of documents is very common on the Web. Documents may be replicated completely or partially for different reasons (versions, mirrors, etc...
Ernesto Di Iorio, Michelangelo Diligenti, Marco Go...
LREC
2008
15 years 1 months ago
A Large-Scale Web Data Collection as a Natural Language Processing Infrastructure
In recent years, language resources acquired from the Web have been released, and these data improve the performance of applications in several NLP tasks. Although the language resource...
Keiji Shinzato, Daisuke Kawahara, Chikara Hashimot...
ACISP
2009
Springer
15 years 5 months ago
Measurement Study on Malicious Web Servers in the .nz Domain
Client-side attacks have become an increasing problem on the Internet today. Malicious web pages launch so-called drive-by-download attacks that are capable of gaining complete contro...
Christian Seifert, Vipul Delwadia, Peter Komisarcz...
KDD
2007
ACM
16 years 23 days ago
Joint optimization of wrapper generation and template detection
Many websites have large collections of pages generated dynamically from an underlying structured source like a database. The data of a category are typically encoded into similar...
Shuyi Zheng, Ruihua Song, Ji-Rong Wen, Di Wu