Sciweavers

» Cleaning Web Pages for Effective Web Content Mining
AIRWEB
2007
Springer
15 years 3 months ago
A Taxonomy of JavaScript Redirection Spam
Redirection spam presents a web page with false content to a crawler for indexing, but automatically redirects the browser to a different web page. Redirection is usually immediat...
Kumar Chellapilla, Alexey Maykov
ICDE
2010
IEEE
15 years 9 months ago
Fuzzy Matching of Web Queries to Structured Data
Recognizing the alternative ways people use to reference an entity is important for many Web applications that query structured data. In such applications, there is often a mismat...
Tao Cheng, Hady Wirawan Lauw, Stelios Paparizos
WWW
2006
ACM
15 years 10 months ago
CWS: a comparative web search system
In this paper, we define and study a novel search problem: Comparative Web Search (CWS). The task of CWS is to seek relevant and comparative information from the Web to help users...
Jian-Tao Sun, Xuanhui Wang, Dou Shen, Hua-Jun Zeng...
WWW
2007
ACM
15 years 10 months ago
Towards domain-independent information extraction from web tables
Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...
Bernhard Krüpl, Bernhard Pollak, Marcus Herzo...
CIKM
2005
Springer
15 years 3 months ago
Fast webpage classification using URL features
We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is magnitudes faster than typical web page classific...
Min-Yen Kan, Hoang Oanh Nguyen Thi