Sciweavers

1195 search results - page 171 / 239
» Content Based Web Sampling
SIGIR
2004
ACM
15 years 3 months ago
Web-page classification through summarization
Web-page classification is much more difficult than pure-text classification due to a large variety of noisy information embedded in Web pages. In this paper, we propose a new Web...
Dou Shen, Zheng Chen, Qiang Yang, Hua-Jun Zeng, Be...
KDD
2006
ACM
15 years 10 months ago
Event detection from evolution of click-through data
Previous efforts on event detection from the web have focused primarily on web content and structure data, ignoring the rich collection of web log data. In this paper, we propose t...
Qiankun Zhao, Tie-Yan Liu, Sourav S. Bhowmick, Wei...
ICDAR
2009
IEEE
15 years 4 months ago
Scalable Feature Extraction from Noisy Documents
We address metadata recognition in layout-oriented documents. We treat the problem as a classification task and propose a method for automatic extraction of relevant featu...
Loïc Lecerf, Boris Chidlovskii
ICPR
2008
IEEE
15 years 4 months ago
Incremental clustering via nonnegative matrix factorization
Nonnegative matrix factorization (NMF) has been shown to be an efficient clustering tool. However, NMF's batch nature necessitates recomputation of the whole basis set for new samples...
Serhat Selcuk Bucak, Bilge Günsel
ICDE
2007
IEEE
15 years 4 months ago
Load Shedding for Window Joins on Multiple Data Streams
We consider the problem of semantic load shedding for continuous queries containing window joins on multiple data streams and propose a robust approach that is effective with the ...
Yan-Nei Law, Carlo Zaniolo