Sciweavers

311 search results - page 39 / 63
» Cleaning Web Pages for Effective Web Content Mining
ICDM
2007
IEEE
Data Mining
15 years 3 months ago
Lightweight Distributed Trust Propagation
Using mobile devices, such as smart phones, people may create and distribute different types of digital content (e.g., photos, videos). One of the problems is that digital content...
Daniele Quercia, Stephen Hailes, Licia Capra
ACSAC
2010
IEEE
14 years 7 months ago
FIRM: capability-based inline mediation of Flash behaviors
The wide use of Flash technologies makes the security risks posed by Flash content an increasingly serious issue. Such risks cannot be effectively addressed by the Flash player, w...
Zhou Li, XiaoFeng Wang
SIGIR
2009
ACM
15 years 4 months ago
Web derived pronunciations for spoken term detection
Indexing and retrieval of speech content in various forms such as broadcast news, customer care data and on-line media has gained a lot of interest for a wide range of application...
Dogan Can, Erica Cooper, Arnab Ghoshal, Martin Jan...
WWW
2008
ACM
15 years 10 months ago
IRLbot: scaling to 6 billion pages and beyond
This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with ...
Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, Dmit...
PVLDB
2008
14 years 9 months ago
Google's Deep Web crawl
The Deep Web, i.e., content hidden behind HTML forms, has long been acknowledged as a significant gap in search engine coverage. Since it represents a large portion of the structu...
Jayant Madhavan, David Ko, Lucja Kot, Vignesh Gana...