Sciweavers

» From HTML documents to web tables and rules
WWW
2010
ACM
15 years 6 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han
DOCENG
2007
ACM
15 years 3 months ago
Elimination of junk document surrogate candidates through pattern recognition
A surrogate is an object that stands for a document and enables navigation to that document. Hypermedia is often represented with textual surrogates, even though studies have show...
Eunyee Koh, Daniel Caruso, Andruid Kerne, Ricardo ...
CIKM
2006
Springer
15 years 3 months ago
Multi-evidence, multi-criteria, lazy associative document classification
We present a novel approach for classifying documents that combines different pieces of evidence (e.g., textual features of documents, links, and citations) transparently, through...
Adriano Veloso, Wagner Meira Jr., Marco Cristo, Ma...
WWW
2009
ACM
16 years 14 days ago
Characterizing insecure JavaScript practices on the web
JavaScript is an interpreted programming language most often used for enhancing webpage interactivity and functionality. It has powerful capabilities to interact with webpage docu...
Chuan Yue, Haining Wang
DSN
2009
IEEE
15 years 6 months ago
Report generation for simulation traces with Traviando
Any model-based evaluation of the dependability of a system requires validation and verification to justify that its results are meaningful. Modern modeling frameworks enable us ...
Peter Kemper