We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
A surrogate is an object that stands for a document and enables navigation to that document. Hypermedia is often represented with textual surrogates, even though studies have show...
Eunyee Koh, Daniel Caruso, Andruid Kerne, Ricardo ...
We present a novel approach for classifying documents that combines different pieces of evidence (e.g., textual features of documents, links, and citations) transparently, through...
Adriano Veloso, Wagner Meira Jr., Marco Cristo, Ma...
JavaScript is an interpreted programming language most often used for enhancing webpage interactivity and functionality. It has powerful capabilities to interact with webpage docu...
Any model-based evaluation of the dependability of a system requires validation and verification to justify that its results are meaningful. Modern modeling frameworks enable us ...