We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
Nowadays, information is primarily searched on the WWW. From a user perspective, the readability is an important criterion for measuring the accessibility and thereby the quality ...
This paper presents our work on the detection of temporal information in web pages. The pages examined within the scope of this study were taken from the tourism sector and the te...
A long-standing goal of Web research has been to construct a unified Web knowledge base. Information extraction techniques have shown good results on Web inputs, but even most dom...
Michael J. Cafarella, Jayant Madhavan, Alon Y. Hal...
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...