The Web is experiencing exponential growth in the use of weblogs, or blogs: websites containing dated journal-style entries. Blog entries are generally organised using informall...
Conor Hayes, Paolo Avesani, Sriharsha Veeramachane...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
Collaborative tagging systems have recently become very popular. Web users use freely-chosen tags to describe shared resources, resulting in a folksonomy. One problem of folksono...
Ching-man Au Yeung, Nicholas Gibbins, Nigel Shadbo...
With the rapid and dramatic increase in web feeds published by different publishers, providers or websites via Really Simple Syndication (RSS) and Atom, users cannot be expected t...