A novel web surfer model, where the transitions between web pages are fuzzy quantities, is proposed in this article. Such a model is appropriate when the links between pages are i...
Currently, most of the web is inaccessible to mobile users. Few pages are designed with anything other than the desktop PC in mind. The growing number of mobile devices with diffe...
We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
Link spam is used to increase the ranking of certain target web pages by misleading the connectivity-based ranking algorithms in search engines. In this paper we study how web pag...
In this poster we present an overview of the techniques we used to develop and evaluate a text categorisation system for the PRINCIP project, which sets out to automatically classi...
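The Text-to-Tag Ratio mentioned in the content-extraction abstract above can be illustrated with a minimal sketch. It assumes the ratio is computed per line as the number of non-tag characters divided by the number of HTML tags on that line, with the text length used directly when a line has no tags. The names `text_to_tag_ratio` and `extract_content`, the regular expression, and the fixed `threshold` are illustrative assumptions. The abstract does not say how content lines are selected from the ratios, so the threshold is only a stand-in.

```python
import re

# Assumption: a "tag" is anything between '<' and '>'. A real HTML parser
# would be more robust (comments, scripts, and attributes containing '>').
TAG_RE = re.compile(r"<[^>]*>")


def text_to_tag_ratio(line: str) -> float:
    """Return non-tag characters per tag for one line of HTML."""
    tag_count = len(TAG_RE.findall(line))
    text_length = len(TAG_RE.sub("", line).strip())
    if tag_count == 0:
        # No tags on the line: use the text length itself as the ratio.
        return float(text_length)
    return text_length / tag_count


def extract_content(html: str, threshold: float = 10.0) -> list[str]:
    """Keep the text of lines whose ratio meets a hypothetical threshold."""
    kept = []
    for line in html.splitlines():
        if text_to_tag_ratio(line) >= threshold:
            kept.append(TAG_RE.sub("", line).strip())
    return kept
```

Under these assumptions, navigation lines with many short links get a low ratio, while paragraph lines with long runs of text get a high one. This is why the ratio can separate content from boilerplate without relying on site-specific HTML cues.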