Enterprises provide professionally authored content about their products/services in different languages for use in web sites and customer care. For customer care, personalization...
Traditionally, search engines have ignored the reading difficulty of documents and the reading proficiency of users in computing a document ranking. This is one reason why Web se...
Kevyn Collins-Thompson, Paul N. Bennett, Ryen W. W...
Information extraction (IE) from semi-structured Web documents is a critical issue for information integration systems on the Internet. Previous work in wrapper induction aims to so...
We consider the paging problem where the pages have varying size. This problem has applications to page replacement policies for caches containing World Wide Web documents. We con...
There is a significant need to extract and analyse the text in images on Web documents, for effective indexing, semantic analysis and even presentation by non-visual means (e.g....