There is a significant need to recognise the text in images on web pages, both for effective indexing and for presentation by non-visual means (e.g., audio). This paper presents a...
The subject of this paper is the semi-automatic construction of taxonomies over the Web. We address the problem of discovering high-quality resources that belong in a particular n...
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopala...
Conventionally, Web pages have been recognized as documents described by HTML. Image data, such as photographs, logos, maps, illustrations, and decorated text, have been treated a...
In this paper, we present Structon, a novel approach that uses Web mining together with inference and IP traceroute to geolocate IP addresses with significantly better accuracy t...
Chuanxiong Guo, Yunxin Liu, Wenchao Shen, Helen J....
Recently, there has been increased interest in the retrieval and integration of hidden Web data with a view to leveraging high-quality information available in online databases. Alt...