Today, Web pages are usually accessed using text search engines, whereas documents stored in the deep Web are accessed through domain-specific Web portals. These portals rely on e...
Co-training is a semi-supervised technique that allows classifiers to learn with fewer labelled documents by taking advantage of the more abundant unclassified documents. However, ...
In this paper, we propose to use database technology to improve performance of web proxy servers. We view the cache at a proxy server as a web warehouse with data organized in a h...
Facing the retrieval problem according to the overwhelming set of documents online the adaptation of text categorization to web units has recently been pushed. The aim is to utiliz...
We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is magnitudes faster than typical web page classific...