Many documents on the Web are written in a weakly structured format. Because of their weak semantics and the heterogeneity of their formats, the information conveyed by...
Advertising has become an integral and inseparable part of the World Wide Web. However, neither public auditing nor monitoring mechanisms yet exist in this emerging area. In t...
Yong Wang, Daniel Burgener, Aleksandar Kuzmanovic,...
Abstract: Recently a growing demand has arisen for methods for the development of small- and medium-scale Web Information Systems (WIS). Web applications are being built in a rapidly...
Classification of documents by genre is typically done using either linguistic analysis or term-frequency-based techniques. The former provides better classification accuracy than...
Many websites have a hierarchical organization of content. This organization may be quite different from the organization expected by visitors to the website. In particular, it is...