Given only the URL of a web page, can we identify its topic? This is the question that we examine in this paper. Usually, web pages are classified using their content [7], but a U...
These days, billions of Web pages are created with HTML or other markup languages. They share few uniform structures and exhibit a wide variety of authoring styles compared to traditi...
As the Internet grows explosively, search engines play an increasingly important role in helping users access online information effectively. Recently, it has been recognized that ...
Ming Ji, Jun Yan, Siyu Gu, Jiawei Han, Xiaofei He,...
The Web allows users to share their work very effectively, leading to the rapid re-use and remixing of content on the Web, including text, images, and videos. Scientific research d...
Web pages on news and shopping sites often use modular layouts. When used effectively, this practice allows authors to clearly present large amounts of information in a single...