Web pages are usually highly structured documents. In some documents, content with different functionality is laid out in blocks, some merely supporting the main discourse. In ot...
Originally conceived as a "naive" baseline experiment using traditional n-gram language models as classifiers, the NCLEANER system has turned out to be a fast and lightw...
The demand of browsing information from general Web pages using a mobile phone is increasing. However, since the majority of Web pages on the Internet are optimized for browsing f...
Gen Hattori, Keiichiro Hoashi, Kazunori Matsumoto,...
Expressing web page content in a way that computers can understand is the key to a semantic web. Generating ontological information from the web automatically using machine learni...
When we describe a Web page informally, we often use phrases like it looks like a newspaper site", there are several unordered lists" or it's just a collection of li...
Isabel F. Cruz, Slava Borisov, Michael A. Marks, T...