HTML documents composed of frames can be difficult to write correctly. We demonstrate a technique that can be used by authors manually creating HTML documents (or by document edit...
Abstract. A new web content structure based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and auto...
Information extraction from HTML pages has been conventionally treated as plain text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHT...
Over the past decade the Internet has evolved into the largest public community in the world. It provides a wealth of data content and services in almost every field of science, t...
Semi-structured data such as XML and HTML is attracting considerable attention. It is important to develop various kinds of data mining techniques that can handle semistructured d...