Sciweavers

24 search results - page 1 / 5
WWW
2003
ACM
14 years 5 months ago
DOM-based content extraction of HTML documents
Web pages often contain clutter (such as pop-up ads, unnecessary images and extraneous links) around the body of an article that distracts a user from actual content. Extraction o...
Suhit Gupta, Gail E. Kaiser, David Neistadt, Peter...
WWW
2005
ACM
14 years 5 months ago
Extracting context to improve accuracy for HTML content extraction
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
Suhit Gupta, Gail E. Kaiser, Salvatore J. Stolfo
ISEC
2001
Springer
13 years 9 months ago
i-Cube: A Tool-Set for the Dynamic Extraction and Integration of Web Data Content
Over the past decade the Internet has evolved into the largest public community in the world. It provides a wealth of data content and services in almost every field of science, t...
Frankie Poon, Kostas Kontogiannis
SIGIR
2005
ACM
13 years 10 months ago
Title extraction from bodies of HTML documents and its application to web page retrieval
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Sh...
APCCM
2009
13 years 5 months ago
Extracting and Modeling the Semantic Information Content of Web Documents to Support Semantic Document Retrieval
Existing HTML mark-up is used only to indicate the structure and lay-out of documents, but not the document semantics. As a result web documents are difficult to be semantically p...
Shahrul Azman Noah, Lailatulqadri Zakaria, Arifah ...