Sciweavers

20 search results - page 1 / 4
HT
2003
ACM
13 years 9 months ago
Untangling compound documents on the web
Most text analysis is designed to deal with the concept of a “document”, namely a cohesive presentation of thought on a unifying subject. By contrast, individual nodes on the ...
Nadav Eiron, Kevin S. McCurley
SIGDOC
1994
ACM
13 years 8 months ago
Untangling the World-Wide Web
Liam Relihan, Tony Cahill, Michael G. Hinchey
WWW
2005
ACM
13 years 10 months ago
Finding the boundaries of information resources on the web
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
Pavel Dmitriev, Carl Lagoze, Boris Suchkov
CIKM
1999
Springer
13 years 8 months ago
Word Segmentation and Recognition for Web Document Framework
It is observed that a better approach to Web information understanding is to base it on the document framework, which mainly consists of (i) the title and the URL name of the pag...
Chi-Hung Chi, Chen Ding, Andrew Lim
WWW
2008
ACM
14 years 5 months ago
As we may perceive: finding the boundaries of compound documents on the web
This paper considers the problem of identifying compound documents (cDocs) on the Web: groups of web pages that in aggregate constitute semantically coherent information entities...
Pavel Dmitriev