Sciweavers

41 search results - page 2 / 9
» Building Content Clusters Based on Modelling Page Pairs
Sort
View
WWW
2010
ACM
13 years 11 months ago
The paths more taken: matching DOM trees to search logs for accurate webpage clustering
An unsupervised clustering of the webpages on a website is a primary requirement for most wrapper induction and automated data extraction methods. Since page content can vary dras...
Deepayan Chakrabarti, Rupesh R. Mehta
WWW
2007
ACM
14 years 5 months ago
A link classification based approach to website topic hierarchy generation
Hierarchical models are commonly used to organize a Website's content. A Website's content structure can be represented by a topic hierarchy, a directed tree rooted at a...
Nan Liu, Christopher C. Yang
NIPS
2000
13 years 6 months ago
The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity
We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is...
David A. Cohn, Thomas Hofmann
HT
2005
ACM
13 years 10 months ago
As we may perceive: inferring logical documents from hypertext
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
Pavel Dmitriev, Carl Lagoze, Boris Suchkov
SAINT
2003
IEEE
13 years 10 months ago
Bayesian Analysis of Online Newspaper Log Data
In this paper we address the problem of analyzing web log data collected at a typical online newspaper site. We propose a two-way clustering technique based on probability theory....
Hannes Wettig, Jussi Lahtinen, Tuomas Lepola, Petr...