As the web expands exponentially, the need to put some order to its content becomes apparent. Hypertext categorization, that is the automatic classification of web documents into ...
In this paper, we study the problem of learning block classification models to estimate block functions. We distinguish general models, which are learned across multiple sites, an...
Most classification algorithms are best at categorizing the Web documents into a few categories, such as the top two levels in the Open Directory Project. Such a classification me...
Previously topic models such as PLSI (Probabilistic Latent Semantic Indexing) and LDA (Latent Dirichlet Allocation) were developed for modeling the contents of plain texts. Recent...
We investigates language models for informational and navigational web search. Retrieval on the web is a task that differs substantially from ordinary ad hoc retrieval. We perfor...