Sciweavers

57 search results - page 3 / 12
AIRWEB
2008
Springer
13 years 8 months ago
Exploring linguistic features for web spam detection: a preliminary study
We study the usability of linguistic features in the Web spam classification task. The features were computed on two Web spam corpora, Webspam-Uk2006 and Webspam-Uk2007; we make t...
Jakub Piskorski, Marcin Sydow, Dawid Weiss
CIKM
2006
Springer
13 years 9 months ago
A probabilistic relevance propagation model for hypertext retrieval
A major challenge in developing models for hypertext retrieval is to effectively combine content information with the link structure available in hypertext collections. Although s...
Azadeh Shakery, ChengXiang Zhai
DOCENG
2007
ACM
13 years 10 months ago
Structure and content analysis for HTML medical articles: a hidden Markov model approach
We describe ongoing research on segmenting and labeling HTML medical journal articles. In contrast to existing approaches in which HTML tags usually serve as strong indicators, we...
Jie Zou, Daniel X. Le, George R. Thoma
CIKM
2008
Springer
13 years 8 months ago
Book search: indexing the valuable parts
With massive book digitization efforts underway, there is a need for developing effective book retrieval strategies. This paper explores the relative contribution of different par...
Walid Magdy, Kareem Darwish
AIRS
2005
Springer
13 years 11 months ago
Subsite Retrieval: A Novel Concept for Topic Distillation
Topic distillation is one of the main information needs when users search the Web. In previous approaches to topic distillation, the single page was treated as the basic searching ...
Tao Qin, Tie-Yan Liu, Xu-Dong Zhang, Guang Feng, W...