
EMNLP
2009

Mining Search Engine Clickthrough Log for Matching N-gram Features

User clicks on a URL in response to a query are extremely useful predictors of the URL's relevance to that query. Exact-match click features tend to suffer from severe data sparsity in web ranking. This sparsity is particularly pronounced for new URLs and long queries, where each distinct query-URL pair rarely occurs. To remedy this, we present a set of straightforward yet informative query-URL n-gram features that allow limited user click data to generalize to large numbers of unseen query-URL pairs. The method is motivated by techniques used in the NLP community for dealing with unseen words. We find interesting regularities across queries and their preferred destination URLs; for example, queries containing "form" tend to lead to clicks on URLs containing "pdf". We evaluate our new query-URL features on a web search ranking task and obtain improvements that are statistically significant at a p-value < 0.0001 l...
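The idea of backing off from exact query-URL pairs to n-gram pairs can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the choice of unigrams and bigrams, the raw click-count score, and the names `tokenize`, `ngrams`, `build_click_table`, and `match_score` are all assumptions made for illustration, and the click log is invented toy data.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (an assumed tokenizer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def build_click_table(click_log, n_max=2):
    """Aggregate clicks over every (query n-gram, URL n-gram) pair in the log."""
    clicks = Counter()
    for query, url, n_clicks in click_log:
        for q in ngrams(tokenize(query), n_max):
            for u in ngrams(tokenize(url), n_max):
                clicks[(q, u)] += n_clicks
    return clicks


def match_score(query, url, clicks, n_max=2):
    """Sum the click mass of all n-gram pairs shared with the log.

    An unseen query-URL pair can still score above zero when its
    n-grams co-occurred in clicked pairs elsewhere in the log.
    """
    return sum(
        clicks[(q, u)]
        for q in ngrams(tokenize(query), n_max)
        for u in ngrams(tokenize(url), n_max)
    )


# Toy click log: (query, clicked URL, click count). Hypothetical data.
log = [
    ("tax form", "http://example.gov/forms/w4.pdf", 5),
    ("visa application form", "http://example.org/visa.pdf", 3),
]
clicks = build_click_table(log)

# The exact pair below never appears in the log, yet it shares the
# ("form", "pdf") n-gram pair with both logged entries.
unseen_score = match_score("passport form", "http://example.net/passport.pdf", clicks)
```

In this sketch every URL token counts, including the scheme and host, so the raw score is noisy. A practical feature set would presumably normalize the counts and filter uninformative n-grams, but the truncated abstract does not say how.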
Huihsin Tseng, Longbin Chen, Fan Li, Ziming Zhuang
Added 17 Feb 2011
Updated 17 Feb 2011
Type Journal
Year 2009
Where EMNLP
Authors Huihsin Tseng, Longbin Chen, Fan Li, Ziming Zhuang, Lei Duan, Belle L. Tseng