In pay-per-click sponsored search auctions, which are currently
used extensively by search engines, the auction for a keyword
involves a certain number of advertisers (say k) competing for
available slots (say m) to display their ads.
This auction is typically conducted for a number of rounds
(say T). There is a click probability μij associated with
each agent-slot pair (i, j). The goal of the search engine is to
maximize the social welfare of the advertisers, that is, the sum
of the advertisers' values. The search engine knows neither
the true values that advertisers place on a click on their
respective ads nor the click probabilities μij.
A key problem for the search engine therefore is to learn
these click probabilities during the T rounds of the auction
and also to ensure that the auction mechanism is truthful.
Mechanisms for addressing such learning and incentives
issues have recently been introduced and are aptly referred to
as multi-armed-bandit ...
Akash Das Sarma, Sujit Gujar, Y. Narahari
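
The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' mechanism: it only shows how click probabilities might be estimated from observed clicks and how a welfare-maximizing assignment of advertisers to slots could then be found by brute force. All names (`estimate_mu`, `expected_welfare`, `best_allocation`) are hypothetical, and the sketch assumes that welfare is measured in expectation as the sum of μij times advertiser i's value over the assigned pairs, and that values are reported truthfully.

```python
from itertools import permutations


def estimate_mu(clicks, shows):
    """Empirical click probabilities: clicks[i][j] / shows[i][j].

    Pairs that were never shown get an estimate of 0.0 (an assumption
    made here only to avoid division by zero).
    """
    return [
        [c / s if s > 0 else 0.0 for c, s in zip(click_row, show_row)]
        for click_row, show_row in zip(clicks, shows)
    ]


def expected_welfare(allocation, values, mu):
    """Expected welfare of an allocation.

    allocation[j] is the index of the advertiser placed in slot j.
    """
    return sum(mu[i][j] * values[i] for j, i in enumerate(allocation))


def best_allocation(values, mu, m):
    """Brute-force search over all ways to fill m slots with distinct advertisers."""
    k = len(values)
    return max(
        permutations(range(k), m),
        key=lambda alloc: expected_welfare(alloc, values, mu),
    )


# Illustrative numbers only: k = 3 advertisers, m = 2 slots.
values = [4.0, 2.0, 1.0]
mu = [[0.5, 0.2], [0.4, 0.3], [0.1, 0.1]]
alloc = best_allocation(values, mu, m=2)
print(alloc, expected_welfare(alloc, values, mu))
```

The brute-force search is exponential in m and is used here only for clarity; a real mechanism would also have to handle strategic value reports, which this sketch ignores.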