An estimation based allocation rule with super-linear regret and finite lock-on time for time-dependent multi-armed bandit proce

The multi-armed bandit (MAB) problem has been an active area of research since the early 1930s. The majority of the literature restricts attention to i.i.d. or Markov reward processes. In this paper, the finite-parameter MAB problem with time-dependent reward processes is investigated. An upper confidence bound (UCB) based index policy, where the index is computed from the maximum-likelihood estimate of the unknown parameter, is proposed. This policy locks on to the optimal arm in finite expected time but has a super-linear regret. As an example, the proposed index policy is used for minimizing prediction error when each arm is an auto-regressive moving average (ARMA) process.
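For readers unfamiliar with index policies, the following is a minimal sketch of the classic UCB1 rule on i.i.d. Bernoulli arms. It is not the policy proposed in the paper, which builds its index from a maximum-likelihood estimate and targets time-dependent rewards. It only illustrates the general idea of pulling the arm with the largest "estimate plus confidence width" index. The function names `ucb1` and `expected_regret`, the arm means, and the horizon are all illustrative assumptions.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run classic UCB1 on Bernoulli arms (an assumed toy setting).

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # pulls per arm
    sums = [0.0] * n_arms   # total reward per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so each sample mean is defined.
            arm = t - 1
        else:
            # Index = sample mean + confidence width (UCB1 form).
            index = [
                sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
                for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=index.__getitem__)
        # Bernoulli reward with the chosen arm's (hypothetical) mean.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


def expected_regret(arm_means, counts):
    """Expected regret: mean reward lost relative to always pulling the best arm."""
    best = max(arm_means)
    return sum((best - m) * c for m, c in zip(arm_means, counts))


if __name__ == "__main__":
    means = [0.2, 0.8]  # hypothetical arm means
    pulls = ucb1(means, horizon=2000)
    print("pulls per arm:", pulls)
    print("expected regret:", expected_regret(means, pulls))
```

In this toy setting the pull counts concentrate on the better arm as the horizon grows. The lock-on and regret properties stated in the abstract concern the paper's own policy and are not demonstrated by this sketch.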

Prokopis C. Prokopiou, Peter E. Caines, Aditya Mahajan

CCECE 2015 | Electrical And Computer Engineering

Post Info

Added	17 Apr 2016
Updated	17 Apr 2016
Type	Journal
Year	2015
Where	CCECE
Authors	Prokopis C. Prokopiou, Peter E. Caines, Aditya Mahajan
