ASYMPTOTICALLY OPTIMAL MULTI-ARMED BANDIT POLICIES UNDER A COST CONSTRAINT
Published online by Cambridge University Press: 05 October 2016
Abstract
We consider the multi-armed bandit problem under a cost constraint. Successive samples from each population are i.i.d. with unknown distribution, and each sample incurs a known, population-dependent cost. The objective is to design an adaptive sampling policy that maximizes the expected sum of the n samples while the average cost does not exceed a given bound on every sample path. We establish an asymptotic lower bound on the regret of feasible uniformly fast convergent policies and construct a class of policies that achieve this bound. We also provide their explicit form for Normal distributions with unknown means and known variances.
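To make the problem setup concrete, the following is a minimal simulation sketch of the constrained bandit described above: Normal arms with unknown means, known variances, and known per-sample costs, where the running average cost must stay below a bound on every sample path. The index rule used here is a generic UCB-style heuristic restricted to cost-feasible arms; it is an illustrative assumption, not the asymptotically optimal policy constructed in the paper. All parameter values (K, means, sigmas, costs, c0, n) are hypothetical.

```python
import numpy as np

# Illustrative sketch of the constrained bandit setup (NOT the paper's policy):
# K arms, Normal rewards with unknown means and known variances, known costs;
# the running average cost must not exceed c0 on the sample path.
rng = np.random.default_rng(0)

K = 3
means = np.array([0.5, 0.8, 1.0])   # unknown to the policy
sigmas = np.array([1.0, 1.0, 1.0])  # known standard deviations
costs = np.array([0.2, 0.5, 1.0])   # known per-sample costs
c0 = 0.6                            # average-cost bound
n = 10_000

counts = np.zeros(K, dtype=int)
sums = np.zeros(K)
total_cost = 0.0
total_reward = 0.0

for t in range(1, n + 1):
    if t <= K:
        # Sample each arm once, cheapest first, so the running
        # average cost stays within c0 during initialization.
        a = int(np.argsort(costs)[t - 1])
    else:
        mu_hat = sums / counts
        # UCB index for Normal arms with known variance
        ucb = mu_hat + sigmas * np.sqrt(2.0 * np.log(t) / counts)
        # Restrict to arms whose cost keeps the average cost <= c0.
        feasible = (total_cost + costs) / t <= c0
        if feasible.any():
            a = int(np.argmax(np.where(feasible, ucb, -np.inf)))
        else:
            a = int(np.argmin(costs))  # fall back to the cheapest arm
    x = rng.normal(means[a], sigmas[a])
    counts[a] += 1
    sums[a] += x
    total_cost += costs[a]
    total_reward += x

print(f"average reward {total_reward / n:.3f}, average cost {total_cost / n:.3f}")
```

With these hypothetical parameters the heuristic cannot always play the highest-mean (and most expensive) arm, since doing so would violate the cost bound; it instead mixes arms so the average cost remains at most c0, which is the trade-off the paper's regret analysis quantifies.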
- Type: Research Article
- Information: Probability in the Engineering and Informational Sciences, Volume 31, Issue 3, July 2017, pp. 284-310
- Copyright: © Cambridge University Press 2016