
Evaluating policies for generalized bandits via a notion of duality

Published online by Cambridge University Press:  14 July 2016

J. H. Crosbie*
Affiliation:
University of Newcastle
K. D. Glazebrook*
Affiliation:
University of Newcastle
*
Postal address: Department of Statistics, University of Newcastle, Newcastle upon Tyne NE1 7RU, UK

Abstract

Nash's generalization of Gittins’ classic index result to so-called generalized bandit problems (GBPs) in which returns are dependent on the states of all arms (not only the one which is pulled) has proved important for applications. The index theory for special cases of this model in which all indices are positive is straightforward. However, this is not a natural restriction in practice. An earlier proposal for the general case did not yield satisfactory index-based suboptimality bounds for policies — a central feature of classical Gittins index theory. We develop such bounds via a notion of duality for GBPs which is of independent interest. The index which emerges naturally from this analysis is the reciprocal of the one proposed by Nash.
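For orientation, a brief sketch of the classical index the abstract builds on (illustrative only; the notation below is not taken from the paper): the Gittins index of arm $i$ in state $x$ is
\[
\nu_i(x) \;=\; \sup_{\tau \ge 1} \frac{\mathbb{E}\bigl[\sum_{t=0}^{\tau-1} \beta^{t}\, r_i(X_i(t)) \;\big|\; X_i(0)=x\bigr]}{\mathbb{E}\bigl[\sum_{t=0}^{\tau-1} \beta^{t} \;\big|\; X_i(0)=x\bigr]},
\]
where $\beta \in (0,1)$ is the discount factor, $r_i$ is the reward function of arm $i$, and the supremum is taken over stopping times $\tau \ge 1$. Always pulling an arm of maximal index is optimal for the classical problem. In a generalized bandit problem the return from the pulled arm is also modulated by the states of the idle arms, which is why the index theory, and the suboptimality bounds discussed in the abstract, require separate treatment.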

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 2000


References

Fay, N. A., and Glazebrook, K. D. (1987). On the scheduling of alternative stochastic jobs on a single machine. Adv. Appl. Prob. 19, 955–973.
Fay, N. A., and Glazebrook, K. D. (1989). A general model of the scheduling of alternative tasks on a single machine. Prob. Eng. Inf. Sci. 3, 199–221.
Fay, N. A., and Walrand, J. C. (1991). On approximately optimal index strategies for generalised arm problems. J. Appl. Prob. 28, 602–612.
Gittins, J. C. (1979). Bandit processes and dynamic allocation indices (with discussion). J. R. Statist. Soc. B 41, 148–177.
Gittins, J. C. (1989). Multi-armed Bandit Allocation Indices. John Wiley, Chichester.
Gittins, J. C., and Jones, D. M. (1974). A dynamic allocation index for the sequential design of experiments. In Progress in Statistics, eds Gani, J. and Vincze, I. North-Holland, Amsterdam, pp. 241–266.
Glazebrook, K. D. (1982). On the evaluation of suboptimal strategies for families of alternative bandit processes. J. Appl. Prob. 19, 716–722.
Glazebrook, K. D., and Fay, N. A. (1988). Evaluating strategies for generalised bandit problems. Int. J. Syst. Sci. 19, 1605–1613.
Glazebrook, K. D., and Greatrix, S. (1993). On scheduling influential stochastic tasks on a single machine. Eur. J. Operat. Res. 70, 405–424.
Glazebrook, K. D., and Greatrix, S. (1995). On transforming an index for generalised bandit problems. J. Appl. Prob. 32, 163–182.
Glazebrook, K. D., and Owen, R. W. (1991). New results for generalised bandit problems. Int. J. Syst. Sci. 22, 479–494.
Nash, P. (1973). Optimal allocation of resources between research projects. Ph.D. thesis, University of Cambridge.
Nash, P. (1980). A generalised bandit problem. J. R. Statist. Soc. B 42, 165–169.
Ross, S. M. (1983). Introduction to Stochastic Dynamic Programming. Academic Press, London.