Average optimal policies in Markov decision drift processes with applications to a queueing and a replacement model

Arie Hordijk; Frank A. Van Der Duyn Schouten

doi:10.2307/1426437

Published online by Cambridge University Press: 01 July 2016

Arie Hordijk*
Affiliation: University of Leiden
Frank A. Van Der Duyn Schouten**
Affiliation: Free University, Amsterdam
*Postal address: Department of Mathematics, University of Leiden, Wassenaarseweg 80, Postbus 9512, 2300 RA Leiden, The Netherlands.
**Postal address: Department of Actuarial Sciences and Econometrics, Free University, Postbus 7161, 1007 MC Amsterdam, The Netherlands.

Abstract

Recently the authors introduced the concept of Markov decision drift processes. A Markov decision drift process can be seen as a straightforward generalization of a Markov decision process with continuous time parameter. In this paper we investigate the existence of stationary average optimal policies for Markov decision drift processes. Using a well-known Abelian theorem we derive sufficient conditions, which guarantee that a ‘limit point’ of a sequence of discounted optimal policies with the discounting factor approaching 1 is an average optimal policy. An alternative set of sufficient conditions is obtained for the case in which the discounted optimal policies generate regenerative stochastic processes. The latter set of conditions is easier to verify in several applications. The results of this paper are also applicable to Markov decision processes with discrete or continuous time parameter and to semi-Markov decision processes. In this sense they generalize some well-known results for Markov decision processes with finite or compact action space. Applications to an M/M/1 queueing model and a maintenance replacement model are given. It is shown that under certain conditions on the model parameters the average optimal policy for the M/M/1 queueing model is monotone non-decreasing (as a function of the number of waiting customers) with respect to the service intensity and monotone non-increasing with respect to the arrival intensity. For the maintenance replacement model we prove the average optimality of a bang-bang type policy. Special attention is paid to the computation of the optimal control parameters.
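
The Abelian argument mentioned in the abstract links discounted costs to average costs. As a rough illustration only, the sketch below uses the discrete-time analogue of that idea: for a nonnegative cost stream, the normalised discounted cost (1 - beta) * sum of beta^n * c_n is squeezed between the lower and upper limits of the running averages as beta approaches 1. The cost stream, the function names and the truncation horizon are all hypothetical choices made for this example; none of them come from the paper, which works with continuous-time drift processes.

```python
def cesaro_average(costs):
    """Average cost per step over the finite horizon supplied."""
    return sum(costs) / len(costs)


def abel_average(costs, beta):
    """Normalised discounted cost (1 - beta) * sum_n beta**n * c_n,
    truncated at the length of `costs`."""
    total, weight = 0.0, 1.0
    for c in costs:
        total += weight * c
        weight *= beta
    return (1.0 - beta) * total


# Hypothetical cost stream 0, 2, 0, 2, ... with long-run average 1.
costs = [2.0 * (n % 2) for n in range(200_000)]

average = cesaro_average(costs)
abel = {beta: abel_average(costs, beta) for beta in (0.9, 0.99, 0.999)}

print("long-run average:", average)
for beta, value in abel.items():
    # For this stream the untruncated value is 2 * beta / (1 + beta).
    print("beta =", beta, "normalised discounted cost:", value)
```

For this particular stream the normalised discounted cost equals 2 * beta / (1 + beta) up to a negligible truncation error, so it increases towards the long-run average of 1 as the discount factor approaches 1.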

Keywords

MARKOV DECISION DRIFT PROCESS; AVERAGE COST CRITERION; REGENERATIVE STOCHASTIC PROCESS; LIMIT POINT OF DISCOUNTED OPTIMAL POLICIES; M/M/1 QUEUEING MODEL; MAINTENANCE-REPLACEMENT MODEL

Type: Research Article
Information: Advances in Applied Probability, Volume 15, Issue 2, June 1983, pp. 274-303

DOI: https://doi.org/10.2307/1426437
Copyright: Copyright © Applied Probability Trust 1983
