Hostname: page-component-7479d7b7d-68ccn Total loading time: 0 Render date: 2024-07-12T20:22:11.171Z Has data issue: false hasContentIssue false

Average optimal policies in Markov decision drift processes with applications to a queueing and a replacement model

Published online by Cambridge University Press:  01 July 2016

Arie Hordijk*
Affiliation:
University of Leiden
Frank A. Van Der Duyn Schouten*
Affiliation:
Free University, Amsterdam
*
Postal address: Department of Mathematics, University of Leiden, Wassenaarseweg 80, Postbus 9512, 2300 RA Leiden, The Netherlands.
∗∗Postal address: Department of Actuarial Sciences and Econometrics, Free University, Postbus 7161, 1007 MC Amsterdam, The Netherlands.

Abstract

Recently the authors introduced the concept of Markov decision drift processes. A Markov decision drift process can be seen as a straightforward generalization of a Markov decision process with continuous time parameter. In this paper we investigate the existence of stationary average optimal policies for Markov decision drift processes. Using a well-known Abelian theorem we derive sufficient conditions, which guarantee that a ‘limit point' of a sequence of discounted optimal policies with the discounting factor approaching 1 is an average optimal policy. An alternative set of sufficient conditions is obtained for the case in which the discounted optimal policies generate regenerative stochastic processes. The latter set of conditions is easier to verify in several applications. The results of this paper are also applicable to Markov decision processes with discrete or continuous time parameter and to semi-Markov decision processes. In this sense they generalize some well-known results for Markov decision processes with finite or compact action space. Applications to an M/M/1 queueing model and a maintenance replacement model are given. It is shown that under certain conditions on the model parameters the average optimal policy for the M/M/1 queueing model is monotone non-decreasing (as a function of the number of waiting customers) with respect to the service intensity and monotone non-increasing with respect to the arrival intensity. For the maintenance replacement model we prove the average optimality of a bang-bang type policy. Special attention is paid to the computation of the optimal control parameters.

Type
Research Article
Copyright
Copyright © Applied Probability Trust 1983 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Bellman, R. (1975) Dynamic Programming. Princeton University Press, Princeton, NJ.Google Scholar
Blackwell, D. (1962) Discrete dynamic programming. Ann. Math. Statist. 33, 719726.Google Scholar
Çinlar, E. (1975) Introduction to Stochastic Processes. Prentice Hall, New Jersey.Google Scholar
Cohen, J. W. (1976) On Regenerative Processes in Queueing Theory. Lecture notes in Economics and Mathematical Systems 121, Springer-Verlag, Berlin.Google Scholar
Crabill, T. (1972) Optimal control of a service facility with variable exponential service time and constant arrival rate. Management Sci. 18, 560566.Google Scholar
Deppe, H. (1981) Durchschnittskosten in Semiregenerativen Entscheidungsmodellen. , University of Bonn.Google Scholar
Derman, C. (1966) Denumerable state Markovian decision processes; average cost criterion. Ann. Math. Statist. 37, 15451553.Google Scholar
Derman, C. and Veinott, A. F. Jr. (1967) A solution to a countable system of equations arising in Markovian decision processes. Ann. Math. Statist. 38, 582584.Google Scholar
Doshi, B. T. (1976) Continuous time control of Markov processes on an arbitrary state space: average return criterion. Stoch. Proc. Appl. 4, 5577.Google Scholar
Federgruen, A., Hordijk, A. and Tijms, H. C. (1979) Denumerable state semi-Markov decision processes with unbounded costs, average cost criterion. Stoch. Proc. Appl. 9, 223235.Google Scholar
Federgruen, A., Schweitzer, P. J. and Tijms, H. C. (1980) Denumerable undiscounted semi-Markov decision processes with unbounded regions. Research Report 57, Free University, Amsterdam.Google Scholar
Fisher, L. and Ross, S. M. (1968) An example in denumerable decision processes. Ann. Math. Statist. 39, 674675.Google Scholar
Hordijk, A. (1973) A sufficient condition for the existence of an optimal policy with respect to the average cost criterion in Markovian decision processes. Trans. 6th Prague Conf. Information Theory. 1971, Academia, Prague, 263274.Google Scholar
Hordijk, A. (1974) Dynamic Programming and Markov Potential Theory. Mathematical Centre Tract 51, Amsterdam.Google Scholar
Hordijk, A. (1976) Regenerative Markov decision models. In Math. Programming Study 6, North-Holland, Amsterdam, 4972.Google Scholar
Hordijk, A. and Van Der Duyn Schouten, F. A. (1983) Discretization and weak convergence in Markov decision drift processes. Math. Operat. Res. To appear.Google Scholar
Howard, R. A. (1960) Dynamic Programming and Markov Processes. The Massachusetts Technology Press, Cambridge, Ma.Google Scholar
Kakumanu, P. (1975) Continuous time Markovian decision processes; average return criterion. J. Math. Anal. Appl. 52, 173188.CrossRefGoogle Scholar
Lippman, S. A. (1971) Maximal average-reward policies for semi-Markov decision processes with arbitrary state and action space. Ann. Math. Statist. 42, 17171726.Google Scholar
Lippman, S. A. (1975) Applying a new device in the optimization of exponential queueing systems. Operat. Res. 23, 687710.Google Scholar
Low, D. W. (1974) Optimal dynamic pricing policies for an M/M/s queue. Operat. Res. 22, 545561.Google Scholar
Miller, B. L. (1968) Finite state continuous time Markov decision processes with an infinite planning horizon. J. Math. Anal. Appl. 22, 552569.Google Scholar
Neveu, J. (1965) Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco.Google Scholar
Pliska, S. R. (1975) Controlled jump processes. Stoch. Proc. Appl. 3, 259282.Google Scholar
Ross, S. M. (1968) Non-discounted denumerable Markovian decision models. Ann. Math. Statist. 39, 412423.Google Scholar
Ross, S. M. (1968a) Arbitrary state Markovian decision processes. Ann. Math. Statist. 39, 21182122.Google Scholar
Ross, S. M. (1970) Average cost semi-Markov decision processes. J. Appl. Prob. 7, 649656.Google Scholar
Ross, S. M. (1970a) Applied Probability Models with Optimization Applications. Holden-Day, San Francisco.Google Scholar
Ross, S. M. (1971) On the non-existence of e-optimal randomized stationary policies in average cost Markov decision models. Ann. Math. Statist. 42, 17671768.Google Scholar
Sabeti, H. (1973) Optimal selection of service rates in queueing with different costs. J. Operat. Res. Soc. Japan 16, 1535.Google Scholar
Schäl, M. (1977) On negative dynamic programming with irreducible Markov chains and the average cost criterion. Bonner Math. Schr. 98, 9397.Google Scholar
Serfozo, R. F. (1981) Optimal control of random walks, birth and death processes, and queues. Adv. Appl. Prob. 13, 6183.Google Scholar
Stone, C. J. (1972) An upper bound for the renewal function. Ann. Math. Statist. 43, 20502052.Google Scholar
Taylor, H. M. (1965) Markovian sequential replacement processes. Ann. Math. Statist. 36, 16771694.Google Scholar
Taylor, H. M. (1975) Optimal replacement under additive damage and other failure models. Naval Res. Logist. Quart. 22, 118.Google Scholar
Tijms, H. C. (1975) On dynamic programming with arbitrary state space, compact action space and the average return as criterion. Research Report BW 55/75, Mathematical Centre, Amsterdam.Google Scholar
Tijms, H. C. and Van Der Duyn Schouten, F. A. (1978) Inventory control with two switch-over levels for a class of M/G/1 queueing systems with variable arrival and service rate. Stoch. Proc. Appl. 6, 213222.Google Scholar
Van Der Duyn Schouten, F. A. (1979) Markov Decision Processes with Continuous Time Parameter. , University of Leiden.Google Scholar
Widder, D. V. (1946) The Laplace Transform. Princeton University Press, Princeton, NJ.Google Scholar
Wijngaard, J. (1977) Stationary Markovian decision problems and perturbation theory of quasi-compact linear operators. Math. Operat. Res. 2, 91102.Google Scholar
Zuckerman, D. (1977) Replacement models under additive damage. Naval Res. Logist. Quart. 24, 549558.Google Scholar