
Geometric convergence of value-iteration in multichain Markov decision problems

Published online by Cambridge University Press:  01 July 2016

P. J. Schweitzer*
Affiliation: I.B.M. Thomas J. Watson Research Center
A. Federgruen*
Affiliation: Mathematisch Centrum, Amsterdam
* Present address for both authors: Graduate School of Management, University of Rochester, Rochester, N.Y. 14627, U.S.A.

Abstract

This paper considers undiscounted Markov decision problems. With no restriction on either the periodicity or the chain structure of the problem, we show that the value-iteration method for finding maximal-gain policies exhibits a geometric rate of convergence whenever convergence occurs. In addition, we study the behaviour of the value-iteration operator; we give bounds for the number of steps needed for contraction, describe the ultimate behaviour of the convergence factor, and give conditions for the existence of a uniform convergence rate.
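For concreteness, the following is a minimal numerical sketch of the value-iteration recursion the abstract refers to, run on a small two-state, two-action MDP. The example data, the variable names, and the span-seminorm stopping rule are illustrative assumptions, not taken from the paper; the sketch only exhibits the standard scheme v^{n+1} = Qv^n whose convergence rate the paper analyses.

```python
import numpy as np

# Illustrative undiscounted value iteration on a tiny MDP.
# The two-state, two-action data below are assumptions for this sketch only.

# P[a][i][j]: transition probability from state i to state j under action a
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.7, 0.3]],   # action 1
])
# r[a][i]: one-step reward in state i under action a
r = np.array([
    [1.0, 0.0],
    [2.0, 0.5],
])

v = np.zeros(2)
for n in range(1, 200):
    # Value-iteration operator: (Qv)_i = max_a { r_i(a) + sum_j p_ij(a) v_j }
    Qv = (r + P @ v).max(axis=0)
    diff = Qv - v                    # v^{n+1} - v^n approaches the gain
    span = diff.max() - diff.min()   # span seminorm of the differences
    v = Qv
    if span < 1e-10:                 # differences have settled to a constant
        break

print(f"iterations: {n}, gain estimate: {diff.mean():.6f}")
```

In the undiscounted case v^n grows linearly in n, so it is the differences v^{n+1} − v^n, not v^n itself, that converge (to the maximal gain); the span seminorm is the natural gauge because v^n is determined only up to an additive constant. In line with the abstract's claim, the decay of this span, when convergence occurs at all, is geometric.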

Type: Research Article
Copyright: © Applied Probability Trust 1979

