
Geometric convergence of value-iteration in multichain Markov decision problems

Published online by Cambridge University Press:  01 July 2016

P. J. Schweitzer*
Affiliation: I.B.M. Thomas J. Watson Research Center
A. Federgruen*
Affiliation: Mathematisch Centrum, Amsterdam
* Present address for both authors: Graduate School of Management, University of Rochester, Rochester, N.Y. 14627, U.S.A.

Abstract

This paper considers undiscounted Markov decision problems. With no restriction on either the periodicity or the chain structure of the problem, we show that the value-iteration method for finding maximal-gain policies exhibits a geometric rate of convergence whenever convergence occurs. In addition, we study the behaviour of the value-iteration operator; we give bounds for the number of steps needed for contraction, describe the ultimate behaviour of the convergence factor, and give conditions for the existence of a uniform convergence rate.
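For concreteness, the following is a minimal numerical sketch of the value-iteration recursion the abstract refers to, run on a small two-state, two-action MDP. The example data, the variable names, and the span-seminorm stopping rule are illustrative assumptions, not taken from the paper; the sketch only exhibits the standard scheme v^{n+1} = Qv^n whose convergence rate the paper analyses.

```python
import numpy as np

# Illustrative undiscounted value iteration on a tiny MDP.
# The two-state, two-action data below are assumptions for this sketch only.

# P[a][i][j]: transition probability from state i to state j under action a
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.7, 0.3]],   # action 1
])
# r[a][i]: one-step reward in state i under action a
r = np.array([
    [1.0, 0.0],
    [2.0, 0.5],
])

v = np.zeros(2)
for n in range(1, 200):
    # Value-iteration operator: (Qv)_i = max_a { r_i(a) + sum_j p_ij(a) v_j }
    Qv = (r + P @ v).max(axis=0)
    diff = Qv - v                    # v^{n+1} - v^n approaches the gain
    span = diff.max() - diff.min()   # span seminorm of the differences
    v = Qv
    if span < 1e-10:                 # differences have settled to a constant
        break

print(f"iterations: {n}, gain estimate: {diff.mean():.6f}")
```

In the undiscounted case v^n grows linearly in n, so it is the differences v^{n+1} − v^n, not v^n itself, that converge (to the maximal gain); the span seminorm is the natural gauge because v^n is determined only up to an additive constant. In line with the abstract's claim, the decay of this span, when convergence occurs at all, is geometric.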

Type: Research Article
Copyright: © Applied Probability Trust 1979

