
A new policy iteration scheme for Markov decision processes using Schweitzer's formula

Published online by Cambridge University Press: 14 July 2016

J. B. Lasserre
Affiliation: Laboratoire d'Automatique et d'Analyse des Systèmes du CNRS, Toulouse
Postal address: Laboratoire d'Automatique et d'Analyse des Systèmes du CNRS, 7, Avenue du Colonel Roche, 31077 Toulouse Cedex, France.

Abstract

Given a family of Markov chains with a single recurrent class, we present a potential application of Schweitzer's exact formula relating the steady-state probability and fundamental matrices of any two chains in the family. We propose a new policy iteration scheme for Markov decision processes in which, in contrast to standard policy iteration, the criterion for selecting an action ensures the maximal one-step average-cost improvement. Its computational complexity and storage requirements are analysed.
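As a rough illustration of the formula the scheme builds on (a sketch only, not the paper's algorithm): Schweitzer's exact result [21] states that if P and P̄ are two transition matrices on the same state space, each with a single recurrent class, π is the steady-state row vector of P and Z = (I − P + 1π)⁻¹ its fundamental matrix, then the steady-state vector of P̄ is recovered exactly as π̄ = π[I − (P̄ − P)Z]⁻¹, without solving for π̄ from scratch. A minimal NumPy sketch, with arbitrarily chosen example matrices:

```python
import numpy as np

def stationary_and_fundamental(P):
    """Steady-state row vector pi and fundamental matrix
    Z = (I - P + 1 pi)^{-1} for a chain with a single recurrent class."""
    n = P.shape[0]
    # Solve pi P = pi together with the normalisation pi 1 = 1.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
    return pi, Z

def schweitzer_update(pi, Z, P, P_bar):
    """Schweitzer's exact update: pi_bar = pi (I - (P_bar - P) Z)^{-1}."""
    n = P.shape[0]
    H = np.linalg.inv(np.eye(n) - (P_bar - P) @ Z)
    return pi @ H

# Two example chains differing in a single row (e.g. one changed action).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.1, 0.4, 0.5]])
P_bar = P.copy()
P_bar[1] = [0.6, 0.1, 0.3]

pi, Z = stationary_and_fundamental(P)
pi_bar = schweitzer_update(pi, Z, P, P_bar)
pi_bar_direct, _ = stationary_and_fundamental(P_bar)
print(np.allclose(pi_bar, pi_bar_direct))  # True: the update is exact
```

In a policy iteration scheme of the kind the abstract describes, an exact update of this sort would let the average cost π̄c̄ of each candidate action change be evaluated cheaply, so that the action yielding the maximal one-step average-cost improvement can be selected; the sketch above demonstrates only the steady-state update itself, not the selection rule of the paper.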

Type
Short Communications
Copyright
Copyright © Applied Probability Trust 1994 

References

[1] Blackwell, D. (1962) Discrete dynamic programming. Ann. Math. Statist. 33, 719–726.
[2] Dantzig, G. B. (1963) Linear Programming and Extensions. Princeton University Press.
[3] Denardo, E. V. and Fox, B. (1968) Multichain Markov renewal programs. SIAM J. Appl. Math. 16, 468–487.
[4] Federgruen, A. and Schweitzer, P. J. (1980) A survey of asymptotic value-iteration for undiscounted Markovian decision processes. In Recent Developments in Markov Decision Processes, ed. Hartley, R., Thomas, L. C. and White, D. J. Academic Press, New York.
[5] Hastings, N. A. J. (1971) Bounds on the gain rate of a Markov decision process. Operat. Res. 19, 240–244.
[6] Haviv, M. and Van Der Heyden, L. (1984) Perturbation bounds for the stationary probabilities of a finite Markov chain. Adv. Appl. Prob. 16, 804–818.
[7] Herzberg, M. and Yechiali, U. (1991) Criteria for selecting the relaxation factor of the value iteration algorithm for undiscounted Markov and semi-Markov decision processes. Operat. Res. Lett. 10, 193–202.
[8] Hordijk, A. and Kallenberg, L. C. M. (1979) Linear programming and Markov decision chains. Management Sci. 25, 352–362.
[9] Howard, R. (1960) Dynamic Programming and Markov Processes. MIT Press, Cambridge, MA.
[10] Howard, R. A. (1963) Semi-Markov decision processes. Bull. Internat. Inst. Statist. 40(2), 625–652.
[11] Jewell, W. S. (1963) Markov-renewal programming I and II. Operat. Res. 11, 938–972.
[12] Kemeny, J. G. and Snell, J. L. (1960) Finite Markov Chains. Van Nostrand, Princeton, NJ.
[13] Lasserre, J. B. (1991) Exact formula for sensitivity analysis of Markov chains. J. Optim. Theory Appl. 71, 407–413.
[14] Lasserre, J. B. (1991) Updating formula for Markov chains and applications. LAAS Technical Report.
[15] Manne, A. (1960) Linear programming and sequential decisions. Management Sci. 6, 259–267.
[16] Odoni, A. (1969) On finding the maximal gain for MDP's. Operat. Res. 17, 857–860.
[17] Osaki, S. and Mine, H. (1968) Linear programming algorithms for semi-Markov decision processes. J. Math. Anal. Appl. 22, 356–381.
[18] Popyak, J. L., Brown, R. L. and White, C. C. (1979) Discrete versions of an algorithm due to Varaiya. IEEE Trans. Autom. Control 24, 504.
[19] Puterman, M. (1990) Markov decision processes. In Handbook in Operations Research and Management Science, ed. Heyman, D. P. and Sobel, M. North-Holland, Amsterdam.
[20] Puterman, M. L. and Brumelle, S. L. (1978) The analytic theory of policy iteration. In Dynamic Programming and its Applications, ed. Puterman, M. L. Academic Press, New York.
[21] Schweitzer, P. J. (1968) Perturbation theory and finite Markov chains. J. Appl. Prob. 5, 401–413.
[22] Schweitzer, P. J. (1968) Perturbation theory and undiscounted Markov programming. Operat. Res. 17, 716–727.
[23] Schweitzer, P. J. (1971) Iterative solution of the functional equations of undiscounted Markov renewal programming. J. Math. Anal. Appl. 34, 495–501.
[24] Veinott, A. F. (1966) On finding optimal policies in discrete dynamic programming with no discounting. Ann. Math. Statist. 37, 1284–1294.
[25] White, D. J. (1963) Dynamic programming, Markov chains and the method of successive approximations. J. Math. Anal. Appl. 6, 373–376.