Characterization of the optimal average cost in Markov decision chains driven by a risk-seeking controller

Rolando Cavazos-Cadena; Hugo Cruz-Suárez; Raúl Montes-de-Oca

doi:10.1017/jpr.2023.40



Published online by Cambridge University Press: 21 July 2023


Affiliations:
Rolando Cavazos-Cadena*, Universidad Autónoma Agraria Antonio Narro
Hugo Cruz-Suárez**, Benemérita Universidad Autónoma de Puebla
Raúl Montes-de-Oca***, Universidad Autónoma Metropolitana-Iztapalapa

*Postal address: Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Boulevard Antonio Narro 1923, Buenavista, COAH 25315, México. Email: rolando.cavazos@uaaan.edu.mx
**Postal address: Facultad de Ciencias Físico-Matemáticas, Benemérita Universidad Autónoma de Puebla, Ave. San Claudio y Río Verde, Col. San Manuel CU, PUE 72570, México. Email: hcs@fcfm.buap.mx
***Postal address: Departamento de Matemáticas, Universidad Autónoma Metropolitana-Iztapalapa, Av. Ferrocarril San Rafael Atlixco 186, Col. Leyes de Reforma Primera Sección, Alcaldía Iztapalapa, CDMX 09310, México. Email: momr@xanum.uam.mx


Abstract

This work concerns Markov decision chains on a denumerable state space endowed with a bounded cost function. The performance of a control policy is assessed by a long-run average criterion as measured by a risk-seeking decision maker with constant risk-sensitivity. Besides standard continuity–compactness conditions, the framework of the paper is determined by the following conditions: (i) the state process is communicating under each stationary policy, and (ii) the simultaneous Doeblin condition holds. Within this framework it is shown that (i) the optimal superior and inferior limit average value functions coincide and are constant, and (ii) the optimal average cost is characterized via an extended version of the Collatz–Wielandt formula in the theory of positive matrices.
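For intuition only, the classical Collatz–Wielandt formula can be illustrated numerically in a much simpler setting than the one treated in the paper. The sketch below is not the authors' method. It assumes a finite state space, a single fixed stationary policy (so there is no optimization), a strictly positive transition matrix `P`, and the sign convention that a risk-seeking controller has a negative risk-sensitivity coefficient `lam`. Under those assumptions the risk-sensitive average cost equals `(1/lam) * log(rho(Q))`, where `Q[x][y] = exp(lam * cost[x]) * P[x][y]` and `rho(Q)` is the spectral radius, and for any positive vector `v` the ratios `(Qv)_i / v_i` bracket `rho(Q)` from below and above. The function names are hypothetical.

```python
import math


def collatz_wielandt_bounds(Q, iterations=200):
    """Bracket the spectral radius of a positive matrix Q.

    For any positive vector v, min_i (Qv)_i / v_i <= rho(Q) <= max_i (Qv)_i / v_i.
    Power iteration is used to pick v so that the two bounds tighten.
    """
    n = len(Q)
    v = [1.0] * n
    lower, upper = 0.0, math.inf
    for _ in range(iterations):
        w = [sum(Q[i][j] * v[j] for j in range(n)) for i in range(n)]
        ratios = [w[i] / v[i] for i in range(n)]
        lower, upper = min(ratios), max(ratios)
        total = sum(w)
        v = [x / total for x in w]  # renormalize to avoid under/overflow
    return lower, upper


def risk_sensitive_average_cost(P, cost, lam, iterations=200):
    """Average cost (1/lam) * log rho(Q) for a fixed policy (assumed setting).

    Q[x][y] = exp(lam * cost[x]) * P[x][y]; lam < 0 models a risk-seeking
    controller under the sign convention assumed in this sketch.
    """
    n = len(P)
    Q = [[math.exp(lam * cost[i]) * P[i][j] for j in range(n)] for i in range(n)]
    lower, upper = collatz_wielandt_bounds(Q, iterations)
    rho = 0.5 * (lower + upper)
    return math.log(rho) / lam


if __name__ == "__main__":
    P = [[0.5, 0.5], [0.2, 0.8]]
    print(risk_sensitive_average_cost(P, cost=[0.0, 2.0], lam=-1.0))
```

In this toy setting a constant cost `c` gives an average cost of exactly `c`, and for a non-constant cost with `lam < 0` the value lies below the risk-neutral average, which matches the interpretation of a risk-seeking controller.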

Keywords

Risk-lover controller; exponential utility; optimality inequality; extended Collatz–Wielandt formula; optional sampling theorem; truncated cost function

MSC classification

Primary: 93E20: Optimal stochastic control

Secondary: 93C55: Discrete-time systems

Information

Type: Original Article
Information: Journal of Applied Probability, Volume 61, Issue 1, March 2024, pp. 340–367

DOI: https://doi.org/10.1017/jpr.2023.40
Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of Applied Probability Trust


