On the policy improvement algorithm for ergodic risk-sensitive control

Ari Arapostathis; Anup Biswas; Somnath Pradhan

doi:10.1017/prm.2020.61

Part of: Markov processes; Spectral theory and eigenvalue problems; Stochastic systems and control

Published online by Cambridge University Press: 02 September 2020

Ari Arapostathis: Department of Electrical and Computer Engineering, The University of Texas at Austin, EER 7.824, Austin, TX 78712 (ari@utexas.edu)
Anup Biswas: Department of Mathematics, Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pune 411008, India (anup@iiserpune.ac.in)
Somnath Pradhan: Department of Mathematics, Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pune 411008, India (somnath@iiserpune.ac.in)

Abstract

In this article we consider the ergodic risk-sensitive control problem for a large class of multidimensional controlled diffusions on the whole space. We study the minimization and maximization problems under either a blanket stability hypothesis, or a near-monotone assumption on the running cost. We establish the convergence of the policy improvement algorithm for these models. We also present a more general result concerning the region of attraction of the equilibrium of the algorithm.

Keywords

Principal eigenvalue; semilinear differential equations; stochastic representation; policy improvement

MSC classification

Primary: 35P30: Nonlinear eigenvalue problems, nonlinear spectral theory

Secondary: 93E20: Optimal stochastic control; 60J60: Diffusion processes

Information

Type: Research Article
Information: Proceedings of the Royal Society of Edinburgh Section A: Mathematics, Volume 151, Issue 4, August 2021, pp. 1305–1330

DOI: https://doi.org/10.1017/prm.2020.61
Copyright: Copyright © The Author(s) 2020. Published by Cambridge University Press on behalf of The Royal Society of Edinburgh
