
Bias and Overtaking Optimality for Continuous-Time Jump Markov Decision Processes in Polish Spaces

Published online by Cambridge University Press:  14 July 2016

Quanxin Zhu* (South China Normal University)
Tomás Prieto-Rumeau* (Universidad Nacional de Educación a Distancia)
* Research partially supported by the Natural Science Foundation of China (10626021), the Natural Science Foundation of Guangdong Province (06300957), and CONACYT grant 45693-F.

Abstract


In this paper we study the bias and the overtaking optimality criteria for continuous-time jump Markov decision processes in general state and action spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. Under appropriate hypotheses, we prove the existence of solutions to the bias optimality equations, the existence of bias optimal policies, and an equivalence relation between bias and overtaking optimality.
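For context, the two criteria the abstract refers to can be sketched with their standard definitions from the CTMDP literature; the notation here is illustrative and not necessarily the paper's own. Writing $J_T(x,\pi)$ for the expected total reward accumulated up to time $T$ under policy $\pi$ from initial state $x$:

```latex
% Expected total reward up to time T (r is the reward rate, \xi_t the state process):
J_T(x,\pi) \;=\; \mathbb{E}_x^{\pi}\!\int_0^T r(\xi_t,\pi)\,dt .

% Overtaking optimality: \pi^* asymptotically dominates every policy in total reward.
\liminf_{T\to\infty}\bigl[\,J_T(x,\pi^*) - J_T(x,\pi)\,\bigr] \;\ge\; 0
\qquad \text{for all policies } \pi \text{ and all states } x .

% Bias: the asymptotic deviation of the total reward from linear growth
% at the long-run average reward rate g_\pi.
h_\pi(x) \;=\; \lim_{T\to\infty}\bigl[\,J_T(x,\pi) - g_\pi\,T\,\bigr] .
```

A policy is then called bias optimal if it is average optimal and attains the largest bias among all average-optimal policies; the equivalence mentioned in the abstract relates this refinement of average optimality to the overtaking criterion above.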

Type: Research Article
Copyright © Applied Probability Trust 2008
