Computing Optimal Policies for Markovian Decision Processes Using Simulation
Published online by Cambridge University Press: 27 July 2009
Abstract
A simulation method is developed for computing average-reward optimal policies for a finite state and action Markovian decision process. It is shown that the method is consistent; i.e., it produces solutions arbitrarily close to the optimal. Various types of estimation errors and confidence bounds are examined. Finally, it is shown that the probability distribution of the number of simulation cycles required to compute an ε-optimal policy satisfies a large deviations property.
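As a rough illustration of the setting only, and not the algorithm of the article, the following minimal sketch estimates the long-run average reward of each stationary deterministic policy of a tiny MDP by simulating regenerative cycles (returns to a reference state) and then picks the policy with the best estimate. The transition probabilities `P`, the rewards `R`, the reference state, and the function names are all hypothetical choices made for this example; it also assumes every policy returns to the reference state with probability one.

```python
import itertools
import random

# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
# P[s][a] lists the transition probabilities to states 0 and 1; R[s][a] is the one-step reward.
P = {0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
     1: {0: [0.5, 0.5], 1: [0.1, 0.9]}}
R = {0: {0: 1.0, 1: 0.0},
     1: {0: 2.0, 1: 3.0}}


def estimate_gain(policy, n_cycles, rng, ref_state=0):
    """Ratio estimate of average reward: total reward / total steps over n_cycles cycles.

    A cycle starts in ref_state and ends at the next return to ref_state.
    """
    total_reward = 0.0
    total_steps = 0
    for _ in range(n_cycles):
        s = ref_state
        while True:
            a = policy[s]
            total_reward += R[s][a]
            total_steps += 1
            # Sample the next state from the transition row of (s, a).
            s = rng.choices([0, 1], weights=P[s][a])[0]
            if s == ref_state:
                break
    return total_reward / total_steps


def best_policy_by_simulation(n_cycles=20000, seed=0):
    """Brute-force search: simulate every deterministic policy and keep the best estimate."""
    rng = random.Random(seed)
    estimates = {}
    for actions in itertools.product([0, 1], repeat=2):
        policy = dict(enumerate(actions))  # state -> action
        estimates[actions] = estimate_gain(policy, n_cycles, rng)
    best = max(estimates, key=estimates.get)
    return best, estimates


if __name__ == "__main__":
    best, estimates = best_policy_by_simulation()
    print("estimated best policy:", best)
    for actions, gain in sorted(estimates.items()):
        print(actions, round(gain, 3))
```

For these assumed numbers the exact gains follow from the stationary distributions; the policy that chooses action 1 in both states has gain 8/3, which gives a reference value for checking the simulated estimate. Enumerating all policies is feasible only for toy problems and is used here purely to keep the sketch short.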
- Type: Research Article
- Information: Probability in the Engineering and Informational Sciences, Volume 9, Issue 4, October 1995, pp. 525-537
- Copyright: © Cambridge University Press 1995