
6 - Fully observed Markov decision processes

from Part II - Partially Observed Markov Decision Processes: Models and Applications

Published online by Cambridge University Press:  05 April 2016

Vikram Krishnamurthy, Cornell University/Cornell Tech

Summary

A Markov decision process (MDP) is a Markov process with feedback control. That is, as illustrated in Figure 6.1, a decision-maker (controller) uses the state x_k of the Markov process at each time k to choose an action u_k. This action is fed back to the Markov process and controls the transition matrix P(u_k). This in turn determines the probability that the Markov process jumps to a particular state x_{k+1} at time k + 1, and so on. The aim of the decision-maker is to choose a sequence of actions over a time horizon to minimize a cumulative cost function associated with the expected value of the trajectory of the Markov process.
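To make this feedback loop concrete, here is a minimal simulation sketch (an illustration, not code from the book); the function name simulate_mdp, the stacked transition array P and the policy argument are assumptions introduced here.

import numpy as np

def simulate_mdp(P, policy, x0, N, rng=None):
    """Simulate the controlled Markov chain of Figure 6.1 for N steps.

    P      : array of shape (U, X, X); P[u][i, j] is the probability of jumping i -> j under action u.
    policy : function mapping (state, time) -> action.
    x0     : initial state in {0, ..., X-1}.
    N      : time horizon.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = P.shape[1]
    states, actions = [x0], []
    x = x0
    for k in range(N):
        u = policy(x, k)                # controller observes x_k and chooses u_k
        x = rng.choice(X, p=P[u][x])    # chain jumps to x_{k+1} according to P(u_k)
        actions.append(u)
        states.append(x)
    return states, actions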

MDPs arise in stochastic optimization models in telecommunication networks, discrete event systems, inventory control, finance, investment and health planning. Moreover, POMDPs can be viewed as continuous-state MDPs.

This chapter gives a brief description of MDPs, which provides a starting point for POMDPs. The main result is that the optimal choice of actions by the controller in Figure 6.1 is obtained by solving a backward stochastic dynamic programming recursion.
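As a preview, a standard form of that backward recursion, written in the notation introduced in the next section and assuming (for illustration) a terminal cost c_N(i) at time N, is

V_N(i) = c_N(i),
V_k(i) = min_{u ∈ 𝒰} { c(i, u, k) + Σ_{j ∈ 𝒳} P_{ij}(u, k) V_{k+1}(j) },  k = N − 1, …, 0,

and an optimal action at time k in state i is any action attaining the minimum.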

Finite state finite horizon MDP

Let k = 0, 1, …, N denote discrete time. N is called the time horizon or planning horizon. In this section we consider MDPs where the horizon N is finite.

The finite state MDP model consists of the following ingredients:

1. 𝒳 = {1, 2, …, X} denotes the state space and x_k ∈ 𝒳 denotes the state of the controlled Markov chain at time k = 0, 1, …, N.

2. 𝒰 = {1, 2, …, U} denotes the action space. The elements u ∈ 𝒰 are called actions. In particular, u_k ∈ 𝒰 denotes the action chosen at time k.

3. For each action u ∈ 𝒰 and time k ∈ {0, …, N−1}, P(u, k) denotes an X × X transition probability matrix with elements

P_{ij}(u, k) = ℙ(x_{k+1} = j | x_k = i, u_k = u),  i, j ∈ 𝒳.

4. For each state i ∈ 𝒳, action u ∈ 𝒰 and time k ∈ {0, 1, …, N−1}, the scalar c(i, u, k) denotes the one-stage cost incurred by the decision-maker (controller).
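To illustrate how these ingredients combine in the backward dynamic programming recursion mentioned above, the following is a minimal sketch (not the book's code); the array layout, the function name backward_dp and the terminal cost c_N are assumptions introduced here.

import numpy as np

def backward_dp(P, c, c_N):
    """Finite-horizon backward dynamic programming for a finite-state MDP.

    P   : array of shape (N, U, X, X); P[k, u, i, j] = P_ij(u, k).
    c   : array of shape (N, X, U);    c[k, i, u]    = one-stage cost c(i, u, k).
    c_N : array of shape (X,); terminal cost at time N (assumed here for illustration).
    Returns the value functions V (shape (N+1, X)) and an optimal policy mu (shape (N, X)).
    """
    N, U, X, _ = P.shape
    V = np.zeros((N + 1, X))
    mu = np.zeros((N, X), dtype=int)
    V[N] = c_N
    for k in range(N - 1, -1, -1):                        # recurse backward in time
        # Q[i, u] = c(i, u, k) + sum_j P_ij(u, k) V_{k+1}(j)
        Q = c[k] + np.einsum('uij,j->iu', P[k], V[k + 1])
        V[k] = Q.min(axis=1)                              # optimal cost-to-go from state i at time k
        mu[k] = Q.argmin(axis=1)                          # minimizing action for each state
    return V, mu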

Type: Chapter
Book: Partially Observed Markov Decision Processes: From Filtering to Controlled Sensing, pp. 121–146
Publisher: Cambridge University Press
Print publication year: 2016
