
7 - Partially observed Markov decision processes (POMDPs)

from Part II - Partially Observed Markov Decision Processes: Models and Applications

Published online by Cambridge University Press:  05 April 2016

Vikram Krishnamurthy, Cornell University/Cornell Tech

Summary

A POMDP is a controlled HMM. Recall from §2.4 that an HMM consists of an $X$-state Markov chain $\{x_k\}$ observed via a noisy observation process $\{y_k\}$. Figure 7.1 displays the schematic setup of a POMDP, where the action $u_k$ affects the state and/or observation (sensing) process of the HMM. The HMM filter (discussed extensively in Chapter 3) computes the posterior distribution $\pi_k$ of the state; this posterior $\pi_k$ is called the belief state. In a POMDP, the stochastic controller depicted in Figure 7.1 uses the belief state to choose the next action.
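Since the belief state is simply the HMM filter output driven by the chosen actions, a short numerical sketch may help. The helper below is hypothetical (the names and array layout are illustrative, not from the text): it performs one filter step $\pi_{k+1} \propto B_y(u)\,P'(u)\,\pi_k$ under assumed action-indexed transition and observation matrices.

```python
import numpy as np

def belief_update(pi, y, u, P, B):
    """One step of the HMM filter with an action-dependent model
    (hypothetical helper; names are illustrative, not the book's).

    pi : belief at time k, shape (X,), sums to 1
    y  : observation index received at time k+1
    u  : action index applied at time k
    P  : P[u] is the (X, X) transition matrix,
         P[u][i, j] = Pr(x_{k+1} = j | x_k = i, u_k = u)
    B  : B[u] is the (X, Y) observation matrix,
         B[u][j, y] = Pr(y_{k+1} = y | x_{k+1} = j, u_k = u)
    """
    unnormalized = B[u][:, y] * (P[u].T @ pi)  # B_y(u) P'(u) pi
    sigma = unnormalized.sum()                 # normalizer sigma(pi, y, u)
    return unnormalized / sigma

# Toy two-state example (illustrative numbers only)
P = {0: np.array([[0.9, 0.1],
                  [0.2, 0.8]])}
B = {0: np.array([[0.8, 0.2],
                  [0.3, 0.7]])}
pi0 = np.array([0.5, 0.5])
pi1 = belief_update(pi0, y=0, u=0, P=P, B=B)   # updated belief after seeing y = 0
```

The normalizer computed inside the update is exactly the $\sigma(\pi, y, u)$ term that reappears in Bellman's equation sketched below.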

This chapter is organized as follows. §7.1 describes the POMDP model. §7.2 then gives the belief-state formulation and Bellman's dynamic programming equation for the optimal policy of a POMDP. It is shown that a POMDP is equivalent to a continuous-state MDP whose states are belief states (posteriors); Bellman's equation for continuous-state MDPs was discussed in §6.3. §7.3 gives a toy example of a POMDP. Although a POMDP is a continuous-state MDP, §7.4 shows that for finite-horizon POMDPs Bellman's equation has a finite-dimensional characterization. §7.5 discusses several algorithms that exploit this finite-dimensional characterization to compute the optimal policy. §7.6 considers discounted-cost infinite-horizon POMDPs. Finally, as an example of a POMDP, optimal search for a moving target is discussed in §7.7.
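To preview the belief-state formulation of §7.2, here is a sketch of Bellman's backward recursion over beliefs. The filter map $T(\pi, y, u)$ and its normalizer $\sigma(\pi, y, u)$ are written to match the HMM filter recalled above; treat the exact symbols as assumptions rather than quotations from the text.

```latex
% Bellman's equation on the belief space (a sketch)
J_N(\pi) = c_N' \pi, \qquad
J_k(\pi) = \min_{u \in \mathcal{U}}
  \Big\{ c(u)' \pi
       + \sum_{y \in \mathcal{Y}} \sigma(\pi, y, u)\,
         J_{k+1}\!\big(T(\pi, y, u)\big) \Big\},
% where the HMM filter update and its normalizer are
T(\pi, y, u) = \frac{B_y(u)\, P'(u)\, \pi}{\sigma(\pi, y, u)}, \qquad
\sigma(\pi, y, u) = \mathbf{1}'\, B_y(u)\, P'(u)\, \pi .
```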

Finite horizon POMDP

A POMDP model with finite horizon $N$ is a 7-tuple

$(\mathcal{X}, \mathcal{U}, \mathcal{Y}, P(u), B(u), c(u), c_N)$,

whose components are listed below (a data-structure sketch follows the list).

Figure 7.1 Partially observed Markov decision process (POMDP) schematic setup. The Markov system together with a noisy sensor constitutes a hidden Markov model (HMM). The HMM filter computes the posterior (belief state) $\pi_k$ of the state of the Markov chain. The controller (decision-maker) then chooses the action $u_k$ at time $k$ based on $\pi_k$.

1. $\mathcal{X} = \{1, 2, \ldots, X\}$ denotes the state space, and $x_k \in \mathcal{X}$ denotes the state of the controlled Markov chain at time $k = 0, 1, \ldots, N$.

2. $\mathcal{U} = \{1, 2, \ldots, U\}$ denotes the action space, with $u_k \in \mathcal{U}$ denoting the action chosen at time $k$ by the controller.
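The excerpt cuts off after the first two components of the tuple. As a minimal, hypothetical sketch of how the full model might be packaged in code, the container below assigns the standard roles to the remaining components; reading $c(u)$ and $c_N$ as the per-stage and terminal cost vectors is an assumption consistent with the summary above, not a definition quoted from the text.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FiniteHorizonPOMDP:
    """Container for the 7-tuple (X, U, Y, P(u), B(u), c(u), c_N).

    The roles of c and c_N (per-stage and terminal costs) are assumed,
    standard usage -- the excerpt ends before the book defines them.
    """
    X: int            # number of states
    U: int            # number of actions
    Y: int            # number of observation symbols
    P: list           # P[u]: (X, X) transition matrix for action u
    B: list           # B[u]: (X, Y) observation probability matrix for action u
    c: list           # c[u]: (X,) per-stage cost vector (assumed role)
    c_N: np.ndarray   # (X,) terminal cost vector (assumed role)
    N: int            # horizon length
```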

Chapter information: Partially Observed Markov Decision Processes: From Filtering to Controlled Sensing, pp. 147-178. Publisher: Cambridge University Press. Print publication year: 2016.
