Book contents
- Frontmatter
- Contents
- Preface
- Notation and abbreviations
- Part I General discussion
- Part II Approximate inference
- 4 Maximum a-posteriori approximation
- 5 Evidence approximation
- 6 Asymptotic approximation
- 7 Variational Bayes
- 8 Markov chain Monte Carlo
- Appendix A Basic formulas
- Appendix B Vector and matrix formulas
- Appendix C Probabilistic distribution functions
- References
- Index
4 - Maximum a-posteriori approximation
from Part II - Approximate inference
Published online by Cambridge University Press: 05 August 2015
Summary
Maximum a-posteriori (MAP) approximation is a well-known and widely used approximation for Bayesian inference. The approximation covers all variables, including model parameters Θ, latent variables Z, and classification categories C (the word sequence W in the automatic speech recognition case). For example, the Viterbi algorithm (arg max_Z p(Z|O)) in the continuous density hidden Markov model (CDHMM), as discussed in Section 3.3.2, corresponds to the MAP approximation of latent variables, while the forward–backward algorithm, as discussed in Section 3.3.1, corresponds to an exact inference of these variables. As another example, the MAP decision rule (arg max_C p(C|O)) in Eq. (3.2) also corresponds to the MAP approximation of inferring the posterior distribution of classification categories. Since the final goal of automatic speech recognition is to output the word sequence, the MAP approximation of the word sequence matches the final goal. Thus, the MAP approximation is an essential technique that can be applied to all probabilistic variables in speech and language processing.
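The contrast above between MAP approximation of latent variables and exact inference can be sketched on a toy discrete HMM (all probabilities below are illustrative values, not from the book): the Viterbi recursion returns the single best state path arg max_Z p(Z|O), while the forward–backward algorithm returns the exact per-frame posteriors p(z_t | O).

```python
# Toy 2-state discrete HMM: MAP state path (Viterbi) vs. exact
# posterior marginals (forward-backward). Illustrative parameters only.

pi = [0.6, 0.4]                          # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]             # transition probabilities A[i][j]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # emission probabilities B[state][symbol]
O = [0, 1, 2, 2]                         # observed symbol sequence

def viterbi(O):
    """MAP approximation of the latent state sequence: arg max_Z p(Z|O)."""
    T, N = len(O), len(pi)
    delta = [[pi[i] * B[i][O[0]] for i in range(N)]]  # best-path scores
    psi = [[0] * N]                                   # backpointers
    for t in range(1, T):
        delta.append([0.0] * N)
        psi.append([0] * N)
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t - 1][i] * A[i][j])
            psi[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] * A[best_i][j] * B[j][O[t]]
    # Backtrack the single best state path.
    path = [max(range(N), key=lambda j: delta[T - 1][j])]
    for t in range(T - 1, 0, -1):
        path.insert(0, psi[t][path[0]])
    return path

def forward_backward(O):
    """Exact inference: posterior marginals p(z_t = i | O) for every frame t."""
    T, N = len(O), len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                      for j in range(N)])
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    gamma = []
    for t in range(T):
        norm = sum(alpha[t][i] * beta[t][i] for i in range(N))
        gamma.append([alpha[t][i] * beta[t][i] / norm for i in range(N)])
    return gamma

print(viterbi(O))           # single best state path (a point estimate)
print(forward_backward(O))  # full per-frame posterior distributions
```

The Viterbi path collapses the posterior over state sequences to a single point, whereas the forward–backward posteriors retain the full per-frame uncertainty; this is exactly the trade-off the MAP approximation makes.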
This chapter discusses the MAP approximation of Bayesian inference in detail, but limits the discussion to model parameters Θ in Section 4.1. In the MAP approximation for model parameters, the prior distributions act as a regularization of these parameters, which makes the parameter estimation more robust than that of the maximum likelihood (ML) approach. Another interesting property of the MAP approximation for model parameters is that we can easily incorporate the inference of latent variables by extending the EM algorithm from ML to MAP estimation. Section 4.2 describes the general EM algorithm with the MAP approximation by following the ML-based EM algorithm, as discussed in Section 3.4. Based on the general MAP–EM algorithm, Section 4.3 provides MAP–EM solutions for CDHMM parameters and introduces well-known speaker adaptation applications. Section 4.5 describes the parameter smoothing method in discriminative training of the CDHMM, which actually corresponds to the MAP solution for discriminative parameter estimation. Section 4.6 focuses on the MAP estimation of GMM parameters, a subset of the MAP estimation of CDHMM parameters, which is used to construct speaker GMMs for speaker verification. Section 4.7 provides a MAP solution for n-gram parameters that leads to one instance of interpolation smoothing, as discussed in Section 3.6.2. Finally, Section 4.8 deals with the adaptive MAP estimation of latent topic model parameters.
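The regularization role of the prior can be illustrated with the simplest conjugate case, a Gaussian mean with a Gaussian prior (a minimal sketch; the prior mean mu0 and pseudo-count tau below are illustrative choices, not values from the book). The MAP estimate interpolates between the prior mean and the ML estimate, which is the same smoothing behavior that underlies MAP speaker adaptation of CDHMM means.

```python
# MAP vs. ML estimation of a Gaussian mean with a conjugate Gaussian
# prior, showing how the prior regularizes the estimate when data
# are scarce. Illustrative sketch; parameter values are assumptions.

def ml_mean(x):
    """Maximum likelihood estimate: the sample mean."""
    return sum(x) / len(x)

def map_mean(x, mu0, tau):
    """MAP estimate with a conjugate Gaussian prior centered at mu0.
    tau acts as a pseudo-count: the estimate interpolates between the
    prior mean mu0 and the ML estimate, weighted by tau and the data
    count n. As n grows, the MAP estimate approaches the ML estimate."""
    n = len(x)
    return (tau * mu0 + n * ml_mean(x)) / (tau + n)

# With one observation, the prior dominates; with many, the data do.
few = [2.0]
many = [2.0] * 90
print(map_mean(few, mu0=0.0, tau=9))   # pulled strongly toward mu0
print(map_mean(many, mu0=0.0, tau=9))  # close to the ML estimate 2.0
```

This interpolation is the basic mechanism behind the chapter's speaker adaptation results: with little adaptation data, the MAP estimate stays near the prior (e.g., a speaker-independent model), and it smoothly moves toward the ML estimate as data accumulate.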
- Type: Chapter
- Information: Bayesian Speech and Language Processing, pp. 137–183. Publisher: Cambridge University Press. Print publication year: 2015.