Book contents
- Frontmatter
- Contents
- Preface
- Notation and abbreviations
- Part I General discussion
- Part II Approximate inference
- 4 Maximum a-posteriori approximation
- 5 Evidence approximation
- 6 Asymptotic approximation
- 7 Variational Bayes
- 8 Markov chain Monte Carlo
- Appendix A Basic formulas
- Appendix B Vector and matrix formulas
- Appendix C Probabilistic distribution functions
- References
- Index
4 - Maximum a-posteriori approximation
from Part II - Approximate inference
Published online by Cambridge University Press: 05 August 2015
Summary
Maximum a-posteriori (MAP) approximation is a well-known and widely used approximation for Bayesian inference. The approximation covers all variables, including model parameters Θ, latent variables Z, and classification categories C (the word sequence W in the automatic speech recognition case). For example, the Viterbi algorithm (arg max_Z p(Z|O)) in the continuous density hidden Markov model (CDHMM), as discussed in Section 3.3.2, corresponds to the MAP approximation of latent variables, while the forward–backward algorithm, as discussed in Section 3.3.1, corresponds to an exact inference of these variables. As another example, the MAP decision rule (arg max_C p(C|O)) in Eq. (3.2) also corresponds to the MAP approximation of inferring the posterior distribution of classification categories. Since the final goal of automatic speech recognition is to output the word sequence, the MAP approximation of the word sequence matches the final goal. Thus, the MAP approximation is an essential technique that can be applied to all probabilistic variables in speech and language processing.
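The contrast above between MAP approximation of latent variables and exact inference can be sketched on a toy discrete HMM (all probabilities below are illustrative values, not from the book): the Viterbi recursion returns the single best state path arg max_Z p(Z|O), while the forward–backward algorithm returns the exact per-frame posteriors p(z_t | O).

```python
# Toy 2-state discrete HMM: MAP state path (Viterbi) vs. exact
# posterior marginals (forward-backward). Illustrative parameters only.

pi = [0.6, 0.4]                          # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]             # transition probabilities A[i][j]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # emission probabilities B[state][symbol]
O = [0, 1, 2, 2]                         # observed symbol sequence

def viterbi(O):
    """MAP approximation of the latent state sequence: arg max_Z p(Z|O)."""
    T, N = len(O), len(pi)
    delta = [[pi[i] * B[i][O[0]] for i in range(N)]]  # best-path scores
    psi = [[0] * N]                                   # backpointers
    for t in range(1, T):
        delta.append([0.0] * N)
        psi.append([0] * N)
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t - 1][i] * A[i][j])
            psi[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] * A[best_i][j] * B[j][O[t]]
    # Backtrack the single best state path.
    path = [max(range(N), key=lambda j: delta[T - 1][j])]
    for t in range(T - 1, 0, -1):
        path.insert(0, psi[t][path[0]])
    return path

def forward_backward(O):
    """Exact inference: posterior marginals p(z_t = i | O) for every frame t."""
    T, N = len(O), len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                      for j in range(N)])
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    gamma = []
    for t in range(T):
        norm = sum(alpha[t][i] * beta[t][i] for i in range(N))
        gamma.append([alpha[t][i] * beta[t][i] / norm for i in range(N)])
    return gamma

print(viterbi(O))           # single best state path (a point estimate)
print(forward_backward(O))  # full per-frame posterior distributions
```

The Viterbi path collapses the posterior over state sequences to a single point, whereas the forward–backward posteriors retain the full per-frame uncertainty; this is exactly the trade-off the MAP approximation makes.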
This chapter discusses the MAP approximation of Bayesian inference in detail, but limits the discussion to model parameters Θ in Section 4.1. In the MAP approximation for model parameters, the prior distributions act as a regularization of these parameters, which makes the parameter estimation more robust than that of the maximum likelihood (ML) approach. Another interesting property of the MAP approximation for model parameters is that we can easily incorporate the inference of latent variables by extending the EM algorithm from ML to MAP estimation. Section 4.2 describes the general EM algorithm with the MAP approximation by following the ML-based EM algorithm, as discussed in Section 3.4. Based on the general MAP–EM algorithm, Section 4.3 provides MAP–EM solutions for CDHMM parameters and introduces well-known speaker adaptation applications. Section 4.5 describes the parameter smoothing method in discriminative training of the CDHMM, which actually corresponds to the MAP solution for discriminative parameter estimation. Section 4.6 focuses on the MAP estimation of GMM parameters, a subset of the MAP estimation of CDHMM parameters, which is used to construct speaker GMMs for speaker verification. Section 4.7 provides a MAP solution for n-gram parameters that leads to one instance of interpolation smoothing, as discussed in Section 3.6.2. Finally, Section 4.8 deals with the adaptive MAP estimation of latent topic model parameters.
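The regularization role of the prior can be illustrated with the simplest conjugate case, a Gaussian mean with a Gaussian prior (a minimal sketch; the prior mean mu0 and pseudo-count tau below are illustrative choices, not values from the book). The MAP estimate interpolates between the prior mean and the ML estimate, which is the same smoothing behavior that underlies MAP speaker adaptation of CDHMM means.

```python
# MAP vs. ML estimation of a Gaussian mean with a conjugate Gaussian
# prior, showing how the prior regularizes the estimate when data
# are scarce. Illustrative sketch; parameter values are assumptions.

def ml_mean(x):
    """Maximum likelihood estimate: the sample mean."""
    return sum(x) / len(x)

def map_mean(x, mu0, tau):
    """MAP estimate with a conjugate Gaussian prior centered at mu0.
    tau acts as a pseudo-count: the estimate interpolates between the
    prior mean mu0 and the ML estimate, weighted by tau and the data
    count n. As n grows, the MAP estimate approaches the ML estimate."""
    n = len(x)
    return (tau * mu0 + n * ml_mean(x)) / (tau + n)

# With one observation, the prior dominates; with many, the data do.
few = [2.0]
many = [2.0] * 90
print(map_mean(few, mu0=0.0, tau=9))   # pulled strongly toward mu0
print(map_mean(many, mu0=0.0, tau=9))  # close to the ML estimate 2.0
```

This interpolation is the basic mechanism behind the chapter's speaker adaptation results: with little adaptation data, the MAP estimate stays near the prior (e.g., a speaker-independent model), and it smoothly moves toward the ML estimate as data accumulate.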
- Type: Chapter
- Information: Bayesian Speech and Language Processing, pp. 137–183. Publisher: Cambridge University Press. Print publication year: 2015.