Book contents
- Frontmatter
- Contents
- Preface
- Notation and abbreviations
- Part I General discussion
- Part II Approximate inference
- 4 Maximum a-posteriori approximation
- 5 Evidence approximation
- 6 Asymptotic approximation
- 7 Variational Bayes
- 8 Markov chain Monte Carlo
- Appendix A Basic formulas
- Appendix B Vector and matrix formulas
- Appendix C Probabilistic distribution functions
- References
- Index
7 - Variational Bayes
from Part II - Approximate inference
Published online by Cambridge University Press: 05 August 2015
Summary
Variational Bayes (VB) was developed in the machine learning community in the 1990s (Attias 1999; Jordan, Ghahramani, Jaakkola et al. 1999) and has since become a standard technique for approximate Bayesian inference in latent models, based on an EM-like algorithm. Chapter 4 also dealt with latent models, using the maximum a-posteriori (MAP) EM algorithm. However, the MAP approximation uses point estimates of the model parameters instead of estimating their distributions. This falls short of the fully Bayesian approach, which treats all the variables introduced in a problem as random variables. The asymptotic approximation in Chapter 6 approximates a complex posterior distribution by a single Gaussian distribution without latent variables, an assumption that does not hold for many of our applications. The evidence approximation in Chapter 5 does not explicitly deal with latent models either (they can be handled by combining it with MAP, VB, or MCMC). In contrast to the MAP, evidence, and asymptotic approximations, VB can efficiently approximate complicated integrals and expectations over model parameters by applying the variational method within a specific family of distributions (the exponential family, discussed in Section 2.1.3). The key idea of the variational technique is to derive a lower bound on the marginal log likelihood, as in the EM algorithm of Section 3.4, and to obtain the posterior distributions directly by optimizing this bound with the variational method. This chapter first explains the general framework of VB in Section 7.1 and then turns to more specific pattern recognition problems in Section 7.2. It goes on to provide VB versions of the EM algorithm for statistical models and model selection in speech and language processing, including speech recognition in Sections 7.3 and 7.4 and speaker verification in Section 7.5.
Finally, Sections 7.6 and 7.7 apply VB solutions to latent topic models and their extensions, which aim to capture long-range topic information from (spoken) documents.
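The lower bound on the marginal log likelihood mentioned in the summary can be sketched as follows. The sketch writes $X$ for the observations and $Z$ for all unknown variables, and introduces an arbitrary variational distribution $q(Z)$ and the symbol $\mathcal{L}[q]$ as our own notation. Jensen's inequality gives the bound. The gap is the Kullback-Leibler divergence from $q(Z)$ to the true posterior, so maximizing $\mathcal{L}[q]$ over $q$ pushes $q$ toward $p(Z \mid X)$. The last line is the standard mean-field result for a factorized $q(Z) = \prod_i q(Z_i)$; it is included here as the usual basis of EM-like VB updates, not as the chapter's own derivation.

```latex
\log p(X) = \log \int q(Z) \frac{p(X, Z)}{q(Z)}\, dZ
          \;\ge\; \int q(Z) \log \frac{p(X, Z)}{q(Z)}\, dZ \;\equiv\; \mathcal{L}[q],
\qquad
\log p(X) - \mathcal{L}[q] = \mathrm{KL}\big(q(Z) \,\|\, p(Z \mid X)\big) \;\ge\; 0,

q(Z) = \prod_i q(Z_i) \;\Longrightarrow\;
\log q^{*}(Z_i) = \mathbb{E}_{q(Z_{\setminus i})}\big[\log p(X, Z)\big] + \text{const}.
```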
Variational inference in general
This section starts by describing a general latent model with observation data $X = \{x_n \mid n = 1, \ldots, N\}$ and the set $Z$ of all variables introduced in the model, including latent variables, parameters, hyperparameters, and model structure. Later sections specify $Z$ in terms of more specific variables.
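To make the lower bound concrete, the following sketch uses a hypothetical toy model that does not come from the chapter. It collapses $Z$ to a single discrete variable with three states, which can be read as a choice of model structure. Given $Z$, every $x_n$ is drawn from a unit-variance Gaussian whose mean depends on $Z$. The prior, means, data values, and helper names (`log_gauss`, `elbo`, `kl`) are illustrative assumptions. Because $Z$ is discrete, $\log p(X)$ can be computed exactly. The script uses this to check numerically three properties of the bound $\mathcal{L}[q] = \sum_Z q(Z) \log \{p(X, Z)/q(Z)\}$:

- it never exceeds $\log p(X)$;
- its gap equals $\mathrm{KL}(q \,\|\, p(Z \mid X))$;
- it is tight when $q$ is the exact posterior.

```python
import math
import random


def log_gauss(x, mean, var=1.0):
    """Log density of a univariate Gaussian N(x | mean, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


# Hypothetical toy model (assumption, not from the chapter): Z takes one of
# K = 3 values with prior p(Z=k); given Z=k, every x_n ~ N(mu_k, 1).
prior = [0.5, 0.3, 0.2]
means = [-1.0, 0.0, 2.0]
X = [0.3, -0.2, 0.8, 1.1]

# log p(X, Z=k) = log p(Z=k) + sum_n log N(x_n | mu_k, 1)
log_joint = [math.log(p) + sum(log_gauss(x, mu) for x in X)
             for p, mu in zip(prior, means)]
log_evidence = log_sum_exp(log_joint)                           # log p(X)
posterior = [math.exp(lj - log_evidence) for lj in log_joint]   # p(Z | X)


def elbo(q):
    """Lower bound L[q] = sum_k q_k * (log p(X, Z=k) - log q_k)."""
    return sum(qk * (lj - math.log(qk)) for qk, lj in zip(q, log_joint) if qk > 0)


def kl(q, p):
    """KL(q || p) for discrete distributions with p_k > 0."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0)


# An arbitrary (normalized) variational distribution q(Z)
random.seed(0)
w = [random.random() for _ in prior]
q = [wi / sum(w) for wi in w]

print(f"log p(X)        = {log_evidence:.6f}")
print(f"L[q] (random q) = {elbo(q):.6f}")
print(f"L[posterior]    = {elbo(posterior):.6f}")
print(f"gap vs KL       = {log_evidence - elbo(q):.6f} vs {kl(q, posterior):.6f}")
```

In realistic latent models the sum over $Z$ is intractable. For that reason VB restricts $q(Z)$ to a tractable family and maximizes $\mathcal{L}[q]$ instead of computing $p(Z \mid X)$ exactly.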
- Type: Chapter
- Information: Bayesian Speech and Language Processing, pp. 242-336. Publisher: Cambridge University Press. Print publication year: 2015