Book contents
- Frontmatter
- Contents
- Preface
- Guide to the chapters
- Acknowledgment of support
- Part I Introduction to the four themes
- Part II Studies on the four themes
- 5 Parametric Inference
- 6 Polytope Propagation on Graphs
- 7 Parametric Sequence Alignment
- 8 Bounds for Optimal Sequence Alignment
- 9 Inference Functions
- 10 Geometry of Markov Chains
- 11 Equations Defining Hidden Markov Models
- 12 The EM Algorithm for Hidden Markov Models
- 13 Homology Mapping with Markov Random Fields
- 14 Mutagenetic Tree Models
- 15 Catalog of Small Trees
- 16 The Strand Symmetric Model
- 17 Extending Tree Models to Splits Networks
- 18 Small Trees and Generalized Neighbor-Joining
- 19 Tree Construction using Singular Value Decomposition
- 20 Applications of Interval Methods to Phylogenetics
- 21 Analysis of Point Mutations in Vertebrate Genomes
- 22 Ultra-Conserved Elements in Vertebrate and Fly Genomes
- References
- Index
12 - The EM Algorithm for Hidden Markov Models
from Part II - Studies on the four themes
Published online by Cambridge University Press: 04 August 2010
- Frontmatter
- Contents
- Preface
- Guide to the chapters
- Acknowledgment of support
- Part I Introduction to the four themes
- Part II Studies on the four themes
- 5 Parametric Inference
- 6 Polytope Propagation on Graphs
- 7 Parametric Sequence Alignment
- 8 Bounds for Optimal Sequence Alignment
- 9 Inference Functions
- 10 Geometry of Markov Chains
- 11 Equations Defining Hidden Markov Models
- 12 The EM Algorithm for Hidden Markov Models
- 13 Homology Mapping with Markov Random Fields
- 14 Mutagenetic Tree Models
- 15 Catalog of Small Trees
- 16 The Strand Symmetric Model
- 17 Extending Tree Models to Splits Networks
- 18 Small Trees and Generalized Neighbor-Joining
- 19 Tree Construction using Singular Value Decomposition
- 20 Applications of Interval Methods to Phylogenetics
- 21 Analysis of Point Mutations in Vertebrate Genomes
- 22 Ultra-Conserved Elements in Vertebrate and Fly Genomes
- References
- Index
Summary
As discussed in Chapter 1, the EM algorithm is an iterative procedure used to obtain maximum likelihood estimates (MLEs) for the parameters of statistical models which are induced by a hidden variable construct, such as the hidden Markov model (HMM). The tree structure underlying the HMM allows us to organize the required computations efficiently, which leads to an efficient implementation of the EM algorithm for HMMs known as the Baum–Welch algorithm. For several examples of two-state HMMs with binary output we plot the likelihood function and relate the paths taken by the EM algorithm to the gradient of the likelihood function.
The EM algorithm for hidden Markov models
The hidden Markov model is obtained from the fully observed Markov model by marginalization; see Sections 1.4.2 and 1.4.3. We will use the same notation as there, so σ = σ1σ2 … σn ∈ Σn is a sequence of states and τ = τ1τ2 … τn ∈ (Σ′)n a sequence of output variables. We assume that we observe N sequences, τ1, τ2, …, τN ∈ (Σ′)n, each of length n but that the corresponding state sequences, σ1, σ2, …, σN ∈ Σn, are not observed (hidden).
In Section 1.4.2 it is assumed that there is a uniform distribution on the first state in each sequence, i.e., Prob(σ1 = r) = 1/l for each r ∈ Σ where l = |Σ|.
- Type
- Chapter
- Information
- Algebraic Statistics for Computational Biology , pp. 250 - 263Publisher: Cambridge University PressPrint publication year: 2005