Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- 2 Pairwise alignment
- 3 Markov chains and hidden Markov models
- 4 Pairwise alignment using HMMs
- 5 Profile HMMs for sequence families
- 6 Multiple sequence alignment methods
- 7 Building phylogenetic trees
- 8 Probabilistic approaches to phylogeny
- 9 Transformational grammars
- 10 RNA structure analysis
- 11 Background on probability
- References
- Index
4 - Pairwise alignment using HMMs
Published online by Cambridge University Press: 06 January 2010
Summary
In BSA Chapter 3 we learned that a dynamic programming (DP) algorithm for pairwise sequence alignment admits a probabilistic interpretation. Indeed, equivalent equations appear in the logarithmic form of the Viterbi algorithm for a hidden Markov model (HMM) of a gapped sequence alignment. The hidden states of such a model, called a pair HMM, correspond to the match, x-gap, and y-gap positions of the alignment. The pair HMM state diagram is topologically similar to the diagram of the finite state machine (Durbin et al. (1998), Fig. 4.1), but the pair HMM parameters have clear probabilistic meanings. The optimal finite state machine alignment found by standard DP is equivalent to the most probable path through the pair HMM determined by the Viterbi algorithm. Both the global and the local optimal DP alignment algorithms have Viterbi counterparts for suitably defined HMMs. The HMM has an advantage over the finite state machine: it can compute the full probability that sequences X and Y are generated by a given pair HMM, so a probabilistic measure can be introduced to help establish evolutionary relationships. The full probabilistic model also defines (i) the posterior distribution over all possible alignments given sequences X and Y and (ii) the posterior probability that a particular symbol x of sequence X is aligned to a given symbol y of sequence Y. However, real biological sequences cannot be considered exact realizations of probabilistic models. This explains the difficulties that HMM-based alignment methods encounter in similarity searches (Durbin et al. (1998), Sect. 4.5), whereas simpler finite state machine methods perform sufficiently well.
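The relation between the Viterbi path probability and the full probability can be illustrated with a minimal sketch. It assumes the three-state pair HMM with gap-open probability `delta`, gap-extension probability `epsilon`, and termination probability `tau`, in the style of Durbin et al. (1998); the function name, the DNA-like emission values `p_match`, `p_mismatch`, and `q`, and all default numbers are hypothetical choices made for illustration only. Passing `sum` gives the forward (full) probability; passing `max` gives the probability of the single most probable path.

```python
def pair_hmm_probability(x, y, combine=sum, delta=0.1, epsilon=0.3, tau=0.1,
                         p_match=0.19, p_mismatch=0.02, q=0.25):
    """Joint probability of x and y under a three-state pair HMM (M, X, Y).

    combine=sum -> forward algorithm (sums over all alignments).
    combine=max -> Viterbi (probability of the best alignment only).
    All parameter values are illustrative assumptions, not fitted values.
    """
    n, m = len(x), len(y)
    # f_state[i][j]: probability of emitting x[:i] and y[:j], ending in that state
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0  # the begin state is treated like the match state

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                # Match state emits the aligned pair (x_i, y_j)
                p = p_match if x[i - 1] == y[j - 1] else p_mismatch
                fM[i][j] = p * combine([
                    (1 - 2 * delta - tau) * fM[i - 1][j - 1],
                    (1 - epsilon - tau) * fX[i - 1][j - 1],
                    (1 - epsilon - tau) * fY[i - 1][j - 1],
                ])
            if i > 0:
                # X state emits x_i against a gap in y
                fX[i][j] = q * combine([delta * fM[i - 1][j],
                                        epsilon * fX[i - 1][j]])
            if j > 0:
                # Y state emits y_j against a gap in x
                fY[i][j] = q * combine([delta * fM[i][j - 1],
                                        epsilon * fY[i][j - 1]])

    # Transition into the end state from any of the three states
    return tau * combine([fM[n][m], fX[n][m], fY[n][m]])


full = pair_hmm_probability("AC", "AC", combine=sum)
best = pair_hmm_probability("AC", "AC", combine=max)
print(full, best, best / full)
```

The ratio `best / full` is the posterior probability of the Viterbi alignment given the two sequences. The sketch works with raw probabilities, which underflow for long sequences; a practical version would work in log space, where the `max` recursion takes the additive form of the standard DP alignment equations.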
Type: Chapter
Information: Problems and Solutions in Biological Sequence Analysis, pp. 104-125
Publisher: Cambridge University Press
Print publication year: 2006