Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- 2 Pairwise alignment
- 3 Markov chains and hidden Markov models
- 4 Pairwise alignment using HMMs
- 5 Profile HMMs for sequence families
- 6 Multiple sequence alignment methods
- 7 Building phylogenetic trees
- 8 Probabilistic approaches to phylogeny
- 9 Transformational grammars
- 10 RNA structure analysis
- 11 Background on probability
- References
- Index
1 - Introduction
Published online by Cambridge University Press: 06 January 2010
- Frontmatter
- Contents
- Preface
- 1 Introduction
- 2 Pairwise alignment
- 3 Markov chains and hidden Markov models
- 4 Pairwise alignment using HMMs
- 5 Profile HMMs for sequence families
- 6 Multiple sequence alignment methods
- 7 Building phylogenetic trees
- 8 Probabilistic approaches to phylogeny
- 9 Transformational grammars
- 10 RNA structure analysis
- 11 Background on probability
- References
- Index
Summary
The reader will quickly discover that the organization of this book was chosen to be parallel to the organization of Biological Sequence Analysis by Durbin et al. (1998). The first chapter of BSA contains an introduction to the fundamental notions of biological sequence analysis: sequence similarity, homology, sequence alignment, and the basic concepts of probabilistic modeling.
Finding these distinct concepts described back-to-back is surprising at first glance. However, let us recall several important bioinformatics questions. How could we construct a pairwise sequence alignment? How could we build an alignment of multiple sequences? How could we create a phylogenetic tree for several biological sequences? How could we predict an RNA secondary structure? None of these questions can be consistently addressed without use of probabilistic methods. The mathematical complexity of these methods ranges from basic theorems and formulas to sophisticated architectures of hidden Markov models and stochastic grammars able to grasp fine compositional characteristics of empirical biological sequences.
The explosive growth of biological sequence data created an excellent opportunity for the meaningful application of discrete probabilistic models. Perhaps, without much exaggeration, the implications of this new development could be compared with implications of the revolutionary use of calculus and differential equations for solving problems of classic mechanics in the eighteenth century.
The problems considered in this introductory chapter are concerned with the fundamental concepts that play an important role in biological sequence analysis: the maximum likelihood and the maximum a posteriori (Bayesian) estimation of the model parameters. These concepts are crucial for understanding statistical inference from experimental data and are impossible to introduce without notions of conditional, joint, and marginal probabilities.
- Type
- Chapter
- Information
- Publisher: Cambridge University PressPrint publication year: 2006