Book contents
- Frontmatter
- Contents
- Preface
- Guide to the chapters
- Acknowledgment of support
- Part I Introduction to the four themes
- Part II Studies on the four themes
- 5 Parametric Inference
- 6 Polytope Propagation on Graphs
- 7 Parametric Sequence Alignment
- 8 Bounds for Optimal Sequence Alignment
- 9 Inference Functions
- 10 Geometry of Markov Chains
- 11 Equations Defining Hidden Markov Models
- 12 The EM Algorithm for Hidden Markov Models
- 13 Homology Mapping with Markov Random Fields
- 14 Mutagenetic Tree Models
- 15 Catalog of Small Trees
- 16 The Strand Symmetric Model
- 17 Extending Tree Models to Splits Networks
- 18 Small Trees and Generalized Neighbor-Joining
- 19 Tree Construction using Singular Value Decomposition
- 20 Applications of Interval Methods to Phylogenetics
- 21 Analysis of Point Mutations in Vertebrate Genomes
- 22 Ultra-Conserved Elements in Vertebrate and Fly Genomes
- References
- Index
19 - Tree Construction using Singular Value Decomposition
from Part II - Studies on the four themes
Published online by Cambridge University Press: 04 August 2010
Summary
We present a new, statistically consistent algorithm for phylogenetic tree construction that uses the algebraic theory of statistical models (as developed in Chapters 1 and 3). Our basic tool is Singular Value Decomposition (SVD) from numerical linear algebra.
Starting with an alignment of n DNA sequences, we show that SVD allows us to quickly decide whether a split of the taxa occurs in their phylogenetic tree, assuming only that evolution follows a tree Markov model. Using this fact, we have developed an algorithm that constructs a phylogenetic tree by computing only O(n^2) SVDs.
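The split test described above depends on measuring how far a matrix is from having low rank. Below is a minimal NumPy sketch that assumes the flattening matrix has already been formed. By the Eckart-Young theorem, the Frobenius distance from a matrix to the nearest matrix of rank at most r is the square root of the sum of the squared singular values after the r-th. So one SVD is enough to compute it. The names `distance_to_rank` and `accept_split`, and the tolerance `tol`, are hypothetical illustrations. They are not the chapter's actual decision rule and not the SVDLIBC API.

```python
import numpy as np


def distance_to_rank(M, r):
    """Frobenius distance from matrix M to the nearest matrix of rank <= r.

    By the Eckart-Young theorem this equals the square root of the sum of
    the squared singular values beyond the r-th, so a single SVD suffices.
    """
    s = np.linalg.svd(M, compute_uv=False)  # singular values, descending
    return float(np.sqrt(np.sum(s[r:] ** 2)))


def accept_split(flattening, tol, r=4):
    """Hypothetical decision rule: treat A|B as a split if its flattening is
    within `tol` of rank r (r = 4 for the four DNA states)."""
    return distance_to_rank(flattening, r) <= tol


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M = rng.random((16, 16))
    # Cross-check against an explicit best rank-4 approximation.
    U, s, Vt = np.linalg.svd(M)
    M4 = (U[:, :4] * s[:4]) @ Vt[:4]
    print(np.isclose(distance_to_rank(M, 4), np.linalg.norm(M - M4)))  # True
    # A product of 16x4 and 4x16 factors has rank <= 4 up to rounding.
    L = rng.random((16, 4)) @ rng.random((4, 16))
    print(accept_split(L, tol=1e-10))  # True
```

In practice, the flattening is built from estimated pattern frequencies, so a split's flattening is only approximately rank 4. This is why a tolerance-based distance is more useful here than an exact rank computation.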
We have implemented this algorithm using the SVDLIBC library (available at http://tedlab.mit.edu/~dr/SVDLIBC/) and tested it extensively on simulated and real data. In practice, the algorithm is fast on trees with 20–30 taxa.
We begin by describing the general Markov model and then show how to flatten the joint probability distribution along a partition of the leaves in the tree. We give rank conditions for the resulting matrix; most notably, we give a set of new rank conditions that are satisfied by non-splits in the tree. Armed with these rank conditions, we present the tree-building algorithm, which uses SVD to measure how close a matrix is to having a given rank. Finally, we give experimental results on the behavior of the algorithm on both simulated and real (ENCODE) data.
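To make flattening concrete, here is a small sketch on the quartet tree 12|34 under a general Markov model with hypothetical parameters. The root distribution is uniform, and each edge matrix is 0.7·I plus 0.3 times a random stochastic matrix. Flattening along a split of the tree gives a matrix of rank at most 4 for DNA. In this particular example, the two non-split 2|2 partitions give full-rank 16x16 flattenings. The helper names `edge_matrix`, `quartet_distribution` and `flatten` are illustrative, and the rank threshold `tol=1e-10` is an assumption.

```python
import numpy as np


def edge_matrix(rng, k=4):
    """Hypothetical row-stochastic edge parameters: 0.7*I plus 0.3 times a
    random stochastic matrix, so rows favour 'no change' and no entry is 0."""
    R = rng.uniform(0.5, 1.5, size=(k, k))
    R /= R.sum(axis=1, keepdims=True)
    return 0.7 * np.eye(k) + 0.3 * R


def quartet_distribution(rng):
    """Joint leaf distribution P[x1, x2, x3, x4] on the quartet tree 12|34,
    rooted at the internal node u adjacent to leaves 1 and 2; v is the other
    internal node (adjacent to leaves 3 and 4)."""
    pi = np.full(4, 0.25)  # hypothetical uniform root distribution
    m1, m2, me, m3, m4 = (edge_matrix(rng) for _ in range(5))
    # P[a,b,c,d] = sum_u pi[u] m1[u,a] m2[u,b] sum_v me[u,v] m3[v,c] m4[v,d]
    return np.einsum("u,ua,ub,uv,vc,vd->abcd", pi, m1, m2, me, m3, m4)


def flatten(P, row_leaves):
    """Flatten tensor P into a matrix whose rows are indexed by the joint
    states of row_leaves and whose columns by the states of the other leaves."""
    col_leaves = [ax for ax in range(P.ndim) if ax not in row_leaves]
    Q = np.transpose(P, list(row_leaves) + col_leaves)
    return Q.reshape(P.shape[0] ** len(row_leaves), -1)


if __name__ == "__main__":
    P = quartet_distribution(np.random.default_rng(0))
    for rows in [(0, 1), (0, 2), (0, 3)]:  # partitions 12|34, 13|24, 14|23
        F = flatten(P, rows)
        print(rows, np.linalg.matrix_rank(F, tol=1e-10))
```

Only the flattening along the true split 12|34 has rank 4. With these parameters, the other two partitions give rank 16.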
The general Markov model
We assume that evolution follows a tree Markov model, as introduced in Section 1.4, with evolution acting independently at different sites of the genome.
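Because sites evolve independently, the observed alignment columns are i.i.d. draws from the joint leaf distribution, so their pattern frequencies estimate that distribution. The following minimal sketch makes two assumptions. First, the rooted tree is encoded by a hypothetical `parent` array, with parents numbered before their children. Second, there is one 4x4 row-stochastic matrix per edge. The names `sample_sites` and `pattern_frequencies`, and the edge matrix used in the example, are illustrative and not part of any library.

```python
import numpy as np


def sample_sites(parent, pi, trans, n_sites, rng):
    """Sample n_sites independent alignment columns from a tree Markov model.

    parent[v] is the parent of node v (-1 for the root); nodes are assumed to
    be numbered so that each parent comes before its children. trans[v] is
    the 4x4 row-stochastic matrix on the edge parent[v] -> v. Returns an
    (n_sites, n_nodes) integer array of states in {0, 1, 2, 3}.
    """
    n_nodes = len(parent)
    states = np.empty((n_sites, n_nodes), dtype=int)
    for v in range(n_nodes):
        p = parent[v]
        if p < 0:
            states[:, v] = rng.choice(4, size=n_sites, p=pi)
            continue
        assert p < v, "parents must be numbered before their children"
        # Row of the edge matrix selected by the parent's state at each site.
        probs = trans[v][states[:, p]]  # shape (n_sites, 4)
        u = rng.random((n_sites, 1))
        # Inverse-CDF draw; np.minimum guards against round-off in the cumsum.
        states[:, v] = np.minimum((np.cumsum(probs, axis=1) <= u).sum(axis=1), 3)
    return states


def pattern_frequencies(leaf_states):
    """Empirical joint distribution of leaf states as a (4,)*n_leaves tensor."""
    n_sites, n_leaves = leaf_states.shape
    counts = np.zeros((4,) * n_leaves)
    np.add.at(counts, tuple(leaf_states.T), 1)
    return counts / n_sites


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    parent = [-1, 0, 0, 0, 1, 1]  # quartet 12|34; leaves are nodes 2..5
    M = 0.9 * np.eye(4) + 0.025   # hypothetical symmetric edge matrix
    trans = [None] + [M] * 5      # the root has no incoming edge
    sites = sample_sites(parent, np.full(4, 0.25), trans, 10_000, rng)
    P_hat = pattern_frequencies(sites[:, 2:])
    print(P_hat.shape)
```

The resulting empirical tensor can be flattened along a partition of the leaves just like an exact distribution. With finitely many sites, however, its rank conditions hold only approximately.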
- Type: Chapter
- Information: Algebraic Statistics for Computational Biology, pp. 347-358. Publisher: Cambridge University Press. Print publication year: 2005