Book contents
- Frontmatter
- Contents
- Preface
- Acknowledgements
- Summary of most significant capabilities of BEAST 2
- Part I Theory
- Part II Practice
- 6 Bayesian evolutionary analysis by sampling trees
- 7 Setting up and running a phylogenetic analysis
- 8 Estimating species trees from multilocus data
- 9 Advanced analysis
- 10 Posterior analysis and post-processing
- 11 Exploring phylogenetic tree space
- Part III Programming
- References
- Index of authors
- Index of subjects
8 - Estimating species trees from multilocus data
from Part II - Practice
Published online by Cambridge University Press: 05 October 2015
Summary
The increasing availability of sequence data from multiple loci raises the question of how to determine the species tree from such data. It is well established that simply concatenating nucleotide sequences results in misleading estimates (Degnan and Rosenberg 2006; Heled and Drummond 2010; Kubatko and Degnan 2007). There are a number of more sophisticated methods for inferring a species phylogeny from sequences obtained from multiple genes. This chapter starts with an example of a single-locus analysis to highlight some of the issues, then describes the multispecies coalescent in detail. The remainder of the chapter describes two multilocus methods for inferring a species phylogeny, from DNA and SNP data, respectively. Although even the multispecies coalescent may suffer from detectable model misspecification (Reid et al. 2013), it has not been shown to perform worse than concatenation.
Darwin's finches
Consider the situation where you have data from a single locus, but have several gene sequences sampled from each species, and you are interested in estimating the species phylogeny. Arguably, even in this case, an approach that explicitly models incomplete lineage sorting is warranted. The ancestral relationships in the species tree can differ considerably from those of an individual gene tree, due (among other things) to incomplete lineage sorting. This arises because, in the absence of gene flow, a pair of genes sampled from related species must have diverged earlier than the corresponding speciation event (Pamilo and Nei 1988); when their lineages fail to coalesce in the ancestral population between two speciation events, the order in which they eventually coalesce can disagree with the species tree. More generally, a species is defined by the collection of all its genes (each with its own history of ancestry), so analysing a single gene to determine a species phylogeny may be misleading unless the potential discrepancy between the gene tree and the species tree is explicitly modelled.
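To make the argument about divergence times concrete, here is a minimal Python sketch (not BEAST code) of the multispecies coalescent for three species with species tree ((A,B),C) and one gene copy sampled per species. It assumes times are measured in coalescent units, so any two lineages sharing an ancestral population coalesce at rate 1; the split times `t_ab` and `t_root` and the function names are hypothetical choices for illustration. The sketch checks that the a-b gene divergence is never younger than the A-B speciation, and compares the simulated fraction of gene trees that match the species tree with the standard three-taxon result 1 - (2/3)e^(-T), where T is the length of the internal branch.

```python
import math
import random


def simulate_gene_tree(t_ab, t_root, rng):
    """Simulate one gene tree for species tree ((A,B),C), one gene per species.

    Times are in coalescent units: any two lineages in the same ancestral
    population coalesce at rate 1. A and B split at t_ab; (A,B) and C split
    at t_root > t_ab. Returns (concordant, t_gene_ab), where concordant is
    True if the gene tree is ((a,b),c) and t_gene_ab is the time at which
    lineages a and b find their common ancestor.
    """
    # a and b can only coalesce once both are in the ancestral AB population.
    wait = rng.expovariate(1.0)
    if t_ab + wait < t_root:
        return True, t_ab + wait  # coalesced in the AB branch: concordant
    # Otherwise a, b and c all reach the root population: three pairs,
    # each at rate 1, so the first coalescence happens at total rate 3.
    t_first = t_root + rng.expovariate(3.0)
    if rng.randrange(3) == 0:  # (a,b) is one of three equally likely pairs
        return True, t_first
    # a or b joined c first; a and b meet only at the final coalescence.
    return False, t_first + rng.expovariate(1.0)


def expected_concordance(internal_branch):
    """Probability that the gene tree matches the species tree (3 species)."""
    return 1.0 - (2.0 / 3.0) * math.exp(-internal_branch)


# Hypothetical species tree: speciation times in coalescent units.
t_ab, t_root = 0.5, 1.0
rng = random.Random(1)
sims = [simulate_gene_tree(t_ab, t_root, rng) for _ in range(50_000)]

concordant = sum(c for c, _ in sims) / len(sims)
youngest = min(t for _, t in sims)
print(f"simulated concordance: {concordant:.3f}")
print(f"analytic concordance:  {expected_concordance(t_root - t_ab):.3f}")
print(f"youngest a-b gene divergence {youngest:.3f} >= speciation time {t_ab}")
```

With an internal branch of T = 0.5 coalescent units the analytic concordance is about 0.596, so roughly four gene trees in ten would disagree with the species tree even though there is no gene flow at all.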
For example, consider a small multiple sequence alignment of the mitochondrial control region, sampled from 16 specimens representing four species of Darwin's finches. The variable columns of the sequence alignment are presented in Figure 8.1.
The alignment is composed of three partial sequences from each of Camarhynchus parvulus and Certhidea olivacea, four from Geospiza fortis, and six from G. magnirostris (Sato et al. 1999). The full alignment has 1121 columns and can be found in the examples/nexus directory of the BEAST 2 distribution.
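Figure 8.1 shows only the variable columns of the alignment. As a hedged sketch of what that selection involves, the Python below records the per-species sample sizes given in the text and extracts the non-constant columns from an alignment while keeping track of which species each sequence came from. The sequences, their labels, and the label-prefix scheme used to assign sequences to species are hypothetical toy data, not the real finch alignment (which is the file in examples/nexus).

```python
from collections import Counter

# Sample sizes per species as stated in the text (Sato et al. 1999 data).
SAMPLE_SIZES = {
    "Camarhynchus parvulus": 3,
    "Certhidea olivacea": 3,
    "Geospiza fortis": 4,
    "Geospiza magnirostris": 6,
}


def variable_columns(sequences):
    """Return 0-based indices of alignment columns that are not constant.

    Characters are compared literally, so a gap or ambiguity code counts
    as a distinct state (a simplifying assumption).
    """
    seqs = list(sequences)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


# Hypothetical toy fragments with made-up labels; NOT the real finch data.
toy_alignment = {
    "parvulus_1": "ACGTTAGC",
    "parvulus_2": "ACGTTAGC",
    "olivacea_1": "ACGATAGT",
    "fortis_1": "ACTTTAGC",
    "magnirostris_1": "ACTTTCGC",
}
# Map each sequence to its species via a hypothetical label prefix.
prefix_to_species = {
    "parvulus": "Camarhynchus parvulus",
    "olivacea": "Certhidea olivacea",
    "fortis": "Geospiza fortis",
    "magnirostris": "Geospiza magnirostris",
}
species_of = {name: prefix_to_species[name.split("_")[0]] for name in toy_alignment}

print(sum(SAMPLE_SIZES.values()))  # → 16 specimens in the real data set
print(Counter(species_of.values()))  # sequences per species in the toy data
print(variable_columns(toy_alignment.values()))  # → [2, 3, 5, 7]
```

Column indices are 0-based; treating gaps and ambiguity codes as ordinary characters keeps the sketch short, but a real analysis may need to handle them differently.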
- Type: Chapter
- Information: Bayesian Evolutionary Analysis with BEAST, pp. 116-126. Publisher: Cambridge University Press. Print publication year: 2015.