Estimating species trees from multilocus data

Alexei J. Drummond; Remco R. Bouckaert

doi:10.1017/CBO9781139095112.009

Chapter 8, from Part II - Practice

Published online by Cambridge University Press: 05 October 2015

Alexei J. Drummond: University of Auckland
Remco R. Bouckaert: University of Auckland

Summary

The increasing availability of sequence data from multiple loci raises the question of how to determine the species tree from such data. It is well established that simply concatenating nucleotide sequences results in misleading estimates (Degnan and Rosenberg 2006; Heled and Drummond 2010; Kubatko and Degnan 2007). A number of more sophisticated methods exist for inferring a species phylogeny from sequences obtained from multiple genes. This chapter starts with an example of a single-locus analysis to highlight some of the issues, and then describes the multispecies coalescent in detail. The remainder of the chapter describes two multilocus methods for inferring a species phylogeny, one from DNA sequence data and one from SNP data. Although even the multispecies coalescent may suffer from detectable model misspecification (Reid et al. 2013), it has not been shown to be worse than concatenation.

Darwin's finches

Consider the situation where you have data from a single locus, with a number of gene sequences sampled from each species, and you are interested in estimating the species phylogeny. Arguably, even in this case, an approach that explicitly models incomplete lineage sorting is warranted. The ancestral relationships in the species tree can differ considerably from those of an individual gene tree, due (among other things) to incomplete lineage sorting. This arises because, in the absence of gene flow, a pair of genes sampled from related species must have diverged earlier than the corresponding speciation event (Pamilo and Nei 1988), so gene lineages can persist through short branches of the species tree and coalesce in an order that differs from the order of speciation. More generally, a species is defined by the collection of all its genes (each with its own history of ancestry), so analysing just a single gene to determine a species phylogeny may be misleading unless the potential discrepancy between the gene tree and the species tree is explicitly modelled.
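To make the point about incomplete lineage sorting concrete, the following Python sketch uses the standard coalescent result for three species, one gene copy sampled from each, and species tree ((A,B),C). If the internal branch has length t in coalescent units (generations scaled by the effective population size of the locus), the lineages from A and B fail to coalesce on that branch with probability e^(-t); the three lineages then meet in the ancestral population, where two of the three equally likely first coalescences give a discordant topology, so the gene tree disagrees with the species tree with probability (2/3)e^(-t). The sketch also simulates the divergence time of one gene from each of two sister species, showing that it can never be younger than the speciation time. It assumes constant population sizes, no gene flow and neutral loci; the function names are hypothetical, and this is an illustration only, not the inference method used in this chapter.

```python
import math
import random


def discordance_probability(t):
    """Probability that a single-locus gene tree differs in topology from the
    three-species tree ((A,B),C), given the internal branch length t in
    coalescent units. Assumes one gene copy per species and no gene flow."""
    # A and B fail to coalesce on the internal branch with probability exp(-t);
    # given that, two of the three equally likely coalescence orders in the
    # ancestral population yield a discordant gene tree.
    return (2.0 / 3.0) * math.exp(-t)


def simulate_pair_divergence(speciation_time, n_replicates=10000, seed=1):
    """Simulate divergence times (coalescent units) of one gene sampled from
    each of two sister species that split at speciation_time, assuming no
    gene flow and a constant-size ancestral population."""
    rng = random.Random(seed)
    # Two lineages can only coalesce once they are in the ancestral population,
    # where they coalesce at rate 1 per coalescent unit, so the extra waiting
    # time after the speciation event is exponential with mean 1.
    return [speciation_time + rng.expovariate(1.0) for _ in range(n_replicates)]


if __name__ == "__main__":
    # Short internal branches make discordant gene trees common.
    for t in (0.1, 0.5, 1.0, 2.0):
        print(f"internal branch {t}: P(discordant gene tree) = {discordance_probability(t):.3f}")
    times = simulate_pair_divergence(speciation_time=0.5)
    print(f"earliest simulated gene divergence: {min(times):.3f} (speciation at 0.5)")
```

The analytical formula shows why a single gene tree is an unreliable proxy for the species tree when speciation events are close together: as t approaches zero, the probability of a discordant gene tree approaches 2/3.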

For example, consider a small multiple sequence alignment of the mitochondrial control region, sampled from 16 specimens representing four species of Darwin's finches. The variable columns of the sequence alignment are presented in Figure 8.1.

The alignment is composed of three partial sequences from each of Camarhynchus parvulus and Certhidea olivacea, four from Geospiza fortis and six from G. magnirostris (Sato et al. 1999). The full alignment has 1121 columns and can be found in the examples/nexus directory of the BEAST2 distribution.
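Species tree methods need to know which species each sampled sequence belongs to. As a minimal sketch of that assignment for this data set, the Python below builds a specimen-to-species mapping from the sample counts given above and checks that they account for all 16 specimens. The specimen labels it generates (e.g. `Geospiza_fortis_1`) are hypothetical placeholders, not the names used in the BEAST2 example file, and the code does not reflect how BEAST2 itself stores taxon sets.

```python
from collections import Counter

# Number of partial mitochondrial control region sequences per species,
# as stated in the text (Sato et al. 1999).
SAMPLES_PER_SPECIES = {
    "Camarhynchus parvulus": 3,
    "Certhidea olivacea": 3,
    "Geospiza fortis": 4,
    "Geospiza magnirostris": 6,
}


def build_taxon_map(samples_per_species):
    """Return a mapping from hypothetical specimen labels to species names."""
    taxon_map = {}
    for species, count in samples_per_species.items():
        for i in range(1, count + 1):
            # Placeholder label: species name with underscores plus an index.
            label = species.replace(" ", "_") + "_" + str(i)
            taxon_map[label] = species
    return taxon_map


taxon_map = build_taxon_map(SAMPLES_PER_SPECIES)
counts = Counter(taxon_map.values())
print(len(taxon_map), "specimens in", len(counts), "species")  # prints: 16 specimens in 4 species
```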

Information

Type: Chapter
Book: Bayesian Evolutionary Analysis with BEAST, pp. 116-126

Publisher: Cambridge University Press

Print publication year: 2015