INTRODUCTION
DNA barcoding: ‘the utilisation of DNA sequences of short standardised gene fragments for quick and accurate determination of the species’ (D'Avila-Levy et al. 2015)
Trypanosoma parasites are flagellated protozoa within the class Kinetoplastida, which is characterized by the presence of a kinetoplast: a mass of mitochondrial ‘kDNA’ (Adl et al. 2012). These parasites cause a wide range of diseases in both humans and animals, and are often transmitted between hosts by insect vectors (Fig. 1). Human diseases caused by parasitic trypanosomes carry a combined health burden of 2·2 million disability-adjusted life years and primarily affect people from the poorest demographics in tropical and subtropical climates (Stuart et al. 2008), while animal trypanosomiasis in Africa costs the livestock industry over US$ 4·5 billion every year (Yaro et al. 2016). Despite their devastating social and economic impact, these diseases remain widely under-reported; misdiagnosed, unidentified or asymptomatic cases, limited funding and the lack of a universal method for parasite detection and identification make surveillance and monitoring of these parasites difficult (Wastling and Welburn, 2011; Auty et al. 2012a; Stockdale and Newton, 2013; Franco et al. 2014).
Since the development of the first DNA-based identification methods for trypanosomes in the 1980s, the number of molecular detection techniques available (and iterations on these techniques) has increased dramatically; for examples, see the reviews by Adams and Hamilton (2008) and Taberlet et al. (2012). Although they constitute a vast improvement in sensitivity and specificity of diagnosis compared with microscopy methods (Gibson, 2007; Enyaru et al. 2010), the absence of a ‘gold standard’ for the detection and classification of trypanosomes has resulted in a distinct lack of comparable data between surveys (Auty et al. 2012b; Hernández and Ramírez, 2013; D'Avila-Levy et al. 2015). Most molecular techniques are too costly or complex for general use in front-line field diagnostics and, while developments in the transport of blood specimens have allowed samples to be analysed at centralized clinical laboratory facilities, the majority of molecular methods are still confined to research laboratories (Deborggraeve and Büscher, 2010).
Nonetheless, in other areas of biology and medicine, standardized, sequence-based barcoding (Hebert et al. 2003) has provided a sensitive, reliable method for the identification of species across a vast range of taxa and is now used by thousands of researchers worldwide (Coissac et al. 2016). However, despite the growing reference libraries of DNA barcodes for animals, plants and fungi (Ratnasingham and Hebert, 2007), there is currently no universal genetic barcoding marker available for trypanosome species. Accordingly, there is a clear need for a definitive, simple test suitable for the detection of all trypanosomes (Wastling and Welburn, 2011), with sensitivity and specificity sufficient to differentiate between infections at the subspecies level, and usable for known, unknown and mixed infections. This is particularly pertinent from an epidemiological perspective for organisms that are morphologically identical but which require different treatments, such as the two human-infective trypanosomes that cause sleeping sickness (human African trypanosomiasis, HAT), Trypanosoma brucei rhodesiense and Trypanosoma brucei gambiense (Adams and Hamilton, 2008).
A key point regarding the continued relevance (or otherwise) of sequence-based barcoding, irrespective of the target locus, is the need for such a test to provide information on unknown taxa. This represents a very different requirement from a binary yes/no diagnostic test – generally an antibody-based method, which requires screening against panels of known potential infective agents to establish antibody specificity, levels of cross-reactivity and the likelihood of scoring false-positives. In this context, a simple sequence-based test continues to offer advantages over an antibody-based diagnostic as, even with an unknown or previously unencountered taxon, such a test will yield a result that allows identification of an unknown organism as being most closely related to an organism of known sequence identity. Having established the continued benefits, a further major requirement is for such a test to work with sub-optimal sample material (and potentially degraded DNA), as is frequently encountered in field and/or clinical situations.
This review provides a critical overview of the development of barcoding techniques from traditional methods of trypanosome detection and identification, and examines the requirements of an ‘ideal’ barcode. An alternative approach to barcoding, based on the distribution of phylogenetically informative regions along a target gene, is presented and we discuss whether barcoding can fulfil all the necessary requirements to become a truly universal method of identification. In other words: can barcoding be all things to all people?
TIMELINE OF TRYPANOSOME DETECTION
Old faithful: microscopy
Despite the development of a variety of molecular methods for the detection and identification of infectious agents, the usual method for diagnosing trypanosome infections in vertebrate hosts remains the most basic: microscopic examination of sample preparations (Mugasa et al. 2012; Ricciardi and Ndao, 2015). However, this method is time consuming, dependent on operator expertise, unreliable for mixed infections, fails to detect immature infections and, in the case of African trypanosomes, is only useful for distinguishing between parasites to the level of subgenus (Ouma et al. 2000; Gibson, 2009; Enyaru et al. 2010; Auty et al. 2012b; Mugasa et al. 2014).
Early attempts to define the identity of pathogenic trypanosomes relied on a combination of microscopy and the ability, or otherwise, to passage parasites through laboratory host animals. In vertebrate hosts, where bloodstream-form trypanosomes exhibit a variety of distinctive morphological characteristics, this approach worked relatively well. However, the insect stages of trypanosomes from a range of subgenera are morphologically indistinguishable and, prior to the advent of enzymatic and molecular methods, the identification of different trypanosome species relied heavily on the site of infection in the insect vector (Hoare, 1972; Enyaru et al. 2010).
For human African trypanosomiasis, microscopic examination of cerebrospinal fluid can be used to determine the stage of disease progression, but the invasive procedure (lumbar puncture) required to collect samples often discourages patients from seeking medical help. A lack of formal training for front-line medical workers, local stigma surrounding diagnosis of sleeping sickness and a delay in patients contacting medical services only exacerbate the problem of surveillance and monitoring of this disease (Mpanya et al. 2012; Acup et al. 2016).
Not-so-quick kit: isoenzyme analysis
In the late 1960s, Lanham and Godfrey developed a cellulose column-based method utilizing the differential surface charge between trypanosomes and red blood cells to reliably separate parasites from host blood (Lanham and Godfrey, 1970). With this method, they were able to obtain relatively large-scale, pure preparations of live, undisrupted parasites suitable for subsequent biochemical analysis. At around the same time, Godfrey and colleagues developed a method to characterize trypanosomes using isoenzymes (Kilgour and Godfrey, 1973) and the characterization of many trypanosome species, subspecies and strains quickly followed (e.g. Godfrey and Kilgour, 1976; Miles et al. 1977). Several major isoenzyme-based studies followed and succeeded in defining the species and groupings of epidemiological significance recognized today (e.g. Gashumba et al. 1986; Gibson et al. 1988; Godfrey et al. 1990). Attempts were made subsequently both to streamline the methodology and to optimize the discriminatory power of the enzymes used (e.g. Stevens and Godfrey, 1992; Abderrazak et al. 1993), but ultimately the practical difficulties associated with isolating and preserving parasite enzyme extracts, reproducibility and issues of homoplasy in banding patterns led to the approach being superseded by DNA-based methodologies (e.g. Gibson and Borst, 1986; Hide et al. 1990).
Quick kit: serological tests
Antibody-detection tests, such as the card agglutination tests and the direct agglutination test, are widely used for the detection of trypanosomes in human hosts (Ricciardi and Ndao, 2015; Lutumba et al. 2016). These tests have excellent field application as they do not require a constant supply of electricity and are cheaper and more rapid than equivalent molecular techniques, although they can vary significantly in their sensitivity and specificity (Ricciardi and Ndao, 2015). Serological tests require relatively large samples and have the potential to yield false-negative results where parasitaemia is low or where antibody production is reduced, such as in immunocompromised patients (Papadopoulos et al. 2004; World Health Organisation, 2013). In addition, positive diagnoses obtained using serological tests nearly always require confirmation by microscopy, as these methods cannot distinguish between active infection and residual antigens from past infection or vaccination (Uilenberg and Boyt, 1998; Woods, 2013). Misdiagnosis of trypanosome infections remains a major problem, as treatment often carries a significant inherent risk (Barrett and Croft, 2012; Field et al. 2017).
The enzyme-linked immunosorbent assay offers higher sensitivity than many other serological tests available, but it requires a sophisticated laboratory set-up that has restricted its use for diagnosis in the field (Chappuis et al. 2005).
The rise of molecular methods
DNA probes based on non-coding satellite repeats were the first molecular methods sensitive enough for the direct identification of trypanosomes in both host and vector samples without requiring cell cultures (Kukla et al. 1987; Gibson et al. 1988; McNamara et al. 1989). The development of the polymerase chain reaction (PCR) heralded a major advance in the sensitivity of diagnostic techniques; PCR-based methods can identify trypanosomes at the subspecies level, are suitable for analysis of mixed infections and can be applied to samples where parasite numbers are vanishingly low (Adams and Hamilton, 2008; Gibson, 2009; Matovu et al. 2010). Species-specific PCR-based methods are the most frequently used molecular tests for detection and identification of trypanosomes, but are limited by the number of species for which species-specific primers are available. Critically, these methods only detect known species: they cannot prove an absence of trypanosomes. In addition, screening samples for multiple trypanosome species using species-specific PCR methods requires a panel of probes; this can be expensive and time consuming, and limits the number of samples that it is practical to analyse (Gibson, 2009; Adams et al. 2010; De Waal, 2012).
GENERIC PCR-BASED METHODS AND THE ‘IDEAL’ BARCODE
Historically, generic PCR methods have been less sensitive than species-specific PCR methods, but allow for multiple trypanosome species to be identified with a single test (Gibson, 2009). Most generic methods, such as restriction fragment length polymorphism PCR (RFLP-PCR) and ribosomal length-based methods, utilize multipurpose primers that target a semi-conserved region of the genome. Identification of an organism is made based on the length of the amplified regions (Adams and Hamilton, 2008). Although these methods each result in a species-specific ‘barcode’, none fulfil the requirements for the ‘ideal’ trypanosome barcode (Box 1).
Box 1. An ideal trypanosome barcode should be:
1. Optimal length: short enough to be sequenced in a single reaction, but long enough to capture all inter-taxon sequence variation.
2. Conserved: contain regions suitable for targeting with universal primers.
3. Phylogenetically informative: contain enough variability both between and within species to capture the full extent of trypanosome diversity.
4. Standardized: use a single set of universal primers applicable to all trypanosomes.
5. Reliable: the primer binding sites should be highly conserved and/or multicopy, so the primers are still applicable for field samples that may have suffered a degree of DNA degradation.
(Savolainen et al. 2005; Ferri et al. 2009; Valentini et al. 2009; Pečnikar and Buzan, 2014)
Valentini et al. (2009) discussed the different requirements of DNA barcodes for different users, and highlighted the differences between DNA barcoding ‘sensu stricto’ and ‘sensu lato’. DNA barcoding ‘sensu stricto’ is favoured by taxonomists and prioritizes standardization of primers with enough variation to elucidate a high level of phylogenetic information. DNA barcoding ‘sensu lato’ is most suited to environmental samples and prioritizes short, robust primer binding sites that are resistant to degradation.
Target gene
The success of a gene as a DNA barcode depends on a number of attributes, which must be considered when selecting gene targets: Is it a multicopy gene? How conserved is the sequence? How much does it vary across/between taxa/species? Is this level of variation constant across the gene? Some genes have been identified as universal barcodes, and are suitable for vast groups of organisms: the mitochondrial gene cytochrome c oxidase subunit 1 (cox1/COI) is the accepted gold standard for molecular species identification of animals, and equivalents are available for plants and fungi. However, identifying universal barcodes in protist groups has proved difficult, not least because the level of genetic variability possible within each species is poorly understood (Enyaru et al. 2010), and consensus is yet to be reached regarding which genes to target and the criteria for delimiting species groups (Pawlowski et al. 2012; Pečnikar and Buzan, 2014).
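The question of whether variation is constant across a gene can be explored with a simple sliding-window count of variable sites along an alignment. The following is a minimal sketch only: the function names, window settings and the three toy sequences are invented for illustration and are not taken from any published trypanosome dataset. It assumes the sequences have already been aligned to equal length.

```python
def column_is_variable(column):
    """True if an alignment column holds more than one distinct base (gaps ignored)."""
    bases = {base for base in column if base != "-"}
    return len(bases) > 1


def windowed_variability(aligned_seqs, window, step):
    """Return (window start, fraction of variable columns) along an alignment."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    variable = [column_is_variable(col) for col in zip(*aligned_seqs)]
    results = []
    for start in range(0, length - window + 1, step):
        fraction = sum(variable[start:start + window]) / window
        results.append((start, fraction))
    return results


# Hypothetical toy alignment: conserved flanks around a variable core.
toy_alignment = [
    "ACGTACGTAAAACCCCACGTACGT",
    "ACGTACGTATTACGCCACGTACGT",
    "ACGTACGTGAGACCTCACGTACGT",
]
profile = windowed_variability(toy_alignment, window=8, step=8)
for start, fraction in profile:
    print(f"window starting at {start}: {fraction:.3f} variable")
```

In this toy case the flanking windows contain no variable sites (candidate primer-binding regions), while the central window carries all of the variation (the candidate informative region).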
Molecular markers have been developed to target a wide range of trypanosome gene regions (Fig. 2A), but few have been the target of barcoding approaches (Fig. 2B). Fluorescent fragment length barcoding (FFLB) has been used to amplify small target regions in both the 18S small subunit ribosomal RNA (rRNA) and the 28S large subunit rRNA (Hamilton et al. 2008, 2011; Silva-Iturriza et al. 2013). This highly sensitive, PCR-based method uses four sets of primers: two target the 18S and are specific to trypanosomes, two target the 28S and are specific to all trypanosomatids (Hamilton et al. 2011). The length of the resulting fragments produces a pattern unique to each species, which can be matched to reference pattern profiles for species identification. FFLB can also detect novel trypanosome species and, although further analysis is needed to identify these novel species, the fragment patterns may provide an indication of phylogenetic relationships (Hamilton et al. 2008). However, there are a limited number of reference profiles available for FFLB, which restricts its use as a trypanosome identification tool at this time (Hamilton et al. 2011; Silva-Iturriza et al. 2013), and this method cannot be used to discriminate between T. brucei subspecies (Hamilton et al. 2008).
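The matching of a fragment-length pattern to reference profiles can be sketched as a lookup with a size tolerance. This is an illustrative sketch under stated assumptions, not the published FFLB procedure: the locus labels, taxon names, fragment lengths and the 1 bp tolerance are all hypothetical placeholders.

```python
def match_profile(observed, references, tolerance=1):
    """Return the reference names whose fragment lengths all lie within
    `tolerance` base pairs of the observed lengths, locus by locus."""
    hits = []
    for name, ref in references.items():
        if all(
            locus in observed and abs(observed[locus] - length) <= tolerance
            for locus, length in ref.items()
        ):
            hits.append(name)
    return sorted(hits)


# Hypothetical reference profiles: four fragments per taxon (lengths in bp).
references = {
    "taxon_A": {"18S-1": 210, "18S-2": 305, "28S-1": 150, "28S-2": 260},
    "taxon_B": {"18S-1": 214, "18S-2": 305, "28S-1": 158, "28S-2": 260},
    "taxon_C": {"18S-1": 210, "18S-2": 306, "28S-1": 150, "28S-2": 261},
}

unique = match_profile(
    {"18S-1": 214, "18S-2": 304, "28S-1": 158, "28S-2": 260}, references
)
ambiguous = match_profile(
    {"18S-1": 210, "18S-2": 305, "28S-1": 150, "28S-2": 260}, references
)
novel = match_profile(
    {"18S-1": 230, "18S-2": 320, "28S-1": 170, "28S-2": 280}, references
)
print(unique, ambiguous, novel)
```

A single hit corresponds to an identification, an empty result flags a pattern absent from the reference set (a candidate novel taxon that needs further analysis), and multiple hits illustrate why taxa with near-identical fragment lengths cannot be separated by length alone.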
The 18S rRNA gene has long been a popular target for molecular detection methods in protists (D'Avila-Levy et al. 2015). It is a highly expressed multicopy gene, present in all eukaryotes, with an assortment of conserved and variable nucleotide sequences that offer targets for universal primers, whilst still providing a wealth of taxonomic information. As sequence-based molecular methods gained popularity, the 18S rRNA gene succeeded protein-coding genes (e.g. Fernandes et al. 1993; Hashimoto et al. 1995; Adjé et al. 1998) to become the gene of choice for nearly all trypanosome evolutionary analysis (Maslov et al. 1996; Lukes et al. 1997; Haag et al. 1998; Stevens et al. 1998, 1999) and, as a result, has formed the basis of all modern trypanosome taxonomic frameworks (e.g. Hamilton et al. 2007; Lima et al. 2015; Dario et al. 2017). However, while nearly all trypanosome phylogenies have been constructed using 18S rRNA sequences, inadequate signals at certain depths of phylogenetic reconstruction have necessitated the use of additional trypanosome gene markers such as the glyceraldehyde phosphate dehydrogenase (GAPDH) gene.
Nonetheless, the framework described using the 18S rRNA has proven robust: other gene markers have complemented and strengthened this framework without fundamentally changing the nature of the basic relationships described based on 18S rRNA data; ultimately, this framework has also been fully supported by whole-genome phylogenetic comparisons (Leonard et al. 2011). In addition, as the 18S rRNA is one of the most widely used markers for trypanosomes, it is well represented in sequence databases such as GenBank (D'Avila-Levy et al. 2015).
Another popular molecular marker is the GAPDH gene. Few if any gaps are required for alignment of trypanosome GAPDH sequences, and sequences are shorter than those of 18S rRNA; sequencing this ‘housekeeping gene’ can therefore be more economical while still providing a complementary depth of phylogenetic information in trypanosomes (Hamilton et al. 2004; Adams and Hamilton, 2008). GAPDH genes are relatively conserved and are therefore useful for resolving deep phylogenetic relationships (Hamilton et al. 2004). However, in order to determine close relationships, GAPDH must be used in conjunction with another barcoding marker; GAPDH has been used successfully with the 18S rRNA for trypanosome identification, and has proven suitable for novel species and mixed infections (e.g. Hamilton et al. 2008; Barbosa et al. 2016).
Internal transcribed spacer (ITS) regions have been widely used for barcoding in some organisms, e.g. fungi (Pawlowski et al. 2012); however, while they have long been utilized for the detection of trypanosomes (Desquesnes and Davila, 2002; Adams et al. 2008; Desquesnes et al. 2011; Hernández and Ramírez, 2013), they have not yet been used specifically for the barcoding of different trypanosome species. Identification of species depends on the length of the amplified fragments of ribosomal RNA produced via PCR using primers complementary to conserved regions of the 18S, 28S and 5·8S rRNA genes matching all species of interest. This means species determination is possible for mixed infections, except in cases where the amplicon length is similar between species or there is intra-species variation (Adams and Hamilton, 2008; Hamilton et al. 2008; Gibson, 2009). Another constraint of the ITS region, as with all ribosomal genes as targets for barcoding, is its relatively low copy number (100–200 repeats), compared with that of satellite DNA (10 000–20 000 repeats), which can limit the sensitivity of tests (Desquesnes and Davila, 2002).
The kinetoplast is a modified mitochondrion unique to kinetoplastid protists, and kinetoplast DNA (kDNA) minicircles have been successfully used in PCR assays for the identification of a number of Trypanosoma species. The high copy number of these minicircles (several thousand per cell) lends itself to highly sensitive diagnostics. However, high levels of nucleotide polymorphism between repeats of kDNA fragments make these genes unsuitable for sequence alignment (De Oliveira Ramos Pereira and Brandão, 2013). Only very short regions (100–200 base pairs) of kDNA minicircles are conserved and for some trypanosomes, such as T. brucei, there is only one of these regions per minicircle (Jensen and Englund, 2012). Low levels of conserved sequences in kDNA make it difficult to develop universal primers and limit the depth of phylogenetic information that can be elucidated from these sequences.
Spliced leader RNA (SL RNA) or ‘mini-exon donor RNA’ is another feature unique to kinetoplastid protists and has also been used as a target for barcoding (Rodrigues et al. 2010; Lima et al. 2015). The SL RNA genes are arranged as tandem repeats, with each repeat comprising many repeat units with regions of differing variability (Rodrigues et al. 2010). The conserved regions are convenient for primer targeting, whilst the more variable intergenic regions permit distinction between closely related trypanosomes (Westenberger et al. 2004). However, there are no primers currently available that are applicable to all trypanosomes (D'Avila-Levy et al. 2015), and the high mutation rate of intergenic regions makes it difficult to compare sequences across the full spectrum of trypanosomes or to define any meaningful phylogeny beyond closely related taxa (Gibson et al. 2000). Previous attempts to use SL RNA barcodes for trypanosomes delimited species using an arbitrary level of sequence similarity (90%) (Votýpka et al. 2010). However, this threshold is insufficient for discriminating between closely related Trypanosoma species that share up to 98% similarity in their SL transcripts (Gibson et al. 2000).
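The limitation of a fixed similarity threshold can be illustrated with a short calculation. The sketch below is an assumption-laden toy example: the two 50-base sequences are synthetic, differ at a single position (98% identity), and stand in for the SL transcripts of two distinct but closely related species; the helper names are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of matching positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def same_group(seq_a, seq_b, threshold):
    """True if two sequences would be lumped together at a given % threshold."""
    return percent_identity(seq_a, seq_b) >= threshold


# Synthetic 50-base sequences differing at one site (not real SL RNA data).
species_x = "ACGT" * 12 + "AC"
species_y = species_x[:10] + "T" + species_x[11:]

identity = percent_identity(species_x, species_y)
lumped_at_90 = same_group(species_x, species_y, threshold=90)
lumped_at_99 = same_group(species_x, species_y, threshold=99)
print(identity, lumped_at_90, lumped_at_99)
```

At a 90% cut-off the two sequences are assigned to the same group even though, by construction, they represent different species; only a threshold above their 98% identity separates them, which in turn risks splitting genuine intra-species variants.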
A significant (and pragmatic) consideration when choosing a target gene for barcoding is the availability of sequences. Protists are poorly represented in sequence libraries and comprise just over 2% of the sequences currently in GenBank (National Center for Biotechnology Information (NCBI), 2017), despite constituting the majority of samples in environmental surveys (Del Campo et al. 2015). In addition, the sequence availability of Trypanosoma species is further skewed towards human-infective species and those infecting important agricultural species, such as cattle, which are over-represented in sequence databases relative to other trypanosomes, including parasites of insects and plants (D'Avila-Levy et al. 2015).
Whilst a bias towards medically important parasites is understandable, the paucity of genomic data from other Trypanosoma is a continuing impediment to our understanding of the evolutionary history and intricate phylogenetic relationships within this diverse group of parasites.
Gene or genes?
As the number of genes scrutinized for their barcoding potential has increased, it has become apparent that no test amplifying a single fragment has the differential power necessary to fully and reliably resolve the phylogeny of all trypanosomes (Hamilton et al. 2007; Adams and Hamilton, 2008; Pompanon and Samadi, 2015). Barcoding methods that utilize multiple loci have the advantage of additional power and accuracy (Mallo and Posada, 2016), and nested strategies that utilize ‘a universal pre-barcode’ and a ‘group specific’ barcode have been proposed by the Protist Working Group (ProWG) as alternative methods to resolve interspecies relationships (Pawlowski et al. 2012). In addition, we anticipate that the increasing ease and ever reducing costs of genome-wide SNP discovery in non-model organisms will lead to major advances in the use of SNP chip-based diagnostics in the near future.
Optimizing fragment length
In the past, target fragment length has been limited by the technology available. When molecular methods were first introduced, sequencing was only possible up to a few hundred base pairs. However, with the growth of Next-Generation Sequencing, the cost of sequencing has decreased by a factor of 10⁴ in the last 10 years (Hayden, 2014; Van Nimwegen et al. 2016).
But is bigger always better?
Should we strive for barcode fragments with a length at the ever-increasing limit of our sequencing ability? Here, there is a significant trade-off to consider; optimal sequence length of the target region is highly dependent on the user's requirements. Shorter fragments result in higher sensitivity tests, favourable for analysis of degraded DNA from field samples. In diagnostic or clinical situations, for example, where the objective is to discriminate between the two human-infective subspecies of T. brucei, a shorter fragment is likely to provide all the required information. However, it is only with longer fragments that we can infer robust phylogenetic information at the subspecies level (Pompanon and Samadi, 2015); recreating the evolutionary history of a collection of poorly known or newly discovered species is likely to call for a very long target region, though this is, of course, a very different task from routine, high-throughput barcoding of large numbers of specimens.
An alternative future for diagnostics? Isothermal techniques
Isothermal amplification methods, such as loop-mediated isothermal amplification (LAMP) and nucleic acid sequence-based amplification (NASBA), are becoming increasingly popular for the detection of trypanosomes, as they offer simple, rapid and cheap alternatives to traditional PCR-based methods (Mugasa et al. Reference Mugasa, Katiti, Boobo, Lubega, Schallig and Matovu2014; Besuschio et al. Reference Besuschio, Murcia, Benatar, Monnerat, Cruz, Picado, Curto, Kubota, Wehrendt, Pavia, Mori, Puerta, Ndung'u and Schijman2017; Rivero et al. Reference Rivero, Bisio, Velázquez, Esteva, Scollo, González, Altcheh and Ruiz2017). Isothermal tests involve a single reaction in a single tube incubated at a constant temperature; therefore, these techniques do not require the expensive thermocycling equipment that is necessary for PCR (Matovu et al. Reference Matovu, Mugasa, Ekangu, Deborggraeve, Lubega, Laurent, Schoone, Schallig and Büscher2010; Wastling and Welburn, Reference Wastling and Welburn2011). The simplicity, sensitivity and low cost of isothermal techniques make them strong candidates for the application of molecular methods in field diagnostics in resource-poor areas (Laohasinnarong, Reference Laohasinnarong2011; Ricciardi and Ndao, Reference Ricciardi and Ndao2015). However, a number of additional costs must be considered when evaluating the suitability of these methods for field diagnostics, including the need for six primers, heating and maintaining samples at 65 °C, and expensive dyes for visualization of results (Enyaru et al. Reference Enyaru, Ouma, Malele, Matovu and Masiga2010; Wastling and Welburn, Reference Wastling and Welburn2011). In addition, the ability of these tests to amplify extremely small amounts of DNA means that they are highly prone to contamination. Developing simplified ‘kit’ forms of these techniques, and refining those already available, may yield promising alternatives to sequence-based barcoding for clinical purposes (Mugasa et al. Reference Mugasa, Katiti, Boobo, Lubega, Schallig and Matovu2014).
DISCUSSION
Towards a spectrum of similarity: an alternative approach to barcoding
Rather than identifying species by the length of their amplified fragments, we propose the adoption of a technique that identifies species, within a defined group, by the level of concordance across a selected gene, e.g. 18S rRNA, or partial gene (Fig. 3). Sequence differences between a cohort of species are tracked along a specified gene, highlighting regions rich with phylogenetic variety. Species can then be identified by the degree of similarity across the selected region(s), for example, see Stevens and Wall (Reference Stevens and Wall2001). The resulting spectrum of similarity can provide a valuable tool for understanding the relative level of sequence differentiation of any putative species, as their place in the spectrum will provide clues as to their phylogenetic placement.
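As a minimal illustration of how such a spectrum of similarity might be computed, the sketch below ranks a set of reference sequences by their percentage identity to a query across a pre-aligned region. It is not the procedure of Stevens and Wall (2001); the function names, the taxon labels and the toy sequences are hypothetical and chosen purely for illustration, and the sequences are assumed to have been aligned beforehand to a common length.

```python
from typing import Dict, List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percentage identity between two aligned sequences of equal length.

    Columns in which either sequence carries a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def similarity_spectrum(query: str,
                        references: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank reference taxa from most to least similar to the query."""
    scores = [(name, percent_identity(query, seq))
              for name, seq in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical, pre-aligned toy sequences (not real 18S rRNA data).
    toy_references = {
        "taxon_A": "ACGTACGTAC",
        "taxon_B": "ACGTTCGTAC",
        "taxon_C": "TCGTTCGAAC",
    }
    toy_query = "ACGTACGTAA"
    for name, score in similarity_spectrum(toy_query, toy_references):
        print(f"{name}\t{score:.1f}%")
```

The position of an unknown sample in the ranked list gives a first indication of its nearest relatives; in practice, the alignment step and the choice of region would strongly influence the result.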
Such an approach offers several benefits, including (as with any barcoding approach) the adoption of a standardized marker (or set of markers) and the ability to compare findings across studies, together with the practical benefits of being able to utilize a limited number of standardized primers. In the ‘sliding window’ approach proposed by Stevens and Wall (Reference Stevens and Wall2001), the use of a given molecular marker in conjunction with a particular group of taxa allows the gene region (to be adopted for subsequent barcoding) to be selected based on the degree of phylogenetic resolution delivered by the particular sequence positions used within the target gene. More recently, Hadziavdic et al. (Reference Hadziavdic, Lekang, Lanzen, Jonassen, Thompson and Troedsson2014) undertook a much broader study along similar lines, screening for variation across more than 500 000 eukaryote 18S rRNA sequences (see also Pawlowski et al. (Reference Pawlowski, Audic, Adl, Bass, Belbahri, Berney, Bowser, Cepicka, Decelle, Dunthorn, Fiore-Donno, Gile, Holzmann, Jahn, Jirků, Keeling, Kostka, Kudryavtsev, Lara, Lukeš, Mann, Mitchell, Nitsche, Romeralo, Saunders, Simpson, Smirnov, Spouge, Stern, Stoeck, Zimmermann, Schindel and De Vargas2012) for a review of the potential role of the V4 region of 18S rRNA as a candidate universal barcoding marker). Such approaches go a long way towards fulfilling the requirements for marker selection as set out in Box 1. To date, however, while several studies have focused on the use of the V7–V8 sub-region of 18S rRNA (e.g. Smith et al. Reference Smith, Clark, Averis, Lymbery, Wayne, Morris and Thompson2008; Averis et al. Reference Averis, Thompson, Lymbery, Wayne, Morris and Smith2009), citing its phylogenetic informativeness (but, see Hamilton and Stevens, Reference Hamilton and Stevens2011), this approach remains to be systematically applied across the full 18S rRNA gene in trypanosomes.
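The general idea of a sliding-window scan can be sketched as follows. This is an illustrative simplification under stated assumptions, not the published method: variability is scored simply as the fraction of alignment columns in a window that contain more than one distinct base, and the function names, window size and toy alignment are all hypothetical.

```python
from typing import List, Tuple


def window_variability(alignment: List[str], window: int,
                       step: int = 1) -> List[Tuple[int, float]]:
    """Return (start, fraction of variable columns) for each window.

    A column counts as variable if it holds more than one distinct
    non-gap character across the aligned sequences.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    if window < 1 or window > length:
        raise ValueError("window must be between 1 and the alignment length")
    variable = [
        len({seq[i].upper() for seq in alignment} - {"-"}) > 1
        for i in range(length)
    ]
    return [
        (start, sum(variable[start:start + window]) / window)
        for start in range(0, length - window + 1, step)
    ]


def most_variable_window(alignment: List[str], window: int,
                         step: int = 1) -> Tuple[int, float]:
    """Return the first window with the highest fraction of variable columns."""
    return max(window_variability(alignment, window, step),
               key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical toy alignment (not real sequence data).
    toy_alignment = [
        "AAAAACGTAAAA",
        "AAAAAGCTAAAA",
        "AAAAATTAAAAA",
    ]
    start, score = most_variable_window(toy_alignment, window=4)
    print(f"most variable window starts at column {start} (score {score:.2f})")
```

A raw count of variable columns is only a crude proxy for phylogenetic resolution; a fuller analysis would assess how well each window recovers a reference tree, and would also check that the flanking regions are conserved enough to carry primers.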
Can barcoding be all things to all people?
The ideal barcode from a gene region that yields enough sequence variation to capture the vast diversity of trypanosomes may provide a level of discrimination sufficient for diagnostic and identification purposes. However, it is questionable whether the same barcode could also provide enough variation to fully capture the phylogenetic relationships or complex evolutionary history of such a diverse group of organisms. In cases where genetic functionality is the key interest, barcoding is likely to be of little use. In the field, adequate preservation methods would be required to maintain the integrity of DNA from samples in order to apply any barcoding method successfully (Reeves et al. Reference Reeves, Holderman, Gillett-Kaufman, Kawahara and Kaufman2016).
The development of a perfect and truly universal barcode, based on a single primer pair, may be not only unattainable but also impractical. Different avenues of research have different requirements, in terms of both the techniques they use and the information required/acquired. A geneticist studying the evolution of trypanosomes needs a way to detect intricate relationships over a range of evolutionary timescales (from, for example, the (putatively) most ancient to most recent: Simpson et al. Reference Simpson, Gill, Callahan, Litaker and Roger2004; Flegontov et al. Reference Flegontov, Votýpka, Skalický, Logacheva, Penin, Tanifuji, Onodera, Kondrashov, Volf, Archibald and Lukeš2013; Hamilton et al. Reference Hamilton, Stevens, Gaunt, Gidley and Gibson2004; Stevens and Rambaut, Reference Stevens and Rambaut2001; Haag et al. Reference Haag, O'hUigin and Overath1998; Lima et al. Reference Lima, Espinosa-Álvarez, Pinto, Cavazzana, Pavan, Carranza, Lim, Campaner, Takata, Camargo, Hamilton and Teixeira2015; Balmer et al. Reference Balmer, Beadell, Gibson and Caccone2011; Messenger et al. Reference Messenger, Llewellyn, Bhattacharyya, Franzén, Lewis, Ramírez, Carrasco, Andersson and Miles2012), and it may be that a suite of gene markers is required to provide sufficient detail at all levels of phylogenetic depth. Conversely, for a clinician diagnosing patients in a resource-poor community, the nuances of an organism's evolutionary history are all but irrelevant. Identification of the parasite often determines treatment, so in this case the sensitivity and specificity of a diagnostic test become the overriding priority.
The range of requirements for the detection and identification of trypanosomes must be considered when selecting gene targets for barcoding, and the benefits of each molecular marker weighed against its limitations. For example, SL RNA is an ideal marker for detection of parasites in field samples, as this region is not present in either insect or vertebrate hosts (Westenberger et al. Reference Westenberger, Sturm, Yanega, Podlipaev, Zeledon, Campbell and Maslov2004). However, 18S rRNA may be preferable for field samples with potentially poor quality template DNA, as this region is relatively well protected against degradation (Basiye et al. Reference Basiye, Schoone, Beld, Minnaar, Ngeranwa, Wasunna and Schallig2011). Moreover, if a sample is for clinical diagnosis, diagnostic sensitivity is likely to be a priority – especially if parasitaemia is low. Therefore, a target marker would ideally be one with a high copy number (Hernández and Ramírez, Reference Hernández and Ramírez2013).
To date, there has been limited investigation into the comparative efficacy of different target regions for barcoding in trypanosomes. The barcoding technique presented in Fig. 3 can be applied to existing barcoding markers, as well as identifying the most phylogenetically informative regions, guiding the development of new primer targets. Rather than striving for a single, universal trypanosome barcode it may be advisable to adopt a multi-locus barcoding approach, similar to that suggested by Pawlowski et al. (Reference Pawlowski, Audic, Adl, Bass, Belbahri, Berney, Bowser, Cepicka, Decelle, Dunthorn, Fiore-Donno, Gile, Holzmann, Jahn, Jirků, Keeling, Kostka, Kudryavtsev, Lara, Lukeš, Mann, Mitchell, Nitsche, Romeralo, Saunders, Simpson, Smirnov, Spouge, Stern, Stoeck, Zimmermann, Schindel and De Vargas2012) that can be adapted depending on the user's particular circumstances and requirements.
Concluding remarks and future directions
At present, molecular methods are mostly used only in sophisticated research laboratories, and there is a concern that new techniques are ‘merely another addition to an ever-expanding toolbox of molecular assays for research’ (Wastling and Welburn, Reference Wastling and Welburn2011), rather than having any clinical diagnostic utility. Moreover, whilst there has been a drive to develop and refine new molecular diagnostics, the sensitivity of existing techniques could be greatly improved if more research were conducted on initial stages, such as sample preparation and DNA extraction (Dunlop et al. Reference Dunlop, Thompson, Godfrey and Thompson2014). However, recent developments in molecular methods for trypanosome identification have succeeded in unveiling a number of previously unidentified species (Adams et al. Reference Adams, Hamilton and Gibson2010; Hutchinson and Gibson, Reference Hutchinson and Gibson2015) and may offer new opportunities for the identification of novel hybrids (Koffi et al. Reference Koffi, De Meeûs, Séré, Bucheton, Simo, Njiokou, Salim, Kaboré, MacLeod, Camara, Solano, Belem and Jamonneau2015; Tihon et al. Reference Tihon, Imamura, Dujardin, Van Den Abbeele and Van den Broeck2017) and the epidemiological tracking of trypanosome strains spread by the movement of host cattle (Févre et al. Reference Févre, Picozzi, Fyfe, Waiswa, Odiit, Coleman and Welburn2005). Nonetheless, a lack of comparable data between parasite surveys makes it difficult to draw any firm conclusions regarding species prevalence, and the full extent of trypanosome diversity remains unknown at this time (Adams et al. Reference Adams, Hamilton and Gibson2010; D'Avila-Levy et al. Reference D'Avila-Levy, Boucinha, Kostygov, Santos, Morelli, Grybchuk-Ieremenko, Duval, Votýpka, Yurchenko, Grellier and Lukeš2015).
Priority should be given to the establishment of a standardized barcoding protocol for the detection and identification of trypanosomes (matching as closely as possible the criteria given in Box 1). A standard barcoding protocol with requirement-dependent refinements is likely to be the closest we can ever come to obtaining a truly universal barcode for trypanosomes.
DATA STATEMENT
The research materials supporting this publication can be publicly accessed at the Parasitology journal website as Supplementary Information and/or by contacting the corresponding author.
SUPPLEMENTARY MATERIAL
The supplementary material for this article can be found at https://doi.org/10.1017/S0031182017002049.
AUTHOR ORCIDS
R. Hutchinson: ORCID ID 0000-0003-0336-9594; J. R. Stevens: ORCID ID 0000-0002-1317-6721.
FINANCIAL SUPPORT
R. H. is funded by a UK Biotechnology and Biological Sciences Research Council PhD studentship (grant No. BB/M009122/1) held by Professor Wendy Gibson (Bristol) and J. R. S., as part of the South West Doctoral Training Partnership (SWBio DTP).