
9 - Burrows–Wheeler indexes

from Part III - Genome-Scale Index Structures

Published online by Cambridge University Press:  05 May 2015

Veli Mäkinen (University of Helsinki)
Djamal Belazzougui (University of Helsinki)
Fabio Cunial (University of Helsinki)
Alexandru I. Tomescu (University of Helsinki)

Summary

Consider the largest sequenced genome to date, the approximately 25.5 · 10^9 base-pair-long genome of the white spruce Picea glauca. The suffix array of this genome would take approximately 1630 · 10^9 bits of space, or approximately 204 gigabytes, if we represent it with 64-bit integers, and it would take approximately 110 gigabytes if we represent it with fixed-length bit-field arrays. The suffix tree of the same genome would occupy from three to five times more space. However, the genome itself can be represented in just 51 · 10^9 bits, or approximately 6.4 gigabytes, assuming an alphabet of size σ = 4: this is equivalent to n log σ bits, where n is the length of the string. The gaps between the sizes of the suffix tree, of the suffix array, and of the original string are likely to grow as more species are sequenced: for example, the unsequenced genome of the flowering plant Paris japonica is estimated to contain 150 · 10^9 base pairs. The size of metagenomic samples is increasing at an even faster pace, with files containing approximately 160 · 10^9 base pairs already in public datasets.
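The space figures above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, a gigabyte is assumed to mean 10^9 bytes, and the fixed-length bit-field array is assumed to use ceil(log2 n) bits per entry.

```python
def space_estimates_bits(n, sigma=4):
    """Return rough sizes, in bits, of three representations of a text of length n."""
    bits_per_position = (n - 1).bit_length()   # ceil(log2 n): enough to address n positions
    bits_per_symbol = (sigma - 1).bit_length() # ceil(log2 sigma): 2 bits when sigma = 4
    return {
        "suffix_array_64bit": 64 * n,
        "suffix_array_bitfield": bits_per_position * n,
        "plain_text": bits_per_symbol * n,
    }


def to_gigabytes(bits):
    """Convert bits to gigabytes, assuming 1 gigabyte = 10**9 bytes."""
    return bits / 8 / 1e9


if __name__ == "__main__":
    n = 25_500_000_000  # approximate length of the Picea glauca genome
    for name, bits in space_estimates_bits(n).items():
        print(f"{name}: {bits} bits, about {to_gigabytes(bits):.1f} GB")
```

With n = 25.5 · 10^9 each suffix array entry needs 35 bits in the bit-field representation, which is where the figure of roughly 110 gigabytes comes from.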

Burrows–Wheeler indexes are a space-efficient variant of suffix arrays and suffix trees that take n log σ(1 + o(1)) bits for a genome of length n on an alphabet of size σ (or just about 10 gigabytes for Picea glauca), and that support a number of high-throughput sequencing analyses approximately as fast as suffix arrays and suffix trees. Recall from Section 2.2 that we use the term succinct for data structures that, like Burrows–Wheeler indexes, occupy n log σ(1 + o(1)) bits. This chapter walks the reader through the key algorithmic concepts behind Burrows–Wheeler indexes, leaving applications to Part IV and Part V.

After describing the Burrows–Wheeler transform of a string and related data structures for counting and locating all the occurrences of a pattern, the focus shifts to the bidirectional Burrows–Wheeler index, a powerful data structure that allows one to enumerate all internal nodes of the suffix tree of a string in a small amount of space.
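As a rough illustration of the first two ingredients mentioned above, the Burrows–Wheeler transform and pattern counting, here is a minimal Python sketch. It is not the chapter's construction: the function names are hypothetical, the sentinel character "$" is an assumption (it must not occur in the text and must sort before every text character), the transform is built by naively sorting all rotations, and the occurrence table is stored as plain lists rather than in succinct form. It only shows the logic of backward search.

```python
from collections import Counter


def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (naive, for small inputs only)."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count occurrences of pattern in the original text using backward search."""
    counts = Counter(bwt_string)

    # first_row[c] = number of characters in the text strictly smaller than c
    first_row = {}
    total = 0
    for c in sorted(counts):
        first_row[c] = total
        total += counts[c]

    # occ[c][i] = number of occurrences of c in bwt_string[:i]
    occ = {c: [0] * (len(bwt_string) + 1) for c in counts}
    for i, ch in enumerate(bwt_string):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)

    # Half-open interval [lo, hi) of sorted-rotation rows prefixed by the
    # pattern suffix processed so far; initially it covers all rows.
    lo, hi = 0, len(bwt_string)
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + occ[ch][lo]
        hi = first_row[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("GATTACAGATTACA")
    print(transformed, count_occurrences(transformed, "ATTA"))
```

Each step of the loop extends the matched suffix of the pattern by one character to the left, so counting takes one pair of table lookups per pattern character, independent of the text length. A genome-scale index would replace the plain `occ` lists with a compressed rank structure.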

Type: Chapter
Information: Genome-Scale Algorithm Design: Biological Sequence Analysis in the Era of High-Throughput Sequencing, pp. 157-198
Publisher: Cambridge University Press
Print publication year: 2015
