Predicting Species: Statistical Models

Rex A. Dwyer

doi:10.1017/CBO9781139164764.005

4 - Predicting Species: Statistical Models

Published online by Cambridge University Press: 05 June 2012

Affiliation: Rex A. Dwyer, The BioAlgorithmic Consultancy

Summary

Suppose we are given a strand of DNA and asked to determine whether it comes from corn (Zea mays) or from fruit flies (Drosophila melanogaster). One very simple way to attack this problem is to analyze the relative frequencies of the nucleotides in the strand. Even before the double-helix structure of DNA was determined, researchers had observed that, while the numbers of Gs and Cs in a DNA sample were roughly equal (and likewise for As and Ts), the relative numbers of G + C and A + T differed from species to species. This relationship is usually expressed as percent GC, and species are said to be GC-rich or GC-poor. Corn is slightly GC-poor, with 49% GC. Fruit fly is GC-rich, with 55% GC.

We examine the first ten bases of our DNA and see: GATGTCGTAT. Is this DNA from corn or fruit fly?
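As a quick check on what these ten bases look like, percent GC can be counted directly. The following is a minimal Python sketch, not code from the chapter; the function name `gc_percent` is a hypothetical helper introduced here for illustration.

```python
def gc_percent(seq):
    """Return the percentage of G and C bases in a DNA string."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Count every base that is either G or C.
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


if __name__ == "__main__":
    print(gc_percent("GATGTCGTAT"))
```

The fragment contains three Gs and one C, so it is 40% GC: below both species' averages, though closer to corn's 49%.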

First of all, it should be clear that we cannot get a definitive answer to the question by observing bases, especially just a few. GC content varies even within a single genome: corn's protein-coding sequences are distinctly GC-rich, while its noncoding DNA is GC-poor. Such variations are sometimes exploited to find the starting point of genes in the genome (near so-called CpG islands). In the absence of additional information, the best we can hope for is to learn whether it is “more likely” that we have corn or fly DNA, and how much more likely.
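One way to make “how much more likely” concrete is a likelihood ratio. The sketch below is an illustration under stated assumptions, not necessarily the model the chapter goes on to develop: it assumes each base is drawn independently, and that within each species G and C are equally frequent (likewise A and T), so the only inputs are the 49% and 55% GC figures quoted above. The names `base_probs`, `log_likelihood`, and `log_odds` are hypothetical.

```python
import math

# Percent GC as quoted in the text; everything else is an assumption.
GC_FRACTION = {"corn": 0.49, "fly": 0.55}


def base_probs(gc):
    """Per-base probabilities, assuming P(G) = P(C) and P(A) = P(T)."""
    return {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}


def log_likelihood(seq, gc):
    """Natural-log probability of seq, assuming independent bases."""
    probs = base_probs(gc)
    return sum(math.log(probs[base]) for base in seq.upper())


def log_odds(seq, first="corn", second="fly"):
    """Log of P(seq | first) / P(seq | second); positive favors `first`."""
    return (log_likelihood(seq, GC_FRACTION[first])
            - log_likelihood(seq, GC_FRACTION[second]))


if __name__ == "__main__":
    lo = log_odds("GATGTCGTAT")
    print(f"log-odds (corn vs. fly): {lo:.3f}")
    print(f"likelihood ratio: {math.exp(lo):.2f}")
```

Under these assumptions the ten bases favor corn only weakly, by odds of roughly 1.3 to 1, which is consistent with the point that a few bases cannot settle the question. Working in logarithms keeps the arithmetic stable for longer sequences, where the raw probabilities would underflow.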

Type: Chapter
Information: Genomic Perl: From Bioinformatics Basics to Working Code, pp. 44-54

DOI: https://doi.org/10.1017/CBO9781139164764.005

Publisher: Cambridge University Press

Print publication year: 2002
