Finding genes: going ashore at CpG islands

Tore Samuelsson

doi:10.1017/CBO9781139022095.017

15 - Finding genes: going ashore at CpG islands

Published online by Cambridge University Press: 05 August 2012

Affiliation: Göteborgs Universitet, Sweden

Summary

Elucidation of the human genome sequence was a significant milestone in the life sciences (Lander et al., 2001; Venter et al., 2001; International Human Genome Sequencing Consortium, 2004). However, access to this information raised an obvious but entirely non-trivial problem: what does all the genetic information, in the form of some three billion bases, represent in biological terms? One important category of information is the sequences that specify genes, i.e. regions that give rise to mRNAs that in turn encode specific protein molecules. The genome does not specify only proteins; a large number of RNAs are also transcribed from DNA that do not function as mRNA but have other roles. (These are non-coding RNAs and will be discussed in Chapter 17.) In the next three chapters we will deal with the computational problems of finding protein-coding and non-coding RNA genes, starting out with a genomic sequence.

In a mammalian genome, only a very small fraction, about 1.5–2%, codes for protein. For these genomes we are faced with the problem of identifying relatively small and scattered coding regions in a vast sea of non-coding material. There is a striking difference in this respect between mammals and a bacterium like Escherichia coli, whose genome consists of as much as 83% coding sequence. In the next chapter the focus will be on prediction of exon regions of protein-coding genes. Here we will address another sub-problem of finding protein-coding genes: we will see how a simple prediction of regions known as CpG islands helps us to locate sites in the genome that are close to the transcription start sites of genes.
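To give a feel for what a "simple prediction" of CpG islands can look like, here is a minimal Python sketch. It is not the chapter's own program. It assumes the widely used sliding-window criteria associated with Gardiner-Garden and Frommer (1987): a window of at least 200 bases, a GC fraction of at least 0.5, and an observed/expected CpG ratio of at least 0.6. The function names `cpg_stats` and `find_cpg_windows` and the default parameters are illustrative choices, not taken from the source.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # the dinucleotide "CG" cannot overlap itself
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were placed independently: (C * G) / N
    expected = c * g / n
    ratio = cpg / expected if expected > 0 else 0.0
    return gc_fraction, ratio


def find_cpg_windows(seq, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Return 0-based start positions of windows that meet both thresholds.

    The thresholds are assumptions for illustration; adjust them as needed.
    """
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, ratio = cpg_stats(seq[start:start + window])
        if gc >= min_gc and ratio >= min_ratio:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Toy sequence: an AT-rich stretch followed by a CpG-rich stretch.
    toy = "AT" * 100 + "CG" * 100
    print(find_cpg_windows(toy, window=200, step=100))
```

Overlapping or adjacent windows that pass the test would normally be merged into a single candidate island; that step is left out here to keep the sketch short.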

Type: Chapter
Information: Genomics and Bioinformatics: An Introduction to Programming Tools for Life Scientists, pp. 198–207

DOI: https://doi.org/10.1017/CBO9781139022095.017

Publisher: Cambridge University Press

Print publication year: 2012

References

Durbin, R., Eddy, S. R., Krogh, A. and Mitchison, G. (2007). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge: Cambridge University Press.

Gardiner-Garden, M. and Frommer, M. (1987). CpG islands in vertebrate genomes. J Mol Biol 196, 261.

Illingworth, R. S. and Bird, A. P. (2009). CpG islands: 'a rough guide'. FEBS Lett 583, 1713.

International Human Genome Sequencing Consortium (2004). Finishing the euchromatic sequence of the human genome. Nature 431, 931.

Lander, E. S., Linton, L. M., Birren, B., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860.

Takai, D. and Jones, P. A. (2002). Comprehensive analysis of CpG islands in human chromosomes 21 and 22. Proc Natl Acad Sci U S A 99, 3740.

Takai, D. and Jones, P. A. (2003). The CpG island searcher: a new WWW resource. In Silico Biol 3, 235.

Venter, J. C., Adams, M. D., Myers, E. W., et al. (2001). The sequence of the human genome. Science 291, 1304.

Zhao, Z. and Han, L. (2009). CpG islands: algorithms and applications in methylation studies. Biochem Biophys Res Commun 382, 643.
