Book contents
- Frontmatter
- Contents
- Preface
- Acknowledgements
- Design and conventions of this book
- 1 Introduction: working with the molecules of life in the computer
- 2 Gene technology: cutting DNA
- 3 Gene technology: knocking genes down
- 4 Gene technology: amplifying DNA
- 5 Human disease: when DNA sequences are toxic
- 6 Human disease: iron imbalance and the iron responsive element
- 7 Human disease: cancer as a result of aberrant proteins
- 8 Evolution: what makes us human?
- 9 Evolution: resolving a criminal case
- 10 Evolution: the sad case of the Tasmanian tiger
- 11 A function to every gene: termites, metagenomics and learning about the function of a sequence
- 12 A function to every gene: royal blood and order in the sequence universe
- 13 A function to every gene: a slimy molecule
- 14 Information resources: learning about flu viruses
- 15 Finding genes: going ashore at CpG islands
- 16 Finding genes: in the world of snurps
- 17 Finding genes: hunting for the distant RNA relatives
- 18 Personal genomes: the differences between you and me
- 19 Personal genomes: what’s in my genome?
- 20 Personal genomes: details of family genetics
- Appendix I Brief Unix reference
- Appendix II A selection of biological sequence analysis software
- Appendix III A short Perl reference
- Appendix IV A brief introduction to R
- Index
- References
15 - Finding genes: going ashore at CpG islands
Published online by Cambridge University Press: 05 August 2012
Summary
Elucidation of the human genome sequence was a significant milestone in the life sciences (Lander et al., 2001; Venter et al., 2001; International Human Genome Sequencing Consortium, 2004). However, access to this information raised an obvious but entirely non-trivial problem: what does all the genetic information, in the form of some three billion bases, represent in biological terms? One important category of information is the sequences that specify genes, i.e. regions that give rise to mRNAs that in turn encode specific protein molecules. Proteins are not the only products specified by the genome: a large number of RNAs transcribed from DNA are not mRNAs but have other functions. (These are non-coding RNAs and will be discussed in Chapter 17.) In the next three chapters we will deal with the computational problems of finding protein-coding and non-coding RNA genes, starting out with a genomic sequence.
In a mammalian genome, only a very small fraction, about 1.5–2%, codes for protein. For these genomes we are faced with the problem of identifying relatively small and scattered coding regions in a vast sea of non-coding material. There is a striking difference in this respect between mammals and a bacterium like Escherichia coli, whose genome consists of as much as 83% coding sequence. In the next chapter the focus will be on prediction of exon regions of protein-coding genes. Here we will address another sub-problem of finding protein-coding genes. We will see how a simple prediction of regions known as CpG islands will help us to locate sites in the genome that are close to the transcription start sites of genes.
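To make the idea of a "simple prediction" of CpG islands concrete, here is a minimal Python sketch of one common approach: slide a window along the sequence and flag windows that are GC-rich and contain more CG dinucleotides than base composition alone would predict. The function names, the window and step sizes, and the thresholds (GC fraction above 0.5, observed/expected CpG ratio above 0.6, 200-base windows) are assumptions chosen for illustration from widely used criteria; they are not taken from this summary and may differ from the method used in the chapter.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")          # observed CG dinucleotides
    gc_fraction = (c + g) / n
    expected = c * g / n           # CG count expected from base composition
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def cpg_windows(seq, window=200, step=50, min_gc=0.5, min_obs_exp=0.6):
    """Return (start, end) pairs of windows passing both thresholds.

    Coordinates are 0-based and end-exclusive. All default parameter
    values are illustrative assumptions.
    """
    hits = []
    for start in range(0, max(len(seq) - window + 1, 0), step):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc > min_gc and oe > min_obs_exp:
            hits.append((start, start + window))
    return hits
```

In practice, overlapping windows that pass the test would typically be merged into longer candidate islands, and the sequence would be read from a FASTA file rather than passed as a literal string.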
- Type: Chapter
- Information: Genomics and Bioinformatics: An Introduction to Programming Tools for Life Scientists, pp. 198-207
- Publisher: Cambridge University Press
- Print publication year: 2012