Book contents
- Frontmatter
- Contents
- List of Contributors
- Preface
- 1 An Introduction to Next-Generation Biological Platforms
- 2 An Introduction to The Cancer Genome Atlas
- 3 DNA Variant Calling in Targeted Sequencing Data
- 4 Statistical Analysis of Mapped Reads from mRNA-Seq Data
- 5 Model-Based Methods for Transcript Expression-Level Quantification in RNA-Seq
- 6 Bayesian Model-Based Approaches for Solexa Sequencing Data
- 7 Statistical Aspects of ChIP-Seq Analysis
- 8 Bayesian Modeling of ChIP-Seq Data from Transcription Factor to Nucleosome Positioning
- 9 Multivariate Linear Models for GWAS
- 10 Bayesian Model Averaging for Genetic Association Studies
- 11 Whole-Genome Multi-SNP-Phenotype Association Analysis
- 12 Methods for the Analysis of Copy Number Data in Cancer Research
- 13 Bayesian Models for Integrative Genomics
- 14 Bayesian Graphical Models for Integrating Multiplatform Genomics Data
- 15 Genetical Genomics Data: Some Statistical Problems and Solutions
- 16 A Bayesian Framework for Integrating Copy Number and Gene Expression Data
- 17 Application of Bayesian Sparse Factor Analysis Models in Bioinformatics
- 18 Predicting Cancer Subtypes Using Survival-Supervised Latent Dirichlet Allocation Models
- 19 Regularization Techniques for Highly Correlated Gene Expression Data with Unknown Group Structure
- 20 Optimized Cross-Study Analysis of Microarray-Based Predictors
- 21 Functional Enrichment Testing: A Survey of Statistical Methods
- 22 Discover Trend and Progression Underlying High-Dimensional Data
- 23 Bayesian Phylogenetics Adapts to Comprehensive Infectious Disease Sequence Data
- Index
- Plate section
21 - Functional Enrichment Testing: A Survey of Statistical Methods
Published online by Cambridge University Press: 05 June 2013
- Frontmatter
- Contents
- List of Contributors
- Preface
- 1 An Introduction to Next-Generation Biological Platforms
- 2 An Introduction to The Cancer Genome Atlas
- 3 DNA Variant Calling in Targeted Sequencing Data
- 4 Statistical Analysis of Mapped Reads from mRNA-Seq Data
- 5 Model-Based Methods for Transcript Expression-Level Quantification in RNA-Seq
- 6 Bayesian Model-Based Approaches for Solexa Sequencing Data
- 7 Statistical Aspects of ChIP-Seq Analysis
- 8 Bayesian Modeling of ChIP-Seq Data from Transcription Factor to Nucleosome Positioning
- 9 Multivariate Linear Models for GWAS
- 10 Bayesian Model Averaging for Genetic Association Studies
- 11 Whole-Genome Multi-SNP-Phenotype Association Analysis
- 12 Methods for the Analysis of Copy Number Data in Cancer Research
- 13 Bayesian Models for Integrative Genomics
- 14 Bayesian Graphical Models for Integrating Multiplatform Genomics Data
- 15 Genetical Genomics Data: Some Statistical Problems and Solutions
- 16 A Bayesian Framework for Integrating Copy Number and Gene Expression Data
- 17 Application of Bayesian Sparse Factor Analysis Models in Bioinformatics
- 18 Predicting Cancer Subtypes Using Survival-Supervised Latent Dirichlet Allocation Models
- 19 Regularization Techniques for Highly Correlated Gene Expression Data with Unknown Group Structure
- 20 Optimized Cross-Study Analysis of Microarray-Based Predictors
- 21 Functional Enrichment Testing: A Survey of Statistical Methods
- 22 Discover Trend and Progression Underlying High-Dimensional Data
- 23 Bayesian Phylogenetics Adapts to Comprehensive Infectious Disease Sequence Data
- Index
- Plate section
Summary
Introduction and Motivation
With the proliferation of available high-throughput data being generated by platforms such as microarray, mass spectrometry, and sequencing machines, scientists are now commonly confronted with measurements of thousands of molecules. The term molecule here can refer to anything from an mRNA strand to an SNP (single nucleotide polymorphism) to a peptide to an exon. When analysis is done across these molecules on a genome-wide basis, it becomes difficult for analysts to interpret the results. In many instances, investigators are provided with a list of ranked molecules that are then prioritized for further study. Although this represents a fairly reasonable approach, it has the potential to be unwieldy depending on the number of molecules under consideration. Additionally, meta-analysis of high-throughput gene expression studies have found that sets of molecules such as metabolic pathways and signaling cascades are dysregulated more consistently across studies than are the specific molecules detected within these sets (e.g., Fan et al., 2006).
Predicated on the assumption that individual molecules act in concert for various biological processes, multiple databases have been constructed to classify molecules into sets of biological interest, and many are publicly available. Databases such as as KEGG (Kyoto Encyclopedia of Genes and Genomes)(Kanehisa and Goto, 2000; Kanehisa et al., 2006, 2008), GO (Gene Ontology) (Consortium, 2000, 2004), and Biocarta give functional annotation for genes (GO) and relationships to enzymes and metabolites (KEGG and Biocarta).
- Type
- Chapter
- Information
- Advances in Statistical BioinformaticsModels and Integrative Inference for High-Throughput Data, pp. 423 - 444Publisher: Cambridge University PressPrint publication year: 2013