Book contents
- Frontmatter
- Contents
- Contributors
- Introduction
- Part A Horizontal Meta-Analysis
- Part B Vertical Integrative Analysis (General Methods)
- 6 Identify Multi-Dimensional Modules from Diverse Cancer Genomics Data
- 7 A Latent Variable Approach for Integrative Clustering of Multiple Genomic Data Types
- 8 Penalized Integrative Analysis of High-Dimensional Omics Data
- 9 A Bayesian Graphical Model for Integrative Analysis of TCGA Data: BayesGraph for TCGA Integration
- 10 Bayesian Models for Flexible Integrative Analysis of Multi-Platform Genomics Data
- 11 Exploratory Methods to Integrate Multisource Data
- Part C Vertical Integrative Analysis (Methods Specialized to Particular Data Types)
- Index
- Color plates
7 - A Latent Variable Approach for Integrative Clustering of Multiple Genomic Data Types
from Part B - Vertical Integrative Analysis (General Methods)
Published online by Cambridge University Press: 05 September 2015
Summary
Abstract
Clustering analysis is an unsupervised learning method that aims to group data into subsets based on the similarity among the data points. In gene expression microarray studies, clustering analysis has been used to identify biologically meaningful disease subtypes (samples in the same subtype share similar gene expression profiles), or to discover gene expression modules co-regulated through a similar mechanism. Recent technological advances have facilitated integrated genomic profiling across multiple platforms simultaneously, including next-generation sequencing and high-throughput array platforms. With the rapid accumulation of multidimensional datasets, there is an increasing need for robust and scalable statistical and computational methods for the analysis of such datasets. This book covers a wide range of topics on information integration of omics datasets. In this chapter, we briefly review recent advances in integrative clustering methods, with a focus on introducing a latent variable approach developed by the authors and its extensions to perform variable selection and to account for both discrete and continuous data types in the joint model. We also discuss several important questions in clustering analysis, including how to determine the number of clusters and how to assess cluster stability. Finally, we demonstrate the application of the method to the TCGA colorectal cancer (CRC) dataset, which includes whole-exome DNA sequencing, Affymetrix SNP6.0 array, and RNA sequencing data for 276 CRC samples.
Introduction
Cancer is a heterogeneous disease. Identifying clinically relevant tumor subtypes that correlate with patient outcome (e.g., treatment response, survival) is an important yet difficult task. In recent years, molecular classification based on microarray gene expression data has led to important discoveries of novel cancer subtypes (Perou et al., 1999; Alizadeh et al., 2000; Sorlie et al., 2001; Lapointe et al., 2003; Hoshida et al., 2003). However, the biological and therapeutic implications of most cancer expression subtypes remain largely unknown because the underlying disease mechanisms are poorly understood. In addition, expression changes may be related to cellular activities independent of tumorigenesis, and may therefore lead to subtypes that are not directly relevant for diagnostic and prognostic purposes.
- Type: Chapter
- Information: Integrating Omics Data, pp. 155-173
- Publisher: Cambridge University Press
- Print publication year: 2015