Characterization of lung tumor subtypes through gene expression cluster validity assessment

Published online by Cambridge University Press:  20 July 2006

Giorgio Valentini
Affiliation:
DSI – Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, via Comelico 39, Milano, Italy; valentini@dsi.unimi.it,ruffino@dsi.unimi.it
Francesca Ruffino
Affiliation:
DSI – Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, via Comelico 39, Milano, Italy; valentini@dsi.unimi.it,ruffino@dsi.unimi.it

Abstract

Assessing the reliability of clusters of patients identified by clustering algorithms is crucial for estimating the significance of disease subclasses detectable at the bio-molecular level and, more generally, for supporting the bio-medical discovery of patterns in gene expression data. In this paper we present an experimental analysis of the reliability of clusters discovered in lung tumor patients using DNA microarray data. In particular, we investigate whether subclasses of lung adenocarcinoma can be detected with high reliability at the bio-molecular level. To this end we apply cluster validity measures based on random projections recently proposed by Bertoni and coworkers. The results show that at least two subclasses of lung adenocarcinoma can be detected with relatively high reliability, confirming and extending previous findings reported in the literature.
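
The abstract gives no algorithmic detail, so the following is only a rough sketch of the general idea behind cluster validity assessment with random projections: project the data to a lower dimension several times, cluster each projection, and measure how often pairs of patients end up in the same cluster. It is not the authors' implementation. The Gaussian projection matrix, the use of k-means, and all function and parameter names (`random_projection`, `kmeans`, `comembership`, `cluster_stability`, `target_dim`, `n_projections`) are assumptions made for illustration.

```python
import math
import random


def random_projection(data, target_dim, rng):
    """Project each row of `data` to `target_dim` dimensions.

    Assumption: a Gaussian random matrix scaled by 1/sqrt(target_dim),
    one common choice for Johnson-Lindenstrauss style projections.
    """
    d = len(data[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)]
              for _ in range(d)]
    return [[sum(row[k] * matrix[k][j] for k in range(d))
             for j in range(target_dim)]
            for row in data]


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def comembership(data, k, target_dim, n_projections, seed=0):
    """Fraction of projections in which each pair of samples shares a cluster."""
    rng = random.Random(seed)
    n = len(data)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_projections):
        labels = kmeans(random_projection(data, target_dim, rng), k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_projections for c in row] for row in counts]


def cluster_stability(similarity, members):
    """Mean co-membership over all distinct pairs of samples in `members`."""
    pairs = [(i, j) for i in members for j in members if i < j]
    if not pairs:
        raise ValueError("need at least two members to score a cluster")
    return sum(similarity[i][j] for i, j in pairs) / len(pairs)
```

In this sketch, a score near 1 for a candidate cluster means its members were grouped together in almost every projection, while a low score suggests the grouping does not survive perturbation of the data. For real expression data, `data` would hold one row per patient and one column per gene.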

Type
Research Article
Copyright
© EDP Sciences, 2006

References

Alizadeh, A., Ross, D.T., Perou, C.M. and van de Rijn, M., Towards a novel classification of human malignancies based on gene expression. J. Pathol. 195 (2001) 41–52.
Anbazhagan, R. et al., Classification of small cell lung cancer and pulmonary carcinoid by gene expression profiles. Cancer Research 59 (1999) 5119–5122.
Azuaje, F., A cluster validity framework for genome expression data. Bioinformatics 18 (2002) 319–320.
Bertoni, A., Folgieri, R., Ruffino, F. and Valentini, G., Assessment of clusters reliability for high dimensional genomic data, in BITS 2005, Bioinformatics Italian Society Meeting, Milano, Italy (2005).
Bertoni, A. and Valentini, G., Random projections for assessing gene expression cluster stability, in IJCNN 2005, The IEEE-INNS International Joint Conference on Neural Networks, Montreal (2005).
Bertoni, A. and Valentini, G., Randomized maps for assessing the reliability of patients clusters in DNA microarray data analyses. Artif. Intell. Med. (in press).
Bezdek, J.C. and Pal, N.R., Some new indexes of cluster validity. IEEE Trans. Systems, Man and Cybernetics Part B 28 (1998) 301–315.
Bhattacharjee, A., Richards, W.G., Staunton, J., Li, C., Monti, S., Vasa, P., Ladd, C., Beheshti, J., Bueno, R., Gillette, M., Loda, M., Weber, G., Mark, E.J., Lander, E.S., Wong, W., Johnson, B.E., Golub, T.R., Sugarbaker, D.J. and Meyerson, M., Classification of human lung carcinoma by mRNA expression profiling reveals distinct adenocarcinoma subclasses. PNAS 98 (2001) 13790–13795.
Bolshakova, N., Azuaje, F. and Cunningham, P., An integrated tool for microarray data clustering and cluster validity assessment. Bioinformatics 21 (2005) 451–455.
Breathnach, O.S. et al., Clinical features of patients with stage IIIB and IV bronchioloalveolar carcinoma of the lung. Cancer 86 (1999) 1165–1173.
Cheeseman, P. and Stutz, J., Bayesian classification (AutoClass): Theory and results, in Advances in Knowledge Discovery and Data Mining, edited by U. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurasamy, MIT Press, Cambridge, MA 2 (1996) 153–180.
Chen, J.J., Delongchamp, R., Tsai, C., Hsueh, H., Sistare, F., Thompson, K., Desai, V. and Fuscoe, J., Analysis of variance components in gene expression data. Bioinformatics 20 (2004) 1436–1446.
Davies, D.L. and Bouldin, D.W., A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence 1 (1979) 224–227.
Dudoit, S. and Fridlyand, J., A prediction-based method for estimating the number of clusters in a dataset. Genome Biology 3 (2002) 1–21.
Dudoit, S. and Fridlyand, J., Bagging to improve the accuracy of a clustering procedure. Bioinformatics 19 (2003) 1090–1099.
Dunn, J., Well separated clusters and optimal fuzzy partitions. J. Cybernetics 4 (1974) 95–104.
Garber, M.E. et al., Diversity of gene expression in adenocarcinoma of the lung. PNAS 98 (2001) 13784–13789.
Hartigan, J.A. and Wong, M.A., A k-means clustering algorithm. Appl. Stat. 28 (1979) 100–108.
Ho, T.K., The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 832–844.
Jain, A.K., Murty, M.N. and Flynn, P.J., Data Clustering: a Review. ACM Computing Surveys 31 (1999) 264–323.
Johnson, W.B. and Lindenstrauss, J., Extensions of Lipschitz mapping into Hilbert space, in Conference in modern analysis and probability, Contemporary Mathematics. Amer. Math. Soc. 26 (1984) 189–206.
Kaufman, L. and Rousseeuw, P.J., Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990).
Kerr, M.K. and Churchill, G.A., Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments. PNAS 98 (2001) 8961–8965.
King, B., Step-wise clustering procedures. J. Am. Stat. Assoc. 62 (1967) 86–101.
McShane, L.M., Radmacher, D., Freidlin, B., Yu, R., Li, M.C. and Simon, R., Method for assessing reproducibility of clustering patterns observed in analyses of microarray data. Bioinformatics 18 (2002) 1462–1469.
Monti, S., Tamayo, P., Mesirov, J. and Golub, T., Consensus Clustering: A Resampling-based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning 52 (2003) 91–118.
Rousseeuw, P.J., Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comp. App. Math. 20 (1987) 53–65.
Smolkin, M. and Ghosh, D., Cluster stability scores for microarray data in cancer studies. BMC Bioinformatics 4 (2003) 36.
Sorensen, J.B., Hirsch, F.R., Gazdar, A. and Olsen, J.E., Interobserver variability in histopathologic subtyping and grading of pulmonary adenocarcinoma. Cancer 71 (1993) 2971–2976.
Valentini, G., Clusterv: a tool for assessing the reliability of clusters discovered in DNA microarray data. Bioinformatics 22 (2006) 369–370.
Ward, J.H., Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58 (1963) 236–244.