
A machine learning approach for the identification of population-informative markers from high-throughput genotyping data: application to several pig breeds

Published online by Cambridge University Press: 11 October 2019

G. Schiavo
Affiliation:
Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, Viale G. Fanin 46, Bologna 40127, Italy
F. Bertolini
Affiliation:
National Institute of Aquatic Resources, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
G. Galimberti
Affiliation:
Department of Statistical Sciences ‘Paolo Fortunati’, University of Bologna, via delle Belle Arti 41, Bologna 40126, Italy
S. Bovo
Affiliation:
Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, Viale G. Fanin 46, Bologna 40127, Italy
S. Dall’Olio
Affiliation:
Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, Viale G. Fanin 46, Bologna 40127, Italy
L. Nanni Costa
Affiliation:
Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, Viale G. Fanin 46, Bologna 40127, Italy
M. Gallo
Affiliation:
Associazione Nazionale Allevatori Suini (ANAS), Via Nizza 53, Roma 00198, Italy
L. Fontanesi*
Affiliation:
Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, Viale G. Fanin 46, Bologna 40127, Italy

Abstract

Single nucleotide polymorphisms (SNPs) able to describe population differences can be used for important applications in livestock, including breed assignment of individual animals, authentication of mono-breed products and parentage verification, among other applications. Several methods have been used to identify the most discriminating SNPs among the thousands of markers in the available commercial SNP chip tools. Random forest (RF) is a machine learning technique that has been proposed for this purpose. In this study, we used RF to analyse PorcineSNP60 BeadChip array genotyping data obtained from a total of 2737 pigs of 7 Italian pig breeds (3 cosmopolitan-derived breeds: Italian Large White, Italian Duroc and Italian Landrace, and 4 autochthonous breeds: Apulo-Calabrese, Casertana, Cinta Senese and Nero Siciliano) to identify breed-informative, reduced SNP panels using the Mean Decrease in the Gini Index and the Mean Decrease in Accuracy parameters with stability evaluation. Other reduced informative SNP panels were obtained using Delta, Fixation index and principal component analysis statistics, and their performances were compared with those of the RF-defined panels using the RF classification method and its derived Out Of Bag rates and correct prediction proportions. Overall, the performances of six reduced panels were evaluated. The correct assignment of animals to their breed was close to 100% for all tested approaches. Porcine chromosome 8 harboured the largest number of selected SNPs across all panels. Many SNPs were included in genomic regions in which previous studies identified signatures of selection or genes (e.g. ESR1, KITL and LCORL) that could contribute to explaining, at least in part, phenotypically or economically relevant traits that might differentiate cosmopolitan and autochthonous pig breeds.
Random forest, used as a preselection statistic, highlighted informative SNPs that were not the same as those identified by other methods. This might be due to specific features of this machine learning methodology. It will be interesting to explore whether RF methods adapted for the identification of selection signature regions could describe population-specific features that are not captured by other approaches.
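
To make two of the statistics named in the abstract concrete, the following is a minimal Python sketch, not the authors' pipeline. It computes Delta (taken here to be the absolute allele frequency difference between two breeds) and the Gini impurity decrease of a single genotype split, which is the per-split quantity that RF averages over many trees to obtain the Mean Decrease in the Gini Index. The function names, the 0/1/2 genotype coding and the toy genotypes are all assumptions made for illustration.

```python
from collections import Counter

def allele_freq(genotypes):
    """Frequency of the counted allele; genotypes are coded as 0, 1 or 2 copies."""
    return sum(genotypes) / (2 * len(genotypes))

def delta(geno_a, geno_b):
    """Absolute allele-frequency difference between two breeds."""
    return abs(allele_freq(geno_a) - allele_freq(geno_b))

def gini(labels):
    """Gini impurity of a list of breed labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(genotypes, labels, threshold):
    """Impurity decrease when animals are split at genotype <= threshold."""
    left = [b for g, b in zip(genotypes, labels) if g <= threshold]
    right = [b for g, b in zip(genotypes, labels) if g > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gives no decrease
    n = len(labels)
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(labels) - weighted

# Hypothetical toy data: two SNPs, four animals from each of two breeds.
labels = ["A"] * 4 + ["B"] * 4
snps = {
    "snp1": [0, 0, 0, 1, 2, 2, 2, 1],  # allele frequencies differ between breeds
    "snp2": [1, 0, 2, 1, 1, 2, 0, 1],  # same genotype distribution in both breeds
}

for name, g in snps.items():
    d = delta(g[:4], g[4:])
    best = max(gini_decrease(g, labels, t) for t in (0, 1))
    print(name, round(d, 3), round(best, 3))
```

In this toy setting the differentiated SNP ranks above the uninformative one under both statistics. On real array data the two rankings need not agree, since RF importance also depends on how SNPs act jointly across trees, which is consistent with the abstract's observation that RF highlighted SNPs different from those found by other methods.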

Type
Research Article
Copyright
© The Animal Consortium 2019 


Supplementary material: File

Schiavo et al. supplementary material

Tables S1-S5 and Figure S1
