Machine learning models for prediction of double and triple burdens of non-communicable diseases in Bangladesh

Md. Akib Al-Zubayer; Khorshed Alam; Hasibul Hasan Shanto; Md. Maniruzzaman; Uttam Kumar Majumder; Benojir Ahammed

doi:10.1017/S0021932024000063

Published online by Cambridge University Press: 20 March 2024

Affiliations

Md. Akib Al-Zubayer: Statistics Discipline, Khulna University, Khulna, Bangladesh
Khorshed Alam: School of Business, University of Southern Queensland, Toowoomba, QLD, Australia; Centre for Health Research, University of Southern Queensland, Toowoomba, QLD, Australia
Hasibul Hasan Shanto: Statistics Discipline, Khulna University, Khulna, Bangladesh
Md. Maniruzzaman: Statistics Discipline, Khulna University, Khulna, Bangladesh
Uttam Kumar Majumder: Statistics Discipline, Khulna University, Khulna, Bangladesh
Benojir Ahammed*: Statistics Discipline, Khulna University, Khulna, Bangladesh
*Corresponding author: Benojir Ahammed. Emails: benojir@stat.ku.ac.bd; benojirstat@gmail.com

Abstract

The increasing prevalence of non-communicable diseases (NCDs) has made them the leading cause of death and disability in Bangladesh. This study therefore aimed to measure the prevalence of, and risk factors for, the double and triple burden of NCDs (DBNCDs and TBNCDs), considering diabetes, hypertension, and overweight and obesity, and to establish a machine learning approach for predicting DBNCDs and TBNCDs. A total of 12,151 respondents from the 2017–2018 Bangladesh Demographic and Health Survey were included in the analysis, of whom 10%, 27.4%, and 24.3% had diabetes, hypertension, and overweight and obesity, respectively. The chi-square test and multilevel logistic regression (LR) analysis were applied to select factors associated with DBNCDs and TBNCDs. Six classifiers, namely decision tree (DT), LR, naïve Bayes (NB), k-nearest neighbour (KNN), random forest (RF), and extreme gradient boosting (XGBoost), were then adopted with three cross-validation protocols (K2, K5, and K10) to predict the status of DBNCDs and TBNCDs. Classification accuracy (ACC) and area under the curve (AUC) were computed for each protocol, each protocol was repeated 10 times to make the estimates more robust, and the average ACC and AUC were reported. The prevalence of DBNCDs and TBNCDs was 14.3% and 2.3%, respectively. DBNCDs and TBNCDs were significantly associated with age, sex, marital status, wealth index, education, and geographic region. Among the classifiers compared, the RF-based classifier provided the highest ACC and AUC for both DBNCDs (ACC = 81.06% and AUC = 0.93) and TBNCDs (ACC = 88.61% and AUC = 0.97) under the K10 protocol. Combining the two-step factor selection with an RF-based classifier can therefore better predict the burden of NCDs. These findings suggest that decision-makers could use RF classifiers to inform decisions on controlling and preventing the burden of NCDs.
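The evaluation protocol described in the abstract (K-fold cross-validation with K = 2, 5, and 10, repeated 10 times, with ACC and AUC averaged) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the single predictor is invented, and `fit_centroid_scorer` is a hypothetical stand-in for the six classifiers named above, used only so that the example runs with the Python standard library. Averaging here is taken over all folds and repeats, which is one plausible reading of the abstract.

```python
import random
from statistics import mean

def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return None  # undefined when a fold contains only one class
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit_centroid_scorer(X, y):
    """Hypothetical stand-in classifier: signed distance from the midpoint of the class means."""
    m1 = mean(x for x, t in zip(X, y) if t == 1)
    m0 = mean(x for x, t in zip(X, y) if t == 0)
    mid = (m0 + m1) / 2
    sign = 1 if m1 >= m0 else -1
    return lambda x: sign * (x - mid)

def repeated_cv(X, y, k, repeats=10, seed=0):
    """Run k-fold cross-validation `repeats` times and average ACC and AUC."""
    rng = random.Random(seed)
    accs, aucs = [], []
    for _ in range(repeats):
        for test_idx in k_fold_indices(len(X), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            score = fit_centroid_scorer([X[i] for i in train_idx],
                                        [y[i] for i in train_idx])
            s = [score(X[i]) for i in test_idx]
            y_te = [y[i] for i in test_idx]
            accs.append(accuracy(y_te, [int(v > 0) for v in s]))
            fold_auc = auc(y_te, s)
            if fold_auc is not None:
                aucs.append(fold_auc)
    return mean(accs), mean(aucs)

# Synthetic data (assumption): one predictor, shifted upward for the positive class.
data_rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [data_rng.gauss(1.5 * t, 1.0) for t in y]

results = {f"K{k}": repeated_cv(X, y, k) for k in (2, 5, 10)}
for name, (acc, auc_value) in results.items():
    print(f"{name}: ACC = {acc:.3f}, AUC = {auc_value:.3f}")
```

In practice the stand-in scorer would be replaced by each fitted classifier (DT, LR, NB, KNN, RF, XGBoost) and the synthetic predictor by the selected survey factors; the fold and repeat loops stay the same.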

Keywords

classification; machine learning; non-communicable diseases

Information

Type: Research Article
Journal: Journal of Biosocial Science, Volume 56, Issue 3, May 2024, pp. 426–444

DOI: https://doi.org/10.1017/S0021932024000063
Copyright: © The Author(s), 2024. Published by Cambridge University Press
