284 Generalizable Machine Learning Methods for Subtyping Individuals on National Health Databases: Case Studies Using Data from HRS, N3C, and All of Us

Suresh K. Bhavnani; Weibin Zhang; Daniel Bao; Sandra Hatch; Timothy Reistetter; Brian Downer

doi:10.1017/cts.2023.340

Part of: JCTS_2023_ABSTRACT_COLLECTION

Published online by Cambridge University Press: 24 April 2023

Affiliations:
Suresh K. Bhavnani: University of Texas Medical Branch
Daniel Bao: University of Texas Medical Branch
Sandra Hatch: University of Texas Medical Branch
Timothy Reistetter: University of Texas Medical Branch
Brian Downer: University of Texas Medical Branch

Abstract

OBJECTIVES/GOALS: Disease subtypes are critical for precision medicine, but most projects use unipartite clustering methods such as k-means, which are not fully automated, do not provide statistical significance, and are difficult to interpret. We addressed these gaps using bipartite networks and tested the approach for generalizability on three national databases.

METHODS/STUDY POPULATION: Data. All participants with self-reported stroke from the 2010 Health and Retirement Study (HRS), with cases (n=798) having one or more of 8 depressive symptoms measured by the 8-item Center for Epidemiologic Studies Depression (CES-D 8) scale, and controls (n=389) with none of those symptoms. The replication data set consisted of independent, identically defined participants (cases=725, controls=190) from the 1998 HRS. Method. (1) Bipartite network analysis and modularity maximization to automatically identify patient-symptom biclusters and their significance. (2) Rand Index to measure the replicability of symptom co-occurrences in the replication data. (3) ExplodeLayout to visualize and interpret the subtypes. (4) R libraries that generalize the methods, uploaded to CRAN and then tested on the N3C and All of Us platforms.

RESULTS/ANTICIPATED RESULTS: The analysis identified 4 depressive symptom subtypes (https://postimg.cc/Ny8YwXJW) which had significant modularity (Q=0.26, z=3.03, P […]

DISCUSSION/SIGNIFICANCE: We developed generalizable methods to automatically identify biclusters, measure the clustering significance, and visualize the results for interpretation. These methods were successfully tested on three national-level databases. Such generalizable methods should accelerate the analysis of subtypes and the design of targeted interventions.
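To make the two quantities named in the methods more concrete, the following is a minimal Python sketch, not the authors' R implementation. It assumes that the modularity score follows Barber's bipartite formulation, Q = (1/m) Σ (B_ps − k_p d_s / m) over patient-symptom pairs in the same cluster, and it uses the standard pairwise Rand Index. The function names, the toy patients (p1 to p4), and the toy symptoms (s1 to s4) are hypothetical and are not drawn from the HRS data. The sketch only scores a given cluster assignment; it does not perform the modularity maximization or the significance test.

```python
from itertools import combinations


def bipartite_modularity(edges, patient_cluster, symptom_cluster):
    """Barber-style bipartite modularity Q for a given cluster assignment.

    edges: iterable of (patient, symptom) pairs (a symptom the patient has).
    patient_cluster / symptom_cluster: dicts mapping node -> cluster label.
    """
    edge_set = set(edges)
    m = len(edge_set)
    if m == 0:
        raise ValueError("the network needs at least one edge")

    # Node degrees on each side of the bipartite network.
    patient_degree = {p: 0 for p in patient_cluster}
    symptom_degree = {s: 0 for s in symptom_cluster}
    for p, s in edge_set:
        patient_degree[p] += 1
        symptom_degree[s] += 1

    # Sum (observed - expected) over patient-symptom pairs sharing a cluster.
    total = 0.0
    for p, p_label in patient_cluster.items():
        for s, s_label in symptom_cluster.items():
            if p_label == s_label:
                observed = 1.0 if (p, s) in edge_set else 0.0
                expected = patient_degree[p] * symptom_degree[s] / m
                total += observed - expected
    return total / m


def rand_index(labels_a, labels_b):
    """Fraction of node pairs on which two clusterings agree.

    A pair agrees when it is grouped together in both clusterings
    or separated in both. Both dicts must cover the same nodes.
    """
    nodes = sorted(labels_a)
    if set(nodes) != set(labels_b):
        raise ValueError("both clusterings must label the same nodes")
    pairs = list(combinations(nodes, 2))
    if not pairs:
        raise ValueError("need at least two nodes")
    agreements = sum(
        (labels_a[u] == labels_a[v]) == (labels_b[u] == labels_b[v])
        for u, v in pairs
    )
    return agreements / len(pairs)


# Hypothetical toy network: two patients share s1, two others share s2.
toy_edges = [("p1", "s1"), ("p2", "s1"), ("p3", "s2"), ("p4", "s2")]
toy_patients = {"p1": 0, "p2": 0, "p3": 1, "p4": 1}
toy_symptoms = {"s1": 0, "s2": 1}
q_split = bipartite_modularity(toy_edges, toy_patients, toy_symptoms)

# Hypothetical symptom clusterings from a discovery and a replication set.
discovery = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}
replication = {"s1": 1, "s2": 1, "s3": 0, "s4": 0}
ri_same = rand_index(discovery, replication)
```

In this toy case the two-cluster assignment matches the network structure, so its Q is higher than the Q of placing every node in one cluster, and the two symptom clusterings differ only in label names, so their Rand Index is at its maximum.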

Type: Precision Medicine/Health
Information: Journal of Clinical and Translational Science, Volume 7, Issue s1, April 2023, p. 85

DOI: https://doi.org/10.1017/cts.2023.340
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
