11 Novel Systematic Method for Identifying Congenital Anomaly Cases in Electronic Health Record Databases

Published online by Cambridge University Press:  03 April 2024

Elly Brokamp
Affiliation: Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37203, USA
Lisa Bastarache
Affiliation: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
Nancy Cox
Affiliation: Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37203, USA
Rizwan Hamid
Affiliation: Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
Nikhil K. Khanakari
Affiliation: Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37203, USA
Gillian Hooker
Affiliation: Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37203, USA
Megan Shuey
Affiliation: Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, USA

Abstract

OBJECTIVES/GOALS: Congenital anomalies (CAs) affect 3% of live births, yet the cause of 80% of CAs is unknown, and for the 20% with an identified cause, variable penetrance suggests that additional risk drivers exist. Our method for identifying and categorizing CAs in electronic health record (EHR)-linked biobank databases can expand and improve CA etiologic research.

METHODS/STUDY POPULATION: We identified individuals with CAs in three groups: (1) those with at least one CA; (2) those with multiple CAs (MCA), defined as two or more ‘major’ CAs; and (3) those with CAs in a specific organ system. We also created a novel quantitative approach, using phenome-wide association studies (PheWAS), to determine which genetic disease billing codes are associated with CAs, so that individuals with a known genetic cause for their CAs can be separated from those with idiopathic CAs. We updated the CA phecodes (aggregates of clinical billing codes), which we used to identify CA cases in BioVU, Vanderbilt’s EHR-linked biobank database. We created a new phecode, ‘All CAs’, so that researchers can quickly identify all individuals with at least one CA. We evaluated the definition of MCA using PheWAS analyses to compare ‘minor’ and ‘major’ CAs.

RESULTS/ANTICIPATED RESULTS: The new CA phecode nomenclature includes 5.8 times more codes for CAs than the previous version (365 vs 56), improving granularity. Literature review identified 85 (19.7%) CA-associated genetic disease billing codes. PheWAS analyses revealed an additional 16 (3.7%) genetic disease billing codes with one or more significant (p < 2.75 × 10⁻⁵) associations with CA-related phecodes. Identifying CA-associated genetic disease billing codes allows researchers to differentiate between idiopathic CAs and those that have a known genetic cause. PheWAS analyses of individuals with CAs previously considered ‘minor’ showed many associated severe health problems, indicating that the distinction between ‘minor’ and ‘major’ CAs when identifying individuals with MCA in the EHR is arbitrary.

DISCUSSION/SIGNIFICANCE: Our CA identification method is scalable to the growing number of EHR-linked biobanks. Differentiating idiopathic CAs from those with known causes will increase power in studies seeking additional genetic drivers of CAs. Our novel method allows for the expansion and acceleration of CA epidemiological research in EHR-linked biobank data.
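
The three case groups described in the methods, plus the split between idiopathic and known-genetic cases, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record layout, the person IDs, the phecode labels (`CA_101` and so on), the organ-system names, and the function `build_cohorts` are all hypothetical and chosen only for illustration. The sketch also assumes that MCA means two or more distinct CA phecodes, without a ‘minor’ vs ‘major’ distinction.

```python
from collections import defaultdict

# Hypothetical input rows: (person_id, ca_phecode, organ_system).
# The codes and system names are invented for illustration only.
records = [
    ("p1", "CA_101", "cardiovascular"),
    ("p1", "CA_205", "renal"),
    ("p2", "CA_101", "cardiovascular"),
    ("p3", "CA_310", "musculoskeletal"),
    ("p3", "CA_310", "musculoskeletal"),  # same code billed twice
]

# Hypothetical set of people who also carry a CA-associated
# genetic disease billing code (a known genetic cause).
known_genetic = {"p2"}


def build_cohorts(records, known_genetic, organ_system):
    """Group people into CA cohorts from (person, code, system) rows."""
    codes_by_person = defaultdict(set)
    systems_by_person = defaultdict(set)
    for person, code, system in records:
        codes_by_person[person].add(code)      # sets ignore repeated billing
        systems_by_person[person].add(system)

    # Group 1: at least one CA code (analogous to an 'All CAs' case list).
    any_ca = set(codes_by_person)
    # Group 2: assumed here to mean two or more distinct CA codes.
    multiple_ca = {p for p, codes in codes_by_person.items() if len(codes) >= 2}
    # Group 3: at least one CA in the requested organ system.
    system_ca = {p for p, s in systems_by_person.items() if organ_system in s}
    # CA cases without a known genetic cause are treated as idiopathic.
    idiopathic = any_ca - known_genetic

    return {
        "any_ca": any_ca,
        "multiple_ca": multiple_ca,
        "organ_system": system_ca,
        "idiopathic": idiopathic,
    }


cohorts = build_cohorts(records, known_genetic, "cardiovascular")
```

Counting distinct codes per person, rather than billing events, keeps a repeatedly billed single anomaly from being classified as MCA; a real pipeline would need to settle that choice, and the code-to-phecode mapping, against the actual data.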

Type
Biostatistics, Epidemiology, and Research Design
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Author(s), 2024. The Association for Clinical and Translational Science