
Compliance considerations in the geo-enrichment of an EHR data warehouse with social and environmental determinants of health

Published online by Cambridge University Press:  02 May 2024

Maryam Abdallah
Affiliation:
Keck Medicine of USC, University of Southern California, Los Angeles, CA, USA; Southern California Clinical and Translational Science Institute, Los Angeles, CA, USA
Neil Bahroos*
Affiliation:
Keck Medicine of USC, University of Southern California, Los Angeles, CA, USA; Southern California Clinical and Translational Science Institute, Los Angeles, CA, USA; Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Praveen Angyan
Affiliation:
Keck Medicine of USC, University of Southern California, Los Angeles, CA, USA; Southern California Clinical and Translational Science Institute, Los Angeles, CA, USA
Beau MacDonald
Affiliation:
Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
Camilla Catignas
Affiliation:
Southern California Clinical and Translational Science Institute, Los Angeles, CA, USA
Daniella Garofalo
Affiliation:
Keck Medicine of USC, University of Southern California, Los Angeles, CA, USA; Southern California Clinical and Translational Science Institute, Los Angeles, CA, USA
Amy Chuang
Affiliation:
Keck Medicine of USC, University of Southern California, Los Angeles, CA, USA; Southern California Clinical and Translational Science Institute, Los Angeles, CA, USA
Hakob Abajian
Affiliation:
Keck Medicine of USC, University of Southern California, Los Angeles, CA, USA; Southern California Clinical and Translational Science Institute, Los Angeles, CA, USA
John Wilson
Affiliation:
Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
*Corresponding author: N. Bahroos, Email: neil.bahroos@med.usc.edu

Abstract

Social and environmental determinants of health (SEDoH) are crucial for achieving a holistic understanding of patient health. In fact, geographic factors may have more influence on health outcomes than patients’ genetics. Integrating SEDoH into the electronic health record (EHR), however, poses notable technical and compliance-related challenges. We evaluated barriers to the integration of SEDoH in the EHR and developed a privacy-preserving strategy to mitigate the risk of protected health information exposure. Using coded identifiers for patient addresses, the strategy offers an alternative approach to efficient, secure geocoding that preserves privacy throughout enrichment with data from numerous SEDoH sources.

Type
Brief Report
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Association for Clinical and Translational Science

Introduction

Electronic health records (EHRs) are inherently limited in the information they provide on social and environmental determinants of health (SEDoH), even though such data are critical for comprehensive patient histories and precision medicine initiatives. An individual’s zip code may influence health outcomes more than their genetics [1]. Improving our ability to capture SEDoH can help close gaps in health disparities and improve outcomes for marginalized populations [2,3]. Although some healthcare organizations have integrated patient-reported social determinants forms into their EHRs, these data are often sparse [4]. Publicly accessible neighborhood-level SEDoH data exist, but seamlessly integrating this information into the patient EHR is complex and presents compliance-related challenges.

Collecting SEDoH data begins with geocoding: translating an address into its latitude and longitude coordinates or its Census tract [5]. Once addresses are geocoded, they can be geo-enriched, that is, linked to neighborhood- and community-level SEDoH variables drawn from extensive, publicly available datasets. Accurately linking these variables to patient data, however, requires disclosing individual geographic identifiers. Datasets are available as publicly hosted files, through public application programming interfaces (APIs), and through commercial APIs behind a paywall. These datasets often use geographic units defined by the US Census Bureau to report on various SEDoH [4], and some initiatives have combined multiple data sources into composite SEDoH indices [6]. Use and disclosure of protected health information (PHI) beyond the scope of providing patient care are restricted under the Health Insurance Portability and Accountability Act of 1996 (HIPAA) Privacy Rule [7]. Under the rule’s Safe Harbor de-identification standard, geographic subdivisions smaller than a state, including street addresses and full zip codes, count as identifiers; only the first three digits of a zip code may be retained, and only when the area they cover contains more than 20,000 people [7].
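The Safe Harbor zip code rule can be expressed as a short function. This is a minimal Python sketch, not a compliance tool: the function name is illustrative, and the population table `zip3_population` is a hypothetical input that, in practice, would be built from current Census Bureau population counts for each three-digit zip area.

```python
# Minimal sketch of the HIPAA Safe Harbor rule for zip codes
# (45 CFR 164.514(b)(2)(i)(B)). The population table is a hypothetical input.
SAFE_HARBOR_MIN_POPULATION = 20_000


def safe_harbor_zip(zip_code: str, zip3_population: dict[str, int]) -> str:
    """Return the only zip code information Safe Harbor allows to be retained.

    Digits after the first three are always dropped. The three-digit prefix is
    kept only if the area it covers has more than 20,000 people; otherwise it
    is replaced with "000".
    """
    zip3 = zip_code.strip()[:3]
    if len(zip3) != 3 or not zip3.isdigit():
        raise ValueError(f"not a US zip code: {zip_code!r}")
    # Prefixes missing from the table are treated as small areas (conservative).
    if zip3_population.get(zip3, 0) > SAFE_HARBOR_MIN_POPULATION:
        return zip3
    return "000"


# Hypothetical population counts, for illustration only.
example_population = {"900": 1_000_000, "999": 15_000}
print(safe_harbor_zip("90033", example_population))       # 900
print(safe_harbor_zip("99901-1234", example_population))  # 000
```

Treating unknown prefixes as small areas errs toward suppression, which is the safer failure mode when the population table is incomplete or out of date.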

Geographic information system (GIS) software can geocode addresses and enhance them with geospatial data [1]. The Social and Environmental Determinants Address Enhancement (SEnDAE) toolkit [8] employs an innovative strategy in which an intermediate server separates the requesting health provider organization’s (HPO) IP address from the deidentified patient addresses it transmits to a cloud-based geocoding service [9]. While this significantly reduces the risk of accidental PHI disclosure, further safeguards are possible by carrying out geocoding in-house within a self-contained GIS application. Under a conservative interpretation, a self-contained approach may be the only HIPAA-compliant way to protect PHI that would otherwise be transferred externally [10]. This paper explores these compliance challenges and offers recommendations for integrating SEDoH with EHR data while minimizing risk.

Materials and methods

To retrieve SEDoH variables for an individual patient, the patient’s home address must be translated, or geocoded, from its standard format into latitude and longitude coordinates. The geocoded location can then be linked, or geo-enriched, with its corresponding social and environmental data points. This can be done either by sending location data to a web service or by purchasing a local geocoding database. The former option simplifies geocoding, as no server setup, installation, or maintenance is required. Several such services exist, including a free service provided by the US Census Bureau [4]. However, using web services carries the risk of disclosing PHI to a remote server outside one’s institution. The alternative is to instantiate a local geocoding database and service, which can be purchased from several companies. Although this adds the work of setting up and maintaining the server and keeping its software up to date, it eliminates the need to disclose PHI externally.
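To make the disclosure risk of the web-service option concrete, the following sketch only builds, and never sends, a lookup request to the free US Census Bureau geocoder. The endpoint and the `benchmark` and `vintage` values are assumed from the Census geocoder documentation and should be checked against its current version; the point is that the full street address travels in the query string, and the requester’s IP address becomes visible to the service once the request is sent.

```python
from urllib.parse import urlencode

# Public US Census Bureau geocoder endpoint returning Census geographies for a
# one-line address. Parameter values are assumptions to verify against current docs.
CENSUS_GEOCODER = "https://geocoding.geo.census.gov/geocoder/geographies/onelineaddress"


def census_geocoder_url(address: str) -> str:
    """Build (but do not send) a lookup request for a single address."""
    params = {
        "address": address,  # the full street address leaves the institution here
        "benchmark": "Public_AR_Current",
        "vintage": "Current_Current",
        "format": "json",
    }
    return f"{CENSUS_GEOCODER}?{urlencode(params)}"


url = census_geocoder_url("123 Example St, Springfield")
print(url)
```

A local geocoding database removes exactly this step: the same address is resolved on a server the institution controls, so nothing equivalent to this URL ever crosses the network boundary.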

The current project purchased Esri’s ArcGIS Pro 3.X with the Business Analyst Extension. The software was installed locally on a secure server created specifically for geocoding and geo-enrichment. We designed a workflow, illustrated in Figure 1, to ensure that the local GIS enhancement server held only the minimum necessary PHI required for geocoding and geo-enrichment. First, a randomized, deidentified ID is assigned to each patient, yielding a code key that is stored within our HIPAA-compliant Research Enterprise Data Warehouse (EDW). Second, deidentified patient IDs and their corresponding addresses are loaded onto the local GIS server. Third, addresses are geocoded and each is assigned a randomized, deidentified address ID, yielding a second code key that is stored on the local GIS server. Finally, deidentified address IDs are linked to their corresponding Census tract IDs and exported from the server. Census tracts do not identify individual locations and are coarser geographic subdivisions than latitude and longitude coordinates or even Census block groups [5], which further minimizes PHI risk.
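The four-step workflow can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors’ implementation: the function names are hypothetical, MRNs stand in for whatever patient identifier the EDW uses, and `geocode_tract` is a placeholder for the ArcGIS Pro geocoder. Following the Results section, the export also carries the random patient ID so that the EDW can later reidentify records, while street addresses and the address code key stay on the GIS server.

```python
import secrets
from typing import Callable


def _random_id(taken: set[str]) -> str:
    """Draw a random 64-bit hex ID that has not been issued yet."""
    while True:
        candidate = secrets.token_hex(8)
        if candidate not in taken:
            taken.add(candidate)
            return candidate


def edw_pseudonymize(records: list[tuple[str, str]]):
    """Step 1 (inside the EDW): replace each MRN with a random patient ID.

    `records` holds (mrn, address) pairs; a patient may appear more than once.
    Returns the rows sent to the GIS server and the patient code key
    (random patient ID -> MRN), which never leaves the EDW.
    """
    taken: set[str] = set()
    mrn_to_pid: dict[str, str] = {}
    outbound = []
    for mrn, address in records:
        if mrn not in mrn_to_pid:
            mrn_to_pid[mrn] = _random_id(taken)
        outbound.append((mrn_to_pid[mrn], address))
    patient_key = {pid: mrn for mrn, pid in mrn_to_pid.items()}
    return outbound, patient_key


def gis_geocode_to_tracts(rows, geocode_tract: Callable[[str], str]):
    """Steps 2-4 (on the GIS server): geocode each address, give it a random
    address ID, and export only (patient ID, address ID, tract GEOID).

    The address code key (address ID -> street address) stays on the server.
    """
    taken: set[str] = set()
    address_key: dict[str, str] = {}
    export = []
    for pid, address in rows:
        aid = _random_id(taken)
        address_key[aid] = address
        export.append((pid, aid, geocode_tract(address)))
    return export, address_key


# Usage with a stand-in geocoder (a dict lookup instead of ArcGIS Pro).
fake_tracts = {"1 A St": "06037000100", "2 B St": "06037000200", "3 C St": "06037000300"}
records = [("MRN1", "1 A St"), ("MRN1", "2 B St"), ("MRN2", "3 C St")]
outbound, patient_key = edw_pseudonymize(records)
export, address_key = gis_geocode_to_tracts(outbound, fake_tracts.__getitem__)
```

Because each code key lives on a different system, neither the GIS server nor the exported file alone links an MRN to a street address.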

Figure 1. Diagram of workflow utilized for geocoding and geo-enrichment.

Results

All patients (n = 554,562) in the university’s EHR who had opted in to research participation and had valid addresses were included in this project. Full addresses and deidentified patient IDs were loaded onto the local GIS server. All current and previous patient addresses were included, so some patient IDs corresponded to multiple addresses. All addresses were geocoded and assigned a randomized address ID.

Datasets were selected from six data sources (Table 1). These sources were judged to contain valuable social and environmental variables and had the geographic identifiers needed for linkage. All datasets were stored locally, allowing geo-enrichment to be completed entirely on the local GIS server. Five of the six datasets were downloaded directly from their sources, which took about 10 seconds per dataset. The remaining dataset, the US Census American Community Survey (ACS), was accessible only through API calls and could not be downloaded directly. The ACS API was therefore called with broad arguments that retrieved data for the entire United States rather than for individual patient locations. ACS data were downloaded once a month, although they can be refreshed at whatever frequency is feasible for an institution. A free API key was registered, as the US Census Bureau requires one for IP addresses that exceed 500 queries per day. Loading all ACS data onto the local server via API calls took 8.28 minutes. Although this meant that live data were not obtained through the API, it allowed us to maintain the same level of privacy as with downloadable, locally stored datasets.
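The bulk ACS retrieval can be sketched as one request per state covering every Census tract, so no patient location is ever part of a query. In this hypothetical Python sketch, the example variable `B19013_001E` (median household income), the `for=tract:*&in=state:XX` geography syntax, and the treatment of negative sentinel values are assumptions to check against the Census Data API documentation for the chosen ACS vintage. `parse_acs_response` indexes the API’s JSON output (a header row followed by data rows) by 11-digit tract GEOID.

```python
import json
from typing import Optional
from urllib.parse import urlencode

ACS5_URL = "https://api.census.gov/data/{year}/acs/acs5"
# Two-digit FIPS codes for the 50 states and DC (03, 07, 14, 43, 52 are unassigned).
STATE_FIPS = [f"{n:02d}" for n in range(1, 57) if n not in {3, 7, 14, 43, 52}]
GEOGRAPHY_COLUMNS = ("NAME", "state", "county", "tract")


def acs_tract_urls(year: int, variables: list[str], api_key: str) -> list[str]:
    """Build one bulk request per state covering every tract in that state."""
    urls = []
    for state in STATE_FIPS:
        params = {
            "get": ",".join(["NAME", *variables]),
            "for": "tract:*",
            "in": f"state:{state}",
            "key": api_key,
        }
        # Leave the API's ':', '*' and ',' delimiters unescaped for readability.
        urls.append(ACS5_URL.format(year=year) + "?" + urlencode(params, safe=":*,"))
    return urls


def parse_acs_response(body: str) -> dict[str, dict[str, Optional[float]]]:
    """Index an ACS JSON response (header row, then data rows) by tract GEOID."""
    header, *rows = json.loads(body)
    variables = [col for col in header if col not in GEOGRAPHY_COLUMNS]
    by_tract: dict[str, dict[str, Optional[float]]] = {}
    for row in rows:
        record = dict(zip(header, row))
        # 11-digit GEOID = 2-digit state + 3-digit county + 6-digit tract.
        geoid = record["state"] + record["county"] + record["tract"]
        values: dict[str, Optional[float]] = {}
        for var in variables:
            raw = record[var]
            # Assumed: missing estimates arrive as null or as large negative sentinels.
            values[var] = None if raw is None or float(raw) < 0 else float(raw)
        by_tract[geoid] = values
    return by_tract
```

After each URL is fetched, its parsed result can be merged into one local table with `dict.update`. One request per state plus the District of Columbia means 51 calls per batch of variables, comfortably within the daily query limit.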

Table 1. Sample data sources included in geo-enrichment of patient addresses

Once all addresses were geo-enriched with the corresponding variables from every dataset, deidentified patient IDs and the linked geospatial data were exported from the local GIS server and loaded back into the Research EDW. Patient IDs were reidentified using the code key maintained within the Research EDW, and the newly geo-enriched patient data were integrated into our existing EHR data warehouse. The data were integrated with our Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) [11] by mapping SEDoH data to local concepts that extend the OMOP CDM and creating measurement tables. We also converted the SEDoH data from OMOP into observations in our local instance of i2b2 [12], a self-service cohort discovery tool, using the SEnDAE ontology extension framework [8]. These efforts made the geo-enriched EHR data readily available for researchers and clinicians to query and extract. The toolkit and OMOP CDM are publicly available at https://github.com/scctsi/gis-toolkit.
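Reidentification and conversion to OMOP-style measurement records can be sketched as follows. The local concept ID is a hypothetical placeholder chosen from the range above 2,000,000,000 that OHDSI reserves for site-specific concepts, and `MeasurementRow` mirrors only a subset of the OMOP MEASUREMENT table’s columns; this is not the schema published in the authors’ toolkit.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical local concept IDs. OHDSI reserves concept_id values above
# 2,000,000,000 for site-specific concepts.
LOCAL_SEDOH_CONCEPTS = {"B19013_001E": 2_000_000_101}


@dataclass
class MeasurementRow:
    """A subset of the OMOP CDM MEASUREMENT table's columns."""
    person_id: int
    measurement_concept_id: int
    measurement_date: date
    value_as_number: Optional[float]
    measurement_source_value: str


def to_measurements(export_rows, patient_key, person_ids, tract_values, as_of):
    """Reidentify GIS exports inside the EDW and emit measurement rows.

    export_rows:  (random patient ID, random address ID, tract GEOID) tuples
    patient_key:  random patient ID -> MRN (the code key kept in the EDW)
    person_ids:   MRN -> OMOP person_id
    tract_values: tract GEOID -> {source variable: value}
    """
    rows = []
    for patient_id, _address_id, geoid in export_rows:
        person_id = person_ids[patient_key[patient_id]]
        for variable, value in tract_values.get(geoid, {}).items():
            concept_id = LOCAL_SEDOH_CONCEPTS.get(variable)
            if concept_id is None:
                continue  # skip variables not yet mapped to a local concept
            rows.append(MeasurementRow(person_id, concept_id, as_of, value,
                                       f"{variable}|{geoid}"))
    return rows
```

Keeping the source variable and tract GEOID in `measurement_source_value` preserves provenance without storing a street address in the warehouse.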

Discussion

This project was successful in geocoding and geo-enriching an EHR data warehouse in a secure, compliant manner. Utilizing the Esri database provided a minimal-cost solution to support this project. While local installation of the Esri database prevented external PHI transfer, there are limitations to this method. The setup, installation, and maintenance of such a server can be a burden to organizations. Due to the time-consuming nature of the in-house geocoding process and quality validation, our organization currently completes geocoding and geo-enrichment on an ad hoc basis as a consultation service for research projects and once every two years for our entire patient population.

To streamline geocoding, we plan to transition to a secure process that calls an external geocoding service through its API. This process has been reviewed and approved by our university’s Compliance Department. It involves setting up a server with a random hostname dedicated to geocoding patient addresses. When an API call is made to an external geocoding service, the service may store patient addresses and referrer hostnames for auditing purposes, posing an additional risk of patient reidentification. Using a random hostname anonymizes the call so that our organization cannot be identified and linked to the patient addresses sent. On a sample set of addresses, this process took an average of 0.27 s per address. Once it is fully implemented, we will geocode and geo-enrich our EHR data once a week.
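The random hostname and the reported throughput can both be sketched briefly. `random_hostname` is a hypothetical helper that generates a DNS-safe label; how such a name is applied (system hostname, DNS record, or request headers) depends on the server setup and is not specified here. `sequential_hours` applies the reported 0.27 s per address, assuming addresses are sent one at a time; for the 554,562 patients in this project it gives roughly 41.6 hours, a lower bound because some patients have multiple addresses.

```python
import secrets
import string

_DNS_LABEL_CHARS = string.ascii_lowercase + string.digits


def random_hostname(length: int = 16) -> str:
    """Return a random DNS label: lowercase letters and digits, starting with a letter."""
    if not 1 <= length <= 63:  # a single DNS label is limited to 63 characters
        raise ValueError("length must be between 1 and 63")
    first = secrets.choice(string.ascii_lowercase)
    rest = "".join(secrets.choice(_DNS_LABEL_CHARS) for _ in range(length - 1))
    return first + rest


def sequential_hours(n_addresses: int, seconds_per_address: float = 0.27) -> float:
    """Hours needed if addresses are geocoded one after another at a fixed rate."""
    return n_addresses * seconds_per_address / 3600


print(random_hostname())
print(round(sequential_hours(554_562), 1))  # 41.6
```

At this rate, a full sequential pass fits within a weekly schedule; parallel requests would shorten it further, subject to the service’s rate limits.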

Ethical issues remain inherent in the use of patient geographic data, and geolocation data are an element of PHI when linked to patients or HPOs [13]. We recommend becoming familiar with the decisions involved in the geocoding process [14], the variability in positional accuracy across geocoding methods [15], the use of different geographic units when matching addresses [16], and published practices and protocols for internet geolocation [9,10,17]. Institutional interpretations of HIPAA and privacy policies vary, and patient geolocation approaches should be evaluated by the appropriate officials before implementation [9].

Employing secure strategies for geocoding EHR data allows institutions to realize the benefits of geo-enriching patient data while minimizing privacy and security risks. Once securely geocoded, data can be safely enriched with any place-based measures to study the impacts of SEDoH and to design and prescribe interventions that yield better health outcomes. We have outlined a framework for secure, compliant geo-enrichment of patient data that can be adapted and implemented at other institutions. Increasing consideration of SEDoH in both research and clinical practice can ultimately reduce health inequities and improve outcomes for marginalized populations.

Acknowledgments

This effort was supported in part by the Southern California Clinical and Translational Science Institute (SC-CTSI) (UL1TR001855) and USC Keck’s Health Data Innovation Program (HDIP). We thank Jane Choi and Masi Thomason for their help and support with copyediting the article.

Author contributions

Conceptualization: MA, NB, PA, BM, DG, JW. Writing – original draft: MA, NB, PA, CC, DG. Writing – review and editing: MA, BM, CC, DG, JW. Methodology: NB, BM, AC. Resources: MA, NB, CC. Software: PA, BM, HA. Validation: BM, HA. Data curation: BM, AC. Formal analysis: BM. Visualization: MA. Investigation, supervision, project administration, and funding acquisition: NB.

Funding statement

This work was supported by grant UL1TR001855 from the National Center for Advancing Translational Sciences (NCATS) of the US National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Competing interests

None.

References

1. Harris, DR. Geographic information systems as data sharing infrastructure for clinical data warehouses. J Soc Clin Data Manag. 2023;3(4):1–8. doi: 10.47912/jscdm.240.
2. Ford-Gilboe, M, Wathen, CN, Varcoe, C, et al. How equity-oriented health care affects health: key mechanisms and implications for primary health care practice and policy. Milbank Q. 2018;96(4):635–671. doi: 10.1111/1468-0009.12349.
3. Hatef, E, Searle, KM, Predmore, Z, et al. The impact of social determinants of health on hospitalization in the veterans health administration. Am J Prev Med. 2019;56(6):811–818. doi: 10.1016/j.amepre.2018.12.012.
4. Cook, LA, Sachs, J, Weiskopf, NG. The quality of social determinants data in the electronic health record: a systematic review. J Am Med Inform Assoc. 2021;29(1):187–196. doi: 10.1093/jamia/ocab199.
5. Rana, MKZ, Song, X, Islam, H, et al. Enrichment of a data lake to support population health outcomes studies using social determinants linked EHR data. AMIA Jt Summits Transl Sci Proc. 2023;2023:448–457.
6. Bazemore, AW, Cottrell, EK, Gold, R, et al. “Community vital signs”: incorporating geocoded social determinants into electronic records to promote patient and population health. J Am Med Inform Assoc. 2016;23(2):407–412. doi: 10.1093/jamia/ocv088.
7. Office for Civil Rights (OCR). Methods for de-identification of PHI. HHS.gov. February 22, 2023. https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html#standard. Accessed December 14, 2023.
8. Kingsbury, P, Abajian, H, Abajian, M, et al. SEnDAE: a resource for expanding research into social and environmental determinants of health. Comput Methods Programs Biomed. 2023;238:107542. doi: 10.1016/j.cmpb.2023.107542.
9. Rivera, B, Hoffman, MA. Technical strategies for real-time geocoding in healthcare. In: 2018 IEEE International Smart Cities Conference (ISC2). 2018:1–5.
10. Rundle, AG, Bader, MDM, Mooney, SJ. The disclosure of personally identifiable information in studies of neighborhood contexts and patient outcomes. J Med Internet Res. 2022;24(3):e30619. doi: 10.2196/30619.
11. Hripcsak, G, Duke, JD, Shah, NH, et al. Observational health data sciences and informatics (OHDSI): opportunities for observational researchers. Stud Health Technol Inform. 2015;216:574–578.
12. Informatics for Integrating Biology and the Bedside, Partners Healthcare Systems. i2b2 TranSMART Foundation; 2023. www.i2b2.org.
13. Goodchild, M, Appelbaum, R, Crampton, J, et al. A white paper on locational information and the public interest. Am Assoc Geograph. 2023;2022:1040. doi: 10.14433/2017.0113.
14. Goldberg, D, Wilson, J, Knoblock, C. Exploring the use of gazetteers and geocoders for the analysis and interpretation of a dynamically changing world. In: Understanding Dynamics of Geographic Domains. Boca Raton: CRC Press; 2008. 174.
15. Jones, RR, DellaValle, CT, Flory, AR, et al. Accuracy of residential geocoding in the agricultural health study. Int J Health Geogr. 2014;13(37):37. doi: 10.1186/1476-072X-13-37.
16. Zandbergen, PA. A comparison of address point, parcel and street geocoding techniques. Comput Environ Urban Syst. 2008;32(3):214–232. doi: 10.1016/j.compenvurbsys.2007.11.006.
17. Bader, MD, Mooney, SJ, Rundle, AG. Protecting personally identifiable information when using online geographic tools for public health research. Am J Public Health. 2016;106(2):206–208. doi: 10.2105/AJPH.2015.302951.