Over the past several decades, 2 separate but emerging trends have converged: an increase in the number of disasters of all varieties and the rise of big data as an analytical tool. In the United States (U.S.) and around the world, disasters have been occurring with increasing frequency. 1–Reference Greenough, McGeehin and Bernard3 A disaster is commonly defined as “a sudden, calamitous event that seriously disrupts the functioning of a community or society and causes human, material, and economic, or environmental losses that exceed the community’s or society’s ability to cope using its own resources.” 4 From kinetic natural disasters such as hurricanes and earthquakes, to pandemics such as the 2009 H1N1 and coronavirus disease 2019 (COVID-19), and terrorism, the increasing incidence of disasters has created several public health preparedness challenges. Since 2000, more than 1 million people have died globally from natural disasters, with an associated economic impact exceeding $1.7 trillion. Reference Thomas and López5 In the United States alone, there were 119 major natural disasters between 2010 and 2019 which caused $810.5 billion in expenses and killed 5217 people. 6 While the United States has borne a significant impact from such events, from 1970 to 2008 more than 95% of natural disaster-associated deaths were in developing nations. Reference Thomas and López5 Climate-related disasters, in particular, have increased nearly 35% since the 1990s. 1
In the United States, the Federal Emergency Management Agency’s (FEMA’s) National Response Framework outlines 5 core capabilities within the National Preparedness Goal: prevention, protection, mitigation, response, and recovery. 7 These phases, which comprise the disaster life-cycle, culminate in recovery-phase activities that necessitate addressing longer-term physical health and psychosocial sequelae along with infrastructural impacts. Reference Fitzpatrick, Willis and Spialek8–12 These long-term needs form the basis of the recovery phase, in which the goal is to reconstitute the “livelihoods and health, as well as economic, physical, social, cultural and environmental assets, systems and activities, of a disaster-affected community or society.” 13 Such recovery efforts are consonant with the United Nations Office for Disaster Risk Reduction’s concept of “Build Back Better,” 12 and with FEMA’s National Disaster Recovery Framework. 11 As disasters are increasing in frequency and severity, it is critical to recognize that many areas hit by prior disasters are also at higher risk for future disasters, which adds urgency to the need to recover quickly and effectively. Areas still recovering from prior events, or those having not yet adequately addressed physical or social vulnerabilities, may experience even higher morbidity and mortality along with reduced resiliency from disasters. Reference Ferris, Petz and Stark14
Against the backdrop of increasingly frequent and severe disasters, the role of “big data” to aid recovery-phase activities requires more explicit ascertainment. In the 21st century, use of data has become a key feature of nearly every industry. While business and private industries, from banking to health care, have made great use of big data for improved efficiency, opportunities exist for enhanced use of big data in disaster management. Reference Yu, Yang and Li15,Reference Elichai16 This lag, or lack of use, is often attributed to the highly dynamic, disparate and diverse nature of data in disasters, yet analytic approaches aimed at big data are actually designed to deal with each of these considerations and more. Reference Yu, Yang and Li15 As analytic approaches to big data have been further developed and validated, their use across business, government, and academia has grown. Reference Wamba, Akter and Edwards17
While the term “big data”, and the analytic approaches to manage it, is not always clearly defined, there is general consensus that big data should be defined in terms of Volume, Velocity, and Variety. Reference Wamba, Akter and Edwards17,18 Volume refers to the quantity of data, which in a disaster setting may come from established steady-state data sources such as emergency department reporting, Reference Vollmer, Glampson and Mellan19 responders and other aid workers actively collecting information in the field, Internet of things (IoT) devices, Reference Asadzadeh, Pakkhoo and Saeidabad20 social media, Reference Shoyama, Cui and Hanashima21,Reference Yeo, Knox and Hu22 industrial equipment, geographic information systems (GIS) technology, overhead imagery, and more. Velocity refers to the rate at which data are received or processed. Variety refers to the mixed structure and format of the data, especially when a mix of sources is being aggregated in real time.
To date, there have been limited attempts to systematically study and apply the technological advances of big data to disaster management, let alone to recovery-phase efforts therein. Traditionally, most of the limited focus has been placed on response-phase activities in which responders attempt to save lives and property. Reference Freeman, Blacker and Hatt23 This focus has extended from more traditional academic literature reviews Reference Freeman, Blacker and Hatt23 to startup companies working on machine learning and artificial intelligence to find vulnerable populations during disasters. Reference Flavelle24,Reference Schwab25 The application of these modern technologies to process big data into usable metrics presents promising opportunities for more efficient disaster recovery. As the Volume, Velocity, and Variety of data continue to challenge the disaster management community, and with the pace of disasters increasing globally, enabling adoption and use of big data across all phases of the disaster life cycle, including recovery, is more critical than ever. To that end, here, we present an integrative literature review focusing on big data concepts as applied to the recovery phase of the disaster life cycle.
Methods
To assess the existing knowledge base on disaster recovery using big data, we performed an integrative literature review. Reference Torraco26 The literature search was conducted using search terms developed iteratively a priori by the research team. On March 31, 2020, we conducted the final search of PubMed, Embase, CINAHL, LILACS, Web of Science, Scopus, Biosis Citation Index, Compendex, Inspec, NTIS, and GeoBase. All results from each database’s inception to the search date were included in the review process. Full search terms and scope were adapted for each database and are available in Supplementary Table 1.
Using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) Checklist Reference Moher, Liberati and Tetzlaff27 as a guide throughout, all citations were imported into EndNote, a citation management tool (Clarivate Analytics, Philadelphia, PA), and duplicates were removed. The unique records were then imported into Covidence (Veritas Health Innovation, Melbourne, Australia), a systematic review software program for initial title/abstract and follow-on full-text review. At each stage, 2 reviewers screened the articles with a third reviewer resolving conflicts. At the title/abstract phase, each reviewer was blinded to the other reviewer’s decision.
Final inclusion and exclusion criteria are listed in Table 1. All articles required, at a minimum, a real-world/nontheoretical disaster research publication during the recovery phase with some implementation of a big data component. Big data can be a nebulous concept, so for the purpose of this research the definitions from Gandomi and Haider Reference Gandomi and Haider28 were used. Broadly, the tool must include some component of high-volume data, high-velocity data or highly variable data where “traditional data management and analysis technologies (would) [be] inadequate for deriving timely intelligence”. Reference Gandomi and Haider28 Articles were further screened by the reviewers for methodological rigor, including a clearly logical research process, no obvious technical concerns, and publication in a peer-reviewed source. Each article included in the final analysis was abstracted for basic features, including the publication data (eg, year, first author), type of disaster (eg, hurricane), study type (eg, case study), and, by consensus, qualitative major themes.
Results
On final search, 51,504 articles were initially identified. After removing duplicates, 25,417 articles remained from the 11 databases used. Blinded dual title/abstract review led to 238 articles being advanced to the full-text review. After full text review, 18 articles Reference Akter and Wamba29–Reference Wen and Lin46 were ultimately included in the final analysis. Figure 1 displays the final PRISMA Reference Moher, Liberati and Tetzlaff27 diagram.
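The screening totals above imply how many records were dropped at each stage. A minimal sketch of that arithmetic follows; it uses only the four totals stated in the text, the variable names are illustrative, and the per-stage exclusion counts are derived here by subtraction rather than reported by the review.

```python
# Totals reported in the Results text.
identified = 51_504      # records returned by the final search
deduplicated = 25_417    # unique records after duplicate removal
full_text = 238          # records advanced to full-text review
included = 18            # articles in the final analysis

# Derived quantities (assumption: simple subtraction between stages;
# these figures are not stated in the text itself).
duplicates_removed = identified - deduplicated
excluded_title_abstract = deduplicated - full_text
excluded_full_text = full_text - included

print(duplicates_removed, excluded_title_abstract, excluded_full_text)
```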
Geographically, 8/18 (44%) of the included articles focused on disasters impacting the United States, with the remainder covering Canada, France, India, Italy, Japan, Pakistan, and Turkey. While no lower bound was placed on publication dates, the earliest identified publication was in 2007. The median year of publication fell between 2015 and 2016. Overall, hurricanes were the most frequently studied disaster type (39%), followed by earthquakes (22%). All but 2 identified studies were classified as a case study or case series (89%), with the outliers being a systematic review and a literature review. Thematically, social media and GIS were the most commonly identified modalities, at 44% and 33%, respectively. Full results are displayed in Table 2.
Abbreviations: GIS, Geographic Information Systems; ICT, information communication technology; EHR, electronic health records.
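The percentages reported in the Results can be checked against the 18 included articles. The sketch below is illustrative only: the counts for the United States (8), hurricanes (7), and case studies or case series (all but 2, ie, 16) come from the text, whereas the counts for earthquakes, social media, and GIS are inferred here from the reported percentages and are assumptions, not figures stated by the review.

```python
TOTAL = 18  # articles included in the final analysis

def pct(count: int, total: int = TOTAL) -> int:
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)

# Counts stated directly in the text.
stated = {"United States": 8, "hurricanes": 7, "case study or series": 16}

# Counts inferred from the reported percentages (assumption).
inferred = {"earthquakes": 4, "social media": 8, "GIS": 6}

for label, count in {**stated, **inferred}.items():
    print(f"{label}: {count}/{TOTAL} = {pct(count)}%")
```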
GIS
GIS represented one of the most commonly used tools in the big data disaster recovery landscape. Often used to survey disaster recovery areas, GIS represents a multimodal data stream where different sources of information can readily be merged and superimposed. For example, Aydinoglu and Bilgin Reference Aydinoglu and Bilgin30 developed a model to make GIS data from multiple sources interoperable while focusing on Turkish landslides. GIS was also used in several studies to model human behavior and movement following a disaster. Elliott and Pais Reference Elliott and Pais37 used GIS to examine the impact on human displacement and resettlement post disaster in urban versus rural areas. Their key finding was that in urban regions, the recovery phase has an effect of dispersing lower socioeconomic status (SES) populations, while in rural areas, long-term recovery concentrates groups of lower SES populations into smaller areas. Contreras et al. Reference Contreras, Blaschke and Kienberger35 assessed postdisaster satisfaction with resettlement by displaced communities as a function of their distance from the center of a natural disaster. In this case study of the L’Aquila earthquake, the authors used spatial variables to indicate the evolution through disaster phases and measure the progression of the recovery process.
Social Media
Social media abstraction is another common method of examining disaster recovery status and community resilience. Combined with surveys and sentiment analysis, social media can provide a natural data platform for analyzing a community’s disaster recovery. Reference Boulianne, Minaker and Haney31,Reference Brandt, Turner-McGrievy and Friedman32,Reference Cheng, Mitomo and Otsuka34,Reference Gruebner, Lowe and Sykora39,Reference Huang and Xiao40,Reference Ragini, Anand and Bhaskar43,Reference Shibuya and Tanaka45,Reference Wen and Lin46 Notably, due to the rapid ability to share information by means of social media, Twitter in particular was a key driver of disaster information from both regular citizens and government officials, and was used by agencies in the United States Reference Brandt, Turner-McGrievy and Friedman32 and globally. Reference Cheng, Mitomo and Otsuka34 Boulianne et al. Reference Boulianne, Minaker and Haney31 found those who followed the Fort McMurray, Alberta wildfire via social media exhibited a statistically significantly higher degree of “caring about others” and increased likelihood of getting involved by means of donations or volunteering. Gruebner et al. Reference Gruebner, Lowe and Sykora39 took sentiment analysis a step further and proposed using tweets as a supplemental form of syndromic surveillance for disaster-related mental health sequelae. Their research noted a spatial dimension to emotions, prompting a discussion about future efforts to target postdisaster mental health resources at the more granular (eg, neighborhood) level. While more commonly studied in the context of natural disasters, Wen and Lin Reference Wen and Lin46 extrapolated the same concept to the Paris terrorist attacks of 2015. Beyond mental health, indicators of economic recovery, such as used car transactions on Facebook, provided another proxy for community resilience and recovery. Reference Shibuya and Tanaka45
Mental Health
Similarly, given the nature of social media to provide a platform for micro-blogging and real-time expression of thoughts and sentiments, big data provides an opportunity to abstract for mental health sequelae following a disaster. As described above, Gruebner et al. Reference Gruebner, Lowe and Sykora39 and Wen and Lin Reference Wen and Lin46 trended emotions such as anger, sadness, and fear over both dimensions of time and geographic space for naturally occurring and terrorist disasters, respectively. Boulianne et al. Reference Boulianne, Minaker and Haney31 further noted social media could be a prolific source of spreading positive messages about disaster recovery.
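To illustrate what trending emotions over time and geographic space can look like in practice, the sketch below counts emotion-bearing posts per day and neighborhood. It is a toy example under stated assumptions: the keyword lexicon, the posts, and the neighborhood labels are all invented for illustration, and simple keyword matching stands in for the more sophisticated classifiers used in the cited studies. It is not the method of Gruebner et al. or Wen and Lin.

```python
from collections import Counter, defaultdict

# Hypothetical keyword lexicon (assumption); real studies use validated
# emotion classifiers rather than hand-picked keywords.
LEXICON = {
    "anger": {"angry", "furious", "outraged"},
    "sadness": {"sad", "heartbroken", "grief"},
    "fear": {"scared", "afraid", "terrified"},
}

# Invented example posts: (day, neighborhood, text).
POSTS = [
    ("2020-01-01", "north", "so scared the water keeps rising"),
    ("2020-01-01", "north", "terrified and angry about the delays"),
    ("2020-01-02", "south", "heartbroken seeing our street today"),
    ("2020-01-02", "north", "still afraid to go back home"),
]

def emotion_trend(posts):
    """Count posts expressing each emotion per (day, neighborhood)."""
    trend = defaultdict(Counter)
    for day, area, text in posts:
        words = set(text.lower().split())
        for emotion, keywords in LEXICON.items():
            if words & keywords:  # post contains at least one keyword
                trend[(day, area)][emotion] += 1
    return trend

trend = emotion_trend(POSTS)
for key in sorted(trend):
    print(key, dict(trend[key]))
```

Grouping by a (day, neighborhood) key keeps the temporal and spatial dimensions together, so the same counts can be read as a time series per area or as a map per day.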
Hurricanes
Notably, 7/18 (39%) of the included studies focused on the recovery process after hurricanes. By using big data techniques, the identified studies were able to demonstrate changes in population-level socioeconomic factors (eg, median household income, number of single mother households) along the path of a hurricane. Reference Elliott and Pais37 Combined with social media as described above, big data presented the opportunity to trend the disaster process from before the hurricane impacted the community (pre-event phase) through recovery. By aggregating the hurricane damage assessment produced from GIS with the timeline of events, a greater understanding of temporal impact and the pace of the recovery process can be achieved. Reference Huang and Xiao40 For example, social media enabled syndromic surveillance for mental health sequelae based on spatial and temporal proximity to the path of the hurricane, Reference Gruebner, Lowe and Sykora39 and helped determine when the major transition points occurred in the disaster life cycle. Reference Huang and Xiao40
Discussion
In recent years, big data has revolutionized the landscape in many industries, from fraud detection to high-speed financial trading to targeted online advertising. Yet, further opportunities remain for enhanced use of big data to tackle persistent challenges in the disaster management field specifically and throughout the health-care system more broadly. Reference Yu, Yang and Li15,Reference Elichai16,Reference Vigilante, Escaravage and McConnell47 The disaster life cycle remains a key area of public health emergency management where big data has only begun to permeate. Reference Akter and Wamba29 Although work with high-Volume, high-Velocity, and high-Variety data in a disaster recovery setting has traditionally been constrained by technological limitations, novel analytic approaches can help overcome these constraints and enable greater use across the disaster management community. As the real-time accessible volume of data increases (eg, social media) and field technology matures to handle the velocity and variety of the data generated in a recovery environment, new approaches to using big data become accessible. Previous literature reviews have found much broader applications of big data in antecedent phases of the disaster life cycle, Reference Freeman, Blacker and Hatt23 but this integrative literature review found that the published research on the recovery phase is mostly limited to single-incident case studies that leveraged existing big data sources and repurposed them in an ad hoc manner. Given the widespread geographic range and recurrence of hurricanes and other disasters, big data, including data generated through GIS, presents promising opportunities to continuously survey the landscape and trend recovery over extended periods of time and across frequently occurring disasters.
A key feature of big data includes the ability to marry multiple data streams in different formats. The findings from our literature review indicate that traditionally disparate data sources can be combined to provide a more comprehensive overview of the recovery process. For example, information obtained from overhead imagery of impacted areas Reference Contreras, Blaschke and Kienberger35 (eg, neighborhood rebuilding, population resettling) can be merged with requests for social services, nutrition, Reference Martin, Rex and Barnett48 financial support, and economic data, Reference Kimura, Inoguchi and Tamura41,Reference Nejat, Moradi and Ghosh42 along with social media sentiment analysis Reference Akter and Wamba29 to provide a more comprehensive perspective of a community’s recovery. Reference Fitzpatrick, Willis and Spialek8 With the extended timeline of disaster recoveries, these technologies facilitate unprecedented assessments of population-level longitudinal physical and mental health, and exacerbations in social inequities. Reference Leiva-Bianchi, Mena and Ormazábal9 Past research into other phases of the disaster life cycle, such as preparedness, has identified a role for GIS in developing susceptibility tools and early warning indicators. Reference Freeman, Blacker and Hatt23
As the field of big data continues to grow and fully encompass the entire disaster life cycle, additional sources can be added to provide an even greater perspective while cross-validating overlapping data streams. New and increasingly diverse data sources have prompted discussions about adding a further term to describe big data, Veracity, which refers to the level of trust or accuracy in the data. While not included in most early definitions of big data, veracity has proven to be one of the more critical attributes, often constraining the adoption and use of big data analytics. Reference Plotnick and Hiltz49 Although none of the identified articles addressed this term explicitly, it has permeated the big data world more broadly, especially in the business community. Reference Wamba, Akter and Edwards17 From senior decision makers to operators in the field, it has been repeatedly noted that trust in the data is a key consideration before adoption of technology in a disaster. Reference Wamba, Akter and Edwards17,Reference Plotnick and Hiltz49 This is key because, when data sources are not adequately cleaned and integrated with concurrent data streams, there is a real risk of making improper inferences, especially in the context of providing life-saving and life-sustaining services that are the hallmark of a rapidly evolving disaster recovery. Reference White50
Community Social Well-Being Through Social Media
One area in particular where big data can have a significant impact during a disaster recovery is community social well-being. As previously described, big data has been leveraged to analyze and aggregate various forms of social media independently and in conjunction with other data streams. Given social media’s proliferation as a ubiquitous and immediate form of self-expression, big data methods, such as sentiment analysis of Twitter activity, become particularly helpful due to the volume and velocity of new tweets in the wake of a disaster. Ragini et al. Reference Ragini, Anand and Bhaskar43 explored this issue through a proof-of-concept study using analytical methods to trend the sentiment of social media posts in a disaster region from the response through the recovery. By using machine learning to perform sentiment analysis, the authors were able to demonstrate an ability to separate various needs within an impacted population. While interventions, such as mental health services, are often directed at the individual level, using big data to trend sentiment at the community or neighborhood level longitudinally throughout the recovery can facilitate informatics-driven interventions. Disaster recovery is a complex undertaking, and as areas recovering from disasters are often at risk of future disasters, Reference Jackson51 the public health emergency management community must begin to leverage big data to comprehensively characterize, monitor, and improve recovery efforts.
COVID-19 Recovery Opportunities
This integrative literature review was conducted primarily before the impact of COVID-19. Given the once-in-a-generation impact of the COVID-19 pandemic, the recovery process (medically, economically, and psychosocially) is expected to be prolonged and unprecedented in modern times. Reference Barnett, Rosenblum and Strauss-Riggs10 The scale of COVID-19 led to the adoption of entirely new data systems aimed at collecting and using big data across the national response. For example, the U.S. Department of Health and Human Services (HHS) stood up HHS Protect 52 to integrate hundreds of datasets across government, and the U.S. Department of Defense established Tiberius, a data management platform, to manage informatics around vaccine supply, demand, and distribution. Reference Simunaci53 Understanding how these efforts did, or did not, leverage big data analytics effectively throughout the recovery phase is of immense operational value. Specifically, such understanding presents an immediate opportunity for advancing and analyzing new methods to study recovery efforts using big data and to enhance big data’s use across academia, government, and industry.
Limitations
As with all literature reviews, this integrative literature review is subject to several limitations. First, the search was developed iteratively using keywords intended to capture major themes. However, it is possible that our search strategy missed a key term and a resulting key item in the literature. Additionally, despite the use of dual reviewers at each stage, it is possible, as with any such review, that human error improperly excluded a key study. As noted above, our literature review was conducted primarily before the impact of COVID-19. However, given current uncertainties surrounding the nature and scope of how big data-related technologies will be applied to the pending full recovery phase post COVID-19, future research will need to explicate COVID-19-specific recovery-phase big data considerations. Finally, we intentionally limited our search to real-world disasters published in English-language peer-reviewed literature. While this choice is designed to ensure only properly vetted and formally published research was included, worthy examples in gray literature, or in a non-English language, may have been systematically excluded. For example, the plurality of hurricane-related disasters may be a function of a bias in the type of disaster by region of the world and language spoken.
Conclusions
This study examined the existing literature base of past applications of big data to disaster recovery. Our findings revealed limited prior research in this area, largely focused on case studies or case series of specific disasters. Predominantly, the extant literature focuses on the United States, with recurring themes of research assessing the role of big data as applied to GIS, social media, and mental health. This presents a broad opportunity to expand this evolving discipline into the disaster recovery realm. As big data permeates nearly every facet of modern society, while disasters simultaneously increase in frequency, the need and opportunities for further research regarding applications of big data to disaster recovery are timely and highly salient. Particularly in the context of regions experiencing recurring disasters, big data presents opportunities for more efficient monitoring of recovery processes and resources and, ultimately, improved human resiliency.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/dmp.2021.332.
Conflict(s) of Interest
The views expressed are those of the authors and do not reflect the official policy or position of any employers or affiliated organizations. The authors have no conflicts of interest to report.