Impact statement
This paper demonstrates the use of Bayesian techniques in engineering applications for which data is scarce, specifically applications within the subsurface. The combined use of field data, numerical modeling, and Bayesian calibration to infer accurately important (and often difficult to quantify) parameters at the district scale are showcased. The applicability of this methodology under heterogeneous conditions is demonstrated, as is that different measurement locations can contribute different amounts of information to the inference of parameter values, understanding of which can reduce costs incurred in site investigations. The significance of calibration and accounting for uncertainty is highlighted in the design of an existing shallow geothermal system, indicating higher ground capacity than utilized and thus potential for a larger scale system.
1. Introduction
As cities grow in size and density, shortage of space above ground drives the increasing development of underground infrastructures. Residential and commercial spaces, transport systems, industrial processes, and energy applications all compete for the use of the shallow subsurface. Structures built into the ground and the associated heat flux from these are known to impact ground temperatures, with anomalous temperatures propagating particularly far through groundwater flow (Ferguson and Woodbury, Reference Ferguson and Woodbury2004; Bidarmaghz et al., Reference Bidarmaghz, Choudhary, Soga, Terrington, Kessler and Thorpe2020). Such temperature anomalies can have knock-on effects, impacting cooling and heating requirements to maintain underground spaces at comfortable levels (particularly due to groundwater not transporting away as much heat (Blum et al., Reference Blum, Menberg, Koch, Benz, Tissen, Hemmerle and Bayer2021)), groundwater quality, and the functioning of underground biospheres (Benz et al., Reference Benz, Bayer, Menberg, Jung and Blum2015; Attard et al., Reference Attard, Rossier, Winiarski and Eisenlohr2016; Bayer et al., Reference Bayer, Attard, Blum and Menberg2019; Krcmar et al., Reference Krcmar, Flakova, Ondrejkova, Hodasova, Rusnakova, Zenisova and Zatlakovic2020).
Effective and equitable utilization of the shallow sub-surface as a space and energy resource requires numerical thermo-hydraulic modeling of the ground at city-scale, taking into account anthropogenic influences and how they interact with one another (Meng et al., Reference Meng, Vienken, Kolditz and Shao2019; Epting et al., Reference Epting, Böttcher, Mueller, García-Gil, Zosseder and Huggenberger2020; Bidarmaghz et al., Reference Bidarmaghz, Choudhary, Narsilio and Soga2021). Large-scale numerical models have proven sufficiently reliable in capturing anthropogenic influences on the shallow subsurface. However, such models are computationally expensive and depend on measured data for calibration and parameter inference to yield reliable results. Inevitably, uncertainties arise within the modeling process, stemming from various sources such as noise in field measurements, simplifications adopted to make computational models tractable, and parametric uncertainty. Accounting for such uncertainties is important for the reliability, reproducibility, and interpretability of model outputs (Volodina and Challenor, Reference Volodina and Challenor2021). Bayesian frameworks allow for the quantification of uncertainty and its incorporation into the calibration process, as well as accounting for prior beliefs about the system that is modeled (Kaipio and Somersalo, Reference Kaipio and Somersalo2007).
While Bayesian calibration methods are used in several fields, for example, engineering mechanics (Rappel et al., Reference Rappel, Beex, Hale, Noels and Bordas2020; Pepi et al., Reference Pepi, Gioffrè and Grigoriu2020), natural hazards engineering (Zheng et al., Reference Zheng, Xie and Long2021), building energy modeling (Hou et al., Reference Hou, Hassan and Wang2021), and ecological and environmental modeling (Speich et al., Reference Speich, Dormann and Hartig2021), they are not widely utilized in the context of large-scale subsurface models for shallow geothermal applications, and their performance, when applied to field data, has not been tested fully. Case studies utilizing such methods on groundwater models and deep geothermal reservoirs exist (Cui et al., Reference Cui, Fox and O’Sullivan2019; Omagbon et al., Reference Omagbon, Doherty, Yeh, Colina, O’Sullivan, McDowell, Nicholson, Maclaren and O’Sullivan2021; Scott et al., Reference Scott, O’Sullivan, Maclaren, Nicholson, Covell, Newson and Gujónsdóttirð2022), however, the complexities present in the shallow subsurface, such as thermal influence from the surface as well as from anthropogenic infrastructure and activity, as well as the contrast in scale and magnitude make the problem context significantly different. Moreover, compared to synthetic data (i.e., data generated numerically or under controlled conditions) calibration on field data poses a number of non-trivial challenges: (a) observations are expensive to obtain and hence scarce, (b) while the spread of observations should capture meaningful variations for robust inference, these are often not easily tractable (even when data is available), leading to a lower content of information compared to synthetic data, and (c) observation noise and measurement errors introduce additional elements of uncertainty to the calibration process.
The scarcity of measurements is of particular relevance to the calibration of the subsurface models due to the cost and labor intensity of observations within the ground, requiring the drilling of boreholes and the deployment of measurement equipment (Nicholson et al., Reference Nicholson, Alferink, Paton-simpson, Gravatt, Guzman, Popineau, Sullivan, Sullivan and Maclaren2020). Hence data is often sparse relative to the complexity of the system. Bayesian techniques are particularly useful as they include probabilistic distributions for prior knowledge and measurements, thereby accounting explicitly for the lack of knowledge about the system being modeled (Omlin and Reichert, Reference Omlin and Reichert1999). For example, Cui et al. (Reference Cui, Fox and O’Sullivan2011) estimate the posterior distribution for parameters used in a deep geothermal reservoir model using a two-stage Markov chain Monte Carlo (MCMC) sampling scheme, where the two stages consist of running the MCMC samples through a surrogate model, and subsequently running samples producing acceptable results through the original model. A possible issue with such an approach, however, is that it often requires a large number of forward runs of the original model which can be computationally expensive. An alternative methodology is proposed by Menberg et al. (Reference Menberg, Bidarmaghz, Gregory, Choudhary and Girolami2020), based on the work by Kennedy and O’Hagan (Reference Kennedy and O’Hagan2001) and Goh et al. (Reference Goh, Bingham, Holloway, Grosskopf, Kuranz and Rutter2013). It employs models of two different levels of fidelity (i.e., mesh resolutions) to generate a large set of numerical data to train Gaussian Processes (GP) emulators for use in the calibration. The use of multi-fidelity approaches has also been suggested in a hierarchical framework by Maclaren et al. (Reference Maclaren, Nicholson, Bjarkason, O’Sullivan and O’Sullivan2020), that incorporates posterior-informed approximation errors and has been tested with deep geothermal applications. Adopting GP emulators is also used successfully by Cui et al., (Reference Cui, Peeters, Pagendam, Pickett, Jin, Crosbie, Raiber, Rassam and Gilfedder2018) in regional groundwater modeling as well as by Rajabi and Ketabchi (Reference Rajabi and Ketabchi2017) in coastal groundwater management applications. A limitation of the training of GP models is the required run-time complexity of $ \mathcal{O}\left({N}^3\right) $ , where $ N $ is the sum of the number of field observations and the number of numerical outputs (Chong et al., Reference Chong, Lam, Pozzi and Yang2017). That is to say, large data sets can make calibration infeasible if the process is prohibitively costly. Therefore, it becomes crucial to assess how much individual data-points contribute to parameter inference. It calls for a process by which points that provide the most information can be identified and curated as a manageable but high-value data set. For subsurface models covering large areas, this in turn requires the identification of critical locations where measurements can be improved and uncertainty in parameter posteriors reduced (Nicholson et al., Reference Nicholson, Alferink, Paton-simpson, Gravatt, Guzman, Popineau, Sullivan, Sullivan and Maclaren2020).
A further compounding factor, limiting the ability of parameter inference for spatially large models, is the issue of non-identifiability for parameters that vary spatially in large regions where measured quantities in the ground tend to be highly local due to this heterogeneity. Perego et al. (Reference Perego, Guandalini, Fumagalli, Aghib, De Biase and Bonomi2016) consider the impact of heterogeneity in soil profile on GSHP efficiency for a system installed within the town of Alessandria, Italy. A (deterministic) 3D heat transport modeling suite, GeoSIAM (Integrated System for GeoModelling Analyses), was used to model the system and validated against reference cases. The authors find that the assumption of homogeneity and averaged thermal properties can underestimate the thermal impact on the surrounding soil, leading to greater thermal imbalances. This issue also may be addressed through Bayesian methods in that the resulting inferred parameter values are distributions, rather than single-valued estimates (Omlin and Reichert, Reference Omlin and Reichert1999). However, it is useful to understand the degree to which the distributions inferred from data represent the spatial variations of parameters across a given region. Characterization of heterogeneity and its impact on inferred parameters using Bayesian methods is a growing topic of interest in hydrogeology (Zhou et al., Reference Zhou, Gómez-Hernández and Li2014; Linde et al., Reference Linde, Ginsbourger, Irving, Nobile and Doucet2017). For example, Painter et al. (Reference Painter, Woodbury and Jiang2007) demonstrate the ability of Bayesian methods to out-perform deterministic approaches in inferring a hydraulic conductivity field for a numerical model of the highly heterogeneous Edwards aquifer, Texas, USA. The authors compare the performance of simple interpolation and of numerical upscaling combined with cokriging against a revision to the hydraulic conductivity field using Bayesian updating, finding that the latter greatly improved the forward groundwater model, reducing the mean residual error by a factor of 10.
To address the challenges highlighted above, we apply a multi-fidelity Bayesian calibration approach using field data measured across locations in the city center of Cardiff, UK. The calibration methodology is also applied with synthetic, numerically generated data. The synthetic data enables us to have confidence in the setup of the calibration process and allows control of the conditions under which data for calibration is produced. Additionally, comparison between posteriors inferred from field data and those from synthetic data permits exploration of the additional complexities contained within field data and provides a measure for the increased uncertainty arising from field data. Our first objective is to investigate and identify a subset of high-value data points, thus addressing the issues of data scarcity and information gain. We do so by performing the calibration process with subsets of the available data and analyzing how the posteriors of the different parameters are affected by the size of the data subsets and by the physical phenomena experienced by the measurement points included in the subset. Our second objective is to test the ability of the calibration methodology to infer reasonable distributions for spatially heterogeneous parameters. We do so by numerically creating synthetic scenarios where the aquifer hydraulic conductivity is varied and examining the parameters inferred using these synthetic data sets. Finally, we demonstrate the importance of model calibration and uncertainty quantification by propagating the uncertainties in the inferred parameters to the design of an existing open-loop Ground Source Heat Pump system within the modeled area. In sum, the novelty of the paper lies in demonstrating data-efficient calibration of large-scale ground thermal models with field data and the application of the information gained on geothermal applications, thus contributing to the on-going efforts towards utilizing our environment’s resources in efficient manners.
2. Methodology
2.1. Bayesian multi-fidelity framework formulation
We employ a multi-fidelity approach developed by Menberg et al. (Reference Menberg, Bidarmaghz, Gregory, Choudhary and Girolami2020), which extends the Kennedy and O’Hagen “single-fidelity” Bayesian framework (Kennedy and O’Hagan, Reference Kennedy and O’Hagan2001) (hereafter KOH) by including outputs from numerical models at both high and low levels of fidelity for the calibration. The advantage of this method is that the low-fidelity model may be run at a lower computational cost than the high-fidelity one, albeit with lower accuracy, and thereby more frequently, providing a greater number of numerical data points for calibration. The multi-fidelity framework relates field observations $ {y}_f $ to simulation outputs $ {y}_c $ (ultimately re-expressed in terms of the low-fidelity model output term $ {\eta}_l $ and a term representing the mismatch between output from the high- and the low-fidelity models, $ \mu $ ) and a discrepancy function $ \delta $ , for a given set of state variables $ x $ , as
where $ {\theta}_h $ and $ {\theta}_l $ are the sets of unknown model parameters of the high-fidelity and the low-fidelity models, respectively. $ \zeta (x) $ represents the true, non-observable, physical process and $ \varepsilon $ the random measurement error. GP models are used for the approximation of the terms in equation (1), namely $ \left({\eta}_l,\mu, \delta \right) $ , with covariance matrices adapted from the KOH approach accordingly, shown in Supplementary Appendix A.
The multi-fidelity framework is implemented using the STAN programming language, employing a Hamiltonian Monte Carlo (HMC) algorithm to sample the posterior distributions. Four independent sampling chains are run during a calibration process, with 2000 samples each, the first 1000 being discarded as part of the burn-in phase. The outcome of a calibration consists of posterior distributions of the model parameters $ \theta $ , as well as the hyper-parameters, which define the range of prior uncertainty of the parameters. While some methods for Bayesian inference, such as the Bayesian approximation error (BAE) approach (e.g., Maclaren et al. (Reference Maclaren, Nicholson, Bjarkason, O’Sullivan and O’Sullivan2020)), use empirical estimates for approximation of the posterior distributions, our approach follows the concept of the KOH framework. Here, the posteriors distributions of all unknown (hyper-) parameters are inferred simultaneously by HMC sampling. However, it should be noted that due to limitations in the number of samples our posterior are approximations of the true distributions, and indeed several studies advise caution when interpreting the posteriors of error terms from the KOH framework quantitatively (Li et al., Reference Li, Augenbroe and Brown2016; Menberg et al., Reference Menberg, Heo and Choudhary2019). In fact, even with an infinite number of samples, the use of GP emulators would result in an approximation of the true posterior. The role of the hyper-parameters in the GP models and the calibration process is further explained in Supplementary Appendix 6. Further information on Bayesian inference can be found in works such as Gelman et al. (Reference Gelman, Carlin, Stern, Dunson, Vehtari and Rubin2013), Chong and Menberg (Reference Chong and Menberg2018), Choi et al. (Reference Choi, Menberg, Kikumoto, Heo, Choudhary and Ooka2018), and Menberg et al. (Reference Menberg, Heo and Choudhary2019).
2.2. Description of the domain and field data
The city center of Cardiff, UK, has been monitored extensively by the British Geological Survey (BGS) and the Cardiff Harbour Authority over 5–20 years, for geological and hydrological data (Williams, Reference Williams2008; Heathcote et al., Reference Heathcote, Lewis, Russell and Soley1997). The data include soil profiles and geological characteristics of the area (Kendall et al., Reference Kendall, Williams, Patton and Thorpe2020; Patton et al., Reference Patton, Farr, Boon, James, Williams, James, Kendall, Thorpe, Harcombe, Schofield, Holden and White2020), hydrological information such as groundwater level and hydraulic head measurements, as well as temperature time series measurements from a number of monitoring boreholes within the area (Farr et al., Reference Farr, Patton, Boon, James, Coppell and James2019,0).
The domain considered in this study consists of a rectangle of approximately 3.5 km2 in area, in the southern part of the city, illustrated in Figure 1a where footprints of buildings likely to be with basements are shown. The geological layers down to a depth of 45 m below ground level are shown in Figure 1b, and relevant properties of the materials are listed in Table 1. Below a depth of approximately $ z=-25 $ m, the subsurface consists almost exclusively of Mercia Mudstone bedrock (ROCK), which is considered to be the base of the glaciofluvial sand and gravel deposits (GFSD) that constitute the aquifer, while the shallower layers consist of made ground (MGR), tidal flat deposits (TFD), and alluvium (ALV). An annual mean hydraulic head distribution of the area, shown in Figure 1c, is generated from a detailed hydrogeological model developed by the BGS and includes pumped groundwater control areas, mitigating rise in groundwater levels (Williams, Reference Williams2008). The average groundwater level in this domain is found to be 4 m below the ground surface.
a Values further explored in sensitivity analysis and uncertainty quantification, Sections 2.4 and 3 .
b This value is a conservative as recent studies by Boon et al. (Reference Boon, Farr and Hough2021) suggest that the thermal conductivity for fully-saturated dolomitic mudstone could be as high as 2.7 W/(m K).
The temperature monitoring boreholes at 24 locations scattered throughout the domain, at depths ranging from 1.5 to 12.5 m below the ground surface are shown in Figure 2a (see Table 2 in Patton et al. (Reference Patton, Farr, Boon, James, Williams, James, Kendall, Thorpe, Harcombe, Schofield, Holden and White2020) for co-ordinates of sensors). A linearly increasing sinusoidal function is fitted to each of the measured temperature time series. The sinusoidal variation observed in the temperatures is a consequence of the seasonal temperature fluctuation at the ground surface, and so the oscillation period is set to 365 days. Key features, such as the mean temperature $ {T}_{\mathrm{mean}} $ , amplitude $ {T}_{\mathrm{amp}} $ , annual temperature shift $ {T}_{\mathrm{inc}} $ , and phase $ \phi $ of the temperature oscillations, are extracted. This provides a suitable framework for data management, allowing the use of a large data set in a lower dimensional space by removing the temporal dimension, and enabling efficient utilization of multiple points in the calibration process. Figure 2b,c illustrates the fitting applied to data measured at two of the boreholes.
2.3. Numerical modeling
The domain within Cardiff is modeled using finite element methods and is implemented within the software suite COMSOL Multiphysics (COMSOL ® v. 5.6, 2020). A semi-3D numerical modeling approach (Bidarmaghz et al., Reference Bidarmaghz, Choudhary, Soga, Kessler, Terrington and Thorpe2019, Reference Bidarmaghz, Choudhary, Soga, Terrington, Kessler and Thorpe2020; Makasis et al., Reference Makasis, Kreitmair, Bidarmaghz, Farr, Scheidegger and Choudhary2021) is utilized to model the subsurface response to thermal and hydraulic phenomena. This approach entails modeling a collection of horizontal planes using the governing equations for conductive and convective heat transfer and for fluid flow through porous media in 2D, and thermally coupling neighboring planes via conduction. A key aspect of the modeling is the inclusion of heated basements to assess their anthropogenic influence on the thermal state of the ground, as well as the incorporation of the river Taff in Cardiff, modeled using turbulent flow governing equations. Detailed information on the numerical modeling methodology, including governing equations, initial and boundary conditions, and a labeled schematic of the model set-up can be found in Supplementary Appendix B. The methodology has been validated against more complex detailed 3D numerical simulations which have, in turn, been validated against both experimental and numerical data (Bidarmaghz and Narsilio, Reference Bidarmaghz and Narsilio2018; Bidarmaghz et al., Reference Bidarmaghz, Choudhary, Soga, Terrington, Kessler and Thorpe2020; Makasis et al., Reference Makasis, Narsilio, Bidarmaghz, Johnston and Zhong2020).
For the purposes of the Bayesian calibration utilizing the multi-fidelity approach defined in Section 2.1, two numerical models with different levels of fidelity are required. In addition to these, a model with very high fidelity is created to generate output data of controllable scenarios (i.e., both input and output values are known), hereafter called the data-generating model and its output synthetic data. No noise was added to the synthetically-generated training data. These allow the systematic exploration of the heterogeneity across the domain on parameter inference, to understand at what degree this methodology is applicable under conditions with heterogeneity (which is expected in large-scale modeling of the ground). The schematics for these three models are shown in Figure 3. The number of planes implemented in the models defines the fidelity of the models and determines computational costs: the data-generating model with the highest resolution (41 layers) requires about 15 hr to run; the high-fidelity model (23 layers) requires approximately 5 hr to run; and the low-fidelity model (8 layers) requires less than 30 min for one computational run. All three models implement a higher layer density near the surface, with planes spaced more closely together (i.e., smaller $ dz $ ). This is to better resolve the topmost layers of the domain, where more pronounced geological variation and the presence of heat sources produce a more complex convective flow pattern, compared to deeper underground where the soil conditions are more homogeneous, there is a lack of heat sources, and the presence of the mostly impermeable bedrock diminishes convection. Each plane is meshed with 38,415 triangular elements, irrespective of the model fidelity. Simulations were run on high-performance computers with two 2.1 GHz processors (24 cores) and 192 GB of RAM operating at 2400 MHz.
2.4. Parameter screening and calibration set-up
The computational cost of the calibration process (both in terms of the cost taken to perform necessary model runs as well as the time taken for the calibration process itself) necessitates parameter screening to identify parameters with the greatest impact on subsurface temperatures. Limiting the calibration to only the most influential parameters also addresses the possibility of over-fitting (Chong and Menberg, Reference Chong and Menberg2018). To this end, the Morris Method (Morris, Reference Morris1991; Menberg et al., Reference Menberg, Heo and Choudhary2016; Chong and Menberg, Reference Chong and Menberg2018) is used to identify the model parameters with the greatest influence on the model output of interest, that is, the mean temperature $ {T}_{\mathrm{mean}} $ at a given location. The Morris Method is a global sensitivity analysis whereby the input parameter space is non-dimensionalized onto a unit hyper-cube $ {H}^k $ (for a set of $ k $ parameters), and each dimension $ i $ is divided into a regular grid of spacing $ {\Delta}_i $ with a total of $ p $ levels along each dimension. A series of $ t $ trajectories is defined within $ {H}^k $ , each beginning at a randomly chosen point in the parameter space with the following points differing from the previous one by changing a single parameter by one increment at a time for a total sequence of $ k+1 $ . For a set of $ t=10 $ trajectories with $ k=8 $ parameters (see below), then, this yields a total of 90 model runs. The results from these runs are used to determine the effective mean for each parameter, a measure of the magnitude of influence of a parameter $ X $ on model output $ Y $ , which is given by
This can then be used to rank parameters according to the sensitivity of model output on parameter variation.
The parameters considered for screening are listed in Table 2 along with their plausible ranges, and the ranking of the parameters according to their effective mean values. The ranking is determined by taking the average of the effective mean calculated across the 24 measurement locations considered in the domain. Results from this analysis are illustrated in Figure 4, showing the effective mean of each parameter as well as the variation in this metric across the 24 locations. The most influential parameter is seen to be the far-field ground temperature $ {T}_{\mathrm{ground}} $ , which is unsurprising as it affects most directly the mean temperature at a location, itself being the mean temperature of the ground in absence of any external temperature fluctuations. The sensitivity of the model output to $ {T}_{\mathrm{ground}} $ is fairly consistent across the 24 measurement locations in the domain, with the percentiles of the box-plot nestled closely around the average value.
The next two most influential parameters are the temperature to which the basements are heated $ {T}_{\mathrm{room}} $ and the percentage of basements in the domain that are heated $ {B}_{\mathrm{perc}} $ . This percentage is varied in the sensitivity analysis by “activating” (i.e., applying the temperature boundary condition of $ T={T}_{\mathrm{room}} $ ) to varying fractions of the basements shown in Figure 1. The parameters exhibit a signification variation in sensitivity across the measurement points, confirming that heat fluxes from any individual basement have a highly localized effect on ground temperatures. Despite being explored independently in the sensitivity study, the two parameters are closely linked in terms of how they impact the mean temperature of the surrounding ground. Because the basement percentage is a parameter with a more localized impact on mean temperature than the basement temperature, $ {T}_{\mathrm{room}} $ is chosen as the parameter for calibration, while keeping the percentage of basements active constant at 60%, following initial investigations where both $ {B}_{\mathrm{perc}} $ and $ {T}_{\mathrm{room}} $ were varied.
The third-highest ranking parameter is the hydraulic conductivity of the glacio-fluvial sediment deposits, that is, the aquifer. The hydraulic conductivity affects the groundwater flow velocity and thereby how far groundwater heated from nearby basements flows before returning to the ambient ground (i.e., far-field) temperature, hence indirectly impacting the mean temperature measured in the vicinity of basements. Slower groundwater flow disperses less of the heat generated by basements, leading to the accumulation of heat beneath heated infrastructures in the ground.
The three parameters to be estimated are therefore the far-field ground temperature, the basement percentage, and the aquifer hydraulic conductivity. Being a Bayesian approach, the calibration requires prior estimates of the model parameters (and hyper-parameters). For the three model parameters $ \theta $ , the priors span the entire plausible range of the parameter, defined through literature review and local expertise, and follow a normal distribution of $ \mathcal{N}\left(\mu =0.5,\sigma =0.2\right) $ . (The priors of the hyper-parameters follow Menberg et al. (Reference Menberg, Bidarmaghz, Gregory, Choudhary and Girolami2020) and are given in Supplementary Table 4.) Latin Hypercube Sampling (LHS) is used to determine the combinations of parameter inputs for the runs of the numerical model, and the model outputs are used to train the statistical emulator. The high-fidelity model is sampled 20 times and the low-fidelity model, with its lower computational cost, is sampled 50 times. It should be noted that, for the data-generating model, defined values are used for the three uncertain parameters. Namely a far-field ground temperature of $ {T}_{\mathrm{ground}}=13.3 $ °C, hydraulic conductivity of the aquifer of $ {k}_{h,\mathrm{GFSD}}=2\times {10}^{-3} $ m/s, and a temperature of the anthropogenic heat sources of $ {T}_{\mathrm{room}}=15 $ °C.
Mean temperature values are shown in Figure 5 for the synthetic data (black crosses) alongside the field data (black asterisks). Also shown is the output from the 20 high-fidelity (colored pluses) and 50 low-fidelity model runs (colored circles), based on LHS covering the expected parameter range. Both the synthetic and the field data fall well within the range of outputs from the high- and low-fidelity models, providing confidence in the chosen range of prior parameter estimates.
3. Calibration Results
The posterior parameter distributions inferred from calibration on both the synthetic and the field data are shown in Figure 6. The $ \hat{R} $ criterion (Gelman and Rubin, Reference Gelman and Rubin1992), which compares the inner and inter-chain variance of the posterior samples is applied for convergence diagnostics and approaches unity (i.e., $ <1.01 $ (Vehtari et al., Reference Vehtari, Gelman, Simpson, Carpenter and Bürkner2021)) within the 1000 samples (after burn-in) of the HMC algorithm for all parameters shown. The Effective Sample Size (ESS) is a further metric related to convergence and is computed to determine sample independence, included in Supplementary Appendix C. For the synthetic data (top row) values used as parameter input in the model, that is, the values that are to be inferred, are indicated by a dashed vertical line. The calibration performs very well for the far-field ground temperature (left-most panel), with both the mean and mode values agreeing with the input of T ground=13.3°C to within two decimal places, and with a very narrow distribution around this value reflecting the high degree of confidence, mirroring the observations from the sensitivity analysis of the impact of far-field ground temperature on local soil temperature. The hydraulic conductivity and basement temperature (central and right-most panels) are also inferred well, with the posteriors exhibiting narrower uncertainty compared to the prior distributions, albeit with less precision and accuracy than for the far-field ground temperature. The aquifer hydraulic conductivity posterior overestimates the input value by about $ 0.4\times {10}^{-3} $ m/s, while the posterior for the basement temperature by 0.6 °C. These results provide further confidence in the methodology as well as indicating the amount of information the 24 monitoring boreholes can provide for the three calibration parameters.
For calibration using field data, shown in the bottom row of Figure 6, the resulting mean and modal values for $ {T}_{\mathrm{ground}} $ agree well with deep borehole measurements taken in the region (12.9 °C for the far-field ground temperature (Farr et al., Reference Farr, Patton, Boon, James, Williams and Schofield2017)), providing further confidence in the results. The posterior for the basement temperature shows lower values than expected, indicating that basement structures in the city center of Cardiff are only about two degrees higher than the temperature of the ground. The hydraulic conductivity posteriors indicate a value of about $ 2.9\times {10}^{-3} $ m/s, which is higher than found in the literature or measured from previous work in the area, but still within the plausible range for the type of material in the area. The observed differences in uncertainty between the two rows of inferred parameter posteriors (i.e., standard deviation values) are to be expected as calibration on data from field observations is subject to noise, model error, and further complexities not present in the numerically-generated data calibration. The resulting precision hyper-parameter distributions for the field data calibration are included in Appendix C.
3.1 Impact of subset size
The influence of data set volume on parameter inference is examined by incrementally reducing the size of data subset from the 24 locations. In addition to the size of the data set used for calibration, the amount of information contained in the data from the points included in the data set determines the ability of the process to infer the parameter values. To determine which points in the data set provide the most information, calibration would need to be performed on all possible combinations of subsets of a given size. This is constrained, however, by the prohibitively high computational costs of the calibrations, particularly for large subset sizes as the matrix for inversion grows with the number of data points, and is outwith the scope of this work. Subsets are therefore selected systematically, whereby measurement points are selected so as to maximize the variety of points in the subset, in terms of the spatial coordinates $ \left(x,y,z\right) $ and to cover the largest amount of volume possible, thereby maximizing mutual information spatially. This is based on the assumption that points that are in close proximity to one another are more likely to experience similar conditions and not provide complementary information. For this, the measurement locations are clustered spatially to find points that are closest together, as illustrated in the dendrogram in Figure 7a. Pair-wise comparison is performed on points nearest to one another, retaining only the most dissimilar points in the subset, decrementing from the maximum of 24 points. This includes approximately maintaining the ratio between the number of data points within and outwith the aquifer, noted with blue and orange colored labels respectively, at each step of the reduction. For example, the subset containing 16 points is shown in Figure 7b, resulting from having removed eight points, eliminating points with indices $ \left(\mathrm{10,14,15,18,24,9,22,7}\right) $ , of which five are located in the aquifer, and three outside. It is worth noting that this approach of determining successive measurement locations is a naive approach to maximize mutual information. The purpose of this exercise is to demonstrate the impact of using different data locations on the inferred posterior distributions rather than optimizing the locations of measurements for this example of a (non-linear) Bayesian inverse problem for which work exists (e.g., Huan and Marzouk, Reference Huan and Marzouk2013; Alexanderian et al., Reference Alexanderian, Petra, Stadler and Ghattas2016; Alexanderian, Reference Alexanderian2021).
Results of calibrations performed on the selected subsets are shown in Figure 8. The plots show the change of the posterior distributions for each of the $ \theta $ parameters with a change in subset size. This illustration choice enables understanding at what size the data set becomes sufficient for the posterior distribution found from calibration on the subset to become equivalent to that of the calibration performed on the entire set. The top row (panels a, b, c) shows results from application to synthetic data, with the “true” value used as input in the data-generating model shown as black dashed lines. While the posterior distribution for $ {T}_{\mathrm{ground}} $ remains mostly unchanged, there exist notable changes in the other two parameter posteriors as the subset size increases. For example, at a size of 12 points the distributions for the parameters narrow, indicating a significant contribution of information form the 12th data point. This point, labeled “8” in Figure 2a, is both within the aquifer and close to basements, likely making it a good source of information for the calibration algorithm. For calibration on synthetic data, as few as 10 data points (shown in Figure 7d) can be used to predict the combined three parameters reasonably confidently. However, the uncertainty is reduced even further with more than 12 data points, especially for $ {T}_{\mathrm{room}} $ . On the other hand, it is also interesting to note that the value for $ {k}_h $ is better predicted with a subset size of 10 rather than any other size (despite the high number of outliers). This suggests that using more data points could, in some cases, reduce the predictive accuracy for certain parameters and thus data points should be carefully selected.
The bottom row of Figure 8 (panels d, e, f) shows results for calibration on field data. Compared with results from synthetically created scenarios, these results are less well-behaved, with more uncertainty persisting throughout the different set sizes. The inferred ground temperature $ {T}_{\mathrm{ground}} $ exhibits a greater variance in the posteriors and the mean value of individual calibrations varies more compared to the posterior calibrated on the synthetic data, reinforcing that real data stem from systems with greater complexity, heterogeneity, and/or may be affected by measurement noise and external factors and heat sources not accounted for within the numerical modeling. The posteriors for $ {k}_h $ and $ {T}_{\mathrm{room}} $ exhibit inconclusive outcomes for the smaller subsets, with unchanged distributions (same as the prior) or bi-modality where the algorithm identifies multiple possible solutions. At a data set size of 16 (points shown in Figure 7b) the posterior distributions appear to start converging onto a more defined value, narrowing and moving away from the extremal ends of the calibration range. Therefore, even though there is a reduction in uncertainty with increasing subset size, the 16 data points are likely sufficient to predict reasonable values for the parameters. The irregular pattern with which the posterior distributions change with increasing subset size also suggests that different measurement locations/data points contain different levels of information and therefore differently influence the parameter posterior distributions.
The difference in behavior of the posteriors with increasing subset size is a result of the level of uncertainty and variation within the data used for calibration. The synthetic data is generated using a model with a single input parameter and without measurement noise such that there is less uncertainty than for parameter values inferred from the field data. As a result, additional data points from the synthetic data predominantly act to narrow the posterior distribution to a single value. In contrast, further information from additional data points from the field data can in fact contradict information from other data points, causing a significant change in the posterior upon updating using the new evidence.
To illustrate this further a small-scale study is conducted, for which calibrations are performed on 20 unique data subsets, consisting of 12 measurement locations. The borehole indices of the points contained in each subset are given in Supplementary Table 5 (in Supplementary Appendix D), and results, in terms of the posteriors inferred from calibrating the model on the data from the given subset of the measured field data, are shown in Figure 9. The significant variation in the parameter value distributions inferred from the different data sets indicates that the measurement locations provide different levels of information or even different information altogether. This is illustrated by the fact that the variation in the inferred far-field ground temperature, Figure 9a, is significantly less than for the other two parameters which are likely to be subject to much more local variations. The influence of measurement locations and volume of data sets on parameter inference is explored more in the subsequent section.
In terms of obtaining data for a site investigation, the findings of this section showcase the importance of identifying the additional information contained in different data points, as this could significantly reduce costs for obtaining data, for example, due to drilling unnecessary boreholes. It should be noted that this exercise is performed after the measurements were taken. The value of this ex post exploration of measurement locations is to illustrate the impact that data set size and measurement point distribution can have on parameter inference and the importance of identifying appropriate measurement locations which are often subject to constraints, particularly in densely built-up areas where drilling locations are limited due to existing infrastructure and due to legal issues such as land-ownership, to minimize both cost as well as uncertainty.
3.2. Heterogeneity
The previous validation uses single values as parameter inputs in the data-generating model, assuming the homogeneous distribution of parameter values throughout the domain. However, that is rarely the case for many parameters in reality, where natural variability creates spatial heterogeneity in domain properties. In particular, hydro-geological properties of the subsurface, such as the hydraulic conductivity, are known to vary greatly (Rehfeldt et al., Reference Rehfeldt, Boggs and Gelhar1992). To explore the ability of the calibration methodology to infer model parameters in scenarios where heterogeneity is present, two test scenarios with spatially varying aquifer hydraulic conductivity values are considered, designated as “noisy” and “regional”. The data used in these calibrations is generated using the very high-fidelity data-generating model, with spatially varying hydraulic conductivity values.
In the “noisy” scenario, hydraulic conductivity values change randomly within the aquifer following a normal distribution with mean $ {\mu}_{k_h}=7\times {10}^{-3} $ m/s and standard deviation $ {\sigma}_{k_h}=5\times {10}^{-4} $ m/s, while the far-field and basement temperatures were kept the same as for the single value case considered above. This represents a scenario where the aquifer permeability is subject to random heterogeneity as a result of variation in the compaction of the gravel and sand deposits making up the aquifer. This implementation of the hydraulic conductivity neglects spatial correlation, as is often incorporated in hydro-geological applications by means of a random field formulation. However, the purpose of our approach here is not to infer the spatial distribution of posterior hydraulic conductivity values, in favor of a simultaneous calibration of all parameters of interest, hence spatial correlation is not considered. It would be expected that the noise in the hydraulic conductivity would make it more difficult for the calibration to infer “correct” parameter values, giving rise to broader distributions and greater uncertainty in values. Results from the calibration using output data from this scenario are shown in Figure 10, with the calibration algorithm having achieved convergence for all three parameters, as indicated by the $ \hat{R} $ values. As for the single-valued case considered above, the far-field ground temperature and basement temperature are well predicted, with accurate and precise posterior distributions and mean and modal values very close to the ones used as data-generating model input. The basement temperature posterior is significantly more accurate compared to that of the single value calibration, suggesting that higher values for the ground hydraulic conductivity may reduce the uncertainty in $ {T}_{\mathrm{room}} $ as this allows anthropogenically heated groundwater to propagate further, providing more information about heat sources to a greater number of measurement points. The hydraulic conductivity (central panel) is seen to be inferred reasonably well, despite the input values in the domain following a distribution. The posterior reduces in uncertainty compared to the prior, although not to a degree as to fully match the normal input distribution, underestimating the “true” mean value by about $ 0.8\times {10}^{-3} $ m/s. Overall “noisy” heterogeneity, where the value of the parameter varies based on a (sufficiently narrow) distribution, may be considered to be captured relatively well by the calibration methodology.
In the “regional” scenario, the model domain is divided into four regions (shown in Figure 2a), each with a different hydraulic conductivity assigned to it. These regions are defined using natural and anthropogenic features of the domain: the western residential side of the river, densely populated with basements (10 data points), the northern commercial part (4 points), the eastern side with lower basement density (4 points), and the southern part which is close to the river outflow (6 points). The parameter posteriors inferred from calibration applied to data from this scenario are shown in Figure 11, with the “true” parameter value shown as either a line (if the plotted parameter is applied homogeneously at the domain-level) or as black markers (if the parameter is applied regionally). Results are shown for calibration performed on the entire data set, labeled “Whole domain”, and for regional subsets, with data points within each of the four regions of different hydraulic conductivity labeled by the cardinal direction of the region. The calibration methodology continues to perform well for the far-field ground temperature (left-most panel) as this is a parameter with a common value across all locations and with a strong direct impact on the mean temperature. The basement temperature (right-most panel) is inferred well for the whole domain and the west and north regions, with data points close to basements, but less well for the regions with measurement locations further away from basements, illustrating the localized nature of the parameter. Moreover, the fact that the north and east regions contain as few as 4 data points could affect the information the calibration algorithm has to meaningfully infer parameter values. The hydraulic conductivity values are inferred well regionally, as may be seen in the central panel, where the calibrations performed using data points from the individual regions cluster the interquartile range around the input value that is to be inferred, with only the south region underestimating the true value. All of the calibrations converge, with $ \hat{R} $ values of 1.0, including the one for the whole domain (that is using the data points from all four regions within one calibration to obtain a single “representative” distribution for $ {k}_h $ ), suggesting that, at the domain level the, posterior distribution could be used as a representative, capturing regional heterogeneity. To test this, the mean parameter values inferred from calibration on data points from the whole domain are used as input to a simulation of the numerical model, and the mean temperature values found at the measurement locations are compared with the data from the regional data-generating model. Results from this exercise are shown in Figure 12, showing the good agreement between the synthetic data used for the calibration ( $ {T}_{\mathrm{mean},\hskip0.2em \mathrm{regional}} $ , $ x $ -axis) and the output from the model using the mean parameter values inferred from the calibration ( $ {T}_{\mathrm{mean},\hskip0.2em \mathrm{calibrated}} $ , $ y $ -axis), with changes in mean temperature of $ \mid \Delta T\mid =\mid {T}_{\mathrm{mean},\hskip0.2em \mathrm{calibrated}}-{T}_{\mathrm{mean},\hskip0.2em \mathrm{regional}}\mid \le 0.2{}^{\circ} $ C. This suggests that, in cases where parameters vary regionally within a domain, calibration with sufficient data points could be able to infer a representative domain value for those parameters, which could be used in urban planning and design.
4. Impact of Uncertainty on Geothermal Planning
The inherent variation within the subsurface and the difficulty and costs associated with obtaining high-resolution data typically result in uncertainty in its properties. This can impact numerous applications relating to the underground and can lead to over-designing as a measure of accounting for this uncertainty. One such application is the utilization of shallow geothermal energy, which uses ground-source heat pump systems (GSHPs) to efficiently provide renewable thermal energy for heating and cooling purposes by rejecting heat to or absorbing heat from the ground using ground heat exchangers (GHEs) (Mustafa Omer, Reference Mustafa Omer2008; Bayer et al., Reference Bayer, Attard, Blum and Menberg2019). To assess the potential impact of parameter uncertainty on GSHP design, an existing open-loop GSHP system within the studied area of the city center of Cardiff is examined as an example case. This system has been installed at the Grangetown Nursery School, in the southwestern region of the modeled domain, and is monitored by the BGS (Boon et al., Reference Boon, Farr, Abesser, Patton, James, Schofield and Tucker2019). It includes two ground-source heat pumps that can provide up to 11 kW each, noting that, at the time of designing, few data from the site and aquifer were available and thus assumptions were made in the design. The hydraulic gradient of the local area is estimated to be 0.002 and the two wells (one abstraction and one injection) have a diameter of 0.2 m, a screen length of about 9 m, screened across the full thickness of the gravel aquifer, and are constructed at a distance of 20 m apart. Boon et al. (Reference Boon, Farr, Abesser, Patton, James, Schofield and Tucker2019) note the close (non-ideal) well spacing resulted from a lack of space for drilling on site during the installation work, a situation not uncommon in highly built-up urban environments. Typical open-loop GHE design processes for feasibility studies are followed to determine the potential geothermal power that this system can provide for different values of the hydraulic conductivity of the aquifer, using the calibration results from Section 3. More details on the project specifications may be found in Boon et al. (Reference Boon, Farr, Abesser, Patton, James, Schofield and Tucker2019) and detailed information on GSHP systems and open-loop design processes followed within this work can be found in Supplementary Appendix E.
This analysis focuses on the uncertainty in the hydraulic conductivity value of the aquifer, and uses various values for this parameter as found in the literature (listed in Table 3), as well as values obtained from the calibration presented above to explore the impact of uncertainty on the design of a typical open-loop shallow geothermal system. Results of the analysis are presented in Figure 13, showing the significant impact that the value of aquifer hydraulic conductivity can have on system design and expected geothermal potential. The figure shows the peak load distribution, calculated using the calibrated hydraulic conductivity distribution from Section 3 and the process outlined in Supplementary Appendix E, as box plots for both the current geometry and the theoretical maximum load that can be provided (which does not constrain the spacing of the wells). The peak loads obtained using hydraulic conductivity values found in the literature are also shown. For the current system geometry and a well spacing of 20 m, the estimated achievable peak power spans a wide range, from 5 to 183 kW, with the box plot interquartile range (25th, and 75th percentiles) narrowing this to between 38 kW and 62 kW. Three of the four literature hydraulic conductivity values result in peak loads on the lower range of the distribution. The fourth value, labeled “Lit 2”, results in an extreme peak load outside the calibrated range. This value is likely less suitable for this scenario, given current knowledge of the domain, but is included for completeness. Results for the theoretical maximum peak load for the area, where well spacing constraints are ignored, exhibit a range of peak loads between 0.15 MW and 3.17 MW and box plot interquartile range between 1.05 MW and 1.70 MW, relatively wider than the results obtained from considering the current geometry. Similarly, theoretical peak load values determined using literature values for the hydraulic conductivity are on the lower end of the calibrated range, save the one labeled “Lit 2”. Given the heterogeneity in the soil structure that can exist between aquifers of similar geological designation (as well as within a single aquifer) and the resulting uncertainties in the hydraulic properties, the results suggest that lack of detailed information from the site can result in a range of different GSHP designs of varying suitability for a given scenario. This can lead to under-utilizing the available resources or, perhaps more concerningly, attempting to use more resources than available, thereby risking damage to the GSHP system or the natural environment. In this case, results suggest that the open-loop system installed in the city of Cardiff is under-utilized, since the peak load of 22 kW currently installed (from the heat pumps) is at the lower end of the calculated range and it is highly likely that a higher load could be provided. Given the uncertainty, a conservative approach could, for example, be to use the lower percentile calibration value of 38 kW as peak load, which is still close to double the current design peak load. It is worth noting that the current system was designed to fulfill the heating and cooling demands of the school, which are less than the ground capacity. However, a clearer understanding of the subsurface capacity could have potentially enabled a larger scale system, where the geothermal energy is shared with nearby buildings, assuming sufficient demand. Quantifying the uncertainty in important design parameters can, therefore, contribute towards better utilizing natural resources, even within conservative approaches, resulting in relatively lower risk designs with higher thermal yields.
The discrepancy in geothermal potential between the two sets of results presented in Figure 13 highlights one of the potential drawbacks of open-loop systems, namely the physical constraints associated with the distance between wells. To fully avoid thermal interference, this distance typically needs to be large, often unreasonably so for urban locations such as London (Banks, Reference Banks2009; Fry, Reference Fry2009; Abesser, Reference Abesser2010). Figure 14 shows the optimal well spacing calculated for a given value of ground hydraulic conductivity, as explained in Supplementary Appendix E, noting that the advised minimum spacing according to the open-loop code of practice for the UK by CIBSE is 100 m (while numerical modeling is recommended) (Wincott et al., Reference Wincott and Billings2019). The well spacing values obtained using calibrated parameters peak at around 250 m with 25th and 75th percentiles at around 230 and 290 m. Using the hydraulic conductivity values from literature results in optimal spacing values outside that range, a very low value of around 130 m for “Lit 2”, and values higher than 550 m for “Lit 1”, “Pump test”, and “Preliminary”. In the case of attempting to design a GSHP system without thermal interference, the first value would likely result in under-designing the system, leading to lower whole-system efficiency and, in the worst-case scenario, system failure, while the other three would likely result in very costly over-designed systems, or, more realistically, would not be constructed due to the estimated costs and/or low thermal potential. It is apparent that uncertainty in important design parameters, such as hydraulic conductivity, can result in significantly different GSHP system designs, and costs, and could heavily influence whether a design is deemed a financially feasible solution. Moreover, the magnitudes of the distances calculated suggest that it would be difficult to utilize the maximum open-loop geothermal potential in most urban situations for private land owners and indicate the need for planning and investigations to be undertaken at a larger (city-)scale, with the potential of sharing geothermal resources across multiple buildings and land owners.
5. Conclusion
In this work, uncertainty in underground thermal modeling parameters is investigated, identifying the impacts this uncertainty can have on our understanding of how anthropogenic heat fluxes affect the shallow subsurface and how to utilize the ground as a resource. To this end, the city center of Cardiff is modeled using a semi-3D finite element approach to solve the governing equations of heat transfer and fluid flow. A Bayesian statistical framework is adopted, combining field observations and data from simulations at two levels of fidelity, to calibrate the numerical model and infer values for the most influential parameters within the model. Namely, these are the far-field ground temperature, $ {T}_{\mathrm{ground}} $ , the hydraulic conductivity of the aquifer, $ {k}_h $ , and the temperature of the anthropogenic heat sources considered in the domain (i.e., heated basements), $ {T}_{\mathrm{room}} $ . The calibration methodology is applied to numerically generated (synthetic) and to field data at a set of measurement locations, both to further validate the methodology as well as to highlight the additional unknowns and challenges faced when working with real data. Results for both sets of calibrations show convergence, with the synthetic data case resulting in inferred posterior distributions with mean and modal values that match closely the model inputs, and the field data case producing distributions with reduced uncertainty compared to the prior used and reasonable values for the explored parameters. Specifically, the field data calibration posteriors show a far-field ground temperature $ {T}_{\mathrm{ground}} $ of 12.9 °C, agreeing with deep borehole measurements from the area, a $ {k}_h $ of about $ 3.1\times {10}^{-3} $ m/s, somewhat higher than previous investigations for the area and local soil type, and a $ {T}_{\mathrm{room}} $ of 14.7 °C, indicating basements in the area are not significantly heated.
A notable limitation of parameter inference methodologies, such as the one presented in this work, is the lack of data availability, as not many locations have the quality and quantity of data that is available for the study area chosen in this work. The size of data set required to yield satisfactory calibration results is investigated, for both synthetic and field data, by methodically decreasing the number of data points used for calibration. Results show that with synthetic data calibration can be done to a high degree of accuracy using as few as 10 data points, while with field data at least 16 points become necessary due to the complexity, unknowns, and noise associated with realistic conditions. The results highlight the importance of identifying information contained within data measured at different locations for the purposes of parameter estimation, which in the context of a site investigation could help mitigate costs from drilling boreholes in locations that provide little additional information.
The impact of heterogeneity is also investigated, as this can be an important aspect of the subsurface. This work focuses on heterogeneity in $ {k}_h $ , particularly (a) heterogeneity throughout the material, represented as noise in the value of $ {k}_h $ , and (b) regional heterogeneity, implemented by subdividing the area into four regions with different local $ {k}_h $ values. Noise in the parameter value is shown to not negatively impact the calibration process, with very reasonable posterior distributions found for the three investigated parameters (inferred mean within 0.1 °C of true values for $ {T}_{\mathrm{ground}} $ and $ {T}_{\mathrm{room}} $ and within 10% for $ {k}_h $ ). The introduction of regional heterogeneity also resulted in reasonably accurate posterior distributions, and importantly, a fully converged calibration using the entire data set (i.e., the combination of data from all four sub-regions) suggests that “equivalent” parameter values inferred using data from the entire domain can be used as a model value for the model of the whole domain, giving rise to a difference of less than 0.2 °C between mean temperatures determined from the calibrated model and the calibration data. Considering the heterogeneity of realistic parameters, this result provides confidence in using the calibration methodology to predict parameter values, including area-representative values that could be used in urban design and planning.
Finally, the impact of the identified uncertainty on engineering applications is assessed by considering the effect of different parameter values on an installed open-loop GSHP system present within the modeled area. The shallow geothermal potential of this system is computed using theoretical methods for a range of different values for $ {k}_h $ , found in the above investigations and in the literature. The results indicate that uncertainty in $ {k}_h $ can affect significantly the design and energy potential of such a system, with capacity (peak load) values in this case ranging greatly from 5 to 183 kW. The plausible (interquartile) range using the inferred distribution for $ {k}_h $ is between 38 kW and 62 kW, much higher than the system’s current design load, indicating that if this information had been available during the design stage and there was equivalent demand from nearby buildings, the system could have been utilized even further, providing more geothermal energy for heating and cooling. Investigating the maximum geothermal potential of this system without accounting for its specific configuration shows much higher load values, for which the representative well spacing would vary from 133 to 783 m depending on the value of $ {k}_h $ used, illustrating the importance of reducing uncertainty in sensitive parameters early on in the design phase of such projects and thus the importance of subsurface monitoring and characterization of aquifer properties at appropriate scales.
Acknowledgments
The authors are grateful to Ricky Terrington (BGS), Johanna Scheidegger (BGS), and Ashley Patton (BGS) for their valuable input to the data acquisition for this work and to Cardiff Harbour Authority for the provision of supporting data and access to boreholes. Gareth Farr and David Boon publish with the permission of the executive director, BGS, UKRI.
Data Availability Statement
The data that support the findings of this study are available from the British Geological Survey. Restrictions apply to the availability of these data, which were used under license for this study. Data are available from the British Geological Survey via https://www.bgs.ac.uk/about-bgs/contact-us/ with the permission of the British Geological Survey.
Author Contributions
Conceptualisation: M.J.K.; N.M.; R.C.; K.M.; A.B. Methodology: M.J.K; N.M.; K.M.; A.B.; R.C. Data curation: G.J.F.; D.B.P. Data visualisation: M.J.K.; N.M. Writing original draft: M.J.K.; N.M. All authors approved the final submitted draft.
Funding Statement
This work was supported by CMMI-EPSRC: Modeling and Monitoring of Urban Underground Climate Change (EP/T019425/1), by the Centre for Smart Infrastructure & Construction (EP/N021614/1) and the Centre for Digital Build Britain at the University of Cambridge, as well as AI for Science and Government (ASG), UKRI’s Strategic Priorities Fund awarded to the Alan Turing Institute, UK (EP/T001569/1). The financial support for Kathrin Menberg via the Margarete von Wrangell program of the Ministry of Science, Research and the Arts (MWK) of the State of Baden-Württemberg is gratefully acknowledged. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests
The authors declare no competing interests exist.
A. Gaussian process covariance formulation
Gaussian Process models are used for the approximation of the terms in Equation 1, namely $ \left({\eta}_l,\mu, \delta \right) $ , with covariance matrices adapted from the KOH approach accordingly, that is,
where $ p $ is the number of state variables and $ {q}_h $ and $ {q}_l $ are the number of parameters of the high- and the low-fidelity models, respectively. Equations 3-5 introduce unknown hyper-parameters, namely the precision hyper-parameters ( $ {\lambda}_{\eta_l} $ , $ {\lambda}_{\mu } $ , $ {\lambda}_{\delta } $ ) and the correlation hyper-parameters ( $ {\beta}_{\eta_l} $ , $ {\beta}_{\mu } $ , $ {\beta}_{\delta } $ ) which are sampled along with the model parameters $ \theta $ . The precision hyper-parameters are a measure for the magnitude of the covariance functions and thus to what extent the variation in output may be explained by the associated term. For example, $ {\lambda}_{\eta l} $ accounts for the covariance in the field and model outputs that is captured by the low-fidelity model emulator term in Equation (1). A large value for a given $ \lambda $ (and, correspondingly, a small value for $ {\lambda}^{-1} $ ) indicates that the related term absorbs only a small portion of the variance in the model output, i.e. explains a small amount of the variation in $ {y}_f $ (Choi et al., Reference Choi, Menberg, Kikumoto, Heo, Choudhary and Ooka2018). The correlation hyper-parameters are an indication for the smoothness of the covariance functions. The priors for the hyper-parameters are given in Table 4, is chosen in accordance with work done in previous studies (Menberg et al., Reference Menberg, Bidarmaghz, Gregory, Choudhary and Girolami2020; Chong and Menberg, Reference Chong and Menberg2018; Guillas et al., Reference Guillas, Rougier, Maute, Richmond and Linkletter2009).
The posterior distribution depends on the unknown calibration parameters, $ \theta $ , the GP hyper-parameters, $ \beta $ , and the GP precision hyper-parameters, $ \lambda $ , and is estimated using HMC, which is suitable for high-dimensional distributions. The likelihood function is given by
where $ z $ is a single vector containing the combined observations and outputs of low- and high-fidelity simulations and
B. Numerical modelling details
This work utilises a validated semi-3D numerical modelling approach (Bidarmaghz et al., Reference Bidarmaghz, Choudhary, Soga, Terrington, Kessler and Thorpe2020; Makasis et al., Reference Makasis, Kreitmair, Bidarmaghz, Farr, Scheidegger and Choudhary2021), to model the subsurface response to thermal and hydraulic phenomena. A collection of $ x $ - $ y $ planes is modelled using the governing equations for conductive and convective heat transfer, and fluid flow through porous media, i.e.,
where $ \mathrm{d}z $ is the relative width of the plane (or the distance between planes) (m), $ {\rho}_{\mathrm{eff}} $ is the effective density (kg/m3), $ {C}_{p,\mathrm{eff}} $ the effective specific heat capacity (J/(kg K)), $ t $ is time (s), $ {\rho}_f $ is the fluid (groundwater) density (kg/m3), $ {C}_{p,f} $ is the specific heat capacity of the fluid (J/(kg K)), $ {\mathbf{v}}_{\mathbf{f}} $ is the Darcy velocity of the fluid (m/s), $ \mathbf{q} $ is the heat flux (W/m2), the permeability $ K $ (m $ {}^2 $ ) of the material is related to the hydraulic conductivity $ {k}_h $ (m/s) by $ K{\mu}_f={k}_h\left({\rho}_f\mathbf{g}\right) $ , $ \mu $ is the dynamic viscosity of water (Pa $ \cdot $ s), $ {p}_f $ is the pressure of water (Pa), $ \nabla Z $ is the total head gradient (m) and $ {Q}_{\mathrm{applied}} $ represents an external heat source (W), such as a ground heat exchangers. Each plane is thermally coupled to its nearest vertical neighbours via conductive heat transfer, which is the dominant mode of heat transfer in the $ z $ -direction. The upwards and downwards out-of-plane conductive heat fluxes ( $ {q}_{0,\mathrm{up}} $ and $ {q}_{0,\mathrm{down}} $ , respectively) from neighbouring planes are given by
where $ {\lambda}_{\mathrm{eff}} $ is the effective thermal conductivity (W/(m K)), and $ {T}_n $ represents the temperature at the $ {n}^{\mathrm{th}} $ plane. Further details on the modelling can be found in the cited literature.
The dynamics of the River Taff flowing through the domain are modelled using incompressible turbulent single-phase flow physics and coupled to the heat transfer physics in terms of temperature, pressure and velocity, as explained in detail in (Makasis et al., Reference Makasis, Kreitmair, Bidarmaghz, Farr, Scheidegger and Choudhary2021). The Reynolds-averaged Navier-Stokes (RANS) equations are implemented for conservation of momentum, and the continuity equation for conservation of mass. The governing equations for turbulent flow are solved for the pressure and velocity vectors of the fluid flow within the river which are then coupled to the heat transfer equations to calculate the transfer of heat and thus distribution of temperature within the domain.
A series of boundary and initial conditions applied to the model are shown in Figure 15. At the uppermost plane, a seasonally varying temperature is applied, utilising as a semi-empirical function in time and depth (Beardsmore and Cull, Reference Beardsmore and Cull2001; Baggs, Reference Baggs1983) fitted to local conditions, i.e.
where $ {T}_{0,g} $ is the mean annual ground temperature (°C), assumed as 12.9°C (Farr et al., Reference Farr, Patton, Boon, James, Williams and Schofield2017), $ {T}_{amp} $ the seasonal heating cycle amplitude (°C), assumed as 6.5°C (NCAS British Atmospheric Data Centre, 2020), is the angular frequency of the heating cycle ( $ \mathrm{rad} $ ) with period $ P=365 $ days, $ \varepsilon =\sqrt{\unicode{x03C0} /\left( P\alpha \right)} $ , $ \alpha $ is the thermal diffusivity of the ground (m2/s), $ {k}_v $ is the vegetation coefficient (defined spatially based on the surface cover conditions, adopting a value of 0.9 for suburban and 1.0 for urban terrain (Popiel and Wojtkowiak, Reference Popiel and Wojtkowiak2013)), and $ {t}_0 $ is the day of coldest temperature after January 1st, found to be 26 days (NCAS British Atmospheric Data Centre, 2020). At the bottom-most plane it is assumed that any influence from the surface will have dissipated and a constant temperature of $ T={T}_{0,g} $ is applied. For each plane, thermal symmetry is applied at all domain boundaries, and the time independent hydraulic head distribution in Figure 1 is used to define the groundwater flow conditions. In the shallow layers, where appropriate, the temperature and velocity of the river flowing into the domain are also defined.
C. Further calibration results
The Effective Sample Size (ESS) is an important metric for MCMC approaches, showing the correlation within parameter samples and identifying the equivalent number of effectively independent samples drawn from the Markov Chains (Fabreti and Höhna, Reference Fabreti and Höhna2022). For the calibrations on the complete dataset of 24 points, using 1000 samples for each of the four chains (after burn-in), the ESS for the three calibration parameters were computed to be 3791, 2198, and 2140, respectively, when using synthetic data, and 4102, 1866, and 2024 when using real data. The values computed, which are relatively high, support the calibration configuration and provide confidence in its findings.
Convergence is further indicated by plotting of the four chains of the HMC sampling algorithm. These are shown in the top row of Figure 16 for the calibration on synthetically generated data, and in the bottom row for calibration on the field data. A visual inspection of these suggests adequate convergence of the calibrations, as the trace plots resemble a “fat, hairy caterpillar” that does not bend (Sorensen and Vasishth, Reference Sorensen and Vasishth2015).
D. Data subset study
The indices contained in the data subsets used to infer the posteriors shown in Figure 9 are given in Table 5. Subsets are unique and contain 12 distinct measurement locations.
E. Open-loop shallow geothermal energy systems
GSHP systems can provide renewable geothermal energy for heating and cooling purposes. These systems can be installed in various ways, most commonly in close-loop configurations which circulate water in a closed pipe loop, implemented in GHEs such as purpose-built boreholes, trenches or energy geo-structures (Makasis, Reference Makasis2019; Makasis et al., Reference Makasis, Narsilio, Bidarmaghz, Johnston and Zhong2020; Loveridge et al., Reference Loveridge, McCartney, Narsilio and Sanchez2020). Another economically viable approach, used in this work to demonstrate the impact of uncertainty on the use of the subsurface as a resource, is using open-loop GSHP systems (Banks, Reference Banks2009; Milnes and Perrochet, Reference Milnes and Perrochet2013). These systems work by abstracting groundwater from a subsurface aquifer, passing it through a water-to-water GSHP to either extract or reject heat from/to it, and then re-inject the resulting cooled or heated water back to the aquifer. The abstraction and injection processes are commonly implemented using two wells (Banks, Reference Banks2009), one for each process, however a single well combining both can also be used (Rode et al., Reference Rode, Liesch and Goldscheider2015) and in rare cases only an extraction well may be required (Ampofo et al., Reference Ampofo, Maidment and Missenden2006). In addition to posterior distributions for the calibration parameters, the calibration yields a posteriors for the hyper-parameters. The precision hyper-parameter prior and posterior distributions are shown in Figure 17.
In designing an open-loop shallow geothermal system, it is important to quantify the thermal capacity, defined as (Ungemach, Reference Ungemach2003; Pujol et al., Reference Pujol, Ricard and Bolton2015)
where $ P $ is the thermal load/capacity of the well (W), $ \eta $ the efficiency of the heat pump, assumed as 0.95 (Pujol et al., Reference Pujol, Ricard and Bolton2015), $ {c}_w $ the specific heat capacity of groundwater, assumed as 4180 J/(kg K), $ {\rho}_w $ the density of groundwater, assumed as 998 kg/m3, $ Q $ the pumping flow rate (m3/s), and $ \Delta T $ the difference in the water temperature between abstraction and injection (K). The thermal capacity is a product of the hydraulic productivity (the pumping flow rate) and thermal productivity (given by the water temperature difference between abstraction and injection points) (Goetzl and Steiner, Reference Goetzl, Steiner, Hofmann, Riedel and Görz2017). Due to the re-injection of water to the underground and its interaction with groundwater resources, these values have strong dependencies on local restrictions and legislation and different guidelines can be found in different countries (Goetzl and Steiner, Reference Goetzl, Steiner, Hofmann, Riedel and Görz2017; UK Environment Agency, 2014). In addition, the design parameters depend on properties of the local subsurface and project specifications, such as thermal demand and associated costs.
The thermal productivity aspect of the design is typically restricted, such that groundwater resources are not significantly affected. In the UK, it is required for $ \Delta T $ to be less than 10 °C and the maximum temperature of the water injected into the ground to not exceed 25 °C (UK Environment Agency, 2014), while in Austria more conservative values of a maximum $ \Delta T $ of 6 °C, a minimum water temperature of 5 °C and a maximum of 20 °C are adopted (Rupprecht et al., Reference Rupprecht, Steiner, Heiermann and Riedel2017). In this scenario, a $ \Delta T $ of 8 °C is adopted, taking the temperature of the water from around 13 °C to about 5 °C, close to the limit of UK-based guidelines. The hydraulic productivity, or suitable pumping rate, depends on the hydraulic properties of the aquifer, such as aquifer thickness and hydraulic conductivity. Different equations exist in the literature to calculate this value, many consisting of adaptations of Thiem’s theorem (for example (García-Gil et al., Reference García-Gil, Vázquez-Suñe, Alcaraz, Juan, Sánchez-Navarro, Montlleó, Rodríguez and Lao2015)). Bottcher et al. (Reference Böttcher, Casasso, Götzl and Zosseder2019) summarise these and propose an approach that determines the flow rate to so as to avoid (i) depletion of the aquifer, (ii) thermal interference between injection and abstraction wells, and (iii) groundwater flooding at the injection well. This methodology is subsequently adopted herein to calculate the hydraulic productivity and thus the peak thermal load the geothermal system can provide, additionally implementing an upper limit of 100 L/s due to technical limitations as suggested by (García-Gil et al., Reference García-Gil, Vázquez-Suñe, Alcaraz, Juan, Sánchez-Navarro, Montlleó, Rodríguez and Lao2015). The formulation of the hydraulic productivity is summarised below, while for a detailed explanation readers can refer to the original paper (Böttcher et al., Reference Böttcher, Casasso, Götzl and Zosseder2019):
where $ K $ is the hydraulic conductivity of the aquifer (m/s), $ b $ the saturated aquifer thickness (m), in this case 9 m, $ {v}_D $ the Darcy velocity (m/s), $ {x}_w $ the distance between abstraction and injection wells (m), $ {z}_{\mathrm{max}} $ the maximum allowed groundwater level (m), $ z $ the natural groundwater level (m), and $ i $ the hydraulic gradient (-), in this case 0,002.
In an open-loop system, feedback from the injection well might affect the operation of the abstraction well and create thermal interference. An ideal design places the two wells such that no thermal interference occurs, even though, this can be arguably difficult. Banks presents an equation calculating the minimum distance for the two wells and argues that, over time, thermal interference may be inevitable (Banks, Reference Banks2009):
where $ L $ is the well spacing (m), $ Q $ the abstraction flow rate ( $ {m}^3 $ /s), $ T $ the aquifer transmissivity ( $ {m}^2 $ /s) and $ i $ the hydraulic gradient (-).
Comments
No Comments have been published for this article.