Introduction
Clostridioides difficile is the most common infectious cause of healthcare-associated diarrhea. Reference Katz, Golding and Choi1 It can impose a significant burden on the healthcare system and can cause life-threatening complications including pseudomembranous colitis, toxic megacolon, colonic perforation, sepsis, and death. Reference Karas, Enoch and Aliyu2
Diarrhea is very common in hospitalized patients, occurring in 12%–32% of hospitalized patients with less than one-fifth caused by toxigenic C. difficile. Reference Polage, Solnick and Cohen3 It would be helpful to be able to predict the probability that a patient in the emergency room or hospital with diarrhea has toxigenic C. difficile diarrhea (TCdD). Patients with a high risk of TCdD might be immediately started on empiric therapy, potentially reducing their risk of developing life-threatening complications. Patients with a high risk of TCdD who test negative might have their stool analysis repeated or proceed to colonoscopy for macroscopic evidence of disease. Finally, stool analysis might be avoided for patients with a low risk of TCdD, thereby decreasing test utilization.
Published systematic reviews and meta-analyses have associated C. difficile infection with several demographic factors, historical data, medication exposures, physical examination findings, and laboratory results (Table 1). C. difficile infection risk and severity increases with increasing age. Reference Abou Chakra, Pepin, Sirard and Valiquette4,Reference Deshpande, Pasupuleti and Thota5 Patients in hospital for prolonged stays have a greater risk of hospital-acquired C. difficile infection. Reference Honda, Kato and Olsen16 Previous TCdD was the historical factor having the strongest association with current TCdD risk. Reference Abou Chakra, Pepin, Sirard and Valiquette4 Antibiotic Reference Abou Chakra, Pepin, Sirard and Valiquette4–Reference Slimings and Riley7 and proton pump inhibitor Reference Abou Chakra, Pepin, Sirard and Valiquette4,Reference Deshpande, Pasupuleti and Thota5,Reference Trifan, Stanciu and Girleanu10 exposure consistently increased TCdD risk, while exposure to histamine-2 receptor antagonists Reference Tleyjeh, Abdulhak and Riaz11 and steroids Reference Furuya-Kanamori, Stone and Clark12 has also been associated with increased risk. The only physical finding consistently associated with TCdD risk was hypotension. Reference Kulaylat, Buonomo and Scully15 Finally, patients with TCdD are more likely to have leukocytosis Reference Abou Chakra, Pepin, Sirard and Valiquette4 , elevated creatinine, Reference Abou Chakra, Pepin, Sirard and Valiquette4,Reference Deshpande, Pasupuleti and Thota5,Reference Phatharacharukul, Thongprayoon, Cheungpasitporn, Edmonds, Mahaparn and Bruminhent13 and hypoalbuminemia. Reference Phatharacharukul, Thongprayoon, Cheungpasitporn, Edmonds, Mahaparn and Bruminhent13 However, the independent influence that these factors have on TCdD risk is currently unknown.
Note. Up arrow (↑) indicates that variable was associated with a significantly increased risk of outcome. Down arrow (↓) indicates that variable was associated with a significantly decreased risk of outcome. Black arrow indicates that a significant association was adjusted for other factors. Gray arrows indicate that significant association was unadjusted. Dash (-) indicates no significant association between outcome and variable.
(CKD, chronic kidney disease; DEMO, demographics; PE, physical examination; MA, meta-analysis; SR, systematic review; C, cohort; IBD, inflammatory bowel disease; PPI, proton pump inhibitors; H2RA, histamine-2 receptor antagonist; WBC, white blood cell)
Models exist the predict the likelihood of asymptomatic toxigenic C. difficile carriage in the elderly, Reference Lee, Lee and Suh17 the risk that currently asymptomatic people subsequently develop TCdD, Reference Aukes, Fireman and Lewis18–Reference Kuntz, Smith and Petrik20 and the probability of ominous outcomes in TCdD patients. Reference Li, Oh, Young, Rao and Wiens21,Reference Madden, Petri, Costa, Warren, Ma and Sifri22 A large number of factors have been associated with C. difficile infection (Table 1), but no model exists that returns the probability that a patient with diarrhea has TCdD. In this study, we took a random sample of patients at our hospital from 2014 to 2018 who underwent toxigenic C. difficile testing to derive and internally validate a model that estimates the probability of TCdD.
Methods
Study setting
This study took place at the Ottawa Hospital, a 1,000-bed teaching hospital with 2 campuses that is the tertiary referral institution and trauma center for a region of approximately 1.3 million people. Annually, the Ottawa Hospital has more than 175,000 emergency department visits, 40,000 non-psychiatric admissions, and performs more than 50,000 surgical procedures. The study protocol was approved by our research ethics board and adhered to throughout (File 20210372-01H).
Derivation of the sampling frame and sample size calculation
We created the study’s sampling frame using our hospital’s data warehouse to identify all stool analyses for toxigenic C. difficile conducted in our emergency department or hospital between March 15, 2014 (before which results were unavailable in our data warehouse) and December 31, 2018 (final date for which data was available when the study was initiated). Of these 29,311 specimens, we excluded those tests that were not the first sample for each patient (n = 11,608). We included only the first sample taken to be a proxy for whether patients presented with diarrhea or developed it during their admission.
From the remaining 17,703 specimens, we randomly selected 3,100 stool samples (17.5% of the final sampling frame) using the RANUNI random number function in SAS. This sample size was determined using methods presented by Riley et al. Reference Riley, Ensor and Snell23 We knew that 8.7% of stool samples tested positive for toxigenic C. difficile, and we conservatively projected a need for 44 degrees of freedom in our model (Appendix A). We then calculated 4 required sample sizes based upon (1) a 1% error in the expected probability from the null model’s intercept (sample size = 3,060), (2) a 5% mean error around individual predictions (sample size = 1,010), (3) target shrinkage of 80% (sample size = 2,332), and (4) an optimism of 0.05 (sample size = 1,743). We took the maximum of these estimates (n = 3,060) and, anticipating an exclusion rate (due to poor documentation or unavailable records) of between 1% and 1.5%, settled on the final sample size of 3,100 patients.
Outcome
Unformed stool samples from patients with suspected C. difficile infection were tested using a 2-step algorithm. Formed stool samples were rejected by the laboratory and not analyzed. Samples were initially screened for the C. difficile-specific glutamate dehydrogenase common antigen (GDH) by enzyme-linked immunosorbent assay (CDiffCHEK-60, TECHLAB, USA). GDH-positive samples were further tested for the presence of the tcdB gene using the Cdiff Simplexa® Universal PCR (Diasorin Molecular, USA). Patients with GDH-positive, polymerase chain reaction (PCR)-positive samples were classified with TCdD; all other patients were classified as TCdD negative.
Data collection
To identify potential factors that we should offer to our prediction model, we first identified all covariates found to be significantly associated with any type of TCdD in published systematic reviews or meta-analyses regarding C. difficile diarrhea (Table 1). In addition to these 15 covariates, we identified 2 other covariates that we thought might be associated with TCdD (eosinophil count and hospitalization day of sampling) based on personal experience and cohort studies. Reference Kulaylat, Buonomo and Scully15,Reference Honda, Kato and Olsen16
We then captured values for each of these factors for all patients in our sample. We retrieved values for ten covariates (age, testing location, hospitalization antibiotic exposure, gastric acid suppression therapy, steroid exposure, and laboratory values) from our hospital data warehouse and the remaining covariates (comorbidity status, previous C. difficile infection, abdominal pain, systolic blood pressure) from medical record review by trained abstractors (SD, JZ, YY, EB) who were licensed and practicing physicians between June and December 2021 (Appendix A). The abstraction database interface blinded abstractors to TCdD status. Patients were classified as exposed to antibiotics during their encounter if they had been given any member of the beta-lactam, carbapenems, sulpha, aminoglycoside, macrolides, fluoroquinolone, tetracycline, lincosamide classes, or metronidazole prior to toxigenic C. difficile testing. Patients were classified as exposed to antibiotics prior to their encounter if they were prescribed oral or intravenous antibiotics (other than metronidazole or vancomycin) within the previous 3 months according to treating physician notes. Steroid exposure included prednisone, prednisolone, dexamethasone, or methylprednisolone prior to or during their encounter. We abstracted vital signs closest to and prior to toxigenic C. difficile testing.
Analysis
We used logistic regression (SAS 9.4.2, Cary, NC) to measure the adjusted association of the selected variables with TCdD status. To control for the excessive influence of outliers, all continuous variables were Winsorized (ie, values below and above the 1st and 99th percentile were assigned those values, respectively).
Missing data was an issue for the laboratory tests, with the prevalence of missing values ranging from 3.5% for white blood cell count to 44.6% for serum lactate. Because these variables had skewed distributions, they were first log transformed to improve their imputation. We then used PROC MI to apply the expectation-maximization (EM) algorithm to compute the maximum likelihood estimates for imputation. All variables in the analytical data set, including outcomes, were used in the imputation model with 20 imputations.
To create the TCdD Model, we used a continuous variable transformation identification algorithm from Sauerbrei et al Reference Sauerbrei, Meier-Hirmer, Benner and Royston24 which combined backward elimination with an adaptive transformation identification algorithm to select the best fractional polynomial transformation for each continuous variable’s adjusted association with TCdD status using an alpha-error inclusion criterion of 0.1. Each polynomial included 2 terms and consumed 4 degrees of freedom (2 for the algorithm and 2 for the variable). Reference Steyerberg25 Our original model-building strategy planned to offer all candidate variables (Appendix A) to the algorithm. We did not offer lactate to the model because it was missing in 44.6% of patients. In addition, we offered 2 interactions to the model: (1) an interaction between age and white blood cell count because the latter increases less in older patients Reference Compte, Dumont and Bron26,Reference Vargas, Apewokin and Madan27 and (2) an interaction between sampling location and previous antibiotic status because a classification tree (for an analysis not reported here) showed a strong interaction between these 2 variables. This observation might reflect that pre-admission antibiotic exposure would be more completely documented in the consultation note for patients presenting with diarrhea (and, therefore, sampled in the emergency department).
This model-building strategy was applied to each of the 20 imputation samples. The variables and transformations most commonly selected in the samples were used for the final variable selection. This model was fit in each imputation sample and final pooled parameter estimates were determined using PROC MIANALYZE.
Model performance was assessed using internal validation within 1,000 bootstrap samples. All reported performance statistics accounted for optimism using methods described by Steyerberg. Reference Steyerberg28 We calculated the C-statistic (for discrimination) and the integrated calibration index (ICI for calibration) Reference Austin and Steyerberg29 and used the bootstrap percentile method to calculate 95% confidence limits. Reference Efron and Tibshirani30 C-statistics range from 0.5 to 1 with higher values indicating better discrimination. The ICI calculates the absolute difference between true and expected probability using a LOcal regESSion (LOESS) regression model of true values regressed on expected probabilities. ICI values range from 0 to 1 with lower values indicating better calibration. Finally, we used the methods described by Sullivan et al Reference Sullivan, Massaro and D’Agostino31 to reduce the final model to a points-based risk score, which we refer to as the TCdD Risk Score. The performance of this score was determined with the optimism-corrected C-statistic and ICI in 1,000 bootstrap samples.
Results
Of the 3,100 randomly selected patients, 50 (1.6%) were excluded because no vital signs were recorded before the sample was taken (n = 36), no complete admission or consult note was present (n = 10), or medical record access was restricted by the patient (n = 4). This left 3,050 patients in the study cohort (Table 2). Patients had a mean age of 63.1 years (SD, 18.0), and most were admitted from the emergency room. Abdominal pain was present in almost a third of patients. Previous toxigenic C. difficile diarrhea (TCdD) was uncommon (1.2%). Antibiotic use prior to admission was recorded in a third of patients with antibiotic administration during the hospitalization (prior to sampling) recorded in more than half. Albumin and lactate were not measured in 22.3% and 44.6% of patients, respectively.
Note. SD, standard deviation; Dept., department; CDAD, C. difficile-associated diarrhea; WBC, white blood cell.
TCdD was present in 246 patients (8.1%) and its status varied by several factors (Table 2). Notably, TCdD patients were notably more likely to have been exposed to antibiotics prior to admission (49.2% vs 31.6%), be sampled in the emergency department (23.2% vs 14.3%), have abdominal pain (36.2% vs 28.3%), and have previous TCdD (2.8% vs 1.1%). Mean white blood cell count and patient age were both slightly greater in those who had TCdD (11.1 × 109/L vs 10.3 × 109/L and 64.8 vs 63.7 years, respectively). Patients with TCdD had slightly lower mean eosinophil counts (0.12 × 109/L vs 0.17 × 109/L).
The TCdD Model included 4 binary variables and 4 continuous variables along with 2 interaction terms (Appendix B). The presence of abdominal pain (adjusted odds ratio [adjOR] 1.3 [95% CI, 1.0–1.8]) and a history of previous TCdD (adjOR 2.5 [1.1–6.1]) were both independently associated with an increased risk of TCdD (Table 3). Sampling location interacted significantly with previous antibiotic exposure: increased TCdD risk associated with previous antibiotic exposure was significantly greater when patients had their stool sample taken in the emergency room (adjOR 4.2 [95% CI, 2.5, 7.0]) than when patients had their stool sample taken in hospital (adjOR 1.7 [1.3, 2.3]). In contrast, sampling location had no influence on TCdD risk in patients without previous antibiotic exposure. TCdD risk decreased with time from presentation to testing (Figure 1A) and eosinophil count (Figure 1B). Patient age and white blood cell (WBC) count interacted significantly (Appendix B). TCdD risk increased with rising WBC count predominantly when patients were older than 60 years (Figure 2). Similarly, TCdD risk increased with rising age primarily when WBC count exceeded 10 × 109/L.
Note. The influence of all binary variables in the TCdD Risk Model (Appendix B) is presented.
(OR, odds ratio; CI, confidence interval; CDAD, C. difficile-associated diarrhea; Dept., Department)
In a 1000-bootstrap internal validation, the model had an optimism-corrected C-statistic of 0.651 (95% CI, 0.621–0.676) and ICI of 0.0167 (95% CI, 0.001–0.022). Model-generated TCdD probabilities (expressed as percentages) ranged from 0.04% to 55.6% with a median value (25th–75th percentile) of 6.7% (5.0%–9.2%). Although calibration was good in patients with an expected TCdD risk of less than 20%, the model tended to overestimate risk when the expected risk was higher (Figure 3).
The TCdD Risk Score is presented in Table 4. A 1-point increase in the score represents the increase in TCdD risk associated with being sampled in the emergency department without previous antibiotic exposure. In our sample, the TCdD Risk Score ranged from –13 to +16, with scores that were associated with TCdD probabilities of 0.3%–52.2%, respectively (Table 4). The median risk score (25th–75th percentile) was 2 (0–4). In a 1000-bootstrap internal validation, the optimism-corrected C-statistic was 0.627 (95% CI, 0.620–0.629), and the optimism-corrected integrated calibration index was 0.038 (95% CI, 0.005–0.038) (Figure 3).
Note. The TCdD Risk Score is calculated by adding up the points assigned to a patient’s status for abdominal pain, previous TCdD, antibiotic exposure prior to hospitalization, location of sampling, patient age, and white blood cell count. TCdD probability by risk score is provided in the bottom table. In a 1,000-bootstrap internal validation, the optimism-corrected C-statistic was 0.627 (95% CI, 0.620–0.629), and the optimism-corrected integrated calibration index was 0.0376 (95% CI, 0.005–0.038).
Discussion
In a randomly sampled and large cohort of symptomatic patients at a single hospital, this study measured the independent association of factors previously identified to be associated with toxigenic C. difficile diarrhea (TCdD). We identified several factors that were independently associated to create the TCdD Model, which had moderate discrimination and good calibration. Predictive performance was slightly reduced in the TCdD Risk Score. We conclude that TCdD risk can be predicted in symptomatic patients using readily available clinical factors with modest accuracy.
The TCdD Model performed with modest accuracy despite using optimal model construction methods. First, we reviewed the literature to identify all factors significantly associated with TCdD risk in systematic reviews or meta-analyses. We then determined the status of these variables in an appropriately sized and randomly selected cohort. Therefore, we believe it is unlikely that our model’s middling predictive performance stems from missing a known important predictor or an inadequate sample size. Second, model construction used all of the techniques recommended in the Prediction model Risk Of Bias ASsessment Tool (PROBAST) criteria Reference Wolff, Moons and Riley32 : we used appropriate data sources and inclusion criteria; all predictors were precisely defined, were assessed without knowing outcome status, and are available to clinicians; the outcome was determined in a universal, standard fashion independent of predictor status; and we used a large patient cohort, analyzed continuous variables appropriately, appropriately accounted for missing data, included almost all randomly selected patients in our analysis, avoided predictor selection based on univariate analysis, and accounted for optimism when measuring model performance. Therefore, we don’t believe that the TCdD Model’s modest performance was due to substandard methodologies.
Several issues should be kept in mind when interpreting our results. First, although our study included a cohort that was large and randomly sampled from a valid and inclusive sampling frame, it was limited to a single institution. The validity of our model should be determined in an external population, as opposed to the interval validation with bootstrap sampling used in this study. Second, data collection was not prospective. This trait is most relevant to factors reliant upon physician documentation, such as abdominal pain and previous antibiotic use. Our analysis found that these factors were both independently associated with TCdD status (Table 3); misclassification of these important variables due to inaccurate physician documentation could mute their true association with TCdD risk. Therefore, TCdD prediction accuracy might improve with prospective data collection. Third, laboratory staff in our hospital exclude formed stools sent for C. difficile toxin analysis. Therefore, our study included only patients with diarrhea. However, there are 2 situations in which study patients with C. difficile toxin-positive stool could be misclassified with TCdD: patients colonized with C. difficile with diarrhea of another cause and patients with formed stool whose sample was incorrectly analyzed. We believe it to be unlikely that the prevalence of these situations was high enough to importantly alter results. Fourth, although our medical record abstractors were not told each patient’s TCdD status, they might have determined this diagnosis during abstraction which could have influenced values of abstracted covariates. Because abstractors focused on consult notes and vital signs (as opposed to microbiology results), we believe that such misclassification is unlikely to be meaningful. Fifth, “missingness” was especially common for laboratory data, especially albumin and lactate. Although we accounted for this using appropriate multiple imputation methods, model performance might improve if all patients had these tests measured, assuming that these factors are importantly predictive of TCdD status. Finally, we do not believe that our model’s performance was strong enough to be immediately impactful to clinicians—such as excluding patients for C. difficile toxin testing for patients with a predicted low risk of the disease. However, the predicted TCdD risk could be used by clinicians for risk stratifying patients; for example, patients with an elevated predicted TCdD risk whose initial testing is negative might undergo repeat testing or other related studies for C. difficile such as colonoscopy. If new TCdD predictive factors are identified in the future, our model could serve as a substrate for enhancing TCdD predictive models.
In conclusion, these data show that TCdD risk can be predicted using readily available clinical factors with modest accuracy. Further study is required to determine whether other factors improve TCdD prediction capability.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/ash.2024.58.
Acknowlegments
This study was not funded. None of the authors have a conflict of interest with regard to this study. None of the authors have any financial conflicts with regard to this study.
Competing interests
The authors have no conflicts of interest and no funding was received for the study.