Introduction
Delineation of target volumes and organs at risk (OARs) is a key component of radiotherapy planning, but inter/intra-observer variation in contouring is well recognised and is a significant source of error within treatment workflows. Reference Jameson, Holloway, Vial, Vinod and Metcalfe1,Reference Vinod, Min, Jameson and Holloway2 Potential reasons for this variation may include the influence of disease site experience/expertise and skills in cross-sectional image interpretation. Reference Vinod, Min, Jameson and Holloway2–Reference Njeh4 The consequences of contouring variation may be profound; incorrect delineation is associated with inferior survival outcomes in clinical trials. Reference Peters, O’Sullivan and Giralt5,Reference Weber, Tomsej, Melidis and Hurkmans6
Various methods exist to minimise contouring variation, including delineation protocols, atlases, auto-contours, peer review and teaching. Reference Vinod, Min, Jameson and Holloway2,Reference Gwynne, Gilson, Dickson, McAleer and Radhakrishna3,7,Reference Weiss and Hess8 Radiotherapy is a craft specialty, necessitating the acquisition and refinement of contouring skills during clinical practice. Reference Walls, Hanna and McAleer9 To mitigate the potential impact of reduced junior doctor working hours on training, smarter and more efficient methods of delivering training are required. Reference Datta and Davies10 Dedicated contouring workshops may be a valuable source of experiential learning, especially for new radiotherapy techniques. Reference De Bari, Dahele, Palmu, Kaylor, Schiappacasse and Guckenberger11–Reference Khoo, Schick and Plank13
Following changes to the commissioning of stereotactic ablative radiotherapy (SABR) in the UK, the Royal College of Radiologists (RCR) and UK SABR Consortium organised a workshop that focused on SABR contouring for lung cancer and bone and nodal oligometastatic disease. 14 The aim of the workshop was to share expertise and experience in SABR techniques and improve participants’ contouring skills. Given the COVID-19 pandemic, the workshop took place in a virtual format. In this study, we evaluated the impact of teaching during the workshop on contouring variation for multiple target volumes/OARs in the thorax and pelvis.
Methods and Materials
Format of the workshop
The workshop took place on 19 and 22 October 2020; each session lasted two hours and was delivered using Adobe® Connect™ (Adobe, San Jose, CA, USA). Participants were UK-based consultants in clinical oncology, and the workshop was aimed at those without prior expertise in SABR, although relative baseline experience was not assessed/recorded prior to the workshop. Participants were asked to delineate target volumes/OARs for three cases prior to the workshop using the web-based platform EduCase™ (RadOnc eLearning Centre, Inc., Fremont, CA, USA). A video tutorial explaining how to use EduCase™ was provided.
The target volumes/OARs for the three cases were as follows:
Right upper lobe primary lung cancer
- IGTVlung (internal gross tumour volume)
- BrachialPlex
- BronchusProx
- Oesophagus
- Spinal_Canal
Left pelvic bone metastasis secondary to breast cancer
- GTVbone (gross tumour volume)
- CTVbone (clinical target volume)
- Femur_Head_Left
- Rectum
Right common iliac lymph node secondary to prostate cancer
- GTVnode
- Bowel_Large
- SacralPlex
Each case was accompanied by a clinical vignette (history, diagnosis, investigations and intended treatment) and instructions detailing which structures were to be delineated and on which axial computed tomography (CT) slices. CT axial slice thickness was 3 mm for the lung cancer case and 1 mm for the bone/node cases. Positron emission tomography-computed tomography (PET-CT), co-registered with the planning CT in EduCase™, was available for all cases; magnetic resonance imaging (MRI) was available for the nodal and pelvic bone cases, and four-dimensional CT (4DCT) was available for the primary lung case. For the lung cancer case, IGTVlung could be defined on the maximum intensity projection (MIP) scan with reference to the average intensity projection and the 0% and 30% respiratory phases.
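To make the relationship between the MIP, the average intensity projection and the individual 4DCT respiratory phases concrete, the sketch below computes both projections voxel-wise from a stack of phase images. It is a minimal illustration assuming the phases are already co-registered NumPy arrays of identical shape; the helper name `mip_and_aip` is hypothetical, and this is not a description of how EduCase™ or any planning system generates these scans.

```python
import numpy as np

def mip_and_aip(phases: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (MIP, AIP) for a 4DCT stack shaped (n_phases, rows, cols).

    Hypothetical helper: the MIP keeps, for every voxel, the highest
    intensity seen in any respiratory phase, so a moving tumour appears
    as the envelope of all its positions; the AIP is the voxel-wise mean.
    """
    if phases.ndim != 3:
        raise ValueError("expected an array shaped (n_phases, rows, cols)")
    mip = phases.max(axis=0)
    aip = phases.mean(axis=0)
    return mip, aip

# Toy example: a 1-pixel 'tumour' (value 100) moves down one row per phase
# against a uniform background of -1000.
phases = np.full((3, 4, 1), -1000.0)
for phase in range(3):
    phases[phase, phase, 0] = 100.0

mip, aip = mip_and_aip(phases)
print(mip.ravel())  # tumour visible in rows 0-2 of the MIP
print(aip.ravel())  # tumour intensity diluted across phases in the AIP
```

The toy output shows why the MIP is convenient for outlining an internal target volume (the full motion envelope is bright), while the average projection shows the lesion blurred by motion.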
Pre-workshop contours, anonymised to clinician, were reviewed across the two workshop sessions, and teaching was provided for each case, including demonstration of a reference contour produced by the workshop faculty of UK consultant clinical oncologists with approximately 50 years of combined consultant experience. Relevant published contouring guidance and atlases were identified during both sessions. Teaching included clinical cases illustrating the general principles of patient selection, planning and treatment delivery of SABR for primary lung cancer and oligometastatic disease, as well as a dedicated session on target volume/OAR contouring.
Following each workshop, participants were invited to review/adjust their contours based on the teaching. Final attempts could be submitted up to two weeks after the second workshop session, although no further contours eligible for inclusion in the study were submitted more than 3 days after the final workshop session. The faculty provided individual written feedback to participants on their post-workshop contours (this information was not provided for pre-workshop contours).
Participants were asked to provide feedback for individual speaker sessions and the overall workshop experience using a 5-point Likert scale and free text responses.
Analysis of participant contours
Each participant’s contours were compared against a reference contour, which was produced by the clinician who led each case discussion during the workshop and peer-reviewed by a second faculty member. For each structure, the axial CT slices to be contoured were specified; these were non-contiguous, so a volume could not be obtained. Some participants delineated contours on slices other than those specified in the case; therefore, to ensure a fair comparison across participants, only contours on the pre-specified slices were considered. Participants with only one set of contours (e.g., only pre-workshop contours) were excluded. Participants with two sets of submitted contours, but in whom no changes were made to the post-workshop contours, were included.
EduCase™ provides 2-dimensional (i.e., area) Dice similarity coefficient (DSC) and line domain error (LDE) values for individual slices for participant contours compared with the reference contour. DSC is an overlap measure, which expresses the area of intersection of two contours relative to their combined area and ranges from 0 (no overlap) to 1 (perfect overlap). Reference Jameson, Holloway, Vial, Vinod and Metcalfe1,Reference Vaassen, Hazelaar and Vaniqui15,Reference Vinod, Jameson, Min and Holloway16 DSC can be calculated by the following formula: Reference Dice17, Reference Rivin del Campo, Rivera and Martínez-Paredes18

$$\mathrm{DSC} = \frac{2\,\left(\mathrm{Area}_{\mathrm{reference}} \cap \mathrm{Area}_{\mathrm{participant}}\right)}{\mathrm{Area}_{\mathrm{reference}} + \mathrm{Area}_{\mathrm{participant}}}$$

where $\mathrm{Area}_{\mathrm{reference}} \cap \mathrm{Area}_{\mathrm{participant}}$ is the area of overlap (intersection) of the two contours and $\mathrm{Area}_{\mathrm{reference}} + \mathrm{Area}_{\mathrm{participant}}$ is the sum of the two individual contour areas.
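The DSC formula can be sketched for a single slice as follows. This assumes both contours have already been rasterised to binary masks on the same pixel grid (EduCase™ computes the value internally, and its rasterisation details are not described here); the function name `dice` and the convention that two empty masks score 1 are assumptions made for illustration.

```python
import numpy as np

def dice(reference: np.ndarray, participant: np.ndarray) -> float:
    """2D Dice similarity coefficient between two binary masks on the same grid."""
    ref = reference.astype(bool)
    par = participant.astype(bool)
    if ref.shape != par.shape:
        raise ValueError("masks must share the same slice grid")
    total = ref.sum() + par.sum()  # sum of the two areas (in pixels)
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    intersection = np.logical_and(ref, par).sum()  # overlapping area
    return float(2.0 * intersection / total)

ref = np.zeros((10, 10), dtype=bool)
ref[2:6, 2:6] = True          # 16-pixel reference contour
par = np.zeros((10, 10), dtype=bool)
par[3:7, 3:7] = True          # same size, shifted one pixel diagonally
print(dice(ref, par))
```

Here 9 of the 16 pixels overlap, so DSC = 2 × 9 / (16 + 16) = 0·5625, whereas the intersection-over-union (Jaccard) ratio for the same pair would be 9/23 ≈ 0·39; the two measures are related but not interchangeable.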
LDE is a distance metric within EduCase™, which measures the average absolute Euclidean distance in millimetres between corresponding points on the reference and participant contours.
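EduCase™’s exact rule for pairing “corresponding points” is not described here, so the sketch below assumes one common choice: each point on one contour is paired with its nearest point on the other, and the mean of these distances is averaged over both directions (a symmetric mean contour distance in 2D). The function name `mean_contour_distance` is hypothetical, and its results will not necessarily match EduCase™’s LDE values.

```python
import numpy as np

def mean_contour_distance(reference_pts: np.ndarray, participant_pts: np.ndarray) -> float:
    """Symmetric mean nearest-point distance (mm) between two 2D point sets.

    Each input is an (n, 2) array of (x, y) coordinates in millimetres,
    e.g. contour vertices resampled at a fixed spacing.
    """
    # Pairwise Euclidean distances, shape (n_ref, n_par)
    diff = reference_pts[:, None, :] - participant_pts[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    ref_to_par = dist.min(axis=1).mean()  # each reference point -> nearest participant point
    par_to_ref = dist.min(axis=0).mean()  # each participant point -> nearest reference point
    return float((ref_to_par + par_to_ref) / 2.0)

# Toy example using open line segments for brevity: the participant edge
# lies parallel to the reference edge but is offset by 2 mm.
xs = np.arange(0.0, 11.0)                                   # 0..10 mm
reference = np.column_stack([xs, np.zeros_like(xs)])
participant = np.column_stack([xs, np.full_like(xs, 2.0)])
print(mean_contour_distance(reference, participant))
```

For a uniform 2 mm offset the measure returns 2 mm, which matches the intuitive reading of an average distance between the two outlines.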
Since each structure was contoured on a series of individual slices rather than as a volume, a summary measure per structure was produced for each participant: the median DSC/LDE across the specified slices was calculated for each structure contoured by each participant. These median structure DSC/LDE values for participants with both pre- and post-workshop contours were exported into IBM-SPSS Statistics for Windows version 26 (IBM Corp., Armonk, NY, USA). Each of the included contours was reviewed by two of the authors (Finbar Slevin and Romélie Rieu) to identify potential reasons for low DSC/high LDE values.
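The per-structure summary (restricting to the pre-specified slices, then taking the median slice value for each participant and structure) can be sketched as below. The record layout, the helper name `median_per_structure` and all numbers are hypothetical, made-up values for illustration only, not study data.

```python
from collections import defaultdict
from statistics import median

# Hypothetical per-slice export: (participant, structure, ct_slice, dsc); made-up values
records = [
    ("P01", "Rectum", 40, 0.81),
    ("P01", "Rectum", 45, 0.77),
    ("P01", "Rectum", 50, 0.84),
    ("P01", "Rectum", 52, 0.30),   # slice not pre-specified, so it is ignored
    ("P02", "Rectum", 40, 0.69),
    ("P02", "Rectum", 45, 0.72),
]

# Hypothetical slice lists specified in the case instructions
specified_slices = {"Rectum": {40, 45, 50}}

def median_per_structure(records, specified_slices):
    """Median slice-level metric for each (participant, structure) pair,
    using only the slices pre-specified for that structure."""
    values = defaultdict(list)
    for participant, structure, ct_slice, metric in records:
        if ct_slice in specified_slices.get(structure, set()):
            values[(participant, structure)].append(metric)
    return {key: median(v) for key, v in values.items()}

summary = median_per_structure(records, specified_slices)
print(summary)
```

Using the median rather than the mean keeps one badly drawn slice from dominating a participant’s summary value for that structure.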
Following the workshop, the faculty reviewed participants’ post-workshop contours and provided a score (acceptable, within acceptable variation or unacceptable) and written feedback.
Statistical considerations
The median DSC/LDE and interquartile range (IQR) are presented as summary statistics for all the participants’ median structure DSC/LDE values pre- and post-workshop, because a normal distribution of the data could not be assumed and to minimise the influence of outlying values. Box and whisker plots were produced by importing data into R 3·6·1 (R Core Team, R Foundation for Statistical Computing, Vienna, Austria) using the ggplot2 library. Reference Wickham19 The median DSC/LDE for each participant’s structures pre- and post-workshop were compared using the Wilcoxon signed ranks test in SPSS, since the data were paired. A p value of <0·05 was taken to indicate a statistically significant difference.
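For readers without SPSS, the paired comparison can be sketched in plain Python. This is a minimal sketch assuming the large-sample normal approximation with a tie correction; exact-distribution variants, or different software settings, may give somewhat different p values for small samples. Pairs with a zero difference (for example, unchanged contours) are discarded before ranking, as in Wilcoxon’s original procedure. The IQR uses the default method of Python’s `statistics.quantiles`, and other packages may define quartiles differently. The helper names are hypothetical; `scipy.stats.wilcoxon` is an established library alternative.

```python
import math
from statistics import median, quantiles

def median_iqr(values):
    """Median and interquartile range (Q1, Q3) of a list of values."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
    return median(values), (q1, q3)

def wilcoxon_signed_rank(pre, post):
    """Two-sided Wilcoxon signed-rank test for paired data.

    Normal approximation with tie correction; zero differences are
    discarded before ranking. Returns (W+, z, p).
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # find the run of tied absolute differences starting at position i
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    return w_plus, z, p

# Made-up paired DSC values for six participants (not study data)
pre = [0.50, 0.50, 0.50, 0.50, 0.50, 0.50]
post = [0.51, 0.52, 0.53, 0.54, 0.55, 0.56]
print(median_iqr(post))
print(wilcoxon_signed_rank(pre, post))
```

Because the test ranks paired differences rather than assuming normality, it suits the skewed, bounded DSC values described above.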
Results
Fifty participants registered for the workshop and 43 submitted at least one set of contours for each of the cases. Of these 43 participants, 21 (49%), 20 (47%) and 22 (51%) participants produced pre- and post-workshop contours for the lung cancer, pelvic bone metastasis and pelvic node metastasis cases, respectively. A summary of the DSC/LDE values pre- and post-workshop and results of statistical comparisons are shown in Table 1. The spread of the median DSC/LDE values for each structure across all of the participants is illustrated in Figure 1.
LDE, line domain error; DSC, Dice similarity coefficient.
* Indicates statistically significant result.
Statistically significant improvements in DSC post-workshop were observed for each structure except for IGTVlung, Spinal_Canal, Oesophagus and BronchusProx. Only BronchusProx was associated with a worsening in median DSC post-workshop, but this difference was not statistically significant. The magnitude of increase in DSC post-workshop was often small; only GTVbone (0·08), IGTVlung (0·05) and SacralPlex (0·37) were associated with a ≥ 0·05 increase in median DSC. A median value of DSC > 0·7 and >0·8 post-workshop was observed for nine (75%) and five (42%) of the 12 structures, respectively; no median DSC value was >0·9.
Statistically significant improvements in LDE post-workshop were observed for each structure except for BronchusProx, Oesophagus and Spinal_Canal. As with DSC, BronchusProx was associated with a worsening in median LDE post-workshop, although this difference was not statistically significant. Again, the magnitude of improvement was often small; only GTVbone (1·7 mm), CTVbone (1·2 mm) and SacralPlex (42 mm) were associated with a >1 mm reduction in median LDE post-workshop.
Some post-workshop contours were unchanged from pre-workshop: GTVnode (5 participants, 24%), Bowel_Large (10 participants, 46%), GTVbone (2 participants, 10%), CTVbone (2 participants, 10%), Rectum (8 participants, 40%), IGTVlung (8 participants, 40%), Spinal_Canal (11 participants, 52%), Oesophagus (7 participants, 33%), BronchusProx (7 participants, 33%). When the data were re-analysed without these unchanged structures, no significant differences were observed.
Regarding BrachialPlex, the case instructions did not specify that only the ipsilateral structure was to be delineated, and some participants contoured bilateral structures. Similarly, for Femur_Head_Left, only the femoral head (i.e., excluding the femoral neck) was to be delineated, but several participants delineated both the femoral head and neck and/or produced bilateral structures. Therefore, these two structures were omitted from statistical comparisons.
Regarding post-workshop contours, a summary of the feedback provided to participants is shown in Table 2. Ninety-two per cent of post-workshop contours were considered to be acceptable or within acceptable variation.
Eighty-four per cent of participants provided feedback on the workshop; of these, feedback regarding the overall workshop experience and each of the individual speakers was considered to be ‘good’ or ‘very good’ in 82% and 99% of responses, respectively. Ten per cent of feedback concerned technical issues during the workshop (e.g., sound quality).
Discussion
This study has evaluated the impact of teaching during a SABR contouring workshop for a relatively large number of participants and multiple target volumes/OARs in the thorax and pelvis. The positive feedback provided by participants suggests that it is feasible to deliver contouring teaching virtually. This is important because, accelerated by the changes adopted during the COVID-19 pandemic, medical training and teaching are likely to be increasingly delivered using virtual methods. Another important message is that contouring training can continue to be provided during the COVID-19 pandemic, when in-person meetings are restricted. Virtual training also has the potential to reach a larger audience, without geographical restrictions, than could typically be achieved in person. For participants who completed pre- and post-workshop contours, median DSC/LDE values for most of the target volumes/OARs were similar to the reference contour, with DSC > 0·7 for 75% of structures and LDE < 5 mm for 83% of structures. Although statistically significant post-workshop improvements in DSC and LDE were observed for 50% and 58% of structures, respectively, the magnitude of improvement was small in most cases, and the clinical significance of such modest improvements remains uncertain.
Although multiple studies on the effect of teaching on contouring variation have been reported, several factors make a direct comparison between these and our study challenging. Reference Vinod, Jameson, Min and Holloway16 Heterogeneity exists between studies in the numbers of participants, types of teaching, the structures for which contouring variation is evaluated, the metrics used to evaluate this variation and the statistical tests applied. Reference Jameson, Holloway, Vial, Vinod and Metcalfe1,Reference Vinod, Min, Jameson and Holloway2,Reference Vinod, Jameson, Min and Holloway16 However, systematic reviews of such studies have demonstrated that teaching can reduce contouring variation. Reference Vinod, Min, Jameson and Holloway2,Reference Cacicedo, Navarro-Martin, Gonzalez-Larragan, De Bari, Salem and Dahele20 We did not observe a large increase in DSC/reduction in LDE post-workshop, and a number of limitations of our work may explain this. Although participants were asked to review their pre-workshop contours after teaching and produce a post-workshop submission, only approximately half did so, which reduced the number of participants for whom the impact of teaching could be analysed. Furthermore, some participants who did resubmit made no changes to some of their contours. Possible reasons for this may include satisfaction with pre-workshop contours, insufficient time to re-contour every structure and a lack of hands-on time during the workshop to practise/fully compare contours with the reference contour. The latter point may be particularly relevant, since it has been previously suggested that active participation is more likely to improve learning during contouring workshops. Reference Eriksen, Salembier and Rivera21 Insufficient provision of practical experience was raised as a potential explanation for failure to observe improved contouring post-teaching in a previous study of a head and neck contouring programme, although there may be time/resource challenges to delivering this effectively, especially for larger audiences and during the COVID-19 pandemic, when face-to-face meetings are restricted. Reference D’Souza, Jaswal and Chan22 Residual differences in knowledge/ability between participants despite teaching were also suggested as a possible reason why significant improvements in prostate/rectal contouring were not observed in a previous evaluation of the impact of teaching. Reference Szumacher, Harnett and Warner23
Low DSC/high LDE values for certain structures in our study could be related to interpretation of the case instructions, especially for BrachialPlex and Femur_Head_Left. The latter structure was also to be delineated on only a single axial CT slice at the very inferior aspect of the femoral head. Different methods for contouring BrachialPlex exist, and practice remains variable. Reference Hall, Guiou and Lee24–26 Given the high dose per fraction used with SABR and variable reliance on MRI across different treatment centres, the UK SABR Consortium Guidance recommends contouring the subclavian/axillary vessels as a surrogate for BrachialPlex. 26 National consensus is needed, and future iterations of the recently published OAR harmonisation guidance will support this. Reference Mir, Kelly and Xiao25 For SacralPlex, some participants delineated the visible nerve using the MRI, while others delineated a larger surrogate structure using the CT. Both of these may be legitimate approaches, although contouring as per the Yi et al. guidance does not rely on expert MRI interpretation of nerve position and may therefore be simpler for those learning. Reference Yi, Mak and Yang27 Unfamiliarity with the contouring of certain OARs may also have contributed to low DSC/high LDE results. Although not statistically significant, the median DSC/LDE for BronchusProx appeared slightly worse post-workshop. The reason for this was not clearly apparent, and the magnitude of difference was small, but it may be related to delineation uncertainties regarding the distal extent of the lobar bronchi. A visual guide to delineation of BrachialPlex, BronchusProx and SacralPlex is illustrated in Figure 2, while recommended contouring guidance/atlases are collated in Table 3. Reference Mir, Kelly and Xiao25–Reference Wright, Yom and Awan30
The metric thresholds that correspond to a minimum expected standard of contouring are uncertain, but it has been previously suggested that DSC > 0·7 indicates a good level of agreement. Reference Vinod, Min, Jameson and Holloway2 However, previous studies have demonstrated discrepancies between contours considered acceptable on expert review and the results of overlap measure comparisons. Reference Duke, Tan and Jensen31 In this study, 92% of the post-workshop contours were considered to be acceptable/within acceptable variation, whereas 75% of structures had a median DSC > 0·7. A range of comparison metrics exists; each provides different information about the relationship between two contours, and each has its limitations. Reference Vinod, Jameson, Min and Holloway16 A summary of commonly used metrics for contour comparison is shown in the Supplementary Table; it is unclear which is the optimum metric to use. Reference Jameson, Holloway, Vial, Vinod and Metcalfe1,Reference Vinod, Min, Jameson and Holloway2,Reference Vinod, Jameson, Min and Holloway16,Reference Rivin del Campo, Rivera and Martínez-Paredes18,Reference Ackerly, Andrews, Ball, Guerrieri, Healy and Williams32–Reference Taha and Hanbury37 For this reason, it has previously been recommended that multiple metrics, including measures of volume, overlap and distance, should ideally be reported. Reference Jameson, Holloway, Vial, Vinod and Metcalfe1,Reference Vinod, Jameson, Min and Holloway16 In this study, we reported only DSC and LDE, since we did not have volumetric contouring data. It should be emphasised that DSC may be less reliable when applied to very small contours and may lack discrimination for very large volumes. Reference Rivin del Campo, Rivera and Martínez-Paredes18 However, it does provide some insight into both the volumetric and spatial relationship between two contours, and it is frequently reported in contouring studies. Reference Jameson, Holloway, Vial, Vinod and Metcalfe1,Reference De Bari, Dahele, Palmu, Kaylor, Schiappacasse and Guckenberger11
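The size dependence of DSC can be illustrated with a toy geometry: the sketch below applies the same one-pixel lateral shift to squares of different sizes. For a square of side s the DSC is (s − 1)/s, so an identical delineation error looks far worse on a small structure. The square shapes and helper names are illustrative assumptions, not study data.

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """2D Dice similarity coefficient for two boolean masks of equal shape."""
    return float(2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum()))

def shifted_square_dice(side: int, shift: int = 1, grid: int = 200) -> float:
    """DSC between a side x side square and the same square shifted `shift`
    pixels sideways (a toy stand-in for a fixed delineation error)."""
    ref = np.zeros((grid, grid), dtype=bool)
    par = np.zeros((grid, grid), dtype=bool)
    ref[10:10 + side, 10:10 + side] = True
    par[10:10 + side, 10 + shift:10 + shift + side] = True
    return dice(ref, par)

for side in (5, 20, 100):
    print(side, round(shifted_square_dice(side), 3))
```

For a 5-pixel square the DSC is 0·8, whereas for a 100-pixel square it is 0·99, despite the same 1-pixel displacement; conversely, on very large structures clinically relevant local errors may barely change the DSC.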
Quantitative concordance in target volume/OAR delineation does not necessarily equate to a clinically acceptable contour; incorrect delineation of even a small proportion of a target volume or an OAR could have profound clinical consequences, especially for SABR, where tight margins, steep dose gradients and ablative doses are used. Reference Vinod, Min, Jameson and Holloway2,Reference Murray, Franks and Hanna38,Reference Trakul, Koong and Chang39 This risk means that quantitative metrics should ideally be accompanied by visual review of contours and provision of qualitative feedback, analogous to the peer review process used in clinical practice and recommended by the RCR. 7 This approach is used in clinical trials, either for pre-trial approval for participation or for on-trial evaluation of individual cases. Qualitative feedback can be provided detailing acceptable/unacceptable variation from the protocol, and a similar process was used in this study for feedback on post-workshop contours. Reference Gwynne, Gilson, Dickson, McAleer and Radhakrishna3,Reference Cox, Cleves, Clementel, Miles, Staffurth and Gwynne40–Reference Gwynne, Jones and Maggs42 However, this approach may be time-consuming, and an efficient, reliable method of assessment that can identify clinically relevant discrepancies is needed. Reference Jameson, Holloway, Vial, Vinod and Metcalfe1,Reference Vinod, Jameson, Min and Holloway16,Reference Duke, Tan and Jensen31
The practice of clinical oncology takes place against an increasingly complex backdrop of developments in imaging and novel methods of treatment delivery. Alongside ever-increasing pressures in healthcare services, considerable challenges exist for the training and continuous professional development of trainees and consultants, respectively. Reference Walls, Hanna and McAleer9 Formal training initiatives have been established to support the acquisition and maintenance of contouring competences, in an attempt to improve target volume/OAR delineation beyond what could be achieved by a single workshop in isolation. The Fellowship in Anatomic deLineation and CONtouring (FALCON) programme is a European Society for Radiotherapy and Oncology (ESTRO) initiative that provides access to e-learning contouring resources in addition to its use within dedicated workshops. Reference Rivin del Campo, Rivera and Martínez-Paredes18,Reference Eriksen, Salembier and Rivera21 The RCR ARENA and Clinical Oncology Planning Project (COPP) are examples of initiatives to increase access to expert/peer-led structured outlining training, to promote consistency in target volume and OAR outlining and to facilitate robust assessment of outlining practice for clinical oncologists of all grades. Reference Evans, Radhakrishna and Gilson43
This study has a number of additional limitations. The workshop was limited in its duration/level of interactivity because of restrictions imposed during the pandemic, and this could have affected the educational experience and the DSC and LDE results that we observed (although participant feedback for the workshop remained positive). The same cases were used for both pre- and post-workshop contouring; while this enabled the analysis of paired data, post-workshop contour performance could have been influenced by familiarity with the case, so similar levels of performance on other cases cannot be assumed. We did not stratify the analysis by prior experience because this information was not available to the authors, although it could have influenced the results. The workshop was aimed at those without prior experience in SABR, but experience with OAR delineation would have varied depending on disease site expertise. We also did not evaluate longer-term maintenance of contouring competences by providing further cases for contouring as part of this workshop, although response rates to such interventions after a single workshop in isolation may be limited. Reference Cacicedo, Navarro-Martin, Gonzalez-Larragan, De Bari, Salem and Dahele20 Initial educational gains might be expected to diminish over time, meaning that ongoing evaluation of performance during training programmes and as part of consultant continuous professional development will be required. Finally, feedback on post-workshop contours was only available for approximately half of the participants included in our analyses; this limits the conclusions that can be drawn regarding the qualitative feedback but reflects the challenge of providing such information in a timely manner.
When planning a contouring workshop, the following considerations may be relevant based on prior recommendations/the authors’ experience Reference Gwynne, Gilson, Dickson, McAleer and Radhakrishna3,Reference Vinod, Jameson, Min and Holloway16,Reference Cacicedo, Navarro-Martin, Gonzalez-Larragan, De Bari, Salem and Dahele20,Reference Duke, Tan and Jensen31,Reference Roques44 :
- Workshop format: incorporation of time to practise contouring/re-contouring is recommended in addition to didactic teaching (the duration of the workshop should be considered in relation to this)
- Clarity of instructions for cases to be contoured, including detailed delineation guidance and specification of laterality, where relevant
- Timely access to relevant target volume/OAR guidance/atlases
- Provision of co-registered imaging
- Target audience: disease sites, number of target volumes/OARs and number/complexity of cases
- Choice of assessment: quantitative metrics (such as volume, distance and overlap metrics) should ideally be used in conjunction with qualitative feedback. Be realistic about how much qualitative feedback can be provided to each participant in a timely manner
- Post-workshop provision of the expert contour (where available) for participant comparison
- Where a reference contour is used: discussion of the variation that may occur even between ‘expert’ outliners. One approach could be to use three expert contours and demonstrate their union and intersection as the maximum and minimum acceptable contours, respectively
- Identification of common errors/sources of variation for particular target volumes/OARs
- Highlighting of available e-learning resources for self-directed learning
- Design of workshop feedback to evaluate participant confidence in contouring before/after the workshop
- Audiovisual/technological considerations: undertaking a ‘trial run’ of the equipment prior to the workshop, a method of quality assurance for displayed imaging, provision for participants with disabilities, and recommending that participants ensure they have a stable Internet connection and adequate audiovisual equipment to participate fully
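The suggestion of using three expert contours, with their union and overlap (intersection) shown as the maximum and minimum acceptable contours, can be sketched on binary masks as below. This is a hypothetical helper assuming all masks share one pixel grid; a strict pixel-wise inclusion rule like this is harsher than visual expert review and is shown only to make the union/intersection idea concrete.

```python
import numpy as np

def expert_envelope(expert_masks):
    """Return (intersection, union) of several binary expert masks.

    The intersection serves as the minimum acceptable contour and the
    union as the maximum acceptable contour.
    """
    stack = np.stack([m.astype(bool) for m in expert_masks])
    return np.logical_and.reduce(stack, axis=0), np.logical_or.reduce(stack, axis=0)

def within_envelope(participant, intersection, union) -> bool:
    """True if the participant mask covers the intersection and stays inside the union."""
    p = participant.astype(bool)
    covers_minimum = bool(np.all(p[intersection]))    # no part of the minimum is missed
    inside_maximum = bool(not np.any(p & ~union))     # nothing drawn outside the maximum
    return covers_minimum and inside_maximum

# Toy one-row masks from three hypothetical experts
experts = [
    np.array([[0, 1, 1, 1, 0, 0]], dtype=bool),
    np.array([[0, 0, 1, 1, 1, 0]], dtype=bool),
    np.array([[0, 1, 1, 1, 1, 0]], dtype=bool),
]
minimum, maximum = expert_envelope(experts)

acceptable = np.array([[0, 1, 1, 1, 0, 0]], dtype=bool)
too_large = np.array([[1, 1, 1, 1, 0, 0]], dtype=bool)
too_small = np.array([[0, 0, 1, 0, 0, 0]], dtype=bool)
print(within_envelope(acceptable, minimum, maximum))
print(within_envelope(too_large, minimum, maximum))
print(within_envelope(too_small, minimum, maximum))
```

Displaying the two envelope masks alongside a participant’s contour gives a visual cue to the range of expert variation rather than a single ‘correct’ answer.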
Conclusion
This study has demonstrated that virtual contouring training is feasible and that teaching during a virtual SABR contouring workshop for multiple target volumes/OARs was associated with some reductions in contouring variation. Virtual contouring workshops could play an important role in aiding the acquisition of contouring competences alongside formal training initiatives.
Supplementary material
For supplementary material accompanying this paper visit https://doi.org/10.1017/S1460396921000583
Acknowledgements
The authors would like to acknowledge Scott Kaylor and EduCase™ for their support of the workshop/analysis of participant contouring data. We thank Paul Elbourn and team at Profile Productions, as well as the representatives at the Royal College of Radiologists and UK SABR Consortium for their assistance in developing and organising the workshop.
Financial Support
Finbar Slevin is a Clinical Research Fellow supported by a Cancer Research UK Centres Network Accelerator Award to the ART-NET consortium (grant number A21993). Matthew Beasley and Richard Speight are supported by a Cancer Research UK Centres Network Accelerator Award to the ART-NET consortium (grant number A21993). Louise J Murray is an associate professor supported by Yorkshire Cancer Research (award number L389LM). Alison C Tree acknowledges support from Cancer Research UK (C33589/A28284) and the CRUK ICR/RMH RadNet centre. This project represents independent research supported by the National Institute for Health Research (NIHR) Biomedical Research Centre at The Royal Marsden NHS Foundation Trust and the Institute of Cancer Research, London. The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. Ann M Henry is supported by grants from Cancer Research UK (award number 108,036), National Institute for Health Research (NIHR) (award number 111,218), Medical Research Council (MRC) (award number 107,154) and Sir John Fisher Foundation (charity, no award number).
Conflicts of Interests
James Good reports honoraria from ViewRay and GenesisCare. Fiona McDonald reports consulting for Accuracy and AstraZeneca, speaker fees from AstraZeneca and Elekta and a research grant from Merck Sharp & Dohme. Alison C Tree declares research funding from Elekta, Accuracy and Varian, and honoraria from Elekta and Genesis healthcare.