Optimisation of gyrokinetic microstability using adjoint methods

G.O. Acton; M. Barnes; S. Newton; H. Thienpondt

doi:10.1017/S0022377824000709

Optimisation of gyrokinetic microstability using adjoint methods

Published online by Cambridge University Press: 23 September 2024

S. Newton and

G.O. Acton*: Affiliation:
Rudolf Peierls Centre For Theoretical Physics, University of Oxford, Oxford OX1 3PU, UK United Kingdom Atomic Energy Authority, Culham Campus, Abingdon, Oxfordshire OX14 3DB, UK
M. Barnes: Affiliation:
Rudolf Peierls Centre For Theoretical Physics, University of Oxford, Oxford OX1 3PU, UK
S. Newton: Affiliation:
United Kingdom Atomic Energy Authority, Culham Campus, Abingdon, Oxfordshire OX14 3DB, UK
H. Thienpondt: Affiliation:
Laboratorio Nacional de Fusión, CIEMAT, 28040 Madrid, Spain
*: †Email address for correspondence: georgia.acton@merton.ox.ac.uk

Article contents

Abstract
Introduction
Plasma evolution equations
Adjoint method for gyrokinetics
Normalisations and magnetic geometry
Numerical implementation
Numerical results
Numerical efficiency improvement
Conclusion and discussion
Funding
Declaration of interests
Footnotes
References

Rights & Permissions

Abstract

Microinstabilities drive turbulent fluctuations in inhomogeneous, magnetised plasmas. In the context of magnetic confinement fusion devices, this leads to an enhanced transport of particles, momentum and energy, thereby degrading confinement. In this work, we describe an application of the adjoint method to efficiently determine variations of gyrokinetic linear growth rates on a general set of external parameters in the local $\delta f$-gyrokinetic model. We then offer numerical verification of this approach. When coupled with gradient-based techniques, this methodology can facilitate the optimisation process for the microstability of the confined plasmas across a high-dimensional parameter space. We present a numerical demonstration wherein the ion-temperature-gradient instability growth rate in a tokamak plasma is minimised with respect to flux surface shaping parameters. The adjoint method approach demonstrates a significant computational speed-up compared with a finite-difference gradient calculation.

Keywords

fusion plasma plasma confinement plasma instabilities

Type: Research Article
Information: Journal of Plasma Physics , Volume 90 , Issue 4 , August 2024 , 905900406

DOI: https://doi.org/10.1017/S0022377824000709 [Opens in a new window]

NASA ADS Abstract Service [Opens in a new window]
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: Copyright © The Author(s), 2024. Published by Cambridge University Press

1 Introduction

Within the plasma core of magnetic confinement fusion (MCF) devices, gradients of temperature and density act as sources of free energy that are capable of driving microscale instabilities, should they exceed critical values (see e.g. Rudakov & Sagdeev Reference Rudakov and Sagdeev1961; Drake & Lee Reference Drake and Lee1977 or Tang, Connor & Hastie Reference Tang, Connor and Hastie1980 for ion-temperature-gradient (ITG), microtearing mode or kinetic ballooning mode descriptions, respectively). Beyond such critical values the transport fluxes increase rapidly, thus requiring a large additional power input to maintain the temperature gradient. One mechanism behind these large fluxes is turbulent mixing, which results in the increased transport of particles, momentum and energy out of the device. To properly predict plasma profiles, one should thus solve the transport equations in which nonlinear, turbulent fluxes determine profile evolution. However, due to this ‘stiffness’ of the transport (Dimits et al. Reference Dimits, Bateman, Beer, Cohen, Dorland, Hammett, Kim, Kinsey, Kotschenreuther and Kritz2000), the critical gradients obtained from a linear instability analysis are often a first reasonable approximation to the experimental outcomesFootnote ¹ (Doyle, Houlberg & Kamada Reference Doyle, Houlberg and Kamada2007). Hence, it is worthwhile to determine if externally controlled experimental parameters can be chosen to optimise linear stability.

The growth rate of linear microinstabilities can be influenced by a large number of parameters, including plasma density, temperature, flow profiles and the magnetic geometry. In an idealised situation, one would determine how such growth rates, and by extension critical gradients, depend on all parameters governing the system, and then design MCF devices that optimise temperature and/or density gradients. However, analytical searches are intractable, and, because the number of tuneable parameters in modern MCF devices is large, full numerical parameter scans are often prohibitively expensive.

In this paper we address this issue by employing an adjoint approach (cf. Pironneau Reference Pironneau1974) to enable efficient calculation of local variations to the linear growth rate with respect to all parameters of interest within the local, $\delta f$-gyrokinetic model. In contrast to a finite-difference calculation, the adjoint method is essentially independent of the dimension of the parameter space, and can be used to optimise over a large number of variables at once without incurring additional computational cost beyond solving the system equations; the associated cost of the adjoint method is roughly equivalent to solving the linearised gyrokinetic system of equations twice.

The archetypal microinstability in MCF plasmas is the ITG instability (see e.g. Rudakov & Sagdeev Reference Rudakov and Sagdeev1961; Cowley, Kulsrud & Sudan Reference Cowley, Kulsrud and Sudan1991), as it has been identified as the main source of heat transport in the core of many tokamaks (Mantica et al. Reference Mantica, Angioni, Baiocchi, Baruzzo, Beurskens, Bizarro, Budny, Buratti, Casati and Challis2011). Therefore, for clarity, we shall present the optimisation process with ITG instabilities in mind. We shall apply the adjoint method to the linear gyrokinetic equation and demonstrate its utility by calculating the sensitivity of the growth rate to geometrical parameters in a tokamak, with the aim of maximising the critical ITG through shaping considerations only. However, the technique and equations derived below allow for calculations of the gradient of the linear growth rate with respect to a general set of unspecified parameters, $\boldsymbol {p}$, and could thus be applied more broadly, and to other instabilities. In general, the adjoint method we present here is agnostic to the mechanism behind the instability, and gives the gradient of the linear growth rate with respect to a set of external parameters independent of the mode type. Given that the growth rate is a continuous function this approach could thus also be applied to linear growth rates for which the instability mechanism changes.

This paper is organised as follows: in § 2 we outline the gyrokinetic-Maxwell system, including the governing equations for the evolution of the plasma distribution function and electromagnetic fields. In § 3, we present details of the gyrokinetic-adjoint system, and derive the adjoint equations as applied to a linear, low-flow, electromagnetic $\delta f$-gyrokinetic system with collisions. The aim in mind is to minimise the linear growth rate in a parameter space defined by a generic set of independent variables, $\boldsymbol {p}$, that influence the plasma evolution. We also extract the electrostatic, collisionless limit of the adjoint equations. The second portion of the paper then focuses on applying the model derived, and discussing how this method may be used to vary the plasma profile (such as temperature or density gradients) in directions favourable for fusion whilst retaining microstability. In § 4 we briefly discuss how the process is implemented using the $\delta f$-gyrokinetic code stella, with § 4.2 specialising to the case where $\boldsymbol {p}$ consists of the parameters needed to specify the local magnetic geometry within the Miller formalism (Miller et al. Reference Miller, Chu, Greene, Lin-Liu and Waltz1998); we describe the motivation behind this choice, and describe how it is implemented numerically. Section 5 covers the numerical methods employed in solving the adjoint system. Finally, § 6 presents the numerical results, including tests for an example application in which the adjoint method is utilised to vary the magnetic geometry with the aim to minimise the linear ITG growth rate.

2 Plasma evolution equations

We model the evolution of plasma fluctuations with the $\delta f$-gyrokinetic equation (Catto Reference Catto1978; Antonsen & Lane Reference Antonsen and Lane1980; Abel et al. Reference Abel, Plunk, Wang, Barnes, Cowley, Dorland and Schekochihin2013), which one derives by exploiting spatial and temporal scale separation. Our starting point is the Vlasov–Maxwell system of equations including collisions

(2.1)\begin{gather} \frac{\mathrm{d} f_{\nu}}{\mathrm{d} t} = \sum_{\nu'} C_{\nu \nu'} [\,f_{\nu}, f_{\nu'} ] , \end{gather}

(2.2)\begin{gather}\boldsymbol{\nabla} \boldsymbol{\cdot} \boldsymbol{E} = 4 {\rm \pi}\varrho , \end{gather}

(2.3)\begin{gather}\boldsymbol{\nabla} \boldsymbol{\cdot} \boldsymbol{B} = 0 , \end{gather}

(2.4)\begin{gather}\frac{1}{c} \frac{\partial{\boldsymbol{B}}}{\partial{t}} =- \boldsymbol{\nabla} \times{\boldsymbol{E}} , \end{gather}

(2.5)\begin{gather}\boldsymbol{\nabla} \times{\boldsymbol{B}} = \frac{1}{c} \left( 4 {\rm \pi}\boldsymbol{j} + \frac{\partial{\boldsymbol{E}}}{\partial{t}} \right) , \end{gather}

where $\varrho$ is the electric charge density, $\,\boldsymbol {j}$ is the current density, $\nu$ is the particle species index, $c$ the speed of light, $t$ the time, $\boldsymbol {E}$ and $\boldsymbol {B}$ are the electric and magnetic fields, respectively, and $C_{\nu \nu '} [\,f_{\nu }, f_{\nu '} ]$ accounts for the effects on species $\nu$ from Coulomb collisions with species $\nu '$. We relate the charge density and current density to the particle distribution function, $f_{\nu }$, via the velocity space integrals

(2.6)\begin{gather} \varrho = \sum_{\nu} Z_{\nu} e \int \,\mathrm{d}^3 v f_{\nu} , \end{gather}

(2.7)\begin{gather}\boldsymbol{j} = \sum_{\nu} Z_{\nu} e \int \,\mathrm{d}^3 v \boldsymbol{v} f_{\nu}, \end{gather}

with $Z_{\nu } e$ the species charge, and $e$ the proton charge. We take the frequency of fluctuations, $\omega$, to be much less than the gyrofrequency of particles, $\varOmega _{\nu }$, defined as $\varOmega _{\nu } = Z_{\nu } e B/ m_{\nu } c$, with $B$ the magnitude of the magnetic field, and $m_{\nu }$ the species mass.

We decompose quantities into their mean and fluctuating parts. The mean components determine the evolution of the background plasma, and are found by averaging over all fluctuations. We represent the average of a given quantity, $h$, over all fluctuations by $\langle h \rangle _{\text {turb} }$ and define this average as

(2.8)\begin{equation} \langle h (t) \rangle_{\text{turb}} = \frac{1}{T} \int_{t-T/2}^{t+T/2} \,\mathrm{d} t' \langle h(t') \rangle_{{\perp}} , \end{equation}

where $T$ is some intermediate time shorter than the (transport) time scale associated with mean profile evolution, and longer than time scales associated with typical fluctuations, such that $\omega ^{-1} \ll T \ll L/ v_{\text {th},i}$, where $L$ is the system size, and $v_{\text {th},\nu } \doteq \sqrt {2 T_{\nu }/m_{\nu }}$ the species thermal velocity, with $T_{\nu }$ the species temperature. Here, $\langle \boldsymbol {\cdot } \rangle _{\perp }$ is an appropriately defined spatial average over a length $l$ satisfying $\rho _{\text {ref}} \ll l \ll L$ for a surface perpendicular to the magnetic field (Abel et al. Reference Abel, Plunk, Wang, Barnes, Cowley, Dorland and Schekochihin2013). The distribution function is decomposed as $f_{\nu } = F_{\nu } + \delta f_{\nu }$ with $F_{\nu } = \langle \,f_{\nu } \rangle _{\text {turb}}$ determining the profile of the equilibrium plasma, and $\delta f_{\nu }$ the contribution from plasma fluctuations.

We take the Larmor radius of particles, $\rho _{\nu } = v_{\perp }/\varOmega _{\nu }$, to be much smaller than the system size, $L$, where $v_{\perp }$ is the magnitude of velocity in the plane perpendicular to the magnetic field. We henceforth take the reference scale to be ordered the same as the ion scale, with $\rho _{\text {ref}} = v_{\text {th}, \text {ref}} /\varOmega _{\text {ref}} \sim \rho _i$. We expand the Vlasov–Maxwell equations including collisions, (2.1)–(2.5), in the small parameter $\epsilon \sim \rho _{\star } \ll 1$, with $\rho _{\star } = \rho _{i} / L$ the ratio of the reference ion gyroradius to the system size, and equate terms of equivalent order. We order terms as follows:

(2.9)\begin{equation} \epsilon \sim \rho_{{\star}} \doteq \frac{\rho_{i}}{L} \sim \frac{\omega}{\varOmega_{i}} \sim \frac{k_{{\parallel}} }{k_{{\perp}}} \sim \frac{e \phi}{T_{\nu}} \sim \frac{\delta B}{B_0} \sim \frac{\delta f_{\nu}} {F_{0,\nu}} \ll 1 \quad \text{and} \quad k_{{\perp}} \rho_{i} \sim 1 . \end{equation}

In the above orderings we have decomposed the magnetic field into the equilibrium contribution, $\boldsymbol {B}_0$, and the magnetic fluctuations, $\delta \boldsymbol {B}$, such that the full magnetic field is $\boldsymbol {B} = \boldsymbol {B}_0 + \delta \boldsymbol {B}$. We have also introduced the parallel and perpendicular wavenumbers, $k_{\parallel } = \boldsymbol {k} \boldsymbol {\cdot } \hat {\boldsymbol {b}}$ and $\boldsymbol {k}_{\perp } = (\boldsymbol {I} - \hat {\boldsymbol {b}} \hat {\boldsymbol {b}}) \boldsymbol {\cdot } \boldsymbol {k}$, with $\hat {\boldsymbol {b}}$ the unit vector in the direction of the magnetic field, and $\boldsymbol {I}$ the identity matrix. The perturbed electric potential has been introduced as $\phi$. We also expand the mean and fluctuating components of the distribution function

(2.10)\begin{equation} \left. \begin{aligned} F_{\nu} & = F_{0,\nu} + F_{1,\nu} + F_{2,\nu} \cdots , \\ \delta f_{\nu} & = \delta f_{1,\nu} + \delta f_{2, \nu} + \cdots , \end{aligned} \right\} \end{equation}

with $F_{0,\nu } \sim f_{\nu }$, $F_{1,\nu } \sim \delta f_{1,\nu } \sim \epsilon f_{\nu }$, $F_{2,\nu } \sim \delta f_{2,\nu } \sim \epsilon ^2 f_{\nu }$ and so on. Equilibrium quantities are taken to have characteristic length scales of order $L$, and evolve slowly on the long transport time scale $\tau _{E}^{-1} \sim \epsilon ^3 \varOmega _{i}$; they are thus understood to be static during our considerations. Small-scale fluctuations, captured in $\delta f_{\nu }$, have characteristic length scales of the order $\rho _{i} \sim \epsilon L$ and frequencies $\omega \sim \epsilon \varOmega _{i}$.

The charged particles follow magnetic field lines and perform rapid gyration in the plane perpendicular to the field. We introduce $\boldsymbol {R}_{\nu }$ as the gyro-centre for species $\nu$, and $\boldsymbol {r}$ to indicate the spatial position vector for any given particle. These descriptions of particle location are related through $\boldsymbol {R}_{\nu } = \boldsymbol {r} - \boldsymbol {\rho }_{\nu } (\vartheta )$, with $\boldsymbol {\rho }_{\nu } (\vartheta ) = \hat {\boldsymbol {b}}_0 \times \boldsymbol {v} / \varOmega _{\nu }$ the velocity-dependent vector gyroradius. Our analysis need only consider $\hat {\boldsymbol {b}}_0$; hence, we shall set $\hat {\boldsymbol {b}} \equiv \hat {\boldsymbol {b}}_0$ in the remainder of this paper for notational brevity. The gyrophase, $\vartheta$, characterises this gyro-motion, and has a large associated frequency $|\dot {\vartheta }| \approx \varOmega _{\nu }$. A natural approach is to average over these fast oscillations by introducing a gyroaverage, defined through

(2.11)\begin{gather} \langle h(\boldsymbol{r}) \rangle_{\boldsymbol{R}_{\nu}} = \langle h(\boldsymbol{R}_{\nu} + \boldsymbol{\rho}_{\nu} (\vartheta)) \rangle_{\boldsymbol{R}_{\nu}} = \frac{1}{2 {\rm \pi}} \int_0^{2{\rm \pi}} h(\boldsymbol{R}_{\nu} + \boldsymbol{\rho}_{\nu} (\vartheta)) \,\mathrm{d} \vartheta , \end{gather}

(2.12)\begin{gather}\langle h(\boldsymbol{R}_{\nu}) \rangle_{\boldsymbol{r}} = \langle h(\boldsymbol{r} - \boldsymbol{\rho}_{\nu} (\vartheta)) \rangle_{\boldsymbol{r}} = \frac{1}{2 {\rm \pi}} \int_0^{2{\rm \pi}} h(\boldsymbol{r} - \boldsymbol{\rho}_{\nu} (\vartheta)) \,\mathrm{d} \vartheta , \end{gather}

where $\boldsymbol {R}_{\nu }$ and $\boldsymbol {r}$ are held constant when performing the $\vartheta$ integrations in (2.11) and (2.12), respectively.

Expanding the Vlasov–Maxwell equations, then equating terms of order $\epsilon ^{-1}$ with respect to $v_{\textrm {th},i} F_{0,i} /L$, we obtain the equation $\partial F_{0, \nu } / \partial \vartheta = 0$, which demands $F_{0,\nu }$ be independent of gyrophase. In the presence of modest collisionalityFootnote ² zeroth-order terms provide the further constraint that the equilibrium component is a Maxwellian in velocities

(2.13)\begin{equation} F_{0,\nu} := \frac{n_{\nu}}{ ({\rm \pi} v_{\text{th},\nu}^2)^{{3}/{2}}} {\rm e}^{-v^2/v_{\text{th},\nu}^2}, \end{equation}

where $n_{\nu }$ represents the species density.

Equating terms ordered $\epsilon ^1$ gives the evolution equation for first-order perturbations. This exists within a six-dimensional phase space with coordinate choice $\{\boldsymbol {R}_{\nu }, v_{\parallel }, \mu _{\nu }, \vartheta \}$. Here, $\mu _{\nu }$ is the magnetic moment defined as $\mu _{\nu } = m_{\nu } v_{\perp }^2/ 2 B$ and is a conserved quantity to the order of consideration. Gyrophase dependence is removed by gyroaveraging the full equation, reducing the phase-space dimensionality by one. As a result the gyroaveraged fluctuating distribution function of guiding centres arises, denoted using $g_{\nu } (\boldsymbol {R}_{\nu }, v_{\parallel }, \mu _{\nu } ) \doteq \langle \delta f_{\nu } \rangle _{\boldsymbol {R}_{\nu }} = \delta f_{\nu } + (Z_{\nu } e/T_{\nu })F_{0,\nu }[\phi - \langle \phi \rangle _{\boldsymbol {R}_{\nu }}\! ]$, such that $g_{\nu }$ is gyrophase independent.

We expand the total time derivative in (2.1) in terms of partial derivatives in $\{t, v_{\parallel }, \mu _{\nu }, \vartheta, \boldsymbol {R}_{\nu } \}$ – with each partial derivative taken assuming all other variables are held fixed, unless explicitly stated otherwise – and then gyroaveraging over $\vartheta$ to reduce the dimensionality of our equations from 6 to 5. We define a ‘low-flow’, or ‘drift’, ordering to be when the flow speed is ordered as $\rho _*$ small compared with the thermal speed. In this ordering the resulting linear, electromagnetic gyrokinetic equation including collisions is

(2.14)\begin{gather} \frac{\partial{g_{\nu}}}{\partial{t}} + v_{{\parallel}} \left[ \hat{\boldsymbol{b}} \boldsymbol{\cdot} \boldsymbol{\nabla} g_{\nu} + \frac{Z_{\nu} e}{T_{\nu}} F_{0,\nu} \hat{\boldsymbol{b}} \boldsymbol{\cdot} \boldsymbol{\nabla} \langle \chi \rangle_{\boldsymbol{R}_{\nu}} \right] + \boldsymbol{v}_{M,\nu} \boldsymbol{\cdot} \left[ \boldsymbol{\nabla}_{{\perp}} g_{\nu} + \frac{Z_{\nu} e}{T_{\nu}} F_{0,\nu} \boldsymbol{\nabla}_{{\perp}} \langle \chi \rangle_{\boldsymbol{R}_{\nu}} \right] \nonumber\\ - \frac{\mu_{\nu}}{m_{\nu}} \hat{\boldsymbol{b}} \boldsymbol{\cdot} \boldsymbol{\nabla} B_0 \frac{\partial{g_{\nu}}}{\partial{v_{{\parallel}}}} + \langle \boldsymbol{v}_{\boldsymbol{\chi}} \rangle_{\boldsymbol{R}_{\nu}} \boldsymbol{\cdot} \left. \boldsymbol{\nabla} \right|_{\mathcal{E}} F_{0,\nu} + \frac{Z_{\nu} e}{T_{\nu}} \frac{\mu_{\nu}}{m_{\nu} c } \hat{\boldsymbol{b}} \boldsymbol{\cdot} \boldsymbol{\nabla} B_0 F_{0,\nu} \langle A_{{\parallel}} \rangle_{\boldsymbol{R}_{\nu}} = C^{(l)}_{\nu} , \end{gather}

where $C^{(l)}_{\nu } = {\sum }_{\nu '} C^{l}_{\nu, \nu '}$ is the linearised collision operator taken to be self-adjoint such that $C^{(l) {\dagger} }_{\nu } = C^{(l)}_{\nu }$.Footnote ³ We have introduced $\chi = \phi - \boldsymbol {v} \boldsymbol {\cdot } \boldsymbol {A}/c$ as the gyrokinetic potential, with $\boldsymbol {A} = A_{\parallel } \hat {\boldsymbol {b}} + \boldsymbol {A}_{\perp }$ the fluctuating magnetic vector potential, ($\delta \boldsymbol {B} = \boldsymbol {\nabla } \times {\boldsymbol {A}}$), which has been decomposed into components parallel and perpendicular to the equilibrium magnetic field. The gradient acting on the Maxwellian appears as $\boldsymbol {\nabla } |_{\mathcal {E}}$. This indicates that the derivative has been taken at constant kinetic energy, $\mathcal {E} = m_{\nu } v^2/2$, rather than at fixed $\{v_{\parallel }, \mu _{\nu } \}$ variables, in contrast to the other spatial gradients appearing in equation (2.14). Finally, $\boldsymbol {v}_{M,\nu }$ and $\boldsymbol {v}_{\boldsymbol {\chi }}$ are the magnetic and generalised $\boldsymbol {E} \times \boldsymbol {B}$ drifts defined through

(2.15)\begin{gather} \boldsymbol{v}_{M,\nu} = \frac{1}{\varOmega_{\nu}} \hat{\boldsymbol{b}} \times \left( \frac{\mu_{\nu}}{ m_{\nu}}\boldsymbol{\nabla} B + v_{{\parallel}}^2 \boldsymbol{\kappa} \right) , \end{gather}

(2.16)\begin{gather}\boldsymbol{v}_{\chi} = \frac{c}{B} \hat{\boldsymbol{b}} \times \boldsymbol{\nabla}_{{\perp}} \chi , \end{gather}

with $\boldsymbol {\kappa } = \hat {\boldsymbol {b}} \boldsymbol {\cdot } \boldsymbol {\nabla } \hat {\boldsymbol {b}}$ the equilibrium magnetic field curvature.

The system is closed by the field equations consisting of quasineutrality, ${\sum _{\nu } Z_{\nu } \delta n_{\nu } = 0}$, with $\delta n_{\nu }$ the perturbed density, and the low-frequency Ampère's law, $\boldsymbol {\nabla } \times \delta \boldsymbol {B} = (4{\rm \pi} / c) \delta \boldsymbol {J}$, with $\delta \boldsymbol {B}$ and $\delta \boldsymbol {J}$ the fluctuating magnetic field and current density, respectively. When written in terms of the distribution function these relations become

(2.17)\begin{gather} \sum_{\nu} Z_{\nu} \int \,\mathrm{d}^3 v \left[ \langle g_{\nu} \rangle_{\boldsymbol{R}_{\nu}} + \frac{Z_{\nu} e}{T_{\nu}} F_{0,\nu} \left( \left\langle \langle \chi \rangle_{\boldsymbol{R}_{\nu}} \right\rangle_{\boldsymbol{r}} - \phi \right) \right] = 0, \end{gather}

(2.18)\begin{gather}\boldsymbol{\nabla}_{{\perp}}^2 A_{{\parallel}} - \frac{4{\rm \pi}}{c} \sum_{\nu} Z_{\nu} e \int \,\mathrm{d}^3 v v_{{\parallel}} \left[ g_{\nu} + \frac{Z_{\nu}e}{T_{\nu} c} F_{0,\nu} v_{{\parallel}} \left\langle \langle A_{{\parallel}} \rangle_{\boldsymbol{R}_{\nu}} \right\rangle_{\boldsymbol{r}} \right] = 0, \end{gather}

(2.19)\begin{gather}\boldsymbol{\nabla}_{{\perp}} \delta B_{{\parallel}} - \frac{4{\rm \pi}}{c} \sum_{\nu} Z_{\nu} e \int \,\mathrm{d}^3 v \boldsymbol{\nabla} \boldsymbol{\cdot} \left[ \left( g_{\nu} + \frac{Z_{\nu}e}{T_{\nu}} F_{0,\nu} \left\langle \langle \chi \rangle_{\boldsymbol{R}_{\nu}} \right\rangle_{\boldsymbol{r}} \right)(\hat{\boldsymbol{b}} \times \boldsymbol{v}_{{\perp}} ) \right] = 0 , \end{gather}

where $\delta B_{\parallel } = \delta \boldsymbol {B}\boldsymbol {\cdot } \hat {\boldsymbol {b}} = (\boldsymbol {\nabla } \times \boldsymbol {A}_{\perp }) \boldsymbol {\cdot } \hat {\boldsymbol {b}}$ is the parallel component of the perturbed magnetic field.

2.1 Magnetic coordinates

We can express the magnetic field using the Clebsch representation

(2.20)\begin{equation} \boldsymbol{B} = \boldsymbol{\nabla} \alpha \times \boldsymbol{\nabla} \psi , \end{equation}

where $\psi$ and $\alpha$ are the flux surface and field line labels, respectively. We choose to work in field-aligned coordinates (Beer, Cowley & Hammett Reference Beer, Cowley and Hammett1995) denoted by $(x,y,z)$, with $z$ the position along the magnetic field line and $(x,y)$ the position in the plane perpendicular to $\hat {\boldsymbol {b}}$. Together, the unit vectors $\{ \hat {\boldsymbol {x}}, \hat {\boldsymbol {y}}, \hat {\boldsymbol {b}}\}$ form a left-handed, orthonormal basis. We relate the coordinates $(x,y)$ to $(\psi, \alpha )$ using

(2.21)\begin{equation} \left. \begin{aligned} x & = \frac{{\rm d} x}{{\rm d} \psi} (\psi - \psi_0) , \\ y & = \frac{{\rm d} y}{{\rm d} \alpha} (\alpha - \alpha_0) , \end{aligned} \right\} \end{equation}

with $(\psi _0, \alpha _0)$ denoting the values at the centre of the domain.

We take the discrete Fourier transform in $\{x,y\}$ whilst retaining the real-space coordinate for the parallel direction, and evaluate nonlinear terms pseudo-spectrally in order to retain spectral accuracy in spatial derivatives when implementing the equations numerically.Footnote ⁴ The discrete Fourier transform in $\{x,y\}$

(2.22)\begin{equation} g_{\nu} (x,y,z, v_{{\parallel}}, \mu_{\nu},t) = \sum_{k_x, k_y} \hat{g}_{\boldsymbol{k},\nu} (z,v_{{\parallel}}, \mu_\nu,t) e^{{({\rm i}k_x x+ {\rm i} k_y y)}}, \end{equation}

is justified provided the condition $k_x \sim k_y \gg 1/L$ is satisfied. This translates to a requirement that the turbulent fluctuations at the edges of our domain are decorrelated, such that they may be considered statistically identical, and periodic boundary conditions can be enforced in $(x,y)$ (Beer et al. Reference Beer, Cowley and Hammett1995).

2.2 System equations

We next transform our evolution equations (2.14), and (2.17)–(2.19). The definition of $\chi$ is used to find the Fourier transform of the quantity $\langle \chi \rangle _{\boldsymbol {R}_{\nu }} = \langle \phi \rangle _{\boldsymbol {R}_{\nu }} - v_{\parallel } \langle A_{\parallel } \rangle _{\boldsymbol {R}_{\nu }}/c - \langle \boldsymbol {v}_{\perp } \boldsymbol {\cdot } \boldsymbol {A}_{\perp } \rangle _{\boldsymbol {R}_{\nu }}/c$

(2.23)\begin{equation} \mathcal{F}_{\boldsymbol{k}} \left[ \left\langle \chi \right\rangle_{\boldsymbol{R}_{\nu}} \right] \doteq \hat{\chi}_{\boldsymbol{k}, \nu} = \left[ {\rm J}_{0,\boldsymbol{k}, \nu} \hat{\phi}_{\boldsymbol{k}} - \frac{v_{{\parallel}}}{c} {\rm J}_{0,\boldsymbol{k}, \nu} \hat{A}_{{\parallel}, \boldsymbol{k}} + 2 \frac{\mu_{\nu}}{Z_{\nu} e} \frac{{\rm J}_{1,\boldsymbol{k},\nu} }{a_{\boldsymbol{k},\nu }} \delta \hat{B}_{{\parallel}, \boldsymbol{k}} \right] , \end{equation}

where we define $\mathcal {F}_{\boldsymbol {k}} [ \langle \chi \rangle _{\boldsymbol {R}_{\nu }} ] \doteq \hat {\chi }_{\boldsymbol {k}, \nu }$ to be the Fourier components of $\langle \chi \rangle _{\boldsymbol {R}_{\nu }}$. The variables $\textrm {J}_{n, \boldsymbol {k}, \nu }$ above are the $n$th-order Bessel functions of the first kind for species $\nu$. The Bessel functions arise naturally as a result of the gyroaverages that appear in (2.14), and have argument $a_{\boldsymbol {k}, \nu } = k_{\perp } v_{\perp } / \varOmega _{\nu }$. Utilising this, we can write each of our functional operators in terms of $\{ \hat {g}_{\boldsymbol {k},\nu }, \hat {\phi }_{\boldsymbol {k}}, \hat {A}_{\parallel, \boldsymbol {k}}, \delta \hat {B}_{\parallel, \boldsymbol {k}} \}$. The resulting gyrokinetic equation is

(2.24)\begin{align} \hat{G}_{\boldsymbol{k},\nu} & = \frac{\partial{\hat{g}_{\boldsymbol{k},\nu}}}{\partial{t}} + v_{{\parallel}} \hat{\boldsymbol{b}} \boldsymbol{\cdot} \boldsymbol{\nabla} z \left[\frac{\partial{\hat{g}_{\boldsymbol{k},\nu}}}{\partial{z}} + \frac{Z_{\nu} e}{T_{\nu}} \frac{\partial{\hat{\chi}_{\boldsymbol{k}, \nu}}}{\partial{z}} F_{0,\nu} \right] \nonumber\\ & \quad + {\rm i} \omega_{*,\boldsymbol{k},\nu} F_{0,\nu} \hat{\chi}_{\boldsymbol{k}, \nu} + {\rm i} \omega_{d,\boldsymbol{k},\nu} \left[ \hat{g}_{\boldsymbol{k},\nu} + \frac{Z_{\nu} e}{T_{\nu}} \hat{\chi}_{\boldsymbol{k}, \nu} F_{0,\nu} \right] \nonumber\\ & \quad - \frac{\mu_{\nu}}{m_{\nu}} (\hat{\boldsymbol{b}} \boldsymbol{\cdot} \boldsymbol{\nabla} B_0) \frac{\partial{\hat{g}_{\boldsymbol{k},\nu}}}{\partial{v_{{\parallel}}}} + \frac{Z_{\nu} e}{T_{\nu}} \frac{\mu_{\nu}}{m_{\nu} c } (\hat{\boldsymbol{b}} \boldsymbol{\cdot} \boldsymbol{\nabla} B_0) F_{0,\nu} {\rm J}_{0,\boldsymbol{k}, \nu} \hat{A}_{{\parallel}, \boldsymbol{k}} - \hat{C}_{\boldsymbol{k},\nu} [\hat{g}_{\boldsymbol{k},\nu}] \nonumber\\ & = 0. \end{align}

The drift frequencies, $\omega _{d,\nu }$ and $\omega _{*,\nu }$, correspond to the magnetic drift frequencies resulting from the gradient and curvature of the magnetic field and the diamagnetic drift respectively, and are given by

(2.25)\begin{gather} \omega_{d, \boldsymbol{k}, \nu} = \frac{1}{\varOmega_{\nu}} (v_{{\parallel}}^2 \boldsymbol{v}_{\kappa} + \mu_{\nu} \boldsymbol{v}_{\boldsymbol{\nabla} B} ) \boldsymbol{\cdot} ( k_x \boldsymbol{\nabla} x + k_y \boldsymbol{\nabla} y ) , \end{gather}

(2.26)\begin{gather}\omega_{*,\boldsymbol{k}, \nu} = \frac{c k_y } {B_{0} } \frac{{\rm d} y}{{\rm d} \alpha} \frac{{\rm d} \ln F_{0,\nu}}{{\rm d} \psi} , \end{gather}

with $\boldsymbol {v}_{\kappa } = \hat {\boldsymbol {b}} \times ( \hat {\boldsymbol {b}} \boldsymbol {\cdot } \boldsymbol {\nabla } \hat {\boldsymbol {b}} )$, and $\boldsymbol {v}_{\boldsymbol {\nabla } B} = \hat {\boldsymbol {b}} \times \boldsymbol {\nabla } B$.

The distribution function and field quantities are time-dependent functions, and their evolution equations admit normal mode solutions. For a generic set of initial conditions, the fastest growing (or slowest decaying) mode is likely to have an initial amplitude that is comparable to or smaller than other normal modes in the system. This results in an initial transient period due to the competition between these different normal modes, followed by exponential growth or decay once the fastest growing (or slowest decaying) mode has a much larger amplitude than all others. Therefore, we decompose $\hat {f}_{\boldsymbol {k}} = \sum _{j} \bar {f}_{\boldsymbol {k},j} \textrm {e}^{\gamma _{\boldsymbol {k},j} t}$Footnote ⁵ with $\gamma _{\boldsymbol {k},j} \in \mathbb {C}$ the complex frequencies, and $\bar {f}_{\boldsymbol {k},j}$ the amplitude for the $j$th normal mode.Footnote ⁶ Quasineutrality and Ampère's law, along with the linearity of the gyrokinetic equation, ensure that all fluctuating quantities for an equivalent normal mode share the same complex frequency and will thus exhibit the same time-dependent behaviour during the period of exponential growth or decay. After a sufficiently long period of time (the exact length of which will depend on both the relative growth rates and starting amplitudes of each mode) the fastest growing mode will dominate, meaning we can approximate the time dependent behaviour after large times using a single temporal mode; $\hat {f}_{\boldsymbol {k}} \approx \bar {f}_{\boldsymbol {k},0} \textrm {e}^{\gamma _{\boldsymbol {k}, 0} \tilde {t}}$, with $\mathrm {Re}(\gamma _{\boldsymbol {k},0}) > \mathrm {Re}(\gamma _{\boldsymbol {k},j})$, $\forall j$, $j \neq 0$. Using this normal mode decomposition in the gyrokinetic equation and considering behaviour beyond the transient period gives

(2.27)\begin{align} \hat{G}_{\boldsymbol{k},\nu} & = \gamma_{\boldsymbol{k}, 0} \bar{g}_{\boldsymbol{k} , \nu, 0}+ v_{{\parallel}} \hat{\boldsymbol{b}} \boldsymbol{\cdot} \boldsymbol{\nabla} z \left[\frac{\partial{\bar{g}_{\boldsymbol{k} , \nu, 0}}}{\partial{z}} + \frac{Z_{\nu} e}{T_{\nu}} \frac{\partial{\bar{\chi}_{\boldsymbol{k}, 0}}}{\partial{z}} F_{0,\nu} \right] \nonumber\\ & \quad + {\rm i} \omega_{*,\boldsymbol{k},\nu} F_{0,\nu} \bar{\chi}_{\boldsymbol{k}, 0} + {\rm i} \omega_{d,\boldsymbol{k},\nu} \left[ \bar{g}_{\boldsymbol{k} , \nu, 0} + \frac{Z_{\nu} e}{T_{\nu}} \bar{\chi}_{\boldsymbol{k}, 0} F_{0,\nu} \right] \nonumber\\ & \quad - \mu_{\nu} \hat{\boldsymbol{b}} \boldsymbol{\cdot} \boldsymbol{\nabla} B_0 \frac{\partial{\bar{g}_{\boldsymbol{k} , \nu, 0}}}{\partial{v_{{\parallel}}}} + \frac{Z_{\nu} e}{T_{\nu}} \frac{\mu_{\nu}}{m_{\nu} c } (\hat{\boldsymbol{b}} \boldsymbol{\cdot} \boldsymbol{\nabla} B_0) F_{0,\nu} {\rm J}_{0,\boldsymbol{k}, \nu} \bar{A}_{{\parallel}, \boldsymbol{k}, 0} - \hat{C}_{\boldsymbol{k},\nu} [\bar{g}_{\boldsymbol{k} , \nu, 0}]. \end{align}

The corresponding transformed, normalised field equations are $\hat {Q}_{\boldsymbol {k}} = \hat {M}_{\boldsymbol {k}} = \hat {N}_{\boldsymbol {k}} = 0$, where

(2.28)\begin{align} \hat{Q}_{\boldsymbol{k}} & = \sum_{\nu} Z_{\nu} e \left\{ \frac{2 {\rm \pi}B_0}{m_{\nu} } \int{\mathrm{d}^2 v \, {\rm J}_{0,\boldsymbol{k}, \nu} \bar{g}_{\boldsymbol{k} , \nu, 0} } + \frac{Z_{\nu} e n_{\nu} }{T_{\nu}} \left( \varGamma_{0, \boldsymbol{k}, \nu} - 1 \right) \bar{\phi}_{\boldsymbol{k}, 0} + \frac{n_{\nu} }{B_0} \varGamma_{1, \boldsymbol{k}, \nu} \delta \bar{B}_{{\parallel}, \boldsymbol{k}, 0} \right\} , \end{align}

(2.29)\begin{align} \hat{M}_{\boldsymbol{k}} & =- \frac{4 {\rm \pi}}{k_{{\perp}}^2 c} \sum_{\nu} Z_{\nu} e \frac{2 {\rm \pi}B_0}{m_{\nu} } \int \,\mathrm{d}^2 v v_{{\parallel}} {\rm J}_{0,\boldsymbol{k}, \nu} \bar{g}_{\boldsymbol{k} , \nu, 0} \nonumber\\ & \quad + \left[1 + \frac{4 {\rm \pi}}{ k_{{\perp}}^2 c^2} \sum_{\nu} \frac{ (Z_{\nu} e)^2 n_{\nu}}{m_{\nu}} \varGamma_{0,\boldsymbol{k}, \nu} \right] \bar{A}_{{\parallel}, \boldsymbol{k}, 0} , \end{align}

and

(2.30) \begin{align} &\hat{N}_{\boldsymbol{k}} = 8 {\rm \pi}\sum_{\nu} \frac{2 {\rm \pi}B_0}{m_{\nu} } \int \,\mathrm{d}^2 v \frac{{\rm J}_{1, \boldsymbol{k}, \nu} }{a_{\boldsymbol{k}, \nu}} \mu_{\nu} \bar{g}_{\boldsymbol{k} , \nu, 0} + \left[ 4 {\rm \pi} \sum_{\nu} \frac{Z_{\nu} e n_{\nu}}{B_0} \varGamma_{1,\boldsymbol{k}, \nu} \right] \bar{\phi}_{\boldsymbol{k}, 0} \nonumber\\ & + \left[1+ 16 {\rm \pi}\sum_{\nu} \frac{ n_{\nu} T_{\nu}}{B_0^2} \varGamma_{2,\boldsymbol{k}, \nu} \right] \delta \bar{B}_{{\parallel}, \boldsymbol{k}, 0} , \end{align}

with $\int \,\mathrm {d}^2 v \doteq \int \,\mathrm {d} \mu \int \,\mathrm {d} v_{\parallel }$. In the above we have defined

(2.31)\begin{gather} \varGamma_{0,\boldsymbol{k}, \nu} = {\rm I}_0 (\alpha_{\boldsymbol{k},\nu} ) {\rm e}^{- \alpha_{\boldsymbol{k},\nu} }, \end{gather}

(2.32)\begin{gather}\varGamma_{1,\boldsymbol{k},\nu} = [{\rm I}_0(\alpha_{\boldsymbol{k},\nu} ) - {\rm I}_1(\alpha_{\boldsymbol{k},\nu} ) ] {\rm e}^{- \alpha_{\boldsymbol{k},\nu} }, \end{gather}

(2.33)\begin{gather}\varGamma_{2,\boldsymbol{k},\nu} = {\rm I}_1 (\alpha_{\boldsymbol{k},\nu}) {\rm e}^{- \alpha_{\boldsymbol{k},\nu} }, \end{gather}

where $\textrm {I}_0$ and $\textrm {I}_1$ are modified Bessel functions of the first kind, and $\alpha _{\boldsymbol {k}, \nu } = k_{\perp }^2 \rho _{\nu }^2 /2$.

3 Adjoint method for gyrokinetics

The adjoint-based optimisation method is a tool that, at its heart, efficiently calculates derivatives of a desired quantity with respect to a potentially large number of parameters. The associated computational cost depends only on the expense of solving both the objective functional and adjoint system of equations, and is essentially independent of the dimension of the parameter space. Adjoint methods have already been successfully applied to certain other aspects of MCF devices, such as optimising coil configurations for stellarator geometries (see, e.g. Paul et al. Reference Paul, Landreman, Bader and Dorland2018; Geraldini, Landreman & Paul Reference Geraldini, Landreman and Paul2021; Nies et al. Reference Nies, Paul, Hudson and Bhattacharjee2022). The novelty here is to apply the adjoint method to geometric optimisation and plasma microstability, which includes the complexity of the full linearised gyrokinetic system.

We now turn to the gyrokinetic equation and the objective of minimising the dominant linear growth rate, $\gamma _{\boldsymbol {k}, 0}$, with respect to a set of currently unspecified parameters $\{ p_i \}$, which we take to be the components of vector $\boldsymbol {p}$. This section outlines how one can take advantage of the adjoint method to efficiently obtain the gradient $\boldsymbol {\nabla }_{\boldsymbol {p}} \gamma _{\boldsymbol {k}, 0}$. Although a finite-difference scheme could be used to obtain such a gradient, this becomes computationally expensive when the dimension of $\boldsymbol {p}$, $\mathcal {N}_{\boldsymbol {p}}$, is large: for each gradient a finite-difference scheme demands we solve the system equations $\mathcal {N}_{\boldsymbol {p}} + 1$ times. In contrast, the adjoint method allows us to solve the system equations only once, and in exchange one must solve the set of adjoint equations, for which the cost is computationally equivalent to the original system equations.

We start by considering the general case of low-flow, linear, electromagnetic, $\delta f$-gyrokinetics including collisions, with equations (2.27)–(2.30) as the functional operators defining our system in the long time limit.

3.1 General formalism

Since our equations are limited to the linear regime, there is no coupling of different Fourier modes and it is thus possible to consider each perpendicular wavenumber individually. Given that we are also considering the post-transient limit with only one dominant growth rate, it will henceforth be assumed that only a single perpendicular wavenumber is being considered, and we will thus drop the $\boldsymbol {k}$ subscript, along with the subscript that denotes the dominant mode for the distribution function and field quantities. For notational brevity we shall also drop the overbars that appear on the distribution function and fields that denoted normal mode decomposition. Hence, everywhere $g_{\nu }$, $\phi$, $A_{\parallel }$ and $\delta B_{\parallel }$ appear it shall be assumed that they contain suppressed spatial and temporal Fourier subscripts.

Consider now a set of variables that influences our system and which exists within a parameter space spanned by all possible $\boldsymbol {p}$. We take the set $\{ p_i\}$, $i \in [1, \mathcal {N}_p ]$, to be linearly independent, with no time variation, and explore how variations within this space affect the linear growth rate. At present we need not specify which variables are denoted by $\boldsymbol {p}$, and thus derive a general set of adjoint equations for the gyrokinetic-Maxwell system above, (2.27)–(2.30).

Our objective functional, $\hat {G}_{\nu }$, is a linear functional of $\{g_{\nu }, \phi, A_{\parallel }, \delta B_{\parallel } \}$, which are coupled to the fields through the field equations, (2.28)–(2.30). When taking derivatives of $G_{\nu }$ we invariably end up with derivatives acting on all four of these variables. This is undesirable because it requires we calculate the gradients of $\hat {g}_{\nu }$ and the fields, which in turn requires us to solve the gyrokinetic system $\mathcal {N}_p + 1$ times, as discussed above. In order to eliminate these four derivatives we use something akin to the method of Lagrange multipliers and introduce four adjoint variables which multiply the corresponding constraint equations. The optimisation Lagrangian is thus

(3.1)\begin{equation} \mathcal{L} := \left\langle \hat{G}_{\nu}, \lambda_{\nu}\right\rangle_{z, v, {\nu}} + \left\langle \hat{Q}, \xi\right\rangle_{z}+ \left\langle \hat{M}, \zeta \right\rangle_{z}+ \left\langle \hat{N}, \sigma\right\rangle_{z} , \end{equation}

with the angle brackets representing inner products defined throughFootnote ⁷

(3.2)\begin{equation} \left. \begin{aligned} \left\langle a, b\right\rangle_{z} & = \int{ \frac{\mathrm{d} z }{B_0 \,\hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z} a b^* } , \quad \left\langle a, b\right\rangle_{v, {\nu}} = \sum_{\nu} \frac{2 {\rm \pi}B_0}{m_{\nu} } \int \,\mathrm{d}^2 v a b^* ,\\ \left\langle a, b\right\rangle_{z, v, {\nu}} & = \sum_{\nu} \int \frac{\mathrm{d} z}{B_0 \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z } \frac{2{\rm \pi} B_0}{m_{\nu} } \int \,\mathrm{d}^2 v a b^* . \end{aligned} \right\} \end{equation}

We have introduced the set of adjoint variables $\lambda _{\nu }$, $\xi$, $\zeta$ and $\sigma$, whose forms are to be determined.Footnote ⁸ We identify $\lambda _{\nu }$ as the adjoint variable to the distribution function, $g_{\nu }$, whereas $\xi$, $\zeta$ and $\sigma$ are adjoint to the field variables (referred to henceforth as adjoint fields). The quantities $\hat {G}_{\nu }$, $\hat {Q}$, $\hat {M}$ and $\hat {N}$ are our objective functions, and we remark that since $\hat {G}_{\nu }= \hat {Q} = \hat {M} = \hat {N} = 0$ for a consistent set of $\{ \boldsymbol {p}_0, g_{\nu }(\boldsymbol {p}_0), \phi (\boldsymbol {p}_0), A_{\parallel } (\boldsymbol {p}_0), \delta B_{\parallel } (\boldsymbol {p}_0)\}$ we have $\mathcal {L}|_{\boldsymbol {p}_0} = 0$ for all choices of adjoint variables. For later convenience we decompose the functional operators into their components that act on $g_{\nu }$, $\phi$, $A_{\parallel }$ and $\delta B_{\parallel }$ separately

(3.3)\begin{align} \hat{G}_{\nu} [\boldsymbol{p}; g_{\nu}, \phi, A_{{\parallel}}, \delta B_{{\parallel}}] & = \hat{G}_{g, \nu} [\boldsymbol{p} ; g_{\nu}] + \hat{G}_{\phi, \nu} [\boldsymbol{p} ; \phi] + \hat{G}_{A_{{\parallel}}, \nu} [\boldsymbol{p} ; A_{{\parallel}}] + \hat{G}_{B_{{\parallel}}, \nu} [\boldsymbol{p} ;\delta B_{{\parallel}}]\nonumber\\ & \quad - \hat{C}_{\nu}[\boldsymbol{p};g_{\nu}], \end{align}

(3.4)\begin{align}\hat{Q} [\boldsymbol{p} ; g_{\nu}, \phi, \delta B_{{\parallel}}] & = \left\langle \hat{Q}_{g, \nu} [\boldsymbol{p} ; g_{\nu}], \mathbb{I}\right\rangle_{v, {\nu}} + \hat{Q}_{\phi} [\boldsymbol{p} ; \phi ] + \hat{Q}_{B_{{\parallel}}} [\boldsymbol{p} ; \delta B_{{\parallel}}], \end{align}

(3.5)\begin{align}\hat{M} [\boldsymbol{p} ; g_{\nu}, A_{{\parallel}}] & = \left\langle \hat{M}_{g,\nu} [\boldsymbol{p} ; g_{\nu}] , \mathbb{I} \right\rangle_{v, {\nu}} + \hat{M}_{A_{{\parallel}}} [\boldsymbol{p} ; A_{{\parallel}} ], \end{align}

(3.6)\begin{align}\hat{N} [\boldsymbol{p} ; g_{\nu}, \phi, \delta B_{{\parallel}} ] & = \left\langle \hat{N}_{g,\nu} [\boldsymbol{p} ; g_{\nu} ], \mathbb{I}\right\rangle_{v, {\nu}} + \hat{N}_{\phi} [\boldsymbol{p} ; \phi ] + \hat{N}_{B_{{\parallel}}} [\boldsymbol{p}; \delta B_{{\parallel}}] , \end{align}

with $\mathbb {I}$ simply equal to one. Explicit expressions for these operators are given in Appendix A.

We next consider taking the gradient of (3.1) with respect to the variables in our parameter space, $\boldsymbol {p}$. We now isolate all terms multiplying derivatives of $\{g_{\nu }, \phi, A_{\parallel }, \delta B_{\parallel } \}$ so that each of their coefficients can be set to zero. We are at liberty to do this because of the freedom that exists in choosing the adjoint variables introduced in (3.1). The gradient of (3.1) is expanded using (3.3)–(3.6)

(3.7)\begin{align} \boldsymbol{\nabla}_{\boldsymbol{p}} \mathcal{L} & = \partial_{\boldsymbol{p}} \mathcal{L} + \left\langle \boldsymbol{\nabla}_{\boldsymbol{p}} g_{\nu} , \hat{G}^{{\dagger}}_{g,\nu} [\boldsymbol{p}; \lambda_{\nu}] - \hat{C}^{{\dagger}}_{\nu} [\boldsymbol{p} ; \lambda_{\nu}] + \hat{Q}^{{\dagger}}_{g,\nu} [\boldsymbol{p}; \xi] + \hat{M}^{{\dagger}}_{g,\nu} [\boldsymbol{p}; \zeta] + \hat{N}^{{\dagger}}_{g,\nu} [\boldsymbol{p}; \sigma]\right\rangle_{z, v, {\nu}} \nonumber\\ & \quad + \left\langle \boldsymbol{\nabla}_{\boldsymbol{p}} \phi , \hat{G}^{{\dagger}}_{\phi,\nu} [\boldsymbol{p} ; \lambda_{\nu}] + \hat{Q}^{{\dagger}}_{\phi} [\boldsymbol{p} ; \xi] + \hat{N}^{{\dagger}}_{\phi} [\boldsymbol{p} ; \sigma] \right\rangle_{z} + \left\langle \boldsymbol{\nabla}_{\boldsymbol{p}} A_{{\parallel}} , \hat{G}^{{\dagger}}_{A_{{\parallel}},\nu} [\boldsymbol{p} ; \lambda_{\nu}] + \hat{M}^{{\dagger}}_{A_{{\parallel}}} [\boldsymbol{p} ; \zeta]\right\rangle_{z} \nonumber\\ & \quad + \left\langle \boldsymbol{\nabla}_{\boldsymbol{p}} \delta B_{{\parallel}} , \hat{G}^{{\dagger}}_{B_{{\parallel}},\nu} [\boldsymbol{p} ; \lambda_{\nu}] + \hat{Q}^{{\dagger}}_{B_{{\parallel}}} [\boldsymbol{p} ; \xi] + \hat{N}^{{\dagger}}_{B_{{\parallel}}} [\boldsymbol{p} ; \sigma] \right\rangle_{z} + \mathcal{B} . \end{align}

Here, the daggers that appear on the functional operators denote the adjoint of those operators with respect to the relevant inner product. The partial derivative, $\partial _{\boldsymbol {p}}$, is taken at fixed $g_{\nu }$, $\phi$, $A_{\parallel }$ and $\delta B_{\parallel }$. Note that it acts on the inner product itself in addition to the terms within it to account for the $\boldsymbol {p}$-dependence of the JacobiansFootnote ⁹ and functional operators. The term ‘$\mathcal {B}$’ accounts for boundary terms that arise when we integrate by parts to invert operators such as $\left \langle \boldsymbol {\nabla }_{\boldsymbol {p}} \partial _z g_{\nu } , \lambda _{\nu }\right \rangle _{z}$ onto the adjoint variables, $\left \langle \boldsymbol {\nabla }_{\boldsymbol {p}} g_{\nu }, \partial _z \lambda _{\nu }\right \rangle _{z}$, and is given by

(3.8)\begin{align} \mathcal{B} & = \sum_{\nu} \frac{2 {\rm \pi}B_0}{m_{\nu} } \int \,\mathrm{d}^2 v v_{{\parallel}} \lambda_{\nu}^* \left[ \boldsymbol{\nabla}_{\boldsymbol{p}} g_{\nu} + \frac{Z_{\nu} e}{T_{\nu} } {\rm J}_{0,\nu} F_{0,\nu} \boldsymbol{\nabla}_{\boldsymbol{p}} \phi \right. \nonumber\\ & \quad \left. \left.- 2 \frac{Z_{\nu} e}{T_{\nu} } v_{{\parallel}} {\rm J}_{0,\nu} F_{0,\nu} \boldsymbol{\nabla}_{\boldsymbol{p}} A_{{\parallel}} + 4 \mu_{\nu} \frac{{\rm J}_{1,\nu}}{a_{\nu}} F_{0,\nu} \boldsymbol{\nabla}_{\boldsymbol{p}} \delta B_{{\parallel}} \right] \right|_{z =-\infty}^{z = \infty} \nonumber\\ & \quad \left. - \sum_{\nu} \frac{2{\rm \pi} B_0}{m_{\nu} } \int \,\mathrm{d} z \int \,\mathrm{d} \mu\, \mu_{\nu} \frac{\partial{B_0}}{\partial{z}} \lambda_{\nu}^* \boldsymbol{\nabla}_{\boldsymbol{p}} g_{\nu} \right|_{v_{{\parallel}} =- \infty}^{v_{{\parallel}} = \infty} . \end{align}

We can conveniently set these terms to zero by applying appropriate restrictions on our adjoint variables. These terms consequently define the boundary conditions we apply to our adjoint variables. The incoming boundary conditions along the magnetic field on $g_{\nu }$ are taken to be $g_{\nu }(z\rightarrow -\infty, v_{\parallel } > 0, \mu _{\nu }), g_{\nu }(z \rightarrow \infty, v_{\parallel } <0, \mu _{\nu }) \rightarrow 0$, independently of $\boldsymbol {p}$, such that $d_{\boldsymbol {p}} g_{\nu } = 0$ at these limits. We impose the boundary condition on $\lambda _{\nu }^*$ to be $\lambda _{\nu }^* (z \rightarrow -\infty, v_{\parallel } <0, \mu _{\nu }), \lambda _{\nu }^* (z \rightarrow \infty,v_{\parallel }>0, \mu _{\nu }) \rightarrow 0$ in order to eliminate the boundary term arising from the $z$ integration by parts, and hence remove the need to calculate $\boldsymbol {\nabla }_{\boldsymbol {p}} \{g_{\nu },\phi,A_{\parallel },\delta B_{\parallel } \}$ at the boundaries in $z$. The boundary term arising from integration by parts in $v_{\parallel }$ is automatically satisfied as it is assumed that $g_{\nu }(z, v_{\parallel } \rightarrow \pm \infty, \mu _{\nu }) = 0$, $\forall \{z, \mu _{\nu } \}$ independently of $\boldsymbol {p}$. However, it is convenient to impose that $\lambda _{\nu }^* (z, v_{\parallel } \rightarrow \pm \infty, \mu _{\nu }) = 0$ such that $\lambda _{\nu }^*$ and $g_{\nu }$ satisfy similar boundary conditions, whilst also ensuring $\lambda _{\nu }$ is sensibly defined and normalisable.Footnote ¹⁰ The substitution $\lambda ^{\leftrightarrow, *}_{\nu } = \lambda _{\nu }^*(z,-v_\parallel,\mu _{\nu })$ is made such that the adjoint equations more closely resemble those in the original gyrokinetic system. This redefines the $z$-boundary condition on the adjoint variable $\lambda ^{\leftrightarrow, *}_{\nu } (z \rightarrow - \infty, v_{\parallel } > 0, \mu _{\nu }), \lambda ^{\leftrightarrow, *}_{\nu } (z \rightarrow \infty, v_{\parallel } < 0, \mu _{\nu }) \rightarrow 0$, which now mirrors those satisfied by $g_{\nu }$.

Setting the remaining coefficients of $\boldsymbol {\nabla }_{\boldsymbol {p}} g_{\nu }$, $\boldsymbol {\nabla }_{\boldsymbol {p}} \phi$, $\boldsymbol {\nabla }_{\boldsymbol {p}} A_{\parallel }$ and $\boldsymbol {\nabla }_{\boldsymbol {p}} \delta B_{\parallel }$ in (3.7) equal to zero yields the constraint equations for the adjoint variables

(3.9)\begin{gather} \hat{G}^{{\dagger}}_{g,\nu} [ \boldsymbol{p} ; \lambda^{\leftrightarrow}_{\nu}] + \hat{Q}^{{\dagger}}_{g,\nu} [\boldsymbol{p} ; \xi] + \hat{M}^{{\dagger}}_{g,\nu} [\boldsymbol{p}; \zeta ] + \hat{N}^{{\dagger}}_{g,\nu} [\boldsymbol{p} ; \sigma] - \hat{C}^{{\dagger}}_{\nu} [\boldsymbol{p} ; \lambda^{\leftrightarrow}_{\nu}] = 0 , \end{gather}

(3.10)\begin{gather}\langle \hat{G}^{{\dagger}}_{\phi,\nu} [\boldsymbol{p} ; \lambda^{\leftrightarrow}_{\nu} ] \rangle_{v, {\nu}} + \hat{Q}^{{\dagger}}_{\phi} [ \boldsymbol{p} ; \xi ] + \hat{N}^{{\dagger}}_{\phi} [\boldsymbol{p} ; \sigma] = 0 , \end{gather}

(3.11)\begin{gather}\langle \hat{G}^{{\dagger}}_{A_{{\parallel}},\nu} [ \boldsymbol{p} ; \lambda^{\leftrightarrow}_{\nu}] \rangle_{v, {\nu}} + \hat{M}^{{\dagger}}_{A_{{\parallel}}} [\boldsymbol{p} ; \zeta] = 0 , \end{gather}

(3.12)\begin{gather}\langle \hat{G}^{{\dagger}}_{B_{{\parallel}},\nu} [ \boldsymbol{p} ; \lambda^{\leftrightarrow}_{\nu}] \rangle_{v, {\nu}} + \hat{Q}^{{\dagger}}_{B_{{\parallel}}} [\boldsymbol{p} ; \xi] + \hat{N}^{{\dagger}}_{B_{{\parallel}}} [ \boldsymbol{p} ; \sigma] = 0 . \end{gather}

The expressions for these adjoint operators are stated in Appendix B. The $z$-derivatives which appear in the $\hat {G}_{\nu }$ operators, under the velocity integrals, in (3.10)–(3.12) pose a potential difficulty; to calculate the adjoint fields, information is required for $\lambda _{\nu }^*$ at all $z$, for both positive and negative velocities. Given the boundary conditions on $\lambda _{\nu }$ and the propagation of information by advection this information is not readily available. This is akin to the problem faced when solving the gyrokinetic system; an incoming boundary condition is imposed on the distribution function at $z \rightarrow \pm \infty$ when $v_{\parallel } <0$ and $v_{\parallel } > 0$, respectively. The outgoing boundary information is not known a priori but must instead be solved for. Anticipating the difficulties this will create for numerical solution, we take moments of (3.9) are taken to simplify the adjoint equations. This simplifies the adjoint equations, bringing them into a form more closely resembling the that of the gyrokinetic field equation. It should be emphasised that we retain a fully kinetic treatment in doing so.

A summary of this calculation can be found in Appendix C, and the result after algebraic manipulation is

(3.13)\begin{gather} \gamma^* \lambda^{\leftrightarrow}_{\nu} + v_{{\parallel}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{\lambda^{\leftrightarrow}_{\nu}}}{\partial{z}} - \frac{\mu_{\nu}}{m_{\nu}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{B_0}}{\partial{z}} \frac{\partial{\lambda^{\leftrightarrow}_{\nu}}}{\partial{v_{{\parallel}}}} - {\rm i} \omega_{d, \nu} \lambda^{\leftrightarrow}_{\nu} + Z_{\nu} e {\rm J}_{0,\nu} \xi\nonumber\\ - \frac{4 {\rm \pi}}{k_{{\perp}}^2} Z_{\nu} e {\rm J}_{0,\nu} \frac{v_{{\parallel}}}{c} \zeta + 8 {\rm \pi}\frac{{\rm J}_{1,\nu}}{a_{\nu}} \mu_{\nu} \sigma - \hat{C}_{\nu}[\lambda^{\leftrightarrow}_{\nu}] = 0 , \end{gather}

(3.14)\begin{gather}\xi + \frac{1}{\eta} \sum_{\nu} \frac{2 {\rm \pi}B_0}{m_{\nu} } \int \,\mathrm{d}^2 v \left[{\rm i} \omega_{*,\nu} + \frac{Z_{\nu}e}{T_{\nu}} \gamma^* \right] {\rm J}_{0,\nu} F_{0,\nu} \lambda^{\leftrightarrow}_{\nu} =0 , \end{gather}

(3.15)\begin{gather}\zeta - \frac{1}{k_{{\perp}}^2 } \sum_{\nu} \frac{2 {\rm \pi}B_0}{m_{\nu} } \int \,\mathrm{d}^2 v \frac{v_{{\parallel}}}{c} \left[ {\rm i} \omega_{*,\nu} + \frac{Z_{\nu}e}{T_{\nu}} \gamma^* \right] {\rm J}_{0,\nu} F_{0,\nu} \lambda^{\leftrightarrow}_{\nu} = 0 , \end{gather}

(3.16)\begin{gather}\sigma - \sum_{\nu} \frac{2 {\rm \pi}B_0}{m_{\nu} } \int \,\mathrm{d}^2 v \left( 2 \frac{\mu_{\nu} }{Z_{\nu} e}\frac{{\rm J}_{1,\nu}}{a_{\nu}} \right) \left[ {\rm i} \omega_{*,\nu} + \frac{Z_{\nu} e}{T_{\nu} } \gamma^* \right] F_{0,\nu} \lambda^{\leftrightarrow}_{\nu} = 0 , \end{gather}

with $\eta = \sum _{\nu } (Z_{\nu } e)^2 n_{\nu }/ T_{\nu }$. Noting that we can also rewrite $\hat {G}_{\nu } [\boldsymbol {p}; g_{\nu }, \phi, A_{\parallel }, \delta B_{\parallel }]= \gamma g_{\nu } + \hat {L}_{\nu } [\boldsymbol {p}; g_{\nu }, \phi, A_{\parallel }, \delta B_{\parallel }]$, and using $\boldsymbol {\nabla }_{\boldsymbol {p}} \mathcal {L} |_{\boldsymbol {p}_0} = 0$, we can rearrange equation (3.7) to obtain

(3.17)\begin{equation} \boldsymbol{\nabla}_{\boldsymbol{p}} \gamma \langle g_{\nu}, \lambda_{\nu} \rangle_{z,v, {\nu}} =- \left. \left[ \langle \partial_{\boldsymbol{p}} \hat{L}_{\nu} , \lambda_{\nu} \rangle_{z,v, {\nu}} + \langle \partial_{\boldsymbol{p}} \hat{Q}, \xi \rangle_{z} + \langle \partial_{\boldsymbol{p}} \hat{M} , \zeta \rangle_{z} + \langle \partial_{\boldsymbol{p}} \hat{N}, \sigma \rangle_{z} \right] \right|_{\boldsymbol{p}_0} , \end{equation}

where the partial derivatives have been pulled inside the inner products as the contribution arising from the Jacobian derivatives is zero by virtue of $\hat {G}_{\nu }(\boldsymbol {p}_0) = \hat {Q}(\boldsymbol {p}_0) = \hat {M}(\boldsymbol {p}_0) = \hat {N}(\boldsymbol {p}_0) = 0$.

To solve for the derivative of the linear growth rate, the following procedure is taken: the gyrokinetic equations, (2.27)–(2.30), are solved to obtain $g_{\nu }$ and $\gamma$, and then the equations (3.13)–(3.16) are used to solve for the adjoint variables. These quantities are then all fed into (3.17) to compute $\boldsymbol {\nabla }_{\boldsymbol {p}} \gamma$.

3.2 Electrostatic, collisionless limit

We consider now the electrostatic, collisionless limit, as we shall perform numerical tests in this regime. In the limit of small plasma $\beta$, meaning the ratio of plasma to magnetic pressure tends to zero, the magnetic field perturbations tend to zero. Hence, to extract the electrostatic, collisionless limit from these equations we set $A_{\parallel } = \delta B_{\parallel } = 0$, and $C_{\nu, \nu '} =0$. The linear, collisionless, electrostatic gyrokinetic equation is

(3.18)\begin{align} \hat{G}_{\nu} [\boldsymbol{p}; g_{\nu}, \phi] & = \gamma g_{\nu} + v_{{\parallel}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \left[\frac{\partial{g_{\nu}}}{\partial{z}} + \frac{Z_{\nu} e}{T_{\nu} } \frac{\partial{{\rm J}_{0,\nu} \phi}}{\partial{z}} F_{0,\nu} \right] + {\rm i} \omega_{*,\nu} {\rm J}_{0,\nu} \phi F_{0,\nu}\nonumber\\ & \quad - \frac{\mu_{\nu}}{m_{\nu} } \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{B_0}}{\partial{z}} \frac{\partial{g_{\nu}}}{\partial{v_{{\parallel}}}} + {\rm i} \omega_{d,\nu} \left[ g_{\nu} + \frac{Z_{\nu} e}{T_{\nu} } {\rm J}_{0,\nu} \phi F_{0,\nu} \right], \end{align}

which is closed by the electrostatic limit of quasineutrality

(3.19)\begin{equation} \hat{Q} [\boldsymbol{p};g_{\nu}, \phi] = \sum_{\nu}Z_{\nu} e \left[ \frac{2{\rm \pi} B_0}{m_{\nu} } \int \,{\rm d}^2 v {\rm J}_{0,\nu} g_{\nu} + \frac{Z_{\nu} n_{\nu}}{T_{\nu}} (\varGamma_{0,\nu} -1) \phi \right] , \end{equation}

with $\hat {G}_{\nu }$ and $\hat {Q}$ identically zero, and $\hat {M}$, $\hat {N}$ providing no contribution in the electrostatic limit. In the above we are once again considering the long time behaviour of a single wavenumber, and have suppressed the associated subscripts.

As in § 3.1 we decompose the functional operators $\hat {G}_{\nu }[\boldsymbol {p}; g_{\nu }, \phi ]$ and $\hat {Q}[\boldsymbol {p}; g_{\nu }, \phi ]$ into components that act on $g_{\nu }$ and $\phi$ separately, with all other operators in (3.3)–(3.6) set to zero. The derivation in § 3.1 is unchanged, with the exception that some terms may now be omitted. The resulting derivative of the growth rate in the electrostatic, collisionless regime is

(3.20)\begin{equation} \boldsymbol{\nabla}_{\boldsymbol{p}} \gamma \langle g_{\nu}, \lambda_{\nu} \rangle_{z,v, {\nu}} =- \left. \left[ \langle \partial_{\boldsymbol{p}} \hat{L}_{\nu}, \lambda_{\nu} \rangle_{z,v, {\nu}} + \langle \partial_{\boldsymbol{p}} \hat{Q}, \xi \rangle_{z} \right] \right|_{\boldsymbol{p}_0}, \end{equation}

where $\hat {L}_{\nu }= \hat {G}_{\nu } - \gamma g_{\nu }$ is given by equation (3.18), and the adjoint equations are

(3.21)\begin{gather} \gamma^* \lambda^{\leftrightarrow}_{\nu} + v_{{\parallel}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{\lambda^{\leftrightarrow}_{\nu}}}{\partial{z}} - \frac{\mu_{\nu}}{m_{s}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{B_0}}{\partial{z}} \frac{\partial{\lambda^{\leftrightarrow}_{\nu}}}{\partial{v_{{\parallel}}}} - {\rm i} \omega_{d, \nu} \lambda^{\leftrightarrow}_{\nu} + Z_{\nu} e {\rm J}_{0,\nu} \xi = 0 , \end{gather}

(3.22)\begin{gather}\xi + \frac{1}{\eta} \sum_{\nu} \frac{2 {\rm \pi}B_0} {m_{\nu} } \int \,\mathrm{d}^2 v \left[{\rm i} \omega_{*,\nu} + \frac{Z_{\nu} e}{T_{\nu} } \gamma^* \right] {\rm J}_{0,\nu} F_{0,\nu} \lambda^{\leftrightarrow}_{\nu} = 0 , \end{gather}

with $\lambda ^{\leftrightarrow }_{\nu } (v_{\parallel } ) = \lambda _{\nu }(-v_{\parallel } )$, and $\eta = \sum _{\nu } (Z_{\nu } e)^2 n_{\nu }/ T_{\nu }$ as before.

4 Normalisations and magnetic geometry

To evaluate (3.20), we need to solve for the set of variables $\{\gamma, g_{\nu }, \phi, \lambda ^{\leftrightarrow }_{\nu }, \xi \}$, evaluated at the unperturbed geometric values, $\boldsymbol {p}_0$. We do this by implementing and combining the adjoint system within the local $\delta f$-gyrokinetic code stella (Barnes, Parra & Landreman Reference Barnes, Parra and Landreman2019). In this section we write the gyrokinetic equations in normalised coordinates along with the corresponding normalised equations for the adjoint variables. We then introduce a specific choice for $\boldsymbol {p}$, the Miller parametrisation used to specify local magnetic equilibria in tokamaks (Miller et al. Reference Miller, Chu, Greene, Lin-Liu and Waltz1998), and detail how the adjoint method can be applied in this case.

4.1 Normalisation

4.1.1 Gyrokinetic normalisations

Here, we normalise the full electromagnetic gyrokinetic system and give the reduced electrostatic limit at the end of the section. We choose our reference quantities to be the same as in stella to aid in numerical implementation, and denote these with a subscript ‘$r$’. A reference length scale that characterises the parallel lengths in the simulation is introduced as $L=a$; for the Miller formalism stella takes $a$ to be the half diameter of the plasma volume (minor radius), at the height of the magnetic axis. The perpendicular reference length scale is taken as a reference gyroradius $\rho _r \doteq v_{\textrm {th},r}/\varOmega _r$, with $v_{\textrm {th},r} = \sqrt {2 T_r/m_r}$ the reference velocity, and $\varOmega _r = e B_r/m_{r}c$, with $T_r$, $n_r$, $B_r$ and $m_r$ being user specified quantities. The normalised variables are denoted with a tilde, and are provided in table 1.

Table 1. List of normalised parameters and variables.

Multiplying the gyrokinetic equation, rewritten in normalised coordinates, by the factor $(a^2/\rho _r v_{\textrm {th},r}) \textrm {exp}(-v^2/v_{\textrm {th},\nu }^2)/F_{0,\nu }$, we obtain the normalised low-flow, electromagnetic gyrokinetic equation taken in the long time limit

(4.1)\begin{align} \hat{G}_{\boldsymbol{k},\nu} & = \tilde{\gamma} \tilde{g}_{\boldsymbol{k}, \nu} + \tilde{v}_{{\rm th},\nu} \tilde{v}_{{\parallel}} \, \hat{\boldsymbol{b}}\boldsymbol{\cdot} \tilde{\boldsymbol{\nabla}} \tilde{z} \left[\frac{\partial{\tilde{g}_{\boldsymbol{k}, \nu}}}{\partial{\tilde{z}}} + \frac{Z_{\nu}}{\tilde{T}_{\nu}} \frac{\partial{\left\langle \tilde{\chi} \right\rangle_{\boldsymbol{k}, \nu}}}{\partial{\tilde{z}}} {\rm e}^{-\tilde{v}_{\nu}^2} \right] \nonumber\\ & \quad + {\rm i} \tilde{\omega}_{*,\boldsymbol{k},\nu} {\rm e}^{-\tilde{v}_{\nu}^2} \left\langle \tilde{\chi} \right\rangle_{\boldsymbol{k}, \nu} + {\rm i} \tilde{\omega}_{d,\boldsymbol{k},\nu} \left[ \tilde{g}_{\boldsymbol{k}, \nu} + \frac{Z_{\nu}}{\tilde{T}_{\nu}} \left\langle \tilde{\chi} \right\rangle_{\boldsymbol{k}, \nu} {\rm e}^{-\tilde{v}_{\nu}^2} \right] \nonumber\\ & \quad - \tilde{v}_{\text{th},\nu} \tilde{\mu}_{\nu} \hat{\boldsymbol{b}} \boldsymbol{\cdot} \tilde{\boldsymbol{\nabla}} \tilde{B}_0 \frac{\partial{\tilde{g}_{\boldsymbol{k}, \nu}}}{\partial{\tilde{v}_{{\parallel}}}} + 2 \frac{Z_{\nu}}{\tilde{m}_{\nu}} \tilde{\mu}_{\nu} \hat{\boldsymbol{b}} \boldsymbol{\cdot} \tilde{B}_0 {\rm e}^{-\tilde{v}_{\nu}^2} {\rm J}_{0,\boldsymbol{k}, \nu} \tilde{A}_{{\parallel}, \boldsymbol{k}} - \hat{C}_{\boldsymbol{k},\nu} [\tilde{g}_{\boldsymbol{k}, \nu}]. \end{align}

The corresponding transformed, normalised field equations are given by

(4.2)\begin{align} \hat{Q}_{\boldsymbol{k}} & = \sum_{\nu} Z_{\nu} \tilde{n}_{\nu} \left\{ \frac{2\tilde{B}_0}{\sqrt{\rm \pi}} \int{\,\mathrm{d}^2 \tilde{v} {\rm J}_{0,\boldsymbol{k}, \nu} \tilde{g}_{\boldsymbol{k},\nu, 0} } + \frac{Z_{\nu}}{T_{\nu}} \left( \varGamma_{0, \boldsymbol{k}, \nu} - 1 \right) \tilde{\phi}_{\boldsymbol{k},0} + \frac{1}{\tilde{B}_0} \varGamma_{1, \boldsymbol{k}, \nu} \delta \tilde{B}_{{\parallel}, \boldsymbol{k}, 0} \right\} , \end{align}

(4.3)\begin{align} \hat{M}_{\boldsymbol{k}} & =- \frac{\beta_r}{\left(k_{{\perp}}\rho_r\right)^2} \sum_{\nu} Z_{\nu} \tilde{n}_{\nu} \tilde{v}_{\text{th},\nu} \frac{2\tilde{B}}{\sqrt{\rm \pi}} \int \,\mathrm{d}^2 \tilde{v} \tilde{v}_{{\parallel}} {\rm J}_{0,\boldsymbol{k}, \nu} \tilde{g}_{\boldsymbol{k},\nu, 0} \nonumber\\ & \quad + \left[1+ \frac{\beta_r}{ \left(k_{{\perp}}\rho_r\right)^2} \sum_{\nu} \frac{Z_{\nu} \tilde{n}_{\nu}}{\tilde{m}_{\nu}} \varGamma_{0,\boldsymbol{k}, \nu} \right] A_{{\parallel}, \boldsymbol{k},0 } , \end{align}

(4.4)\begin{align} \hat{N}_{\boldsymbol{k}} & = 2 \beta_r \sum_{\nu} \tilde{n}_{\nu} \tilde{T}_{\nu} \frac{ 2 \tilde{B}_0}{\sqrt{\rm \pi}} \int \,\mathrm{d}^2 \tilde{v} \tilde{\mu}_{\nu} \frac{{\rm J}_{0,\boldsymbol{k}, \nu}}{\tilde{a}_{\boldsymbol{k}, \nu}} \tilde{g}_{\boldsymbol{k},\nu, 0} + \left[\frac{\beta_r}{2\tilde{B}_0} \sum_{\nu} Z_{\nu}\tilde{n}_{\nu} \varGamma_{1,\boldsymbol{k}, \nu} \right] \tilde{\phi}_{\boldsymbol{k}, 0} \nonumber\\ & \quad + \left[1+ \frac{\beta_r}{2\tilde{B}_0} \sum_{\nu} Z_{\nu}\tilde{n}_{\nu} \tilde{T}_{\nu} \varGamma_{2,\boldsymbol{k}, \nu} \right] \delta \tilde{B}_{{\parallel}, \boldsymbol{k},0} , \end{align}

with the reference plasma beta, $\beta _r = 8{\rm \pi} n_r T_r/B_r^2$.

4.1.2 Adjoint normalisation

We choose the normalisation of the adjoint variables in such a way that our optimisation Lagrangian is dimensionless. In general, this is achieved by enforcing that the dimension of the adjoint variables in § 3 satisfy

(4.5)\begin{equation} \left. \begin{array}{ll@{}} {}[\lambda_{\nu}] = [\hat{G}_{\nu}]^{-1} & [\xi] = [\hat{Q}]^{-1}, \\ {}[\zeta] = [\hat{M}]^{-1} & [\sigma ] = [\hat{N}]^{-1}, \end{array} \right\} \end{equation}

with $[A]$ denoting the dimensionality of $A$. The normalised adjoint variables should then satisfy

(4.6)\begin{equation} \left. \begin{array}{ll@{}} \tilde{\lambda}_{\nu} = \dfrac{\lambda_{\nu}} {[\lambda_{\nu}] } & \tilde{\xi} = \dfrac{\xi}{[\xi] },\\ \tilde{\zeta} = \dfrac{\zeta}{[\zeta]} & \tilde{\sigma} = \dfrac{\sigma}{ [\sigma ] } . \end{array} \right\} \end{equation}

Analysis of (2.27), and (2.28)–(2.30) gives the normalisations consistent with those chosen for the gyrokinetic variables

(4.7)\begin{equation} \left. \begin{array}{ll@{}} \tilde{\lambda}_{\nu} = \lambda_{\nu} \dfrac{F_{0,\nu}} {{\rm e}^{-\tilde{v}_{\nu}^2} } \dfrac{\rho_r v_{{\rm th},r} }{a^2} & \tilde{\xi} = \xi \dfrac{n_r e \rho_r}{ a }, \\ \tilde{\zeta} = \zeta \dfrac{ B_r \rho_r^2}{a} & \tilde{\sigma} = \sigma \dfrac{B_r \rho_r }{a} . \end{array} \right\} \end{equation}

We then multiply the adjoint equation (3.13), written in terms of normalised coordinates, by a factor of $( \rho _r/a ) F_{0,\nu }/ \textrm {e}^{-\tilde {v}_{\nu }^2}$ to obtain the electromagnetic, collisional adjoint equations in normalised units

(4.8)\begin{gather} \frac{\partial{\tilde{\lambda}^{\leftrightarrow}_{\nu}}}{\partial{\tilde{t}}} + \tilde{\gamma}^* \tilde{\lambda}^{\leftrightarrow}_{\nu} + \tilde{v}_{{\rm th},\nu} \tilde{v}_{{\parallel}} \, \hat{\boldsymbol{b}}\boldsymbol{\cdot} \tilde{\boldsymbol{\nabla}} \tilde{z} \frac{\partial{\tilde{\lambda}^{\leftrightarrow}_{\nu}}}{\partial{\tilde{z}}} - \tilde{v}_{{\rm th},\nu} \, \tilde{\mu}_{\nu} \, \hat{\boldsymbol{b}} \boldsymbol{\cdot} \tilde{\boldsymbol{\nabla}} \tilde{z} \frac{\partial{{\tilde{B}_0}}}{\partial{{\tilde{z}}}} \frac{\partial{\tilde{\lambda}^{\leftrightarrow}_{\nu}}}{\partial{\tilde{v}_{{\parallel}}}} - {\rm i} \tilde{\omega}_{d, \nu} \tilde{\lambda}^{\leftrightarrow}_{\nu} \nonumber\\ + Z_{\nu} \tilde{n}_{\nu} {\rm J}_{0,\nu} \tilde{\xi} - \frac{\beta_r}{(k_{{\perp}} \rho_r)^2} Z_{\nu} \tilde{n}_{\nu} \tilde{v}_{\text{th},\nu}{\rm J}_{0,\nu} \tilde{v}_{{\parallel}} \tilde{\zeta} + 2\beta_r \tilde{T}_{\nu} \tilde{\mu}_{\nu} \frac{{\rm J}_{1,\nu}}{\tilde{a}_{\nu}} \tilde{\sigma} - \hat{C}_{\nu}[\lambda_{\nu}] = 0 , \end{gather}

(4.9)\begin{gather}\tilde{\xi} + \frac{1}{\tilde{\eta}} \sum_{\nu} \frac{2\tilde{B}}{\sqrt{\rm \pi}} \int \,\mathrm{d}^2 \tilde{v} \left[{\rm i} \tilde{\omega}_{*,\nu} + \frac{Z_{\nu}}{\tilde{T}_{\nu}} \tilde{\gamma}^* \right] {\rm J}_{0,\nu} {\rm e}^{-\tilde{v}_{\nu}^2} \tilde{\lambda}^{\leftrightarrow}_{\nu} =0 , \end{gather}

(4.10)\begin{gather}\tilde{\zeta} - \sum_{\nu} \frac{2\tilde{B}}{\sqrt{\rm \pi}} \int \,\mathrm{d}^2 \tilde{v} (2 \tilde{v}_{\text{th},\nu} \tilde{v}_{{\parallel}}) \left[ {\rm i} \tilde{\omega}_{*,\nu} + \frac{Z_{\nu}}{\tilde{T}_{\nu}} \tilde{\gamma}^* \right] {\rm J}_{0,\nu} {\rm e}^{-\tilde{v}_{\nu}^2} \tilde{\lambda}^{\leftrightarrow}_{\nu} = 0 , \end{gather}

(4.11)\begin{gather}\tilde{\sigma} - \sum_{\nu} \frac{2\tilde{B}}{\sqrt{\rm \pi}} \int \,\mathrm{d}^2 \tilde{v} \left( 4\tilde{\mu}_{\nu} \frac{\tilde{T}_{\nu}}{Z_{\nu}} \frac{{\rm J}_{1,\nu}}{\tilde{a}_{\nu}} \right) \left[ {\rm i} \tilde{\omega}_{*,\nu} + \frac{Z_{\nu}}{\tilde{T}_{\nu}} \tilde{\gamma}^* \right] {\rm e}^{-\tilde{v}_{\nu}^2} \tilde{\lambda}^{\leftrightarrow}_{\nu} = 0 , \end{gather}

with $\tilde {\lambda }^{\leftrightarrow }_{\nu } = \tilde {\lambda }_{\nu }(\tilde {z},-\tilde {v}_{\parallel },\tilde {\mu }_{\nu })$, and $\tilde {\eta } = \sum _{\nu } Z_{\nu }^2 \tilde {n}_{\nu }/ \tilde {T}_{\nu }$. This change of variables has been made such that the adjoint equations more closely resemble the gyrokinetic equations, with the boundary conditions on $\tilde {\lambda }^{\leftrightarrow }_{\nu }$ mimicking those on $\tilde {g}_{\boldsymbol {k},\nu, 0}$. This is convenient as the numerical machinery from existing gyrokinetic codes can be re-used. An artificial time dependence has also been introduced in (4.8) in order to make computation easier, and we solve for the steady-state solution for $\tilde {\lambda }_{\nu }$, when $\partial _{\tilde {t}} \tilde {\lambda }_{\nu } = 0$.Footnote ¹¹

As before, we can use ${\hat {G}_{\nu } [\boldsymbol {p}; \tilde {g}_{\nu }, \tilde {\phi }, \tilde {A}_{\parallel }, \delta \tilde {B}_{\parallel }]= \tilde {\gamma } \tilde {g}_{\nu } + \hat {L}_{\nu } [\boldsymbol {p}; \tilde {g}_{\nu }, \tilde {\phi }, \tilde {A}_{\parallel }, \delta \tilde {B}_{\parallel }]}$, with the added constraint of ${ \boldsymbol {\nabla }_{\boldsymbol {p}} \mathcal {L} |_{\boldsymbol {p}_0} = 0}$ to rearrange the above and obtain

(4.12)\begin{equation} \boldsymbol{\nabla}_{\boldsymbol{p}} \tilde{\gamma} \langle \tilde{g}_{\nu}, \tilde{\lambda}_{\nu} \rangle_{\tilde{z},\tilde{v}, {\nu}} =- \left. \left[ \langle \partial_{\boldsymbol{p}} \hat{L}_{\nu}, \tilde{\lambda}_{\nu} \rangle_{\tilde{z},\tilde{v}, {\nu}} + \langle \partial_{\boldsymbol{p}} \hat{Q}, \tilde{\xi} \rangle_{\tilde{z}} + \langle \partial_{\boldsymbol{p}} \hat{M} , \tilde{\zeta} \rangle_{\tilde{z}} + \langle \partial_{\boldsymbol{p}} \hat{N} , \tilde{\sigma} \rangle_{\tilde{z}} \right] \right|_{\boldsymbol{p}_0} , \end{equation}

with closure equations provided by (4.8)–(4.11).

4.1.3 Electrostatic collisionless limit

Finally, in the electrostatic, collisionless regime, the system of equations to solve in the normalised stella coordinates is given by

(4.13)\begin{gather} \frac{\partial{\tilde{\lambda}^{\leftrightarrow}_{\nu}}}{\partial{\tilde{t}}} + \tilde{\gamma}^* \tilde{\lambda}^{\leftrightarrow}_{\nu} + \tilde{v}_{{\rm th},\nu} \tilde{v}_{{\parallel}} \, \hat{\boldsymbol{b}}\boldsymbol{\cdot} \tilde{\boldsymbol{\nabla}} \tilde{z} \frac{\partial{\tilde{\lambda}^{\leftrightarrow}_{\nu}}}{\partial{\tilde{z}}} - \tilde{v}_{{\rm th},\nu} \, \tilde{\mu}_{\nu} \, \hat{\boldsymbol{b}} \boldsymbol{\cdot} \tilde{\boldsymbol{\nabla}} \tilde{z} \frac{\partial{{\tilde{B}_0}}}{\partial{{\tilde{z}}}} \frac{\partial{\tilde{\lambda}^{\leftrightarrow}_{\nu}}}{\partial{v_{{\parallel}}}} - {\rm i} \tilde{\omega}_{d, \nu} \tilde{\lambda}^{\leftrightarrow}_{\nu} \nonumber\\ + Z_{\nu} \tilde{n}_{\nu} {\rm J}_{0,\nu} {\rm e}^{-\tilde{v}_{\nu}^2} \xi = 0 , \end{gather}

(4.14)\begin{gather}\xi + \frac{1}{\tilde{\eta}} \sum_{\nu} \frac{2\tilde{B}}{\sqrt{\rm \pi}} \int \,\mathrm{d}^2 \hat{v} \left[{\rm i} \tilde{\omega}_{*,\nu} + \frac{Z_{\nu}}{\tilde{T}_{\nu}} \tilde{\gamma}^* \right] {\rm J}_{0,\nu} {\rm e}^{-\tilde{v}_{\nu}^2} \tilde{\lambda}^{\leftrightarrow}_{\nu} = 0 , \end{gather}

and

(4.15)\begin{equation} \boldsymbol{\nabla}_{\boldsymbol{p}} \tilde{\gamma} \langle \tilde{g}_{\nu}, \tilde{\lambda}_{\nu} \rangle_{\tilde{z},\tilde{v}, {\nu}} =- \left. \left[ \langle \partial_{\boldsymbol{p}} \hat{L}_{\nu}, \tilde{\lambda}_{\nu} \rangle_{\tilde{z},\tilde{v}, {\nu}} + \langle \partial_{\boldsymbol{p}} \hat{Q}, \tilde{\xi} \rangle_{\tilde{z}} \right] \right|_{\boldsymbol{p}_0} . \end{equation}

4.2 Magnetic geometry

The coefficients in equations (4.1)–(4.4), and thus the associated linear growth rates, are implicitly dependent on the magnetic geometry. For the remainder of the paper we will take $\boldsymbol {p}$ to be an appropriate set of parameters that specifies the local magnetic geometry. In particular, we will use the Miller formalism (Miller et al. Reference Miller, Chu, Greene, Lin-Liu and Waltz1998) to parameterise the magnetic field on the flux surface of interest. The Miller approach ensures that the Grad–Shafranov (Shafranov Reference Shafranov1966) equation is always satisfied locally by using a set of independent parameters to define a single flux surface in an axisymmetric device. The model equations describing the shape of the flux surface with flux label $r$ are

(4.16)\begin{gather} R(r,\theta) = R_0 (r) + r \cos \left\{ \theta + \sin(\theta) \delta(r) \right\}, \end{gather}

(4.17)\begin{gather}Z(r,\theta) = r \kappa (r) \sin (\theta). \end{gather}

Here, $R(r,\theta )$ and $Z(r,\theta )$ define the major radial and vertical locations for a given poloidal location, $\theta$, which is related to the cylindrical angle, and $\delta$ and $\kappa$ indicate the triangularity and elongation of the surface, respectively. Specifying the full set of Miller parameters provides all of the information required to compute the geometric coefficients in the gyrokinetic-adjoint system consistent with the local magnetohydrodynamic (MHD) equilibrium (though consistency with a global MHD equilibrium is not guaranteed).

The user specified input parameters used in the local version of stella are $\{ r_{\psi _0}, R_{\psi _0}, \varDelta _{\psi _0}, q_{\psi _0}, \hat {s}_{\psi _0}, \kappa _{\psi _0}, \kappa '_{\psi _0}, I_{\psi _0}, \delta _{\psi _0}, \delta '_{\psi _0}, \beta '_{\psi _0} \}$, corresponding to markers for the minor and major radii, horizontal Shafranov shift ($\varDelta _{\psi _0} = R_{\psi _0}'$), safety factor, magnetic shear ($\hat {s} \doteq (r/q) q'$), elongation and its radial derivative, the external axial discharge current (which is used as a proxy for the reference magnetic field), triangularity and its radial derivative and the radial pressure derivative, respectively, with each being specified at the flux surface of interest. Further information about how the Miller geometry is treated in stella is given in Appendix E. We henceforth set $\boldsymbol {p} := \{ r_{\psi _0}, R_{\psi _0}, \varDelta _{\psi _0}, q_{\psi _0}, \hat {s}_{\psi _0}, \kappa _{\psi _0}, \kappa '_{\psi _0}, I_{\psi _0}, \delta _{\psi _0}, \delta '_{\psi _0}, \beta '_{\psi _0} \}$.

5 Numerical implementation

The aim is to find the magnetic geometry that minimises the linear growth rate for the ITG instability, and maximises the linear critical temperature gradient across the device. This requires three distinct stages: first, the computation of $\boldsymbol {\nabla }_{\boldsymbol {p}} \gamma$ at a fixed ion temperature gradient, $T'_{i}$; second, its subsequent use in an optimisation algorithm to find the $\boldsymbol {p}$ that minimises $\gamma$ for this given $T'_{i}$; third, iteration of this procedure with variable $T'_{i}$ to find the maximum temperature gradient for which $\gamma \leq 0$ for the range of $\boldsymbol {p}$ considered.

5.1 Initial simulation

Solving, at an initial set of $\boldsymbol {p}_0$, for $\tilde {\gamma }, \tilde {g}_{\nu }$ and $\tilde {\phi }$ is achieved by allowing stella to run for a sufficiently long time, such that the solution is dominated by a single normal mode. To determine the time at which this is satisfied, we employ a convergence test: the growth rate is calculated at each time step and if the value of this is constant in time (within a specified tolerance) then the system is deemed to be converged. There are two components of the convergence test. The first is to check that the growth rates calculated at adjacent time steps are within a given tolerance of each other. The second is to perform a windowed average to check that the growth rate remains consistent over a defined number of time steps. Two windowed averages are done; one over $N_t$ time steps and one over $\mathbb {Z}(N_t/2)$ time steps. When these two window-averaged growth rates agree within a set tolerance, $\tilde {g}_{\nu }$ and $\tilde {\phi }$ are taken to have converged. The corresponding growth rate is then calculated from the windowed average.Footnote ¹²

5.2 Adjoint simulation

As previously introduced, an artificial time dependence is added to the adjoint equations to facilitate computation. The solution is found in the steady-state limit, in which the time derivative appearing in the adjoint equations goes to zero.

The resulting adjoint equations are treated in a similar way to the treatment of the usual gyrokinetic system of equations in stella. The main aim is to ensure that the parallel streaming term may be treated separately from the rest of the dynamics through operator splitting. This is done by discretising in time, and splitting the time derivative into a series of three steps

(5.1)\begin{equation} \frac{\partial{\tilde{\lambda}^{\leftrightarrow}_{\nu}}}{\partial{\tilde{t}}} = \left(\frac{\partial{\tilde{\lambda}^{\leftrightarrow}_{\nu}}}{\partial{\tilde{t}}}\right)_1 + \left(\frac{\partial{\tilde{\lambda}^{\leftrightarrow}_{\nu}}}{\partial{\tilde{t}}}\right)_2 + \left(\frac{\partial{\tilde{\lambda}^{\leftrightarrow}_{\nu}}}{\partial{\tilde{t}}} \right)_3 , \end{equation}

where

(5.2a)\begin{gather} \left(\frac{\partial{\tilde{\lambda}^{\leftrightarrow}_{\nu}}}{\partial{\tilde{t}}}\right)_1 =- \tilde{\gamma}^* \tilde{\lambda}^{\leftrightarrow}_{\nu} + {\rm i} \tilde{\omega}_{d, \nu} \tilde{\lambda}^{\leftrightarrow}_{\nu} - Z_{\nu} \tilde{n}_{\nu} \tilde{\xi} , \end{gather}

(5.2b)\begin{gather}\left(\frac{\partial{\tilde{\lambda}^{\leftrightarrow}_{\nu}}}{\partial{\tilde{t}}}\right)_2 = \tilde{v}_{{\rm th},\nu} \, \tilde{\mu}_{\nu} \, \hat{\boldsymbol{b}} \boldsymbol{\cdot} \tilde{\boldsymbol{\nabla}} \tilde{z} \frac{\partial{{\tilde{B}_0}}}{\partial{{\tilde{z}}}} \frac{\partial{\tilde{\lambda}^{\leftrightarrow}_{\nu}}}{\partial{\tilde{v}_{{\parallel}}}}, \end{gather}

(5.2c)\begin{gather}\left(\frac{\partial{\tilde{\lambda}^{\leftrightarrow}_{\nu}}}{\partial{\tilde{t}}} \right)_3 =- \tilde{v}_{{\rm th},\nu} \tilde{v}_{{\parallel}} \, \hat{\boldsymbol{b}}\boldsymbol{\cdot} \tilde{\boldsymbol{\nabla}} \tilde{z} \frac{\partial{\tilde{\lambda}^{\leftrightarrow}_{\nu}}}{\partial{\tilde{z}}} . \end{gather}

Analogous to stella, the terms in (5.2a) are treated explicitly using a strong-stability-preserving, third-order Runge–Kutta method (Gottlieb, Shu & Tadmor Reference Gottlieb, Shu and Tadmor2001). In combination with the operator splitting employed within stella the overall algorithm is second-order accurate in time step size, $\varDelta \tilde {t}$.

The parallel streaming and mirror terms, given by (5.2c) and (5.2b), respectively, are treated separately, due to the presence of the prefactor $\tilde {v}_{\textrm {th},\nu }$, which increases the relative amplitude of these terms when considering electron dynamics. As a result these terms have the potential to exert a stringent Courant–Friedrichs–Lewy condition on the simulation, and require a small time step to be taken in order to retain accuracy. Thus these terms are treated implicitly in time to relax this constraint.

The mirror term, (5.2b), is a simple advection equation of $\tilde {\lambda }^{\leftrightarrow }_{\nu }$ in $\tilde {v}_{\parallel }$, which is treated using a semi-Lagrange method, akin to the algorithm used to advect the distribution function in $\tilde {v}_{\parallel }$ within stella. The mirror coefficients are independent of both time and $\tilde {v}_{\parallel }$ and hence the exact characteristics of this equation are known. The interpolation in $\tilde {v}_{\parallel }$ is forth order accurate in $\tilde {v}_{\parallel }$ grid spacing, $\varDelta \tilde {v}_{\parallel }$(Barnes et al. Reference Barnes, Parra and Landreman2019).

The streaming term, (5.2c), is also an advection equation in $\tilde {z}$, which is treated using the Thompson algorithm for tri-diagonal solve. When fully centred in time the discretisation reduces to the Crank–Nicolson method, which is second-order accurate in time, and $\tilde {z}$ grid spacings, $\varDelta \tilde {t}$ and $\varDelta \tilde {z}$, respectively.

We are seeking a steady-state solution to the adjoint equations. The same convergence test is performed on $\tilde {\lambda }^{\leftrightarrow }_{\nu }$ as that performed on the distribution function in order to check that the complex growth rate has converged to zero within a given tolerance. When this is satisfied, the resulting $\tilde {\lambda }^{\leftrightarrow }_{\nu }$ is used to solve for $\tilde {\lambda }_{\nu }$. This is then stored for use in the remainder of the calculation.

5.3 Optimisation loop

Once the gradients $\boldsymbol {\nabla }_{\boldsymbol {p}} \gamma$ are obtained we use them inside an optimisation loop to find the $\boldsymbol {p}$ that minimises $\gamma$. We employ the Levenberg–Marquardt (LM) algorithm, (Transtrum & Sethna Reference Transtrum and Sethna2012), to find the local minimum. The method adopts a steepest descent behaviour when the location in parameter space is considered to be far from the minimum, and progresses towards Gauss–Newton behaviour as one approaches the minimum. This is achieved by introducing a damping factor, $\varGamma$, which is updated with each iteration. The algorithm iteratively solves the following:

(5.3)\begin{equation} [\boldsymbol{H} + \varGamma \text{diag}(\boldsymbol{H}) ] \,\mathrm{d} \boldsymbol{p} =- \boldsymbol{\nabla}_{\boldsymbol{p}} \gamma , \end{equation}

with $\boldsymbol {H} = \boldsymbol {\nabla }_{\boldsymbol {p}}^2 \gamma \approx (\boldsymbol {\nabla }_{\boldsymbol {p}} \gamma )^{{\dagger} } \boldsymbol {\nabla }_{\boldsymbol {p}} \gamma$ the Hessian matrix. When $\varGamma$ is large we have $\boldsymbol {p}_{\textrm {new}} \approx \boldsymbol {p}_{\textrm {old}} - \boldsymbol {\alpha }\boldsymbol {\cdot } \boldsymbol {\nabla }_{\boldsymbol {p}} \gamma$, with $\boldsymbol {\alpha } \doteq [\varGamma \textrm {diag} (\boldsymbol {H})]^{-1}$, which mimics the gradient-descent algorithm. However, when $\varGamma$ is small, (5.3) reduces to $\boldsymbol {p}_{\textrm {new}} \approx \boldsymbol {p}_{\textrm {old}} - \boldsymbol {H}^{{\dagger} } \boldsymbol {\cdot } \boldsymbol {\nabla }_{\boldsymbol {p}} \gamma$, which matches with the Gauss–Newton algorithm.

The LM formalism is derived using the Taylor expansion, and as such a trust region is included within the optimisation loop to ensure that the updated value of $\boldsymbol {p}$ is close enough to the previous, such that the Taylor approximation is valid within the limits for which the algorithm is applied. The trust region for $\boldsymbol {p}$ is defined via

(5.4)\begin{equation} \rho = \frac{0.5 \,\mathrm{d} \boldsymbol{p}^{{\dagger}} \boldsymbol{\cdot} \boldsymbol{H} \boldsymbol{\cdot} \mathrm{d} \boldsymbol{p}}{\mathrm{d} \boldsymbol{p} \boldsymbol{\cdot} \boldsymbol{\nabla}_{\boldsymbol{p}} \gamma} < \bar{\epsilon} , \end{equation}

where $\bar {\epsilon }$ is a chosen tolerance. If $\rho > \bar {\epsilon }$ for a given $\mathrm {d} \boldsymbol {p}$ the algorithm rejects the output $p_{\textrm {new}}$ and increases the weight $\varGamma$ in an attempt to improve the accuracy of the approximation. This helps ensure that the updated value of $\boldsymbol {p}$ is a reasonable one.

It is worth noting that the LM algorithm is designed to find local minima, so there is no guarantee that the minimum obtained is the global minimum of the system.

We also emphasise that the gradient-based optimisation algorithm is independent of the adjoint method that has been developed for gyrokinetic microstability. The optimisation loop may be itself optimised to efficiently search for regions of stability given a gradient input. Different algorithms, and indeed different parameter choices within each algorithm, will yield different efficiencies in finding stable solutions. An illustrative example of this is given later in figure 1, where the step size for the optimisation loop is varied to yield two distinct paths through the parameter space using the adjoint gradient. However, this is not the focus of this paper and we will not labour on enhancing the gradient-based optimisation loop.

Figure 1. Two-dimensional parameter scan over elongation and triangularity, with the colour map indicating the amplitude of the linear growth rate. Here, $k_y = 0.68$, $k_x = 0.0$, $m_{e}/m_{i} = 2.7 \times 10^{-4}$, $T_i = T_e = 1$, $n_i = n_e = 1$, $a/L_{T_i} = a/L_{T_e} = 2.42$, $a/L_{n_i} = a/L_{n_e} = 0.81$, with $a$ the minor radius of the last closed flux surface. The path taken by the optimisation algorithm is indicated in white, with the initial point $\kappa = 1.5$, and $\delta = 0.14$. A second path, drawn in grey, is shown indicating the adjoint optimisation with a different step size for the optimisation loop.

6 Numerical results

6.1 Initial benchmark

The first numerical check we perform is to ensure that the values of $\mathrm {d}_{\boldsymbol {p}} \gamma _0$ obtained from our adjoint method agree with those obtained using a finite-difference approach. Following this, we perform a more extensive benchmark by conducting a parameter scan in the growth rate using stella for different values of the Miller parameters. We then choose an initial set of parameters and perform the adjoint-optimisation scheme described above, and overlay the results of this with the parameter scan to see how these compare, whilst checking the growth rates at each point considered by the adjoint-optimisation scheme against those obtained using a finite-difference approach. As a proof of principle we chose to vary two parameters: triangularity, $\delta$, and elongation, $\kappa$, whilst holding the other Miller variables fixed. Given that the Miller parametrisation is local to a given flux surface, this variation is not necessarily consistent with a global solution to the Grad–Shafranov equation. The choice to vary these two parameters in isolation is driven by two considerations; first, it is only intended as a proof-of-principle check of the adjoint approach so simplicity is desirable, and second, previous research has shown that maximal shaping, with large elongation and triangularity, minimises the linear ITG instability, whereas parameters such as $\kappa '$ and $\delta '$ have an order $a/R_0 \ll 1$ is the inverse aspect ratio, with $a$ and $R_0$ the scales associated with the minor and major radii, respectively (Highcock et al. Reference Highcock, Mandell, Barnes and Dorland2018). Table 2 lists the values of input equilibrium variables in the Miller geometry. These have been chosen to coincide with values used in Beeke et al. (Reference Beeke, Barnes, Romanelli, Nakata and Yoshida2020) in order to verify the qualitative behaviour found.

Table 2. List of Miller Parameters.

Equilibrium Miller parameter values used in the initial benchmark simulations.

Given these initial values of $\delta$ and $\kappa$, we determine the linear growth rates for a grid of perpendicular wavenumbers within $k_x<2.0$, $k_y<2.0$, which reveals that the most unstable mode is found at $\{k_y, k_x\} = \{0.68, 0.0\}$ for mass ratio $m_{e}/m_{i} = 2.7 \times 10^{-4}$, normalised species temperatures and densities of $T_i = T_e = 1$, $n_i = n_e = 1$ and normalised species temperature and density gradients of $a/L_{T_i} = a/L_{T_e} = 2.42$, $a/L_{n_i} = a/L_{n_e} = 0.81$.

In figure 1 we show a scan in the linear growth in elongation and triangularity obtained by running stella with the Miller parameters specified in table 2, and the values of $\kappa$ and $\delta$ adjusted accordingly. The contour colour indicates the magnitude of the growth rate, and the plot extends over a range of values that has been set by reasonable physical constraints on devices.

Figure 1 shows that increasing the elongation and triangularity of our flux surface reduces the linear growth rate, and that there exists a region of stability when the shaping is maximal, in agreement with previous work. The path taken using the adjoint method is indicated in white. At the chosen starting point, located in the region of instability, the gradient $\mathrm {d}_{\boldsymbol {p}}\gamma$ is calculated and the value of $\boldsymbol {p} = \{\delta, \kappa \}$ is updated using the previously mentioned LM method. The final point is found to be locally stable as the growth rate here is negative. The algorithm then checks a nearby point to determine if a small region of stability exists and, once this has been verified, outputs this as the final $\boldsymbol {p}$ value. In this particular case of a two-dimensional parameter scan, a finite-difference approach could utilise the same optimisation loop, requiring only three simulations, plus one additional simulation performed at the next iteration point to verify the growth rate. The adjoint method must simulate the gyrokinetic equation once, and also solve the adjoint equations – which are of similar computational cost as the gyrokinetic equation – in order to achieve the same result. In this particular low-dimensionality case, the advantage of the adjoint approach is comparatively small over using a finite-difference approach. However, the computation time required for the traditional method scales linearly with the number of parameters, so that for an $N$-dimensional parameter space, $N+1$ simulations are necessary at each iteration point, whilst the adjoint method is effectively independent of the number of parameters, and only the steady-state solutions to the gyrokinetic and adjoint equations are required. It can therefore be readily applied to a high-dimensional parameter space without any significant increase in computational cost.

A second path is plotted on figure 1 in grey. This path is taken using the same adjoint technique, but increasing the step size within the optimisation loop. When the step size is small, as with the white path, the LM algorithm more closely resembles a gradient-descent method, however, when the step size is larger, as with the grey path, the LM algorithm resembles Newton's method for gradient optimisation. The figure illustrates how the adjoint algorithm can be combined with an optimiser to quickly and efficiently converge to a stable region of parameter space.

6.2 Increasing the critical temperature gradient

We have seen that the adjoint method is a powerful technique for computing stable points in a large parameter space; figure 1 shows that it can efficiently be used inside an optimisation loop to find a minimum of the linear growth rate for a given temperature gradient. Once the adjoint-optimisation loop locates a region in the parameter space with negative or zero growth rate the local temperature gradient (or other plasma profile variable of interest) may be increased and the process repeated. Conversely, if a minimum positive (unstable) growth rate is found, the temperature gradient can be reduced to seek out the optimal shape that maximises the critical temperature gradient at which linear instability occurs to seek out microstability.

Figure 2 demonstrates an iterative use of the adjoint optimisation. Here, we have iterated the temperature gradient, and inside each temperature-gradient iteration the adjoint method is used to find a stable geometric configuration. However, this principle could be extended further by continuing to increase the temperature gradient until no region of stability is available, indicating a limiting temperature gradient that can be achieved through geometric considerations alone. Figure 2 is a demonstration that the adjoint method may be used to increase the temperature gradient, whilst retaining stability using geometry. Though we have opted to consider only two parameters in the above as a demonstration and for clarity have focused on ITG, it is possible to employ the adjoint method to optimise over a large number of geometric parameters simultaneously. Such an exploration would be expensive using traditional finite-difference methods.

Figure 2. Plots showing a parameter scan in elongation and triangularity, with a temperature gradient of $a/L_{T_i} = 3.80$, increased from the $a/L_{T_i} = 2.42$ value used in figure 1. The geometry of the initial point, located in the unstable region, is provided by final point in figure 1 and is now unstable due to the increased temperature gradient. Here, $k_y = 0.68$, $k_x = 0.0$, $m_{e}/m_{i} = 2.7 \times 10^{-4}$, $T_i = T_e = 1$, $n_i = n_e = 1$, $a/L_{T_i} = a/L_{T_e} = 3.80$, $a/L_{n_i} = a/L_{n_e} = 0.81$. Note that the colour scales used in the figures above are different than that used in figure 1. The right-hand side plot is a zoomed in figure of the left.

6.3 Negative triangularity

To form a final example here, we note there has been previous evidence that negative triangularity can offer improvements for microstability, Pochelon et al. (Reference Pochelon, Goodman, Henderson, Angioni, Behn, Coda, Hofmann, Hogge, Kirneva and Martynov1999), so we repeated the benchmark scan using negative triangularity. All values of equilibrium Miller parameters are the same as in table 2, except we now set the initial value of triangularity to $\delta = -0.14$.

The path taken using the adjoint-optimisation loop is again shown in figure 3 by the white path, starting in the dark red region at $\{\delta, \kappa \} = \{-0.14, 1.52\}$.

Figure 3. Growth rate contours for a parameter scan with negative triangularity for $k_y = 0.68$, $k_x = 0.0$ and equilibrium parameters $m_{e}/m_{i} = 2.7 \times 10^{-4}$, $T_i = T_e = 1$, $n_i = n_e = 1$, $a/L_{T_i} = a/L_{T_e} = 3.80$, $a/L_{n_i} = a/L_{n_e} = 0.81$. The white line indicates the path taken by the optimisation algorithm. The initial values of $\{\delta, \kappa \}$ are taken to be $\{-0.14, 1.52\}$.

This highlights a key feature of the gradient-based optimisation method: the solution is not unique, and the output can depend on the starting region within parameter space. See figure 4 for an illustrative example of this.

Figure 4. Growth rate contours for a parameter scan with both positive and negative triangularity for $k_y = 0.68$, $k_x = 0.0$ and equilibrium parameters $m_{e}/m_{i} = 2.7 \times 10^{-4}$, $T_i = T_e = 1$, $n_i = n_e = 1$, $a/L_{T_i} = a/L_{T_e} = 3.80$, $a/L_{n_i} = a/L_{n_e} = 0.81$. The white line indicates the two paths taken by the optimisation algorithm starting in different regions in parameter space. The initial values of $\{\delta, \kappa \}$ are taken to be $\{0.14, 1.52\}$ and $\{-0.14, 1.52\}$.

Finally, we consider increasing the temperature gradient for the negative triangularity case, and we repeat the procedure to find a stable region of parameter space. We again take the input parameters $\{\delta, \kappa \} = \{-0.9965, 2.4488\}$ to be the outputs from the previously optimised case at a temperature gradient of $a/L_{T_i} = 2.42$. Then we use the adjoint algorithm coupled with the gradient optimiser to look for a stable region of parameter space at an increased temperature gradient of $a/L_{T_i} = 3.8$. The results of this are shown in figure 5.

Figure 5. Growth rate contours for a parameter scan with negative triangularity at a temperature gradient of $a/L_{T_i}= 3.80$, increased from $a/L_{T_i}= 2.42$. The geometry of the initial point is taken as the final point in figure 3, and is now unstable due to the increased temperature gradient. The scan is performed at the same parameter values as those in figure 3 – $k_y = 0.68$, $k_x = 0.0$ $m_{e}/m_{i} = 2.7 \times 10^{-4}$, $T_i = T_e = 1$, $n_i = n_e = 1$, $a/L_{T_i} = a/L_{T_e} = 3.80$, $a/L_{n_i} = a/L_{n_e} = 0.81$. The white line indicates the path taken by the optimisation algorithm. The initial values of $\{\delta, \kappa \}$ are taken to be $\{-0.9965, 2.4488\}$.

7 Numerical efficiency improvement

Recall that for a conventional finite-difference approach method for a parameter vector, $\boldsymbol {p}$, of size $\mathcal {N}$ we require $\mathcal {N}+1$ simulations to be carried out until convergence at each of the points considered (these are shown by the white dots in the figures above) in order to compute one gradient in our parameter space. However, when using the adjoint method the same gradient may be computed for the cost of roughly two simulations – one gyrokinetic, and one adjoint – which is of similar computational cost as a gyrokinetic simulation. This approach is essentially independent of the number of parameters used, with the only additional cost incurred being that associated with calculating the partial derivatives that appear. However, these are extremely inexpensive for computers and can be calculated using one processor.

To illustrate this example we will show the progressive computational improvement of using the adjoint method, as compared with finite differences, and show how this scales favourably with increasing $\mathcal {N}$. In order for the comparisons to be fair, we shall run all simulations to a standard time of $100 a/v_{\mathrm {th}, i}$.

7.1 Numerical demonstration of improved efficiency

For the case where $\mathcal {N} =2$ we are optimising with respect to two parameters, $\{ \delta, \kappa \}$. We find that to calculate a gradient at each point in our parameter space requires a total of $\sim 4.752$ CPU (central processing unit) hours on 4 nodes, each with 48 cores on the supercomputer Marconi. The same calculation as performed using the adjoint method requires a total of ${\sim }3.168$ CPU hours. Though this is only a modest improvement, we shall show that as $\mathcal {N}$ increases the advantage of using the adjoint method, over a finite-difference scheme, becomes increasingly apparent.

Increasing the number of parameters to $\mathcal {N} = 4$, optimising over the miller parameters $\{ \varDelta, \kappa, q,\delta \}$, we find that the finite-difference approach requires ${\sim } 7.920$ CPU hours to compute a gradient at each point in parameter space. However, when computing the same gradient using the adjoint method the CPU time, to the precision of the CPU clock, is ${\sim }3.168$ CPU hours.

If we increase the number of parameters further to $\mathcal {N} = 7$, optimising over the miller parameters $\{R_0, R_{\mathrm {geo}}, \varDelta, \kappa, q, \hat {s}, \delta \}$ we find that using the finite-difference approach requires $\sim 12.672$ CPU hours to compute each gradient at a point in parameter space. When using the adjoint method to compute the same gradient, to the precision of the CPU clock, is still $\sim 3.168$ CPU hours.

Hence, we conclude that the numerical speed up is significant with increasing $\mathcal {N}$. This allows for the potential of including multiple harmonics within our shaping optimisation for very little additional computing cost.

For the example including seven parameters, we iterate the process of stepping through parameter space to find a point of stability. We take our set of initial parameters to be those given in table 2, and we simulate using a temperature and density gradient of $a/L_{T_i} = a/L_{T_e} = 2.42$ and $a/L_{n_i} = a/L_{n_e} = 0.81$, respectively. We perturb the magnetic geometry by varying $\{R_0, R_{\mathrm {geo}}, \varDelta, \kappa, q, \hat {s}, \delta \}$. Following iterations of the coupled adjoint-LM system we find a configuration that is stable when $\{R_0, I, \varDelta, \kappa, q, \hat {s}, \delta \} = \{2.979, 2.846, 0.562, 1.656, 2.085, 0.167, 0.225\}$. A cross-section of the initial, unstable configuration and the final, stable configuration, with their corresponding neighbouring flux surfaces, are shown in figure 6.

Figure 6. Plots of the flux surfaces in the poloidal cross-section. The orange surface is the flux surface at $\rho = 0.5$, and the blue and green surfaces are the two adjacent flux surfaces. The image (a) is the initial unstable configuration, before optimising. The image (b) is the stable, optimised configuration found using the adjoint-LM system.

8 Conclusion and discussion

We have derived an adjoint method tailored for local, linear gyrokinetics and demonstrated its numerical integration within the $\delta f$-gyrokinetic code stella. As a proof-of-principle case we have demonstrated the effectiveness of our adjoint method, as applied to a gyrokinetic system, by finding the geometric configuration of the magnetic field that is stable to microinstabilities. The illustrative example given has focused on increasing the temperature gradient, whilst preserving microstability; ITG instabilities are often prevalent in fusion devices due to the existence of large temperature gradients, and thus it is conceivable that geometric considerations could help mitigate their growth and improving overall efficiency.

As that the computational cost of the adjoint method remains independent of the number of parameters, its applicability to high-dimensional parameter spaces is readily apparent. The advantages become more pronounced with an expanding number of parameters, as the adjoint method outperforms traditional techniques for calculating gradients, where the computation cost scales with parameter count. This becomes especially beneficial when examining devices like stellarators, which have a large number of geometric parameters that can influence the microinstability of the confined plasma.

It is important to stress that although we have demonstrated a specific example focused on increasing the ITG, this approach can be readily extended to optimise the density gradient or other plasma properties, by adapting the overarching LM optimisation loop, without necessitating alterations to the adjoint calculation itself. Such adaptability enables the application of stella and other local, $\delta f$-gyrokinetic codes to explore the impact of shaping on various types of microinstabilities and assess how geometry can be instrumental in mitigating their growth.

A crucial point to note is that, while the numerical examples presented above have focused on the electrostatic, collisionless regime for optimisation with respect to the Miller geometry, equations (3.13)–(3.17) maintain generality. They can be applied to an electromagnetic system, including collisions, and can be optimised using any appropriate set $\{p_i\}$. It should also be stressed that the adjoint method is agnostic to the underlying mechanism driving the linear instability, and provides a gradient in the desired parameter space independently of the drive. As a result the method presented here is applicable to a wide range of microinstabilities.

It should also be emphasised that the adjoint method and the gradient-based optimisation presented are independent of one another. We have considered the simplified case of optimising the linear growth rate of a single wavenumber, $\boldsymbol{k}$. For more practical purposes one may wish to apply the adjoint method to a range of wavenumbers, and design the external optimisation routine to take an appropriately weighted average of these as the most unstable wavenumber may change as one moves within the parameter space.

Future extensions of this work may include: implementing the electromagnetic equations derived into gyrokinetic codes, applying this work to three-dimensional stellarator geometries, as well as potentially adapting the analytical framework to include global effects (radial or poloidal), whereby different $k_x$ or $k_y$ modes are coupled, and considering the time-dependent adjoint case to include effects such as $\boldsymbol {E} \times \boldsymbol {B}$ flow shear.

Acknowledgements

We are indebted to T. Adkins, W. Dorland, R. Ewart, M. Hardman and F. Parra for their helpful discussions on various topics throughout this project.

Editor P. Helander thanks the referees for their advice in evaluating this article.

Funding

This work has been carried out within the framework of the EUROfusion Consortium, funded by the European Union via the Euratom Research and Training Programme (Grant Agreement No 101052200 – EUROfusion) and from the EPSRC (grant number EP/W006839/1). Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the European Commission can be held responsible for them. Part of the simulations presented were performed on the CINECA Marconi supercomputer within the framework of the FUA37_STELTURB project.

Declaration of interests

The authors report no conflict of interest.

Appendix A. Decomposition of operators

Here, we give the definitions of the operators introduced in § 3 in (3.3)–(3.4)

(A1)\begin{equation} \left. \begin{aligned} \hat{G}_{g,\nu} [\boldsymbol{p}; g_{\nu}] & = \gamma g_{\nu} + v_{{\parallel}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{g_{\nu}}}{\partial{z}}- \frac{\mu}{m_{\nu}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{B_0}}{\partial{z}} \frac{\partial{g_{\nu}}}{\partial{v_{{\parallel}}}} + {\rm i} \omega_{d,\nu} g_{\nu},\\ \hat{G}_{\phi,\nu} [\boldsymbol{p}; \phi] & = v_{{\parallel}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{Z_{\nu} e }{T_{\nu}} F_{0,\nu} \frac{\partial{{\rm J}_{0,\nu} \phi }}{\partial{z}} + {\rm i} \frac{Z_{\nu} e }{T_{\nu}} \omega_{d,\nu} {\rm J}_{0,\nu} F_{0,\nu} \phi + {\rm i} \omega_{*,\nu} {\rm J}_{0,\nu} F_{0,\nu} \phi,\\ \hat{G}_{A_{{\parallel}},\nu} [\boldsymbol{p}; A_{{\parallel}}] & =- \frac{v_{{\parallel}}^2}{c} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{Z_{\nu} e }{T_{\nu}} F_{0,\nu} \frac{\partial{{\rm J}_{0,\nu} A_{{\parallel}} }}{\partial{z}} - {\rm i} \frac{Z_{\nu} e }{T_{\nu}} \omega_{d,\nu} {\rm J}_{0,\nu} F_{0,\nu} \frac{v_{{\parallel}} } {c} A_{{\parallel}} - {\rm i} \omega_{*,\nu} {\rm J}_{0,\nu} F_{0,\nu} \frac{v_{{\parallel}}}{c} A_{{\parallel}} \\ & \quad + \frac{Z_{\nu} e}{T_{\nu}} \frac{\mu_{\nu}}{m_{\nu} c } \hat{\boldsymbol{b}} \boldsymbol{\cdot} \boldsymbol{\nabla} B_0 F_{0,\nu} {\rm J}_{0,\nu} A_{{\parallel}} , \\ \hat{G}_{B_{{\parallel}},\nu} [\boldsymbol{p}; \delta B_{{\parallel}} ] & = 2 v_{{\parallel}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z F_{0,\nu} \frac{\partial{}}{\partial{z}} \left( \frac{\mu_{\nu}}{T_{\nu}} \frac{{\rm J}_{1,\nu} }{a_{\nu }} \delta B_{{\parallel}} \right) + 2 {\rm i} \omega_{d,\nu} \frac{\mu_{\nu}}{T_{\nu}} \frac{{\rm J}_{1, \nu} }{a_{\nu }} F_{0,\nu} \delta B_{{\parallel}}\\ & \quad + 2 {\rm i} \omega_{*,\nu} \frac{\mu_{\nu}}{Z_{\nu} e} \frac{{\rm J}_{1, \nu} }{a_{\nu }} F_{0,\nu} \delta B_{{\parallel}} , \\ \hat{Q}_{g,\nu} [\boldsymbol{p}; g_{\nu}] & = Z_{\nu} e {\rm J}_{0,\nu} g_{\nu}, \\ \hat{Q}_{\phi} [\boldsymbol{p}; \phi] & = \sum_{\nu} \frac{(Z_{\nu} e)^2 n_{\nu}}{T_{\nu}} (\varGamma_{0,\nu} -1 ) \phi , \\ \hat{Q}_{B_{{\parallel}}} [\boldsymbol{p}; \delta B_{{\parallel}}] & = \sum_{\nu} Z_{\nu} e n_{\nu} \frac{\varGamma_{1,\nu}}{B_0} \delta B_{{\parallel}}, \\ \hat{M}_{g,\nu} [\boldsymbol{p}; g_{\nu}] & =- \frac{4 {\rm \pi}}{k_{{\perp}}^2} \frac{v_{{\parallel}} }{c} Z_{\nu} e {\rm J}_{0,\nu} g_{\nu}, \\ \hat{M}_{A_{{\parallel}}} [\boldsymbol{p}; A_{{\parallel}}] & = \left[1 + \frac{4 {\rm \pi}}{ k_{{\perp}}^2 c^2} \sum_{\nu} \frac{ (Z_{\nu} e)^2 n_{\nu}}{m_{\nu}} \varGamma_{0, \nu} \right] A_{{\parallel}} , \\ \hat{N}_{g,\nu} [\boldsymbol{p}; g_{\nu}] & = 8 {\rm \pi}\frac{{\rm J}_{1,\nu} }{a_{\nu} } \mu_{\nu} g_{\nu} , \\ \hat{N}_{\phi} [\boldsymbol{p}; \phi] & = \left[ 4 {\rm \pi}\sum_{\nu} \frac{Z_{\nu} e n_{\nu}}{B_0} \varGamma_{1, \nu} \right] \phi , \\ \hat{N}_{B_{{\parallel}}} [\boldsymbol{p}; \delta B_{{\parallel}}] & = \left[1+ 16 {\rm \pi}\sum_{\nu} \frac{ n_{\nu} T_{\nu}}{B_0^2} \varGamma_{2, \nu} \right] \delta B_{{\parallel}}, \end{aligned} \right\} \end{equation}

and $\hat {C}_{\nu,\nu '}$ is an appropriate collision operator.

Appendix B. Adjoints of operators

The adjoint operators appearing in equations (3.9)–(3.12) are obtained by performing integration by parts wherever a derivative acts on the distribution function or a field variable. After performing the change of variables $\tilde {v}_{\parallel } \rightarrow - \tilde {v}_{\parallel }$ the adjoint operators take the form

(B1)\begin{equation} \left. \begin{aligned} \hat{G}_{g,\nu}^{{\dagger}} [\boldsymbol{p}; \lambda^{\leftrightarrow}_{\nu}] & = \gamma^* \lambda^{\leftrightarrow}_{\nu} + v_{{\parallel}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{\lambda^{\leftrightarrow}_{\nu}}}{\partial{z}}- \frac{\mu_{\nu} }{ m_\nu} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{B_0}}{\partial{z}} \frac{\partial{\lambda^{\leftrightarrow}_{\nu}}}{\partial{v_{{\parallel}}}} - {\rm i} \omega_{d,\nu} \lambda^{\leftrightarrow}_{\nu} , \\ \hat{G}_{\phi,\nu}^{{\dagger}} [\boldsymbol{p}; \lambda^{\leftrightarrow}_{\nu}] & = \frac{Z_{\nu} e} {T_{\nu} } {\rm J}_{0,\nu} F_{0,\nu} \hat{S}_{\nu},\\ \hat{G}_{A_{{\parallel}},\nu}^{{\dagger}} [\boldsymbol{p}; \lambda^{\leftrightarrow}_{\nu}] & = \frac{Z_{\nu} e} {T_{\nu} } {\rm J}_{0,\nu} F_{0,\nu} \frac{v_{{\parallel}}}{c } \hat{S}_{\nu} + \frac{Z_{\nu} e}{T_{\nu}} \frac{\mu_{\nu}}{m_{\nu} c } \hat{\boldsymbol{b}} \boldsymbol{\cdot} \boldsymbol{\nabla} B_0 F_{0,\nu} {\rm J}_{0,\nu} \lambda^{\leftrightarrow}_{\nu} , \\ \hat{G}_{B_{{\parallel}},\nu}^{{\dagger}} [\boldsymbol{p};\lambda^{\leftrightarrow}_{\nu}] & = 2 \frac{ \mu_{\nu}}{T_{\nu} } \frac{{\rm J}_{1,\nu}}{\tilde{a}_{\nu}} F_{0,\nu} \hat{S}_{\nu}, \\ \hat{Q}_{g,\nu}^{{\dagger}} [\boldsymbol{p}; \xi] & = Z_{\nu} e {\rm J}_{0,\nu} F_{0,\nu} \xi, \\ \hat{Q}_{\phi}^{{\dagger}} [\boldsymbol{p}; \xi] & = \sum_{\nu} \frac{(Z_{\nu} e)^2 n_{\nu}}{T_{\nu}} \left(\varGamma_{0,\nu} -1 \right) \xi, \\ \hat{Q}_{B_{{\parallel}}}^{{\dagger}} [\boldsymbol{p}; \xi] & = 4 {\rm \pi}\sum_{\nu} \frac{Z_{\nu} e n_{\nu} } {B_0} \varGamma_{1,\nu} \xi , \\ \hat{M}_{g,\nu}^{{\dagger}} [\boldsymbol{p}; \zeta] & =- \frac{4 {\rm \pi}}{k_{{\perp}}^2} \frac{v_{{\parallel}} }{c} Z_{\nu} e {\rm J}_{0,\nu} \zeta, \\ \hat{M}_{A_{{\parallel}}}^{{\dagger}} [\boldsymbol{p}; \zeta] & = \left[1 + \frac{4 {\rm \pi}}{ k_{{\perp}}^2 c^2} \sum_{\nu} \frac{ (Z_{\nu} e)^2 n_{\nu}}{m_{\nu}} \varGamma_{0, \nu} \right] \zeta, \\ \hat{N}_{g,\nu}^{{\dagger}} [\boldsymbol{p}; \sigma] & = 8 {\rm \pi}\frac{{\rm J}_{1,\nu} }{a_{\nu} } \mu_{\nu} \sigma , \\ \hat{N}_{\phi}^{{\dagger}} [\boldsymbol{p}; \sigma] & = \left[ 4 {\rm \pi}\sum_{\nu} \frac{Z_{\nu} e n_{\nu}}{B_0} \varGamma_{1, \nu} \right] \sigma, \\ \hat{N}_{B_{{\parallel}}}^{{\dagger}} [\boldsymbol{p}; \sigma] & = \left[1+ 16 {\rm \pi}\sum_{\nu} \frac{ n_{\nu} T_{\nu}}{B_0^2} \varGamma_{2, \nu} \right] \sigma, \\ \hat{C}_{\nu}^{{\dagger}} [\boldsymbol{p}; \lambda^{\leftrightarrow}_{\nu}] & = \hat{C}_{\nu} [\boldsymbol{p}; \lambda^{\leftrightarrow}_{\nu}], \end{aligned} \right\} \end{equation}

where we have defined

(B2)\begin{equation} \hat{S} [\,\boldsymbol{p}; \lambda^{\leftrightarrow}_{\nu}] = v_{{\parallel}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{\lambda^{\leftrightarrow}_{\nu}}}{\partial{z}} - {\rm i} \omega_{d,\nu} \lambda^{\leftrightarrow}_{\nu} - {\rm i} \omega_{*,\nu} \lambda^{\leftrightarrow}_{\nu}. \end{equation}

Appendix C. Simplifying adjoint equations

Consider taking the following moments of (3.9):

(C1a–c)\begin{equation} \left\langle \frac{Z_{\nu} e}{T_{\nu}} {\rm J}_{0,\nu} F_{0,\nu}, \boldsymbol{\cdot} \right\rangle_{v,\nu} , \quad \left\langle \frac{Z_{\nu} e}{T_{\nu}} {\rm J}_{0,\nu} F_{0,\nu} \frac{v_{{\parallel}}}{c} , \boldsymbol{\cdot} \right\rangle_{v, \nu} , \quad \left\langle 2 \frac{{\rm J}_{1,\nu}}{a_{\nu}} F_{0,\nu} \frac{\mu_{\nu}}{T_{\nu}} , \boldsymbol{\cdot} \right\rangle_{v, \nu} , \end{equation}

giving

(C2)\begin{gather} 0 = \sum_{\nu} \frac{2 {\rm \pi}B_0}{m_{\nu}} \int \,\mathrm{d} v_{{\parallel}} \int \,\mathrm{d} \mu_{\nu} \alpha_{\nu} (z, v_{{\parallel}}, \mu_{\nu} ) F_{0,\nu} \left\{ \gamma^* \lambda^{\leftrightarrow}_{\nu} + v_{{\parallel}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{\lambda^{\leftrightarrow}_{\nu}}}{\partial{z}} \right.\nonumber\\ \left. - \frac{\mu_{\nu}}{m_{\nu}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{B_0}}{\partial{z}} \frac{\partial{\lambda^{\leftrightarrow}_{\nu}}}{\partial{v_{{\parallel}}}} - {\rm i} \omega_{d, \nu} \lambda^{\leftrightarrow}_{\nu} + Z_{\nu} e {\rm J}_{0,\nu} \xi - \frac{4 {\rm \pi}}{k_{{\perp}}^2} Z_{\nu} e {\rm J}_{0,\nu} \frac{v_{{\parallel}}}{c} \zeta + 8 {\rm \pi}\frac{{\rm J}_{1,\nu}}{a_{\nu}} \mu_{\nu} \sigma \right\}, \end{gather}

where $\alpha _{\nu }$ can take the forms

(C3)\begin{equation} \alpha_{\nu} = \left\{ \begin{array}{@{}l} \dfrac{Z_{\nu} e}{T_{\nu}} {\rm J}_{0,\nu} ,\\ \dfrac{Z_{\nu} e}{T_{\nu}} {\rm J}_{0,\nu} \dfrac{v_{{\parallel}}}{c} ,\\ 2 \dfrac{{\rm J}_{1,\nu}}{a_{\nu}} \dfrac{\mu_{\nu}}{T_{\nu}} . \end{array} \right. \end{equation}

We now identify different terms in (C2) for each $\alpha _{\nu }$ which are odd in $v_{\parallel }$ so evaluate to zero when integrated over the domain $\{-\infty, \infty \}$

(C4)\begin{align} 0 & = \sum_{\nu} \frac{2 {\rm \pi}B_0} {m_{\nu}} \int \,\mathrm{d}^2 v \frac{Z_{\nu} e}{T_{\nu}} F_{0,\nu} \left\{\vphantom{\underbrace{\frac{4 {\rm \pi}}{k_{{\perp}}^2} Z_{\nu} e {\rm J}_{0,\nu}^2 \frac{v_{{\parallel}}}{c} \zeta }_{\text{odd in }v_{{\parallel}}}} \gamma^* {\rm J}_{0,\nu} \lambda^{\leftrightarrow}_{\nu} + v_{{\parallel}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z {\rm J}_{0,\nu} \frac{\partial{\lambda^{\leftrightarrow}_{\nu}}}{\partial{z}} - {\rm i} \omega_{d, \nu} {\rm J}_{0,\nu} \lambda^{\leftrightarrow}_{\nu} \right.\nonumber\\ & \quad\left. - \frac{\mu_{\nu}}{m_{\nu}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{B_0}}{\partial{z}} \frac{\partial{\lambda^{\leftrightarrow}_{\nu}}}{\partial{v_{{\parallel}}}} + Z_{\nu} e {\rm J}_{0,\nu}^2 \xi + \underbrace{\frac{4 {\rm \pi}}{k_{{\perp}}^2} Z_{\nu} e {\rm J}_{0,\nu}^2 \frac{v_{{\parallel}}}{c} \zeta }_{\text{odd in }v_{{\parallel}}} + 8 {\rm \pi}\frac{{\rm J}_{1,\nu} {\rm J}_{0,\nu} }{a_{\nu}} \mu_{\nu} \sigma \right\} , \end{align}

(C5)\begin{align} 0 = & \sum_{\nu} \frac{2 {\rm \pi}B_0} {m_{\nu}} \int \,\mathrm{d}^2 v \frac{Z_{\nu} e}{T_{\nu}} F_{0,\nu} \left\{\vphantom{\underbrace{ \frac{4 {\rm \pi}}{k_{{\perp}}^2} Z_{\nu} e \frac{{\rm J}_{0,\nu} {\rm J}_{1,\nu}}{a_{\nu}} \frac{v_{{\parallel}}}{c} \frac{\mu_{\nu}}{T_{\nu}} \zeta }_{\text{odd in }v_{{\parallel}}}} \gamma^* {\rm J}_{0,\nu} \frac{v_{{\parallel}}}{c} \lambda^{\leftrightarrow}_{\nu} + \frac{v_{{\parallel}}^2 }{c} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z {\rm J}_{0,\nu} \frac{\partial{\lambda^{\leftrightarrow}_{\nu}}}{\partial{z}}\right. \nonumber\\ & \quad - \frac{\mu_{\nu}}{m_{\nu}} \frac{v_{{\parallel}}}{c} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{B_0}}{\partial{z}} \frac{\partial{\lambda^{\leftrightarrow}_{\nu}}}{\partial{v_{{\parallel}}}} - {\rm i} \omega_{d, \nu} \frac{v_{{\parallel}}}{c} {\rm J}_{0,\nu} \lambda^{\leftrightarrow}_{\nu} \nonumber\\ & \left.\quad + \underbrace{Z_{\nu} e {\rm J}_{0,\nu}^2 \frac{v_{{\parallel}}}{c} \xi}_{\text{odd in }v_{{\parallel}}} +\frac{4 {\rm \pi}}{k_{{\perp}}^2} Z_{\nu} e {\rm J}_{0,\nu}^2 \left( \frac{v_{{\parallel}}}{c} \right)^2 \zeta + \underbrace{ 8 {\rm \pi}\frac{{\rm J}_{1,\nu} {\rm J}_{0,\nu} }{a_{\nu}} \frac{v_{{\parallel}}}{c} \mu_{\nu} \sigma}_{\text{odd in }v_{{\parallel}}} \right\} , \end{align}

(C6)\begin{align} 0 & = 2 \sum_{\nu} \frac{2 {\rm \pi}B_0} {m_{\nu}} \int \,\mathrm{d}^2 v F_{0,\nu} \left\{ \vphantom{\underbrace{ \frac{4 {\rm \pi}}{k_{{\perp}}^2} Z_{\nu} e \frac{{\rm J}_{0,\nu} {\rm J}_{1,\nu}}{a_{\nu}} \frac{v_{{\parallel}}}{c} \frac{\mu_{\nu}}{T_{\nu}} \zeta }_{\text{odd in }v_{{\parallel}}}} \frac{{\rm J}_{1,\nu}}{a_{\nu}} \frac{\mu_{\nu}}{T_{\nu}} \gamma^* \lambda^{\leftrightarrow}_{\nu} + v_{{\parallel}} \frac{{\rm J}_{1,\nu}}{a_{\nu}} \frac{\mu_{\nu}}{T_{\nu}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{\lambda^{\leftrightarrow}_{\nu}}}{\partial{z}} \right. \nonumber\\ & \quad -\frac{\mu_{\nu}}{m_{\nu}} \hat{\boldsymbol{b}}\boldsymbol{\cdot} \boldsymbol{\nabla} z \frac{\partial{B_0}}{\partial{z}} \frac{{\rm J}_{1,\nu}}{a_{\nu}} \frac{\mu_{\nu}}{T_{\nu}} \frac{\partial{\lambda^{\leftrightarrow}_{\nu}}}{\partial{v_{{\parallel}}}} - {\rm i} \omega_{d, \nu} \frac{{\rm J}_{1,\nu}}{a_{\nu}} \frac{\mu_{\nu}}{T_{\nu}} \lambda^{\leftrightarrow}_{\nu} + \frac{Z_{\nu} e \mu_{\nu}}{T_{\nu}} \frac{{\rm J}_{0,\nu} {\rm J}_{1,\nu}}{a_{\nu}} \xi \nonumber\\ & \left.\quad + \underbrace{ \frac{4 {\rm \pi}}{k_{{\perp}}^2} Z_{\nu} e \frac{{\rm J}_{0,\nu} {\rm J}_{1,\nu}}{a_{\nu}} \frac{v_{{\parallel}}}{c} \frac{\mu_{\nu}}{T_{\nu}} \zeta }_{\text{odd in }v_{{\parallel}}} + 8 {\rm \pi}\left(\frac{{\rm J}_{1,\nu} }{a_{\nu}} \right)^2 \frac{\mu_{\nu}^2 }{T_{\nu}} \sigma \right\} . \end{align}

We can then perform integration by parts on the remaining $v_{\parallel }$ derivative, using velocity independence of the fields. Using equations (3.10)–(3.12) it is then possible to simplify the above equations to produce the results (3.14)–(3.16).

Appendix D. Integral weights

The code stella computes its $\tilde {\mu }_{\nu }$ grid points according to Gauss–Laguerre quadrature such that the $\tilde {\mu }_{\nu }$-integration of a variable, $f$, may be approximated numerically as

(D1)\begin{equation} B \int_0^{\infty} \,\mathrm{d} \mu_{\nu} f(\mu_{\nu}) \approx \sum_i^{N_{\mu_{\nu}}} w_i {\rm e}^{\hat{\mu}_{\nu}} \frac{B}{B_0} f(\hat{\mu}_{\nu}) , \end{equation}

where $N_{\mu _{\nu }}$ is the number of grid points in $\mu _{\nu }$, and the definition $\hat {\mu }_{\nu } = \mu _{\nu } B_0$ is made, with $B_0$ relating to the minimum of the normalised magnetic field. This $B_0$ acts as a scaling factor that acts on the upper limit of the $\mu _{\nu }$ grid such that the grid may be defined solely based on the number of grid points whilst still covering the necessary domain as $\hat {\mu }_{\nu }$ is an independent coordinate that is independent of $\boldsymbol {p}$. Previously, the importance of the role of the Jacobians was mentioned with a stress that certain variables are being considered as fixed, dummy variables, whilst others are dependent on the geometric inputs, $\boldsymbol {p}$. We take the $\hat {\mu }_{\nu }$ variable to be the dummy variable that is kept fixed in the perpendicular velocity integral such that the weights $w_i = B / B_0$ are the only factors to be perturbed in the integrals and, in a similar manner, the fixed variable, $\hat {\mu }_{\nu }$, is weighted by a varying factor, $B_0$, when it appears in the equations. It should be noted that since the $\hat {\mu }_{\nu }$ grid is calculated independently of the geometry, it is the same for each perturbed value of $\boldsymbol {p}$, and as such the $\hat {\mu }_{\nu }$ grids will always align even when multiplying terms that are evaluated at different $\boldsymbol {p}$ values, such as when coefficients are perturbed but are multiplied by $\tilde {g}_{\nu } (\boldsymbol {p}_0)$.

Appendix E. Geometry implementation

The code stella has an input option to use the Miller parametrisation of a flux surface; it takes a set of input variables to describe the local geometry of a specified flux surface along with the two adjacent flux surfaces on either side. We take (4.16) in the form

(E1)\begin{equation} R(r, \theta) = R_0 (r) + r \cos [ \theta + \sin(\theta) \delta(r) ] , \end{equation}

with the triangularity redefined as $\delta (r) := \arcsin [\bar {\delta }(r)]$. We now consider Taylor expanding in $r$ about $r = r_{\psi _0}$

(E2)\begin{equation} R_0(r) = R_{0} ( r_{\psi_0}) + \left. {\frac{\mathrm{d}{R_0}}{\mathrm{d}{r}}} \right|_{ r_{\psi_0}} (r - r_{\psi_0} ) + \cdots\approx R_{\psi_0} + \varDelta_{\psi_0} \,\mathrm{d} r + O (\mathrm{d} r^2) , \end{equation}

with $\mathrm {d} r = r - r_{\psi _0}$, $R_{\psi _0} = R_0( r_{\psi _0})$, and $\varDelta _{\psi _0} = {\textrm {d}R_0}/{\textrm {d}r}|_{ r_{\psi _0}}$. Similarly

(E3)\begin{equation} \delta(r) = \delta ( r_{\psi_0}) + \left. {\frac{\mathrm{d}{\delta}}{\mathrm{d}{r}}} \right|_{ r_{\psi_0}} (r - r_{\psi_0}) + \cdots\approx \delta_{\psi_0} + \delta'_{\psi_0} \,\mathrm{d} r + O (\mathrm{d} r^2), \end{equation}

with $\delta _{\psi _0} = \delta ( r_{\psi _0})$, and $\delta '_{\psi _0} = \mathrm {d}\delta / \mathrm {d} r |_{ r_{\psi _0}}$. Combining (E2) and (E3) gives

(E4)\begin{gather} R(r,\theta) \approx R_{\psi_0} + r_{\psi_0} \cos[ \theta + \sin(\theta) \delta_{\psi_0} ] \nonumber\\ + \left\{ \varDelta_{\psi_0} + \cos[\theta + \sin(\theta) \delta_{\psi_0} ] - r_{\psi_0} \sin[ \theta + \sin (\theta) \delta_{\psi_0} ] \sin(\theta) \delta'_{\psi_0} \right\} \,\mathrm{d} r , \end{gather}

such that the above definition holds on any given flux surface, $\psi _0$. Equivalently, (4.17) can also be expanded about $r= r_{\psi _0}$ by first expanding the elongation

(E5)\begin{equation} \kappa (r) = \kappa ( r_{\psi_0}) + \left. {\frac{\mathrm{d}{\kappa}}{\mathrm{d}{r}}} \right|_{ r_{\psi_0}} (r - r_{\psi_0}) + \cdots\approx \kappa_{\psi_0} + \kappa'_{\psi_0} \,\mathrm{d} r + O (\mathrm{d} r^2) , \end{equation}

with $\kappa _{\psi _0} = \kappa ( r_{\psi _0})$, and $\kappa '_{\psi _0} = \mathrm {d} \kappa / \mathrm {d} r |_{ r_{\psi _0}}$ to then write

(E6)\begin{equation} Z(r,\theta) \approx r_{\psi_0} \kappa_{\psi_0} \sin(\theta) + \left[ \kappa_{\psi_0} + r_{\psi_0} \kappa'_{\psi_0} \right] \sin(\theta) \, \mathrm{d} r . \end{equation}

These functions are used to describe the geometry of the flux surface of interest and the two adjacent flux surfaces by setting $\mathrm {d} r = \{0, \pm \varLambda \}$, with $\varLambda \ll 1$ a constant, in order to evaluate their radial derivatives. These quantities are then used to compute the Jacobian, magnetic field strength and configuration along with other functions defined on the flux surface.

Footnotes

¹ An important caveat to mention is that true critical gradient values can be shifted by nonlinear effects. Linear instabilities grow until nonlinear coupling to ‘zonal’ modes removes energy from the linearly growing modes. This energy injection into the zonal modes causes them to grow and can suppress turbulence completely in a region beyond the linear critical gradients (e.g. see Dimits et al. Reference Dimits, Bateman, Beer, Cohen, Dorland, Hammett, Kim, Kinsey, Kotschenreuther and Kritz2000; Rogers, Dorland & Kotschenreuther Reference Rogers, Dorland and Kotschenreuther2000). We will neglect this ‘Dimits shift’ of the critical gradient in the following analysis, and use only the linear gradient as a predictor for experimental profiles. This provides a lower bound for critical gradient values and can be a good approximation for the full nonlinear threshold value.

² This requires the collisionality $\nu _* \gtrsim \rho _* \omega$ – a regime within which we work.

³ It should be noted that this encapsulates a broad range of collision operators including the linearised Landau, and the linearised Fokker–Planck collision operators.

⁴ Spatial derivatives are exact in the Fourier representation and are hence more accurate than using a finite-difference scheme in configuration space.

⁵ We remind the reader that, here, $\boldsymbol {k}$ denotes the Fourier mode from the spatial decomposition, and $j$ is the subscript denoting the temporal normal mode.

⁶ It is worth noting that there are instances when the linear growth rates of two independent modes may be very similar, and a single dominant growth rate may require long run times to become evident.

⁷ Note that these must be linear in their arguments, exhibit conjugate symmetry and satisfy the Cauchy–Schwartz inequality to be well defined. We also require that all arguments are square integrable in order for the optimisation to be finite.

⁸ Note that the adjoint variables here are defined in Fourier space, so $\lambda _{\nu }$, $\xi$, $\zeta$ and $\sigma$ also contain suppressed $\boldsymbol {k}$ subscripts, and thus we are calculating each for a specific $\boldsymbol {k}$. The derivatives $\boldsymbol {\nabla }_{\boldsymbol {p}}$ indicate how these Fourier components respond to external changes of the parameters $\boldsymbol {p}$ in our system.

⁹ It is noteworthy to point out that although the Jacobians present in the integrals (as well as the Lagrange multipliers themselves) are $\boldsymbol {p}$-dependent, we anticipate that when the distribution function and fields are evaluated at $\boldsymbol {p}_0$ their derivatives provide zero contribution as they multiply the functional operators, which are identically zero at $\boldsymbol {p}_0$. Hence, we would be justified in pulling the partial derivative through these inner products, and the final derivative of the growth rate obtained should be the same.

¹⁰ An additional consequence is that this choice simplifies the implementation into an existing gyrokinetic code.

¹¹ Here, the time derivative serves as an iterative scheme to converge to the correct steady-state solution for $\tilde {\lambda }^{\leftrightarrow }_{\nu }$.

¹² In cases where there exist two modes of similar growth rates, it may take a long run time to isolate the dominant mode. In these instances of ambiguity, the adjoint method as demonstrated here, may fail to provide the correct gradient of the linear growth rate in the desired parameter space at that point. However, it should be noted that as soon as you move away from this region of ambiguity in the growth rate, the adjoint method can continue to be applicable, and will provide the correct gradient in the parameter space. Hence, even if you take a step in the wrong direction within the space, the adjoint method should correct itself to then step in the direction of stability.

References

Abel, I.G., Plunk, G.G., Wang, E., Barnes, M., Cowley, S.C., Dorland, W. & Schekochihin, A.A. 2013 Multiscale gyrokinetics for rotating tokamak plasmas: fluctuations, transport and energy flows. Rep. Prog. Phys. 76 (11), 116201.Google Scholar

Antonsen, T.M. Jr. & Lane, B. 1980 Kinetic equations for low frequency instabilities in inhomogeneous plasmas. Phys. Fluids 23 (6), 1205–1214.Google Scholar

Barnes, M., Parra, F.I. & Landreman, M. 2019 stella: an operator-split, implicit-explicit δf-gyrokinetic code for general magnetic field configurations. J. Comput. Phys. 391, 365–380.Google Scholar

Beeke, O., Barnes, M., Romanelli, M., Nakata, M. & Yoshida, M. 2020 Impact of plasma shaping on tokamak microstability. Nuclear Fusion 61 (6), 066020.Google Scholar

Beer, M.A., Cowley, S.C. & Hammett, G.W. 1995 Field-aligned coordinates for nonlinear simulations of tokamak turbulence. Phys. Plasmas 2 (7), 2687–2700.Google Scholar

Catto, P.J. 1978 Linearized gyro-kinetics. Plasma Phys. 20 (7), 719.Google Scholar

Cowley, S.C., Kulsrud, R.M. & Sudan, R. 1991 Considerations of ion-temperature-gradient-driven turbulence. Phys. Fluids B 3 (10), 2767–2782.Google Scholar

Dimits, A.M., Bateman, G., Beer, M.A., Cohen, B.I., Dorland, W., Hammett, G.W., Kim, C., Kinsey, J.E., Kotschenreuther, M., Kritz, A.H., et al. 2000 Comparisons and physics basis of tokamak transport models and turbulence simulations. Phys. Plasmas 7 (3), 969–983.Google Scholar

Doyle, E.J., Houlberg, W.A. & Kamada, Y. 2007 Chapter 2: plasma confinement and transport [Progress in the ITER Physics Basis (PIPB)]. Nucl. Fusion 47, S20–S36.Google Scholar

Drake, J.F. & Lee, Y.C. 1977 Kinetic theory of tearing instabilities. Phys. Fluids 20 (8), 1341–1353.Google Scholar

Geraldini, A., Landreman, M. & Paul, E. 2021 An adjoint method for determining the sensitivity of island size to magnetic field variations. J. Plasma Phys. 87 (3), 905870302.Google Scholar

Gottlieb, S., Shu, C.-W. & Tadmor, E. 2001 Strong stability-preserving high-order time discretization methods. SIAM Rev. 43 (1), 89–112.Google Scholar

Highcock, E.G., Mandell, N.R., Barnes, M. & Dorland, W. 2018 Optimisation of confinement in a fusion reactor using a nonlinear turbulence model. J. Plasma Phys. 84 (2), 905840208.Google Scholar

Mantica, P., Angioni, C., Baiocchi, B., Baruzzo, M., Beurskens, M.N.A., Bizarro, J.P.S., Budny, R.V., Buratti, P., Casati, A., Challis, C., et al. 2011 Ion heat transport studies in JET. Plasma Phys. Control. Fusion 53 (12), 124033.Google Scholar

Miller, R.L., Chu, M.S., Greene, J.M., Lin-Liu, Y.R. & Waltz, R.E. 1998 Noncircular, finite aspect ratio, local equilibrium model. Phys. Plasmas 5 (4), 973–978.Google Scholar

Nies, R., Paul, E.J., Hudson, S.R. & Bhattacharjee, A. 2022 Adjoint methods for quasi-symmetry of vacuum fields on a surface. J. Plasma Phys. 88 (1), 905880106.Google Scholar

Paul, E.J., Landreman, M., Bader, A. & Dorland, W. 2018 An adjoint method for gradient-based optimization of stellarator coil shapes. Nucl. Fusion 58 (7), 076015.Google Scholar

Pironneau, O. 1974 On optimum design in fluid mechanics. J. Fluid Mech. 64, 97–110.Google Scholar

Pochelon, A., Goodman, T.P., Henderson, M., Angioni, C., Behn, R., Coda, S., Hofmann, F., Hogge, J.P., Kirneva, N., Martynov, A.A., et al. 1999 Energy confinement and MHD activity in shaped TCV plasmas with localized electron cyclotron heating. Nucl. Fusion 39 (11Y), 1807–1818.Google Scholar

Rogers, B.N., Dorland, W. & Kotschenreuther, M. 2000 Generation and stability of zonal flows in ion-temperature-gradient mode turbulence. Phys. Rev. Lett. 85, 5336–5339.Google Scholar

Rudakov, L.I. & Sagdeev, R.Z. 1961 On the instability of a nonuniform rarefied plasma in a strong magnetic field. Sov. Phys. Dokl. 6, 415.Google Scholar

Shafranov, V.D. 1966 Plasma equilibrium in a magnetic field. Rev. Plasma Phys. 2, 103.Google Scholar

Tang, W.M., Connor, J.W. & Hastie, R.J. 1980 Kinetic-ballooning-mode theory in general geometry. Nucl. Fusion 20 (11), 1439.Google Scholar

Transtrum, M.K. & Sethna, J.P. 2012 Improvements to the Levenberg–Marquardt algorithm for nonlinear least-squares minimization. arXiv:1201.5885.Google Scholar