
An Adjoint Method for High-Resolution EPMA Based on the Spherical Harmonics (PN) Model of Electron Transport


Abstract

Electron Probe Microanalysis (EPMA) is a nondestructive technique to determine the chemical composition of material samples in the micro- to nanometer range. Based on intensity measurements of x-radiation, the reconstruction of the material composition and structure poses an inverse problem. The reconstruction methods currently applied are based on models that assume a homogeneous or a layered structure of the studied material. To increase the spatial resolution of reconstruction in EPMA, a more sophisticated reconstruction method, based on a model that allows for complex material structures, must be combined with multiple measurements at varying beam configurations. We present a deterministic k-ratio model that is based on the PN model, an approximation of the radiative transfer equation for electron transport. Our goal is to approximate a maximum likelihood solution of the inverse problem using gradient-based optimization. We detail the application of the model in the context of algorithmic differentiation, in particular by deriving its continuous adjoint formulation. Algorithmic differentiation provides the flexibility to adapt the reconstruction method to various material parametrizations and thus to regularize and take into account prior knowledge. Through examples, we verify our implementation and demonstrate the flexibility of the reconstruction/differentiation method.

1. Introduction

Electron Probe Microanalysis (EPMA) (Heinrich and Newbury 1991; Reimer 1998) is an imaging technique used for the quantitative analysis of the composition of solid material samples at the micro- to nanometer scale. The sample is excited by a focused beam of electrons, which induces multiple relaxation processes inside the sample. In EPMA, the emission of characteristic x-rays is of special interest. If an electron induced by the beam strikes a bound electron that occupies an atomic shell of an atom inside the specimen, the bound electron is ejected from its shell and the atom is left with a vacancy. Outer-shell electrons fill this vacancy by emitting a quantized x-ray with an energy corresponding to the energy-level difference of the excited (initial) and the relaxed (final) level. The energy levels of electron shells are characteristic of a specific atom, hence the energy of the emitted x-ray provides information about the composition of the material sample. Reconstructing the chemical composition from intensities of emitted x-radiation (normalized into k-ratios $(k_\alpha)_{\alpha\in A}$, $A$: set of measurement setups) forms the inverse problem in EPMA. In Figure 1, we outline the main physical processes that occur during an experiment.

Figure 1. A sketch of the physical processes in EPMA. The sample G is rastered by electron beams (blue) with different positions $\mu_x$. Electrons e⁻ from the electron beam (blue dotted line) scatter inside the sample and strike bound electrons that leave a vacancy (blue circle). Outer shell electrons (blue discs) fill this vacancy and release an x-ray of characteristic energy (red wobbly line). The x-ray travels through the sample and is counted by a detector. Beam electrons only excite a certain volume of the sample, the interaction volume (gray ellipses), which scales with the beam energy $\mu_\epsilon$.

The crucial ingredient to the inverse problem of reconstruction is the definition of a model $k \circ P : P \to \mathbb{R}^{|A|}$ that maps parameters $p \in P \subset \mathbb{R}^{n_p}$ ($P$: parameter space, $n_p$: number of parameters) to k-ratios $(k \circ P)(p) \in \mathbb{R}^{|A|}$ ($|A|$: number of measurements). Then the reconstruction can be formalized as the following inverse problem: Find the set of parameters $p^*$ such that the modeled k-ratios $(k \circ P)(p^*)$ reproduce the experimental k-ratios $k^{\exp}$ as well as possible.

The applications of EPMA are manifold (Llovet et al. 2021; Moy, Fournelle, and von der Handt 2019; Pinard et al. 2013). From geology and material science to electronics, researchers rely on EPMA to determine the material composition and structure of samples. Every application brings with it different prior knowledge and objectives that must be considered in the reconstruction. We subdivide our model into a k-ratio model $k : (G \to \mathbb{R}^{n_e}) \to \mathbb{R}^{|A|},\ \rho \mapsto k[\rho]$ and a material parametrization $P : P \to (G \to \mathbb{R}^{n_e}),\ p \mapsto P(p) = \rho$. Thereby $\rho_i(x)$ is the mass concentration of element $i$ in a compound with $n_e$ constituents at a point $x \in G$. The sample domain $G \subset \mathbb{R}^3$ will be specified later. While the k-ratio model $k$ can handle general materials, the reconstruction can be tailored to specific applications via the material parametrization $P$ (e.g., using prior knowledge). In particular, our definition includes the reconstruction approaches currently applied in EPMA, which are however very restrictive on the material parametrizations $P$. ZAF-models assume homogeneity of the material inside the interaction volume of each beam. Effectively, this assumption limits the spatial resolution of EPMA to the size of the interaction volume (Buse and Kearns 2020; Carpenter and Jolliff 2015; Moy and Fournelle 2017). Reconstruction methods based on the integration of $\phi(\rho z)$-curves currently only allow for samples that are layered in depth (Moy and Fournelle 2020). We aim to leverage a more sophisticated k-ratio model and the combination of k-ratio measurements from various beam positions and energies (resp. measurement setups $A$) to obtain a high-resolution spatial image of the material composition.

In this article, we present our model for k-ratios (taken from Bünger, Richter, and Torrilhon 2022), with a focus on the dependence on the mass concentrations $\rho$. The main part is dedicated to the derivation of an efficient and modular approach to compute the derivatives of our model, facilitating its integration into differentiable programming (Dorigo et al. 2023). We derive our differentiation approach based on the adjoint mode of algorithmic differentiation (Griewank 2003; Naumann 2011) and the continuous adjoint method (Plessix 2006). The derivative forms the crucial part in the implementation of gradient-based optimization methods that find a local minimum of an objective function as a candidate solution of the inverse problem. We show the agreement of our gradient computation with finite differences and demonstrate the advantages of the adjoint method in terms of computation time. We also provide reconstruction experiments that compare different parametrizations in 1D and show a 2D reconstruction example.

1.1. The inverse problem of reconstruction

In theory (Stuart 2010; Tarantola 2005), the solution of the inverse problem is defined as knowledge of the posterior uncertainty of the parameters based on the likelihood of the data, which describes the uncertainty of the model and the measurement, and the prior uncertainty of the parameters. So far, we consider a Bayesian inversion of the presented problem as intractable. However, in the statistical interpretation our approach amounts to finding the maximum likelihood estimate under the assumption of an isotropic Gaussian likelihood. We define the reconstruction problem as the following optimization problem:
$$p^* = \underset{p \in P}{\arg\min}\; \underbrace{\sum_{\alpha \in A} \big( (k_\alpha \circ P)(p) - k_\alpha^{\exp} \big)^2}_{=(J \circ k \circ P)(p)} \tag{1}$$

Assuming that the objective function $J \circ k \circ P$ is differentiable w.r.t. $p$, gradient-based optimization methods (steepest descent, conjugate gradient, L-BFGS) (Nocedal and Wright 2006) can be utilized to iteratively approximate a local minimum $p^* \in P$. For an efficient application of gradient-based methods, the efficient computation of the gradient is crucial. Additionally, we require our code to remain modular, such that different parts of the model and, more importantly, the parametrization of the mass concentrations $P$ remain exchangeable. We remark that the computation of maximum likelihood estimates can be ambiguous and ill-posed depending on the number of available measurements and the chosen parametrization. Additional prior information about the material would then be necessary, which could enter our minimization problem Equation (1) as an additional regularization term $R(\rho(\cdot), p)$. Regarding the ill-posedness of the inverse problem, we refer to further research and note that the addition of a differentiable regularization term is possible.
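To make the role of the gradient concrete, the following is a minimal sketch (in Julia) of the resulting optimization loop. The names model_kratios and adjoint_gradient are dummy stand-ins for the composition $k \circ P$ and the adjoint-based gradient $\mathrm{d}J/\mathrm{d}p$ of Section 3, and the measured k-ratios are made-up values; the sketch only illustrates how Equation (1) is handed to a gradient-based optimizer such as L-BFGS from Optim.jl.

```julia
# Minimal sketch of the reconstruction loop in Equation (1).
# `model_kratios` and `adjoint_gradient` are dummy stand-ins, not the actual model.
using Optim

k_exp = [0.42, 0.58]                                   # "measured" k-ratios (dummy values)
model_kratios(p)    = [p[1], 1 - p[1]]                 # stand-in for (k ∘ P)(p)
J(p)                = sum(abs2, model_kratios(p) .- k_exp)
adjoint_gradient(p) = [4 * (p[1] - k_exp[1]); zeros(length(p) - 1)]   # gradient of the stand-in

grad!(g, p) = (g .= adjoint_gradient(p); g)

p0     = fill(0.5, 20)                                 # initial parameter guess
result = optimize(J, grad!, p0, LBFGS())               # gradient-based optimization
p_star = Optim.minimizer(result)
```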

2. The forward model

The characteristic x-ray intensity model used in this article combines a model for electron transport based on the continuous slowing down approximation (CSD) of the linear Boltzmann equation (radiative transfer) with a subsequently applied model for x-ray generation and attenuation. In Bünger (2021), Bünger, Sarna, and Torrilhon (2022), and Bünger, Richter, and Torrilhon (2022), the model is derived in more detail. Here we derive and summarize the main equations and address the dependence on mass concentrations $\rho$ in detail.

2.1. K-ratios

In EPMA, x-ray intensity is measured by counting all characteristic x-rays emitted by electron scattering induced by the electron beam. Because of multiple uncertain quantities concerning the experimental device, the x-ray intensity $I_\alpha$ is normalized into a k-ratio $k_\alpha$ using standard intensities $I_\alpha^{\mathrm{std}}$ measured from a known reference sample.
$$k_\alpha = \frac{I_\alpha}{I_\alpha^{\mathrm{std}}} \tag{2}$$

Therefore, the normalization eliminates uncertain multiplicative factors that influence the intensity, e.g., the detector efficiency.

An experiment yields multiple k-ratios. Each k-ratio corresponds to a specific x-ray transition and a specific experimental setup, e.g., the beam position, energy or angle. To distinguish between k-ratios, we use $\alpha = (\alpha_L, \alpha_B) \in A$ to denote a multi-index that contains all information about the transition $\alpha_L$ and the beam setup $\alpha_B$.
$$\alpha = \Big( \underbrace{Z_i \in \{Z_1, Z_2, \ldots, Z_{n_e}\},\; I \in \{K, L_1, \ldots, M_1, \ldots\},\; F \in \{L_1, \ldots, M_1, \ldots\}}_{=\alpha_L},\; \underbrace{(\mu_{x,\mathrm{beam}}, \Sigma_{x,\mathrm{beam}}) \in \mathbb{R}^2 \times \mathbb{R}^{2\times 2},\; (\mu_{\epsilon,\mathrm{beam}}, \Sigma_{\epsilon,\mathrm{beam}}) \in \mathbb{R} \times \mathbb{R},\; (\mu_{\Omega,\mathrm{beam}}, \kappa_{\Omega,\mathrm{beam}}) \in S^2 \times \mathbb{R}}_{=\alpha_B} \Big) \tag{3}$$

The individual symbols are the atomic number $Z_i \in \alpha_L$, the initial (excited) $I \in \alpha_L$ and the final (relaxed) $F \in \alpha_L$ x-ray level of the electron transition (Footnote 1), the mean and variance of the beam's position $(\mu_{x,\mathrm{beam}}, \Sigma_{x,\mathrm{beam}}) \in \alpha_B$ and of the beam's energy $(\mu_{\epsilon,\mathrm{beam}}, \Sigma_{\epsilon,\mathrm{beam}}) \in \alpha_B$, and the mean and concentration parameter of the beam's direction $(\mu_{\Omega,\mathrm{beam}}, \kappa_{\Omega,\mathrm{beam}}) \in \alpha_B$.

2.2. Characteristic X-ray emission model

Based on mass concentrations $\rho$, we aim for a generic model for k-ratios $k_\alpha$ which, according to the usual definition in EPMA, are computed as the ratio of a measured intensity $I_\alpha$ and a standard intensity $I_\alpha^{\mathrm{std}}$.
$$k_\alpha[\rho] = \frac{I_\alpha[\rho]}{I_\alpha[\rho_\alpha^{\mathrm{std}}]} \tag{4}$$

Both intensities can be computed from the same model; only the underlying mass concentrations, $\rho$ resp. $\rho_\alpha^{\mathrm{std}}$, change.

Neglecting multiplicative factors that cancel due to the normalization with standard intensities $I_\alpha^{\mathrm{std}}$, we formulate the intensity of characteristic x-rays $I_\alpha$ as the integral of the attenuation field $A_\alpha$, the number of atoms $N_\alpha$ and the generation field of characteristic x-rays $I_\alpha(x)$ over a domain $G \subset \mathbb{R}^3$ of the sample, which is chosen such that $G$ covers the support of $I_\alpha(x)$, i.e., $I_\alpha(x) = 0$ for all $x \in \mathbb{R}^3 \setminus G$.
$$I_\alpha[\rho] = \int_G A_\alpha[\rho](x)\; N_\alpha[\rho](x)\; I_\alpha[\rho](x)\,\mathrm{d}x \tag{5}$$

The attenuation field $A_\alpha$ describes the intensity reduction of generated x-rays while traveling through the sample toward the detector. Attenuation is quantified by the Beer-Lambert law, where the attenuation coefficient $\mu_\alpha$ is the weighted sum of mass concentrations $\rho_i$ and mass attenuation coefficients $\tau_{\alpha,i}$ (Footnote 2).
$$A_\alpha[\rho](x) = \exp\Big(-\int_{l(x,x_d)} \mu_\alpha(y)\,\mathrm{d}y\Big) = \exp\Big(-\int_{l(x,x_d)} \sum_{i=1}^{n_e} \tau_{\alpha,i}\,\rho_i(y)\,\mathrm{d}y\Big) \tag{6}$$

In Equation (6), the integration domain is the straight line $l(x,x_d) \cap G$ from a point $x \in G$ toward the detector position $x_d \notin G$, i.e., only the part of the line $l(x,x_d)$ that lies inside the domain $G$. Neglecting the part outside the domain $G$ assumes that x-rays travel unattenuated in the vacuum surrounding the material sample.
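As an illustration of how Equation (6) is evaluated in practice, the following sketch discretizes the line integral toward the detector with the trapezoidal rule (the quadrature also used in Section 4.1). The function and argument names are assumptions of this sketch: ρ is a user-supplied function returning the vector of mass concentrations at a point, τ holds the coefficients $\tau_{\alpha,i}$, and the clipping of the line to the domain $G$ is omitted for brevity.

```julia
# Sketch: Beer-Lambert attenuation A_α(x) along the line from x to the detector x_d,
# cf. Equation (6). `ρ(y)` returns the mass concentrations (one entry per element).
using LinearAlgebra

function attenuation(x, x_d, ρ, τ; nq = 100)
    μ(y) = dot(τ, ρ(y))                          # attenuation coefficient Σ_i τ_{α,i} ρ_i(y)
    ts   = range(0.0, 1.0; length = nq)          # parametrize the segment l(x, x_d)
    vals = [μ(x .+ t .* (x_d .- x)) for t in ts]
    h    = norm(x_d .- x) / (nq - 1)
    line_integral = h * (sum(vals) - 0.5 * (vals[1] + vals[end]))   # trapezoidal rule
    return exp(-line_integral)
end

# usage with made-up data: two elements with constant concentrations
A = attenuation([0.0, 0.0], [0.0, 1.0], y -> [5.0, 3.0], [0.02, 0.05])
```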

The number of atoms $N_\alpha$ (per unit volume) is given by the division of the mass concentration field $\rho_i$ by the atomic mass $A_\alpha$ of the element $Z_i \in \alpha$.
$$N_\alpha[\rho](x) = \frac{\rho_i(x)\big|_{Z_i \in \alpha}}{A_\alpha} \tag{7}$$

The third factor that influences the x-ray intensities $I_\alpha$ is the field of characteristic x-ray generation.
$$I_\alpha[\rho](x) = \int_E \sigma_\alpha(\epsilon) \int_{S^2} \underbrace{|v(\epsilon)|\, f(x,\epsilon,\Omega)}_{=\psi_\alpha(x,\epsilon,\Omega)}\,\mathrm{d}\Omega\,\mathrm{d}\epsilon \tag{8}$$

It contains the electron fluence $\psi_\alpha(x,\epsilon,\Omega) = |v(\epsilon)|\, f(x,\epsilon,\Omega)$ ($v(\epsilon)$: electron velocity, $f(x,\epsilon,\Omega)$: number density of electrons) of beam electrons at $x \in G$ with energy $\epsilon \in E = [\epsilon_{\mathrm{cut}}, \epsilon_{\max}]$ into direction $\Omega \in S^2$ inside the material probe, which is weighted by the x-ray production cross-section $\sigma_\alpha$ (Footnote 3). The energy interval $E$ is defined by the cutoff energy $\epsilon_{\mathrm{cut}}$, which is chosen so small that no more ionization can occur, and $\epsilon_{\max}$, which is chosen sufficiently large to capture all beam electrons.

Describing the electron fluence ψα forms the most difficult part of modeling the emission of characteristic x-rays for inhomogeneous samples. We detail our model for the electron fluence ψα in the following section.

2.3. Deterministic electron transport model

The dynamics of the beam electrons are described by a linear Boltzmann equation, an integro-differential equation describing the evolution of the electron number density in phase space (position-velocity space), whereby the physical processes governing the evolution of the electron number density are free-flight phases interrupted by elastic and inelastic collisions with atoms of the material probe. Since the timescale of the physical processes is very small in comparison to the duration of the experimental measurements of x-ray intensities, it is valid to neglect the time dependence and assume a steady-state distribution of beam electrons immediately after switching on the electron beam. Furthermore, as inelastic scattering cross-sections are highly peaked around small energy losses, i.e., electrons most probably lose a significant amount of energy in a sequence of small energy losses, it is common in EPMA to model the energy loss of projectile electrons as a continuous deceleration along the trajectory. These simplifications lead to modeling the beam electrons by the Boltzmann equation of electron transport in continuous slowing down (CSD, resp. BCSD) approximation – an evolution equation in energy space given by (Larsen et al. 1997)
$$-\frac{\partial}{\partial\epsilon}\big(S[\rho](x,\epsilon)\,\psi_\alpha(x,\epsilon,\Omega)\big) + \Omega\cdot\nabla_x\,\psi_\alpha(x,\epsilon,\Omega) = (Q[\rho]\,\psi_\alpha)(x,\epsilon,\Omega), \tag{9}$$
where the energy loss is governed by the stopping power $S$ and the scattering operator $Q$ accounts for elastic and inelastic collisions (CSD-average). Both the stopping power $S$ and the scattering operator $Q$ are derived from cross-sections for compounds, which are defined by means of the additivity approximation. We use the cross-sections from Salvat et al. (2007) that are also used in Bünger (2021) and Olbrant (2012). Assuming a well-defined atomic mass, we can write both as the sum of atomic quantities $S_i$ and $Q_i$ (Footnote 4) weighted by mass concentrations $\rho_i$.
$$S[\rho](x,\epsilon) = \sum_{i=1}^{n_e} \rho_i(x)\, S_i(\epsilon) \tag{10}$$
$$(Q[\rho]\,\psi_\alpha)(x,\epsilon,\Omega) = \sum_{i=1}^{n_e} \rho_i(x)\, (Q_i\,\psi_\alpha)(x,\epsilon,\Omega) \tag{11}$$

The BCSD model describes the evolution of the electron fluence $\psi_\alpha$ in energy space and consequently needs to be equipped with initial and boundary conditions. As the evolution is prescribed from high to low energies, an initial condition defines the electron fluence at the maximal energy $\epsilon_{\max}$:
$$\psi_\alpha(x, \epsilon = \epsilon_{\max}, \Omega) = \psi_{\alpha,0}(x,\Omega) = 0. \tag{12}$$

We will initialize the electron fluence at an energy $\epsilon_{\max}$ sufficiently larger than the mean beam energy $\mu_{\epsilon,\mathrm{beam}}$ by $\psi_{\alpha,0}(x,\Omega) = 0$ and model the electron beam by boundary conditions. The electron transport model can be equipped with Dirichlet-type boundary conditions by prescribing the incoming half of the electron fluence at the boundary
$$\psi_\alpha(x,\epsilon,\Omega) = \psi_{\alpha,\mathrm{in}}(x,\epsilon,\Omega) \qquad \forall\,\Omega \in S^2: n\cdot\Omega < 0, \tag{13}$$
where $n$ denotes the outward-pointing normal vector at $x \in \partial G$ and $\psi_{\alpha,\mathrm{in}}$ is a given distribution of incoming particles. We model electron beams by a product of Gaussian distributions in space and energy together with a von Mises-Fisher distribution in direction. An electron beam that is aligned with the outward-pointing normal $n$ of the polished material surface and focused onto the point $\mu_{x,\mathrm{beam}} \in \partial G$ is modeled by
$$\psi_{\alpha,\mathrm{in}}(x,\epsilon,\Omega) = N_{\alpha,x}(x)\, N_{\alpha,\epsilon}(\epsilon)\, F_{\alpha,\Omega}(\Omega), \tag{14}$$
where $N_{\alpha,x}(x)$ is the probability density function of a Gaussian with mean $\mu_{x,\mathrm{beam}} \in \alpha$ and covariance $\Sigma_{x,\mathrm{beam}} \in \alpha$. The distribution in energy $N_{\alpha,\epsilon}(\epsilon)$ (Gaussian with mean $\mu_{\epsilon,\mathrm{beam}} \in \alpha$ and variance $\Sigma_{\epsilon,\mathrm{beam}} \in \alpha$) and the distribution in direction $F_{\alpha,\Omega}(\Omega)$ (von Mises-Fisher with mean $\mu_{\Omega,\mathrm{beam}} \in \alpha$ and concentration $\kappa_{\Omega,\mathrm{beam}} \in \alpha$) are defined analogously.

2.4. Spherical harmonic approximation

A numerical solution of the electron transport model in Equation (9) is expensive due to the high dimension of the phase space of $\psi_\alpha(x,\epsilon,\Omega)$ and the complexity of the scattering operator $Q$. Therefore, we employ the method of moments to derive a modal approximation of the linear Boltzmann equation. We reduce the phase space by replacing the direction $\Omega$ as an independent variable by a small set of moments to model the electron fluence $\psi$. Here we consider the spherical harmonic (PN) moment approximation (Case and Zweifel 1967; Mark 1944; Mark 1945; Seibold and Frank 2014), i.e., we approximate the electron fluence $\psi$ by its series expansion in spherical harmonics $Y_l^k : S^2 \to \mathbb{R}$ truncated after some finite degree $N \in \mathbb{N}$,
$$\psi_\alpha(x,\epsilon,\Omega) \approx \psi_\alpha^{P_N}(x,\epsilon,\Omega) = \sum_{l=0}^{N}\sum_{k=-l}^{l} u_\alpha^{l,k}(x,\epsilon)\, Y_l^k(\Omega), \tag{15}$$
where the expansion coefficients are the spherical harmonic moments of the angular distribution
$$u_\alpha^{l,k}(x,\epsilon) = \int_{S^2} Y_l^k(\Omega)\, \psi_\alpha(x,\epsilon,\Omega)\,\mathrm{d}\Omega \qquad l \le N,\ |k| \le l, \tag{16}$$
and we model the fluence through the system of evolution equations of its moments $u_\alpha^{l,k}$, which we refer to as the PN equations. The PN equations are obtained by weakly enforcing Equation (9) for $\psi_\alpha^{P_N}$ on the test space spanned by spherical harmonics up to degree $N$. By gathering the coefficients $u_\alpha^{l,k}$ and ansatz functions $Y_l^k$ into vector functions $u_\alpha : G \times E \to \mathbb{R}^{(N+1)^2}$ and $\Upsilon_N : S^2 \to \mathbb{R}^{(N+1)^2}$ such that $\psi_\alpha^{P_N} = (\Upsilon_N)^T u_\alpha$, the equation for $u_\alpha$ can be written as $F_\alpha(u_\alpha, \rho) = 0$:
$$\underbrace{-\frac{\partial}{\partial\epsilon}\big(S[\rho](x,\epsilon)\, u_\alpha(x,\epsilon)\big) + \sum_{d=1}^{3} A^{(d)} \frac{\partial}{\partial x_d} u_\alpha(x,\epsilon) - Q[\rho](x,\epsilon)\, u_\alpha(x,\epsilon)}_{=F_\alpha(u_\alpha,\rho)} = 0 \qquad x \in G,\ \epsilon \in E \tag{17}$$

With the following definitions for $A^{(d)}$ and $Q[\rho]$.
$$A^{(d)} = \int_{S^2} \Omega_d\, \Upsilon_N\, (\Upsilon_N)^T\,\mathrm{d}\Omega \tag{18}$$
$$Q[\rho](x,\epsilon) = \int_{S^2} \Upsilon_N\, Q[\rho]\, (\Upsilon_N)^T\,\mathrm{d}\Omega = \sum_{i=1}^{n_e} \rho_i(x)\, Q_i(\epsilon) \tag{19}$$

Using the method of moments, we reduced the phase space by two dimensions at the cost of replacing a scalar equation with a system of (N+1)2 equations. Additionally, A(d) and Q exhibit structures that can be exploited to develop efficient numerical solvers. Q[ρ](x,ϵ) is diagonal as spherical harmonics are eigenfunctions of the collision operator Q[ρ] such that the computational cost of the scattering operation grows linearly in the number of moments.

The transport matrices $A^{(d)}$ inherit a sparsity pattern from the recursion relation of spherical harmonics. When ordering the spherical harmonics that are even (odd) in Cartesian direction $d$ into a vector function $\Upsilon_N^{[e,d]} : S^2 \to \mathbb{R}^{(N+1)(N+2)/2}$ (resp. $\Upsilon_N^{[o,d]} : S^2 \to \mathbb{R}^{N(N+1)/2}$) and choosing $\Upsilon_N = \big((\Upsilon_N^{[e,d]})^T, (\Upsilon_N^{[o,d]})^T\big)^T$, the matrix $A^{(d)}$ is of the following form.
$$A^{(d)} = \begin{pmatrix} 0 & \hat{A}^{(d)T} \\ \hat{A}^{(d)} & 0 \end{pmatrix} \tag{20}$$
$$\hat{A}^{(d)} = \int_{S^2} \Omega_d\, \Upsilon_N^{[o,d]}\, \big(\Upsilon_N^{[e,d]}\big)^T\,\mathrm{d}\Omega \tag{21}$$

Hence, the even (odd) moments $u_\alpha^{[e,d]}$ couple with the odd (even) moments $u_\alpha^{[o,d]}$ only through their spatial derivatives in direction $d$, which can be exploited by central finite difference schemes on staggered grids (Seibold and Frank 2014).

We employ the boundary conditions introduced in Bünger, Sarna, and Torrilhon (2022). The boundary conditions follow from taking half moments of the Boltzmann boundary condition Equation (13) with odd (in direction $d \in \{1,2,3\}$) spherical harmonics as weighting functions and a stabilization step that is similar to the truncation of the series expansion in Equation (15). For a given boundary point $x \in \partial G$ with outward-pointing boundary normal $e_d$ we formulate boundary conditions for the moment vector $u_\alpha = \big((u_\alpha^{[e,d]})^T, (u_\alpha^{[o,d]})^T\big)^T$. The boundary conditions are given by
$$u_\alpha^{[o,d]} = L^{(d)} \hat{A}^{(d)} u_\alpha^{[e,d]} + g_\alpha \tag{22}$$
$$L^{(d)} = 2\int_{e_d^T\Omega<0} \frac{1}{\Omega_d}\,\Upsilon_N^{[o,d]}\big(\Upsilon_N^{[o,d]}\big)^T\,\mathrm{d}\Omega \quad\text{and}\quad g_\alpha = \int_{e_d^T\Omega<0} \psi_{\alpha,\mathrm{in}}\,\Upsilon_N^{[o,d]}\,\mathrm{d}\Omega. \tag{23}$$

These boundary conditions are compatible with the characteristics of the PN equations (17) and allow us to ensure energy stability, see Bünger, Sarna, and Torrilhon (2022) for details. From Equation (12) we derive the following initial condition for the moments $u_\alpha$.
$$u_\alpha(x, \epsilon = \epsilon_{\max}) = \int_{S^2} \Upsilon_N(\Omega)\, \psi_{\alpha,0}(x,\Omega)\,\mathrm{d}\Omega = 0 \tag{24}$$

The PN equations (17) describe the evolution of all $(N+1)^2$ moments $u_\alpha$ of $\psi_\alpha$, whereas the intensity model Equation (8) only requires the angular average of $\psi_\alpha$, because we assumed isotropic emission of x-rays. The spherical harmonic $Y_0^0(\Omega)$ is constant, so we identify the moment $u_\alpha^{0,0} \propto \int_{S^2}\psi_\alpha(x,\epsilon,\Omega)\,\mathrm{d}\Omega$ as part of our x-ray emission model and simplify the integral Equation (8) to the following.
$$I_\alpha(x) = \int_E \sigma_\alpha(\epsilon)\int_{S^2}\psi_\alpha(x,\epsilon,\Omega)\,\mathrm{d}\Omega\,\mathrm{d}\epsilon \propto \int_E \sigma_\alpha(\epsilon)\, u_\alpha^{0,0}(x,\epsilon)\,\mathrm{d}\epsilon \tag{25}$$

2.5. Mass concentration model

Typically, the quantities of interest in EPMA are weight fractions $\omega_i(x) = \frac{M_i(\mathrm{d}x)}{M(\mathrm{d}x)}$ that form a partition of unity $\sum_{i=1}^{n_e}\omega_i(x) = 1$. In order to determine mass concentrations $\rho(x)$ from weight fractions $\omega(x)$, additional assumptions are necessary. We neglect the influence of different molecular or crystal structures and assume that the compound $\{\omega_1,\ldots,\omega_{n_e}\}$ is formed in a volume-preserving manner. Then the total volume is $V(\mathrm{d}x) = \sum_{i=1}^{n_e} V_i(\mathrm{d}x) = \sum_{i=1}^{n_e}\frac{M_i(\mathrm{d}x)}{\rho_i^{\mathrm{pure}}} = \sum_{i=1}^{n_e}\frac{\omega_i(x)\,M(\mathrm{d}x)}{\rho_i^{\mathrm{pure}}}$. We derive the following relation between weight fractions and mass concentrations.
$$\rho_i(x) = \omega_i(x)\,\underbrace{\Big(\sum_{j=1}^{n_e}\frac{\omega_j(x)}{\rho_j^{\mathrm{pure}}}\Big)^{-1}}_{=:\rho_{\mathrm{tot}}(x)} \tag{26}$$

As a first possibility, we model weight fractions $\omega_i^{\mathrm{pc}}(x)$ as piecewise-constant functions. We subdivide the spatial domain $G$ into $n_k \in \mathbb{N}$ disjoint subdomains $G_k$ and introduce the indicator function $\mathbb{1}_{G_k}(x)$. To guarantee the partition of unity we only use $n_p = n_k \times (n_e - 1)$ parameters $p_{i,k}^{\mathrm{pc}}$ and define the parametrization
$$\omega_i^{\mathrm{pc}}(x) = \begin{cases} \sum_{k=1}^{n_k} p_{i,k}^{\mathrm{pc}}\, \mathbb{1}_{G_k}(x) & i \in \{1, \ldots, n_e - 1\} \\ 1 - \sum_{k=1}^{n_k}\sum_{j=1}^{n_e-1} p_{j,k}^{\mathrm{pc}}\, \mathbb{1}_{G_k}(x) & i = n_e \end{cases} \tag{27}$$
with the additional constraints that $p_{i,k}^{\mathrm{pc}} \ge 0$ and $\sum_{j=1}^{n_e-1} p_{j,k}^{\mathrm{pc}} \le 1$. Together with Equation (26), Equation (27) forms the piecewise-constant material parametrization $P^{\mathrm{pc}} : P \to (G \to \mathbb{R}^{n_e})$. In the examples, we will replace the weight fraction parametrization $\omega^{\mathrm{pc}}(x)$ to investigate other parametrizations $P(p)$.
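For concreteness, the following is a minimal sketch (for a binary material on a 1D domain) of how the piecewise-constant weight fractions of Equation (27) and the mass concentration model of Equation (26) could be composed. The function names, the equal-width subdomains and the pure densities in the usage line are assumptions of this sketch, not part of our implementation.

```julia
# Sketch: parameters p → weight fractions ω^pc(x) → mass concentrations ρ(x),
# for n_e = 2 elements on a 1D domain split into equal-width subdomains G_k.
function weight_fractions_pc(x, p_pc, edges)
    k  = clamp(searchsortedlast(edges, x), 1, length(p_pc))   # subdomain G_k containing x
    ω1 = p_pc[k]                                               # Equation (27), i = 1
    return [ω1, 1.0 - ω1]                                      # partition of unity, i = n_e
end

function mass_concentrations(ω, ρ_pure)
    ρ_tot = inv(sum(ω ./ ρ_pure))                              # Equation (26)
    return ω .* ρ_tot
end

# usage with made-up values (pure densities roughly those of Cu and Ni in g/cm³)
edges = collect(range(0.0, 1.0; length = 21))[1:20]            # left edges of 20 subdomains
p_pc  = fill(0.5, 20)
ρ     = mass_concentrations(weight_fractions_pc(0.3, p_pc, edges), [8.96, 8.91])
```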

3. Differentiation of the forward model

As described in the introduction, we consider the inverse problem as a minimization problem and apply iterative gradient-based optimization methods to find a candidate solution $p^*$.
$$p^* = \underset{p \in P}{\arg\min}\; (J \circ k \circ P)(p) \tag{28}$$

Gradient-based optimization methods require the gradient of the objective function, $\frac{\mathrm{d}J}{\mathrm{d}p}$. An efficient implementation of the gradient of our model that simultaneously remains extensible and modular (particularly regarding the parametrization) is a challenge. While automatic algorithmic differentiation tools have the potential to transform any numerical code, the current versions of AD tools in julia (Zygote.jl, ReverseDiff.jl) cannot be applied to our model due to memory limitations and lack of performance.

In the following, we present a method that adopts from algorithmic differentiation (AD) the structural approach to implementing derivative code for numerical code. We focus on the adjoint mode of AD, since we are mainly interested in the gradient of the scalar objective function. The advantage of the adjoint mode is that its cost scales with the number of outputs, whereas with numerical differentiation (finite differences) or the tangent mode of AD, the complexity scales with the number of inputs. Also, the analytical derivation of our method provides a clear understanding that facilitates its memory-efficient implementation.

The core idea of AD is to base the derivative computation on local function transformations that are automatically composed. This also achieves modularity, i.e., if a particular function is modified, only the corresponding transformation needs to be adjusted.

3.1. Adjoint algorithmic differentiation

Algorithmic differentiation (AD) (Burghardt et al. 2022; Hüser 2022; Innes 2018; Kreyszig 1991; Naumann 2011) defines a framework to compute derivatives of numerical program code. In contrast to symbolic or numerical differentiation (finite differences), algorithmic differentiation is without approximation error and modular, so the code remains extensible. Also, the automatic differentiation of numerical code is possible. We review the basic definitions of AD with an (unusual) focus on high-/infinite-dimensional spaces. Alongside a motivating example, we describe the systematic composition into code that is able to compute derivatives.

Consider a differentiable function $f : U \subseteq X \to Y,\ x \mapsto f(x) = y$, where $X$ and $Y$ are Hilbert spaces and $U$ is the open domain of $f$. Given an input $x \in U$, we define the tangent operator $\frac{\partial f(x)}{\partial x}[\cdot] : X \to Y$ as the linear bounded operator that satisfies
$$\lim_{h \to 0}\frac{f(x + h\dot{x}) - f(x)}{h} = \frac{\partial f(x)}{\partial x}[\dot{x}] \qquad \forall\,\dot{x} \in X. \tag{29}$$

Additionally, given a tangent $\dot{x} \in X$ of the input, we define the tangent $\dot{y}$ of the output by
$$\dot{y} = \frac{\partial f(x)}{\partial x}[\dot{x}]. \tag{30}$$

Given the tangent operator $\frac{\partial f(x)}{\partial x}[\cdot]$, the corresponding adjoint operator $\frac{\partial f(x)}{\partial x}^*[\cdot] : Y \to X$ is defined by
$$\Big\langle\bar{y}, \frac{\partial f(x)}{\partial x}[\dot{x}]\Big\rangle_Y = \Big\langle\frac{\partial f(x)}{\partial x}^*[\bar{y}], \dot{x}\Big\rangle_X \qquad \forall\,\dot{x}\in X,\ \bar{y}\in Y, \tag{31}$$
where $\langle\cdot,\cdot\rangle_X$ and $\langle\cdot,\cdot\rangle_Y$ are the respective inner products of $X$ and $Y$. Similarly to the tangent, given an adjoint $\bar{y}\in Y$ of the output, we define the adjoint $\bar{x}$ of the input by
$$\bar{x} = \frac{\partial f(x)}{\partial x}^*[\bar{y}]. \tag{32}$$

Consequently, the identity $\langle\bar{y},\dot{y}\rangle_Y = \langle\bar{x},\dot{x}\rangle_X$ holds for all $\dot{x}\in X$ and $\bar{y}\in Y$.

In real, finite-dimensional spaces (e.g., $X = \mathbb{R}^n$, $Y = \mathbb{R}^m$), the tangent operator can be represented by the Jacobian $Df(x)\in\mathbb{R}^{m\times n}$ and the adjoint operator by its transpose $Df(x)^T\in\mathbb{R}^{n\times m}$. For examples in infinite-dimensional (function) spaces, we refer to Sec. 3.2.
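The finite-dimensional case can be checked directly. The following small Julia sketch verifies the inner product identity $\langle\bar{y},\dot{y}\rangle_Y = \langle\bar{x},\dot{x}\rangle_X$ for a toy function; ForwardDiff.jl is used here only to obtain the Jacobian for illustration and is not part of our implementation.

```julia
# Tangent = multiplication with the Jacobian, adjoint = multiplication with its transpose.
using LinearAlgebra, ForwardDiff

f(x) = [x[1]^2 + x[2], sin(x[1]) * x[3], x[2] * x[3]]

x    = [0.3, 1.2, -0.7]
Jf   = ForwardDiff.jacobian(f, x)      # Df(x) ∈ R^{3×3}

xdot = randn(3); ybar = randn(3)
ydot = Jf * xdot                       # tangent:  ẏ = ∂f/∂x [ẋ]
xbar = Jf' * ybar                      # adjoint:  x̄ = ∂f/∂x* [ȳ]

@assert isapprox(dot(ybar, ydot), dot(xbar, xdot))   # ⟨ȳ, ẏ⟩ = ⟨x̄, ẋ⟩
```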

The core idea of AD is to subdivide a function into individual subfunctions (with individual tangent and adjoint operators) and intermediate variables (with intermediate tangents and adjoints). As an example, consider the following decomposition of $f : U \to Y,\ x\mapsto f(x) = y$ into two subfunctions $g : U \to V$ and $h : V\times U \to Y$, where we use an intermediate variable $v\in V$ ($V$ is another Hilbert space).
$$v = g(x) \qquad y = h(v, x) \tag{33}$$

Given $x\in U$, $v\in V$ and $y\in Y$ as well as an input tangent $\dot{x}\in X$, the tangent $\dot{y}$ of the output is given by
$$\dot{y} = \frac{\partial f(x)}{\partial x}[\dot{x}] = \frac{\partial h(v,x)}{\partial v}\Big[\underbrace{\frac{\partial g(x)}{\partial x}[\dot{x}]}_{=\dot{v}}\Big] + \frac{\partial h(v,x)}{\partial x}[\dot{x}], \tag{34}$$
where we employ the chain rule to split the tangent operator of $f$ into a combination of tangent operators of $g$ and $h$. Additionally, we identify the tangent $\dot{v}$ of the intermediate variable.

To find the corresponding adjoint operator, we successively use adjoint operators of the subfunctions $g$ and $h$ to isolate $\dot{x}$ in the inner product identity
$$\langle\bar{y},\dot{y}\rangle_Y = \Big\langle\bar{y}, \frac{\partial h}{\partial v}[\dot{v}] + \frac{\partial h}{\partial x}[\dot{x}]\Big\rangle_Y = \Big\langle\frac{\partial h}{\partial v}^*[\bar{y}], \underbrace{\frac{\partial g}{\partial x}[\dot{x}]}_{=\dot{v}}\Big\rangle_V + \Big\langle\frac{\partial h}{\partial x}^*[\bar{y}], \dot{x}\Big\rangle_X = \Big\langle\frac{\partial g}{\partial x}^*\Big[\underbrace{\frac{\partial h}{\partial v}^*[\bar{y}]}_{=\bar{v}}\Big] + \frac{\partial h}{\partial x}^*[\bar{y}], \dot{x}\Big\rangle_X = \langle\bar{x},\dot{x}\rangle_X. \tag{35}$$

To summarize, given the adjoint $\bar{y}\in Y$, the adjoints $\bar{v}$ and $\bar{x}$ are
$$\bar{x} = \frac{\partial g(x)}{\partial x}^*\Big[\underbrace{\frac{\partial h(v,x)}{\partial v}^*[\bar{y}]}_{=\bar{v}}\Big] + \frac{\partial h(v,x)}{\partial x}^*[\bar{y}]. \tag{36}$$
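The same two-subfunction decomposition can be written out in a few lines of Julia. The toy functions g and h below are illustrative choices only; the point is that the tangent (34) and the adjoint (36) are obtained purely by composing the local operators, here represented by Jacobians obtained with ForwardDiff.jl.

```julia
# Composition of local tangent/adjoint operators for v = g(x), y = h(v, x).
using LinearAlgebra, ForwardDiff

g(x)    = [x[1] * x[2], x[2] + x[3]]
h(v, x) = [v[1]^2 + x[3], v[2] * x[1]]

x   = [1.0, 2.0, 3.0]
v   = g(x)
Dg  = ForwardDiff.jacobian(g, x)
Dhv = ForwardDiff.jacobian(w -> h(w, x), v)
Dhx = ForwardDiff.jacobian(z -> h(v, z), x)

# tangent mode, Equation (34)
xdot = randn(3)
vdot = Dg * xdot
ydot = Dhv * vdot + Dhx * xdot

# adjoint mode, Equation (36): reversed order of operations
ybar = randn(2)
vbar = Dhv' * ybar
xbar = Dg' * vbar + Dhx' * ybar

@assert isapprox(dot(ybar, ydot), dot(xbar, xdot))
```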

Based on the chain rule, the composition of individual tangent and adjoint operators of a larger decomposition into an algorithm to compute the derivative is very systematic and can be implemented automatically. We describe the methodology using an arbitrary decomposition of a function $f : x\mapsto f(x) = y$ with $n$ inputs and $m$ outputs into subfunctions $\varphi_i$, $i = 1,\ldots,I$.
$$\begin{aligned} (v_0, v_{-1}, \ldots, v_{-(n-1)})^T &= x \\ v_i &= \varphi_i\big((v_j)_{j\prec i}\big) \qquad i = 1,\ldots,I \\ y &= (v_I, v_{I-1}, \ldots, v_{I-m+1})^T \end{aligned} \tag{37}$$

With $(v_j)_{j\prec i}$ we denote all predecessors of $v_i$, i.e., all intermediate variables $v_j$ that $v_i$ directly depends on, and with $(v_j)_{j\succ i}$ we denote all successors of $v_i$. Tangent mode AD is the successive computation of tangents
$$\dot{v}_i = \sum_{k\prec i}\frac{\partial\varphi_i\big((v_j)_{j\prec i}\big)}{\partial v_k}[\dot{v}_k]. \tag{38}$$

Given an input tangent $\dot{x}$, a tangent $\dot{v}_i$ represents the directional derivative of $v_i$ into direction $\dot{x}$. Note that the tangent mode follows the sequence of operations in the evaluation of the primal. In adjoint mode AD, the sequence of operations is reversed. To compute an adjoint $\bar{v}_i$, we consider the adjoints $(\bar{v}_k)_{k\succ i}$ of all intermediate variables that depend on $v_i$.
$$\bar{v}_i = \sum_{k\succ i}\frac{\partial\varphi_k\big((v_j)_{j\prec k}\big)}{\partial v_i}^*[\bar{v}_k] \tag{39}$$

Usually, in the AD literature the single assignment code is considered, where all intermediate variables $v_i\in\mathbb{R}$ are scalar and the tangent operators (resp. multiplication with a scalar) are self-adjoint. Then the $\cdot^*$ can be omitted. For real multivariate intermediate variables, the adjoint is the transpose, $\cdot^* = \cdot^T$.

We briefly comment on some aspects of AD:

  • Computational Complexity: Implementations of tangent and adjoint mode AD differ in algorithmic complexity. The Jacobian of a function $f:\mathbb{R}^n\to\mathbb{R}^m$ is obtained using tangent mode AD by $j = 1,\ldots,n$ successive executions of Equation (38), where the input tangents are seeded with unit vectors $\dot{x} = e_j$. Elements of the Jacobian are then retrieved from $(Df(x))_{ij} = \dot{y}_i|_{\dot{x}=e_j}$. In contrast to the tangent mode, using adjoint mode AD the Jacobian is obtained by $i = 1,\ldots,m$ successive applications of Equation (39), where the output adjoints are seeded with unit vectors $\bar{y} = e_i$. Elements of the Jacobian are then retrieved from $(Df(x))_{ij} = \bar{x}_j|_{\bar{y}=e_i}$. Hence, to compute the full Jacobian, an implementation of tangent mode scales in the number of input variables while the adjoint mode scales in the number of output variables (a small seeding sketch follows this list).

  • Modularity and Extensibility: An implementation of either tangent or adjoint mode AD is modular and extensible. It consists of a systematic composition of individual tangent and adjoint operators. For example, changing one subfunction in Equation (37) only affects its corresponding tangent and adjoint operator in Equation (38) resp. Equation (39). Implementations of the other operators remain unchanged.

  • Computational Graph: Both modes of AD can be visualized using a computational graph consisting of vertices for each variable (input, output, intermediate) and directed edges for direct dependence of variables. Assume a labeling of the edges with their corresponding tangent/adjoint operator applied to the tail (tangent) resp. head (adjoint) of the edge. The tangent is then the sum of all incoming edge tangent operators, while the adjoint is the sum of all outgoing edge adjoint operators (cf. Equations (38) and (39)).

  • Checkpointing – Adjoint Memory Consumption: While the tangents can be computed alongside the execution of the primal function, when the intermediate primal variables (x, v) are still available, these variables must be kept in memory for the computation of adjoints. Storing all intermediate variables is expensive. Checkpointing remedies the high memory consumption by recomputing certain intermediate variables while computing the adjoint operators. In the example Equation (36), the primal $v = g(x)$ would be recomputed while evaluating the adjoint of $h$.
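As referenced in the complexity bullet above, the following sketch makes the seeding strategy explicit: the Jacobian is assembled column-wise in tangent mode and row-wise in adjoint mode. For brevity both sweeps are emulated here with an explicitly computed Jacobian; in an actual AD tool each sweep would be one tangent or adjoint evaluation of the code.

```julia
# Seeding unit vectors to assemble the Jacobian of f: R^n → R^m.
using LinearAlgebra, ForwardDiff

f(x) = [x[1] * x[2], x[2]^2 + x[3]]
x    = [1.0, 2.0, 3.0]
Jf   = ForwardDiff.jacobian(f, x)
m, n = size(Jf)

basis(i, len) = [j == i ? 1.0 : 0.0 for j in 1:len]   # unit vector e_i

# tangent mode: n sweeps, one per input variable (columns of the Jacobian)
J_tangent = hcat([Jf * basis(j, n) for j in 1:n]...)

# adjoint mode: m sweeps, one per output variable (rows of the Jacobian)
J_adjoint = permutedims(hcat([Jf' * basis(i, m) for i in 1:m]...))

@assert J_tangent ≈ Jf && J_adjoint ≈ Jf
```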

Our model includes the special case that the mapping from $\rho(x)$ to $u_\alpha(x,\epsilon)$ is given implicitly by the PDE in Equation (17). We state the two basic steps of calculating adjoints in implicit relations and refer to Appendix A for a more detailed derivation. Consider the implicit definition $F(y = g(x), x) = 0$ (with differentiable $F(y,x)$), where we assume that it uniquely defines the differentiable mapping $y = g(x)$. We introduce an additional intermediate adjoint variable $\bar{F}$. Then, given the adjoint $\bar{y}$ of the output, the adjoint $\bar{x}$ of the input can be computed from
$$\frac{\partial F}{\partial y}^*[\bar{F}] = \bar{y} \quad\text{and}\quad \bar{x} = \frac{\partial F}{\partial x}^*[\bar{F}] \tag{40}$$
without explicit definition of $\bar{x} = \frac{\partial g(x)}{\partial x}^*[\bar{y}]$.
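In finite dimensions, the two steps of Equation (40) reduce to one linear solve with the transposed Jacobian $\partial F/\partial y$ and one multiplication with the transposed Jacobian $\partial F/\partial x$. The sketch below illustrates this for a toy residual; note that the minus sign of the implicit function theorem is written out explicitly here, whereas in the text the sign convention is absorbed into the definition of $\bar{F}$ (see Appendix A). The toy residual, the Newton iteration and the use of ForwardDiff.jl are assumptions of this sketch.

```julia
# Implicit adjoint step: y = g(x) defined by F(y, x) = 0.
using LinearAlgebra, ForwardDiff

F(y, x) = [y[1]^3 + y[2] - x[1], y[1] + 2y[2] - x[2]]    # toy residual

function solve_primal(F, x, y0; iters = 50)
    y = copy(y0)
    for _ in 1:iters                                      # Newton iteration for F(y, x) = 0
        y -= ForwardDiff.jacobian(w -> F(w, x), y) \ F(y, x)
    end
    return y
end

x  = [2.0, 3.0]
y  = solve_primal(F, x, [0.5, 0.5])

Fy = ForwardDiff.jacobian(w -> F(w, x), y)                # ∂F/∂y
Fx = ForwardDiff.jacobian(z -> F(y, z), x)                # ∂F/∂x

ybar = [1.0, 0.0]
Fbar = Fy' \ ybar                                         # solve (∂F/∂y)* F̄ = ȳ
xbar = -Fx' * Fbar                                        # propagate with (∂F/∂x)* (explicit sign)

@assert xbar ≈ (-(Fy \ Fx))' * ybar                       # compare with dy/dx = -(∂F/∂y)⁻¹ ∂F/∂x
```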

3.2. Differentiation method for our model

In the following, we will successively define adjoint operators for the individual parts of the model described in Sec. 2. All derivations follow the same scheme: First, we define the tangent operator using the directional derivative and second we derive the adjoint operator from the inner product identity. We will elaborate on complex derivations and only state the adjoint operator for simple ones. In this paper, the only relevant derivatives are with respect to the mass concentrations ρ, hence all quantities that do not depend on ρ are treated as constants and their adjoints are omitted.

Compared to the forward computation, adjoints are computed in reversed order. We follow this order while deriving the adjoint operators for the individual dependencies. For brevity of notation we omit the accumulation of adjoints (tangents) and respectively only consider the operators that describe the direct propagation. The accumulation becomes apparent in the high-level computational graph in Figure 2. If multiple edges leave a node, the adjoints of all outgoing edges have to be summed up.

  • $\big[\bar{k}_\alpha = \frac{\partial J}{\partial k_\alpha}^*\bar{J}\big]$: The final operator Equation (1) of our model computes the squared error $J$ from a set of simulated k-ratios $k_\alpha$, $\alpha\in\{\alpha_1,\ldots\}$. Given the adjoint $\bar{J}$, which for the computation of derivatives $\frac{\partial J}{\partial\cdot}$ must be chosen as $\bar{J} = 1$, the definition of the adjoint operator of the squared error from Equation (1) is trivial.
$$\bar{k}_\alpha = 2\,(k_\alpha - k_\alpha^{\exp})\,\bar{J} \tag{41}$$

  • $\big[\bar{I}_\alpha = \frac{\partial k_\alpha}{\partial I_\alpha}^*\bar{k}_\alpha\big]$: Also the adjoint operator for the normalization with standard intensities in Equation (4) is derived by simple means.
$$\bar{I}_\alpha = \frac{\bar{k}_\alpha}{I_\alpha^{\mathrm{std}}} \tag{42}$$

  • $\big[\{\bar{A}_\alpha,\bar{N}_\alpha,\bar{I}_\alpha\} = \frac{\partial I_\alpha}{\partial\{A_\alpha,N_\alpha,I_\alpha\}}^*\bar{I}_\alpha\big]$: We elaborate on the adjoint operators derived from Equation (5). Linearity of the integral in Equation (5) directly yields the tangent $\dot{I}_\alpha$. Using the standard definitions of the inner products in $\mathbb{R}$, $\langle a,b\rangle_{\mathbb{R}} = ab$, and in $L^2(G)$, $\langle a,b\rangle_{L^2(G)} = \int_G a^T(x)\,b(x)\,\mathrm{d}x$, we derive the adjoints $\bar{A}_\alpha$, $\bar{N}_\alpha$ and $\bar{I}_\alpha$. We detail $\bar{A}_\alpha$.
$$\langle\bar{I}_\alpha,\dot{I}_\alpha\rangle_{\mathbb{R}} = \Big\langle\bar{I}_\alpha, \int_G N_\alpha(x)\,I_\alpha(x)\,\dot{A}_\alpha(x)\,\mathrm{d}x\Big\rangle_{\mathbb{R}} = \int_G N_\alpha(x)\,I_\alpha(x)\,\bar{I}_\alpha\,\dot{A}_\alpha(x)\,\mathrm{d}x = \big\langle\underbrace{N_\alpha(\cdot)\,I_\alpha(\cdot)\,\bar{I}_\alpha}_{:=\bar{A}_\alpha(\cdot)}, \dot{A}_\alpha(\cdot)\big\rangle_{L^2(G)} \tag{43}$$

    Accordingly, the adjoints $\bar{N}_\alpha(\cdot)$ and $\bar{I}_\alpha(\cdot)$ are derived.
$$\bar{N}_\alpha(x) = A_\alpha(x)\,I_\alpha(x)\,\bar{I}_\alpha \quad\text{and}\quad \bar{I}_\alpha(x) = N_\alpha(x)\,A_\alpha(x)\,\bar{I}_\alpha \tag{44}$$

  • $\big[\bar{\rho} = \frac{\partial N_\alpha}{\partial\rho}^*\bar{N}_\alpha\big]$: From Equation (7), we find the adjoint operator.
$$\bar{\rho}_i(x)\big|_{Z_i\in\alpha} = \frac{\bar{N}_\alpha(x)}{A_\alpha} \tag{45}$$

  • $\big[\bar{\rho} = \frac{\partial A_\alpha}{\partial\rho}^*\bar{A}_\alpha\big]$: Deriving the adjoint from the attenuation operator given in Equation (6) requires more analysis. The tangent of the absorption operator $\dot{A}_\alpha[\rho](x)$ is given by the directional derivative into direction $\dot{\rho}(x)$.
$$\dot{A}_\alpha[\rho](x) = -A_\alpha[\rho](x)\int_{l(x,x_d)}\sum_{i=1}^{n_e}\tau_{\alpha,i}\,\dot{\rho}_i(y)\,\mathrm{d}y \tag{46}$$

    By inserting the tangent $\dot{A}_\alpha$ into the definition of the adjoint Equation (31), we derive the adjoint of the attenuation operator. In the following derivation, we rewrite the line integral from Equation (46) as an integral over the whole domain $G$ by introducing $\delta(d(x,x_d,y))$ to denote the Dirac delta that indicates the line segment $l(x,x_d)$. Here $d(x,x_d,y)$ denotes the distance of $y$ to the line segment $l(x,x_d)$ between $x$ and $x_d$.
$$\begin{aligned}
\langle\bar{A}_\alpha,\dot{A}_\alpha\rangle_{L^2(G)} &= \Big\langle\bar{A}_\alpha(\cdot),\,-A_\alpha[\rho](\cdot)\int_{l(\cdot,x_d)}\sum_{i=1}^{n_e}\tau_{\alpha,i}\,\dot{\rho}_i(y)\,\mathrm{d}y\Big\rangle_{L^2(G)} = -\int_G\bar{A}_\alpha(x)\,A_\alpha[\rho](x)\int_{l(x,x_d)}\sum_{i=1}^{n_e}\tau_{\alpha,i}\,\dot{\rho}_i(y)\,\mathrm{d}y\,\mathrm{d}x \\
&= -\int_G\int_G\sum_{i=1}^{n_e}\bar{A}_\alpha(x)\,A_\alpha[\rho](x)\,\tau_{\alpha,i}\,\dot{\rho}_i(y)\,\delta(d(x,x_d,y))\,\mathrm{d}y\,\mathrm{d}x = -\int_G\sum_{i=1}^{n_e}\int_G\bar{A}_\alpha(x)\,A_\alpha[\rho](x)\,\tau_{\alpha,i}\,\delta(d(x,x_d,y))\,\mathrm{d}x\;\dot{\rho}_i(y)\,\mathrm{d}y \\
&= -\int_G\sum_{i=1}^{n_e}\int_{l(y,x_d^*)}\bar{A}_\alpha(x)\,A_\alpha[\rho](x)\,\tau_{\alpha,i}\,\mathrm{d}x\;\dot{\rho}_i(y)\,\mathrm{d}y = \Big\langle-\int_{l(\cdot,x_d^*)}\bar{A}_\alpha(x)\,A_\alpha[\rho](x)\,(\tau_{\alpha,1},\ldots,\tau_{\alpha,n_e})^T\,\mathrm{d}x,\;\dot{\rho}(\cdot)\Big\rangle_{L^2(G)} = \langle\bar{\rho},\dot{\rho}\rangle_{L^2(G)}
\end{aligned} \tag{47}$$

    By swapping the order of integration $\mathrm{d}y\,\mathrm{d}x$ to $\mathrm{d}x\,\mathrm{d}y$, we are able to separate $\dot{\rho}_i$, but the interpretation of $\delta(d(x,x_d,y))$ changes. For a given $y$, $\delta(d(x,x_d,y))$ now indicates the points $x$ of all lines $l(x,x_d)$ that contain $y$. In other words: $\delta(d(x,x_d,y))$ indicates the line $l(y,x_d^*)$ from $y$ to the point reflection $x_d^*$ of $x_d$ at $y$. Exemplarily, we illustrate $l(x,x_d)$, $l(x,x_d^*)$ and $d(x,x_d,y)$ in Figure 3. From Equation (47), we identify the adjoint operator of Equation (6),
$$\bar{\rho}_i(x) = -\tau_{\alpha,i}\int_{l(x,x_d^*)}\bar{A}_\alpha(y)\,A_\alpha[\rho](y)\,\mathrm{d}y, \tag{48}$$
where the integration domain $l(x,x_d^*)$ is the line between $x$ and the reflection $x_d^*$ of $x_d$ at $x$.

  • $\big[\bar{u}_\alpha = \frac{\partial I_\alpha}{\partial u_\alpha}^*\bar{I}_\alpha\big]$: In Equation (25), the only relevant moment for the ionization field $I_\alpha$ is $u_\alpha^{0,0}$, hence the adjoint $\bar{u}_\alpha$.
$$\bar{u}_\alpha^{l,k}(x,\epsilon) = \begin{cases}\sigma_\alpha(\epsilon)\,\bar{I}_\alpha(x) & l,k = 0 \\ 0 & \text{else}\end{cases} \tag{49}$$

  • $\big[\bar{\rho} = \frac{\partial u_\alpha}{\partial\rho}^*\bar{u}_\alpha\big]$: The relation between $\rho$ and $u_\alpha$ is implicitly given by the PN equations and the boundary conditions, i.e., by the implicit PDE operator $F_\alpha(u_\alpha,\rho) = 0$, Equation (17). We describe the derivation of the adjoint operators Equation (40) for the PN equations. For a detailed derivation of Equation (40) we refer to Appendix A. Let us introduce the auxiliary adjoint $\bar{F}_\alpha$ and derive the operators $\frac{\partial F_\alpha}{\partial u_\alpha}^*[\cdot]$ and $\frac{\partial F_\alpha}{\partial\rho}^*[\cdot]$ in the following. Successively, both operators form the adjoint $\frac{\partial u_\alpha}{\partial\rho}^*[\cdot]$. In particular, we elaborate on the identities
$$\Big\langle\bar{F}_\alpha, \frac{\partial F}{\partial u_\alpha}\dot{u}_\alpha\Big\rangle_{L^2(G\times E)} = \Big\langle\underbrace{\frac{\partial F}{\partial u_\alpha}^*\bar{F}_\alpha}_{=\bar{u}_\alpha}, \dot{u}_\alpha\Big\rangle_{L^2(G\times E)} \quad\text{and}\quad \Big\langle\bar{F}_\alpha, \frac{\partial F}{\partial\rho}\dot{\rho}\Big\rangle_{L^2(G\times E)} = \Big\langle\underbrace{\frac{\partial F}{\partial\rho}^*\bar{F}_\alpha}_{=\bar{\rho}}, \dot{\rho}\Big\rangle_{L^2(G)}. \tag{50}$$

  • $\big[\frac{\partial F_\alpha}{\partial u_\alpha}^*\bar{F}_\alpha = \bar{u}_\alpha\big]$: The adjoint operator $\frac{\partial F_\alpha}{\partial u_\alpha}^*$ applied to $\bar{F}_\alpha$ describes the continuous adjoint equation
$$S[\rho](x,\epsilon)\,\frac{\partial}{\partial\epsilon}\bar{F}_\alpha(x,\epsilon) - \sum_{d=1}^{3} A^{(d)}\frac{\partial}{\partial x_d}\bar{F}_\alpha(x,\epsilon) - Q[\rho](x,\epsilon)\,\bar{F}_\alpha(x,\epsilon) = \frac{\partial F_\alpha}{\partial u_\alpha}^*\bar{F}_\alpha = \bar{u}_\alpha(x,\epsilon). \tag{51}$$
In analogy to the reversed application of adjoint operators, the adjoint equation describes the evolution of the adjoint $\bar{F}_\alpha$ in the reversed direction (w.r.t. the energy $\epsilon$). Initial and boundary conditions follow from the derivation.
$$\bar{F}_\alpha(x,\epsilon_{\min}) = 0 \tag{52}$$
$$\bar{F}_\alpha^{[o,d]} = -L^{(d)}\hat{A}^{(d)}\bar{F}_\alpha^{[e,d]} \tag{53}$$

    Derivation: Linearity of $F_\alpha$ in $u_\alpha$ directly yields the tangent $\frac{\partial F_\alpha}{\partial u_\alpha}\dot{u}_\alpha$. Using integration by parts and the symmetry of $A^{(d)}$ and $Q$, we obtain
$$\begin{aligned}
\Big\langle\bar{F}_\alpha, \frac{\partial F_\alpha}{\partial u_\alpha}\dot{u}_\alpha\Big\rangle &= \int_E\int_G\big(-\partial_\epsilon(S\dot{u}_\alpha)\big)^T\bar{F}_\alpha + \sum_{d=1}^3\big(A^{(d)}\partial_{x_d}\dot{u}_\alpha\big)^T\bar{F}_\alpha - (Q\dot{u}_\alpha)^T\bar{F}_\alpha\;\mathrm{d}x\,\mathrm{d}\epsilon \\
&= \int_E\int_G -\partial_\epsilon\big(S\,\dot{u}_\alpha^T\bar{F}_\alpha\big) + \dot{u}_\alpha^T\big(S\,\partial_\epsilon\bar{F}_\alpha\big) + \sum_{d=1}^3\Big[\partial_{x_d}\big(\dot{u}_\alpha^T A^{(d)}\bar{F}_\alpha\big) - \dot{u}_\alpha^T\big(A^{(d)}\partial_{x_d}\bar{F}_\alpha\big)\Big] - \dot{u}_\alpha^T(Q\bar{F}_\alpha)\;\mathrm{d}x\,\mathrm{d}\epsilon \\
&= \int_E\int_G\dot{u}_\alpha^T\big(S\,\partial_\epsilon\bar{F}_\alpha\big) - \sum_{d=1}^3\dot{u}_\alpha^T\big(A^{(d)}\partial_{x_d}\bar{F}_\alpha\big) - \dot{u}_\alpha^T(Q\bar{F}_\alpha)\;\mathrm{d}x\,\mathrm{d}\epsilon + \underbrace{\int_E\int_G -\partial_\epsilon\big(S\,\dot{u}_\alpha^T\bar{F}_\alpha\big) + \sum_{d=1}^3\partial_{x_d}\big(\dot{u}_\alpha^T A^{(d)}\bar{F}_\alpha\big)\;\mathrm{d}x\,\mathrm{d}\epsilon}_{=0} \\
&= \Big\langle\frac{\partial F_\alpha}{\partial u_\alpha}^*\bar{F}_\alpha, \dot{u}_\alpha\Big\rangle = \langle\bar{u}_\alpha,\dot{u}_\alpha\rangle
\end{aligned} \tag{54}$$

    The last two terms in Equation (54) vanish due to the initial and boundary conditions imposed on $\dot{u}_\alpha$ and $\bar{F}_\alpha$. The former term is zero because of the conditions $\dot{u}_\alpha(\cdot,\epsilon_{\max}) = 0$ and $\bar{F}_\alpha(\cdot,\epsilon_{\min}) = 0$.
$$\int_E\int_G -\partial_\epsilon\big(S\,\dot{u}_\alpha^T\bar{F}_\alpha\big)\,\mathrm{d}x\,\mathrm{d}\epsilon = -\int_G\underbrace{\big(S\,\dot{u}_\alpha^T\bar{F}_\alpha\big)\big|_{\epsilon=\epsilon_{\max}}}_{=0\ (\dot{u}_\alpha)} - \underbrace{\big(S\,\dot{u}_\alpha^T\bar{F}_\alpha\big)\big|_{\epsilon=\epsilon_{\min}}}_{=0\ (\bar{F}_\alpha)}\,\mathrm{d}x = 0 \tag{55}$$

    The latter term in Equation (54) vanishes due to the choice of the boundary conditions for $\dot{u}_\alpha$ and $\bar{F}_\alpha$. With the divergence theorem ($n = (n_1,n_2,n_3)^T$ the boundary normal)
$$\int_G\sum_{d=1}^3\partial_{x_d}\big(\dot{u}_\alpha^T A^{(d)}\bar{F}_\alpha\big)\,\mathrm{d}x = \int_{\partial G}\sum_{d=1}^3\big(\dot{u}_\alpha^T A^{(d)}\bar{F}_\alpha\big)\,n_d\,\mathrm{d}S, \tag{56}$$
and the boundary conditions $\dot{u}_\alpha^{[o,d]} = L^{(d)}\hat{A}^{(d)}\dot{u}_\alpha^{[e,d]}$ (since $\dot{g}_\alpha = 0$) and $\bar{F}_\alpha^{[o,d]} = -L^{(d)}\hat{A}^{(d)}\bar{F}_\alpha^{[e,d]}$, the integrand in Equation (56) is zero because
$$\begin{aligned}
\dot{u}_\alpha^T A^{(d)}\bar{F}_\alpha &= \begin{pmatrix}\dot{u}_\alpha^{[e,d]}\\ \dot{u}_\alpha^{[o,d]}\end{pmatrix}^T\begin{pmatrix}0 & \hat{A}^{(d)T}\\ \hat{A}^{(d)} & 0\end{pmatrix}\begin{pmatrix}\bar{F}_\alpha^{[e,d]}\\ \bar{F}_\alpha^{[o,d]}\end{pmatrix} = \big(\dot{u}_\alpha^{[e,d]}\big)^T\hat{A}^{(d)T}\bar{F}_\alpha^{[o,d]} + \big(\dot{u}_\alpha^{[o,d]}\big)^T\hat{A}^{(d)}\bar{F}_\alpha^{[e,d]} \\
&= \big(\dot{u}_\alpha^{[e,d]}\big)^T\hat{A}^{(d)T}\big(-L^{(d)}\hat{A}^{(d)}\bar{F}_\alpha^{[e,d]}\big) + \big(L^{(d)}\hat{A}^{(d)}\dot{u}_\alpha^{[e,d]}\big)^T\hat{A}^{(d)}\bar{F}_\alpha^{[e,d]} = \big(\dot{u}_\alpha^{[e,d]}\big)^T\underbrace{\big(-\hat{A}^{(d)T}L^{(d)}\hat{A}^{(d)} + \hat{A}^{(d)T}L^{(d)T}\hat{A}^{(d)}\big)}_{=0}\bar{F}_\alpha^{[e,d]} = 0.
\end{aligned} \tag{57}$$

    Here we use that $L^{(d)} = L^{(d)T}$, defined in Equation (23), is symmetric.

  • $\big[\bar{\rho} = \frac{\partial F_\alpha}{\partial\rho}^*\bar{F}_\alpha\big]$: The tangent $\frac{\partial F_\alpha}{\partial\rho}\dot{\rho}$ follows directly from the linearity of $F_\alpha$ (and $S$ and $Q$) in $\rho$. For the adjoint we find
$$\begin{aligned}
\Big\langle\bar{F}_\alpha, \frac{\partial F_\alpha}{\partial\rho}\dot{\rho}\Big\rangle &= \Big\langle\bar{F}_\alpha,\;-\partial_\epsilon\Big(\sum_{i=1}^{n_e} S_i\,\dot{\rho}_i\,u_\alpha\Big) - \sum_{i=1}^{n_e} Q_i\,\dot{\rho}_i\,u_\alpha\Big\rangle_{L^2(G\times E)} = -\int_G\sum_{i=1}^{n_e}\dot{\rho}_i\int_E\sum_{l\le N,|k|\le l}\bar{F}_\alpha^{l,k}\Big(\partial_\epsilon\big(S_i\,u_\alpha^{l,k}\big) + Q_i\,u_\alpha^{l,k}\Big)\,\mathrm{d}\epsilon\,\mathrm{d}x \\
&= \Big\langle-\int_E\sum_{l\le N,|k|\le l}\bar{F}_\alpha^{l,k}\Big(\partial_\epsilon\big((S_1,\ldots,S_{n_e})^T u_\alpha^{l,k}\big) + (Q_1,\ldots,Q_{n_e})^T u_\alpha^{l,k}\Big)\,\mathrm{d}\epsilon,\;\dot{\rho}\Big\rangle_{L^2(G)} = \langle\bar{\rho},\dot{\rho}\rangle_{L^2(G)},
\end{aligned} \tag{58}$$
and identify
$$\bar{\rho}_i(x) = -\int_E\bar{F}_\alpha^T(x,\epsilon)\Big(\partial_\epsilon\big(S_i(\epsilon)\,u_\alpha(x,\epsilon)\big) + Q_i(\epsilon)\,u_\alpha(x,\epsilon)\Big)\,\mathrm{d}\epsilon. \tag{59}$$

  • $\big[\bar{\omega}^{\mathrm{pc}} = \frac{\partial\rho}{\partial\omega}^*\bar{\rho}\big]$: The adjoint of the mass concentration model Equation (26) can again be derived by simple means.
$$\bar{\omega}_i^{\mathrm{pc}}(x) = \rho_{\mathrm{tot}}(x)\,\bar{\rho}_i(x) - \frac{\rho_{\mathrm{tot}}^2(x)}{\rho_i^{\mathrm{pure}}}\sum_{k=1}^{n_e}\omega_k(x)\,\bar{\rho}_k(x) \tag{60}$$

  • $\big[\bar{p} = \frac{\partial\omega^{\mathrm{pc}}}{\partial p}^*\bar{\omega}^{\mathrm{pc}}\big]$: The indicator functions in Equation (27) lead to a partitioning of the integration domain $G$ into $G_k$ for the adjoints $\bar{p}_{i,k}$.
$$\bar{p}_{i,k} = \int_{G_k}\bar{\omega}_i^{\mathrm{pc}}(x) - \bar{\omega}_{n_e}^{\mathrm{pc}}(x)\,\mathrm{d}x \tag{61}$$

Figure 2. The abstract computational graph of our model. Straight edges (bottom to top) correspond to the primal function evaluation. Bent edges (top to bottom) correspond to the adjoint evaluation and are labeled with the respective adjoint operators. Dashed boxes highlight the individual model evaluation for every experimental setup $\alpha \in A$. Note that multiple adjoint edges entering a node correspond to the accumulation of the respective adjoint operators. This is not explicitly addressed in the derivation in Section 3.2.

Figure 3. The absorption path l(x,xd)={y|d(x,xd,y)=0} from x toward the detector position xd (solid line) and the path used for the adjoint l(x,xd*)={y|d(y,xd,x)=0} (dotted line). G is the computational domain.

This completes the derivation of adjoint formulations for all parts of the model presented in Sec. 2. Note that for nonlinear operators, the primal result is required to calculate the adjoint, and therefore must be saved during the forward execution or recalculated. This is analogous to discrete adjoint tools. Furthermore, all derived operators are modular and only need to be assembled according to the computational graph. When replacing parts of the model, only the corresponding adjoint operator needs to be adjusted. In the examples in Sections 4.3 and 4.4, we will use this modularity and exchange the parametrizations P.
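A simple way to validate a discretized adjoint operator against its tangent counterpart is a dot-product test, sketched below. The callables apply_tangent and apply_adjoint are hypothetical placeholders for implementations of $\frac{\partial f(x)}{\partial x}[\cdot]$ and $\frac{\partial f(x)}{\partial x}^*[\cdot]$; for discretizations of $L^2$ inner products, the plain dot products would have to be replaced by the appropriate weighted (quadrature) inner products.

```julia
# Dot-product (adjoint consistency) test: ⟨ȳ, f'(x)[ẋ]⟩ ≈ ⟨f'(x)*[ȳ], ẋ⟩.
using LinearAlgebra

function dot_product_test(apply_tangent, apply_adjoint, x; rtol = 1e-8)
    xdot = randn(length(x))              # random input tangent
    ydot = apply_tangent(x, xdot)
    ybar = randn(length(ydot))           # random output adjoint
    xbar = apply_adjoint(x, ybar)
    return isapprox(dot(ybar, ydot), dot(xbar, xdot); rtol = rtol)
end

# usage with a linear toy operator (tangent = A, adjoint = Aᵀ)
A = randn(4, 3)
@assert dot_product_test((x, xdot) -> A * xdot, (x, ybar) -> A' * ybar, zeros(3))
```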

3.3. Computational effort

The motivation for deriving and implementing the adjoint formulation of our model is the efficiency advantage over finite differences and the tangent mode of AD. Our implementation inherits the runtime complexity of the adjoint mode, as the runtime of a Jacobian calculation scales in the number of output variables of the primal. The price to pay for faster execution is the higher memory requirement, since all primal variables have to be stored during forward execution or recalculated later.

Leveraging the convenience of automated AD tools to derive the discrete adjoint code of our implementation was not possible with current tools in julia. However, our approach somewhat mimics the development approach that tools such as Zygote.jl or ChainRulesCore.jl take. Instead of implementing only tangent and adjoint versions of low-level operations, they aim for efficient implementations of high-level methods. We adopt this approach, derive an adjoint version in the continuous framework, and implement discretized versions of the operators in code.

The high-level view offers further insight, and one more possibility to reduce the computational effort for the gradient. Note that the naive adjoint implementation requires $|A|$ "solves" of the adjoint PDE (the same number as the forward computation). We separate the multi-index into $\alpha = (\alpha_L, \alpha_B) \in A_L \times A_B$, where $\alpha_L$ describes the k-ratio line (material, transition) and $\alpha_B$ the beam setup (position, energy). We revisit Equations (44)–(49) and (51) and realize that the only dependence of these equations on $\alpha_B$ is through the scalar $\bar{I}_\alpha$. Instead of computing $\bar{I}_\alpha$, $\bar{u}_\alpha$ and $\bar{F}_\alpha$, we consider $\tilde{I}_{\alpha_L} = \bar{I}_\alpha(\cdot)/\bar{I}_\alpha$, $\tilde{u}_{\alpha_L} = \bar{u}_\alpha/\bar{I}_\alpha$, $\tilde{F}_{\alpha_L} = \bar{F}_\alpha/\bar{I}_\alpha$ and thereby eliminate the dependence on the beam setup $\alpha_B$. The operators for $\tilde{I}_{\alpha_L}$, $\tilde{u}_{\alpha_L}$ and $\tilde{F}_{\alpha_L}$ do not depend on $\alpha_B$. The accumulation of the adjoint $\bar{\rho}_i$ (cf. Equation (59), where we sum over all incoming edges in the computational graph, Figure 2) can then be rewritten as
$$\bar{\rho}_i(x) = \sum_{\alpha\in A_L\times A_B}\int_E\bar{F}_\alpha^T(x,\epsilon)\Big(-\partial_\epsilon\big(S_i(\epsilon)\,u_\alpha(x,\epsilon)\big) - Q_i(\epsilon)\,u_\alpha(x,\epsilon)\Big)\,\mathrm{d}\epsilon = \sum_{\alpha_L\in A_L}\int_E\tilde{F}_{\alpha_L}^T(x,\epsilon)\sum_{\alpha_B\in A_B}\bar{I}_\alpha\Big(-\partial_\epsilon\big(S_i(\epsilon)\,u_\alpha(x,\epsilon)\big) - Q_i(\epsilon)\,u_\alpha(x,\epsilon)\Big)\,\mathrm{d}\epsilon. \tag{62}$$

Instead of solving the adjoint PDE for every F¯α it is sufficient to compute F˜αL for every αL. The number of expensive adjoint PDE solves that are required is reduced from |A| to |AL|.

4. Numerical examples

Preliminary results for the examples presented in Sections 4.3 and 4.4 can be found in (Achuda et al. 2023).

4.1. Implementation and discretization

Conceptually, the discretizations of the PN equations (17) and of the adjoint PN equations (51) are the most challenging. For the numerical computation of the moments $u_\alpha$ and $\bar{F}_\alpha$ we implemented a 3D staggered-grid finite-difference method based on the method StaRMAP presented in Seibold and Frank (2014). StaRMAP is specifically designed to exploit the structure of PN equations. Individual sets of moments $u_\alpha$ are discretized on the staggered grids in such a way that the sparsity of the transport matrices $A^{(d)}$ decouples the system and enables the use of second-order central differences that approximate the spatial derivative on the respective displaced grid. Second-order integration in energy is achieved using a splitting of the moments and half steps for each part of the solution.

The boundary conditions in Equations (22) and (53) are used to compute moments on ghost nodes located outside the computational domain $G$. Values of the moments at ghost nodes are set such that the boundary condition holds for interpolated moments on the boundary of the domain $G$. For details see Bünger (2021).

For integrations in energy and space as well as the line integral for the absorption we use the trapezoidal rule.

4.2. Comparison: sensitivities of a 1D material with finite differences

We consider a binary material consisting of copper Cu and nickel Ni and compare sensitivities $\frac{\partial I_\alpha}{\partial p}$ computed using our implementation (resp. $\bar{p}$ where $\bar{I}_\alpha$ is seeded with unit vectors) with sensitivities computed by central finite differences. By means of this comparison of our implementation to finite differences, we address several questions at the same time:

  • By introducing a second parametrization, we exemplarily show the modularity of our implementation.

  • We provide insight into the nature of the model. The sensitivity $\frac{\partial I_\alpha}{\partial p}$ is the linear approximation of intensities with respect to model parameters, hence a large sensitivity relates to a strong dependency and a small sensitivity to a weak dependency. This is particularly interesting for the inverse problem because it also unveils possible difficulties in the reconstruction.

  • Agreement with finite differences validates our implementation.

  • We demonstrate the claimed superiority of our method compared to finite differences in terms of computation time.

We introduce an alternative parametrization, where $\Lambda_k(x)$, $k \in \{1,\ldots,n_k\}$, are triangular functions covering the domain $G$, in particular $\sum_{k=1}^{n_k}\Lambda_k(x) = 1$ for all $x \in G$. The partition of unity $\sum_{i=1}^{n_e}\omega_i(x) = 1$ is (again) guaranteed by $p_{i,k}^{\mathrm{pl}} \ge 0$ and $\sum_{j=1}^{n_e-1} p_{j,k}^{\mathrm{pl}} \le 1$. We refer to this parametrization as piecewise-linear.
$$\omega_i^{\mathrm{pl}}(x) = \begin{cases}\sum_{k=1}^{n_k} p_{i,k}^{\mathrm{pl}}\,\Lambda_k(x) & i\in\{1,\ldots,n_e-1\} \\ 1 - \sum_{k=1}^{n_k}\sum_{j=1}^{n_e-1} p_{j,k}^{\mathrm{pl}}\,\Lambda_k(x) & i = n_e\end{cases} \tag{63}$$

The adjoint of the piecewise-linear parametrization $\bar{p}^{\mathrm{pl}}$ is given by
$$\bar{p}_{i,k}^{\mathrm{pl}} = \int_G\Lambda_k(x)\big(\bar{\omega}_i^{\mathrm{pl}}(x) - \bar{\omega}_{n_e}^{\mathrm{pl}}(x)\big)\,\mathrm{d}x. \tag{64}$$

Modularity shows in the fact that only the parametrization and its adjoint need to be implemented; the rest of the code remains unchanged.
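As a sketch of how little code such an exchange involves, the following Julia fragment implements hat-function weight fractions in the spirit of Equation (63) and the parameter adjoint of Equation (64) for a binary material on a 1D domain; the equidistant nodes, the quadrature grid and all names are assumptions of this sketch.

```julia
# Piecewise-linear (hat function) parametrization and its parameter adjoint.
hat(x, k, nodes) = max(0.0, 1.0 - abs(x - nodes[k]) / (nodes[2] - nodes[1]))  # Λ_k(x)

# ω₁ for a binary material (n_e = 2); ω₂(x) = 1 − ω₁(x)
omega1_pl(x, p, nodes) = sum(p[k] * hat(x, k, nodes) for k in eachindex(p))

# adjoint p̄ᵖˡ of Equation (64), with a simple quadrature on the grid xs
function pbar_pl(omega1_bar, omega2_bar, nodes, xs)
    w = step(xs)                                     # quadrature weight
    return [w * sum(hat(x, k, nodes) * (omega1_bar(x) - omega2_bar(x)) for x in xs)
            for k in eachindex(nodes)]
end

# usage with made-up data
nodes = range(0.0, 1.0; length = 10)
ω1 = omega1_pl(0.37, rand(10), nodes)
```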

Now both parametrizations, the piecewise-constant parametrization (with $n_k = 20$ parameters) and the piecewise-linear parametrization (also with $n_k = 20$ parameters), are employed to compute sensitivities $\frac{\partial I_\alpha}{\partial p^{\mathrm{pc}}}$ resp. $\frac{\partial I_\alpha}{\partial p^{\mathrm{pl}}}$. The physical setup and the settings of the PN approximation for this example can be found in Table 1. Parameter values for the parametrizations are sampled independently from a uniform distribution in $[0,1]$.

Table 1. Model settings used for the comparison of 1D sensitivities.

We visualize the sensitivities by plotting $\omega_1^{\mathrm{pc}}(x)\big|_{p=\partial I_\alpha/\partial p^{\mathrm{pc}}}$ and $\omega_1^{\mathrm{pl}}(x)\big|_{p=\partial I_\alpha/\partial p^{\mathrm{pl}}}$ in Figure 4. Using the parametrization functions $\omega(x)$ in this way is a gentle abuse of notation, but since the parameters in $\omega^{\mathrm{pc}}$ and $\omega^{\mathrm{pl}}$ are related to points in space, an interpretation of the plot is possible. Increasing a parameter increases the mass concentration of Cu and simultaneously decreases the mass concentration of Ni, hence the positive values for Cu x-rays and the negative values for Ni x-rays. With increasing depth the sensitivity decreases, because with greater depth the generation of x-rays decreases and the absorption increases.

Figure 4. A visualization of the sensitivities computed using our method and finite differences. We abuse the parametrization function of each parametrization and plot $\omega_1(x)\big|_{p=\partial I_\alpha/\partial p}$. The curves computed using our method (solid lines) coincide with the curves computed using finite differences (black dashed lines). A quantitative comparison of sensitivity values is given in Table 2.

Figure 4 also compares the sensitivities computed by our method (colored, solid lines) with the sensitivities computed by central finite differences (black, dashed lines). The curves coincide. Additionally, we compare relative errors
$$\max_k\bigg|\frac{1}{\frac{\partial I_\alpha}{\partial p_k}}\bigg(\frac{\partial I_\alpha}{\partial p_k} - \frac{I_\alpha(p + h e_k) - I_\alpha(p - h e_k)}{2h}\bigg)\bigg| \tag{65}$$
of sensitivities in Table 2. Derivatives computed by algorithmic differentiation are exact (AD computes the exact derivative of the numerical approximation up to machine precision) (Naumann 2011). Our method as well as finite differences only provide approximations of the derivative. For our method, the underlying approximation of the adjoint state variable introduces errors; for finite differences, the approximation error originates from the perturbation $h$.

Table 2. Relative errors in the sensitivities/derivatives of intensities for multiple x-rays αL and parametrization methods ω(x). Sensitivities/derivatives are computed using our implementation and using a central finite difference approximation of order 2.

In Table 3, the superiority of our method in terms of computation time compared to finite differences becomes apparent. We compare the runtime of a central finite difference method (second order) with our implementation for two different numbers of parameters ($n_p = 20$ and $n_p = 40$). The runtime of the forward model is 0.5 s, and the computation of the derivative with finite differences clearly scales with the number of parameters $n_p$. The runtime of our implementation scales in the number of outputs (here $|A| = 4$) and clearly does not scale with $n_p$. Note the minor overhead because of the backward execution.

Table 3. Runtimes of the computation of sensitivities/derivatives using our implementation and a central finite difference approximation of order 2 (FiniteDiff.jl).

Table 4. Model settings used for the reconstruction of the 1D interfaces.

4.3. 1D reconstruction of a sharp and a diffusive interface

We present the reconstruction of a coated material consisting of iron Fe and nickel Ni with a sharp (discontinuous) and a diffusive (continuous) interface between the layers.

  • We show the necessity of choosing a suitable parametrization.

  • We compare the previously defined parametrizations (Equations (27) and (63)) to an additional non-linear parametrization (Equation (66)).

We introduce another parametrization that is based on the non-linear structure of a neural network, with $x$ as the input and $\omega^{\mathrm{nl}}$ as the output. The activation function for the one hidden layer is $\tanh(\cdot)$, and $\mathrm{sig}(\cdot)$ is used for the output layer to enforce $0 \le \omega_i^{\mathrm{nl}} \le 1$. The parametrization uses $n_p = 10$ parameters, which we call $p^{\mathrm{nl}} = (a_1, a_2, a_3, b_1, b_2, b_3, \hat{a}_1, \hat{a}_2, \hat{a}_3, \hat{b})^T$.
$$v_i(x) = \tanh\big(\underbrace{a_i x + b_i}_{=w_i(x)}\big) \qquad i = 1,2,3 \tag{66a}$$
$$\omega_1^{\mathrm{nl}}(x) = \mathrm{sig}\Big(\underbrace{\sum_{i=1}^3\hat{a}_i\,v_i(x) + \hat{b}}_{=\hat{w}(x)}\Big) \tag{66b}$$
$$\omega_2^{\mathrm{nl}}(x) = 1 - \omega_1^{\mathrm{nl}}(x) \tag{66c}$$

The adjoints for the non-linear parametrization are given in the following.
$$\bar{\hat{a}}_i = \int_G\mathrm{sig}'(\hat{w}(x))\,v_i(x)\,\big(\bar{\omega}_1^{\mathrm{nl}}(x) - \bar{\omega}_2^{\mathrm{nl}}(x)\big)\,\mathrm{d}x \tag{67a}$$
$$\bar{\hat{b}} = \int_G\mathrm{sig}'(\hat{w}(x))\,\big(\bar{\omega}_1^{\mathrm{nl}}(x) - \bar{\omega}_2^{\mathrm{nl}}(x)\big)\,\mathrm{d}x \tag{67b}$$
$$\bar{v}_i(x) = \mathrm{sig}'(\hat{w}(x))\,\hat{a}_i\,\big(\bar{\omega}_1^{\mathrm{nl}}(x) - \bar{\omega}_2^{\mathrm{nl}}(x)\big) \tag{67c}$$
$$\bar{a}_i = \int_G\tanh'(w_i(x))\,x\,\bar{v}_i(x)\,\mathrm{d}x \tag{67d}$$
$$\bar{b}_i = \int_G\tanh'(w_i(x))\,\bar{v}_i(x)\,\mathrm{d}x \tag{67e}$$

For a fair comparison, the number of parameters in the other parametrizations is also chosen to be 10. For the binary material, the scalar $\omega_1(x)$ completely describes the material, because $\omega_2(x) = 1 - \omega_1(x)$.
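A minimal Julia sketch of the forward map of Equations (66a)–(66c) follows, assuming the ten parameters are stored in the order given above; the function name and the parameter layout are choices of this sketch.

```julia
# Non-linear (neural-network-like) weight fraction parametrization, Equation (66).
sig(z) = 1 / (1 + exp(-z))                       # logistic function, keeps 0 ≤ ω ≤ 1

function omega_nl(x, p)
    a, b       = p[1:3], p[4:6]
    ahat, bhat = p[7:9], p[10]
    v  = tanh.(a .* x .+ b)                      # hidden layer, Equation (66a)
    ω1 = sig(sum(ahat .* v) + bhat)              # output layer, Equation (66b)
    return [ω1, 1 - ω1]                          # Equation (66c)
end

ω = omega_nl(0.25, randn(10))                    # usage with random parameters
```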

The artificial measurements for the reconstruction are k-ratios from multiple different beam energies between 9 keV and 15 keV. To compute the measurements, we assume the reference material to be given. Additional PN model settings and the considered x-rays are tabulated in Table 4. We choose the BFGS method implemented in Optim.jl (Mogensen and Riseth 2018) as optimization method (with additional box constraints for the piecewise-constant and the piecewise-linear parametrization, and no constraints for the non-linear parametrization).
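The corresponding optimizer calls could look like the following sketch; objective and gradient! are dummy stand-ins for the k-ratio misfit and its adjoint gradient, and the bounds and initial values are placeholders. Box constraints are imposed through Optim.jl's Fminbox wrapper around BFGS.

```julia
# Sketch of the optimization calls for the constrained and unconstrained parametrizations.
using Optim

np    = 10
p_ref = fill(0.3, np)                                  # dummy "true" parameters
objective(p)    = sum(abs2, p .- p_ref)                # stand-in for the k-ratio misfit
gradient!(g, p) = (g .= 2 .* (p .- p_ref); g)          # stand-in for the adjoint gradient

p0, lower, upper = fill(0.5, np), zeros(np), ones(np)

# piecewise-constant / piecewise-linear: BFGS with box constraints via Fminbox
res_box = optimize(objective, gradient!, lower, upper, p0, Fminbox(BFGS()))

# non-linear parametrization: unconstrained BFGS from a random start
res_nl = optimize(objective, gradient!, randn(np), BFGS())

p_opt = Optim.minimizer(res_box)
```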

In Figure 5, the reconstructed density of the material with a sharp interface is visualized for the different parametrizations. From their initial configurations (piecewise-constant, piecewise-linear: 0.5 Fe and 0.5 Ni; non-linear: random), the parametrizations iterate toward the reference density (black line) during the optimization. We visualize the initial configuration and the first, fifth and 100th iterations. The inability of the piecewise-linear parametrization to represent discontinuities becomes apparent. The piecewise-constant parametrization approximates the reference perfectly only because one of the subdomain interfaces matches the interface of the reference material. The non-linear parametrization, on the other hand, is flexible enough to identify the location of the interface and to approximate the discontinuity.

Figure 5. The reconstructed density of a material with a sharp interface between an iron Fe layer covering an Ni substrate. We use the piecewise-constant, the piecewise-linear and the non-linear parametrization for the reconstruction. For the piecewise-constant and the piecewise-linear parametrization, the geometry (interfaces) is visualized by gray vertical lines. The black line shows the reference density. All parametrizations converge. The piecewise-constant parametrization has a clear advantage because the interface aligns with an interface of the parametrization. The piecewise-linear parametrization cannot approximate the discontinuous interface, hence it converges to the presented smoothed representation. The non-linear parametrization is flexible enough to identify the location of the interface and also approximates the discontinuity using a strong gradient.

In Figure 6, the reconstructed density of the material with a diffusive interface is visualized. Except for the reference material, exactly the same settings as for the reconstruction of the sharp interface are used. Again, the parametrizations iterate toward the reference density (black line). For the diffusive interface, none of the parametrizations can reconstruct the interface perfectly, but the piecewise-linear and the non-linear parametrizations perform better than the piecewise-constant parametrization.

Figure 6. The reconstructed density of a material with a diffusive interface between an iron Fe layer covering an Ni substrate. We use the piecewise-constant, the piecewise-linear and the non-linear parametrization for the reconstruction. For the piecewise-constant and the piecewise-linear parametrization, the geometry (interfaces) are visualized by gray vertical lines. The black line shows the reference density. All parametrizations converge. None of the parametrizations can represent the interface perfectly. The piecewise-linear and the non-linear parametrization perform better for this example, because of their flexibility to approximate continuous functions.

In Figure 7, we visualize the normalized errors during the iterations of the optimization: the value of the objective function $\|k(p) - k^{\exp}\|_2$, the error of the mass concentrations $\|\rho(x) - \rho^{\mathrm{true}}\|_2$ and the error of additional k-ratio measurements $\|k^+(p) - k^{+\exp}\|_2$ (with beam energies 11 keV and 14 keV). The kinks in the error of the objective function are an artifact of the Optim.jl implementation. Its default implementation is based on the combination of a line search method (Hager and Zhang 2005) and the BFGS algorithm presented in Nocedal and Wright (2006).

Figure 7. The value of the objective function and additional reconstruction quality measures during the iterations of the optimization for all parametrizations. The objective function ||k(p)kexp||2 is visualized as a solid line. Also, the error in the mass concentrations ||ρ(x)ρtrue(x)||2 (dashed line) and the error of k-ratios ||k+(p)k+exp||2 (dotted line) from additional beam energies (11keV and 14keV) is plotted.

Differences in the performance of the parametrizations are clearly visible. The error in the mass concentrations of the piecewise-constant parametrization is by far the smallest for the sharp interface, because this parametrization can represent the interface exactly. Only the non-linear parametrization performs similarly well for both examples.

The errors in the mass concentrations of the piecewise-constant and the piecewise-linear parametrization differ between the sharp and the diffusive interface. A correlation between the error in the mass concentrations and the error of the additional k-ratios would be beneficial but is hard to establish visually. Ideally, both would behave similarly, because then the error in additional k-ratios could serve as a reconstruction quality measure. The evaluation of such quality measures, however, is beyond the scope of this work and deferred to further research.

4.4. 2D reconstruction of an ellipsoidal inclusion

We describe the use case of reconstructing an ellipsoidal copper Cu inclusion in an iron Fe substrate from k-ratios obtained with a 12 keV electron beam and multiple beam positions. In Figure 8, we plot k-ratios obtained from an artificial line-scan of the inclusion. Only the ten k-ratios retrieved from the five beam positions μy = {−50, −25, 0, 25, 50} nm, which are marked by black crosses, are used for the reconstruction. To compute the k-ratios, the reference material is assumed to be given. We visualize the total density ρtot of the reference material in Figure 9 and additionally show the ionization distributions of (Cu, K-L2) and (Fe, K-L2) that define the size of the interaction volume. The ellipsoidal Cu inclusion is clearly smaller than the interaction volumes. Additional settings of the k-ratio model are listed in Table 5.

Figure 8. The k-ratio profile of the elliptical Cu inclusion in the Fe substrate computed using a 12 keV electron beam. Dashed lines show the k-ratio profiles of (Cu, K-L2) (blue) and (Fe, K-L2) (orange) with a high beam resolution. Black markers show the k-ratios which are used for the reconstruction. While not directly visible, the shape and height of the k-ratio curves encode information about the shape and location of the inclusion and the structure of the interface between inclusion and substrate.

Figure 9. The total density and two ionization distribution fields of the reference material. The ionization distribution curves determine the size of the interaction volume. The size of the ellipsoidal Cu inclusion is smaller than the interaction volume.

Table 5. Model settings used for the 2D reconstruction of the ellipsoidal Cu inclusion.

The profile shows an increase in the (Cu, K-L2) k-ratio for a beam position close to 25 nm. We assume an elliptical Cu inclusion, which we parametrize by p = (μ1, μ2, a, b, r, â, b̂)ᵀ, where μ1 and μ2 specify the position, a and b the scaling factors of the principal axes, r the rotation and â and b̂ the inclusion interface. The weight fractions ω(x) are given by the following.

\[
v(x) = \frac{\big((x_1-\mu_1)\cos(r) + (x_2-\mu_2)\sin(r)\big)^2}{a^2}
     + \frac{\big((x_1-\mu_1)\sin(r) - (x_2-\mu_2)\cos(r)\big)^2}{b^2}
\tag{68a}
\]
\[
\omega_1(x) = \operatorname{sig}\big(\underbrace{\hat{a}\,v(x) + \hat{b}}_{=:\,\hat{w}(x)}\big)
\tag{68b}
\]
\[
\omega_2(x) = 1 - \omega_1(x)
\tag{68c}
\]
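To make the parametrization concrete, a minimal Julia sketch of Equation (68) could look as follows; the logistic function used for sig and all numerical parameter values are our own illustrative assumptions, not quantities taken from the reconstruction.

```julia
# Sketch of the elliptical weight-fraction parametrization of Eq. (68),
# assuming a logistic sigmoid for sig.
sig(z) = 1.0 / (1.0 + exp(-z))

# Returns the weight fractions ω1, ω2 together with the intermediates v and ŵ
# (needed again for the adjoint check further below).
function weight_fractions(x, p)
    μ1, μ2, a, b, r, a_hat, b_hat = p
    c, s = cos(r), sin(r)
    d1, d2 = x[1] - μ1, x[2] - μ2
    v = (d1 * c + d2 * s)^2 / a^2 + (d1 * s - d2 * c)^2 / b^2   # Eq. (68a)
    w_hat = a_hat * v + b_hat                                    # ŵ(x) = â v(x) + b̂
    ω1 = sig(w_hat)                                              # Eq. (68b)
    ω2 = 1.0 - ω1                                                # Eq. (68c)
    return ω1, ω2, v, w_hat
end

# Illustrative parameters (μ1, μ2, a, b, r, â, b̂); not values from the paper.
p = [0.0, -30.0, 20.0, 10.0, 0.3, -5.0, 5.0]
ω1, ω2, _, _ = weight_fractions((0.0, -10.0), p)
```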

Their adjoints are given by

\[
\bar{\hat{a}} = \int_G \operatorname{sig}'\big(\hat{w}(x)\big)\, v(x)\, \big(\bar{\omega}_1(x) - \bar{\omega}_2(x)\big)\, \mathrm{d}x
\tag{69a}
\]
\[
\bar{\hat{b}} = \int_G \operatorname{sig}'\big(\hat{w}(x)\big)\, \big(\bar{\omega}_1(x) - \bar{\omega}_2(x)\big)\, \mathrm{d}x
\tag{69b}
\]
\[
\bar{v}(x) = \operatorname{sig}'\big(\hat{w}(x)\big)\, \hat{a}\, \big(\bar{\omega}_1(x) - \bar{\omega}_2(x)\big)
\tag{69c}
\]
\[
\{\bar{\mu}_1, \bar{\mu}_2, \bar{a}, \bar{b}, \bar{r}\}
 = \int_G \frac{\partial v(x)}{\partial \{\mu_1, \mu_2, a, b, r\}}\, \bar{v}(x)\, \mathrm{d}x
\tag{69d}
\]
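These expressions can be spot-checked numerically. Building on the sketch above (reusing sig and weight_fractions), the following snippet compares the integrand of Equation (69a) at a single point, seeded with ω̄1 = 1 and ω̄2 = 0, against a finite-difference derivative with respect to â; the evaluation point and the step size are arbitrary choices.

```julia
# Finite-difference check of the adjoint term (69a), reusing sig and
# weight_fractions from the sketch above. For the pointwise functional
# J = ω1(x), the adjoint seed is ω̄1 = 1, ω̄2 = 0, and the integrand of
# Eq. (69a) reduces to sig'(ŵ(x)) * v(x).
dsig(z) = sig(z) * (1.0 - sig(z))   # derivative of the logistic sigmoid

x = (0.0, -10.0)
p = [0.0, -30.0, 20.0, 10.0, 0.3, -5.0, 5.0]

ω1, ω2, v, w_hat = weight_fractions(x, p)
adjoint_value = dsig(w_hat) * v * (1.0 - 0.0)          # (ω̄1 - ω̄2) = 1

h = 1e-6
p_plus = copy(p); p_plus[6] += h                       # perturb â
finite_diff = (weight_fractions(x, p_plus)[1] - ω1) / h

@assert isapprox(adjoint_value, finite_diff; rtol = 1e-4)
```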

For the reconstruction, we use the L-BFGS algorithm implemented in Optim.jl (Mogensen and Riseth Citation2018). The total density ρtot of the material during the iteration steps is visualized in Figure 10. From the initial guess (an educated guess: higher copper concentration at x ≈ 30 nm, see Figure 10(a)), the reconstruction converges to the reference material (cf. Figure 9). Alongside the total density of the reconstructed material, we visualize the k-ratio profiles of (Cu, K-L2) and (Fe, K-L2) in Figure 11. The shapes of the curves quickly coincide with the measured k-ratios (black crosses). At iteration 30, the measured k-ratios are already well approximated; the total density, however, is not. For a sufficient reconstruction of the ellipsoidal structure, the error in the k-ratios (the objective function) must be reduced further. In Figure 12, we show the value of the objective function ||k(p) − k_exp||_2 and the error in the mass concentrations ||ρ(p) − ρ_true||_2 during the iterations of the optimization. For the first 50 iterations, the value of the objective function decreases, whereas the error in the mass concentrations remains close to its initial value. Only after the first 50 iterations does the error in the mass concentrations also decrease. The same effect is observed in Figure 10, where the shape of the reference ellipse only starts to form after iteration 90.
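Schematically, the reconstruction driver amounts to handing the objective ||k(p) − k_exp||_2 and, ideally, its adjoint-based gradient to the L-BFGS implementation of Optim.jl. In the sketch below, simulate_kratios is merely a smooth placeholder so that the example runs; it is not the PN-based forward model of this paper.

```julia
# Sketch of the reconstruction driver: hand the least-squares objective to
# Optim.jl's L-BFGS. `simulate_kratios` is only a smooth stand-in so that the
# sketch runs; the actual forward model is the PN-based k-ratio model k(p).
using Optim

simulate_kratios(p) = [sum(abs2, p) / 100.0, sum(p) / 10.0]   # placeholder forward model

p_ref = [0.0, -25.0, 20.0, 10.0, 0.2, -5.0, 5.0]              # "true" parameters (illustrative)
k_exp = simulate_kratios(p_ref)                               # synthetic measurements

objective(p) = sum(abs2, simulate_kratios(p) .- k_exp)        # squared misfit in the k-ratios

p0  = [10.0, -40.0, 30.0, 15.0, 0.0, -4.0, 4.0]               # initial guess
res = optimize(objective, p0, LBFGS(), Optim.Options(iterations = 100))

p_rec = Optim.minimizer(res)   # with the real model, the adjoint-based gradient would be
                               # supplied via optimize(objective, gradient!, p0, LBFGS())
```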

Figure 10. The reconstructed density ρtot of the Fe material with a Cu inclusion during the iterations of the L-BFGS optimization.

Figure 11. The k-ratio profiles of (Cu, K-L2) and (Fe, K-L2) during the iterations of the reconstruction. Black crosses mark the measured k-ratios which are considered in the objective function ||k(p) − k_exp||_2 (cf. Figure 8). The visualized iterations are the same as in Figure 10. Note that the k-ratios quickly converge to the measured values, although the shape of the ellipse does not yet coincide with the shape of the ellipse in the true material (Figure 9).

Figure 12. The value of the objective function ||k(p) − k_exp||_2 and the error of the mass concentrations ||ρ(p) − ρ_true||_2 during the iterations of the L-BFGS optimization. Although the objective function decreases significantly over the first 50 iterations, the error of the mass concentrations remains close to the initial error. The same behavior is observed in Figures 10 and 11.

5. Conclusion

In this paper, we derive an adjoint approach to compute derivatives of a deterministic k-ratio model for its application in reconstruction in EPMA. The k-ratio model is based on the PN model, a moment expansion of the continuous slowing down approximation of the linear Boltzmann equation, which was pioneered for EPMA in Bünger, Richter, and Torrilhon (Citation2022). We formulate the model with a focus on the inverse problem and describe its application within an adjoint algorithmic differentiation framework. This work thus represents a building block in the development of a high-resolution reconstruction method that uses gradient-based optimization techniques. By exchanging the material parametrization, reconstructions can be set up that recover different material parameters and take into account different prior knowledge.

We conclude the paper with a validation of our differentiation method and with reconstruction experiments. The experiments show the influence of the material parametrization on the reconstruction and illustrate the intended application of our method. The combination of measurements with prior knowledge makes high-resolution reconstruction problems tractable with a reasonable amount of measurement data. At the same time, we demonstrate the extensibility of our implementation to various material structures.

High-resolution imaging, i.e., the reconstruction of material structures that are smaller than the interaction volume, is possible with sufficient data or sufficiently strong material assumptions. However, there will always be a trade-off between the accuracy of the measurements and of the model on the one hand and the accuracy of the reconstruction on the other. Especially for high resolution, the analysis of this relation is key. To achieve a reliable reconstruction result, the quantification and propagation of measurement and model uncertainties are necessary.

Data/code availability

The source code accompanying this article is made available via GitHub: https://github.com/tam724/pnepma.

Disclosure statement

The authors declare no conflict of interest.

Additional information

Funding

The authors gratefully acknowledge funding from the German Research Foundation (DFG) under grant number TO 414/7-1 as well as additional funding contributed under the excellence strategy of the German federal and state governments.

Notes

1 In IUPAC notation (Jenkins et al. Citation1991), an x-ray level corresponds to an electron configuration that is typically described by the removal of an electron from the neutral ground-state configuration (e.g., K, L2). A characteristic x-ray is emitted by a transition from an initial to a final x-ray level, the x-ray transition (e.g., K-L2).

2 In the literature the mass attenuation coefficient is commonly denoted using (μ/ρ)α,i. It depends on the medium i and the x-ray wavelength α. For brevity of notation, we write τα,i.

3 For x-ray transitions αL where the initial state is K, the x-ray production cross-section can be replaced by the ionization cross-section of the K shell (k-ratio normalization eliminates the fluorescence yield). For other transitions, e.g. L or M as the final state, the x-ray production cross-section should include the ionization cross-section of all lower shells weighted by transition probabilities.

4 For brevity of notation we divide both quantities by the respective atomic masses.

References

  • Achuda, G., T. Claus, S. Richter, and M. Torrilhon. 2023. Subscale inversion of x-ray emission in electron probe microanalysis based on deterministic transport equations. (submitted to Proceedings of the 17th European Workshop on modern developments and applications in microbeam analysis - EMAS2023).
  • Bünger, J. 2021. Three-dimensional modelling of x-ray emission in electron probe microanalysis based on deterministic transport equations. PhD thesis. RWTH Aachen University, Number: RWTH-2021-05180.
  • Bünger, J., N. Sarna, and M. Torrilhon. 2022. Stable boundary conditions and discretization for PN equations. JCM. 40 (6):977–1003. 10.4208/jcm.2104-m2019-0231
  • Bünger, J., S. Richter, and M. Torrilhon. 2022. A model for characteristic x-ray emission in electron probe microanalysis based on the (filtered) spherical harmonic (PN) method for electron transport. Microsc. Microanal. 28 (2):454–68. 10.1017/S1431927622000083
  • Burghardt, O., P. Gomes, T. Kattmann, T. D. Economon, N. R. Gauger, and R. Palacios. 2022. Discrete adjoint methodology for general multiphysics problems. Struct. Multidisc. Optim. 65 (1):28. 10.1007/s00158-021-03117-5
  • Buse, B., and S. L. Kearns. 2020. Spatial resolution limits of EPMA. IOP Conf. Ser: Mater. Sci. Eng. 891 (1):012005. 10.1088/1757-899X/891/1/012005
  • Carpenter, P. K., and B. L. Jolliff. 2015. Improvements in EPMA: Spatial resolution and analytical accuracy. Microsc. Microanal. 21 (S3):1443–4. 10.1017/S1431927615007990
  • Case, K. M., and P. F. Zweifel. 1967. Linear transport theory. Addison-Wesley series in nuclear engineering. Reading, Mass: Addison-Wesley.
  • Dorigo, T., A. Giammanco, P. Vischia, M. Aehle, M. Bawaj, A. Boldyrev, P. de Castro Manzano, D. Derkach, J. Donini, A. Edelen, et al. 2023. Toward the end-to-end optimization of particle physics instruments with differentiable programming. Rev. Phys. 10:100085. 10.1016/j.revip.2023.100085
  • Griewank, A. 2003. A mathematical view of automatic differentiation. Acta Numer. 12:321–98. 10.1017/S0962492902000132
  • Hager, W. W., and H. Zhang. 2005. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 16 (1):170–92. 10.1137/030601880
  • Hüser, J. 2022. Discrete tangent and adjoint sensitivity analysis for discontinuous solutions of hyperbolic conservation laws. PhD thesis. RWTH Aachen University.
  • Innes, M. 2018. Don’t unroll adjoint: Differentiating SSA-form programs. CoRR.
  • Jenkins, R., R. Manne, R. Robin, and C. Senemaud. 1991. IUPAC—nomenclature system for x-ray spectroscopy. X-Ray Spectrom. 20 (3):149–55. 10.1002/xrs.1300200308
  • Heinrich, K. F. J., and D. Newbury, eds. 1991. Electron probe quantitation. USA: Springer.
  • Kreyszig, E. 1991. Introductory functional analysis with applications. Wiley Classics Library. Wiley.
  • Larsen, E. W., M. M. Miften, B. A. Fraass, and I. A. Bruinvis. 1997. Electron dose calculations using the method of moments. Med. Phys. 24 (1):111–25. 10.1118/1.597920
  • Llovet, X., A. Moy, P. T. Pinard, and J. H. Fournelle. 2021. Electron probe microanalysis: A review of recent developments and applications in materials science and engineering. Prog. Mater. Sci. 116:100673. 10.1016/j.pmatsci.2020.100673
  • Mark, J. C. 1944. The spherical harmonic method, part I. Technical Report MT 92, National Research Council of Canada.
  • Mark, J. C. 1945. The spherical harmonic method, part II. Technical Report MT 97, National Research Council of Canada.
  • Mogensen, P. K., and A. N. Riseth. 2018. Optim: A mathematical optimization package for Julia. Joss. 3 (24):615. 10.21105/joss.00615
  • Moy, A., and J. Fournelle. 2017. Analytical spatial resolution in EPMA: What is it and how can it be estimated? Microsc. Microanal. 23 (S1):1098–9. 10.1017/S1431927617006158
  • Moy, A., and J. Fournelle. 2020. Badgerfilm: An open source thin film analysis program. Microsc. Microanal. 26 (S2):496–8. 10.1017/S1431927620014853
  • Moy, A., J. H. Fournelle, and A. von der Handt. 2019. Solving the iron quantification problem in low-kV EPMA: An essential step toward improved analytical spatial resolution in electron probe microanalysis—Olivines. Am. Mineralogist 104 (8):1131–42. 10.2138/am-2019-6865
  • Naumann, U. 2011. The art of differentiating computer programs. Philadelphia: Software, Environments and Tools. Society for Industrial and Applied Mathematics.
  • Nocedal, J., and S. Wright. 2006. Numerical optimization. Springer series in operations research and financial engineering. 2 ed. New York: Springer-Verlag.
  • Olbrant, E. 2012. Models and numerical methods for time- and energy-dependent particle transport. PhD thesis. RWTH Aachen University, Aachen.
  • Pinard, P. T., A. Schwedt, A. Ramazani, U. Prahl, and S. Richter. 2013. Characterization of dual-phase steel microstructure by combined submicrometer EBSD and EPMA carbon measurements. Microsc. Microanal. 19 (4):996–1006. 10.1017/S1431927613001554
  • Plessix, R.-E. 2006. A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophys. J.l Int. 167 (2):495–503. 10.1111/j.1365-246X.2006.02978.x
  • Reimer, L. 1998. Scanning electron microscopy: Physics of image formation and microanalysis. 2 ed. Springer Series in Optical Sciences. Berlin Heidelberg: Springer-Verlag.
  • Salvat, F., M. J. Berger, A. Jablonski, I. K. Bronic, J. Mitroy, C. J. Powell, and L. Sanche. 2007. Elastic Scattering of Electrons and Positrons. ICRU Report 77. International Commission on Radiation Units & Measurements, Bethesda, MD.
  • Seibold, B., and M. Frank. 2014. Starmap—a second order staggered grid method for spherical harmonics moment equations of radiative transfer. ACM Trans. Math. Softw. 41 (1):1–28. 10.1145/2590808
  • Stuart, A. M. 2010. Inverse problems: A Bayesian perspective. Acta Numer. 19:451–559. 10.1017/S0962492910000061
  • Tarantola, A. 2005. Inverse problem theory and methods for model parameter estimation. Society for Industrial and Applied Mathematics.

Appendix A.

Adjoint continuous implicit differentiation

Consider the implicit definition: given x ∈ X, find y ∈ Y such that F(y = g(x), x) = 0. We refer to the implicit function as y = g(x) and assume that it is unique at x. Given a (tangent) direction ẋ, we use that F(g(x), x) remains zero under perturbations of x, hence its directional derivative in direction ẋ vanishes. Employing the chain rule we find the following.

\[
\left.\frac{\partial}{\partial h} F\big(g(x + h\dot{x}),\, x + h\dot{x}\big)\right|_{h=0}
= \frac{\partial F(y,x)}{\partial y}\left[\frac{\partial g(x)}{\partial x}[\dot{x}]\right]
+ \frac{\partial F(y,x)}{\partial x}[\dot{x}] = 0
\tag{70}
\]

We consider the previous statement in a weak sense, meaning we test the constraint with F̄ ∈ T, where T is an appropriate test space.

\[
\left\langle \bar{F},\; \frac{\partial F(y,x)}{\partial y}\left[\frac{\partial g(x)}{\partial x}[\dot{x}]\right]
+ \frac{\partial F(y,x)}{\partial x}[\dot{x}] \right\rangle_T = 0
\qquad \forall\, \bar{F} \in T
\tag{71}
\]

We identify the tangent of y with ẏ = ∂g(x)/∂x[ẋ] and use the adjoint operators ∂F/∂y*[·] and ∂F/∂x*[·] of the tangent operators ∂F/∂y[·] and ∂F/∂x[·] to derive the following.

\[
\left\langle \bar{F},\; \frac{\partial F(y,x)}{\partial y}[\dot{y}] \right\rangle_T
= -\left\langle \bar{F},\; \frac{\partial F(y,x)}{\partial x}[\dot{x}] \right\rangle_T
\qquad \forall\, \dot{x} \in X
\]
\[
\Longrightarrow\qquad
\Big\langle \underbrace{\tfrac{\partial F(y,x)}{\partial y}^{*}[\bar{F}]}_{=\,\bar{y}},\; \dot{y} \Big\rangle_Y
= \Big\langle \underbrace{-\tfrac{\partial F(y,x)}{\partial x}^{*}[\bar{F}]}_{=\,\bar{x}},\; \dot{x} \Big\rangle_X
\qquad \forall\, \dot{x} \in X
\tag{72}
\]

Comparing Equation (72) with the definition of the adjoint in Equation (31), ⟨ȳ, ẏ⟩_Y = ⟨x̄, ẋ⟩_X, we obtain the two steps of the implicit adjoint. Given ȳ ∈ Y, if we can find F̄ ∈ T such that Equation (73a) holds, the required adjoint x̄ is given by Equation (73b).

\[
\frac{\partial F(y,x)}{\partial y}^{*}[\bar{F}] = \bar{y}
\tag{73a}
\]
\[
\bar{x} = -\frac{\partial F(y,x)}{\partial x}^{*}[\bar{F}]
\tag{73b}
\]
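In finite dimensions, the two steps can be verified directly. The sketch below instantiates them for the linear residual F(y, x) = Ay − x (a choice made only for illustration, not an example from the paper), for which step (73a) becomes a transposed linear solve and step (73b) reduces to x̄ = F̄.

```julia
# Finite-dimensional check of the two-step implicit adjoint (73) for the linear
# residual F(y, x) = A*y - x, i.e. y = g(x) = A \ x. Then dF/dy = A and
# dF/dx = -I, so step (73a) reads A' * F̄ = ȳ and step (73b) gives x̄ = F̄.
using LinearAlgebra

A = [4.0 1.0; 2.0 3.0]            # an invertible (non-symmetric) system matrix
x = [1.0, 2.0]
y = A \ x                          # primal solve, F(y, x) = 0

ȳ = [0.5, -1.0]                    # given adjoint seed on y
F̄ = A' \ ȳ                         # step (73a): adjoint (transposed) solve
x̄ = F̄                             # step (73b): x̄ = -(dF/dx)*[F̄] = F̄ since dF/dx = -I

# Consistency with the tangent relation ⟨ȳ, ẏ⟩ = ⟨x̄, ẋ⟩, where ẏ = A \ ẋ:
ẋ = [0.3, -0.7]
ẏ = A \ ẋ
@assert isapprox(dot(ȳ, ẏ), dot(x̄, ẋ))
```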

Example

We consider the Poisson equation in 2D with homogeneous Dirichlet boundary conditions. Given a domain Ω ⊂ R² with boundary ∂Ω and a suitable right-hand side x: Ω → R, the primal problem is to find y: Ω → R with y(ξ) = 0 for all ξ ∈ ∂Ω such that

\[
F(y, x) = \Delta y(\cdot) - x(\cdot) = 0.
\tag{74}
\]

To derive the adjoint, we first analyze the left-hand side of Equation (72). Using integration by parts twice, we derive the adjoint operator ∂F/∂y*[·].

\[
\left\langle \bar{F}, \frac{\partial F(y,x)}{\partial y}[\dot{y}] \right\rangle
= \left\langle \bar{F}, \Delta\dot{y} \right\rangle
= \underbrace{\int_{\partial\Omega} \bar{F}(\xi)\,\nabla\dot{y}(\xi)\cdot n
 - \dot{y}(\xi)\,\nabla\bar{F}(\xi)\cdot n \;\mathrm{d}S}_{=\,0}
 + \Big\langle \underbrace{\Delta\bar{F}}_{=\,\frac{\partial F(y,x)}{\partial y}^{*}[\bar{F}]},\; \dot{y} \Big\rangle
\tag{75}
\]

The boundary integral in Equation (75) yields boundary conditions for the adjoint F̄. Since ẏ(ξ) = 0 for all ξ ∈ ∂Ω, we assign F̄(ξ) = 0 for all ξ ∈ ∂Ω. From the right-hand side of Equation (72) we trivially find

\[
\frac{\partial F(y,x)}{\partial x}^{*}[\bar{F}] = -\bar{F}.
\tag{76}
\]

Computing the adjoint x¯ therefore consists of the following steps. Given y¯(·) find F¯(·) with F¯(ξ)=0 ξΩ such that Equation(77a) holds. Then, the required adjoint x¯(·) is given by Equation(77b) (trivial for this example).(77a) ΔF¯=y¯(77a) (77b) x¯=F¯(77b)