Full article: An Adjoint Method for High-Resolution EPMA Based on the Spherical Harmonics (PN) Model of Electron Transport

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

Abstract

Electron Probe Microanalysis (EPMA) is a nondestructive technique to determine the chemical composition of material samples in the micro- to nanometer range. Based on intensity measurements of x-radiation, the reconstruction of the material composition and structure poses an inverse problem. The reconstruction methods currently applied are based on models that assume a homogeneous or a layered structure of the studied material. To increase the spatial resolution of reconstruction in EPMA the combination of a more sophisticated reconstruction method, which is based on a model that allows complex material structure, together with multiple measurements with varying beam configurations is required. We present a deterministic k-ratio model that is based on the P_N model, an approximation of the radiative transfer equation for electron transport. Our goal is to approximate a maximum likelihood solution of the inverse problem using gradient-based optimization. We detail the application of the model in the context of algorithmic differentiation, in particular by deriving its continuous adjoint formulation. Algorithmic differentiation provides the flexibility to adapt the reconstruction method to various material parametrizations and thus to regularize and take into account prior knowledge. Through examples, we verify our implementation and demonstrate the flexibility of the reconstruction/differentiation method.

Keywords:

1. Introduction

Electron Probe Microanalysis (EPMA) (Heinrich and Newbury Citation1991; Reimer Citation1998) is an imaging technique used for the quantitative analysis of the composition of solid material samples at the micro- to nanometer scale. The sample is excited by a focused beam of electrons which induces multiple relaxation processes inside the sample. In EPMA, the emission of characteristic x-rays is of special focus. If an electron that is induced by the beam strikes a bound electron that occupies an atomic shell of an atom inside the specimen, the bound electron is ejected from its shell and the atom is left with a vacancy. Outer shell electrons fill this vacancy by emitting a quantized x-ray with an energy corresponding to the energy level difference of the excited (initial) and the relaxed (final) level. The energy levels of electron shells are characteristic of a specific atom, hence the energy of the emitted x-ray provides information about the composition of the material sample. Reconstructing the chemical composition from intensities of emitted x-radiation (normalized into k-ratios ${(k_{α})}_{α \in A}, A :$ measurement setups) forms the inverse problem in EPMA. In , we outline the main physical processes that occur during an experiment.

Figure 1. A sketch of the physical processes in EPMA. The sample G is rastered by electron beams (blue) with different positions μ_x. Electrons e^– from the electron beam (blue dotted line) scatter inside the sample and strike bound electrons that leave a vacancy (blue circle). Outer shell electrons (blue discs) fill this vacancy and release an x-ray of characteristic energy (red wobbly line). The x-ray travels through the sample and is counted by a detector. Beam electrons only excite a certain volume of the sample, the interaction volume (gray ellipses) which scales with the beam energy $μ_{ϵ} .$

The crucial ingredient to the inverse problem of reconstruction is the definition of a model $k \circ P : P \to R^{| A |}$ that maps parameters $p \in P \subseteq R^{n_{p}}$ ( $P :$ parameter space, n_p: number of parameters) to k-ratios $(k \circ P) (p) \in R^{| A |}$ ( $| A | :$ number of measurements). Then the reconstruction can be formalized as the following inverse problem: Find the set of parameters $p^{*}$ such that the modeled k-ratios $(k \circ P) (p^{*})$ reproduce the experimental k-ratios $k^{\exp}$ as good as possible.

The applications of EPMA are manifold (Llovet et al. Citation2021; Moy, Fournelle, and von der Handt Citation2019; Pinard et al. Citation2013). From geology, and material science to electronics, researchers rely on EPMA to determine the material composition and structure of samples. Every application brings with it different prior knowledge and objectives that must be considered in the reconstruction. We subdivide our model into a k-ratio model $k : (G \to R^{n_{e}}) \to R^{| A |}, ρ \mapsto k [ρ]$ and a material parametrization $P : P \to (G \to R^{n_{e}}), p \mapsto P (p) = ρ .$ Thereby $ρ_{i} (x)$ is the mass concentration of element i in a compound with n_e constituents at a point $x \in G .$ The sample domain $G \subset R^{3}$ will be specified later. While the k-ratio model k can handle general materials, the reconstruction can be tailored to specific applications via the material parametrization $P$ (e.g., using prior knowledge). In particular, our definition includes reconstruction approaches currently applied in EPMA, which are however very restrictive on the material parametrizations $P .$ ZAF-models assume homogeneity of the material inside the interaction volume of each beam. Effectively, this assumption limits the spatial resolution of EPMA to the size of the interaction volume (Buse and Kearns Citation2020; Carpenter and Jolliff Citation2015; Moy and Fournelle Citation2017). Reconstruction methods based on the integration of $ϕ (ρ z)$ -curves currently only allow for samples that are layered in depth (Moy and Fournelle Citation2020). We aim to leverage a more sophisticated k-ratio model and the combination of k-ratio measurements from various beam positions and energies (resp. measurement setups $A$ ) to obtain a high-resolution spatial image of the material composition.

In this article, we present our model for k-ratios (taken from Bünger, Richter, and Torrilhon Citation2022) with a detail on the dependence on the mass concentrations ρ). The main part is dedicated to the derivation of an efficient and modular approach to compute the derivatives of our model, facilitating its integration into differentiable programming (Dorigo et al. Citation2023). We derive our differentiation approach based on the Adjoint Mode of Algorithmic Differentiation (Griewank Citation2003; Naumann Citation2011) and the Continuous Adjoint Method (Plessix Citation2006). The derivative forms the crucial part in the implementation of gradient-based optimization methods that find a local minimum of an objective function as a candidate solution of the inverse problem. We show the agreement of our gradient computation with finite differences and show the advantages of the adjoint method in terms of computation time. We also provide reconstruction experiments that compare different parametrizations in 1D and show a 2D reconstruction example.

1.1. The inverse problem of reconstruction

In theory (Stuart Citation2010; Tarantola Citation2005), the solution of the inverse problems is defined as knowledge of the posterior uncertainty of the parameters based on the likelihood of the data, which describes the uncertainty of the model and the measurement, and the prior uncertainty of the parameters. So far, we consider a Bayesian inversion of the presented problem as intractable. However, the statistical understanding interprets our approach as finding the maximum likelihood estimate under the assumption of an isotropic Gaussian likelihood. We define the reconstruction problem as the following optimization problem:(1) $p^{*} = \underset{p \in P}{arg \min} \underset{= (J \circ k \circ P) (p)}{\underset{⏟}{\sum_{α \in A} ((k_{α} \circ P) (p) - k_{α}^{\exp})^{2}}}$ (1)

Assuming that the objective function $J \circ k \circ P$ is differentiable w.r.t. p, gradient-based optimization methods (Steepest Descent, Conjugate Gradient, L-BFGS) (Nocedal and Wright Citation2006) can be utilized to iteratively approximate a local minimum $p^{*} \in P .$ For an efficient application of gradient-based methods, the efficient computation of the gradient is crucial. Additionally, we require our code to remain modular, such that either different parts of the model and, more importantly, the parametrization of the mass concentrations $P$ is exchangeable. We remark that the computation of maximum likelihood estimates can be ambiguous and ill-posed depending on the number of available measurements and the chosen parametrization. Additional prior information about the material would be necessary that could enter our minimization problem Equation(1)(1) $p^{*} = \underset{p \in P}{arg \min} \underset{= (J \circ k \circ P) (p)}{\underset{⏟}{\sum_{α \in A} ((k_{α} \circ P) (p) - k_{α}^{\exp})^{2}}}$ (1) as an additional regularization term $R (ρ (\cdot), p) .$ Regarding the ill-posedness of the inverse problem, we refer to further research and note that the addition of a differentiable regularization term is possible.

2. The forward model

The characteristic x-ray intensity model used in this article combines a model for electron transport based on the continuous slowing down approximation (CSD) of the linear Boltzmann equation (radiative transfer) with a subsequently applied model for x-ray generation and attenuation. In Bünger (Citation2021); Bünger, Sarna, and Torrilhon (Citation2022); Bünger, Richter, and Torrilhon (Citation2022), the model is derived in more detail. Here we derive and summarize the main equations and address the dependence on mass concentrations ρ in detail.

2.1. K-ratios

In EPMA, x-ray intensity is measured by counting all characteristic x-rays emitted by electron scattering induced by the electron beam. Because of multiple uncertain quantities concerning the experimental device, the x-ray intensity $I_{α}$ is normalized into a k-ratio $k_{α}$ using standard intensities $I_{α}^{std}$ measured from a known reference sample.(2) $k_{α} = \frac{I_{α}}{I_{α}^{std}}$ (2)

Therefore, the normalization eliminates uncertain multiplicative factors that influence the intensity, e.g., the detector efficiency.

An experiment yields multiple k-ratios. Each k-ratio corresponds to a specific x-ray transition and a specific experimental setup, e.g., the beam position, energy or angle. To distinguish between k-ratios, we use $α = (α_{L}, α_{B}) \in A$ to denote a multi-index, that contains all information about the transition α_L and the beam setup α_B.(3) $\begin{matrix} α = & (\underset{= α_{L}}{\underset{⏟}{Z_{i} \in {Z_{1}, Z_{2}, \dots, Z_{n_{e}}}, I \in {K, L 1, \dots, M 1, \dots}, F \in {L 1, \dots, M 1, \dots},}} \\ \underset{= α_{B}}{\underset{⏟}{(μ_{x, beam}, Σ_{x, beam}) \in R^{2} \times R^{2 \times 2}, (μ_{ϵ, beam}, Σ_{ϵ, beam}) \in R \times R, (μ_{Ω, beam}, κ_{Ω, beam}) \in S^{2} \times R}}) \end{matrix}$ (3)

The individual symbols are the atomic number $Z_{i} \in α_{L},$ the initial (excited) $I \in α_{L}$ and the final (relaxed) $F \in α_{L}$ x-ray level of the electron-transition,Footnote¹ the mean and variance of the beam’s position $(μ_{x, beam}, Σ_{x, beam}) \in α_{B},$ of the beam’s energy $(μ_{ϵ, beam}, Σ_{ϵ, beam}) \in α_{B}$ and the mean and concentration parameter of the beam’s direction $(μ_{Ω, beam}, κ_{Ω, beam}) \in α_{B} .$

2.2. Characteristic X-ray emission model

Based on mass concentrations ρ, we aim for a generic model for k-ratios $k_{α}$ which, according to the usual definition in EPMA, are computed as the ratio of a measured $I_{α}$ and a standard intensity $I_{α}^{std} .$ (4) $k_{α} [ρ] = \frac{I_{α} [ρ]}{I_{α} [ρ_{α}^{std}]}$ (4)

Both intensities can be computed from the same model, only the underlying mass concentrations, ρ resp. $ρ_{α}^{std},$ changes.

Neglecting multiplicative factors that cancel due to the normalization with standard intensities $I_{α}^{std},$ we formulate the intensity of characteristic x-rays $I_{α}$ as the integral of the attenuation field $A_{α},$ the number of atoms $N_{α}$ and the generation field of characteristic x-rays $I_{α}$ over a domain $G \subset R^{3}$ of the sample which is chosen such that G covers the support of $I_{α},$ i.e., $I_{α} (x) = 0 \forall x \in R^{3} ∖ G .$ (5) $I_{α} [ρ] = \int_{G} A_{α} [ρ] (x) N_{α} [ρ] (x) I_{α} [ρ] (x) d x$ (5)

The attenuation field $A_{α}$ describes the intensity reduction of generated x-rays while traveling through the sample toward the detector. Attenuation is quantified by Beer-Lamberts law where the attenuation coefficient $μ_{α}$ is the weighted sum of mass concentrations ρ_i and mass attenuation coefficients $τ_{α, i} .$ Footnote²(6) $A_{α} [ρ] (x) = \exp (- \int_{l (x, x_{d})} μ_{α} (y) d y) = \exp (- \int_{l (x, x_{d})} \sum_{i = 1}^{n_{e}} τ_{α, i} ρ_{i} (y) d y)$ (6)

In Equation(6)(6) $A_{α} [ρ] (x) = \exp (- \int_{l (x, x_{d})} μ_{α} (y) d y) = \exp (- \int_{l (x, x_{d})} \sum_{i = 1}^{n_{e}} τ_{α, i} ρ_{i} (y) d y)$ (6) the integration domain is the straight line $l (x, x_{d}) \subset G$ from a point $x \in G$ toward the detector position $x_{d} \in \partial G,$ such that the line $l (x, x_{d})$ lies inside the domain G. Neglecting the part outside the domain G assumes that x-rays travel unattenuated in the vacuum surrounding the material sample.

The number of atoms $N_{α}$ (per unit volume) is given by the division of the mass concentration field ρ_i by the atomic mass $A_{α}$ of the element $Z_{i} \in α .$ (7) $N_{α} [ρ] (x) = \frac{{ρ_{i} (x) |}_{Z_{i} \in α}}{A_{α}}$ (7)

The third factor that influences the x-ray intensities $I_{α}$ is the field of characteristic x-ray generation.(8) $I_{α} [ρ] (x) = \int_{E} σ_{α} (ϵ) \int_{S^{2}} \underset{= ψ_{α} (x, ϵ, Ω)}{\underset{⏟}{| v (ϵ) | f (x, ϵ, Ω)}} d Ω d ϵ$ (8)

It contains the electron fluence $ψ_{α} (x, ϵ, Ω) = | v (ϵ) | f (x, ϵ, Ω)$ ( $v (ϵ) :$ electron velocity, $f (x, ϵ, Ω) :$ number density of electrons) of beam electrons at $x \in G$ with energy $ϵ \in E = [ϵ_{cut}, ϵ_{\max}]$ into direction $Ω \in S^{2}$ inside the material probe which is weighted by the x-ray production cross-section $σ_{α} .$ Footnote³ The energy interval E is defined by the cutoff energy $ϵ_{cut},$ which is chosen so small that no more ionization can occur, and $ϵ_{\max},$ which is chosen sufficiently large to capture all beam electrons.

Describing the electron fluence $ψ_{α}$ forms the most difficult part of modeling the emission of characteristic x-rays for inhomogeneous samples. We detail our model for the electron fluence $ψ_{α}$ in the following section.

2.3. Deterministic electron transport model

The dynamics of the beam electrons are described by a linear Boltzmann equation, an integro-differential equation describing the evolution of electron number density in phase-space (position-velocity space), whereby the physical processes governing the evolution of the electron number density are free-flight phases interrupted by elastic and inelastic collisions with atoms of the material probe. Since the timescale of the physical processes is very small in comparison to the duration of the experimental measurements of x-ray intensities, it is valid to neglect the time dependence and assume a steady-state distribution of beam electrons immediately after switching on the electron beam. Furthermore, as inelastic scattering cross-sections are highly peaked around small energy losses, i.e., electrons most probably lose a significant amount of energy in a sequence of small energy losses, it is common in EPMA to model the energy loss of projectile electrons as a continuous deceleration along the trajectory. These simplifications lead to modeling the beam electrons by the Boltzmann equation of electron transport in continuous slowing down (CSD, resp. BCSD) approximation – an evolution equation in energy space given by (Larsen et al. Citation1997)(9) $- \partial_{ϵ} (S [ρ] (x, ϵ) ψ_{α} (x, ϵ, Ω)) + Ω \cdot \nabla_{x} ψ_{α} (x, ϵ, Ω) = (Q [ρ] ψ_{α}) (x, ϵ, Ω),$ (9) where the energy loss is governed by the stopping power S and the scattering operator $Q$ accounts for elastic and inelastic collisions (CSD-average). Both, the stopping power S and the scattering operator $Q$ are derived from cross-sections for compounds, which are defined by means of the additivity approximation. We use the cross-sections from Salvat et al. (Citation2007) that are also used in Bünger (Citation2021); Olbrant (Citation2012). Assuming a well-defined atomic mass, we can write both as the sum of atomic quantities S_i and $Q_{i}$ Footnote⁴ weighted by mass concentrations ρ_i.(10) $S [ρ] (x, ϵ) = \sum_{i = 1}^{n_{e}} ρ_{i} (x) S_{i} (ϵ)$ (10) (11) $(Q [ρ] ψ_{α}) (x, ϵ, Ω) = \sum_{i = 1}^{n_{e}} ρ_{i} (x) (Q_{i} ψ_{α}) (x, ϵ, Ω)$ (11)

The BCSD model describes the evolution of the electron fluence $ψ_{α}$ in energy space and consequently needs to be equipped with initial and boundary conditions. As the evolution is prescribed from high to low energies an initial condition defines the electron fluence at the maximal energy $ϵ_{\max}$ (12) $ψ_{α} (x, ϵ = ϵ_{\max}, Ω) = ψ_{α, 0} (x, Ω) = 0.$ (12)

We will initialize the electron fluence at an energy $ϵ_{\max}$ sufficiently larger than the mean beam energy $μ_{ϵ, beam}$ by $ψ_{α, 0} (x, Ω) = 0$ and model the electron beam by boundary conditions. The electron transport model can be equipped with Dirichlet-type boundary conditions by prescribing the incoming half of the electron fluence at the boundary(13) $ψ_{α} (x, ϵ, Ω) = ψ_{α, in} (x, ϵ, Ω) \forall Ω \in S^{2} : n \cdot Ω < 0$ (13) where n denotes the outward-pointing normal-vector at $x \in \partial G$ and $ψ_{α, in}$ is a given distribution of incoming particles. We model electron beams by a product of Gaussian distributions in space and energy together with a von-Mises-Fisher distribution in direction. An electron beam that is aligned with the outward-pointing normal n of the polished material surface and focused onto the point $μ_{x, beam} \in \partial G$ is modeled by(14) $ψ_{α, in} (x, ϵ, Ω) = N_{α, x} (x) N_{α, ϵ} (ϵ) F_{α, Ω} (Ω),$ (14) where $N_{α, x} (x)$ is the probability density function of a Gaussian with mean $μ_{x, beam} \in α$ and covariance $Σ_{x, beam} \in α .$ Respectively the distribution in energy $N_{α, ϵ} (ϵ)$ (Gaussian with mean $μ_{ϵ, beam} \in α$ and variance $Σ_{ϵ, beam} \in α$ ) and the distribution in direction $F_{α, Ω} (Ω)$ (von-Mises-Fisher with mean $μ_{Ω, beam} \in α$ and concentration $κ_{Ω, beam} \in α$ ) is defined.

2.4. Spherical harmonic approximation

A numerical solution of the electron transport model in EquationEquation (9)(9) $- \partial_{ϵ} (S [ρ] (x, ϵ) ψ_{α} (x, ϵ, Ω)) + Ω \cdot \nabla_{x} ψ_{α} (x, ϵ, Ω) = (Q [ρ] ψ_{α}) (x, ϵ, Ω),$ (9) is expensive due to the high dimension of phase space of $ψ_{α} (x, ϵ, Ω)$ and complexity of the scattering operator $Q .$ Therefore, we employ the method of moments to derive a modal approximation of the linear Boltzmann equation. We reduce the phase space by replacing the direction Ω as an independent variable by a small set of moments to model the electron fluence ψ. Here we consider the spherical harmonic (P_N) moment approximation (Case and Zweifel Citation1967; Mark Citation1944; Mark Citation1945; Seibold and Frank Citation2014), i.e., we approximate the electron fluence ψ by its series expansion in spherical harmonics $Y_{l}^{k} : S^{2} \to R$ truncated after some finite degree $l \leq N \in N .$ (15) $ψ_{α} (x, ϵ, Ω) \approx ψ_{α}^{P_{N}} (x, ϵ, Ω) = \sum_{l = 0}^{N} \sum_{k = - l}^{l} u_{α}^{l, k} (x, ϵ) Y_{l}^{k} (Ω)$ (15) where the expansion coefficients are the spherical harmonic moments of the angular distribution(16) $u_{α}^{l, k} (x, ϵ) = \int_{S^{2}} Y_{l}^{k} (Ω) ψ_{α} (x, ϵ, Ω) d Ω \forall l \leq N, k \leq | l |,$ (16) and we model the fluence through the system of evolution equations of its moments $u_{α}^{l, k}$ which we refer to as the P_N equations. The P_N equations are obtained by weakly enforcing EquationEquation (9)(9) $- \partial_{ϵ} (S [ρ] (x, ϵ) ψ_{α} (x, ϵ, Ω)) + Ω \cdot \nabla_{x} ψ_{α} (x, ϵ, Ω) = (Q [ρ] ψ_{α}) (x, ϵ, Ω),$ (9) for $ψ_{α}^{P_{N}}$ on the test space spanned by spherical harmonics up to degree N. By gathering the coefficients $u_{α}^{l, k}$ and ansatz functions $Y_{l}^{k}$ into vector functions $u_{α} : G \times E \to R^{{(N + 1)}^{2}}$ and $ϒ_{N} : S^{2} \to R^{{(N + 1)}^{2}}$ such that $ψ_{α}^{P_{N}} = {(ϒ_{N})}^{T} u_{α},$ the equation for $u_{α}$ can be written as $F_{α} (u_{α}, ρ) = 0 .$ (17) $\underset{= F_{α} (u_{α}, ρ)}{\underset{⏟}{- \partial_{ϵ} (S [ρ] (x, ϵ) u_{α} (x, ϵ)) + \sum_{d = 1}^{3} A_{(d)} \partial_{x_{d}} u_{α} (x, ϵ) - Q [ρ] (x, ϵ) u_{α} (x, ϵ)}} = 0 \forall x \in G, ϵ \in E$ (17)

With the following definitions for $A_{(d)}$ and $Q [ρ] .$ (18) $A_{(d)} = \int_{S^{2}} Ω_{d} ϒ_{N} {(ϒ_{N})}^{T} d Ω$ (18) (19) $Q [ρ] (x, ϵ) = \int_{S^{2}} ϒ_{N} Q [ρ] {(ϒ_{N})}^{T} d Ω = \sum_{i = 1}^{n_{e}} ρ_{i} (x) Q_{i} (ϵ)$ (19)

Using the method of moments, we reduced the phase space by two dimensions at the cost of replacing a scalar equation with a system of ${(N + 1)}^{2}$ equations. Additionally, $A_{(d)}$ and Q exhibit structures that can be exploited to develop efficient numerical solvers. $Q [ρ] (x, ϵ)$ is diagonal as spherical harmonics are eigenfunctions of the collision operator $Q [ρ]$ such that the computational cost of the scattering operation grows linearly in the number of moments.

The transport matrices $A_{(d)}$ inherit a sparsity pattern from the recursion relation of spherical harmonics. When ordering spherical harmonics that are odd (even) in Cartesian direction d into a vector function $ϒ_{N}^{[e, d]} : S^{2} \to R^{(N + 1) (N + 2) / 2}$ (resp. $ϒ_{N}^{[o, d]} : S^{2} \to R^{N (N + 1) / 2}$ ) and choosing $ϒ_{N} = {({(ϒ_{N}^{[e, d]})}^{T}, {(ϒ_{N}^{[o, d]})}^{T})}^{T}$ the matrix $A_{(d)}$ is of the following form.(20) $A_{(d)} = (\begin{matrix} 0 & {\hat{A}}_{(d)}^{T} \\ {\hat{A}}_{(d)} & 0 \end{matrix})$ (20) (21) ${\hat{A}}_{(d)} = \int_{S^{2}} Ω_{d} ϒ_{N}^{[o, d]} {(ϒ_{N}^{[e, d]})}^{T} d Ω$ (21)

Hence, the even (odd) moments $u_{α}^{[e, d]}$ couple with the odd (even) moments $u_{α}^{[o, d]}$ only through their spatial derivatives in direction d, which can be employed by central finite difference schemes on staggered grids (Seibold and Frank Citation2014).

We employ the boundary conditions introduced in Bünger, Sarna, and Torrilhon (Citation2022). The boundary conditions follow from taking half moments of the Boltzmann boundary condition Equation(13)(13) $ψ_{α} (x, ϵ, Ω) = ψ_{α, in} (x, ϵ, Ω) \forall Ω \in S^{2} : n \cdot Ω < 0$ (13) with odd (in direction $d \in {1, 2, 3}$ ) spherical harmonics as weighting functions and a stabilization step that is similar to the truncation of the series expansion in Equation(15)(15) $ψ_{α} (x, ϵ, Ω) \approx ψ_{α}^{P_{N}} (x, ϵ, Ω) = \sum_{l = 0}^{N} \sum_{k = - l}^{l} u_{α}^{l, k} (x, ϵ) Y_{l}^{k} (Ω)$ (15) . For a given boundary point $x \in \partial G$ with outward pointing boundary normal e_d we formulate boundary conditions for the moment vector $u_{α} = {({(u_{α}^{[e, d]})}^{T}, {(u_{α}^{[o, d]})}^{T})}^{T} .$ The boundary conditions are given by(22) $u_{α}^{[o, d]} = L_{(d)} {\hat{A}}_{(d)} u_{α}^{[e, d]} + g_{α}$ (22) (23) $L_{(d)} = 2 \int_{e_{d}^{T} Ω \geq 0} \frac{1}{Ω_{d}} ϒ_{N}^{[o, d]} {(ϒ_{N}^{[o, d]})}^{T} d Ω and g_{α} = \int_{e_{d}^{T} Ω \leq 0} ψ_{α, in} ϒ_{N}^{[o, d]} d Ω .$ (23)

These boundary conditions are compatible with the characteristics of the P_N Equationequations (17)(17) $\underset{= F_{α} (u_{α}, ρ)}{\underset{⏟}{- \partial_{ϵ} (S [ρ] (x, ϵ) u_{α} (x, ϵ)) + \sum_{d = 1}^{3} A_{(d)} \partial_{x_{d}} u_{α} (x, ϵ) - Q [ρ] (x, ϵ) u_{α} (x, ϵ)}} = 0 \forall x \in G, ϵ \in E$ (17) and allow us to assure energy stability, see Bünger, Sarna, and Torrilhon (Citation2022) for details. From Equation(12)(12) $ψ_{α} (x, ϵ = ϵ_{\max}, Ω) = ψ_{α, 0} (x, Ω) = 0.$ (12) we derive the following initial condition for the moments $u_{α} .$ (24) $u_{α} (x, ϵ = ϵ_{\max}) = \int_{S^{2}} ϒ_{N} (Ω) ψ_{α, 0} (x, Ω) d Ω = 0$ (24)

The P_N Equationequations (17)(17) $\underset{= F_{α} (u_{α}, ρ)}{\underset{⏟}{- \partial_{ϵ} (S [ρ] (x, ϵ) u_{α} (x, ϵ)) + \sum_{d = 1}^{3} A_{(d)} \partial_{x_{d}} u_{α} (x, ϵ) - Q [ρ] (x, ϵ) u_{α} (x, ϵ)}} = 0 \forall x \in G, ϵ \in E$ (17) describe the evolution of all ${(N + 1)}^{2}$ moments $u_{α}$ of $ψ_{α},$ whereas the intensity model Equation(8)(8) $I_{α} [ρ] (x) = \int_{E} σ_{α} (ϵ) \int_{S^{2}} \underset{= ψ_{α} (x, ϵ, Ω)}{\underset{⏟}{| v (ϵ) | f (x, ϵ, Ω)}} d Ω d ϵ$ (8) only requires the angular average of $ψ_{α}$ because we assumed the isotropic emission of x-rays. The spherical harmonic $Y_{0}^{0} (Ω)$ is constant, so we identify the moment $u_{α}^{0, 0} \propto \int_{S^{2}} ψ_{α} (x, ϵ, Ω) d Ω$ as part of our x-ray emission model and simplify the integral Equation(8)(8) $I_{α} [ρ] (x) = \int_{E} σ_{α} (ϵ) \int_{S^{2}} \underset{= ψ_{α} (x, ϵ, Ω)}{\underset{⏟}{| v (ϵ) | f (x, ϵ, Ω)}} d Ω d ϵ$ (8) to the following.(25) $I_{α} (x) = \int_{E} σ_{α} (ϵ) \int_{S^{2}} ψ_{α} (x, ϵ, Ω) d Ω d ϵ \propto \int_{E} σ_{α} (ϵ) u_{α}^{0, 0} (x, ϵ) d ϵ$ (25)

2.5. Mass concentration model

Typically, the quantities of interest in EPMA are weight fractions $ω_{i} (x) = \frac{M_{i} ( d x)}{M ( d x)}$ that form a partition of unity $\sum_{i = 1}^{n_{e}} ω_{i} (x) = 1 .$ In order to determine mass concentrations $ρ (x)$ from weight fractions $ω (x),$ additional assumptions are necessary. We neglect the influence of different molecular or crystal structures and assume that the compound ${ω_{1}, \dots, ω_{n_{e}}}$ is formed in a volume-preserving manner. Then the total volume is $V ( d x) = \sum_{i = 1}^{n_{e}} V_{i} ( d x) = \sum_{i = 1}^{n_{e}} \frac{M_{i} ( d x)}{ρ_{i}^{pure}} = \sum_{i = 1}^{n_{e}} \frac{ω_{i} (x) M ( d x)}{ρ_{i}^{pure}} .$ We derive the following relation between mass fractions and mass concentrations.(26) $ρ_{i} (x) = ω_{i} (x) \underset{: = ρ_{tot} (x)}{\underset{⏟}{{(\sum_{j = 1}^{n_{e}} \frac{ω_{j} (x)}{ρ_{j}^{pure}})}^{- 1}}}$ (26)

As a first possibility, we model weight fractions $ω_{i}^{p c} (x)$ as a piecewise-constant function. We subdivide the spatial domain G into $n_{k} \in N$ disjoint subdomains G_k and introduce the indicator function $1_{G_{k}} (x) .$ To guarantee the partition of unity we only use $n_{p} = n_{k} \times (n_{e} - 1)$ parameters $p_{i, k}^{p c}$ and define the parametrization(27) $ω_{i}^{p c} (x) = {\begin{matrix} \sum_{k = 1}^{n_{k}} p_{i, k}^{p c} 1_{G_{k}} (x) & i \in {1, \dots n_{e} - 1} \\ 1 - \sum_{k = 1}^{n_{k}} \sum_{j = 1}^{n_{e} - 1} p_{j, k}^{p c} 1_{G_{k}} (x) & i = n_{e} \end{matrix}$ (27) with the additional constraints, that $p_{i, k}^{p c} \geq 0$ and $\sum_{j = 1}^{n_{e} - 1} p_{j, k}^{p c} \leq 1 .$ Together with Equation(26)(26) $ρ_{i} (x) = ω_{i} (x) \underset{: = ρ_{tot} (x)}{\underset{⏟}{{(\sum_{j = 1}^{n_{e}} \frac{ω_{j} (x)}{ρ_{j}^{pure}})}^{- 1}}}$ (26) , Equation(27)(27) $ω_{i}^{p c} (x) = {\begin{matrix} \sum_{k = 1}^{n_{k}} p_{i, k}^{p c} 1_{G_{k}} (x) & i \in {1, \dots n_{e} - 1} \\ 1 - \sum_{k = 1}^{n_{k}} \sum_{j = 1}^{n_{e} - 1} p_{j, k}^{p c} 1_{G_{k}} (x) & i = n_{e} \end{matrix}$ (27) forms the piecewise-constant material parametrization $P_{p c} : P \to (G \to R^{n_{e}}) .$ In the examples, we will replace the weight fraction parametrization $ω^{p c} (x)$ to investigate other parametrizations $P (p) .$

3. Differentiation of the forward model

As described in the introduction, we consider the inverse problem as a minimization problem and apply iterative gradient-based optimization methods to find a candidate solution $p^{*} .$ (28) $p^{*} = \underset{p \in P}{arg \min} (J \circ k \circ P) (p)$ (28)

Gradient-based optimization methods require the gradient of the objective function $\frac{d J}{d p} .$ An efficient implementation of the gradient of our model, that simultaneously remains extensible and modular (particularly regarding the parametrization) is a challenge. While automatic algorithmic differentiation tools have the potential to transform any numerical code, the current versions of AD tools in julia (Zygote.jl, ReverseDiff.jl) cannot be applied to our model due to memory limitations and lack of performance.

In the following, we present a method, that adopts the structural implementation of derivative code of numerical code from algorithmic differentiation (AD). We focus on the adjoint mode of AD, since we are mainly interested in the gradient of the scalar objective function. The advantage of the adjoint mode is that it scales with the number of outputs, whereas with numerical differentiation (finite differences) or the tangent mode of AD, the complexity scales with the number of inputs. Also, the analytical derivation of our method provides a clear understanding that facilitates its memory-efficient implementation.

The core idea of AD is to base the derivative computation on local function transformations that are automatically composed. This also achieves modularity, i.e., if a particular function is modified, only the corresponding transformation needs to be adjusted.

3.1. Adjoint algorithmic differentiation

Algorithmic differentiation (AD) (Burghardt et al. Citation2022; Hüser Citation2022; Innes Citation2018; Kreyszig Citation1991; Naumann Citation2011) defines a framework to compute derivatives of numerical program code. In contrast to symbolic or numerical differentiation (finite differences), algorithmic differentiation is without approximation error and modular, so the code remains extensible. Also, the automatic differentiation of numerical code is possible. We review the basic definitions of AD with a (unusual) focus on high/infinite dimensional spaces. Alongside a motivating example, we describe the systematic composition into code that is able to compute derivatives.

Consider a differentiable function $f : U \subseteq X \to Y, x \mapsto f (x) = y,$ where X and Y are Hilbert spaces and U is the open domain of f. Given an input $x \in U,$ we define the tangent operator $\frac{\partial f (x)}{\partial x} [\cdot] : X \to Y$ as the linear bounded operator that satisfies(29) $\lim_{h \to 0} \frac{f (x + h \dot{x}) - f (x)}{h} = \frac{\partial f (x)}{\partial x} [\dot{x}] \forall \dot{x} \in X .$ (29)

Additionally, given a tangent $\dot{x} \in X$ of the input, we define the tangent $\dot{y}$ of the output by(30) $\dot{y} = \frac{\partial f (x)}{\partial x} [\dot{x}] .$ (30)

Given the tangent operator $\frac{\partial f (x)}{\partial x} [\cdot],$ the corresponding adjoint operator ${\frac{\partial f (x)}{\partial x}}^{*} [\cdot] : Y \to X$ is defined by(31) ${〈 \bar{y}, \frac{\partial f (x)}{\partial x} [\dot{x}] 〉}_{Y} = {〈 {\frac{\partial f (x)}{\partial x}}^{*} [\bar{y}], \dot{x} 〉}_{X} \forall \dot{x} \in X, \bar{y} \in Y$ (31) where ${〈 \cdot, \cdot 〉}_{X}$ and ${〈 \cdot, \cdot 〉}_{Y}$ are the respective inner products of X and Y. Similarly to the tangent, given an adjoint $\bar{y} \in Y$ of the output, we define the adjoint $\bar{x}$ of the input by(32) $\bar{x} = {\frac{\partial f (x)}{\partial x}}^{*} [\bar{y}] .$ (32)

Consequently, the identity ${〈 \bar{y}, \dot{y} 〉}_{Y} = {〈 \bar{x}, \dot{x} 〉}_{X}$ holds for all $\dot{x} \in X$ and $\bar{y} \in Y .$

In real, finite dimensional spaces (e.g., $X = R^{n}, Y = R^{m}$ ), the tangent operator can be represented by the Jacobian $D f (x) \in R^{m \times n}$ and the adjoint operator by its transpose $D f {(x)}^{T} \in R^{n \times m} .$ For examples in infinite dimensional (function) spaces, we refer to Sec. 3.2.

The core idea of AD is to subdivide a function into individual subfunctions (with individual tangent and adjoint operators) and intermediate variables (with intermediate tangents and adjoints). As an example, consider the following decomposition of $f : U \to Y, x \mapsto f (x) = y$ into two subfunctions $g : U \to V$ and $h : V \times U \to Y$ where we use an intermediate variable $v \in V$ (V is another Hilbert space).(33) $\begin{matrix} v & = g (x) \\ y & = h (v, x) \end{matrix}$ (33)

Given $x \in U, v \in V$ and $y \in Y$ as well as an input tangent $\dot{x} \in X,$ the tangent $\dot{y}$ of is given by(34) $\dot{y} = \frac{\partial f (x)}{\partial x} [\dot{x}] = \frac{\partial h (v, x)}{\partial v} [\underset{= \dot{v}}{\underset{⏟}{\frac{\partial g (x)}{\partial x} [\dot{x}]}}] + \frac{\partial h (v, x)}{\partial x} [\dot{x}]$ (34) where we employ the chain rule to split the tangent operator of f into a combination of tangent operators of g and h. Additionally, we identify the tangent $\dot{v}$ of the intermediate variable.

To find the corresponding adjoint operator, we successively use adjoint operators of the subfunctions g and h to isolate $\dot{x}$ in the inner product identity(35) $\begin{matrix} {〈 \bar{y}, \dot{y} 〉}_{Y} = {〈 \bar{y}, \frac{\partial h}{\partial v} [\dot{v}] + \frac{\partial h}{\partial x} [\dot{x}] 〉}_{Y} & = {〈 {\frac{\partial h}{\partial v}}^{*} [\bar{y}], \underset{= \dot{v}}{\underset{⏟}{\frac{\partial g}{\partial x} [\dot{x}]}} 〉}_{V} + {〈 {\frac{\partial h}{\partial x}}^{*} [\bar{y}], \dot{x} 〉}_{X} \\ = {〈 {\frac{\partial g}{\partial x}}^{*} [\underset{= \bar{v}}{\underset{⏟}{{\frac{\partial h}{\partial v}}^{*} [\bar{y}]}}] + {\frac{\partial h}{\partial x}}^{*} [\bar{y}], \dot{x} 〉}_{X} = {〈 \bar{x}, \dot{x} 〉}_{X} . \end{matrix}$ (35)

To summarize, given the adjoint $\bar{y} \in Y,$ the adjoints $\bar{v}$ and $\bar{x}$ are(36) $\bar{x} = {\frac{\partial g (x)}{\partial x}}^{*} [\underset{= \bar{v}}{\underset{⏟}{{\frac{\partial h (v, x)}{\partial v}}^{*} [\bar{y}]}}] + {\frac{\partial h (v, x)}{\partial x}}^{*} [\bar{y}] .$ (36)

Based on the chain rule, the composition of individual tangent and adjoint operators of a larger decomposition into an algorithm to compute the derivative is very systematic and can be implemented automatically. We describe the methodology using an arbitrary decomposition of a function $f : x \mapsto f (x) = y$ with n inputs and m outputs into subfunctions $ϕ_{i = 1, \dots I} .$ (37) $\begin{matrix} {(v_{0}, v_{- 1}, \dots, v_{- (n - 1)})}^{T} = x \\ v_{i} = ϕ_{i} ({(v_{j})}_{j ≺ i}) i = 1, \dots, I \\ y = {(v_{I}, v_{I - 1}, \dots v_{I - m})}^{T} . \end{matrix}$ (37)

With ${(v_{j})}_{j ≺ i}$ we denote all predecessors of v_i, i.e., all intermediate variables v_j that v_i directly depends on, and with ${(v_{j})}_{j ≻ i}$ we denote all successors of v_i. Tangent mode AD is the successive computation of tangents(38) ${\dot{v}}_{i} = \sum_{k ≺ i} \frac{\partial ϕ_{i} ({(v_{j})}_{j ≺ i})}{\partial v_{k}} [{\dot{v}}_{k}] .$ (38)

Given an input tangent $\dot{x},$ a tangent ${\dot{v}}_{i}$ represents the directional derivative of v_i into direction $\dot{x} .$ Note that the tangent mode follows the sequence of operations in the evaluation of the primal. In adjoint mode AD, the sequence of operations is reversed. To compute an adjoint ${\bar{v}}_{i},$ we consider the adjoints ${({\bar{v}}_{k})}_{k ≻ i}$ of all intermediate variables, that depend on v_i.(39) ${\bar{v}}_{i} = \sum_{k ≻ i} {\frac{\partial ϕ_{k} ({(v_{j})}_{j ≺ k})}{\partial v_{i}}}^{*} [{\bar{v}}_{k}] .$ (39)

Usually, in AD literature the single assignment code is considered, where all intermediate variables $v_{i} \in R$ are scalar, and the tangent operators (resp. multiplication with a scalar) is self-adjoint. Then the $\cdot^{*}$ can be neglected. For real multivariate intermediate variables the adjoint is the transpose $\cdot^{*} = \cdot^{T} .$

We briefly comment on some aspects of AD:

Computational Complexity: Implementations of tangent and adjoint mode AD differ in algorithmic complexity. The Jacobian of a function $f : R^{n} \to R^{m}$ is obtained using tangent mode AD by $j = 1, \dots, n$ successive executions of Equation(38)(38) ${\dot{v}}_{i} = \sum_{k ≺ i} \frac{\partial ϕ_{i} ({(v_{j})}_{j ≺ i})}{\partial v_{k}} [{\dot{v}}_{k}] .$ (38) where the input tangents are seeded with unit vectors $\dot{x} = e_{j} .$ Elements of the Jacobian are then retrieved from ${(D f (x))}_{i j} = {{\dot{y}}_{i} |}_{\dot{x} = e_{j}} .$ In contrast to the tangent mode, using adjoint mode AD the Jacobian is obtained by $i = 1, \dots, m$ successive applications of Equation(39)(39) ${\bar{v}}_{i} = \sum_{k ≻ i} {\frac{\partial ϕ_{k} ({(v_{j})}_{j ≺ k})}{\partial v_{i}}}^{*} [{\bar{v}}_{k}] .$ (39) where the output adjoints are seeded with unit vector $\bar{y} = e_{i} .$ Elements of the Jacobian are then retrieved from ${(D f (x))}_{i j} = {{\bar{x}}_{j} |}_{\bar{y} = e_{i}} .$ Hence, to compute the full Jacobian, an implementation of tangent mode scales in the number of input variables while the adjoint mode scales in the number of output variables.
Modularity and Extensibility: An implementation of either tangent or adjoint mode AD is modular and extensible. It consists of a systematic composition of individual tangent and adjoint operators. For example, changing one subfunction in Equation(37)(37) $\begin{matrix} {(v_{0}, v_{- 1}, \dots, v_{- (n - 1)})}^{T} = x \\ v_{i} = ϕ_{i} ({(v_{j})}_{j ≺ i}) i = 1, \dots, I \\ y = {(v_{I}, v_{I - 1}, \dots v_{I - m})}^{T} . \end{matrix}$ (37) only affects its corresponding tangent and adjoint operator in Equation(38)(38) ${\dot{v}}_{i} = \sum_{k ≺ i} \frac{\partial ϕ_{i} ({(v_{j})}_{j ≺ i})}{\partial v_{k}} [{\dot{v}}_{k}] .$ (38) resp. Equation(39)(39) ${\bar{v}}_{i} = \sum_{k ≻ i} {\frac{\partial ϕ_{k} ({(v_{j})}_{j ≺ k})}{\partial v_{i}}}^{*} [{\bar{v}}_{k}] .$ (39) . Implementations of the other operators remain.
Computational Graph: Both modes of AD can be visualized using a computational graph consisting of vertices for each variable (input, output, intermediate) and directed edges for direct dependence of variables. Assume a labeling of the edges with their corresponding tangent/adjoint operator applied to the tail(tangent)/head(adjoint) of the edge. The tangent is defined as the sum of all incoming edge tangent operators, while the adjoint is defined as the sum of all outgoing edge adjoint operators (c.f. Equations Equation(38)(38) ${\dot{v}}_{i} = \sum_{k ≺ i} \frac{\partial ϕ_{i} ({(v_{j})}_{j ≺ i})}{\partial v_{k}} [{\dot{v}}_{k}] .$ (38) and Equation(39)(39) ${\bar{v}}_{i} = \sum_{k ≻ i} {\frac{\partial ϕ_{k} ({(v_{j})}_{j ≺ k})}{\partial v_{i}}}^{*} [{\bar{v}}_{k}] .$ (39) ).
Checkpointing – Adjoint Memory Consumption: While the tangents can be implemented alongside the execution of the primal function and intermediate primal variables (x, v) are available, they must be kept in memory for the computation of adjoints. Storing all intermediate variables is expensive. Checkpointing remedies the high memory consumption by recomputing certain intermediate variables while computing the adjoint operators. In the example Equation(36)(36) $\bar{x} = {\frac{\partial g (x)}{\partial x}}^{*} [\underset{= \bar{v}}{\underset{⏟}{{\frac{\partial h (v, x)}{\partial v}}^{*} [\bar{y}]}}] + {\frac{\partial h (v, x)}{\partial x}}^{*} [\bar{y}] .$ (36) , the primal $v = g (x)$ would be recomputed while evaluating the adjoint of h.

Our model includes the special case, that the mapping from $ρ (x)$ to $u_{α} (x, ϵ)$ is given implicitly by the PDE in EquationEquation (17)(17) $\underset{= F_{α} (u_{α}, ρ)}{\underset{⏟}{- \partial_{ϵ} (S [ρ] (x, ϵ) u_{α} (x, ϵ)) + \sum_{d = 1}^{3} A_{(d)} \partial_{x_{d}} u_{α} (x, ϵ) - Q [ρ] (x, ϵ) u_{α} (x, ϵ)}} = 0 \forall x \in G, ϵ \in E$ (17) . We state the two basic steps of calculating adjoints in implicit relations and refer to the Appendix A for a more detailed derivation. Consider the implicit definition $F (y = g (x), x) = 0$ (with differentiable F(y, x)), where we assume that it uniquely defines the differentiable mapping $y = g (x) .$ We introduce an additional intermediate adjoint variable $\bar{F} .$ Then, given the adjoint $\bar{y}$ of the output, the adjoint $\bar{x}$ of the input can be computed from(40) ${\frac{\partial F}{\partial y}}^{*} [\bar{F}] = \bar{y} and \bar{x} = - {\frac{\partial F}{\partial x}}^{*} [\bar{F}]$ (40) without explicit definition of $\bar{x} = {\frac{\partial g (x)}{\partial x}}^{*} [\bar{y}] .$

3.2. Differentiation method for our model

In the following, we will successively define adjoint operators for the individual parts of the model described in Sec. 2. All derivations follow the same scheme: First, we define the tangent operator using the directional derivative and second we derive the adjoint operator from the inner product identity. We will elaborate on complex derivations and only state the adjoint operator for simple ones. In this paper, the only relevant derivatives are with respect to the mass concentrations ρ, hence all quantities that do not depend on ρ are treated as constants and their adjoints are omitted.

Compared to the forward computation, adjoints are computed in reversed order. We follow this behavior while deriving the adjoint operators for the individual dependencies. For brevity of notation we omit the accumulation of adjoints (tangents) and respectively only consider the operators that describe the direct propagation. The accumulation becomes apparent in the high-level computational graph in . If multiple edges leave a node, the adjoints of all outgoing edges have to be summed up.

$[{\bar{k}}_{α} = {\frac{\partial J}{\partial k_{α}}}^{*} \bar{J}] :$ The final operator Equation(1)(1) $p^{*} = \underset{p \in P}{arg \min} \underset{= (J \circ k \circ P) (p)}{\underset{⏟}{\sum_{α \in A} ((k_{α} \circ P) (p) - k_{α}^{\exp})^{2}}}$ (1) of our model computes the squared error J from a set of simulated k-ratios $k_{α}, α \in {α_{1}, \dots} .$ Given the adjoint $\bar{J},$ which for the computation of derivatives $\frac{\partial J}{\partial \cdot}$ must be chosen $\bar{J} = 1,$ the definition of the adjoint operator of the squared error from EquationEquation (1)(1) $p^{*} = \underset{p \in P}{arg \min} \underset{= (J \circ k \circ P) (p)}{\underset{⏟}{\sum_{α \in A} ((k_{α} \circ P) (p) - k_{α}^{\exp})^{2}}}$ (1) is trivial.(41) ${\bar{k}}_{α} = 2 (k_{α} - k_{α}^{\exp}) \bar{J}$ (41)
$[{\bar{I}}_{α} = {\frac{\partial k_{α}}{\partial I_{α}}}^{*} {\bar{k}}_{α}] :$ Also the adjoint operator for normalization with standard intensities in EquationEquation (4)(4) $k_{α} [ρ] = \frac{I_{α} [ρ]}{I_{α} [ρ_{α}^{std}]}$ (4) is derived by simple means.(42) ${\bar{I}}_{α} = \frac{{\bar{k}}_{α}}{I_{α}^{std}}$ (42)
$[{{\bar{A}}_{α}, {\bar{N}}_{α}, {\bar{I}}_{α}} = {\frac{\partial I_{α}}{\partial {A_{α}, N_{α}, I_{α}}}}^{*} {\bar{I}}_{α}] :$ We elaborate on the adjoint operators derived from EquationEquation (5)(5) $I_{α} [ρ] = \int_{G} A_{α} [ρ] (x) N_{α} [ρ] (x) I_{α} [ρ] (x) d x$ (5) . Linearity of the integral in Equation(5)(5) $I_{α} [ρ] = \int_{G} A_{α} [ρ] (x) N_{α} [ρ] (x) I_{α} [ρ] (x) d x$ (5) directly yields the tangent ${\dot{I}}_{α} .$ Using the standard definition of the inner products in $R, {〈 a, b 〉}_{R} = a b$ and in $L^{2} (G), {〈 a, b 〉}_{L^{2} (G)} = \int_{G} a^{T} (x) b (x) d x$ we derive adjoints ${\bar{A}}_{α}, {\bar{N}}_{α}$ and ${\bar{I}}_{α} .$ We detail ${\bar{A}}_{α} .$ (43) $\begin{matrix} {〈 {\bar{I}}_{α}, {\dot{I}}_{α} 〉}_{R} & = {〈 {\bar{I}}_{α}, \int_{G} N_{α} (x) I_{α} (x) {\dot{A}}_{α} (x) d x 〉}_{R} \\ = \int_{G} N_{α} (x) I_{α} (x) {\bar{I}}_{α} {\dot{A}}_{α} (x) d x \\ = {〈 \underset{: = {\bar{A}}_{α} (\cdot)}{\underset{⏟}{N_{α} (\cdot) I_{α} (\cdot) {\bar{I}}_{α}}}, {\dot{A}}_{α} (\cdot) 〉}_{L^{2} (G)} \end{matrix}$ (43)
Accordingly, the adjoints ${\bar{N}}_{α} (\cdot)$ and ${\bar{I}}_{α} (\cdot)$ are derived.(44) ${\bar{N}}_{α} (x) = A_{α} (x) I_{α} (x) {\bar{I}}_{α} and {\bar{I}}_{α} (x) = N_{α} (x) A_{α} (x) {\bar{I}}_{α}$ (44)
$[\bar{ρ} = {\frac{\partial N_{α}}{\partial ρ}}^{*} {\bar{N}}_{α}] :$ From EquationEquation (7)(7) $N_{α} [ρ] (x) = \frac{{ρ_{i} (x) |}_{Z_{i} \in α}}{A_{α}}$ (7) , we find the adjoint operator.(45) ${{\bar{ρ}}_{i} (x) |}_{Z_{i} \in α} = \frac{{\bar{N}}_{α} (x)}{A_{α}}$ (45)
$[\bar{ρ} = {\frac{\partial A_{α}}{\partial ρ}}^{*} {\bar{A}}_{α}] :$ Deriving the adjoint from the attenuation operator given in Equation(6)(6) $A_{α} [ρ] (x) = \exp (- \int_{l (x, x_{d})} μ_{α} (y) d y) = \exp (- \int_{l (x, x_{d})} \sum_{i = 1}^{n_{e}} τ_{α, i} ρ_{i} (y) d y)$ (6) requires more analysis. The tangent of the absorption operator ${\dot{A}}_{α} [ρ] (x)$ is given by the directional derivative into direction $\dot{ρ} (x) .$ (46) ${\dot{A}}_{α} [ρ] (x) = - A_{α} [ρ] (x) \int_{l (x, x_{d})} \sum_{i = 1}^{n_{e}} τ_{α, i} {\dot{ρ}}_{i} (y) d y$ (46)
By inserting the tangent ${\dot{A}}_{α}$ into the definition of the adjoint Equation(31)(31) ${〈 \bar{y}, \frac{\partial f (x)}{\partial x} [\dot{x}] 〉}_{Y} = {〈 {\frac{\partial f (x)}{\partial x}}^{*} [\bar{y}], \dot{x} 〉}_{X} \forall \dot{x} \in X, \bar{y} \in Y$ (31) , we derive the adjoint of the attenuation operator. In the following derivation, we rewrite the line integral from Equation(46)(46) ${\dot{A}}_{α} [ρ] (x) = - A_{α} [ρ] (x) \int_{l (x, x_{d})} \sum_{i = 1}^{n_{e}} τ_{α, i} {\dot{ρ}}_{i} (y) d y$ (46) as an integral over the whole space G by introducing $δ (d (x, x_{d}, y))$ to denote the Dirac, that indicates the line segment $l (x, x_{d}) .$ So $d (x, x_{d}, y)$ denotes the distance of y to the line segment $l (x, x_{d})$ between x and x_d.(47) $\begin{matrix} {〈 {\bar{A}}_{α}, {\dot{A}}_{α} 〉}_{L^{2} (G)} & = {〈 {\bar{A}}_{α} (\cdot), - A_{α} [ρ] (\cdot) \int_{l (\cdot, x_{d})} \sum_{i = 1}^{n_{e}} τ_{α, i} {\dot{ρ}}_{i} (y) d y 〉}_{L^{2} (G)} \\ = \int_{G} - {\bar{A}}_{α} (x) A_{α} [ρ] (x) \int_{l (x, x_{d})} \sum_{i = 1}^{n_{e}} τ_{α, i} {\dot{ρ}}_{i} (y) d y d x \\ = \int_{G} \int_{G} \sum_{i = 1}^{n_{e}} - {\bar{A}}_{α} (x) A_{α} [ρ] (x) τ_{α, i} {\dot{ρ}}_{i} (y) δ (d (x, x_{d}, y)) d y d x \\ = \int_{G} \sum_{i = 1}^{n_{e}} \int_{G} - {\bar{A}}_{α} (x) A_{α} [ρ] (x) τ_{α, i} δ (d (x, x_{d}, y)) d x {\dot{ρ}}_{i} (y) d y \\ = \int_{G} \sum_{i = 1}^{n_{e}} \int_{l (y, x_{d^{*}})} - {\bar{A}}_{α} (x) A_{α} [ρ] (x) τ_{α, i} d x {\dot{ρ}}_{i} (y) d y \\ = {〈 \int_{l (\cdot, x_{d^{*}})} - {\bar{A}}_{α} (x) A_{α} [ρ] (x) (\begin{matrix} τ_{α, 1} \\ ⋮ \\ τ_{α, n_{e}} \end{matrix}) d x, \dot{ρ} (\cdot) 〉}_{L^{2} (G)} \\ = {〈 \bar{ρ}, \dot{ρ} 〉}_{L^{2} (G)} \end{matrix}$ (47)
By swapping the order of integration $ d y d x$ to $ d x d y,$ we are able to separate ${\dot{ρ}}_{i},$ but the interpretation of $δ (d (x, x_{d}, y))$ changes. For a given y, $δ (d (x, x_{d}, y))$ now indicates the points x of all lines $l (x, x_{d})$ that contain y. In other words: $δ (d (x, x_{d}, y))$ indicates the line $l (y, x_{d^{*}})$ from y to the point reflection $x_{d^{*}} \in \partial G$ of $x_{d} \in \partial G$ at $x \in G .$ Exemplarily, we illustrate $l (x, x_{d}), l (x, x_{d^{*}})$ and $d (x, x_{d}, y)$ in . From Equation(47)(47) $\begin{matrix} {〈 {\bar{A}}_{α}, {\dot{A}}_{α} 〉}_{L^{2} (G)} & = {〈 {\bar{A}}_{α} (\cdot), - A_{α} [ρ] (\cdot) \int_{l (\cdot, x_{d})} \sum_{i = 1}^{n_{e}} τ_{α, i} {\dot{ρ}}_{i} (y) d y 〉}_{L^{2} (G)} \\ = \int_{G} - {\bar{A}}_{α} (x) A_{α} [ρ] (x) \int_{l (x, x_{d})} \sum_{i = 1}^{n_{e}} τ_{α, i} {\dot{ρ}}_{i} (y) d y d x \\ = \int_{G} \int_{G} \sum_{i = 1}^{n_{e}} - {\bar{A}}_{α} (x) A_{α} [ρ] (x) τ_{α, i} {\dot{ρ}}_{i} (y) δ (d (x, x_{d}, y)) d y d x \\ = \int_{G} \sum_{i = 1}^{n_{e}} \int_{G} - {\bar{A}}_{α} (x) A_{α} [ρ] (x) τ_{α, i} δ (d (x, x_{d}, y)) d x {\dot{ρ}}_{i} (y) d y \\ = \int_{G} \sum_{i = 1}^{n_{e}} \int_{l (y, x_{d^{*}})} - {\bar{A}}_{α} (x) A_{α} [ρ] (x) τ_{α, i} d x {\dot{ρ}}_{i} (y) d y \\ = {〈 \int_{l (\cdot, x_{d^{*}})} - {\bar{A}}_{α} (x) A_{α} [ρ] (x) (\begin{matrix} τ_{α, 1} \\ ⋮ \\ τ_{α, n_{e}} \end{matrix}) d x, \dot{ρ} (\cdot) 〉}_{L^{2} (G)} \\ = {〈 \bar{ρ}, \dot{ρ} 〉}_{L^{2} (G)} \end{matrix}$ (47) , we identify the adjoint operator of Equation(6)(6) $A_{α} [ρ] (x) = \exp (- \int_{l (x, x_{d})} μ_{α} (y) d y) = \exp (- \int_{l (x, x_{d})} \sum_{i = 1}^{n_{e}} τ_{α, i} ρ_{i} (y) d y)$ (6) .(48) ${\bar{ρ}}_{i} (x) = τ_{α, i} \int_{l (x, x_{d^{*}})} - {\bar{A}}_{α} (y) A_{α} [ρ] (y) d y$ (48) Where the integration domain $l (x, x_{d^{*}})$ is the line between x and the reflection $x_{d^{*}}$ of x_d at x.
$[{\bar{u}}_{α} = {\frac{\partial I_{α}}{\partial u_{α}}}^{*} {\bar{I}}_{α}] :$ In EquationEquation (25)(25) $I_{α} (x) = \int_{E} σ_{α} (ϵ) \int_{S^{2}} ψ_{α} (x, ϵ, Ω) d Ω d ϵ \propto \int_{E} σ_{α} (ϵ) u_{α}^{0, 0} (x, ϵ) d ϵ$ (25) , the only relevant moment for the ionization field $I_{α}$ is $u_{α}^{(0, 0)},$ hence the adjoint ${\bar{u}}_{α} .$ (49) ${\bar{u}}_{α}^{l, k} (x, ϵ) = {\begin{matrix} σ_{α} (ϵ) {\bar{I}}_{α} (x) & l, k = 0 \\ 0 & else \end{matrix}$ (49)
$[\bar{ρ} = {\frac{\partial u_{α}}{\partial ρ}}^{*} {\bar{u}}_{α}] :$ The relation between ρ and $u_{α}$ is implicitly given by the P_N equations and the boundary conditions, i.e., by the implicit PDE operator $F_{α} (u_{α}, ρ) = 0$ Equation(17)(17) $\underset{= F_{α} (u_{α}, ρ)}{\underset{⏟}{- \partial_{ϵ} (S [ρ] (x, ϵ) u_{α} (x, ϵ)) + \sum_{d = 1}^{3} A_{(d)} \partial_{x_{d}} u_{α} (x, ϵ) - Q [ρ] (x, ϵ) u_{α} (x, ϵ)}} = 0 \forall x \in G, ϵ \in E$ (17) . We describe the derivation of the adjoint operators Equation(40)(40) ${\frac{\partial F}{\partial y}}^{*} [\bar{F}] = \bar{y} and \bar{x} = - {\frac{\partial F}{\partial x}}^{*} [\bar{F}]$ (40) for the P_N equations. For a detailed derivation of Equation(40)(40) ${\frac{\partial F}{\partial y}}^{*} [\bar{F}] = \bar{y} and \bar{x} = - {\frac{\partial F}{\partial x}}^{*} [\bar{F}]$ (40) we refer to the Appendix A. Let us introduce the auxiliary adjoint ${\bar{F}}_{α}$ and derive the operators ${\frac{\partial F_{α}}{\partial u_{α}}}^{*} [\cdot]$ and ${\frac{\partial F_{α}}{\partial ρ}}^{*} [\cdot]$ in the following. Successively, both operators form the adjoint ${\frac{\partial u_{α}}{\partial ρ}}^{*} [\cdot] .$ In particular, we elaborate on the identities(50) ${〈 {\bar{F}}_{α}, \frac{\partial F}{\partial u_{α}} {\dot{u}}_{α} 〉}_{L^{2} (G \times E)} = {〈 \underset{= {\bar{u}}_{α}}{\underset{⏟}{{\frac{\partial F}{\partial u_{α}}}^{*} {\bar{F}}_{α}}}, {\dot{u}}_{α} 〉}_{L^{2} (G \times E)} and - {〈 {\bar{F}}_{α}, \frac{\partial F}{\partial ρ} \dot{ρ} 〉}_{L^{2} (G \times E)} = {〈 \underset{= \bar{ρ}}{\underset{⏟}{- {\frac{\partial F}{\partial ρ}}^{*} {\bar{F}}_{α}}}, \dot{ρ} 〉}_{L^{2} (G \times E)} .$ (50)

$[{\frac{\partial F_{α}}{\partial u_{α}}}^{*} {\bar{F}}_{α} = {\bar{u}}_{α}] :$ The adjoint operator ${\frac{\partial F_{α}}{\partial u_{α}}}^{*}$ applied to ${\bar{F}}_{α}$ describes the continuous adjoint equation.(51) $\underset{= {\frac{\partial F_{α}}{\partial u_{α}}}^{*} {\bar{F}}_{α}}{\underset{⏟}{S [ρ] (x, ϵ) \partial_{ϵ} {\bar{F}}_{α} (x, ϵ) - \sum_{d = 1}^{3} A_{(d)} \partial_{x_{i}} {\bar{F}}_{α} (x, ϵ) - Q [ρ] (x, ϵ) {\bar{F}}_{α} (x, ϵ)}} = {\bar{u}}_{α} (x, ϵ)$ (51) In analogy to the reversed application of adjoint operators, the adjoint equation describes the evolution of the adjoint ${\bar{F}}_{α}$ in reversed direction (w.r.t. the energy ϵ). Initial and boundary conditions follow from the derivation.(52) ${\bar{F}}_{α} (x, ϵ_{\min}) = 0$ (52) (53) ${\bar{F}}_{α}^{[o, d]} = - L_{(d)} {\hat{A}}_{(d)} {\bar{F}}_{α}^{[e, n]}$ (53)
Derivation Linearity of $F_{α}$ in $u_{α}$ directly yields the tangent $\frac{\partial F_{α}}{\partial u_{α}} {\dot{u}}_{α} .$ Using integration by parts and symmetry of $A_{(d)}$ and Q, we obtain(54) $\begin{array}{l} 〈 {\bar{F}}_{α}, \frac{\partial F_{α}}{\partial u_{α}} {\dot{u}}_{α} 〉 = \int_{E} \int_{G} - {(\partial_{ϵ} (S {\dot{u}}_{α}))}^{T} {\bar{F}}_{α} + \sum_{d = 1}^{3} {(A_{(d)} \partial_{x_{d}} {\dot{u}}_{α})}^{T} {\bar{F}}_{α} - {(Q {\dot{u}}_{α})}^{T} {\bar{F}}_{α} d x d ϵ \\ \begin{matrix} = \int_{E} \int_{G} - \partial_{ϵ} (S {\dot{u}}_{α}^{T} {\bar{F}}_{α}) + {\dot{u}}_{α}^{T} (S \partial_{ϵ} {\bar{F}}_{α}) \\ + \sum_{d = 1}^{3} \partial_{x_{d}} ({\dot{u}}_{α}^{T} A_{(d)} {\bar{F}}_{α}) - {\dot{u}}_{α}^{T} (A_{(d)} \partial_{x_{d}} {\bar{F}}_{α}) - {\dot{u}}_{α}^{T} (Q {\bar{F}}_{α}) d x d ϵ \end{matrix} \\ \begin{matrix} = \int_{E} \int_{G} {\dot{u}}_{α}^{T} (S \partial_{ϵ} {\bar{F}}_{α}) - \sum_{d = 1}^{3} {\dot{u}}_{α}^{T} (A_{(d)} \partial_{x_{d}} {\bar{F}}_{α}) - {\dot{u}}_{α}^{T} (Q {\bar{F}}_{α}) d x d ϵ \\ + \underset{= 0}{\underset{⏟}{\int_{E} \int_{G} - (\partial_{ϵ} (S {\dot{u}}_{α}^{T} {\bar{F}}_{α})) + \sum_{d = 1}^{3} \partial_{x_{d}} ({\dot{u}}_{α}^{T} A_{(d)} {\bar{F}}_{α}) d x d ϵ}} \end{matrix} \\ = 〈 {\frac{\partial F_{α}}{\partial u_{α}}}^{*} {\bar{F}}_{α}, {\dot{u}}_{α} 〉 = 〈 {\bar{u}}_{α}, {\dot{u}}_{α} 〉 \end{array}$ (54)
The last two terms in Equation(54)(54) $\begin{array}{l} 〈 {\bar{F}}_{α}, \frac{\partial F_{α}}{\partial u_{α}} {\dot{u}}_{α} 〉 = \int_{E} \int_{G} - {(\partial_{ϵ} (S {\dot{u}}_{α}))}^{T} {\bar{F}}_{α} + \sum_{d = 1}^{3} {(A_{(d)} \partial_{x_{d}} {\dot{u}}_{α})}^{T} {\bar{F}}_{α} - {(Q {\dot{u}}_{α})}^{T} {\bar{F}}_{α} d x d ϵ \\ \begin{matrix} = \int_{E} \int_{G} - \partial_{ϵ} (S {\dot{u}}_{α}^{T} {\bar{F}}_{α}) + {\dot{u}}_{α}^{T} (S \partial_{ϵ} {\bar{F}}_{α}) \\ + \sum_{d = 1}^{3} \partial_{x_{d}} ({\dot{u}}_{α}^{T} A_{(d)} {\bar{F}}_{α}) - {\dot{u}}_{α}^{T} (A_{(d)} \partial_{x_{d}} {\bar{F}}_{α}) - {\dot{u}}_{α}^{T} (Q {\bar{F}}_{α}) d x d ϵ \end{matrix} \\ \begin{matrix} = \int_{E} \int_{G} {\dot{u}}_{α}^{T} (S \partial_{ϵ} {\bar{F}}_{α}) - \sum_{d = 1}^{3} {\dot{u}}_{α}^{T} (A_{(d)} \partial_{x_{d}} {\bar{F}}_{α}) - {\dot{u}}_{α}^{T} (Q {\bar{F}}_{α}) d x d ϵ \\ + \underset{= 0}{\underset{⏟}{\int_{E} \int_{G} - (\partial_{ϵ} (S {\dot{u}}_{α}^{T} {\bar{F}}_{α})) + \sum_{d = 1}^{3} \partial_{x_{d}} ({\dot{u}}_{α}^{T} A_{(d)} {\bar{F}}_{α}) d x d ϵ}} \end{matrix} \\ = 〈 {\frac{\partial F_{α}}{\partial u_{α}}}^{*} {\bar{F}}_{α}, {\dot{u}}_{α} 〉 = 〈 {\bar{u}}_{α}, {\dot{u}}_{α} 〉 \end{array}$ (54) vanish due to the initial and boundary conditions imposed to ${\dot{u}}_{α}$ and ${\bar{F}}_{α} .$ The former term is zero because of the conditions ${\dot{u}}_{α} (\cdot, ϵ_{\max}) = 0$ and ${\bar{F}}_{α} (\cdot, ϵ_{\min}) = 0 .$ (55) $\int_{E} \int_{G} (\partial_{ϵ} (S {\dot{u}}_{α}^{T} {\bar{F}}_{α})) d x d ϵ = \int_{G} \underset{= 0 \forall {\dot{u}}_{α}}{\underset{⏟}{(S {\dot{u}}_{α}^{T} {\bar{F}}_{α}) |_{ϵ = ϵ_{\max}}}} - \underset{= 0 \forall {\bar{F}}_{α}}{\underset{⏟}{(S {\dot{u}}_{α}^{T} {\bar{F}}_{α}) |_{ϵ = ϵ_{\min}}}} d x = 0$ (55)
The latter term in Equation(54)(54) $\begin{array}{l} 〈 {\bar{F}}_{α}, \frac{\partial F_{α}}{\partial u_{α}} {\dot{u}}_{α} 〉 = \int_{E} \int_{G} - {(\partial_{ϵ} (S {\dot{u}}_{α}))}^{T} {\bar{F}}_{α} + \sum_{d = 1}^{3} {(A_{(d)} \partial_{x_{d}} {\dot{u}}_{α})}^{T} {\bar{F}}_{α} - {(Q {\dot{u}}_{α})}^{T} {\bar{F}}_{α} d x d ϵ \\ \begin{matrix} = \int_{E} \int_{G} - \partial_{ϵ} (S {\dot{u}}_{α}^{T} {\bar{F}}_{α}) + {\dot{u}}_{α}^{T} (S \partial_{ϵ} {\bar{F}}_{α}) \\ + \sum_{d = 1}^{3} \partial_{x_{d}} ({\dot{u}}_{α}^{T} A_{(d)} {\bar{F}}_{α}) - {\dot{u}}_{α}^{T} (A_{(d)} \partial_{x_{d}} {\bar{F}}_{α}) - {\dot{u}}_{α}^{T} (Q {\bar{F}}_{α}) d x d ϵ \end{matrix} \\ \begin{matrix} = \int_{E} \int_{G} {\dot{u}}_{α}^{T} (S \partial_{ϵ} {\bar{F}}_{α}) - \sum_{d = 1}^{3} {\dot{u}}_{α}^{T} (A_{(d)} \partial_{x_{d}} {\bar{F}}_{α}) - {\dot{u}}_{α}^{T} (Q {\bar{F}}_{α}) d x d ϵ \\ + \underset{= 0}{\underset{⏟}{\int_{E} \int_{G} - (\partial_{ϵ} (S {\dot{u}}_{α}^{T} {\bar{F}}_{α})) + \sum_{d = 1}^{3} \partial_{x_{d}} ({\dot{u}}_{α}^{T} A_{(d)} {\bar{F}}_{α}) d x d ϵ}} \end{matrix} \\ = 〈 {\frac{\partial F_{α}}{\partial u_{α}}}^{*} {\bar{F}}_{α}, {\dot{u}}_{α} 〉 = 〈 {\bar{u}}_{α}, {\dot{u}}_{α} 〉 \end{array}$ (54) vanishes due to the choice of the boundary conditions for ${\dot{u}}_{α}$ and ${\bar{F}}_{α} .$ With the divergence theorem ( $n = {(n_{1}, n_{2}, n_{3})}^{T}$ the boundary normal)(56) $\int_{G} \sum_{d = 1}^{3} \partial_{x_{d}} ({\dot{u}}_{α}^{T} A_{(d)} {\bar{F}}_{α}) d x = \int_{\partial G} \sum_{d = 1}^{3} ({\dot{u}}_{α}^{T} A_{(d)} {\bar{F}}_{α}) n_{d} d S,$ (56) and the boundary conditions ${\dot{u}}_{e}^{[o, d]} = L_{(d)} {\hat{A}}_{(d)} {\dot{u}}_{α}^{[e, d]}$ ( ${\dot{g}}_{α} = 0$ ) and ${\bar{F}}_{α}^{[o, d]} = - L_{(d)} {\hat{A}}_{(d)} {\bar{F}}_{α}^{[e, d]}$ the integrand in Equation(56)(56) $\int_{G} \sum_{d = 1}^{3} \partial_{x_{d}} ({\dot{u}}_{α}^{T} A_{(d)} {\bar{F}}_{α}) d x = \int_{\partial G} \sum_{d = 1}^{3} ({\dot{u}}_{α}^{T} A_{(d)} {\bar{F}}_{α}) n_{d} d S,$ (56) is zero because(57) $\begin{array}{l} {\dot{u}}_{α}^{T} A_{(d)} {\bar{F}}_{α} = {(\begin{matrix} {\dot{u}}_{α}^{[e, d]} \\ {\dot{u}}_{α}^{[o, d]} \end{matrix})}^{T} (\begin{matrix} 0 & {\hat{A}}_{(d)}^{T} \\ {\hat{A}}_{(d)} & 0 \end{matrix}) (\begin{matrix} {\bar{F}}_{α}^{[e, d]} \\ {\bar{F}}_{α}^{[o, d]} \end{matrix}) \\ = {({\dot{u}}_{α}^{[e, d]})}^{T} {\hat{A}}_{(d)}^{T} {\bar{F}}_{α}^{[o, d]} + {({\dot{u}}_{α}^{[o, d]})}^{T} {\hat{A}}_{(d)} {\bar{F}}_{α}^{[e, d]} \\ = - {({\dot{u}}_{α}^{[e, d]})}^{T} {\hat{A}}_{(d)}^{T} (L_{(d)} {\hat{A}}_{(d)} {\bar{F}}_{α}^{[e, d]}) + {(L_{(d)} {\hat{A}}_{(d)} {\dot{u}}_{α}^{[e, d]})}^{T} {\hat{A}}_{(d)} {\bar{F}}_{α}^{[e, d]} \\ = {({\dot{u}}_{α}^{[e, d]})}^{T} \underset{= 0}{\underset{⏟}{(- {\hat{A}}_{(d)}^{T} L_{(d)} {\hat{A}}_{(d)} + {\hat{A}}_{(d)}^{T} L_{(d)}^{T} {\hat{A}}_{(d)})}} {\bar{F}}_{α}^{[e, d]} = 0. \end{array}$ (57)
Using that $L_{(d)} = L_{(d)}^{T},$ defined in Equation(23)(23) $L_{(d)} = 2 \int_{e_{d}^{T} Ω \geq 0} \frac{1}{Ω_{d}} ϒ_{N}^{[o, d]} {(ϒ_{N}^{[o, d]})}^{T} d Ω and g_{α} = \int_{e_{d}^{T} Ω \leq 0} ψ_{α, in} ϒ_{N}^{[o, d]} d Ω .$ (23) , is symmetric.
$[\bar{ρ} = - {\frac{\partial F_{α}}{\partial ρ}}^{*} {\bar{F}}_{α}] :$ The tangent $\frac{\partial F_{α}}{\partial ρ} \dot{ρ}$ follows directly from linearity of $F_{α}$ (and S and Q) in ρ. For the adjoint we find(58) $\begin{matrix} - 〈 {\bar{F}}_{α}, \frac{\partial F_{α}}{\partial ρ} \dot{ρ} 〉 & = - {〈 {\bar{F}}_{α}, - \partial_{ϵ} (\sum_{i = 1}^{n_{e}} S_{i} {\dot{ρ}}_{i} u_{α}) - \sum_{i = 1}^{n_{e}} Q_{i} {\dot{ρ}}_{i} u_{α} 〉}_{L^{2} (G \times E)} \\ = - \int_{G} \sum_{i = 1}^{n_{e}} {\dot{ρ}}_{i} \int_{E} \sum_{l \leq N, | k | \leq l} {\bar{F}}_{α}^{l, k} (- \partial_{ϵ} (S_{i} u_{α}^{l, k}) - Q_{i} u_{α}^{l, k}) d ϵ d x \\ = {〈 \int_{E} \sum_{l \leq N, | k | \leq l} {\bar{F}}_{α}^{l, k} (\partial_{ϵ} ((\begin{matrix} S_{1} \\ \dots \\ S_{n_{e}} \end{matrix}) u_{α}^{l, k}) + (\begin{matrix} Q_{1} \\ \dots \\ Q_{n_{e}} \end{matrix}) u_{α}^{l, k}) d ϵ, \dot{ρ} 〉}_{L^{2} (G)} \\ = {〈 \bar{ρ}, \dot{ρ} 〉}_{L^{2} (G)}, \end{matrix}$ (58) and identify(59) ${\bar{ρ}}_{i} (x) = \int_{E} {\bar{F}}_{α}^{T} (x, ϵ) (\partial_{ϵ} (S_{i} (ϵ) u_{α} (x, ϵ)) + Q_{i} (ϵ) u_{α} (x, ϵ)) d ϵ .$ (59)
$[{\bar{ω}}^{p c} = {\frac{\partial ρ}{\partial ω}}^{*} \bar{ρ}] :$ The adjoint of the mass concentration model Equation(26)(26) $ρ_{i} (x) = ω_{i} (x) \underset{: = ρ_{tot} (x)}{\underset{⏟}{{(\sum_{j = 1}^{n_{e}} \frac{ω_{j} (x)}{ρ_{j}^{pure}})}^{- 1}}}$ (26) can again be derived by simple means.(60) ${\bar{ω}}_{i}^{p c} (x) = ρ_{tot} (x) \bar{ρ_{i}} (x) - \frac{ρ_{tot}^{2} (x)}{ρ_{i}^{pure}} \sum_{k = 1}^{n_{e}} ω_{k} (x) {\bar{ρ}}_{k} (x)$ (60)
$[\bar{p} = {\frac{\partial ω^{p c}}{\partial p}}^{*} {\bar{ω}}^{p c}] :$ The indicator functions in Equation(27)(27) $ω_{i}^{p c} (x) = {\begin{matrix} \sum_{k = 1}^{n_{k}} p_{i, k}^{p c} 1_{G_{k}} (x) & i \in {1, \dots n_{e} - 1} \\ 1 - \sum_{k = 1}^{n_{k}} \sum_{j = 1}^{n_{e} - 1} p_{j, k}^{p c} 1_{G_{k}} (x) & i = n_{e} \end{matrix}$ (27) lead to a partitioning of the integration domain G into G_k for the adjoints ${\bar{p}}_{i, k} .$ (61) ${\bar{p}}_{i, k} = \int_{G_{k}} {\bar{ω}}_{i}^{p c} (x) - {\bar{ω}}_{n_{e}}^{p c} (x) d x$ (61)

Figure 2. The abstract computational graph of our model. Straight edges (bottom to top) correspond to the primal function evaluation. Bend edges (top to bottom) correspond to the adjoint evaluation and are labeled with the respective adjoint operators. Dashed boxes highlight the individual model evaluation for every experimental setup $α \in A .$ Note that multiple adjoint edges entering a node correspond to the accumulation of the respective adjoint operators. This is not explicitly addressed in the derivation in section 3.2.

Figure 3. The absorption path $l (x, x_{d}) = {y | d (x, x_{d}, y) = 0}$ from x toward the detector position x_d (solid line) and the path used for the adjoint $l (x, x_{d^{*}}) = {y | d (y, x_{d}, x) = 0}$ (dotted line). G is the computational domain.

This completes the derivation of adjoint formulations for all parts of the model presented in Sec. 2. Note that for nonlinear operators, the primary result is required to calculate the adjoint, and therefore must be saved during the forward execution or recalculated. This is analogous to discrete adjoint tools. Furthermore, all derived operators are modular and only need to be assembled according to the computational graph. When replacing parts of the model, only the corresponding adjoint operator needs to be adjusted. In the examples in Sections 4.3 and 4.4, we will use this modularity and exchange the parametrizations $P .$

3.3. Computational effort

The motivation for deriving and implementing the adjoint formulation of our model is that of the efficiency advantage over finite differences and the tangent mode of AD. Our implementation inherits the runtime complexity of the adjoint mode, as the runtime of a Jacobian calculation scales in the number of output variables of the primal. The price to pay for faster execution is the higher memory requirement since all primal variables have to be stored during forward execution or recalculated later.

Leveraging the convenience of automated AD tools to derive the discrete adjoint code of our implementation was not possible with current tools in julia. However, our approach somewhat mimics the development approach that tools, e.g., Zygote.jl or ChainRulesCore.jl take. Instead of implementing only tangent and adjoint versions of low-level operations, they aim for efficient implementations of high-level methods. We adopt this approach and derive an adjoint version in the continuous framework and implement discretized versions of the operators in code.

The high-level view offers further insight, and one more possibility to reduce the computational effort for the gradient. Note that the naive adjoint implementation requires $| A |$ “solves” of the adjoint PDE (same amount as the forward). We separate the multi-index into $α = (α_{L}, α_{B}) \in A_{L} \times A_{B}$ where α_L describes the k-ratio line (material, transition) and α_B the beam setup (position, energy). We revisit EquationEquations (49) ${\bar{u}}_{α}^{l, k} (x, ϵ) = {\begin{matrix} σ_{α} (ϵ) {\bar{I}}_{α} (x) & l, k = 0 \\ 0 & else \end{matrix}$ (49) Equation(44)(44) ${\bar{N}}_{α} (x) = A_{α} (x) I_{α} (x) {\bar{I}}_{α} and {\bar{I}}_{α} (x) = N_{α} (x) A_{α} (x) {\bar{I}}_{α}$ (44) Equation(49)(49) ${\bar{u}}_{α}^{l, k} (x, ϵ) = {\begin{matrix} σ_{α} (ϵ) {\bar{I}}_{α} (x) & l, k = 0 \\ 0 & else \end{matrix}$ (49) and Equation(51)(51) $\underset{= {\frac{\partial F_{α}}{\partial u_{α}}}^{*} {\bar{F}}_{α}}{\underset{⏟}{S [ρ] (x, ϵ) \partial_{ϵ} {\bar{F}}_{α} (x, ϵ) - \sum_{d = 1}^{3} A_{(d)} \partial_{x_{i}} {\bar{F}}_{α} (x, ϵ) - Q [ρ] (x, ϵ) {\bar{F}}_{α} (x, ϵ)}} = {\bar{u}}_{α} (x, ϵ)$ (51) . We realize that the only dependence of all equations from α_B is due to the scalar ${\bar{I}}_{α} .$ Instead of computing ${\bar{I}}_{α}, {\bar{u}}_{α}$ and ${\bar{F}}_{α},$ we consider ${\tilde{I}}_{α_{L}} = \frac{{\bar{I}}_{α}}{{\bar{I}}_{α}}, {\tilde{u}}_{α_{L}} = \frac{{\bar{u}}_{α}}{{\bar{I}}_{α}}, {\tilde{F}}_{α_{L}} = \frac{{\bar{F}}_{α}}{{\bar{I}}_{α}}$ and eliminate thereby the dependence from the beam setup α_B. The operators for ${\tilde{I}}_{α_{L}}, {\tilde{u}}_{α_{L}}$ and ${\tilde{F}}_{α_{L}}$ do not depend on α_B. The accumulation of the adjoint ${\bar{ρ}}_{i}$ (cf. EquationEquation (59)(59) ${\bar{ρ}}_{i} (x) = \int_{E} {\bar{F}}_{α}^{T} (x, ϵ) (\partial_{ϵ} (S_{i} (ϵ) u_{α} (x, ϵ)) + Q_{i} (ϵ) u_{α} (x, ϵ)) d ϵ .$ (59) , where we sum over all incoming edges in the computational graph ), can then be rewritten to(62) $\begin{matrix} {\bar{ρ}}_{i} (x) & = \sum_{α \in A_{L} \times A_{B}} \int_{E} {\bar{F}}_{α}^{T} (x, ϵ) (- \partial_{ϵ} (S_{i} (ϵ) u_{α} (x, ϵ)) - Q_{i} (ϵ) u_{α} (x, ϵ)) d ϵ \\ = \sum_{α_{L} \in A_{L}} {\tilde{F}}_{α_{L}}^{T} (x, ϵ) \int_{E} \sum_{α_{B} \in A_{B}} {\bar{I}}_{α} (- \partial_{ϵ} (S_{i} (ϵ) u_{α} (x, ϵ)) - Q_{i} (ϵ) u_{α} (x, ϵ)) d ϵ . \end{matrix}$ (62)

Instead of solving the adjoint PDE for every ${\bar{F}}_{α}$ it is sufficient to compute ${\tilde{F}}_{α_{L}}$ for every α_L. The number of expensive adjoint PDE solves that are required is reduced from $| A |$ to $| A_{L} | .$

4. Numerical examples

Preliminary results for the examples presented in section 4.3 and 4.4 can be found in (Achuda et al. Citation2023).

4.1. Implementation and discretization

Conceptually the discretization of the P_N-equation Equation(17)(17) $\underset{= F_{α} (u_{α}, ρ)}{\underset{⏟}{- \partial_{ϵ} (S [ρ] (x, ϵ) u_{α} (x, ϵ)) + \sum_{d = 1}^{3} A_{(d)} \partial_{x_{d}} u_{α} (x, ϵ) - Q [ρ] (x, ϵ) u_{α} (x, ϵ)}} = 0 \forall x \in G, ϵ \in E$ (17) and the adjoint P_N-equation Equation(51)(51) $\underset{= {\frac{\partial F_{α}}{\partial u_{α}}}^{*} {\bar{F}}_{α}}{\underset{⏟}{S [ρ] (x, ϵ) \partial_{ϵ} {\bar{F}}_{α} (x, ϵ) - \sum_{d = 1}^{3} A_{(d)} \partial_{x_{i}} {\bar{F}}_{α} (x, ϵ) - Q [ρ] (x, ϵ) {\bar{F}}_{α} (x, ϵ)}} = {\bar{u}}_{α} (x, ϵ)$ (51) are the most challenging. For the numerical computation of the moments $u_{α}$ and ${\bar{F}}_{α}$ we implemented a 3D staggered grid finite-difference method based on the method StaRMAP presented in (Seibold and Frank Citation2014). StaRMAP is specifically designed to exploit the structure of P_N-equations. Individual sets of moments $u_{α}$ are discretized on the staggered grids in such a way, that sparsity of the transport matrices $A_{(d)}$ decouples the system and enables the use of second-order central differences that approximate the spatial derivative on the respective displaced grid. Second-order integration in energy is achieved using a splitting of the moments and half steps for each part of the solution.

The boundary conditions in Equation(22)(22) $u_{α}^{[o, d]} = L_{(d)} {\hat{A}}_{(d)} u_{α}^{[e, d]} + g_{α}$ (22) and Equation(53)(53) ${\bar{F}}_{α}^{[o, d]} = - L_{(d)} {\hat{A}}_{(d)} {\bar{F}}_{α}^{[e, n]}$ (53) are used to compute moments on ghost nodes, located outside the computational domain G. Values of the moments for ghost nodes are set, such that the boundary condition holds for interpolated moments on the boundary of the domain $\partial G .$ For details see Bünger (Citation2021).

For integrations in energy and space as well as the line integral for the absorption we use the trapezoidal rule.

4.2. Comparison: sensitivities of a 1D material with finite differences

We consider a binary material consisting of copper Cu and nickel Ni and compare sensitivities $\frac{\partial I_{α}}{\partial p}$ computed using our implementation (resp. $\bar{p}$ where ${\bar{I}}_{α}$ is seeded with unit vectors) with sensitivities computed by central finite differences. By means of this comparison of our implementation to finite differences we address several questions at the same time:

By introducing a second parametrization, we exemplarily show the modularity of our implementation.
We provide insight into the nature of the model. The sensitivity $\frac{\partial I_{α}}{\partial p}$ is the linear approximation of intensities with respect to model parameters, hence a large sensitivity relates to a strong dependency and a small sensitivity to a weak dependency. This is particularly interesting for the inverse problem because it also unveils possible difficulties in the reconstruction.
Agreement with finite differences validates our implementation.
We demonstrate the claimed superiority of our method compared to finite differences in terms of computation time.

We introduce an alternative parametrization, where $Λ_{k} (x), k \in {1, \dots n_{k}}$ are triangular functions covering the domain G, in particular $\sum_{k = 1}^{n_{k}} Λ_{k} (x) = 1 \forall x \in G .$ The partition of unity $\sum_{i = 1}^{n_{e}} ω_{i} (x) = 1$ is (again) guaranteed by $p_{i, k}^{pl} \geq 0$ and $\sum_{j = 1}^{n_{e} - 1} p_{j, k}^{pl} \leq 1 .$ We refer to the parametrization as piecewise-linear.(63) $ω_{i}^{pl} (x) = {\begin{matrix} \sum_{k = 1}^{n_{k}} p_{i, k}^{pl} Λ_{k} (x) & i \in {1, \dots n_{e} - 1} \\ 1 - \sum_{k = 1}^{n_{k}} \sum_{j = 1}^{n_{e} - 1} p_{j, k}^{pl} Λ_{k} (x) & i = n_{e} \end{matrix}$ (63)

The adjoint of the piecewise-linear parametrization ${\bar{p}}^{pl}$ is given by(64) ${\bar{p}}_{i, k}^{pl} = \int_{G} Λ_{k} (x) ({\bar{ω}}_{i}^{pl} (x) - {\bar{ω}}_{n_{e}}^{pl} (x)) d x .$ (64)

Modularity is set out by the fact, that only the parametrization and its adjoint are implemented, the rest of the code remains unchanged.

Now both parametrizations, the piecewise-constant parametrization (with n_k = 20 parameters) and the piecewise-linear parametrization (also with n_k = 20 parameters) are employed to compute sensitivities $\frac{\partial I_{α}}{\partial p^{pc}}$ resp. $\frac{\partial I_{α}}{\partial p^{pl}} .$ The physical setup and the settings of the P_N approximation for this example can be found in . Parameter values for the parametrizations are sampled independently from a uniform distribution in $[0, 1] .$

Table 1. Model settings used for the comparison of 1D sensitivities.

Display Table

We visualize the sensitivities by plotting $ω_{1}^{pc} (x) |_{p = \frac{\partial I_{α}}{\partial p^{pc}}}$ and $ω_{1}^{pl} (x) |_{p = \frac{\partial I_{α}}{\partial p^{pl}}}$ in . Using the parametrizations $ω (x)$ is gentle abuse but since the parameters in $ω^{pc}$ and $ω^{pl}$ are related to space points, an interpretation of the plot is possible. Increasing a parameter, increases the mass concentration of Cu and simultaneously decreases the mass concentration of Ni, hence the positive values for Cu x-rays and the negative value for Ni x-rays. In depth the sensitivity decreases because with greater depth the generation of x-rays decreases and the absorption increases.

Figure 4. A visualization of the sensitivities computed using our method and finite differences. We abuse the parametrization function of each parametrization and plot $ω_{1} (x) |_{p = \frac{\partial I_{α}}{\partial p}} .$ The curves computed using our method (solid lines) coincide with the curves computed using finite differences (black dashed lines). A quantitative comparison of sensitivity values is given in .

also compares sensitivities computed by our method (colored, solid lines) with the sensitivities computed by central finite differences (black, dashed lines). The curves coincide. Additionally, we compare relative errors (65) $\max_{k} | \frac{1}{\frac{\partial I_{α}}{\partial p_{k}}} (\frac{\partial I_{α}}{\partial p_{k}} - \frac{I_{α} (p + h e_{k}) - I_{α} (p - h e_{k})}{2 h}) |$ (65) of sensitivities in . Derivatives computed by algorithmic differentiation are exact (AD computes the exact derivative of the numerical approximation up to machine precision) (Naumann Citation2011). Our method as well as finite differences only provide approximations of the derivative. For our method, the underlying approximation of the adjoint state variable introduces errors, for finite differences the approximation error originates from the perturbation h.

Table 2. Relative errors in the sensitivities/derivatives of intensities for multiple x-rays α_L and parametrization methods $ω (x) .$ Sensitivities/derivatives are computed using our implementation and using a central finite difference approximation of order 2.

Display Table

In , the superiority of our method in terms of computation time compared to finite differences becomes apparent. We compare the runtime of a central finite difference method (2nd order) with our implementation for two different numbers of parameters (n_p = 20 and n_p = 40). The runtime of the forward model is $\approx 0.5 s,$ and the computation of the derivative with finite differences clearly scales with the number of parameters n_p. The runtime of our implementation scales in the number of outputs (here $| A | = 4$ ) and clearly does not scale with n_p. Note the minor overhead because of the backward execution.

Table 3. Runtimes of the computation of sensitivities/derivatives using our implementation and a central finite difference approximation of order 2 (FiniteDiff.jl).

Display Table

Table 4. Model settings used for the reconstruction of the 1D interfaces.

Display Table

4.3. 1D reconstruction of a sharp and a diffusive interface

We present the reconstruction of a coated material consisting of iron Fe and nickel Ni with a sharp (discontinuous) and a diffusive (continuous) interface between the layers.

We show the necessity of choosing a suitable parametrization.
We compare the previously defined parametrizations (Equation(27)(27) $ω_{i}^{p c} (x) = {\begin{matrix} \sum_{k = 1}^{n_{k}} p_{i, k}^{p c} 1_{G_{k}} (x) & i \in {1, \dots n_{e} - 1} \\ 1 - \sum_{k = 1}^{n_{k}} \sum_{j = 1}^{n_{e} - 1} p_{j, k}^{p c} 1_{G_{k}} (x) & i = n_{e} \end{matrix}$ (27) and Equation(63)(63) $ω_{i}^{pl} (x) = {\begin{matrix} \sum_{k = 1}^{n_{k}} p_{i, k}^{pl} Λ_{k} (x) & i \in {1, \dots n_{e} - 1} \\ 1 - \sum_{k = 1}^{n_{k}} \sum_{j = 1}^{n_{e} - 1} p_{j, k}^{pl} Λ_{k} (x) & i = n_{e} \end{matrix}$ (63) ) to an additional non-linear parametrization Equation(66)

We introduce another parametrization that is based on the non-linear structure of a neural network (with x as the input and $ω^{nl}$ as the output. The activation function for one hidden layer is $\tanh ()$ and $sig ()$ for the output layer to enforce $0 \leq ω_{i}^{nl} \leq 1$ ). The parametrization uses n_p = 10 parameters, which we call $p^{nl} = {(a_{1}, a_{2}, a_{3}, b_{1}, b_{2}, b_{3}, {\hat{a}}_{1}, {\hat{a}}_{2}, {\hat{a}}_{3}, \hat{b})}^{T} .$ (66a) $v_{i} (x) = \tanh (\underset{w_{i} (x)}{\underset{⏟}{a_{i} x + b_{i}}}) \forall i = 1, 2, 3$ (66a) (66b) $ω_{1}^{nl} (x) = sig (\underset{= \hat{w} (x)}{\underset{⏟}{\sum_{i = 1}^{3} {\hat{a}}_{i} v_{i} (x) + \hat{b}}})$ (66b) (66c) $ω_{2}^{nl} (x) = 1 - ω_{1}^{nl} (x)$ (66c)

The adjoints for the non-linear parametrization are given in the following.(67a) ${\hat{\bar{a}}}_{i} = \int_{G} sig' (\hat{w} (x)) v_{i} (x) ({\bar{ω}}_{1}^{nl} (x) - {\bar{ω}}_{2}^{nl} (x)) d x$ (67a) (67b) $\hat{\bar{b}} = \int_{G} sig' (\hat{w} (x)) ({\bar{ω}}_{1}^{nl} (x) - {\bar{ω}}_{2}^{nl} (x)) d x$ (67b) (67c) ${\bar{v}}_{i} (x) = sig' (\hat{w} (x)) {\hat{a}}_{i} ({\bar{ω}}_{1}^{nl} (x) - {\bar{ω}}_{2}^{nl} (x))$ (67c) (67d) ${\bar{a}}_{i} = \int_{G} \tanh' (w_{i} (x)) x {\bar{v}}_{i} (x) d x$ (67d) (67e) ${\bar{b}}_{i} = \int_{G} \tanh' (w_{i} (x)) {\bar{v}}_{i} (x) d x$ (67e)

For a fair comparison, the number of parameters in the other parametrizations is also chosen to be 10. For the binary material the scalar $ω_{1} (x)$ completely describes the material, because $ω_{2} (x) = 1 - ω_{1} (x) .$

The artificial measurements for the reconstruction are k-ratios from multiple different beam energies between $9 keV$ and $15 keV .$ To compute the measurements we assume the reference material to be given. Additional P_N model settings and the considered x-rays are tabulated in . We choose the BFGS method implemented in Optim.jl (Mogensen and Riseth Citation2018) as optimization method (with additional box-constraints for the piecewise-constant and the piecewise-linear parametrization, and no constraints for the non-linear parametrization).

In , the reconstructed density of the material with a sharp interface is visualized using the different parametrizations. From initial configurations (piecewise-constant, piecewise-linear: 0.5 Fe and 0.5 Ni; non-linear: random) the parametrizations iterate toward the reference density (black line) during the optimization. We visualize the initial, first, fifth iteration and the 100th iteration. The inability of the piecewise-linear parametrization to represent discontinuities becomes apparent. However, the piecewise-constant parametrization only approximates perfectly because one of the subdomain interfaces matches the interface of the reference material. On the other hand, the non-linear parametrization is flexible enough to identify the location of the interface and to approximate the discontinuity.

Figure 5. The reconstructed density of a material with a sharp interface between an iron Fe layer covering an Ni substrate. We use the piecewise-constant, the piecewise-linear and the non-linear parametrization for the reconstruction. For the piecewise-constant and the piecewise-linear parametrization, the geometry (interfaces) are visualized by gray vertical lines. The black line shows the reference density. All parametrizations converge. The piecewise-constant parametrization has a clear advantage because the interface aligns with an interface of the parametrization. The piecewise-linear parametrization cannot approximate the discontinuous interface, hence it converges to presented smoothed representation. The non-linear parametrization is flexible enough to identify the location of the interface and also approximates the discontinuity using a strong gradient.

In , the reconstructed density of the material with a diffusive interface is visualized. Except for the reference material, exactly the same settings as for the reconstruction of the sharp interface are used. Again, the parametrizations iterate toward the reference density (black line). For the diffusive interface, none of the parametrizations can reconstruct the interface perfectly, but the piecewise-linear and the non-linear parametrizations perform better than the piecewise-constant parametrization.

Figure 6. The reconstructed density of a material with a diffusive interface between an iron Fe layer covering an Ni substrate. We use the piecewise-constant, the piecewise-linear and the non-linear parametrization for the reconstruction. For the piecewise-constant and the piecewise-linear parametrization, the geometry (interfaces) are visualized by gray vertical lines. The black line shows the reference density. All parametrizations converge. None of the parametrizations can represent the interface perfectly. The piecewise-linear and the non-linear parametrization perform better for this example, because of their flexibility to approximate continuous functions.

In , we visualize the normalized error, the value of the objective function $| | k (p) - k^{\exp} | |^{2},$ the error of the mass concentrations $| | ρ (x) - ρ^{true} | |^{2}$ and the error of additional k-ratio measurements $| | k_{+} (p) - k_{+}^{\exp} | |^{2}$ (with beam energies $11 keV$ and $14 keV$ ) during the iterations of the optimization. The kinks in the error of the objective function are an artifact of the Optim.jl implementation. Their default implementation is based on the combination of a line search method (Hager and Zhang Citation2005) and the BFGS algorithm presented in (Nocedal and Wright Citation2006).

Figure 7. The value of the objective function and additional reconstruction quality measures during the iterations of the optimization for all parametrizations. The objective function $| | k (p) - k^{\exp} | |^{2}$ is visualized as a solid line. Also, the error in the mass concentrations $| | ρ (x) - ρ^{true} (x) | |^{2}$ (dashed line) and the error of k-ratios $| | k_{+} (p) - k_{+}^{\exp} | |^{2}$ (dotted line) from additional beam energies ( $11 keV$ and $14 keV$ ) is plotted.

Differences in the performance of the different parametrizations are clearly visible. The error in the mass concentrations for the piecewise-constant parametrization for the sharp interface is by far the smallest, due to the possible perfect reconstruction. Only the non-linear parametrization performs similarly well for both examples.

Comparing the error in mass concentrations for piecewise-constant and piecewise-linear parametrization for the sharp and the diffusive interface, they vary. A correlation between the error in mass concentrations and the error of additional k-ratios would be beneficial but is hard to obtain visually. Ideally, both would behave similarly, because then we could propose the error in additional k-ratios as a suitable reconstruction quality measure. However, the evaluation of reconstruction quality measures is beyond the scope of this work and is deferred to further research.

4.4. 2D reconstruction of an ellipsoidal inclusion

We describe the use case of a reconstruction of an ellipsoidal copper Cu inclusion in an iron Fe substrate from k-ratios obtained from a $12 keV$ electron beam and multiple beam positions. In , we plot k-ratios obtained from an artificial line-scan of the inclusion. Only the ten k-ratios retrieved from five beam positions $μ_{y} = {- 50, - 25, 0, 25, 50} nm,$ which are marked by black crosses, are used for the reconstruction. To compute the k-ratios, the reference material is assumed to be given. We visualize the total density $ρ_{tot}$ of the reference material in and additionally show the ionization distributions of $(C u, K - L_{2})$ and $(F e, K - L_{2})$ that define the size of the interaction volume. The ellipsoidal Cu inclusion is clearly smaller than the interaction volumes. Additional settings of the k-ratio model are shown in .

Figure 8. The k-ratio profile of the elliptical Cu inclusion in the Fe substrate computed using a $12 keV$ electron beam. Dashed lines show the k-ratio profiles of $(C u, K - L_{2})$ (blue) and $(F e, K - L_{2})$ (orange) with a high beam resolution. Black markers show the k-ratios which are used for the reconstruction. While not directly visible, the shape and height of the k-ratio curves encode information about the shape and location of the inclusion and the structure of the interface between inclusion and substrate.

Figure 9. The total density and two ionization distribution fields of the reference material. The ionization distribution curves determine the size of the interaction volume. The size of the ellipsoidal Cu inclusion is smaller than the interaction volume.

Table 5. Model settings used for the 2D reconstruction of the ellipsoidal Cu inclusion.

Display Table

The profile shows an increase in the $(C u, K - L_{2})$ k-ratio for a beam position close to $\approx 25 nm .$ We assume an elliptical Cu inclusion which we parametrize by $p = {(μ_{1}, μ_{2}, a, b, r, \hat{a}, \hat{b})}^{T},$ where μ₁ and μ₂ specify position, a and b specify scaling factors of the principal axes, r specifies rotation and $\hat{a}$ and $\hat{b}$ specify the inclusion interface. The weight fractions $ω (x)$ are given by the following.(68a) $v (x) = \frac{{((x_{1} - μ_{1}) \cos (r) + (x_{2} - μ_{2}) \sin (r))}^{2}}{a^{2}} + \frac{{((x_{1} - μ_{1}) \sin (r) - (x_{2} - μ_{2}) \cos (r))}^{2}}{b^{2}}$ (68a) (68b) $ω_{1} (x) = sig (\underset{\hat{w} (x)}{\underset{⏟}{\hat{a} v (x) + \hat{b}}})$ (68b) (68c) $ω_{2} (x) = 1 - ω_{1} (x)$ (68c)

And their adjoint(69a) $\hat{\bar{a}} = \int_{G} sig' (\hat{w} (x)) v (x) ({\bar{ω}}_{1} (x) - {\bar{ω}}_{2} (x)) d x$ (69a) (69b) $\hat{\bar{b}} = \int_{G} sig' (\hat{w} (x)) ({\bar{ω}}_{1} (x) - {\bar{ω}}_{2} (x)) d x$ (69b) (69c) $\bar{v} (x) = sig' (\hat{w} (x)) \hat{a} ({\bar{ω}}_{1} (x) - {\bar{ω}}_{2} (x))$ (69c) (69d) ${{\bar{μ}}_{1}, {\bar{μ}}_{2}, \bar{a}, \bar{b}, \bar{r}} = \int_{G} \frac{\partial v (x)}{\partial {μ_{1}, μ_{2}, a, b, r}} \bar{v} (x) d x$ (69d)

For the reconstruction, we use the L-BFGS algorithm implemented in Optim.jl (Mogensen and Riseth Citation2018). The total density of the material $ρ_{tot}$ during the iteration steps is visualized in . From the initial guess (educated guess: higher copper concentration at $x 30 nm,$ see (a)) the reconstruction converges to the reference material (see (a)). Alongside the total density of the reconstructed material, we visualize the k-ratio profiles of $(C u, K - L_{2})$ and $(F e, K - L_{2})$ in . The shapes of the curves quickly coincide with the measured k-ratios (black crosses). At iteration 30 the measured k-ratios are already well approximated, the total density, however, is not. For sufficient reconstruction of the ellipsoidal structure, the error in the k-ratios (the objective function) must be further reduced. In , we show the value of the objective function $| | k (p) - k^{\exp} | |^{2}$ and the error in mass concentrations $| | ρ (p) - ρ^{true} | |^{2}$ during the iterations of the optimization. For the first 50 iterations, the value of the objective function decreases, whereas the error in the mass concentrations remains close to the initial error. Only after the first 50 iterations, the error in the mass concentrations also decreases. In , this effect is also observed, where only after iteration 90 the shape of the reference ellipse starts to form.

Figure 10. The reconstructed density $ρ_{tot}$ of the Fe material with a Cu inclusion during the iterations of the L-BFGS optimization.

Figure 11. The k-ratio profile of $(C u, K - L_{2})$ and $(F e, K - L_{2})$ during the iterations of the reconstruction. Black crosses mark the measured k-ratios which are considered in the objective function $| | k (p) - k^{\exp} | |^{2}$ (cf. ). The visualized iterations are the same as in . Note that the k-ratios quickly converge to the measured values, although the shape of the ellipse does not yet coincide with the shape of the ellipse in the true material .

Figure 12. Errors of the objective function $| | k (p) - k^{\exp} | |^{2}$ and the error of the mass concentrations $| | ρ (p) - ρ^{true} | |^{2}$ during the iterations of the L-BFGS optimization. Although the objective function significantly decreases for the first 50 iterations, the error of the mass concentrations remains close to the initial error. The same behavior is noticed in and .

5. Conclusion

In this paper, we derive an adjoint approach to compute derivatives for a deterministic k-ratio model to be applied in reconstruction in EPMA. The k-ratio model is based on the P_N-model, a moment expansion of the continuous slowing down approximation of the linear Boltzmann equation that is pioneered for EPMA in Bünger, Richter, and Torrilhon (Citation2022). We formulate the model with a focus on the inverse problem and describe the application of the model within an adjoint algorithmic differentiation framework. This work thus represents a building block in the development of a high-resolution reconstruction method that uses gradient-based optimization techniques. By replacing material parametrizations, experiments can be realized that reconstruct different material parameters and take into account different prior knowledge.

We conclude the paper with a validation of our differentiation method and reconstruction experiments. The experiments show the influence of the material parametrization in the reconstruction and illustrate our idea of an application of our method. The combination of measurements with prior knowledge makes high-resolution reconstruction problems tractable with a reasonable amount of measurement data. At the same time, we demonstrate the extensibility of our implementation to various material structures.

High-resolution imaging, i.e., the reconstruction of material structures that are smaller than the interaction volume, is possible with sufficient data or sufficient material assumptions. But there will always be a balance between the accuracy of the measurement and the model to the accuracy of the reconstruction. Especially for high resolution, the analysis of this relation is key. To achieve a reliable reconstruction result, the quantification and propagation of measurement and model uncertainties are necessary.

Data/code availability

The source code accompanying this article is made available via GitHub: https://github.com/tam724/pnepma.

Disclosure statement

The authors declare no conflict of interest.

Additional information

Funding

The authors gratefully acknowledge funding from the German Research Foundation (DFG) under grant number TO 414/7-1 as well as additional funding contributed under the excellence strategy of the German federal and state governments.

Notes

1 In IUPAC notation (Jenkins et al. Citation1991), an x-ray level corresponds to an electron configuration that is typically described by the removal of an electron from the configuration of neutral ground state (e.g., K, L₂). A characteristic x-ray is emitted by transitioning from an initial to a final x-ray level, namely the x-ray transition (e.g.,

K - L_{2}

2 In the literature the mass attenuation coefficient is commonly denoted using ${(μ / ρ)}_{α, i} .$ It depends on the medium i and the x-ray wavelength α. For brevity of notation, we write $τ_{α, i} .$

3 For x-ray transitions α_L where the initial state is K, the x-ray production cross-section can be replaced by the ionization cross-section of the K shell (k-ratio normalization eliminates the fluorescence yield). For other transitions, e.g. L or M as the final state, the x-ray production cross-section should include the ionization cross-section of all lower shells weighted by transition probabilities.

4 For brevity of notation we divide both quantities by the respective atomic masses.

References

Achuda, G., T. Claus, S. Richter, and M. Torrilhon. 2023. Subscale inversion of x-ray emission in electron probe microanalysis based on deterministic transport equations. (submitted to Proceedings of the 17th European Workshop on modern developments and applications in microbeam analysis - EMAS2023).
Google Scholar
Bünger, J. 2021. Three-dimensional modelling of x-ray emission in electron probe microanalysis based on deterministic transport equations. PhD thesis. RWTH Aachen University, Number: RWTH-2021-05180.
Google Scholar
Bünger, J., N. Sarna, and M. Torrilhon. 2022. Stable boundary conditions and discretization for PN equations. JCM. 40 (6):977–1003. 10.4208/jcm.2104-m2019-0231
Google Scholar
Bünger, J., S. Richter, and M. Torrilhon. 2022. A model for characteristic x-ray emission in electron probe microanalysis based on the (filtered) spherical harmonic (PN) method for electron transport. Microsc. Microanal. 28 (2):454–68. 10.1017/S1431927622000083
Web of Science ®Google Scholar
Burghardt, O., P. Gomes, T. Kattmann, T. D. Economon, N. R. Gauger, and R. Palacios. 2022. Discrete adjoint methodology for general multiphysics problems. Struct. Multidisc. Optim. 65 (1):28. 10.1007/s00158-021-03117-5
Web of Science ®Google Scholar
Buse, B., and S. L. Kearns. 2020. Spatial resolution limits of EPMA. IOP Conf. Ser: Mater. Sci. Eng. 891 (1):012005. 10.1088/1757-899X/891/1/012005
Google Scholar
Carpenter, P. K., and B. L. Jolliff. 2015. Improvements in EPMA: Spatial resolution and analytical accuracy. Microsc. Microanal. 21 (S3):1443–4. 10.1017/S1431927615007990
PubMedGoogle Scholar
Case, K. M., and P. F. Zweifel. 1967. Linear transport theory. Addison-Wesley series in nuclear engineering. Reading, Mass: Addison-Wesley.
Google Scholar
Dorigo, T., A. Giammanco, P. Vischia, M. Aehle, M. Bawaj, A. Boldyrev, P. de Castro Manzano, D. Derkach, J. Donini, A. Edelen, et al. 2023. Toward the end-to-end optimization of particle physics instruments with differentiable programming. Rev. Phys. 10:100085. 10.1016/j.revip.2023.100085
Google Scholar
Griewank, A. 2003. A mathematical view of automatic differentiation. Acta Numer. 12:321–98. 10.1017/S0962492902000132
Google Scholar
Hager, W. W., and H. Zhang. 2005. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 16 (1):170–92. 10.1137/030601880
Web of Science ®Google Scholar
Hüser, J. 2022. Discrete tangent and adjoint sensitivity analysis for discontinuous solutions of hyperbolic conservation laws. PhD thesis. RWTH Aachen University.
Google Scholar
Innes, M. 2018. Don’t unroll adjoint: Differentiating SSA-form programs. CoRR.
Google Scholar
Jenkins, R., R. Manne, R. Robin, and C. Senemaud. 1991. Iupac—nomenclature system for x-ray spectroscopy. X-Ray Spectrom. 20 (3):149–55. 10.1002/xrs.1300200308
Web of Science ®Google Scholar
K. F. J. Heinrich and D. Newbury, eds. 1991. Electron probe quantitation. USA: Springer.
Google Scholar
Kreyszig, E. 1991. Introductory functional analysis with applications. Wiley Classics Library. Wiley.
Google Scholar
Larsen, E. W., M. M. Miften, B. A. Fraass, and I. A. Bruinvis. 1997. Electron dose calculations using the method of moments. Med. Phys. 24 (1):111–25. 10.1118/1.597920
PubMed Web of Science ®Google Scholar
Llovet, X., A. Moy, P. T. Pinard, and J. H. Fournelle. 2021. Electron probe microanalysis: A review of recent developments and applications in materials science and engineering. Prog. Mater. Sci. 116:100673. 10.1016/j.pmatsci.2020.100673
Web of Science ®Google Scholar
Mark, J. C. 1944. The spherical harmonic method, part I. Technical Report MT 92, National Research Council of Canada.
Google Scholar
Mark, J. C. 1945. The spherical harmonic method, part II. Technical Report MT 97, National Research Council of Canada.
Google Scholar
Mogensen, P. K., and A. N. Riseth. 2018. Optim: A mathematical optimization package for Julia. Joss. 3 (24):615. 10.21105/joss.00615
Google Scholar
Moy, A., and J. Fournelle. 2017. Analytical spatial resolution in EPMA: What is it and how can it be estimated? Microsc. Microanal. 23 (S1):1098–9. 10.1017/S1431927617006158
Google Scholar
Moy, A., and J. Fournelle. 2020. Badgerfilm: An open source thin film analysis program. Microsc. Microanal. 26 (S2):496–8. 10.1017/S1431927620014853
Google Scholar
Moy, A., J. H. Fournelle, and A. von der Handt. 2019. Solving the iron quantification problem in low-kV EPMA: An essential step toward improved analytical spatial resolution in electron probe microanalysis—Olivines. Am. Mineralogist 104 (8):1131–42. 10.2138/am-2019-6865
Web of Science ®Google Scholar
Naumann, U. 2011. The art of differentiating computer programs. Philadelphia: Software, Environments and Tools. Society for Industrial and Applied Mathematics.
Google Scholar
Nocedal, J., and S. Wright. 2006. Numerical optimization. Springer series in operations research and financial engineering. 2 ed. New York: Springer-Verlag.
Google Scholar
Olbrant, E. 2012. Models and numerical methods for time- and energy-dependent particle transport. PhD thesis. Aachen, 2012. Aachen, Techn.
Google Scholar
Pinard, P. T., A. Schwedt, A. Ramazani, U. Prahl, and S. Richter. 2013. Characterization of dual-phase steel microstructure by combined submicrometer EBSD and EPMA carbon measurements. Microsc. Microanal. 19 (4):996–1006. 10.1017/S1431927613001554
PubMed Web of Science ®Google Scholar
Plessix, R.-E. 2006. A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophys. J.l Int. 167 (2):495–503. 10.1111/j.1365-246X.2006.02978.x
Web of Science ®Google Scholar
Reimer, L. 1998. Scanning electron microscopy: Physics of image formation and microanalysis. 2 ed. Springer Series in Optical Sciences. Berlin Heidelberg: Springer-Verlag.
Google Scholar
Salvat, F., M. J. Berger, A. Jablonski, I. K. Bronic, J. Mitroy, C. J. Powell, and L. Sanche. 2007. Elastic Scattering of Electrons and Positrons. ICRU Report 77. International Commission on Radiation Units & Measurements, Bethesda, MD.
Google Scholar
Seibold, B., and M. Frank. 2014. Starmap—a second order staggered grid method for spherical harmonics moment equations of radiative transfer. ACM Trans. Math. Softw. 41 (1):1–28. 10.1145/2590808
Web of Science ®Google Scholar
Stuart, A. M. 2010. Inverse problems: A Bayesian perspective. Acta Numer. 19:451–559. 10.1017/S0962492910000061
Google Scholar
Tarantola, A. 2005. Inverse problem theory and methods for model parameter estimation. Society for Industrial and Applied Mathematics.
Google Scholar

Appendix A.

Adjoint continuous implicit differentiation

Consider the implicit definition: Given $x \in X,$ find $y \in Y$ such that $F (y = g (x), x) = 0 .$ We refer to the implicit function with $y = g (x)$ and assume that it is unique at x. Given a (tangent) direction $\dot{x},$ we realize that the directional derivative of F in direction $\dot{x}$ does not change. Employing the chain rule we find the following. (70) ${\frac{\partial}{\partial h} F (g (x + h \dot{x}), x + h \dot{x}) |}_{h = 0} = \frac{\partial F (y, x)}{\partial y} [\frac{\partial g (x)}{\partial x} [\dot{x}]] + \frac{\partial F (y, x)}{\partial x} [\dot{x}] = 0$ (70)

We consider the previous statement in a weak sense, meaning we test the constraint with $\bar{F} \in T,$ where T is an appropriate space.(71) ${〈 \bar{F}, \frac{\partial F (y, x)}{\partial y} [\frac{\partial g (x)}{\partial x} [\dot{x}]] + \frac{\partial F (y, x)}{\partial x} [\dot{x}] 〉}_{T} = 0 \forall \bar{F} \in T$ (71)

We identify the tangent of y with $\dot{y} = \frac{\partial g}{\partial x} [\dot{x}]$ and use adjoint operators ${\frac{\partial F}{\partial y}}^{*} [\cdot]$ and ${\frac{\partial F}{\partial x}}^{*} [\cdot]$ of the tangent operators $\frac{\partial F}{\partial y} [\cdot]$ and $\frac{\partial F}{\partial x} [\cdot]$ to derive the following.(72) $\begin{matrix} {〈 \bar{F}, \frac{\partial F (y, x)}{\partial y} [\dot{y}] 〉}_{T} = - {〈 \bar{F}, \frac{\partial F (y, x)}{\partial x} [\dot{x}] 〉}_{T} \forall \dot{x} \in X \\ \Leftrightarrow {〈 \underset{= \bar{y}}{\underset{⏟}{{\frac{\partial F (y, x)}{\partial y}}^{*} [\bar{F}]}}, \dot{y} 〉}_{Y} & = {〈 \underset{= \bar{x}}{\underset{⏟}{- {\frac{\partial F (y, x)}{\partial x}}^{*} [\bar{F}]}}, \dot{x} 〉}_{X} \forall \dot{x} \in X \end{matrix}$ (72)

Comparing Equation(72)(72) $\begin{matrix} {〈 \bar{F}, \frac{\partial F (y, x)}{\partial y} [\dot{y}] 〉}_{T} = - {〈 \bar{F}, \frac{\partial F (y, x)}{\partial x} [\dot{x}] 〉}_{T} \forall \dot{x} \in X \\ \Leftrightarrow {〈 \underset{= \bar{y}}{\underset{⏟}{{\frac{\partial F (y, x)}{\partial y}}^{*} [\bar{F}]}}, \dot{y} 〉}_{Y} & = {〈 \underset{= \bar{x}}{\underset{⏟}{- {\frac{\partial F (y, x)}{\partial x}}^{*} [\bar{F}]}}, \dot{x} 〉}_{X} \forall \dot{x} \in X \end{matrix}$ (72) with the definition of the adjoint Equation(31)(31) ${〈 \bar{y}, \frac{\partial f (x)}{\partial x} [\dot{x}] 〉}_{Y} = {〈 {\frac{\partial f (x)}{\partial x}}^{*} [\bar{y}], \dot{x} 〉}_{X} \forall \dot{x} \in X, \bar{y} \in Y$ (31) ${〈 \bar{y}, \dot{y} 〉}_{Y} = {〈 \bar{x}, \dot{x} 〉}_{X}$ we derive the two steps for the implicit adjoint. Given $\bar{y} \in Y,$ if we can find $\bar{F} \in T$ such that Equation(73a)(73a) ${\frac{\partial F (y, x)}{\partial y}}^{*} [\bar{F}] = \bar{y}$ (73a) holds, the required adjoint $\bar{x}$ is given by Equation(73b)(73b) $\bar{x} = - {\frac{\partial F (y, x)}{\partial x}}^{*} [\bar{F}] .$ (73b) .(73a) ${\frac{\partial F (y, x)}{\partial y}}^{*} [\bar{F}] = \bar{y}$ (73a) (73b) $\bar{x} = - {\frac{\partial F (y, x)}{\partial x}}^{*} [\bar{F}] .$ (73b)

Example

We consider the Poisson-equation in 2D with homogeneous (Dirichlet) boundary conditions. Given a domain $Ω \subset R^{2}$ with $ξ \in Ω$ and $x : Ω \to R$ a suitable right-hand side. The primal problem is to find $y : Ω \to R$ with $y (ξ) = 0 \forall ξ \in \partial Ω$ such that(74) $F (y, x) = Δ y (\cdot) - x (\cdot) = 0.$ (74)

To derive the adjoint, we first analyze the left-hand side of Equation(72)(72) $\begin{matrix} {〈 \bar{F}, \frac{\partial F (y, x)}{\partial y} [\dot{y}] 〉}_{T} = - {〈 \bar{F}, \frac{\partial F (y, x)}{\partial x} [\dot{x}] 〉}_{T} \forall \dot{x} \in X \\ \Leftrightarrow {〈 \underset{= \bar{y}}{\underset{⏟}{{\frac{\partial F (y, x)}{\partial y}}^{*} [\bar{F}]}}, \dot{y} 〉}_{Y} & = {〈 \underset{= \bar{x}}{\underset{⏟}{- {\frac{\partial F (y, x)}{\partial x}}^{*} [\bar{F}]}}, \dot{x} 〉}_{X} \forall \dot{x} \in X \end{matrix}$ (72) . Using integration by parts twice, we derive the adjoint operator ${\frac{\partial F}{\partial y}}^{*} [\cdot] .$ (75) $〈 \bar{F}, \frac{\partial F (y, x)}{\partial y} [\dot{y}] 〉 = 〈 \bar{F}, Δ \dot{y} 〉 = \underset{= 0}{\underset{⏟}{\int_{\partial Ω} \bar{F} (ξ) \nabla \dot{y} (ξ) n - \dot{y} (ξ) \nabla \bar{F} (ξ) n d S}} + 〈 \underset{= {\frac{\partial F (y, x)}{\partial y}}^{*} [\bar{F}]}{\underset{⏟}{Δ \bar{F}}}, \dot{y} 〉$ (75)

The boundary integral in Equation(75)(75) $〈 \bar{F}, \frac{\partial F (y, x)}{\partial y} [\dot{y}] 〉 = 〈 \bar{F}, Δ \dot{y} 〉 = \underset{= 0}{\underset{⏟}{\int_{\partial Ω} \bar{F} (ξ) \nabla \dot{y} (ξ) n - \dot{y} (ξ) \nabla \bar{F} (ξ) n d S}} + 〈 \underset{= {\frac{\partial F (y, x)}{\partial y}}^{*} [\bar{F}]}{\underset{⏟}{Δ \bar{F}}}, \dot{y} 〉$ (75) yields boundary conditions for the adjoint $\bar{F} .$ Since $\dot{y} (ξ) = 0 \forall ξ \in \partial Ω,$ we assign $\bar{F} (ξ) = 0 \forall ξ \in \partial Ω .$ From the right-hand side of Equation(72)(72) $\begin{matrix} {〈 \bar{F}, \frac{\partial F (y, x)}{\partial y} [\dot{y}] 〉}_{T} = - {〈 \bar{F}, \frac{\partial F (y, x)}{\partial x} [\dot{x}] 〉}_{T} \forall \dot{x} \in X \\ \Leftrightarrow {〈 \underset{= \bar{y}}{\underset{⏟}{{\frac{\partial F (y, x)}{\partial y}}^{*} [\bar{F}]}}, \dot{y} 〉}_{Y} & = {〈 \underset{= \bar{x}}{\underset{⏟}{- {\frac{\partial F (y, x)}{\partial x}}^{*} [\bar{F}]}}, \dot{x} 〉}_{X} \forall \dot{x} \in X \end{matrix}$ (72) we trivially find(76) ${\frac{\partial F (y, x)}{\partial x}}^{*} [\bar{F}] = \bar{F} .$ (76)

Computing the adjoint $\bar{x}$ therefore consists of the following steps. Given $\bar{y} (\cdot)$ find $\bar{F} (\cdot)$ with $\bar{F} (ξ) = 0 \forall ξ \in \partial Ω$ such that Equation(77a)(77a) $Δ \bar{F} = \bar{y}$ (77a) holds. Then, the required adjoint $\bar{x} (\cdot)$ is given by Equation(77b)(77b) $\bar{x} = \bar{F}$ (77b) (trivial for this example).(77a) $Δ \bar{F} = \bar{y}$ (77a) (77b) $\bar{x} = \bar{F}$ (77b)

An Adjoint Method for High-Resolution EPMA Based on the Spherical Harmonics (PN) Model of Electron Transport

Abstract

1. Introduction

1.1. The inverse problem of reconstruction