Abstract
Electron Probe Microanalysis (EPMA) is a nondestructive technique to determine the chemical composition of material samples in the micro- to nanometer range. Based on intensity measurements of x-radiation, the reconstruction of the material composition and structure poses an inverse problem. The reconstruction methods currently applied are based on models that assume a homogeneous or a layered structure of the studied material. To increase the spatial resolution of reconstruction in EPMA, a more sophisticated reconstruction method is required that combines a model allowing for complex material structures with multiple measurements at varying beam configurations. We present a deterministic k-ratio model based on the PN model, an approximation of the radiative transfer equation for electron transport. Our goal is to approximate a maximum likelihood solution of the inverse problem using gradient-based optimization. We detail the application of the model in the context of algorithmic differentiation, in particular by deriving its continuous adjoint formulation. Algorithmic differentiation provides the flexibility to adapt the reconstruction method to various material parametrizations and thus to regularize and take prior knowledge into account. Through examples, we verify our implementation and demonstrate the flexibility of the reconstruction/differentiation method.
1. Introduction
Electron Probe Microanalysis (EPMA) (Heinrich and Newbury 1991; Reimer 1998) is an imaging technique used for the quantitative analysis of the composition of solid material samples at the micro- to nanometer scale. The sample is excited by a focused beam of electrons, which induces multiple relaxation processes inside the sample. In EPMA, the emission of characteristic x-rays is of special interest. If a beam electron strikes a bound electron that occupies an atomic shell of an atom inside the specimen, the bound electron is ejected from its shell and the atom is left with a vacancy. Outer-shell electrons fill this vacancy by emitting a quantized x-ray with an energy corresponding to the energy level difference of the excited (initial) and the relaxed (final) level. The energy levels of electron shells are characteristic of a specific atom; hence the energy of the emitted x-ray provides information about the composition of the material sample. Reconstructing the chemical composition from intensities of emitted x-radiation (normalized into k-ratios) forms the inverse problem in EPMA. In the accompanying figure, we outline the main physical processes that occur during an experiment.
The crucial ingredient to the inverse problem of reconstruction is the definition of a model that maps parameters $p$ (from an $n_p$-dimensional parameter space) to k-ratios ($n_k$: number of measurements). The reconstruction can then be formalized as the following inverse problem: find the set of parameters such that the modeled k-ratios reproduce the experimental k-ratios as well as possible.
The applications of EPMA are manifold (Llovet et al. 2021; Moy, Fournelle, and von der Handt 2019; Pinard et al. 2013). From geology and materials science to electronics, researchers rely on EPMA to determine the material composition and structure of samples. Every application brings with it different prior knowledge and objectives that must be considered in the reconstruction. We subdivide our model into a k-ratio model $k$ and a material parametrization $\rho(p)$, where $\rho_i(x)$ is the mass concentration of element $i$ in a compound with $n_e$ constituents at a point $x \in G$. The sample domain $G$ will be specified later. While the k-ratio model $k$ can handle general materials, the reconstruction can be tailored to specific applications via the material parametrization (e.g., using prior knowledge). In particular, our definition includes reconstruction approaches currently applied in EPMA, which are, however, very restrictive in their material parametrizations: ZAF models assume homogeneity of the material inside the interaction volume of each beam. Effectively, this assumption limits the spatial resolution of EPMA to the size of the interaction volume (Buse and Kearns 2020; Carpenter and Jolliff 2015; Moy and Fournelle 2017). Reconstruction methods based on the integration of $\varphi(\rho z)$-curves currently only allow for samples that are layered in depth (Moy and Fournelle 2020). We aim to leverage a more sophisticated k-ratio model and the combination of k-ratio measurements from various beam positions and energies (i.e., measurement setups) to obtain a high-resolution spatial image of the material composition.
In this article, we present our model for k-ratios (taken from Bünger, Richter, and Torrilhon 2022) with particular attention to its dependence on the mass concentrations ρ. The main part is dedicated to the derivation of an efficient and modular approach to compute the derivatives of our model, facilitating its integration into differentiable programming (Dorigo et al. 2023). We derive our differentiation approach based on the adjoint mode of algorithmic differentiation (Griewank 2003; Naumann 2011) and the continuous adjoint method (Plessix 2006). The derivative forms the crucial part in the implementation of gradient-based optimization methods that find a local minimum of an objective function as a candidate solution of the inverse problem. We show the agreement of our gradient computation with finite differences and demonstrate the advantages of the adjoint method in terms of computation time. We also provide reconstruction experiments that compare different parametrizations in 1D and show a 2D reconstruction example.
1.1. The inverse problem of reconstruction
In theory (Stuart 2010; Tarantola 2005), the solution of the inverse problem is defined as knowledge of the posterior uncertainty of the parameters, based on the likelihood of the data, which describes the uncertainty of the model and the measurement, and the prior uncertainty of the parameters. So far, we consider a Bayesian inversion of the presented problem as intractable. However, in statistical terms, our approach amounts to finding the maximum likelihood estimate under the assumption of an isotropic Gaussian likelihood. We define the reconstruction problem as the following optimization problem:
$$p^* = \operatorname*{arg\,min}_{p} J(p), \qquad J(p) = \frac{1}{2} \sum_{\alpha} \big( k_\alpha(\rho(p)) - k_\alpha^{\mathrm{exp}} \big)^2. \tag{1}$$
Assuming that the objective function $J$ is differentiable w.r.t. $p$, gradient-based optimization methods (steepest descent, conjugate gradient, L-BFGS) (Nocedal and Wright 2006) can be utilized to iteratively approximate a local minimum $p^*$. For an efficient application of gradient-based methods, the efficient computation of the gradient is crucial. Additionally, we require our code to remain modular, such that different parts of the model and, more importantly, the parametrization of the mass concentrations remain exchangeable. We remark that the computation of maximum likelihood estimates can be ambiguous and ill-posed, depending on the number of available measurements and the chosen parametrization. Additional prior information about the material would then be necessary, which could enter our minimization problem, Equation (1), as an additional regularization term. Regarding the ill-posedness of the inverse problem, we refer to future research and note that the addition of a differentiable regularization term is possible.
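As a minimal sketch of this optimization loop, the snippet below runs steepest descent on a least-squares objective of the form of Equation (1). The linear map `K` is a hypothetical stand-in for the PDE-based k-ratio model of Sec. 2; `K`, `p_true`, and `k_exp` are synthetic quantities invented for this illustration only.

```python
import numpy as np

# Hypothetical linear stand-in for the k-ratio model: k(p) = K @ p.
rng = np.random.default_rng(0)
K = rng.normal(size=(6, 3))            # 6 k-ratio measurements, 3 parameters
p_true = np.array([0.2, 0.5, 0.3])
k_exp = K @ p_true                      # synthetic "experimental" k-ratios

def objective(p):
    r = K @ p - k_exp                   # residual between modeled and measured k-ratios
    return 0.5 * r @ r

def gradient(p):
    return K.T @ (K @ p - k_exp)        # adjoint of the model applied to the residual

L = np.linalg.norm(K.T @ K, 2)          # Lipschitz constant of the gradient
p = np.zeros(3)
for _ in range(10000):                  # plain steepest descent with fixed step 1/L
    p -= gradient(p) / L

assert objective(p) < 1e-10
```

In the actual reconstruction, `gradient` is replaced by the adjoint computation derived in Sec. 3, and more sophisticated methods (conjugate gradient, L-BFGS) replace the fixed-step descent.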
2. The forward model
The characteristic x-ray intensity model used in this article combines a model for electron transport based on the continuous slowing down approximation (CSD) of the linear Boltzmann equation (radiative transfer) with a subsequently applied model for x-ray generation and attenuation. In Bünger (2021), Bünger, Sarna, and Torrilhon (2022), and Bünger, Richter, and Torrilhon (2022), the model is derived in more detail. Here we derive and summarize the main equations and address the dependence on the mass concentrations ρ in detail.
2.1. K-ratios
In EPMA, x-ray intensity is measured by counting all characteristic x-rays emitted as a consequence of the electron scattering induced by the electron beam. Because of multiple uncertain quantities concerning the experimental device, the x-ray intensity is normalized into a k-ratio using standard intensities measured from a known reference sample:
$$k_\alpha = \frac{I_\alpha}{I_\alpha^{\mathrm{std}}}. \tag{2}$$
Therefore, the normalization eliminates uncertain multiplicative factors that influence the intensity, e.g., the detector efficiency.
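A toy computation illustrates this cancellation; all intensity values below are made up for illustration.

```python
# An unknown multiplicative device factor c (detector efficiency, beam current,
# solid angle) scales measured and standard intensity alike and cancels in the
# k-ratio of Equation (2). The intensities here are invented numbers.
I_sample = 4.0e5        # characteristic x-rays generated in the unknown sample
I_standard = 1.0e6      # x-rays generated in the known reference standard

for c in (1e-3, 0.5, 7.0):                          # device-dependent factor
    k = (c * I_sample) / (c * I_standard)
    assert abs(k - I_sample / I_standard) < 1e-12   # k-ratio independent of c
```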
An experiment yields multiple k-ratios. Each k-ratio corresponds to a specific x-ray transition and a specific experimental setup, e.g., the beam position, energy, or angle. To distinguish between k-ratios, we use $\alpha$ to denote a multi-index that contains all information about the transition $\alpha_L$ and the beam setup $\alpha_B$:
$$\alpha = (\alpha_L, \alpha_B), \qquad \alpha_L = (Z, X_i, X_f), \qquad \alpha_B = (x_b, \sigma_x^2, \varepsilon_b, \sigma_\varepsilon^2, \Omega_b, \kappa). \tag{3}$$
The individual symbols are the atomic number $Z$, the initial (excited) $X_i$ and the final (relaxed) $X_f$ x-ray level of the electron transition, the mean $x_b$ and variance $\sigma_x^2$ of the beam's position, the mean $\varepsilon_b$ and variance $\sigma_\varepsilon^2$ of the beam's energy, and the mean $\Omega_b$ and concentration parameter $\kappa$ of the beam's direction.
2.2. Characteristic X-ray emission model
Based on the mass concentrations ρ, we aim for a generic model for k-ratios, which, according to the usual definition in EPMA, are computed as the ratio of a measured and a standard intensity:
$$k_\alpha(\rho) = \frac{I_\alpha(\rho)}{I_\alpha(\rho^{\mathrm{std}})}. \tag{4}$$
Both intensities can be computed from the same model; only the underlying mass concentrations change: ρ for the measured and $\rho^{\mathrm{std}}$ for the standard intensity.
Neglecting multiplicative factors that cancel due to the normalization with standard intensities, we formulate the intensity of characteristic x-rays as the integral of the attenuation field $A_\alpha$, the number of atoms $N_{Z_\alpha}$, and the generation field $\Phi_\alpha$ of characteristic x-rays over a domain $G$ of the sample, which is chosen such that $G$ covers the support of the electron fluence:
$$I_\alpha(\rho) = \int_G A_\alpha(x)\, N_{Z_\alpha}(x)\, \Phi_\alpha(x)\, \mathrm{d}x. \tag{5}$$
The attenuation field describes the intensity reduction of generated x-rays while traveling through the sample toward the detector. Attenuation is quantified by Beer-Lambert's law, where the attenuation coefficient is the weighted sum of mass concentrations $\rho_i$ and mass attenuation coefficients $\mu_i^\alpha$:
$$A_\alpha(x) = \exp\left( -\int_{\ell(x, x_D)} \sum_i \mu_i^\alpha\, \rho_i(y)\, \mathrm{d}y \right). \tag{6}$$
In Equation (6), the integration domain $\ell(x, x_D)$ is the straight line from a point $x$ toward the detector position $x_D$, restricted such that the line lies inside the domain G. Neglecting the part outside the domain G assumes that x-rays travel unattenuated in the vacuum surrounding the material sample.
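A hedged numerical sketch of this attenuation law follows; the function names, the quadrature rule, and the material values (`mu0`, `rho0`) are illustrative placeholders, not the paper's data.

```python
import numpy as np

# Beer-Lambert attenuation along the straight line from x to the detector:
# A(x) = exp(-integral of sum_i mu_i * rho_i(y) dy).
def attenuation(x, x_det, rho_fields, mu, n_quad=200):
    """rho_fields: callables rho_i(y); mu: mass attenuation coefficients."""
    ts = np.linspace(0.0, 1.0, n_quad)
    pts = x[None, :] + ts[:, None] * (x_det - x)[None, :]   # quadrature points on the line
    coeff = sum(m * np.array([r(p) for p in pts]) for m, r in zip(mu, rho_fields))
    dy = np.linalg.norm(x_det - x) / (n_quad - 1)
    integral = dy * (coeff[0] / 2 + coeff[1:-1].sum() + coeff[-1] / 2)  # trapezoid rule
    return np.exp(-integral)

# homogeneous single-element sample: closed form exp(-mu * rho * path length)
rho0, mu0 = 5.0, 0.3
A = attenuation(np.array([0.0, 0.0]), np.array([0.0, 1.0]), [lambda y: rho0], [mu0])
assert abs(A - np.exp(-mu0 * rho0)) < 1e-9
```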
The number of atoms (per unit volume) is given by the division of the mass concentration field $\rho_i$ by the atomic mass $M_i$ of the element:
$$N_i(x) = \frac{\rho_i(x)}{M_i}. \tag{7}$$
The third factor that influences the x-ray intensities is the field of characteristic x-ray generation:
$$\Phi_\alpha(x) = \int_{\mathcal{E}} \int_{S^2} \sigma_\alpha(\varepsilon)\, \psi(x, \Omega, \varepsilon)\, \mathrm{d}\Omega\, \mathrm{d}\varepsilon. \tag{8}$$
It contains the electron fluence $\psi$ ($\psi = |v|\, n_e$, with electron velocity $v$ and number density of electrons $n_e$) of beam electrons at $x$ with energy $\varepsilon$ into direction $\Omega$ inside the material probe, weighted by the x-ray production cross-section $\sigma_\alpha$. The energy interval $\mathcal{E} = [\varepsilon_{\mathrm{cut}}, \varepsilon_{\max}]$ is defined by the cutoff energy $\varepsilon_{\mathrm{cut}}$, which is chosen so small that no further ionization can occur, and $\varepsilon_{\max}$, which is chosen sufficiently large to capture all beam electrons.
Describing the electron fluence forms the most difficult part of modeling the emission of characteristic x-rays for inhomogeneous samples. We detail our model for the electron fluence in the following section.
2.3. Deterministic electron transport model
The dynamics of the beam electrons are described by a linear Boltzmann equation, an integro-differential equation describing the evolution of the electron number density in phase space (position-velocity space). The physical processes governing this evolution are free-flight phases interrupted by elastic and inelastic collisions with atoms of the material probe. Since the timescale of the physical processes is very small in comparison to the duration of the experimental measurements of x-ray intensities, it is valid to neglect the time dependence and assume a steady-state distribution of beam electrons immediately after switching on the electron beam. Furthermore, as inelastic scattering cross-sections are highly peaked around small energy losses, i.e., electrons most probably lose a significant amount of energy in a sequence of small energy losses, it is common in EPMA to model the energy loss of projectile electrons as a continuous deceleration along the trajectory. These simplifications lead to modeling the beam electrons by the Boltzmann equation of electron transport in continuous slowing down (CSD, resp. BCSD) approximation – an evolution equation in energy space given by (Larsen et al. 1997)
$$-\partial_\varepsilon \big( S(x, \varepsilon)\, \psi \big) + \Omega \cdot \nabla_x \psi = (\mathcal{L} \psi)(x, \Omega, \varepsilon), \tag{9}$$
where the energy loss is governed by the stopping power $S$ and the scattering operator $\mathcal{L}$ accounts for elastic and inelastic collisions (CSD average). Both the stopping power $S$ and the scattering operator $\mathcal{L}$ are derived from cross-sections for compounds, which are defined by means of the additivity approximation. We use the cross-sections from Salvat et al. (2007) that are also used in Bünger (2021) and Olbrant (2012). Assuming a well-defined atomic mass, we can write both as the sum of atomic quantities $S_i$ and $\mathcal{L}_i$ weighted by the mass concentrations $\rho_i$:
$$S(x, \varepsilon) = \sum_i \rho_i(x)\, S_i(\varepsilon), \tag{10}$$
$$(\mathcal{L} \psi)(x, \Omega, \varepsilon) = \sum_i \rho_i(x)\, (\mathcal{L}_i \psi)(x, \Omega, \varepsilon). \tag{11}$$
The BCSD model describes the evolution of the electron fluence in energy space and consequently needs to be equipped with initial and boundary conditions. As the evolution is prescribed from high to low energies, an initial condition defines the electron fluence at the maximal energy:
$$\psi(x, \Omega, \varepsilon_{\max}) = \psi_{\varepsilon_{\max}}(x, \Omega). \tag{12}$$
We will initialize the electron fluence at an energy $\varepsilon_{\max}$ sufficiently larger than the mean beam energy by $\psi_{\varepsilon_{\max}} \equiv 0$ and model the electron beam by boundary conditions. The electron transport model can be equipped with Dirichlet-type boundary conditions by prescribing the incoming half of the electron fluence at the boundary,
$$\psi(x, \Omega, \varepsilon) = \psi_b(x, \Omega, \varepsilon) \quad \text{for } x \in \partial G,\ \Omega \cdot n(x) < 0, \tag{13}$$
where n denotes the outward-pointing normal vector at $x \in \partial G$ and $\psi_b$ is a given distribution of incoming particles. We model electron beams by a product of Gaussian distributions in space and energy together with a von Mises-Fisher distribution in direction. An electron beam that is aligned with the outward-pointing normal n of the polished material surface and focused onto the point $x_b$ is modeled by
$$\psi_b(x, \Omega, \varepsilon) = \mathcal{N}(x;\, x_b, \sigma_x^2)\, \mathcal{N}(\varepsilon;\, \varepsilon_b, \sigma_\varepsilon^2)\, \mathcal{M}(\Omega;\, -n, \kappa), \tag{14}$$
where $\mathcal{N}(x;\, x_b, \sigma_x^2)$ is the probability density function of a Gaussian with mean $x_b$ and variance $\sigma_x^2$. The distribution in energy (Gaussian with mean $\varepsilon_b$ and variance $\sigma_\varepsilon^2$) and the distribution in direction (von Mises-Fisher with mean $-n$ and concentration $\kappa$) are defined accordingly.
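The beam model can be sketched as the product of these three densities; the argument names below are our own illustrative notation, the spatial Gaussian is reduced to one dimension for brevity, and the sanity check merely verifies the von Mises-Fisher normalization on the sphere.

```python
import numpy as np

# Product beam density: spatial Gaussian x energy Gaussian x von Mises-Fisher.
def gauss(t, mean, var):
    return np.exp(-0.5 * (t - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def vmf_pdf(omega, mu, kappa):
    # von Mises-Fisher density on the unit sphere S^2
    return kappa / (4 * np.pi * np.sinh(kappa)) * np.exp(kappa * np.dot(mu, omega))

def beam_density(x, eps, omega, x_b, var_x, eps_b, var_eps, n, kappa):
    # beam focused at x_b with mean energy eps_b, directed against the
    # outward-pointing surface normal n (i.e., into the sample)
    return gauss(x, x_b, var_x) * gauss(eps, eps_b, var_eps) * vmf_pdf(omega, -n, kappa)

# sanity check: the directional factor integrates to one over the unit sphere
kappa, mu = 10.0, np.array([0.0, 0.0, 1.0])
thetas = np.linspace(0.0, np.pi, 4001)
vals = np.array([vmf_pdf(np.array([np.sin(t), 0.0, np.cos(t)]), mu, kappa)
                 for t in thetas]) * np.sin(thetas) * 2 * np.pi
dtheta = np.pi / 4000
total = dtheta * (vals[0] / 2 + vals[1:-1].sum() + vals[-1] / 2)
assert abs(total - 1.0) < 1e-4
```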
2.4. Spherical harmonic approximation
A numerical solution of the electron transport model in Equation (9) is expensive due to the high dimension of the phase space $(x, \Omega, \varepsilon)$ and the complexity of the scattering operator $\mathcal{L}$. Therefore, we employ the method of moments to derive a modal approximation of the linear Boltzmann equation. We reduce the phase space by replacing the direction Ω as an independent variable by a small set of moments to model the electron fluence ψ. Here we consider the spherical harmonic (PN) moment approximation (Case and Zweifel 1967; Mark 1944; Mark 1945; Seibold and Frank 2014), i.e., we approximate the electron fluence ψ by its series expansion in spherical harmonics truncated after some finite degree $N$,
$$\psi(x, \Omega, \varepsilon) \approx \sum_{l=0}^{N} \sum_{m=-l}^{l} u_l^m(x, \varepsilon)\, Y_l^m(\Omega), \tag{15}$$
where the expansion coefficients are the spherical harmonic moments of the angular distribution,
$$u_l^m(x, \varepsilon) = \int_{S^2} Y_l^m(\Omega)\, \psi(x, \Omega, \varepsilon)\, \mathrm{d}\Omega, \tag{16}$$
and we model the fluence through the system of evolution equations of its moments, which we refer to as the PN equations. The PN equations are obtained by weakly enforcing Equation (9) for $\psi$ on the test space spanned by spherical harmonics up to degree N. By gathering the coefficients and ansatz functions into vector functions $u$ and $Y$, such that $\psi \approx u^T Y$, the equation for $u$ can be written as
$$-\partial_\varepsilon (S u) + \sum_{d=1}^{3} A_d\, \partial_{x_d} u = Q u. \tag{17}$$
with the following definitions for $A_d$ and $Q$:
$$A_d = \int_{S^2} \Omega_d\, Y(\Omega)\, Y(\Omega)^T\, \mathrm{d}\Omega, \tag{18}$$
$$Q = \int_{S^2} Y(\Omega)\, (\mathcal{L} Y^T)(\Omega)\, \mathrm{d}\Omega. \tag{19}$$
Using the method of moments, we reduced the phase space by two dimensions at the cost of replacing a scalar equation with a system of equations. Additionally, $A_d$ and Q exhibit structures that can be exploited to develop efficient numerical solvers. Q is diagonal, as spherical harmonics are eigenfunctions of the collision operator, such that the computational cost of the scattering operation grows linearly in the number of moments.
The transport matrices inherit a sparsity pattern from the recursion relation of spherical harmonics. When ordering the spherical harmonics that are odd (even) in Cartesian direction d into a vector function $u_d^o$ (resp. $u_d^e$), i.e.,
$$u = \begin{pmatrix} u_d^e \\ u_d^o \end{pmatrix}, \tag{20}$$
the matrix $A_d$ is of the following block form:
$$A_d = \begin{pmatrix} 0 & B_d \\ B_d^T & 0 \end{pmatrix}. \tag{21}$$
Hence, the even (odd) moments couple with the odd (even) moments only through their spatial derivatives in direction d, which can be exploited by central finite difference schemes on staggered grids (Seibold and Frank 2014).
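The even/odd coupling can be made concrete in the simpler 1D (slab geometry) analogue, where the spherical harmonics reduce to Legendre polynomials; this sketch is our own illustration, not the paper's 3D construction.

```python
import numpy as np

# 1D analogue of the PN transport matrix: the Legendre recursion
# mu * P_l = ((l+1) P_{l+1} + l P_{l-1}) / (2l+1) yields a tridiagonal matrix
# with zero diagonal, so even-degree moments couple only to odd-degree ones.
N = 7
A = np.zeros((N + 1, N + 1))
for l in range(N + 1):
    if l + 1 <= N:
        A[l, l + 1] = (l + 1) / (2 * l + 1)
    if l - 1 >= 0:
        A[l, l - 1] = l / (2 * l + 1)

even = [l for l in range(N + 1) if l % 2 == 0]
odd = [l for l in range(N + 1) if l % 2 == 1]
perm = even + odd                      # reorder: even block first, then odd block
P = A[np.ix_(perm, perm)]
ne = len(even)
# the diagonal blocks vanish: even moments only couple to odd moments
assert np.all(P[:ne, :ne] == 0) and np.all(P[ne:, ne:] == 0)
```

The permuted matrix exhibits exactly the block-anti-diagonal form of Equation-(21)-type structure, which is what staggered-grid finite difference schemes exploit.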
We employ the boundary conditions introduced in Bünger, Sarna, and Torrilhon (2022). The boundary conditions follow from taking half-moments of the Boltzmann boundary condition, Equation (13), with odd (in direction $e_d$) spherical harmonics as weighting functions, and a stabilization step that is similar to the truncation of the series expansion in Equation (15). For a given boundary point with outward-pointing boundary normal $e_d$, we formulate boundary conditions for the moment vector $u$. The boundary conditions are given in Equations (22) and (23).
These boundary conditions are compatible with the characteristics of the PN equations (17) and allow us to ensure energy stability; see Bünger, Sarna, and Torrilhon (2022) for details. From Equation (12), we derive the following initial condition for the moments:
$$u(x, \varepsilon_{\max}) = \int_{S^2} Y(\Omega)\, \psi_{\varepsilon_{\max}}(x, \Omega)\, \mathrm{d}\Omega. \tag{24}$$
The PN equations (17) describe the evolution of all moments of ψ, whereas the intensity model, Equation (8), only requires the angular average of ψ, because we assumed the isotropic emission of x-rays. The spherical harmonic $Y_0^0 = 1/\sqrt{4\pi}$ is constant, so we identify the moment $u_0^0$ as part of our x-ray emission model and simplify the integral in Equation (8) to the following:
$$\Phi_\alpha(x) = \sqrt{4\pi} \int_{\mathcal{E}} \sigma_\alpha(\varepsilon)\, u_0^0(x, \varepsilon)\, \mathrm{d}\varepsilon. \tag{25}$$
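Once the isotropic moment is available on an energy grid, the generation field reduces to a one-dimensional quadrature; the cross-section and moment below are synthetic stand-ins chosen so the integral has a closed form.

```python
import numpy as np

# Energy quadrature of sqrt(4*pi) * sigma(eps) * u00(x, eps) on a grid.
def generation(sigma, u00, eps):
    integrand = np.sqrt(4 * np.pi) * sigma * u00
    d = np.diff(eps)                                   # trapezoidal rule
    return np.sum(d * (integrand[:-1] + integrand[1:]) / 2)

eps = np.linspace(0.0, 1.0, 2001)
sigma = eps.copy()           # toy cross-section sigma(eps) = eps
u00 = np.ones_like(eps)      # toy isotropic moment u00 = 1
phi = generation(sigma, u00, eps)
# closed form: sqrt(4*pi) * integral of eps over [0, 1] = sqrt(4*pi) / 2
assert abs(phi - np.sqrt(4 * np.pi) * 0.5) < 1e-9
```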
2.5. Mass concentration model
Typically, the quantities of interest in EPMA are weight fractions $\omega_i$ that form a partition of unity, $\sum_i \omega_i = 1$. In order to determine mass concentrations from weight fractions, additional assumptions are necessary. We neglect the influence of different molecular or crystal structures and assume that the compound is formed in a volume-preserving manner. Then the total volume is the sum of the constituents' volumes, and we derive the following relation between weight fractions and mass concentrations:
$$\rho_i = \frac{\omega_i}{\sum_j \omega_j / \rho_j^e}, \tag{26}$$
where $\rho_j^e$ denotes the mass density of the pure element $j$.
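A small sketch of this volume-preserving mixing rule; the Cu-Zn example and its elemental densities are illustrative values, not data from the paper.

```python
import numpy as np

# Volume-preserving mixing: per unit mass, the compound occupies the volume
# sum_j w_j / rho_e_j, so rho_i = w_i / sum_j (w_j / rho_e_j).
def mass_concentrations(w, rho_e):
    w, rho_e = np.asarray(w, float), np.asarray(rho_e, float)
    assert abs(w.sum() - 1.0) < 1e-12          # weight fractions: partition of unity
    return w / np.sum(w / rho_e)

# Cu-Zn example with approximate pure-element densities in g/cm^3
rho = mass_concentrations([0.7, 0.3], [8.96, 7.14])
# the concentrations sum to the total compound density 1 / sum_j (w_j / rho_e_j)
assert abs(rho.sum() - 1.0 / (0.7 / 8.96 + 0.3 / 7.14)) < 1e-12
```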
As a first possibility, we model the weight fractions as piecewise-constant functions. We subdivide the spatial domain G into disjoint subdomains $G_k$ and introduce the indicator functions $\mathbb{1}_{G_k}$. To guarantee the partition of unity, we only use parameters $p_{i,k}$ for the first $n_e - 1$ elements and define the parametrization
$$\omega_i(x) = \sum_k p_{i,k}\, \mathbb{1}_{G_k}(x), \quad i = 1, \ldots, n_e - 1, \qquad \omega_{n_e} = 1 - \sum_{i=1}^{n_e - 1} \omega_i, \tag{27}$$
with the additional constraints that $p_{i,k} \geq 0$ and $\sum_i p_{i,k} \leq 1$. Together with Equation (26), Equation (27) forms the piecewise-constant material parametrization. In the examples, we will replace the weight fraction parametrization to investigate other parametrizations.
3. Differentiation of the forward model
As described in the introduction, we consider the inverse problem as a minimization problem and apply iterative gradient-based optimization methods to find a candidate solution:
$$p^* = \operatorname*{arg\,min}_p J(p). \tag{28}$$
Gradient-based optimization methods require the gradient of the objective function. An efficient implementation of the gradient of our model that simultaneously remains extensible and modular (particularly regarding the parametrization) is a challenge. While automatic algorithmic differentiation tools have the potential to transform any numerical code, the current versions of AD tools in julia (Zygote.jl, ReverseDiff.jl) cannot be applied to our model due to memory limitations and lack of performance.
In the following, we present a method that adopts from algorithmic differentiation (AD) the structured way of deriving derivative code from numerical code. We focus on the adjoint mode of AD, since we are mainly interested in the gradient of the scalar objective function. The advantage of the adjoint mode is that its complexity scales with the number of outputs, whereas with numerical differentiation (finite differences) or the tangent mode of AD, the complexity scales with the number of inputs. Also, the analytical derivation of our method provides a clear understanding that facilitates its memory-efficient implementation.
The core idea of AD is to base the derivative computation on local function transformations that are automatically composed. This also achieves modularity, i.e., if a particular function is modified, only the corresponding transformation needs to be adjusted.
3.1. Adjoint algorithmic differentiation
Algorithmic differentiation (AD) (Burghardt et al. 2022; Hüser 2022; Innes 2018; Kreyszig 1991; Naumann 2011) defines a framework to compute derivatives of numerical program code. In contrast to symbolic or numerical differentiation (finite differences), algorithmic differentiation is free of approximation error and modular, so the code remains extensible. Also, the automatic differentiation of numerical code is possible. We review the basic definitions of AD with an (unusual) focus on high-/infinite-dimensional spaces. Alongside a motivating example, we describe the systematic composition into code that is able to compute derivatives.
Consider a differentiable function $f: U \subseteq X \to Y$, where X and Y are Hilbert spaces and U is the open domain of f. Given an input $x \in U$, we define the tangent operator $T_f(x): X \to Y$ as the linear bounded operator that satisfies
$$f(x + h) = f(x) + T_f(x)\, h + o(\lVert h \rVert) \quad \text{as } \lVert h \rVert \to 0. \tag{29}$$
Additionally, given a tangent $\dot{x}$ of the input, we define the tangent of the output by
$$\dot{y} = T_f(x)(\dot{x}). \tag{30}$$
Given the tangent operator $T_f(x)$, the corresponding adjoint operator $T_f^*(x): Y \to X$ is defined by
$$\langle T_f(x)\, \dot{x},\, \bar{y} \rangle_Y = \langle \dot{x},\, T_f^*(x)\, \bar{y} \rangle_X \quad \text{for all } \dot{x} \in X,\ \bar{y} \in Y, \tag{31}$$
where $\langle \cdot, \cdot \rangle_X$ and $\langle \cdot, \cdot \rangle_Y$ are the respective inner products of X and Y. Similarly to the tangent, given an adjoint $\bar{y}$ of the output, we define the adjoint of the input by
$$\bar{x} = T_f^*(x)(\bar{y}). \tag{32}$$
Consequently, the identity $\langle \dot{y}, \bar{y} \rangle_Y = \langle \dot{x}, \bar{x} \rangle_X$ holds for all $\dot{x} \in X$ and $\bar{y} \in Y$.
In real, finite-dimensional spaces (e.g., $X = \mathbb{R}^n$, $Y = \mathbb{R}^m$), the tangent operator can be represented by the Jacobian $\nabla f(x)$ and the adjoint operator by its transpose $\nabla f(x)^T$. For examples in infinite-dimensional (function) spaces, we refer to Sec. 3.2.
The core idea of AD is to subdivide a function into individual subfunctions (with individual tangent and adjoint operators) and intermediate variables (with intermediate tangents and adjoints). As an example, consider the following decomposition of f into two subfunctions $g: X \to V$ and $h: V \to Y$, where we use an intermediate variable $v$ (V is another Hilbert space):
$$y = f(x) = h(g(x)), \qquad v = g(x). \tag{33}$$
Given $T_g$ and $T_h$ as well as an input tangent $\dot{x}$, the tangent of $y$ is given by
$$\dot{y} = T_h(v)\big( T_g(x)(\dot{x}) \big) = T_h(v)(\dot{v}), \tag{34}$$
where we employ the chain rule to split the tangent operator of f into a combination of tangent operators of g and h. Additionally, we identify the tangent of the intermediate variable, $\dot{v} = T_g(x)(\dot{x})$.
To find the corresponding adjoint operator, we successively use the adjoint operators of the subfunctions g and h to isolate $\dot{x}$ in the inner product identity:
$$\langle \dot{y}, \bar{y} \rangle_Y = \langle T_h(v)(\dot{v}), \bar{y} \rangle_Y = \langle \dot{v}, T_h^*(v)(\bar{y}) \rangle_V = \langle \dot{x}, T_g^*(x)\big( T_h^*(v)(\bar{y}) \big) \rangle_X. \tag{35}$$
To summarize, given the adjoint $\bar{y}$, the adjoints $\bar{v}$ and $\bar{x}$ are
$$\bar{v} = T_h^*(v)(\bar{y}), \qquad \bar{x} = T_g^*(x)(\bar{v}). \tag{36}$$
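In the finite-dimensional case, this construction can be verified numerically: the Jacobians act as tangent operators, their transposes as adjoints, and the inner-product identity holds exactly. The functions g and h below are arbitrary toy choices.

```python
import numpy as np

# Verify <y_dot, y_bar> = <x_dot, x_bar> for the composition f = h o g.
rng = np.random.default_rng(1)

def g(x):        # subfunction g: R^3 -> R^2
    return np.array([x[0] * x[1], np.sin(x[2])])

def h(v):        # subfunction h: R^2 -> R^2
    return np.array([v[0] + v[1] ** 2, np.exp(v[0])])

def jac_g(x):
    return np.array([[x[1], x[0], 0.0], [0.0, 0.0, np.cos(x[2])]])

def jac_h(v):
    return np.array([[1.0, 2 * v[1]], [np.exp(v[0]), 0.0]])

x = rng.normal(size=3)
v = g(x)

x_dot = rng.normal(size=3)          # input tangent (seed)
v_dot = jac_g(x) @ x_dot            # tangent of the intermediate variable
y_dot = jac_h(v) @ v_dot            # tangent of the output

y_bar = rng.normal(size=2)          # output adjoint (seed)
v_bar = jac_h(v).T @ y_bar          # adjoint of the intermediate variable
x_bar = jac_g(x).T @ v_bar          # adjoint of the input

assert abs(y_dot @ y_bar - x_dot @ x_bar) < 1e-12
```

Note how the adjoint sweep applies the transposed Jacobians in reversed order, mirroring Equation (36).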
Based on the chain rule, the composition of the individual tangent and adjoint operators of a larger decomposition into an algorithm that computes the derivative is very systematic and can be implemented automatically. We describe the methodology using an arbitrary decomposition of a function with n inputs and m outputs into subfunctions $\varphi_i$:
$$v_i = \varphi_i\big( (v_j)_{j \in \mathrm{Pred}(i)} \big). \tag{37}$$
With $\mathrm{Pred}(i)$ we denote all predecessors of $v_i$, i.e., all intermediate variables $v_j$ that $v_i$ directly depends on, and with $\mathrm{Succ}(i)$ we denote all successors of $v_i$. Tangent mode AD is the successive computation of tangents
$$\dot{v}_i = \sum_{j \in \mathrm{Pred}(i)} T_{\varphi_i, v_j}(\dot{v}_j), \tag{38}$$
where $T_{\varphi_i, v_j}$ denotes the tangent operator of $\varphi_i$ with respect to its argument $v_j$.
Given an input tangent, a tangent $\dot{v}_i$ represents the directional derivative of $v_i$ into the direction of the input tangent. Note that the tangent mode follows the sequence of operations in the evaluation of the primal. In adjoint mode AD, the sequence of operations is reversed. To compute an adjoint $\bar{v}_i$, we consider the adjoints of all intermediate variables that depend on $v_i$:
$$\bar{v}_i = \sum_{j \in \mathrm{Succ}(i)} T^*_{\varphi_j, v_i}(\bar{v}_j). \tag{39}$$
Usually, AD literature considers single-assignment code, where all intermediate variables are scalar and the tangent operators (i.e., multiplications with a scalar) are self-adjoint. Then the distinction between tangent and adjoint operators can be dropped. For real multivariate intermediate variables, the adjoint operator is the transpose of the Jacobian.
We briefly comment on some aspects of AD:
Computational Complexity: Implementations of tangent and adjoint mode AD differ in algorithmic complexity. The Jacobian of a function is obtained using tangent mode AD by successive executions of Equation (38), where the input tangents are seeded with unit vectors; elements of the Jacobian are then retrieved from the output tangents. In contrast, using adjoint mode AD, the Jacobian is obtained by successive applications of Equation (39), where the output adjoints are seeded with unit vectors; elements of the Jacobian are then retrieved from the input adjoints. Hence, to compute the full Jacobian, an implementation of the tangent mode scales in the number of input variables, while the adjoint mode scales in the number of output variables.
Modularity and Extensibility: An implementation of either tangent or adjoint mode AD is modular and extensible. It consists of a systematic composition of individual tangent and adjoint operators. For example, changing one subfunction in Equation (37) only affects its corresponding tangent and adjoint operator in Equation (38) resp. Equation (39). Implementations of the other operators remain unchanged.
Computational Graph: Both modes of AD can be visualized using a computational graph consisting of vertices for each variable (input, output, intermediate) and directed edges for direct dependences between variables. Assume a labeling of the edges with their corresponding tangent/adjoint operator applied to the tail (tangent) resp. head (adjoint) of the edge. The tangent of a variable is then the sum of all incoming edge tangent operators, while the adjoint is the sum of all outgoing edge adjoint operators (cf. Equations (38) and (39)).
Checkpointing – Adjoint Memory Consumption: While the tangents can be computed alongside the execution of the primal function, where the intermediate primal variables (x, v) are readily available, these variables must be kept in memory for the later computation of the adjoints. Storing all intermediate variables is expensive. Checkpointing remedies the high memory consumption by recomputing certain intermediate variables while computing the adjoint operators. In the example of Equation (36), the primal $v = g(x)$ would be recomputed while evaluating the adjoint of h.
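A minimal sketch of checkpointing for a chain of primal steps follows; the toy step function and checkpoint spacing are our own choices, and the result is validated against a finite difference of the full chain.

```python
import math

# Store only every c-th primal state; recompute the rest during the reverse sweep.
def forward_step(v):
    return math.sin(v) + 1.0              # toy primal step

def adjoint_step(v, v_bar_next):
    return math.cos(v) * v_bar_next       # adjoint of the step needs the primal state v

def gradient_with_checkpoints(v0, n, c):
    checkpoints, v = {0: v0}, v0
    for i in range(1, n + 1):             # forward sweep, keep every c-th state
        v = forward_step(v)
        if i % c == 0:
            checkpoints[i] = v
    v_bar = 1.0                           # seed: d v_n / d v_n = 1
    for i in range(n - 1, -1, -1):        # reverse sweep over steps i -> i+1
        base = (i // c) * c               # recompute v_i from the nearest checkpoint
        v = checkpoints[base]
        for _ in range(i - base):
            v = forward_step(v)
        v_bar = adjoint_step(v, v_bar)
    return v_bar                          # d v_n / d v_0

grad = gradient_with_checkpoints(0.3, 10, 4)

def run(v0, n=10):                        # full primal chain, for a finite-difference check
    for _ in range(n):
        v0 = forward_step(v0)
    return v0

h = 1e-6
fd = (run(0.3 + h) - run(0.3 - h)) / (2 * h)
assert abs(grad - fd) < 1e-6
```

With checkpoint spacing c, the memory consumption drops by roughly a factor of c at the cost of at most one extra forward pass of recomputation.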
Our model includes the special case that the mapping from ρ to the moments $u$ is given implicitly by the PDE in Equation (17). We state the two basic steps of calculating adjoints in implicit relations and refer to Appendix A for a more detailed derivation. Consider the implicit definition $F(y, x) = 0$ (with differentiable F), where we assume that it uniquely defines a differentiable mapping $x \mapsto y(x)$. We introduce an additional intermediate adjoint variable $\lambda$. Then, given the adjoint $\bar{y}$ of the output, the adjoint of the input can be computed from
$$T^*_{F, y}(\lambda) = \bar{y}, \qquad \bar{x} = -T^*_{F, x}(\lambda), \tag{40}$$
without explicit definition of the mapping $x \mapsto y(x)$.
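The implicit adjoint rule can be sketched on a parametrized linear system $F(y, x) = A(x)\,y - b = 0$, a stand-in we chose for illustration (the paper's F is the PN operator): one transposed solve yields the auxiliary adjoint, and the input adjoint follows without ever forming the Jacobian $\mathrm{d}y/\mathrm{d}x$.

```python
import numpy as np

# Implicit adjoint: solve (dF/dy)^T lambda = y_bar, then x_bar_i = -(dF/dx_i)^T lambda.
rng = np.random.default_rng(2)
A0 = rng.normal(size=(4, 4)) + 4 * np.eye(4)   # shift keeps A(x) well conditioned
A1 = rng.normal(size=(4, 4))
A2 = rng.normal(size=(4, 4))
b = rng.normal(size=4)

def A(x):
    return A0 + x[0] * A1 + x[1] * A2

def solve_y(x):                                 # primal: y(x) solves A(x) y = b
    return np.linalg.solve(A(x), b)

x = np.array([0.1, -0.2])
y = solve_y(x)
y_bar = rng.normal(size=4)                      # adjoint seed on the output

lam = np.linalg.solve(A(x).T, y_bar)            # (dF/dy)^T lambda = y_bar
x_bar = np.array([-lam @ (A1 @ y),              # -(dF/dx_i)^T lambda, with
                  -lam @ (A2 @ y)])             # dF/dx_i = A_i y

# validate against central finite differences of J(x) = y_bar . y(x)
h = 1e-6
for i in range(2):
    e = np.zeros(2); e[i] = h
    fd = (y_bar @ solve_y(x + e) - y_bar @ solve_y(x - e)) / (2 * h)
    assert abs(fd - x_bar[i]) < 1e-5
```

This is exactly the pattern of Equation (40): one adjoint solve replaces one forward solve per input parameter.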
3.2. Differentiation method for our model
In the following, we successively define adjoint operators for the individual parts of the model described in Sec. 2. All derivations follow the same scheme: first, we define the tangent operator using the directional derivative and, second, we derive the adjoint operator from the inner product identity. We elaborate on complex derivations and only state the adjoint operator for simple ones. In this paper, the only relevant derivatives are with respect to the mass concentrations ρ; hence, all quantities that do not depend on ρ are treated as constants and their adjoints are omitted.
Compared to the forward computation, adjoints are computed in reversed order. We follow this order while deriving the adjoint operators for the individual dependencies. For brevity of notation, we omit the accumulation of adjoints (tangents) and only consider the operators that describe the direct propagation. The accumulation becomes apparent in the high-level computational graph: if multiple edges leave a node, the adjoints of all outgoing edges have to be summed up.
The final operator, Equation (1), of our model computes the squared error J from a set of simulated k-ratios. Given the adjoint $\bar{J}$, which for the computation of derivatives must be chosen as $\bar{J} = 1$, the definition of the adjoint operator of the squared error from Equation (1) is trivial:
$$\bar{k}_\alpha = \big( k_\alpha(\rho(p)) - k_\alpha^{\mathrm{exp}} \big)\, \bar{J}. \tag{41}$$
Also, the adjoint operator for the normalization with standard intensities in Equation (4) is derived by simple means:
$$\bar{I}_\alpha = \frac{\bar{k}_\alpha}{I_\alpha(\rho^{\mathrm{std}})}. \tag{42}$$
We elaborate on the adjoint operators derived from Equation (5). Linearity of the integral in Equation (5) directly yields the tangent. Using the standard definitions of the inner products in $\mathbb{R}$ and in $L^2(G)$, we derive the adjoints of the three factors of the integrand. We detail the adjoint of the attenuation field:
$$\bar{A}_\alpha(x) = N_{Z_\alpha}(x)\, \Phi_\alpha(x)\, \bar{I}_\alpha, \tag{43}$$
where $N_{Z_\alpha}$ denotes the number of atoms and $\Phi_\alpha$ the generation field from Equation (5).
Accordingly, the adjoints of the number of atoms and of the generation field are derived:
$$\bar{N}_{Z_\alpha}(x) = A_\alpha(x)\, \Phi_\alpha(x)\, \bar{I}_\alpha, \qquad \bar{\Phi}_\alpha(x) = A_\alpha(x)\, N_{Z_\alpha}(x)\, \bar{I}_\alpha. \tag{44}$$
From Equation (7), we find the adjoint operator
$$\bar{\rho}_i(x) = \frac{\bar{N}_i(x)}{M_i}. \tag{45}$$
Deriving the adjoint of the attenuation operator given in Equation (6) requires more analysis. The tangent of the absorption operator is given by the directional derivative into the direction $\dot{\rho}$:
$$\dot{A}_\alpha(x) = -A_\alpha(x) \int_{\ell(x, x_D)} \sum_i \mu_i^\alpha\, \dot{\rho}_i(y)\, \mathrm{d}y. \tag{46}$$
By inserting the tangent into the definition of the adjoint, Equation (31), we derive the adjoint of the attenuation operator. In the following derivation, we rewrite the line integral from Equation (46) as an integral over the whole domain G by introducing a Dirac distribution that indicates the line segment $\ell(x, x_D)$, i.e., that is supported where the distance of y to the line segment between x and $x_D$ vanishes. (47)
By swapping the order of integration, we are able to separate $\bar{\rho}_i(y)$, but the interpretation of the Dirac distribution changes. For a given y, it now indicates the points x of all lines toward the detector that contain y; in other words, it indicates the line from y to the point reflection of the detector position $x_D$ at y. We illustrate both interpretations in the accompanying figure. From Equation (47), we identify the adjoint operator of Equation (6):
$$\bar{\rho}_i(y) = -\mu_i^\alpha \int_{\ell(y,\, 2y - x_D)} A_\alpha(x)\, \bar{A}_\alpha(x)\, \mathrm{d}x, \tag{48}$$
where the integration domain is the line between y and the point reflection of $x_D$ at y, restricted to G.
In Equation (25), the only relevant moment for the ionization field is $u_0^0$; hence, the adjoint is
$$\bar{u}_0^0(x, \varepsilon) = \sqrt{4\pi}\, \sigma_\alpha(\varepsilon)\, \bar{\Phi}_\alpha(x). \tag{49}$$
The relation between ρ and the moments $u$ is implicitly given by the PN equations and the boundary conditions, i.e., by the implicit PDE operator of Equation (17). We describe the derivation of the adjoint operators of Equation (40) for the PN equations; for a detailed derivation of Equation (40), we refer to Appendix A. Let us introduce the auxiliary adjoint λ and derive the two operators of Equation (40) in the following. Applied successively, both operators form the adjoint. In particular, we elaborate on the identities (50).
The adjoint operator applied to λ describes the continuous adjoint equation. (51) In analogy to the reversed application of adjoint operators, the adjoint equation describes the evolution of the adjoint λ in the reversed direction (w.r.t. the energy ε). Initial and boundary conditions follow from the derivation. (52) (53)
Derivation: Linearity of the PN operator in $u$ directly yields the tangent. Using integration by parts and the symmetry of $A_d$ and Q, we obtain (54).
The last two terms in Equation (54) vanish due to the initial and boundary conditions imposed on $u$ and λ. The former term is zero because of the conditions at the initial and final energies. (55)
The latter term in Equation (54) vanishes due to the choice of the boundary conditions for $u$ and λ. With the divergence theorem ($n$ the boundary normal), (56) and the boundary conditions, the integrand in Equation (56) is zero because (57)
using that the matrix defined in Equation (23) is symmetric.
The tangent follows directly from the linearity of the PN operator (and of S and Q) in ρ. For the adjoint, we find (58) and identify (59).
The adjoint of the mass concentration model, Equation (26), can again be derived by simple means. (60)
The indicator functions in Equation (27) lead to a partitioning of the integration domain G into the subdomains $G_k$ for the adjoints:
$$\bar{p}_{i,k} = \int_{G_k} \bar{\omega}_i(x)\, \mathrm{d}x. \tag{61}$$
This completes the derivation of adjoint formulations for all parts of the model presented in Sec. 2. Note that for nonlinear operators, the primal result is required to calculate the adjoint and therefore must be saved during the forward execution or recalculated; this is analogous to discrete adjoint tools. Furthermore, all derived operators are modular and only need to be assembled according to the computational graph. When replacing parts of the model, only the corresponding adjoint operator needs to be adjusted. In the examples in Sections 4.3 and 4.4, we will use this modularity and exchange the parametrizations.
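The modular assembly of adjoint operators along the computational graph can be illustrated with a minimal sketch (our illustration, not the paper's implementation): each operator provides a forward and an adjoint method, nonlinear operators store their primal input on a tape during the forward sweep, and the gradient is accumulated by applying the adjoints in reversed order.

```python
# Minimal sketch of modular operator/adjoint assembly (illustrative only).
class Square:                     # nonlinear: adjoint needs the stored primal
    def forward(self, x):
        self.x = x                # "tape": save primal state for the adjoint
        return x * x
    def adjoint(self, ybar):
        return 2.0 * self.x * ybar

class Scale:                      # linear: adjoint needs no primal state
    def __init__(self, a):
        self.a = a
    def forward(self, x):
        return self.a * x
    def adjoint(self, ybar):
        return self.a * ybar

def gradient(chain, x):
    for op in chain:              # forward sweep (stores primal values)
        x = op.forward(x)
    ybar = 1.0
    for op in reversed(chain):    # reverse sweep (adjoint operators)
        ybar = op.adjoint(ybar)
    return ybar

chain = [Scale(3.0), Square()]    # f(x) = (3x)^2, so f'(x) = 18x
print(gradient(chain, 2.0))       # 36.0
```

Exchanging one operator (e.g., a parametrization) only requires replacing its forward/adjoint pair; the sweeps remain unchanged.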
3.3. Computational effort
The motivation for deriving and implementing the adjoint formulation of our model is its efficiency advantage over finite differences and over the tangent mode of AD. Our implementation inherits the runtime complexity of the adjoint mode: the runtime of a Jacobian calculation scales with the number of output variables of the primal. The price to pay for the faster execution is a higher memory requirement, since all primal variables have to be stored during the forward execution or recalculated later.
Leveraging the convenience of automated AD tools to derive the discrete adjoint code of our implementation was not possible with current tools in Julia. However, our approach mimics the development approach that tools such as Zygote.jl or ChainRulesCore.jl take: instead of implementing only tangent and adjoint versions of low-level operations, they aim for efficient implementations of high-level methods. We adopt this approach, derive an adjoint version in the continuous framework, and implement discretized versions of the operators in code.
The high-level view offers further insight and one more possibility to reduce the computational effort for the gradient. Note that the naive adjoint implementation requires “solves” of the adjoint PDE (the same number as the forward model). We separate the multi-index into , where αL describes the k-ratio line (material, transition) and αB the beam setup (position, energy). Revisiting Equations (44), (49), and (51), we realize that the only dependence of all equations on αB is through the scalar . Instead of computing and , we consider and thereby eliminate the dependence on the beam setup αB. The operators for and do not depend on αB. The accumulation of the adjoint (cf. Equation (59), where we sum over all incoming edges in the computational graph) can then be rewritten as (62)
Instead of solving the adjoint PDE for every , it is sufficient to compute for every αL. The number of expensive adjoint PDE solves is thus reduced from to .
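The saving rests on the linearity of the adjoint solve: since the beam setup enters only through a scalar weight, the weights can be summed over all beam setups before the single expensive adjoint solve per x-ray line. A toy sketch (with a stand-in linear operation replacing the adjoint PDE solve, and hypothetical array names) demonstrates the equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
n_lines, n_beams, n = 3, 5, 4

w = rng.normal(size=(n_lines, n_beams))   # scalar seeds per (line, beam) pair
src = rng.normal(size=(n_lines, n))       # adjoint source per x-ray line

def adjoint_solve(s):
    # stand-in for the expensive adjoint PDE solve; any linear map works
    return np.cumsum(s)

# Naive: one adjoint solve per (line, beam) pair -> n_lines * n_beams solves
naive = sum(w[l, b] * adjoint_solve(src[l])
            for l in range(n_lines) for b in range(n_beams))

# Aggregated: sum the scalar weights first -> only n_lines solves
fast = sum(w[l].sum() * adjoint_solve(src[l]) for l in range(n_lines))

print(np.allclose(naive, fast))           # identical by linearity
```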
4. Numerical examples
Preliminary results for the examples presented in Sections 4.3 and 4.4 can be found in Achuda et al. (Citation2023).
4.1. Implementation and discretization
Conceptually, the discretization of the PN equations (17) and the adjoint PN equations (51) is the most challenging part. For the numerical computation of the moments and , we implemented a 3D staggered-grid finite-difference method based on the method StaRMAP presented in Seibold and Frank (Citation2014). StaRMAP is specifically designed to exploit the structure of the PN equations. Individual sets of moments are discretized on staggered grids in such a way that the sparsity of the transport matrices decouples the system and enables the use of second-order central differences to approximate the spatial derivatives on the respective displaced grids. Second-order integration in energy is achieved using a splitting of the moments and half steps for each part of the solution.
The boundary conditions in Equations (22) and (53) are used to compute moments on ghost nodes located outside the computational domain G. The values of the moments on ghost nodes are set such that the boundary condition holds for the interpolated moments on the boundary of the domain. For details, see Bünger (Citation2021).
For integrations in energy and space, as well as for the line integral of the absorption, we use the trapezoidal rule.
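For concreteness, a sketch of the absorption correction (illustrative only; the attenuation profile below is a made-up stand-in): the optical depth along the line from an emission point toward the detector is a line integral of the attenuation coefficient, approximated with the trapezoidal rule.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 101)    # parametrization of the line segment
mu = 2.0 + np.sin(3.0 * t)        # hypothetical attenuation profile along it

# trapezoidal rule for the optical depth integral of mu over t
optical_depth = np.sum((mu[1:] + mu[:-1]) / 2.0 * np.diff(t))
attenuation = np.exp(-optical_depth)

# exact integral of 2 + sin(3t) on [0, 1] is 2 + (1 - cos 3)/3
print(abs(optical_depth - (2.0 + (1.0 - np.cos(3.0)) / 3.0)) < 1e-3)
```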
4.2. Comparison: sensitivities of a 1D material with finite differences
We consider a binary material consisting of copper Cu and nickel Ni and compare sensitivities computed with our implementation (resp. , where is seeded with unit vectors) against sensitivities computed by central finite differences. This comparison addresses several questions at once:
By introducing a second parametrization, we demonstrate the modularity of our implementation by example.
We provide insight into the nature of the model. The sensitivity is the linear approximation of the intensities with respect to the model parameters; hence a large sensitivity indicates a strong dependency and a small sensitivity a weak one. This is particularly interesting for the inverse problem because it also reveals possible difficulties in the reconstruction.
Agreement with finite differences validates our implementation.
We demonstrate the claimed superiority of our method over finite differences in terms of computation time.
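The validation strategy of this section can be sketched generically (toy objective and gradient below are our illustration, not the EPMA model): compare an analytically computed gradient against second-order central finite differences and report the relative error, exactly as in Equation (65).

```python
import numpy as np

def f(p):
    return np.sum(np.sin(p) * p)          # toy scalar model output

def grad_f(p):
    # analytic gradient, playing the role of the adjoint-computed sensitivity
    return np.sin(p) + p * np.cos(p)

def fd_grad(f, p, h=1e-5):
    # central finite differences, accurate to O(h^2)
    g = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = h
        g[i] = (f(p + e) - f(p - e)) / (2.0 * h)
    return g

p = np.linspace(0.1, 1.0, 5)
rel_err = (np.linalg.norm(grad_f(p) - fd_grad(f, p))
           / np.linalg.norm(grad_f(p)))
print(rel_err < 1e-8)                     # the two gradients agree
```

Note that one central-difference gradient costs 2·np model evaluations, whereas the adjoint cost is independent of np; this is the scaling compared below.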
We introduce an alternative parametrization, where the are triangular functions covering the domain G, in particular . The partition of unity is (again) guaranteed by and . We refer to this parametrization as piecewise-linear. (63)
The adjoint of the piecewise-linear parametrization is given by (64)
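A minimal numerical sketch of this parametrization (grid sizes and field values are illustrative assumptions): the forward map evaluates the field as a linear combination of triangular (hat) basis functions, and since the map is linear, its adjoint is simply the transposed basis matrix.

```python
import numpy as np

nodes = np.linspace(0.0, 1.0, 6)          # hypothetical 1D parameter nodes

def hat(k, x):
    """Triangular basis function centered at nodes[k]."""
    h = nodes[1] - nodes[0]
    return np.clip(1.0 - np.abs(x - nodes[k]) / h, 0.0, None)

x = np.linspace(0.0, 1.0, 11)             # evaluation points in the domain
B = np.array([hat(k, x) for k in range(len(nodes))])   # basis matrix

print(np.allclose(B.sum(axis=0), 1.0))    # partition of unity on [0, 1]

params = np.random.default_rng(1).uniform(size=len(nodes))
c = B.T @ params                          # forward: parameters -> field
cbar = np.ones_like(c)                    # some adjoint field value
params_bar = B @ cbar                     # adjoint: transposed linear map
```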
Modularity shows in the fact that only the parametrization and its adjoint need to be implemented; the rest of the code remains unchanged.
Both parametrizations, the piecewise-constant one (with nk = 20 parameters) and the piecewise-linear one (also with nk = 20 parameters), are now employed to compute the sensitivities resp. . The physical setup and the settings of the PN approximation for this example can be found in . The parameter values for both parametrizations are sampled independently from a uniform distribution in .
We visualize the sensitivities by plotting and in . Using the parametrizations in this way is a slight abuse of notation, but since the parameters in and are related to points in space, an interpretation of the plot is possible. Increasing a parameter increases the mass concentration of Cu and simultaneously decreases the mass concentration of Ni, hence the positive values for Cu x-rays and the negative values for Ni x-rays. The sensitivity decreases with depth because the generation of x-rays decreases and the absorption increases.
also compares the sensitivities computed by our method (colored, solid lines) with the sensitivities computed by central finite differences (black, dashed lines). The curves coincide. Additionally, we compare the relative errors (65) of the sensitivities in . Derivatives computed by algorithmic differentiation are exact, in the sense that AD computes the derivative of the numerical approximation up to machine precision (Naumann Citation2011). Our method, as well as finite differences, only provides approximations of the derivative: for our method, the underlying approximation of the adjoint state variable introduces errors; for finite differences, the approximation error originates from the perturbation h.
In , the superiority of our method over finite differences in terms of computation time becomes apparent. We compare the runtime of a central finite-difference method (2nd order) with our implementation for two different numbers of parameters (np = 20 and np = 40). The runtime of the forward model is , and the computation of the derivative with finite differences clearly scales with the number of parameters np. The runtime of our implementation scales with the number of outputs (here ) and clearly does not scale with np. Note the minor overhead caused by the backward execution.
4.3. 1D reconstruction of a sharp and a diffusive interface
We present the reconstruction of a coated material consisting of iron Fe and nickel Ni with a sharp (discontinuous) and a diffusive (continuous) interface between the layers.
We show the necessity of choosing a suitable parametrization.
We compare the previously defined parametrizations (Equations (27) and (63)) to an additional non-linear parametrization, Equation (66).
We introduce another parametrization based on the non-linear structure of a neural network, with x as the input and as the output. The activation function for the single hidden layer is , and for the output layer we use to enforce . The parametrization uses np = 10 parameters, which we call . (66a) (66b) (66c)
The adjoints for the non-linear parametrization are given in the following. (67a) (67b) (67c) (67d) (67e)
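A hedged sketch of such a network parametrization (layer sizes, initialization, and the softmax output are our illustrative assumptions, not necessarily the exact choices of Equation (66)): one tanh hidden layer followed by a normalizing output layer, which yields weight fractions in (0, 1) that sum to one at every point x.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_species = 4, 2
W1, b1 = rng.normal(size=(n_hidden, 1)), rng.normal(size=n_hidden)
W2, b2 = rng.normal(size=(n_species, n_hidden)), rng.normal(size=n_species)

def weights(x):
    h = np.tanh(W1 @ np.atleast_1d(x) + b1)   # tanh hidden layer
    z = W2 @ h + b2
    e = np.exp(z - z.max())                   # softmax-style output layer
    return e / e.sum()                        # partition of unity by design

c = weights(0.3)
print(np.isclose(c.sum(), 1.0))               # weights sum to one
```

In contrast to the piecewise parametrizations, the partition of unity is enforced structurally, so no box constraints are needed in the optimization.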
For a fair comparison, the number of parameters in the other parametrizations is also chosen to be 10. For the binary material, the scalar completely describes the material, because .
The artificial measurements for the reconstruction are k-ratios from multiple different beam energies between and . To compute the measurements, we assume the reference material to be given. Additional PN model settings and the considered x-rays are tabulated in . We choose the BFGS method implemented in Optim.jl (Mogensen and Riseth Citation2018) as the optimization method (with additional box constraints for the piecewise-constant and the piecewise-linear parametrization, and no constraints for the non-linear parametrization).
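The structure of such a box-constrained, gradient-based reconstruction can be sketched with a toy stand-in (our illustration: a projected gradient descent on a made-up two-output k-ratio misfit, not the BFGS method or the EPMA model used in the paper):

```python
import numpy as np

k_meas = np.array([0.3, 0.7])          # artificial "measured" k-ratios

def model(p):
    # toy forward model: weight fraction p and its complement
    return np.array([p[0], 1.0 - p[0]])

def grad(p):
    # gradient of the misfit 0.5 * ||model(p) - k_meas||^2
    r = model(p) - k_meas
    return np.array([r[0] - r[1]])

p = np.array([0.9])                    # initial guess
for _ in range(200):
    # gradient step, then projection onto the box constraints [0, 1]
    p = np.clip(p - 0.1 * grad(p), 0.0, 1.0)

print(np.allclose(p, 0.3, atol=1e-6))  # converged to the reference value
```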
In , the reconstructed density of the material with a sharp interface is visualized for the different parametrizations. Starting from the initial configurations (piecewise-constant, piecewise-linear: 0.5 Fe and 0.5 Ni; non-linear: random), the parametrizations iterate toward the reference density (black line) during the optimization. We visualize the initial, first, fifth, and 100th iterations. The inability of the piecewise-linear parametrization to represent discontinuities becomes apparent. The piecewise-constant parametrization approximates the reference perfectly, but only because one of its subdomain interfaces matches the interface of the reference material. The non-linear parametrization, on the other hand, is flexible enough to identify the location of the interface and to approximate the discontinuity.
In , the reconstructed density of the material with a diffusive interface is visualized. Except for the reference material, exactly the same settings as for the reconstruction of the sharp interface are used. Again, the parametrizations iterate toward the reference density (black line). For the diffusive interface, none of the parametrizations can reconstruct the interface perfectly, but the piecewise-linear and the non-linear parametrizations perform better than the piecewise-constant parametrization.
In , we visualize the normalized errors, namely the value of the objective function , the error of the mass concentrations , and the error of additional k-ratio measurements (with beam energies and ), during the iterations of the optimization. The kinks in the objective function are an artifact of the Optim.jl implementation, whose default is a combination of a line search method (Hager and Zhang Citation2005) and the BFGS algorithm presented in Nocedal and Wright (Citation2006).
Differences in the performance of the parametrizations are clearly visible. The error in the mass concentrations for the piecewise-constant parametrization at the sharp interface is by far the smallest, owing to the possibility of a perfect reconstruction. Only the non-linear parametrization performs similarly well in both examples.
Comparing the errors in mass concentrations of the piecewise-constant and piecewise-linear parametrizations between the sharp and the diffusive interface, we observe that they vary. A correlation between the error in mass concentrations and the error of additional k-ratios would be beneficial but is hard to establish visually. Ideally, both would behave similarly, because then we could propose the error in additional k-ratios as a suitable measure of reconstruction quality. However, the evaluation of reconstruction quality measures is beyond the scope of this work and is deferred to future research.
4.4. 2D reconstruction of an ellipsoidal inclusion
We describe the use case of reconstructing an ellipsoidal copper Cu inclusion in an iron Fe substrate from k-ratios obtained with an electron beam at multiple beam positions. In , we plot the k-ratios obtained from an artificial line scan across the inclusion. Only the ten k-ratios retrieved from the five beam positions marked by black crosses are used for the reconstruction. To compute the k-ratios, the reference material is assumed to be given. We visualize the total density of the reference material in and additionally show the ionization distributions of and , which define the size of the interaction volume. The ellipsoidal Cu inclusion is clearly smaller than the interaction volumes. Additional settings of the k-ratio model are shown in .
The profile shows an increase in the k-ratio for beam positions close to . We assume an elliptical Cu inclusion, which we parametrize by , where μ1 and μ2 specify the position, a and b the scaling factors of the principal axes, r the rotation, and and the inclusion interface. The weight fractions are given by the following. (68a) (68b) (68c)
The corresponding adjoints read as follows. (69a) (69b) (69c) (69d)
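A sketch of such an ellipse parametrization (the logistic smoothing of the interface and all numeric values are our illustrative assumptions): the point is rotated into the principal-axis frame, the level-set value of the ellipse is computed, and a smooth profile maps it to a Cu weight fraction between 0 and 1.

```python
import numpy as np

def cu_weight(x, y, mu1, mu2, a, b, r, sharpness=20.0):
    # rotate the offset into the principal-axis frame of the ellipse
    dx, dy = x - mu1, y - mu2
    u = np.cos(r) * dx + np.sin(r) * dy
    v = -np.sin(r) * dx + np.cos(r) * dy
    # level-set function: < 0 inside the ellipse, > 0 outside
    level = (u / a) ** 2 + (v / b) ** 2 - 1.0
    # smooth (differentiable) interface via a logistic profile
    return 1.0 / (1.0 + np.exp(sharpness * level))

inside = cu_weight(0.0, 0.0, 0.0, 0.0, 0.3, 0.1, 0.4)   # ellipse center
outside = cu_weight(0.5, 0.0, 0.0, 0.0, 0.3, 0.1, 0.4)  # far from it
print(inside > 0.99 and outside < 0.01)
```

The smoothness of the interface keeps the parametrization differentiable in all six parameters, which is what the adjoints (69a)–(69d) require.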
For the reconstruction, we use the L-BFGS algorithm implemented in Optim.jl (Mogensen and Riseth Citation2018). The total density of the material during the iteration steps is visualized in . From the initial guess (an educated guess: higher copper concentration at , see (a)), the reconstruction converges to the reference material (see (a)). Alongside the total density of the reconstructed material, we visualize the k-ratio profiles of and in . The shapes of the curves quickly coincide with the measured k-ratios (black crosses). At iteration 30, the measured k-ratios are already well approximated; the total density, however, is not. For a sufficient reconstruction of the ellipsoidal structure, the error in the k-ratios (the objective function) must be reduced further. In , we show the value of the objective function and the error in the mass concentrations during the iterations of the optimization. For the first 50 iterations, the value of the objective function decreases, whereas the error in the mass concentrations remains close to the initial error. Only after the first 50 iterations does the error in the mass concentrations also decrease. This effect is also observed in , where the shape of the reference ellipse only starts to form after iteration 90.
5. Conclusion
In this paper, we derive an adjoint approach to compute derivatives of a deterministic k-ratio model to be applied in reconstruction in EPMA. The k-ratio model is based on the PN model, a moment expansion of the continuous slowing down approximation of the linear Boltzmann equation, which was pioneered for EPMA in Bünger, Richter, and Torrilhon (Citation2022). We formulate the model with a focus on the inverse problem and describe its application within an adjoint algorithmic differentiation framework. This work thus represents a building block in the development of a high-resolution reconstruction method that uses gradient-based optimization techniques. By replacing material parametrizations, experiments can be realized that reconstruct different material parameters and take different prior knowledge into account.
We conclude the paper with a validation of our differentiation method and with reconstruction experiments. The experiments show the influence of the material parametrization on the reconstruction and illustrate our envisioned application of the method. The combination of measurements with prior knowledge makes high-resolution reconstruction problems tractable with a reasonable amount of measurement data. At the same time, we demonstrate the extensibility of our implementation to various material structures.
High-resolution imaging, i.e., the reconstruction of material structures that are smaller than the interaction volume, is possible with sufficient data or sufficiently strong material assumptions. However, there will always be a trade-off between the accuracy of the measurements and the model on the one hand and the accuracy of the reconstruction on the other. Especially for high resolution, the analysis of this relation is key. To achieve reliable reconstruction results, the quantification and propagation of measurement and model uncertainties are necessary.
Data/code availability
The source code accompanying this article is made available via GitHub: https://github.com/tam724/pnepma.
Disclosure statement
The authors declare no conflict of interest.
Notes
1 In IUPAC notation (Jenkins et al. Citation1991), an x-ray level corresponds to an electron configuration that is typically described by the removal of an electron from the neutral ground-state configuration (e.g., K, L2). A characteristic x-ray is emitted by a transition from an initial to a final x-ray level, namely the x-ray transition (e.g., ).
2 In the literature, the mass attenuation coefficient is commonly denoted by . It depends on the medium i and the x-ray wavelength α. For brevity of notation, we write .
3 For x-ray transitions αL where the initial state is K, the x-ray production cross-section can be replaced by the ionization cross-section of the K shell (the k-ratio normalization eliminates the fluorescence yield). For other transitions, e.g., with L or M as the final state, the x-ray production cross-section should include the ionization cross-sections of all lower shells, weighted by transition probabilities.
4 For brevity of notation we divide both quantities by the respective atomic masses.
References
- Achuda, G., T. Claus, S. Richter, and M. Torrilhon. 2023. Subscale inversion of x-ray emission in electron probe microanalysis based on deterministic transport equations. (submitted to Proceedings of the 17th European Workshop on modern developments and applications in microbeam analysis - EMAS2023).
- Bünger, J. 2021. Three-dimensional modelling of x-ray emission in electron probe microanalysis based on deterministic transport equations. PhD thesis. RWTH Aachen University, Number: RWTH-2021-05180.
- Bünger, J., N. Sarna, and M. Torrilhon. 2022. Stable boundary conditions and discretization for PN equations. JCM. 40 (6):977–1003. 10.4208/jcm.2104-m2019-0231
- Bünger, J., S. Richter, and M. Torrilhon. 2022. A model for characteristic x-ray emission in electron probe microanalysis based on the (filtered) spherical harmonic (PN) method for electron transport. Microsc. Microanal. 28 (2):454–68. 10.1017/S1431927622000083
- Burghardt, O., P. Gomes, T. Kattmann, T. D. Economon, N. R. Gauger, and R. Palacios. 2022. Discrete adjoint methodology for general multiphysics problems. Struct. Multidisc. Optim. 65 (1):28. 10.1007/s00158-021-03117-5
- Buse, B., and S. L. Kearns. 2020. Spatial resolution limits of EPMA. IOP Conf. Ser: Mater. Sci. Eng. 891 (1):012005. 10.1088/1757-899X/891/1/012005
- Carpenter, P. K., and B. L. Jolliff. 2015. Improvements in EPMA: Spatial resolution and analytical accuracy. Microsc. Microanal. 21 (S3):1443–4. 10.1017/S1431927615007990
- Case, K. M., and P. F. Zweifel. 1967. Linear transport theory. Addison-Wesley series in nuclear engineering. Reading, Mass: Addison-Wesley.
- Dorigo, T., A. Giammanco, P. Vischia, M. Aehle, M. Bawaj, A. Boldyrev, P. de Castro Manzano, D. Derkach, J. Donini, A. Edelen, et al. 2023. Toward the end-to-end optimization of particle physics instruments with differentiable programming. Rev. Phys. 10:100085. 10.1016/j.revip.2023.100085
- Griewank, A. 2003. A mathematical view of automatic differentiation. Acta Numer. 12:321–98. 10.1017/S0962492902000132
- Hager, W. W., and H. Zhang. 2005. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 16 (1):170–92. 10.1137/030601880
- Hüser, J. 2022. Discrete tangent and adjoint sensitivity analysis for discontinuous solutions of hyperbolic conservation laws. PhD thesis. RWTH Aachen University.
- Innes, M. 2018. Don’t unroll adjoint: Differentiating SSA-form programs. CoRR.
- Jenkins, R., R. Manne, R. Robin, and C. Senemaud. 1991. IUPAC—nomenclature system for x-ray spectroscopy. X-Ray Spectrom. 20 (3):149–55. 10.1002/xrs.1300200308
- Heinrich, K. F. J., and D. Newbury, eds. 1991. Electron probe quantitation. USA: Springer.
- Kreyszig, E. 1991. Introductory functional analysis with applications. Wiley Classics Library. Wiley.
- Larsen, E. W., M. M. Miften, B. A. Fraass, and I. A. Bruinvis. 1997. Electron dose calculations using the method of moments. Med. Phys. 24 (1):111–25. 10.1118/1.597920
- Llovet, X., A. Moy, P. T. Pinard, and J. H. Fournelle. 2021. Electron probe microanalysis: A review of recent developments and applications in materials science and engineering. Prog. Mater. Sci. 116:100673. 10.1016/j.pmatsci.2020.100673
- Mark, J. C. 1944. The spherical harmonic method, part I. Technical Report MT 92, National Research Council of Canada.
- Mark, J. C. 1945. The spherical harmonic method, part II. Technical Report MT 97, National Research Council of Canada.
- Mogensen, P. K., and A. N. Riseth. 2018. Optim: A mathematical optimization package for Julia. Joss. 3 (24):615. 10.21105/joss.00615
- Moy, A., and J. Fournelle. 2017. Analytical spatial resolution in EPMA: What is it and how can it be estimated? Microsc. Microanal. 23 (S1):1098–9. 10.1017/S1431927617006158
- Moy, A., and J. Fournelle. 2020. Badgerfilm: An open source thin film analysis program. Microsc. Microanal. 26 (S2):496–8. 10.1017/S1431927620014853
- Moy, A., J. H. Fournelle, and A. von der Handt. 2019. Solving the iron quantification problem in low-kV EPMA: An essential step toward improved analytical spatial resolution in electron probe microanalysis—Olivines. Am. Mineralogist 104 (8):1131–42. 10.2138/am-2019-6865
- Naumann, U. 2011. The art of differentiating computer programs. Philadelphia: Software, Environments and Tools. Society for Industrial and Applied Mathematics.
- Nocedal, J., and S. Wright. 2006. Numerical optimization. Springer series in operations research and financial engineering. 2 ed. New York: Springer-Verlag.
- Olbrant, E. 2012. Models and numerical methods for time- and energy-dependent particle transport. PhD thesis. RWTH Aachen University.
- Pinard, P. T., A. Schwedt, A. Ramazani, U. Prahl, and S. Richter. 2013. Characterization of dual-phase steel microstructure by combined submicrometer EBSD and EPMA carbon measurements. Microsc. Microanal. 19 (4):996–1006. 10.1017/S1431927613001554
- Plessix, R.-E. 2006. A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophys. J.l Int. 167 (2):495–503. 10.1111/j.1365-246X.2006.02978.x
- Reimer, L. 1998. Scanning electron microscopy: Physics of image formation and microanalysis. 2 ed. Springer Series in Optical Sciences. Berlin Heidelberg: Springer-Verlag.
- Salvat, F., M. J. Berger, A. Jablonski, I. K. Bronic, J. Mitroy, C. J. Powell, and L. Sanche. 2007. Elastic Scattering of Electrons and Positrons. ICRU Report 77. International Commission on Radiation Units & Measurements, Bethesda, MD.
- Seibold, B., and M. Frank. 2014. Starmap—a second order staggered grid method for spherical harmonics moment equations of radiative transfer. ACM Trans. Math. Softw. 41 (1):1–28. 10.1145/2590808
- Stuart, A. M. 2010. Inverse problems: A Bayesian perspective. Acta Numer. 19:451–559. 10.1017/S0962492910000061
- Tarantola, A. 2005. Inverse problem theory and methods for model parameter estimation. Society for Industrial and Applied Mathematics.
Appendix A.
Adjoint continuous implicit differentiation
Consider the implicit definition: Given , find such that . We refer to the implicit function as and assume that it is unique at x. Given a (tangent) direction , we note that F does not change along the solution, i.e., its directional derivative in direction vanishes. Employing the chain rule, we find the following. (70)
We consider the previous statement in a weak sense, meaning that we test the constraint with , where T is an appropriate space. (71)
We identify the tangent of y with and use the adjoint operators and of the tangent operators and to derive the following. (72)
Comparing Equation (72) with the definition of the adjoint in Equation (31), we derive the two steps of the implicit adjoint: given , if we can find such that Equation (73a) holds, the required adjoint is given by Equation (73b). (73a) (73b)
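The two-step structure can be made concrete with a toy linear instance (our illustration; all matrices are made up): for F(x, y) = A y + B x = 0, the Jacobians dF/dy and dF/dx are A and B, step (73a) is a solve with the transposed operator, and step (73b) assembles the parameter adjoint.

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])   # dF/dy (invertible)
B = np.array([[1.0, 0.0], [0.0, 1.0]])   # dF/dx

x = np.array([1.0, 2.0])
y = np.linalg.solve(A, -B @ x)           # primal: F(x, y) = A y + B x = 0

ybar = np.array([1.0, 0.0])              # adjoint seed (derivative of y[0])
lam = np.linalg.solve(A.T, ybar)         # step (73a): transposed solve
xbar = -B.T @ lam                        # step (73b): parameter adjoint

# check against the explicit solution y(x) = -A^{-1} B x, dy/dx = -A^{-1} B
J = -np.linalg.inv(A) @ B
print(np.allclose(xbar, J.T @ ybar))     # adjoint reproduces J^T ybar
```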
Example
We consider the Poisson equation in 2D with homogeneous Dirichlet boundary conditions. Given a domain with and a suitable right-hand side, the primal problem is to find with such that (74)
To derive the adjoint, we first analyze the left-hand side of Equation (72). Using integration by parts twice, we derive the adjoint operator . (75)
The boundary integral in Equation (75) yields the boundary conditions for the adjoint . Since , we assign . From the right-hand side of Equation (72) we trivially find (76)
Computing the adjoint therefore consists of the following steps: Given , find with such that Equation (77a) holds. Then, the required adjoint is given by Equation (77b) (trivial for this example). (77a) (77b)
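A discrete analogue of this example can be verified numerically (our simplification: 1D instead of 2D, with a made-up source and objective): discretize -u'' = f with homogeneous Dirichlet conditions, take the objective J(f) = gᵀu, and exploit that the Laplacian is self-adjoint, so the adjoint problem (77a) uses the same operator and dJ/df equals the adjoint state.

```python
import numpy as np

n, h = 50, 1.0 / 51
# second-order finite-difference matrix for -d^2/dx^2 with Dirichlet BCs
L = (np.diag(2.0 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

rng = np.random.default_rng(0)
f, g = rng.normal(size=n), rng.normal(size=n)

u = np.linalg.solve(L, f)        # primal solve, cf. (74)
lam = np.linalg.solve(L.T, g)    # adjoint solve, cf. (77a)
grad_adj = lam                   # (77b): dJ/df = lambda (trivial here)

# finite-difference check of one gradient component
eps = 1e-6
e0 = np.zeros(n)
e0[0] = 1.0
fd = (g @ np.linalg.solve(L, f + eps * e0) - g @ u) / eps
print(abs(fd - grad_adj[0]) < 1e-5)
```

Since J is linear in f here, the forward difference is exact up to rounding, which makes the check particularly tight.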