478
Views
0
CrossRef citations to date
0
Altmetric
Materials Data Analysis and Utilization

Bayesian estimation to identify crystalline phase structures for X-ray diffraction pattern analysis

ORCID Icon, , , &
Article: 2300698 | Received 05 Sep 2023, Accepted 22 Dec 2023, Published online: 30 Jan 2024

ABSTRACT

Crystalline phase structure is essential for understanding the performance and properties of a material. Therefore, this study identified and quantified the crystalline phase structure of a sample based on the diffraction pattern observed when the crystalline sample was irradiated with electromagnetic waves such as X-rays. Conventional analysis necessitates experienced and knowledgeable researchers to shorten the list from many candidate crystalline phase structures. However, the Conventional diffraction pattern analysis is highly analyst-dependent and not objective. Additionally, there is no established method for discussing the confidence intervals of the analysis results. Thus, this study aimed to establish a method for automatically inferring crystalline phase structures from diffraction patterns using Bayesian inference. Our method successfully identified true crystalline phase structures with a high probability from 50 candidate crystalline phase structures. Further, the mixing ratios of selected crystalline phase structures were estimated with a high degree of accuracy. This study provided reasonable results for well-crystallized samples that clearly identified the crystalline phase structures.

GRAPHICAL ABSTRACT

IMPACT STATEMENT

We developed a method to automatically estimate crystalline phase structures from diffraction patterns using Bayesian inference. Our method successfully identified true crystalline phase structures with a high probability from 50 candidate crystalline phase structures.

1. Introduction

Crystalline phase structure is essential for understanding the performance and properties of a material. Therefore, this study identified and quantified the crystalline phase structure of a sample based on the diffraction pattern observed when the crystalline sample was irradiated with electromagnetic waves such as X-rays. The measurement of the diffraction patterns using X-rays as probes is known as X-ray diffraction (XRD). The crystal structure of a material can be understood by analyzing the diffraction peaks in the XRD data.

A typical XRD data analysis method involves a simple comparison of the measured XRD data with a database. This method first detects the diffraction peaks in the measured XRD data by the smoothed derivative [Citation1–3]. Thereafter, the diffraction angles of the detected peaks are compared with those of the diffraction patterns registered in the database and the similarity to the diffraction patterns in the database is calculated. The diffraction patterns ranked by similarity are suggested by an analyst. Thus, in a typical analysis, the experience and knowledge of the researcher are crucial to shorten the list from several candidate crystal structures. However, the typical diffraction pattern analysis is highly analyst-dependent and not objective. Additionally, there is no established method for discussing the confidence intervals of the analysis results. Consequently, the interpretation of the analysis results is highly dependent on the analysts. Diffraction pattern analysis methods have been proposed to solve such analytical problems [Citation4,Citation5].

In recent years, methods for diffraction-pattern analysis using Bayesian estimation have been proposed, allowing confidence intervals to be discussed [Citation6,Citation7]. Mikhalychev proposed a method for naive Bayesian estimation of crystalline phase structure using a simple model with intensity information only, without considering the peak profile, the background, and the Poisson noise model [Citation7]. In addition, black-box optimization methods have been proposed for hyperparameters that are subjectively determined by an analyst [Citation8]. The proposed method is effective for solving several problems in diffraction pattern analyses. However, this has not been sufficiently discussed from the perspective of automatic estimation of the crystal structure contained in a measured sample from the diffraction pattern. Identifying the crystalline phase structures contained in a diffraction pattern is challenging because the number of candidate crystalline phase structures can be in the order of tens or hundreds, leading to combination explosions. Moreover, this problem requires considerable computational time because the crystal structure contains dozens of diffraction peaks. Despite these challenges, it is necessary to establish a method for identifying crystalline phase structures from diffraction patterns with confidence intervals (probability).

This study aimed to establish a method for the automatic estimation of crystalline phase structures from diffraction patterns. The proposed method decomposes the measured diffraction patterns and automatically selects crystalline phase structures using the diffraction patterns measured at each institute associated with the crystal structures or obtained via simulations as basis functions. The proposed method makes three main contributions to literature.

  • Crystalline phase structures can be selected precisely and automatically.

  • Posterior distributions can be estimated (confidence intervals can be discussed).

  • A global solution is provided (no initial value dependence).

The proposed method, which extracts material descriptors corresponding to the crystal structure from measured diffraction patterns, is expected to play an important role in promoting the development of data-driven materials. Note that in this paper the term ‘crystal structure’ refers specifically to the crystalline phase structure.

In this study, we provide reasonable results that allow clearer identification of the crystalline phase structures for more crystalline structures. Surprisingly, our method successfully detected crystalline phase structures containing only 2 [%]. However, a crystalline phase structure that contained only 1 [%] of the sample could not be detected, although we proposed a similar crystalline phase structure with low probability, suggesting a reasonable result.

2. Concept

shows an observation process of XRD data and a conceptual diagram of the proposed method. We suppose a multitude of candidate crystal phases and structures F when preparing the materials. The crystal structures contained in the material are selected by material synthesis, manufacturing processes, etc. This study treats the control variable dealing with crystal structure selection as the indicator variable g{0,1}. Ideally, the crystalline materials produced should have diffraction line spectra corresponding to the crystal phases and structures they contain. In practice, we observe diffraction peaks whose shapes are dependent on the profile parameters Θ that correspond to the measurement environment. We considered a situation wherein only the observed diffraction data D and candidate crystal structures F were provided.

Figure 1. Observation process of XRD data and a conceptual diagram of the proposed method, that is, Bayesian inverse estimation to identify the crystalline phase structure and their known structures for XRD analysis [Citation9].

Figure 1. Observation process of XRD data and a conceptual diagram of the proposed method, that is, Bayesian inverse estimation to identify the crystalline phase structure and their known structures for XRD analysis [Citation9].

The purpose of this study is to develop an automated Bayesian XRD analysis linked a database. In advanced materials, the crystalline phase structure contained in the XRD data is often unknown. This requires a method to automatically identify the crystalline phase structures in the data from a huge number of crystal phase structures in databases. We have therefore developed a methodology to precisely estimate the crystalline phase structure from XRD data by using the Bayesian estimation. In this study, we aimed to inversely estimate the structural indicator g and profile parameter set Θ from the observed diffraction data (XRD data) shown in . The proposed method is a Bayesian inverse estimation method used to identify crystal structures for XRD analysis.

3. Model

3.1. Problem setting

The purpose is to estimate the profile parameters and the crystalline phase structures in the measured sample, considering the measured XRD data D={(xi,yi)}i=1N and the candidate crystal structure F. Here, xi(0,180) and yiN denote the diffraction angle 2θ [ ] and the diffraction intensity [counts], respectively.

The candidate crystal structure factor set F is expressed as:

(1) F={Fk|k{1,2,,K}},(1)
(2) whereFk={(pm(k),Im(k))|m{1,2,,Mk}}F,(2)

where KN is the number of candidate crystal structures and Fk is the k-th crystal structure factor. The elements of the crystal structure factor pm(k)(0,180) and Im(k)[0,1] are the diffraction angle (peak position) [ ] and relative intensity of the m-th diffraction peak in Fk for a crystal structure k. Further, MkN denotes the number of peaks in Fk. In this study, the candidate crystal structure factor set F is provided.

3.2. Profile function

XRD data can be represented by a profile function fF(xi;Θ):RR0+, which is a linear sum of the signal spectrum SF(xi;ΘS) and the background B(xi;ΘB):

(3) yifF(xi;Θ),(3)
(4)                   =SF(xi;ΘS)+B(xi;ΘB),(4)

where (xi,yi) denote the measured data points, the function SF(xi;ΘS) denotes the signal spectrum based on the candidate crystal structures F, and the function B(xi;ΘB) denotes the background. We set Θ={ΘS,ΘB} as the profile parameter set. In addition, the sets ΘS and ΘB are the signal spectrum and background parameter sets, respectively.

The signal spectrum SF(xi;ΘS) is expressed as a linear sum of the profile function (peaks) CFk(xi;ΘS(k)):RR0+ in a crystal structure Fk among the several candidates [Citation10]:

(5) SF(xi;ΘS)=k=1KhkCFk(xi;ΘS(k)),(5)

where hkR+ denotes the signal intensity of crystal structure factor Fk. The profile function CFk(xi;ΘS(k)) of candidate crystal structure k is defined as follows:

(6) CFk(xi;ΘS(k))=m=1MkIm(k)Vxi;ρmk,Σk,Ωk,rk,(6)
(7)                  =m=1MkIm(k){(1rk)G(xi;ρmk,Σk)+rkL(xi;ρmk,Ωk)},(7)
(8) whereρmk=pm(k)+μk,(8)

where μkR and rk[0,1] are the peak shift and Gauss-Lorentz ratio at the peak of crystal structure k, respectively, ρmkR is the peak position of the peak function, and the function V(xi):RR0+ is a pseudo-Voigt function [Citation11]. In addition, G(xi):RR0+ and L(xi):RR0+ are Gaussian and Lorentz functions, respectively. Σk=Σ(xi;uk,vk,wk,αk):(0,180)R+ and Ωk=Ω(xi;sk,tk,αk):(0,180)R+ are the Gaussian and Lorentzian widths of the peak, respectively, as a function of the diffraction angle xi(0,180). The width functions Σk(xi) and Ωk(xi) are expressed as

(9) Σ(xi;uk,vk,wk,αk)=A(xi;αk)uktan2xi2vktanxi2+wk,(9)
(10) Ω(xi;sk,tk,αk)=A(xi;αk)sksecxi2+tktanxi2,(10)
(11) whereA(xi;αk)=αk(xiρk)1(xi<ρk),(11)
(12)                 =sign(xiρk)αk12+αk+12,(12)

where {uk,vk,wk} and {sk,tk} are the Gaussian and Lorentzian width parameter sets, respectively. Function A(xi;αk):RR is a function expressing the peak asymmetry, and αkR+ is the asymmetry parameter for the peak function. Further, the function sign():R{1,1} is the sign function and the trigonometric function sec(x) is sec(x)=1/cos(x).

The optimization parameter set for the signal spectrum SF(xi;ΘS) is expressed as:

  ΘS={ΘS(k)|k{1,2,,K}},
whereΘS(k)={(hk,μk,αk,rk,uk,vk,wk,sk,tk)}.

Background B(xi;ΘB):RR is defined as follows:

(13) B(xi;ΘB)=aV(xi;0.0,σbg,σbg,rbg)+b,(13)

where the background parameter set is ΘB={a,σbg,rbg,b}.

3.3. Generation model

We assume that the observed data {(Y,X)}={(xi,yi)}i=0N are stochastically distributed owing to statistical noise in the measurement. Next, we consider the joint distribution P(Y,Θ), which can be expanded to P(Y,Θ)=P(Θ|Y)P(Y). Using Bayes’ theorem to swap the orders of Y and Θ, we can expand P(Y,Θ)=P(Y|Θ)P(Θ). Hence, the posterior distribution P(Θ|Y) is expressed as:

(14) P(Θ|Y)=P(Y|Θ)P(Θ)P(Y)P(Y|Θ)P(Θ),(14)

where P(Θ|Y) and P(Θ) are the posterior and prior distributions, respectively, in the Bayesian inference. Further, P(Y|Θ) is the conditional probability of Y given the model parameter set Θ, which is a probability distribution explained by error theory.

To derive P(Y|Θ), we consider the observation process of {(xi,yi)} at the observation data points. Assuming that the observed data are independent of each other, the conditional probability of the observed data {(Y,X)} can be expressed as:

(15) P(Y|Θ)=i=0NP(yi|Θ).(15)

As XRD spectra are count data, the conditional probability P(yi|Θ) of the intensity yi for the diffraction angle xi follows a Poisson distribution P(yi|fF(xi;Θ)):

(16) P(yi|Θ)=P(yi|fF(xi;Θ))(16)
(17)               =fF(xi;Θ)yiexp(fF(xi;Θ))yi!.(17)

The cost function E(Θ)R is defined by the negative log-likelihood function E(Θ)=lnP(Y|Θ) and is expressed as follows:

(18) E(Θ)=i=0NlnP(yi|Θ),(18)
(19)            =i=0NyilnfF(xi;Θ)fF(xi;Θ)lnyi!.(19)

Further, P(Θ|Y) is expressed using the cost function E(Θ) and the prior distribution P(Θ) as follows:

(20) P(Θ|Y)P(Y|Θ)P(Θ),(20)
(21)           =exp{lnP(Y|Θ)}P(Θ),(21)
(22)       =exp{E(Θ)}P(Θ).(22)

3.4. Identification of crystalline phase structures

In the analysis of XRD spectra, the crystal structure contained in the measured sample is often unknown. Therefore, it is important to accurately estimate the true crystal structure contained in the candidate crystal structures F. We introduce an indicator vector g={gk{0,1}|k{1,2,,K}}, which controls the existence of the crystal structure factors in EquationEquation (5):

(23)      fF(xi;g,Θ)=SF(xi;g,ΘS)+B(xi;ΘB),(23)
(24) SF(xi;g,ΘS)=k=1KgkhkCFk(xi;ΘS(k)),(24)

where gk=1 indicates that the crystal structure factor Fk is present in the sample. Conversely, gk=0 implies that it is absent.

We now consider the joint distribution P(g,Y,θ). P(g,Y,θ) can be expanded to P(g,Y,θ)=P(Y|g,Θ)P(g)P(Θ). According to Bayes’ theorem, the posterior distribution P(g,Θ|Y) is expressed as:

(25) P(g,Θ|Y)P(Y|g,Θ)P(g)P(Θ),(25)
(26)               =exp(E(g,Θ))P(g)P(Θ).(26)

The cost function E(g,Θ) that introduces the indicator vector g is expressed as:

(27) E(g,Θ)=i=0NlnP(yi|g,Θ),(27)
(28)           =i=0NyilnfF(xi;g,Θ)fF(xi;g,Θ)lnyi!.(28)

Using the joint distribution presented above, the indicator vector g is estimated from the marginal posterior distribution as follows:

(29) P(g|Y)=dΘP(g,Θ|Y),(29)
(30)   =P(g)dΘexp(E(g,Θ))P(Θ),(30)

We estimate the profile and background parameters using the posterior distribution P(Θ|Y,g) on parameter set Θ.

4. Algorithm

4.1. Replica exchange monte carlo method — REMC method

We perform posterior visualization and the maximum a posteriori (MAP) estimation through sampling from the posterior distribution. A popular sampling method is the Monte Carlo (MC) method, which may be bounded by local solutions for cases when the initial value is affected or the cost function landscape is complex.

Therefore, the replica exchange Monte Carlo (REMC) method [Citation12,Citation13] was used to estimate the global solution. For sampling using the REMC method, a replica was prepared with the inverse temperature β introduced as follows:

(31) P(g,Θ|Y;β=βτ)=exp(βτE(g,Θ))P(g)P(Θ),(31)

where the inverse temperature β is 0=β1<β2<<βτ<βT=1. For each replica, the parameters were sampled using the Monte Carlo method.

5. Technique

5.1. Tricks for high speeds

This sub-section describes the techniques used to realize Bayesian inference of XRD spectra. In XRD spectral analysis, the number of candidate crystal structure factors {Fk}k=1K and the number of peaks Mk for each crystal structure factor Fk are enormous. Therefore, to calculate the cost function E() for each sample, multiple loops of i=1Nk=1Km=1MkE(xi) must be computed. Mk is an immutable value because it is inherently determined by the crystal structure. Although reduction in the number of data N by downsampling is feasible, it is expected that the peak structure will be broken or the separation accuracy will be significantly reduced owing to sharp XRD peaks.

Herein, we focused on the number of candidate crystal structures K. This study screened the candidate crystallographic structure factors. To calculate the cost function E(), the crystal structures with gk=0 need not be calculated. In other words, only the selected crystal structures {Fk|gk=1} need to be considered. In the proposed method, we compute i=1Nk{Fk|gk=1}m=1MkE(xi) where n({Fk|gk=1})<K.

5.2. Rough pre-screening

In this study, we screened candidate crystallographic structures as described in sub-Section 5.1. This subsection describes the screening procedure. The similarity between the observed XRD data D and the crystal structure factor Fk was calculated and screening was performed by thresholding the similarity.

presents a supplementary diagram of the similarity calculation procedure, where parts (a) and (b) show the observed XRD data D and crystal structure factor Fk, respectively. We resampled the data points close to pm(k) of Fk from the observed data D. The resampled data points are indicated by the red points in .

Figure 2. Supplementary diagram of the similarity calculation procedure. [(a) Observed XRD data D and (b) Crystal structure factor Fk].

Figure 2. Supplementary diagram of the similarity calculation procedure. [(a) Observed XRD data D and (b) Crystal structure factor Fk].

The resampled data points are denoted by the vector yNMK. The intensity vector of the crystal structure is denoted by Ik=(I1(k),I2(k),,IMk(k))TRMk. Our method computed the similarity between the vectors y and Ik for each crystal structure k. In this study, we used cosine similarity as the vector similarity.

6. Scope and limitations

This section presents the three limitations of the proposed method:

  • The proposed method cannot refine the structural parameters owing to only the crystal structure selection and profile parameters being used as probability variables. Therefore, for precise crystal structure analysis, Rietveld analysis [Citation14–17] must be performed with reference to the posterior distribution of the selected crystal structure and profile parameters.

  • The proposed method automatically selects the crystal structure contained in the measurement sample from the candidate crystal structures. Therefore, crystal structures that are not included in the candidates or unknown crystal structures cannot be analyzed.

  • The proposed method may provide confusing results when similar crystal structures are included in the candidates. We need to combine other measurement techniques to refine the crystalline phase structure.

7. Configuration

7.1. Configuration of prior distribution

We set the prior distribution over the parameter set ΘS of the profile function as follows:

PatternIntensity:hkGkG=4.00,θG=yminymax4,
Patternshift:μkN(μN=0.00,σN=0.05),
Peakasymmetry:αkG(kG=5.00,θG=0.25),
LorentzGaussianratio:rkU(uU=0.00,lU=1.00),
Gaussianwidthparameter:ukG(kG=1.00,θG=0.10),
Gaussianwidthparameter:vkG(kG=1.00,θG=0.10),
Gaussianwidthparameter:wkG(kG=2.00,θG=0.05),
Lorentzwidthparameter:skG(kG=2.00,θG=0.05),
Lorentzwidthparameter:tkG(kG=1.00,θG=0.10).

In addition, we set the prior distribution of the background (BG) parameter ΘB as follows:

BGpeakarea:aG(kG=2.00,θG=ymax),
BGpeakwidth:σbgG(kσ=2.00,θσ=2.50),
LorentzGaussianratio:rbgU(uU=0.00,lU=1.00),
BaselineofBG:bUuU=yminymin2,lU=ymin+ymin2.

where the probability distribution G(kG,θG) is the gamma distribution and kGR+ and θGR+ are the shape and scale parameters, respectively. The probability distribution N(μN,σN) is a normal distribution, and μNR and σNR+ are the mean and standard deviation, respectively. Whereas, the probability distribution U(uU,lU) is a uniform distribution, with uUR and lUR being the maximum and minimum values, respectively. Further, the values yminN,ymaxN are ymin=min(y) and ymax=max(y), where y=(y1,y2,,yN)T.

7.2. Configuration of the sampling algorithm

For the exchange MC simulation, we performed 1000 steps of calculations and rejected 1000 of them as burn-in. The inverse temperature was set as follows:

(32) βτ=0(τ=0)ητT(τ0),(32)
(33) whereτ{0,1,2,,T},(33)

where the proportion ηR+ was set to η=1.2, and the number of temperatures TN was set to T=64. The exchange of parameter sets between replicates was performed at each step.

7.3. Calculator specification

The calculator specifications were AMD Ryzen Thread ripper 3990X (64 core, 128 thread), 256GB DDR4–3200/PC4-25600SD, Ubuntu 18.04.5 LTS. We performed sampling using the REMC method with 32 threads.

7.4. Configuration of candidate crystal structures

We prepared 50 candidate crystal structures from the AtomWork [Citation18], which is an inorganic material database containing data on the crystal structures, X-ray diffraction, properties, and state diagrams of inorganic materials extracted from scientific and technical literature. We selected 50 candidates based on the condition that they contained titanium (Ti) or oxygen (O) in their composition because this study analyzed the XRD data of the titanium dioxide TiO2 samples. lists the 50 prepared candidate crystal structures.

Table 1. Fifty candidate crystal structures prepared from the AtomWork [Citation17], which is the inorganic material database. Two MnO2 structure types were obtained from the literature [Citation19,Citation20], respectively.

8. Results and discussion

8.1. Fitting results in actual measurement data

We conducted a calculation experiment on the measured XRD data. The results for artificial data, for which the true values are known, are presented in Appendix A. The measurement sample was a mixture of multiple types of TiO2: Anatase, Brookite, and Rutile. The mixture ratios were equal (1/1/1 wt. %). We prepared measurement samples such that the crystalline phases were homogeneous. Consequently, we measured the XRD data by using monochromatic X-rays of CuKα1. Further, a non-reflecting plate cut from a specific orientation of a single crystal of silicon was used as the sample plate. The diffraction angles 2θ were in the range of 10–60[ ], with 2θ of x=(10.00,10.02,10.04,,60.00)T.

presents the selection results for each temperature obtained using the REMC method. The x- and y-axes denote the candidate crystal structures and inverse temperature index τ, respectively. This figure visualizes the probability of indicator P(g|Y;β=βτ) [%]. The proposed method estimated the crystal structure of a sample from 50 candidates. Candidate crystal structures were obtained from AtomWork as described in Section 7.4. A high index corresponds to a lower temperature. The color scale indicates the probability of gk=1 calculated from the sampling frequency. The dark red color indicates the presence of a crystal structure in the measured sample. The result for the lowest temperature (τ=64) shows that our method could select the true crystal structures, that is, Anatase, Brookite, and Rutile, with 100 [%] probability. A computational time of approximately 3 h was required to obtain this result. Therefore, our method can be used to estimate the crystal structures of a sample by analyzing the full diffraction profile using Bayesian inference. Thus, the contribution of our method is the simultaneous identification of profile parameters and crystal structures and the provision of their posterior distributions.

Figure 3. Selection results from 50 candidates for each temperature in the REMC method. The x- and the y-axes denote the candidate crystal structures and index of inverse temperature τ, respectively. This figure shows a visualization of the indicator probability P(g|Y;β=βτ) [%]. The large index corresponds to lower temperatures. The color scale denotes the sampling frequency of gk=1 on a log scale. The dark red indicates the presence of crystal structure in the measured sample.

Figure 3. Selection results from 50 candidates for each temperature in the REMC method. The x- and the y-axes denote the candidate crystal structures and index of inverse temperature τ, respectively. This figure shows a visualization of the indicator probability P(g|Y;β=βτ) [%]. The large index corresponds to lower temperatures. The color scale denotes the sampling frequency of gk=1 on a log scale. The dark red indicates the presence of crystal structure in the measured sample.

As shown in , the selection probability of Brookite decreases at medium to high temperatures compared to those of Anatase and Rutile. This suggests that Brookite was more difficult to identify than Anatase and Rutile. In crystallography, Brookite is a low-temperature phase and is known to exhibit a poorer crystal structure than Rutile, which is high-temperature-stable. This difficulty in its determination is believed to originate from the low crystallinity of Brookite.

An analysis using all 50 candidates would require a considerable amount of time. Therefore, we performed prescreening using the cosine similarity described in Section 5. shows the cosine similarity between the measured XRD data D and crystal structure factors F during prescreening. In this figure, the red line denotes the threshold value, which was set to 0.5. The y-axis denotes cosine similarity. We performed the analysis using crystal structure factors with a cosine similarity greater than 0.5. This prescreening narrowed the list from 50 to 12 candidates. This is expected to result in significant reduction in the computational costs.

Figure 4. Cosine similarity between the measurement XRD data D and the crystal structure factors F for prescreening. In this figure, the red line denotes the threshold value set at 0.5.

Figure 4. Cosine similarity between the measurement XRD data D and the crystal structure factors F for prescreening. In this figure, the red line denotes the threshold value set at 0.5.

We analyzed the measured XRD data using the 12 candidates that were narrowed down by prescreening. presents the selection results for each temperature obtained using the REMC method. The x- and y-axes denote the candidate crystal structures and the index of the inverse temperature τ. The proposed method could select the true crystal structures of Anatase, Brookite, and Rutile with 100 [%] probability. The crystal structures and results of sampling all 50 candidates were successfully identified (shown in ). The computation required approximately 1 h, and pre-screening reduced the computational cost by a factor of three. These results indicate that prescreening can effectively improve the efficiency of the calculations. However, prescreening may exclude true crystal structures from the candidates.

Figure 5. Selection result from 12 candidates for each temperature in the REMC method. The x- and the y-axes denote the candidate crystal structures and the index of the inverse temperature τ. This figure shows a visualization of the indicator probability P(g|Y;β=βτ) [%]. The large index corresponds to lower temperatures. The color scale denotes the sampling frequency of gk=1 on a log scale. The dark red color indicates the presence of crystal structures in the measured sample.

Figure 5. Selection result from 12 candidates for each temperature in the REMC method. The x- and the y-axes denote the candidate crystal structures and the index of the inverse temperature τ. This figure shows a visualization of the indicator probability P(g|Y;β=βτ) [%]. The large index corresponds to lower temperatures. The color scale denotes the sampling frequency of gk=1 on a log scale. The dark red color indicates the presence of crystal structures in the measured sample.

shows the fitting results via the profile function in the measurement XRD data. In , the black and red lines indicate the measured XRD data and the fitting profile functions, respectively. shows the peak components of the three crystal structures of TiO2: Anatase, Brookite, and Rutile. The red, green, and blue lines indicate the peaks of Anatase, Brookite, and Rutile, respectively. As shown in this figure, the estimated profile function facilitated a good fit of the XRD data. The mean Poisson cost E(Θˆ) was 5.026.

Figure 6. Result of profile analysis in the measurement XRD data using our method [(a): Fitting result via profile function in the measurement XRD data. In this figure, the black and the red lines indicate the measurement XRD data and the fitting profile functions, respectively. (b): Peak components in three crystal structures of TiO2; Anatase, Brookite, and rutile. The red, green, and blue lines indicate the peaks of Anatase, Brookite, and rutile, respectively].

Figure 6. Result of profile analysis in the measurement XRD data using our method [(a): Fitting result via profile function in the measurement XRD data. In this figure, the black and the red lines indicate the measurement XRD data and the fitting profile functions, respectively. (b): Peak components in three crystal structures of TiO2; Anatase, Brookite, and rutile. The red, green, and blue lines indicate the peaks of Anatase, Brookite, and rutile, respectively].

shows the posterior distribution of the peak area ratio in the triangular diagram when the measurement XRD data were analyzed using the proposed method. In , the blue dots indicate the sampling results obtained using the REMC method. The units on the axes are percentages [%]. As evident, the results of the area ratio estimation exhibited extremely high precision.

Figure 7. Posterior distribution of the peak area ratio on the triangular diagram when analyzing the measurement XRD data using the proposed method. The units for the axes are percentages [%].

Figure 7. Posterior distribution of the peak area ratio on the triangular diagram when analyzing the measurement XRD data using the proposed method. The units for the axes are percentages [%].

shows an expanded view of the posterior distribution of the peak used to determine its shape of the posterior distribution. In , the red, green, and blue histograms correspond to the posterior distributions of Anatase, Brookite, and Rutile, respectively. As evident, the posterior distribution of Rutile, which has the best crystallinity, exhibited a sharper shape than Anatase and Brookite. The shape of the posterior distribution was similar to that of a quadratic function, where the y-axis represents a logarithmic scale. This implies that the posterior distribution exhibits a Gaussian probability distribution shape. The MAP estimate of the ratio was Anatase:Brookite:Rutile=35.7:31.8:32.5 [%]. Because the structural ratio of the preparation is Anatase:Brookite:Rutile=33.33˙:33.33˙:33.33˙ [%], the proposed method is considered a reasonable estimation.

Figure 8. Expanded view of the posterior distribution of the peak area ratio when analyzing the measurement XRD data using the proposed method. The units for the axes are percentages [%].

Figure 8. Expanded view of the posterior distribution of the peak area ratio when analyzing the measurement XRD data using the proposed method. The units for the axes are percentages [%].

shows the posterior distribution of the profile parameters when analyzing the measurement XRD data using the proposed method. The red, green, and blue histograms represent the posterior distributions of Anatase, Brookite, and Rutile, respectively. ) show the peak height h, peak shift μ, Gaussian – Lorentz ratio r, and asymmetry parameter α, respectively. ) show the Gaussian width Σ(xi;uk,vk,wk,αk) and the Lorentz width Ω(xi;uk,vk,wk,αk), where xi is 2θ=60 [°]. As indicated in part (a) of this figure, the height h can be estimated with high precision using the proposed method. The figure shows that the peak shifts for all three crystal structures were positive (μ=0.040.06). This may be attributed to minute calibration deviations in the measurement device such as eccentricity and zero-point errors. As shown in parts (e) and (f) of this figure, the peak width of Rutile was narrow, indicating good crystallinity. Furthermore, we confirmed that Rutile with good crystallinity exhibited a sharp posterior distribution shape for most of the profile parameters. By contrast, Brookite with poor crystallinity, tended to exhibit a broad posterior distribution. This indicates that a structure with good crystallinity provides a highly precise estimation.

Figure 9. Posterior distribution of the profile parameters in the measurement XRD data. The red, green, and blue histograms correspond to the posterior distribution of Anatase, Brookite, and rutile, respectively. [(a): Peak height h, (b): Peak shift μ, (c): Gauss-Lorentz ratio r, and (d): asymmetry parameter α]. (e) and (f) are Gauss width Σ(xi;uk,vk,wk,αk) and Lorentz width Ω(xi;uk,vk,wk,αk), where xi is 2θ=60 [ ]. The black dot-dash line is a true parameter of the profile function.

Figure 9. Posterior distribution of the profile parameters in the measurement XRD data. The red, green, and blue histograms correspond to the posterior distribution of Anatase, Brookite, and rutile, respectively. [(a): Peak height h, (b): Peak shift μ, (c): Gauss-Lorentz ratio r, and (d): asymmetry parameter α]. (e) and (f) are Gauss width Σ(xi;uk,vk,wk,αk) and Lorentz width Ω(xi;uk,vk,wk,αk), where xi is 2θ=60 [ ∘]. The black dot-dash line is a true parameter of the profile function.

8.2. Analysis in measured data of non-uniform mixing ratios

Here, we show the analysis results for the measured data of non-uniform mixing ratios. We measured the samples with two different mixing ratios: (Case-01) Rutile/Anatase=80/20mass% and (Case-02) Brookite/Rutile/Anatase=97/1/2mass%. In addition to these ratios, the conditions were identical to those described in Section 8.1. The proposed method was applied to the measured data in both cases. The data for Case-01 and Case-02 are shown in and of Appendix B, respectively.

First, we present the analysis results for Case-01. The data for Case-01 were included in Anatase and Rutile. Prescreening was used to reduce the number of candidates from 50 to 8. presents the selection results for each temperature using the REMC method. The x- and y-axes denote the candidate crystalline phase structures and the index of the inverse temperature τ. The proposed method could select the true crystalline phase structures, which were Anatase and Rutile with a probability of 100 [%]. As shown in , Anatase was not sampled using the REMC method at high temperatures. As evident, the determination of Anatase is more challenging than that of Rutile. This indicates that estimating structures with small component ratios is difficult. The fitting results are presented in in Appendix B.

Figure 10. Selection result for each temperature in the REMC method in case-01. The x- and the y-axes denote the candidate crystal structures and the index of the inverse temperature τ. This figure shows a visualization of the indicator probability P(g|Y;β=βτ) [%]. The large index corresponds to lower temperatures. The color scale denotes the sampling frequency of gk=1 on a log scale. The dark red color indicates the presence of crystalline phase structures in the measured sample.

Figure 10. Selection result for each temperature in the REMC method in case-01. The x- and the y-axes denote the candidate crystal structures and the index of the inverse temperature τ. This figure shows a visualization of the indicator probability P(g|Y;β=βτ) [%]. The large index corresponds to lower temperatures. The color scale denotes the sampling frequency of gk=1 on a log scale. The dark red color indicates the presence of crystalline phase structures in the measured sample.

Second, we present the analysis results for Case-02. The data for Case-02 included Anatase, Brookite, and Rutile. These data have a significantly low ratio of Anatase and Rutile. In particular, Rutile accounted for only 1%. Prescreening was used to reduce the number of candidates from 50 to 15. presents the selection results for each temperature obtained using the REMC method. The x- and y-axes denote the candidate crystalline phase structures and the index of the inverse temperature τ. The proposed method selected Brookite and Anatase as the MAP estimation. The two selected structures were included in the true structures. Surprisingly, we succeeded in detecting Anatase, which accounted for only 2%. As shown in , the estimation of Anatase, which has a low mixing ratio, was difficult. However, it can be confirmed that Rutile was not selected by the proposed method. We consider this a reasonable result because Rutile accounted for only 1% of the measured sample. The figure also shows that the TiO2 of MnO2 structure type was selected with a probability of approximately 50%. lists the cell lattice parameters of MnO2 and Rutile crystal structures in TiO2 [Citation19,Citation21]. The TiO2 of MnO2 structure type was obtained from literature [Citation19]. As shown in the table, MnO2 and Rutile crystalline phase structures are similar. Therefore, it was suggested that MnO2 structure type should be sampled instead of Rutile. Notably, the TiO2 of MnO2 structure type was not selected as the MAP solution. Our method may suggest a crystal structure similar to the true crystal structure by presenting a low probability of existence if the candidate does not contain a true crystalline phase structure. It is important to confirm the behaviour when the true crystalline phase structure is not included and when the crystal lattice is scaled, and this will be addressed in future work. The fitting results are shown in in Appendix B.

Figure 11. Selection result for each temperature in the REMC method in case-02. The x- and the y-axes denote the candidate crystal structures and the index of inverse temperature τ. This figure shows a visualization of the indicator probability P(g|Y;β=βτ) [%]. The large index corresponds to lower temperatures. The color scale denotes the sampling frequency of gk=1 on a log scale. The dark red color indicates the presence of crystalline phase structures in the measured sample.

Figure 11. Selection result for each temperature in the REMC method in case-02. The x- and the y-axes denote the candidate crystal structures and the index of inverse temperature τ. This figure shows a visualization of the indicator probability P(g|Y;β=βτ) [%]. The large index corresponds to lower temperatures. The color scale denotes the sampling frequency of gk=1 on a log scale. The dark red color indicates the presence of crystalline phase structures in the measured sample.

Table 2. Cell lattice parameters of MnO2 and rutile crystal structure types in TiO2 [Citation19,Citation21].

9. Conclusion

The knowledge of the probability that a sample contains a candidate crystal structure from full-range XRD data considering both the diffraction angles of the peaks and the profile functions, is essential. This study aimed at the Bayesian estimation of the structure contained in a sample from a large number of crystal structure candidates in the analysis of XRD data. Therefore, indicator vectors were introduced into the profile function and the XRD data were analyzed by sampling the posterior distribution using the REMC method. Consequently, we succeeded in identifying the true crystal structures of 50 candidates with high probability. The proposed method also estimated the mixing ratio of the selected crystal structures with high precision.

In this study, we provide reasonable results that allow clearer identification of the crystal structure for more crystalline structures. Surprisingly, our method successfully detected crystal structures that accounted for only 2 [%] of the sample. However, a crystal structure that contained only 1 [%] of the sample could not be detected, although we proposed a similar crystal structure with low probability, suggesting a reasonable result. Our method is a highly sensitive and probabilistic analysis method that can automatically identify crystal structures from full-range XRD data.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by MEXT/JSPS KAKENHI Grant Numbers [18K05191, and 19K12154].

References

  • Press WH, Teukolsky SA. Savitzky-golay smoothing filters. Comput Phys. 1990;4(6):669–20. doi:10.1063/1.4822961
  • Taupin D. Automatic peak determination in x-ray powder patterns. J Appl Crystallogr. 1973;6(4):266–273. doi:10.1107/S0021889873008666
  • Huang TC. Precision peak determination in x-ray powder diffraction. Aust J Phys. 1988;41(2):201–212. doi:10.1071/PH880201
  • Surdu V-A, Győrgy R. X-ray diffraction data analysis by machine learning methods—a review. Appl Sci. 2023;13(17):9992. doi: 10.3390/app13179992
  • Greasley J, Hosein P. Exploring supervised machine learning for multi-phase identification and quantification from powder X-ray diffraction spectra. J Mater Sci. 2023;58:1–15. doi:10.1007/s10853-023-08343-4
  • Fancher CM, Han Z, Levin I, et al. Use of Bayesian inference in crystallographic structure refinement via full diffraction profile analysis. Sci Rep. 2016;6:8. doi: 10.1038/srep31625
  • Mikhalychev A, Ulyanenkov A. Bayesian approach to powder phase identification. J Appl Crystallogr. 2017 Jun;50(3):776–786. doi: 10.1107/S1600576717004393
  • Suzuki Y. Automated data analysis for powder X-ray diffraction using machine learning. Synchrotron Radiat News. 2022;35(4):9–15. doi:10.1080/08940886.2022.2112496
  • Momma K, Izumi F. VESTA: a three-dimensional visualization system for electronic and structural analysis. J Appl Crystallogr. 2008 Jun;41(3):653–658. doi: 10.1107/S0021889808012016
  • Toraya H. Array-type universal profile function for powder pattern fitting. J Appl Crystallogr. 1990 Dec;23(6):485–491. doi: 10.1107/S002188989000704X
  • Wertheim GK, Butler MA, West KW, et al. Determination of the gaussian and lorentzian content of experimental line shapes. Rev Sci Instrum. 1974;45(11):1369–1371. doi: 10.1063/1.1686503
  • Hukushima K, Nemoto K. Exchange Monte Carlo method and application to spin glass simulations. J Phys Soc Jpn. 1996;65(6):1604–1608. doi:10.1143/JPSJ.65.1604
  • Nagata K, Sugita S, Okada M. Bayesian spectral deconvolution with the exchange Monte Carlo method. Neural Netw. 2012;28:82–89. doi:10.1016/j.neunet.2011.12.001
  • Rietveld HM, HM Rietveld. Line profiles of neutron powder-diffraction peaks for structure refinement. Acta Crystallogr. 1967;22(1):151–152. doi:10.1107/S0365110X67000234
  • Rietveld HM. A profile refinement method for nuclear and magnetic structures. J Appl Crystallogr. 1969 Jun;2(2):65–71. doi: 10.1107/S0021889869006558
  • Rodríguez-Carvajal J. Recent advances in magnetic structure determination by neutron powder diffraction. Phys B Condens Matter. 1993;192(1–2):55–69. doi:10.1016/0921-4526(93)90108-I
  • Akselrud L, Grin Y. Wincsd: software package for crystallographic calculations (version 4). J Appl Crystallogr. 2014;47(2):803–805. doi:10.1107/S1600576714001058
  • Xu Y, Yamazaki M, Villars P. Inorganic materials database for exploring the nature of material. Jpn J Appl Phys. 2011 Nov;50(11S):11RH02. doi: 10.1143/JJAP.50.11RH02
  • Akimoto J, Gotoh Y, Oosawa Y, et al. Topotactic oxidation of ramsdellite-type li0.5tio2, a new polymorph of titanium dioxide: Tio2(r). J Solid State Chem. 1994;113(1):27–36. doi: 10.1006/jssc.1994.1337
  • Latroche M, Brohan L, Marchand R, et al. New hollandite oxides: Tio2(h) and k0.06tio2. J Solid State Chem. 1989;81(1):78–82. doi: 10.1016/0022-4596(89)90204-1
  • Kim D-W, Enomoto N, Nakagawa Z-E, et al. Molecular dynamic simulation in titanium dioxide polymorphs: rutile, brookite, and anatase. J Am Ceram Soc. 1996;79(4):1095–1099. doi: 10.1111/j.1151-2916.1996.tb08553.x

Appendix A.

Fitting results in artificial data

We performed a calculation experiment with the artificial data. It was assumed that the XRD data were measured for a sample containing the three crystal structures of TiO2: Anatase, Brookite, and Rutile. We generated artificial data based on a set of true parameters and the model defined in Section 3. lists the true parameters of the profile function for the artificial data. We set the x-axis 2θ between 10° and 60° in increments of 0.02[ ], such that 2θ is x=(10.00,10.02,10.04,,60.00)T. The background parameters ΘB were set to a=5000.0, σbg=6.0, b=10.0, rbg=0.80. We performed the analysis using the proposed method without knowing the crystal structure of the sample. The purpose of this experiment was to select the true crystal structures (Anatase, Brookite, and Rutile of TiO2) from the XRD data.

shows the selection results obtained using the proposed method on artificial XRD data. This figure shows the indicator probability P(g=1|Y) [%]. The proposed method estimates the crystal structure of a sample from 50 candidates. Candidate crystal structures were obtained from AtomWork as described in Section 7.4. The y- and x-axes indicate the crystal structure and indicator probability, respectively. As evident, the proposed method could select true crystal structures, that is, Anatase, Brookite, and Rutile, from 50 candidates with 100 [%] probability.

shows the fitting results obtained using the profile function for the artificial data. In , the black and red lines represent the artificial data and fitting profile function, respectively. shows the peak components of the three crystal structures of TiO2: Anatase, Brookite, and Rutile. The red, green, and blue lines represent peaks of Anatase, Brookite, and Rutile, respectively. As evident, the estimated profile function provides a good fit. The mean Poisson cost E(Θˆ) was 4.476, which is close to the true cost of 4.469 given by the true parameters.

shows the posterior distributions of the profile parameters. The red, green, and blue histograms represent the posterior distributions of Anatase, Brookite, and Rutile, respectively. show the peak height h, peak shift μ, Gaussian–Lorentz ratio r, and asymmetry parameter α, respectively. show the Gaussian widths Σ(xi;uk,vk,wk,αk) and Lorentz width Ω(xi;uk,vk,wk,αk), where xi is 2θ=60 [ ]. In , the black dot-dashed line represents the true parameter of the profile function. As evident, the expectation of the posterior distribution is in good agreement with the true values. In addition, the shape of the histogram is a quadratic function, where the y-axis represents the log scale. This implies that the posterior distribution exhibits a Gaussian probability distribution shape. Therefore, this experiment confirms that the proposed method works well on artificial data.

Appendix B.

Data and Fitting in Section 8.2

and show the fitting result of Case-01 and Case-02 in Section 8.2, respectively. In Figure (a), the black and red lines represent the measurement XRD data and the fitting profile function, respectively. Figure (b) shows the peak components. The red, green, and blue lines represent peaks of Anatase, Brookite, and Rutile, respectively.

Figure B1. Result of profile analysis in case-01 using our method [(a): Fitting result via profile function in the measurement XRD data. The black and the red lines represent the measurement XRD data and the fitting profile function, respectively. (b): Peak components in two crystal structures of TiO2; Anatase and rutile. The red and blue lines denote peaks of Anatase and rutile, respectively].

Figure B1. Result of profile analysis in case-01 using our method [(a): Fitting result via profile function in the measurement XRD data. The black and the red lines represent the measurement XRD data and the fitting profile function, respectively. (b): Peak components in two crystal structures of TiO2; Anatase and rutile. The red and blue lines denote peaks of Anatase and rutile, respectively].

Table B1. True parameter of profile function in artificial data. We assumed that the XRD data were measured on a sample containing three crystalline phase structures of TiO2: Anatase, Brookite, and rutile.

Figure B2. Selection result from 50 candidates for each temperature using the REMC method for artificial XRD data. The x- and the y-axes denote the candidate crystal structures and the index of the inverse temperature τ. This figure shows a visualization of the indicator probability P(g|Y;β=βτ) [%]. The large index corresponds to lower temperatures. The color scale denotes the sampling frequency of gk=1 on a log scale. The dark red indicates the presence of the crystal structure in the measured sample.

Figure B2. Selection result from 50 candidates for each temperature using the REMC method for artificial XRD data. The x- and the y-axes denote the candidate crystal structures and the index of the inverse temperature τ. This figure shows a visualization of the indicator probability P(g|Y;β=βτ) [%]. The large index corresponds to lower temperatures. The color scale denotes the sampling frequency of gk=1 on a log scale. The dark red indicates the presence of the crystal structure in the measured sample.

Figure B3. Result of profile analysis in the artificial data using our method [(a): Fitting result via profile function in the artificial data. In this figure, the black and the red lines indicate the artificial data and the fitting profile function, respectively. (b): Peak components in three crystal structures of TiO2: Anatase, Brookite, and rutile. The red, green, and blue lines indicate peaks of Anatase, Brookite, and rutile, respectively].

Figure B3. Result of profile analysis in the artificial data using our method [(a): Fitting result via profile function in the artificial data. In this figure, the black and the red lines indicate the artificial data and the fitting profile function, respectively. (b): Peak components in three crystal structures of TiO2: Anatase, Brookite, and rutile. The red, green, and blue lines indicate peaks of Anatase, Brookite, and rutile, respectively].

Figure B4. Posterior distribution of the profile parameters in the artificial data. The red, green, and blue histograms are the posterior distribution of Anatase, Brookite, and rutile, respectively. [(a): Peak height h, (b): peak shift μ, (c): Gauss-Lorentz ratio r, and (d): asymmetry parameter α]. (e) and (f) are Gauss width Σ(xi;uk,vk,wk,αk) and Lorentz width Ω(xi;uk,vk,wk,αk), where xi is 2θ=60 [degree]. The black dot-dashed line denotes the true parameter of the profile function.

Figure B4. Posterior distribution of the profile parameters in the artificial data. The red, green, and blue histograms are the posterior distribution of Anatase, Brookite, and rutile, respectively. [(a): Peak height h, (b): peak shift μ, (c): Gauss-Lorentz ratio r, and (d): asymmetry parameter α]. (e) and (f) are Gauss width Σ(xi;uk,vk,wk,αk) and Lorentz width Ω(xi;uk,vk,wk,αk), where xi is 2θ=60 [degree]. The black dot-dashed line denotes the true parameter of the profile function.

Figure B5. Result of profile analysis in case-02 using our method [(a): Fitting result via profile function in the measurement XRD data. The black and red lines denote the measurement XRD data and the fitting profile function, respectively. (b): Peak components in two crystal structures of TiO2; Anatase and Brookite. The red and green lines denote peaks of Anatase and Brookite, respectively].

Figure B5. Result of profile analysis in case-02 using our method [(a): Fitting result via profile function in the measurement XRD data. The black and red lines denote the measurement XRD data and the fitting profile function, respectively. (b): Peak components in two crystal structures of TiO2; Anatase and Brookite. The red and green lines denote peaks of Anatase and Brookite, respectively].