Research Article

On Fostering Predictions in Data-Driven Reduced Order Model for Eulerian–Lagrangian Simulations: Decision of Sufficient Training Data

Article: 2316155 | Received 20 Oct 2023, Accepted 04 Feb 2024, Published online: 22 Feb 2024

Abstract

The development of data-driven surrogate models (SMs) has been extensively studied for Eulerian–Lagrangian simulations owing to their high computational speed. However, in applications to granular systems with violent fluid–solid flows, how to select sufficient training data to ensure consistency between the high-fidelity model and the SM remains unknown and highly challenging. The accuracy of an SM can easily deteriorate because of insufficient training data, which necessitates a trial-and-error process and hinders industrial applications. To address this issue, this study reveals that data density is key to sufficient training, and a novel technique for deciding the sufficient training data of an SM is proposed. Specifically, a feasibility index is proposed based on a posterior error analysis. It is demonstrated that when the training data are determined under the proposed feasibility index of 2, the consistency of granular dynamics between the SM and the high-fidelity model can be guaranteed. Applied to a representative SM, a reduced order model (ROM), this technique enables the successful decision of sufficient training data, resulting in remarkable predictability in violent fluid–solid flows without trial-and-error. This technique holds great potential for solving the predicament of deciding training data for data-driven models.

1. Introduction

In chemical engineering, multi-phase flows including solid particles are often encountered, such as fluidizations (Mori et al. Citation2019; Sakai et al. Citation2010; Takabatake and Sakai Citation2020), conveying (Sun and Sakai Citation2015), mixing (Li et al. Citation2021; Mori and Sakai Citation2022; Tsugeno et al. Citation2021), milling (Liu et al. Citation2018; Ogi et al. Citation2017; Tanaka et al. Citation2019; Yamamoto et al. Citation2012) and die-filling (Widartiningsih et al. Citation2020; Yao et al. Citation2018). To study such multi-phase processes, the coupled discrete element method–computational fluid dynamics (DEM–CFD) is widely employed as a reliable approach for the associated Eulerian–Lagrangian simulations. Advanced applications of the DEM–CFD method have been extensively reported, including but not limited to particle separation in recycling biodegradable plastics (Tsunazawa et al. Citation2023), wall erosion caused by particle conveying in vortex elbows (Xiao et al. Citation2023), chip-like particle flow in fluidized beds for solar panel module recycling (Wang and Shen Citation2022), a resolved model for gas-solid-liquid three-phase flows with super-quadric particles (Washino et al. Citation2023), and extended DEM-based modelling for wet granulation in a rotating drum (Nakamura et al. Citation2022).

However, when it comes to the industrial scale, the high computational cost associated with the total amount of particles and with solving complex equations in the DEM–CFD method has hindered further applications (Jiang et al. Citation2021; Li et al. Citation2022a). It is pointed out that even with the latest CPUs and GPUs, days of simulation are usually required for an industrial-scale granular system (Napolitano et al. Citation2022; Sakai Citation2016). In a sand production case, it is reported that high-performance computers (HPCs) are necessary for the computations (Kazidenov et al. Citation2023). Thus, there has been significant interest in simplifying such simulations.

Recently, with the advances in data science, the data-driven surrogate model (SM) has been considered highly promising for solving the problem of heavy computational cost in Eulerian–Lagrangian simulations (Li et al. Citation2022a). A typical representative is the data-driven reduced order model (ROM), a surrogate model that simplifies the model mathematically based on a dynamics identification technique. It can achieve substantial dimensionality reduction while preserving the essential dynamics to reconstruct the original solutions (Karhunen Citation1947; Taira et al. Citation2017; Trefethen and Bau Citation1997).

The gains in computational speed and the physical-equation-free nature of ROMs have opened up an innovative approach for Eulerian–Lagrangian simulations. This is of profound significance in fostering real-time simulations for digital twins in the digital transformation (DX) (Chakraborty et al. Citation2021; Geier et al. Citation2023; Hartmann et al. Citation2021, Citation2018; VanDerHorn and Mahadevan Citation2021).

The most common approach to obtaining the dynamical features for ROM is proper orthogonal decomposition (POD) (Lumley Citation1967). It has been widely applied to and studied in Eulerian simulations (Allery et al. Citation2008; Xiao et al. Citation2015; Xiao Citation2019). However, only limited studies on Lagrangian simulations have been reported (e.g., Lu and Tartakovsky Citation2020) due to accuracy concerns caused by the complicated motion of particles. One alternative in studies of the disperse phase is to map Lagrangian information onto Eulerian meshes (Higham et al. Citation2020; Pang and Wei Citation2013), but the applicable parameters are limited and extra computational cost and error can be introduced in large-scale and complicated systems. Empowered by sharply increased computational power, new approaches have started to consider Lagrangian variables in ROM directly (Li et al. Citation2022a; Lu and Tartakovsky Citation2020; Schiødt et al. Citation2022). This direct treatment has been validated as efficient using Lanczos POD in quantifications of mixing phenomena (Li et al. Citation2022b, Citation2022c) and in ROM reproductions of Eulerian–Lagrangian simulations. The implementation of Lanczos iteration (Fahl Citation2001; Li et al. Citation2022a) improves the computational efficiency drastically and makes POD more suitable for large-scale cases.

However, like many other data-driven approaches, ROMs also suffer from the lack-of-training-data problem, which is commonly regarded as a fatal issue in predictions (Brunton and Kutz Citation2019; Jiang et al. Citation2023). Especially when the solid-fluid flows in Eulerian–Lagrangian simulations become highly fierce and complicated (e.g., in a bead mill), the assumption of consistent dynamics between ROMs and high-fidelity models (Kherad et al. Citation2020; Lucia et al. Citation2004) may not be guaranteed due to the lack of training data in dynamics identifications (e.g., POD), and the accuracy of ROMs can be significantly deteriorated even with the previously suggested local POD calculated on each parameter (Hajisharifi et al. Citation2023; Li et al. Citation2022a), particularly in the particle phase as shown in Figure 1. This is caused by the highly random and rapid motion of particles. Moreover, the random and non-linear behaviors of all particles make it rather challenging to select the training data for particle motions with significantly different characteristics.

Figure 1. (Left) Comparisons of Dynamics and Typical snapshot of High-fidelity Model and ROM with insufficient training data; (Right) Snapshots of Particle & Fluid Phase at T = 2.0s in High-fidelity Model and ROM with insufficient training data.


To address the problem of inaccurate ROMs and maximize the reduction of computational cost, this research first reveals that the density of training data is crucial for the temporal interpolations in ROM. Although more computational resources are required, a higher data density can significantly improve the consistency of particle dynamics between the ROM and the high-fidelity model, thereby enhancing the accuracy of the ROM. Based on this finding, the aim is to propose a novel technique for the decision of sufficient training data in data-driven ROMs for simulations of violent fluid–solid flows. First, two bead mill cases are simulated by an in-house validated high-fidelity DEM–CFD code, FELMI (Mori and Sakai Citation2021), with different operational conditions for training data preparation. Second, the more violent case is selected as the representative of a violent granular system to conduct ROM under different selections (amount and density) of training data. Subsequently, a new feasibility index for the decision of sufficient training data is proposed and proven adequate by comparing ROM results and POD coefficients. Finally, the validation on the other case demonstrates successful reproduction and prediction when sufficient training data is determined with the proposed feasibility index of 2.

As a result, the technique for the decision of sufficient training data in data-driven ROMs for Eulerian–Lagrangian simulations is established and validated in the considered bead mill cases. Because this method is designed to determine the density of training data, it is applicable to any kind of Eulerian–Lagrangian simulation, including pseudo-periodic and non-periodic flows. In addition, it holds remarkable value in demonstrating the sophisticated balance between efficiency and accuracy for ROMs. Moreover, this technique guarantees consistent dynamics fundamentally with a substantially reduced computational cost, so that high-speed and accurate ROM predictions become achievable. Notably, this method is not limited to ROMs; it can be universally extended to evaluating the predictive accuracy of any data-driven model based on dynamics identification.

2. Methodology

In this study, the employed high-fidelity model for the DEM–CFD method is an in-house code FELMI, and the employed data-driven ROM is LPOD-based ROM. This section provides the fundamentals of FELMI, LPOD-based ROM and the proposed technique for the decision of sufficient training data.

2.1. FELMI: DEM–CFD

In this study, the DEM–CFD method is conducted using FELMI (Mori and Sakai Citation2021). In detail, this method uses the Signed Distance Function (SDF), the Immersed Boundary Method (IBM), and the implicit algorithm for the drag force to couple the DEM and CFD simulations to consider all phases simultaneously. The specific numerical modeling is as follows.

2.1.1. Solid (granular) phase

The translational and rotational motions of a particle (the solid phase) are described by the following two governing equations,

(1) \( m\dfrac{d\mathbf{v}}{dt} = \mathbf{F}_c + \mathbf{F}_d - V_p\nabla p + m\mathbf{g}, \)

and

(2) \( I\dfrac{d\boldsymbol{\omega}}{dt} = \mathbf{T}, \)

where m, v, t, F_c, F_d, V_p, p, g, I, ω, and T represent the mass of the particle, the velocity of the particle, time, the contact force, the fluid drag force, the volume of the particle, the pressure, the gravitational acceleration, the moment of inertia, the rotational velocity of the particle, and the torque, respectively.

The contact force acting on the particle is described in the normal and tangential directions,

(3) \( \mathbf{F}_c = \mathbf{F}_{cn} + \mathbf{F}_{ct}. \)

In detail, F_cn and F_ct are calculated by,

(4) \( \mathbf{F}_{cn} = -k\delta_n - \eta\mathbf{v}_{rn}, \qquad \mathbf{F}_{ct} = \begin{cases} -k\delta_t - \eta\mathbf{v}_{rt}, & |\mathbf{F}_{ct}| < \mu_p|\mathbf{F}_{cn}| \\[1ex] -\mu_p|\mathbf{F}_{cn}|\dfrac{\mathbf{v}_{rt}}{|\mathbf{v}_{rt}|}, & |\mathbf{F}_{ct}| \ge \mu_p|\mathbf{F}_{cn}| \end{cases} \)

where v_rt, v_rn, δ_t, δ_n, k, η, and μ_p are the relative velocity of the particle in the tangential direction, the relative velocity of the particle in the normal direction, the displacement of the particle in the tangential direction, the displacement of the particle in the normal direction, the spring constant, the damping coefficient, and the friction coefficient, respectively.

Moreover, η is given by calculating the energy dissipation as,

(5) \( \eta = -\dfrac{2\ln(e)\sqrt{km}}{\sqrt{\ln^2(e)+\pi^2}}, \)

where e is the restitution coefficient. The drag force acting on the particle is given by,

(6) \( \mathbf{F}_d = \dfrac{\beta V_p}{1-\varepsilon}\left(\mathbf{u}_f - \mathbf{v}\right), \)

where β, ε, and u_f are the interphase momentum transfer coefficient, the void fraction, and the fluid velocity, respectively. Similar to previous solid–fluid coupling simulations (Sun et al. Citation2013; Sakai et al. Citation2010, Citation2012, Citation2014; Sun and Sakai Citation2015), a combination of the correlations of Ergun and Orning (Citation1949), Ergun (Citation1952), and Wen (Citation1966) was employed for the coefficient β as follows,

(7) \( \beta = \begin{cases} 150\dfrac{\mu_f(1-\varepsilon)^2}{\varepsilon d^2} + 1.75\dfrac{\rho(1-\varepsilon)}{d}|\mathbf{u}_f-\mathbf{v}|, & \varepsilon \le 0.8 \\[1.5ex] \dfrac{3}{4}C_D\dfrac{\rho\,\varepsilon(1-\varepsilon)}{d}|\mathbf{u}_f-\mathbf{v}|\,\varepsilon^{-2.65}, & \varepsilon > 0.8 \end{cases} \)

(8) \( C_D = \begin{cases} \dfrac{24\left(1+0.15\,Re_p^{0.687}\right)}{Re_p}, & Re_p \le 1000 \\[1.5ex] 0.44, & Re_p > 1000 \end{cases} \)

and

(9) \( Re_p = \dfrac{|\mathbf{u}_f-\mathbf{v}|\,\rho\,\varepsilon\,d}{\mu_f}, \)

where ρ, μ_f, d, C_D, and Re_p represent the fluid density, the fluid viscosity, the particle diameter, the drag coefficient, and the Reynolds number of the particle, respectively.
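To make Eqs. (6)–(9) concrete, the following minimal Python sketch evaluates the per-particle drag force. It is an illustration under our own assumptions (the function name, SI units, and the small-value guard on Re_p are ours), not an excerpt from FELMI.

```python
import numpy as np

def drag_force(u_f, v, eps, d, rho_f, mu_f, V_p):
    """Per-particle fluid drag following Eqs. (6)-(9).

    u_f, v : fluid and particle velocity vectors [m/s]
    eps    : local void fraction [-]
    d      : particle diameter [m]
    rho_f  : fluid density [kg/m^3]
    mu_f   : fluid viscosity [Pa s]
    V_p    : particle volume [m^3]
    """
    slip = np.asarray(u_f) - np.asarray(v)
    slip_mag = np.linalg.norm(slip)

    # Particle Reynolds number, Eq. (9)
    Re_p = rho_f * eps * d * slip_mag / mu_f

    # Drag coefficient, Eq. (8)
    if Re_p <= 1000.0:
        C_D = 24.0 * (1.0 + 0.15 * Re_p**0.687) / max(Re_p, 1e-12)
    else:
        C_D = 0.44

    # Interphase momentum transfer coefficient, Eq. (7):
    # Ergun-type correlation for dense regions, Wen-Yu for dilute regions
    if eps <= 0.8:
        beta = (150.0 * mu_f * (1.0 - eps)**2 / (eps * d**2)
                + 1.75 * rho_f * (1.0 - eps) * slip_mag / d)
    else:
        beta = (0.75 * C_D * rho_f * eps * (1.0 - eps) * slip_mag / d
                * eps**(-2.65))

    # Drag force on the particle, Eq. (6)
    return beta * V_p / (1.0 - eps) * slip
```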

2.1.2. Fluid phase

The governing equations for the fluid phase in FELMI are composed of the 3-D Navier–Stokes equation and the continuity equation, with the implementation of the local volume average method (Anderson and Jackson Citation1967),

(10) \( \dfrac{\partial(\varepsilon\rho\mathbf{u}_f)}{\partial t} + \nabla\cdot(\varepsilon\rho\mathbf{u}_f\mathbf{u}_f) = -\varepsilon\nabla p + \mathbf{f}_s + \nabla\cdot(\varepsilon\boldsymbol{\tau}) + \varepsilon\rho\mathbf{g}, \)

and

(11) \( \dfrac{\partial\varepsilon}{\partial t} + \nabla\cdot(\varepsilon\mathbf{u}_f) = 0, \)

where f_s and τ indicate the drag force acting on the fluid and the viscous stress, respectively. The drag force in Eq. (10) is calculated using Eq. (6) as follows,

(12) \( \mathbf{f}_s = \dfrac{\sum\mathbf{F}_d}{V_{grid}}, \)

where V_grid is the volume of the fluid grid cell and the summation is taken over the particles in the cell.

2.1.3. Wall boundary

The wall boundary in FELMI is given by the combination of Signed Distance Function (SDF) (Shigeto and Sakai Citation2013) and Immersed Boundary Method (IBM) (Sun and Sakai Citation2017, Citation2016), which is proven accurate and robust in a much simpler manner (Mori et al. Citation2019; Tsugeno et al. Citation2021; Mori and Sakai Citation2021; Yao et al. Citation2018).

2.1.3.1. Solid phase

The wall boundary of the solid (granular) phase is modeled by the SDF as the following equation,

(13) \( \phi(\mathbf{x}) = s(\mathbf{x})\,d(\mathbf{x}), \)

where x, s(x), and d(x) are the position vector of the particle, the sign based on the position of the particle, and the distance between the particle and the surface of the wall boundary, respectively. For points inside the calculation domain, s(x) is positive, and vice versa. The contact force between the particle and the wall is given as,

(14) \( \mathbf{F}_{cn} = -k_{pw}|\nabla\phi|\,\boldsymbol{\delta}_n^{SDF} - \eta_{pw}\mathbf{v}_{rn}, \)

and

(15) \( \mathbf{F}_{ct} = \begin{cases} -k_{pw}\delta_t - \eta_{pw}\mathbf{v}_{rt}, & |\mathbf{F}_{ct}| < \mu_{pw}|\mathbf{F}_{cn}| \\[1ex] -\mu_{pw}|\mathbf{F}_{cn}|\dfrac{\mathbf{v}_{rt}}{|\mathbf{v}_{rt}|}, & |\mathbf{F}_{ct}| \ge \mu_{pw}|\mathbf{F}_{cn}| \end{cases} \)

where the subscript pw represents the particle-wall physical properties.

The overlap considered in the SDF for the soft particle model, δ_n^SDF, is defined as,

(16) \( \boldsymbol{\delta}_n^{SDF} = \left(\phi - \dfrac{D}{2}\right)\mathbf{n}_{SDF}, \)

and

(17) \( \mathbf{n}_{SDF} = \dfrac{\nabla\phi}{|\nabla\phi|}, \)

where D is the particle diameter.

Therefore, elastic energy is conserved in SDF.
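As a concrete illustration of the SDF wall treatment in Eqs. (13)–(17), the sketch below (our own illustrative Python, not FELMI code) evaluates the wall-normal overlap and the normal contact force for a sphere near a wall; the signed distance field and its gradient are supplied as callables, and all names are hypothetical.

```python
import numpy as np

def sdf_wall_normal_force(x, D, phi, grad_phi, k_pw, eta_pw, v_rn):
    """Normal particle-wall contact force from an SDF, Eqs. (13)-(17).

    x        : particle center position (3-vector) [m]
    D        : particle diameter [m]
    phi      : callable returning the signed distance phi(x) [m]
    grad_phi : callable returning the gradient of phi at x
    k_pw     : particle-wall spring constant [N/m]
    eta_pw   : particle-wall damping coefficient [N s/m]
    v_rn     : relative normal velocity of the particle (3-vector) [m/s]
    """
    phi_x = phi(x)
    g = np.asarray(grad_phi(x))
    n_sdf = g / np.linalg.norm(g)              # Eq. (17): unit wall normal

    # Contact occurs only when the particle overlaps the wall (phi < D/2)
    if phi_x >= 0.5 * D:
        return np.zeros(3)

    # Eq. (16): overlap vector along the wall normal
    delta_n = (phi_x - 0.5 * D) * n_sdf

    # Eq. (14): linear spring-dashpot in the wall-normal direction
    return -k_pw * np.linalg.norm(g) * delta_n - eta_pw * np.asarray(v_rn)
```

For a planar wall at z = 0 with the computational domain at z > 0, for instance, one could pass phi = lambda x: x[2] and grad_phi = lambda x: np.array([0.0, 0.0, 1.0]).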

2.1.3.2. Fluid phase

IBM is introduced to directly simulate the interaction between the solid wall and the fluid. In IBM, the volume-weighted average velocity, u, is first calculated by the following equation,

(18) \( \mathbf{u} = (1-\alpha)\mathbf{u}_f + \alpha\mathbf{u}_{wb}, \)

where α and u_wb are the local volume fraction calculated by the SDF and the wall boundary velocity, respectively. The 3-D Navier–Stokes equation is then corrected as follows,

(19) \( \dfrac{\partial(\varepsilon\rho\mathbf{u})}{\partial t} + \nabla\cdot(\varepsilon\rho\mathbf{u}\mathbf{u}) = -\varepsilon\nabla p + \mathbf{f}_s + \nabla\cdot(\varepsilon\boldsymbol{\tau}) + \varepsilon\rho\mathbf{g} + \varepsilon\mathbf{f}_{IB}, \)

where f_IB is given by,

(20) \( \mathbf{f}_{IB} = \dfrac{\alpha\rho\left(\mathbf{u}_{wb} - \hat{\mathbf{u}}\right)}{\Delta t}, \)

where û denotes the tentative fluid velocity before the correction.
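The fragment below illustrates the per-cell IBM correction of Eqs. (18) and (20) as a hedged sketch; it assumes the tentative fluid velocity is available cell-wise and that α is the wall volume fraction obtained from the SDF, with hypothetical names.

```python
import numpy as np

def ibm_correction(u_hat, u_wb, alpha, rho, dt):
    """Volume-weighted velocity and IBM body force, Eqs. (18) and (20).

    u_hat : tentative fluid velocity in the cell [m/s]
    u_wb  : wall boundary velocity in the cell [m/s]
    alpha : local solid-wall volume fraction from the SDF [-]
    rho   : fluid density [kg/m^3]
    dt    : time step [s]
    """
    u_hat = np.asarray(u_hat)
    u_wb = np.asarray(u_wb)

    u = (1.0 - alpha) * u_hat + alpha * u_wb       # Eq. (18)
    f_ib = alpha * rho * (u_wb - u_hat) / dt       # Eq. (20)
    return u, f_ib
```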

The above provides the necessary information on this method; the detailed implementation of the SDF/IBM boundary conditions is out of the scope of this paper. Interested readers can find more details of our original wall boundary conditions, including SDF and IBM, in the related papers mentioned above.

2.2. LPOD-based ROM

LPOD-based ROM conducts the simulations by (1) calculating the POD modes of different variables in each case by collecting a series of snapshots and solving the eigenvalue problem; (2) interpolating the corresponding coefficients temporally; (3) projecting the coefficients back for solution generation. The implementation of Lanczos iteration assists with the computational efficiency in POD calculations by allowing the calculation of POD modes using a subset of the total snapshots.

A series of snapshots from the results of the high-fidelity DEM–CFD model or experiments (here, FELMI is employed) is taken and written as \( \{W(t_i)\}_{i=1}^{N_t} \), where W can be any Eulerian or Lagrangian variable, \( t_i \in \{t_i\}_{i=1}^{N_t} \) is the time of the i-th snapshot, N_t is the total number of snapshots, and {·} represents a set of elements.

(21) \( W(t_i) = \bar{W} + \tilde{W}(t_i), \)

(22) \( \bar{W} = \dfrac{1}{N_t}\sum_{i=1}^{N_t} W(t_i), \)

where \( \bar{W} \) is the mean part and \( \tilde{W} \) is the time-varying part. The POD basis can be derived by solving the following minimization problem, in which \( \{\varphi_i\}_{i=1}^{N_t} \) is a set of orthogonal basis vectors,

(23) \( \min_{\{\varphi_j\}_{j=1}^{d}}\;\dfrac{1}{N_t}\sum_{i=1}^{N_t}\left\| \tilde{W}(t_i) - \sum_{j=1}^{d}\big(\tilde{W}(t_i),\varphi_j\big)\varphi_j \right\|^2, \quad \text{s.t. } (\varphi_i,\varphi_j)=\delta_{ij} \;\; (1\le i\le d,\ 1\le j\le i), \)

where ||·|| represents the L2 norm, \( \|f\|^2=(f,f) \). Given the orthogonality of \( \varphi_j \) \( (j=1,2,\dots,N) \), the problem can be transformed into the following eigenvalue problem by introducing a Lagrangian multiplier,

(24) \( A\varphi = \lambda\varphi, \qquad A = \dfrac{1}{N_t}\sum_{i=1}^{N_t}\tilde{W}(t_i)\big(\tilde{W}(t_i)\big)^{T}, \)

where A is the covariance matrix, φ denotes the POD modes (the orthogonal basis vectors), and λ denotes the eigenvalues, also called the proper orthogonal values (POVs), representing the energy level of the corresponding POD modes. The original problem has thus been converted to an eigenvalue problem, which is solved by an efficient LPOD technique.
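As an illustrative sketch of Eqs. (21)–(24), the following Python fragment computes POD modes and coefficients from a snapshot matrix; note that it uses the classical method of snapshots rather than the Lanczos iteration employed by the authors, and the function and variable names are our own.

```python
import numpy as np

def pod_from_snapshots(W):
    """Plain POD of a snapshot matrix W of shape (n_dof, n_t), Eqs. (21)-(24).

    Returns the mean field, POD modes (columns), eigenvalues (POVs),
    and the time coefficients of each snapshot.
    """
    n_dof, n_t = W.shape
    W_mean = W.mean(axis=1, keepdims=True)        # Eq. (22): mean part
    W_fluc = W - W_mean                           # Eq. (21): time-varying part

    # Method of snapshots: solve the small n_t x n_t eigenvalue problem
    # instead of the n_dof x n_dof covariance matrix of Eq. (24).
    C = W_fluc.T @ W_fluc / n_t
    lam, V = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]                 # descending energy
    lam, V = lam[order], V[:, order]

    # Lift the temporal eigenvectors to spatial POD modes and normalize
    modes = W_fluc @ V
    norms = np.maximum(np.linalg.norm(modes, axis=0, keepdims=True), 1e-30)
    modes /= norms

    coeffs = modes.T @ W_fluc                     # coefficients c_i(t_n)
    return W_mean, modes, lam, coeffs
```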

For better computational efficiency in POD, Lanczos iteration is implemented as LPOD in this data-driven model. A reduced set of Krylov vectors can be constructed by exploiting the tridiagonalization of A as follows,

(25) \( Q_k^{T} A Q_k = T_k, \)

(26) \( T_k = \begin{bmatrix} \alpha_1 & \beta_1 & & & 0 \\ \beta_1 & \alpha_2 & \beta_2 & & \\ & \beta_2 & \ddots & \ddots & \\ & & \ddots & \ddots & \beta_{k-1} \\ 0 & & & \beta_{k-1} & \alpha_k \end{bmatrix}, \)

(27) \( Q_k = [q_1, q_2, \dots, q_k], \)

where the subscript k is the Lanczos-step number. A sufficiently large k can ensure the convergence of the Lanczos iterations (Li et al. Citation2022a). Therefore, by calculating the tridiagonalization of A, the expected eigenvalues/eigenvectors can be approximated mathematically using only a portion of the snapshots. The convergence behavior of LPOD is shown in Figure 2.

Figure 2. LPOD convergence behavior of various variables.


Similar to a normal POD, the captured energy proportion (i.e., the Lanczos ratio) can be calculated as follows,

(28) \( I_k(p) = \dfrac{\sum_{i=1}^{p}\lambda_i}{\sum_{i=1}^{N}\lambda_i}, \)

where λ_i is the i-th eigenvalue from LPOD. By choosing a target Lanczos ratio, the number of employed modes can be determined according to the needs of computational cost reduction and accuracy. This number of employed modes is called the least reconstructable mode, n_lr. It should be set as the lower limit of the number of employed modes to guarantee reconstructability.
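A minimal sketch of how n_lr could be extracted from the eigenvalue spectrum under a target Lanczos ratio (0.999 here, the value used later for the particle phase); the function name is hypothetical.

```python
import numpy as np

def least_reconstructable_mode(lam, target_ratio=0.999):
    """Smallest number of modes whose cumulative energy reaches the
    target Lanczos ratio of Eq. (28)."""
    lam = np.sort(np.asarray(lam))[::-1]          # eigenvalues, descending
    cumulative = np.cumsum(lam) / np.sum(lam)     # I_k(p) for p = 1, 2, ...
    return int(np.searchsorted(cumulative, target_ratio) + 1)
```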

Given LPOD, a considered variable W at position x and time t_n can be expressed as the following approximation,

(29) \( W(\mathbf{x}, t_n) = \bar{W}(\mathbf{x}) + \sum_{i=1}^{N} c_i(t_n)\varphi_i, \)

where N is the number of POD modes considered in the ROM and c_i(t_n), \( i\in\{1,2,\dots,N\} \), represents the i-th coefficient of the corresponding POD mode at t_n. A non-intrusive ROM is developed to interpolate the coefficients as follows,

(30) \( c_i(t_n) = f_i\big(\mathbf{c}(t_{n-1})\big), \quad n\in\{1,2,\dots,N_t\}. \)

Here, fi represents a set of interpolation functions that interpolate ci temporally. In this study, radial basis function (RBF) is employed for the interpolation as developed and validated in previous studies (Hajisharifi et al. Citation2023; Li et al. Citation2022a). Meanwhile, other neural-network approaches can also be used for manipulating the coefficients in the reduced spaces in place of fi.
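As a hedged illustration of this non-intrusive step, the sketch below uses SciPy's RBFInterpolator to interpolate the POD coefficients temporally between training snapshots and to reconstruct the ROM solution of Eq. (29). This is our own minimal stand-in, not the authors' RBF implementation; the kernel choice and function names are assumptions.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def interpolate_coefficients(t_train, coeffs_train, t_query):
    """Temporal RBF interpolation of POD coefficients, cf. Eq. (30).

    t_train      : (n_train,) training snapshot times [s]
    coeffs_train : (n_modes, n_train) POD coefficients at training times
    t_query      : (n_query,) times at which the ROM is evaluated [s]
    """
    rbf = RBFInterpolator(np.asarray(t_train)[:, None], coeffs_train.T,
                          kernel="thin_plate_spline")
    return rbf(np.asarray(t_query)[:, None]).T    # (n_modes, n_query)

def rom_solution(W_mean, modes, coeffs_query):
    """Reconstruct the ROM solution at the query times, Eq. (29)."""
    return W_mean + modes @ coeffs_query
```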

Herein, essential fundamentals of the introduced LPOD-based ROM method are provided.

2.3. Technique for the decision of sufficient training data

To develop a technique for the decision of sufficient training data, consistent dynamics must be guaranteed using only POD, not ROM, so that sufficient training data can be decided without trial-and-error.

In this section, a technique is established quantitatively in the following two steps: (1) an error analysis and a consistency ratio for evaluating the consistency between the captured dynamics and the real system dynamics; (2) a feasibility index for anticipating the performance.

2.3.1. Error analysis and consistency ratio

Errors in POD-based ROM can be simply categorized as truncation error, coefficient error, and consistency error (Li et al. Citation2022a; Xiao et al. Citation2017).

Briefly, the truncation error is caused by employing an insufficient number of POD modes. It can be reduced effectively by increasing the number of employed POD modes (Luo et al. Citation2007; Xiao et al. Citation2015). The coefficient error refers to the error caused by the prediction of the coefficients using a regression technique (e.g., an RBF or LSTM network). This error has been studied in different kinds of ROMs and its bounds have been derived (Homescu et al. Citation2005; Luo et al. Citation2007; Wu and Schaback Citation1993). By introducing a weight for ROM reconstructions, this error can be compensated properly (Li et al. Citation2022a; Xiao et al. Citation2015). Other possible ways to reduce this error include using an appropriate model or increasing the model complexity (e.g., using a multi-layer neural network). The consistency error refers to the inconsistency between the captured dynamics and the real underlying dynamics of the system (Duan et al. Citation2024; Kherad et al. Citation2020; Lucia et al. Citation2004), which is the main error analyzed in this research. It has been proved and demonstrated to be the main error that causes the poor predictive ability of ROM (Duan et al. Citation2024). Due to the linear assumption in POD (Lucia et al. Citation2004), relatively sparse training data can significantly deteriorate the consistency between the captured dynamics and the real dynamics. Therefore, sufficient training data must be decided in terms of both amount and density. In this study, within the same time range, Δτ (the interval between training data snapshots) is used to control the amount and density of the training data. A smaller sampling interval Δτ indicates training data with a larger amount and a higher density. Through this interval, an effective operation to decide the sufficient training data can be conducted.

To avoid any trial-and-error and potential coefficient error, a systematic error analysis posterior to POD but prior to ROM is preferable. After the calculation of POD, the dynamics are obtained and then projected directly onto the training set and the testing set to generate the projected solutions. The training set refers to the same data used for the POD calculation, while the testing set refers to much denser data covering the same time range (in this study, Δτ of the testing set is set to 0.001 s). With the POD calculations, they can be described as follows,

(31) \( W_{train}(\mathbf{x}, t_n) = \bar{W}(\mathbf{x}) + \sum_{i=1}^{N} c_{i,train}(t_n)\varphi_i, \)

(32) \( W_{test}(\mathbf{x}, t_n) = \bar{W}(\mathbf{x}) + \sum_{i=1}^{N} c_{i,test}(t_n)\varphi_i, \)

where W_train(x, t_n) and W_test(x, t_n) are the projected solutions of the training set and the testing set, respectively. The coefficients c_{i,train}(t_n) and c_{i,test}(t_n) represent the POD coefficients calculated directly by projecting the training and testing data onto the same POD modes. Subsequently, the error on each snapshot is calculated as the root mean square error (RMSE) between the projected solutions and the original solutions from the high-fidelity model,

(33) \( E_{train}(t_n) = \mathrm{RMSE}\big(W_{train}(\mathbf{x}, t_n),\, W_{original\_train}(\mathbf{x}, t_n)\big), \)

(34) \( E_{test}(t_n) = \mathrm{RMSE}\big(W_{test}(\mathbf{x}, t_n),\, W_{original\_test}(\mathbf{x}, t_n)\big). \)
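A minimal sketch of this posterior error analysis, reusing the hypothetical pod_from_snapshots helper above: training and testing snapshots are projected onto the first n modes and compared with the originals by RMSE, as in Eqs. (31)–(34).

```python
import numpy as np

def projection_errors(W_mean, modes, W_original, n):
    """Per-snapshot RMSE between the n-mode projection and the original data.

    W_mean     : (n_dof, 1) mean field from the training set
    modes      : (n_dof, n_modes) POD modes from the training set
    W_original : (n_dof, n_t) original snapshots (training or testing set)
    n          : number of employed POD modes
    """
    phi = modes[:, :n]
    W_fluc = W_original - W_mean
    coeffs = phi.T @ W_fluc                   # direct projection, Eqs. (31)-(32)
    W_proj = W_mean + phi @ coeffs
    # Eqs. (33)-(34): RMSE over the degrees of freedom, one value per snapshot
    return np.sqrt(np.mean((W_proj - W_original) ** 2, axis=0))
```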

With the consistency errors above, a consistency ratio is defined as follows to evaluate the consistency between the dynamics of the training set and the testing set,

(35) \( R_c(n) = \dfrac{\mathrm{Mean}(E_{test})}{\mathrm{Mean}(E_{train})}. \)

By calculating R_c with different numbers of employed POD modes n, the consistent mode number that guarantees the consistency between the dynamics of the testing set and the training set can be found. Increasing n reduces the error in the testing set and the training set simultaneously, so that R_c remains around 1 as long as consistent dynamics hold. However, when consistency does not hold, increasing n increases R_c by increasing E_test. This indicates that the consistency error gradually becomes dominant; in other words, the error between ROM predictions and high-fidelity simulations increases. Note that the testing error is always higher than the training error. With sufficient snapshots considered in LPOD (e.g., 80% of the total data), the value R_c = 1.05 is taken as the threshold that guarantees consistency for successful reconstructions of physical variables or observations, and the corresponding n is called the consistent mode, n_c. The modes lower than n_c contribute equivalently to both the training set and the testing set, while the modes higher than n_c contribute only to the training set but not to the testing set. Consequently, n_c should be set as the upper limit of n to guarantee consistent dynamics.
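Continuing the sketch, the consistent mode n_c could be found by scanning the number of employed modes and locating where the ratio of Eq. (35) exceeds the 1.05 threshold; this reuses the projection_errors helper above, and the names and the early-exit choice are our own assumptions.

```python
import numpy as np

def consistent_mode(W_mean, modes, W_train, W_test, threshold=1.05):
    """Largest mode number n for which R_c(n) stays below the threshold, Eq. (35)."""
    n_c = 0
    for n in range(1, modes.shape[1] + 1):
        e_train = projection_errors(W_mean, modes, W_train, n)
        e_test = projection_errors(W_mean, modes, W_test, n)
        r_c = np.mean(e_test) / np.mean(e_train)
        if r_c <= threshold:
            n_c = n                            # consistency still holds
        else:
            break                              # consistency error becomes dominant
    return n_c
```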

Hereby, the conditions for consistent dynamics and successful reconstructions in ROM can be clarified as,

  1. n is higher than nlr and lower than nc;

  2. n_c ≥ n_lr, so that consistent dynamics can be obtained with guaranteed reconstructability.

2.3.2. Feasibility index: Deciding sufficient training data

Because the training data is not a variable in POD, a feasibility index is presented to evaluate the feasibility of ROM with different amounts/densities of training data. The feasibility index is defined as follows,

(36) \( K_f = \dfrac{n_c}{n_{lr}}. \)

This index can indicate the predictive accuracy of data-driven ROM under a set of input training data. Through the aforementioned error analysis, a suitable value of Kf indicating accurate ROMs can be derived, so that sufficient training data can be decided. With the derivation of Kf, the reconstructability, consistent dynamics and sufficient training data in POD can be all guaranteed simultaneously.

Hereby, the technique for the decision of sufficient training data can be summarized as follows,

  1. Given a time range of snapshots, for Δτ = 0.01, 0.009, 0.008, …, 0.002 s, conduct LPOD and obtain n_lr for reconstructability (the range and step of Δτ can be roughly determined by ensuring basic visual continuity in the considered system; in this case, 0.01 s is the upper limit Δτ_up and 0.001 s is the lower limit Δτ_low, with step Δτ_step);

  2. Calculate Etrain,Etest and Rc to get nc for consistent dynamics;

  3. Calculate K_f for feasibility. When K_f ≥ 2.0, the current Δτ is selected as the sufficient training data and the search stops. (A sketch of this selection loop is given below.)
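Putting the pieces together, a hedged end-to-end sketch of the three-step procedure is given below. It assumes a hypothetical load_snapshots(dtau) helper that returns the training snapshot matrix for a given sampling interval, and it reuses the functions sketched earlier in this section.

```python
def decide_training_interval(load_snapshots, W_test, dtau_candidates,
                             lanczos_ratio=0.999, kf_threshold=2.0):
    """Scan sampling intervals from coarse to fine and return the first
    interval whose feasibility index K_f = n_c / n_lr reaches the threshold."""
    for dtau in dtau_candidates:                  # e.g. 0.01, 0.009, ..., 0.002 s
        W_train = load_snapshots(dtau)            # hypothetical data loader

        # Step 1: POD on the training data and least reconstructable mode
        W_mean, modes, lam, _ = pod_from_snapshots(W_train)
        n_lr = least_reconstructable_mode(lam, lanczos_ratio)

        # Step 2: consistency ratio against the dense testing set
        n_c = consistent_mode(W_mean, modes, W_train, W_test)

        # Step 3: feasibility index, Eq. (36)
        k_f = n_c / n_lr
        if k_f >= kf_threshold:
            return dtau, k_f                      # sufficient training data found
    return None, None                             # no candidate was sufficient
```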

The predictions of all variables between two adjacent output moments using the data-driven ROM with sufficient training data can then be achieved, as illustrated in Figure 3.

Figure 3. Idea diagram of reproductions and predictions using LPOD-based ROM with sufficient training data.


3. Results and Discussion

In this chapter, two bead mill simulations are first conducted through FELMI. Second, LPOD-based ROMs are conducted on Case 1 with different training data. Subsequently, the criterion for the decision of sufficient training data is proposed and proven adequate by comparing ROM results and POD coefficients. Finally, the validation and predictions using LPOD-based ROM with sufficient training data in Case 2 is presented.

3.1. Bead mill simulations

The fluid and particle properties and the calculation conditions used in FELMI are shown in Table 1; the reliability of the coefficient values has been studied thoroughly for the bead mill (Tanaka et al. Citation2019). In this study, two bead mill cases with 500,000 particles are considered; the dimensions are shown in Figure 4, and the setup of these cases is listed in Table 2. One typical case, Case 1, is used to determine an appropriate value of the feasibility index. After this value is found, Case 2 is used to validate the adequacy of the index. Table 3 lists the values of Δτ considered in the trial-and-error search for sufficient training data.

Figure 4. Dimensions of considered bead mill.


Table 1. Fluid & particle properties and calculation conditions.

Table 2. Cases setup.

Table 3. Considered Δτ in trial-and-error.

3.2. Qualitative comparisons with different Δτ

The qualitative comparisons of cases with different Δτ are provided through typical snapshots and POD coefficients, as shown in Figures 5 and 6. They indicate qualitatively that increasing the training data density (i.e., decreasing Δτ) leads to more consistent dynamics and better reconstructions. Table 4 lists the training data and speed comparisons of cases with different Δτ. It is clear that the accuracy increases when Δτ is small enough; however, the amount of training data and the associated computational cost increase as well.

Figure 5. Typical snapshots of particle phase in FELMI and ROM with different Δτ.


Figure 6. 1st POD coefficients comparison (particle coordinate in X-direction).


Table 4. Training data and one-step speed comparison of FELMI and ROM with different Δτ (sequential on AMD Ryzen™ Threadripper™ Pro 5995WX).

3.3. Analysis on Kf and decision of sufficient training data

In the previous study, it was reported that a Lanczos ratio higher than 0.99 was deemed sufficient for both the fluid and particle phases (Li et al. Citation2022a). However, due to the violent flow in the bead mill, our preliminary research suggests that a Lanczos ratio greater than 0.999 (the corresponding mode number is taken as n_lr) is required specifically for the particle phase. Nonetheless, even a Lanczos ratio close to 1 cannot guarantee a successful reconstruction of the particle phase when the training data is not dense enough.

In this section, the proposed technique is applied to Case 1 to analyze Kf for an accurate particle phase of the ROM in the bead mill. Importantly, this must be conducted without any ROM calculation so that the decision on sufficient training data can be made in advance without any trial-and-error of ROMs. Notably, the results mainly show the calculations on the particle coordinate in the X-direction, because its Lanczos-ratio convergence is slower than that of the Y-direction (particles exhibit more complicated motions in the X-direction) and is similar to that of the Z-direction due to geometrical symmetry.

Figure 7 shows the error calculated by direct projections with all POD modes employed. Note that without conducting ROM, the errors are dominated by the consistency error. As Δτ decreases, the band-shaped area formed by the testing error shrinks, indicating that the captured dynamics approach the real dynamics of the bead mill system. Also, the decrease of the particle position error implies a decrease of unphysical contacts such as particle overlaps or collision energy.

Figure 7. History of training and testing error with different Δτ (Particle coordinate in X-direction).


The consistency ratio is calculated as shown in Figure 8. With the threshold of 1.05, n_c can be obtained.

Figure 8. Consistency ratio of particle coordinate in X-direction in cases with different Δτ.


With the calculation of I_k and R_c(n), n_lr and n_c can be determined, and K_f can subsequently be calculated. Table 5 lists the values of K_f for the cases with different Δτ. Examination of the contours suggests that K_f ≥ 2.0 is sufficient to guarantee reconstructability and consistent dynamics simultaneously, so that the sufficient training data in POD can be decided.

Table 5. Feasibility index Kf for particle coordinate in X-direction with different Δτ.

3.4. Validation of the proposed technique

Case 2 is calculated using the LPOD-based ROM with sufficient training data for validation. With consistent dynamics guaranteed under K_f ≥ 2.0, reproductions and predictions can be realized simultaneously. Table 6 lists the related calculations. Notably, the sufficient training data is decided in advance using only projections and the condition K_f ≥ 2.0, without trial-and-error.

Table 6. Calculations for decision of sufficient training data.

Figure 9 shows the comparisons of FELMI and ROM solutions (reproductions and predictions), where Δτ is 0.004 s in LPOD and 0.001 s in the ROM solutions. Remarkably, given the temporal interpolation in POD, the prediction can be made for any time within one Δτ. For instance, from 2.000 s to 2.004 s, only 2.000 s and 2.004 s are considered in LPOD, while predictions at 2.001 s, 2.002 s, and 2.003 s are all achievable in the ROM. Furthermore, as shown in Table 7, a reduction of computational time by a factor of over 1500 is achieved using the LPOD-based ROM with sufficient training data.

Figure 9. Reproductions and predictions of LPOD-based ROM using sufficient training data (Green box Rep.: Reproduction; Red box Pred.: Prediction).


Table 7. One-step speed comparison of FELMI and ROM with sufficient training data (sequential on AMD Ryzen™ Threadripper™ Pro 3995WX).

3.5. Discussion

As the challenge of accuracy is commonly encountered in various data-driven models for both reproductions and predictions, the proposed technique offers a new approach to addressing this problem by focusing on securing sufficient training data.

This technique for the decision of sufficient training data is developed without trial-and-error to guarantee consistency between the captured dynamics and the actual dynamics; it has the potential to be universally applied to other data-driven models. Because its design is based on an error analysis of the training set and the testing set, this technique can also serve as a reference framework for extended quantitative studies of the predictive performance of other data-driven models.

Although, for the considered bead mill case, a value of 2.0 is recommended for Kf primarily to solve the inaccurate reconstructions in the particle phase, it is worth pointing out that different feasibility indices can be applied to different variables to save more computational resources, and different consistency ratios Rc can be used for different accuracy requirements. Accordingly, a multi-timescale ROM is expected to be developed in future work to utilize the proposed technique for DX of systems using Eulerian–Lagrangian simulations.

4. Conclusions

In this study, it is first found that increasing the training data density has the potential to resolve the accuracy concern of ROM for Lagrangian variables. Specifically, the sufficient training data for accurate data-driven ROMs needs to be decided from both data density and amount.

Therefore, a novel technique for the decision of sufficient training data is established to solve the inaccurate reconstructions in the particle phase. By using this technique, the sufficient training data can be decided without trial-and-error. Meanwhile, it evaluates the ROM from a temporal perspective to keep a sophisticated balance between efficiency and accuracy.

Notably, by employing this technique, reproductions and predictions can both be achieved with a substantially reduced computational cost. Furthermore, since this technique is established through a mathematical error estimate, it can potentially be employed in other data-driven models.

Apart from solving the inaccurate reconstruction in the particle phase, it is suggested that the sufficient training data (density and amount) can vary for different variables. Thus, by tailoring the training data for each variable, a multi-timescale data-driven model for Eulerian–Lagrangian simulations is expected to be established in the future, towards real-time simulations. It is also worth developing further advanced machine-learning techniques for ROM to extend its generalization capability.

Disclosure statement

The authors declare that there is no conflict of interest.

Additional information

Funding

The authors acknowledge the financial support from the New Energy and Industrial Technology Development Organization (Grant No. JPNP22005) and the Japan Society for the Promotion of Science KAKENHI (Grant No. 21H04870). Also, the bead mill CAD data was provided by AIMEX Co., Ltd. One of the authors, Kai-en Yang, was supported by the Quantum Science and Technology Fellowship Program (Q-STEP) of the University of Tokyo.

References