Research Article

GPU-based parallel programming for FEM analysis in the optimization of steel frames

Received 18 Jan 2024, Accepted 16 Apr 2024, Published online: 09 May 2024

ABSTRACT

Optimization of large-scale frame structures consumes a vast amount of time, since the analysis of such complex systems involves several iterative processes. Mitigating the computational burden and reducing this time to a reasonable level is possible by employing GPU (Graphics Processing Unit) processors, which can be found in standard computers. This study presents an algorithm for accelerating the size optimization of steel frames with the BBO (Biogeography-Based Optimization) method, adapted to the GPU architecture. The GPU-based parallel algorithm, designed for FEM (Finite Element Method) analysis, is applied to three hypothetical steel-frame case structures with different numbers of members and nodes, and processed on four different computers available on the market. The case studies reveal that the proposed solution's efficiency increases as the number of members increases, confirming the acceleration algorithm's ability to optimize large-scale frame structures in a time-efficient manner.

GRAPHICAL ABSTRACT

1. Introduction

Frame construction systems stand as a cornerstone in civil engineering, frequently utilized for various applications. Modern approaches have turned to computer-based methods for optimizing these systems, as evidenced by recent studies (I. Aydoğdu Citation2017; Gandomi et al. Citation2023; Hong, Nguyen, and Nguyen Citation2022; Kazemzadeh Azad Citation2020; Kazemzadeh Azad, Hasancebi, and Kazemzadeh Azad Citation2013; Kociecki and Adeli Citation2014, Citation2015; Mikes and Kappos Citation2023; Vu et al. Citation2023). Among these methods, metaheuristic techniques have emerged as key players, particularly for large-scale steel frame structures (I. Aydoğdu, Çarbaş, and Akın Citation2017; Gholizadeh and Mohammadi Citation2017; Kaveh et al. Citation2020; Kazemzadeh Azad Citation2017), thanks to their ability to tackle complex discrete variable problems.

The realm of metaheuristic optimization methods, often referred to as “Nature-inspired optimization”, showcases remarkable flexibility in approaching intricate challenges. Although these techniques have exhibited prowess, it is crucial to acknowledge their inherent drawbacks. Notably, their convergence can be a demanding endeavor, necessitating a substantial number of iterative operations. This characteristic becomes especially prominent when addressing intricate and extensive problems, often involving long simulation times through methods like the Finite Element Method (FEM).

In response to the challenges posed by time-consuming nature-inspired algorithms in optimizing complex systems, researchers have devised a range of strategies tailored to specific aspects of the optimization process. One approach involves the use of surrogate models for FEM (Gholizadeh Citation2015; Gholizadeh and Milany Citation2016; Kaveh, Gholipour, and Rahami Citation2008) and constraint handling (Kaveh, Laknejadi, and Alinejad Citation2012; Papadrakakis, Lagaros, and Tsompanakis Citation1998) approximation methods. Surrogate models provide an efficient alternative to exact and computationally demanding finite element analyses. These methods revolve around creating simplified approximations of structural behavior, significantly alleviating the computational burden. Surrogate models can accurately estimate system responses by leveraging a diverse and comprehensive training set. However, it is important to acknowledge that the accuracy of these approximations can vary with problem complexity and training data quality. Another avenue for improvement is to enhance the optimization methodology using hybrid algorithms. Hybrid algorithms integrate multiple nature-inspired techniques, harnessing their collective strengths (Cheng et al. Citation2016; Dillen, Lombaert, and Schevenels Citation2021; Kaveh and Ilchi Citation2018; Kaveh and Mahdavi Citation2015; Kaveh, Bakhshpoori, and Afshari Citation2014; Kazemzadeh Azad Citation2017; Tayfur, Yilmaz, and Daloglu Citation2021; Ting et al. Citation2015). By combining algorithms with complementary characteristics, researchers seek to bolster the robustness and efficiency of the optimization process. It is essential to recognize that the performance of these hybrid algorithms can be problem-dependent, meaning their efficacy may vary based on the specific optimization challenge at hand. A distinct category comprises design-driven methods.
These techniques tailor the optimization process to the unique characteristics of the problem, aiming to reduce the maximum number of iterations required. Drawing from domain-specific knowledge, design-driven methods guide the algorithm toward optimal solutions more efficiently. While highly effective for their intended problems, these methods are confined by their problem-dependent nature, thus limiting their applicability across a spectrum of optimization scenarios. A particularly promising strategy entails parallel or distributed computing, albeit with additional programming challenges. Unlike other strategies, this method taps into the power of parallel or distributed computing to significantly expedite the optimization process. By executing tasks concurrently across multiple processors or machines, this approach yields substantial time savings. Remarkably, parallel and distributed computing does not necessarily necessitate methodological alterations. It capitalizes on hardware architecture to partition intricate tasks into manageable components that can be processed simultaneously. This versatility renders it applicable to problems of varying structures, making it a compelling solution for addressing time-intensive optimization challenges.

Parallel computing, as traditionally depicted in the literature, has largely been synonymous with deploying clusters of interconnected computers or CPUs to simultaneously analyze complex structures (Hasançebi et al. Citation2011; Martínez-Frutos and Herrero-Pérez Citation2017; Papadrakakis, Lagaros, and Fragakis Citation2003; Sarma and Adeli Citation2001; Zegard and Paulino Citation2013; Zhu Citation2010). While this approach presents considerable potential, it invariably requires significant financial investments to establish and maintain the necessary hardware infrastructure. However, a transformative progression has emerged due to recent advancements in GPU (Graphics Processing Unit) processors. This evolution enables the realization of parallel programming by harnessing the capabilities of standard computer graphics cards. These readily available components empower researchers and engineers to tap into parallel computing without the need for specialized and expensive hardware setups. Although the GPU method’s applicability has been tested in various fields in recent years, ranging from mechanical engineering to medical sciences (Couty et al. Citation2021; Lei et al. Citation2019; Luo, Ye, and Chen Citation2020; Shkurti et al. Citation2013; Vargas et al. Citation2022; ZendehAli, Emdad, and Abouali Citation2023), the studies in the field of structural analysis and structural optimization are limited (Georgescu, Chow, and Okuda Citation2013; Träff et al. Citation2023; Xu et al. Citation2014; Zhu Citation2010). Therefore, the contribution of GPU-based parallel programming to structural optimization problems stands as a compelling research question.

This study aims to unveil the inherent advantages of employing a GPU-based approach to reduce computation time during the optimization of steel frame systems. To accomplish this goal, a GPU-based parallel algorithm has been meticulously crafted utilizing CUDA (Compute Unified Device Architecture). The Biogeography-Based Optimization (BBO) method is selected for its effectiveness in frame optimization, and its widespread use in Finite Element Method (FEM) analysis of frame structures, making it a popular choice in the literature (Artar and Çarbaş Citation2021; I. Aydoğdu Citation2017; Çarbaş Citation2016; Ghatte Citation2021; Guo et al. Citation2017; Hosseini, Ghasemi, and Dizangian Citation2022). This method is supported by numerous theses and articles in the literature, many of which feature benchmark problems, providing ample opportunities to assess the impact of GPU acceleration on computational performance. In this study, examples from these benchmarks were utilized, ensuring consistency with established practices regarding parameters such as the required number of iterations and load cases/types (Akın and Aydoğdu Citation2015; Aydoğdu Citation2010; Çarbaş Citation2016; Örmecioğlu Citation2019). The examples considered wind load as the dynamic load, aligning with previous literature. In a bid to assess the program’s efficacy, a version devoid of parallel processing, aligned with the CPU (Central Processing Unit) architecture, was developed for comparative purposes.

Both versions are applied to hypothetical steel-frame case structures with increasing numbers of members and nodes. A comparison between the two versions is conducted using various load combinations. The constraint functions of the design criteria consist of strength constraints, horizontal and inter-story displacement constraints, as well as column-to-column and beam-to-column constraints of the "Load and Resistance Factor Design Specifications for Steel Buildings" by the American Institute of Steel Construction (LRFD-ASIC Citation2000; AISC, Citation1986). For the research, a GeForce GTX 1080 Ti processor (with 3584 CUDA cores and 28 SMs) was adopted as the main hardware platform. In addition, the algorithm was tested on three other computer configurations for comparison. The results indicate that GPU processors are more successful in FEM operations than CPU processors and verify the GPU-based parallel algorithm's efficiency in the optimization of large-scale frame structures (Örmecioğlu Citation2019).

2. Methodology

2.1. Frame optimization

Frame optimization is a process used in structural engineering that aims to improve structural efficiency and overall stability. The primary objectives of this optimization process are minimizing material usage, reducing costs, enhancing structural performance, and meeting safety and building code requirements. One of the fundamental attributes of frame optimization is its systematic approach to enhancing structural frame designs. This process commences with the initial design and Finite Element Method (FEM) analysis. It involves defining objectives and constraints, adjusting design variables, and applying optimization algorithms. In a broader context, optimization problems can generally be delineated by three primary components: objective functions, design variables, and constraints.

For most of the frame optimizations, the mathematical model of the design optimization problem is designed to minimize the overall weight of the frame system which directly affects the material cost. Hence, the objective function of frame optimization can be formulated as;

(1) $\min W(\mathbf{x}) = \sum_{r=1}^{n_g} m_r \sum_{s=1}^{t_r} l_s$

where $W$ represents the total weight of the frame system, $m_r$ is the weight per unit length of the steel profile assigned to group $r$, $t_r$ indicates the total number of members within group $r$, $n_g$ represents the total number of groups in the entire frame system, and $l_s$ represents the length of member $s$. $\mathbf{x} = \{x_1, x_2, \dots, x_{n_g}\}$ is the vector of integer values representing the sequence numbers of the steel sections assigned to the member groups; these are the design variables.
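As a concrete illustration, the objective of Eq. (1) can be sketched in a few lines of Python (a hedged sketch; the function and argument names are illustrative, not taken from the authors' implementation):

```python
import numpy as np

def frame_weight(x, unit_weights, group_member_lengths):
    """Total frame weight W(x): for each member group r, the unit weight of
    the section chosen by x[r] times the summed lengths of the group's members.

    x                    -- sequence numbers of the sections chosen per group
    unit_weights         -- unit_weights[k]: weight per unit length of section k
    group_member_lengths -- list of arrays; entry r holds the lengths l_s of
                            the members in group r
    """
    return sum(unit_weights[x[r]] * float(np.sum(group_member_lengths[r]))
               for r in range(len(x)))
```

For instance, two groups using sections with unit weights 2.0 and 3.0 and member lengths {1.0, 1.0} and {2.0} give a total weight of 2.0·2.0 + 3.0·2.0 = 10.0.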

Next, the constraint functions are defined. In this study, the constraint functions are defined as (1) strength constraints, (2) displacement constraints, and (3) geometric constraints based on the LRFD-AISC standard.

Strength constraints for the beam-column members of the structures (LRFD-ASIC Citation2000) are;

(2) $g_{s,i}(\mathbf{x}) = \left[\frac{P_u}{\phi P_n}\right]_{i,l} + \frac{8}{9}\left[\frac{M_{ux}}{\phi_b M_{nx}} + \frac{M_{uy}}{\phi_b M_{ny}}\right]_{i,l} - 1.0 \le 0 \quad \text{for } \frac{P_u}{P_n} \ge 0.2$
(3) $g_{s,i}(\mathbf{x}) = \left[\frac{P_u}{2\phi P_n}\right]_{i,l} + \left[\frac{M_{ux}}{\phi_b M_{nx}} + \frac{M_{uy}}{\phi_b M_{ny}}\right]_{i,l} - 1.0 \le 0 \quad \text{for } \frac{P_u}{P_n} < 0.2$
$i = 1, 2, \dots, n_m$

where $M_{nx}$ represents the nominal flexural strength about the strong axis (x-axis), $M_{ny}$ the nominal flexural strength about the weak axis (y-axis), $M_{uy}$ the required flexural strength about the weak axis, $M_{ux}$ the required flexural strength about the strong axis, $P_n$ the nominal axial strength (tension or compression), and $P_u$ the required axial strength (tension or compression); $i$ stands for the member, $l$ denotes the load case, and $n_m$ is the total number of members in the frame.
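The two-branch interaction check of Eqs. (2)-(3) can be sketched as a single helper returning the left-hand-side interaction value (an illustrative sketch only; in practice the strengths and resistance factors come from the section tables and the LRFD provisions):

```python
def lrfd_interaction(Pu, Pn, phi, Mux, Mnx, Muy, Mny, phi_b):
    """LRFD beam-column interaction value; the member satisfies the strength
    constraint when the returned value is <= 1.0."""
    if Pu / Pn >= 0.2:  # Eq. (2): high axial-load branch
        return Pu / (phi * Pn) + (8.0 / 9.0) * (
            Mux / (phi_b * Mnx) + Muy / (phi_b * Mny))
    # Eq. (3): low axial-load branch
    return Pu / (2.0 * phi * Pn) + Mux / (phi_b * Mnx) + Muy / (phi_b * Mny)
```

The branch switch at $P_u/P_n = 0.2$ mirrors the two formulas above: under high axial load the moment terms are scaled by 8/9, while under low axial load the axial term is halved.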

The formula for deformation constraints is;

(4) $g_{d,j}(\mathbf{x}) = \frac{\delta_{jl}}{\delta_{ju}} - 1 \le 0 \quad j = 1, \dots, n_{sm},\; l = 1, \dots, n_{lc}$

where $\delta_{jl}$ represents the maximum deflection of member $j$ under load case $l$, $\delta_{ju}$ represents the upper limit of deflection that the steel member can endure, $n_{sm}$ is the number of members to which the deflection limit is applied, and $n_{lc}$ is the number of load cases.

Also, the top-story displacement and inter-story displacement constraints are formulated respectively as;

(5) $g_{td,j}(\mathbf{x}) = \frac{\Delta_{top,jl}}{\text{Limit}} - 1 \le 0 \quad j = 1, \dots, n_{jtop},\; l = 1, \dots, n_{lc}$

and,

(6) $g_{id,j}(\mathbf{x}) = \frac{\Delta_{oh,jl}}{\text{Limit}} - 1 \le 0 \quad j = 1, \dots, n_{st},\; l = 1, \dots, n_{lc}$

where $n_{jtop}$ represents the number of node points on the top floor, and $\Delta_{top,jl}$ represents the displacement of node $j$ on the top floor under load case $l$. In Eq. (6), $n_{st}$ denotes the number of floors, $n_{lc}$ the number of load cases, and $\Delta_{oh,jl}$ the inter-story drift of floor $j$ under load case $l$. In Eqs. (2)-(6), the allowable deflection, displacement, and drift values are computed per the ASCE Ad Hoc Committee report (Ellingwood Citation1986).

Finally, geometric constraints for the column-to-column (CtoC) and beam-to-column (BtoC) connections of the frame structures are formulated as;

(7) $g_{cd,i}(\mathbf{x}) = \frac{d_u}{d_l} - 1 \le 0 \quad i = 1, \dots, n_{cc}$
(8) $g_{cm,i}(\mathbf{x}) = \frac{m_u}{m_l} - 1 \le 0 \quad i = 1, \dots, n_{cc}$
(9) $g_{bc,i}(\mathbf{x}) = \frac{b_{fb}}{d_c - 2t_{fc}} - 1 \le 0 \quad i = 1, \dots, n_{bc}$
(10) $g_{bb,i}(\mathbf{x}) = \frac{b_{fb}}{b_{fc}} - 1 \le 0 \quad i = 1, \dots, n_{bc}$

In Eqs. (7) and (8), subscripts $i$, $u$, and $l$ respectively denote the CtoC connection ID, the upper column on connection $i$, and the lower column on connection $i$. Here, $d$ represents the depth of the column, while $m$ indicates the weight of the column per unit length. The variable $n_{cc}$ signifies the number of CtoC connections considered in the optimization problem.

In Eqs. (9) and (10), the subscripts $i$, $b$, and $c$ respectively stand for the BtoC connection ID, the beam on connection $i$, and the column on connection $i$. Meanwhile, $n_{bc}$ represents the number of BtoC connections within the frame. Additionally, $b_f$ and $t_f$ denote the flange width and flange thickness of the section. An illustration of the BtoC connection is given in Figure 1.

Figure 1. The beam-column connection in a steel frame.


Eq. (9) is applied when the beam is connected to the web (depth) of the column, while Eq. (10) applies when the beam is connected to the flange of the column. For more comprehensive information, please refer to the following reference studies: Aydoğdu (Citation2010) and Örmecioğlu (Citation2019).

The values derived from these constraint functions are then combined with the structure's weight into a single value using a cost function ($f_{cost}$). In doing so, a discriminative equation (Eq. (11)) is used to distinguish successful structures from failed ones, highlighting the difference.

(11) $f_{cost} = W(\mathbf{x})\left(1 + \sum_{i=1}^{NC} C_i\right)^{P}; \quad C_i = \begin{cases} 0 & \text{for } g_i(\mathbf{x}) \le 0 \\ g_i(\mathbf{x}) & \text{for } g_i(\mathbf{x}) > 0 \end{cases} \quad i = 1, 2, \dots, NC$

Here, $P$ is the violation factor, recommended as 2, and $NC$ is the number of constraints in the optimization problem. This ensures that steel frame structures that cannot fulfill the constraints are assigned significantly higher cost values than those that pass, placing them at the lower end of the ranked population set.
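A minimal sketch of the penalized cost of Eq. (11) (function and argument names are illustrative):

```python
def penalized_cost(weight, constraint_values, P=2.0):
    """f_cost = W(x) * (1 + sum(C_i))**P, where C_i = g_i(x) when the
    constraint is violated (g_i > 0) and C_i = 0 otherwise."""
    c = sum(g for g in constraint_values if g > 0.0)
    return weight * (1.0 + c) ** P
```

A feasible design (all $g_i \le 0$) keeps its weight as its cost, while a design with a total violation of 0.5 sees a weight of 10.0 inflated to 10.0 · 1.5² = 22.5, pushing it down the ranking.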

2.2. GPU (graphical processing unit) based parallel processing and CUDA (compute unified device architecture)

To optimize frame structures, civil engineers utilize CAD software, specialized structural analysis tools, and optimization algorithms driven by recent advancements in computer technology (Meon et al. Citation2012). These tools enable modeling and analysis of various design alternatives, considering factors like load types, site conditions, materials, and project-specific parameters, aiming for a safe, efficient, and cost-effective structural solution. Finite Element Method (FEM) analysis allows for detailed simulations of frame behavior under different loads, aiding in the evaluation of design changes. However, the Gaussian elimination phase in FEM is time-consuming, especially as the number of structural members, and hence the size of the stiffness matrices, grows in complex frame structures, straining the capacity of existing computers.

This study focuses on accelerating the Gaussian elimination process, crucial in FEM analysis, by utilizing the GPU architecture due to its ability to handle parallel tasks (Duff et al. Citation1988). This phase, which solves the system of equations assembled from the stiffness matrices, becomes more time-consuming as the matrices grow in large-scale frame structures. The iterative nature of this operation in structural optimization requires repeated FEM analysis for each emerging candidate population, further extending the computational time. Metaheuristic optimization, such as the Biogeography-Based Optimization (BBO) method, is then applied to approach the optimal solution (Onan Citation2013). This computational intensity makes the GPU architecture highly suitable for optimizing steel frame systems.

GPU (Graphics Processing Unit) processors were originally designed specifically for the computation of a serial sequence of repetitive graphical operations called the GP (Graphics Pipeline). However, over the years, their design has evolved towards more flexible programmability, making them suitable for very intensive, repetitive, and data-intensive computations known as "High-Performance Computing" (HPC) (Figure 2). Especially the release of CUDA (Compute Unified Device Architecture) by NVIDIA in 2007 became a milestone in this regard (McClanahan Citation2011; Örmecioğlu Citation2019).

Figure 2. Graphic showing the performance of GPU processors in numerical computations over the years (2000–2023).


In GPU parallel programming and the CUDA framework, “kernels” and “threads” are foundational concepts. A kernel is essentially a self-contained, parallelizable function written in CUDA C/C++ that is executed on the GPU. It serves as a unit of work that can be processed concurrently by multiple threads. Kernels are designed to perform specific tasks or computations on GPU data, often operating on different portions of the data in parallel. When a kernel is launched, it runs in parallel across multiple threads on the GPU, with each thread executing the same code but typically working on distinct data elements or performing slightly different computations.

Threads, on the other hand, refer to individual execution units within a kernel (Figure 3). Threads are the smallest units of work on the GPU and are organized into a hierarchical structure: threads within a block, and blocks within a grid. Threads within a block can communicate and synchronize with each other. This parallelism at the thread level is what enables the GPU to achieve high-performance parallel processing. Modern GPUs can execute many threads simultaneously, making them well-suited for computationally intensive tasks such as scientific simulations, deep learning, and rendering.

Figure 3. Parallel thread hierarchy in CUDA.


Typically, however, conventional optimization algorithms are unsuitable for GPU-based systems by design, since each cycle sequentially affects the next. Considering such chains of interactions, it is therefore necessary to develop a specialized GPU-based approach that focuses on the most time-consuming stage in structural metaheuristic optimization algorithms: the structural analysis phase. Within the context of this article, methods to accelerate this pivotal phase by utilizing the parallel processing capabilities of GPUs have been examined.

2.3. The BBO algorithm

The Biogeography-Based Optimization method is a nature-inspired optimization algorithm based on the principles of biogeography, which studies the geographical distribution of biological organisms. In BBO, potential solutions to an optimization problem are represented as habitats, and the algorithm simulates the migration of species (solution components) between these habitats based on their suitability values. The goal is to improve the suitability of the solutions over iterations by allowing them to adapt and evolve, much as species evolve in real ecosystems. In this technique, individuals, affected by regional influences, strive to achieve optimal outcomes through their movements under these influences. The algorithm mimics the process of species migration and adaptation across diverse habitats to find optimal solutions to optimization problems. In this method, individuals tend to either migrate or remain in their current regions based on their Habitat Suitability Index (HSI), while also exhibiting a simultaneous tendency to migrate to other regions under the guidance of external dispersal influences captured by the Suitability Index Variable (SIV), by undergoing mutations (Kucukkulahli, Erdogmus, and Polat Citation2017; Simon Citation2008).

Through these movements, rather than concentrating solely on a single solution point, the BBO approach shifts towards focusing on different solution regions guided by specified external influences. In BBO, these external influences represent diverse environmental factors that impact the suitability of different solution regions. Instead of fixating on one point, individuals (solutions) explore multiple regions, akin to species exploring various habitats. Certain regions might offer more favorable conditions for optimization, while others might present challenges. This broader exploration diversifies the optimization search, increasing the likelihood of finding a more optimal solution. By responding to external cues, individuals are prompted to explore new regions, analogous to how species migrate in response to environmental changes. This dynamic exploration enables a better understanding of the optimization landscape and helps avoid getting trapped in local optima. In summary, the essence of BBO lies in its ability to adaptively explore multiple solution regions guided by external influences. This approach fosters the discovery of more optimal solutions by avoiding fixation on a single point and allowing the optimization process to traverse diverse solution spaces.

The mathematical modeling of the BBO process comprises two primary phases: migration and mutation. During the migration phase, individuals within a habitat move from one habitat to another within a specific population set. During their movement, each individual within the population changes as a result of external influences (design variable). As a result of this process, the migrating population is referred to as the changing solution, and the habitat receiving the migration is called the reference solution. The reference solution is selected from among the solutions within the population using the roulette method. The probability of each habitat (solution) being the reference solution depends on the probability ratio of the habitat receiving the migration, calculated by the formula below:

(12) $P(x_j) = \frac{\mu_j}{\sum_{i=1}^{N} \mu_i}; \quad j = 1, \dots, PS$

where $N$ represents the number of organisms in the habitat, $\mu_j$ represents the migration coefficient for habitat $j$, and $PS$ represents the population size.

During the mutation phase, individuals within the migrating population undergo mutations due to external influences and are then added to other habitats. In the BBO algorithm, the mutation process occurs if the generated random number falls below a previously defined mutation rate. If mutation occurs, the design variable that underwent mutation during the migration phase is randomly determined.

(13) $x_i = x_{l_i} + \mathrm{rand}(0,1)\,(x_{u_i} - x_{l_i}); \quad i = 1, \dots, PS$

where $x_i$ represents the design variable of the solution undergoing mutation, $x_{u_i}$ and $x_{l_i}$ represent the upper and lower design limits of the variable, respectively, and $PS$ represents the population size. $\mathrm{rand}(0,1)$ is a function that returns a random value between 0 and 1.
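The roulette selection of Eq. (12) and the mutation of Eq. (13) can be sketched as follows (a hedged, continuous-variable illustration; the paper's implementation operates on discrete section indices):

```python
import random

def roulette_select(mu):
    """Pick a reference habitat index j with probability mu[j] / sum(mu) (Eq. 12)."""
    r = random.uniform(0.0, sum(mu))
    acc = 0.0
    for j, m in enumerate(mu):
        acc += m
        if r <= acc:
            return j
    return len(mu) - 1  # guard against floating-point round-off

def mutate(x, lower, upper, rate):
    """Re-sample each design variable inside its bounds with the given
    mutation rate (Eq. 13)."""
    return [lower[i] + random.random() * (upper[i] - lower[i])
            if random.random() < rate else x[i]
            for i in range(len(x))]
```

Habitats with larger migration coefficients are proportionally more likely to be chosen as the reference solution, while mutation occasionally re-draws a variable anywhere within its design limits.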

In the context of the study, the FEM analysis of steel frame systems detailed above has been integrated with BBO optimization, as visualized by the algorithmic flow steps in Figure 4. Subsequently, the cyclical nature of the process has been fundamentally streamlined into a series of sequential phases, as listed in Algorithm 1. Beginning with the "sorting of the population" and ending with the "selection of the best individual", this approach has enabled the entire process to be compartmentalized into distinct phases, allowing the identification of the smallest processing steps. These include vital BBO procedures such as migration and mutation, as well as the structural analysis and evaluation phases. Among these, selected phases have been reconfigured to run exclusively on the GPU processor, based on their compatibility with parallel programming paradigms. The GPU-based parallel algorithm for FEM is detailed in Section 2.4.

Figure 4. Programmatic diagram visualizing the algorithmic flow specialized for BBO optimization of steel frame systems. Grey indicates the GPU-accelerated structural analysis phase.


Algorithm 1. The BBO optimization algorithm.

2.4. GPU-based parallel algorithm for FEM

2.4.1. FEM method

The basic idea of FEM is to divide the complex structure into a finite number of interconnected members, determine the loads acting on each node, calculate the displacements in the direction of these loads, and thus reach a result from unit to whole. The joint displacements and end forces defined in the local axis of a member are visually depicted in Figure 5.

Figure 5. The forces, moments, and displacements that occur at two adjacent nodes of a frame.


By using this method, equations are formulated to calculate the end forces and displacements of each discrete member, which is represented by its stiffness matrix $[k_i]$, numerically characterizing the member's behavior. Since six degrees of freedom are defined for each node of the space frame element, the size of $[k_i]$ is 12 × 12.

The member stiffness matrix determines the values that affect the cross-sectional forces for each member in the local coordinate system, and it needs to be converted to the global coordinate system using the following formula.

(14) $K_i = B_i^{T} k_i B_i; \quad i = 1, 2, \dots, m$

Here, Ki is the stiffness matrix of member i for global coordinate, Bi is the transformation matrix, and m is the number of members defined in the structure.

After coordinate transformation, the system stiffness matrix ($K_{sys}$), which enables an understanding of how the system's overall behavior is affected by the members' displacements, is computed using Eq. (15).

(15) $K_{sys} = \sum_{i=1}^{m} K_i$
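The transformation of Eq. (14) and the summation of Eq. (15) amount to a scatter-add assembly, which can be sketched as follows (illustrative names; the paper performs the per-member triplet product in parallel on the GPU):

```python
import numpy as np

def to_global(k_local, B):
    """Member stiffness in global coordinates: K_i = B_i^T k_i B_i (Eq. 14).
    For a space-frame member both matrices are 12 x 12."""
    return B.T @ k_local @ B

def assemble(K_members, member_dofs, ndof):
    """System stiffness K_sys as the sum of member contributions (Eq. 15).
    member_dofs[i] lists the global DOF numbers of member i's local DOFs."""
    Ksys = np.zeros((ndof, ndof))
    for Ki, d in zip(K_members, member_dofs):
        Ksys[np.ix_(d, d)] += Ki  # scatter-add into the global matrix
    return Ksys
```

The summation in Eq. (15) is over overlapping degrees of freedom, which is why members sharing a node add their stiffness contributions into the same rows and columns of $K_{sys}$.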

After obtaining $K_{sys}$, Eq. (16) relates the external loads ($P_{sys}$) acting on the structural system to the system displacement vector $d_{sys}$, which provides the displacements and rotations for each degree of freedom in the structural system.

(16) $P_{sys} = K_{sys}\, d_{sys}$

Eq. (16) constitutes a large system of linear equations; hence, numerical analysis techniques such as the Gaussian elimination method are employed to solve it and obtain the system displacement vector $d_{sys}$.

Consequently, utilizing the value of the system displacement vector dsys, the internal forces within the members and section requirements in the system are determined.

After computation of dsys, the member forces and moments (fi) are computed using the following formula;

(17) $f_i = k_i B_i u_i; \quad i = 1, 2, \dots, m$

where ui is the global displacement vector of member i.
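Putting Eqs. (16)-(17) together, the displacement solve and the member-force recovery can be sketched as follows (a CPU sketch with illustrative names; the paper performs these steps in parallel on the GPU):

```python
import numpy as np

def solve_displacements(Ksys, Psys):
    """System displacements d_sys from P_sys = K_sys d_sys (Eq. 16);
    numpy delegates to a LAPACK solver based on Gaussian elimination."""
    return np.linalg.solve(Ksys, Psys)

def member_forces(k_local, B, u_global):
    """Member end forces in local coordinates: f_i = k_i B_i u_i (Eq. 17)."""
    return k_local @ (B @ u_global)
```

Once $d_{sys}$ is known, each member's global displacements $u_i$ are extracted from it and fed through Eq. (17) to recover the internal forces and moments.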

2.4.2. Acceleration of FEM for GPU

Within the framework of the BBO algorithm's sequential execution, the structural analysis stage, distinguished by its intensive time demands, is broken down into the steps shown in Figure 6, with parallel processing on the GPU enabled by Algorithm 2. The phase's initial task, "Creation of Ksys", is accomplished following the steps outlined in Eq. (15), through a process of triplet matrix multiplication that integrates the outcomes via the interconnected structural elements. The procedural formulations for each component in Eq. (14) are extracted, arranged into a matrix with dimensions of 12 × 12, and made operational through CUDA. Following this, the CUDA function allows for the parallel formation of the $K_i$ result for each component, thereby completing the requisite operations for Eq. (15) in a parallel manner.

Figure 6. Programmatic diagram visualizing the algorithmic flow of the structural analysis module.


Algorithm 2. Structural optimization module.

The next step in the analysis is determining the "U displacement vector." This vector provides a comprehensive view of the displacements at the structure's nodes, illustrating the three-dimensional movements at each node's coordinates, which is indispensable for evaluating the system's behavior under different conditions. The $K_{sys}$ matrix, derived from Eq. (15), is applied in Eq. (16). At this juncture, a parallel Gauss elimination process is conducted using the parallel cuBLAS library functions, which are optimized for the GPU processor. This process is carried out separately for each structural loading condition, leading to the accurate calculation of the $d_{sys}$ displacement vector. The parallel Gauss elimination process effectively exploits the fact that stiffness matrices are always sparse. In structural FEM analysis, the stiffness matrix consistently takes the form of a sparse matrix whose non-zero elements cluster around the diagonal, and the matrix is symmetric about this diagonal, providing a computational advantage through this discernible pattern.
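A CPU analogue of exploiting the symmetry described above is to replace general Gaussian elimination with a Cholesky factorization, which is valid because a properly constrained stiffness matrix is symmetric positive definite (an illustrative sketch; the paper's GPU version relies on the parallel cuBLAS routines):

```python
import numpy as np

def solve_spd(Ksys, p):
    """Solve K d = p for a symmetric positive definite K via K = L L^T,
    followed by forward and back substitution."""
    L = np.linalg.cholesky(Ksys)    # lower-triangular factor
    y = np.linalg.solve(L, p)       # forward substitution: L y = p
    return np.linalg.solve(L.T, y)  # back substitution:    L^T d = y
```

Factoring once and reusing $L$ for every load case mirrors how the solve is repeated per loading condition without re-eliminating the matrix.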

In the ultimate phase, the "Creation of the f vector," the calculations for loads and moments on each member are performed. The operations for each element outlined in Eq. (17) have been systematically derived and assembled into a 12 × 1 vector, which is then operationalized with CUDA. Through this CUDA function, the parallel computation of the $f_i$ result for each element is achieved, ensuring that the function is executed in parallel for all structural elements, significantly improving the computational process and the precision of the structural analysis.

All these features have been seamlessly integrated into the design of the algorithm developed for this purpose. Subsequently, formulas were developed on CUDA (Compute Unified Device Architecture) for all loads acting in each direction, and the algorithm was customized to effectively harness the processing power of GPUs through CUDA integration. Thus, the developed formulas were assigned to GPU processor threads for all nodes, resulting in a parallel process that gains processing speed.

3. Design examples and test computers

In this study, three distinct hypothetical structures, each with a different number of elements, were analyzed under three progressively increasing loading scenarios. The first structure comprises 105 members, the second 460 members, and the third 1024 members. Each structure was analyzed under loading scenarios comprising 1, 4, and 7 load cases. All design examples were selected from benchmark problems established in the literature, ensuring consistency with established examples regarding parameters such as load types and load cases (Akın and Aydoğdu Citation2015; İ. Aydoğdu and Saka Citation2012; Çarbaş Citation2016; Örmecioğlu Citation2019). Hence, the full set of load combinations used for the design examples is defined as: 1.2DL + 1.6LL + 0.5SL, 1.2DL + 0.5LL + 1.6SL, 1.2DL + 1.6WXL + LL + 0.5SL, 1.2DL + 1.6WYL + LL + 0.5SL, 1.2DL + 0.8WXL + 1.6SL, 1.2DL + 0.8WYL + 1.6SL, and 1.4DL, where DL, LL, SL, and WL represent dead load, live load, snow load, and wind load, respectively. The design loads, basic wind speed, drift, and deflection limits of the frames are based on ASCE 7–05 (ASCE 7–05 2005) and the Ad Hoc Committee on Serviceability, and are given in .
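The combination rules above are plain linear blends of the basic load vectors; a short sketch (with hypothetical nodal load vectors for a toy 3-DOF model) shows how each combination's factors are applied:

```python
import numpy as np

# Hypothetical basic nodal load vectors for a toy 3-DOF model
loads = {
    "DL":  np.array([10.0, 5.0, 2.0]),
    "LL":  np.array([ 4.0, 2.0, 1.0]),
    "SL":  np.array([ 1.0, 0.5, 0.2]),
    "WXL": np.array([ 0.5, 0.0, 3.0]),
    "WYL": np.array([ 0.0, 0.5, 3.0]),
}

# Factors follow the ASCE 7-05-style combinations listed in the text
combos = {
    "1.2DL+1.6LL+0.5SL":     {"DL": 1.2, "LL": 1.6, "SL": 0.5},
    "1.2DL+0.5LL+1.6SL":     {"DL": 1.2, "LL": 0.5, "SL": 1.6},
    "1.2DL+1.6WXL+LL+0.5SL": {"DL": 1.2, "WXL": 1.6, "LL": 1.0, "SL": 0.5},
    "1.2DL+1.6WYL+LL+0.5SL": {"DL": 1.2, "WYL": 1.6, "LL": 1.0, "SL": 0.5},
    "1.2DL+0.8WXL+1.6SL":    {"DL": 1.2, "WXL": 0.8, "SL": 1.6},
    "1.2DL+0.8WYL+1.6SL":    {"DL": 1.2, "WYL": 0.8, "SL": 1.6},
    "1.4DL":                 {"DL": 1.4},
}

def combine(factors):
    """Sum the factor-weighted basic load vectors for one combination."""
    return sum(c * loads[name] for name, c in factors.items())

combined = {name: combine(f) for name, f in combos.items()}
```

Each combined vector then serves as one column of the load matrix F in the multi-load-case solve, which is why the number of load cases directly scales the solution work.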

Table 1. Comparison of loading details and displacement limitations for the first and second design examples.

This study aimed to prepare an algorithm that would facilitate the optimization analyses of steel frame structures using commonly available hardware in the market, rather than specialized hardware. Using the algorithm developed for this purpose, the study attempted to evaluate the suitability of different hardware configurations commonly found in the market for structural optimization tasks. Therefore, hardware selection was based on market availability and categorized by age (old, medium, new). Subsequently, the algorithm developed to accelerate the structural analysis phase of the BBO process was executed independently on four distinct test computers. These test computers were selected to represent a range of hardware ages and capabilities, including both desktop and laptop configurations. The goal was to evaluate how different hardware setups influenced the optimization process, with a focus on achieving efficient and effective structural analyses for steel frame structures.

Test computer-A uses a GeForce GTX 1080 Ti processor, a standard desktop graphics processor with features such as video decoding and OpenGL, Vulkan, and DirectX graphics pipelines; it comprises a total of 3584 CUDA cores across 28 streaming multiprocessors (SMs). Each SM contains 128 CUDA cores, and within a single SM all cores execute the same instructions, forming an integrated massively parallel (MP) structure. Alongside this GPU, a desktop computer with a 6-core i7-8700K CPU served as the primary hardware platform (main test computer) for the study, while the remaining test computers were selected for comparison.

In addition to the main test computer, three laptops were used: one with a GeForce GT 730M GPU and a 4-core i7-3630QM CPU (test computer-B), one with a GeForce GTX 950M GPU and a 4-core i7-6700HQ CPU (test computer-C), and one with a GeForce GTX 1050 Ti GPU and an i7-7700HQ CPU (test computer-D), each with the specifications listed in . Each test computer was run separately, and the performance metrics were collected accordingly.

The selection of benchmark examples was crucial for ensuring a fair comparison between the different hardware configurations. These examples were chosen based on the graphics card with the lowest memory capacity among the various hardware options selected for this study. Additionally, the design of the algorithm was tailored to accommodate this lowest capacity for each hardware configuration, rather than being designed specifically for each one. This approach helped standardize the testing conditions and allowed for a more meaningful comparison of the hardware’s performance in optimizing steel frame structures.

Table 2. Specifications of the test computers.

In the study, 8-byte double precision was used for both GPU and CPU computations to ensure high numerical precision in the optimization analyses. For comparison, the identical analytical procedure was repeated on each test computer with both the CPU and the GPU. This yielded a total of 48 analysis runs, encompassing two case structures (one with 105 members and another with 1024 members) analyzed on four computer systems, each under three distinct loading conditions, with both processor types.

provide time values for all the test computers during analyses performed under 1, 4, and 7 loading conditions.

3.1. Design example-1: five-story, 105-member steel frame

Figure 7. Design example-1. 3D, side, isometric, and plan views of frame structure with 105 members.

Figure 7. Design example-1. 3D, side, isometric, and plan views of frame structure with 105 members.

The first design example is a five-story steel space frame structure with 105 members and 54 joints, based on previous studies in the literature (Akın and Aydoğdu Citation2015; I. Aydoğdu Citation2010; Çarbaş Citation2016; Örmecioğlu Citation2019). The 3D, side, and plan views of the frame structure are shown in . In design example-1, the frame members are grouped into 11 independent design variables, illustrated in ; the group definitions of the structure were taken from Aydoğdu (Citation2010). The frame members are chosen from a range of 272 W-sections, spanning from W100 × 19.3 to W1100 × 499 mm, as specified in LRFD-AISC (LRFD 2000). In addition to dead and live loads, the structure is exposed to snow and wind loads.

Table 3. Group definition of the design example-1:The 105-member frame structure.

For the 105-member example, optimization was carried out with both the CPU and the GPU, employing the load combinations provided above and the computers specified in Table 2. This process was repeated twice, and the cumulative duration of an average iteration is displayed in . Furthermore, depicts the distribution of CPU and GPU core utilization and memory use during this iteration.

Figure 8. GPU/CPU kernel times of the 105-member structure under 1,4 and 7 load cases. LC: the number of load cases.

Figure 8. GPU/CPU kernel times of the 105-member structure under 1,4 and 7 load cases. LC: the number of load cases.

Figure 9. Process/Memory, time percent graph of internal GPU kernel for the 105-member structure under 1, 4, and 7 load cases. LC: the number of load cases.

Figure 9. Process/Memory, time percent graph of internal GPU kernel for the 105-member structure under 1, 4, and 7 load cases. LC: the number of load cases.

In , it is evident that the GPU processing time is significantly higher than that of the CPU: the CPU completes the analysis between 2.7 and 70 times faster. The GPU-CPU gap is most pronounced in the “D” computer configuration, with an average 49-fold difference, and least pronounced in the “A” configuration, with an average fourfold difference. As the number of load combinations increases, both CPU and GPU solution times increase.

In , the percentage of time spent on memory operations lies between 65% and 80%. Notably, in the single-loading scenario the split between memory and processing times is more erratic, whereas under 4 and 7 load combinations memory and processing times are more closely aligned.

Within , the minimum weight of the 105-member structure, as computed by our developed algorithm, is contrasted against findings from existing literature. The BBO-GPU algorithm yields a solution that is 4.79% heavier than the lightest reported design, 40.6% lighter than the heaviest, and 6.36% lighter than the average. These deviations indicate that the BBO-GPU solution falls within an acceptable range.

Table 4. Design sections and limit values of the optimum designs for the 105-member space frame.

Figure 10. Search history of 105-member space frame.

Figure 10. Search history of 105-member space frame.

shows the search history of the BBO-GPU alongside the other optimization algorithms previously applied to the 105-member example. The figure reveals that the BBO-GPU’s convergence remains within the interval spanned by those algorithms.

3.2. Design example-2: twenty-story, 460-member steel frame

Figure 11. Design example-2. 3D, side, isometric, and plan views of frame structure with 460 members.

Figure 11. Design example-2. 3D, side, isometric, and plan views of frame structure with 460 members.

The second design example is a twenty-story steel space frame structure comprising 460 members and 210 joints. The design and the group definitions of the structure are taken from Aydoğdu (Citation2010), with grouping conducted as depicted in . displays 3D, side, isometric, and plan views of the frame structure. For this design example, the frame members are categorized into 13 distinct design variables, listed in . These members are selected from a range of 272 W-sections, from W100 × 19.3 to W1100 × 499 mm, following LRFD-AISC guidelines (LRFD 2000).

Table 5. Group definition of the design example-2:The 460-member frame structure. See for a-, b-, c-, d, and e-beam locations.

Optimization of the 460-member system was conducted using both CPU and GPU processing, incorporating the load combinations detailed earlier and executed on the computer systems specified in Table 2. The procedure was repeated twice, and provides an overview of the average iteration time. Additionally, illustrates the utilization of CPU and GPU cores, as well as the memory distribution, throughout this process.

In , we observe that the GPU outperformed the CPU in only 4 of the 12 combinations, achieving speedups of 1.1 to 1.4 times. In the remaining 8 scenarios the GPU lagged behind the CPU by a factor of 1.1 to 4.0. Relative GPU performance was best on computer “A,” with an average improvement of 1.2 times, and worst on computer “C,” where the GPU was on average 3.4 times slower. Notably, as the number of load combinations increased, the CPU and GPU solution times also rose. Across computers A, B, and C, the average time differences amounted to 6.5 milliseconds, whereas computer D showed a considerably larger average difference of 395.5 milliseconds.

Figure 12. GPU/CPU kernel times of design example-2:The 460-member structure under 1,4 and 7 load cases.

Figure 12. GPU/CPU kernel times of design example-2:The 460-member structure under 1,4 and 7 load cases.

illustrates that the time allocated to memory operations ranges from 65% to 93% of the total processing time. Remarkably, computer “D” consistently registered the highest processing times for each load type, while computer “A” consistently demonstrated the shortest processing times. Furthermore, it is noteworthy that, for computer “A,” the proportion of memory time relative to the overall time remained above 90% across all three load types.

Figure 13. Process/Memory, Time Percent Graph of Internal GPU kernel for the 460-member structure under 1, 4, and 7 load cases.

Figure 13. Process/Memory, Time Percent Graph of Internal GPU kernel for the 460-member structure under 1, 4, and 7 load cases.

Previously, the 460-member structure was optimized using the Ant Colony Optimization (ACO) algorithm (İ. Aydoğdu and Saka Citation2012). Comparing the outcomes of BBO-GPU against those of ACO reveals a 0.35% difference between the two sets of results (). This variation remains within an acceptable range.

Table 6. Design sections and limit values of the optimum designs for the 460-member space frame.

Figure 14. Search history of 460-member space frame.

Figure 14. Search history of 460-member space frame.

presents a comparison of the search history patterns between the BBO-GPU and the optimization algorithm that was used earlier in the 460-member example. The data from this figure indicates that the BBO-GPU’s convergence performance falls within the boundaries set by the earlier optimization algorithms.

3.3. Design example-3: eight-story, 1024-member steel frame

The third design example is an eight-story steel space frame structure with 1024 members and 384 joints in 40 independent design groups, taken from previous studies (I. Aydoğdu and Akın Citation2014; İ. Aydoğdu and Saka Citation2012; Örmecioğlu Citation2019). displays the isometric, side, and plan views for design example-3, while outlines the grouping of frame members. The group definitions of the structure were taken from Aydoğdu and Saka (Citation2012). As in design example-1, the frame members are chosen from a selection of 272 W-sections, ranging from W100 × 19.3 to W1100 × 499 mm, as outlined in LRFD-AISC (LRFD 2000).

Table 7. Group definition of the design example-3:The 1024-member frame structure.

Figure 15. Design example-3. 3D, side, isometric, and plan views of frame structure with 1024 members.

Figure 15. Design example-3. 3D, side, isometric, and plan views of frame structure with 1024 members.

For the 1024-element scenario, optimization was conducted using both CPU and GPU, employing the load combinations previously outlined and the computer systems specified in Table 2. The process was replicated twice, and presents the cumulative duration of an average iteration. Additionally, offers a visual representation of the distribution of CPU and GPU core utilization and memory usage throughout these optimization iterations.

presents a clear comparison between GPU and CPU processing times across 12 distinct combinations. In 7 of these scenarios the GPU outperformed the CPU, completing tasks 1.1 to 3.8 times faster; in the remaining 5, the GPU was 1.0 to 1.7 times slower. Relative GPU performance was strongest in the “A” configuration, where the GPU was on average 2.6 times faster, and weakest in the “C” configuration, where the GPU was on average 1.1 times slower. It is important to note that as the number of load combinations increased, both CPU and GPU solution times demonstrated a gradual upward trend.

Figure 16. GPU/CPU kernel times of the design example-3:The 1024-member structure under 1,4 and 7 load cases.

Figure 16. GPU/CPU kernel times of the design example-3:The 1024-member structure under 1,4 and 7 load cases.

presents data indicating that the percentage of time dedicated to memory operations lies in the range of 53% to 74%. It is noteworthy that, across all loading types, the “D” computer configuration consistently records the highest processing time, whereas the “A” configuration consistently records the lowest processing time.

Figure 17. Process/Memory, time percent graph of internal GPU kernel for the 1024-member structure under 1, 4, and 7 load cases.

Figure 17. Process/Memory, time percent graph of internal GPU kernel for the 1024-member structure under 1, 4, and 7 load cases.

The minimum weight achieved using the developed algorithm is compared to the results presented in the literature (I. Aydoğdu et al. Citation2017; Çarbaş Citation2016), as shown in . The comparison reveals a difference of 0.44% between our findings and those of the referenced studies. Given the stochastic nature of metaheuristic algorithms, such a variance can be deemed acceptable.

Table 8. Design details of the best solutions for the 1024-member structure.

Figure 18. Search history of 1024-member space frame.

Figure 18. Search history of 1024-member space frame.

illustrates the comparative search history behavior of the BBO-GPU against the previously utilized optimization algorithm in the context of the 1024-member example. It is evident from this figure that the convergence of the BBO-GPU is maintained within the range defined by the other optimization algorithms.

4. Discussion

When analyzing the performance metrics of three design examples, a key distinction arises where the GPU processor struggles to outperform the CPU, especially in scenarios involving relatively few elements within the framework systems. However, a notable performance improvement is observed when analyzing a structure with 1024 members on “test computer-A,” where the GPU processor proves to be over three times faster than the CPU. To comprehend the reasons behind this phenomenon, a software profile analysis is conducted. This analysis involves capturing a comprehensive record of events and their durations during computations on the GPU processor.

As depicted in , nearly 60% of the total time consumed by the GPU is attributed to data transfer operations between the CPU (main computer) and the GPU processor (device). The remaining 40% represents the actual time spent by the GPU processor on mathematical computations. Comparing processing times without memory-related tasks to CPU processing time reveals that a performance increase of up to 9.5 times can be achieved, particularly evident in the analysis of the 1024-member structure in the case of “test computer-A,” while it has less impact on the 105-member structure. Additionally, it is observed that the 460-member structure achieves similar results across all conditions except for computer-D.
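The 9.5-fold figure follows directly from the profile: if roughly 60% of total GPU time is host-device transfer, the compute-only speedup is the overall speedup divided by the remaining compute fraction. A short sketch (taking the up-to-3.8× overall speedup reported for the 1024-member case and the ~60% transfer share as inputs; these are the paper's reported values, not new measurements):

```python
def compute_only_speedup(overall_speedup, transfer_fraction):
    """Estimate the GPU-vs-CPU speedup with host<->device transfers excluded.

    overall_speedup   : t_cpu / t_gpu_total, as measured
    transfer_fraction : share of t_gpu_total spent moving data
    """
    # Only (1 - transfer_fraction) of the GPU time is actual computation,
    # so removing transfers scales the speedup by 1 / (1 - transfer_fraction).
    return overall_speedup / (1.0 - transfer_fraction)

# ~3.8x overall speedup with ~60% of GPU time spent on transfers
s = compute_only_speedup(3.8, 0.60)   # yields ~9.5x compute-only speedup
```

This also explains why the effect is small for the 105-member structure: when total times are short, the fixed transfer share dominates and the compute-only gain has little absolute impact.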

When examining the analyses conducted within this study, the results can be categorized into two main performance groups. The first classification is primarily based on architectural differences between CPU and GPU processor groups. The most significant difference is the scalability of performance with the amount of processed data. This becomes particularly evident when comparing the analysis of a 105-member framework, which requires fewer computations and less memory, to that of a 1024-member structure, which is more computation-intensive and memory-demanding. Furthermore, the 460-member structure tends to evenly allocate processor and memory consumption, positioning it between the results of the other two structure examples. demonstrates that under all load conditions, classical CPU processors outperform GPU processors in the analysis of the 105-member structure. However, as shown in , the CPU processor falls behind the GPU processor in the analysis of the 1024-member structure. indicates that CPU and GPU processor utilization rates are similar. Therefore, in data-intensive and repetitive operations, the GPU’s parallel processing capability tends to yield significantly better results.

The performance gap between GPU and CPU analyses can be attributed to two key factors. Firstly, differences in hardware configurations contribute significantly to this gap. Secondly, a bottleneck occurs in memory transfer between the host and the device. The study highlights the impact of GPU processor architecture, as different GPU models show varying performance levels due to differences in their command sets. GPUs with advanced command sets demonstrate superior parallel computational capabilities compared to those with older architectures, underscoring the importance of GPU model selection for achieving optimal performance in parallel programming for structural optimization. The hardware selection deliberately included devices from various release years, enhancing the diversity of GPU architectures tested. This variation in processing capabilities among the hardware led to differences in speed and performance. Moreover, a fixed portion of time is required for handshake, initialization, and data transfer during computation, which is necessary for GPU calculations but not for CPU calculations. Consequently, it is not possible to reduce the computation time below a certain threshold when using GPUs. To illustrate this, the study includes a specific example with minimal computation and memory transfer involving 105 elements, presenting this phenomenon through tables and graphs. and illustrate the percentage of the memory bottleneck, which is the reason for the occurrence of situations where the GPU spends more time than the CPU in the analysis of structures with a low number of members.
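The fixed-cost argument above can be made concrete with a toy timing model (all rates and the overhead constant are hypothetical, chosen only for illustration): GPU time is a fixed handshake/initialization/transfer overhead plus work divided by a high throughput, CPU time is work divided by a lower throughput, so the GPU wins only past a break-even problem size.

```python
def t_cpu(n, cpu_rate):
    """CPU time: pure computation, no device overhead."""
    return n / cpu_rate

def t_gpu(n, gpu_rate, overhead):
    """GPU time: fixed handshake/init/transfer cost plus faster computation."""
    return overhead + n / gpu_rate

def break_even(cpu_rate, gpu_rate, overhead):
    """Problem size beyond which the GPU becomes faster.

    Solves overhead + n/gpu_rate = n/cpu_rate for n."""
    return overhead / (1.0 / cpu_rate - 1.0 / gpu_rate)

# Hypothetical throughputs (elements/ms) and a fixed 5 ms device overhead
n_star = break_even(cpu_rate=10.0, gpu_rate=100.0, overhead=5.0)
# Below n_star the CPU wins (as with the 105-member frame);
# above it the GPU wins (as with the 1024-member frame).
```

Under these assumed numbers the crossover sits near 56 elements; the qualitative point is that no GPU speedup is possible below the threshold set by the fixed overhead, matching the 105-member observations.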

The second classification pertains to the different GPU processor models, each with a distinct processor architecture depending on its production year. As indicated in , variations in performance among GPU processors stem not only from differences in data pathways but also from disparities in processor architectures, particularly in their instruction sets. The GPU processors in the test machines other than “D” have more advanced instruction sets and therefore superior capabilities for parallel mathematical computation, and the efficiency of these instructions is evident in the results. Conversely, the GPU processor in the “D” computer, built on the SM 3.0 architecture, lacks certain instructions and must fall back on older software solutions. Importantly, SM 3.0 provides no atomic-addition instruction for parallel double-precision floating-point calculations, unlike the processors used in the other test machines. As a result, these operations on the “D” test computer run significantly slower than on the others.

The fluctuations in processing times observed in the single-load scenario shown in Figure may be attributed to the necessity of data transfer between the host computer and the GPU during GPU processing. These fluctuations indicate that using a GPU may not provide significant advantages in scenarios with low computational requirements. Additionally, the similarity in processor/memory usage ratios for the 460-member example in highlights the threshold beyond which GPU usage becomes advantageous.

Table 9. GPU processor architectures with their production years.

To validate the optimality of the developed algorithm, we compared its optimum results with those found in the existing literature. The comparison reveals a deviation of 6.36% for the first example, 0.35% for the second, and 0.44% for the third. These differences comfortably fall within acceptable ranges, affirming the algorithm’s effectiveness.

5. Conclusion

The study explores the application of GPU-based parallel programming to optimize large-scale steel frame structures, addressing the computational complexities associated with iterative structural analysis processes. The research demonstrates that GPU processors provide a significant performance advantage over CPUs. The extent of this advantage depends on several factors, including processor architecture, computational demands, and the specific GPU model used. Notably, in scenarios with high computational requirements, such as analyzing a 1024-member frame structure, GPUs outperform CPUs by a factor of more than three. This highlights the effectiveness of GPU parallel processing in managing time-intensive optimization challenges in structural engineering.

Furthermore, the research emphasizes the importance of data transfer operations between the CPU and GPU, especially in situations with lower computational demands. As computational requirements increase, the benefits of GPU parallel processing become more evident. This underscores the necessity of carefully considering computational demands and GPU architecture when implementing parallel computing methods.

The study also highlights the impact of GPU processor architecture, as different GPU models exhibit varying levels of performance due to differences in their command sets. GPUs with advanced command sets demonstrate superior parallel computational capabilities compared to those with older architectures, emphasizing the role of GPU model selection in achieving optimal performance in parallel programming for structural optimization.

In conclusion, this research emphasizes the promising role of GPU-based parallel programming in accelerating the optimization of large-scale steel frame structures. It delivers significant improvements in time efficiency and performance, especially in scenarios involving complex and computationally intensive iterative processes. While this study focuses on the possibility of expediting the optimization of steel structures using ordinary devices, further research is warranted to fully explore the potential of GPU-based solutions in civil engineering and related fields. Specifically, future studies should focus on evaluating the performance of GPU-based methods in large-scale finite element analysis to gain a deeper understanding of their capabilities and limitations. Nevertheless, it is important to note that increasing the number of members in a group may lead to convergence delays. Therefore, future studies should consider the use of guide-based techniques for larger member numbers to address this omission and further enhance the optimization process for steel frame structures.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors used ChatGPT to improve readability and language. After using this tool/service, the authors reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Tevfik Oğuz Örmecioğlu

MSc Örmecioğlu is a PhD Candidate and a civil engineer with extensive experience in construction project management and infrastructure development. He has a strong academic background with a focus on structural optimization and artificial intelligence. He is also proficient in various programming languages and software applications, making him a versatile and skilled professional in his field.

İbrahim Aydoğdu

Dr. Aydoğdu is a faculty member at Akdeniz University, where he teaches courses on artificial intelligence, machine learning, and deep learning. His passion lies in sharing his knowledge and expertise with students, inspiring them to explore the exciting world of AI. Alongside teaching, he conducts research and publishes in the areas of natural inspired optimization algorithms and computational mechanics.

Hilal Tuğba Örmecioğlu

Dr. Örmecioğlu is an academic specializing in architecture, with a focus on the history of Modern Architecture, Construction, and Technology. Her interests cover earthquake-resistant architectural design to the preservation of modern architecture. Her research also extends to the integration of AI into architectural education and the utilization of new technologies in architectural and structural design.


References

  • AISC (American Institute of Steel Construction). 1986. “Load and Resistance Factor Design Specifications for Steel Buildings.” https://www.aisc.org/globalassets/aisc/manual/15th-ed-ref-list/load-and-resistance-factor-design-specification-for-structural-steel-buildings.pdf.
  • Akın, A., and I. Aydoğdu. 2015. “Optimum Design of Steel Space Frames by Hybrid Teaching-Learning Based Optimization and Harmony Search.” The International Conference on Computing in Civil and Building Engineering, Disney World, Orlando, Florida, USA, 1344–1351. American Society of Civil Engineers. https://doi.org/10.1061/9780784413616.269.
  • Artar, M., and S. Çarbaş. 2021. “Discrete Sizing Design of Steel Truss Bridges Through Teaching-Learning-Based and Biogeography-Based Optimization Algorithms Involving Dynamic Constraints.” Structures 34:3533–3547. https://doi.org/10.1016/j.istruc.2021.09.101.
  • Aydoğdu, I. 2010. “Optimum Design of 3D Irregular Steel Frames Using Ant Colony Optimization and Harmony Search Algorithms.” Unpublished PhD Thesis, METU.
  • Aydoğdu, I. 2017. “Cost Optimization of Reinforced Concrete Cantilever Retaining Walls Under Seismic Loading Using a Biogeography-Based Optimization Algorithm with Levy Flights.” Engineering Optimization 49 (3): 381–400. https://doi.org/10.1080/0305215X.2016.1191837.
  • Aydoğdu, I., and A. Akın. 2014. “Optimum Design of Geodesic Aluminum Domes Using Firefly Algorithm.” ACE 2014 11th International Congress On Advances In Civil Engineering, Antalya, Türkiye. Ekim 21-25, 2014, 1–6.
  • Aydoğdu, I., S. Çarbaş, and A. Akın. 2017. “Effect of Levy Flight on the Discrete Optimum Design of Steel Skeletal Structures Using Metaheuristics.” Steel and Composite Structures 24 (1): 93–112. https://doi.org/10.12989/scs.2017.24.1.093.
  • Aydoğdu, I., P. Efe, M. Yetkin, and A. Akin. 2017. “Optimum Design of Steel Space Structures Using Social Spider Optimization Algorithm with Spider Jump Technique.” Structural Engineering and Mechanics 62 (3): 259–272. https://doi.org/10.12989/sem.2017.62.3.259.
  • Aydoğdu, İ., and M. P. Saka. 2012. “Ant Colony Optimization of Irregular Steel Frames Including Elemental Warping Effect.” Advances in Engineering Software 44 (1): 150–169. https://doi.org/10.1016/j.advengsoft.2011.05.029.
  • Çarbaş, S. 2016. “Optimum Structural Design of Spatial Steel Frames via Biogeography-Based Optimization.” Neural Computing & Applications 28 (6): 1525–1539. https://doi.org/10.1007/s00521-015-2167-6.
  • Cheng, M. Y., D. Prayogo, Y. W. Wu, and M. M. Lukito. 2016. “A Hybrid Harmony Search Algorithm for Discrete Sizing Optimization of Truss Structure.” Automation in Construction 69:21–33. https://doi.org/10.1016/j.autcon.2016.05.023.
  • Couty, V., J. F. Witz, P. Lecomte-Grosbras, J. Berthe, E. Deletombe, and M. Brieu. 2021. “GPUCorrel: A GPU Accelerated Digital Image Correlation Software Written in Python.” SoftwareX 16:100815. https://doi.org/10.1016/j.softx.2021.100815.
  • Dillen, W., G. Lombaert, and M. Schevenels. 2021. “A Hybrid Gradient-Based/Metaheuristic Method for Eurocode-Compliant Size, Shape and Topology Optimization of Steel Structures.” Engineering Structures 239 (July): 112137. https://doi.org/10.1016/j.engstruct.2021.112137.
  • Duff, I. S., A. M. Erisman, C. W. Gear, and J. K. Reid. 1988. “Sparsity Structure and Gaussian Elimination.” ACM SIGNUM Newsletter 23 (2): 2–8. https://doi.org/10.1145/47917.47918.
  • Ellingwood, B. 1986. “Structural Serviceability: A Critical Appraisal and Research Needs.” Journal of Structural Engineering 112 (12): 2646–2664. https://doi.org/10.1061/(ASCE)0733-9445(1986)112:12(2646).
  • Gandomi, A. H., D. Kalyanmoy, R. C. Averill, S. Rahnamayan, and M. N. Omidvar. 2023. “Variable Functioning and Its Application to Large-Scale Steel Frame Design Optimization.” Structural and Multidisciplinary Optimization 66 (1): 13. https://doi.org/10.1007/s00158-022-03435-2.
  • Georgescu, S., P. Chow, and H. Okuda. 2013. “GPU Acceleration for FEM-Based Structural Analysis.” Archives of Computational Methods in Engineering 20 (2): 111–121. https://doi.org/10.1007/s11831-013-9082-8.
  • Ghatte, F. H. 2021. “A Hybrid of Firefly and Biogeography-Based Optimization Algorithms for Optimal Design of Steel Frames.” Arabian Journal for Science and Engineering 46 (5): 4703–4717. https://doi.org/10.1007/s13369-020-05118-w.
  • Gholizadeh, S. 2015. “Performance-Based Optimum Seismic Design of Steel Structures by a Modified Firefly Algorithm and a New Neural Network.” Advances in Engineering Software 81 (March): 50–65. https://doi.org/10.1016/j.advengsoft.2014.11.003.
  • Gholizadeh, S., and A. Milany. 2016. “Optimal Performance-Based Design of Steel Frames Using Advanced Metaheuristics.” Asian Journal of Civil Engineering 17 (5): 607–623.
  • Gholizadeh, S., and M. Mohammadi. 2017. “Reliability-Based Seismic Optimization of Steel Frames by Metaheuristics and Neural Networks.” ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems Part A-Civil Engineering 3 (1): 04016013. https://doi.org/10.1061/AJRUA6.0000892.
  • Guo, W., M. Chen, L. Wang, Y. Mao, and Q. Wu. 2017. “A Survey of Biogeography-Based Optimization.” Neural Computing & Applications 28 (8): 1909–1926. https://doi.org/10.1007/s00521-016-2179-x.
  • Hasançebi, O., T. Bahçecioǧlu, Ö. Kurç, and M. P. Saka. 2011. “Optimum Design of High-Rise Steel Buildings Using an Evolution Strategy Integrated Parallel Algorithm.” Computers and Structures 89 (21–22): 2037–2051. https://doi.org/10.1016/j.compstruc.2011.05.019.
  • Hong, W. H., V. T. Nguyen, and M. C. Nguyen. 2022. “Optimizing Reinforced Concrete Beams Cost Based on AI-Based Lagrange Functions.” Journal of Asian Architecture and Building Engineering 21 (6): 2426–2443. https://doi.org/10.1080/13467581.2021.2007105.
  • Hosseini, N., M. R. Ghasemi, and B. Dizangian. 2022. “ANFIS-Based Optimum Design of Real Power Transmission Towers with Size, Shape and Panel Design Variables Using BBO Algorithm.” IEEE Transactions on Power Delivery 37 (1): 29–39. https://doi.org/10.1109/TPWRD.2021.3052595.
  • Kaveh, A., T. Bakhshpoori, and E. Afshari. 2014. “An Efficient Hybrid Particle Swarm and Swallow Swarm Optimization Algorithm.” Computers & Structures 143:40–59. https://doi.org/10.1016/j.compstruc.2014.07.012.
  • Kaveh, A., Y. Gholipour, and H. Rahami. 2008. “Optimal Design of Transmission Towers Using Genetic Algorithm and Neural Networks.” International Journal of Space Structures 23 (1): 1–19. https://doi.org/10.1260/026635108785342073.
  • Kaveh, A., K. B. Hamedani, S. M. Hosseini, and T. Bakhshpoori. 2020. “Optimal Design of Planar Steel Frame Structures Utilizing Metaheuristic Optimization Algorithms.” Structures 25:335–346. https://doi.org/10.1016/j.istruc.2020.03.032.
  • Kaveh, A., and G. M. Ilchi. 2018. “A New Hybrid Meta-Heuristic Algorithm for Optimal Design of Large-Scale Dome Structures.” Engineering Optimization 50 (2): 235–252. https://doi.org/10.1080/0305215X.2017.1313250.
  • Kaveh, A., K. Laknejadi, and B. Alinejad. 2012. “Performance-Based Multi-Objective Optimization of Large Steel Structures.” Acta Mechanica 223 (2): 355–369. https://doi.org/10.1007/s00707-011-0564-1.
  • Kaveh, A., and V. R. Mahdavi. 2015. “A Hybrid CBO-PSO Algorithm for Optimal Design of Truss Structures with Dynamic Constraints.” Applied Soft Computing 34:260–273. https://doi.org/10.1016/j.asoc.2015.05.010.
  • Kazemzadeh Azad, S. 2017. “Enhanced Hybrid Metaheuristic Algorithms for Optimal Sizing of Steel Truss Structures with Numerous Discrete Variables.” Structural and Multidisciplinary Optimization 55 (6): 2159–2180. https://doi.org/10.1007/s00158-016-1634-8.
  • Kazemzadeh Azad, S. 2020. “Design Optimization of Real-Size Steel Frames Using Monitored Convergence Curve.” Structural and Multidisciplinary Optimization 63 (1): 267–288. https://doi.org/10.1007/s00158-020-02692-3.
  • Kazemzadeh Azad, S., O. Hasancebi, and S. Kazemzadeh Azad. 2013. “Upper Bound Strategy for Metaheuristic-Based Design Optimization of Steel Frames.” Advances in Engineering Software 57:19–32. https://doi.org/10.1016/j.advengsoft.2012.11.016.
  • Kociecki, M., and H. Adeli. 2014. “Two-Phase Genetic Algorithm for Size Optimization of Free-Form Steel Space-Frame Roof Structures.” Engineering Applications of Artificial Intelligence 32:218–227. https://doi.org/10.1016/j.engappai.2014.01.010.
  • Kociecki, M., and H. Adeli. 2015. “Shape Optimization of Free-Form Steel Space-Frame Roof Structures with Complex Geometries Using Evolutionary Computing.” Engineering Applications of Artificial Intelligence 38:168–182. https://doi.org/10.1016/j.engappai.2014.10.012.
  • Kucukkulahli, E., P. Erdogmus, and K. Polat. 2017. “A Hybrid Approach to Image Segmentation: Combination of BBO (Biogeography Based Optimization) and Histogram Based Cluster Estimation.” Proceedings of the 2017 25th Signal Processing and Communications Applications Conference, 1–4. Antalya, Turkey. https://doi.org/10.1109/SIU.2017.7960188.
  • Lei, J., D. Li, Y. Zhou, and W. Liu. 2019. “Optimization and Acceleration of Flow Simulations for CFD on CPU/GPU Architecture.” Journal of the Brazilian Society of Mechanical Sciences and Engineering 41 (7): 290. https://doi.org/10.1007/s40430-019-1793-9.
  • LRFD-AISC. 2000. “Load and Resistance Factor Design Specification for Structural Steel Buildings.” https://www.aisc.org/LRFD-Specification-for-Structural-Steel-Buildings-1999.
  • Luo, C. H., H. Ye, and X. Chen. 2020. “A Memory Optimization Method Combined with Adaptive Time-Step Method for Cardiac Cell Simulation Based on Multi-GPU.” Medical and Biological Engineering and Computing 58 (11): 2821–2833. https://doi.org/10.1007/s11517-020-02255-0.
  • Martínez-Frutos, J., and D. Herrero-Pérez. 2017. “GPU Acceleration for Evolutionary Topology Optimization of Continuum Structures Using Isosurfaces.” Computers and Structures 182:119–136. https://doi.org/10.1016/j.compstruc.2016.10.018.
  • McClanahan, C. 2011. “History and Evolution of GPU Architecture.” A Survey Paper. https://mcclanahoochie.com/blog/wp-content/uploads/2011/03/gpu-hist-paper.pdf.
  • Meon, M. S., M. A. Anuar, M. H. M. Ramli, W. Kuntjoro, and Z. Muhammad. 2012. “Frame Optimization Using Neural Network.” International Journal on Advanced Science, Engineering and Information Technology 2 (1): 28–33. https://doi.org/10.18517/ijaseit.2.1.148.
  • Mikes, I. G., and A. J. Kappos. 2023. “Optimization of the Seismic Response of Bridges Using Variable-Width Joints.” Earthquake Engineering and Structural Dynamics 52 (1): 111–127. https://doi.org/10.1002/eqe.3751.
  • Onan, A. 2013. “Metaheuristic Methods and Their Application Areas.” Çukurova Üniversitesi İİBF Dergisi 17 (2): 113–128.
  • Örmecioğlu, T. O. 2019. “Çelik Çerçeve Sistemlerin GPU Tabanlı Optimizasyonu [GPU-Based Optimization of Steel Frame Systems].” MSc. Thesis, Akdeniz University.
  • Papadrakakis, M., N. D. Lagaros, and Y. Fragakis. 2003. “Parallel Computational Strategies for Structural Optimization.” International Journal of Numerical Methods in Engineering 58 (9): 1347–1380. https://doi.org/10.1002/nme.821.
  • Papadrakakis, M., N. D. Lagaros, and Y. Tsompanakis. 1998. “Structural Optimization Using Evolution Strategies and Neural Networks.” Computer Methods in Applied Mechanics and Engineering 156 (1–4): 309–333. https://doi.org/10.1016/S0045-7825(97)00215-6.
  • Sarma, K. C., and H. Adeli. 2001. “Bilevel Parallel Genetic Algorithms for Optimization of Large Steel Structures.” Computer-Aided Civil and Infrastructure Engineering 16 (5): 295–304. https://doi.org/10.1111/0885-9507.00234.
  • Shkurti, A., M. Orsi, E. Macii, E. Ficarra, and A. Acquaviva. 2013. “Acceleration of Coarse Grain Molecular Dynamics on GPU Architectures.” Journal of Computational Chemistry 34 (10): 803–818. https://doi.org/10.1002/jcc.23183.
  • Simon, D. 2008. “Biogeography-Based Optimization.” IEEE Transactions on Evolutionary Computation 12 (6): 702–713. https://doi.org/10.1109/TEVC.2008.919004.
  • Tayfur, B., H. Yilmaz, and A. T. Daloglu. 2021. “Hybrid Tabu Search Algorithm for Weight Optimization of Planar Steel Frames.” Engineering Optimization 53 (8): 1369–1383. https://doi.org/10.1080/0305215X.2020.1793977.
  • Ting, T. O., X. S. Yang, S. Cheng, and K. Huang. 2015. “Hybrid Metaheuristic Algorithms: Past, Present, and Future.” In Recent Advances in Swarm Intelligence and Evolutionary Computation. Studies in Computational Intelligence, edited by X. Yang. Vol. 585. Springer, Cham. https://doi.org/10.1007/978-3-319-13826-8_4.
  • Träff, E. A., A. Rydahl, S. Karlsson, O. Sigmund, and N. Aage. 2023. “Simple and Efficient GPU Accelerated Topology Optimisation: Codes and Applications.” Computer Methods in Applied Mechanics and Engineering 410:116043. https://doi.org/10.1016/j.cma.2023.116043.
  • Vargas, A., T. M. Stitt, K. Weiss, V. Z. Tomov, J.-S. Camier, T. Kolev, R. N. Rieben, et al. 2022. “Matrix-Free Approaches for GPU Acceleration of a High-Order Finite Element Hydrodynamics Application Using MFEM, Umpire, and RAJA.” The International Journal of High Performance Computing Applications 36 (4): 492–509. https://doi.org/10.1177/10943420221100262.
  • Vu, Q. A., T. S. Cao, T. T. T. Nguyen, H. H. Nguyen, V. H. Truong, and M. H. Ha. 2023. “An Efficient Differential Evolution-Based Method for Optimization of Steel Frame Structures Using Direct Analysis.” Structures 51:67–78. https://doi.org/10.1016/j.istruc.2023.03.020.
  • Xu, Z., X. Lu, H. Guan, and A. Ren. 2014. “High-Speed Visualization of Time-Varying Data in Large-Scale Structural Dynamic Analyses with a GPU.” Automation in Construction 42:90–99. https://doi.org/10.1016/j.autcon.2014.02.020.
  • Zegard, T., and G. H. Paulino. 2013. “Toward GPU Accelerated Topology Optimization on Unstructured Meshes.” Structural and Multidisciplinary Optimization 48 (3): 473–485. https://doi.org/10.1007/s00158-013-0920-y.
  • ZendehAli, N., H. Emdad, and O. Abouali. 2023. “Developing a CPU-GPU LES Parallel Solver for Canonical Turbulent Flows.” Iranian Journal of Science and Technology Transactions of Mechanical Engineering 47 (4): 1535–1551. https://doi.org/10.1007/s40997-023-00618-0.
  • Zhu, W. 2010. “Parallel Biogeography-Based Optimization with GPU Acceleration for Nonlinear Optimization.” Proceedings of the ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Vol. 1, 315–323. Montreal, Canada, August 15–18, 2010.