An intelligent monitoring method of underground unmanned electric locomotive loading process based on deep learning method

Article: 2307174 | Received 31 Jul 2023, Accepted 15 Jan 2024, Published online: 27 Jan 2024

Abstract

The intelligent monitoring of electric locomotive loading is crucial in unmanned underground systems. A CNN-based monitoring scheme with migration (transfer) learning is proposed to address challenges in efficiency, abnormality handling, and data acquisition. The locomotive loading datasets are transformed, augmented, and equalized. The model improves performance and training by modifying the fully connected layer and by using optimized learning rate decay and adaptive learning rate algorithms. Trained in the PyTorch framework, the optimized VGG19-EL migration network achieves 99.85% recognition accuracy on the 2-classification task, while the optimized RESNET50-EL migration network achieves 97.3% on the 10-classification task. Overall, this study proposes a reliable and efficient model for monitoring locomotive loading and relieving workers of this task.

1. Introduction

The intelligent construction of mines is accelerating, and the unmanned underground electric locomotive is receiving significant attention as an essential part of mine intelligence. In the complex underground environment, with interlaced, narrow, dark, and wet tunnels as well as blind spots in the field of vision, the mine-tracked electric locomotive, as the primary means of transporting ores and materials underground, is susceptible to accidents caused by fatigued driving and misoperation. In contrast, a driverless mine electric locomotive fundamentally avoids operator fatigue, offers excellent safety and economic benefits, and is the solution to such problems.

Monitoring the ore loading process is critical for an unmanned underground locomotive system. Real-time monitoring of the ore loaded into the hopper of the locomotive at the ore outlet guides subsequent loading actions, allowing for a completely unmanned loading process. Although domestic unmanned underground rail transportation can achieve long-distance remote control and autonomous travel, monitoring the loading status of unmanned locomotives underground still poses several challenges. There are currently two ways to monitor loading: stationing an ore-release worker at each electric locomotive to release ore on site, or having workers in the ground dispatch center monitor the underground loading process in real time through a computer screen and send control instructions to the electric locomotive. However, both methods have drawbacks, such as inefficiency, and neither truly frees workers from the task.

In recent years, the field of mining engineering has witnessed a booming development of artificial intelligence and deep learning algorithms, which have provided innovative solutions to address many challenges in the industry (Guoli, Citation2022; Wang et al., Citation2023). This paper aims to explore the wide application of deep learning algorithms in mining engineering, especially the remarkable achievements in image recognition and classification, ore sorting and grade prediction, data processing and monitoring, target identification and hazardous source monitoring.

Deep learning has made significant progress in image recognition and classification in the mining industry. By employing deep learning models such as convolutional neural networks (CNNs), researchers are able to process image data from mining areas to achieve accurate recognition and classification of different ores, rocks, and mining facilities, thereby improving the efficiency and accuracy of resource exploration, ore mining, and mine management. Baklanova and Baklanov (Citation2016) discussed applying image recognition techniques for mineral rock identification and designed algorithms for determining rock color and shape. Yi et al. (Citation2021) identified ore knot bottom conditions based on image recognition technology and used the Canny operator edge detection algorithm to determine the location of the car outline. Liya (Citation2017) solved the problem of insufficient manual identification of hazard sources in high-risk areas underground through image recognition technology. Zheng et al. (Citation2015) performed gangue identification based on machine vision and developed an underground gangue pneumatic sorting system.

In the field of ore sorting and grade prediction, current research focuses on applying advanced image processing and machine learning technologies to automate the sorting process by analyzing images of ores, and simultaneously explores the use of big data and sensor technologies for accurate grade prediction to improve ore processing efficiency and resource utilization. Kumar et al. (Citation2023) achieved a recognition accuracy of 98.2% for iron ore under the microscope based on convolutional neural network technology. Liguan et al. (Citation2020) developed a deep-learning-based wolframite image recognition beneficiation method that is robust and requires a smaller dataset. Xiang et al. (Citation2023) extracted features from the original graphite ore image by cutting it during data processing and then fused the features from top to bottom to obtain a pyramid-structured feature map; the global attention mechanism RGA was introduced to capture the whole image at both the spatial and channel levels, which allows the grade of the graphite ore to be identified rapidly and conveniently. Lin et al. (Citation2019) studied truck-loaded ore quantity estimation using a technique based on a deep convolutional neural network. Intelligent sorting of wolframite was achieved by graying and noise-reduction preprocessing of ore images (Changlu, Citation2020; Jiwei, Citation2019; Fang, Citation2020).

Deep learning techniques play a key role in mining data processing and monitoring. Models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are widely used in remote sensing image interpretation, geological exploration data processing, and equipment condition monitoring, which improves data processing speed and accuracy and provides reliable information support for decision-making in the mining industry. Wang et al. (Citation2021) designed a machine-vision-based system for detecting large foreign objects on belt conveyors, thereby preventing belt tearing and ensuring the stable operation of belt conveyors. Sun et al. (Citation2021) constructed a deep-learning-based model for monitoring ore loading in open-pit mine trucks. Patel et al. (Citation2019) predicted the ore grade in real time during belt transportation. Xu et al. (Citation2020) realized real-time monitoring of the dynamic deformation of roadways based on computer vision technology. Dong et al. (Citation2022) and Weisi (Citation2021) implemented intelligent measurement of large piles of material (mines, ports, and grain silos) and of the volume of coal piles on belt conveyors based on binocular stereo vision. Shunling et al. (Citation2021) automatically recognized the truck loading condition by extracting deep convolutional features from images of open-pit mine truck loading and using a support vector machine multiclassification model. Using machine vision technology, Yu Peng et al. (Citation2022) designed and developed an image-recognition-based underground locomotive load metering system combined with density modeling.

In terms of target identification and hazard source monitoring, deep learning models excel at identifying equipment, personnel, and potential hazard sources in mining areas. Researchers have used deep learning models to effectively identify various targets, including equipment, personnel, and transportation, and to achieve timely monitoring of potential sources of danger. By learning from a large amount of labeled image and video data, a deep learning model is able to extract features automatically, thus achieving highly accurate target recognition and hazard source monitoring in the complex and changing mining environment. Zhang Liya (Citation2017) solved the problem of insufficient manual identification of hazard sources in high-risk underground areas through image recognition technology. Yao Yong (Citation2023) combined image and video processing from computer vision with deep learning methods such as convolutional neural networks (CNNs) to accurately discriminate and recognize abnormal miner behavior in a coal mine environment.

Deep learning has been applied in mining to ore recognition, coal gangue recognition, ore volume estimation, and hazard monitoring, while there is little research on real-time monitoring of the loading status of unmanned electric locomotives underground. To address this issue, a deep-learning-based real-time intelligent monitoring method for locomotive loading status is proposed in this paper. Because it is difficult to obtain ore loading pictures of mine cars in real scenes, the training dataset is constructed by simulating the loading process of a locomotive. The loading dataset is organized and classified accordingly, augmented with transformations such as rotation, translation, scaling, horizontal/vertical flipping, and brightness, saturation, and contrast adjustments, and divided into a training set, a validation set, and a test set. The corresponding CNN is loaded with a weight file to initialize the network parameters, the fully connected layer of the network is modified, and the neural network is further optimized using learning rate decay and adaptive learning rate algorithms. Finally, the optimized neural network is retrained in the PyTorch framework, and the network parameters are fine-tuned with test data. The method analyzes 2-classification and 10-classification loading status recognition to provide a reasonable real-time monitoring scheme for the loading status of unmanned underground electric locomotives.

This study conducted comprehensive experiments integrating deep learning techniques to compare the performance of ResNet50-EL and VGG19-EL in loading recognition tasks. By fine-tuning network parameters and adjusting the camera installation angles, and by comparing and analyzing loading recognition in different situations, a feasible real-time monitoring scheme for the loading status of underground unmanned electric locomotives was proposed. This provides practical technical solutions to problems in the coal industry, such as improving efficiency and ensuring safety. In particular, with the optimized ResNet50 and VGG19 networks, intelligent and efficient monitoring of the loading capacity of underground electric locomotives has been achieved, making a truly unmanned underground loading process possible and providing innovative and feasible solutions for the intelligent construction of mines in the coal industry.

2. Analysis of underground unmanned electric locomotive loading process

The underground unmanned electric locomotive system is a modern mining technology that relies on automation and communication systems. This system integrates components such as communication, automation, network, mechanical, electrical, remote control, and signal systems to enable automatic operation of underground rail transportation, except for the loading link. The system consists of two parts: the upper computer located in the central control room on the ground, and the lower PLC control master and substations located near the underground substation and the controlled objects. These parts enable data communication between the underground and the surface through a fiber optic ring network, thus allowing underground electric locomotives and equipment to be controlled from the central control room on the ground. The system mainly comprises a dispatching unit, locomotive unit, loading unit, unloading unit, track control unit, communication network unit, and control center system unit. Through the collaboration of each unit, the underground unmanned electric locomotive system can operate intelligently and efficiently, providing a safer and more conducive working environment for underground miners, as shown in Figures 1 and 2.

Figure 1. Underground track transportation automatic operation system.


Figure 2. System composition diagram.


The loading unit of an unmanned electric locomotive is a critical component in the mining process. Keeping track of its loading status is essential to maximize mining production while keeping costs low. This study focuses on intelligent monitoring of the loading process of the locomotive, as shown in Figure 3. Specifically, it monitors the loading status of the ore in the mine car during the release process, ensuring even distribution. This improves mining production efficiency and work safety. Traditional methods of monitoring the loading status of the mine car rely on remote cameras viewed from the ground dispatch center. However, this subjective approach may lead to worker negligence and inaccurate real-time monitoring, resulting in overloaded or underloaded areas. In Figure 3, A-A is a front view of the locomotive loading process, B-B is a side view of the locomotive loading process, the A-A view shows the location of the B-B profile, and the B-B view shows the location of the A-A profile.

Figure 3. Locomotive loading process diagram.


3. Experimental data construction

3.1. Locomotive loading process simulation

3.1.1. Electric locomotive and mine outlet 3D model creation

Electric locomotives are crucial for transportation in underground mines. To better understand their physical characteristics and analyze the ore loading process, 3D models of the electric locomotive and the mine hopper were created, as shown in Figure 4. The SolidWorks digital modeling software was used to create 3D models of the electric locomotive bucket and the discharge hopper. This software provides a variety of tools and functions, including basic geometry drawing, assembly, and constraints. The EDEM discrete element simulation software was then chosen to import the CAD models; EDEM can read CAD model files in various industry-standard formats for simulation analysis of the locomotive loading process. In Figure 4, (a) is the top view of the 3D model of the electric locomotive and mine outlet, (b) the main view, (c) the left view, (d) the rear view, and (e) the model of the electric locomotive.

Figure 4. Electric locomotive and mine opening 3D model creation diagram.


3.1.2. Discrete element modeling of ores

In order to simulate the ore loading process accurately, it is necessary to take into account various material properties such as density, Poisson's ratio, and shear modulus, along with contact properties such as the coefficient of restitution, static friction coefficient, and dynamic friction coefficient. This can be achieved with the EDEM simulation software. The experiment here focuses on a copper mine, with the material and contact properties outlined in Tables 1 and 2, respectively.

Table 1. Material properties.

Table 2. Contact attributes.

To build a 3D discrete element model of rock particles with irregular geometry, a 3D geometric boundary method based on a triangular surface mesh and the sphere-superposition method is used to model the particles in the EDEM discrete element simulation software. The geometry and size of the particles are varied by controlling the number, size, and combinations of the elemental spheres. To balance computational efficiency against the accuracy of the results, four particle models consisting of one, two, four, and eight spheres were used within the rock geometry boundary to obtain particle clusters with different geometries. The masking effect between particles can be genuinely reflected when the number of elemental spheres in the sphere assembly unit reaches about eight (Chen et al., Citation2017). Therefore, the eight-sphere particle representation is chosen in this study to obtain more accurate calculation results. The 3D discrete element model of the rock is shown in Figure 5, and the distribution of the rock particle spheres is given in Table 3.

Figure 5. 3D discrete element modeling diagram of rock.


Table 3. Distribution of rock particle spheres.

3.1.3. Simulation experiment of the ore loading process

The primary process of loading ore from the outlet into the locomotive was simulated by building a simulation model of the locomotive, the outlet, and the ore particles, as shown in Figure 6. In each experiment, the weight of the ore loaded into the mine car is 10%, 20%, …, up to 100% of the total capacity of the locomotive bucket: in the first experiment the loaded ore was 10% of the total, in the second 20%, and so on, up to the tenth experiment, in which the loaded ore was 100% of the total. These simulation experiments provide the data support for constructing the dataset of the intelligent monitoring model for locomotive loading.

Figure 6. Loading process simulation experiment diagram.


3.2. Organization and classification of training data

How training data are organized and classified can significantly impact the accuracy of neural network predictions. To successfully build an intelligent monitoring model for underground unmanned electric locomotive loading, the form of data organization is crucial. In order to explore the accuracy of locomotive loading status recognition under different data organization forms, we conducted data organization based on the aforementioned locomotive loading process simulation and the loading of the locomotive hopper. During the simulation, the locomotive bucket was fully loaded and the loading status was recorded as 100%. Images of the loading status of the ore in the mine car were also recorded from various angles in the simulation system.

Based on the collected loading pictures, the training data is organized into two categories: a 10-category form with ten loading process cases, and a 2-category form with loading and non-loading cases.

The ten-classification data organization is based on the different loading process situations, with the fully loaded state of the locomotive mine car recorded as 100%. Loading is then controlled through the simulation, recording the mine car bucket at 90%, 80%, and so on, down to a 10% loading volume, as shown in Figure 7. In total, 741 pictures were obtained: 72 pictures of 100% loading, 75 of 90%, 62 of 80%, 60 of 70%, 84 of 60%, 90 of 50%, 86 of 40%, 89 of 30%, 68 of 20%, and 55 of 10% loading.

Figure 7. 10 kinds of loading diagram of mine car.


The two-classification data organization distinguishes full and non-full situations: the fully loaded state of the locomotive mine car (100%) is recorded as the first category, and the other nine cases are all recorded as the non-full state. The total number of pictures for the full-load state is 72, and for the non-full-load state 669. A data augmentation operation is applied to the training samples to reduce the sample imbalance and to mitigate the overfitting caused by a dataset that is too small. The samples were preprocessed by cropping, rotation (45°, 135°), translation, scaling (1.5×, 0.5×, and 0.25×), horizontal/vertical flipping, brightness, saturation, and contrast adjustment, and random noise addition; with these augmentation methods the dataset was brought up to about 300 images per category, as shown in Figure 8.

Figure 8. Image after data enhancement.
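The augmentations listed above map naturally onto a torchvision transform pipeline. Below is a minimal sketch; the crop size, translation fraction, jitter strengths, and noise level are illustrative assumptions rather than values taken from the paper.

```python
import torch
from torchvision import transforms

# Illustrative augmentation pipeline; all numeric parameters are assumptions.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),                          # random cropping
    transforms.RandomRotation(degrees=(45, 135)),                                 # rotation in the stated range
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.25, 1.5)),  # translation and scaling
    transforms.RandomHorizontalFlip(p=0.5),                                       # horizontal flip
    transforms.RandomVerticalFlip(p=0.5),                                         # vertical flip
    transforms.ColorJitter(brightness=0.3, saturation=0.3, contrast=0.3),         # photometric adjustments
    transforms.ToTensor(),
    transforms.Lambda(lambda t: torch.clamp(t + 0.02 * torch.randn_like(t), 0.0, 1.0)),  # random noise
])
```

Each original image can be passed through such a pipeline repeatedly until every category reaches roughly 300 samples.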


4. Experimental principles and methods

The intelligent monitoring technology for the underground unmanned electric locomotive loading process is designed in three steps. The first step is the construction of a three-dimensional model of locomotive loading and the simulation of the process: based on field research at the mine and observation of the locomotive loading process, three-dimensional models of the locomotive hopper and the outlet equipment are constructed, and ore particles are taken as the object of the loading simulation. To reproduce the loading process of the electric locomotive as faithfully as possible, the morphology and nature of the ore are modelled to analyse the distribution of the ore flow, and a mechanical model of the ore and the mine outlet is built according to their mechanical characteristics.

The second step is the construction and training of the deep neural network for locomotive loading monitoring. The dataset is built from the three-dimensional loading model and the process simulation, the monitoring images are classified, and a dedicated training database is created. Image libraries for the 2-classification and 10-classification tasks are constructed, and the data are preprocessed to obtain the mine loading simulation database, which contains image data processed by standardization and normalization.

The third step is the network design and the determination of the final model. In the network design, migration learning is applied: the network structure and its initialization are modified to achieve an effective transfer to the monitoring task. Cross-entropy is used as the loss function, the Adam algorithm is chosen as the optimizer, and the training process is optimized with the fixed-step decay method. Model evaluation relies on metrics such as accuracy, recall, true positive rate (TPR), and false positive rate (FPR). Finally, the binary (2-VGG19-EL) and ten-class (10-RESNET50-EL) monitoring models are selected in the model determination phase, and the monitoring parameters, such as the learning rate (Lr) and batch size (Batch_size), are explicitly specified (Figure 9).

Figure 9. Flowchart of algorithm for intelligent monitoring technology of ore loading process of unmanned underground electric locomotive.


4.1. Experimental principle

CNNs have long been one of the core algorithms in image recognition and perform stably when learning large amounts of data (Egmont-Petersen et al., Citation2002). A CNN uses its deep network structure to extract high-dimensional abstract features from images for image recognition and classification. In classification tasks it can extract discriminative features for training other classifiers (Wang et al., Citation2018), and features can also be extracted through unsupervised learning (Krause et al., Citation2014). Because the convolutional kernels of the hidden layers share parameters and the inter-layer connections are sparse, the network can learn grid-structured (pixel and audio) data with little computational effort, with stable results, and without additional feature engineering (Gu et al., Citation2018). The structure of a CNN generally consists of an input layer, hidden layers (convolutional, pooling, and fully connected layers), and an output layer, as shown in Figure 10; a minimal code sketch is given after the layer descriptions below.

Figure 10. General structure of convolutional neural networks.

  1. Input layer: Gradient descent algorithm learning requires normalization of the input features in a CNN. For example, for the input pixel data, the original pixel values can be normalized from [0, 255] to [0, 1], which helps to improve the execution efficiency and learning performance of the algorithm.

  2. Convolutional layer: The convolutional layer has the function of extracting features from the input data. It consists of several convolutional kernels, each containing a set of weight coefficients (W) and a deviation term (b). Each neuron is connected to several adjacent neurons in the previous layer, and the number of connections depends on the size of the convolution kernel.

  3. Activation layer: The role of the activation layer is to enhance the representation capability of the network by applying a nonlinear activation function to the linear output of the previous layer. Common activation functions include Sigmoid, Tanh, and ReLU.

  4. Pooling layer: After the feature maps are extracted in the convolutional layer, the feature maps are filtered, and the pooling layer performs feature selection. The pooling layer uses a pooling function to calculate the statistics of the adjacent regions of each point in the feature map, which reduces the size of the data while retaining helpful information and has the property of local linear transformation invariance, thus enhancing the generalization ability of the convolutional neural network.

  5. Fully connected layer: The fully connected layer is usually located at the end of the hidden layer of the convolutional neural network and only passes signals to the other fully connected layers. The three-dimensional feature map is expanded into a one-dimensional vector and passed to the next layer via activation.

  6. Output layer: In convolutional neural networks, the output layer is usually located after the fully connected layer. The structure and working principle of the output layer are similar to that of the output layer in traditional feed-forward neural networks. For image classification problems, the output layer uses a logistic function or a normalized exponential function (Softmax function) to output the classification labels.
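As a concrete illustration of the six layer types described above, the following minimal PyTorch sketch stacks convolution, activation, and pooling stages ahead of a fully connected layer and a Softmax output. The layer sizes are arbitrary assumptions and are not the VGG19/ResNet50 structures used later in this paper.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Toy CNN following the generic structure of items 1-6 above."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer: weights W and bias b
            nn.ReLU(),                                    # activation layer
            nn.MaxPool2d(2),                              # pooling layer: downsampling / feature selection
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                 # expand the 3-D feature map into a 1-D vector
            nn.Linear(32 * 56 * 56, num_classes),         # fully connected layer (assumes 224x224 input)
        )

    def forward(self, x):
        # Input layer: pixel values are assumed to be pre-normalized to [0, 1].
        logits = self.classifier(self.features(x))
        # Output layer: Softmax class probabilities. When training with cross-entropy loss
        # (Section 4.4.1), the raw logits would be passed to the loss function instead.
        return torch.softmax(logits, dim=1)
```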

4.2. Migration network structure construction

ImageNet is a database that contains manually labeled categories of images used for machine vision research; currently, about 22,000 categories are available. Every year since 2010, the ILSVRC software competition based on this dataset has been held, in which software programs compete to correctly classify and detect objects and scenes, and the competition accuracy serves as a benchmark for computer vision classification algorithms. Since 2012, CNNs and deep learning have dominated the rankings of this competition, and the Top-5 classification test error has now decreased to 2.25%. The optimized networks in this paper are named VGG19-EL (electric locomotive) and RESNET50-EL, as shown in Figure 11. Specific information about each network before modification can be found in Table 4.

Figure 11. Transfer learning method flow.


Table 4. Transfer network information.

4.3. Experimental methods

In this paper, we employ a migration-learning-based image recognition method to identify the loading status of mine cars. The neural connection structure of the pre-trained network is highly intricate, so we present a simplified, modified pre-trained network structure. The connection pattern of each neuron in the second and third layers mirrors that of the first neuron in each respective layer. The final layer has 10 neurons, each corresponding to one of the 10 categories of mine car loading status labels. To encode these labels uniquely, one-hot coding is used. For instance, if the output result is [0, 1, 0, 0, 0, 0, 0, 0, 0, 0], the predicted category is the 20% loading status of the mine car.
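A small illustration of this one-hot encoding in PyTorch follows; the mapping of index 0 to the 10% state and index 1 to the 20% state is inferred from the example above.

```python
import torch
import torch.nn.functional as F

# Category indices are assumed to run from 0 (10% loaded) to 9 (100% loaded).
label = torch.tensor(1)                     # index 1 corresponds to the 20% loading state
one_hot = F.one_hot(label, num_classes=10)  # tensor([0, 1, 0, 0, 0, 0, 0, 0, 0, 0])
print(one_hot)
```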

First, the weight file is loaded into the corresponding CNN to initialize the network parameters. Subsequently, the last fully connected layer of the network is adjusted so that the output matches the number of data categories. Lastly, the training parameters within the network are optimized and the entire network is retrained. This process enables intelligent monitoring of the electric locomotive loading process. The specific training steps are outlined below, followed by a short code sketch of Steps 1–3:

  • Step 1: The generated samples are divided into training, validation, and test sets in a ratio of 6:2:2. The training set is utilized for network training, the validation set is employed for cross-validation to mitigate overfitting, and the test set serves as unseen data to evaluate the actual accuracy of the trained network.

  • Step 2: The migration network parameters are initialized by downloading and loading the weight files onto the corresponding networks. This initialization of migration network parameters with pre-trained weights saves training time and accelerates convergence.

  • Step 3: The last fully connected layer of the network is modified while keeping the input unchanged and adjusting the output to match the number of data types. The weights of the last layer are initialized, and the gradient descent algorithm is employed for learning. Training parameters are optimized using fixed step decay. Subsequently, the entire network is retrained to obtain the recognition model.

  • Step 4: During the training process, small batches of images from the training set are randomly selected and used for training, ensuring that all training images are utilized within one training cycle. Iterative training is performed for a specified number of cycles until convergence is achieved, resulting in the recognition model.

  • Step 5: The model’s performance is evaluated using the test set. The original Ore dataset is divided into two and ten categories for testing purposes.
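The following is a minimal sketch of Steps 1–3, assuming the simulated loading images are stored in a torchvision ImageFolder layout; the folder name ore_loading/ and the input size are hypothetical.

```python
import torch
import torch.nn as nn
from torch.utils.data import random_split
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Step 1: 6:2:2 split of the simulated loading dataset (hypothetical folder "ore_loading/").
dataset = datasets.ImageFolder("ore_loading/", transform=preprocess)
n_train = int(0.6 * len(dataset))
n_val = int(0.2 * len(dataset))
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, len(dataset) - n_train - n_val])

# Steps 2-3: initialize with pre-trained weights and modify the last fully connected layer.
num_classes = 10                                   # 2 for the full / non-full task
model = models.resnet50(pretrained=True)           # RESNET50-EL starting point
model.fc = nn.Linear(model.fc.in_features, num_classes)

# For the VGG19-EL variant the last classifier layer is replaced instead:
# vgg = models.vgg19(pretrained=True)
# vgg.classifier[6] = nn.Linear(vgg.classifier[6].in_features, num_classes)
```

The 2-classification task only changes num_classes to 2; the remainder of the procedure is identical.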

4.4. Network optimization parameters setting and environment configuration

4.4.1. Locomotive loading model network optimization

The primary goal of network optimization is to enhance the model’s performance by minimizing the loss function. This paper employs Adam, a stochastic optimization algorithm incorporating two adaptive moment estimations, for network optimization. The learning rate is updated using a fixed step decay method, where the learning rate is multiplied by 0.9 every five epochs. This strategy initiates training with a higher learning rate and gradually reduces it as the number of iterations increases. It ensures that the model stabilizes in the later stages of training and approaches the optimal solution more effectively.
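In PyTorch, this combination corresponds to the Adam optimizer together with a StepLR scheduler; a minimal sketch with a stand-in model and an assumed initial learning rate is shown below.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)   # stand-in for VGG19-EL / RESNET50-EL
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.9)

for epoch in range(200):
    # ... forward pass, loss computation, backward pass, optimizer.step() go here ...
    scheduler.step()      # every 5 epochs the learning rate is multiplied by 0.9
```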

During the forward propagation phase of model training, a random deactivation technique called Dropout is employed. Specifically, neural network units are randomly dropped with a Dropout probability of 0.5. This approach prevents the weight of each node from becoming too large and mitigates overfitting, a common phenomenon in neural networks.

After conducting tests, it was determined that setting the learning rate (Lr) to 0.00001 for the 10-category 10-VGG19-EL network, and setting Lr to 0.0001 for the 2-category 2-VGG19-EL network, 2-category 2-RESNET50-EL network, and 10-category 10-RESNET50-EL network resulted in better convergence. The training period is set to 200 epochs, cross-entropy is utilized as the loss function, and a regularized loss function is employed to mitigate overfitting.
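A condensed, self-contained sketch of one possible training loop under this configuration follows. The stand-in model, the toy data, and the weight-decay value are assumptions, since the paper states only that Dropout of 0.5, cross-entropy loss, and a regularized loss are used.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins so the sketch runs on its own; in the paper these are the migration network
# and the simulated-loading training split.
model = nn.Sequential(nn.Flatten(), nn.Dropout(p=0.5), nn.Linear(3 * 224 * 224, 10))
train_loader = DataLoader(
    TensorDataset(torch.rand(8, 3, 224, 224), torch.randint(0, 10, (8,))),
    batch_size=5)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)  # L2 regularization (assumed value)

model.train()                        # keeps Dropout(p=0.5) active during training
for epoch in range(200):             # 200 training epochs, as stated above
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```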

4.4.2. Environment configuration

The training and testing of the migration networks are conducted using the PyTorch framework, an open-source neural network framework developed by Facebook specifically for GPU-accelerated deep neural network (DNN) programming. The software environment consists of the PyTorch framework, Ubuntu, and PyCharm. Regarding the hardware environment, the system utilizes an Intel (R) Core (TM) i7-8700 CPU @ 3.20 GHz processor, 16GB RAM, and NVIDIA GeForce GTX 1060 GPU. The software environment is based on a 64-bit Ubuntu 18.04 system, CUDA 10.1, PyTorch 1.1.03, and PyCharm Professional Edition.

5. Experimental analysis and discussion

5.1. Model training and testing results

The training in this paper focuses on two main aspects. Firstly, the model is trained to recognize two categories of mine car loading status, full or non-full, which addresses the challenge of effectively distinguishing between full and non-full mine car loading images. Secondly, the model is trained to recognize loading capacities at intervals of 10% (10%, 20%, and so on) during the mine car loading process, that is, on ten categories corresponding to loading percentages from 10% to 100%. The goal of these experiments is to achieve automatic recognition of the electric locomotive loading status, enabling statistical analysis of the mine car fleet's carrying workload.

To evaluate the effectiveness of the network training, this paper employs metrics such as Train Accuracy, Test Accuracy, and Training Loss. Train Accuracy measures the percentage of correct classifications made by the model on the training set and is calculated as shown in Equation (1):

(1) $P_{train} = \dfrac{n_{train}^{correct}}{n_{train}^{set}}$

In Equation (1), $n_{train}^{correct}$ represents the number of samples correctly classified by the network in the training set, and $n_{train}^{set}$ refers to the total number of samples in the training set. As the training dataset can be extensive, calculating the overall recognition accuracy for each training round can significantly prolong the training time. To enhance efficiency, the training accuracy is computed only for randomly selected batches from the training set, under the assumption that each batch of samples follows the same distribution as the test set.

The test accuracy measures the rate at which the model produces accurate results on the test set and is defined by Equation (2):

(2) $P_{test} = \dfrac{n_{test}^{correct}}{n_{test}^{set}}$

The test accuracy is a crucial metric for assessing the effectiveness of the network, as it reflects the accuracy of the network in practical applications. Furthermore, a significant gap between the training and testing accuracy indicates potential overfitting. Overfitting occurs when the network excessively learns from the training data, leading to poor generalization performance on unseen test data and diminishing the network’s usefulness. Thus, an optimal network should aim to improve test accuracy while minimizing the gap between training and test accuracy.
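In code, Equations (1) and (2) reduce to counting correct predictions over the relevant dataset split; a short sketch for an arbitrary classifier model and DataLoader loader is shown below.

```python
import torch

@torch.no_grad()
def accuracy(model, loader):
    """Fraction of correctly classified samples, i.e. Equations (1)/(2) depending on the split."""
    correct, total = 0, 0
    model.eval()
    for images, labels in loader:
        predictions = model(images).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.numel()
    return correct / total
```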

The training process of the migration networks involves minimizing the loss function, commonly referred to as 'Loss.' The loss function guides the optimization process: minimizing the loss adjusts the network parameters, enabling a better fit to the training data. The loss function calculates the mean square error (MSE), denoted $E_{train}$, which represents the discrepancy between the model's predictions and the actual values on the test set. It can be calculated using Equation (3):

(3) $E_{train} = \dfrac{1}{m}\sum_{i}\left(\hat{y}_{test} - y_{test}\right)_{i}^{2}$

5.1.1. Comparative analysis of training accuracy

The training results of the two sets of trials for the migration networks are depicted in Figures 12 and 13. In these figures, the training accuracy curves of the migration networks are represented by black lines, while the validation accuracy curves are shown in red. Additionally, Figure 14 compares the loss curves of the two migration networks.

Figure 12. TrainAcc diagram of two classification transfer networks: (a) 2-VGG19-EL-TrainAcc; (b) 2-RESNET50-EL-TrainAcc.


Figure 13. TrainAcc diagram of ten classification transfer networks: (a) 10-VGG19-EL-TrainAcc; (b) 10-ResNet50-EL-TrainAcc.


Figure 14. Transfer networks training loss comparison diagram: (a) Two-classification loss curve comparison; (b) ten-classification loss curve comparison.


Figure 15. Transfer networks accuracy comparison diagrams.


From Figure 12, it is evident that both networks achieve training accuracies above 99% after 200 training cycles, with the loss values falling below 1e-11. Specifically, the training data for the 2-RESNET50-EL network performs exceptionally well, reaching a training accuracy of 99.99%. However, the 2-VGG19-EL network exhibits a higher testing accuracy of 99.85%. Based on these observations, the migration networks yielding the best test results in this study do not necessarily correspond to those with the best training results; therefore, the test results carry more practical significance.

From Figure 13, it is evident that the training for the 10-classification case is also conducted for 200 cycles. Both networks achieve training accuracies exceeding 99%, with losses lower than 8e-10. The 10-VGG19-EL and 10-RESNET50-EL networks exhibit the same training accuracy, but the 10-RESNET50-EL network demonstrates superior test accuracy, reaching 97.3% in the ten-classification scenario. Therefore, the 10-RESNET50-EL network performs better in terms of test accuracy.

As depicted in Figure 14, both the 2-classification and 10-classification migration networks exhibit rapid convergence within 20 training cycles. Upon comparing the testing accuracy of the 10-classification networks, specifically 10-VGG19-EL and 10-RESNET50-EL, at different training cycle intervals (0, 100, 150, and 200 cycles), it is observed that the testing accuracy gradually improves as the number of training cycles increases. Considering the time cost and practical effectiveness, a training cycle count of 200 is chosen.

Figure 15 illustrates that the disparity between the training accuracy and testing accuracy of the two migration networks with 2 classifications is noticeably smaller than that of the migration networks with 10 classifications. Both the training and testing accuracies are higher for the 2-classification networks. Additionally, for the same migration network, the testing accuracy for 2 classifications is notably higher than that for 10 classifications.

The introduction of new recognition categories has an impact on the performance of the migration network. This is because the newly added recognition categories often share more similarities with the original categories, which in turn reduces the network’s recognition accuracy.

Precision refers to the proportion of samples predicted as positive by the model that are actually positive cases; recall is the proportion of actual positive cases correctly predicted by the model among all positive cases. The F1 score is the harmonic mean of precision and recall, providing a comprehensive assessment of both metrics. TPR (True Positive Rate) represents the proportion of actual positive cases correctly predicted by the model, identical to recall. On the other hand, FPR (False Positive Rate) signifies the proportion of actual negative cases incorrectly predicted as positive cases by the model.
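These quantities can be computed directly from confusion-matrix counts. The sketch below covers the binary (full / non-full) case; the counts in the usage line are placeholders, not the results reported next.

```python
# Metrics for a binary confusion matrix: tp, fp, fn, tn are the numbers of true positives,
# false positives, false negatives, and true negatives for the "full" class.
def binary_metrics(tp: int, fp: int, fn: int, tn: int):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                        # identical to the true positive rate (TPR)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)                           # false positive rate
    return precision, recall, f1, fpr

print(binary_metrics(tp=90, fp=5, fn=10, tn=95))   # placeholder counts for illustration
```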

As shown in Figure 16, for binary classification with the same training and test sets, the 2-VGG19-EL model in Figure 16(a) classifies 1 unfilled image as full, achieving an accuracy of 0.9986, recall of 0.9985, F1 score of 0.9985, and FPR of 0.0015. In contrast, the 2-RESNET50-EL model in Figure 16(b) classifies 2 unfilled images as full, with an accuracy of 0.9972, recall of 0.9971, F1 score of 0.9971, and FPR of 0.0029. For the 10-class classification, as depicted in Figure 16(c), the 10-VGG19-EL model makes 27 prediction errors, with 6 categories having a 30% error rate, achieving an accuracy of 0.9559, recall of 0.9571, F1 score of 0.9561, and FPR of 0.0429. Meanwhile, the 10-RESNET50-EL model in Figure 16(d) makes 17 judgment errors, with 4 categories having a 60% error rate, resulting in an accuracy of 0.9742, recall of 0.9734, F1 score of 0.9735, and FPR of 0.0266.

Figure 16. Transfer networks two classification and ten classification confusion matrix: (a)–(c) 2-VGG19-EL and 10-VGG19-EL; (b)–(d) 2-RESNET50-EL and 10-RESNET50-EL.


For a combined comparison, the 2-VGG19-EL model performs better in binary classification, whereas the 10-RESNET50-EL model is superior in 10-class classification.

5.1.2. Test accuracy test

To generate images of the locomotive loading process, the simulation is used. The test set for these images is divided into ten categories, ranging from 10% to 100% filled; the first nine categories (10% to 90%) are considered unfilled samples, and 100% is the filled sample. Table 5 shows that 2-VGG19-EL has the highest average accuracy, 99.85%, in the two-classification test, while Table 6 shows that 10-RESNET50-EL has the highest average accuracy, 97.30%, in the ten-classification test. When the images from two classes are divided into ten classes, the test accuracy decreases. This indicates that the performance of different recognition networks varies greatly depending on the recognition task, so it is important to choose a recognition network appropriate to the task at hand. Additionally, 2-VGG19-EL, which performs best in the two-classification test, has fewer network layers than 2-RESNET50-EL. This suggests that a deeper network is not always better, and the appropriate network should be chosen based on the specific circumstances.

Table 5. Network test accuracy of two-classification recognitions.

Table 6. Network test accuracy of ten-classification recognitions.

5.2. Network optimization and determination

5.2.1. Comparison of 2-classification and 10-classification results

When the results of the two-classification and ten-classification cases are compared using the same model and model parameters, the results above demonstrate that the training and validation accuracy of the two-classification case is higher. Additionally, the training loss of the two migration networks is lower for two classifications, and their prediction accuracy on new data is significantly higher than for ten classifications. Therefore, for practical applications it is recommended to prioritize classifying the locomotive loading pictures into full and non-full cases: the training and testing accuracy of two classifications is higher, with a smaller training loss, than that of ten classifications.

5.2.2. Optimization strategy

Strategy one: Our first optimization strategy involves widening the available data. We obtained a total of 741 pictures with varying loading capacities, but some categories had very few training samples. To prevent overfitting, we augmented the original data by performing brightness and contrast enhancement, rotating the images, and flipping them. This increased the data to five times its original size, resulting in an average of 370.5 pictures per category, as shown in Figure 17.

Figure 17. Before and after data enhancement: (a) Pre-data enhancement; (b) after data enhancement.


Strategy two: The enhanced data are equalized and the learning rate is adjusted with the fixed-step decay method. The data are downsampled to achieve balance among the categories by reducing the number of samples in over-represented categories. Additionally, the fixed-step decay method reduces the learning rate by a factor of 0.9 every five epochs, so that it decreases gradually as the number of iterations increases. Figure 18 illustrates this process.

Figure 18. Data equalization and learning rate decline: (a) Data equalization processing. (b) Learning rate fixed step decay.
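A minimal sketch of the downsampling step in strategy two follows; representing the dataset as a dictionary mapping each category name to its list of image paths is an assumption made for illustration.

```python
import random
from typing import Dict, List

def downsample(images_by_class: Dict[str, List[str]], seed: int = 0) -> Dict[str, List[str]]:
    """Randomly reduce every category to the size of the smallest category."""
    rng = random.Random(seed)
    target = min(len(paths) for paths in images_by_class.values())   # size of the rarest class
    return {cls: rng.sample(paths, target) for cls, paths in images_by_class.items()}
```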


By implementing strategy one, we achieved a significant improvement in the average test accuracy, and the accuracy of each class of test was balanced. We also implemented strategy two, which further improved the test accuracy. As the number of iterations increased, the learning rate gradually decreased, allowing the model's accuracy to remain high and stable in the later stages of training; this led to a more balanced accuracy across the test classes. The final network model parameters were: learning rate Lr = 0.0001, batch size Batch_size = 5, network model Model = RESNET50-EL, and optimization function Adam.

5.3. Discussion

  1. According to this study, the two migration networks (VGG19-EL and RESNET50-EL) produce good results for identifying locomotive ore loading. Specifically, the 2-VGG19-EL classification has a test accuracy of 99.85%, which is 0.14% higher than 2-RESNET50-EL, whereas the 10-RESNET50-EL classification has a test accuracy of 97.30%, which is 1.6% higher than 10-VGG19-EL. Notably, the model performs significantly better at identifying 2 classifications than 10 classifications. Therefore, in actual applications, priority should be given to dividing locomotive loading pictures into full and non-full classes.

  2. This research focuses on improving the monitoring of locomotive loading processes by optimizing two classical models of convolutional neural networks: Vgg19 and Resnet50. These optimized models can effectively monitor locomotive loading and overcome the limitations of existing monitoring devices, ultimately achieving complete automation of the process. This research is significant in promoting advancements in intelligent monitoring of locomotive loading, which could ultimately lead to the liberation of workers.

  3. It is essential to acknowledge the limitations of this study. Accurate data was difficult to obtain from the mine, so the paper used simulated pictures of the locomotive loading process. While this does not fully reflect the actual situation in the mine, it can be combined with actual data from the mine for more accurate results. In future research, additional network models will be used to find a better model for intelligent monitoring of the locomotive loading process. The goal is to expand our research to achieve even more accurate monitoring in the future.

6. Conclusion

  1. This paper conducted experiments on locomotive loading recognition using different migration networks such as 2-VGG19-EL and 10-RESNET50-EL. The recognition rate of 2-VGG19-EL for the 2 classes of locomotive loading was the highest, at 99.85%, and the optimized 10-RESNET50-EL migration network had the highest recognition rate, 97.3%, when tested on 10 categories. However, increasing the number of recognition classes did not improve the recognition rate of locomotive loading; it slightly reduced the effectiveness of the original recognition network, although both classification schemes hold practical significance for mines. Therefore, increasing the number of recognition categories does not necessarily improve model performance, and careful consideration is required when choosing a recognition network suitable for a specific task.

  2. Two migration networks were improved to have a high recognition rate and good generalization performance. It is significant for intelligent monitoring of the locomotive loading process. However, it was discovered during the experiment that the two migration networks with the best test results were different from those with the best training results. In practical applications, the appropriate transfer learning models should be selected according to the specific recognition tasks and performance requirements, rather than just making decisions based on the highest accuracy in the training results or test results.

  3. The 2-VGG19-EL model achieves excellent classification performance despite having fewer network layers. Simply adding more layers does not necessarily improve recognition of locomotive loading; instead, the appropriate network structure should be selected according to the specific recognition task and data features. To address overfitting or inadequate training, data augmentation and equalization of the augmented data can be used, along with the fixed-step decay method to adjust the learning rate.

  4. In this paper, the simulation is used to obtain the corresponding images, which cannot fully reflect the real situation of the mine. We plan to further combine the real data captured in the mine, compare more network models for experiments, and find a better model to realize the intelligent monitoring of the locomotive loading process, to promote our research to a higher level. Considering the uncertainty of underground lighting and environmental conditions, the team will focus on researching advanced image enhancement algorithms to address issues such as instability in underground lighting conditions, increased dust concentration and humidity that may cause blurring of image capture details. This series of extensions and in-depth research will make the algorithm in this paper more robust, better adapted to extreme underground conditions, and contribute more innovative solutions to the development of mining intelligence.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

The authors thank Changsha Digital Mine Co., Ltd. for providing basic data and technical support for this research. The authors also gratefully acknowledge the financial support from the National Natural Science Foundation of China (52204168), the Henan Key Laboratory for Green and Efficient Mining & Comprehensive Utilization of Mineral Resources (Henan Polytechnic University) (KCF2201), and the Key R&D and Promotion Projects in Henan Province (222102220027).

References

  • Baklanova, O. E., & Baklanov, M. A. (2016). Methods and algorithms of image recognition for mineral rocks in the mining industry. In Advances in Swarm Intelligence: 7th International Conference, ICSI 2016, Bali, Indonesia, June 25–30, Proceedings, Part II 7 (pp. 253–262). Springer International Publishing.
  • Changlu, D. (2020). Comprehensive application of XRT ray and image intelligent concentrator in wolframite mine. China Metal Bulletin, 1032, 187–188.
  • Chen, Y., Chu, Z., & Yu, X. (2017). Research of the influence of the particle geometry on the accumulation of repose angle based on DEM [Paper presentation]. Atlantis Press. https://doi.org/10.2991/icmia-17.2017.8
  • Dong, L., Song, W., & Fu, L. (2022). Dynamic coal quantity measurement method based on binocular vision. Coal Science and Technology, 50, 196–203.
  • Egmont-Petersen, M., de Ridder, D., & Handels, H. (2002). Image processing with neural networks—A review. Pattern Recognition, 35(10), 2279–2301. https://doi.org/10.1016/S0031-3203(01)00178-9
  • Fang, W. (2020). Research on primary selection system of black tungsten ore based on machine vision [Master thesis]. JiangXi University of Science and Technology. School of Mechanical and Electrical Engineering.
  • Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, G., Cai, J., & Chen, T. (2018). Recent advances in convolutional neural networks. Pattern Recognition, 77, 354–377. https://doi.org/10.1016/j.patcog.2017.10.013
  • Guoli, W. (2022). Discussion on the latest technical progress and problems of coal mine intelligentisation. Coal Science and Technology, 50, 1–27.
  • Jiwei, X. (2019). Research and design of black tungsten ore intelligent sorting system based on machine vision [Master thesis]. Hunan University College of Electrical and Information Engineering.
  • Krause, J., Gebru, T., Deng, J., Li, L. J., & Fei-Fei, L. (2014). Learning features and parts for fine-grained recognition. IEEE.
  • Kumar, U., Mohapatra, S., & Sahoo, P. R. (2023). Iron ore image classification using deep learning. IEEE.
  • Liguan, W., Sijia, C., Mingtao, J., & Siyu, T. (2020). Beneficiation method of wolframite image recognition based on deep learning. Chinese Journal of Nonferrous Metals, 30, 1192–1201.
  • Lin, B., Yalong, L., & Zhaohong, G. (2019). Study on the estimation of ore loading quantity of truck based on deep convolutional neural network. Gold Science and Technology, 27, 112–120.
  • Liya, Z. (2017). Mine target monitoring based on feature extraction of moving target. Journal of China Coal Society, 42, 603–610.
  • Patel, A. K., Chatterjee, S., & Gorai, A. K. (2019). Development of a machine vision system using the support vector machine regression (SVR) algorithm for the online prediction of iron ore grades. Earth Science Informatics, 12(2), 197–210. https://doi.org/10.1007/s12145-018-0370-6
  • Shunling, R., Ying, J., Caiwu, L., Qinghua, G., & Xue Fei, Z. (2021). Study on recognition technology of truck loading condition in open-pit mine based on deep convolutional features. Coal Science and Technology, 49, 167–176.
  • Sun, X., Li, X., Xiao, D., Chen, Y., & Wang, B. (2021). A method of mining truck loading volume detection based on deep learning and image recognition. Sensors (Basel, Switzerland), 21(2), 635. https://doi.org/10.3390/s21020635
  • Wang, J., Xue, Y., Xiao, J., & Shi, D. (2023). Diffusion characteristics of airflow and CO in the dead-end tunnel with different ventilation parameters after tunneling blasting. ACS Omega, 8(39), 36269–36283. https://doi.org/10.1021/acsomega.3c04819
  • Wang, Y., Guo, X., & Liu, H. (2021). Design of visual detection system for large foreign body in belt conveyor. Mechanical Science and Technology, 40, 1939–1943.
  • Wang, Z., Wang, X., & Wang, G. (2018). Learning fine-grained features via a CNN tree for large-scale classification. Neurocomputing, 275, 1231–1240. https://doi.org/10.1016/j.neucom.2017.09.061
  • Weisi, S. (2021). Research on three dimensional measurement of coal quantity of main conveyor belt based on binocular vision [Master thesis]. Xi’an University of Science and Technology.
  • Xiang, J., Shi, H., Huang, X., & Chen, D. (2023). Improving graphite ore grade identification with a novel FRCNN-PGR method based on deep learning. Applied Sciences, 13(8), 5179. https://doi.org/10.3390/app13085179
  • Xu, J., Wang, E., & Zhou, R. (2020). Real-time measuring and warning of surrounding rock dynamic deformation and failure in deep roadway based on machine vision method. Measurement, 149, 107028. https://doi.org/10.1016/j.measurement.2019.107028
  • Yi, G., Fuji, W., Yi, Z., Jianqiang, O. Y., & Wenlong, Y. (2021). Research on intelligent cleaning device for bottom of tipping-type car based on image recognition technology. China Tungsten Industry, 36, 76–80.
  • Yong, Y. (2023). Research on abnormal action recognition of miner based on deep learning [Master thesis]. China University of Mining and Technology.
  • Yu Peng, Z., Fu Ji, W., & Yi, G. (2022). Mine-loading measurement system of underground locomotive based on image recognition. Gold Science and Technology, 30, 131–140.
  • Zheng, K., Du, C., Li, J., Qiu, B., & Yang, D. (2015). Underground pneumatic separation of coal and gangue with large size (≥ 50 mm) in green mining based on the machine vision system. Powder Technology, 278, 223–233. https://doi.org/10.1016/j.powtec.2015.03.027