Non-destructive prediction of hazelnut and hazelnut kernel deformation energy using machine learning techniques


ABSTRACT

The hazelnut has significant economic value and is consumed extensively worldwide. Physico-mechanical properties such as linear dimensions, deformation, force, stress, and energy play an important role in the processing of hazelnuts and hazelnut kernels, in quality assessment, and in the development of harvesting and post-harvest technologies. The data set was obtained from compression tests, and artificial neural network (ANN), support vector regression (SVR), and multiple linear regression (MLR) methods were applied to the data. The aim of the study was to determine the deformation energy of hazelnuts and hazelnut kernels from some of their mechanical properties using nondestructive machine learning methods instead of traditional measurement methods, with minimum error, minimum labor, and in the shortest time. The average R² for kernels and hazelnuts was 95.2% for ANN, 89.6% for SVR, and 86.1% for MLR. The average MSE for kernels and hazelnuts was 0.006 for ANN, 0.012 for SVR, and 0.072 for MLR. The machine learning methods used in the study provided results close to the ideal statistical metrics. Ranked from most to least successful (lowest to highest error), the methods were the artificial neural network, support vector regression, and multiple linear regression.

Introduction

Hazelnuts are nuts of the hazel tree and thus include all nuts derived from species of the genus Corylus, especially nuts of the Hazel species, also known as cobnut or hazelnut, depending on the species. They have an outer fibrous husk surrounding a smooth shell, and the cob is approximately spherical or oval, roughly 15–25 mm in length and 10–15 mm in diameter.^[Citation1] Hazelnut production began 2300 years ago on Turkey’s Black Sea coast, according to historical records, and it is well known that Turkey has been exporting hazelnuts to other nations for the past six centuries. A total of 75% of the world’s hazelnut output and 70% to 75% of exports come from Turkey, one of the few nations in the world with ideal weather for the crop.^[Citation2] The Food and Agriculture Organization reports that there are 966,196 hectares of hazelnut cultivation worldwide, with Turkey producing 75.4% of the world’s hazelnuts, Italy 8.1% of them, and Azerbaijan 4.0%. In the USA, it is 1.8%. Consequently, hazelnut is an important commercial commodity, and Turkey is the world’s greatest producer, accounting for 69% of the world’s total hazelnut production, which is estimated to be 1125 thousand tons.^[Citation3] Around 80% of the world’s hazelnut exports come from Turkey, which makes it the leading exporter of these nuts.^[Citation4]

For designing and adjusting the equipment used for harvesting, sorting, cleaning, transporting, and storing hazelnuts, knowledge of the physico-mechanical properties of the nuts as a function of moisture content is crucial. These properties are also essential during the processing of hazelnuts.^{[Citation5,Citation6]} Physical properties such as moisture content, length, width, and thickness, and mechanical properties such as deformation, stress, strain, deformation energy and rupture force are vital to engineers handling hazelnuts.^{[Citation2,Citation7]} The majority of the mechanical damage to hazelnuts, regardless of whether they are shelled or kerneled, is caused by forces both static and dynamic, which are mostly external to the product. However, there are a number of internal forces that can damage a product mechanically, including factors such as temperature variation and moisture content, and when the product suffers from mechanical damage, it becomes susceptible to spreading diseases and infections.^[Citation8] The breaking force is important and indicates a significant failure. In the mechanical processing of fruit, most of the damage occurs in harvesting and threshing, as well as in mechanical handling and other equipment. Dynamic forces during fruit transportation and handling cause by far the most rot damage. The evaluation of the mechanical properties of hazelnuts (whole fruit, shell, and kernel) has been developed in recent years with the aim of achieving industrial processes and improving the utilization of hazelnuts as a food ingredient. The experimental characterization of shells and kernels is a challenging issue to improve the quality of the final product. Some mechanical properties of chemically peeled hazelnuts, such as hardness and stiffness, are used to study an original industrial process to improve the removal of the inner shell. 
The mechanical characterization of the whole hazelnut, kernel and shell, is also determined to aid in the design and construction of selection machines. The physico-mechanical properties play a very significant role in preventing the kernels from being deformed during impact, allowing the hazelnuts to crack and release the kernels from their tight spaces. It is expected that these properties will be useful for determining the quality of the nuts derived from seed and for characterizing them, particularly if the hazelnuts are processed into other useful products such as edible oils. Moreover, the acquired information will prove valuable in the development of machinery utilized in kernel extraction and post-harvest procedures.^[Citation9]

For the reasons pointed out above, it is crucial to determine the physico-mechanical properties of both shelled hazelnuts and their kernels, and many researchers have carried out studies on this subject to the present day. Selvi et al.^[Citation2] determined the various engineering properties of two hazelnut varieties (Palaz and Çakıldak) and compared them in terms of linear dimensions, mass, sphericity, surface area, projection area, actual and bulk densities, porosity, stand angle, shell ratio, terminal velocity, breaking force, energy, deformation, and drag coefficient. They reported that these properties are essential for the design of many pieces of equipment for harvesting, processing, transportation, sorting, and packaging. Kabas^[Citation6] performed a compression simulation for the “Tombul” hazelnut variety, and some cracking properties were determined using experimental and finite element analysis. Aydın^[Citation10] evaluated various physical properties of hazelnuts and kernels as a function of moisture content. In his study, the mean length, width, thickness, geometric mean diameter, sphericity, unit mass, and volume of hazelnuts were found to be 18.03, 18.97, 16.58, 17.83 mm, 97.58%, 2.41 g, and 1.92 cm³, respectively. Alasalvar et al.^[Citation11] compared natural (raw) and roasted hazelnuts for differences in volatile compounds and sensory responses. A total of 79 compounds were detected in the hazelnuts, of which 39 (27 positive, 5 tentative, and 7 unknown) were detected in raw hazelnuts and 71 (40 positive, 14 tentative, and 17 unknown) in roasted hazelnuts. Demir and Cronin,^[Citation12] using instrumental analysis, investigated the texture changes of dry roasted hazelnuts at temperatures ranging from 120 to 180°C for periods of time ranging from 5 to 60 min. They reported that Young’s modulus and the fracture stress for roasted hazelnuts varied by 4.93 and 1.54 MPa, respectively.
Kibar and Öztürk,^[Citation13] Ercisli et al.,^[Citation14] and Maleki and Milani^[Citation15] conducted studies on the determination of some of the physical properties of hazelnuts and hazelnut kernels such as length, width, thickness, specific gravity, and porosity. Ghirardello et al.^[Citation16] evaluated the effect of different storage conditions currently used in the industry on the chemical, physical, and sensory properties of “Tonda Gentile delle Langhe” hazelnuts during one year of storage. At the end of the study, they recommended the use of a modified atmosphere for long-term storage. Delprete and Sesana,^[Citation17] Chengmao et al.,^[Citation18] and Firouzi^[Citation19] studied the mechanical properties of hazelnuts and hazelnut kernels under various conditions. Guiné et al.^[Citation20] studied some physical and chemical properties of hazelnuts under specific temperature, relative humidity, and packing conditions to investigate the effects of storage. They determined that for good hazelnut preservation, the LDPE (low-density polyethylene) packaging type should be chosen, and storage conditions should be at room temperature, or alternatively, refrigeration or freezing. Giacosa et al.^[Citation21] compared different texture test conditions for the evaluation of significant mechanical and acoustic properties of raw and roasted hazelnut (Corylus avellana L.) kernels cv. Tonda Gentile Trilobata (TGT). The study compared combinations of compression and shear tests, test speed (0.2, 1.0, 10.0 mm s⁻¹), and the analyzed axis (x, y, z). Bohnhoff et al.^[Citation22] determined the physical properties, such as size and mass, and mechanical properties, such as breaking force, breaking energy, and hardness, at different moisture contents of F1 hybrid hazelnuts grown in Wisconsin.
These studies attempted to evaluate the physico-mechanical properties of hazelnuts in relation to various moisture contents, storage techniques, variety, packaging types, loading forces, and loading directions.

Another common aspect of these studies is that they were all conducted under laboratory conditions. As can be seen, the mechanical properties of hazelnuts vary depending on many factors. The determination of mechanical properties, which depend on so many different variables under laboratory conditions, is very costly, time-consuming, and requires an intensive labor force. In addition, since the tests are destructive, many products are wasted during the experiments. Unconventional, nondestructive approaches can be employed in place of experimental procedures to precisely establish these desirable attributes at a time when economy, energy, labor, and time are very critical.

Machine learning is a sub-branch of artificial intelligence that can make decisions about events that may occur in the future, using the information it has previously acquired, similar to the information it has learnt. It consists of computer algorithms that can self-learn, analyze complex processes, predict, and classify. Machine learning tries to find the most appropriate model for new data based on past data.

Machine learning is used in this study to accurately estimate the deformation energy of both the hazelnut (Corylus avellana L.) and its kernel according to its physico-mechanical parameters. By taking various inputs and network architectures into account, it is intended to establish the most accurate model using machine learning models. The findings can be utilized as a useful tool to reduce the harvest and post-harvest losses of hazelnuts and their kernels and to gather the information required for improving current processing systems and designing the appropriate equipment.

In this study, deformation energy estimations of hazelnuts and kernels were carried out using machine learning methods. Support vector regression, artificial neural networks, and multiple linear regression were the three supervised machine learning techniques used in the study. To evaluate and analyze the various approaches, R² and MSE metrics were computed. The results of the study show that deformation energy estimation of hazelnuts and hazelnut kernels is successful. It is shown that these predictions can be realized in the future by using the methods and models in the study.

Materials and methods

The randomly chosen Tombul hazelnut cultivar was used for the tests in this study. The experiments were run in four replications, with 30 hazelnuts and kernels for each replication. Samples from the 2022 harvest season were collected from various hazelnut farms in Giresun. As soon as the hazelnuts were purchased, the experiments were conducted. Until the analysis process started, the samples were kept in a refrigerator. All physico-mechanical tests for data collection were conducted in the laboratories of the Akdeniz University Vocational School of Technical Sciences at a room temperature of 20–21°C.

Data set

The initial moisture content of hazelnuts and kernels was determined using a standard method and was found to vary between 3.18–6.40% and 4.51–7.65% db (db = dry basis), respectively.^{[Citation23,Citation24]} Physical properties of hazelnuts and kernels were determined using the following method^{[Citation5,Citation6]}: linear dimensions, i.e., length (L), thickness (T), and width (W), were measured with a Vernier caliper with a sensitivity of 0.01 mm (Figure 1).

Figure 1. Three dimensions of hazelnut.

Using a biological material test instrument, the mechanical properties of hazelnuts and their kernels were determined. In the preliminary testing, hazelnuts and kernels were crushed between two parallel plates at a constant speed of 10 mm min⁻¹, and a force-deformation curve for each sample was obtained (Figure 2).

Figure 2. Biological material test instrument.

The force-deformation curve, which shows a sharp reduction in force at rupture, was used to calculate the rupture force (N) and deformation (mm). The ratio of the force applied to the specimen to its cross-sectional area was used to calculate the rupture stress (N mm⁻²). By measuring the area under the force-deformation curves, the energy absorbed in the rupture site was calculated from the diagram.^{[Citation2,Citation6]}
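The calculations described above can be sketched as follows. This is a minimal illustration, not the instrument's own software: the sample curve is hypothetical, the rupture point is assumed to be the peak force just before the sharp drop, and the area is approximated with the trapezoidal rule, giving energy in N mm.

```python
def rupture_point(force_n):
    """Index of the peak force, assumed here to mark the rupture point."""
    return max(range(len(force_n)), key=lambda i: force_n[i])


def deformation_energy(deformation_mm, force_n):
    """Trapezoidal area under the force-deformation curve up to rupture (N mm)."""
    k = rupture_point(force_n)
    energy = 0.0
    for i in range(k):
        width = deformation_mm[i + 1] - deformation_mm[i]
        energy += 0.5 * (force_n[i] + force_n[i + 1]) * width
    return energy


def rupture_stress(rupture_force_n, cross_section_mm2):
    """Rupture stress as force divided by cross-sectional area (N mm^-2)."""
    return rupture_force_n / cross_section_mm2


# Hypothetical force-deformation readings, not data from the study.
deformation = [0.0, 0.5, 1.0, 1.5, 2.0]   # mm
force = [0.0, 100.0, 200.0, 300.0, 50.0]  # N, sharp drop after the peak

k = rupture_point(force)
energy = deformation_energy(deformation, force)
stress = rupture_stress(force[k], 150.0)  # 150 mm^2 is an assumed area
```

For real curves with noise, a more robust rupture detector than a simple peak search may be needed.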

Machine learning methods

Machine learning enables a computer to learn without human support and assistance. Machine learning is a field of study in which science and technology are used together to develop various algorithms and techniques to enable computers to learn in a similar way as humans. Machine learning refers to the ability of a system to acquire and integrate knowledge through large-scale observations and to improve and extend itself by learning new knowledge rather than being programmed with that knowledge. Machine learning contains supervised, unsupervised, and reinforcement learning models. In supervised learning, there is “label” information in the training data. In other words, data with known results are used in model building. In this way, predicting the results of the data without label information in the dataset based on the model created is aimed for. In unsupervised learning, the training set contains no label information. Based on the elements in the data set, the model aims to discover hidden relationships or groupings. In reinforcement learning, artificial intelligence is based on prior information and guidance. There is a definite cause-and-effect relationship. Although there is an instructor in reinforcement learning, unlike in supervised learning, he or she cannot or does not provide the system with as much detail. Instead, when the learning system makes a decision, the instructor rewards it for the correct ones and punishes it for the wrong ones.^[Citation25]

Multiple Linear Regression (MLR): A statistical method called regression is used to assess the nature and degree of the connection between a group of independent variables and a dependent variable. The variables must be separated into dependent and independent variables for regression analysis. The variable that the independent variable(s) attempts to explain is known as the dependent variable. The best suitable line is used in linear regression to determine the linear connection between two variables. Linear regression is expressed graphically using a straight line with a slope that describes how a change in one variable affects a change in the other.^[Citation26]

MLR is a method for working out how much of an impact these independent variables have on a variable and for explaining the links between cause and effect among at least two independent factors that affect a variable in a model. Identifying a connection between variables in data sets that depend on many factors and whose dependent variable shows linear growth is the goal of multiple linear regression.^[Citation27]

$$ Y = \beta_{0} + \beta_{1} X_{1} + \beta_{2} X_{2} + \dots + \beta_{n} X_{n} + e \tag{1} $$

where Y is the dependent variable, X₁, …, Xₙ are the independent variables, n is the number of independent variables, β₀ is the intercept of the regression line on the y-axis, β₁ is the coefficient of the first predictor variable X₁ (and likewise β₂, …, βₙ for the remaining predictors), and e is the error term. In multiple linear regression, each independent variable affects the dependent variable to a different degree. Consequently, the coefficients of the variables need not be equal.^[Citation28]
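As an illustration of Equation 1, the coefficients can be estimated by ordinary least squares. The sketch below solves the normal equations with plain Gaussian elimination. The helper names and the toy data are hypothetical, and the study's own software and fitting procedure are not specified in the text.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(rows, y):
    """Least-squares estimate of [beta_0, beta_1, ..., beta_n]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 carries beta_0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_mlr(beta, row):
    """Evaluate Equation 1 without the error term."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Toy data generated from Y = 1 + 2*X1 + 3*X2 with e = 0 (hypothetical).
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
y = [1.0, 3.0, 4.0, 6.0, 8.0]
beta = fit_mlr(rows, y)
```

With noise-free toy data the fitted coefficients recover the generating values. On measured data the residual e would be non-zero.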

Support Vector Regression: Support vector machines (SVM) were created by Vladimir Vapnik.^[Citation29] SVM are used to categorize the data as well as to forecast outcomes based on the different types of problems. Because of its incredibly successful methodology for resolving nonlinear issues, it is being utilized more and more in research and industry. SVM is frequently utilized in many different types of analysis, including nonlinear function approximation, regression, and classification.^[Citation30] SVR operates in accordance with the fundamental ideas of support vector machines. By incorporating the loss function into SVM, it may be used to address regression issues. In regression estimation, there are input and output data. With regression estimation, it is estimated how much the input affects the output.^[Citation31]

A classification is carried out using the SVM algorithm in accordance with the logic of identifying the hyperplane dividing two or more classes. To distinguish the classes from one another, a large number of planes can be identified. For this, decision boundaries, or, in other words, hyperplanes, are determined. Support vector regression is used so that the interval we design contains as many points as possible.^[Citation32] There are two types of support vector regression: linear and non-linear. Linear support vector regression:

$$ f(x) = w \cdot x + b \tag{2} $$

where f(x) is the output, w is the weight vector, x is the input vector in n-dimensional space, and b is the bias term. Figure 3 shows linear support vector regression. Not all data sets are suitable for a linear regression model. As with the nonlinear SVM, the examples are therefore mapped to a higher-dimensional space. The kernel approach is used to map the data to the high-dimensional space. When the SVR model is applied together with the kernel method, a non-linear interval is drawn. The kernel method greatly improves learning on nonlinear data. The Radial Basis Function (RBF) and polynomial kernels are the most often employed kernel functions.^[Citation33]

Figure 3. Linear support vector regression.
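The pieces of SVR discussed above can be sketched in a few lines. The linear function follows Equation 2. The loss shown is the standard epsilon-insensitive loss, which is an assumption here because the text refers only to "the loss function". The RBF kernel is written with a `gamma` parameter, one common parametrization that may differ from the sigma setting used in the study. All numeric values are hypothetical.

```python
import math


def linear_svr(w, x, b):
    """Equation 2: f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear in the error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2); gamma is an assumed parametrization."""
    squared_distance = sum((a - c) ** 2 for a, c in zip(x, z))
    return math.exp(-gamma * squared_distance)


w = [0.5, -1.0]
x = [2.0, 1.0]
prediction = linear_svr(w, x, 0.25)

inside_tube = epsilon_insensitive_loss(0.3, prediction, 0.1)
outside_tube = epsilon_insensitive_loss(1.0, prediction, 0.1)
similarity = rbf_kernel(x, [2.0, 2.0], 0.5)
```

Points whose error is smaller than epsilon contribute no loss, which is what lets the fitted interval contain as many points as possible.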

Artificial Neural Networks: Artificial neural networks (ANNs) are computational systems developed for use in artificial intelligence applications and inspired by the way human brains process information. They are intended to emulate the brain's learning and adaptation process, which is based on experience.^[Citation34] An artificial neural network (ANN) is made up of an input layer with many input nodes, one or more hidden layers, and an output layer.^[Citation35] The ANN structure is shown in Figure 4.

Figure 4. The ANN structure.

Neural networks come in many forms and are used by many different sectors. The architecture of the network (number of layers and nodes), the transfer functions between layers, feature selection (i.e., which input variables to consider), and the choice of training algorithm to maximize prediction performances are significant determinants of an ANN model’s prediction accuracy.^[Citation36]

ANNs are made up of several linked artificial neurons, which were modeled after brain neurons. The information entering the network through the input layer is received by these artificial neurons. After that, the data is sent to any number of hidden layers to be processed. The processed data is delivered to the output layer, which then communicates the results. Weights, or numerical values that determine the strength of the connection between neurons, are used to connect artificial neurons in an ANN. The weights indicate how important the information reaching the neuron is for the problem and how strongly it affects the neuron. A large weight value denotes a more significant input, whereas a small weight value denotes a less significant input. If the weight value is zero, the input has no effect on the neuron. The data coming to the ANN is multiplied by the weight of its connection before it is transmitted to the nucleus. This allows the input’s impact on the computed output to be adjusted. As the network learns and adapts, the weights of the connections are changed during training in accordance with the input data and the intended output.^[Citation37]

The summing function is used to obtain the cell’s net input after multiplying all numerical inputs by their weights and adding them all together.

$$ net = \sum_{i = 1}^{n} w_{i} x_{i} + b \tag{3} $$

The activation function analyses the net numerical value that results from the addition operation to calculate the output of the neuron in response to this input. The most important feature in the selection of the activation function is that the function to be selected must be differentiable. Examples of activation functions are ReLU, Sigmoid, and Tanh.^[Citation38]
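A single artificial neuron, as described by Equation 3 and the activation functions named above, can be sketched as follows. The weights and inputs are hypothetical and were chosen so that the third input, whose weight is zero, has no effect on the result.

```python
import math


def net_input(weights, inputs, bias):
    """Equation 3: weighted sum of the inputs plus the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def relu(v):
    return max(0.0, v)


weights = [0.5, -0.25, 0.0]   # third weight is zero: that input is ignored
inputs = [1.0, 2.0, 9.0]
net = net_input(weights, inputs, 0.0)

outputs = {
    "sigmoid": sigmoid(net),
    "tanh": math.tanh(net),
    "relu": relu(net),
}
```

Changing the third input to any other value leaves `net` unchanged, which illustrates the zero-weight case described in the text.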

The network design of artificial neural networks is split into two categories: feed-forward and feedback. In feed-forward networks, information follows only one path, from the input to the output. In feedback networks, the weights are updated to reduce the errors: the outputs are entered into the error function, and the weights are updated by back-propagating the errors.^[Citation39]

Peculiarity of Methods

Multiple linear regression is a statistical technique that is easily implemented for computing with software. Linear regression is a data analysis technique that predicts the value of unknown data using another related and known value. Linear regression models are a simple method to implement and provide an easy-to-interpret mathematical formula for generating predictions. At its core, a simple linear regression technique attempts to plot a line graph between two data variables, x and y. As the independent variable, x is plotted along the horizontal axis.

Support vector machines draw a line to separate points placed on a plane. It aims to maximize the distance of this line for points in both classes. It is complex but suitable for small to medium data sets. SVM provides computational simplicity and is good at scalability and robustness to outliers. It performs well in classification and regression problems even with a small number of training data points and a large number of features. In addition, there is no upper limit to the number of data points used; it is also suitable for ultra-large data sets.

The ANN methodology has many important features, such as learning from data, generalizing, and working with an unlimited number of variables. It consists of many cells, and these cells work simultaneously to perform complex tasks. ANN can provide linear and non-linear modeling without the need for any prior knowledge between input and output variables. They have fault tolerance. They can work with incomplete or uncertain information. They show graceful degradation in faulty situations. They can work in parallel and process real-time information. Although there are many studies by many researchers on the use of ANN as a predictive tool, there is no definite judgment on what are the key factors affecting the performance of ANN. In addition to these factors, the training algorithm, the organization of the dataset, and the length of the prediction period are also considered to be effective on ANN performance.

Results and discussion

The inputs of the models are moisture content, width, length, thickness, rupture force, deformation, and rupture stress; the output of the models is deformation energy. The physico-mechanical properties, standard deviation, and standard error values of hazelnuts and hazelnut kernels determined from the measurements and tests are given in Tables 1 and 2. In this study, the deformation energy of the hazelnut and kernel was estimated using support vector regression, an artificial neural network, and multiple linear regression techniques. The workflow is shown in Figure 5. The dependent variable of the study, the hazelnut and kernel deformation energy, was predicted using these seven independent variables. Every independent variable has 30 data points.

Figure 5. Flowchart.

Table 1. Physico-mechanical properties of hazelnut.


Table 2. Physico-mechanical properties of kernel.


The data utilized in the study had different value ranges, so they were reduced using a normalization approach to a particular numerical band. For this study, the decimal scaling method was used for normalization operations. In this normalization procedure, the existing values are divided by a power of 10 so that all values in the data set become smaller than 1 in absolute value. In normalization via decimal scaling, the decimal point of the values of the variable under consideration is moved to achieve normalization. The highest absolute value of the variable determines how many places the decimal point needs to be moved.^[Citation40] The decimal scaling normalization formula is as follows:

$$ A^{I} = \frac{A_{i}}{10^{j}} \tag{4} $$

where A^I is the normalized data, A_i is the value to be normalized, and j is the smallest integer that makes every |A^I| less than 1. After normalization, the data set is divided into a training set and a test set before modeling or prediction. This allows the model to be trained on the training set and to make predictions on the test set. The training set is the data set on which the model is trained. The test set is a data set used to evaluate the model developed on the training set. The larger the training set, the better the model will learn. The larger the test set, the better and more reliable the evaluation metrics will be. There is no exact method for separating the training and test data sets. Therefore, in this study, the data set was tested at various ratios using the trial-and-error method, and the most successful result was obtained for 70% training and 30% testing.^{[Citation41,Citation42]} The separation of data into training and testing is random. Therefore, the model operates on different data each time it is run. Randomization of the data increases the success, performance, and accuracy of the models.
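Decimal scaling (Equation 4) and a random 70/30 split can be sketched as below. The function names, the fixed seed, and the sample values are assumptions made for illustration. The study describes the split as random and does not state a seed or a specific tool.

```python
import random


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer making every |value| < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


def train_test_split(samples, train_fraction, seed):
    """Shuffle a copy of the samples and cut it at the requested fraction."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical length measurements in mm.
lengths_mm = [18.03, 17.2, 19.5]
scaled, j = decimal_scaling(lengths_mm)

# 30 sample indices split 70% / 30%; the seed only makes the sketch repeatable.
train, test = train_test_split(list(range(30)), 0.7, seed=0)
```

With 30 samples this yields 21 training and 9 test samples. Omitting the fixed seed would reproduce the behavior described in the text, where each run sees a different split.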

To evaluate the machine learning techniques, the coefficient of determination (R²) and mean square error (MSE) metrics were calculated. The coefficient of determination (R²) is a key indicator of how well the model fits the data. R² shows how much of the variation in the dependent variable is explained by the independent variables. It therefore makes it possible to express, as a percentage, the impact of the independent variables on the change in the dependent variable, and it is a good indicator of the explanatory power of the regression model. The independent variables fully explain the dependent variable when R² is one, and they do not explain it at all when R² is zero. Here, zero denotes an explanatory power of 0%, and one denotes an explanatory power of 100%. Expected values for R² are at least 0.40.^{[Citation43,Citation44]}

$$ R^{2} = 1 - \frac{\text{Unexplained Variation}}{\text{Total Variation}} \tag{5} $$

The MSE is a metric of a machine learning model that indicates the error of the estimator. It is computed by squaring the differences between the actual values and the predicted values, adding these squares together, and dividing by the total number of data points. Because the differences are squared, MSE is sensitive to outliers and emphasizes them. It is never negative. The closer the MSE is to zero, the smaller the error of the algorithm.^[Citation45]

$$ MSE = \frac{1}{n} \sum_{i = 1}^{n} e_{i}^{2} \tag{6} $$

where e_i is the error for the i-th data point and n is the number of data points. In this study, SVR, ANNs, and MLR were used for the deformation energy estimation of hazelnuts and kernels. A feedback learning model with seven input neurons and one output neuron was built using the artificial neural network method. Two hidden layers were used, since trial and error showed that they produced the best results. Each hidden layer has four neurons.
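Equations 5 and 6 can be evaluated directly from actual and predicted values. In the sketch below, the unexplained variation is taken as the sum of squared errors and the total variation as the sum of squared deviations from the mean of the actual values, which is the usual reading of Equation 5. The numbers are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """Equation 6: mean of the squared errors e_i = actual_i - predicted_i."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n


def r_squared(actual, predicted):
    """Equation 5: 1 - unexplained variation / total variation."""
    mean_actual = sum(actual) / len(actual)
    unexplained = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - unexplained / total


# Hypothetical values; only the last prediction is off, by one unit.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]

mse_value = mean_squared_error(actual, predicted)
r2_value = r_squared(actual, predicted)
```

Here a single error of one unit gives an MSE of 0.25 and an R² of 0.8, showing how one poor prediction lowers both metrics.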

Sigmoid, hyperbolic tangent, and sine activation functions can be used between ANN layers. The sigmoid activation function produces values between 0 and 1, while the hyperbolic tangent and sine activation functions produce values between −1 and 1.^[Citation46] The sigmoid function was chosen after experimenting with a variety of activation functions since it produced the best results. The sigmoid activation function is given in Equation 7.

$$ F(x) = \frac{1}{1 + e^{-x}} \tag{7} $$

For the backpropagation of the error, a feedback ANN, which displays nonlinear dynamic behavior, was preferred. The experiments led to the determination that 100 iterations of the model would be adequate in terms of time and success.
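The forward pass of the architecture described above (seven inputs, two hidden layers of four neurons, one output, sigmoid activations) can be sketched as follows. The random initial weights, zero biases, the sigmoid on the output neuron, and the sample input are all assumptions. Training by backpropagation over 100 iterations is not shown.

```python
import math
import random


def sigmoid(v):
    """Equation 7."""
    return 1.0 / (1.0 + math.exp(-v))


def init_layer(n_in, n_out, rng):
    """Random weights in [-1, 1] and zero biases (an arbitrary starting point)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layers, inputs):
    """Propagate the inputs through every layer, applying the sigmoid each time."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


rng = random.Random(0)
sizes = [7, 4, 4, 1]  # inputs, hidden layer 1, hidden layer 2, output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

# Hypothetical normalized values for the seven inputs.
sample = [0.05, 0.18, 0.19, 0.17, 0.35, 0.12, 0.02]
output = forward(layers, sample)
```

Because the output passes through a sigmoid, it always lies between 0 and 1, which matches targets normalized by decimal scaling.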

In the second method used in the study, nonlinear SVR was preferred. The radial basis function (RBF) kernel was used. In addition, the RBF sigma value was configured to 0, while the overlapping penalty value of the model was set to 100. To evaluate how variables affect the system in multiple linear regression, a significance level (SL) was initially established. When p > SL occurred, the variable with the greatest p-value (probability value) was eliminated from the system. After rebuilding the model, this process was repeated. The elimination was complete when all variables had p < SL. The model was determined to be significant since no independent variable had a p-value above 0.05.

Table 3 shows the evaluation of the hazelnut and kernel models. If the independent variables fully describe the dependent variable, the R² is 100%. The R² for kernel is 87.2% for multiple linear regression, 91.1% for support vector regression, and 96.8% for the artificial neural network. For hazelnut, the R² is 85% for multiple linear regression, 88.1% for support vector regression, and 93.6% for the artificial neural network. All of the models fall within the acceptable value range. Artificial neural networks have the greatest ability to explain the dependent variable among the techniques. The R² evaluation of the models is shown in Figure 6.

Figure 6. R² evaluation of the models.

Table 3. Analysis of the models.


The MSE shows the amount of error made by the artificial intelligence techniques when estimating the actual values. The MSE for kernel is 0.054 for multiple linear regression, 0.011 for support vector regression, and 0.005 for the artificial neural network. The MSE for hazelnut is 0.089 for multiple linear regression, 0.013 for support vector regression, and 0.007 for the artificial neural network. The error rates of the models were shown to be quite close to the optimum value. It is clear that the artificial neural network is the model with the least error. The MSE evaluation of the models is shown in Figure 7.

Figure 7. MSE evaluation of the models.

When evaluated by their error and performance values, ANN, SVR, and MLR were, in that order, the best-performing and lowest-error models. A scatter plot visualizes the relationship between two numerical variables: each point represents one observation, placed in Cartesian coordinates with one variable on each axis, so that one can see whether a relationship or correlation exists between the two. The scatter plots were produced from the test data and are displayed in Figure 8.

Figure 8. Scatter plot of machine learning models.

Figure 8 shows the scatter plots of the methods used to predict the deformation energy of the kernel. The measured deformation energy of the kernel and the predicted values are positively correlated in all models, and the relationship is fairly strong: as one value rises the other rises with it, so the points cluster close to the diagonal. Figure 8 also shows the scatter plots of the hazelnut.
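The strength of the positive relationship seen in a scatter plot of actual against predicted values can be summarized with the Pearson correlation coefficient. The sketch below is illustrative: the article does not state that this coefficient was computed, and the numbers are invented.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Illustrative values only: predictions that rise with the measurements.
measured = [1.0, 2.0, 3.0, 4.0]
predicted_values = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(measured, predicted_values))
```

Values close to +1 correspond to points lying near a rising straight line, which is the pattern described for all three models.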

A line plot is a graph formed by marking the values of continuous data on the horizontal and vertical axes and connecting the marked points with straight lines. Line plots are used to show how numerical data change over an interval in a form that is easier to read, and when several line plots are grouped they show trends and relationships between the data. Figure 9 shows the line plots of the kernel.

Figure 9 shows the relationship between actual and predicted values for the kernel. The relationship is strong for all three models. A detailed analysis of the figures shows that the strongest relationship is obtained with artificial neural networks, followed by support vector regression and multiple linear regression.

Figure 9. Line plot of machine learning models.

Conclusion

The physico-mechanical properties of hazelnuts and kernels are crucial factors to consider during harvesting and post-harvest processing. These parameters are significant for the design and adjustment of machines used in processes such as harvesting, threshing, and crushing. Determining these properties by conventional means requires measuring a large number of samples in a variety of trials over an extended period, which involves considerable time and effort and leaves room for measurement errors. Future research may use larger data sets, additional traits, and other methods to obtain more precise and rapid outcomes for industrial applications such as discrimination, classification, and forecasting.

The deformation energy of nuts and kernels was successfully determined in this work using machine learning approaches. The dependent variable of the study, the hazelnut and kernel deformation energy, was predicted using seven independent variables: moisture content, width, length, thickness, rupture force, deformation, and rupture stress. Every independent variable has 30 data points. Support vector regression, artificial neural networks, and multiple linear regression were the three supervised machine learning techniques used in the study. To evaluate and compare the approaches, R², MAE, and MSE metrics were computed. The R² for kernels is 87.2% for multiple linear regression, 91.1% for support vector regression, and 96.8% for the artificial neural network. For hazelnuts, the R² is 85% for multiple linear regression, 88.1% for support vector regression, and 93.6% for the artificial neural network. The MSE for kernels is 0.054 for multiple linear regression, 0.011 for support vector regression, and 0.005 for the artificial neural network. The MSE for hazelnuts is 0.089 for multiple linear regression, 0.013 for support vector regression, and 0.007 for the artificial neural network. The results show that all three models achieve high accuracy and that their error levels are within acceptable limits. When the results of the study are compared with the literature, the developed models are more successful in predicting the deformation energy of nuts and kernels than those in similar studies. The study differs from similar studies in terms of the number of data points, accuracy rate, and fast response time. Due to these advantages, this is a machine learning application area that can be widely used in practice. Future studies are planned to test the approach with other machine learning methods, such as deep learning, and to increase the size of the data set and examine its effect on model performance. The developed machine learning models can be adapted to other foods in the future.
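As a quick arithmetic check, the per-product figures reported above can be averaged to recover the averages quoted in the abstract (R² of 95.2%, 89.6%, and 86.1%; MSE of 0.006, 0.012, and 0.072 for ANN, SVR, and MLR). The sketch below uses only the values stated in the text; `Decimal` with half-up rounding is a choice made here to avoid binary floating-point rounding surprises, and the variable names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# (kernel, hazelnut) values as reported in the text.
r2_percent = {"ANN": ("96.8", "93.6"), "SVR": ("91.1", "88.1"), "MLR": ("87.2", "85")}
mse = {"ANN": ("0.005", "0.007"), "SVR": ("0.011", "0.013"), "MLR": ("0.054", "0.089")}


def average(pair, places):
    """Mean of the kernel and hazelnut values, rounded half-up to `places`."""
    kernel, hazelnut = (Decimal(value) for value in pair)
    return ((kernel + hazelnut) / 2).quantize(Decimal(places), rounding=ROUND_HALF_UP)


avg_r2 = {model: average(pair, "0.1") for model, pair in r2_percent.items()}
avg_mse = {model: average(pair, "0.001") for model, pair in mse.items()}

# Rank models from best to worst by average R² (higher is better).
ranking = sorted(avg_r2, key=avg_r2.get, reverse=True)

print(avg_r2)
print(avg_mse)
print(ranking)
```

Ranking by average MSE in ascending order gives the same ordering, which matches the order of methods stated in the text.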

Abbreviations

A^I	=	Normalized data
A_i	=	Value to be normalized
ANN	=	Artificial Neural Networks
β₀	=	Intercept point of the regression curve on the y-axis
β₁	=	Coefficient of the first prediction variable X₁
e	=	Error term
f(x)	=	Output vector
j	=	Value that makes A^I less than 1
L	=	Length
LDPE	=	Low-density polyethylene
MLR	=	Multiple linear regression
mm	=	Millimetre
MSE	=	Mean squared error
N	=	Newton
p-value	=	Probability value
R²	=	Coefficient of determination
RBF	=	Radial basis function
ReLU	=	Rectified linear unit
SVM	=	Support vector machines
SVR	=	Support vector regression
T	=	Thickness
USA	=	United States of America
W	=	Width
X	=	Independent variable
Y	=	Dependent variable

Highlights

Deformation energy estimation of hazelnut and kernel was carried out using machine learning methods.
Moisture content, width, length, thickness, rupture force, deformation and rupture stress were used as independent variables in the data set.
Three different machine learning methods were used.
Statistical metrics R² and MSE were used to evaluate the performance of the methods.
In order of performance, the methods ranked as artificial neural network, support vector regression, and multiple linear regression.

Acknowledgement

This study was supported by the Scientific Research Fund of National University of Science and Technology Politehnica Bucharest, Bucharest, Romania, and by Akdeniz University, Antalya, Turkey.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This research was funded by National University of Science and Technology Politehnica Bucharest through the PubArt program.

References

Uzundumlu, A. S.; Kurtoğlu, S.; Şerefoğlu, C. The Role of Turkey in the World Hazelnut Production and Exporting. Emir. J. Food Agric. 2022, 34, 117–127. DOI: 10.9755/ejfa.2022.v34.i2.2810.
Selvi, K. C.; Yesiloğlu, E.; Sauk, H. Engineering Properties of Two Hazelnuts Varieties and Its Kernel Relation to Harvest and Threshing. Ital. J. Food Sci. 2020, 32, 528–539. DOI: 10.14674/IJFS-1666.
FAOSTAT. Food and Agriculture Organization of the United Nations, Crops And Livestock Product. 2021. Available online: https://www.fao.org/faostat/en/#data/QCL (accessed on December 23 2022).
Türkten, H.; Yıldırım, Ç.; Boz, İ. Factors Influencing the Adoption of Pressurized Irrigation Systems in Hazelnut Production and Its Effect on the Water Footprint in the Çarşamba District of Samsun. Erwerbs-Obstbau. 2022, 65(4), 775–783. DOI: 10.1007/s10341-022-00754-y.
Mohsenin, N. N. Physical Properties of Plant and Animal Materials; Gordon and Breach Press: New York, NY, USA, 1986, p. 742. DOI: 10.1002/food.19870310724.
Kabas, O. Cracking Simulation of Hazelnut Shell Using Finite Element Method. Mitteilungen Klosterneubg. 2020, 70, 148–156.
Sitkei, G. Mechanics of Agricultural Materials; Budapest: Akademiai Kiado, 1986; p. 485.
Sunmonu, M. O.; Iyanda, M. O.; Odewole, M. M.; Moshood, A. N. Determination of Some Mechanical Properties of Almond Seed Related to Design of Food Processing Machines. Niger. J. Technol. Dev. 2015, 12(1), 22–26. DOI: 10.4314/njtd.v12i1.5.
Drees, A. M.; Ibrahim, M. M.; Aboegela, M. A. Design, Construction and Performance Evaluation of an Almond Kernel Extraction Machine. Agric. Eng. Int. CIGR J. 2017, 19, 133–144.
Aydın, C. PH—Postharvest Technology. Biosyst. Eng. 2002, 82(3), 297–303. DOI: 10.1006/bioe.2002.006.
Alasalvar, C.; Shahidi, F.; Cadwallader, K. R. Comparison of Natural and Roasted Turkish Tombul Hazelnut (Corylus Avellana L.) Volatiles and Flavor by DHA/GC/MS and Descriptive Sensory Analysis. J. Agric. Food. Chem. 2003, 51, 5067–5072. DOI: 10.1021/jf0300846.
Demir, A. D.; Cronin, K. The Thermal Kinetics of Texture Change and the Analysis of Texture Variability for Raw and Roasted Hazelnuts. Int. J. Food Sci. Technol. 2004, 39, 371–383. DOI: 10.1111/j.1365-2621.2004.00796.x.
Kibar, H.; Öztürk, T. The Effect of Moisture Content on the Physico-Mechanical Properties of Some Hazelnut Varieties. J. Stored Prod. Res. 2009, 45, 14–18. DOI: 10.1016/j.jspr.2008.06.005.
Ercisli, S.; Ozturk, I.; Kara, M.; Kalkan, F.; Seker, H.; Duyar, O.; Erturk, Y. Physical Properties of Hazelnuts. Int. Agrophysics. 2011, 25, 115–121.
Maleki, G.; Milani, J.; Motamedzadegan, A. Some Physical Properties of Azarbayejani Hazelnut and Its Kernel. Int. J. Food Eng. 2013, 9, 135–140. DOI: 10.1515/ijfe-2012-0162.
Ghirardello, D.; Contessa, C.; Valentini, N.; Zeppa, G.; Rolle, L.; Gerbi, V.; Botta, R. Effect of Storage Conditions on Chemical and Physical Characteristics of Hazelnut (Corylus Avellana L.). Postharvest. Biol. Technol. 2013, 81, 37–43. DOI: 10.1016/j.postharvbio.2013.02.014.
Delprete, C.; Sesana, R. Mechanical Characterization of Kernel and Shell of Hazelnuts: Proposal of an Experimental Procedure. J. Food Eng. 2014, 124, 28–34. DOI: 10.1016/j.jfoodeng.2013.09.027.
Chengmao, C.; Si, S.; Ran, D.; Bing, L.; Shuo, W. Experimental study on mechanical characteristics of nut rupturing under impact loading. Int. J. Agric. Biol. Eng. 2017, 10, 53–60. DOI: 10.3965/j.ijabe.20171001.2331.
Firouzi, S. Physical, Mechanical and Nutritional Properties of Hazelnut (A Case Study: Cultivars of North of Iran). J. Agric. Eng. 2016, 39, 93–112. DOI: 10.22055/agen.2016.12277.
Guiné, R. P.; Almeida, C. F.; Correia, P. M. Influence of Packaging and Storage on Some Properties of Hazelnuts. J. Food Meas. Charact. 2015, 9(1), 11–19. DOI: 10.1007/s11694-014-9206-3.
Giacosa, S.; Belviso, S.; Bertolino, M.; Dal Bello, B.; Gerbi, V.; Ghirardello, D.; Giordano, M.; Zeppa, G.; Rolle, L. Hazelnut Kernels (Corylus Avellana L.) Mechanical and Acoustic Properties Determination: Comparison of Test Speed, Compression or Shear Axis, Roasting, and Storage Condition Effect. J. Food Eng. 2016, 173, 59–68. DOI: 10.1016/j.jfoodeng.2015.10.037.
Bohnhoff, D. R.; Lawson, K. S.; Fischbach, J. A. Physical Properties of Upper Midwest USA-Grown Hybrid Hazelnuts. Trans. ASABE. 2019, 62(5), 1087–1102. DOI: 10.13031/trans.13378.
Aremu, A. K.; Ojo-Ariyo, A. M.; Oyefeso, B. O. Selected Mechanical Properties of Bambara Groundnut Seeds Under Compressive Loading. LAUTECH J. Eng. Technol. 2022, 16, 70–77.
USDA. Official Grain Standards of the United States. US Department of Agricultural Consumer and Marketing Service, Grain Division, Revised; USDA: Washington, DC, USA, 1970.
Dhanaraj, R. K.; Rajkumar, K.; Hariharan, U. Enterprise IoT Modelling: Supervised, Unsupervised, and Reinforcement Learning. In Business Intelligence for Enterprise Internet of Things; Haldorai, A., Ramu, A., Khan, S., Eds.; Springer: Cham, Switzerland, 2020; Vol. EAI/Springer Innovations in Communication and Computing, pp. 55–79. DOI: 10.1007/978-3-030-44407-5_3.
Eberly, L. E. Multiple linear regression. Methods Mol. Biol. 2007, 404, 165–187. DOI: 10.1007/978-1-59745-530-5_9.
Uyanık, G. K.; Güler, N. A Study on Multiple Linear Regression Analysis. Procedia Soc. Behav. Sci. 2013, 106, 234–240. DOI: 10.1016/j.sbspro.2013.12.027.
Andrews, D. F. A Robust Method for Multiple Linear Regression. Technometrics. 1974, 16(4), 523–531. DOI: 10.2307/1267603.
Vapnik, V. The Nature of Statistical Learning Theory, 2nd ed.; Springer Science+Business Media: New York, NY, USA, 2000.
Ahmad, A. S.; Hassan, M. Y.; Abdullah, M. P.; Rahman, H. A.; Hussin, F.; Abdullah, H.; Saidur, R. A Review on Applications of ANN and SVM for Building Electrical Energy Consumption Forecasting. Renewable Sustainable Energy Rev. 2014, 33, 102–109. DOI: 10.1016/j.rser.2014.01.069.
Smola, A. J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14(3), 199–222. DOI: 10.1023/B:STCO.0000035301.49549.88.
Chang, C. C.; Lin, C. J. Training V-Support Vector Regression: Theory and Algorithms. Neural Comput. 2002, 14, 1959–1977. DOI: 10.1162/089976602760128081.
Üstün, B.; Melssen, W. J.; Buydens, L. M. Facilitating the Application of Support Vector Regression by Using a Universal Pearson VII Function Based Kernel. Chemom. Intell. Lab. Syst. 2006, 81(1), 29–40. DOI: 10.1016/j.chemolab.2005.09.003.
Jain, A. K.; Mao, J.; Mohiuddin, K. M. Artificial neural networks: A tutorial. Computer. 1996, 29(3), 31–44. DOI: 10.1109/2.485891.
Ghorbani, H.; Wood, D. A.; Choubineh, A.; Mohamadian, N.; Tatar, A.; Farhangian, H.; Nikooey, A. Performance comparison of bubble point pressure from oil PVT data: Several neurocomputing techniques compared. Experimental And Computational Multiphase Flow. 2020, 2(4), 225–246. DOI: 10.1007/s42757-019-0047-5.
Mohamadian, N.; Ghorbani, H.; Wood, D. A.; Mehrad, M.; Davoodi, S.; Rashidi, S.; Soleimanian, A.; Shahvand, A. K. A Geomechanical Approach to Casing Collapse Prediction in Oil and Gas Wells Aided by Machine Learning. J. Petrol. Sci. Eng. 2021, 196, 107811. DOI: 10.1016/j.petrol.2020.107811.
Zou, J.; Han, Y.; So, S. S. Overview of Artificial Neural Networks. Methods Mol. Biol. 2009, 458, 15–23. DOI: 10.1007/978-1-60327-101-1_2.
Ertuğrul, Ö. F. A Novel Type of Activation Function in Artificial Neural Networks: Trained Activation Function. Neural. Netw. 2018, 99, 148–157. DOI: 10.1016/j.neunet.2018.01.007.
Herzog, S.; Tetzlaff, C.; Wörgötter, F. Evolving artificial neural networks with feedback. Neural. Netw. 2020, 123, 153–162. DOI: 10.1016/j.neunet.2019.12.004.
Kiran, A.; Shirisha, N. K-Anonymization Approach for Privacy Preservation Using Data Perturbation Techniques in Data Mining. Mater. Today Proc. 2022, 64, 578–584. DOI: 10.1016/j.matpr.2022.05.117.
Kayakus, M.; Tutcu, B.; Terzioglu, M.; Talaş, H.; Ünal Uyar, G. F. ROA and ROE Forecasting in Iron and Steel Industry Using Machine Learning Techniques for Sustainable Profitability. Sustainability. 2023, 15(9), 7389. DOI: 10.3390/su15097389.
Kayakuş, M.; Erdoğan, D.; Terzioğlu, M. Predicting the Share of Tourism Revenues in Total Exports. Alphanumeric J. 2023, 11(1), 17–30. DOI: 10.17093/alphanumeric.1212189.
Ozer, D. J. Correlation and the Coefficient of Determination. Psychol. Bull. 1985, 97, 307. DOI: 10.1037/0033-2909.97.2.307.
Renaud, O.; Victoria-Feser, M. P. A Robust Coefficient of Determination for Regression. J. Stat. Plan. Inference. 2010, 140(7), 1852–1862. DOI: 10.1016/j.jspi.2010.01.008.
Tuchler, M.; Singer, A. C.; Koetter, R. Minimum mean squared error equalization using a priori information. IEEE Trans. Signal Process. 2002, 50(3), 673–683. DOI: 10.1109/78.984761.
Colak, M.; Yesilbudak, M.; Bayindir, R. Daily Photovoltaic Power Prediction Enhanced by Hybrid GWO-MLP, ALO-MLP and WOA-MLP Models Using Meteorological Information. Energies. 2020, 13(4), 901. DOI: 10.3390/en13040901.
