Research Article

Non-destructive Identification of the geographical origin of red jujube by near-infrared spectroscopy and fuzzy clustering methods

Pages 3275-3290 | Received 11 Aug 2023, Accepted 04 Nov 2023, Published online: 22 Nov 2023

ABSTRACT

The quality of red jujube is closely associated with its place of origin. To identify the geographical origin of red jujube quickly and easily, near-infrared reflectance (NIR) spectra of red jujube samples were classified using several fuzzy clustering methods combined with principal component analysis (PCA) and linear discriminant analysis (LDA). First, four varieties of red jujube were collected from four representative producing areas in four Chinese provinces: Gansu, Henan, Shanxi and Xinjiang. Each variety corresponded to one producing area and comprised 60 samples, giving 240 samples in total. NIR spectra were acquired with an NIR-M-R2 portable near-infrared spectrometer, and the raw spectra were preprocessed by Savitzky-Golay (SG) filtering. Second, PCA and LDA were applied to the NIR data for dimension reduction and feature extraction, respectively. Finally, the red jujube samples were classified by fuzzy C-means (FCM) clustering, Gustafson-Kessel (GK) clustering and possibilistic fuzzy C-means (PFCM) clustering. GK achieved the highest clustering accuracy, 98.8%. These results indicate that GK clustering combined with near-infrared spectroscopy is effective for identifying the geographical origin of red jujube.

Introduction

Red jujube, part of China's long historical heritage, carries rich cultural significance. It is valued for its high nutritional value and is reported to have therapeutic and health-care effects, such as anti-oxidation, anti-tumor, anti-fatigue, hypoglycemic and lipid-lowering activity, liver protection and immune regulation.[Citation1] The history of red jujube in China can be traced back thousands of years. As early as the Spring and Autumn Period and the Warring States Period, red jujube was listed as a precious tonic food and was widely used. In traditional Chinese medical theory, red jujube is considered to nourish and strengthen the body and to help treat some diseases, which is why it is also known as “the first fruit of China.” Beyond its medical value, red jujube plays an important role in Chinese culture. At traditional festivals and on important occasions, jujube is used to cook a variety of traditional foods, such as jujube cake and jujube mud cake. Red jujube symbolizes harvest, happiness and auspiciousness, and it is often given to relatives and friends as a gift wishing them health and a better future. In recent years, as living standards have risen, consumers' expectations for both the quality and the quantity of red jujube have increased, creating unprecedented market demand, sharp price fluctuations and greater market risk. However, constraints of resources, environment and planting technology mean that supply sometimes cannot meet the demand for the many varieties of red jujube.
Foods come in many varieties with complex compositions, and many kinds of adulterants exist, most of which are similar in appearance, composition and physical and chemical properties. Nevertheless, red jujube samples from different production areas often differ distinctly in taste and nutritional value.[Citation2] Conventional methods for determining the quality and type of red jujube are cumbersome and inefficient. Driven by economic interests, some merchants sell counterfeit products, so consumers pay a high price without receiving good-quality jujube. A quick and easy method for identifying red jujube varieties is therefore of great significance.[Citation3]

Traditional jujube identification relies on manual grading, which results in high costs and low quality,[Citation4] and accurate classification of fruit accessions in processing plants and during post-harvest handling remains a widely studied challenge.[Citation5] Manual grading has also been criticized for consuming considerable manpower and material resources. In recent years, researchers in China and abroad have actively worked on identifying red jujube varieties. Classifying by appearance, Al-Saif et al. built an artificial neural network (ANN) classifier on the color and morphological attributes of single Indian jujube fruits to identify different Indian jujube varieties, with an overall classification accuracy of 98.39%.[Citation3] Meng et al. established a deep convolutional neural network model that recognized jujube varieties in their natural state by learning contrast differences from jujube images. To support this research, they built a dataset of 20 jujube varieties in natural settings, in which the images varied greatly in angle, background and illumination; the proposed model reached an average accuracy of 84.16% on this dataset.[Citation6]

In addition to physical appearance, other studies have distinguished jujube types by their chemical composition. The use of SNP markers for parentage verification has achieved remarkable results.[Citation7] In another study, hyperspectral imaging (HSI) was used to collect hyperspectral data and spatial feature images of jujube samples as raw data, and a new method was proposed to select feature wavelengths and simplify the model. On the same training and test sets, the optimized model identified jujube varieties with an accuracy of 96.68%, outperforming the models before optimization.[Citation8]

As a fast, nondestructive and accurate detection technology,[Citation9,Citation10] near-infrared spectroscopy (NIRS) has become widely used in food detection.[Citation11–15] For example, four kinds of tea samples were scanned by NIRS, the spectra were processed by principal component analysis (PCA) and linear discriminant analysis (LDA), and the samples were then classified by possibilistic fuzzy discriminant C-means clustering with an accuracy of 98.84%.[Citation12] NIRS is widely applied in the nondestructive detection of agricultural products and food.[Citation16–27] For instance, diffuse reflectance spectra of dried Hami jujube were collected with a visible/NIR spectrometer to detect starch-head and mildewed fruit.[Citation28] The near-infrared (NIR) spectra of four apple varieties were collected with an Antaris II FT-NIR spectrophotometer.[Citation29] Visible and NIR spectroscopy were used to detect maltodextrin and soybean protein isolate (SPI).[Citation30] NIRS was coupled with pattern recognition methods to classify adulterants in ginger powder (GP).[Citation31]

Fuzzy pattern recognition (FPR) is an artificial intelligence technique based on fuzzy reasoning or fuzzy logic for handling fuzzy information. Compared with traditional binary logic, fuzzy logic handles uncertain or fuzzy information better and supports more reasonable inferences and judgments. FPR is currently applied widely, including in classification, image processing, control systems and decision support. For example, fuzzy uncorrelated discriminant transformation (FUDT) coupled with a portable NIR spectrometer was used to build a classification system for identifying the geographical origin of milk.[Citation32] Li et al. designed a fire alarm model based on a fuzzy inference system, which effectively reduced the system's false alarm rate.[Citation33] Wen et al. proposed an integrated evaluation approach for the safety action of coal and gas outburst (CFRD) based on the regression relation method and an FPR model.[Citation34] Ding et al. used a fuzzy recognition algorithm to determine the integrated safety evaluation rank of dams.[Citation35] Chen et al. designed a single interval type-2 fuzzy logic system (IT2 FLS) based on the Nagar-Bardini (NB) structure for FPR, which achieved better generalization on fuzzy recognition problems.[Citation36]

Fuzzy C-means (FCM) clustering is a fuzzy clustering algorithm based on fuzzy set theory. Unlike the traditional (hard) C-means algorithm, FCM does not assign each data point strictly to one cluster; instead, it uses membership degrees to express how strongly each data point belongs to each cluster. For example, Barzegaran et al. used k-means and FCM to cluster data and obtain IRI ranges based on pavement conditions.[Citation37] FCM can handle large, high-dimensional data sets, and its results are more interpretable because they express the degree to which each data point belongs to each cluster. However, FCM is sensitive to noise and outliers, requires the number of clusters to be specified in advance, and can give different results for different initial cluster centers.[Citation38,Citation39] Compared with FCM, the possibilistic fuzzy C-means (PFCM) algorithm is more robust to noise and outliers and handles uncertainty better,[Citation39] although the number of clusters must still be specified in advance. PFCM is also computationally more expensive than FCM, and two weighting parameters, which control the relative influence of the membership degrees and the possibilistic (typicality) values, must be tuned. These parameter values therefore need to be selected carefully in practical applications.

In this study, raw NIR data of red jujube were collected with a portable NIR spectrometer. After SG filtering, features were extracted from the spectral data by PCA and LDA. Finally, fuzzy C-means (FCM), Gustafson-Kessel (GK) and possibilistic fuzzy C-means (PFCM) clustering were each used to classify the samples, and their classification results were compared and analyzed.

Materials and Methods

Sample preparation

In this study, samples of four red jujube varieties were selected from four Chinese provinces: Gansu, Henan, Shanxi and Xinjiang. The red jujube varieties of these four regions are common and popular in China. After picking, the red jujubes underwent selection, cleaning, sterilization, mixed drying, natural cooling, secondary drying, pitting and packaging, with specific operating and parameter requirements for each step to ensure sample quality. Each production area was represented by one variety, and 60 samples were selected per variety, giving 240 samples in total. The samples of each variety were then randomly divided into 21 test samples and 39 training samples. In addition, the selected samples had approximately the same size (length: 3–5 cm, width: 2–3 cm), weight (10–20 g) and time of maturity (September and October), and the experimenters ensured that the surface of each red jujube was clean and free from obvious defects. Unifying these appearance factors helps eliminate the influence of non-target sample differences on the experimental results, so that the ability of near-infrared spectroscopy to recognize red jujube can be evaluated.

Spectral acquisition

The spectra were acquired with an NIR-M-R2 portable near-infrared spectrometer (Shenzhen Spectrum Research Interconnection Technology Co., Ltd.), which scanned the surface of each red jujube. The spectrometer covers 900–1700 nm (11100–5880 cm−1), with a wavelength precision of ±1 nm, a signal-to-noise ratio of 6000:1, a slit size of 1.8 × 0.025 mm and an optical resolution of 10 nm. During spectral collection, a temperature of around 25°C and a relative humidity of 50%–60% were recommended. Before spectral collection, the spectrometer should be preheated for 1 hour, and each sample should be scanned along its equatorial direction; this reduces instability during scanning and yields more accurate 228-dimensional near-infrared spectra. All algorithms in this study were implemented in MATLAB R2020b (The MathWorks, Natick, MA, USA).
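The wavenumber range quoted above follows from the wavelength range through wavenumber (cm−1) = 10^7 / wavelength (nm). The short Python sketch below checks that conversion. It also computes the sampling interval the 228 spectral points would have if they were evenly spaced over 900–1700 nm; that even spacing is an assumption for illustration, since the instrument's actual wavelength grid is not stated.

```python
def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nm to a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1e7 / wavelength_nm


# Spectrometer range from the text: 900-1700 nm
high_wn = nm_to_wavenumber(900)    # shorter wavelength -> larger wavenumber
low_wn = nm_to_wavenumber(1700)
print(f"{high_wn:.0f} to {low_wn:.0f} cm^-1")

# Hypothetical: spacing if the 228 points were evenly spaced over 900-1700 nm
N_POINTS = 228
spacing_nm = (1700 - 900) / (N_POINTS - 1)
print(f"{spacing_nm:.2f} nm per point")
```

The rounded results, about 11111 and 5882 cm−1, match the 11100–5880 cm−1 quoted for the instrument; the hypothetical spacing is about 3.52 nm per point.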

Data analysis methods

Principal Component Analysis: The raw near-infrared spectra are 228-dimensional and contain redundant information. If the raw spectral data are not processed properly, later classification becomes difficult and identification accuracy decreases. To extract the useful information in the 228-dimensional data, the spectral dimension must therefore be reduced, and eigenvectors that directly reflect the differences between near-infrared spectra must be found. Principal component analysis (PCA) is a dimension reduction method that transforms sample data into a new feature space while retaining most of the original information; by selecting the leading eigenvectors, it retains as much of the variance of the near-infrared spectra as possible. PCA was therefore used in this study to reduce the spectral dimension.

The covariance matrix describes the linear relationships between data features. For a data set X whose n rows are samples and whose columns are features, the covariance matrix C is calculated as:

$$C = \frac{1}{n}(X - \mu)^{\mathsf{T}}(X - \mu) \qquad (1)$$

where n is the number of samples and μ is the row vector of feature means of X, subtracted from every row. The eigenvalues and eigenvectors used by PCA are obtained by eigen-decomposition of C, and PCA completes the transformation by projecting the centered data onto the selected eigenvectors.
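A minimal NumPy sketch of the PCA step described by Equation (1) follows: center the data, form the covariance matrix, eigen-decompose it and project onto the leading eigenvectors. The function name pca_fit_transform and the use of numpy.linalg.eigh are illustrative assumptions; the study itself used MATLAB.

```python
import numpy as np


def pca_fit_transform(X, n_components):
    """PCA via eigen-decomposition of the covariance matrix (Eq. 1).

    X: (n_samples, n_features) array, one spectrum per row.
    Returns (scores, eigenvalues sorted descending, loadings W, mean mu).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mu = X.mean(axis=0)                   # feature means
    Xc = X - mu                           # centered data
    C = Xc.T @ Xc / n                     # covariance matrix, Eq. (1), p x p
    eigvals, eigvecs = np.linalg.eigh(C)  # symmetric -> real, ascending order
    order = np.argsort(eigvals)[::-1]     # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :n_components]         # leading eigenvectors (loadings)
    scores = Xc @ W                       # project centered data
    return scores, eigvals, W, mu
```

For 228-point spectra, pca_fit_transform(X, 10) would return ten PC scores per sample; new spectra would be projected with (X_new - mu) @ W so that training and test data share the same transformation.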

Linear Discriminant Analysis: Linear discriminant analysis (LDA) is a classical linear feature extraction method.[Citation40,Citation41] The basic idea of LDA is to project the data onto a line or hyperplane so that projections of samples from the same class are as close as possible, while projections of samples from different classes are as far apart as possible. LDA is a supervised learning method that requires the class labels of the training samples.[Citation40] Specifically, LDA first calculates the mean vector of each class and the overall mean vector, then calculates the within-class and between-class scatter matrices, and finally obtains the projection vectors by matrix operations; these vectors map high-dimensional data into a low-dimensional space. LDA is widely used in pattern recognition, image processing, face recognition and other fields.[Citation41–47] LDA solves the following optimization problem:

$$\max_{W} J(W) = \frac{\left| W^{\mathsf{T}} S_B W \right|}{\left| W^{\mathsf{T}} S_W W \right|} \qquad (2)$$

where W is the projection matrix whose columns are the discriminant eigenvectors, S_W is the within-class scatter matrix and S_B is the between-class scatter matrix. To solve Equation (2), the eigen-decomposition $S_W^{-1} S_B = V D V^{-1}$ is computed, where V is the eigenvector matrix and D is the diagonal matrix whose diagonal elements are the corresponding eigenvalues. For c classes, S_B has rank at most c − 1, so at most c − 1 discriminant vectors carry information (three for the four varieties in this study). The data set X is then projected onto the selected eigenvectors to obtain the discriminant features.
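The LDA step can be sketched in NumPy as follows. The sketch builds S_W and S_B from labeled training data and keeps the eigenvectors of S_W^{-1} S_B with the largest eigenvalues. Solving the linear system with numpy.linalg.solve instead of forming an explicit inverse is a numerical choice of this sketch, and the function name lda_fit is hypothetical.

```python
import numpy as np


def lda_fit(X, y, n_components=None):
    """Fisher LDA: return projection matrix W (p x n_components).

    X: (n_samples, p) training data (e.g. PCA scores); y: class labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    p = X.shape[1]
    mu = X.mean(axis=0)                     # overall mean
    S_W = np.zeros((p, p))                  # within-class scatter
    S_B = np.zeros((p, p))                  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)                # class mean
        D = Xc - mc
        S_W += D.T @ D
        diff = (mc - mu)[:, None]
        S_B += Xc.shape[0] * (diff @ diff.T)
    if n_components is None:
        n_components = len(classes) - 1     # rank(S_B) <= c - 1
    # Eigenvectors of S_W^{-1} S_B (solve avoids forming the inverse explicitly)
    evals, evecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(evals.real)[::-1]    # largest eigenvalues first
    return evecs[:, order[:n_components]].real
```

In the workflow described here, W would be fitted on the PCA scores of the training samples only, and both training and test scores would then be projected with scores @ W.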

Fuzzy C-means Clustering: FCM is a well-known fuzzy clustering method. Its key feature is that clusters are defined as fuzzy sets, so the membership degree of each data point to each cluster center can take any value in [0, 1]. Given c cluster centers, FCM minimizes the following objective function[Citation48]:

$$J_m = \sum_{k=1}^{n}\sum_{i=1}^{c} u_{ik}^{m}\,\lVert x_k - v_i \rVert^{2} \qquad (3)$$

Here, u_ik is the fuzzy membership of the kth data point x_k in the ith cluster with center v_i, subject to $\sum_{i=1}^{c} u_{ik} = 1$ for every k, and m > 1 is the fuzziness (weighting) exponent. Minimizing Equation (3) alternately over the memberships and the cluster centers yields the final fuzzy memberships and cluster centers.
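Equation (3) is usually minimized by alternating two closed-form updates: memberships u_ik = 1 / Σ_j (D_ik / D_jk)^(2/(m−1)), with D_ik = ‖x_k − v_i‖, and centers v_i = Σ_k u_ik^m x_k / Σ_k u_ik^m. The NumPy sketch below implements that standard iteration with the settings used later in this study (m = 2, at most 100 iterations, tolerance 0.00001) as defaults. The stopping rule (largest change in U) and the small distance floor are assumptions of this sketch.

```python
import numpy as np


def fcm(X, V0, m=2.0, max_iter=100, tol=1e-5):
    """Fuzzy C-means by alternating optimization of Eq. (3).

    X: (n, d) data; V0: (c, d) initial centers.
    Returns memberships U (c, n), centers V (c, d), iterations used.
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(V0, dtype=float).copy()
    U_old = None
    for it in range(1, max_iter + 1):
        # squared distances D_ik^2 between center i and point k, floored at 1e-12
        D2 = np.fmax(((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2), 1e-12)
        # u_ik = 1 / sum_j (D_ik / D_jk)^(2/(m-1))
        ratio = (D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=1)
        # v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        if U_old is not None and np.max(np.abs(U - U_old)) < tol:
            break
        U_old = U
    return U, V, it
```

Each column of U sums to 1, which is exactly the constraint that the possibilistic variants discussed below relax.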

Gustafson-Kessel Clustering: The Gustafson-Kessel (GK) clustering algorithm assigns data points to different prototypes (i.e., cluster centers). Its advantage is that it uses an adaptive distance measure derived from each cluster's fuzzy covariance matrix, so it performs well on hyperellipsoidal clusters. By computing these covariance matrices, GK can adaptively partition datasets whose clusters have different geometric shapes. However, GK requires suitable initial values and the number of clusters to be specified in advance.
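GK's adaptive distance is commonly written with the fuzzy covariance F_i = Σ_k u_ik^m (x_k − v_i)(x_k − v_i)^T / Σ_k u_ik^m and the norm-inducing matrix A_i = (ρ_i det F_i)^(1/d) F_i^(−1), so that D_ik^2 = (x_k − v_i)^T A_i (x_k − v_i); memberships and centers are then updated as in FCM. The NumPy sketch below follows that standard formulation with equal cluster volumes ρ_i = 1. The initial partition from Euclidean distances to V0 and the tiny ridge added to F_i are assumptions of this sketch, not details from the study.

```python
import numpy as np


def gk(X, V0, m=2.0, max_iter=100, tol=1e-5, rho=1.0, reg=1e-10):
    """Gustafson-Kessel clustering (sketch).

    X: (n, d) data; V0: (c, d) initial centers; rho: cluster volume (all equal).
    Returns memberships U (c, n), centers V (c, d), iterations used.
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(V0, dtype=float).copy()
    c, d = V.shape

    def memberships(D2):
        # same membership formula as FCM, applied to the adaptive distances
        D2 = np.fmax(D2, 1e-12)
        ratio = (D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1.0))
        return 1.0 / ratio.sum(axis=1)

    # Initial partition from Euclidean distances to V0 (assumption of this sketch)
    U = memberships(((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2))
    for it in range(1, max_iter + 1):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        D2 = np.empty((c, X.shape[0]))
        for i in range(c):
            diff = X - V[i]
            # fuzzy covariance F_i, with a tiny ridge to keep it invertible
            F = (Um[i][:, None] * diff).T @ diff / Um[i].sum() + reg * np.eye(d)
            # norm-inducing matrix A_i = (rho * det F_i)^(1/d) * F_i^{-1}
            A = (rho * np.linalg.det(F)) ** (1.0 / d) * np.linalg.inv(F)
            D2[i] = np.einsum("kj,jl,kl->k", diff, A, diff)
        U_new = memberships(D2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return U, V, it
```

The determinant normalization fixes each cluster's volume while letting its shape and orientation follow the data, which is what allows GK to fit hyperellipsoidal clusters.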

Possibilistic Fuzzy C-means Clustering: FCM imposes a probabilistic constraint: the membership values of each sample across all clusters must sum to 1. This differs from the intuitive notion of membership, because an FCM membership degree does not truly represent the degree to which a sample belongs to a class. Outliers and noise points, even though they contribute nothing to the clustering, may still receive large membership degrees, leading to clustering errors.[Citation49] Possibilistic C-means (PCM) instead uses the typicality of each sample as its clustering result and drops the probabilistic constraint, so it can identify noise and overcome this defect of FCM. However, PCM is very sensitive to the initial cluster centers and often produces coincident clusters.[Citation49] To overcome the shortcomings of both FCM and PCM, PFCM combines them and provides both membership degrees and typicality values. The objective function of PFCM is as follows[Citation39]:

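$$J_{m,p}(U,T,V) = \sum_{k=1}^{n}\sum_{i=1}^{c}\left(a\,u_{ik}^{m} + b\,t_{ik}^{p}\right) D_{ik}^{2} + \sum_{i=1}^{c}\eta_i \sum_{k=1}^{n}\left(1 - t_{ik}\right)^{p} \qquad (4)$$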

where t_ik is the typicality of x_k in cluster i, D_ik = ‖x_k − v_i‖, p > 1 is the typicality exponent, and η_i is calculated as follows[Citation49]:

$$\eta_i = K\,\frac{\sum_{k=1}^{n} u_{ik}^{m} D_{ik}^{2}}{\sum_{k=1}^{n} u_{ik}^{m}} \qquad (5)$$

Generally, K = 1. Parameters a and b weight the influence of the membership values and the typicality values, respectively. If a is larger than b, the computation of the cluster centers is influenced more by the membership values, and the algorithm is less sensitive to the cluster centers; conversely, if b is larger than a, the computation is influenced more by the typicality values. Owing to the typicality term, PFCM is more resistant to noise than FCM.
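Minimizing Equation (4) gives the standard PFCM updates: u_ik as in FCM, t_ik = 1 / (1 + (b D_ik^2 / η_i)^(1/(p−1))), and v_i = Σ_k (a u_ik^m + b t_ik^p) x_k / Σ_k (a u_ik^m + b t_ik^p). The NumPy sketch below implements these updates with η_i from Equation (5). Following common practice, η_i is estimated from a preliminary FCM run started at V0; the number of preliminary iterations and the typicality exponent p = 2 are assumptions, since the text does not state them.

```python
import numpy as np


def pfcm(X, V0, a=1.0, b=1.0, m=2.0, p=2.0, K=1.0,
         max_iter=100, tol=1e-5, fcm_iter=20):
    """Possibilistic fuzzy C-means (sketch).

    X: (n, d) data; V0: (c, d) initial centers.
    Returns memberships U (c, n), typicalities T (c, n), centers V, iterations.
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(V0, dtype=float).copy()

    def sqdist(V):
        # squared Euclidean distances D_ik^2, floored to avoid division by zero
        return np.fmax(((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2), 1e-12)

    def fuzzy_u(D2):
        # FCM membership update: u_ik = 1 / sum_j (D_ik^2 / D_jk^2)^(1/(m-1))
        ratio = (D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1.0))
        return 1.0 / ratio.sum(axis=1)

    # Preliminary FCM run from V0 (assumption) to estimate eta_i via Eq. (5)
    for _ in range(fcm_iter):
        Um = fuzzy_u(sqdist(V)) ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
    D2 = sqdist(V)
    U = fuzzy_u(D2)
    Um = U ** m
    eta = K * (Um * D2).sum(axis=1) / Um.sum(axis=1)

    for it in range(1, max_iter + 1):
        D2 = sqdist(V)
        U_new = fuzzy_u(D2)
        # typicality update: t_ik = 1 / (1 + (b D_ik^2 / eta_i)^(1/(p-1)))
        T = 1.0 / (1.0 + (b * D2 / eta[:, None]) ** (1.0 / (p - 1.0)))
        W = a * U_new ** m + b * T ** p          # combined center weights
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return U, T, V, it
```

Setting a = b = 1, as in the experiments below, gives the membership and typicality terms equal weight in the center update; U and T can then each be used to assign samples.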

Results and Discussion

Spectral analysis

The near-infrared spectra of red jujube samples carry rich information on distinct functional groups.[Citation50–52] The raw near-infrared spectra are shown in Figure 1(a). As illustrated in Figure 1(a), the spectra of the red jujube samples display two prominent peaks, one at about 1180 nm and the other at about 1430 nm. The peak at 1180 nm arises from the first and second overtones of the C-H stretching vibration and is associated with protein-like compounds.[Citation53] In the 900–1270 nm region, the absorbance of the jujube samples is low. Beyond 1270 nm, the absorbance of all samples changes markedly, increasing sharply, mainly because of O-H and water absorption,[Citation54] and reaching a peak at 1420 nm; absorbance remains high across 1400–1450 nm. Above 1450 nm, the absorbance decreases, falling sharply at 1630 nm and reaching a minimum at 1680 nm, a pattern that likewise follows the O-H absorption of water. Baseline drift is evident, since the peak heights of the individual curves differ considerably.

Figure 1. Raw and pretreated NIR spectra: (a) raw NIR spectra; (b) NIR spectra by SG smoothing; (c) raw mean NIR spectra; (d) the mean NIR spectra by SG smoothing.


Spectral pretreatment

The raw spectral data of red jujube are easily affected by the physical properties of the samples, so the 228-dimensional data contain noise and irrelevant information.[Citation48] To remove them, the NIR spectra were smoothed with a Savitzky-Golay (SG) filter using the MATLAB function y = sgolayfilt(x, k, f), where k is the polynomial order and f is the frame length. The SG filter reduces noise while retaining most of the information in the spectra, and compared with other pretreatment methods it is more flexible and more widely applicable.[Citation55] A polynomial order of 3 and a frame length of 53 were used. Figure 1(b) shows the NIR spectra after SG smoothing: the spectra become much smoother, and the peaks and troughs are more distinct. Figures 1(c) and 1(d) show the raw mean NIR spectra and the SG-smoothed mean NIR spectra, respectively; the mean spectra clearly reveal the differences among the red jujube varieties.
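Savitzky-Golay smoothing replaces each point with the value of a least-squares polynomial fitted over a sliding window. The NumPy sketch below uses the settings reported here (polynomial order 3, frame length 53) and, at the edges, evaluates the polynomial fitted to the first and last full windows, which is also how SciPy's default mode='interp' handles the edges. It is an illustration, not the study's MATLAB code; in Python, scipy.signal.savgol_filter(x, 53, 3) is a ready-made alternative. The function name savgol_smooth is hypothetical.

```python
import numpy as np


def savgol_smooth(y, window=53, order=3):
    """Savitzky-Golay smoothing of a 1-D spectrum (sketch)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if window % 2 == 0 or window > n or order >= window:
        raise ValueError("window must be odd, <= len(y), and > order")
    h = window // 2
    # Vandermonde matrix over offsets -h..h: rows = positions, columns = powers
    A = np.vander(np.arange(-h, h + 1), order + 1, increasing=True)
    # P maps a window of data to its least-squares polynomial fit in that window
    P = A @ np.linalg.pinv(A)
    out = np.empty(n)
    center = P[h]                       # fixed smoothing kernel for interior points
    for i in range(h, n - h):
        out[i] = center @ y[i - h:i + h + 1]
    # Edges: evaluate the polynomial fitted to the first / last full window
    out[:h] = P[:h] @ y[:window]
    out[n - h:] = P[window - h:] @ y[n - window:]
    return out
```

Because a cubic is fitted in every window, any polynomial of degree 3 or less passes through unchanged, while higher-frequency noise is attenuated; a longer frame smooths more but risks flattening narrow absorption bands.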

PCA + LDA

PCA: Considering both the cumulative contribution rate and the classification accuracy, the near-infrared spectral data were projected onto the top 10 principal components (PCs). The eigenvalues were λ1 = 110.555, λ2 = 7.668, λ3 = 4.789, λ4 = 0.404, λ5 = 0.128, λ6 = 0.040, λ7 = 0.014, λ8 = 0.0071, λ9 = 0.0032 and λ10 = 0.0027. The cumulative contribution rate of the first three PCs (PC1, PC2 and PC3) reached 99.51%, so these three PCs retained the main feature information of the near-infrared spectra, and a three-dimensional feature space of the spectral data was constructed from them. The PCA scores of PC1, PC2 and PC3 are shown in Figure 2.

Figure 2. PCA scores plot of PC1, PC2 and PC3.


If the data set X has p variables, the cumulative contribution rate of the first m (m ≤ p) PCs is:

$$\frac{\sum_{k=1}^{m}\lambda_k}{\sum_{k=1}^{p}\lambda_k} \qquad (6)$$

Generally, the smallest m whose cumulative contribution rate exceeds 85% is chosen, and the first m principal components, corresponding to λ1, λ2, …, λm, are retained. The classification accuracy is defined as (number of correctly classified test samples / total number of test samples) × 100%.
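Equation (6) and the 85% rule can be checked against the ten eigenvalues reported above with the pure-Python sketch below. Because only these ten eigenvalues are available, the sketch uses their sum as the denominator, whereas Equation (6) sums over all p = 228 eigenvalues; that is presumably why it gives about 99.52% for the first three PCs rather than the reported 99.51%. The helper names are hypothetical.

```python
from itertools import accumulate

# The ten leading eigenvalues reported in the text
eigenvalues = [110.555, 7.668, 4.789, 0.404, 0.128,
               0.040, 0.014, 0.0071, 0.0032, 0.0027]


def cumulative_contribution(eigs):
    """Eq. (6) for m = 1..len(eigs), using only the eigenvalues supplied."""
    total = sum(eigs)
    return [partial / total for partial in accumulate(eigs)]


def n_components_for(eigs, threshold=0.85):
    """Smallest m whose cumulative contribution rate reaches the threshold."""
    for m, rate in enumerate(cumulative_contribution(eigs), start=1):
        if rate >= threshold:
            return m
    return len(eigs)


rates = cumulative_contribution(eigenvalues)
print(f"{rates[2]:.2%}")                 # 99.52%
print(n_components_for(eigenvalues))     # 1
```

With these values PC1 alone already reaches about 89.4%, so the 85% rule by itself would keep a single component; classification accuracy was also taken into account when ten PCs were retained.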

LDA: Figure 2 shows that although PCA separates the varieties to a noticeable extent, some red jujube samples cannot be identified well. After PCA reduced the spectral data to 10 dimensions, the samples of each variety were randomly divided into 21 test samples and 39 training samples. LDA extracted discriminant information from the training data, and the test samples were then projected onto the LDA feature space.

Figure 3 shows the LDA scores on the three discriminant vectors; the red jujube samples are well separated. The mean of the training samples of each variety was then computed and used as the initial cluster centers for the subsequent fuzzy clustering. The initial cluster centers were:

Figure 3. LDA scores plot of vectors with DV1, DV2 and DV3.


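$$V^{0} = \begin{bmatrix} v_1^{0} & v_2^{0} & v_3^{0} & v_4^{0} \end{bmatrix} = \begin{bmatrix} 0.0129 & 0.0162 & 0.0126 & 0.0213 \\ 0.0052 & 0.0155 & 0.0018 & 0.0214 \\ 0.0104 & 0.0211 & 0.0069 & 0.0088 \end{bmatrix} \qquad (7)$$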
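Initializing the cluster centers at the class means of the LDA-projected training samples can be sketched as below. The function name is hypothetical. It returns one center per row, whereas Equation (7) arranges the centers as columns, so V0.T reproduces that layout. Because each initial center is tied to a known variety (in the sorted order returned by numpy.unique), cluster i can then be read as variety i when the test samples are scored.

```python
import numpy as np


def class_mean_centers(Z_train, y_train):
    """Return (classes, V0) where V0[i] is the mean of training class classes[i].

    Z_train: (n, d) LDA scores of the training samples; y_train: their labels.
    """
    Z_train = np.asarray(Z_train, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)        # sorted class labels
    V0 = np.vstack([Z_train[y_train == c].mean(axis=0) for c in classes])
    return classes, V0
```

For the four varieties and three discriminant vectors used here, V0 would have shape (4, 3), and V0.T the 3 × 4 layout of Equation (7).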

Classification with FCM

The number of cluster centers was 4, with the initial cluster centers given in Equation (7). The other experimental settings were: weighting exponent m of the partition matrix = 2, maximum number of iterations = 100, and termination tolerance = 0.00001. After FCM clustering, the final cluster centers were:

$$V_{\mathrm{FCM}} = \begin{bmatrix} 0.0104 & 0.0099 & 0.0087 & 0.0101 \\ 0.0080 & 0.0116 & 0.0032 & 0.0163 \\ 0.0071 & 0.0379 & 0.0250 & 0.0006 \end{bmatrix} \qquad (8)$$

The fuzzy membership values of FCM are shown in Figure 4. The horizontal axis represents the kth sample and the vertical axis the fuzzy membership value. Because four red jujube varieties were used, there are four subplots, each corresponding to one variety. If u_ik exceeds 0.5, sample x_k is assigned to the ith class; more generally, x_k is assigned to the class i for which u_ik is the largest of its memberships. Figure 4 shows that samples No.1–No.20, No.22–No.38, No.43–No.63 and No.64–No.84 were assigned to Gansu, Henan, Shanxi and Xinjiang, respectively. Thus 79 of the 84 test samples were correctly clustered, a clustering accuracy of 94.0%.
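The decision rule above can be sketched as follows. Because the memberships of each sample sum to 1, a membership above 0.5 is necessarily the largest one, so the 0.5 threshold and the maximum-membership rule agree whenever some u_ik exceeds 0.5; the maximum rule also covers samples with no membership above 0.5. The function names are hypothetical, and the toy membership matrix is invented for illustration.

```python
import numpy as np


def assign_by_max_membership(U):
    """U has shape (c, n): U[i, k] is the membership of sample k in cluster i.
    Returns, for each sample, the index of the cluster with the largest membership."""
    return np.argmax(np.asarray(U), axis=0)


def clustering_accuracy(pred, true):
    """Correctly classified test samples / all test samples x 100%."""
    pred, true = np.asarray(pred), np.asarray(true)
    return 100.0 * np.count_nonzero(pred == true) / true.size


# Toy example (invented numbers): 4 clusters, 4 samples, columns sum to 1
U = np.array([
    [0.70, 0.10, 0.20, 0.05],
    [0.10, 0.60, 0.45, 0.05],
    [0.10, 0.20, 0.25, 0.10],
    [0.10, 0.10, 0.10, 0.80],
])
pred = assign_by_max_membership(U)
print(pred, clustering_accuracy(pred, [0, 1, 2, 3]))  # [0 1 1 3] 75.0
```

The third sample has no membership above 0.5, so only the maximum rule assigns it (here, incorrectly, to cluster 1).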

Figure 4. Fuzzy membership value of FCM.


Classification with GK

The initial cluster centers were the same as in Equation (7), and the weighting exponent m was 2. The maximum number of iterations was 100, and the termination tolerance was 0.00001. After GK clustering terminated, the final cluster centers were:

$$V_{\mathrm{GK}} = \begin{bmatrix} 0.0136 & 0.0090 & 0.0064 & 0.0086 \\ 0.0118 & 0.0099 & 0.0064 & 0.0136 \\ 0.0074 & 0.0357 & 0.0225 & 0.0018 \end{bmatrix} \qquad (9)$$

The fuzzy membership values of GK are shown in Figure 5. Samples No.1–No.21, No.22–No.42, No.43–No.62 and No.64–No.84 were classified as Gansu, Henan, Shanxi and Xinjiang, respectively. Sample No.63, which actually belonged to Shanxi, was misclassified as Xinjiang. As a result, 83 of the 84 test samples were correctly clustered, a clustering accuracy of 98.8%.

Figure 5. Fuzzy membership value of GK.


Classification with PFCM

In this experiment, the number of clusters was 4, and the membership values and typicality values were given equal influence, that is, a = 1 and b = 1. The initial cluster centers were V0, the maximum number of iterations was 100, and the termination tolerance was 0.00001. After PFCM terminated, the cluster centers were:

$$V_{\mathrm{PFCM}} = \begin{bmatrix} 0.0101 & 0.0087 & 0.0080 & 0.0092 \\ 0.0083 & 0.0105 & 0.0026 & 0.0159 \\ 0.0069 & 0.0349 & 0.0221 & 0.0018 \end{bmatrix} \qquad (10)$$

Fuzzy Membership Classification: The fuzzy membership degrees obtained by PFCM are shown in Figure 6. When the test samples were classified by fuzzy membership, sample No.21 was misclassified as Henan and samples No.39–No.42 were misclassified as Shanxi. With five misclassified samples in total, the clustering accuracy was 94.0%.

Figure 6. Fuzzy membership degree of PFCM.


Typicality Classification: The typicality values obtained by PFCM are shown in Figure 7. Samples No.1–No.20, No.22–No.38, No.43–No.62 and No.64–No.84 were classified as Gansu, Henan, Shanxi and Xinjiang, respectively. Thus 78 test samples were correctly clustered, a clustering accuracy of 92.9%.

Figure 7. Typicality values of PFCM.


Clustering Results

Table 1 shows the classification results, including the number of misclassifications, the number of iterations to convergence and the clustering accuracy. GK achieved the highest clustering accuracy of the three fuzzy clustering methods, 98.8%. Both the fuzzy membership degrees and the typicality values of PFCM could be used to classify samples, but classification by PFCM typicality gave lower accuracy than FCM, GK and PFCM membership. FCM required fewer iterations to converge than GK and PFCM, i.e., it converged faster. For comparison, K-nearest neighbor (KNN) classification with K = 1 achieved an accuracy of 72.5%, lower than FCM, GK and PFCM.

Table 1. Classification results of FCM, GK and PFCM.
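As a consistency check, the accuracies reported above can be recomputed from the correctly clustered sample numbers given for each method (84 test samples, 21 per variety). This pure-Python sketch only redoes that arithmetic; the PFCM membership ranges are derived as the complement of the five misclassified samples (Nos. 21 and 39–42) stated in the text.

```python
N_TEST = 84  # 4 varieties x 21 test samples

# Correctly clustered test-sample ranges (inclusive), as reported in the text
correct = {
    "FCM": [(1, 20), (22, 38), (43, 63), (64, 84)],
    "GK": [(1, 21), (22, 42), (43, 62), (64, 84)],
    "PFCM (membership)": [(1, 20), (22, 38), (43, 84)],  # Nos. 21, 39-42 wrong
    "PFCM (typicality)": [(1, 20), (22, 38), (43, 62), (64, 84)],
}


def count_in_ranges(ranges):
    """Number of sample indices covered by inclusive (first, last) ranges."""
    return sum(last - first + 1 for first, last in ranges)


for method, ranges in correct.items():
    n_ok = count_in_ranges(ranges)
    print(f"{method}: {n_ok}/{N_TEST} = {100 * n_ok / N_TEST:.1f}%")
```

It prints 79/84 = 94.0% for FCM, 83/84 = 98.8% for GK, 79/84 = 94.0% for PFCM membership and 78/84 = 92.9% for PFCM typicality, matching the accuracies reported for each method.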

Table 2 lists the clustering accuracies of FCM, GK and PFCM under the preprocessing methods standard normal variate (SNV), multiplicative scatter correction (MSC), SG, SNV+MSC, SNV+SG and MSC+SG. FCM achieved its highest accuracy with SG and its lowest with MSC or SNV+MSC. Of the three fuzzy clustering algorithms, GK achieved the highest accuracy under every preprocessing method, reaching 100% with SNV+SG. All three algorithms achieved high accuracies with SG, SNV+SG and MSC+SG, and low accuracies with SNV, MSC and SNV+MSC.

Table 2. Clustering accuracies of FCM, GK and PFCM under different preprocessing methods (%).
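SNV and MSC, the two scatter-correction methods compared in Table 2, can be sketched in NumPy as follows. SNV centers and scales each spectrum by its own mean and standard deviation; MSC regresses each spectrum on a reference (by default the mean spectrum) and removes the fitted offset and slope. The ddof = 1 standard deviation and the mean-spectrum reference are common conventions assumed here, and the order in which combined methods such as SNV+SG were applied is not stated in the text.

```python
import numpy as np


def snv(X):
    """Standard normal variate: center and scale each spectrum (row) individually."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


def msc(X, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum x is fitted as x ~ intercept + slope * reference and then
    corrected to (x - intercept) / slope. Default reference: the mean spectrum.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        slope, intercept = np.polyfit(ref, x, 1)  # degree-1 least-squares fit
        corrected[i] = (x - intercept) / slope
    return corrected
```

Both methods target additive and multiplicative scatter effects, so spectra that differ from each other only by an offset and a positive scale factor become identical after either correction.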

Discussion

In this experiment, NIR spectra of four jujube varieties were collected with an NIR-M-R2 portable NIR spectrometer. The spectra were preprocessed with an SG smoothing filter, and PCA and LDA were used to extract characteristic information. Finally, FCM, GK and PFCM were used to classify the jujube samples. The clustering accuracy depended on the fuzzy clustering method: with SG preprocessing, classification by the typicality values of PFCM gave 92.9%, FCM and the fuzzy membership of PFCM gave 94.0%, and GK gave the highest accuracy, 98.8%.

The quality of jujube is closely related to its place of origin. Soil and climate differ between places of origin and supply different nutrients to red jujube, and these differences are reflected in its elemental composition. For example, Lang et al. found that red jujube samples from Xinjiang had the highest average Na and Ge contents, whereas samples from Hebei had the highest average Ca, Ba and Ti contents.[Citation49] By measuring five jujube cultivars, Gao et al. found statistically significant differences in the measured parameters among the investigated cultivars, indicating that cultivar was the main factor affecting the physical and chemical properties of jujube.[Citation50] In summary, the soil and climate of different regions affect red jujube, and different varieties contain different proportions of chemical components, so they can be distinguished by their molecular composition. Near-infrared spectroscopy detects molecular substances through their absorption bands and can therefore be used to distinguish different kinds of jujube.

Traditional methods require high-precision instruments and trained technicians and are time-consuming; moreover, chemical analysis can cause chemical pollution. In contrast, NIR spectroscopy combined with the three clustering algorithms used here achieves nondestructive, pollution-free classification of red jujube samples, with GK achieving the highest classification accuracy. This method could give jujube producers and supply chain managers a convenient tool for quickly and accurately identifying and classifying jujube varieties, thereby improving product quality, reducing market risk and increasing competitiveness.

The FCM algorithm is widely used in pattern recognition because it is simple to design and easy to implement. However, it has two main drawbacks: (1) it is not robust to noise and outliers, which can lead to inaccurate partitions; and (2) it is sensitive to initialization and can converge to a local optimum. To reduce the noise sensitivity of FCM, PFCM was designed on the basis of a possibilistic partition: it produces both fuzzy membership and typicality values, so that data containing noise can still be clustered correctly. GK clustering instead uses a fuzzy covariance matrix for each cluster to define an adaptive, non-Euclidean distance, which allows it to detect clusters with different geometric shapes. Since the NIR data of red jujube have a complex distribution, GK clustering was able to achieve higher clustering accuracy than FCM and PFCM.
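The practical difference between FCM and GK described above lies in the distance each algorithm uses. The following is a minimal pure-Python sketch, not the authors' implementation, written under several assumptions: the feature vectors are 2-D (for example, two discriminant scores) so that the 2 × 2 fuzzy covariance matrix can be inverted in closed form, every GK cluster volume is fixed at 1, a small ridge `eps` keeps the covariance invertible, and GK is started from an FCM partition, which is a common practical choice rather than something stated in this study. The names `fuzzy_cluster`, `_memberships` and `_gk_norm` are hypothetical.

```python
import math
import random


def _memberships(d2, m):
    """Membership update shared by FCM and GK; d2[i][k] is a squared distance."""
    c, n = len(d2), len(d2[0])
    u = [[0.0] * n for _ in range(c)]
    for k in range(n):
        hits = [i for i in range(c) if d2[i][k] < 1e-12]
        if hits:  # sample sits on a centre: split membership among those centres
            for i in hits:
                u[i][k] = 1.0 / len(hits)
            continue
        for i in range(c):
            u[i][k] = 1.0 / sum((d2[i][k] / d2[j][k]) ** (1.0 / (m - 1))
                                for j in range(c))
    return u


def _gk_norm(X, v, w, eps=1e-9):
    """GK norm matrix A = det(F)^(1/2) * inv(F) for 2-D data, cluster volume 1."""
    sw = sum(w)
    # entries of the 2x2 fuzzy covariance matrix F = [[a, b], [b, d]]
    a = sum(wk * (x[0] - v[0]) ** 2 for wk, x in zip(w, X)) / sw + eps
    b = sum(wk * (x[0] - v[0]) * (x[1] - v[1]) for wk, x in zip(w, X)) / sw
    d = sum(wk * (x[1] - v[1]) ** 2 for wk, x in zip(w, X)) / sw + eps
    s = 1.0 / math.sqrt(a * d - b * b)  # det(F)^(1/2) / det(F)
    return d * s, -b * s, a * s  # (A11, A12, A22); det(A) = 1


def fuzzy_cluster(X, c, m=2.0, method="fcm", iters=200, tol=1e-8, seed=0, u=None):
    """Cluster 2-D points with FCM (Euclidean norm) or GK (adaptive norm).

    Returns (u, V): u[i][k] is the membership of sample k in cluster i,
    V[i] is the centre of cluster i.
    """
    n = len(X)
    if u is None and method == "gk":
        u, _ = fuzzy_cluster(X, c, m, "fcm", iters, tol, seed)  # start GK from FCM
    if u is None:
        rng = random.Random(seed)
        u = [[rng.random() for _ in range(n)] for _ in range(c)]
        for k in range(n):  # normalise so each sample's memberships sum to 1
            s = sum(u[i][k] for i in range(c))
            for i in range(c):
                u[i][k] /= s
    for _ in range(iters):
        w = [[u[i][k] ** m for k in range(n)] for i in range(c)]
        # weighted centres v_i = sum_k w_ik x_k / sum_k w_ik
        V = [[sum(w[i][k] * X[k][j] for k in range(n)) / sum(w[i]) for j in range(2)]
             for i in range(c)]
        d2 = []
        for i in range(c):
            # FCM uses the identity matrix; GK adapts A_i to the cluster's shape
            A11, A12, A22 = _gk_norm(X, V[i], w[i]) if method == "gk" else (1.0, 0.0, 1.0)
            row = []
            for x in X:
                dx, dy = x[0] - V[i][0], x[1] - V[i][1]
                row.append(A11 * dx * dx + 2.0 * A12 * dx * dy + A22 * dy * dy)
            d2.append(row)
        new_u = _memberships(d2, m)
        change = max(abs(new_u[i][k] - u[i][k]) for i in range(c) for k in range(n))
        u = new_u
        if change < tol:
            break
    return u, V
```

For example, `fuzzy_cluster(X, 2, method="gk")` on two compact groups of points returns a membership matrix whose columns sum to 1 together with two centres. Switching `method` to `"fcm"` keeps the same update loop but replaces the adaptive matrix with the identity, which is the only difference between the two algorithms in this sketch.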

Conclusion

To identify red jujube varieties rapidly and nondestructively, this study proposed a classification method that combines fuzzy clustering with near-infrared spectroscopy. NIR spectra of four red jujube varieties were acquired with a NIR-M-R2 portable spectrometer, smoothed by SG filtering, and then processed by PCA for dimension reduction and by LDA for feature extraction before clustering. FCM, GK and PFCM were then applied to cluster the red jujube samples. Among these fuzzy clustering methods, the GK algorithm identified the red jujube samples with the highest accuracy (98.8%). The PFCM algorithm was also effective in classifying the NIR spectra of red jujube, providing both fuzzy membership degrees and typicality values. These results show that combining the GK clustering algorithm with near-infrared spectroscopy is an effective way to identify the geographical origin of red jujube samples, and they further support the effectiveness and feasibility of near-infrared spectroscopy for the rapid identification of jujube varieties. In the future, the method could not only give jujube producers, dealers and consumers a convenient variety identification tool but also help ensure product quality and authenticity. More broadly, it offers a rapid, efficient and accurate identification approach for food varieties that can help researchers study the geographical origin traceability, variety characteristics and adaptability of red jujube.
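The membership and typicality values attributed to PFCM above can be illustrated with a minimal pure-Python sketch of the PFCM update rules of Pal et al.: memberships are updated as in FCM, typicalities as `t_ik = 1 / (1 + (b * d_ik^2 / gamma_i)^(1 / (eta - 1)))`, and centres are weighted by `a * u_ik^m + b * t_ik^eta`. This is not the authors' code. Euclidean distance is assumed, the weights `a` and `b`, the exponents `m` and `eta`, and the factor `K` are illustrative defaults, `gamma_i` is estimated once from a preliminary FCM pass (a commonly recommended choice), and the function name `pfcm` is hypothetical.

```python
import random


def pfcm(X, c, m=2.0, eta=2.0, a=1.0, b=1.0, K=1.0, iters=300, tol=1e-10, seed=0):
    """Possibilistic fuzzy c-means sketch; returns memberships U, typicalities T, centres V."""
    n, p = len(X), len(X[0])

    def sqdist(x, v):
        return sum((xj - vj) ** 2 for xj, vj in zip(x, v))

    def memberships(D):
        # FCM-style memberships: each sample's values sum to 1 over the clusters
        U = [[0.0] * n for _ in range(c)]
        for k in range(n):
            hits = [i for i in range(c) if D[i][k] < 1e-12]
            for i in range(c):
                if hits:
                    U[i][k] = 1.0 / len(hits) if i in hits else 0.0
                else:
                    U[i][k] = 1.0 / sum((D[i][k] / D[j][k]) ** (1.0 / (m - 1))
                                        for j in range(c))
        return U

    def centres(W):
        return [[sum(W[i][k] * X[k][j] for k in range(n)) / sum(W[i]) for j in range(p)]
                for i in range(c)]

    # Stage 1: plain FCM from c randomly chosen samples, used to initialise PFCM.
    V = [list(X[k]) for k in random.Random(seed).sample(range(n), c)]
    for _ in range(iters):
        U = memberships([[sqdist(x, v) for x in X] for v in V])
        new_V = centres([[u ** m for u in row] for row in U])
        shift = max(sqdist(v, nv) for v, nv in zip(V, new_V))
        V = new_V
        if shift < tol:
            break
    D = [[sqdist(x, v) for x in X] for v in V]
    U = memberships(D)
    # gamma_i = K * sum_k u_ik^m d_ik^2 / sum_k u_ik^m, held fixed afterwards
    gamma = [K * sum(U[i][k] ** m * D[i][k] for k in range(n))
             / sum(U[i][k] ** m for k in range(n)) for i in range(c)]

    # Stage 2: PFCM iterations updating memberships U and typicalities T together.
    T = [[1.0] * n for _ in range(c)]
    for _ in range(iters):
        D = [[sqdist(x, v) for x in X] for v in V]
        U = memberships(D)
        T = [[1.0 / (1.0 + (b * D[i][k] / gamma[i]) ** (1.0 / (eta - 1)))
              for k in range(n)] for i in range(c)]
        W = [[a * U[i][k] ** m + b * T[i][k] ** eta for k in range(n)] for i in range(c)]
        new_V = centres(W)
        shift = max(sqdist(v, nv) for v, nv in zip(V, new_V))
        V = new_V
        if shift < tol:
            break
    return U, T, V
```

In this sketch a point far from every centre still has memberships that sum to 1, but its typicality is low in every cluster, which is how PFCM separates noise from genuine cluster members.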

Acknowledgement

The authors sincerely thank Mr. Zuxuan Qi for providing NIR spectra of red jujube.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data are available from the corresponding authors.

Additional information

Funding

This research was funded by the Jinling Institute of Technology High-level Talent Research Start-up Project (JIT-RCYJ-202102), the Key R&D Plan Project of Jiangsu Province (BE2022077), the Jiangsu Province College Student Innovation Training Program Project (202313573080Y, 202313573081Y), the Major Natural Science Research Projects of Colleges and Universities in Anhui Province (2022AH040333), and the Undergraduate Innovation and Entrepreneurship Training Program of Jiangsu Province (202213986008Y).

References

  • Yang, J.; Hou, Y.; Chang, N. Determination of Amino Acid Content and Principal Component Analysis of Shanxi Jujube. Food Res. Dev. 2021, 42, 141–145.
  • Mairemu, S. Y. Establishment of Near Infrared Spectroscopy for Jun Jujube Sugar of Different Mature Period. Anhui Agric. Sci. Bull. 2017, 23, 143–145.
  • Al-Saif, A. M.; Abdel-Sattar, M.; Aboukarima, A. M.; Eshra, D. H. Identification of Indian Jujube Varieties Cultivated in Saudi Arabia Using an Artificial Neural Network. Saudi J. Biol. Sci. 2021, 28(10), 5765–5772. DOI: 10.1016/j.sjbs.2021.06.019.
  • Huynh, Q.-K.; Nguyen, C.-N.; Vo-Nguyen, H.-P.; Tran-Nguyen, P. L.; Le, P.-H.; Le, D.-K.; Nguyen, V.-C. Crack Identification on the Fresh Chilli (Capsicum) Fruit Destemmed System. J. Sens. 2021, 2021, 10. DOI: 10.1155/2021/8838247.
  • Sabzi, S.; Abbaspour-Gilandeh, Y.; García-Mateos, G. A New Approach for Visual Identification of Orange Varieties Using Neural Networks and Metaheuristic Algorithms. Inform. Process. Agric. 2018, 5(1), 162–172. DOI: 10.1016/j.inpa.2017.09.002.
  • Meng, X.; Yuan, Y. C.; Teng, G.; Liu, T. Z. Deep Learning for Fine-Grained Classification of Jujube Fruit in the Natural Environment. J. Food Meas. Charact. 2021, 15(5), 4150–4165. DOI: 10.1007/s11694-021-00990-y.
  • Song, L. H.; Cao, B.; Zhang, Y.; Meinhardt, L. W.; Zhang, D. P. Mining Single Nucleotide Polymorphism (SNP) Markers for Accurate Genotype Identification and Diversity Analysis of Chinese Jujube (Ziziphus Jujuba Mill.) Germplasm. Agronomy. 2021, 11(11), 2303. DOI: 10.3390/agronomy11112303.
  • Wang, S. M.; Sun, J.; Fu, L. H.; Xu, M.; Tang, N. Q.; Cao, Y.; Yao, K. S.; Jing, J. P. Identification of Red Jujube Varieties Based on Hyperspectral Imaging Technology Combined with CARS-IRIV and SSA-SVM. J. Food Process. Eng. 2022, 45(10), e14137. DOI: 10.1111/jfpe.14137.
  • Ma, G. L.; Zheng, Z. M.; Wang, H.; Wang, L.; Zhao, G. H.; Tang, H. F.; Ding, X. M.; Wang, Q.; Fan, S.; Wang, P. Effect of Solution Supersaturation on Crystal Formation of Vitamin K2 Based on Near Infrared Spectroscopy Analysis Technology. J. Cryst. Growth. 2023, 605, 127034. DOI: 10.1016/j.jcrysgro.2022.127034.
  • Song, K.; Qin, Y. H.; Xu, B. Y.; Zhang, N. Q.; Yang, J. J. Study on Outlier Detection Method of the Near Infrared Spectroscopy Analysis by Probability Metric. Spectrochim. Acta A. 2022, 280, 121473. DOI: 10.1016/j.saa.2022.121473.
  • Gao, B.; Xu, X. D.; Han, L. J.; Liu, X. A Novel Near Infrared Spectroscopy Analytical Strategy for Meat and Bone Meal Species Discrimination Based on the Insight of Fraction Composition Complexity. Food Chem. 2021, 344, 128645. DOI: 10.1016/j.foodchem.2020.128645.
  • Wu, B.; Fu, H. J.; Wu, X. H.; Chen, Y.; Jia, H. W. Classification of FTNIR Spectra of Tea via Possibilistic Fuzzy Discriminant C-Means Clustering. Spectrosc. Spect. Anal. 2020, 40, 512–516.
  • Yap, X. Y.; Chia, K. S.; Suarin, N. A. S. Adaptive Artificial Neural Network in Near Infrared Spectroscopy for Standard-Free Calibration Transfer. Chemometr. Intell. Lab. Syst. 2022, 230, 104674. DOI: 10.1016/j.chemolab.2022.104674.
  • Liu, H. M.; Shen, T.; Zhang, W. Y.; Shi, X. W.; Dai, T.; Bai, T.; Xiao, Y. H. Construction and Verification of a Mathematical Model for Near-Infrared Spectroscopy Analysis of Gel Consistency in Southern Indica Rice. Spectrosc. Spect. Anal. 2021, 41, 2432–2436.
  • Wu, X.; Zeng, S.; Fu, H.; Wu, B.; Zhou, H.; Dai, C. Determination of Corn Protein Content Using Near-Infrared Spectroscopy Combined with A-CARS-PLS. Food Chem. X. 2023, 18, 100666. DOI: 10.1016/j.fochx.2023.100666.
  • Zhang, X.; Chen, K.; Deng, T.; Yuan, J.; Zhou, R.; Yu, T.; Zhou, Y.; Song, E. Highly Distorted Cr3+-Doped Fluoroantimonate with High Absorption Efficiency for Multifunctional Near-Infrared Spectroscopy Applications. Mater. Today Chem. 2022, 26, 101194. DOI: 10.1016/j.mtchem.2022.101194.
  • Couto, C. D. C.; Freitas-Silva, O.; Oliveira, M.; Sousa, E. M. M.; S, C. C. Near-Infrared Spectroscopy Applied to the Detection of Multiple Adulterants in Roasted and Ground Arabica Coffee. Foods. 2021, 11(1), 61. DOI: 10.3390/foods11010061.
  • Kawano, S. Past, Present and Future Near Infrared Spectroscopy Applications for Fruit and Vegetables. NIR News. 2016, 27(1), 7–9. DOI: 10.1255/nirn.1574.
  • Winowiecki, L. A.; Vågen, T.-G.; Boeckx, P.; Dungait, J. A. J. Landscape-Scale Assessments of Stable Carbon Isotopes in Soil Under Diverse Vegetation Classes in East Africa: Application of Near-Infrared Spectroscopy. Plant Soil. 2017, 421(1–2), 259–272. DOI: 10.1007/s11104-017-3418-3.
  • Xu, X.; Cheng, F.; Ying, Y. B. Application and Recent Development of Research on Near-Infrared Spectroscopy for Meat Quality Evaluation. Spectrosc. Spect. Anal. 2009, 29, 1876–1880.
  • Yu, X.; Xu, Y. F.; Li, J. B.; Zhang, C.; Fan, S. C. Recent Advances in Emerging Techniques for Non-Destructive Detection of Seed Viability: A Review. Artif. Intell. Agric. 2019, 1, 35–47. DOI: 10.1016/j.aiia.2019.05.001.
  • Huang, H.; Yu, H.; Xu, H.; Ying, Y. Near Infrared Spectroscopy for On/in-Line Monitoring of Quality in Foods and Beverages: A Review. J. Food Eng. 2008, 87(3), 303–313. DOI: 10.1016/j.jfoodeng.2007.12.022.
  • Nicolaï, B. M.; Beullens, K.; Bobelyn, E.; Peirs, A.; Saeys, W.; Theron, K. I.; Lammertyn, J. Nondestructive Measurement of Fruit and Vegetable Quality by Means of NIR Spectroscopy: A Review. Postharvest Biol. Technol. 2007, 46(2), 99–118. DOI: 10.1016/j.postharvbio.2007.06.024.
  • Xia, Y.; Huang, W.; Fan, S.; Li, J.; Chen, L. Effect of Fruit Moving Speed on Online Prediction of Soluble Solids Content of Apple Using Vis/NIR Diffuse Transmission. J. Food Process. Eng. 2018, 41, e12915. DOI: 10.1111/jfpe.12915.
  • Kusumaningrum, D.; Lee, H.; Lohumi, S.; Mo, C.; Kim, M. S.; Cho, B.-K. Non-Destructive Technique for Determining the Viability of Soybean (Glycine Max) Seeds Using FT-NIR Spectroscopy. J. Sci. Food Agric. 2018, 98(5), 1734–1742. DOI: 10.1002/jsfa.8646.
  • Mo, C.; Kim, G.; Lee, K.; Kim, M. S.; Cho, B.-K.; Lim, J.; Kang, S. Non-Destructive Quality Evaluation of Pepper (Capsicum Annuum L.) Seeds Using LED-Induced Hyperspectral Reflectance Imaging. Sensors. 2014, 14(4), 7489–7504. DOI: 10.3390/s140407489.
  • Pieszczek, L.; Czarnik-Matusewicz, H.; Daszykowski, M. Identification of Ground Meat Species Using Near-Infrared Spectroscopy and Class Modeling Techniques – Aspects of Optimization and Validation Using a One-Class Classification Model. Meat Sci. 2018, 139, 15–24. DOI: 10.1016/j.meatsci.2018.01.009.
  • Li, Y. J.; Ma, B. X.; Hu, Y. T.; Yu, G. W.; Zhang, Y. J. Detecting Starch-Head and Mildewed Fruit in Dried Hami Jujubes Using Visible/Near-Infrared Spectroscopy Combined with MRSA-SVM and Oversampling. Foods. 2022, 11(16), 2431. DOI: 10.3390/foods11162431.
  • Wu, X. H.; Wu, B.; Sun, J.; Yang, N. Classification of Apple Varieties Using Near Infrared Reflectance Spectroscopy and Fuzzy Discriminant C-Means Clustering Model. J. Food Process. Eng. 2017, 40(2), e12355. DOI: 10.1111/jfpe.12355.
  • Rahmawati, L.; Pahlawan, M. F. R.; Hariadi, H.; Masithoh, R. E. Detection of Encapsulant Addition in Butterfly-Pea (Clitoria Ternatea L.) Extract Powder Using Visible–Near-Infrared Spectroscopy and Chemometrics Analysis. Open Agric. 2022, 7(1), 711–723. DOI: 10.1515/opag-2022-0135.
  • Yu, D. X.; Guo, S.; Zhang, X.; Yan, H.; Zhang, Z. Y.; Chen, X.; Chen, J. Y.; Jin, S. J.; Yang, J.; Duan, J. A. Rapid Detection of Adulteration in Powder of Ginger (Zingiber Officinale Roscoe) by FT-NIR Spectroscopy Combined with Chemometrics. Food Chem. X. 2022, 15, 100450. DOI: 10.1016/j.fochx.2022.100450.
  • Zhang, T.; Wu, X.; Wu, B.; Dai, C.; Fu, H. Rapid Authentication of the Geographical Origin of Milk Using Portable Near-Infrared Spectrometer and Fuzzy Uncorrelated Discriminant Transformation. J. Food Process. Eng. 2022, 45(8), e14040. DOI: 10.1111/jfpe.14040.
  • Li, H.; Yang, J.; Zhou, S. Z. Design of Fire Alarm System of Intelligent Camera Based on Fuzzy Recognition Algorithm. J. Intell. Fuzzy Syst. 2021, 41, 4479–4491. DOI: 10.3233/JIFS-189708.
  • Wen, L. F.; Yang, Y.; Li, Y. L. A Comprehensive Evaluation Method of Safety Behavior of Concrete Face Rockfill Dam Based on Regression Relationship Method and Fuzzy Recognition Model. IOP Conf. Ser. Earth Environ. Sci. 2021, 643, 012070. DOI: 10.1088/1755-1315/643/1/012070.
  • Ding, Y. L.; Yang, B. R.; Xu, G. C.; Wang, X. Y. Improved Dempster–Shafer Evidence Theory for Tunnel Water Inrush Risk Analysis Based on Fuzzy Identification Factors of Multi-Source Geophysical Data. Remote Sens. 2022, 14(23), 6178. DOI: 10.3390/rs14236178.
  • Yang, C.; Yang, J. X. Design of Back Propagation Optimized Nagar-Bardini Structure-Based Interval Type-2 Fuzzy Logic Systems for Fuzzy Identification. Trans. Inst. Meas. Control. 2021, 43(12), 2780–2787. DOI: 10.1177/01423312211006635.
  • Barzegaran, J.; Shahni, D. R. F. M. Estimation of IRI from PASER Using ANN Based on K-Means and Fuzzy C-Means Clustering Techniques: A Case Study. Int. J. Pavement Eng. 2022, 23, 5153–5167. DOI: 10.1080/10298436.2021.2000988.
  • Kannan, S. R.; Devi, R.; Ramathilagam, S.; Takezawa, K. Effective FCM Noise Clustering Algorithms in Medical Images. Comput. Biol. Med. 2013, 43, 73–83. DOI: 10.1016/j.compbiomed.2012.10.002.
  • Pal, N. R.; Pal, K.; Keller, J. M.; Bezdek, J. C. A Possibilistic Fuzzy C-Means Clustering Algorithm. IEEE Trans. Fuzzy Syst. 2005, 13(4), 517–530. DOI: 10.1109/TFUZZ.2004.840099.
  • Huang, Y.; Guan, Y. On the Linear Discriminant Analysis for Large Number of Classes. Eng. Appl. Artif. Intell. 2015, 43, 15–26. DOI: 10.1016/j.engappai.2015.03.006.
  • Zainuddin, Z.; Laswi, A. S. Implementation of the LDA Algorithm for Online Validation Based on Face Recognition. J. Phys.: Conf. Ser. 2017, 801, 012047. DOI: 10.1088/1742-6596/801/1/012047.
  • Park, L.-J. A Spatial Regularization of LDA for Face Recognition. Int. J. Fuzzy Log. Intell. Syst. 2010, 10(2), 95–100. DOI: 10.5391/IJFIS.2010.10.2.095.
  • Zhu, F.; Gao, J.; Yang, J.; Ye, N. Neighborhood Linear Discriminant Analysis. Pattern Recogn. 2022, 123, 108422. DOI: 10.1016/j.patcog.2021.108422.
  • Park, H.; Baek, S.; Park, J. High-Dimensional Linear Discriminant Analysis Using Nonparametric Methods. J. Multivariate Anal. 2022, 188, 104836. DOI: 10.1016/j.jmva.2021.104836.
  • Zheng, F. C. Facial Expression Recognition Based on LDA Feature Space Optimization. Comput. Intel. Neurosc. 2022, 2022, 9521329. DOI: 10.1155/2022/9521329.
  • Li, B.; Ding, H. Y.; Zhou, M. J. Semi-Supervised LDA and Multi-Distance Metric Learning for Person Re-Identification. J. Phys.: Conf. Ser. 2022, 2171(1), 012054. DOI: 10.1088/1742-6596/2171/1/012054.
  • Xie, C. R.; Tian, X. L.; Feng, X. C.; Zhang, X. N.; Ruana, J. H. Preference Characteristics on Consumers’ Online Consumption of Fresh Agricultural Products Under the Outbreak of COVID-19: An Analysis of Online Review Data Based on LDA Model. Procedia Comput. Sci. 2022, 207, 4486–4495. DOI: 10.1016/j.procs.2022.09.512.
  • Song, N. Q.; Zhang, J. T. Fuzzy C-Means Clustering Applied to the Classification of Glycyrrhiza Uralensis Communities in North China. Autom. Control Intell. Syst. 2017, 5, 73–73. DOI: 10.11648/j.acis.20170505.13.
  • Krishnapuram, R.; Keller, J. M. A Possibilistic Approach to Clustering. IEEE Trans. Fuzzy Syst. 1993, 1(2), 98–110. DOI: 10.1109/91.227387.
  • Chen, C.; Li, H. Y.; Lv, X. Y.; Tang, J.; Chen, C.; Zheng, X. X. Application of Near Infrared Spectroscopy Combined with SVR Algorithm in Rapid Detection of cAMP Content in Red Jujube. Optik. 2019, 194, 163063. DOI: 10.1016/j.ijleo.2019.163063.
  • Gao, Q. H.; Wu, P. T.; Liu, J. R.; Wu, C. S.; Parry, J. W.; Min, W. Physico-Chemical Properties and Antioxidant Capacity of Different Jujube (Ziziphus Jujuba Mill.) Cultivars Grown in Loess Plateau of China. Sci. Hortic. 2011, 130(1), 67–72. DOI: 10.1016/j.scienta.2011.06.005.
  • Lang, Y. M.; Cheng, Y.; Yang, H. R.; Cheng, Q. Z.; Liu, Q.; Wang, X. T.; Bian, P. S.; Yang, X. X. Study on Identification of Red Jujube Origin by Multi-Element Analysis. Qual. Assur. Saf. Crops Foods. 2022, 14(4), 178–187. DOI: 10.15586/qas.v14i4.1139.
  • Gao, L. J.; Han, F.; Zhang, X. W.; Liu, B.; Fan, D. W.; Sun, X.; Zhang, Y. F.; Yan, L. G.; Dong, W. Simultaneous Nitrate and Dissolved Organic Matter Removal from Wastewater Treatment Plant Effluent in a Solid-Phase Denitrification Biofilm Reactor. Bioresour. Technol. 2020, 314, 123714. DOI: 10.1016/j.biortech.2020.123714.
  • Qi, Z. X.; Wu, X. H.; Yang, Y. J.; Wu, B.; Fu, H. J. Discrimination of the Red Jujube Varieties Using a Portable NIR Spectrometer and Fuzzy Improved Linear Discriminant Analysis. Foods. 2022, 11(5), 763. DOI: 10.3390/foods11050763.
  • Shi, X. W.; Yao, L. J.; Pan, T. Visible and Near-Infrared Spectroscopy with Multi-Parameters Optimization of Savitzky-Golay Smoothing Applied to Rapid Analysis of Soil Cr Content of Pearl River Delta. J. Geosci. Environ. Prot. 2021, 9(3), 75–83. DOI: 10.4236/gep.2021.93006.