Research Article

Sweetener identification using transfer learning and attention mechanism

Article: 2341812 | Received 07 Nov 2023, Accepted 05 Apr 2024, Published online: 06 May 2024

ABSTRACT

Accurate identification of the taste of compounds can aid the screening and development of new sweeteners. This study proposes a deep learning model for sweetener identification based on transfer learning and an attention mechanism. The Squeeze-and-Excitation (SE) attention mechanism is incorporated into the pre-trained Residual Network-50 (ResNet-50) model, resulting in SE-ResNet-50. The Convolutional Block Attention Module (CBAM) is then integrated to produce the CBAM-SEResNet-50 model for sweetener identification. The taste molecule dataset is divided into two parts: a Cross-Validation (CV) dataset and a Hold-out test dataset. The effectiveness of the algorithm was verified using both 5-fold CV and the Hold-out test. The experimental results demonstrate that the CBAM-SEResNet-50 model achieves an accuracy of 0.956 and an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.972 on the Hold-out test dataset. Under 5-fold CV, the accuracy is 0.944 and the AUROC is 0.969.

1. Introduction

Sweeteners, as a new class of food additives, have become indispensable in modern life (Castro-Muñoz et al., Citation2022). They serve as sugar substitutes, providing sweetness while offering benefits such as low calorie content and ease of preservation. Moreover, sweeteners play a crucial role in preventing chronic diseases such as obesity, hypertension, and type 2 diabetes (Maluly et al., Citation2020) and are widely used in the food, pharmaceutical, and health product industries (Li et al., Citation2021). As people increasingly pursue healthy diets, both the demand for and research on sweeteners are growing (Moriconi et al., Citation2020; Saraiva et al., Citation2020).

Sweeteners can be divided into natural sweeteners and artificial (synthetic) sweeteners. Recently, artificial sweeteners have been gaining more attention (Gardener & Elkind, Citation2019). They can not only replace sugar to reduce calorie intake but also serve as excellent choices for people who need to control blood sugar and weight (Li et al., Citation2023). However, most artificial sweeteners on the market have certain issues (Debras et al., Citation2022). For instance, sugar alcohol sweeteners can cause adverse reactions such as diarrhea, and chemically synthesized sweeteners such as saccharin (benzoic sulfimide) may be harmful with long-term use (Miao et al., Citation2022). As a result, the development of novel and safe sweeteners has become an important topic in the field of food additives (Mela et al., Citation2020). Traditional sweetener screening methods are cumbersome, requiring a large number of experiments, which is time-consuming and costly. With the rapid development of artificial intelligence, computer-assisted screening has become an important means of developing new sweeteners, as it can greatly reduce the cost and time required.

Currently, there are two main types of computer-assisted screening methods for new sweeteners: structure-based and ligand-based methods (Lee et al., Citation2022). The structure-based method requires knowledge of the crystal structure of the sweet taste receptor, Taste receptor type 1 member 2/Taste receptor type 1 member 3 (T1R2/T1R3) (Goel et al., Citation2021), but the experimental 3D structure of the sweet taste receptor has not yet been resolved (Bouysset et al., Citation2020). Therefore, this method is not well suited for taste prediction of sweet and non-sweet compounds. In contrast, ligand-based machine learning methods have significant advantages (Sharma et al., Citation2021), as they can efficiently and effectively screen compounds that meet sweetening requirements.

Researchers have proposed various machine learning models for predicting the sweet and non-sweet properties of compounds. For example, Rojas et al. (Citation2016) used a combination of unsupervised variable reduction, genetic algorithms, and K-nearest neighbor (KNN) methods to select the best subset of molecular descriptors to differentiate compounds with sweet, tasteless, and bitter tastes. Afterwards, Rojas et al. (Citation2017) employed partial least squares discriminant analysis (PLS-DA) and KNN to construct an expert system that effectively predicts the relationship between molecular structure and sweetness using molecular descriptors and extended connectivity fingerprints. Tuwani et al. (Citation2019) developed the BitterSweet model using adaptive boosting (AdaBoost) and random forest (RF) methods, representing molecules with molecular descriptors and extended connectivity fingerprints; their dataset consisted of 2,366 sweet and non-sweet molecules. Fritz et al. (Citation2021) developed a sweetener identification model called VirtualTaste based on random forest, data oversampling, and molecular fingerprints, which was used to classify 2,011 sweet and non-sweet molecules and was validated with 10-fold Cross-Validation (CV) and Hold-out tests. Goel et al. (Citation2021) used eight molecular descriptors as input variables to build a sweetener identification model based on random forests. A training set of 1,537 molecules was used to construct the model and a test set of 385 molecules was used for qualitative evaluation; both identification specificity and accuracy were above 0.86 for both sweet and bitter tastes.

Compared to conventional machine learning algorithms, deep learning can automatically extract features from raw data and address more intricate nonlinear problems (Gutiérrez et al., Citation2021). It finds extensive application in tasks such as activity prediction (Chen et al., Citation2021) and drug discovery (Shi et al., Citation2019). Deep learning methods have also been employed for sweetener identification. For example, e-sweet (Zheng et al., Citation2019) is an ensemble learning model incorporating deep neural networks (DNN), support vector machines (SVM), gradient boosting machines (GBM), and RF, which was used to predict the sweet and non-sweet properties of 1,380 compound molecules. BoostSweet (Lee et al., Citation2022) used eight types of molecular fingerprints and three molecular descriptors as molecular features and trained fully connected network (FCN), RF, extreme gradient boosting (XGBoost), and light gradient boosting machine (LGBM) models separately. Compound molecules were then classified by a soft-voting ensemble that integrates the results of the different machine learning models, improving identification accuracy through votes from multiple algorithms. Bo et al. (Citation2022) used data augmentation techniques such as rotation and flipping to expand a molecular image dataset, then designed a convolutional neural network (CNN) model based on 2D molecular images and a multilayer perceptron model based on molecular fingerprints to predict the sweet taste properties of molecules, achieving accuracies of 0.84 and 0.88, respectively.

Despite these advantages, training deep learning models requires a large amount of data, and a lack of data can lead to issues such as local optima or overfitting. Researchers therefore often resort to transfer learning as a solution. In short, transfer learning involves taking a model that has been pre-trained on one task and applying it to another (Zhong et al., Citation2021). This approach accelerates the convergence of the model and reduces training time. Yu et al. (Citation2023) tested and compared different transfer learning backbones, including AlexNet, Visual Geometry Group-19 (VGG-19), Residual Network-18 (ResNet-18), ResNet-50, and ResNet-101, for intelligent apple variety identification. Xu et al. (Citation2023) employed CNN and transfer learning algorithms to classify and predict 900 images of six types of sea buckthorn fruits with varying water content. Zhong et al. (Citation2021) used transfer learning and data augmentation to predict the reactivity of 1,089 organic compounds towards hydroxyl radicals; their results demonstrate that these methods effectively enhance robustness and predictive performance. Iqbal et al. (Citation2021) used 2D molecular images and transfer learning to predict molecular activity cliffs, which refer to cases where molecules have similar structures but significantly different activities; their results show that re-training the optimized convolutional layer weights effectively improves predictive accuracy. In addition, attention mechanisms have been widely applied in deep learning in recent years. Whereas standard deep learning models attend to the entire input image, attention mechanisms focus on specific regions of interest, shifting the focus from global to local. A vegetable disease recognition model was constructed using dual transfer learning and attention mechanisms, based on the ImageNet and AI Challenger 2018 public databases (Zhao et al., Citation2022). Hua et al. (Citation2022) compared different attention mechanisms, such as Squeeze-and-Excitation (SE) attention, Efficient Channel Attention, and the Convolutional Block Attention Module (CBAM), for the identification of crop diseases and pests based on the PlantVillage database.

At present, the lack of a large quantity of sweetener molecule data makes it difficult for traditional machine learning methods and deep learning technologies to achieve satisfactory identification results on sweetener molecule image datasets. To address this issue, this article constructs a new sweetener classification and identification model that leverages transfer learning and attention mechanisms to improve sweetener identification performance in the absence of a large number of sweetener molecule images. The contributions of our study are as follows:

  1. A new dataset is collected based on multiple public sweetener molecular datasets, in which each molecule is represented by its structure in Simplified Molecular Input Line Entry System (SMILES) format. The dataset contains a total of 2,763 sweetener molecules and 2,313 non-sweetener molecules.

  2. A sweetener classification method based on transfer learning and molecular images is proposed, as shown in Figure 1. It mainly consists of the following steps: molecular image generation, dataset partitioning, data augmentation, transfer learning classification model construction, and sweetener molecular classification prediction.

    Figure 1. Illustration of the proposed sweetener identification method.


  3. A CBAM-SEResNet-50 transfer learning model is proposed. This model combines a pre-trained ResNet-50 model with SE and CBAM attention modules to efficiently extract and utilize sweet molecular image features, which further improves the identification performance of the model.

2. Materials and methods

2.1. Data acquisition

The dataset used in this study was collected from various literature sources and publicly available databases (Rojas et al., Citation2016, Citation2017; Tuwani et al., Citation2019; Garg et al., Citation2018; Dagan-Wiener et al., Citation2019), as shown in Table 1. It encompasses a range of sweet compounds, including sugars, carbohydrates, and artificial sweeteners. The taste labels (sweet or non-sweet) of the molecules are experimentally validated.

Table 1. Description of the collected dataset.

This article collected and organized a novel taste molecule dataset containing 2,763 sweet compounds and 2,313 non-sweet compounds, as shown in Table 1. The non-sweet compounds consist of 1,944 bitter compounds and 369 tasteless compounds. The dataset was organized using the following criteria:

  1. Merging the tasteless and bitter categories into one non-sweet category, because the main scientific interest was in identifying sweet compounds rather than bitter or tasteless chemicals (Rojas et al., Citation2016).

  2. Eliminating duplicate compounds from different sources, keeping a single copy of each molecule with consistent labels and removing molecules with inconsistent (ambiguous) labels (Yang et al., Citation2022).

  3. Removing compounds with multiple tastes to ensure that each molecule in the dataset has only one label.

  4. Removing SMILES strings that cannot be used to generate molecular images. When generating a molecular image with RDKit (http://www.rdkit.org), the molecular input is given in SMILES format. Some SMILES strings contained errors and produced blank outputs; these molecules were removed from the dataset to reduce modeling errors.

2.2. Data preprocessing

Representing compounds through 2D molecular images is a simple and intuitive approach, in which each chemical substance has its own unique molecular image. In this study, data preprocessing is performed using image conversion, data normalization, and data augmentation. The RDKit package in Python was used to convert molecular SMILES into RGB color images of size 224 × 224; sample generated molecular images are shown in Figure 2. The generated molecular image dataset is randomly split into two parts: the CV dataset and the Hold-out test dataset. Approximately 80% of the sweet molecules (2,233) and non-sweet molecules (1,556 bitter and 295 tasteless) are randomly selected for 5-fold CV. The remaining 530 sweet molecules and 462 non-sweet molecules (388 bitter and 74 tasteless) form the Hold-out test dataset. The CV dataset is used to train the CNN model, tune model parameters, and avoid overfitting, while the Hold-out test dataset is used to evaluate the generalization performance of the CNN model.
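The snippet below is a minimal sketch of this SMILES-to-image step using RDKit; the example SMILES string and output file name are illustrative rather than taken from the dataset, and molecules whose SMILES cannot be parsed are skipped, mirroring the cleaning rule in Section 2.1.

```python
from rdkit import Chem
from rdkit.Chem import Draw

def smiles_to_image(smiles: str, out_path: str, size: int = 224) -> bool:
    """Render a SMILES string as a size x size RGB molecular image.

    Returns False for invalid SMILES, so such molecules can be dropped.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:                                 # unparsable SMILES -> skip the molecule
        return False
    img = Draw.MolToImage(mol, size=(size, size))   # PIL RGB image
    img.save(out_path)
    return True

# Illustrative usage with a hypothetical output path.
ok = smiles_to_image("C(C1C(C(C(C(O1)O)O)O)O)O", "molecule_0001.png")
print("rendered" if ok else "skipped invalid SMILES")
```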

Figure 2. Molecular images in the dataset categorized as sweet and non-sweet (bitter, tasteless) compounds.


The molecular images used for training in the CV dataset were augmented with horizontal, vertical, and random flips to increase the diversity of the dataset, as shown in Figure 3. The pixel values of the input molecular images are normalized to the range 0 to 1. This preprocessing optimizes the learning process of the model and improves its performance and robustness. The images used for testing in the CV and Hold-out test datasets were not augmented.
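A minimal sketch of this augmentation and normalization pipeline with torchvision is shown below; the flip probabilities and the one-folder-per-class directory layout are assumptions for illustration (ToTensor already rescales pixel values to the range 0 to 1).

```python
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

train_transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),    # horizontal flip
    T.RandomVerticalFlip(p=0.5),      # vertical flip
    T.ToTensor(),                     # converts to tensor and scales pixels to [0, 1]
])
test_transform = T.Compose([T.ToTensor()])   # no augmentation for test images

# Hypothetical directory layout: one sub-folder per class (sweet / non_sweet).
train_set = ImageFolder("data/cv_train", transform=train_transform)
test_set = ImageFolder("data/holdout_test", transform=test_transform)

train_loader = DataLoader(train_set, batch_size=36, shuffle=True)
test_loader = DataLoader(test_set, batch_size=36, shuffle=False)
```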

Figure 3. Data augmentation.


2.3. Sweetener identification model CBAM-SEResNet-50

The sweetener identification model constructed in this study, CBAM-SEResNet-50, integrates the ResNet-50 model, the SE attention mechanism, and the CBAM attention mechanism, as shown in Figure 4. The SE attention module is introduced into each residual unit of the pre-trained ResNet-50 to enhance feature utilization and the model's ability to identify sweet taste features. The CBAM attention module is inserted after the residual layers to adjust the features with channel and spatial attention, which improves the feature representation capability of the model. During training, the convolutional layers of ResNet-50 with pre-trained weights are unfrozen, and the CBAM-SEResNet-50 model is trained on the collected taste molecule dataset with 2D molecular images as input.
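The sketch below shows one way the pieces described above could be wired together in PyTorch: each Bottleneck of a pre-trained torchvision ResNet-50 is wrapped so that its residual branch is recalibrated by an SE-style module, a CBAM module is appended after each residual layer, and the fully connected head is replaced for two taste classes. The attention modules are passed in as factories (for example, the SEBlock and CBAM sketches given later in this section); the authors' exact implementation is not published, so this only illustrates the placement described in the text.

```python
import torch.nn as nn
import torchvision

class BottleneckWithAttention(nn.Module):
    """Wrap a torchvision Bottleneck and recalibrate its residual branch."""
    def __init__(self, block: nn.Module, attn: nn.Module):
        super().__init__()
        self.block, self.attn = block, attn

    def forward(self, x):
        identity = x
        out = self.block.relu(self.block.bn1(self.block.conv1(x)))
        out = self.block.relu(self.block.bn2(self.block.conv2(out)))
        out = self.block.bn3(self.block.conv3(out))
        out = self.attn(out)                          # e.g. an SE block on the residual branch
        if self.block.downsample is not None:
            identity = self.block.downsample(x)
        return self.block.relu(out + identity)

def build_model(se_factory, cbam_factory, num_classes: int = 2) -> nn.Module:
    """se_factory(channels) / cbam_factory(channels) must return attention modules."""
    model = torchvision.models.resnet50(pretrained=True)      # ImageNet weights (downloaded)
    for name in ["layer1", "layer2", "layer3", "layer4"]:
        layer = getattr(model, name)
        channels = layer[-1].conv3.out_channels
        wrapped = [BottleneckWithAttention(b, se_factory(b.conv3.out_channels))
                   for b in layer]
        wrapped.append(cbam_factory(channels))                # CBAM after the residual layer
        setattr(model, name, nn.Sequential(*wrapped))
    model.fc = nn.Linear(model.fc.in_features, num_classes)   # two taste classes
    for p in model.parameters():                              # unfreeze all layers
        p.requires_grad = True
    return model

# Example (with the SEBlock and CBAM sketches shown in Sections 2.3.2 and 2.3.3):
# model = build_model(lambda c: SEBlock(c), lambda c: CBAM(c))
```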

Figure 4. Sweetener identification based on CBAM-SEResNet-50.


2.3.1. ResNet-50 architecture

To enhance the reliability of the sweetener identification model, this study selects ResNet, a deep neural network based on residual learning, as the base model for predicting compound tastes. ResNet propagates information across multiple hidden layers by adding skip connections between convolutional layers, effectively alleviating the vanishing gradient and network degradation problems of traditional deep neural networks (He et al., Citation2016, pp. 770–778) and enabling networks with dozens or even hundreds of layers. Balancing model size and performance, ResNet-50 was chosen as the basic network architecture in this study.

Table 2 presents the network structure of ResNet-50, which consists of five groups of convolutional layers (49 convolutional layers in total) and one fully connected layer. The initial 7 × 7 convolutional layer and the following 3 × 3 max pooling layer each halve the spatial dimensions of the feature maps. In the subsequent four groups, convolution kernels of sizes 1 × 1 and 3 × 3 are used to extract features at different scales, with the ReLU function as the activation after each convolutional layer. This stacked structure significantly reduces the model parameters and computation, while the 1 × 1 convolution kernels increase the network depth and improve its nonlinearity (Wu et al., Citation2023). The output dimension of the final fully connected layer corresponds to the number of taste molecule categories.
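As a quick cross-check of the stage-wise output sizes summarized in Table 2, the short script below pushes a dummy 224 × 224 input through torchvision's stock ResNet-50 and prints the feature map shape after each stage; it is an inspection aid under the assumption of torchvision's layer naming, not part of the training pipeline.

```python
import torch
import torchvision

m = torchvision.models.resnet50(pretrained=False)        # architecture only, no weights
x = torch.randn(1, 3, 224, 224)                           # one RGB molecular image
x = m.maxpool(m.relu(m.bn1(m.conv1(x))))                  # 7x7 conv + 3x3 max pooling
print("stem:", tuple(x.shape))                            # (1, 64, 56, 56)
for name in ["layer1", "layer2", "layer3", "layer4"]:     # the four residual stages
    x = getattr(m, name)(x)
    print(f"{name}:", tuple(x.shape))                     # channels double, spatial size halves
print("fc out features:", m.fc.out_features)              # 1000 for ImageNet; 2 after replacement
```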

Table 2. The architecture of ResNet-50.

2.3.2. SE attention module

The SE attention module is a channel recalibration method for enhancing the performance of CNNs. It simulates the human attention mechanism, allowing the model to focus more on important channel information and suppress less important channel features, thereby improving identification efficiency (Hu et al., Citation2018, pp. 7132–7141). Its structure is shown in Figure 5. First, the input feature $u_c$ of each channel $c$ is globally average-pooled over the spatial dimensions via the squeeze operation $F_{sq}$. Then, the pooled features are passed through two fully connected layers in the excitation operation $F_{ex}$, where the first fully connected layer uses a ReLU activation and the second uses a Sigmoid activation to obtain the weight for each feature channel. Finally, the weights are used in the channel-wise scaling operation $F_{scale}$ to re-calibrate the original features and obtain the attention-weighted features. The formulas for $F_{sq}$, $F_{ex}$, and $F_{scale}$ are as follows.

(1) $z_c = F_{sq}(u_c) = \dfrac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$
(2) $s = F_{ex}(z, W) = \sigma\left(W_2\,\delta(W_1 z)\right)$
(3) $\tilde{o}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$

Figure 5. The network structure of SE attention module.


where $u_c(i, j)$ is the feature value of channel $c$ at position $(i, j)$, $W_1$ and $W_2$ are the weights of the two fully connected layers, $\delta$ denotes the ReLU activation function, and $\sigma$ denotes the Sigmoid activation function.
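A minimal PyTorch sketch of the SE module defined by Equations (1)–(3) is given below; the reduction ratio of 16 follows the original SE-Net paper and is an assumption here, as the ratio used by the authors is not stated.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)   # W1
        self.fc2 = nn.Linear(channels // reduction, channels)   # W2

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = u.shape
        z = u.mean(dim=(2, 3))                                   # squeeze F_sq, Eq. (1)
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))     # excitation F_ex, Eq. (2)
        return u * s.view(b, c, 1, 1)                            # scale F_scale, Eq. (3)

# Shape check: SEBlock(256)(torch.randn(2, 256, 56, 56)) keeps the input shape.
```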

2.3.3. CBAM attention module

CBAM is a lightweight attention module that emphasizes the most informative parts of the feature map in both the channel and spatial dimensions (Woo et al., Citation2018, pp. 3–19). The structure of CBAM is illustrated in Figure 6. The calculation process of CBAM is given in Equations (4) and (5).

(4) $F_{i,1} = \sigma_c(F) \otimes F$
(5) $F_{i,2} = \sigma_{HW}(F_{i,1}) \otimes F_{i,1}$

Figure 6. The network structure of CBAM attention module.


where $F$ is the original feature, $F_{i,1}$ is the channel attention feature of the $i$-th channel, $F_{i,2}$ is the CBAM output feature of channel $i$, $\sigma_c$ and $\sigma_{HW}$ denote the channel and spatial attention maps, $\sigma$ is the Sigmoid activation function, and $\otimes$ denotes element-wise multiplication.
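The following is a sketch of a common CBAM formulation consistent with Equations (4) and (5), following Woo et al. (2018): channel attention computed from average- and max-pooled descriptors, followed by spatial attention from a 7 × 7 convolution over channel-wise mean and max maps. The reduction ratio and kernel size are the defaults from that paper; the exact configuration used by the authors is not specified.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                   # average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))                    # max-pooled descriptor
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)         # channel weights, Eq. (4)
        return x * w

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)                    # channel-wise mean map
        mx = x.amax(dim=1, keepdim=True)                     # channel-wise max map
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))   # spatial weights, Eq. (5)
        return x * w

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel = ChannelAttention(channels, reduction)
        self.spatial = SpatialAttention()

    def forward(self, x):
        return self.spatial(self.channel(x))                 # channel attention, then spatial
```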

2.4. Evaluation indicator

For the sweetener identification problem, the performance of the model is evaluated with six indices: accuracy, precision, recall, specificity, F1 score, and AUROC. Accuracy is the percentage of correctly classified compounds out of the total number of compounds. Precision is the ratio of true positive samples to samples predicted as positive. Recall is the true positive rate, while specificity is the true negative rate. The F1 score ranges from 0 to 1 and is computed from precision and recall; a high F1 indicates good model performance. The AUROC quantifies identification performance across thresholds, with higher values indicating better performance.

(6) $\text{Accuracy} = \dfrac{TP + TN}{TP + FP + TN + FN}$
(7) $\text{Precision} = \dfrac{TP}{TP + FP}$
(8) $\text{Recall} = \dfrac{TP}{TP + FN}$
(9) $\text{Specificity} = \dfrac{TN}{TN + FP}$
(10) $F1 = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

TP is the number of sweet samples correctly identified as sweet, TN is the number of non-sweet samples correctly identified as non-sweet, FN is the number of sweet samples misclassified as non-sweet, and FP is the number of non-sweet samples misclassified as sweet.
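The snippet below sketches how the indices in Equations (6)–(10), plus AUROC, can be computed with scikit-learn; the label and score arrays are toy placeholders, not results from the paper.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])                    # 1 = sweet, 0 = non-sweet (toy labels)
y_score = np.array([0.9, 0.8, 0.3, 0.6, 0.2, 0.4, 0.7, 0.1])   # predicted probability of "sweet"
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + fp + tn + fn)            # Eq. (6)
precision = tp / (tp + fp)                            # Eq. (7)
recall = tp / (tp + fn)                               # Eq. (8), true positive rate
specificity = tn / (tn + fp)                          # Eq. (9), true negative rate
f1 = 2 * precision * recall / (precision + recall)    # Eq. (10)
auroc = roc_auc_score(y_true, y_score)
print(accuracy, precision, recall, specificity, f1, auroc)
```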

3. Results and discussion

3.1. Experimental environment and parameter setting

The hardware environment in this study consisted of an Intel(R) Core(TM) i7-12700F CPU (2.10 GHz) and an NVIDIA GeForce RTX 3060 Ti graphics card. The software environment comprised the Windows 11 operating system, PyCharm Community Edition, Python 3.7.0, and PyTorch 1.7.0.

The main fine-tuning parameters of the model include the network input size, optimizer, learning rate, batch size, and number of iterations. The parameter settings of the optimized CBAM-SEResNet-50 sweetener identification model are shown in Table 3. During training, the stochastic gradient descent (SGD) optimizer is used with a momentum coefficient of 0.9 and an L2 regularization coefficient of 0.00005. In addition, the cosine annealing algorithm (Loshchilov & Hutter, Citation2016) controls the variation of the learning rate. Considering the size of the training set and GPU performance, the batch size is set to 36, which yields a significant improvement in model performance.
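A sketch of these training settings in PyTorch is shown below (SGD with momentum 0.9, L2 weight decay of 0.00005, cosine-annealed learning rate, batch size 36); the initial learning rate and epoch count used here are placeholders, see Table 3 for the actual settings.

```python
import torch

model = torch.nn.Linear(10, 2)            # stands in for CBAM-SEResNet-50
epochs, base_lr = 100, 0.01               # assumed values for illustration only

optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=0.9, weight_decay=5e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... forward/backward passes over the training DataLoader (batch_size=36) ...
    optimizer.step()                      # placeholder for the per-batch update
    scheduler.step()                      # cosine annealing of the learning rate
```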

Table 3. Parameter setting.

3.2. Effect of attention mechanisms on model performance

To analyze the impact of attention mechanisms on model performance, this study compared different transfer learning models for sweetener identification. The baseline transfer learning model is ResNet-50 with ImageNet pre-trained weights. The CBAM-ResNet-50 model is formed by integrating CBAM attention after the residual layers of the ResNet-50 network, while the SE-ResNet-50 model adds the SE attention module to the ResNet-50 residual units. The proposed CBAM-SEResNet-50 model combines both attention mechanisms. All of these models were trained under the same experimental conditions. The mean results of 10 runs are shown in Figure 7 and Table 4; the dataset is randomly split in the same proportions in each run. The experimental results indicate that:

Figure 7. Comparing the AUROC values of the transfer learning model ResNet-50 with different attention mechanisms.

  1. Compared to the transfer learning model ResNet-50, the models with attention mechanisms exhibit improved accuracy, recall, F1, specificity, and AUROC on the Hold-out test dataset.

  2. The SE-ResNet-50 model performs better than the CBAM-ResNet-50 model. This is because the CBAM-ResNet-50 model only introduces attention modules after the residual layers, while the SE-ResNet-50 model adds attention modules to the residual units. Therefore, the SE-ResNet-50 model can more accurately capture key information in the feature maps.

  3. The CBAM-SEResNet-50 model achieves an accuracy of 0.944 ± 0.009 in CV and 0.956 ± 0.008 in the Hold-out test. It outperforms the other three models in precision, sensitivity, specificity, and F1, demonstrating greater stability and reliability. It also achieves an AUROC of 0.972 ± 0.007 on the Hold-out test dataset, indicating superior compound discrimination capability. The model incorporates two different attention mechanisms that focus on different levels and positions of the feature maps, which enhances the network's generalization ability.

Table 4. Comparison of the transfer learning model ResNet-50 with different attention mechanisms.

The confusion matrix is an analysis table that summarizes the identification and prediction results of machine learning models. For one run of this experiment, the confusion matrices of the four models on the test dataset are shown in Figure 8. In the figure, each row represents the predicted category and each column the actual category, with C0 denoting the sweet category and C1 the non-sweet category. The values along the diagonal indicate the numbers of correctly predicted samples. According to Figure 8, the CBAM-SEResNet-50 model shows better overall identification performance than the other models, classifying both sweet and non-sweet molecules more accurately. The identification model achieved a recall of 0.953 and a specificity of 0.959. These findings emphasize the effectiveness and reliability of our approach, which combines transfer learning, the SE attention mechanism, and the CBAM attention mechanism to optimize ResNet-50 for compound taste identification.

Figure 8. Confusion matrices of different sweetener identification models.


3.3. Comparison with different deep learning algorithms

To further validate the identification performance of the CBAM-SEResNet-50 model, the AlexNet, VGG-16, GoogleNet, and Densely Connected Convolutional Networks-121 (DenseNet-121) models were selected for transfer learning-driven sweetener identification under the same experimental conditions. The experimental results are shown in Figure 9 and Table 5. The proposed model demonstrates clear advantages in accuracy and AUROC on the Hold-out test, with an identification accuracy 1.51% higher than the best result of the other models and a 1.7% improvement in AUROC. It also achieves good recall and F1, indicating significantly better overall performance. Among the comparison models, DenseNet-121, a densely connected CNN, achieves higher accuracy and AUROC values than the others but still does not outperform the proposed CBAM-SEResNet-50 model overall. GoogleNet and VGG-16 perform slightly worse than DenseNet-121, while AlexNet performs relatively poorly because its comparatively simple network structure makes the complex sweetener identification task difficult. Overall, the optimized model demonstrates the highest identification performance, suggesting that the algorithm can effectively and accurately identify the sweet taste properties of unknown molecules.

Figure 9. Comparison of AUROC values for different transfer learning driven sweetener identification algorithms.


Table 5. Comparison of different transfer learning driven sweetener identification models.

3.4. Comparison with previous sweetener identification models

The pertinent research on computer-aided sweetener identification models is summarized in Table 6. Compared to the BoostSweet (Lee et al., Citation2022) model based on molecular fingerprints and molecular descriptors, and the VirtualTaste (Fritz et al., Citation2021) model based on molecular fingerprints, the CBAM-SEResNet-50 model achieves an improvement of more than 1% in AUROC on the Hold-out test set. Furthermore, its accuracy on the Hold-out test set is 5% higher than that of e-sweet (Zheng et al., Citation2019). Moreover, compared to the sweetener identification model based on PLS-DA and KNN (Rojas et al., Citation2017) and the BitterSweet model based on AdaBoost (Tuwani et al., Citation2019), the CBAM-SEResNet-50 model exhibits superior sensitivity and specificity.

Table 6. Comparison with previous sweetener identification methods.

Bo et al. (Citation2022) constructed a sweetener identification model using 2D molecular images and a shallow CNN, achieving an identification accuracy of 0.84 and an AUROC value of 0.90. However, insufficient data and the difficulty of extracting molecular features have limited the predictive performance of the model. In our study, we combined transfer learning and attention mechanism methods, resulting in an identification accuracy of 0.956 ± 0.008 and an AUROC value of 0.972 ± 0.007. The transfer learning method overcomes the challenge of data scarcity, while the attention mechanism enables the model to focus on relevant features and emphasize their importance. The identification accuracy and robustness are significantly improved. The above results show that the model we proposed is feasible for molecular sweetener identification.

However, our study still has limitations. First, it focuses only on the binary identification of sweet and non-sweet (bitter and tasteless) molecules, neglecting other food tastes such as sourness and saltiness; multi-taste identification is one of our future research directions. Second, 2D molecular images lack biochemical and chemical information pertaining to the interaction of taste molecules, such as hydrophobic groups, hydrogen bond acceptors, water solubility, total charge, and the number of aromatic rings. In future research, we will explore multi-modal representations (fusing molecular descriptor features and molecular image features) or 3D structural representations to improve the feature expression ability and interpretability of the model.

3.5. Visualization of feature maps

Compared to traditional machine learning methods, deep learning models automatically extract features from molecular images, eliminating the need for manual feature selection. Traditional manual feature extraction is complex and error-prone, whereas deep learning makes feature extraction simpler and more efficient. Figure 10 shows the visualization of the convolutional layer (Conv1), maximum-pooling layer (Max Pooling), and residual layers (Conv_2, Conv_3) of our proposed model on the sweetener dataset. The molecular skeletons presented in the feature maps are similar across these layers, while regions with elevated pixel values are accentuated, indicating their substantial contribution to the model. Features learned in the shallow layers (Conv1 and Max Pooling) mainly include simple edge information and low-level features, while features learned in the residual layers (Conv_2 and Conv_3) are more detailed, informative, and distinctive. Heat maps visualizing the regions of interest (ROI) are shown in Figures 11 and 12. For each image, we extracted features from the last convolutional layer of the network to visualize the parts that contribute to sweet molecule identification. Compared with the regions covered in blue, the regions covered in red and yellow hold greater significance for sweet molecule identification. The visualization results show that hydroxyl, amino, and carbonyl groups are clearly highlighted, indicating that these chemical structures may be associated with the sweet taste of a molecule. Visualizing the features makes the molecular feature extraction process directly observable and supports the feasibility of predicting molecular taste from molecular structure.
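The sketch below shows one way intermediate feature maps like those in Figure 10 can be captured with forward hooks and rendered as coarse heat maps; the layer names follow torchvision's ResNet-50 (layer1/layer2 correspond to the paper's Conv_2/Conv_3), the random input stands in for a preprocessed molecular image, and in practice the trained CBAM-SEResNet-50 weights would be loaded. This is not the authors' exact visualization code.

```python
import torch
import torchvision
import matplotlib.pyplot as plt

model = torchvision.models.resnet50(pretrained=False).eval()   # load trained weights in practice
features = {}

def save_hook(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

for name in ["conv1", "maxpool", "layer1", "layer2"]:   # Conv1, Max Pooling, Conv_2, Conv_3
    getattr(model, name).register_forward_hook(save_hook(name))

img = torch.randn(1, 3, 224, 224)          # stand-in for a preprocessed molecular image
with torch.no_grad():
    model(img)

# Average each layer's activation over channels as a rough saliency proxy.
fig, axes = plt.subplots(1, len(features), figsize=(12, 3))
for ax, (name, fmap) in zip(axes, features.items()):
    ax.imshow(fmap[0].mean(dim=0), cmap="jet")
    ax.set_title(name)
    ax.axis("off")
plt.savefig("feature_maps.png")
```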

Figure 10. Visualization of the convolutional layers, maximum-pooling layers, and residual layers in the CBAM-SEResNet-50 model.


Figure 11. Visualizing the regions of interest (ROI) of sweet molecule images with hydroxyl groups.


Figure 12. Visualizing the ROI of sweet molecule images with carbonyl and amino groups.


3.6. Applicability Domain (AD) assessment

The AD provides a way to assess whether reliable predictions can be made for new compounds. It is determined mainly by calculating the similarity between a new compound and the compounds in the training set, measured either as the maximum similarity or the average similarity. The higher the similarity between the new compound and the training compounds, the more dependable the prediction for the new compound. In this study, the Structural Similarity (SSIM) index is employed to quantify the similarity between two images (Wang et al., Citation2004). The SSIM index ranges from 0 to 1, and a higher value signifies greater similarity.

To determine the similarity measure and threshold for the AD of our proposed model, each compound image in the Hold-out test dataset was compared one by one with the compound images in the CV dataset, and the AUROC values were recalculated using different similarity measures and thresholds, as shown in Table 7. Meanwhile, the number of compounds falling outside the AD was counted. If the similarity of a compound in the test dataset is lower than the threshold, it is identified as being outside the AD and removed from the Hold-out test dataset. As Table 7 shows, when the maximum similarity measure is used with a threshold of 0.86, the number of compounds identified as outside the AD is smallest and the AUROC value is highest. Therefore, if the maximum structural similarity between a given compound and the compounds in the CV dataset is higher than 0.86, our sweet taste identification model can make reliable predictions.
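The snippet below sketches the SSIM-based applicability-domain check described above, using scikit-image; the file paths are hypothetical, images are compared in grayscale for simplicity, and 0.86 is the maximum-similarity threshold selected in Table 7.

```python
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity as ssim

def load_gray(path: str) -> np.ndarray:
    return np.asarray(Image.open(path).convert("L"))        # grayscale 224 x 224 array

def max_similarity(test_img_path: str, train_img_paths: list) -> float:
    test = load_gray(test_img_path)
    return max(ssim(test, load_gray(p)) for p in train_img_paths)

train_paths = ["cv/mol_0001.png", "cv/mol_0002.png"]         # images from the CV dataset
score = max_similarity("holdout/query_mol.png", train_paths)
print("inside AD" if score >= 0.86 else "outside AD (prediction unreliable)")
```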

Table 7. The similarity threshold, the number of compounds outside AD for each threshold, and the corresponding AUROC value.

To evaluate the prediction performance of the CBAM-SEResNet-50 model on external datasets, we applied the model to the FlavorDB (Garg et al., Citation2018), FooDB (http://foodb.ca), and Super Natural II (Banerjee et al., Citation2015) datasets. For the FlavorDB dataset, we excluded compounds whose maximum similarity was below 0.86 and included the remaining natural compound molecules in our experiment, as shown in Table 8, with 1,675 compounds predicted to be sweet. For the FooDB and Super Natural II datasets, the numbers of molecules identified as sweet compounds are 11,485 and 67,586, respectively. In contrast to traditional machine learning methods (Tuwani et al., Citation2019), the CBAM-SEResNet-50 model uses deep learning to automatically extract features from molecular images for sweetener identification, resulting in more sweet molecules being successfully identified.

Table 8. Sweetener identification applications of CBAM-SEResNet-50 on different datasets.

3.7. Training CBAM-SEResNet-50 model on a multi-taste molecule dataset

To further evaluate the proposed CBAM-SEResNet-50 model, we collected a new molecule dataset (Kou et al., Citation2023) containing 6,107 sweet molecules and 2,474 non-sweet molecules and performed sweetener identification on it. The non-sweet molecules comprise 1,948 bitter molecules, 186 tasteless molecules, 11 salty molecules, 90 umami molecules, 33 sour molecules, and 206 multi-taste molecules. The dataset was split into a CV dataset and an external Hold-out test dataset in an 8:2 ratio. Data preprocessing and model training were conducted under the same experimental conditions described above. As shown in Table 9, 5-fold CV yielded an accuracy of 0.919 ± 0.007 and an AUROC of 0.950 ± 0.009, while on the Hold-out test dataset the accuracy is 0.926 ± 0.009 and the AUROC is 0.953 ± 0.007. Despite the diversity of non-sweet samples in this dataset, the CBAM-SEResNet-50 model still achieved satisfactory results, demonstrating its strong performance in sweetness identification.

Table 9. The identification results of the CBAM-SEResNet-50 model on a multi-taste molecule dataset.

4. Conclusion

In this paper, we propose CBAM-SEResNet-50, a sweetener identification model based on transfer learning and attention mechanisms, to achieve better performance and robustness. Built on a residual network, this approach effectively enhances the network's ability to represent sweetener features and thereby improves identification. By combining a pre-trained ResNet-50 model with attention mechanisms, the limited dataset is fully exploited to improve the generalization ability of the sweetener recognition model. This approach offers an efficient and accurate screening method for food sweeteners and supports the development of intelligent sweetener identification software that can be applied in real-world settings.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the Natural Science Foundation of Zhejiang Province, China, [Grant No. LZ21F020008], the Blue Project of Jiangsu Province, China, and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China [Grant No. 22KJB510020].

References

  • Banerjee, P., Erehman, J., Gohlke, B. O., Wilhelm, T., Preissner, R., & Dunkel, M. (2015). Super natural II—a database of natural products. Nucleic Acids Research, 43(D1), D935–D939. https://doi.org/10.1093/nar/gku886
  • Bo, W., Qin, D., Zheng, X., Wang, Y., Ding, B., Li, Y., & Liang, G. (2022). Prediction of bitterant and sweetener using structure-taste relationship models based on an artificial neural network. Food Research International, 153, 110974. https://doi.org/10.1016/j.foodres.2022.110974
  • Bouysset, C., Belloir, C., Antonczak, S., Briand, L., & Fiorucci, S. (2020). Novel scaffold of natural compound eliciting sweet taste revealed by machine learning. Food Chemistry, 324, 126864. https://doi.org/10.1016/j.foodchem.2020.126864
  • Castro-Muñoz, R., Correa-Delgado, M., Córdova-Almeida, R., Lara-Nava, D., Chávez-Muñoz, M., Velásquez-Chávez, V. F., Hernández-Torres, C. E., Gontarek-Castro, E., & Ahmad, M. Z. (2022). Natural sweeteners: Sources, extraction and current uses in foods and food industries. Food Chemistry, 370, 130991. https://doi.org/10.1016/j.foodchem.2021.130991
  • Chen, J., Cheong, H. H., & Siu, S. W. (2021). xDeep-AcPEP: Deep learning method for anticancer peptide activity prediction based on convolutional neural network and multitask learning. Journal of Chemical Information and Modeling, 61(8), 3789–13. https://doi.org/10.1021/acs.jcim.1c00181
  • Dagan-Wiener, A., DiPizio, A., Nissim, I., Bahia, M. S., Dubovski, N., Margulis, E., & Niv, M. Y. (2019). BitterDB: Taste ligands and receptors database in 2019. Nucleic Acids Research, 47(D1), D1179–D1185. https://doi.org/10.1093/nar/gky974
  • Debras, C., Chazelas, E., Srour, B., Druesne-Pecollo, N., Esseddik, Y., de Edelenyi, F. S., De Sa, A., Lutchia, R., Gigandet, S., Huybrechts, I., Julia, C., Kesse-Guyot, E., Allès, B., Andreeva, V. A., Galan, P., Hercberg, S., Deschasaux-Tanguy, M., Touvier, M., & Agaësse, C. (2022). Artificial sweeteners and cancer risk: Results from the NutriNet-Santé population-based cohort study. PLOS Medicine, 19(3), e1003950. https://doi.org/10.1371/journal.pmed.1003950
  • Fritz, F., Preissner, R., & Banerjee, P. (2021). VirtualTaste: A web server for the prediction of organoleptic properties of chemical compounds. Nucleic Acids Research, 49(W1), W679–W684. https://doi.org/10.1093/nar/gkab292
  • Gardener, H., & Elkind, M. S. (2019). Artificial sweeteners, real risks. Stroke, 50(3), 549–551. https://doi.org/10.1161/STROKEAHA.119.024456
  • Garg, N., Sethupathy, A., Tuwani, R., Nk, R., Dokania, S., Iyer, A., Gupta, A., Agrawal, S., Singh, N., Shukla, S., Kathuria, K., Badhwar, R., Kanji, R., Jain, A., Kaur, A., Nagpal, R., & Bagler, G. (2018). FlavorDB: A database of flavor molecules. Nucleic Acids Research, 46(D1), D1210–D1216. https://doi.org/10.1093/nar/gkx957
  • Goel, A., Gajula, K., Gupta, R., & Rai, B. (2021). In-silico screening of database for finding potential sweet molecules: A combined data and structure based modeling approach. Food Chemistry, 343, 128538. https://doi.org/10.1016/j.foodchem.2020.128538
  • Gutiérrez, S., Hernández, I., Ceballos, S., Barrio, I., Díez-Navajas, A. M., & Tardaguila, J. (2021). Deep learning for the differentiation of downy mildew and spider mite in grapevine under field conditions. Computers and Electronics in Agriculture, 182, 105991. https://doi.org/10.1016/j.compag.2021.105991
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778). https://arxiv.org/abs/1512.03385
  • Hua, J., Zhu, T., & Liu, J. (2022). Leaf classification for crop pests and diseases in the compressed domain. Sensors, 23(1), 48. https://doi.org/10.3390/s23010048
  • Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7132–7141). https://doi.org/10.48550/arXiv.1709.01507
  • Iqbal, J., Vogt, M., & Bajorath, J. (2021). Learning functional group chemistry from molecular images leads to accurate prediction of activity cliffs. Artificial Intelligence in the Life Sciences, 1, 100022. https://doi.org/10.1016/j.ailsci.2021.100022
  • Kou, X., Shi, P., Gao, C., Ma, P., Xing, H., Ke, Q., & Zhang, D. (2023). Data-driven elucidation of flavor chemistry. Journal of Agricultural and Food Chemistry, 71(18), 6789–6802. https://doi.org/10.1021/acs.jafc.3c00909
  • Lee, J., Song, S. B., Chung, Y. K., Jang, J. H., & Huh, J. (2022). BoostSweet: Learning molecular perceptual representations of sweeteners. Food Chemistry, 383, 132435. https://doi.org/10.1016/j.foodchem.2022.132435
  • Li, D., Yao, Y., & Sun, H. (2021). Emission and mass load of artificial sweeteners from a pig farm to its surrounding environment: Contribution of airborne pathway and biomonitoring potential. Environmental Science & Technology, 55(4), 2307–2315. https://doi.org/10.1021/acs.est.0c05326
  • Li, D., Zheng, Q., Thomas, K. V., Dang, A. K., Binh, V. N., Anh, N. T. K., & Thai, P. K. (2023). Use of artificial sweeteners and caffeine in a population of Hanoi: An assessment by wastewater-based epidemiology. Science of the Total Environment, 868, 161515. https://doi.org/10.1016/j.scitotenv.2023.161515
  • Loshchilov, I., & Hutter, F. (2016). SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983. https://doi.org/10.48550/arXiv.1608.03983
  • Maluly, H. D. B., Johnston, C., Giglio, N. D., Schreiner, L. L., Roberts, A., & Abegaz, E. G. (2020). Low- and no-calorie sweeteners (LNCS): Critical evaluation of their safety and health risks. Food Science and Technology, 40(1), 1–10. https://doi.org/10.1590/fst.36818
  • Mela, D. J., McLaughlin, J., & Rogers, P. J. (2020). Perspective: Standards for research and reporting on low-energy (“artificial”) sweeteners. Advances in Nutrition, 11(3), 484–491. https://doi.org/10.1093/advances/nmz137
  • Miao, Y., Ni, H., Zhang, X., Zhi, F., Long, X., Yang, X., Zhang, X., He, X., & Zhang, L. (2022). Investigating mechanism of sweetness intensity differences through dynamic analysis of sweetener–T1R2–membrane systems. Food Chemistry, 374, 131807. https://doi.org/10.1016/j.foodchem.2021.131807
  • Moriconi, E., Feraco, A., Marzolla, V., Infante, M., Lombardo, M., Fabbri, A., & Caprio, M. (2020). Neuroendocrine and metabolic effects of low-calorie and non-calorie sweeteners. Frontiers in Endocrinology, 11, 444. https://doi.org/10.3389/fendo.2020.00444
  • Rojas, C., Ballabio, D., Consonni, V., Tripaldi, P., Mauri, A., & Todeschini, R. (2016). Quantitative structure–activity relationships to predict sweet and non-sweet tastes. Theoretical Chemistry Accounts, 135(3), 1–13. https://doi.org/10.1007/s00214-016-1812-1
  • Rojas, C., Todeschini, R., Ballabio, D., Mauri, A., Consonni, V., Tripaldi, P., & Grisoni, F. (2017). A QSTR-based expert system to predict sweetness of molecules. Frontiers in Chemistry, 5, 53. https://doi.org/10.3389/fchem.2017.00053
  • Saraiva, A., Carrascosa, C., Raheem, D., Ramos, F., & Raposo, A. (2020). Natural sweeteners: The relevance of food naturalness for consumers, food security aspects, sustainability and health impacts. International Journal of Environmental Research and Public Health, 17(17), 1–22. https://doi.org/10.3390/ijerph17176285
  • Sharma, V., Wakode, S., & Kumar, H. (2021). Structure-and ligand-based drug design: Concepts, approaches, and challenges. Chemoinformatics and Bioinformatics in the Pharmaceutical Sciences, 27–53. https://doi.org/10.1016/B978-0-12-821748-1.00004-X
  • Shi, T., Yang, Y., Huang, S., Chen, L., Kuang, Z., Heng, Y., & Mei, H. (2019). Molecular image-based convolutional neural network for the prediction of ADMET properties. Chemometrics and Intelligent Laboratory Systems, 194, 103853. https://doi.org/10.1016/j.chemolab.2019.103853
  • Tuwani, R., Wadhwa, S., & Bagler, G. (2019). BitterSweet: Building machine learning models for predicting the bitter and sweet taste of small molecules. Scientific Reports, 9(1), 7155. https://doi.org/10.1038/s41598-019-43664-y
  • Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612. https://doi.org/10.1109/TIP.2003.819861
  • Woo, S., Park, J., Lee, J. Y., & Kweon, I. S. (2018). CBAM: Convolutional block attention module. Proceedings of the European conference on computer vision (ECCV) (pp. 3–19). https://doi.org/10.48550/arXiv.1807.06521
  • Wu, D., Ying, Y., Zhou, M., Pan, J., & Cui, D. (2023). Improved ResNet-50 deep learning algorithm for identifying chicken gender. Computers and Electronics in Agriculture, 205, 107622. https://doi.org/10.1016/j.compag.2023.107622
  • Xu, Y., Kou, J., Zhang, Q., Tan, S., Zhu, L., Geng, Z., & Yang, X. (2023). Visual detection of water content range of sea buckthorn fruit based on transfer deep learning. Foods, 12(3), 550. https://doi.org/10.3390/foods12030550
  • Yang, Z. F., Xiao, R., Xiong, G. L., Lin, Q. L., Liang, Y., Zeng, W. B., Dong, J., & Cao, D. S. (2022). A novel multi-layer prediction approach for sweetness evaluation based on systematic machine learning modeling. Food Chemistry, 372, 131249. https://doi.org/10.1016/j.foodchem.2021.131249
  • Yu, F., Lu, T., & Xue, C. (2023). Deep learning-based intelligent apple variety classification system and model interpretability analysis. Foods, 12(4), 885. https://doi.org/10.3390/foods12040885
  • Zhao, X., Li, K., Li, Y., Ma, J., & Zhang, L. (2022). Identification method of vegetable diseases based on transfer learning and attention mechanism. Computers and Electronics in Agriculture, 193, 106703. https://doi.org/10.1016/j.compag.2022.106703
  • Zheng, S., Chang, W., Xu, W., Xu, Y., & Lin, F. (2019). E-sweet: A machine-learning based platform for the prediction of sweetener and its relative sweetness. Frontiers in Chemistry, 7, 35. https://doi.org/10.3389/fchem.2019.00035
  • Zhong, S., Hu, J., Yu, X., & Zhang, H. (2021). Molecular image-convolutional neural network (CNN) assisted QSAR models for predicting contaminant reactivity toward OH radicals: Transfer learning, data augmentation and model interpretation. Chemical Engineering Journal, 408, 127998. https://doi.org/10.1016/j.cej.2020.127998