ABSTRACT
Introduction
Deep learning approaches have become popular in de novo drug design in recent years. Generative models for molecule generation and optimization have shown promising results. Models trained on different chemical data can generate molecules similar to a query molecule, thus supporting lead optimization. Recurrent neural network-based generative models have demonstrated applications in low-data drug discovery, fragment-based drug design, and lead optimization.
Areas covered
In this review, we provide an overview of recurrent neural network models and their variants for molecule generation, with recent examples. The input representations of molecules as SMILES and molecular graphs are discussed. The evaluation benchmarks and metrics used for generative neural network models are also highlighted. For this, the ScienceDirect, Web of Science, and Google Scholar databases were searched with the article’s keywords and their combinations to retrieve the most relevant and up-to-date information.
Expert opinion
The simplicity of SMILES notation makes it suitable for training a sequence-based model such as a recurrent neural network. However, models that can be trained on molecular graphs to generate synthesizable molecular structures could open new possibilities for valid molecule generation and synthetic feasibility.
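To illustrate why SMILES strings fit a sequence-based model, the following minimal sketch splits a SMILES string into tokens and maps them to integer indices, the form a recurrent neural network would typically consume. It is an illustration under stated assumptions, not a method taken from the reviewed work: the token pattern is deliberately simplified, and the names `TOKEN_PATTERN`, `tokenize`, `build_vocab`, and `encode`, as well as the special tokens `<pad>`, `<bos>`, and `<eos>`, are hypothetical.

```python
import re

# Simplified, hypothetical token pattern (an assumption for illustration):
# bracket atoms such as [N+], the two-letter halogens Br and Cl,
# two-digit ring closures such as %12, then any single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return TOKEN_PATTERN.findall(smiles)


def build_vocab(smiles_list):
    """Map every token seen in the data to an integer index."""
    # Reserve the first indices for padding, start, and end markers.
    vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    for tok in tokens:
        vocab[tok] = len(vocab)
    return vocab


def encode(smiles, vocab):
    """Turn a SMILES string into an index sequence with start/end markers."""
    body = [vocab[tok] for tok in tokenize(smiles)]
    return [vocab["<bos>"]] + body + [vocab["<eos>"]]


if __name__ == "__main__":
    data = ["CCO", "CC(=O)Cl"]  # ethanol and acetyl chloride
    vocab = build_vocab(data)
    print(tokenize("CC(=O)Cl"))
    print(encode("CCO", vocab))
```

A sequence model would be trained to predict each next index from the preceding ones. A molecular graph, by contrast, has no single canonical linear ordering of this kind, which is one reason graph-based generation calls for different model designs.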
Article highlights
Deep generative models using a ligand-based approach can generate novel, unique, diverse, and valid structures.
Deep neural network generative models can be trained on 1D and 2D representations such as SMILES strings and molecular graphs.
RNNs have been used as deep generative models to generate novel structures with desired properties using reinforcement learning and transfer learning techniques.
Benchmark datasets with goal-directed learning tasks are used to evaluate molecules identified by generative models using various metrics.
This box summarizes key points contained in the article.
Declaration of interest
The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or material discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
Reviewer disclosures
Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.