
Siamese Recurrent Neural Network for Small Molecule Drug Discovery

Although Siamese neural networks have been applied successfully to image recognition, Siamese recurrent neural networks have rarely been explored in drug discovery. A recent study presented a Siamese neural network model based on a bidirectional long short-term memory architecture with a self-attention mechanism, which can automatically learn discriminative features from the SMILES representation of small molecules.

Small molecule drug discovery

Small molecule drug discovery usually starts with a lead compound, followed by quick structure-activity relationship (SAR) exploration with analogues. With only a small amount of biological data available at the earliest stages of drug discovery, the subsequent lead optimisation presents a low-data problem. Lead optimisation then results in one or more congeneric series in which compounds differ by a few atoms around a unique scaffold. One-shot learning classification combined with graph convolutional neural networks has previously been shown to tackle the issue of low data in drug discovery. However, one issue remains: the imbalance between bioactive and inactive classes, which is linked to the low hit rates of high-throughput screening assays.

A Siamese neural network consists of two identical subnetworks that work in parallel and compare the features learned from two different inputs to measure how similar they are. Unlike other modern deep learning models, which rely on big data to perform well, a Siamese neural network can learn from very little data, which has made the method very popular in recent years. Overall, the Siamese neural network is well placed to cope with both the low data and the class imbalance associated with small molecule drug discovery.

The Siamese recurrent neural network model architecture

In this work, the researchers built a Siamese recurrent neural network based on a bidirectional long short-term memory (BiLSTM) architecture with a self-attention mechanism, operating on the simplified molecular-input line-entry system (SMILES) representation of small molecules. SMILES denotes a molecular structure as a graph with optional chiral indications. They named their deep learning model SiameseCHEM, and it consists of a dual-branch network with shared weights. Each branch consists of an embedding layer, three BiLSTM layers and an attention layer, and the two branches meet in a final distance layer.
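As a rough illustration of how a SMILES string could be turned into the integer tokens that an embedding layer expects, here is a minimal sketch. The article does not describe the tokenisation scheme used for SiameseCHEM, so the regular expression, the function names (`tokenise`, `build_vocab`, `encode`) and the use of 0 as a padding index are all assumptions made for this example.

```python
import re

# Hypothetical token pattern (an assumption, not taken from the study):
# bracket atoms such as [C@H], two-letter halogens, two-digit ring
# closures such as %12, then any other single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenise(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Map each distinct token to an integer index; 0 is reserved for padding."""
    tokens = sorted({tok for s in smiles_list for tok in tokenise(s)})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode(smiles, vocab, max_len):
    """Convert a SMILES string to a fixed-length list of token indices."""
    ids = [vocab[tok] for tok in tokenise(smiles)][:max_len]
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    vocab = build_vocab(["CCO", "CCl"])
    print(tokenise("C[C@H](N)C(=O)O"))
    print(encode("CCl", vocab, max_len=5))
```

Both branches of a Siamese network would share the same vocabulary, so the two molecules in a pair are encoded in exactly the same way.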

The embedding layer has 128 dimensions and projects the discrete tokens into a continuous vector space. The embedded sequence is then passed through three BiLSTM layers, each of which has 128 hidden units. The model then applies a self-attention mechanism to the hidden states extracted from the last BiLSTM layer. The researchers concatenated the forward and backward hidden states from the last BiLSTM layer to yield a hidden states matrix, which is operated on by an attention matrix of 512 dimensions. The output is then fed through a fully connected linear layer with the leaky rectified linear unit (Leaky ReLU) activation function, which results in an attentional vector of 256 dimensions. Finally, the researchers computed the cosine similarity of the two attentional vectors, and this was then squeezed between 0 and 1 with a logistic sigmoid function. Figure 1 shows a summary of the Siamese recurrent neural network architecture.
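The final distance step can be sketched in a few lines of plain Python. This is only an illustration of "cosine similarity followed by a logistic sigmoid" as described above; the function names are made up for this example, the short vectors stand in for the 256-dimensional attentional vectors, and any scaling the authors may apply before the sigmoid is not covered here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sigmoid(x):
    """Logistic sigmoid, mapping any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def similarity_score(u, v):
    """Cosine similarity of the two branch outputs, passed through a sigmoid."""
    return sigmoid(cosine_similarity(u, v))


if __name__ == "__main__":
    # Toy stand-ins for the two attentional vectors.
    branch_a = [0.2, -0.5, 0.9, 0.1]
    branch_b = [0.3, -0.4, 0.8, 0.0]
    print(similarity_score(branch_a, branch_b))
```

Because cosine similarity lies between -1 and 1, a plain sigmoid applied directly to it stays within roughly 0.27 to 0.73, with 0.5 corresponding to orthogonal vectors.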

Figure 1. Schematic showing the SiameseCHEM model architecture, from the SMILES input going through the different layers, finally giving a similarity label as the output. This figure is taken directly from the article by Fernández-Llaneza et al.

Comparison of their Siamese recurrent neural network with baseline models

To benchmark SiameseCHEM, the researchers implemented two popular machine learning methods – random forest (RF) and support vector machine (SVM). The researchers trained the two models on the same five training sets for binary classification, but with a single compound as input, represented either by ECFP6 fingerprints or by tokenised SMILES strings. The team then fed each compound from the validation set into the baseline model separately. The group determined the similarity label of a pair from the predicted activity class of each individual compound. To enable robust comparison based on performance, the researchers performed 10-fold stratified cross-validation on all models, repartitioning the training and validation sets with an 80:20 split.
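The step from single-compound predictions to a pair label can be sketched as follows. The article does not spell out the exact rule, so this example assumes that a pair is labelled similar (1) when both compounds are predicted to fall in the same activity class and dissimilar (0) otherwise. The function names and compound identifiers are hypothetical.

```python
from itertools import combinations


def pair_similarity_label(class_a, class_b):
    """Assumed rule: 1 if both compounds share a predicted class, else 0."""
    return 1 if class_a == class_b else 0


def label_pairs(predicted_classes):
    """Label every unordered pair of compounds.

    predicted_classes maps a compound identifier to the activity class
    predicted by a single-compound baseline (1 = active, 0 = inactive).
    """
    return {
        (a, b): pair_similarity_label(predicted_classes[a], predicted_classes[b])
        for a, b in combinations(sorted(predicted_classes), 2)
    }


if __name__ == "__main__":
    # Hypothetical baseline predictions for three validation compounds.
    predictions = {"cpd_1": 1, "cpd_2": 1, "cpd_3": 0}
    for pair, label in label_pairs(predictions).items():
        print(pair, label)
```

Deriving pair labels this way lets a single-compound classifier be scored on the same pairwise task as the Siamese model.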

The SiameseCHEM model performed well as measured by the Matthews correlation coefficient (MCC), which ranges from -1 to 1. Overall, SiameseCHEM had an MCC greater than 0.675 across all five datasets. Notably, the model achieved better performance than the RF model, which uses the ECFP6 fingerprints, on the BACE1, DRD2 and NR1H2 datasets. However, for CCR5 and EGFR, its performance was not significantly different from that of the RF. SiameseCHEM also performed better overall than the SVM model. These results suggest that the SiameseCHEM model is able to learn task-specific chemical features encoded by the SMILES strings.
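For readers unfamiliar with the Matthews correlation coefficient, the sketch below computes it from the four cells of a binary confusion matrix using the standard formula. The function name and the example counts are invented for illustration and are not results from the study; returning 0 when the denominator is zero is a common convention adopted here as an assumption.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns a value between -1 (total disagreement) and 1 (perfect
    prediction); 0 corresponds to chance-level prediction.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: report 0 when a row or column sum is zero.
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Invented counts for a set of similar/dissimilar pair predictions.
    print(mcc_from_counts(tp=90, tn=80, fp=20, fn=10))
```

Because all four cells of the confusion matrix enter the formula, the coefficient remains informative when one class is much rarer than the other, which is relevant to the class imbalance discussed earlier.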


The researchers behind this study presented a Siamese recurrent neural network model – SiameseCHEM – based on a BiLSTM architecture with a self-attention mechanism. Their model is able to automatically learn discriminative features from SMILES representations of small molecules. When benchmarked against baseline machine learning models on five datasets, their model outperformed the RF baseline on three of them. This study provides a stepping stone for the use of the Siamese neural network for regression and may facilitate improved lead optimisation in future small molecule drug discovery.


Fernández-Llaneza D, Ulander S, Gogishvili D, Nittinger E, Zhao H, Tyrchan C. Siamese Recurrent Neural Network with a Self-Attention Mechanism for Bioactivity Prediction. ACS Omega. 2021;6(16):11086-11094. Published 2021 Apr 15.
