A new study used gene expression data from Alzheimer’s disease patients to identify two molecular subtypes of the disease using consensus non-negative matrix factorization. Identifying these subtypes could improve understanding of disease mechanisms and support clinical trial design, drug discovery and precision medicine for Alzheimer’s disease.
Alzheimer’s disease is characterised by pathological extracellular deposition of beta-amyloid (Aβ) peptides and intracellular tau protein fibres in the brain. Recent studies have suggested that Aβ aggregates in different biochemical compositions. Defining disease subtypes is important for understanding disease mechanism, as well as influencing clinical trial design and drug discovery. In the past, neuroimaging, Aβ and tau have been used for Alzheimer’s disease subtyping. However, subtypes identified based on image analysis and Aβ offer limited understanding of disease pathophysiology.
High-throughput genomic data has greatly improved our understanding of disease mechanisms for Alzheimer’s disease. Genome-wide association studies (GWAS) have identified over 20 loci for late-onset Alzheimer’s disease. Several pathways or molecular networks involved in Alzheimer’s disease have been identified using gene expression data. Additionally, machine learning methods have used genomic data to distinguish Alzheimer’s disease from normal cognition and mild cognitive impairment (MCI). However, genomic data has not yet been used for Alzheimer’s disease molecular subtyping.
Leveraging genomic data to identify Alzheimer’s disease molecular subtypes
The Religious Orders Study and Memory and Aging Project (ROSMAP) is a longitudinal clinical-pathologic cohort study of aging and Alzheimer’s disease. Around 2,500 individuals took part in ROSMAP, and genomic data from 642 of the participants is currently available to researchers. The researchers behind this study applied a non-negative matrix factorization (NMF) clustering model to the ROSMAP data to identify Alzheimer’s disease molecular subtypes.
NMF is a group of algorithms in multivariate analysis in which a matrix is factorised into (usually) two matrices, with the property that all three matrices have no negative elements. It has been shown that NMF-based classification is an accurate and robust method for clustering genomic data. In this study, the researchers applied NMF to identify Alzheimer’s disease molecular subtypes using gene expression data from ROSMAP. The researchers then performed subtype analysis to identify signature genes and enriched pathways for each molecular subtype. Next, the molecular subtypes were validated using an independent dataset from the Gene Expression Omnibus (GEO). Finally, the researchers investigated the association of their molecular subtypes with available demographic and clinical variables, and the APOE genotype.
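To make the factorisation concrete, here is a minimal sketch of NMF-based clustering on a toy expression matrix. It is not the researchers' pipeline: the `nmf` function, the multiplicative update rule, the toy data and the choice to assign each sample to its largest component in H are all illustrative assumptions.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (genes x samples) as W @ H.

    W is genes x rank and H is rank x samples; all entries stay non-negative.
    Uses multiplicative updates for the squared-error objective (an assumed,
    commonly used choice, not necessarily the one used in the study).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so signs never flip.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: 40 genes x 12 samples, two groups of six samples,
# each with a different block of highly expressed genes plus small noise.
rng = np.random.default_rng(42)
V = rng.random((40, 12)) * 0.1
V[:20, :6] += 1.0
V[20:, 6:] += 1.0

W, H = nmf(V, rank=2)

# Assign each sample to the component with the largest weight in H.
labels = H.argmax(axis=0)
print(labels)
```

In this sketch the columns of W play the role of gene signatures and the columns of H give each sample's weight on those signatures, which is what allows the factorisation to be read as a clustering.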
Identification of two Alzheimer’s disease molecular subtypes
The researchers used consensus NMF to cluster the gene expression data of 222 Alzheimer’s patients from ROSMAP. Compared with three or four clusters, the consensus matrices for two clusters were more stable. Additionally, when the researchers assigned the data to three subtypes, the cophenetic correlation coefficient dropped. These results suggest that the Alzheimer’s patient data is best represented by two distinct subtypes. Using the 197 core samples with positive silhouette scores, the researchers obtained 403 genes that were differentially expressed between the two molecular subtypes and used them as signature genes. A distinct pattern of signature gene expression was observed in the two subtypes.
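The stability argument above can be illustrated with a short sketch of consensus clustering. It is a simplified illustration under stated assumptions, not the study's code: the helper names, the toy data, the 20 random restarts and the use of average linkage on 1 minus the consensus matrix are all choices made for this example. It relies on NumPy and on SciPy's `linkage`, `cophenet` and `squareform`.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform

def nmf_labels(V, rank, seed, n_iter=300, eps=1e-9):
    """Run one NMF (multiplicative updates) and return a cluster label per sample."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return H.argmax(axis=0)

def consensus_matrix(V, rank, n_runs=20):
    """Fraction of runs in which each pair of samples lands in the same cluster."""
    n = V.shape[1]
    C = np.zeros((n, n))
    for seed in range(n_runs):
        labels = nmf_labels(V, rank, seed)
        C += labels[:, None] == labels[None, :]
    return C / n_runs

def cophenetic_coefficient(C):
    """Cophenetic correlation of average-linkage clustering on 1 - consensus."""
    dist = 1.0 - C
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    Z = linkage(condensed, method="average")
    coeff, _ = cophenet(Z, condensed)
    return coeff

# Hypothetical toy data with two true sample groups (30 genes x 12 samples).
rng = np.random.default_rng(1)
V = rng.random((30, 12)) * 0.1
V[:15, :6] += 1.0
V[15:, 6:] += 1.0

C2 = consensus_matrix(V, rank=2)
coeff2 = cophenetic_coefficient(C2)
C3 = consensus_matrix(V, rank=3)
coeff3 = cophenetic_coefficient(C3)
print("k=2:", coeff2, "k=3:", coeff3)
```

A consensus matrix whose entries sit close to 0 or 1 means that samples are grouped the same way on every restart, which gives a cophenetic coefficient near 1. A drop in the coefficient at a larger number of clusters is the kind of signal the researchers describe when moving from two to three subtypes.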
The researchers characterised the two subtypes as synaptic and inflammatory. The synaptic subtype is characterised by dysfunction of the synaptic pathways. Substantial loss of neurons and synapses is a hallmark of late-stage Alzheimer’s disease. Recent studies have also observed synaptic dysfunction in patients with mild cognitive impairment, suggesting that synaptic dysfunction is a fundamental mechanism of Alzheimer’s disease. The inflammatory subtype, on the other hand, is enriched for over-activation of the IL-2, IFN-α and IFN-γ pathways. A sustained inflammatory response, mediated by over-activation of microglia, has previously been shown to exacerbate both amyloid and tau pathology, which indicates that inflammation represents another mechanism of Alzheimer’s disease. The two molecular subtypes that the researchers identified therefore reflect inherent molecular mechanisms of the disease.
In this study, the researchers reported the first gene expression-based Alzheimer’s disease molecular subtypes. Using consensus NMF, they identified two robust molecular subtypes, synaptic and inflammatory, which represent two fundamental mechanisms of Alzheimer’s disease. Identification of these molecular subtypes may support better clinical trial design and improved drug discovery, and could help enable personalised medicine for Alzheimer’s disease. However, one limitation of this study is that the molecular subtypes were based on gene expression data from post-mortem brain tissue, which limits their clinical use. To overcome this, the researchers suggest that proteomic data from cerebrospinal fluid and genotype data from blood could be used to further validate the subtypes and make them more clinically applicable.
Image credit: kjpargeter – FreePik