A recent study, published as a preprint on bioRxiv, presents a new machine learning framework, MethylationToActivity, which uses convolutional neural networks to infer promoter activity from DNA methylation patterns.
Transcriptional regulation determines the identity and function of a cell. Deregulated gene expression is therefore a defining feature of common diseases, such as cancer. Promoters (regulatory regions surrounding transcription start sites) integrate signals from distal enhancers and local histone modifications to initiate transcription. The activity of a promoter determines both the level of transcription and the transcript isoform that is expressed. Tumours often use alternative promoters to increase isoform diversity, activate repressed oncogenes and evade the host immune response. The gold standard for studying promoter activity is ChIP-seq of promoter-associated histone marks such as H3K4me3 and H3K27ac.
DNA methylation (DNAm) is a relatively stable and heritable epigenetic regulatory mechanism. Compared to histone modifications, DNAm can be accurately and robustly profiled in various tissues through both array and sequencing platforms, which makes it useful for studying epigenetic deregulation in tumours. The DNAm pattern is mechanistically connected with transcription factor binding and histone modifications. However, apart from a few examples, the contribution of DNAm to the regulation of individual genes remains largely unknown. This lack of interpretability at the individual gene level has limited our understanding of the biological significance of DNAm signatures.
A deep-learning framework
To address these challenges, a team of researchers have developed MethylationToActivity (M2A), a deep-learning framework. They used a cohort of six paediatric neuroblastoma (NBL) orthotopic patient-derived xenograft (O-PDX) samples profiled in the Paediatric Cancer Genome Project. The researchers trained the model on whole-genome bisulphite sequencing data to predict enrichment of H3K4me3 and H3K27ac at annotated promoters genome-wide. Finally, they tested the model’s accuracy and generalisability on diverse tumour types from four publicly available datasets that represent real-world applications.
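To make the general idea concrete, here is a minimal Python sketch of how a convolutional layer can summarise methylation levels in a window around a transcription start site and turn them into a single activity score. It is an illustration only and not the M2A architecture: the function names, the eight-bin window, the hand-set kernel and the use of the unmethylated fraction as input are all assumptions made for this example, and nothing here is trained on real data.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation, the core operation of a CNN layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def predict_activity(methylation, kernel, weight=1.0, bias=0.0):
    """Toy forward pass: conv -> ReLU -> global average pool -> linear output.

    `methylation` holds methylation fractions (0 to 1) for consecutive bins
    around a transcription start site. All parameters are hypothetical and
    hand-set; a real model would learn them from paired methylation and
    ChIP-seq data.
    """
    # Assumption for this sketch: feed the unmethylated fraction, so that
    # hypomethylated promoters produce larger feature values.
    unmethylated = [1.0 - m for m in methylation]
    features = relu(conv1d(unmethylated, kernel))
    pooled = sum(features) / len(features)
    return weight * pooled + bias


# Hand-set smoothing kernel standing in for a learned filter.
KERNEL = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical promoter windows: low methylation at the centre versus
# uniformly high methylation.
hypomethylated = [0.9, 0.8, 0.2, 0.1, 0.1, 0.2, 0.8, 0.9]
hypermethylated = [0.9] * 8

score_open = predict_activity(hypomethylated, KERNEL)
score_closed = predict_activity(hypermethylated, KERNEL)
print(score_open > score_closed)
```

With these hand-picked values the hypomethylated window scores higher than the fully methylated one, which mirrors the common observation that unmethylated promoters tend to carry active histone marks. A trained model would learn its filters from data rather than rely on this fixed assumption.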
The team found that the model performed well across various tumour types, with accuracy comparable to that of ChIP-seq measurements. They demonstrated that M2A was able to overcome the challenges of systematically characterising promoter activities from DNAm signatures. It offers an accurate and robust depiction of the promoter activity landscape in various paediatric and adult cancers, including both solid and haematological malignancies. The team believe that M2A will serve as a valuable tool to provide functional interpretation of DNAm deregulation, to characterise promoter activity differences from DNAm patterns and to reveal alternative promoter usage in patient tumours. As a result, they hope that it will help with tailoring treatments based not only on genetic variants but also on epigenetic deregulation.