Dr Tim Cutts, Head of Scientific Computing at the Wellcome Sanger Institute, and Sinan Yavuz, Senior Bioinformatics Scientist at Seven Bridges, gave a keynote at the Festival of Genomics 2020 in London in January 2020, explaining how the two organisations collaborate on large-scale genomics research.
Dr Cutts begins by outlining the work of the Wellcome Sanger Institute and its involvement in the UK Biobank Vanguard Project and the subsequent consortium-funded main-phase sequencing project. One massive challenge, he explains, was scaling joint variant calling to 450,000 samples. Another was designing analysis infrastructure that could keep pace with the sequencers, so that newly generated data never accumulated. Sanger's part of the pipeline (sequencing, analysis and initial QC) has a target turnaround of 48 hours, after which the data are passed to Seven Bridges via Google Cloud.
At 12:17 in the video, Dr Yavuz begins his half of the presentation, picking up at the point where Sanger, having completed initial quality control, hands the sequencing data over to Seven Bridges as unaligned CRAM files. Seven Bridges then processes the files at scale using BWA and GATK, completes quality control, and handles data management and storage. Dr Yavuz has shared his presentation slides from the Festival, shown below the video.