C3G Bioinformatics Platform at the Rosalind and Morris Goodman Cancer Institute
by Alain Pacis
With next-generation sequencing and other high-throughput technologies revolutionizing life sciences and health care research, data processing and interpretation, rather than data production, have become the major limiting factor for discovery and innovation. Large genomics centers, and increasingly smaller research labs, face significant data analysis challenges. The Canadian Center for Computational Genomics (C3G) provides bioinformatics analysis and high-performance computing services to the life science research community.
Since 2018, C3G has partnered with the Rosalind and Morris Goodman Cancer Institute (GCI) to set up and operate a bioinformatics platform at the GCI. High-throughput molecular assays offer unprecedented insight into the biology and pathogenesis of cancer, with potential impact on diagnostics and therapy selection. The platform supports numerous GCI researchers in the analysis, visualization, and interpretation of a wide range of “omics” data, including genomics, transcriptomics, epigenomics, proteomics, and metabolomics data.
Our whole-genome sequencing (WGS) pipeline covers every data processing step, from raw sequencing reads (FASTQ files) to variant report files. We combine several variant callers to identify high-confidence single nucleotide variants (SNVs), insertions/deletions (INDELs), and structural variants (SVs) in matched tumor-normal pairs. Each variant is annotated with information such as the affected genes and transcripts, genomic location, predicted consequence, and known variants from clinical databases (e.g., ClinVar and CIViC). The pipeline also reports copy-number variations (CNVs), ploidy, microsatellite (in)stability, tumor mutational burden, and mutational signatures. We continue to improve our WGS analysis by integrating and testing new tools and benchmarking datasets, with the goal of increasing both throughput and accuracy.
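The article does not say how calls from the different variant callers are combined. A common approach, assumed here purely for illustration, is to keep a variant only when several callers agree; tumor mutational burden (TMB) is then usually expressed as somatic mutations per megabase of genome where calling was possible, although definitions vary between studies (for example, whether only coding mutations are counted). The Python sketch below shows both ideas on toy data. The caller names, the two-caller threshold, and the `consensus_calls` and `tumor_mutational_burden` helpers are hypothetical, not the platform's implementation, and real pipelines also filter on read depth, allele fraction, and other quality metrics before counting.

```python
from collections import Counter

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def consensus_calls(calls_by_caller: dict[str, set[Variant]],
                    min_support: int = 2) -> set[Variant]:
    """Keep variants reported by at least `min_support` callers.

    The default of 2 is a hypothetical threshold, not the platform's setting.
    """
    support = Counter()
    for calls in calls_by_caller.values():
        support.update(calls)  # a set gives each caller at most one vote per variant
    return {v for v, votes in support.items() if votes >= min_support}


def tumor_mutational_burden(n_somatic: int, callable_bases: int) -> float:
    """Somatic mutations per megabase (Mb) of genome where calling was possible."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_somatic / (callable_bases / 1_000_000)


# Toy input: three hypothetical callers reporting made-up variants.
calls = {
    "caller_a": {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"), ("chr3", 42, "G", "A")},
    "caller_b": {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")},
    "caller_c": {("chr1", 1000, "A", "G"), ("chr3", 42, "G", "A"), ("chrX", 7, "T", "C")},
}
high_confidence = consensus_calls(calls)
print(sorted(high_confidence))  # chrX:7 has only one vote and is dropped
print(tumor_mutational_burden(300, 30_000_000))  # 300 mutations over 30 Mb -> 10.0
```

Raising `min_support` trades sensitivity for precision: with `min_support=3`, only the chr1 variant, reported by all three callers, would survive.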
Bulk RNA sequencing (RNA-seq) is one of the most commonly used techniques in cancer research. Our RNA-seq analysis pipeline aims to identify differentially expressed genes or isoforms (expression signatures) and to detect gene fusions. Single-cell RNA sequencing (scRNA-seq) is a powerful tool for dissecting intratumoral transcriptomic heterogeneity, which is masked in bulk analyses. Our scRNA-seq analysis pipeline aims to distinguish neoplastic from non-neoplastic cells, identify transcriptionally distinct subpopulations and cell states that may drive tumorigenesis, and perform trajectory inference to uncover dynamic changes in gene expression.
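To make the differential expression idea concrete, the Python sketch below normalizes raw read counts to counts per million (CPM) and computes a per-gene log2 fold change between two groups of samples. It is a toy illustration under stated assumptions (made-up gene names and counts, a pseudocount of 1, the same gene set in every sample), not the platform's method. It reports only effect sizes: dedicated tools such as DESeq2 or edgeR fit statistical models of count data and return p-values corrected for multiple testing, which this sketch omits.

```python
import math
from statistics import mean


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: rescale raw read counts by the sample's library size."""
    total = sum(counts.values())
    return {gene: n / total * 1_000_000 for gene, n in counts.items()}


def log2_fold_changes(group_a: list[dict[str, int]],
                      group_b: list[dict[str, int]],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """Per-gene mean log2(CPM + pseudocount) in group B minus that in group A.

    Assumes every sample reports the same set of genes.
    """
    norm_a = [cpm(sample) for sample in group_a]
    norm_b = [cpm(sample) for sample in group_b]

    def mean_log_cpm(samples: list[dict[str, float]], gene: str) -> float:
        # The pseudocount avoids log2(0) for genes with no reads.
        return mean(math.log2(s[gene] + pseudocount) for s in samples)

    return {gene: mean_log_cpm(norm_b, gene) - mean_log_cpm(norm_a, gene)
            for gene in norm_a[0]}


# Toy counts (made up). tumor sample 2 was sequenced twice as deeply as
# tumor sample 1 but has the same composition.
normal = [{"gene_a": 50, "gene_b": 50, "gene_c": 900},
          {"gene_a": 50, "gene_b": 50, "gene_c": 900}]
tumor = [{"gene_a": 50, "gene_b": 200, "gene_c": 750},
         {"gene_a": 100, "gene_b": 400, "gene_c": 1500}]

lfc = log2_fold_changes(normal, tumor)
for gene, value in lfc.items():
    # gene_b has about 4x higher CPM in tumor, so its log2 fold change is close to 2.
    print(f"{gene}: {value:+.2f}")
```

The second tumor sample shows why normalization comes first: its raw counts are double those of the first tumor sample, yet after CPM scaling the two samples are identical.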
For more information on the Canadian Center for Computational Genomics, visit https://www.computationalgenomics.ca