Structural Variation Atlas

Developing an atlas of structural variation across global populations

Remarkable progress has been made in the last decade in defining the importance of structural variation (SV) in human disease, though these studies have predominantly focused on copy number variation (CNV). However, a glaring blind spot remains in both basic research and clinical diagnostic testing: the detection of genomic rearrangements that do not involve the gain or loss of genomic material, also known as balanced structural variation. Neither clinical dosage arrays, nor genome-wide association studies, nor whole-exome sequencing, nor low-depth whole-genome sequencing has been capable of detecting balanced chromosomal aberrations (including translocations, inversions, and insertions) or small CNVs. Our lab has long focused on developing whole-genome sequencing methods to delineate these previously intractable variants and to determine their role in neurological disorders. We have developed a new computational pipeline (GATK-SV) to solve the challenging problem of accurately detecting SV in large genome sequencing cohorts. This method has outperformed all comparable existing approaches and has been widely adopted by the human genomics community.

In our current pursuit of an atlas of structural variation across global populations, our lab is leading efforts to accurately detect structural variation within large consortia, including the Genome Aggregation Database (gnomAD) and All of Us.

The Genome Aggregation Database (gnomAD) is a coalition of investigators seeking to aggregate and harmonize exome and genome sequencing data from a variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. The project is overseen by co-directors Heidi Rehm and Mark Daly, and council members Daniel MacArthur, Benjamin Neale, Michael Talkowski, Anne O’Donnell-Luria, Grace Tiao, Matthew Solomonson, and Kat Tarasova. The second release spans 125,748 exome sequences and 15,708 whole-genome sequences from unrelated individuals sequenced as part of various disease-specific and population genetic studies. The v3.1.2 data set (GRCh38) spans 76,156 genomes of diverse ancestries, selected as in v2.

The All of Us Research Program (AoURP), part of the NIH, is building one of the largest biomedical data resources of its kind. The All of Us Research Hub will store health data from one million or more diverse participants in the All of Us Research Program. The Talkowski lab is working with the Broad Institute, Color Genomics, and the Partners Healthcare Laboratory for Molecular Medicine (LMM) to serve as the genome center for the AoURP. Dr. Talkowski’s group is leading efforts to develop structural variation detection pipelines from WGS across the All of Us program, including the development of a clinical structural variant pipeline that will evaluate the impact of structural variation across phenotypic traits in the biobank and permit the program to return clinically significant results to participants.