The Personalis ACE Platform

Enabling Superior Discovery and Diagnostics
Personalis, Inc. 1350 Willow Road, Suite 202, Menlo Park, California 94025 • [email protected] • 1.855.GENOME4 • 1.650.752.1300 www.personalis.com
The Personalis ACE Platform
Personalis was founded by leaders in sequencing, genomics, and clinical medicine. Its mission is to provide the most accurate next generation sequencing and interpretation services for both researchers and clinicians. To enable this, we have developed the Accuracy and Content Enhanced (ACE) Platform to provide high accuracy sequencing, informatics, and content. The Personalis ACE Platform consists of:
• ACE Sequencing: Improves coverage of standard exomes and genomes in biomedically interpretable regions.
• ACE Pipeline: Provides best-in-class alignment, an enhanced reference, and variant calling for improved accuracy.
• ACE Content & Annotation: Our curated proprietary disease variant and pharmacogenomics databases are integrated with over 40 additional public and commercial databases to provide high quality, comprehensive annotation of not just SNVs and indels, but also larger structural variants.
ACE Sequencing
Even when sequenced at high average coverage, exomes (and whole genomes) have poor actual coverage in many important regions, including those linked to Mendelian disease, cancer, complex disease, and pharmacogenomics. Several papers have shown that heterozygous variants are not consistently and accurately called in regions where local read depth falls below a threshold of 13x to 25x. Meeting this threshold can be a problem for standard exomes even at high average coverage.

To fill in these gaps, Personalis has developed Accuracy and Content Enhanced (ACE) Sequencing as an add-on option to standard exome sequencing. ACE aims to "finish" the medical exome by targeting the gaps in over 7,000 genes of known biomedical function, enabling variants to be called more accurately and reliably in those regions. In addition to improving exome coverage, ACE provides coverage in regions outside the traditional exome, such as regulatory regions (~12,000 regions) and intergenic regions of known function (~25,000 regions). This comprehensive and integrated approach leads to higher accuracy and sensitivity for discovery and clinical diagnostic applications.

Examples of how ACE improves coverage for individual genes are shown below:

Coverage over all exons for cancer and Mendelian genes. Some exons show poor coverage on a standard exome (blue), filled in by ACE (green).
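The depth threshold discussed above can be made concrete with a small sketch. It is illustrative only and is not Personalis code: the function names, the toy depth values, and the choice of 25x (the upper end of the 13x to 25x range) are assumptions. The `is_finished` helper follows one possible reading of the "finished" criterion stated later in this section (at least 99% of exonic bases at the chosen depth).

```python
from typing import List, Tuple


def low_depth_runs(depths: List[int], min_depth: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index runs where depth is below min_depth."""
    runs = []
    start = None
    for i, depth in enumerate(depths):
        if depth < min_depth:
            if start is None:
                start = i  # a new low-coverage gap begins here
        elif start is not None:
            runs.append((start, i))  # the gap ended at the previous base
            start = None
    if start is not None:
        runs.append((start, len(depths)))  # gap runs to the end of the region
    return runs


def fraction_at_depth(depths: List[int], min_depth: int) -> float:
    """Fraction of bases whose depth is at least min_depth."""
    if not depths:
        return 0.0
    return sum(depth >= min_depth for depth in depths) / len(depths)


def is_finished(depths: List[int], min_depth: int = 25, min_fraction: float = 0.99) -> bool:
    """Assumed reading of 'finished': enough exonic bases reach min_depth."""
    return fraction_at_depth(depths, min_depth) >= min_fraction


# Hypothetical per-base depths across one exon (made-up numbers).
exon_depths = [40, 38, 30, 12, 9, 11, 27, 35, 8, 41]

print(low_depth_runs(exon_depths, 25))
print(fraction_at_depth(exon_depths, 25))
print(is_finished(exon_depths))
```

In this toy exon, two stretches fall below 25x, so heterozygous calls there would be less reliable and the exon would not count toward a "finished" gene under the assumed criterion.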
Coverage over CYP2C9, a pharmacogenomics gene. In addition to coding exons, the CYP2C9 plot shows biomedical variant loci and UTRs that are poorly covered in typical exomes but covered by ACE (green). CYP2C9 has been implicated in warfarin metabolism.

Coverage over the CFTR gene, including exons and introns supplemented by ACE. The figure shows a 10 kb intronic variant recommended for carrier testing by ACMG that is missed on a standard exome.

Example of an SNV missed by standard exome sequencing but covered by ACE. An SNV upstream of VKORC1 that is a critical determinant of warfarin response is missed with standard exome sequencing but covered with our ACE sequencing supplement.

Overall coverage of the medical exome (over 7,000 genes) is improved significantly with ACE sequencing. For example, with 12 Gb of sequencing, the ACE exome "finishes" 56% to 76% more genes than comparable standard exomes (a gene is "finished" in this case if there is >25x average coverage over 99% of all bases in all exons of a gene). The figures below show overall performance of ACE compared to standard exome sequencing.

ACE "finishes" more genes in the medical exome compared with standard exomes from 2 different vendors (Exome A and Exome B) at 8 Gb, 10 Gb, and 12 Gb of sequencing. For example, at 12 Gb the ACE protocol achieves 25x mean depth on 99% of bases in 6,500 genes, versus 4,100 and 3,700 for Exome A and Exome B at the same sequencing level.

The ACE exome not only increases coverage, but also improves accuracy. The graphs below show that the ACE exome can significantly reduce the number of false positives and false negatives compared to standard exomes.

ACE decreases both false negatives and false positives compared to standard exomes. Shown are false positive and false negative error curves for SNVs and indels for a standard exome (blue) and the Personalis ACE™ exome (green) over a range of total sequence from 3 to 20 Gb. The 12 Gb lines are shown in bold for comparison. Accuracy is measured against the NIST Genome in a Bottle (GIAB) high-confidence call set.

ACE Pipeline
Accurate sequencing must be paired with accurate alignment and variant calling. Significant accuracy problems and poor QC can hamper traditional alignment and variant calling pipelines, as discussed in prior publications by the Personalis team and founders. Personalis has developed a next generation pipeline that aims to improve overall accuracy.
The ACE Pipeline has an expanded feature set and workflow, integrating over 20 proprietary and public tools. The pipeline also integrates over 40 public and proprietary databases to produce annotation for SNVs, indels, and SVs. Compared with other popular platforms for genome data analysis, which typically analyze SNVs or a limited set of variants, the ACE Pipeline covers a more complete spectrum of variant types: single nucleotide variants (SNVs), short insertions and deletions (indels), and larger structural variations (SVs).

Features of the ACE Pipeline include:
• An Enhanced Public Reference. Personalis has created enhanced versions of the public references that replace the approximately 1.1 million minor alleles in the reference, some of which are disease predisposing, with the corresponding major alleles. Without this change, there can be "overcalling" of variants (up to 700,000 more variant calls) when aligning to the standard public reference. These improvements also have significant implications for downstream interpretation. For example, individuals who are homozygous for a disease-predisposing allele present in the current reference (e.g. Factor V Leiden), and who would therefore not be reported as "variant" at these loci with the public reference, will be called accurately using the Personalis enhanced reference sequence.
• More Accurate Structural Variant (SV) Calling with Higher Sensitivity, Lower FDR, More Precise Breakpoints (whole genomes).
• BED-formatted reference calls and a list of no-call regions are provided so downstream analysis tools can differentiate between called positions and positions lacking sufficient information for confident genotype assignment.
• The ability to detect repetitive and low-complexity regions, including mobile element insertions (MEIs) and variable number tandem repeats (VNTRs).
• Reporting of the zygosity of structural variation, which can make a substantial difference in downstream medical interpretation.
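The effect of the enhanced reference described in the first feature above can be illustrated with a toy comparison. This is a simplified sketch and not the ACE Pipeline itself: the locus, the alleles, and the function name are hypothetical, and real variant calling works on aligned reads rather than on known genotypes.

```python
from typing import List, Tuple


def non_reference_alleles(genotype: Tuple[str, str], ref_allele: str) -> List[str]:
    """Alleles in the genotype that differ from the reference allele."""
    return sorted({allele for allele in genotype if allele != ref_allele})


# Hypothetical single-base locus (not a real coordinate or real alleles).
PUBLIC_REF = "A"    # assumption: the public reference carries the minor allele here
ENHANCED_REF = "G"  # assumption: the enhanced reference carries the major allele

typical_sample = ("G", "G")    # homozygous for the common (major) allele
minor_homozygote = ("A", "A")  # homozygous for the minor allele

for label, genotype in [("typical", typical_sample), ("minor homozygote", minor_homozygote)]:
    print(
        label,
        "| vs public reference:", non_reference_alleles(genotype, PUBLIC_REF),
        "| vs enhanced reference:", non_reference_alleles(genotype, ENHANCED_REF),
    )
```

Against the public reference, the typical sample is reported as variant (overcalling) while the minor-allele homozygote is not reported at all. Against the major-allele reference the situation reverses, which is the behavior the feature description argues for.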
Our pipeline has been designed together with the Personalis ACE Exome and ACE Genome Assays, allowing us to achieve higher overall accuracy and enhance the quality of the alignments and variants output from the pipeline.
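The BED-formatted reference calls and no-call regions mentioned in the feature list could be consumed by a downstream tool roughly as follows. This is a minimal sketch under stated assumptions: the file contents, the function names, and the "not reported" fallback are hypothetical, and the actual layout of the Personalis deliverables is not described in this document. The only format fact relied on is that BED intervals are 0-based and half-open.

```python
from typing import Dict, List, Set, Tuple

Regions = Dict[str, List[Tuple[int, int]]]


def parse_bed(text: str) -> Regions:
    """Parse the first three BED columns into per-chromosome intervals."""
    regions: Regions = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if len(fields) < 3 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        regions.setdefault(chrom, []).append((start, end))
    return regions


def covers(regions: Regions, chrom: str, pos0: int) -> bool:
    """True if the 0-based position falls inside any half-open interval."""
    return any(start <= pos0 < end for start, end in regions.get(chrom, []))


def classify(chrom: str, pos0: int, variant_sites: Set[Tuple[str, int]],
             ref_calls: Regions, no_calls: Regions) -> str:
    """Label a position as variant, no-call, reference, or not reported."""
    if (chrom, pos0) in variant_sites:
        return "variant"
    if covers(no_calls, chrom, pos0):
        return "no-call"
    if covers(ref_calls, chrom, pos0):
        return "reference"
    return "not reported"


# Hypothetical inputs (made-up coordinates).
ref_calls = parse_bed("chr1\t100\t200\nchr1\t250\t400\n")
no_calls = parse_bed("# low-confidence region\nchr1\t200\t250\n")
variant_sites = {("chr1", 150)}

for pos in (120, 150, 220, 500):
    print(pos, classify("chr1", pos, variant_sites, ref_calls, no_calls))
```

The point of the distinction is that a position absent from the variant list is only a confident reference genotype if it lies in a reference-call interval. A position in a no-call interval carries no genotype information either way.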
ACE Annotations & Content
Personalis Proprietary Databases
Personalis has developed or exclusively licensed several large, comprehensive, high quality, high depth, manually curated proprietary databases that enable annotation and analysis of complex and Mendelian disease, and drug response. This content covers regions both within and outside the exome. The quality and comprehensiveness of these databases result in more accurate analysis and interpretation, saving time and cost for the researcher. These databases have also helped inform the design of our ACE sequencing assays, making sure that interpretable regions of the exome/genome are adequately covered.

Personalis Disease Variant Database: The Personalis Disease Variant Database is originally based on databases built over 10 years at Stanford and exclusively licensed and curated by Personalis. Personalis has greatly expanded manual curation of this database, and it is now one of the most comprehensive, detailed, high quality manually curated variant to disease/phenotype databases of its kind.

Personalis Pharmacogenomics Database: The Personalis Pharmacogenomics Database is built on PharmGKB, the world’s leading pharmacogenomics database, which has been exclusively licensed by Personalis. PharmGKB contains annotations linking variants to drug toxicity, dosing, and efficacy. Manually curated over a 12-year period, PharmGKB contains information on 376 drugs and >4000 genetic variants.

Personalis Annotation Engine
Personalis’ annotation engine provides a comprehensive, integrated solution that draws on data from over 40 public and several proprietary databases to annotate SNVs, indels, and structural variants (SVs). Personalis updates, integrates, and version controls these databases on a regular basis. The breadth of databases allows Personalis to provide a wide variety of annotations for variants and genes.
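As a rough illustration of drawing annotations from several version-controlled databases, the sketch below looks up one variant in a list of sources and records which source and version each annotation came from. Everything in it is hypothetical: the source names, versions, variant keys, and annotation strings are placeholders, and the actual design of the Personalis annotation engine is not described in this document.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A variant is keyed here by (chromosome, position, reference allele, alternate allele).
VariantKey = Tuple[str, int, str, str]


@dataclass
class AnnotationSource:
    """One annotation database snapshot, tagged with its version."""
    name: str
    version: str
    records: Dict[VariantKey, str]


def annotate(variant: VariantKey, sources: List[AnnotationSource]) -> List[Dict[str, str]]:
    """Collect every annotation for the variant, keeping source provenance."""
    hits = []
    for source in sources:
        note = source.records.get(variant)
        if note is not None:
            hits.append({"source": source.name, "version": source.version, "annotation": note})
    return hits


# Placeholder sources and records (not real database content).
sources = [
    AnnotationSource("disease_db", "v1", {("chr1", 1000, "A", "G"): "placeholder disease association"}),
    AnnotationSource("pgx_db", "v7", {("chr1", 1000, "A", "G"): "placeholder drug-response note"}),
]

for hit in annotate(("chr1", 1000, "A", "G"), sources):
    print(hit)
```

Recording the source version alongside each annotation is one simple way to keep results reproducible when the underlying databases are updated on a regular basis.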
ACE Quality Control Reporting
Visual, Interactive, and Detailed Quality Control Reports
To ensure the quality of our data, Personalis creates comprehensive visual, interactive quality control (QC) reports for variant calls, raw read data, and read alignment data. These reports include:
• An extensive list of standard QC metrics to ensure that each component of the pipeline has run as expected.
• An interactive at-a-glance summary page that presents key metrics with automated out-of-range indicators, providing a robust framework for processing data. These metrics can be used to identify failures in each component of the pipeline.
• Sample-level data presented in an HTML report that includes tables of detailed metrics for sequencing, alignment, and variant calling, as well as more complex QC data.
• The data contained in the QC reports, also provided in a text-based format for use by bioinformaticians.

Summary
In summary, the ACE Platform combines accurate sequencing, informatics, and content to enable higher accuracy downstream interpretation for discovery and diagnostics.

© Personalis, Inc. All Rights Reserved