Istituto di Tecnologie Biomediche, Consiglio Nazionale delle Ricerche (ITB-CNR)
Next generation sequencing (NGS): procedures and applications
Ermanno Rizzi, PhD
Ferrara, 05/05/2014
[email protected]

Next Generation Sequencing (NGS): Intro
NGS… what?
• High-performance sequencing, also called High-Throughput Sequencing (HTS)
Why?
• Growing demand for sequencing data at lower cost (compared to Sanger sequencing)
When?
• 2005: first NGS platform, the GS20 by 454 Life Sciences
…something is changing
• High quantity and quality of available data
• New genomics applications
…what will change?
• New multidisciplinary approaches

NGS Intro: workflow (figure)

NGS Intro: keywords
• Reference sequence
• Depth (or "X", or fold): how many times, on average, each base of the reference is read
• Mapping
• Reads
• Coverage: % of the reference sequence covered by reads

NGS platforms
• Roche/454: Genome Sequencer FLX Titanium, GS Junior
• Life Technologies: Ion Personal Genome Machine (PGM) and Ion Proton
• Illumina: Genome Analyzer, HiSeq, MiSeq
• Life Technologies SOLiD: 5500 W Series Genetic Analysis Systems

NGS platform comparison
Platform: sequencing chemistry; signal detection
• Roche/454: single-nucleotide addition (SNA) by pyrosequencing; luminescence
• Ion Torrent PGM and Proton: single-nucleotide addition (SNA) by semiconductor sequencing; chip pH
• Illumina: sequencing by synthesis (SBS) using cyclic reversible termination (CRT); fluorescence
• SOLiD: sequencing by ligation (SBL); fluorescence

NGS platform distribution (figure)

NGS cost comparison
Platform: run time; read length (bp); output (Gb/run); reagent cost/run
• Roche/454 Titanium FLX+: 20 h; 800; 0.8; $6,200
• Roche/454 Titanium Junior: 10 h; 400; 0.04; $977
• Ion Torrent Proton I: 4 h; 175; 12.2; $1,000
• Illumina HiSeq 2500, high output v3: 2 days; 50; 75; $5,866
• Illumina HiSeq 2500, high output v3: 11 days; 200; 300; $13,580
• Life Technologies SOLiD 5500xl: 8 days; 110; 155; $10,503

NGS Workflow:
From sample prep to final data
Target sample -> Library preparation -> Library amplification -> NGS -> Signal detection and analysis -> Reads analysis -> Final data

Library preparation
DNA fragmentation
• Sonication
• Ultrasonication (Covaris)
• Nebulization
• Enzymatic
Adapter addition and multiplexing (MID or Index)
• Ligation
• Tagmentation
• Paired ends or mate pairs
Library quality control and quantitation
• Fluorometer
• qPCR
• Agilent Bioanalyzer
…to library amplification

Procedure. Library preparation: adapter ligation
Roche/454 Rapid Protocol:
• For all Roche/454 applications
• Starting material: 500 ng of gDNA

Procedure. Library preparation: enzymatic "tagmentation"
Illumina Nextera "tagmentation":
• For small genomes
• Small amount of starting material: 1 or 50 ng of gDNA

Procedure. Library preparation: paired ends
Roche/454 Rapid Protocol:
• Joins the two ends of large fragments (3, 8 or 20 kb)
• Larger contigs
• Improved scaffolding
• Complete genome sequencing
• Structural variant detection

Amplification and selection
Library amplification: emulsion PCR (emPCR)
• Library fragments are captured onto the bead surface
• A water-in-oil emulsion creates microreactors
• Millions of DNA fragments are amplified on the bead surfaces
• DNA beads are recovered by breaking the emulsion
• Enrichment removes null beads and recovers positive beads

Library amplification: bridge amplification (Illumina)
• High-density primers attached to the slide
• Solid-phase amplification
• 100–200 million spatially separated template clusters

Platform error rates (source: www.molecularecologist.com)
Instrument: primary errors; single-pass error rate (%); final error rate (%)
• 3730xl (capillary): substitution; 0.1–1; 0.1–1
• 454, all models: indel; 1; 1
• Illumina, all models: substitution; ~0.1; ~0.1
• Ion Torrent, all chips: indel; ~1; ~1
• SOLiD 5500xl: A-T bias; ~5; ≤ 0.1

The best platform is…
Things to consider:
• Amount of data
• Read length
• Support from the company
• User community
• Post-sequencing requirements
• Platform cost
• Kit cost
• Cost per base

NGS applications
• De novo vs re-sequencing
• Target: DNA, RNA
• Single ends vs paired ends
• Sequencing approaches: shotgun vs enrichment (enrichment by PCR, enrichment by probe capture)
• ChIP-Seq for epigenetic studies

Target: DNA and RNA
DNA-Seq = Genomics
• Genome sequencing (nuDNA or mtDNA)
• Exome
• Variant calling: mutations and SNPs
• Copy number variation (CNV)
• Epigenetics
• ChIP-Seq: promoter methylation, histone modifications, transcription factors
RNA-Seq = Transcriptomics
• Gene expression levels
• Variant calling
• Splicing variants
• Fusion transcripts
• Transcript discovery

Sample preparation for RNA-Seq
• Priming with poly-T or random hexamers
RNA-Seq: sequencing of
• mRNA
• ncRNA
• small RNA
• microRNA

NGS applications: DNA
• Total genomic DNA -> direct sequencing
• PCR -> amplicon sequencing
• Capture -> capture sequencing

Enrichment by probe capture
• Capture probe design and synthesis
• Library preparation
• Pre-amplification
• Capture probe hybridization ("fishing")
• Washing
• Target recovery
• NGS protocol

Capture sequencing: the target
Exome
• Targets: exons
• ~1% of the human genome
• Size: ~30 Mb
• ~85% of disease-related mutations
• Multiple-sample variant calling (MSVC) for low-pass sequencing
Custom
• Diagnostic genome regions
• Chromosomes
• Specific regions (kinases, transcription factors…)
• Specific genes (HLA, MHC, etc.)
Aims
• Rare and common variant identification: single nucleotide variants, insertions and deletions (InDels)
• SNP analysis
• Copy number variations (CNV)

Looking forward…
Sanger sequencing -> NGS -> Third Generation Sequencing (TGS)

Third generation: single-molecule sequencing
Company (name): TGS principle
• Helicos (Genetic Analysis Platform): Virtual Terminator nucleotides
• Pacific Biosciences: anchored DNA polymerase + zero-mode waveguide (ZMW)
• VisiGen Biotechnologies: modified DNA polymerase + Fluorescence Resonance Energy Transfer (FRET)
• Halcyon Molecular, ZS Genetics: Transmission Electron Microscopy (TEM)
• Oxford Nanopore: modified α-hemolysin pore + measurement of ionic current

TGS features
• No "wash-and-scan" technology
• Real time, therefore very fast
• No synchronization required, so no dephasing problem
• Single-molecule sequencing

Applications @ ITB-CNR
• Shotgun: bacterial genome finishing
• PCR enrichment: integrome study in gene therapy
• Variant calling in ancient DNA
• RNA-Seq: transcriptome of breast cancer
• Metagenomics

Shotgun: bacterial genome finishing
(figure: fuel droplet; A. venetianus colonies)
Acinetobacter venetianus VE-C3 genome sequencing
• Roche/454 + Illumina sequencing
• 3,564,836 bp assembled
Circular representations of the A. venetianus VE-C3 chromosome and plasmids.

Bioremediation and resistance cluster identification
• Adhesion to fuel oil: wee cluster for n-alkane adhesion
• Metabolism of n-alkanes: alk-like sequences, cytochrome P450
• Resistance to heavy metals found in the Venice Lagoon: As, Cd, Co, Cr, Cu, Hg, Pb, Zn

Phylogenetic analysis: Acinetobacter pangenome
Phylogenetic analysis conducted using a set of conserved proteins: FusA, IleS, LepA, LeuS, PyrG, RecA, RecG, RplB, RpoB.
BLAST comparisons of Acinetobacter species.
Each genome is represented by an arc, and the different genomes (arcs) are connected by vertices reflecting their shared sequence similarity.

Integrome study in Gene Therapy
(figure: a proviral vector integrates into the human genome)
Integration site (IS) nucleotide sequence, e.g. GATCCGTTTCAGTCGATCAGTGGGCATA
Integrome: all detectable IS in the human genome

Recovery of vector-genome junctions: ligation-mediated PCR (LM-PCR)
(figure: genomic DNA with an integrated proviral vector (5' LTR, 3' LTR), MseI and PstI restriction sites, a ligated linker, LM-PCR followed by nested PCR)

Distribution of retroviral integrations around transcription start sites: distance of MLV and HIV integration sites from the transcription start site (TSS) of targeted genes at 2500-bp (A), 50-bp (B) or 5-bp (C) resolution.

Results and applications
• Integration pattern of proviral vectors to be used in gene therapy
• Tool to study transcriptionally active regions -> applied in stem cell studies

Ancient DNA analysis by NGS
Why study ancient DNA (aDNA)? … to recover genetic information from the past.
• To determine phylogenetic relationships among extinct and extant animals
• For palaeogenetics and evolutionary biology studies (figure: "Homo" evolution, common ancestor)
• For anthropological applications and population genetics of modern humans and Early Modern Humans (EMH)
• To study the domestication process

Features of aDNA
• Low amount
• High degradation
• Small fragment size: 70–120 bp
• Contamination
• Post-mortem damage
aDNA analysis challenge: authenticity assessment, ancient vs modern

Features of aDNA: misincorporation pattern I
Patterns of damage in genomic DNA sequences from a Neandertal. Briggs AW, PNAS 2007.

Features of aDNA: misincorporation pattern II
C to T misincorporations at the first position of mtDNA fragments as a function of age. Temporal Patterns of Nucleotide Misincorporations and DNA Fragmentation in Ancient DNA. Sawyer S, PLoS One 2012.

Ancient DNA and NGS
• Single-locus PCR
• Multiplex PCR
• Shotgun approach
• Custom capture: the best approach in terms of discrimination of endogenous vs exogenous DNA, cost and enrichment ratio

Forensic DNA
Features in common with aDNA:
• Fragmentation
• Low amount
• High level of contamination
NGS applied to forensic DNA: Short Tandem Repeat (STR) counting and analysis for STR profiling

RNA-Seq: transcriptome of breast cancer
Rationale
• Primary human lobular breast cancer tissue
• 132,000 reads
• Validated by RT-PCR
Results
• One deletion
• Two novel ncRNAs
• Ten unknown or rare transcript isoforms
• A novel gene fusion
• Thousands of novel non-coding transcripts
• More than three hundred reads corresponding to the non-coding RNA MALAT1, which is highly expressed in many human carcinomas

RNA-Seq: transcriptome of breast cancer (figures)
• The green and blue arrows represent the two halves of the fusion transcript, which map to the genome in the opposite order.
• Intragenic deletion in WHSC1L1, identified by the 99-base read 1B.

Metagenomics
• Human metagenomics
• Environmental metagenomics
Microbiome
A microbiome is "the ecological community of commensal, symbiotic, and pathogenic microorganisms that literally share our body space."
Metagenomics vs 16S rRNA sequencing
Metagenomics: the study of genetic material recovered directly from environmental samples.

Thanks to my colleagues
Alessandro Pietrelli, Marco Severgnini, Ingrid Cifola, Clarissa Consolandi, Clelia Peano, Roberta Bordoni, Eleonora Mangano, Eva Pinatel, Luca Petiti, Simone Puccio, Santosh Anand, Gianluca De Bellis, Cristina Battaglia

Thanks for your attention! Lunch?!
[email protected]
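Backup: as a worked example of the Depth ("X", fold) and Coverage (% of reference sequence) keywords from the introduction, here is a minimal Python sketch of how the two numbers are commonly computed. It assumes per-base read depths are already available (for instance after mapping reads to the reference); the function names and all numbers are hypothetical, for illustration only.

```python
# Minimal sketch: sequencing depth and coverage.
# Assumption: per-base read depths are already available (e.g. after mapping).
# Function names and example numbers are hypothetical.
from typing import Sequence


def expected_depth(n_reads: int, read_length: int, reference_length: int) -> float:
    """Expected mean depth ("X", fold): total sequenced bases / reference length."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return n_reads * read_length / reference_length


def depth_and_coverage(per_base_depth: Sequence[int]) -> tuple[float, float]:
    """Return (mean depth over all reference positions,
    coverage = % of reference positions covered by at least one read)."""
    if len(per_base_depth) == 0:
        raise ValueError("per_base_depth must not be empty")
    n = len(per_base_depth)
    mean_depth = sum(per_base_depth) / n
    covered = sum(1 for d in per_base_depth if d > 0)
    return mean_depth, 100.0 * covered / n


if __name__ == "__main__":
    # Hypothetical run: 1,000,000 reads of 100 bp on a 5,000,000 bp genome
    print(f"expected depth: {expected_depth(1_000_000, 100, 5_000_000):.1f}X")  # 20.0X
    # Toy per-base depth profile for a 10 bp reference
    mean_depth, coverage = depth_and_coverage([0, 2, 3, 3, 4, 4, 3, 2, 0, 0])
    print(f"mean depth: {mean_depth:.1f}X, coverage: {coverage:.0f}%")  # 2.1X, 70%
```

The two keywords can diverge: the toy profile has a mean depth of 2.1X but only 70% coverage, because reads pile up in some regions and leave others unsequenced.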