Istituto di Tecnologie Biomediche
Consiglio Nazionale delle Ricerche
ITB - CNR
Next generation sequencing (NGS): procedures and applications
Ermanno Rizzi, PhD
Ferrara
[email protected]
05/05/2014
Next Generation Sequencing
NGS: Intro
NGS…what?
• High performance sequencing: High throughput Sequencing (HTS)
Why?
• Growing demand for sequencing data at lower cost (compared to Sanger).
When?
• In 2005, the first NGS platform: GS20 (454 Life Sciences)
…something is changing
• high quantity and quality of available data
• new genomics applications
…what will change?
• new multidisciplinary approaches.
Workflow
Keywords
• Reference sequence
• Depth (“X” or fold)
• Mapping reads
• Coverage (% of reference sequence)
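To make the keywords concrete, here is a minimal sketch of how depth ("X") and coverage could be computed once reads are mapped to a reference. The function name, the interval representation (0-based, end-exclusive) and the toy reads are assumptions made for illustration; real pipelines work from alignment files rather than from plain tuples.

```python
def depth_and_coverage(reference_length, read_intervals):
    """Return (mean depth, coverage in % of the reference) for mapped reads.

    read_intervals holds (start, end) pairs, 0-based and end-exclusive,
    giving where each read maps on the reference (hypothetical input format).
    """
    per_base = [0] * reference_length
    for start, end in read_intervals:
        # Clip each read to the reference boundaries before counting.
        for pos in range(max(start, 0), min(end, reference_length)):
            per_base[pos] += 1
    mean_depth = sum(per_base) / reference_length
    covered = sum(1 for d in per_base if d > 0)
    return mean_depth, 100.0 * covered / reference_length


# Toy example: a 100-base reference and three 50-base mapped reads.
reads = [(0, 50), (25, 75), (30, 80)]
mean_depth, coverage_pct = depth_and_coverage(100, reads)
print(f"depth: {mean_depth:.1f}X, coverage: {coverage_pct:.0f}%")
```

In this toy case the reads carry 150 bases over a 100-base reference (1.5X), but they only touch the first 80 positions, so depth and coverage answer different questions.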
NGS Platforms
• 454/Roche Genome Sequencer: FLX-Titanium, Junior
• Life Ion Personal Genome Machine (PGM) and Proton
• Illumina: Genome Analyzer, HiSeq, MiSeq
• Life SOLiD 5500 W Series Genetic Analysis Systems
NGS platform comparison
Platform | Sequencing chemistry | Signal detection
Roche/454 | Single-nucleotide addition (SNA) by pyrosequencing | Luminescence
Ion Torrent (PGM) or Proton | Single-nucleotide addition (SNA) by semiconductor sequencing chip | pH
Illumina | Sequencing by synthesis (SBS) using cyclic reversible termination (CRT) | Fluorescence
SOLiD | Sequencing by ligation (SBL) | Fluorescence
NGS platform distribution
NGS cost comparison
Platform | Run time | Read length (bases) | Gigabases/run | Reagent cost/run
Roche/454 Titanium FLX+ | 20 hrs | 800 | 0.8 | $6,200
Roche/454 Titanium Junior | 10 hrs | 400 | 0.04 | $977
Ion Torrent - Proton I | 4 hrs | 175 | 12.2 | $1,000
Illumina HiSeq 2500 - high output v3 | 2 days | 50 | 75 | $5,866
Illumina HiSeq 2500 - high output v3 | 11 days | 200 | 300 | $13,580
Life Technologies SOLiD – 5500xl | 8 days | 110 | 155 | $10,503
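The cost figures become easier to compare when expressed per gigabase. The sketch below does that arithmetic on the values from the cost comparison, reading the decimal commas as decimal points; the list layout and the helper name `cost_per_gigabase` are illustrative choices, and only reagent cost is considered.

```python
# (platform, gigabases per run, reagent cost per run in USD), from the cost comparison.
runs = [
    ("Roche/454 Titanium FLX+", 0.8, 6200),
    ("Roche/454 Titanium Junior", 0.04, 977),
    ("Ion Torrent - Proton I", 12.2, 1000),
    ("Illumina HiSeq 2500 - high output v3 (2 days)", 75, 5866),
    ("Illumina HiSeq 2500 - high output v3 (11 days)", 300, 13580),
    ("Life Technologies SOLiD - 5500xl", 155, 10503),
]


def cost_per_gigabase(gigabases, reagent_cost):
    """Reagent cost divided by yield; instrument and labour costs are ignored."""
    return reagent_cost / gigabases


# Print platforms from cheapest to most expensive per gigabase.
for name, gb, cost in sorted(runs, key=lambda r: cost_per_gigabase(r[1], r[2])):
    print(f"{name}: ${cost_per_gigabase(gb, cost):,.2f} per Gb")
```

Run time and read length are left out on purpose: a low cost per gigabase does not help if the application needs long reads or a fast turnaround.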
NGS Workflow: from sample prep to final data
Target sample → Library preparation → Library amplification → NGS → Signal detection and analysis → Reads analysis → Final data
Library preparation
DNA fragmentation
• Sonication
• Ultrasonication (Covaris)
• Nebulization
• Enzymatic
Adapter addition and multiplexing (MID or Index)
• Ligation
• Tagmentation
• Paired ends or mate pair
Library quality control and quantitation
• Fluorometer
• qPCR
• Agilent Bioanalyzer
…to library amplification
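Multiplexing relies on the MID or index added with the adapters: after sequencing, reads are sorted back to their sample by that tag. A minimal sketch of the idea follows, assuming the tag sits at the start of each read, that all tags have the same length, and that only exact matches are accepted. The tag sequences, sample names and the function `demultiplex` are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Group reads by the index found at their 5' end and trim it off."""
    # Assumption: every index in the table has the same length.
    index_length = len(next(iter(index_to_sample)))
    bins = defaultdict(list)
    for read in reads:
        index, insert = read[:index_length], read[index_length:]
        sample = index_to_sample.get(index, "unassigned")
        # Unassigned reads are kept whole so they can be inspected later.
        bins[sample].append(insert if sample != "unassigned" else read)
    return dict(bins)


index_to_sample = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = ["ACGTGGGCATA", "TGCATTTCAGT", "ACGTCGATCAG", "NNNNGATCCGT"]
bins = demultiplex(reads, index_to_sample)
for sample, inserts in bins.items():
    print(sample, inserts)
```

Production tools usually tolerate one or more mismatches in the tag; exact matching is used here only to keep the sketch short.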
Procedure
Library preparation: adapter Ligation
Roche/454 Rapid Protocol:
• for all Roche/454 applications
• starting material: 500 ng of gDNA
Procedure
Library preparation: enzymatic “tagmentation”
Illumina Nextera “tagmentation”
• for small genomes
• small amount of starting material: 1 or 50 ng of gDNA
Procedure
Library preparation: paired ends
Roche/454 Rapid Protocol:
• Coupling ends of large fragments: 3, 8, 20 kb
• Large contigs
• Enhance scaffolding
• Complete genome seq.
• Structural variants
Amplification and selection
Library amplification: Emulsion PCR (emPCR)
emPCR
• Library captured onto beads surface
• Water in oil emulsion creates microreactors
• millions of DNA fragments amplified onto beads surface
• recovery of DNA beads by emulsion breaking
• enrichment to eliminate null beads and recover positive beads
Library amplification: Bridge Amplification
Illumina Bridge amplification
• High-density primers attached to the slide
• Solid-phase amplification
• 100–200 million spatially separated template clusters
Platform error rates
Instrument | Primary errors | Single-pass error rate (%) | Final error rate (%)
3730xl (capillary) | Substitution | 0.1-1 | 0.1-1
454, all models | Indel | 1 | 1
Illumina, all models | Substitution | ~0.1 | ~0.1
Ion Torrent – all chips | Indel | ~1 | ~1
SOLiD – 5500xl | A-T bias | ~5 | ≤ 0.1
Source: www.molecularecologist.com
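Error rates like these are often quoted as Phred quality scores, using the standard relation Q = -10 · log10(p), where p is the error probability. The sketch below converts a few percentages of the same order as those listed; the function name is an illustrative choice and the slides themselves do not give Q values.

```python
import math


def phred_quality(error_rate_percent):
    """Phred score Q = -10 * log10(p), with p the error probability."""
    p = error_rate_percent / 100.0
    return -10.0 * math.log10(p)


for label, rate in [("~0.1% error", 0.1), ("~1% error", 1.0), ("~5% error", 5.0)]:
    print(f"{label}: Q{phred_quality(rate):.0f}")
```

Each factor of ten in error rate shifts the score by 10, so a 0.1% error rate corresponds to Q30 and a 1% error rate to Q20.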
The best platform is…
Things to be considered:
• Amount of data
• Read length
• Support from company
• User community
• Post-sequencing requirements
• Platform cost
• Kit cost
  • cost per base
NGS applications
• De novo vs re-sequencing
• Target: DNA, RNA
• Single ends vs paired ends
• Sequencing approaches: shotgun vs enrichment
  - Enrichment by PCR
  - Enrichment by probe capture
  - ChIP-Seq for epigenetic studies
Target: DNA and RNA
DNA Seq = Genomics
• genome sequencing (nuDNA or mtDNA)
• exome
• variant calling: mutations and SNPs
• copy number variation (CNV)
• Epigenetics
• ChIP-Seq: promoter methylation, histone modifications, transcription factors
RNA Seq = Transcriptomics
• gene expression levels
• variant calling
• splicing variants
• fusion transcripts
• transcript discovery
Sample preparation for RNA Seq
Poly-T or random hexamers
RNA seq
Sequencing of:
• mRNA
• ncRNA
• small RNA
• micro RNA
NGS applications: DNA
Total genomic DNA
• Direct sequencing
• PCR → Amplicon sequencing
• Capture → Capture sequencing
Enrichment by probe capture
• Capture probe design and synthesis
• Library preparation
• Capture probe hybridization (“fishing”)
• Washing
• Target recovery
• NGS protocol
• Pre-amplification
Capture sequencing: the target
Exome
• targets: exons
• ~1% of human genome
• size: ~30 Mb
• ~85% of mutations related to disease
• multiple sample variant call (MSVC) for low-pass sequencing
Custom
• Diagnostic genome regions
• Chromosomes
• Specific regions (kinases, transcription factors…)
• Specific genes (HLA, MHC etc.)
Aims
• Rare and common variant identification
  • single nucleotide
  • insertions and deletions (InDels)
• SNP analysis
• Copy number variations (CNV)
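The ~30 Mb exome size translates directly into a sequencing budget: reads needed ≈ target size × desired depth / read length. The sketch below works this through. The 50X depth, 100-base read length and the `on_target_fraction` parameter (the share of reads that actually fall on the captured target) are assumptions chosen for illustration, not values from the slides.

```python
import math


def reads_needed(target_size_bp, mean_depth, read_length_bp, on_target_fraction=1.0):
    """Estimate how many reads give the requested mean depth over a target."""
    # Off-target reads are wasted, so the total yield must be scaled up.
    total_bases = target_size_bp * mean_depth / on_target_fraction
    return math.ceil(total_bases / read_length_bp)


exome_size = 30_000_000  # ~30 Mb, as stated for the exome target
print(reads_needed(exome_size, mean_depth=50, read_length_bp=100))
print(reads_needed(exome_size, mean_depth=50, read_length_bp=100, on_target_fraction=0.5))
```

Halving the on-target fraction doubles the number of reads required, which is why enrichment efficiency matters as much as raw instrument output.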
Looking forward…
Sanger Sequencing → NGS → Third Generation Sequencing (TGS)
Third generation: single molecule sequencing
Company name | TGS principle
Helicos Genetic Analysis Platform | Virtual Terminator nucleotides
Pacific Biosciences | Anchored DNA polymerase + zero-mode waveguide (ZMW)
VisiGen Biotechnologies | Modified DNA polymerase + fluorescence resonance energy transfer (FRET)
Halcyon Molecular, ZS Genetics | Transmission electron microscopy (TEM)
Oxford Nanopore | Modified α-hemolysin pore + measurement of ionic current
TGS Features
• No “wash-and-scan” technology
• “Real time” - really fast
• No synchronization required → no dephasing problem
• Single molecule sequencing
Applications @ ITB-CNR
• Shotgun: bacteria genome finishing
• PCR enrichment: Integrome study in Gene Therapy
• Variant calling in ancient DNA
• RNA seq: transcriptome of breast cancer
• Metagenomics
Shotgun: bacteria genome finishing
Fuel droplet
A. venetianus colonies
Acinetobacter venetianus VE-C3 genome sequencing
• Roche/454 + Illumina sequencing
• 3,564,836 bp were assembled
Circular representations of A. venetianus VE-C3 chromosome and plasmids.
Bioremediation and resistance cluster identification
• Adhesion to oil fuel → wee cluster for n-alkane adhesion
• Metabolism of n-alkanes → alk-like sequences, cytochrome P450
• Resistance to heavy metals → As, Cd, Co, Cr, Cu, Hg, Pb, Zn found in the Venice Lagoon
Phylogenetic analysis: Acinetobacter pangenome
Phylogenetic analysis conducted using a set of conserved proteins: FusA, IleS, LepA, LeuS, PyrG, RecA, RecG, RplB, RpoB
BLAST comparisons of Acinetobacter species. Each genome is represented by an arc, and the different genomes (arcs) are connected by vertices accounting for their shared sequence similarity.
Integrome study in Gene Therapy
Proviral vector
Human genome
Proviral vector Integration
GATCCGTTTCAGTCGATCAGTGGGCATA
Integration site (IS) nucleotide sequence
Integrome: all detectable IS in the human genome
Integrome study in Gene Therapy
Recovery of vector-genome junctions: Ligation-Mediated PCR (LM-PCR)
Figure labels: restriction sites (Mse I, Pst I), 5’ LTR, 3’ LTR, linker, genomic DNA, integrated proviral vector
Steps: LM-PCR, then nested PCR
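LM-PCR depends on where the restriction enzymes cut relative to the integrated vector. As a small illustration, the sketch below locates Mse I (TTAA) and Pst I (CTGCAG) recognition sites in a sequence. The toy sequence and the function `find_sites` are made up for this example; the actual protocol details (which enzyme cuts where, linker design) are not specified in the slides.

```python
import re

# Recognition sequences of the two enzymes named on the slide.
RECOGNITION_SITES = {"MseI": "TTAA", "PstI": "CTGCAG"}


def find_sites(sequence, enzyme):
    """Return 0-based start positions of every recognition site in the sequence."""
    site = RECOGNITION_SITES[enzyme]
    # A lookahead is used so that overlapping occurrences are all reported.
    return [m.start() for m in re.finditer(f"(?={site})", sequence.upper())]


# Toy genomic fragment, not a real junction sequence.
genomic = "GATCCGTTTAAGTCGATCTGCAGTGGGCATTAATA"
print("MseI:", find_sites(genomic, "MseI"))
print("PstI:", find_sites(genomic, "PstI"))
```

The distance between the vector end and the nearest site sets the length of the amplified junction fragment, which in turn affects whether it can be recovered and sequenced.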
Integrome study in Gene Therapy
Distribution of retroviral integrations around transcription start sites.
Distribution of the distance of MLV and HIV integration sites from the transcription start site (TSS) of targeted genes at 2500-bp (A), 50-bp (B), or 5-bp (C) resolution.
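The three panels show the same distances binned at different resolutions. A minimal sketch of that binning follows, assuming each integration site has already been assigned a signed distance to the TSS of its target gene (negative meaning upstream). The distances and the function `bin_distances` are invented for illustration.

```python
from collections import Counter


def bin_distances(distances_bp, bin_width_bp):
    """Count signed distances to the TSS in bins of the given width."""
    counts = Counter()
    for d in distances_bp:
        # Floor division keeps negative (upstream) distances in their own bins.
        bin_start = (d // bin_width_bp) * bin_width_bp
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical signed distances (bp) of integration sites from the TSS.
distances = [-2600, -40, -3, 2, 4, 48, 51, 2400, 2501]
for width in (2500, 50, 5):
    print(f"{width}-bp bins:", bin_distances(distances, width))
```

Coarse bins reveal the overall preference for regions around the TSS, while fine bins are needed to see structure within a few bases of it.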
Integrome study in Gene Therapy
Results and applications
• Integration pattern for proviral vectors to be used in gene therapy
• Tool to study transcriptionally active regions → applied in stem cell studies
Ancient DNA analysis by NGS
Why study ancient DNA (aDNA)?
… to recover genetic information from the past.
• To determine phylogenetic relationships among extinct and extant animals
• For palaeogenetics and evolutionary biology studies
“Homo” evolution
Common ancestor
Why study aDNA?
For anthropological applications and population genetics on modern humans and on Early Modern Humans (EMH).
Why study aDNA?
Domestication process
Ancient DNA analysis by NGS
Features of aDNA
• low amount
• high degradation
• small fragment size: 70-120 bp
• contamination
• post-mortem damage
aDNA analysis challenge
Authenticity assessment:
Ancient Vs Modern
Features of aDNA: Misincorporation pattern I
Patterns of damage in genomic DNA sequences from a Neandertal. Briggs AW, PNAS 2007
Features of aDNA: Misincorporation pattern II
C to T misincorporations at the first position of mtDNA fragments as a function of age.
Temporal Patterns of Nucleotide Misincorporations and DNA Fragmentation in Ancient DNA. Sawyer S, PLoS One. 2012
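The misincorporation pattern can be summarised as the fraction of reference C positions that are read as T, tabulated by position from the 5' end of each fragment. The sketch below shows that tally under strong simplifying assumptions: alignments are gap-free, equal-length reference/read string pairs already oriented 5' to 3'. The function name and the toy alignments are hypothetical and do not reproduce the methods of the cited papers.

```python
def c_to_t_rate_by_position(alignments, n_positions=5):
    """Fraction of reference C positions read as T, per 5' read position.

    alignments holds (reference, read) string pairs of equal length,
    aligned without gaps and oriented 5' to 3' (simplifying assumption).
    """
    c_total = [0] * n_positions
    c_to_t = [0] * n_positions
    for ref, read in alignments:
        for i in range(min(n_positions, len(ref), len(read))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    # None marks positions where no reference C was observed.
    return [t / n if n else None for t, n in zip(c_to_t, c_total)]


# Toy alignments: (reference segment, read).
alignments = [
    ("CATCG", "TATCG"),
    ("CCGTA", "TCGTA"),
    ("CGGCA", "CGGCA"),
    ("ACCTG", "ACTTG"),
]
rates = c_to_t_rate_by_position(alignments)
print(rates)
```

An excess of C to T changes at the first positions, compared with the interior of the reads, is the kind of signal used when assessing whether sequences are ancient or modern.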
Ancient DNA and NGS
• Single locus PCR
• Multiplex PCR
• Shotgun approach
• Custom capture → best approach in terms of:
  • Discrimination of endogenous vs exogenous DNA
  • Cost
  • Enrichment ratio
Forensic DNA
Common features with aDNA
• Fragmentation
• Low amount
• High level of contamination
NGS applied to forensic DNA: Short Tandem Repeat (STR) count and analysis
STR profiling
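With NGS, an STR allele can be read directly from the sequence by counting how many times the repeat unit occurs in a row. The sketch below counts the longest uninterrupted run of a motif in a read. The GATA motif, the toy read and the function name are assumptions for illustration; real STR callers also handle flanking sequence, interrupted repeats and sequencing errors.

```python
import re


def longest_repeat_run(sequence, motif):
    """Number of repeat units in the longest uninterrupted run of motif."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


# Toy read: four consecutive GATA units, then one isolated unit.
read = "TTCAGATAGATAGATAGATACCGATAGG"
print(longest_repeat_run(read, "GATA"))
```

Comparing the repeat counts obtained at several loci is what produces the STR profile.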
RNA seq: transcriptome of breast cancer
Rationale
• primary human lobular breast cancer tissue
• 132,000 reads
• validated by RT-PCR
Results:
• one deletion
• two novel ncRNAs
• ten unknown or rare transcript isoforms
• a novel gene fusion
• thousands of novel non-coding transcripts
• more than three hundred reads corresponding to the non-coding RNA MALAT1, which is highly expressed in many human carcinomas.
RNA seq: transcriptome of breast cancer
The green and blue arrows represent the two halves of the fusion transcript, which map in the opposite order on the genome.
Intragenic deletion in WHSC1L1, identified by the 99-base-long read 1B.
Metagenomics
• Human metagenomics
• Environmental metagenomics
Microbiome
A microbiome is "the ecological community of commensal, symbiotic, and pathogenic microorganisms that literally share our body space."
Metagenomics vs 16S rRNA seq.
Metagenomics: study of genetic material recovered directly from environmental samples.
Thanks to my colleagues
Alessandro Pietrelli
Marco Severgnini
Ingrid Cifola
Clarissa Consolandi
Clelia Peano
Roberta Bordoni
Eleonora Mangano
Eva Pinatel
Luca Petiti
Simone Puccio
Santosh Anand
Gianluca De Bellis
Cristina Battaglia
Thanks for your attention!
Lunch ?!
[email protected]