a pilot source tracking network of Next Generation Sequencing

Integration of NGS Desktop Sequencers to Build a
Global Genomic Network for Pathogen Traceback and
Outbreak Detection: Description of international (GMI,
WHO) and national (GenomeTrakr, 100K) activities.
Marc Allard Ph.D.
Microbiologist, Division of Microbiology, ORS,
Center for Food Safety and Applied Nutrition, FDA
Feb. 26th, 2014 Carleton University
Roles for Sequence-based Subtyping
1. Attribution
2. Surveillance
3. Risk assessment and modeling
4. Define a legal adulterant
5. Replace traditional bacteriological typing procedures
Role of Sequence-based Subtyping during Outbreak Investigation (cluster ID and source tracking):
Is a particular isolate part of the outbreak?
– Or is it a sporadic or unrelated case?
Have we seen this isolate before?
– geospatial distribution of clones from different agricultural regions
– persistent clone in a manufacturing facility
Does this food/environmental isolate match this clinical isolate?
– can we link an isolate from food or a facility to an outbreak?
S. Enteritidis
XbaI Patterns
JEGX01.0004
BlnI Patterns
JEGA26.0002
Similar Patterns
Very Different Patterns
Same PFGE but unrelated to the event
Distinct outbreak related groups
S. Bareilly Outbreak (April-June 2012)
FDA1202 SAL2919 Coriander Powder India
FDA1203 SAL2921 Frozen Baila Bangladesh
FDA1201 SAL2918 Coriander Bangladesh
FDA1112 SAL2877 Frozen Undeveined Shrimp India
FDA1203 SAL2921 Frozen Baila Bangladesh
FDA1116 SAL2885 Coriander Powder India
FDA1206 SAL2924 Fish Stomach Vietnam
SAL3133 Clinical MD
FDA1146 SAL2903 Hilsa Fish Thailand
FDA1148 SAL2904 Frozen Rock Lobster Tails United Arab Emirates
FDA1143 SAL2920 Lobster Tails Taiwan
[Figure: SNP-based phylogenetic tree of S. Bareilly food, environmental, and clinical isolates; the remaining tip labels overlapped in extraction and are not recoverable.]
FDA1144 SAL2883 Frozen Whole Tilapia Thailand
FDA1114 SAL2881 Frozen Raw Shrimp India
SAL3140 Clinical NY
SAL3130 Clinical MD
SAL3126 Clinical MD
SAL3139 Clinical NY
SAL3148 Clinical NY
SAL3128 Clinical MD
SAL3127 Clinical MD
20-25 SNPs
SAL3157 Clinical NY
SAL3131 Clinical MD
SAL3155 Clinical NY
SAL3151 Clinical NY
SAL3129 Clinical MD
SAL3141 Clinical NY
SAL3145 Clinical NY
SAL3152 Clinical NY
SAL3158 Clinical NY
SAL3153 Clinical NY
SAL3146 Clinical NY
SAL3142 Clinical NY
SAL3143 Clinical NY
SAL3154 Clinical NY
SAL3144 Clinical NY
SAL3156 Clinical NY
SAL3125 Clinical MD
SAL3149 Clinical NY
SAL3147 Clinical NY
<=5 SNPs
PFGE Match
110-130 SNPs
NGS distinguishes geographical structure among closely
related Salmonella Bareilly strains
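The SNP ranges on the tree (<=5 SNPs inside the outbreak group, 20-25 and 110-130 SNPs to more distant groups) suggest how pairwise SNP distance can separate outbreak isolates from isolates that merely share a PFGE pattern. A minimal sketch, assuming each isolate has already been reduced to an equal-length string of aligned variable sites; the isolate names, sequences, and the cutoff used below are illustrative assumptions, not data or thresholds from the study:

```python
from itertools import combinations

def snp_distance(a, b):
    """Count positions where two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

def cluster_by_threshold(seqs, max_snps):
    """Single-linkage grouping: link isolates that are at most max_snps apart."""
    parent = {name: name for name in seqs}

    def find(n):
        # Follow parent links to the group representative (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= max_snps:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical toy data (not real isolates)
isolates = {
    "clinical_1": "ACGTACGTAC",
    "clinical_2": "ACGTACGTAT",  # 1 SNP from clinical_1
    "food_1":     "ACGTACGTAC",  # identical to clinical_1
    "historic_1": "TGCATGCAAC",  # many differences
}
clusters = cluster_by_threshold(isolates, max_snps=1)
```

With real data the cutoff would have to be chosen per organism and outbreak; single linkage is used here only because it is the simplest grouping rule to show.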
Our Current Model
FDA, USDA, CDC
State, Local, Federal and
Foreign Public Health Agencies
Academia
NCBI, EMBL, DDBJ
(Public Access Database)
DATA ANALYSIS
DATA ASSEMBLY AND STORAGE
Network of Sequencers
DATA ACQUISITION
Global Microbial Identifier
http://www.g-m-i.org/
• Make novel genomic technologies and informatics tools available for improved global patient diagnostics, surveillance, research and public health response.
• Develop a global system to aggregate, share, mine and use microbiological genomic data to address global public health and clinical challenges, a high-impact area in need of focused effort.
500 members in 30 countries
Work groups
1. Political challenges, outreach and building a global network
2. Repository and storage of sequence and metadata
3. Analytical approaches
4. Ring trials and quality assurance
5. Pilot project
Expansion of FDA Network to a site in an economically developing country
Benefits
1. Add diversity to genome database by opening new strain collections and access to incurred food, animal and environmental samples available to each site
2. Identify gaps in project assumptions and resources that would interfere with global expansion of NGS networks
Considerations
1. Participate as a full member of the USFDA network
2. Submit data and metadata to public database
3. Focus on sequencing Salmonella food and environmental isolates
4. Resources available for use of sequencer for other projects
$500,000
Argentina
3 years
Network of Sequencers
7 state health depts. + 10 FDA-ORA
Inputs
o 1 MiSeq system
o Sufficient reagents to sequence > 300 genomes per year
o Dedicated scientific staff (bioinformatics and/or laboratory support) through Oak Ridge Institute for Science and Education (ORISE)
o Bioinformatics and laboratory support, analysis pipeline
Deliverables
o Minimum ~300 genomes with metadata uploaded to NCBI per annum, minimum 20X coverage
o Food and environmental related bacterial (prefer Salmonella) isolates
MiSeq benchtop NGS system (Illumina)
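The 20X minimum coverage deliverable can be checked with the usual depth estimate: total sequenced bases divided by genome length. A rough sketch; the read count, read length, and the 4.8 Mb genome size below are illustrative assumptions, not figures from the slides:

```python
def estimated_coverage(num_reads, read_length, genome_size):
    """Average depth = total sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size

def meets_minimum(num_reads, read_length, genome_size, minimum=20.0):
    """True if the estimated average depth reaches the required minimum."""
    return estimated_coverage(num_reads, read_length, genome_size) >= minimum

# Hypothetical run: 1,000,000 reads of 250 bp against an assumed 4.8 Mb genome
depth = estimated_coverage(1_000_000, 250, 4_800_000)
passes = meets_minimum(1_000_000, 250, 4_800_000)
```

This is only an average; it says nothing about how evenly the reads cover the genome, so a real QC step would also look at read quality and assembly metrics.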
Data Transfer
[Diagram with steps numbered 1-4. Labeled elements: Register BioSample metadata at NCBI; SampleSheet; BioSample; BioProject; Illumina BaseSpace Cloud; FDA-CFSAN Isilon storage drive. Regions labeled "Outside FDA network" and "Inside FDA network".]
Data Generation
18 State and Federal Health labs
• Register strains with NCBI
• Illumina MiSeq Data Generation
• Data transfer
• Data transfer to FDA
Data QC and Submission at FDA
• Batch QC of data
• Conversion to SRA format
• Upload to SRA
CFSAN Genomic Information Management System (GIMS) integration
• Automated WGS accessions
• Genome assembly + PGAP annotation
• Hybrid k-mer and reference-based SNP calling analysis
NCBI pathogen detection pipeline
BioProject: http://www.ncbi.nlm.nih.gov/bioproject/183844
Data Generation
18 State and Federal Health labs plus the world
• Register strains with NCBI
• Collect genome sequence
• Data transfer
• Data transfer to FDA
Data QA/QC on site
• Batch QC of data
• Conversion to SRA format
• Upload to SRA (CLC Plugin)
CFSAN Genomic Information Management System (GIMS) integration
• Automated WGS accessions
• Genome assembly + PGAP annotation
• Hybrid k-mer and reference-based SNP calling analysis
NCBI pathogen detection pipeline
BioProject: http://www.ncbi.nlm.nih.gov/bioproject/183844
FDA-State Desktop Pilot called GenomeTrakr
http://www.ncbi.nlm.nih.gov/bioproject/183844
MN and VA are newest partners. Mexico Sinaloa 1st international partner.
SRA completed experiments are ~2000 records to date.
Partners with sequencers
United Kingdom
Denmark
Italy
Argentina
Brazil
Germany
Canada
Partners with isolates
Ireland
Mexico
Turkey
Colombia
Chile
MINIMAL PATHOGEN METADATA (FOODBORNE OUTBREAKS)
sample_name
organism
strain/isolate
What: Category (attribute_package)
  1a) Clinical/Host-associated
    1a1) specific_host
    1a2) isolation_source
    1a3) host-disease
  OR
  1b) Environmental/Food/Other
    1b1) isolation_source
When: collection_date
Where: Geographic location
  6a) geo_loc_name
  OR
  6b) lat_lon
Who: collected by
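The minimal metadata list can be read as a small validation rule set: some fields are always required, the category decides which source fields are needed, and location needs either a name or coordinates. A sketch under stated assumptions; the dictionary keys paraphrase the slide (for example "strain" for strain/isolate and "collected_by" for collected by), the two category values are invented for illustration, and the attribute names in an actual NCBI BioSample submission may differ:

```python
# Key names paraphrase the slide; they are assumptions, not an official schema.
ALWAYS_REQUIRED = ("sample_name", "organism", "strain",
                   "collection_date", "collected_by")
PACKAGE_FIELDS = {
    "clinical": ("specific_host", "isolation_source", "host_disease"),
    "environmental": ("isolation_source",),
}

def validate_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing {name}" for name in ALWAYS_REQUIRED
                if not record.get(name)]
    package = record.get("attribute_package")
    if package not in PACKAGE_FIELDS:
        problems.append("attribute_package must be 'clinical' or 'environmental'")
    else:
        problems += [f"missing {name}" for name in PACKAGE_FIELDS[package]
                     if not record.get(name)]
    # Location: either a place name or coordinates is enough.
    if not (record.get("geo_loc_name") or record.get("lat_lon")):
        problems.append("need geo_loc_name or lat_lon")
    return problems

# Hypothetical records for illustration only
example = {
    "sample_name": "EXAMPLE_0001",
    "organism": "Salmonella enterica",
    "strain": "EXAMPLE_0001",
    "attribute_package": "environmental",
    "isolation_source": "coriander powder",
    "collection_date": "2012-05-01",
    "geo_loc_name": "India",
    "collected_by": "example lab",
}
ok = validate_record(example)
incomplete = validate_record({"sample_name": "EXAMPLE_0002",
                              "attribute_package": "clinical"})
```

Returning a list of problems rather than raising on the first one lets a submitting lab fix a whole batch in one pass.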
Example Surveillance Workflow
Existing Salmonella clade in combined Eubacteria kmer tree
After Day 1
After Day 2
After Day 3
Extract cluster, Montevideo serovar
• 2 existing genomes
• 39 new genomes
• Extract sub-tree
• Re-root on outlier
Compute new subtree
• Reference tree based on SNPs
• Provides additional resolution
• Integrate with metadata
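Placing new genomes in a k-mer tree relies on an alignment-free distance between sequences. One common choice, shown here only as an illustration and not necessarily the measure used in the pipeline above, is the Jaccard distance between k-mer sets. The sequences, names, and k=4 are toy assumptions; real genomes would use much longer k-mers:

```python
def kmers(seq, k):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard_distance(a, b, k=4):
    """1 - |shared k-mers| / |all k-mers|; 0.0 means identical k-mer content."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)

# Hypothetical toy sequences standing in for existing genomes
references = {
    "ref_A": "ACGTACGTGGA",
    "ref_B": "TTTTGGGGCCCC",
}
new_genome = "ACGTACGTGGT"

# Assign the new genome to its closest existing reference
nearest = min(references,
              key=lambda name: jaccard_distance(new_genome, references[name]))
```

A distance like this is enough to pick the clade a new genome belongs to; the SNP-based reference tree then provides the finer resolution within that clade.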
FDA_2010_142_Pistachio-3
Public/Private Partnership
• UC Davis
• FDA
• NCBI
• BGI@UCDavis
• Agilent Technologies
• CDC
• Affiliate members
– Mars, Inc.
– Harvard hospital system
– Poultry Industry members
– Culture collections
SEEKING ADDITIONAL PARTNERS
Role of PacBio RS Technology
Yield closed genome assemblies to improve accuracy of clustering and proper assembly of large repetitive elements (phage, plasmid, CRISPR)
Yield data on epigenetic modification of genomes, possibly to further discriminate strains and/or provide information about virulence and pathogenicity
With Pacific Biosciences technology we’ve sequenced over 60 Salmonella and Listeria
genomes and their associated mobile elements for complete reference genomes.
Chromosome
Plasmid
Salmonella enterica subsp. enterica serovar Cubana
Methylation motifs from 14 Genomes
[Figure: methylation motifs detected in each genome, with labels I, II, III and orphan; the motif sequences are not recoverable from the extracted text.]
Genomes: Listeria monocytogenes J1-220; Listeria monocytogenes J1816; S. Bareilly; Salmonella enterica subsp. diarizonae; S. Abaetetuba; S. Abony; S. Anatum; S. Braenderup; S. Cubana; S. Heidelberg CFSAN002069; S. Heidelberg CFSAN002064; S. Heidelberg CFSAN000318; S. Montevideo; S. Typhimurium
Legend: MTase identified; MTase unknown; novel MTase
Large scale Salmonella enterica subsp. enterica phylogeny inferred from 156 genomes across 78 serovars
Timme et al., in preparation
• Reference-free approach for gathering SNPs
• ML tree inferred from ~119,000 SNPs
• Timme et al. in Gen. Biol. Evol. 5(11):2109-
36
Collaborations
FDA CDRH: Sequencing as a diagnostic device. High performance computing.
NIST: Standards for genomic sequencing.
FDA CVM: MDR isolates from NARMS collection.
DOJ: Microbial Forensics for FERN support.
DOD: R&D on traceback using metagenomics.
2014
FDA_2010_142_Pistachio-3
SNP = Single Nucleotide Polymorphism
TTCCCTAGCAC
TTCCTTAGCAC
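The two sequences above differ at a single position. A small sketch that reports where two aligned sequences differ; the function name is illustrative, and positions are reported 1-based:

```python
def snp_positions(a, b):
    """Return (position, base_in_a, base_in_b) for every mismatch, 1-based."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

snps = snp_positions("TTCCCTAGCAC", "TTCCTTAGCAC")
# snps == [(5, 'C', 'T')]: one SNP, C to T at position 5
```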
ONE THING’S FOR CERTAIN:
IT TAKES A VILLAGE TO GET THERE! Many thanks to the following:
Division of Microbiology-FDA
Eric Brown, Peter Evans
Ruth Timme, Narjol Gonzalez, Yi Chen, Maria Hoffman,
Christine Keys, George Kastanis, Tim Muruvanda,
Rebecca Bell, Cary Pirone, Andrea Ottesen,
Charlie Wang, Jie Zheng, Justin Payne
Division of Biostatistics-FDA
Errol Strain, Yan Luo, James Pettengill
CVM Cong Li, Pat McDermott, Shaohua Zhao
CDC John Besser, Eija Trees, Lee Katz, Patti Fields
Division of Molecular Biology-FDA
Chris Elkins, Darcy Hanes, Palmer Orlandi
FDA Division of Field Sciences Rebecca Dreisch
NYPH Bill Wolfgang, Kimberly Musser and colleagues
MPH Alvina Chu and colleagues
National Institutes of Health (NCBI)
David Lipman, Jim Ostell, William Klimke,
Martin Shumway
Office of Regulatory Science-FDA
Steve Musser, Kelly Bunning, Don Zink
Questions
Eric Brown: [email protected]
Peter Evans: [email protected]
Marc Allard: [email protected]