Seeds of Discovery (SeeD)

Large-scale application of
GbS in the SeeD project:
‘Rightsizing’ of methods and
initial results
Sarah Hearne, Alberto
Romero, Huihui Li, Carolina
Sansaloni, Cesar Petroli,
Martha Willcox, Aleyda Sierra,
Hector Galvez, Manuel
Martinez, Sukhwinder Singh,
Marc Ellis, Giovanny Soca,
Gary Atlin, Andrzej Kilian, Ed
Buckler, Peter Wenzl
International Maize Improvement Consortium (IMIC)
Wheat Yield Consortium (WYC)
Genetic resources; Breeding programs; Cultivar adoption, agronomy
Seeds of Discovery (SeeD): New genetic variation to raise future crop production
Take it to the Farmers (TTF)
Increased agricultural production
Why SeeD?
[Figure: Global average yield (tons per hectare) of wheat and maize by year, 1960 to 2050; y-axis 0 to 8. Source: USDA PDS database]
Pressures marked on the figure:
• Anticipated demand by 2050 (FAO)
• Climate change
• Soil degradation and falling water tables
• Costs of fertilizer and energy
• Genetic erosion
Genetic
resources
for food security
Research emphasis
[Diagram: axes from breeding-oriented to upstream research, and from genetically simple to genetically complex traits]
• Main emphasis: Mobilize novel alleles for complex traits [heat/drought tolerance] into breeding programs
• ‘Low-hanging fruits’ for breeding: genetically simple traits [some diseases, phenology]
• Seek collaborations to mine data for basic research
Strategy
1 Molecular atlases
• Underutilized sources of genetic variation
• Selection imprints
• Heterotic patterns (maize)
• Hidden translocations (wheat)
• Rare recombinants
Genomic association
2 Novel alleles and allele donors
• Novel, beneficial alleles, haplotypes
• Markers linked to loci and alleles that control priority traits
• Genetically distinct ‘donor accessions’
3 ‘Bridging germplasm’
Project areas
1 Molecular atlases (diversity surveys)
2 Novel alleles and ‘allele donors’ (GWAS)
3 Pre-breeding → ‘bridging germplasm’
4 Information management
5 Capacities (genetic-analysis service)
GbS
Genotyping by sequencing (GbS)
• Transition from genotyping-by-assay (gel, hybridization) towards genotyping-by-sequencing
• Similar to the analogue → digital photography transition
• Simultaneously discovers DNA polymorphisms and classifies their allelic states → advantage for characterizing unknown genetic diversity in genebanks
• Minimizes ascertainment bias
• Configurable platform: adjust No. of markers vs. No. of DNA samples → two ‘flavors’:
  • DArT: ~60-70K markers, SNP & PAV, ~20-35% missing data, lower error rates, calling of heterozygotes for subset of SNP markers, no imputation → maize & wheat diversity surveys
  • Cornell: ~800K markers, only SNP, ~60% missing data, higher error rates, no heterozygote detection, imputation → maize GWAS
Genetic-analysis service (SAGA)
● Provide services, based on modern genomics platforms, which address
the needs of demand-driven, impact-oriented agricultural R&D
● Partnership with DArT (Diversity Arrays Technology) in Australia
● Objectives:
→ Economies of scale for characterizing SeeD samples using GbS
→ Genome-profiling & value-adding services to scientists in Mexico and the region
→ Vehicle for capacity building
Database & interfaces for
primary data (KDDart,
IBFieldbook) for managing
experiments (inventories,
germplasm evaluation, etc.)
IT ‘ecosystem’ of SeeD
Web portal & data warehouse
(Germinate): , and validated
genotypic & phenotypic data
To be open-sourced from the first production version onwards (2015)
Data access layer
Visualization
tools (Flapjack,
CurlyWhirly, …)
Database
modules
Genebank
management
(GrinGlobal)
Web
services
(Collaboration with DArT and James Hutton Inst.)
High-level data repository (Genesys):
Passport & summarized data
Wheat diversity survey
● 42,000 accessions
sequenced to date using
DArTseq
● One individual per
accession
● ~30,000 SNP and ~30,000
PAV per sample
● Comprehensive diversity
analysis and design of AM
panels is underway
● Positioning of markers
using new consensus map
● Target: Characterize up
to 160,000 accessions
(120,000 from CIMMYT)
Building AM panels
Phenotypic values
Core set / AM panel
Genetic diversity
Maize diversity survey
● To get an accurate representation of maize
landraces we need to score heterozygotes
● DArTseq is based on multiple REs whose
combination deliberately generates a smaller
number of fragments for deeper sequencing
● PstI enzyme used for DArTseq partly overlaps with ApeKI (Cornell) → partly overlapping representations
● Can score heterozygotes in many loci as multiple
copies of each tag are sequenced (ca. 2 M
fragments are typically sequenced per sample)
Genotyping bulks
● Most accessions are genetically heterogeneous landraces → need to genotype multiple individuals (SSRs: 15–30 individuals)
● Can genotype bulks and derive population-level allele frequencies
→ Reduces costs of diversity survey by more than an order of magnitude
● The allele frequencies derived are representative of allele frequencies in the accessions (populations)
● PAVs: Genetic distances among populations
[Figure panels: Pools; Individual samples]
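As a rough illustration of how a population-level allele frequency might be derived from a genotyped bulk, the sketch below divides the reads carrying the alternative allele by the total reads at a SNP. It assumes read counts are proportional to allele copies in the pool, which ignores sequencing and sampling bias. The function name, the `min_depth` cut-off, and the read counts are hypothetical and are not taken from the DArTseq pipeline.

```python
from typing import Optional


def pooled_allele_frequency(ref_reads: int, alt_reads: int,
                            min_depth: int = 10) -> Optional[float]:
    """Estimate the alternative-allele frequency in a bulk from read counts.

    Returns None when total depth is below min_depth (an assumed cut-off),
    because a frequency from very few reads is unreliable.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None
    return alt_reads / depth


# Hypothetical (reference, alternative) read counts for three SNPs in one bulk.
counts = {"snp_1": (45, 15), "snp_2": (60, 0), "snp_3": (3, 2)}
freqs = {snp: pooled_allele_frequency(ref, alt)
         for snp, (ref, alt) in counts.items()}
print(freqs)
```

Here `snp_3` is left uncalled because only five reads cover it; a real pipeline would apply its own depth and quality filters.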
No. of individuals per bulk?
● Compared separately
assembled bulks of
increasing size
● Little change above
bulk sizes of 32
● Used bulks of 30 leaf
discs from 30
individuals for diversity
survey
● Pooling at the leaf-disc and DNA-sample levels gave indistinguishable results
[Table: pairwise comparison of separately assembled bulks; row and column headers are bulk sizes (no. of individuals per bulk)]
Bulk size     4     8    12    16    20    24    28    32    36    40    44    48
 4         22.3  21.5  18.8  17.2  17.8  17.1  17.5    17  17.9  17.7  17.4  17.2
 8         11.7   9.8   6.4   4.5   4.5   4.1   3.2   4.3   4.3   4.2   4.2   3.5
12         12.2     9   6.1   3.9   3.7   2.1   3.2   3.3   2.8   3.1   3.3   2.6
16         11.2     9   5.1   2.6   2.4   1.8   2.3   2.4   1.9   2.3   1.7     2
20         11.9   9.4   3.7   2.6   3.3   3.1   2.4   2.1     3   2.1   2.8   2.3
24         12.2   9.7   5.9   2.1   1.4   2.2   2.7   1.6     3   2.4   1.4     2
28         10.2   9.7   6.6   3.9     4   3.5   2.3     3   3.2   2.7   3.4   2.5
32         11.9   9.1   5.2   2.5   2.3   1.7   2.2     2   1.7   2.2   1.6   1.8
36         11.9   8.8   4.3   2.2   2.7   2.5   2.1   1.5   2.5   1.7   2.3   1.7
40         11.3   8.8     5   2.4   2.3   1.4   1.2   1.7   1.6   1.2   1.6     1
44         11.7     9   4.7   1.9   2.1   2.3   2.1   1.4   2.4   1.8   1.8   1.6
48         11.1   8.1   4.7   2.6   2.5   1.7   1.7   2.3   2.3   1.8     2   1.1
[Diagram: Accession 1, Accession 2, Accession 3 ... Accession 40,000; 30 plants each → 1 DNA sample each → high-density genome profiles from “bulk” samples → allele frequencies within accessions → Molecular Atlas]
● Started to genotype up to 40,000 accessions
● Molecular Atlas: genetic relationships amongst accessions, selection footprints, race classification, etc.
Just finished 20,000 accessions…
• > 230,000 SNP identified (likely to increase upon re-calling
the entire set)
• Only 20% map to B73 reference genome! Whole-genome
re-sequencing of ca. 20 landraces in progress.
• Enriched for gene-rich regions (methylation filtration
effect)
• Target: Characterize up to 40,000 accessions (27,000
from CIMMYT)
[Figure: No. of SNP within window vs. position on chromosome]
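The SNP-density plot counts SNPs in windows along each chromosome. A minimal sketch of that kind of count is shown below, assuming fixed-size, non-overlapping windows; the window size and SNP positions are hypothetical, and the windowing actually used for the figure is not stated on the slide.

```python
from collections import Counter
from typing import Dict, Iterable


def snp_density(positions: Iterable[int], window_size: int) -> Dict[int, int]:
    """Count SNPs per non-overlapping window on one chromosome.

    Keys are window start coordinates; windows without SNPs are omitted.
    """
    counts = Counter(pos // window_size for pos in positions)
    return {window * window_size: n for window, n in sorted(counts.items())}


# Hypothetical SNP positions (bp) on a single chromosome.
positions = [120, 950, 1_500, 1_800, 1_999, 4_200]
density = snp_density(positions, window_size=1_000)
print(density)
```

With these toy positions, the window starting at 1,000 bp holds three SNPs, which is the kind of local enrichment the figure would show as a peak.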
Next steps
● Environmental-selection footprints
→ 18,500 accessions with good-quality geo-location data
→ Extracted long-term abiotic environment data
→ Identify allele/haplotype-frequency gradients across environmental clines in entire genebank collection
● Breeding-selection footprints
→ Multiple cycles of recurrent-selection populations genotyped
→ Identify response to selection
● Race-specific footprints
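One simple way to look for an allele-frequency gradient along an environmental cline is to regress per-accession allele frequency on an environmental variable. The sketch below is only an illustration under strong assumptions: it uses an ordinary least-squares slope, hypothetical elevation and frequency values, and no correction for population structure, which a real analysis of genebank accessions would need.

```python
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (change in y per unit of x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical data: collection-site elevation (m) of four accessions and the
# frequency of one allele estimated within each accession.
elevation_m = [0.0, 1000.0, 2000.0, 3000.0]
allele_freq = [0.1, 0.3, 0.5, 0.7]

slope = ols_slope(elevation_m, allele_freq)
print(f"Allele-frequency change per 1000 m: {slope * 1000:.2f}")
```

A consistently non-zero slope across many accessions would flag the locus as a candidate for closer inspection, not as proof of environmental selection.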
Maize GWAS
[Diagram: Accession 1 … Accession 4,500, each crossed with a tester; GbS and field trials feed into GWAS]
● Existing core collection of 4,500 landraces, three adaptation zones
● Assumption: haplotypes replicated across accessions → testcross one individual per accession with adaptation-zone-specific hybrid
● Genotyped testcross parents
Field trials for GWAS
Collected 700,000 data
points from 34 trials
across 14 locations
Traits evaluated
Abiotic stresses: heat, drought, low N
Biotic stresses: tar spot, ear rot, stalk rot, Turcicum, Cercospora
Grain quality: hardness, starch, oil, amino acids, phenolics
GbS profiles of testcross parents
● Genotyped both with Cornell GBS and DArTseq methods
→ Maximize marker density (Cornell)
→ Enable identification of heterozygote regions (DArTseq)
● Imputation based on prevalent haplotypes detected in ca. 40,000 maize samples genotyped on Cornell platform
● Little genetic structure
[Figure labels: Highland, Subtropical, Tropical; 36 Latin American countries]
Proof of concept: Days to silking
● GWAS approach works
● Marker density just sufficient
Anthesis: Teosinte-derived inversion
Tar spot disease complex
● Up to 46% yield loss
● Caused by Phyllachora maydis and Monographella maydis in association
[Figure panels: Yield; Tar spot incidence; Testcrosses; Accessions]
Chromosome 9 (position 139,172,758): P = 1.01e-7
Next steps
1 Molecular atlases
Genomic association
2 Novel alleles and allele donors
3 ‘Bridging germplasm’
New breeding approaches and technologies; new tools such as GS
Elite germplasm selected by breeders
• Breeder-ready lines & populations with new, beneficial alleles for priority characters in elite genetic backgrounds → joint linkage/association mapping & trait mobilization into breeding programs
• Molecular markers linked to beneficial alleles and statistical models for estimating breeding values to accelerate genetic progress in breeding programs
Maize ‘bridging germplasm’
Useful novel alleles & haplotypes → early-generation lines & pools enriched for favorable alleles
…using multiple strategies defined by trait complexity and breeder needs (desired input germplasm, demand for new sources)
Strategies by breeder demand (rows) and trait complexity (columns):
Breeder demand | Monogenic (1-3) | Oligogenic (4-10) | Polygenic (>10)
Urgent | DH from landrace & landrace / line crosses, selfing | DH from landrace & landrace / line crosses, selfing | GS with MABC for BC1S1 development
Medium-term | MABC | MARS & prediction index | GS with MABC for BC1S2 development
Long-term | MABC & GS | MARS, prediction index & GS | GS with MABC for BC1S2 development
Wheat ‘bridging germplasm’
[Diagram: crossing scheme with Exotic 1, Exotic 2, Elite 1 and Elite 2; exotic × elite crosses (50:50) are topcrossed to a second elite (25:75), giving Family 1 and Family 2 of fixed lines]
• 200 exotics (synthetics, landraces; FIGS)
• 10 elites selected by breeders
• Currently at TC1F3 stage
• TC chains with partly overlapping elite parents
[Diagram: Linked topcross panel (LTP); linked topcrosses (Exotic × Elite, 50:50; then × Elite, 25:75) with partly overlapping elite parents (Elite 2, Elite 3, ...), each yielding a family of fixed lines]
Linked Topcross Panel (LTP) for joint linkage/association mapping
Joint linkage/association mapping to identify novel exotic alleles that are expressed across several elite genetic backgrounds
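The 50:50 and 25:75 labels in the crossing scheme are consistent with the expected exotic:elite genome proportions when an exotic parent is crossed to an elite parent and the progeny is then topcrossed to a second elite parent. The sketch below works through that arithmetic, assuming each cross to an elite parent halves the expected exotic contribution and that there is no selection; the function name is hypothetical.

```python
def expected_exotic_fraction(elite_crosses: int) -> float:
    """Expected exotic genome fraction after successive crosses to elite parents.

    Starts from a pure exotic parent (fraction 1.0); each cross to an elite
    parent halves the expected exotic share (no selection assumed).
    """
    if elite_crosses < 0:
        raise ValueError("elite_crosses must be non-negative")
    return 0.5 ** elite_crosses


for n in (1, 2):
    exotic = expected_exotic_fraction(n)
    print(f"{n} elite cross(es): {exotic:.0%} exotic : {1 - exotic:.0%} elite")
```

These are expectations across a family; individual fixed lines will vary around them.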
Thank
you!
http://seedsofdiscovery.org ([email protected])
Participants from Mexican
institutions
Participants from
CIMMYT
Participants from
other countries
Jonás Aguirre (UNAM), Flavio Aragón (INIFAP), Odette Avendaño (LANGEBIO), Ed Buckler (Cornell Univ.), Juan Burgueño, Vijay Chaikam,
Alain Charcosset (AMAIZING), Gabriela Chávez (INIFAP), Jiafa Chen, Charles Chen, Andrés Christen (CIMAT), Angelica Cibrian (LANGEBIO),
Héctor M. Corral (AGROVIZION), Moisés Cortés (CNRG), Sergio Cortez (UPFIM), Denise Costich, Lino de la Cruz (UdeG), Armando Espinosa
(INIFAP), Néstor Espinosa (INIFAP), Gilberto Esquivel (INIFAP), Luis Eguiarte (UNAM), Gaspar Estrada (UAEM), Juan D. Figueroa (CINVESTAV),
Pedro Figueroa (INIFAP), Jorge Franco (UDR), Guillermo Fuentes (INIFAP), Amanda Gálvez (UNAM), Héctor Gálvez (SAGA), Karen García,
Silverio García (ITESM), Noel Gómez (INIFAP), Gregor Gorjanc (Roslin Inst.), Sarah Hearne, Carlos Hernández, Juan M. Hernández (INIFAP),
Víctor Hernández (INIFAP), Luis Herrera (LANGEBIO), John Hickey (Roslin Inst.), Huntington Hobbs, Puthick Hok (DArT), Javier Ireta (INIFAP),
Andrzej Kilian (DArT), Huihui Li, Francisco J. Manjarrez (INIFAP), David Marshall (JHI), César Martínez, Carlos G. Martínez (UAEM), Manuel
Martínez (SAGA), Iain Milne (JHI), Terrence Molnar, Moisés M. Morales (UdeG), Henry Ngugi, Alejandro Ortega (INIFAP), Iván Ortíz,
Leodegario Osorio (INIFAP), Natalia Palacios, José Ron Parra (UdeG), Tom Payne, Javier Peña, Cesar Petroli (SAGA), Kevin Pixley, Ernesto
Preciado (INIFAP), Matthew Reynolds, Sebastian Raubach (JHI), María Esther Rivas (BIDASEM), Carolina Roa, Alberto Romero (Cornell Univ.),
Ariel Ruíz (INIFAP), Carolina Saint-Pierre, Jesús Sánchez (UdeG), Gilberto Salinas, Yolanda Salinas (INIFAP), Carolina Sansaloni (SAGA),
Ruairidh Sawers (LANGEBIO), Sergio Serna (ITESM), Paul Shaw (JHI), Rosemary Shrestha, Aleyda Sierra (SAGA), Pawan Singh, Sukhwinder
Singh, Giovanni Soca, Ernesto Solís (INIFAP), Kai Sonder, Maria Tattaris, Maud Tenaillon (AMAIZING), Fernando de la Torre (CNRG), Heriberto
Torres (Pioneer), Samuel Trachsel, Grzegorz Uszynski (DArT), Ciro Valdés (UANL), Griselda Vásquez (INIFAP), Humberto Vallejo (INIFAP), Víctor
Vidal (INIFAP), Eduardo Villaseñor (INIFAP), Prashant Vikram, Martha Willcox, Peter Wenzl, Víctor Zamora (UAAAN)
Contributed at the beginning: Gary Atlin, Michael Baum (ICARDA), David Bonnett, Paul Brennan (CropGen), Etienne Duveiller, Mustapha ElBouhssini (ICARDA), Marc Ellis, Ky Matthews, Bonnie Furman, Marta Lopes, George Mahuku, Francis Ogbonnaya (ICARDA), Ken Street
(ICARDA)