In-Depth Proteome Coverage by
Iterative Data-Dependent Acquisition on
a Benchtop Orbitrap Mass Spectrometer
Mathias Müller, Tabiwang N. Arrey, Florian Große-Coosmann, Thomas Rietpietsch, Andreas
Kühn, Catharina Crone, Frank Ciesinski, Torsten Ueckert, Markus Kellmann
Thermo Fisher Scientific, Bremen, Germany
Overview
Purpose: Improved identification of low-copy number proteins.
Methods: Iterative data-dependent MS/MS bottom-up proteomics on a Thermo
Scientific™ Q Exactive™ HF benchtop Orbitrap mass spectrometer.
Results: An iterative acquisition strategy significantly increases the dynamic range
for protein identification in complex proteome samples.
Introduction
For large-scale bottom-up tandem MS (MS/MS) protein sequencing techniques, data-dependent TopN methods are widely established. Running numerous replicates
increases the number of identified proteins only up to a certain point, since the method
prefers high-intensity peptides (belonging to highly concentrated proteins) for precursor
triggering. The sequence coverage of higher-abundance proteins increases, but the
probability of identifying more low-intensity peptides does not improve. It has been
shown that an iterative approach, which excludes previously triggered precursors in the
subsequent run, allows deeper sequencing of low-copy-number proteins [1].
Here, we describe a methodology using extremely rich inclusion and exclusion lists to
significantly increase the identification of low-abundance proteins.
Methods
Sample Preparation
Thermo Scientific™ Pierce™ HeLa Protein Digest Standard was diluted in HPLC grade
H2O (Fisher Scientific) to a final concentration of 0.5 µg/µL. For each run, 1 µg of HeLa
digest was injected.
TABLE 1. Liquid Chromatography.
Chromatography | Settings
LC Stack | Thermo Scientific™ Dionex™ UltiMate™ 3000 RSLCnano system equipped with nano pump NCS-3500 and autosampler WPS-3000TPL
Mobile Phases | A: 0.1 % FA in water; B: 0.1 % FA in acetonitrile (Fisher Chemicals)
Gradients | 15 min 5–10 % B; then added to final gradient length of 30, 60, or 90 min 10–40 % B
Flow Rate | 250 nL/min
Trapping Column | Thermo Scientific™ Acclaim™ PepMap™100 µCartridge Column C18, 300 μm × 0.5 cm, 5 μm, 100 Å (backflush mode)
Separation Column | Acclaim PepMap C18, 75 μm × 50 cm, 2 μm, 100 Å
TABLE 2. Mass Spectrometry: Q Exactive HF.
TopN Properties | Settings Master Run | Settings Iterative Run
Exclusion | – | on (mass tolerance ±10 ppm)
Resolution Full MS | 60,000 | 60,000
AGC Target Full MS | 3e6 | 3e6
Loop Count | 20 | 20
dd Resolution | 15,000 | 15,000
dd Target | 1e5 | 1e5
dd-MS2 max IT | 50 ms (“parallel”) | 100 ms (“sensitive”)
Isolation Window | 1.4 m/z | 1.4 m/z
NCE | 28 | 28
dd Underfill Ratio | 5% | 5%
Peptide Match | preferred | preferred
Dynamic Exclusion | 30 s | 30 s
Data Analysis
A triplicate master run was analyzed with Thermo Scientific™ Proteome Discoverer™
2.0 using the SEQUEST® HT search engine against the UniProt human FASTA database A9609.
Based on the master run with the highest number of protein group IDs, a PSM
exclusion list was exported as a *.csv file. This exclusion list was imported into the
Exactive Series 2.4 instrument TopN method using the “Iterative Run” settings from
Table 2. For the final Proteome Discoverer report, the HeLa protein copy numbers
were compared against a data sheet [2].
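As an illustration of how an exported PSM exclusion list could be applied with the ±10 ppm mass tolerance from Table 2, the following minimal Python sketch checks whether a candidate precursor m/z matches an excluded entry. It is not the instrument's implementation: the CSV column name `mz`, the function names, and the decision to ignore retention time and charge state are all assumptions made for this example.

```python
import csv
import io
from bisect import bisect_left

PPM_TOLERANCE = 10.0  # mass tolerance from Table 2 (±10 ppm)


def load_exclusion_mz(csv_text, column="mz"):
    """Read precursor m/z values from CSV text and return them sorted.

    The column name "mz" is a hypothetical layout, not the real export format.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted(float(row[column]) for row in reader)


def is_excluded(mz, exclusion_mz, ppm=PPM_TOLERANCE):
    """Return True if mz lies within ±ppm of any entry in the sorted list."""
    # Slightly widened search window; the exact ppm check follows below.
    window = mz * ppm * 1e-6 * 1.001
    i = bisect_left(exclusion_mz, mz - window)
    while i < len(exclusion_mz) and exclusion_mz[i] <= mz + window:
        if abs(mz - exclusion_mz[i]) / exclusion_mz[i] * 1e6 <= ppm:
            return True
        i += 1
    return False


# Hypothetical two-entry exclusion list, for demonstration only.
exclusion = load_exclusion_mz("mz\n500.0000\n650.3250\n")
print(is_excluded(500.004, exclusion))  # about 8 ppm away from 500.0000
print(is_excluded(500.006, exclusion))  # about 12 ppm away from 500.0000
```

Keeping the list sorted lets each lookup use a binary search, which matters when the exclusion list grows with every iteration.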
FIGURE 1. Flow chart of iterative approach for data-dependent triggering of
low-abundant precursor ions.
[Flow chart steps: Master runs (triplicate, 1 µg HeLa, “parallel” data acquisition) → PD database search (fasta Uniprot A9609) → Highest # of protein groups? (no: data not used for further iteration; yes: PSM m/z added to exclusion list) → Iterative run (Ex Series ME, “sensitive” data acquisition; Iteration 1, Iteration 2, …, Iteration N) → PD database search (fasta Uniprot A9609) → Further iteration? (yes: PSM m/z added to exclusion list; no: PD Report, node “copy numbers” MPI).]
Results
Iterative Approaches
1) Classic: All PSMs from a master run are imported as an exclusion list for the next
iteration.
2) Alternating bins: The master scan range from method 1) is divided into five
m/z scan ranges (bins), each containing the same number of PSMs (see Figure 2).
The five bins are acquired alternately as master scans within one TopN method.
3) Merged bins: Five TopN methods are created, each using one bin from method 2).
The five resulting raw files are processed into one Proteome Discoverer *.msf file.
FIGURE 2. Scatter plot of all peptide spectral matches (PSMs), m/z against
retention time (RT), from a 30 min gradient HeLa master run. The scan range is
divided into five bins with equal numbers of PSMs (2131 each) and of
MS/MS search inputs (3284 each). The distributions of the PSM bins and MS/MS bins
correlate well.
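The equal-count binning described for the alternating and merged bin methods can be sketched as follows. This is a minimal Python illustration, assuming the bins are derived simply by sorting the PSM m/z values and cutting them into five contiguous groups of (nearly) equal size; the function name and the handling of a remainder are hypothetical choices, not taken from the poster.

```python
def equal_count_bins(mz_values, n_bins=5):
    """Split m/z values into n_bins contiguous ranges with (nearly) equal counts.

    Returns a list of (low_mz, high_mz, count) tuples, one per bin.
    """
    if n_bins < 1 or len(mz_values) < n_bins:
        raise ValueError("need at least one value per bin")
    ordered = sorted(mz_values)
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        # Spread any remainder over the first bins so counts differ by at most one.
        size = base + (1 if b < extra else 0)
        chunk = ordered[start:start + size]
        bins.append((chunk[0], chunk[-1], len(chunk)))
        start += size
    return bins


# Hypothetical m/z values, for demonstration only.
for low, high, count in equal_count_bins([400.0 + i for i in range(10)]):
    print(f"{low:.1f} to {high:.1f} m/z: {count} PSMs")
```

Because peptide precursors cluster in the lower m/z region, equal-count bins are narrower there and wider at high m/z, which is the point of binning by PSM count rather than by fixed width.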
Thermo Scientific Poster Note • PN-64120-ASMS-EN-0614S
Comparison of Iterative Approaches with Triplicate Master Runs: Protein IDs
Reducing the number of potential MS/MS candidates by narrowing the full MS scan
range is a common technique to trigger a larger number of low-intensity precursor ions
[3]. Here, three different iterative approaches using two iterations each were evaluated:
classic, alternating bins, and merged bins. The results were compared to a triplicate
master run. Gradients of 30 min, 60 min, and 90 min were run in duplicate.
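Figure 4 reports this comparison as a percentage increase or decrease of identifications relative to the triplicate master run. A minimal Python sketch of that calculation is shown here, assuming the percentage is the relative difference in ID counts; the function name and the example counts are hypothetical and are not results from this work.

```python
def percent_benefit(n_iterative, n_master):
    """Percentage increase (positive) or decrease (negative) of IDs
    from the iterative runs relative to the triplicate master runs."""
    if n_master <= 0:
        raise ValueError("n_master must be positive")
    return 100.0 * (n_iterative - n_master) / n_master


# Hypothetical ID counts, for demonstration only.
print(f"{percent_benefit(1100, 1000):+.1f} %")
print(f"{percent_benefit(950, 1000):+.1f} %")
```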
FIGURE 3. Venn diagrams of SEQUEST HT results comparing identified protein
groups (upper panel) and identified peptide groups (lower panel) for triplicate
master runs and three different iterative methods using two iterations each.
Gradient length: 60 min.
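The overlap counts behind a two-set Venn diagram such as those in Figure 3 can be computed with plain set operations. The sketch below is only an illustration in Python: the function name and the identifiers in the example are hypothetical placeholders, not protein groups from this study.

```python
def venn_counts(master_ids, iterative_ids):
    """Return the three region sizes of a two-set Venn diagram."""
    master, iterative = set(master_ids), set(iterative_ids)
    return {
        "master_only": len(master - iterative),
        "shared": len(master & iterative),
        "iterative_only": len(iterative - master),
    }


# Hypothetical identifiers, for demonstration only.
print(venn_counts(["PROT_A", "PROT_B", "PROT_C"], ["PROT_B", "PROT_C", "PROT_D"]))
```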
FIGURE 4. Percentage increase or decrease of SEQUEST HT Protein Groups and
Peptide Groups obtained from two iterations of HeLa runs compared to a
triplicate master run of the same gradient length. Using two iterations of merged
bins gives optimal results.
[Figure 4 bar chart: IDs of Iterative Runs compared to Triplicate Master Runs; series: Benefit Protein Groups ID [%] and Benefit Peptide Groups ID [%]; gradients: 30 min, 60 min, 90 min.]
Deeper Sequencing Using Iterative Approach
Using an iterative approach, low-intensity precursor ions are preferentially triggered in a
TopN experiment. Compared to a standard triplicate run, a larger number of proteins
with copy numbers ranging from 1e3 to 1e5 is identified (see Figure 5). This effect
becomes more pronounced at gradients longer than 60 minutes.
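The copy number histograms of Figure 5 use logarithmic bins of one quarter order of magnitude. The following Python sketch shows one way such a histogram, and the count of proteins in the 1e3 to 1e5 range, could be computed. The function names, the bin indexing scheme, and the example values are assumptions made for illustration only.

```python
import math
from collections import Counter


def quarter_decade_histogram(copy_numbers, bins_per_decade=4):
    """Count values per logarithmic bin.

    Bin index k covers copy numbers from 10**(k/4) up to 10**((k+1)/4).
    """
    counts = Counter()
    for value in copy_numbers:
        if value <= 0:
            continue  # log10 is undefined for non-positive values; skip them
        counts[math.floor(math.log10(value) * bins_per_decade)] += 1
    return dict(sorted(counts.items()))


def count_in_range(copy_numbers, low=1e3, high=1e5):
    """Number of proteins with a copy number between low and high (inclusive)."""
    return sum(1 for v in copy_numbers if low <= v <= high)


# Hypothetical copy numbers, for demonstration only.
example = [50, 1500, 1600, 20000, 2e6]
print(quarter_decade_histogram(example))
print(count_in_range(example))
```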
FIGURE 5. Comparison of HeLa protein copy numbers [2] using an iterative
approach with five merged scan range bins (yellow bars) compared to triplicate
master runs (blue bars) and the reference database results (grey bars).
[Figure 5 panels: HeLa Protein Copy Numbers 30 min, 60 min, and 90 min; x axis: Bin: Copy Number (1/4 Order of Magnitude); y axis: Counts, with zoomed panels (Counts (Zoom)); series: Reference, Iterative, Triplicate.]
Preferred Triggering of Low-intense Precursor Ions in a TopN Experiment
Histograms in Figure 6 indicate that the iterative approach is effective for precursor ion
intensities smaller than 1e6. The higher intensity precursor ions of the iterative
approach derive from the first master run, which is necessary to define a global
exclusion list for the subsequent iterations.
FIGURE 6. Histogram comparing all PSM precursor ion intensities of triplicate
master run (blue bars) with iterative runs (red bars). Left panels show the
complete distribution for 30, 60, and 90 min gradients. Right panels show
zoomed y axes.
[Figure 6 panels: x axis: Bin: Intensity; y axis: Counts; series: Iterative and Triplicate for 30 min, 60 min, and 90 min.]
Conclusion
Compared to a standard triplicate TopN experiment on the Q Exactive HF, the iterative
approach reveals the following benefits:
• Optimal iterative conditions using smaller scan ranges for master scans
• Higher number of protein group IDs
• Deeper sequencing in a protein copy number range between 1e3 and 1e5
References
1. Chen, H.; Rejtar, T.; Andreev, V.; Moskovets, E.; Karger, B. L. Anal. Chem. 2005,
77, 7816–7825.
2. Kulak, N. A.; Pichler, G.; Paron, I.; Nagaraj, N.; Mann, M. Nat. Methods 2014, 11,
319–324.
3. Scherl, A.; Shaffer, S. A.; Taylor, G. K.; Kulasekara, H. D.; Miller, S. I.; Goodlett, D.
R. Anal. Chem. 2008, 80(4), 1182–1191.
SEQUEST is a registered trademark of the University of Washington. All other trademarks are the property of
Thermo Fisher Scientific and its subsidiaries.
This information is not intended to encourage use of these products in any manners that might infringe the
intellectual property rights of others.
PO64120-EN 0614S
www.thermoscientific.com
©2014 Thermo Fisher Scientific Inc. All rights reserved. ISO is a trademark of the International Standards Organization.
This information is presented as an example of the capabilities of Thermo Fisher Scientific products.
Specifications, terms and pricing are subject to change. Not all products are available in all countries. Please consult
your local sales representative for details.
PN-64120-EN-0614S