Multivariate data-modelling to separate light scattering from light absorbance:
The Optimized Extended Multiplicative Signal Correction (OEMSC)
The importance of spectral preprocessing in data analysis for aquaphotomics
Harald Martens
Dept. Engineering Cybernetics, Norwegian U. of Science and Technology,
Trondheim, Norway. Email: [email protected]
When studying complex systems, vis/NIR spectroscopy is informative.
Many causes of variation: light scattering, path length, nonlinear responses, organic absorbers, temperature effects, water concentration, water changes, …
Example: Belousov-Zhabotinsky reaction.
Anna Zhyrova, Dalibor Stys, U. South Bohemia:
[Figure: spectra; horizontal axis 400 to 2400, vertical axis -0.8 to 0.8]
NIT (near-infrared transmission) of «difficult samples»:
Mixtures of wheat protein and wheat starch powders (5 different ratios: 0, 25, 50, 75, 100% protein)
Different sample thickness, different sample packing
2 technical replicates
MSC:
Martens, H., Jensen, S.Å. and Geladi, P. (1983) Multivariate linearity transformation for near-infrared reflectance spectrometry. Proc. Nordic Symp. on Applied Statistics (O.H.J. Christie, ed.), June 12-14, 1983. Stokkand Forlag Publ., Skagenkaien 12, N-4000 Stavanger, Norway, ISBN 82-90496-02-8, 208-234.
Geladi, P., MacDougall, D. and Martens, H. (1985) Linearization and scatter-correction for near-infrared reflectance of meat. Applied Spectroscopy 39(3), 491-500.
EMSC:
Martens, H. and Stark, E. (1991) Extended multiplicative signal correction and spectral interference subtraction: New preprocessing methods for near infrared spectroscopy. J. Pharmaceutical & Biomedical Analysis 9(8), 625-635.
Martens, H., Pram Nielsen, J. and Balling Engelsen, S. (2003) Light Scattering and Light Absorbance Separated by Extended Multiplicative Signal Correction. Application to Near-Infrared Transmission Analysis of Powder Mixtures. Anal. Chem. 75(3), 394-404.
Go to Multivariate calibration?
Bad news
PLS regression: imperfect. Needs 5 PCs.
Should need only 1 PC, since the protein/starch ratio is the only intended chemical variation.
The solution, of course: MSC or SNV!
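For readers who have not used these two preprocessing steps, here is a minimal NumPy sketch. It assumes the usual textbook definitions: MSC regresses each spectrum on a reference spectrum (here the mean) to estimate an offset and a slope, then subtracts the offset and divides by the slope; SNV centres each spectrum by its own mean and scales it by its own standard deviation. The function names `msc` and `snv` and the synthetic data are illustrative only and are not taken from the slides.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative signal correction: regress each spectrum on a reference."""
    spectra = np.asarray(spectra, dtype=float)
    m = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    # Design matrix [1, m]: one offset a_i and one slope b_i per sample.
    design = np.column_stack([np.ones_like(m), m])
    coef, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)  # shape (2, n_samples)
    a, b = coef
    return (spectra - a[:, None]) / b[:, None]

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# Synthetic check (assumed data): one underlying spectrum, with a
# sample-specific offset and gain and no chemical variation.
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0.0, 3.0, 100)) + 2.0
a_true = rng.uniform(-0.5, 0.5, size=6)
b_true = rng.uniform(0.5, 1.5, size=6)
z = a_true[:, None] + b_true[:, None] * base
corrected = msc(z, reference=base)
```

In this idealized case every corrected spectrum collapses onto the reference. The example deliberately contains no chemical variation, which is the situation where MSC and SNV behave well.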
Problem: The MSC regression model could not estimate the offset and slope properly; it was confused by the large chemical variations.
Really bad news
The solution, of course: EMSC with known analyte spectra!
Using known constituent spectra in the EMSC model:
Difference spectrum between pure protein and pure starch
Good news, but needs extra info
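The difference between the two models can be written out as follows. The notation is an assumption made for this note, not taken from the slides: z_i is the spectrum of sample i, m the reference (mean) spectrum, 1 a constant baseline, k the known difference spectrum between pure protein and pure starch, and a_i, b_i, h_i the coefficients estimated by least squares.

```latex
\begin{align*}
\text{MSC:}\quad
  \mathbf{z}_i &= a_i\,\mathbf{1} + b_i\,\mathbf{m} + \boldsymbol{\varepsilon}_i \\
\text{EMSC with a known constituent spectrum:}\quad
  \mathbf{z}_i &= a_i\,\mathbf{1} + b_i\,\mathbf{m} + h_i\,\mathbf{k} + \boldsymbol{\varepsilon}_i \\
\text{Correction in both cases:}\quad
  \mathbf{z}_{i,\mathrm{corr}} &= \frac{\mathbf{z}_i - a_i\,\mathbf{1}}{b_i}
\end{align*}
```

In the MSC model the chemical variation has nowhere to go except the residual, so it can leak into the estimates of a_i and b_i. Giving it its own term h_i k lets the offset and slope be estimated alongside it. In this sketch the chemical term is modelled but not subtracted, so the corrected spectrum still carries the protein/starch information.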
Stabilize the preprocessing without any prior knowledge
EMSC without known analyte spectra (Morten Beck Rye):
No known constituent spectra used in the EMSC model.
Instead: included known wavelength info (λ, λ²)
[Figure: spectra shown as raw data and mean centred, and again after EMSC with wavelength and its square; O.D. plotted against wavelength (850 to 1000 nm) and against the mean spectrum]
Good news, but nonlinear
Optimized EMSC (OEMSC)
Estimate a correction spectrum in the EMSC model (in addition to λ, λ²)
Simplex optimization of RMSEP(protein)
[Figure, calibration subset. Panel titles: "Z Input, to be optimized for GoodC"; "GoodC spectrum, optimized" (estimated correction spectrum); "Initial EMSC"; "Optimized EMSC"; "Optimized EMSC, centred"; RMSEP(protein) against # of PCR PCs (k: = start EMSC, ro = Opt. EMSC). Annotations: Init. AOpt=7, RMSEP=0.025424; AOpt=9; RMSEP(Y)=0.0022494; OptMethod=1 for Gluten; nMSEPWgts=1]
Good news: linear
Optimize the EMSC model automatically
Model spectra:
The mean spectrum, m
Baseline 1,1,1,1,1,1…
λ
λ²
Correction spectrum (optimized)
EMSC:
a) Estimation: Project each input sample spectrum onto these model spectra.
b) Correction: Subtract baseline level and λ-effects. Divide by m-effects.
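The two EMSC steps can be sketched in a few lines of NumPy. This is a hedged illustration, not the implementation behind the slides: the function name `emsc`, the rescaling of wavelength to [-1, 1], and the choice to model the optional extra spectrum without subtracting it are all assumptions made here.

```python
import numpy as np

def emsc(spectra, wavelengths, reference=None, extra=None):
    """Basic EMSC sketch; returns (corrected spectra, coefficient matrix)."""
    z = np.asarray(spectra, dtype=float)
    lam = np.asarray(wavelengths, dtype=float)
    m = z.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    # Rescale wavelength to [-1, 1] so lambda and lambda^2 are well conditioned.
    x = 2.0 * (lam - lam.min()) / (lam.max() - lam.min()) - 1.0
    columns = [np.ones_like(x), m, x, x**2]      # baseline, mean, lambda, lambda^2
    if extra is not None:
        columns.append(np.asarray(extra, dtype=float))  # modelled, not subtracted
    model = np.column_stack(columns)             # (n_wavelengths, n_terms)
    # a) Estimation: least-squares projection of every spectrum on the model spectra.
    coef, *_ = np.linalg.lstsq(model, z.T, rcond=None)
    coef = coef.T                                # (n_samples, n_terms)
    # b) Correction: subtract baseline and lambda effects, divide by the m coefficient.
    additive = coef[:, [0, 2, 3]] @ model[:, [0, 2, 3]].T
    corrected = (z - additive) / coef[:, [1]]
    return corrected, coef

# Synthetic check (assumed data): one peak distorted by offset, gain and tilt.
rng = np.random.default_rng(0)
lam = np.linspace(850.0, 1000.0, 76)
x = (lam - 925.0) / 75.0
base = 2.5 + np.exp(-0.5 * ((lam - 930.0) / 15.0) ** 2)
a_true = rng.uniform(-0.3, 0.3, 5)
b_true = rng.uniform(0.8, 1.2, 5)
d_true = rng.uniform(-0.1, 0.1, 5)
z = a_true[:, None] + b_true[:, None] * base + d_true[:, None] * x
corrected, coef = emsc(z, lam, reference=base)
```

With these synthetic inputs the corrected spectra coincide with `base` and the second coefficient column recovers the simulated gains. Real data also contain chemical variation, which is what the correction spectrum is meant to absorb.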
OEMSC:
1) Define V = the first (e.g. 5) PCA PCs of the untreated sample spectra.
The correction spectrum will be a linear combination tV, found by optimizing t (1 x 5).
2) Find the optimal correction spectrum.
Here: guess an initial t, then Simplex optimization of t for a chosen criterion (e.g. RMSEP(y) of leverage-corrected PCR).
Many alternative optimization criteria are possible!
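The two OEMSC steps above can be sketched as follows. This is only an illustration under stated assumptions and not the preliminary Matlab code mentioned under Software: the criterion is swapped for a plain leave-one-out RMSE of a one-component PCR (the slides use RMSEP(y) of leverage-corrected PCR), the simplex search is SciPy's Nelder-Mead, the correction spectrum is modelled but not subtracted, and all function names and the synthetic protein/starch data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def emsc_correct(z, x, m, extra):
    """EMSC with model [1, m, x, x^2, extra]; extra is modelled, not subtracted."""
    model = np.column_stack([np.ones_like(x), m, x, x**2, extra])
    coef = np.linalg.lstsq(model, z.T, rcond=None)[0].T
    additive = coef[:, [0, 2, 3]] @ model[:, [0, 2, 3]].T
    return (z - additive) / coef[:, [1]]

def loo_rmse_pcr(spectra, y, n_pcs=1):
    """Leave-one-out RMSE of principal component regression (stand-in criterion)."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        x_mean, y_mean = spectra[keep].mean(axis=0), y[keep].mean()
        xc = spectra[keep] - x_mean
        loadings = np.linalg.svd(xc, full_matrices=False)[2][:n_pcs]
        scores = xc @ loadings.T
        q = np.linalg.lstsq(scores, y[keep] - y_mean, rcond=None)[0]
        pred = y_mean + ((spectra[i] - x_mean) @ loadings.T) @ q
        errors.append(pred - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

def oemsc(z, x, y, n_basis=5, n_pcs=1):
    """Return (correction spectrum, initial criterion, optimized criterion)."""
    m = z.mean(axis=0)
    # 1) V = first n_basis PCA loadings of the untreated, mean-centred spectra.
    v = np.linalg.svd(z - m, full_matrices=False)[2][:n_basis]

    def criterion(t):
        return loo_rmse_pcr(emsc_correct(z, x, m, t @ v), y, n_pcs)

    # 2) Guess an initial t, then simplex (Nelder-Mead) optimization of t.
    t0 = np.zeros(n_basis)
    t0[0] = 1.0
    result = minimize(criterion, t0, method="Nelder-Mead",
                      options={"maxiter": 200, "xatol": 1e-4, "fatol": 1e-8})
    return result.x @ v, criterion(t0), result.fun

# Synthetic two-constituent mixtures with offset, gain and a little noise (assumed data).
rng = np.random.default_rng(1)
lam = np.linspace(850.0, 1050.0, 60)
x = 2.0 * (lam - lam.min()) / (lam.max() - lam.min()) - 1.0
protein = 2.5 + np.exp(-0.5 * ((lam - 910.0) / 20.0) ** 2)
starch = 2.5 + np.exp(-0.5 * ((lam - 980.0) / 25.0) ** 2)
y = rng.uniform(0.0, 1.0, size=20)
pure = y[:, None] * protein + (1.0 - y[:, None]) * starch
z = rng.uniform(-0.3, 0.3, (20, 1)) + rng.uniform(0.8, 1.2, (20, 1)) * pure
z = z + 1e-3 * rng.standard_normal(z.shape)
correction, rmse_init, rmse_opt = oemsc(z, x, y, n_basis=3)
```

Comparing `rmse_init` with `rmse_opt` shows how much the search improved the chosen criterion. Because the criterion is evaluated on the same samples used for the search, a separate test subset is still needed to judge overfitting.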
OEMSC model overfitted?
[Figure: OD against wavelength (860 to 1040), four panels: "Cal: input" and "Cal: After EMSC" (calibration subset); "Pred: input" and "Pred: After EMSC" (test subset)]
Not overfitted
Optimized EMSC parameter: proportional to the analyte variable optimized for (protein)
[Figure: panels labelled "Cal." and "Test"; comparison labelled "Conventional EMSC"]
Multivariate calibration unnecessary?
Does OEMSC also work for other data?
• Limited experience
• One example: protein in ground wheat from log(1/R)
• Water structure: which criterion to optimize?
Software
• EMSC:
• The Unscrambler (www.camo.com)
• PLS Toolbox (www.eigenvector.com)
• OEMSC:
• Matlab (preliminary) from [email protected]