Multivariate data-modelling to separate light scattering from light absorbance: The Optimized Extended Multiplicative Signal Correction (OEMSC)

The importance of spectral preprocessing in data analysis for aquaphotomics

Harald Martens
Dept. Engineering Cybernetics, Norwegian U. of Science and Technology, Trondheim, Norway. Email: [email protected]

When studying complex systems, vis/NIR spectroscopy is informative.
Many causes of variation: light scattering, path length, nonlinear responses, organic absorbers, temperature effects, water concentration, water changes, …
Example: Belousov-Zhabotinsky reaction. Anna Zhyrova, Dalibor Stys, U. South Bohemia.
[Figure: spectra from the Belousov-Zhabotinsky example; horizontal axis ticks 400 to 2400, vertical axis ticks -0.8 to 0.8]

NIT (near-infrared transmission) of «difficult samples»:
• Mixtures of wheat protein and wheat starch powders (5 different ratios: 0, 25, 50, 75, 100% protein)
• Different sample thickness, different sample packing
• 2 technical replicates

MSC: Martens, H., Jensen, S.Å. and Geladi, P. (1983) Multivariate linearity transformation for near-infrared reflectance spectrometry. Proc. Nordic Symp. on Applied Statistics (O.H.J. Christie, ed.), June 12-14, 1983. Stokkand Forlag Publ., Skagenkaien 12, N-4000 Stavanger, Norway, ISBN 82-90496-02-8, 208-234.
Geladi, P., MacDougall, D. and Martens, H. (1985) Linearization and scatter-correction for near-infrared reflectance of meat. Applied Spectroscopy 39(3), 491-500.
EMSC: Martens, H. and Stark, E. (1991) Extended multiplicative signal correction and spectral interference subtraction: New preprocessing methods for near infrared spectroscopy. J. Pharmaceutical & Biomedical Analysis 9(8), 625-635.
Martens, H., Pram Nielsen, J. and Balling Engelsen, S. (2003) Light Scattering and Light Absorbance Separated by Extended Multiplicative Signal Correction. Application to Near-Infrared Transmission Analysis of Powder Mixtures. Anal. Chem. 75(3), 394-404.

Bad news
PLS regression: imperfect. Needs 5 PCs; should need only 1 PC.

The solution, of course: MSC or SNV!

Really bad news
Problem: the MSC regression model could not estimate the offset and slope properly. It was confused by the large chemical variations.

The solution, of course: EMSC with known analyte spectra!
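The MSC problem and the EMSC remedy named above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the two Gaussian "protein" and "starch" bands, the offsets and slopes, and the helper name `fit_scatter` are made up and are not the wheat data or the software from the talk. The data are noise-free, so EMSC with the known difference spectrum recovers the relative slope exactly, while plain MSC does not.

```python
import numpy as np

def fit_scatter(X, model):
    """Least-squares projection of each spectrum (row of X) on the model columns."""
    coef, *_ = np.linalg.lstsq(model, X.T, rcond=None)
    return coef                                   # shape (n_terms, n_samples)

# Synthetic stand-in for the powder mixtures (hypothetical band shapes).
wl = np.linspace(850, 1050, 101)
sp = np.exp(-((wl - 910) / 25) ** 2)              # made-up "protein" spectrum
ss = 0.6 * np.exp(-((wl - 990) / 40) ** 2)        # made-up "starch" spectrum
k = sp - ss                                       # known difference spectrum

c = np.repeat([0.0, 0.25, 0.5, 0.75, 1.0], 2)     # protein fraction, 2 replicates
b_true = np.tile([0.8, 1.2], 5)                   # multiplicative effect (thickness, packing)
a_true = np.linspace(1.8, 2.4, c.size)            # additive baseline offset
X = a_true[:, None] + b_true[:, None] * (ss + np.outer(c, k))

m = X.mean(axis=0)                                # reference: the mean spectrum
ones = np.ones_like(wl)

# MSC: x ~ a + b*m, then (x - a) / b
a_msc, b_msc = fit_scatter(X, np.column_stack([ones, m]))
X_msc = (X - a_msc[:, None]) / b_msc[:, None]

# EMSC with a known analyte spectrum: x ~ a + b*m + h*k, then (x - a) / b
a_e, b_e, h_e = fit_scatter(X, np.column_stack([ones, m, k]))
X_emsc = (X - a_e[:, None]) / b_e[:, None]

# Slopes are only defined relative to the reference, hence the division by the mean.
b_rel = b_true / b_true.mean()
print("max slope error, MSC :", np.max(np.abs(b_msc - b_rel)))
print("max slope error, EMSC:", np.max(np.abs(b_e - b_rel)))
```

In this toy setting the ratio `h_e / b_e` is an exact linear function of the protein fraction `c`, which is the sense in which the chemical variation no longer confuses the offset and slope estimates.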
Using known constituent spectra in the EMSC model: the difference spectrum between pure protein and pure starch.
Good news, but needs extra info.

Stabilize the preprocessing without any prior knowledge
[Figure: panels "raw data", "mean centred" and "After EMSC with wavelength and its square"; O.D. vs. wavelength (nm), 850 to 1000; further panels plotted against the mean]

EMSC without known analyte spectra (Morten Beck Rye):
No known constituent spectra used in the EMSC model.
Instead: included known wavelength info (λ, λ²).
[Figure: spectra after EMSC and the same spectra mean centred; O.D. vs. wavelength (nm), 850 to 1000]
Good news, but nonlinear.

Optimized EMSC (OEMSC)
[Figure: panels "Cal. subset" (input), "Initial EMSC", "Z input, to be optimized for GoodC", "GoodC spectrum, optimized" (estimated correction spectrum), "Optimized EMSC", "Optimized EMSC, centred", and RMSEP(protein) vs. # of PCR PCs (legend: "k:" = start EMSC, "ro" = Opt. EMSC). Panel annotations: Init. AOpt=7, RMSEP=0.025424; AOpt=9; OptMethod=1 for Gluten; nMSEPWgts=1; RMSEP(Y)=0.0022494]
Estimate a correction spectrum in the EMSC model (in addition to λ, λ²).
Simplex optimization of RMSEP(protein).
Good news: linear.

Optimize the EMSC model automatically
Model spectra:
• The mean spectrum, m
• Baseline 1,1,1,1,1,1…
• λ
• λ²
• Correction spectrum (optimized)

EMSC:
a) Estimation: project each input sample spectrum on these model spectra.
b) Correction: subtract the baseline level and the λ- and λ²-effects. Divide by the m-effect.

OEMSC:
1) Define V = the first (e.g. 5) PCA PCs of the untreated sample spectra. The correction spectrum will be a linear combination tV, found by optimizing t (1 x 5).
2) Find the optimal correction spectrum. Here: guess an initial t, then Simplex optimization of t for a chosen criterion (e.g. RMSEP(y) of leverage-corrected PCR). Many alternative optimization criteria are possible!

OEMSC model overfitted?
[Figure: calibration subset, panels "Cal: input" and "Cal: After EMSC"; OD vs. wavelength, 860 to 1040]
[Figure: test subset, panels "Pred: input" and "Pred: After EMSC"; OD vs. wavelength, 860 to 1040]
Not overfitted.

Optimized EMSC parameter: proportional to the analyte variable optimized for (protein).
[Figure labels: "Cal.", "Test", "Conventional EMSC"]
Multivariate calibration unnecessary?

Does OEMSC also work for other data?
• Limited experience
• One example: protein in ground wheat from log(1/R)
• Water structure: which criterion to optimize?

Software
• EMSC:
  • The Unscrambler (www.camo.com)
  • PLS Toolbox (www.eigenvector.com)
• OEMSC:
  • Matlab (preliminary) from [email protected]
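For readers without the Matlab code, the EMSC steps (a, b) and the OEMSC steps (1, 2) described in this talk could be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the author's implementation. The function names `emsc`, `rmsecv_pcr` and `oemsc` are hypothetical, and the data are synthetic. The criterion is leave-one-out PCR with one PC, used as a stand-in for the leverage-corrected PCR mentioned on the slide. Whether the correction spectrum's effect is subtracted is not stated in the slides; here it is modelled but not subtracted. SciPy's Nelder-Mead method serves as the simplex optimizer.

```python
import numpy as np
from scipy.optimize import minimize

def emsc(X, wl, m, corr=None):
    """EMSC: project each spectrum on [1, λ, λ², m, (corr)]; subtract the
    baseline and λ, λ² effects, then divide by the m-coefficient."""
    w = (wl - wl.mean()) / (wl.max() - wl.min())      # scaled wavelength axis
    cols = [np.ones_like(w), w, w**2, m]
    if corr is not None:
        cols.append(corr)                             # modelled, not subtracted (assumption)
    M = np.column_stack(cols)
    P, *_ = np.linalg.lstsq(M, X.T, rcond=None)       # shape (n_terms, n_samples)
    return (X - (M[:, :3] @ P[:3]).T) / P[3][:, None]

def rmsecv_pcr(Xc, y, n_pc=1):
    """Leave-one-out RMSE of principal component regression (stand-in criterion)."""
    if not np.all(np.isfinite(Xc)):
        return 1e6                                    # penalize degenerate corrections
    errs = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        mu, ybar = Xc[keep].mean(axis=0), y[keep].mean()
        U, s, Vt = np.linalg.svd(Xc[keep] - mu, full_matrices=False)
        T = U[:, :n_pc] * s[:n_pc]                    # calibration scores
        q, *_ = np.linalg.lstsq(T, y[keep] - ybar, rcond=None)
        t_new = (Xc[i] - mu) @ Vt[:n_pc].T            # score of the left-out sample
        errs.append(y[i] - (ybar + t_new @ q))
    return float(np.sqrt(np.mean(np.square(errs))))

def oemsc(X, y, wl, n_basis=5):
    """Step 1: V = first PCA loadings of the untreated spectra.
    Step 2: simplex search over t so that corr = t V minimizes the criterion."""
    m = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - m, full_matrices=False)
    V = Vt[:n_basis]
    crit = lambda t: rmsecv_pcr(emsc(X, wl, m, t @ V), y)
    t0 = np.zeros(n_basis)
    t0[0] = 1.0                                       # initial guess for t
    res = minimize(crit, t0, method="Nelder-Mead", options={"maxiter": 300})
    return res.x @ V, crit(t0), res.fun

# Synthetic example (made-up bands and scatter effects, not the wheat data).
rng = np.random.default_rng(0)
wl = np.linspace(850, 1050, 60)
sp = np.exp(-((wl - 910) / 25) ** 2)
ss = 0.6 * np.exp(-((wl - 990) / 40) ** 2)
y = np.repeat([0.0, 0.25, 0.5, 0.75, 1.0], 4)
a = rng.uniform(1.5, 2.5, y.size)
b = rng.uniform(0.7, 1.3, y.size)
d = rng.uniform(-0.2, 0.2, y.size)                    # wavelength-dependent scatter
X = (a[:, None] + np.outer(d, (wl - 950) / 100)
     + b[:, None] * (ss + np.outer(y, sp - ss))
     + 1e-3 * rng.standard_normal((y.size, wl.size)))

corr, crit0, crit_opt = oemsc(X, y, wl)
print("criterion with initial t  :", crit0)
print("criterion with optimized t:", crit_opt)
```

The optimized criterion can never be worse than the initial one here, because the initial guess is a vertex of the starting simplex. Checking for overfitting would still need a separate test subset, as in the talk.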