Machine Learning Methods for ENSO Analysis and Predictions

Carlos H. R. Lima - Dept. of Civil and Environmental Engineering, University of Brasilia, Brazil.
Upmanu Lall - Earth & Env. Eng., Columbia Water Center, Columbia University, New York, USA.
Tony Jebara - Computer Science, Columbia University, USA.
Anthony G. Barnston - International Research Institute for Climate and Society, Columbia University, USA.

Motivation
• ENSO plays a vital role in global climate variability;
• ENSO forecasts are very limited for lead times beyond 6 months.

Our goal: use machine learning methods to better understand ENSO and to improve ENSO forecasts:
• A nonlinear dimensionality reduction method (MVU) to obtain covariates from a large climate data set (Tropical Pacific D20 data);
• LASSO regression to shrink model coefficients, since the number of predictors is large (> 50).

Climate Dataset
Here we extend previous work (Lima et al., 2009) and apply MVU to the new and updated NOAA/NCEP GODAS sub-surface ocean dataset. We focus on the depth of the 20 °C isotherm (D20) of the tropical Pacific Ocean, which is a proxy for the thermocline depth and one of the main carriers of ENSO information.
Details: We restrict our analysis to the Pacific D20 within the latitudinal and longitudinal bands bounded by 26N and 28S and by 122E and 77W, respectively. The dataset covers the period from January/1980 through May/2014 and consists of 21001 data points located on an equally spaced grid.

Step 1: Maximum Variance Unfolding
Maximum variance unfolding (MVU) was originally developed by Weinberger and Saul (2006) and has its origins in Kernel PCA, which uses a nonlinear mapping of the original data to a transformed space in which the data structure is expected to be linear. Using the kernel trick, dual PCA can be applied in this space to obtain a lower-dimensional representation of the original data. MVU is a nonparametric approach: the nonlinear function is not assumed a priori, and the kernel matrix is obtained from the original data by semidefinite programming. The goal is to maximize the sum of the eigenvalues (the trace) of the kernel matrix while preserving neighbors in the original and transformed spaces.

Basics of MVU
Question: given $N$ high-dimensional inputs $x_i \in \mathbb{R}^D$, how can we compute outputs $y_i \in \mathbb{R}^d$, where $d \ll D$, such that nearby points remain nearby and distant ones remain distant?
Mathematically, the mapping can be expressed as: $\Phi: x_i \in \mathbb{R}^D \mapsto \Phi(x_i), \quad i = 1, \dots, N$.
Idea: apply PCA in the space defined by $\Phi(x_i)$ rather than in $X$. However, $\Phi$ can be huge.
Solution: the kernel trick. We do not need to compute the mapping explicitly, only the dot products.
E.g., for a quadratic feature map $\Phi(w)$ (all pairwise products $w_i w_j$): $K(w, z) = \Phi(w) \cdot \Phi(z) = (w \cdot z)^2$.
Hence, $K_{ij} = \Phi(x_i) \cdot \Phi(x_j)$.

Step 2: LASSO regression
Basic idea: LASSO shrinks the model coefficients by minimizing the mean squared error subject to a constraint on the sum of the absolute values of the coefficients.

ENSO Correlation and the UNB/CWC ENSO Forecast Model
[Figure: Cross-correlation function of MVU (thicker lines) and PC (thin lines) modes with NINO3.4.]
[Figure: Cross-correlation function of MVU (thicker lines) and PC (thin lines) modes with WWV.]
[Figure: Temporal correlation of the D20 gridded data and MVU (left) and PC (right) modes: first, second and third from top to bottom.]

Forecast model (for lead time $\tau$):
$$\mathrm{NINO}(t+\tau) = a_\tau + a'_\tau\,\mathrm{NINO}(t) + \sum_{l=t-24}^{t} \left[ b_{\tau,l}\,\mathrm{MVU}_1(l) + c_{\tau,l}\,\mathrm{MVU}_2(l) + d_{\tau,l}\,\mathrm{MVU}_3(l) \right]$$

[Figure: 10-fold cross-validation: correlation and MSE skills for MVU (black) and PC (red) models.]
[Figure: 1982, 1997 and 2014 ENSO events. Panels: NINO3.4 and WWV lagged by 9 months; NINO3.4 and MVU2 lagged by 9 months; NINO3.4 and MVU3 lagged by 12 months.]
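The LASSO idea from Step 2 can be illustrated with a minimal sketch. It is not the model fitted for the poster: it uses the equivalent penalized form of the constrained problem, a plain coordinate-descent solver written here for illustration, and synthetic predictors that merely stand in for lagged climate covariates. The names `soft_threshold`, `lasso_coordinate_descent` and the penalty `lam` are assumptions of this example.

```python
import random


def soft_threshold(value, threshold):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - b0 - X b||^2 + lam * ||b||_1.

    X is a list of n rows with p predictors each; y holds n targets.
    Returns (intercept, coefficients). Predictors are centered internally.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residual with all coefficients at zero
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant predictor carries no information
            # Covariance of predictor j with the partial residual
            # (the residual with predictor j's own contribution added back).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new_bj
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


# Synthetic example (assumption): 10 candidate predictors, only two matter.
random.seed(0)
n_samples, n_predictors = 200, 10
X = [[random.gauss(0.0, 1.0) for _ in range(n_predictors)]
     for _ in range(n_samples)]
y = [1.0 + 2.0 * row[0] - 1.5 * row[3] + random.gauss(0.0, 0.1) for row in X]

intercept, beta = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected predictors:", selected)
print("coefficients:", [round(b, 3) for b in beta])
```

Larger values of `lam` set more coefficients exactly to zero, which is what makes the approach attractive when the number of lagged predictors is large relative to the record length. In practice the penalty would be chosen by cross-validation, as in the 10-fold scheme mentioned for the forecast model.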
Results: Thermocline Modes of Variability

Correlation between MVU and PC modes, and variance explained (Var exp):

MODE      PC1     PC2     PC3     Var exp
MVU1      0.75    -0.51   -0.26   57%
MVU2      0.46    0.51    -0.04   15%
MVU3      0.06    -0.25   0.50    8%
Var exp   24%     16%     7%      80% (MVU total) / 47% (PC total)

[Figure: MVU (thicker lines) and PC (thin lines) modes for the thermocline data.]
[Figure: Wavelet analysis of MVU (left) and PC (right) modes 1 (top) to 3 (bottom).]
[Figure: Correlation and variance explained (top), Kendall's tau for temporal trends (middle) and WWV series (bottom).]
[Figure: NINO3.4 predictions from MAR/2014.]
[Figure: NINO3.4 predictions from SEP/2014.]

Summary and Future Work
• MVU modes explain more variance, with larger amplitude and fewer cycles;
• Monotonic increasing trend in the first MVU mode: a trend in the thermocline tilt?
• Patterns of the second and third MVU modes differ from those of the corresponding PCs and are more correlated with NINO3.4;
• The LASSO forecast model shrinks coefficients and shows appreciable skill up to a 15-month lead time;
• Future work will explore forecasts for other ENSO indices as well as for the thermocline/SST fields and other ENSO-related variables.

Acknowledgment
We thank IRI for providing the climate datasets and K. Q. Weinberger for making his MVU code available. The first author acknowledges the financial support from Colorado State University and ORAU to attend Climate Informatics 2014.

References
• Lima, C. H. R., Lall, U., Jebara, T., Barnston, A. G., 2009. Statistical Prediction of ENSO from Subsurface Sea Temperature Using a Nonlinear Dimensionality Reduction. J. Climate 22, 4501–4519.
• Weinberger, K. Q., Saul, L., 2006. Unsupervised Learning of Image Manifolds by Semidefinite Programming. Int. J. Comp. Vision 70 (1), 77–90.