E XPLICIT L INK B ETWEEN P ERIODIC C OVARIANCE F UNCTIONS AND S TATE S PACE M ODELS ¨ a¨ Arno Solin and Simo Sarkk Department of Biomedical Engineering and Computational Science Aalto University, Finland [email protected] April 22, 2014 I Gaussian processes and state space models I Periodic covariance functions I Quasi-periodic covariance functions I Example studies Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 2/26 Gaussian Process Regression in 1D I The kernel approach: f (t) ∼ GP(0, k (t, t 0 )) yk = f (tk ) + εk , εk ∼ N (0, σn2 ), where the observed data is {(tk , yk ), k = 1, 2, . . . , n}. I Prior assumptions of the process encoded into the covariance function k (t, t 0 ). I Can be solved in closed-form, but the naive solution scales as O(n3 ). Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 3/26 Gaussian Process Regression in 1D I The kernel approach: f (t) ∼ GP(0, k (t, t 0 )) yk = f (tk ) + εk , εk ∼ N (0, σn2 ), where the observed data is {(tk , yk ), k = 1, 2, . . . , n}. I State space approach: df(t) = Ff(t) + Lw(t) dt yk = Hf(tk ) + εk , εk ∼ N (0, σn2 ), where w(t) is a white noise process with spectral density Qc . I Prior assumptions of the process I Model defined by F, L, Qc , the I Can be solved in closed-form, I Solved using Kalman filtering encoded into the covariance function k (t, t 0 ). but the naive solution scales as O(n3 ). stationary covariance P∞ , and the observation model H. and Rauch–Tung–Striebel smoothing in O(n) time complexity. Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 3/26 Covariance and Spectral Density k (τ ) = −4 −2 0 1 2π 2 Z ∞ S(ω) exp(−i ωτ ) dω. −∞ 4 −5 0 τ = |t − t 0 | ω Covariance function k (τ ) Spectral density S(ω) 5 Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 4/26 GP Regression (naive solution) 3 Data Mean 95% confidence interval 2 Output, y 1 0 −1 −2 −3 −1 0 Input, t 1 Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 5/26 GP Regression (filtering) 3 Data Mean 95% confidence interval 2 Output, y 1 0 −1 −2 −3 −1 0 Input, t 1 Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 6/26 GP Regression (smoothing) 3 Data Mean 95% confidence interval 2 Output, y 1 0 −1 −2 −3 −1 0 Input, t 1 Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 7/26 Periodic Covariance Functions Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 8/26 Canonical Periodic Covariance Function Start off with the squared exponential: kx − x0 k2 k(x, x0 ) = σ 2 exp − 2`2 Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 9/26 Canonical Periodic Covariance Function Start off with the squared exponential: kx − x0 k2 k(x, x0 ) = σ 2 exp − 2`2 Polar coordinates: cos(ω0 t) x(t) = sin(ω0 t) Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 9/26 Canonical Periodic Covariance Function 1 π 2 Start off with the squared exponential: kx − x0 k2 k(x, x0 ) = σ 2 exp − 2`2 Polar coordinates: cos(ω0 t) x(t) = sin(ω0 t) 3 π 4 1 π 4 π The canonical periodic covariance: 0 2 sin2 ω0 t−t 2 0 2 kp (t, t ) = σ exp− `2 −2σ 0 − 43 π 2σ 0 − 14 π − 21 π Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 9/26 Canonical Periodic Covariance Function −6π −4π −2π 0 2π |t − t 0 | Covariance function 4π 6π −6 −4 −2 0 2 4 6 ω Spectral density Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 10/26 State Space Formulation (1/2) I Fourier series representation (τ = |t − t 0 |) kp (τ ) = ∞ X qj2 cos(j ω0 τ ) j=0 Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 11/26 State Space Formulation (1/2) I Fourier series representation (τ = |t − t 0 |) kp (τ ) = ∞ X qj2 cos(j ω0 τ ) j=0 I The model can be constructed as solutions to second-order ODEs: x˙ j (t) 0 = y˙ j (t) ω0 j −ω0 j 0 xj (t) yj (t) Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 11/26 State Space Formulation (2/2) I The state space model can be given as a superposition the following kind of models: Fpj 0 ω0 j = −ω0 j 0 , Hpj = 1 0 , Pp∞,j = qj2 I2 The diffusion part is zero (i.e. the model is deterministic). Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 12/26 State Space Formulation (2/2) I The state space model can be given as a superposition the following kind of models: Fpj 0 ω0 j = −ω0 j 0 , Hpj = 1 0 , Pp∞,j = qj2 I2 The diffusion part is zero (i.e. the model is deterministic). I The spectral (variance) coefficients are qj2 = 2 Ij (`−2 ) , for j = 1, 2, . . . , exp(`−2 ) and q02 = I0 (`−2 )/ exp(`−2 ), where Iα (z) is the modified Bessel function. Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 12/26 State Space Formulation (2/2) I The state space model can be given as a superposition the following kind of models: Fpj 0 ω0 j = −ω0 j 0 , Hpj = 1 0 , Pp∞,j = qj2 I2 The diffusion part is zero (i.e. the model is deterministic). I The spectral (variance) coefficients are qj2 = 2 Ij (`−2 ) , for j = 1, 2, . . . , exp(`−2 ) and q02 = I0 (`−2 )/ exp(`−2 ), where Iα (z) is the modified Bessel function. I Taking the J first terms in the series gives an approximation, and this approximation converges uniformly to the actual covariance as J → ∞. Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 12/26 Approximative Covariance Function (J = 0) −6π −4π −2π 0 2π |t − t 0 | Covariance function 4π 6π −6 −4 −2 0 2 4 6 ω Spectral density Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 13/26 Approximative Covariance Function (J = 1) −6π −4π −2π 0 2π |t − t 0 | Covariance function 4π 6π −6 −4 −2 0 2 4 6 ω Spectral density Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 14/26 Approximative Covariance Function (J = 2) −6π −4π −2π 0 2π |t − t 0 | Covariance function 4π 6π −6 −4 −2 0 2 4 6 ω Spectral density Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 15/26 Approximative Covariance Function (J = 3) −6π −4π −2π 0 2π |t − t 0 | Covariance function 4π 6π −6 −4 −2 0 2 4 6 ω Spectral density Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 16/26 Quasi-Periodic Covariance Functions Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 17/26 Quasi-Periodic Covariance I Allow the shape of the periodic effect to change over time. I Take the product of a periodic covariance function kp (t, t 0 ) with a covariance function kq (t, t 0 ) with rather long characteristic length-scale, k(t, t 0 ) = kp (t, t 0 ) kq (t, t 0 ), allowing the covariance to decay away from exact periodicity. Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 18/26 Quasi-Periodic Covariance Function −6π −4π −2π 0 2π |t − t 0 | Covariance function 4π 6π −6 −4 −2 0 2 4 6 ω Spectral density Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 19/26 Quasi-Periodic Covariances in State Space Form I State space representation of both the ‘quasi’ and ‘periodic’ part. I The quasi-periodic state space model needs to be set up so that the feedback matrices commute (Fp Fq = Fq Fp .). I This can be accomplished by using the special features of the Kronecker product. Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 20/26 Quasi-Periodic Covariances in State Space Form I State space representation of both the ‘quasi’ and ‘periodic’ part. I The quasi-periodic state space model needs to be set up so that the feedback matrices commute (Fp Fq = Fq Fp .). I This can be accomplished by using the special features of the Kronecker product. I The joint model corresponding to the quasi-periodic product of the two covariance functions can then be given in a block-form: Fj Lj Qc,j P∞,j Hj = = = = = Fq ⊗ I2 + Iq ⊗ Fpj , Lq ⊗ Lpj , Qqc ⊗ qj2 I2 , Pq∞ ⊗ Pp∞,j , Hq ⊗ Hpj , where ‘ ⊗ ’ denotes the Kronecker product of two matrices. Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 20/26 Example Studies Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 21/26 Example: Computational Complexity 102 CO2 concentration (ppm) Computation time (seconds) Arno Solin and Simo S¨ arkk¨ a Full GP solution 101 100 State space solution 10−1 10−2 0 2000 4000 6000 8000 Number of data points, n 10 000 410 400 390 380 370 2000 Figure 5: Demonstration of the computational benefits Figure 6: CO of the state space model in solving a GP regression values for ye problem for a number of data points up to 10 000 and gether with th Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ with ten repetitions. The state space model execution shaded patch 22/26 times grow exactly linearly. thin lines from Example: Mauna Loa CO2 Concentration I CO2 concentration observations CO2 concentration (ppm) and Simo S¨ arkk¨ a 410 (n = 2227, years 1958–2000 not shown in figure). 400 I The GP covariance function is as follows: 390 k(t, t 0 ) = kSE (t, t 0 ) 380 + kp (t, t 0 ) kν=3/2 (t, t 0 ) + kν=3/2 (t, t 0 ) 370 2000 000 2005 2010 Time (years) 2015 2020 s n d n Figure 6: CO2 concentration observations (n = 2227, values for years 1958–2000 not shown in figure) together with the 95% predictive confidence region (the shaded patch is from the state space model, and the thin lines from the exact GP solution). s s e 4.2 Modeling Carbon Dioxide Concentration In this section the method is applied to the well-known I Converted to state space and hyperparameters optimized with respect to marginal likelihood. Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 23/26 Example: Daily Births in 1969–1988 Explicit Link Between Periodic Covariance Functions and State Space Models Trends 110 100 90 Slow trend 80 Seasonal effect 1970 1973 Fast non-periodic component 1976 1979 1982 1985 1988 110 100 90 1972 80 1980 1988 Day of week effect Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 110 100 90 1972 80 Mon Tue 1980 Wed 1988 Thu Fri Sat Sun Figure 7: Relative number of births in the US based on daily data between 1969–1988 (n = 7305). The first plot shows the non-periodic long-term effects, the two latter the quasi-periodic seasonal and weekly effects. the region corresponding to the full GP solution. The approximation error is negligible even though both the squared exponential and the periodic covariance func- a quasi-periodic covariance function with a period of I number of inern the one week Relative (J = 6, hyperparameters σ52 , `births 5 ) and Mat´ (ν = 3/2,based hyperparameter `6 ) damping. This is simon daily data between ilar to [19], but special days are not considered sepa1969–1988 = 7305). rately. The observations are(n assumed to be corrupted by Gaussian noise with variance σn2 . US I The model is as follows: Optimizing (quasi-Newton BFGS) the marginal likeliI Matern (ν 11 = 7/2) for the slow trend hood with respect to all the hyperparameters gives I are Matern (ν in=Figure 3/2) for variation the results that shown 7. faster All the plots I Quasi-periodic (yearly) with have been scaled in the same way to show differences Matern (ν = 3/2) relative to a baseline of 100. The firstdamping subfigure shows Quasi-periodic (weekly) with the slow trendIover the 20-year period and the faster Matern (νThe = 3/2) damping subfignon-periodic component. two remaining ures visualize the periodic yearly and weekly effects I Converted state space for years 1972, 1980, and to 1988. The day of weekand and seasonal hyperparameters effects are clearly quasi-periodic; the rising optimized with number of induced births and selective C-sections has respect to effect. marginal likelihood affected the day of week The results agree with those of [19], and this can be regarded a successful example of a beneficial reformulation of a GP model in terms of sequential inference. 5 CONCLUSION This paper has established the explicit connection between periodic covariance functions and stochastic dif-state space models Explicit link between periodic covariance functions and ¨ a¨ ferential equations. This link enables the use of effi- Solin and Sarkk 24/26 cient sequential inference methods to solve periodic Conclusion I We have established the explicit connection between periodic covariance functions and state space models. I This link enables the use of efficient sequential inference methods to solve periodic GP regression problems in O(n) time complexity. I The approximation converges uniformly and a rough upper bound for the error can be given in closed-form. I This is a ‘best of both worlds’ approach; it brings together the convenient model specification and hyperparametrization of GPs with the computational efficiency of state space models. Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 25/26 Explicit Link Between Periodic Covariance Functions and State Space Models ¨ a¨ Arno Solin Simo Sarkk Department of Biomedical Engineering and Computational Science Aalto University, Finland P ERIODIC C OVARIANCE F UNCTIONS I The canonical periodic covariance function: 0 2 sin2 ω0 t−t 2 kp(t, t 0) = exp − , `2 where ` is the characteristic length-scale and ω0 defines the angular velocity (period length). I The covariance function can be expanded into a (almost everywhere) convergent Fourier series (τ = |t − t 0|) ∞ X kp(τ ) = qj2 cos(j ω0 τ ). I NTRODUCTION I Gaussian processes (GPs, [2]) are commonly used modeling tools in non-parametric machine learning. I Prior assumptions are encoded into the covariance function (kernel). I We show that periodic covariance functions in GP regression can be rewritten as state space models. I Reduces the problematic O(n3) computational complexity to O(n) in the number of observations n. I The model is written in terms of a series of stochastic resonators. I Generalizes to quasi-periodic (almost periodic) covariance functions. j=0 I The differential equation model is a superposition the following kind of models [1]: 0 −ω0 j Fpj = , ω0 j 0 and the diffusion part is zero (i.e. the model is deterministic), Hpj = 1 0 , and Pp∞,j = qj2I2. I The spectral (variance) coefficients qj2 are given by qj2 = G AUSSIAN P ROCESS R EGRESSION 2 Ij (`−2) , for j = 1, 2, . . . , exp(`−2) R EFERENCES I It is often desirable to allow for seasonable periodic variation, allowing the shape of the periodic effect to change over time. I A common way of constructing quasi-periodic covariances is to take the product of a periodic covariance function kp(t, t 0) with a covariance function kq(t, t 0) with rather long characteristic length-scale, df(t) = Ff(t) + Lw(t) dt yk = Hf(tk ) + εk , εk ∼ N (0, σn2), http://arno.solin.fi I An example implementation is available on the author web page: http://becs.aalto.fi/~asolin/ I The method is also a part of the GP STUFF toolbox for Matlab/Octave. Q UASI -P ERIODIC C OVARIANCE F UNCTIONS I Certain classes of covariance functions allow to work with the mathematical dual, where the Gaussian process is constructed as a solution to a mth order linear stochastic differential equation (SDE). I State space representation: The GP regression problem can also be given as: I Codes for the examples available at: E XAMPLE I MPLEMENTATION and q02 = I0(`−2)/ exp(`−2), where Iα(z) is the modified Bessel function. I Taking the J first terms in the series gives an approximation, and this approximation converges uniformly [1] to the actual covariance as J → ∞. I Kernel representation: In GP regression the model functions f are assumed to be realizations from a GP prior, and the observations yk , k = 1, 2, . . . , n, corrupted by Gaussian noise: f (t) ∼ GP(0, k (t, t 0)) yk = f (tk ) + εk , εk ∼ N (0, σn2) C ONCLUSIONS I We have established the explicit connection between periodic covariance functions and state space models. I This link enables the use of efficient sequential inference methods to solve periodic GP regression problems in O(n) time complexity. I The approximation converges uniformly and a rough upper bound for the error can be given in closed-form. I This is a ‘best of both worlds’ approach; it brings together the convenient model specification and hyperparametrization of GPs with the computational efficiency of state space models. ¨ a¨ (2014). “Explicit link between periodic [1] A. Solin and S. Sarkk covariance functions and state space models.” Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS). JMLR W&CP vol. 33. [2] C.E. Rasmussen and C.K.I. Williams (2006). “Gaussian Processes for Machine Learning.” The MIT Press. ¨ a¨ (2013). “Bayesin Filtering and Smoothing.” [3] S. Sarkk Cambridge University Press. [4] A. Gelman, J.B. Carlin, H.S. Stern, D.B. Dunson, A. Vehtari, and D.B. Rubin. (2013). “Bayesian Data Analysis.” Third edition. Chapman & Hall/CRC Press. k (t, t 0) = kp(t, t 0) kq(t, t 0), where w(t) is a multi-dimensional white noise process with spectral density Qc. I The model is defined by the feedback matrix F, the noise effect matrix L, the spectral density Qc, the stationary covariance P∞, and the observation model H. I The inference problem can now be solved using Kalman filtering [3] in O(nm3) time complexity. allowing the covariance to decay away from exact periodicity. I The joint model corresponding to the quasi-periodic product of the two covariance functions can then be given [1] in a block-form: I The methods also implemented into the Re yk jav ik, Icel and. Fj = Fq ⊗ I2 + Iq ⊗ Fpj , Lj = Lq ⊗ Lpj , Qc,j = Qqc ⊗ qj2 I2, P∞,j = Pq∞ ⊗ Pp∞,j , Hj = Hq ⊗ Hpj , where ‘⊗’ denotes the Kronecker product of two matrices. , 14 Th rdinates. 20 polar coo es d in TS e TA lize cu AIS ua rv vis es nd ar er ra Con Explicit Link Between Periodic Covariance Functions and State Space Models tact rio an E XAMPLE S TUDY deta dom Pp ils: ic G draw I A simulated example showing the arn s from a period a quasi-periodic covariance function with a period of 110 o.so i computational efficiency. lin.f lin@ one week (J = 6, hyperparameters σ52 , `5 ) and Mat´ern o.so aalto. I Prediction of CO2 levels using weekly data 100 fi, web address: http://arn (ν = 3/2, hyperparameter `6 ) damping. This is simTrends D EMONSTRATIONS (see [2]), where we compare the approximation to the full GP result. I Explaining the periodic variation in the number of births per day in the US (see [4]). 0 100 410 400 390 State space solution State space solution 10−1 10−2 380 370 10 000 0 4000 20006000 40008000 600010 0008000 2000 Number of dataNumber points, nof data points, n F ULL GP 410 400 1973 Fast non-periodic component 1976 1979 1982 1985 1988 110 100 90 1972 80 1980 1988 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 390 380 370 2005 2000 2010 2005 Time (years) 2015 2010 2020 2015 Time (years) 2020 Figure 5: Demonstration 5: Demonstration of the computational benefits benefits 6: CO concentration 6: CO2 concentration observations (n observations 2227,values (nfor = years 2227, 2Figure Demonstration ofFigure the computational benefitsofofthe thecomputational state space Figure CO observations (n ==2227, 2 concentration of thein state of themodel state in space solving model aforGP in solving regression GP values regression for years values 1958–2000 fornot years not1958–2000 shown intogether not figure) shown to- in model solvingspace a GP regression problem a number ofa data 1958–2000 shown in figure) with thefigure) 95% toproblem problem number of fordata a number points of updata to 10 points 000space and up to 10 000 and points up tofor 10a000 and with ten repetitions. The state predictive confidence region (the region shaded patch isregion from the state gether with the gether 95% with predictive the 95% confidence predictive confidence (the (the model execution timesten grow exactly linearly. model, thin the lines from and the exact GP solution). with ten repetitions. with The repetitions. state space The model stateexecution space model shaded execution patchspace shaded is from patch theand state isthe from space model, state space the model, and the times grow exactly times linearly. grow exactly linearly. thin lines fromthin the lines exactfrom GP the solution). exact GP solution). The approximation The approximation still converges still uniformly converges as long uniformly as as long as 4.2 Modeling 4.2 Carbon Modeling Dioxide Carbon Concentration Dioxide Concentration kq (0) < ∞ andkqthe (0) approximation < ∞ and the approximation for kp (τ ) converges for kp (τ ) converges In this section In thethis method section is applied the method to the is applied well-known to the well-known uniformly. The uniformly. matrices The (32) matrices result in (32) a very result sparse in a very sparse 1 series data time consisting series data of1 atmospheric consisting ofCO atmospheric CO2 conmodel, and sparse model, matrix and sparse methods matrix can be methods employed can be time employed 2 concentration readings centration in parts readings per million in parts (ppm) per million by vol- (ppm) by volfor the matrixfor exponentials the matrixand exponentials multiplication. and multiplication. ume from air samples ume from collected air samples at the collected Mauna at Loa the obMauna Loa obFigure 3b shows Figure the 3b quasi-periodic shows the quasi-periodic covariance funccovariance funcservatory, Hawaii servatory, (see, e.g., Hawaii [1]).(see, The e.g., observations [1]). Theare observations are tion corresponding tion corresponding to squared exponential to squareddamping exponential damping monthly from monthly 1958 to May from 1974, 1958 to after May which 1974,the after ob- which the ob(ν → ∞) of the (ν periodicity → ∞) of the (the periodicity squared exponential (the squared exponential servations are servations weekly, resulting are weekly, in 2227 resulting measurements in 2227 measurements covariance is represented covariance is byrepresented the dashed by line), theand dashed Fig-line), and Figaltogether. Data altogether. collectedData aftercollected year 2010 after were year re-2010 were reure 4b shows draws ure 4bfrom shows thedraws corresponding from the corresponding prior. prior. tained for validation. tained for validation. In modeling practical GP problems modeling it isproblems common itto is common to EXPERIMENTAL 4 EXPERIMENTAL RESULTS RESULTSIn practical GP combine several combine simpleseveral covariance simple functions covariance in order functions in order to come up with to come a model up with structure a model thatstructure meets the that meets the In this section In the thiscomputational section the computational efficiency of the efficiency of the requirements of requirements the phenomenon. of the phenomenon. The following rather The following rather method is firstmethod demonstrated is first demonstrated by applying itbytoapplying simit to simsimplified model simplified is considered model for is considered the covariance: for the covariance: ulated data, after ulated which data, two after empirical which two sets empirical of data aresets of data are used to show that usedthe to show method thatis the feasible method in real-world is feasible in real-world k(τ ) = k1 (τ ) +k(τ k2)(τ=) kk31(τ (τ)) + + kk42(τ (τ), ) k3 (τ )(33) + k4 (τ ), (33) applications. applications. 4 where k1 (·) is where a squared k1 (·)exponential is a squaredcovariance exponential funccovariance funcDemonstrating 4.1 Demonstrating the Computational the Computational tion for the slow tion rising for the trend slow (hyperparameters rising trend (hyperparameters σ12 , `1 ), σ12 , `1 ), Efficiency Efficiency k2 (·) the canonical k2 (·) periodic the canonical covariance periodic function covariance with afunction with a period of one year period (hyperparameters of one year (hyperparameters σ22 , `2 ), k3 (·) isσ2a2 , `2 ), k3 (·) is a To illustrate the To efficiency illustrate of thethe efficiency proposed of model, the proposed let model, let covariance function covariance of thefunction Mat´ernofclass the with Mat´eνrn=class 3/2 with ν = 3/2 f (t) be a Gaussian f (t) be process a Gaussian simulated process fromsimulated a GP prior from a GP prior (hyperparameter (hyperparameter `3 ), and k4 (·) is`3a),covariance and k4 (·) isfunction a covariance function with a periodicwith covariance a periodic function covariance with function unit paramewith unit parameof the Mat´ernofclass the with Mat´ern ν = class 3/2with (hyperparameters ν = 3/2 (hyperparameters ters. The stateters. space The solution state space is benchmarked solution is against benchmarked2 against σ4 , `4 ). The observations σ42 , `4 ). Theare observations assumed toare beassumed corrupted to be corrupted a naive GP implementation a naive GP implementation in Mathworks in Matlab Mathworks (im- Matlab (imby Gaussian noise by Gaussian with variance noise with σn2 . variance σn2 . plemented as in plemented [1] using the as inCholesky [1] usingdecomposition). the Cholesky decomposition). Maximizing the Maximizing marginal the likelihood marginal (quasi-Newton likelihood (quasi-Newton Figure 5 showsFigure the results 5 shows for the simulated results GP for simulated regression GP regression BFGS) with respect BFGS)towith the respect hyperparameters to the hyperparameters and preand preproblems withproblems the number withofthe observations number ofranging observations up ranging up dicting 20 years dicting forward 20 years gives forward the results givesthat the are results that are to n = 10 000 and to n with = 10ten 000repetitions and with ten each. repetitions The perieach. The perishown in Figure shown 6. The in Figure predictive 6. The 95%predictive confidence 95% re- confidence reodic model was odic truncated model was at Jtruncated = 6, and at yetJthe = 6, worst and yet the worst gion may be compared gion may to be the compared solid line to the representing solid line representing −3 case root-mean-square case root-mean-square error was ∼ 10 error . was As stated ∼ 10−3 . As stated in Section 2 the in computational Section 2 the computational complexity truly complexity scales truly1 Data scalesavailable1 Data from available ftp://ftp.cmdl.noaa.gov/ccg/ from ftp://ftp.cmdl.noaa.gov/ccg/ linearly with respect linearlytowith the respect numbertoofthe observations. number of observations. co2/trends/. co2/trends/. 4.1 Seasonal effect 102 Full 1GP solutionFull GP solution 10 2000 WITH THE Day of week effect 100 CO2 concentration (ppm) 101 10−2 Slow trend 1970 C ONSISTENCY CO2 concentration (ppm) Computation time (seconds) Computation time (seconds) 102 10−1 90 80 Arno Solin and Arno Simo Solin S¨ arkk¨ and a Simo S¨ arkk¨ a C OMPUTATIONAL E FFICIENCY 110 100 90 1972 80 Mon Tue 1980 Wed 1988 Thu Fri Sat Sun Figure 7: Relative number of births in the US based Relative number of births 1969–1988 in the US based daily The data first on daily data between (n =on 7305). between 1969–1988 (n = 7305).long-term The first plot showsthe thetwo plot shows the non-periodic effects, non-periodic long-term effects, the two latter the quasi-periodic latter the quasi-periodic seasonal and weekly effects. seasonal and weekly effects. the region corresponding to the full GP solution. The approximation error is negligible even though both the squared exponential and the periodic covariance function were approximated only by degree J = 6. 4.3 Modeling Birth Frequencies Gaussian processes can ultimately be employed as components in a larger model. As demonstrated in [19], analysis of birthday frequencies can be done by considering structural knowledge of population growth and temporal patterns implied by the calendar weeks and years. The data in this example consist of the number of deliveries in the US during the years 1969– 1988 (observed daily, n = 7305). The data was provided by the US National Vital Statistics System, available from Google BigQuery and pre-processed by Chris Mulligan2 . Consider the following additive model with four components: a Mat´ern (ν = 5/2, with hyperparameters σ12 , `1 ) GP prior for a smooth slow trend, a Mat´ern (ν = 3/2, with hyperparameters σ22 , `2 ) prior for the fast non-periodic component, a quasi-periodic covariance function with a period of about one year (365.25 days, J = 6, hyperparameters σ32 , `3 ) and Mat´ern (ν = 3/2, hyperparameter `4 ) damping, and 2 Data available from http://chmullig.com/ wp-content/uploads/2012/06/births.csv. GP STUFF toolbox for Matlab/Octave. ilar to [19], but special days are not considered separately. The observations are assumed to be corrupted by Gaussian noise with variance σn2 . Optimizing (quasi-Newton BFGS) the marginal likelihood with respect to all the 11 hyperparameters gives the results that are shown in Figure 7. All the plots have been scaled in the same way to show differences relative to a baseline of 100. The first subfigure shows the slow trend over the 20-year period and the faster non-periodic component. The two remaining subfigures visualize the periodic yearly and weekly effects for years 1972, 1980, and 1988. The day of week and seasonal effects are clearly quasi-periodic; the rising number of induced births and selective C-sections has affected the day of week effect. The results agree with those of [19], and this can be regarded a successful example of a beneficial reformulation of a GP model in terms of sequential inference. 5 CONCLUSION This paper has established the explicit connection between periodic covariance functions and stochastic differential equations. This link enables the use of efficient sequential inference methods to solve periodic GP regression problems in O(n) time complexity. This reformulation is a ‘best of both worlds’ approach; it brings together the convenient model specification and hyperparametrization of GPs with the computational efficiency of state space models. As shown in Section 3.4, the approximation converges uniformly and a rough upper bound for the error can be given in closed-form. As demonstrated in the examples, the computational benefits can be accomplished with practically no loss of accuracy. Several extensions could be considered: It is possible to consider timedependent frequencies (non-stationarity) as was done in [11] for the resonator model. Spatio-temporal extensions could be formulated following [20]. The codes for running the examples in this paper are available on the author’s web page: http://becs. aalto.fi/~asolin/. ACKNOWLEDGEMENTS The authors wish to thank Aki Vehtari for help with the data analysis, and Tomi Peltola, Ville Tolvanen, and Alan Saul for comments on the manuscript. This work was supported by grants from the Academy of Finland (266940, 273475) and the Finnish Funding Agency for Technology and Innovation (40304/09). Explicit link between periodic covariance functions and state space models ¨ Solin and Sarkk a¨ 26/26
© Copyright 2024 ExpyDoc