Explicit Link Between Periodic Covariance Functions and State

E XPLICIT L INK
B ETWEEN P ERIODIC C OVARIANCE
F UNCTIONS AND S TATE S PACE M ODELS
¨ a¨
Arno Solin and Simo Sarkk
Department of Biomedical Engineering and Computational Science
Aalto University, Finland
[email protected]
April 22, 2014
I
Gaussian processes and state space models
I
Periodic covariance functions
I
Quasi-periodic covariance functions
I
Example studies
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
2/26
Gaussian Process Regression in 1D
I The kernel approach:
f (t) ∼ GP(0, k (t, t 0 ))
yk = f (tk ) + εk ,
εk ∼ N (0, σn2 ),
where the observed data is
{(tk , yk ), k = 1, 2, . . . , n}.
I Prior assumptions of the process
encoded into the covariance
function k (t, t 0 ).
I Can be solved in closed-form,
but the naive solution scales as
O(n3 ).
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
3/26
Gaussian Process Regression in 1D
I The kernel approach:
f (t) ∼ GP(0, k (t, t 0 ))
yk = f (tk ) + εk ,
εk ∼ N (0, σn2 ),
where the observed data is
{(tk , yk ), k = 1, 2, . . . , n}.
I State space approach:
df(t)
= Ff(t) + Lw(t)
dt
yk = Hf(tk ) + εk , εk ∼ N (0, σn2 ),
where w(t) is a white noise
process with spectral density Qc .
I Prior assumptions of the process
I Model defined by F, L, Qc , the
I Can be solved in closed-form,
I Solved using Kalman filtering
encoded into the covariance
function k (t, t 0 ).
but the naive solution scales as
O(n3 ).
stationary covariance P∞ , and
the observation model H.
and Rauch–Tung–Striebel
smoothing in O(n) time
complexity.
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
3/26
Covariance and Spectral Density
k (τ ) =
−4
−2
0
1
2π
2
Z
∞
S(ω) exp(−i ωτ ) dω.
−∞
4
−5
0
τ = |t − t 0 |
ω
Covariance function k (τ )
Spectral density S(ω)
5
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
4/26
GP Regression (naive solution)
3
Data
Mean
95% confidence interval
2
Output, y
1
0
−1
−2
−3
−1
0
Input, t
1
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
5/26
GP Regression (filtering)
3
Data
Mean
95% confidence interval
2
Output, y
1
0
−1
−2
−3
−1
0
Input, t
1
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
6/26
GP Regression (smoothing)
3
Data
Mean
95% confidence interval
2
Output, y
1
0
−1
−2
−3
−1
0
Input, t
1
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
7/26
Periodic
Covariance Functions
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
8/26
Canonical Periodic Covariance Function
Start off with the squared exponential:
kx − x0 k2
k(x, x0 ) = σ 2 exp −
2`2
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
9/26
Canonical Periodic Covariance Function
Start off with the squared exponential:
kx − x0 k2
k(x, x0 ) = σ 2 exp −
2`2
Polar coordinates:
cos(ω0 t)
x(t) =
sin(ω0 t)
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
9/26
Canonical Periodic Covariance Function
1
π
2
Start off with the squared exponential:
kx − x0 k2
k(x, x0 ) = σ 2 exp −
2`2
Polar coordinates:
cos(ω0 t)
x(t) =
sin(ω0 t)
3
π
4
1
π
4
π
The canonical periodic covariance:


0
2 sin2 ω0 t−t
2
0
2

kp (t, t ) = σ exp−
`2
−2σ 0
− 43 π
2σ
0
− 14 π
− 21 π
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
9/26
Canonical Periodic Covariance Function
−6π −4π −2π
0
2π
|t − t 0 |
Covariance function
4π
6π
−6
−4
−2
0
2
4
6
ω
Spectral density
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
10/26
State Space Formulation (1/2)
I Fourier series representation (τ = |t − t 0 |)
kp (τ ) =
∞
X
qj2 cos(j ω0 τ )
j=0
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
11/26
State Space Formulation (1/2)
I Fourier series representation (τ = |t − t 0 |)
kp (τ ) =
∞
X
qj2 cos(j ω0 τ )
j=0
I The model can be constructed as solutions to second-order ODEs:
x˙ j (t)
0
=
y˙ j (t)
ω0 j
−ω0 j
0
xj (t)
yj (t)
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
11/26
State Space Formulation (2/2)
I The state space model can be given as a superposition the following
kind of models:
Fpj
0
ω0 j
=
−ω0 j
0
,
Hpj = 1
0 ,
Pp∞,j = qj2 I2
The diffusion part is zero (i.e. the model is deterministic).
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
12/26
State Space Formulation (2/2)
I The state space model can be given as a superposition the following
kind of models:
Fpj
0
ω0 j
=
−ω0 j
0
,
Hpj = 1
0 ,
Pp∞,j = qj2 I2
The diffusion part is zero (i.e. the model is deterministic).
I The spectral (variance) coefficients are
qj2 =
2 Ij (`−2 )
, for j = 1, 2, . . . ,
exp(`−2 )
and q02 = I0 (`−2 )/ exp(`−2 ), where Iα (z) is the modified Bessel function.
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
12/26
State Space Formulation (2/2)
I The state space model can be given as a superposition the following
kind of models:
Fpj
0
ω0 j
=
−ω0 j
0
,
Hpj = 1
0 ,
Pp∞,j = qj2 I2
The diffusion part is zero (i.e. the model is deterministic).
I The spectral (variance) coefficients are
qj2 =
2 Ij (`−2 )
, for j = 1, 2, . . . ,
exp(`−2 )
and q02 = I0 (`−2 )/ exp(`−2 ), where Iα (z) is the modified Bessel function.
I Taking the J first terms in the series gives an approximation, and this
approximation converges uniformly to the actual covariance as J → ∞.
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
12/26
Approximative Covariance Function (J = 0)
−6π −4π −2π
0
2π
|t − t 0 |
Covariance function
4π
6π
−6
−4
−2
0
2
4
6
ω
Spectral density
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
13/26
Approximative Covariance Function (J = 1)
−6π −4π −2π
0
2π
|t − t 0 |
Covariance function
4π
6π
−6
−4
−2
0
2
4
6
ω
Spectral density
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
14/26
Approximative Covariance Function (J = 2)
−6π −4π −2π
0
2π
|t − t 0 |
Covariance function
4π
6π
−6
−4
−2
0
2
4
6
ω
Spectral density
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
15/26
Approximative Covariance Function (J = 3)
−6π −4π −2π
0
2π
|t − t 0 |
Covariance function
4π
6π
−6
−4
−2
0
2
4
6
ω
Spectral density
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
16/26
Quasi-Periodic
Covariance Functions
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
17/26
Quasi-Periodic Covariance
I Allow the shape of the periodic effect to change over time.
I Take the product of a periodic covariance function kp (t, t 0 ) with a
covariance function kq (t, t 0 ) with rather long characteristic length-scale,
k(t, t 0 ) = kp (t, t 0 ) kq (t, t 0 ),
allowing the covariance to decay away from exact periodicity.
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
18/26
Quasi-Periodic Covariance Function
−6π −4π −2π
0
2π
|t − t 0 |
Covariance function
4π
6π
−6
−4
−2
0
2
4
6
ω
Spectral density
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
19/26
Quasi-Periodic Covariances in State Space Form
I State space representation of
both the ‘quasi’ and ‘periodic’
part.
I The quasi-periodic state space
model needs to be set up so that
the feedback matrices commute
(Fp Fq = Fq Fp .).
I This can be accomplished by
using the special features of the
Kronecker product.
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
20/26
Quasi-Periodic Covariances in State Space Form
I State space representation of
both the ‘quasi’ and ‘periodic’
part.
I The quasi-periodic state space
model needs to be set up so that
the feedback matrices commute
(Fp Fq = Fq Fp .).
I This can be accomplished by
using the special features of the
Kronecker product.
I The joint model corresponding to
the quasi-periodic product of the
two covariance functions can
then be given in a block-form:
Fj
Lj
Qc,j
P∞,j
Hj
=
=
=
=
=
Fq ⊗ I2 + Iq ⊗ Fpj ,
Lq ⊗ Lpj ,
Qqc ⊗ qj2 I2 ,
Pq∞ ⊗ Pp∞,j ,
Hq ⊗ Hpj ,
where ‘ ⊗ ’ denotes the Kronecker
product of two matrices.
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
20/26
Example Studies
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
21/26
Example: Computational Complexity
102
CO2 concentration (ppm)
Computation time (seconds)
Arno Solin and Simo S¨
arkk¨
a
Full GP solution
101
100
State space solution
10−1
10−2
0
2000
4000
6000
8000
Number of data points, n
10 000
410
400
390
380
370
2000
Figure 5: Demonstration of the computational benefits
Figure 6: CO
of the state space model in solving a GP regression
values for ye
problem for a number of data points
up to 10 000 and
gether with th
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
with ten repetitions. The state space model execution
shaded
patch
22/26
times grow exactly linearly.
thin lines from
Example: Mauna Loa CO2 Concentration
I CO2 concentration observations
CO2 concentration (ppm)
and Simo S¨
arkk¨
a
410
(n = 2227, years 1958–2000 not
shown in figure).
400
I The GP covariance function is as
follows:
390
k(t, t 0 ) = kSE (t, t 0 )
380
+ kp (t, t 0 ) kν=3/2 (t, t 0 )
+ kν=3/2 (t, t 0 )
370
2000
000
2005
2010
Time (years)
2015
2020
s
n
d
n
Figure 6: CO2 concentration observations (n = 2227,
values for years 1958–2000 not shown in figure) together with the 95% predictive confidence region (the
shaded patch is from the state space model, and the
thin lines from the exact GP solution).
s
s
e
4.2
Modeling Carbon Dioxide Concentration
In this section the method is applied to the well-known
I Converted to state space and
hyperparameters optimized with
respect to marginal likelihood.
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
23/26
Example: Daily Births in 1969–1988
Explicit Link Between Periodic Covariance Functions and State Space Models
Trends
110
100
90
Slow trend
80
Seasonal effect
1970
1973
Fast non-periodic component
1976
1979
1982
1985
1988
110
100
90
1972
80
1980
1988
Day of week effect
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
110
100
90
1972
80
Mon
Tue
1980
Wed
1988
Thu
Fri
Sat
Sun
Figure 7: Relative number of births in the US based
on daily data between 1969–1988 (n = 7305). The first
plot shows the non-periodic long-term effects, the two
latter the quasi-periodic seasonal and weekly effects.
the region corresponding to the full GP solution. The
approximation error is negligible even though both the
squared exponential and the periodic covariance func-
a quasi-periodic
covariance function with a period of
I
number of
inern
the
one week Relative
(J = 6, hyperparameters
σ52 , `births
5 ) and Mat´
(ν = 3/2,based
hyperparameter
`6 ) damping.
This is simon daily
data between
ilar to [19], but special days are not considered sepa1969–1988
= 7305).
rately. The
observations are(n
assumed
to be corrupted
by Gaussian noise with variance σn2 .
US
I The model is as follows:
Optimizing (quasi-Newton BFGS) the marginal likeliI Matern
(ν 11
= 7/2)
for the slow trend
hood with respect
to all the
hyperparameters
gives
I are
Matern
(ν in=Figure
3/2) for
variation
the results that
shown
7. faster
All the
plots
I
Quasi-periodic
(yearly)
with
have been scaled in the same way to show differences
Matern
(ν =
3/2)
relative to a baseline
of 100.
The
firstdamping
subfigure shows
Quasi-periodic
(weekly)
with
the slow trendIover
the 20-year period
and
the faster
Matern (νThe
= 3/2)
damping subfignon-periodic component.
two remaining
ures visualize the periodic yearly and weekly effects
I Converted
state
space
for years
1972, 1980, and to
1988.
The day
of weekand
and
seasonal hyperparameters
effects are clearly quasi-periodic;
the rising
optimized
with
number of induced births and selective C-sections has
respect
to effect.
marginal
likelihood
affected the
day of week
The results
agree with
those of [19], and this can be regarded a successful example of a beneficial reformulation of a GP model in
terms of sequential inference.
5
CONCLUSION
This paper has established the explicit connection between periodic
covariance
functions
and stochastic
dif-state space models
Explicit
link between
periodic covariance
functions and
¨
a¨
ferential equations. This link enables the use of effi- Solin and Sarkk
24/26
cient sequential inference methods to solve periodic
Conclusion
I We have established the explicit connection between periodic
covariance functions and state space models.
I This link enables the use of efficient sequential inference methods to
solve periodic GP regression problems in O(n) time complexity.
I The approximation converges uniformly and a rough upper bound for
the error can be given in closed-form.
I This is a ‘best of both worlds’ approach; it brings together the
convenient model specification and hyperparametrization of GPs with
the computational efficiency of state space models.
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
25/26
Explicit Link Between Periodic Covariance Functions
and State Space Models
¨ a¨
Arno Solin
Simo Sarkk
Department of Biomedical Engineering and Computational Science
Aalto University, Finland
P ERIODIC
C OVARIANCE F UNCTIONS
I The canonical periodic covariance function:
0 2 sin2 ω0 t−t
2
kp(t, t 0) = exp −
,
`2
where ` is the characteristic length-scale and
ω0 defines the angular velocity (period length).
I The covariance function can be expanded into
a (almost everywhere) convergent Fourier
series (τ = |t − t 0|)
∞
X
kp(τ ) =
qj2 cos(j ω0 τ ).
I NTRODUCTION
I Gaussian processes (GPs, [2]) are commonly
used modeling tools in non-parametric
machine learning.
I Prior assumptions are encoded into the
covariance function (kernel).
I We show that periodic covariance functions in
GP regression can be rewritten as state space
models.
I Reduces the problematic O(n3) computational
complexity to O(n) in the number of
observations n.
I The model is written in terms of a series of
stochastic resonators.
I Generalizes to quasi-periodic (almost periodic)
covariance functions.
j=0
I The differential equation model is a
superposition the following kind of models [1]:
0 −ω0 j
Fpj =
,
ω0 j 0
and the diffusion part is zero
(i.e. the model is
deterministic), Hpj = 1 0 , and Pp∞,j = qj2I2.
I The spectral (variance) coefficients qj2 are
given by
qj2 =
G AUSSIAN P ROCESS R EGRESSION
2 Ij (`−2)
, for j = 1, 2, . . . ,
exp(`−2)
R EFERENCES
I It is often desirable to allow for seasonable
periodic variation, allowing the shape of the
periodic effect to change over time.
I A common way of constructing quasi-periodic
covariances is to take the product of a periodic
covariance function kp(t, t 0) with a covariance
function kq(t, t 0) with rather long characteristic
length-scale,
df(t)
= Ff(t) + Lw(t)
dt
yk = Hf(tk ) + εk , εk ∼ N (0, σn2),
http://arno.solin.fi
I An example implementation is available on the
author web page:
http://becs.aalto.fi/~asolin/
I The method is also a part of the GP STUFF
toolbox for Matlab/Octave.
Q UASI -P ERIODIC
C OVARIANCE F UNCTIONS
I Certain classes of covariance functions allow
to work with the mathematical dual, where the
Gaussian process is constructed as a solution
to a mth order linear stochastic differential
equation (SDE).
I State space representation: The GP regression
problem can also be given as:
I Codes for the examples available at:
E XAMPLE I MPLEMENTATION
and q02 = I0(`−2)/ exp(`−2), where Iα(z) is the
modified Bessel function.
I Taking the J first terms in the series gives an
approximation, and this approximation
converges uniformly [1] to the actual
covariance as J → ∞.
I Kernel representation: In GP regression the
model functions f are assumed to be
realizations from a GP prior, and the
observations yk , k = 1, 2, . . . , n, corrupted by
Gaussian noise:
f (t) ∼ GP(0, k (t, t 0))
yk = f (tk ) + εk , εk ∼ N (0, σn2)
C ONCLUSIONS
I We have established the explicit connection
between periodic covariance functions and
state space models.
I This link enables the use of efficient sequential
inference methods to solve periodic GP
regression problems in O(n) time complexity.
I The approximation converges uniformly and a
rough upper bound for the error can be given in
closed-form.
I This is a ‘best of both worlds’ approach; it
brings together the convenient model
specification and hyperparametrization of GPs
with the computational efficiency of state space
models.
¨ a¨ (2014). “Explicit link between periodic
[1] A. Solin and S. Sarkk
covariance functions and state space models.” Proceedings
of the 17th International Conference on Artificial Intelligence
and Statistics (AISTATS). JMLR W&CP vol. 33.
[2] C.E. Rasmussen and C.K.I. Williams (2006). “Gaussian
Processes for Machine Learning.” The MIT Press.
¨ a¨ (2013). “Bayesin Filtering and Smoothing.”
[3] S. Sarkk
Cambridge University Press.
[4] A. Gelman, J.B. Carlin, H.S. Stern, D.B. Dunson, A. Vehtari,
and D.B. Rubin. (2013). “Bayesian Data Analysis.” Third
edition. Chapman & Hall/CRC Press.
k (t, t 0) = kp(t, t 0) kq(t, t 0),
where w(t) is a multi-dimensional white noise
process with spectral density Qc.
I The model is defined by the feedback matrix F,
the noise effect matrix L, the spectral density
Qc, the stationary covariance P∞, and the
observation model H.
I The inference problem can now be
solved using Kalman filtering [3]
in O(nm3) time complexity.
allowing the covariance to decay away from
exact periodicity.
I The joint model corresponding to the
quasi-periodic product of the two covariance
functions can then be given [1] in a block-form:
I The methods also implemented into the
Re
yk
jav
ik,
Icel
and.
Fj = Fq ⊗ I2 + Iq ⊗ Fpj ,
Lj = Lq ⊗ Lpj ,
Qc,j = Qqc ⊗ qj2 I2,
P∞,j = Pq∞ ⊗ Pp∞,j ,
Hj = Hq ⊗ Hpj ,
where ‘⊗’ denotes the Kronecker product of
two matrices.
,
14
Th
rdinates.
20
polar coo
es
d in
TS
e
TA
lize
cu
AIS
ua
rv
vis
es
nd
ar
er
ra
Con
Explicit Link Between Periodic Covariance Functions and State Space Models
tact
rio
an
E XAMPLE S TUDY
deta
dom
Pp
ils:
ic G
draw
I A simulated example showing the
arn
s from a period
a quasi-periodic covariance function with a period of
110
o.so
i
computational efficiency.
lin.f
lin@
one week (J = 6, hyperparameters σ52 , `5 ) and Mat´ern
o.so
aalto.
I Prediction of CO2 levels using weekly data
100
fi, web address: http://arn
(ν = 3/2, hyperparameter `6 ) damping. This is simTrends
D EMONSTRATIONS
(see [2]), where we compare the approximation
to the full GP result.
I Explaining the periodic variation in the number
of births per day in the US (see [4]).
0
100
410
400
390
State space solution
State space solution
10−1
10−2
380
370
10 000
0 4000 20006000 40008000 600010 0008000 2000
Number of dataNumber
points, nof data points, n
F ULL GP
410
400
1973
Fast non-periodic component
1976
1979
1982
1985
1988
110
100
90
1972
80
1980
1988
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
390
380
370
2005
2000
2010
2005
Time (years)
2015
2010
2020
2015
Time (years)
2020
Figure 5: Demonstration
5: Demonstration
of the computational
benefits
benefits
6: CO
concentration
6: CO2 concentration
observations
(n
observations
2227,values
(nfor
= years
2227,
2Figure
Demonstration
ofFigure
the computational
benefitsofofthe
thecomputational
state
space Figure
CO
observations (n
==2227,
2 concentration
of thein state
of
themodel
state in
space
solving
model
aforGP
in
solving
regression
GP values
regression
for years
values
1958–2000
fornot
years
not1958–2000
shown
intogether
not
figure)
shown
to- in
model
solvingspace
a GP
regression
problem
a number
ofa data
1958–2000
shown
in figure)
with
thefigure)
95% toproblem
problem
number
of
fordata
a number
points of
updata
to 10
points
000space
and
up to 10
000 and
points
up tofor
10a000
and with
ten
repetitions.
The
state
predictive
confidence
region
(the region
shaded
patch isregion
from the
state
gether
with the
gether
95% with
predictive
the 95%
confidence
predictive
confidence
(the
(the
model
execution
timesten
grow
exactly
linearly.
model,
thin the
lines
from and
the exact
GP solution).
with ten
repetitions.
with
The
repetitions.
state space
The
model
stateexecution
space model shaded
execution
patchspace
shaded
is from
patch
theand
state
isthe
from
space
model,
state
space
the
model,
and the
times grow exactly
times linearly.
grow exactly linearly.
thin lines fromthin
the lines
exactfrom
GP the
solution).
exact GP solution).
The approximation
The approximation
still converges still
uniformly
converges
as long
uniformly
as
as long
as
4.2
Modeling
4.2 Carbon
Modeling
Dioxide
Carbon
Concentration
Dioxide Concentration
kq (0) < ∞ andkqthe
(0) approximation
< ∞ and the approximation
for kp (τ ) converges
for kp (τ ) converges
In this
section In
thethis
method
section
is applied
the method
to the
is applied
well-known
to the well-known
uniformly. The
uniformly.
matrices The
(32) matrices
result in (32)
a very
result
sparse
in a very
sparse
1
series data
time
consisting
series data
of1 atmospheric
consisting ofCO
atmospheric
CO2 conmodel, and sparse
model,
matrix
and sparse
methods
matrix
can be
methods
employed
can be time
employed
2 concentration readings
centration
in parts
readings
per million
in parts
(ppm)
per million
by vol- (ppm) by volfor the matrixfor
exponentials
the matrixand
exponentials
multiplication.
and multiplication.
ume from air samples
ume from
collected
air samples
at the
collected
Mauna at
Loa
the
obMauna Loa obFigure 3b shows
Figure
the 3b
quasi-periodic
shows the quasi-periodic
covariance funccovariance funcservatory, Hawaii
servatory,
(see, e.g.,
Hawaii
[1]).(see,
The e.g.,
observations
[1]). Theare
observations are
tion corresponding
tion corresponding
to squared exponential
to squareddamping
exponential damping
monthly from monthly
1958 to May
from 1974,
1958 to
after
May
which
1974,the
after
ob- which the ob(ν → ∞) of the
(ν periodicity
→ ∞) of the
(the
periodicity
squared exponential
(the squared exponential
servations are servations
weekly, resulting
are weekly,
in 2227
resulting
measurements
in 2227 measurements
covariance is represented
covariance is
byrepresented
the dashed by
line),
theand
dashed
Fig-line), and Figaltogether. Data
altogether.
collectedData
aftercollected
year 2010
after
were
year
re-2010 were reure 4b shows draws
ure 4bfrom
shows
thedraws
corresponding
from the corresponding
prior.
prior.
tained for validation.
tained for validation.
In modeling
practical GP
problems
modeling
it isproblems
common itto is common to
EXPERIMENTAL
4 EXPERIMENTAL
RESULTS RESULTSIn practical GP
combine several
combine
simpleseveral
covariance
simple
functions
covariance
in order
functions in order
to come up with
to come
a model
up with
structure
a model
thatstructure
meets the
that meets the
In this section
In the
thiscomputational
section the computational
efficiency of the
efficiency of the
requirements of
requirements
the phenomenon.
of the phenomenon.
The following rather
The following rather
method is firstmethod
demonstrated
is first demonstrated
by applying itbytoapplying
simit to simsimplified model
simplified
is considered
model for
is considered
the covariance:
for the covariance:
ulated data, after
ulated
which
data,
two
after
empirical
which two
sets empirical
of data aresets of data are
used to show that
usedthe
to show
method
thatis the
feasible
method
in real-world
is feasible in real-world
k(τ ) = k1 (τ ) +k(τ
k2)(τ=) kk31(τ
(τ)) +
+ kk42(τ
(τ),
) k3 (τ )(33)
+ k4 (τ ),
(33)
applications. applications.
4
where k1 (·) is where
a squared
k1 (·)exponential
is a squaredcovariance
exponential
funccovariance funcDemonstrating
4.1 Demonstrating
the Computational
the Computational
tion for the slow
tion
rising
for the
trend
slow
(hyperparameters
rising trend (hyperparameters
σ12 , `1 ),
σ12 , `1 ),
Efficiency Efficiency
k2 (·) the canonical
k2 (·) periodic
the canonical
covariance
periodic
function
covariance
with afunction with a
period of one year
period
(hyperparameters
of one year (hyperparameters
σ22 , `2 ), k3 (·) isσ2a2 , `2 ), k3 (·) is a
To illustrate the
To efficiency
illustrate of
thethe
efficiency
proposed
of model,
the proposed
let
model, let
covariance function
covariance
of thefunction
Mat´ernofclass
the with
Mat´eνrn=class
3/2 with ν = 3/2
f (t) be a Gaussian
f (t) be
process
a Gaussian
simulated
process
fromsimulated
a GP prior
from a GP prior
(hyperparameter
(hyperparameter
`3 ), and k4 (·) is`3a),covariance
and k4 (·) isfunction
a covariance function
with a periodicwith
covariance
a periodic
function
covariance
with function
unit paramewith unit parameof the Mat´ernofclass
the with
Mat´ern
ν =
class
3/2with
(hyperparameters
ν = 3/2 (hyperparameters
ters. The stateters.
space
The
solution
state space
is benchmarked
solution is against
benchmarked2 against
σ4 , `4 ). The observations
σ42 , `4 ). Theare
observations
assumed toare
beassumed
corrupted
to be corrupted
a naive GP implementation
a naive GP implementation
in Mathworks in
Matlab
Mathworks
(im- Matlab
(imby Gaussian noise
by Gaussian
with variance
noise with
σn2 . variance σn2 .
plemented as in
plemented
[1] using the
as inCholesky
[1] usingdecomposition).
the Cholesky decomposition).
Maximizing the
Maximizing
marginal the
likelihood
marginal
(quasi-Newton
likelihood (quasi-Newton
Figure 5 showsFigure
the results
5 shows
for the
simulated
results GP
for simulated
regression GP regression
BFGS) with respect
BFGS)towith
the respect
hyperparameters
to the hyperparameters
and preand preproblems withproblems
the number
withofthe
observations
number ofranging
observations
up ranging up
dicting 20 years
dicting
forward
20 years
gives forward
the results
givesthat
the are
results that are
to n = 10 000 and
to n with
= 10ten
000repetitions
and with ten
each.
repetitions
The perieach. The perishown in Figure
shown
6. The
in Figure
predictive
6. The
95%predictive
confidence
95%
re- confidence reodic model was
odic
truncated
model was
at Jtruncated
= 6, and at
yetJthe
= 6,
worst
and yet the worst
gion may be compared
gion may to
be the
compared
solid line
to the
representing
solid line representing
−3
case root-mean-square
case root-mean-square
error was ∼ 10
error
. was
As stated
∼ 10−3 . As stated
in Section 2 the
in computational
Section 2 the computational
complexity truly
complexity
scales truly1 Data
scalesavailable1 Data
from available
ftp://ftp.cmdl.noaa.gov/ccg/
from ftp://ftp.cmdl.noaa.gov/ccg/
linearly with respect
linearlytowith
the respect
numbertoofthe
observations.
number of observations.
co2/trends/. co2/trends/.
4.1
Seasonal effect
102
Full 1GP solutionFull GP solution
10
2000
WITH THE
Day of week effect
100
CO2 concentration (ppm)
101
10−2
Slow trend
1970
C ONSISTENCY
CO2 concentration (ppm)
Computation time (seconds)
Computation time (seconds)
102
10−1
90
80
Arno Solin and
Arno
Simo
Solin
S¨
arkk¨
and
a Simo S¨
arkk¨
a
C OMPUTATIONAL E FFICIENCY
110
100
90
1972
80
Mon
Tue
1980
Wed
1988
Thu
Fri
Sat
Sun
Figure 7: Relative number of births in the US based
Relative
number
of births 1969–1988
in the US based
daily The
data first
on daily
data between
(n =on
7305).
between
1969–1988
(n = 7305).long-term
The first plot
showsthe
thetwo
plot shows
the non-periodic
effects,
non-periodic long-term effects, the two latter the quasi-periodic
latter the quasi-periodic seasonal and weekly effects.
seasonal and weekly effects.
the region corresponding to the full GP solution. The
approximation error is negligible even though both the
squared exponential and the periodic covariance function were approximated only by degree J = 6.
4.3
Modeling Birth Frequencies
Gaussian processes can ultimately be employed as
components in a larger model. As demonstrated in
[19], analysis of birthday frequencies can be done by
considering structural knowledge of population growth
and temporal patterns implied by the calendar weeks
and years. The data in this example consist of the
number of deliveries in the US during the years 1969–
1988 (observed daily, n = 7305). The data was provided by the US National Vital Statistics System,
available from Google BigQuery and pre-processed by
Chris Mulligan2 .
Consider the following additive model with four components: a Mat´ern (ν = 5/2, with hyperparameters
σ12 , `1 ) GP prior for a smooth slow trend, a Mat´ern
(ν = 3/2, with hyperparameters σ22 , `2 ) prior for
the fast non-periodic component, a quasi-periodic covariance function with a period of about one year
(365.25 days, J = 6, hyperparameters σ32 , `3 ) and
Mat´ern (ν = 3/2, hyperparameter `4 ) damping, and
2
Data available from http://chmullig.com/
wp-content/uploads/2012/06/births.csv.
GP STUFF toolbox for Matlab/Octave.
ilar to [19], but special days are not considered separately. The observations are assumed to be corrupted
by Gaussian noise with variance σn2 .
Optimizing (quasi-Newton BFGS) the marginal likelihood with respect to all the 11 hyperparameters gives
the results that are shown in Figure 7. All the plots
have been scaled in the same way to show differences
relative to a baseline of 100. The first subfigure shows
the slow trend over the 20-year period and the faster
non-periodic component. The two remaining subfigures visualize the periodic yearly and weekly effects
for years 1972, 1980, and 1988. The day of week and
seasonal effects are clearly quasi-periodic; the rising
number of induced births and selective C-sections has
affected the day of week effect. The results agree with
those of [19], and this can be regarded a successful example of a beneficial reformulation of a GP model in
terms of sequential inference.
5
CONCLUSION
This paper has established the explicit connection between periodic covariance functions and stochastic differential equations. This link enables the use of efficient sequential inference methods to solve periodic
GP regression problems in O(n) time complexity.
This reformulation is a ‘best of both worlds’ approach;
it brings together the convenient model specification
and hyperparametrization of GPs with the computational efficiency of state space models. As shown in
Section 3.4, the approximation converges uniformly
and a rough upper bound for the error can be given
in closed-form. As demonstrated in the examples,
the computational benefits can be accomplished with
practically no loss of accuracy. Several extensions
could be considered: It is possible to consider timedependent frequencies (non-stationarity) as was done
in [11] for the resonator model. Spatio-temporal extensions could be formulated following [20].
The codes for running the examples in this paper are
available on the author’s web page: http://becs.
aalto.fi/~asolin/.
ACKNOWLEDGEMENTS
The authors wish to thank Aki Vehtari for help with
the data analysis, and Tomi Peltola, Ville Tolvanen,
and Alan Saul for comments on the manuscript. This
work was supported by grants from the Academy of
Finland (266940, 273475) and the Finnish Funding
Agency for Technology and Innovation (40304/09).
Explicit link between periodic covariance functions and state space models
¨
Solin and Sarkk
a¨
26/26