
Lecture 14: The ARMA(p,q) process
STAT352: Applied Time Series
Tilman M. Davies
Dept. of Mathematics & Statistics
University of Otago
This Lecture
Bringing the autoregressive and moving average models together
ARMA fitting
AR + MA = ARMA
We have covered the AR and MA models extensively.
However, up until now, we have treated them as completely
separate entities.
The ARMA(p,q) process brings both modelling frameworks together under a single umbrella.
The ARMA(p,q) process
X is an Autoregressive Moving Average process of order (p, q) if

Xt = µ + Wt + θ1 Wt−1 + θ2 Wt−2 + . . . + θq Wt−q
        + φ1 (Xt−1 − µ) + φ2 (Xt−2 − µ) + . . . + φp (Xt−p − µ),

where µ is the stationary mean of the series, W is white noise with variance σ², and θ = {θ1, . . . , θq} and φ = {φ1, . . . , φp} are constants.
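To make the definition concrete, here is a minimal base-R sketch that simulates an ARMA(1,1) series about a non-zero mean; the parameter values are purely illustrative and are not taken from the lecture.

## Simulate N observations from an ARMA(1,1) about a non-zero mean mu
## (illustrative values only)
set.seed(1)
N      <- 200
mu     <- 10
phi1   <- 0.7   # AR coefficient
theta1 <- 0.3   # MA coefficient
X <- mu + arima.sim(model = list(ar = phi1, ma = theta1), n = N)
plot(X, main = "Simulated ARMA(1,1) series")

Omitting the ar or ma component of the model list gives a pure MA or pure AR simulation, mirroring the special cases noted on the next slide.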
The ARMA(p,q) process: important notes
It looks a little ugly in its general form, but the ARMA(p,q)
process is just the sum of AR(p) terms with those of an
MA(q) process around some stationary mean!
OMG! WTF? LOL @ ARMA.
As you can also see from the previous slide, the pure AR(p)
and MA(q) models are just special cases of the ARMA(p,q)
model:
The ARMA(p,0) model is exactly an AR(p) model.
The ARMA(0,q) model is exactly an MA(q) model.
Because of this, it is natural to fit an ARMA model to a stationary time series as a first step, since pure AR and MA models are automatically included as possibilities under this umbrella.
Aw c’mon Tilly. Why do we care?
By possessing the ability to incorporate behaviour apparent in
both the pure AR and MA processes, the ARMA process
allows for far more flexibility in modelling an assumed
stationary time series.
It consequently has the advantage of typically requiring us to estimate far fewer parameters than we would need by considering the AR and MA models separately.
This is known as the Principle of Parsimony.
Principle of Parsimony and ARMA models
The Principle of Parsimony says we want to find a model with as
few parameters as possible, but which gives an adequate
representation of the data at hand.
The ARMA model satisfies this principle: it is a fact that a
stationary time series may often be adequately modelled by an
ARMA model involving fewer parameters than a pure MA or AR
process by itself.
Properties of the ARMA(1,1) model
The model is Xt = µ + Wt + θ1 Wt−1 + φ1 (Xt−1 − µ). Using similar analytical methods as for the pure AR process, it can be shown that

E[Xt] = µ,

that

Var[Xt] = σ² + σ²(θ1 + φ1)²(1 − φ1²)⁻¹,

and that, for h ≥ 1,

γX(h) = Cov[Xt, Xt+h] = (θ1 + φ1) φ1^(h−1) σ² + (θ1 + φ1)² σ² φ1^h (1 − φ1²)⁻¹.
The methods used to derive these properties are the same as for
the more general case of an ARMA(p,q) model.
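These expressions are easy to sanity-check numerically. The sketch below compares the autocorrelations implied by the formulas above with base R's ARMAacf(); the parameter values are arbitrary illustrative choices.

## Compare the ARMA(1,1) autocovariance formula with base R's ARMAacf()
## (ARMAacf returns theoretical autocorrelations, so divide gamma by Var[X_t])
sigma2 <- 1; theta1 <- 0.3; phi1 <- 0.7   # illustrative values
h <- 1:10
varX  <- sigma2 + sigma2 * (theta1 + phi1)^2 / (1 - phi1^2)
gamma <- (theta1 + phi1) * phi1^(h - 1) * sigma2 +
         (theta1 + phi1)^2 * sigma2 * phi1^h / (1 - phi1^2)
cbind(formula = gamma / varX,
      ARMAacf = ARMAacf(ar = phi1, ma = theta1, lag.max = 10)[-1])  # drop lag 0

The two columns should agree, lag for lag.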
Conditions on the parameters of an ARMA(p,q) process
Remember, there are some restrictions on the values of the
parameters for an MA(q) model to satisfy the condition of
invertibility.
There are even more restrictions on the values of the
parameters for an AR(p) model to be stationary and causal.
Especially for the AR(p) process, these ‘restrictions’ are difficult to clarify further without some rather involved mathematics. But just be aware that they do exist, i.e. the parameters cannot take just any arbitrary values (a quick numerical check is sketched after this list).
These restrictions for the pure AR and MA models flow
directly through to the ARMA model, so that we end up with
a stationary, causal, and invertible process.
The parameter estimation techniques we consider
automatically produce values for θ and φ which result in a
process that satisfies these conditions.
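For the curious, the standard conditions can be checked numerically: the AR polynomial 1 − φ1 z − . . . − φp z^p must have all of its roots outside the unit circle (stationarity/causality), and the MA polynomial 1 + θ1 z + . . . + θq z^q must do likewise (invertibility). The following is only a sketch with illustrative values; the detailed theory is not needed for this course.

## Check the parameter restrictions via the roots of the AR and MA polynomials
phi   <- c(1.0436, -0.2495)        # e.g. AR(2) coefficients (illustrative)
theta <- c(0.3206)                 # e.g. a single MA coefficient (illustrative)
ar.roots <- polyroot(c(1, -phi))   # roots of 1 - phi1*z - phi2*z^2
ma.roots <- polyroot(c(1, theta))  # roots of 1 + theta1*z
all(Mod(ar.roots) > 1)             # TRUE => stationary and causal AR part
all(Mod(ma.roots) > 1)             # TRUE => invertible MA part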
Parameter estimation for the ARMA(p,q) process
Estimation of θ and φ in an ARMA model given some
observed time series X = {x1 , . . . , xN } follows the same ideas
as parameter estimation for the MA(q) process.
That is, we use a
conditional-least-squares-followed-by-maximum-likelihood
approach.
Remember, the conditional least squares method finds the combination of parameter values that minimises the estimated residual sum of squares RSS = Σt ŵt².
Example 1: Conditional least squares for the ARMA(1,1) model
Suppose we have some observed data X = {x1 , . . . , xN } that we
wish to fit an ARMA(1,1) model to. As in the previous slide, the
model is
Xt = µ + Wt + θ1 Wt−1 + φ1 (Xt−1 − µ).
We then guess some values for µ, θ1 and φ1, set w0 = 0 and x0 = µ, and then calculate the residuals recursively by

ŵ1 = x1 − µ
ŵ2 = x2 − µ − θ1 ŵ1 − φ1 (x1 − µ)

up to

ŵN = xN − µ − θ1 ŵN−1 − φ1 (xN−1 − µ).

Then, RSS = Σt ŵt² may be calculated.
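As a concrete illustration, here is a minimal R version of this recursion for a single candidate set of parameter values; arma11.rss is a hypothetical helper name, not one of the course functions.

## Conditional residual recursion for an ARMA(1,1) at one candidate (mu, theta1, phi1)
arma11.rss <- function(x, mu, theta1, phi1) {
  N <- length(x)
  w <- numeric(N)
  xprev <- mu                     # x_0 is set to mu
  wprev <- 0                      # w_0 is set to 0
  for (t in 1:N) {
    w[t]  <- x[t] - mu - theta1 * wprev - phi1 * (xprev - mu)
    wprev <- w[t]
    xprev <- x[t]
  }
  sum(w^2)                        # RSS for this parameter combination
}

For example, arma11.rss(x, mu = mean(x), theta1 = 0, phi1 = 0) reduces to the residual sum of squares about the sample mean.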
Example 1: Conditional least squares for the ARMA(1,1) model (cont.)
The procedure on the previous slide is repeated for different
combinations of µ, θ1 , φ1 .
The conditional sum of squares estimates are given by the combination which results in the smallest value of RSS.
In the computer functions, these estimates are then used as starting values for a numerical procedure to find the parameter values which maximise a specific likelihood function (a base-R sketch of this two-stage approach follows below).
The actual value of this log-transformed likelihood function,
at the maximum likelihood estimates of the parameters, is
provided in the R output as log likelihood after running
arma.fit (we’ll see this in a moment) or
ma.fit or ar.fit.
Note that if we decided beforehand that µ = 0, then this discussion still holds, except that we only seek estimates of θ1 and φ1.
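For reference, this two-stage conditional-least-squares-then-maximum-likelihood approach is also the default behaviour of base R's arima() (method = "CSS-ML"), and the Call: lines in the output later in this lecture suggest arma.fit is a wrapper around arima(). A minimal sketch, assuming x is the observed (stationary) series:

## Fit an ARMA(1,1) by conditional sum of squares followed by maximum likelihood
fit <- arima(x, order = c(1, 0, 1), include.mean = TRUE, method = "CSS-ML")
fit$loglik   # the maximised log likelihood (reported as 'log likelihood')
fit$aic      # the corresponding AIC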
Order selection for the ARMA(p,q) model
Recall the AIC measure allowed us to automate the procedure
for selecting an appropriate order for the pure AR(p) and
MA(q) processes.
Minimisation of this criterion provides a model in which we aim to balance goodness-of-fit against complexity, in terms of the number of parameters requiring estimation given some observed time series.
So, given a time series, we can successively fit higher-order models and find the combination of p and q which provides the smallest AIC.
Since the pure AR and MA processes are special cases of
the ARMA model, this essentially means the R functions
ma.orderselect and ar.orderselect are obsolete.
arma.orderselect
Order selection via AIC for a general ARMA(p,q) model,
which obviously includes the possibility of selecting a pure
AR(p) or MA(q) process, is achieved in R with the function
arma.orderselect. It has precisely the same three
arguments and is used in exactly the same way as
ma.orderselect and ar.orderselect.
There is one small note to make: unlike ma.orderselect and
ar.orderselect, the argument order.max has no default
value and must be set by the user.
arma.orderselect (cont.)
Also, order.max in arma.orderselect refers to both p and
q i.e. the search for the best order according to AIC will only
take place up to the ARMA(order.max,order.max) model.
The function can be particularly computationally expensive;
more so than ma.orderselect or ar.orderselect. You
should start with a modest value, e.g. order.max=10 or lower.
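Conceptually, arma.orderselect is doing something like the following base-R sketch, which also explains the expense: it has to fit (order.max + 1)² candidate models. This is an illustrative reconstruction, not the course code.

## Illustrative AIC search over ARMA(p,q) orders for a stationary series x
order.max <- 3                        # keep this modest
best <- list(p = NA, q = NA, aic = Inf)
for (p in 0:order.max) {
  for (q in 0:order.max) {
    fit <- try(arima(x, order = c(p, 0, q), include.mean = TRUE), silent = TRUE)
    if (!inherits(fit, "try-error") && fit$aic < best$aic)
      best <- list(p = p, q = q, aic = fit$aic)
  }
}
best                                  # the (p, q) combination with the smallest AIC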
arma.fit
Much like ma.fit and ar.fit, arma.fit uses
conditional-least-squares-followed-by-maximum-likelihood to
fit an ARMA model, once your order has been selected.
The function has four arguments:
1. X: your time series (a numeric vector)
2. p: your selected value of p for the AR component of the model
3. q: your selected value of q for the MA component of the model
4. include.mean: whether or not we wish to include µ in the estimation procedure
Using arma.fit with p=0 will produce identical results to
using ma.fit.
Using arma.fit with q=0 will produce identical results to
using ar.fit.
Example 2: Level of Lake Heron
[Figure: time series plot titled "Level of Lake Heron, 1875−1972"; level (ft) from roughly 576 to 582 on the vertical axis, against year index 0 to 100 on the horizontal axis.]
Example 2: Level of Lake Heron (cont.)
> arma.orderselect(heron,include.mean=TRUE,order.max=10)
  |===============================================| 100%
        p        q      aic
   1.0000   1.0000 214.4905
>
> heron.arma <- arma.fit(heron,p=1,q=1,include.mean=TRUE)
> heron.arma

Call:
arima(x = ts(X, 1, length(X)), order = c(p, 0, q), include.mean = TRUE)

Coefficients:
         ar1     ma1  intercept
      0.7449  0.3206   579.0555
s.e.  0.0777  0.1135     0.3501

sigma^2 estimated as 0.4749:  log likelihood = -103.25,  aic = 214.49
AIC has selected an ARMA(1,1) model for the Lake Heron data.
In the output, which estimates correspond to θ̂1 and φ̂1?
Why have we chosen to include µ? What is its estimate?
Example 2: Level of Lake Heron (cont.)
> acf(heron,lag.max=lgm,main="Lake Heron ACF")
> lines(0:lgm,ARMAacf(ar=coef(heron.arma)[1],ma=coef(heron.arma)[2],lag.max=lgm),col=2,lwd=2,lty=2)
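(Here lgm is simply the maximum lag to display, presumably set earlier in the session, e.g. lgm <- 30 to match the plot; coef(heron.arma)[1] is φ̂1 and coef(heron.arma)[2] is θ̂1, following the order of the coefficients in the fitted-model output.)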
[Figure: "Lake Heron ACF" — sample ACF of the series for lags 0 to 30, with the theoretical ACF of the fitted ARMA(1,1) model overlaid as a dashed red line.]
Example 2: Level of Lake Heron (cont.)
As always, we should make sure via diagnostic plots that the
estimated residuals under this model do appear to be white noise.
> tsdiag(heron.arma)
[Figure: tsdiag(heron.arma) output — standardised residuals over time, the ACF of the residuals, and p-values for the Ljung−Box statistic at lags 1 to 10.]

and everything looks sweet as.
Example 3: Pure AR or MA model for Lake Heron data...?
> ar.orderselect(heron,include.mean=TRUE,order.max=10)
  |=============================================| 100%
        p      aic
   2.0000 215.2664
> ma.orderselect(heron,include.mean=TRUE,order.max=10)
  |=============================================| 100%
        q      aic
   3.0000 222.1263
The above results show that the ARMA(1,1) model has a smaller AIC than the ‘best’ model AIC could select for either a pure AR or a pure MA process.
The Principle of Parsimony is also exhibited here: the best pure MA process requires a larger number of parameters than the selected ARMA process.
Example 4: Comparing the functions
> ar.fit(heron,p=2,include.mean=TRUE)

Call:
arima(x = ts(X, 1, length(X)), order = c(p, 0, 0), include.mean = TRUE)

Coefficients:
         ar1      ar2  intercept
      1.0436  -0.2495   579.0473
s.e.  0.0983   0.1008     0.3319

sigma^2 estimated as 0.4788:  log likelihood = -103.63,  aic = 215.27

> arma.fit(heron,p=2,q=0,include.mean=TRUE)

Call:
arima(x = ts(X, 1, length(X)), order = c(p, 0, q), include.mean = TRUE)

Coefficients:
         ar1      ar2  intercept
      1.0436  -0.2495   579.0473
s.e.  0.0983   0.1008     0.3319

sigma^2 estimated as 0.4788:  log likelihood = -103.63,  aic = 215.27
Example 4: Comparing the functions (cont.)
> ma.fit(heron,q=3,include.mean=TRUE)

Call:
arima(x = ts(X, 1, length(X)), order = c(0, 0, q), include.mean = TRUE)

Coefficients:
         ma1     ma2     ma3  intercept
      1.0872  0.7444  0.3671   579.0086
s.e.  0.0958  0.1160  0.1083     0.2266

sigma^2 estimated as 0.5029:  log likelihood = -106.06,  aic = 222.13

> arma.fit(heron,p=0,q=3,include.mean=TRUE)

Call:
arima(x = ts(X, 1, length(X)), order = c(p, 0, q), include.mean = TRUE)

Coefficients:
         ma1     ma2     ma3  intercept
      1.0872  0.7444  0.3671   579.0086
s.e.  0.0958  0.1160  0.1083     0.2266

sigma^2 estimated as 0.5029:  log likelihood = -106.06,  aic = 222.13
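As a quick programmatic check of these equivalences (a sketch; it assumes the course's fitting functions return standard Arima objects, as the arima() calls in the output suggest):

## The AR(2) fit from ar.fit and from arma.fit with q = 0 should coincide
all.equal(coef(ar.fit(heron, p = 2, include.mean = TRUE)),
          coef(arma.fit(heron, p = 2, q = 0, include.mean = TRUE)))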
Example 4: Comparing the model ACFs
[Figure: "Lake Heron ACF" — sample ACF of the series for lags 0 to 30, with the theoretical ACFs of the fitted ARMA(1,1), AR(2), and MA(3) models overlaid.]
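A plot like this can be produced along the same lines as the ACF overlay shown earlier; the following is a sketch only, where heron.ar and heron.ma are hypothetical names for the AR(2) and MA(3) fits from Example 4.

## Overlay the theoretical ACFs of the three fitted models on the sample ACF
acf(heron, lag.max = lgm, main = "Lake Heron ACF")
lines(0:lgm, ARMAacf(ar = coef(heron.arma)[1], ma = coef(heron.arma)[2], lag.max = lgm), col = 2, lwd = 2)
lines(0:lgm, ARMAacf(ar = coef(heron.ar)[1:2], lag.max = lgm), col = 3, lwd = 2)
lines(0:lgm, ARMAacf(ma = coef(heron.ma)[1:3], lag.max = lgm), col = 4, lwd = 2)
legend("topright", legend = c("ARMA(1,1)", "AR(2)", "MA(3)"), col = 2:4, lwd = 2)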
Next lecture...
What’s the forecast?
Prediction from stationary ARMA models.