
EC352 Econometric Methods: Week 07
Gordon Kemp
Contents

1 Heteroskedasticity in Time Series Regressions
  1.1 Failure of Standard Conditional Homoskedasticity
  1.2 ARCH
2 Advanced Time Series Topics
  2.1 Highly Persistent Time Series
  2.2 Cointegration and Related Issues
3 Serial Correlation and OLS under Weak Dependence
  3.1 Testing for Serial Correlation
  3.2 Remedies for Serial Correlation

1 Heteroskedasticity in Time Series Regressions

1.1 Failure of Standard Conditional Homoskedasticity
Effects on OLS

• Suppose Var(ε_t | x_t) varies with x_t (so the standard conditional homoskedasticity assumption fails).
• Similar consequences to those of serial correlation and to those of conditional heteroskedasticity in cross-section regressions:
  – OLS estimates of coefficients remain consistent.
  – Usual OLS standard errors are invalid.
  – Usual t-tests and F-tests are invalid.
Dealing with Conditional Heteroskedasticity

• As in cross-section regressions we can go one of two routes (a sketch of the first route follows this list):
  1. Compute corrected standard errors and implement corrected t-tests and F-tests. Note we can also do this for serial correlation provided that exogeneity still holds.
  2. Compute weighted least squares estimates.
• The usual tests for conditional heteroskedasticity can be applied in time-series regressions but are typically only valid in the absence of serial correlation.
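As a concrete illustration of the first route, here is a minimal sketch in Python using statsmodels; the data are simulated placeholders, and the HAC lag length of 4 is an arbitrary illustrative choice.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data: replace with the actual series of interest.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
# Error variance depends on x, so the errors are heteroskedastic.
y = 1.0 + 0.5 * x + rng.standard_normal(200) * (1 + np.abs(x))

X = sm.add_constant(x)

# Heteroskedasticity-robust (Eicker-Huber-White) standard errors:
res_hc = sm.OLS(y, X).fit(cov_type="HC1")
print(res_hc.bse)

# If serial correlation may also be present (and exogeneity holds),
# HAC (Newey-West) standard errors correct for both; the lag length
# must be chosen by the user.
res_hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(res_hac.bse)
```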
1.2 ARCH
Autoregressive Conditional Heteroskedasticity
• There is a second type of heteroskedasticity that can arise specifically in time-series regression models.
• This type of heteroskedasticity is conditional on the past error terms in the regression.
• The first model of this type was the Autoregressive Conditional Heteroskedasticity
(ARCH) model.
• The first-order ARCH model has:

  Var(u_t | u_{t−1}, u_{t−2}, …) = α_0 + α_1 u_{t−1}^2
• Extensions of this model are widely used in finance for data that exhibit volatility clustering, e.g., stock returns data.
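To make the model concrete, the following sketch simulates a first-order ARCH process; the parameter values are borrowed from the estimates in Example 12.9 below purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative ARCH(1) parameters (taken from Example 12.9 below).
alpha0, alpha1 = 2.95, 0.337
T = 1000

u = np.zeros(T)
for t in range(1, T):
    # Conditional variance depends on the previous squared error.
    sigma2_t = alpha0 + alpha1 * u[t - 1] ** 2
    u[t] = np.sqrt(sigma2_t) * rng.standard_normal()

# For a stationary ARCH(1) process the unconditional variance is
# alpha0 / (1 - alpha1); compare it with the sample variance.
print(u.var(), alpha0 / (1 - alpha1))
```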
Figure 1: NYSE Returns (Weekly, Jan 1976 to Mar 1989)
Example 12.9 (following Example 11.4)
• Here we regress current NYSE composite index returns on their lagged values and then test for ARCH effects in the residuals (the data are from NYSE.dta).
• First regression gives:

  \widehat{RETURN}_t = 0.180 + 0.059 RETURN_{t−1}
                       (0.081)  (0.038)

  n = 689, R^2 = 0.0035, adjusted R^2 = 0.0020
• Regression of squared residuals:

  û_t^2 = 2.95 + 0.337 û_{t−1}^2 + error
          (0.44)  (0.036)

  n = 688, R^2 = 0.114

• The t-statistic on û_{t−1}^2 is about 9.4 (0.337/0.036), so there is strong evidence of first-order ARCH in the returns.
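A sketch of this two-step check in Python with statsmodels is below; the file name nyse.csv and the column name return are placeholders for however the NYSE.dta data have been exported.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_arch

# Placeholder file/column names; adjust to your export of NYSE.dta.
df = pd.read_csv("nyse.csv")
df["return_lag1"] = df["return"].shift(1)
df = df.dropna()

# Step 1: regress returns on their first lag and keep the residuals.
fit1 = sm.OLS(df["return"], sm.add_constant(df["return_lag1"])).fit()
u2 = fit1.resid ** 2

# Step 2: regress squared residuals on their own first lag; a large
# t-statistic on the lag indicates first-order ARCH.
step2 = pd.DataFrame({"u2": u2, "u2_lag1": u2.shift(1)}).dropna()
fit2 = sm.OLS(step2["u2"], sm.add_constant(step2["u2_lag1"])).fit()
print(fit2.tvalues["u2_lag1"])

# statsmodels also packages this as an LM test:
lm_stat, lm_pval, f_stat, f_pval = het_arch(fit1.resid, nlags=1)
print(lm_pval)
```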
2 Advanced Time Series Topics

2.1 Highly Persistent Time Series
A Note on Deterministic Trends
• If a variable has a linear trend, then it is necessarily non-stationary.
• However, it can be weakly dependent.
• Example:

  y_t = β_0 + β_1 t + e_t

  where e_t is iid.

• Clearly, E(y_t) depends on time, so {y_t} is non-stationary.
• However, {y_t} is weakly dependent, since Cov(y_t, y_{t+h}) = 0 for all h = 1, 2, ….
Stochastic Trends
• Not all trends are deterministic: some are stochastic.
• The simplest example of a stochastic trend is a random walk:
  – The process {y_t; t = 1, 2, …} follows a pure random walk if:

    y_t = y_{t−1} + e_t

    where e_t is iid with mean 0 and variance σ^2.
• Random walks are highly persistent:

  y_t = y_{t−1} + e_t = y_{t−2} + e_{t−1} + e_t = ⋯ = y_0 + (e_1 + ⋯ + e_{t−1} + e_t)
Moments
• If y_0 is fixed then:

  E(y_t) = y_0,  Var(y_t) = t·σ^2,

  so the variance depends on time.
Importance of Unit Roots

• A shock can have very long-lasting effects when there is high persistence.
• Note: We should not confuse trending behavior with highly persistent behavior.
• However, some highly persistent processes can also have a trend:
  – For example, a random walk with drift:

    y_t = α_0 + y_{t−1} + e_t

    which implies:

    y_t = y_0 + t·α_0 + (e_1 + ⋯ + e_{t−1} + e_t)

    so:

    E(y_t) = y_0 + t·α_0,  Var(y_t) = t·σ^2
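A quick Monte Carlo sketch (with illustrative parameter values of my own choosing) confirms these moments: across replications, the sample mean of y_t tracks y_0 + t·α_0 and the sample variance tracks t·σ^2.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative values (not from the notes): drift 0.5, sigma 1, y0 = 0.
alpha0, sigma, y0 = 0.5, 1.0, 0.0
T, reps = 200, 10_000

# Each row is one simulated random walk with drift.
e = sigma * rng.standard_normal((reps, T))
y = y0 + alpha0 * np.arange(1, T + 1) + np.cumsum(e, axis=1)

t = T  # inspect the final period
print(y[:, -1].mean(), y0 + alpha0 * t)  # ~ E(y_t) = y0 + t*alpha0
print(y[:, -1].var(), t * sigma ** 2)    # ~ Var(y_t) = t*sigma^2
```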
Transformations on Highly Persistent Time Series
• If a process is highly persistent, then it is not weakly dependent and OLS is not consistent.
• What can we then do?
• If a process has a unit root, that is, is integrated of order 1 (I(1) for short), then the first difference of the process is weakly dependent:

  ∆y_t = y_t − y_{t−1} = u_t,  t = 1, 2, …,

  where u_t is weakly dependent.
• However, deciding whether a time series has a unit root requires that we use techniques such as the Dickey-Fuller test.
Unit Root Tests
• The simplest test of whether y_t has a unit root is the Dickey-Fuller (DF) test:
  – Regress ∆y_t on an intercept term and y_{t−1}, and reject if the t-statistic on y_{t−1} is below a critical value from the Dickey-Fuller tables; use Table 18.2 from Wooldridge.
• We can generalize by putting in a time trend, i.e., put in t as an additional regressor; use Table 18.3 from Wooldridge.
• We can also generalize to allow for short-run dynamics by putting in ∆y_{t−1}, …, ∆y_{t−p} as additional regressors for suitably chosen p (use Table 18.2 if no time trend and Table 18.3 if a time trend is included): this leads to the Augmented Dickey-Fuller (ADF) test (a code sketch follows).
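A minimal sketch of the ADF test in Python using statsmodels (the series y is a simulated placeholder; the trend specification and lag selection are the user's choice):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Placeholder series: a simulated random walk, which should fail to
# reject the unit-root null.
rng = np.random.default_rng(7)
y = np.cumsum(rng.standard_normal(500))

# regression="c" = intercept only (the Table 18.2 case);
# regression="ct" adds a time trend (the Table 18.3 case).
# Lags of the differenced series are chosen automatically here.
stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c")
print(stat, pvalue, crit)
```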
2.2 Cointegration and Related Issues
Problems for OLS
• If the process is not weakly dependent, that is, if there is strong persistence in the process, then our sample violates the standard randomness assumption.
• In the case of a random walk, the observations clearly depend on the initial value.
• This causes problems for OLS.
Problems for OLS (continued)
• Suppose that both x_t and y_t are non-stationary and are not weakly dependent.
• For example, suppose that:

  x_t = x_{t−1} + u_t
  y_t = y_{t−1} + e_t

  where u_t and e_t are iid and independent of one another: thus y_t and x_t follow independent random walks.

• If we estimate a linear regression model as follows:

  y_t = β_0 + β_1 x_t + η_t

  then we run into spurious regression problems.
Spurious Regression with I(1) Processes
• Risk of spurious regressions:
  – We will tend to get a high R-squared and significant t-statistics, which lead us to conclude that there is a strong relationship between the two variables when in fact they are completely unrelated (the simulation sketch below illustrates this).
• Including a deterministic trend in the regression model will not fix this spurious regression problem.
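A minimal simulation sketch of the spurious regression problem (sample size and seed are arbitrary):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T = 500

# Two completely unrelated random walks.
x = np.cumsum(rng.standard_normal(T))
y = np.cumsum(rng.standard_normal(T))

res = sm.OLS(y, sm.add_constant(x)).fit()
# Despite there being no true relationship, the t-statistic on x is
# typically far outside +/-2 and the R-squared is often sizeable.
print(res.tvalues[1], res.rsquared)
```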
Spurious Regression with I(1) Processes (continued)
• What can we do?
– Tests (unit root tests, cointegration tests).
– First-differencing all the variables in the model: indeed, if they are I(1), then
the first-difference is a stationary process (and weakly dependent).
– Cointegrating regressions; error correction models.
Cointegration Tests
• Suppose we have identified y_t and x_t as both being I(1): here x_t can be a vector.
• We now wish to test if they are cointegrated (CI) or not: is there a linear combination of them which is I(0)?
• For this we can use the Engle-Granger test (a code sketch follows):
  1. Regress y_t on an intercept term and x_t by OLS and let the residuals be û_t.
  2. Regress ∆û_t on an intercept term and û_{t−1} (together with lags of ∆û_t if required) and reject if the t-statistic is below a critical value from the Engle-Granger tables; use Table 18.4 from Wooldridge.
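statsmodels packages this test (with the appropriate Engle-Granger critical values rather than the ordinary Dickey-Fuller ones); a sketch with placeholder data:

```python
import numpy as np
from statsmodels.tsa.stattools import coint

# Placeholder data: x is a random walk and y is cointegrated with it
# by construction (y - 2x is stationary).
rng = np.random.default_rng(11)
x = np.cumsum(rng.standard_normal(500))
y = 2.0 * x + rng.standard_normal(500)

# trend="c" corresponds to an intercept in the first-step regression;
# trend="ct" would add a time trend (the Table 18.5 case).
stat, pvalue, crit = coint(y, x, trend="c")
print(stat, pvalue, crit)
```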
Cointegration Tests (continued)
• This two-step procedure is similar to the Breusch-Godfrey test for serial correlation, but it uses quite different critical values.
• As with the Dickey-Fuller test we can also modify by putting in a time trend; use
Table 18.5 from Wooldridge.
– We then need to include a time trend in the initial regression as well as in the
second regression.
Cointegration Tests (continued)
• A different method of testing is the Johansen procedure.
• This treats y and x symmetrically.
• Whereas the Engle-Granger procedure can only test if y is cointegrated with x or
not, the Johansen procedure can be used to assess how many distinct cointegrating
relationships (if any) there are among the variables.
Estimation when Cointegration is Absent
• If y_t and x_t are I(1) but are not CI then we can run a regression in first differences:

  ∆y_t = α_0 + γ_0 ∆x_t + u_t    (1)

  where γ_0 is the partial effect of a change in the growth rate of x_t on the growth rate of y_t.
• Here ∆y_t and ∆x_t will be I(0).
• Provided that ∆y_t and ∆x_t satisfy the usual conditions, including the condition that ∆x_t and u_t are uncorrelated, we will get consistent, asymptotically normal estimates of γ_0.
Estimation when Cointegration is Present
• If y_t and x_t are I(1) and are CI then we could run the regression in levels:

  y_t = α + β x_t + u_t    (2)

  (we could include a time trend as well).
• This would give a consistent estimate of β (the effect of a change in the level of xt
on the level of yt ) even when endogeneity is present.
• The residuals are:

  û_t = y_t − α̂ − β̂ x_t = u_t − (α̂ − α) − (β̂ − β) x_t,

  so the residual sum of squares will tend to be minimized when β̂ is close to β, since otherwise (β̂ − β) x_t would make a contribution that would tend to explode.
Leads and Lags Estimator
• However, the usual standard errors and asymptotic distribution are not in general valid, because typically there is correlation between u_t and ∆x_s for some (or all) s and t.
• One way to handle the possible correlation between ∆x_s and u_t is to include leads and lags of ∆x_t in Equation (2).
• So, for example, we could include second leads and lags:

  y_t = α_0 + β x_t + φ_0 ∆x_t + φ_1 ∆x_{t−1} + φ_2 ∆x_{t−2} + γ_1 ∆x_{t+1} + γ_2 ∆x_{t+2} + e_t,

  and estimate this by OLS.
• Doing this will mop up the correlation between u_t and (∆x_{t−2}, …, ∆x_{t+2}); a code sketch follows.
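A minimal sketch of this leads-and-lags regression with placeholder data; two leads and lags are included to match the example above, and HAC standard errors (arbitrary lag length 4) guard against remaining serial correlation.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Placeholder cointegrated pair (y = 2x + stationary noise).
rng = np.random.default_rng(5)
x = np.cumsum(rng.standard_normal(500))
y = 2.0 * x + rng.standard_normal(500)

df = pd.DataFrame({"y": y, "x": x})
df["dx"] = df["x"].diff()

# Two lags and two leads of the first difference of x.
for j in (1, 2):
    df[f"dx_lag{j}"] = df["dx"].shift(j)
    df[f"dx_lead{j}"] = df["dx"].shift(-j)
df = df.dropna()

X = sm.add_constant(df[["x", "dx", "dx_lag1", "dx_lag2",
                        "dx_lead1", "dx_lead2"]])
res = sm.OLS(df["y"], X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(res.params["x"])  # estimate of beta
```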
Error Correction Model
• Take the first-difference regression of Equation (1) and include the lagged value of (y − β x) as an additional regressor:

  ∆y_t = α_0 + γ_0 ∆x_t + δ(y_{t−1} − β x_{t−1}) + u_t

  (we would expect δ < 0).
• This enables us to study the short-run dynamics.
• Engle-Granger Two-Step Procedure: In practice we do not know β, so we include y_{t−1} − β̂ x_{t−1} as a regressor, where β̂ is a consistent estimator of β (a code sketch follows).
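A self-contained sketch of the two-step procedure with placeholder data; using the step-1 residuals as the error-correction term is equivalent to using y_{t−1} − β̂ x_{t−1} up to the estimated intercept.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Placeholder cointegrated pair (y = 2x + stationary noise).
rng = np.random.default_rng(9)
x = np.cumsum(rng.standard_normal(500))
y = 2.0 * x + rng.standard_normal(500)
df = pd.DataFrame({"y": y, "x": x})

# Step 1: levels regression gives a consistent beta-hat; the residuals
# estimate the equilibrium error.
step1 = sm.OLS(df["y"], sm.add_constant(df["x"])).fit()
df["ec"] = step1.resid  # error-correction term

# Step 2: regress dy on dx and the lagged error-correction term.
df["dy"] = df["y"].diff()
df["dx"] = df["x"].diff()
df["ec_lag1"] = df["ec"].shift(1)
df = df.dropna()

ecm = sm.OLS(df["dy"], sm.add_constant(df[["dx", "ec_lag1"]])).fit()
print(ecm.params["ec_lag1"])  # delta, expected to be negative
```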
3 Serial Correlation and OLS under Weak Dependence
Serial Correlation and OLS
• OLS will still be consistent provided that Time-Series Assumptions 1-3 are satisfied.
• However, OLS will be inefficient because it does not use the information in the serial correlation, even when the other Gauss-Markov assumptions are satisfied (just as OLS is inefficient under heteroskedasticity).
• OLS fails to take account of how the value of the disturbance for observation (t − 1) provides information on the likely value of the disturbance for observation t, and so on:
  – If the disturbances and the regressors both show positive serial correlation (or both show negative serial correlation) then the usual OLS formula for the variance of the OLS estimator understates the true variance.
  – Hence, standard errors, t-tests and F-tests are invalid.
3.1 Testing for Serial Correlation
Breusch-Godfrey Test for Serial Correlation
• Suppose:

  y_t = β_0 + β_1 x_t + u_t
  u_t = ρ_1 u_{t−1} + e_t

  where {e_t} is iid (and independent of {x_t}) with mean zero and variance σ^2, so that the disturbances u_t follow an AR(1) process.

• Null hypothesis is H_0: ρ_1 = 0 (i.e., the disturbances are iid, so there is no serial correlation).
• Procedure (a code sketch follows):
  1. Run an OLS regression of y_t on x_t to generate OLS residuals û_t = y_t − β̂_0 − β̂_1 x_t.
  2. Run an OLS regression of û_t on x_t and û_{t−1}.
  3. Test the significance of û_{t−1} in the second OLS regression.
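A sketch using the packaged Breusch-Godfrey test in statsmodels, with simulated placeholder data; nlags=1 matches the AR(1) alternative above.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Placeholder data with AR(1) errors (rho = 0.5).
rng = np.random.default_rng(2)
T = 300
x = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()
y = 1.0 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()

# nlags=1 tests against an AR(1) alternative; LM and F versions.
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=1)
print(lm_pval, f_pval)
```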
3.2 Remedies for Serial Correlation
Remedies
1. Use Feasible Generalized Least Squares (a sketch follows this list).
   • For example, the Cochrane-Orcutt procedure when the errors are AR(1):
     – Run an OLS regression of y_t on x_t to generate OLS residuals û_t = y_t − β̂_0 − β̂_1 x_t.
     – Run an OLS regression of û_t on x_t and û_{t−1} to generate ρ̂_1.
     – Run an OLS regression on the transformed model:

       (y_t − ρ̂_1 y_{t−1}) = β_0(1 − ρ̂_1) + β_1(x_t − ρ̂_1 x_{t−1}) + error

       to get an efficient estimate of β_1.
2. Another approach is to run OLS and then compute autocorrelation-robust standard errors.
   • The newey command in Stata will do this (but requires selecting a "lag length" parameter).
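A sketch of both remedies in Python with statsmodels, reusing the placeholder data from the Breusch-Godfrey sketch above: GLSAR implements an iterative Cochrane-Orcutt-style FGLS, and cov_type="HAC" gives Newey-West standard errors (the lag length 4 is an arbitrary choice, just as with Stata's newey).

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data with AR(1) errors (rho = 0.5).
rng = np.random.default_rng(2)
T = 300
x = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()
y = 1.0 + 0.5 * x + u
X = sm.add_constant(x)

# Remedy 1: FGLS. GLSAR with rho=1 (one AR lag) alternates between
# estimating rho from the residuals and re-running the transformed
# regression, in the spirit of Cochrane-Orcutt.
fgls = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=10)
print(fgls.params)

# Remedy 2: plain OLS with Newey-West (HAC) standard errors.
nw = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(nw.bse)
```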