ApEc 8212: Econometric Analysis II -- Lecture #4
Instrumental Variables (Part 1)
Instrumental variable (IV) methods are used to deal
with problems of omitted variable bias, measurement
error and simultaneity. They are used very often in
applied econometrics (~75 out of the 140 empirical
papers in AER, 2003-2007). This lecture and the next
lecture present a detailed analysis of IV methods.
I. Instrumental Variables: 1 Endogenous Variable
Of the 3 assumptions of the single-equation linear
model, the most troublesome is Assumption OLS.1,
that is the assumption that E[xu] = 0.
The reasons for this trouble are:
1. It is likely to be violated.
2. If it is violated, then β̂_OLS, the OLS estimate of β, will
be biased and inconsistent (this is not true when
Assumption OLS.3 is violated).
3. It is not easy to test whether it has been violated.
The most commonly used method to get around the
problem that E[xu] ≠ 0 is instrumental variables.
To see the need for IV methods, consider the linear
population model:
y = x′β + u, E[u] = 0,
Suppose that Cov(u, xj) = 0 for j = 1, 2, …K-1, but it
may be that Cov(u, xK) ≠ 0. That is, assume the 1st K-1
variables in the x vector are exogenous, but xK may be
endogenous. One example is omitted variable bias: an
omitted variable q is part of u and is correlated with xK.
In Lecture 3 we saw that if Cov(u, xK) ≠ 0 then the OLS
estimates of all the elements of β are likely to be inconsistent.
Instrumental variable (IV) methods work by
introducing another variable (an “instrument”), call it
z1, that satisfies two fundamental conditions:
1. Cov(z1, u) = 0 (“exclusion restriction”)
2. Cov(z1, xK − L[xK | 1, x1, … xK-1]) ≠ 0
The first condition amounts to requiring that z1 be
exogenous with respect to the error term in the
(structural) equation of interest. It does not “belong”
in that equation and thus can be “excluded”.
The second condition says that z1 must have some
“explanatory power” for xK beyond what is obtained
from a linear projection of xK on the other variables
in x. An equivalent (and perhaps more intuitive) way
to state this is to generate a linear projection of xK on
the other variables in x and on z1, which yields:
xK = δ0 + δ1x1 + … + δK-1xK-1 + θ1z1 + rK, where E[rK] = 0 and rK is uncorrelated with x1, … xK-1 and z1
The condition amounts to requiring that θ1 ≠ 0.
Some people express the 2nd condition as: “z1 should be
correlated with xK”. But this is not enough; instead, z1
must have explanatory power for xK in addition to the
explanatory power of a linear projection of xK on the
other x variables. This is a partial correlation condition.
The above linear projection of xK on z1 and the other
elements of x is the reduced form equation for xK
(we’ll see this terminology when we discuss estimation
of systems of equations). It is also called the first-stage
equation for xK (this terminology is explained below).
The reduced form equation always requires (assumes)
that the error term in the equation of interest (i.e. u) is
uncorrelated with the regressors in the reduced form.
You can also estimate a reduced form equation for y:
just regress y on x1, … xK-1 and z1. Substituting the
linear projection of xK into y = x′β + u shows what
this regression estimates:
y = α0 + α1x1 + … + αK-1xK-1 + λ1z1 + v
where v = u + βKrK, αj = βj + βKδj (j = 1, 2, … K-1)
and λ1 = βKθ1. (Consistency requires Cov[z1, u] = 0.)
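A minimal sketch of the substitution that produces these coefficients (writing x′β out term by term; the treatment of the intercept is glossed over here):

\begin{align*}
y &= \beta_1 x_1 + \cdots + \beta_{K-1}x_{K-1} + \beta_K x_K + u \\
  &= \beta_K\delta_0 + (\beta_1 + \beta_K\delta_1)x_1 + \cdots + (\beta_{K-1} + \beta_K\delta_{K-1})x_{K-1} + (\beta_K\theta_1)z_1 + (u + \beta_K r_K),
\end{align*}

where the second line substitutes the linear projection of xK given above; matching terms gives the αj, λ1 and v just defined.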
Returning to the equation of interest, y = x′β + u, any
IV estimate of that equation must solve the
identification problem. That is, we want to show
that we can “recover” (estimate) β using the estimates
we have that are generated using the instrument(s).
In this example with one endogenous variable and
one instrument, β is identified. To see how this works,
let z = (1, x2, … xK-1, z1)′ be the instrument vector
(variables not correlated with u can “be their own
instruments”). The assumptions thus far imply that:
E[z·u] = 0
Premultiplying both sides of y = x′β + u by z, taking
expectations of both sides, and using E[z·u] = 0 gives:
E[zx′]β = E[zy]
Note that E[zx′] is K×K and E[zy] is K×1.
E[zx′]β = E[zy] is a set of K equations with K
unknowns (the K elements of β). These equations
have a unique solution only if E[zx′] has full rank,
which allows E[zx′] to be inverted to solve for β:
β = (E[zx′])-1E[zy]
The rank(E[zx′]) = K if and only if θ1 ≠ 0, i.e. if and only
if z1 has explanatory power for xK beyond that obtained
from a linear projection of xK on the other x variables.
This rank assumption is called the rank condition.
In practice, the IV estimator for β is:
1
N
N
ˆβ IV =  N 1  z i x i'   N 1  z i y i  = (Z′X)-1Z′y
 


i 1
i 1
The second equality just expresses βˆ IV in matrix
notation (Z and X are N×K matrices and y is an N×1
vector). Consistency can be shown using the law of
large numbers; this will be shown formally below for
a more general case with many endogenous variables.
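As a minimal numerical sketch of this formula (hypothetical simulated data, numpy only; all variable names are made up for illustration), the line computing beta_iv below is just (Z′X)⁻¹Z′y, where Z stacks the rows zᵢ′ and X the rows xᵢ′:

import numpy as np

rng = np.random.default_rng(0)
N = 1000
x1 = rng.normal(size=N)                      # exogenous regressor
z1 = rng.normal(size=N)                      # instrument
u = rng.normal(size=N)                       # structural error
xK = 0.8 * z1 + 0.5 * u + rng.normal(size=N) # endogenous: correlated with u
y = 1.0 + 2.0 * x1 + 3.0 * xK + u            # equation of interest

X = np.column_stack([np.ones(N), x1, xK])    # regressors
Z = np.column_stack([np.ones(N), x1, z1])    # instruments (exogenous x's instrument themselves)
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)  # (Z'X)^{-1} Z'y, consistent
beta_ols = np.linalg.solve(X.T @ X, X.T @ y) # inconsistent for the coefficient on xK
print(beta_iv, beta_ols)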
Going back to the two conditions for IV estimation,
the first (exogeneity of z1) is hard to test, but the second
(relevance) is easy to test from the first-stage regression.
Examples. Wooldridge gives examples on pp.93-96.
The first is a regression of (log) wages on education,
experience and experience². IQ may be an omitted
variable; it could be correlated with education. One
possible IV for education is mother's education, but it
may be correlated with the (adult) child's IQ (high-IQ
mothers have high education and high-IQ children).
Angrist & Krueger (1991) used quarter of birth as an IV
for education. Intuition: people born in January-March
can drop out of school earlier (if, say, 16 is the legal
dropout age) than kids born later in the year. See
pp. 93-97 of Wooldridge for a nice discussion.
II. More than One Instrumental Variable: 2SLS
Consider again the linear model
y = x′β + u, E[u] = 0,
where Cov(u, xj) = 0 for j = 1, …K-1, & Cov(u, xK) ≠ 0.
Suppose we have M instruments for xK: z1, z2, … zM.
Assume that Cov(z, u) = 0. If each IV in z1, … zM
also meets the partial correlation condition, we could
use each of them one at a time to get M estimates of
β. In fact, we will see that it is best to use all of these
instruments in one regression. The method of doing
this is called Two-Stage Least Squares (2SLS).
Let z denote (1, x1, … xK-1, z1, z2, … zM)′, an L×1
vector (L = K+M). Basically, 2SLS chooses the
linear combination of these instruments that is most
strongly correlated with xK. We get this by regressing
(taking a linear projection of) xK on z. This leads to:
xK = δ0 + δ1x1 + … δK-1xK-1 + θ1z1 + … + θMzM + rK
where by definition rK is uncorrelated with every
element in z. Define xK* = L[xK| x1, …xK-1, z1, …zM]:
xK* = δ0 + δ1x1 + … δK-1xK-1 + θ1z1 + … + θMzM
This is uncorrelated with u because all elements in z
are uncorrelated with u. Thus all of the correlation of
xK with u is due to correlation between rK and xK.
We actually never observe xK* because we do not
know the δ’s or θ’s, but we can use OLS to estimate it:
x̂K = δ̂0 + δ̂1x1 + … + δ̂K-1xK-1 + θ̂1z1 + … + θ̂MzM
With these predictions for xK we can estimate β by
OLS, replacing xi with x̂i = (1, xi1, … xi,K-1, x̂iK)′:
β̂_IV = [N⁻¹ Σᵢ x̂ᵢxᵢ′]⁻¹ [N⁻¹ Σᵢ x̂ᵢyᵢ] = (X̂′X)⁻¹X̂′y
Three things to note with this approach:
1. β̂_IV can also be written as (X̂′X̂)⁻¹X̂′y,
since X̂ = Z(Z′Z)⁻¹Z′X.
2. The fact that there are more IVs than there are
endogenous variables leads, in effect, to a
procedure in which x is first regressed on z, which
produces x̂, and then y is regressed on x̂. This is
why this is called “two-stage least squares”.
3. However, do not run OLS twice yourself to estimate
this; that will give incorrect standard errors for the 2nd
stage, since your software will not know that x̂ is
itself an estimate with its own covariance matrix
(see the sketch below).
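A minimal sketch of the 2SLS computation (numpy; the function and data layout are illustrative assumptions, not Wooldridge's code). Note that the residuals used for inference come from the original X, not X̂, which is exactly the point made in item 3 above:

import numpy as np

def tsls(y, X, Z):
    """2SLS: X is the N x K regressor matrix, Z the N x L instrument matrix (L >= K)."""
    # First stage: fitted values Xhat = Z (Z'Z)^{-1} Z'X
    Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    # Second stage: beta = (Xhat'X)^{-1} Xhat'y (identical to (Xhat'Xhat)^{-1} Xhat'y)
    beta = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)
    # Residuals for standard errors must use the ORIGINAL X, not Xhat
    uhat = y - X @ beta
    return beta, uhat, Xhat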
Finally, we need to specify the rank condition for the
case of additional instruments. We still require
rank{E[zx΄]} = K. This implies that in the above linear
projection xK = δ0 + δ1x1 + … δK-1xK-1 + θ1z1 + … +
θMzM + rK it must be the case that at least one of the θ’s
is ≠ 0. Intuition: at least one of the z’s must have
explanatory power that is not already in x1 through xK-1.
You can test this condition. The null hypothesis (in the
1st-stage regression) is H0: θ1 = θ2 = … = θM = 0; rejecting it
supports instrument relevance. Use an F-test (under
homoscedasticity) or a Wald test (under heteroscedasticity).
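A sketch of this relevance test in the homoscedastic case, computed from restricted and unrestricted first-stage sums of squared residuals (numpy; xK is the endogenous variable, W the other exogenous regressors including a constant, Zexc the M excluded instruments — all hypothetical names):

import numpy as np

def first_stage_F(xK, W, Zexc):
    """F-test of H0: all excluded-instrument coefficients are zero in the first stage."""
    N = len(xK)
    Xu = np.column_stack([W, Zexc])                  # unrestricted first-stage regressors
    bu = np.linalg.lstsq(Xu, xK, rcond=None)[0]
    br = np.linalg.lstsq(W, xK, rcond=None)[0]
    ssr_u = np.sum((xK - Xu @ bu) ** 2)
    ssr_r = np.sum((xK - W @ br) ** 2)
    M = Zexc.shape[1]
    # compare to the F(M, N - Xu.shape[1]) distribution
    return ((ssr_r - ssr_u) / M) / (ssr_u / (N - Xu.shape[1]))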
Lastly, some vocabulary. If there are more instruments
than there are endogenous x variables (M > 1), then we
say that the equation we are interested in, y = x΄β + u, is
overidentified, and the number of “extra” instruments
(M-1) is the number of overidentifying restrictions.
III. Generalizing 2SLS to Multiple Endog. Variables
This section presents the most general case of 2SLS,
that with many endogenous variables. It also shows
that this general estimator is consistent, asymptotically
normally distributed and (under certain assumptions)
efficient. The model is the same as in Section II, except
that more than one x variable can be endogenous.
Consistency
Other than the assumption of a linear model, we need 2
other assumptions to prove consistency. They are
analogous to assumptions OLS.1 & OLS.2 in Lecture 3.
Assumption 2SLS.1: The L×1 vector of instruments,
z, satisfies E[zu] = 0.
Recall that the exogenous x variables can be instruments
for themselves. A more restrictive assumption is that
E[u| z] = 0. This implies (but is not implied by) E[zu] = 0.
Assumption 2SLS.2:
Rank E[zz΄] = L
Rank E[zx΄] = K.
The first rank condition simply says that none of the
instruments can be a linear combination of the others;
satisfying it is mostly a matter of common sense. The
second rank condition ensures that z contains enough
instruments that no variable in x̂ (the predicted values
of the K x variables) is a linear combination of the other
variables in x̂. If one or both rank conditions do not hold
for your data, your software cannot produce an estimate of β.
The second condition is often referred to as the rank
condition. It must be satisfied to get a 2SLS estimate
of β. If it does not hold we say that β is not identified.
A necessary (but not sufficient) condition for the 2nd
rank condition to hold is the order condition: L ≥ K.
Very simply, we need to have at least as many
instruments as we have explanatory (x) variables.
Note: It is possible (but unlikely) that the rank condition
fails even though the order condition holds; for example, you
may have enough instruments, but for some endogenous x
variable the z variables have no predictive power.
Wooldridge shows (pp.99-100) how linear projections
can demonstrate that 2SLS.1 and 2SLS.2 ensure that β
is identified (can be estimated). If rank E[zz′] = L we
can linearly project x on z to get x*′ = z′Π, where Π is
the L×K matrix (E[zz΄])-1E[zx΄]. Consistent estimates
of Π can be obtained via the first stage regression(s) of
the endogenous variable(s) in x on z, which implies
that we have consistent estimates of x*. Premultiplying
y = x΄β + u by x* and taking expectations gives
E[x*y] = E[x*x΄]β (since E[x*u] = 0). Thus we have:
β = (E[x*x΄])-1E[x*y]
We just need to show that E[x*x΄] is nonsingular:
E[x*x΄] = Π΄E[zx΄] = E[xz΄](E[zz΄])-1E[zx΄]
The matrix E[xz΄](E[zz΄])-1E[zx΄] is nonsingular if
and only if E[zx΄] has a rank of K (a good matrix
algebra homework problem), which is the second
rank condition of Assumption 2SLS.2.
Finally, to get the more general βˆ IV that allows for
multiple endogenous variables, replace E[x*x΄] in
β = (E[x*x΄])-1E[x*y] with E[xz΄](E[zz΄])-1E[zx΄] and
do an analogous substitution for E[x*y] to get
β = {E[xz΄](E[zz΄])-1E[zx΄]}-1E[xz΄](E[zz΄])-1E[zy]
Estimating these expectations using the data gives:
1
1 N
N
N


βˆ IV =   x i z i'   z i z i'    z i x i'    x i z i'    z i z i'    z i yi 
 i 1
  i 1
  i 1
  i 1

  i 1
 i 1
N
N
1
N
This is the same expression as the 2SLS estimator given in
Section II; it just writes x̂ out explicitly as the (sample)
linear projection of x on z.
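A short numerical check (numpy; the same hypothetical X, Z, y as in the earlier sketches) that this moment-based expression reproduces the X̂-based 2SLS estimate:

import numpy as np

def tsls_moments(y, X, Z):
    Szz = Z.T @ Z
    Sxz = X.T @ Z
    A = Sxz @ np.linalg.solve(Szz, Z.T @ X)   # X'Z (Z'Z)^{-1} Z'X
    b = Sxz @ np.linalg.solve(Szz, Z.T @ y)   # X'Z (Z'Z)^{-1} Z'y
    return np.linalg.solve(A, b)              # equals (Xhat'X)^{-1} Xhat'y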
Theorem 5.1 (Consistency of βˆ IV): This follows from
Assumptions 2SLS.1 and 2SLS.2, applying the law of
large numbers to every term, and Slutsky’s theorem.
Asymptotic Normality of βˆ IV
Start with the homoscedasticity case:
Assumption 2SLS.3: E[u2zz΄] = σ2E[zz΄], where σ2 = E[u2]
This is close to Assumption OLS.3, except the variance
of u is assumed constant for different values of z, instead
of different values of x. A slightly stronger assumption
(which implies Assumption 2SLS.3) is E[u2| z] = σ2,
which is equivalent to Var[u| z] = σ2 if E[u| z] = 0.
Theorem 5.2 (Asymptotic Normality of βˆ IV): When
Assumptions 2SLS.1, 2SLS.2 and 2SLS.3 hold, then:
√N (β̂_IV − β) ~ᵃ N(0, σ²{E[xz′](E[zz′])⁻¹E[zx′]}⁻¹)
The proof is very similar to the proof of Theorem 4.2.
You can estimate σ2{E[xz΄](E[zz΄])-1E[zx΄]}-1 using
sample averages. First, a consistent estimate of σ2 is:
σ̂² = [1/(N − K)] Σᵢ ûᵢ²,  where ûᵢ = yᵢ − xᵢ′β̂_IV
Note that ûᵢ is defined using xᵢ, not x̂ᵢ, so these are
not the residuals from the second-stage regression.
Finally, we saw above that E[xz΄](E[zz΄])-1E[zx΄] =
E[x*x*΄], so Avar( βˆ IV) is consistently estimated by:
σ̂² [Σᵢ x̂ᵢx̂ᵢ′]⁻¹ = σ̂²(X̂′X̂)⁻¹
Asymptotic Efficiency of βˆ IV
Theorem 5.3 (Asymptotic Efficiency of 2SLS): If
Assumptions 2SLS.1, 2SLS.2 and 2SLS.3 hold, then
β̂_IV is asymptotically efficient in the class of all
instrumental variable estimators using instruments that are linear in z.
The proof is on p.103 of Wooldridge. In practice, this
means that (under homoscedasticity) you should use all
the instruments you have. This is very intuitive because
the more instruments you use, the more precisely you
estimate the first-stage projection x̂, which will increase
the precision of your estimate of β. However, in finite samples using
lots of instruments can give misleading results, as will
be seen in Lecture 6. So don’t overdo it!
IV. Further Useful Results for IV Estimation
This section examines hypothesis testing, estimation
under heteroscedasticity, and some potential problems.
Hypothesis Testing
With the estimate of the variance-covariance matrix
of β̂_IV given above, you can construct confidence
intervals for the elements of β, and you can construct
Wald tests of hypotheses that can be expressed
as restrictions on the parameters in β.
If you don’t like constructing Wald tests, you can
jointly test multiple restrictions by using regression
methods. Wooldridge explains this on pp.104-105.
You can also use regression methods to construct LM
tests, which are useful if it is easier to estimate the
restricted model. (Wald tests require estimates of
unrestricted models.) This is similar to the discussion
of LM test for the standard linear model in Lecture 3.
As in that lecture, divide β and x into two sets:
y = x1΄β1 + x2΄β2 + u
Any linear restriction can be rewritten as an exclusion
restriction (recall the Cobb-Douglas production function
example), so let's just examine the hypothesis that β2 = 0.
Let β̃1 be the (restricted) estimate of β1 obtained by using
IV methods, with the instrument set z, to estimate
y = x1′β1 + u. Let the residuals be ũ = y − x1′β̃1, and let
x̂1 and x̂2 be the fitted values from regressing x1 and
x2, respectively, on z. The LM statistic is:
LM = N·R_u²
where R_u² is the uncentered R² from regressing ũ
on x̂1 and x̂2. Under H0 (β2 = 0), LM ~ᵃ χ²(K2), where
K2 is the number of elements in β2.
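A sketch of this LM test (numpy; X1 and X2 are the two blocks of regressors, Z the instrument matrix, all hypothetical 2-D arrays; the uncentered R² is computed as 1 − SSR divided by the uncentered total sum of squares):

import numpy as np

def lm_test(y, X1, X2, Z):
    """LM test of H0: beta2 = 0 in y = X1*b1 + X2*b2 + u, with instruments Z."""
    N = len(y)
    proj = lambda A: Z @ np.linalg.solve(Z.T @ Z, Z.T @ A)   # projection onto Z
    X1hat, X2hat = proj(X1), proj(X2)
    b1_tilde = np.linalg.solve(X1hat.T @ X1, X1hat.T @ y)    # restricted IV estimate
    u_tilde = y - X1 @ b1_tilde
    W = np.column_stack([X1hat, X2hat])
    e = u_tilde - W @ np.linalg.lstsq(W, u_tilde, rcond=None)[0]
    R2_unc = 1.0 - (e @ e) / (u_tilde @ u_tilde)
    return N * R2_unc                                        # ~ chi2(K2) under H0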
Heteroscedasticity
In general, there is little reason to think that Assumption
2SLS.3 (homoscedasticity) holds. Fortunately, the
covariance matrix for βˆ IV is easily modified to allow for
heteroscedasticity of unknown form. When that
assumption is dropped Avar( βˆ IV) can be estimated by:
(X̂′X̂)⁻¹ [Σᵢ ûᵢ² x̂ᵢx̂ᵢ′] (X̂′X̂)⁻¹
You can use a degrees of freedom correction, multiplying this by N/(N-K), but this has no asymptotic effect.
Some software packages, e.g. Stata, allow you to select
this “robust” covariance matrix for βˆ IV as an option. If
you use software without this option, you can get the
standard errors for each element of βˆ IV using standard
regression methods. See Wooldridge, pp.106-107.
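A sketch of this robust estimate (numpy; inputs as in the homoscedastic sketch above):

import numpy as np

def iv_vcov_robust(y, X, Xhat, beta_iv):
    uhat = y - X @ beta_iv
    bread = np.linalg.inv(Xhat.T @ Xhat)         # (Xhat'Xhat)^{-1}
    Xu = Xhat * uhat[:, None]                    # rows uhat_i * xhat_i'
    meat = Xu.T @ Xu                             # sum of uhat_i^2 xhat_i xhat_i'
    V = bread @ meat @ bread
    return V, np.sqrt(np.diag(V))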
To test linear restrictions on the β parameters, use the
standard Wald test (see Lecture 3), but use the above
“robust” estimate of Avar(β̂_IV) for V̂.
Lagrange multiplier (LM) tests can also be adjusted to
allow for heteroscedasticity. Do this in the same way
as above, except regress each element of x̂2 on x̂1. Call
the residuals from these regressions r̂1, r̂2, … r̂K2,
multiply each of them (observation by observation) by
the ũ's defined above for the LM test under
homoscedasticity, and then regress a constant term
(the value 1) on ũ·r̂1, … ũ·r̂K2. The LM statistic is then
LM = N − SSR0
where SSR0 is the sum of the squared residuals from this
regression. Under H0 (β2 = 0), this is distributed as χ²(K2).
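A sketch of this robust LM statistic (numpy; u_tilde, X1hat and X2hat as in the earlier LM sketch):

import numpy as np

def lm_test_robust(u_tilde, X1hat, X2hat):
    N, K2 = X2hat.shape
    # Residuals from regressing each column of X2hat on X1hat
    R = X2hat - X1hat @ np.linalg.lstsq(X1hat, X2hat, rcond=None)[0]
    G = R * u_tilde[:, None]                                 # products u_tilde_i * r_ij
    ones = np.ones(N)
    resid = ones - G @ np.linalg.lstsq(G, ones, rcond=None)[0]
    return N - np.sum(resid ** 2)                            # ~ chi2(K2) under H0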
Some Additional Cautions and Comments
1. IV estimators are, in general, biased in finite
samples (unbiasedness is a small-sample property
that IV lacks). Indeed, if the number of variables
in z equals the number of variables in x, then
E[β̂_IV] does not even exist! (See Davidson and
MacKinnon, 1993, p.222.)
2. The better the z variables are at predicting the
endogenous x variables, the more precise your
estimates of β. In general, “better” z’s increase
E[xz΄] and decrease σ2{E[xz΄](E[zz΄])-1E[zx΄]}-1.
3. There are 2 problems with “weak instruments”,
that is, z variables that are not good at predicting
the x variables. This can increase bias in small
samples, and if in addition Assumption 2SLS.1
does not hold then asymptotically this can lead
to greater inconsistency than standard OLS.
This will be discussed in detail in Lecture 5.
4. IV estimates are used to overcome omitted
variable bias by “instrumenting” the elements in
x that are correlated with the omitted variable q.
Alternatively, if you have 2 or more “indicators”
of q, you can use one (or more) as an instrument
for the other. See Wooldridge (pp.112-113).
5. IV estimates can also be used to overcome bias due
to measurement error. You will need to find
instruments for the variable measured with error.
The instruments must not be correlated with the
measurement error. See Wooldridge (p.114).
VI. The Control Function Approach to Endogeneity
(Wooldridge, Chapter 6, Section 2)
Another way to use instrumental variables to overcome
bias due to endogeneity of one or more of the x variables
is the control function approach. This method uses the
instruments to add extra regressors to the equation of
interest that “pull out” of the error term the part that is
correlated with one or more of the x variables, so that
the rest of the error is uncorrelated with the x variables.
To use this, we need to change the notation a little.
Let the y variable in the equation of interest be y1, and
assume there is one potentially endogenous x variable;
call it y2. Call the other x variables z1. The model is:
y1 = z1′δ1 + α1y2 + u1
Let there be L1 variables in z1. Denote the set of IVs
by z (including all z1 variables, which are instruments
for themselves). Assume E[zu1] = 0, and that there is
at least one good IV for y2; that is, if we regress y2 on
z the coefficient on at least one of the z variables not
in z1 (the “identifying instruments”) is not equal to 0.
The latter assumption is basically Assumption 2SLS.2.
As with 2SLS, this method uses a linear projection of
y2 onto z:
y2 = z′π2 + v2,  E[zv2] = 0
Since E[zu1] is assumed to equal zero, the only way
that E[y2u1] can equal zero is when E[u1v2] = 0. The
correlation (or lack of it) of these 2 error terms can be
expressed by a linear projection of u1 on v2:
u1 = ρ1v2 + e1
By definition, ρ1 = E[v2u1]/E[v22] and E[v2e1] = 0.
Also, E[ze1] = 0 since e1 = u1 – ρ1v2 and we assume
that both E[zu1] = 0 and E[zv2] = 0.
Inserting this into the equation of interest gives:
y1 = z1′δ1 + α1y2 + ρ1v2 + e1
Since e1 is uncorrelated with z1, y2 and v2 (why?), this
implies that a regression of y1 on z1, y2 and v2 will
give consistent estimates of δ1 and α1 (and ρ1). But
we don’t observe v2, so how can we do this?
The “trick” is to note that v2 = y2 − z′π2, and regressing
y2 on z estimates π2. The regression to estimate is:
y1 = z1′δ1 + α1y2 + ρ1v̂2 + “error”
where v̂2 = y2 − z′π̂2 and “error” = e1 + ρ1z′(π̂2 − π2).
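A minimal sketch of this two-step procedure (numpy; y1, y2 are vectors, Z1 the included exogenous regressors with a constant, Z the full instrument matrix — hypothetical names for illustration):

import numpy as np

def control_function(y1, y2, Z1, Z):
    # Step 1: reduced form for y2 and its residuals v2hat
    pi2 = np.linalg.lstsq(Z, y2, rcond=None)[0]
    v2hat = y2 - Z @ pi2
    # Step 2: OLS of y1 on (Z1, y2, v2hat)
    W = np.column_stack([Z1, y2, v2hat])
    coefs = np.linalg.lstsq(W, y1, rcond=None)[0]
    rho1_hat = coefs[-1]          # coefficient on v2hat
    return coefs, rho1_hat, v2hat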
It turns out that the OLS estimates of δ1 and α1 are
identical to the 2SLS estimates. A disadvantage of
the control function approach is that it is rather
complicated to derive the correct standard errors for
δ1 and α1 (the derivations are in Appendix 6A of
Wooldridge). So why use the control function
method? There are two reasons:
1. A test of ρ1 = 0 (e.g. a t-test on its estimate) is a test
of the endogeneity of y2: if ρ1 = 0 then E[y2u1] = 0.
2. As will be seen in later lectures, the control
function approach is useful for nonlinear models.
Question: Why do we need a z variable other than
those in z1 to make this work?
Final comment: With more than 1 endogenous variable
this approach requires more assumptions than 2SLS, but
it is also more efficient. See Wooldridge, pp.128-129.
VII. Testing for Endogeneity and Valid IV’s
(Wooldridge, Chapter 6, Section 3)
This section presents some very useful tests to use
when you are using, or considering the use of,
instrumental variable (IV) methods.
Testing whether u is correlated with the x variables
If all the x variables are exogenous, i.e. Cov(u,x) = 0,
we can use OLS instead of IV methods (assuming no
problems with measurement error). This is more
convenient, it leads to more precise estimates of β,
and (most importantly) it saves you from having to
find good IV’s (which can be extremely hard to find).
The standard test for exogeneity is the Hausman test
(also called the Durbin-Wu-Hausman test). The intuition
is very simple. For the model y = x′β + u, if E[xu] = 0
then OLS and IV (2SLS) are both consistent, so they
should give similar estimates of β.
The Hausman test formally tests whether this is the
case by constructing a covariance matrix for the
difference between the OLS and the IV estimates that
can be used to test whether any set of these differences
in parameters is (jointly) significantly different from 0.
We must check whether β̂_2SLS − β̂_OLS is significantly
different from 0, so we need the asymptotic variance of
√N(β̂_2SLS − β̂_OLS). If u is homoscedastic and E[xu] = 0, then:
Avar[√N(β̂_2SLS − β̂_OLS)] = σ²(E[x*x*′])⁻¹ − σ²(E[xx′])⁻¹
which is simply the difference in the asymptotic
variances of β̂_2SLS and β̂_OLS.
Thus the “traditional” version of the Hausman statistic is:
(β̂_2SLS − β̂_OLS)′ [(X̂′X̂)⁻¹ − (X′X)⁻¹]⁻ (β̂_2SLS − β̂_OLS) / σ̂²_OLS
where the superscript “−” denotes a generalized inverse,
which is complicated to compute.
A regression-based control function approach is more
convenient, and allows for heteroscedasticity. To start,
assume only one (potentially) endogenous variable.
Call it y2, so we have y1 = z1′δ1 + α1y2 + u1. Thus we
want to test whether Cov(y2, u1) = 0. We saw above
that this is equivalent to E[v2u1] = 0, which in turn is
equivalent to ρ1 = 0 in the control function equation
y1 = z1′δ1 + α1y2 + ρ1v̂2 + “error” above. Under the
null that ρ1 = 0 we do not even have to adjust the
standard error of the estimate of ρ1, and we can use the
heteroscedasticity-robust estimate of that standard error.
But if we reject ρ1 = 0 then we have to use the
more complicated standard errors discussed above.
In fact, if we have only one potentially endogenous
variable and we assume homoscedasticity, the
“traditional” Hausman test can be computed as a t-statistic:
(α̂1,2SLS − α̂1,OLS) / √{[se(α̂1,2SLS)]² − [se(α̂1,OLS)]²}
where [se( )]2 is the associated diagonal element of
the covariance matrix. Unfortunately, there is no
simple adjustment for heteroscedasticity.
This procedure can easily be extended to the case of
more than one endogenous variable. Let the model be:
y1 = z1′δ1 + y2′α1 + u1
where y2 is a G1×1 column vector. The reduced form
equation for y2 is y2 = Π2z + v2, where Π2 is a G1×L
matrix and v2 is a G1×1 column vector.
Let v̂2 denote the residuals from OLS estimation of
the reduced form (first stage) equations.
Then use these to estimate:
y1 = z1′δ1 + y2′α1 + v̂2′ρ + “error”
and do a standard F-test for H0: ρ = 0. If there is
heteroscedasticity you will need to do the associated
Wald test (see Lecture 3). You could also do an LM test
instead of the F-test. See Wooldridge, pp.132-133. See
p.134 for the case where some variables are “known” to
be exogenous, others are “known” to be endogenous, and
you want to test the endogeneity of a third set.
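A sketch of this regression-based endogeneity test under homoscedasticity (numpy; Y2 is an N×G1 array of possibly endogenous variables, Z1 the included exogenous variables with a constant, Z the full instrument matrix — hypothetical names):

import numpy as np

def endogeneity_F(y1, Y2, Z1, Z):
    N = len(y1)
    V2hat = Y2 - Z @ np.linalg.lstsq(Z, Y2, rcond=None)[0]   # reduced-form residuals
    Wr = np.column_stack([Z1, Y2])                           # restricted: no control functions
    Wu = np.column_stack([Z1, Y2, V2hat])                    # unrestricted: add V2hat
    ssr_r = np.sum((y1 - Wr @ np.linalg.lstsq(Wr, y1, rcond=None)[0]) ** 2)
    ssr_u = np.sum((y1 - Wu @ np.linalg.lstsq(Wu, y1, rcond=None)[0]) ** 2)
    G1 = Y2.shape[1]
    # compare to the F(G1, N - Wu.shape[1]) distribution
    return ((ssr_r - ssr_u) / G1) / (ssr_u / (N - Wu.shape[1]))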
Testing Validity of IVs (Overidentification Test)
IV estimates are consistent only if Assumption 2SLS.1
holds: the instrument vector, z, must satisfy E[zu] = 0.
It is possible to test this assumption (sometimes called
a Sargan test) if the number of “excluded” instruments
exceeds the number of variables to be instrumented.
To see how this works, start with the equation:
y1 = z1΄δ + y2΄α1 + u1
where z1 has L1 variables and y2 has G1 variables.
Let the full instrument vector be z = (z1, z2),
where z2 contains the L2 excluded instruments.
Assume that the model is overidentified (L2 > G1).
The basic idea behind overidentification tests is that
if all instruments are valid in the sense that they are
not correlated with u1, then using different sets of
instruments will not lead to different estimates of δ
and α1. The simplest way to do this is as follows:
Let û1 be the estimated residuals (the estimated u1) when
all instruments are used. Regress û1 on all the variables
in z. Call the R-squared from this regression
R_u². Under the null hypothesis that E[zu] = 0 and
Assumption 2SLS.3 (homoscedasticity), N·R_u² (N is
the sample size) is distributed as χ²(Q1), where Q1 = L2 − G1.
Note 1: If z or z1 does not contain a constant term
then use the uncentered R2.
Note 2: Failing this test is strong evidence that some or
all of your instruments are bad. But passing this test
does not guarantee that they are good. The test may
have low power to reject H0 (that E[zu] = 0) and it
implicitly assumes that at least G1 of your
instruments are uncorrelated with u (which, if not
true, means that this test is invalid).
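A sketch of the homoscedastic overidentification statistic (numpy; uhat1 = 2SLS residuals using all instruments, Z = the full instrument matrix, assumed here to contain a constant):

import numpy as np

def overid_test(uhat1, Z):
    N = len(uhat1)
    resid = uhat1 - Z @ np.linalg.lstsq(Z, uhat1, rcond=None)[0]
    R2 = 1.0 - (resid @ resid) / np.sum((uhat1 - uhat1.mean()) ** 2)
    return N * R2      # compare to chi2 with Q1 = L2 - G1 degrees of freedom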
If Assumption 2SLS.3 doesn’t hold, you should use
the following “heteroscedasticity robust” method.
Obtain ŷ2 from the first-stage regression (of y2 on z).
Let h2 be any Q1×1 subset of z2 (it doesn't matter
which ones you choose). Regress each element of h2
on (z1, ŷ2) and keep the Q1×1 vector of residuals (call
them r̂2). Then the test statistic N − SSR0, where
SSR0 is the sum of the squared residuals from
regressing 1 on û1·r̂2, is distributed as χ²(Q1) under H0.
RESET test: not very useful (Wooldridge pp.137-138).
Heteroscedasticity test: see Wooldridge pp.138-141.