ApEc 8212: Econometric Analysis II -- Lecture #4
Instrumental Variables (Part 1)

Instrumental variable (IV) methods are used to deal with problems of omitted variable bias, measurement error and simultaneity. They are used very often in applied econometrics (roughly 75 of the 140 empirical papers in the AER, 2003-2007). This lecture and the next present a detailed analysis of IV methods.

I. Instrumental Variables: 1 Endogenous Variable

Of the 3 assumptions of the single-equation linear model, the most troublesome is Assumption OLS.1, the assumption that E[xu] = 0. The reasons it is troublesome are:

1. It is likely to be violated.
2. If it is violated, then β̂OLS, the OLS estimate of β, is biased and inconsistent (this is not true when Assumption OLS.3 is violated).
3. It is not easy to test whether it has been violated.

The most commonly used method to get around the problem that E[xu] ≠ 0 is instrumental variables.

To see the need for IV methods, consider the linear population model:

y = x′β + u,  E[u] = 0

Suppose that Cov(u, xj) = 0 for j = 1, 2, … K-1, but that Cov(u, xK) may be ≠ 0. That is, assume the first K-1 variables in the x vector are exogenous, but xK may be endogenous. One example is omitted variable bias: an omitted variable q is part of u and is correlated with xK. In Lecture 3 we saw that if Cov(u, xK) ≠ 0 then the OLS estimates of all the elements of β are likely to be inconsistent.

Instrumental variable (IV) methods work by introducing another variable (an "instrument"), call it z1, that satisfies two fundamental conditions:

1. Cov(z1, u) = 0 (the "exclusion restriction").
2. z1 is partially correlated with xK after netting out the other regressors: in the linear projection L[xK| 1, x1, … xK-1, z1], the coefficient on z1 is nonzero.

The first condition amounts to requiring that z1 be exogenous with respect to the error term in the (structural) equation of interest: z1 does not "belong" in that equation and thus can be "excluded" from it. The second condition says that z1 must have some "explanatory power" for xK beyond what is obtained from a linear projection of xK on the other variables in x. An equivalent (and perhaps more intuitive) way to state this is to write the linear projection of xK on the other variables in x and on z1:

xK = δ0 + δ1x1 + … + δK-1xK-1 + θ1z1 + rK

where E[rK] = 0 and rK, being a linear projection residual, is uncorrelated with x1, … xK-1 and z1. The condition amounts to requiring that θ1 ≠ 0. Some people express the second condition as "z1 should be correlated with xK". But this is not enough; z1 must have explanatory power for xK in addition to the explanatory power of a linear projection of xK on the other x variables. This is a partial correlation condition.

The above linear projection of xK on z1 and the other elements of x is the reduced form equation for xK (we will see this terminology again when we discuss estimation of systems of equations). It is also called the first-stage equation for xK (this terminology is explained below). The reduced form always requires (assumes) that the error term in the equation of interest (i.e. u) is uncorrelated with the regressors in the reduced form.

You can also estimate a reduced form equation for y: just regress y on x1, … xK-1 and z1. Substituting the linear projection for xK into y = x′β + u gives:

y = α0 + α1x1 + … + αK-1xK-1 + λ1z1 + v

where v = u + βKrK, αj = βj + βKδj (j = 1, 2, … K-1) and λ1 = βKθ1. (Consistency of OLS applied to this equation requires Cov(z1, u) = 0.)

Returning to the equation of interest, y = x′β + u, any IV estimate of that equation must solve the identification problem.
That is, we want to show that we can "recover" (estimate) β using the moments generated by the instrument(s). In this example with one endogenous variable and one instrument, β is identified. To see how this works, let z = (1, x1, … xK-1, z1)′ be the instrument vector (variables not correlated with u can "be their own instruments"). The assumptions thus far imply that:

E[zu] = 0

Premultiplying both sides of y = x′β + u by z, taking expectations, and using E[zu] = 0 gives:

E[zx′]β = E[zy]

Note that E[zx′] is K×K and E[zy] is K×1, so E[zx′]β = E[zy] is a set of K equations in K unknowns (the K elements of β). These equations have a unique solution only if E[zx′] has full rank, in which case E[zx′] can be inverted to solve for β:

β = (E[zx′])-1E[zy]

Here rank(E[zx′]) = K if and only if θ1 ≠ 0, i.e. if and only if z1 has explanatory power for xK beyond that obtained from a linear projection of xK on the other x variables. This rank assumption is called the rank condition.

In practice, the IV estimator of β replaces these population moments with sample averages:

β̂IV = [N-1 Σi zixi′]-1 [N-1 Σi ziyi] = (Z′X)-1Z′y

The second equality just expresses β̂IV in matrix notation (Z and X are N×K matrices and y is an N×1 vector). Consistency can be shown using the law of large numbers; this is shown formally below for the more general case with many endogenous variables.

Going back to the two conditions for IV estimation, the first is hard to test, but the second is easy to test.

Examples. Wooldridge gives examples on pp. 93-96. The first is a regression of (log) wages on education, experience and experience². IQ may be an omitted variable; it could be correlated with education. One possible IV for education is mother's education, but it may be correlated with the (adult) child's IQ (high-IQ mothers have high education and high-IQ children). Angrist and Krueger (1991) used quarter of birth as an IV for education. Intuition: people born in January-March can drop out of school earlier (if, say, 16 is the legal dropout age) than kids born later in the year. See pp. 93-97 of Wooldridge for a nice discussion.
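To make the mechanics concrete, here is a minimal simulation sketch (the variable names, parameter values and data-generating process are hypothetical, invented purely for illustration): xK is endogenous because an omitted factor q appears both in xK and in u, z1 satisfies the exclusion restriction and the partial correlation condition by construction, and the sample analog β̂IV = (Z′X)-1Z′y recovers β while OLS does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Omitted "ability" q sits in the error and is correlated with x_K.
q = rng.normal(size=n)
z1 = rng.normal(size=n)                        # instrument: exogenous, predicts x_K
x1 = rng.normal(size=n)                        # exogenous regressor
xK = 0.8 * z1 + 0.7 * q + rng.normal(size=n)   # endogenous regressor (theta_1 = 0.8 != 0)
u = q + rng.normal(size=n)                     # error contains q, so Cov(xK, u) != 0
y = 1.0 + 0.5 * x1 + 2.0 * xK + u              # "true" beta = (1, 0.5, 2)

X = np.column_stack([np.ones(n), x1, xK])      # regressors
Z = np.column_stack([np.ones(n), x1, z1])      # instruments (exogenous x's instrument themselves)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)    # (Z'X)^{-1} Z'y (Z'X is square here)

print("OLS:", beta_ols.round(3))               # coefficient on xK biased upward
print("IV :", beta_iv.round(3))                # close to (1, 0.5, 2)
```

With exactly one instrument per endogenous regressor, Z′X is square and the estimator is a single matrix inversion; the overidentified case is handled by 2SLS, discussed next.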
II. More than One Instrumental Variable: 2SLS

Consider again the linear model y = x′β + u, E[u] = 0, where Cov(u, xj) = 0 for j = 1, … K-1, and Cov(u, xK) ≠ 0. Suppose we have M instruments for xK: z1, z2, … zM, and assume that Cov(z, u) = 0 (each instrument is uncorrelated with u). If each of z1, … zM also meets the partial correlation condition, we could use each of them one at a time to get M different estimates of β. In fact, we will see that it is best to use all of these instruments at once. The method for doing this is called Two-Stage Least Squares (2SLS).

Let z denote (1, x1, … xK-1, z1, z2, … zM)′, an L×1 vector (L = K + M). Basically, 2SLS chooses the linear combination of these instruments that is most strongly correlated with xK. We get this by regressing (taking a linear projection of) xK on z. This leads to:

xK = δ0 + δ1x1 + … + δK-1xK-1 + θ1z1 + … + θMzM + rK

where by definition rK is uncorrelated with every element of z. Define xK* = L[xK| x1, … xK-1, z1, … zM]:

xK* = δ0 + δ1x1 + … + δK-1xK-1 + θ1z1 + … + θMzM

This is uncorrelated with u because every element of z is uncorrelated with u. Thus all of the correlation of xK with u is due to correlation between rK and u.

We never actually observe xK* because we do not know the δ's or θ's, but we can use OLS (the first-stage regression) to estimate it:

x̂K = δ̂0 + δ̂1x1 + … + δ̂K-1xK-1 + θ̂1z1 + … + θ̂MzM

With these predictions for xK we can estimate β by OLS, replacing xi with x̂i = (1, xi1, … xi,K-1, x̂iK)′:

β̂IV = [N-1 Σi x̂ixi′]-1 [N-1 Σi x̂iyi] = (X̂′X)-1X̂′y

Three things to note about this approach:

1. β̂IV can also be written as (X̂′X̂)-1X̂′y, since X̂ = Z(Z′Z)-1Z′X.
2. The fact that there are more IVs than endogenous variables leads, in effect, to a procedure in which x is first regressed on z, which produces x̂, and then y is regressed on x̂. This is why the method is called "two-stage least squares".
3. However, do not literally run OLS twice to estimate this; that gives incorrect standard errors for the second stage, since your software will not know that x̂ is itself an estimate with its own covariance matrix.

Finally, we need to state the rank condition for the case of additional instruments. We still require rank{E[zx′]} = K. This implies that in the above linear projection

xK = δ0 + δ1x1 + … + δK-1xK-1 + θ1z1 + … + θMzM + rK

at least one of the θ's must be ≠ 0. Intuition: at least one of the z's must have explanatory power that is not already in x1 through xK-1. You can test this. The null hypothesis (for the first-stage regression) is H0: θ1 = θ2 = … = θM = 0; use an F-test (under homoscedasticity) or a Wald test (under heteroscedasticity).

Lastly, some vocabulary. If there are more instruments than endogenous x variables (M > 1), we say that the equation of interest, y = x′β + u, is overidentified, and the number of "extra" instruments (M - 1) is the number of overidentifying restrictions.

III. Generalizing 2SLS to Multiple Endogenous Variables

This section presents the most general case of 2SLS, that with many endogenous variables. It also shows that this general estimator is consistent, asymptotically normally distributed and (under certain assumptions) efficient. The model is the same as in Section II, except that more than one x variable can be endogenous.

Consistency

Other than the assumption of a linear model, we need two other assumptions to prove consistency. They are analogous to Assumptions OLS.1 and OLS.2 in Lecture 3.

Assumption 2SLS.1: The L×1 vector of instruments, z, satisfies E[zu] = 0. (Recall that the exogenous x variables can be instruments for themselves.) A more restrictive assumption is E[u| z] = 0, which implies (but is not implied by) E[zu] = 0.

Assumption 2SLS.2: (a) rank E[zz′] = L; (b) rank E[zx′] = K.

The first rank condition simply says that none of the instruments can be a linear combination of the others; it simply requires common sense. The second rank condition ensures that z has enough explanatory power so that no variable in x̂ (the predicted values of the K x variables) is a linear combination of the other variables in x̂. If one or both rank conditions fail in your data, your software cannot produce an estimate of β.

The second condition is the one usually referred to as the rank condition. It must be satisfied to get a 2SLS estimate of β; if it does not hold we say that β is not identified.

A necessary (but not sufficient) condition for the second rank condition to hold is the order condition: L ≥ K. Very simply, we need at least as many instruments as we have explanatory (x) variables.
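As a quick numerical illustration (again on hypothetical simulated data) of why the order condition is not sufficient: in the sketch below L ≥ K holds, but the only excluded instrument has no first-stage predictive power for the endogenous regressor, so the sample analog of E[zx′] is nearly singular. In practice you would detect this with the first-stage F-test described above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Order condition holds (as many instruments as regressors), but the excluded
# instrument z1 has no predictive power for the endogenous regressor x2,
# so the population rank condition on E[zx'] fails.
x1 = rng.normal(size=n)
z1 = rng.normal(size=n)
x2 = 0.0 * z1 + rng.normal(size=n)          # theta_1 = 0 in the first stage
Z = np.column_stack([np.ones(n), x1, z1])   # L = 3 instruments
X = np.column_stack([np.ones(n), x1, x2])   # K = 3 regressors

# Singular values of the sample analog of E[zx']; the smallest one is near 0
# (and shrinks toward 0 as n grows), signalling a failing rank condition.
print(np.linalg.svd(Z.T @ X / n, compute_uv=False).round(3))
```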
Note: As the sketch above illustrates, it is possible (but unlikely) that the order condition holds while the rank condition does not: you have enough instruments, but for some endogenous x variable the z variables have no predictive power.

Wooldridge shows (pp. 99-100) how linear projections can be used to demonstrate that Assumptions 2SLS.1 and 2SLS.2 ensure that β is identified (can be estimated). If rank E[zz′] = L we can linearly project x on z to get x*′ = z′Π, where Π is the L×K matrix (E[zz′])-1E[zx′]. Consistent estimates of Π can be obtained via the first-stage regression(s) of the endogenous variable(s) in x on z, which implies that we have consistent estimates of x*. Premultiplying y = x′β + u by x* and taking expectations gives E[x*y] = E[x*x′]β (since E[x*u] = 0). Thus we have:

β = (E[x*x′])-1E[x*y]

We just need to show that E[x*x′] is nonsingular:

E[x*x′] = Π′E[zx′] = E[xz′](E[zz′])-1E[zx′]

The matrix E[xz′](E[zz′])-1E[zx′] is nonsingular if and only if E[zx′] has rank K (a good matrix algebra homework problem), which is the second rank condition of Assumption 2SLS.2.

Finally, to get the more general β̂IV that allows for multiple endogenous variables, replace E[x*x′] in β = (E[x*x′])-1E[x*y] with E[xz′](E[zz′])-1E[zx′], and make the analogous substitution for E[x*y], to get:

β = {E[xz′](E[zz′])-1E[zx′]}-1E[xz′](E[zz′])-1E[zy]

Estimating these expectations with sample averages (the factors of N-1 cancel) gives:

β̂IV = [(Σi xizi′)(Σi zizi′)-1(Σi zixi′)]-1 (Σi xizi′)(Σi zizi′)-1(Σi ziyi)

This is the same estimator as in Section II; it simply writes x̂ explicitly as the sample linear projection of x on z.

Theorem 5.1 (Consistency of β̂IV): Consistency follows from Assumptions 2SLS.1 and 2SLS.2, by applying the law of large numbers to each term and then Slutsky's theorem.

Asymptotic Normality of β̂IV

Start with the homoscedastic case:

Assumption 2SLS.3: E[u²zz′] = σ²E[zz′], where σ² = E[u²].

This is close to Assumption OLS.3, except that the variance of u is assumed constant across different values of z, instead of different values of x. A slightly stronger assumption (which implies Assumption 2SLS.3) is E[u²| z] = σ², which is equivalent to Var[u| z] = σ² if E[u| z] = 0.

Theorem 5.2 (Asymptotic Normality of β̂IV): When Assumptions 2SLS.1, 2SLS.2 and 2SLS.3 hold:

√N(β̂IV - β) ~a Normal(0, σ²{E[xz′](E[zz′])-1E[zx′]}-1)

The proof is very similar to the proof of Theorem 4.2. You can estimate σ²{E[xz′](E[zz′])-1E[zx′]}-1 using sample averages. First, a consistent estimate of σ² is:

σ̂² = [1/(N - K)] Σi ûi²,  where ûi = yi - xi′β̂IV

Note that ûi is defined using xi, not x̂i, so these are not the residuals from the second-stage regression. Finally, we saw above that E[xz′](E[zz′])-1E[zx′] = E[x*x*′], so Avar(β̂IV) is consistently estimated by:

σ̂²[Σi x̂ix̂i′]-1 = σ̂²(X̂′X̂)-1

Asymptotic Efficiency of β̂IV

Theorem 5.3 (Asymptotic Efficiency of 2SLS): If Assumptions 2SLS.1, 2SLS.2 and 2SLS.3 hold, then β̂IV is asymptotically efficient in the class of instrumental variable estimators that use instruments linear in z.

The proof is on p. 103 of Wooldridge. In practice, this means that (under homoscedasticity) you should use all the instruments you have. This is intuitive: the more instruments you use, the more precisely you estimate x̂, which increases the precision of your estimate of β. However, in finite samples using lots of instruments can give misleading results, as will be seen in Lecture 6. So don't overdo it!
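The following sketch pulls the pieces of this section together on hypothetical simulated data (variable names and parameter values are made up): an overidentified model with one endogenous regressor and two excluded instruments, the general 2SLS formula, σ̂² computed from residuals built with x (not x̂), and the homoscedasticity-based covariance estimate σ̂²(X̂′X̂)-1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Hypothetical data: one endogenous regressor, two excluded instruments (overidentified).
q = rng.normal(size=n)
x1 = rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)
xK = 0.6 * z1 + 0.4 * z2 + 0.7 * q + rng.normal(size=n)
u = q + rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 2.0 * xK + u

X = np.column_stack([np.ones(n), x1, xK])        # K = 3
Z = np.column_stack([np.ones(n), x1, z1, z2])    # L = 4 > K

# beta_2SLS = [X'Z (Z'Z)^{-1} Z'X]^{-1} X'Z (Z'Z)^{-1} Z'y
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)     # X_hat = Z (Z'Z)^{-1} Z'X
beta_2sls = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)

# sigma2_hat uses residuals built with X (not X_hat); Avar_hat = sigma2_hat (X_hat'X_hat)^{-1}
u_hat = y - X @ beta_2sls
sigma2 = u_hat @ u_hat / (n - X.shape[1])
avar = sigma2 * np.linalg.inv(Xhat.T @ Xhat)

print(beta_2sls.round(3))                        # close to (1, 0.5, 2)
print(np.sqrt(np.diag(avar)).round(4))           # homoscedasticity-based standard errors
```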
IV. Further Useful Results for IV Estimation

This section examines hypothesis testing, estimation under heteroscedasticity, and some potential problems.

Hypothesis Testing

With the estimate of the variance-covariance matrix of β̂IV given above, you can construct confidence intervals for the elements of β and Wald tests of any hypothesis that can be written as restrictions on the parameters in β.

If you don't like constructing Wald tests, you can jointly test multiple restrictions using regression methods; Wooldridge explains this on pp. 104-105. You can also use regression methods to construct LM tests, which are useful when it is easier to estimate the restricted model (Wald tests require estimates of the unrestricted model). This is similar to the discussion of LM tests for the standard linear model in Lecture 3. As in that lecture, divide β and x into two sets:

y = x1′β1 + x2′β2 + u

Any linear restriction can be rewritten as an exclusion restriction (a standard example is the Cobb-Douglas production function), so let's just examine the hypothesis that β2 = 0. Let β̃1 be the (restricted) estimate of β1 obtained by using IV methods, with instrument set z, to estimate y = x1′β1 + u. Let the residuals from this restricted estimation be ũ = y - x1′β̃1, and let x̂1 and x̂2 be the fitted values from regressing x1 and x2, respectively, on z. The LM statistic is:

LM = NRu²

where Ru² is the uncentered R² from regressing ũ on x̂1 and x̂2. Under H0 (β2 = 0), LM ~ χ²(K2), where K2 is the number of elements in β2.

Heteroscedasticity

In general, there is little reason to think that Assumption 2SLS.3 (homoscedasticity) holds. Fortunately, the covariance matrix of β̂IV is easily modified to allow for heteroscedasticity of unknown form. When that assumption is dropped, Avar(β̂IV) can be estimated by:

(X̂′X̂)-1[Σi ûi² x̂ix̂i′](X̂′X̂)-1

You can apply a degrees-of-freedom correction by multiplying this by N/(N - K), but it has no asymptotic effect. Some software packages, e.g. Stata, allow you to select this "robust" covariance matrix for β̂IV as an option. If your software lacks this option, you can still get the standard errors for each element of β̂IV using standard regression methods; see Wooldridge, pp. 106-107.

To test linear restrictions on the β parameters, use the standard Wald test (see Lecture 3), but use the above "robust" estimate of Avar(β̂IV) for V̂.

Lagrange multiplier (LM) tests can also be adjusted to allow for heteroscedasticity. Proceed as above, except regress each element of x̂2 on x̂1. Call the residuals from these regressions r̂1, r̂2, … r̂K2, and multiply each of them (observation by observation) by the ũ's defined above for the homoscedastic LM test. Then regress a constant term (1) on ũ·r̂1, …, ũ·r̂K2. The LM statistic is:

LM = N - SSR0

where SSR0 is the sum of squared residuals from this regression. Under H0 (β2 = 0), this is distributed as χ²(K2).
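A short sketch of the robust ("sandwich") covariance estimate above, on hypothetical simulated data in which Var(u) depends on x1 (all names and values are invented). Relative to the homoscedastic case, the only change is the middle term Σi ûi²x̂ix̂i′.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Hypothetical data with heteroscedastic errors: Var(u) depends on x1.
q, x1 = rng.normal(size=n), rng.normal(size=n)
z1 = rng.normal(size=n)
xK = 0.8 * z1 + 0.7 * q + rng.normal(size=n)
u = (q + rng.normal(size=n)) * np.exp(0.5 * x1)   # heteroscedasticity through x1
y = 1.0 + 0.5 * x1 + 2.0 * xK + u

X = np.column_stack([np.ones(n), x1, xK])
Z = np.column_stack([np.ones(n), x1, z1])
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)
u_hat = y - X @ beta                               # residuals use X, not Xhat

# Robust sandwich: (Xhat'Xhat)^{-1} [sum_i u_i^2 xhat_i xhat_i'] (Xhat'Xhat)^{-1}
bread = np.linalg.inv(Xhat.T @ Xhat)
meat = (Xhat * (u_hat ** 2)[:, None]).T @ Xhat
robust = bread @ meat @ bread
print(np.sqrt(np.diag(robust)).round(4))           # heteroscedasticity-robust standard errors
```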
Some Additional Cautions and Comments

1. IV methods are, in general, never unbiased (unbiasedness is a small-sample property). Indeed, if the number of variables in z equals the number of variables in x, then E[β̂IV] does not even exist! (See Davidson and MacKinnon, 1993, p. 222.)

2. The better the z variables are at predicting the endogenous x variables, the more precise your estimates of β. In general, "better" z's increase E[xz′] and decrease σ²{E[xz′](E[zz′])-1E[zx′]}-1.

3. There are two problems with "weak instruments", that is, z variables that are not good at predicting the x variables. Weak instruments can increase bias in small samples, and if in addition Assumption 2SLS.1 does not hold, then asymptotically they can lead to greater inconsistency than standard OLS. This will be discussed in detail in Lecture 5.

4. IV estimates are used to overcome omitted variable bias by "instrumenting" the elements of x that are correlated with the omitted variable q. Alternatively, if you have two or more "indicators" of q, you can use one (or more) as an instrument for the other. See Wooldridge (pp. 112-113).

5. IV estimates can also be used to overcome bias due to measurement error. You will need to find instruments for the variable measured with error, and the instruments must not be correlated with the measurement error. See Wooldridge (p. 114).

VI. The Control Function Approach to Endogeneity (Wooldridge, Chapter 6, Section 2)

Another way to use instrumental variables to overcome bias due to endogeneity of one or more of the x variables is the control function approach. This method uses the instruments to add extra regressors to the equation of interest that "pull out" of the error term the part that is correlated with one or more of the x variables, so that the rest of the error is uncorrelated with the x variables.

To use this, we need to change the notation a little. Let the y variable in the equation of interest be y1, and assume there is one potentially endogenous x variable; call it y2. Call the other x variables z1. The model is:

y1 = z1′δ1 + α1y2 + u1

Let there be L1 variables in z1. Denote the full set of IVs by z (including all the z1 variables, which are instruments for themselves). Assume E[zu1] = 0, and that there is at least one good IV for y2; that is, if we regress y2 on z, the coefficient on at least one of the z variables not in z1 (the "identifying instruments") is not equal to 0. The latter assumption is basically Assumption 2SLS.2.

As with 2SLS, this method uses a linear projection of y2 onto z:

y2 = z′π2 + v2,  E[zv2] = 0

Since E[zu1] is assumed to equal zero, E[y2u1] = E[v2u1], so y2 is endogenous only to the extent that u1 and v2 are correlated. The correlation (or lack of it) between these two error terms can be expressed by a linear projection of u1 on v2:

u1 = ρ1v2 + e1

By definition, ρ1 = E[v2u1]/E[v2²] and E[v2e1] = 0. Also, E[ze1] = 0, since e1 = u1 - ρ1v2 and we assume that both E[zu1] = 0 and E[zv2] = 0. Inserting this into the equation of interest gives:

y1 = z1′δ1 + α1y2 + ρ1v2 + e1

Since e1 is uncorrelated with z1, y2 and v2 (why?), a regression of y1 on z1, y2 and v2 gives consistent estimates of δ1 and α1 (and ρ1). But we do not observe v2, so how can we do this? The "trick" is to note that v2 = y2 - z′π2, and that regressing y2 on z gives a consistent estimate of π2. The regression to estimate is:

y1 = z1′δ1 + α1y2 + ρ1v̂2 + "error"

where v̂2 = y2 - z′π̂2 and "error" = e1 + ρ1z′(π̂2 - π2). It turns out that the OLS estimates of δ1 and α1 from this regression are identical to the 2SLS estimates.

A disadvantage of the control function approach is that it is rather complicated to derive the correct standard errors for δ1 and α1 (the derivations are in Appendix 6A of Wooldridge). So why use the control function method? There are two reasons:

1. The estimate of ρ1 provides a test of the endogeneity of y2: if ρ1 = 0 then E[y2u1] = 0.
2. As will be seen in later lectures, the control function approach is useful for nonlinear models.

Question: Why do we need a z variable other than those in z1 to make this work?
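Here is a minimal two-step control function sketch on hypothetical simulated data (variable names and parameter values are invented). The first step regresses y2 on z and keeps the residuals v̂2; the second step runs OLS of y1 on (z1, y2, v̂2). The coefficients on z1 and y2 match 2SLS, and the coefficient on v̂2 estimates ρ1. As noted above, the OLS standard errors printed by a second-step regression are not correct, because v̂2 is itself an estimate.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

# Hypothetical data: y2 endogenous; z contains z1 plus one identifying instrument z2.
z1 = rng.normal(size=n)                     # exogenous regressor (in z1)
z2 = rng.normal(size=n)                     # identifying instrument (excluded from y1 equation)
v2 = rng.normal(size=n)
y2 = 0.5 + 0.8 * z1 + 0.7 * z2 + v2         # reduced form / first stage
u1 = 0.6 * v2 + rng.normal(size=n)          # rho_1 = 0.6, so y2 is endogenous
y1 = 1.0 + 0.5 * z1 + 2.0 * y2 + u1

Z = np.column_stack([np.ones(n), z1, z2])   # full instrument vector
X1 = np.column_stack([np.ones(n), z1, y2])  # structural regressors

# Step 1: first-stage OLS of y2 on z; keep residuals v2_hat.
pi_hat = np.linalg.lstsq(Z, y2, rcond=None)[0]
v2_hat = y2 - Z @ pi_hat

# Step 2: OLS of y1 on (1, z1, y2, v2_hat). Coefficients on z1 and y2 equal 2SLS;
# the coefficient on v2_hat estimates rho_1 (nonzero -> evidence of endogeneity).
Xcf = np.column_stack([X1, v2_hat])
cf = np.linalg.lstsq(Xcf, y1, rcond=None)[0]

# 2SLS for comparison.
Xhat = Z @ np.linalg.lstsq(Z, X1, rcond=None)[0]
b_2sls = np.linalg.solve(Xhat.T @ X1, Xhat.T @ y1)
print("CF  :", cf.round(3))      # (const, delta_1, alpha_1, rho_1)
print("2SLS:", b_2sls.round(3))  # first three entries match the CF estimates
```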
Final comment: With more than one endogenous variable this approach requires more assumptions than 2SLS, but it is also more efficient. See Wooldridge, pp. 128-129.

VII. Testing for Endogeneity and Valid IVs (Wooldridge, Chapter 6, Section 3)

This section presents some very useful tests to use when you are using, or considering the use of, instrumental variable (IV) methods.

Testing whether u is correlated with the x variables

If all the x variables are exogenous, i.e. Cov(u, x) = 0, we can use OLS instead of IV methods (assuming no problems with measurement error). This is more convenient, it leads to more precise estimates of β, and (most importantly) it saves you from having to find good IVs (which can be extremely hard to find).

The standard test for exogeneity is the Hausman test (also called the Durbin-Wu-Hausman test). The intuition is very simple. For the model y = x′β + u, if E[xu] = 0 then OLS and IV (2SLS) are both consistent, so their estimates of β should not differ by more than sampling error. The Hausman test formalizes this by constructing a covariance matrix for the difference between the OLS and IV estimates, which can then be used to test whether any set of these differences in parameters is (jointly) significantly different from 0.

We must check whether β̂2SLS - β̂OLS is significantly different from 0, so we need the asymptotic variance of √N(β̂2SLS - β̂OLS). If u is homoscedastic and E[xu] = 0, then:

Avar[√N(β̂2SLS - β̂OLS)] = σ²(E[x*x*′])-1 - σ²(E[xx′])-1

which is simply the difference between the asymptotic variances of β̂2SLS and β̂OLS. Thus the "traditional" version of the Hausman test statistic is:

(β̂2SLS - β̂OLS)′[(X̂′X̂)-1 - (X′X)-1]⁻(β̂2SLS - β̂OLS)/σ̂²OLS

The term in brackets (the superscript ⁻ denotes a "generalized" inverse) is complicated to compute. A regression-based control function approach is more convenient, and it allows for heteroscedasticity.

To start, assume only one (potentially) endogenous variable. Call it y2, so we have y1 = z1′δ1 + α1y2 + u1, and we want to test whether Cov(y2, u1) = 0. We saw above that this is equivalent to E[v2u1] = 0, i.e. to ρ1 = 0 in the control function equation of Section VI. Under the null that ρ1 = 0 we do not even have to use the complicated control-function standard errors for the estimate of ρ1, and we can use a heteroscedasticity-robust standard error for ρ̂1. But if we reject ρ1 = 0, then we do have to use the more complicated standard errors discussed above.

In fact, if we have only one potentially endogenous variable and we assume homoscedasticity, the "traditional" Hausman test can be based on:

(α̂1,2SLS - α̂1,OLS) / {[se(α̂1,2SLS)]² - [se(α̂1,OLS)]²}½

where each [se(·)]² is the corresponding diagonal element of the estimated covariance matrix. Unfortunately, there is no simple adjustment of this statistic for heteroscedasticity.

This procedure can easily be extended to the case of more than one endogenous variable. Let the model be:

y1 = z1′δ1 + y2′α1 + u1

where y2 is a G1×1 column vector. The reduced form equation for y2 is y2 = Π2z + v2, where Π2 is a G1×L matrix and v2 is a G1×1 column vector. Let v̂2 denote the residuals from OLS estimation of the reduced form (first-stage) equations. Then use these to estimate:

y1 = z1′δ1 + y2′α1 + v̂2′ρ + error

and do a standard F-test of H0: ρ = 0. If there is heteroscedasticity, you will need to do the corresponding Wald test (see Lecture 3). You could also do an LM test instead of the F-test; see Wooldridge, pp. 132-133. See p. 134 for the case where some variables are "known" to be exogenous, others are "known" to be endogenous, and you want to test endogeneity for a third set.
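A sketch of the regression-based endogeneity test on hypothetical simulated data, using an off-the-shelf OLS routine with heteroscedasticity-robust standard errors (statsmodels here; any regression package would do). Under H0: ρ1 = 0, the usual (robust) t-statistic on v̂2 is valid.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 20_000

# Hypothetical data as before: y2 is endogenous because rho_1 = 0.6.
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
v2 = rng.normal(size=n)
y2 = 0.5 + 0.8 * z1 + 0.7 * z2 + v2
y1 = 1.0 + 0.5 * z1 + 2.0 * y2 + 0.6 * v2 + rng.normal(size=n)

# First stage: regress y2 on all instruments, keep the residuals v2_hat.
Z = sm.add_constant(np.column_stack([z1, z2]))
v2_hat = y2 - Z @ sm.OLS(y2, Z).fit().params

# Augmented structural regression: y1 on (1, z1, y2, v2_hat), robust SEs.
W = sm.add_constant(np.column_stack([z1, y2, v2_hat]))
fit = sm.OLS(y1, W).fit(cov_type="HC0")
t_rho = fit.tvalues[-1]                    # t-statistic on v2_hat (i.e., on rho_1)
print(round(float(t_rho), 2), float(fit.pvalues[-1]))   # large |t| -> reject exogeneity of y2
```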
Testing Validity of IVs (Overidentification Test)

IV estimates are consistent only if Assumption 2SLS.1 holds: the instrument vector z must satisfy E[zu] = 0. It is possible to test this assumption (the test is sometimes called a Sargan test) if the number of "excluded" instruments exceeds the number of variables to be instrumented. To see how this works, start with the equation:

y1 = z1′δ1 + y2′α1 + u1

where z1 has L1 variables and y2 has G1 variables. Let the full set of instruments be z = (z1, z2), where z2 contains the L2 excluded instruments, and assume that the model is overidentified (L2 > G1).

The basic idea behind overidentification tests is that if all the instruments are valid, in the sense that they are uncorrelated with u1, then using different (valid) subsets of instruments should not lead to systematically different estimates of δ1 and α1. The simplest way to implement the test is as follows. Let û1 be the 2SLS residuals (estimates of u1) when all instruments are used. Regress û1 on all the variables in z, and call the R-squared from this regression Ru². Under the null hypothesis that E[zu1] = 0, and under Assumption 2SLS.3 (homoscedasticity), NRu² (N is the sample size) is distributed as χ²(Q1), where Q1 = L2 - G1.

Note 1: If z (or z1) does not contain a constant term, then use the uncentered R².

Note 2: Failing this test definitely means that some or all of your instruments are bad. But passing this test does not guarantee that they are good: the test may have low power to reject H0 (that E[zu1] = 0), and it implicitly assumes that at least G1 of your instruments are uncorrelated with u1 (if that is not true, the test is invalid).

If Assumption 2SLS.3 does not hold, use the following "heteroscedasticity-robust" method. Obtain ŷ2 from the first-stage regressions (of y2 on z). Let h2 be any Q1×1 subset of z2 (it does not matter which elements you choose). Regress each element of h2 on (z1, ŷ2) and keep the Q1×1 vector of residuals (call them r̂2). Then the test statistic N - SSR0, where SSR0 is the sum of squared residuals from regressing 1 on û1·r̂2, is distributed as χ²(Q1) under H0.

RESET test: not very useful (Wooldridge, pp. 137-138). Heteroscedasticity tests: see Wooldridge, pp. 138-141.
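Finally, a sketch of the homoscedasticity-based overidentification (Sargan) test on hypothetical simulated data with valid instruments (all names and parameter values are invented): compute the 2SLS residuals using all instruments, regress them on z, and compare NRu² with a χ²(Q1) critical value. With valid instruments the statistic should typically be small (p-value not small).

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
n = 20_000

# Hypothetical overidentified setup: one endogenous y2 (G1 = 1), two excluded instruments (L2 = 2).
z1 = rng.normal(size=n)
za, zb = rng.normal(size=n), rng.normal(size=n)
v2 = rng.normal(size=n)
y2 = 0.8 * z1 + 0.6 * za + 0.6 * zb + v2
y1 = 1.0 + 0.5 * z1 + 2.0 * y2 + 0.6 * v2 + rng.normal(size=n)

Z = np.column_stack([np.ones(n), z1, za, zb])   # all instruments
X = np.column_stack([np.ones(n), z1, y2])       # structural regressors

# 2SLS with all instruments, then Sargan statistic N * R_u^2 from regressing
# the 2SLS residuals on z (z contains a constant, so the centered R^2 is fine).
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta = np.linalg.solve(Xhat.T @ X, Xhat.T @ y1)
u_hat = y1 - X @ beta
fitted = Z @ np.linalg.lstsq(Z, u_hat, rcond=None)[0]
r2 = 1.0 - np.sum((u_hat - fitted) ** 2) / np.sum((u_hat - u_hat.mean()) ** 2)
stat = n * r2
df = 2 - 1                                      # Q1 = L2 - G1 = 2 - 1
# With valid instruments we expect a small statistic (p-value not small).
print(round(stat, 2), round(float(chi2.sf(stat, df)), 3))
```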