Applied Econometrics
Martin Huber
Chair of Applied Econometrics - Evaluation of Public Policies
University of Fribourg
Contents of this lecture
1. Digression: consistency (Wooldridge 5.1, Appendix C.3)
   Definition
2. Heteroskedasticity (Wooldridge 8.1-8.4)
   Definition
   Heteroskedasticity robust inference
   Testing heteroskedasticity
   Weighted least squares regression
3. Non-random samples (Wooldridge 9.4)
   Sample selection based on the independent variables
   Sample selection based on the dependent variable
   Missing values
   Outliers
   Conclusion
Consistency (Wooldridge 5.1, Appendix C.3)
β̂_n ... estimator of the parameter β based on sample size n

β̂_n is a consistent estimator if, for every ε > 0,

\Pr(|\hat{\beta}_n - \beta| > \varepsilon) \to 0 \ \text{as} \ n \to \infty \quad \Leftrightarrow \quad \text{plim}(\hat{\beta}_n) = \beta    (1)

With increasing sample size, n → ∞, the distribution of β̂_n is more and more concentrated around the true value β: large deviations from the true value become less and less likely.

Note: The property 'unbiasedness' (E(β̂_n) = β) refers to a given sample size. Consistency is a property that refers to the distribution of the estimator when the sample size becomes very (infinitely) large.

Property (continuous mapping): g(plim(β̂_n)) = plim(g(β̂_n)) for continuous g
Consistency (Wooldridge 5.1, Appendix C.3)
Example: estimator of the variance of the error term in

y_i = \beta_0 + \beta_1 x_i + u_i, \quad u_i \sim N(0, \sigma^2)

\hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2    (2)

E(\hat{\sigma}^2) = \sigma^2, \quad \text{plim}(\hat{\sigma}^2) = \sigma^2

\tilde{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2    (3)

E(\tilde{\sigma}^2) = \frac{n-2}{n} \sigma^2 \neq \sigma^2    (4)

\text{plim}(\tilde{\sigma}^2) = \underbrace{\text{plim}\left(\frac{n-2}{n}\right)}_{=1} \sigma^2 = \sigma^2    (5)

σ̂² is unbiased and consistent
σ̃² is biased (as the degrees-of-freedom adjustment is omitted), but still consistent
This follows because E\left(\sum_{i=1}^{n} \hat{u}_i^2\right) = (n-2)\sigma^2 (Wooldridge 2.5, equation 2.61)
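The contrast between small-sample bias and consistency can be checked numerically. Below is a minimal simulation sketch (not part of the lecture; it assumes NumPy, and the parameter values are illustrative) that averages both estimators over many samples for growing n:

```python
# Minimal simulation: sigma_tilde^2 is biased for small n but consistent.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma2 = 1.0, 0.5, 4.0   # illustrative true values

def average_estimates(n, reps=2000):
    """Average sigma_hat^2 and sigma_tilde^2 over many simulated samples."""
    hat, tilde = [], []
    for _ in range(reps):
        x = rng.normal(size=n)
        u = rng.normal(scale=np.sqrt(sigma2), size=n)
        y = beta0 + beta1 * x + u
        b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # OLS slope
        b0 = y.mean() - b1 * x.mean()                    # OLS intercept
        ssr = np.sum((y - b0 - b1 * x) ** 2)             # sum of squared residuals
        hat.append(ssr / (n - 2))    # with degrees-of-freedom adjustment
        tilde.append(ssr / n)        # without adjustment: biased by (n-2)/n
    return np.mean(hat), np.mean(tilde)

for n in (10, 50, 1000):
    h, t = average_estimates(n)
    print(f"n={n:5d}  mean sigma_hat^2={h:.3f}  mean sigma_tilde^2={t:.3f}")
# For n=10, sigma_tilde^2 averages about 0.8 * 4 = 3.2; both approach 4 as n grows.
```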
Heteroskedasticity (Wooldridge 8.1)
\text{Var}(u_i \mid x_1, \ldots, x_k) = \sigma_i^2 \neq \sigma^2    (6)
Error variance not the same for all values of the regressors:
MLR.5 is violated
Assumptions MLR.1-MLR.4 are maintained: OLS is unbiased and
consistent
Due to a violation of MLR.5, standard errors have a different form
than under homoskedasticity, therefore the ‘standard’ variance
estimator is (generally) biased and inconsistent
Without correcting for heteroskedasticity, test statistics (t-tests,
F-tests etc.) have a different distribution so that inference
(p-values, confidence intervals) is incorrect
Under heteroskedasticity, OLS is no longer efficient (and correcting the standard errors does not restore efficiency)
Heteroskedasticity robust inference (Wooldridge 8.2)
Simple regression model:

\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{SST_x}    (7)

\text{Var}(\hat{\beta}_1) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sigma_i^2}{SST_x^2}    (8)

\widehat{\text{Var}}(\hat{\beta}_1) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 \hat{u}_i^2}{SST_x^2}    (9)

Multivariate regression model:

\widehat{\text{Var}}(\hat{\beta}_j) = \frac{\sum_{i=1}^{n} \hat{r}_{ij}^2 \hat{u}_i^2}{SSR_j^2}    (10)

r̂_ij is the ith residual in a regression of x_j on all other independent variables. SSR_j is the sum of squared residuals in this regression.
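As a sketch of how the robust estimator is used in practice (assuming the Python package statsmodels; the simulated data are illustrative), the following compares equation (9), computed by hand, with the 'HC0' robust covariance option of OLS:

```python
# Compare the manual robust variance formula (9) with statsmodels' HC0 option.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 2, size=n)
u = rng.normal(size=n) * x           # error variance increases with x
y = 1.0 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit(cov_type="HC0")  # robust inference

# Equation (9) for the slope in the simple regression
resid = res.resid
sst_x = np.sum((x - x.mean()) ** 2)
var_b1 = np.sum((x - x.mean()) ** 2 * resid ** 2) / sst_x ** 2

print("HC0 robust SE (statsmodels):", res.bse[1])
print("robust SE via equation (9): ", np.sqrt(var_b1))  # identical up to rounding
```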
Heteroskedasticity robust inference (Wooldridge 8.2)
V̂ar(β̂_j) is not unbiased, but consistent
Alternative variance estimators include a degrees-of-freedom correction n/(n − k − 1) (but asymptotically all variance estimators are equivalent)
The square root √(V̂ar(β̂_j)) is the heteroskedasticity robust estimator of the standard error
Tests have the same distribution as before, after replacing the
homoskedastic standard error by the heteroskedasticity robust
(estimator of the) standard error
See Eicker (1967), Huber (1967), White (1980)
Testing heteroskedasticity (Wooldridge 8.3)
Model and null hypothesis:

y = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k + u    (11)

H_0: E(u^2 \mid x_1, \ldots, x_k) = \sigma^2    (12)
Breusch-Pagan test for heteroskedasticity:
1. Estimate (11) and compute the residuals û_i
2. Estimate û_i² = δ_0 + δ_1 x_1 + … + δ_k x_k + v
3. Test H_0: δ_1 = … = δ_k = 0 by means of an F-test for joint significance of all coefficients (a sketch follows below)
The Breusch-Pagan test assumes a linear association between the regressors and the variance of the error term.
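A minimal sketch of these three steps (assuming statsmodels; the heteroskedastic data are simulated for illustration), once by hand and once via the built-in helper het_breuschpagan:

```python
# Breusch-Pagan test: auxiliary regression of squared residuals on the regressors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
n = 400
x = rng.uniform(1, 3, size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n) * x   # error variance depends on x

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()                      # step 1: residuals u_hat

aux = sm.OLS(res.resid ** 2, X).fit()         # step 2: u_hat^2 on regressors
print("F test:", aux.fvalue, "p-value:", aux.f_pvalue)   # step 3

# Same test via the built-in helper (reports LM and F variants)
lm, lm_p, fstat, f_p = het_breuschpagan(res.resid, X)
print("LM:", lm, "p:", lm_p, "F:", fstat, "p:", f_p)
```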
Testing heteroskedasticity (Wooldridge 8.3)
White test for heteroskedasticity:
The model for û_i² additionally includes the squares and cross products of all regressors (see the sketch below)
The method tests for those forms of heteroskedasticity which invalidate the conventional OLS standard errors
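A corresponding sketch for the White test (again assuming statsmodels, on the same kind of simulated data); the helper het_white adds the squares and cross products of the regressors internally:

```python
# White test: auxiliary regression including squares and cross products.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(2)
n = 400
x = rng.uniform(1, 3, size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n) * x   # heteroskedastic errors

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
lm, lm_p, fstat, f_p = het_white(res.resid, X)
print("White LM:", lm, "p:", lm_p, "F:", fstat, "p:", f_p)
```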
Problem:
The tests may reject the null hypothesis if one of the assumptions MLR.1-MLR.4 is violated, even if MLR.5 is satisfied
Therefore, the tests may also be regarded as general specification tests
Recommendation: When in doubt, use heteroskedasticity robust
standard errors!
Weighted least squares regression (Wooldridge 8.4)
In contrast to OLS, weighted least squares (WLS) minimizes the
weighted sum of squared residuals
Less weight is given to observations with a higher error variance
to correct for heteroskedasticity
In contrast, OLS gives each observation the same weight (which
is best when the error variance is always the same)
→ WLS is more precise (smaller standard errors) than OLS in the
case of heteroskedasticity (but both estimators are consistent)
Weighted least squares regression (Wooldridge 8.4)
More precisely, WLS proceeds as follows:
1. Divide y_i and 1, x_{i1}, …, x_{ik} (for each observation i in the sample) by the heteroskedastic standard error √σ_i² to obtain normalized (and thus homoskedastic) values of the initial observations:

y_i^* = y_i / \sqrt{\sigma_i^2}, \quad x_{i0}^* = 1 / \sqrt{\sigma_i^2}, \quad x_{i1}^* = x_{i1} / \sqrt{\sigma_i^2}, \ \ldots, \ x_{ik}^* = x_{ik} / \sqrt{\sigma_i^2}

2. Run an OLS regression of y_i^* on x_{i0}^*, x_{i1}^*, …, x_{ik}^* (using all observations i in the sample)

In practice, the heteroskedastic standard errors need to be estimated, so that √û_i² rather than the true √σ_i² is used in the weighted regression (= feasible GLS), as sketched below.
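A sketch of WLS and feasible GLS (assuming statsmodels; the variance function σ_i² = x_i² is an illustrative assumption). statsmodels' WLS expects weights 1/σ_i², which is equivalent to the division by √σ_i² described above; the feasible-GLS step shown models log(û²) on the regressors, one common variant of estimating the unknown variances:

```python
# WLS with known variances, and a feasible-GLS variant with estimated variances.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(1, 3, size=n)
sigma2_i = x ** 2                               # assumed known variance function
y = 2.0 + 1.5 * x + rng.normal(size=n) * np.sqrt(sigma2_i)

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit(cov_type="HC1")          # robust OLS for comparison
wls = sm.WLS(y, X, weights=1.0 / sigma2_i).fit()

print("OLS coef:", ols.params, "SE:", ols.bse)
print("WLS coef:", wls.params, "SE:", wls.bse)  # typically smaller SEs

# Feasible GLS when sigma_i^2 is unknown: fit log(u_hat^2) on the regressors
aux = sm.OLS(np.log(ols.resid ** 2), X).fit()
h_hat = np.exp(aux.fittedvalues)                # fitted variances, always positive
fgls = sm.WLS(y, X, weights=1.0 / h_hat).fit()
print("FGLS coef:", fgls.params, "SE:", fgls.bse)
```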
Non-random samples (Wooldridge 9.4)
Violation of MLR.2:
Sample is only drawn from a subgroup of the population of
interest
Parts of the random sample are not/cannot be used
Is the sample still random and representative?
Can the coefficients still be estimated consistently?
Non-random samples (Wooldridge 9.4)
Sample selection based on the independent variables:
Subsample is representative of a part of the population
The true model also applies to this subsample
If MLR.1-MLR.4 hold, then the coefficients are estimated
consistently in the subsample
Problem: The true model is unknown
We cannot check whether the coefficients differ between the observed subsample and the unobserved part of the population (e.g. through dummies for the various parts of the population and interaction terms)
Due to such potential effect heterogeneities (different coefficients
for different parts of the population), the estimates generally apply
only to the observed subgroup (internal validity), but not
necessarily to the entire population (external validity)
Non-random samples (Wooldridge 9.4)
True model (in which MLR.1-MLR.4 hold):

y = \beta_0 + \delta_0 \, 1(x_1 > \bar{x}_1) + \beta_1 x_1 + \delta_1 \, 1(x_1 > \bar{x}_1) \, x_1 + u    (13)

In the unobserved group with x_1 ≤ x̄_1:

y = \beta_0 + \beta_1 x_1 + u    (14)

In the observed group with x_1 > x̄_1:

y = \underbrace{(\beta_0 + \delta_0)}_{\alpha_0} + \underbrace{(\beta_1 + \delta_1)}_{\alpha_1} x_1 + u    (15)

It is neither known nor estimable whether δ_0 ≠ 0 or δ_1 ≠ 0; therefore, (α̂_0, α̂_1) may only be correct for the observed group with x_1 > x̄_1 (see the sketch below).
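A simulation sketch (the parameter values are illustrative assumptions) confirms that OLS on the observed subsample recovers (α_0, α_1) from (15) rather than (β_0, β_1):

```python
# OLS on the selected subsample estimates (beta0+delta0, beta1+delta1).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 100_000
beta0, beta1, delta0, delta1 = 1.0, 2.0, 0.5, -1.0   # illustrative values

x1 = rng.normal(size=n)
x1_bar = 0.0
d = (x1 > x1_bar).astype(float)
y = beta0 + delta0 * d + beta1 * x1 + delta1 * d * x1 + rng.normal(size=n)  # (13)

sel = x1 > x1_bar                                     # observed subsample only
res = sm.OLS(y[sel], sm.add_constant(x1[sel])).fit()
print(res.params)  # close to (alpha0, alpha1) = (1.5, 1.0), not (1.0, 2.0)
```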
Non-random samples (Wooldridge 9.4)
Sample selection based on the dependent variable:
Systematic censoring of the values of the dependent variable
generally restricts the location of the regression line and
introduces bias
The reason is that censoring systematically excludes residuals
with particular(ly high or low) values
If the regressors affect the dependent variable, this entails
endogeneity: if xj and y are positively correlated and the sample
is restricted to small y , then large xj must go together with small
residuals
Therefore, Corr(x_j, u | y < y_max) ≠ 0 even if Corr(x_j, u) = 0
Never select your sample based on y or some function of y !
Non-random samples (Wooldridge 9.4)
Sample selection based on the dependent variable:
True model: hwage = 10 + 10 · education + u
Person 1: hourly wage 200, years of education 16: u1 = 30
Person 2: hourly wage 170, years of education 16: u2 = 0
Person 3: hourly wage 140, years of education 16: u3 = −30
If the sample is restricted to observations with hwage < 170, then among those with education = 16 only individuals with u < 0 are selected: person 3.
Non-random samples (Wooldridge 9.4)
hwage = 10 + 10 · education + u,   u ∼ U(−70, 70)

u_min and u_max denote the bounds of u among the selected observations (hwage ≤ 170); for the uniform error, E(u | education, hwage ≤ 170) = (u_min + u_max)/2.

education | u_min | u_max | E(u | education, hwage ≤ 170)
    9     |  -70  |   70  |    0
   10     |  -70  |   60  |   -5
   11     |  -70  |   50  |  -10
   12     |  -70  |   40  |  -15
   13     |  -70  |   30  |  -20
   14     |  -70  |   20  |  -25
   15     |  -70  |   10  |  -30
   16     |  -70  |    0  |  -35
   17     |  -70  |  -10  |  -40
   18     |  -70  |  -20  |  -45
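The table's setup can be replicated in a short simulation sketch (assuming statsmodels; the sample size is arbitrary): the slope in the truncated sample drops from about 10 to about 5, mirroring the last column, which falls by 5 per year of education:

```python
# Selecting on the dependent variable attenuates the education slope.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 100_000
education = rng.integers(9, 19, size=n).astype(float)   # 9, ..., 18
u = rng.uniform(-70, 70, size=n)
hwage = 10 + 10 * education + u

full = sm.OLS(hwage, sm.add_constant(education)).fit()
sel = hwage <= 170                                       # selection on y
trunc = sm.OLS(hwage[sel], sm.add_constant(education[sel])).fit()

print("full sample slope:     ", full.params[1])   # close to 10
print("truncated sample slope:", trunc.params[1])  # close to 5
```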
Non-random samples (Wooldridge 9.4)
Missing values in variables:
Observations are automatically excluded by many statistical
software packages
Why do missing values occur?
Are missing values likely correlated with the dependent variable?
Missing values in one of the regressors:
Are values of dependent variable and other regressors
systematically different across observations with and without
missing values?
Instead of dropping observations with missing values, it seems preferable to set the missing variable value to zero and include a dummy for missingness as an additional regressor
Missing values in the dependent variable:
Are values of the regressors systematically different across
observations with and without missing values in the dependent
variable?
Observations cannot be used
Non-random samples (Wooldridge 9.4)
Missing values in a regressor:
Set the missing value to zero and estimate

y = \beta_0 + \delta_0 \, 1(x_1 \ \text{missing}) + \beta_1 x_1 + u

If the missingness dummy has δ_0 ≠ 0, then it cannot be ruled out that the sample is selective w.r.t. the dependent variable y when observations with missing values in the regressor are dropped
The missingness dummy in the regression thus serves as a test of whether the missing values are problematic (see the sketch below)
If it is clear that the missing values are exclusively related to the independent variables, then estimate the model also with the observations without missing values for comparison
Results for this group are in any case correct (internal validity), but for the total sample the interaction term 1(x_1 missing) x_1 could be missing
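A sketch of this missingness-dummy regression (assuming statsmodels; the 20% missing-at-random mechanism is an illustrative assumption, under which δ_0 should be close to zero):

```python
# Recode missing x1 to zero and add an indicator 1(x1 missing) as regressor.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 5_000
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

miss = rng.uniform(size=n) < 0.2        # assumed: 20% missing at random
x1_obs = np.where(miss, 0.0, x1)        # missing values set to zero

X = np.column_stack([np.ones(n), miss.astype(float), x1_obs])
res = sm.OLS(y, X).fit()
print("coefficients:", res.params)      # [intercept, delta0, beta1]
print("p-value for delta0:", res.pvalues[1])  # large here: missingness harmless
```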
Non-random samples (Wooldridge 9.4)
Outliers:
...have a strong impact on the location of the regression line
Coding errors vs. 'real' outliers (why do they occur?)
If outliers are dropped, is the sample still random and representative?
Are outliers correlated with the dependent variable?
Present results with and without outliers
It might be worth considering median regression rather than mean regression (both estimate the same parameters under a symmetric distribution of the error term), as sketched below:

E(y \mid x) = \beta_0 + \beta_1 x \quad \text{vs.} \quad \text{Median}(y \mid x) = \beta_0 + \beta_1 x
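A sketch contrasting mean and median regression (assuming statsmodels' QuantReg; the data and the single planted outlier are illustrative):

```python
# One gross outlier distorts OLS but barely affects median regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 200
x = np.sort(rng.uniform(0, 10, size=n))
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[-1] = 1_000.0                         # planted outlier at the largest x

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
med = sm.QuantReg(y, X).fit(q=0.5)      # median (LAD) regression

print("OLS    (b0, b1):", ols.params)   # slope pulled far above 2
print("median (b0, b1):", med.params)   # close to (1, 2)
```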
Non-random samples (Wooldridge 9.4)
Example: [figure]
Scatter plot of the data: [figure]
Non-random samples (Wooldridge 9.4)
Conclusion:
Sample selection based on the dependent variable entails the
inconsistency of all regression coefficients
Under sample selection based on the regressor(s), the results
might only be (internally) valid for this part of the population (due
to effect heterogeneity)
Missing values and outliers are not problematic if they are random and can then be discarded (this corresponds to randomly picking a somewhat smaller sample), but they are very often not random!
→ Detailed information about sample selection and data
generation required!