Review for Econ 611 Final
Topic 1: General regression analysis
1. Conditional moments: conditional mean/variance/skewness/kurtosis
2. Properties of conditional expectation:
(a) Law of iterated expectations: $E[g(Y)] = E\{E[g(Y)\mid X]\}$; in particular, $E(Y) = E[E(Y\mid X)]$.
(b) Optimality of $E(Y\mid X)$: $E(Y\mid X) = \arg\min_{g\in\mathcal{G}} \mathrm{MSE}(g)$, where $\mathrm{MSE}(g) = E\{[Y - g(X)]^2\}$ and $\mathcal{G} \equiv \{g : E[g(X)^2] < \infty\}$.
(c) Variance decomposition: $\mathrm{Var}(Y) = \mathrm{Var}(E(Y\mid X)) + E[\mathrm{Var}(Y\mid X)]$.
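As a quick numerical illustration of (a) and (c), the Monte Carlo sketch below (my own hypothetical example, not from the notes) checks the law of iterated expectations and the variance decomposition for a model where $E(Y\mid X)$ is known in closed form.

    # Minimal Monte Carlo check of E[Y] = E[E(Y|X)] and
    # Var(Y) = Var(E(Y|X)) + E[Var(Y|X)] for Y = X^2 + e, e|X ~ N(0, 1).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000
    X = rng.normal(size=n)
    Y = X**2 + rng.normal(size=n)          # E(Y|X) = X^2, Var(Y|X) = 1

    cond_mean = X**2                        # known conditional mean
    print(Y.mean(), cond_mean.mean())       # both approximate E(Y) = 1
    print(Y.var(), cond_mean.var() + 1.0)   # both approximate Var(Y) = 3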
3. Best linear LS predictor
(a) $\beta^* = \arg\min_{\beta\in\mathbb{R}^K} E\left[(Y - X'\beta)^2\right] = [E(XX')]^{-1}E(XY)$ is the best linear LS approximation coefficient. By LS projection, $E(Xu) = 0$, where $u = Y - X'\beta^*$.
(b) Correct specification in conditional mean: $Y = X'\beta + u$ is correctly specified for $E(Y\mid X)$ if $E(Y\mid X) = X'\beta^0$ for some $\beta^0 \in \mathbb{R}^K$.
(c) If $Y = X'\beta + u$ is correctly specified for $E(Y\mid X)$, then (i) $Y = X'\beta^0 + \varepsilon$ for some $\beta^0 \in \mathbb{R}^K$, where $E(\varepsilon\mid X) = 0$ a.s., and (ii) $\beta^0 = \beta^* = [E(XX')]^{-1}E(XY)$.
4. Multivariate Normality: If $Y = (Y_1', Y_2')' \sim N(\mu, \Sigma)$ with $\mu = (\mu_1', \mu_2')'$ and
$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{pmatrix}$, then
(a) $AY + b \sim N(A\mu + b, A\Sigma A')$;
(b) $(Y - \mu)'\Sigma^{-1}(Y - \mu) \sim \chi^2(n)$;
(c) $Y_1 \perp Y_2$ if and only if (iff) $\Sigma_{12} = 0$;
(d) $Y_1 \mid Y_2 \sim N\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(Y_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$.
Topic 2: Classical linear regression models
1. Assumptions.
A.1 (Linearity) $Y_t = \beta'X_t + \varepsilon_t$.
A.2 (Strict exogeneity) $E(\varepsilon_t\mid \mathbf{X}) = E(\varepsilon_t\mid X_1, \cdots, X_n) = 0$ for $t = 1, 2, \cdots, n$.
A.3 (Nonsingularity) The rank of $\mathbf{X}'\mathbf{X}$ is $K$ with probability 1.
A.4 (Spherical error variance) $E(\varepsilon\mid\mathbf{X}) = 0$ and $E(\varepsilon\varepsilon'\mid\mathbf{X}) = \sigma^2 I_n$.
2. OLS estimation:
(a) $\hat{\beta} \equiv \arg\min_{\beta\in\mathbb{R}^K} SSR(\beta) = (\mathbf{Y} - \mathbf{X}\beta)'(\mathbf{Y} - \mathbf{X}\beta)$.
(b) Normal equations: $\sum_{t=1}^{n}\hat{\varepsilon}_t X_t = 0$; in particular, $\bar{\hat{Y}} = \bar{Y}$ (equivalently, the residuals sum to zero) if an intercept is included in the LRM.
(c) Estimation of $\sigma^2$: $s^2 = \frac{1}{n-K}\sum_{t=1}^{n}\hat{\varepsilon}_t^2$.
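A minimal OLS sketch (hypothetical simulated data) that solves the normal equations and computes $s^2$ as defined in (c):

    # OLS via the normal equations X'X b = X'Y, plus s^2 = e'e / (n - K).
    import numpy as np

    rng = np.random.default_rng(3)
    n, K = 500, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
    Y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ beta_hat
    s2 = resid @ resid / (n - K)
    print(beta_hat, s2)
    print(X.T @ resid)                        # normal equations: numerically zero
    print(Y.mean(), (X @ beta_hat).mean())    # equal means since an intercept is included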
3. Basic properties of projection matrix and orthogonal decomposition: $\mathbf{Y} = (P + M)\mathbf{Y} = P\mathbf{Y} + M\mathbf{Y} = \hat{\mathbf{Y}} + \hat{\varepsilon}$, where
$P = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ and $M = I_n - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = I_n - P$.
In particular, $\hat{\varepsilon} = M\mathbf{Y} = M(\mathbf{X}\beta + \varepsilon) = M\varepsilon$ and $\hat{\varepsilon}'\hat{\varepsilon} = (M\varepsilon)'(M\varepsilon) = \varepsilon'M\varepsilon = \mathbf{Y}'M\mathbf{Y}$.
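The sketch below (my own illustration on small simulated data) constructs $P$ and $M$ explicitly and checks idempotency, orthogonality, and the decomposition $\mathbf{Y} = P\mathbf{Y} + M\mathbf{Y}$:

    # Projection matrix P = X(X'X)^{-1}X', annihilator M = I - P, and Y = PY + MY.
    import numpy as np

    rng = np.random.default_rng(4)
    n, K = 50, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
    Y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

    P = X @ np.linalg.solve(X.T @ X, X.T)
    M = np.eye(n) - P
    print(np.allclose(P @ P, P), np.allclose(M @ M, M))   # both idempotent
    print(np.allclose(P @ M, 0))                          # PM = 0 (orthogonality)
    print(np.allclose(Y, P @ Y + M @ Y))                  # orthogonal decomposition
    print(np.allclose((M @ Y) @ (M @ Y), Y @ M @ Y))      # e'e = Y'MY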
4. Goodness of Fit:
(a) If an intercept is included in the regression, then $TSS = ESS + RSS$ and
$R^2 = \frac{\hat{\mathbf{Y}}'M^0\hat{\mathbf{Y}}}{\mathbf{Y}'M^0\mathbf{Y}} = \frac{\sum_{t=1}^{n}(\hat{Y}_t - \bar{Y})^2}{\sum_{t=1}^{n}(Y_t - \bar{Y})^2}$,
where $M^0 = I_n - n^{-1}\ell\ell' = I_n - \ell(\ell'\ell)^{-1}\ell'$ is the demeaning matrix and $\ell$ is an $n \times 1$ vector of ones.
(b) When an intercept is included, $R^2$ equals the squared sample correlation between $Y_t$ and $\hat{Y}_t$:
$R^2 = \left[\widehat{\mathrm{Corr}}(Y_t, \hat{Y}_t)\right]^2 = \frac{\left[\sum_{t=1}^{n}(\hat{Y}_t - \bar{\hat{Y}})(Y_t - \bar{Y})\right]^2}{\sum_{t=1}^{n}(\hat{Y}_t - \bar{\hat{Y}})^2 \sum_{t=1}^{n}(Y_t - \bar{Y})^2} = \frac{(\hat{\mathbf{Y}}'M^0\mathbf{Y})^2}{(\hat{\mathbf{Y}}'M^0\hat{\mathbf{Y}})(\mathbf{Y}'M^0\mathbf{Y})}$.
5. Finite Sample Properties of the OLS Estimators. Assume that the classical Assumptions A.1-A.4 hold. Then: (a) $E(\hat{\beta}\mid\mathbf{X}) = \beta$; (b) $\mathrm{Var}(\hat{\beta}\mid\mathbf{X}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$; (c) (Gauss-Markov Theorem) $\hat{\beta}$ is BLUE; (d) $E(s^2\mid\mathbf{X}) = \sigma^2$.
6. Sampling Distribution: A.5 (Normality) $\varepsilon\mid\mathbf{X} \sim N(0, \sigma^2 I_n)$.
(a) A useful lemma.
(i) If $\varepsilon \sim N(0, \Sigma)$ where $\Sigma$ is nonsingular, then $\varepsilon'\Sigma^{-1}\varepsilon \sim \chi^2(n)$;
(ii) if $\varepsilon \sim N(0, \sigma^2 I_n)$ and $P$ is an $n \times n$ projection matrix, then $\varepsilon'P\varepsilon/\sigma^2 \sim \chi^2(\mathrm{rank}(P))$;
(iii) if $\varepsilon \sim N(0, \sigma^2 I_n)$, $P$ is an $n \times n$ projection matrix, and $L'P = 0$, then $\varepsilon'P\varepsilon$ and $L'\varepsilon$ are independent;
(iv) if $\varepsilon \sim N(0, \sigma^2 I_n)$ and $A$ and $B$ are both symmetric, then $\varepsilon'A\varepsilon$ and $\varepsilon'B\varepsilon$ are independent iff $AB = 0$.
In fact, if $\varepsilon \sim N(0, \sigma^2 I_n)$ and $A$ is symmetric, then $\varepsilon'A\varepsilon/\sigma^2 \sim \chi^2(\mathrm{rank}(A))$ iff $A$ is idempotent.
(b) Under Assumptions A.1, A.3 and A.5: (i) $\hat{\beta} - \beta \mid \mathbf{X} \sim N\left(0, \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\right)$; (ii) $(n-K)s^2/\sigma^2 \mid \mathbf{X} \sim \chi^2(n-K)$; (iii) conditional on $\mathbf{X}$, $s^2 \perp \hat{\beta}$.
7. Hypothesis Testing:
(a) $t$-test: $H_0: R'\beta = r$ versus $H_1: R'\beta \neq r$; $T \equiv \frac{R'\hat{\beta} - r}{\sqrt{s^2\, R'(\mathbf{X}'\mathbf{X})^{-1}R}} \sim t(n-K)$ conditional on $\mathbf{X}$.
(b) $F$-test: $H_0: R\beta = r$ versus $H_1: R\beta \neq r$;
$F \equiv \frac{1}{J}\left(R\hat{\beta} - r\right)'\left[s^2 R(\mathbf{X}'\mathbf{X})^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right) \sim F(J, n-K)$ conditional on $\mathbf{X}$.
8. Restricted Least Squares: $R\beta = r$.
(a) $F = \frac{(\tilde{\varepsilon}'\tilde{\varepsilon} - \hat{\varepsilon}'\hat{\varepsilon})/J}{\hat{\varepsilon}'\hat{\varepsilon}/(n-K)} \sim F(J, n-K)$, where $\tilde{\varepsilon}$ and $\hat{\varepsilon}$ denote the restricted and unrestricted residual vectors, respectively.
(b) For $H_0: \beta_2 = \cdots = \beta_K = 0$: $F = \frac{R^2/(K-1)}{(1-R^2)/(n-K)} \sim F(K-1, n-K)$.
9. Generalized Least Squares: A.5* (Normality) $\varepsilon\mid\mathbf{X} \sim N(0, \sigma^2 V(\mathbf{X}))$.
(a) $\hat{\beta}_{GLS} = \left(\mathbf{X}'V^{-1}\mathbf{X}\right)^{-1}\mathbf{X}'V^{-1}\mathbf{Y}$.
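A minimal GLS sketch (my own example, assuming a known diagonal $V(\mathbf{X})$ with heteroskedasticity proportional to $X_{t2}^2$):

    # GLS with known variance structure: beta_GLS = (X'V^{-1}X)^{-1} X'V^{-1}Y.
    import numpy as np

    rng = np.random.default_rng(7)
    n = 1000
    x = rng.uniform(1.0, 3.0, size=n)
    X = np.column_stack([np.ones(n), x])
    sig2 = x**2                                     # Var(eps_t | X_t) = x_t^2 (known form)
    Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(sig2))

    V_inv = np.diag(1.0 / sig2)                     # V(X)^{-1} as a diagonal matrix
    beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ Y)
    beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
    print(beta_gls, beta_ols)                       # both consistent; GLS is more efficient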
Topic 3: Large Sample Theory for Linear Regression Models
1. Inequalities:
(a) Chebyshev/Markov inequality: $P(\|X\| > \epsilon) \leq E\|X\|^p / \epsilon^p$.
(b) Cauchy-Schwarz inequality: $\|E(XY')\| \leq E\|XY'\| \leq \left[E\|X\|^2\right]^{1/2}\left[E\|Y\|^2\right]^{1/2}$.
(c) Hölder inequality: If $p > 1$, $q > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$, then $E\|XY'\| \leq \left[E\|X\|^p\right]^{1/p}\left[E\|Y\|^q\right]^{1/q}$.
(d) Minkowski's inequality: for any $p \geq 1$, $\left[E\|X + Y\|^p\right]^{1/p} \leq \left[E\|X\|^p\right]^{1/p} + \left[E\|Y\|^p\right]^{1/p}$.
(e) Jensen's inequality: If $g(\cdot)$ is a convex function, then $g(E(X)) \leq E[g(X)]$.
2. Modes of Convergence:
(a) Convergence in probability: $\lim_{n\to\infty} P(\|Z_n - Z\| \geq \epsilon) = 0$; WLLN; Slutsky theorem: if $Z_n \xrightarrow{p} Z$ and $g(\cdot)$ is a continuous function, then $g(Z_n) \xrightarrow{p} g(Z)$.
(b) Almost sure convergence: $P(\lim_{n\to\infty} Z_n = Z) = 1$; SLLN. [It is OK to skip this.]
(c) Convergence in $p$th mean: $\lim_{n\to\infty} E(\|Z_n - Z\|^p) = 0$. [It suffices to know $p = 2$.]
(d) Convergence in distribution: $Z_n \xrightarrow{d} Z$; Lindeberg-Levy CLT for IID observations.
(e) Basic concepts: Cramér-Wold device, Slutsky theorem, CMT: Suppose $\{Z_n\}$ is a sequence of random $\mathbb{R}^k$-vectors such that $Z_n \xrightarrow{d} Z$ as $n \to \infty$. If $g: \mathbb{R}^k \to \mathbb{R}^m$ is continuous, then $g(Z_n) \xrightarrow{d} g(Z)$ as $n \to \infty$.
(f) Delta method: Suppose $\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{d} N(0, \Sigma)$, and $g: \mathbb{R}^k \to \mathbb{R}^m$ is continuously differentiable in a neighborhood of $\theta$ with nonzero derivative at $\theta$ and $m \leq k$. Then $\sqrt{n}\left[g(\hat{\theta}) - g(\theta)\right] \xrightarrow{d} N(0, \Phi\Sigma\Phi')$, where $\Phi = \partial g(\theta)/\partial\theta'$ is an $m \times k$ matrix.
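A quick delta-method illustration (my own simulation, not from the notes): for an IID sample mean $\hat{\theta} = \bar{X}$ with $\sqrt{n}(\hat{\theta}-\theta) \xrightarrow{d} N(0,\sigma^2)$ and $g(\theta) = \theta^2$, the asymptotic variance of $\sqrt{n}[g(\hat{\theta}) - g(\theta)]$ is $(2\theta)^2\sigma^2$.

    # Delta method check: g(theta) = theta^2, theta = E(X), Phi = g'(theta) = 2*theta.
    import numpy as np

    rng = np.random.default_rng(8)
    theta, sigma2, n, reps = 2.0, 1.0, 500, 20_000

    samples = rng.normal(theta, np.sqrt(sigma2), size=(reps, n))
    theta_hat = samples.mean(axis=1)               # sqrt(n)(theta_hat - theta) approx N(0, sigma2)
    stat = np.sqrt(n) * (theta_hat**2 - theta**2)  # sqrt(n)[g(theta_hat) - g(theta)]
    print(stat.var(), (2 * theta)**2 * sigma2)     # both close to 4 * theta^2 * sigma2 = 16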
3. Asymptotic Properties of the OLS Estimator
(a) Basic Assumptions:
A3.1 (Linearity) $Y_t = \beta'X_t + \varepsilon_t$.
A3.2 (Correct specification) $E(\varepsilon_t\mid X_t) = 0$ a.s. with $E(\varepsilon_t^2) = \sigma^2 < \infty$.
A3.3 (Nonsingularity) $Q \equiv E(X_tX_t')$ is nonsingular and finite.
A3.4 $V \equiv E(\varepsilon_t^2 X_tX_t')$ is finite and p.d.
A3.5 (Conditional homoskedasticity) $E(\varepsilon_t^2\mid X_t) = \sigma^2$ a.s.
(b) Consistency of the OLS Estimator: Under Assumptions A3.1-A3.3, $\hat{\beta} \xrightarrow{p} \beta$ and $s^2 \xrightarrow{p} \sigma^2$.
(c) Asymptotic Normality of the OLS Estimator: Under Assumptions A3.1-A3.4, $\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{d} N(0, Q^{-1}VQ^{-1})$.
(d) Asymptotic Variance Estimator:
Case 1 (Conditional homoskedasticity): $\widehat{\mathrm{Avar}}(\sqrt{n}\hat{\beta}) \equiv s^2\hat{Q}^{-1}$, where $\hat{Q} = n^{-1}\sum_{t=1}^{n} X_tX_t'$.
Case 2 (Conditional heteroskedasticity): $\widehat{\mathrm{Avar}}(\sqrt{n}\hat{\beta}) = \hat{Q}^{-1}\hat{V}\hat{Q}^{-1}$, where $\hat{V} = n^{-1}\sum_{t=1}^{n}\hat{\varepsilon}_t^2 X_tX_t'$ (White's formula).
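A minimal sketch (my own simulated example) of the two asymptotic variance estimators in (d), including White's heteroskedasticity-robust formula:

    # Asymptotic variance of sqrt(n)(beta_hat - beta): s^2 * Qhat^{-1} vs Qhat^{-1} Vhat Qhat^{-1}.
    import numpy as np

    rng = np.random.default_rng(9)
    n, K = 2000, 2
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    eps = rng.normal(size=n) * np.abs(X[:, 1])            # conditional heteroskedasticity
    Y = X @ np.array([1.0, 2.0]) + eps

    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ beta_hat
    Q_hat = X.T @ X / n
    V_hat = (X * e[:, None]**2).T @ X / n                 # n^{-1} sum e_t^2 X_t X_t'
    s2 = e @ e / (n - K)

    avar_homo = s2 * np.linalg.inv(Q_hat)                 # valid only under A3.5
    avar_white = np.linalg.inv(Q_hat) @ V_hat @ np.linalg.inv(Q_hat)   # White's formula
    print(np.diag(avar_homo), np.diag(avar_white))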
4. Large Sample Hypothesis Testing: Wald, LR and LM Tests, $H_0: R\beta = r$. Key:
$\sqrt{n}(R\hat{\beta} - r) \xrightarrow{d} N(0, RQ^{-1}VQ^{-1}R')$ under $H_0$.
(a) Tests under Conditional Homoskedasticity
1. Wald test ($\chi^2$ test): $W \equiv JF = \left(R\hat{\beta} - r\right)'\left[s^2 R(\mathbf{X}'\mathbf{X})^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right) \xrightarrow{d} \chi^2(J)$ under $H_0$. Under $H_0: \beta_2 = \beta_3 = \cdots = \beta_K = 0$, $nR^2 \xrightarrow{d} \chi^2(K-1)$, where $R^2$ is the $R^2$ from the unrestricted model.
2. Likelihood ratio (LR) test: $LR = 2\ln\lambda = 2(\ln L_u - \ln L_r) \xrightarrow{d} \chi^2(J)$ under $H_0$.
3. Lagrange multiplier (LM) test: $LM = \frac{1}{\tilde{\sigma}^2}\left(R\hat{\beta} - r\right)'\left[R(\mathbf{X}'\mathbf{X})^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right) \xrightarrow{d} \chi^2(J)$ under $H_0$, where $\tilde{\sigma}^2$ is the variance estimator from the restricted regression.
4. $LM \leq LR \leq W$, where (in their $n$-based forms) $W = n\,\frac{\tilde{\varepsilon}'\tilde{\varepsilon} - \hat{\varepsilon}'\hat{\varepsilon}}{\hat{\varepsilon}'\hat{\varepsilon}}$, $LR = n\ln\frac{\tilde{\varepsilon}'\tilde{\varepsilon}}{\hat{\varepsilon}'\hat{\varepsilon}}$, and $LM = n\,\frac{\tilde{\varepsilon}'\tilde{\varepsilon} - \hat{\varepsilon}'\hat{\varepsilon}}{\tilde{\varepsilon}'\tilde{\varepsilon}}$.
(b) Tests under Conditional Heteroskedasticity: $W = \sqrt{n}\left(R\hat{\beta} - r\right)'\left(R\hat{Q}^{-1}\hat{V}\hat{Q}^{-1}R'\right)^{-1}\sqrt{n}\left(R\hat{\beta} - r\right) \xrightarrow{d} \chi^2(J)$ under $H_0$.
5. Testing for Conditional Homoskedasticity: $H_0: E(\varepsilon_t^2\mid X_t) = \sigma^2 = E(\varepsilon_t^2)$.
(a) White’s Test:
(b) Breusch-Pagan’s LM Test (Skip?)
Topic 4: Linear Regression Models with Time Series Data
1. Fundamental Concepts in Time Series
(a) Weak Stationarity, Strict Stationarity, and Ergodicity
(b) White noise process: (i) $E(Y_t) = 0$; (ii) $\mathrm{Var}(Y_t) = \Gamma(0) < \infty$; (iii) $\mathrm{Cov}(Y_t, Y_{t-j}) = \Gamma(j) = 0$ for all $j \neq 0$.
(c) Ergodic theorem: Let $\{Y_t\}$ be an ergodic stationary process with $E(Y_t) = \mu$. Then $\bar{Y} = n^{-1}\sum_{t=1}^{n} Y_t \to \mu$ a.s.
(d) Martingale, Martingale Difference Sequence and Random Walk
(e) CLT for an Ergodic Stationary m.d.s.: Suppose $\{g_t\}$ is a stationary ergodic m.d.s. with $E(g_t g_t') = S$, a finite and positive definite matrix. Then $n^{-1/2}\sum_{t=1}^{n} g_t \xrightarrow{d} N(0, S)$.
2. Large Sample Theory for Linear Regression Models with Time Series Processes
(a) Basic Assumptions:
A4.1 (Ergodic stationarity and linearity) $\{Y_t, X_t\}$ is stationary and ergodic such that $Y_t = \beta'X_t + \varepsilon_t$.
A4.2 (Correct specification of the conditional mean) $E(\varepsilon_t\mid X_t) = 0$ a.s. and $E(\varepsilon_t^2) = \sigma^2$.
A4.3 (Non-singularity) $Q \equiv E(X_tX_t')$ is finite and p.d.
A4.4 (MDS) $\{X_t\varepsilon_t\}$ is an m.d.s. and $V \equiv E(X_tX_t'\varepsilon_t^2)$ is finite and p.d.
A4.5 (Conditional homoskedasticity) $E(\varepsilon_t^2\mid X_t) = \sigma^2$ a.s.

(b) Consistency of the OLS Estimator: Under Assumptions A4.1-A4.4, (i) $\hat{\beta} \xrightarrow{p} \beta$ and (ii) $s^2 \xrightarrow{p} \sigma^2$.
(c) Asymptotic Normality of $\hat{\beta}$: Suppose Assumptions A4.1-A4.4 hold; then $\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{d} N(0, Q^{-1}VQ^{-1})$.
(d) Asymptotic Variance Estimator:
Case 1 (Conditional homoskedasticity): $\widehat{\mathrm{Avar}}(\sqrt{n}\hat{\beta}) \equiv s^2\hat{Q}^{-1} \xrightarrow{p} \sigma^2 Q^{-1}$.
Case 2 (Conditional heteroskedasticity): $\widehat{\mathrm{Avar}}(\sqrt{n}\hat{\beta}) = \hat{Q}^{-1}\hat{V}\hat{Q}^{-1} \xrightarrow{p} Q^{-1}VQ^{-1}$.
(e) Hypothesis Testing: $H_0: R\beta = r$.
Case 1 (Conditional homoskedasticity): $W = JF \xrightarrow{d} \chi^2(J)$.
Case 2 (Conditional heteroskedasticity): $W \equiv n\left(R\hat{\beta} - r\right)'\left[R\hat{Q}^{-1}\hat{V}\hat{Q}^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right) \xrightarrow{d} \chi^2(J)$.
(f) Minimum distance test
(g) Testing for Serial Correlation:
1. Box-Pierce and Ljung-Box tests for serial correlation: Suppose $\{\varepsilon_t\}$ is a stationary m.d.s. and $E(\varepsilon_t^2\mid\varepsilon_{t-1}, \varepsilon_{t-2}, \cdots) = \sigma^2 < \infty$. Then the Box-Pierce (1970) $Q$ statistic $Q \equiv n\sum_{j=1}^{p}\hat{\rho}_j^2 = \sum_{j=1}^{p}(\sqrt{n}\hat{\rho}_j)^2 \xrightarrow{d} \chi^2(p)$. No need to remember Ljung-Box's (1978) modified $Q$ statistic. (See the sketch after this list.)
2. Breusch (1978) and Godfrey (1978) test for serial correlation.
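A minimal sketch (my own example, assuming the residual series is already available) of the Box-Pierce $Q$ statistic with $p$ lags:

    # Box-Pierce Q statistic: Q = n * sum_{j=1}^{p} rho_hat_j^2, approx chi^2(p) under H0.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(11)
    n, p = 1000, 6
    e = rng.normal(size=n)                                # replace with regression residuals

    e_c = e - e.mean()
    gamma0 = (e_c @ e_c) / n
    rho = np.array([(e_c[j:] @ e_c[:-j]) / n / gamma0 for j in range(1, p + 1)])
    Q = n * (rho**2).sum()
    print(Q, stats.chi2.sf(Q, df=p))                      # large p-value for white-noise input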
3. Large Sample Theory for Linear Regression Model under Both Conditional Heteroskedasticity and
Autocorrelation
(a) Assumption A4.4* (Gordin's condition on ergodic stationary processes): (i) $S = \sum_{j=-\infty}^{\infty}\Gamma(j)$ is finite and p.d., where $\Gamma(j) = \mathrm{Cov}(X_t\varepsilon_t, X_{t-j}\varepsilon_{t-j}) = E(X_t\varepsilon_t\varepsilon_{t-j}X_{t-j}')$; (ii) $E(X_t\varepsilon_t\mid X_{t-j}, \varepsilon_{t-j}, X_{t-j-1}, \varepsilon_{t-j-1}, \cdots) \to 0$ in mean square as $j \to \infty$; (iii) $\sum_{j=0}^{\infty}\left[E(r_{tj}'r_{tj})\right]^{1/2} < \infty$, where $r_{tj} = E(X_t\varepsilon_t\mid X_{t-j}, \varepsilon_{t-j}, X_{t-j-1}, \varepsilon_{t-j-1}, \cdots) - E(X_t\varepsilon_t\mid X_{t-j-1}, \varepsilon_{t-j-1}, X_{t-j-2}, \varepsilon_{t-j-2}, \cdots)$.
(b) Long-run Variance Estimator: basic understanding.

(c) Consistency of OLS: $\hat{\beta} \xrightarrow{p} \beta$.
(d) Asymptotic Normality of $\hat{\beta}$: $\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{d} N(0, Q^{-1}SQ^{-1})$.
(e) Hypothesis Testing: $H_0: R\beta = r$; $W = n\left(R\hat{\beta} - r\right)'\left[R\hat{Q}^{-1}\hat{S}\hat{Q}^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right) \xrightarrow{d} \chi^2(J)$.
Topic 5: Additional Topics in Regression Analysis
1. Structural change
2. Partitioned regression
(a) Basic idea and Frisch-Waugh-Lovell Theorem
(b) Multicollinearity
3. Model selection criteria: AIC, BIC
4. Ridge regression
(a) Bias and variance calculation, asymptotic property
(b) Proof of the existence Theorem (skip?)
(c) Spectral view of OLS and ridge estimation (skip?)
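A minimal ridge sketch (my own example) showing the estimator $\hat{\beta}_{ridge} = (\mathbf{X}'\mathbf{X} + \lambda I)^{-1}\mathbf{X}'\mathbf{Y}$ and the shrinkage relative to OLS; the bias and variance expressions in 4(a) can be checked against such a simulation.

    # Ridge regression estimator for a grid of penalties (lambda = 0 reproduces OLS).
    import numpy as np

    rng = np.random.default_rng(13)
    n, K = 100, 5
    X = rng.normal(size=(n, K))
    X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)         # near-multicollinearity
    beta = np.array([1.0, -1.0, 0.5, 0.0, 2.0])
    Y = X @ beta + rng.normal(size=n)

    for lam in (0.0, 1.0, 10.0):
        beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(K), X.T @ Y)
        print(lam, np.round(beta_ridge, 3))               # coefficients shrink as lambda grows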
5. Nonlinear least squares
(a) Consistency (following from the extreme estimation theory)
(b) Asymptotic normality
6. LAD and quantile regression (skip?)
Topic 6: Instrumental Variable Estimation
1. Endogeneity:
(a) Omitted variable
(b) Measurement error
(c) Simultaneity
2. Instrumental Variable Estimation:
(a) Basic assumptions
(b) Rank conditions
(c) 2SLS versus IV estimators
(d) Test of Endogeneity: Hausman test versus Durbin-Wu-Hausman test
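A minimal 2SLS sketch (my own simulated example with one endogenous regressor and one instrument), illustrating the 2SLS formula $\hat{\beta}_{2SLS} = (\mathbf{X}'P_Z\mathbf{X})^{-1}\mathbf{X}'P_Z\mathbf{Y}$ relevant to item 2(c):

    # 2SLS: project X onto the instruments Z, then regress Y on the fitted values.
    import numpy as np

    rng = np.random.default_rng(14)
    n = 5000
    z = rng.normal(size=n)
    v = rng.normal(size=n)
    u = 0.8 * v + rng.normal(size=n)                      # error correlated with x: endogeneity
    x = 1.0 + 1.0 * z + v                                 # first stage: instrument is relevant
    Y = 1.0 + 2.0 * x + u

    X = np.column_stack([np.ones(n), x])
    Z = np.column_stack([np.ones(n), z])

    PZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)           # fitted X from the first stage
    beta_2sls = np.linalg.solve(PZX.T @ X, PZX.T @ Y)
    beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
    print(beta_2sls, beta_ols)                            # 2SLS near [1, 2]; OLS is biased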
Topic 7: GMM
It is OK to skip Sections 7.1 and 7.2.8
1. GMM estimation in the general nonlinear framework
(a) Definition, Basic assumptions
(b) Consistency, Asymptotic Normality, Efficient GMM
(c) Hansen's J-test for over-identifying restrictions
(d) Test a subset of orthogonality conditions
2. GMM estimation in the linear regression framework
(a) Consistency, Asymptotic Normality, Efficient GMM
(b) Hansen's J-test for over-identifying restrictions
(c) Test a subset of orthogonality conditions
(d) Simplification under m.d.s. and conditional homoskedasticity: Relationship between efficient
GMM and 2SLS estimators in linear regression models
3. Testing for conditional moment restrictions
(a) Optimal estimation in nonlinear regression model: derivation of the optimal IV
(b) Optimal estimation in single-equation nonlinear structural model: derivation of the optimal
IV