STAT 371 Final Exam Summary
Statistics for Finance I

1 OLS and Rβ

Basic GLRM Framework:

• β, β̂ are k × 1; Y, Ŷ are n × 1; X is n × k; U, Û are n × 1

• β̂_OLS = (X^t X)^{-1} X^t Y

• Var[U] = σ_U² I

• Var[β̂] = σ̂_u² (X^t X)^{-1} = [RSS/(n − k)] (X^t X)^{-1}

Rβ Framework:

• H0 : Rβ = r, H1 : Rβ ≠ r, q := rank(R)

• β̂_R = β̂ + (X^t X)^{-1} R^t [R(X^t X)^{-1} R^t]^{-1} (r − Rβ̂)

• Û_R^t Û_R = y^t y − β̂_R^t x^t y

• Var[β̂_R] = σ̂_u² (I − AR)(x^t x)^{-1} (I − AR)^t where A = (x^t x)^{-1} R^t [R(x^t x)^{-1} R^t]^{-1}
• We have the following equivalent statements:

  TSS = RSS + ESS
  Y^t Y − nȲ² = Û^t Û + (β̂^t X^t Y − nȲ²)
  y^t y = Û^t Û + β̂^t x^t y

Key Statistics:

• R² = ESS/TSS, R̄² = 1 − [RSS/(n − k)] / [TSS/(n − 1)] = 1 − (1 − R²)(n − 1)/(n − k)

• t = (β̂₁ − β₁)/sd(β̂₁) ∼ t(n − k), and (r − Rβ̂) / √(σ̂² R(X^t X)^{-1} R^t) ∼ t(n − k) for q = 1

• F_Statistic = [ESS/(k − 1)] / [RSS/(n − k)] ∼ F(k − 1, n − k) (ANOVA)

• We have for q ≥ 1 (see the sketch below),

  t² = F = [(Rβ̂ − r)^t [R(X^t X)^{-1} R^t]^{-1} (Rβ̂ − r)/q] / [Û^t Û/(n − k)]
         = [(RSS_R − RSS_UN)/q] / [RSS_UN/(n − k)] ∼ F(q, n − k)
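The two forms of the F statistic are algebraically identical. A minimal numpy sketch (not from the notes; the simulated design and all names are illustrative assumptions) checking this on fake data:

```python
# Verify that the Wald form and the RSS form of the F statistic agree.
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 regressors
Y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y                      # beta_hat_OLS = (X'X)^{-1} X'Y
U = Y - X @ b                              # residuals
RSS_UN = U @ U

# Restriction H0: R beta = r, here beta_3 = 0 (q = 1)
R = np.array([[0.0, 0.0, 1.0]])
r = np.array([0.0])
q = np.linalg.matrix_rank(R)

# Wald form: (Rb - r)' [R (X'X)^{-1} R']^{-1} (Rb - r)/q over RSS/(n-k)
num = (R @ b - r) @ np.linalg.inv(R @ XtX_inv @ R.T) @ (R @ b - r) / q
F_wald = num / (RSS_UN / (n - k))

# RSS form: fit the restricted model (drop the third column) and compare RSS
b_R = np.linalg.lstsq(X[:, :2], Y, rcond=None)[0]
U_R = Y - X[:, :2] @ b_R
RSS_R = U_R @ U_R
F_rss = ((RSS_R - RSS_UN) / q) / (RSS_UN / (n - k))

print(F_wald, F_rss)  # identical up to floating point
```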
Special Matrices:

• X(X^t X)^{-1} X^t = Proj_X(·)

• M = I − Proj_X = I − X(X^t X)^{-1} X^t = Proj_Û(·), where M is idempotent and of rank n − k

(Central Limit Theorem) Suppose that we have X₁, ..., Xₙ i.i.d. r.v.s with mean µ and variance σ². Then, as n → ∞,

  X̄ ∼ N(µ, σ²/n) ⟹ √n(θ̂ − θ) →_D N(0, V)

Instrumental Variables:

• We need to find a matrix Z (n × l, l ≥ k) such that it satisfies certain properties. These are:
  – E[Z^t U] = 0
  – E[Z^t X] = Σ_ZX

• We premultiply the observed model by Z^t to get:
  – Z^t Y = Z^t Xβ + Z^t U, and so β̂_IV = (X^t Z Z^t X)^{-1} X^t Z Z^t Y = (Z^t X)^{-1} Z^t Y
  – If l = k, plim_{n→∞} β̂_IV = β + Σ_ZX^{-1} · 0 = β (we need invertibility of Σ_ZX)

• Read the notes to understand the various properties and proofs.

• The problems motivating IV are:
  1. The X's are stochastic
  2. E[X^t U] ≠ 0
  3. Are the errors no longer white noise? (They are. See proof in notes.)

Two-Stage Least Squares:

• If l > k, we do a procedure called two-stage least squares (2SLS; see the sketch below):

  1. Regress X on Z and obtain a matrix of fitted values X̂ (project X onto Z). That is,
     X̂ = Z(Z^t Z)^{-1} Z^t X

  2. Regress Y on X̂ and obtain β̂_2SLS = (X̂^t X̂)^{-1} X̂^t Y = (X^t Proj_Z X)^{-1} X^t Proj_Z Y

  3. We can show that β̂_2SLS = β̂_IV. To do this, insert (Z^t Z)(Z^t Z)^{-1} in the equation for β̂_IV to get
     β̂_IV = (X^t Z(Z^t Z)^{-1} Z^t X)^{-1} X^t Z(Z^t Z)^{-1} Z^t Y = (X^t Proj_Z X)^{-1} X^t Proj_Z Y = β̂_2SLS
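A minimal numpy sketch of the equivalence (the simulated design with one endogenous regressor and l = 3 instruments is an assumption, not from the notes):

```python
# With l > k instruments, compute beta_hat two ways and check they coincide.
import numpy as np

rng = np.random.default_rng(1)
n, l = 500, 3
Z = np.column_stack([np.ones(n), rng.normal(size=(n, l - 1))])   # n x l instruments
V = rng.normal(size=n)
X = np.column_stack([np.ones(n), Z[:, 1] + Z[:, 2] + V])          # endogenous regressor
U = V + rng.normal(size=n)                                        # correlated with X
Y = X @ np.array([1.0, 2.0]) + U

P_Z = Z @ np.linalg.inv(Z.T @ Z) @ Z.T                   # Proj_Z
X_hat = P_Z @ X                                          # stage 1: project X onto Z
b_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ Y)   # stage 2: regress Y on X_hat
b_iv = np.linalg.solve(X.T @ P_Z @ X, X.T @ P_Z @ Y)     # (X' P_Z X)^{-1} X' P_Z Y

print(b_2sls, b_iv)   # equal up to floating point
```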
2 Model Selection and Specification

Problems with X:

1. Suppose that we have an incorrect functional form (p. 112).

   • Log-log model: ln Yt = β1 + β2 ln Xt; semi-log model: ln Yt = β1 + β2 Xt; linear model: Yt = β1 + β2 Xt

   (a) Consequences?
       i. The estimators could be unbiased but inefficient
       ii. The t and F tests are invalid

   (b) Detection?
       i. The informal test is to simply plot the data.
       ii. The formal test is the Ramsey RESET test.

2. Suppose that we are underfitting.

   (a) Let the true model be Yt = β1 + β2 X2t + β3 X3t + µt but you omitted X3t in the specification of your model. So you mistakenly specified Yt = φ1 + φ2 X2t + vt with vt = β3 X3t + µt, and you get E[vt] = β3 X3t ≠ 0 and Var[vt] = β3² Var(X3t) + σ_u² ≠ c for a constant c.

   (b) Consequences?
       i. On the least squares estimators: the OLS estimators are biased iff the excluded variable X3t is correlated with the included variable X2t (r23 ≠ 0).
       ii. The t and F ratios are no longer valid.

   (c) Detection?
       i. An informal test is to add X3t to your model and check if there is a change in R̄². If it goes up, X3t is relevant.
       ii. Another informal test is to add X3t to the model and check the changes in the new estimated coefficients. If there is a significant change, then we have a relevant variable.
       iii. The formal test is the Ramsey RESET test.

3. Suppose that we are overfitting.

   (a) Let the true model be Yt = β1 + β2 X2t + ut but the mis-specified model be Yt = θ1 + θ2 X2t + θ3 X3t + vt, where X3t is an irrelevant variable.

   (b) Consequences?
       i. The least squares estimators of the mis-specified model are unbiased and consistent but no longer efficient.
       ii. The t and F ratios are no longer valid.

   (c) Detection?
       i. The informal tests are the same as above in the case of underfitting. However, R̄² and the estimated coefficients are not expected to change very much.
       ii. The more formal test is to test the restriction θ3 = 0 using either the t test, the F test, or the t² = F statistic.

Ramsey RESET Test:

This is used to test for an incorrect functional form or for underfitting.

1. Run OLS and obtain Ŷt; Ŷt will incorporate the true functional form or the underfitting (if any exists).

2. Take the unrestricted model
   Yt = φ0 + φ1 Xt + φ2 Ŷt² + φ3 Ŷt³ + ... + φk Ŷt^k
   and use the hypotheses H0 : φ2 = ... = φk = 0, H1 : φj ≠ 0 for some j ≥ 2. Usually k = 3.

3. Compute
   F = [(RSS_R − RSS_UN)/q] / [RSS_UN/(n − k)] ∼ F(q, n − k)
   and reject or don't reject H0. If we reject, then we have an incorrect functional form (or underfitting).
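A minimal sketch of the RESET idea (the quadratic data-generating process and all names are assumptions for illustration):

```python
# Fit a linear model to data generated from a quadratic relationship, then
# test whether powers of the fitted values add explanatory power.
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(1, 5, size=n)
y = 1 + 2 * x**2 + rng.normal(size=n)       # true form is quadratic

def rss(Xmat, y):
    b = np.linalg.lstsq(Xmat, y, rcond=None)[0]
    u = y - Xmat @ b
    return u @ u

X_r = np.column_stack([np.ones(n), x])      # restricted: the (wrong) linear model
y_hat = X_r @ np.linalg.lstsq(X_r, y, rcond=None)[0]

X_un = np.column_stack([np.ones(n), x, y_hat**2, y_hat**3])  # add powers of y_hat
q, k = 2, X_un.shape[1]
RSS_R, RSS_UN = rss(X_r, y), rss(X_un, y)
F = ((RSS_R - RSS_UN) / q) / (RSS_UN / (n - k))
print(F)   # large F => reject H0 => functional form is wrong
```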
Errors in Y:

• We then have the equation Yt = β1 + β2 X2t + (ut + ξt), where we call εt = ut + ξt the composite error.

• The least squares estimators in Yt from above will remain unbiased but are no longer efficient (see proof in notes; may be on the final exam).

Errors in X:

• We have the equation
  Y = (X − V)β + U = Xβ + (U − Vβ)
  where ε = U − Vβ is the composite error.

• The β̂_OLS from above is going to be biased in small samples and inconsistent in large samples (see proof in notes; may be on the final exam).
3 Non-Spherical Disturbances

When we have serial correlation or heteroskedasticity in the error terms, we call these error terms non-spherical disturbances. This is when the covariance matrix is not diagonal: it has non-zero entries on the off-diagonal elements.

Sources of Heteroskedasticity:
(1) Nature of Yt (2) Mis-specification (3) Transformations (4) Varying coefficients

Mathematical Representation of σt²:
(1) σt² = σ² Zt^h for some h ≠ 0 (2) σt² = α0 + α1 Zt (3) σt = α0 + α1 Zt (4) σt² = f(Z1, Z2, ..., Zn)

Testing for Heteroskedasticity:

1. Park Test (see the sketch below)

   (a) Park specified σt² = σ² Xt^β e^{vt} for the model Yt = β1 + β2 Xt + ut.

   (b) From here, we linearize the above equation to get ln σt² = ln σ² + β ln Xt + vt. Since ût is observed, it is a proxy for ut, and since Var(ût) = E[(ût − 0)²] = E[ût²], we use ln ût² as a proxy for ln σt². Our new equation is then
       ln ût² = ln σ² + β ln Xt + vt
       where we hope that vt is white noise.

   (c) Test the hypothesis H0 : β = 0 using a t test and reject or do not reject the null hypothesis. If we reject, then we have heteroskedasticity.
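A minimal sketch of the Park regression (the simulated design, with error standard deviation proportional to x, is an assumption):

```python
# Regress ln(u_hat^2) on ln(x) and inspect the t statistic on the slope.
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.uniform(1, 10, size=n)
u = rng.normal(size=n) * x                 # sd proportional to x => heteroskedastic
y = 1 + 2 * x + u

X = np.column_stack([np.ones(n), x])
u_hat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Park regression: ln u_hat^2 = ln sigma^2 + beta ln x + v
W = np.column_stack([np.ones(n), np.log(x)])
g = np.log(u_hat**2)
coef = np.linalg.lstsq(W, g, rcond=None)[0]
resid = g - W @ coef
s2 = resid @ resid / (n - 2)
se_beta = np.sqrt(s2 * np.linalg.inv(W.T @ W)[1, 1])
print(coef[1] / se_beta)   # |t| large => reject H0: beta = 0 => heteroskedasticity
```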
2. White Test (see the sketch below)

   (a) Let Yt = β1 + β2 X2t + β3 X3t + ut and regress Y on the X's to get a series of ût.

   (b) Run the auxiliary regression (stated in R formula notation) ût² ∼ (X2t + X3t)² + X2t² + X3t², i.e., regress ût² on X2t, X3t, X2t·X3t, X2t², and X3t² (plus an intercept).

   (c) Compute R² from the previous regression.

   (d) White showed that asymptotically, the quantity W = nR² ∼ χ²(k − 1), where k is the number of all the parameters in the auxiliary regression (here k = 6). If the test statistic is larger than the critical value at α = 5% with k − 1 degrees of freedom, then we have heteroskedasticity.
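A minimal sketch of the White test (simulated design is an assumption):

```python
# Regress u_hat^2 on levels, squares, and the cross product, then compare
# W = n * R^2 against a chi-square(k - 1) critical value.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
n = 400
x2, x3 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n) * (1 + np.abs(x2))          # variance depends on x2
y = 1 + 2 * x2 + 3 * x3 + u

X = np.column_stack([np.ones(n), x2, x3])
u_hat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Auxiliary regression: u_hat^2 on x2, x3, x2*x3, x2^2, x3^2 (k = 6 parameters)
A = np.column_stack([np.ones(n), x2, x3, x2 * x3, x2**2, x3**2])
g = u_hat**2
fitted = A @ np.linalg.lstsq(A, g, rcond=None)[0]
R2 = 1 - ((g - fitted) @ (g - fitted)) / ((g - g.mean()) @ (g - g.mean()))

W = n * R2
print(W, chi2.ppf(0.95, df=A.shape[1] - 1))   # W > critical => heteroskedasticity
```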
3. Of course we don't know which of the explanatory variables is causing this, but we have some remedies:

   (a) Test using the White procedure.

   (b) Narrow it down to a specific variable, which could be in the model or outside the model (one unknown variable Z).
       i. If it is coming from one of the X's, we can: try to replace it with a proxy, try to replace it with a combination of variables, drop it, or do some transformations.
       ii. If it is due to Z (outside of the model), then you could have underfitting; raise your specification and try to include that missing relevant variable.

4. What if you know the exact form of heteroskedasticity?

   (a) Use Generalized Least Squares.
       i. Example (see the sketch below). Suppose that heteroskedasticity is due to X2t and it is taking the following form:
          σt² = σ² X2t^h, h = 2
          How can we correct for this problem? We use the method of Weighted Least Squares, also known as Generalized Least Squares (GLS).
          A. To do this, we want to divide by the square root of whatever is causing the heteroskedasticity.
          B. So let's transform our model as follows:
             Yt/√(X2t²) = β1/√(X2t²) + β2 X2t/√(X2t²) + β3 X3t/√(X2t²) + ut/√(X2t²)
             We then get
             Var[ut/√(X2t²)] = (1/X2t²) Var[ut] = σ²
             and this new model is homoskedastic.
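A minimal sketch of that WLS correction (simulated data; the design and h = 2 are assumptions):

```python
# Dividing the model through by sqrt(X2t^2) = |X2t| restores a constant variance.
import numpy as np

rng = np.random.default_rng(5)
n = 500
x2 = rng.uniform(1, 5, size=n)
x3 = rng.normal(size=n)
u = rng.normal(size=n) * x2            # Var[u_t] = sigma^2 * x2^2  (h = 2)
y = 1 + 2 * x2 + 3 * x3 + u

w = np.sqrt(x2**2)                     # weight = sqrt of what drives the variance
Xs = np.column_stack([np.ones(n) / w, x2 / w, x3 / w])   # transformed regressors
ys = y / w                                               # transformed response
b_wls = np.linalg.lstsq(Xs, ys, rcond=None)[0]
print(b_wls)    # OLS on the transformed model = WLS; errors u/w are homoskedastic
```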
Serial Correlation:

1. Problem: Cov(ut, us) ≠ 0 for t ≠ s.

2. Sources: pp. 162-164 (will be on the final exam).

3. Mathematical Representation:

   (a) Let the true model be Yt = β1 + Σ_{i=2}^k βi Xit + ut such that E[ut] = 0, Var(ut) = σ², and Cov(us, ut) ≠ 0.

   (b) We will only consider the AR(1) (autoregressive of order 1) process given by
       ut = ρ u_{t−1} + ξt
       with E[ξt] = 0, Var[ξt] = σ_ξ², Cov(ξt, ξs) = 0 for t ≠ s, and |ρ| < 1.

   (c) Remark that the conversion of this form into a general linear process through the use of forward recursion gives ut = ξt + Σ_{k=1}^∞ ξ_{t−k} ρ^k. This implies that E[ut] = 0 and Var[ut] = σ_ξ²/(1 − ρ²). We also get that Cov(ut, u_{t−s}) = ρ^s σ_ξ²/(1 − ρ²).

4. Test: Durbin-Watson (D-W) [applies only to AR(1); see the sketch below]:

   (a) The d-statistic is
       d = Σ_{t=2}^n (ût − û_{t−1})² / Σ_{t=1}^n ût² ≈ 2(1 − ρ̂), with ρ̂ = Σ_{t=2}^n ût û_{t−1} / Σ_{t=2}^n û²_{t−1},
       due to the fact that Σ û²_{t−1} ≈ Σ ût².

   (b) Remark that if ρ = −1 ⟹ d = 4, ρ = 1 ⟹ d = 0, and ρ = 0 ⟹ d = 2.

   (c) According to Durbin and Watson, if d ∈ (dL, dU) the test is inconclusive, for dL, dU ∈ (0, 2), and similarly for the symmetric reflection across d = 2 (this other interval is (4 − dU, 4 − dL)). Otherwise we make conclusions based on the proximity of d. Using this, we have several tests related to this.
       i. Test for autocorrelation (p. 169):
          A. H0 : ρ = 0 (no autocorrelation); H1 : ρ ≠ 0 (there exists autocorrelation).
          B. Calculate d ≈ 2 − 2ρ̂ and use the d table to get dL and dU; use α and df1 = n, df2 = k − 1.
          C. Reject, do not reject, or say the test is inconclusive.
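A minimal sketch (simulated AR(1) errors; parameters are assumptions):

```python
# Compute the Durbin-Watson d statistic from OLS residuals; check d ~ 2(1 - rho_hat).
import numpy as np

rng = np.random.default_rng(6)
n, rho = 300, 0.6
xi = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + xi[t]          # AR(1) disturbances
x = rng.normal(size=n)
y = 1 + 2 * x + u

X = np.column_stack([np.ones(n), x])
uh = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

d = np.sum(np.diff(uh)**2) / np.sum(uh**2)
rho_hat = np.sum(uh[1:] * uh[:-1]) / np.sum(uh[:-1]**2)
print(d, 2 * (1 - rho_hat))                # close to each other, and well below 2
```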
5. Remedies: GLS (Aitken 1936)

   (a) Set up: Yt = β1 + Σ_k βk Xkt + ut, with ut = ρ u_{t−1} + ξt.

   (b) Apply D-W and, if autocorrelation exists, correct using:
       i. GLS if ρ is known:
          A. Set up the equation (1) Yt − ρY_{t−1} = β1(1 − ρ) + β2(X2t − ρX2,t−1) + ... + ξt, since (2) ut = ρu_{t−1} + ξt where ξt is white noise.
       ii. Cochrane-Orcutt Iterative Procedure if ρ is not known (see the sketch below):
          A. Run OLS on (2) Yt = β1 + ... + βk Xkt + ut and obtain a series of residuals ût.
          B. Compute ρ̂1 = Σ ût û_{t−1} / Σ û²_{t−1}.
          C. Use ρ̂1 to correct for autocorrelation by applying GLS, i.e., estimate the version of (1) with ρ̂1 in place of ρ.
          D. Apply D-W to (1).
          E. If H0 is accepted, then stop; if H0 is rejected, go back to (2) using Yt − ρ̂Y_{t−1} as the new proxy for Yt.
          F. Keep iterating until ρ̂s ≈ ρ̂_{s−1} and H0 is accepted.
       iii. Remark that the above iterative procedure does not converge very well (it converges to a random walk) if ρ ≈ 1.
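A minimal sketch of the Cochrane-Orcutt iteration (simulated data; a tolerance on ρ̂ replaces the formal D-W lookup of steps D and E):

```python
import numpy as np

rng = np.random.default_rng(7)
n, rho = 300, 0.6
xi, u = rng.normal(size=n), np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + xi[t]          # AR(1) disturbances
x = rng.normal(size=n)
y = 1 + 2 * x + u

def gls_fit(r):
    # Estimate (1): quasi-difference Y and X with the current rho estimate r
    ys = y[1:] - r * y[:-1]
    Xs = np.column_stack([(1 - r) * np.ones(n - 1), x[1:] - r * x[:-1]])
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]      # [beta1, beta2]

rho_hat, rho_prev = 0.0, np.inf
while abs(rho_hat - rho_prev) > 1e-8:
    b = gls_fit(rho_hat)                    # rho_hat = 0 on the first pass: plain OLS
    u_hat = y - (b[0] + b[1] * x)           # residuals of the original equation (2)
    rho_prev = rho_hat
    rho_hat = np.sum(u_hat[1:] * u_hat[:-1]) / np.sum(u_hat[:-1]**2)

print(rho_hat, b)   # rho_hat near the true 0.6; b near (1, 2)
```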
4 Maximum Likelihood Estimation

In MLE, we do the following:

1. Assume a distribution for Y.

2. Define the pdf of yi as fi(yi|θ) for each i.

3. Find the joint pdf of the n realizations, assuming independence, with f(Y|θ) = Π_{i=1}^n fi(yi|θ).

4. Define the likelihood function L(θ|Y) = f(Y|θ) = Π_{i=1}^n fi(yi|θ).

5. Take the log of L as l(θ|Y) = log L(θ|Y).

6. Find θ through θ̂ = argmax_{θ∈Θ} l(θ|Y).

MLE and the GLRM:

• We define a few matrices:

  1. Score Matrix: S(θ) = [∂l/∂θ]_{(k+1)×1}, set equal to 0_{(k+1)×1} at the maximum

  2. Hessian Matrix:
     H(θ) = ∂²l/∂θ∂θ' = [ ∂²l/∂β∂β'   ∂²l/∂β∂σ² ;
                          ∂²l/∂σ²∂β   ∂²l/∂(σ²)² ]_{(k+1)×(k+1)}

  3. Fisher Information Matrix: I(θ) = −E[H(θ)]

• Working in the GLRM framework (that is, Y = Xβ + U), we will assume that ut ∼ N(0, σ²) for all t. The first order conditions give us (see the sketch below):

  1. β̂_ML = (X^t X)^{-1} X^t Y = β̂_OLS

  2. σ̂²_ML = Û^t Û / n
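A minimal numerical sketch (simulated data; a check, not a proof) that maximizing the normal log-likelihood recovers β̂_OLS and σ̂² = RSS/n:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def neg_loglik(theta):
    beta, log_s2 = theta[:2], theta[2]      # parametrize sigma^2 = exp(log_s2) > 0
    u = Y - X @ beta
    return 0.5 * n * (np.log(2 * np.pi) + log_s2) + 0.5 * (u @ u) / np.exp(log_s2)

res = minimize(neg_loglik, x0=np.zeros(3))
b_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
rss = (Y - X @ b_ols) @ (Y - X @ b_ols)
print(res.x[:2], b_ols)                # ML beta matches OLS beta
print(np.exp(res.x[2]), rss / n)       # ML sigma2 matches RSS/n, not RSS/(n-k)
```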
• In terms of unbiasedness:

  1. β̂_ML = β̂_OLS ⟹ the estimate is unbiased for β.

  2. σ̂²_ML ≠ σ̂²_OLS ⟹ σ̂²_ML is biased, and E[σ̂²_ML] = [(n − k)/n] σ².

• In terms of efficiency:

  1. β̂_ML = β̂_OLS ⟹ Var[β̂_ML] = Var[β̂_OLS] = σ²(X^t X)^{-1}, and so our estimate is efficient.

  2. Var(σ̂²_ML) = [(n − k)/n] · (2σ⁴/n) ≠ 2σ⁴/n, which means that it is inefficient (and biased).

• In conclusion,

  1. In small samples, β̂_ML is unbiased and efficient; σ̂²_ML is biased and inefficient.

  2. In large samples, it can be shown that both estimators are consistent and asymptotically normal (not shown in this course); that is, θ̂_ML is a CAN (consistent and asymptotically normal) estimator.

  3. We can also show that they achieve the Cramér-Rao lower bound (proof will be on the final).

Asymptotic Test using ML (LR test):

Here LR test refers to the likelihood ratio test. The procedure is as follows (a numerical sketch follows this list):

1. Start with the unrestricted model:

   (a) θ̂_ML = [β̂_ML = (X^t X)^{-1} X^t Y ; σ̂²_ML = Û^t Û/n], where Û^t Û = y^t y − β̂^t x^t y

   (b) L(θ̂_ML|Y) = (2πσ̂²_ML)^{-n/2} e^{-n/2}

2. Then do the same thing with the restricted model:

   (a) θ̂_R = [β̂_R = β̂_ML + (...) ; σ̂²_R = Û_R^t Û_R/n], where H0 : r = Rβ

   (b) L(θ̂_R|Y) = (2πσ̂²_R)^{-n/2} e^{-n/2}

3. The likelihood ratio test uses the fact that

   LRT_Statistic = −2[ln L(θ̂_R) − ln L(θ̂_ML)] = −2 ln(L(θ̂_R)/L(θ̂_ML)) ∼ χ²(q)

   where H0 : r = Rβ, H1 : r ≠ Rβ, and LRT_Critical = χ²(α = 5%, q). If LRT_Stat > LRT_Crit, then reject H0.

   (a) Remark that LRT_Statistic can also be re-written as

       LRT_Statistic = −2 ln[(σ̂²_R/σ̂²_ML)^{-n/2}] = n ln(σ̂²_R/σ̂²_ML) = −2 ln(Λ)

   (One computation related to the likelihood ratio will be on the final.)
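A minimal sketch of one such computation (simulated data; the restriction tested is an assumption):

```python
# Compute LRT = n * ln(sigma2_R / sigma2_ML) for H0: beta_3 = 0 and compare
# with the chi-square(q) critical value.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(9)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=n)

def sigma2_hat(Xmat):
    u = Y - Xmat @ np.linalg.lstsq(Xmat, Y, rcond=None)[0]
    return (u @ u) / n                     # ML variance estimate: RSS/n

s2_un = sigma2_hat(X)                      # unrestricted
s2_r = sigma2_hat(X[:, :2])                # restricted: drop the third regressor
q = 1
LRT = n * np.log(s2_r / s2_un)
print(LRT, chi2.ppf(0.95, df=q))           # LRT > critical => reject H0
```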
5 Basic Sampling Concepts

In sampling, we care about 3 characteristics of the population:

1. Population Total: t = Σ_{i=1}^N Yi

2. Population Mean: Ȳ = (1/N) Σ_{i=1}^N Yi = t/N

3. Population Proportion: p

5.1 Simple Random Sampling (SRS)

In SRS,
1. We use ȳ (the sample mean) to estimate Ȳ. That is, ȳ is an estimator for Ȳ. Here, ȳ = (1/n) Σ yi and it has the properties:

   (a) E[ȳ] = Ȳ

   (b) Var[ȳ] = (1 − f) S²/n, where S² is the true population variance. But S² is not known, so we use the sample variance s² = [1/(n − 1)] Σ (yi − ȳ)². Therefore, V̂ar[ȳ] = (1 − f) s²/n.

2. Let's examine how we use the sample to estimate the population total. We know that t = NȲ, and since ȳ is an estimator for Ȳ, we can use t̂ = Nȳ as our estimator for t. It has the following properties:

   (a) E[t̂] = t

   (b) Var(t̂) = N² Var(ȳ) = N² (1 − f) s²/n

3. We skip the estimator, p̂, for p.

(Assignment 4, Question 7) We are given N = 6, a population index set U_Index = {1, 2, 3, 4, 5, 6} with Yi = {3, 4, 3, 4, 2, 2}.

a) We get that the population mean is Ȳ = 3 and the population variance is S² = 0.8.

b) The possible number of SRS's of size n = 3 is C(6, 3) = 20.

c) The probability of 1 SRS drawn is 1 over the number of possible SRS's. That is, 1/20.

d) The probability distribution of the sample mean is found as follows. We generate a list of all possible 3-element combinations from Yi and the corresponding estimator values, and use this information to create the frequency distribution for the estimator. In this case, the mean has the following distribution:

   P(ȳ = 7/3) = 2/20, P(ȳ = 8/3) = 4/20, P(ȳ = 9/3) = 8/20, P(ȳ = 10/3) = 4/20, P(ȳ = 11/3) = 2/20

   and so E[ȳ] = 3 = Ȳ, with Var(ȳ) = Σ (ȳi − E[ȳ])² Pr_i ≈ 0.133 (which matches (1 − f) S²/n = (1 − 3/6)(0.8)/3).
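A minimal sketch checking this example by brute force (values from the assignment; the code itself is illustrative):

```python
# Enumerate all C(6,3) = 20 samples; verify E[y_bar] = 3 and Var(y_bar) = (1-f) S^2/n.
from itertools import combinations
import numpy as np

Y = np.array([3, 4, 3, 4, 2, 2])
N, n = len(Y), 3
samples = list(combinations(range(N), n))          # all 20 index sets
means = np.array([Y[list(s)].mean() for s in samples])

print(len(samples))                                # 20
print(means.mean())                                # E[y_bar] = 3
print(means.var())                                 # 0.1333...
S2 = Y.var(ddof=1)                                 # population variance S^2 = 0.8
print((1 - n / N) * S2 / n)                        # matches: 0.1333...
```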
5.2 Stratified Sampling

(Assignment 4, Question 8) We are given

   U_index = {1, 2, 3, 4, 5, 6, 7, 8}, Yi = {1, 2, 4, 8 | 4, 7, 7, 7}

where the first four units form stratum N1 and the last four form stratum N2. We want to take SRS's from the stratums:

a) SRS1 of size n1 = 2: the number of possible SRS1 is C(4, 2) = 6. We then have (with t̂ = N1 ȳ):

   Sample No. | yi     | P(si) | ȳ   | t̂ = N1 ȳ
   1          | {1, 2} | 1/6   | 1.5 | 4 × 1.5 = 6
   2          | {1, 4} | 1/6   | 2.5 | 4 × 2.5 = 10
   3          | {1, 8} | 1/6   | 4.5 | 4 × 4.5 = 18
   4          | {2, 4} | 1/6   | 3   | 4 × 3 = 12
   5          | {2, 8} | 1/6   | 5   | 4 × 5 = 20
   6          | {4, 8} | 1/6   | 6   | 4 × 6 = 24
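A minimal sketch reproducing the stratum-1 table by enumeration (labels as in the assignment; t̂ = N1·ȳ estimates the stratum total):

```python
from itertools import combinations
import numpy as np

stratum1 = [1, 2, 4, 8]
N1 = len(stratum1)
for i, s in enumerate(combinations(stratum1, 2), start=1):
    y_bar = np.mean(s)
    print(i, s, "P = 1/6", y_bar, N1 * y_bar)   # matches the table above
```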