NMSA407 Linear Regression

Part VII
Submodel

doc. RNDr. Arnošt Komárek, Ph.D.
Dept. of Probability and Mathematical Statistics

Winter term 2014–15
Aim: Parsimonious expression of EY

General aim of regression modelling: the simplest possible expression of the dependence of EY on a set of covariates (predictors), i.e., on a set

  z₁ = (z_{1,1}, …, z_{1,p})ᵀ, …, z_n = (z_{n,1}, …, z_{n,p})ᵀ.

In the framework of a linear model, we are looking for the “simplest possible” matrix

  X_{n×k} = ( x₁ᵀ )     ( f₀(z₁)  ⋯  f_{k−1}(z₁) )
            (  ⋮  )  =  (   ⋮    ⋱      ⋮       )
            ( x_nᵀ )    ( f₀(z_n) ⋯  f_{k−1}(z_n) )

such that EY = Xβ for some β ∈ Rᵏ.

“Simplest possible” X ≡ the space M(X) of the lowest possible dimension.

⟹ Notion of a submodel.
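As a minimal illustration (made-up numbers, not from the course script), R's model.matrix() builds exactly such a matrix X from transformations f_j of a covariate:

z <- c(1.2, 0.7, 2.5, 3.1)
X <- model.matrix(~ z + I(z^2))   # columns: f_0 = 1, f_1(z) = z, f_2(z) = z^2
X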
Section VII.1
Submodel
Submodel

Repetition of a definition from Section II.8.

Definition VII.1 (Submodel)
Let Y ∼ (Xβ, σ²I_n) ≡ model M, rank(X_{n×k}) = r ≤ k < n (so that, among other things, EY ∈ M(X)). Let X⁰_{n×k₀} be a matrix of real constants such that

  M(X⁰) ⊂ M(X),    0 < rank(X⁰) = r₀ < r.

We say that Y follows a submodel of model M given by the matrix X⁰ if

  Y ∼ (X⁰β⁰, σ²I_n).

That is, if among other things EY ∈ M(X⁰).

Notation
The situation that a model M₀ is a submodel of model M will be denoted as M₀ ⊂ M.

Projections

Back to projection considerations: M(X⁰_{n×k₀}) ⊂ M(X_{n×k}),

  0 < rank(X⁰) = r₀ < r = rank(X) ≤ k < n.

Let

  P = ( Q⁰_{n×r₀}, Q¹_{n×(r−r₀)}, N_{n×(n−r)} )

be a matrix whose columns form an orthonormal basis of the Euclidean space Rⁿ such that
• Q⁰: orthonormal basis of the submodel regression space, M(X⁰) = M(Q⁰);
• Q = (Q⁰, Q¹): orthonormal basis of the model regression space, M(X) = M(Q);
• N: orthonormal basis of the model residual space, M(X)^⊥ = M(N);
• N⁰ = (Q¹, N): orthonormal basis of the submodel residual space, M(X⁰)^⊥ = M(N⁰).
Notation

Let y ∈ Rⁿ be a real vector (typically the response vector Y). Then

  y = (QQᵀ + NNᵀ) y = (Q⁰Q⁰ᵀ + Q¹Q¹ᵀ + NNᵀ) y
    = (Q⁰Q⁰ᵀ y + Q¹Q¹ᵀ y) + NNᵀ y    … ŷ + u
    = Q⁰Q⁰ᵀ y + (Q¹Q¹ᵀ y + NNᵀ y)    … ŷ⁰ + u⁰.

• ŷ⁰: projection of y into M(X⁰);  u⁰ = y − ŷ⁰.
• Let d = ŷ − ŷ⁰ = u⁰ − u = Q¹Q¹ᵀ y.

Notation for further submodel considerations

Let Y ∼ (Xβ, σ²I_n), M(X⁰_{n×k₀}) ⊂ M(X_{n×k}), 0 < rank(X⁰) = r₀ < r = rank(X) ≤ k < n. Then
• Ŷ⁰ = Q⁰Q⁰ᵀ Y: projection of Y into the regression space of the submodel;
• U⁰ = Y − Ŷ⁰ = (Q¹Q¹ᵀ + NNᵀ) Y: residuals of the submodel;
• SS_e^0 = ‖U⁰‖²: residual sum of squares of the submodel;
• ν_e^0 = n − r₀: submodel residual degrees of freedom;
• MS_e^0 = SS_e^0 / ν_e^0: submodel residual mean square.
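A minimal numerical sketch of these projections (simulated data and hypothetical names, not from the course script), with the orthonormal bases obtained from QR decompositions:

set.seed(1)
n <- 50
z <- runif(n)
X0 <- cbind(1, z)                  # submodel matrix
X  <- cbind(X0, z^2)               # model matrix: X0 plus one extra column
y  <- 1 + 2 * z + rnorm(n)
Q0 <- qr.Q(qr(X0))                 # orthonormal basis of M(X0)
Q  <- qr.Q(qr(X))                  # orthonormal basis of M(X)
yhat0 <- Q0 %*% crossprod(Q0, y)   # projection of y onto M(X0)
yhat  <- Q  %*% crossprod(Q,  y)   # projection of y onto M(X)
d <- yhat - yhat0                  # = u0 - u
c(SS0e = sum((y - yhat0)^2), SSe = sum((y - yhat)^2), D2 = sum(d^2))
# D2 equals SS0e - SSe up to rounding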
On a submodel

Theorem VII.1 (On a submodel)
Let the submodel hold, i.e., let Y ∼ (X⁰β⁰, σ²I_n). Then

1. Ŷ⁰ is the best linear unbiased estimator (BLUE) of the vector parameter μ⁰ = X⁰β⁰.
2. The submodel residual mean square MS_e^0 is an unbiased estimator of the residual variance σ².
3. The statistics Ŷ and U⁰ are uncorrelated.
4. The random vector D = Ŷ − Ŷ⁰ = U⁰ − U satisfies ‖D‖² = SS_e^0 − SS_e.
5. If further Y follows a normal distribution, then the statistics Ŷ and U⁰ are independent and

     F₀ = [(SS_e^0 − SS_e)/(r − r₀)] / [SS_e/(n − r)]
        = [(SS_e^0 − SS_e)/(ν_e^0 − ν_e)] / [SS_e/ν_e]  ∼  F_{r−r₀, n−r} = F_{ν_e^0−ν_e, ν_e}.

Proof. See the blackboard.

Point (5) of Theorem VII.1 was stated earlier (without proof) as Theorem II.14.
Series of submodels

When looking for a parsimonious expression of EY we often consider a series of submodels. The most important properties will be shown for the case of two submodels of a model.

In the following, let X⁰_{n×k₀}, X¹_{n×k₁}, X_{n×k}, with r₀ = rank(X⁰), r₁ = rank(X¹), r = rank(X), be matrices of real constants such that

  M(X⁰) ⊂ M(X¹) ⊂ M(X),    0 < r₀ < r₁ < r ≤ k < n.

Notation:
• Ŷ⁰, U⁰, SS_e^0, ν_e^0: quantities under the (sub)model M₀: Y ∼ (X⁰β⁰, σ²I_n);
• Ŷ¹, U¹, SS_e^1, ν_e^1: quantities under the (sub)model M₁: Y ∼ (X¹β¹, σ²I_n);
• Ŷ, U, SS_e, ν_e: quantities under the model M: Y ∼ (Xβ, σ²I_n).

On submodels

Theorem VII.2 (On submodels)
Let Y ∼ N_n(Xβ, σ²I_n). If the submodel Y ∼ N_n(X⁰β⁰, σ²I_n) of the submodel Y ∼ N_n(X¹β¹, σ²I_n) holds, then

  F₀,₁ = [(SS_e^0 − SS_e^1)/(r₁ − r₀)] / [SS_e/(n − r)]
       = [(SS_e^0 − SS_e^1)/(ν_e^0 − ν_e^1)] / [SS_e/ν_e]  ∼  F_{r₁−r₀, n−r} = F_{ν_e^0−ν_e^1, ν_e}.

Proof. See the blackboard.
Notation

When it is necessary to state explicitly which two models M₁ ⊂ M₂ are being compared, model M₁ having residual sum of squares SS_e^1 and model M₂ having residual sum of squares SS_e^2, the related difference in the sums of squares (the squared norm of the statistic D) will be denoted as SS(M₂ | M₁) or SS(2 | 1). That is,

  SS(M₂ | M₁) = SS(2 | 1) = SS_e^1 − SS_e^2.

F-test on a submodel using Theorem VII.1

M(X⁰) ⊂ M(X):

  M₀: Y ∼ N_n(X⁰β⁰, σ²I_n),      H₀: EY ∈ M(X⁰),
  M:  Y ∼ N_n(Xβ, σ²I_n),        H₁: EY ∈ M(X).

Is model M significantly better than model M₀? Does the regression space M(X) provide a significantly better expression of EY than the regression space M(X⁰)?

Test statistic:

  F₀ = [(SS_e^0 − SS_e)/(r − r₀)] / [SS_e/(n − r)] = [SS(M | M₀)/(r − r₀)] / [SS_e/(n − r)].

Reject H₀ if F₀ ≥ F_{r−r₀, n−r}(1 − α).

P-value when F₀ = f₀:  p = 1 − CDF_{F, r−r₀, n−r}(f₀).
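Continuing the simulated sketch above (not the course data), F₀ can be computed directly from the two residual sums of squares and checked against anova():

r0 <- qr(X0)$rank; r <- qr(X)$rank
SS0e <- sum((y - yhat0)^2)
SSe  <- sum((y - yhat)^2)
F0 <- ((SS0e - SSe) / (r - r0)) / (SSe / (n - r))
p  <- 1 - pf(F0, r - r0, n - r)          # p = 1 - CDF_F(f0)
anova(lm(y ~ z), lm(y ~ z + I(z^2)))     # the same F-test from two lm() fits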
F-test on submodels using Theorem VII.2

M(X⁰) ⊂ M(X¹) ⊂ M(X):

  M₀: Y ∼ N_n(X⁰β⁰, σ²I_n),      H₀: EY ∈ M(X⁰),
  M₁: Y ∼ N_n(X¹β¹, σ²I_n),      H₁: EY ∈ M(X¹).

Is model M₁ significantly better than model M₀? Does the regression space M(X¹) provide a significantly better expression of EY than the regression space M(X⁰)?

Test statistic:

  F₀,₁ = [(SS_e^0 − SS_e^1)/(r₁ − r₀)] / [SS_e/(n − r)] = [SS(M₁ | M₀)/(r₁ − r₀)] / [SS_e/(n − r)].

Reject H₀ if F₀,₁ ≥ F_{r₁−r₀, n−r}(1 − α).

P-value when F₀,₁ = f₀:  p = 1 − CDF_{F, r₁−r₀, n−r}(f₀).

Section VII.2
Omitting some columns
Omitting some columns

The most common couple (model – submodel):

  Model M:     Y ∼ (Xβ, σ²I_n),
  Submodel M₀: Y ∼ (X⁰β⁰, σ²I_n),

where the submodel matrix X⁰ is obtained by omitting some columns from the model matrix X.

⟹ What does the comparison of M and M₀ mean practically?

In the following, without loss of generality, let

  X = (X⁰, X¹),    0 < rank(X⁰) = r₀ < r = rank(X) < n.

Notation

Projection matrices onto the regression and residual spaces:

  H⁰ = X⁰(X⁰ᵀX⁰)⁻X⁰ᵀ,   H = X(XᵀX)⁻Xᵀ,
  M⁰ = I_n − H⁰,         M = I_n − H.

Projections:  Ŷ⁰ = H⁰Y,  U⁰ = M⁰Y,  Ŷ = HY,  U = MY.

Sums of squares:  SS_e^0 = ‖U⁰‖²,  SS_e = ‖U‖².

Effect of omitting the columns from the model matrix:

  D = Ŷ − Ŷ⁰ = U⁰ − U,    ‖D‖² = SS_e^0 − SS_e.
Effect of omitting the columns from the model matrix

Theorem VII.3 (Effect of omitting the columns from the model matrix)
Let X_{n×k} = (X⁰, X¹), 0 < rank(X⁰) = r₀ < r = rank(X) < n. The random vector D = Ŷ − Ŷ⁰ = U⁰ − U then satisfies

1. D = M⁰X¹ (X¹ᵀM⁰X¹)⁻ X¹ᵀU⁰;
2. ‖D‖² = SS_e^0 − SS_e = U⁰ᵀX¹ (X¹ᵀM⁰X¹)⁻ X¹ᵀU⁰.

Proof. See the blackboard.

Discussion

Remember that ‖D‖² equals SS_e^0 − SS_e, the numerator of the F-statistic for comparing the two models under normality.

Two extremes:

1. M(X¹) ⊂ M(X⁰), i.e., all columns of X¹ lie in M(X⁰) (the matrix X¹ does not extend the regression space beyond M(X⁰)).
   ⇔ M⁰X¹ = 0, D = 0.
   ⇔ r₀ = r, which was not allowed by our assumptions.

2. M(X¹) ⊥ M(X⁰).
   ⇒ M⁰X¹ = X¹.
   ⇒ X¹ᵀU⁰ = X¹ᵀY.
   ⇒ D = X¹(X¹ᵀX¹)⁻X¹ᵀY  (the estimated EY in a model Y ∼ (X¹β¹, σ²I_n)).
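A quick numerical check of Theorem VII.3 on the simulated data above (a sketch assuming X⁰ has full column rank, so an ordinary inverse serves as the pseudoinverse):

H0 <- X0 %*% solve(crossprod(X0)) %*% t(X0)   # hat matrix of the submodel
M0 <- diag(n) - H0                            # residual projector of the submodel
X1 <- matrix(z^2, ncol = 1)                   # the omitted column
u0 <- M0 %*% y
D  <- M0 %*% X1 %*% solve(t(X1) %*% M0 %*% X1, t(X1) %*% u0)
max(abs(D - (yhat - yhat0)))                  # ~ 0: the two expressions agree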
Data Cars2004 (subset)

• Data on vehicles that were on the U.S. market in 2004.
• Only non-hybrid cars with known consumption, weight, and engine size.
• n = 409.
• Dependence of consumption (Y) on log(weight) (x₁) and drive position (front/rear/4x4) (x₂).
• Script LinRegr-07-01.R; a follow-up of the analysis shown in LinRegr-05-02.R.

Data Cars2004 (subset): consumption ∼ fdrive + lweight + fdrive:lweight

Model M: consumption ∼ fdrive + lweight + fdrive:lweight

mInter <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow)
summary(mInter)

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        -52.8047     2.5266 -20.900  < 2e-16 ***
fdriverear          19.8445     5.1297   3.869 0.000128 ***
fdrive4x4          -12.5366     4.6506  -2.696 0.007319 **
lweight              8.5716     0.3461  24.763  < 2e-16 ***
fdriverear:lweight  -2.5890     0.6956  -3.722 0.000226 ***
fdrive4x4:lweight    1.7837     0.6240   2.858 0.004480 **
---
Residual standard error: 0.9404 on 403 degrees of freedom
Multiple R-squared: 0.8081, Adjusted R-squared: 0.8057
F-statistic: 339.4 on 5 and 403 DF, p-value: < 2.2e-16

Model M₀: consumption ∼ fdrive + lweight

mAddit <- lm(consumption ~ fdrive + lweight, data = CarsNow)
summary(mAddit)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -52.5605     1.9627 -26.780  < 2e-16 ***
fdriverear    0.6964     0.1181   5.897 7.83e-09 ***
fdrive4x4     0.8787     0.1363   6.445 3.29e-10 ***
lweight       8.5381     0.2688  31.762  < 2e-16 ***
---
Residual standard error: 0.9726 on 405 degrees of freedom
Multiple R-squared: 0.7937, Adjusted R-squared: 0.7922
F-statistic: 519.5 on 3 and 405 DF, p-value: < 2.2e-16
Data Cars2004 (subset): consumption ∼ fdrive + lweight + fdrive:lweight

Model M versus model M₀

anova(mAddit, mInter)

Analysis of Variance Table
Model 1: consumption ~ fdrive + lweight
Model 2: consumption ~ fdrive + lweight + fdrive:lweight
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1    405 383.1
2    403 356.4  2    26.702 15.097 4.758e-07 ***
---

⟹ F-test based on Theorem VII.1.

anova(mInter)

Analysis of Variance Table
Response: consumption
                Df Sum Sq Mean Sq  F value    Pr(>F)
fdrive           2 519.89  259.94  293.935 < 2.2e-16 ***
lweight          1 954.26  954.26 1079.040 < 2.2e-16 ***
fdrive:lweight   2  26.70   13.35   15.097 4.758e-07 ***
Residuals      403 356.40    0.88
---

⟹ The last F-test again compares models M and M₀ using Theorem VII.1.

Series of F-tests in model M

mInter1 <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow)
anova(mInter1)

Analysis of Variance Table
Response: consumption
                Df Sum Sq Mean Sq  F value    Pr(>F)
fdrive           2 519.89  259.94  293.935 < 2.2e-16 ***
lweight          1 954.26  954.26 1079.040 < 2.2e-16 ***
fdrive:lweight   2  26.70   13.35   15.097 4.758e-07 ***
Residuals      403 356.40    0.88
---

• F-tests based on Theorem VII.2 (on submodels).
• What is the sequence of models being tested?
Data Cars2004 (subset): consumption ∼ fdrive + lweight + fdrive:lweight

Again a series of F-tests in model M

mInter2 <- lm(consumption ~ lweight + fdrive + fdrive:lweight, data = CarsNow)
anova(mInter2)

Analysis of Variance Table
Response: consumption
                Df  Sum Sq Mean Sq  F value    Pr(>F)
lweight          1 1421.57 1421.57 1607.458 < 2.2e-16 ***
fdrive           2   52.58   26.29   29.726 9.079e-13 ***
lweight:fdrive   2   26.70   13.35   15.097 4.758e-07 ***
Residuals      403  356.40    0.88
---

• Different F-statistics and p-values compared to the previous slide. How does this come about?
• What is the sequence of models being tested?

⟹ Type I (sequential) F-tests.

Type I (sequential) ANOVA table, order A + B + AB

  Effect     Effect sum of squares
  A          SS(A | 0)
  B          SS(A + B | A)
  AB         SS(A + B + AB | A + B)
  Residual   SS_e^{A+B+AB}  (degrees of freedom ν_e^{A+B+AB}, mean square MS_e^{A+B+AB})

Each effect row further carries its degrees of freedom, effect mean square, F-statistic, and P-value; the denominator of all F-statistics in the table is MS_e^{A+B+AB}.

The row of an effect E shows how much this effect helps to explain the variability of the response on top of all effects on the rows above it. The P-value on row E gives the significance of the influence of the effect E while controlling (adjusting) for all effects shown on the rows above E.

The sum of the sums of squares shown in the table gives the total sum of squares SS_T.
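The sequence of models behind the Type I table can be made explicit by fitting the nested models and comparing them in one call. A sketch, assuming the CarsNow data from the course script are loaded; anova() for several nested lm fits takes the F denominators from the largest model, which matches MS_e^{A+B+AB}:

m00  <- lm(consumption ~ 1, data = CarsNow)
mA   <- update(m00, . ~ . + fdrive)
mAB  <- update(mA,  . ~ . + lweight)
mABI <- update(mAB, . ~ . + fdrive:lweight)
anova(m00, mA, mAB, mABI)   # should reproduce the Type I sums of squares above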
Type I (sequential) ANOVA table, order B + A + AB

  Effect     Effect sum of squares
  B          SS(B | 0)
  A          SS(A + B | B)
  AB         SS(A + B + AB | A + B)
  Residual   SS_e^{A+B+AB}  (degrees of freedom ν_e^{A+B+AB}, mean square MS_e^{A+B+AB})

Type I ANOVA table

Row E: the increase of the explained variability of the response due to the effect E on top of the effects shown on the rows above.

P-value on row E: significance of the influence of the effect E on the response while controlling (adjusting) for all effects shown on the rows above E.

The interpretation of the F-tests on rows B and A is different in this table compared to the previous table created in the order A + B + AB!
Different types of ANOVA tables

The Type I (sequential) ANOVA table is one of several types of ANOVA tables used in practice. The other types provide different sequences of F-tests with, in general, different interpretations.

Data Cars2004 (subset): consumption ∼ fdrive + lweight + fdrive:lweight

Type I ANOVA table

mInter1 <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow)
anova(mInter1)

Analysis of Variance Table
Response: consumption
                Df Sum Sq Mean Sq  F value    Pr(>F)
fdrive           2 519.89  259.94  293.935 < 2.2e-16 ***
lweight          1 954.26  954.26 1079.040 < 2.2e-16 ***
fdrive:lweight   2  26.70   13.35   15.097 4.758e-07 ***
Residuals      403 356.40    0.88

Type I ANOVA table, different order of rows

mInter2 <- lm(consumption ~ lweight + fdrive + fdrive:lweight, data = CarsNow)
anova(mInter2)

Analysis of Variance Table
Response: consumption
                Df  Sum Sq Mean Sq  F value    Pr(>F)
lweight          1 1421.57 1421.57 1607.458 < 2.2e-16 ***
fdrive           2   52.58   26.29   29.726 9.079e-13 ***
lweight:fdrive   2   26.70   13.35   15.097 4.758e-07 ***
Residuals      403  356.40    0.88
Type II ANOVA table

  Effect     Effect sum of squares
  A          SS(A + B | B)
  B          SS(A + B | A)
  AB         SS(A + B + AB | A + B)
  Residual   SS_e^{A+B+AB}  (degrees of freedom ν_e^{A+B+AB}, mean square MS_e^{A+B+AB})

The denominator of all F-statistics in the table is MS_e^{A+B+AB}.

Submodel on row E: the full model A + B + AB minus all effects containing E.
Model on row E: the submodel plus the effect E.

Row E: the increase of the explained variability due to the effect E on top of all other effects not containing E.

P-value on row E: significance of the influence of the effect E on the response while controlling (adjusting) for all other effects not containing E.

The order of the factors on the rows of the table does not play any role. For practical purposes, this is probably the most useful ANOVA table.
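The Type II row of a factor can be reproduced by hand; a sketch for the row of fdrive in the Cars2004 example (again assuming CarsNow is available), with the denominator taken from the full interaction model:

mB    <- lm(consumption ~ lweight, data = CarsNow)              # B only
mAB   <- lm(consumption ~ fdrive + lweight, data = CarsNow)     # A + B
mFull <- lm(consumption ~ fdrive * lweight, data = CarsNow)     # A + B + AB
SSdiff <- deviance(mB) - deviance(mAB)       # SS(A + B | B), the fdrive row
F0 <- (SSdiff / 2) / (deviance(mFull) / df.residual(mFull))
c(F = F0, p = 1 - pf(F0, 2, df.residual(mFull)))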
Data Cars2004 (subset): consumption ∼ fdrive + lweight + fdrive:lweight

Type II ANOVA table

library("car")
Anova(mInter1, type = "II")

Anova Table (Type II tests)
Response: consumption
               Sum Sq  Df  F value    Pr(>F)
fdrive          52.58   2   29.726 9.079e-13 ***
lweight        954.26   1 1079.040 < 2.2e-16 ***
fdrive:lweight  26.70   2   15.097 4.758e-07 ***
Residuals      356.40 403

Type II ANOVA table, different order of rows

Anova(mInter2, type = "II")

Anova Table (Type II tests)
Response: consumption
               Sum Sq  Df  F value    Pr(>F)
lweight        954.26   1 1079.040 < 2.2e-16 ***
fdrive          52.58   2   29.726 9.079e-13 ***
lweight:fdrive  26.70   2   15.097 4.758e-07 ***
Residuals      356.40 403

Type III ANOVA table

  Effect     Effect sum of squares
  A          SS(A + B + AB | B + AB)
  B          SS(A + B + AB | A + AB)
  AB         SS(A + B + AB | A + B)
  Residual   SS_e^{A+B+AB}  (degrees of freedom ν_e^{A+B+AB}, mean square MS_e^{A+B+AB})

The denominator of all F-statistics in the table is MS_e^{A+B+AB}.

Submodel on row E: the full model A + B + AB minus the coefficients corresponding to the effect E in a chosen parameterization (if there are some categorical covariates, …).
Model on row E: always the full model A + B + AB.
Data Cars2004 (subset): consumption ∼ fdrive + lweight + fdrive:lweight

Type III ANOVA table

Anova(mInter1, type = "III")

Anova Table (Type III tests)
Response: consumption
               Sum Sq  Df F value    Pr(>F)
(Intercept)    386.28   1 436.793 < 2.2e-16 ***
fdrive          26.49   2  14.979 5.310e-07 ***
lweight        542.30   1 613.216 < 2.2e-16 ***
fdrive:lweight  26.70   2  15.097 4.758e-07 ***
Residuals      356.40 403

Type III ANOVA table, different order of rows

Anova(mInter2, type = "III")

Anova Table (Type III tests)
Response: consumption
               Sum Sq  Df F value    Pr(>F)
(Intercept)    386.28   1 436.793 < 2.2e-16 ***
lweight        542.30   1 613.216 < 2.2e-16 ***
fdrive          26.49   2  14.979 5.310e-07 ***
lweight:fdrive  26.70   2  15.097 4.758e-07 ***
Residuals      356.40 403

The order of the factors on the rows of the table does not play any role.

Results for factors that appear also in higher-order terms (e.g., A and B in the interaction model) depend on the chosen parameterization (pseudocontrasts for a categorical covariate, basis for a polynomial, …)!
• The interpretation of the results also differs between parameterizations!

For practical purposes, the rows A and B are in most cases useless.
Data Cars2004 (subset): consumption ∼ fdrive + lweight + fdrive:lweight

Type III ANOVA table, contr.treatment for fdrive

mInter <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow)
Anova(mInter, type = "III")

Anova Table (Type III tests)
Response: consumption
               Sum Sq  Df F value    Pr(>F)
(Intercept)    386.28   1 436.793 < 2.2e-16 ***
fdrive          26.49   2  14.979 5.310e-07 ***
lweight        542.30   1 613.216 < 2.2e-16 ***
fdrive:lweight  26.70   2  15.097 4.758e-07 ***
Residuals      356.40 403

Type III ANOVA table, contr.SAS for fdrive

mInterSAS <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow,
                contrasts = list(fdrive = contr.SAS))
Anova(mInterSAS, type = "III")

Anova Table (Type III tests)
Response: consumption
               Sum Sq  Df F value    Pr(>F)
(Intercept)    247.68   1 280.063 < 2.2e-16 ***
fdrive          26.49   2  14.979 5.310e-07 ***
lweight        351.72   1 397.714 < 2.2e-16 ***
fdrive:lweight  26.70   2  15.097 4.758e-07 ***
Residuals      356.40 403

Interpretation of β(lweight) in the two parameterizations?

Model with contr.treatment for fdrive

summary(mInter)

                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        -52.8047     2.5266 -20.900  < 2e-16 ***
fdriverear          19.8445     5.1297   3.869 0.000128 ***
fdrive4x4          -12.5366     4.6506  -2.696 0.007319 **
lweight              8.5716     0.3461  24.763  < 2e-16 ***
fdriverear:lweight  -2.5890     0.6956  -3.722 0.000226 ***
fdrive4x4:lweight    1.7837     0.6240   2.858 0.004480 **

Residual standard error: 0.9404 on 403 degrees of freedom
Multiple R-squared: 0.8081, Adjusted R-squared: 0.8057
F-statistic: 339.4 on 5 and 403 DF, p-value: < 2.2e-16

Model with contr.SAS for fdrive

summary(mInterSAS)

                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         -65.3414     3.9045 -16.735  < 2e-16 ***
fdrivefront          12.5366     4.6506   2.696  0.00732 **
fdriverear           32.3811     5.9309   5.460 8.35e-08 ***
lweight              10.3553     0.5192  19.943  < 2e-16 ***
fdrivefront:lweight  -1.7837     0.6240  -2.858  0.00448 **
fdriverear:lweight   -4.3727     0.7961  -5.493 7.01e-08 ***

Residual standard error: 0.9404 on 403 degrees of freedom
Multiple R-squared: 0.8081, Adjusted R-squared: 0.8057
F-statistic: 339.4 on 5 and 403 DF, p-value: < 2.2e-16
Section VII.3
Linear constraints

Linear constraints

A submodel can also be obtained by constraining the regression coefficients of the linear model Y ∼ (Xβ, σ²I_n), rank(X_{n×k}) = r ≤ k < n.

We shall now consider only independent linear constraints which correspond to estimable parameters, that is, constraints of the type Lβ = θ₀, where

• L_{m×k}, m < r ≤ k, is a matrix with linearly independent rows such that the parameter θ = Lβ is estimable;
• θ₀ is a given real vector.

Such constraints determine a submodel, as will be shown immediately.

⟹ Couple (model – submodel):

  Model M:     Y ∼ (Xβ, σ²I_n),
  Submodel M₀: Y ∼ (Xβ, σ²I_n),  Lβ = θ₀.

⟹ What does the comparison of M and M₀ mean practically?
Submodel determined by linear constraints on the regression coefficients

Theorem VII.4 (Submodel determined by linear constraints on the regression coefficients)
Let Y ∼ (Xβ, σ²I_n), rank(X_{n×k}) = r ≤ k. Let L_{m×k}, 0 < m < r, be a matrix with linearly independent rows such that M(Lᵀ) ⊂ M(Xᵀ). Let θ₀ ∈ Rᵐ. A set of linear constraints

  Lβ = θ₀

then determines a submodel of dimension r₀ = r − m of the model Y ∼ (Xβ, σ²I_n). Further, the matrix

  L(XᵀX)⁻Lᵀ

does not depend on the choice of the pseudoinverse and is invertible.

Proof. See the blackboard.

Estimated model mean under linear constraints

Let us look for the LS estimator of μ = EY in a linear model Y ∼ (Xβ, σ²I_n) under the linear constraints Lβ = θ₀.

⟹ Let us look for Ŷ⁰ = Xb⁰ ∈ M(X) being closest to the given Y and satisfying Lb⁰ = θ₀.

Notation (to remind):
• b = (XᵀX)⁻XᵀY (any solution to the normal equations in the model without constraints),
• Ŷ = Xb (the estimated model mean in the model without constraints).
Least squares solution under linear constraints

Theorem VII.5 (Least squares solution under linear constraints)
Let Y ∼ (Xβ, σ²I_n), rank(X_{n×k}) = r ≤ k. Let L_{m×k}, 0 < m < r, be a matrix with linearly independent rows such that M(Lᵀ) ⊂ M(Xᵀ). Let θ₀ ∈ Rᵐ. Then Ŷ⁰ = Xb⁰ minimizes ‖Y − Xβ‖² given Lβ = θ₀ if and only if

  b⁰ = b − (XᵀX)⁻Lᵀ { L(XᵀX)⁻Lᵀ }⁻¹ (Lb − θ₀)

(b⁰ depends on the choice of the pseudoinverse (XᵀX)⁻).

Remark
Ŷ⁰ = Xb⁰ does not depend on the choice of the pseudoinverse (XᵀX)⁻.

Proof.
We are looking for Ŷ⁰ = Xb⁰ which minimizes ‖Y − Xβ‖² over β ∈ Rᵏ under the constraints Lβ = θ₀. Use the method of Lagrange multipliers. Let

  φ(β, λ) = ‖Y − Xβ‖² + 2λᵀ(Lβ − θ₀) = (Y − Xβ)ᵀ(Y − Xβ) + 2λᵀ(Lβ − θ₀)

(the factor 2 is only included to simplify subsequent expressions). Derivatives of φ:

  ∂φ/∂β (β, λ) = −2 Xᵀ(Y − Xβ) + 2 Lᵀλ,
  ∂φ/∂λ (β, λ) = 2 (Lβ − θ₀).
First, ∂φ/∂β (β, λ) = 0 if and only if

  XᵀXβ = XᵀY − Lᵀλ.

• Is this linear system consistent for any λ ∈ Rᵐ? Why?

Let b⁰(λ) be any solution to XᵀXβ = XᵀY − Lᵀλ. That is,

  b⁰(λ) = (XᵀX)⁻XᵀY − (XᵀX)⁻Lᵀλ = b − (XᵀX)⁻Lᵀλ,

which depends on the choice of (XᵀX)⁻.

Second, ∂φ/∂λ (β, λ) = 0 if and only if Lb⁰(λ) = θ₀:

  Lb − L(XᵀX)⁻Lᵀ λ = θ₀,
  L(XᵀX)⁻Lᵀ λ = Lb − θ₀,

where L(XᵀX)⁻Lᵀ is invertible, as we already know. That is,

  λ = { L(XᵀX)⁻Lᵀ }⁻¹ (Lb − θ₀).

Finally,

  b⁰ = b − (XᵀX)⁻Lᵀ { L(XᵀX)⁻Lᵀ }⁻¹ (Lb − θ₀),
  Ŷ⁰ = Xb⁰ = Ŷ − X(XᵀX)⁻Lᵀ { L(XᵀX)⁻Lᵀ }⁻¹ (Lb − θ₀).  ∎

Effect of imposing linear constraints

Theorem VII.6 (Effect of imposing linear constraints)
Let Y ∼ (Xβ, σ²I_n), rank(X_{n×k}) = r ≤ k. Let L_{m×k}, 0 < m < r, be a matrix with linearly independent rows such that M(Lᵀ) ⊂ M(Xᵀ). Let θ₀ ∈ Rᵐ. The random vector D = Ŷ − Ŷ⁰ = U⁰ − U then satisfies

1. D = X(XᵀX)⁻Lᵀ { L(XᵀX)⁻Lᵀ }⁻¹ (Lb − θ₀);
2. ‖D‖² = SS_e^0 − SS_e = (Lb − θ₀)ᵀ { L(XᵀX)⁻Lᵀ }⁻¹ (Lb − θ₀).

Proof. See the blackboard.
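A minimal sketch of Theorems VII.5 and VII.6 on simulated full-rank data (all names hypothetical), imposing the single constraint β₁ = β₂:

set.seed(2)
n <- 40
x1 <- rnorm(n); x2 <- rnorm(n)
X <- cbind(1, x1, x2)
y <- drop(X %*% c(1, 2, 2)) + rnorm(n)
b <- solve(crossprod(X), crossprod(X, y))        # unconstrained LS solution
L <- matrix(c(0, 1, -1), nrow = 1); theta0 <- 0  # constraint: beta1 - beta2 = 0
XtXinv <- solve(crossprod(X))
b0 <- b - XtXinv %*% t(L) %*% solve(L %*% XtXinv %*% t(L), L %*% b - theta0)
drop(L %*% b0)                                   # ~ 0: the constraint holds
sum((X %*% b - X %*% b0)^2)                      # = ||D||^2 = SS0e - SSe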
Discussion

Remember that

  ‖D‖² = SS_e^0 − SS_e = (Lb − θ₀)ᵀ { L(XᵀX)⁻Lᵀ }⁻¹ (Lb − θ₀)

is the numerator of the F-statistic for comparing the two models under normality. Further, r − r₀ = r − (r − m) = m. Hence the statistic F₀ of Theorem VII.1 takes the form

  F₀ = [ (Lb − θ₀)ᵀ { L(XᵀX)⁻Lᵀ }⁻¹ (Lb − θ₀) / m ] / [ SS_e / (n − r) ]
     = (1/m) (Lb − θ₀)ᵀ { MS_e · L(XᵀX)⁻Lᵀ }⁻¹ (Lb − θ₀).

Discussion, continued

The statistic F₀ is the same as the F-statistic for testing H₀: Lβ = θ₀ in a linear model Y ∼ (Xβ, σ²I_n) using Theorem VI.1 (7), which was based on the normality of θ̂ = Lb and its independence of MS_e.

Comparing a model and its submodel given by linear constraints on the regression coefficients is thus indeed the same as evaluating whether the data support those linear constraints.
Direct observation

Consider L = lᵀ, m = 1, θ₀ ∈ R, and denote θ̂ = lᵀb, where b is any solution to the normal equations in the model without constraints. We see that

  F₀ = (lᵀb − θ₀) { MS_e · lᵀ(XᵀX)⁻l }⁻¹ (lᵀb − θ₀)
     = ( (θ̂ − θ₀) / √( MS_e · lᵀ(XᵀX)⁻l ) )² = T²

(the correspondence between the F-statistic for submodel testing and the t-statistic from Theorem VI.1 (6d)).

Linear constraints in a full-rank linear model

Full-rank linear model (rank(X_{n×k}) = r = k):

1. b = β̂ = (XᵀX)⁻¹XᵀY;
2. β̂⁰ = β̂ − (XᵀX)⁻¹Lᵀ { L(XᵀX)⁻¹Lᵀ }⁻¹ (Lβ̂ − θ₀);
3. var(β̂⁰) = σ² [ (XᵀX)⁻¹ − (XᵀX)⁻¹Lᵀ { L(XᵀX)⁻¹Lᵀ }⁻¹ L(XᵀX)⁻¹ ];
   • Which estimator has the lower variance, β̂ or β̂⁰?
4. D = X(XᵀX)⁻¹Lᵀ { L(XᵀX)⁻¹Lᵀ }⁻¹ (Lβ̂ − θ₀);
5. ‖D‖² = (Lβ̂ − θ₀)ᵀ { L(XᵀX)⁻¹Lᵀ }⁻¹ (Lβ̂ − θ₀).
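Continuing the constrained-LS sketch above, the m = 1 case is easy to verify numerically: the submodel F-statistic equals the squared t-statistic for testing lᵀβ = θ₀:

MSe <- sum((y - drop(X %*% b))^2) / (n - ncol(X))
l <- c(0, 1, -1)
T0 <- (drop(l %*% b) - theta0) / sqrt(MSe * drop(t(l) %*% XtXinv %*% l))
T0^2   # equals F0 for the single constraint; compare Theorem VI.1 (6d)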
Section VII.4
Coefficient of determination

Coefficient of determination

The coefficient of determination R² was defined and some of its properties were discussed in Section II.8. To remind: R² for a linear model Y ∼ (Xβ, σ²I_n), where 1_n ∈ M(X), is defined (Definition II.12) as

  R² = SS_R / SS_T = 1 − SS_e / SS_T.

It was also shown that when rank(X) = r > 1, the F-statistic F₀ comparing the two models

  Model M:     Y ∼ (Xβ, σ²I_n),
  Submodel M₀: Y ∼ (1_n β₀, σ²I_n),

takes the form

  F₀ = R²/(1 − R²) · (n − r)/(r − 1).

Also derived in Section II.8: when 1_n ∈ M(X), then Σᵢ Yᵢ = Σᵢ Ŷᵢ, that is, also

  Ȳ = (1/n) Σᵢ Yᵢ = (1/n) Σᵢ Ŷᵢ.

Under M₀:  Ŷ⁰ = Ȳ 1_n,  SS_e^0 = ‖U⁰‖² = SS_T = Σᵢ (Yᵢ − Ȳ)².

So that (in the notation of this part):

  ‖D‖² = ‖Ŷ − Ŷ⁰‖² = Σᵢ (Ŷᵢ − Ȳ)² = SS_R.
Coefficient of determination and a sample correlation coefficient

Further (a general statement following from the projection considerations performed at the beginning of this part):

  D = Ŷ − Ŷ⁰ = U⁰ − U,    U ⊥ D.

Theorem VII.7 (Coefficient of determination and a sample correlation coefficient)
Let Y ∼ (Xβ, σ²I_n), rank(X_{n×k}) = r ≤ k, r > 1, 1_n ∈ M(X). The coefficient of determination R² then satisfies

  R² = r²_{Y,Ŷ},

where r_{Y,Ŷ} is the sample correlation coefficient between Y₁, …, Y_n and Ŷ₁, …, Ŷ_n.
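Theorem VII.7 is easy to verify numerically; a small sketch reusing the simulated data above:

m <- lm(y ~ x1 + x2)
all.equal(summary(m)$r.squared, cor(y, fitted(m))^2)   # TRUE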
Proof.

  r²_{Y,Ŷ} = { Σᵢ (Yᵢ − Ȳ)(Ŷᵢ − Ȳ) }² / [ Σᵢ (Yᵢ − Ȳ)² · Σᵢ (Ŷᵢ − Ȳ)² ]
           = { (Y − Ŷ⁰)ᵀ(Ŷ − Ŷ⁰) }² / ( ‖Y − Ŷ⁰‖² ‖Ŷ − Ŷ⁰‖² )
           = ( U⁰ᵀD )² / ( ‖U⁰‖² ‖D‖² )
           = { (D + U)ᵀD }² / ( ‖U⁰‖² ‖D‖² )
           = { ‖D‖² + UᵀD }² / ( ‖U⁰‖² ‖D‖² )
           = ‖D‖⁴ / ( ‖U⁰‖² ‖D‖² )        (since D ⊥ U)
           = ‖D‖² / ‖U⁰‖² = SS_R / SS_T = R².  ∎

Prediction ability of the model

The vector Ŷ can also be considered a prediction of the values of the random vector Y using the fitted model (Xβ, σ²I_n). A possible evaluation of the quality of the prediction: the (sample) correlation coefficient between Ŷ and Y. Theorem VII.7 then explains why the coefficient of determination is a possible measure of the prediction ability of the model.
Sample correlation coefficient

Suppose that the model matrix is

  X = ( 1  x₁ )
      ( ⋮   ⋮ )
      ( 1  x_n ),

and the data (y₁, x₁)ᵀ, …, (y_n, x_n)ᵀ correspond to a random sample (Y₁, X₁)ᵀ, …, (Y_n, X_n)ᵀ from a joint distribution of a generic random vector (Y, X)ᵀ. The underlying linear model is EYᵢ = β₀ + β₁xᵢ, i = 1, …, n.

Easy homework:

  R² = r²_{Y,X},

where r_{Y,X} is the sample correlation coefficient between Y and X.

Sample correlation coefficient, cont'd

Suppose that rank(X) = 2 and that normality of the response can be assumed. The overall F-test (Theorem II.15) of H₀: β₁ = 0 is then based on the statistic

  F₀ = R²/(1 − R²) · (n − 2)/(2 − 1) = r²_{Y,X}/(1 − r²_{Y,X}) · (n − 2),

which under H₀ follows the F_{1, n−2} distribution.

Let ρ_{Y,X} be the (theoretical) correlation coefficient between Y and X. When it can be assumed that (Y, X)ᵀ ∼ N₂, the null hypothesis H₀: ρ_{Y,X} = 0 is tested using the statistic

  T₀ = r_{Y,X} / √(1 − r²_{Y,X}) · √(n − 2),

which under H₀ follows the t_{n−2} distribution (that is, T₀² follows the F_{1, n−2} distribution).
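A sketch of this T₀²/F₀ correspondence on simulated bivariate data (hypothetical names): the p-value of the slope's t-test in the simple regression coincides with that of the correlation test:

set.seed(3)
xs <- rnorm(30)
ys <- 1 + 0.5 * xs + rnorm(30)
summary(lm(ys ~ xs))$coefficients["xs", "Pr(>|t|)"]
cor.test(ys, xs)$p.value   # the same p-value as for the slope's t-test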
Sample correlation coefficient, cont'd

The overall F-test (of H₀: β₁ = 0) in the normal linear model Yᵢ ∼ N(β₀ + β₁xᵢ, σ²), i = 1, …, n, is therefore (technically) the same as the test of independence in a bivariate normal distribution of (Yᵢ, Xᵢ)ᵀ, i = 1, …, n, based on the sample correlation coefficient.

A tricky question: Can we get two different P-values when examining the significance of the slope in the two simple regressions Y ∼ X and X ∼ Y by the classical t-test based on the estimated slope?

Sample coefficient of multiple correlation

Suppose that the model matrix is

  X = ( 1  x₁ᵀ )
      ( ⋮   ⋮  )
      ( 1  x_nᵀ ),

and the data (y₁, x₁ᵀ)ᵀ, …, (y_n, x_nᵀ)ᵀ correspond to a random sample (Y₁, X₁ᵀ)ᵀ, …, (Y_n, X_nᵀ)ᵀ from a joint distribution of a generic random vector (Y, Xᵀ)ᵀ.

Easy homework:

  R² = r²_{Y,X},

where r_{Y,X} is the sample coefficient of multiple correlation between Y and X.