Part VII: Submodel
doc. RNDr. Arnošt Komárek, Ph.D.
Dept. of Probability and Mathematical Statistics
NMSA407 Linear Regression, Winter term 2014–15

Aim: parsimonious expression of EY

The general aim of regression modelling is the simplest possible expression of the dependence of EY on a set of covariates (predictors), i.e., on a set

  z₁ = (z_{1,1}, …, z_{1,p})ᵀ, …, zₙ = (z_{n,1}, …, z_{n,p})ᵀ.

In the framework of a linear model, we are looking for the "simplest possible" matrix

  X_{n×k} = (x₁ᵀ; …; xₙᵀ)  with rows  xᵢᵀ = (f₀(zᵢ), …, f_{k−1}(zᵢ)),  i = 1, …, n,

such that EY = Xβ for some β ∈ Rᵏ. "Simplest possible" X means a regression space M(X) of the lowest possible dimension. ⇒ Notion of a submodel.

Section VII.1: Submodel

Definition VII.1 (Submodel; repetition of a definition from Section II.8)
Let Y ∼ (Xβ, σ²Iₙ) ≡ model M, rank(X_{n×k}) = r ≤ k < n (so that, among other things, EY ∈ M(X)). Let X⁰_{n×k₀} be a matrix of real constants such that

  M(X⁰) ⊂ M(X),  0 < rank(X⁰) = r₀ < r.

We say that Y follows a submodel of model M given by the matrix X⁰ if Y ∼ (X⁰β⁰, σ²Iₙ), that is, if among other things EY ∈ M(X⁰).

Notation: the situation that a model M⁰ is a submodel of model M will be denoted as M⁰ ⊂ M.

Projections (back to projection considerations)
Let M(X⁰_{n×k₀}) ⊂ M(X_{n×k}), 0 < rank(X⁰) = r₀ < r = rank(X) ≤ k < n. Let

  P = (Q⁰_{n×r₀}, Q¹_{n×(r−r₀)}, N_{n×(n−r)})

be a matrix whose columns form an orthonormal basis of the Euclidean space Rⁿ such that

  Q⁰: orthonormal basis of the submodel regression space M(X⁰) = M(Q⁰);
  Q = (Q⁰, Q¹): orthonormal basis of the model regression space M(X) = M(Q);
  N: orthonormal basis of the model residual space M(X)^⊥ = M(N);
  N⁰ = (Q¹, N): orthonormal basis of the submodel residual space M(X⁰)^⊥ = M(N⁰).

Notation for further submodel considerations
Let y ∈ Rⁿ be a real vector (typically the response vector Y). Then

  y = (QQᵀ + NNᵀ) y = Q⁰Q⁰ᵀ y + Q¹Q¹ᵀ y + NNᵀ y.

Grouping the first two terms gives ŷ = (Q⁰Q⁰ᵀ + Q¹Q¹ᵀ) y (the projection of y into M(X)) and u = NNᵀ y; grouping the last two terms gives

  ŷ⁰ = Q⁰Q⁰ᵀ y: projection of y into M(X⁰);
  u⁰ = (Q¹Q¹ᵀ + NNᵀ) y = y − ŷ⁰.

Let d = ŷ − ŷ⁰ = u⁰ − u = Q¹Q¹ᵀ y.

For the response Y:

  Ŷ⁰ = Q⁰Q⁰ᵀ Y: projection of Y into the regression space of the submodel;
  U⁰ = Y − Ŷ⁰ = (Q¹Q¹ᵀ + NNᵀ) Y: residuals of the submodel;
  SS⁰ₑ = ‖U⁰‖²: residual sum of squares of the submodel;
  ν⁰ₑ = n − r₀: submodel residual degrees of freedom;
  MS⁰ₑ = SS⁰ₑ / ν⁰ₑ: submodel residual mean square.

Theorem VII.1 (On a submodel)
Let the submodel hold, i.e., let Y ∼ (X⁰β⁰, σ²Iₙ). Then
  1. Ŷ⁰ is the best linear unbiased estimator (BLUE) of the vector parameter μ⁰ = X⁰β⁰.
  2. The submodel residual mean square MS⁰ₑ is an unbiased estimator of the residual variance σ².
  3. The statistics Ŷ⁰ and U⁰ are uncorrelated.
  4. The random vector D = Ŷ − Ŷ⁰ = U⁰ − U satisfies ‖D‖² = SS⁰ₑ − SSₑ.
  5. If, moreover, Y follows a normal distribution, then Ŷ⁰ and U⁰ are independent and

    F₀ = [(SS⁰ₑ − SSₑ)/(r − r₀)] / [SSₑ/(n − r)] = [(SS⁰ₑ − SSₑ)/(ν⁰ₑ − νₑ)] / [SSₑ/νₑ] ∼ F_{r−r₀, n−r} = F_{ν⁰ₑ−νₑ, νₑ}.

Proof. See the blackboard. ∎

Point (5) of Theorem VII.1 was stated earlier (without proof) as Theorem II.14.
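Point (5) can be verified numerically. The following is a minimal sketch, not part of the lecture: the data are simulated and all object names are ours. It builds the projections explicitly and matches F₀ and its p-value against R's built-in anova() comparison.

set.seed(407)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
Y  <- 1 + 2 * x1 + rnorm(n)                  # generated under the submodel

X0 <- cbind(1, x1)                           # submodel matrix X0
X  <- cbind(X0, x2)                          # model matrix X

H0 <- X0 %*% solve(crossprod(X0)) %*% t(X0)  # projection onto M(X0)
H  <- X  %*% solve(crossprod(X))  %*% t(X)   # projection onto M(X)

SS0e <- sum(((diag(n) - H0) %*% Y)^2)        # submodel residual SS
SSe  <- sum(((diag(n) - H)  %*% Y)^2)        # model residual SS

r0 <- qr(X0)$rank; r <- qr(X)$rank
F0 <- ((SS0e - SSe) / (r - r0)) / (SSe / (n - r))
pf(F0, r - r0, n - r, lower.tail = FALSE)    # p-value; compare with
anova(lm(Y ~ x1), lm(Y ~ x1 + x2))           # the built-in F-test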
Series of submodels

When looking for a parsimonious expression for EY we often consider a series of submodels. The most important properties will be shown in the case of two submodels of a model. In the following, let X⁰_{n×k₀}, X¹_{n×k₁}, X_{n×k}, with r₀ = rank(X⁰), r₁ = rank(X¹), r = rank(X), be matrices of real constants such that

  M(X⁰) ⊂ M(X¹) ⊂ M(X),  0 < r₀ < r₁ < r ≤ k < n.

Notation:
  Ŷ⁰, U⁰, SS⁰ₑ, ν⁰ₑ: quantities under the (sub)model M⁰: Y ∼ (X⁰β⁰, σ²Iₙ);
  Ŷ¹, U¹, SS¹ₑ, ν¹ₑ: quantities under the (sub)model M¹: Y ∼ (X¹β¹, σ²Iₙ);
  Ŷ, U, SSₑ, νₑ: quantities under the model M: Y ∼ (Xβ, σ²Iₙ).

Theorem VII.2 (On submodels)
Let Y ∼ Nₙ(Xβ, σ²Iₙ). If the submodel Y ∼ Nₙ(X⁰β⁰, σ²Iₙ) of the submodel Y ∼ Nₙ(X¹β¹, σ²Iₙ) holds, then

  F₀,₁ = [(SS⁰ₑ − SS¹ₑ)/(r₁ − r₀)] / [SSₑ/(n − r)] = [(SS⁰ₑ − SS¹ₑ)/(ν⁰ₑ − ν¹ₑ)] / [SSₑ/νₑ] ∼ F_{r₁−r₀, n−r} = F_{ν⁰ₑ−ν¹ₑ, νₑ}.

Proof. See the blackboard. ∎

Notation
When it is necessary to state explicitly which two models M₁ ⊂ M₂ are being compared, model M₁ having residual sum of squares SSₑ¹ and model M₂ having residual sum of squares SSₑ², the related difference in the sums of squares (the squared norm of the statistic D) will be denoted as SS(M₂ | M₁) or SS(2 | 1). That is,

  SS(M₂ | M₁) = SS(2 | 1) = SSₑ¹ − SSₑ².

F-test on a submodel using Theorem VII.1: M(X⁰) ⊂ M(X)

  M⁰: Y ∼ Nₙ(X⁰β⁰, σ²Iₙ)    H₀: EY ∈ M(X⁰)
  M:  Y ∼ Nₙ(Xβ, σ²Iₙ)      H₁: EY ∈ M(X)

Is model M significantly better than model M⁰? Does the regression space M(X) provide a significantly better expression for EY than the regression space M(X⁰)?

Test statistic:

  F₀ = [(SS⁰ₑ − SSₑ)/(r − r₀)] / [SSₑ/(n − r)] = [SS(M | M⁰)/(r − r₀)] / [SSₑ/(n − r)].

Reject H₀ if F₀ ≥ F_{r−r₀, n−r}(1 − α). P-value when F₀ = f₀: p = 1 − CDF_{F, r−r₀, n−r}(f₀).

F-test on submodels using Theorem VII.2: M(X⁰) ⊂ M(X¹) ⊂ M(X)

  M⁰: Y ∼ Nₙ(X⁰β⁰, σ²Iₙ)    H₀: EY ∈ M(X⁰)
  M¹: Y ∼ Nₙ(X¹β¹, σ²Iₙ)    H₁: EY ∈ M(X¹)

Is model M¹ significantly better than model M⁰? Does the regression space M(X¹) provide a significantly better expression for EY than the regression space M(X⁰)?

Test statistic:

  F₀,₁ = [(SS⁰ₑ − SS¹ₑ)/(r₁ − r₀)] / [SSₑ/(n − r)] = [SS(M¹ | M⁰)/(r₁ − r₀)] / [SSₑ/(n − r)].

Reject H₀ if F₀,₁ ≥ F_{r₁−r₀, n−r}(1 − α). P-value when F₀,₁ = f₀: p = 1 − CDF_{F, r₁−r₀, n−r}(f₀).
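As a sketch of how such a series is handled in R (hypothetical data frame dat with response y and covariates x1, x2; not from the lecture script): a whole nested series can be passed to a single anova() call, whose rows then correspond to the F₀,₁-type tests above, with the residual mean square of the largest supplied model as the common denominator.

m0 <- lm(y ~ 1,       data = dat)   # M0
m1 <- lm(y ~ x1,      data = dat)   # M1
m  <- lm(y ~ x1 + x2, data = dat)   # M
anova(m0, m1, m)   # row 2: M0 vs M1, row 3: M1 vs M; denominator: MS_e of M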
Section VII.2: Omitting some columns

The most common (model – submodel) couple:

  Model M:     Y ∼ (Xβ, σ²Iₙ),
  Submodel M⁰: Y ∼ (X⁰β⁰, σ²Iₙ),

where the submodel matrix X⁰ is obtained by omitting some columns from the model matrix X. ⇒ What does the comparison of M and M⁰ mean practically?

In the following, without loss of generality, let X = (X⁰, X¹), 0 < rank(X⁰) = r₀ < r = rank(X) < n.

Notation
Projection matrices onto the regression and residual spaces:
  H⁰ = X⁰(X⁰ᵀX⁰)⁻X⁰ᵀ,  M⁰ = Iₙ − H⁰;   H = X(XᵀX)⁻Xᵀ,  M = Iₙ − H.
Projections:  Ŷ⁰ = H⁰Y, U⁰ = M⁰Y;  Ŷ = HY, U = MY.
Sums of squares:  SS⁰ₑ = ‖U⁰‖²,  SSₑ = ‖U‖².
Effect of omitting the columns from the model matrix:  D = Ŷ − Ŷ⁰ = U⁰ − U,  ‖D‖² = SS⁰ₑ − SSₑ.

Theorem VII.3 (Effect of omitting the columns from the model matrix)
Let X_{n×k} = (X⁰, X¹), 0 < rank(X⁰) = r₀ < r = rank(X) < n. The random vector D = Ŷ − Ŷ⁰ = U⁰ − U then satisfies
  1. D = M⁰X¹ (X¹ᵀM⁰X¹)⁻ X¹ᵀ U⁰;
  2. ‖D‖² = SS⁰ₑ − SSₑ = U⁰ᵀ X¹ (X¹ᵀM⁰X¹)⁻ X¹ᵀ U⁰.

Proof. See the blackboard. ∎

Discussion
Remember that ‖D‖² equals SS⁰ₑ − SSₑ, which is the numerator of the F-statistic for the comparison of the two models under normality. Two extremes:

  1. M(X¹) ⊂ M(X⁰), i.e., all columns of X¹ lie in M(X⁰) (the matrix X¹ does not extend the regression space beyond M(X⁰)) ⇔ M⁰X¹ = 0, D = 0 ⇔ r₀ = r, which was not allowed by our assumptions.
  2. M(X¹) ⊥ M(X⁰) ⇒ M⁰X¹ = X¹ ⇒ X¹ᵀU⁰ = X¹ᵀY ⇒ D = X¹(X¹ᵀX¹)⁻X¹ᵀY (the estimated EY in the model Y ∼ (X¹β¹, σ²Iₙ)).

Data Cars2004 (subset)
Data on vehicles that were on the U.S. market in 2004; only non-hybrid cars with known consumption, weight and engine size; n = 409. Dependence of consumption (Y) on log(weight) (x₁) and drive position (front/rear/4×4) (x₂). Script LinRegr-07-01.R; follow-up of the analysis shown in LinRegr-05-02.R.

Model M: consumption ∼ fdrive + lweight + fdrive:lweight

mInter <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow)
summary(mInter)

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        -52.8047     2.5266 -20.900  < 2e-16 ***
fdriverear          19.8445     5.1297   3.869 0.000128 ***
fdrive4x4          -12.5366     4.6506  -2.696 0.007319 **
lweight              8.5716     0.3461  24.763  < 2e-16 ***
fdriverear:lweight  -2.5890     0.6956  -3.722 0.000226 ***
fdrive4x4:lweight    1.7837     0.6240   2.858 0.004480 **

Residual standard error: 0.9404 on 403 degrees of freedom
Multiple R-squared: 0.8081, Adjusted R-squared: 0.8057
F-statistic: 339.4 on 5 and 403 DF, p-value: < 2.2e-16

Model M⁰: consumption ∼ fdrive + lweight

mAddit <- lm(consumption ~ fdrive + lweight, data = CarsNow)
summary(mAddit)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -52.5605     1.9627 -26.780  < 2e-16 ***
fdriverear    0.6964     0.1181   5.897 7.83e-09 ***
fdrive4x4     0.8787     0.1363   6.445 3.29e-10 ***
lweight       8.5381     0.2688  31.762  < 2e-16 ***

Residual standard error: 0.9726 on 405 degrees of freedom
Multiple R-squared: 0.7937, Adjusted R-squared: 0.7922
F-statistic: 519.5 on 3 and 405 DF, p-value: < 2.2e-16

Model M versus model M⁰

anova(mAddit, mInter)

Analysis of Variance Table
Model 1: consumption ~ fdrive + lweight
Model 2: consumption ~ fdrive + lweight + fdrive:lweight
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1    405 383.1
2    403 356.4  2    26.702 15.097 4.758e-07 ***

⇒ F-test based on Theorem VII.1.

anova(mInter)

Analysis of Variance Table
Response: consumption
                Df Sum Sq Mean Sq  F value    Pr(>F)
fdrive           2 519.89  259.94  293.935 < 2.2e-16 ***
lweight          1 954.26  954.26 1079.040 < 2.2e-16 ***
fdrive:lweight   2  26.70   13.35   15.097 4.758e-07 ***
Residuals      403 356.40    0.88

⇒ The last F-test again compares models M and M⁰ using Theorem VII.1; the preceding rows are F-tests based on Theorem VII.2 (on submodels). What is the sequence of models being tested?
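Before moving on, Theorem VII.3 can also be checked numerically. A minimal sketch on simulated data (all names ours, not from the lecture script) computes D directly from formula (1) and compares ‖D‖² with SS⁰ₑ − SSₑ:

set.seed(407)
n  <- 50
X0 <- cbind(1, rnorm(n))                  # submodel columns
X1 <- cbind(rnorm(n))                     # omitted column(s)
Y  <- X0 %*% c(1, 2) + 0.5 * X1 + rnorm(n)

M0 <- diag(n) - X0 %*% solve(crossprod(X0)) %*% t(X0)  # I - H0
U0 <- M0 %*% Y                                          # submodel residuals
D  <- M0 %*% X1 %*% solve(t(X1) %*% M0 %*% X1) %*% t(X1) %*% U0

X  <- cbind(X0, X1)
H  <- X %*% solve(crossprod(X)) %*% t(X)
SSe  <- sum(((diag(n) - H) %*% Y)^2)
SS0e <- sum(U0^2)
c(sum(D^2), SS0e - SSe)                   # equal, as the theorem states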
Series of F-tests in model M

mInter1 <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow)
anova(mInter1)

This reproduces the sequential table of anova(mInter) shown above. Again a series of F-tests in model M, now with the terms entered in a different order:

mInter2 <- lm(consumption ~ lweight + fdrive + fdrive:lweight, data = CarsNow)
anova(mInter2)

Analysis of Variance Table
Response: consumption
                Df  Sum Sq Mean Sq  F value    Pr(>F)
lweight          1 1421.57 1421.57 1607.458 < 2.2e-16 ***
fdrive           2   52.58   26.29   29.726 9.079e-13 ***
lweight:fdrive   2   26.70   13.35   15.097 4.758e-07 ***
Residuals      403  356.40    0.88

Different F-statistics and p-values compared to the previous table. How does this come about? What is the sequence of models being tested? ⇒ Type I (sequential) F-tests.

Type I (sequential) ANOVA table, order A + B + AB

  Effect    Effect sum of squares
  A         SS(A | 0)
  B         SS(A + B | A)
  AB        SS(A + B + AB | A + B)
  Residual  SSₑ^{A+B+AB} on νₑ^{A+B+AB} degrees of freedom, MSₑ^{A+B+AB}

(0 denotes the intercept-only model; each effect row also carries its degrees of freedom, effect mean square, F-statistic and p-value.) The denominator of all F-statistics in the table is MSₑ^{A+B+AB}. The row of effect E shows how much this effect helps to explain the variability of the response on top of all effects on the rows above it. The p-value on row E gives the significance of the influence of effect E while controlling (adjusting) for all effects shown on the rows above E. The sum of the sums of squares shown in the table gives the total sum of squares SS_T.

Type I (sequential) ANOVA table, order B + A + AB

  Effect    Effect sum of squares
  B         SS(B | 0)
  A         SS(A + B | B)
  AB        SS(A + B + AB | A + B)
  Residual  SSₑ^{A+B+AB} on νₑ^{A+B+AB} degrees of freedom, MSₑ^{A+B+AB}

Row E: the increase of the explained variability of the response due to the effect E on top of the effects shown on the rows above. P-value on row E: significance of the influence of effect E on the response while controlling (adjusting) for all effects shown on the rows above E. The interpretation of the F-tests on rows B and A is different in this table compared to the table created in the order A + B + AB!

Different types of ANOVA tables
The Type I (sequential) ANOVA table, illustrated above by anova(mInter1) (order fdrive + lweight + fdrive:lweight) and anova(mInter2) (order lweight + fdrive + lweight:fdrive), is one of several types of ANOVA tables used in practice. Other ANOVA tables provide different sequences of F-tests with, in general, different interpretations.
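A sketch of what "sequence of models" means for the Type I table (hypothetical data frame dat with factors A, B and response y; not from the lecture script): the rows of anova() are exactly the comparisons along the nested series 1, A, A + B, A + B + AB.

m0   <- lm(y ~ 1,     data = dat)
mA   <- lm(y ~ A,     data = dat)
mAB  <- lm(y ~ A + B, data = dat)
mInt <- lm(y ~ A * B, data = dat)
anova(mInt)                  # Type I (sequential) table
anova(m0, mA, mAB, mInt)     # the same F-tests, row by row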
Type II ANOVA table

  Effect    Effect sum of squares
  A         SS(A + B | B)
  B         SS(A + B | A)
  AB        SS(A + B + AB | A + B)
  Residual  SSₑ^{A+B+AB} on νₑ^{A+B+AB} degrees of freedom, MSₑ^{A+B+AB}

Submodel on row E: the full model A + B + AB minus all effects containing E. Model on row E: this submodel plus the effect E. The denominator of all F-statistics in the table is MSₑ^{A+B+AB}. Row E: the increase of the explained variability due to the effect E on top of all other effects not containing E. P-value on row E: significance of the influence of effect E on the response while controlling (adjusting) for all other effects not containing E. The order of the factors on the rows of the table does not play any role. For practical purposes, this is probably the most useful ANOVA table.

Data Cars2004 (subset): Type II ANOVA table

library("car")
Anova(mInter1, type = "II")

Anova Table (Type II tests)
Response: consumption
               Sum Sq  Df  F value    Pr(>F)
fdrive          52.58   2   29.726 9.079e-13 ***
lweight        954.26   1 1079.040 < 2.2e-16 ***
fdrive:lweight  26.70   2   15.097 4.758e-07 ***
Residuals      356.40 403

Type II ANOVA table, different order of rows

Anova(mInter2, type = "II")

Anova Table (Type II tests)
Response: consumption
               Sum Sq  Df  F value    Pr(>F)
lweight        954.26   1 1079.040 < 2.2e-16 ***
fdrive          52.58   2   29.726 9.079e-13 ***
lweight:fdrive  26.70   2   15.097 4.758e-07 ***
Residuals      356.40 403
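The Type II rows can likewise be traced back to explicit model comparisons. A sketch for the Cars2004 data (the model mW below is ours, not from the lecture script); assuming, as above, that anova() uses the residual mean square of the largest supplied model as the common denominator, the first comparison row should reproduce the Type II test of fdrive.

mW <- lm(consumption ~ lweight, data = CarsNow)   # full model minus everything containing fdrive
anova(mW, mAddit, mInter)
# row 2: SS(fdrive + lweight | lweight) = 52.58, F = 29.726 (Type II row fdrive)
# row 3: the interaction row, F = 15.097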
Type III ANOVA table

  Effect    Effect sum of squares
  A         SS(A + B + AB | B + AB)
  B         SS(A + B + AB | A + AB)
  AB        SS(A + B + AB | A + B)
  Residual  SSₑ^{A+B+AB} on νₑ^{A+B+AB} degrees of freedom, MSₑ^{A+B+AB}

Model on row E: always the full model A + B + AB. Submodel on row E: the full model A + B + AB minus the coefficients corresponding to effect E in a chosen parameterization (if there are some categorical covariates, …). The denominator of all F-statistics in the table is MSₑ^{A+B+AB}. The order of the factors on the rows of the table does not play any role. However, the results for factors that also appear in higher-order terms (e.g., A and B in the interaction model) depend on the chosen parameterization (pseudocontrasts for a categorical covariate, basis for a polynomial, …)! The interpretation of the results also differs between parameterizations! For practical purposes, rows A and B are in most cases useless.

Data Cars2004 (subset): Type III ANOVA table

Anova(mInter1, type = "III")

Anova Table (Type III tests)
Response: consumption
                Sum Sq  Df F value    Pr(>F)
(Intercept)     386.28   1 436.793 < 2.2e-16 ***
fdrive           26.49   2  14.979 5.310e-07 ***
lweight         542.30   1 613.216 < 2.2e-16 ***
fdrive:lweight   26.70   2  15.097 4.758e-07 ***
Residuals       356.40 403

Type III ANOVA table, different order of rows

Anova(mInter2, type = "III")

Anova Table (Type III tests)
Response: consumption
                Sum Sq  Df F value    Pr(>F)
(Intercept)     386.28   1 436.793 < 2.2e-16 ***
lweight         542.30   1 613.216 < 2.2e-16 ***
fdrive           26.49   2  14.979 5.310e-07 ***
lweight:fdrive   26.70   2  15.097 4.758e-07 ***
Residuals       356.40 403

Type III ANOVA table, contr.treatment for fdrive
Anova(mInter, type = "III") gives the same table as Anova(mInter1, type = "III") above (mInter uses the default contr.treatment pseudocontrasts for fdrive).

Type III ANOVA table, contr.SAS for fdrive

mInterSAS <- lm(consumption ~ fdrive + lweight + fdrive:lweight,
                data = CarsNow, contrasts = list(fdrive = contr.SAS))
Anova(mInterSAS, type = "III")

Anova Table (Type III tests)
Response: consumption
                Sum Sq  Df F value    Pr(>F)
(Intercept)     247.68   1 280.063 < 2.2e-16 ***
fdrive           26.49   2  14.979 5.310e-07 ***
lweight         351.72   1 397.714 < 2.2e-16 ***
fdrive:lweight   26.70   2  15.097 4.758e-07 ***
Residuals       356.40 403

The rows for fdrive and for the interaction are unchanged, but the row for lweight differs between the two parameterizations. What is the interpretation of β(lweight) in the two parameterizations? Compare summary(mInter), shown earlier (β̂(lweight) = 8.5716), with the model using contr.SAS for fdrive:

summary(mInterSAS)

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         -65.3414     3.9045 -16.735  < 2e-16 ***
fdrivefront          12.5366     4.6506   2.696  0.00732 **
fdriverear           32.3811     5.9309   5.460 8.35e-08 ***
lweight              10.3553     0.5192  19.943  < 2e-16 ***
fdrivefront:lweight  -1.7837     0.6240  -2.858  0.00448 **
fdriverear:lweight   -4.3727     0.7961  -5.493 7.01e-08 ***

Residual standard error: 0.9404 on 403 degrees of freedom
Multiple R-squared: 0.8081, Adjusted R-squared: 0.8057
F-statistic: 339.4 on 5 and 403 DF, p-value: < 2.2e-16

Section VII.3: Linear constraints

A submodel can also be obtained by constraining the regression coefficients of the linear model Y ∼ (Xβ, σ²Iₙ), rank(X_{n×k}) = r ≤ k < n. We shall consider only independent linear constraints which correspond to estimable parameters, that is, constraints of the type Lβ = θ₀, where L_{m×k}, m < r ≤ k, is a matrix with linearly independent rows such that the parameter θ = Lβ is estimable, and θ₀ is a given real vector. Such constraints determine a submodel, as will be shown immediately.

The (model – submodel) couple:

  Model M:     Y ∼ (Xβ, σ²Iₙ),
  Submodel M⁰: Y ∼ (Xβ, σ²Iₙ), Lβ = θ₀.

⇒ What does the comparison of M and M⁰ mean practically?

Theorem VII.4 (Submodel determined by linear constraints on the regression coefficients)
Let Y ∼ (Xβ, σ²Iₙ), rank(X_{n×k}) = r ≤ k. Let L_{m×k}, 0 < m < r, be a matrix with linearly independent rows such that M(Lᵀ) ⊂ M(Xᵀ). Let θ₀ ∈ Rᵐ. A set of linear constraints Lβ = θ₀ then determines a submodel of dimension r₀ = r − m of the model Y ∼ (Xβ, σ²Iₙ). Further, the matrix L(XᵀX)⁻Lᵀ does not depend on the choice of the pseudoinverse and is invertible.

Proof. See the blackboard. ∎
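A constraint of the form Lβ = θ₀ can be tested in R without fitting the submodel explicitly, e.g., with linearHypothesis() from the car package already used above. A sketch for the Cars2004 interaction model (the particular constraint is our illustration): setting both interaction coefficients to zero should reproduce the M versus M⁰ F-test (F = 15.097) from Section VII.2.

library("car")
linearHypothesis(mInter, c("fdriverear:lweight = 0",
                           "fdrive4x4:lweight = 0"))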
Estimated model mean under linear constraints
Let us look for the LS estimator of μ = EY in the linear model Y ∼ (Xβ, σ²Iₙ) under the linear constraints Lβ = θ₀. That is, we look for Ŷ⁰ = Xb⁰ ∈ M(X) that is closest to the given Y and satisfies Lb⁰ = θ₀.

Notation (to remind):
  b = (XᵀX)⁻XᵀY (any solution to the normal equations in the model without constraints),
  Ŷ = Xb (estimated model mean in the model without constraints).

Theorem VII.5 (Least squares solution under linear constraints)
Let Y ∼ (Xβ, σ²Iₙ), rank(X_{n×k}) = r ≤ k. Let L_{m×k}, 0 < m < r, be a matrix with linearly independent rows such that M(Lᵀ) ⊂ M(Xᵀ). Let θ₀ ∈ Rᵐ. Then Ŷ⁰ = Xb⁰ minimizes ‖Y − Xβ‖² subject to Lβ = θ₀ if and only if

  b⁰ = b − (XᵀX)⁻Lᵀ {L(XᵀX)⁻Lᵀ}⁻¹ (Lb − θ₀)

(which depends on the choice of the pseudoinverse (XᵀX)⁻).

Remark. Ŷ⁰ = Xb⁰ does not depend on the choice of the pseudoinverse (XᵀX)⁻.

Proof.
We are looking for Ŷ⁰ = Xb⁰ which minimizes ‖Y − Xβ‖² over β ∈ Rᵏ under the constraints Lβ = θ₀. Use the method of Lagrange multipliers. Let

  φ(β, λ) = ‖Y − Xβ‖² + 2λᵀ(Lβ − θ₀) = (Y − Xβ)ᵀ(Y − Xβ) + 2λᵀ(Lβ − θ₀)

(the factor 2 is included only to simplify subsequent expressions). Derivatives of φ:

  ∂φ/∂β(β, λ) = −2Xᵀ(Y − Xβ) + 2Lᵀλ,
  ∂φ/∂λ(β, λ) = 2(Lβ − θ₀).

First, ∂φ/∂β(β, λ) = 0 if and only if XᵀXβ = XᵀY − Lᵀλ. (Is this linear system consistent for any λ ∈ Rᵐ? Why?) Let b⁰(λ) be any solution to XᵀXβ = XᵀY − Lᵀλ; that is,

  b⁰(λ) = (XᵀX)⁻XᵀY − (XᵀX)⁻Lᵀλ = b − (XᵀX)⁻Lᵀλ,

which depends on the choice of (XᵀX)⁻. Second, ∂φ/∂λ(β, λ) = 0 if and only if Lβ = θ₀. Hence

  Lb⁰(λ) = θ₀  ⇔  Lb − L(XᵀX)⁻Lᵀλ = θ₀  ⇔  L(XᵀX)⁻Lᵀ λ = Lb − θ₀,

where L(XᵀX)⁻Lᵀ is invertible, as we already know. That is,

  λ = {L(XᵀX)⁻Lᵀ}⁻¹ (Lb − θ₀).

Finally,

  b⁰ = b − (XᵀX)⁻Lᵀ {L(XᵀX)⁻Lᵀ}⁻¹ (Lb − θ₀),
  Ŷ⁰ = Xb⁰ = Ŷ − X(XᵀX)⁻Lᵀ {L(XᵀX)⁻Lᵀ}⁻¹ (Lb − θ₀). ∎

Theorem VII.6 (Effect of imposing linear constraints)
Let Y ∼ (Xβ, σ²Iₙ), rank(X_{n×k}) = r ≤ k. Let L_{m×k}, 0 < m < r, be a matrix with linearly independent rows such that M(Lᵀ) ⊂ M(Xᵀ). Let θ₀ ∈ Rᵐ. The random vector D = Ŷ − Ŷ⁰ = U⁰ − U then satisfies
  1. D = X(XᵀX)⁻Lᵀ {L(XᵀX)⁻Lᵀ}⁻¹ (Lb − θ₀);
  2. ‖D‖² = SS⁰ₑ − SSₑ = (Lb − θ₀)ᵀ {L(XᵀX)⁻Lᵀ}⁻¹ (Lb − θ₀).

Proof. See the blackboard. ∎
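A minimal numerical sketch of Theorems VII.5 and VII.6 for a full-rank model (simulated data, all names ours): the constrained solution b⁰ is computed from the explicit formula and the two expressions for ‖D‖² are compared.

set.seed(407)
n <- 60
X <- cbind(1, rnorm(n), rnorm(n))          # full-rank model matrix
Y <- X %*% c(1, 2, 2) + rnorm(n)           # data satisfying the constraint
L <- matrix(c(0, 1, -1), nrow = 1)         # constraint: beta_2 - beta_3 = 0
theta0 <- 0

XtXi <- solve(crossprod(X))
b    <- XtXi %*% crossprod(X, Y)           # unconstrained LS solution
A    <- solve(L %*% XtXi %*% t(L))         # {L (X'X)^{-1} L'}^{-1}
b0   <- b - XtXi %*% t(L) %*% A %*% (L %*% b - theta0)   # Theorem VII.5

D  <- X %*% (b - b0)                       # = Yhat - Yhat0
SS <- t(L %*% b - theta0) %*% A %*% (L %*% b - theta0)   # Theorem VII.6 (2)
c(sum(D^2), SS)                            # the two expressions agree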
Discussion
Remember that

  ‖D‖² = SS⁰ₑ − SSₑ = (Lb − θ₀)ᵀ {L(XᵀX)⁻Lᵀ}⁻¹ (Lb − θ₀)

is the numerator of the F-statistic for the comparison of the two models under normality. Further, r − r₀ = r − (r − m) = m. Hence the statistic F₀ according to Theorem VII.1 has the form

  F₀ = [(Lb − θ₀)ᵀ {L(XᵀX)⁻Lᵀ}⁻¹ (Lb − θ₀) / m] / [SSₑ/(n − r)]
     = (1/m) (Lb − θ₀)ᵀ {MSₑ L(XᵀX)⁻Lᵀ}⁻¹ (Lb − θ₀).

The statistic F₀ is the same as the F-statistic for testing H₀: Lβ = θ₀ in a linear model Y ∼ (Xβ, σ²Iₙ) using Theorem VI.1 (7), which was based on the normality of θ̂ = Lb and its independence of MSₑ. Comparing the model and its submodel given by linear constraints on the regression coefficients is indeed the same as evaluating whether the data support those linear constraints.

Direct observation
Consider L = lᵀ, m = 1, θ₀ ∈ R and denote θ̂ = lᵀb, where b is any solution to the normal equations in the model without constraints. We see that

  F₀ = (lᵀb − θ₀)ᵀ {MSₑ lᵀ(XᵀX)⁻l}⁻¹ (lᵀb − θ₀) = [ (θ̂ − θ₀) / √(MSₑ lᵀ(XᵀX)⁻l) ]² = T²

(the correspondence between the F-statistic for submodel testing and the t-statistic from Theorem VI.1 (6d)).

Linear constraints in a full-rank linear model
Full-rank linear model (rank(X_{n×k}) = r = k):
  1. b = β̂ = (XᵀX)⁻¹XᵀY;
  2. β̂⁰ = β̂ − (XᵀX)⁻¹Lᵀ {L(XᵀX)⁻¹Lᵀ}⁻¹ (Lβ̂ − θ₀);
  3. var(β̂⁰) = σ² [ (XᵀX)⁻¹ − (XᵀX)⁻¹Lᵀ {L(XᵀX)⁻¹Lᵀ}⁻¹ L(XᵀX)⁻¹ ];
  4. D = X(XᵀX)⁻¹Lᵀ {L(XᵀX)⁻¹Lᵀ}⁻¹ (Lβ̂ − θ₀);
  5. ‖D‖² = (Lβ̂ − θ₀)ᵀ {L(XᵀX)⁻¹Lᵀ}⁻¹ (Lβ̂ − θ₀).

Which estimator has a lower variance, β̂ or β̂⁰?

Section VII.4: Coefficient of determination

The coefficient of determination R² was defined and some of its properties were discussed in Section II.8. To remind, R² for a linear model Y ∼ (Xβ, σ²Iₙ) with 1ₙ ∈ M(X) is defined (Definition II.12) as

  R² = SS_R/SS_T = 1 − SSₑ/SS_T.

It was also shown that when rank(X) = r > 1, the F-statistic F₀ comparing the two models

  Model M: Y ∼ (Xβ, σ²Iₙ),  Submodel M⁰: Y ∼ (1ₙβ₀, σ²Iₙ),

takes the form

  F₀ = [R²/(1 − R²)] · [(n − r)/(r − 1)].

Also derived in Section II.8: when 1ₙ ∈ M(X), then Σᵢ Yᵢ = Σᵢ Ŷᵢ, that is, also

  Ȳ = (1/n) Σᵢ Yᵢ = (1/n) Σᵢ Ŷᵢ.

Under M⁰: Ŷ⁰ = Ȳ1ₙ, SS⁰ₑ = ‖U⁰‖² = SS_T = Σᵢ (Yᵢ − Ȳ)². So that (in the notation of this part):

  ‖D‖² = ‖Ŷ − Ŷ⁰‖² = Σᵢ (Ŷᵢ − Ȳ)² = SS_R.

Further (a general statement following from the projection considerations performed at the beginning of this part):

  D = Ŷ − Ŷ⁰ = U⁰ − U,  U ⊥ D.

Theorem VII.7 (Coefficient of determination and a sample correlation coefficient)
Let Y ∼ (Xβ, σ²Iₙ), rank(X_{n×k}) = r ≤ k, r > 1, 1ₙ ∈ M(X). The coefficient of determination R² then satisfies

  R² = r²_{Y,Ŷ},

where r_{Y,Ŷ} is the sample correlation coefficient between Y₁, …, Yₙ and Ŷ₁, …, Ŷₙ.

Proof.

  r²_{Y,Ŷ} = {Σᵢ (Yᵢ − Ȳ)(Ŷᵢ − Ȳ)}² / [ Σᵢ (Yᵢ − Ȳ)² · Σᵢ (Ŷᵢ − Ȳ)² ]
           = {(Y − Ŷ⁰)ᵀ(Ŷ − Ŷ⁰)}² / [ ‖Y − Ŷ⁰‖² ‖Ŷ − Ŷ⁰‖² ]
           = (U⁰ᵀD)² / ( ‖U⁰‖² ‖D‖² )
           = {(D + U)ᵀD}² / ( ‖U⁰‖² ‖D‖² )    (since U⁰ = D + U)
           = ‖D‖⁴ / ( ‖U⁰‖² ‖D‖² )            (since D ⊥ U)
           = ‖D‖² / ‖U⁰‖² = SS_R/SS_T = R². ∎

Prediction ability of the model
The vector Ŷ can also be considered as a prediction of the values of the random vector Y using the fitted model (Xβ, σ²Iₙ). A possible evaluation of the quality of the prediction: the (sample) correlation coefficient between Ŷ and Y. Theorem VII.7 then explains why the coefficient of determination is a possible measure of the prediction ability of the model.
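Both identities can be illustrated numerically; a short sketch on simulated data (all names ours) checks that F₀ = [R²/(1 − R²)]·[(n − r)/(r − 1)] equals the overall F-statistic reported by summary() and that R² = r²_{Y,Ŷ} (Theorem VII.7):

set.seed(407)
n <- 80
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)
R2 <- summary(fit)$r.squared
r  <- fit$rank                                   # r = 3 here
c(R2 / (1 - R2) * (n - r) / (r - 1),
  summary(fit)$fstatistic["value"])              # identical
c(R2, cor(y, fitted(fit))^2)                     # identical (Theorem VII.7)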
Sample correlation coefficient
Suppose that the model matrix is

  X = (1 x₁; …; 1 xₙ)  (one numeric covariate),

and the data (y₁, x₁)ᵀ, …, (yₙ, xₙ)ᵀ correspond to a random sample (Y₁, X₁)ᵀ, …, (Yₙ, Xₙ)ᵀ from the joint distribution of a generic random vector (Y, X)ᵀ. Easy homework: R² = r²_{Y,X}, where r_{Y,X} is the sample correlation coefficient between Y and X.

The underlying linear model is EYᵢ = β₀ + β₁xᵢ, i = 1, …, n. Suppose that rank(X) = 2 and that normality of the response can be assumed. The overall F-test (Theorem II.15) of H₀: β₁ = 0 is then based on the statistic

  F₀ = [R²/(1 − R²)] · [(n − 2)/(2 − 1)] = [r²_{Y,X}/(1 − r²_{Y,X})] · (n − 2),

which under H₀ follows the F_{1,n−2} distribution.

Let ρ_{Y,X} be the (theoretical) correlation coefficient between Y and X. When it can be assumed that (Y, X)ᵀ ∼ N₂, the null hypothesis H₀: ρ_{Y,X} = 0 is tested using the statistic

  T₀ = [r_{Y,X} / √(1 − r²_{Y,X})] · √(n − 2),

which under H₀ follows the t_{n−2} distribution (that is, T₀² follows the F_{1,n−2} distribution).

The overall F-test (of H₀: β₁ = 0) in the normal linear model Yᵢ ∼ N(β₀ + β₁xᵢ, σ²), i = 1, …, n, is therefore (technically) the same as the test of independence in a bivariate normal distribution of (Yᵢ, Xᵢ)ᵀ, i = 1, …, n, based on the sample correlation coefficient.

A tricky question: can we get two different P-values when examining the significance of the slope in the two simple regressions Y ∼ X and X ∼ Y by the classical t-test based on the estimated slope?

Sample coefficient of multiple correlation
Suppose that the model matrix is

  X = (1 x₁ᵀ; …; 1 xₙᵀ)  (several covariates),

and the data (y₁, x₁ᵀ)ᵀ, …, (yₙ, xₙᵀ)ᵀ correspond to a random sample (Y₁, X₁ᵀ)ᵀ, …, (Yₙ, Xₙᵀ)ᵀ from the joint distribution of a generic random vector (Y, Xᵀ)ᵀ. Easy homework: R² = r²_{Y,X}, where r_{Y,X} is the sample coefficient of multiple correlation between Y and X.
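A sketch of the correspondence in simple regression (simulated data, all names ours): the squared t-statistic of the correlation test equals the overall F-statistic, and both can be computed from r_{Y,X}.

set.seed(407)
n <- 40
x <- rnorm(n); y <- 2 + 0.6 * x + rnorm(n)
fit <- lm(y ~ x)
r  <- cor(y, x)
F0 <- r^2 / (1 - r^2) * (n - 2)          # overall F via r_{Y,X}
c(F0,
  summary(fit)$fstatistic["value"],      # overall F-test of beta1 = 0
  cor.test(x, y)$statistic^2)            # T0^2 from the correlation test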