STAT 371 Final Exam Summary — Statistics for Finance I

OLS and Rβ

• Log-log model: ln Y_t = β₁ + β₂ ln X_t; semi-log model: ln Y_t = β₁ + β₂X_t; linear model: Y_t = β₁ + β₂X_t
• Dimensions: β and β̂ are k × 1; Y, Ŷ, U, and Û are n × 1; X is n × k

Basic GLRM Framework:
• β̂_OLS = (XᵗX)⁻¹XᵗY
• Var[U] = σ_u²I
• Var[β̂] = σ̂_u²(XᵗX)⁻¹ = [RSS/(n − k)](XᵗX)⁻¹

Rβ Framework:
• H₀: Rβ = r, H₁: Rβ ≠ r, q := rank(R)
• β̂_R = β̂ + (XᵗX)⁻¹Rᵗ[R(XᵗX)⁻¹Rᵗ]⁻¹(r − Rβ̂)
• Û_RᵗÛ_R = yᵗy − β̂_Rᵗxᵗy
• Var[β̂_R] = σ̂_u²(I − AR)(xᵗx)⁻¹(I − AR)ᵗ, where A = (xᵗx)⁻¹Rᵗ[R(xᵗx)⁻¹Rᵗ]⁻¹
• We have the following equivalent statements:
  TSS = RSS + ESS
  YᵗY − nȲ² = ÛᵗÛ + (β̂ᵗXᵗY − nȲ²)
  yᵗy = ÛᵗÛ + β̂ᵗxᵗy

Key Statistics:
• R² = ESS/TSS = 1 − RSS/TSS, and R̄² = 1 − [RSS/(n − k)]/[TSS/(n − 1)] = 1 − (1 − R²)(n − 1)/(n − k)
• t = (β̂₁ − β₁)/sd(β̂₁) ∼ t(n − k); for q = 1, (r − Rβ̂)/√(σ̂²R(XᵗX)⁻¹Rᵗ) ∼ t(n − k)
• F_Statistic = [ESS/(k − 1)]/[RSS/(n − k)] ∼ F(k − 1, n − k) (ANOVA)
• For q = 1 we have t² = F; for q ≥ 1,
  F = [(Rβ̂ − r)ᵗ[R(XᵗX)⁻¹Rᵗ]⁻¹(Rβ̂ − r)/q] / [ÛᵗÛ/(n − k)] = [(RSS_R − RSS_UN)/q] / [RSS_UN/(n − k)] ∼ F(q, n − k)

Special Matrices:
• X(XᵗX)⁻¹Xᵗ = Proj_X
• M = I − Proj_X = I − X(XᵗX)⁻¹Xᵗ = Proj_Û, where M is idempotent and of rank n − k

(Central Limit Theorem) Suppose that we have X₁, ..., X_n i.i.d. r.v.s with mean μ and variance σ². Then X̄ is approximately N(μ, σ²/n) for large n; more generally, √n(θ̂ − θ) →_D N(0, V).

1 Model Selection and Specification

Problems with X:

1. Suppose that we have an incorrect functional form (p. 112).
   (a) Consequences?
       i. The estimators could be unbiased but inefficient.
       ii. The t and F tests are invalid.
   (b) Detection?
       i. The informal test is to simply plot the data.
       ii. The formal test is the Ramsey RESET test.
2. Suppose that we are underfitting.
   (a) Let the true model be Y_t = β₁ + β₂X_2t + β₃X_3t + u_t, but you omitted X_3t in the specification of your model. So you mistakenly specified Y_t = φ₁ + φ₂X_2t + v_t with v_t = β₃X_3t + u_t, and you get E[v_t] = β₃X_3t ≠ 0 and Var[v_t] = β₃²Var(X_3t) + σ_u² ≠ c for a constant c.
   (b) Consequences?
       i. For the least squares estimators: the OLS estimators are biased iff the excluded variable X_3t is correlated with the included variable X_2t (r₂₃ ≠ 0).
       ii. The t and F ratios are no longer valid.
   (c) Detection?
       i. An informal test is to add X_3t to your model and check the change in R̄². If it goes up, the variable is relevant.
       ii. Another informal test is to add X_3t to the model and check the changes in the new estimated coefficients. If there is a significant change, then we have a relevant variable.
       iii. The formal test is the Ramsey RESET test.
3. Suppose that we are overfitting.
   (a) Let the true model be Y_t = β₁ + β₂X_2t + u_t, but the mis-specified model be Y_t = θ₁ + θ₂X_2t + θ₃X_3t + v_t, where X_3t is an irrelevant variable.
   (b) Consequences?
       i. The least squares estimators of the mis-specified model are unbiased and consistent but no longer efficient.
       ii. The t and F ratios are no longer valid.
   (c) Detection?
       i. The informal tests are the same as above in the case of underfitting; however, R̄² and the estimated coefficients are not expected to change very much.
       ii. The more formal test is to test the restriction θ₃ = 0 using either the t test, the F test, or the t² = F statistic.

Ramsey RESET Test:
This is used to test for an incorrect functional form or for underfitting (a sketch follows the steps below).

1. Run OLS and obtain Ŷ_t; Ŷ_t will incorporate the true functional form or the underfitting (if any exists).
2. Take the unrestricted model Y_t = φ₀ + φ₁X_t + φ₂Ŷ_t² + φ₃Ŷ_t³ + ⋯ + φ_kŶ_t^k and use the hypotheses H₀: φ_j = 0 for all j ≥ 2, H₁: φ_j ≠ 0 for some j ≥ 2. Usually k = 3.
3. Compute F = [(RSS_R − RSS_UN)/q] / [RSS_UN/(n − k)] ∼ F(q, n − k) and reject or do not reject H₀. If we reject, then we have an incorrect functional form.
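A minimal numpy/scipy sketch of the RESET steps above, assuming a design matrix X whose first column is the intercept; the helper names (rss, reset_test) and the use of np.linalg.lstsq are illustrative assumptions, not the course's prescribed implementation.

```python
# Hypothetical sketch of the Ramsey RESET test described above.
import numpy as np
from scipy import stats

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

def reset_test(y, X, k_powers=3, alpha=0.05):
    """RESET: augment the model with powers 2..k_powers of the fitted values."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    powers = np.column_stack([y_hat**p for p in range(2, k_powers + 1)])
    X_un = np.hstack([X, powers])                  # unrestricted model
    rss_r, rss_un = rss(y, X), rss(y, X_un)
    q = k_powers - 1                               # number of restrictions
    k = X_un.shape[1]                              # parameters in unrestricted model
    F = ((rss_r - rss_un) / q) / (rss_un / (n - k))
    crit = stats.f.ppf(1 - alpha, q, n - k)
    return F, crit, F > crit    # rejecting H0 suggests a wrong functional form
```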
Errors in Y:
• We then have the equation Y_t = β₁ + β₂X_2t + (u_t + ξ_t), where we call ε_t := u_t + ξ_t the composite error.
• The least squares estimators in the model above remain unbiased but are no longer efficient (see proof in notes; may be on the final exam).

Errors in X:
• We have the equation Y = (X − V)β + U = Xβ + (U − Vβ) = Xβ + ε, where ε := U − Vβ.
• The β̂_OLS from above is going to be biased in small samples and inconsistent in large samples (see proof in notes; may be on the final exam).
• The problems here are:
  1. The X's are stochastic.
  2. E[Xᵗε] ≠ 0.
  3. Are the errors no longer white noise? (They still are; see proof in notes.)

Instrumental Variables:
• We need to find a matrix Z, of size n × l with l ≥ k, such that it satisfies certain properties. These are:
  – E[ZᵗU] = 0
  – E[ZᵗX] = Σ_ZX, with Σ_ZX invertible
• We premultiply the observed model by Zᵗ to get ZᵗY = ZᵗXβ + ZᵗU, and so
  β̂_IV = (ZᵗX)⁻¹ZᵗY (= (XᵗZZᵗX)⁻¹XᵗZZᵗY when ZᵗX is square and invertible).
  – If l = k, plim_{n→∞} β̂_IV = β + Σ_ZX⁻¹ · 0 = β (we need invertibility of Σ_ZX).
• Read the notes to understand the various properties and proofs.

Two-Stage Least Squares:
• If l > k, we do a procedure called two-stage least squares (2SLS), sketched numerically below:
  1. Regress X on Z and obtain a matrix of fitted values X̂ (project X onto Z). That is, X̂ = Z(ZᵗZ)⁻¹ZᵗX.
  2. Regress Y on X̂ and obtain β̂_2SLS = (X̂ᵗX̂)⁻¹X̂ᵗY = (XᵗProj_Z X)⁻¹XᵗProj_Z Y.
  3. We can show that β̂_2SLS = β̂_IV. To do this, insert (ZᵗZ)(ZᵗZ)⁻¹ in the equation for β̂_IV to get
     β̂_IV = (XᵗZ(ZᵗZ)⁻¹ZᵗX)⁻¹XᵗZ(ZᵗZ)⁻¹ZᵗY = (XᵗProj_Z X)⁻¹XᵗProj_Z Y = β̂_2SLS.
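A minimal sketch of the 2SLS procedure above, verifying the identity β̂_2SLS = (XᵗP_Z X)⁻¹XᵗP_Z Y numerically. The data-generating process, sample size, and all variable names are illustrative assumptions.

```python
# Sketch: 2SLS via projection onto the instruments, under an assumed DGP.
import numpy as np

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 2))                      # instruments
e = rng.normal(size=n)
x_endog = z @ np.array([1.0, -0.5]) + 0.8 * e    # regressor correlated with the error
X = np.column_stack([np.ones(n), x_endog])       # k = 2 regressors
Z = np.column_stack([np.ones(n), z])             # l = 3 > k instruments
Y = X @ np.array([2.0, 1.5]) + e

# Stage 1: project X onto Z.
P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)          # Proj_Z = Z(Z^t Z)^{-1} Z^t
X_hat = P_Z @ X
# Stage 2: regress Y on the fitted values X_hat.
beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ Y)
# Direct formula (X^t Proj_Z X)^{-1} X^t Proj_Z Y gives the same answer.
beta_direct = np.linalg.solve(X.T @ P_Z @ X, X.T @ P_Z @ Y)
assert np.allclose(beta_2sls, beta_direct)
print(beta_2sls)   # close to (2.0, 1.5), unlike plain OLS on the endogenous X
```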
3 Non-Spherical Disturbances

When we have serial correlation or heteroskedasticity in the error terms, we call these error terms non-spherical disturbances. This is when the covariance matrix of the errors is not σ²I: heteroskedasticity gives a non-constant diagonal, and serial correlation gives non-zero entries on the off-diagonal elements.

Sources of Heteroskedasticity:
(1) Nature of Y_t, (2) mis-specification, (3) transformations, (4) varying coefficients.

Mathematical Representations of σ_t²:
(1) σ_t² = σ²Z_t^h for some h ≠ 0; (2) σ_t² = α₀ + α₁Z_t; (3) σ_t = α₀ + α₁Z_t; (4) σ_t² = f(Z₁, Z₂, ..., Z_n).

Testing for Heteroskedasticity:

1. Park Test
   (a) Park specified σ_t² = σ²X_t^β e^{v_t} for the model Y_t = β₁ + β₂X_t + u_t.
   (b) From here, we linearize the above equation to get ln σ_t² = ln σ² + β ln X_t + v_t. Since û_t is observed and Var(û_t) = E[(û_t − 0)²] = E[û_t²], û_t² is a proxy for σ_t², and we use ln û_t² as a proxy for ln σ_t². Our new equation is then
       ln û_t² = ln σ² + β ln X_t + v_t,
       where we hope that v_t is white noise.
   (c) Test the hypothesis H₀: β = 0 using a t test and reject or do not reject the null hypothesis. If we reject, then we have heteroskedasticity.

2. White Test
   (a) Let Y_t = β₁ + β₂X_2t + β₃X_3t + u_t and regress Y on the X's to get a series of residuals û_t.
   (b) Run the auxiliary regression (stated in R formula notation) û_t² ∼ (X_2t + X_3t)² + X_2t² + X_3t², i.e. regress û_t² on a constant, X_2t, X_3t, the cross-product X_2t·X_3t, and the squares X_2t², X_3t².
   (c) Compute R² from the previous regression.
   (d) White showed that asymptotically, W = nR² ∼ χ²(k − 1), where k is the number of parameters in the auxiliary regression (here k = 6). If the test statistic is larger than the critical value at α = 5% with k − 1 degrees of freedom, then we have heteroskedasticity.

3. Of course we don't know which of the explanatory variables is causing this, but we have some remedies:
   (a) Test using the White procedure.
   (b) Narrow it down to a specific variable, which could be in the model or outside the model (one unknown variable Z):
       i. If it is coming from one of the X's, we can: try to replace it with a proxy, try to replace it with a combination of variables, drop it, or do some transformations.
       ii. If it is due to Z (outside of the model), then you could have underfitting; revise your specification and try to include that missing relevant variable.

4. What if you know the exact form of heteroskedasticity?
   (a) Use Generalized Least Squares.
       i. Example. Suppose that heteroskedasticity is due to X_2t and takes the following form: σ_t² = σ²X_2t^h with h = 2. How can we correct for this problem? We use the method of Weighted Least Squares, also known as Generalized Least Squares (GLS):
          A. We want to divide by the square root of whatever is causing the heteroskedasticity.
          B. So let's transform our model as follows:
             Y_t/√(X_2t²) = [β₁ + β₂X_2t + β₃X_3t + u_t]/√(X_2t²).
             We then get Var[u_t/√(X_2t²)] = (1/X_2t²)Var[u_t] = σ², and this new model is homoskedastic.

Serial Correlation:

1. Problem: Cov(u_t, u_s) ≠ 0 for t ≠ s.
2. Sources: pp. 162–164 (will be on the final exam).
3. Mathematical representation:
   (a) Let the true model be Y_t = β₁ + Σ_{i=2}^k β_iX_it + u_t such that E[u_t] = 0, Var(u_t) = σ², and Cov(u_s, u_t) ≠ 0.
   (b) We will only consider the AR(1) (first-order autoregressive) process given by
       (1) u_t = ρu_{t−1} + ξ_t
       with E[ξ_t] = 0, Var[ξ_t] = σ_ξ², Cov(ξ_t, ξ_s) = 0 for t ≠ s, and |ρ| < 1.
   (c) Remark that the conversion of this form into a general linear process through the use of forward recursion gives u_t = ξ_t + Σ_{j=1}^∞ ρ^j ξ_{t−j}. This implies that E[u_t] = 0, Var[u_t] = σ_ξ²/(1 − ρ²), and Cov(u_t, u_{t−s}) = ρ^s σ_ξ²/(1 − ρ²).
4. Test: Durbin–Watson (D-W) [applies only to AR(1)]:
   (a) The d-statistic is
       d = Σ_{t=2}^n (û_t − û_{t−1})² / Σ_{t=1}^n û_t² ≈ 2(1 − ρ̂), where ρ̂ = Σ_{t=2}^n û_tû_{t−1} / Σ_{t=2}^n û_{t−1}²,
       due to the fact that Σ û_{t−1}² ≈ Σ û_t².
   (b) Remark that ρ = −1 ⟹ d = 4; ρ = 1 ⟹ d = 0; ρ = 0 ⟹ d = 2.
   (c) According to Durbin and Watson, if d ∈ (d_L, d_U) the test is inconclusive, for d_L, d_U ∈ (0, 2), and similarly for the symmetric reflection across d = 2 (this other interval is (4 − d_U, 4 − d_L)). Otherwise we make conclusions based on the proximity of d to 0, 2, or 4. Using this, we have several tests related to this statistic:
       i. Test for autocorrelation (p. 169):
          A. H₀: ρ = 0, no autocorrelation; H₁: ρ ≠ 0, there exists autocorrelation.
          B. Calculate d ≈ 2 − 2ρ̂ and use the d table to get d_L and d_U; use α and df₁ = n, df₂ = k − 1.
          C. Reject, do not reject, or say the test is inconclusive.
5. Remedies: GLS (Aitken 1936):
   (a) Set-up: Y_t = β₁ + ⋯ + β_kX_kt + u_t, with u_t = ρu_{t−1} + ξ_t.
   (b) Apply D-W, and if autocorrelation exists, correct using:
       i. GLS if ρ is known:
          A. Set up the equation (1) Y_t − ρY_{t−1} = β₁(1 − ρ) + β₂(X_2t − ρX_{2,t−1}) + ⋯ + ξ_t, since (2) u_t = ρu_{t−1} + ξ_t where ξ_t is white noise.
       ii. Cochrane–Orcutt Iterative Procedure if ρ is not known (see the sketch below):
          A. Run OLS on (2) Y_t = β₁ + ⋯ + β_kX_kt + u_t and obtain a series of residuals û_t.
          B. Compute ρ̂₁ = Σ û_tû_{t−1} / Σ û_{t−1}².
          C. Use ρ̂₁ to correct for autocorrelation by applying GLS, obtaining the estimated version of (1).
          D. Apply D-W to (1).
          E. If H₀ is accepted, then stop; if H₀ is rejected, go back to (2) using Y_t − ρ̂Y_{t−1} as the new proxy for Y_t.
          F. Keep iterating until ρ̂_s ≈ ρ̂_{s−1} and H₀ is accepted.
       iii. Remark that the above iterative procedure does not converge very well if ρ ≈ 1 (the process approaches a random walk).
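A minimal sketch of the Cochrane–Orcutt iteration above, using the Durbin–Watson approximation d ≈ 2(1 − ρ̂) in place of a table lookup. The function names, convergence tolerance, and stopping rule (ρ̂_s ≈ ρ̂_{s−1}) are illustrative assumptions.

```python
# Sketch: Cochrane-Orcutt iteration with the D-W approximation.
import numpy as np

def ols_resid(y, X):
    """OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def cochrane_orcutt(y, X, tol=1e-6, max_iter=50):
    """Iterate: estimate rho from OLS residuals, quasi-difference, re-fit."""
    rho = 0.0
    for _ in range(max_iter):
        # Quasi-difference with the current rho (rho = 0 on the first pass,
        # which is just OLS on the original data minus the first observation).
        y_star = y[1:] - rho * y[:-1]
        X_star = X[1:] - rho * X[:-1]   # a ones column becomes (1 - rho)
        beta, resid = ols_resid(y_star, X_star)
        rho_new = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
        d = 2 * (1 - rho_new)           # Durbin-Watson approximation
        if abs(rho_new - rho) < tol:    # rho_s ≈ rho_{s-1}: stop iterating
            break
        rho = rho_new
    return beta, rho, d
```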
4 Maximum Likelihood Estimation

In MLE, we do the following:
1. Assume a distribution for Y.
2. Define the pdf of y_i as f_i(y_i|θ) for each i.
3. Find the joint pdf of the n realizations, assuming independence, with f(Y|θ) = Π_{i=1}^n f_i(y_i|θ).
4. Define the likelihood function L(θ|Y) = f(Y|θ) = Π_{i=1}^n f_i(y_i|θ).
5. Take the log of L as l(θ|Y) = log L(θ|Y).
6. Find θ through θ̂ = argmax_{θ∈Θ} l(θ|Y).

MLE and the GLRM:
• We define a few matrices:
  1. Score matrix: S(θ) = ∂l/∂θ, of size (k+1) × 1, set equal to 0_{(k+1)×1}.
  2. Hessian matrix: H(θ) = ∂²l/∂θ∂θᵗ, the (k+1) × (k+1) block matrix with blocks ∂²l/∂β∂βᵗ, ∂²l/∂β∂σ², ∂²l/∂σ²∂βᵗ, and ∂²l/∂(σ²)².
  3. Fisher information matrix: I(θ) = −E[H(θ)].
• Working in the GLRM framework (that is, Y = Xβ + U), we will assume that u_t ∼ N(0, σ²) for all t. The first-order conditions give us:
  1. β̂_ML = (XᵗX)⁻¹XᵗY = β̂_OLS
  2. σ̂²_ML = ÛᵗÛ/n
• In terms of unbiasedness:
  1. β̂_ML = β̂_OLS ⟹ the estimate is unbiased for β.
  2. σ̂²_ML ≠ σ̂²_OLS ⟹ σ̂²_ML is biased, with E[σ̂²_ML] = (n − k)σ²/n.
• In terms of efficiency:
  1. β̂_ML = β̂_OLS ⟹ Var[β̂_ML] = Var[β̂_OLS] = σ²(XᵗX)⁻¹, and so our estimate is efficient.
  2. Var(σ̂²_ML) = [(n − k)/n]·(2σ⁴/n), which does not attain the Cramér–Rao bound 2σ⁴/n, which means that it is inefficient and biased.
• In conclusion,
  1. In small samples, β̂_ML is unbiased and efficient, while σ̂²_ML is biased and inefficient.
  2. In large samples, it can be shown that both estimators are consistent and asymptotically normal (not shown in this course); that is, θ̂_ML is a CAN (consistent and asymptotically normal) estimator.
  3. We can also show that they achieve the Cramér–Rao lower bound (proof will be on the final).

Asymptotic Test using ML (LR test):

Here LR test refers to the likelihood ratio test. The procedure is as follows:
1. Start with the unrestricted model:
   (a) θ̂_ML = (β̂_ML, σ̂²_ML), where β̂_ML = (XᵗX)⁻¹XᵗY and σ̂²_ML = ÛᵗÛ/n with ÛᵗÛ = yᵗy − β̂ᵗxᵗy.
   (b) L(θ̂_ML|Y) = (2πσ̂²_ML)^{−n/2} e^{−n/2}.
2. Then do the same thing with the restricted model, where H₀: Rβ = r:
   (a) θ̂_R = (β̂_R, σ̂²_R), where β̂_R = β̂_ML + (…) (the restricted estimator from the Rβ framework) and σ̂²_R = Û_RᵗÛ_R/n.
   (b) L(θ̂_R|Y) = (2πσ̂²_R)^{−n/2} e^{−n/2}.
3. The likelihood ratio test uses the fact that
   LRT_Statistic = −2[ln L(θ̂_R) − ln L(θ̂_ML)] = −2 ln(L(θ̂_R)/L(θ̂_ML)) ∼ χ²(q),
   where H₀: Rβ = r, H₁: Rβ ≠ r, and LRT_Critical = χ²(α = 5%, q). If LRT_Stat > LRT_Crit, then reject H₀.
   (a) Remark that LRT_Statistic can also be rewritten as
       LRT_Statistic = −2 ln[(σ̂²_R/σ̂²_ML)^{−n/2}] = n ln(σ̂²_R/σ̂²_ML) = −2 ln(Λ).
(One computation related to the likelihood ratio will be on the final; a sketch of it follows.)
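A minimal sketch of the likelihood-ratio computation above, using LRT = n·ln(σ̂²_R/σ̂²_ML) ∼ χ²(q). The simulated data and the example restriction (dropping one regressor, i.e. H₀: β₂ = 0 with q = 1) are illustrative assumptions.

```python
# Sketch: LR test in the GLRM via restricted and unrestricted sigma^2_ML.
import numpy as np
from scipy import stats

def sigma2_ml(y, X):
    """sigma^2_ML = U'U / n from an OLS (= ML) fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return (resid @ resid) / len(y)

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

s2_un = sigma2_ml(y, X)           # unrestricted model
s2_r = sigma2_ml(y, X[:, [0]])    # restricted model: H0 drops the second regressor
lrt = n * np.log(s2_r / s2_un)
crit = stats.chi2.ppf(0.95, df=1) # chi^2(q) critical value at alpha = 5%
print(lrt, crit, lrt > crit)      # reject H0 when LRT exceeds the critical value
```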
5 Basic Sampling Concepts

In sampling, we care about 3 characteristics of the population:
1. Population total: t = Σ_{i=1}^N Y_i
2. Population mean: Ȳ = (1/N)Σ_{i=1}^N Y_i = t/N
3. Population proportion: p

5.1 Simple Random Sampling (SRS)

In SRS,
1. We use ȳ (the sample mean) to estimate Ȳ. That is, ȳ is an estimator for Ȳ. Here, ȳ = (1/n)Σ y_i, and it has the properties:
   (a) E[ȳ] = Ȳ
   (b) Var[ȳ] = (1 − f)S²/n, where S² is the true population variance and f = n/N is the sampling fraction. But S² is not known, so we use the sample variance s² = (1/(n − 1))Σ(y_i − ȳ)². Therefore, V̂ar[ȳ] = (1 − f)s²/n.
2. Let's examine how we use the sample to estimate the population total. We know that t = NȲ, and since ȳ is an estimator for Ȳ, we can use t̂ = Nȳ as our estimator for t. It has the following properties:
   (a) E[t̂] = t
   (b) Var(t̂) = N²Var(ȳ), estimated by V̂ar(t̂) = N²(1 − f)s²/n.
3. We skip the estimator, p̂, for p.

(Assignment 4, Question 7) We are given N = 6, a population set U_Index = {1, 2, 3, 4, 5, 6} with Y_i = {3, 4, 3, 4, 2, 2}, and samples of size n = 3.
a) The population mean is Ȳ = 3 and the population variance is S² = 0.8.
b) The possible number of SRS's is C(6, 3) = 20.
c) The probability of 1 SRS being drawn is 1 over the number of possible SRS's, that is, 1/20.
d) The probability distribution of the sample mean is found as follows: generate a list of all possible 3-element combinations from Y_i and the corresponding estimator values, then use this information to create the frequency distribution for the estimator. In this case, the mean has the following distribution (verified numerically in the sketch after 5.2):
   P(ȳ = 7/3) = 2/20, P(ȳ = 8/3) = 4/20, P(ȳ = 9/3) = 8/20, P(ȳ = 10/3) = 4/20, P(ȳ = 11/3) = 2/20,
   and so E[ȳ] = 3 = Ȳ, with Var(ȳ) = Σ (ȳ_i − E[ȳ])² P_i ≈ 0.133.

5.2 Stratified Sampling

(Assignment 4, Question 8) We are given U_index = {1, 2, 3, 4, 5, 6, 7, 8} with Y_i = {1, 2, 4, 8 | 4, 7, 7, 7}, where the first four units form the first stratum (N₁) and the last four form the second stratum (N₂). We want to take SRS's from the stratums:

a) SRS₁ of size n₁ = 2: the number of possible SRS₁'s is C(4, 2) = 6. We then have:

   Sample No. | y_i    | P(s_i) | ȳ   | t̂ = N₁ȳ
   1          | {1, 2} | 1/6    | 1.5 | 4 × 1.5 = 6
   2          | {1, 4} | 1/6    | 2.5 | 4 × 2.5 = 10
   3          | {1, 8} | 1/6    | 4.5 | 4 × 4.5 = 18
   4          | {2, 4} | 1/6    | 3   | 4 × 3 = 12
   5          | {2, 8} | 1/6    | 5   | 4 × 5 = 20
   6          | {4, 8} | 1/6    | 6   | 4 × 6 = 24
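A minimal sketch verifying the Assignment 4, Question 7 enumeration above: list all C(6, 3) = 20 samples, tabulate the distribution of ȳ, and check E[ȳ] = 3 and Var(ȳ) = (1 − n/N)·S²/n ≈ 0.133. Pure standard library; exact fractions are used so the distribution matches the table form above.

```python
# Sketch: enumerate every SRS of size 3 and build the distribution of ybar.
from itertools import combinations
from collections import Counter
from fractions import Fraction

Y = [3, 4, 3, 4, 2, 2]
N, n = len(Y), 3

samples = list(combinations(Y, n))                  # all 20 possible SRS's
dist = Counter(Fraction(sum(s), n) for s in samples)

mean = sum(yb * cnt for yb, cnt in dist.items()) / len(samples)
var = sum((yb - mean) ** 2 * cnt for yb, cnt in dist.items()) / len(samples)
print(dict(dist))        # counts: {7/3: 2, 8/3: 4, 3: 8, 10/3: 4, 11/3: 2}
print(mean, float(var))  # 3 and 0.1333... = (1 - 3/6) * 0.8 / 3
```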