Contents

7  Random Samples, Statistics and Sampling Distributions
   7.1  Sampling Distributions
   7.2  Sampling Distributions Related to the Normal Distribution
   7.3  Central Limit Theorem

7 Random Samples, Statistics and Sampling Distributions

7.1 Sampling Distributions

Definition and Notation
If n random variables are independent and identically distributed, we refer to them as a random sample and call them i.i.d. rvs (random variables). A statistic is any real-valued function of the observable random variables in a sample. For example, the sample mean $\bar{X} = n^{-1}\sum_{i=1}^{n} X_i$ is a statistic because it is a function only of the observed values $X_1, \ldots, X_n$.

One of the goals of statistical theory is to estimate unknown population parameters by statistics; e.g., $\mu$ can be estimated by $\bar{X}$. The probability distribution of a statistic (such as $\bar{X}$) is called the sampling distribution of that statistic. Sampling distributions can be used to express the uncertainty of the estimators.

How do we find the sampling distributions of statistics? Three methods (review):

1. Method of Distribution Functions (Review)
Procedure:
1. Integrate over the region in the y-space that corresponds to the event $\{U \le u\}$ to obtain the CDF of $U$: $F_U(u) = P(U \le u)$.
2. Differentiate $F_U(u)$ with respect to $u$ to obtain the pdf of $U$: $f_U(u)$.

2. Method of Transformations (Review)
Let $Y$ be a rv with pdf $f_Y(y)$, and let $h(Y)$ be a (strictly) increasing or decreasing function of $Y$ over the range of $Y$. Then the pdf of $U = h(Y)$ is
$$ f_U(u) = f_Y\big(h^{-1}(u)\big)\left|\frac{dy}{du}\right|, \qquad \text{where } y = h^{-1}(u). $$

3. Method of Moment Generating Functions (Review)
The MGF of a random variable $X$ is defined as
$$ m(t) = E\!\left(e^{tX}\right) = \begin{cases} \sum_{\text{all } x} e^{tx} P(X = x) & \text{(discrete)} \\ \int e^{tx} f_X(x)\,dx & \text{(continuous).} \end{cases} $$
Note that the MGF is a function of $t$, a real number.

Why should we care about the MGF?
• To calculate moments. It may be easier to work with the MGF than to calculate $E(X^r)$ directly from the definition:
$$ E(X^r) = \left.\frac{\partial^r m(t)}{\partial t^r}\right|_{t=0} = m^{(r)}(t)\big|_{t=0}. $$
• To determine distributions of functions of random variables, because the MGF uniquely determines the distribution.

Uniqueness Theorem: Suppose rvs $X$ and $Y$ have mgfs $m_X(t)$ and $m_Y(t)$. If $m_X(t) = m_Y(t)$ for all $t$, then $X$ and $Y$ have the same distribution.

Procedure to determine the distribution of $U = h(Y)$:
1. Derive the mgf of $U = h(Y)$.
2. Compare the mgf to those of known distributions to identify the distribution of $U = h(Y)$.

MGFs can be used to determine distributions of linear combinations of independent rvs: let $Y_1, Y_2, \ldots, Y_n$ be independent rvs with mgfs $m_{Y_1}(t), m_{Y_2}(t), \ldots, m_{Y_n}(t)$, and let $U = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n$, where the $a_i$ are known constants, $i = 1, \ldots, n$. Then
$$ m_U(t) = m_{Y_1}(a_1 t) \times m_{Y_2}(a_2 t) \times \cdots \times m_{Y_n}(a_n t). $$

Example 7.1.1 Suppose that $Y_1, \ldots, Y_5$ is a random sample from the uniform(0, 1) distribution.
1. Let $U = \max(Y_1, Y_2, \ldots, Y_5)$. Find the pdf of $U$.
2. Let $U = 1 + 2Y_1$. Find the pdf of $U$.

Example 7.1.2 Suppose that $X \sim N(2, 1)$ and $Y \sim N(1, 4)$ are independent. Use the method of moment generating functions to find the distribution of $U = X - 2Y$.
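As a quick numerical companion to Example 7.1.2 (not part of the original slides), the short simulation below draws $X$ and $Y$ and compares the empirical mean and variance of $U = X - 2Y$ with the values the MGF argument should give, namely $2 - 2(1) = 0$ and $1 + 4(4) = 17$. The use of numpy here is an added illustration, not the method of the slides.

```python
# A minimal simulation check of Example 7.1.2 (not from the slides):
# if X ~ N(2, 1) and Y ~ N(1, 4) are independent, U = X - 2Y should be N(0, 17).
import numpy as np

rng = np.random.default_rng(seed=1)
n_sim = 200_000

X = rng.normal(loc=2, scale=1, size=n_sim)   # N(2, 1)
Y = rng.normal(loc=1, scale=2, size=n_sim)   # N(1, 4): standard deviation 2
U = X - 2 * Y

print("empirical mean:", U.mean())   # should be close to 2 - 2*1 = 0
print("empirical var :", U.var())    # should be close to 1 + 4*4 = 17
```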
Example 7.1.3 Suppose that $X_1 \sim \text{Poisson}(\lambda_1 = 5)$ and $X_2 \sim \text{Poisson}(\lambda_2 = 10)$ are independent. Use the method of moment generating functions to find the distribution of $X_1 + X_2$ (and specify the value of its parameter). Hint: the MGF of Poisson($\lambda$) is $m(t) = \exp\{\lambda(e^t - 1)\}$.

7.2 Sampling Distributions Related to the Normal Distribution

Sampling Distribution of Linear Combinations of Normal rvs
Recall Theorem 6.3: let $X_1, \ldots, X_n$ be independent rvs with $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, \ldots, n$, and let $a_1, \ldots, a_n$ be $n$ known constants. Then the linear combination is also normally distributed:
$$ U = a_1 X_1 + \cdots + a_n X_n \sim N(\mu_U, \sigma_U^2), $$
where
$$ \mu_U = a_1 \mu_1 + \cdots + a_n \mu_n, \qquad \sigma_U^2 = a_1^2 \sigma_1^2 + \cdots + a_n^2 \sigma_n^2. $$
Page 321 of the text has the proof using the method of moment generating functions. Example 7.1.2 on the slides is a special case.

Sampling Distribution of the Sample Mean of Normals
Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Apply Theorem 6.3 to determine the distribution of the sample mean $\bar{X} = (X_1 + \cdots + X_n)/n$.
Answer: $\bar{X} \sim N(\mu, \sigma^2/n)$.

Example 7.2.1 Math SAT scores of entering students at NCSU follow the normal distribution with mean 575 and standard deviation 40. A random sample of 25 students is selected. Find the probability that the sample mean of the 25 SAT scores is less than 585.

Example 7.2.2 In Example 7.2.1, how many observations should be included in the sample if we wish the sample mean to differ from the population mean by no more than 10 with probability 0.95?

Sampling Distribution of Other Linear Combinations of Normals
Note: linear combinations of normal random variables are still normal. Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then
• $U_1 = (X_1 + \cdots + X_n)/n \sim N(\mu, \sigma^2/n)$
• $U_2 = X_1 + \cdots + X_n \sim N(n\mu, n\sigma^2)$
• $U_3 = X_1 - X_2 \sim N(0, 2\sigma^2)$
• $U_4 = \dfrac{X_1 - X_2}{\sqrt{2\sigma^2}} \sim N(0, 1)$
• $U_5 = X_1 + X_2 + X_3 - 3X_4 \sim N(0, 12\sigma^2)$

Example 7.2.3 Suppose that random variables $Y_1, Y_2$ and $Y_3$ are a random sample from the normal distribution with $\mu = 0$ and $\sigma^2 = 1$. State the distribution, with associated parameter values, of each of the following functions of $Y_1, \ldots, Y_3$.
1. $U_1 = (Y_1 + 2Y_2)/3 - Y_3$
2. $U_2 = Y_1 + Y_2 + Y_3$

Sampling Distribution of the Chi-square Statistic
The sum of squares of $v$ independent standard normal rvs is a chi-square rv with $v$ degrees of freedom. That is, if $Z_1, \ldots, Z_v$ are i.i.d. $N(0, 1)$ random variables, then
$$ U = Z_1^2 + Z_2^2 + \cdots + Z_v^2 $$
follows $\chi^2_v$, the chi-square distribution with $v$ degrees of freedom. See Theorem 6.4 in the text.
Recall: Example 6.1.3 of ST421 ($v = 1$, proven by the method of distribution functions) or Example 6.11 in the text (proven using the method of moment generating functions).
Recall: the chi-square distribution with $v$ degrees of freedom is the same as the Gamma distribution with $\alpha = v/2$ and $\beta = 2$. That is, $\chi^2_v \sim \text{Gamma}(\alpha = v/2, \beta = 2)$.
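Examples 7.2.1 and 7.2.2 above are table-lookup exercises; a minimal scipy.stats sketch (not part of the slides) shows how the same quantities can be computed in software, assuming the $N(575, 40^2)$ model stated above.

```python
# Possible numerical answers to Examples 7.2.1 and 7.2.2 (not from the slides).
from math import ceil, sqrt
from scipy.stats import norm

mu, sigma = 575, 40

# Example 7.2.1: Xbar of n = 25 scores is N(mu, sigma^2 / n), so
# P(Xbar < 585) uses z = (585 - 575) / (40 / 5) = 1.25, about 0.8944.
n = 25
print("P(Xbar < 585) =", round(norm.cdf(585, loc=mu, scale=sigma / sqrt(n)), 4))

# Example 7.2.2: smallest n with P(|Xbar - mu| <= 10) = 0.95,
# i.e. z_{0.025} * sigma / sqrt(n) <= 10.
z = norm.ppf(0.975)                       # about 1.96
print("required n =", ceil((z * sigma / 10) ** 2))   # (1.96*40/10)^2 = 61.5 -> 62
```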
Sampling Distribution of the Sample Variance
Let $X_1, \ldots, X_n$ be i.i.d. $N(\mu, \sigma^2)$ rvs, and define the sample mean
$$ \bar{X} = \frac{1}{n}(X_1 + \cdots + X_n) $$
and the sample variance
$$ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2. $$
Then
$$ \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}, $$
i.e. it follows the chi-square distribution with $v = n - 1$ degrees of freedom. That is,
$$ \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1}. $$

Proof (for n = 2):
• First show that
$$ \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{2} (X_i - \bar{X})^2}{\sigma^2} = \left(\frac{X_1 - X_2}{\sqrt{2\sigma^2}}\right)^2. $$
• What is the distribution of $\left(\dfrac{X_1 - X_2}{\sqrt{2\sigma^2}}\right)^2$?

Properties of the chi-square distribution
The percentage points of the chi-square distribution are tabulated in Table 6 of Appendix 3. Suppose $X$ is a rv following the chi-square distribution with $v$ degrees of freedom. For a given $\alpha$, the table gives the value $\chi^2_{v,\alpha}$ that solves
$$ P(X \ge \chi^2_{v,\alpha}) = \alpha. $$
The value $\chi^2_{v,\alpha}$ is the upper $\alpha$ percentage point, and the $(1-\alpha)$ quantile, of the chi-square distribution with $v$ degrees of freedom.

Example 7.2.4 Math SAT scores of entering students at NCSU follow the normal distribution with mean 575 and standard deviation 40. A random sample of 25 students is selected. Find the probability that the sample variance of the 25 Math SAT scores is less than 2200.

Example 7.2.5 Suppose that random variables $Y_1, Y_2$ and $Y_3$ are a random sample from the normal distribution with $\mu = 1$ and $\sigma^2 = 4$. Find the distribution of $U = \sum_{i=1}^{3} (Y_i - \bar{Y})^2$, where $\bar{Y} = \frac{1}{3}\sum_{i=1}^{3} Y_i$.

Independence of the Sample Mean and Sample Variance
If $X_1, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$ rvs, then $\bar{X}$ and $S^2$ are independent.
Justification for the independence of $\bar{X}$ and $S^2$ (n = 2):
• $U_1 = X_1 + X_2$ and $U_2 = X_1 - X_2$ can be shown to be independent (see Example 6.13 in the WMS text).
• $\bar{X}$ is a function only of $U_1$, and $S^2$ is a function only of $U_2$, so $\bar{X}$ and $S^2$ are also independent.

Properties of $\bar{X}$ and $S^2$ for a random sample from $N(\mu, \sigma^2)$:
1. $\bar{X} \sim N(\mu, \sigma^2/n)$
2. $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$
3. $\bar{X}$ and $S^2$ are independent
(The proof of 3 requires distribution theory for multivariate functions of random variables.)

Student's t-statistic
Definition: if $Z \sim N(0, 1)$ and $\chi^2_v \sim$ chi-square with $v$ degrees of freedom, and if $Z$ and $\chi^2_v$ are independent, then the statistic
$$ t = t_v = \frac{Z}{\sqrt{\chi^2_v / v}} $$
is a Student's t-statistic with $v$ degrees of freedom, and it has the pdf
$$ f(t) = K \left(1 + \frac{t^2}{v}\right)^{-(v+1)/2}, \quad -\infty < t < \infty, \qquad \text{where } K = \frac{\Gamma\big((v+1)/2\big)}{\sqrt{v\pi}\,\Gamma(v/2)}. $$
See Exercise 7.98 for the derivation of the pdf.

Properties of the t-distribution
• Like the standard normal, it is symmetric about 0.
• It is centered at 0 and has a shape similar to that of the normal.
• It has fatter tails and larger variation than the standard normal.
• As $v \to \infty$, $t_v \to N(0, 1)$.
• The mean and variance of the t-distribution are
$$ E(t_v) = 0 \text{ if } v > 1, \qquad V(t_v) = \frac{v}{v-2} \text{ if } v > 2. $$
• $t_1$ is the Cauchy(0, 1) distribution, which has no mean, variance or higher moments.

The percentage points of the t-distribution are tabulated in Table 5 of Appendix 3. For a given $\alpha$, the table gives the value $t_{v,\alpha}$ that solves
$$ P(t_v \ge t_{v,\alpha}) = \alpha. $$
The value $t_{v,\alpha}$ is the upper $\alpha$ percentage point, and the $(1-\alpha)$ quantile, of the t-distribution with $v$ degrees of freedom.

(Figure: a comparison of histograms of $N(0, 1)$, $t_3$ and $t_{10}$.)
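Two small numerical companions to this part of the notes (not from the slides): the first checks Example 7.2.4 using the fact that $(n-1)S^2/\sigma^2 \sim \chi^2_{24}$ when $\sigma^2 = 1600$, and the second echoes the figure above by comparing tail probabilities of the standard normal, $t_{10}$ and $t_3$. The scipy calls are an added illustration.

```python
# Sketches using scipy.stats (not part of the original slides).
from scipy.stats import chi2, t, norm

# Example 7.2.4: with n = 25 and sigma^2 = 40^2 = 1600,
# (n-1)S^2 / sigma^2 ~ chi-square with 24 df, so
# P(S^2 < 2200) = P(chi2_24 < 24 * 2200 / 1600) = P(chi2_24 < 33).
n, sigma2 = 25, 40**2
print("P(S^2 < 2200) =", round(chi2.cdf((n - 1) * 2200 / sigma2, df=n - 1), 4))

# Echoing the figure: t-distributions have heavier tails than N(0, 1).
for label, dist in [("N(0,1)", norm), ("t_10", t(10)), ("t_3", t(3))]:
    print(f"P(|{label}| > 2) =", round(2 * dist.sf(2), 4))
```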
The t-statistic in Normal Samples
If $\bar{X}$ and $S^2$ are the sample mean and sample variance of a random sample from $N(\mu, \sigma^2)$, then
$$ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1), \tag{1} $$
$$ W = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}, \tag{2} $$
$$ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}. \tag{3} $$
Verify (3).

William Gosset, aka Student
William Gosset, who worked at the Guinness brewery, published an article in 1908 in Biometrika describing the t-statistic and its distribution. He published under the pseudonym "Student".
See: http://en.wikipedia.org/wiki/William_Gosset

Example 7.2.6 Suppose $T \sim t_5$.
1. Calculate $P(T > 2)$.
2. Find $c$ such that $P(|T| < c) = 0.90$.
3. Find $c$ such that $P(T > c) = 0.25$.

F Distribution: Definition
Let $W_1$ and $W_2$ be independent chi-square random variables with $v_1$ and $v_2$ degrees of freedom, respectively. Then the statistic
$$ F = \frac{W_1/v_1}{W_2/v_2} $$
is said to follow the F distribution with $v_1$ and $v_2$ df. $v_1$ is called the numerator df and $v_2$ the denominator df. We denote $F \sim F(v_1, v_2)$.

The pdf of the F Distribution
The pdf of $F(v_1, v_2)$ can be derived:
$$ f_F(x) = \frac{\Gamma\!\big(\tfrac{v_1+v_2}{2}\big)}{\Gamma(v_1/2)\,\Gamma(v_2/2)} \left(\frac{v_1}{v_2}\right)^{v_1/2} x^{v_1/2 - 1} \left(1 + \frac{v_1}{v_2}x\right)^{-(v_1+v_2)/2}, \quad 0 < x < \infty, $$
where $v_1$ and $v_2$ are positive integers.

F Distribution: Percentage Points
The percentage points of the F distribution are tabulated in Table 7 of Appendix 3. For given values of $v_1$, $v_2$ and $\alpha$, the table gives the value $F_{v_1,v_2,\alpha}$ that solves
$$ P(F \ge F_{v_1,v_2,\alpha}) = \alpha, $$
where $F$ is a rv with the F distribution with $v_1$ and $v_2$ df. $F_{v_1,v_2,\alpha}$ denotes the upper $\alpha$ percentage point of the F distribution with $v_1$, $v_2$ df.

(Figure: histogram of the F distribution, with the percentage point marked.)

F Distribution: Mean and Reciprocal
For $v_2 > 2$, $E(F_{v_1,v_2}) = v_2/(v_2 - 2)$.
One useful result: if $F \sim F(v_1, v_2)$, then $1/F \sim F(v_2, v_1)$. Why?
Percentage point relationship (extends the tables):
$$ F_{v_1, v_2, 1-\alpha} = \frac{1}{F_{v_2, v_1, \alpha}}. $$

F Distribution: Connection to the Normal
Notice that if we have two independent samples of sizes $n_1$ and $n_2$, respectively, from normal distributions with a common variance, then $S_1^2/S_2^2$ has an F distribution with $(n_1 - 1)$ numerator df and $(n_2 - 1)$ denominator df.
More formally, suppose $X_{11}, \ldots, X_{1n_1}$ and $X_{21}, \ldots, X_{2n_2}$ are independent random samples from $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, respectively. (Note the equality of the population variances.) Let
$$ S_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2, \quad i = 1, 2. $$
Then
$$ W_i = \frac{(n_i - 1)S_i^2}{\sigma^2} \sim \chi^2_{n_i - 1}, \quad i = 1, 2, $$
and $W_1$ and $W_2$ are independent. From these we can form
$$ F = \frac{W_1/(n_1 - 1)}{W_2/(n_2 - 1)} = \frac{(n_1 - 1)S_1^2/[\sigma^2(n_1 - 1)]}{(n_2 - 1)S_2^2/[\sigma^2(n_2 - 1)]} = \frac{S_1^2}{S_2^2}. $$
Therefore, under the stated conditions, $S_1^2/S_2^2$ follows the F distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ df.

Example 7.2.7 Compute $P(S_1 \le cS_2)$ for $c^2 = 2$, $n_1 = 11$, $n_2 = 18$, using Appendix 3.

Sir Ronald Fisher (1890-1962)
The F stands for Fisher. Sir Ronald Fisher was a British geneticist and statistician who is often referred to as the father of modern statistics. In the 1940s, Fisher made several extended visits to NC State.
See: http://en.wikipedia.org/wiki/Ronald_Fisher
The F statistic is used to test important hypotheses in the analysis of variance and in regression analysis.
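The Appendix 3 lookups in Examples 7.2.6 and 7.2.7 above can also be done in software; the sketch below (an added illustration, not from the slides) uses scipy.stats and assumes the $t_5$ and $F(10, 17)$ distributions identified above.

```python
# Possible numerical answers to Examples 7.2.6 and 7.2.7 (not from the slides).
from scipy.stats import t, f

# Example 7.2.6: T ~ t_5
print("P(T > 2)             =", round(t.sf(2, df=5), 4))
print("c with P(|T|<c)=0.90 :", round(t.ppf(0.95, df=5), 4))  # symmetry: use the 0.95 quantile
print("c with P(T>c)=0.25   :", round(t.ppf(0.75, df=5), 4))

# Example 7.2.7: with a common variance, S1^2/S2^2 ~ F(n1-1, n2-1), so
# P(S1 <= c*S2) = P(S1^2/S2^2 <= c^2) = P(F(10, 17) <= 2).
n1, n2, c2 = 11, 18, 2
print("P(S1 <= c*S2)        =", round(f.cdf(c2, dfn=n1 - 1, dfd=n2 - 1), 4))
```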
Review for Normal Samples
Suppose $X_1, \ldots, X_n$ i.i.d. $\sim N(\mu_x, \sigma_x^2)$ and $Y_1, \ldots, Y_m$ i.i.d. $\sim N(\mu_y, \sigma_y^2)$ are independent. Then we have
• $\bar{X} \sim N(\mu_x, \sigma_x^2/n)$
• $(n-1)S_x^2/\sigma_x^2 \sim \chi^2_{n-1}$, where $S_x^2 = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
• $\bar{X}$ and $S_x^2$ are independent
• $T = \dfrac{\bar{X} - \mu_x}{S_x/\sqrt{n}} \sim t_{n-1}$
• $F = \dfrac{S_x^2\, \sigma_y^2}{S_y^2\, \sigma_x^2} \sim F_{n-1,\, m-1}$

Example 7.2.8 Suppose that $Y_1, Y_2, Y_3$ are independent standard normal random variables. State the distribution, with associated parameter values, of each of the following functions of $Y_1$, $Y_2$ and $Y_3$.
1. $U_1 = \bar{Y}$
2. $U_2 = Y_1^2 + Y_2^2 + Y_3^2$
3. $U_3 = (Y_1 + Y_2)/\sqrt{2}$
4. $U_4 = \dfrac{Y_1}{\sqrt{0.5\,(Y_2^2 + Y_3^2)}}$
5. $U_5 = \dfrac{2Y_1^2}{Y_2^2 + Y_3^2}$

7.3 Central Limit Theorem

Central Limit Theorem
Assume that $X_1, \ldots, X_n$ are i.i.d. rvs with finite mean $\mu$ and variance $\sigma^2 < \infty$. Then as $n \to \infty$, the distribution of
$$ U_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} $$
approaches that of the standard normal $N(0, 1)$. Thus, for large $n$ we may use the approximation
$$ P\!\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le t\right) \approx P(Z \le t), $$
where $Z$ is a standard normal rv. The approximation improves as $n$ increases.

When the CLT holds, we say $\bar{X}$ is asymptotically normal (AN) with mean $\mu$ and variance $\sigma^2/n$, and write
$$ \bar{X} \,\dot{\sim}\, N\!\left(\mu, \frac{\sigma^2}{n}\right). $$
Similarly, when the CLT holds, we say $\sum_{i=1}^{n} X_i = n\bar{X}$ is asymptotically normal (AN) with mean $n\mu$ and variance $n\sigma^2$, i.e.
$$ \sum_{i=1}^{n} X_i \,\dot{\sim}\, N(n\mu, n\sigma^2). $$
Alternatively, the distribution of $\sqrt{n}(\bar{X} - \mu)$ is said to converge in distribution, or in law, to that of a mean-0 normal rv:
$$ \sqrt{n}(\bar{X} - \mu) \longrightarrow N(0, \sigma^2). $$

Figure 1: In this simulation experiment, random samples of size $n = 1, 10, 20, 30, 40, 100$ were simulated from a Uniform(0, 1) distribution and the sample mean $\bar{x}$ was calculated. The histograms are based on the 1000 simulated sample means. Normality is achieved with $n = 10$.

Figure 2: In this simulation experiment, random samples of size $n = 1, 10, 20, 30, 40, 100$ were simulated from an Exponential(1) distribution and the sample mean $\bar{x}$ was calculated. The histograms are based on the 1000 simulated sample means. Normality is achieved with $n = 20$.

Figure 3: In this simulation experiment, random samples of size $n = 1, 10, 20, 30, 40, 100$ were simulated from a $t_1$ = Cauchy(0, 1) distribution and the sample mean $\bar{x}$ was calculated. The histograms are based on the 1000 simulated sample means. Clearly the CLT fails here! Why?

Example 7.3.1 Let $X_1, \ldots, X_n$ be a random sample of size $n$ of inter-arrival times between calls to a switchboard. It is known that the $X_i$'s are exponentially distributed with a mean inter-arrival time of 2 seconds. Find the probability that the sample mean of 36 observations will be less than 2.1.
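One possible worked version of Example 7.3.1 (not part of the slides): the $X_i$ are exponential with mean 2, so $\mu = \sigma = 2$ and the CLT gives $\bar{X} \,\dot{\sim}\, N(2, 2^2/36)$. The sketch below also compares the CLT answer with the exact probability, using the fact that the sum of 36 i.i.d. exponential(mean 2) rvs has a Gamma(shape 36, scale 2) distribution.

```python
# A sketch for Example 7.3.1 (not from the slides).
from math import sqrt
from scipy.stats import norm, gamma

n, mu, sigma = 36, 2.0, 2.0

# CLT approximation: P(Xbar < 2.1) ~= Phi((2.1 - 2) / (2/6)) = Phi(0.3)
clt_prob = norm.cdf((2.1 - mu) / (sigma / sqrt(n)))
print("CLT approximation:", round(clt_prob, 4))        # about 0.6179

# Exact answer for comparison: P(Xbar < 2.1) = P(sum of X_i < 36 * 2.1)
exact_prob = gamma.cdf(n * 2.1, a=n, scale=2)
print("exact (Gamma)    :", round(exact_prob, 4))
```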
Continuity Correction: Approximate Distribution of a Discrete rv
Suppose that $X \,\dot{\sim}\, N(\mu, \sigma^2)$ but is discrete, measured to the nearest whole unit. For example, suppose $X$ = weight of female patients measured to the nearest lb, and suppose $X \,\dot{\sim}\, N(\mu = 125, \sigma^2 = 25)$. Then
$$ P(X \le 130) = P(X < 131) \approx P\!\left(Z \le \frac{130.5 - 125}{5}\right) = P(Z \le 1.1) = 0.8643, $$
where 0.5 is added to 130 as a correction for continuity. A continuity correction can also be applied when other discrete distributions supported on the integers are approximated by the normal distribution.

Approximating the Binomial with the Normal
Suppose $Y \sim \text{Binomial}(n, p)$. Let
$$ X_i = \begin{cases} 1 & \text{if trial } i \text{ is a success} \\ 0 & \text{otherwise.} \end{cases} $$
Then $Y = X_1 + \cdots + X_n$. Furthermore, $X_1, \ldots, X_n$ are independent and $X_i \sim \text{Binomial}(1, p)$, $i = 1, \ldots, n$. Thus, $E(X_i) = p$ and $V(X_i) = p(1 - p) \le 0.25 < \infty$. By the CLT,
$$ \bar{X} \,\dot{\sim}\, N\big(p,\; p(1-p)/n\big) \qquad \text{and} \qquad Y = n\bar{X} \,\dot{\sim}\, N\big(np,\; np(1-p)\big). $$
Therefore, if $n$ is "large", with the continuity correction,
$$ P(Y \le y) \approx P\!\left(Z \le \frac{(y + 0.5) - np}{\sqrt{np(1-p)}}\right). $$

Criteria for Approximating Binomial(n, p) with N(np, np(1 - p))
The approximation is acceptable when
$$ 0 < p - 3\sqrt{p(1-p)/n} \qquad \text{and} \qquad p + 3\sqrt{p(1-p)/n} < 1 $$
both hold. These hold when $n$ is moderately large and $p$ is not near 0 or 1. In some other texts, the following criteria are used: $np \ge 10$ and $n(1-p) \ge 10$.

Example 7.3.2 Suppose that $Y \sim \text{Binomial}(50, 0.25)$. Calculate $P(Y \le 10)$ using the normal approximation.
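A possible software check of Example 7.3.2 (not from the slides): with $np = 12.5$ and $np(1-p) = 9.375$, the continuity-corrected normal approximation can be compared with the exact binomial CDF, as in the sketch below.

```python
# Example 7.3.2 via the normal approximation with continuity correction
# (an added illustration, not part of the original slides).
from math import sqrt
from scipy.stats import norm, binom

n, p, y = 50, 0.25, 10
mean, sd = n * p, sqrt(n * p * (1 - p))          # 12.5 and about 3.06

approx = norm.cdf((y + 0.5 - mean) / sd)         # z = (10.5 - 12.5) / 3.06
exact = binom.cdf(y, n, p)                       # exact binomial CDF for comparison

print("normal approximation:", round(approx, 4))
print("exact binomial CDF  :", round(exact, 4))
```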