Chapter 7: Cluster sampling design 2: Two-stage sampling
Jae-Kwang Kim
Iowa State University
Fall, 2014

Outline
1. Introduction
2. Estimation
3. Examples

1 Introduction

Two-stage sampling

Setup:
1. Stage 1: Draw $A_I \subset U_I$ via $p_I(\cdot)$.
2. Stage 2: For every $i \in A_I$, draw $A_i \subset U_i$ via $p_i(\cdot \mid A_I)$.

Sample of elements: $A = \cup_{i \in A_I} A_i$.

Some simplifying assumptions:
1. Invariance of the second-stage design: $p_i(\cdot \mid A_I) = p_i(\cdot)$ for every $i \in U_I$ and for every $A_I$ such that $i \in A_I$.
2. Independence of the second-stage design:
\[ P\left( \cup_{i \in A_I} A_i \mid A_I \right) = \prod_{i \in A_I} \Pr(A_i \mid A_I). \]

Remark: A non-invariant design is a two-phase sampling design.
1. Phase 1: Select a sample and observe $x_i$.
2. Phase 2: Based on the observed value of $x_i$, the second-phase sampling design is determined. The second-phase sample is selected by the second-phase sampling design.

Notation: Sample size
- $n_I$: Number of PSUs in the sample.
- $m_i$: Number of sampled elements in $A_i$.
- $\sum_{i \in A_I} m_i = |A|$: The number of sampled elements.

Notation: Inclusion probability
- Cluster inclusion probability: $\pi_{Ii}$ and $\pi_{Iij}$ (same as in single-stage cluster sampling).
- Conditional inclusion probability:
\[ \pi_{k|i} = \Pr[k \in A_i \mid i \in A_I], \qquad \pi_{kl|i} = \Pr[k, l \in A_i \mid i \in A_I], \qquad \Delta_{kl|i} = \pi_{kl|i} - \pi_{k|i}\pi_{l|i}. \]
- In general, $\pi_{k|i}$ is a random variable (in the sense that it is a function of $A_I$). Under invariance, it is fixed.

Element inclusion probability

First-order inclusion probability:
\[ \pi_{ik} = \Pr\{(ik) \in A\} = \Pr(k \in A_i \mid i \in A_I)\Pr(i \in A_I) = \pi_{k|i}\pi_{Ii}. \]

Second-order inclusion probability:
\[ \pi_{ik,jl} = \begin{cases} \pi_{Ii}\pi_{k|i} & \text{if } i = j \text{ and } k = l \\ \pi_{Ii}\pi_{kl|i} & \text{if } i = j \text{ and } k \neq l \\ \pi_{Iij}\pi_{k|i}\pi_{l|j} & \text{if } i \neq j. \end{cases} \]
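As an illustration of these formulas, the sketch below evaluates the first- and second-order element inclusion probabilities for one hypothetical special case: simple random sampling without replacement at both stages. The choice of design, the function names, and the toy sizes are assumptions made for the example and are not taken from the slides.

```python
from fractions import Fraction

def srs_pi(n, N):
    """First-order inclusion probability under SRS without replacement."""
    return Fraction(n, N)

def srs_pi_joint(n, N):
    """Joint inclusion probability of two distinct units under SRS."""
    return Fraction(n * (n - 1), N * (N - 1))

def element_pi(n_I, N_I, m, M, i):
    """pi_ik = pi_{k|i} * pi_{Ii}; under SRS it is the same for every k in cluster i."""
    return srs_pi(m[i], M[i]) * srs_pi(n_I, N_I)

def element_pi_joint(n_I, N_I, m, M, i, k, j, l):
    """pi_{ik,jl}, following the three cases of the second-order formula."""
    if i == j and k == l:
        return srs_pi(n_I, N_I) * srs_pi(m[i], M[i])
    if i == j:
        return srs_pi(n_I, N_I) * srs_pi_joint(m[i], M[i])
    return srs_pi_joint(n_I, N_I) * srs_pi(m[i], M[i]) * srs_pi(m[j], M[j])

# Hypothetical sizes: 4 clusters, 2 sampled; cluster sizes M and within-cluster takes m.
M = [3, 5, 4, 6]
m = [2, 2, 2, 3]
print(element_pi(2, 4, m, M, 0))
print(element_pi_joint(2, 4, m, M, 0, 0, 0, 1))
print(element_pi_joint(2, 4, m, M, 0, 0, 1, 0))
```

Exact fractions are used so that the three cases can be compared by hand without rounding.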
2 Estimation

HT estimation

HT estimation for $Y = \sum_{i \in U_I} Y_i = \sum_{i \in U_I} \sum_{k \in U_i} y_{ik}$:
\[ \hat{Y}_{HT} = \sum_{i \in A_I} \sum_{k \in A_i} \frac{y_{ik}}{\pi_{Ii}\pi_{k|i}} = \sum_{i \in A_I} \frac{\hat{Y}_i}{\pi_{Ii}}. \]

Properties of $\hat{Y}_{HT}$:
1. Unbiased.
2. Variance: $V(\hat{Y}_{HT}) = V_{PSU} + V_{SSU}$, where
\[ V_{PSU} = \sum_{i \in U_I} \sum_{j \in U_I} \Delta_{Iij} \frac{Y_i}{\pi_{Ii}} \frac{Y_j}{\pi_{Ij}}, \qquad V_{SSU} = \sum_{i \in U_I} \frac{V_i}{\pi_{Ii}}, \]
\[ V_i = V(\hat{Y}_i \mid A_I) = \sum_{k \in U_i} \sum_{l \in U_i} \Delta_{kl|i} \frac{y_{ik}}{\pi_{k|i}} \frac{y_{il}}{\pi_{l|i}}. \]

Remark
1. If $A_I = U_I$, then the design is stratified sampling. Note that $\pi_{Ii} = 1$, $\pi_{Iij} = 1$, and $\Delta_{Iij} = 0$ for all $i, j$. Thus, $V(\hat{Y}_{HT}) = \sum_{i \in U_I} V_i / 1$.
2. If $A_i = U_i$ for every $i \in A_I$, then the design is single-stage cluster sampling and $V(\hat{Y}_{HT}) = V_{PSU}$.

Variance estimation

\[ \hat{V}(\hat{Y}_{HT}) = \hat{V}_{PSU} + \hat{V}_{SSU} = \sum_{i \in A_I} \sum_{j \in A_I} \frac{\Delta_{Iij}}{\pi_{Iij}} \frac{\hat{Y}_i}{\pi_{Ii}} \frac{\hat{Y}_j}{\pi_{Ij}} + \sum_{i \in A_I} \frac{\hat{V}_i}{\pi_{Ii}}, \]
where
\[ \hat{V}_{PSU} = \sum_{i \in A_I} \sum_{j \in A_I} \frac{\Delta_{Iij}}{\pi_{Iij}} \frac{\hat{Y}_i}{\pi_{Ii}} \frac{\hat{Y}_j}{\pi_{Ij}} - \sum_{i \in A_I} \frac{1}{\pi_{Ii}} \left( \frac{1}{\pi_{Ii}} - 1 \right) \hat{V}_i, \qquad \hat{V}_{SSU} = \sum_{i \in A_I} \frac{\hat{V}_i}{\pi_{Ii}^2}, \]
and $\hat{V}_i$ satisfies $E(\hat{V}_i \mid A_I) = V(\hat{Y}_i \mid A_I)$.

Here, we used the fact that
\[ E(\hat{Y}_i \hat{Y}_j \mid A_I) = \begin{cases} Y_i Y_j & \text{if } i \neq j \\ V_i + Y_i^2 & \text{if } i = j \end{cases} \]
by the independence of the second-stage sampling across the clusters.

Often, $\sum_{i \in A_I} \hat{V}_i / \pi_{Ii}$ is ignored (if $n_I / N_I \doteq 0$).

Justification

Let
\[ \hat{V}^* = \sum_{i \in A_I} \sum_{j \in A_I} \frac{\Delta_{Iij}}{\pi_{Iij}} \frac{\hat{Y}_i}{\pi_{Ii}} \frac{\hat{Y}_j}{\pi_{Ij}}. \]
Then, we have
\begin{align*}
E\{\hat{V}^*\} &= E_1\left\{ \sum_{i \in A_I} \sum_{j \in A_I} \frac{\Delta_{Iij}}{\pi_{Iij}} \frac{1}{\pi_{Ii}\pi_{Ij}} E_2(\hat{Y}_i \hat{Y}_j) \right\} \\
&= E_1\left\{ \sum_{i \in A_I} \sum_{j \in A_I} \frac{\Delta_{Iij}}{\pi_{Iij}} \frac{Y_i Y_j}{\pi_{Ii}\pi_{Ij}} \right\} + E_1\left\{ \sum_{i \in A_I} \frac{\Delta_{Iii}}{\pi_{Iii}} \frac{V_i}{\pi_{Ii}^2} \right\} \\
&= \sum_{i \in U_I} \sum_{j \in U_I} \Delta_{Iij} \frac{Y_i Y_j}{\pi_{Ii}\pi_{Ij}} + \sum_{i \in U_I} \frac{\pi_{Ii} - \pi_{Ii}^2}{\pi_{Ii}^2} V_i \\
&= V(\hat{Y}_{HT}) - \sum_{i \in U_I} V_i.
\end{align*}
Hence $\hat{V}^*$ is biased downward. The bias is of order $O(N_I)$.
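The unbiasedness of the HT estimator and the decomposition of its variance into a PSU part and an SSU part can be checked exactly on a small case. The sketch below does this by enumerating every possible two-stage sample from a hypothetical toy population, assuming simple random sampling without replacement at both stages (so invariance and independence hold). The population values, sample sizes, and helper names are all illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical toy population: three clusters (PSUs) and their element values.
clusters = [[1, 3], [2, 4, 9], [5, 7]]
n_I = 2        # clusters drawn at stage 1
m = [1, 2, 1]  # elements drawn within cluster i at stage 2
N_I = len(clusters)

def srs_delta(n, N, same):
    """Delta_kl = pi_kl - pi_k * pi_l for SRS without replacement of n from N."""
    pi = Fraction(n, N)
    pi_kl = pi if same else Fraction(n * (n - 1), N * (N - 1))
    return pi_kl - pi * pi

def ht_variance(values, n):
    """Double-sum HT variance of an estimated total under SRS of n from len(values)."""
    N = len(values)
    pi = Fraction(n, N)
    return sum(srs_delta(n, N, k == l) * values[k] * values[l] / (pi * pi)
               for k in range(N) for l in range(N))

pi_I = Fraction(n_I, N_I)                     # pi_{Ii}, equal for all clusters
totals = [sum(c) for c in clusters]           # cluster totals Y_i
v_psu = ht_variance(totals, n_I)              # V_PSU
v_i = [ht_variance(clusters[i], m[i]) for i in range(N_I)]
v_ssu = sum(v / pi_I for v in v_i)            # V_SSU

# Enumerate every two-stage sample together with its probability.
first_stage = list(combinations(range(N_I), n_I))
outcomes = []  # pairs (probability, HT estimate of the total)
for A_I in first_stage:
    second_stage = [list(combinations(clusters[i], m[i])) for i in A_I]
    prob = Fraction(1, len(first_stage))
    for options in second_stage:
        prob /= len(options)  # independent SRS within each sampled cluster
    for draws in product(*second_stage):
        estimate = sum(
            Fraction(sum(ys)) / (pi_I * Fraction(m[i], len(clusters[i])))
            for i, ys in zip(A_I, draws)
        )
        outcomes.append((prob, estimate))

mean_est = sum(p * e for p, e in outcomes)
var_est = sum(p * (e - mean_est) ** 2 for p, e in outcomes)
print(mean_est == sum(totals), var_est == v_psu + v_ssu)
```

Exact rational arithmetic is used so that the comparison is an identity check and not a Monte Carlo approximation; the enumeration is only feasible for very small populations.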
Example 1

Two-stage sampling design:
1. Stage One: Simple random sampling of $n_I$ clusters from $N_I$ clusters.
2. Stage Two: Simple random sampling of $m_i$ elements from the $M_i$ elements in each sampled cluster $i$.

HT estimator of $\bar{Y} = N^{-1} \sum_{i=1}^{N_I} \sum_{j=1}^{M_i} y_{ij}$, where $N = \sum_{i=1}^{N_I} M_i$ is assumed to be known:
\[ \hat{\bar{Y}}_{HT} = \frac{N_I}{n_I N} \sum_{i \in A_I} \hat{Y}_i = \frac{1}{n_I \bar{M}} \sum_{i \in A_I} \frac{M_i}{m_i} \sum_{j \in A_i} y_{ij}, \]
where $\bar{M} = N_I^{-1} \sum_{i=1}^{N_I} M_i$.

Variance:
\begin{align*}
V\{\hat{\bar{Y}}_{HT}\} &= V\left\{ \frac{1}{n_I \bar{M}} \sum_{i \in A_I} Y_i \right\} + E\left\{ \frac{1}{n_I^2 \bar{M}^2} \sum_{i \in A_I} \frac{M_i^2}{m_i} \left( 1 - \frac{m_i}{M_i} \right) \frac{1}{M_i - 1} \sum_{k \in U_i} (y_{ik} - \bar{Y}_i)^2 \right\} \\
&= \frac{1}{n_I} \left( 1 - \frac{n_I}{N_I} \right) S_{q1}^2 + \frac{1}{n_I N_I \bar{M}^2} \sum_{i=1}^{N_I} \frac{M_i^2}{m_i} \left( 1 - \frac{m_i}{M_i} \right) S_{2i}^2,
\end{align*}
where $S_{q1}^2 = (N_I - 1)^{-1} \sum_{i=1}^{N_I} (q_i - \bar{q}_1)^2$ with $q_i = Y_i / \bar{M}$, $\bar{q}_1 = N_I^{-1} \sum_{i=1}^{N_I} q_i$, and $S_{2i}^2 = (M_i - 1)^{-1} \sum_{k \in U_i} (y_{ik} - \bar{Y}_i)^2$.

If the sampling rate for the second-stage sampling is constant, such that $m_i / M_i = f_2$, then we can write
\begin{align*}
V\{\hat{\bar{Y}}_{HT}\} &= \frac{1}{n_I} (1 - f_1) S_{q1}^2 + \frac{1}{n_I \bar{m}} (1 - f_2) \frac{1}{N} \sum_{i=1}^{N_I} M_i S_{2i}^2 \\
&= \frac{1}{n_I} (1 - f_1) B^2 + \frac{1}{n_I \bar{m}} (1 - f_2) W^2,
\end{align*}
where $f_1 = n_I / N_I$ and $\bar{m} = N_I^{-1} \sum_{i=1}^{N_I} m_i$, so that $B^2 = S_{q1}^2$ and $W^2 = N^{-1} \sum_{i=1}^{N_I} M_i S_{2i}^2$.

Minimizing
\[ V(\hat{\bar{Y}}_{HT}) = \frac{1}{n_I} B^2 + \frac{1}{n_I \bar{m}} W^2 \]
subject to $C = c_0 + c_1 n_I + c_2 n_I \bar{m}$ leads to
\[ \bar{m}_{opt} = \sqrt{\frac{c_1}{c_2} \times \frac{W^2}{B^2}}. \]

If the $M_i$ are equal ($M_i = M$), then the following results hold:
- We have $B^2 = S_b^2 / M$ and $W^2 = SSW \cdot \frac{M}{M - 1} \cdot \frac{1}{N_I M} = S_w^2$. Thus, for sufficiently large $M$, we have $S^2 = B^2 + W^2$.
- The homogeneity measure defined by
\[ \delta = \frac{B^2}{B^2 + W^2} \]
is equal to the intracluster correlation coefficient $\rho$.
- Ignoring the $f_1$ term,
\[ V\{\hat{\bar{Y}}_{HT}\} = \frac{1}{n_I \bar{m}} \{1 + (\bar{m} - 1)\delta\} S_y^2 = V_{SRS}(\hat{\bar{Y}}_{HT}) \{1 + (\bar{m} - 1)\delta\}. \]
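The cost-constrained allocation and the design-effect expression of Example 1 can be tried out numerically. The following is a minimal sketch that assumes the simplified variance $B^2/n_I + W^2/(n_I \bar{m})$ (finite population corrections ignored) and spends the whole budget; the function names and the cost and variance figures are hypothetical.

```python
import math

def optimal_m_bar(c1, c2, B2, W2):
    """m_opt = sqrt((c1 / c2) * (W^2 / B^2))."""
    return math.sqrt((c1 / c2) * (W2 / B2))

def clusters_affordable(C, c0, c1, c2, m_bar):
    """n_I solving C = c0 + c1 * n_I + c2 * n_I * m_bar."""
    return (C - c0) / (c1 + c2 * m_bar)

def variance(n_I, m_bar, B2, W2):
    """Simplified variance B^2 / n_I + W^2 / (n_I * m_bar)."""
    return B2 / n_I + W2 / (n_I * m_bar)

def design_effect(m_bar, B2, W2):
    """1 + (m_bar - 1) * delta with delta = B^2 / (B^2 + W^2)."""
    delta = B2 / (B2 + W2)
    return 1 + (m_bar - 1) * delta

# Hypothetical inputs: budget, cost components, and variance components.
C, c0, c1, c2 = 10_000, 0, 100, 4
B2, W2 = 1.0, 4.0

m_opt = optimal_m_bar(c1, c2, B2, W2)
for m_bar in (5, m_opt, 20):
    n_I = clusters_affordable(C, c0, c1, c2, m_bar)
    print(m_bar, round(n_I, 2), variance(n_I, m_bar, B2, W2), design_effect(m_bar, B2, W2))
```

With a fixed budget, a larger $\bar{m}$ buys fewer clusters, so the loop compares the variance at the optimum with the variance at a smaller and a larger within-cluster sample size.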
Example 1 (Cont'd)

Even when the $M_i$ are unequal, we can still express
\[ V\{\hat{\bar{Y}}_{HT}\} = \frac{1}{n_I \bar{m}} \{1 + (\bar{m} - 1)\delta\} k S^2 = V_{SRS}(\hat{\bar{Y}}_{HT}) \, k \, \{1 + (\bar{m} - 1)\delta\}, \]
where $k = (B^2 + W^2) / S^2$. Thus,
\[ \text{deff} = k \{1 + (\bar{m} - 1)\delta\}. \]
If $M_i = M$, then $k = 1$. Otherwise, $k$ is greater than one.

3 Examples

Example 2: Special case of Example 1

Sampling design:
1. Stage 1: Select an SRS of $n_I$ clusters from a population of $N_I$ clusters.
2. Stage 2: Select an SRS of $m$ elements from the $M_i = M$ elements in each sampled cluster.

HT estimator of $\bar{Y} = \sum_{i=1}^{N_I} \sum_{j=1}^{M} y_{ij} / (N_I M)$:
\[ \hat{\bar{Y}} = \frac{1}{n_I m} \sum_{i \in A_I} \sum_{j \in A_i} y_{ij}. \]

Variance:
\[ V(\hat{\bar{Y}}) = \left( 1 - \frac{n_I}{N_I} \right) \frac{S_b^2}{n_I M} + \left( 1 - \frac{m}{M} \right) \frac{S_w^2}{n_I m}. \]
Or,
\[ V(\hat{\bar{Y}}) = \left( 1 - \frac{n_I}{N_I} \right) \frac{S_1^2}{n_I} + \left( 1 - \frac{m}{M} \right) \frac{S_2^2}{n_I m}, \]
where
\[ S_1^2 = \frac{1}{N_I - 1} \sum_{i=1}^{N_I} (\bar{Y}_i - \bar{Y})^2 = \frac{S_b^2}{M} \]
and
\[ S_2^2 = S_w^2 = \frac{1}{N_I (M - 1)} \sum_{i=1}^{N_I} \sum_{j=1}^{M} (y_{ij} - \bar{Y}_i)^2. \]

Variance components:
\[ S_b^2 = S^2 \{1 + (M - 1)\rho\}, \qquad S_w^2 = S^2 (1 - \rho). \]
Ignoring $f_1 = n_I / N_I$,
\[ V(\hat{\bar{Y}}) \doteq \frac{S^2}{n_I m} \{1 + (m - 1)\rho\}. \]
Thus, the design effect is $1 + (m - 1)\rho$.

Variance estimation:
\[ \hat{V}(\hat{\bar{Y}}) = \left( 1 - \frac{n_I}{N_I} \right) \frac{s_1^2}{n_I} + \frac{n_I}{N_I} \left( 1 - \frac{m}{M} \right) \frac{s_2^2}{n_I m}, \]
where
\[ s_1^2 = (n_I - 1)^{-1} \sum_{i \in A_I} (\bar{y}_i - \hat{\bar{Y}})^2, \qquad s_2^2 = n_I^{-1} (m - 1)^{-1} \sum_{i \in A_I} \sum_{j \in A_i} (y_{ij} - \bar{y}_i)^2, \]
and $\bar{y}_i = \sum_{j \in A_i} y_{ij} / m$. If $n_I / N_I \doteq 0$, then $\hat{V}(\hat{\bar{Y}}) \doteq s_1^2 / n_I$.

Example 3: Two-stage PPS sampling

Sampling design:
1. Stage One: PPS sampling of $n_I$ clusters with measure of size (MOS) equal to $M_i$.
2. Stage Two: SRS of $m$ elements in each selected cluster.
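The two stages of this design can be sketched in a few lines. The sketch assumes that the PPS draws are made with replacement, which the slides do not state explicitly, and it uses a hypothetical function name, toy data, and a fixed seed.

```python
import random

def draw_two_stage_pps(clusters, n_I, m, rng):
    """Return n_I draws; each draw is (cluster index, list of m sampled values).

    Stage 1: with-replacement draws with probability proportional to M_i (assumption).
    Stage 2: SRS without replacement of m elements inside each drawn cluster.
    """
    sizes = [len(c) for c in clusters]  # measure of size M_i
    draws = []
    for i in rng.choices(range(len(clusters)), weights=sizes, k=n_I):
        draws.append((i, rng.sample(clusters[i], m)))
    return draws

# Hypothetical population of three clusters with unequal sizes.
clusters = [[3, 5, 4, 6], [10, 12, 11, 9, 13, 8], [1, 2, 2]]
sample = draw_two_stage_pps(clusters, n_I=2, m=2, rng=random.Random(0))
print(sample)
```

Because the draws are with replacement, the same cluster may appear more than once; in this sketch an independent second-stage sample is taken each time it is drawn.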
Estimation of mean:
\[ \hat{\bar{Y}}_{PPS} = \frac{1}{n_I} \sum_{k=1}^{n_I} z_k, \]
where $z_k = \hat{t}_i / M_i$ if cluster $i$ is selected in the $k$-th PPS sampling and
\[ \hat{t}_i = \frac{M_i}{m} \sum_{j \in A_i} y_{ij}. \]

We can express
\[ \hat{\bar{Y}}_{PPS} = \frac{1}{n_I m} \sum_{i \in A_I} \sum_{j \in A_i} y_{ij}. \]
Self-weighting design: equal weights.

Variance estimation:
\[ \hat{V}(\hat{\bar{Y}}_{PPS}) = \frac{1}{n_I} s_z^2, \]
where
\[ s_z^2 = \frac{1}{n_I - 1} \sum_{k=1}^{n_I} (z_k - \bar{z}_n)^2 \]
and $z_k = \hat{t}_i / M_i = \hat{\bar{Y}}_i$ if cluster $i$ is selected in the $k$-th PPS sampling.

Very popular in practice for the following reasons:
1. $p_i \propto M_i$: efficient.
2. Point estimation is easy (self-weighting).
3. Interviewer workloads are equal.
4. Simple variance estimation.
5. Works for multi-stage sampling designs.
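The point estimator and variance estimator of Example 3 reduce to the mean and the sample variance of the per-draw cluster means $z_k$. The sketch below illustrates this with a hypothetical function name and made-up data, assuming each inner list holds the $m$ values observed in one PPS draw.

```python
import statistics

def pps_mean_and_variance(draws):
    """Return (estimated mean, estimated variance) from per-draw element samples.

    z_k is the sample mean within the k-th drawn cluster; the estimate is the
    mean of the z_k and its variance estimate is s_z^2 / n_I.
    """
    z = [statistics.mean(ys) for ys in draws]
    n_I = len(z)
    estimate = statistics.mean(z)
    var_hat = statistics.variance(z) / n_I  # variance() uses the n_I - 1 divisor
    return estimate, var_hat

# Hypothetical data: n_I = 3 draws with m = 2 sampled elements each.
draws = [[1, 3], [2, 4], [6, 8]]
estimate, var_hat = pps_mean_and_variance(draws)

# Self-weighting check: the estimate equals the plain mean of all sampled values.
pooled = statistics.mean([y for ys in draws for y in ys])
print(estimate, var_hat, pooled)
```

Only the between-draw variation of the $z_k$ enters the variance estimate, which is why no within-cluster variance term has to be computed separately.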