Chapter 7 - Jae Kwang Kim

Chapter 7: Cluster sampling design 2:
Two-stage sampling
Jae-Kwang Kim
Iowa State University
Fall, 2014
Kim (ISU)
Ch. 7: Cluster sampling design 2
Fall, 2014
Outline
1. Introduction
2. Estimation
3. Examples
Introduction
Two-stage sampling
Setup:
1. Stage 1: Draw $A_I \subset U_I$ via $p_I(\cdot)$.
2. Stage 2: For every $i \in A_I$, draw $A_i \subset U_i$ via $p_i(\cdot \mid A_I)$.
Sample of elements: $A = \cup_{i \in A_I} A_i$
Some simplifying assumptions
1. Invariance of the second-stage design: $p_i(\cdot \mid A_I) = p_i(\cdot)$ for every $i \in U_I$ and for every $A_I$ such that $i \in A_I$.
2. Independence of the second-stage design:
\[ P\left( \cup_{i \in A_I} A_i \mid A_I \right) = \prod_{i \in A_I} \Pr(A_i \mid A_I) \]
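The setup above can be sketched in a few lines of Python. This is only an illustration under assumed choices: simple random sampling at both stages, a hypothetical population `U`, and hypothetical names `draw_two_stage` and `m_i`. Fixing the second-stage sizes in advance stands in for invariance, and drawing each cluster separately stands in for independence.

```python
import random

def draw_two_stage(clusters, n_I, m_i, seed=None):
    """Draw a two-stage sample: SRS of clusters, then SRS of elements.

    clusters: dict mapping a cluster label i to the list U_i of its elements.
    m_i:      dict mapping i to the second-stage sample size, fixed in advance
              so that the second-stage design does not depend on A_I.
    """
    rng = random.Random(seed)
    # Stage 1: A_I is an SRS (without replacement) of n_I cluster labels.
    A_I = rng.sample(sorted(clusters), n_I)
    # Stage 2: each A_i is drawn separately within its own cluster.
    A = {i: rng.sample(clusters[i], m_i[i]) for i in A_I}
    return A_I, A

# Hypothetical population of three clusters.
U = {"a": [1, 2, 3], "b": [4, 5], "c": [6, 7, 8, 9]}
m_i = {"a": 2, "b": 1, "c": 3}
A_I, A = draw_two_stage(U, n_I=2, m_i=m_i, seed=1)
elements = [y for i in A_I for y in A[i]]   # A is the union of the A_i
print(A_I, elements)
```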
Remark: A non-invariant design is a two-phase sampling design.
1. Phase 1: Select a sample and observe $x_i$.
2. Phase 2: Based on the observed values of $x_i$, the second-phase sampling design is determined. The second-phase sample is then selected by this second-phase sampling design.
Notation: Sample size
$n_I$: Number of PSUs in the sample.
$m_i$: Number of sampled elements in $A_i$.
$\sum_{i \in A_I} m_i = |A|$: The number of sampled elements.
Notation: Inclusion probability
Cluster inclusion probability: $\pi_{Ii}$ and $\pi_{Iij}$ (same as in single-stage cluster sampling)
Conditional inclusion probability:
\begin{align*}
\pi_{k|i} &= \Pr[k \in A_i \mid i \in A_I] \\
\pi_{kl|i} &= \Pr[k, l \in A_i \mid i \in A_I] \\
\Delta_{kl|i} &= \pi_{kl|i} - \pi_{k|i} \pi_{l|i}
\end{align*}
In general, $\pi_{k|i}$ is a random variable (in the sense that it is a function of $A_I$). Under invariance, it is fixed.
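Element inclusion probability
First-order inclusion probability:
\[ \pi_{ik} = \Pr\{(ik) \in A\} = \Pr(k \in A_i \mid i \in A_I) \Pr(i \in A_I) = \pi_{k|i} \pi_{Ii}. \]
Second-order inclusion probability:
\[ \pi_{ik,jl} = \begin{cases}
\pi_{Ii} \pi_{k|i} & \text{if } i = j \text{ and } k = l \\
\pi_{Ii} \pi_{kl|i} & \text{if } i = j \text{ and } k \neq l \\
\pi_{Iij} \pi_{k|i} \pi_{l|j} & \text{if } i \neq j
\end{cases} \]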
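As an illustration of the three cases, the sketch below evaluates the inclusion probabilities for one assumed design: simple random sampling without replacement at both stages, so that $\pi_{Ii} = n_I/N_I$ and $\pi_{k|i} = m_i/M_i$. The function names and the numbers in the usage lines are hypothetical.

```python
from fractions import Fraction

def pi_first_order(i, N_I, n_I, M, m):
    """pi_ik = pi_{k|i} * pi_Ii under SRS at both stages."""
    return Fraction(m[i], M[i]) * Fraction(n_I, N_I)

def pi_second_order(i, k, j, l, N_I, n_I, M, m):
    """pi_{ik,jl} under SRS at both stages, following the three cases."""
    pi_Ii = Fraction(n_I, N_I)
    if i == j and k == l:
        return pi_Ii * Fraction(m[i], M[i])
    if i == j:
        # Joint conditional probability pi_{kl|i} for two distinct elements.
        return pi_Ii * Fraction(m[i] * (m[i] - 1), M[i] * (M[i] - 1))
    # Joint cluster probability pi_Iij for two distinct clusters.
    pi_Iij = Fraction(n_I * (n_I - 1), N_I * (N_I - 1))
    return pi_Iij * Fraction(m[i], M[i]) * Fraction(m[j], M[j])

# Hypothetical design: 4 clusters, 2 sampled; cluster sizes and sample sizes.
M = {0: 5, 1: 3}
m = {0: 2, 1: 2}
same_element = pi_second_order(0, 1, 0, 1, N_I=4, n_I=2, M=M, m=m)
same_cluster = pi_second_order(0, 1, 0, 2, N_I=4, n_I=2, M=M, m=m)
diff_cluster = pi_second_order(0, 1, 1, 1, N_I=4, n_I=2, M=M, m=m)
print(same_element, same_cluster, diff_cluster)
```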
Estimation
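HT estimation
HT estimation for $Y = \sum_{i \in U_I} Y_i = \sum_{i \in U_I} \sum_{k \in U_i} y_{ik}$:
\[ \hat{Y}_{HT} = \sum_{i \in A_I} \frac{\hat{Y}_i}{\pi_{Ii}} = \sum_{i \in A_I} \sum_{k \in A_i} \frac{y_{ik}}{\pi_{k|i} \pi_{Ii}} \]
Properties of $\hat{Y}_{HT}$
1. Unbiased
2. Variance:
\[ V(\hat{Y}_{HT}) = V_{PSU} + V_{SSU} \]
where
\[ V_{PSU} = \sum_{i \in U_I} \sum_{j \in U_I} \Delta_{Iij} \frac{Y_i}{\pi_{Ii}} \frac{Y_j}{\pi_{Ij}} \]
\[ V_{SSU} = \sum_{i \in U_I} \frac{V_i}{\pi_{Ii}}, \qquad V_i = V(\hat{Y}_i \mid A_I) = \sum_{k \in U_i} \sum_{l \in U_i} \Delta_{kl|i} \frac{y_{ik}}{\pi_{k|i}} \frac{y_{il}}{\pi_{l|i}}. \]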
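The unbiasedness and the decomposition $V_{PSU} + V_{SSU}$ can be checked exactly on a toy population by enumerating every possible sample. The sketch below assumes simple random sampling at both stages, for which the two components reduce to the familiar SRS variance formulas; the population `U` and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical population: y-values grouped by cluster.
U = {0: [1, 2, 3], 1: [4, 5], 2: [6, 7, 8, 9]}
N_I, n_I, m = len(U), 2, 2
pi_I = Fraction(n_I, N_I)                  # pi_Ii under SRS of clusters

def ht_total(sample):
    """Sum of y_ik / (pi_{k|i} pi_Ii) over a sample {i: sampled y-values}."""
    return sum(Fraction(y) / (Fraction(m, len(U[i])) * pi_I)
               for i, ys in sample.items() for y in ys)

def enumerate_design():
    """Yield (probability, sample) for every possible two-stage sample."""
    first_stage = list(combinations(U, n_I))
    for A_I in first_stage:
        choices = [list(combinations(U[i], m)) for i in A_I]
        n_second = 1
        for c in choices:
            n_second *= len(c)
        for draws in product(*choices):
            yield Fraction(1, len(first_stage) * n_second), dict(zip(A_I, draws))

def s2(values):
    """Variance with divisor (count - 1), kept as an exact fraction."""
    mean = Fraction(sum(values)) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

# Exact mean and variance of the HT estimator over the design.
mean_ht = sum(p * ht_total(s) for p, s in enumerate_design())
var_ht = sum(p * (ht_total(s) - mean_ht) ** 2 for p, s in enumerate_design())

# The two variance components, written out for SRS at both stages.
totals = [sum(ys) for ys in U.values()]
V_PSU = N_I ** 2 * (1 - pi_I) * s2(totals) / n_I
V_i = {i: len(ys) ** 2 * (1 - Fraction(m, len(ys))) * s2(ys) / m
       for i, ys in U.items()}
V_SSU = sum(V_i.values()) / pi_I

print(mean_ht == sum(totals), var_ht == V_PSU + V_SSU)
```

Exact fractions are used so that the comparison is not affected by rounding.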
Remark
1. If $A_I = U_I$, then the design is stratified sampling. Note that $\pi_{Ii} = 1$, $\pi_{Iij} = 1$, and $\Delta_{Iij} = 0$ for all $i, j$. Thus, $V(\hat{Y}_{HT}) = \sum_{i \in U_I} V_i / 1$.
2. If $A_i = U_i$ for every $i \in A_I$, then the design is single-stage cluster sampling and $V(\hat{Y}_{HT}) = V_{PSU}$.
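Variance estimation
\begin{align*}
\hat{V}(\hat{Y}_{HT}) &= \hat{V}_{PSU} + \hat{V}_{SSU} \\
&= \sum_{i \in A_I} \sum_{j \in A_I} \frac{\Delta_{Iij}}{\pi_{Iij}} \frac{\hat{Y}_i}{\pi_{Ii}} \frac{\hat{Y}_j}{\pi_{Ij}} + \sum_{i \in A_I} \frac{\hat{V}_i}{\pi_{Ii}},
\end{align*}
where
\begin{align*}
\hat{V}_{PSU} &= \sum_{i \in A_I} \sum_{j \in A_I} \frac{\Delta_{Iij}}{\pi_{Iij}} \frac{\hat{Y}_i}{\pi_{Ii}} \frac{\hat{Y}_j}{\pi_{Ij}} - \sum_{i \in A_I} \frac{1}{\pi_{Ii}} \left( \frac{1}{\pi_{Ii}} - 1 \right) \hat{V}_i \\
\hat{V}_{SSU} &= \sum_{i \in A_I} \frac{\hat{V}_i}{\pi_{Ii}^2}
\end{align*}
and $\hat{V}_i$ satisfies $E(\hat{V}_i \mid A_I) = V(\hat{Y}_i \mid A_I)$.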
Variance estimation (Cont’d)
Here, we used the fact
\[ E(\hat{Y}_i \hat{Y}_j \mid A_I) = \begin{cases}
Y_i Y_j & \text{if } i \neq j \\
V_i + Y_i^2 & \text{if } i = j
\end{cases} \]
by the independence of the second-stage sampling across the clusters.
Often, $\sum_{i \in A_I} \hat{V}_i / \pi_{Ii}$ is ignored (if $n_I / N_I \doteq 0$).
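Justification
Let
\[ \hat{V}^* = \sum_{i \in A_I} \sum_{j \in A_I} \frac{\Delta_{Iij}}{\pi_{Iij}} \frac{\hat{Y}_i}{\pi_{Ii}} \frac{\hat{Y}_j}{\pi_{Ij}}. \]
Then, we have
\begin{align*}
E\{\hat{V}^*\} &= E_1 \left\{ \sum_{i \in A_I} \sum_{j \in A_I} \frac{\Delta_{Iij}}{\pi_{Iij}} \frac{1}{\pi_{Ii} \pi_{Ij}} E_2(\hat{Y}_i \hat{Y}_j) \right\} \\
&= E \left\{ \sum_{i \in A_I} \sum_{j \in A_I} \frac{\Delta_{Iij}}{\pi_{Iij}} \frac{Y_i Y_j}{\pi_{Ii} \pi_{Ij}} \right\} + E \left\{ \sum_{i \in A_I} \frac{\Delta_{Iii}}{\pi_{Iii}} \frac{V_i}{\pi_{Ii}^2} \right\} \\
&= \sum_{i \in U_I} \sum_{j \in U_I} \Delta_{Iij} \frac{Y_i Y_j}{\pi_{Ii} \pi_{Ij}} + \sum_{i \in U_I} \frac{\pi_{Ii} - \pi_{Ii}^2}{\pi_{Ii}^2} V_i \\
&= V(\hat{Y}_{HT}) - \sum_{i \in U_I} V_i.
\end{align*}
Biased downward. The bias is of order $O(N_I)$.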
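The bias of $\hat{V}^*$ and the effect of adding back $\sum_{i \in A_I} \hat{V}_i / \pi_{Ii}$ can be verified exactly on a toy population by enumerating all samples. The sketch below assumes simple random sampling at both stages and takes $\hat{V}_i$ to be the usual SRS variance estimator of an expanded cluster total; the population `U` and every name in the code are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

U = {0: [1, 2, 3], 1: [4, 5], 2: [6, 7, 8, 9]}   # hypothetical population
N_I, n_I, m = len(U), 2, 2
pi_I = Fraction(n_I, N_I)                          # pi_Ii
pi_II = Fraction(n_I * (n_I - 1), N_I * (N_I - 1)) # pi_Iij for i != j

def s2(values):
    mean = Fraction(sum(values)) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def v_cluster(M_i, ys):
    """M_i^2 (1 - m/M_i) s2 / m: SRS variance (or its estimate) of a cluster total."""
    return M_i ** 2 * (1 - Fraction(m, M_i)) * s2(ys) / m

def estimates(sample):
    """Return (Y_HT, V_star, V_hat) for one sample {i: sampled y-values}."""
    Y_hat = {i: Fraction(len(U[i]), m) * sum(ys) for i, ys in sample.items()}
    V_star = Fraction(0)
    for i in sample:
        for j in sample:
            # Delta_Iij / pi_Iij for the diagonal and off-diagonal terms.
            ratio = (1 - pi_I) if i == j else (1 - pi_I ** 2 / pi_II)
            V_star += ratio * Y_hat[i] * Y_hat[j] / pi_I ** 2
    second = sum(v_cluster(len(U[i]), ys) for i, ys in sample.items()) / pi_I
    return sum(Y_hat.values()) / pi_I, V_star, V_star + second

def enumerate_design():
    """Yield (probability, sample) for every possible two-stage sample."""
    first_stage = list(combinations(U, n_I))
    for A_I in first_stage:
        choices = [list(combinations(U[i], m)) for i in A_I]
        n_second = 1
        for c in choices:
            n_second *= len(c)
        for draws in product(*choices):
            yield Fraction(1, len(first_stage) * n_second), dict(zip(A_I, draws))

rows = [(p,) + estimates(s) for p, s in enumerate_design()]
mean_ht = sum(p * y for p, y, _, _ in rows)
var_ht = sum(p * (y - mean_ht) ** 2 for p, y, _, _ in rows)
E_V_star = sum(p * vs for p, _, vs, _ in rows)
E_V_hat = sum(p * vh for p, _, _, vh in rows)
sum_V_i = sum(v_cluster(len(ys), ys) for ys in U.values())

print(E_V_hat == var_ht, E_V_star == var_ht - sum_V_i)
```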
Example 1
Two-stage sampling design
1. Stage One: Simple random sampling of $n_I$ clusters from $N_I$ clusters.
2. Stage Two: Simple random sampling of $m_i$ elements from the $M_i$ elements in the sampled cluster $i$.
HT estimator of $\bar{Y} = N^{-1} \sum_{i=1}^{N_I} \sum_{j=1}^{M_i} y_{ij}$, where $N = \sum_{i=1}^{N_I} M_i$ is assumed to be known:
\[ \hat{\bar{Y}}_{HT} = \frac{N_I}{n_I N} \sum_{i \in A_I} \hat{Y}_i = \frac{1}{n_I \bar{M}} \sum_{i \in A_I} \frac{M_i}{m_i} \sum_{j \in A_i} y_{ij} \]
where $\bar{M} = N_I^{-1} \sum_{i=1}^{N_I} M_i$.
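A minimal sketch of this estimator, assuming the sample is stored as a dictionary of sampled values per cluster. The function name `mean_ht` and the numbers in the usage lines are hypothetical.

```python
def mean_ht(sample, M, N_I, N):
    """HT estimator of the population mean under SRS at both stages.

    sample: dict {i: list of sampled y_ij} for the sampled clusters.
    M:      dict {i: M_i}, the cluster sizes.
    N_I, N: number of clusters and of elements in the population.
    """
    n_I = len(sample)
    # Y_hat_i = (M_i / m_i) * (sum of sampled y_ij in cluster i)
    total_hat = sum(M[i] / len(ys) * sum(ys) for i, ys in sample.items())
    return N_I / (n_I * N) * total_hat

# Hypothetical data: 2 of 4 clusters sampled, N = 60 elements overall.
sample = {1: [2.0, 4.0], 2: [1.0, 3.0, 5.0, 7.0]}
M = {1: 10, 2: 20}
estimate = mean_ht(sample, M, N_I=4, N=60)
print(estimate)
```

Here the expanded cluster totals are 30 and 80, and the factor $N_I / (n_I N)$ equals 1/30.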
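Example 1 (Cont’d)
Variance
\begin{align*}
V\{\hat{\bar{Y}}_{HT}\} &= V\left\{ \frac{1}{n_I \bar{M}} \sum_{i \in A_I} Y_i \right\} + E\left\{ \frac{1}{n_I^2 \bar{M}^2} \sum_{i \in A_I} \frac{M_i^2}{m_i} \left( 1 - \frac{m_i}{M_i} \right) \frac{1}{M_i - 1} \sum_{k \in U_i} (y_{ik} - \bar{Y}_i)^2 \right\} \\
&= \frac{1}{n_I} \left( 1 - \frac{n_I}{N_I} \right) S_{q1}^2 + \frac{1}{n_I N_I \bar{M}^2} \sum_{i=1}^{N_I} \frac{M_i^2}{m_i} \left( 1 - \frac{m_i}{M_i} \right) S_{2i}^2
\end{align*}
where $S_{q1}^2 = (N_I - 1)^{-1} \sum_{i=1}^{N_I} (q_i - \bar{q}_1)^2$ with $q_i = Y_i / \bar{M}$, $\bar{q}_1 = N_I^{-1} \sum_{i=1}^{N_I} q_i$, and $S_{2i}^2 = (M_i - 1)^{-1} \sum_{k \in U_i} (y_{ik} - \bar{Y}_i)^2$.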
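Example 1 (Cont’d)
If the sampling rate for the second-stage sampling is constant, such that $m_i / M_i = f_2$, then we can write
\begin{align*}
V\{\hat{\bar{Y}}_{HT}\} &= \frac{1}{n_I} (1 - f_1) S_{q1}^2 + \frac{1}{n_I \bar{m}} (1 - f_2) \frac{1}{N} \sum_{i=1}^{N_I} M_i S_{2i}^2 \\
&= \frac{1}{n_I} (1 - f_1) B^2 + \frac{1}{n_I \bar{m}} (1 - f_2) W^2
\end{align*}
where $B^2 = S_{q1}^2$, $W^2 = N^{-1} \sum_{i=1}^{N_I} M_i S_{2i}^2$, $f_1 = n_I / N_I$, and $\bar{m} = N_I^{-1} \sum_{i=1}^{N_I} m_i$.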
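The step from the general variance formula of Example 1 to the $B^2$, $W^2$ form uses $M_i^2 / m_i = M_i / f_2$ and $\bar{m} = f_2 \bar{M}$. The sketch below checks this numerically with exact fractions on a hypothetical population with unequal cluster sizes; all names and values are illustrative.

```python
from fractions import Fraction

# Hypothetical population: y-values grouped by cluster, with unequal M_i.
U = {0: [1, 2, 3, 6], 1: [4, 5, 9, 7, 2, 3], 2: [6, 7, 8, 9, 1, 1, 4, 4]}
n_I, f2 = 2, Fraction(1, 2)            # constant second-stage rate m_i / M_i

def s2(values):
    mean = Fraction(sum(values)) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

N_I = len(U)
N = sum(len(ys) for ys in U.values())
M_bar = Fraction(N, N_I)
m = {i: f2 * len(ys) for i, ys in U.items()}          # m_i = f2 * M_i
m_bar = sum(m.values()) / N_I
f1 = Fraction(n_I, N_I)

S2_2i = {i: s2(ys) for i, ys in U.items()}
B2 = s2([Fraction(sum(ys)) / M_bar for ys in U.values()])   # S_q1^2
W2 = sum(len(ys) * S2_2i[i] for i, ys in U.items()) / N

# General form: sum of (M_i^2 / m_i)(1 - m_i / M_i) S_2i^2 over all clusters.
second_stage = sum(len(ys) ** 2 / m[i] * (1 - m[i] / len(ys)) * S2_2i[i]
                   for i, ys in U.items())
V_general = (1 - f1) * B2 / n_I + second_stage / (n_I * N_I * M_bar ** 2)
V_BW = (1 - f1) * B2 / n_I + (1 - f2) * W2 / (n_I * m_bar)

print(V_general == V_BW)
```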
Example 1 (Cont’d)
Minimizing
\[ V(\hat{\bar{Y}}_{HT}) = \frac{1}{n_I} B^2 + \frac{1}{n_I \bar{m}} W^2 \]
subject to
\[ C = c_0 + c_1 n_I + c_2 n_I \bar{m} \]
leads to
\[ \bar{m}_{opt} = \sqrt{\frac{c_1}{c_2} \times \frac{W^2}{B^2}} \]
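The closed form can be compared with a brute-force search. In the sketch below, the whole budget is spent, so $n_I = (C - c_0)/(c_1 + c_2 \bar{m})$ is substituted into the variance and $\bar{m}$ is varied over a grid. The cost and variance figures are hypothetical, and $n_I$ is treated as continuous for simplicity.

```python
import math

def optimal_m(c1, c2, B2, W2):
    """Closed-form optimum: sqrt((c1 / c2) * (W^2 / B^2))."""
    return math.sqrt((c1 / c2) * (W2 / B2))

def variance_at_budget(m_bar, C, c0, c1, c2, B2, W2):
    """Variance when the whole budget C is spent for a given m_bar."""
    n_I = (C - c0) / (c1 + c2 * m_bar)
    return B2 / n_I + W2 / (n_I * m_bar)

# Hypothetical costs and variance components.
c0, c1, c2, B2, W2, C = 100.0, 40.0, 2.0, 1.5, 12.0, 5000.0

m_opt = optimal_m(c1, c2, B2, W2)
grid = [1 + 0.01 * t for t in range(5000)]       # m_bar from 1.00 to 50.99
m_grid = min(grid, key=lambda mb: variance_at_budget(mb, C, c0, c1, c2, B2, W2))
print(m_opt, m_grid)
```

The optimum does not depend on $C$ or $c_0$: a larger budget buys more clusters, not more elements per cluster.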
Example 1 (Cont’d)
If the $M_i$ are equal ($M_i = M$), then the following results hold:
We have $B^2 = S_b^2 / M$ and $W^2 = SSW \cdot M/(M - 1) \cdot 1/(N_I M) = S_w^2$.
Thus, for sufficiently large $M$, we have $S^2 = B^2 + W^2$.
The homogeneity measure defined by
\[ \delta = \frac{B^2}{B^2 + W^2} \]
is equal to the intracluster correlation coefficient $\rho$.
Ignoring the $f_1$ term,
\begin{align*}
V\{\hat{\bar{Y}}_{HT}\} &= \frac{1}{n_I \bar{m}} \{1 + (\bar{m} - 1) \delta\} S_y^2 \\
&= V_{SRS}(\hat{\bar{Y}}_{HT}) \{1 + (\bar{m} - 1) \delta\}
\end{align*}
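Example 1 (Cont’d)
Even when the $M_i$ are unequal, we can still express
\begin{align*}
V\{\hat{\bar{Y}}_{HT}\} &= \frac{1}{n_I \bar{m}} \{1 + (\bar{m} - 1) \delta\} k S^2 \\
&= V_{SRS}(\hat{\bar{Y}}_{HT}) \, k \{1 + (\bar{m} - 1) \delta\}
\end{align*}
where $k = (B^2 + W^2) / S^2$. Thus,
\[ \text{deff} = k \{1 + (\bar{m} - 1) \delta\}. \]
If $M_i = M$, then $k = 1$. Otherwise, it is greater than one.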
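For a quick numerical feel, the two helper functions below evaluate the homogeneity measure and the design effect. They are a sketch with hypothetical names and inputs, not part of the original slides.

```python
def homogeneity(B2, W2):
    """delta = B^2 / (B^2 + W^2)."""
    return B2 / (B2 + W2)

def design_effect(m_bar, delta, k=1.0):
    """deff = k * {1 + (m_bar - 1) * delta}; k = 1 when all M_i are equal."""
    return k * (1 + (m_bar - 1) * delta)

# Hypothetical variance components: modest clustering, 11 elements per cluster.
delta = homogeneity(B2=1.0, W2=9.0)
deff = design_effect(m_bar=11, delta=delta)
print(delta, deff)
```

With these inputs, $\delta = 0.1$, and sampling 11 elements per cluster roughly doubles the variance relative to simple random sampling of the same number of elements.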
Examples
Example 2: Special case of Example 1
Sampling design
1. Stage 1: Select an SRS of $n_I$ clusters from a population of $N_I$ clusters.
2. Stage 2: Select an SRS of $m$ elements from each sampled cluster, where every cluster has $M_i = M$ elements.
HT estimator of $\bar{Y} = \sum_{i=1}^{N_I} \sum_{j=1}^{M} y_{ij} / (N_I M)$:
\[ \hat{\bar{Y}} = \frac{1}{n_I m} \sum_{i \in A_I} \sum_{j \in A_i} y_{ij} \]
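Example 2 (Cont’d)
Variance
\[ V(\hat{\bar{Y}}) = \left( 1 - \frac{n_I}{N_I} \right) \frac{S_b^2}{n_I M} + \left( 1 - \frac{m}{M} \right) \frac{S_w^2}{n_I m} \]
Or,
\[ V(\hat{\bar{Y}}) = \left( 1 - \frac{n_I}{N_I} \right) \frac{S_1^2}{n_I} + \left( 1 - \frac{m}{M} \right) \frac{S_2^2}{n_I m} \]
where
\[ S_1^2 = \frac{1}{N_I - 1} \sum_{i=1}^{N_I} (\bar{Y}_i - \bar{Y})^2 = \frac{S_b^2}{M} \]
and
\[ S_2^2 = S_w^2 = \frac{1}{N_I (M - 1)} \sum_{i=1}^{N_I} \sum_{j=1}^{M} (y_{ij} - \bar{Y}_i)^2. \]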
Example 2 (Cont’d)
Variance components:
\begin{align*}
S_b^2 &= S^2 \{1 + (M - 1) \rho\} \\
S_w^2 &= S^2 (1 - \rho)
\end{align*}
Ignoring $f_1 = n_I / N_I$,
\[ V(\hat{\bar{Y}}) \doteq \frac{S^2}{n_I m} \{1 + (m - 1) \rho\} \]
Thus, Design effect $= 1 + (m - 1) \rho$.
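Substituting the two variance components into the Example 2 variance (with the factor $1 - n_I/N_I$ dropped) gives the design-effect form directly. The sketch below confirms the algebra with exact fractions; the function names and input values are hypothetical.

```python
from fractions import Fraction

def var_two_stage(S2, rho, n_I, m, M):
    """Example 2 variance with the factor 1 - n_I/N_I dropped."""
    Sb2 = S2 * (1 + (M - 1) * rho)      # between-cluster component
    Sw2 = S2 * (1 - rho)                # within-cluster component
    return Sb2 / (n_I * M) + (1 - Fraction(m, M)) * Sw2 / (n_I * m)

def var_deff_form(S2, rho, n_I, m):
    """(S^2 / (n_I m)) * {1 + (m - 1) rho}."""
    return S2 / (n_I * m) * (1 + (m - 1) * rho)

# Hypothetical inputs.
args = dict(S2=Fraction(4), rho=Fraction(1, 10), n_I=20, m=5)
lhs = var_two_stage(M=50, **args)
rhs = var_deff_form(**args)
print(lhs, rhs)
```

The cluster size $M$ cancels, which is why the design effect depends only on $m$ and $\rho$.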
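Example 2 (Cont’d)
Variance estimation
\[ \hat{V}(\hat{\bar{Y}}) = \left( 1 - \frac{n_I}{N_I} \right) \frac{s_1^2}{n_I} + \frac{n_I}{N_I} \left( 1 - \frac{m}{M} \right) \frac{s_2^2}{n_I m} \]
where
\begin{align*}
s_1^2 &= (n_I - 1)^{-1} \sum_{i \in A_I} (\bar{y}_i - \hat{\bar{Y}})^2 \\
s_2^2 &= n_I^{-1} (m - 1)^{-1} \sum_{i \in A_I} \sum_{j \in A_i} (y_{ij} - \bar{y}_i)^2
\end{align*}
and $\bar{y}_i = \sum_{j \in A_i} y_{ij} / m$.
If $n_I / N_I \doteq 0$, then $\hat{V}(\hat{\bar{Y}}) = s_1^2 / n_I$.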
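A minimal sketch of the point and variance estimators for this equal-size design, using only the Python standard library. The function name and the sample data are hypothetical.

```python
from statistics import mean, variance

def two_stage_srs_estimates(sample, N_I, M):
    """Point and variance estimates for Example 2.

    sample: list of lists; each inner list holds the m sampled y-values
            from one sampled cluster (same m in every cluster, m >= 2).
    N_I, M: number of clusters and common cluster size in the population.
    """
    n_I = len(sample)
    m = len(sample[0])
    ybar_i = [mean(ys) for ys in sample]
    y_hat = mean(ybar_i)                          # sum of y_ij / (n_I m)
    s1_sq = variance(ybar_i)                      # divisor n_I - 1
    s2_sq = mean(variance(ys) for ys in sample)   # pooled within-cluster variance
    f1 = n_I / N_I
    v_hat = (1 - f1) * s1_sq / n_I + f1 * (1 - m / M) * s2_sq / (n_I * m)
    return y_hat, v_hat

# Hypothetical sample: 3 of 30 clusters, 2 of 10 elements in each.
sample = [[1.0, 3.0], [2.0, 6.0], [5.0, 7.0]]
y_hat, v_hat = two_stage_srs_estimates(sample, N_I=30, M=10)
print(y_hat, v_hat)
```

In this example the first term contributes 1.2 and the second about 0.053, which illustrates why the second term is often dropped when $n_I / N_I$ is small.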
Example 3: Two-stage PPS sampling
Sampling design
1. Stage One: PPS sampling of $n_I$ clusters with measure of size (MOS) $= M_i$.
2. Stage Two: SRS of $m$ elements in each selected cluster.
Estimation of mean
\[ \hat{\bar{Y}}_{PPS} = \frac{1}{n_I} \sum_{k=1}^{n_I} z_k \]
where $z_k = \hat{t}_i / M_i$ if cluster $i$ is selected in the $k$-th PPS draw and
\[ \hat{t}_i = \frac{M_i}{m} \sum_{j \in A_i} y_{ij}. \]
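Example 3 (Cont’d)
We can express
\[ \hat{\bar{Y}}_{PPS} = \frac{1}{n_I m} \sum_{i \in A_I} \sum_{j \in A_i} y_{ij}. \]
Self-weighting design: equal weights
Variance estimation
\[ \hat{V}(\hat{\bar{Y}}_{PPS}) = \frac{1}{n_I} s_z^2 \]
where
\[ s_z^2 = \frac{1}{n_I - 1} \sum_{k=1}^{n_I} (z_k - \bar{z}_n)^2 \]
and $z_k = \hat{t}_i / M_i = \hat{\bar{Y}}_i$ if cluster $i$ is selected in the $k$-th PPS draw.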
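The design and both estimators fit in a short Python sketch. It assumes that the PPS draws are made with replacement (which is what the $s_z^2 / n_I$ variance estimator suggests) and that a fresh SRS of $m$ elements is taken each time a cluster is drawn. The function name and the population are hypothetical.

```python
import random
from statistics import mean, variance

def pps_two_stage(clusters, n_I, m, seed=None):
    """Two-stage PPS estimate of the mean and its variance estimate.

    clusters: dict {i: list of y-values}; the measure of size is M_i = len(list).
    Assumes with-replacement PPS draws and m <= every M_i.
    """
    rng = random.Random(seed)
    labels = sorted(clusters)
    sizes = [len(clusters[i]) for i in labels]
    # Stage one: n_I draws with probability proportional to M_i.
    draws = rng.choices(labels, weights=sizes, k=n_I)
    z = []
    for i in draws:
        # Stage two: SRS of m elements within the drawn cluster.
        A_i = rng.sample(clusters[i], m)
        t_hat = len(clusters[i]) / m * sum(A_i)
        z.append(t_hat / len(clusters[i]))      # z_k equals the sample mean
    return mean(z), variance(z) / n_I

# Hypothetical population of four clusters with unequal sizes.
U = {0: [1.0, 2.0, 3.0], 1: [4.0, 5.0, 6.0, 7.0, 8.0],
     2: [2.0, 2.0, 9.0, 1.0], 3: [3.0, 5.0, 4.0, 6.0, 7.0, 2.0]}
y_pps, v_hat = pps_two_stage(U, n_I=3, m=2, seed=42)
print(y_pps, v_hat)
```

Because each $z_k$ is just the sample mean of the drawn cluster, the point estimate is the plain average of all sampled values.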
Very popular in practice for the following reasons:
1. $p_i \propto M_i$: efficient
2. Point estimation is easy (self-weighting)
3. Interviewer workloads are equal
4. Simple variance estimation
5. Works for multi-stage sampling designs