
Initiatie Wetenschappelijk Onderzoek (Introduction to Scientific Research)
Biostatistics, Year 1
Geert Molenberghs
m.m.v. Geert Verbeke
Interuniversity Institute for Biostatistics and statistical Bioinformatics (I-BioStat)
KU Leuven (& UHasselt), Belgium
www.ibiostat.be &
www.kuleuven.ac.be/biostat/
Contents
1 Some References
Part I: Fundamental Concepts
2 Introductory material
3 What is statistics?
4 Summary statistics
5 Confidence intervals & hypothesis testing
6 Use and misuse of statistics
7 Data Structures and Types
Part II: Contingency Tables
8 Contingency Tables
Part III: t Test
9 Comparing Groups with Continuous Outcomes: the t-test
Part IV: Linear Regression
10 Introduction
11 Simple (Single) Linear Regression
12 Model Diagnostics
13 Influential Observations
Part V: Analysis of Variance
14 1-way ANOVA
Part VI: Logistic Regression
15 Logistic Regression
16 Use of Logistic Regression
17 Case Study: Ille-et-Villaine
Chapter 1
Some References
General
• Bailar, J.C. and Mosteller, F. (1992) Medical Uses of Statistics. Boston: NEJM Books.
• Chatterjee, S., Handcock, M.S., and Simonoff, J.S. (1995) A Casebook for a First Course in Statistics and Data
Analysis. New York: John Wiley.
• Dunn, G. and Everitt, B. (1995) Clinical Biostatistics. London: Arnold.
• Everitt, B. and Dunn, G. (1998) Statistical Analysis of Medical Data. London: Arnold.
• Hill, A.B. (1977) A Short Textbook of Medical Statistics. 10th ed. Philadelphia: J.B. Lippincott Co.
• Pagano, M. and Gauvreau, K. (1993) Principles of Biostatistics. Belmont: Duxbury Press.
• Rosner, B. (1995) Fundamentals of Biostatistics. Belmont: Duxbury Press.
Part I
Fundamental Concepts
Chapter 2
Introductory material
. Motivation
. Course material
2.1
Motivation
• Statistics in the (bio-)medical literature
• Correct analysis of collected data
• Correct interpretation of results
2.2
Course material
• Copies of the course notes
• Papers from (bio-)medical literature
• Vestac JAVA applets
. Online:
http://ucs.kuleuven.be/links/index.htm
. Local installation:
http://ucs.kuleuven.be/java/download/download.html
and follow instructions
Chapter 3
What is statistics ?
. Example
. Population – sample
. Random variability
3.1
Example: Captopril data
• 15 patients with hypertension
• The response of interest is the supine blood pressure, before and after treatment
with CAPTOPRIL
• Research question:
How does treatment affect BP ?
• Dataset ‘Captopril’ (blood pressure in mm Hg):

             Before            After
Patient    SBP     DBP       SBP     DBP
   1       210     130       201     125
   2       169     122       165     121
   3       187     124       166     121
   4       160     104       157     106
   5       167     112       147     101
   6       176     101       145      85
   7       185     121       168      98
   8       206     124       180     105
   9       173     115       147     103
  10       146     102       136      98
  11       174      98       151      90
  12       201     119       168      98
  13       198     106       179     110
  14       148     107       129     103
  15       154     100       131      82
Average    176.9   112.3     158.0   103.1
• It would be of interest to know how likely the observed changes in BP are to occur
by pure chance.
• If this is very unlikely, the above data provide evidence that BP indeed decreases
after treatment with Captopril. Otherwise, the above data do not provide evidence
for efficacy of Captopril.
• Obviously, we are not interested in drawing conclusions about the 15 observed
patients only.
• Instead, we would like to draw conclusions about the effect of Captopril on the
total population of all hypertensive patients.
• Conclusion:
Statistics aims at drawing conclusions about some population,
based on what has been observed in a random sample
[Figure: a RANDOM SAMPLE is drawn from the POPULATION; STATISTICS leads from the effect of Captopril in 15 patients to the effect of Captopril in the population]
3.2
Population versus random sample
• Population: Hypothetical group of current and future subjects, with a specific
condition, about which conclusions are to be drawn
• Sample: Subgroup from the population on which observations will be taken
• In order for effects observed in the sample to be generalizable to the total
population, the sample should be taken at random
3.3
The aim of statistics
• The aim of statistics is twofold:
. Descriptive statistics: Summarizing and describing observed data such that
the relevant aspects are made explicit.
. Inferential statistics: Studying to what extent observed trends/effects can
be generalized to a general (infinite) population
• Examples of descriptive statistics include tables, graphs, calculation of
averages, ...
• Valid inferential statistics requires a strong link between the sample and the
population about which one wishes to draw conclusions.
• Valid inferential statistics requires:
. Correct statistical methodology
. Correct interpretation of results
Chapter 4
Summary statistics
. Introduction
. Measures of location
. Measures of spread
. Percentages
. Example from the biomedical literature
4.1
Introduction
[Figure: dot plots of three samples, labelled A, B and C]
A and B have the same location but different spread
A and C have the same spread but different location
4.2
Measures of location
• Location measures:
Where are the observations more or less located ?
• As an example, consider the small sample:
1, 3, 3, 4, 5, 14
• Sample average (sample mean):
$$\bar{x} = \frac{x_1 + \ldots + x_n}{n} = \frac{1 + 3 + 3 + 4 + 5 + 14}{6} = 5$$
• The sample median is the middle observation (with an even number of observations, the average of the two middle ones):
1, 3, [3, 4], 5, 14 −→ median = (3 + 4)/2 = 3.5
• The sample mode is the value that was observed the most often:
1, 3, 3, 4, 5, 14 −→ mode = 3
• Note that the sample average is very sensitive to outliers:
1, 3, 3, 4, 5, 14 −→ 5
1, 3, 3, 4, 5, 20 −→ 6
1, 3, 3, 4, 5, 26 −→ 7
• This is not the case with the sample median:
1, 3, 3, 4, 5, 14 −→ 3.5
1, 3, 3, 4, 5, 20 −→ 3.5
1, 3, 3, 4, 5, 26 −→ 3.5
• The mode is not always informative:
[Figure: distribution with the mode indicated]
• For symmetric data, the average and the median are the same. In general, they
are not:
[Figure: two distributions. Symmetric: Median = Mean. Skewed: Median and Mean differ]
• With skewed data, the mean can be heavily influenced by the random presence of
a/some extreme observation(s).
• In order to still get a good idea about the location of the data, one then prefers
the use of the median over the mean:
Symmetric data =⇒ Mean
Skewed data =⇒ Median
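The three location measures, and the outlier behaviour described above, can be checked with a few lines of Python. This is only an illustrative sketch using the standard `statistics` module; the variable names are chosen here and are not part of the course material.

```python
from statistics import mean, median, mode

# The small sample used in this section.
sample = [1, 3, 3, 4, 5, 14]

sample_mean = mean(sample)      # (1 + 3 + 3 + 4 + 5 + 14) / 6
sample_median = median(sample)  # average of the two middle values, 3 and 4
sample_mode = mode(sample)      # most frequently observed value

# Replace the largest observation by more extreme values:
# the mean follows the outlier, the median stays where it was.
for largest in (14, 20, 26):
    data = [1, 3, 3, 4, 5, largest]
    print(largest, mean(data), median(data))
```

Only the last observation changes between the three runs, which isolates the effect of a single extreme value on each measure.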
4.3
Measures of spread
• Obviously, a measure of location only summarizes one specific aspect of the
observed data:
“Statistician drowning in a lake of average depth 0.5m”
• Measures of spread:
How similar are the observations ?
[Figure: two alternative configurations of the observations x_1, ..., x_n around the sample mean]
• As an example, re-consider the small sample:
1, 3, 3, 4, 5, 14
• Mean deviation from the mean (always 0, since positive and negative deviations cancel, hence not useful as a measure of spread):
$$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x}) = \frac{-4 - 2 - 2 - 1 + 0 + 9}{6} = \frac{0}{6} = 0$$
• Mean quadratic deviation from the mean:
$$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{(-4)^2 + (-2)^2 + (-2)^2 + (-1)^2 + 0^2 + 9^2}{6} = \frac{106}{6} = 17.67$$
• Sample variance:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{(-4)^2 + (-2)^2 + (-2)^2 + (-1)^2 + 0^2 + 9^2}{5} = \frac{106}{5} = 21.2$$
• Note that the units of the sample variance and the mean quadratic deviation are
the squared units of the original observations
• The sample standard deviation is in the same units as the original
observations:
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{21.2} = 4.60$$
• Sample range:
$$R = \max_i x_i - \min_i x_i = 14 - 1 = 13$$
• Note that the range strongly depends on the sample size n: Larger samples are
more likely to contain extreme observations, hence are more likely to have a larger
range
• Since we hope that our measure of spread reflects the amount of variation in the
population, we prefer a measure that does not depend on the sample size.
• The sample interquartile range is the range obtained after deletion of the 25%
highest and 25% lowest values in the sample (rounded down if needed):
1, 3, 3, 4, 5, 14 −→ 3, 3, 4, 5 −→ IQR = 5 − 3 = 2
• The interquartile range does not depend on the sample size n, since a larger
number of observations is deleted in larger samples.
• The variance (hence also mean quadratic deviation and standard deviation), and
the range are very sensitive to outliers:
1, 3, 3, 4, 5, 14 −→ s2 = 21.2, R = 13
1, 3, 3, 4, 5, 20 −→ s2 = 48.8, R = 19
1, 3, 3, 4, 5, 26 −→ s2 = 88.4, R = 25
• This is not the case with the interquartile range:
1, 3, 3, 4, 5, 14 −→ IQR = 2
1, 3, 3, 4, 5, 20 −→ IQR = 2
1, 3, 3, 4, 5, 26 −→ IQR = 2
• With skewed data, the standard deviation can be heavily influenced by the random
presence of a/some extreme observation(s).
• In order to still get a good idea about the variation in the data, one then prefers
the use of the interquartile range over the standard deviation:
Symmetric data =⇒ Standard deviation
Skewed data =⇒ IQR
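As a sketch of how these measures of spread can be computed, the Python code below follows the definitions used in this section. The helper names `trimmed_iqr` and `summarize` are made up for this illustration. Note that the interquartile range is implemented exactly as described here (drop the lowest and highest 25%, rounded down); statistical software often uses slightly different quartile definitions and may return a different value.

```python
from math import sqrt
from statistics import pvariance, variance

def trimmed_iqr(values):
    """IQR as defined in these notes: delete the 25% lowest and the 25%
    highest observations (count rounded down), then take the range."""
    ordered = sorted(values)
    k = len(ordered) // 4                 # 25% of n, rounded down
    kept = ordered[k:len(ordered) - k]
    return kept[-1] - kept[0]

def summarize(values):
    s2 = variance(values)                 # divides by n - 1
    return {
        "mean_quadratic_deviation": pvariance(values),  # divides by n
        "variance": s2,
        "sd": sqrt(s2),
        "range": max(values) - min(values),
        "iqr": trimmed_iqr(values),
    }

for largest in (14, 20, 26):
    print(largest, summarize([1, 3, 3, 4, 5, largest]))
```

Running the loop shows the variance and the range growing with the extreme observation while the interquartile range stays at 2.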
4.4
Percentages
• Traditionally, measurements are summarized by a measure of location and a
measure of spread
• However, suppose the variable of interest is ‘sickness absence’
• For each subject i in the sample, we define x_i as:
$$x_i = \begin{cases} 1 & \text{if subject } i \text{ was absent due to illness} \\ 0 & \text{otherwise} \end{cases}$$
• The sample average equals
$$\bar{x} = \frac{x_1 + x_2 + \ldots + x_n}{n} = \frac{\text{Number of people with sickness absence}}{n}$$
• Hence, the average equals the observed proportion (percentage) of people with
sickness absence
• Note that, once the average is known, the number of zeroes and ones is known,
hence also the variability:
[Figure: three samples of binary (0/1) observations x1, ..., x6, with averages x = 0.5, x = 0.16 and x = 0.84]
• One can show that the variance is obtained as
$$s^2 = \frac{n}{n-1}\,\bar{x}\,(1 - \bar{x})$$
• Since the variance follows directly from the average, only the average is reported, without a
measure of spread
• For example, the variable ‘sickness absence’ could be summarized as follows:

Variable              (n = 256)
Sickness absence:
   Yes                103 (40.23%)
   No                 153 (59.77%)
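The relation between the average and the variance of a 0/1 variable can be verified numerically. The sketch below builds a hypothetical 0/1 coding that matches the summary counts above (103 subjects with sickness absence out of 256); the individual data are not given in the notes, only the counts.

```python
from statistics import mean, variance

# Hypothetical 0/1 coding consistent with the summary: 103 "yes", 153 "no".
x = [1] * 103 + [0] * 153
n = len(x)

proportion = mean(x)                 # sample average = observed proportion
s2_direct = variance(x)              # usual sample variance (divides by n - 1)
s2_formula = n / (n - 1) * proportion * (1 - proportion)

print(f"proportion = {proportion:.4f}")
print(f"variance (direct)  = {s2_direct:.6f}")
print(f"variance (formula) = {s2_formula:.6f}")
```

Both variance computations agree, which is why reporting the percentage alone loses no information about spread.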
4.5
Example from the biomedical literature
Wong et al. , Table 1 (first part):
. Means and standard deviations
. Medians and IQR’s
. Percentages
Chapter 5
Confidence intervals & hypothesis testing
. Random variability
. Confidence intervals
. Interpretation of confidence intervals
. Hypothesis testing
. Hypothesis testing versus confidence intervals
. Examples from biomedical literature
5.1
Random variability
• Descriptive statistics of the observed differences in diastolic BP, after treatment
with Captopril, in 15 subjects:

Patient    Before DBP    After DBP    Change
   1          130           125          5
   2          122           121          1
   3          124           121          3
   4          104           106         −2
   5          112           101         11
   6          101            85         16
   7          121            98         23
   8          124           105         19
   9          115           103         12
  10          102            98          4
  11           98            90          8
  12          119            98         21
  13          106           110         −4
  14          107           103          4
  15          100            82         18
• Note that not all subjects experience the same benefit from the treatment
• An average decrease of 9.27 mmHg is observed in our sample
• A new, similar, experiment would lead to another sample, hence to another
observed change in BP:
. More reduction (11.57 mmHg) ?
. Less reduction (4.78 mmHg) ?
. No change (0.00 mmHg) ?
. Increase (-5.23 mmHg) ?
• This shows that the observed decrease of 9.27 mmHg should not be
overinterpreted
• This also shows that one should not hope that 9.27 mmHg is the gain in BP one
would observe if the total population were treated with Captopril.
• Let µ be the average change in BP one would observe if the total population
were treated
• 9.27 mmHg can then be interpreted as an estimate for µ, based on our sample
• Question:
Is our observed change of 9.27 mmHg sufficient evidence to
conclude that the treatment really affects the BP ?
• Answer:
Confidence intervals & Hypothesis testing
[Figure: a RANDOM SAMPLE is drawn from the POPULATION; STATISTICS leads from the observed effect of 9.27 mmHg in 15 randomly selected patients to the question: is µ different from 0?]
5.2
The confidence interval
• The estimate 9.27 mmHg for µ is based on this particular sample
• Repeating the experiment would lead to a different estimate for µ
• Hence, we should not expect µ to be exactly equal to 9.27 mmHg
• A confidence interval is an interval around 9.27 mmHg which is likely to contain
the unknown population average µ
• For example, a 95% confidence interval for µ:
[4.91; 13.63], an interval around the estimate 9.27
• The percentage 95% is called the confidence level
• Confidence intervals for other confidence levels:

Level    Confidence interval
35%      [8.27; 10.27]
63%      [7.27; 11.27]
82%      [6.27; 12.27]
95%      [4.91; 13.63]
99%      [3.02; 15.52]
• In biomedical sciences, one traditionally uses 95% confidence levels
• Ideally, C.I.’s are small, as this reflects a very precise estimation of the unknown
population parameter µ
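The notes do not state how the interval was computed. As one possible sketch, a confidence interval based on the normal approximation (estimate ± z × standard error) can be computed from the 15 observed changes in diastolic BP; for the 95% level this gives an interval that rounds to the [4.91; 13.63] quoted above. The function name `normal_ci` is an assumption of this example, and no claim is made that this approach reproduces the other rows of the table. A t-based interval, which is common for small samples, would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Observed changes in diastolic BP (before minus after) for the 15 patients.
changes = [5, 1, 3, -2, 11, 16, 23, 19, 12, 4, 8, 21, -4, 4, 18]

def normal_ci(values, level=0.95):
    """Confidence interval for the mean using the normal approximation."""
    n = len(values)
    estimate = mean(values)
    standard_error = stdev(values) / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. about 1.96 for 95%
    return estimate - z * standard_error, estimate + z * standard_error

lower, upper = normal_ci(changes, level=0.95)
print(f"estimate = {mean(changes):.2f} mmHg")
print(f"95% C.I. = [{lower:.2f}; {upper:.2f}]")
```

Calling `normal_ci` with a higher `level` widens the interval, in line with the remark that the length of a C.I. increases with the confidence level.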
• The length of the C.I. increases with the confidence level:

Level    Confidence interval
95%      [4.91; 13.63]
99%      [3.02; 15.52]
• Intuitively: larger intervals are more likely to contain the unknown population
parameter µ
• The length of the C.I. decreases with the sample size n
• Intuitively: More observations lead to more precision:
One can ‘buy’ extra precision with extra observations
• What about 100% C.I.’s ?
• The 100% C.I. for µ equals [−∞; +∞], which is not informative at all
• Intuitively: Absolute certainty about
population characteristics cannot be attained based on a finite sample of observations
5.3
Interpretation of the confidence interval
• Let us focus on the 95% confidence interval. For other confidence levels, the
interpretation is similar.
• For a specific data set, such as the Captopril data, the obtained confidence
interval [4.91; 13.63] may or may not contain µ.
• However it is very likely to contain µ, since only 5 out of 100 data sets would lead
to an interval not containing µ.
• Illustration: Vestac Java Applet → statistical tests → confidence interval for mean
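In the same spirit as the applet, the repeated-sampling interpretation can be illustrated with a small simulation. This is only a sketch: the population is assumed to be normal, and the values of `mu`, `sigma`, `n`, `repeats` and `seed` are hypothetical choices made for the illustration, not quantities from the course. Each simulated data set yields its own interval, and the function counts how often the interval contains the true µ.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def coverage(mu=9.0, sigma=8.0, n=50, level=0.95, repeats=1000, seed=1):
    """Fraction of simulated samples whose confidence interval contains mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        half_width = z * stdev(sample) / sqrt(n)
        if abs(mean(sample) - mu) <= half_width:
            hits += 1
    return hits / repeats

estimated_coverage = coverage()
print(f"Intervals containing mu: {100 * estimated_coverage:.1f}%")
```

The printed percentage should be close to the nominal 95%. With very small samples, the normal quantile used here gives slightly less than nominal coverage, which is one reason t-based intervals are often used in practice.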
5.4
Hypothesis testing
• As before, µ is the average change in diastolic BP one would observe if the total
population of hypertensive patients would be treated with Captopril.
• Note that µ will never be known, but we can use our sample to learn about µ.
• In case the treatment would have no effect, the average µ would be zero.
• So, if one can show that there is (strong) evidence that µ ≠ 0, then this can be
considered as evidence for a treatment effect.
• Based on our sample of 15 observations, we estimated µ by µ̂ = 9.27 mmHg.
• Obviously, this estimate is relatively far away from 0, suggesting that the
treatment might affect BP
• On the other hand, the observed effect µ̂ = 9.27 could have occurred by pure
chance, even if there would be no treatment effect at all.
• Question:
How likely would that be ?
• Only if this would be very unlikely to happen, the observed data will be considered
sufficient evidence for some effect of the treatment
• The procedure to decide whether there is sufficient evidence to believe the
treatment did affect BP is called a test of hypothesis (hypothesis test)
• In practice, the research question is formulated in terms of a null hypothesis H0
and an alternative hypothesis HA:
H0 : µ = 0 versus HA : µ ≠ 0
• Based on our observed data, we will investigate whether H0 can be rejected in
favour of HA
• If not, the null hypothesis H0 is accepted and one decides that the treatment
was not effective
• Intuitively, it is obvious that H0 : µ = 0 will be rejected if the observed sample
average µ̂ is too far away from 0
• Question:
How far is too far ?
• Answers:
If this result is very unlikely to happen by pure chance
If this result is not at all what you expect to see if µ would be 0
• One can calculate that, if Captopril had no effect at all, there would be only a
0.1% chance of observing a sample with an average change in BP at least as big as
9.27 mmHg.
• Hence, if Captopril would have no effect (i.e., if µ = 0), then it would be very
unlikely to observe a sample with average as extreme as 9.27. This would happen
only once every 1000 times a similar experiment would be performed.
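The notes do not say which test produced the 0.1%. One way to make the idea of "how likely by pure chance" concrete, without any distributional formula, is a sign-flip randomization test, sketched below as a swapped-in illustration rather than the method used in the course. If the treatment had no effect, each patient's change would be equally likely to be positive or negative, so all 2^15 sign patterns are equally likely. The p-value is the fraction of patterns whose average change is at least as extreme as the observed one. For these data it comes out close to the 0.1% quoted above.

```python
from itertools import product

# Observed changes in diastolic BP (before minus after) for the 15 patients.
changes = [5, 1, 3, -2, 11, 16, 23, 19, 12, 4, 8, 21, -4, 4, 18]

def sign_flip_p_value(values):
    """Two-sided exact sign-flip test of H0: mean change = 0."""
    observed = abs(sum(values))
    magnitudes = [abs(v) for v in values]
    extreme = 0
    total = 0
    # Enumerate every assignment of + and - signs to the magnitudes.
    for signs in product((1, -1), repeat=len(magnitudes)):
        total += 1
        flipped_sum = sum(s * m for s, m in zip(signs, magnitudes))
        if abs(flipped_sum) >= observed:
            extreme += 1
    return extreme / total

p_value = sign_flip_p_value(changes)
print(f"p = {p_value:.5f}")
```

Comparing sums instead of averages gives the same answer, because every sign pattern has the same number of observations.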
• We therefore consider the data observed in our experiment sufficient evidence to
reject the null hypothesis and we conclude that the treatment effect is
significantly different from 0, or equivalently, that there is a significant
treatment effect
• The probability 0.1% that expresses how extreme our observations are in case the
null hypothesis would be true, is denoted by p, and is called the p-value.
• A small p-value indicates that the observed results would be extreme if H0 were true. One then rejects
the null hypothesis
• A large p-value indicates that the observed results are in line with
what one can expect to observe if H0 is true. One then does not reject the
null hypothesis, which is equivalent to accepting the null hypothesis
• In practice, one has to decide how small p should get before the null hypothesis is
rejected.
• One therefore specifies the so-called level of significance α:
p < α =⇒ reject H0
p ≥ α =⇒ accept H0
• α is typically a small value, such as 0.01, 0.05, 0.10
• In biomedical sciences α = 0.05 = 5% is standard.
• One then rejects the null hypothesis as soon as a result at least as extreme as the observed one
would happen fewer than 5 times in 100 experiments, assuming that
the null hypothesis is correct
5.5
Hypothesis testing versus confidence intervals
• For the Captopril data, we have drawn conclusions about the average treatment
effect in the population, through 2 different statistical procedures:
. 95% confidence interval: [4.91; 13.63]
. Significance of treatment effect, p = 0.001
• We know from the C.I. that the average treatment effect is likely to be between
4.91 and 13.63, excluding 0
• The significance test has rejected the value 0 as possible value for µ
• So, both procedures agree
• Question:
Do both procedures always agree ?
• Answer:
Yes, provided the levels of significance and
confidence are complementary to each other:

Level of significance α    Confidence level (1 − α)100%
0.05                       95%
0.10                       90%
0.01                       99%
• In case of accepting H0 (p ≥ α = 0.05):
[Figure: the 95% C.I. around the estimate x contains the value specified by H0]
• In case of rejecting H0 (p < α = 0.05):
[Figure: the 95% C.I. around the estimate x does not contain the value specified by H0]
• An alternative interpretation for the C.I. follows immediately:
A 95% C.I. is the collection of all null
hypotheses that would be accepted in a
statistical test
• Statistical tests are to some extent equivalent to C.I.’s
• However, C.I.’s have the advantage of giving an indication of the effect size
(treatment estimate µ̂), as well as of the precision of estimation (width of C.I.)
• So, C.I.’s should be preferred over statistical tests
↔ Biomedical literature
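The agreement between a test at level α and a (1 − α)100% confidence interval can be checked numerically. The sketch below uses the normal approximation for both procedures (an assumption of this example, so the p-values need not equal those quoted in the notes) and tries several hypothetical null values `mu0`: the test rejects H0 : µ = mu0 exactly when mu0 lies outside the interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Observed changes in diastolic BP (before minus after) for the 15 patients.
changes = [5, 1, 3, -2, 11, 16, 23, 19, 12, 4, 8, 21, -4, 4, 18]

def standard_error(values):
    return stdev(values) / sqrt(len(values))

def normal_ci(values, level):
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * standard_error(values)
    return mean(values) - half_width, mean(values) + half_width

def z_test_p_value(values, mu0):
    """Two-sided p-value for H0: mu = mu0, normal approximation."""
    z = (mean(values) - mu0) / standard_error(values)
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
lower, upper = normal_ci(changes, level=1 - alpha)
results = []
for mu0 in (0, 4, 5, 9, 13, 14):        # hypothetical null values
    rejected = z_test_p_value(changes, mu0) < alpha
    outside = not (lower <= mu0 <= upper)
    results.append((mu0, rejected, outside))
    print(mu0, "rejected" if rejected else "accepted",
          "outside C.I." if outside else "inside C.I.")
```

In every row, "rejected" coincides with "outside C.I.", which is the sense in which a 95% C.I. collects all null hypotheses that would be accepted at the 5% level.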
5.6
Example from the biomedical literature
Wong et al.
. Section on statistical methodology:
. Two-sided tests
. 5% level of significance
. Table 2:
. C.I.’s for differences
between means and medians
. Corresponding tests for
significance
Chapter 6
Use and misuse of statistics
. Errors in statistics
. Two types of errors
. Multiple testing
. Equivalence tests
. Significance versus relevance
6.1
Possible errors in decision making
• In our example about the Captopril treatment, we obtained p = 0.001 leading to
the rejection of the null hypothesis of no treatment effect.
• This should not be considered as formal proof that there is a treatment effect
• Even if the treatment has no effect at all, a sample like ours would occur once
every 1000 times.
• Maybe, our sample was indeed the extreme one that happens once every thousand
experiments.
• Alternatively, suppose we would have obtained p = 0.9812. We then would not
have rejected the null hypothesis, and concluded that there is no evidence for any
treatment effect.
• This should not have been considered as formal proof that any treatment effect
would be absent.
• Maybe, the treatment effect µ is not 0, but very close to 0. The data one then
would observe would look very similar to data that would be observed if µ = 0,
such that the data do not allow one to detect that µ ≠ 0
• Conclusion:
“Statistics can prove everything”
• Intuitively: Absolute certainty about
population characteristics cannot be
attained based on a finite sample of observations
6.2
Two types of errors

                           Reality
Test result     H0 correct      H0 not correct
Accept H0       No error        Type II error
Reject H0       Type I error    No error
• Type I error: H0 is incorrectly rejected
• Type II error: H0 is incorrectly accepted
• The probability of making a type I error equals the level of significance α,
specified by the user.
• In biomedical sciences α = 5% is often used, thereby accepting a type I
error in 5% of the cases in which H0 is correct.
• For a fixed α level, the probability of making a type II error can only be controlled
by taking a sufficiently large sample.
• This calls for sample size calculations or, equivalently, power calculations
• The power of a test is the probability of correctly rejecting H0 , i.e., 1 minus the
probability of making a type II error.
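To show how power depends on the sample size, here is a minimal sketch for a two-sided test of H0 : µ = 0 under simplifying assumptions that are not from the notes: the outcome is normally distributed with a known standard deviation `sigma`, and the true effect equals `delta`. The function name and all numerical values are hypothetical. Under these assumptions the power equals Φ(−z + δ√n/σ) + Φ(−z − δ√n/σ), with z the critical value for level α.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a two-sided z-test of H0: mu = 0 when the true mean is delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

# Hypothetical scenario: true effect 5 mmHg, standard deviation 8 mmHg.
for n in (5, 10, 15, 30, 60):
    power = z_test_power(delta=5, sigma=8, n=n)
    print(f"n = {n:2d}: power = {power:.2f}, P(type II error) = {1 - power:.2f}")
```

With `delta=0` the function returns α, the type I error probability, and for any non-zero `delta` the power grows with n, which is the basis of sample size calculations.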
6.3
Multiple testing
• Each time a test is performed, there is probability α of making a type I error
• For example, if α = 0.05, we can expect to incorrectly reject the null hypothesis in
5 out of 100 times.
• Implication:
“The more tests one performs, the higher the probability
that something is detected by pure chance”
• This problem of multiple testing occurs very frequently in bio-medical sciences,
in various settings
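How quickly chance findings accumulate can be sketched as follows. The calculation assumes that all null hypotheses are true and that the tests are independent, which is an idealization not stated in the text; the function names are hypothetical.

```python
def prob_at_least_one_false_positive(m, alpha=0.05):
    """P(at least one type I error) among m independent tests of true nulls."""
    return 1 - (1 - alpha) ** m


def expected_false_positives(m, alpha=0.05):
    """Expected number of type I errors among m tests of true nulls."""
    return m * alpha


# 18, 63 and 100 are the numbers of tests in the examples of this section.
for m in (1, 18, 63, 100):
    print(m,
          round(prob_at_least_one_false_positive(m), 3),
          round(expected_false_positives(m), 2))
```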
6.3.1
Example: A classroom experiment
• On entry in the classroom, assign each student at random to be seated at the left
or at the right side of the classroom
• Compare both sides with respect to 100 aspects including weight, height, age,
gender, color of hair, color of eyes,. . .
• It is to be expected that for roughly 5 of these outcomes, a significant difference is
obtained at the 5% level of significance, by pure chance.
6.3.2
Example: Testing many relations
• Amin et al., Table 2:
. 18 tests performed
. 2 significant results
6.3.3
Example: Subgroup analyses
• Kaplan et al., Table 5:
. Tests based on C.I.’s for odds ratios
. C.I. containing 1 is equivalent to a
non-significant test result
. 21 × 3 = 63 tests performed
. 5 significant results
6.3.4
Example: Searching for the most significant results
• This ‘scientific finding’ was printed in the Belgian newspapers:
• It was even stated that those who wake up before 7.21am have a statistically
significant higher stress level during the day than those who wake up after 7.21am.
6.3.5
Conclusion
• Significant results obtained by multiple testing are often overinterpreted
• If the number of tests is reported, the reader knows that such results need to be
interpreted with extreme care
• The problem arises when only the significant results are reported, and one does
not know how many tests were performed in total
• This leads to reporting results which turn out to be not reproducible
• For example, a new study would not find that students seated on the left are taller
than those on the right. Instead, students seated on the left may weigh more than
those seated on the right.
• For example, a new experiment might show no difference in stress levels between
subjects waking up early and those waking up late. Or maybe a difference would
be found only when waking up is later than 8.12am.
6.4
Equivalence tests
• Suppose two groups A and B are to be compared, and a test is used to test
H0 : µA = µB versus HA : µA ≠ µB
• In case of a non-significant test result, one often concludes that both groups are
identical or equivalent
• An alternative interpretation is that the experiment did not have sufficient power
to show an effect which is present.
• Conclusion:
Non-significance should not be interpreted as equivalence
• This can also be seen from the fact that, if the test could be used to show
equivalence, it would be best to collect data on (extremely) small samples, as this
would increase the chance of obtaining a non-significant result, due to lack of power.
• Instead, one should reverse H0 and HA:
H0 : |µA − µB| > ∆ versus HA : |µA − µB| ≤ ∆
where ∆ is a pre-specified constant, defining ‘equivalence’
• Obviously, the result of the equivalence test entirely depends on the choice of ∆
• Therefore, ∆ needs to be specified prior to the data collection
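One common way to carry out such a reversed test is the confidence-interval inclusion rule (equivalent to two one-sided tests): equivalence is claimed at level α when the (1 − 2α) confidence interval for µA − µB lies entirely within (−∆, ∆). This rule is not described in the text; the sketch below assumes it, together with a normal approximation, and all names and numbers are hypothetical.

```python
from statistics import NormalDist


def equivalence_shown(diff, se, delta, alpha=0.05):
    """True if the (1 - 2*alpha) normal-approximation confidence interval
    for the mean difference lies strictly inside (-delta, delta)."""
    z = NormalDist().inv_cdf(1 - alpha)
    lower, upper = diff - z * se, diff + z * se
    return -delta < lower and upper < delta


# Same observed difference, same delta, different precision:
print(equivalence_shown(diff=0.0, se=0.1, delta=1.0))   # precise estimate
print(equivalence_shown(diff=0.0, se=10.0, delta=1.0))  # very small sample
```

Under this rule a small, imprecise sample can no longer "show" equivalence: a large standard error widens the interval beyond ±∆.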
6.5
Example from the biomedical literature
Shatari et al.:
. Title:
. Table 1:
No significant
differences !
. Results and conclusions (abstract):
6.6
Significance versus relevance
• We discussed before that the power to detect some effect ∆ increases with the
sample size
• This implies that any effect ∆, no matter how small, will, sooner or later, be
detected, if the sample is sufficiently large.
• For example, consider the Captopril data, where the observed difference of 9.27
mmHg was found significantly different from zero (p < 0.001), based on data
from 15 patients only:
• Suppose that the observed difference would have been 0.1 mmHg.
• A p-value as small as 0.001 would be likely to be obtained, provided that the
sample would be sufficiently large.
• Obviously, an average change in BP as small as 0.1 mmHg is not relevant from a
clinical point of view.
• Conclusion:
Statistical significance ≠ Clinical relevance
• The p-value cannot distinguish between both situations
• It is therefore important not to blindly overinterpret significant results without
knowing the size of the effect
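The effect of sample size on the p-value for a fixed, clinically irrelevant difference can be sketched as follows. The standard deviation of 9 mmHg is a purely hypothetical value chosen for illustration, and a z-test is used as an approximation; neither comes from the Captopril data.

```python
from math import erfc, sqrt


def two_sided_p(mean_diff, sd, n):
    """Two-sided p-value of a z-test for H0: mean difference = 0."""
    z = abs(mean_diff) * sqrt(n) / sd
    return erfc(z / sqrt(2))  # equals 2 * (1 - Phi(z))


# An observed mean difference of 0.1 mmHg with an assumed SD of 9 mmHg.
for n in (15, 100, 10_000, 100_000, 1_000_000):
    print(n, two_sided_p(0.1, 9.0, n))
```

The observed difference stays at 0.1 mmHg throughout; only the sample size changes, and the p-value eventually drops below any chosen threshold.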
Chapter 7
Data Structures and Types
• Levels of complexity
• Multivariate analysis
• Longitudinal data
• Clustered data
7.1
Levels of Complexity
7.1.1
One-Sample Problem
• The simplest statistical analysis is concerned with a single outcome variable,
recorded for a sample of a homogeneous population.
Yi , i = 1, . . . , N
• Standard procedures include:
. the computation of means or medians (location parameters)
. the computation of standard errors or interquartile ranges (dispersion
parameters).
. For example, the height of a number of human subjects might be recorded.
7.1.2
Two-Sample Problem
• A first level of complexity arises when a variable is recorded for a sample out of
two subgroups (subpopulations) of a larger population (treated and untreated
patients, two species, boys and girls): the two-sample problem:
Ygi , g = 1, 2, i = 1, . . . , N
• A question of interest is whether the means are different in the two populations.
• The outcome variable might still be height, but we would have an explanatory
variable: treatment allocation, or sex. For example, the height of boys can be
compared to the height of girls.
• The outcome variable is often called dependent variable.
• The predictor is often called covariate or independent variable.
• The statistical tools for this data setting include Analysis of Variance (ANOVA), t
test, Wilcoxon test.
• In the previous situation, the independent variable had only two levels: a binary or
dichotomous variable. This is the simplest case.
7.1.3
Regression
• Alternatively, the predictor itself could be a variable with several levels.
• Examples: dose administered in a clinical trial; one of several species of a plant;
race, . . . In addition, it could have an infinite number of levels, just as is the case
with height.
• For example, a baseline height at 7 years of age can be compared to the height at
10 years.
• This leads to a family of models that is frequently referred to as regression models.
Yi = β0 + β1xi + εi , i = 1, . . . , N
• When the dependent variable Yi is continuous (height) one often uses linear
regression.
• The independent variable xi can be continuous, binary, categorical, or discrete.
• The choice of the statistical analysis method is driven by the outcome or
dependent variables, rather than by the predictor variables.
• Should the dependent variable be binary (diseased/non diseased; dead/alive,. . . ),
then one would choose logistic regression rather than linear regression.
7.1.4
Several Predictors
• Up to now, there was only one predictor variable. However, this need not be the
case.
• For instance, both treatment allocation and sex of the human subject might be of
interest.
• Most of the well-known methods extend easily.
• One-way ANOVA extends to two-way or even multi-way ANOVA.
• Simple linear regression or single linear regression extends to multiple regression.
• Most other techniques, such as logistic regression are easily extended to
encompass multiple covariates.
• It has to be noted that, while simple in theory, methods for multiple covariates
require great care.
. Indeed, issues such as collinearity arise only for multiple covariate models.
. Often, not all predictors are on equal footing.
. Often, the relation between an EXPOSURE and a DISEASE is of interest,
while another variable is merely a CONFOUNDER.
          Confounder
          ↙        ↘
Exposure    −→    Disease
• Thus, model building and interpretation of (regression) coefficients require both
expertise as well as subject matter knowledge.
7.1.5
Several Outcome Variables
• The final extension is concerned with the fact that sometimes several
dependent variables are recorded and studied simultaneously.
• In statistics, this is commonly termed multivariate analysis, in contrast to
multiple . . . .
• The medical and epidemiological literature uses “multivariate” when the
statistician would talk about “multiple”. Danger of confusion!
• In conclusion:
multiple refers to several independent variables;
multivariate refers to several dependent variables.
7.2
Multivariate Analysis
• Multivariate analysis refers to a set of techniques which allow for the presence of
more than one outcome variable.
• For example, height and weight might be recorded simultaneously for a group of
boys and girls. Arguably, sex will influence height as well as weight. At the same
time, height and weight are likely to be correlated or associated.
• Remarks:
. association refers to the concept of dependence between two or more variables.
. In contrast, correlation refers to a family of measures that can be computed to
capture association (Pearson & Spearman correlation).
. Especially for categorical data, a million measures of association have been
proposed as alternatives to the correlation (including the odds ratio,
concordance, Kendall’s τ , the κ coefficient,. . . ).
7.3
General Multivariate Setting
• In general, one might have:
. a set of dependent variables, some of which are continuous, discrete,
categorical, binary,. . .
. a set of independent variables, some of which are continuous, discrete,
categorical, binary,. . .
• The most general setting is very hard to study. During the last century, a
multitude of sub-problems of the general problem have been studied.
7.4
Other Correlated Data Settings
• The multivariate setting is but one of the correlated data settings.
• In all of the following situations, there are “many” outcome variables:
. repeated measures
. longitudinal data
. spatial data
. clustered data
Are they related to multivariate analysis? Or special cases?
• The answer is: Yes and No!
7.4.1
Example Situations
1. For each subject in a study, height is recorded.
2. For each family in a study, the height of all sibs is recorded.
3. For each subject in a study, height and weight are recorded.
• Example 1 is clearly univariate and example 3 is clearly multivariate.
• Example 2 is ambiguous:
. There is only one outcome variable: height.
. Each unit (family) yields several values.
7.5
Longitudinal Data: The Vorozole Study
• open-label study in 67 North American centers
• postmenopausal women with metastatic breast cancer
• 452 patients, followed until disease progression/death
• two groups:
vorozole 2.5 mg × 1 ←→ megestrol acetate 40 mg × 4
• several outcomes: response rate, survival, safety,. . .
• focus: quality of life: total Functional Living Index: Cancer (FLIC);
a higher score is more desirable
7.6
The Depression Trial
• Clinical trial: experimental drug versus standard drug
• 170 patients
• Response: change versus baseline in HAM D17 score
• 5 post-baseline visits: 4–8
[Figure: change versus baseline (vertical axis “Change”) plotted against visit 4–8
(horizontal axis “Visit”), in two panels: Standard Drug and Experimental Drug]
7.7
Age-related Macular Degeneration Trial
• Pharmacological Therapy for Macular Degeneration Study Group (1997)
• An ocular pressure disease which makes patients progressively lose vision
• 240 patients enrolled in a multi-center trial (190 completers)
• Treatment: Interferon-α (6 million units) versus placebo
• Visits: baseline and follow-up at 4, 12, 24, and 52 weeks
• Continuous outcome: visual acuity: # letters correctly read on a vision chart
• Binary outcome: visual acuity versus baseline ≥ 0 or ≤ 0
• Missingness:
                            Measurement occasion
                            4 wks   12 wks   24 wks   52 wks   Number       %
Completers                    O       O        O        O        188    78.33
Dropouts                      O       O        O        M         24    10.00
                              O       O        M        M          8     3.33
                              O       M        M        M          6     2.50
                              M       M        M        M          6     2.50
Non-monotone missingness      O       O        M        O          4     1.67
                              O       M        M        O          1     0.42
                              M       O        O        O          2     0.83
                              M       O        M        M          1     0.42
7.8
The Analgesic Trial
• single-arm trial with 530 patients recruited (491 selected for analysis)
• analgesic treatment for pain caused by chronic nonmalignant disease
• treatment was to be administered for 12 months
• we will focus on Global Satisfaction Assessment (GSA)
• GSA scale goes from 1=very good to 5=very bad
• GSA was rated by each subject 4 times during the trial, at months 3, 6, 9, and 12.
• Research questions:
. Evolution over time
. Relation with baseline covariates: age, sex, duration of the pain, type of pain,
disease progression, Pain Control Assessment (PCA), . . .
. Investigation of dropout
• Frequencies:
GSA     Month 3         Month 6         Month 9         Month 12
1        55   14.3%      38   12.6%      40   17.6%      30   13.5%
2       112   29.1%      84   27.8%      67   29.5%      66   29.6%
3       151   39.2%     115   38.1%      76   33.5%      97   43.5%
4        52   13.5%      51   16.9%      33   14.5%      27   12.1%
5        15    3.9%      14    4.6%      11    4.9%       3    1.4%
Tot     385             302             227             223
• Missingness:
                            Measurement occasion
                            Month 3   Month 6   Month 9   Month 12   Number       %
Completers                     O         O         O         O         163     41.2
Dropouts                       O         O         O         M          51    12.91
                               O         O         M         M          51    12.91
                               O         M         M         M          63    15.95
Non-monotone missingness       O         O         M         O          30     7.59
                               O         M         O         O           7     1.77
                               O         M         O         M           2     0.51
                               O         M         M         O          18     4.56
                               M         O         O         O           2     0.51
                               M         O         O         M           1     0.25
                               M         O         M         O           1     0.25
                               M         O         M         M           3     0.76
7.9
Schematic Representation
• Data structure:
X \ Y         Continuous    Binary            Count            Time-to-event
Binary        ANOVA         χ², Fisher        χ², . . .        Kaplan-Meier
Continuous    lin. regr.    logistic regr.    Poisson regr.    Cox PH, . . .
• Goal:
. Estimation
. Inference (s.e., confidence interval, hypothesis test)
• Paradigm:
. Comparison:
∗ Experimental
∗ Observational
. Representation
Part II
Contingency Tables
Chapter 8
Contingency Tables
. Contingency tables
. 2 × 2 tables
. χ2 test
. Fisher’s exact test
. Extensions
. McNemar’s test
8.1
Contingency Tables
8.1.1
Preliminary Example 1
• For each experimental animal: two variables:
. The experimental animal belongs to the control group or to the treated group.
. The result of a laboratory test is positive (failure) or negative (success).
• Summarize the data in a 2 × 2 contingency table:
                       Response
                       Failure   Success
Group   Control           5         5
        Experimental      5         5
• No difference between the groups.
• No statistical analysis necessary.
8.1.2
Preliminary Example 2
• Consider the table:
                       Response
                       Failure   Success
Group   Control          50         0
        Experimental      0       500
• Difference is immediately clear.
• No statistical analysis necessary.
8.1.3
Preliminary Example 3
• Consider the table:
                       Response
                       Failure   Success
Group   Control           8         2
        Experimental      5         5
• The decision between “significant difference” and “no significant difference” is
not immediately clear.
• There clearly is a difference between the response rates in both groups:
. 20% success in the control group.
. 50% success in the experimental group.
• Is this due to:
. random noise?
. a systematic difference, related to treatment?
8.2
Example 1
Comparative morphologic study on the effects of calcium entry
blockers against ischemic-hypoxic brain damage. Janssen
Pharmaceutica Preclinical Research Report on R14950 No. 33,
January 1985
• Through a combination of artery ligature and hypoxia, brain damage has been
caused.
• Research goal: assess to what extent a certain medicinal class protects the
experimental animals from damage.
. 16 control animals
. 8 animals treated with flunarizine.
. Brain damage:
∗ In 15 out of 16 controls.
∗ For 2 out of 8 experimental animals.
                         Damage
                         Yes    No
Treatment  Control        15     1    16
           Flunarizine     2     6     8
                          17     7    24
8.3
Statistical Question
“Is there a difference between both groups?”
• Null hypothesis:
H0: The damage probability is equal between both groups.
H0 : There is no association between “Treatment” and “Damage.”
• Two aspects:
. the testing problem we just described
. the associated estimation problem:
∗ the difference between the damage probabilities in both groups
∗ a measure of association
8.4
χ² Test for Contingency Tables
• Advantage:
. straightforward computations
. easy to perform a continuity correction
• Disadvantage:
. The normal approximation to the binary quantities’ distribution must hold.
8.5
Example 2
                      Group
                      I     II
Response  Success     16    18    34
          Failure      4    12    16
                      20    30    50
• χ² test:
\[ X^2 = \sum \frac{(O - E)^2}{E} = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_1 \]
• Notation:
. O: the observed cell counts
. E: the expected cell counts
• For example 2:
            Group
            I            II
Success     O11 = 16     O12 = 18     O1+ = 34
Failure     O21 = 4      O22 = 12     O2+ = 16
            O+1 = 20     O+2 = 30     N = O++ = 50
• Computation of expected counts:
. H0 : “Group” and “Response” are independent.
. H0 : the success probability is the same in both groups.
• Under the null hypothesis, there is only 1 common probability:
\[ p = \frac{16 + 18}{20 + 30} = \frac{34}{50} = 0.68. \]
• Given this probability, how many successes do we expect?
Group I : 20 × p = 20 × 0.68 = 13.6
Group II : 30 × p = 30 × 0.68 = 20.4
• The number of failures:
Group I : 20 × (1 − p) = 20 × 0.32 = 6.4
Group II : 30 × (1 − p) = 30 × 0.32 = 9.6
• The number of successes and failures sums to the number of individuals within
every group:
13.6 + 6.4 = 20
20.4 + 9.6 = 30
• Next to the observed (O) table, we also have an E table of expected counts:
            Group
            I                    II
Success     O11 = 16             O12 = 18             O1+ = 34
Failure     O21 = 4              O22 = 12             O2+ = 16
            O+1 = 20             O+2 = 30             N = O++ = 50

            Group
            I                    II
Success     E11 = 13.6           E12 = 20.4           E1+ = O1+ = 34
Failure     E21 = 6.4            E22 = 9.6            E2+ = O2+ = 16
            E+1 = O+1 = 20       E+2 = O+2 = 30       N = E++ = O++ = 50
• Observed (O) and expected (E) marginal counts are equal, for both “Group”
and “Response”.
• There is a simpler method to calculate the expected values:
\[ E_{ij} = \frac{O_{i+} O_{+j}}{N}. \]
• Applied to our example:
                      Group
                      I                     II
Response  Success     13.6 = 34×20/50       20.4 = 34×30/50       34
          Failure     6.4 = 16×20/50        9.6 = 16×30/50        16
                      20                    30                    50
• We do not need to carry out all four computations in the above table:
First step:
                      Group
                      I                     II
Response  Success     13.6 = 34×20/50                             34
          Failure                                                 16
                      20                    30                    50
Second step:
                      Group
                      I                     II
Response  Success     13.6                  34 − 13.6             34
          Failure     20 − 13.6                                   16
                      20                    30                    50
Third step:
                      Group
                      I                     II
Response  Success     13.6                  20.4                  34
          Failure     6.4                   30 − 20.4 = 16 − 6.4  16
                      20                    30                    50
• Calculation of the χ² test statistic:
\[ X^2 = \sum \frac{(O - E)^2}{E} = \frac{(16 - 13.6)^2}{13.6} + \frac{(18 - 20.4)^2}{20.4} + \frac{(4 - 6.4)^2}{6.4} + \frac{(12 - 9.6)^2}{9.6} = 2.206 \]
• All numerators are equal to (2.4)²
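The computation of the expected counts and of the test statistic can be sketched in a few lines. This is an illustrative helper, not part of the course material; the function names are hypothetical and the table is entered row by row (Success, Failure) with columns Group I and Group II.

```python
def expected_counts(table):
    """Expected counts E_ij = O_i+ * O_+j / N for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson statistic: sum over all cells of (O - E)^2 / E."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


example_2 = [[16, 18], [4, 12]]
print(expected_counts(example_2))
print(round(chi_square_statistic(example_2), 3))
```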
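• We can simplify the calculations:
\[ X^2 = (2.4)^2 \left( \frac{1}{13.6} + \frac{1}{20.4} + \frac{1}{6.4} + \frac{1}{9.6} \right) = \Delta^2 \left( \frac{1}{E_{11}} + \frac{1}{E_{12}} + \frac{1}{E_{21}} + \frac{1}{E_{22}} \right), \]
where ∆ = |Oij − Eij| is the absolute difference between observed and expected
counts, which is the same in all four cells of a 2 × 2 table.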
8.5.1
Degrees-of-freedom
• Given the marginals:
                      Group
                      I     II
Response  Success                 34
          Failure                 16
                      20    30    50
the E table is fully determined.
• Given the marginals and one cell count:
                      Group
                      I     II
Response  Success     16          34
          Failure                 16
                      20    30    50
the O table is fully known.
• Hence, the difference is 1 degree-of-freedom.
Conclusion for the example
• The critical χ²₁ point is 3.84 for α = 0.05.
• Since X² = 2.206 < 3.84, H0 is not rejected.
• The success probabilities in both groups:
\[ p_I = \frac{16}{20} = 0.8, \qquad p_{II} = \frac{18}{30} = 0.6 \]
are not significantly different.
• We do not reject the null hypothesis.
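Instead of comparing with the critical point, the p-value itself can be computed. For 1 degree of freedom the χ² tail probability equals a two-sided standard normal tail, so the standard library suffices; the function name below is hypothetical.

```python
from math import erfc, sqrt


def chi2_df1_p_value(x):
    """P(chi-square with 1 df > x) = P(|Z| > sqrt(x)) for standard normal Z."""
    return erfc(sqrt(x / 2))


print(chi2_df1_p_value(2.206))  # example 2: well above 0.05
print(chi2_df1_p_value(3.84))   # close to 0.05, the critical point
```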
8.6
Continuity Correction and Example 3
Pre-clinical study:
                          Group
                          Active    Placebo
Response  Carcinoma          9         6        15
          No carcinoma      51        59       110
                            60        65       125
Expected values:
                          Group
                          Active                  Placebo
Response  Carcinoma       15×60/125 = 7.2         15×65/125 = 7.8         15
          No carcinoma    110×60/125 = 52.8       110×65/125 = 57.2      110
                          60                      65                     125
• The test statistic takes the value
\[ X^2 = \Delta^2 \left( \frac{1}{E_{11}} + \frac{1}{E_{12}} + \frac{1}{E_{21}} + \frac{1}{E_{22}} \right) = (1.8)^2 \left( \frac{1}{7.2} + \frac{1}{7.8} + \frac{1}{52.8} + \frac{1}{57.2} \right) = 0.98 \]
• One possible continuity correction replaces the decimal portion t (= 0.8)
of ∆ (= 1.8) by t′ (= 0.5), so that ∆ becomes 1.5:
0.0 < t ≤ 0.5 → t′ = 0.0
0.5 < t ≤ 1.0 → t′ = 0.5
• In the example:
\[ X^2_{corr} = (1.5)^2 \left( \frac{1}{7.2} + \frac{1}{7.8} + \frac{1}{52.8} + \frac{1}{57.2} \right) = 0.68 \]
• The difference between X² and X²corr becomes smaller when the sample size
increases.
• Other continuity corrections are used as well.
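The correction rule described above can be sketched as code. The reading used here is an assumption: ∆ is lowered to the largest multiple of 0.5 that is strictly smaller than ∆ (so an integer ∆ has decimal portion t = 1.0). Function names are hypothetical.

```python
from math import ceil


def corrected_delta(delta):
    """Largest multiple of 0.5 strictly below delta (0 if delta <= 0.5).
    Rounding guards against floating-point noise such as 1.7999999999999998."""
    return max(0.0, (ceil(round(2 * delta, 9)) - 1) / 2)


def chi_square_2x2(table, corrected=False):
    """X^2 = delta^2 * sum(1 / E_ij) for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    expected = [
        (a + b) * (a + c) / n, (a + b) * (b + d) / n,
        (c + d) * (a + c) / n, (c + d) * (b + d) / n,
    ]
    delta = abs(a - expected[0])  # identical in all four cells
    if corrected:
        delta = corrected_delta(delta)
    return delta ** 2 * sum(1 / e for e in expected)


example_3 = [[9, 6], [51, 59]]
print(round(chi_square_2x2(example_3), 2))
print(round(chi_square_2x2(example_3, corrected=True), 2))
```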
8.7
Validity of the Approximation
• The larger the sample size, the better the normal approximation to the binomial
distribution, and the better the approximation to the χ21 distribution.
• A rule of thumb:
The expected cell counts must at least be 5.
• Because the rule is conservative, small deviations are acceptable, with caution.
• Nevertheless, results have to be interpreted cautiously as well.
8.8
Confidence Interval for the Difference
• Consider the table in general terms:
          Group I    Group II
Total     NI         NII
• A confidence interval for the difference between proportions pI and pII :
\[ CI = (\hat{p}_I - \hat{p}_{II}) \pm Z \sqrt{ \frac{\hat{p}_I \hat{q}_I}{N_I} + \frac{\hat{p}_{II} \hat{q}_{II}}{N_{II}} } \]
where
. pˆI is the observed proportion in group I,
. qˆI = 1 − pˆI ,
. NI the total number of individuals in group I,
. Z = 1.96 when the normal approximation is used at the α = 0.05 nominal
level.
• For the example:
\[ CI = (0.80 - 0.60) \pm 1.96 \sqrt{ \frac{(0.80)(0.20)}{20} + \frac{(0.60)(0.40)}{30} } = (-0.048;\ 0.448) \]
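The interval can be reproduced with a short function. This is a sketch of the normal-approximation formula given above; the function name and argument order (successes and totals per group) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, level=0.95):
    """Normal-approximation confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - half_width, p1 - p2 + half_width


lower, upper = diff_proportion_ci(16, 20, 18, 30)
print(round(lower, 3), round(upper, 3))
```

The interval contains 0, in agreement with the non-significant χ² test for the same table.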
8.9
Overview
• Consider example 1:
                Treatment
                Control    Flunarizine
Damage  Yes       15           2          17
        No         1           6           7
                  16           8          24
• The expected counts:
                Treatment
                Control                Flunarizine
Damage  Yes     17×16/24 = 11.33       17×8/24 = 5.67       17
        No      7×16/24 = 4.67         7×8/24 = 2.33         7
                16                     8                    24
• The test statistic:
\[ X^2 = (15 - 11.33)^2 \left( \frac{1}{11.33} + \frac{1}{5.67} + \frac{1}{4.67} + \frac{1}{2.33} \right) = 12.23 \]
• We find a statistically significant difference.
• At the same time, for the corrected test statistic:
\[ X^2_{corr} = (3.5)^2 \left( \frac{1}{11.33} + \frac{1}{5.67} + \frac{1}{4.67} + \frac{1}{2.33} \right) = 11.12 \]
• The proportions:
. Control:
∗ Damage: qc = 15/16 = 0.9375,
∗ No damage: pc = 1 − qc = 0.0625.
. Flunarizine:
∗ Damage: qf = 2/8 = 0.25,
∗ No damage: pf = 1 − qf = 0.75.
• Confidence interval:
\[ CI = (0.75 - 0.0625) \pm 1.96 \sqrt{ \frac{(0.9375)(0.0625)}{16} + \frac{(0.25)(0.75)}{8} } = 0.6875 \pm 1.96 \times 0.1646 = (0.36;\ 1.01) \]
• Problem:
The expected counts are small: 4.67 and 2.33 are smaller than 5 and 5.67 is
barely larger!
8.10
Extension Requested!
• Therefore, we want to extend the techniques available:
. 2 × 2 tables with small expected counts,
. R × C tables.
8.11
Fisher’s Exact Test
• When we have to analyze 2 × 2 tables with small observed counts that lead to
small expected counts, smaller than 5, then the χ2 approximation needs to be
called into question.
• The method of analysis is then best changed.
• There is a method specifically developed for this context.
8.11.1
Example 4
• Consider a pre-clinical toxicologic study:
                      Carcinoma
                      Yes    No
Treatment  Placebo      1    11    12
           Active       4    10    14
                        5    21    26
• Principle: The margins are considered fixed.
• Note that the margins for the χ² test do not change either, when changing the
O table into the E table.
• Consider the following general table:
A          C          A + C
B          D          B + D
A + B      C + D      N
where N can be considered as
N = (A + B) + (C + D) = (A + C) + (B + D).
• The probability of the observed configuration needs to be determined, given the
null hypothesis of independence between rows and columns.
• Hence, for the example: independence between randomization and carcinoma.
• Fixed margins means:
. We know the values A + B, C + D, A + C, B + D in advance, or
. we condition on the values A + B, C + D, A + C, B + D.
• The probability to observe configuration (A, B, C, D), given the boundary
conditions, then is
\[ p(A, B, C, D) = \frac{(A + B)!\,(C + D)!\,(A + C)!\,(B + D)!}{N!\,A!\,B!\,C!\,D!}. \]
• In the case of example 4 we find
\[ p(1, 4, 11, 10) = \frac{5!\,21!\,12!\,14!}{26!\,1!\,4!\,11!\,10!} = 0.1826 \]
• To carry out the test, we consider all configurations that are at least as unlikely as
the observed one.
• When the sum of all corresponding probabilities does not exceed, for example,
0.05, then the null hypothesis is rejected.
• For example 4, the other configurations are:
0 12          4  8          5  7
5  9          1 13          0 14
p1 = 0.0304   p2 = 0.1054   p3 = 0.0120
• The sum of these probabilities and the probability of the observed table is
p = 0.1826 + p1 + p2 + p3 = 0.1826 + 0.0304 + 0.1054 + 0.0120 = 0.3304 > 0.05
• Of course, we could have economized on some of the calculations, because
p ≥ 0.1826 > 0.05.
• Suppose instead that the observed table had been:
0 12
5 9
• Then it follows that:
p = 0.0304 + 0.0120 = 0.0424 < 0.05
• Hence, the null hypothesis of independence between treatment and carcinoma would be
rejected.
• Then, we could have concluded that active therapy appears to lead to more
carcinoma.
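The procedure above can be sketched with exact integer arithmetic. This is an illustrative implementation of the two-sided rule "sum the probabilities of all tables that are at most as likely as the observed one, with margins fixed"; the function name is hypothetical, and the arguments are given row by row, which differs from the (A, B, C, D) ordering of the formula.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)  # number of ways to fill column 1

    def weight(x):
        # Unnormalized probability of the table whose top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / total


print(round(fisher_exact_two_sided(1, 11, 4, 10), 4))  # example 4
print(round(fisher_exact_two_sided(0, 12, 5, 9), 4))   # the alternative table
```

Comparing integer weights rather than rounded probabilities avoids ties being broken by floating-point noise.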
8.12
Analysis of Example 1
• For example 1, the null hypothesis has been tested with the χ2 method.
• However, there were problems with expected counts.
• It is safer to use Fisher’s exact test.
• The probability of the observed table and the one more extreme table:
15  2  17          16  1  17
 1  6   7           0  7   7
16  8  24          16  8  24
p1 = 0.0013        p2 = 0.00002
• ⇒ p = p1 + p2 = 0.0013 and we, again, reject the null hypothesis.
• Overview of p-values:
Method                                  p-value
χ² (without continuity correction)      0.0005
χ² (with continuity correction)         0.0009
Fisher’s exact test                     0.0013
8.13
Example 3: pre-clinical test
                          Group
                          Active    Placebo
Response  Carcinoma          9         6        15
          No carcinoma      51        59       110
                            60        65       125

Method                                  Statistic    p-value
χ² (without continuity correction)      0.98         0.3222
χ² (with continuity correction)         0.68         0.4096
Fisher’s exact test                                  0.4119
8.14
χ2 test versus Fisher’s exact test
Sample    χ² test               Fisher’s exact test
small     simple                simple
          unreliable            reliable
large     simple                computationally complex
          reliable              reliable
8.15
Estimator for the Association
• Requested: a measure for the association between “Group” and “Response.”
• A whole series exists.
• For example: the odds ratio:
\[ \psi = \frac{9 \times 59}{6 \times 51} = 1.735 \]
• This measure is an approximation for the relative risk when the outcome is rare.
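The odds ratio and the relative risk for the same table can be compared directly. The sketch below uses hypothetical function names; the table is entered as rows (Carcinoma, No carcinoma) by columns (Active, Placebo).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a * d) / (b * c) for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk of the first-row outcome in column 1 relative to column 2."""
    return (a / (a + c)) / (b / (b + d))


# Example 3: carcinoma in 9 of 60 active and 6 of 65 placebo animals.
print(round(odds_ratio(9, 6, 51, 59), 3))
print(round(relative_risk(9, 6, 51, 59), 3))
```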
8.16
R × C Contingency Tables
• More than two treatment groups.
• Two treatments compared on an ordinal scale.
• Example:
              Response
Treatment     No success    Some success    Success
A                 25             10            40
B                 27             23            25
• In general:
              Response
Treatment     1      2      . . .    C
1             n11    n12    . . .    n1C
2             n21    n22    . . .    n2C
. . .         . . .  . . .  . . .    . . .
R             nR1    nR2    . . .    nRC
• Null hypothesis:
H0: row and column classifications are independent
H0: response is equal across the various treatment groups
• χ² test statistic:
\[ X^2 = \sum \frac{(O - E)^2}{E} \sim \chi^2_{(R-1)(C-1)} \]
• In case R = C = 2 then (R − 1)(C − 1) = 1.
• The validity of this test is coupled to the expected counts being at least 5.
8.17
Example 5
• Data:
              Severity
Treatment     Very    Moderately    Mildly
A              13         24          18       55
B              19         20          12       51
               32         44          30      106
• To answer the question as to whether there is a relationship between “Treatment”
and “Severity,” we calculate the E table of expected values, in a few steps:
First step:
              Severity
              Very           Moderately      Mildly
A             55×32/106      55×44/106                              55
B                                                                   51
              32             44              30                    106
Second step:
              Severity
              Very           Moderately      Mildly
A             16.60          22.83           55 − 16.60 − 22.83     55
B             32 − 16.60     44 − 22.83                             51
              32             44              30                    106
Third step:
              Severity
              Very           Moderately      Mildly
A             16.60          22.83           15.57                  55
B             15.40          21.17           30 − 15.57 = 14.43     51
              32             44              30                    106
• In this case, (R − 1)(C − 1) = (2 − 1)(3 − 1) = 2
• The test statistic
\[ X^2 = \frac{(13 - 16.60)^2}{16.60} + \frac{(24 - 22.83)^2}{22.83} + \frac{(18 - 15.57)^2}{15.57} + \frac{(19 - 15.40)^2}{15.40} + \frac{(20 - 21.17)^2}{21.17} + \frac{(12 - 14.43)^2}{14.43} = 2.54 \]
• The critical point is χ²₂(0.05) = 5.99. Since 2.54 < 5.99, H0 is not rejected.
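The same calculation for an R × C table, including the degrees of freedom, can be sketched as follows. The function name is hypothetical. The p-value line relies on the fact that for exactly 2 degrees of freedom the χ² tail probability is exp(−x/2); it does not apply to other degrees of freedom.

```python
from math import exp


def chi_square_test(table):
    """Return (X^2, degrees of freedom) for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Example 5: rows are treatments A and B, columns are severity levels.
statistic, df = chi_square_test([[13, 24, 18], [19, 20, 12]])
p_value = exp(-statistic / 2)  # valid only because df == 2
print(round(statistic, 2), df, round(p_value, 3))
```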
8.18
Example 6
• Three treatments are randomized over 60 patients, among whom N = 54
successfully complete the study
              Response
Treatment     Success    Failure
A                 9          6        15
B                 8         11        19
C                17          3        20
                 34         20        54
• The corresponding E table is:
              Response
Treatment     Success    Failure
A               9.44       5.56       15
B              11.96       7.04       19
C              12.59       7.41       20
               34         20          54
• The test statistic is X² = 7.77, which exceeds the critical point χ²₂(0.05) = 5.99,
pointing to a significant difference.
8.19
Several 2 × 2 Tables
• A study where 2 treatments are being compared w.r.t. success.
• The data are assembled in several centers.
• Hence, there is stratification per center.
• Even within one center, some stratifying variables may have an impact on
response (e.g., investigator).
8.20
Example 7
• Data:
                      Improvement
                      No     Yes
Treatment  Active     13     28     41
           Placebo    29     14     43
                      42     42     84
• Stratified for sex:
Female
                      Improvement
                      No     Yes
Treatment  Active      6     21     27
           Placebo    19     13     32
                      25     34     59
Male
                      Improvement
                      No     Yes
Treatment  Active      7      7     14
           Placebo    10      1     11
                      17      8     25
• With notation:
                          Improvement
    Sex      Treatment    None          Some/marked    Total
    Female   Test drug    n111 = 6      n112 = 21      n11+ = 27
    Female   Placebo      n121 = 19     n122 = 13      n12+ = 32
    Female   total        n1+1 = 25     n1+2 = 34      n1 = 59
    Male     Test drug    n211 = 7      n212 = 7       n21+ = 14
    Male     Placebo      n221 = 10     n222 = 1       n22+ = 11
    Male     total        n2+1 = 17     n2+2 = 8       n2 = 25
• If the treatments were equally effective, the expected values for the cells with coding:
. (1,1,1) = (female, active, no),
. (2,1,1) = (male, active, no)
would be equal to
\[
E(n_{111}) = m_{111} = \frac{n_{11+}\, n_{1+1}}{n_1},
\qquad
E(n_{211}) = m_{211} = \frac{n_{21+}\, n_{2+1}}{n_2}
\]
with corresponding variances
\[
\mathrm{Var}(n_{111}) = v_{111} = \frac{n_{11+}\, n_{12+}\, n_{1+1}\, n_{1+2}}{n_1^2 (n_1 - 1)}
 = \frac{27 \times 32 \times 25 \times 34}{59^2 \times 58} = 3.63748
\]
\[
\mathrm{Var}(n_{211}) = v_{211} = \frac{n_{21+}\, n_{22+}\, n_{2+1}\, n_{2+2}}{n_2^2 (n_2 - 1)}
 = \frac{14 \times 11 \times 17 \times 8}{25^2 \times 24} = 1.39627
\]
8.21 Mantel-Haenszel Statistic
• The null hypothesis
H0: no association between “Treatment” and “Improvement”, taking the
stratification for sex into account
• The (Cochran-)Mantel-Haenszel statistic:
\[
Q_{MH} = \frac{\left[\sum_{s=1}^{2} \frac{n_{s1+}\, n_{s2+}}{n_s}\,(p_{s11} - p_{s21})\right]^2}{\sum_{s=1}^{2} v_{s11}}
\]
with proportions:
\[
p_{s11} = \frac{n_{s11}}{n_{s1+}}, \qquad p_{s21} = \frac{n_{s21}}{n_{s2+}}
\]
8.22 Analysis of Example 7
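• Test statistic:
\[
Q_{MH} = \frac{\left[\frac{27 \times 32}{59}\left(\frac{6}{27} - \frac{19}{32}\right)
 + \frac{14 \times 11}{25}\left(\frac{7}{14} - \frac{10}{11}\right)\right]^2}{3.63748 + 1.39627} = 12.59
\]
• The proportions are calculated as follows:
\[
p_{111} = \frac{6}{27}, \quad p_{121} = \frac{19}{32}, \quad p_{211} = \frac{7}{14}, \quad p_{221} = \frac{10}{11}
\]
• Under the null hypothesis, this test statistic follows a χ² distribution with 1 degree of freedom: Q_MH ∼ χ²_1.
• p = 0.0004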
• A significant difference in the improvement probability in active versus placebo
groups.
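As a check on the arithmetic, the Mantel-Haenszel statistic for Example 7 can be recomputed with a short Python sketch. The function name `mantel_haenszel` and the tuple layout are choices made for this illustration; the p-value uses the identity P(χ²_1 > q) = erfc(√(q/2)), which holds because a χ²_1 variable is the square of a standard normal one.

```python
import math

# One tuple per stratum: (n_s11, n_s12, n_s21, n_s22)
# = (active/no, active/yes, placebo/no, placebo/yes), as in Example 7.
strata = {
    "female": (6, 21, 19, 13),
    "male": (7, 7, 10, 1),
}


def mantel_haenszel(tables):
    """Q_MH = [sum_s n_s1+ n_s2+ / n_s * (p_s11 - p_s21)]^2 / sum_s v_s11."""
    numerator = 0.0
    variance = 0.0
    for n11, n12, n21, n22 in tables:
        row1, row2 = n11 + n12, n21 + n22      # n_s1+, n_s2+
        col1, col2 = n11 + n21, n12 + n22      # n_s+1, n_s+2
        n = row1 + row2                        # n_s
        p11, p21 = n11 / row1, n21 / row2
        numerator += row1 * row2 / n * (p11 - p21)
        variance += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    return numerator ** 2 / variance


q_mh = mantel_haenszel(strata.values())
p_value = math.erfc(math.sqrt(q_mh / 2))  # upper tail of chi-square, 1 df
print(f"Q_MH = {q_mh:.2f}, p = {p_value:.4f}")
```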
8.23 Omitting the Stratifying Variable
• The classic χ2 test.
• The expected values are equal to
                 Improvement
    Treatment    No      Yes     Total
    Active       20.5    20.5      41
    Placebo      21.5    21.5      43
    Total        42      42        84

and hence
\[
X^2 = (7.5)^2 \left(\frac{1}{20.5} + \frac{1}{20.5} + \frac{1}{21.5} + \frac{1}{21.5}\right) = 10.72
\]
• We find a somewhat different value (X² = 10.72 versus Q_MH = 12.59).
• Depending on possible differences between the distributions over the sexes, the
difference between Mantel-Haenszel and the classical χ² test can increase.
8.24 Matched Pairs: McNemar’s Test
8.24.1 Example 8
• 50 people are subject to two allergy tests, A and B.
• Hence, we have a pair of responses for each individual:
pair: (response to test A, response to test B)
                      A
                  +       −     Total
    B     +      23       6       29
          −       9      12       21
    Total        32      18       50
• Everybody is assigned to both treatments, and hence acts as his/her own control.
• This table only superficially resembles the previous tables.
• The treatments are not separated; rather, every cell tells us something about
both treatment A and treatment B.
• For example, there are 23 people with favorable response to both A and B, 9 react
positively to A but not to B, etc.
• The first null hypothesis considered:
H0: both tests have the same probability of success
H0: pA = pB
• We can estimate both proportions as follows:
\[
\hat{p}_A = \frac{32}{50} = 0.64, \qquad \hat{p}_B = \frac{29}{50} = 0.58
\]
• The probabilities are calculated entirely differently than with simple contingency
tables.
• We have two types of pairs:
. Concordant pairs: (+, +) and (−, −): 23 and 12
. Discordant pairs: (+, −) and (−, +): 9 and 6
• The discordant pairs are used to calculate the test statistic.
• If A and B had the same probability of success, then the discordant
cells would be about equally large.
• We test equality of the off-diagonal (discordant) cells, using the following quantity:
\[
Z = \frac{\left|\text{observed proportion} - 0.5\right| - \frac{1}{2N}}{\sqrt{\frac{1}{4N}}}
\]
where N is the number of discordant pairs.
• Critical level Z = 1.96.
• In the example
\[
Z = \frac{\left|\frac{9}{9+6} - 0.5\right| - \frac{1}{30}}{\sqrt{\frac{1}{60}}} = 0.5164 < 1.96
\]
• This test is known as McNemar’s test.
• The quadratic version, without continuity correction:
\[
\frac{\left(\frac{9}{9+6} - \frac{1}{2}\right)^2}{\frac{1}{4 \times 15}} = 0.600
\]
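The McNemar quantities can be recomputed from the two discordant counts alone. The sketch below is an illustration under the definitions given above; the function name `mcnemar_z` and its `continuity` flag are hypothetical names introduced here.

```python
import math


def mcnemar_z(b, c, continuity=True):
    """Z statistic based on the discordant pairs only.

    b, c: the two discordant counts; N = b + c is the number of discordant pairs.
    """
    n = b + c
    correction = 1 / (2 * n) if continuity else 0.0
    return (abs(b / n - 0.5) - correction) / math.sqrt(1 / (4 * n))


# Example 8: 9 pairs are positive on A only, 6 pairs are positive on B only.
z_corrected = mcnemar_z(9, 6)
chi2_uncorrected = mcnemar_z(9, 6, continuity=False) ** 2

print(f"Z (with continuity correction) = {z_corrected:.4f}")
print(f"quadratic version (no correction) = {chi2_uncorrected:.3f}")
```

The quadratic version without correction simplifies algebraically to (b − c)²/(b + c), which is the form in which McNemar's statistic is often quoted.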
8.25 Independent Allergy Tests
• Next to the question of equal proportions, we can also consider the question of
independence between both tests.
• To this end, the conventional χ² test can be used:
\[
X^2 = \sum \frac{(O - E)^2}{E}
    = (4.44)^2 \left(\frac{1}{18.56} + \frac{1}{13.44} + \frac{1}{10.44} + \frac{1}{7.56}\right) = 7.02
\]
• p = 0.0081
• The corrected value is 5.70 (p = 0.0170).
• The p-value for Fisher’s exact test is p = 0.0158.
• Independence is rejected.
• Conclusion:
. the tests are not independent: their results are associated,
. and there is no evidence that they differ in probability of success.
Part III
t Test
Chapter 9
Comparing Groups with Continuous Outcomes: the t-test
. Captopril data
. Unpaired t-test
. Paired t-test
9.1 Example: Captopril data
• 15 patients with hypertension
• The response of interest is the supine blood pressure, before and after treatment
with CAPTOPRIL
• Research question:
How does treatment affect BP ?
• Dataset ‘Captopril’
                 Before           After
    Patient    SBP    DBP      SBP    DBP
       1       210    130      201    125
       2       169    122      165    121
       3       187    124      166    121
       4       160    104      157    106
       5       167    112      147    101
       6       176    101      145     85
       7       185    121      168     98
       8       206    124      180    105
       9       173    115      147    103
      10       146    102      136     98
      11       174     98      151     90
      12       201    119      168     98
      13       198    106      179    110
      14       148    107      129    103
      15       154    100      131     82

Average (mm Hg)
    Diastolic before:   112.3
    Diastolic after:    103.1
    Systolic before:    176.9
    Systolic after:     158.0
• It would be of interest to know how likely the observed changes in BP are to occur
by pure chance.
• If this is very unlikely, the above data provide evidence that BP indeed decreases
after treatment with Captopril. Otherwise, the above data do not provide evidence
for efficacy of Captopril.
9.2 Difference in DBP

    Patient    DBP(before)    DBP(after)    ∆(DBP) = after − before
       1          130            125              -5
       2          122            121              -1
       3          124            121              -3
       4          104            106               2
       5          112            101             -11
       6          101             85             -16
       7          121             98             -23
       8          124            105             -19
       9          115            103             -12
      10          102             98              -4
      11           98             90              -8
      12          119             98             -21
      13          106            110               4
      14          107            103              -4
      15          100             82             -18

                                 DBP(before)    DBP(after)          ∆(DBP)
    mean ȳ                          112.33        103.07             -9.27
    signal: difference                     -9.27                     -9.27
    variance s²                     109.67        157.64             74.21
    standard deviation s             10.47         12.56              8.61
    common standard deviation              11.56                      8.61
    noise: standard error          2×11.56/√30 = 4.22        8.61/√15 = 2.22
    t test statistic              -9.27/4.22 = -2.20       -9.27/2.22 = -4.17
    p-value                               0.0366                    0.0010
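The signal and noise entries of the summary table can be reproduced with the Python standard library. This is a minimal sketch: the variable names are chosen for this illustration, and the p-values are not computed because the standard library has no t distribution.

```python
import math
import statistics

# Diastolic blood pressure (mm Hg) from the Captopril data.
dbp_before = [130, 122, 124, 104, 112, 101, 121, 124, 115, 102, 98, 119, 106, 107, 100]
dbp_after = [125, 121, 121, 106, 101, 85, 98, 105, 103, 98, 90, 98, 110, 103, 82]
n = len(dbp_before)

# Signal: difference in means (after minus before).
signal = statistics.mean(dbp_after) - statistics.mean(dbp_before)

# Unpaired analysis: treat the two columns as independent groups.
pooled_sd = math.sqrt(
    (statistics.variance(dbp_before) + statistics.variance(dbp_after)) / 2
)
se_unpaired = pooled_sd * math.sqrt(2 / n)   # same as 2 * pooled_sd / sqrt(2 * n)
t_unpaired = signal / se_unpaired

# Paired analysis: work with the within-patient differences.
differences = [after - before for after, before in zip(dbp_after, dbp_before)]
se_paired = statistics.stdev(differences) / math.sqrt(n)
t_paired = statistics.mean(differences) / se_paired

print(f"signal     = {signal:.2f}")
print(f"unpaired:  SE = {se_unpaired:.2f}, t = {t_unpaired:.2f}")
print(f"paired:    SE = {se_paired:.2f}, t = {t_paired:.2f}")
```

The signal is identical in both analyses; only the noise differs, which is why the paired statistic is so much larger in absolute value.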
9.3 Two-sample t-test
. Two independent groups

    Patient    DBP (group A)    DBP (group B)
       1           130              125
       2           122              121
       3           124              121
       4           104              106
       5           112              101
       6           101               85
       7           121               98
       8           124              105
       9           115              103
      10           102               98
      11            98               90
      12           119               98
      13           106              110
      14           107              103
      15           100               82

    signal: difference        -9.27
    noise: standard error      4.22
    t test statistic          -2.20
    p-value                    0.0366

. Between-group heterogeneity → less precise
. Null hypothesis: H0 : µA = µB
. Structure:
\[
t = \frac{\Delta}{\mathrm{SE}(\Delta)} = \frac{\bar{X}_B - \bar{X}_A}{s\sqrt{2/n}} \sim t_{2n-2}
\]
where s is the common standard deviation and n the number of subjects in each group (here n = 15).
. Extensions:
∗ A and B have different variances
∗ Sample sizes A and B different
∗ Dependent measures → paired t-test
9.4 Paired t-test

    Patient    DBP(before)    DBP(after)    ∆(DBP) = after − before
       1          130            125              -5
       2          122            121              -1
       3          124            121              -3
       4          104            106               2
       5          112            101             -11
       6          101             85             -16
       7          121             98             -23
       8          124            105             -19
       9          115            103             -12
      10          102             98              -4
      11           98             90              -8
      12          119             98             -21
      13          106            110               4
      14          107            103              -4
      15          100             82             -18

    signal: difference        -9.27
    noise: standard error      2.22
    t test statistic          -4.17
    p-value                    0.0010
• More restricted applicability: paired measurements:
. The same subject measured at two different time points
. Paired organs for the same subject
. Twins
. Case-control data
. Split blood sample
• More power: when pairs are positively correlated
• Structure:
\[
t = \frac{\bar{\Delta}}{s_\Delta / \sqrt{N}}
  = \frac{\bar{X}_{\text{after}} - \bar{X}_{\text{before}}}{s_\Delta / \sqrt{N}} \sim t_{N-1}
\qquad (N = \text{\# pairs})
\]
• Extensions:
. More than two measures → repeated measures analysis
9.5 t-tests: Concluding Remarks
• Assumptions:
. Two-sample t-test: Data in each group should be roughly normally distributed
. Paired t-test: Differences should be roughly normally distributed
• What if this is not the case?
. Transform to (near) normality
. Apply a non-parametric (rank-based) test: less efficient, but more robust
against non-normality
• Apart from hypothesis testing, confidence intervals can also be constructed
Part IV
Linear Regression
Chapter 10
Introduction
Illustrative Example
• These data are central to this part.
• Origin: Prof. Dr. Koen Milisen, CZV, K.U.Leuven.
10.1 Problem Setting
• Research into post-operative variability in the neuro-cognitive and functional
status of elderly patients with hip fractures.
• A surgical intervention in elderly patients often results in acute cognitive
dysfunction (= delirium).
• Delirium versus dementia:
. Delirium: → acute start
→ usually temporary
. Dementia: → no acute start
→ slowly progressing
→ irreversible
• Delirium . . .
. leads to medical problems and problems of care
. often is the first symptom of a physical disorder or intoxication stemming from
medicines
. can lead to increased mortality
. is hard to detect
• Economic implications of delirium:
. Extra care
. Longer hospital stay
. High degree of institutionalization
• Research suggests that, among elderly hip fracture patients, the increased degree of
dependence is a consequence of delirium, rather than of the hip fracture itself.
10.2 Sample
• Longitudinal design: Certain variables are measured repeatedly over time.
• Prospective (e.g., complications) and retrospective (e.g., living conditions)
measurements.
• Data of 2 traumatological departments of U.Z. Gasthuisberg, K.U.Leuven.
• Inclusion criteria:
. ≥ 65 years of age
. hospitalized with hip fracture in the emergency room
. consent for participation into the study
. ...
• Exclusion criteria:
. time between admission and operation ≥ 72 hours
. various traumas
. ...
• Data collected 16/09/1996–28/02/1997.
10.3 Data Collected
• Data on 60 patients
• 78 variables
• Data for every patient, prior to, during, and post operation
• Longitudinal and derived measurements
• Study questionnaire, ADL score, MMSE, and CAM scores
10.3.1 Pre-operative Evaluation
    Variable   Description                    Values
    nummer     patient number                 1–60
    leeftd     age                            (years)
    gesl       sex                            1=male, 2=female
    opnduur    length of stay                 (days)
    burgst     civil status                   1=single, 2=married, 3=widow(er), 4=divorced, 5=religious
    opleid     education                      1=university/college, 2=high school, 3=lower secondary, 4=primary
    zijfrc     side of fracture               1=left, 2=right
    typfrc     type of fracture               1=intra-capsular, 2=extra-capsular
    cardio     cardiologic pathology          0=no, 1=yes
    vascul     vascular pathology             0=no, 1=yes
    pulmon     pulmonary pathology            0=no, 1=yes
    urinai     urinary pathology              0=no, 1=yes
    abdom      abdominal pathology            0=no, 1=yes
    hyper      hypertension                   0=no, 1=yes
    zicht      vision pathology               0=no, 1=yes
    gehoor     auditory pathology             0=no, 1=yes
    malign     malignant disease              0=no, 1=yes
    diabet     diabetes                       0=no, 1=yes
    reumat     rheumatological pathology      0=no, 1=yes
    vrop       past surgery                   0=no, 1=yes
    neuro      neuro-psychiatric pathology    0=no, 1=yes
    andere     other pathology                0=no, 1=yes
Chapter 11
Simple (Single) Linear Regression
. Introduction
. The method of least squares
. Illustration and interpretation
. Statistical inference
. Illustration and interpretation
11.1 Introduction
• The correlation coefficient r measures the linear relationship between two
measurements, x and y. How can we describe this linear relationship?
• One possible way would be to construct the straight line that ‘fits best’ the
observed measurements:
• A straight line is described analytically by an equation of the form
y = β0 + β1 x
• The parameter β0 is the intercept of the straight line. It is the value of y obtained
for x = 0
• The parameter β1 is the slope.
• If β1 > 0 :
. There is a positive relationship between x and y
. The larger β1, the faster y increases with x
• If β1 < 0 :
. There is a negative relationship between x and y
. The smaller β1, the faster y decreases with x
• The practical assignment is to estimate the parameters β0 and β1 based on the
collected data (xi , yi ).
11.2 The Least Squares Method
• To estimate β0 and β1, we first need to decide which criterion should be satisfied
by ‘the best’ straight line
[Figure: scatter plot of the observations (xi, yi) with the straight line y = β0 + β1 x; the intercept β0 is marked on the y-axis and, for one observation, both the observed value yi and the predicted value ŷi are indicated.]
• If we knew β0 and β1, then for each observation in the set of data, based
on the x value, a predicted value could be calculated for y:
ŷi = β0 + β1xi
• The prediction will be good if ŷi lies close to yi and will be poor if ŷi deviates
strongly from yi
• If the straight line describes the data (xi, yi) adequately, then we expect, for most
points, ŷi to lie close to the observed value yi.
• A possible measure to capture how well the straight line has been chosen is
\[
Q = \sum_i [y_i - \hat{y}_i]^2 = \sum_i [y_i - (\beta_0 + \beta_1 x_i)]^2
\]
• Hence, Q is a measure for how closely the data lie to the straight line
y = β0 + β1x.
• Note that other straight lines (i.e., other β0 and β1), will lead to different Q
values.
• The straight line that describes the data best is the one for which Q is smallest.
• The least squares method calculates the values of β0 and β1 for which Q is
minimal.
• It can be shown that these values are given by:
\[
\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]
• β̂0 and β̂1 are termed the least squares estimators for β0 and β1.
• The straight line so obtained,
y = β̂0 + β̂1x
is termed the regression line.
• Once the estimators for β0 and β1 are known, we can make a prediction, for each
observation in the data set, for y based on x:
ŷi = β̂0 + β̂1xi
• We are also able, for each data point (xi, yi) in the set of data, to compute the
deviation that results when we predict yi by ŷi:
ei = yi − ŷi = yi − (β̂0 + β̂1xi)
• The quantities ei are termed residuals:
. ei > 0 : the observed yi lies above the regression line
. ei = 0 : the observed yi lies on the regression line
. ei < 0 : the observed yi lies underneath the regression line
• Further, one can show that
\[
\sum_i e_i = 0
\]
i.e., the points above the regression line are ‘in equilibrium’ with those below
the regression line.
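The least squares formulas translate directly into code. The sketch below uses a small made-up data set (the x and y values are hypothetical and serve only as an illustration) and a helper function `least_squares` named for this example; it also verifies numerically that the residuals sum to zero.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least squares line through (x_i, y_i)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(x, y)
fitted = [b0 + b1 * xi for xi in x]                 # predicted values
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # e_i = y_i - fitted_i

print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
print("sum of residuals:", round(sum(residuals), 10))
```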
11.3 Illustration + Interpretation
• The output for the regression coefficients (Statistica):
• The Y variable is termed response, or also dependent variable.
• The X variable is termed covariate, or also independent variable.
• The parameter estimates are β̂0 = 23.65 and β̂1 = −0.30.
• The corresponding regression line is
ADL = 23.65 − 0.30 × MMSE
• The regression line predicts an ADL score of 23.65 if MMSE is equal to zero.
• Further, there is a negative linear relationship between MMSE and ADL: The
higher MMSE the lower ADL, and vice versa.
• The regression line predicts a decrease of 0.30 in ADL, for a unit increase of
MMSE.
• This ought to be interpreted as follows:
. Consider two groups of patients
. All patients in the first group have identical MMSE (e.g., 20).
. All patients in the second group have identical MMSE values, too, but 1 unit
higher than these in the first group (hence, 21).
. Then, we expect the difference in average ADL score between both groups to
be 0.30, with the lower score for the group with highest MMSE.
• Hence, we should not conclude that an increase of MMSE with 1 unit in a given
patient will lead to a decrease of 0.30 in ADL.
In other words, we cannot draw ‘longitudinal’ conclusions from a ‘cross-sectional’
experiment.
• Graphical representation:
11.4 Statistical Inference
11.4.1 Introduction
• The regression output, obtained in Statistica, was:
• The p-values listed test the hypotheses
H0 : β0 = 0 versus HA : β0 ≠ 0
and
H0 : β1 = 0 versus HA : β1 ≠ 0
• Indeed, the least squares method allows us to calculate the straight line that
best describes our observations (xi, yi).
• However, a different sample from the same population would lead to a different
regression line
y = β̂0 + β̂1x
• Illustration: Vestac Java Applet → regression → regression plots
• Based on a sample and hence the corresponding estimators β̂0 and β̂1, statistical
inference (p-values, confidence intervals) aims to make a statement about the
regression line
y = β0 + β1 x
that captures the relationship in the entire population.
• This is not possible without additional assumptions about the distribution from
which the data are sampled.
• The assumptions needed are described by the so-called regression model.
11.4.2 The simple linear regression model
• In real situations, the points (xi, yi) will never describe a perfect straight line, but
rather a cloud of points.
• This implies that the observations do not satisfy
yi = β0 + β1xi
but rather
yi = β0 + β1xi + εi
where εi expresses how much an observation yi lies above or below the regression
line.
• The quantities εi are termed errors, and the linear regression model assumes that
they are distributed following a normal distribution with mean 0 and (unknown)
variance σ²:
εi ∼ N(0, σ²)
• Note that the εi are the ‘theoretical version’ of the residuals ei.
• Hence, the regression model assumes . . .
. . . . linearity: for each X, the mean of the corresponding Y -values lies on the
regression line
. . . . normality: implies that, for each X, the corresponding Y -values lie
symmetrically around the regression line
. . . . constant variance: the prediction errors for small X-values are neither larger
nor smaller than the errors for large X-values
11.4.3 Significance tests for β0 and β1
• If the slope β1 is equal to zero, then the regression model is described by
yi = β0 + εi
which implies that there is no linear relationship between Y and X.
• In practice, if we want to test whether there is a linear relationship between X
and Y , then we need to test the null hypothesis:
H0 : β1 = 0
versus
HA : β1 ≠ 0
• The value observed in our sample is β̂1 = −0.30
• This value could be obtained by coincidence, even if in the total population
β1 = 0 would hold.
• Research question:
How large is the probability that we, by accident, observe β̂1 = −0.30,
even if β1 = 0?
• Illustration: Vestac Java Applet → regression → histograms of slope and intercept
• It is clear that, when β1 = 0, it becomes very unlikely to still observe β̂1 = −0.30.
• Note that it would be equally unlikely to observe β̂1 = +0.30.
• The chance that we would find an estimate with |β̂1| ≥ 0.30 is p < 0.0001.
• Given that this probability is so small, more specifically that p < α = 0.05 = 5%,
we will conclude that what has been observed (β̂1 = −0.30) is sufficient indication
to believe that β1 ≠ 0.
• We reject the null hypothesis and conclude that β1 is significantly different from
0, at the 5% significance level.
• The regression model allows for, apart from testing hypotheses, constructing
confidence intervals. A 95% C.I. for β1 in our example is [−0.378; −0.218].
• Given that this interval is far away from 0, this is again strong evidence that
β1 ≠ 0.
• Analogously, a significance test can be constructed for
H0 : β0 = 0
versus
HA : β0 ≠ 0
• In practice, one is primarily interested in tests for β1.
• Note that all tests and confidence intervals are valid only when all regression
model assumptions are satisfied.
11.4.4 The ANOVA table
• How much better can we predict Y , given that we know X?
[Figure: scatter plot of the observations with the fitted regression line ŷ = β̂0 + β̂1 x; the sample average ȳ and the intercept are marked on the y-axis and, for one observation, both the observed value yi and the predicted value ŷi are indicated.]
• If we did not have x-values, then the best possible prediction for each
yi-value would be the sample average ȳ.
• A measure for the error so made is the sum of squares
\[
\sum_i [y_i - \bar{y}]^2
\]
• Note that this is a measure for the variability in the yi.
• If we do use the observed xi-values to predict the y-values, then we predict each
yi by means of
ŷi = β̂0 + β̂1xi
• A measure for the error so made is the sum of squares
\[
\sum_i [y_i - \hat{y}_i]^2 = \sum_i e_i^2
\]
• Because the use of this extra information coming from the xi leads to more
precise predictions, we have that
\[
\sum_i [y_i - \bar{y}]^2 \ge \sum_i [y_i - \hat{y}_i]^2
\]
• One can show that
\[
\underbrace{\sum_i [y_i - \bar{y}]^2}_{SSTO}
 = \underbrace{\sum_i [y_i - \hat{y}_i]^2}_{SSE}
 + \underbrace{\sum_i [\hat{y}_i - \bar{y}]^2}_{SSR}
\]
• SSTO: Total sum of squares
This term captures the total error made by predicting the yi without taking into
account the observed values xi.
• SSE: Error sum of squares
This term captures the error made upon predicting the yi by making use of the
observations xi.
• SSR: Regression sum of squares
This term captures the decrease in error by predicting the values yi with, rather
than without, making use of the covariates.
• A measure for how well the data points (xi, yi) agree with the regression line is
\[
R^2 = \frac{SSR}{SSTO}
\]
• R² enjoys the following properties:
. 0 ≤ R² ≤ 1
. R² = 0 implies that SSR = 0 and hence that all ŷi are equal to ȳ, i.e., the
regression line is flat. This is equivalent with β̂1 = 0.
. R² = 1 implies that SSE = 0. This implies that yi = ŷi for all i, and hence
that all points (xi, yi) lie on the regression line.
• It is said that R² expresses ‘what fraction of the variability in the yi can be
explained by the xi’.
• One can show that R² is equal to r², the square of the correlation between the xi
and yi values.
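The decomposition SSTO = SSE + SSR and the identity R² = r² can be checked numerically. The sketch below again uses a small made-up data set (hypothetical values, not the study data) and variable names chosen for this illustration.

```python
import math

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Least squares fit and predicted values.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

ssto = s_yy                                              # total sum of squares
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error sum of squares
ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # regression sum of squares

r_squared = ssr / ssto
r = s_xy / math.sqrt(s_xx * s_yy)                        # Pearson correlation

print(f"SSTO = {ssto:.4f}, SSE + SSR = {sse + ssr:.4f}")
print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```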
11.4.5 Illustration + Interpretation
• Statistica output for the ANOVA table, with SSR and SSE:
• ‘R-square’: R² = 0.4940, i.e., the regression can explain about 50% of the total
variability in the yi values:
\[
R^2 = \frac{SSR}{SSTO} = \frac{351.23}{351.23 + 359.76} = 0.4940
\]
• The Pearson correlation, found before, was (with a negative sign because the estimated slope is negative):
\[
r = -\sqrt{R^2} = -\sqrt{0.4940} = -0.70
\]
Chapter 12
Model Diagnostics
. Example
. Linearity
. Constant error variance
. Normality of the errors
12.1 Example
• We wish to assess whether a patient’s dependence (ADL), one day post operation,
can be used to predict a patient’s length of stay:
• There appears to be a slight increase of length of stay, as a function of the ADL
score. Is this relationship significant?
• Therefore, we fit the following regression model:
Length of stay = β0 + β1ADL + εi
• Statistica output:
• The parameter estimates are:
. β̂0 = 9.37
. β̂1 = 0.29, p-value: 0.1173
• The fitted regression line is
Length of stay = 9.37 + 0.29ADL
• Note that there is no significant relationship between length of stay and ADL
score, 1 day post operation.
• Further, it follows from R2 = 0.0432 that ADL explains only 4% of the total
variability in length of stay.
12.2 Model Assumptions
• The statistical inferences, obtained for the regression parameters, are valid only if
the model assumptions are satisfied, i.e.,
yi = β0 + β1xi + εi,    εi ∼ N(0, σ²)
• Hence, the regression model assumes that . . .
. . . . linearity: for each X, the mean of the corresponding Y -values lies on the
regression line
. . . . normality: implies that, for each X, the corresponding Y -values lie
symmetrically around the regression line
. . . . constant variance: the prediction errors for small X-values are neither larger
nor smaller than the errors for large X-values
• How can these assumptions be verified?
12.3 The Assumption of Linearity
• To illustrate the effect of non-linearity, consider the following fictitious example:
• There clearly is a positive relationship between xi and yi, but the relationship
between xi and yi appears to deviate somewhat from linearity.
• What happens if we still apply linear regression?
• Statistica output:
• R2 = 0.85: X explains 85% of the observed variability in Y .
• The regression line is given by
Y = 1.19 + 2.06X
• The slope β1 is significantly different from zero (p < 0.001).
• The observed points all lie close to the fitted regression line (explaining the high
R2), but the straight line poorly describes the relationship between xi and yi:
. Over-estimation of the yi for small and large xi
. Under-estimation of the yi in the middle
• The graph suggests that non-linearity can be detected by studying the
residuals
ei = yi − ŷi = yi − (β̂0 + β̂1xi)
and plotting them as a function of x.
• Graphical representation:
• If the assumption of linearity would be valid, then, for each value of X, the
corresponding value of Y would lie symmetrically around the regression line. The
residuals ei would then have to lie symmetrically around zero, for all possible
values of X.
• Clearly, this is not satisfied in the above example.
• Note that the residuals in fact suggest that the relationship between the yi and
the xi is rather a curved function. We return to this point as part of polynomial
regression.
• Oftentimes, the covariate X can be transformed so that the yi, as a function of
the transformed xi can be assumed linear.
• Frequently used transformations include ln(X), √X, 1/X, exp(X), ln(X + 1), . . .
• For our fictitious example we try a logarithmic transformation of the observed xi:
xi −→ ln(xi )
• Output of the regression procedure:
• Accompanying graph:
• Residual plot:
• R2 = 0.92: our model has improved, because we now can explain more variability
in the y-values by means of the x-values.
• The estimated regression curve now is
Y = 2.95 + 0.80 ln(X)
• Hence, the transformation complicates the interpretation of the regression
coefficients.
For example, 0.80 is the estimated increase in Y when ln(X) increases with one
unit.
• At the same time, the transformation is necessary to render the assumption of
normality more realistic, which in turn implies that our statistical inferences w.r.t.
β0 and β1 improve.
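The effect of a logarithmic transformation on the residual pattern can be mimicked with artificial data. In the sketch below the data are hypothetical and noise-free, generated from a logarithmic curve purely for illustration (they are not the fictitious data of the slides), and `least_squares` is a helper defined for this example.

```python
import math


def least_squares(x, y):
    """Return (intercept, slope) of the least squares line through (x_i, y_i)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


# Hypothetical, noise-free data lying exactly on a logarithmic curve.
x = list(range(1, 11))
y = [2.95 + 0.80 * math.log(xi) for xi in x]

# Straight line in x: the residuals show a systematic pattern.
a0, a1 = least_squares(x, y)
residuals_linear = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]

# Straight line in ln(x): the systematic pattern disappears.
log_x = [math.log(xi) for xi in x]
b0, b1 = least_squares(log_x, y)
residuals_log = [yi - (b0 + b1 * li) for li, yi in zip(log_x, y)]

print("residuals, line in x:    ", [round(e, 2) for e in residuals_linear])
print("residuals, line in ln(x):", [round(e, 2) for e in residuals_log])
```

With the untransformed covariate, the residuals are negative at both ends of the x range and positive in the middle, i.e., over-estimation for small and large x and under-estimation in between.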
12.4 Example: Length of Stay versus ADL
• We now check whether the linearity assumption is satisfied in the regression model
employed for the prediction of length of stay by means of the ADL score, 1 day
post operation.
• The residual plot does not indicate any systematic trend in the residuals:
12.5 The Assumption of Constant Variance
• For illustration, we study the relationship between diastolic blood pressure and
age, using data of 54 healthy adult women, between 20 and 60 years of age:
• We conduct a regression of blood pressure on age:
• The regression explains more than 40% of the variability in blood pressure
(R2 = 0.4077); there is a significant (p < 0.0001) linear relationship between age
and blood pressure; the estimated regression line is:
Blood pressure = 56.16 + 0.58 × Age
• Given that the residuals ei = yi − ŷi can be interpreted as estimates of the
theoretical deviations εi , we can assess the assumption of constant variance for
the εi via a scatter plot of the residuals:
• The residuals are distributed around zero in a ‘parallel’ fashion, pointing to the
fact that linearity would be satisfied.
• At the same time, the residual plot suggests that the variance of the εi increases
with age.
• Violation of this assumption will lead to less than optimal inferences about the
parameters β0 and β1:
. The estimated regression line remains correct
. The parameters β0 and β1 are estimated less precisely. This leads to larger
p-values and hence a linear relationship between X and Y may go undetected.
• An optimal analysis is obtained through a so-called weighted least squares analysis.
• Oftentimes, non-constant variance is paired with non-normality. A solution for the
non-normality problem very often generates, on the side, a solution for the
non-constant-variance problem.
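The idea behind weighted least squares can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ages and blood pressures are hypothetical (not the course data), the function name `weighted_least_squares` is ours, and the weights 1/age² assume that the error standard deviation is proportional to age.

```python
def weighted_least_squares(x, y, w):
    """Fit y = b0 + b1*x by minimising sum(w_i * (y_i - b0 - b1*x_i)**2)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical ages and diastolic blood pressures (not the course data).
age = [22, 28, 35, 41, 47, 53, 59]
bp = [68, 73, 75, 82, 80, 92, 85]

# Assumption: error standard deviation proportional to age,
# so each observation receives weight 1 / age**2.
weights = [1 / a ** 2 for a in age]

b0_wls, b1_wls = weighted_least_squares(age, bp, weights)
b0_ols, b1_ols = weighted_least_squares(age, bp, [1.0] * len(age))  # equal weights = ordinary least squares
print(f"OLS: {b0_ols:.2f} + {b1_ols:.3f} * age")
print(f"WLS: {b0_wls:.2f} + {b1_wls:.3f} * age")
```

Observations with a larger assumed variance get a smaller weight, so they pull less on the fitted line.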
12.6 Example: Length of Stay versus ADL
• To check the assumption of constant residual variance for the regression model,
employed to predict length of stay by means of the ADL score, 1 day post
operation, we re-consider the residual scatter plot, already created to assess
linearity:
• Apart from the outlier in the middle, there are no systematic trends in the
variability of the residuals.
• We can therefore accept the assumption of constant residual variance.
12.7 The Assumption of Normality
• Given that the residuals ei = yi − ŷi are estimates of the theoretical deviations
εi, it is natural to assess the assumption of normality via residuals.
• In practice, one often uses a combination of two methods:
. Graphical: a histogram of residuals
. A formal test for normality
• Both techniques are illustrated by means of the blood pressure data in 54 women.
12.7.1 A histogram of residuals
• A simple graphical way to explore the distribution of the residuals is by means of a
histogram, together with the normal distribution that most closely fits the
histogram:
• From this histogram follows:
. There is no evidence for asymmetry in the distribution of the residuals
. The distribution appears not to be too different from the normal distribution
• We conclude that there is no graphical evidence for non-normal errors εi .
12.7.2 The normality test
• We can conduct a formal normality test.
• One tests the null hypothesis
H0 : the data are normally distributed
versus the alternative hypothesis
HA : the data are not normally distributed
• Various testing procedures are possible, all leading to a p-value, allowing us to
decide whether or not to reject the null hypothesis
• Statistica output:
• We obtain a histogram with the normal approximation, but also with the results of
3 test procedures for normality: Shapiro-Wilk, Kolmogorov-Smirnov, and Lilliefors.
The first two are the more common ones.
• None of the 3 procedures rejects the null hypothesis of normality. We conclude
that there is no evidence against normality of the residuals ei, and hence of the
errors εi.
12.7.3 Histogram ←→ normality test
• The histogram is an exploration technique to study the distribution of the
residuals.
• The normality test is a formal test, allowing us to test whether the assumption of
normality is acceptable.
• In (very) large samples, rejection of normality based on a statistical testing
procedure is rather likely: even the smallest deviations from normality will be detected.
• It is known that small deviations from normality will still lead to correct results, as
long as the errors are symmetric.
• Hence, if non-normality is not due to asymmetry, then the results obtained will
still be reliable.
12.8 Example: Length of Stay versus ADL
• We consider again the regression of length of stay of hip fracture patients on
their ADL score, 1 day post operation.
• The residuals are clearly non-normally distributed.
• From the histogram, it follows that non-normality is due to asymmetry.
• In case non-normality results from asymmetry, one can sometimes transform the y
values so as to make residuals in the new regression normally distributed.
• Frequently used transformations are ln(Y), √Y, 1/Y, exp(Y), ln(Y + 1), . . .
• In our example, we have to transform the data (the y-values) such that the larger
residuals approach the bulk of the residuals.
• A possible transformation is
Length of stay −→ ln(Length of stay)
• Note that all observed values of length of stay are positive, making the above
transformation allowable.
• Before interpreting the regression model output, we check whether the distribution
of the new residuals is closer to a normal distribution:
• Hence, we can conclude that the errors in the new regression model are normally
distributed.
• New regression output:
• The regression model is slightly improved, given that the R2 value has increased
from 0.0432 to 0.0670
• The regression line is:
ln(Length of stay) = 2.23 + 0.02 × ADL
• Now, we do find a significant relationship:
p = 0.0497 in contrast with p = 0.1173 prior to transformation.
• Note that the relationship derived is no longer linear between the original variables
ADL and Length of Stay.
• This example underscores the need to check normality of errors, given that
possible non-normality can strongly distort the results.
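To see what the fitted line on the log scale means on the original scale, one can back-transform it. The sketch below uses the coefficients 2.23 and 0.02 reported above; the function name is ours. Note that, under the normal-error model on the log scale, exp(·) of the fitted value targets the median rather than the mean length of stay.

```python
import math

B0, B1 = 2.23, 0.02  # fitted intercept and slope on the ln(Length of stay) scale

def predicted_length_of_stay(adl):
    """Back-transform the fitted log-scale line to days."""
    return math.exp(B0 + B1 * adl)

# One extra ADL point multiplies the predicted stay by exp(B1):
# the relationship is multiplicative, not additive, in the original units.
ratio = predicted_length_of_stay(18) / predicted_length_of_stay(17)
print(round(ratio, 4))
for adl in (5, 17, 24):
    print(adl, round(predicted_length_of_stay(adl), 1))
```

The constant ratio between consecutive ADL scores is what "no longer linear between the original variables" amounts to in practice.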
• The transformation of the y-values can, in turn, disturb linearity and/or the
constant variance of the errors εi. It is therefore useful to construct, after
transformation, a scatter plot of the y-values versus the residuals:
• Linearity and constant variability remain satisfied.
12.9 General Conclusion
• Carrying out a regression is easy
• Evaluating a regression model is difficult
Chapter 13
Influential Observations
. Example
. Cook’s distance
. Application
. What shall we do with influential subjects?
13.1 Example
• We consider again the regression of ln(Length of stay) on the ADL score, 1 day
post operation:
• Patient #20 has an ADL score of 17 and was hospitalized for 36 days,
which is exceptionally long in comparison with other patients.
• For subject #20, the residual ei = yi − ŷi is, therefore, very large.
• Given that the parameters β0 and β1 are estimated via the least squares method,
it is legitimate to investigate how strongly our estimates β̂0 and β̂1 are influenced by
this individual.
• A subject is highly influential if deleting the subject leads to strongly differing
results.
• Influential observations make interpreting the results more difficult, because the
conclusions become sample-dependent: A different sample would have led to
different results.
• To study a subject’s influence, we can compare β̂0 and β̂1 with and without the
given subject.
• To illustrate the method, we consider subject #20, and investigate the effect of
deleting this patient, together with what the effect would have been, had the
subject not had an ‘average’ ADL score, but rather a very large (24) or very small
(10, 5, 0) ADL.
• Results for ADL= 17:
• Results for ADL= 24:
• Results for ADL= 10:
• Results for ADL= 5:
• Results for ADL= 0:
• Summary of the regression results:

ADL   Parameter        With subject #20      Without subject #20
                       Estimate (p-value)    Estimate (p-value)
17    Intercept (β0)   2.233 (<0.001)        2.191 (<0.001)
      Slope (β1)       0.022 (0.0497)        0.024 (0.0219)
24    Intercept (β0)   2.088 (<0.001)        2.191 (<0.001)
      Slope (β1)       0.030 (0.0056)        0.024 (0.0219)
10    Intercept (β0)   2.420 (<0.001)        2.191 (<0.001)
      Slope (β1)       0.012 (0.2801)        0.024 (0.0219)
5     Intercept (β0)   2.541 (<0.001)        2.191 (<0.001)
      Slope (β1)       0.005 (0.6246)        0.024 (0.0219)
0     Intercept (β0)   2.636 (<0.001)        2.191 (<0.001)
      Slope (β1)       -0.0003 (0.9764)      0.024 (0.0219)
• In general, a subject is influential if the following two conditions are satisfied:
. The subject is an outlier, i.e., the value yi is exceptionally large or small, given
its xi value.
. The subject is located at the outside of the X-space; in our example this
means that a large or small ADL score (day 1) is observed.
13.2 Cook’s Distance
• The detection of influential subjects requires the following steps:
. Carry out the regression on all subjects
. Step 1: leave out the first subject and compare the new results with those
based on all data
. Step 2: leave out the second subject and compare the new results with those
based on all data
. Step 3: leave out the third subject and compare the new results with those
based on all data
. ...
. Step n: leave out the last subject and compare the new results with those
based on all data
• In each step, we have to compare the results obtained in the absence of a certain
subject with those obtained based on all data.
• This can be done with Cook’s distance, which measures the ‘distance’ between the
results with and without such an observation.
• Cook’s distance for the ith observation is denoted by Di.
• Influential subjects correspond to large Di.
• Non-influential subjects correspond to small Di.
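The leave-one-out steps above can be sketched directly in Python. This is a minimal illustration, not the Statistica implementation: it uses the usual definition of Cook's distance for a model with p = 2 parameters (the squared shift in all fitted values after deleting subject i, divided by p times the residual variance of the full fit), and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

def cooks_distances(x, y):
    """Cook's distance D_i, obtained by refitting without each subject in turn."""
    n, p = len(x), 2                      # p = number of regression parameters
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)  # residual variance
    distances = []
    for i in range(n):
        x_del, y_del = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        c0, c1 = fit_line(x_del, y_del)
        # Squared shift in all n fitted values caused by deleting subject i.
        shift = sum((fi - (c0 + c1 * xi)) ** 2 for fi, xi in zip(fitted, x))
        distances.append(shift / (p * s2))
    return distances

# Hypothetical data: the last subject has an extreme x and an unusual y.
adl = [8, 10, 12, 14, 16, 18, 24]
log_stay = [2.4, 2.5, 2.4, 2.6, 2.5, 2.7, 3.6]
d = cooks_distances(adl, log_stay)
most_influential = max(range(len(d)), key=d.__getitem__)
print(most_influential, [round(v, 3) for v in d])
```

In this made-up example the last subject combines an outlying response with an extreme ADL score, which is exactly the combination that produces a large Di.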
13.3 Application
• We apply this to the regression of ln(Length of stay) on the ADL score, 1 day post
operation.
• In Statistica, this is done via the ‘Extended’ list of residuals and predicted values.
• Statistica output:
• Note that D20 is relatively large.
• In particular for large data sets, an index plot of Cook’s distances can be very
handy, possibly upon explicitly constructing a variable with observation numbers.
• Apart from subject #20, we also find that subject #45 has a relatively large
Di.
• It is therefore of interest to carry out the analysis with each of these observations
removed in turn; in particular, the analysis should also be repeated without subject #45.
• The results with all observations, without observation #20, and without
observation #45, respectively, are:
13.4 What to Do With Influential Subjects?
• Does removing influential subjects lead to qualitatively different results?
• Are the data for influential subjects correct?
. Data-entry errors
. Mix-up of patients’ case forms
. ...
• Do influential subjects satisfy the inclusion/exclusion criteria of the study?
. Are these genuine hip fracture patients?
. Could there be an additional complication/co-morbidity that could explain their
influence?
. ...
• When there are no objective criteria for omission, influential subjects ought to be
kept in the study.
• Possibly, the least squares criterion can be replaced by a different criterion that is
less sensitive to individual observations.
=⇒ Robust regression techniques
Part V
Analysis of Variance
Chapter 14
1-way ANOVA
. Example
. Pairwise t-tests
. 1-way ANOVA
. Illustration
. Model diagnostics
. Influential observations
14.1 Example
• Because we suspect that the ADL score post operation is not only influenced by
operation-specific factors, but also by, for example, how dependent the patient
was prior to the operation, we study the relationship between the ADL score and
the patient’s living condition prior to operation.
• We distinguish between the following classes:
. Single
. With partner / family / religious community
. RH (Retirement-Home)
/ RCH (Retirement and Care Home)
. Other
• Descriptive statistics and graphical exploration:
• The fourth group contains only 1 subject, and will not be included for analysis.
• From the graph, it appears that the average ADL score in RH/RCH patients is
higher than in the other two groups. Is this difference significant?
• Even if the three groups were the same in the population, it would still be
possible to observe differences in the sample, purely by chance.
• How large is the probability that we observe this type of difference?
• Illustration: Vestac Java Applet → Anova → Anova plot
14.2 Pairwise t-tests
• In analogy with the unpaired t-test, we assume that we now have r different sets
of measurements (in the example, r = 3):
. y11 , y12 , y13 , . . . , y1n1 the measurements in the first group
. y21 , y22 , y23 , . . . , y2n2 the measurements in the second group
. ...
. yr1 , yr2, yr3, . . . , yrnr the measurements in the rth group
• Further, we assume that the measurements are sampled from the following
distributions:
Y1j ∼ N(µ1, σ²),   Y2j ∼ N(µ2, σ²),   . . . ,   Yrj ∼ N(µr, σ²)
• The null hypothesis that we want to test is
H0 : µ1 = µ2 = . . . = µr
versus the alternative hypothesis
HA : not all µi equal
• When the above null hypothesis is not satisfied, then at least two of the means µi
must be different. Therefore, we can, in principle, use unpaired t-tests. For r = 3,
this would mean that we test the following hypotheses:
H0 : µ1 = µ2
H0 : µ1 = µ3
H0 : µ2 = µ3
• For our example, we obtain the following p-values:

                        Single    Partner/family/relig.   RH/RCH
Single                  —         0.8763                  0.0013
Partner/family/relig.   0.8763    —                       <0.0001
RH/RCH                  0.0013    <0.0001                 —
• Hence, we only find significant differences between the RH/RCH patients on the
one hand and the other two groups on the other hand.
• Note that, for each test conducted, there is a chance of 5% for a type-I error
(incorrectly rejecting H0).
• It can be shown that, for our example, the total probability for a type-I error
satisfies:
P(H0 rejected | H0) = P(at least 1 significance | µ1 = µ2 = µ3) ≤ 3 × 5% = 15%
so that the chance for a type-I error can be larger than the 5% requested.
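How quickly the overall type-I error can grow with the number of tests is easy to tabulate. The sketch below compares the upper bound k × α with the value 1 − (1 − α)^k, which would hold if the k tests were independent; that independence is an assumption made only for illustration (pairwise t-tests on the same groups are not independent), and the function name is ours.

```python
def familywise_error(k, alpha=0.05):
    """Return (rate for k independent tests, upper bound k * alpha capped at 1)."""
    exact_if_independent = 1 - (1 - alpha) ** k
    upper_bound = min(1.0, k * alpha)
    return exact_if_independent, upper_bound

for k in (1, 3, 10):
    exact, bound = familywise_error(k)
    print(k, round(exact, 4), round(bound, 4))
```

For k = 3 the bound is the 15% quoted above, and the independent-tests value lies just below it.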
• In general, when conducting k tests, the total probability for a type-I error can
increase to k × α, and hence become large when the number of tests conducted is
large.
• It is therefore necessary to have a test statistic that allows us to test the
null hypothesis
H0 : µ1 = µ2 = . . . = µr
without having to conduct all pairwise t-tests.
=⇒ ANOVA
14.3 1-way ANOVA
• ANOVA (Analysis of variance) is an extension of the unpaired t-test to the
comparison of more than 2 groups.
• Like with the t-test, the test procedure will compare the variability between groups
with the variability within groups.
• The following equation plays a central role:
$$\underbrace{\sum_{i=1}^{r}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{\cdot\cdot})^2}_{SSTO} = \underbrace{\sum_{i=1}^{r}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2}_{SS_{within}} + \underbrace{\sum_{i=1}^{r} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2}_{SS_{between}}$$
[Figure: distributions of Group 1, Group i, and Group r, marking an observation y1j, the group means ȳ1·, ȳi·, ȳr·, the global mean ȳ··, and the deviations y1j − ȳ1·, ȳ1· − ȳ··, and y1j − ȳ··]
. ȳ·· : global mean (all groups together)
. ȳi· : mean in the ith group
. yij : jth measurement in the ith group
• SSTO: Total sum of squares
This term expresses the total variability in the data.
• SSwithin: Within-group sum of squares
This term expresses the variability within the groups
• SSbetween: Between-group sum of squares
This term expresses the variability between the groups
• In ANOVA, the null hypothesis is rejected if
$$F = \frac{SS_{between}/(r-1)}{SS_{within}/(N-r)}$$
is large. N is the total sample size, $N = \sum_i n_i$.
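The decomposition and the F ratio can be sketched as follows. The three groups of ADL scores are hypothetical (not the course data) and the function name `one_way_anova` is ours; the point is only to show that SSTO splits exactly into the within and between parts.

```python
def one_way_anova(groups):
    """Return (SS_between, SS_within, F) for a list of groups of measurements."""
    all_values = [y for g in groups for y in g]
    n_total, r = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    ss_between = ss_within = 0.0
    for g in groups:
        mean_g = sum(g) / len(g)
        ss_between += len(g) * (mean_g - grand_mean) ** 2   # n_i * (group mean - global mean)^2
        ss_within += sum((y - mean_g) ** 2 for y in g)      # spread around the group mean
    f = (ss_between / (r - 1)) / (ss_within / (n_total - r))
    return ss_between, ss_within, f

# Hypothetical ADL scores for three living conditions (not the course data).
groups = [[12, 14, 11, 15], [13, 12, 16, 14, 15], [18, 20, 17, 19]]
ss_b, ss_w, f_stat = one_way_anova(groups)

# Total sum of squares computed directly, to check SSTO = SSwithin + SSbetween.
all_values = [y for g in groups for y in g]
grand = sum(all_values) / len(all_values)
ss_total = sum((y - grand) ** 2 for y in all_values)
print(abs(ss_b + ss_w - ss_total) < 1e-9, round(f_stat, 2))
```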
• Note that F is the ratio of the variability between groups to the variability
within groups, which is entirely analogous to the unpaired t-test. This motivates
the terminology ‘ANOVA.’
• In our example, F = 8.59
• Under the null hypothesis, F is expected to be small.
• We wish to know to what extent F = 8.59 can be obtained purely by chance.
• We calculate the probability that F ≥ 8.59 in case all populations truly
were equal, i.e., when µ1 = µ2 = µ3.
• Illustration: Vestac Java Applet → Anova → Histograms of MSR, MSE, F
• Clearly, when there is no difference between the three populations, it is very
improbable to observe a value as large as F = 8.59.
• The probability of observing F ≥ 8.59 purely by chance is p = 0.0006.
• Given that this probability is so small, more specifically p < α = 0.05 = 5%, we
conclude that the observed value (F = 8.59) is sufficient indication that µ1,
µ2, and µ3 are not all equal.
• We reject the null hypothesis and conclude that the three groups are significantly
different at the 5% significance level.
• Note that the calculation of the p-values makes use of the assumptions made:
. Normality within all groups
. Equal variance for all groups
• Exactly like with linear regression, these assumptions need to be checked (see
further).
14.4 Illustration
• Output table containing global F -test:
• The ‘SS MODEL’ is the SSbetween. In the F statistic, SSbetween needs to be
divided by r − 1 = 3 − 1. This quantity is called the number of degrees of
freedom for SSbetween (df=degrees of freedom).
• The ‘SS Residual’ is the SSwithin. In the F statistic, SSwithin needs to be divided
by N − r = 54 − 3. This quantity is called the number of degrees of freedom for
SSwithin.
• The F statistic is
$$F = \frac{SS_{between}/(r-1)}{SS_{within}/(N-r)} = \frac{168.60/2}{500.23/51} = 8.59$$
• The corresponding p-value is p = 0.0006, which points to significant differences
between the three groups, as far as the average ADL on day 1 is concerned.
• Exactly like with regression, one can compute a statistic, indicating which portion
of the variability in the ADL scores can be explained by the differences in living
conditions (= variability between groups):
$$R^2 = \frac{SS_{between}}{SSTO} = \frac{168.60}{168.60 + 500.23} = 0.252$$
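The two quantities on this slide follow from the sums of squares in the output table by simple arithmetic, which can be checked as follows (variable names are ours):

```python
ss_between, ss_within = 168.60, 500.23   # 'SS MODEL' and 'SS Residual' from the output table
r, n_total = 3, 54                       # number of groups and of subjects

f_stat = (ss_between / (r - 1)) / (ss_within / (n_total - r))
r_squared = ss_between / (ss_between + ss_within)
print(round(f_stat, 2), round(r_squared, 3))   # 8.59 0.252
```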
14.5 Model Diagnostics
• With ANOVA, one implicitly assumes that the data are sampled from the
following populations:
Y1j ∼ N(µ1, σ²),   Y2j ∼ N(µ2, σ²),   . . . ,   Yrj ∼ N(µr, σ²)
• Hence, we assume that . . .
. . . . constant variance: within every group the spread is equally large
. . . . normality: within each group the data are normally distributed
• When the assumptions are not satisfied, just like with linear regression, erroneous
statistical results can follow (p-values, confidence intervals).
• How can the above assumptions be verified?
14.5.1 Assumption of constant variance
• Descriptive statistics and graphical exploration:
• Is there too much difference in the variance so as to doubt the assumption of
equal variance?
• In other words, to what extent can the observed differences in variance be ascribed
to chance?
• We ought to conduct a formal equal-variance test. The null hypothesis then is
H0 : σ1² = σ2² = . . . = σr²
versus the alternative hypothesis
HA : not all σi² equal
• This can be done, for example, using Levene’s test.
• Statistica output:
• Hence, we observe that the variances among the three groups are not significantly
different (p = 0.0808).
• When there are many groups, or when some groups contain (very) many
observations, then small differences can be found to be significant by the formal
testing procedure.
• At the same time, it is known that variances that are not too different pose little
or no problem.
• Therefore, one employs, next to a formal test for equal variances, also a rule of
thumb, stating that variances should not differ by more than a factor 5, to avoid
adversely affecting the results.
• In our example, this is:
$$\frac{3.77^2}{1.82^2} = 4.29$$
• In practice, one uses the formal test, combined with the rule of thumb, so as to
assess whether the assumption of equal variance is satisfied.
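The rule of thumb amounts to one line of arithmetic on the largest and smallest group standard deviations (3.77 and 1.82 in the example); the variable names below are ours:

```python
sd_largest, sd_smallest = 3.77, 1.82       # group standard deviations from the example
variance_ratio = (sd_largest / sd_smallest) ** 2
print(round(variance_ratio, 2), variance_ratio < 5)   # 4.29 True
```

The ratio stays below the factor 5, in line with the non-significant formal test.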
14.5.2 Assumption of Normality
• ANOVA assumes that the data in every group are normally distributed, with
common variance. Above, we already discussed how the equality of variances can
be assessed. We now assume that the variances are indeed equal. How, then, can
normality be tested?
• We rewrite the ANOVA model as
Y1j = µ1 + ε1j
Y2j = µ2 + ε2j
...
Yrj = µr + εrj
where the ‘error terms’ εij all come from the same normal distribution with mean
zero and variance σ 2 .
• Exactly as with regression, we will check the assumption of normality for the εij
via their estimates
eij = yij − µ̂i = yij − ȳi·
• As with regression, the eij are termed residuals: they represent the error made
when the observed value yij for an individual in group i is predicted by the
group average ȳi·.
• Once the residuals eij have been computed, we are in a position, again, to assess
normality using their histogram, or using formal normality tests.
• This is done in full analogy with linear regression.
• Statistica output:
• Hence, we can conclude that the assumption of normality is acceptable.
• Exactly as with simple regression, we have that:
. Departures from normality still lead to correct results, as long as the
distribution of the errors is symmetric.
. In case of asymmetry, the response can sometimes be transformed, so as to
render the residuals in the new model normally distributed.
. However, some transformations can disrupt the constant variance, implying
that this needs to be assessed again after transformation.
14.6 Influential Observations
• Although, with ANOVA, we strictly speaking do not have
regression parameters, individual observations can still have a large influence on
the estimation of the group averages, µ̂i, and hence ultimately on the ANOVA
results.
• Statistica allows us, exactly as with regression, to measure the influence of each
observation by comparing the estimators µ̂i = ȳi· with those that would be
obtained upon deletion of that observation.
• This results, again, in the so-called ‘Cook’s distance,’ a distance between the
estimators with and without a given observation.
• Exactly as with regression, we consider a scatter plot of Cook’s distances versus
the subject number.
• The computations are done in analogy with simple linear regression.
• Statistica output:
• Hence, there are no observations with an unduly large influence.
Part VI
Logistic Regression
Chapter 15
Logistic Regression
. Simple case of a proportion
. Two groups
. General definition of logistic regression
15.1 A Proportion
Successes      : p
Failures       : n − p
Total          : p + (n − p) = n
Proportion     : $\hat{\pi} = p/n$
Transformation : $\pi = \dfrac{e^{\alpha}}{1 + e^{\alpha}}$
0 ≤ π ≤ 1 ←→ −∞ ≤ α ≤ +∞
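The transformation between the proportion π and the parameter α, and its inverse (the logit), can be sketched as follows. The counts are hypothetical and the function names `expit` and `logit` are ours.

```python
import math

def expit(alpha):
    """Map alpha in (-inf, +inf) to a probability: e^alpha / (1 + e^alpha)."""
    return 1.0 / (1.0 + math.exp(-alpha))

def logit(pi):
    """Inverse map: probability in (0, 1) to the log odds alpha."""
    return math.log(pi / (1.0 - pi))

p_successes, n = 12, 40            # hypothetical counts
pi_hat = p_successes / n           # estimated proportion
alpha_hat = logit(pi_hat)          # corresponding value on the alpha scale
print(round(pi_hat, 3), round(alpha_hat, 3), round(expit(alpha_hat), 3))
```

Whatever real value α takes, the resulting π stays between 0 and 1, which is the reason for working on the α scale.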
15.2 Formulation of Logistic Regression
Two Groups

            Untreated                                              Treated
Successes   p1                                                     p2
Failures    n1 − p1                                                n2 − p2
Total       n1                                                     n2
Prob        $\pi_1 = \dfrac{e^{\alpha_1}}{1 + e^{\alpha_1}}$       $\pi_2 = \dfrac{e^{\alpha_2}}{1 + e^{\alpha_2}}$

α1 = α
α2 = α1 + β = α + β
15.2.1 Effect of a Covariate x
α1 = α
α2 = α1 + β = α + β
$$\pi_1 = \frac{e^{\alpha}}{1 + e^{\alpha}} \qquad (x = 0)$$
$$\pi_2 = \frac{e^{\alpha+\beta}}{1 + e^{\alpha+\beta}} \qquad (x = 1)$$
$$\pi(x) = \frac{e^{\alpha+\beta x}}{1 + e^{\alpha+\beta x}} \qquad (x = 0, 1)$$
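15.2.2 Odds Ratio
$$OR = \frac{P(x=1)}{1 - P(x=1)} \cdot \frac{1 - P(x=0)}{P(x=0)} = \frac{e^{\alpha+\beta}/(1 + e^{\alpha+\beta})}{1/(1 + e^{\alpha+\beta})} \cdot \frac{1/(1 + e^{\alpha})}{e^{\alpha}/(1 + e^{\alpha})} = e^{\alpha+\beta} \cdot \frac{1}{e^{\alpha}} = e^{\beta} = \psi$$
β: log odds ratio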
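A quick numerical check of this derivation: for any α and β, the odds ratio between the two groups equals e^β. The parameter values below are hypothetical and the function name `pi` is ours.

```python
import math

def pi(x, alpha, beta):
    """Success probability for group x (0 = untreated, 1 = treated)."""
    eta = alpha + beta * x
    return math.exp(eta) / (1 + math.exp(eta))

alpha, beta = -1.2, 0.7   # hypothetical parameter values
odds_treated = pi(1, alpha, beta) / (1 - pi(1, alpha, beta))
odds_untreated = pi(0, alpha, beta) / (1 - pi(0, alpha, beta))
odds_ratio = odds_treated / odds_untreated
print(math.isclose(odds_ratio, math.exp(beta)))
```

The intercept α cancels, so only β determines the odds ratio.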
15.2.3 General Form Of Logistic Model
• Dichotomous outcome Yi:
$$Y_i = \begin{cases} 1 & \text{event occurs} \\ 0 & \text{event does not occur} \end{cases}$$
• p regression variables $x_i = (x_{1i}, \ldots, x_{pi})'$
$$P(Y_i = 1 \mid x_i) = \pi(x_i) = \frac{\exp(\beta_0 + \beta_1 x_{1i} + \ldots + \beta_p x_{pi})}{1 + \exp(\beta_0 + \beta_1 x_{1i} + \ldots + \beta_p x_{pi})}$$
$$\text{logit}\, P(Y_i = 1 \mid x_i) = \text{logit}[\pi(x_i)] = \beta_0 + \beta_1 x_{1i} + \ldots + \beta_p x_{pi}$$
15.2.4 Odds Ratio
⇒ odds ratio for two individuals with covariate vectors x∗ and x:
$$OR = \frac{\pi(x^*)}{1 - \pi(x^*)} \cdot \frac{1 - \pi(x)}{\pi(x)} = \exp\left\{\sum_{j=1}^{p} \beta_j (x^*_j - x_j)\right\}$$
(the intercept β0 cancels, since it is common to both individuals)
exp βj : factor by which the odds are multiplied for each unit increase in xj,
keeping all other covariate values constant
15.2.5 The Covariates x ?
• indicator variables (exposures: 0/1)
• continuous measures (age)
• transformation of measures (√age)
• cross terms = interactions, made up from cross products of other covariates
Chapter 16
Use of Logistic Regression
. Ordinary logistic regression
. Stratified analysis
. Prospective versus retrospective studies
16.1 Possible Settings
• Logistic regression can be used:
. Without covariates: then we are back to the case of a simple proportion
. With a single binary covariate: the two group case
. With a general set of covariates: the general definition seen above
• All of this starts from the setting of prospective studies. A few questions remain:
. What about retrospective studies?
. What when data are stratified, e.g., by gender or age group?
16.2 Effect of Stratum
• Assume stratification for sex (M vs. F)
• Various situations can be considered
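• Situation 1:
$$P_M(x_i) = \frac{e^{\alpha_M + \beta_M x_i}}{1 + e^{\alpha_M + \beta_M x_i}}, \qquad P_F(x_i) = \frac{e^{\alpha_F + \beta_F x_i}}{1 + e^{\alpha_F + \beta_F x_i}}$$
• Two completely different models are considered, one for females, one for males:
. The baseline risks $e^{\alpha_M}$ and $e^{\alpha_F}$ are different
. The relative risks $e^{\beta_M}$ and $e^{\beta_F}$ are different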
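• Situation 2:
$$P_M(x_i) = \frac{e^{\alpha_M + \beta x_i}}{1 + e^{\alpha_M + \beta x_i}}, \qquad P_F(x_i) = \frac{e^{\alpha_F + \beta x_i}}{1 + e^{\alpha_F + \beta x_i}}$$
. The baseline risks $e^{\alpha_M}$ and $e^{\alpha_F}$ are different
. The relative risks $e^{\beta_M} = e^{\beta_F} = e^{\beta}$ are common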
• The latter model is often considered. It needs to be understood that there is
indeed an assumption behind it. It is not always true that there is only one
common β parameter. This assumption is equivalent to saying that the effect on
males and females, of the covariate xi, is the same.
16.2.1 Stratum as Covariate
• Stratification can be seen as the inclusion of an extra covariate:
$$P(x_i, g_i) = \frac{e^{\beta_0 + \beta_1 g_i + \beta_2 x_i + \beta_3 g_i x_i}}{1 + e^{\beta_0 + \beta_1 g_i + \beta_2 x_i + \beta_3 g_i x_i}}$$
where gi = 1 for females and gi = 0 for males.
• We obtain a correspondence with Situation 1:
αM + βM xi = β0 + β2 xi
αF + βF xi = (β0 + β1) + (β2 + β3) xi
• When the relative risks have to be the same, as in Situation 2, we have the
requirement:
β2 = β2 + β3
and hence
β3 = 0
or, in other words:
$$P(x_i, g_i) = \frac{e^{\beta_0 + \beta_1 g_i + \beta_2 x_i}}{1 + e^{\beta_0 + \beta_1 g_i + \beta_2 x_i}}$$
whence the interaction between gender and the covariate xi is absent.
• In conclusion, a stratified analysis is conducted by including the stratifying variable
as an ordinary covariate.
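The correspondence between the single model with a gender indicator and the two stratum-specific models can be written out as a small helper. This is only a sketch: the function name and the parameter values are hypothetical.

```python
def stratum_parameters(b0, b1, b2, b3):
    """Translate (beta0, beta1, beta2, beta3) into per-stratum (alpha, beta) pairs."""
    males = (b0, b2)                 # g = 0: alpha_M = beta0,         beta_M = beta2
    females = (b0 + b1, b2 + b3)     # g = 1: alpha_F = beta0 + beta1, beta_F = beta2 + beta3
    return males, females

# Hypothetical coefficients with no interaction (beta3 = 0), i.e. Situation 2.
(alpha_m, beta_m), (alpha_f, beta_f) = stratum_parameters(-2.0, 0.5, 0.8, 0.0)
print(alpha_m, alpha_f, beta_m == beta_f)
```

With β3 = 0 the two strata share the same slope and differ only in their intercepts; a non-zero β3 would reproduce Situation 1.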
• This is the strength of logistic regression.
• There is an important difference of interpretation:
. A “stratifying covariate” such as gi is included in the model without devoting
further study to it.
. An “ordinary covariate” such as xi is the subject of scientific study: we will be
interested in the strength of the effect on the outcome (through the coefficient
β2 or the associated relative risk; or through hypothesis testing).
16.3 Several Strata
• We do not have to restrict attention to stratification into two groups.
• In general, we may write:
$$P_m(x_i) = \frac{e^{\alpha_m + \beta x_i}}{1 + e^{\alpha_m + \beta x_i}}$$
where
. m = 1, . . . , M indexes M stratification groups (e.g., age class)
. xi is the level of the exposure variable for subject i (e.g., tobacco use)
• the baseline risk ($e^{\alpha_m}$) is different
• the relative risk ($e^{\beta}$) is common
• A model with different relative risks could be constructed as well
16.4 Stratum Effect: General Situation
• A fully general model:
$$P_m(x_i) = \frac{\exp\left(\alpha_m + \sum_{j=1}^{p} \beta_j x_{ji}\right)}{1 + \exp\left(\alpha_m + \sum_{j=1}^{p} \beta_j x_{ji}\right)}$$
• Several exposure variables x1i, . . . , xpi
Example: tobacco and alcohol
• m = 1, . . . , M : stratum indicator
Example: age class
• the baseline risk ($e^{\alpha_m}$) is different
• the relative risks ($e^{\beta_j}$) are common
16.5
Prospective ←→ Retrospective
• Prospective:
. the exposures xi or E are fixed
. the outcomes yi or D are stochastic
• Retrospective (Case-Control):
. the exposures xi or E are stochastic
. the outcomes yi or D are fixed
• Can one adapt logistic regression ?
16.6
Retrospective Logistic Regression
• Prospective:
$$P(y_i = 1 \mid x_i) = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}$$
. $e^{\alpha}$: the baseline risk
. $e^{\beta}$: the relative risk
• Retrospective (Case-Control):
$$P(y_i = 1 \mid x_i, \text{selected into the study}) = \frac{\exp(\alpha^* + \beta x_i)}{1 + \exp(\alpha^* + \beta x_i)}$$
. $e^{\alpha^*}$: no interpretation
. $e^{\beta}$: the relative risk
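Why only the intercept changes can be illustrated with a short calculation. The sketch below assumes that selection into the study depends on disease status only (cases sampled with probability `pi_case`, controls with probability `pi_control`); these sampling fractions and the parameter values are hypothetical. Under that assumption Bayes' rule gives $\alpha^* = \alpha + \log(\pi_{case}/\pi_{control})$, a standard result that is not stated on the slide.

```python
import math

def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical prospective parameters and sampling fractions.
alpha, beta = -3.0, 0.9
pi_case, pi_control = 0.8, 0.02

def prob_case_given_selected(x):
    """P(y = 1 | x, selected), with selection depending on y only."""
    p = expit(alpha + beta * x)
    selected_cases = pi_case * p
    selected_controls = pi_control * (1.0 - p)
    return selected_cases / (selected_cases + selected_controls)

# The intercept absorbs the sampling fractions; the slope is untouched.
alpha_star = alpha + math.log(pi_case / pi_control)

for x in (0.0, 1.0, 2.0):
    print(x, logit(prob_case_given_selected(x)), alpha_star + beta * x)
```

Since the sampling fractions are usually unknown in a case-control study, $\alpha^*$ cannot be translated back to $\alpha$, which is why it carries no interpretation.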
16.7
Analogy: 2 × 2 tables
• Contingency tables:
odds ratio ψ “exposed versus unexposed” = odds ratio ψ “diseased versus disease-free”.
• Thus, for 2 × 2 tables:
inference for cohort studies ≡ inference for case-control studies
• This identity is transmitted to the general logistic model.
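The symmetry of the odds ratio is easy to verify for a single table. The sketch below uses the AGE=2 counts of Section 17.2 (4 exposed cases, 26 exposed controls, 5 unexposed cases, 164 unexposed controls); the function names are illustrative only.

```python
def odds_ratio_by_exposure(a, b, c, d):
    """Odds of disease among exposed divided by odds of disease among unexposed."""
    return (a / b) / (c / d)

def odds_ratio_by_disease(a, b, c, d):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (a / c) / (b / d)

# a: exposed cases, b: exposed controls, c: unexposed cases, d: unexposed controls
a, b, c, d = 4, 26, 5, 164

psi_cohort_view = odds_ratio_by_exposure(a, b, c, d)
psi_case_control_view = odds_ratio_by_disease(a, b, c, d)
cross_product = (a * d) / (b * c)

print(psi_cohort_view, psi_case_control_view, cross_product)
```

Both views reduce to the cross-product ratio $ad/(bc)$, so the same number is obtained whichever margin is regarded as fixed.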
Chapter 17
Case Study: Ille-et-Vilaine
. A single exposure
. Two exposures, qualitative analysis
. Two exposures, quantitative analysis
17.1
Ille-et-Vilaine Study
17.2
The Data for a Single Binary Exposure
AGE=1
            CASES   CONTROLS   TOTAL
TOBACCO+        1          9      10
TOBACCO−        0        106     106

AGE=2
            CASES   CONTROLS   TOTAL
TOBACCO+        4         26      30
TOBACCO−        5        164     169

6 strata × 2 exposure groups
17.3
The Data
17.4
Modeling a Single Binary Exposure
model   # par   deviance = −2 logl
0       1       $G_0$
..      ..      ..
1       k       $G_1$
..      ..      ..
∞       MAX     $G_\infty$
Considerations:
• parsimonious model: no non-significant effects
• fit is good: $G_1 - G_\infty \sim \chi^2_{\mathrm{MAX}-k}$ is small
17.4.1
Degrees of Freedom
• Decomposition:
df(model): 1 per parameter
df(fit): to test that the model is ‘good’
• In our case:
. 6 age classes
× 2 classes of exposure
= 12 degrees of freedom (12 df)
17.4.2
Model 0
• intercept α: common age effect
• Model
$$P_m(x_i) = \frac{e^{\alpha}}{1 + e^{\alpha}}$$
• df(mod)= 1
• df(fit)= 12 − 1 = 11
17.4.3
Model 1
• age stratum: intercept per age category:
$\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6$
• Model
$$P_m(x_i) = \frac{e^{\alpha_m}}{1 + e^{\alpha_m}}$$
• effects in the model:
. df(mod)= 6
. no test necessary!
• fit:
. df(fit)= 12 − 6 = 6
. $G = G_1 - G_\infty = 90.56 \sim \chi^2_6$
. poor fit ⇒ the model is untenable
17.4.4
Model 2
• age stratum: intercept per age category:
$\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6$
• effect of exposure to tobacco: $\beta$
• Model
$$P_m(x_i) = \frac{e^{\alpha_m+\beta x_i}}{1 + e^{\alpha_m+\beta x_i}}$$
• effects in the model:
. df(mod)= 7
. Estimation of the effect:
$\hat\beta = 1.670$ (standard error = 0.190)
$\hat\psi = \exp(1.670) = 5.31$
. 95 % confidence limits:
$\beta_L = 1.670 - 1.96 \times 0.190 = 1.30$
$\beta_U = 1.670 + 1.96 \times 0.190 = 2.04$
. The αs:
No interpretation!
$\hat\alpha_1 = -5.054$
$\hat\alpha_2 = -3.512$
$\hat\alpha_3 = -1.855$
$\hat\alpha_4 = -1.341$
$\hat\alpha_5 = -1.087$
$\hat\alpha_6 = -1.092$
• fit:
. df(fit)= 12 − 7 = 5
. $G = G_2 - G_\infty = 11.04 \sim \chi^2_5$ (p = 0.05)
. Pearson statistic: $\widetilde{G} = 9.32$ (p = 0.15)
. fit is OK
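The numbers reported for Models 1 and 2 can be reproduced with a few lines of arithmetic. The sketch below recomputes the confidence limits for $\beta$, exponentiates them to obtain limits on the odds-ratio scale (these are not stated on the slide), and evaluates the chi-square tail probabilities of the two deviance statistics. The helper `chi2_sf` is a hand-written closed-form tail probability for integer degrees of freedom, included only to keep the example free of external libraries.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2) sum_j x^(j-1)/(1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total)

# Estimates quoted for Model 2.
beta_hat, se = 1.670, 0.190
beta_L = beta_hat - 1.96 * se
beta_U = beta_hat + 1.96 * se

# Point estimate and limits on the odds-ratio scale.
psi_hat = math.exp(beta_hat)
psi_L, psi_U = math.exp(beta_L), math.exp(beta_U)

# Goodness of fit: deviance relative to the saturated model.
p_model1 = chi2_sf(90.56, 6)   # Model 1: age only
p_model2 = chi2_sf(11.04, 5)   # Model 2: age + tobacco

print(round(beta_L, 2), round(beta_U, 2))
print(round(psi_hat, 2), round(psi_L, 2), round(psi_U, 2))
print(p_model1, round(p_model2, 3))
```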
17.4.5
Model 3
• age stratum: intercept per age:
$\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6$
• effect of exposure to tobacco: $\beta$
• linear interaction between age and exposure: $\gamma$
m           1      2      3      4      5      6
(m − 3.5)   −2.5   −1.5   −0.5   0.5    1.5    2.5
systematic trend in the relative risk with age
• Model
$$P_m(x_i) = \frac{e^{\alpha_m+\beta x_i+\gamma x_i (m-3.5)}}{1 + e^{\alpha_m+\beta x_i+\gamma x_i (m-3.5)}}$$
• effects in the model:
. df(mod)= 8
. Estimation of the effect:
$\hat\gamma = 0.125$ (standard error = 0.189)
not significant
• fit:
. df(fit)= 12 − 8 = 4
. not needed since the effect is not significant
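The "not significant" verdict can be illustrated with a Wald test, sketched below under the assumption of an approximately normal estimator. The multipliers in the last step show how much the relative risk would shift per age class if the estimated trend were taken at face value; the estimate of $\beta$ under Model 3 is not given on the slides, so only the factor $e^{\hat\gamma (m-3.5)}$ is computed.

```python
import math

gamma_hat, se = 0.125, 0.189

# Wald statistic and two-sided p-value from the standard normal distribution.
z = gamma_hat / se
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))

# Factor by which the relative risk in age class m differs from exp(beta).
multipliers = {m: math.exp(gamma_hat * (m - 3.5)) for m in range(1, 7)}

print(round(z, 2), round(p_two_sided, 2))
for m, factor in multipliers.items():
    print(m, round(factor, 2))
```

A p-value far above 0.05 is consistent with dropping the interaction term and returning to Model 2.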
17.4.6
Model ∞
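• age stratum: intercept per age:
$\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6$
• effect of exposure, per age stratum:
$\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6$
• Model
$$P_m(x_i) = \frac{e^{\alpha_m+\beta_m x_i}}{1 + e^{\alpha_m+\beta_m x_i}}$$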
• effects in the model:
. df(mod)= 12
• fit:
. df(fit)= 12 − 12 = 0
. fit is perfect
17.4.7
Summary
Mod.   df(mod)   df(fit)   dev
0      1         11        $G_0$
1      6         6         $G_1$
2      7         5         $G_2$
3      8         4         $G_3$
∞      12        0         $G_\infty$
17.5
Residuals
• Principle:
Expected number of cases = observed number of cases
• For the example:
0.33 + 4.10 + 24.50 + 40.13 + 23.74 + 3.20 = 96
• The residuals (O − E) have variance $N \hat{P} \hat{Q}$, where
. $N$: number of cases + number of controls
. $\hat{P}$: estimated disease probability
. $\hat{Q} = 1 - \hat{P}$
• Standardized residuals:
$$\frac{O - E}{\sqrt{N \hat{P} \hat{Q}}}$$
• Squaring and adding ⇒ $\widetilde{G}$ (the Pearson statistic)
• Make a plot and
. verify possible patterns
. verify the values: larger than 2 in absolute value is a problem
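As a sketch of how these quantities are obtained, the code below applies the Model 2 estimates ($\hat\alpha_1 = -5.054$, $\hat\alpha_2 = -3.512$, $\hat\beta = 1.670$) to the two age strata whose counts are shown in Section 17.2. It assumes that the expected values 0.33 and 4.10 listed above refer to the exposed cells of those two strata, which the computed values are consistent with; the variable names are illustrative.

```python
import math

# Model 2 estimates for the first two age strata.
alpha_hat = {1: -5.054, 2: -3.512}
beta_hat = 1.670

# (stratum, exposed) -> (observed cases O, cases + controls N)
cells = {
    (1, 1): (1, 10),
    (1, 0): (0, 106),
    (2, 1): (4, 30),
    (2, 0): (5, 169),
}

def fitted_prob(m, x):
    """Estimated disease probability in stratum m at exposure x."""
    return 1.0 / (1.0 + math.exp(-(alpha_hat[m] + beta_hat * x)))

results = {}
for (m, x), (obs, n) in cells.items():
    p = fitted_prob(m, x)
    expected = n * p
    residual = (obs - expected) / math.sqrt(n * p * (1.0 - p))
    results[(m, x)] = (expected, residual)
    print(m, x, round(expected, 2), round(residual, 2))

# Within each stratum the expected number of cases matches the observed number.
for m in (1, 2):
    total_expected = results[(m, 1)][0] + results[(m, 0)][0]
    total_observed = cells[(m, 1)][0] + cells[(m, 0)][0]
    print(m, round(total_expected, 2), total_observed)
```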
17.5.1
Values for the Example
17.6
Qualitative Analysis
• Use so-called “grouped data” for the Ille-et-Vilaine set
• This means that tobacco and alcohol are each classified into four categories, rather than taking continuous (quantitative) values
• At the same time, we will study the joint effects of tobacco and alcohol on the relative risk of esophageal cancer in Ille-et-Vilaine.
. cases: 200 males with diagnosed esophageal cancer (1972–1974)
. controls: 778 male adults (775 satisfy the criteria)
2 factors with 4 levels ⇒ 16 risk categories
17.6.1
Degrees of Freedom
• Levels:
Levels of alcohol   4
Levels of tobacco   4
Levels of age       6
• Cells: 4 × 4 × 6 = 96
• 8 empty cells
• 96 − 8 = 88 degrees of freedom
17.6.2
Alcohol
• Dichotomous (dummy) variables:
Level   ALC2   ALC3   ALC4
0 (1)   0      0      0
1 (2)   1      0      0
2 (3)   0      1      0
3 (4)   0      0      1
• ALC1 is the baseline category (why?)
• Coefficients $\beta_2, \beta_3, \beta_4$
• Relative risks:
. 1: RR of level 1 versus level 1
. $e^{\beta_2}$: RR of level 2 versus level 1
. $e^{\beta_3}$: RR of level 3 versus level 1
. $e^{\beta_4}$: RR of level 4 versus level 1
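The dummy coding and the resulting relative risks can be sketched as follows. The function names are illustrative, and the coefficients used in the example call are the alcohol coefficients reported for Model 4 in Section 17.6.10 (1.44, 1.98, 3.60).

```python
import math

def alcohol_dummies(level):
    """Map an alcohol level (1..4) to the indicator triple (ALC2, ALC3, ALC4)."""
    if level not in (1, 2, 3, 4):
        raise ValueError("level must be 1, 2, 3 or 4")
    return tuple(int(level == k) for k in (2, 3, 4))

def relative_risk(level, betas):
    """exp(beta_2*ALC2 + beta_3*ALC3 + beta_4*ALC4), relative to level 1."""
    dummies = alcohol_dummies(level)
    linear = sum(b * d for b, d in zip(betas, dummies))
    return math.exp(linear)

# Alcohol coefficients (beta_2, beta_3, beta_4) as reported for Model 4.
betas = (1.44, 1.98, 3.60)

for level in (1, 2, 3, 4):
    print(level, alcohol_dummies(level), round(relative_risk(level, betas), 2))
```

Level 1 has all indicators equal to zero, so its relative risk is $e^0 = 1$ by construction; this is what makes it the baseline category.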
17.6.3
Steps
• Define modeling strategy
• Find acceptable model
• Study the model:
. Interpretation
. Are the significant effects also (medically) relevant ?
. ...
17.6.4
Modeling
AGE (df=6); fit (df=82)
  ↙                                        ↘
AGE (df=6), ALC (df=3); fit (df=79)        AGE (df=6), TOB (df=3); fit (df=79)
  ↘                                        ↙
AGE (df=6), ALC (df=3), TOB (df=3); fit (df=76)
  ↓
AGE (df=6), ALC (df=3), TOB (df=3), ALC*TOB (df=9); fit (df=67)
17.6.5
Model Fit
17.6.6
Comparison
• With or without interaction ?
• Compare:
. Alcohol:
∗ Model 2 versus Model 1: 246.9 − 105.9 = 141.0
∗ Model 4 versus Model 3: 210.3 − 82.3 = 128.0
. Tobacco:
∗ Model 3 versus Model 1: 246.9 − 210.3 = 36.6
∗ Model 4 versus Model 2: 105.9 − 82.3 = 23.6
adjusted $\chi^2$ < non-adjusted $\chi^2$
(due to the correlation between the consumption of alcohol and the consumption of tobacco)
• Both covariates do have strong independent effects
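The four comparisons are likelihood-ratio tests: the difference between two deviances is referred to a chi-square distribution whose degrees of freedom equal the number of added parameters (3 here). The sketch below recomputes the differences from the deviances quoted above; `chi2_sf_3df` is a hand-written closed form for the 3-df tail probability, used only to avoid external dependencies.

```python
import math

def chi2_sf_3df(x):
    """P(X > x) for a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

# Deviances of Models 1-4 as quoted in the comparison.
deviance = {1: 246.9, 2: 105.9, 3: 210.3, 4: 82.3}

# effect -> (smaller model, larger model)
comparisons = {
    "ALC": (1, 2),
    "ALC|TOB": (3, 4),
    "TOB": (1, 3),
    "TOB|ALC": (2, 4),
}

lr_tests = {}
for effect, (small, large) in comparisons.items():
    chi2 = deviance[small] - deviance[large]
    lr_tests[effect] = (chi2, chi2_sf_3df(chi2))
    print(effect, round(chi2, 1), lr_tests[effect][1])
```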
17.6.7
Fitting
AGE (df=6)
fit (df=82), $G^2 = 246.9$ (p < 0.0001): too simple
  ↙ ALC effect                                          ↘ TOB effect
AGE (df=6), ALC (df=3)                                  AGE (df=6), TOB (df=3)
fit (df=79), $G^2 = 105.9$ (p = 0.0234): too simple     fit (df=79), $G^2 = 210.3$ (p < 0.0001): too simple
  ↘ TOB|ALC effect                                      ↙ ALC|TOB effect
AGE (df=6), ALC (df=3), TOB (df=3)
fit (df=76), $G^2 = 82.3$ (p = 0.2907): appropriate
17.6.8
Comparison
Effect     Meaning                                                $\chi^2$   df   p          Conclusion
ALC        Alcohol effect                                         141.0      3    < 0.0001   strong effect
TOB|ALC    The effect of tobacco, after correction for alcohol    23.6       3    < 0.0001   strong effect
TOB        Tobacco effect                                         36.6       3    < 0.0001   strong effect
ALC|TOB    The effect of alcohol, after correction for tobacco    128.0      3    < 0.0001   strong effect
17.6.9
Model 2
• Estimated coefficients for Model 2:
k   Group           $\exp(\beta_k)$
1   0–39 g/day      1.0
2   40–79 g/day     4.2
3   80–119 g/day    7.4
4   120+ g/day      39.7
17.6.10
Model 4
• Model 4 is sufficient ($\chi^2_{76} = 82.34$, p = 0.2907)
• Coefficients:
Alcohol                                   Tobacco
k   Group           $\exp(\beta_k)$       k   Group          $\exp(\beta_k)$
1   0–39 g/day      exp(0)                1   0–9 g/day      exp(0)
2   40–79 g/day     exp(1.44)             2   10–19 g/day    exp(0.44)
3   80–119 g/day    exp(1.98)             3   20–29 g/day    exp(0.51)
4   120+ g/day      exp(3.60)             4   30+ g/day      exp(1.64)
17.6.11
Questions
Model 4
• What is the RR of exposure to ALC (80–119 g/day)?
• What is the RR of exposure to TOB (10–19 g/day)?
• What is the RR of joint exposure to
. TOB (10–19 g/day)
. ALC (80–119 g/day)
What assumptions are needed?
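One way to approach these questions is sketched below, using the Model 4 coefficients of Section 17.6.10. The calculation assumes that Model 4 holds, that is, no ALC*TOB interaction (effects multiply on the relative-risk scale), the same relative risks in every age stratum, and the lowest category of each factor as reference. The function name `rr` is illustrative.

```python
import math

# Model 4 coefficients by level (level 1 is the reference category).
alc_beta = {1: 0.0, 2: 1.44, 3: 1.98, 4: 3.60}
tob_beta = {1: 0.0, 2: 0.44, 3: 0.51, 4: 1.64}

def rr(alc_level, tob_level):
    """Relative risk versus the (ALC level 1, TOB level 1) reference group."""
    return math.exp(alc_beta[alc_level] + tob_beta[tob_level])

rr_alc = rr(3, 1)    # ALC 80-119 g/day, TOB at reference level
rr_tob = rr(1, 2)    # TOB 10-19 g/day, ALC at reference level
rr_joint = rr(3, 2)  # both exposures together

print(round(rr_alc, 2), round(rr_tob, 2), round(rr_joint, 2))
```

Under these assumptions the joint relative risk is simply the product of the two separate ones; with an interaction term in the model this would no longer hold.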