Introduction to Scientific Research (Initiatie Wetenschappelijk Onderzoek): Biostatistics, Year 1
Geert Molenberghs, with the collaboration of Geert Verbeke
Interuniversity Institute for Biostatistics and statistical Bioinformatics (I-BioStat)
KU Leuven (& UHasselt), Belgium
www.ibiostat.be & www.kuleuven.ac.be/biostat/

Contents

1 Some References
I Fundamental Concepts
2 Introductory material
3 What is statistics?
4 Summary statistics
5 Confidence intervals & hypothesis testing
6 Use and misuse of statistics
7 Data Structures and Types
II Contingency Tables
8 Contingency Tables
III t Test
9 Comparing Groups with Continuous Outcomes: the t-test
IV Linear Regression
10 Introduction
11 Simple (Single) Linear Regression
12 Model Diagnostics
13 Influential Observations
V Analysis of Variance
14 1-way ANOVA
VI Logistic Regression
15 Logistic Regression
16 Use of Logistic Regression
17 Case Study: Ille-et-Vilaine

Chapter 1 Some References

General
• Bailar, J.C. and Mosteller, F. (1992) Medical Uses of Statistics. Boston: NEJM Books.
• Chatterjee, S., Handcock, M.S., and Simonoff, J.S. (1995) A Casebook for a First Course in Statistics and Data Analysis. New York: John Wiley.
• Dunn, G. and Everitt, B. (1995) Clinical Biostatistics. London: Arnold.
• Everitt, B. and Dunn, G. (1998) Statistical Analysis of Medical Data. London: Arnold.
• Hill, A.B. (1977) A Short Textbook of Medical Statistics. 10th ed. Philadelphia: J.B. Lippincott Co.
• Pagano, M. and Gauvreau, K. (1993) Principles of Biostatistics. Belmont: Duxbury Press.
• Rosner, B. (1995) Fundamentals of Biostatistics. Belmont: Duxbury Press.

Part I Fundamental Concepts

Chapter 2 Introductory material
  . Motivation
  . Course material

2.1 Motivation
• Statistics in the (bio-)medical literature
• Correct analysis of collected data
• Correct interpretation of results

2.2 Course material
• Copies of the course notes
• Papers from the (bio-)medical literature
• Vestac Java applets
  . Online: http://ucs.kuleuven.be/links/index.htm
  . Local installation: http://ucs.kuleuven.be/java/download/download.html and follow the instructions

Chapter 3 What is statistics?
  . Example
  . Population – sample
  . Random variability

3.1 Example: Captopril data
• 15 patients with hypertension
• The response of interest is the supine blood pressure (BP), before and after treatment with Captopril
• Research question: How does treatment affect BP?
• Dataset 'Captopril' (SBP = systolic BP, DBP = diastolic BP, in mm Hg):

            Before        After
  Patient   SBP   DBP     SBP   DBP
   1        210   130     201   125
   2        169   122     165   121
   3        187   124     166   121
   4        160   104     157   106
   5        167   112     147   101
   6        176   101     145    85
   7        185   121     168    98
   8        206   124     180   105
   9        173   115     147   103
  10        146   102     136    98
  11        174    98     151    90
  12        201   119     168    98
  13        198   106     179   110
  14        148   107     129   103
  15        154   100     131    82

  Average (mm Hg)
  Diastolic before: 112.3    Diastolic after: 103.1
  Systolic before:  176.9    Systolic after:  158.0

• It would be of interest to know how likely the observed changes in BP are to occur by pure chance.
• If this is very unlikely, the above data provide evidence that BP indeed decreases after treatment with Captopril. Otherwise, the above data do not provide evidence for the efficacy of Captopril.
• Obviously, we are not interested in drawing conclusions about the 15 observed patients only.
• Instead, we would like to draw conclusions about the effect of Captopril on the total population of all hypertensive patients.
• Conclusion: Statistics aims at drawing conclusions about some population, based on what has been observed in a random sample

[Figure: a random sample is drawn from the population; statistics links the effect of Captopril in the 15 patients back to the effect of Captopril in the population.]

3.2 Population versus random sample
• Population: Hypothetical group of current and future subjects, with a specific condition, about which conclusions are to be drawn
• Sample: Subgroup from the population on which observations will be taken
• In order for effects observed in the sample to be generalizable to the total population, the sample should be taken at random

3.3 The aim of statistics
• The aim of statistics is twofold:
  . Descriptive statistics: Summarizing and describing observed data such that the relevant aspects are made explicit.
  . Inferential statistics: Studying to what extent observed trends/effects can be generalized to a general (infinite) population
• Examples of descriptive statistics include tables, graphs, calculation of averages, ...
• Valid inferential statistics requires a strong link between the sample and the population about which one wishes to draw conclusions.
• Valid inferential statistics requires:
  . Correct statistical methodology
  . Correct interpretation of results

Chapter 4 Summary statistics
  . Introduction
  . Measures of location
  . Measures of spread
  . Percentages
  . Example from the biomedical literature

4.1 Introduction

[Figure: three distributions A, B and C. A and B have the same location but different spread; A and C have the same spread but different location.]

4.2 Measures of location
• Location measures: Where are the observations more or less located?
• As an example, consider the small sample: 1, 3, 3, 4, 5, 14
• Sample average (sample mean):
  $\bar{x} = \frac{x_1 + \ldots + x_n}{n} = \frac{1 + 3 + 3 + 4 + 5 + 14}{6} = 5$
• The sample median is the middle observation. With an even number of observations, it is the average of the two middle ones:
  1, 3, 3, 4, 5, 14 → $\frac{3 + 4}{2} = 3.5$
• The sample mode is the value that was observed the most often:
  1, 3, 3, 4, 5, 14 → 3
• Note that the sample average is very sensitive to outliers:
  1, 3, 3, 4, 5, 14 → 5
  1, 3, 3, 4, 5, 20 → 6
  1, 3, 3, 4, 5, 26 → 7
• This is not the case with the sample median:
  1, 3, 3, 4, 5, 14 → 3.5
  1, 3, 3, 4, 5, 20 → 3.5
  1, 3, 3, 4, 5, 26 → 3.5
• The mode is not always informative:
  [Figure: a distribution with its mode marked.]
• For symmetric data, the average and the median are the same. In general, they are not:
  [Figure: a symmetric distribution, where median = mean, next to a skewed distribution, where the median and the mean differ.]
• With skewed data, the mean can be heavily influenced by the chance presence of one or a few extreme observations.
• In order to still get a good idea about the location of the data, one then prefers the median over the mean:
  Symmetric data ⟹ Mean
  Skewed data ⟹ Median

4.3 Measures of spread
• Obviously, a measure of location only summarizes one specific aspect of the observed data:
  "Statistician drowning in a lake of average depth 0.5m"
• Measures of spread: How similar are the observations?
  [Figure: two possible configurations of the observations x1, ..., xn relative to the average $\bar{x}$.]
• As an example, re-consider the small sample: 1, 3, 3, 4, 5, 14
• Mean deviation from the mean:
  $\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}) = \frac{-4 - 2 - 2 - 1 + 0 + 9}{6} = \frac{0}{6} = 0$
• Mean quadratic deviation from the mean:
  $\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{(-4)^2 + (-2)^2 + (-2)^2 + (-1)^2 + 0^2 + 9^2}{6} = \frac{106}{6} = 17.67$
• Sample variance:
  $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{(-4)^2 + (-2)^2 + (-2)^2 + (-1)^2 + 0^2 + 9^2}{5} = \frac{106}{5} = 21.2$
• Note that the units of the sample variance and the mean quadratic deviation are the squared units of the original observations
• The sample standard deviation is in the same units as the original observations:
  $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{21.2} = 4.60$
• Sample range:
  $R = \max_i x_i - \min_i x_i = 14 - 1 = 13$
• Note that the range strongly depends on the sample size n: larger samples are more likely to contain extreme observations, hence are more likely to have a larger range
• Since we hope that our measure of spread reflects the amount of variation in the population, we prefer a measure that does not depend on the sample size.
• The sample interquartile range (IQR) is the range obtained after deletion of the 25% highest and 25% lowest values in the sample (number of deleted values rounded down if needed):
  1, 3, 3, 4, 5, 14 → 3, 3, 4, 5 → IQR = 5 − 3 = 2
• The interquartile range does not systematically grow with the sample size n, since a larger number of observations is deleted in larger samples.
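The summary measures above can be checked numerically. The following is a minimal Python sketch using only the standard library; the helper `trimmed_iqr` is a hypothetical name introduced here, and it implements the trimming definition of the IQR given in these notes rather than any particular software convention.

```python
import statistics

sample = [1, 3, 3, 4, 5, 14]

# Measures of location
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Measures of spread: the mean quadratic deviation divides by n,
# the sample variance divides by n - 1.
mean_sq_dev = statistics.pvariance(sample)
variance = statistics.variance(sample)
std_dev = statistics.stdev(sample)
sample_range = max(sample) - min(sample)


def trimmed_iqr(values):
    """IQR as defined in these notes (hypothetical helper): drop the 25%
    lowest and 25% highest values, with the count rounded down, and
    return the range of the remaining values."""
    ordered = sorted(values)
    k = len(ordered) // 4  # floor(0.25 * n) values removed on each side
    kept = ordered[k:len(ordered) - k]
    return kept[-1] - kept[0]


iqr = trimmed_iqr(sample)

print(f"mean = {mean}, median = {median}, mode = {mode}")
print(f"mean quadratic deviation = {mean_sq_dev:.2f}")
print(f"variance = {variance:.1f}, standard deviation = {std_dev:.2f}")
print(f"range = {sample_range}, IQR = {iqr}")
```

Statistical software often computes quartiles by interpolation, so the IQR it reports may differ from the value obtained with the trimming definition used here.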
• The variance (hence also the mean quadratic deviation and the standard deviation) and the range are very sensitive to outliers:
  1, 3, 3, 4, 5, 14 → s² = 21.2, R = 13
  1, 3, 3, 4, 5, 20 → s² = 48.8, R = 19
  1, 3, 3, 4, 5, 26 → s² = 88.4, R = 25
• This is not the case with the interquartile range:
  1, 3, 3, 4, 5, 14 → IQR = 2
  1, 3, 3, 4, 5, 20 → IQR = 2
  1, 3, 3, 4, 5, 26 → IQR = 2
• With skewed data, the standard deviation can be heavily influenced by the chance presence of one or a few extreme observations.
• In order to still get a good idea about the variation in the data, one then prefers the interquartile range over the standard deviation:
  Symmetric data ⟹ Standard deviation
  Skewed data ⟹ IQR

4.4 Percentages
• Traditionally, measurements are summarized by a measure of location and a measure of spread
• However, suppose the variable of interest is 'sickness absence'
• For each subject i in the sample, we define x_i as:
  $x_i = \begin{cases} 1 & \text{if subject } i \text{ was absent due to illness} \\ 0 & \text{otherwise} \end{cases}$
• The sample average equals
  $\bar{x} = \frac{x_1 + x_2 + \ldots + x_n}{n} = \frac{\text{Number of people with sickness absence}}{n}$
• Hence, the average equals the observed proportion (percentage) of people with sickness absence
• Note that, once the average is known, the number of zeroes and ones is known, hence also the variability:
  [Figure: three samples of six 0/1 observations x1, ..., x6, with $\bar{x}$ = 3/6 = 0.5, $\bar{x}$ = 1/6 ≈ 0.17 and $\bar{x}$ = 5/6 ≈ 0.83.]
• One can show that the variance is obtained as
  $s^2 = \frac{n}{n-1} \, \bar{x} (1 - \bar{x})$
• Since the variance follows directly from the average, only the average is reported, without a measure of spread
• For example, the variable 'sickness absence' could be summarized as follows:

  Variable     (n = 256)
  Sickness:    Yes   103 (40.23%)
               No    153 (59.77%)

4.5 Example from the biomedical literature
Wong et al., Table 1 (first part):
  . Means and standard deviations
  . Medians and IQRs
  . Percentages

Chapter 5 Confidence intervals & hypothesis testing
  . Random variability
  . Confidence intervals
  . Interpretation of confidence intervals
  . Hypothesis testing
  . Hypothesis testing versus confidence intervals
  . Examples from biomedical literature

5.1 Random variability
• Descriptive statistics of the observed differences in diastolic BP, after treatment with Captopril, in 15 subjects:

  Patient   Before DBP   After DBP   Change
   1        130          125           5
   2        122          121           1
   3        124          121           3
   4        104          106          −2
   5        112          101          11
   6        101           85          16
   7        121           98          23
   8        124          105          19
   9        115          103          12
  10        102           98           4
  11         98           90           8
  12        119           98          21
  13        106          110          −4
  14        107          103           4
  15        100           82          18

• Note that not all subjects experience the same benefit from the treatment
• An average decrease of 9.27 mmHg is observed in our sample
• A new, similar experiment would lead to another sample, hence to another observed change in BP:
  . More reduction (11.57 mmHg)?
  . Less reduction (4.78 mmHg)?
  . No change (0.00 mmHg)?
  . Increase (−5.23 mmHg)?
• This shows that the observed decrease of 9.27 mmHg should not be overinterpreted
• This also shows that one should not expect 9.27 mmHg to be exactly the reduction in BP one would observe if the total population were treated with Captopril.
• Let µ be the average change in BP one would observe if the total population were treated
• 9.27 mmHg can then be interpreted as an estimate for µ, based on our sample
• Question: Is our observed change of 9.27 mmHg sufficient evidence to conclude that the treatment really affects the BP?
• Answer: Confidence intervals & hypothesis testing

[Figure: a random sample is drawn from the population; statistics links the observed effect of 9.27 mmHg in 15 randomly selected patients back to the question "Is µ different from 0?" in the population.]

5.2 The confidence interval
• The estimate 9.27 mmHg for µ is based on this particular sample
• Repeating the experiment would lead to a different estimate for µ
• Hence, we should not expect µ to be exactly equal to 9.27 mmHg
• A confidence interval is an interval around 9.27 mmHg which is likely to contain the unknown population average µ
• For example, a 95% confidence interval for µ, centred at the estimate 9.27: [4.91; 13.63]
• The percentage 95% is called the confidence level
• Confidence intervals for other confidence levels:

  Level   Confidence interval
  35%     [8.27; 10.27]
  63%     [7.27; 11.27]
  82%     [6.27; 12.27]
  95%     [4.91; 13.63]
  99%     [3.02; 15.52]

• In biomedical sciences, one traditionally uses 95% confidence levels
• Ideally, C.I.'s are narrow, as this reflects a very precise estimation of the unknown population parameter µ
• The length of the C.I. increases with the confidence level:

  Level   Confidence interval
  95%     [4.91; 13.63]
  99%     [3.02; 15.52]

• Intuitively: larger intervals are more likely to contain the unknown population parameter µ
• The length of the C.I. decreases with the sample size n
• Intuitively: more observations lead to more precision. One can 'buy' extra precision with extra observations
• What about 100% C.I.'s?
• The 100% C.I. for µ equals [−∞; +∞], which is not informative at all
• Intuitively: Absolute certainty about population characteristics cannot be attained based on a finite sample of observations

5.3 Interpretation of the confidence interval
• Let us focus on the 95% confidence interval. For other confidence levels, the interpretation is similar.
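The 95% interval [4.91; 13.63] quoted above for the Captopril data can be reproduced with a short Python sketch. The notes do not state which formula was used; the sketch assumes the normal approximation, estimate ± z × standard error with z ≈ 1.96, because that choice matches the quoted numbers. A t-based interval with 14 degrees of freedom would be somewhat wider. The function name `normal_ci` is hypothetical.

```python
import math
import statistics

# Diastolic BP before and after treatment, taken from the table in Section 5.1
before_dbp = [130, 122, 124, 104, 112, 101, 121, 124, 115, 102, 98, 119, 106, 107, 100]
after_dbp = [125, 121, 121, 106, 101, 85, 98, 105, 103, 98, 90, 98, 110, 103, 82]
change = [b - a for b, a in zip(before_dbp, after_dbp)]

n = len(change)
mean_change = statistics.mean(change)                # estimate of mu
std_error = statistics.stdev(change) / math.sqrt(n)  # standard error of the mean


def normal_ci(level):
    """Confidence interval for mu under the normal approximation
    (an assumption; the notes do not name the method)."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean_change - z * std_error, mean_change + z * std_error


lower, upper = normal_ci(0.95)
print(f"mean change = {mean_change:.2f} mmHg")
print(f"95% C.I. = [{lower:.2f}; {upper:.2f}]")
```

Calling `normal_ci` with a higher level, for example `normal_ci(0.99)`, returns a wider interval, in line with the statement that the length of the C.I. increases with the confidence level.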
• For a specific data set, such as the Captopril data, the obtained confidence interval [4.91; 13.63] may or may not contain µ.
• However, it is very likely to contain µ, since in the long run only 5 out of 100 data sets would lead to an interval not containing µ.
• Illustration: Vestac Java Applet → statistical tests → confidence interval for mean

5.4 Hypothesis testing
• As before, µ is the average change in diastolic BP one would observe if the total population of hypertensive patients were treated with Captopril.
• Note that µ will never be known, but we can use our sample to learn about µ.
• If the treatment had no effect, the average µ would be zero.
• So, if one can show that there is (strong) evidence that µ ≠ 0, then this can be considered as evidence for a treatment effect.
• Based on our sample of 15 observations, we estimated µ by $\hat{\mu}$ = 9.27 mmHg.
• Obviously, this estimate is relatively far away from 0, suggesting that the treatment might affect BP
• On the other hand, the observed effect $\hat{\mu}$ = 9.27 could have occurred by pure chance, even if there were no treatment effect at all.
• Question: How likely would that be?
• Only if this would be very unlikely to happen will the observed data be considered sufficient evidence for some effect of the treatment
• The procedure to decide whether there is sufficient evidence to believe the treatment did affect BP is called a test of hypothesis
• In practice, the research question is formulated in terms of a null hypothesis H0 and an alternative hypothesis HA:
  H0 : µ = 0 versus HA : µ ≠ 0
• Based on our observed data, we will investigate whether H0 can be rejected in favour of HA
• If not, the null hypothesis H0 is accepted (that is, not rejected) and one concludes that there is no evidence that the treatment was effective
• Intuitively, it is obvious that H0 : µ = 0 will be rejected if the observed sample average $\hat{\mu}$ is too far away from 0
• Question: How far is too far?
• Answers:
  If this result is very unlikely to happen by pure chance
  If this result is not at all what you expect to see if µ were 0
• One can calculate that, if Captopril had no effect at all, there is only a 0.1% chance of observing a sample with an average change in BP at least as extreme as 9.27 mmHg.
• Hence, if Captopril had no effect (i.e., if µ = 0), then it would be very unlikely to observe a sample with an average as extreme as 9.27. This would happen only once every 1000 times a similar experiment is performed.
• We therefore consider the data observed in our experiment sufficient evidence to reject the null hypothesis and we conclude that the treatment effect is significantly different from 0, or equivalently, that there is a significant treatment effect
• The probability 0.1%, which expresses how extreme our observations are if the null hypothesis were true, is denoted by p and is called the p-value.
• A small p-value indicates that the observed results would be extreme if H0 were true. One then rejects the null hypothesis
• A large p-value indicates that the observed results are in line with what can be expected if H0 is true. One then does not reject the null hypothesis, which in these notes is referred to as accepting the null hypothesis
• In practice, one has to decide how small p should get before the null hypothesis is rejected.
• One therefore specifies the so-called level of significance α:
  p < α ⟹ reject H0
  p ≥ α ⟹ accept H0
• α is typically a small value, such as 0.01, 0.05, 0.10
• In biomedical sciences α = 0.05 = 5% is standard.
• One then rejects the null hypothesis as soon as a result at least as extreme as the observed one would happen fewer than 5 times in 100 experiments, assuming that the null hypothesis is correct

5.5 Hypothesis testing versus confidence intervals
• For the Captopril data, we have drawn conclusions about the average treatment effect in the population, through 2 different statistical procedures:
  . 95% confidence interval: [4.91; 13.63]
  . Significance of treatment effect, p = 0.001
• We know from the C.I. that the average treatment effect is likely to be between 4.91 and 13.63, excluding 0
• The significance test has rejected the value 0 as possible value for µ
• So, both procedures agree
• Question: Do both procedures always agree?
• Answer: Yes, provided the levels of significance and confidence are complementary to each other:

  Level of significance α         0.05   0.10   0.01
  Confidence level (1 − α)100%    95%    90%    99%

• In case of accepting H0 (p ≥ α = 0.05):
  [Figure: the 95% C.I. around $\bar{x}$ contains the value specified by H0.]
• In case of rejecting H0 (p < α = 0.05):
  [Figure: the 95% C.I. around $\bar{x}$ does not contain the value specified by H0.]
• An alternative interpretation for the C.I. follows immediately: A 95% C.I. is the collection of all null hypotheses that would be accepted in a statistical test at the 5% level of significance
• Statistical tests are to some extent equivalent to C.I.'s
• However, C.I.'s have the advantage of giving an indication of the effect size (treatment estimate $\hat{\mu}$), as well as of the precision of estimation (width of C.I.)
• So, C.I.'s should be preferred over statistical tests (↔ biomedical literature)

5.6 Example from the biomedical literature
Wong et al.
  . Section on statistical methodology:
    . Two-sided tests
    . 5% level of significance
  . Table 2:
    . C.I.'s for differences between means and medians
    . Corresponding tests for significance

Chapter 6 Use and misuse of statistics
  . Errors in statistics
  . Two types of errors
  . Multiple testing
  . Equivalence tests
  . Significance versus relevance

6.1 Possible errors in decision making
• In our example about the Captopril treatment, we obtained p = 0.001, leading to the rejection of the null hypothesis of no treatment effect.
• This should not be considered as formal proof that there is a treatment effect
• Even if the treatment has no effect at all, a sample like ours would occur once every 1000 times.
• Maybe our sample was indeed the extreme one that happens once every thousand experiments.
• Alternatively, suppose we had obtained p = 0.9812. We then would not have rejected the null hypothesis, and would have concluded that there is no evidence for any treatment effect.
• This should not have been considered as formal proof that any treatment effect is absent.
• Maybe the treatment effect µ is not 0, but very close to 0. The data one then would observe would look very similar to data that would be observed if µ = 0, such that the data do not allow one to detect that µ ≠ 0
• Conclusion: "Statistics can prove everything"
• Intuitively: Absolute certainty about population characteristics cannot be attained based on a finite sample of observations

6.2 Two types of errors

                 Reality
  Test result    H0 correct      H0 not correct
  Accept H0      No error        Type II error
  Reject H0      Type I error    No error

• Type I error: H0 is incorrectly rejected
• Type II error: H0 is incorrectly accepted
• The probability of making a type I error, when H0 is true, equals the level of significance α, specified by the user.
• In biomedical sciences α = 5% is often used, thereby allowing a type I error in 5% of the cases.
• For a fixed α level, the probability of making a type II error can only be controlled by taking a sufficiently large sample.
• This calls for sample size calculations or, equivalently, power calculations
• The power of a test is the probability of correctly rejecting H0, i.e., 1 minus the probability of making a type II error.

6.3 Multiple testing
• Each time a test is performed on a true null hypothesis, there is a probability α of making a type I error
• For example, if α = 0.05, we can expect to incorrectly reject a true null hypothesis in 5 out of 100 tests.
• Implication: "The more tests one performs, the higher the probability that something is detected by pure chance"
• This problem of multiple testing occurs very frequently in biomedical sciences, in various settings

6.3.1 Example: A classroom experiment
• On entry in the classroom, assign each student at random to be seated at the left or at the right side of the classroom
• Compare both sides with respect to 100 aspects including weight, height, age, gender, color of hair, color of eyes, ...
• It is to be expected that for roughly 5 of these outcomes, a significant difference is obtained at the 5% level of significance, by pure chance.

6.3.2 Example: Testing many relations
• Amin et al., Table 2:
  . 18 tests performed
  . 2 significant results

6.3.3 Example: Subgroup analyses
• Kaplan et al., Table 5:
  . Tests based on C.I.'s for odds ratios
  . C.I. containing 1 is equivalent to a non-significant test result
  . 21 × 3 = 63 tests performed
  . 5 significant results

6.3.4 Example: Searching for the most significant results
• This 'scientific finding' was printed in the Belgian newspapers:
  [Figure: newspaper clipping.]
• It was even stated that those who wake up before 7.21am have a statistically significantly higher stress level during the day than those who wake up after 7.21am.
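The effect of performing many tests can be quantified with a short Python sketch. It assumes that all null hypotheses are true and that the tests are independent, which is a simplification (the 100 classroom characteristics, for instance, are likely to be correlated). Under these assumptions the expected number of false positives among k tests is kα, and the probability of at least one false positive is 1 − (1 − α)^k. The function names are hypothetical; the values 18 and 63 are the numbers of tests in the two literature examples above.

```python
import random

ALPHA = 0.05


def prob_at_least_one_false_positive(k, alpha=ALPHA):
    """P(at least one type I error) among k independent tests of true nulls."""
    return 1 - (1 - alpha) ** k


for k in (1, 18, 63, 100):
    expected = k * ALPHA
    prob = prob_at_least_one_false_positive(k)
    print(f"{k:3d} tests: expected false positives = {expected:.2f}, "
          f"P(at least one) = {prob:.3f}")


def simulate_classroom(n_tests=100, n_classes=2000, seed=1):
    """Average number of 'significant' results per classroom when no real
    difference exists. Under H0, each test is assumed to reject with
    probability ALPHA, independently of the other tests."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_classes):
        total += sum(rng.random() < ALPHA for _ in range(n_tests))
    return total / n_classes


print(f"simulated average number of significant results: {simulate_classroom():.2f}")
```

The simulation does not model real student data; it only mimics the long-run behaviour of 100 tests at the 5% level when nothing is going on.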
6.3.5 Conclusion

• Significant results obtained by multiple testing are often overinterpreted
• If the number of tests is reported, the reader knows that such results need to be interpreted with extreme care
• The problem arises when only the significant results are reported, and one does not know how many tests were performed in total
• This leads to reporting results which turn out not to be reproducible
• For example, a new study would not find that students seated on the left are taller than those on the right. Instead, students seated on the left may weigh more than those seated on the right.
• For example, a new experiment might show no difference in stress levels between subjects waking up early and those waking up late. Or maybe a difference would be found only when waking up is later than 8.12am.

6.4 Equivalence tests

• Suppose two groups A and B are to be compared, and a test is used to test H0 : µA = µB versus HA : µA ≠ µB
• In case of a non-significant test result, one often concludes that both groups are identical or equivalent
• An alternative interpretation is that the experiment did not have sufficient power to show an effect which is present.
• Conclusion: Non-significance should not be interpreted as equivalence
• This can also be seen from the fact that, if the test could be used to show equivalence, it would be best to collect data on (extremely) small samples, as this would increase the chance of obtaining a non-significant result, due to lack of power.
• Instead, one should reverse H0 and HA: H0 : |µA − µB| > ∆ versus HA : |µA − µB| ≤ ∆, where ∆ is a pre-specified constant, defining ‘equivalence’
• Obviously, the result of the equivalence test entirely depends on the choice of ∆
• Therefore, ∆ needs to be specified prior to the data collection

6.5 Example from the biomedical literature

Shatari et al.:
  . Title: [shown as an image in the slides]
  . Table 1: No significant differences!
  . Results and conclusions (abstract): [shown as an image in the slides]

6.6 Significance versus relevance

• We discussed before that the power to detect some effect ∆ increases with the sample size
• This implies that any effect ∆, no matter how small, will, sooner or later, be detected, if the sample is sufficiently large.
• For example, consider the Captopril data, where the observed difference of 9.27 mmHg was found significantly different from zero (p < 0.001), based on data from 15 patients only.
• Suppose that the observed difference had been only 0.1 mmHg.
• A p-value as small as 0.001 could still be obtained, provided that the sample were sufficiently large.
• Obviously, an average change in BP as small as 0.1 mmHg is not relevant from a clinical point of view.
• Conclusion: Statistical significance ≠ Clinical relevance
• The p-value cannot distinguish between both situations
• It is therefore important not to blindly overinterpret significant results without knowing the size of the effect

Chapter 7
Data Structures and Types

• Levels of complexity
• Multivariate analysis
• Longitudinal data
• Clustered data

7.1 Levels of Complexity

7.1.1 One-Sample Problem

• The simplest statistical analysis is concerned with a single outcome variable, recorded for a sample from a homogeneous population: Yi, i = 1, . . . , N
• Standard procedures include:
  . the computation of means or medians (location parameters)
  . the computation of standard deviations or interquartile ranges (dispersion parameters).
  . For example, the height of a number of human subjects might be recorded.

7.1.2 Two-Sample Problem

• A first level of complexity arises when a variable is recorded for samples from two subgroups (subpopulations) of a larger population (treated and untreated patients, two species, boys and girls): the two-sample problem: Ygi, g = 1, 2, i = 1, . . . , N
• A question of interest is whether the means are different in the two populations.
• The outcome variable might still be height, but we would have an explanatory variable: treatment allocation, or sex. For example, the height of boys can be compared to the height of girls.
• The outcome variable is often called the dependent variable.
• The predictor is often called the covariate or independent variable.
• The statistical tools for this data setting include Analysis of Variance (ANOVA), the t test, and the Wilcoxon test.
• In the previous situation, the independent variable had only two levels: a binary or dichotomous variable. This is the simplest case.
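To make the two-sample setting concrete, the following sketch computes the classical pooled-variance t statistic for comparing two group means. It is only an illustration: the heights are invented, the function name is hypothetical, and the equal-variance assumption is ours rather than something stated in the slides.

```python
import math
import statistics


def pooled_t_statistic(x, y):
    """Two-sample t statistic, assuming equal variances in both groups.

    Returns the statistic and its degrees of freedom (n1 + n2 - 2).
    """
    n1, n2 = len(x), len(y)
    mean1, mean2 = statistics.mean(x), statistics.mean(y)
    var1, var2 = statistics.variance(x), statistics.variance(y)  # sample variances
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2


# Hypothetical heights in cm, invented purely for illustration.
boys = [142.0, 138.5, 145.0, 140.0, 147.5, 139.0]
girls = [139.5, 141.0, 136.0, 143.5, 138.0, 137.0]

t, df = pooled_t_statistic(boys, girls)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

The statistic would then be compared with a t distribution on n1 + n2 − 2 degrees of freedom; with a binary explanatory variable this is equivalent to a one-way ANOVA with two groups.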
7.1.3 Regression

• Alternatively, the predictor itself could be a variable with several levels.
• Examples: dose administered in a clinical trial; one of several species of a plant; race; ... In addition, it could have an infinite number of levels, just as is the case with height.
• For example, a baseline height at 7 years of age can be related to the height at 10 years.
• This leads to a family of models that is frequently referred to as regression models:
  Yi = β0 + β1 xi + εi, i = 1, . . . , N
• When the dependent variable Yi is continuous (height) one often uses linear regression.
• The independent variable xi can be continuous, binary, categorical, or discrete.
• The choice of the statistical analysis method is driven by the outcome or dependent variables, rather than by the predictor variables.
• Should the dependent variable be binary (diseased/non-diseased; dead/alive; ...), then one would choose logistic regression rather than linear regression.

7.1.4 Several Predictors

• Up to now, there was only one predictor variable. However, this need not be the case.
• For instance, both treatment allocation and sex of the human subject might be of interest.
• Most of the well-known methods extend easily.
• One-way ANOVA extends to two-way or even multi-way ANOVA.
• Simple (or single) linear regression extends to multiple regression.
• Most other techniques, such as logistic regression, are easily extended to encompass multiple covariates.
• It has to be noted that, while simple in theory, methods for multiple covariates require great care.
  . Indeed, issues such as collinearity arise only for multiple covariate models.
  . Often, not all predictors are on equal footing.
  . Often, the relation between an EXPOSURE and a DISEASE is of interest, while another variable is merely a CONFOUNDER.

            Confounder
            ↙        ↘
     Exposure   −→   Disease

• Thus, model building and interpretation of (regression) coefficients require both statistical expertise and subject matter knowledge.

7.1.5 Several Outcome Variables

• The final extension is concerned with the fact that sometimes several dependent variables are recorded and studied simultaneously.
• In statistics, this is commonly termed multivariate analysis, in contrast to multiple ...
• The medical and epidemiological literature uses “multivariate” when the statistician would talk about “multiple”. Danger of confusion!
• In conclusion: multiple refers to several independent variables; multivariate refers to several dependent variables.

7.2 Multivariate Analysis

• Multivariate analysis refers to a set of techniques which allow for the presence of more than one outcome variable.
• For example, height and weight might be recorded simultaneously for a group of boys and girls. Arguably, sex will influence height as well as weight. At the same time, height and weight are likely to be correlated or associated.
• Remarks:
  . Association refers to the concept of dependence between two or more variables.
  . In contrast, correlation refers to a family of measures that can be computed to capture association (Pearson & Spearman correlation).
  . Especially for categorical data, a million measures of association have been proposed as alternatives to the correlation (including the odds ratio, concordance, Kendall’s τ, the κ coefficient, ...).

7.3 General Multivariate Setting

• In general, one might have:
  . a set of dependent variables, some of which are continuous, discrete, categorical, binary, ...
  . a set of independent variables, some of which are continuous, discrete, categorical, binary, ...
• The most general setting is very hard to study. During the last century, a multitude of sub-problems of the general problem have been studied.

7.4 Other Correlated Data Settings

• The multivariate setting is but one of the correlated data settings.
• In all of the following situations, there are “many” outcome variables:
  . repeated measures
  . longitudinal data
  . spatial data
  . clustered data
• Are they related to multivariate analysis? Or special cases?
• The answer is: Yes and No!

7.4.1 Example Situations

1 For each subject in a study, height is recorded.
2 For each family in a study, the height of all sibs is recorded.
3 For each subject in a study, height and weight are recorded.

• Example 1 is clearly univariate and example 3 is clearly multivariate.
• Example 2 is ambiguous:
  . There is only one outcome variable: height.
  . Each unit (family) yields several values.

7.5 Longitudinal Data: The Vorozole Study

• open-label study in 67 North American centers
• postmenopausal women with metastatic breast cancer
• 452 patients, followed until disease progression/death
• two groups: vorozole 2.5 mg × 1 ←→ megestrol acetate 40 mg × 4
• several outcomes: response rate, survival, safety, ...
• focus: quality of life: total Functional Living Index: Cancer (FLIC); a higher score is more desirable

7.6 The Depression Trial

• Clinical trial: experimental drug versus standard drug
• 170 patients
• Response: change versus baseline in HAMD17 score
• 5 post-baseline visits: 4–8

[Figure: two panels plotting Change against Visit (4 to 8) for Standard Drug and Experimental Drug]

7.7 Age-related Macular Degeneration Trial

• Pharmacological Therapy for Macular Degeneration Study Group (1997)
• An ocular pressure disease which makes patients progressively lose vision
• 240 patients enrolled in a multi-center trial (190 completers)
• Treatment: Interferon-α (6 million units) versus placebo
• Visits: baseline and follow-up at 4, 12, 24, and 52 weeks
• Continuous outcome: visual acuity: # letters correctly read on a vision chart
• Binary outcome: visual acuity versus baseline ≥ 0 or ≤ 0
• Missingness (O = observed, M = missing):

                               Measurement occasion
                             4 wks   12 wks   24 wks   52 wks   Number       %
Completers                     O       O        O        O        188    78.33
Dropouts                       O       O        O        M         24    10.00
                               O       O        M        M          8     3.33
                               O       M        M        M          6     2.50
                               M       M        M        M          6     2.50
Non-monotone missingness       O       O        M        O          4     1.67
                               O       M        M        O          1     0.42
                               M       O        O        O          2     0.83
                               M       O        M        M          1     0.42

7.8 The Analgesic Trial

• single-arm trial with 530 patients recruited (491 selected for analysis)
• analgesic treatment for pain caused by chronic nonmalignant disease
• treatment was to be administered for 12 months
• we will focus on Global Satisfaction Assessment (GSA)
• GSA scale goes from 1 = very good to 5 = very bad
• GSA was rated by each subject 4 times during the trial, at months 3, 6, 9, and 12.
• Research questions:
  . Evolution over time
  . Relation with baseline covariates: age, sex, duration of the pain, type of pain, disease progression, Pain Control Assessment (PCA), ...
  . Investigation of dropout
• Frequencies:

GSA      Month 3         Month 6         Month 9         Month 12
1         55  14.3%       38  12.6%       40  17.6%       30  13.5%
2        112  29.1%       84  27.8%       67  29.5%       66  29.6%
3        151  39.2%      115  38.1%       76  33.5%       97  43.5%
4         52  13.5%       51  16.9%       33  14.5%       27  12.1%
5         15   3.9%       14   4.6%       11   4.9%        3   1.4%
Tot      385             302             227             223

• Missingness (O = observed, M = missing):

                               Measurement occasion
                             Month 3   Month 6   Month 9   Month 12   Number       %
Completers                      O         O         O         O         163    41.2
Dropouts                        O         O         O         M          51    12.91
                                O         O         M         M          51    12.91
                                O         M         M         M          63    15.95
Non-monotone missingness        O         O         M         O          30     7.59
                                O         M         O         O           7     1.77
                                O         M         O         M           2     0.51
                                O         M         M         O          18     4.56
                                M         O         O         O           2     0.51
                                M         O         O         M           1     0.25
                                M         O         M         O           1     0.25
                                M         O         M         M           3     0.76

7.9 Schematic Representation

• Data structure:

X \ Y          Continuous     Binary            Count            Time-to-event
Binary         ANOVA          χ², Fisher        χ², ...          Kaplan-Meier
Continuous     lin. regr.     logistic regr.    Poisson regr.    Cox PH, ...

• Goal:
  . Estimation
  . Inference (s.e., confidence interval, hypothesis test)
• Paradigm:
  . Comparison:
    ∗ Experimental
    ∗ Observational
  . Representation

Part II
Contingency Tables

Chapter 8
Contingency Tables

  . Contingency tables
  . 2 × 2 tables
  . χ² test
  . Fisher’s exact test
  . Extensions
  . McNemar’s test

8.1 Contingency Tables

8.1.1 Preliminary Example 1

• For each experimental animal: two variables:
  . The experimental animal belongs to the control group or to the treated group.
  . The result of a laboratory test is positive (failure) or negative (success).
• Summarize the data in a 2 × 2 contingency table:

                     Response
Group            Failure   Success
Control             5         5
Experimental        5         5

• No difference between the groups.
• No statistical analysis necessary.

8.1.2 Preliminary Example 2

• Consider the table:

                     Response
Group            Failure   Success
Control            50         0
Experimental        0       500

• Difference is immediately clear.
• No statistical analysis necessary.

8.1.3 Preliminary Example 3

• Consider the table:

                     Response
Group            Failure   Success
Control             8         2
Experimental        5         5

• The decision between “significant difference” and “no significant difference” is not immediately clear.
• There clearly is a difference between the response rates in both groups:
  . 20% success in the control group.
  . 50% success in the experimental group.
• Is this due to:
  . random noise?
  . a systematic difference, related to treatment?

8.2 Example 1

Comparative morphologic study on the effects of calcium entry blockers against ischemic-hypoxic brain damage. Janssen Pharmaceutica Preclinical Research Report on R14950 No. 33, January 1985

• Through a combination of artery ligature and hypoxia, brain damage has been caused.
• Research goal: assess to what extent a certain medicinal class protects the experimental animals from damage.
  . 16 control animals
  . 8 animals treated with flunarizine.
  . Brain damage:
    ∗ In 15 out of 16 controls.
    ∗ For 2 out of 8 experimental animals.

                     Damage
Treatment         Yes    No    Total
Control            15     1      16
Flunarizine         2     6       8
Total              17     7      24

8.3 Statistical Question

“Is there a difference between both groups?”

• Null hypothesis:
  H0: The damage probability is equal between both groups.
  H0: There is no association between “Treatment” and “Damage.”
• Two aspects:
  . the testing problem we just described
  . the associated estimation problem:
    ∗ the difference between the damage probabilities in both groups
    ∗ a measure of association

8.4 χ² Test for Contingency Tables

• Advantage:
  . straightforward computations
  . easy to perform a continuity correction
• Disadvantage:
  . The normal approximation to the binary quantities’ distribution must hold.

8.5 Example 2

                     Group
Response          I     II    Total
Success          16     18      34
Failure           4     12      16
Total            20     30      50

• χ² test:

$$X^2 = \sum \frac{(O - E)^2}{E} = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_1$$

• Notation:
  . O: the observed cell counts
  . E: the expected cell counts
• For example 2:

                     Group
Response          I             II            Total
Success        O11 = 16      O12 = 18      O1+ = 34
Failure        O21 = 4       O22 = 12      O2+ = 16
Total          O+1 = 20      O+2 = 30      N = O++ = 50

• Computation of expected counts:
  . H0: “Group” and “Response” are independent.
  . H0: the success probability is the same in both groups.
• Under the null hypothesis, there is only 1 common probability:

$$p = \frac{16 + 18}{20 + 30} = \frac{34}{50} = 0.68$$

• Given this probability, how many successes do we expect?
  Group I : 20 × p = 20 × 0.68 = 13.6
  Group II : 30 × p = 30 × 0.68 = 20.4
• The number of failures:
  Group I : 20 × (1 − p) = 20 × 0.32 = 6.4
  Group II : 30 × (1 − p) = 30 × 0.32 = 9.6
• The number of successes and failures sums to the number of individuals within every group:
  13.6 + 6.4 = 20
  20.4 + 9.6 = 30
• Next to the observed (O) table, we also have an E table of expected counts:

                     Group
Response          I             II            Total
Success        O11 = 16      O12 = 18      O1+ = 34
Failure        O21 = 4       O22 = 12      O2+ = 16
Total          O+1 = 20      O+2 = 30      N = O++ = 50

                     Group
Response          I             II            Total
Success        E11 = 13.6    E12 = 20.4    E1+ = O1+ = 34
Failure        E21 = 6.4     E22 = 9.6     E2+ = O2+ = 16
Total          E+1 = O+1 = 20   E+2 = O+2 = 30   N = E++ = O++ = 50

• Observed (O) and expected (E) marginal counts are equal, for both “Group” and “Response”.
• There is a simpler method to calculate the expected values:

$$E_{ij} = \frac{O_{i+} O_{+j}}{N}$$

• Applied to our example:

                     Group
Response          I                      II                     Total
Success        13.6 = 34×20/50        20.4 = 34×30/50           34
Failure         6.4 = 16×20/50         9.6 = 16×30/50           16
Total            20                     30                      50

• We do not need to carry out all four computations in the above table:

First step:
                     Group
Response          I                      II                     Total
Success        13.6 = 34×20/50                                  34
Failure                                                         16
Total            20                     30                      50

Second step:
                     Group
Response          I                      II                     Total
Success        13.6                   34 − 13.6                 34
Failure        20 − 13.6                                        16
Total            20                     30                      50

Third step:
                     Group
Response          I                      II                     Total
Success        13.6                   20.4                      34
Failure         6.4                   9.6 = 30 − 20.4 = 16 − 6.4   16
Total            20                     30                      50

• Calculation of the χ² test statistic:

$$X^2 = \sum \frac{(O - E)^2}{E} = \frac{(16 - 13.6)^2}{13.6} + \frac{(18 - 20.4)^2}{20.4} + \frac{(4 - 6.4)^2}{6.4} + \frac{(12 - 9.6)^2}{9.6} = 2.206$$

• All numerators are equal to (2.4)²
• We can simplify the calculations:

$$X^2 = (2.4)^2 \left( \frac{1}{13.6} + \frac{1}{20.4} + \frac{1}{6.4} + \frac{1}{9.6} \right) = \Delta^2 \left( \frac{1}{E_{11}} + \frac{1}{E_{12}} + \frac{1}{E_{21}} + \frac{1}{E_{22}} \right)$$

8.5.1 Degrees-of-freedom

• Given the marginals:

                     Group
Response          I     II    Total
Success                         34
Failure                         16
Total            20     30      50

  the E table is fully determined.
• Given the marginals and one cell:

                     Group
Response          I     II    Total
Success          16             34
Failure                         16
Total            20     30      50

  the O table is fully known.
• Hence, the difference is 1 degree-of-freedom.

Conclusion for the example

• The critical χ²₁ point is 3.84 for α = 0.05.
• Since 2.206 < 3.84, H0 is not rejected.
• The success probabilities in both groups:
  pI = 16/20 = 0.8
  pII = 18/30 = 0.6
  are not significantly different.
• We do not reject the null hypothesis.

8.6 Continuity Correction and Example 3

Pre-clinical study:

                     Group
Response          Active   Placebo   Total
Carcinoma            9        6        15
No carcinoma        51       59       110
Total               60       65       125

Expected values:

                     Group
Response          Active                 Placebo                Total
Carcinoma         15×60/125 = 7.2        15×65/125 = 7.8          15
No carcinoma      110×60/125 = 52.8      110×65/125 = 57.2       110
Total               60                     65                    125

• The test statistic takes the value

$$X^2 = \Delta^2 \left( \frac{1}{E_{11}} + \frac{1}{E_{12}} + \frac{1}{E_{21}} + \frac{1}{E_{22}} \right) = (1.8)^2 \left( \frac{1}{7.2} + \frac{1}{7.8} + \frac{1}{52.8} + \frac{1}{57.2} \right) = 0.98$$

• One possible continuity correction replaces the decimal portion t (= 0.8) of ∆ (= 1.8) by t′ (= 0.5), so that ∆ becomes 1.5:
  0.0 < t ≤ 0.5 → t′ = 0.0
  0.5 < t ≤ 1.0 → t′ = 0.5
• In the example:

$$X^2_{corr} = (1.5)^2 \left( \frac{1}{7.2} + \frac{1}{7.8} + \frac{1}{52.8} + \frac{1}{57.2} \right) = 0.68$$

• The difference between X² and X²corr becomes smaller when the sample size increases.
• Other continuity corrections are used as well.

8.7 Validity of the Approximation

• The larger the sample size, the better the normal approximation to the binomial distribution, and the better the approximation to the χ²₁ distribution.
• A rule of thumb: The expected cell counts must be at least 5.
• Because the rule is conservative, small deviations are acceptable, with caution.
• Nevertheless, results have to be interpreted cautiously as well.

8.8 Confidence Interval for the Difference

• Consider the table in general terms:

            Group I   Group II
Total         NI        NII

• A confidence interval for the difference between proportions pI and pII:

$$\mathrm{CI} = (\hat{p}_I - \hat{p}_{II}) \pm Z \sqrt{\frac{\hat{p}_I \hat{q}_I}{N_I} + \frac{\hat{p}_{II} \hat{q}_{II}}{N_{II}}}$$

  where
  . p̂I is the observed proportion in group I,
  . q̂I = 1 − p̂I,
  . NI is the total number of individuals in group I,
  . Z = 1.96 when the normal approximation is used at the α = 0.05 nominal level.
• For the example:

$$\mathrm{CI} = (0.80 - 0.60) \pm 1.96 \sqrt{\frac{(0.80)(0.20)}{20} + \frac{(0.60)(0.40)}{30}} = (-0.048;\ 0.448)$$

8.9 Overview

• Consider example 1:

                     Treatment
Damage            Control   Flunarizine   Total
Yes                 15          2           17
No                   1          6            7
Total               16          8           24

• The expected counts:

                     Treatment
Damage            Control                Flunarizine           Total
Yes               17×16/24 = 11.33       17×8/24 = 5.67          17
No                7×16/24 = 4.67         7×8/24 = 2.33            7
Total               16                     8                     24

• The test statistic:

$$X^2 = (15 - 11.33)^2 \left( \frac{1}{11.33} + \frac{1}{5.67} + \frac{1}{4.67} + \frac{1}{2.33} \right) = 12.23$$

• We find a statistically significant difference.
• At the same time, for the corrected test statistic:

$$X^2_{corr} = (3.5)^2 \left( \frac{1}{11.33} + \frac{1}{5.67} + \frac{1}{4.67} + \frac{1}{2.33} \right) = 11.12$$

• The proportions:
  . Control:
    ∗ Damage: qc = 15/16 = 0.9375,
    ∗ No damage: pc = 1 − qc = 0.0625.
  . Flunarizine:
    ∗ Damage: qf = 2/8 = 0.25,
    ∗ No damage: pf = 1 − qf = 0.75.
• Confidence interval:

$$\mathrm{CI} = (0.75 - 0.0625) \pm 1.96 \sqrt{\frac{(0.9375)(0.0625)}{16} + \frac{(0.25)(0.75)}{8}} = 0.6875 \pm 1.96 \times 0.1646 = (0.36;\ 1.01)$$

• Problem: The expected counts are small: 4.67 and 2.33 are smaller than 5 and 5.67 is barely larger!

8.10 Extension Requested!

• Therefore, we want to extend the techniques available:
  . 2 × 2 tables with small expected counts,
  . R × C tables.

8.11 Fisher’s Exact Test

• When we have to analyze 2 × 2 tables with small observed counts that lead to small expected counts, smaller than 5, then the χ² approximation needs to be called into question.
• The method of analysis is then best changed.
• There is a method specifically developed for this context.

8.11.1 Example 4

• Consider a pre-clinical toxicologic study:

                     Carcinoma
Treatment         Yes    No    Total
Placebo             1    11      12
Active              4    10      14
Total               5    21      26

• Principle: The margins are considered fixed.
• Note that the margins for the χ² test do not change either, when changing the O table into the E table.
• Consider the following general table:

    A        C       A + C
    B        D       B + D
  A + B    C + D       N

  where N can be considered as N = (A + B) + (C + D) = (A + C) + (B + D).
• The probability of the observed configuration needs to be determined, given the null hypothesis of independence between rows and columns.
• Hence, for the example: independence between randomization and carcinoma.
• Fixed margins means:
  . We know the values A + B, C + D, A + C, B + D in advance, or
  . we condition on the values A + B, C + D, A + C, B + D.
• The probability to observe configuration (A, B, C, D), given the boundary conditions, is then

$$p(A, B, C, D) = \frac{(A + B)!\,(C + D)!\,(A + C)!\,(B + D)!}{N!\,A!\,B!\,C!\,D!}$$

• In the case of example 4 we find

$$p(1, 4, 11, 10) = \frac{5!\,21!\,12!\,14!}{26!\,1!\,4!\,11!\,10!} = 0.1826$$

• To carry out the test, we consider all configurations that are at least as unlikely as the observed one.
• When the sum of all corresponding probabilities does not exceed, for example, 0.05, then the null hypothesis is rejected.
• For example 4, the configurations that are at least as unlikely as the observed one are:

    0  12          4   8          5   7
    5   9          1  13          0  14
  p1 = 0.0304    p2 = 0.1054    p3 = 0.0120

• The sum of these probabilities, together with the probability 0.1826 of the observed table, is
  p = 0.1826 + p1 + p2 + p3 = 0.1826 + 0.0304 + 0.1054 + 0.0120 = 0.3304 > 0.05
• Of course, we could have economized on some of the calculations, because p ≥ 0.1826 > 0.05.
• Suppose instead that the observed table had been:

    0  12
    5   9

• Then it follows that:
  p = 0.0304 + 0.0120 = 0.0424 < 0.05
• Hence, the null hypothesis of independence between treatment and carcinoma would be rejected.
• Then, we could have concluded that active therapy appears to lead to more carcinoma.

8.12 Analysis of Example 1

• For example 1, the null hypothesis has been tested with the χ² method.
• However, there were problems with expected counts.
• It is safer to use Fisher’s exact test.
• The probability of the observed table and the one more extreme table:

   15   2 | 17          16   1 | 17
    1   6 |  7           0   7 |  7
   16   8 | 24          16   8 | 24
   p1 = 0.0013          p2 = 0.00002

• ⇒ p = p1 + p2 = 0.0013 and we, again, reject the null hypothesis.
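The probabilities quoted for Examples 4 and 1 can be checked with a short Python sketch. It uses the general table layout (A, C / B, D) and the factorial formula from Section 8.11.1, rewritten with binomial coefficients, and sums the probabilities of all tables with the same margins that are at least as unlikely as the observed one. The function names are illustrative, and the small tolerance used when comparing probabilities is our own choice to guard against floating-point ties.

```python
from math import comb


def table_probability(a, b, c, d):
    """Probability of the table with rows (a, c) and (b, d), margins fixed.

    Equals (a+b)!(c+d)!(a+c)!(b+d)! / (N! a! b! c! d!).
    """
    n = a + b + c + d
    return comb(a + c, a) * comb(b + d, b) / comb(n, a + b)


def fisher_exact_p(a, b, c, d):
    """Sum the probabilities of all tables at least as unlikely as the observed one."""
    p_observed = table_probability(a, b, c, d)
    row1, row2, col1 = a + c, b + d, a + b
    p_value = 0.0
    # x runs over every value of the top-left cell compatible with the margins.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, col1 - x, row1 - x, row2 - (col1 - x))
        if p <= p_observed * (1 + 1e-9):
            p_value += p
    return p_value


# Example 4: (A, B, C, D) = (1, 4, 11, 10)
print(round(table_probability(1, 4, 11, 10), 4), round(fisher_exact_p(1, 4, 11, 10), 4))
# Example 1: damage (rows) by treatment (columns), (A, B, C, D) = (15, 1, 2, 6)
print(round(fisher_exact_p(15, 1, 2, 6), 4))
```

Because the loop enumerates every table with the given margins, the probabilities it visits sum to one, which is a useful sanity check when adapting the sketch.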
• Overview of p-values:

Method                                   p-value
χ² (without continuity correction)       0.0005
χ² (with continuity correction)          0.0009
Fisher’s exact test                      0.0013

8.13 Example 3: pre-clinical test

                     Group
Response          Active   Placebo   Total
Carcinoma            9        6        15
No carcinoma        51       59       110
Total               60       65       125

Method                                   Statistic   p-value
χ² (without continuity correction)         0.98      0.3222
χ² (with continuity correction)            0.68      0.4096
Fisher’s exact test                                  0.4119

8.14 χ² test versus Fisher’s exact test

Sample     χ² test                   Fisher’s exact test
small      simple, unreliable        simple, reliable
large      simple, reliable          computationally complex, reliable

8.15 Estimator for the Association

• Requested: a measure for the association between “Group” and “Response.”
• A whole series exists.
• For example: the odds ratio:

$$\psi = \frac{9 \times 59}{6 \times 51} = 1.735$$

• This measure approximates the relative risk when the outcome is rare.

8.16 R × C Contingency Tables

• More than two treatment groups.
• Two treatments compared on an ordinal scale.
• Example:

                     Response
Treatment      No success   Some success   Success
A                  25            10           40
B                  27            23           25

• In general:

                     Response
Treatment       1       2      ...     C
1              n11     n12     ...    n1C
2              n21     n22     ...    n2C
...            ...     ...     ...    ...
R              nR1     nR2     ...    nRC

• Null hypothesis:
  H0: row and column classifications are independent
  H0: the response distribution is equal across the various treatment groups
• χ² test statistic:

$$X^2 = \sum \frac{(O - E)^2}{E} \sim \chi^2_{(R-1)(C-1)}$$

• In case R = C = 2 then (R − 1)(C − 1) = 1.
• The validity of this test is coupled to the expected counts being at least 5.
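The general statistic of Section 8.16, together with the expected-count formula E_ij = O_i+ O_+j / N of Section 8.5, can be sketched as follows. This is a minimal illustration without continuity correction; the function names are ours, and tables are passed as lists of rows.

```python
def expected_counts(observed):
    """Expected counts under independence: row total x column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Return X^2 = sum (O - E)^2 / E and its degrees of freedom (R-1)(C-1)."""
    expected = expected_counts(observed)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


def odds_ratio(table):
    """Odds ratio ad / bc for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


example_2 = [[16, 18], [4, 12]]               # rows: success, failure
example_3 = [[9, 6], [51, 59]]                # rows: carcinoma, no carcinoma
rxc_example = [[25, 10, 40], [27, 23, 25]]    # rows: treatments A and B

for name, table in [("Example 2", example_2), ("Example 3", example_3),
                    ("R x C example", rxc_example)]:
    statistic, df = chi_square_statistic(table)
    print(f"{name}: X^2 = {statistic:.3f} on {df} df")
print(f"Odds ratio, Example 3: {odds_ratio(example_3):.3f}")
```

Before relying on the statistic, the expected counts returned by `expected_counts` should be inspected against the rule of thumb that they be at least 5.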
8.17 Example 5

• Data:

                     Treatment
Severity          A      B     Total
Very             13     19       32
Moderately       24     20       44
Mildly           18     12       30
Total            55     51      106

• To answer the question as to whether there is a relationship between “Treatment” and “Severity,” we calculate the E table of expected values, in a few steps:

First step:
                     Severity
Treatment      Very             Moderately        Mildly    Total
A              55×32/106        55×44/106                     55
B                                                             51
Total            32               44                30       106

Second step:
                     Severity
Treatment      Very             Moderately        Mildly                 Total
A              16.60            22.83             55 − 16.60 − 22.83       55
B              32 − 16.60       44 − 22.83                                 51
Total            32               44                30                    106

Third step:
                     Severity
Treatment      Very             Moderately        Mildly                 Total
A              16.60            22.83             15.57                    55
B              15.40            21.17             30 − 15.57 = 14.43       51
Total            32               44                30                    106

• In this case, the number of degrees of freedom is (R − 1)(C − 1) = (2 − 1)(3 − 1) = 2
• The test statistic:

$$X^2 = \frac{(13 - 16.60)^2}{16.60} + \frac{(24 - 22.83)^2}{22.83} + \frac{(18 - 15.57)^2}{15.57} + \frac{(19 - 15.40)^2}{15.40} + \frac{(20 - 21.17)^2}{21.17} + \frac{(12 - 14.43)^2}{14.43} = 2.54$$

• The critical point is χ²₂(0.05) = 5.99. Since 2.54 < 5.99, H0 is not rejected.

8.18 Example 6

• Three treatments are randomized over 60 patients, among whom N = 54 successfully complete the study

                     Response
Treatment      Success   Failure   Total
A                  9         6       15
B                  8        11       19
C                 17         3       20
Total             34        20       54

• The corresponding E table is:

                     Response
Treatment      Success   Failure   Total
A                9.44      5.56      15
B               11.96      7.04      19
C               12.59      7.41      20
Total             34        20       54

• The test statistic is X² = 7.79, pointing to a significant difference.

8.19 Several 2 × 2 Tables

• A study where 2 treatments are being compared with respect to success.
• The data are assembled in several centers.
• Hence, there is stratification per center.
• Even within one center, some stratifying variables may have an impact on response (e.g., investigator).
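Looking back at the tests with 1 and 2 degrees of freedom used so far, the critical points 3.84 and 5.99 and the associated p-values can be checked without statistical tables, because the χ² tail probability has a closed form for these two cases. The sketch below relies on two standard facts: for 1 degree of freedom, P(χ² > x) = P(|Z| > √x) with Z standard normal; for 2 degrees of freedom, P(χ² > x) = exp(−x/2). The function names are illustrative.

```python
import math


def chi2_tail_df1(x):
    """P(chi-square with 1 df > x), via the standard normal: P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_tail_df2(x):
    """P(chi-square with 2 df > x) = exp(-x / 2)."""
    return math.exp(-x / 2.0)


def chi2_critical_df2(alpha):
    """Critical point for 2 df, obtained by inverting exp(-x / 2) = alpha."""
    return -2.0 * math.log(alpha)


print(f"Critical point, 2 df, alpha = 0.05: {chi2_critical_df2(0.05):.2f}")
print(f"Tail probability at 3.84, 1 df: {chi2_tail_df1(3.84):.3f}")
print(f"Example 5 (X^2 = 2.54, 2 df): p = {chi2_tail_df2(2.54):.3f}")
print(f"Example 6 (X^2 = 7.79, 2 df): p = {chi2_tail_df2(7.79):.3f}")
```

For other degrees of freedom no such elementary formula is available, and one would turn to tables or statistical software.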
Initiatie Wetenschappelijk Onderzoek: Biostatistiek 153 8.20 Example 7 • Data: Treatment Initiatie Wetenschappelijk Onderzoek: Biostatistiek Active Placebo Improvement No Yes 13 28 29 14 42 42 41 43 84 154 • Stratified for sex: Treatment Female Improvement No Yes Active 6 21 Placebo 19 13 25 34 27 32 59 Male Treatment Initiatie Wetenschappelijk Onderzoek: Biostatistiek Active Placebo Improvement No Yes 7 7 10 1 17 8 14 11 25 155 • With notation: Improvement Sex Treatment Female Test drug n111 = 6 n112 = 21 n11+ = 27 Female Placebo n121 = 19 n122 = 13 n12+ = 32 n1+1 = 25 n1+2 = 34 n1 = 59 Female total None Some/marked Total Male Test drug n211 = 7 n212 = 7 n21+ = 14 Male Placebo n221 = 10 n222 = 1 n22+ = 11 n2+1 = 17 n2+2 = 8 n2 = 25 Male total Initiatie Wetenschappelijk Onderzoek: Biostatistiek 156 • For treatments that would be equally effective, the expected values for cells with coding: . (1,1,1) = (female, active, no), . (2,1,1) = (male, active, no): are equal to n11+n1+1 n1 n21+n2+1 E(n211 ) = m211 = n2 E(n111 ) = m111 = Initiatie Wetenschappelijk Onderzoek: Biostatistiek 157 with corresponding variances Var(n111 ) = v111 n11+n12+ n1+1n1+2 = n21 (n1 − 1) 27 × 32 × 25 × 34 = 592 × 58 = 3.63758 Initiatie Wetenschappelijk Onderzoek: Biostatistiek Var(n211 ) = v211 n21+n22+ n2+1n2+2 = n22 (n2 − 1) 14 × 11 × 17 × 8 = 252 × 24 = 1.39627 158 8.21 Mantel-Haenszel Statistic • The null hypothesis H0: no association between “Treatment” and “Improvement”, taking the stratification for sex into account • The (Cochran-)Mantel-Haenszel statistic: QM H = with proportions: " P2 ns1+ ns2+ (ps11 − s=1 ns P2 s=1 vs11 # ps21) 2 ns11 ns1+ ns21 = ns2+ ps11 = ps21 Initiatie Wetenschappelijk Onderzoek: Biostatistiek 159 8.22 Analysis of Example 7 • Test statistic: QM H = 27×32 6 59 27 19 32 14×11 7 25 14 − + − 3.63748 + 1.39627 10 11 = 12.59 • The proportions are calculated as follows: 6 27 19 = 32 7 14 10 = 11 p111 = p211 = p121 p221 • This test statistic follows a QM H ∼ χ21 
distribution.
• $p = 0.0004$
• There is a significant difference in the improvement probability between the active and placebo groups.

8.23 Omitting the Stratifying Variable

• The classic $\chi^2$ test.
• The expected values are equal to

                  Improvement
  Treatment     No      Yes    Total
  Active        20.5    20.5      41
  Placebo       21.5    21.5      43
  Total         42      42        84

  and hence
\[
X^2 = (7.5)^2 \left( \frac{1}{20.5} + \frac{1}{20.5} + \frac{1}{21.5} + \frac{1}{21.5} \right) = 10.72
\]
• We find a slightly different value.
• Depending on possible differences between the distributions over the sexes, the difference between the Mantel-Haenszel and the classical $\chi^2$ test can increase.

8.24 Matched Pairs: McNemar’s Test

8.24.1 Example 8

• 50 people are subject to two allergy tests, A and B.
• Hence, we have a pair of responses for each individual:
  pair: (response to test A, response to test B)

                  A
  B           +      −    Total
  +          23      6       29
  −           9     12       21
  Total      32     18       50

• Everybody is assigned to both treatments, and hence acts as his/her own control.
• This table only superficially resembles the previous tables.
• The treatments are not separated; rather, every cell tells us something about both treatment A and treatment B.
• For example, there are 23 people with favorable response to both A and B, 9 react positively to A but not to B, etc.
• The first null hypothesis considered:
  H0: both tests have the same probability of success
  H0: $p_A = p_B$
• We can estimate both proportions as follows:
\[
\hat{p}_A = \frac{32}{50} = 0.64, \qquad \hat{p}_B = \frac{29}{50} = 0.58
\]
• The probabilities are calculated entirely differently than with simple contingency tables.
• We have two types of pairs:
   . Concordant pairs: (+, +) and (−, −): 23 and 12
   . Discordant pairs: (+, −) and (−, +): 9 and 6
• The discordant pairs are used to calculate the test statistic.
• If A and B had the same probability of success, the two discordant cells would contain about equally many pairs.
• We test equality of the off-diagonal (discordant) cells, using the following quantity, where N is the number of discordant pairs:
\[
Z = \frac{|\text{observed proportion} - 0.5| - \dfrac{1}{2N}}{\sqrt{\dfrac{1}{4N}}}
\]
• Critical level Z = 1.96.
• In the example
\[
Z = \frac{\left| \dfrac{9}{9+6} - 0.5 \right| - \dfrac{1}{30}}{\sqrt{\dfrac{1}{60}}} = 0.516 < 1.96
\]
• This test is known as McNemar’s test.
• The quadratic version, without continuity correction:
\[
\frac{\left( \dfrac{9}{9+6} - \dfrac{1}{2} \right)^2}{\dfrac{1}{4 \times 15}} = 0.600
\]

8.25 Independent Allergy Tests

• Next to the question of equal proportions, we can also consider the question of independence between both tests.
• To this end, the conventional $\chi^2$ test can be used:
\[
X^2 = \sum \frac{(O - E)^2}{E} = (4.44)^2 \left( \frac{1}{18.56} + \frac{1}{13.44} + \frac{1}{10.44} + \frac{1}{7.56} \right) = 7.02
\]
• $p = 0.0081$
• The corrected value is 5.70 ($p = 0.0170$).
• The p-value for Fisher’s exact test is $p = 0.0158$.
• Independence is rejected.
• Conclusion:
   . the tests are not independent (their results are associated),
   . and there is no evidence that they differ in probability of success.

Part III

t Test

Chapter 9

Comparing Groups with Continuous Outcomes: the t-test

   . Captopril data
   . Unpaired t-test
   . Paired t-test

9.1 Example: Captopril data

• 15 patients with hypertension
• The response of interest is the supine blood pressure, before and after treatment with CAPTOPRIL
• Research question: How does treatment affect BP?
• Dataset ‘Captopril’

                 Before            After
  Patient     SBP    DBP        SBP    DBP
   1          210    130        201    125
   2          169    122        165    121
   3          187    124        166    121
   4          160    104        157    106
   5          167    112        147    101
   6          176    101        145     85
   7          185    121        168     98
   8          206    124        180    105
   9          173    115        147    103
  10          146    102        136     98
  11          174     98        151     90
  12          201    119        168     98
  13          198    106        179    110
  14          148    107        129    103
  15          154    100        131     82

  Average (mm Hg)
  Diastolic before: 112.3
  Diastolic after:  103.1
  Systolic before:  176.9
  Systolic after:   158.0

• It would be of interest to know how likely the observed changes in BP are to occur by pure chance.
• If this is very unlikely, the above data provide evidence that BP indeed decreases after treatment with Captopril. Otherwise, the above data do not provide evidence for efficacy of Captopril.

9.2 Difference in DBP

The difference is taken as ∆(DBP) = DBP(after) − DBP(before), so that a decrease in blood pressure shows up as a negative value.

  Patient   DBP(before)   DBP(after)   ∆(DBP)
   1            130           125         −5
   2            122           121         −1
   3            124           121         −3
   4            104           106          2
   5            112           101        −11
   6            101            85        −16
   7            121            98        −23
   8            124           105        −19
   9            115           103        −12
  10            102            98         −4
  11             98            90         −8
  12            119            98        −21
  13            106           110          4
  14            107           103         −4
  15            100            82        −18

                                 DBP(before)   DBP(after)          ∆(DBP)
  mean                             112.33        103.07             −9.27
  signal: difference                       −9.27                    −9.27
  variance s²                      109.67        157.64             74.21
  standard deviation s              10.47         12.56              8.61
  common standard deviation                11.56                     8.61
  noise: standard error            2×11.56/√30 = 4.22       8.61/√15 = 2.22
  t test statistic                 −9.27/4.22 = −2.20       −9.27/2.22 = −4.17
  p-value                                 0.0366                   0.0010

9.3 Two-sample t-test

. Two independent groups

  Patient   DBP (group A)   DBP (group B)
   1             130             125
   2             122             121
   3             124             121
   4             104             106
   5             112             101
   6             101              85
   7             121              98
   8             124             105
   9             115             103
  10             102              98
  11              98              90
  12             119              98
  13             106             110
  14             107             103
  15             100              82

  signal: difference        −9.27
  noise: standard error      4.22
  t test statistic          −2.20
  p-value                    0.0366

. Between-group heterogeneity → less precise
. Null hypothesis: $H_0 : \mu_A = \mu_B$
. Structure, for two equally sized groups with common standard deviation $s$ and $n = 30$ observations in total (so that the standard error $2s/\sqrt{n}$ is the 4.22 of the table in Section 9.2):
\[
t = \frac{\Delta}{2s/\sqrt{n}} = \frac{\overline{X}_B - \overline{X}_A}{2s/\sqrt{n}} \sim t_{n-2}
\]
. Extensions:
   ∗ A and B have different variances
   ∗ Sample sizes A and B different
   ∗ Dependent measures → paired t-test

9.4 Paired t-test

  Patient   DBP(before)   DBP(after)   ∆(DBP)
   1            130           125         −5
   2            122           121         −1
   3            124           121         −3
   4            104           106          2
   5            112           101        −11
   6            101            85        −16
   7            121            98        −23
   8            124           105        −19
   9            115           103        −12
  10            102            98         −4
  11             98            90         −8
  12            119            98        −21
  13            106           110          4
  14            107           103         −4
  15            100            82        −18

  signal: difference        −9.27
  noise: standard error      2.22
  t test statistic          −4.17
  p-value                    0.0010

• More restricted applicability: paired measurements:
   . The same subject measured at two different time points
   . Paired organs for the same subject
   . Twins
   . Case-control data
   . Split blood sample
• More power: when pairs are positively correlated
• Structure (N = # pairs, $s_\Delta$ = standard deviation of the differences):
\[
t = \frac{\Delta}{s_\Delta/\sqrt{N}} = \frac{\overline{X}_B - \overline{X}_A}{s_\Delta/\sqrt{N}} \sim t_{N-1}
\]
• Extensions:
   . More than two measures → repeated measures analysis

9.5 t-tests: Concluding Remarks

• Assumption:
   . Two-sample t-test: Data in each group should be roughly normally distributed
   . Paired t-test: Differences should be roughly normally distributed
• What if this is not the case?
   . Transform to (near) normality
   . Apply a non-parametric (rank-based) test: less efficient, but more robust against non-normality
• Apart from hypothesis testing, confidence intervals can also be constructed.

Part IV

Linear Regression

Chapter 10

Introduction

Illustrative Example

• These data are central to this part.
• Origin: Prof.Dr. Koen Milisen, CZV, K.U.Leuven.
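As a numerical check on the t-tests of Sections 9.2 to 9.4, the signal, noise and t statistics for the diastolic Captopril data can be recomputed with the Python standard library. This is a minimal sketch: the function names are chosen here for illustration, and p-values are left out because the standard library has no t distribution.

```python
import math
import statistics

# Diastolic blood pressure (mm Hg) from the Captopril dataset in Section 9.1.
dbp_before = [130, 122, 124, 104, 112, 101, 121, 124, 115, 102, 98, 119, 106, 107, 100]
dbp_after = [125, 121, 121, 106, 101, 85, 98, 105, 103, 98, 90, 98, 110, 103, 82]


def two_sample_t(a, b):
    """Pooled-variance t statistic for mean(b) - mean(a), groups treated as independent."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(b) - statistics.mean(a)) / se, se


def paired_t(a, b):
    """Paired t statistic based on the within-pair differences b - a."""
    diffs = [y - x for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, se


t_two, se_two = two_sample_t(dbp_before, dbp_after)
t_paired, se_paired = paired_t(dbp_before, dbp_after)

print(round(se_two, 2), round(t_two, 2))        # noise and t, two-sample analysis
print(round(se_paired, 2), round(t_paired, 2))  # noise and t, paired analysis
```

The signal (−9.27) is the same in both analyses; only the noise differs, which is why the paired statistic is roughly twice as large in absolute value.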
10.1 Problem Setting

• Research into post-operative variability in the neuro-cognitive and functional status of elderly patients with hip fractures.
• A surgical intervention in elderly patients often results in acute cognitive dysfunction (= delirium).
• Delirium versus dementia:
   . Delirium:
     → acute start
     → usually temporary
   . Dementia:
     → no acute start
     → slowly progressing
     → irreversible
• Delirium
   . leads to medical problems and problems of care
   . often is the first symptom of a physical disorder or intoxication stemming from medicines
   . can lead to increased mortality
   . is hard to detect
• Economic implications of delirium:
   . Extra care
   . Longer hospital stay
   . High degree of institutionalization
• Research suggests that, among elderly hip fracture patients, the increased degree of dependence is a consequence of delirium, rather than of the hip fracture itself.

10.2 Sample

• Longitudinal design: Certain variables are measured repeatedly over time.
• Prospective (e.g., complications) and retrospective (e.g., living conditions) measurements.
• Data of 2 traumatological departments of U.Z. Gasthuisberg, K.U.Leuven.
• Inclusion criteria:
   . ≥ 65 years of age
   . hospitalized with hip fracture in the emergency room
   . consent for participation in the study
   . ...
• Exclusion criteria:
   . time between admission and operation ≥ 72 hours
   . multiple traumas
   . ...
• Data collected 16/09/1996–28/02/1997.
10.3 Data Collected

• Data on 60 patients
• 78 variables
• Data for every patient, prior to, during, and post operation
• Longitudinal and derived measurements
• Study questionnaire, ADL score, MMSE, and CAM scores

10.3.1 Pre-operative Evaluation

  Variable   Description                  Values
  nummer     patient number               1–60
  leeftd     age                          (years)
  gesl       sex                          1=male, 2=female
  opnduur    length of stay               (days)
  burgst     civil status                 1=single, 2=married, 3=widow(er), 4=divorced, 5=religious
  opleid     education                    1=university/college, 2=high school, 3=lower secondary, 4=primary
  zijfrc     side fracture                1=left, 2=right
  typfrc     type fracture                1=intra-capsular, 2=extra-capsular
  cardio     cardiologic pathology        0=no, 1=yes
  vascul     vascular pathology           0=no, 1=yes
  pulmon     pulmonary pathology          0=no, 1=yes
  urinai     urinary pathology            0=no, 1=yes
  abdom      abdominal pathology          0=no, 1=yes
  hyper      hypertension                 0=no, 1=yes
  zicht      vision pathology             0=no, 1=yes
  gehoor     auditory pathology           0=no, 1=yes
  malign     malignant disease            0=no, 1=yes
  diabet     diabetes                     0=no, 1=yes
  reumat     rheumatological pathology    0=no, 1=yes
  vrop       past surgery                 0=no, 1=yes
  neuro      neuro-psychiatric pathology  0=no, 1=yes
  andere     other pathology              0=no, 1=yes

Chapter 11

Simple (Single) Linear Regression

   . Introduction
   . The method of least squares
   . Illustration and interpretation
   . Statistical inference
   . Illustration and interpretation

11.1 Introduction

• The correlation coefficient r measures the linear relationship between two measurements, x and y. How can we describe this linear relationship?
• One possible way would be to construct the straight line that ‘fits best’ the observed measurements:
• A straight line is described analytically by an equation of the form
\[
y = \beta_0 + \beta_1 x
\]
• The parameter $\beta_0$ is the intercept of the straight line. It is the value of y obtained for x = 0.
• The parameter $\beta_1$ is the slope.
• If $\beta_1 > 0$:
   . There is a positive relationship between x and y
   . The larger $\beta_1$, the faster y increases with x
• If $\beta_1 < 0$:
   . There is a negative relationship between x and y
   . The more negative $\beta_1$, the faster y decreases with x
• The practical assignment is to estimate the parameters $\beta_0$ and $\beta_1$ based on the collected data $(x_i, y_i)$.

11.2 The Least Squares Method

• To estimate $\beta_0$ and $\beta_1$, we first need to decide which criterion should be satisfied by ‘the best’ straight line.

[Figure: scatter plot of data points in the (x, y) plane with the straight line $y = \beta_0 + \beta_1 x$, marking the intercept $\beta_0$, an observed value $y_i$ and the corresponding predicted value $\hat{y}_i$ on the line]

• If we knew $\beta_0$ and $\beta_1$, then for each observation in the set of data, based on the x value, a predicted value could be calculated for y:
\[
\hat{y}_i = \beta_0 + \beta_1 x_i
\]
• The prediction will be good if $\hat{y}_i$ lies close to $y_i$ and will be poor if $\hat{y}_i$ deviates strongly from $y_i$.
• If the straight line describes the data $(x_i, y_i)$ adequately, then we expect, for most points, $\hat{y}_i$ to lie close to the true value $y_i$.
• A possible measure to capture how well the straight line has been chosen is
\[
Q = \sum_i [y_i - \hat{y}_i]^2 = \sum_i [y_i - (\beta_0 + \beta_1 x_i)]^2
\]
• Hence, Q is a measure for how closely the data lie to the straight line $y = \beta_0 + \beta_1 x$.
• Note that other straight lines (i.e., other $\beta_0$ and $\beta_1$) will lead to different Q values.
• The straight line that describes the data best is the one for which Q is smallest.
• The least squares method calculates the values of $\beta_0$ and $\beta_1$ for which Q is minimal.
• It can be shown that these values are given by:
\[
\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]
• $\hat{\beta}_0$ and $\hat{\beta}_1$ are termed the least squares estimators for $\beta_0$ and $\beta_1$.
• The straight line so obtained, $y = \hat{\beta}_0 + \hat{\beta}_1 x$, is termed the regression line.
• Once the estimators for $\beta_0$ and $\beta_1$ are known, we can make a prediction, for each observation in the data set, for y based on x:
\[
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i
\]
• We are also able, for each data point $(x_i, y_i)$ in the set of data, to compute the error made when we predict $y_i$ by $\hat{y}_i$:
\[
e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)
\]
• The quantities $e_i$ are termed residuals:
   . $e_i > 0$: the observed $y_i$ lies above the regression line
   . $e_i = 0$: the observed $y_i$ lies on the regression line
   . $e_i < 0$: the observed $y_i$ lies underneath the regression line
• Further, one can show that
\[
\sum_i e_i = 0,
\]
  i.e., the points above the regression line are ‘in equilibrium’ with those underneath the regression line.

11.3 Illustration + Interpretation

• The output for the regression coefficients (Statistica):
• The Y variable is termed response, or also dependent variable.
• The X variable is termed covariate, or also independent variable.
• The parameter estimates are $\hat{\beta}_0 = 23.65$ and $\hat{\beta}_1 = -0.30$.
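The least squares formulas of Section 11.2 can be sketched in a few lines of Python. The ADL and MMSE measurements behind the estimates 23.65 and −0.30 are not reproduced in these notes, so the sketch below uses, purely as illustrative numbers, the diastolic Captopril values of Section 9.1 (DBP before as x, DBP after as y). The function names are assumptions made for this example.

```python
def least_squares(x, y):
    """Least squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def q_criterion(b0, b1, x, y):
    """Sum of squared deviations Q between observed and predicted y."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Illustrative data only: DBP before (x) and after (y) treatment, Section 9.1.
x = [130, 122, 124, 104, 112, 101, 121, 124, 115, 102, 98, 119, 106, 107, 100]
y = [125, 121, 121, 106, 101, 85, 98, 105, 103, 98, 90, 98, 110, 103, 82]

b0_hat, b1_hat = least_squares(x, y)
residuals = [yi - (b0_hat + b1_hat * xi) for xi, yi in zip(x, y)]

print(b0_hat, b1_hat)                       # intercept and slope
print(sum(residuals))                       # zero up to rounding error
print(q_criterion(b0_hat, b1_hat, x, y))    # minimal value of Q
```

Any other choice of intercept or slope, for example `q_criterion(b0_hat, b1_hat + 0.01, x, y)`, gives a larger Q, which is exactly the sense in which this line ‘fits best’.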
• The corresponding regression line is
\[
\mathrm{ADL} = 23.65 - 0.30 \times \mathrm{MMSE}
\]
• The regression line predicts an ADL score of 23.65 if MMSE is equal to zero.
• Further, there is a negative linear relationship between MMSE and ADL: the higher MMSE, the lower ADL, and vice versa.
• The regression line predicts a decrease of 0.30 in ADL for a unit increase of MMSE.
• This ought to be interpreted as follows:
   . Consider two groups of patients.
   . All patients in the first group have identical MMSE (e.g., 20).
   . All patients in the second group have identical MMSE values, too, but 1 unit higher than those in the first group (hence, 21).
   . Then, we expect the difference in average ADL score between both groups to be 0.30, with the lower score for the group with the highest MMSE.
• Hence, we should not conclude that an increase of MMSE by 1 unit in a given patient will lead to a decrease of 0.30 in ADL. In other words, we cannot draw ‘longitudinal’ conclusions from a ‘cross-sectional’ experiment.
• Graphical representation:

11.4 Statistical Inference

11.4.1 Introduction

• The regression output, obtained in Statistica, was:
• The p-values listed test the hypotheses
\[
H_0 : \beta_0 = 0 \quad \text{versus} \quad H_A : \beta_0 \neq 0
\]
  and
\[
H_0 : \beta_1 = 0 \quad \text{versus} \quad H_A : \beta_1 \neq 0
\]
• Indeed, the least squares method allows us to calculate the straight line that best describes our observations $(x_i, y_i)$.
• However, a different sample from the same population would lead to a different regression line
\[
y = \hat{\beta}_0 + \hat{\beta}_1 x
\]
• Illustration: Vestac Java Applet → regression → regression plots
• Based on a sample and hence the corresponding estimators $\hat{\beta}_0$ and $\hat{\beta}_1$, statistical inference (p-values, confidence intervals) aims to make a statement about the regression line
\[
y = \beta_0 + \beta_1 x
\]
  that captures the relationship in the entire population.
• This is not possible without additional assumptions about the distribution from which the data are sampled.
• The assumptions needed are described by the so-called regression model.

11.4.2 The simple linear regression model

• In real situations, the points $(x_i, y_i)$ will never describe a perfect straight line, but rather a cloud of points.
• This implies that the observations do not satisfy
\[
y_i = \beta_0 + \beta_1 x_i
\]
  but rather
\[
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
\]
  where $\varepsilon_i$ expresses how much an observation $y_i$ lies above or below the regression line.
• The quantities $\varepsilon_i$ are termed errors, and the linear regression model assumes that they follow a normal distribution with mean 0 and (unknown) variance $\sigma^2$:
\[
\varepsilon_i \sim N(0, \sigma^2)
\]
• Note that the $\varepsilon_i$ are the ‘theoretical version’ of the residuals $e_i$.
• Hence, the regression model assumes
   . linearity: for each X, the mean of the corresponding Y-values lies on the regression line
   . normality: implies that, for each X, the corresponding Y-values lie symmetrically around the regression line
   . constant variance: the prediction errors for small X-values are neither larger nor smaller than the errors for large X-values

11.4.3 Significance tests for β0 and β1

• If the slope $\beta_1$ is equal to zero, then the regression model is described by
\[
y_i = \beta_0 + \varepsilon_i
\]
  which implies that there is no linear relationship between Y and X.
• In practice, if we want to test whether there is a linear relationship between X and Y, then we need to test the null hypothesis:
\[
H_0 : \beta_1 = 0 \quad \text{versus} \quad H_A : \beta_1 \neq 0
\]
• The value observed in our sample is $\hat{\beta}_1 = -0.30$.
• This value could be obtained by coincidence, even if in the total population $\beta_1 = 0$ would hold.
• Research question: How large is the probability that we, by accident, observe $\hat{\beta}_1 = -0.30$, even if $\beta_1 = 0$?
• Illustration: Vestac Java Applet → regression → histograms of slope and intercept
• It is clear that, when $\beta_1 = 0$, it becomes very unlikely to still observe $\hat{\beta}_1 = -0.30$.
• Note that it would be equally unlikely to observe $\hat{\beta}_1 = +0.30$.
• The chance that we would find an estimate with $|\hat{\beta}_1| \geq 0.30$ is $p < 0.0001$.
• Given that this probability is so small, more specifically that $p < \alpha = 0.05 = 5\%$, we conclude that what has been observed ($\hat{\beta}_1 = -0.30$) is sufficient indication to believe that $\beta_1 \neq 0$.
• We reject the null hypothesis and conclude that $\beta_1$ is significantly different from 0, at the 5% significance level.
• Apart from testing hypotheses, the regression model also allows for constructing confidence intervals. A 95% C.I.
for $\beta_1$ in our example is $[-0.378; -0.218]$.
• Given that this interval is far away from 0, this is again strong evidence that $\beta_1 \neq 0$.
• Analogously, a significance test can be constructed for
\[
H_0 : \beta_0 = 0 \quad \text{versus} \quad H_A : \beta_0 \neq 0
\]
• In practice, one is primarily interested in tests for $\beta_1$.
• Note that all tests and confidence intervals are valid only when all regression model assumptions are satisfied.

11.4.4 The ANOVA table

• How much better can we predict Y, given that we know X?

[Figure: scatter plot of data points in the (x, y) plane with the fitted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, marking the sample average $\bar{y}$, the intercept, an observed value $y_i$ and its predicted value $\hat{y}_i$]

• If we had no x-values, then the best possible prediction for each $y_i$-value would be the sample average $\bar{y}$.
• A measure for the error so made is the sum of squares
\[
\sum_i [y_i - \bar{y}]^2
\]
• Note that this is a measure for the variability in the $y_i$.
• If we do use the observed $x_i$-values to predict the y-values, then we predict each $y_i$ by means of
\[
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i
\]
• A measure for the error so made is the sum of squares
\[
\sum_i [y_i - \hat{y}_i]^2 = \sum_i e_i^2
\]
• Because the use of this extra information coming from the $x_i$ leads to more precise predictions, we have that
\[
\sum_i [y_i - \bar{y}]^2 \geq \sum_i [y_i - \hat{y}_i]^2
\]
• One can show that
\[
\underbrace{\sum_i [y_i - \bar{y}]^2}_{SSTO} = \underbrace{\sum_i [y_i - \hat{y}_i]^2}_{SSE} + \underbrace{\sum_i [\hat{y}_i - \bar{y}]^2}_{SSR}
\]
• SSTO: Total sum of squares
  This term captures the total error made by predicting the $y_i$ without taking into account the observed values $x_i$.
• SSE: Error sum of squares
  This term captures the error made upon predicting the $y_i$ by making use of the observations $x_i$.
• SSR: Regression sum of squares
  This term captures the decrease in error by predicting the values $y_i$ with, rather than without, making use of the covariates.
• A measure for how well the data points $(x_i, y_i)$ agree with the regression line is
\[
R^2 = \frac{SSR}{SSTO}
\]
• $R^2$ enjoys the following properties:
   . $0 \leq R^2 \leq 1$
   . $R^2 = 0$ implies that SSR = 0 and hence that all $\hat{y}_i$ are equal to $\bar{y}$, i.e., the regression line is flat. This is equivalent to $\hat{\beta}_1 = 0$.
   . $R^2 = 1$ implies that SSE = 0. This implies that $y_i = \hat{y}_i$ for all i, and hence that all points $(x_i, y_i)$ lie on the regression line.
• It is said that $R^2$ expresses ‘what fraction of the variability in the $y_i$ can be explained by the $x_i$’.
• One can show that $R^2$ is equal to $r^2$, the square of the correlation between the $x_i$ and $y_i$ values.
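The decomposition SSTO = SSE + SSR and the identity $R^2 = r^2$ can be verified numerically. The sketch below is an illustration only: it again uses the diastolic Captopril values of Section 9.1 as stand-in data (DBP before as x, DBP after as y), since the data behind the ANOVA table of this chapter are not reproduced in these notes.

```python
import math

# Stand-in data: DBP before (x) and after (y) treatment, Section 9.1.
x = [130, 122, 124, 104, 112, 101, 121, 124, 115, 102, 98, 119, 106, 107, 100]
y = [125, 121, 121, 106, 101, 85, 98, 105, 103, 98, 90, 98, 110, 103, 82]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares fit and fitted values.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# The three sums of squares of the ANOVA table.
ssto = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ssr = sum((fi - y_bar) ** 2 for fi in fitted)

r_squared = ssr / ssto
r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient

print(ssto, sse + ssr)    # equal up to rounding error
print(r_squared, r ** 2)  # equal up to rounding error
```

Because the slope has the same sign as the correlation, r can be recovered from $R^2$ as $\pm\sqrt{R^2}$, with the sign taken from $\hat{\beta}_1$.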
11.4.5 Illustration + Interpretation

• Statistica output for the ANOVA table, with SSR and SSE:
• ‘R-square’: $R^2 = 0.4940$, the regression can explain about 50% of the total variability in the $y_i$ values:
\[
R^2 = \frac{SSR}{SSTO} = \frac{351.23}{351.23 + 359.76} = 0.4940
\]
• The Pearson correlation, found before, was:
\[
r = -\sqrt{R^2} = -\sqrt{0.4940} = -0.70
\]

Chapter 12

Model Diagnostics

   . Example
   . Linearity
   . Constant error variance
   . Normality of the errors

12.1 Example

• We wish to assess whether a patient’s dependence (ADL), one day post operation, can be used to predict a patient’s length of stay:
• There appears to be a slight increase of length of stay, as a function of the ADL score. Is this relationship significant?
• Therefore, we fit the following regression model:
\[
\text{Length of stay}_i = \beta_0 + \beta_1 \mathrm{ADL}_i + \varepsilon_i
\]
• Statistica output:
• The parameter estimates are:
   . $\hat{\beta}_0 = 9.37$
   . $\hat{\beta}_1 = 0.29$, p-value: 0.1173
• The fitted regression line is
\[
\text{Length of stay} = 9.37 + 0.29 \times \mathrm{ADL}
\]
• Note that there is no significant relationship between length of stay and ADL score, 1 day post operation.
• Further, it follows from $R^2 = 0.0432$ that ADL explains only 4% of the total variability in length of stay.

12.2 Model Assumptions

• The statistical inferences, obtained for the regression parameters, are valid only if the model assumptions are satisfied, i.e.,
\[
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)
\]
• Hence, the regression model assumes
   . linearity: for each X, the mean of the corresponding Y-values lies on the regression line
   . normality: implies that, for each X, the corresponding Y-values lie symmetrically around the regression line
   . constant variance: the prediction errors for small X-values are neither larger nor smaller than the errors for large X-values
• How can these assumptions be verified?

12.3 The Assumption of Linearity

• To illustrate the effect of non-linearity, consider the following fictitious example:
• There clearly is a positive relationship between $x_i$ and $y_i$, but the relationship appears to deviate somewhat from linearity.
• What happens if we still apply linear regression?
• Statistica output:
• $R^2 = 0.85$: X explains 85% of the observed variability in Y.
• The regression line is given by
\[
Y = 1.19 + 2.06 X
\]
• The slope $\beta_1$ is significantly different from zero ($p < 0.001$).
• The observed points all lie close to the fitted regression line (explaining the high $R^2$), but the straight line poorly describes the relationship between $x_i$ and $y_i$:
   . Over-estimation of the $y_i$ for small and large $x_i$
   . Under-estimation of the $y_i$ in the middle
• The graph suggests that non-linearity can be discerned by studying the residuals
\[
e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)
\]
  and plotting them as a function of x.
• Graphical representation:
• If the assumption of linearity were valid, then, for each value of X, the corresponding values of Y would lie symmetrically around the regression line. The residuals $e_i$ would then have to lie symmetrically around zero, for all possible values of X.
• Clearly, this is not satisfied in the above example.
• Note that the residuals in fact suggest that the relationship between the $y_i$ and the $x_i$ is rather a curved function. We return to this point as part of polynomial regression.
• Oftentimes, the covariate X can be transformed so that the $y_i$, as a function of the transformed $x_i$, can be assumed linear.
• Frequently used transformations include $\ln(X)$, $\sqrt{X}$, $1/X$, $\exp(X)$, $\ln(X + 1)$, ...
• For our fictitious example we try a logarithmic transformation of the observed $x_i$:
\[
x_i \longrightarrow \ln(x_i)
\]
• Output of the regression procedure:
• Accompanying graph:
• Residual plot:
• $R^2 = 0.92$: our model has improved, because we can now explain more variability in the y-values by means of the x-values.
• The estimated regression curve now is
\[
Y = 2.95 + 0.80 \ln(X)
\]
• The transformation does complicate the interpretation of the regression coefficients. For example, 0.80 is the estimated increase in Y when $\ln(X)$ increases by one unit.
• At the same time, the transformation is necessary to render the assumption of linearity more realistic, which in turn implies that our statistical inferences w.r.t. $\beta_0$ and $\beta_1$ improve.

12.4 Example: Length of Stay versus ADL

• We now check whether the linearity assumption is satisfied in the regression model employed for the prediction of length of stay by means of the ADL score, 1 day post operation.
• The residual plot does not indicate any systematic trend in the residuals:

12.5 The Assumption of Constant Variance

• For illustration, we study the relationship between diastolic blood pressure and age, using data of 54 healthy adult women, between 20 and 60 years of age:
• We conduct a regression of blood pressure on age:
• The regression explains more than 40% of the variability in blood pressure ($R^2 = 0.4077$); there is a significant ($p < 0.0001$) linear relationship between age and blood pressure; the estimated regression line is:
\[
\text{Blood pressure} = 56.16 + 0.58 \times \text{Age}
\]
• Given that the residuals $e_i = y_i - \hat{y}_i$ can be interpreted as estimates of the theoretical deviations $\varepsilon_i$, we can assess the assumption of constant variance for the $\varepsilon_i$ via a scatter plot of the residuals:
• The residuals are distributed around zero in a ‘parallel’ fashion, suggesting that linearity is satisfied.
• At the same time, the residual plot suggests that the variance of the $\varepsilon_i$ increases with age.
• Violation of this assumption will lead to less than optimal inferences about the parameters $\beta_0$ and $\beta_1$:
   . The estimated regression line remains correct.
   . The parameters $\beta_0$ and $\beta_1$ are estimated less precisely. This leads to larger p-values, and hence a linear relationship between X and Y may go undetected.
• An optimal analysis is obtained through a so-called weighted least squares analysis.
• Oftentimes, non-constant variance is paired with non-normality. A solution for the non-normality problem very often generates, on the side, a solution for the non-constant-variance problem.
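The notes mention weighted least squares without giving details. As a rough sketch of the idea: each observation receives a weight $w_i$, and the criterion $\sum_i w_i [y_i - (\beta_0 + \beta_1 x_i)]^2$ is minimised, so that observations with larger error variance count less. Everything specific below is an assumption made for illustration: the ages and blood pressures are made-up numbers (not the data of the 54 women), and the weights $1/\text{age}^2$ correspond to assuming that the error standard deviation is proportional to age.

```python
def weighted_least_squares(x, y, w):
    """Minimise sum_i w_i * (y_i - b0 - b1 * x_i)^2 and return (b0, b1)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Made-up illustrative values, NOT the blood pressure data of the 54 women.
age = [22, 27, 31, 36, 40, 45, 49, 54, 58]
dbp = [68, 73, 72, 80, 74, 88, 79, 95, 81]

# Assumption: error standard deviation proportional to age,
# hence weights inversely proportional to age squared.
weights = [1 / a ** 2 for a in age]

# Equal weights reproduce ordinary least squares.
b0_ols, b1_ols = weighted_least_squares(age, dbp, [1.0] * len(age))
b0_wls, b1_wls = weighted_least_squares(age, dbp, weights)

print(b0_ols, b1_ols)
print(b0_wls, b1_wls)
```

In practice the weights are rarely known in advance; choosing them is part of the modelling and should be justified from the residual plot or from subject-matter knowledge.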
12.6 Example: Length of Stay versus ADL

• To check the assumption of constant residual variance for the regression model employed to predict length of stay by means of the ADL score, 1 day post operation, we reconsider the residual scatter plot already created to assess linearity:
• Apart from the outlier in the middle, there are no systematic trends in the variability of the residuals.
• We can therefore accept the assumption of constant residual variance.

12.7 The Assumption of Normality

• Given that the residuals $e_i = y_i - \hat{y}_i$ are estimators for the theoretical deviations $\varepsilon_i$, it is natural to assess the assumption of normality via residuals.
• In practice, one often uses a combination of two methods:
   . Graphical: a histogram of residuals
   . A formal test for normality
• Both techniques are illustrated by means of the blood pressure data in 54 women.

12.7.1 A histogram of residuals

• A simple graphical way to explore the distribution of the residuals is by means of a histogram, together with the normal distribution that most closely fits the histogram:
• From this histogram follows:
   . There is no evidence for asymmetry in the distribution of the residuals
   . The distribution appears not to be too different from the normal distribution
• We conclude that there is no graphical evidence for non-normal errors $\varepsilon_i$.

12.7.2 The normality test

• We can conduct a formal normality test.
• One tests the null hypothesis
      $H_0$: the data are normally distributed
  versus the alternative hypothesis
      $H_A$: the data are not normally distributed
• Various testing procedures are possible, all leading to a p-value, which allows us to either reject or not reject the null hypothesis.
• Statistica output:
• The output shows a histogram with the fitted normal curve, together with the results of 3 test procedures for normality: Shapiro-Wilk, Kolmogorov-Smirnov, and Lilliefors. The first two are the more common ones.
• Based on each of the 3 procedures, the null hypothesis of normality is not rejected. We conclude that the residuals $e_i$, and hence the errors $\varepsilon_i$, can be taken to be normally distributed.

12.7.3 Histogram ←→ normality test

• The histogram is an exploratory technique to study the distribution of the residuals.
• The normality test is a formal test of whether the assumption of normality is acceptable.
• In (very) large samples, rejection of normality by a statistical testing procedure is rather likely: even the smallest deviations from normality will be detected.
• It is known that small deviations from normality will still lead to correct results, as long as the errors are symmetric.
• Hence, if non-normality is not due to asymmetry, then the results obtained will still be reliable.

12.8 Example: Length of Stay versus ADL

• We consider again the regression of length of stay of hip fracture patients on their ADL score, 1 day post operation.
• The residuals are clearly non-normally distributed.
• From the histogram, it follows that non-normality is due to asymmetry.
• When non-normality results from asymmetry, one can sometimes transform the y-values so as to make the residuals in the new regression normally distributed.
• Frequently used transformations are $\ln(Y)$, $\sqrt{Y}$, $1/Y$, $\exp(Y)$, $\ln(Y + 1)$, ...
• In our example, we have to transform the data (the y-values) such that the larger residuals move closer to the bulk of the residuals.
• A possible transformation is
      Length of stay −→ ln(Length of stay)
• Note that all observed values of length of stay are positive, so that the above transformation is allowed.
• Before interpreting the regression model output, we check whether the distribution of the new residuals is closer to a normal distribution:
• Hence, we can conclude that the errors in the new regression model are normally distributed.
• New regression output:
• The regression model is slightly improved, given that the $R^2$ value has increased from 0.0432 to 0.0670.
• The regression line is:
      ln(Length of stay) = 2.23 + 0.02 × ADL
• Now, we do find a significant relationship: $p = 0.0497$, in contrast with $p = 0.1173$ prior to transformation.
• Note that the relationship derived is no longer linear between the original variables ADL and Length of Stay.
• This example underscores the need to check normality of the errors, given that non-normality can strongly distort the results.
• The transformation of the y-values can, in turn, disturb linearity and/or the constant variance of the errors $\varepsilon_i$. It is therefore useful to construct, after transformation, a scatter plot of the y-values versus the residuals:
• Linearity and constant variability remain satisfied.
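To make the remark about the loss of linearity on the original scale concrete, the sketch below back-transforms the fitted line reported above, ln(Length of stay) = 2.23 + 0.02 × ADL, to days. The function names are hypothetical, and the coefficients are the rounded values from the text, so the numbers are indicative only. Under the assumption of normal errors on the log scale, the back-transformed value estimates the median rather than the mean length of stay.

```python
import math

# Rounded coefficients of the fitted line on the log scale, taken from the text:
#   ln(Length of stay) = 2.23 + 0.02 * ADL
INTERCEPT = 2.23
SLOPE = 0.02

def predicted_log_stay(adl):
    """Predicted ln(length of stay) for a given ADL score."""
    return INTERCEPT + SLOPE * adl

def predicted_stay(adl):
    """Back-transformed prediction, in days (an exponential curve in ADL)."""
    return math.exp(predicted_log_stay(adl))

# ADL values that appear in the course text (0, 10, 17, 24).
for adl in (0, 10, 17, 24):
    print(adl, round(predicted_log_stay(adl), 2), round(predicted_stay(adl), 1))

# On the original scale the effect is multiplicative: each extra ADL point
# multiplies the predicted stay by exp(SLOPE), whatever the starting ADL value.
ratio = predicted_stay(13) / predicted_stay(12)
print(round(ratio, 4))
```

The constant ratio shows why the slope is best read as a relative change in length of stay per ADL point, not as a fixed number of days.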
12.9 General Conclusion

• Carrying out a regression is easy
• Evaluating a regression model is difficult

Chapter 13 Influential Observations
   . Example
   . Cook's distance
   . Application
   . What shall we do with influential subjects?

13.1 Example

• We consider again the regression of ln(Length of stay) on the ADL score, 1 day post operation:
• Patient #20 has an ADL score of 17 and was hospitalized for 36 days, which is exceptionally long in comparison with the other patients.
• For subject #20, the residual $e_i = y_i - \hat{y}_i$ is, therefore, very large.
• Given that the parameters $\beta_0$ and $\beta_1$ are estimated via the least squares method, it is legitimate to investigate how strongly our results $\hat{\beta}_0$ and $\hat{\beta}_1$ are influenced by this individual.
• A subject is highly influential if deleting the subject leads to strongly differing results.
• Influential observations make interpreting the results more difficult, because the conclusions become sample-dependent: a different sample would have led to different results.
• To study a subject's influence, we can compare $\hat{\beta}_0$ and $\hat{\beta}_1$ with and without the given subject.
• To illustrate the method, we consider subject #20 and investigate the effect of deleting this patient, together with what the effect would have been, had the subject not had an 'average' ADL score, but rather a very large (24) or very small (10, 5, 0) ADL score.
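The with/without comparison described above can be sketched in a few lines. The data below are hypothetical (five points exactly on a line plus one outlying subject whose x-value is varied) and the function names `ols` and `fit_without` are illustrative; this is not the Statistica procedure used in the course.

```python
def ols(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def fit_without(xs, ys, k):
    """Refit the line after deleting the subject at (0-based) position k."""
    return ols(xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:])

# Hypothetical data: five subjects exactly on y = 1 + 2x ...
base_x = [2.0, 4.0, 6.0, 8.0, 10.0]
base_y = [1.0 + 2.0 * x for x in base_x]
# ... plus one subject with an exceptionally large y-value.
outlier_y = 40.0

# Place the outlying subject in the middle (6), above (12), or below (0) the x-range.
for outlier_x in (6.0, 12.0, 0.0):
    xs = base_x + [outlier_x]
    ys = base_y + [outlier_y]
    b0_all, b1_all = ols(xs, ys)
    b0_del, b1_del = fit_without(xs, ys, len(xs) - 1)
    print(f"x = {outlier_x:4.1f}: with    b0 = {b0_all:7.3f}, b1 = {b1_all:6.3f}")
    print(f"           without b0 = {b0_del:7.3f}, b1 = {b1_del:6.3f}")
```

In this toy setting, an outlier located at the mean of the x-values shifts the intercept but leaves the slope unchanged, whereas the same outlier placed at the edge of the x-range changes the slope considerably.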
• Results for ADL = 17:
• Results for ADL = 24:
• Results for ADL = 10:
• Results for ADL = 5:
• Results for ADL = 0:
• Summary of the regression results:

  ADL of #20   Parameter         With subject #20      Without subject #20
                                 Estimate (p-value)    Estimate (p-value)
  17           Intercept (β0)    2.233 (< 0.001)       2.191 (< 0.001)
               Slope (β1)        0.022 (0.0497)        0.024 (0.0219)
  24           Intercept (β0)    2.088 (< 0.001)       2.191 (< 0.001)
               Slope (β1)        0.030 (0.0056)        0.024 (0.0219)
  10           Intercept (β0)    2.420 (< 0.001)       2.191 (< 0.001)
               Slope (β1)        0.012 (0.2801)        0.024 (0.0219)
  5            Intercept (β0)    2.541 (< 0.001)       2.191 (< 0.001)
               Slope (β1)        0.005 (0.6246)        0.024 (0.0219)
  0            Intercept (β0)    2.636 (< 0.001)       2.191 (< 0.001)
               Slope (β1)        -0.0003 (0.9764)      0.024 (0.0219)

• In general, a subject is influential if the following two conditions are satisfied:
   . The subject is an outlier, i.e., the value $y_i$ is exceptionally large or small, given its $x_i$ value.
   . The subject is located at the edge of the X-space; in our example this means that a large or small ADL score (day 1) is observed.

13.2 Cook's Distance

• The detection of influential subjects requires the following steps:
   . Carry out the regression on all subjects
   . Step 1: leave out the first subject and compare the new results with those based on all data
   . Step 2: leave out the second subject and compare the new results with those based on all data
   . Step 3: leave out the third subject and compare the new results with those based on all data
   . ...
   . Step n: leave out the last subject and compare the new results with those based on all data
• In each step, we have to compare the results obtained in the absence of a certain subject with those obtained based on all data.
• This can be done with Cook's distance, which measures the 'distance' between the results with and without such an observation.
• Cook's distance for the ith observation is denoted by $D_i$.
• Influential subjects correspond to large $D_i$.
• Non-influential subjects correspond to small $D_i$.

13.3 Application

• We apply this to the regression of ln(Length of stay) on the ADL score, 1 day post operation.
• In Statistica, this is done via the 'Extended' list of residuals and predicted values.
• Statistica output:
• Note that $D_{20}$ is relatively large.
• In particular for large data sets, an index plot of Cook's distances can be very handy, possibly after explicitly constructing a variable with observation numbers.
• Apart from subject #20, we also find that subject #45 has a relatively large $D_i$.
• It is therefore of interest to carry out the analysis with each of these observations removed in turn.
• In particular, the analysis should also be repeated without subject #45.
• The results with all observations, without observation #20, and without observation #45, respectively, are:

13.4 What to Do With Influential Subjects?

• Does removing influential subjects lead to qualitatively different results?
• Are the data for influential subjects correct?
   . Data-entry errors
   . Mixing-up of patients' case forms
   . ...
• Do influential subjects satisfy the inclusion/exclusion criteria of the study?
   . Are these genuine hip fracture patients?
   . Could there be an additional complication/co-morbidity that could explain their influence?
   . ...
• When there are no objective criteria for omission, influential subjects ought to be kept in the study.
• Possibly, the least squares criterion can be replaced by a different criterion that is less sensitive to individual observations.
      =⇒ Robust regression techniques

Part V Analysis of Variance

Chapter 14 1-way ANOVA
   . Example
   . Pairwise t-tests
   . 1-way ANOVA
   . Illustration
   . Model diagnostics
   . Influential observations

14.1 Example

• Because we suspect that the ADL score post operation is influenced not only by operation-specific factors, but also by, for example, how dependent the patient was prior to the operation, we study the relationship between the ADL score and the patient's living condition prior to operation.
• We distinguish between the following classes:
   . Single
   . With partner / family / religious community
   . RH (Retirement Home) / RCH (Retirement and Care Home)
   . Other
• Descriptive statistics and graphical exploration:
• The fourth group contains only 1 subject, and will not be included in the analysis.
• From the graph, it appears that the average ADL score in RH/RCH patients is higher than in the other two groups. Is this difference significant?
• Even if the three groups were the same in the population, it would still be possible to observe differences in the sample, purely by chance.
• How large is the probability that we observe this type of difference?
• Illustration: Vestac Java Applet → Anova → Anova plot

14.2 Pairwise t-tests

• In analogy with the unpaired t-test, we assume that we now have r different sets of measurements (in the example, r = 3):
   . $y_{11}, y_{12}, y_{13}, \ldots, y_{1n_1}$: the measurements in the first group
   . $y_{21}, y_{22}, y_{23}, \ldots, y_{2n_2}$: the measurements in the second group
   . ...
   . $y_{r1}, y_{r2}, y_{r3}, \ldots, y_{rn_r}$: the measurements in the rth group
• Further, we assume that the measurements are sampled from the following distributions:
      $$Y_{1j} \sim N(\mu_1, \sigma^2), \quad Y_{2j} \sim N(\mu_2, \sigma^2), \quad \ldots, \quad Y_{rj} \sim N(\mu_r, \sigma^2)$$
• The null hypothesis that we want to test is
      $H_0: \mu_1 = \mu_2 = \ldots = \mu_r$
  versus the alternative hypothesis
      $H_A$: not all $\mu_i$ equal
• When the above null hypothesis is not satisfied, then at least two of the means $\mu_i$ must be different. Therefore, we can, in principle, use unpaired t-tests. For r = 3, this would mean that we test the following hypotheses:
      $H_0: \mu_1 = \mu_2$      $H_0: \mu_1 = \mu_3$      $H_0: \mu_2 = \mu_3$
• For our example, we obtain the following p-values:

                            Single      Partner/family/relig.   RH/RCH
  Single                    -           0.8763                  0.0013
  Partner/family/relig.     0.8763      -                       < 0.0001
  RH/RCH                    0.0013      < 0.0001                -

• Hence, we only find significant differences between the RH/RCH patients on the one hand and the other two groups on the other hand.
• Note that, for each test conducted, there is a chance of 5% of a type-I error (incorrect rejection of $H_0$).
• It can be shown that, for our example, the total probability of a type-I error satisfies:
      $$P(H_0 \text{ rejected} \mid H_0) = P(\text{at least 1 significance} \mid \mu_1 = \mu_2 = \mu_3) \le 3 \times 5\% = 15\%$$
  so that the chance of a type-I error can be larger than the 5% requested.
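The bound on the total type-I error can be checked numerically. The sketch below computes the number of pairwise comparisons for r groups, the upper bound k × α used above, and, purely for comparison, the value 1 − (1 − α)^k that would hold if the k tests were independent. That independence is an assumption made here for illustration only; pairwise t-tests on the same groups are not independent. All function names are hypothetical.

```python
def n_pairwise(r):
    """Number of pairwise comparisons among r groups."""
    return r * (r - 1) // 2

def upper_bound(k, alpha=0.05):
    """Upper bound k * alpha on the probability of at least one type-I error."""
    return min(1.0, k * alpha)

def fwer_if_independent(k, alpha=0.05):
    """Probability of at least one type-I error if the k tests were independent."""
    return 1.0 - (1.0 - alpha) ** k

for r in (3, 4, 6, 10):
    k = n_pairwise(r)
    print(r, k, round(upper_bound(k), 3), round(fwer_if_independent(k), 3))
```

For r = 3 groups there are k = 3 comparisons, with a bound of 15%, as in the text. The number of comparisons grows quickly with r, which is the practical motivation for a single overall test.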
• In general, when conducting k tests, the total probability of a type-I error can increase to k × α, and hence become large when the number of tests conducted is large.
• It is therefore necessary to have a test statistic that allows us to test the null hypothesis
      $H_0: \mu_1 = \mu_2 = \ldots = \mu_r$
  without having to conduct all pairwise t-tests.
      =⇒ ANOVA

14.3 1-way ANOVA

• ANOVA (Analysis of variance) is an extension of the unpaired t-test to the comparison of more than 2 groups.
• As with the t-test, the test procedure compares the variability between groups with the variability within groups.
• The following decomposition plays a central role:
      $$\underbrace{\sum_{i=1}^{r}\sum_{j=1}^{n_i} [y_{ij} - \bar{y}_{\cdot\cdot}]^2}_{SSTO} = \underbrace{\sum_{i=1}^{r}\sum_{j=1}^{n_i} [y_{ij} - \bar{y}_{i\cdot}]^2}_{SS_{\text{within}}} + \underbrace{\sum_{i=1}^{r} n_i [\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}]^2}_{SS_{\text{between}}}$$

[Figure: distributions of Group 1, Group i, and Group r, marking $y_{1j}$, the group means $\bar{y}_{1\cdot}$, $\bar{y}_{i\cdot}$, $\bar{y}_{r\cdot}$, the global mean $\bar{y}_{\cdot\cdot}$, and the deviations $y_{1j} - \bar{y}_{1\cdot}$, $\bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot}$, and $y_{1j} - \bar{y}_{\cdot\cdot}$]

   . $\bar{y}_{\cdot\cdot}$: global mean (all groups together)
   . $\bar{y}_{i\cdot}$: mean in the ith group
   . $y_{ij}$: jth measurement in the ith group
• SSTO: Total sum of squares. This term expresses the total variability in the data.
• $SS_{\text{within}}$: Within-group sum of squares. This term expresses the variability within the groups.
• $SS_{\text{between}}$: Between-group sum of squares. This term expresses the variability between the groups.
• In ANOVA, the null hypothesis is rejected if
      $$F = \frac{SS_{\text{between}}/(r-1)}{SS_{\text{within}}/(N-r)}$$
  is large. N is the total sample size, $N = \sum_i n_i$.
• Note that F is the ratio of the variability between groups over the variability within groups, which is entirely analogous to the unpaired t-test. This motivates the terminology 'ANOVA.'
• In our example, F = 8.59.
• Under the null hypothesis, F is expected to be small.
• We wish to know to what extent F = 8.59 can be obtained purely by chance.
• We calculate the probability of observing F ≥ 8.59 in case all populations truly are equal, i.e., when $\mu_1 = \mu_2 = \mu_3$.
• Illustration: Vestac Java Applet → Anova → Histograms of MSR, MSE, F
• Clearly, when there is no difference between the three populations, it is very improbable to observe F = 8.59.
• The chance to observe F ≥ 8.59 purely by chance is p = 0.0006.
• Given that this chance is so small, more specifically p < α = 0.05 = 5%, we conclude that the observed value (F = 8.59) is sufficient indication that $\mu_1$, $\mu_2$, and $\mu_3$ are not all equal.
• We reject the null hypothesis and conclude that the three groups are significantly different at the 5% significance level.
• Note that the calculation of the p-values makes use of the assumptions made:
   . Normality within all groups
   . Equal variance for all groups
• Exactly as with linear regression, these assumptions need to be checked (see further).

14.4 Illustration

• Output table containing the global F-test:
• The 'SS MODEL' is the $SS_{\text{between}}$. In the F statistic, $SS_{\text{between}}$ needs to be divided by r − 1 = 3 − 1. This quantity is called the number of degrees of freedom for $SS_{\text{between}}$ (df = degrees of freedom).
• The 'SS Residual' is the $SS_{\text{within}}$. In the F statistic, $SS_{\text{within}}$ needs to be divided by N − r = 54 − 3. This quantity is called the number of degrees of freedom for $SS_{\text{within}}$.
• The F statistic is
      $$F = \frac{SS_{\text{between}}/(r-1)}{SS_{\text{within}}/(N-r)} = \frac{168.60/2}{500.23/51} = 8.59$$
• The corresponding p-value is p = 0.0006, which points to significant differences between the three groups, as far as the average ADL on day 1 is concerned.
• Exactly as with regression, one can compute a statistic indicating which portion of the variability in the ADL scores can be explained by the differences in living conditions (= variability between groups):
      $$R^2 = \frac{SS_{\text{between}}}{SSTO} = \frac{168.60}{168.60 + 500.23} = 0.252$$

14.5 Model Diagnostics

• With ANOVA, one implicitly assumes that the data are sampled from the following populations:
      $$Y_{1j} \sim N(\mu_1, \sigma^2), \quad Y_{2j} \sim N(\mu_2, \sigma^2), \quad \ldots, \quad Y_{rj} \sim N(\mu_r, \sigma^2)$$
• Hence, we assume
   . constant variance: within every group the spread is equally large
   . normality: within each group the data are normally distributed
• When the assumptions are not satisfied, just as with linear regression, erroneous statistical results can follow (p-values, confidence intervals).
• How can the above assumptions be verified?

14.5.1 Assumption of constant variance

• Descriptive statistics and graphical exploration:
• Is there too much difference in the variances, so that the assumption of equal variance must be doubted?
• In other words, to what extent can the observed differences in variance be ascribed to chance?
• We ought to conduct a formal equal-variance test. The null hypothesis then is
      $H_0: \sigma_1^2 = \sigma_2^2 = \ldots = \sigma_r^2$
  versus the alternative hypothesis
      $H_A$: not all $\sigma_i^2$ equal
• This can be done, for example, using Levene's test.
• Statistica output:
• Hence, we observe that the variances among the three groups are not significantly different (p = 0.0808).
• When there are many groups, or when some groups contain (very) many observations, then small differences can be found to be significant by the formal testing procedure.
• At the same time, it is known that variances that are not too different pose little or no problem.
• Therefore, one employs, next to a formal test for equal variances, also a rule of thumb, stating that variances should not differ by more than a factor 5, to avoid adversely affecting the results.
• In our example, this is:
      $$\frac{3.77^2}{1.82^2} = 4.29$$
• In practice, one uses the formal test, combined with the rule of thumb, so as to assess whether the assumption of equal variance is satisfied.
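The rule of thumb above amounts to a single ratio of the largest to the smallest group variance. A minimal sketch follows, using only the two standard deviations quoted in the text (3.77 and 1.82); the function names and the default factor argument are illustrative assumptions.

```python
def variance_ratio(standard_deviations):
    """Ratio of the largest to the smallest group variance."""
    variances = [s ** 2 for s in standard_deviations]
    return max(variances) / min(variances)

def within_rule_of_thumb(standard_deviations, factor=5.0):
    """True if the group variances differ by no more than the given factor."""
    return variance_ratio(standard_deviations) <= factor

# Largest and smallest group standard deviations quoted in the text.
sds = [3.77, 1.82]
print(round(variance_ratio(sds), 2))
print(within_rule_of_thumb(sds))
```

The ratio stays below 5, in line with the non-significant result of Levene's test reported above.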
14.5.2 Assumption of Normality

• ANOVA assumes that the data in every group are normally distributed, with common variance. Above, we already discussed how the equality of variances can be assessed. We now assume that the variances are indeed equal. How can normality then be tested?
• We rewrite the ANOVA model as
      $$Y_{1j} = \mu_1 + \varepsilon_{1j}, \quad Y_{2j} = \mu_2 + \varepsilon_{2j}, \quad \ldots, \quad Y_{rj} = \mu_r + \varepsilon_{rj}$$
  where the 'error terms' $\varepsilon_{ij}$ all come from the same normal distribution with mean zero and variance $\sigma^2$.
• Exactly as with regression, we will check the assumption of normality for the $\varepsilon_{ij}$ via their estimators
      $$e_{ij} = y_{ij} - \hat{\mu}_i = y_{ij} - \bar{y}_{i\cdot}$$
• As with regression, the $e_{ij}$ are termed residuals: they represent the error made when the observed value $y_{ij}$ for an individual in group i is predicted by the group average $\bar{y}_{i\cdot}$.
• Once the residuals $e_{ij}$ have been computed, we are again in a position to assess normality using their histogram, or using formal normality tests.
• This is done in full analogy with linear regression.
• Statistica output:
• Hence, we can conclude that the assumption of normality is acceptable.
• Exactly as with simple regression, we have that:
   . Departures from normality still lead to correct results, as long as the distribution of the errors is symmetric.
   . In case of asymmetry, the response can sometimes be transformed, so as to render the residuals in the new model normally distributed.
   . However, some transformations can disrupt the constant variance, implying that this needs to be assessed again after transformation.
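The residuals $e_{ij} = y_{ij} - \bar{y}_{i\cdot}$ and the sums of squares used in the F statistic can be computed directly. The sketch below uses small hypothetical ADL-like scores for three groups (not the course data) and hypothetical function names. It only illustrates the definitions and does not reproduce the Statistica output.

```python
def mean(values):
    return sum(values) / len(values)

def anova_residuals(groups):
    """Residuals e_ij = y_ij - (mean of group i), per group."""
    return {name: [y - mean(ys) for y in ys] for name, ys in groups.items()}

def sums_of_squares(groups):
    """Return (SSTO, SS_within, SS_between) for a one-way layout."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    return ss_total, ss_within, ss_between

def f_statistic(groups):
    """F = [SS_between / (r - 1)] / [SS_within / (N - r)]."""
    r = len(groups)
    n_total = sum(len(ys) for ys in groups.values())
    _, ss_within, ss_between = sums_of_squares(groups)
    return (ss_between / (r - 1)) / (ss_within / (n_total - r))

# Hypothetical scores, for illustration only.
scores = {
    "Single": [14.0, 16.0, 15.0, 17.0],
    "Partner/family": [15.0, 13.0, 16.0, 16.0, 15.0],
    "RH/RCH": [19.0, 21.0, 18.0, 20.0],
}

residuals = anova_residuals(scores)
ss_total, ss_within, ss_between = sums_of_squares(scores)
print({name: [round(e, 2) for e in es] for name, es in residuals.items()})
print(round(ss_total, 3), round(ss_within, 3), round(ss_between, 3))
print(round(f_statistic(scores), 2))
```

The residuals sum to zero within each group, and the total sum of squares equals the within-group plus the between-group sum of squares. The pooled residuals are what one would inspect with a histogram or a normality test.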
14.6 Influential Observations

• Even though, strictly speaking, ANOVA does not involve regression parameters, individual observations can still have a large influence on the estimation of the group averages $\hat{\mu}_i$, and hence ultimately on the ANOVA results.
• Statistica allows us, exactly as with regression, to measure the influence of each observation by comparing the estimators $\hat{\mu}_i = \bar{y}_{i\cdot}$ with those that would be obtained upon deletion of such an observation.
• This results, again, in the so-called 'Cook's distance,' a distance between the estimators with and without a given observation.
• Exactly as with regression, we consider a scatter plot of Cook's distances versus the subject number.
• The computations are done in analogy with simple linear regression.
• Statistica output:
• Hence, there are no observations with an unduly large influence.

Part VI Logistic Regression

Chapter 15 Logistic Regression
   . Simple case of a proportion
   . Two groups
   . General definition of logistic regression

15.1 A Proportion

  Successes      : $p$
  Failures       : $n - p$
  Total          : $p + (n - p) = n$
  Proportion     : $\hat{\pi} = \dfrac{p}{n}$
  Transformation : $\pi = \dfrac{e^{\alpha}}{1 + e^{\alpha}}$

      $0 \le \pi \le 1 \quad \longleftrightarrow \quad -\infty \le \alpha \le +\infty$

15.2 Formulation of Logistic Regression

Two Groups

               Untreated                                           Treated
  Successes    $p_1$                                               $p_2$
  Failures     $n_1 - p_1$                                         $n_2 - p_2$
  Total        $n_1$                                               $n_2$
  Prob         $\pi_1 = \dfrac{e^{\alpha_1}}{1 + e^{\alpha_1}}$    $\pi_2 = \dfrac{e^{\alpha_2}}{1 + e^{\alpha_2}}$

      $\alpha_1 = \alpha$
      $\alpha_2 = \alpha_1 + \beta = \alpha + \beta$

15.2.1 Effect of a Covariate x

      $\alpha_1 = \alpha$
      $\alpha_2 = \alpha_1 + \beta = \alpha + \beta$

      $$\pi_1 = \frac{e^{\alpha}}{1 + e^{\alpha}} \quad (x = 0)$$
      $$\pi_2 = \frac{e^{\alpha+\beta}}{1 + e^{\alpha+\beta}} \quad (x = 1)$$
      $$\pi(x) = \frac{e^{\alpha+\beta x}}{1 + e^{\alpha+\beta x}} \quad (x = 0, 1)$$

15.2.2 Odds Ratio

      $$OR = \frac{P(x=1)}{1 - P(x=1)} \cdot \frac{1 - P(x=0)}{P(x=0)} = \frac{e^{\alpha+\beta}/(1 + e^{\alpha+\beta})}{1/(1 + e^{\alpha+\beta})} \cdot \frac{1/(1 + e^{\alpha})}{e^{\alpha}/(1 + e^{\alpha})} = e^{\alpha+\beta} \cdot \frac{1}{e^{\alpha}} = e^{\beta} = \psi$$

  β: log odds ratio

15.2.3 General Form Of Logistic Model

• Dichotomous outcome $Y_i$:
      $$Y_i = \begin{cases} 1 & \text{event occurs} \\ 0 & \text{event does not occur} \end{cases}$$
• p regression variables $x_i = (x_{1i}, \ldots, x_{pi})'$

      $$P(Y_i = 1 \mid x_i) = \frac{\exp(\beta_0 + \beta_1 x_{1i} + \ldots + \beta_p x_{pi})}{1 + \exp(\beta_0 + \beta_1 x_{1i} + \ldots + \beta_p x_{pi})}$$
      $$\pi(x_i) = \frac{\exp(\beta_0 + \beta_1 x_{1i} + \ldots + \beta_p x_{pi})}{1 + \exp(\beta_0 + \beta_1 x_{1i} + \ldots + \beta_p x_{pi})}$$
      $$\text{logit}\, P(Y_i = 1 \mid x_i) = \beta_0 + \beta_1 x_{1i} + \ldots + \beta_p x_{pi}$$
      $$\text{logit}[\pi(x_i)] = \beta_0 + \beta_1 x_{1i} + \ldots + \beta_p x_{pi}$$

15.2.4 Odds Ratio

⇒ odds ratio for two individuals with covariate series $x^*$ and $x$:
      $$OR = \frac{\pi(x^*)}{1 - \pi(x^*)} \cdot \frac{1 - \pi(x)}{\pi(x)} = \exp\left\{ \sum_{j=1}^{p} \beta_j (x^*_j - x_j) \right\}$$

  $\exp(\beta_j)$: factor by which the odds increase (decrease) for each unit increase in $x_j$, keeping all other covariate values constant

15.2.5 The Covariates x ?
• indicator variables (exposures: 0/1)
• continuous measures (age)
• transformations of measures ($\sqrt{\text{age}}$)
• cross terms = interactions, made up from cross products of other covariates

Chapter 16 Use of Logistic Regression
   . Ordinary logistic regression
   . Stratified analysis
   . Prospective versus retrospective studies

16.1 Possible Settings

• Logistic regression can be used:
   . Without covariates: then we are back to the case of a simple proportion
   . With a single binary covariate: the two-group case
   . With a general set of covariates: the general definition seen above
• All of this starts from the setting of prospective studies. A few questions remain:
   . What about retrospective studies?
   . What if the data are stratified, e.g., by gender or age group?

16.2 Effect of Stratum

• Assume stratification for sex (M vs. F)
• Various situations can be considered
• Situation 1:
      $$P_M(x_i) = \frac{e^{\alpha_M + \beta_M x_i}}{1 + e^{\alpha_M + \beta_M x_i}} \qquad P_F(x_i) = \frac{e^{\alpha_F + \beta_F x_i}}{1 + e^{\alpha_F + \beta_F x_i}}$$
• Two completely different models are considered, one for females, one for males:
   . The baseline risks $e^{\alpha_M}$ and $e^{\alpha_F}$ are different
   . The relative risks $e^{\beta_M}$ and $e^{\beta_F}$ are different
• Situation 2:
      $$P_M(x_i) = \frac{e^{\alpha_M + \beta x_i}}{1 + e^{\alpha_M + \beta x_i}} \qquad P_F(x_i) = \frac{e^{\alpha_F + \beta x_i}}{1 + e^{\alpha_F + \beta x_i}}$$
   . The baseline risks $e^{\alpha_M}$ and $e^{\alpha_F}$ are different
   . The relative risks $e^{\beta_M} = e^{\beta_F} = e^{\beta}$ are common
• The latter model is often considered. It needs to be understood that there is an assumption behind it: it is not always true that there is only one common β parameter. The assumption is equivalent to saying that the effect of the covariate $x_i$ is the same for males and females.
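Situation 2 can be illustrated numerically: with stratum-specific intercepts but a common β, the probabilities differ between the strata while the odds ratio for exposure is the same in both. The parameter values below are hypothetical and chosen only for illustration, as are the function names.

```python
import math

def probability(alpha, beta, x):
    """Logistic model: P(x) = exp(alpha + beta * x) / (1 + exp(alpha + beta * x))."""
    eta = alpha + beta * x
    return math.exp(eta) / (1.0 + math.exp(eta))

def odds(p):
    return p / (1.0 - p)

# Hypothetical parameters: different intercepts per stratum, common beta.
ALPHA_M, ALPHA_F, BETA = -2.0, -1.0, 0.7

for label, alpha in (("M", ALPHA_M), ("F", ALPHA_F)):
    p_unexposed = probability(alpha, BETA, 0)
    p_exposed = probability(alpha, BETA, 1)
    odds_ratio = odds(p_exposed) / odds(p_unexposed)
    print(label, round(p_unexposed, 3), round(p_exposed, 3), round(odds_ratio, 3))

print(round(math.exp(BETA), 3))  # the common value e^beta
```

Both strata return the same ratio $e^{\beta}$, even though the probabilities themselves differ. Allowing a separate β per stratum (Situation 1) would remove this equality.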
Initiatie Wetenschappelijk Onderzoek: Biostatistiek 335 16.2.1 Stratum as Covariate • Stratification can be seen as the inclusion of an extra covariate: eβ0+β1gi+β2xi+β3gixi P (xi, gi ) = 1 + eβ0+β1gi+β2xi+β3gixi where gi = 1 for females and gi = 0 for males. • We obtain a correspondence with Situation 1: α M + β M xi = β 0 + β 2 xi αF + βF xi = (β0 + β1) + (β2 + β3)xi Initiatie Wetenschappelijk Onderzoek: Biostatistiek 336 • When the relative risks have to be same, as in Situation 2, we have the requirement: β2 = β2 + β3 and hence β3 = 0 or, in other words: eβ0+β1gi+β2xi P (xi, gi) = 1 + eβ0 +β1gi+β2xi whence the interaction between gender and the covariate xi is absent. • In conclusion, a stratified analysis is conducted by including the stratifying variable as an ordinary covariate. Initiatie Wetenschappelijk Onderzoek: Biostatistiek 337 • This is the strength of logistic regression. • There is an important difference of interpretation: . A “stratifying covariate” such as gi is included in the model without devoting further study to it. . An “ordinary covariate” such as xi is the subject of scientific study: we will be interested in the strength of the effect on the outcome (through the coefficient β2 or the associated relative risk; or through hypothesis testing). Initiatie Wetenschappelijk Onderzoek: Biostatistiek 338 16.3 Several Strata • We do not have to restrict attention to stratification into two groups. • In general, we may write: eαm+βxi Pm (xi) = 1 + eαm+β where . m = 1, . . . , M indexes M stratification groups (e.g., age class) . xi is the level of the exposure variable for subject i (e.g., tobacco use) • the baseline risk (eαm ) is different • the relative risk (eβ ) is common • A model with different relative risks could be constructed as well Initiatie Wetenschappelijk Onderzoek: Biostatistiek 339 16.4 Stratum Effect: General Situation • A fully general model: Pm (xi) = Pp eαm+ j=1 βpxpi 1+ Pp eαm+ j=1 βpxpi • Several exposure variables x1i, . . 
. , xpi Example: tobacco and alcohol • m = 1, . . . M : stratum indicator Example: age class • the baseline risk (eαm ) is different • the relative risks (eβp ) are common Initiatie Wetenschappelijk Onderzoek: Biostatistiek 340 16.5 Prospective ←→ Retrospective • Prospective: . the exposures xi or E are fixed . the outcomes yi or D are stochastic • Retrospective (Case-Control): . the exposures xi or E are stochastic . the outcomes yi or D are fixed • Can one adapt logistic regression ? Initiatie Wetenschappelijk Onderzoek: Biostatistiek 341 16.6 Retrospective Logistic Regression • Prospective: exp(α + βxi) P (yi = 1|xi) = 1 + exp(α + βxi) . eα : the baseline risk . eβ : the relative risk • Retrospective (Case-Control): P (yi = 1|xi selected into the study) = exp(α∗ + βxi) 1 + exp(α∗ + βxi) α∗ . e : no interpretation . eβ : the relative risk Initiatie Wetenschappelijk Onderzoek: Biostatistiek 342 16.7 Analogy: 2 × 2 tables • Contingency tables: odds ratio ψ “exposed—unexposed” = odds ratio ψ “diseased—disease free”. • Thus, for 2 × 2 tables: inference for cohort studies ≡ inference for case-control studies • This identity is transmitted to the general logistic model. Initiatie Wetenschappelijk Onderzoek: Biostatistiek 343 Chapter 17 Case Study: Ille-et-Villaine . A single exposure . Two exposures, qualitative analysis . Two exposures, quantitative analysis Initiatie Wetenschappelijk Onderzoek: Biostatistiek 344 17.1 Ille-et-Villaine Study Initiatie Wetenschappelijk Onderzoek: Biostatistiek 345 17.2 The Data for a Single Binary Exposure AGE=1 CASES CONTROLS TOBACCO+ 1 9 10 TOBACCO0 106 106 CASES CONTROLS TOBACCO+ 4 26 30 TOBACCO5 164 169 AGE=2 6 strata × 2 exposure groups Initiatie Wetenschappelijk Onderzoek: Biostatistiek 346 17.3 The Data Initiatie Wetenschappelijk Onderzoek: Biostatistiek 347 17.4 Modeling a Single Binary Exposure model # par deviance = −2 logl 0 .. 1 .. ∞ 1 .. k .. MAX G0 .. G1 .. 
G∞ Considerations: • parsimonious model: no non-significant effects • fit is good: G1 − G∞ ∼ χ2MAX−k is small Initiatie Wetenschappelijk Onderzoek: Biostatistiek 348 17.4.1 Degrees of Freedom • Decomposition: df(model): 1 per parameter df(fit): to test that the model is ‘good’ • In our case: . 6 age classes × 2 classes of exposure = 12 degrees of freedom (12 df) Initiatie Wetenschappelijk Onderzoek: Biostatistiek 349 17.4.2 Model 0 • intercept α: common age effect • Model eα Pm (xi ) = 1 + eα • df(mod)= 1 • df(fit)= 12 − 1 = 11 Initiatie Wetenschappelijk Onderzoek: Biostatistiek 350 17.4.3 Model 1 • age stratum: intercept per age category: α1 , α2 , α3 , α4 , α5 , α6 • Model eαm Pm (xi) = 1 + eαm • effects in the model: . df(mod)= 6 . no test necessary ! • fit: . df(fit)= 12 − 6 = 6 . G = G1 − G∞ = 90.56 ∼ χ26 . no good fit ⇒ untenable Initiatie Wetenschappelijk Onderzoek: Biostatistiek 351 17.4.4 Model 2 • age stratum: intercept per age category: α1 , α2 , α3 , α4 , α5 , α6 • effect of exposure to tobacco: • Model Initiatie Wetenschappelijk Onderzoek: Biostatistiek β eαm+βxi Pm (xi) = 1 + eαm+βxi 352 • effects in the model: . df(mod)= 7 . Estimation of the effect: βˆ = 1.670(standard error = 0.190) ψˆ = exp(1.670) = 5.31 . 95 % confidence limits: βL = 1.670 − 1.96 × 0.190 = 1.30 βU = 1.670 + 1.96 × 0.190 = 2.04 Initiatie Wetenschappelijk Onderzoek: Biostatistiek 353 . The αs: No interpretation ! c α 1 c α 2 c α 3 c α 4 c α 5 c α 6 = −5.054 = −3.512 = −1.855 = −1.341 = −1.087 = −1.092 • fit: . df(fit)= 12 − 7 = 5 . G = G2 − G∞ = 11.04 ∼ χ25 g Pearson statistic: wideG = 9.32 (p = 0.05) (p = 0.15) . 
  . fit is OK

17.4.5 Model 3

• age stratum: intercept per age: $\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6$
• effect of exposure to tobacco: $\beta$
• linear interaction between age and exposure: $\gamma$

m          1      2      3      4     5     6
(m − 3.5)  −2.5   −1.5   −0.5   0.5   1.5   2.5

  systematic trend in the relative risk with age
• Model
  \[ P_m(x_i) = \frac{e^{\alpha_m + \beta x_i + \gamma x_i (m - 3.5)}}{1 + e^{\alpha_m + \beta x_i + \gamma x_i (m - 3.5)}} \]
• effects in the model:
  . df(mod) = 8
  . Estimation of the effect:
    $\hat\gamma = 0.125$ (standard error = 0.189)
    not significant
• fit:
  . df(fit) = 12 − 8 = 4
  . not needed since the effect is not significant

17.4.6 Model ∞

• age stratum: intercept per age: $\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6$
• effect of exposure, per age stratum: $\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6$
• Model
  \[ P_m(x_i) = \frac{e^{\alpha_m + \beta_m x_i}}{1 + e^{\alpha_m + \beta_m x_i}} \]
• effects in the model:
  . df(mod) = 12
• fit:
  . df(fit) = 12 − 12 = 0
  . fit is perfect

17.4.7 Summary

Mod.   df(mod)   df(fit)   dev
0      1         11        $G_0$
1      6         6         $G_1$
2      7         5         $G_2$
3      8         4         $G_3$
∞      12        0         $G_\infty$

17.5 Residuals

• Principle: Expected number of cases = observed number of cases
• For the example: 0.33 + 4.10 + 24.50 + 40.13 + 23.74 + 3.20 = 96
• The residuals $(O - E)$ have variance $N \hat{P} \hat{Q}$, where
  . $N$: number of cases + number of controls
  . $\hat{P}$: estimated disease probability
  . $\hat{Q} = 1 - \hat{P}$
• Standardized residuals:
  \[ \frac{O - E}{\sqrt{N \hat{P} \hat{Q}}} \]
• Squaring and adding ⇒ the Pearson statistic $\widetilde{G}$.
• Make a plot and
  . verify possible patterns
  . verify the values: an absolute value larger than 2 is a problem

17.5.1 Values for the Example

17.6 Qualitative Analysis

• Use so-called "grouped data" for the Ille-et-Villaine set
• This means that tobacco and alcohol are each classified into four categories, rather than having continuous (quantitative) values
• At the same time, we will study the effects of tobacco and alcohol jointly on the relative risk of esophageal cancer in Ille-et-Villaine.
  . cases: 200 males with diagnosed esophageal cancer (1972–1974)
  . controls: 778 male adults (775 satisfy the criteria)

2 factors with 4 levels ⇒ 16 risk categories

17.6.1 Degrees of Freedom

• Levels:
  Levels of alcohol   4
  Levels of tobacco   4
  Levels of age       6
• Cells: 4 × 4 × 6 = 96
• 8 empty cells
• 96 − 8 = 88 degrees of freedom

17.6.2 Alcohol

• Dichotomous variables:

Level    ALC2   ALC3   ALC4
0 (1)    0      0      0
1 (2)    1      0      0
2 (3)    0      1      0
3 (4)    0      0      1

• ALC1 is baseline category (why?)
• Coefficients $\beta_2, \beta_3, \beta_4$
• Relative risks:
  . 1: RR of level 1 versus level 1
  . $e^{\beta_2}$: RR of level 2 versus level 1
  . $e^{\beta_3}$: RR of level 3 versus level 1
  . $e^{\beta_4}$: RR of level 4 versus level 1

17.6.3 Steps

• Define modeling strategy
• Find acceptable model
• Study the model:
  . Interpretation
  . Are the significant effects also (medically) relevant?
  . ...

17.6.4 Modeling

                      AGE (df=6)
                      fit (df=82)
                 ↙                  ↘
AGE (df=6)                            AGE (df=6)
ALC (df=3)                            TOB (df=3)
fit (df=79)                           fit (df=79)
                 ↘                  ↙
                      AGE (df=6)
                      ALC (df=3)
                      TOB (df=3)
                      fit (df=76)
                          ↓
                      AGE (df=6)
                      ALC (df=3)
                      TOB (df=3)
                      ALC*TOB (df=9)
                      fit (df=67)

17.6.5 Model Fit

17.6.6 Comparison

• With or without interaction?
• Compare:
  . Alcohol:
    ∗ Model 2 versus Model 1: 246.9 − 105.9 = 141.0
    ∗ Model 4 versus Model 3: 210.3 − 82.3 = 128.0
  . Tobacco:
    ∗ Model 3 versus Model 1: 246.9 − 210.3 = 36.6
    ∗ Model 4 versus Model 2: 105.9 − 82.3 = 23.6
  adjusted $\chi^2$ < non-adjusted $\chi^2$
  (due to the correlation between the consumption of alcohol and the consumption of tobacco)
• Both covariates do have strong independent effects

17.6.7 Fitting

Model   Terms                                  fit (df)   $G^2$    p          Conclusion
1       AGE (df=6)                             82         246.9    < 0.0001   too simple
2       AGE (df=6), ALC (df=3)                 79         105.9    0.0234     too simple
3       AGE (df=6), TOB (df=3)                 79         210.3    < 0.0001   too simple
4       AGE (df=6), ALC (df=3), TOB (df=3)     76         82.3     0.2907     appropriate

  . Model 1 → Model 2: ALC effect
  . Model 1 → Model 3: TOB effect
  . Model 2 → Model 4: TOB|ALC effect
  . Model 3 → Model 4: ALC|TOB effect

17.6.8 Comparison

• ALC effect:
  . Alcohol effect
  . $\chi^2 = 141.0$
  . df=3
  . p < 0.0001
  . strong effect
• TOB|ALC effect:
  . The effect of tobacco, after correction for alcohol
  . $\chi^2 = 23.6$
  . df=3
  . p < 0.0001
  . strong effect
• TOB effect:
  . Tobacco effect
  . $\chi^2 = 36.6$
  . df=3
  . p < 0.0001
  . strong effect
• ALC|TOB effect:
  . The effect of alcohol, after correction for tobacco
  . $\chi^2 = 128.0$
  . df=3
  . p < 0.0001
  . strong effect

17.6.9 Model 2

• Estimated coefficients for Model 2:

k   Group           $\exp(\beta_k)$
1   0–39 g/day      1.0
2   40–79 g/day     4.2
3   80–119 g/day    7.4
4   120+ g/day      39.7

17.6.10 Model 4

• Model 4 is sufficient ($\chi^2_{76} = 82.34$, p = 0.2907)
• Coefficients:

Alcohol
k   Group           $\exp(\beta_k)$
1   0–39 g/day      exp(0)
2   40–79 g/day     exp(1.44)
3   80–119 g/day    exp(1.98)
4   120+ g/day      exp(3.60)

Tobacco
k   Group           $\exp(\beta_k)$
1   0–9 g/day       exp(0)
2   10–19 g/day     exp(0.44)
3   20–29 g/day     exp(0.51)
4   30+ g/day       exp(1.64)

17.6.11 Questions Model 4

• What is the RR of exposure to ALC (80–119 g/day)?
• What is the RR of exposure to TOB (10–19 g/day)?
• What is the RR of exposure to both
  . TOB (10–19 g/day)
  . ALC (80–119 g/day)
  What assumptions?
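
One possible way to work through the Model 4 questions is sketched below in Python. It only exponentiates the coefficients listed in 17.6.10. The names `ALC_BETA`, `TOB_BETA` and `relative_risk` are hypothetical helpers introduced for illustration, not part of the course material. The sketch assumes that the comparison is made within one age stratum, that the reference is the lowest category of both factors, and that there is no ALC*TOB interaction (as in Model 4), so the log relative risks simply add. Note also that the slides call exp(β) a relative risk; in a logistic model it is strictly an odds ratio.

```python
import math

# Model 4 coefficients (log scale) as listed in 17.6.10.
# The first category of each factor is the baseline, so its coefficient is 0.
ALC_BETA = {"0-39": 0.0, "40-79": 1.44, "80-119": 1.98, "120+": 3.60}
TOB_BETA = {"0-9": 0.0, "10-19": 0.44, "20-29": 0.51, "30+": 1.64}


def relative_risk(alc="0-39", tob="0-9"):
    """RR versus the baseline (ALC 0-39 g/day, TOB 0-9 g/day), same age stratum.

    Assumption: no ALC*TOB interaction, so the coefficients add on the log
    scale and the relative risks multiply.
    """
    return math.exp(ALC_BETA[alc] + TOB_BETA[tob])


rr_alc = relative_risk(alc="80-119")                 # alcohol only
rr_tob = relative_risk(tob="10-19")                  # tobacco only
rr_both = relative_risk(alc="80-119", tob="10-19")   # joint exposure

print(f"ALC 80-119 g/day: RR = {rr_alc:.2f}")
print(f"TOB 10-19 g/day:  RR = {rr_tob:.2f}")
print(f"both exposures:   RR = {rr_both:.2f}")
```

Under these assumptions the joint relative risk equals the product of the two single-exposure relative risks. If the ALC*TOB interaction had been retained, that product rule would no longer hold and the interaction coefficient for the cell would have to be added on the log scale.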