When Y Is a Numeric Variable: Multiple Regression
Rosner, Section 11.9

Outcome Y is a continuous variable
- Only one X, and X is continuous: simple linear regression
- Only one X, and X is categorical: ANOVA
- All X's are continuous: multiple linear regression
- All X's are categorical (use of dummy variables): two-way ANOVA, three-way ANOVA, multiway ANOVA
- Some X's are continuous and some are categorical: analysis of covariance (ANCOVA), linear models

Multiple Linear Regression
- Examines the linear relationship between one or more observed variables (design variables, or independent variables) and an outcome variable.
- Outcome → dependent variable: must be continuous.
- Design → independent variables: mostly continuous; a categorical X needs special handling (dummy variables).
- The X's must be independent of one another.
(Page 106, Pearson and Turton: Statistical Methods in Environmental Health)

Multiple Linear Regression: Y (continuous)
$y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k$
- Dependent variable: Y
- Independent variables: $X_1, X_2, \dots, X_k$
- $\beta_1, \dots, \beta_k$: regression coefficients

Multiple Linear Regression: Y (continuous)
$\hat{y}_i = \hat{\alpha} + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3 + \dots + \hat{\beta}_k X_k$
- $\hat{y}_i$: estimated value, obtained by plugging in the parameter estimates
- The parameter estimates are obtained after $X_1, X_2, \dots, X_k$ have been adjusted for one another.
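To make the "adjusted for one another" point concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the procedure used in Rosner or in any statistics package: the data are made up, the true values (2.0, 1.5, -0.5) are arbitrary, and the helper names `solve_linear_system` and `fit_ols` are hypothetical. It fits the model by least squares via the normal equations, once with both X's and once with $X_1$ alone.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares fit of y = alpha + beta_1*X_1 + ... + beta_k*X_k.

    rows is a list of [x1, ..., xk]; returns [alpha_hat, beta_1_hat, ...].
    """
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up data (an assumption for illustration): X2 is correlated with X1.
rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(200)]
x2 = [0.6 * a + rng.gauss(0, 1) for a in x1]
y = [2.0 + 1.5 * a - 0.5 * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]

adjusted = fit_ols([[a, b] for a, b in zip(x1, x2)], y)  # Y = X1 + X2
crude = fit_ols([[a] for a in x1], y)                    # Y = X1 only

print("adjusted:", [round(v, 3) for v in adjusted])
print("crude:   ", [round(v, 3) for v in crude])
```

Because X2 is correlated with X1 in this made-up data, the coefficient of X1 changes when X2 is left out of the model, which is what "mutually adjusted" estimates guard against.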
Meaning of the regression coefficient $\beta_j$
For each $\beta_j$, j = 1, 2, …, k:
- The average increase in y per unit increase in $X_j$, with all other variables held constant; equivalently, after adjusting for all other variables in the model.
- Hypothesis testing of $\beta_j$ (p-value): $H_0: \beta_j = 0$. The larger the estimate of $\beta_j$ is in absolute value relative to its standard error, the more easily it reaches significance.
- $\beta_j > 0$: positive direction; $\beta_j < 0$: negative direction.

Rosner example 11.39
- Birthweight in oz ($X_1$)
- Age in days ($X_2$)
- Systolic blood pressure in mmHg (y)
- k = 2

Response SBP

Summary of Fit
RSquare                       0.88091
RSquare Adj                   0.862589
Root Mean Square Error        2.479173
Mean of Response              88.0625
Observations (or Sum Wgts)    16

Parameter Estimates (relationship of each X with Y; significance of each β)
Term                 Estimate     Std Error   t Ratio   Prob>|t|
Intercept            53.450194    4.531889    11.79     <.0001
birthwgt(oz) (β1)    0.1255833    0.034336    3.66      0.0029
age(days) (β2)       5.8877191    0.680205    8.66      <.0001

Analysis of Variance (tests whether all X variables together are significantly related to Y)
Source     DF   Sum of Squares   Mean Square   F Ratio
Model      2    591.03564        295.518       48.0806
Error      13   79.90186         6.146         Prob>F <0.0001
C. Total   15   670.93750

$R^2$ = 88.1%: birthweight and age together explain 88.1% of the variance of Y.

- $\beta_1$: after adjusting for age, a 1 oz increase in birthweight relates to a 0.13 mmHg increase in SBP; this change differs significantly from 0 (p = 0.0029).
- $\beta_2$: after adjusting for birthweight, a 1 day increase in age relates to a 5.89 mmHg increase in SBP; this change is statistically significant (p < 0.0001).

TABLE 7.1 Sample Table for Reporting a Multiple Linear Regression Model with Three Explanatory Variables.
Sample Presentation:
We developed a model to predict a score of overall function, Y, for patients with multiple sclerosis based on disease severity, X1 (level 1 being least severe and level 15 being most severe); ambulatory ability (measured as the rate of walking in laps per minute), X2; and number of lesions, X3:

$Y = 40.8 + 3.98 X_1 + 1.22 X_2 - 2.09 X_3$

Variable    Coefficient (β)   Standard Error   95% CI            Wald χ²   P
Intercept   40.79             2.55             -                 -         -
X1          3.98              2.37             -0.67 to 5.63     1.68      0.100
X2          .123              0.29             0.66 to 1.80      4.20      <0.001
X3          -2.09             0.28             -2.64 to -1.54    -7.43     <0.001

where
- intercept = a mathematical constant; no clinical interpretation
- variable = the three explanatory variables X1 to X3
- coefficient = the mathematical weightings of the explanatory variables in the equation
- standard error = estimated precision of the coefficients
- 95% CI = 95% confidence intervals for the coefficients
- Wald χ² = the Wald test statistic calculated from the data, to be compared with the chi-square distribution with 1 degree of freedom
- P value = variables 2 and 3 are statistically significant predictors of the response variable
(From: Lang & Secic, How to Report Statistics in Medicine. 2nd ed. (2006))

When do we use regression?
(From pages 34-35, Kleinbaum, Kupper, Muller, Nizam: Applied Regression Analysis and Multivariable Methods. Duxbury, CA, USA)
- Characterize the relationship between the dependent and independent variables by determining the extent, direction, and strength of the association.
- Seek a quantitative formula or equation to describe (e.g., predict) the dependent variable Y as a function of the independent variables.
- Describe quantitatively or qualitatively the relationship between the X's and Y while controlling for the effects of still other variables.
- Determine which of several independent variables are important, and which are not, for describing or predicting a dependent variable.
- Determine the best mathematical model for describing the relationship between Y and the X's.
- Compare several derived regression relationships.
- Assess the interactive effects of 2 or more independent variables.
- Obtain a valid and precise estimate of one or more regression coefficients.

Association vs causality
- A "statistically significant" association in a particular study does not establish a causal relationship.
- To evaluate claims of causality, one must consider criteria that are external to the specific characteristics and results of that study.
- Experimental proof: a change in X always produces a change in Y.
- Combined results from several studies.

7 criteria:
- Strength of association
- Dose-response effect
- Lack of temporal ambiguity
- Consistency of the findings
- Biological and theoretical plausibility of the hypothesis
- Coherence of the evidence
- Specificity of the association
(From pages 36-37, Kleinbaum, Kupper, Muller, Nizam: Applied Regression Analysis and Multivariable Methods. Duxbury, CA, USA)

Statistical vs deterministic
- Statistical relationships always involve error.

Basic assumptions of regression (KKM 115-117)
- Existence: every combination of X values in the data has a corresponding y, and the mean and standard deviation of y exist and are finite.
- Independence: the observations are independent of one another.
- Linearity: y is linearly related to x.
- Homoscedasticity: the variance of the residuals of y, after adjusting for the x variables, is the same throughout.
- Normality: the residuals of y, after adjusting for x, are normally distributed.
From Kleinbaum, Kupper, Muller, Nizam: Applied Regression Analysis and Multivariable Methods.
Duxbury, CA, USA

$y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k$
- Dependent variable: Y
- Independent variables: $X_1, X_2, \dots, X_k$
- $\beta_1, \dots, \beta_k$: regression coefficients

Homoscedasticity
Normality
(From Kleinbaum, Kupper, Muller, Nizam: Applied Regression Analysis and Multivariable Methods. Duxbury, CA, USA)

Identifying Confounding Factors
Definition 13.9: A factor that is associated with both the Y and X variables. Such a variable must usually be controlled for before looking at the original Y-X relationship. (Rosner)

Confounding
Y = SBP, C = age. Is variable C a confounding factor for the relationship between Y and X1?
- Adjusted (multiple, multivariable, or multivariate) model 1, Y = X1 + C: $SBP = \alpha_1 + \beta_1 X_1 + \beta_3 \, AGE$
- Crude (univariate) model 2, Y = X1: $SBP = \alpha_2 + \beta_2 X_1$

Compare the coefficients of X1 in the two models:
- $\beta_1 = \beta_2$: the relationship of X1 and SBP is NOT affected by AGE.
- $\beta_1 > \beta_2$: the relationship of X1 and SBP is affected by AGE.
- $\beta_1 < \beta_2$: the relationship of X1 and SBP is affected by AGE.

How big a difference counts as "different"? Judge objectively, from a clinical point of view:
- Max/min > 2: definitely different
- Max/min < 1.5: definitely not different
- Between 1.5 and 2: up to the authors

Two numeric examples:
- $SBP = \alpha_1 + 4.1 X_1 + \beta_3 \, AGE$ versus $SBP = \alpha_2 + 4.2 X_1$: max/min = 4.2/4.1 ≈ 1.02, so the coefficient of X1 is not different.
- $SBP = \alpha_1 + 4.1 X_1 + \beta_3 \, AGE$ versus $SBP = \alpha_2 + 15.9 X_1$: max/min = 15.9/4.1 ≈ 3.88, so the coefficient of X1 is definitely different.

Published example
- Crude model: SBP = PlasmaRA
- Adjusted model: SBP = PlasmaRA + age + gender + BMI + alcohol + HR + race
Plasma renin activity was inversely related to both systolic and diastolic BP in the total sample, independent of age, gender, race, BMI, alcohol consumption, and heart rate.
(He, et al., American Journal of Hypertension 1999; 12:555-562, pp. 558-559)

Interactions (definition 12.8)
$y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \dots + \beta_k X_k$
- $\beta_1, \dots, \beta_k$: regression coefficients; $\beta_3$ is the coefficient of the interaction term $X_1 X_2$.
(From: Lang & Secic, How to Report Statistics in Medicine. 2nd ed. (2006))

Analysis with multiple regression indicated a statistically significant difference among the age levels (p=0.0211; Table 1), but not the group effects (p=0.1665). A significant interaction effect (p=0.0101) was also found between age and group.
Hence, the age differences were distributed differently among groups.
(From: Lang & Secic, How to Report Statistics in Medicine. 2nd ed. (2006))

Assess interaction first, then confounding.

Dummy variables
(Rosner, pages 585-588)
- Indicator variables for the categories of a categorical variable.
- Dietary group (DIET): SV, LV and NOR.
- For a variable with k categories, generate k-1 dummy variables.
- Choose NOR as the reference group and make dummy variables for SV and LV:
  $X_{SV}$ = 1 if DIET = 'SV'; = 0 otherwise
  $X_{LV}$ = 1 if DIET = 'LV'; = 0 otherwise

ID   diet   XSV   XLV
1    LV     0     1
2    NOR    0     0
3    SV     1     0
4    SV     1     0
5    LV     0     1
6    NOR    0     0
7    SV     1     0

[Regression output annotated with: Intercept, Dummy SV, Dummy LV]

Collinearity
- Independence among the X variables: the X variables should avoid collinearity.
- Use VIF or collinearity analysis.
- Condition number (CN) = the maximum condition index.
- Collinearity is present when CN ≥ 30.
- Collinearity is present when VIF > 10.
- Centering: centered variable = variable - mean
- Scaling: scaled variable = variable/s, s = 10, 100, or …
(From Kleinbaum, Kupper, Muller, Nizam: Applied Regression Analysis and Multivariable Methods. Duxbury, CA, USA, page 240)

Any questions?

Sources of cited figures and text:
- Rosner: Fundamentals of Biostatistics, 6th ed. Wadsworth Publishing Company.
- KKM: Kleinbaum, Kupper, Muller, Nizam: Applied Regression Analysis and Multivariable Methods. Duxbury, CA, USA.
- Lang & Secic: How to Report Statistics in Medicine. 2nd ed. (2006).
- Pearson & Turton: Statistical Methods in Environmental Health. Chapman and Hall.
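As a supplement to the dummy-variable and collinearity slides, here is a minimal Python sketch of both ideas. The diet values are the seven rows of the dummy-variable table; everything else is an assumption made for illustration: the helper names (`make_dummies`, `pearson_r`, `vif_two_predictors`) are hypothetical, the predictor x = 1, …, 10 is made up, and pairing x with its square is an example chosen here, not one taken from the slides. The VIF shortcut 1/(1 - r²) applies only when there are exactly two predictors.

```python
def make_dummies(values, reference):
    """Code a categorical variable with k levels as k-1 indicator columns."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def vif_two_predictors(a, b):
    """VIF = 1 / (1 - R^2); with exactly two predictors, R^2 equals r^2."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)


# Diet column from the dummy-variable table, with NOR as the reference group.
diet = ["LV", "NOR", "SV", "SV", "LV", "NOR", "SV"]
dummies = make_dummies(diet, reference="NOR")
x_sv, x_lv = dummies["SV"], dummies["LV"]
print("XSV:", x_sv)
print("XLV:", x_lv)

# Made-up predictor: x and x squared are strongly collinear until x is centered.
x = [float(v) for v in range(1, 11)]
vif_raw = vif_two_predictors(x, [v * v for v in x])

mean_x = sum(x) / len(x)
xc = [v - mean_x for v in x]  # centered variable = variable - mean
vif_centered = vif_two_predictors(xc, [v * v for v in xc])

print("VIF, raw x and x^2:     ", round(vif_raw, 1))
print("VIF, centered x and x^2:", round(vif_centered, 1))
```

In this made-up case the raw VIF exceeds the VIF > 10 rule of thumb quoted above, and centering removes the problem because the centered values are symmetric about zero.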