Chapter 1
Basic Biostatistics
Jamalludin Ab Rahman MD MPH
Department of Community Medicine
Kulliyyah of Medicine
www.jamalrahman.net
17 June, 2014

Content
- Basic premises: variables, level of measurements, probability distribution
- Descriptive statistics
- Inferential statistics, hypothesis testing

We observe, we believe.
What we observe might not be the truth
...and we can't observe all. We sample.
Population: Parameter
Sample: Statistic

Variable
- Characteristic of a population
- Can take different values
- Data = measurements collected/observed

Type of data (Level of measurement)
Categorical
- Nominal, e.g. Gender, Race
- Ordinal, e.g. Cancer staging, Severity of CXR for PTB
Numerical
- Discrete, e.g. Parity, Gravida
- Continuous, e.g. Hb, RBS, cholesterol

Normal distribution
$f(x;\mu,\sigma)=\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$

Other distributions
- Discrete vs. continuous probability distribution
- χ², F, Weibull, Binomial & Poisson

Characteristics of Normal distribution
- Smooth, symmetrical (around the mean), uni-modal, bell-shaped curve
- Mean = Median = Mode
- Skewness = 0
- Kurtosis = 0

Test of Normality
- Anderson–Darling Test
- Corrected Kolmogorov–Smirnov Test (Lilliefors Test)
- Cramér–von Mises Criterion
- D'Agostino's K-squared Test
- Jarque–Bera Test
- Pearson's Chi-square Test
- Shapiro–Francia
- Shapiro–Wilk Test

Use Normality test with caution
- Small samples almost always pass a normality test. Normality tests have little power to tell whether or not a small sample of data comes from a Gaussian distribution.
- With large samples, minor deviations from normality may be flagged as statistically significant, even though small deviations from a normal distribution won't affect the results of a t test or ANOVA.

Statistical objectives
1. Determine presence of difference (or similarity)
2. Determine degree of difference
3. Determine the direction of changes (outcome)
4. Predict changes (outcomes)

(Figure labels: A, B, C)
- Is there any difference between A & B?
- Which one is taller? A or B?
- How big is the difference between A & B?
- Is C different from A & B?
- Is there any pattern now?
- If there is a D, can you predict how tall D is?

Association
... a value and whose associated value may be changed
Independent: e.g. Smoking
Dependent: e.g. Lung Cancer

Disease model
Exposure, Exposure, Exposure; Outcome; Time

Disease model (example)
Smoking, Mineral Dust, Age; Lung Cancer; Time

Causal relationship
Factor 1, Factor 2, Factor 3, Factor 4, Factor 5, Factor 6, Factor 7; Outcome

Descriptive statistics
Data          Distribution   Describe with
Categorical                  Frequency (count) & Percentage
Numerical     Normal         Mean (SD)
Numerical     Not Normal     Median (Range/IQR)

Hypothesis Testing
- Involves more than one variable
  - exposure & outcome,
  - predictor & criterion,
  - risk & disease
- Try to prove that Exposure causes the Disease, e.g. Smoking causing Lung Cancer
- Example ~ H0: No difference in the risk of getting Lung Cancer between smokers and non-smokers

               Lung Cancer    No Lung Cancer
Smoking        20 (18.2%)     90 (81.8%)
Not Smoking    5 (4.5%)       105 (95.5%)

χ² (df=1) = 10.150, p = 0.001, OR = 4.7 (95% CI 1.7 – 13.0)
Because p < 0.05, we reject H0. Therefore there is a difference between smokers & non-smokers.

Statistical Test
- Univariate ~ One dependent & one independent
- Multivariate ~ Multiple dependent & multiple independent variables

What test to use?
Variable 1                       Variable 2                        Test
Categorical                      Categorical                       Chi-square
Categorical (2 pop)              Numerical (Normal)                Independent sample t-test
Categorical (2 pop)              Numerical (Not Normal)            Mann-Whitney U test
Categorical (>2 pop)             Numerical (Normal)                One-way ANOVA
Categorical (>2 pop)             Numerical (Not Normal)            Kruskal-Wallis test
Numerical (Normal)               Numerical (Normal)                Pearson Correlation Coefficient Test
Numerical (Normal/Not Normal)    Numerical (Not Normal)            Spearman Correlation Coefficient Test
Numerical (Normal)               Numerical (Normal), Paired        Paired t-test
Numerical (Not Normal)           Numerical (Not Normal), Paired    Friedman test

But life is not simple!
Outcome: Lung Cancer
Factor: Exposure to mineral dust; PM2.5, PM10 (µg/m³)
Factor: Smoking; No. of cigarettes smoked per day
Factor: Radiation; Radiation absorbed dose (mGy) per day

The multivariate model
Lung CA = Smoking + Radiation + Mineral dust + Others
(Regression = Smoking + Radiation + Mineral dust; Residual = Others)
A good model is when Regression > Residual

Multivariate Analysis
- Hypothesis testing & control for confounders, e.g. General Linear Model, Logistic Regression
- Modeling, e.g. Linear Regression
- Data reduction, e.g. Factor Analysis, Cluster Analysis

Writing plan for statistical analysis #1
Data were analyzed using the complex sample function of SPSS (version 13.0). Sampling errors were estimated using the primary sampling units and strata provided in the data set. Sampling weights were used to adjust for nonresponse bias and the oversampling of blacks, Mexican Americans, and the elderly in NHANES. The prevalence of hypertension, as well as the awareness, treatment, and control rates, were age adjusted by direct standardization to the US 2000 standard population. To analyze differences over time, the 2003–2004 data were compared with the 1999–2000 data. Estimates with a coefficient of variation >0.3 were considered unreliable. A 2-tailed P value <0.05 was considered statistically significant.
(Ong et al. 2009)

Writing plan for statistical analysis #2
To assess the effect of the selection process on the characteristics of the cases, we compared cases included in the final analysis to the rest of the cases. Since controls included in the present analysis were different from the rest of the diabetes free participants by design, no similar comparisons were performed for that group. To compare baseline characteristics of cases and controls appropriate univariate statistics were used. Similar binary logistic and multiple linear regression models were built with incident diabetes or HbA1c as respective outcomes and additive block entry of adiponectin and potential confounders. For linear regression CRP and triglycerides were log transformed. Since HbA1c could be modified by drug treatment, we ran a sensitivity analysis excluding all participants on antidiabetic medication. A p-value of <0.05 was considered significant. Analyses were performed with SPSS 14.0 for Windows.

Reporting analysis (example)

Summary
1. Identify & define variables
2. Type: independent vs. dependent
3. Level of measurements: nominal, ordinal or continuous
4. Check distribution: Normal vs. Not Normal
5. Decide what to do: descriptive vs. analytical

Chapter 2
Introduction to SPSS
IBM SPSS Statistics v21 for Windows

IBM SPSS Statistics
IBM Corporation
Software Group
Route 100
Somers, NY 10589
Produced in the United States of America
May 2012

SPSS LAYOUT

Layout
- Main menu
- Toolbar
- Variables

Data editor
- Data View: enter your data here. Rows = each data (one record per row).
- Variable View: define & describe your variables here. Rows = variables.

Viewer
The output of analyses will be displayed here. Output is separated from data.

Syntax
- We can compile all the steps of the analyses here.
- Extends the programming function in SPSS.
- Ability to perform complex steps, e.g. "looping".

CREATING DATASET

Before you even start SPSS!
You must identify & define relevant variables.
Define means:
1. Name: preferably a short single name, begins with a letter, no special characters, no spaces
2. Type of data: e.g. Numeric, Date, String
3. Width & Decimal Places (if numeric)
4. Label: description for the Name (will be displayed in Viewer)
5. Values: labels for values, e.g. 1=Male, 2=Female
6. Missing: define missing value, e.g. 999 for N/A

Define your variables

Variable Types

Variable Type
- For numeric, determine Width & Decimal.
- Decimal < Width
- New option for Numerics with leading zeros
- For String, no option for Decimal Place
- Decide the suitable variable type

Value Labels

Missing Value
- This question is Not Applicable to males.
- e.g. Assign 999 to represent the N/A value & this won't be included in any analysis.

Chapter 3
Descriptive Statistics
IBM SPSS Statistics v21 for Windows

Dataset for the exercise
- Filename: healthstatus001.sav
- Hypothetical
- Study to describe factors related to HbA1c & Homocysteine
- N = 301
- 13 variables

Retrieve file information

Exercise #1
1. Describe socio-demographic characteristics of the respondents (age, gender & race)
2. Describe the explanatory variables
   1. Exercise
   2. Smoking status
   3. BMI status &
   4. BP status
3. Describe HbA1c (taking cut-off for Poor HbA1c ≥ 6.5%) & HCY

DESCRIBE NUMERICAL DATA

Explore

Results
Check for Normality. Is Age data distributed Normally?
Is this Normal distribution?

Describe age
Normal: The subjects' ages ranged from 23 to 67 years, with a mean of 34 (SD = 8) years.
If not Normal: The subjects' ages ranged from 23 to 67 years, with a median of 33 (IQR = 11) years.

DESCRIBE CATEGORICAL DATA

Frequency

Results

TRANSFORM

Compute
weight / ((height / 100) ** 2)

Visual binning
Normal        < 23
Overweight    23 to < 27.5
Obese         >= 27.5

Chapter 4
Bivariable analyses
IBM SPSS Statistics v21 for Windows

To check association of two variables?
Age, HbA1c

The steps
1. Determine which is dependent & which is independent
2. Determine level of measurements
3. Determine Normality of the numerical measurement
4. Determine the suitable statistical test

What are the tests?
Variable 1                       Variable 2                        Test
Categorical                      Categorical                       Chi-square
Categorical (2 pop)              Numerical (Normal)                Independent sample t-test
Categorical (2 pop)              Numerical (Not Normal)            Mann-Whitney U test
Categorical (>2 pop)             Numerical (Normal)                One-way ANOVA
Categorical (>2 pop)             Numerical (Not Normal)            Kruskal-Wallis test
Numerical (Normal)               Numerical (Normal)                Pearson Correlation Coefficient Test
Numerical (Normal/Not Normal)    Numerical (Not Normal)            Spearman Correlation Coefficient Test
Numerical (Normal)               Numerical (Normal), Paired        Paired t-test
Numerical (Not Normal)           Numerical (Not Normal), Paired    Friedman test

Exercise #2
1. Determine association between socio-demographic characteristics & all the risk factors with HbA1c
2. Determine association between socio-demographic characteristics & all the risk factors with HCY
Note: It would be good if you could construct dummy tables for the answers even before the analyses start.

HCY normal range

COMPARING TWO MEANS
INDEPENDENT SAMPLE t-TEST

Age vs. BP

Results
- The original t-test (Student's t-test) assumes equal variances. If the variances are equal, it is robust to different sample sizes.
- When equal variances cannot be assumed, use Welch's correction.
- Levene's test checks equality of variances. H0: There is no difference between the variances. So if P is significant, we reject H0, and therefore equal variances are not assumed.

Table: Distribution of age by blood pressure status
             N      Mean    SD     Statistics    df     P
Normal BP    156    33.9    7.9    t = 0.431     299    0.667
High BP      145    34.4    8.9

DIFFERENCE OF TWO PROPORTIONS
CHI-SQUARED TEST

Gender vs. BP

Results
- Describe this table first. What is your impression? 49% women vs. 47% men with high BP.
- Some books may suggest the use of Continuity Correction at ALL times, but recent simulations showed that CC (or Yates' correction) is OVERCONSERVATIVE. Hence, use Pearson χ² when < 20% of cells have expected count < 5.
- When ≥ 20% of cells have expected count (EC) < 5, use Fisher's Exact Test.
- This is given because we code the variables using numbers. It can be used to measure P-trend.

COMPARING MORE THAN TWO MEANS
ONE-WAY ANOVA

Race vs. HbA1c

Results
- Describe these results first. What is your impression? HbA1c between races? 6.4 (SD 2.1) vs. 6.7 (SD 2.2) vs. 6.5 (SD 2.2)
- The F test shows that there is no single significant difference between any two groups.

Results: BMI status vs. HbA1c
- The F test shows that there is at least ONE pair with a significant difference: either N vs. OW, N vs. OB or OW vs. OB.
- We need to run a post-hoc test to determine which PAIR is significant.
- To decide which post-hoc test to choose, we have to test for equality of variances, i.e. homogeneity of variances (Levene's test).

Results: Post hoc
The significant difference is only for Normal vs. Obese (P=0.002)

Report
There is a significant association between BMI Status and HbA1c (F(2,298)=13.129, P<0.001). Post-hoc test showed that Obese subjects have significantly higher HbA1c compared to Normal and Overweight subjects (P=0.001 and P<0.001 respectively).

NON-PARAMETRIC TESTS
MANN-WHITNEY U

Gender vs. HCY

Results
This ranks table is not to be cited in the research paper. Instead, describe their MEDIAN.

NON-PARAMETRIC TESTS
KRUSKAL-WALLIS

Race vs. HCY

Results

RELATIONSHIP OF TWO NUMERICAL DATA
CORRELATION TEST

Age vs. HbA1c

Results

Age vs. HCY
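
The "What are the tests?" table in this chapter can be read as a small decision rule. The following is only a sketch of that table in Python, not part of the original slides or of SPSS: the function name `choose_test` and its arguments are hypothetical, and `normal` is assumed to mean that every numerical variable involved passed the Normality check.

```python
def choose_test(kind1, kind2, groups=2, normal=True, paired=False):
    """Return the test listed in the "What are the tests?" table.

    kind1, kind2 : "categorical" or "numerical" (hypothetical labels)
    groups       : number of populations in the categorical variable
    normal       : True if every numerical variable involved is Normal
    paired       : True for paired numerical measurements
    """
    kinds = sorted([kind1, kind2])
    for kind in kinds:
        if kind not in ("categorical", "numerical"):
            raise ValueError(f"unknown level of measurement: {kind!r}")

    # Categorical vs. categorical
    if kinds == ["categorical", "categorical"]:
        return "Chi-square test"

    # Categorical vs. numerical: depends on number of groups and Normality
    if kinds == ["categorical", "numerical"]:
        if groups < 2:
            raise ValueError("need at least two groups to compare")
        if groups == 2:
            return "Independent sample t-test" if normal else "Mann-Whitney U test"
        return "One-way ANOVA" if normal else "Kruskal-Wallis test"

    # Numerical vs. numerical
    if paired:
        # The table lists the Friedman test for paired, not Normal data
        return "Paired t-test" if normal else "Friedman test"
    return "Pearson correlation" if normal else "Spearman correlation"


# Example: Race (3 groups) vs. HbA1c, assuming HbA1c is Normal
print(choose_test("categorical", "numerical", groups=3, normal=True))
```

The sketch follows steps 2 to 4 of "The steps": level of measurement first, then Normality, then the test. Deciding which variable is dependent (step 1) still has to be done by the analyst.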
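
The smoking vs. lung cancer table in Chapter 1 (20/90 and 5/105) can be checked by hand. The sketch below is an illustration in plain Python rather than the SPSS procedure used in the slides. It assumes the Pearson χ² without continuity correction and a Wald (log-scale) confidence interval for the odds ratio, and the function names are hypothetical. Under those assumptions the results land close to the values reported on the slide (χ² about 10.15, OR about 4.7).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald 95% confidence interval on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Rows: Smoking, Not Smoking; columns: Lung Cancer, No Lung Cancer
chi2, p = chi_square_2x2(20, 90, 5, 105)
odds_ratio, lower, upper = odds_ratio_ci(20, 90, 5, 105)
print(f"chi-square (df=1) = {chi2:.3f}, p = {p:.4f}")
print(f"OR = {odds_ratio:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

Small differences in the last decimal place from the slide are expected, since the software may round or compute the interval slightly differently.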