Basic Biostatistics using SPSS (Note)

17/6/2014
Chapter 1
Basic Biostatistics
Jamalludin Ab Rahman MD MPH
Department of Community Medicine
Kulliyyah of Medicine
Content
• Basic premises: variables, level of measurement, probability distribution
• Descriptive statistics
• Inferential statistics, hypothesis testing
www.jamalrahman.net
17 June, 2014
We observe, we believe.
What we observe might not be the truth.
…and we can't observe all. We sample.
Population: Parameter
Sample: Statistic
Variable
• Characteristic of a population
• Can take different values
• Data = measurements collected/observed
Type of data (Level of measurement)
Categorical
  Nominal: e.g. Gender, Race
  Ordinal: e.g. Cancer staging, Severity of CXR for PTB
Numerical
  Discrete: e.g. Parity, Gravida
  Continuous: e.g. Hb, RBS, cholesterol
Normal distribution
$$f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Other distributions
• Discrete vs. continuous probability distribution
Continuous: χ², F, Weibull
Discrete: Binomial & Poisson
Characteristics of Normal distribution
• Smooth, symmetrical (around the mean), uni-modal, bell-shaped curve
• Mean = Median = Mode
• Skewness = 0
• Excess kurtosis = 0 (i.e. kurtosis = 3)
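As a rough illustration of the skewness and kurtosis criteria, the Python sketch below computes simple moment-based estimates. The function name and the sample values are hypothetical, and SPSS applies small-sample adjustments, so its figures will differ slightly from these.

```python
from statistics import fmean

def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # a Normal curve gives 0
    return skew, excess_kurtosis

# Hypothetical data: one symmetric sample, one with a long right tail
symmetric = [1, 2, 3, 4, 5, 6, 7]
right_skewed = [1, 1, 2, 2, 3, 4, 12]

print(skewness_and_excess_kurtosis(symmetric))
print(skewness_and_excess_kurtosis(right_skewed))
```

A skewness near 0 suggests symmetry; a clearly positive value points to a right tail.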
Test of Normality
• Anderson–Darling Test
• Corrected Kolmogorov–Smirnov Test (Lilliefors Test)
• Cramér–von Mises Criterion
• D'Agostino's K-squared Test
• Jarque–Bera Test
• Pearson's Chi-square Test
• Shapiro–Francia Test
• Shapiro–Wilk Test
Use Normality tests with caution
• Small samples almost always pass a normality test. Normality tests have little power to tell whether or not a small sample of data comes from a Gaussian distribution.
• With large samples, minor deviations from normality may be flagged as statistically significant, even though small deviations from a normal distribution won't affect the results of a t test or ANOVA.
Statistical objectives
1. Determine presence of difference (or similarity)
2. Determine degree of difference
3. Determine the direction of changes (outcome)
4. Predict changes (outcomes)
Is there any difference between A & B?
Which one is taller? A or B?
How big is the difference between A & B?
Is C different from A & B?
Is there any pattern now?
If there were a D, could you predict how tall D is?
[Figure: A, B and C]
Association
• a value and whose associated value may be changed
Independent: e.g. Smoking
Dependent: e.g. Lung Cancer
Disease model
[Diagram: Exposure, Exposure, Exposure; Outcome; Time]
Disease model (example)
[Diagram: Smoking, Mineral Dust, Age; Lung Cancer; Time]
Causal relationship
[Diagram: Factors 1 to 7 and the Outcome]
Descriptive statistics
Categorical data: Frequency (count) & Percentage
Numerical data
  Normal: Mean (SD)
  Not Normal: Median (Range/IQR)
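The choice of summary can be sketched in Python as below. This is only an illustration: the helper names and the sample data are hypothetical, and the quartile method used by `statistics.quantiles` may not match the one SPSS uses, so IQR values can differ slightly.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev

def describe_numeric(values, normal):
    """Mean (SD) for Normal data, median (IQR) otherwise."""
    if normal:
        return f"mean {mean(values):.1f} (SD {stdev(values):.1f})"
    q1, _, q3 = quantiles(values, n=4)  # quartiles; method may differ from SPSS
    return f"median {median(values):.1f} (IQR {q3 - q1:.1f})"

def describe_categorical(values):
    """Frequency (count) and percentage for each category."""
    counts = Counter(values)
    total = len(values)
    return {cat: f"{n} ({100 * n / total:.1f}%)" for cat, n in counts.items()}

# Hypothetical data
ages = [23, 25, 30, 33, 34, 40, 67]
gender = ["M", "F", "F", "M", "F"]

print(describe_numeric(ages, normal=True))
print(describe_numeric(ages, normal=False))
print(describe_categorical(gender))
```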
Hypothesis Testing
• Involves more than one variable
  - exposure & outcome,
  - predictor & criterion,
  - risk & disease
• Try to prove that the Exposure causes the Disease, e.g. Smoking causing Lung Cancer
• Example: H0: No difference in the risk of getting Lung Cancer between smokers and non-smokers
               Lung Cancer    No Lung Cancer
Smoking        20 (18.2%)      90 (81.8%)
Not Smoking     5 (4.5%)      105 (95.5%)

χ² (df = 1) = 10.150, p = 0.001, OR = 4.7 (95% CI 1.7 to 13.0)
Because p < 0.05, we reject H0. Therefore there is a difference between smokers & non-smokers.
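The statistics reported for this 2×2 table can be checked by hand. The Python sketch below uses the standard Pearson chi-square formula (without continuity correction) and the usual log-based (Woolf) confidence interval for the odds ratio. The function names are hypothetical, and it is an assumption that SPSS used the same methods, so the last digits may differ from the slide.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI computed on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Counts from the table: smokers 20 vs. 90, non-smokers 5 vs. 105
chi2 = chi_square_2x2(20, 90, 5, 105)
p = p_value_df1(chi2)
odds_ratio, lower, upper = odds_ratio_ci(20, 90, 5, 105)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}, "
      f"OR = {odds_ratio:.1f} ({lower:.1f} to {upper:.1f})")
```

The results should agree with the reported values to within rounding.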
Statistical Test
• Univariate: one dependent & one independent variable
• Multivariate: multiple dependent & multiple independent variables
What test to use?
Variable 1                     | Variable 2                     | Test
Categorical                    | Categorical                    | Chi-square
Categorical (2 pop)            | Numerical (Normal)             | Independent sample t-test
Categorical (2 pop)            | Numerical (Not Normal)         | Mann-Whitney U test
Categorical (>2 pop)           | Numerical (Normal)             | One-way ANOVA
Categorical (>2 pop)           | Numerical (Not Normal)         | Kruskal-Wallis test
Numerical (Normal)             | Numerical (Normal)             | Pearson Correlation Coefficient Test
Numerical (Normal/Not Normal)  | Numerical (Not Normal)         | Spearman Correlation Coefficient Test
Numerical (Normal)             | Numerical (Normal), Paired     | Paired t-test
Numerical (Not Normal)         | Numerical (Not Normal), Paired | Wilcoxon signed-rank test (Friedman test for more than two related samples)
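The decision table can be written as a small lookup function. The Python sketch below is only an illustration with hypothetical argument names. One deliberate choice to note: for two paired, non-Normal measurements it returns the Wilcoxon signed-rank test, the usual test for two related samples; the Friedman test applies when there are more than two related samples.

```python
def choose_test(var1, var2, groups=2, normal=True, paired=False):
    """Suggest a bivariable test.

    var1, var2: 'categorical' or 'numerical'
    groups:     number of categories in var1 (when it is categorical)
    normal:     whether the numerical data are Normally distributed
    paired:     whether two numerical measurements are paired
    """
    if var1 == "categorical" and var2 == "categorical":
        return "Chi-square test"
    if var1 == "categorical" and var2 == "numerical":
        if groups == 2:
            return "Independent sample t-test" if normal else "Mann-Whitney U test"
        return "One-way ANOVA" if normal else "Kruskal-Wallis test"
    if var1 == "numerical" and var2 == "numerical":
        if paired:
            return "Paired t-test" if normal else "Wilcoxon signed-rank test"
        return "Pearson correlation" if normal else "Spearman correlation"
    raise ValueError("var1 and var2 must be 'categorical' or 'numerical'")

print(choose_test("categorical", "numerical", groups=3, normal=False))
print(choose_test("numerical", "numerical", normal=True))
```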
But life is not simple!
Outcome: Lung Cancer
Factor: Exposure to mineral dust, PM2.5, PM10 (µg/m³)
Factor: Smoking, no. of cigarettes smoked per day
Factor: Radiation, radiation absorbed dose (mGy) per day
The multivariate model
Lung CA = Smoking + Radiation + Mineral dust + Others
Regression: Smoking + Radiation + Mineral dust
Residual: Others
A good model is when Regression > Residual
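To make "Regression > Residual" concrete, the Python sketch below fits a simple one-predictor least-squares line and splits the total variation into a regression part and a residual part. The data and names are hypothetical, and a single predictor is used only to keep the example short; the same decomposition holds for a model with several predictors.

```python
from statistics import fmean

def sums_of_squares(x, y):
    """Return (regression SS, residual SS) for a least-squares line of y on x."""
    mean_x, mean_y = fmean(x), fmean(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * a for a in x]
    ss_regression = sum((f - mean_y) ** 2 for f in fitted)       # explained by the model
    ss_residual = sum((b - f) ** 2 for b, f in zip(y, fitted))   # left unexplained
    return ss_regression, ss_residual

# Hypothetical data: cigarettes per day vs. an outcome score
cigarettes = [0, 5, 10, 15, 20, 25]
score = [1.0, 2.1, 2.9, 4.2, 4.8, 6.1]

ss_regression, ss_residual = sums_of_squares(cigarettes, score)
ss_total = sum((b - fmean(score)) ** 2 for b in score)
print(f"regression SS = {ss_regression:.3f}, residual SS = {ss_residual:.3f}")
print(f"R squared = {ss_regression / ss_total:.3f}")
```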
Multivariate Analysis
• Hypothesis testing & control for confounders
  - e.g. General Linear Model, Logistic Regression
• Modeling
  - e.g. Linear Regression
• Data reduction
  - e.g. Factor Analysis, Cluster Analysis
Writing plan for statistical analysis #1
Data were analyzed using the complex sample function of SPSS (version 13.0). Sampling errors were estimated using the primary sampling units and strata provided in the data set. Sampling weights were used to adjust for nonresponse bias and the oversampling of blacks, Mexican Americans, and the elderly in NHANES. The prevalence of hypertension, as well as the awareness, treatment, and control rates, were age adjusted by direct standardization to the US 2000 standard population. To analyze differences over time, the 2003–2004 data were compared with the 1999–2000 data. Estimates with a coefficient of variation >0.3 were considered unreliable. A 2-tailed P value <0.05 was considered statistically significant.
(Ong et al. 2009)
Writing plan for statistical analysis #2
To assess the effect of the selection process on the characteristics of the cases, we compared cases included in the final analysis to the rest of the cases. Since controls included in the present analysis were different from the rest of the diabetes free participants by design, no similar comparisons were performed for that group. To compare baseline characteristics of cases and controls appropriate univariate statistics were used. Similar binary logistic and multiple linear regression models were built with incident diabetes or HbA1c as respective outcomes and additive block entry of adiponectin and potential confounders. For linear regression CRP and triglycerides were log transformed. Since HbA1c could be modified by drug treatment, we ran a sensitivity analysis excluding all participants on antidiabetic medication. A p-value of <0.05 was considered significant. Analyses were performed with SPSS 14.0 for Windows.
Reporting analysis (example)
Summary
1. Identify & define variables
2. Type: independent vs. dependent
3. Level of measurement: nominal, ordinal or continuous
4. Check distribution: Normal vs. Not Normal
5. Decide what to do: descriptive vs. analytical
Chapter 2
Introduction to SPSS
IBM SPSS Statistics v21 for Windows
IBM SPSS Statistics
IBM Corporation
Software Group
Route 100
Somers, NY 10589
Produced in the United States of America
May 2012
SPSS LAYOUT
Layout
Main menu
Toolbar
Variables
Data editor
Data View: enter your data here. Rows = cases (one row for each record).
Variable View: define & describe your variables here. Rows = variables.
Viewer
The output of analyses will be displayed here. Output is separated from data.
Syntax
We can compile all the steps of the analyses here. Syntax extends the programming function in SPSS, with the ability to perform complex steps, e.g. “looping”.
CREATING DATASET
Before you even start SPSS!
• You must identify & define relevant variables
• Define means
1. Name: preferably a short single name; begins with a letter, no special characters, no spaces
2. Type of data: e.g. Numeric, Date, String
3. Width & Decimal Places (if numeric)
4. Label: description for the Name (will be displayed in Viewer)
5. Values: labels for values, e.g. 1 = Male, 2 = Female
6. Missing: define missing values, e.g. 999 for N/A
Define your variables
Variable Types
Variable Type
• Decide the suitable variable type
• For Numeric, determine Width & Decimal. Decimal < Width
• New option for Numerics with leading zeros
• For String, no option for Decimal Place
Value Labels
Missing Value
This question is Not Applicable to males
e.g. Assign 999 to represent the N/A value & this won't be included in any analysis
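The effect of declaring a missing-value code can be imitated outside SPSS. The Python sketch below is only an illustration: the code 999 follows the example above, while the variable name and data are hypothetical. It shows how leaving the code in would distort a mean.

```python
from statistics import mean

MISSING_CODE = 999  # stands for N/A, as in the example above

def valid_values(values, missing_codes=(MISSING_CODE,)):
    """Drop values declared as missing before any analysis."""
    return [v for v in values if v not in missing_codes]

# Hypothetical variable: number of pregnancies, N/A (999) for male respondents
pregnancies = [2, 0, 999, 1, 3, 999]

print(mean(pregnancies))                # distorted by the 999 codes
print(mean(valid_values(pregnancies)))  # based on the 4 valid answers only
```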
Chapter 3
Descriptive Statistics
Dataset for the exercise
• Filename: healthstatus001.sav
• Hypothetical
• Study to describe factors related to HbA1c & Homocysteine
• N = 301
• 13 variables
Retrieve file information
Exercise #1
1. Describe socio-demographic characteristics of the respondents (age, gender & race)
2. Describe the explanatory variables
   1. Exercise
   2. Smoking status
   3. BMI status &
   4. BP status
3. Describe HbA1c (taking cut-off for Poor HbA1c ≥ 6.5%) & HCY
DESCRIBE NUMERICAL DATA
Explore
Results
Check for Normality.
Is the Age data Normally distributed?
Is this a Normal distribution?
Describe age
If Normal
• The subjects' ages ranged from 23 to 67 years, with a mean of 34 (SD = 8) years.
If not Normal
• The subjects' ages ranged from 23 to 67 years, with a median of 33 (IQR = 11) years.
DESCRIBE CATEGORICAL DATA
Frequency
Results
TRANSFORM
Compute
weight / ((height / 100) ** 2)
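The Compute expression above appears to derive body mass index (BMI) from weight and height. The same arithmetic is sketched in Python below, assuming weight is recorded in kilograms and height in centimetres (hence the division by 100); the function and argument names are hypothetical.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2; height is converted from cm to m first."""
    return weight_kg / ((height_cm / 100) ** 2)

# Hypothetical subject: 70 kg, 175 cm
print(round(bmi(70, 175), 1))
```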
Visual binning
Normal < 23
Overweight 23 to < 27.5
Obese >= 27.5
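The cut-offs above can be expressed as a small binning function. The Python sketch below mirrors what Visual Binning produces for these three categories; the function name is hypothetical, and each lower bound is inclusive, as the table states.

```python
def bmi_status(bmi_value):
    """Bin a BMI value using the cut-offs 23 and 27.5 (lower bounds inclusive)."""
    if bmi_value < 23:
        return "Normal"
    if bmi_value < 27.5:
        return "Overweight"
    return "Obese"

for value in (22.9, 23.0, 27.4, 27.5):
    print(value, bmi_status(value))
```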
Chapter 4
Bivariable analyses
To check association of two variables?
e.g. Age and HbA1c
The steps
1. Determine which is dependent & which is independent
2. Determine level of measurement
3. Determine Normality of the numerical measurement
4. Determine the suitable statistical test
What are the tests?
Variable 1                     | Variable 2                     | Test
Categorical                    | Categorical                    | Chi-square
Categorical (2 pop)            | Numerical (Normal)             | Independent sample t-test
Categorical (2 pop)            | Numerical (Not Normal)         | Mann-Whitney U test
Categorical (>2 pop)           | Numerical (Normal)             | One-way ANOVA
Categorical (>2 pop)           | Numerical (Not Normal)         | Kruskal-Wallis test
Numerical (Normal)             | Numerical (Normal)             | Pearson Correlation Coefficient Test
Numerical (Normal/Not Normal)  | Numerical (Not Normal)         | Spearman Correlation Coefficient Test
Numerical (Normal)             | Numerical (Normal), Paired     | Paired t-test
Numerical (Not Normal)         | Numerical (Not Normal), Paired | Wilcoxon signed-rank test (Friedman test for more than two related samples)
Exercise #2
1. Determine the association between socio-demographic characteristics & all the risk factors with HbA1c
2. Determine the association between socio-demographic characteristics & all the risk factors with HCY
Note: It would be good if you could construct dummy tables for the answers even before the analyses start
HCY normal range
COMPARING TWO MEANS
INDEPENDENT SAMPLE t-TEST
Age vs. BP
Results
Student's t-test (the original t-test) assumes equal variances. It is robust to unequal variances when the sample sizes are equal, and to unequal sample sizes when the variances are equal. Otherwise, use Welch's correction.
Levene's test checks the equality of variances. H0: There is no difference between the variances. So if P is significant, we reject H0, and therefore equal variances are not assumed.
Table: Distribution of age by blood pressure status
            N     Mean   SD    Statistics   df    P
Normal BP   156   33.9   7.9   t = 0.431    299   0.667
High BP     145   34.4   8.9
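The t statistic and degrees of freedom in this table can be approximated from the summary figures alone. The Python sketch below computes both the pooled-variance (Student) and the Welch versions; the function names are hypothetical. Because the table reports means and SDs rounded to one decimal, the result only approximates the reported t = 0.431, while the pooled df of 299 is reproduced exactly.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Student's t-test from summary statistics (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / se, df

def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Welch's t-test from summary statistics (equal variances not assumed)."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean2 - mean1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary figures from the table: Normal BP first, then High BP
t_pooled, df_pooled = pooled_t(156, 33.9, 7.9, 145, 34.4, 8.9)
t_welch, df_welch = welch_t(156, 33.9, 7.9, 145, 34.4, 8.9)
print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.1f}")
```

With similar group sizes and SDs, the two versions give nearly the same t, which is why the choice indicated by Levene's test matters little here.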
DIFFERENCE OF TWO PROPORTIONS
CHI-SQUARED TEST
Gender vs. BP
Results
Describe this table first. What is your impression? 49% women vs. 47% men with high BP.
Some books may suggest the use of Continuity Correction at ALL times, but recent simulations showed that CC (or Yates' correction) is OVERCONSERVATIVE. Hence, use Pearson χ² when < 20% of cells have an expected count < 5.
When ≥ 20% of cells have an expected count (EC) < 5, use Fisher's Exact Test.
Linear-by-Linear Association: this is given because we code the variables using numbers. It can be used to measure P-trend.
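The expected-count rule can be checked directly from a table of observed counts. The Python sketch below computes the expected counts and suggests a test; the function names are hypothetical, and the threshold is taken as "20% or more of cells with an expected count below 5", which is an assumption about the exact wording of the rule.

```python
def expected_counts(table):
    """Expected count of each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def suggest_test(table, min_expected=5, max_share=0.20):
    """Pearson chi-square unless too many cells have a small expected count."""
    cells = [e for row in expected_counts(table) for e in row]
    share_small = sum(e < min_expected for e in cells) / len(cells)
    return "Fisher's Exact Test" if share_small >= max_share else "Pearson chi-square"

# Hypothetical observed counts
large_table = [[20, 90], [5, 105]]
small_table = [[3, 1], [1, 3]]

print(expected_counts(large_table), suggest_test(large_table))
print(expected_counts(small_table), suggest_test(small_table))
```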
COMPARING MORE THAN TWO MEANS
ONE-WAY ANOVA
Race vs. HbA1c
Results
Describe these results first. What is your impression? HbA1c between races? 6.4 (SD 2.1) vs. 6.7 (SD 2.2) vs. 6.5 (SD 2.2)
The F test shows that there is no significant difference between any two groups.
Results: BMI status vs. HbA1c
The F test shows that at least ONE pair differs significantly: either N vs. OW, N vs. OB or OW vs. OB.
We need to run a post-hoc test to determine which PAIR is significantly different.
To decide which post-hoc test to choose, we have to test for equality of variances, i.e. homogeneity of variances (Levene's test).
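For readers who want to see where the F statistic comes from, the Python sketch below computes a one-way ANOVA by hand: between-group variation divided by within-group variation. The data are hypothetical and tiny, and the function name is an assumption; with 3 groups and N subjects the degrees of freedom are (2, N - 3).

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical values for three groups (e.g. Normal, Overweight, Obese)
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova(groups)
print(f"F({df_between},{df_within}) = {f_stat:.3f}")
```

A large F only says that at least one group mean differs; it does not say which pair, hence the post-hoc test.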
Results: Post hoc
The significant difference is only for Normal vs. Obese (P=0.002)
Report
There is a significant association between BMI Status and HbA1c (F(2,298) = 13.129, P < 0.001). Post-hoc test showed that Obese subjects have significantly higher HbA1c compared to Normal and Overweight subjects (P = 0.001 and P < 0.001 respectively).
NON-PARAMETRIC TESTS
MANN-WHITNEY U
Gender vs. HCY
Results
The ranks table should not be reported in the research paper. Instead, report the MEDIAN of each group.
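The Python sketch below shows, on hypothetical data, how the ranks behind the Mann-Whitney U statistic are formed and how the group medians that belong in the report are obtained. The function names are assumptions, and the p-value is left to SPSS.

```python
from statistics import median

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def mann_whitney_u(group1, group2):
    """Return (U1, U2) computed from the rank sum of the first group."""
    ranks = average_ranks(list(group1) + list(group2))
    n1, n2 = len(group1), len(group2)
    rank_sum1 = sum(ranks[:n1])
    u1 = rank_sum1 - n1 * (n1 + 1) / 2
    return u1, n1 * n2 - u1

# Hypothetical HCY values for two groups
men = [9.1, 10.4, 12.0, 15.3]
women = [7.8, 8.5, 9.9, 10.4]

print("U =", min(mann_whitney_u(men, women)))
print("median men =", median(men), "median women =", median(women))
```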
NON-PARAMETRIC TESTS
KRUSKAL-WALLIS
Race vs. HCY
Results
RELATIONSHIP OF TWO NUMERICAL DATA
CORRELATION TEST
Age vs. HbA1c
Results
Age vs. HCY