© Springer 2005
Quality of Life Research (2005) 14: 1433–1438
DOI 10.1007/s11136-004-6014-y
Brief communication
Validating and norming of the Greek SF-36 Health Survey
Evelina Pappa, Nick Kontodimopoulos & Dimitris Niakas
Faculty of Social Sciences, Hellenic Open University, Patras, Greece (E-mail: [email protected])
Accepted in revised form 9 November 2004
Abstract
The main objective of this study was to validate the Greek SF-36 Health Survey and to provide general
population normative data. The survey was administered to a stratified representative sample (n = 1426)
of the general population residing in the broader Athens area and the response rate was 70.6%.
Statistical analysis, according to documented procedures developed within the IQOLA Project, was
performed. The missing value rate was very low, ranging from 0.1 to 1.3% at the item level. Multitrait
scaling analysis confirmed the hypothesized scale structure of the SF-36. Cronbach’s α coefficient met
the criterion (>0.70) for group analysis in all eight scales. Known group comparisons yielded consistent
support of construct validity of the SF-36. Significant statistical differences in mean scores were
observed in relation to demographic and social characteristics such as gender, age, education and
marital status.
Key words: Multitrait scaling, Normative data, Quality of life, Reliability, SF-36, Validity
Introduction
In Greece, the use of the SF-36 Health Survey in
quality-of-life studies involving the general
population or disease-specific groups has, to date,
been very limited [1, 2]. To our knowledge, this paper
reports on the first study involving a stratified
sample of the general population. We
implemented the procedures documented in the
second and third stages of the IQOLA Project
[3]. The objective was to validate the SF-36 and
provide normative data for Greece, facilitating
the interpretation of the SF-36 scales and
comparisons of health status scores from
populations across countries. We tested the
validity and the reliability of the questionnaire,
performed tests of scaling assumptions (item
internal consistency and item discriminant
validity) and used parametric tests (t-test, ANOVA)
to ascertain the statistical significance of the
observed differences.
Methods
Data collection
The study took place in May of 2003 and involved a sample residing in the broader Athens
area, where approximately 35% of the Greek
population lives. Institutionalized people were
excluded. The participants were grouped,
according to socio-demographic characteristics,
proportionally to the Greek urban population,
according to a three-staged sampling methodology. This stratification ensured that the sample
was genuinely representative of this particular
population. Specifically, in the first sampling
stage, a random sample of 84 blocks of residences was selected according to information
from the 1991 national census. In the second
stage, for every block, households were selected
by systematic sampling. In the third stage, for
every household, a participant at least 18 years
old was chosen by simple random sampling.
In total, 1007 of 1426 candidates (response
rate 70.6%) agreed to participate.
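The three sampling stages described above can be sketched in code. This is a minimal illustration only, not the authors' actual procedure: the sampling frame, the function name `three_stage_sample`, the sampling interval `step` and the toy data are all hypothetical.

```python
import random

def three_stage_sample(blocks, n_blocks, step, seed=0):
    """blocks maps a block id to a list of households; each household
    is a list of adult (18+) identifiers. All names are hypothetical."""
    rng = random.Random(seed)
    # Stage 1: simple random sample of residential blocks.
    chosen_blocks = rng.sample(sorted(blocks), n_blocks)
    selected = []
    for block_id in chosen_blocks:
        households = blocks[block_id]
        # Stage 2: systematic sampling, every `step`-th household
        # counted from a random starting position.
        start = rng.randrange(step)
        for household in households[start::step]:
            # Stage 3: one adult per household by simple random sampling.
            if household:
                selected.append((block_id, rng.choice(household)))
    return selected

# Toy frame: 20 blocks, 10 households each, 1 to 3 adults per household.
frame = {
    b: [[f"b{b}-h{h}-p{p}" for p in range(1 + h % 3)] for h in range(10)]
    for b in range(20)
}
sample = three_stage_sample(frame, n_blocks=5, step=4)
print(len(sample), "participants drawn from", len({b for b, _ in sample}), "blocks")
```

In the study itself the first stage drew 84 blocks from the 1991 census frame; the numbers in the toy call are arbitrary.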
The questionnaire contained the SF-36 (Greek
version 1.0), additional questions covering health
services utilization, satisfaction and cost and a set
of common socio-demographic questions at the
end. The questionnaire was administered via
interview in order to enhance subjects’ understanding of the questions and minimize missing
values [4]. For each scale, item responses were coded,
summed and transformed to a scale from 0
(worst possible health status) to 100 (best possible
health status). Missing values were substituted according to the method recommended by
the developers [4].
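The 0 to 100 transformation can be illustrated with a short sketch. It assumes the usual linear rescaling of the raw scale score and, for missing items, a half-scale rule (a respondent's missing items are replaced by the mean of their answered items when at least half of the scale was answered). The paper only refers to the developers' method [4], so that rule is an assumption here, as is the function name `score_scale`.

```python
def score_scale(items, lowest, highest):
    """items: recoded item responses of one respondent (None = missing).
    lowest/highest: minimum and maximum possible raw scale score."""
    answered = [x for x in items if x is not None]
    # Assumed half-scale rule: no score if fewer than half the items were answered.
    if len(answered) * 2 < len(items):
        return None
    # Substitute the respondent's own mean for each missing item.
    person_mean = sum(answered) / len(answered)
    raw = sum(x if x is not None else person_mean for x in items)
    # Linear transformation of the raw score to the 0-100 range.
    return (raw - lowest) / (highest - lowest) * 100.0

# Example: a 10-item scale with items coded 1-3 (raw range 10-30), as in PF.
print(score_scale([2, None, 2, 2, 2, 2, 2, 2, 2, 2], 10, 30))
```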
Analysis
The first step in evaluating the summated ratings scales was the examination of
missing and out-of-range data, which are often associated with translation
problems [5]. Descriptive statistics were generated to evaluate data completeness
and to characterize the response distributions. Reliability was tested via
Cronbach’s α coefficient and construct validity was investigated to determine the
extent to which scores correlated with criteria based on theory [6, 7].
Tests of scaling assumptions examined the item-scale correlations and were used
to confirm the hypothesized scale structure. These tests were: (1) item internal
consistency, which is considered satisfactory when the correlation between an item
and its hypothesized scale is at least 0.40, and (2) item discriminant validity,
which is considered successful when the correlation between an item and its own
scale is significantly higher, by two standard errors or more, than its
correlation with other scales.
Tests of ‘known-groups’ validity were implemented to provide supporting
information for interpreting scale scores and to assess the ability of the
questionnaire to distinguish between subgroups of respondents known to differ in
key socio-demographic or clinical variables. In this study, these variables
included gender, age, education and marital status. Parametric tests (t-test and
ANOVA) were performed to test the statistical significance of the observed
differences.
Results
Sample characteristics
Our sample consisted of 46.6% men and 53.4% women, roughly representing the
gender distribution of the Greek urban population, according to the 1991 census.
As for the level of education, 11.5% of the respondents had completed primary
education, 20% were only high school graduates, 42.1% had completed 12-year
education (lyceum) and 25.8% were university graduates. The sample was divided
into six age groups with an overall mean age of 45 years. Concerning marital
status, 60.0% were married, 27.6% were single and 12.0% were divorced or widowed.
This information is presented in Table 1.
Table 1. Sample socioeconomic and demographic characteristics compared to the Greek urban population
                     Sample (N = 1007)     Urban population aged 18+ (N = 4,573,914)
                     N        %            %
Sex
  Male               469      46.6         47.3
  Female             538      53.4         52.7
Age group
  18–24              144      14.3         15.0
  25–34              198      19.7         19.9
  35–44              188      18.6         18.6
  45–54              155      15.4         15.4
  55–64              155      15.3         15.2
  65+                167      16.1         15.6
Education
  Primary school     201      20.0         53.4
  High school        116      11.5         9.3
  Lyceum             424      42.1         19.2
  University         259      25.8         18.1
  Missing values     7        0.7
Marital status
  Single             278      27.6         23.5
  Married            604      60.0         65.6
  Divorced           46       4.6          2.3
  Widowed            74       7.3          8.5
  Missing values     5        0.5
Table 2. Reliability coefficients and inter-scale correlations
      PF     RP     BP     GH     VT     SF     RE     MH
PF    0.93
RP    0.57   0.95
BP    0.48   0.53   0.92
GH    0.67   0.51   0.48   0.80
VT    0.60   0.54   0.53   0.66   0.82
SF    0.54   0.60   0.52   0.54   0.65   0.79
RE    0.46   0.53   0.45   0.46   0.55   0.66   0.92
MH    0.40   0.38   0.44   0.52   0.71   0.63   0.57   0.83
Cronbach’s α coefficients are presented in the diagonal.
Abbreviations: PF – Physical Function; RP – Role Physical; BP – Bodily Pain; GH – General Health; VT – Vitality; SF – Social
Functioning; RE – Role Emotional; MH – Mental Health.
Data completeness
Concerning the item response frequency for all items (details can be obtained
from the authors), most respondents scored in the favorable health categories, as
expected from a general population. All of the response choices were used,
providing supporting evidence that there were no problems with their translation
or with that of the associated items. The missing value rate was very low, ranging
from 0.1% for items RF2, PF4, GH1 and VT1 to 1.3% for item VT2, with an overall
mean missing value rate of 0.46%. At scale level, the percentage of missing values
ranged from 0.2% for BP to 2.9% for PF.
Fundamental assumptions of closeness of correlations and variances of the items, within their
hypothesized scales, were generally fulfilled. Correlations demonstrating the widest range were in
the PF scale (PF1 compared to PF10), the GH
scale (GH1 compared to GH2), the VT scale (VT4
compared to VT2) and the MH scale (MH4
compared to MH2). The largest differences between standard deviations were observed in the PF
scale (0.4) and the GH scale (above 0.25). The
latter result is consistent with findings from other
studies [8–10] and it is attributed mostly to the
heterogeneity of the PF scale.
Tests of scaling assumptions
Significantly higher item-scale correlations were
observed for items and their hypothesized scales
than with competing scales, and the 0.4 criterion
was satisfied for all items. The item discriminant
validity test, after correction for overlap, indicated
maximum (100%) scaling success rates for five of
eight scales (PF, RP, BP, SF and RE) and over 90% for
the other three (GH, VT and MH). Only one probable
scaling failure was observed, namely in the MH
scale where item MH5 appeared to correlate slightly
better with a competing scale (VT) (details can be
obtained from the authors).
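The two scaling tests can be sketched as follows. This is an illustrative sketch, not the software used in the study: the function names and toy data are hypothetical, and the standard error of a correlation is approximated here as 1/sqrt(n), which is an assumption about how the "two standard errors" criterion is operationalized.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corrected_item_scale_r(own_items, item_index):
    """Correlate one item with the sum of the OTHER items of its scale
    (correction for overlap). own_items is a list of item columns."""
    item = own_items[item_index]
    others = (col for i, col in enumerate(own_items) if i != item_index)
    rest = [sum(vals) for vals in zip(*others)]
    return pearson(item, rest)

def is_scaling_success(r_own, r_other, n):
    """Item discriminant validity: own-scale correlation exceeds the
    competing-scale correlation by at least two standard errors,
    with the standard error approximated as 1/sqrt(n) (assumption)."""
    return r_own - r_other >= 2.0 / sqrt(n)

# Toy three-item scale answered by five respondents.
toy_scale = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5], [1, 3, 3, 4, 4]]
r_own = corrected_item_scale_r(toy_scale, 0)
print(round(r_own, 3), r_own >= 0.40)  # item internal consistency criterion
```

With n = 1007 the two-standard-error margin under this approximation is about 0.06, so an item correlating 0.60 with its own scale and 0.50 with a competing scale would count as a scaling success.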
Reliability of scales
Internal consistency, measured by Cronbach’s α, is
summarized in Table 2. It ranged from 0.79 (SF
scale) to 0.95 (RP scale), exceeding, in all cases, the
0.70 standard for group level comparisons. The
correlations between scales ranged from 0.38 (MH
and RP) to 0.71 (MH and VT) and were always less
than their own reliability coefficient, satisfying the
basic criterion. This is firm evidence that each scale
measures a distinct concept [5].
Scale descriptive statistics
The full 0–100 range was observed in all eight
scales (Table 3). The scales which measure both
positive and negative aspects of well-being (GH,
VT and MH) produced lower mean scores, while
scales measuring health-related limitations (SF,
PF, RP, BP and RE) displayed higher mean
scores. The high median observed in all scales was
expected given that the sample was drawn from the
non-institutionalized general population. The
negative skewness indicated that scores were concentrated towards the positive end of the scales.
The limitation scales mentioned above had considerable ceiling effects and two of them, namely
RP and RE, had floor effects as well. GH, VT and
MH presented a wide distribution, with almost
negligible floor effects and very low ceiling effects.
Similar results were observed in studies from other
countries [11–14].
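Floor and ceiling effects are the percentages of respondents scoring at the lowest (0) and highest (100) possible value of a scale. A small sketch with made-up scores follows; the skewness estimator shown is the simple moment-based one, which is an assumption, since the paper does not state which estimator was used.

```python
def floor_ceiling(scores, lowest=0.0, highest=100.0):
    """Percentage of respondents at the minimum and maximum scale score."""
    n = len(scores)
    floor = 100.0 * sum(1 for s in scores if s == lowest) / n
    ceiling = 100.0 * sum(1 for s in scores if s == highest) / n
    return floor, ceiling

def skewness(scores):
    """Moment-based skewness: third central moment over the second
    central moment raised to the power 1.5."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    return m3 / m2 ** 1.5

toy_scores = [0, 100, 100, 100]  # hypothetical, clustered at the ceiling
print(floor_ceiling(toy_scores), skewness(toy_scores) < 0)
```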
Table 3. Descriptive statistics for the eight scales
Scale   Mean    95% CI         SD      Median   Range    Skewness   Floor (%)   Ceiling (%)
PF      80.76   79.17–82.34    25.62   90       0–100    -1.57      1.6         36.8
RP      79.74   77.41–82.07    37.72   100      0–100    -1.48      15.7        74.9
BP      72.98   71.02–74.94    31.66   84       0–100    -0.79      3.3         48.2
GH      67.46   66.00–68.91    23.54   72       0–100    -0.71      0.5         3.2
VT      66.53   65.14–67.91    22.39   70       0–100    -0.82      1.0         4.1
SF      82.05   80.31–83.79    28.12   100      0–100    -1.52      2.4         60.2
RE      81.53   79.28–83.78    36.31   100      0–100    -1.62      14.4        77.2
MH      68.23   66.91–69.54    21.26   72       0–100    -0.85      0.6         3.5
Abbreviations: PF – Physical Function; RP – Role Physical; BP – Bodily Pain; GH – General Health; VT – Vitality; SF – Social
Functioning; RE – Role Emotional; MH – Mental Health.
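The confidence intervals in Table 3 can be approximately reproduced from the reported means and standard deviations. The sketch below assumes a normal approximation (mean ± 1.96 × SD/√n) and n = 1007 for every scale, ignoring the small number of missing scale scores; the paper does not state the exact method, so small discrepancies in the last digit are expected.

```python
from math import sqrt

def normal_ci95(mean, sd, n):
    """Approximate 95% confidence interval for a mean (normal approximation)."""
    half_width = 1.96 * sd / sqrt(n)
    return mean - half_width, mean + half_width

# PF scale: mean 80.76, SD 25.62 (Table 3); n = 1007 is an assumption.
low, high = normal_ci95(80.76, 25.62, 1007)
print(round(low, 2), round(high, 2))  # close to the reported 79.17-82.34
```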
Known-groups comparisons
Table 4 summarizes known-groups comparisons,
with lower scores reflecting poorer health. The results confirmed the assumption that women report
worse health than men, a fact described and confirmed in many studies [1, 13–20]. Age was an
important health status factor and affected physical
health relatively more than mental health. Furthermore, differences were observed in all SF-36
scale scores with education as the differing criterion.
It is noteworthy that, in all eight scales, the differences between reported health for those having
completed only primary education (lowest scores)
and the university graduates (highest scores) were of
similar magnitude. Concerning marital status, single people reported better health and this was most
likely related to age. All the above mentioned differences were found to be statistically significant
(p < 0.0005).
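As an illustration of the parametric tests used for known-groups comparisons, the sketch below computes a one-way ANOVA F statistic from first principles. The groups and values are hypothetical, not study data. For two groups, F equals the square of the pooled-variance t statistic, so the same function covers the gender comparison. A p-value would be obtained from the F distribution with the returned degrees of freedom, for example via `scipy.stats.f_oneway`, which is not used here so that the example needs only the standard library.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical scale scores for three education groups.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```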
Table 4. Mean scores of SF-36 scales in relation to demographic and socioeconomic characteristics
Variable          PF      RP      BP      GH      VT      SF      RE      MH
Sex
  Male            84.92   85.83   79.67   70.86   72.50   88.09   87.60   73.13
  Female          77.12   74.41   67.22   64.49   61.32   76.85   76.21   63.93
Age
  18–24           94.99   92.56   79.24   79.56   76.14   88.63   86.96   71.32
  25–34           91.82   89.77   81.32   79.27   75.50   91.27   89.82   73.91
  35–44           86.60   87.50   76.78   72.05   67.86   87.95   86.74   69.24
  45–54           80.14   75.49   67.45   63.04   63.46   78.35   76.51   66.73
  55–64           74.82   76.04   71.17   60.85   62.93   80.52   78.19   66.93
  65+             54.86   55.39   60.36   47.91   52.19   63.88   68.78   60.10
Education
  Primary         62.55   59.31   58.64   53.05   51.59   66.17   65.20   58.35
  High            80.69   80.46   70.49   67.02   62.77   81.17   80.23   64.56
  Lyceum          85.86   85.06   76.33   71.45   71.28   85.28   84.58   70.29
  University      85.50   86.60   79.72   72.28   71.84   89.69   90.02   73.94
Marital status
  Single          91.0    90.94   78.76   77.41   74.71   88.63   86.56   71.45
  Married         79.65   78.67   72.90   65.86   65.46   83.0    82.49   68.63
  Divorced        78.97   83.58   75.64   65.75   69.04   79.77   82.0    67.83
  Widowed         51.75   44.24   50.80   44.03   42.35   52.20   56.73   51.46
Abbreviations: PF – Physical Function; RP – Role Physical; BP – Bodily Pain; GH – General Health; VT – Vitality; SF – Social
Functioning; RE – Role Emotional; MH – Mental Health.
Discussion
The aim of this study was the validation of the
Greek SF-36. We tested validity and reliability and
attempted to detect existing inequalities among
various subgroups. An interesting point was the
missing values rate, which characterizes the completeness of the data. In our study, the percentage
of missing values was only at item-level and an
explanation could be that personal interviews,
which were applied for the whole sample, helped
respondents in better understanding the questions.
Furthermore, concerning the item response frequencies, all of the choices were used by the
respondents, indicating that there were no problems with the translation of the response choices or
the associated items.
As for reliability and validity, we used item-level
and scale-level statistical methods. The results
were consistent with results from other countries
[9–20]. Items discriminated well across scales.
Cronbach’s α coefficient exceeded, in all scales, the
0.70 standard for group level comparisons indicating item homogeneity and internal consistency
of the scales. Furthermore the correlations between scales were always less than their own reliability coefficient, satisfying the basic criterion.
Expected score differences with respect to sex,
age, educational level and marital status were revealed. Men reported better health status than
women, age was an essential factor affecting
physical health more than mental health, low
educational levels were usually related to poorer
reported health and marital status was strongly
associated with age. It would be very interesting to
study these inequalities broken down into the two
genders, but it was not the target of our study at
this moment. All these differences were tested and
found statistically significant, a fact that reinforced
the construct validity of the Greek SF-36. As far as
the item and scale properties were concerned, we
found similarities with other studies. Mean scale
scores, standard deviations, and floor and ceiling
effects were appropriate for comparisons across
countries. The non-existence of differences in mean
scale scores between the Greek and other versions
was strong evidence in favor of the successful
translation of the SF-36 into the Greek language.
Concluding this paper, we would like to highlight
two points. First, this was the first attempt to
administer the SF-36 to such a large sample of the
Greek urban general population via personal
interview, as opposed to previous, smaller-scale
studies in Greece, in which self-administration was
applied. Second, an interesting challenge for future surveys would be the administration of the
SF-36 to a nationally representative sample of
Greece, including the suburban and rural population. Parallel cross-sectional surveys could provide
information about the health status of the population with acute or chronic diseases, and longitudinal studies could provide information on the responsiveness of the Greek
SF-36 to differences and changes in health status
over time.
Acknowledgements
This study was supported by grants from the
Greek Ministry of Health and Social Solidarity.
References
1. Tountas Y, Demakakos P, Yfantopoulos J, Aga J, Houliara L, Pavi E. The health related quality of life of the
employees in the Greek hospitals: Assessing how healthy
are the health workers. Health Qual Life Outcomes 2003; 1:
61.
2. Yfantopoulos J, Pierrakos G, Zanakis V. A comparative
study of the quality of life of patients with hepatitis C. Arch
Hellenic Med 2001; 18: 288–296.
3. Ware JE, Gandek B. Overview of the SF-36 Health Survey
and the international quality of life assessment (IQOLA)
Project. J Clin Epidemiol 1998; 51: 903–912.
4. Ware JE, Snow KK, Kosinski M, Gandek B. SF-36 Health
Survey Manual and Interpretation Guide. Boston: New
England Medical Center, 1993.
5. Ware JE, Gandek B. Methods for testing data quality,
scaling assumptions and reliability: The IQOLA Project
Approach. J Clin Epidemiol 1998; 51: 945–952.
6. Nunnally JC, Bernstein IH. Psychometric Theory, 3rd ed.
New York: McGraw-Hill, 1994.
7. Gandek B, Ware JE. Methods for validating and norming
translations of health status questionnaires: The IQOLA
Project Approach. J Clin Epidemiol 1998; 51: 953–959.
8. Gandek B, Ware JE, et al. Tests of data quality, scaling
assumptions and reliability of the SF-36 in Eleven Countries: Results from the IQOLA Project. J Clin Epidemiol
1998; 51: 1149–1158.
9. Shadbolt B, McCallum J, Singh M. Health outcomes by
self-report: Validity of the SF-36 among Australian hospital
patients. Qual Life Res 1997; 6: 343–352.
10. Havard Loge J, Kaasa S, Hjermstad J, Kvien TK. Translation and performance of the Norwegian SF-36 Health
Survey in patients with rheumatoid arthritis. I. Data quality, scaling assumptions, reliability and construct validity. J
Clin Epidemiol 1998; 51: 1069–1076.
11. Bjorner JB, Damsgaard M, Watt T, Groenvold M. Tests of
data quality, scaling assumptions and reliability of the
Danish SF-36. J Clin Epidemiol 1998; 51: 1001–1011.
12. Sullivan M, Karlsson J, Ware JE. The Swedish SF-36
Health Survey-I: Evaluation of data quality, scaling
assumptions, reliability and construct validity across general population in Sweden. Soc Sci Med 1995; 41: 1349–1358.
13. Apolone G, Mosconi P. The Italian SF-36 Health Survey:
Translation, validation and norming. J Clin Epidemiol
1998; 51: 1025–1036.
14. Brazier JE, Harper R, Jones NM, et al. Validating the SF-36
health survey questionnaire: New outcome measure for
primary care. Br Med J 1992; 305: 160–164.
15. Taft C, Karlsson J, Sullivan M. Performance of the Swedish
SF-36 version 2. Qual Life Res 2004; 13: 251–256.
16. Leplege A, Ecosse E, Verdier A, Perneger TV. The French
SF-36 Health Survey: Translation, cultural adaptation and
preliminary psychometric evaluation. J Clin Epidemiol
1998; 51: 1013–1023.
17. Garratt A, Ruta D, et al. The SF-36 health survey questionnaire: An outcome measure suitable for routine use
within the NHS. Br Med J 1993; 306: 1440–1444.
18. Jenkinson C, Coulter A, Wright L. Short Form 36 (SF36)
health survey questionnaire: Normative data for adults of
working age. Br Med J 1993; 306: 1437–1440.
19. Stansfeld SA, Roberts R, Foot SP. Assessing the validity of
the SF-36 General Health Survey. Qual Life Res 1997; 6:
217–224.
20. Thumboo J, et al. A community-based study of scaling
assumptions and construct validity of the English (UK) and
Chinese (HK) SF-36 in Singapore. Qual Life Res 2001; 10:
175–188.
Address for correspondence: Evelina Pappa, Faculty of Social
Sciences, Hellenic Open University, Riga Feraiou 169 &
Tsamadou, 26222, Patras, Greece
E-mail: [email protected]