© Springer 2005
Quality of Life Research (2005) 14: 1433–1438
DOI 10.1007/s11136-004-6014-y

Brief communication

Validating and norming of the Greek SF-36 Health Survey

Evelina Pappa, Nick Kontodimopoulos & Dimitris Niakas
Faculty of Social Sciences, Hellenic Open University, Patras, Greece (E-mail: [email protected])

Accepted in revised form 9 November 2004

Abstract

The main objective of this study was to validate the Greek SF-36 Health Survey and to provide general population normative data. The survey was administered to a stratified representative sample (n = 1426) of the general population residing in the broader Athens area and the response rate was 70.6%. Statistical analysis, according to documented procedures developed within the IQOLA Project, was performed. The missing value rate was very low, ranging from 0.1 to 1.3% at the item level. Multitrait scaling analysis confirmed the hypothesized scale structure of the SF-36. Cronbach’s α coefficient met the criterion (>0.70) for group analysis in all eight scales. Known group comparisons yielded consistent support of construct validity of the SF-36. Significant statistical differences in mean scores were observed in relation to demographic and social characteristics such as gender, age, education and marital status.

Key words: Multitrait scaling, Normative data, Quality of life, Reliability, SF-36, Validity

Introduction

In Greece, the use of the SF-36 Health Survey in quality-of-life studies, involving the general population or disease specific groups is, to date, very limited [1, 2]. This paper reports on the first, to our knowledge, study involving a stratified sample of the general population. We implemented the procedures documented in the second and third stages of the IQOLA Project [3]. The objective was to validate the SF-36 and provide normative data for Greece, facilitating the interpretation of the SF-36 scales and comparisons of health status scores from populations across countries.

Methods
We tested the validity and the reliability of the questionnaire, performed tests of scaling assumptions (item internal consistency and item discriminant validity) and used parametric tests (t-test, ANOVA) to ascertain the statistical significance of the observed differences.

Data collection

The study took place in May 2003 and involved a sample residing in the broader Athens area, where approximately 35% of the Greek population lives. Institutionalized people were excluded. Participants were stratified by socio-demographic characteristics, in proportion to the Greek urban population, using a three-stage sampling methodology. This stratification was designed to ensure that the sample was representative of this particular population. Specifically, in the first sampling stage, a random sample of 84 blocks of residences was selected according to information from the 1991 national census. In the second stage, for every block, households were selected by systematic sampling. In the third stage, for every household, a participant at least 18 years old was chosen by simple random sampling. In total, 1007 out of 1426 candidates (response rate 70.6%) agreed to participate.

The questionnaire contained the SF-36 (Greek version 1.0), additional questions covering health services utilization, satisfaction and cost, and a set of common socio-demographic questions at the end. The questionnaire was administered via interview in order to enhance subjects’ understanding of the questions and minimize missing values [4]. Item responses were coded, summed within each scale and transformed to a scale from 0 (worst possible health status) to 100 (best possible health status). Missing values were substituted according to the method recommended by the developers [4].
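The scoring step described above can be illustrated with a minimal sketch. It assumes the usual linear rescaling of a summed raw score onto 0–100 and, for missing items, a half-scale rule with person-mean substitution; the paper only cites the manual [4] for imputation, so that rule is an assumption here. The function name `transform_scale` and the example responses are hypothetical.

```python
from typing import Optional, Sequence


def transform_scale(
    item_scores: Sequence[Optional[float]],
    item_min: float,
    item_max: float,
) -> Optional[float]:
    """Return a 0-100 scale score, or None if too many items are missing."""
    answered = [s for s in item_scores if s is not None]
    # Assumed half-scale rule: score only if at least half the items are answered.
    if not answered or 2 * len(answered) < len(item_scores):
        return None
    # Assumed person-mean substitution for the unanswered items.
    person_mean = sum(answered) / len(answered)
    raw = sum(s if s is not None else person_mean for s in item_scores)
    lowest = item_min * len(item_scores)    # lowest possible raw score
    highest = item_max * len(item_scores)   # highest possible raw score
    return 100.0 * (raw - lowest) / (highest - lowest)


# Hypothetical respondent: ten items coded 1 (worst) to 3 (best), one missing.
pf_items = [3, 3, 2, 3, None, 3, 3, 2, 3, 3]
pf_score = transform_scale(pf_items, item_min=1, item_max=3)
print(round(pf_score, 2))
```

A respondent answering every item at the lowest category scores 0 and one answering every item at the highest category scores 100, which matches the anchors stated in the text.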
Analysis

The first step in evaluating the summated rating scales was the examination of missing and out-of-range data, which are often associated with translation problems [5]. Descriptive statistics were generated to evaluate data completeness and to characterize the response distributions. Reliability was tested via Cronbach’s α coefficient, and construct validity was investigated to determine the extent to which scores correlated with criteria based on theory [6, 7].

Tests of scaling assumptions examined the item-scale correlations and were used to confirm the hypothesized scale structure. These tests were: (1) item internal consistency, which is considered satisfactory when the correlation between an item and its hypothesized scale is at least 0.40, and (2) item discriminant validity, which is considered successful when the correlation between an item and its own scale is significantly higher, by two standard errors or more, than its correlation with other scales.

Tests of ‘known-groups’ validity were implemented to provide supporting information for interpreting scale scores and to assess the ability of the questionnaire to distinguish between subgroups of respondents known to differ in key socio-demographic or clinical variables. In this study, these variables included gender, age, education and marital status. Parametric tests (t-test and ANOVA) were performed to test the statistical significance of the observed differences.

Results

Sample characteristics

Our sample consisted of 46.6% men and 53.4% women, roughly representing the gender distribution of the Greek urban population according to the 1991 census. As for the level of education, 20.0% of the respondents had completed only primary education, 11.5% were high school graduates, 42.1% had completed 12-year education (lyceum) and 25.8% were university graduates. The sample was divided into six age groups, with an overall mean age of 45 years. Concerning marital status, 60.0% were married, 27.6% were single and 12.0% were divorced or widowed. This information is presented in Table 1.

Table 1. Sample socioeconomic and demographic characteristics compared to the Greek urban population

                      Sample (N = 1007)     Urban population aged 18+ (N = 4,573,914)
                      N         %           %
Sex
  Male                469       46.6        47.3
  Female              538       53.4        52.7
Age group
  18–24               144       14.3        15.0
  25–34               198       19.7        19.9
  35–44               188       18.6        18.6
  45–54               155       15.4        15.4
  55–64               155       15.3        15.2
  65+                 167       16.1        15.6
Education
  Primary school      201       20.0        53.4
  High school         116       11.5         9.3
  Lyceum              424       42.1        19.2
  University          259       25.8        18.1
  Missing values        7        0.7
Marital status
  Single              278       27.6        23.5
  Married             604       60.0        65.6
  Divorced             46        4.6         2.3
  Widowed              74        7.3         8.5
  Missing values        5        0.5

Table 2. Reliability coefficients and inter-scale correlations

        PF      RP      BP      GH      VT      SF      RE      MH
PF      0.93    0.57    0.48    0.67    0.60    0.54    0.46    0.40
RP              0.95    0.53    0.51    0.54    0.60    0.53    0.38
BP                      0.92    0.48    0.53    0.52    0.45    0.44
GH                              0.80    0.66    0.54    0.46    0.52
VT                                      0.82    0.65    0.55    0.71
SF                                              0.79    0.66    0.63
RE                                                      0.92    0.57
MH                                                              0.83

Cronbach’s α coefficients are presented in the diagonal.
Abbreviations: PF – Physical Function; RP – Role Physical; BP – Bodily Pain; GH – General Health; VT – Vitality; SF – Social Functioning; RE – Role Emotional; MH – Mental Health.

Data completeness

Concerning the item response frequency for all items (details can be obtained from the authors), most respondents scored in the favorable health categories, as expected from a general population. All of the response choices were used, providing supporting evidence that there were no problems with their translation or with that of the associated items. The missing value rate was very low, ranging from 0.1% for items RF2, PF4, GH1 and VT1 to 1.3% for item VT2, with an overall mean missing value rate of 0.46%. At the scale level, the percentage of missing values ranged from 0.2% for BP to 2.9% for PF. Fundamental assumptions of closeness of correlations and variances of the items, within their hypothesized scales, were generally fulfilled.
Correlations demonstrating the widest range were in the PF scale (PF1 compared to PF10), the GH scale (GH1 compared to GH2), the VT scale (VT4 compared to VT2) and the MH scale (MH4 compared to MH2). The largest differences between standard deviations were observed in the PF scale (0.4) and the GH scale (above 0.25). This result is consistent with findings from other studies [8–10] and is attributed mostly to the heterogeneity of the PF scale.

Tests of scaling assumptions

Item-scale correlations were significantly higher for items with their hypothesized scales than with competing scales, and the 0.40 criterion was satisfied for all items. The item discriminant validity test, after correction for overlap, indicated maximum (100%) scaling success rates for five of the eight scales (PF, RP, BP, SF and RE) and rates over 90% for the other three (GH, VT and MH). Only one probable scaling failure was observed, namely in the MH scale, where item MH5 appeared to correlate slightly better with a competing scale (VT) (details can be obtained from the authors).

Reliability of scales

Internal consistency, measured by Cronbach’s α, is summarized in Table 2. It ranged from 0.79 (SF scale) to 0.95 (RP scale), exceeding, in all cases, the 0.70 standard for group-level comparisons. The correlations between scales ranged from 0.38 (MH and RP) to 0.71 (MH and VT) and were always lower than the reliability coefficients of the scales involved, satisfying the basic criterion. This is firm evidence that each scale measures a distinct concept [5].

Scale descriptive statistics

The full 0–100 range was observed in all eight scales (Table 3). The scales which measure both positive and negative aspects of well-being (GH, VT and MH) produced lower mean scores, while scales measuring health-related limitations (SF, PF, RP, BP and RE) displayed higher mean scores. The high median observed in all scales was expected, given that the sample was drawn from the non-institutionalized general population.
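As an illustration of how the descriptive columns reported in Table 3 could be obtained from a vector of 0–100 scale scores, a minimal sketch follows. The paper does not state which skewness formula its software used; the moment-based coefficient below is one common choice and is an assumption, as are the function name `describe_scale` and the toy scores.

```python
from statistics import mean, median
from typing import Dict, List


def describe_scale(scores: List[float]) -> Dict[str, float]:
    """Summarize one scale; scores must not all be identical."""
    n = len(scores)
    m = mean(scores)
    # Second and third central moments (population form).
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    return {
        "mean": m,
        "median": median(scores),
        # Assumed definition: moment coefficient of skewness, m3 / m2^(3/2).
        "skewness": m3 / m2 ** 1.5,
        # Floor and ceiling: percentage of respondents at the scale extremes.
        "floor_pct": 100.0 * sum(x == 0 for x in scores) / n,
        "ceiling_pct": 100.0 * sum(x == 100 for x in scores) / n,
    }


# Hypothetical scores, deliberately bunched at the top of the scale.
toy_scores = [0, 50, 75, 100, 100, 100, 100, 100]
summary = describe_scale(toy_scores)
print(summary)
```

With scores bunched at 100, the median sits at the top of the scale and the skewness comes out negative, which is the pattern discussed for the limitation scales.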
The negative skewness indicates that scores were concentrated toward the positive end of the scales. The abovementioned limitation scales had considerable ceiling effects, and two of them, namely RP and RE, had floor effects as well. GH, VT and MH presented a wide distribution, with almost negligible floor effects and very low ceiling effects. Similar results were observed in studies from other countries [11–14].

Table 3. Descriptive statistics for the eight scales

        Mean     95% CI         SD       Median   Range    Skewness   Floor (%)   Ceiling (%)
PF      80.76    79.17–82.34    25.62     90      0–100    −1.57        1.6        36.8
RP      79.74    77.41–82.07    37.72    100      0–100    −1.48       15.7        74.9
BP      72.98    71.02–74.94    31.66     84      0–100    −0.79        3.3        48.2
GH      67.46    66–68.91       23.54     72      0–100    −0.71        0.5         3.2
VT      66.53    65.14–67.91    22.39     70      0–100    −0.82        1.0         4.1
SF      82.05    80.31–83.79    28.12    100      0–100    −1.52        2.4        60.2
RE      81.53    79.28–83.78    36.31    100      0–100    −1.62       14.4        77.2
MH      68.23    66.91–69.54    21.26     72      0–100    −0.85        0.6         3.5

Abbreviations: PF – Physical Function; RP – Role Physical; BP – Bodily Pain; GH – General Health; VT – Vitality; SF – Social Functioning; RE – Role Emotional; MH – Mental Health.

Known-groups comparisons

Table 4 summarizes known-groups comparisons, with lower scores reflecting poorer health. The results confirmed the assumption that women report worse health than men, a finding described and confirmed in many studies [1, 13–20]. Age was an important health status factor and affected physical health relatively more than mental health. Furthermore, differences by education were observed in all SF-36 scale scores. It is noteworthy that, in all eight scales, the differences in reported health between those having completed only primary education (lowest scores) and the university graduates (highest scores) were of similar magnitude. Concerning marital status, single people reported better health, and this was most likely related to age.
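The subgroup comparisons rely on the t-test for two groups and one-way ANOVA for more than two. As a rough sketch of what the ANOVA computes, the following derives the F statistic from between-group and within-group sums of squares. The group labels and scores are hypothetical toy values, not data from this study, and the function name `one_way_anova_f` is illustrative.

```python
from statistics import mean
from typing import Dict, List, Tuple


def one_way_anova_f(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values()
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scale scores for three education groups (toy data only).
toy_groups = {
    "primary": [50.0, 60.0, 70.0],
    "lyceum": [70.0, 80.0, 90.0],
    "university": [80.0, 90.0, 100.0],
}
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```

The p-value would then be read from the F distribution with these degrees of freedom; a statistics package such as SciPy (`scipy.stats.f_oneway`) returns both the statistic and the p-value. With exactly two groups, F equals the square of the pooled-variance t statistic, so the two tests agree.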
All of the subgroup differences shown in Table 4 were statistically significant (p < 0.0005).

Table 4. Mean scores of SF-36 scales in relation to demographic and socioeconomic characteristics

Variable          PF       RP       BP       GH       VT       SF       RE       MH
Sex
  Male            84.92    85.83    79.67    70.86    72.50    88.09    87.60    73.13
  Female          77.12    74.41    67.22    64.49    61.32    76.85    76.21    63.93
Age
  18–24           94.99    92.56    79.24    79.56    76.14    88.63    86.96    71.32
  25–34           91.82    89.77    81.32    79.27    75.50    91.27    89.82    73.91
  35–44           86.60    87.50    76.78    72.05    67.86    87.95    86.74    69.24
  45–54           80.14    75.49    67.45    63.04    63.46    78.35    76.51    66.73
  55–64           74.82    76.04    71.17    60.85    62.93    80.52    78.19    66.93
  65+             54.86    55.39    60.36    47.91    52.19    63.88    68.78    60.10
Education
  Primary         62.55    59.31    58.64    53.05    51.59    66.17    65.20    58.35
  High            80.69    80.46    70.49    67.02    62.77    81.17    80.23    64.56
  Lyceum          85.86    85.06    76.33    71.45    71.28    85.28    84.58    70.29
  University      85.50    86.60    79.72    72.28    71.84    89.69    90.02    73.94
Marital status
  Single          91.0     90.94    78.76    77.41    74.71    88.63    86.56    71.45
  Married         79.65    78.67    72.90    65.86    65.46    83.0     82.49    68.63
  Divorced        78.97    83.58    75.64    65.75    69.04    79.77    82.0     67.83
  Widowed         51.75    44.24    50.80    44.03    42.35    52.20    56.73    51.46

Abbreviations: PF – Physical Function; RP – Role Physical; BP – Bodily Pain; GH – General Health; VT – Vitality; SF – Social Functioning; RE – Role Emotional; MH – Mental Health.

Discussion

The aim of this study was the validation of the Greek SF-36. We tested validity and reliability and attempted to detect existing inequalities among various subgroups. An interesting point was the missing value rate, which characterizes the completeness of the data. In our study, missing values occurred only at the item level; one explanation could be that the personal interviews, which were used for the whole sample, helped respondents to understand the questions better.
Furthermore, concerning the item response frequencies, all of the choices were used by the respondents, indicating that there were no problems with the translation of the response choices or the associated items.

As for reliability and validity, we used item-level and scale-level statistical methods. The results were consistent with results from other countries [9–20]. Items discriminated well across scales. Cronbach’s α coefficient exceeded, in all scales, the 0.70 standard for group-level comparisons, indicating item homogeneity and internal consistency of the scales. Furthermore, the correlations between scales were always lower than the reliability coefficients of the scales involved, satisfying the basic criterion.

Expected score differences with respect to sex, age, educational level and marital status were revealed. Men reported better health status than women; age was an essential factor, affecting physical health more than mental health; low educational levels were usually related to poorer reported health; and marital status was strongly associated with age. It would be very interesting to study these inequalities separately for each gender, but this was beyond the scope of the present study. All these differences were tested and found statistically significant, which reinforces the construct validity of the Greek SF-36.

As far as the item and scale properties were concerned, we found similarities with other studies. Mean scale scores, standard deviations, and floor and ceiling effects were appropriate for comparisons across countries. The absence of differences in mean scale scores between the Greek and other versions was strong evidence in favor of the successful translation of the SF-36 into the Greek language.

In concluding this paper, we would like to focus on two interesting points.
This was the first attempt to administer the SF-36 to such a large sample of the Greek urban general population via personal interview, as opposed to previous, smaller-scale studies in Greece, in which self-administration was applied. Secondly, an interesting challenge for future surveys would be the administration of the SF-36 to a nationally representative sample of Greece, including the suburban and rural population. Parallel cross-sectional surveys could provide information about the health status of the population with acute or chronic diseases, and longitudinal studies could provide information on the responsiveness of the Greek SF-36 to differences and changes in health status over time.

Acknowledgements

This study was supported by grants from the Greek Ministry of Health and Social Solidarity.

References

1. Tountas Y, Demakakos P, Yfantopoulos J, Aga J, Houliara L, Pavi E. The health related quality of life of the employees in the Greek hospitals: Assessing how healthy are the health workers. Health Qual Life Outcomes 2003; 1: 61.
2. Yfantopoulos J, Pierrakos G, Zanakis V. A comparative study of the quality of life of patients with hepatitis C. Arch Hellenic Med 2001; 18: 288–296.
3. Ware JE, Gandek B. Overview of the SF-36 Health Survey and the international quality of life assessment (IQOLA) Project. J Clin Epidemiol 1998; 51: 903–912.
4. Ware JE, Snow KK, Kosinski M, Gandek B. SF-36 Health Survey Manual and Interpretation Guide. Boston: New England Medical Center, 1993.
5. Ware JE, Gandek B. Methods for testing data quality, scaling assumptions and reliability: The IQOLA Project Approach. J Clin Epidemiol 1998; 51: 945–952.
6. Nunnally JC, Bernstein IH. Psychometric Theory, 3rd ed. New York: McGraw-Hill, 1994.
7. Gandek B, Ware JE. Methods for validating and norming translations of health status questionnaires: The IQOLA Project Approach. J Clin Epidemiol 1998; 51: 953–959.
8. Gandek B, Ware JE, et al. Tests of data quality, scaling assumptions and reliability of the SF-36 in Eleven Countries: Results from the IQOLA Project. J Clin Epidemiol 1998; 51: 1149–1158.
9. Shadbolt B, McCallum J, Singh M. Health outcomes by self-report: Validity of the SF-36 among Australian hospital patients. Qual Life Res 1997; 6: 343–352.
10. Havard Loge J, Kaasa S, Hjermstad J, Kvien TK. Translation and performance of the Norwegian SF-36 Health Survey in patients with rheumatoid arthritis. I. Data quality, scaling assumptions, reliability and construct validity. J Clin Epidemiol 1998; 51: 1069–1076.
11. Bjorner JB, Damsgaard M, Watt T, Groenvold M. Tests of data quality, scaling assumptions and reliability of the Danish SF-36. J Clin Epidemiol 1998; 51: 1001–1011.
12. Sullivan M, Karlsson J, Ware JE. The Swedish SF-36 Health Survey-I: Evaluation of data quality, scaling assumptions, reliability and construct validity across general population in Sweden. Soc Sci Med 1995; 41: 1349–1358.
13. Apolone G, Mosconi P. The Italian SF-36 Health Survey: Translation, validation and norming. J Clin Epidemiol 1998; 51: 1025–1036.
14. Brazier JE, Harper R, Jones NM, et al. Validating the SF-36 health survey questionnaire: New outcome measure for primary care. Br Med J 1992; 305: 160–164.
15. Taft C, Karlsson J, Sullivan M. Performance of the Swedish SF-36 version 2. Qual Life Res 2004; 13: 251–256.
16. Leplege A, Ecosse E, Verdier A, Perneger TV. The French SF-36 Health Survey: Translation, cultural adaptation and preliminary psychometric evaluation. J Clin Epidemiol 1998; 51: 1013–1023.
17. Garratt A, Ruta D, et al. The SF-36 health survey questionnaire: An outcome measure suitable for routine use within the NHS. Br Med J 1993; 306: 1440–1444.
18. Jenkinson C, Coulter A, Wright L. Short Form 36 (SF36) health survey questionnaire: Normative data for adults of working age. Br Med J 1993; 306: 1437–1440.
19. Stansfeld SA, Roberts R, Foot SP. Assessing the validity of the SF-36 General Health Survey. Qual Life Res 1997; 6: 217–224.
20. Thumboo J, et al. A community-based study of scaling assumptions and construct validity of the English (UK) and Chinese (HK) SF-36 in Singapore. Qual Life Res 2001; 10: 175–188.

Address for correspondence: Evelina Pappa, Faculty of Social Sciences, Hellenic Open University, Riga Feraiou 169 & Tsamadou, 26222, Patras, Greece
E-mail: [email protected]