Modelling missing values in cross-national surveys: a latent variable approach

M. Katsikatsou, J. Kuha and I. Moustaki
London School of Economics and Political Science

Workshop on Cross-National Surveys: Methods of Design and Analysis

Katsikatsou, Kuha, Moustaki   Modelling item non-response   14th of December 2014

Outline

1 An Introduction to Latent Variable Models.
2 Latent Variable Models for Multi-group Complete Binary Data.
3 Variables and Joint Distributions.
4 Various Model Specifications for Handling Item Non-response.
5 An Application from the European Social Survey.

Measurement and Structural Models

1 Many theories in behavioral and social sciences are formulated in terms of theoretical constructs that are not directly observed or measured: prejudice, ability, radicalism, motivation, wealth.
2 A construct is measured through one or more observable indicators (questionnaire items); this is the measurement model.
3 The purpose of a measurement model is to describe how well the observed indicators serve as a measurement instrument for the constructs, also known as latent variables.
4 In some cases a concept may be represented by a single latent variable, but concepts are often multidimensional in nature and so involve more than one latent variable.
5 Subject-matter theories and research questions usually concern relationships among the latent variables, and perhaps also observed explanatory variables (structural models).

Application

2008 European Social Survey.
Countries selected: Denmark, Great Britain, The Netherlands, Poland.
Three questions are selected which aim to measure attitudes towards receivers of welfare provision:
- Most unemployed people do not really try to find a job.
- Many people manage to obtain benefits and services to which they are not entitled.
- Employees often pretend they are sick in order to stay at home.

Response options (5-point scale): Agree strongly (negative attitude) to Disagree strongly (positive attitude).
Missing categories: Refusal, Don't know, No answer.

Aim of the analysis

- Valid comparisons of the latent variable 'attitude towards receivers of welfare' among the countries, taking into account possible differences in the measurement of the latent variable across groups (measurement invariance) and the effect of item non-response.
- Measurement invariance will be assumed.
- Various model specifications are proposed for the missing data mechanism.

Family of Latent Variable Models

                    Manifest variables
Latent variables    Metrical                  Categorical             Mixed
Metrical            factor analysis           latent trait analysis   latent trait analysis
Categorical         latent profile analysis   latent class analysis   latent class analysis
Mixed               Hybrid models

Bartholomew, D. J., Knott, M. and Moustaki, I. (2011). Latent Variable Models and Factor Analysis: A Unified Approach. Wiley.
Skrondal, A. and Rabe-Hesketh, S. (2004). Generalized Latent Variable Modelling: Multilevel, Longitudinal, and Structural Equation Models. Boca Raton, FL: Chapman and Hall/CRC.

General scope and Notation

- Observed (manifest) variables, or items, are denoted by $Y = (Y_1, Y_2, \dots, Y_p)'$.
- Latent variables are denoted by $\eta = (\eta_1, \eta_2, \dots, \eta_q)'$. Latent variables can be continuous, discrete or mixed.
- Covariates, such as group variables, gender and age, are denoted by $X = (X_1, \dots, X_c)'$.
- Response/non-response indicators are denoted by $R = (R_1, R_2, \dots, R_p)'$, where $R_j = 1$ if $Y_j$ is observed and $R_j = 0$ if $Y_j$ is not observed. This stochastic vector contains all the information about the missing patterns.
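As an illustration of the notation above, the following minimal sketch builds the indicator matrix R from raw item responses and tabulates the missing patterns. The numeric coding is an assumption made for this example only (1 to 5 for substantive answers, 7, 8 and 9 for Refusal, Don't know and No answer); the function names and the toy data are hypothetical and are not taken from the slides.

```python
import numpy as np

# Assumed coding, for illustration only: 1-5 are substantive answers,
# 7, 8, 9 stand for Refusal, Don't know and No answer.
MISSING_CODES = (7, 8, 9)


def response_indicators(y_raw):
    """Return R with R[j, i] = 1 if item i of respondent j is observed, else 0."""
    y_raw = np.asarray(y_raw)
    return (~np.isin(y_raw, MISSING_CODES)).astype(int)


def missing_patterns(r):
    """Count how often each distinct row of R (missing pattern) occurs."""
    patterns, counts = np.unique(r, axis=0, return_counts=True)
    return {tuple(int(v) for v in p): int(c) for p, c in zip(patterns, counts)}


# Toy data: 5 respondents, 3 items (hypothetical values).
y_raw = np.array([
    [1, 2, 5],
    [3, 8, 4],
    [7, 7, 9],
    [2, 2, 2],
    [4, 8, 1],
])

R = response_indicators(y_raw)
patterns = missing_patterns(R)
print(R)
print(patterns)
```

With p items there are at most 2^p distinct patterns, which is the multinomial distribution of R that a model for the response indicators has to describe.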
Important features of our approach

- It allows information about the unobserved part of Y to be inferred from the observed part of Y, since the manifest variables are expected to be correlated. The complete data are $Y_C = (Y_{obs}, Y_{mis})$.
- An underlying model is assumed for the relationships among the variables being measured. Specifically, a single continuous latent variable $\eta$ is assumed to be responsible for the dependencies among the Ys.
- A discrete latent dimension, response propensity, is assumed, on which individuals in the population vary.
- A latent trait model and a latent class model therefore work together, so that the probability of responding to an item varies not only with an individual's position on the latent variable $\eta$ but also with their position on the response propensity dimension.
- Together, these two components produce a non-ignorable non-response model.

Model specification for multi-group complete binary data

Suppose that each respondent belongs to one of G groups. The group is treated as a fixed and observed explanatory variable for $\eta$ and Y. As $\eta$ is unobserved, any inference will be based on the conditional distribution of Y given the group and other covariates:

\[ P(Y = y \mid X) = \int P(Y \mid \eta)\, p(\eta \mid X)\, d\eta \qquad (1) \]

Under conditional independence:

\[ P(Y \mid \eta) = \prod_{i=1}^{p} [\pi_i(\eta)]^{y_i} [1 - \pi_i(\eta)]^{1 - y_i}, \quad Y_i = 0, 1, \]

where $\pi_i(\eta) = P(Y_i = 1 \mid \eta)$.

Measurement model: $P(Y \mid \eta)$
Structural model: $p(\eta \mid X)$
Here X includes the group variable and other covariates.

Latent Trait Model

In a latent trait model with X containing only group information, the latent variable is distributed as

\[ \eta \sim N(m^{(g)}, \phi^{(g)}), \quad g = 1, \dots, G. \]

This is the structural model. For the measurement model, we use the logistic model

\[ \operatorname{logit}[\pi_i(\eta)] = \tau_i + \alpha_i \eta, \quad i = 1, \dots, p, \qquad (2) \]

where $\tau_i$ and $\alpha_i$ are the intercept and loading parameter, respectively, taken here to be invariant across groups (measurement invariance).

Estimation

For a random sample of size n, the log-likelihood function is

\[ \ell(\theta) = \sum_{j=1}^{n} \log P(Y = y_j \mid X_j; \theta), \qquad (3) \]

where $\theta$ denotes the parameters of the model.

- Maximum likelihood or Bayesian estimation can be applied.
- The most commonly used maximization algorithms are E-M and Newton-Raphson.
- Numerical methods for approximating the integrals are also needed.

Variables and their Joint Distribution

Conditional on the observed covariates X, the joint distribution of the other variables can be written as

\[ p(R, Y_C, \eta \mid X) = p(R \mid Y_C, \eta, X)\, p(Y_C \mid \eta, X)\, p(\eta \mid X), \qquad (4) \]

where $p(\cdot \mid \cdot)$ denotes a conditional probability function or probability density function. We will refer to $p(R \mid Y_C, \eta, X)$, $p(Y_C \mid \eta, X)$ and $p(\eta \mid X)$ as the non-response model, measurement model and structural model, respectively.

As $\eta$ and $Y_{mis}$ are not observed, the conditional distribution of the observed variables is obtained from (4) as

\[ p(R, Y \mid X) = \int p(R \mid Y_C, \eta, X)\, p(Y_C \mid \eta, X)\, p(\eta \mid X)\, d\eta\, dY_{mis}, \qquad (5) \]

where the integrals are over the possible values of $\eta$ and $Y_{mis}$.

Variables and their Joint Distribution, cont'd

Finally, we make some further assumptions which reduce (5) to

\[ p(R, Y \mid X) = \int p(R \mid \eta, X) \Big[ \prod_{i}^{p} p(Y_i \mid \eta) \Big] p(\eta \mid X)\, d\eta. \qquad (6) \]

Assumptions:
- Measurement invariance
- Conditional independence
- Missingness depends on $\eta$ and covariates X

Assumptions about missingness: ignorable non-response

- Missingness does not depend on the variable of interest but depends on covariates (MAR): $p(R \mid \eta, X) = p(R \mid X)$.
- Missingness does not depend on the variable of interest or on covariates (MCAR): $p(R \mid \eta, X) = p(R)$.
- Under MCAR or MAR, missing values can be ignored with no effect on the inference about $\eta$. In the MAR case, ML estimation and inference methods should be applied.

Assumptions about missingness: non-ignorable non-response

- Missingness depends on the variable of interest, e.g. an attitude.
- If we ignore it (treat it as random), the inference about the variable of interest is likely to be biased.
- To avoid bias in inference, include the missing data mechanism in the model.
- To avoid a confounding correlation between $\eta$ and R, one needs to condition on all the important covariates.
- A sufficiently rich and appropriate model needs to be chosen for the missing data mechanism.

Models proposed for $p(R \mid \eta, X)$

- The missing indicators are summarised by a latent variable $\xi$ that measures response propensity.
- The response propensity latent variable can be assumed to be continuous (latent trait model) or discrete (latent class model).
- Emphasis is also placed on specifying a rich model for the response/non-response indicators.
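The contrast between ignorable and non-ignorable non-response can be illustrated with a small simulation. This is a sketch under assumed values, not the authors' analysis: one binary item follows a logistic latent trait model with a hypothetical intercept of 0 and loading of 1.5, and the probability of responding is logistic in $\eta$ with a hypothetical coefficient `beta1`. Setting `beta1 = 0` gives MCAR, while a non-zero value makes missingness depend on $\eta$.

```python
import numpy as np


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def simulate(n, beta1, seed=0):
    """Simulate one binary item with non-response that may depend on eta.

    Returns (proportion of y = 1 among everyone,
             proportion of y = 1 among respondents only,
             response rate).
    All numeric constants are assumptions chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)                    # latent attitude
    y = rng.random(n) < expit(0.0 + 1.5 * eta)      # item response given eta
    r = rng.random(n) < expit(1.0 + beta1 * eta)    # response indicator given eta
    return float(y.mean()), float(y[r].mean()), float(r.mean())


# MCAR: responding does not depend on eta.
full_mcar, obs_mcar, rate_mcar = simulate(200_000, beta1=0.0)

# Non-ignorable: people with high eta are less likely to respond.
full_nmar, obs_nmar, rate_nmar = simulate(200_000, beta1=-1.5)

print(f"MCAR:          full={full_mcar:.3f}  respondents={obs_mcar:.3f}")
print(f"non-ignorable: full={full_nmar:.3f}  respondents={obs_nmar:.3f}")
```

Under these assumptions the respondents-only proportion stays close to the full-sample proportion in the MCAR case, but falls clearly below it when responding depends on $\eta$, which is the bias that modelling the missing data mechanism is meant to remove.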
Diagrammatic representation of a model for missing data, Holman and Glas (2005)

[Figure: path diagram; the visible node labels are y1, y2, ..., R1, R2, ..., η and ξ.]

Diagrammatic representation of a model for missing data, Knott, Albanese, Galbraith (1990), O'Muircheartaigh and Moustaki (1999), Moustaki and Knott (2000)

[Figure: path diagram; the visible node labels are y1, y2, ..., R1, R2, ..., η, ξ and x.]

Response propensity is treated as continuous: Latent Trait Model

For each binary missing data indicator $R_i$, dropping the group variable:

\[ \operatorname{logit}[\pi_i(\eta, \xi, x)] = \beta_{0,i} + \beta_{1,i}\,\eta + \beta_{2,i}\,\xi + \sum_{s=1}^{S} \gamma_{s,i}\, x_s, \quad i = 1, \dots, p, \qquad (7) \]

where $\pi_i(\eta, \xi, x) = P(R_i = 1 \mid \eta, \xi, x)$, and $\beta_{0,i}$, $\beta_{1,i}$, $\beta_{2,i}$ and $\gamma_{s,i}$ are the intercept, loading parameters and regression coefficients, respectively, of the model for the i-th response propensity item.

The parameters $\beta_{1,i}$ provide information on non-ignorability separately for each item.

The proposed model for missing data, η is continuous and ξ is discrete

[Figure: path diagram; the visible node labels are y1, y2, ..., R1, R2, ..., η, ξ and x.]

Model specification: Latent Class Model

- The latent class variable is denoted by $\xi$, with K latent classes, where $K \ll 2^p$.
- The latent class model approximates the observed multinomial distribution of R increasingly well as the number of latent classes increases.
- The number of latent classes needs to be decided based on model fit criteria such as AIC and BIC.

Two ways of looking at the missing data mechanism model

Version A
Use the latent class membership (respondent/non-respondent latent classes) as a predictor for the mean and the variance of $\eta$.
Such an interpretation is plausible in real data. For example, a high level of non-response is very likely to indicate a marked lack of an ability, extreme attitudes, etc.

Version B
Use the attitude latent variable $\eta$ as a predictor for the latent class membership. For example, respondents with more liberal views might be less likely to respond to questions about immigration. This definition is more in line with the literature on missing data.

Find a good model for the response indicators, the single group case

Fit a latent class model to the missing indicators R and determine the number of latent classes in each country.

The measurement model for the indicators in R under conditional independence and measurement invariance:

\[ P(R = r \mid X = x) = \sum_{k=1}^{K} P(R \mid \xi = k)\, P(\xi = k \mid X = x) \qquad (8) \]

The structural part of the model: it is important to consider all potential covariates that affect both $\eta$ and $\xi$.

\[ P(\xi = k \mid X = x) = \frac{\exp(\alpha_{0|k} + \gamma_{\xi}' X)}{\sum_{k=1}^{K} \exp(\alpha_{0|k} + \gamma_{\xi}' X)}, \qquad (9) \]

where $\alpha_{0|K} = 0$ for identification purposes.

Testing for ignorability - the multigroup case

The test for MCAR:

\[ p^{(g)}(\xi = k \mid X = x, \eta) = p^{(g)}(\xi = k), \qquad (10) \]

The test for MAR:

\[ p^{(g)}(\xi = k \mid X = x, \eta) = p^{(g)}(\xi = k \mid X = x). \qquad (11) \]

Testing for non-ignorability - the multigroup case

\[ p^{(g)}(\xi = k \mid X = x, \eta), \qquad (12) \]

\[ P^{(g)}(\xi = k \mid X = x, \eta) = \frac{\exp(\alpha_{0|k}^{(g)} + \gamma_{\xi}^{(g)\prime} X + \lambda_{\xi}^{(g)} \eta)}{\sum_{k=1}^{K} \exp(\alpha_{0|k}^{(g)} + \gamma_{\xi}^{(g)\prime} X + \lambda_{\xi}^{(g)} \eta)}, \qquad (13) \]

\[ \eta^{(g)} = \lambda_{\eta}^{(g)\prime} X + \epsilon^{(g)}. \qquad (14) \]

European Social Survey: a study of non-response

Scale on attitudes towards welfare:
- Most unemployed people do not really try to find a job.
- Many people manage to obtain benefits and services to which they are not entitled.
- Employees often pretend they are sick in order to stay at home.

Response alternatives (5-point scale): Agree strongly; Agree; Neither agree nor disagree; Disagree; Disagree strongly.
Missing categories: Refusal, Don't know, No answer.

Regression of ξ on covariates and attitude, analysis conducted separately in each country

             FI        DK        GB       FR        NL       PL        GR
Intercept    4.59      3.92      3.48     3.82      3.76     2.53      3.10
Age→ξ        -0.50**   -0.38**   -0.27*   -0.36**   0.08     -0.27**   -0.03
Ed L3→ξ      1.09      2.12*     2.02     0.11      1.65**   1.85***   1.44**
Ed L45→ξ     1.00      1.51*     0.40     0.65      1.87**   1.78***   1.85*
F→ξ          -0.74     -1        -0.53    -0.54     -0.02    -0.74*    -0.75*
η→ξ          0.04      -0.29     -0.49*   -0.54*    -1.17*   -1.06**   0.01

***: p-value<0.001; **: p-value≤0.01; *: p-value≤0.05

ESS: Multigroup model

[Figure: path diagram of the multigroup model; the visible node labels are Country, Age, Education, Gender, Q1, Q2, Q3, R1, R2, R3, η and ξ.]

Parameter estimates, multi-group model

Measurement model for the attitudinal items (all countries):

$\hat{\lambda}_1$ = 0.52***   $\hat{\lambda}_2$ = 0.50***   $\hat{\lambda}_3$ = 0.49***

Regression of attitude on covariates:

      Country→η   Age→η   Ed L3→η   Ed L45→η   F→η
DK    0 (fixed)   -0.02   0.56***   1.09***    0.10
GB    -1.12***    0.01    0.11      0.46***    -0.05
NL    -0.35**     0.05*   0.36***   0.92***    0.03
PL    -0.96***    0.01    -0.10     -0.10      0.15*

***: p-value<0.001; **: p-value≤0.01; *: p-value≤0.05

Regression of ξ on covariates and attitude

             DK          GB         NL       PL
Intercept    3.86***
Country→ξ    0 (fixed)   -1.14      -0.66    -2.32**
Age→ξ        -0.38**     -0.25***   0.06     -0.27**
Ed L3→ξ      2.09*       2.14*      1.54**   1.84***
Ed L45→ξ     1.55*       0.38       1.62**   1.75***
F→ξ          -1.01       -0.48      -0.07    -0.73*
η→ξ          -0.28       -0.50*     -0.85*   -1.01**

Ongoing research
- Test other model specifications, such as a continuous response propensity and discrete attitudes.
- Perform a sensitivity analysis that examines the effect of ignoring non-ignorable non-response on the structural part of the model.
- Perform a sensitivity analysis that examines the effect of various models for missing data on the structural and measurement models.

References

Knott, M., Albanese, M. T. and Galbraith, J. (1990). Scoring attitudes to abortion. The Statistician, 40, 217-223.
O'Muircheartaigh, C. and Moustaki, I. (1999). Symmetric pattern models: a latent variable approach to item non-response in attitude scales. Journal of the Royal Statistical Society, Series A, 162, 177-194.
Moustaki, I. and Knott, M. (2000). Weighting for item non-response in attitude scales using latent variable models with covariates. Journal of the Royal Statistical Society, Series A, 163(3), 445-459.
Holman, R. and Glas, C. A. W. (2005). Modelling non-ignorable missing-data mechanisms with item response theory models. British Journal of Mathematical and Statistical Psychology, 58, 1-17.
Katsikatsou, M., Kuha, J. and Moustaki, I. (in preparation). Multigroup data and item non-response: a general model framework.

Thank You and Many Good Wishes for 2015