Chapter 15: Dichotomous IRT Models
© 2014 Patrick Mair, Psych 3490, Unit 15: Dichotomous IRT Models

Today’s Menu

Today we introduce parametric IRT models for dichotomous (binary) data:
- Rasch model: concepts, formulation, implications.
- Birnbaum models: two- and three-parameter logistic model.
- Extensions: Bayesian IRT, IRT mixed effects framework, restricted models, longitudinal models.

Let’s Repeat

In the last unit we introduced nonparametric IRT models (Mokken scales):
- Monotone homogeneity model (MHM): ordering persons on the latent trait.
- Double monotonicity model (DMM): ordering persons and items on the latent trait.

An important feature of Mokken scaling is that we can explore the monotonicity of the IRFs (ICCs).

Parametric models are typically logit models that force the ICCs to be logistic.
- Parametric models map item and person parameters onto a common trait continuum (interval scaled).
- We can perform all sorts of statistical tests.

[ Dichotomous IRT Models ] The Rasch Model

Georg Rasch (1901–1980)

Rasch was a Danish mathematician and philosopher who revolutionized psychometrics in the 1960s. Most important publications:
- Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
- Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, IV, pp. 321–334. Berkeley.
- Rasch, G. (1977). On Specific Objectivity: An attempt at formalizing the request for generality and validity of scientific statements. The Danish Yearbook of Philosophy, 14, 58–93.
Rasch Model Formulation

Let X be a binary n × k data matrix:

$$P(X_{vi} = 1) = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)}$$

with $\beta_i$ ($i = 1, \ldots, k$) as item difficulty parameters and $\theta_v$ ($v = 1, \ldots, n$) as person ability parameters. Both are on an interval scale.

Assumptions in the Rasch model:
- Raw score sufficiency: The raw scores (row and column sums) of X contain all the information we need.
- Unidimensionality: We assume a unidimensional latent trait.
- Local independence: Given the ability $\theta$, the item responses are independent.
- Parallel item characteristic curves.

Sufficiency

This is an absolutely crucial assumption with significant practical implications:
- You are only allowed to sum up the item scores to a total score if and only if the Rasch model holds.
- This implies that the items have a weight of 1 (the 2-PL relaxes this assumption). In factor analysis terms this means that all the loadings are 1.
- The Rasch model doesn’t care which items you’ve solved correctly; it just requires the raw scores.

It can be shown that all other Rasch assumptions are a consequence of raw score sufficiency (Fischer, 1995).
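The Rasch formula and the raw score sufficiency property from the slides above can be checked numerically. The following is a minimal sketch in plain Python (not the eRm API; the course itself works with R packages). The function names and the item difficulties are hypothetical, chosen only for illustration. It computes the probability of a response pattern given its raw score at two very different abilities; if sufficiency holds, the two values should coincide.

```python
import math
from itertools import product

def rasch_prob(theta, beta):
    """P(X = 1) for ability theta and item difficulty beta (Rasch model)."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def pattern_likelihood(pattern, theta, betas):
    """Likelihood of a 0/1 response pattern, assuming local independence."""
    lik = 1.0
    for x, beta in zip(pattern, betas):
        p = rasch_prob(theta, beta)
        lik *= p if x == 1 else 1.0 - p
    return lik

def conditional_prob(pattern, theta, betas):
    """P(pattern | raw score): enumerate all patterns with the same raw score."""
    r = sum(pattern)
    same_score = [q for q in product((0, 1), repeat=len(betas)) if sum(q) == r]
    total = sum(pattern_likelihood(q, theta, betas) for q in same_score)
    return pattern_likelihood(pattern, theta, betas) / total

betas = [-1.0, 0.0, 1.5]   # hypothetical item difficulties
pattern = (1, 0, 1)        # raw score 2

# The conditional probability is evaluated for a weak and a strong person.
cond_low = conditional_prob(pattern, -2.0, betas)
cond_high = conditional_prob(pattern, 2.0, betas)
print(round(cond_low, 6), round(cond_high, 6))
```

Because the ability cancels out once we condition on the raw score, the item parameters can be estimated without any assumption about the persons. The brute-force enumeration is only feasible for a handful of items and is meant as a demonstration, not as an estimation routine.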
Parameter Estimation

In general, IRT parameter estimation is carried out in 2 steps:
1. Estimate the item parameters.
2. Use these item parameters to estimate the person parameters (either ML or Bayesian).

Apart from Bayesian approaches, there are two reasonable ways to estimate the item parameters:
- Conditional maximum likelihood (CML), which makes use of the sufficiency property: no assumptions on the trait distribution, limited to Rasch models only, many possibilities for model/parameter testing (package eRm).
- Marginal maximum likelihood (MML): assumes the trait is normally distributed, works for higher-parameterized IRT models as well (package ltm).

Parameter Estimation

In Rasch models, CML estimation is the way to go. Besides the desirable properties mentioned above, it also implies that the item parameters have to be the same across ANY person subgroup.
The CML estimation approach is the mathematical “translation” of Rasch’s epistemological theory of specific objectivity:
- Persons $v$ and $v'$ (with $\hat{\theta}_v$ and $\hat{\theta}_{v'}$) can be compared independently of the other persons in the sample and independently of the presented item subset Ψ (on an interval scale).
- Items $i$ and $i'$ (with $\hat{\beta}_i$ and $\hat{\beta}_{i'}$) can be compared independently of the remaining items in Ψ and independently of the persons in the sample.

Implication: sample-independent (objective) testing (Rasch, 1961, 1977).
- Statements about persons are independent of the population and of the items presented.
- Model tests can be formulated based on this assumption.
- Representativity of the sample is of less importance.
- Item and person parameters are separable in the estimation.
- Test fairness (e.g. no gender effects, no cultural differences).

Rasch Model: Example

To summarize, if a Rasch model fits your data, the corresponding scale fulfills the highest measurement standards (Rasch model as a seal of quality for your scale).

Example: Let’s look at a real-life dataset based on the results of the math exams at the Vienna University of Economics. We had multiple-choice items; the responses were binarized to 0 (wrong) and 1 (correct). In total we have 20 different items related to various aspects of math education at this university. The sample size is 9404.

Model Test I

The classical test in Rasch models is Andersen’s LR test. It makes use of the subgroup invariance of the item parameters:
- Fit the Rasch model for all selected items.
- Fit a separate Rasch model for each person subgroup.
- Compute an LR statistic: if it is significant, the item parameters differ significantly between the subgroups.

A graphical inspection helps to eliminate items. We have to run this process repeatedly since we should only eliminate one item at a time.
eRm provides automated stepwise selection functions.

Model Test II: Permutation Tests

A modern approach to Rasch model testing is the use of permutation tests.
- We have to find permutations of our data matrix that are consistent with the Rasch model.
- To do that, we can make use of the Rasch property of raw score sufficiency.
- We can use an MCMC algorithm that samples 0/1 matrices with fixed margins (implemented in the RaschSampler package).
- Once we have sampled our matrices, we can construct statistics to test detailed item aspects such as local independence violations (see Ponocny, 2001, who proposed a whole set of such test statistics).

Model Test III: Itemfit

Itemfit statistics are typically less strict than the approaches presented above. The big advantage is that they are generally applicable, i.e. to pretty much any type of IRT model (and sometimes this is all we can do). They are based on residuals.

What are residuals in an IRT model?
- As in any other statistical model, residuals refer to the difference between observed values and fitted values.
- Fitted values in Rasch models are the item solving probabilities $p_{vi}$ from the Rasch formula.
- Based on these components we can compute standardized residuals:

$$z_{vi} = \frac{x_{vi} - p_{vi}}{\sqrt{p_{vi}(1 - p_{vi})}}$$

These residuals are the basis for a variety of itemfit (and personfit) statistics.
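The standardized residuals above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the implementation used by any R package: the data matrix, abilities, and difficulties are hypothetical toy values, and in practice the parameters would be estimates from a fitted model.

```python
import math

def rasch_prob(theta, beta):
    """Fitted value p_vi: probability that a person with ability theta solves an item with difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def standardized_residuals(X, thetas, betas):
    """Return the matrix z_vi = (x_vi - p_vi) / sqrt(p_vi * (1 - p_vi))."""
    Z = []
    for x_row, theta in zip(X, thetas):
        z_row = []
        for x, beta in zip(x_row, betas):
            p = rasch_prob(theta, beta)
            z_row.append((x - p) / math.sqrt(p * (1.0 - p)))
        Z.append(z_row)
    return Z

# Hypothetical toy data: 3 persons (rows) by 3 items (columns).
X = [[1, 1, 0],
     [1, 0, 0],
     [0, 1, 1]]
thetas = [0.5, -0.5, 0.0]   # assumed person parameters
betas = [-1.0, 0.0, 1.0]    # assumed item parameters

Z = standardized_residuals(X, thetas, betas)
for z_row in Z:
    print([round(z, 2) for z in z_row])
```

A correct response always gives a positive residual and a wrong response a negative one; the magnitude grows as the response becomes less expected under the model.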
Model Test III: Itemfit

We can construct a $\chi^2$-statistic based on which we can eliminate items (the ones that are significant).

Further, we can compute item infit and item outfit statistics. These are measures that compare the variance in the observed patterns with the variance in the fitted patterns.
- Outfit: unweighted sum of squares (SSQ) of the $z_{vi}$; it is sensitive to outlying scores.
- Infit: weighted SSQ of the $z_{vi}$ (weighted by the residual variance); it puts more weight on the performances of persons closer to the item value.

Cutoffs (for the mean squares, MSQ):
- MSQ > 1.3: upper cutoff; the observed patterns have 30% more variance than the fitted patterns.
- MSQ < 0.7: lower cutoff; the observed patterns have 30% less variance than the fitted patterns.

In a similar way we can establish personfit indices. More on that later.

Additional Fit Topics:

Performing a PCA on the residuals, i.e. on the “unexplained” part not accounted for by the Rasch model. Sometimes this is called Rasch factor analysis. The eigenvalues in the scree plot should be around 1.

Personfit: We can also explore whether persons fit the Rasch model. Personfit statistics are computed in the same way as itemfit statistics; we just sum up across the items.
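The infit and outfit mean squares described above can be sketched as follows. This is a hedged illustration in plain Python using the common textbook forms (outfit as the plain mean of the squared standardized residuals, infit as the sum of squared raw residuals divided by the sum of residual variances). Software packages may differ in details such as the divisor, and all data and parameter values below are hypothetical.

```python
import math

def rasch_prob(theta, beta):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def item_fit(X, thetas, betas):
    """Return (outfit_msq, infit_msq), each a list with one value per item."""
    outfit, infit = [], []
    for i, beta in enumerate(betas):
        z_sq_sum = 0.0    # sum of squared standardized residuals
        res_sq_sum = 0.0  # sum of squared raw residuals
        var_sum = 0.0     # sum of residual variances p(1 - p)
        for x_row, theta in zip(X, thetas):
            p = rasch_prob(theta, beta)
            var = p * (1.0 - p)
            res_sq = (x_row[i] - p) ** 2
            z_sq_sum += res_sq / var
            res_sq_sum += res_sq
            var_sum += var
        outfit.append(z_sq_sum / len(X))     # unweighted mean square
        infit.append(res_sq_sum / var_sum)   # variance-weighted mean square
    return outfit, infit

# Hypothetical single item with difficulty 0 answered by five persons.
# The first person (theta = -3) solves it unexpectedly.
X = [[1], [1], [0], [1], [1]]
thetas = [-3.0, 0.0, 0.0, 0.0, 3.0]
betas = [0.0]

outfit, infit = item_fit(X, thetas, betas)
print("outfit MSQ:", round(outfit[0], 2), "infit MSQ:", round(infit[0], 2))
```

In this toy case the single unexpected response from a person far away from the item inflates the outfit more than the infit, which is the sensitivity to outlying scores mentioned above. Personfit versions follow by summing across items within a person instead of across persons within an item.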
Let’s look at a few interesting response patterns (items sorted according to their difficulty) and assign characteristic MSQ values:

Pattern                  Diagnosis          Infit MSQ  Outfit MSQ  Fit Type
111...0110110100...000   Rasch person       1.1        1.0         good
111...1111100000...000   deterministic      0.5        0.3         overfit
011...1111100000...000   carelessness       1.0        3.8         underfit
000...0000011111...000   miscode            4.3        12.6        underfit
111...1111100000...001   lucky guessing     1.0        3.8         mixed
111...1000011110...000   special knowledge  1.3        0.9         mixed

[ Dichotomous IRT Models ] 2-PL, 3-PL

2-PL Model Specification

The 2-PL (Birnbaum, 1968) adds an item discrimination (slope) parameter $\alpha_i$ to the Rasch model. That is, items can have different slopes and, therefore, their ICCs are allowed to intersect.

$$P(X_{vi} = 1) = \frac{\exp(\alpha_i(\theta_v - \beta_i))}{1 + \exp(\alpha_i(\theta_v - \beta_i))}$$

- The 2-PL is the most popular dichotomous IRT model since it’s more relaxed than the Rasch model (but we lose many of the nice Rasch properties).
- The item and person parameters translate nicely into a factor analytic setting (intercept-slope specification, factor scores).
- The possibilities for model testing are somewhat limited. People use itemfit statistics or some other $\chi^2$-derived measures (see e.g. de Ayala, 2008, for an overview). You can do model comparison using the LR principle.

Relation to Factor Analysis

Takane and De Leeuw (1987) showed the relation between IRT models and factor analysis.
Within a factor analytic context, $\alpha_i$ represents the loading (→ no further conversion needed): $\lambda_{1i} = \alpha_i$. The difficulty $\beta_i$ can be transformed into an intercept parameter as follows:

$$\lambda_{0i} = -\alpha_i \beta_i$$

Thus, instead of having a multiplicative parameterization

$$\log\left(\frac{p_i}{1 - p_i}\right) = \alpha_i(\theta - \beta_i),$$

we have a linear parameterization in terms of

$$\log\left(\frac{p_i}{1 - p_i}\right) = \lambda_{0i} + \lambda_{1i}\theta.$$

3-PL Model Specification

The 3-PL adds a lower-asymptote (guessing) parameter $\gamma_i$ to the 2-PL.

$$P(X_{vi} = 1) = \gamma_i + (1 - \gamma_i)\,\frac{\exp(\alpha_i(\theta_v - \beta_i))}{1 + \exp(\alpha_i(\theta_v - \beta_i))}$$

Note: This model can be very tricky to estimate. Often you end up in a situation where you don’t reach convergence. You can:
- adjust the control argument of the fitting function in ltm,
- use a Bayesian approach (not yet implemented in R as of these 2014 slides).

Some other, rather esoteric models like the 4-PL have been proposed, but they are rarely relevant in practice.

For both 2-PL and 3-PL: Compared to the Rasch model we get rid of the sufficiency assumption (we have weighted item sums) and of the parallel ICC assumption, both due to the discrimination parameter.

Item and Test Information

Sometimes our test should cover a large spectrum of the trait and we want to know which item is informative in which area of the trait. For a 2-PL the item information is

$$I_i(\theta) = \alpha_i^2 \, p_i(\theta)\,(1 - p_i(\theta)).$$

This measure can be aggregated to the test information:

$$TI(\theta) = \sum_{i=1}^{k} I_i(\theta).$$

Plotting these functions gives a very clear picture of which areas of the trait are covered by each individual item and by the whole scale, respectively.
[ Dichotomous IRT Models ] Extensions

Extensions

Bayesian IRT: That’s a fairly recent development but, of course, we can estimate the item and person parameters within a Bayesian context.
This is super helpful if we have small sample sizes and/or multidimensional models.

Posing restrictions on the item parameters: Linear Logistic Test Model (LLTM) and a relaxed version of it called LLRA. Such restrictions (design) can be cognitive operations (e.g. in Raven matrices tests). These models can also be used for longitudinal IRT modeling. It’s all implemented in eRm.

The person parameters can be regarded as a new metric variable and can be used for further analysis such as ANOVA, regression, path models, graphical models, etc.

Rasch models can also be fitted within a mixed-effects framework (De Boeck et al., 2011). This gives tons of flexibility in terms of adding covariates, testing DIF, etc.

References

Books:
- Bond, T. G., & Fox, C. M. (2007). Applying the Rasch Model (2nd ed.). Erlbaum.
- de Ayala, R. J. (2008). The Theory and Practice of Item Response Theory. Guilford.
- Fox, J. P. (2010). Bayesian Item Response Modeling: Theory and Applications. Springer.

Articles:
- Mair, P., & Hatzinger, R. (2007). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20(9), 1–20. URL: http://www.jstatsoft.org/v20/i09
- Rizopoulos, D. (2006). ltm: An R package for latent variable modeling and item response analysis. Journal of Statistical Software, 17(5), 1–25. URL: http://www.jstatsoft.org/v17/i05
- Ponocny, I. (2001). Nonparametric goodness-of-fit tests for the Rasch model. Psychometrika, 66, 437–460.
- Verhelst, N., Hatzinger, R., & Mair, P. (2007). The Rasch Sampler. Journal of Statistical Software, 20(4), 1–14. URL: http://www.jstatsoft.org/v20/i04