Unit 15 - Dichotomous IRT

Chapter 15
Dichotomous IRT Models
© 2014 Patrick Mair
Psych 3490 – 15 – Dichotomous IRT Models
Today’s Menu
Today we introduce parametric IRT models for dichotomous
(binary) data:
Rasch model: concepts, formulation, implications.
Birnbaum models: two- and three-parameter logistic model.
Extensions: Bayesian IRT, IRT mixed effects framework,
restricted models, longitudinal models.
Let’s Repeat
In the last unit we introduced nonparametric IRT models (Mokken scales):
Monotone homogeneity model (MHM): Ordering persons on latent trait.
Double monotonicity model (DMM): Ordering persons and items on latent
trait.
An important feature of Mokken scaling is that we can explore the monotonicity of
the IRFs (ICCs). Parametric models are typically logit models that force the ICCs
to be logistic.
Parametric models map item and person parameters on a common trait
continuum (interval scaled). We can perform all sorts of statistical tests.
[ Dichotomous IRT Models ]
The Rasch Model
Georg Rasch (1901–1980)
Rasch was a Danish mathematician
and philosopher who revolutionized
psychometrics in the 1960s.
Most important publications:
Rasch, G. (1960). Probabilistic models for some
intelligence and attainment tests. Copenhagen,
Danish Institute for Educational Research.
Rasch, G. (1961). On general laws and the
meaning of measurement in psychology. In
Proceedings of the Fourth Berkeley Symposium on
Mathematical Statistics and Probability, IV, pp.
321–334. Berkeley.
Rasch, G. (1977). On Specific Objectivity: An
attempt at formalizing the request for generality and
validity of scientific statements. The Danish
Yearbook of Philosophy, 14, 58–93.
Rasch Model Formulation
Let X be a binary n × k data matrix:
$$P(X_{vi} = 1) = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)}$$
with $\beta_i$ (i = 1, ..., k) as item difficulty parameters and $\theta_v$ (v = 1, ..., n) as person
ability parameters. Both are on an interval scale.
Assumptions in the Rasch model:
Raw score sufficiency: The raw scores (row and column sums) of X
contain all the information we need.
Unidimensionality: We assume a unidimensional latent trait.
Local independence: Given the ability θ, the item responses are
independent.
Parallel item characteristic curves.
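The Rasch formula and the parallel-ICC assumption can be sketched in a few lines of Python. This is only an illustration: the function names and the two item difficulties are made up for the example. On the logit scale, the distance between two Rasch ICCs is the same at every ability level.

```python
import math

def rasch_prob(theta, beta):
    """P(X_vi = 1) for person ability theta and item difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical item difficulties, chosen only for illustration.
easy, hard = -1.0, 0.5

for theta in (-2.0, 0.0, 2.0):
    gap = logit(rasch_prob(theta, easy)) - logit(rasch_prob(theta, hard))
    # The gap equals hard - easy at every theta: the ICCs never intersect.
    print(f"theta={theta:+.1f}  logit gap={gap:.3f}")
```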
Sufficiency
This is an absolutely crucial assumption with significant practical implications:
You are only allowed to sum up the item scores to a total score iff the
Rasch model holds.
This implies that the items have a weight of 1 (the 2-PL relaxes this
assumption).
In factor analysis terms this means that all the loadings are 1.
The Rasch model doesn’t care which items you’ve solved correctly; it just
requires the raw scores.
It can be shown that all other Rasch assumptions are a consequence of raw
score sufficiency (Fischer, 1995).
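A small numerical check of what sufficiency means, as a sketch with made-up item difficulties and helper names: once we condition on the raw score, the probability of a particular response pattern no longer depends on the person's ability.

```python
import math
from itertools import product

def rasch_prob(theta, beta):
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def pattern_prob(pattern, theta, betas):
    """Probability of a full 0/1 response pattern under local independence."""
    prob = 1.0
    for x, beta in zip(pattern, betas):
        p = rasch_prob(theta, beta)
        prob *= p if x == 1 else 1.0 - p
    return prob

def conditional_prob(pattern, theta, betas):
    """P(pattern | raw score), obtained by brute-force enumeration."""
    r = sum(pattern)
    same_score = [q for q in product((0, 1), repeat=len(betas)) if sum(q) == r]
    total = sum(pattern_prob(q, theta, betas) for q in same_score)
    return pattern_prob(pattern, theta, betas) / total

betas = [-1.0, 0.0, 1.5]   # hypothetical difficulties
pattern = (1, 0, 1)        # raw score 2

# The printed value is the same for every theta.
for theta in (-2.0, 0.0, 3.0):
    print(theta, round(conditional_prob(pattern, theta, betas), 6))
```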
Parameter Estimation
In general, IRT parameter estimation is carried out in 2 steps:
1. Estimate the item parameters.
2. Use these item parameters to estimate the person parameters (either ML
or Bayesian).
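Step 2 can be sketched as follows for the Rasch model, assuming the item parameters are already known. This is a plain Newton-Raphson solver written for illustration (the function name and the difficulties are hypothetical), not the routine used by any particular package. The ML estimate solves the equation "expected raw score = observed raw score" and does not exist for raw scores of 0 or k.

```python
import math

def rasch_prob(theta, beta):
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def ml_person_estimate(raw_score, betas, tol=1e-10, max_iter=100):
    """Newton-Raphson solution of sum_i p_i(theta) = raw_score."""
    k = len(betas)
    if raw_score <= 0 or raw_score >= k:
        raise ValueError("ML estimate is not finite for raw scores 0 or k")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in betas]
        score = raw_score - sum(probs)             # first derivative of the log-likelihood
        info = sum(p * (1.0 - p) for p in probs)   # negative second derivative
        step = score / info
        theta += step
        if abs(step) < tol:
            break
    return theta

betas = [-1.5, -0.5, 0.0, 0.5, 1.5]   # hypothetical item difficulties
for r in (1, 2, 3, 4):
    print(r, round(ml_person_estimate(r, betas), 3))
```

Because only the raw score enters the equation, all persons with the same raw score receive the same ability estimate.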
Apart from Bayesian approaches, there are two reasonable ways to estimate the
item parameters:
Conditional maximum likelihood (CML): makes use of the sufficiency property,
requires no assumptions on the trait distribution, is limited to Rasch
models, and offers many possibilities for model/parameter testing. (package eRm)
Marginal maximum likelihood (MML): assumes that the trait is normally distributed,
works for higher-parameterized IRT models as well. (package ltm)
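To make the CML idea concrete, here is a minimal sketch of the conditional log-likelihood of the Rasch model. It conditions on each person's raw score via the elementary symmetric functions of $\varepsilon_i = \exp(-\beta_i)$, so no person parameters appear. The function names and the toy data are assumptions for illustration; eRm's actual implementation is more elaborate.

```python
import math

def elementary_symmetric(eps):
    """gamma_0..gamma_k of the values eps_i = exp(-beta_i) (summation algorithm)."""
    gamma = [1.0] + [0.0] * len(eps)
    for j, e in enumerate(eps, start=1):
        # Update from high to low order so each eps_i enters a product only once.
        for r in range(j, 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma

def conditional_loglik(betas, data):
    """Conditional log-likelihood; data is a list of 0/1 response rows."""
    gamma = elementary_symmetric([math.exp(-b) for b in betas])
    loglik = 0.0
    for row in data:
        r = sum(row)
        loglik += -sum(x * b for x, b in zip(row, betas)) - math.log(gamma[r])
    return loglik

# Toy data: 4 persons, 3 items (made up for the example).
data = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 1, 1]]
print(conditional_loglik([-0.5, 0.0, 0.5], data))
```

Shifting all item difficulties by the same constant leaves this likelihood unchanged, which is why a constraint (e.g. sum zero) is needed to fix the scale.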
Parameter Estimation
In Rasch models, CML estimation is the way to go. Besides the desirable
properties mentioned above, it also implies that the item parameters have to be
the same across ANY person subgroup.
The CML estimation approach is the mathematical “translation” of Rasch’s
epistemological theory of specific objectivity:
Persons $v$ and $v'$ (with $\hat{\theta}_v$ and $\hat{\theta}_{v'}$) can be compared independently from
the other persons in the sample and independently from the presented
item subset Ψ (on an interval scale).
Items $i$ and $i'$ (with $\hat{\beta}_i$ and $\hat{\beta}_{i'}$) can be compared independently from the
remaining items in Ψ and independently from the persons in the sample.
Implication: sample independent (objective) testing (Rasch, 1961, 1977).
Statements about persons independently from the population and the items
presented.
Formulation of model tests based on this assumption.
Representativity of the sample is of less importance.
Separability of item and person parameters in the estimation.
Test fairness (e.g. no gender effects, no cultural differences).
Rasch Model: Example
To summarize, if a Rasch model fits your data, the corresponding scale fulfills
highest measurement standards (Rasch model as a seal of quality for your
scale).
Example: Let’s look at a real life dataset based on the results on the Math
exams at the Vienna University of Economics. We had multiple choice items; the
responses were binarized to 0 (wrong) and 1 (correct).
In total we have 20 different items related to various aspects of Math education
at this University. The sample size is 9404.
Model Test I
The classical test in Rasch models is Andersen’s LR test. It makes use of
subgroup invariance of the item parameters:
Fit the Rasch model for all selected items.
Fit separate Rasch models for each person subgroup.
Compute an LR statistic: if it is significant, the item parameters differ
significantly between the subgroups.
A graphical inspection helps to eliminate items. We have to run this process
repeatedly since we should eliminate only one item at a time. eRm provides
automated stepwise selection functions.
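Written out, the LR statistic from the three steps above has the following form. The notation is an assumption for this sketch: $L_C$ denotes the conditional likelihood from the full sample, $L_C^{(g)}$ the conditional likelihood in subgroup $g$, and $G$ the number of subgroups. The stated degrees of freedom hold when all $k$ items can be estimated in every subgroup.

$$LR = 2\left(\sum_{g=1}^{G} \log L_C^{(g)} - \log L_C\right), \qquad LR \;\overset{a}{\sim}\; \chi^2_{(G-1)(k-1)}$$

For example, with a median split on the raw score ($G = 2$) and $k = 20$ items, the reference distribution would be a $\chi^2$ with 19 degrees of freedom.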
Model Test II: Permutation Tests
A modern approach to Rasch model testing is the use of permutation tests.
We have to find permutations of our data matrix which are consistent with
the Rasch model.
In order to do that we can make use of the Rasch property of raw score
sufficiency.
We can use an MCMC algorithm that samples 0/1 matrices for fixed
margins (implemented in the RaschSampler package).
Once we have sampled our matrices, we can construct statistics to
test detailed item aspects such as local independence violations
(see Ponocny, 2001, who proposed a whole set of such test
statistics).
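The idea of sampling 0/1 matrices with fixed margins can be illustrated with a very simple swap chain. Note that this is a stand-in written for this sketch, not the algorithm implemented in RaschSampler: it repeatedly picks two rows and two columns and flips the 2 × 2 submatrix if it has a checkerboard pattern, which leaves all row and column sums untouched.

```python
import random

def swap_step(matrix, rng):
    """Try one 2x2 'checkerboard' swap; row and column sums stay unchanged."""
    n, k = len(matrix), len(matrix[0])
    r1, r2 = rng.sample(range(n), 2)
    c1, c2 = rng.sample(range(k), 2)
    a, b = matrix[r1][c1], matrix[r1][c2]
    c, d = matrix[r2][c1], matrix[r2][c2]
    # Only the patterns [1 0; 0 1] and [0 1; 1 0] can be flipped.
    if a == d and b == c and a != b:
        matrix[r1][c1], matrix[r1][c2] = b, a
        matrix[r2][c1], matrix[r2][c2] = d, c

def sample_matrix(matrix, n_steps=1000, seed=1):
    """Return a new 0/1 matrix with the same margins as `matrix`."""
    rng = random.Random(seed)
    current = [row[:] for row in matrix]
    for _ in range(n_steps):
        swap_step(current, rng)
    return current

# Toy data matrix (persons in rows, items in columns), made up for the example.
original = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 0]]
sampled = sample_matrix(original, n_steps=500, seed=42)
print(sampled)
```

A test statistic computed on the observed matrix is then compared with its distribution over many sampled matrices.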
Model Test III: Itemfit
Itemfit statistics are typically less strict than the approaches presented above.
The big advantage is that they are generally applicable, i.e. pretty much for any
type of IRT model (and sometimes this is all we can do).
They are based on residuals. What are residuals in an IRT model?
As in any other statistical model, residuals refer to the difference between
observed values and fitted values. Fitted values in Rasch models are the item
solving probabilities from the Rasch formula. Based on these components we
can compute standardized residuals:
$$z_{vi} = \frac{x_{vi} - p_{vi}}{\sqrt{p_{vi}(1 - p_{vi})}}$$
These residuals are the base for a variety of itemfit (and personfit) statistics.
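As a minimal sketch, the standardized residuals can be computed as follows. The person and item parameters are treated as known here (in practice they are estimates), and the function names are chosen for this example only.

```python
import math

def rasch_prob(theta, beta):
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def standardized_residuals(data, thetas, betas):
    """Matrix of z_vi = (x_vi - p_vi) / sqrt(p_vi * (1 - p_vi))."""
    z = []
    for row, theta in zip(data, thetas):
        z_row = []
        for x, beta in zip(row, betas):
            p = rasch_prob(theta, beta)
            z_row.append((x - p) / math.sqrt(p * (1.0 - p)))
        z.append(z_row)
    return z

# Hypothetical parameters and responses for two persons and two items.
z = standardized_residuals([[1, 0], [0, 0]], thetas=[0.0, 1.0], betas=[0.0, 2.0])
print(z)
```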
Model Test III: Itemfit
We can construct a χ²-statistic based on which we can eliminate items (the
ones with a significant result).
Further, we can compute item infit and item outfit statistics. These are mean
square (MSQ) measures that compare the variance in the observed patterns with
the variance in the fitted patterns.
Outfit: unweighted mean of the squared residuals $z_{vi}^2$; it is sensitive to outlying scores.
Infit: weighted mean of the squared residuals $z_{vi}^2$ (weighted by the residual variance); it puts
more weight on the performances of persons close to the item's difficulty.
Cutoffs:
MSQ > 1.3: upper cutoff, observed patterns have 30% more variance than
fitted pattern.
MSQ < 0.7: lower cutoff, observed patterns have 30% less variance than
fitted pattern.
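The two mean square statistics can be sketched as follows for a single item. Again the parameters are treated as known and all names and numbers are invented for illustration. In the toy data, one very able person (θ = 4) fails an item of difficulty 0, which inflates the outfit much more than the infit.

```python
import math

def rasch_prob(theta, beta):
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def item_fit(data, thetas, betas, item):
    """Return (outfit MSQ, infit MSQ) for the item in column `item`."""
    z_squared, sq_resid, variances = [], [], []
    for row, theta in zip(data, thetas):
        p = rasch_prob(theta, betas[item])
        var = p * (1.0 - p)              # residual variance
        sq = (row[item] - p) ** 2        # squared raw residual
        sq_resid.append(sq)
        variances.append(var)
        z_squared.append(sq / var)       # squared standardized residual
    outfit = sum(z_squared) / len(z_squared)   # unweighted mean of z^2
    infit = sum(sq_resid) / sum(variances)     # variance-weighted mean of z^2
    return outfit, infit

data = [[1], [0], [1], [0]]          # last person fails the item
thetas = [0.0, 0.0, 0.0, 4.0]        # ... although being very able
betas = [0.0]
outfit, infit = item_fit(data, thetas, betas, item=0)
print(round(outfit, 2), round(infit, 2))
```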
In a similar way we can establish personfit indices. More on that later.
Additional Fit Topics:
Performing a PCA on the residuals, i.e. on the “unexplained” part, not
accounted for by the Rasch model. Sometimes this is called Rasch factor
analysis. The eigenvalues in the scree plot should be around 1.
Personfit: We can also explore whether persons fit the Rasch model. Personfit
statistics are computed in the same way as itemfit statistics; we just sum across
the items instead of across the persons. Let’s look at a few interesting response
patterns (items sorted according to their difficulty) and their characteristic MSQ values:
Pattern                  Diagnosis          Infit MSQ  Outfit MSQ  Fit Type
111...0110110100...000   Rasch person       1.1        1.0         good
111...1111100000...000   deterministic      0.5        0.3         overfit
011...1111100000...000   carelessness       1.0        3.8         underfit
000...0000011111...000   miscode            4.3        12.6        underfit
111...1111100000...001   lucky guessing     1.0        3.8         mixed
111...1000011110...000   special knowledge  1.3        0.9         mixed
[ Dichotomous IRT Models ]
2-PL, 3-PL
2-PL Model Specification
The 2-PL (Birnbaum, 1968) adds an item discrimination (slope) parameter αi to
the Rasch model. That is, items can have different slopes and, therefore, they
are allowed to intersect.
$$P(X_{vi} = 1) = \frac{\exp(\alpha_i(\theta_v - \beta_i))}{1 + \exp(\alpha_i(\theta_v - \beta_i))}$$
The 2-PL is the most popular dichotomous IRT model since it’s more relaxed
than the Rasch model (but we lose many of the nice Rasch properties). The
item and person parameters translate nicely into a factor analytic setting
(intercept-slope specification, factor scores).
The possibilities for model testing are somewhat limited. People use itemfit
statistics or some other χ²-derived measures (see e.g. de Ayala, 2008, for an
overview). You can do model comparison using the LR principle.
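A short sketch of the 2-PL response function shows what "allowed to intersect" means. The two items below are hypothetical: they share the same difficulty but differ in discrimination, so the flat item is easier for low-ability persons and harder for high-ability persons.

```python
import math

def two_pl_prob(theta, alpha, beta):
    """P(X_vi = 1) under the 2-PL with discrimination alpha and difficulty beta."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

flat = dict(alpha=0.5, beta=0.0)    # weakly discriminating item
steep = dict(alpha=2.0, beta=0.0)   # strongly discriminating item

for theta in (-2.0, 0.0, 2.0):
    print(theta,
          round(two_pl_prob(theta, **flat), 3),
          round(two_pl_prob(theta, **steep), 3))
```

With alpha = 1 for every item the function reduces to the Rasch model.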
Relation to Factor Analysis
Takane and De Leeuw (1987) showed the relation between IRT models and
factor analysis. Within a factor analytic context, $\alpha_i$ represents the loading (no
further conversion needed): $\lambda_{1i} = \alpha_i$.
The difficulty $\beta_i$ can be transformed into an intercept parameter as follows:
$$\lambda_{0i} = -\beta_i \alpha_i$$
Thus, instead of having a multiplicative parameterization
$$\log\left(\frac{p_i}{1 - p_i}\right) = \alpha_i(\theta - \beta_i),$$
we have a linear parameterization in terms of
$$\log\left(\frac{p_i}{1 - p_i}\right) = \lambda_{0i} + \lambda_{1i}\theta.$$
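The conversion between the two parameterizations is simple enough to sketch directly. The helper names and the example values are made up for illustration. Both forms give the same logit for any θ.

```python
def irt_to_fa(alpha, beta):
    """(alpha, beta) -> (lambda0, lambda1) = (-beta * alpha, alpha)."""
    return -beta * alpha, alpha

def fa_to_irt(lambda0, lambda1):
    """(lambda0, lambda1) -> (alpha, beta) = (lambda1, -lambda0 / lambda1)."""
    return lambda1, -lambda0 / lambda1

alpha, beta = 1.5, 0.8               # hypothetical item parameters
lambda0, lambda1 = irt_to_fa(alpha, beta)

theta = 0.3
print(alpha * (theta - beta))        # multiplicative form
print(lambda0 + lambda1 * theta)     # intercept-slope form, same value
```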
3-PL Model Specification
The 3-PL adds a lower-asymptote (guessing) parameter to the 2-PL.
$$P(X_{vi} = 1) = \gamma_i + (1 - \gamma_i)\,\frac{\exp(\alpha_i(\theta_v - \beta_i))}{1 + \exp(\alpha_i(\theta_v - \beta_i))}$$
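A minimal sketch of the 3-PL response function, with hypothetical parameter values (a guessing parameter of 0.25 would correspond to random guessing among four response options):

```python
import math

def three_pl_prob(theta, alpha, beta, gamma):
    """P(X_vi = 1) under the 3-PL with lower asymptote gamma."""
    two_pl = 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))
    return gamma + (1.0 - gamma) * two_pl

alpha, beta, gamma = 1.2, 0.5, 0.25   # made-up item parameters

# Even persons with very low ability solve the item with probability ~gamma.
for theta in (-6.0, 0.5, 6.0):
    print(theta, round(three_pl_prob(theta, alpha, beta, gamma), 3))
```

At θ = β the probability is halfway between γ and 1, and with γ = 0 the 3-PL reduces to the 2-PL.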
Note: This model can be very tricky to estimate. Often the estimation does not
converge. In that case you can:
play around with the control argument in ltm,
use a Bayesian approach (not implemented in R yet).
Some other, rather esoteric models like the 4-PL have been proposed, but they
are not really relevant.
For both 2-PL and 3-PL: Compared to the Rasch model we get rid of the
sufficiency assumption (we have weighted item sums) and the parallel ICC
assumption – due to the discrimination parameter.
Item and Test Information
Sometimes our test should cover a large spectrum of the trait and we want to
know which item is informative in which area of the trait. For a 2-PL the item
information is
$$I_i(\theta) = \alpha_i^2\, p_i(\theta)\,(1 - p_i(\theta)).$$
This measure can be aggregated to the test information:
$$TI(\theta) = \sum_{i=1}^{k} I_i(\theta).$$
Plotting these functions against θ gives a very clear picture of which areas of the
trait are covered by each individual item and by the whole scale, respectively.
[ Dichotomous IRT Models ]
Extensions
Extensions
Bayesian IRT: This is a fairly recent development. We can estimate the item and
person parameters within a Bayesian context, which is super helpful
if we have small sample sizes and/or multidimensional models.
Posing restrictions on the item parameters: the Linear Logistic Test Model (LLTM)
and a relaxed version of it called LLRA. Such restrictions (the design) can
represent cognitive operations (e.g. in Raven matrices tests). These models can
also be used for longitudinal IRT modeling. It’s all implemented in eRm.
The person parameters can be regarded as a new metric variable and can be
used for further analyses such as ANOVA, regression, path models, graphical
models, etc.
Rasch models can also be fitted within a mixed-effects framework (De Boeck et
al., 2011). This gives tons of flexibility in terms of adding covariates, testing DIF,
etc.
References
Books:
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch Model (2nd ed.). Erlbaum.
de Ayala, R. J. (2008). The Theory and Practice of Item Response Theory. Guilford.
Fox, J. P. (2010). Bayesian Item Response Modeling: Theory and Applications. Springer.
Articles:
Mair, P., & Hatzinger, R. (2007). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20(9), 1–20. URL: http://www.jstatsoft.org/v20/i09
Rizopoulos, D. (2006). ltm: An R package for latent variable modeling and item response analysis. Journal of Statistical Software, 17(5), 1–25. URL: http://www.jstatsoft.org/v17/i05
Ponocny, I. (2001). Nonparametric goodness-of-fit tests for the Rasch model. Psychometrika, 66, 437–460.
Verhelst, N., Hatzinger, R., & Mair, P. (2007). The Rasch Sampler. Journal of Statistical Software, 20(4), 1–14. URL: http://www.jstatsoft.org/v20/i04