1. Introduction
Educational tests are often administered in computerised form, which makes it possible to collect not only the students’ responses but also their response times. This is useful because response times can be an important source of information about the students’ performance (Luce, 1986; van der Linden, 2009). One of the most popular approaches to the joint modelling of item response accuracy and response time in educational measurement is the hierarchical framework (van der Linden, 2007). In this framework the dependence between the response time and the accuracy of an item is taken to be fully explained by the correlation between a person’s overall ability and overall speed, such that, conditional on the latent speed and ability variables, the response time $T_i$ and the response accuracy $X_i$ on each item $i$ are assumed to be independent.
The hierarchical framework has been used successfully in several applications in educational and psychological testing (van der Linden, 2008; van der Linden & Guo, 2008; Klein Entink, Kuhn, Hornke, & Fox, 2009; Goldhammer & Klein Entink, 2011; Loeys, Rossel, & Baten, 2011; Petscher, Mitchell, & Foorman, 2015; Scherer, Greiff, & Hautamäki, 2015). Although the hierarchical model assuming conditional independence is convenient from a statistical point of view and provides clear interpretations of the individual differences in speed and accuracy and of the relations between them, in some cases its fundamental assumption of conditional independence is violated. This implies that the higher-level dependencies between the speed and the ability parameters do not fully explain the dependence between response time and response accuracy (Partchev & De Boeck, 2012; Ranger & Ortner, 2012; Bolsinova & Maris, 2016). Conditional dependence between time and accuracy may, for example, arise when respondents vary their speed or use different strategies to solve the items over the course of the test. In this paper we propose to explicitly model the residual dependence between time and accuracy within each item after the higher-level correlation between overall speed and ability has been taken into account.
In the hierarchical framework (van der Linden, 2007) the joint distribution of response accuracy and response time is modelled as the product of the marginal distributions of accuracy and time, which are obtained using standard IRT and response time models, respectively (e.g., a two-parameter logistic model for response accuracy and a log-normal model for response time). A more general way of modelling the joint distribution of response time and accuracy, one that does not require conditional independence, is to decompose it into the product of a marginal and a conditional distribution, which can be done in one of two ways. The first possibility is to use a standard IRT model for the marginal distribution of response accuracy (e.g., a two-parameter logistic model) and multiply it by the conditional distribution of response time given that the response is correct or incorrect, as suggested by Bloxom (1985). Van der Linden and Glas (2010) pursued such an approach, but with the goal of developing a test of the assumption of conditional independence rather than obtaining a substantively interpretable joint model. The second possibility is to use a standard model for the marginal distribution of response time (e.g., a log-normal model) and multiply it by the conditional distribution of response accuracy given response time.
In this study we consider the latter approach: letting the parameters of the response accuracy model depend on whether the response is relatively fast or slow. We choose this approach because it aims at improving the model for response accuracy, which is often the most important model in practical applications. This choice is in line with the idea that response accuracy may be affected when a respondent answers a particular item faster or slower than would be expected based on that person’s overall speed. Extending the model for response accuracy by incorporating response time allows one to study in more detail how the relative speed of a response affects its accuracy. Research by Partchev and De Boeck (2012) indicates that there are likely to be important differences between the response processes of fast and slow responses. Working with the conditional distribution of response accuracy given response time makes it possible to study these differences.
We consider an extension of the two-parameter logistic model for response accuracy in which both the intercept (i.e., item easiness) and the slope (i.e., the strength of the relationship between the probability of a correct response and the measured ability) of the item characteristic curve (ICC) depend on whether the response is relatively fast or slow. Including response time in the model for response accuracy has a long tradition in response time modelling (Roskam, 1987; van Breukelen & Roskam, 1991; Verhelst, Verstralen, & Jansen, 1997; van Breukelen, 2005; Wang & Hanson, 2005; Wang, 2006; Goldhammer, Naumann, & Greiff, 2015). An important aspect that differentiates our approach from most existing approaches (the work of Goldhammer et al. (2015) being an exception) is that the model for response accuracy includes not only the main effect of time on accuracy (i.e., an effect on the intercept) but also the interaction effect between time and ability (i.e., an effect on the slope). In contrast to Goldhammer et al. (2015), we do not use raw response times but residual response times, which, as we argue in this paper, is needed to separate the between-person correlation between speed and ability from the conditional dependence between time and accuracy. Additionally, an important added value of our model is that it considers these main and interaction effects within a joint model for response time and accuracy, which to our knowledge has not been done before.
It may be noted that one could also model conditional dependence at the level of the joint distribution, by specifying a bivariate distribution for response time and response accuracy that includes an item-level residual correlation, as done by Ranger and Ortner (2012). However, this approach does not provide a direct translation of how the probability of a correct response changes as a function of response time, and it corresponds to allowing response time to affect only the intercept of the item response function, not its slope. For these reasons we propose to introduce time-dependent parameters in the response accuracy model, which allows for more versatility and results in additional item parameters that have a straightforward interpretation.
In this paper we propose a modification of the hierarchical model that explicitly models the effects of the relative speed of a response on the parameters of the ICC. Instead of dichotomising the response times into fast versus slow, as in the IRTree approach of Partchev and De Boeck (2012), we treat the relative speed of the response, quantified by the residual log-response time, as a continuous measure that serves as a covariate for the parameters of the ICC. Using a continuous measure of speed requires no arbitrary dichotomisation of response time, which may have the advantage of avoiding a loss of information. Additionally, by focusing on residual response time, our model takes into account the effect of a response being relatively fast or slow for that particular person, which may closely match the substantively relevant differences that can be expected to exist between that person’s response processes across items (e.g., the cognitive effects of speeding up or slowing down while using the same response strategy, or differences between using fast and slow strategies). In this way, our approach maintains the important correlation between overall speed and overall ability, as in the standard hierarchical model, while also taking important additional characteristics of each particular response into account to better explain response time and response accuracy.
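As a minimal sketch of the covariate described above, the snippet below computes a residual log-response time for a single response. It assumes the standard log-normal parametrisation of van der Linden (2007), in which the expected log-response time of person p on item i is the item time intensity minus the person speed, ξ_i − τ_p, and it treats both parameters as known, whereas in practice they are estimated. The function name `residual_log_time` is hypothetical, and whether the residual should additionally be standardised (e.g., divided by σ_i) is left open here.

```python
import math

def residual_log_time(t_pi: float, xi_i: float, tau_p: float) -> float:
    """Residual log-response time of person p on item i.

    Assumes E[ln T_pi] = xi_i - tau_p (item time intensity minus person speed).
    Positive values: slower than expected for this person on this item;
    negative values: faster than expected.
    """
    if t_pi <= 0:
        raise ValueError("response time must be positive")
    return math.log(t_pi) - (xi_i - tau_p)

# Hypothetical values: an item with time intensity 4.0 (about 55 s for a
# person of average speed) answered in 30 s by a relatively fast person.
print(round(residual_log_time(t_pi=30.0, xi_i=4.0, tau_p=0.3), 3))  # -0.299
```

Working on the log scale means that the same residual corresponds to the same relative (multiplicative) deviation from the expected time, regardless of how time-intensive the item is.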
The paper is organised as follows. In Sect. 2 we describe the specification of the hierarchical model for response time and accuracy that is extended in this paper. In Sect. 3 we introduce a motivating empirical example that will be used throughout the paper, and we show that for this data set the conditional independence between response time and accuracy assumed by the hierarchical model is violated. In Sect. 4 we describe a model for conditional dependence between response time and accuracy with item-specific effects on the intercept and the slope, as well as its constrained versions (a common effect on the intercept and the slope; a common effect on the intercept and item-specific effects on the slope; and a common effect on the slope and item-specific effects on the intercept), and we propose a Bayesian estimation method. In Sect. 5 we return to the empirical data set. Different models for conditional dependence between time and accuracy are fitted to these data, the best model is chosen based on the deviance information criterion, and its goodness of fit is investigated with posterior predictive checks. Substantive interpretations of the estimated model parameters are given. To provide evidence for the stability of the conclusions drawn from the data set of interest, in Sect. 6 we present a small-scale simulation study that investigates parameter recovery when data are simulated using the estimates from the empirical data set as true values. The paper concludes with a discussion.
2. Specification of the Hierarchical Model
Let $\mathbf{X}$ denote an $N\times n$ matrix of the responses of $N$ persons to $n$ items, taking the value 1 if a response is correct and 0 otherwise, and let $\mathbf{T}$ denote the $N\times n$ matrix of the corresponding response times. Note that, unlike what is typical in experimental psychology, in educational and psychological measurement usually only one observation is available for each combination of a person and an item, and the number of persons is typically much larger than the number of items.
The hierarchical model for response times and accuracy is (van der Linden, 2007):
$$f(t_{pi},x_{pi}\mid\tau_p,\theta_p,\boldsymbol{\gamma}_i,\boldsymbol{\delta}_i)=f(t_{pi}\mid\tau_p,\boldsymbol{\gamma}_i)\,f(x_{pi}\mid\theta_p,\boldsymbol{\delta}_i), \tag{1}$$
where $t_{pi}$ and $x_{pi}$ are the response time and accuracy of person $p$ on item $i$, $\tau_p$ and $\theta_p$ are the speed and ability parameters of person $p$, and $\boldsymbol{\gamma}_i$ and $\boldsymbol{\delta}_i$ are the vectors of item parameters of item $i$ related to time and accuracy, respectively. At the lower level of the hierarchical model, both the model for response accuracy and the model for response time need to be specified. At the higher level, models need to be specified for the relationship between the person parameters ($\theta_p$ and $\tau_p$) and for the relationship between the item parameters ($\boldsymbol{\delta}_i$ and $\boldsymbol{\gamma}_i$). The hierarchical framework thus offers a plug-and-play approach with a flexible choice of models at the lower and the higher level. Below we describe the full specification of the hierarchical model assuming conditional independence that is considered in this study (note that this specification deviates slightly from the original article of van der Linden, 2007).
The model for the response accuracy is the two-parameter logistic model (Birnbaum, 1968):Footnote 1
$$\Pr(X_{pi}=1\mid\theta_p,\boldsymbol{\delta}_i)=\frac{\exp(\alpha_i\theta_p+\beta_i)}{1+\exp(\alpha_i\theta_p+\beta_i)}, \tag{2}$$
that is, $\boldsymbol{\delta}_i=\{\alpha_i,\beta_i\}$, where $\alpha_i>0$ and $\beta_i$ are the slope and the intercept of the ICC of item $i$, which relates the ability of a person to the probability of a correct response to the item. The slope $\alpha_i$ reflects the discriminative power of the item, since it specifies the strength of the relationship between the latent ability and the response to the item, and the intercept $\beta_i$ reflects item easiness.
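The slope-intercept form of the ICC can be evaluated directly. The sketch below assumes the usual logistic form Pr(X_pi = 1) = exp(α_iθ_p + β_i) / (1 + exp(α_iθ_p + β_i)); the function name `icc_2pl` and the parameter values in the usage lines are illustrative only.

```python
import math

def icc_2pl(theta: float, alpha: float, beta: float) -> float:
    """Probability of a correct response under the 2PL (slope-intercept form).

    alpha > 0 is the slope (discrimination) and beta the intercept (easiness).
    """
    if alpha <= 0:
        raise ValueError("the slope alpha must be positive")
    z = alpha * theta + beta
    # 1 / (1 + exp(-z)) is algebraically equal to exp(z) / (1 + exp(z)).
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values: the probability is .5 at theta = -beta / alpha,
# and a larger slope makes the curve steeper around that point.
for alpha in (0.5, 2.0):
    probs = [round(icc_2pl(theta, alpha, beta=0.5), 3) for theta in (-1.0, 0.0, 1.0)]
    print(alpha, probs)
```

Because the intercept enters additively, a larger β_i shifts the whole curve to the left (an easier item), whereas α_i changes only how sharply the probability rises with θ_p.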
The response times are assumed to follow a log-normal distribution (denoted by $\ln\mathcal{N}$):
$$T_{pi}\sim\ln\mathcal{N}(\xi_i-\tau_p,\sigma^2_i), \tag{3}$$
that is, $\boldsymbol{\gamma}_i=\{\xi_i,\sigma^2_i\}$. The mean of the logarithm of the response time of person $p$ on item $i$ depends on the item time intensity ($\xi_i$) and the person speed ($\tau_p$), and the variance parameter depends on the item. The residual variance of log-response time $\sigma^2_i$ can be interpreted as the inverse of item time discrimination (van der Linden, 2006): the larger $\frac{1}{\sigma^2_i}$ is, the larger the proportion of the variance of log-response time that is explained by the variation of speed across persons.Footnote 2
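To make the interpretation of 1/σ²_i concrete, assume that ln T_pi = ξ_i − τ_p + ε_pi with ε_pi ~ N(0, σ²_i) independent of τ_p, and write σ²_τ for the variance of speed across persons (a symbol introduced here for illustration). The variance of log-response time on item i across persons is then σ²_τ + σ²_i, so speed explains the share σ²_τ / (σ²_τ + σ²_i). The function name and the numerical values below are hypothetical.

```python
def speed_variance_share(sigma2_tau: float, sigma2_i: float) -> float:
    """Share of the variance of ln T_pi across persons that is due to speed.

    Assumes ln T_pi = xi_i - tau_p + e_pi, with e_pi ~ N(0, sigma2_i)
    independent of tau_p, so that Var(ln T_pi) = sigma2_tau + sigma2_i.
    """
    if sigma2_tau < 0 or sigma2_i <= 0:
        raise ValueError("sigma2_tau must be non-negative and sigma2_i positive")
    return sigma2_tau / (sigma2_tau + sigma2_i)

# Hypothetical speed variance of 0.1: halving sigma2_i twice (i.e. increasing
# the time discrimination 1 / sigma2_i) increases the share explained by speed.
for sigma2_i in (0.4, 0.2, 0.1):
    print(sigma2_i, round(speed_variance_share(0.1, sigma2_i), 3))
```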
At the higher level, a multivariate normal distribution is assumed for the item parameters $\{\xi_i,\ln(\sigma^2_i),\ln(\alpha_i),\beta_i\}$ and a multivariate normal distribution for the person parameters $\{\theta_p,\tau_p\}$, with the identifiability restrictions $\mu_{\theta}=\mu_{\tau}=0$ and $\sigma^2_{\theta}=1$ (for an explanation of why these constraints are sufficient, see van der Linden, 2007).
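A minimal simulation sketch of the model just specified, under conditional independence, may help to see how the levels fit together. Person parameters are drawn from a bivariate normal with μ_θ = μ_τ = 0 and σ²_θ = 1, accuracy from the slope-intercept two-parameter logistic model, and response times from a log-normal distribution whose log-scale mean is assumed to be ξ_i − τ_p. Item parameters are passed in directly rather than drawn from their higher-level distribution, and the function name, the dictionary format and all numerical values are hypothetical.

```python
import math
import random

def simulate_hierarchical(n_persons, items, rho, sigma_tau, seed=1):
    """Simulate X (0/1) and T (seconds) under conditional independence.

    items: list of dicts with keys 'alpha', 'beta', 'xi', 'sigma2' (hypothetical format).
    (theta_p, tau_p) ~ bivariate normal with means 0, Var(theta) = 1,
    SD(tau) = sigma_tau and correlation rho.
    """
    rng = random.Random(seed)
    X, T = [], []
    for _ in range(n_persons):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        theta = z1
        tau = sigma_tau * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        x_row, t_row = [], []
        for it in items:
            # Accuracy depends only on theta and the item's alpha and beta.
            p_correct = 1.0 / (1.0 + math.exp(-(it["alpha"] * theta + it["beta"])))
            x_row.append(1 if rng.random() < p_correct else 0)
            # Time depends only on tau and the item's xi and sigma2, not on x.
            log_t = rng.gauss(it["xi"] - tau, math.sqrt(it["sigma2"]))
            t_row.append(math.exp(log_t))
        X.append(x_row)
        T.append(t_row)
    return X, T

items = [
    {"alpha": 1.2, "beta": 0.5, "xi": 4.0, "sigma2": 0.25},
    {"alpha": 0.8, "beta": -0.3, "xi": 3.5, "sigma2": 0.30},
]
X, T = simulate_hierarchical(n_persons=500, items=items, rho=0.3, sigma_tau=0.3)
print(len(X), len(X[0]), sum(row[0] for row in X) / len(X))
```

In this sketch the only link between accuracy and time runs through the correlation ρ between θ_p and τ_p, which is exactly the conditional independence assumption of Eq. 1.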
3. Motivating Example: A Violation of Conditional Independence
We present an analysis of a data set from the Major Field Test for the Bachelor’s Degree in BusinessFootnote 3, which is a low-stakes test used to assess the mastery of concepts, principles, and knowledge of graduating bachelor students in business education. The test is not used for making individual-level decisions but for evaluating educational programmes. It is a computerised test in which responses and response times are recorded automatically. The test consists of 120 multiple-choice items with four response options each, separated into two parts. Only the first part of the test was analysed in this study. The time limit for this part was one hour, and the average time used by the respondents was 42 minutes. Some items in the test are based on diagrams, charts, and data tables. The test items cover a wide range of difficulties and are aimed at evaluating both the depth and the breadth of business knowledge. From the original sample containing the responses of 1000 persons to 60 items,Footnote 4 11 items were removed because of low item-rest score correlations ($<.1$). Responses for which the recorded response time was equal to 0 were treated as missing values.
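The two preprocessing steps described above could be sketched as follows. This is an illustration under stated assumptions, not the authors’ code: item-rest correlations are computed as Pearson correlations between the 0/1 item scores and the number correct on the remaining items, using complete response data (how missing responses entered the rest scores is not stated here), items with a correlation below .1 are flagged for removal, and zero response times are replaced by `None` to mark them as missing. All function names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    if s_aa == 0 or s_bb == 0:
        return float("nan")
    return s_ab / math.sqrt(s_aa * s_bb)

def item_rest_correlations(X):
    """Correlation of each item with the number correct on all other items."""
    totals = [sum(row) for row in X]
    n_items = len(X[0])
    return [
        pearson([row[i] for row in X], [tot - row[i] for tot, row in zip(totals, X)])
        for i in range(n_items)
    ]

def items_to_keep(X, cutoff=0.1):
    """Indices of items whose item-rest correlation is at least the cutoff."""
    # A nan correlation (constant item) fails the comparison and is dropped.
    return [i for i, r in enumerate(item_rest_correlations(X)) if r >= cutoff]

def zero_times_to_missing(T):
    """Replace recorded response times of exactly 0 by None (missing)."""
    return [[None if t == 0 else t for t in row] for row in T]
```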
To test the assumption of conditional independence between response times and accuracy given speed and ability, we used the Lagrange multiplier test of van der Linden and Glas (2010). In this test, for each item the hierarchical model assuming conditional independence (van der Linden, 2007) is tested against a model that allows the expected log-response time to differ between correct and incorrect responses by including an extra item parameter $\lambda_i$ in the model for the response times:
$$T_{pi}\mid x_{pi}\sim\ln\mathcal{N}(\xi_i-\tau_p+\lambda_i x_{pi},\sigma^2_i). \tag{4}$$
Figure 1 shows the distribution of the p values of the item-level conditional independence tests. For more than half of the items, conditional independence is violated (using $\alpha=.05$ for each test). These results indicate that the assumption of conditional independence cannot be maintained for these data. However, as has been demonstrated in simulation studies (Bolsinova & Tijmstra, 2016), the Lagrange multiplier test of van der Linden and Glas (2010) may also pick up violations of conditional independence of a different type than the one specified in Eq. 4. Therefore, this test does not yet tell us exactly in what way conditional independence is violated.
Distribution of the p values of the Lagrange Multiplier test for conditional independence between response time and accuracy. Most of the p values are below .05, indicating that conditional independence is violated.

To investigate in what way conditional independence is violated, we performed two posterior predictive checks (Meng, 1994; Gelman, Meng, & Stern, 1996; Sinharay, Johnson, & Stern, 2006) that focus on differences in the behaviour of the items with respect to response accuracy between slow and fast responses, following the approach proposed by Bolsinova and Tijmstra (2016). Here, slow and fast responses were defined by a median split, using the transformed time variable
$$t^*_{pi}=\begin{cases}1, & \text{if } t_{pi}\geq t_{med,i},\\ 0, & \text{if } t_{pi}<t_{med,i},\end{cases}$$
where $t_{med,i}$ is the median response time for item $i$. Our goal is to investigate whether the observed differences between the slow and the fast responses with respect to the difficulty of the items and their discriminatory power are unlikely to be observed under the hierarchical model assuming conditional independence.
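A sketch of the median split for a single item is shown below. Ties at the median are assigned to the slow group here, which is an assumed convention; missing times (`None`) stay missing and are excluded when the median is computed. The function name is hypothetical.

```python
import statistics

def median_split(times_i):
    """Return t* for one item: 1 = slow (t >= item median), 0 = fast, None = missing."""
    observed = [t for t in times_i if t is not None]
    t_med = statistics.median(observed)
    return [None if t is None else int(t >= t_med) for t in times_i]

# Hypothetical response times (in seconds) of five persons on one item.
print(median_split([12.0, 30.5, None, 8.2, 19.9]))
```

Because the split is made per item, t* expresses whether a response is slow relative to other persons on the same item, not relative to the person’s own overall speed.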
When posterior predictive checks are implemented, the measures of interest have to be computed repeatedly for replicated data sets. Therefore, for computational convenience, we decided not to estimate IRT difficulty and discrimination parameters separately for the slow and the fast responses, but to compute simple classical test theory statistics that can be viewed as proxies for difficulty and discrimination, namely the proportion of correct responses and the item-rest correlation, respectively. For each item two discrepancy measures were computed: the difference between the proportion of correct responses to the item among slow responses and among fast responses,
$$D_{1i}=\overline{\mathbf{x}}_{i,slow}-\overline{\mathbf{x}}_{i,fast},$$
and the difference between the item-rest correlations of the item among slow and fast responses,
$$D_{2i}=\mathrm{cor}\big(\mathbf{x}_{i,slow},\mathbf{x}_{+,slow}^{(i)}\big)-\mathrm{cor}\big(\mathbf{x}_{i,fast},\mathbf{x}_{+,fast}^{(i)}\big),$$
where $\mathbf{x}_{i,slow}$ and $\mathbf{x}_{i,fast}$ are the vectors of responses to item $i$ of all persons with $t_{pi}^*=1$ or $t_{pi}^*=0$, respectively, $\overline{\mathbf{x}}$ denotes the mean of a vector, and $\mathbf{x}_{+,slow}^{(i)}$ and $\mathbf{x}_{+,fast}^{(i)}$ are the vectors of the numbers of correct responses to all items excluding item $i$ (rest scores) of all persons with $t_{pi}^*=1$ or $t_{pi}^*=0$, respectively.
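For complete data, the two discrepancy measures can be sketched as below, each computed as slow minus fast. The sketch assumes that ties at the item median are assigned to the slow group and that there are no missing values; the function names and the small data set in the usage lines are hypothetical.

```python
import math
import statistics

def _pearson(a, b):
    """Pearson correlation (nan if either sequence is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    if s_aa == 0 or s_bb == 0:
        return float("nan")
    return s_ab / math.sqrt(s_aa * s_bb)

def discrepancies(X, T, i):
    """Return (D1i, D2i) for item i, each defined as slow minus fast.

    X: persons x items 0/1 matrix; T: matching response-time matrix (no missing values).
    """
    t_med = statistics.median([row[i] for row in T])
    slow = [p for p, row in enumerate(T) if row[i] >= t_med]
    fast = [p for p, row in enumerate(T) if row[i] < t_med]

    def proportion_correct(group):
        return sum(X[p][i] for p in group) / len(group)

    def item_rest_corr(group):
        item_scores = [X[p][i] for p in group]
        rest_scores = [sum(X[p]) - X[p][i] for p in group]
        return _pearson(item_scores, rest_scores)

    d1 = proportion_correct(slow) - proportion_correct(fast)
    d2 = item_rest_corr(slow) - item_rest_corr(fast)
    return d1, d2

# Hypothetical data: 8 persons, 3 items; only the times on item 0 matter here.
X = [[1, 1, 1], [1, 1, 0], [1, 1, 1], [0, 0, 0],
     [1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
T = [[t, 20.0, 20.0] for t in (10, 11, 12, 13, 14, 15, 16, 17)]
print(discrepancies(X, T, 0))
```

In this toy example the slow responses to item 0 are less often correct than the fast ones, so D_1i is negative.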
To assess whether the observed values $D_{1i}$ and $D_{2i}$ are plausible under conditional independence, they can be compared to values drawn from the posterior predictive distribution of these measures given the data and the hierarchical model, which can be obtained using draws from the posterior distribution of the model parameters. The conditional independence model was estimated using a Gibbs sampler. The prior distributions and the sampling procedure followed the specification of Bolsinova and Tijmstra (2016), which means that independent vague priors were used for the hyper-parameters (the mean vector and covariance matrix of the item parameters, the variance of speed, and the correlation between speed and ability). Two independent chains of 10,000 iterations each (5000 burn-in iterations, thinning interval of 5) were used. For each of the resulting 2000 samples, a new replicated data set $\mathbf{X}^{(g)}_{rep}$, $\mathbf{T}^{(g)}_{rep}$ was simulated according to the hierarchical model, where the superscript $g$ denotes the $g$-th sample from the posterior distribution. In each replicated data set, $D_{1i}^{(g)}$ and $D_{2i}^{(g)}$ were computed for each item. For each item two posterior predictive p values were computed:
$$p_{1i}=\frac{1}{G}\sum_{g=1}^{G}\mathbb{I}\big(D_{1i}^{(g)}\geq D_{1i}\big),\qquad p_{2i}=\frac{1}{G}\sum_{g=1}^{G}\mathbb{I}\big(D_{2i}^{(g)}\geq D_{2i}\big),$$
where $G=2000$ is the number of posterior samples and $\mathbb{I}(\cdot)$ is the indicator function.
If $p_{1i}$ is close to 0 or close to 1, the difference between the proportions of correct responses among the slow and the fast responses observed in the empirical data is unlikely under the model. Similarly, if $p_{2i}$ is close to 0 or close to 1, the observed difference between the item-rest correlations is unlikely under the model. Figure 2 shows histograms of the posterior predictive p values of the items for the model assuming conditional independence. The large number of extreme p values indicates that the model does not capture an important aspect of the data, namely that the items behave differently for slow and fast responses.
Posterior predictive p values for the hierarchical model assuming conditional independence: a difference between the proportion of correct responses to an item for slow and fast responses; b difference between the item-rest correlations for slow and fast responses.
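The computation of such item-level posterior predictive p values from the retained draws can be sketched in Python as follows. This is a minimal sketch assuming that the discrepancy measure has already been evaluated on the observed data and on every replicated dataset, and using the convention $p=\Pr(D^{(g)}\geq D_{obs})$. The function names, the synthetic replicated values and the two-sided cut-off in `flag_extreme` are illustrative assumptions, not part of the original analysis. The sketch also reproduces the arithmetic behind the 2000 retained draws.

```python
import numpy as np


def retained_draws(n_iter, burn_in, thin, n_chains):
    """Number of posterior draws kept after burn-in and thinning."""
    return n_chains * ((n_iter - burn_in) // thin)


def ppp_value(d_obs, d_rep):
    """Posterior predictive p value for one item-level statistic.

    d_obs : statistic computed on the observed data (e.g. D_1i)
    d_rep : array (G,) with the same statistic in each replicated dataset
    Returns the share of replicated values at least as large as the observed one.
    """
    d_rep = np.asarray(d_rep, dtype=float)
    return float(np.mean(d_rep >= d_obs))


def flag_extreme(p, level=0.05):
    """Hypothetical two-sided rule: flag p values close to 0 or close to 1."""
    return p < level / 2 or p > 1 - level / 2


# Design described in the text: 2 chains x 10,000 iterations,
# 5000 burn-in per chain, thinning interval 5.
G = retained_draws(10_000, 5_000, 5, 2)

# Synthetic replicated values of D_1i for one item (not real data).
rng = np.random.default_rng(1)
d_rep = rng.normal(loc=0.0, scale=0.05, size=G)
p_fit = ppp_value(0.01, d_rep)     # observed value near the centre
p_misfit = ppp_value(0.30, d_rep)  # observed value far in the upper tail
print(G, round(p_fit, 3), p_misfit, flag_extreme(p_misfit))
```

Because misfit can show up in either direction for these difference statistics, both tails are treated as evidence against the model.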

Next, we investigated whether the observed deviation from conditional independence can be explained by the extended hierarchical model, in which the model for response time is extended (see Eq. 4) while the model for response accuracy (see Eq. 2) is the same as in the hierarchical conditional independence model. In this extension of the hierarchical model, conditional independence as specified in Eq. 1 is violated, and this violation is assumed to be fully explained by the additional parameter $\lambda_{i}$. This way of modelling conditional dependence is in line with the first approach described in the Introduction; that is, the distribution of response time is modelled conditional on whether the response is correct or not.
To determine whether this extension of the hierarchical model can fully explain the observed deviation from conditional independence in the data, we analysed to what extent the observed $D_{1i}$ and $D_{2i}$ would be plausible under this model. Figure 3 shows the histograms of the posterior predictive p values for the extended hierarchical model. The fit has improved with respect to the difference in the proportion of correct responses, but not with respect to the difference in the item-rest correlations. This was to be expected, since the extended hierarchical model only allows for a shift in the location of the response time distribution between correct and incorrect responses. Such a shift cannot account for variation in the discriminative power of an item as a function of response time, as reflected in differences in the item-rest correlations. Based on these results, we suggest extending the hierarchical model such that both the proportion of correct responses to the items and their discriminative power may change as a function of response time.
Posterior predictive p values for the model with an extra parameter for the difference in the response time distributions of correct and incorrect responses: a difference between the proportion of correct responses to an item for slow and fast responses; b difference between the item-rest correlations for slow and fast responses.

4. Residual Log-Response Time as a Covariate for the Parameters of the ICC
4.1. Model Specification
When considering the possible effects of relatively fast or slow responses, we want to disentangle the particular response time from the overall speed of the person and the overall time intensity of the item. That is, it is not $t_{pi}$ in isolation that tells us whether a response is relatively slow or fast, but rather the difference between $t_{pi}$ and the expected response time for person p on item i. Two identical response times may differ in whether they count as fast or slow, depending on the speed of the persons and the time intensity of the items involved. Because of this, it may be reasonable to use a standardised residual of the log-response time (derived from Eq. 3) to capture the extent to which a response should be considered fast or slow. Let us denote the standardised residual log-response time of person p to item i by $z_{pi}$.
If $z_{pi}>0$, the response of person p to item i is relatively slow, while if $z_{pi}<0$ it is relatively fast. The residuals are standardised, taking into account the differences in $\sigma_i$ between items, so that the regression coefficients specifying the effects of residual time on accuracy are comparable across items. If conditional independence between response times and accuracy given ability and speed holds, then the probability of a correct response to item i does not depend on whether the response is relatively slow or relatively fast, given ability and speed.
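The standardised residual can be sketched in Python as follows. The sketch assumes that Eq. 3 is the log-normal model of the hierarchical framework, $\ln T_{pi}\sim\mathcal{N}(\xi_i-\tau_p,\sigma_i^2)$, so that $z_{pi}=(\ln t_{pi}-(\xi_i-\tau_p))/\sigma_i$. The parameter values are hypothetical. They illustrate the point made above that the same response time can be relatively slow for one person and relatively fast for another.

```python
import numpy as np


def standardised_residual(t, xi, tau, sigma2):
    """Standardised residual log-response time z_pi (assumed log-normal model).

    t      : (P, I) response times
    xi     : (I,) time intensities
    tau    : (P,) speed parameters
    sigma2 : (I,) residual variances of the log-response times
    """
    log_t = np.log(t)
    expected = xi[None, :] - tau[:, None]  # expected log-time per person-item pair
    return (log_t - expected) / np.sqrt(sigma2)[None, :]


# Two persons answer the same item in exactly 30 seconds.
t = np.array([[30.0], [30.0]])
xi = np.array([np.log(30.0)])  # hypothetical time intensity
tau = np.array([0.5, -0.5])    # person 1 is fast overall, person 2 is slow overall
sigma2 = np.array([0.25])
z = standardised_residual(t, xi, tau, sigma2)
print(z.ravel())  # person 1: relatively slow (z > 0); person 2: relatively fast (z < 0)
```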
We suggest using a time-related covariate both for the intercept and for the slope of the ICC, so that the model can also capture the difference between the discriminative power of the items for slow and fast responses observed in the empirical data. Let the slope and the intercept in Eq. 2 depend on the standardised residual of the log-response time.
Since the slope parameter in the two-parameter logistic model is restricted to positive values, a linear model for $\ln(\alpha_{pi})$ rather than for $\alpha_{pi}$ is used. Another reason for using a multiplicative rather than a linear effect on the slope is that the slope parameter is itself a multiplicative parameter. The parameters $\alpha_{0i}$ and $\beta_{0i}$ are the baseline slope and the baseline intercept of the item response function of item i, which apply to responses $x_{pi}$ given exactly as fast as expected for person p on item i (i.e., $z_{pi}=0$). The parameters $\alpha_{1i}$ and $\beta_{1i}$ are the effects of $z_{pi}$ on the slope and the intercept of the ICC. If $\alpha_{1i}=1$ or $\beta_{1i}=0$, there is no effect of residual log-response time on the slope or on the intercept, respectively. The sign of $\beta_{1i}$ indicates whether relatively fast responses are more ($\beta_{1i}<0$) or less ($\beta_{1i}>0$) often correct than relatively slow responses. Likewise, depending on the value of $\alpha_{1i}$, relatively fast responses contain either more ($\alpha_{1i}<1$) or less ($\alpha_{1i}>1$) information about ability. As with any IRT model, these interpretations should be taken to hold in general, and may break down when very extreme or aberrant response processes are considered (e.g., with very extreme residual response times).
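The roles of $\alpha_{1i}$ and $\beta_{1i}$ can be illustrated with a short sketch. It assumes that Eq. 2 is the two-parameter logistic model in slope-intercept form, $\mathrm{logit}\,P(X_{pi}=1)=\alpha_{pi}\theta_p+\beta_{pi}$, with $\alpha_{pi}=\alpha_{0i}\alpha_{1i}^{z_{pi}}$ and $\beta_{pi}=\beta_{0i}+\beta_{1i}z_{pi}$. This form agrees with the interpretations given above (a linear model for $\ln\alpha_{pi}$, no effect when $\alpha_{1i}=1$ or $\beta_{1i}=0$), but it is our reconstruction rather than a quotation of the model equation. Information about ability is measured by the usual 2PL Fisher information $\alpha_{pi}^2P(1-P)$, and the item values are hypothetical.

```python
import numpy as np


def p_correct(theta, z, alpha0, alpha1, beta0, beta1):
    """P(correct) with residual log-time z as covariate (assumed slope-intercept 2PL).

    ln(alpha_pi) = ln(alpha0) + z * ln(alpha1)  -> multiplicative effect on the slope
    beta_pi      = beta0 + beta1 * z            -> additive effect on the intercept
    """
    alpha = alpha0 * alpha1 ** z
    beta = beta0 + beta1 * z
    return 1.0 / (1.0 + np.exp(-(alpha * theta + beta)))


def item_information(theta, z, alpha0, alpha1, beta0, beta1):
    """Fisher information about theta from one response: alpha_pi^2 * P * (1 - P)."""
    p = p_correct(theta, z, alpha0, alpha1, beta0, beta1)
    alpha = alpha0 * alpha1 ** z
    return alpha ** 2 * p * (1.0 - p)


# Hypothetical item: fast responses are more often correct (beta1 < 0)
# and more informative about ability (alpha1 < 1).
item = dict(alpha0=1.2, alpha1=0.8, beta0=0.0, beta1=-0.3)
p_fast = p_correct(0.0, -1.0, **item)  # z = -1: relatively fast response
p_slow = p_correct(0.0, 1.0, **item)   # z = +1: relatively slow response

# With alpha1 = 1 and beta1 = 0 the ICC no longer depends on z.
no_effect = dict(alpha0=1.2, alpha1=1.0, beta0=0.0, beta1=0.0)
print(p_fast, p_slow, item_information(0.0, -1.0, **item), item_information(0.0, 1.0, **item))
```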
If one assumes that persons keep a constant speed across items (as is usually assumed within the hierarchical modelling framework), then $\beta_{1i}$ would be closely related to the conditional accuracy function (van Breukelen, 2005), since the residual response time then does not reflect a change in effective speed as a latent variable. It may also be noted that the marginal model for response accuracy (i.e., after integrating out response times) is no longer a two-parameter logistic model. Although in theory conditional independence between response time and accuracy could therefore be checked by testing the fit of the two-parameter logistic model for response accuracy, such a test would have low power compared to tests designed specifically to detect conditional dependence, such as the posterior predictive checks developed in this paper.
The full model allows the effects of the covariate to vary across items. However, one might assume that the effect of responding relatively fast or slow is the same for all items, and choose one of the constrained models: equal $\alpha_{1i}$ and equal $\beta_{1i}$ for all items, equal $\alpha_{1i}$ but varying $\beta_{1i}$, or equal $\beta_{1i}$ but varying $\alpha_{1i}$. It may be noted that if one chooses to model only the effects on the intercept (i.e., varying $\beta_{1i}$ while $\alpha_{1i}=1$ for all items), then the model is similar in structure to the model of Ranger and Ortner (2012), except that we consider a logistic model for response accuracy instead of a normal ogive model.
As in the hierarchical model that assumes conditional independence, we need to specify the higher-level models for the person parameters and for the item parameters. The dependence between the parameters of individual items is modelled by a multivariate normal distribution for the vector $\{\xi_i,\ln(\sigma_i^2),\ln(\alpha_{0i}),\ln(\alpha_{1i}),\beta_{0i},\beta_{1i}\}$ with mean vector $\boldsymbol{\mu}_{\mathcal{I}}$ and covariance matrix $\boldsymbol{\Sigma}_{\mathcal{I}}$. A logarithmic transformation is used for the parameters that are restricted to positive values. A multivariate normal distribution is used for speed and ability, with the same identifiability restrictions as in the conditional independence model: $\mu_{\theta}=\mu_{\tau}=0$ and $\sigma^2_{\theta}=1$. Hence, two person population parameters are freely estimated, namely the correlation between speed and ability (denoted by $\rho_{\theta\tau}$) and the variance of speed (denoted by $\sigma^2_{\tau}$).
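The two higher-level models can be made concrete by simulating from them. In this sketch the hyper-parameter values are hypothetical and the function names are ours. The order of the item parameter vector follows the text: parameters restricted to positive values are drawn on the log scale and mapped back with the exponential function, and the person distribution imposes $\mu_\theta=\mu_\tau=0$ and $\sigma^2_\theta=1$.

```python
import numpy as np


def draw_items(n_items, mu_I, Sigma_I, rng):
    """Draw item parameters from the item population model (sketch).

    The multivariate normal is placed on
    (xi, ln sigma^2, ln alpha0, ln alpha1, beta0, beta1);
    the positive parameters are mapped back with exp().
    """
    v = rng.multivariate_normal(mu_I, Sigma_I, size=n_items)
    return {
        "xi": v[:, 0],
        "sigma2": np.exp(v[:, 1]),
        "alpha0": np.exp(v[:, 2]),
        "alpha1": np.exp(v[:, 3]),
        "beta0": v[:, 4],
        "beta1": v[:, 5],
    }


def draw_persons(n_persons, sigma2_tau, rho, rng):
    """Draw (theta, tau) with mu_theta = mu_tau = 0 and sigma^2_theta = 1."""
    sd_tau = np.sqrt(sigma2_tau)
    cov = np.array([[1.0, rho * sd_tau],
                    [rho * sd_tau, sigma2_tau]])
    v = rng.multivariate_normal(np.zeros(2), cov, size=n_persons)
    return v[:, 0], v[:, 1]


rng = np.random.default_rng(7)
mu_I = np.array([4.0, np.log(0.3), 0.0, 0.0, 0.0, 0.0])  # hypothetical hyper-parameters
Sigma_I = 0.1 * np.eye(6)
items = draw_items(20, mu_I, Sigma_I, rng)
theta, tau = draw_persons(5000, sigma2_tau=0.2, rho=0.4, rng=rng)
print(items["alpha0"][:3], np.corrcoef(theta, tau)[0, 1])
```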
4.2. Estimation
For the estimation of the model we developed a Gibbs Sampler (Geman & Geman, 1984; Casella & George, 1992), implemented in the R programming language (R Development Core Team, 2006), to obtain samples from the joint posterior distribution of the model parameters, which includes both the parameters of the individual persons and items and the hyper-parameters of the person population distribution and the item population distribution.
Although the variance of $\theta$ is constrained to 1, to improve convergence the full covariance matrix $\boldsymbol{\Sigma}_{\mathcal{P}}$ is sampled at each iteration of the Gibbs Sampler, and at the end of each iteration all parameters are transformed to fit the scale defined by $\sigma^2_{\theta}=1$ (see Appendix for details). We choose independent vague prior distributions for the item and the person hyper-parameters: $\mathcal{N}(0,100)$ for the means of the item parameters, half-$t$ distributions with $\nu=2$ degrees of freedom and scale parameter $A=2$ for the standard deviations of the item parameters, a marginally uniform joint distribution for the correlations between the item parameters (Huang & Wand, 2013), and an inverse-Wishart distribution with 4 degrees of freedom and the identity matrix $\mathbf{I}_2$ as the scale parameter for $\boldsymbol{\Sigma}_{\mathcal{P}}$ (Hoff, 2009). The results are not sensitive to the specification of the prior scale parameter, because the posterior distribution is dominated by the data when the sample size is large (Hoff, 2009, p. 110).
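The half-$t$ and marginally uniform priors for the item covariance matrix can be obtained from the hierarchical inverse-Wishart representation of Huang and Wand (2013): $a_k\sim\mathrm{IG}(1/2,1/A_k^2)$ and $\boldsymbol{\Sigma}_{\mathcal{I}}\mid a\sim\mathrm{IW}(\nu+p-1,\,2\nu\,\mathrm{diag}(1/a_1,\ldots,1/a_p))$. The sketch below draws from this prior with NumPy to check its marginal behaviour; it illustrates the prior only, is not the authors' sampler, and the function name is hypothetical. With $\nu=2$ and $A_k=2$ as in the text, each standard deviation should have the median of a half-$t_2$ distribution with scale 2 (about 1.63), and each correlation should be roughly uniform on $(-1,1)$, with variance close to 1/3.

```python
import numpy as np


def sample_huang_wand(n_draws, A, nu=2, rng=None):
    """Draw covariance matrices from the Huang & Wand (2013) prior.

    a_k   ~ Inverse-Gamma(1/2, 1 / A_k^2)                     for k = 1..p
    Sigma ~ Inverse-Wishart(nu + p - 1, 2 * nu * diag(1 / a))  given a
    Then sqrt(Sigma_kk) is marginally half-t(nu, A_k); for nu = 2 every
    correlation is marginally uniform on (-1, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    A = np.asarray(A, dtype=float)
    p = A.size
    df = nu + p - 1  # an integer here, so a Wishart draw is a sum of df outer products
    # Inverse-Gamma(shape, rate) draw = 1 / Gamma(shape, scale = 1 / rate).
    a = 1.0 / rng.gamma(shape=0.5, scale=A ** 2, size=(n_draws, p))
    # Sigma^{-1} ~ Wishart(df, V) with V = diag(a / (2 nu)). Writing Sigma^{-1} = D W0 D
    # with D = diag(sqrt(a / (2 nu))) and W0 ~ Wishart(df, I) gives
    # Sigma = D^{-1} W0^{-1} D^{-1}, which avoids inverting badly scaled matrices.
    d = np.sqrt(a / (2.0 * nu))                # (n_draws, p)
    Z = rng.standard_normal((n_draws, df, p))
    W0 = np.einsum("ndi,ndj->nij", Z, Z)       # Wishart(df, I) draws
    return np.linalg.inv(W0) / (d[:, :, None] * d[:, None, :])


rng = np.random.default_rng(2024)
S = sample_huang_wand(20_000, A=np.full(6, 2.0), nu=2, rng=rng)  # six item parameters
sd = np.sqrt(S[:, 0, 0])                                    # one item-parameter SD
r = S[:, 0, 1] / np.sqrt(S[:, 0, 0] * S[:, 1, 1])           # one correlation
print(np.median(sd), r.mean(), r.var())
```

The person covariance matrix $\boldsymbol{\Sigma}_{\mathcal{P}}$ uses a plain $\mathrm{IW}(4,\mathbf{I}_2)$ prior instead, so no auxiliary $a_k$ variables are needed there.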
The estimation algorithm includes Metropolis–Hastings steps (Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller, 1953) and a modification of the composition algorithm of Marsman et al. (2014). In the Gibbs Sampler, the model parameters are sequentially sampled from their full conditional posterior distributions given the current values of all other parameters (see Appendix for details).
For model comparison purposes, modifications of the algorithm have also been developed to estimate the constrained models (equal $\alpha_{1i}$ and $\beta_{1i}$, equal $\alpha_{1i}$ but varying $\beta_{1i}$, or equal $\beta_{1i}$ but varying $\alpha_{1i}$), models with different time-related covariates (one might be interested in the effect of $t_{pi}$, $t^*_{pi}$ as defined in Eq. 5, or $\ln t_{pi}$ on the IRT parameters instead of that of $z_{pi}$), the hierarchical model assuming conditional independence, and the modified hierarchical model with an extra parameter for the difference in the location parameters of the distribution of the response times given a correct and an incorrect response (see Eq. 4).
4.3. Model Selection and Goodness-of-Fit
To select the best model, the deviance information criterion (DIC) can be used, because it adequately takes the complexity of hierarchical models into account (Spiegelhalter, Best, Carlin, & van der Linde, 2002). The DIC can be computed from the output of the Gibbs Sampler. First, at each iteration (after discarding the burn-in and thinning), the deviance $D^{(g)}$ is computed, that is, minus twice the log-likelihood of the data given the current values of the individual person and item parameters; for the full model this covers both the response accuracies and the response times. This expression does not include the hyper-parameters $\sigma^2_{\tau}$, $\rho_{\theta\tau}$, $\boldsymbol{\mu}_{\mathcal{I}}$ and $\boldsymbol{\Sigma}_{\mathcal{I}}$, because given the individual item and person parameters the distribution of the data does not depend on the hyper-parameters. Second, the deviance $\hat{D}$ is computed at the posterior means of the model parameters. The DIC is equal to
$$\mathrm{DIC}=\frac{\sum_g D^{(g)}}{G}+p_D,$$
where G is the total number of iterations taken into account when computing the DIC and $p_D$ is the effective number of parameters, which is equal to the difference $\left(\frac{\sum_g D^{(g)}}{G}-\hat{D}\right)$.
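A sketch of the DIC computation from sampler output follows. The `dic` function implements $\mathrm{DIC}=\bar{D}+p_D$ with $p_D=\bar{D}-\hat{D}$, where $\bar{D}$ is the mean deviance over the retained draws. The `full_model_deviance` function is an illustration under stated assumptions: it assumes the log-normal model $\ln T_{pi}\sim\mathcal{N}(\xi_i-\tau_p,\sigma_i^2)$ for response times and the slope-intercept ICC with $\alpha_{pi}=\alpha_{0i}\alpha_{1i}^{z_{pi}}$ and $\beta_{pi}=\beta_{0i}+\beta_{1i}z_{pi}$. Both function names and the toy deviance values are hypothetical.

```python
import numpy as np


def dic(deviance_draws, deviance_at_posterior_mean):
    """DIC = mean deviance + p_D, with p_D = mean deviance - D_hat."""
    d_bar = float(np.mean(deviance_draws))    # sum_g D^(g) / G
    p_d = d_bar - deviance_at_posterior_mean  # effective number of parameters
    return d_bar + p_d, p_d


def full_model_deviance(x, t, theta, tau, xi, sigma2, alpha0, alpha1, beta0, beta1):
    """-2 ln p(X, T | person and item parameters) for the full model (assumed form).

    x, t: (P, I) responses (0/1) and response times; theta, tau: (P,);
    xi, sigma2, alpha0, alpha1, beta0, beta1: (I,).
    """
    log_t = np.log(t)
    mu = xi[None, :] - tau[:, None]              # expected log-time
    z = (log_t - mu) / np.sqrt(sigma2)[None, :]  # standardised residual
    eta = alpha0 * alpha1 ** z * theta[:, None] + beta0 + beta1 * z
    # Bernoulli log-likelihood: x * eta - ln(1 + exp(eta)), computed stably
    ll_x = np.sum(x * eta - np.logaddexp(0.0, eta))
    # Log-normal log-density of t (the -ln t term is the Jacobian)
    ll_t = np.sum(-log_t - 0.5 * np.log(2 * np.pi * sigma2)[None, :]
                  - 0.5 * (log_t - mu) ** 2 / sigma2[None, :])
    return -2.0 * (ll_x + ll_t)


# Toy sampler output (made-up deviances) and a one-person, one-item check.
dic_value, p_d = dic(np.array([1010.0, 1000.0, 1020.0, 990.0]), 985.0)
one = np.array([1.0])
zero = np.array([0.0])
dev = full_model_deviance(x=np.array([[1]]), t=np.array([[1.0]]), theta=zero, tau=zero,
                          xi=zero, sigma2=one, alpha0=one, alpha1=one, beta0=zero, beta1=zero)
print(dic_value, p_d, dev)
```

The $-\ln t$ Jacobian term does not depend on the parameters. It therefore cancels in $p_D$ and shifts the DIC of every model fitted to the same response times by the same amount, leaving model comparisons unaffected.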
To evaluate the absolute fit of the best-fitting model, posterior predictive checks based on a global discrepancy measure between the data and the model can be used, for example the log-likelihood of the data under the model. For the g-th sample from the posterior distribution of the model parameters given the observed data, a replicated dataset ($\mathbf{X}_{rep}^{(g)},\mathbf{T}_{rep}^{(g)}$) is simulated under the model, and the log-likelihood given the g-th sample is computed both for the observed data and for the replicated data. The posterior predictive p value is the proportion of samples in which the observed data are more likely under the model than the replicated data. If this posterior predictive p value is small, the data are unlikely under the model. The goodness-of-fit can be further evaluated using posterior predictive checks based on the $D_{1i}$ and $D_{2i}$ statistics (see Eqs. 6, 7).
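The log-likelihood check can be sketched as below. We use the convention that small p values signal misfit: the p value is the proportion of draws in which the observed data are at least as likely as the replicated data. To keep the example self-contained, the hierarchical model is replaced by a toy normal model with known unit variance and a flat prior on the mean. This stand-in and the helper names are our assumptions, not part of the original method. Data that are overdispersed relative to the model should yield a p value near 0.

```python
import numpy as np


def ppp_loglik(loglik_obs, loglik_rep):
    """Share of draws in which the observed data are at least as likely as the replicate.

    loglik_obs[g]: ln p(observed data | g-th posterior draw)
    loglik_rep[g]: ln p(g-th replicated data | g-th posterior draw)
    Small values indicate that the observed data are unlikely under the model.
    """
    loglik_obs = np.asarray(loglik_obs, dtype=float)
    loglik_rep = np.asarray(loglik_rep, dtype=float)
    return float(np.mean(loglik_obs >= loglik_rep))


def normal_loglik(y, mu):
    """Log-likelihood of y under the toy model y_j ~ N(mu, 1)."""
    return float(np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (y - mu) ** 2))


def toy_check(y, G, rng):
    """Stand-in for the hierarchical model: N(mu, 1) with a flat prior on mu."""
    n = y.size
    mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=G)  # exact posterior of mu
    ll_obs = [normal_loglik(y, m) for m in mu_draws]
    ll_rep = [normal_loglik(rng.normal(m, 1.0, size=n), m) for m in mu_draws]
    return ppp_loglik(ll_obs, ll_rep)


rng = np.random.default_rng(3)
p_ok = toy_check(rng.normal(0.0, 1.0, size=200), 1000, rng)   # data match the model
p_bad = toy_check(rng.normal(0.0, 3.0, size=200), 1000, rng)  # overdispersed data
print(p_ok, p_bad)
```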
5. Results
5.1. Fitted Models
Nine different models for response time and accuracy were fitted to the dataset of interest. First, the hierarchical model assuming conditional independence was estimated. Second, we fitted a modification of the hierarchical model with an extra parameter for the difference between the log-normal distributions of the response times given a correct and an incorrect response (see Eq. 4). Third, four models with the residual log-response time ($z_{pi}$) as a covariate for the parameters of the ICC were estimated: the full model and its three constrained versions ($\alpha_{1i}$ and $\beta_{1i}$ both equal across items; $\alpha_{1i}$ equal across items but $\beta_{1i}$ varying; $\beta_{1i}$ equal across items but $\alpha_{1i}$ varying). Finally, three models with alternative time-related covariates for the parameters of the ICC ($t_{pi}$, $\ln t_{pi}$, and $t^*_{pi}$) were fitted.
5.2. Convergence
Convergence was assessed using the $\hat{R}$ statistic (Gelman & Rubin, 1992) for each of the hyper-parameters individually, and overall using the multivariate scale reduction factor (Brooks & Gelman, 1998). For all fitted models, all univariate $\hat{R}$ values and the multivariate scale reduction factor were smaller than 1.1, indicating that convergence was not an issue.
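As an illustration of the univariate check, the sketch below computes the potential scale reduction factor in its basic form: the pooled estimate of the posterior variance divided by the mean within-chain variance, square-rooted. It omits the degrees-of-freedom correction and the chain splitting used in some software, and it does not show the multivariate version of Brooks and Gelman; the function name `rhat` and the simulated chains are for illustration only.

```python
import numpy as np

def rhat(chains):
    """Basic potential scale reduction factor for one scalar parameter.

    chains: array of shape (m, n) holding m chains of n post-burn-in draws.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    if m < 2 or n < 2:
        raise ValueError("need at least two chains with two draws each")
    chain_means = chains.mean(axis=1)
    # Between-chain variance B (scaled by n) and mean within-chain variance W
    b = n * chain_means.var(ddof=1)
    w = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the marginal posterior variance
    var_plus = (n - 1) / n * w + b / n
    return float(np.sqrt(var_plus / w))

rng = np.random.default_rng(1)
well_mixed = rng.normal(size=(4, 2000))
print(round(rhat(well_mixed), 3))  # close to 1
# Shift one chain so that it explores a different region
stuck = well_mixed + np.array([[0.0], [0.0], [0.0], [3.0]])
print(round(rhat(stuck), 3))       # clearly above the 1.1 threshold
```

Values near 1 indicate that the chains agree; the 1.1 cut-off used in the text is a common rule of thumb.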
DIC of the fitted models.

5.3. Model Selection
The values of the DIC of the different models are presented in Table 1. As expected based on the results of the test for conditional independence, the hierarchical model assuming conditional independence fits worse than the models that take conditional dependence between time and accuracy into account. Among the models for response accuracy that include $z_{pi}$ as a covariate for the item slope and intercept, allowing both of these effects to vary across items improves the model: the full model has the lowest DIC, while the model in which both effects are equal across items has the highest DIC of the four. The full model also outperforms the extension of the hierarchical model that includes a shift parameter ($\lambda_i$) in the model for response time. Finally, the comparison of the four full models with different specifications of the time-related covariate shows that the residual log-response time is a better covariate for the parameters of the ICC than the response time, the dichotomized response time, or the log-response time. Since the full model with $z_{pi}$ as a covariate is the best-performing model, it is the focus of the remainder of the paper.
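The comparison above can be illustrated with the classical definition $\mathrm{DIC} = \bar{D} + p_D$, where $D = -2 \ln L$ is the deviance, $\bar{D}$ its posterior mean, and $p_D = \bar{D} - D(\bar{\boldsymbol{\omega}})$ the effective number of parameters, with $D(\bar{\boldsymbol{\omega}})$ the deviance at the posterior mean of the parameters. The text does not state which DIC variant was used, so the following is only a sketch of the textbook form with made-up inputs; `dic` is a hypothetical helper.

```python
import numpy as np

def dic(loglik_draws, loglik_at_posterior_mean):
    """Deviance information criterion (classical form).

    loglik_draws: log-likelihood of the data at each posterior draw.
    loglik_at_posterior_mean: log-likelihood evaluated at the posterior
        mean of the parameters.
    Returns (DIC, pD); a lower DIC indicates a better trade-off between
    fit and complexity.
    """
    deviance_draws = -2.0 * np.asarray(loglik_draws, dtype=float)
    mean_deviance = deviance_draws.mean()                 # D-bar
    deviance_at_mean = -2.0 * loglik_at_posterior_mean    # deviance at posterior mean
    p_d = mean_deviance - deviance_at_mean                # effective no. of parameters
    return float(mean_deviance + p_d), float(p_d)

# Made-up numbers: the log-likelihood at the posterior mean is higher
# than the average over the draws, which gives a positive pD.
print(dic([-1010.0, -1005.0, -1015.0], -1000.0))
```

When several models are fitted to the same data, the one with the smallest DIC is preferred, which is how the full model with $z_{pi}$ was selected.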
5.4. Posterior Predictive Checks
In the previous subsection we concluded that the full model with $z_{pi}$ as a covariate fits best among the fitted models. We now investigate its goodness-of-fit further using posterior predictive checks. First, we performed a posterior predictive check for the global discrepancy measure. The posterior predictive p value is equal to .35 (i.e., the proportion of iterations in which the log-likelihood of the observed data was lower than the log-likelihood of the data replicated under the model), which means that the observed data are not much less likely under the model than data simulated under the model, providing support for the general fit of the model.
Second, we performed the same posterior predictive check as for the model assuming conditional independence in Sect. 2. Figure 4 shows histograms of the posterior predictive p values for the difference between the proportions of correct responses among the slow and the fast responses to each item (see Eq. 6) and for the difference between the item-rest correlations among the slow and the fast responses (see Eq. 7). Neither of the two measures resulted in a disproportionate number of extreme posterior predictive p values, which indicates that the model adequately captures these aspects of the data. These results are in line with expectations, since these posterior predictive checks focus on exactly the kind of dependencies that the added parameters $\beta_{1i}$ and $\alpha_{1i}$ are meant to capture.
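Eq. 6 is defined in an earlier section, so the following is only a hedged illustration of an item-level check of this kind. It assumes that "slow" and "fast" responses to an item are separated at the item's median response time, and that the item-level p value is the proportion of posterior draws in which the replicated statistic is at least as large as the observed one. Both choices, and the names `d1` and `item_ppp`, are our assumptions rather than the paper's definitions.

```python
import numpy as np

def d1(x_i, t_i):
    """Hypothetical version of the D_1i statistic for one item.

    x_i: 0/1 accuracies of all persons on item i.
    t_i: the corresponding response times.
    Assumption: responses are split at the item's median response time.
    Returns P(correct | slow) - P(correct | fast).
    """
    x_i = np.asarray(x_i, dtype=float)
    t_i = np.asarray(t_i, dtype=float)
    slow = t_i > np.median(t_i)
    return float(x_i[slow].mean() - x_i[~slow].mean())

def item_ppp(d_obs, d_rep):
    """Proportion of posterior draws in which the replicated statistic
    is at least as large as the observed one (assumed direction)."""
    return float(np.mean(np.asarray(d_rep, dtype=float) >= d_obs))

x = [1, 1, 1, 0, 0, 1]
t = [2.0, 3.0, 4.0, 10.0, 12.0, 9.0]
print(d1(x, t))  # negative: slow responses are less often correct here
print(item_ppp(d1(x, t), [-0.1, -0.5, 0.2, -0.8]))
```

Repeating this for every item yields one p value per item; a histogram piling up near 0 or 1 would indicate misfit of the kind that Fig. 4 does not show.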
Posterior predictive p values for the full model with residual response time as a covariate for item parameters: a difference between the proportion of correct responses to an item for slow and fast responses; b difference between the item-rest correlations for slow and fast responses.

5.5. Effect of Residual Time on the ICC
Figure 5 displays the estimates of the effect of $z_{pi}$ on the intercept and the slope of the ICC. For many of the items, the credible intervals for the effects on the intercept and on the slope exclude 0 and 1, respectively, which indicates that the residual log-response time does affect the behaviour of the items. The estimates of $\alpha_{1i}$ differ across items. However, for most items the estimated value is below 1, which indicates that these items discriminate less well when the response is relatively slow. The effect of $z_{pi}$ on the intercept ($\beta_{1i}$) is more variable across items than the effect on the slope ($\alpha_{1i}$). For most of the items the effect on the intercept is negative, that is, the probability of a correct response is lower among the relatively slow responses ($z_{pi}>0$) than among the relatively fast responses ($z_{pi}<0$). For some of the items $\beta_{1i}$ is positive, that is, the item is easier when the response is relatively slow.
To zoom in on the differences between relatively fast responses ($z_{pi} = -1$) and relatively slow responses ($z_{pi} = 1$), we present scatterplots of the predicted intercepts and slopes of the items given these two values of $z_{pi}$ (see Fig. 6). The predicted intercepts (see Fig. 6a) lie both above and below the diagonal, meaning that for some items the probability of a correct response is higher for slow responses, whereas for other items it is higher for fast responses. The predicted slopes (see Fig. 6b) lie mainly above the diagonal, meaning that for most items the relationship between the accuracy of the response and the ability $\theta$ is stronger for fast responses.
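The predicted intercepts and slopes in Fig. 6 can be computed from the item parameters. The sketch below assumes that the effect on the slope is multiplicative (slope $\alpha_{0i}\alpha_{1i}^{z_{pi}}$, so that $\ln \alpha_{1i}$ is the effect on the log of the slope), that the effect on the intercept is additive ($\beta_{0i} + \beta_{1i} z_{pi}$), and that the link is logistic. This is our reading of the model, not a quotation of its defining equation. The illustrative item simply takes the reported means of the item parameters as its own values; the function names are hypothetical.

```python
import math

def icc_params(alpha0, alpha1, beta0, beta1, z):
    """Slope and intercept of the ICC for residual log-response time z.

    Assumes slope = alpha0 * alpha1**z (ln(alpha1) is the effect on the
    log-slope) and intercept = beta0 + beta1 * z.
    """
    return alpha0 * alpha1 ** z, beta0 + beta1 * z

def p_correct(theta, alpha0, alpha1, beta0, beta1, z):
    """Probability of a correct response under an assumed logistic link."""
    slope, intercept = icc_params(alpha0, alpha1, beta0, beta1, z)
    return 1.0 / (1.0 + math.exp(-(slope * theta + intercept)))

# Illustrative item: log baseline slope -0.57, effect on the log-slope -0.27,
# baseline intercept 0.17 and intercept effect -0.21 (the reported means).
item = dict(alpha0=math.exp(-0.57), alpha1=math.exp(-0.27), beta0=0.17, beta1=-0.21)
for z in (-1, 1):  # relatively fast vs relatively slow response
    slope, intercept = icc_params(z=z, **item)
    print(z, round(slope, 3), round(intercept, 3), round(p_correct(0.0, z=z, **item), 3))
```

Because $\alpha_{1i} < 1$ for this item, the slope is larger for $z_{pi} = -1$ than for $z_{pi} = 1$, which corresponds to a point above the diagonal in Fig. 6b.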
Estimated effects of residual response time on the slope and the intercept of the ICC.

Predicted intercepts (a) and slopes (b) of the ICC given a slow response ($z_{pi}=1$) on the x-axis and given a fast response ($z_{pi}=-1$) on the y-axis, computed using the estimated baseline intercept ($\beta_{0i}$), effect of $z_{pi}$ on the intercept ($\beta_{1i}$), baseline slope ($\alpha_{0i}$), and effect of $z_{pi}$ on the slope ($\alpha_{1i}$).

Table 2 presents the posterior means and the 95% credible intervals for the means and the variances of the item parameters, and for the correlations between them. On average the items have a low baseline discrimination ($-0.57$ on the log scale, corresponding to a baseline discrimination parameter of 0.57) and are somewhat easy (the mean of the baseline intercept is 0.17). The effects of residual log-response time on the intercept and on the logarithm of the slope are on average negative ($-0.21$ and $-0.27$, respectively), but the variance of the effect is larger for the intercepts than for the slopes (0.15 and 0.04, respectively).
Between-item variances of the item parameters (on the diagonal), correlations between the item parameters (off-diagonal), and the mean vector of the item parameters, with their 95% credible intervals in brackets.

The baseline intercept of the ICC is strongly negatively correlated with the effects on the intercept ($-.75$) and on the log of the slope ($-.62$). Figure 7 shows scatterplots of the effects of $z_{pi}$ on the intercept (a) and on the slope (b) of the ICC against the baseline intercept of the ICC. For very difficult items the effect of being slow is positive, and the easier the item, the more negative this effect becomes. In other words, for very difficult items being slow is associated with a higher probability of a correct response, whereas for very easy items being slow is associated with a lower probability of a correct response. Moreover, for the easy items slow responses are less informative (have lower discrimination) than fast responses, whereas for the difficult items they are either more informative than or as informative as fast responses.
As can be observed in Table 2, the effect of $z_{pi}$ on the intercept and the effect on the log of the slope are strongly correlated (.73). Part of this correlation is explained by the baseline intercept, which is negatively correlated with both effects. However, after conditioning on the baseline intercept a positive correlation of .47 remains. This can be taken to indicate that items differ in the extent to which fast and slow responses differ: some items show strong effects on both the slope and the intercept, whereas for other items both effects are weaker, indicating that there may be no large differences between fast and slow responses for those items.
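Conditioning on the baseline intercept corresponds to a first-order partial correlation, $\rho_{xy\cdot w} = (\rho_{xy} - \rho_{xw}\rho_{yw}) / \sqrt{(1-\rho_{xw}^2)(1-\rho_{yw}^2)}$. The sketch below applies this standard formula to the posterior means from Table 2. Doing so gives roughly .51, which need not coincide with the reported .47: the reported value is presumably a posterior summary of the partial correlation computed per draw (an assumption on our part), and a nonlinear function of posterior means is not the posterior mean of that function.

```python
import math

def partial_corr(r_xy, r_xw, r_yw):
    """First-order partial correlation of x and y, controlling for w."""
    return (r_xy - r_xw * r_yw) / math.sqrt((1 - r_xw ** 2) * (1 - r_yw ** 2))

# x: effect on the intercept, y: effect on the log-slope,
# w: baseline intercept (posterior means of the correlations in Table 2)
print(round(partial_corr(0.73, -0.75, -0.62), 2))
```

The same function can be applied to the time-intensity correlations discussed below; applied draw by draw to posterior samples of the correlation matrix, it yields a full posterior distribution for each partial correlation.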
The effects of the residual log-response time on the intercept (a) and on the slope (b) of the ICC on the y-axis against the baseline intercept of the ICC on the x-axis.

Item time intensity is negatively correlated with the item baseline intercept ($-.44$), indicating that more difficult items require more time. This negative correlation between item baseline intercept and time intensity is in line with the expectations of van der Linden (2007). Furthermore, time intensity is positively correlated with the effects of residual time on the item intercept (.52) and on the item slope (.44). The first of these two correlations means that in our example spending relatively more time on time-intensive items increases the probability of a correct response, whereas it decreases the probability of a correct response on items that do not require much time. The second correlation implies that in this test time-intensive items are more informative if answered relatively slowly, whereas items with low time intensity discriminate better if they are answered fast relative to what is expected for the combination of the person and the item. However, if we condition on the baseline intercept, these correlations decrease to .33 [.06, .57] and .19 [$-.10$, .48], respectively; that is, they can be largely explained by the negative correlations of $\beta_{0}$ with both $\xi$ and the effects of $z_{pi}$ on the intercept and the slope.
5.6. Sensitivity Analysis: Robustness to Outliers
For the original analysis, no response-time outliers were removed. However, it is important to check whether the presence of response-time outliers affects the estimates of the model parameters. To do so, we fitted the full model with $z_{pi}$ as a covariate to the dataset without the responses that were considered possible outliers. Responses whose item-wise z-scores of the log-response times fell below the 0.1th percentile or above the 99.9th percentile of the standard normal distribution (i.e., $|z| > 3.09$) were identified as outliers, resulting in the removal of 514 of the total of 49,000 responses.
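A minimal sketch of this outlier rule follows. It assumes that "item-wise z-scores" means standardising each item's log-response times by that item's sample mean and standard deviation; the text does not spell this out, and the function name `flag_outliers` and the simulated data are hypothetical. The 0.1th and 99.9th percentiles of the standard normal distribution (about $\pm 3.09$) are obtained from Python's `statistics.NormalDist`.

```python
import numpy as np
from statistics import NormalDist

def flag_outliers(log_rt, lower_q=0.001, upper_q=0.999):
    """Flag response-time outliers using item-wise z-scores.

    log_rt: array of shape (persons, items) with log-response times.
    Assumption: each item's column is standardised with its own sample
    mean and SD; responses outside the given standard-normal quantiles
    are flagged.
    """
    log_rt = np.asarray(log_rt, dtype=float)
    z = (log_rt - log_rt.mean(axis=0)) / log_rt.std(axis=0, ddof=1)
    lo = NormalDist().inv_cdf(lower_q)  # about -3.09
    hi = NormalDist().inv_cdf(upper_q)  # about  3.09
    return (z < lo) | (z > hi)

rng = np.random.default_rng(0)
log_rt = rng.normal(loc=3.0, scale=0.5, size=(1000, 49))  # simulated, not the real data
log_rt[0, 0] = 9.0  # one artificially extreme log-response time
mask = flag_outliers(log_rt)
print(mask.sum(), mask[0, 0])
```

Under exact normality about 0.2% of responses (roughly 98 of 49,000) would be flagged, so the 514 removed responses suggest somewhat heavier tails than the log-normal model implies.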
Removing the outliers decreased the standard deviation of speed from 0.33 [0.31, 0.34] in the original dataset to 0.28 [0.27, 0.29] in the dataset without the outliers, and weakened the correlation between speed and ability from $-.09$ [$-.16$, $-.02$] to $-.02$ [$-.09$, .05]. These changes have little influence on the overall conclusions, since $\sigma_{\tau}$ is not the primary parameter of interest, and $\rho_{\theta\tau}$ was already too close to zero to assign any substantive relevance to it.
With respect to the estimates of the item hyper-parameters, removing outliers mostly affected the estimates related to $\ln(\sigma^2_i)$, as could be expected, since removing extreme values from the sample decreases the estimated variance. The mean and variance of $\ln(\sigma^2_i)$ decreased, and its correlations with the other item parameters became weaker. The 95% credible intervals for the correlations between $\ln(\sigma^2_i)$ and three parameters ($\ln(\alpha_{1i})$, $\beta_{0i}$, $\beta_{1i}$) included zero after the removal of the outliers. For this reason, we do not give these correlations any substantive interpretation. Table 3 summarises the differences between the estimates of the item hyper-parameters after and before the removal of the outliers.
Difference between the estimates of the hyper-parameters of the items after the removal of the outliers compared to the original estimates.

6. Simulation Study
To assess parameter recovery of the model, a simulation study based on the empirical example was performed. In this applied paper we do not aim to show the performance of the model for various combinations of item hyper-parameters, person hyper-parameters, test lengths, and sample sizes, but rather for the specific combination of these factors in the empirical data at hand. Therefore, in the simulation study we used the estimates of the item and person hyper-parameters to simulate replicated datasets of the same sample size (1000 persons) and the same number of items (49). To investigate how parameter recovery is affected by a decrease in sample size and number of items, and to evaluate the applicability of the model in a wider range of conditions, three more conditions were considered: $N=500, n=49$; $N=1000, n=25$; and $N=500, n=25$. For each condition, 100 datasets were simulated under the full model with $z_{pi}$ as a covariate for the parameters of the ICC. In each replication the model was fitted using the Gibbs sampler with one chain of 10,000 iterations (including 5000 iterations of burn-in).
Table 4 shows the simulation results: the average expected a posteriori (EAP) estimates of the hyper-parameters and the number of replications (out of 100) in which the true value was within the 95% credible interval. First, consider the results obtained with the same sample size and number of items as in the empirical example. The mean vector of the item parameters, the standard deviation of speed, and the correlation between speed and ability were correctly recovered. The correlations between the item parameters are estimated to be closer to zero than the true values, and the variances of the item parameters are slightly overestimated. However, this bias is relatively small and does not influence the substantive interpretation of the relations between the item parameters.
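The two summaries reported in Table 4 can be computed per hyper-parameter as sketched below, assuming equal-tailed intervals bounded by the 2.5% and 97.5% posterior quantiles (an assumption; the interval type is not stated). The function names and numbers are hypothetical. With 100 replications and well-calibrated intervals, the coverage count is binomial with mean 95 and a standard deviation of about 2.2, so counts slightly below 95 are not by themselves evidence of miscalibration.

```python
import numpy as np

def coverage_count(true_value, lower, upper):
    """Number of replications whose credible interval contains the true value.

    lower, upper: interval bounds for one hyper-parameter,
    one entry per simulated dataset.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return int(np.sum((lower <= true_value) & (true_value <= upper)))

def mean_eap(eaps):
    """Average of the EAP estimates across replications."""
    return float(np.mean(eaps))

# Hypothetical interval bounds from four replications
lower = [0.10, 0.25, 0.05, 0.12]
upper = [0.30, 0.40, 0.18, 0.35]
print(coverage_count(0.2, lower, upper))
```

Comparing `mean_eap` with the true value shows bias, while the coverage count shows whether the intervals are wide enough given that bias.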
Results of the simulation study: the expected a posteriori (EAP) estimates of the hyper-parameters averaged across 100 replications and the number of replications in which the true value was within the 95% credible interval.

When the sample size was reduced (500 instead of 1000), the results were not seriously affected. However, when the number of items was reduced (25 instead of 49), the bias of the variances of the item parameters and of the correlations between them increased. This is likely because these hyper-parameters are estimated from a relatively small sample of items. In the condition with $N=1000$ and $n=25$, the number of 95% credible intervals that contained the true value was smaller than in the condition with the smaller sample size ($N=500$ and $n=25$). This can be explained by the fact that the posterior variance is smaller when the sample size is larger, so that the narrower intervals more often miss a true value from which the estimates are biased. Overall, these results indicate that for accurate recovery of the item hyper-parameters the test should not be too short.
7. Discussion
In this paper we provided empirical evidence that a higher-level dependence between the persons’ speed and ability cannot always fully explain the dependence between response time and accuracy. For cases in which conditional independence is violated, we proposed an approach to modelling the conditional dependence by introducing an effect of the residual response time on the intercept and on the slope of the ICC. As the empirical example shows, deviations of the response times from their expected values (determined by a person’s speed and an item’s time intensity) do structurally influence response accuracy. Response times may contain measurement error, but the residual response times still carry enough information to meaningfully improve the model for response accuracy and to detect substantively interesting patterns in these relationships.
The conclusions drawn from the fitted model in the empirical example are interesting from a substantive point of view. The negative correlation between the baseline item intercept and the effect of the residual response time on the intercept is consistent with the results of Goldhammer et al. (2014). They also found that slow responses increase the probability of a correct response for difficult items and decrease it for easy items. It is important to note that most of the effects of residual log-response time on the item intercept are negative. The speed-accuracy trade-off predicts that taking more time increases the probability of a correct response, so it cannot explain this kind of conditional dependence.
The average negative effect of the residual response time on the item slope contradicts the findings regarding the ‘worst performance rule’ (Coyle, 2003), which predicts that slow responses contain the most information about persons’ ability. One possible explanation is that the ‘worst performance rule’ applies to difficult items but not to easy items (see Fig. 7), which are perhaps better answered using fast, automated strategies. Another possible explanation concerns the decrease in item discrimination when a person takes more time on an item than expected. When responses are fast, persons are more likely to use the same strategy, whereas the more time persons take, the more diverse the strategies they may use, which weakens the relationship between the measured ability and the probability of a correct response. However, care should be taken with any of these interpretations, since a long response time might also simply result from the respondent not having spent all of the recorded time on solving the item. More generally, the model should be seen as describing different forms of normal processing. It may therefore not be appropriate for dealing with aberrant responses, for example when the response time goes to infinity.
Modelling conditional dependence between response time and accuracy reveals more about the relationship between time and accuracy than a single overall correlation between ability and speed can. In the presented example, we detected interesting patterns of positive and negative relationships between time and accuracy, while the overall correlation between ability and speed was close to zero. It would be interesting to investigate whether similar conditional dependence phenomena are observed in other datasets in which the correlation between the two latent traits is strongly negative or strongly positive.
Appendix
Here, we describe a Gibbs sampler for sampling from the joint posterior distribution of the model parameters, which is proportional to the product of the prior distribution and the density of the data:
Before the algorithm can be started, initial values have to be specified. Identity matrices are used for ${\varvec{\Sigma }}_{\mathcal {P}}$ and ${\varvec{\Sigma }}_{\mathcal {I}}$, and a zero vector is used for ${\varvec{\mu }}_{\mathcal {I}}$. The initial values of ${\varvec{\xi }}$, ${\varvec{\sigma }}^2$ and ${\varvec{\tau }}$ should be close to where the posterior density is concentrated, because these parameters determine the values of the residuals of the log response times. First, random values are chosen for these parameters: $\xi _{i0}\sim {\mathcal {N}}(0,1)$ and $\sigma ^2_{i0}\sim \ln {\mathcal {N}}(0,1)$, $\forall i\in [1:n]$, and $\tau _{p0}\sim {\mathcal {N}}(0,1)$, $\forall p\in [1:N]$. Second, for 20 iterations, values are drawn from the conditional posterior distributions of each of these parameters given the response time data only and the improper prior $p(\varvec{\xi },\varvec{\sigma },{\varvec{\tau }})\propto \prod _i\frac{1}{\sigma ^2_i}$.
Random initial values are chosen for the parameters in the response accuracy model: $\alpha _{0i0}\sim \ln {\mathcal {N}}(0,0.2)$, $\alpha _{1i0}\sim \ln {\mathcal {N}}(0,0.2)$, $\beta _{0i0}\sim {\mathcal {N}}(0,0.5)$ and $\beta _{1i0}\sim {\mathcal {N}}(0,0.5)$, $\forall i\in [1:n]$, and $\theta _p\sim \mathcal {N}(0,1)$, $\forall p\in [1:N]$.
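The initialisation can be sketched in Python as follows. The sketch assumes the lognormal response time model $\ln T_{pi}\sim\mathcal{N}(\xi_i-\tau_p,\sigma^2_i)$. Under the improper prior, the three conditional posteriors used in the 20 warm-up iterations are then normal, normal and Inverse-Gamma. The function names are hypothetical. The second argument of $\ln\mathcal{N}(0,0.2)$ and $\mathcal{N}(0,0.5)$ is read as a variance, which is also an assumption.

```python
import numpy as np


def init_rt_parameters(log_t, n_iter=20, seed=0):
    """Initial xi (time intensity), sigma2 (residual variance) and tau (speed).

    Assumes ln T[p, i] ~ N(xi[i] - tau[p], sigma2[i]); log_t has shape
    (N persons, n items). Starts from random draws, then runs n_iter Gibbs
    sweeps on the response times alone under the prior prod_i 1 / sigma2[i].
    """
    rng = np.random.default_rng(seed)
    n_persons, n_items = log_t.shape
    xi = rng.normal(0.0, 1.0, size=n_items)
    sigma2 = rng.lognormal(0.0, 1.0, size=n_items)
    tau = rng.normal(0.0, 1.0, size=n_persons)

    for _ in range(n_iter):
        # tau[p]: flat prior; each item gives xi[i] - ln t[p, i] as a normal
        # observation of tau[p] with variance sigma2[i].
        prec = np.sum(1.0 / sigma2)
        mean = ((xi - log_t) / sigma2).sum(axis=1) / prec
        tau = rng.normal(mean, np.sqrt(1.0 / prec))

        # xi[i]: flat prior; observations ln t[p, i] + tau[p], variance sigma2[i].
        mean = (log_t + tau[:, None]).mean(axis=0)
        xi = rng.normal(mean, np.sqrt(sigma2 / n_persons))

        # sigma2[i]: prior 1/sigma2 gives Inverse-Gamma(N/2, SS/2).
        resid = log_t - (xi[None, :] - tau[:, None])
        ss = np.sum(resid ** 2, axis=0)
        sigma2 = 1.0 / rng.gamma(shape=n_persons / 2.0, scale=2.0 / ss)

    return xi, sigma2, tau


def init_accuracy_parameters(n_items, n_persons, seed=1):
    """Random starting values for the response accuracy model
    (second arguments of lnN and N read as variances: an assumption)."""
    rng = np.random.default_rng(seed)
    alpha0 = rng.lognormal(0.0, np.sqrt(0.2), size=n_items)
    alpha1 = rng.lognormal(0.0, np.sqrt(0.2), size=n_items)
    beta0 = rng.normal(0.0, np.sqrt(0.5), size=n_items)
    beta1 = rng.normal(0.0, np.sqrt(0.5), size=n_items)
    theta = rng.normal(0.0, 1.0, size=n_persons)
    return alpha0, alpha1, beta0, beta1, theta
```

Under the flat prior on $\xi_i$ and $\tau_p$, their common location is not identified. A shared shift does not change the residuals $\ln t_{pi}-(\xi_i-\tau_p)$, which are what the later steps depend on.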
After initialisation the algorithm goes through the steps described below, in which the parameters are sampled from their full conditional posterior distributions.
Step 1: For each person p, sample the person speed parameter $\tau _p$ from:
Sampling is done using a Metropolis–Hastings algorithm with a candidate value drawn from the proposal density:
which is proportional to the product $p(\tau _p\,|\, {\varvec{\Sigma }}_{\mathcal {P}},\theta _p)f({\mathbf {T}}_p\,|\, \tau _p,\dots )$. The acceptance ratio is equal to:
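A minimal Python sketch of one Step 1 update follows. It assumes the lognormal model $\ln T_{pi}\sim\mathcal{N}(\xi_i-\tau_p,\sigma^2_i)$ and zero-mean person parameters, with ability in the first row of ${\varvec{\Sigma}}_{\mathcal{P}}$. The candidate is drawn from a density proportional to $p(\tau_p\,|\,{\varvec{\Sigma}}_{\mathcal{P}},\theta_p)f(\mathbf{T}_p\,|\,\tau_p,\dots)$, so the acceptance ratio reduces to a ratio of response accuracy likelihoods. `loglik_x` is a hypothetical placeholder for that likelihood, through which $\tau_p$ acts via the residual log response times.

```python
import numpy as np


def sample_tau_p(tau_p, theta_p, log_t_p, xi, sigma2, sigma_p, loglik_x, rng):
    """One Metropolis-Hastings update of the speed tau_p of one person.

    Assumes ln T_pi ~ N(xi_i - tau_p, sigma2_i) and a zero-mean bivariate
    normal for (theta_p, tau_p) with covariance sigma_p (ability first).
    loglik_x(tau) is a hypothetical function returning the log-likelihood
    of this person's response accuracies given tau.
    """
    # Conditional normal prior of tau_p given theta_p.
    m0 = sigma_p[1, 0] / sigma_p[0, 0] * theta_p
    v0 = sigma_p[1, 1] - sigma_p[1, 0] ** 2 / sigma_p[0, 0]
    # Conjugate combination with the response-time likelihood.
    prec = 1.0 / v0 + np.sum(1.0 / sigma2)
    mean = (m0 / v0 + np.sum((xi - log_t_p) / sigma2)) / prec
    candidate = rng.normal(mean, np.sqrt(1.0 / prec))
    # Prior and RT likelihood are in the proposal, so only the accuracy
    # likelihood enters the acceptance ratio.
    log_ratio = loglik_x(candidate) - loglik_x(tau_p)
    if np.log(rng.uniform()) < log_ratio:
        return candidate
    return tau_p
```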
Step 2: For each item i, sample the time intensity parameter $\xi _i$ from
Sampling is done using a Metropolis–Hastings algorithm with a candidate value drawn from the proposal density
where $\mu _{\xi }^*$ and $\sigma ^{*2}_{\xi }$ are the conditional mean and the conditional variance of $\xi _i$ given the other item parameters of item i. This proposal is proportional to the product of the density of the response time data and the density of $\xi _i$ given the other item parameters of item i and the item hyper-parameters. The acceptance probability is:
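The conditional moments $\mu^*_\xi$ and $\sigma^{*2}_\xi$ are the usual conditional mean and variance of one component of a multivariate normal vector. The Python sketch below computes them and draws a candidate from their product with the response time likelihood, again assuming $\ln T_{pi}\sim\mathcal{N}(\xi_i-\tau_p,\sigma^2_i)$. The position `j` of $\xi_i$ in the item-parameter vector is a hypothetical choice. The proposal already contains the response time density and the conditional prior, so the acceptance probability only involves the response accuracy likelihood, as in Step 1.

```python
import numpy as np


def conditional_normal(mu, cov, values, j):
    """Mean and variance of component j of N(mu, cov) given the other
    components fixed at values[k], k != j (Schur complement)."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    values = np.asarray(values, dtype=float)
    others = [k for k in range(len(mu)) if k != j]
    s12 = cov[j, others]
    s22_inv = np.linalg.inv(cov[np.ix_(others, others)])
    mean = mu[j] + s12 @ s22_inv @ (values[others] - mu[others])
    var = cov[j, j] - s12 @ s22_inv @ s12
    return float(mean), float(var)


def propose_xi(item_params, j, mu_i, sigma_i, log_t_i, tau, sigma2_i, rng):
    """Candidate xi_i from (conditional prior) x (response-time likelihood).

    item_params: current parameter vector of item i, with xi_i at index j
    (hypothetical ordering); log_t_i and tau are length-N arrays.
    """
    m_star, v_star = conditional_normal(mu_i, sigma_i, item_params, j)
    # Each person contributes ln t_pi + tau_p as a normal observation of xi_i.
    prec = 1.0 / v_star + len(tau) / sigma2_i
    mean = (m_star / v_star + np.sum(log_t_i + tau) / sigma2_i) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec))
```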
Step 3: For each item i, sample the residual variance of the log response time, $\sigma ^2_i$, from
A Metropolis–Hastings algorithm is used with a proposal distribution
and an acceptance probability
where $\mu _{\ln \sigma ^2}^*$ and $\sigma ^{*2}_{\ln \sigma ^2}$ are the conditional mean and the conditional variance of $\ln \sigma ^2_i$ given the other item parameters of item i.
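The Step 3 proposal and acceptance probability are not reproduced here. The Python sketch below therefore shows one scheme consistent with the description, not necessarily the authors' exact one. It uses an Inverse-Gamma proposal from the response times alone under the prior $1/\sigma^2_i$. The acceptance ratio combines the normal conditional prior of $\ln\sigma^2_i$ (with moments $\mu^*_{\ln\sigma^2}$ and $\sigma^{*2}_{\ln\sigma^2}$) with a hypothetical accuracy log-likelihood `loglik_x`.

```python
import numpy as np


def sample_sigma2_i(sigma2_i, resid_i, mu_star, var_star, loglik_x, rng):
    """One Metropolis-Hastings update of the residual variance sigma2_i.

    Assumed scheme (a sketch):
    - proposal: Inverse-Gamma(N/2, SS/2), the posterior from the response
      times alone under the prior 1/sigma2, with
      resid_i = ln t_pi - (xi_i - tau_p) for all persons p;
    - target: RT likelihood x normal conditional prior of ln sigma2
      (mean mu_star, variance var_star) x accuracy likelihood loglik_x.
    The 1/sigma2 Jacobian of the log-normal prior cancels against the
    1/sigma2 prior built into the proposal.
    """
    resid_i = np.asarray(resid_i, dtype=float)
    ss = np.sum(resid_i ** 2)
    candidate = 1.0 / rng.gamma(shape=len(resid_i) / 2.0, scale=2.0 / ss)

    def log_weight(s2):
        # Log kernel of the normal prior on ln sigma2 plus accuracy term.
        return -(np.log(s2) - mu_star) ** 2 / (2.0 * var_star) + loglik_x(s2)

    if np.log(rng.uniform()) < log_weight(candidate) - log_weight(sigma2_i):
        return candidate
    return sigma2_i
```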
Step 4: For each person p, sample the person ability parameter $\theta _p$ from:
To sample from this distribution, the single variable exchange algorithm (Marsman et al., 2014) is used. First, sample a candidate value from
then, using this value, simulate a response vector ${\mathbf {x}}^*$:
The probability of accepting $\theta ^*$ as the new value of $\theta _p$ is:
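The single variable exchange step can be sketched in Python for a logistic response model in which person p's response to item i has slope `a[i]` and intercept `b[i]`. In the model of this paper these would depend on the residual log response times; the sketch treats them as already computed, which is an assumption. The candidate is drawn from the conditional prior of $\theta_p$ given $\tau_p$, whose mean and standard deviation are passed in. The prior densities and the normalising constants of the response model cancel, so the log acceptance ratio is $(\theta^*-\theta_p)\sum_i a_i(x_i-x^*_i)$.

```python
import numpy as np


def sve_step(theta, x, a, b, prior_mean, prior_sd, rng):
    """One single-variable-exchange update of ability theta for one person.

    Assumes P(X_i = 1 | theta) = 1 / (1 + exp(-(a[i] * theta + b[i]))) with
    person-specific slopes a and intercepts b (arrays), and a normal
    conditional prior N(prior_mean, prior_sd**2) for theta.
    """
    # 1. Candidate from the conditional prior.
    theta_star = rng.normal(prior_mean, prior_sd)
    # 2. Simulate a response vector under the candidate.
    p_star = 1.0 / (1.0 + np.exp(-(a * theta_star + b)))
    x_star = (rng.uniform(size=len(a)) < p_star).astype(float)
    # 3. Priors and normalising constants cancel; intercepts cancel too.
    log_ratio = (theta_star - theta) * np.sum(a * (x - x_star))
    if np.log(rng.uniform()) < log_ratio:
        return theta_star
    return theta
```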
Step 5: For each item i, sample the item parameters $\{\alpha _{0i},\alpha _{1i},\beta _{0i},\beta _{1i}\}$ from
A Metropolis–Hastings algorithm is used with a multivariate normal proposal density whose mean vector equals the current values of the parameters, with all variances equal to 0.01 and all correlations equal to 0.
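A Python sketch of the Step 5 random-walk update follows, using the proposal variances of 0.01 and zero correlations given in the text. `log_target` is a hypothetical user-supplied function: the log full conditional of the four item parameters, up to a constant. The sketch assumes it returns `-np.inf` outside the parameter space (for example for non-positive $\alpha$ values), so that such candidates are always rejected.

```python
import numpy as np


def rw_mh_step(current, log_target, rng, step_var=0.01):
    """Random-walk Metropolis-Hastings update for a block of parameters.

    Proposal: multivariate normal centred at the current values with all
    variances step_var and all correlations 0. It is symmetric, so the
    proposal densities cancel in the acceptance ratio.
    Returns (new_values, accepted).
    """
    current = np.asarray(current, dtype=float)
    candidate = current + rng.normal(0.0, np.sqrt(step_var), size=current.shape)
    log_ratio = log_target(candidate) - log_target(current)
    if np.log(rng.uniform()) < log_ratio:
        return candidate, True
    return current, False
```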
Step 6: Sample the covariance matrix of the person parameters from
This is the conditional posterior of the covariance matrix of a multivariate normal distribution. Given the Inverse-Wishart prior, it is known to be an Inverse-Wishart distribution (see, for example, Hoff, 2009):
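The conjugate update can be sketched in Python as follows. The sketch assumes zero-mean person parameters and an Inverse-Wishart prior with placeholder degrees of freedom `nu0` and scale `s0`, since the prior values are not restated here. The Inverse-Wishart draw is built from Wishart outer products so that only NumPy is needed. This requires integer degrees of freedom at least equal to the dimension.

```python
import numpy as np


def sample_inv_wishart(df, scale, rng):
    """Draw from Inverse-Wishart(df, scale) for integer df >= dimension:
    the sum of df outer products of N(0, scale^{-1}) vectors is
    Wishart(df, scale^{-1}), and its inverse is Inverse-Wishart(df, scale)."""
    d = scale.shape[0]
    z = rng.multivariate_normal(np.zeros(d), np.linalg.inv(scale), size=df)
    return np.linalg.inv(z.T @ z)


def sample_sigma_p(person_params, nu0, s0, rng):
    """Conditional posterior draw of the person covariance matrix.

    person_params: (N, 2) array with columns (theta_p, tau_p), assumed to
    have mean zero. With an Inverse-Wishart(nu0, s0) prior (placeholders),
    the conditional posterior is Inverse-Wishart(nu0 + N, s0 + Y^T Y).
    """
    y = np.asarray(person_params, dtype=float)
    return sample_inv_wishart(nu0 + y.shape[0], s0 + y.T @ y, rng)
```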
Step 7: Sample the mean vector of the item parameters from
With a multivariate normal prior for ${\varvec{\mu }}_{\mathcal {I}}$, this conditional posterior is also multivariate normal, with a mean vector equal to
and a covariance matrix equal to $\left( (100\mathbf {I}_6)^{-1}+n{\varvec{\Sigma }}_{\mathcal {I}}^{-1}\right) ^{-1}$.
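A Python sketch of the Step 7 draw follows. The prior covariance $100\mathbf{I}_6$ matches the posterior covariance given in the text. The zero prior mean is an assumption. Under it, the posterior mean is the posterior covariance times ${\varvec{\Sigma}}_{\mathcal{I}}^{-1}$ applied to the sum of the n item-parameter vectors.

```python
import numpy as np


def sample_mu_i(item_params, sigma_i, rng, prior_var=100.0):
    """Conditional posterior draw of the item mean vector mu_I.

    item_params: (n, 6) array, one row of (transformed) parameters per item.
    Assumes a N(0, prior_var * I) prior; the zero prior mean is an assumption.
    Returns (draw, posterior mean, posterior covariance).
    """
    x = np.asarray(item_params, dtype=float)
    n, d = x.shape
    sigma_inv = np.linalg.inv(sigma_i)
    # Posterior precision = prior precision + n * Sigma_I^{-1}.
    post_cov = np.linalg.inv(np.eye(d) / prior_var + n * sigma_inv)
    post_mean = post_cov @ (sigma_inv @ x.sum(axis=0))
    return rng.multivariate_normal(post_mean, post_cov), post_mean, post_cov
```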
Step 8: Sample the covariance matrix of the item parameters from:
A Metropolis–Hastings algorithm is used to sample from this conditional posterior, with the candidate value ${\varvec{\Sigma }}^*$ sampled from the Inverse-Wishart distribution with n degrees of freedom and a scale matrix equal to
such that the acceptance probability is equal to:
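A short calculation shows why this Metropolis–Hastings step works. An Inverse-Wishart proposal with n degrees of freedom and scale matrix $\mathbf{S}$ equals the multivariate normal likelihood of the n item-parameter vectors times $|{\varvec{\Sigma}}|^{-(d+1)/2}$. The likelihood therefore cancels, and the acceptance ratio only involves the prior of ${\varvec{\Sigma}}_{\mathcal{I}}$ and that determinant factor. The Python sketch below assumes that $\mathbf{S}$ is the scatter matrix of the item parameters around ${\varvec{\mu}}_{\mathcal{I}}$ and takes the prior as a hypothetical `log_prior` function; the text above confirms neither choice.

```python
import numpy as np


def sample_inv_wishart(df, scale, rng):
    """Inverse-Wishart(df, scale) draw for integer df >= dimension."""
    d = scale.shape[0]
    z = rng.multivariate_normal(np.zeros(d), np.linalg.inv(scale), size=df)
    return np.linalg.inv(z.T @ z)


def sample_sigma_i(sigma_i, item_params, mu_i, log_prior, rng):
    """One independence Metropolis-Hastings update of the item covariance.

    Proposal: Inverse-Wishart(n, S) with S the scatter matrix around mu_i
    (an assumption). Target: prior x multivariate normal likelihood. The
    ratio target / proposal is prior(Sigma) * |Sigma|^{(d+1)/2}.
    """
    dev = np.asarray(item_params, dtype=float) - mu_i
    n, d = dev.shape
    candidate = sample_inv_wishart(n, dev.T @ dev, rng)

    def log_weight(s):
        _, logdet = np.linalg.slogdet(s)
        return log_prior(s) + 0.5 * (d + 1) * logdet

    if np.log(rng.uniform()) < log_weight(candidate) - log_weight(sigma_i):
        return candidate
    return sigma_i
```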
Step 9: Re-scale model parameters to equate the variance of ability to 1:
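The re-scaling formulas are not restated here, so the following Python sketch rests on an assumption. It supposes that ability enters the response model only through the product of the baseline slope $\alpha_{0i}$ and $\theta_p$. In that case, dividing every $\theta_p$ by the current ability standard deviation $s$ and multiplying every $\alpha_{0i}$ by $s$ leaves the likelihood unchanged. The covariance of ability and speed is divided by $s$, and the variance of speed is unchanged.

```python
import numpy as np


def rescale_ability(theta, alpha0, sigma_p):
    """Rescale so that the ability variance equals 1.

    Assumes ability enters the response model only via alpha0[i] * theta[p]
    (times factors not involving theta). sigma_p has ability in row/column 0
    and speed in row/column 1.
    """
    s = np.sqrt(sigma_p[0, 0])
    d = np.diag([1.0 / s, 1.0])
    # theta / s and alpha0 * s keep alpha0 * theta fixed; D Sigma D sets Var(theta) = 1.
    return theta / s, alpha0 * s, d @ sigma_p @ d
```

Under the same assumption, if $\ln\alpha_{0i}$ is one of the item parameters with a hierarchical distribution, its mean in ${\varvec{\mu}}_{\mathcal{I}}$ would shift by $\ln s$, while ${\varvec{\Sigma}}_{\mathcal{I}}$ would be unchanged.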



















