
Measuring vacillations in reasoning

Published online by Cambridge University Press:  16 May 2024

Revati Vijay Shivnekar*
Affiliation:
Department of Cognitive Science, Indian Institute of Technology Kanpur, Kanpur, India
Nisheeth Srivastava
Affiliation:
Department of Cognitive Science, Indian Institute of Technology Kanpur, Kanpur, India
*
Corresponding author: Revati Vijay Shivnekar; Email: revatis@iitk.ac.in

Abstract

Our experience of reasoning is replete with conflict. People phenomenologically vacillate between options when confronted with challenging decisions. Existing experimental designs typically measure only a summary of the conflict experienced throughout the choice process for an individual choice, or aggregate it across multiple observers of a choice. We propose a new method for measuring vacillations in reasoning during the time-course of individual choices, utilizing them as a fine-grained indicator of cognitive conflict. Our experimental paradigm allows participants to report the alternative they were considering while deliberating. Through 3 experiments, we demonstrate that our measure correlates with existing summary judgments of conflict and confidence in moral and logical reasoning problems. The pattern of deliberation revealed by these vacillations produces new constraints for theoretical models of moral and syllogistic reasoning.

Type
Empirical Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Society for Judgment and Decision Making and European Association for Decision Making

1 Introduction

Conflict is a constant in our daily decisions, whether the choices are trivial, such as deciding between tea or coffee, or nontrivial, such as determining the best course of treatment for an ailing parent. When we deliberate over such choices, we consciously consider multiple lines of argument and counterfactuals before making a decision. Phenomenologically, these thoughts appear to us sequentially, with different arguments leading us to tentatively prefer different options one after another. We fluctuate mentally, vacillating back and forth between options as different considerations reveal themselves during reasoning. In this article, we argue that such shifts or vacillations in our thoughts can be explicitly measured in real-time and thereby used to quantify the degree of conflict experienced during reasoning.

We investigate how observed vacillations during the decision-making process align with broader measures of conflict currently used in moral and logical reasoning research. To accomplish this purpose, we introduce a new experimental method that tracks mental vacillations by allowing reasoners to express their interim preferences during deliberation. During experimental validation of this method, participants were presented with a series of problems with 2 alternatives identified with the left and right arrow keys. They reported which of the two was the leading choice over the time-course of their reasoning process, thus revealing vacillations as switches in key presses. The sequence in which keys were pressed also allowed us to discern how many times people had changed their mind while reasoning. Figure 1 illustrates a representative deliberation phase in a trial. The green and red circles represent keys pressed by the participant during deliberation. In this example, the participant reported their preference 7 times and switched 4 times. Participants could conclude their deliberation at any point after 1 min had passed and record their final decision on the next page.

Figure 1 From Shivnekar and Srivastava (Reference Shivnekar and Srivastava2023). The figure depicts representative key-presses during the deliberation phase of a trial. After reading the problem, participants pressed these keys whenever they wished to record an interim preference. Green and red symbols are the LEFT and RIGHT key presses, respectively. The blue triangle indicates the participant ending deliberation to record the final judgment, which they could do only after 1 min had elapsed.

In statistical analyses, we used these switches or vacillations in deliberation as an indicator of conflict experienced by the reasoner. We sought to validate this new measure of cognitive conflict vis-a-vis existing measures in this experimental paradigm.

2 Existing measures of mental conflict

Reasoning research has a long history in psychology and cognitive science alike. Despite this, measuring conflict within the realm of reasoning poses significant challenges, as noted in previous studies such as Tversky and Shafir (Reference Tversky and Shafir1992). Measures of conflict usually provide 1 summary measure per trial. We call such measures trial-level measures as these can differentiate whether one trial showed more conflict than another, but not how conflict unfolds within a trial. Reaction times (De Neys and Glumicic, Reference De Neys and Glumicic2008; Greene, Reference Greene2008; Greene et al., Reference Greene, Sommerville, Nystrom, Darley and Cohen2001, Reference Greene, Morelli, Lowenberg, Nystrom and Cohen2008; Trippas et al., Reference Trippas, Thompson and Handley2017), normative expectations of behavior within the given choice framework (Evans and Curtis-Holmes, Reference Evans and Curtis-Holmes2005; Greene and Haidt, Reference Greene and Haidt2002), and subjective ratings (Frey et al., Reference Frey, Johnson and De Neys2018; Mevel et al., 2015; Pennycook et al., Reference Pennycook, Fugelsang and Koehler2015a) serve as examples of trial-level conflict measures. These measures are commonly employed in both moral and logical reasoning experiments and have played a pivotal role in advancing theoretical considerations.

Conflict evolves in tandem with our thoughts that reveal contrasting arguments and choices to us. Given its dynamic nature, a more effective measurement tool for conflict needs to operate in real-time, tracking changes during deliberation while also being minimally intrusive to avoid substantial interference with the process being studied (Schulte-Mecklenbeck et al., Reference Schulte-Mecklenbeck, Johnson, Böckenholt, Goldstein, Russo, Sullivan and Willemsen2017). Researchers in moral and logical reasoning fields have used mouse-tracking, eye-tracking, think-aloud paradigms, and so forth, to map the processes underlying the decisions (Bacon et al., Reference Bacon, Handley and Newstead2003; Gürçay and Baron, Reference Gürçay and Baron2017; Purcell et al., Reference Purcell, Howarth, Wastell, Roberts and Sweller2022; Skulmowski et al., Reference Skulmowski, Bunge, Kaspar and Pipa2014; Swann Jr et al., Reference Swann, Gómez, Buhrmester, López-Rodríguez, Jiménez and Vázquez2014). In mouse-tracking studies designed to measure conflict, a typical trial starts with the problem presented centrally on the screen and 2 alternatives displayed in the upper left and right corners. Participants then navigate the cursor with the mouse from the starting position at the center of the bottom edge of the screen to the corner corresponding to their choice. The underlying assumption of mouse-tracking methods is that motor movements in a given period reflect the cognitive processes occurring during that period (Kieslich et al., Reference Kieslich, Henninger, Wulff, Haslbeck, Schulte-Mecklenbeck, Schulte-Mecklenbeck, Kuhberger and Johnson2019; Spivey et al., Reference Spivey, Grosjean and Knoblich2005). Consequently, when recording a choice, more curvature in the mouse trajectory between the start point and the choice corner suggests that the non-chosen alternative had a relatively greater influence on the decision-making process than when the trajectory is straighter. Therefore, challenging choices in which both alternatives are attractive should manifest as trajectories with greater curvature. Mouse movements are also utilized to elucidate the temporal dynamics of a choice. Researchers have also examined particular mouse movements, such as the cursor pointer crossing to the side of the non-chosen alternative before switching to the chosen alternative’s side, to evaluate whether a specific switching pattern is more predominant than others (Gürçay and Baron, Reference Gürçay and Baron2017; Koop, Reference Koop2013).

However, mouse-tracking measures, as described above, lack clarity regarding which part of the process is captured or whether the entirety of it is reflected in the response dynamics. These measures omit a substantial portion of the process under examination by restricting the analysis of the choice, or of the underlying process, to how a response is produced. But when a reasoner works through a difficult problem, the reasoning itself may contain signs of conflict. Eye-tracking methods, on the other hand, are not restricted to response dynamics alone. They operate on the assumption that the decision process is dynamic and that gaze reveals which information is currently being favored (Ghaffari and Fiedler, Reference Ghaffari and Fiedler2018; Pärnamets et al., Reference Pärnamets, Johansson, Hall, Balkenius, Spivey and Richardson2015). Based on this assumption, eye-tracking paradigms have used gaze locations to decipher the moment-to-moment updating of preferences as decisions are being constructed. However, the validity of this assumption across individuals and decision contexts remains unclear. For example, a participant might focus on an alternative not to support it, but to gather evidence against it. Unlike the conscious and reportable vacillation we argue for in this article, shifts in gaze between alternatives therefore support only indirect inference about conflict.

Think-aloud protocols offer a closer approximation to tracking the reasons and arguments produced by a reasoner during task deliberation. In these paradigms, participants are prompted to report their thoughts or explain their actions concurrently while completing a task. While thinking aloud during reasoning can provide finer temporal resolution than mouse-tracking, the method has the potential to interfere with the reasoning itself. Indeed, when participants are asked to explain their actions while performing a task, it can alter their performance on the task (Gagne and Smith Jr, Reference Gagne and Smith1962). Merely verbalizing thoughts during the task, by contrast, does not seem to improve or worsen performance, though it may produce slower responses, presumably due to the additional processing time required for verbalization (Fox et al., Reference Fox, Ericsson and Best2011). Furthermore, verbal data analyses, such as componential analysis, demand meticulous attention to identifying the purpose and units of analysis, as well as establishing coding systems in advance (Van Someren et al., Reference Van Someren, Barnard and Sandberg1994).

2.1 A key-press paradigm for measuring mental vacillations

Measuring mental vacillations within choice trials is crucial for a realistic assessment of cognitive conflict and for differentiating plausible theories of the reasoning process. As we note above, existing measures of mental conflict are either insufficiently granular or overly intrusive to adequately measure such vacillations. We introduce a novel method that captures participants’ instantaneous preferences during reasoning unobtrusively.

In the experiments described below, participants were instructed to report the direction their thoughts were leaning while deliberating on a problem with 2 possible choice alternatives. They were encouraged to express their preferences whenever they felt them building and as frequently as desired. The trial structure was explained to participants using a relatable example of choosing between a favorite and highly rated dish in a restaurant. In a trial, the decision process was divided into reasoning and committing to a final decision. The problem was presented at the center of the screen, with each of the 2 alternatives identified by either the right or left arrow keys. While reasoning, participants pressed the arrow keys whenever they felt they were strongly considering the corresponding choice. This allowed for multiple presses of the same key and the freedom to press them in any order. Participants were explicitly informed that the key presses they made during reasoning did not necessarily have to align with their final decision. This approach was implemented to mitigate the potential bias associated with feeling compelled to reason in line with the normative choice.

The primary dependent variable in our paradigm was the number of switches in preference. When a participant pressed the right key after pressing the left key, or vice versa, we inferred that the participant had changed their mind during that period. We hypothesized that participants would exhibit more switches while deliberating over conflicting problems compared to those with a straightforward choice (see Section 3.1.1 and Figure 2 for a detailed description of the paradigm).
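To make the dependent variable concrete, here is a minimal sketch of how switches can be counted from a trial’s key-press log. The function and data layout are our own illustration, not the analysis code used in the experiments; the example sequence reproduces the representative trial in Figure 1, with 7 interim reports and 4 switches.

```python
# Hypothetical sketch: counting preference switches in one trial's key-press
# log. 'L' and 'R' stand for the LEFT and RIGHT arrow keys.

def count_switches(presses):
    """Return the number of changes of mind in an ordered list of
    interim key presses, i.e., adjacent pairs with different keys."""
    return sum(a != b for a, b in zip(presses, presses[1:]))

# The representative trial in Figure 1: 7 interim reports, 4 switches.
assert count_switches(['L', 'L', 'R', 'L', 'L', 'R', 'L']) == 4
```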

Figure 2 Trial structure in all experiments involves a new problem displayed centrally on the screen during the deliberation phase. Here, a moral dilemma is displayed. In Experiment 3, participants saw the syllogism’s 2 premises and the conclusion on separate lines. Participants record their final decisions in the decision phase on a separate screen. Every trial concludes with rating the reasoning experience on subjective measures.

Our objective in employing this paradigm was to establish both the internal and external validity of our vacillation measurement as a gauge of cognitive conflict. To establish internal validity, we aimed to demonstrate that people vacillate more when they subjectively feel conflicted during a choice. For external validity, we wanted to see how vacillations map onto previously proposed measurements of conflict in the literature. Experiments 1 and 2 tested 2 operationalizations of conflict in moral reasoning proposed by Koenigs et al. (Reference Koenigs, Young, Adolphs, Tranel, Cushman, Hauser and Damasio2007) and Bago and De Neys (Reference Bago and De Neys2019), respectively. Next, we wanted to test the generalizability of the paradigm when the rules of reasoning are familiar. We used categorical syllogisms in Experiment 3 in place of moral dilemmas, while keeping other details unchanged. Our results demonstrate that directly measuring vacillations in reasoning can help differentiate theories of both moral and logical decision-making.

3 Moral reasoning

Moral dilemmas have been widely employed to study moral reasoning. These dilemmas often pit deontological and utilitarian principles against each other, such that choosing one option rules out the reasoner endorsing the principle behind the option not chosen. We tested whether 2 specific operationalizations of moral conflict from the literature align with our indicator of conflict, that is, vacillations. We expected to see more vacillations in trials which are proposed to be conflicting. Experiment 1 explored whether conflicting dilemmas, identified as those that result in disagreement or dissimilarity in final judgments at the group level, also result in vacillations within the individual. However, such a conceptualization may not reflect the internal conflict within the process of making a choice for an individual. Experiment 2 investigated whether individuals vacillate more when the 2 ethical principles cue separate choices rather than the same one. Bago and De Neys (Reference Bago and De Neys2019) propose that a conflict arises in resolving a moral dilemma when deontological and utilitarian principles prompt different choices, whereas conflict is minimized when these principles cue the same choice.

We were also interested in investigating the order in which deontological and utilitarian responses (henceforth D and U, respectively) are preferred in moral dilemmas. Dual-process theory (DPT) models often hypothesize a specific order in which people consider ethical principles, because these principles are held to be supported by a fast, emotional system and a slow, deliberative system (see, for a review, Evans and Stanovich, Reference Evans and Stanovich2013). In the default-interventionist or corrective model of DPT applied to moral decision-making, wherein the systems engage sequentially, the emotional system is proposed to activate first. This model attributes D responses to the emotional system, while U judgments are attributed to the deliberative system. Hence, the temporal order prediction is that reasoners first consider the D choice before the deliberative System 2 overrides it and emits U as the final choice. However, such a pattern in responding has reportedly been observed only in some select dilemmas (Cushman et al., Reference Cushman, Young and Hauser2006; Greene et al., Reference Greene, Cushman, Stewart, Lowenberg, Nystrom and Cohen2009; Moore et al., Reference Moore, Clark and Kane2008). Thus, we hypothesize that people, if they are to be consistent with the theoretical expectations of the corrective model of DPT in our experimental paradigm, would primarily switch from D to U options, but negligibly from U to D options during choices. The hybrid model of DPT of moral decision-making entails that the emotional system simultaneously generates both D and U responses, albeit with different activation strengths. In other words, reasoners can have D and U inclinations from the beginning, and which gets selected as the initial response depends on which of the two has the strongest activation. Whether this response gets updated further depends on the relative difference in the strengths of these activations. Prior research has garnered more support for the hybrid over the corrective model (Bago and De Neys, Reference Bago and De Neys2019; Baron and Gürçay, Reference Baron and Gürçay2017; Gürçay and Baron, Reference Gürçay and Baron2017; Koop, Reference Koop2013). However, even in a hybrid DPT account, it is unclear what mechanism might account for people vacillating back and forth between D and U options.

3.1 Experiment 1

In this experiment, we assess our key-press paradigm for gauging vacillations, evaluating its concordance both with conflict defined as cohort-level disagreement and with the subjective sense of conflict. The stimuli for this experiment are taken from Koenigs et al. (Reference Koenigs, Young, Adolphs, Tranel, Cushman, Hauser and Damasio2007).

3.1.1 Method

Participants

Twenty-five participants were recruited for this experiment (13 females; mean age = 25.3 years). The sample size was derived from a pilot study, where the effect size of the difference between high-conflict personal and low-conflict personal dilemmas was 0.67 (Cohen’s d), achieving a power of 0.8 with a significance level of $\alpha $ = 0.05.
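For illustration, the sketch below reconstructs this kind of a priori power calculation under the assumption of a paired-samples t-test; the paper does not state exactly which test was assumed, so this is our own reconstruction rather than the authors’ procedure.

```python
# Sketch of an a priori power analysis for a within-subject contrast,
# assuming a paired-samples t-test (our assumption; the paper does not
# specify the test used).
import math
from statsmodels.stats.power import TTestPower

n = TTestPower().solve_power(effect_size=0.67, alpha=0.05, power=0.8,
                             alternative='two-sided')
print(math.ceil(n))  # ~20 under this assumption; the study recruited 25
```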

Materials

We selected 16 problems from Koenigs et al.’s (Reference Koenigs, Young, Adolphs, Tranel, Cushman, Hauser and Damasio2007) paper, which were divided into 4 conditions: non-moral (NM), impersonal (IM), low-conflict (LC), and high-conflict (HC). All moral problems in their stimulus set had a mean emotionality rating. We ranked the moral problems on this rating and selected 4 for each condition, taking into consideration participants’ anticipated familiarity with the scenarios. NM problems were selected considering familiarity only.

In each of the presented problems, participants were tasked with making a 2-alternative forced choice between performing an action and refraining from it. In NM trials, the scenarios did not invoke any moral principles. The actions in the NM condition involved routine tasks like scheduling appointments, choosing between routes (2 problems featured this action), and deciding to purchase product A instead of B.

On moral trials, the stimuli contained the context of the problem in which the action and inaction were made clear along with their consequences. Specifically, actions in the LC and HC conditions involved saving a larger group at the expense of injuring or killing a smaller number of people. These actions were considered personal, as they directly caused harm to individuals or groups such as breaking someone’s arm, smothering a baby and so forth (for a more detailed discussion of ‘personal’ actions in this context, refer to Greene et al., Reference Greene, Sommerville, Nystrom, Darley and Cohen2001, Reference Greene, Nystrom, Engell, Darley and Cohen2004). Six out of 8 of these trials involved scenarios where death was a possible outcome.

Koenigs et al. (Reference Koenigs, Young, Adolphs, Tranel, Cushman, Hauser and Damasio2007) classified a personal problem as ‘LC’ post hoc after almost all participants in their study disagreed with endorsing the U action. In ‘HC’ cases, varying degrees of disagreement were observed, with no dilemma recording complete agreement among the participants to endorse the action. IM dilemmas did not include any problems in which the victim died directly from carrying out the action. These scenarios offered a choice between inaction and actions that aimed at benefiting the actor’s welfare, for example, stealing cash from a wallet on the ground, bribing to win a case, and so forth.

At the beginning of the experiment, participants practiced with 2 problems from the same set of stimuli (1 NM and 1 LC). All 18 dilemmas can be found at https://osf.io/3tbgw/?view_only=2f64b229232a4f6899ab59254cb6a90f or in the Supplementary Material.

Procedure

All trials were self-paced. A trial started with a deliberation phase, continued to the decision phase, and ended with a rating screen (refer to the trial structure in Figure 2). During the deliberation phase, participants read the problem, which was identical to the body of the problem described in Koenigs et al. (Reference Koenigs, Young, Adolphs, Tranel, Cushman, Hauser and Damasio2007) except for the question at the end. The text made clear what choices participants had on that trial. At the bottom of the deliberation phase screen, choices were presented under the prompt ‘What possibilities are you considering?’ Each choice was associated with either the left or right arrow key. On moral trials with these choice alternatives, the left key represented the deontological option and the right key the utilitarian option (henceforth referred to as D and U, respectively; see Supplementary Material for the list of these problems). Throughout the deliberation phase, participants were instructed to pay attention to their thoughts and indicate which choice they preferred at the moment by pressing the corresponding arrow key. This allowed participants to actively and continuously express their evolving preferences as they deliberated on the given problem. They could report their preferences multiple times (but at least once) and at any point during this phase. Trials where no key was pressed during this phase were excluded. The deliberation phase lasted a minimum of 1 min, although participants could take longer if needed.

After the deliberation phase, participants proceeded to the next screen to make their final decision using the arrow keys, which corresponded to the same options as before. Finally, participants rated their experience of reasoning on the following four 5-point scales: (a) How conflicted did you feel while answering? (b) How confident do you feel about your answer? (c) How difficult was the question to answer? (d) Do you think you will change your mind about your answer?

Upon filling in the consent form, participants were provided with task instructions, which were supplemented with examples for clarity. The initial set of instructions that participants read is as follows:

Welcome to the experiment!

This experiment will take 30 minutes to complete. The aim of this experiment is to understand how individuals arrive at their choices. Let’s take an example to understand this:

Imagine being in an unfamiliar restaurant, faced with a tempting menu offering options like farm-fresh pasta, pizza, and garlic bread with spread. As you contemplate your choice, various arguments may cross your mind, such as the comfort of pizza or the lighter option of bread and spread when not very hungry. Your task is to pay close attention to these arguments, categorize your thoughts based on the preferred option, and indicate your choice at the end of each scenario.

In this experiment, you will read a few stories and will be asked to think and make a choice at the end. When you are deciding you have to categorize your thoughts based on which choice they indicate. Hence, the task for you is to be attentive to your thoughts and indicate which option you are preferring while reasoning. Whenever you find your thoughts leaning towards one of the options, express your preference by pressing the corresponding key on the keyboard. Feel free to press these keys multiple times and in any order.

Now, let’s walk you through an example trial within the experiment.

After these preliminary instructions, we showed participants screenshots of a dummy trial with an NM problem to help navigate the task. Each phase of the experiment carried specific instructions. The experimenter read the following instructions out loud while displaying the screenshots to the participant.

All trials are self-paced. A trial is made of 4 screens. Press SPACE to continue.

This indicates start of a new trial. You can rest on this screen between trials. [First screen from the left in Figure 2]

This screen contains the context of the scenario. At the end of the scenario, you will be asked to report the possibilities you are considering. LEFT and RIGHT ARROW keys will indicate different kinds of considerations. When you catch yourself thinking about one of them, press the respective key. After about 1 minute, you can press SPACE to go to next page. You can take longer if you have not decided by then. [Second screen from the left in Figure 2]

This screen indicates you have to report your final decision by single key press of LEFT or RIGHT ARROW key. [Third screen from the left in Figure 2]

Finally, you must indicate how it felt to answer the question. There will be four scales: (a) How conflicted did you feel while answering? (b) How confident do you feel about your answer? (c) How difficult was the question to answer? and (d) Do you think you will change your mind about your answer?

Following this, participants completed 2 practice trials with NM and LC problems. Any difficulties encountered during these trials could be addressed by asking the experimenter for clarifications. The experimenter exited the room after commencing the experiment.

3.1.2 Results and discussion

The categorization of stimuli into LC and HC was done in a post hoc fashion by Koenigs et al. (Reference Koenigs, Young, Adolphs, Tranel, Cushman, Hauser and Damasio2007). Nonetheless, this division of dilemmas into LC and HC conditions based on the cohesiveness of cohort-level judgments was replicated in our experiment. LC dilemmas demonstrated fewer commitments to the U action, with no participant agreeing to take the action in 2 out of 4 of them. There was also greater variability in endorsing the action in HC dilemmas (Figure 3a). Our primary focus was connecting the cohort-level conceptualization of conflict to an internalized experience of it. Participants’ momentary preferences, which are frequently subject to modification during deliberation, were identified with shifts in key presses during the deliberation phase. If, on a trial, a participant pressed dissimilar keys one after the other, it was counted as a switch. All 4 HC dilemmas showed more frequent switching in preferences compared to LC (Figure 3b). This observation aligns with the prediction that for these stimuli, cohort-level disagreements may indicate internal conflict within the individual. To account for effects of both participant and item on switching, we ran a linear mixed effects model treating participants and items as random intercepts, revealing that LC dilemmas, indeed, recorded fewer switches than HC (Table 1a).
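A sketch of this kind of specification, with crossed random intercepts for participants and items, is given below in Python’s statsmodels. The column and file names are hypothetical stand-ins for the trial-level data; statsmodels fits crossed effects via variance components over a single dummy group, so this is an illustration of the model class rather than the authors’ analysis script.

```python
# Sketch: switches ~ condition with crossed random intercepts for
# participant and item. Column/file names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('experiment1_trials.csv')   # hypothetical trial-level data
df['all_obs'] = 1                            # one dummy group for crossed effects

model = smf.mixedlm(
    'switches ~ C(condition, Treatment(reference="LC"))',  # LC as reference
    data=df,
    groups='all_obs',
    re_formula='0',                          # no random effect for the dummy group
    vc_formula={'participant': '0 + C(participant)',
                'item': '0 + C(item)'},
)
print(model.fit().summary())
```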

Figure 3 Results of Experiment 1 across all moral items are depicted in the figure, where each bar represents an item and is color-coded by condition. Panel (a) represents the proportion of trials in which the given action (usually U; see Supplementary Material for more details) was endorsed in the final decision. Panel (b) represents the average number of switches. Panel (c) shows Spearman correlations of switches within a trial to the subjective ratings of Conflict and Confidence reported at the end.

Following that, we investigated the pattern in which D and U inclinations are considered. Bago and De Neys (Reference Bago and De Neys2019) employed the 2-step paradigm to discern the temporal order in inclinations by requiring a quick response at the beginning of the trial, followed by a reasoned response when participants had their final judgment ready. While this method allowed for the dissection of the process to a certain extent, constraining the investigation to specific time windows excludes a significant portion of the reasoning process that follows the initial inclination. To demonstrate this limitation, we created key-press pairs of the first and the last keys pressed on a trial during the deliberation phase, resulting in 4 possible pairs: DD, DU, UD, and UU. Here, the first letter in each couplet denotes the first key, and the second letter signifies the last key pressed during this period.

Notably, all 4 response change types were observed in moral dilemmas (see Table 2 for how frequently these pairs occurred in moral trials). We aimed to determine if there is a predictable order in which these inclinations come to reasoners’ minds. According to the corrective model, reasoners should be inclined toward the D alternative at the beginning of the trial and switch over to U if mental resources permit, which should make DU transitions more prevalent than UD (Greene et al., Reference Greene, Sommerville, Nystrom, Darley and Cohen2001; Paxton et al., Reference Paxton, Ungar and Greene2012). However, in none of the 3 moral conditions was the frequency of DU significantly greater than that of UD (see Table 2 for frequencies of key-press pairs; IM: $\chi ^2 (1) = 12.78, p < 0.001$ ; LC: $\chi ^2 (1) = 3.33, p = 0.07$ ; HC: $\chi ^2 (1) = 1.93, p = 0.17$ ). Furthermore, although DD and UU were the most frequently observed key-press pairs, participants had switched at least twice between the first and the last key press on 58% and 33% of these trials, respectively, landing on the same key they started with. Such oscillations in reasoning are challenging to explain under the sequentiality assumption made by some models of DPT.
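A sketch of this transition-pair analysis follows. The classification and oscillation check implement the definitions in the text; the DU/UD counts passed to the test are placeholders, not the observed frequencies in Table 2.

```python
# Hypothetical sketch: label trials by first/last interim preference and
# test DU vs UD frequencies. 'D' and 'U' denote deontological and
# utilitarian key presses.
from scipy.stats import chisquare

def transition_type(presses):
    """E.g., ['D', 'U', 'D'] -> 'DD' (first and last key pressed)."""
    return presses[0] + presses[-1]

def oscillated(presses):
    """True when the trial returned to its starting key after >= 2
    switches (e.g., D -> U -> D), as discussed in the text."""
    switches = sum(a != b for a, b in zip(presses, presses[1:]))
    return presses[0] == presses[-1] and switches >= 2

# Chi-square test of equal DU and UD frequencies (placeholder counts):
n_du, n_ud = 30, 18
stat, p = chisquare([n_du, n_ud])  # expected frequencies uniform by default
```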

Table 1 Experiment 1 results: Linear mixed effects model of (a) switches, (b) conflict, and (c) confidence ratings by conditions with participants and items as random effects

Note: Conditions are dummy coded with LC as the reference level. Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1.

Table 2 Response changes during deliberation in Experiments 1 and 2

Finally, we examined participants’ subjective ratings recorded at the end of each trial. While participants rated trials on 4 scales, our primary focus was on conflict and confidence ratings, commonly used indicators of conflict in decision-making (see Frey et al., Reference Frey, Johnson and De Neys2018; Mevel et al., 2015; Pennycook et al., Reference Pennycook, Fugelsang and Koehler2015b). Overall, more switches were associated with increased reported conflict and decreased confidence in the final answer. This trend is broadly reflected at the item level, where conflict is positively correlated and confidence is negatively correlated with vacillations. However, given the item-wise variability in these associations, evident in Figure 3c, we modeled the subjective ratings by switches accounting for participant- and item-level random effects. Trials with more switches showed increased conflict ratings, higher reported difficulty, and a greater subjective sense that the participant might change their mind about the answer in the future (panels b, d, and e of Table 1, respectively). Confidence ratings, on the other hand, dropped with more vacillations (panel c of Table 1).
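The item-wise correlations in Figure 3c can be computed along these lines; this sketch reuses the hypothetical trial-level data frame from the previous sketch, with ‘conflict’ and ‘confidence’ as the end-of-trial ratings.

```python
# Sketch: item-wise Spearman correlations of switches with subjective
# ratings. Column names are hypothetical.
from scipy.stats import spearmanr

for item, trials in df.groupby('item'):
    rho_conflict, _ = spearmanr(trials['switches'], trials['conflict'])
    rho_confidence, _ = spearmanr(trials['switches'], trials['confidence'])
    print(f'{item}: conflict rho = {rho_conflict:.2f}, '
          f'confidence rho = {rho_confidence:.2f}')
```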

In summary, like Koenigs et al. (Reference Koenigs, Young, Adolphs, Tranel, Cushman, Hauser and Damasio2007), we found that highly conflicting moral dilemmas elicited diverse judgments: most individuals disagreed with taking the action in LC scenarios, while responses were mixed in HC as well as IM dilemmas. Vacillations, which are internal to the reasoner, mapped reasonably well onto the cohort-level disagreements in final decisions as well as the subjective feeling of conflict. Furthermore, we employed vacillations as an analytical tool to examine the reasoning process and scrutinize models of moral reasoning by outlining how preferences evolve during deliberation. These patterns revealed that transitions occurred in both directions and that oscillations between options were common during reasoning, both of which are challenging to explain under certain theoretical models.

3.2 Experiment 2

The results of Experiment 1 show that our measurement of mental vacillations, when summed across a trial, correlates well with the expectation that people experience conflict during reasoning, as operationalized via cohort-level disagreement by Koenigs et al. (Reference Koenigs, Young, Adolphs, Tranel, Cushman, Hauser and Damasio2007). In Experiment 2, we sought to validate our findings from Experiment 1 against a more recent definition of conflict.

Bago and De Neys (Reference Bago and De Neys2019) manipulated conflict in moral decisions in terms of convergence of deontological and utilitarian principles on a choice. They propose that when these 2 principles contradict each other, people feel conflicted. We tested this definition of conflict in moral decisions in a pre-registered study below.

3.2.1 Method

Participants

Sample size and hypotheses for Experiment 2 were pre-registered (see https://osf.io/ut3yb). We collected data from 27 participants. Four participants’ data were not recorded reliably due to a technical issue in the software, and 1 participant failed to meet our inclusion criterion (i.e., did not record any key press during the deliberation phase on any trial). We analyzed data from 22 participants (8 females; mean age = 20.3 years).

Materials

In Experiment 2, we utilized a set of moral stimuli from Bago and De Neys (Reference Bago and De Neys2019) and NM stimuli from Koenigs et al. (Reference Koenigs, Young, Adolphs, Tranel, Cushman, Hauser and Damasio2007). All moral dilemmas were impersonal and required participants to choose between an action and its omission. The consequence of each action within a dilemma was that it killed a group of people as a side-effect of saving another group (for clarification on the distinction between personal and impersonal dilemmas, as implied here, refer to Greene (Reference Greene2014); also note that the consequence of impersonal actions here differs from the stimuli used in Experiment 1). Bago and De Neys (Reference Bago and De Neys2019) manipulated conflict in moral dilemmas based on the convergence of deontological and utilitarian principles. In conflict moral problems, the consequence of the action was such that it saved a larger group of people at the cost of harming or killing a smaller group. These trade-offs mirror familiar trolley problems and their variations, such that the choice is between U and D (Greene, Reference Greene, Systma and Buckwalter2016; Thomson, Reference Thomson1984). In non-conflict moral dilemmas, the action led to the death of a larger group to save a smaller group. The following is an example of the choice alternatives and their trade-offs in a non-conflict dilemma:

‘If you activate the emergency circuit to transfer the oxygen, these 11 miners will be killed, but the 3 miners will be saved. Would you activate the emergency circuit to divert the oxygen in the shaft?’

Bago and De Neys (Reference Bago and De Neys2019) call these dilemmas non-conflict because both deontological and utilitarian principles converge on the choice of not endorsing the action. We refer to this converging choice as U in non-conflict trials for the ease of discussion.

In conflict trials, the U choice refers to the action that saves many by killing a few (see Bago and De Neys (Reference Bago and De Neys2019) and Conway and Gawronski (Reference Conway and Gawronski2013) for a detailed description of these problems). The following excerpt from a conflict dilemma demonstrates that in these dilemmas the choice is between a utilitarian action and a deontological inaction:

‘If you push the button and divert the fire into the sideline, this building will explode and kill the 4 people in it, but the 12 in the building above the main line will be saved. Would you push the button to divert the fire explosion?’

All the stimuli can be found at https://osf.io/64f3z/?view_only=8f87a33c530c4e6b9945acc34b637d3f or in the Supplementary Material.

Procedure

Experiment 2 retained the trial structure from Experiment 1 with a minor adjustment in the rating phase by including only confidence and conflict scales. The instructions remained unchanged from those provided in Experiment 1.

3.2.2 Results and discussion

In Experiment 2, based on the response patterns reported in the original paper, we expected people to be highly cohesive in their final answers at the group level on both conflict and non-conflict items. Although people produced highly coherent final decisions in both conflict and non-conflict cases, the vacillations revealed that the reasoning processes were indeed distinct. Participants switched more frequently on conflict than non-conflict items (Figure 4b). A linear mixed effects model of switches with random effects of participants and items corroborated this observation (panel a of Table 3). Vacillations also correlated with subjective measures of conflict: on trials where participants switched more often, they reported greater conflict in reasoning and less confidence in the final judgment (panels b and c of Table 3).

Figure 4 Results of Experiment 2 across moral trials are depicted in the figure, where each bar represents an item and is color-coded by condition. Panels (a) and (b) represent item-wise proportion of U responses as the final decision and switches, respectively. Panel (c) shows Spearman correlations of switches within an item to the subjective ratings of conflict and confidence reported at the end.

Table 3 Experiment 2 results: Linear mixed effects models of (a) switches by conditions with participants and items as random effects, (b) conflict, and (c) confidence by switches

Note: Panels (b) and (c) are modeled with only the participant as the random effect to avoid over-fitting. Conditions are dummy coded with the non-conflict dilemma type as the reference level. Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1.

However, a more nuanced narrative emerged when examining response transitions. Bago and De Neys’s (Reference Bago and De Neys2019) hybrid model of DPT predicts that individuals who provide a U response both rapidly and after careful consideration (i.e., UU transitions) would not need to alter their preference during deliberation. Contrary to this prediction, our findings suggest that even if they start and end with the same response, participants may not necessarily adhere to that choice throughout the deliberative process. Approximately 35.42% of all UU trials exhibited at least 2 switches in between. Hence, once again, vacillations in preferences offer a more informative metric than simple response transitions, capturing the dynamic nature of the reasoning process between the initial and final decision points.

4 Logical reasoning

Just like moral dilemmas in moral decision-making research, logical reasoning has been studied widely using categorical syllogisms. In this kind of syllogism, participants see 2 premises and a conclusion containing categorical propositions. The task is to judge whether the conclusion logically follows assuming the premises are true; often, the logical validity of the syllogism and the believability of the conclusion are manipulated across trials. Decades of research show that people are more likely to accept conclusions that are valid rather than invalid, and believable rather than unbelievable. There is also an interaction between validity and believability, such that the rates of conclusion endorsement in valid and invalid trials differ depending on the believability of the conclusion (Evans et al., Reference Evans, Barston and Pollard1983; Janis and Frick, Reference Janis and Frick1943; Morgan and Morton, Reference Morgan and Morton1944; Oakhill and Johnson-Laird, Reference Oakhill and Johnson-Laird1985). This belief bias effect is often attributed to the conflict between the reasoner’s belief system and the logical status of the conclusion. De Neys and Van Gelder (Reference De Neys and Van Gelder2009) hypothesized that reasoners are less accurate when these 2 factors clash because people have to inhibit the response favored by their beliefs. In no-conflict problems, in which both factors are consistent, no such inhibition has to take place and, hence, people are generally highly accurate. To suppress belief-based responses, reasoners must identify the conflict between belief and logic, a process contingent upon their familiarity with the logical rules and the application of their own priors within the task (refer to De Neys, Reference De Neys and Markovitz2013 for a comprehensive review). In our final experiment, we investigate the efficacy of vacillations as an indicator of such conflict between belief and logic in syllogisms.

Different theoretical frameworks, such as selective scrutiny, misinterpreted necessity, and the theory of mental models, present varying predictions regarding the sequence of preferences over time (Barston, Reference Barston1986; Dickstein, Reference Dickstein1980; Evans, Reference Evans1989; Johnson-Laird and Bara, Reference Johnson-Laird and Bara1984). With the interim preferences at our disposal, an examination of how reasoners are influenced by the believability of conclusions becomes possible. According to the selective scrutiny model, reasoners purportedly begin by assessing the likelihood or believability of the conclusion. If the conclusion is deemed unlikely, only then do they scrutinize its logical status. Conversely, the misinterpreted necessity model proposes that people inspect syllogisms based on logic first, but rely on the believability of the conclusion if it does not necessarily follow from the premises. In other words, on invalid syllogisms, if the premises lead to the conclusion being valid in some cases and invalid in others, then reasoners would rely on the believability of the conclusion in their judgments.

Along the same lines, the mental model theory argues that individuals initiate reasoning by constructing mental representations that integrate the given premises, and subsequently evaluate the conclusion based on its logical validity. If the conclusion lacks logical alignment, it is rejected; only logically fitting conclusions prompt consideration of their believability, with unbelievable conclusions leading the reasoner to form alternate models from the premises. Such a reasoning process is only possible for syllogisms that permit the construction of alternate models, called multiple-model syllogisms. The key-press paradigm serves as a testing ground for evaluating these predictions about the logical reasoning process, as it provides a glimpse into the pattern through which inclinations are formed during deliberation. If participants initially express a preference based on the logical validity of the syllogism, the selective scrutiny model can be dismissed. If we observe more switches on valid but unbelievable syllogisms, then the explanation provided by the mental models theory is plausible. However, if participants tend to switch between preferences more frequently on invalid syllogisms regardless of their believability, then the misinterpreted necessity model may be a more suitable explanation. In Experiment 3, we evaluate whether the first preference is belief- or logic-based. Additionally, we explore how frequently individuals switch between preferences, especially in multiple-model syllogisms with valid but unbelievable conclusions. The results of Experiment 3, indeed, reveal that individuals may not strictly adhere to the reasoning patterns predicted by any of the above-mentioned theories.

Switching in reasoning can also be indicative of the quality of thinking. Actively Open-Minded Thinking (AOT) asserts that good thinkers give fair consideration to new information, irrespective of its alignment with existing beliefs (Baron, Reference Baron2023). This form of thinking is also claimed to be predictive of belief bias (Toplak et al., Reference Toplak, West and Stanovich2014, Reference Toplak, West and Stanovich2017). In Experiment 3, we put this prediction to the test by examining whether the frequency of participants’ switches was positively correlated with AOT.

4.1 Experiment 3

Experiment 3 uses syllogisms as stimuli. One of the primary motivations for employing a different type of stimulus was to investigate the generalizability of the key-press paradigm to various forms of reasoning. Additionally, theories of syllogistic reasoning provide hypotheses concerning the underlying cognitive processes involved in problem-solving that can be tested with our paradigm. In particular, we investigate belief bias in single- and multiple-model syllogisms. By applying our paradigm to trace this process, we aim to gain insights into syllogistic reasoning with minimal interference, ultimately contributing to a more nuanced understanding of the cognitive mechanisms at play.

4.1.1 Method

Participants

Based on a pilot study, we calculated a sample size of 25 (for a detailed pre-registration report, see https://osf.io/q2n56/?view_only=bd0a1f9894a740d0bcc06679f5be3d6e). Expecting some data loss, we collected data from 30 participants (7 females; mean age = 21.7 years).

Materials

In this experiment, participants solved 8 syllogistic reasoning problems. We employed a within-subject 2 $\times $ 2 design with 2 factors: validity (whether the conclusion follows logically from the premises) and believability (whether the conclusion is believable). To manipulate the believability of the conclusions, we used materials from Robison and Unsworth (Reference Robison and Unsworth2017) and Evans et al. (Reference Evans, Barston and Pollard1983) (see Table 4 for all 8 conclusions).

Table 4 Conclusions in 8 syllogisms used in Experiment 3

Note: Conclusions with the universal quantifier ‘all’ (1, 2, 5, and 6) are taken from Robison and Unsworth (Reference Robison and Unsworth2017). The rest are from Evans et al. (Reference Evans, Barston and Pollard1983).

Our stimuli were also divided into one of 2 forms, single- or multiple-model (Newstead et al., Reference Newstead, Pollard, Evans and Allen1992). Single-model syllogisms were taken from Robison and Unsworth (Reference Robison and Unsworth2017) and had the form:

All A are B.

All B are C.

Therefore, all A are C. (valid)

OR

Therefore, all C are A. (invalid)

Multiple-model syllogisms are those for which more than one conceptually distinct model can be constructed; to conclusively judge the validity of the conclusion, all models need to be considered. These were of the following form, taken from Evans et al. (Reference Evans, Barston and Pollard1983):

Some A are B.

No B are C.

Therefore, some A are not C. (valid)

OR

No A are B.

Some B are C.

Therefore, some A are not C. (invalid)

In Experiment 3, conflict trials (invalid-believable and valid-unbelievable) were those in which there was a conflict between logic and believability, while on non-conflict trials, logic and believability both coincided on the same choice (valid-believable, invalid-unbelievable). List of the stimuli used can be found in the Supplementary Material.

Procedure

The overall trial structure remained the same as in Experiments 1 and 2. In the deliberation phase, participants saw 2 premises and the conclusion in the middle of the screen with each statement on a separate line. The prompt in the deliberation phase was slightly altered to read ‘Which option are you considering?’ for better interpretability. The right and left arrow keys corresponded to the conclusion being TRUE and FALSE, respectively, in both deliberation and decision phases of the trial. On the last screen, participants reported on a 5-point scale how conflicted they felt while solving the syllogism and how confident they were with their final answer.

Since we used syllogisms, which have a correct and an incorrect answer, the instructions were altered to include a dummy syllogism to explain the task. The preliminary instructions for Experiment 3 read as follows:

‘In this experiment, we are investigating how people solve a particular kind of problem. Let me explain it with an example.

Imagine that you are trying to solve a multiple-choice question with only 2 options: A and B. Sometimes you know the answer right away. However, some other times, you may think A could be right before correcting yourself and saying B is right. Perhaps, you switch again to A and record it as your final answer.

In this experiment, the task is to pay close attention to these thoughts before you settle on your final answer. As you notice yourself leaning toward one of the options, you have to indicate it by pressing a key on the keyboard.

Any questions so far? Please ask now.’

After clarifying any doubts participants had so far, the task was explained to them using a dummy syllogism.

‘You will see 3 statements. The first two statements will give you some information about the third one. You have to say if the third statement following “Therefore…” is a logically valid statement assuming that the first two statements are true.

For example:

All mammals are zephrodytes.

All zephrodytes fly.

Therefore, all mammals fly.

Is the third statement “TRUE” or “FALSE”?

You will solve 8 such problems. For each problem, you will have at least 1 minute to solve, but you can take longer if needed. Remember: while you are thinking about this problem you also have to indicate which answer you are considering.

Any questions so far? Please ask now.’

After reading these instructions, participants saw screenshots of a dummy trial (with the same example used above) along with a description of the trial structure. The instructions were followed by the main block with 8 syllogisms. At the end of the experiment, participants filled out the Actively Open-Minded Thinking questionnaire (AOT; Baron, Reference Baron2019). We also asked them if they were familiar with syllogisms.

4.1.2 Results and discussion

Twenty-six out of 30 participants were acquainted with syllogisms and had prior experience solving similar problems; however, no participants were excluded from the analyses. Despite their familiarity, participants still showed the belief bias effect in their judgments. We operationalized participants’ final decisions, recorded following the deliberation phase, in terms of accepting or rejecting the conclusion. We conducted 2 $\times $ 2 repeated measures ANOVAs with believability and validity as predictors for both single- and multiple-model syllogisms. According to belief bias studies, the difference in acceptance rates between valid and invalid trials is larger when the conclusion is unbelievable than when it is believable. However, this particular pattern of interaction was predictive of acceptance rates only in multiple-model syllogisms (Table 5). Reasoners were highly accurate in judging the validity of single-model syllogisms, which contained universal premises and conclusions; this is in line with representative results from the belief bias literature (Newstead et al., Reference Newstead, Pollard, Evans and Allen1992).
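A sketch of a 2 × 2 repeated measures ANOVA of this kind, using statsmodels’ AnovaRM, is given below; the column and file names are hypothetical, and AnovaRM expects one observation per participant per design cell (or an aggregation function).

```python
# Sketch: 2x2 repeated measures ANOVA on conclusion acceptance.
# Hypothetical columns: participant, validity, believability, accepted.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

cells = pd.read_csv('experiment3_cells.csv')  # hypothetical aggregated data
res = AnovaRM(data=cells, depvar='accepted', subject='participant',
              within=['validity', 'believability']).fit()
print(res)
```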

Table 5 Generalized linear models for single-model and multiple-model syllogisms, from Experiment 3

Although people show belief bias in the judgments produced at the end of a trial, the reasoning process itself is much more textured. Consistent with Experiments 1 and 2, syllogisms on which participants shifted their preferences often were also likely to be rated higher on conflict and lower on confidence (Table 6). Multiple-model syllogisms, whose premises can be modeled in more than one way, saw more switches in preference than single-model syllogisms, whose premises can be arranged in only one way; this is consistent with the mental models theory and the misinterpreted necessity model (paired t-test: $t (29) = 3.43, p = 0.002, CI = [0.23, 0.90]$ ). The models of logical thinking discussed before also posit a temporal order in which arguments are considered. When believability and logical validity converged on the same answer (valid-believable and invalid-unbelievable syllogisms), participants were more likely than chance to consider the convergent choice first (proportion = 0.78; one-proportion $z (1) = 7.23,\ p < 0.01$ ). Contrastingly, when these 2 factors contradicted each other, as in invalid-believable and valid-unbelievable syllogisms, participants’ first preferences were more influenced by logical validity than by the conclusion’s believability (proportion = 0.69; one-proportion ${z (1) = 4.47,\ p < 0.01}$ ). This aligns with the mental models and misinterpreted necessity theories, because both predict that individuals initially assess the validity of conclusions, but contrasts with the selective scrutiny model, which expects individuals to examine a syllogism’s believability before checking its logical validity.
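The one-proportion tests against chance can be sketched as follows; the counts are placeholders chosen only to illustrate the call, not the observed trial counts.

```python
# Sketch: one-proportion z-test of first preferences against chance (0.5).
from statsmodels.stats.proportion import proportions_ztest

n_logic_first, n_trials = 83, 120            # placeholder counts (~0.69)
z, p = proportions_ztest(count=n_logic_first, nobs=n_trials, value=0.5)
print(z, p)
```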

Table 6 Experiment 3 results: Linear mixed effects models of switches in syllogistic reasoning predicting (a) conflict and (b) confidence ratings with participants as a random effect

Note: Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1.

According to the mental models theory, people should vacillate more while deliberating on valid-unbelievable syllogisms, because people entertain alternate models only in such cases. However, participants switched the most on the invalid-believable syllogism (last bar in Figure 5b). Switches recorded on invalid-believable trials were significantly more numerous than switches on valid-unbelievable ( $t (29) = 2.76, p = 0.01, CI = [0.12, 0.78]$ ) and invalid-unbelievable syllogisms ( $t (29) = 2.57, p = 0.015, CI = [0.07, 0.63]$ ). While the misinterpreted necessity model anticipates that reasoners vacillate when making judgments on invalid multiple-model syllogisms, in our experiment they did so only on invalid-believable syllogisms. Switching between preferences was comparable across valid syllogisms (both believable and unbelievable) and invalid-unbelievable cases. Therefore, the observed pattern of vacillations was not entirely consistent with either the mental model theory or the misinterpreted necessity model. However, our stimuli were limited to one problem per condition within single- and multiple-model syllogisms. It remains to be seen whether these conclusions extend beyond the examples used in our experiment.
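The paired comparisons of per-participant switch counts can be sketched as below; the arrays are randomly generated placeholders, and the confidence interval call requires scipy 1.10 or later.

```python
# Sketch: paired t-test of switch counts between two syllogism types.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
switches_ib = rng.poisson(2.0, size=30)      # invalid-believable (placeholder)
switches_vu = rng.poisson(1.5, size=30)      # valid-unbelievable (placeholder)

res = ttest_rel(switches_ib, switches_vu)
ci = res.confidence_interval()               # 95% CI (scipy >= 1.10)
print(res.statistic, res.pvalue, ci.low, ci.high)
```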

Figure 5 Results of Experiment 3 across all syllogisms. Each bar is a syllogism which is either valid (V) or invalid (I) and has either a believable (B) or unbelievable (U) conclusion. All bars are color-coded by model type. Panels (a) and (b) depict the proportion of correct trials and the number of times participants switched between options while deliberating, respectively. Panel (c) shows average item-wise conflict and confidence ratings recorded at the end of each trial on 5-point scales.

Finally, participants’ AOT scores did not demonstrate a significant correlation with switches ( $r_{switches} = 0.25, p = 0.17$ ). While the trend is in the anticipated direction, suggesting that reasoners who consider more diverse information score higher on AOT, it remains unclear whether the lack of a significant effect is due to the small sample size. Additional experimentation is required to directly assess this claim.

5 General discussion

Experiencing conflict during reasoning is a common phenomenon, yet its empirical measurement poses challenges. Researchers often rely on trial-level metrics of conflict, such as final judgments or choices produced after reasoning and RTs, which limit insight into how the conflict evolves. Recent decision-making experiments have incorporated process-tracing methods such as mouse-tracking to obtain a closer view of reasoning (for a general review, see Maldonado et al., 2019; for moral reasoning in particular, see Gürçay and Baron, 2017; Koop, 2013). In these assessments, the trajectories of mouse movements reveal changes in inclination toward the available options, but pinpointing when these adjustments occur within a given trial remains difficult. Real-time process-tracing methods such as think-aloud protocols carry some risk of reactivity: measuring the phenomenon could itself distort it. Eye-tracking methods, for their part, infer the subjectively experienced conflict only indirectly, so their reliability across decision contexts and individuals is uncertain. Our conflict measurement method, in contrast, allows experimenters to observe the complete time-course of a respondent’s decision, thereby facilitating more nuanced analyses. We believe it is also less intrusive than the think-aloud method, because reasoners simply press a key to report their inclination without having to verbalize or explain their behavior. Additionally, our main dependent variable, switches in preference, can be readily analyzed without inter-rater reliability checks or complex coding techniques. This opens avenues for exploring finer details, including the potential incorporation of gaze or neurophysiological markers of conflict in future research.
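To illustrate how little post-processing the measure requires, here is a minimal sketch of deriving switch counts from a sequence of interim key presses; the log format is our illustrative assumption rather than the exact format used in our experiments:

```python
# A minimal sketch of how the main dependent variable could be derived from a
# key-press log: a switch is any press that differs from the previous press.
def count_switches(presses: list[str]) -> int:
    """Count preference reversals in a sequence of interim key presses."""
    return sum(1 for prev, cur in zip(presses, presses[1:]) if prev != cur)

# Example: L L R L R contains 3 reversals after the opening run of L presses
print(count_switches(["L", "L", "R", "L", "R"]))  # 3
```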

In 3 experiments, we demonstrate a reasonable correlation between our measurement and participants’ subjective experience of conflict in 2 different fields of reasoning. Problems on which people vacillated while reasoning were also those they rated as conflicting, and their confidence in their final answer was predicted by how often they vacillated.

Measurement of mental vacillations also offers constraints for theories of reasoning and decision-making. Our results from Experiment 3 illustrate that while the final acceptance rates aligned well with predictions from the mental models theory and the misinterpreted necessity model, the pattern of vacillations between choices did not fully support either. Similarly, while Bago and De Neys’s (2019) 2-step method might lend support to a hybrid model of dual-process moral reasoning, our results from Experiment 2 unveil a less straightforward narrative, as individuals frequently revisited alternatives they had considered before.

While not determinative, our findings are inconsistent with dual-process accounts that overlay a slow deliberative process on a faster heuristic process, and consistent with simpler race-to-threshold accounts of preference formation, with duality emerging as a property of the set of hypotheses under consideration. As an example of such a single-process account, Srivastava and Vul (2015) demonstrate that the characteristic signatures of both System 1 and System 2 decisions can be elicited from a single race-to-threshold model simply by changing the set of options an observer selects between, with fewer choices yielding System 1-like behavior and more choices yielding System 2-like behavior.
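To make the intuition concrete, the following toy simulation, our own construction rather than Srivastava and Vul’s (2015) implementation, shows how a race-to-threshold process naturally produces vacillations: the interim preference is whichever accumulator currently leads, and closely matched options change leaders more often before threshold.

```python
# Toy race-to-threshold simulation (illustrative assumptions throughout):
# each option accumulates noisy evidence; the interim preference is the current
# leader; vacillations are changes in that leader before any option hits bound.
import numpy as np

def race_trial(drifts, threshold=3.0, noise=0.5, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    totals = np.zeros(len(drifts))
    leader, switches = None, 0
    while totals.max() < threshold:
        totals += drifts + noise * rng.standard_normal(len(drifts))
        current = int(totals.argmax())
        if leader is not None and current != leader:
            switches += 1          # interim preference changed hands
        leader = current
    return leader, switches

# Closely matched drifts mimic a high-conflict choice between two options
winner, n_switches = race_trial(drifts=np.array([0.10, 0.12]),
                                rng=np.random.default_rng(4))
print(f"chose option {winner} after {n_switches} vacillation(s)")
```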

Complementarily, Gürçay and Baron (2017) propose that the choice structure of some problems, particularly moral dilemmas, is such that it evokes feelings of conflict. They conceptualize the reasoning process as a competition among the alternatives for control of the final decision, such that the reasoner may favor any of these alternatives, in no particular order, before settling on a choice. This idea is consistent with preference switching, or vacillation, in reasoning. Such single-process models may be re-examined as potential explanations for results in moral and logical decision-making.

In this paper, we introduce a new method to measure vacillations and employ them as an indicator of conflict. Although our paradigm allows a more granular view of the reasoning process, there is room to extract still more information about its intricacies. To monitor preferences continuously, joysticks could be employed, with the direction of movement indicating the current preference and the displacement simultaneously reflecting the degree of certainty associated with it. Future studies may also explore modifying the instructions to fit specific task demands, such as tracking changes in confidence rather than preference during reasoning.

We imposed a 1-min deliberation interval to prevent inattentive responding and ensure active reasoning. Such enforced temporal constraints, however, might introduce confounds. The actual time participants take to decide is unclear, as a decision may have been reached well before the 1-min mark, and a participant who has already decided may continue reasoning, likely inflating the frequency of switching. Note, though, that this additional reasoning, forced or not, need not be inconsequential: it may alter confidence in the judgment as more information is considered.

The imposition of forced deliberation may also affect the timing of key presses, complicating the interpretation of switches as indicators of internal preference shifts. Alerting participants that the 1-min period has ended could influence their key-pressing behavior; this would be problematic for our measurement if it pushed participants to switch close to the prompt. To examine when switches occur, we plotted their timing during the deliberation period in Figure 6. In Experiment 1, participants’ initial switches appear well before the 1-min mark (gray lines in all 3 panels), but in Experiments 2 and 3, switches seem to cluster around the mandated deliberation time. In future research, a more comprehensive understanding of vacillations could be achieved by removing the imposed minimum response time. Without the time restriction, it would be possible to investigate the timing of vacillations, how it interacts with contextual factors, and whether there are individual-level differences. Furthermore, drift-diffusion models (DDMs) could provide additional insight by treating these data as the output of an evidence accumulation process in which preference shifts are observed directly.
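As a sketch of what such a DDM-style analysis might look like, the toy simulation below (our construction, with illustrative parameters, not a fitted model) treats sign changes of the accumulated evidence as one possible model analogue of the preference switches recorded in our paradigm.

```python
# Minimal drift-diffusion sketch (illustrative, not fitted to our data):
# evidence drifts toward one of two bounds; sign flips of the running total
# are one possible analogue of observed preference switches.
import numpy as np

def ddm_trial(drift=0.05, bound=2.0, noise=1.0, dt=0.01, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    x, t, flips, sign = 0.0, 0.0, 0, 0
    while abs(x) < bound:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
        s = int(np.sign(x))
        if sign != 0 and s != 0 and s != sign:
            flips += 1             # accumulated evidence changed sides
        if s != 0:
            sign = s
    return ("upper" if x > 0 else "lower"), t, flips

choice, rt, n_flips = ddm_trial(rng=np.random.default_rng(5))
print(f"{choice} bound reached at t = {rt:.2f}s with {n_flips} sign flip(s)")
```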

Figure 6 Density plots of response times of the first, median, and last switches for each participant in 3 experiments. X-axis is time in seconds. The deliberation phase starts with the presentation of the problem in text format to participants at X = 0. The dashed vertical lines are at X = 60 seconds after which participants could report their final decision.

In summary, we present a new experimental paradigm for measuring how individuals vacillate between choices while deliberating. Empirical evidence obtained across 3 experiments indicates that this measurement holds the potential for greater insights than can be gleaned from trial summary statistics such as response times or cohort-disagreement levels. Our results demonstrate that prevailing theoretical accounts of reasoning struggle to adequately explain the sequence of vacillations seen in people’s judgments. We anticipate that the enhanced visibility into the deliberation process afforded by our paradigm will contribute to refining and improving these theoretical models.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/jdm.2024.15.

Data availability statement

Stimuli, data, and analysis files for Experiment 1, Experiment 2, and Experiment 3 are made available online.

Acknowledgments

We thank Prof. Narayanan Srinivasan for providing insights throughout the project. R.V.S. thanks Akshay Bose for the much-needed coffee breaks.

Funding statement

This research received no specific grant funding from any funding agency, commercial, or not-for-profit sectors.

Competing interests

The authors declare none.

Footnotes

1 Consider, for example, a scenario where a group is tasked with choosing between tea and coffee. The group may be evenly divided, implying that the decision on beverage choice is potentially high-conflict. However, each individual member within the group may have encountered no internal conflict in making their personal beverage selection.

References

Bacon, A., Handley, S., & Newstead, S. (2003). Individual differences in strategies for syllogistic reasoning. Thinking & Reasoning, 9(2), 133–168.
Bago, B., & De Neys, W. (2019). The intuitive greater good: Testing the corrective dual process model of moral cognition. Journal of Experimental Psychology: General, 148(10), 1782.
Baron, J. (2019). Actively open-minded thinking in politics. Cognition, 188, 8–18.
Baron, J. (2023). Thinking and deciding. Cambridge: Cambridge University Press.
Baron, J., & Gürçay, B. (2017). A meta-analysis of response-time tests of the sequential two-systems model of moral judgment. Memory & Cognition, 45(4), 566–575.
Barston, J. L. (1986). An investigation into belief biases in reasoning (Doctoral dissertation, University of Plymouth). University of Plymouth Research Theses. https://pearl.plymouth.ac.uk/bitstream/handle/10026.1/1906/JULIELINDABARSTON.PDF?sequence=1
Conway, P., & Gawronski, B. (2013). Deontological and utilitarian inclinations in moral decision making: A process dissociation approach. Journal of Personality and Social Psychology, 104(2), 216–235. https://doi.org/10.1037/a0031021
Cushman, F., Young, L., & Hauser, M. (2006). The role of conscious reasoning and intuition in moral judgment: Testing three principles of harm. Psychological Science, 17(12), 1082–1089.
De Neys, W. (2013). Heuristics, biases and the development of conflict detection during reasoning. In Markovits, H. (Ed.), The developmental psychology of reasoning and decision-making (pp. 130–147). New York: Psychology Press.
De Neys, W., & Glumicic, T. (2008). Conflict monitoring in dual process theories of thinking. Cognition, 106(3), 1248–1299.
De Neys, W., & Van Gelder, E. (2009). Logic and belief across the lifespan: The rise and fall of belief inhibition during syllogistic reasoning. Developmental Science, 12(1), 123–130.
Dickstein, L. S. (1980). Inference errors in deductive reasoning. Bulletin of the Psychonomic Society, 16(6), 414–416.
Evans, J. S. B., Barston, J. L., & Pollard, P. (1983). On the conflict between logic and belief in syllogistic reasoning. Memory & Cognition, 11(3), 295–306.
Evans, J. S. B. (1989). Bias in human reasoning: Causes and consequences. Mahwah, NJ: Lawrence Erlbaum Associates.
Evans, J. S. B., & Curtis-Holmes, J. (2005). Rapid responding increases belief bias: Evidence for the dual-process theory of reasoning. Thinking & Reasoning, 11(4), 382–389.
Evans, J. S. B., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8(3), 223–241.
Fox, M. C., Ericsson, K. A., & Best, R. (2011). Do procedures for verbal reporting of thinking have to be reactive? A meta-analysis and recommendations for best reporting methods. Psychological Bulletin, 137(2), 316.
Frey, D., Johnson, E. D., & De Neys, W. (2018). Individual differences in conflict detection during reasoning. Quarterly Journal of Experimental Psychology, 71(5), 1188–1208.
Gagne, R. M., & Smith, E. C., Jr. (1962). A study of the effects of verbalization on problem solving. Journal of Experimental Psychology, 63(1), 12.
Ghaffari, M., & Fiedler, S. (2018). The power of attention: Using eye gaze to predict other-regarding and moral choices. Psychological Science, 29(11), 1878–1889.
Greene, J., & Haidt, J. (2002). How (and where) does moral judgment work? Trends in Cognitive Sciences, 6(12), 517–523.
Greene, J. D. (2008). The secret joke of Kant’s soul. Moral Psychology, 3, 35–79.
Greene, J. D. (2014). Beyond point-and-shoot morality: Why cognitive (neuro)science matters for ethics. Ethics, 124(4), 695–726.
Greene, J. D. (2016). Solving the trolley problem. In Sytsma, J. & Buckwalter, W. (Eds.), A companion to experimental philosophy (pp. 173–189). West Sussex: Wiley-Blackwell.
Greene, J. D., Cushman, F. A., Stewart, L. E., Lowenberg, K., Nystrom, L. E., & Cohen, J. D. (2009). Pushing moral buttons: The interaction between personal force and intention in moral judgment. Cognition, 111(3), 364–371.
Greene, J. D., Morelli, S. A., Lowenberg, K., Nystrom, L. E., & Cohen, J. D. (2008). Cognitive load selectively interferes with utilitarian moral judgment. Cognition, 107(3), 1144–1154.
Greene, J. D., Nystrom, L. E., Engell, A. D., Darley, J. M., & Cohen, J. D. (2004). The neural bases of cognitive conflict and control in moral judgment. Neuron, 44(2), 389–400.
Greene, J. D., Sommerville, R. B., Nystrom, L. E., Darley, J. M., & Cohen, J. D. (2001). An fMRI investigation of emotional engagement in moral judgment. Science, 293(5537), 2105–2108.
Gürçay, B., & Baron, J. (2017). Challenges for the sequential two-system model of moral judgement. Thinking & Reasoning, 23(1), 49–80.
Janis, I. L., & Frick, F. (1943). The relationship between attitudes toward conclusions and errors in judging logical validity of syllogisms. Journal of Experimental Psychology, 33(1), 73.
Johnson-Laird, P. N., & Bara, B. G. (1984). Syllogistic inference. Cognition, 16(1), 1–61.
Kieslich, P. J., Henninger, F., Wulff, D. U., Haslbeck, J. M., & Schulte-Mecklenbeck, M. (2019). Mouse-tracking: A practical guide to implementation and analysis. In Schulte-Mecklenbeck, M., Kühberger, A., & Johnson, J. G. (Eds.), A handbook of process tracing methods (pp. 111–130). New York: Routledge.
Koenigs, M., Young, L., Adolphs, R., Tranel, D., Cushman, F., Hauser, M., & Damasio, A. (2007). Damage to the prefrontal cortex increases utilitarian moral judgements. Nature, 446(7138), 908–911.
Koop, G. J. (2013). An assessment of the temporal dynamics of moral decisions. Judgment and Decision Making, 8(5), 527–539.
Moore, A. B., Clark, B. A., & Kane, M. J. (2008). Who shalt not kill? Individual differences in working memory capacity, executive control, and moral judgment. Psychological Science, 19(6), 549–557.
Morgan, J. J., & Morton, J. T. (1944). The distortion of syllogistic reasoning produced by personal convictions. Journal of Social Psychology, 20(1), 39–59.
Newstead, S. E., Pollard, P., Evans, J. S. B., & Allen, J. L. (1992). The source of belief bias effects in syllogistic reasoning. Cognition, 45(3), 257–284.
Oakhill, J., & Johnson-Laird, P. N. (1985). The effects of belief on the spontaneous production of syllogistic conclusions. Quarterly Journal of Experimental Psychology, 37(4), 553–569.
Pärnamets, P., Johansson, P., Hall, L., Balkenius, C., Spivey, M. J., & Richardson, D. C. (2015). Biasing moral decisions by exploiting the dynamics of eye gaze. Proceedings of the National Academy of Sciences, 112(13), 4170–4175.
Paxton, J. M., Ungar, L., & Greene, J. D. (2012). Reflection and reasoning in moral judgment. Cognitive Science, 36(1), 163–177.
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2015a). Everyday consequences of analytic thinking. Current Directions in Psychological Science, 24(6), 425–432.
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2015b). What makes us think? A three-stage dual-process model of analytic engagement. Cognitive Psychology, 80, 34–72.
Purcell, Z. A., Howarth, S., Wastell, C. A., Roberts, A. J., & Sweller, N. (2022). Eye tracking and the cognitive reflection test: Evidence for intuitive correct responding and uncertain heuristic responding. Memory & Cognition, 50, 348–365.
Robison, M. K., & Unsworth, N. (2017). Individual differences in working memory capacity and resistance to belief bias in syllogistic reasoning. Quarterly Journal of Experimental Psychology, 70(8), 1471–1484.
Schulte-Mecklenbeck, M., Johnson, J. G., Böckenholt, U., Goldstein, D. G., Russo, J. E., Sullivan, N. J., & Willemsen, M. C. (2017). Process-tracing methods in decision making: On growing up in the 70s. Current Directions in Psychological Science, 26(5), 442–450.
Shivnekar, R. V., & Srivastava, N. (2023). Measuring moral vacillations. Proceedings of the Annual Meeting of the Cognitive Science Society, 45(45), 464–470.
Skulmowski, A., Bunge, A., Kaspar, K., & Pipa, G. (2014). Forced-choice decision-making in modified trolley dilemma situations: A virtual reality and eye tracking study. Frontiers in Behavioral Neuroscience, 8, 426.
Spivey, M. J., Grosjean, M., & Knoblich, G. (2005). Continuous attraction toward phonological competitors. Proceedings of the National Academy of Sciences, 102(29), 10393–10398.
Srivastava, N., & Vul, E. (2015). Choosing fast and slow: Explaining differences between hedonic and utilitarian choices. In Proceedings of the Annual Meeting of the Cognitive Science Society.
Swann, W. B., Jr., Gómez, Á., Buhrmester, M. D., López-Rodríguez, L., Jiménez, J., & Vázquez, A. (2014). Contemplating the ultimate sacrifice: Identity fusion channels pro-group affect, cognition, and moral decision making. Journal of Personality and Social Psychology, 106(5), 713.
Thomson, J. J. (1984). The trolley problem. Yale Law Journal, 94, 1395.
Toplak, M. E., West, R. F., & Stanovich, K. E. (2014). Rational thinking and cognitive sophistication: Development, cognitive abilities, and thinking dispositions. Developmental Psychology, 50(4), 1037.
Toplak, M. E., West, R. F., & Stanovich, K. E. (2017). Real-world correlates of performance on heuristics and biases tasks in a community sample. Journal of Behavioral Decision Making, 30(2), 541–554.
Trippas, D., Thompson, V. A., & Handley, S. J. (2017). When fast logic meets slow belief: Evidence for a parallel-processing model of belief bias. Memory & Cognition, 45, 539–552.
Tversky, A., & Shafir, E. (1992). Choice under conflict: The dynamics of deferred decision. Psychological Science, 3(6), 358–361.
Van Someren, M. W., Barnard, Y. F., & Sandberg, J. A. (1994). The think aloud method: A practical approach to modelling cognitive processes. London: Academic Press.