How to Measure Effect Sizes for Rational Decision Making

Ina Jäntgen

doi:10.1017/psa.2023.23

Published online by Cambridge University Press: 17 February 2023

Affiliation: Faculty of Philosophy, University of Cambridge, Sidgwick Avenue, Cambridge, United Kingdom
Email: ij271@cam.ac.uk

Abstract

Absolute and relative outcome measures measure a treatment’s effect size, purporting to inform treatment choices. I argue that absolute measures are at least as good as, if not better than, relative ones for informing rational decisions across choice scenarios. Specifically, this dominance of absolute measures holds for choices between a treatment and a control group treatment from a trial and for ones between treatments tested in different trials. This distinction has hitherto been neglected, just like the role of absolute and baseline risks in rational decision making that my analysis reveals. Recognizing both aspects advances the discussion on reporting outcome measures.

Type: Contributed Paper
Information: Philosophy of Science, Volume 90, Issue 5, December 2023, pp. 1183-1193

DOI: https://doi.org/10.1017/psa.2023.23
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of the Philosophy of Science Association

1. Introduction

In biomedical research, scientists often perform trials to test the effectiveness of treatments. In such research, the collected data is analyzed using outcome measures, which describe how the tested treatment and the outcome of interest relate. Researchers usually interpret such outcome measures as measuring the effect size of the treatment. These measures then provide information for policy makers, patients, and others aiming to decide between treatments.

Not all outcome measures provide the same information though. In this article, I focus on outcome measures for binary variables. Here, two classes of measures, absolute and relative, differ in how they describe a treatment’s effect size. Consider the Heart Protection Study that tested the effectiveness of a cholesterol-lowering drug called simvastatin to prevent heart attacks and deaths among men with or at risk of heart disease (Heart Protection Study Collaborative Group 2002). The study found a so-called relative risk reduction of 18% of coronary death. The so-called risk difference was 1.2%.¹ Only the former effect size was reported. Yet, the difference in the described effect size is striking. Aiming to decide on taking simvastatin, which effect size is informative for a rational decision maker? The relative? The absolute? Perhaps both? More generally, how should we measure effect sizes to inform rational decision making?

In this article, I argue that absolute measures are at least as good as, if not better than, relative ones for informing rational decisions across choice scenarios. More precisely, absolute but not relative measures always provide sufficient probabilistic information to choose between a treatment and a control group treatment from a trial. For choices between treatments tested in distinct trials, we need information about the difference in the probabilities of the outcome of interest given the treatments, that is the difference in the absolute risks. Absent any knowledge about the probabilities of the outcome given control group treatments, that is the baseline risks, outcome measures do not provide this information. If the deciding agents instead know the baseline risks, then they can derive the absolute risks from both classes of outcome measures. If the baseline risks are known to be equal across trials but are themselves unknown, then absolute measures but not relative ones always provide sufficient information to choose between treatments from distinct trials. Overall, for informing rational decision making, absolute measures dominate relative ones.

My analysis exposes the conditions under which both absolute and relative measures carry the probabilistic information a rational decision maker needs, and when only absolute ones do so. Moreover, it identifies the role of absolute and baseline risks in rational treatment choices. Recognizing both aspects advances the discussion on how to report effect sizes to inform treatment decisions. In particular, Jacob Stegenga and his co-authors argue that only absolute measures but not relative ones are suited to inform rational decisions (Stegenga 2015; Sprenger and Stegenga 2017; Stegenga and Kenna 2017; Stegenga 2018; see also Worrall 2010). By contrast, I show when relative measures are just as good as absolute ones for this purpose. Still, I demonstrate that relative measures do not provide decision-relevant information that cannot be provided by absolute measures, including in choice scenarios Stegenga’s work fails to consider. This finding questions the need for using relative measures to inform rational decision making, and challenges suggestions to report both absolute and relative measures (see Hoefer and Krauss 2021). Moreover, in biomedical research, most studies report only relative effect sizes like in the Heart Protection Study (Elliott et al. 2021). My results suggest that this practice could fail to inform treatment choices. Finally, I show that absolute and baseline risks provide sufficient information for rational treatment decisions. This verdict sets the ground for comparing outcome measures to absolute and baseline risks as tools for informing decisions.

I proceed as follows: In section 2, I introduce absolute and relative outcome measures. In section 3, I model two choice scenarios using expected utility theory, one involving outcome measures from a single trial and another involving outcome measures from distinct trials. Moreover, I identify absolute and baseline risks as sufficient for informing rational treatment decisions. In section 4, I use the decision models to identify the conditions under which absolute or relative measures inform decisions. As established by Sprenger and Stegenga (2017), absolute measures but not relative ones always do so for choices between a treatment and a control group treatment. I show that this argument does not hold for choices between treatments tested in different trials, a distinction Sprenger and Stegenga (2017) neglect. I then show that absolute measures are still at least as good as, if not better than, relative ones for informing such choices. Overall, absolute measures dominate relative ones. In section 5, I conclude with three options for using outcome measures to inform decisions supported by my analysis.

2. Measuring effect sizes for binary variables

In empirical research conducting trials, researchers use outcome measures to state how the measured values of the outcome variable in the control and treatment groups relate (Stegenga 2015). In such a way, outcome measures summarize trial data to form evidence for a causal relationship between treatment and outcome.

In this article, I focus on outcome measures for binary variables. These are usually defined in terms of the observed frequencies in a trial (see Stewart 2016, ch. 26), which can be represented as conditional probabilities. Let A denote the tested treatment and A′ the control group treatment. E denotes that the outcome of interest is present and ¬E that it is absent. For ease of exposition, I will throughout focus on two of the most commonly cited outcome measures.² These are:

$$\hskip-25pt{\rm{Relative \;risk}}:{\rm{R}}{{\rm{R}}_{\rm{A}}} = {{{\rm{{P}}}\left( {{\rm{E|A}}} \right)} \over {{\rm{P}}\left( {{\rm{E|A'}}} \right)}}$$

$${\rm{Risk \;difference}}:{\rm{R}}{{\rm{D}}_{\rm{A}}} = {\rm{P}}\left( {{\rm{E|A}}} \right) - {\rm{P}}\left( {{\rm{E|A'}}} \right)$$

The relative risk belongs to a class of measures commonly called relative outcome measures. For instance, ${\rm{R}}{{\rm{R}}_{\rm{A}}} = 1.25$ means that the probability of E given A is 1.25 times the probability of E given A′. By contrast, the risk difference is usually classified as an absolute outcome measure. To give an example, ${\rm{R}}{{\rm{D}}_{\rm{A}}} = 0.05$ means that the probability of E is 5 percentage points higher given A than given A′, for instance 15% rather than 10%.
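The two definitions can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen for this illustration only; the inputs are the 10% and 15% figures from the example above. For these figures the risk difference is 0.05 while the relative risk is 1.5, so the two measures describe the same data quite differently.

```python
def relative_risk(p_treat: float, p_control: float) -> float:
    """RR_A = P(E|A) / P(E|A'); undefined when P(E|A') is zero."""
    if p_control == 0:
        raise ValueError("relative risk is undefined when P(E|A') = 0")
    return p_treat / p_control


def risk_difference(p_treat: float, p_control: float) -> float:
    """RD_A = P(E|A) - P(E|A')."""
    return p_treat - p_control


# Example from the text: P(E|A') = 0.10 and P(E|A) = 0.15.
p_control, p_treat = 0.10, 0.15
rd = risk_difference(p_treat, p_control)
rr = relative_risk(p_treat, p_control)
print(f"RD_A = {rd:.2f}, RR_A = {rr:.2f}")
```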

Binary outcome measures in the first instance measure the strength of a statistical association, a statistical effect size. If computed numerically, they are often additionally interpreted as measuring the causal effect size of the tested treatment (Broadbent 2013; Sprenger and Stegenga 2017). Here, I will not discuss which, if any, outcome measure measures a treatment’s causal effect size or any other causal property as the following arguments do not depend on an answer to this question. Instead, I will speak loosely about outcome measures as measuring a treatment’s effect size.³

Quite obviously, relative and absolute outcome measures do not provide the same probabilistic information. Which outcome measures then provide the information we need for rational choices between treatments? In the following sections, I argue that absolute measures are at least as good as relative ones for informing rational decision making across choice scenarios.

3. Modeling two scenarios for choosing treatments

Here are two scenarios for choosing treatments:

Single: Imagine treatment A was tested in a trial. A′ is the control group treatment used in the trial testing A, for instance, no treatment, a competitor treatment or a placebo. Based on the reported outcome measures, an agent wants to choose between A and A′, for example, for consuming either treatment or giving it to a patient.

Distinct: Imagine treatment A was tested in a trial. Moreover, an alternative treatment B was tested in another trial. Based on the reported outcome measures, an agent wants to choose between A and B, for example, for consuming either treatment or giving it to a patient.

The two scenarios involve different outcome measures. In the choice between A and A′, we consider a relative or an absolute outcome measure from the trial testing A (see section 2). By contrast, in the choice between A and B, we consider the outcome measures from the trial testing A and from the trial testing B. Let B′ denote the control group treatment in the trial testing B. The outcome measures for the trial testing B are:

$${\rm{R}}{{\rm{D}}_{\rm{B}}} = {\rm{P}}\left( {{\rm{E|B}}} \right) - {\rm{P}}\left( {{\rm{E|B'}}} \right)$$

$$\hskip-35pt{\rm{R}}{{\rm{R}}_{\rm{B}}} = {{{\rm{P}}\left( {{\rm{E|B}}} \right)} \over {{\rm{P}}\left( {{\rm{E|B'}}} \right)}}$$

To distinguish both scenarios, I assume ${\rm{A'}} \ne {\rm{B}}$ and ${\rm{B'}} \ne {\rm{A}}$. It may be that ${\rm{A'}} = {\rm{B'}}$, a case that I turn to in section 4.2. In this article, I focus on the two described scenarios because they are the most common. The other possible scenarios for a binary treatment choice are a choice between control group treatments from distinct trials and one between a treatment tested in one trial and a control group treatment from another trial. Both scenarios are analogous to a choice between treatments tested in different trials. In analogous ways, the following arguments also hold for these cases.

How should an agent decide in the two described scenarios? In line with standard expected utility theory (see Bradley 2017, pt. 1), I assume that an agent ought to choose a treatment that maximizes expected utility:

Expected utility maximization: Treatment X is better than treatment Y iff ${\rm{EU}}\left( {\rm{X}} \right) \gt {\rm{EU}}\left( {\rm{Y}} \right)$.⁴

The relevant expected utility calculations are different in the two scenarios. I start with the choice between A and A′, following Sprenger and Stegenga (2017, 845–48).

Let u(E) = u denote the utility of E and u(¬E) = u′ the utility of ¬E. Here, utility represents the agent’s evaluation of each possible outcome. Moreover, we assume that consuming treatments comes at a cost. Broadly construed, such a cost includes all expected harmful effects of consuming the treatment, for instance, negative side effects. Let a denote the cost of A and a′ the cost of A′. The expected utilities of A and A′ are calculated as follows:

(1)

$${\rm{EU}}\left( {\rm{A}} \right) = {\rm{P}}\left( {{\rm{E|A}}} \right){\rm{u}} + {\rm{P}}\left( {\neg {\rm{E|A}}} \right){\rm{u'\;}} - {\rm{a}} = {\rm{\;P}}\left( {{\rm{E|A}}} \right){\rm{\;}}\left( {{\rm{u}} - {\rm{u'\;}}} \right) + {\rm{u'\;}} - {\rm{a}}$$

(2)

$${\rm{EU}}\left( {{\rm{A'}}} \right) = {\rm{P}}\left( {{\rm{E|A'}}} \right){\rm{u}} + {\rm{P}}\left( {\neg {\rm{E|A'}}} \right){\rm{u'\;}} - {\rm{a'\;}} = {\rm{\;P}}\left( {{\rm{E|A'}}} \right){\rm{\;}}\left( {{\rm{u}} - {\rm{u'\;}}} \right) + {\rm{u'\;}} - {\rm{a'}}$$

(1) and (2) jointly with expected utility maximization provide a decision model:

(3)

$${\rm{EU}}\left( {\rm{A}} \right) \gt {\rm{EU}}\left( {{\rm{A'}}} \right){\rm{\;iff\;P}}\left( {{\rm{E|A}}} \right){\rm{\;}}\left( {{\rm{u}} - {\rm{u'\;}}} \right) - {\rm{a}} \gt \;{\rm{P}}\left( {{\rm{E|A'}}} \right){\rm{\;}}\left( {{\rm{u}} - {\rm{u'\;}}} \right) - {\rm{a'\;}}$$
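Equations (1) to (3) translate directly into a short Python sketch. The function names and all numerical inputs below are hypothetical and serve only to illustrate the model; they are not taken from any trial.

```python
def expected_utility(p_e: float, u: float, u_prime: float, cost: float) -> float:
    """Equations (1)/(2): P(E|X) * u + P(not-E|X) * u' - cost of X."""
    return p_e * u + (1.0 - p_e) * u_prime - cost


def prefers_treatment(p_e_a: float, p_e_ctrl: float, u: float, u_prime: float,
                      cost_a: float, cost_ctrl: float) -> bool:
    """Decision model (3): True iff EU(A) > EU(A')."""
    eu_a = expected_utility(p_e_a, u, u_prime, cost_a)
    eu_ctrl = expected_utility(p_e_ctrl, u, u_prime, cost_ctrl)
    return eu_a > eu_ctrl


# Hypothetical inputs: E is the desired outcome (u > u'), A raises its
# probability from 0.6 to 0.8, and only A comes at a cost.
cheap = prefers_treatment(0.8, 0.6, u=1.0, u_prime=0.0, cost_a=0.1, cost_ctrl=0.0)
costly = prefers_treatment(0.8, 0.6, u=1.0, u_prime=0.0, cost_a=0.3, cost_ctrl=0.0)
print(cheap, costly)
```

With the same probabilities, the verdict changes once the assumed cost of A exceeds the utility gained from the higher probability of E, which is why the model needs costs and utilities in addition to probabilities.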

For the choice involving distinct trials, we use (1) together with the expected utility of B, where b denotes the cost of B:

(4)

$${\rm{EU}}\left( {\rm{B}} \right) = {\rm{P}}\left( {{\rm{E|B}}} \right){\rm{u}} + {\rm{P}}\left( {\neg {\rm{E|B}}} \right){\rm{u'}} - {\rm{b}} = {\rm{\;P}}\left( {{\rm{E|B}}} \right){\rm{\;}}\left( {{\rm{u}} - {\rm{u'}}} \right) + {\rm{u'}} - {\rm{b}}$$

Again, we derive a decision model from (1) and (4):

(5)

$${\rm{EU}}\left( {\rm{A}} \right) \gt {\rm{EU}}\left( {\rm{B}} \right){\rm{\;iff\;P}}\left( {{\rm{E|A}}} \right){\rm{\;}}\left( {{\rm{u}} - {\rm{u'}}} \right) - {\rm{a}} \gt \;{\rm{P}}\left( {{\rm{E|B}}} \right){\rm{\;}}\left( {{\rm{u}} - {\rm{u'}}} \right) - {\rm{b}}$$

In section 4, I will rely on both models to analyze when absolute and relative outcome measures provide decision-relevant information. To do so, I will assume that the observed frequencies used to calculate outcome measures are numerically equivalent to the agent’s decision-relevant credences. For instance, in (3), I take ${\rm{P}}\left( {{\rm{E|A}}} \right)$ to denote both the agent’s credence in E occurring if she takes treatment A and the observed frequency of E given A in the trial testing A. This is a substantial simplification; there are several inferences involved in forming a credence on a treatment’s effect size for a treatment choice based on one calculated from trial frequencies (Fuller and Flores 2015). Indeed, analyzing how outcome measures figure in such inferences is an important task I bracket. Nevertheless, this omission poses no threat to my argumentation. If reported outcome measures are to inform rational decision making at all, then they must in principle provide the needed probabilistic information. The mentioned simplification allows us to analyze when absolute or relative measures succeed in doing so.

The decision models (3) and (5) already show an alternative to using outcome measures for informing rational treatment choices: use absolute and baseline risks. As we can see in (3), in a choice involving treatments from a single trial, we can decide between treatments if we know ${\rm{P}}\left( {{\rm{E|A}}} \right)$ and ${\rm{P}}\left( {{\rm{E|A'}}} \right)$ in addition to costs and utilities.⁵ Moreover, as we can see in (5), in a choice involving treatments from distinct trials, we can decide between treatments if we know ${\rm{P}}\left( {{\rm{E|A}}} \right)$ and ${\rm{P}}\left( {{\rm{E|B}}} \right)$ in addition to costs and utilities. These conditional probabilities are usually called absolute risks when referring to the treatment group, that is ${\rm{P}}\left( {{\rm{E|A}}} \right)$ and ${\rm{P}}\left( {{\rm{E|B}}} \right)$ (Stewart 2016, ch. 26). When referring to the control group, that is ${\rm{P}}\left( {{\rm{E|A'}}} \right)$ and ${\rm{P}}\left( {{\rm{E|B'}}} \right)$, I will call them baseline risks. Absolute and baseline risks always provide sufficient probabilistic information for rational treatment decisions.⁶

Still, in biomedical research, most studies assessing the effectiveness of treatments report effect sizes to inform treatment choices, rather than solely the absolute and baseline risks. Hence, it is important to analyze when this practice successfully informs decisions. Moreover, this practice could be warranted. Even if absolute and baseline risks provide sufficient information for rational choices, compared to outcome measures, they could have other disadvantages for informing treatment decisions. For instance, effect sizes might extrapolate better to target populations or individual agents than absolute and baseline risks. Or laypeople might reason better with effect sizes than with absolute and baseline risks. Assessing such considerations is beyond the scope of this article. Nonetheless, they motivate taking the practice of using effect sizes to inform decisions seriously. To do so, I will henceforth assume that the deciding agents do not know the absolute risks and, unless noted otherwise, they do not know the baseline risks either. These assumptions allow us to identify the conditions under which different outcome measures provide us with decision-relevant information.

4. The decision-theoretic dominance of absolute measures

Under what conditions do absolute or relative outcome measures provide sufficient probabilistic information to choose rationally between treatments? In choices involving outcome measures from a single trial, only absolute measures always do so (section 4.1). This is established by Sprenger and Stegenga (2017). In choices involving outcome measures from distinct trials, depending on our knowledge of baseline risks, both absolute and relative measures provide the decision-relevant information, neither do, or only absolute ones always do (section 4.2). Overall, absolute measures dominate relative ones.

4.1. Choices involving outcome measures from a single trial

Sprenger and Stegenga (2017) use the decision model (3) to argue that absolute measures but not relative ones always provide sufficient information to decide between treatments given costs and utilities. These authors fail to distinguish a choice involving outcome measures from a single trial from one involving outcome measures from distinct trials. As I will show in section 4.2, this failure poses a problem for applying their argument to the latter case. However, the authors’ argument still applies to the former case. To see this, I briefly recap their argument (Sprenger and Stegenga 2017, 845–48).

From (3), and assuming without loss of generality ${\rm{u}} \gt {\rm{u'}}$, one can derive

(6)

$$\;{\rm{EU}}\left( {\rm{A}} \right) \gt {\rm{EU}}\left( {{\rm{A'}}} \right)\;{\rm{iff}}\;{\rm{P}}\left( {{\rm{E|A}}} \right) - {\rm{P}}\left( {{\rm{E|A'}}} \right) \gt {{{\rm{a}} - {\rm{a'}}} \over {{\rm{u}} - {\rm{u'}}}}$$
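For completeness, the elementary algebra behind (6) can be sketched as follows, using only the symbols already defined. The right-hand side of (3) states

$${\rm{P}}\left( {{\rm{E|A}}} \right)\left( {{\rm{u}} - {\rm{u'}}} \right) - {\rm{a}} \gt {\rm{P}}\left( {{\rm{E|A'}}} \right)\left( {{\rm{u}} - {\rm{u'}}} \right) - {\rm{a'}}$$

Collecting the probability terms on the left and the cost terms on the right gives

$$\left( {{\rm{P}}\left( {{\rm{E|A}}} \right) - {\rm{P}}\left( {{\rm{E|A'}}} \right)} \right)\left( {{\rm{u}} - {\rm{u'}}} \right) \gt {\rm{a}} - {\rm{a'}}$$

Since ${\rm{u}} - {\rm{u'}} \gt 0$ by assumption, dividing both sides by ${\rm{u}} - {\rm{u'}}$ preserves the direction of the inequality and yields the condition in (6).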

${\rm{P}}\left( {{\rm{E|A}}} \right)-{\rm{P}}\left( {{\rm{E|A'}}} \right)$ in (6) is equivalent to RD_A. As a result, given costs, utilities, and RD_A one always knows whether ${\rm{EU}}\left( {\rm{A}} \right) \gt {\rm{EU}}\left( {{\rm{A'}}} \right)$ . The same does not hold for relative measures. To see this, we can derive

(7)

$${\rm{P}}\left( {{\rm{E|A}}} \right) - {\rm{P}}\left( {{\rm{E|A'}}} \right) = {\rm{P}}\left( {{\rm{E|A'}}} \right)\;\left( {{{{\rm{P}}\left( {{\rm{E|A}}} \right)} \over {{\rm{P}}\left( {{\rm{E|A'}}} \right)}} - 1} \right) = \;{\rm{P}}\left( {{\rm{E|A'}}} \right)\;\left( {{\rm{R}}{{\rm{R}}_{\rm{A}}} - 1} \right)$$

From (6) and (7), we get

(8)

$${\rm{EU}}\left( {\rm{A}} \right) \gt {\rm{EU}}\left( {{\rm{A'}}} \right)\;{\rm{iff}}\;{\rm{P}}\left( {{\rm{E|A'}}} \right)\;\left( {{\rm{R}}{{\rm{R}}_{\rm{A}}} - 1} \right) \gt {{{\rm{a}} - {\rm{a'}}} \over {{\rm{u}} - {\rm{u'}}}}$$

As Sprenger and Stegenga note, costs and utilities do not determine ${\rm{P}}\left( {{\rm{E|A'}}} \right)$. Nor does a given ${\rm{R}}{{\rm{R}}_{\rm{A}}}$. As a result, assuming ${\rm{a}} \ne {\rm{a'}}$ as the authors do, one cannot always decide whether ${\rm{EU}}\left( {\rm{A}} \right) \gt {\rm{EU}}\left( {{\rm{A'}}} \right)$ given costs, utilities, and ${\rm{R}}{{\rm{R}}_{\rm{A}}}$. ${\rm{P}}\left( {{\rm{E|A'}}} \right)$ could be such that ${\rm{EU}}\left( {\rm{A}} \right) \gt {\rm{EU}}\left( {{\rm{A'}}} \right)$ or such that ${\rm{EU}}\left( {\rm{A}} \right) \lt {\rm{EU}}\left( {{\rm{A'}}} \right)$. Suppose ${\rm{R}}{{\rm{R}}_{\rm{A}}} = 1.25$ and ${{{\rm{a}} - {\rm{a'}}} \over {{\rm{u}} - {\rm{u'}}}} = 0.1$. Then, ${\rm{EU}}\left( {\rm{A}} \right) \gt {\rm{EU}}\left( {{\rm{A'}}} \right)$ if ${\rm{P}}\left( {{\rm{E|A'}}} \right) = 0.5$, since $0.5 \times 0.25 = 0.125 \gt 0.1$, but ${\rm{EU}}\left( {\rm{A}} \right) \lt {\rm{EU}}\left( {{\rm{A'}}} \right)$ if ${\rm{P}}\left( {{\rm{E|A'}}} \right) = 0.3$, since $0.3 \times 0.25 = 0.075 \lt 0.1$. Both baseline risks are compatible with but unknown given these costs, utilities, and ${\rm{R}}{{\rm{R}}_{\rm{A}}}$. Hence, in contrast to absolute measures, given costs, utilities, and RR_A one cannot always decide whether ${\rm{EU}}\left( {\rm{A}} \right) \gt {\rm{EU}}\left( {{\rm{A'}}} \right)$.⁷
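The numerical example can be checked with a small Python sketch of conditions (6) and (8). The function names are hypothetical; the values of RR_A, the cost-utility ratio, and the two candidate baseline risks are those used in the text.

```python
def favours_a_given_rd(rd_a: float, threshold: float) -> bool:
    """Condition (6): RD_A > (a - a') / (u - u'), assuming u > u'."""
    return rd_a > threshold


def favours_a_given_rr(rr_a: float, baseline: float, threshold: float) -> bool:
    """Condition (8): P(E|A') * (RR_A - 1) > (a - a') / (u - u')."""
    return baseline * (rr_a - 1.0) > threshold


rr_a = 1.25       # relative risk from the example
threshold = 0.1   # (a - a') / (u - u') from the example

# The same RR_A yields opposite verdicts for the two baseline risks.
verdicts = {
    baseline: favours_a_given_rr(rr_a, baseline, threshold)
    for baseline in (0.5, 0.3)
}
for baseline, favours_a in verdicts.items():
    print(f"P(E|A') = {baseline}: EU(A) > EU(A') is {favours_a}")
```

By contrast, `favours_a_given_rd` takes no baseline argument at all, which mirrors the point that the risk difference settles the comparison once costs and utilities are given.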

Moving beyond Sprenger and Stegenga (2017), it is worth noting that relative measures provide sufficient information for choosing if both treatments come at equal costs. From (8) and assuming ${\rm{a}} = {\rm{a'}}$ we can derive

(9)

$${\rm{EU}}\left( {\rm{A}} \right) \gt {\rm{EU}}\left( {{\rm{A'}}} \right){\rm{\;iff\;P}}\left( {{\rm{E|A'}}} \right)\left( {{\rm{R}}{{\rm{R}}_{\rm{A}}} - 1} \right) \gt 0$$

If ${\rm{R}}{{\rm{R}}_{\rm{A}}} \lt 1$, then (9) recommends choosing A′. This is because a well-defined ${\rm{R}}{{\rm{R}}_{\rm{A}}} \lt 1$ implies that ${\rm{P}}\left( {{\rm{E|A'}}} \right) \gt 0$, and thus that ${\rm{P}}\left( {{\rm{E|A'}}} \right)\left( {{\rm{R}}{{\rm{R}}_{\rm{A}}} - 1} \right) \lt 0$. Analogously, if ${\rm{R}}{{\rm{R}}_{\rm{A}}} \gt 1$, (9) recommends choosing A. In other words, by knowing RR_A, utilities, and equality of costs we can always settle which treatment to take. However, we are rarely if ever in a situation in which treatments come at equal costs. Thus, I will henceforth not mention this case.

To summarize, Sprenger and Stegenga (2017) establish that absolute but not relative measures always provide sufficient information to choose between treatments from a single trial. In the next section, I show that this argument fails to apply to choices between treatments tested in distinct trials. I then argue that absolute measures still dominate relative ones for informing such choices.

4.2. Choices involving outcome measures from distinct trials

We cannot apply Sprenger and Stegenga’s (2017) argument to the decision model for a choice between treatments tested in distinct trials (5). To see this, note that from (5) and assuming without loss of generality u > u′ one can derive

(10)

$${\rm{EU}}\left( {\rm{A}} \right) \gt {\rm{EU}}\left( {\rm{B}} \right)\;{\rm{iff}}\;{\rm{P}}\left( {{\rm{E|A}}} \right) - {\rm{P}}\left( {{\rm{E|B}}} \right) \gt {{{\rm{a}} - {\rm{b}}} \over {{\rm{u}} - {\rm{u'}}}}$$

In (10), we cannot interpret the decision-relevant difference in probabilities as RD_A or as RD_B, as we have done in (6). The same holds for the case of RR. This can be seen by noting that

(11)

$${\rm{P}}\left( {{\rm{E|A}}} \right) - {\rm{P}}\left( {{\rm{E|B}}} \right) = {\rm{P}}\left( {{\rm{E|B}}} \right)\;\left( {{{{\rm{P}}\left( {{\rm{E|A}}} \right)} \over {{\rm{P}}\left( {{\rm{E|B}}} \right)}} - 1} \right)$$

${{{\rm{P}}\left( {{\rm{E|A}}} \right)} \over {{\rm{P}}\left( {{\rm{E|B}}} \right)}}$ in (11) is neither equal to RR_A nor RR_B, contrary to the previous case (7). Hence, we cannot rely on Sprenger and Stegenga’s (2017) argument here.

As can be seen in (10), for choices between treatments from distinct trials, we need information about the difference in absolute risks, that is the difference between ${\rm{P}}\left( {{\rm{E|A}}} \right)$ and ${\rm{P}}\left( {{\rm{E|B}}} \right)$. Correspondingly, these absolute risks suffice to inform such choices. This role of absolute risks is obscured in Sprenger and Stegenga (2017) because they fail to distinguish a choice involving several trials from one involving a single trial.

Still, as discussed in section 3, I acknowledge the practice of reporting outcome measures. Hence, I assume that the deciding agents choose between A and B without knowing the absolute risks, but rather only the absolute or relative outcome measures. This assumption allows us to see when absolute or relative outcome measures can still inform choices, in the sense of providing information about the decision-relevant difference in absolute risks. To answer this question, let us distinguish three epistemic situations we could be in when deciding between treatments from distinct trials, differing in how much we know about the baseline risks:

Ignorance: We know nothing about ${\rm{P}}\left( {{\rm{E|A'}}} \right)$ and ${\rm{P}}\left( {{\rm{E|B'}}} \right)$ .

Full knowledge: We know ${\rm{P}}\left( {{\rm{E|A'}}} \right)$ and ${\rm{P}}\left( {{\rm{E|B'}}} \right)$ .

Partial knowledge: We know that ${\rm{P}}\left( {{\rm{E|A'}}} \right) = {\rm{\;P}}\left( {{\rm{E|B'}}} \right)$ , though we know neither ${\rm{P}}\left( {{\rm{E|A'}}} \right)$ nor ${\rm{P}}\left( {{\rm{E|B'}}} \right)$ .

Let us examine each case in turn.

4.2.1. Ignorance

In the case of ignorance about baseline risks, knowing absolute or relative measures is insufficient to have any information about the difference between ${\rm{P}}\left( {{\rm{E|A}}} \right)$ and ${\rm{P}}\left( {{\rm{E|B}}} \right)$ . This is because a given absolute or a given relative measure is compatible with a range of values for ${\rm{P}}\left( {{\rm{E|A}}} \right)$ and ${\rm{P}}\left( {{\rm{E|B}}} \right)$ , absent any information about ${\rm{P}}\left( {{\rm{E|A'}}} \right)$ and ${\rm{P}}\left( {{\rm{E|B'}}} \right)$ . This result shows that unless we know something about baseline risks outcome measures from distinct trials do not provide the information needed for choosing between the tested treatments.
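A brief Python sketch can make the ignorance case concrete. The function name and all numbers are hypothetical illustrations: the same pair of risk differences is combined with two different assignments of baseline risks, and the sign of the decision-relevant difference in absolute risks flips.

```python
def absolute_risk_gap_from_rd(rd_a: float, rd_b: float,
                              baseline_a: float, baseline_b: float) -> float:
    """P(E|A) - P(E|B), rebuilt from risk differences and baseline risks."""
    p_e_a = baseline_a + rd_a   # P(E|A) = P(E|A') + RD_A
    p_e_b = baseline_b + rd_b   # P(E|B) = P(E|B') + RD_B
    return p_e_a - p_e_b


# Hypothetical reported measures from two distinct trials.
rd_a, rd_b = 0.05, 0.10

# Two baseline assignments that are both compatible with these measures.
gap_1 = absolute_risk_gap_from_rd(rd_a, rd_b, baseline_a=0.30, baseline_b=0.10)
gap_2 = absolute_risk_gap_from_rd(rd_a, rd_b, baseline_a=0.10, baseline_b=0.30)
print(f"gap under first assignment:  {gap_1:+.2f}")
print(f"gap under second assignment: {gap_2:+.2f}")
```

An analogous construction works for relative risks, with multiplication by the baseline risk in place of addition.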

4.2.2. Full knowledge

When knowing the baseline risks, absolute or relative measures can both be used to calculate the absolute risks, and thus their difference. If one knows RD_A, RD_B, ${\rm{P}}\left( {{\rm{E|A'}}} \right),$ and ${\rm{P}}\left( {{\rm{E|B'}}} \right)$ then one knows ${\rm{P}}\left( {{\rm{E|A}}} \right)$ and ${\rm{P}}\left( {{\rm{E|B}}} \right)$ . If one knows RR_A, RR_B, ${\rm{P}}\left( {{\rm{E|A'}}} \right),$ and ${\rm{P}}\left( {{\rm{E|B'}}} \right)$ then one knows ${\rm{P}}\left( {{\rm{E|A}}} \right)$ and ${\rm{P}}\left( {{\rm{E|B}}} \right)$ . This result shows that relative measures can sometimes provide equally valuable information for decisions as absolute ones.
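The recovery of absolute risks under full knowledge can be sketched in Python as follows. The helper names and the trial figures are hypothetical; the figures are constructed so that the risk differences and relative risks describe the same underlying probabilities.

```python
def absolute_risk_from_rd(rd: float, baseline: float) -> float:
    """P(E|X) = P(E|X') + RD_X."""
    return baseline + rd


def absolute_risk_from_rr(rr: float, baseline: float) -> float:
    """P(E|X) = P(E|X') * RR_X."""
    return baseline * rr


# Hypothetical known baseline risks for the two trials.
baseline_a, baseline_b = 0.40, 0.20

# Hypothetical reported measures: both describe P(E|A) = 0.50, P(E|B) = 0.30.
rd_a, rr_a = 0.10, 1.25
rd_b, rr_b = 0.10, 1.50

gap_from_rd = (absolute_risk_from_rd(rd_a, baseline_a)
               - absolute_risk_from_rd(rd_b, baseline_b))
gap_from_rr = (absolute_risk_from_rr(rr_a, baseline_a)
               - absolute_risk_from_rr(rr_b, baseline_b))
print(f"{gap_from_rd:.2f} {gap_from_rr:.2f}")
```

Either route delivers the difference in absolute risks that condition (10) requires, so with known baseline risks neither class of measure has an informational advantage.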

4.2.3. Partial knowledge

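In the case of partial knowledge about baseline risks, absolute measures but not relative ones always provide sufficient information to choose between treatments. Given ${\rm{P}}\left( {{\rm{E|A'}}} \right) = {\rm{P}}\left( {{\rm{E|B'}}} \right)$, the two baseline risks cancel, and we can derive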

(12)

$${\rm{P}}\left( {{\rm{E|A}}} \right) - {\rm{P}}\left( {{\rm{E|B}}} \right) = \left( {{\rm{P}}\left( {{\rm{E|A}}} \right) - {\rm{P}}\left( {{\rm{E|A'}}} \right)} \right) - \left( {{\rm{P}}\left( {{\rm{E|B}}} \right) - {\rm{P}}\left( {{\rm{E|B'}}} \right)} \right) = {\rm{R}}{{\rm{D}}_{\rm{A}}} - {\rm{\;R}}{{\rm{D}}_{\rm{B}}}$$

As can be seen in (12), under partial knowledge of baseline risks, absolute measures always provide the difference in absolute risks that is sufficient to decide between A and B. Moreover, the same does not hold for relative measures. Given ${\rm{P}}\left( {{\rm{E|A'}}} \right) = {\rm{\;P}}\left( {{\rm{E|B'}}} \right)$ we get

(13)

$${\rm{P}}\left( {{\rm{E|A}}} \right) - {\rm{P}}\left( {{\rm{E|B}}} \right) = {{{\rm{P}}\left( {{\rm{E|A}}} \right){\rm{P}}\left( {{\rm{E|A'}}} \right)} \over {{\rm{P}}\left( {{\rm{E|A'}}} \right)}} - \;{{{\rm{P}}\left( {{\rm{E|B}}} \right){\rm{P}}\left( {{\rm{E|B'}}} \right)} \over {{\rm{P}}\left( {{\rm{E|B'}}} \right)}} = \;{\rm{P}}\left( {{\rm{E|A'}}} \right)\;\left( {\;{\rm{R}}{{\rm{R}}_{\rm{A}}} - \;{\rm{R}}{{\rm{R}}_{\rm{B}}}} \right)$$

From (13) and (10), we can derive

(14)

$${\rm{EU}}\left( {\rm{A}} \right) \gt {\rm{EU}}\left( {\rm{B}} \right)\;{\rm{iff}}\;{\rm{P}}\left( {{\rm{E|A'}}} \right)\;\left( {{\rm{R}}{{\rm{R}}_{\rm{A}}} - \;{\rm{R}}{{\rm{R}}_{\rm{B}}}} \right) \gt {{{\rm{a}} - {\rm{b}}} \over {{\rm{u}} - {\rm{u'}}}}$$

Just as in the case of a choice involving a single trial, neither costs, utilities, nor RR_A and RR_B determine ${\rm{P}}\left( {{\rm{E|A'}}} \right)$. As a result, assuming ${\rm{a}} \ne {\rm{b}}$, knowing costs, utilities, RR_A, and RR_B does not always provide us with sufficient information to decide between A and B. For example, suppose ${\rm{R}}{{\rm{R}}_{\rm{A}}} = 1.25$, ${\rm{R}}{{\rm{R}}_{\rm{B}}} = 1.02$, and ${{{\rm{a}} - {\rm{b}}} \over {{\rm{u}} - {\rm{u'}}}} = 0.15$. Then, ${\rm{EU}}\left( {\rm{A}} \right) \gt {\rm{EU}}\left( {\rm{B}} \right)$ if ${\rm{P}}\left( {{\rm{E|A'}}} \right) = {\rm{P}}\left( {{\rm{E|B'}}} \right) = 0.7$, since $0.7 \times 0.23 = 0.161 \gt 0.15$, but ${\rm{EU}}\left( {\rm{A}} \right) \lt {\rm{EU}}\left( {\rm{B}} \right)$ if ${\rm{P}}\left( {{\rm{E|A'}}} \right) = {\rm{P}}\left( {{\rm{E|B'}}} \right) = 0.5$, since $0.5 \times 0.23 = 0.115 \lt 0.15$. Both baseline risks are compatible with but unknown given costs, utilities, RR_A, and RR_B. We cannot decide whether A or B maximizes expected utility here. Relative measures do not always suffice to decide between treatments under partial knowledge of baseline risks.
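The partial-knowledge example can likewise be reproduced in a short Python sketch. The function name is hypothetical; the relative risks, the cost-utility ratio, and the two candidate shared baseline risks are the ones used in the text. The second loop checks identity (12) for both candidates.

```python
def favours_a_given_rrs(rr_a: float, rr_b: float,
                        shared_baseline: float, threshold: float) -> bool:
    """Condition (14): P(E|A') * (RR_A - RR_B) > (a - b) / (u - u')."""
    return shared_baseline * (rr_a - rr_b) > threshold


rr_a, rr_b = 1.25, 1.02
threshold = 0.15   # (a - b) / (u - u') from the example

# Same relative risks, opposite verdicts for the two shared baseline risks.
verdicts = {
    baseline: favours_a_given_rrs(rr_a, rr_b, baseline, threshold)
    for baseline in (0.7, 0.5)
}
print(verdicts)

# Identity (12): with equal baselines, RD_A - RD_B equals P(E|A) - P(E|B).
rd_gap_matches = []
for baseline in (0.7, 0.5):
    p_e_a, p_e_b = baseline * rr_a, baseline * rr_b
    rd_a, rd_b = p_e_a - baseline, p_e_b - baseline
    rd_gap_matches.append(abs((rd_a - rd_b) - (p_e_a - p_e_b)) < 1e-12)
print(rd_gap_matches)
```

The contrast is that the verdict from relative risks depends on the unknown shared baseline, whereas the difference of the two risk differences can be compared with the threshold directly.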

It is worth noting that relative measures are apt to inform choices given equal costs of treatments and partial knowledge of baseline risks. Using (14) and assuming ${\rm{a}} = {\rm{b\;}}$ we get:

(15)

$${\rm{EU}}\left( {\rm{A}} \right) \gt {\rm{EU}}\left( {\rm{B}} \right){\rm{\;iff\;P}}\left( {{\rm{E|A'}}} \right){\rm{\;}}\left( {{\rm{R}}{{\rm{R}}_{\rm{A}}} - {\rm{\;R}}{{\rm{R}}_{\rm{B}}}} \right) \gt 0$$

We know that ${\rm{P}}\left( {{\rm{E|A'}}} \right)$ = ${\rm{P}}\left( {{\rm{E|B'}}} \right) \gt 0$ if RR_A and RR_B are well defined. Thus, if we know RR_A and RR_B we know whether ${\rm{P}}\left( {{\rm{E|A'}}} \right){\rm{\;}}\left( {{\rm{R}}{{\rm{R}}_{\rm{A}}} - {\rm{\;R}}{{\rm{R}}_{\rm{B}}}} \right) \gt 0$ and therefore whether ${\rm{EU}}\left( {\rm{A}} \right){\rm{\;}} \gt \;{\rm{EU}}\left({\rm{B}} \right)$ . Again, I will henceforth bracket this unusual case. Overall, in the case of partial knowledge, absolute measures but not relative ones always provide sufficient probabilistic information to choose between A and B.

These results can also contribute to debates on using placebos versus active comparators, that is, treatments already in use, as control group treatments in trials. Reviews suggest that placebo-controlled studies or studies with no treatment for the control group are more common than ones using active comparators (Hochman and McCormick 2010; Cipriani et al. 2020). Yet, the use of placebos is often criticized for ethical reasons: if an effective treatment exists, giving a placebo to a trial participant implies withholding this treatment from her (Emanuel and Miller 2001; European Medicines Agency 2001). Indeed, research guidelines only allow using placebos under specific conditions (World Medical Association 2013). Moreover, researchers demand more active comparator trials on the grounds that they establish the comparative effectiveness of treatments, which is what matters for decision making (Cipriani et al. 2020; Naci et al. 2020).

The preceding results add a decision-theoretic nuance to this debate. Unless decision makers know baseline risks, comparing treatments between trials using absolute measures requires establishing equality of baseline risks. Researchers could ensure good grounds for such equality by using the same control group treatment across trials. On the one hand, researchers asking for more active comparator trials then ought to recognize the importance of using the same active comparator across trials to inform choices between treatments tested in these trials. On the other hand, researchers could also use the same placebo across studies to establish equality of baseline risks. Note that this decision-theoretic nuance only applies to the debate on control group treatments if absolute risks are not reported in addition to or instead of outcome measures. If they are, then no matter how the trials are designed, we can compare the absolute risks for deciding between treatments. This insensitivity to trial design could be an advantage of using absolute risks for informing decisions between treatments tested in different trials.

5. Conclusion

Let us return to our example of the Heart Protection Study: How should we measure the effect size of simvastatin to inform a rational choice on taking this drug? I have argued that absolute measures are at least as good as, if not better than, relative ones for informing such a decision across different scenarios. When choosing between a treatment and the corresponding control group treatment, absolute measures but not relative ones always provide sufficient information for choosing rationally. When choosing between treatments tested in distinct trials, we need information about the difference in absolute risks. Absent some knowledge about the relevant baseline risks, neither an absolute nor a relative outcome measure provides such information. If we know these baseline risks, then we can calculate absolute risks from either absolute or relative outcome measures. If we only have partial knowledge of the baseline risks, then only absolute measures always provide sufficient information about the decision-relevant difference in absolute risks. Finally, we can also use absolute and baseline risks to inform rational decisions between treatments. Overall, absolute measures dominate relative ones for informing rational decision making. These results support the following options for using outcome measures to inform treatment decisions:

Option 1: Use absolute measures and either absolute risks or baseline risks or ensure equality of baseline risks.

Option 2: Use relative measures and baseline risks.

Option 3: Use absolute and baseline risks instead of outcome measures.
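
The three options can be compared in a short sketch: each one recovers the same decision-relevant quantity, the difference in absolute risks P(E|A) − P(E|B). The sketch assumes RD = P(E|treatment) − P(E|control) and RR = P(E|treatment) / P(E|control); the absolute and baseline risks used are hypothetical, and the function names are illustrative only.

```python
import math


def diff_option1(rd_a, rd_b, baseline_a=None, baseline_b=None):
    """Option 1: absolute measures plus baseline risks, or equal baselines."""
    if baseline_a is None or baseline_b is None:
        # Baselines unknown but assumed equal, so they cancel out.
        return rd_a - rd_b
    return (baseline_a + rd_a) - (baseline_b + rd_b)


def diff_option2(rr_a, rr_b, baseline_a, baseline_b):
    """Option 2: relative measures plus baseline risks."""
    return baseline_a * rr_a - baseline_b * rr_b


def diff_option3(p_e_a, p_e_b):
    """Option 3: absolute risks used directly."""
    return p_e_a - p_e_b


# Hypothetical trial results with equal baseline risks.
BASELINE = 0.5
P_E_A, P_E_B = 0.625, 0.51

rd_a, rd_b = P_E_A - BASELINE, P_E_B - BASELINE  # risk differences
rr_a, rr_b = P_E_A / BASELINE, P_E_B / BASELINE  # relative risks

diffs = [
    diff_option1(rd_a, rd_b),                      # equality of baselines
    diff_option1(rd_a, rd_b, BASELINE, BASELINE),  # known baselines
    diff_option2(rr_a, rr_b, BASELINE, BASELINE),
    diff_option3(P_E_A, P_E_B),
]
print([round(d, 3) for d in diffs])
```

All four computations give a difference in absolute risks of about 0.115. Only the first call works without knowing the baseline risk itself, which reflects why relative measures appear in the options only together with baseline risks.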

Conclusions drawn from idealized decision-theoretic models underdetermine which outcome measures or absolute and baseline risks researchers should report to inform actual decision makers. Important further considerations to justify reporting principles based on the three described options include ethical aspects (see Schroeder 2022), insights on how to best communicate risks to people (see Spiegelhalter 2017), and the relative advantages of extrapolating absolute outcome measures, relative ones, and absolute and baseline risks. Nevertheless, the decision-theoretic dominance of absolute measures challenges both the current practice of only reporting relative measures and suggestions to report both. Instead, a safer conjecture is to always report absolute measures or absolute and baseline risks.

Acknowledgments

For feedback and discussion, I am grateful to Jacob Stegenga, Alexander Bird, Arif Ahmed, Neil Dewar, Nicholas Makins, Cristian Larroulet Philippi, Adrià Segarra, Sophia Crüwell, Adrian Erasmus, Oliver Holdsworth, Hamed Tabatabaei Ghomi, Charlotte Zemmel, Jonathan Fuller, and Zinhle Mncube. I also thank audiences at the BSPS 2022 Annual Conference, the EENPS 2022 Conference, the 28th Biennial Meeting of the Philosophy of Science Association, and the Third PhilInBioMed Network Meeting. Finally, I thank the Open-Oxford-Cambridge AHRC Doctoral Training Partnership and the Harding Distinguished Postgraduate Scholars Programme Leverage Scheme for their financial support.

Footnotes

¹ I explain these outcome measures in section 2.

² Other measures such as the relative risk reduction and numbers needed to treat can be derived from these two measures.

³ Readers who think causation is necessary for rational decision making may assume a justified inference to the causal effectiveness of the considered treatments.

⁴ If ${\rm{EU}}\left( {\rm{X}} \right) = {\rm{EU}}\left( {\rm{Y}} \right)$, then it is usually considered rationally permissible to choose either treatment.

⁵ This verdict holds even though the mere difference between ${\rm{P}}\left( {{\rm{E|A}}} \right)$ and ${\rm{P}}\left( {{\rm{E|A'}}} \right)$, i.e., RD_A, always suffices for deciding between A and A′ (see section 4.1).

⁶ Even if researchers were to use solely absolute and baseline risks to inform decisions, they would still need outcome measures as evidence for causal inferences (see section 2).

⁷ From (8) it follows that knowing costs, utilities, RR_A, and ${\rm{P}}\left( {{\rm{E|A'}}} \right)$ is always sufficient for a choice between A and A′.

References

Bradley, Richard. 2017. Decision Theory with a Human Face. Cambridge: Cambridge University Press.

Broadbent, Alex. 2013. Philosophy of Epidemiology. London: Palgrave Macmillan.

Cipriani, Andrea, Ioannidis, John P. A., Rothwell, Peter M., Glasziou, Paul, Li, Tianjing, Hernandez, Adrian F., Tomlinson, Anneka, Simes, John, and Naci, Huseyin. 2020. “Generating Comparative Evidence on New Drugs and Devices after Approval.” Lancet 395 (10228):998–1010.

Elliott, Marissa H., Skydel, Joshua J., Dhruva, Sanket S., Ross, Joseph S., and Wallach, Joshua D. 2021. “Characteristics and Reporting of Number Needed to Treat, Number Needed to Harm, and Absolute Risk Reduction in Controlled Clinical Trials, 2001–2019.” JAMA Internal Medicine 181 (2):282–84.

Emanuel, Ezekiel J., and Miller, Franklin G. 2001. “The Ethics of Placebo-Controlled Trials: A Middle Ground.” New England Journal of Medicine 345 (12):915–19.

European Medicines Agency. 2001. “ICH Topic E10: Choice of Control Group in Clinical Trials. Note for Guidance on Choice of Control Group in Clinical Trials.” CPMP/ICH/364/96. European Medicines Agency. https://www.ema.europa.eu/en/ich-e10-choice-control-group-clinical-trials.

Fuller, Jonathan, and Flores, Luis J. 2015. “The Risk GP Model: The Standard Model of Prediction in Medicine.” Studies in History and Philosophy of Biological and Biomedical Sciences 54:49–61.

Heart Protection Study Collaborative Group. 2002. “MRC/BHF Heart Protection Study of Cholesterol Lowering with Simvastatin in 20,536 High-Risk Individuals: A Randomised Placebo-Controlled Trial.” Lancet 360 (9326):7–22.

Hochman, Michael, and McCormick, Danny. 2010. “Characteristics of Published Comparative Effectiveness Studies of Medications.” JAMA 303 (10):951–58.

Hoefer, Carl, and Krauss, Alexander. 2021. “Measures of Effectiveness in Medical Research: Reporting Both Absolute and Relative Measures.” Studies in History and Philosophy of Science Part A 88:280–83.

Naci, Huseyin, Salcher-Konrad, Maximilian, Kesselheim, Aaron S., Wieseler, Beate, Rochaix, Lise, Redberg, Rita F., Salanti, Georgia, Jackson, Emily, Garner, Sarah, Stroup, T. Scott, and Cipriani, Andrea. 2020. “Generating Comparative Evidence on New Drugs and Devices before Approval.” Lancet 395 (10228):986–97.

Schroeder, S. Andrew. 2022. “An Ethical Framework for Presenting Scientific Results to Policy-Makers.” Kennedy Institute of Ethics Journal 32 (1):33–67.

Spiegelhalter, David. 2017. “Risk and Uncertainty Communication.” Annual Review of Statistics and Its Application 4 (1):31–60.

Sprenger, Jan, and Stegenga, Jacob. 2017. “Three Arguments for Absolute Outcome Measures.” Philosophy of Science 84 (5):840–52.

Stegenga, Jacob. 2015. “Measuring Effectiveness.” Studies in History and Philosophy of Biological and Biomedical Sciences 54:62–71. doi:10.1016/j.shpsc.2015.06.003.

Stegenga, Jacob. 2018. Medical Nihilism. Oxford: Oxford University Press.

Stegenga, Jacob, and Kenna, Aaron. 2017. “Absolute Measures of Effectiveness.” In Measurement in Medicine: Philosophical Essays on Assessment and Evaluation, edited by McClimans, Leah, 35–51. London: Rowman & Littlefield.

Stewart, Antony. 2016. Basic Statistics and Epidemiology: A Practical Guide. 4th ed. Boca Raton, FL: Taylor & Francis Ltd.

World Medical Association. 2013. “World Medical Association Declaration of Helsinki: Ethical Principles for Medical Research Involving Human Subjects.” JAMA 310 (20):2191–94.

Worrall, John. 2010. “Do We Need Some Large, Simple Randomized Trials in Medicine?” In EPSA Philosophical Issues in the Sciences, edited by Suarez, Mauricio, Dorato, Mauro, and Redei, Miklos, 289–301. Dordrecht: Springer.
