Disentangling potential causal effects of educational duration on well-being, and mental and physical health outcomes

Margot P. van de Weijer; Perline A. Demange; Dirk H.M. Pelt; Meike Bartels; Michel G. Nivard

doi:10.1017/S003329172300329X

Published online by Cambridge University Press: 15 November 2023

Margot P. van de Weijer*: Affiliation:
Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health Research Institute, Amsterdam University Medical Centres, Amsterdam, The Netherlands; Genetic Epidemiology, Department of Psychiatry, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands
Perline A. Demange: Affiliation:
Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health Research Institute, Amsterdam University Medical Centres, Amsterdam, The Netherlands
Dirk H.M. Pelt: Affiliation:
Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health Research Institute, Amsterdam University Medical Centres, Amsterdam, The Netherlands
Meike Bartels: Affiliation:
Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health Research Institute, Amsterdam University Medical Centres, Amsterdam, The Netherlands
Michel G. Nivard: Affiliation:
Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health Research Institute, Amsterdam University Medical Centres, Amsterdam, The Netherlands
*: Corresponding author: Margot P. van de Weijer; Email: m.p.vandeweijer@amsterdamumc.nl

Abstract

Background

Extensive research has focused on the potential benefits of education on various mental and physical health outcomes. However, whether the associations reflect a causal effect is harder to establish.

Methods

To examine associations between educational duration and specific aspects of well-being, anxiety and mood disorders, and cardiovascular health in a sample of European Ancestry UK Biobank participants born in England and Wales, we apply four different causal inference methods (a natural policy experiment leveraging the minimum school-leaving age, a sibling-control design, Mendelian randomization [MR], and within-family MR), and assess if the methods converge on the same conclusion.

Results

A comparison of results across the four methods reveals that associations between educational duration and these outcomes appear predominantly to be the result of confounding or bias rather than a true causal effect of education on well-being and health outcomes. Although we do consistently find no associations between educational duration and happiness, family satisfaction, work satisfaction, meaning in life, anxiety, and bipolar disorder, we do not find consistent significant associations across all methods for the other phenotypes (health satisfaction, depression, financial satisfaction, friendship satisfaction, neuroticism, and cardiovascular outcomes).

Conclusions

We discuss inconsistencies in results across methods considering their respective limitations and biases, and additionally discuss the generalizability of our findings in light of the sample and phenotype limitations. Overall, this study strengthens the idea that triangulation across different methods is necessary to enhance our understanding of the causal consequences of educational duration.

Keywords

causality; education; health; Mendelian randomization; well-being; within-family

Type: Original Article
Information: Psychological Medicine, Volume 54, Issue 7, May 2024, pp. 1403-1418

DOI: https://doi.org/10.1017/S003329172300329X
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: Copyright © The Author(s), 2023. Published by Cambridge University Press

Introduction

There is an extensive body of research examining associations between educational attainment (EA) and mental and physical health outcomes. Existing studies have pointed to EA (measured as years of education, age at leaving education, or diploma obtained) as a correlate of well-being (Bücker, Nuraydin, Simonsmeier, Schneider, & Luhmann, 2018), depression (Lorant et al., 2003), quality-adjusted life years (Furnée, Groot, & Van Den Brink, 2008), different cardiovascular outcomes (Khaing, Vallibhakara, Attia, McEvoy, & Thakkinstian, 2017), and a wide range of other diseases and disorders (Choi et al., 2011; Putrik et al., 2016; Telfair & Shelton, 2012). Often, EA is interpreted as a modifiable risk factor that might improve outcomes in these different domains, but confounding and reverse causation are difficult to rule out.

Correlational evidence provides us with a first indication of associations between education and (mental) health outcomes. For example, a meta-analysis by Bücker et al. suggests a small-to-medium positive correlation between academic achievement and subjective well-being (SWB) that was stable across different measures of academic achievement and SWB (Bücker et al., 2018). Similarly, a small but significant correlation has been found between academic achievement and subsequent depression through meta-analysis (Huang, 2015). In addition, lower education has been associated with a higher risk of different cardiovascular outcomes (Khaing et al., 2017), and lower self-reported health (Furnée et al., 2008).

Such meta-analytic studies offer the opportunity to evaluate and summarize the existing literature, which allows us to identify correlations worth exploring in more detail. However, it is difficult to establish whether these associations reflect causal associations or whether they might be caused by residual confounding (e.g. genetics, socioeconomic status) (Fewell, Davey Smith, & Sterne, 2007; Sobel, 2000). While confounders can be considered in meta-analysis, it is rarely the case that a large number of studies include the same confounders. Moreover, even if confounding factors could be ruled out, correlational studies would not offer clarity on the direction of causation. For example, while higher levels of education might lead to better access to healthcare, fewer health problems, and better health (van der Heide et al., 2013), the reverse could also be true: people in good health might have better possibilities to focus on education and reach higher levels of education than those in poor health (Kawachi, Adler, & Dow, 2010).

A quasi-experimental design that has been applied widely in educational research is to consider compulsory schooling laws where the legal minimum school-leaving age is increased (Brunello, Fort, & Weber, 2009; Clark & Royer, 2013; Glymour & Manly, 2018; Lleras-Muney, 2002) as an exposure over which individuals can be reasonably assumed to have no control. The implementation of these laws serves as a natural experiment where people are quasi-randomly separated into two groups (before and after, or subject to or not subject to the policy change). Assuming that this policy change only directly impacts the number of years someone stays in education, and assuming that it is unrelated to confounding factors, this policy change can be used to estimate the direct effect of educational duration on diverse outcomes. Using this design, researchers have found positive effects of educational duration on mental health (Chevalier & Feinstein, 2006; Graeber, 2017), cognitive abilities (Banks & Mazzonna, 2012), mortality (Davies, Dickson, Smith, Van Den Berg, & Windmeijer, 2018), income (Davies et al., 2018; Grenet, 2013), and cardiovascular health (Hamad, Nguyen, Bhattacharya, Glymour, & Rehkopf, 2019). Nevertheless, there is still considerable disagreement across different studies employing this design due to heterogeneity in study features such as the included instrument, the examined number of years around the reform, or the populations included (see Hamad, Elser, Tran, Rehkopf, & Goodman, 2018).
Additionally, the policy shift only affects those who would otherwise have left school earlier, meaning that we study a Local Average Treatment Effect (LATE) in this context. This is important to keep in mind when interpreting results, since this limits the generalizability of findings to those not affected by the reform (Ichino & Winter-Ebmer, 1999). For the subgroup of individuals affected by the reform, we also assume monotonicity, i.e. there are no individuals for whom the reform decreases their educational duration.

Another quasi-experimental design controlling for several forms of confounding using observational data is the sibling-control design. Comparing outcomes of biological siblings brought up in the same family allows us to control for shared environmental confounding (e.g. socioeconomic conditions during childhood), and for shared genetic predispositions. However, factors unique to one of the siblings but not the other and measurement error can still bias the results of sibling-control studies (Frisell, 2021). Additionally, even if we could control for all unshared confounders, the method would not help us determine the direction of causation. If we find that siblings who score higher on well-being also stay in school longer, this could be because well-being causally increases school-leaving age, but the reverse is as likely: school-leaving age might causally increase well-being.

In Mendelian randomization (MR), one or more genetic variant(s) robustly associated with a predictor variable are used as instrumental variables to examine a potentially causal association between a predictor and outcome. The approach relies on Mendel's laws of segregation and independent assortment, which assume that genetic variants are inherited randomly from one's parents and independently of other genetic variants. Assuming that (1) the genetic variants are robustly associated with the exposure, (2) there are no unmeasured confounders of the instrument–outcome association, and (3) the genetic variants are not associated with the outcome of interest other than via the exposure (no pleiotropy), the genetic variants for an exposure can be used as instruments to examine potential causality between the exposure and an outcome. For example, a genetic variant associated with educational duration that is also indirectly associated with higher well-being (through its association with educational duration) provides supportive evidence of a causal effect of education on well-being. Multiple studies have used MR to examine causal links between EA and health-related traits, with suggestive evidence for causal influences on traits like alcohol consumption, physical activity, and cardiovascular outcomes (Davies, Dickson, Davey Smith, Windmeijer, & van den Berg, 2019b; Gill, Efstathiadou, Cawood, Tzoulaki, & Dehghan, 2019). Importantly, these associations are only valid if the three key assumptions mentioned above are met. Unfortunately, it is often difficult to evaluate if the assumption of no pleiotropy is met, as many, or even most, genetic variants exert pleiotropic effects.
In addition, unmodeled assortative mating, dynastic effects, and population stratification can spuriously induce associations between the genetic variant(s) and outcomes (Brumpton et al., 2020).

A further development of MR is the application of this method in the context of within-family analysis (Brumpton et al., 2020). By performing genetic instrumental variable analysis within sibling pairs, we directly control for the influences of assortative mating, population stratification (siblings share the same population background), and dynastic effects. First, since genetic variants inherited by siblings are random within a family, genotype differences between siblings will be independent of assortative mating. Second, since the effects of parental wealth and status on their offspring are likely similar across siblings, genetic differences between siblings will be independent of dynastic effects. Lastly, genetic differences between siblings are independent of population stratification. Using within-sibling MR, Brumpton et al. demonstrate that conventional non-family MR estimates for the association between taller height/lower body mass index (BMI) and increased EA were almost entirely attenuated in the context of within-family MR (Brumpton et al., 2020). Similarly, Davies et al. used a sibling sample to check if identified associations between EA and different health measures were due to dynastic effects or assortative mating (Davies et al., 2019a, 2019b). They found little evidence that the within-family results were different from bivariate two-sample MR, but also note a probable lack of power.

While within-family MR has important advantages over conventional MR, it is nevertheless still vulnerable to unmet assumptions (e.g. the presence of pleiotropy) and is also less powerful as it is applied only in siblings within a larger sample. For both the conventional and the within-family MR, we assume monotonicity (i.e. the genetic variants do not have opposite effects in subgroups of people) and interpret identified effects as LATE.

There are various methods for examining causality in observational data, but all rely on strict assumptions that often are difficult to meet or evaluate. A way in which we can reduce our reliance on these individual assumptions is by applying multiple methods and evaluating the consistency of results and potential discrepancies therein, in light of the biases that accompany each of these methods. In a study where the effect of BMI on different outcomes was assessed, the authors used both MR (subject to family-level confounding) and non-genetic and genetic within-family analyses (subject to reverse causation) (Howe et al., 2020). By verifying that these methods converge upon the same conclusion, the authors increase the certainty that the results were not a by-product of their respective biases. In a similar fashion, Davies et al. examined potential causal effects of education on health, mortality, and income using both a design where they leverage the raising of school-leaving age (ROSLA) and MR, with both methods suggesting similar effects for almost all outcomes (Davies, Dickson, Davey Smith, Windmeijer, & van den Berg, 2021).

For the current project, we are interested in causal influences on specific aspects of well-being, anxiety and mood disorders, and cardiovascular health. As educational effects on well-being are of primary interest to us, we depart from treating ‘well-being’ as a single unified outcome and separately consider effects on satisfaction with family relations, work, friendships, health, and finances (Schimmack, 2008). We rely on four widely accepted techniques for causal inference: we make use of a random natural policy shift in England and Wales in September 1972 that raised school-leaving age from 15 to 16 but is unlikely to be related to confounding factors. We perform analyses within sibships to control for shared environmental confounders, and partly control for shared genetics. We make use of an index of genetic variation related to EA as an instrumental variable in MR. Finally, we combine the genetic instrumental variable with within-family analysis in sibling pairs. We apply those techniques in a single homogeneously measured sample (the UK Biobank [UKB]), minimizing variation in results due to differences in measurement. By assessing if these different methods converge on the same conclusion in terms of whether or not there is a causal effect of educational duration on the different outcomes, we can be more confident in our conclusions on the potential causal relation between education and the different outcomes.

Methods

This project was pre-registered at the Open Science Framework (https://osf.io/s6gha). Deviations from the pre-registration are indicated throughout the manuscript.

Sample

We used data from the UKB, a large UK cohort study which collected genetic and phenotypic data on approximately 500 000 participants between 40 and 69 years old at recruitment (Bycroft et al., 2018). For the current project, we selected individuals of European ancestry (a decision taken to minimize ancestral confounding in genetic analyses) who were born in England and Wales (to ensure participants were likely affected by the school-leaving age reform). Specific further sample selection procedures for the four different analyses are described below per analysis, and a flowchart of sample selection per analysis is found in online Supplementary Fig. S1.

Education variable

We used UKB data-field 845 ‘age completed full-time education’ as our education exposure variable. Participants were asked to answer the question ‘at what age did you complete your continuous full-time education?’. If someone provided an answer below 5, or an answer higher than their age, the answer was rejected. If someone answered with an age higher than 40, the participant was asked to confirm their answer. Since the question was not collected in participants who indicated having a college or university degree, we, in line with the literature (Davies et al., 2018; Plotnikov et al., 2020), imputed their age at completed full-time education as 21. In case someone provided an answer on more than one occasion, we used the last available answer as the age at which one completed their full-time education. If the answer at the later time-point indicated a lower age than a previous answer (N = 72), we coded the answer as missing.
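The coding rules above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the argument layout (one answer per assessment visit, in chronological order), and the reading of the "later answer lower than a previous answer" rule as applying to the last available answer are all assumptions made for illustration.

```python
from typing import Optional, Sequence

DEGREE_IMPUTED_AGE = 21  # imputation value stated in the text


def clean_answer(answer: Optional[int], age_at_visit: int) -> Optional[int]:
    """Reject answers below 5 or above the participant's age at that visit."""
    if answer is None or answer < 5 or answer > age_at_visit:
        return None
    return answer


def age_left_education(
    answers: Sequence[Optional[int]],
    ages_at_visit: Sequence[int],
    has_degree: bool,
) -> Optional[int]:
    """Hypothetical helper: derive one education value per participant."""
    # Degree holders were not asked the question, so their value is imputed.
    if has_degree:
        return DEGREE_IMPUTED_AGE
    valid = [clean_answer(a, age) for a, age in zip(answers, ages_at_visit)]
    valid = [v for v in valid if v is not None]
    if not valid:
        return None
    last = valid[-1]  # use the last available answer
    # Assumption: a later answer lower than any earlier one is coded missing.
    if any(previous > last for previous in valid[:-1]):
        return None
    return last
```

For example, a participant answering 16 and later 18 would be coded 18, whereas answers of 18 and later 16 would be coded as missing under this reading.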

Outcome variables

General information on item construction and cleaning procedures for these variables can be found in the Supplementary Methods. The following self-report items were included as well-being outcome variables: general happiness based on happiness (UKB ID 4526) and general happiness (UKB ID 20459), family relationship satisfaction (UKB ID 4559), financial situation satisfaction (UKB ID 4581), friendship satisfaction (UKB ID 4570), work/job satisfaction (UKB ID 4537), health satisfaction based on health satisfaction (UKB ID 4548) and general happiness with own health (UKB ID 20459), and belief that own life is meaningful (UKB ID 20460). All items were coded so that a higher score indicated a higher level of well-being. For neuroticism, we included a summary score (UKB ID 20127) that was based on 12 neurotic domain self-report items. We used a combination of medical record data (UKB ID 41270) and self-report data (UKB ID 20002) to create binary variables reflecting if someone was ever diagnosed with depression, anxiety, or manic or bipolar disorder. Lastly, a binary variable indicating cardiovascular problems was constructed based on vascular/heart problems diagnosed by a doctor (UKB ID 6150) or self-reported (UKB ID 20002).

Control outcomes

We selected four negative control outcomes: height (UKB ID 50), birthweight (UKB ID 20022), comparative body size at age 10 (UKB ID 1687), and comparative height size at age 10 (UKB ID 1697). It is unlikely these variables are causally influenced by additional years of schooling, but the presence of confounding parental variables (e.g. parental SES) might lead to observable but false-positive associations. As a positive control outcome, we included average total household income before tax (UKB ID 738), which was split into the four yes/no dichotomous variables: income over 18k, income over 31k, income over 52k, and income over 100k. General information on item construction and cleaning procedures for these variables can also be found in the Supplementary Methods.

Covariates

As phenotypic covariates, we included sex (UKB ID 31), assessment center (UKB ID 54), family size (based on number of [adopted] siblings, UKB IDs 1873, 3972, 1883, and 3982), season of birth (based on month of birth, UKB ID 52), and year of birth (UKB ID 34). Genetic covariates included the first 10 genomic principal components (PCs) and batch (UKB ID 22000).

Genotype data

Single-nucleotide polymorphisms (SNPs) from HapMap3 (CEU: Utah residents with Northern and Western European Ancestry) (1 345 801 SNPs) were extracted from the imputed dataset. A pre-principal component analysis (PCA) quality control (QC) was done on unrelated individuals, filtering out SNPs with minor allele frequency (MAF) <0.01 and missingness >0.05, leaving 1 252 123 SNPs. After filtering out individuals with non-European ancestry, the SNP QC was repeated on unrelated Europeans (N = 312 927). SNPs with MAF <0.01, missingness >0.05, and Hardy-Weinberg equilibrium (HWE) p < 10⁻¹⁰ were filtered out, leaving 1 246 531 SNPs. The HWE p-value threshold of 10⁻¹⁰ was based on: http://www.nealelab.is/blog/2019/9/17/genotyped-snps-in-uk-biobank-failing-hardy-weinberg-equilibrium-test. A final dataset of 1 246 531 QC-ed SNPs was created for 456 028 UKB subjects of European ancestry.

UKB correction

While the UKB is a valuable dataset where a large number of participants have been genotyped and extensively phenotyped, it is not necessarily representative of the UK population due to confounding from volunteer bias (Batty, Gale, Kivimäki, Deary, & Bell, 2020). To partially correct for volunteer bias, we calculate and include inverse probability weights using procedures by van Alten, Domingue, Galama, and Marees (2022). The respondents are weighted using weights based on sex, year of birth (5-year cohort), education level, ethnicity, region of residence (Census Greater London Area), tenure of dwelling, employment status, number of cars in the household, a dummy indicating whether the person lives in a single-person household, and self-reported health. For a more detailed description, see van Alten et al. (2022).

Analyses

We use four different methods to examine potential causal effects of educational duration on our outcomes. Table 1 provides an overview of these four methods, including their respective advantages and limitations. Sample descriptives per method can be found in Table 2. Below, we describe each of the four methods in more detail. All analysis code is available at https://github.com/margotvandeweijer/EA_causality. All continuous outcomes were standardized so that the resulting effect sizes reflect the s.d. increase in the outcomes for each additional year of education (see Table 2 for an overview of the s.d.s of the included variables).

Table 1. Overview of different methods used in the present study

* All methods are susceptible to selection/collider bias.

Table 2. Sample descriptives for the full sample and per analysis type (for those with education data)

*For some participants, sex information is missing.

Instrumental variable analysis leveraging the ROSLA

We used the ROSLA policy reform where the minimum school-leaving age was increased from 15 to 16 in England and Wales to examine the effects of longer schooling on our different outcomes. We selected a sample of UKB participants born in a 5-year window (1 February 1955 to 1 February 1960) around the reform (1 September 1972), and excluded related individuals (KING kinship coefficient >0.0884) using the ukbtools package in R (Hanscombe, Coleman, Traylor, & Lewis, 2019). A binary ROSLA indicator was created for this subset of participants that indicates if a participant was born before (affected = 0) or after (affected = 1) 1 September 1957 and was thus affected by the reform or not. Additionally, we transformed the age at which one left full-time education variable into a binary variable that indicates if an individual stayed in school after age 15 or not (Davies et al., 2019a). Next, we ran two-stage least squares (2SLS) instrumental variable analyses using the fixest R package (Bergé, 2018), where in the first stage the binary education variable was included as the dependent variable and the binary ROSLA indicator was included as the instrument. In the second stage, we regressed all our standardized outcome variables on the fitted education values from the first-stage regression. Both stages included the phenotypic covariates. For comparative purposes, we also ran regular (non-pre-registered) ordinary least squares (OLS) regression in the same sample, where the binary education predictor was used to predict the different outcomes (including the same covariates as the ROSLA analyses). To examine the robustness of the ROSLA results, we repeated the analyses using samples born in 2- and 10-year windows around the reform.
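To make the logic of the instrument concrete, the following minimal sketch computes the instrumental variable estimate for a single binary instrument without covariates, in which case 2SLS reduces to the Wald ratio of the reduced-form difference to the first-stage difference. It is a simplification of the analysis described above, which additionally includes the phenotypic covariates and was run with fixest in R; the function name and the toy data are hypothetical.

```python
from statistics import mean
from typing import Sequence


def wald_iv_estimate(
    z: Sequence[int], x: Sequence[float], y: Sequence[float]
) -> float:
    """IV estimate for one binary instrument z, exposure x, and outcome y."""
    y1 = mean(yi for zi, yi in zip(z, y) if zi == 1)
    y0 = mean(yi for zi, yi in zip(z, y) if zi == 0)
    x1 = mean(xi for zi, xi in zip(z, x) if zi == 1)
    x0 = mean(xi for zi, xi in zip(z, x) if zi == 0)
    first_stage = x1 - x0  # effect of the reform on staying in school
    if first_stage == 0:
        raise ValueError("instrument is unrelated to the exposure")
    reduced_form = y1 - y0  # effect of the reform on the outcome
    return reduced_form / first_stage


# Toy data (hypothetical): z = born after the cut-off date,
# x = stayed in school after age 15, y = a standardized outcome.
z = [0, 0, 0, 0, 1, 1, 1, 1]
x = [0, 0, 1, 1, 1, 1, 1, 1]
y = [0.0, 0.2, 0.4, 0.6, 0.5, 0.5, 0.7, 0.7]
estimate = wald_iv_estimate(z, x, y)
```

The ratio form shows why the estimate is a LATE: the reduced-form difference is scaled by the share of individuals whose schooling changed because of the reform.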

Sibling control design

We perform analyses within sibships to control for shared familial background characteristics, and partly control for genetic effects. Biological sibships in the UKB dataset are defined as participants with a kinship coefficient between ${1 \over {2^{5/2}}}$ and ${1 \over {2^{3/2}}}$ and a probability of zero identical-by-state sharing >0.0012 (Bycroft et al., 2018; Manichaikul et al., 2010). Individuals indicating they were adopted were removed from this sample. For each sibship j with m siblings (indexed by i), we start by calculating the sibship's average age at leaving full-time education: $\overline {edu_{0j}} = \sum\nolimits_{i = 1}^m {edu_{ij}/m}$. Next, we calculate each sibling's deviation from the sibship average: $edu_{\Delta ij} = edu_{ij}-\overline {edu_{0j}}$. We use these estimates in a linear model where each outcome $Y_{ij}$ for sibling i in sibship j is predicted as follows:

$$Y_{ij} = \beta _{00} + \beta _B\overline {edu_{0j}} + \beta _Wedu_{\Delta ij} + \text{covariates} + e$$

where $\beta_B$ is the between-sibship effect estimating if the average school-leaving age within sibships is associated with our outcomes, and $\beta_W$ is the within-sibship effect estimating if a sibling deviating from the sibship school-leaving age average is associated with our outcome measures. Since we examine the effect of these within- and between-sibship estimates on the outcomes of individual siblings, we excluded sibships where only one sibling reported on educational duration, but we did not exclude sibships where not all siblings reported on one or more outcome measures. We report robust standard errors taking into account familial clustering, calculated using the coeftest function from the lmtest R package (Hothorn et al., 2022). All phenotypic covariates were included in the analyses.
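The between/within decomposition of the education variable can be sketched as follows. This is an illustrative helper rather than the published analysis code: the function name and the record layout (pairs of sibship identifier and education value, with `None` for missing) are assumptions.

```python
from collections import defaultdict
from typing import List, Optional, Tuple


def sibship_decomposition(
    records: List[Tuple[str, Optional[float]]]
) -> List[Tuple[str, float, float, float]]:
    """Return (sibship, edu, sibship mean, deviation from mean) per sibling.

    Sibships with fewer than two reported education values are dropped,
    mirroring the exclusion rule described in the text.
    """
    by_family = defaultdict(list)
    for family, edu in records:
        if edu is not None:
            by_family[family].append(edu)

    rows = []
    for family, edu in records:
        values = by_family[family]
        if edu is None or len(values) < 2:
            continue
        family_mean = sum(values) / len(values)  # between-sibship component
        rows.append((family, edu, family_mean, edu - family_mean))
    return rows


# Toy data (hypothetical sibships A, B, and C).
records = [("A", 16), ("A", 18), ("B", 15), ("B", None),
           ("C", 21), ("C", 21), ("C", 18)]
rows = sibship_decomposition(records)
```

The sibship mean and the deviation would then enter the regression as the between-sibship and within-sibship predictors, respectively.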

Mendelian randomization

We used polygenic scores (PGS) for EA in 2SLS instrumental variable analysis as genetic instruments for testing a directed causal association between educational duration and the outcomes. PGS are aggregate measures of genetic susceptibility for a trait of interest weighted by effect size estimates from genome-wide association studies (Choi, Mak, & O'Reilly, 2020). To calculate the PGS for EA, we used the summary statistics from the Genome Wide Association Study (GWAS) of years of education by Lee et al. (2018), excluding 23andMe and British cohorts (N ≈ 245k). PGS were constructed from the set of genome-wide significant HapMap3 SNPs (p < 5 × 10⁻⁸), pruned to be independent (using the package TwoSampleMR [Hemani et al., 2018]) using a clumping window of 1000 kb and a linkage disequilibrium (LD) cut-off of R² = 0.1. The PGS prediction accuracy for EA was assessed based on the incremental R² when including the PGS in a regression with all covariates.

Next, the PGS was used as a genetic instrument in 2SLS instrumental variable analysis in a sample of unrelated UKB participants (excluding related individuals with a KING kinship coefficient >0.0884). In the first stage, we predicted age at which one left full-time education (standardized) from the PGS. In the second stage, the outcome and control outcomes were predicted from the fitted education values. All phenotypic and genetic covariates were included as covariates in both stages. The MR analyses were conducted using the fixest package in R (Bergé, 2018). For comparison, we also perform regular (non-pre-registered) OLS regression in the same sample, where standardized age at which one left full-time education is used to predict the outcomes, whilst correcting for the phenotypic covariates.
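As a compact restatement of the two stages described above, the model can be written as follows. This is a sketch in notation introduced here for illustration: the coefficient symbols $\gamma$ and $\beta_{IV}$, the covariate vector $C_i$, and the error terms $u_i$ and $\varepsilon_i$ are assumptions and do not come from the analysis code.

$$edu_i = \gamma_0 + \gamma_1 PGS_i + \gamma_C^{\top} C_i + u_i$$

$$Y_i = \beta_0 + \beta_{IV}\,\widehat{edu_i} + \beta_C^{\top} C_i + \varepsilon_i$$

Here $\widehat{edu_i}$ denotes the fitted values from the first stage. With a single instrument, $\beta_{IV}$ equals the coefficient of the PGS in a regression of $Y_i$ on the PGS and the covariates, divided by $\gamma_1$.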

Mendelian randomization in sibships

Since one of the limitations of MR is its susceptibility to residual confounding stemming from dynastic effects, population stratification, and assortative mating, we additionally perform MR within sibships. We identify siblings in UKB and calculate each sibling's deviation from the sibship average using the same methodology as used for the sibling control design (see ‘Sibling control design’). Additionally, we use the PGS calculated for the MR analyses (see ‘Mendelian randomization’) to calculate a PGS average within sibships: $\overline {PGS_{0j}} = \sum\nolimits_{i = 1}^m {PGS_{ij}/m}$, and each sibling's deviation from the sibship average: $PGS_{\Delta ij} = PGS_{ij}-\overline {PGS_{0j}}$. We use these deviation estimates in instrumental variable regression (using the fixest package), where in the first stage we predict the sibling education deviation from the sibling PGS deviation. Next, the outcome and control outcomes were predicted from the first-stage fitted education values. Similar to the within-sibling analyses, we excluded sibships where only one sibling reported on EA, but we did not exclude sibships where not all siblings reported on one or more outcome measures. We report robust standard errors taking into account familial clustering, calculated using the coeftest function from the lmtest R package (Hothorn et al., 2022). Both the phenotypic and genetic covariates were included.
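The within-sibship instrumental variable logic can be sketched in a few lines. The example below is a simplification under stated assumptions: it omits all covariates and the clustered standard errors, uses the covariance-ratio form of the single-instrument IV estimator instead of fixest, and every function name and data value is hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence


def within_deviations(families: Sequence[str], values: Sequence[float]) -> List[float]:
    """Subtract each sibship's mean from the values of its members."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for family, value in zip(families, values):
        totals[family] += value
        counts[family] += 1
    return [v - totals[f] / counts[f] for f, v in zip(families, values)]


def covariance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def within_sibship_iv(families, pgs, edu, outcome) -> float:
    """Single-instrument IV estimate using sibling deviations."""
    d_pgs = within_deviations(families, pgs)  # instrument
    d_edu = within_deviations(families, edu)  # exposure
    first_stage = covariance(d_pgs, d_edu)
    if first_stage == 0:
        raise ValueError("PGS deviation is unrelated to education deviation")
    return covariance(d_pgs, outcome) / first_stage


# Toy data (hypothetical): the outcome is 0.5 * edu plus a family-level shift.
families = ["A", "A", "B", "B"]
pgs = [0.0, 1.0, -0.5, 0.5]
edu = [16.0, 18.0, 15.0, 17.0]
outcome = [9.0, 10.0, 6.5, 7.5]
estimate = within_sibship_iv(families, pgs, edu, outcome)
```

In the toy data the family-level shift does not affect the estimate, because the PGS deviations sum to zero within each sibship. This illustrates why between-family confounding drops out of the within-sibship estimate.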

Pre-registered interpretation of results

We define an unambiguous causal association as one where the policy shift, the sibling control design, and the Mendelian randomization analyses all imply a significant result in the same direction. The absence of significance across these methods would imply the absence of such a result. Due to the lower power associated with our within-sibship MR analyses, we are satisfied if the magnitude and direction of the within-sibship Mendelian randomization estimate are consistent with the other methods. With respect to statistical significance and multiple testing, we use two significance thresholds: (1) a suggestive threshold where we correct for the number of outcomes (15), so that α = 0.05/15 ≈ 0.003, and (2) a conservative threshold where we correct for the number of outcomes (15) and analysis types (4), so that α = 0.05/60 ≈ 0.0008. Inconsistencies across results will be interpreted in light of the potential biases and assumptions that accompany the different methods.
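The arithmetic behind the two thresholds can be written out as a short Python sketch. The counts (15 outcomes, 4 analysis types) are taken from the text. The `classify` helper and its labels are hypothetical and serve only to illustrate how a p-value would be compared against both thresholds.

```python
ALPHA = 0.05
N_OUTCOMES = 15
N_ANALYSES = 4

# Bonferroni-style corrections described in the text.
suggestive = ALPHA / N_OUTCOMES                    # 0.05 / 15
conservative = ALPHA / (N_OUTCOMES * N_ANALYSES)   # 0.05 / 60


def classify(p_value):
    """Label a p-value by the strictest threshold it passes."""
    if p_value < conservative:
        return "conservative"
    if p_value < suggestive:
        return "suggestive"
    return "not significant"


print(round(suggestive, 4), round(conservative, 5))
print(classify(0.0005), classify(0.001), classify(0.01))
```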

Results

Instrumental variable analysis leveraging the ROSLA

Table 3 depicts the results of the ROSLA instrumental variable analyses. Based on the 2SLS models, none of the outcomes are significantly predicted by the age at which participants left full-time education. This contrasts with our comparative OLS analyses, which do not control for unmeasured confounders, where most associations were significant. The F-statistic of the 2SLS analyses ranged from 216.9 to 1205.9 depending on the outcome of interest, indicating that our instrument is unlikely to suffer from weak instrument bias. Because the standard errors are relatively large and the Wu–Hausman statistics, which test the null hypothesis of no endogeneity, were almost always non-significant at α = 0.05, the 2SLS and OLS estimates do not differ statistically. However, the methods do yield different point estimates, suggesting that the OLS results may nonetheless be subject to considerable bias. Examining these associations in a 2- or 10-year window around the reform did not change our conclusions (see online Supplementary Table S1).
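To make the weak-instrument check concrete, the sketch below computes a first-stage F-statistic for a single instrument without covariates, using toy data. It is an assumption-laden illustration in Python, not the diagnostic output reported in Table 3: the data and the function name `first_stage_f` are hypothetical. A common rule of thumb treats first-stage F-statistics below about 10 as a sign of a weak instrument.

```python
import statistics


def first_stage_f(z, x):
    """F-statistic for the regression of exposure x on a single instrument z."""
    n = len(z)
    mz, mx = statistics.fmean(z), statistics.fmean(x)
    szz = sum((a - mz) ** 2 for a in z)
    sxx = sum((b - mx) ** 2 for b in x)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    r2 = szx ** 2 / (szz * sxx)        # first-stage R-squared
    # With one instrument, F equals the squared t-statistic of the slope.
    return r2 / (1.0 - r2) * (n - 2)


# Toy data (hypothetical): reform marks exposure to the policy change,
# schooling is the age at which full-time education ended.
reform = [0, 0, 0, 0, 1, 1, 1, 1]
schooling = [15, 15, 16, 16, 16, 16, 17, 17]
f_stat = first_stage_f(reform, schooling)
print(f_stat)
```

With only eight observations the toy F-statistic is small. The values reported in the text (216.9 to 1205.9) are far above the conventional threshold.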

Table 3. Results of the ROSLA instrumental variable analyses

Note. All continuous outcomes were standardized. Assessment center, sex, season of birth, and year of birth were included as covariates.

p-values indicated in bold are lower than the conservative p-value threshold of 0.0008.

^a H0 is the absence of endogeneity of the instrumented variables.

These findings contrast with earlier findings by Davies et al. (2018). Using instrumental variable regression in UKB, they did observe an effect of remaining in school after age 15 on different cardiovascular outcomes and income. The main difference between the current study and the Davies et al. study is the method of correcting for year of birth, where they used a difference-in-difference approach instead of including this variable as a covariate. Therefore, we performed supplementary (non-preregistered) analyses in which we added season of birth and year of birth in a step-wise fashion. The results are shown in online Supplementary Table S2 and Fig. 1. While adding year of birth as a covariate might increase the chance that we are overcorrecting, it is evident from these results that the use of a policy experiment as an instrumental variable is very sensitive to the model specification: inclusion of year of birth renders previously significant associations with happiness, familial, financial, and work satisfaction, cardiovascular problems, income, birthweight, and height non-significant.

Figure 1. Comparison of ROSLA results including and excluding year of birth (yob) as a covariate for (a) continuous outcome measures, (b) binary outcome measures, and (c) control measures.