INTRODUCTION
Since 2000, more than 1,100 studies have been published that examine the effects of democracy using cross-national data (see Gerring, Knutsen, and Berge 2022 for an overview). However, there has been no attempt to establish whether such analyses have sufficient statistical power to detect an effect of democracy. A lack of power can be problematic, as it implies a high probability of committing a false negative (Type II error). Even when estimates are statistically significant, underpowered studies risk vastly overstating the effect size (Type M error), and estimates may even have the wrong sign (Type S error) (Arel-Bundock et al. 2022; Gelman and Carlin 2014). This article seeks to shed light on this issue by using simulation to examine variation in the estimates for the effect of democracy. It finds that, with currently available data, analyses are likely only powered to detect strong and non-dynamic effects of democracy.
A staggering number of factors have been theorized to be affected by democracy. However, this article is primarily focused on economic development for several reasons: first, it is the outcome that has been examined most frequently by the literature (Colagrossi, Rossignoli, and Maggioni 2020; Gerring, Knutsen, and Berge 2022, 367); second, there are good theoretical arguments for finding a substantial and positive impact of democracy (e.g., Baum and Lake 2003; Gerring et al. 2005; Knutsen 2012); third, data on GDP per capita are available for more countries and for longer time spans than is the case with most other outcomes; and fourth, economic development varies more than most other outcomes studied by the literature, such as infant mortality or civil war. Thus, if power is an issue for detecting a large effect of democracy on economic development, then it is likely to also present an issue for other outcomes.
Using the most extensive data available on democracy and GDP per capita and the standard two-way fixed effects (TWFE) estimator, I find that democracy must make countries around 16% richer or more for analyses to have sufficient power (80% power at $ \alpha =0.05 $ ). This represents a large effect when compared to both prior estimates in the literature (e.g., Acemoglu et al. 2019; Colagrossi, Rossignoli, and Maggioni 2020; Knutsen and Wig 2015) and to the distribution of estimates from a multiverse analysis of the relationship between democracy and economic development. I document a similar pattern for an alternative outcome, civil war, where democracy must decrease the risk of onset by 80% or more when compared to the average probability of civil war onset for analyses to be sufficiently powered. Moreover, if data are missing for a few countries, the true effect size must be very large to attain sufficient statistical power. For example, for datasets containing 75 countries, democracy must cause countries to be around 24% richer for the analysis to be well-powered. The consequences for power are not as severe if data are missing for earlier time periods, such as prior to WWII, as long as the outcome changes slowly over time. For outcomes that vary significantly from year to year, missing early time periods also reduces statistical power substantially even when effects are large.
If an effect of democracy exists, it is likely to be dynamic and growing over time (see, e.g., Acemoglu et al. 2019). A common approach to modeling this is an event study that includes dummies for the relative time prior to and after democratization in addition to country and year fixed effects. I show that this further exacerbates power issues. Even when the average effect of democracy on economic development is large (corresponding to an average effect of around 16%), studies are only powered to detect long-run effects. Using most of the new staggered difference-in-difference estimators that take issues with the TWFE estimator into account further increases power requirements (Chiu et al. 2023; Egerod and Hollenbach 2024). Thus, one should be cautious when interpreting analyses of dynamic effects. The lack of power also implies that it is not possible to detect actual deviations from parallel trends prior to treatment. In addition, my results indicate that analyses are, except in extreme cases, unlikely to be powered to detect interaction effects, as sufficient power is not reached even when the average effect is large (16%) and the difference between groups is huge (200%). Finally, I show that, conditional on researchers finding a significant result, estimates may have the wrong sign or be several magnitudes too large if the true effect is small. This is problematic as the effects of democracy literature shows evidence of selection on significance (Gerring, Knutsen, and Berge 2022).
This article makes several contributions. First, its findings have implications for studies of the effects of democracy using cross-national data. Researchers within this field rarely, if ever, consider the power of the statistical tests they conduct. My results suggest that scholars relying on the standard TWFE estimator and cross-national regime data should think carefully about statistical power as it is likely to pose issues for their analysis unless the effect sizes they study are very large. This is true even if data are available for the whole population of countries across many years. This likely also applies to other institutional causes such as state capacity and party institutionalization (e.g., Andersen and Doucette 2022; Bizarro et al. 2018; Hegre, Bernhard, and Teorell 2020). This echoes recent studies that find generally low statistical power in economics and political science more broadly (Arel-Bundock et al. 2022; Askarov et al. 2024; Ioannidis, Stanley, and Doucouliagos 2017).
Second, I show how different design decisions and features of the data impact statistical power when using cross-national data. For example, studying interactions or using a sample that only includes countries from one continent is unlikely to yield credible estimates of the effects of democracy. Given that democracy is a staggered and dynamic treatment, scholarship that seeks to address related issues with the TWFE estimator (e.g., de Chaisemartin and d’Haultfoeuille 2020; Goodman-Bacon 2021; Sun and Abraham 2021) should note that power becomes even more of an issue when using appropriate staggered difference-in-difference estimators. The lack of power is especially pertinent when testing for parallel trends prior to democratization, as tests are unlikely to pick up anything but very large divergences (see also Egerod and Hollenbach 2024; Roth 2022).
Third, I illustrate how scholars can use simulation to examine what minimum effect size is required for a study of the effects of democracy to be informative (see also Black et al. 2022; Egerod and Hollenbach 2024 for inspiration in this regard). The inherent features of regime data, such as strong autocorrelation and clustered transitions, make it hard to artificially create similar data. Simulating with real-world data might help researchers ascertain whether their proposed research design is actually powered to detect probable effects.
SIMULATION APPROACH
Statistical power depends on the level of statistical significance (usually set at $ \alpha =0.05 $ ), sample size, effect size, number of units treated, and variability. Researchers studying the effects of democracy can only manipulate the number of countries and years in the sample insofar as additional data can be collected for some time periods or regions. However, in the main analysis, I assume that the most extensive sample available is used. This covers the period 1800–2015 for around 180 countries (based on democracy data from Boix, Miller, and Rosato 2013 and logged GDP per capita data from Fariss et al. 2022a; 2022b). The variability of the dependent and independent variables and the number of units treated are also mostly outside the control of the researcher. Given that one uses the most extensive sample available, this raises the questions: what kind of effect sizes can reliably be detected, and how does this compare to the minimum effect size of interest?
To answer these questions, I conduct a simulation-based power analysis that varies the treatment effect in small increments using real-world panel data on democracy and economic development. The country-time-series for the outcome and treatment are separated and randomly combined into new artificial countries. The aim of the simulation approach is to find the minimum effect size of democracy on economic development that is reliably detectable using standard datasets and empirical approaches in the literature. Using actual panel data ensures that the features of the data match the features one would normally encounter when estimating cross-national regressions. Trying to simulate this kind of data is likely to significantly overstate the level of statistical power, as simulated data are unlikely to match the degree of autocorrelation, clustering, and non-randomness present in actual cross-national datasets (see Black et al. 2022; Egerod and Hollenbach 2024, 20–1). This is especially the case when studying regimes, as individual countries rarely experience more than one or two transitions to or away from democracy. Thus, this approach assesses the uncertainty inherent in the designs used by the effects of democracy literature.
Unfortunately, it is uncommon in the literature to report or discuss a minimum effect size of interest. Thus, I cannot readily compare the minimum effect size that can be detected with the kinds of effects scholars in the field would find theoretically and practically relevant. As an alternative, I do two things. First, I compare to reported effects in recent studies. Acemoglu et al. (2019) find that democracies are about 15% richer on average; Colagrossi, Rossignoli, and Maggioni (2020) report a slightly lower difference of approximately 12%; Knutsen and Wig (2015) find that GDP per capita grows about 0.42% faster per year in democracies than in autocracies. However, relying on reported estimates risks overstating the actual relationship between democracy and economic development if studies are underpowered and there is selection on significance in the literature (Gelman 2019). Second, I compare with the effects found in a multiverse analysis that varies the factors which commonly differ between prior studies of democracy and economic development. In this analysis, democracies are about 9% richer on average, while the interquartile range of estimates goes from 5% to 13%. Democracies grow 0.4% faster annually, while the interquartile range of estimates goes from 0.27% to 0.53%.
I assess the performance of statistical significance tests based on a panel of countries (i) observed in different years (t). I adopt standard power thresholds of 80% and 90% with a significance level of 0.05 ( $ \alpha =0.05 $ ). An often-used approach in the literature is a linear regression of $ Ln{(GDP/cap)}_{it} $ on $ Democrac{y}_{it} $ and country and year fixed effects ( $ {\gamma}_i $ , $ {\delta}_t $ ). This is also termed the TWFE estimator. The tests are based on standard errors that cluster on countries. I summarize the specification as
$$ Ln{(GDP/cap)}_{it}={\gamma}_i+{\delta}_t+\beta Democrac{y}_{it}+{\epsilon}_{it}. $$
I also run specifications that include a lagged outcome ( $ Ln{(GDP/cap)}_{it-1} $ ), in essence having growth as the dependent variable instead. I vary the baseline effect of democracy $ \beta $ in increments of 0.01 to find the minimum effect size that corresponds to a power level of 80% and 90%. The number of countries in the sample is 180 (C), which corresponds to the observed number of countries with data on both the democracy and the GDP per capita variable in at least 1 year. $ {\epsilon}_{it} $ captures other time-variant factors that affect a country’s economic development. I evaluate the variability of $ \widehat{\beta} $ as follows.
I simulate the steps outlined below ten thousand times and save the $ \widehat{\beta} $ from each repetition:
1. Construct a panel dataset of countries observed from 1800 to 2015.
2. Assign logged GDP per capita-year series to each country based on data from Fariss et al. (2022b).
3. Randomly assign democracy-year series to each country based on data from Boix, Miller, and Rosato (2013).
4. Multiply $ Ln{(GDP/cap)}_{it} $ by an increasing $ \beta $ in each simulation run in years where $ Democrac{y}_{it} $ is equal to 1.
5. Estimate $ Ln{(GDP/cap)}_{it}={\gamma}_i+{\delta}_t+\beta Democrac{y}_{it}+{\epsilon}_{it} $.
6. Save $ \widehat{\beta} $.
Steps 2 and 3 ensure that $ Ln{(GDP/cap)}_{it} $ and $ Democrac{y}_{it} $ are uncorrelated in expectation. Thus, without step 4, estimates of $ \beta $ should center around 0 if the TWFE estimator is unbiased in this case. The average $ \widehat{\beta} $ for the simulations where $ \beta $ is set to 0 is −0.001, indicating that the procedure does remove any correlation between democracy and logged GDP per capita.
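The six steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's replication code: the function names are hypothetical, `placeholder_panel` generates synthetic stand-ins (random-walk outcomes and a single regime switch) instead of the Fariss et al. and Boix, Miller, and Rosato series, the panel is assumed to be balanced, step 4 is read as shifting the logged outcome by $ \beta $ in democratic years, and the clustered standard errors use no small-sample correction.

```python
import math
import random


def demean_two_way(x):
    """Two-way within transformation; exact only for a balanced panel."""
    n, t = len(x), len(x[0])
    row = [sum(r) / t for r in x]
    col = [sum(x[i][j] for i in range(n)) / n for j in range(t)]
    grand = sum(row) / n
    return [[x[i][j] - row[i] - col[j] + grand for j in range(t)] for i in range(n)]


def twfe(y, d):
    """Return (beta_hat, country-clustered SE) for the TWFE regression of y on d."""
    yt, dt = demean_two_way(y), demean_two_way(d)
    sxx = sum(v * v for r in dt for v in r)
    beta = sum(a * b for ry, rd in zip(yt, dt) for a, b in zip(ry, rd)) / sxx
    meat = 0.0
    for ry, rd in zip(yt, dt):
        # Sum the score within a country before squaring (clustering on country).
        score = sum(dv * (yv - beta * dv) for yv, dv in zip(ry, rd))
        meat += score ** 2
    return beta, math.sqrt(meat) / sxx  # no small-sample correction applied


def simulate_power(outcomes, treatments, beta, reps, rng):
    """Share of repetitions in which the injected effect is significant at 5%."""
    hits = 0
    for _ in range(reps):
        shuffled = treatments[:]
        rng.shuffle(shuffled)  # step 3: random treatment series per country
        # Step 4 (assumed reading): shift the logged outcome by beta when treated.
        y = [[yv + beta * dv for yv, dv in zip(ry, rd)]
             for ry, rd in zip(outcomes, shuffled)]
        b, se = twfe(y, shuffled)  # steps 5 and 6
        if abs(b / se) > 1.96:
            hits += 1
    return hits / reps


def placeholder_panel(n, t, rng):
    """Synthetic stand-in data; the article uses real country series instead."""
    outcomes, treatments = [], []
    for i in range(n):
        level, series = rng.gauss(7.0, 1.0), []
        for _ in range(t):
            level += rng.gauss(0.015, 0.05)  # random walk with drift
            series.append(level)
        outcomes.append(series)
        # Sticky regime: every second country switches to democracy exactly once.
        switch = rng.randrange(1, t) if i % 2 == 0 else t
        treatments.append([1.0 if j >= switch else 0.0 for j in range(t)])
    return outcomes, treatments
```

Calling `simulate_power` over a grid of `beta` values traces out a power curve. With real data, the two arguments would hold the observed outcome and treatment series, one row per country.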
FINDINGS
I now evaluate how the power requirements of this approach vary as a function of (i) effect size, (ii) the size of the treatment group, (iii) the number of years in the dataset, (iv) the presence of dynamic effects, and (v) the presence of an interaction effect. These represent common differences between studies of the effect of democracy, as (i) some outcomes are more loosely connected to democracy (e.g., Leipziger 2024; Paglayan 2021), (ii) and (iii) occasionally outcome data are only available for some countries or periods (e.g., Stasavage 2005), (iv) the effect of democracy often materializes slowly over time (e.g., Acemoglu et al. 2019), and (v) many scholars are interested in how different factors interact with the effect of democracy (e.g., Cox and Weingast 2018).
Varying Effect Size
Figure 1 displays the results for the baseline specifications. Using the most extensive sample available, there is insufficient power to detect an effect of democracy that is as strong as the ones reported in Acemoglu et al. (2019), Colagrossi, Rossignoli, and Maggioni (2020), and Knutsen and Wig (2015). In fact, for studies to reach an 80% power level, the true effect of democracy must be above 0.15, or above 0.005 when including a lagged dependent variable.
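Reading a minimum detectable effect off a simulated power curve amounts to finding the smallest effect size on the grid whose power reaches the chosen threshold. The sketch below is illustrative only: the function name is hypothetical, and the numbers in `toy_curve` are made up for the example and are not results from this article.

```python
def minimum_detectable_effect(power_by_beta, target=0.80):
    """Smallest effect size on the grid whose simulated power reaches the target."""
    for beta in sorted(power_by_beta):
        if power_by_beta[beta] >= target:
            return beta
    return None  # the grid never reaches the target power


# Made-up power values for illustration only; not results from the article.
toy_curve = {0.0: 0.05, 0.1: 0.40, 0.2: 0.85, 0.3: 0.97}
print(minimum_detectable_effect(toy_curve, target=0.80))
print(minimum_detectable_effect(toy_curve, target=0.90))
```

Because simulated power is itself noisy, a curve estimated from few repetitions may not be monotone. In that case the first crossing returned here should be treated as approximate.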
Consequently, studies are only powered to detect large differences between democracies and non-democracies. Given the natural limits on the number of units (countries) that can be included in cross-national analysis, scholarship is unlikely to have sufficient power unless one studies relationships where democracy has a strong effect and data have good coverage. In the Supplementary Material, I further show that this result is consistent across different choices available to researchers analyzing the effects of democracy. First, I find a similar pattern (see Figure A1 in the Supplementary Material) when using an interval-scaled measure of democracy using the v2x_polyarchy variable from V-Dem (Coppedge et al. 2022). In addition, I analyze an alternative outcome, civil war onset, that has received substantial attention in the democratization literature (see the “Democracy and Civil War” section). Event variables usually have much less variation, and a score of 1 on these variables is often rare. As a result, this scenario reflects less ideal conditions for finding an effect of democracy. Yet, these are conditions that are common in the effects of democracy literature (Gerring, Knutsen, and Berge 2022). I find that the lack of power is severe in this case. Next, I examine how these results change in cases where researchers do not have data for all countries.
Varying the Number of Countries in the Sample
Figure 2 shows the relationship between the number of countries in the dataset, effect size, and statistical power. The lack of power quickly becomes more pronounced when data are missing for some countries. When data are only available for 125 countries, democracy must cause countries to be around 18% richer on average to reach an 80% power level. If data are only available for 75 countries, democracy must cause countries to be approximately 24% richer. As a result, lacking data for a number of countries further reduces the number of relationships one can study, as effect sizes must be substantially larger for analyses to be powered.
Varying the Number of Years Included
Missing data for a number of years often does not have equally dire consequences for statistical power as missing countries, as standard errors are usually clustered on country and because missing countries can directly affect the size of the treatment group. However, as the upper graph in Figure 3 shows, a shorter time period can cause the size of the treatment group to shrink and reduce the number of countries to cluster on. Using the full sample, the treatment group includes 92 countries that transition to or away from democracy (i.e., change treatment status). If data only include years after 1970, the treatment group shrinks by almost half, as only 48 countries witness a regime change. This is because regimes tend to be sticky and change little from year to year. In addition, as the lower graph shows, the effect one does find might reflect a very different geographic treatment group than the one found when using data for all countries.
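The size of the treatment group under a given time window can be checked directly by counting the countries whose treatment status changes at least once within it. The sketch below uses hypothetical toy series and a hypothetical function name. It assumes complete 0/1 series with no missing years and treats the start year as inclusive, which may differ from how the window is defined in the article.

```python
def count_switchers(treatment_by_country, years, start_year=None):
    """Count countries whose treatment status changes at least once in the window."""
    switchers = 0
    for series in treatment_by_country.values():
        window = [d for year, d in zip(years, series)
                  if start_year is None or year >= start_year]
        if len(set(window)) > 1:  # both 0 and 1 occur, so status changed
            switchers += 1
    return switchers


# Hypothetical toy data: three countries observed in five years.
years = [1950, 1960, 1970, 1980, 1990]
toy = {
    "A": [0, 1, 1, 1, 1],  # democratizes before 1970
    "B": [0, 0, 0, 1, 1],  # democratizes after 1970
    "C": [1, 1, 1, 1, 1],  # never changes status
}
print(count_switchers(toy, years))
print(count_switchers(toy, years, start_year=1970))
```

In the toy data, country A counts as treated in the full window but drops out of the treatment group once the window starts in 1970, which mirrors the mechanism described above.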
Figure 4 displays the relationship between years included in the sample, effect size, and statistical power. As expected, the left graph, where growth is the outcome, shows that reducing the number of years included in the sample increases the effect size required for studies to be powered to detect an effect. However, somewhat surprisingly, the relationship is actually reversed when the level of economic development is the outcome. Countervailing forces are at play here. On the one hand, reducing the number of years in the sample does decrease the number of treated units and the number of total units in the sample, which lowers power. On the other hand, when the level of development is the outcome and the time series for each country is short, the country fixed effects become very good at predicting the outcome (this is much less true when growth is the outcome), which lowers the standard error. Moreover, the reduction in the number of clusters (i.e., countries) is quite small. This increase in predictive power might offset the loss of (treated) units if the outcome only changes slowly over time. Nevertheless, in many cases, a shorter time period means less power, and it always implies that we are primarily studying the effects of democracy in specific parts of the world.
Dynamic Effects and Statistical Power
If an effect of democracy on economic development exists, it is likely to be dynamic and growing over time (see Acemoglu et al. 2019). A common approach to modeling this is an event study that includes dummies for the relative years prior to and after democratization in addition to country and year fixed effects (often excluding a dummy for the year just before democracy is introduced). How does this alter the power requirements? To evaluate this, I use the estimates from Acemoglu et al. (2019), which indicate that GDP per capita grows after democratization in comparison with autocracies until about 20 years after democratization. At this point, democracies remain about 13%/16%/21% richer than autocracies (based on the multiverse analysis/Colagrossi, Rossignoli, and Maggioni 2020/and the minimum detectable effect [MDE]). To simulate this, I assume that $ \beta $ in the population grows by 0.0065/0.008/0.0105 each year after democratization and plateaus at 0.13/0.16/0.21 after 20 years. Figure 5 displays the distribution of estimates for the over-time effect of democratization. It indicates that studies are generally underpowered to detect the dynamic effect of democracy. However, the power level does reach the standard threshold around 10 years after democratization if the true average effect of democracy is as large as the minimum detectable effect (16%). Thus, it may be possible to recover large long-run effects. There are, however, two reasons to be cautious when studying dynamic effects. First, using most of the new staggered difference-in-difference estimators further increases power requirements (Chiu et al. 2023; Egerod and Hollenbach 2024). Second, if one lacks the power to detect short-term effects following democratization, it is also likely that one lacks the power to detect deviations from parallel trends prior to treatment (see also Roth 2022). Low power thus increases the risk that one misses the presence of nonparallel trends prior to democratization.
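The assumed effect path can be written as a linear ramp: the yearly increment equals the plateau divided by 20, which reproduces the stated steps (0.13/20 = 0.0065, 0.16/20 = 0.008, and 0.21/20 = 0.0105). The Python sketch below uses hypothetical function names. It assumes that the effect is zero in the first democratic year and that it resets when a country reverts to autocracy, neither of which is stated in the text.

```python
def dynamic_effect(years_since_democratization, plateau, ramp_years=20):
    """Linear ramp: zero at democratization, reaching the plateau after ramp_years."""
    if years_since_democratization <= 0:
        return 0.0
    return plateau * min(years_since_democratization, ramp_years) / ramp_years


def effect_path(treatment, plateau, ramp_years=20):
    """Per-year shift in the logged outcome for one country's 0/1 treatment series."""
    path, spell = [], 0
    for d in treatment:
        # Length of the current democratic spell; reset on reversal (assumption).
        spell = spell + 1 if d == 1 else 0
        path.append(dynamic_effect(spell - 1, plateau, ramp_years) if d == 1 else 0.0)
    return path


print(dynamic_effect(1, 0.16))
print(effect_path([0, 1, 1, 1, 0, 1], 0.16))
```

Adding the values from `effect_path` to a country's logged outcome series would inject the dynamic effect before an event-study specification is estimated.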
Interactions and Statistical Power
The effect of democracy may differ across groups, and as such, we might be interested in estimating this. Indeed, according to Colagrossi, Rossignoli, and Maggioni (2020), 38% of studies on democracy and economic development published since 2010 examine an interaction between democracy and another factor. However, as noted by Gelman (2018), interactions substantially increase the sample size required to detect an effect.
To ascertain what implications interactions have for power when using cross-national data, I randomly assign countries into two groups and vary the size of the effect of democracy within each group according to three scenarios based on the size of the difference in effect size between groups. To capture this, I include an interaction term between the democracy indicator and the group indicator in the baseline TWFE model. Figure 6 plots the share of interaction terms that are significant as a function of interaction effect size and baseline effect size. Even when the true baseline effect is very strong (MDE $ =\beta =0.16 $ ) and the true interaction effect is very large (a 200% difference between groups), studies are not powered to detect an interaction effect. Thus, it is very unlikely that studies of democracy and economic development have sufficient power to study interactions.
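One way to write the interaction specification, extending the baseline TWFE model, is given below. The symbols $ \theta $ and $ Grou{p}_i $ are labels introduced here for illustration and do not appear in the original text. Because group membership is time-invariant, its main effect is absorbed by the country fixed effects $ {\gamma}_i $ , so only the interaction term enters. Under this parameterization, $ \beta $ is the effect of democracy in the reference group and $ \theta $ is the difference in effect between the two groups, which is the quantity whose significance is assessed.

$$ Ln{(GDP/cap)}_{it}={\gamma}_i+{\delta}_t+\beta Democrac{y}_{it}+\theta \left(Democrac{y}_{it}\times Grou{p}_i\right)+{\epsilon}_{it} $$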
Implications of Low Power
Besides increasing the risk of committing a false negative (Type II error), low power has other and graver consequences for research. It increases the share of estimates with the wrong sign (Type S error), and it exaggerates the magnitude of the effect size (Type M error) (Gelman and Carlin 2014).
This is particularly problematic when power is low and there is selection on significance (Gelman 2019). Figure 7 illustrates this for studies of the effects of democracy. The left graph shows the share of significant results with the wrong sign (Type S error) at different true effects conditional on finding a significant result. When the true effect of democracy is relatively small (i.e., if democracy causes countries to be around 2.5% richer or less) and a significant effect is recovered, it is fairly likely that the result is in the opposite direction of the true effect. The right graph shows how much the effect is exaggerated, on average, compared to the true effect conditional on finding a statistically significant effect. When the true effect is small (0.05), significant results are exaggerated by 250% on average. This does drop when the true effect is larger, but even when the true effect is fairly strong (0.12), significant results are exaggerated by around 22% on average.
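Given the estimates and standard errors saved from a set of simulation runs with a known true effect, both quantities can be computed in the spirit of Gelman and Carlin (2014). The sketch below is a hedged illustration: the function name and input values are hypothetical, significance is judged with a normal critical value of 1.96, and the exaggeration is returned as a ratio (mean absolute significant estimate divided by the true effect). How that ratio maps onto the percentages reported above depends on the article's exact definition, which is not restated here.

```python
def type_s_and_m(true_effect, estimates, std_errors, critical=1.96):
    """Return (power, Type S share, exaggeration ratio) among significant estimates."""
    significant = [b for b, se in zip(estimates, std_errors)
                   if abs(b / se) > critical]
    if not significant or true_effect == 0:
        return None  # undefined without significant results or with a zero effect
    power = len(significant) / len(estimates)
    # Type S: share of significant estimates with the opposite sign of the truth.
    wrong_sign = sum(1 for b in significant if b * true_effect < 0) / len(significant)
    # Type M: average absolute significant estimate relative to the true effect.
    exaggeration = sum(abs(b) for b in significant) / len(significant) / abs(true_effect)
    return power, wrong_sign, exaggeration


# Hypothetical inputs for illustration only; not results from the article.
toy_estimates = [0.20, -0.15, 0.01, 0.30]
toy_std_errors = [0.05, 0.05, 0.05, 0.10]
print(type_s_and_m(0.05, toy_estimates, toy_std_errors))
```

In the toy input, three of the four estimates are significant, one of those three has the wrong sign, and the significant estimates are on average more than four times the true effect.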
DISCUSSION
Taken together, these results suggest that analyses are only powered to detect strong effects of democracy. Thus, in a best-case scenario, the absence of an effect of democracy for an outcome cannot be considered definitive proof that democracy has no effect on that outcome. Given the variability of the estimates and their sensitivity to the number of countries included and effect size, it is prudent to be cautious when interpreting significant effects, as these might reflect noise given the low-powered nature of cross-national studies. In the worst-case scenarios, statistically significant effects may be highly exaggerated or even in the wrong direction. Caution is further warranted as the effect of democracy is likely to be dynamic in many cases, which exacerbates power issues as appropriate estimators require additional statistical power.
What can be done about this? First, a similar simulation exercise using real data and planned research designs should be undertaken before starting a study, which would reveal whether a planned analysis is likely to be informative or not. Here one could, for instance, consider whether additional statistical power can be gained by altering the design to include between-country variation. However, researchers should be cautious and recognize that this likely trades bias from time-invariant confounders for power (this being a specific instance of the bias-variance trade-off). Second, one should be careful when interpreting the magnitude of effects found in cross-national analyses of the effects of democracy and recognize the uncertainty inherent in such estimates. In addition, one might supplement the analysis of the effects of democracy on an outcome by identifying additional implications of the theory that can also be tested. If the pattern is similar across outcomes, it raises confidence in the results. Moreover, in a small subset of cases, data are available on subnational variation in democratization (or at least on the theoretically relevant component of democracy), which can supplement the cross-national analysis (see, for instance, Grumbach 2023; Lankina and Getachew 2012 for data examples).
SUPPLEMENTARY MATERIALS
To view supplementary material for this article, please visit https://doi.org/10.1017/S0003055424001278.
DATA AVAILABILITY STATEMENT
Research documentation and data that support the findings of this study are openly available in the American Political Science Review Dataverse at https://doi.org/10.7910/DVN/CLGI63.
ACKNOWLEDGMENTS
I thank Jørgen Møller, Kaja Bakke, Martin Bisgaard, and Adea Garfui, as well as other participants at the Quality of Government internal conference, the Danish Political Science Association’s annual meeting, and CCWS at the Department of Politics and Society (Aalborg University), for their helpful comments. I would also like to thank the editors and reviewers for their thoughtful and insightful comments.
CONFLICT OF INTEREST
The author declares no ethical issues or conflicts of interest in this research.
ETHICAL STANDARDS
The author affirms this research did not involve human participants.