Donald Trump’s victory in 2016 highlighted once again that presidential elections in the United States are decided not by the national popular vote but by the Electoral College. This system can produce important distortions (see Erikson, Sigman, and Yao 2020), as was the case in 2000, when George W. Bush won the election even though his Democratic opponent Al Gore had won the popular vote by roughly a half-million ballots. But it was of course Hillary Clinton’s defeat in 2016 that brought out the nature of these distortions even more vividly. On that occasion, the Democratic candidate secured nearly three million more votes than her Republican opponent but nonetheless lost the presidency because of narrow defeats in three states: Michigan, Pennsylvania, and Wisconsin. These results strongly suggest that an adequate forecasting model for US presidential elections should ideally be able to predict election results for each state rather than for the nation as a whole.
In this short article, we present a forecasting model that draws on data from the 50 states and the District of Columbia since the 1980 presidential election. [Footnote 1] This state-by-state political economy model produces vote-share forecasts for the two major party candidates in every state, which can then be used to project the Electoral College result. The model also allows us to assess the influence of the incumbent president being absent from the presidential ticket, a feature that has unexpectedly gained considerable significance for the 2024 election.
THE STATE-BY-STATE POLITICAL ECONOMY MODEL
The state-by-state political economy model follows in the tradition of disaggregated data models initiated in the United States by Rosenstone (1983) and in France by Jérôme, Lafay, and Lewis-Beck (1993). This approach, which seeks to predict election results at the level of significant regions within a given country (states, provinces, departments, etc.) rather than nationally, has been used in the United States by a significant number of researchers, including Berry and Bickers (2012), Campbell (1992), Campbell, Ali, and Jalalzai (2006), and Klarner (2012). Our model follows a similar strategy. As Foucault and Nadeau (2012) point out, the use of regionalized data has two major advantages. First, it considerably increases the number of available observations (a great number of models are limited to postwar elections, which considerably reduces statistical power). For instance, our model relies on 561 state-level outcomes (51 units across the 11 elections from 1980 to 2020); in comparison, even if one could assemble all the presidential elections that have taken place since the American War of Independence, only 59 cases (excluding 2024) would be available for estimating a national-level model. Second, in a political system such as that of the United States, where the final outcome depends on the number of electors obtained in each state rather than the overall popular vote, localized forecasts are a valuable asset. Furthermore, if regional conditions are thought to be more relevant to the voter, then disaggregated models should provide better results. For instance, Park and Reeves (2020, 460) found that “[i]ncreases in local unemployment [in the United States] decreased the probability of voting for the incumbent party by way of influencing perceptions of the national economy.”
We focus on the geographical dimension of voting in US presidential elections in the 50 states (plus the District of Columbia), and our main explanatory factors are measured either at the state level (i.e., the unemployment rate and the presidential approval rating) or “regionalized” (i.e., Democratic or Republican regional strongholds). This model performed well in the 2020 US presidential election: it predicted that Joe Biden would win narrowly against Donald Trump with 308 electoral votes. In the end, Biden won 306 electoral votes. Our model correctly anticipated the outcome in 47 states as well as the District of Columbia. It missed three states: Arizona and Georgia were wrongly attributed to Trump, and Florida, where we predicted a razor-thin victory for Biden, went to Trump. Our state-by-state forecasts for the 2020 election can be found in figure 1.

Figure 1 US Presidential Election Forecasts and Results, 2020
Candidates’ portraits are public domain files.
The prediction rested primarily on the unfavorable economic situation that the COVID-19 pandemic created for Donald Trump in 2020: although unemployment decreased in the months leading up to the election, it peaked at almost 15% in April 2020. On average, the unemployment rate increased by 7.6 percentage points between the last quarter of 2016 and the second quarter of 2020 across all states. Therefore, the state of the economy seems to have played a major role in Biden’s victory (Jérôme et al. 2021). Economic performance is an essential part of many forecasting models. As Guntermann, Lenz, and Myers (2021, 838) point out, evidence for retrospective voting can be found “all the way back to George Washington.” Biden’s journey to the White House was also facilitated by Trump’s relatively low approval rating in many states (i.e., 42% on average six months before the vote).
The state-by-state political economy model acknowledges that American presidential elections are played out in states, some of which are virtually locked, whereas others are both unstable and sometimes decisive. Since 2000, our model has correctly predicted the result of presidential elections on four occasions (see Jérôme and Jérôme-Speziari 2012). In 2000 and 2016, the model failed to predict Al Gore’s and Hillary Clinton’s defeat in the Electoral College (see Jérôme and Jérôme-Speziari 2016). These misses led to a few adjustments, allowing us to predict Trump’s Electoral College victory in 2016 after the fact (see Jérôme et al. 2021). Among other innovations, we included the popularity of the incumbent by state, taking into account whether the sitting president was running for a second term, thus making the model fully “regionalized” in its structure. Our 2020 model also featured the construction of an index measuring the partisan composition of the legislature in each state (i.e., Democratic, Republican, or split), making it possible to assess the electoral bonus of strong local support for the major candidates.
Although our disaggregated model produced a very accurate forecast for the 2020 election, the average out-of-sample error of that model was still relatively high at 4.62 points. Furthermore, in the absence of vote intention data for independent and third-party candidates across all states since 1980, we were compelled to use actual vote shares for these candidates in all elections prior to the most recent ones. This was a major limitation of the model, as before-the-fact forecasts should be based only on information available before each election. Therefore, we revised the specification of our model, notably by withdrawing the independent and third-party candidates variable and replacing it with three binary indicators to account for the influence of John Anderson’s electoral performance in 1980 and of Ross Perot’s in 1992 and 1996 (thus, before-the-fact forecasts from 2000 onward are based solely on information available before the election). [Footnote 2] In an effort to boost accuracy, we also added the (two-party) vote share received by the incumbent party in the previous presidential election as well as the (two-party) vote share received by the incumbent party during midterm elections in each state (i.e., US House, US Senate, gubernatorial, and state legislative races for both lower and upper houses). Furthermore, whereas we measured the effect of presidential popularity using two variables in previous iterations of the model (i.e., one when the incumbent was seeking reelection and one when an in-party nonincumbent was running), these variables were replaced by an interaction term between the president’s job approval rating and a binary variable indicating whether the incumbent is running for a second term.
The other explanatory variables are the change in unemployment over the incumbent’s term in office (unemployment figures are widely reported in the news media, and job security is likely a salient issue for voters; see Anderson 2010), an index of Democratic or Republican dominance including old and more recent partisan strongholds, the challenger’s primary score, two binary variables for states in which Democrats and Republicans have enjoyed above-average success over multiple elections, and two binary variables for the District of Columbia when either Democrats or Republicans control the White House to denote the overwhelming advantage of the Democratic Party in DC. [Footnote 3]
The changes mentioned above are reflected in panel (a) of table 1 (Extended Model). Because some of the variables that are included in the original model lose statistical significance under the new specification (in great part due to the inclusion of the incumbent’s electoral score in the previous presidential election), we also present a simplified version of the revised model based on only four elements: (1) the vote received by the incumbent in each state during the previous presidential election, (2) the president’s job approval rating (and its interaction with the status of the incumbent party candidate), (3) the partisan pattern indexes, and (4) the challenger’s vote in the primaries. This model can be found in panel (b) of table 1 (Simplified Model). [Footnote 4] Note that the new versions of the model now include state fixed effects to control for all potential time-invariant omitted variables. [Footnote 5]
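For readers who prefer notation, one schematic way to write the simplified specification is sketched below. The symbols are illustrative assumptions and do not appear in the article; in particular, entering the reelection indicator as a separate term alongside its interaction with approval is assumed here rather than confirmed by the text.

```latex
V_{s,t} = \alpha_s + \beta_1 V_{s,t-1} + \beta_2 A_{s,t} + \beta_3 R_t
        + \beta_4 \,(A_{s,t} \times R_t) + \boldsymbol{\gamma}^{\top} \mathbf{P}_{s,t}
        + \delta\, C_{s,t} + \varepsilon_{s,t}
```

In this sketch, V_{s,t} is the incumbent party candidate’s two-party vote share in state s at election t, \alpha_s is a state fixed effect, A_{s,t} is the president’s job approval rating, R_t equals 1 when the sitting president seeks reelection, \mathbf{P}_{s,t} collects the partisan pattern indexes, and C_{s,t} is the challenger’s vote in the primaries.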
Table 1 State-by-State Political Economy Models: Pooled Time Series, 50 States and DC (1980–2020)

Note: Dependent variable: Incumbent party candidate’s two-party vote share. Robust state-level clustered standard errors in parentheses.
+ p < 0.1; * p < 0.05; ** p < 0.01; *** p < 0.001; (two-tailed).
Looking first at the extended model, the effects of the usual determinants of the presidential vote are clearly visible. For instance, a one-point increase in the unemployment rate of a given state in the second quarter of the election year relative to the last quarter of the previous election year leads to a loss of 0.23 percentage points in that state’s incumbent two-party vote share. Although Republicans might suffer more than Democrats from higher levels of joblessness because of issue ownership considerations, Park and Reeves (2020, 460) concluded that “[v]oters translate rising local unemployment into sanctions against the incumbent regardless of party.” The effect of presidential popularity is also significant, although the interaction term between the president’s job approval and whether the president is seeking reelection only reaches statistical significance at the 0.10 level. This interaction is graphically represented in figure 2, which shows linear predictions of two-party vote shares at different levels of job approval when an incumbent is seeking reelection and when an in-party nonincumbent is running (other variables were held at their mean value or reference category). An approval rating of 50% in a state would lead to an average gain of 6.56 points in that same state when the president stands for reelection and of 3.50 points in an open-seat election. This difference can be decisive in key states. As mentioned by Campbell (2004, 302), “the presidential incumbency advantage goes largely, if not exclusively, to first party-term incumbents […]. Barring abject failure, first party-term incumbents are virtually assured of a second term. […] The first party-term advantage appears to be so strong that all of the models should determine whether they adequately take it into account in some way.” In-party nonincumbents (including vice presidents) seem to receive only partial blame or credit for the past performance of their party. Our model bears this out.
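The logic of such an interaction can be made explicit with a short derivation. The symbols below are illustrative assumptions rather than the article’s notation: A is the state-level approval rating, R equals 1 when the president seeks reelection, and the betas stand for the corresponding coefficients in table 1.

```latex
\frac{\partial V}{\partial A} = \beta_A + \beta_{AR}\, R
\quad\Longrightarrow\quad
\left.\frac{\partial V}{\partial A}\right|_{R=1} = \beta_A + \beta_{AR},
\qquad
\left.\frac{\partial V}{\partial A}\right|_{R=0} = \beta_A
```

A positive \beta_{AR} therefore means that each approval point is worth more to a president on the ballot than to an in-party successor. This is consistent with the reported gains at 50% approval, 6.56 points versus 3.50 points, a gap of 3.06 points.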

Figure 2 Job Approval Rating × Incumbency Status Interaction
Incumbency status has taken on considerable, and somewhat unexpected, importance for this year’s election: according to our model, the Democratic Party is penalized by Joe Biden’s decision to withdraw from the race. However, this estimate does not account for concerns about Biden’s age and fitness to serve as president and beat Donald Trump, which ultimately led him to end his reelection bid and endorse his vice president, Kamala Harris, for the Democratic nomination. Although Biden’s decision to exit the race less than four months before the election is unprecedented, it has already drawn comparison with the experiences of Democratic presidents Harry S. Truman and Lyndon B. Johnson, who abandoned the idea of securing a second full term because of waning public approval in 1952 and 1968, respectively. In both cases, Republicans took back the White House. Admittedly, we cannot claim that our model is able to precisely measure the effect of Biden’s withdrawal so late in the campaign following intense media scrutiny about his age and internal pressures to drop out. However, past elections tell us that incumbent presidents generally benefit from a greater electoral advantage than do candidates seeking the office for the first time. It is unclear whether this will apply to Kamala Harris, who, at the time of writing, is slightly ahead of Trump in many state-level and national polls, although often still well inside the margin of error. Nonetheless, according to our model, Biden’s relatively low popularity should be doubly disadvantageous for Harris. Additionally, one could argue that the relatively good performance of the US economy in the past four years is unlikely to benefit Kamala Harris as much as it would have Joe Biden; Nadeau and Lewis-Beck (2001) found that macroeconomic conditions were not as strongly related to the level of support received by nonincumbent presidential candidates.
Partisanship also matters, as “old” and “new” strongholds of the incumbent’s political color offer bonuses of 2.58 and 2.01 points, respectively. Furthermore, the revised model supports once again the idea that a strong performance of the challenger in their party’s primaries can hurt the incumbent in the general election: for example, all else being equal, a score of 50% for the opposition nominee in a given state’s primaries produces a loss of 1.70 percentage points for the incumbent. As mentioned by Norpoth (2004, 740), “primary support is not just a proxy or a trial heat, but a real-life test of the candidates’ electoral performance.” Furthermore, some studies have shown that divisive primaries for various elective offices in the United States can lead to a reduced likelihood of winning the general election (see, e.g., Gurian et al. 2016; Harbridge-Yong and Hutchinson 2024).
Unsurprisingly, the lagged presidential vote constitutes a strong predictor of the incumbent party’s performance. For each percentage point won in the last election, the candidate of the incumbent party can expect to “get back” 0.72 points. Note that both midterm election results and state legislative control are statistically insignificant. The independent candidacies of John Anderson in 1980 and Ross Perot in 1992 seem to have considerably harmed Jimmy Carter’s and George H. W. Bush’s reelection bids, whereas Perot’s second electoral appearance in 1996 does not appear to have hampered (or helped) Bill Clinton’s victory. The two variables for long-term above-average electoral performance are also insignificant, whereas the Democratic advantage in DC only appears to matter when the incumbent is a Democrat.
The simplified model shows that past presidential vote, incumbency status, partisan patterns, and the primary performance of the challenger can bring us a long way in predicting state-level outcomes. The improved accuracy of the revised models is visible from jackknife out-of-sample forecasts. [Footnote 6] The previous specification of the model (Jérôme et al. 2021) correctly identified the state winner in 83.4% of cases between 1980 and 2020 (i.e., 468/561), but the extended model does slightly better at 85.4% (479/561). The simplified model proves even more accurate, with 491 states attributed to the right candidate across all 11 elections (87.5%). [Footnote 7] The original model yielded an out-of-sample mean absolute error of 4.62 points for state-level vote shares. The extended and simplified models have much lower out-of-sample mean absolute errors of 3.58 and 3.47 points, respectively. Out-of-sample forecasts of incumbent vote shares for previous elections (1980–2020) in each state obtained using the extended model are shown in figure 3.
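To illustrate what an out-of-sample error of this kind measures, here is a minimal sketch in Python. It assumes that the jackknife drops one election at a time, refits the model on the remaining elections, and predicts the held-out one; the article’s exact procedure is described in a footnote not reproduced here. The single-predictor regression, the function names, and the toy rows are all hypothetical and stand in for the full specification.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def leave_one_election_out_mae(rows):
    """rows: list of (year, predictor, outcome) tuples.

    For each election year, refit on all other years and predict the held-out year,
    then average the absolute errors over every held-out observation.
    """
    abs_errors = []
    for year in sorted({r[0] for r in rows}):
        train = [r for r in rows if r[0] != year]
        held_out = [r for r in rows if r[0] == year]
        a, b = fit_line([r[1] for r in train], [r[2] for r in train])
        abs_errors += [abs(outcome - (a + b * pred)) for _, pred, outcome in held_out]
    return mean(abs_errors)

# Hypothetical toy rows (not the article's data): outcome is exactly 10 + 0.8 * predictor,
# so the held-out error should be zero up to floating-point rounding.
toy = {2008: [45, 52, 58], 2012: [47, 50, 61], 2016: [44, 55, 57]}
rows = [(year, x, 10 + 0.8 * x) for year, xs in toy.items() for x in xs]
loo_mae = leave_one_election_out_mae(rows)
```

Because each prediction is made by a model that never saw the held-out election, this error is a fairer gauge of forecasting ability than the in-sample fit.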

Figure 3 Mean Absolute Error by State from Out-of-Sample Forecasts (Extended Model), 1980–2020
To see mean absolute error by state from out-of-sample forecasts over time, see section E of the appendix.
In terms of before-the-fact forecasts, the most important metric to assess the quality of a predictive model, both the extended and simplified models outperform our initial equation. The before-the-fact mean absolute errors for state-level vote shares between 2000 and 2020 are 3.40 and 3.10 for the extended and simplified models, respectively, versus 4.76 for the original model. Consequently, Electoral College forecasts (which are presented in figure 4; see also sections F and G of the appendix) are also considerably more accurate under the revised specifications: the mean absolute errors are 22.00 and 18.67 electoral votes for the extended and simplified models, respectively, versus 61.83 for the original model. Of the 306 state-level presidential races held since 2000, 32 are incorrectly predicted by the extended model, 35 by the simplified model, and 44 by the original model. Note that our 2020 forecast remains unchanged under the revised specifications, that is, 230 electoral votes for Trump.
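The two accuracy measures used above are simple to compute. The sketch below, in Python, uses hypothetical function names; the miss counts and the 2020 Electoral College figures (308 forecast, 306 actual) are taken from the text, and the total of 306 races is 51 units times six elections.

```python
def mean_absolute_error(forecasts, actuals):
    """Average absolute gap between paired forecasts and outcomes."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(forecasts)

def hit_rate(total_races, missed_races):
    """Percentage of races attributed to the right candidate."""
    return 100.0 * (total_races - missed_races) / total_races

TOTAL_RACES = 51 * 6  # 50 states plus DC, six elections from 2000 to 2020
misses = {"extended": 32, "simplified": 35, "original": 44}  # counts from the text
rates = {name: round(hit_rate(TOTAL_RACES, m), 1) for name, m in misses.items()}

# The only Electoral College pair quoted in the text: 308 forecast vs 306 actual (2020).
error_2020 = mean_absolute_error([308], [306])
```

Expressed as shares, the miss counts correspond to hit rates of roughly 89.5%, 88.6%, and 85.6% for the extended, simplified, and original models.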

Figure 4 Before-the-Fact Forecasts and Results: Electoral College, 2000–2020
Solid bars show before-the-fact forecasts. Semitransparent bars show actual results. Candidates’ portraits are public domain files.
The results of our model shed light on the factors that drive presidential election outcomes at the state level, which we believe is the key to a well-founded prediction of the national outcome. The explanation of local results using variables that are measured at the same level of disaggregation, notably economic performance and presidential popularity, and partisan trends that anchor a state permanently in the Democratic or Republican camp has on the whole produced quality electoral forecasts over the last few decades. Obviously, one must note the unusual circumstances in which the US presidential campaign is playing out. It is difficult to measure how Joe Biden’s historic decision to drop out of the 2024 race amidst questions regarding his health and ability to serve as president will affect the election. Additionally, the assassination attempt on Donald Trump on July 13, two days before the Republican National Convention, could not only embolden his supporters but also strengthen his appeal among moderate Republicans and undecided voters in a rallying effect. Like any forecasting effort, our model cannot really take into account the influence of such unpredictable factors as a president’s hasty departure or a candidate being targeted by a would-be assassin. Considering the exceptional nature of the 2024 campaign, we will present our forecasts for an open-seat election (with Kamala Harris as the Democratic nominee) and compare them with the now-hypothetical Biden–Trump match-up.
THE FORECAST(S)
Using the two models presented in table 1, we plugged in current numbers to estimate the incumbent party candidate’s share of the two-party vote in each state. Based on these simulations, we can make a prediction about who will win the electoral votes in each state. According to the extended model, Kamala Harris would lose the election with 197 electoral votes against 341 for Trump. Had Joe Biden remained in the race, the extended model would have predicted a completely different outcome, with 322 electoral votes for Biden and 216 electoral votes for Trump (although, once again, this forecast does not fully register the potential influence of concerns about Biden’s mental acuity and of the failed assassination attempt on Trump). The simplified model is even less encouraging for Harris, with only 184 electoral votes. [Footnote 8] According to this model, the 13 electoral votes from Virginia would go to Trump. However, both the extended and simplified models predict an extremely close contest in Virginia: according to the extended model, Harris would win this state with 50.09% of the two-party vote, whereas Trump would win Virginia with 50.01% of the vote according to the simplified model. Interestingly, there is a clear divergence between the extended model and the simplified model for the 2024 election: whereas the former would predict a Democratic victory in the presence of the incumbent president, the latter would only give a small boost to the incumbent with 206 electoral votes. Overall, according to our models, Trump would be the first president to seek and win a nonconsecutive term in office since Democrat Grover Cleveland in the late 1800s. Figure 5 shows the predicted vote share for the Democratic (Harris) and Republican (Trump) candidates using the extended model (panels a and b) as well as the predicted overall outcomes, that is, who will carry each state (panel c), for the upcoming 2024 presidential election. The “open seat” Electoral College forecast can be compared with the forecast for a Biden–Trump confrontation (panel d).
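The step from state vote shares to an Electoral College total can be sketched as follows in Python. This is an illustration under stated assumptions, not the authors’ procedure: every unit is treated as winner-take-all (which ignores the district-level allocation used by Maine and Nebraska), a share of exactly 50% is arbitrarily given to the challenger, and the function name and the two placeholder states are hypothetical. Only Virginia’s figures come from the text.

```python
def electoral_tally(forecasts):
    """Sum electoral votes under winner-take-all.

    forecasts maps state -> (incumbent party's two-party share in %, electoral votes).
    A share of exactly 50 goes to the challenger here, an arbitrary tie rule.
    """
    incumbent = sum(ev for share, ev in forecasts.values() if share > 50.0)
    challenger = sum(ev for share, ev in forecasts.values() if share <= 50.0)
    return incumbent, challenger

# Virginia's figures come from the text; "State A" and "State B" are hypothetical.
extended = {"Virginia": (50.09, 13), "State A": (55.0, 10), "State B": (45.0, 20)}
simplified = {**extended, "Virginia": (49.99, 13)}  # Trump at 50.01% of the two-party vote

extended_tally = electoral_tally(extended)      # Virginia counted for the incumbent party
simplified_tally = electoral_tally(simplified)  # Virginia counted for the challenger
```

A shift of one tenth of a point in Virginia moves 13 electoral votes from one column to the other, which is exactly the gap between the 197 and 184 votes forecast for Harris by the two models.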

Figure 5 Two-Party Vote Share and Electoral College Vote Forecasts by State (Extended Model), 2024
Figure 6 displays the likelihood of states being won by the Democratic or the Republican candidate (see also section C of the appendix). Colored gradients are used to show higher probabilities for one of the candidates. Following the Polymarket approach (see 270toWin 2024), states were classified as “tilt” (a less than 60% chance of winning), “leaning” (between 60% and 80% exclusively), “likely” (between 80% and 90% exclusively), or “safe” (90% and over) for either one of the two parties. As can be seen, Trump starts with a strong basis of 252 electoral votes in safe Republican states, whereas safe Democratic states would provide Harris with only 141 electoral votes. Thus, it seems that a Harris presidency would be the result of a rather improbable come-from-behind victory. Even by winning toss-ups and Republican-leaning states, she would still fall short of the 270 required electoral votes.
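The classification rule can be written as a short function. The Python sketch below is illustrative only: the function name is hypothetical, and the handling of probabilities that fall exactly on 60% or 80% is an assumption, since the text does not say which category those boundary values belong to.

```python
def rating(win_probability):
    """Label a state by the favored party's probability of winning (0.5 to 1.0)."""
    if not 0.5 <= win_probability <= 1.0:
        raise ValueError("expected the favored party's probability, between 0.5 and 1")
    if win_probability < 0.60:
        return "tilt"
    if win_probability < 0.80:  # assumption: exactly 60% counts as "leaning"
        return "leaning"
    if win_probability < 0.90:  # assumption: exactly 80% counts as "likely"
        return "likely"
    return "safe"               # 90% and over, as stated in the text

labels = [rating(p) for p in (0.55, 0.70, 0.85, 0.90, 0.97)]
```

With 252 electoral votes already in the safe Republican column, Trump would need only 270 − 252 = 18 more from the remaining categories.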

Figure 6 Likelihood of Winning for the Democratic and Republican Candidates (Extended Model), 2024
Created in part using MapChart (https://www.mapchart.net/usa.html).
CONCLUSION
Our results show that it is clearly possible for Donald Trump to achieve what seemed impossible just a few months ago: to become President of the United States once again. Although politicians, pundits, and ordinary citizens are still trying to figure out whether Harris stands a much greater chance of beating Donald Trump than Joe Biden did, our model casts doubt on the merits of the strategy adopted by the Democratic Party, namely replacing Biden. At the same time, our forecast for what was supposed to be a Biden–Trump rematch could not factor in the concerns about Biden’s mental fitness following a poor debate performance and highly publicized fumbles, the Trump assassination attempt, or, more recently, Robert F. Kennedy Jr.’s decision to suspend his independent campaign and endorse Trump.
Whatever happens, the theory of government accountability suggests that Kamala Harris should be held responsible for the record of the Biden administration, which is reflected in low support levels through national and local job approval ratings in June 2024. Consequently, the Democratic candidate appears to be starting from a long way off. Since her elevation to the top of the ticket, Harris has nonetheless gained ground over Trump in multiple polls. Our forecast is somewhat at odds with a number of poll aggregators and models that predict a much tighter race. The only way to judge the accuracy of our model for the upcoming election is obviously to wait for November 5th.
In an important article, Gelman and King (1993) questioned the contrast between the volatility of voting intentions during campaigns and the relative predictability of election results (see also Nadeau et al. 2020). The events of the current presidential campaign certainly have the potential to have a marked influence, in the short term at least, on the voting intentions of American voters. In Gelman and King’s view, these fluctuations should be transitory, and the results next November should reflect the informed choices of Americans based on structural variables. Therefore, we believe that the 2024 US presidential election will provide a stringent test of this hypothesis. If one thing is certain, it is that the 2024 presidential campaign should be seen as a humbling experience for election forecasters.
Going forward, we can already suggest a few avenues for improvement. One potential way to further refine the model would be to fully integrate the favorability ratings of in-party nonincumbent candidates (see Campbell and Dettrey 2009, 307), although the debate on the respective influence of retrospective and prospective determinants of voting is admittedly far from settled. The implementation of a seemingly unrelated regression model might also help in getting more precise estimates by providing vote-share forecasts not only for both major parties but also for independent and third-party candidates (see, e.g., Mongrain 2019). As Timm (2002, 316) puts it, “[t]he advantage of the SUR [seemingly unrelated regressions] model is that it permits one to relate different independent variables to each dependent variable using the correlations among the errors in different equations to improve upon the estimators.” That being said, our revised models perform well for the six elections held between 2000 and 2020. We will soon know if 2024 will be any different.
SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit http://doi.org/10.1017/S104909652400088X.
ACKNOWLEDGMENTS
We wish to thank the editors and the three anonymous reviewers for their excellent comments and suggestions.
DATA AVAILABILITY STATEMENT
Data are available at the PS: Political Science and Politics Harvard Dataverse at https://doi.org/10.7910/DVN/A9UC0H.
CONFLICTS OF INTEREST
The authors declare no ethical issues or conflicts of interest in this research.