Subject to change: quantifying transformation in armed conflict actors at scale using text

Margaret J. Foster

doi:10.1017/psrm.2024.26

Published online by Cambridge University Press: 26 September 2024

Author affiliation: Department of Political Science, Duke University, Durham, NC, USA
Email: margaret.foster@duke.edu

Abstract

An extensive theoretical and practitioner literature addresses the drivers and consequences of transformation of violent rebel actors during conflicts. However, measurement challenges constrain large-N empirical study of the effects and consequences of such transformations. This Research Note introduces a strategy to identify periods of transformation and change in the operation of non-state armed militant groups via computational text analysis of trends in reporting on activities. It presents the measurement approach and demonstrates scalability to a corpus of more than 200 militant groups operating from 1989 to 2020. The study concludes by extending a recent analysis of the impacts of uncertainty on conflict termination. An online Appendix demonstrates the advantages and drawbacks of the measurement through a series of case studies.

Keywords

civil wars; text analysis; topic modeling

Type: Research Note
Information: Political Science Research and Methods, First View, pp. 1–7

DOI: https://doi.org/10.1017/psrm.2024.26
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: Copyright © The Author(s), 2024. Published by Cambridge University Press on behalf of EPS Academic Ltd

Transformation and uncertainty are central to a wide range of scholarship on the operation of armed groups and the outcomes of civil wars, yet capturing dynamism and evolution in violent non-state actors remains difficult. Modeling change is an important link in quantitatively testing theoretical expectations about conflict. Many theorized effects of opaque-but-important internal shifts during conflict remain understudied because the literature lacks a way to measure and identify periods of uncertainty across time and space (Nilsson and Svensson, 2021).

Measuring changes within substate armed groups is challenging for several reasons. First, their internal processes are typically opaque, if not actively hidden. Second, data availability is inconsistent: some conflicts produce dense information while others remain hard to access. Likewise, limited data availability in earlier information eras constrains the scope of historical work. Third, event data record what was reported to have happened and to whom it was attributed, but rarely the processes, negotiations, compromises, and opportunities that lead to tactical and strategic decisions. Fourth, the amorphous nature of organizational transformations makes it challenging to identify change points according to a consistent metric.

This Research Note contributes to filling the gap between theory and measurement tools by proposing a strategy that can identify change points rapidly and at scale by computationally modeling news reports. A text-as-data strategy has several advantages. First, it allows the results to vary by location and context. Second, researchers can customize transition thresholds and temporal granularity while also retaining transparency. Third, because the approach models texts sourced from a widely used database of conflict events, potential transition periods can be cross-checked with vetted documentation. Finally, it can be implemented at scale.

This Note proceeds in three parts. First, I describe the measurement strategy. Second, I produce yearly estimates and a summary change indicator for 258 unique rebel groups in armed conflicts. Third, I use the output of the measurement to extend analysis by Nilsson and Svensson (2021) on the effects of uncertainty on the length of civil conflicts. An accompanying Appendix positions the method within existing approaches, discusses limitations, details text preprocessing, presents additional robustness checks, explores face validity through case studies, and further extends the replication.

1. Quantifying change points at scale

I produce a high-level encapsulation of the behavior associated with reported activities of rebel groups in a given year by using a computational text model to assign news articles to one of two group-specific topics. I then aggregate these into a yearly summary of reported activity patterns. In the sections that follow, I use the term “frame” to describe the high-level summary produced by the content of, and proportion of articles ascribed to, these topics. Concrete examples of the output can be seen in the Proof of Concept section of the Appendix. The approach conceptualizes “change” via revealed preferences, which capture actions and thus incorporate strategic decisions as well as opportunism, disobedience, shirking, or misjudgment (Beshears et al., 2008).¹

To identify themes, I use a structural topic model (STM), an unsupervised topic model. STM is well suited to modeling dynamic changes because it can incorporate document-level covariates (such as time) and has demonstrated validity for short texts on a narrow range of issues (Roberts et al., 2014, 2016). Although rarely applied in the conflict literature, STM has been used by scholars of comparative politics to identify and quantify trends in opaque processes such as corruption (Pan and Chen, 2018), individual preferences in consensus-based institutions (Baerg and Lowe, 2020), and institutional support for human rights abuses (Bagozzi and Berliner, 2018; Magaloni and Rodriguez, 2020).

A two-topic model, such as the one used here, is essentially a unidimensional text scaling model with the discovered topics serving as scale anchor points. Although scholars have, with reason, critiqued unidimensional scaling models as an oversimplification of complex dynamics (e.g., Klar, 2014), such scales are widely used to produce high-level summaries of political actors and estimates of how their goals change over time (e.g., Slapin and Proksch, 2008; Grimmer, 2010; Lauderdale and Clark, 2014; Becker and Malesky, 2017; Egerod and Klemmensen, 2020). The parsimony of a two-topic model allows the algorithm to efficiently highlight change points without the need for researchers to tune more than 200 separate models, as they would if they customized the number of topics to each group.

1.1. Data and output

I use the source articles collected in the Uppsala Conflict Data Program (UCDP) disaggregated Georeferenced Event Dataset (GED) version 21.1 (Pettersson et al., 2021). This allows the project to benefit from their systematic selection, vetting, and deduplication processes (see Sundberg and Melander, 2013). I modeled articles rather than features of events based on the rationale that the texts embed both activities and context. For example, a group may be described as rebels (context) who engage in clashes with government forces (activity) or as local militants (context) who kill civilians and regional policemen (activity).

I subset the GED to conflicts classified as intrastate, extrasystemic, or internationalized intrastate conflict. I then derived a list of the non-state organized armed groups using the UCDP Actor Dataset (Pettersson et al., 2021). This produced articles associated with 191,252 violent events from 1989 to 2020, representing the activity of 352 non-state organized armed groups (actors) across 393 unique conflict dyads. As a text model requires enough data to identify patterns in the text distributions, I removed actors with fewer than ten events and those for which fewer than ten words remained after preprocessing.
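The event-count filter described above can be sketched in a few lines. This is an illustrative sketch only: the `(actor, text)` tuple layout, the function name, and the toy records are assumptions rather than the structure of the GED or of the replication code, and the separate ten-word preprocessing filter is not shown.

```python
from collections import Counter

MIN_EVENTS = 10  # event threshold stated in the text


def actors_with_enough_events(events, min_events=MIN_EVENTS):
    """Return the set of actors that have at least `min_events` event records.

    `events` is an iterable of (actor_name, source_text) pairs; this layout
    is a hypothetical stand-in for the real event records.
    """
    counts = Counter(actor for actor, _text in events)
    return {actor for actor, n in counts.items() if n >= min_events}


# Toy records: "Group A" has 12 events, "Group B" only 3.
toy_events = [("Group A", f"report {i}") for i in range(12)]
toy_events += [("Group B", f"report {i}") for i in range(3)]

kept_actors = actors_with_enough_events(toy_events)
```

In this toy case only "Group A" clears the threshold, so only its articles would go on to form an actor-specific corpus.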

The 10-event threshold produced 2180 group-years for 258 unique armed [non-state] conflict actors, representing 190,688 recorded events from 151 uniquely named conflicts. I created an actor-specific corpus of news articles from the GED for each of these actors and estimated a two-topic STM with splined year covariates. This generated 258 separate models, one for each non-state armed actor with enough content to model. Within each actor corpus, I assigned an expected thematic proportion from that actor's Topic One and Topic Two to each source article. To identify and summarize trends, I assigned each actor-specific Topic One the value of −1 and Topic Two the value of +1. I then used the topic ratio assigned to each article to position each on the one-dimensional [−1, 1] scale.

This analysis produced a set of scaled UCDP source article texts for each conflict actor. To quantify evolution, I took the average yearly position of scaled articles. A yearly topic ratio of −1 indicates that all the articles for that year are associated with the actor's Topic One, and a ratio of +1 indicates that all are associated with Topic Two. At the midpoint, a topic ratio around 0 indicates that news coverage of the group was equally divided between the two topics.
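The text does not spell out the formula that turns an article's two topic proportions into a scale position. One natural reading, shown here as a hedged sketch, is a weighted average of the anchor values: Topic One contributes −1 and Topic Two contributes +1. The function name and the normalization step are assumptions made for illustration.

```python
def article_position(theta_one, theta_two):
    """Place one article on the [-1, 1] scale from its two topic proportions.

    Assumed formula: (-1 * theta_one + 1 * theta_two) / (theta_one + theta_two).
    With proportions that already sum to 1 this reduces to theta_two - theta_one.
    """
    total = theta_one + theta_two
    if total <= 0:
        raise ValueError("topic proportions must sum to a positive number")
    return (theta_two - theta_one) / total


all_topic_one = article_position(1.0, 0.0)    # article fully in Topic One
evenly_split = article_position(0.5, 0.5)     # article split between topics
mostly_topic_two = article_position(0.25, 0.75)
```

Under this reading an article fully assigned to Topic One sits at −1, an evenly split article sits at 0, and an article that is three-quarters Topic Two sits at 0.5.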

The magnitude of the difference between the topic ratio at year $y_t$ and $y_{t-n}$ quantifies the presence and magnitude of “change” between a given year, t, and n year(s) in the past. This produced a record of the descriptive evolution of 242 groups and 1818 group-years.² This approach to measuring and summarizing changes in armed conflict actors is flexible and customizable.³ Given yearly and lagged topic proportion data, researchers can decide how to operationalize a “change” threshold tailored to their research design. As well, the design allows the model to identify topics that are specific to each actor, and for the proportion of the topics to change non-linearly over time.
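The yearly aggregation and the n-year lagged difference can be sketched as follows. The data layout, the function names, and the decision to skip years whose lagged comparison year has no articles are all assumptions for illustration; the source does not state how gaps in a group's activity are handled.

```python
from collections import defaultdict
from statistics import mean


def yearly_topic_ratio(scored_articles):
    """Average the scaled article positions within each year.

    `scored_articles` is an iterable of (year, position) pairs, where each
    position lies on the [-1, 1] scale (hypothetical layout).
    """
    by_year = defaultdict(list)
    for year, position in scored_articles:
        by_year[year].append(position)
    return {year: mean(values) for year, values in sorted(by_year.items())}


def lagged_change(ratios, n=1):
    """Return y_t - y_{t-n} for each year t whose lag year t-n is observed."""
    return {t: ratios[t] - ratios[t - n] for t in ratios if (t - n) in ratios}


# Toy actor: coverage moves from Topic One to Topic Two, then a gap year.
toy_articles = [(2001, -1.0), (2001, -0.5), (2002, 0.5), (2002, 1.0), (2004, 0.0)]
toy_ratios = yearly_topic_ratio(toy_articles)
one_year_changes = lagged_change(toy_ratios, n=1)
two_year_changes = lagged_change(toy_ratios, n=2)
```

In the toy data the yearly ratio moves from −0.75 in 2001 to 0.75 in 2002, a one-year change of 1.5. The year 2004 has no one-year comparison because 2003 is missing, but it does have a two-year comparison against 2002.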

In the analysis that follows, I operationalize “change” as occurring in a year during which an actor experienced a change in their topic ratio of ≥|1| relative to the previous year. Substantively, this threshold suggests that in one year, reporting on the group's behavior shifted from being predominantly associated with that group's Topic One (or Topic Two) to the group-specific Topic Two (or Topic One). This is a high threshold, capturing rapid change in the descriptions of an organization's behavior.

A group-level distribution of topic proportion changes based on a one-year lag can be seen in Figure 1. The figure shows the per-group distribution of yearly topic changes for each of the GED's 174 armed conflict actors that have more than three active years and ten recorded activities. The X-axis shows the distribution of one-year lagged topic proportion changes. Militant groups with fewer changes have distributions that are concentrated around 0 and thus have a more peaked distribution. Conversely, groups with more framing instability have a flatter distribution, with more proportion changes near 2 and −2. Conflict actors are grouped by quartiles of activity levels. The first quartile contains actors with the least activity, while the fourth quartile represents the most active armed groups. Actors in the first quartile are associated with [10, 25] violent event records, actors in the second quartile are associated with (25, 75] violent events, those in the third are associated with (75, 219] events, and groups in the fourth quartile are associated with more than 219 violent events. As one would expect, for all groups, the distribution of year-on-year topic ratio changes is concentrated around zero. This indicates that most group-years have very little change in the framing scale relative to the previous year.

Figure 1. Group-level distributions of one-year topic changes.

The Supplementary Appendix provides a close analysis of the results for several groups, to illustrate the outcome and suggest face validity for using a text-based measurement strategy. The cases are chosen to cover a range of operational and situational environments. Additionally, the Appendix addresses four points of concern—selection effects, media access, model specification, and face validity—and describes in general how a researcher could analyze the results for specific groups of interest. The Appendix also discusses the opportunities (and limitations) for researchers to qualitatively interpret the meaning of the summary words in the group-specific scale. In the next section, I highlight how the measure can be used quantitatively, as part of a large-N statistical model.

2. Application and replication

To demonstrate how the modeling strategy presented above can enhance existing research into the dynamics and consequences of armed conflict, I extend analysis from Nilsson and Svensson (2021). Their research design operationalized “uncertainty” via an indicator coding whether the non-state actor in a conflict dyad put forward Islamist claims or motivations in their first year of operation. In addition to demonstrating the utility of my approach, I fill a measurement gap in the literature, as dynamic measures of group- and conflict-level uncertainty were previously unavailable to researchers of civil conflict.

I re-estimate their analysis using changes in the yearly topic ratio as an alternative operationalization of uncertainty. Rapid changes in the depiction of activities of the rebel group should reflect underlying instability in the conflict environment and be suggestive of observer uncertainty about the motivation and operational profile of the non-state actor. Following Nilsson and Svensson's logic that uncertainty prolongs conflict, such representational instability should be associated with conflicts that are more resistant to termination.⁴ The primary independent variable is an indicator that captures whether there have been any years with a rapid (as defined below) change in topic proportions. This is a time-invariant measure, intended to closely match the replicated model. The unit of analysis for the model is group-year.

I replicate the central finding of Nilsson and Svensson: conflict dyads in which the armed non-state actor begins the conflict with an Islamist claim are less likely to terminate. I then introduce three measures of change: the first, and lowest, threshold is a binary indicator for whether the group had any one-year changes in topic proportions of at least |1| (i.e., half of the range of the group-specific scale); the second is an indicator for whether the group had a shift of at least |1.5| (i.e., three-quarters of the scale's range); and the third is an indicator for whether the group experienced a single-year representational change of magnitude |2| (i.e., the entire possible range).⁵ Figure 2 summarizes the binary variables that capture whether a specific group was subject to a low (at least |1| on the [−1, 1] scale), medium (at least |1.5| on the scale), or high (at least |2| on the scale) change period across all active years. The Appendix presents a time-varying specification of the model that uses the number of changes, whether there was a recent change, and the length of time between “change” years as independent variables. Additionally, the Appendix presents an analysis of whether the results are robust to more restrictive modeling thresholds.
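The three time-invariant indicators can be sketched as a function of a group's one-year changes. The thresholds come from the text; the function name, the dictionary labels, and the use of exact comparison are illustrative assumptions. In particular, real estimated topic proportions would rarely reach a magnitude of exactly 2, so an applied version might need a small tolerance.

```python
# Thresholds taken from the text: magnitudes on the [-1, 1] scale, whose range is 2.
THRESHOLDS = {"low": 1.0, "medium": 1.5, "high": 2.0}


def change_indicators(yearly_changes, thresholds=THRESHOLDS):
    """Flag whether a group ever had a one-year change at or above each cutoff.

    `yearly_changes` is a sequence of signed one-year differences in the
    yearly topic ratio for a single group (hypothetical input layout).
    """
    largest = max((abs(change) for change in yearly_changes), default=0.0)
    return {label: int(largest >= cutoff) for label, cutoff in thresholds.items()}


moderate_group = change_indicators([0.1, -1.2, 0.4])
volatile_group = change_indicators([0.3, 2.0])
stable_group = change_indicators([])
```

Because the cutoffs are nested, any group flagged at the high threshold is also flagged at the medium and low thresholds.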

Figure 2. Distribution of new binary variables.

The original study and the extension have different temporal scopes, which warrants discussion. Nilsson and Svensson's dataset spans 1979–2013, whereas my measure of uncertainty is based on the UCDP GED, which spans 1989–2020. The results below cover 1989–2013, the intersection of the data.⁶ The data adaptation needed to incorporate my measure is described in detail in the Appendix.⁷ I keep all other parameters the same as in Nilsson and Svensson's analysis.

Figure 3 presents results for the replication and extension of the model of conflict termination. Adding the change measure strengthens the underlying finding that uncertainty prolongs conflict. Armed non-state actors that underwent periods of rapid framing changes are associated with conflicts that are significantly more intractable than conflicts associated with groups that did not experience a representational change. The “Low Change” case of a one-year shift of at least magnitude |1| on the [−1, 1] scale is associated with a conflict being 43 percent less likely to terminate. Likewise, armed groups with a one-year change of at least magnitude |1.5| (the “Medium Change” groups) are associated with a 50 percent decreased likelihood of termination. Conflicts in the “High Change” condition whereby the actor changed across the entire scale (and thus had a one-year change of magnitude of |2|) were 30 percent less likely to terminate, although estimated with lower precision than the original study. These findings are not driven by correlation between framing shocks and Islamist motivations: there is a very low correlation between Islamist motivation and having experienced rapid descriptive change.

Figure 3. Replication of termination model with change. Termination propensity is modeled via a Cox proportional hazard model to capture the effects of time. In these models, hazard ratios are interpreted relative to 1. A value above 1 indicates an increased likelihood of termination, whereas a value below 1 shows a reduced likelihood of termination. To account for the possibility of repeat events, the specifications are stratified on termination episodes.
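The relationship between hazard ratios and the "percent less likely" statements can be made concrete with a short sketch. A Cox coefficient is converted to a hazard ratio by exponentiating it, and the percent change in the hazard is the ratio's distance from 1. The ratio 0.57 used below is not taken from the article's results; it is back-calculated from the reported 43 percent reduction purely for illustration, and the function names are hypothetical.

```python
import math


def hazard_ratio_from_coefficient(beta):
    """Convert a Cox model coefficient (log hazard ratio) to a hazard ratio."""
    return math.exp(beta)


def percent_change_in_hazard(hazard_ratio):
    """Express a hazard ratio as a percent change relative to the baseline of 1."""
    if hazard_ratio <= 0:
        raise ValueError("a hazard ratio must be positive")
    return (hazard_ratio - 1.0) * 100.0


# Illustrative values only: 0.57 is implied by "43 percent less likely",
# and 0.50 by "a 50 percent decreased likelihood".
low_change_pct = round(percent_change_in_hazard(0.57))
medium_change_pct = percent_change_in_hazard(0.50)
no_effect_ratio = hazard_ratio_from_coefficient(0.0)
```

A coefficient of 0 maps to a hazard ratio of 1 (no association), negative coefficients map to ratios below 1, and positive coefficients map to ratios above 1.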

Thus far, the analysis has used a binary measure of whether a militant group experienced a period of framing change. This is the most conservative approach because it is agnostic to the precision with which the text-based measure identifies the date of the change. However, using more precise measurements may produce additional insights into the connection between group transformation and conflict termination. The Appendix extends the analysis by introducing variables that incorporate both time and frequency of group transformation.

3. Conclusion

This Note has made a methodological and substantive contribution to the scholarship of conflict dynamics and militant group operation. Methodologically, it demonstrates how scholars of conflict dynamics can apply new techniques to existing data to measure a process that has proven difficult to operationalize. Substantively, it contributes a tool that can be paired with other organizational data to produce a dynamic window into the consequences and outcomes of organizational design and operations. It takes scholars of conflict a step closer to a dynamic measure of uncertainty and opens a new avenue of inquiry into understanding which opaque organizations change their operating profile and under what conditions these changes tend to occur.

Although focused on the application to a particular form of organizational data, the Note's underlying contribution of a method to measure difficult-to-quantify behavior via recorded activity patterns can be productively applied to other domains. Future extensions can apply the method to aggregate and model information about other political entities or to proxy challenging concepts such as stance (Bestvater and Monroe, 2023). The strategy described in this Note may also be useful for scholars interested in dynamic processes more broadly, such as uncertainty in negotiations or bargaining (Fearon, 1995).

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/psrm.2024.26.

To obtain replication material for this article, see https://doi.org/10.7910/DVN/1HNSZR

Competing interests

None.

Footnotes

¹ Ceteris paribus, capturing normative, or true, preferences would provide a closer fit with theoretical expectations. However, this route is complicated by access and by strategic incentives to misrepresent true objectives at an organizational, factional, or personal level. For these reasons, quantifying normative preferences is daunting for any specific group and intractable at scale.

² Sixteen groups fall out of the analysis at this stage due to idiosyncrasies in their data.

³ Cross-group comparisons can be done on magnitude and rates of change and by thematically clustering topics, but the substantive implications of each −1 and 1 point are unique to each specific actor.

⁴ The index captures shifts in how external actors write about an organization, so one should expect some lag relative to the true process generating organizational or situational instability.

⁵ The |1| and |1.5| magnitude changes overlap closely with one and two standard deviations in the year-on-year change measure (0.74 and 1.48, respectively).

⁶ The difference in the data coverage produces a termination dataset with 1020 conflict-dyad years, versus Nilsson and Svensson's 1657 conflict-dyad years.

⁷ The extension to conflict recurrence is likewise featured in the Appendix because the differing temporal scope means that comparison to the original results is less informative.

References

Baerg, N and Lowe, W (2020) A textual Taylor rule: estimating central bank preferences combining topic and scaling methods. Political Science Research and Methods 8, 106–122.

Bagozzi, BE and Berliner, D (2018) The politics of scrutiny in human rights monitoring: evidence from structural topic models of US State Department human rights reports. Political Science Research and Methods 6, 661–677.

Becker, J and Malesky, E (2017) The continent or the “grand large”? Strategic culture and operational burden-sharing in NATO. International Studies Quarterly 61, 163–180.

Beshears, J, Choi, JJ, Laibson, D and Madrian, BC (2008) How are preferences revealed? Journal of Public Economics 92, 1787–1794.

Bestvater, SE and Monroe, BL (2023) Sentiment is not stance: target-aware opinion classification for political text analysis. Political Analysis 31, 235–256.

Egerod, B and Klemmensen, R (2020) Scaling political positions from texts: assumptions, methods and pitfalls. The Sage handbook of research methods in political science and international relations. Thousand Oaks: Sage, pp. 498–521.

Fearon, JD (1995) Rationalist explanations for war. International Organization 49, 379–414.

Grimmer, J (2010) A Bayesian hierarchical topic model for political texts: measuring expressed agendas in Senate press releases. Political Analysis 18, 1–35.

Klar, S (2014) A multidimensional study of ideological preferences and priorities among the American public. Public Opinion Quarterly 78, 344–359.

Lauderdale, BE and Clark, TS (2014) Scaling politically meaningful dimensions using texts and votes. American Journal of Political Science 58, 754–771.

Magaloni, B and Rodriguez, L (2020) Institutionalized police brutality: torture, the militarization of security, and the reform of inquisitorial criminal justice in Mexico. American Political Science Review 114, 1013–1034.

Nilsson, D and Svensson, I (2021) The intractability of Islamist insurgencies: Islamist rebels and the recurrence of civil war. International Studies Quarterly 65, 620–632.

Pan, J and Chen, K (2018) Concealing corruption: how Chinese officials distort upward reporting of online grievances. American Political Science Review 112, 602–620.

Pettersson, T, Davies, S, Deniz, A, Engström, G, Hawach, N, Högbladh, S and Öberg, MSM (2021) Organized violence 1989–2020, with a special emphasis on Syria. Journal of Peace Research 58, 809–825.

Roberts, ME, Stewart, BM and Airoldi, EM (2016) A model of text for experimentation in the social sciences. Journal of the American Statistical Association 111, 988–1003.

Roberts, ME, Stewart, BM, Tingley, D, Lucas, C, Leder-Luis, J, Gadarian, SK, Albertson, B and Rand, DG (2014) Structural topic models for open-ended survey responses. American Journal of Political Science 58, 1064–1082.

Slapin, JB and Proksch, S-O (2008) A scaling model for estimating time-series party positions from texts. American Journal of Political Science 52, 705–722.

Sundberg, R and Melander, E (2013) Introducing the UCDP georeferenced event dataset. Journal of Peace Research 50, 523–532.
