Abstract

We present a machine learning-based methodology to forecast conflict. The forecast addresses both the long-term risk of violent conflict (defined as organized violence resulting in at least 10 fatalities over a 12-month period) and a short-term estimate of the number of violent events (defined as events due to organized violence resulting in at least one fatality over the next two months).

The long-term risk of violent conflict uses a classification approach called random forest (RF). It predicts whether a region will experience violent conflict (yes or no) over the coming 12 months. The short-term estimate of the number of violent events uses a deep learning technique called long short-term memory (LSTM) neural network for time series prediction. It predicts the intensity and direction of conflict in the coming two months.

When both models are applied to historic data of past conflict events not used in training or testing the machine learning methods, the RF focusing on the long-term risk of violent conflict captures 86% of future conflicts. The LSTM model estimates the number of conflict events for the coming two-month period; these predicted event counts are, on average, within ±2 events of the observed event counts at the first-order administrative level, e.g. states or provinces within a country.

A web-based tool that houses the models allows users to explore forecasts and individual predictor indicators spatially and through time, providing additional information on underlying vulnerabilities as a first step toward enabling timely, effective water-related interventions to mitigate conflict and build peace.


Overview

The overall objective of the Global Tool of the Water, Peace and Security partnership is to offer a platform where actors from the global defense, development, diplomacy, and disaster relief sectors (among others) and national governments of developing countries can identify conflict hotspots before violence erupts, begin to understand the local context, and prioritize opportunities for water interventions. This requires information that is timely, accurate, and actionable. To meet user demand for early warning information, our ambition is to release an updated forecast every month. The tool allows users to explore the forecast, as well as underlying model inputs and contextual indicators across a variety of domains.

The foundation of our forecasting is an expansive library of quantitative indicators potentially related to conflict. The indicators used in our model—predictor variables—are available for exploration as both interactive maps and time series. Some indicators are not fit for use in a quantitative model but are nonetheless useful for decision-making. The tool includes these contextual indicators alongside the model inputs as spatial data. All tool functionality—access to datasets, metadata, and geospatial visualization specifications—is powered by the Resource Watch API, an open-source service designed to easily integrate into user workstreams.

The WPS Global Early Warning Tool is not intended to elucidate causal relationships between the predictor variables and conflict. It does, however, highlight instances of water shocks (e.g. heavy rains or drought), reflecting our audience’s interest in water-related interventions. While we do not know or claim that water shocks drive conflict, we do believe they are important for screening where on-the-ground adaptation measures are needed and can support work towards climate-sensitive peacebuilding.

 

Model performance

Long-term RF model

When applied to a historic dataset, overall, the model captures 86% of future conflicts, successfully forecasting over 9 out of every 10 ongoing conflicts and 6 out of 10 emerging conflicts. The trade-off for this high recall is low precision for emerging conflicts. Around 80% of all emerging conflict forecasts represent false positives, that is, instances where conflict was forecasted but did not actually occur. Ongoing conflicts have both high recall and high precision (<1% were false positives).
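The recall and precision figures above follow directly from the counts of hits and false alarms. As a minimal illustration (using hypothetical counts chosen to match the reported rates, not the actual evaluation data):

```python
def recall(tp, fn):
    """Share of actual conflicts that the model forecast correctly."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Share of conflict forecasts that proved correct."""
    return tp / (tp + fp)

# Hypothetical counts for emerging conflicts, chosen to match the
# reported rates (not the actual evaluation data): 60 of 100 emerging
# conflicts caught, at the cost of 240 false alarms.
print(recall(60, 40))      # 0.6 -> "6 out of 10 emerging conflicts"
print(precision(60, 240))  # 0.2 -> ~80% of forecasts are false positives
```

The trade-off is visible in the numbers: catching most emerging conflicts (high recall) comes at the price of many false alarms (low precision).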

Short-term LSTM model

The nature of violent conflict, with its complex direct and indirect pathways, and our choice of conflict event counts as the target, direct the performance assessment of the model towards comparison against basic predictor (baseline) models, such as a naïve forecast or something as simple as the average of the observed conflict events. Using measures such as the Mean Absolute Scaled Error (MASE) and the Root Relative Squared Error (RRSE), we compare our deep learning prediction model against these baselines.

The MASE measures the effectiveness of the LSTM conflict event prediction by comparing the predicted values with a naïve forecast, which sets each prediction equal to the observed value in the previous time step. The MASE scales the Mean Absolute Error (MAE) of the prediction to the MAE of this baseline model. A MASE of 1 therefore indicates that the prediction model performs no better than the baseline, while a MASE below 1 means the prediction model outperforms the baseline in terms of absolute differences between observed and predicted event counts. The MASE of our conflict event prediction model at the first-order administrative level is 0.56: compared to the naïve forecast baseline, the LSTM prediction model decreases the MAE by 44 percent.

The Root Relative Squared Error (RRSE) is another measure of the effectiveness of the conflict event prediction relative to a baseline model, in this case one that always predicts the average of the observed conflict events. As with the MASE, an RRSE of 1 indicates performance equal to the baseline, and an RRSE below 1 means the prediction model outperforms the baseline in terms of squared differences between observed and predicted event counts. The RRSE of our conflict event prediction model at the first-order administrative level is 0.73: compared to the average-value baseline, the LSTM prediction model decreases the root relative squared error by 27 percent.

We also computed the Root Mean Square Error (RMSE). This measure estimates the standard deviation of the distribution of errors between the observed and predicted conflict event counts. The RMSE of the LSTM prediction model is 2, meaning that we expect the next conflict event count prediction to be, on average, within ±2 events of the observed number of events.
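These three measures are straightforward to compute. The sketch below implements MASE, RRSE, and RMSE as defined above on a toy series of event counts (illustrative numbers only, not model output):

```python
import numpy as np

def mase(obs, pred):
    """Mean Absolute Scaled Error: MAE of the model scaled by the MAE
    of a naive forecast (each prediction = previous observed value)."""
    naive_mae = np.mean(np.abs(obs[1:] - obs[:-1]))
    return np.mean(np.abs(obs[1:] - pred[1:])) / naive_mae

def rrse(obs, pred):
    """Root Relative Squared Error: squared error of the model relative
    to a baseline that always predicts the mean of the observations."""
    return np.sqrt(np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))

def rmse(obs, pred):
    """Root Mean Square Error: spread of the prediction errors."""
    return np.sqrt(np.mean((obs - pred) ** 2))

obs = np.array([2.0, 4.0, 3.0, 5.0])   # observed two-month event counts
pred = np.array([2.0, 3.0, 4.0, 5.0])  # model predictions
print(mase(obs, pred))  # 0.4 -> beats the naive forecast
print(rrse(obs, pred))  # ~0.63 -> beats the mean baseline
print(rmse(obs, pred))  # ~0.71
```

On this toy series the model beats both baselines, just as the reported MASE of 0.56 and RRSE of 0.73 indicate for the actual forecast.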

 

Application

The two models within our forecast are designed to help organizations address conflict in two distinct ways. The long-term risk indicator can be used for strategic planning, while the short-term estimate can provide detailed information on the nature of upcoming conflict that can be useful for on-the-ground operations.

Like an initial medical screening, the long-term RF model is optimised to flag all concerning cases for further analysis. In other words, we would rather wrongly forecast the presence of conflict than incorrectly forecast its absence (i.e. ‘peace’, in the strictly negative sense). The downside to this decision is that the long-term forecast overestimates conflict. Users interested in the ongoing conflict forecasts can have high confidence in the forecast, and may feel comfortable acting on this information immediately. For emerging conflicts, users can view these results as a ‘first screening’, feeling confident that our ‘net’ has caught most emerging conflicts, but acknowledging they are interspersed with many instances of peace.

The LSTM offers greater detail on the nature of the conflict, including its intensity and direction. It can be used by local and regional teams operating on the ground and is meant for more agile decision-making. On average, users can expect the LSTM forecast to be correct within ±2 events. Results are strongest in Africa and weaker in the Middle East and South Asia, where there is a shorter record of conflict events to train the model.

 

Technical Details

Model Type

Long-term RF Model

An RF algorithm generates an ensemble of decision trees. Each tree uses a random subset of the predictor variables to make decisions about the value of the dependent variable. Each decision, or branch, leads the tree closer to its final prediction, the leaf. The forest then tabulates the individual trees’ predictions in a vote, with the most popular outcome becoming the overall model’s prediction. Once the RF model is trained using known inputs and outputs, we can run a new sample of predictor variable values through the model to generate a forecast.
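As a minimal, self-contained sketch of this bootstrap-and-vote mechanism (using single-split ‘stumps’ in place of full decision trees, and illustrative data rather than the actual predictor set):

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_stump(X, y):
    """A 'tree' reduced to a single branch: pick the feature/threshold
    split that best separates the two classes, remembering whether the
    'greater than' side should vote 1 (conflict) or 0 (no conflict)."""
    best = (0, 0.0, 1, 0.0)  # (feature, threshold, sign, accuracy)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            above = (X[:, j] > t).astype(int)
            for sign in (1, 0):
                pred = above if sign else 1 - above
                acc = (pred == y).mean()
                if acc > best[3]:
                    best = (j, t, sign, acc)
    return best[:3]

def fit_forest(X, y, n_trees=25):
    """Each tree sees a bootstrap sample of rows and a random subset
    of the predictor variables."""
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(X), len(X))           # bootstrap sample
        cols = rng.choice(X.shape[1], 3, replace=False)  # random predictors
        j, t, sign = fit_stump(X[np.ix_(rows, cols)], y[rows])
        forest.append((cols[j], t, sign))
    return forest

def predict(forest, X):
    """Tabulate the individual trees' predictions in a majority vote."""
    votes = np.array([(X[:, j] > t).astype(int) if sign else (X[:, j] <= t).astype(int)
                      for j, t, sign in forest])
    return (votes.mean(axis=0) > 0.5).astype(int)

# Illustrative data: the label is driven by the first predictor only.
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)
forest = fit_forest(X, y)
print((predict(forest, X) == y).mean())  # high in-sample accuracy
```

Production RF implementations grow full trees rather than stumps, but the ensemble logic, random predictor subsets, bootstrap samples, and a majority vote, is the same.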

Short-term LSTM model 

The LSTM is a deep learning approach capable of learning long-term dependencies; LSTMs are explicitly designed to remember information over long periods. They learn both a mapping from inputs to outputs and which context from the input sequence is useful for that mapping, and they can even change this context dynamically as needed. LSTMs can thus be stacked into large recurrent networks that address difficult sequence problems. We chose LSTMs for modelling time series of conflict event counts because of their ability to learn the order dependence between events in a sequence. These deep learning approaches do not need a pre-specified, fixed context for making predictions but can learn this context from the training data.

We designed a stacked LSTM architecture followed by a standard feed-forward (dense) layer. After each LSTM layer, a ‘dropout’ layer is added to avoid overfitting. Since target and feature data are available at different time scales, only the monthly data (conflict event counts and SPI values) are fed to the model as sequences. The length of the sequences fed into the model is 48 timesteps, equaling 4 years of monthly data. Both yearly features (population data) and non-temporal features (rural population, rainfed agriculture value, and seasonal (intra-annual) variation) are introduced as auxiliary data into the model outside of the LSTM.
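The exact preprocessing pipeline is not spelled out in this note, but shaping the monthly inputs into 48-timestep sequences might look as follows (array and function names are our own, not the partnership's code; the static auxiliary features would be joined to the network outside the LSTM, as described above):

```python
import numpy as np

def make_sequences(events, spi, window=48, horizon=1):
    """Slice two monthly series for one admin-1 unit into overlapping
    48-step input windows (4 years of monthly data), with the event
    count one step past each window as the target."""
    feats = np.stack([events, spi], axis=-1)  # (months, 2)
    X, y = [], []
    for start in range(len(events) - window - horizon + 1):
        X.append(feats[start:start + window])
        y.append(events[start + window + horizon - 1])
    return np.array(X), np.array(y)

# Hypothetical unit with 10 years of monthly data.
rng = np.random.default_rng(1)
events = rng.poisson(2.0, 120).astype(float)  # monthly conflict event counts
spi = rng.normal(0.0, 1.0, 120)               # monthly SPI values
X, y = make_sequences(events, spi)
print(X.shape, y.shape)  # (72, 48, 2) (72,)
```

Each sample is a (48, 2) array, which is the shape a stacked LSTM expects for its sequence input.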

 

Unit of Analysis

The first-order administrative unit is the spatial unit of analysis for both machine learning methods. These boundaries are based on the Database of Global Administrative Areas (GADM 2018). Note: the original RF was trained using the second-order administrative unit.

The current geographic scope is limited to Africa, Western Asia, South and Southeast Asia.

Data Tranches Used to Train, Calibrate, and Assess the Models

 

Model | Fit Model | Estimate Error/Fine-tune | Assess Performance
RF | January 2004 - May 2016 | June 2017 - December 2017 | January 2018 - June 2018
LSTM | January 2008 - December 2017 | January 2019 - June 2021 | July 2021 - February 2022

 

Dependent variable

We use the Armed Conflict Location & Event Data Project (ACLED) to develop our dependent variable (Raleigh et al. 2010). This database contains events of organized violence with high temporal and spatial resolution. Each event in the database is assigned a precise date and location and is classified following a codebook (ACLED, 2019).

ACLED records conflict events at the ‘atomic unit’, regardless of whether an event resulted in fatalities, and captures for each event its date, location, agent type, and event type. This also means that ACLED does not designate larger conflicts but allows users to make their own selections and aggregations based on, for example, event type, actor type, interaction type, named actor, location, or time period.

We chose conflict event counts over a two-month period as the model target. The use of event counts is in accordance with other studies based on ACLED data (see for example Raleigh & Kniveton 2012; O’Loughlin et al. 2014; Witmer et al. 2017), but differs from conflict prediction models that use a binary target (‘yes/no’ conflict state) and predict a probability of conflict. Landis (2014) states that ACLED violent events are far more frequent and often correlated across time, as one event makes another more likely to occur. This tendency makes it difficult to discern independence from one event to another for the purposes of coding unique and independent onsets.

Furthermore, for the deep learning approach we refrained from using fatality numbers as a dependent variable because these numbers are often the most biased and poorly reported component of conflict data (Raleigh & Kishi, 2020). Although ACLED codes the most conservative reports of fatality figures to minimize overcounting, Raleigh & Kishi (2020) state that this information cannot reliably be used to distinguish larger conflicts or conflict actors from each other. Conflict events are included in both models only when they resulted in at least one fatality and have a reasonably high geoprecision (coded as geoprecision level 1 or 2). The event types used in both the RF and the LSTM can be found in the table below.

List of ACLED Events Included in the Models
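The event selection and two-month aggregation described above can be sketched as follows (field and function names are illustrative, not ACLED’s actual column names):

```python
from collections import Counter

# Hypothetical ACLED-style records (field names for illustration only).
events = [
    {"adm1": "A", "month": 1, "fatalities": 3, "geo_precision": 1},
    {"adm1": "A", "month": 2, "fatalities": 0, "geo_precision": 1},  # dropped: no fatalities
    {"adm1": "A", "month": 2, "fatalities": 5, "geo_precision": 3},  # dropped: imprecise location
    {"adm1": "A", "month": 2, "fatalities": 1, "geo_precision": 2},
    {"adm1": "B", "month": 3, "fatalities": 2, "geo_precision": 1},
]

def two_month_counts(events):
    """Count qualifying events per admin-1 unit and two-month period,
    keeping only events with at least one fatality and geoprecision 1-2."""
    counts = Counter()
    for e in events:
        if e["fatalities"] >= 1 and e["geo_precision"] in (1, 2):
            period = (e["month"] - 1) // 2  # months 1-2 -> 0, 3-4 -> 1, ...
            counts[(e["adm1"], period)] += 1
    return counts

print(two_month_counts(events))  # Counter({('A', 0): 2, ('B', 1): 1})
```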

Predictor variables

The Water, Peace and Security Partnership has made a large set of indicators available on the online platform for inclusion in a prediction model (see Kuzma et al. 2020). The selection of predictor variables for the RF model is based on a feature-importance analysis. Of the 80+ indicators tested, the following indicators were most relevant for producing the forecasts and are therefore used in our current model:

List of Predictor Variables Used by the Final RF Model
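The note does not specify which feature-importance method was used; one common approach is permutation importance, sketched here on toy data (the lambda ‘model’ stands in for a trained RF):

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_importance(model, X, y, n_repeats=10):
    """Importance of each predictor = average drop in accuracy when
    that column is shuffled, breaking its link to the outcome."""
    base = (model(X) == y).mean()
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            drops.append(base - (model(Xp) == y).mean())
        importances[j] = np.mean(drops)
    return importances

# Toy 'model' that only uses the first predictor.
model = lambda X: (X[:, 0] > 0).astype(int)
X = rng.normal(size=(500, 3))
y = model(X)
print(permutation_importance(model, X, y).round(2))  # first feature dominates
```

Indicators with near-zero importance can be dropped, which is how a pool of 80+ candidates gets whittled down to a compact predictor set.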