Lustick Consulting's ABM Verification & Validation


September 2015

The key to Lustick Consulting's Verification & Validation (V&V) process is iterative analysis of model inputs, theory, outputs, and empirical indicators. V&V should not be a static, one-time event, but rather an ongoing process of hypothesizing, testing, and confirmation or disconfirmation.[1] To implement this goal, Option I of our project called for six quarterly Verification & Validation reports from August 2012 to December 2013 that included our system status, data inputs, and model forecasts. In addition, we performed a MESA Epistemological Decomposition of our Venezuela model in May 2012 to unpack the data and theory behind the model.[2] Lastly, we held multiple sessions with two Subject Matter Experts, Dr. Allen Hicken and Dr. David Faris, who helped us better understand how our models relate to the real world. Both experts found our approach compelling in relation to their countries of expertise (Egypt, Indonesia, the Philippines, and Thailand).

There are many other ways to measure verification and validation, including a model's consistency, prominence, and accuracy with respect to other research in the discipline. We can also measure a model's utility to users, since an effective model should be useful for some specified purpose. Of course, the classic way to validate a model in modern science is to create a ground-truth dataset and use that real-world data to test the model in- and out-of-sample. In this section, we test the validity of our country models by fitting a simple logistic regression that uses key indicators from the ABM output to predict the likelihood of five Events of Interest (EOIs) from the ICEWS project: Domestic Political Crisis, Insurgency, Rebellion, Ethnic/Religious Violence, and International Crisis.

However, forecasting discrete events, per se, is not a fully appropriate measure of performance for our modeling technique. This shortcoming stems from the greater flexibility of our model, which seeks to provide information not only about what could happen or did happen, but also about how and why those things could or did happen. Our model can also examine the likelihood of what could have happened, and how and why that event might have occurred. From a causal, theoretical, and policy-planning point of view, it is crucial to understand that much of what actually happens is random, i.e. theoretically uninteresting and causally inaccessible. In other words, a model can be excellent and still be wrong about a particular forecast. The better and more difficult validation procedure must compare patterns of forecasts to patterns of outcomes, not merely score forecasts of discrete events.

[1] For more information on Lustick Consulting's deep Verification & Validation work, see Lustick and Tubin (2012).
[2] For more information on the Model Evaluation, Selection, and Assessment (MESA) project, see Ruvinsky et al. (2012).
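As a minimal sketch of the validation exercise just described, the snippet below fits a logistic regression from ABM output indicators to one binary EOI and scores it in- and out-of-sample with the Brier score (the metric used in Figure 1 below). The file name, column names, and split date are illustrative assumptions, not the actual LC data schema.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

df = pd.read_csv("abm_output.csv")  # hypothetical country-month panel
indicators = ["attack", "protest", "lobby", "victim"]  # plus the DPH variables

train = df[df["date"] < "2013-08"]   # in-sample period (assumed split date)
test = df[df["date"] >= "2013-08"]   # out-of-sample period

model = LogisticRegression().fit(train[indicators], train["rebellion"])
p_in = model.predict_proba(train[indicators])[:, 1]
p_out = model.predict_proba(test[indicators])[:, 1]

# Brier score: mean squared difference between forecast probability and outcome
print("in-sample Brier:", brier_score_loss(train["rebellion"], p_in))
print("out-of-sample Brier:", brier_score_loss(test["rebellion"], p_out))
```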

The computational steering process we use to update the models allows us to run them from the beginning of 2001, but we treat the first three years of each run as a burn-in period, a common practice in agent-based modeling applications. We also start four of our models later than 2001 due to lack of data and severe regime shifts in some countries. In all models we gather thirteen key output variables that we believe are the most likely drivers of the ICEWS EOIs. These include:

1. Mobilization indicators: attacks, protests, lobbies, and victims
2. Dynamic Political Hierarchy (DPH) indicators: dominant, incumbent, regime, system, and non-system subscription and activation (for more information on the Dynamic Political Hierarchy, see Lustick et al. 2012)

We then fit a simple logistic regression from these variables to each of our five binary EOIs. All variables were included for all models, and no attempt was made to tweak the model for performance. We intend these results to be a proof of concept, not a set of stable LC forecasts. To gauge the accuracy of our results, we compare our in-sample and out-of-sample forecasts to the Ensemble Bayesian Model Averaging (EBMA) model from the ICEWS project (Montgomery et al., 2012). In general, we have found that our models forecast events well and rival the EBMA in some cases.

Figure 1: Brier scores by EOI and country, broken down by the ABM and EBMA results

Figure 1 above compares the Brier scores[3] for the ABM and EBMA forecasts for each EOI and country. We can visualize these metrics by creating separation plots by EOI to show how many cases we have classified correctly (Figure 2). In the separation plots, predictions are ordered from least likely to most likely, and a red bar indicates a true EOI. An accurate separation plot tends to have high predictions during true events on the right side of the graph and low predictions during false events on the left side. We can also check our model's accuracy by looking at the Sensitivity (percentage of 1s correct), Specificity (percentage of 0s correct), and φ[4] statistics (see Table 1).

[3] https://en.wikipedia.org/wiki/brier_score
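As a companion to Figure 2, here is a minimal sketch of a separation plot, assuming arrays p (forecast probabilities) and y (0/1 outcomes) such as those produced by the regression sketch above.

```python
import numpy as np
import matplotlib.pyplot as plt

def separation_plot(p, y):
    order = np.argsort(p)                 # least likely -> most likely
    p, y = np.asarray(p)[order], np.asarray(y)[order]
    fig, ax = plt.subplots(figsize=(8, 1.5))
    for i, outcome in enumerate(y):
        if outcome == 1:                  # red bar marks a true EOI
            ax.axvspan(i, i + 1, color="red", alpha=0.6)
    ax.plot(np.arange(len(p)) + 0.5, p, color="black")  # probability trace
    ax.set_xlim(0, len(p)); ax.set_ylim(0, 1); ax.set_yticks([])
    return ax
```

Calling separation_plot(p_out, test["rebellion"]) on the out-of-sample arrays from the earlier sketch would draw the corresponding diagnostic: a well-calibrated model concentrates the red bars on the right, where predicted probabilities are highest.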

To calculate the metrics that require a cut-point, the value 0.5 was used. (This cut-point was chosen ahead of time, not calibrated to improve performance.)

Figure 2: Separation plots of in-sample and out-of-sample results

Table 1: Agent-based Model Forecast Metrics, In- and Out-of-sample

EOI                          % Correct   Sensitivity   Specificity   False Pos.   False Neg.   φ
Domestic Political Crisis    82.3        46.65         95.54         4.46         53.35        51.29
Rebellion                    83.85       82.24         84.54         15.46        17.76        63.96
International Crisis         91.68       74.23         96.24         3.76         25.77        73.75
Insurgency                   79.26       48.54         96.97         3.03         51.46        55.15
Ethnic/Religious Violence    88.54       63.76         95.8          4.2          36.24        65.33

[4] The φ (phi) coefficient is a method for measuring the correlation of binary variables. https://en.wikipedia.org/wiki/phi_coefficient
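The Table 1 metrics can be computed from the same arrays. A minimal sketch, assuming p and y as before and the fixed 0.5 cut-point described above:

```python
import numpy as np

def cutpoint_metrics(p, y, cut=0.5):
    yhat = (np.asarray(p) >= cut).astype(int)  # classify at the fixed cut-point
    y = np.asarray(y)
    tp = np.sum((yhat == 1) & (y == 1))
    tn = np.sum((yhat == 0) & (y == 0))
    fp = np.sum((yhat == 1) & (y == 0))
    fn = np.sum((yhat == 0) & (y == 1))
    sensitivity = tp / (tp + fn)          # share of 1s classified correctly
    specificity = tn / (tn + fp)          # share of 0s classified correctly
    # phi coefficient: Pearson correlation between the two binary variables
    # (undefined when any margin of the 2x2 table is empty)
    phi = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sensitivity, specificity, phi
```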

Figure 3: EOI forecasts for Yemen

To give an example, Figure 3 shows four EOI forecasts for our Yemen model, comparing the predictions of the EBMA and ABM models. Both models seem to do poorly in predicting the Yemen Rebellion (REB) and Domestic Political Crisis (DPC), but do modestly well in forecasting the Insurgency (INS) and Ethnic/Religious Violence (ERV) before they occurred. Note that forecasts made after July 2013 are completely out-of-sample.

To show the volume of data used in this analysis, Figure 4 plots the ABM output data for each country-month combination within our timeframe. Changes in the ABM output data are caused by a combination of exogenous punctuations, continuous computational steering, and internal model dynamics. The x-axis for each country shows the month and the y-axis shows the value of each ABM output variable; the y-axis scales may vary by country. Lobby, protest, attack, and victim measure the average number of agents in each country-month that are mobilizing in different ways, representing levels of discontent and isolation that rise to the level of mobilization. The DPH subscription variables measure, for each DPH level, the average number of agents for which that level is the highest to which they are subscribed. For example, high levels of system and non-system subscription tell us that many agents in the landscape do not have a dominant, incumbent, or regime identity in their repertoire and are therefore alienated from the center of politics. The last set of variables, DPH activation, is the average number of agents activated on an identity at each level of the DPH.

High regime activation simply means that many agents are activated on some regime identity, though they may also be subscribed to dominant, incumbent, or system identities. All of these variables have consistent meanings across countries and are therefore well suited to a large-N cross-country analysis.

Figure 4: Main ABM output variables over time

We also ran a similar experiment as part of the ME-CEWS project to forecast the number of violent events per province per week in seven Middle Eastern countries. We used the same method described above, but with a simple ordinary least squares linear model whose predictions are floored at zero. We also used only a subset of the variables from the EOI country-level forecasts: attack, protest, lobby, and our DPH subscription measures (eight variables in total). We again compare our model to the EBMA results, but different metrics are needed to measure the accuracy of a count variable as opposed to a binary variable. Table 2 compares the ABM and EBMA validation metrics for weekly forecasts of violent events across 125 provinces between January 2004 and September 2015. The correlation is weak for both models, meaning that neither did particularly well in forecasting violent events. Although the ABM Root Mean Squared Error is higher, our Mean Absolute Error is lower, indicating that the ABM produces more outlier forecasts. Perhaps most interesting is that the correlation between the two models' forecasts is only .38, which means that our model would likely improve the EBMA forecast if included in the averaging algorithm.
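A minimal sketch of this count-forecast setup, fitting an ordinary least squares model, flooring its predictions at zero, and scoring with the Table 2 metrics. The file name and column names are illustrative assumptions, not the actual LC or ICEWS schema.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("province_weeks.csv")  # hypothetical province-week panel
features = ["attack", "protest", "lobby",
            "dph_dominant", "dph_incumbent", "dph_regime",
            "dph_system", "dph_nonsystem"]  # the eight variables (names assumed)

ols = LinearRegression().fit(df[features], df["violent_events"])
pred = np.maximum(ols.predict(df[features]), 0)   # floor forecasts at zero

y = df["violent_events"].to_numpy()
rmse = np.sqrt(np.mean((pred - y) ** 2))          # penalizes large misses heavily
mae = np.mean(np.abs(pred - y))                   # less sensitive to outliers
corr = np.corrcoef(pred, y)[0, 1]                 # Pearson's correlation
print(rmse, mae, corr)
```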

Table 2: Empirical validation for violent event forecasts

                           Agent-based Model (ABM)   Ensemble Bayesian Model Averaging (EBMA)
Root Mean Squared Error    10.99                     10.26
Mean Absolute Error         2.74                      2.79
Pearson's Correlation       .33                       .46

To show the richness of the output, Figure 5 shows the weekly ABM output for each Syrian province between 2012 and 2017 (our forecast end date). There is significant variation from province to province due to differences in support for factions (Assad, Kurds, ISIS, Rebels), identity complexion (ethnic, religious, political), and steering data coming from media reports picked up in the ICEWS event data.

Figure 5: ABM output for Syrian provinces

The main advantage of a large-N approach for problems like these is that the logistic regression can find patterns that a human could not. The downside is that all of the results are correlational, meaning that they do not necessarily describe real causal processes.
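The ensemble logic behind the .38 correlation noted above can be illustrated on synthetic data (not LC results): averaging two comparably accurate forecasts whose errors are only weakly correlated yields a lower RMSE than either forecast alone, because the errors partially cancel.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = rng.poisson(3.0, size=10_000).astype(float)  # synthetic event counts

# Two equally noisy forecasts whose errors correlate at roughly .38
e1 = rng.normal(0, 2, truth.size)
e2 = 0.38 * e1 + np.sqrt(1 - 0.38**2) * rng.normal(0, 2, truth.size)
f1, f2 = truth + e1, truth + e2

def rmse(f):
    return np.sqrt(np.mean((f - truth) ** 2))

# The simple average beats both component forecasts
print(rmse(f1), rmse(f2), rmse((f1 + f2) / 2))
```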

Building a large set of agent-based models, running and updating them over a fifteen-year period, and then validating the results against real-world data is an unprecedented step in the field of computational social science. Even more, showing that those results can compete with and complement the state of the art in statistical forecasting is strong evidence that ABM holds great potential for the future of the discipline.

Bibliography

Lustick, Ian S., and Matthew Tubin. 2012. "Verification as a Form of Validation: Deepening Theory to Broaden Application of DoD Protocols to the Social Sciences." Advances in Design for Cross-Cultural Activities.

Lustick, Ian S., et al. 2012. "From theory to simulation: the dynamic political hierarchy in country virtualisation models." Journal of Experimental & Theoretical Artificial Intelligence 24(3).

Montgomery, Jacob M., Florian M. Hollenbach, and Michael D. Ward. 2012. "Improving predictions using ensemble Bayesian model averaging." Political Analysis 20(3).

Ruvinsky, Alicia I., Janet E. Wedgwood, and John J. Welsh. 2012. "Establishing bounds of responsible operational use of social science models via innovations in verification and validation." Advances in Design for Cross-Cultural Activities, Part II.