Estimating the Shape of Covert Networks

Size: px

Start display at page:

Download "Estimating the Shape of Covert Networks"

Emmeline Hancock
5 years ago
Views:

Estimating the Shape of Covert Networks -Matthew Dombroski

Analysis of Social and Organizational Systems Matthew J.

1-412-268-1876 Fax: 1-412-268-6938 Email: mdombros @andrew.

1 Estimating the Shape of Covert Networks -Matthew Dombroski -Paul Fischbeck -Kathleen Carley Center for Computational Analysis of Social and Organizational Systems Matthew J. Dombroski Porter Hall Pittsburgh, PA Tel: Fax: This work was supported by the NSF IGERT in CASOS 23 1

2 Agenda Problem Statement Model Building Relationship of Interest (ROI) and Aggregation Network Properties Empirical Data Transformation Dyadic Dependency Models Comparison of updating models Command and Control Policy Analysis Covert network destabilization Comparison of Decision Performance Metrics versus Aggregate Network Metrics Conclusion and Future Research June 23 CMU - Matthew Dombroski - EPP, CASOS 2

3 Problem Statement and Motivation Command and Control may be able to make better decisions by determining Who is in organization Who is important in organization How information flows through the organization Important applications in improving Destabilization of covert and illicit organizations Communication in relief and military operations with uncertain communication structure Most organizations have an unknown communication structure Data available are often disparate and incomplete High degree of uncertainty in the data May not be able to fully characterize its structure June 23 CMU - Matthew Dombroski - EPP, CASOS 3

Proposed Approach Goal: Using a predefined relationship (dyad), construct a model that infers relationships and builds a prediction of the network quickly using limited data Limited Generate Input

4 Proposed Approach Goal: Using a predefined relationship (dyad), construct a model that infers relationships and builds a prediction of the network quickly using limited data Limited Generate Input Baseline Data Networks aggregation Incorporate Inferences from Network Properties gap filling Generate Overall Network EVPI Policy Analysis Policy question: improved decisions? June 23 CMU - Matthew Dombroski - EPP, CASOS 4

5 Context Determines The Relationship of Interest (ROI) Clearly define the relationship that is of interest Context dependent Covert organizations (conspiring to commit an illicit act) Military operations (request for resources or assistance) Limits set of empirical data available for model Input data arrives according to Poisson process Quantify and distinguish input data informing ROI Reliability Strength of data towards relationship June 23 CMU - Matthew Dombroski - EPP, CASOS 5

6 Aggregation by Bayes Rule If L ij is the event that persons i and j share the ROI And I ij is the event that certain data is observed about the relationship between persons i and j P ( I ) Then Define: ij Lij and P( I ij Lij ) P( L ij I ij ) = P( I ij L ij P( I ij ) P( L ij L ij ) + ) P( L P( I ij ij ) L ij ) P( L ij ) Network priors can be assigned a priori June 23 CMU - Matthew Dombroski - EPP, CASOS 6

7 Incorporating Inference Using Empirical Data (cont.) Before constructing inference model Transform empirical frequency networks into probabilities of ROI Assume frequency of interaction indicates strength of relationship Transforming Empirical Network Data Define ROI Threshold number of interactions required to attain relationship of interest (x max ) Define marginal increase of each additional interaction to probability of ROI Transforming Empirical Network Data (λ) Transform interaction frequency into probabilities P( ROI) = (1 e (1 e λx June 23 CMU - Matthew Dombroski - EPP, CASOS 7 λ x max ) )

8 Network Transformation June 23 CMU - Matthew Dombroski - EPP, CASOS 8

9 Network Transformation June 23 CMU - Matthew Dombroski - EPP, CASOS 9

10 Network Properties Triad Closure Property (Skvoretz, 199) j If then occurs with a probability greater than chance i k Adjacency Property Corollary of triad closure that is modeled i k i If occurs with probability greater than chance then j j k and occur with a probability greater than chance i k June 23 CMU - Matthew Dombroski - EPP, CASOS 1

11 .8 Incorporating Inference Using Empirical Data (cont.) Probability of ROI in Adjacent Relationship.7.6 Average Count of Conversations on Adjacent Relationship Count of Conversations on Reference Relationship Probability of ROI in Reference Relationship 8 th Perc. 6 th Perc. 4 th Perc. 2 th Perc. Adjacency Property Density of Network Density of Network Plot shows percentile contours for adjacent relationships in fraternity network data Model fit using neural network of 9 nodes and 3 layers (percent current error on 4 training data points is.9%) June 23 CMU - Matthew Dombroski - EPP, CASOS 11

12 Several Possible Implementations of Model Three inference models constructed: Model 1-Direct update dyads using incoming information and Bayes Rule only (control model) Model 2-Model 1 plus indirect inferencing only on immediately adjacent relationships using inference model Model 3-Model 2 plus indirect inferencing on all other relationships using inference model June 23 CMU - Matthew Dombroski - EPP, CASOS 12

13 Performance of Models Compared Using Simulation Sample data (2 of 58 nodes), perform an update, calculate performance, another update Constant uniform prior (.2) Reconstruct by varying: Input data accuracy Proportion of information supporting existence of ROI (.5) Performance Metric Absolute Error (sum of prediction probability minus actual probability) Absolute Error = ( N i> j j= 1 abs( p ij A ij )) June 23 CMU - Matthew Dombroski - EPP, CASOS 13

14 .35.3 Absolute Error Model1 Model2 Model3 Absolute Error Observations.15 Basecase.1.5 Absolute Error Observations.24 Random Conditionals Observations Sensor Reliable Conditionals (.8,.1) June 23 CMU - Matthew Dombroski - EPP, CASOS 14

15 .35.3 Absolute Error Model1 Model2 Model3 Absolute Error Observations.25 Basecase Absolute Error Observations P(observing dyad supporting information).2 P(observing non dyad supporting information) Observations.2 P(observing dyad supporting information).8 P(observing non dyad supporting information) June 23 CMU - Matthew Dombroski - EPP, CASOS 15

16 Conclusions from Simulations Inference model results are mixed Under conditions of balance between positive and negative tie input data and random or moderately accurate conditionals Performance on Absolute Error is relatively good for inference models After about 4 updates, base model outperforms inference models Under conditions of imbalance Base model outperforms inference models for nearly all updates Inference over/under predicts the network Under conditions of accurate conditionals Models are indistinguishable June 23 CMU - Matthew Dombroski - EPP, CASOS 16

17 Policy Analysis What does this mean for command and control decisions? C2 Terrorist Example C2 must make a decision of who to remove from network of conspirators using models Get Network prediction after x updates Estimate the n most critical individuals to remove from network Remove them from network Assess how many fragments the network is broken into Compare the number of fragments across different model implementations June 23 CMU - Matthew Dombroski - EPP, CASOS 17

18 Policy Analysis Scenario-Covert Networks (cont.) Remove 7 out of 2 people Network Prediction-Terrorist Example 12 Calculate expected number of fragments the network is broken into 1 Number of Fragments Fragments total perfect information 12 EVPI-avg(Model1) 4.9 EVPI-avg(Model2) 5.4 EVPI-avg(Model3) 4.9 lower 95% min average max upper 95% 2 On average, model 2 approximates perfect information closest to within 6.6 total fragments of perfect information 7.1 for models 1 and 3 model1 model2 model3 model1 model2 model3 model1 model2 model3 total isolates subgroups Network Fragments June 23 CMU - Matthew Dombroski - EPP, CASOS 18

19 A Comparison of Simulation Performance Metrics to Decision Metrics.3.25 Absolute Error Terrorism Example-Med Density Graph Observations Mid Density Network Number of Fragments lower 95% min average max upper 95% 2 model 1 model 2 model 3 model 1 model 2 model 3 model 1 model 2 model 3 total isolates subgroups Network Fragments June 23 CMU - Matthew Dombroski - EPP, CASOS 19

20 Network Match Performance Versus Decision Performance Conduct a paired t-test for each network prediction for absolute error and fragments Null Hypotheses: E(absolute error 1 - absolute error 2 ) =, E(fragments 1 fragments 2 ) = E(absolute error 1 - absolute error 3 ) =, E(fragments 1 - fragments 3 ) = E(absolute error 2 - absolute error 3 ) =, E(fragments 2 fragments 3 ) = Hypothesis Mean Value Standard Deviation T-value Dof Significance Level -(Abs 1 Abs 2 ) >.1 Frag 1 Frag <.1 -(Abs 1 Abs 3 ) >.1 Frag 1 Frag < S <.5 -(Abs 2 Abs 3 ) <.1 Frag 2 Frag <.1 June 23 CMU - Matthew Dombroski - EPP, CASOS 2

21 Conclusions Good/bad performance on simulation metric does not imply good/bad performance in C2 decision space Absolute Error recommends 3 over 2, but 2 = 1 and 3 = 1 Fragments recommends 2, 3, then 1 Decision metrics may predict performance better than aggregate network metrics Fragments is a better measure of the actual decision we will make Absolute error is one step abstracted from the decision C2 decisions can benefit from a model Care must be taken in evaluating performance of such models Slight improvement in the number of fragments generated Errors when too many updates occur June 23 CMU - Matthew Dombroski - EPP, CASOS 21

22 Future Work Explore the relationship between other C2 decision metrics and aggregate network metrics Improving Communication for Military Operations Network prediction indicates where communication is weak Add dyads to force individuals to communicate better Expected number of individuals to learn a new fact Vaccination Strategies Against a Contagious Disease Develop more rigorous model using triad closure Characterize several different types of models for different types of networks Work relationships Friend relationships June 23 CMU - Matthew Dombroski - EPP, CASOS 22

23 Questions and Comments Acknowledgements: This research has been supported, in part, by National Science Foundation under the IGERT program for training and research in CASOS and the ONR. Additional support was also provided by the Center for Computational Analysis of Social and Organizational Systems and the Department of Engineering and Public Policy at Carnegie Mellon. The views and conclusions contained in this document are those of the author and should not be interpreted as representing official policies, either expressed or implied, of the Office of Naval Research grant number , the National Science Foundation, or the U.S. government. June 23 CMU - Matthew Dombroski - EPP, CASOS 23

24 Incorporating Inference Using Empirical Data 7 Average Count of Conversations on Adjacent Relationship Adjacency Property Density of Network Density of Network Count of Conversations on Reference Relationship Bernard and Killworth (1976) fraternity brother network Observed number of interactions between 58 individuals over a week Average interactions between individuals-1.89, min-, max-51 June 23 CMU - Matthew Dombroski - EPP, CASOS 24

25 .35.3 Absolute Error Model1 Model2 Model3.5.2 Absolute Error Observations.1 Basecase Observations.2 Absolute Error Informed Prior Observations Misinformed Prior June 23 CMU - Matthew Dombroski - EPP, CASOS 25

26 .35.3 Absolute Error Model1 Model2 Model Absolute Error Observations.15 Basecase Absolute Error Observations.1 Mid Density Network.5 Model1 Model2 Model Observations Low Density Network June 23 CMU - Matthew Dombroski - EPP, CASOS 26

27 Expected Number Infected Policy Analysis Scenario-Contagious Diseases Innoculate max 7 out of 2 people Calculate expected number of remaining people to contract disease Assume SIR model of disease through network (Newman, 22) Assume probability of transmission (p) p 1. perfect information 2.35 E (model 1) 6.3 E (model 2) 5.19 E (model 3) 5.14 model 1 model 2 min average model 3 model 1 model 2 model 3 model 1 model 2 Model 3 approaches perfect information closest model model 1 to within 1.9 persons when p is.1 to within 2.79 persons when p is 1 model 2 model 3 Probability of Transmission model 1 model 2 model 3 model 1 model 2 model 3 June 23 CMU - Matthew Dombroski - EPP, CASOS 27

28 Expected Number Communicated to model 1 Policy Analysis Scenario 3-Relief Organizations Add maximum of 3 relationships Calculate expected number of individuals to learn new information assuming transmission probabiilty p number of of added dyads relationships transmission probability perfect information E(model 1) E(model 2) E(model 3) Model 2 approximates perfect information closest model 2 model 3 model 1 model 2 model 3 model 1 model 2 model 3 To within.9 people with 1 new relationship and p = To 1. within.14 people with 2. 2 new relationships and 3. p =.7 N umber of N ew R elationships and Transmission Probability model 1 June 23 CMU - Matthew Dombroski - EPP, CASOS 28 model 2 model 3 model 1 model 2 model 3 model 1 model 2 model 3 min avg max

Bargaining, Information Networks and Interstate

Bargaining, Information Networks and Interstate Conflict Erik Gartzke Oliver Westerwinter UC, San Diego Department of Political Sciene egartzke@ucsd.edu European University Institute Department of Political