Cyber Security Games with Asymmetric Information

Cyber Security Games with Asymmetric Information
Jeff S. Shamma, Georgia Institute of Technology
Joint work with Georgios Kotsalis & Malachi Jones
ARO MURI Annual Review, November 15, 2012

Research Thrust: Obtaining Actionable Cyber-Attack Forecasts
Develop adversary behavior models to help predict the effects of future attacks that might be launched to prevent successful mission completion.
- Asymmetric information repeated games
- Reduced order models for hidden Markov models
- ictf 2010 implementation

Project Architecture
[Block diagram: the real-world enterprise network (missions, cyber-assets) and simulation/live security exercises produce observations (Netflow, probing, timing analysis); analysis gives an up-to-date view of cyber-assets and the dependencies between assets and missions; data feeds "Analyze and Characterize Attackers" (mission model, cyber-assets model, COAs) and "Predict Future Actions"; sensor alerts feed a correlation engine and impact analysis to create a semantically-rich view of cyber-mission status.]

ictf 2010
- Setting: Sequential execution of service-based missions
- Game: Service vulnerabilities vs. countermeasures
- Modeling: Stochastic automata (Petri nets)
  - States associated with services
  - Labeled transitions

ictf Attack
- Attacker:
  - Observes state transition signals
  - Decides which services to attack
  - Rewards/penalties for attacking correct/incorrect services
- Complications:
  - Parallel mission signal ambiguity
  - Product state space

Asymmetric information games
- Motivation: One player has superior information
- Issue: Maximize effectiveness while minimizing observability, and hence vulnerability (revelation vs. exploitation)
- Context:
  - Worst-case guarantees vs. low-probability surprise
  - Repeated games vs. one-shot Bayesian Stackelberg games

Repeated zero-sum games with asymmetric information

    Stage game (played at every stage):
              L      R
        T    m11    m12
        B    m21    m22

- Players repeatedly play the same game over sequential stages
- Row player = maximizer; column player = minimizer
- Players observe opponent actions (perfect monitoring)
- Strategy = mapping from past observations to future action probabilities
- Utility = sum of stage payoffs

Repeated zero-sum games with asymmetric information (stage game as on the previous slide)
- Asymmetric information:
  - At the start, the matrix is randomly selected: $M \in \{M_1, M_2, \dots, M_K\}$ with probabilities $\{p_1, p_2, \dots, p_K\}$
  - Row player knows the selected game
- ictf setting:
  - Row = attacker with a skill profile; Col = defender
  - Row action = strategy matching skill; Col action = security resource allocation

Prior work
- Non-computational characterization of the optimal value:
$$v_{n+1}(p) = \frac{1}{n+1}\,\max_{(x_1,\dots,x_K)}\ \min_{y}\ \Big[\ \sum_k p_k\, x_k^T M_k\, y \;+\; n \sum_s \bar{x}(s)\, v_n\big(p^+(p,x,s)\big)\ \Big]$$
- Computational LP construction of optimal policies, with exponential growth ($S$ = size of the stage game matrix): on the order of $S^{\big(K \prod_{n=0}^{N} S^n\big)}$
- Computation of the optimal non-revealing value and strategies (see the LP sketch after this slide):
$$u(p) = \max_x \min_y\ x^T \Big(\sum_k p_k M_k\Big)\, y$$
- Non-computational achievability of $\mathrm{Cav}[u(p)]$ (the concavification of $u$ in $p$)
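Since $u(p)$ is just the value of the averaged matrix game $\sum_k p_k M_k$, it can be computed with a single linear program. A minimal scipy-based sketch (the function names are mine, not from the talk):

    import numpy as np
    from scipy.optimize import linprog

    def game_value(M):
        """Value and an optimal row (maximizer) strategy of the zero-sum matrix game M."""
        n_rows, n_cols = M.shape
        c = np.zeros(n_rows + 1)
        c[-1] = -1.0                                     # maximize v  <=>  minimize -v
        A_ub = np.hstack([-M.T, np.ones((n_cols, 1))])   # v <= sum_i x_i M[i, j] for every column j
        b_ub = np.zeros(n_cols)
        A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])   # sum_i x_i = 1
        b_eq = np.array([1.0])
        bounds = [(0, None)] * n_rows + [(None, None)]   # x >= 0, v free
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return res.x[:-1], res.x[-1]

    def non_revealing_value(p, matrices):
        """u(p): value of the averaged game sum_k p_k M_k (optimal non-revealing play)."""
        M_bar = sum(pk * Mk for pk, Mk in zip(p, matrices))
        return game_value(M_bar)

    # The Type I / Type II payoff tables from the ictf example later in the deck, p = (0.5, 0.5):
    # the averaged game [[8.5, 173.5], [18, 24.5]] has value 18, consistent with the
    # non-revealing payoff of 18 reported there.
    M1 = np.array([[23.0, 375.0], [-92.0, 69.0]])
    M2 = np.array([[-6.0, -28.0], [128.0, -20.0]])
    x_star, u_val = non_revealing_value([0.5, 0.5], [M1, M2])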

New results
- One-time policy improvement: compute the optimal current-stage strategy subject to non-revelation in future stages (notation spelled out after this slide):
$$\hat v_{n+1}(p) = \frac{1}{n+1}\,\max_{(x_1,\dots,x_K)}\ \min_{y}\ \Big[\ \sum_k p_k\, x_k^T M_k\, y \;+\; n \sum_s \bar x(s)\, u\big(p^+(p,x,s)\big)\ \Big]$$
- Perpetual policy improvement: update beliefs and repeat.
- Features:
  - Represents a middle ground between full non-revelation and optimal play.
  - Can be computed online via an LP of size $S^K S^3$.
- Theorem: Both one-time and perpetual policy improvement achieve $\mathrm{Cav}[u(p)]$:
$$\mathrm{Cav}[u(p)] \;\le\; \hat v_n(p) \;\le\; v_n(p) \;\le\; \mathrm{Cav}[u(p)] + C \sum_k \sqrt{\frac{p_k(1-p_k)}{n}}$$
and the lower bound is tight.
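Both recursions use $\bar x(s)$ and $p^+(p,x,s)$ without defining them on the slide; under the standard Aumann-Maschler setup assumed here, they are the induced marginal on the informed player's action and the Bayesian posterior on the selected game:

$$\bar x(s) = \sum_{k=1}^{K} p_k\, x_k(s), \qquad p^+_j(p,x,s) = \frac{p_j\, x_j(s)}{\bar x(s)}, \qquad j = 1,\dots,K.$$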

ictf implementation
- Iterations:
  - Each stage of the repeated game is an extended time horizon
  - Attacker implements the Quasi-Belief Greedy (QBG) policy
  - Defender allocates resources to defend services
- Attacker profile:
  - Expected reward for a successful attack on a service: (skill level) x (1 - defender resource) x (service value) (see the sketch after this slide)
  - Attacker profile is a vector of skill levels, e.g., $S_0 = 0.7$, $S_1 = 0.2$, $S_3 = 0.3$, ..., $S_9$ = etc.
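A minimal sketch of the per-service expected-reward model stated above; only the skill numbers echo the slide's example, while the defender allocation and service values are hypothetical placeholders.

    def expected_rewards(skills, defense, values):
        """Expected reward of attacking each service: skill * (1 - defender resource) * service value."""
        return {s: skills[s] * (1.0 - defense[s]) * values[s] for s in skills}

    skills  = {"S0": 0.7, "S1": 0.2, "S3": 0.3}       # attacker skill per service (slide's example values)
    defense = {"S0": 0.5, "S1": 0.1, "S3": 0.0}       # defender resource allocation (hypothetical)
    values  = {"S0": 100.0, "S1": 40.0, "S3": 60.0}   # service values (hypothetical)

    rewards = expected_rewards(skills, defense, values)
    best_service = max(rewards, key=rewards.get)      # service the greedy attacker would target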

Quasi-Belief Greedy Policy (QBG)
- Hidden Markov model representation:
  - Outputs: unlabeled transitions, e.g., $\{T_1, T_2, T_3, T_5\}$
  - State space: products of individual states, e.g., $9 \times 10 \times 15 \times 12 = 16200$
  - Attack decisions also provide an additional measurement (i.e., attack success or failure)
  - Beliefs are probabilities over the state combinations, e.g., a distribution over 16200 combinations
- QBG iteration (sketched after this slide):
  1. Start with a set of individual mission beliefs
  2. Given unlabeled transitions, compute the most likely assignment
  3. Update individual mission beliefs assuming the most likely assignment
  4. Attack the services with the highest expected reward
  5. Renormalize beliefs given success/failure
- Variants: periodic attack, thresholds on belief probabilities, optimal probing
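A self-contained toy sketch of one QBG iteration under strong simplifications: two missions with made-up two-state models given by per-output substochastic matrices (as in the HMM parametrization later in the deck), one unlabeled signal per mission, and a brute-force most-likely assignment; the success/failure renormalization step is omitted. None of the numbers come from the ictf model.

    import numpy as np
    from itertools import permutations

    missions = {
        "A": {"M": {"t1": np.array([[0.0, 0.6], [0.3, 0.0]]),
                    "t2": np.array([[0.4, 0.1], [0.3, 0.3]])},
              "belief": np.array([0.5, 0.5])},
        "B": {"M": {"t1": np.array([[0.1, 0.2], [0.5, 0.1]]),
                    "t2": np.array([[0.4, 0.3], [0.0, 0.4]])},
              "belief": np.array([0.8, 0.2])},
    }

    def qbg_step(missions, unlabeled, rewards):
        names = list(missions)
        # Steps 1-2: most likely assignment of the unlabeled signals to missions.
        best_assign, best_like = None, -1.0
        for perm in permutations(unlabeled):
            like = 1.0
            for name, sig in zip(names, perm):
                m = missions[name]
                like *= float(np.sum(m["M"][sig] @ m["belief"]))
            if like > best_like:
                best_assign, best_like = dict(zip(names, perm)), like
        # Step 3: update each mission belief assuming that assignment.
        for name, sig in best_assign.items():
            h = missions[name]["M"][sig] @ missions[name]["belief"]
            missions[name]["belief"] = h / h.sum()
        # Step 4: attack where the expected reward is highest (one target in this sketch).
        scores = {n: float(missions[n]["belief"] @ rewards[n]) for n in names}
        return max(scores, key=scores.get), best_assign

    target, assignment = qbg_step(missions, unlabeled=("t1", "t2"),
                                  rewards={"A": np.array([3.0, 1.0]),
                                           "B": np.array([0.5, 4.0])})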

ictf example

    Type I:
             BR1   BR2
    QBG1      23   375
    QBG2     -92    69

    Type II:
             BR1   BR2
    QBG1      -6   -28
    QBG2     128   -20

- Two attacker types
- QBG i = behave as type i; BR i = defense tuned to type i
- Attacker one-shot dominant strategy: act according to type (i.e., no deception)
- Defender one-shot dominant strategy: defend for the correct type

ictf example: 2 stages
(Type I / Type II payoff tables as above)
- p = (0.5, 0.5); number of repetitions = 2
- Non-revealing strategy: Stage 1 non-revealing, Stage 2 non-revealing. Payoff = 18
- Dominant strategy: Stage 1 fully exploit, Stage 2 fully exploit. Payoff = 39
- One-time policy improvement: Stage 1 partially revealing, Stage 2 non-revealing. Payoff = 27
- Perpetual policy improvement: Stage 1 partially revealing, Stage 2 fully exploit. Payoff = 53

ictf example: Many stages
(Type I / Type II payoff tables as above)
- Dominant strategy: Payoff = 2
- Fully non-revealing strategy: Payoff = 18
- Policy improvement: Payoff = 23

ictf, HMMs, and Model reduction
- HMM = Markov chain + partial observation function
[Diagram: four-state Markov chain $S_1$-$S_4$ with transition probabilities $p_{ij}$ and partial observations $O_1$, $O_2$]
- HMM for ictf:
  - Outputs: unlabeled transitions, e.g., $\{T_1, T_2, T_3, T_5\}$
  - State space: products of individual states, e.g., $9 \times 10 \times 15 \times 12 = 16200$
  - Attack decisions also provide an additional measurement (i.e., attack success or failure)
  - Transitions do not depend on attack actions
  - Beliefs are probabilities over the 16200 state combinations

Summary and Preview
- Prior results:
  - Counterexamples showing that state aggregation need not capture exact reducibility
  - Analysis showing that the reduction approach of [KMD, 2008] captures exact reducibility
- Current results:
  - Theoretical characterization of when exact reduction via aggregation works (isolated points)
  - Application of reduction to ictf in lieu of quasi-beliefs

HMM statistical signature
- Observation process: $Y = \{Y_t\}_{t \in \mathbb{Z}_+}$ for a particular initial condition $\pi$
- Probability function: $p_\pi : \mathcal{Y}^* \to [0,1]$,
$$p_\pi[v_k \dots v_1] = \Pr[Y_k = v_k, \dots, Y_1 = v_1], \qquad v_1,\dots,v_k \in \mathcal{Y}$$
- Equivalence notion: $Y^{(1)} \sim_p Y^{(2)} \iff p_{\pi^{(1)}}[v] = p_{\pi^{(2)}}[v]$ for all $v \in \mathcal{Y}^*$
- Approximation: $p_{\pi^{(1)}}[v] \approx p_{\pi^{(2)}}[v]$ for all $v \in \mathcal{Y}^*$

Parametric description
- Jump linear system analog: $H_Y = (\mathbf{1}_n,\ M : \mathcal{Y} \to \mathbb{R}^{n \times n}_+,\ \pi)$
  - $\mathbf{1}_n = (1,\dots,1)^T$
  - $M[o_k]$: substochastic transition matrix corresponding to $o_k \in \mathcal{Y}$
  - $\pi$: initial distribution of $X_0$
$$\big(M[o_k]\big)_{ij} = \Pr[X_{t+1} = s_i,\ Y_{t+1} = o_k \mid X_t = s_j], \qquad \pi = \big(\Pr[X_0 = s_1], \dots, \Pr[X_0 = s_n]\big)^T$$
- Path probability formula (sketched after this slide):
$$p_\pi[v_k \dots v_1] = \mathbf{1}_n^T\, M[v_k] \cdots M[v_1]\, \pi, \qquad v_1,\dots,v_k \in \mathcal{Y}$$
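A minimal numpy sketch of the path probability formula; the two-state, two-output model below is illustrative only, not the ictf model.

    import numpy as np

    M = {"a": np.array([[0.5, 0.1], [0.2, 0.3]]),   # Pr[X' = s_i, Y' = "a" | X = s_j]
         "b": np.array([[0.1, 0.4], [0.2, 0.2]])}   # columns of M["a"] + M["b"] sum to 1
    pi = np.array([0.6, 0.4])                        # initial distribution of X_0
    ones = np.ones(2)

    def path_prob(word):
        """Probability of observing the output string word = v_1 v_2 ... v_k (in time order)."""
        h = pi.copy()
        for v in word:                               # apply M[v_1] first, M[v_k] last
            h = M[v] @ h
        return float(ones @ h)

    # Sanity check: the probabilities of all length-2 strings sum to 1.
    total = sum(path_prob(w) for w in ["aa", "ab", "ba", "bb"])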

Reduction Problem
- Approximate representation: $H_Y = (\mathbf{1}_n, M, \pi)$ vs. $H_{\hat Y} = (\mathbf{1}_{\hat n}, \hat M, \hat\pi)$ with $\hat n < n$:
$$\mathbf{1}_n^T M[v_k] \cdots M[v_1]\, \pi \;\approx\; \mathbf{1}_{\hat n}^T \hat M[v_k] \cdots \hat M[v_1]\, \hat\pi, \qquad v_1,\dots,v_k \in \mathcal{Y}$$
- Complexity reduction: $M[v_i] \in \mathbb{R}^{n \times n}_+$ vs. $\hat M[v_i] \in \mathbb{R}^{\hat n \times \hat n}_+$
- Relaxation: $H_Y = (\mathbf{1}_n, M, \pi)$ vs. $Q_{\hat Y} = (\hat c, \hat A, \hat b)$:
$$\mathbf{1}_n^T M[v_k] \cdots M[v_1]\, \pi \;\approx\; \hat c^T \hat A[v_k] \cdots \hat A[v_1]\, \hat b, \qquad v_1,\dots,v_k \in \mathcal{Y}$$
- Quasi-realization = probability generators without the HMM restriction

Reduction Process with A Priori Bound
- Solve Lyapunov equations to obtain Gramian-like quantities $W_c$, $W_o$ (a generic sketch follows this slide)
- Perform an eigenvalue decomposition of $W_c^{1/2} W_o W_c^{1/2}$ to obtain:
  - singular numbers controlling the error bound
  - projection and dilation operators that produce the low-order system matrices
- Uniform bound for any initial condition $\pi$
- Theorem [KMD, 2008]: the worst-case bound applies to reduction of HMMs (the procedure may produce quasi-realizations):
$$\sqrt{\ \sum_{v \in \mathcal{Y}^*} \big(p_\pi[v] - \hat p_{\hat\pi}[v]\big)^2\ } \;\le\; d_H(H, \hat H) = 2\,(\sigma_{\hat n + 1} + \dots + \sigma_n)$$
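The slide lists the steps without formulas; the sketch below is a generic balanced-truncation-style reading of them. The generalized Lyapunov equations are an assumption (one natural choice for the jump linear representation), not a quote of the exact [KMD, 2008] procedure.

    import numpy as np

    def gramians(M, pi):
        """W_c, W_o solving W_c = sum_o M[o] W_c M[o]^T + pi pi^T and its dual (assumes solvability)."""
        n = len(pi)
        K  = sum(np.kron(A, A) for A in M.values())
        Kt = sum(np.kron(A.T, A.T) for A in M.values())
        Wc = np.linalg.solve(np.eye(n * n) - K,  np.outer(pi, pi).reshape(-1)).reshape(n, n)
        Wo = np.linalg.solve(np.eye(n * n) - Kt, np.ones(n * n)).reshape(n, n)
        return 0.5 * (Wc + Wc.T), 0.5 * (Wo + Wo.T)   # symmetrize against round-off

    def reduce_model(M, pi, r):
        """Reduce to order r; returns (c_hat, A_hat, b_hat, V, bound), bound = 2 * tail singular numbers."""
        n = len(pi)
        Wc, Wo = gramians(M, pi)
        S = np.linalg.cholesky(Wc + 1e-9 * np.eye(n))
        R = np.linalg.cholesky(Wo + 1e-9 * np.eye(n))
        Q, sig, Vt = np.linalg.svd(R.T @ S)          # singular numbers controlling the error bound
        V = S @ Vt.T[:, :r] / np.sqrt(sig[:r])       # dilation,   n x r
        U = (Q[:, :r] / np.sqrt(sig[:r])).T @ R.T    # projection, r x n
        A_hat = {o: U @ M[o] @ V for o in M}         # reduced generators (may be a quasi-realization)
        b_hat = U @ pi
        c_hat = V.T @ np.ones(n)                     # c_hat^T = 1_n^T V
        return c_hat, A_hat, b_hat, V, 2.0 * sig[r:].sum()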

Model reduction on CTF
- System: pick any 3 Petri nets from CTF
- Dimension: from 729 states and 364 outputs to 1331 states and 680 outputs
- Computation time: 122 sec to 1092 sec
- Cut-off behavior: clear choice of reduced-order model

Model reduction on CTF
$$E_{286}^2 = \sum_{v \in \mathcal{Y}^*} \big(p_\pi[v] - \hat p_{\hat\pi}[v]\big)^2 \;\le\; (0.0028)^2 = 7.84 \times 10^{-6}$$
- Bound valid for any initial condition $\pi$ over the 1331 states
- The sum is over the whole language $\mathcal{Y}^*$: $|\mathcal{Y}^1| = 680$, $|\mathcal{Y}^2| = 680^2 = 462400$, $|\mathcal{Y}^3| = 680^3 \approx 3.14 \times 10^8$, ...
- Assuming a uniform distribution of a constant relative error: about 7.3% error per string

Evaluation of reduced order model
- Predictive capability: approximate the family of conditional distributions
$$p_\pi[v_k \mid v_{k-1} \dots v_1] \;\approx\; \hat p_{\hat\pi}[v_k \mid v_{k-1} \dots v_1]$$
  - Observed history: $v_{k-1} \dots v_1 \in \mathcal{Y}^{k-1}$
  - Predict or generate the next symbol: $v_k \in \mathcal{Y}$
- Bound relevancy:
$$\frac{p_\pi[v_k v_{k-1} \dots v_1 v_0]}{p_\pi[v_{k-1} \dots v_1 v_0]} \;\approx\; \frac{\hat p_{\hat\pi}[v_k v_{k-1} \dots v_1 v_0]}{\hat p_{\hat\pi}[v_{k-1} \dots v_1 v_0]}$$
- Recursive implementation (sketched after this slide):
$$p_\pi[v_k \mid v_{k-1} \dots v_1] = \frac{\mathbf{1}_n^T\, M[v_k]\, H_{k-1}}{\mathbf{1}_n^T H_{k-1}}, \qquad H_k = M[v_k]\, H_{k-1},\ \ H_0 = \pi.$$
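A small sketch of the recursive implementation of the conditional distribution, on the same illustrative two-state model used in the path-probability sketch above.

    import numpy as np

    M = {"a": np.array([[0.5, 0.1], [0.2, 0.3]]),
         "b": np.array([[0.1, 0.4], [0.2, 0.2]])}
    pi = np.array([0.6, 0.4])
    ones = np.ones(2)

    def next_symbol_distribution(history):
        """Conditional distribution of the next output given the observed history (oldest first)."""
        h = pi.copy()
        for v in history:                      # unnormalized recursion H_k = M[v_k] H_{k-1}, H_0 = pi
            h = M[v] @ h
        norm = float(ones @ h)
        return {o: float(ones @ (M[o] @ h)) / norm for o in M}

    pred = next_symbol_distribution(["a", "b", "a"])   # values sum to 1 over the output alphabet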

Evaluation of reduced order model
[Plots: convergence of the conditional distribution for two particular histories; note the threshold phenomenon]

Decision Problem
- The reduction algorithm is now used in a decision setting based on the ictf scenario.
- Quantity of interest is the state belief: $H_k = M[v_k]\, H_{k-1}$, $H_0 = \pi$
- Greedy strategy:
  - The attacker has a belief over the network state, say $\pi_t \in \Delta^n$
  - Observes the aggregate output, say $y_{t+1} = \{T_1, T_5, T_{10}\}$, and updates the belief to
$$\pi_{t+1} = \frac{M[y_{t+1}]\, \pi_t}{\mathbf{1}_n^T M[y_{t+1}]\, \pi_t} = \frac{H_{t+1}}{\mathbf{1}_n^T H_{t+1}}$$
  - Based on $\pi_{t+1}$, the attacker chooses which services to interrupt (greedy optimization), observes the payoff, renormalizes the belief... and repeats.

Relevance of reduction
- Reduced-order model parameters:
$$\hat c^T = \mathbf{1}_n^T V, \qquad \hat A[y] = U\, M[y]\, V, \qquad \hat b = U\,\pi, \qquad y \in \mathcal{Y},$$
where $U \in \mathbb{R}^{\hat n \times n}$ and $V \in \mathbb{R}^{n \times \hat n}$.
- Approximate case: the belief vector lives in a lower-dimensional space (sketched after this slide),
$$\hat\pi_{t+1} = \frac{\hat A[y_{t+1}]\, \hat\pi_t}{\hat c^T \hat A[y_{t+1}]\, \hat\pi_t} = \frac{\hat H_{t+1}}{\hat c^T \hat H_{t+1}} \in \mathbb{R}^{\hat n}.$$
- Reduction guarantees that
$$\frac{V\, \hat H_{t+1}}{\hat c^T \hat H_{t+1}} \;\approx\; \pi_{t+1} = \frac{H_{t+1}}{\mathbf{1}_n^T H_{t+1}} \in \mathbb{R}^n, \qquad \mathbf{1}_n^T H_{t+1} \approx \hat c^T \hat H_{t+1} = \mathbf{1}_n^T V\, \hat H_{t+1}.$$
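A sketch of the reduced-order belief filter implied by the slide: run the recursion on $\hat H_t \in \mathbb{R}^{\hat n}$ and lift with $V$ when a decision is needed. The inputs (c_hat, A_hat, b_hat, V) are assumed to come from a reduction such as the sketch after the "Reduction Process" slide; they are hypothetical names, not from the deck.

    import numpy as np

    def reduced_belief_filter(c_hat, A_hat, b_hat, V, outputs):
        """Return the lifted approximate beliefs V H_hat_t / (c_hat^T H_hat_t) along an output sequence."""
        h = np.asarray(b_hat, dtype=float)
        beliefs = []
        for y in outputs:
            h = A_hat[y] @ h                            # H_hat_{t+1} = A_hat[y_{t+1}] H_hat_t
            beliefs.append((V @ h) / float(c_hat @ h))  # approximate pi_{t+1} in R^n
        return beliefs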

Belief vector approximation
[Plot: convergence of the sufficient statistic for a particular history]
- Percent decision agreement: 96%

Concluding remarks
- Recap:
  - Asymmetric information repeated games
  - Reduced order models for hidden Markov models
  - ictf 2010 implementation
- Future work:
  - Strategies for the non-informed player (Stackelberg, Nash, etc.)
  - Generalization of partial non-revelation
  - Optimal probing formulation
  - Model reduction for decision problems