Cyber Security Games with Asymmetric Information

Cyber Security Games with Asymmetric Information
Jeff S. Shamma, Georgia Institute of Technology
Joint work with Georgios Kotsalis & Malachi Jones
ARO MURI Annual Review, November 15, 2012

Research Thrust: Obtaining Actionable Cyber-Attack Forecasts
Develop adversary behavior models to help predict the effects of future attacks that might be launched to prevent successful mission completion.
- Asymmetric information repeated games
- Reduced order models for hidden Markov models
- ictf 2010 implementation

Project Architecture
[Block diagram: the real-world enterprise network (missions, cyber-assets) and simulation/live security exercises produce observations (Netflow, probing, timing analysis); analysis gives an up-to-date view of cyber-assets and the dependencies between assets and missions; data feeds "Analyze and Characterize Attackers" (mission model, cyber-assets model, COAs) and "Predict Future Actions"; sensor alerts feed a correlation engine and impact analysis to create a semantically-rich view of cyber-mission status.]

ictf 2010
- Setting: Sequential execution of service-based missions
- Game: Service vulnerabilities vs. countermeasures
- Modeling: Stochastic automata (Petri nets)
  - States associated with services
  - Labeled transitions

ictf Attack
- Attacker:
  - Observes state transition signals
  - Decides which services to attack
  - Rewards/penalties for attacking correct/incorrect services
- Complications:
  - Parallel mission signal ambiguity
  - Product state space

Asymmetric information games
- Motivation: One player has superior information
- Issue: Maximize effectiveness while minimizing observability, and hence vulnerability (revelation vs. exploitation)
- Context:
  - Worst-case guarantees vs. low-probability surprise
  - Repeated games vs. one-shot Bayesian Stackelberg games

Repeated zero-sum games with asymmetric information

    Stage game (played at every stage):
              L      R
        T    m11    m12
        B    m21    m22

- Players repeatedly play the same game over sequential stages
- Row player = maximizer; column player = minimizer
- Players observe opponent actions (perfect monitoring)
- Strategy = mapping from past observations to future action probabilities
- Utility = sum of stage payoffs

Repeated zero-sum games with asymmetric information (stage game as on the previous slide)
- Asymmetric information:
  - At the start, the matrix is randomly selected: $M \in \{M_1, M_2, \dots, M_K\}$ with probabilities $\{p_1, p_2, \dots, p_K\}$
  - Row player knows the selected game
- ictf setting:
  - Row = attacker with a skill profile; Col = defender
  - Row action = strategy matching skill; Col action = security resource allocation

Prior work
- Non-computational characterization of the optimal value:
$$v_{n+1}(p) = \frac{1}{n+1}\,\max_{(x_1,\dots,x_K)}\ \min_{y}\ \Big[\ \sum_k p_k\, x_k^T M_k\, y \;+\; n \sum_s \bar{x}(s)\, v_n\big(p^+(p,x,s)\big)\ \Big]$$
- Computational LP construction of optimal policies, with exponential growth ($S$ = size of the stage game matrix): on the order of $S^{\big(K \prod_{n=0}^{N} S^n\big)}$
- Computation of the optimal non-revealing value and strategies (see the LP sketch after this slide):
$$u(p) = \max_x \min_y\ x^T \Big(\sum_k p_k M_k\Big)\, y$$
- Non-computational achievability of $\mathrm{Cav}[u(p)]$ (the concavification of $u$ in $p$)
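Since $u(p)$ is just the value of the averaged matrix game $\sum_k p_k M_k$, it can be computed with a single linear program. A minimal scipy-based sketch (the function names are mine, not from the talk):

    import numpy as np
    from scipy.optimize import linprog

    def game_value(M):
        """Value and an optimal row (maximizer) strategy of the zero-sum matrix game M."""
        n_rows, n_cols = M.shape
        c = np.zeros(n_rows + 1)
        c[-1] = -1.0                                     # maximize v  <=>  minimize -v
        A_ub = np.hstack([-M.T, np.ones((n_cols, 1))])   # v <= sum_i x_i M[i, j] for every column j
        b_ub = np.zeros(n_cols)
        A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])   # sum_i x_i = 1
        b_eq = np.array([1.0])
        bounds = [(0, None)] * n_rows + [(None, None)]   # x >= 0, v free
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return res.x[:-1], res.x[-1]

    def non_revealing_value(p, matrices):
        """u(p): value of the averaged game sum_k p_k M_k (optimal non-revealing play)."""
        M_bar = sum(pk * Mk for pk, Mk in zip(p, matrices))
        return game_value(M_bar)

    # The Type I / Type II payoff tables from the ictf example later in the deck, p = (0.5, 0.5):
    # the averaged game [[8.5, 173.5], [18, 24.5]] has value 18, consistent with the
    # non-revealing payoff of 18 reported there.
    M1 = np.array([[23.0, 375.0], [-92.0, 69.0]])
    M2 = np.array([[-6.0, -28.0], [128.0, -20.0]])
    x_star, u_val = non_revealing_value([0.5, 0.5], [M1, M2])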

New results
- One-time policy improvement: compute the optimal current-stage strategy subject to non-revelation in future stages (notation spelled out after this slide):
$$\hat v_{n+1}(p) = \frac{1}{n+1}\,\max_{(x_1,\dots,x_K)}\ \min_{y}\ \Big[\ \sum_k p_k\, x_k^T M_k\, y \;+\; n \sum_s \bar x(s)\, u\big(p^+(p,x,s)\big)\ \Big]$$
- Perpetual policy improvement: update beliefs and repeat.
- Features:
  - Represents a middle ground between full non-revelation and optimal play.
  - Can be computed online via an LP of size $S^K S^3$.
- Theorem: Both one-time and perpetual policy improvement achieve $\mathrm{Cav}[u(p)]$:
$$\mathrm{Cav}[u(p)] \;\le\; \hat v_n(p) \;\le\; v_n(p) \;\le\; \mathrm{Cav}[u(p)] + C \sum_k \sqrt{\frac{p_k(1-p_k)}{n}}$$
and the lower bound is tight.
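Both recursions use $\bar x(s)$ and $p^+(p,x,s)$ without defining them on the slide; under the standard Aumann-Maschler setup assumed here, they are the induced marginal on the informed player's action and the Bayesian posterior on the selected game:

$$\bar x(s) = \sum_{k=1}^{K} p_k\, x_k(s), \qquad p^+_j(p,x,s) = \frac{p_j\, x_j(s)}{\bar x(s)}, \qquad j = 1,\dots,K.$$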

ictf implementation
- Iterations:
  - Each stage of the repeated game is an extended time horizon
  - Attacker implements the Quasi-Belief Greedy (QBG) policy
  - Defender allocates resources to defend services
- Attacker profile:
  - Expected reward for a successful attack on a service: (skill level) x (1 - defender resource) x (service value) (see the sketch after this slide)
  - Attacker profile is a vector of skill levels, e.g., $S_0 = 0.7$, $S_1 = 0.2$, $S_3 = 0.3$, ..., $S_9$ = etc.
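A minimal sketch of the per-service expected-reward model stated above; only the skill numbers echo the slide's example, while the defender allocation and service values are hypothetical placeholders.

    def expected_rewards(skills, defense, values):
        """Expected reward of attacking each service: skill * (1 - defender resource) * service value."""
        return {s: skills[s] * (1.0 - defense[s]) * values[s] for s in skills}

    skills  = {"S0": 0.7, "S1": 0.2, "S3": 0.3}       # attacker skill per service (slide's example values)
    defense = {"S0": 0.5, "S1": 0.1, "S3": 0.0}       # defender resource allocation (hypothetical)
    values  = {"S0": 100.0, "S1": 40.0, "S3": 60.0}   # service values (hypothetical)

    rewards = expected_rewards(skills, defense, values)
    best_service = max(rewards, key=rewards.get)      # service the greedy attacker would target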

Quasi-Belief Greedy Policy (QBG)
- Hidden Markov model representation:
  - Outputs: unlabeled transitions, e.g., $\{T_1, T_2, T_3, T_5\}$
  - State space: products of individual states, e.g., $9 \times 10 \times 15 \times 12 = 16200$
  - Attack decisions also provide an additional measurement (i.e., attack success or failure)
  - Beliefs are probabilities over the state combinations, e.g., a distribution over 16200 combinations
- QBG iteration (sketched after this slide):
  1. Start with a set of individual mission beliefs
  2. Given unlabeled transitions, compute the most likely assignment
  3. Update individual mission beliefs assuming the most likely assignment
  4. Attack the services with the highest expected reward
  5. Renormalize beliefs given success/failure
- Variants: periodic attack, thresholds on belief probabilities, optimal probing
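A self-contained toy sketch of one QBG iteration under strong simplifications: two missions with made-up two-state models given by per-output substochastic matrices (as in the HMM parametrization later in the deck), one unlabeled signal per mission, and a brute-force most-likely assignment; the success/failure renormalization step is omitted. None of the numbers come from the ictf model.

    import numpy as np
    from itertools import permutations

    missions = {
        "A": {"M": {"t1": np.array([[0.0, 0.6], [0.3, 0.0]]),
                    "t2": np.array([[0.4, 0.1], [0.3, 0.3]])},
              "belief": np.array([0.5, 0.5])},
        "B": {"M": {"t1": np.array([[0.1, 0.2], [0.5, 0.1]]),
                    "t2": np.array([[0.4, 0.3], [0.0, 0.4]])},
              "belief": np.array([0.8, 0.2])},
    }

    def qbg_step(missions, unlabeled, rewards):
        names = list(missions)
        # Steps 1-2: most likely assignment of the unlabeled signals to missions.
        best_assign, best_like = None, -1.0
        for perm in permutations(unlabeled):
            like = 1.0
            for name, sig in zip(names, perm):
                m = missions[name]
                like *= float(np.sum(m["M"][sig] @ m["belief"]))
            if like > best_like:
                best_assign, best_like = dict(zip(names, perm)), like
        # Step 3: update each mission belief assuming that assignment.
        for name, sig in best_assign.items():
            h = missions[name]["M"][sig] @ missions[name]["belief"]
            missions[name]["belief"] = h / h.sum()
        # Step 4: attack where the expected reward is highest (one target in this sketch).
        scores = {n: float(missions[n]["belief"] @ rewards[n]) for n in names}
        return max(scores, key=scores.get), best_assign

    target, assignment = qbg_step(missions, unlabeled=("t1", "t2"),
                                  rewards={"A": np.array([3.0, 1.0]),
                                           "B": np.array([0.5, 4.0])})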

ictf example

    Type I:
             BR1   BR2
    QBG1      23   375
    QBG2     -92    69

    Type II:
             BR1   BR2
    QBG1      -6   -28
    QBG2     128   -20

- Two attacker types
- QBG i = behave as type i; BR i = defense tuned to type i
- Attacker one-shot dominant strategy: act according to type (i.e., no deception)
- Defender one-shot dominant strategy: defend for the correct type

ictf example: 2 stages
(Type I / Type II payoff tables as above)
- p = (0.5, 0.5); number of repetitions = 2
- Non-revealing strategy: Stage 1 non-revealing, Stage 2 non-revealing. Payoff = 18
- Dominant strategy: Stage 1 fully exploit, Stage 2 fully exploit. Payoff = 39
- One-time policy improvement: Stage 1 partially revealing, Stage 2 non-revealing. Payoff = 27
- Perpetual policy improvement: Stage 1 partially revealing, Stage 2 fully exploit. Payoff = 53

ictf example: Many stages
(Type I / Type II payoff tables as above)
- Dominant strategy: Payoff = 2
- Fully non-revealing strategy: Payoff = 18
- Policy improvement: Payoff = 23

ictf, HMMs, and Model reduction
- HMM = Markov chain + partial observation function
[Diagram: four-state Markov chain $S_1$-$S_4$ with transition probabilities $p_{ij}$ and partial observations $O_1$, $O_2$]
- HMM for ictf:
  - Outputs: unlabeled transitions, e.g., $\{T_1, T_2, T_3, T_5\}$
  - State space: products of individual states, e.g., $9 \times 10 \times 15 \times 12 = 16200$
  - Attack decisions also provide an additional measurement (i.e., attack success or failure)
  - Transitions do not depend on attack actions
  - Beliefs are probabilities over the 16200 state combinations

Summary and Preview
- Prior results:
  - Counterexamples showing that state aggregation need not capture exact reducibility
  - Analysis showing that the reduction approach of [KMD, 2008] captures exact reducibility
- Current results:
  - Theoretical characterization of when exact reduction via aggregation works (isolated points)
  - Application of reduction to ictf in lieu of quasi-beliefs

HMM statistical signature
- Observation process: $Y = \{Y_t\}_{t \in \mathbb{Z}_+}$ for a particular initial condition $\pi$
- Probability function: $p_\pi : \mathcal{Y}^* \to [0,1]$,
$$p_\pi[v_k \dots v_1] = \Pr[Y_k = v_k, \dots, Y_1 = v_1], \qquad v_1,\dots,v_k \in \mathcal{Y}$$
- Equivalence notion: $Y^{(1)} \sim_p Y^{(2)} \iff p_{\pi^{(1)}}[v] = p_{\pi^{(2)}}[v]$ for all $v \in \mathcal{Y}^*$
- Approximation: $p_{\pi^{(1)}}[v] \approx p_{\pi^{(2)}}[v]$ for all $v \in \mathcal{Y}^*$

Parametric description
- Jump linear system analog: $H_Y = (\mathbf{1}_n,\ M : \mathcal{Y} \to \mathbb{R}^{n \times n}_+,\ \pi)$
  - $\mathbf{1}_n = (1,\dots,1)^T$
  - $M[o_k]$: substochastic transition matrix corresponding to $o_k \in \mathcal{Y}$
  - $\pi$: initial distribution of $X_0$
$$\big(M[o_k]\big)_{ij} = \Pr[X_{t+1} = s_i,\ Y_{t+1} = o_k \mid X_t = s_j], \qquad \pi = \big(\Pr[X_0 = s_1], \dots, \Pr[X_0 = s_n]\big)^T$$
- Path probability formula (sketched after this slide):
$$p_\pi[v_k \dots v_1] = \mathbf{1}_n^T\, M[v_k] \cdots M[v_1]\, \pi, \qquad v_1,\dots,v_k \in \mathcal{Y}$$
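A minimal numpy sketch of the path probability formula; the two-state, two-output model below is illustrative only, not the ictf model.

    import numpy as np

    M = {"a": np.array([[0.5, 0.1], [0.2, 0.3]]),   # Pr[X' = s_i, Y' = "a" | X = s_j]
         "b": np.array([[0.1, 0.4], [0.2, 0.2]])}   # columns of M["a"] + M["b"] sum to 1
    pi = np.array([0.6, 0.4])                        # initial distribution of X_0
    ones = np.ones(2)

    def path_prob(word):
        """Probability of observing the output string word = v_1 v_2 ... v_k (in time order)."""
        h = pi.copy()
        for v in word:                               # apply M[v_1] first, M[v_k] last
            h = M[v] @ h
        return float(ones @ h)

    # Sanity check: the probabilities of all length-2 strings sum to 1.
    total = sum(path_prob(w) for w in ["aa", "ab", "ba", "bb"])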

Reduction Problem
- Approximate representation: $H_Y = (\mathbf{1}_n, M, \pi)$ vs. $H_{\hat Y} = (\mathbf{1}_{\hat n}, \hat M, \hat\pi)$ with $\hat n < n$:
$$\mathbf{1}_n^T M[v_k] \cdots M[v_1]\, \pi \;\approx\; \mathbf{1}_{\hat n}^T \hat M[v_k] \cdots \hat M[v_1]\, \hat\pi, \qquad v_1,\dots,v_k \in \mathcal{Y}$$
- Complexity reduction: $M[v_i] \in \mathbb{R}^{n \times n}_+$ vs. $\hat M[v_i] \in \mathbb{R}^{\hat n \times \hat n}_+$
- Relaxation: $H_Y = (\mathbf{1}_n, M, \pi)$ vs. $Q_{\hat Y} = (\hat c, \hat A, \hat b)$:
$$\mathbf{1}_n^T M[v_k] \cdots M[v_1]\, \pi \;\approx\; \hat c^T \hat A[v_k] \cdots \hat A[v_1]\, \hat b, \qquad v_1,\dots,v_k \in \mathcal{Y}$$
- Quasi-realization = probability generators without the HMM restriction

Reduction Process with A Priori Bound
- Solve Lyapunov equations to obtain Gramian-like quantities $W_c$, $W_o$ (a generic sketch follows this slide)
- Perform an eigenvalue decomposition of $W_c^{1/2} W_o W_c^{1/2}$ to obtain:
  - singular numbers controlling the error bound
  - projection and dilation operators that produce the low-order system matrices
- Uniform bound for any initial condition $\pi$
- Theorem [KMD, 2008]: the worst-case bound applies to reduction of HMMs (the procedure may produce quasi-realizations):
$$\sqrt{\ \sum_{v \in \mathcal{Y}^*} \big(p_\pi[v] - \hat p_{\hat\pi}[v]\big)^2\ } \;\le\; d_H(H, \hat H) = 2\,(\sigma_{\hat n + 1} + \dots + \sigma_n)$$
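The slide lists the steps without formulas; the sketch below is a generic balanced-truncation-style reading of them. The generalized Lyapunov equations are an assumption (one natural choice for the jump linear representation), not a quote of the exact [KMD, 2008] procedure.

    import numpy as np

    def gramians(M, pi):
        """W_c, W_o solving W_c = sum_o M[o] W_c M[o]^T + pi pi^T and its dual (assumes solvability)."""
        n = len(pi)
        K  = sum(np.kron(A, A) for A in M.values())
        Kt = sum(np.kron(A.T, A.T) for A in M.values())
        Wc = np.linalg.solve(np.eye(n * n) - K,  np.outer(pi, pi).reshape(-1)).reshape(n, n)
        Wo = np.linalg.solve(np.eye(n * n) - Kt, np.ones(n * n)).reshape(n, n)
        return 0.5 * (Wc + Wc.T), 0.5 * (Wo + Wo.T)   # symmetrize against round-off

    def reduce_model(M, pi, r):
        """Reduce to order r; returns (c_hat, A_hat, b_hat, V, bound), bound = 2 * tail singular numbers."""
        n = len(pi)
        Wc, Wo = gramians(M, pi)
        S = np.linalg.cholesky(Wc + 1e-9 * np.eye(n))
        R = np.linalg.cholesky(Wo + 1e-9 * np.eye(n))
        Q, sig, Vt = np.linalg.svd(R.T @ S)          # singular numbers controlling the error bound
        V = S @ Vt.T[:, :r] / np.sqrt(sig[:r])       # dilation,   n x r
        U = (Q[:, :r] / np.sqrt(sig[:r])).T @ R.T    # projection, r x n
        A_hat = {o: U @ M[o] @ V for o in M}         # reduced generators (may be a quasi-realization)
        b_hat = U @ pi
        c_hat = V.T @ np.ones(n)                     # c_hat^T = 1_n^T V
        return c_hat, A_hat, b_hat, V, 2.0 * sig[r:].sum()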

Model reduction on CTF
- System: pick any 3 Petri nets from CTF
- Dimension: from 729 states and 364 outputs to 1331 states and 680 outputs
- Computation time: 122 sec to 1092 sec
- Cut-off behavior: clear choice of reduced-order model

Model reduction on CTF
$$E_{286}^2 = \sum_{v \in \mathcal{Y}^*} \big(p_\pi[v] - \hat p_{\hat\pi}[v]\big)^2 \;\le\; (0.0028)^2 = 7.84 \times 10^{-6}$$
- Bound valid for any initial condition $\pi$ over the 1331 states
- The sum is over the whole language $\mathcal{Y}^*$: $|\mathcal{Y}^1| = 680$, $|\mathcal{Y}^2| = 680^2 = 462400$, $|\mathcal{Y}^3| = 680^3 \approx 3.14 \times 10^8$, ...
- Assuming a uniform distribution of a constant relative error: about 7.3% error per string

Evaluation of reduced order model
- Predictive capability: approximate the family of conditional distributions
$$p_\pi[v_k \mid v_{k-1} \dots v_1] \;\approx\; \hat p_{\hat\pi}[v_k \mid v_{k-1} \dots v_1]$$
  - Observed history: $v_{k-1} \dots v_1 \in \mathcal{Y}^{k-1}$
  - Predict or generate the next symbol: $v_k \in \mathcal{Y}$
- Bound relevancy:
$$\frac{p_\pi[v_k v_{k-1} \dots v_1 v_0]}{p_\pi[v_{k-1} \dots v_1 v_0]} \;\approx\; \frac{\hat p_{\hat\pi}[v_k v_{k-1} \dots v_1 v_0]}{\hat p_{\hat\pi}[v_{k-1} \dots v_1 v_0]}$$
- Recursive implementation (sketched after this slide):
$$p_\pi[v_k \mid v_{k-1} \dots v_1] = \frac{\mathbf{1}_n^T\, M[v_k]\, H_{k-1}}{\mathbf{1}_n^T H_{k-1}}, \qquad H_k = M[v_k]\, H_{k-1},\ \ H_0 = \pi.$$
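A small sketch of the recursive implementation of the conditional distribution, on the same illustrative two-state model used in the path-probability sketch above.

    import numpy as np

    M = {"a": np.array([[0.5, 0.1], [0.2, 0.3]]),
         "b": np.array([[0.1, 0.4], [0.2, 0.2]])}
    pi = np.array([0.6, 0.4])
    ones = np.ones(2)

    def next_symbol_distribution(history):
        """Conditional distribution of the next output given the observed history (oldest first)."""
        h = pi.copy()
        for v in history:                      # unnormalized recursion H_k = M[v_k] H_{k-1}, H_0 = pi
            h = M[v] @ h
        norm = float(ones @ h)
        return {o: float(ones @ (M[o] @ h)) / norm for o in M}

    pred = next_symbol_distribution(["a", "b", "a"])   # values sum to 1 over the output alphabet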

Evaluation of reduced order model
[Plots: convergence of the conditional distribution for two particular histories; note the threshold phenomenon]

Decision Problem
- The reduction algorithm is now used in a decision setting based on the ictf scenario.
- Quantity of interest is the state belief: $H_k = M[v_k]\, H_{k-1}$, $H_0 = \pi$
- Greedy strategy:
  - The attacker has a belief over the network state, say $\pi_t \in \Delta^n$
  - Observes the aggregate output, say $y_{t+1} = \{T_1, T_5, T_{10}\}$, and updates the belief to
$$\pi_{t+1} = \frac{M[y_{t+1}]\, \pi_t}{\mathbf{1}_n^T M[y_{t+1}]\, \pi_t} = \frac{H_{t+1}}{\mathbf{1}_n^T H_{t+1}}$$
  - Based on $\pi_{t+1}$, the attacker chooses which services to interrupt (greedy optimization), observes the payoff, renormalizes the belief... and repeats.

Relevance of reduction
- Reduced-order model parameters:
$$\hat c^T = \mathbf{1}_n^T V, \qquad \hat A[y] = U\, M[y]\, V, \qquad \hat b = U\,\pi, \qquad y \in \mathcal{Y},$$
where $U \in \mathbb{R}^{\hat n \times n}$ and $V \in \mathbb{R}^{n \times \hat n}$.
- Approximate case: the belief vector lives in a lower-dimensional space (sketched after this slide),
$$\hat\pi_{t+1} = \frac{\hat A[y_{t+1}]\, \hat\pi_t}{\hat c^T \hat A[y_{t+1}]\, \hat\pi_t} = \frac{\hat H_{t+1}}{\hat c^T \hat H_{t+1}} \in \mathbb{R}^{\hat n}.$$
- Reduction guarantees that
$$\frac{V\, \hat H_{t+1}}{\hat c^T \hat H_{t+1}} \;\approx\; \pi_{t+1} = \frac{H_{t+1}}{\mathbf{1}_n^T H_{t+1}} \in \mathbb{R}^n, \qquad \mathbf{1}_n^T H_{t+1} \approx \hat c^T \hat H_{t+1} = \mathbf{1}_n^T V\, \hat H_{t+1}.$$
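A sketch of the reduced-order belief filter implied by the slide: run the recursion on $\hat H_t \in \mathbb{R}^{\hat n}$ and lift with $V$ when a decision is needed. The inputs (c_hat, A_hat, b_hat, V) are assumed to come from a reduction such as the sketch after the "Reduction Process" slide; they are hypothetical names, not from the deck.

    import numpy as np

    def reduced_belief_filter(c_hat, A_hat, b_hat, V, outputs):
        """Return the lifted approximate beliefs V H_hat_t / (c_hat^T H_hat_t) along an output sequence."""
        h = np.asarray(b_hat, dtype=float)
        beliefs = []
        for y in outputs:
            h = A_hat[y] @ h                            # H_hat_{t+1} = A_hat[y_{t+1}] H_hat_t
            beliefs.append((V @ h) / float(c_hat @ h))  # approximate pi_{t+1} in R^n
        return beliefs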

Belief vector approximation
[Plot: convergence of the sufficient statistic for a particular history]
- Percent decision agreement: 96%

Concluding remarks
- Recap:
  - Asymmetric information repeated games
  - Reduced order models for hidden Markov models
  - ictf 2010 implementation
- Future work:
  - Strategies for the non-informed player (Stackelberg, Nash, etc.)
  - Generalization of partial non-revelation
  - Optimal probing formulation
  - Model reduction for decision problems