Cyber-Awareness and Games of Incomplete Information
Jeff S. Shamma, Georgia Institute of Technology
ARO/MURI Annual Review, August 23-24, 2010
Preview
- Game-theoretic modeling formalisms
- Main issue: information exploitation vs. revelation
Example: Network monitoring

Players & strategies:
- Administrator: {Monitor (M), Not Monitor (NM)}
- Attacker: {Attack (A), Not Attack (NA)}

Preferences/utility function (attacker payoff, administrator payoff):

             M                         NM
    A    (-c_f - c_a, w - c_m)     (w - c_a, 0)
    NA   (0, w - c_m)              (0, w)

where
    w   = value of asset
    c_f = cost of failed attack
    c_a = cost to execute attack
    c_m = cost to monitor

Note: not zero-sum.
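The table above translates directly into code. A minimal sketch; the numeric values of w, c_f, c_a, c_m are illustrative assumptions, not taken from the slides:

```python
# Monitoring game payoffs as a lookup table.
# Parameter values below are illustrative assumptions.
w, c_f, c_a, c_m = 10.0, 4.0, 2.0, 1.0  # asset, failed-attack, attack, monitor

# payoffs[(attacker_action, admin_action)] = (attacker_utility, admin_utility)
payoffs = {
    ("A",  "M"):  (-c_f - c_a, w - c_m),  # attack caught: attacker pays, asset kept
    ("A",  "NM"): (w - c_a,    0.0),      # attack succeeds: asset lost
    ("NA", "M"):  (0.0,        w - c_m),  # needless monitoring cost
    ("NA", "NM"): (0.0,        w),        # status quo
}

# Not zero-sum: cell payoffs do not sum to a constant.
sums = {cell: a + d for cell, (a, d) in payoffs.items()}
print(sums)
```

With these values the cell sums are 3, 8, 9, and 10, confirming the slide's note that the game is not zero-sum.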
Elements

- Players (actors, agents): P = {1, 2, ..., p}
- Strategies (choices):
    Individual: s_i ∈ S_i
    Collective: (s_1, ..., s_p) ∈ S = S_1 × ... × S_p
- Preferences, expressed as a utility function u_i : S → R:
    player i prefers s to s'  iff  u_i(s) ≥ u_i(s')

Essential feature: preferences are over collective strategies:

    max_{s_i ∈ S_i} u_i(s_i)    vs.    max_{s_i ∈ S_i} u_i(s_i, s_{-i})
Example: Network monitoring [1]

Setup: External world (E), Web server (W), File server (F), Workstation (N)

States:
- Software: ftpd, httpd, nfsd, process, sniffer, virus
- Flags: user account compromised & data compromised
- 4 traffic levels per edge
- Number of states: on the order of billions

Actions per state:
- Attacker: {Attack-httpd, Attack-ftpd, Install-sniffer, ...}
- Administrator: {Remove-account, Restart-ftpd, Install-sniffer-detector, ...}

[1] Lye & Wing, "Game strategies in network security," Int. J. Inf. Secur., 2005.
Dynamic network monitoring, cont.

Dynamics:
- State/action-dependent transition probabilities
- Transition-dependent rewards/costs

Stochastic (Markov) game:
- Strategy = state-dependent action rules
- Preferences = expected future discounted rewards/costs

Compare with the one-shot monitoring game:

             M                         NM
    A    (-c_f - c_a, w - c_m)     (w - c_a, 0)
    NA   (0, w - c_m)              (0, w)
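A stochastic game couples these ingredients: transition probabilities that depend on the joint action, stage rewards, and state-dependent action rules evaluated by discounted reward. A toy sketch with two states instead of billions; every number here is an illustrative assumption, not from the Lye & Wing model:

```python
# Toy stochastic game: states, joint-action-dependent transitions, stage
# rewards, stationary (state-dependent) strategies, and the discounted
# reward they generate. All numbers are illustrative assumptions.
import random

states = ["normal", "compromised"]
# P[(state, attacker_action, admin_action)] = Pr[next state = "compromised"]
P = {("normal", "A", "M"): 0.1, ("normal", "A", "NM"): 0.7,
     ("normal", "NA", "M"): 0.0, ("normal", "NA", "NM"): 0.0,
     ("compromised", "A", "M"): 0.5, ("compromised", "A", "NM"): 0.9,
     ("compromised", "NA", "M"): 0.2, ("compromised", "NA", "NM"): 0.8}
# administrator's stage reward depends on the state and its own action
reward = {("normal", "M"): 9.0, ("normal", "NM"): 10.0,
          ("compromised", "M"): -1.0, ("compromised", "NM"): 0.0}
# stationary strategies: state-dependent action rules
attacker = {"normal": "A", "compromised": "NA"}
admin    = {"normal": "NM", "compromised": "M"}

def discounted_reward(s, gamma=0.9, horizon=200, seed=0):
    # simulate one seeded trajectory and accumulate discounted admin reward
    rng, total = random.Random(seed), 0.0
    for t in range(horizon):
        a, m = attacker[s], admin[s]
        total += gamma**t * reward[(s, m)]
        s = "compromised" if rng.random() < P[(s, a, m)] else "normal"
    return total

print(discounted_reward("normal"))
```

One seeded trajectory stands in for the expectation; averaging over many seeds, or solving the linear system for the two-state chain, would give the exact value.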
Solution concepts: Descriptive & prescriptive

Case I: The strategy profile s* is a Nash equilibrium if, for every player i and every s_i,

    u_i(s*) = u_i(s*_i, s*_{-i}) ≥ u_i(s_i, s*_{-i})

Idea: no player has a unilateral incentive to change action.

Case II: The strategy s*_i is a (weakly) dominant strategy if, for all s_i and s_{-i},

    u_i(s*_i, s_{-i}) ≥ u_i(s_i, s_{-i})

Idea: s*_i is always optimal, e.g., A for the row player in

    A    (0, 0)    (1, -1)
    B    (0, 0)    (-1, 1)

Case III: The strategy s_i^sec is a security strategy if

    s_i^sec = arg max_{s_i} min_{s_{-i}} u_i(s_i, s_{-i})

Idea: select s_i^sec to maximize guaranteed utility.
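These definitions can be checked mechanically over pure strategies. A sketch on a small illustrative bimatrix game (the payoffs are assumptions, chosen so the concepts give different answers):

```python
# Checking pure-strategy Nash equilibria and the row player's security
# strategy on a small bimatrix game. The payoffs are illustrative assumptions.
payoffs = {  # payoffs[(row, col)] = (u_row, u_col)
    ("A", "L"): (2, 1), ("A", "R"): (0, 0),
    ("B", "L"): (1, 0), ("B", "R"): (1, 2),
}
rows, cols = ["A", "B"], ["L", "R"]

def is_nash(r, c):
    # no unilateral profitable deviation for either player
    ur, uc = payoffs[(r, c)]
    return (all(payoffs[(r2, c)][0] <= ur for r2 in rows) and
            all(payoffs[(r, c2)][1] <= uc for c2 in cols))

def row_security_strategy():
    # maximize the guaranteed (worst-case) row payoff over pure strategies
    return max(rows, key=lambda r: min(payoffs[(r, c)][0] for c in cols))

pure_nash = [(r, c) for r in rows for c in cols if is_nash(r, c)]
print(pure_nash, row_security_strategy())
```

Here the game has two pure Nash equilibria, (A, L) and (B, R), while the row player's security strategy is B, the action with the best guaranteed payoff.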
NE informational requirements

Introduce mixed strategies:

    Pr[A] = p, Pr[NA] = 1 - p        Pr[M] = q, Pr[NM] = 1 - q

             M                         NM
    A    (-c_f - c_a, w - c_m)     (w - c_a, 0)
    NA   (0, w - c_m)              (0, w)

Restate preferences as expected utility. NE: solve for (p, q) from the indifference conditions

    w - c_m = (1 - p) w
    q (-c_f - c_a) + (1 - q) (w - c_a) = 0

Implication: the specific probabilities depend on knowledge of the environment (the opponent's utilities) ... unlike security strategies.
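The two indifference conditions solve in closed form: p = c_m / w and q = (w - c_a) / (w + c_f). A sketch with illustrative parameter values:

```python
# Solving the slide's indifference conditions:
#   administrator: w - c_m = (1 - p) * w            =>  p = c_m / w
#   attacker:      q*(-c_f - c_a) + (1 - q)*(w - c_a) = 0
#                                                   =>  q = (w - c_a) / (w + c_f)
# Parameter values are illustrative assumptions.
w, c_f, c_a, c_m = 10.0, 4.0, 2.0, 1.0

p = c_m / w                 # equilibrium Pr[Attack]
q = (w - c_a) / (w + c_f)   # equilibrium Pr[Monitor]

# Sanity check: at (p, q), each player is indifferent between its actions.
admin_M, admin_NM = w - c_m, (1 - p) * w
attacker_A = q * (-c_f - c_a) + (1 - q) * (w - c_a)  # attacker_NA = 0
print(p, q)
```

Note the cross-dependence: p is pinned down by the administrator's parameters (c_m, w), while q is pinned down by the attacker's (c_a, c_f, w). Computing either requires knowing the opponent's utilities, which is the informational requirement the slide contrasts with security strategies.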
Uncertain environments: Static case

Example [2]: the system user knows its own type.

Malicious type plays the monitoring game:

             M                         NM
    A    (-c_f - c_a, w - c_m)     (w - c_a, 0)
    NA   (0, w - c_m)              (0, w)

Normal type never attacks:

             M               NM
    NA   (0, w - c_m)    (0, w)

The administrator receives signals (e.g., {G, Y, R}) and forms beliefs:

    G → Pr[Malicious] = 0.05
    Y → Pr[Malicious] = 0.25
    R → Pr[Malicious] = 0.8

Uncertainty can be introduced for either or both players (e.g., "honeypot or not").

[2] Liu et al., "A Bayesian game approach for intrusion detection in wireless ad hoc networks," GameNets, 2006.
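The beliefs above induce a signal-contingent best response for the administrator. A sketch assuming, as a simplification, that a malicious user attacks for sure and a normal user never does; w and c_m take illustrative values:

```python
# Administrator's best response to each signal, given the slide's beliefs.
# Simplifying assumption: malicious users always attack, normal users never do.
# The values of w and c_m are illustrative assumptions.
w, c_m = 10.0, 1.0
belief = {"G": 0.05, "Y": 0.25, "R": 0.8}   # Pr[Malicious | signal]

def admin_utility(action, mu):
    if action == "M":
        return w - c_m          # monitoring keeps the asset either way
    return (1 - mu) * w         # NM: asset survives only if the user is normal

best = {sig: max(("M", "NM"), key=lambda a: admin_utility(a, mu))
        for sig, mu in belief.items()}
print(best)
```

Monitoring is worthwhile exactly when Pr[Malicious] > c_m / w = 0.1, so the green signal leads to NM while yellow and red lead to M.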
Uncertain environments: Dynamic case

Setup:
- Multiple states
- Action-dependent state transition probabilities
- Each player has correlated observations about the state
- Strategy: mapping from private history to actions

Uncertainty sources & implications:
- State values → exploitation vs. revelation
- Opponent actions → beliefs (of beliefs ...)
Special case: Repeated zero-sum game

Setup: the same stage game

    A    m_11    m_12
    B    m_21    m_22

is played repeatedly over sequential stages.

- Row player = maximizer; column player = minimizer
- Players observe opponent actions (perfect monitoring)
- Strategy = mapping from past observations to future action probabilities
- Utility = averaged or discounted, finite- or infinite-horizon, sum of stage payoffs

Issues: asymmetric information; computation.
Example

Setup:
- Administrator (row) knows the state (allowed behavior)
- Attacker (column) has probabilistic beliefs over the state

    State α:                       State β:
          A         B                    A         B
    A   (0, 0)   (1, -1)           A  (-1, 1)   (0, 0)
    B   (0, 0)   (-1, 1)           B  (1, -1)   (0, 0)

Note: row has state-dependent (weakly) dominant strategies (A in α, B in β).

Nash equilibrium:
- Attacker's belief (common knowledge): (Pr[α], Pr[β]) = (0.6, 0.4)
- The administrator does not use its dominant strategy; on day 1 it mixes (oblivious to col):

    day      α                β
    1    (0.62, 0.38)    (0.32, 0.68)
Example, cont.

    State α:                       State β:
          A         B                    A         B
    A   (0, 0)   (1, -1)           A  (-1, 1)   (0, 0)
    B   (0, 0)   (-1, 1)           B  (1, -1)   (0, 0)

Day 1: the attacker plays according to its prior beliefs: col(1) = A.

Day 2: based on the administrator's strategy, the attacker builds posterior beliefs from the observed action row(1):

    row(1)    posterior (Pr[α], Pr[β])    col(2)
    A         (0.75, 0.25)                A
    B         (0.45, 0.55)                B
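The day-2 posteriors follow from Bayes' rule, using the prior (0.6, 0.4) and the administrator's day-1 probabilities of playing A (0.62 in state α, 0.32 in state β). A sketch:

```python
# Bayesian belief update for the uninformed attacker.
prior = {"alpha": 0.6, "beta": 0.4}      # attacker's prior over the state
pr_A = {"alpha": 0.62, "beta": 0.32}     # Pr[row(1) = A | state]

def posterior(action):
    # likelihood of the observed day-1 action in each state
    like = {s: pr_A[s] if action == "A" else 1 - pr_A[s] for s in prior}
    z = sum(prior[s] * like[s] for s in prior)   # Pr[observed action]
    return {s: prior[s] * like[s] / z for s in prior}

post_A = posterior("A")   # beliefs after observing row(1) = A
post_B = posterior("B")   # beliefs after observing row(1) = B
print(post_A["alpha"], post_B["alpha"])
```

The exact values are (0.744, 0.256) and (0.456, 0.544), which the slide rounds to (0.75, 0.25) and (0.45, 0.55).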
Non-revealing strategies

Issue: for a fixed prior, the number of histories (hence beliefs) grows as (#actions)^(#stages).

Non-revealing strategy: the informed player uses state-independent action probabilities.

Consequences:
- Constant beliefs for the uninformed player
- Ease of computation
- Generally suboptimal

Question: when are non-revealing strategies optimal?

Setup:
- State-dependent payoff matrices M_1, M_2, ..., M_K
- Prior belief (p_1, p_2, ..., p_K)
Non-revealing strategies, cont.

One-shot game values:

    v_1 = max_{x_1, ..., x_K} min_y Σ_k p_k (x_k)^T M_k y

    (the informed player conditions its mixed strategy x_k on the state k)

    v_NR = max_{x_NR} min_y (x_NR)^T (p_1 M_1 + p_2 M_2 + ... + p_K M_K) y

    (a single state-independent mixed strategy x_NR)

Claim: a non-revealing strategy is optimal if and only if v_1 = v_NR. If so, the optimal strategy is stationary (i.e., stage-independent).
Non-revelation example

Setup (payoffs to row; zero-sum):

    State α:                State β:
          A     B                 A     B
    A     0     1           A    1/2    0
    B     0    1/2          B     1     0

- Administrator configures a spare server for A or B
- Attacker targets A or B
- Utility is quality of service
- State is legitimate user activity, with prior (Pr[α], Pr[β]) = (0.2, 0.8)

Utility of the (revealing) dominant strategy: 0.2 in total, for any number of stages.
Non-revelation example, cont.

    State α:                State β:
          A     B                 A     B
    A     0     1           A    1/2    0
    B     0    1/2          B     1     0

Non-revealing one-shot game: average the state matrices under the prior:

    0.2 · [ 0    1  ]  +  0.8 · [ 1/2   0 ]  =  [ 0.4   0.2 ]
          [ 0   1/2 ]           [  1    0 ]     [ 0.8   0.1 ]

Strategies:
- Administrator: always A
- Attacker: always B

Payoff: 0.2 per stage, i.e., 0.2 × #stages in total, vs. 0.2 in total for the revealing dominant strategy.
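The claim from the previous slide (a non-revealing strategy is optimal iff v_1 = v_NR) can be checked numerically for this example. A brute-force sketch; grid search over the row player's mixing probabilities stands in for the linear programs one would normally solve:

```python
# Comparing v_1 (state-dependent mixing) with v_NR (one mix for all states)
# for the two-state example above. Grid search is an approximation chosen
# for self-containedness.
import itertools

M = {"alpha": [[0.0, 1.0], [0.0, 0.5]],
     "beta":  [[0.5, 0.0], [1.0, 0.0]]}
p = {"alpha": 0.2, "beta": 0.8}
grid = [i / 100 for i in range(101)]          # candidate Pr[row plays A]

def col_payoff(xs, j):
    # expected payoff of pure column j when the row mixes with Pr[A] = xs[k]
    # in state k; minimizing over pure columns suffices (payoff linear in y)
    return sum(p[k] * (xs[k] * M[k][0][j] + (1 - xs[k]) * M[k][1][j])
               for k in M)

# v_1: the informed row player may condition its mix on the state
v1 = max(min(col_payoff(dict(zip(M, xs)), j) for j in (0, 1))
         for xs in itertools.product(grid, repeat=len(M)))

# v_NR: the row player must use the same mix in every state
vNR = max(min(col_payoff({k: x for k in M}, j) for j in (0, 1))
          for x in grid)

print(v1, vNR)
```

Both values come out to 0.2, the saddle-point value of the averaged matrix above, so by the claim a non-revealing (and stationary) strategy is optimal in this example.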
Future directions
- Broader classes of computable strategies ("almost non-revealing")
- Robustness to unknown landscapes (prior beliefs, non-zero-sum payoffs)
- Role of adaptation/learning
- Imperfect monitoring