Detection and Mitigation of Cyber-Attacks Using Game Theory and Learning
1 Detection and Mitigation of Cyber-Attacks Using Game Theory and Learning João P. Hespanha Kyriakos G. Vamvoudakis
2 Cyber Situation Awareness Framework [Block diagram: observations (NetFlow, probing, time analysis) and simulation/live security exercises feed a mission model and a cyber-assets model; analysis produces an up-to-date view of the cyber assets and of the dependencies between assets and missions; sensor alerts pass through a correlation engine; attackers are analyzed and characterized to predict future actions (COAs); together with impact analysis, this creates a semantically rich view of cyber-mission status.]
3 Outline... Large matrix games (summary of results) Multi-agent learning under cyber-attack using Q-learning (summary of results) Integration of online optimization for real-time attack prediction and visualization (summary of results) Observability of dynamical systems under attacks on sensors (summary and new results)
4 Outline... Large matrix games (summary of results) Multi-agent learning under cyber-attack using Q-learning (summary of results) Integration of online optimization for real-time attack prediction and visualization (summary of results) Observability of dynamical systems under attacks on sensors (summary and new results) Bopardikar (UTRC), Prandini (Politecnico di Milano)
5 Network Security Games [Attack graph from [J. Wing 2007]: intrusion detection system, chat software, web proxy, intruder's target; a sequence of intruder actions compromises the database server without being detected by the IDS.] Even trivially small network security games can lead to games with very large decision trees. Problem statistics of iCTF 2010: over 7800 distinct mission states (defender observations); over 2500 distinct observations available to the attacker; the defender and the attacker can each choose among a very large number of distinct policies (the attacker's count depending on its level of expertise).
6 Network Security Games [Same attack graph from [J. Wing 2007] and iCTF 2010 statistics as the previous slide.] Developed a sample-based approach to solving zero-sum games. The approach provides probabilistic guarantees on the performance of the policies (in terms of security levels). The results are applicable to very general classes of games that can include stochastic actions, partial information, etc.
7 Application to iCTF 2010 We were able to: provide cyber-security estimates of mission success (e.g., the Litya office receives an average of 314 units for completion of all 4 missions); provide dynamic rules to control the firewall; take into account the effect of attacks & countermeasures; make the response a function of attacker sophistication; play what-if scenarios (vulnerabilities, information, etc.). [Plots: # units received by Litya for 1 round of missions under Option I without bribes, Option I with bribes, and Option II with bribes, versus increasing level of attacker sophistication, ranging from no service vulnerable (baseline), through service S2 (vulnerable to 38 teams), services S2, S6, S9 (vulnerable to at least 6 teams), and service S0 (vulnerable to at least 1 team), to all services vulnerable; service labels span S0 through S9.]
8 Outline... Large matrix games (summary of results) Multi-agent learning under cyber-attack using Q-learning (summary of results) Integration of online optimization for real-time attack prediction and visualization (summary of results) Observability of dynamical systems under attacks on sensors (summary and new results)
9 Resilient Cyber-Mission Architectures In complex cyber missions, human operators define policies and rules, while computing elements automate processes of distributed resource allocation, scheduling, inventory management, etc. What is the impact of attacks on this type of automated/optimization process? Can we devise algorithms with built-in attack prediction/awareness capabilities?
10 Focus: Distributed Consensus/Agreement Classical problem in distributed computing: a group of computing elements must agree on a common scalar value $x$ (e.g., priority, resources allocated, inventory decision, database value). The decision is made iteratively and in a distributed fashion, using peer-to-peer communication. 2nd-order adjustment rule: $x_i(k+1) = x_i(k) + \Delta x_i(k+1)$, $\Delta x_i(k+1) = \Delta x_i(k) + u_i + v_i$, where $x_i(k)$ is the value at processor $i$ at iteration $k$, $\Delta x_i(k)$ is the adjustment on $x_i$ by processor $i$ at iteration $k$, $u_i$ is the correct update on the adjustment, and $v_i$ is the update on the adjustment introduced by the attacker. Goal: minimize the errors between the values of agents and their neighbors, $e_i = \sum_{j \in N_i} (x_i - x_j)$, where $N_i$ is the set of peers of agent $i$ (self-included). Attacker: maximize the errors using stealth attacks (small $v_i$).
11 Focus: Distributed Consensus/Agreement [Same setting and 2nd-order adjustment rule as the previous slide.] Nash equilibrium formulation: with $e_i = \sum_{j \in N_i} (x_i - x_j)$, each agent considers the cost $J_i = \sum_k \big( \|e_i\|^2 + \sum_{j \in N_i} \|u_j\|^2 - \gamma_{ij}^2 \|v_j\|^2 \big)$: the error term is minimized by us and maximized by the attacker; the term in our updates is minimized by us and maximized by the attacker (small means smooth); the term in the attacker's updates is maximized by us and minimized by the attacker (small means stealthy).
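As a concrete illustration of the second-order adjustment rule, here is a minimal simulation sketch. The feedback law standing in for $u_i$ (a damped disagreement term) and the gains are illustrative choices, not the optimal Nash policies derived in the slides; the `attacker` hook plays the role of $v_i$.

```python
import numpy as np

def run_consensus(x0, neighbors, gain=0.1, attacker=None, steps=200):
    """Second-order consensus: each node i keeps a value x_i and an
    adjustment d_i, updated as
        d_i(k+1) = d_i(k) + u_i + v_i,   x_i(k+1) = x_i(k) + d_i(k+1).
    Here u_i is a simple damped disagreement feedback (an illustrative
    choice, not the Nash policy) and v_i is an optional attack signal."""
    x = np.array(x0, dtype=float)
    d = np.zeros(len(x))
    for k in range(steps):
        u = np.zeros(len(x))
        for i, peers in neighbors.items():
            e = sum(x[i] - x[j] for j in peers)   # disagreement e_i
            u[i] = -gain * e - 0.5 * d[i]         # damp error and momentum
        v = attacker(k, x) if attacker else np.zeros(len(x))
        d = d + u + v                             # adjustment update (u_i + v_i)
        x = x + d                                 # value update
    return x

# Path graph 0-1-2, no attacker: nodes converge to agreement,
# and this symmetric feedback preserves the average of the values.
vals = run_consensus([0.0, 1.0, 4.0], neighbors={0: [1], 1: [0, 2], 2: [1]})
```

With an attacker injecting a persistent bias through `v`, the same loop shows how a small stealthy signal shifts the agreement point, which is exactly what the Nash formulation penalizes.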
12 Optimal Solution The Bellman equation yields the optimal control and attacker policies: $u_i^*$ and $v_i^*$ are expressed in terms of the gradient $\partial V_i / \partial e_i$ of the value function, with $d_i$ denoting the number of peers of agent $i$. Under appropriate regularity assumptions (smoothness): $J_i(u_i^*, u_{-i}^*, v^*) \le J_i(u_i, u_{-i}^*, v^*)$ for all $u_i$, so $u_i^*$ is optimal (minimal) for us; $J_i(u^*, v_i^*, v_{-i}^*) \ge J_i(u^*, v_i, v_{-i}^*)$ for all $v_i$, so $v_i^*$ is optimal (maximal) for the attacker. Moreover, consensus will be reached asymptotically, and all variables remain bounded through the transient (in fact, Lyapunov stability).
13 Optimal Solution But the Bellman equation is difficult to solve (curse of dimensionality). Approach: a machine-learning-based algorithm to solve this distributed consensus problem. Key contributions: applies to second-order updates (and even more complex dynamics); the algorithms do not require global knowledge of the communication graph; the algorithms do not require knowledge of the update rules used by other agents; formal guarantees of correctness (convergence). Moreover, consensus will be reached asymptotically, and all variables remain bounded through the transient (in fact, Lyapunov stability).
14 Actor/Critic Learning Approach Critic = model-free (distributed) algorithm to evaluate the current policy & estimate attacker actions. Actor = model-free (distributed) algorithm to enact optimal decisions (based on the critic's findings). Actor & critic are based on approximate dynamic programming: the critic learns the Q-function (action dependent) & the actor learns the optimal control laws.
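The slides' critic learns a continuous-state Q-function via approximate dynamic programming; as a minimal illustration of the model-free principle only, here is tabular Q-learning on a hypothetical toy chain MDP (all parameters are illustrative, and this discrete toy is not the slides' algorithm).

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.2):
    """Tabular Q-learning on a toy chain MDP: actions 0 = left, 1 = right,
    reward 1 on reaching the right end. The critic-style update uses only
    sampled transitions, never a model of the dynamics."""
    random.seed(0)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            if random.random() < eps:                    # explore
                a = random.randrange(2)
            else:                                        # exploit current Q
                a = max((0, 1), key=lambda act: Q[s][act])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # model-free temporal-difference update of the Q-function
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
```

After training, acting greedily on the learned Q-function moves right at every interior state, i.e., the actor recovers the optimal policy purely from the critic's action-dependent value estimates.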
15 Outline... Large matrix games (summary of results) Multi-agent learning under cyber-attack using Q-learning (summary of results) Integration of online optimization for real-time attack prediction and visualization (summary of results) Observability of dynamical systems under attacks on sensors (summary and new results)
16 Cyber Missions Complexity Challenges to real-time cyber-mission protection: cyber assets shared among missions; cyber-asset requirements change over time; missions can use different configurations of resources; complex network of cyber-asset dependencies. [Diagram from the iCTF exercise: Mission 1 spans services S0, S1; S5, S6; S9; S0, S1; S5, S6, S7; S6. Mission 2 spans services S0, S2; S2; and others.] An attack on service S0 can result in multiple mission failures, but the damage is only realized if the missions follow particular paths. Cyber-awareness questions: When & where is an attacker most likely to strike? When & where is an attacker most damaging to mission completion? How does the answer depend on attacker resources? attacker skills? attacker knowledge? (real-time what-if analysis)
17 Mapping Service Attacks to Mission Damage Developed an optimization formalism to predict the most likely/most damaging attacks. Damage equation (for service $s$ at time $t$): $PD_s^t \approx a_s^t + b_s^t \, AR_s^t$, relating potential damage to the attack resources $AR_s^t$. Uncertainty equation (for service $s$ at time $t$): $p_s^t \approx c_s^t + d_s^t \, AR_s^t \in [0,1]$, the probability of realizing the damage. The equation parameters vary with time as the mission progresses (learned from data in iCTF exercises). Optimal attacks maximize the total damage to the mission, constrained by $\sum_s AR_s^t \le TR$, the total attack resources at time $t$. The formalism and predictions were validated in the iCTF 2011 exercise (89 teams).
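A minimal sketch of the one-step prediction implied by the damage equation and resource constraint: since damage is linear in the attack resources, the total-damage-maximizing allocation concentrates all resources on the service with the largest slope $b_s$. This is an illustrative toy for a single time step with fixed parameters; in the formalism the parameters $a_s^t$, $b_s^t$ are time-varying and learned from exercise data.

```python
def allocate_attack_resources(a, b, total):
    """With damage PD_s = a_s + b_s * AR_s linear in the attack resources
    AR_s, total damage sum_s PD_s is maximized subject to
    sum_s AR_s <= total by putting all resources on the service with the
    largest slope b_s (illustrative single-step, fixed-parameter sketch)."""
    best = max(range(len(b)), key=lambda s: b[s])  # most damage per unit resource
    ar = [0.0] * len(b)
    ar[best] = total                               # concentrate the budget
    damage = sum(a_s + b_s * r for a_s, b_s, r in zip(a, b, ar))
    return ar, damage

# Three hypothetical services: the third has the steepest damage slope.
ar, damage = allocate_attack_resources([1.0, 2.0, 0.0], [0.5, 0.2, 1.0], 10.0)
```

Adding the uncertainty equation (damage realized only with probability $p_s^t$, itself a function of $AR_s^t$) would turn this into a nonlinear program over expected damage rather than a corner solution.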
18 Multi-Resolution Visualization Enabling a multi-resolution attack analysis: 1. high-level attack predictions based on online optimization; 2. potential damage & uncertainty associated with attacks to different services; 3. parameters that determine damage and uncertainty. AlloSphere integration: high-level predictions permit fast action; low-level parameters permit investigating the rationale for the predictions.
19 Outline... Large matrix games (summary of results) Multi-agent learning under cyber-attack using Q-learning (summary of results) Integration of online optimization for real-time attack prediction and visualization (summary of results) Observability of dynamical systems under attacks on sensors (summary and new results) Sinopoli (CMU), Y. Mo (Caltech)
20 Detection in Adversarial Environments How to interpret & assess the reliability of sensors that may have been manipulated? Sensors relevant to cyber missions: measurement sensors (e.g., SCADA systems); computational sensors (e.g., weather-forecasting simulation engines); data-retrieval sensors (e.g., database queries); cyber-security sensors (e.g., IDSs). Sensor-error domains: Deterministic sensors: with $n$ sensors, one can get the correct answer as long as $m < n/2$ sensors have been manipulated. Stochastic sensors without manipulation: solution given by hypothesis testing/estimation; with $P(\text{sensor error}) = p_{err}$, the error probability with $n$ sensors is $P(n\text{-sensor error}) \approx \binom{n}{n/2} p_{err}^{n/2}$. Stochastic sensors with potential manipulation: open problem?
21 Problem formulation $X$: binary random variable to be estimated, with $P(X=0) = P(X=1) = 1/2$ for simplicity (the papers treat the general case). $Y_1, Y_2, \dots, Y_n$: noisy measurements of $X$ produced by the $n$ sensors, with per-sensor error probability $P(Y_i \neq X) = p_{err}$, $P(Y_i = X) = 1 - p_{err}$ (not necessarily very small). $Z_1, Z_2, \dots, Z_n$: measurements actually reported by the $n$ sensors: $Z_i = Y_i$ if sensor $i$ is not attacked, $Z_i = ?$ (arbitrary) if sensor $i$ is attacked; at most $m$ sensors are attacked. $p_{attack}$: probability that we are under attack (very hard to know!); the interpretation of the sensor data should be mostly independent of $p_{attack}$.
22 Result for small # of sensors ($n < 2/p_{err}$) Setting as before: $X$ binary, $Y_1, \dots, Y_n$ noisy measurements produced by the sensors, $Z_1, \dots, Z_n$ the measurements actually reported ($Z_i = Y_i$ if sensor $i$ is not attacked, arbitrary otherwise), at most $m$ sensors attacked, $p_{attack}$ the probability that we are under attack (very hard to know!). Theorem. The optimal estimator goes with the majority of the (potentially manipulated) sensor readings, EXCEPT if there is consensus: under full consensus it randomizes, outputting the majority with probability $1 - \gamma/2$, where the weight $\gamma \in [0,1]$ is given by an explicit expression in $n$, $m$, $p_{err}$, and $p_{attack}$ involving binomial sums of $p_{err}^{n-k}(1-p_{err})^k$ terms. The optimal estimator is largely independent of $p_{attack}$ (hard to know).
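A sketch of the decision rule above. The consensus case is only flagged here, since the exact randomization weight $\gamma$ from the theorem is not reproduced; the point is that unanimity among potentially manipulated sensors is the one situation where plain majority voting is not optimal.

```python
def robust_estimate(z):
    """Majority vote over reported bits, flagging full consensus: per the
    result above, the optimal estimator follows the majority EXCEPT under
    unanimity, where it randomizes (the exact randomization weight, which
    depends on n, m, p_err, and p_attack, is omitted in this sketch)."""
    ones = sum(z)
    majority = 1 if 2 * ones > len(z) else 0   # majority of the reported bits
    unanimous = ones in (0, len(z))            # suspiciously perfect agreement
    return majority, unanimous
```

For example, `robust_estimate([1, 1, 0, 1, 0])` follows the majority, while a unanimous reading raises the flag that triggers the randomized branch of the optimal rule.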
23 Result for small # of sensors ($n < 2/p_{err}$) [Recap of the previous slide's setting and theorem.] What about the estimation of time-varying variables, e.g., the state of a mission? Previously: $X$ was a constant binary random variable to be estimated. Now: $X(t)$ is a time-varying state variable to be estimated based on 1. sensor measurements that may have been manipulated, and 2. the system dynamics. E.g., the state of a cyber mission.
24 Continuous Linear Systems Dynamical evolution of the system's state: $\dot x = Ax + Bu$, with control signals $u$. The $N$ measurements produced by the sensors: $y_i = C_i x + D_i u$, $i \in \{1, \dots, N\}$. The $N$ measurements reported by the sensors: $z_i = y_i$ if sensor $i$ is not attacked, $z_i = ?$ if sensor $i$ is attacked; at most $M$ sensors can be manipulated by the attackers. Under what conditions can one reconstruct the state from the (potentially corrupted) sensor measurements?
25 Continuous Linear Systems [Same setup as the previous slide.] Theorem. Exact state reconstruction is possible if and only if the system is observable through every subset of $N - 2M$ measurements, i.e., the state could be reconstructed through only $N - 2M$ measurements in the absence of attacks: a potential attack at $M$ sensors effectively disables $2M$ sensors.
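The theorem's condition can be checked numerically with a Kalman rank test over every sensor subset of size $N - 2M$; a brute-force sketch (fine for small $N$, and only illustrative, since the slides' estimators avoid this combinatorial enumeration):

```python
import numpy as np
from itertools import combinations

def observable(A, C_list, sensors):
    """Kalman rank test through the chosen subset of sensors."""
    n = A.shape[0]
    C = np.vstack([C_list[i] for i in sensors])
    O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])
    return np.linalg.matrix_rank(O) == n

def resilient_to(A, C_list, M):
    """Theorem's condition: exact reconstruction despite attacks on up to
    M sensors requires observability through EVERY subset of N - 2M
    sensors (an attack on M sensors effectively disables 2M of them)."""
    N = len(C_list)
    keep = N - 2 * M
    if keep <= 0:
        return False
    return all(observable(A, C_list, s)
               for s in combinations(range(N), keep))

# Double integrator with three identical position sensors: any single
# sensor is enough, so the system tolerates M = 1 attacked sensor.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
position_sensors = [np.array([[1.0, 0.0]]) for _ in range(3)]
```

Replacing the output matrices with velocity-only sensors `[[0, 1]]` makes every single-sensor subset unobservable, so the same test reports that no attack can be tolerated.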
26 Estimation algorithms Gramian-based estimator: batch, finite-time estimation; inversion of the observability matrix at each time step. Observer-based estimator: asymptotic estimation; recursive, low-computation algorithm; provably robust with respect to noise on all sensors (including non-attacked ones). Algorithm outline: 1. Build an estimate by ignoring a set S of M sensors. 2. Build additional estimates by removing, in addition, all combinations of M additional sensors. 3. If all attacked sensors were in the set S, then the estimates in steps 1 and 2 will be consistent (modulo noise). (All estimates can be constructed without combinatorial complexity, by using finite dimensionality.)
27 Discrete Event Systems - Background Alphabet: $\Sigma = \Sigma_c \cup \Sigma_u$; language: $L \subseteq \Sigma^*$; supervisor: $f : L \to 2^{\Sigma}$, mapping to sets that contain (at least) all uncontrollable symbols. Language controlled by the supervisor: $w\sigma \in L_f \iff w \in L_f,\ w\sigma \in L,\ \sigma \in f(w)$. Theorem. There exists a supervisor $f$ such that $L_f = K$ iff $K$ is controllable: $K\Sigma_u \cap L \subseteq K$. Observation map: $P : \Sigma \to \Sigma_o \cup \{\epsilon\}$, with observation alphabet $\Sigma_o$; $P$-supervisor: $g : P(L) \to 2^{\Sigma}$; language controlled by the $P$-supervisor: $w\sigma \in L_g \iff w \in L_g,\ w\sigma \in L,\ \sigma \in g(P(w))$. Theorem. There exists a $P$-supervisor $g$ such that $L_g = K$ iff $K$ is controllable and $P$-observable: $P(w) = P(w')$ implies that there is no $\sigma$ with $w\sigma \in K,\ w'\sigma \in L \setminus K$ or $w\sigma \in L \setminus K,\ w'\sigma \in K$.
28 Supervised DES under attacks [Block diagram: language → observation map → attacks $A_1, A_2, \dots, A_n$ → $P$-supervisor.] Attack model: $m$ out of the $n$ attacks are active (which ones is unknown); each attack performs symbol distortion, erasure, or insertion; nondeterminism: $A_i : \Sigma_o \to 2^{\Sigma_o^*}$.
29 Supervised DES under attacks [Same block diagram and attack model as the previous slide.] Maximal language $L^{\max}_{g,A}$ controlled by the $P$-supervisor $g$ under the attack $A$: $w\sigma \in L^{\max}_{g,A} \iff w \in L^{\max}_{g,A},\ w\sigma \in L,\ \exists y \in A(P(w\sigma)),\ \sigma \in g(y)$, over the set of observed symbols. Minimal language $L^{\min}_{g,A}$ controlled by the $P$-supervisor $g$ under the attack $A$: $w\sigma \in L^{\min}_{g,A} \iff w \in L^{\min}_{g,A},\ w\sigma \in L,\ \forall y \in A(P(w\sigma)),\ \sigma \in g(y)$. These are the languages with the largest/smallest number of words that the attacker could enforce.
30 Supervised DES under attacks [Same block diagram and attack model as before.] Theorem. There exists a $P$-supervisor $g$ such that $L^{\min}_{g,A} = L^{\max}_{g,A} = K$ iff $K$ is controllable and $P$-observable for the set of attacks $\mathcal{A}$: for attacks $A, A' \in \mathcal{A}$, $A(P(w)) \cap A'(P(w')) \neq \emptyset$ implies that there is no $\sigma$ with $w\sigma \in K,\ w'\sigma \in L \setminus K$ or $w\sigma \in L \setminus K,\ w'\sigma \in K$.
31 Output-symbol attacks [Block diagram: language → observation map → attacks $A_1, A_2, \dots, A_n$ → $P$-supervisor.] Attack model: each output symbol is produced by one sensor; one sensor could be manipulated (which one is unknown); sensor manipulation: symbol erasure or insertion; nondeterminism: $A_i$ = arbitrary insertions/deletions of the $i$-th output symbol. Theorem. There exists a $P$-supervisor $g$ such that $L^{\min}_{g,A} = L^{\max}_{g,A} = K$ iff $K$ is controllable and $P_{ij}$-observable for every pair $i, j \in \Sigma_o$, where $P_{ij}$ removes from the output string the symbols $i$ and $j$: a potential attack at 1 sensor effectively disables 2 sensors.
32 Summary... Large matrix games (summary of results) Multi-agent learning under cyber-attack using Q-learning (summary of results) Integration of online optimization for real-time attack prediction and visualization (summary of results) Observability of dynamical systems under attacks on sensors (summary and new results)
Generalized Lowness and Highness and Probabilistic Complexity Classes Andrew Klapper University of Manitoba Abstract We introduce generalized notions of low and high complexity classes and study their
More informationPracticable Robust Markov Decision Processes
Practicable Robust Markov Decision Processes Huan Xu Department of Mechanical Engineering National University of Singapore Joint work with Shiau-Hong Lim (IBM), Shie Mannor (Techion), Ofir Mebel (Apple)
More informationOnline Learning Class 12, 20 March 2006 Andrea Caponnetto, Sanmay Das
Online Learning 9.520 Class 12, 20 March 2006 Andrea Caponnetto, Sanmay Das About this class Goal To introduce the general setting of online learning. To describe an online version of the RLS algorithm
More informationLecture 16: Perceptron and Exponential Weights Algorithm
EECS 598-005: Theoretical Foundations of Machine Learning Fall 2015 Lecture 16: Perceptron and Exponential Weights Algorithm Lecturer: Jacob Abernethy Scribes: Yue Wang, Editors: Weiqing Yu and Andrew
More informationProvable Security against Side-Channel Attacks
Provable Security against Side-Channel Attacks Matthieu Rivain matthieu.rivain@cryptoexperts.com MCrypt Seminar Aug. 11th 2014 Outline 1 Introduction 2 Modeling side-channel leakage 3 Achieving provable
More informationOn the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms
On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms Anne Benoit, Fanny Dufossé and Yves Robert LIP, École Normale Supérieure de Lyon, France {Anne.Benoit Fanny.Dufosse
More informationHandout 1: Introduction to Dynamic Programming. 1 Dynamic Programming: Introduction and Examples
SEEM 3470: Dynamic Optimization and Applications 2013 14 Second Term Handout 1: Introduction to Dynamic Programming Instructor: Shiqian Ma January 6, 2014 Suggested Reading: Sections 1.1 1.5 of Chapter
More informationToday s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes
Today s s Lecture Lecture 20: Learning -4 Review of Neural Networks Markov-Decision Processes Victor Lesser CMPSCI 683 Fall 2004 Reinforcement learning 2 Back-propagation Applicability of Neural Networks
More informationTutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning.
Tutorial: PART 1 Online Convex Optimization, A Game- Theoretic Approach to Learning http://www.cs.princeton.edu/~ehazan/tutorial/tutorial.htm Elad Hazan Princeton University Satyen Kale Yahoo Research
More informationREINFORCEMENT LEARNING
REINFORCEMENT LEARNING Larry Page: Where s Google going next? DeepMind's DQN playing Breakout Contents Introduction to Reinforcement Learning Deep Q-Learning INTRODUCTION TO REINFORCEMENT LEARNING Contents
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More information6 Reinforcement Learning
6 Reinforcement Learning As discussed above, a basic form of supervised learning is function approximation, relating input vectors to output vectors, or, more generally, finding density functions p(y,
More informationA Data-driven Approach for Remaining Useful Life Prediction of Critical Components
GT S3 : Sûreté, Surveillance, Supervision Meeting GdR Modélisation, Analyse et Conduite des Systèmes Dynamiques (MACS) January 28 th, 2014 A Data-driven Approach for Remaining Useful Life Prediction of
More informationSecret Sharing CPT, Version 3
Secret Sharing CPT, 2006 Version 3 1 Introduction In all secure systems that use cryptography in practice, keys have to be protected by encryption under other keys when they are stored in a physically
More informationCS 380: ARTIFICIAL INTELLIGENCE MACHINE LEARNING. Santiago Ontañón
CS 380: ARTIFICIAL INTELLIGENCE MACHINE LEARNING Santiago Ontañón so367@drexel.edu Summary so far: Rational Agents Problem Solving Systematic Search: Uninformed Informed Local Search Adversarial Search
More informationMarkov Decision Processes Chapter 17. Mausam
Markov Decision Processes Chapter 17 Mausam Planning Agent Static vs. Dynamic Fully vs. Partially Observable Environment What action next? Deterministic vs. Stochastic Perfect vs. Noisy Instantaneous vs.
More informationMathematical Optimization Models and Applications
Mathematical Optimization Models and Applications Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 1, 2.1-2,
More informationInformation Aggregation in Networks
Information Aggregation in Networks MSc Thesis Submitted by: Mattan Lebovich Supervisor: Dr. Kobi Gal, Dr. Anat Lerner Open University Ra anana, Israel 1 Abstract This thesis studies information aggregation
More informationBalancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm
Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu
More informationA Polynomial-time Nash Equilibrium Algorithm for Repeated Games
A Polynomial-time Nash Equilibrium Algorithm for Repeated Games Michael L. Littman mlittman@cs.rutgers.edu Rutgers University Peter Stone pstone@cs.utexas.edu The University of Texas at Austin Main Result
More informationA practical introduction to active automata learning
A practical introduction to active automata learning Bernhard Steffen, Falk Howar, Maik Merten TU Dortmund SFM2011 Maik Merten, learning technology 1 Overview Motivation Introduction to active automata
More information3 Probabilistic Turing Machines
CS 252 - Probabilistic Turing Machines (Algorithms) Additional Reading 3 and Homework problems 3 Probabilistic Turing Machines When we talked about computability, we found that nondeterminism was sometimes
More informationGaussian Processes as Continuous-time Trajectory Representations: Applications in SLAM and Motion Planning
Gaussian Processes as Continuous-time Trajectory Representations: Applications in SLAM and Motion Planning Jing Dong jdong@gatech.edu 2017-06-20 License CC BY-NC-SA 3.0 Discrete time SLAM Downsides: Measurements
More information, and rewards and transition matrices as shown below:
CSE 50a. Assignment 7 Out: Tue Nov Due: Thu Dec Reading: Sutton & Barto, Chapters -. 7. Policy improvement Consider the Markov decision process (MDP) with two states s {0, }, two actions a {0, }, discount
More informationReinforcement Learning. Yishay Mansour Tel-Aviv University
Reinforcement Learning Yishay Mansour Tel-Aviv University 1 Reinforcement Learning: Course Information Classes: Wednesday Lecture 10-13 Yishay Mansour Recitations:14-15/15-16 Eliya Nachmani Adam Polyak
More informationDistributed Optimization. Song Chong EE, KAIST
Distributed Optimization Song Chong EE, KAIST songchong@kaist.edu Dynamic Programming for Path Planning A path-planning problem consists of a weighted directed graph with a set of n nodes N, directed links
More informationRole of Synchronized Measurements In Operation of Smart Grids
Role of Synchronized Measurements In Operation of Smart Grids Ali Abur Electrical and Computer Engineering Department Northeastern University Boston, Massachusetts Boston University CISE Seminar November
More informationUncertainty and Bayesian Networks
Uncertainty and Bayesian Networks Tutorial 3 Tutorial 3 1 Outline Uncertainty Probability Syntax and Semantics for Uncertainty Inference Independence and Bayes Rule Syntax and Semantics for Bayesian Networks
More informationPOLYNOMIAL SPACE QSAT. Games. Polynomial space cont d
T-79.5103 / Autumn 2008 Polynomial Space 1 T-79.5103 / Autumn 2008 Polynomial Space 3 POLYNOMIAL SPACE Polynomial space cont d Polynomial space-bounded computation has a variety of alternative characterizations
More informationDecentralized Stabilization of Heterogeneous Linear Multi-Agent Systems
1 Decentralized Stabilization of Heterogeneous Linear Multi-Agent Systems Mauro Franceschelli, Andrea Gasparri, Alessandro Giua, and Giovanni Ulivi Abstract In this paper the formation stabilization problem
More informationBayesian Networks Inference with Probabilistic Graphical Models
4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning
More informationReconnect 04 Introduction to Integer Programming
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, Reconnect 04 Introduction to Integer Programming Cynthia Phillips, Sandia National Laboratories Integer programming
More informationMULTIPLE CHOICE QUESTIONS DECISION SCIENCE
MULTIPLE CHOICE QUESTIONS DECISION SCIENCE 1. Decision Science approach is a. Multi-disciplinary b. Scientific c. Intuitive 2. For analyzing a problem, decision-makers should study a. Its qualitative aspects
More informationLearning in Zero-Sum Team Markov Games using Factored Value Functions
Learning in Zero-Sum Team Markov Games using Factored Value Functions Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 27708 mgl@cs.duke.edu Ronald Parr Department of Computer
More informationChapter 2. Reductions and NP. 2.1 Reductions Continued The Satisfiability Problem (SAT) SAT 3SAT. CS 573: Algorithms, Fall 2013 August 29, 2013
Chapter 2 Reductions and NP CS 573: Algorithms, Fall 2013 August 29, 2013 2.1 Reductions Continued 2.1.1 The Satisfiability Problem SAT 2.1.1.1 Propositional Formulas Definition 2.1.1. Consider a set of
More informationLecture 4: State Estimation in Hidden Markov Models (cont.)
EE378A Statistical Signal Processing Lecture 4-04/13/2017 Lecture 4: State Estimation in Hidden Markov Models (cont.) Lecturer: Tsachy Weissman Scribe: David Wugofski In this lecture we build on previous
More informationArtificial Neural Networks
Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples
More informationGame Theoretic Learning in Distributed Control
Game Theoretic Learning in Distributed Control Jason R. Marden Jeff S. Shamma November 1, 2016 May 11, 2017 (revised) Abstract In distributed architecture control problems, there is a collection of interconnected
More informationFINM6900 Finance Theory Noisy Rational Expectations Equilibrium for Multiple Risky Assets
FINM69 Finance Theory Noisy Rational Expectations Equilibrium for Multiple Risky Assets February 3, 212 Reference Anat R. Admati, A Noisy Rational Expectations Equilibrium for Multi-Asset Securities Markets,
More informationMining Classification Knowledge
Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology SE lecture revision 2013 Outline 1. Bayesian classification
More informationLecture 11: Proofs, Games, and Alternation
IAS/PCMI Summer Session 2000 Clay Mathematics Undergraduate Program Basic Course on Computational Complexity Lecture 11: Proofs, Games, and Alternation David Mix Barrington and Alexis Maciel July 31, 2000
More informationCOMPUTER SCIENCE TRIPOS
CST.2016.6.1 COMPUTER SCIENCE TRIPOS Part IB Thursday 2 June 2016 1.30 to 4.30 COMPUTER SCIENCE Paper 6 Answer five questions. Submit the answers in five separate bundles, each with its own cover sheet.
More informationGeostatistics and Spatial Scales
Geostatistics and Spatial Scales Semivariance & semi-variograms Scale dependence & independence Ranges of spatial scales Variable dependent Fractal dimension GIS implications Spatial Modeling Spatial Analysis
More information