Cyber Security Games with Asymmetric Information


1 Cyber Security Games with Asymmetric Information
Jeff S. Shamma, Georgia Institute of Technology
Joint work with Georgios Kotsalis & Malachi Jones
ARO MURI Annual Review, November 15, 2012

2 Research Thrust: Obtaining Actionable Cyber-Attack Forecasts
Develop adversary behavior models to help predict the effects of future attacks that might be launched to prevent successful mission completion.
- Asymmetric information repeated games
- Reduced order models for hidden Markov models
- ictf 2010 implementation

3 Project Architecture
[Architecture diagram; recovered labels: Real World, Enterprise Network, Mission, Cyber-Assets, Simulation/Live Security Exercises; Observations: netflow, probing, time analysis; analysis to get an up-to-date view of cyber-assets; analysis to determine dependencies between assets and missions; Analyze and Characterize Attackers; Mission Model; Cyber-Assets Model; Predict Future Actions (COAs); Sensor Alerts; Correlation Engine; Impact Analysis; Create a semantically-rich view of cyber-mission status.]


5 ictf 2010
Setting: sequential execution of service-based missions
Game: service vulnerabilities vs. countermeasures
Modeling: stochastic automata (Petri nets), with states associated with services and labeled transitions (see the sketch below)
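For concreteness, here is a minimal, illustrative sketch (not the project's actual ictf model) of a mission represented as a labeled stochastic automaton: states correspond to services, and transitions carry a probability and a label. The service names, labels, and probabilities are made up.

```python
# Illustrative sketch only: a mission as a labeled stochastic automaton.
# States correspond to services; transitions carry a probability and a label.
import random

# transitions[state] = list of (probability, next_state, transition_label)
transitions = {
    "web":  [(0.7, "db",   "T1"), (0.3, "auth", "T2")],
    "auth": [(1.0, "db",   "T3")],
    "db":   [(0.5, "web",  "T4"), (0.5, "db",   "T5")],
}

def sample_run(start, length, seed=0):
    """Sample a run of the automaton and return the emitted transition labels."""
    rng = random.Random(seed)
    state, labels = start, []
    for _ in range(length):
        options = transitions[state]
        weights = [prob for prob, _, _ in options]
        _, state, label = rng.choices(options, weights=weights, k=1)[0]
        labels.append(label)
    return labels

print(sample_run("web", 5))   # labels of a sampled 5-step run
```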

6 ictf Attack
Attacker:
- Observes state-transition signals
- Decides which services to attack
- Receives rewards/penalties for attacking correct/incorrect services
Complications:
- Signal ambiguity from parallel missions
- Product state space

7 Asymmetric information games
Motivation: one player has superior information
Issue: maximize effectiveness while minimizing observability, and hence vulnerability (revelation vs. exploitation)
Context:
- Worst-case guarantees vs. low-probability surprise
- Repeated vs. one-shot (Bayesian Stackelberg games)

8 Repeated zero-sum games with asymmetric information
[Three copies of a 2x2 stage-game matrix with row actions {T, B}, column actions {L, R}, and payoffs $\begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix}$, one per stage.]
- Players repeatedly play the same game over sequential stages
- Row player = maximizer; column player = minimizer
- Players observe opponent actions (perfect monitoring)
- Strategy = mapping from past observations to future action probabilities
- Utility = sum of stage payoffs

9 Repeated zero-sum games with asymmetric information
[Same three stage-game matrices as the previous slide.]
Asymmetric information:
- At the start, the payoff matrix is randomly selected: $M \in \{M_1, M_2, \ldots, M_K\}$ with probabilities $\{p_1, p_2, \ldots, p_K\}$
- The row player knows the selected game
ictf setting:
- Row = attacker with a skill profile; Column = defender
- Row action = strategy matching skill; Column action = security resource allocation

10 Prior work
Non-computational characterization of the optimal value:
$$v_{n+1}(p) = \frac{1}{n+1} \max_{(x_1,\ldots,x_K)} \min_{y} \left[ \sum_k p_k\, x_k^T M_k\, y + n \sum_s \bar{x}(s)\, v_n\big(p^+(p,x,s)\big) \right]$$
Computational LP construction of optimal policies, but with exponential growth ($S$ = size of the stage-game matrix): LP of size $S^{\left(K \prod_{n=0}^{N} S^n\right)}$
Computation of the optimal non-revealing value and strategies (a sketch of this LP follows below):
$$u(p) = \max_x \min_y\; x^T \Big( \sum_k p_k M_k \Big) y$$
Non-computational achievability of $\mathrm{Cav}[u(p)]$
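The non-revealing value $u(p)$ is just the value of the zero-sum matrix game with averaged payoff matrix $\sum_k p_k M_k$, which can be computed with a standard linear program. Below is a minimal sketch (an illustration, not the slides' LP construction); the example matrices and prior are made up.

```python
# Minimal sketch: non-revealing value u(p) = max_x min_y x^T (sum_k p_k M_k) y,
# i.e., the value of the averaged zero-sum matrix game, computed by an LP.
import numpy as np
from scipy.optimize import linprog

def nonrevealing_value(Ms, p):
    """Value and optimal (non-revealing) row strategy of the averaged game."""
    A = sum(pk * Mk for pk, Mk in zip(p, Ms))    # averaged payoff matrix
    m, n = A.shape
    # Variables: x (row mixed strategy, length m) and v (game value).
    # Maximize v  <=>  minimize -v, subject to (A^T x)_j >= v for every column j,
    # sum(x) = 1, x >= 0, v free.
    c = np.concatenate([np.zeros(m), [-1.0]])
    A_ub = np.hstack([-A.T, np.ones((n, 1))])    # v - (A^T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[m], res.x[:m]

# Illustrative two-type example with 2x2 stage games and prior p = (0.5, 0.5).
M1 = np.array([[4.0, 0.0], [0.0, 0.0]])
M2 = np.array([[0.0, 0.0], [0.0, 4.0]])
value, x = nonrevealing_value([M1, M2], [0.5, 0.5])
print(value, x)   # value 1.0 with the equalizing strategy (0.5, 0.5)
```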

11 New results
One-time policy improvement: compute the optimal current-stage strategy subject to non-revelation in future stages:
$$\hat{v}_{n+1}(p) = \frac{1}{n+1} \max_{(x_1,\ldots,x_K)} \min_{y} \left[ \sum_k p_k\, x_k^T M_k\, y + n \sum_s \bar{x}(s)\, u\big(p^+(p,x,s)\big) \right]$$
Perpetual policy improvement: update beliefs and repeat.
Features:
- Represents a middle ground between full non-revelation and optimal play
- Can be computed online via an LP of size $S^K S^3$
Theorem: both one-time and perpetual policy improvement achieve $\mathrm{Cav}[u(p)]$:
$$\mathrm{Cav}[u(p)] \;\le\; \hat{v}_n(p) \;\le\; v_n(p) \;\le\; \mathrm{Cav}[u(p)] + \frac{C}{\sqrt{n}} \sum_k \sqrt{p_k(1-p_k)},$$
and the lower bound is tight.
A sketch of the belief update $p^+(p,x,s)$ follows below.
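The recursion above depends on the posterior $p^+(p,x,s)$, the uninformed player's updated belief over types after observing the informed player's action $s$. A minimal sketch of this Bayes update (assuming the standard form $p^+_k = p_k x_k(s) / \sum_j p_j x_j(s)$, which the slide does not spell out) is below; the prior and strategies are made up.

```python
# Minimal sketch of the Bayes posterior p+(p, x, s) over types after action s
# is observed, with x[k, s] = probability that type k plays action s.
import numpy as np

def posterior(p, x, s):
    """p: prior over types (length K); x: K x S matrix of type strategies; s: observed action."""
    joint = np.asarray(p, dtype=float) * x[:, s]
    total = joint.sum()                      # xbar(s): marginal probability of action s
    if total == 0:
        return np.asarray(p, dtype=float)    # zero-probability action: keep the prior
    return joint / total

# Example: two types, two actions; type 1 favors action 0, type 2 favors action 1.
p = [0.5, 0.5]
x = np.array([[0.8, 0.2],
              [0.3, 0.7]])
print(posterior(p, x, 0))   # belief shifts toward type 1 after seeing action 0
```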

12 ictf implementation
Iterations: each stage of the repeated game is an extended time horizon
- Attacker implements the Quasi-Belief Greedy policy (QBG)
- Defender allocates resources to defend services
Attacker profile: the expected reward for a successful attack on a service is (skill level) x (1 - defender resource) x (service value); a sketch of this computation follows below.
The attacker profile is a vector of skill levels, e.g., $S_0 = 0.7$, $S_1 = 0.2$, $S_3 = 0.3$, ..., $S_9 = \ldots$, etc.
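A minimal sketch of the stated reward model; the skill, defense, and value numbers below are illustrative, not taken from the exercise.

```python
# Sketch of the reward model: expected reward for attacking a service
# = (skill level) x (1 - defender resource) x (service value).
import numpy as np

skill   = np.array([0.7, 0.2, 0.3])    # attacker skill per service (illustrative)
defense = np.array([0.5, 0.1, 0.4])    # defender resource allocated to each service
value   = np.array([10.0, 4.0, 6.0])   # value of each service

expected_reward = skill * (1.0 - defense) * value
best_service = int(np.argmax(expected_reward))
print(expected_reward, "-> attack service", best_service)
```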

13 Quasi-Belief Greedy Policy (QBG)
Hidden Markov model representation:
- Outputs: unlabeled transitions, e.g., $\{T_1, T_2, T_3, T_5\}$
- State space: products of individual mission states
- Attack decisions also provide additional measurements (i.e., attack success or failure)
- Beliefs are probabilities of state combinations (e.g., 16200 combinations)
QBG iteration:
- Start with a set of individual mission beliefs
- Given the unlabeled transitions, compute the most likely assignment of transitions to missions
- Update individual mission beliefs assuming the most likely assignment
- Attack the services with the highest expected reward
- Renormalize beliefs given success/failure
Variants: periodic attack; thresholds on belief probabilities; optimal probing

14 ictf example
[Payoff tables for the two attacker types: rows are attacker behaviors QBG 1, QBG 2; columns are defenses BR 1, BR 2; one table per type. Table entries not recovered.]
- Two attacker types
- QBG i = behave as type i
- BR i = defense tuned to type i
- Attacker one-shot dominant strategy: act according to type (i.e., no deception)
- Defender one-shot dominant strategy: defend for the correct type

15 ictf example: 2 stages
[Same two payoff tables as the previous slide.]
Prior $p = (0.5, 0.5)$; number of repetitions = 2
- Non-revealing strategy: stage 1 non-revealing, stage 2 non-revealing; payoff = 18
- Dominant strategy: stage 1 fully exploit, stage 2 fully exploit; payoff = 39
- One-time policy improvement: stage 1 partially revealing, stage 2 non-revealing; payoff = 27
- Perpetual policy improvement: stage 1 partially revealing, stage 2 fully exploit; payoff = 53

16 ictf example: many stages
[Same two payoff tables as the previous slide.]
- Dominant strategy: payoff = 2
- Fully non-revealing strategy: payoff = 18
- Policy improvement: payoff = 23

17 ictf, HMMs, and model reduction
HMM = Markov chain + partial observation function
[State-transition diagram of a four-state HMM with states $S_1, \ldots, S_4$, transition probabilities $p_{ij}$, and outputs $O_1$, $O_2$.]
HMM and ictf:
- Outputs: unlabeled transitions, e.g., $\{T_1, T_2, T_3, T_5\}$
- State space: products of individual mission states
- Attack decisions also provide additional measurements (i.e., attack success or failure)
- Transitions do not depend on attack actions
- Beliefs are probabilities of state combinations (e.g., 16200 combinations)

18 Summary and Preview
Prior results:
- Counterexamples showing that state aggregation need not capture exact reducibility
- Analysis showing that the reduction approach of KMD 2008 captures exact reducibility
Current results:
- Theoretical characterization of when exact reduction via aggregation works (isolated points)
- Application of the reduction to ictf in lieu of quasi-beliefs

19 HMM statistical signature
Observation process: $Y = \{Y_t\}_{t \in \mathbb{Z}_+}$ for a particular initial condition $\pi$
Probability function: $p_\pi : Y^* \to [0,1]$,
$$p_\pi[v_k \ldots v_1] = \Pr[Y_k = v_k, \ldots, Y_1 = v_1], \qquad v_1, \ldots, v_k \in Y$$
Equivalence notion: $Y^{(1)} \sim_p Y^{(2)}$ iff $p_{\pi^{(1)}}[v] = p_{\pi^{(2)}}[v]$ for all $v \in Y^*$
Approximation: $p_{\pi^{(1)}}[v] \approx p_{\pi^{(2)}}[v]$ for all $v \in Y^*$

20 Parametric description
Jump linear system analog: $H_Y = (\mathbf{1}_n,\; M : Y \to \mathbb{R}^{n \times n}_+,\; \pi)$
- $\mathbf{1}_n = (1, \ldots, 1)^T$
- $M[o_k]$ = substochastic transition matrix corresponding to $o_k \in Y$, with entries $M[o_k]_{ij} = \Pr[X_{t+1} = s_i,\, Y_{t+1} = o_k \mid X_t = s_j]$
- $\pi$ = initial distribution of $X_0$, i.e., $\pi = \big(\Pr[X_0 = s_1], \ldots, \Pr[X_0 = s_n]\big)^T$
Path probability formula (see the sketch below):
$$p_\pi[v_k \ldots v_1] = \mathbf{1}_n^T\, M[v_k] \cdots M[v_1]\, \pi, \qquad v_1, \ldots, v_k \in Y$$
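A minimal sketch of the path-probability formula on a small, made-up 2-state HMM with output alphabet {a, b}; the later sketches reuse this toy model.

```python
# Sketch of p_pi[v_k ... v_1] = 1^T M[v_k] ... M[v_1] pi on a made-up 2-state HMM.
import numpy as np

pi, ones = np.array([0.6, 0.4]), np.ones(2)       # initial distribution, 1_n
# Substochastic matrices M[o] with M[o][i, j] = Pr[X_{t+1}=s_i, Y_{t+1}=o | X_t=s_j];
# their sum over outputs is column-stochastic.
M = {"a": np.array([[0.5, 0.1], [0.2, 0.3]]),
     "b": np.array([[0.1, 0.4], [0.2, 0.2]])}

def string_probability(word, M, pi, ones):
    """Pr[Y_1 = word[0], ..., Y_k = word[-1]] = 1^T M[v_k] ... M[v_1] pi."""
    h = pi.copy()
    for symbol in word:          # apply M[v_1] first, then M[v_2], ...
        h = M[symbol] @ h
    return float(ones @ h)

print(string_probability("aab", M, pi, ones))
# Sanity check: probabilities of all length-2 strings sum to 1.
print(sum(string_probability(w, M, pi, ones) for w in ["aa", "ab", "ba", "bb"]))
```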

21 Reduction Problem
Approximate representation: $H_Y = (\mathbf{1}_n, M, \pi)$ vs. $H_{\hat Y} = (\mathbf{1}_{\hat n}, \hat M, \hat\pi)$ with $\hat n < n$:
$$\mathbf{1}_n^T M[v_k] \cdots M[v_1]\, \pi \;\approx\; \mathbf{1}_{\hat n}^T \hat M[v_k] \cdots \hat M[v_1]\, \hat\pi, \qquad v_1, \ldots, v_k \in Y$$
Complexity reduction: $M[v_i] \in \mathbb{R}^{n \times n}_+$ vs. $\hat M[v_i] \in \mathbb{R}^{\hat n \times \hat n}_+$
Relaxation: $H_Y = (\mathbf{1}_n, M, \pi)$ vs. $Q_{\hat Y} = (\hat c, \hat A, \hat b)$:
$$\mathbf{1}_n^T M[v_k] \cdots M[v_1]\, \pi \;\approx\; \hat c^T \hat A[v_k] \cdots \hat A[v_1]\, \hat b, \qquad v_1, \ldots, v_k \in Y$$
Quasi-realization = probability generators without the HMM restriction

22 Reduction Process with A Priori Bound
- Solve Lyapunov equations to obtain gramian-like quantities $W_c$, $W_o$
- Perform an eigenvalue decomposition of $W_c^{1/2} W_o W_c^{1/2}$ to obtain:
  - singular numbers controlling the error bound
  - projection and dilation operators that produce the low-order system matrices
- Uniform bound for any initial condition $\pi$
Theorem [KMD, 2008]: the worst-case bound applies to reduction of HMMs (the procedure may produce quasi-realizations):
$$\sum_{v \in Y^*} \big(p_\pi[v] - p_{\hat\pi}[v]\big)^2 \;\le\; d_H(H, \hat H) = 2\,(\sigma_{\hat n + 1} + \cdots + \sigma_n)$$
A sketch of this style of procedure follows below.
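Below is a hedged sketch of a gramian-based (balanced-truncation style) reduction of the jump-linear representation $(\mathbf{1}_n, M, \pi)$. It only illustrates the kind of procedure described on this slide; the precise Lyapunov equations, balancing transformation, and error bound of KMD 2008 may differ. The Lyapunov equations assumed here are $W_c = \sum_y M[y] W_c M[y]^T + \pi\pi^T$ and its dual, and the 3-state, 2-output model is made up.

```python
# Hedged sketch of a balanced-truncation style reduction for (1_n, M, pi).
import numpy as np

def gramians(M, pi, ones):
    """Solve W_c = sum_y M[y] W_c M[y]^T + pi pi^T and the dual equation for W_o."""
    n = len(pi)
    I = np.eye(n * n)
    S_c = sum(np.kron(My, My) for My in M.values())       # vec(A W A^T) = (A kron A) vec(W)
    S_o = sum(np.kron(My.T, My.T) for My in M.values())
    Wc = np.linalg.solve(I - S_c, np.outer(pi, pi).ravel()).reshape(n, n)
    Wo = np.linalg.solve(I - S_o, np.outer(ones, ones).ravel()).reshape(n, n)
    return Wc, Wo

def reduce_model(M, pi, ones, n_hat):
    """Square-root balancing, truncated to the n_hat largest singular numbers."""
    n = len(pi)
    Wc, Wo = gramians(M, pi, ones)
    Lc = np.linalg.cholesky(Wc + 1e-12 * np.eye(n))        # Wc = Lc Lc^T
    lam, Q = np.linalg.eigh(Lc.T @ Wo @ Lc)
    order = np.argsort(lam)[::-1]
    sigma = np.sqrt(np.maximum(lam[order], 0.0))           # singular numbers
    Q = Q[:, order]
    U = np.diag(sigma[:n_hat] ** 0.5) @ Q[:, :n_hat].T @ np.linalg.inv(Lc)   # projection
    V = Lc @ Q[:, :n_hat] @ np.diag(sigma[:n_hat] ** -0.5)                   # dilation
    A_hat = {y: U @ My @ V for y, My in M.items()}
    return A_hat, U @ pi, V.T @ ones, sigma                # (A_hat, b_hat, c_hat, sigmas)

# Made-up 3-state HMM with outputs {a, b}; columns of M[a] + M[b] sum to 1.
pi, ones = np.array([0.5, 0.3, 0.2]), np.ones(3)
M = {"a": np.array([[0.4, 0.1, 0.2], [0.2, 0.2, 0.1], [0.1, 0.1, 0.2]]),
     "b": np.array([[0.1, 0.3, 0.1], [0.1, 0.1, 0.2], [0.1, 0.2, 0.2]])}

A_hat, b_hat, c_hat, sigma = reduce_model(M, pi, ones, n_hat=2)
print("singular numbers:", sigma)

# Compare exact and reduced (quasi-realization) probabilities of the string "ab".
p_exact = ones @ (M["b"] @ (M["a"] @ pi))
p_hat = c_hat @ (A_hat["b"] @ (A_hat["a"] @ b_hat))
print(p_exact, p_hat)
```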

23 Model reduction on CTF
System: pick any 3 Petri nets from the CTF
Dimension: from 729 states and 364 outputs up to 1331 states and 680 outputs
Computation time: 122 s to 1092 s
Cut-off behavior; clear choice of reduced-order model
[Plots of the singular numbers not recovered.]


26 Model reduction on CTF
Squared error over the whole language:
$$E^2 = \sum_{v \in Y^*} \big(p_\pi[v] - \hat p_{\hat\pi}[v]\big)^2 \;\le\; 0.0028$$
- The bound is valid for any initial condition $\pi$ (here $n = 1331$)
- The sum is over the whole language $Y^*$: $|Y| = |Y^1| = 680$, $|Y^2| = \ldots$, $|Y^3| = \ldots$, ...
- Assuming a uniform distribution of a constant relative error, this corresponds to roughly a 7.3% error per string

27 Evaluation of reduced order model
Predictive capability: approximate the family of conditional distributions
$$p_\pi[v_k \mid v_{k-1} \ldots v_1] \;\approx\; \hat p_{\hat\pi}[v_k \mid v_{k-1} \ldots v_1]$$
- Observed history: $v_{k-1} \ldots v_1 \in Y^{k-1}$
- Predict or generate the next symbol: $v_k \in Y$
Bound relevancy:
$$\frac{p_\pi[v_k\, v_{k-1} \ldots v_1 v_0]}{p_\pi[v_{k-1} \ldots v_1 v_0]} \;\approx\; \frac{\hat p_{\hat\pi}[v_k\, v_{k-1} \ldots v_1 v_0]}{\hat p_{\hat\pi}[v_{k-1} \ldots v_1 v_0]}$$
Recursive implementation (see the sketch below):
$$p_\pi[v_k \mid v_{k-1} \ldots v_1] = \frac{\mathbf{1}_n^T M[v_k]\, H_{k-1}}{\mathbf{1}_n^T H_{k-1}}, \qquad H_k = M[v_k]\, H_{k-1},\; H_0 = \pi$$
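A minimal sketch of this recursion, reusing the made-up 2-state model from the path-probability sketch above.

```python
# Sketch of the recursion H_k = M[v_k] H_{k-1}, H_0 = pi, and the predictive
# probability p_pi[v_k | v_{k-1} ... v_1] = 1^T M[v_k] H_{k-1} / (1^T H_{k-1}).
import numpy as np

pi, ones = np.array([0.6, 0.4]), np.ones(2)
M = {"a": np.array([[0.5, 0.1], [0.2, 0.3]]),
     "b": np.array([[0.1, 0.4], [0.2, 0.2]])}

def predictive_distribution(history, M, pi, ones):
    """Return Pr[next symbol = o | history] for every output symbol o."""
    h = pi.copy()
    for symbol in history:                # unnormalized statistic H_{k-1}
        h = M[symbol] @ h
    norm = float(ones @ h)                # equals p_pi[history]
    return {o: float(ones @ (My @ h)) / norm for o, My in M.items()}

print(predictive_distribution("abba", M, pi, ones))   # sums to 1 over {a, b}
```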

28 Evaluation of reduced order model
[Plots: convergence of the conditional distribution for two particular histories; note the threshold phenomenon.]

29 Decision Problem
The reduction algorithm is now used in a decision setting based on the ictf scenario.
Quantity of interest is the state belief: $H_k = M[v_k]\, H_{k-1}$, $H_0 = \pi$.
Greedy strategy (see the sketch below):
- The attacker has a belief on the network state, say $\pi_t \in \mathbb{R}^n$
- He observes the aggregate output, say $y_{t+1} = \{T_1, T_5, T_{10}\}$, and updates the belief to
$$\pi_{t+1} = \frac{M[y_{t+1}]\, \pi_t}{\mathbf{1}_n^T M[y_{t+1}]\, \pi_t} = \frac{H_{t+1}}{\mathbf{1}_n^T H_{t+1}}$$
- Based on $\pi_{t+1}$, the attacker chooses which services to interrupt (greedy optimization), observes the payoff, and renormalizes his belief ... repeat
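A minimal sketch of this greedy loop. The 2-state model, reward matrix, and observation sequence are illustrative placeholders, and the payoff-feedback renormalization step is omitted for brevity.

```python
# Sketch of the greedy strategy: update the belief with each observed output via
# pi_{t+1} = M[y] pi_t / (1^T M[y] pi_t), then attack the service with the highest
# expected reward under the current belief.
import numpy as np

pi0, ones = np.array([0.6, 0.4]), np.ones(2)
M = {"a": np.array([[0.5, 0.1], [0.2, 0.3]]),
     "b": np.array([[0.1, 0.4], [0.2, 0.2]])}
# reward[s, j] = attacker payoff for attacking service j when the network is in state s
reward = np.array([[3.0, 0.0],
                   [0.5, 2.0]])

def belief_update(belief, y):
    h = M[y] @ belief
    return h / (ones @ h)

belief = pi0.copy()
for y in ["a", "b", "b", "a"]:               # observed aggregate outputs
    belief = belief_update(belief, y)
    expected = belief @ reward               # expected reward of attacking each service
    attack = int(np.argmax(expected))        # greedy choice of service to interrupt
    print(y, belief.round(3), "attack service", attack)
```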

30 Relevance of reduction
Reduced-order model parameters:
$$\hat c^T = \mathbf{1}_n^T V, \qquad \hat A[y] = U\, M[y]\, V, \qquad \hat b = U \pi, \quad y \in Y,$$
where $U \in \mathbb{R}^{\hat n \times n}$ and $V \in \mathbb{R}^{n \times \hat n}$.
Approximate case: the belief vector lives in a lower-dimensional space,
$$\hat\pi_{t+1} = \frac{\hat A[y_{t+1}]\, \hat\pi_t}{\hat c^T \hat A[y_{t+1}]\, \hat\pi_t} = \frac{\hat H_{t+1}}{\hat c^T \hat H_{t+1}} \in \mathbb{R}^{\hat n}.$$
The reduction guarantees that (see the sketch below)
$$V\, \frac{\hat H_{t+1}}{\hat c^T \hat H_{t+1}} \;\approx\; \pi_{t+1} = \frac{H_{t+1}}{\mathbf{1}_n^T H_{t+1}} \in \mathbb{R}^n, \qquad \mathbf{1}_n^T H_{t+1} \;\approx\; \hat c^T \hat H_{t+1} = \mathbf{1}_n^T V \hat H_{t+1}.$$
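A sketch relating the reduced-order belief recursion to the full one. Here U and V are placeholders for the projection/dilation operators produced by a reduction procedure (for instance the reduction sketch earlier); taking them as identities ($\hat n = n$) simply checks that the two recursions coincide when no reduction is performed. With a genuine reduction ($\hat n < n$), the lifted reduced belief approximates the full belief as stated above.

```python
# Sketch: full belief recursion vs. reduced recursion lifted back via V.
import numpy as np

pi, ones = np.array([0.6, 0.4]), np.ones(2)
M = {"a": np.array([[0.5, 0.1], [0.2, 0.3]]),
     "b": np.array([[0.1, 0.4], [0.2, 0.2]])}

U, V = np.eye(2), np.eye(2)            # placeholder projection/dilation (no reduction)
A = {y: U @ My @ V for y, My in M.items()}
b, c = U @ pi, V.T @ ones              # b_hat = U pi,  c_hat^T = 1^T V

h, h_hat = pi.copy(), b.copy()
for y in ["a", "b", "a"]:
    h, h_hat = M[y] @ h, A[y] @ h_hat
    full_belief = h / (ones @ h)
    lifted_belief = (V @ h_hat) / (c @ h_hat)
    print(y, np.abs(full_belief - lifted_belief).max())   # ~0 with identity U, V
```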

31 Belief vector approximation
[Plot: convergence of the sufficient statistic for a particular history.]
Percent decision agreement: 96%

32 Concluding remarks
Recap:
- Asymmetric information repeated games
- Reduced order models for hidden Markov models
- ictf 2010 implementation
Future work:
- Strategies for the non-informed player (Stackelberg, Nash, etc.)
- Generalization of partial non-revelation
- Optimal probing formulation
- Model reduction for decision problems
