Cyber Security Games with Asymmetric Information
Transcription
Slide 1: Cyber Security Games with Asymmetric Information
Jeff S. Shamma, Georgia Institute of Technology
Joint work with Georgios Kotsalis & Malachi Jones
ARO MURI Annual Review, November 15, 2012
Slide 2: Research Thrust: Obtaining Actionable Cyber-Attack Forecasts
Develop adversary behavior models to help predict the effects of future attacks that might be launched to prevent successful mission completion.
- Asymmetric information repeated games
- Reduced-order models for hidden Markov models
- ictf 2010 implementation
Slides 3-4: Project Architecture
(Architecture diagram, shown twice as a build.) A real-world enterprise network with a mission and cyber-assets, plus simulation/live security exercises. Observations (netflow, probing, time analysis) feed analyses that maintain an up-to-date view of cyber-assets and determine dependencies between assets and missions. Data flows connect blocks that analyze and characterize attackers, maintain mission and cyber-asset models, predict future actions (COAs), correlate sensor alerts, and perform impact analysis to create a semantically rich view of cyber-mission status.
Slide 5: ictf 2010
- Setting: sequential execution of service-based missions
- Game: service vulnerabilities vs. countermeasures
- Modeling: stochastic automata (Petri nets), with states associated with services and labeled transitions
Slide 6: ictf Attack
Attacker:
- Observes state-transition signals
- Decides which services to attack
- Receives rewards/penalties for attacking correct/incorrect services
Complications:
- Signal ambiguity from parallel missions
- Product state space
Slide 7: Asymmetric information games
- Motivation: one player has superior information
- Issue: maximize effectiveness while minimizing observability, and hence vulnerability (revelation vs. exploitation)
- Context: worst-case guarantees vs. low-probability surprise; repeated vs. one-shot; Bayesian Stackelberg games
Slide 8: Repeated zero-sum games with asymmetric information
Stage game (the slide shows the same 2x2 matrix at three successive stages):

        L      R
  T   m_11   m_12
  B   m_21   m_22

- Players repeatedly play the same game over sequential stages
- Row player = maximizer; column player = minimizer
- Players observe opponent actions (perfect monitoring)
- Strategy = mapping from past observations to future action probabilities
- Utility = sum of stage payoffs
Slide 9: Repeated zero-sum games with asymmetric information (continued)
Asymmetric information: at the start, a matrix is randomly selected,
$M \in \{M_1, M_2, \ldots, M_K\}$ with probabilities $\{p_1, p_2, \ldots, p_K\}$,
and the row player knows the selected game.
ictf setting:
- Row = attacker with a skill profile; Col = defender
- Row action = strategy matching skill; Col action = security resource allocation
Slide 10: Prior work
Non-computational characterization of the optimal value:
$$v_{n+1}(p) = \frac{1}{n+1}\,\max_{(x_1,\ldots,x_K)}\;\min_{y}\Big[\sum_k p_k\, x_k^T M_k\, y \;+\; n \sum_s \bar x(s)\, v_n\big(p^+(p,x,s)\big)\Big]$$
where $\bar x(s) = \sum_k p_k x_k(s)$ is the marginal probability of action $s$ and $p^+(p,x,s)$ is the Bayesian posterior on the type after $s$ is observed.
Computational LP construction of optimal policies, with exponential growth in the horizon: size on the order of $S^K \prod_{n=0}^{N} S^n$, where $S$ is the size of the stage-game matrix.
Computation of the optimal non-revealing value and strategies:
$$u(p) = \max_x \min_y\; x^T\Big(\sum_k p_k M_k\Big)\, y$$
Non-computational achievability of $\mathrm{Cav}[u(p)]$, the concavification of $u$.
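The non-revealing value $u(p)$ is computable because it is just the value of the one-shot zero-sum game with the averaged matrix $\sum_k p_k M_k$. A minimal Python sketch (our own naming, not code from the talk) of that LP, together with the posterior update $p^+(p,x,s)$ used in the recursion above:

```python
import numpy as np
from scipy.optimize import linprog

def game_value(M):
    """Value and optimal mixed strategy x of the zero-sum matrix game M
    (row player maximizes x^T M y) via the standard LP."""
    n_rows, n_cols = M.shape
    c = np.zeros(n_rows + 1)
    c[-1] = -1.0                      # maximize v  <=>  minimize -v
    # For every column j:  v - sum_i x_i M_ij <= 0
    A_ub = np.hstack([-M.T, np.ones((n_cols, 1))])
    b_ub = np.zeros(n_cols)
    A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])  # sum_i x_i = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n_rows + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:-1]

def non_revealing_value(p, Ms):
    """u(p) = value of the average game sum_k p_k M_k."""
    return game_value(sum(pk * Mk for pk, Mk in zip(p, Ms)))[0]

def posterior(p, x, s):
    """Bayes update p^+(p, x, s); x[k, s] = prob. that type k plays action s."""
    w = np.asarray(p) * np.asarray(x)[:, s]
    return w / w.sum()
```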
Slide 11: New results
One-time policy improvement: compute the optimal current-stage strategy subject to non-revelation in future stages:
$$\hat v_{n+1}(p) = \frac{1}{n+1}\,\max_{(x_1,\ldots,x_K)}\;\min_{y}\Big[\sum_k p_k\, x_k^T M_k\, y \;+\; n \sum_s \bar x(s)\, u\big(p^+(p,x,s)\big)\Big]$$
Perpetual policy improvement: update beliefs and repeat.
Features:
- Represents a middle ground between full non-revelation and optimal play
- Can be computed online via an LP of size $S^K S^3$
Theorem: both one-time and perpetual policy improvement achieve $\mathrm{Cav}[u(p)]$:
$$\mathrm{Cav}[u(p)] \;\le\; \hat v_n(p) \;\le\; v_n(p) \;\le\; \mathrm{Cav}[u(p)] + \frac{C}{\sqrt{n}}\sum_k \sqrt{p_k(1-p_k)}$$
and the lower bound is tight.
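For $K = 2$ types the belief simplex is the interval $[0,1]$, and $\mathrm{Cav}[u]$, the least concave function above $u$, can be approximated on a grid by repeatedly lifting each point to the average of its neighbors. This is a slow but simple sketch of the concept, not the talk's algorithm:

```python
import numpy as np

def cav_on_grid(u_vals, tol=1e-12):
    """Upper concave envelope of u sampled on a uniform grid over [0, 1]."""
    cav = np.array(u_vals, dtype=float)
    changed = True
    while changed:              # monotone pointwise iteration; converges to Cav[u]
        changed = False
        for i in range(1, len(cav) - 1):
            m = 0.5 * (cav[i - 1] + cav[i + 1])
            if m > cav[i] + tol:
                cav[i] = m
                changed = True
    return cav

# Usage, with non_revealing_value from the previous sketch:
#   u_grid = [non_revealing_value([p, 1 - p], Ms) for p in np.linspace(0, 1, 101)]
#   cav_grid = cav_on_grid(u_grid)
```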
Slide 12: ictf implementation
- Iterations: each stage of the repeated game is an extended time horizon
- Attacker implements the Quasi-Belief Greedy (QBG) policy
- Defender allocates resources to defend services
- Attacker profile: expected reward for a successful attack on a service = (skill level) x (1 - defender resource) x (service value)
- The attacker profile is a vector of skill levels, e.g., S_0 = 0.7, S_1 = 0.2, S_3 = 0.3, ..., S_9, etc.
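The per-service expected reward is a one-line computation; in this sketch the numbers are placeholders, not values from the exercise:

```python
import numpy as np

def expected_rewards(skills, defense, values):
    """(skill level) * (1 - defender resource) * (service value), per service."""
    return np.asarray(skills) * (1.0 - np.asarray(defense)) * np.asarray(values)

r = expected_rewards(skills=[0.7, 0.2, 0.3],
                     defense=[0.5, 0.1, 0.0],
                     values=[10.0, 5.0, 8.0])   # placeholder numbers
```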
Slide 13: Quasi-Belief Greedy Policy (QBG)
Hidden Markov model representation:
- Outputs: unlabeled transitions, e.g., {T_1, T_2, T_3, T_5}
- State space: products of individual mission states
- Attack decisions also provide an additional measurement (i.e., attack success or failure)
- Beliefs are probabilities of state combinations (e.g., 16,200 of them)
QBG iteration:
1. Start with a set of individual mission beliefs
2. Given unlabeled transitions, compute the most likely assignment
3. Update individual mission beliefs assuming the most likely assignment
4. Attack the services with the highest expected reward
5. Renormalize beliefs given success/failure
Variants: periodic attack; thresholds on belief probabilities; optimal probing.
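A simplified sketch of one QBG iteration. The interfaces (`attack_value`, `success_likelihood`, `execute_attack`) are hypothetical stand-ins for the game engine, the assignment step is brute force, and at most one transition per mission per step is assumed; the actual implementation surely differs:

```python
import numpy as np
from itertools import permutations

def qbg_step(beliefs, M, outputs, attack_value, success_likelihood, execute_attack):
    """One Quasi-Belief Greedy iteration (simplified sketch).

    beliefs:  list of per-mission belief vectors
    M:        M[i][o] = substochastic matrix of mission i for output symbol o
    outputs:  unlabeled transitions observed this step (len <= #missions)
    """
    n = len(beliefs)

    # 1) Most likely assignment of unlabeled outputs to missions (brute force).
    def likelihood(assign):
        like = 1.0
        for o, i in zip(outputs, assign):
            like *= (M[i][o] @ beliefs[i]).sum()
        return like
    best = max(permutations(range(n), len(outputs)), key=likelihood)

    # 2) Update assigned missions' beliefs assuming that assignment.
    for o, i in zip(outputs, best):
        h = M[i][o] @ beliefs[i]
        beliefs[i] = h / h.sum()

    # 3) Greedy attack on the service with the highest expected reward.
    target = max(range(n), key=lambda i: attack_value(i, beliefs[i]))
    succeeded = execute_attack(target)          # game-engine callback

    # 4) Renormalize the attacked mission's belief given success/failure.
    h = success_likelihood(target, succeeded) * beliefs[target]
    beliefs[target] = h / h.sum()
    return beliefs
```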
Slide 14: ictf example
(The slide shows payoff tables for two attacker types, rows QBG_1/QBG_2 vs. columns BR_1/BR_2; the table entries did not survive transcription.)
- Two attacker types
- QBG_i = behave as type i; BR_i = defense tuned to type i
- Attacker one-shot dominant strategy: act according to type (i.e., no deception)
- Defender one-shot dominant strategy: defend for the correct type
Slide 15: ictf example, 2 stages
(Same payoff tables as slide 14.) p = (0.5, 0.5); number of repetitions = 2.
- Non-revealing strategy: stage 1 non-revealing, stage 2 non-revealing; payoff = 18
- Dominant strategy: stage 1 fully exploit, stage 2 fully exploit; payoff = 39
- One-time policy improvement: stage 1 partially revealing, stage 2 non-revealing; payoff = 27
- Perpetual policy improvement: stage 1 partially revealing, stage 2 fully exploit; payoff = 53
Slide 16: ictf example, many stages
(Same payoff tables as slide 14.)
- Dominant strategy: payoff = 2
- Fully non-revealing strategy: payoff = 18
- Policy improvement: payoff = 23
Slide 17: ictf, HMMs, and model reduction
HMM = Markov chain + partial observation function.
(Diagram: states S_1 through S_4 with transition probabilities p_ij and outputs O_1, O_2.)
HMM view of ictf:
- Outputs: unlabeled transitions, e.g., {T_1, T_2, T_3, T_5}
- State space: products of individual mission states
- Attack decisions also provide an additional measurement (i.e., attack success or failure)
- Transitions do not depend on attack actions
- Beliefs are probabilities of state combinations (e.g., 16,200 of them)
Slide 18: Summary and Preview
Prior results:
- Counterexamples showing that state aggregation need not capture exact reducibility
- Analysis showing that the reduction approach of [KMD 2008] captures exact reducibility
Current results:
- Theoretical characterization of when exact reduction via aggregation works (isolated points)
- Application of the reduction to ictf in lieu of quasi-beliefs
Slide 19: HMM statistical signature
Observation process: $Y = \{Y_t\}_{t \in \mathbb{Z}_+}$ for a particular initial condition $\pi$.
Probability function $p_\pi : Y^* \to [0,1]$:
$$p_\pi[v_k \ldots v_1] = \Pr[Y_k = v_k, \ldots, Y_1 = v_1], \qquad v_1,\ldots,v_k \in Y$$
Equivalence notion: $Y^{(1)} \equiv_p Y^{(2)} \iff p_{\pi^{(1)}}[v] = p_{\pi^{(2)}}[v]$ for all $v \in Y^*$.
Approximation: $p_{\pi^{(1)}}[v] \approx p_{\pi^{(2)}}[v]$ for all $v \in Y^*$.
Slide 20: Parametric description
Jump linear system analog: $H_Y = (1_n,\; M : Y \to \mathbb{R}^{n\times n}_+,\; \pi)$
- $1_n = (1,\ldots,1)^T$
- $M[o_k]$ is the substochastic transition matrix corresponding to $o_k \in Y$: $M[o_k]_{ij} = \Pr[X_{t+1} = s_i,\, Y_{t+1} = o_k \mid X_t = s_j]$
- $\pi$ is the initial distribution of $X_0$: $\pi = (\Pr[X_0 = s_1], \ldots, \Pr[X_0 = s_n])^T$
Path probability formula:
$$p_\pi[v_k \ldots v_1] = 1_n^T\, M[v_k] \cdots M[v_1]\, \pi, \qquad v_1,\ldots,v_k \in Y$$
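The path-probability formula translates directly into code. A small sketch (our naming), with $M$ stored as a dictionary of substochastic matrices:

```python
import numpy as np

def path_probability(M, pi, word):
    """p_pi[v_k ... v_1] = 1_n^T M[v_k] ... M[v_1] pi.
    `word` lists outputs in time order (v_1 first)."""
    h = np.asarray(pi, dtype=float)
    for o in word:
        h = M[o] @ h
    return float(h.sum())      # inner product with 1_n
```

If the output depends only on the state being entered, with column-stochastic transition matrix $T$ and emission probabilities $E[o]_i = \Pr[Y_{t+1}=o \mid X_{t+1}=s_i]$, each substochastic matrix is $M[o] = \mathrm{diag}(E[o])\, T$.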
Slide 21: Reduction Problem
Approximate representation: $H_Y = (1_n, M, \pi)$ vs. $H_{\hat Y} = (1_{\hat n}, \hat M, \hat\pi)$ with $\hat n < n$:
$$1_n^T M[v_k]\cdots M[v_1]\,\pi \;\approx\; 1_{\hat n}^T \hat M[v_k]\cdots \hat M[v_1]\,\hat\pi, \qquad v_1,\ldots,v_k \in Y$$
Complexity reduction: $M[v_i] \in \mathbb{R}^{n\times n}_+$ vs. $\hat M[v_i] \in \mathbb{R}^{\hat n\times \hat n}_+$.
Relaxation: $H_Y = (1_n, M, \pi)$ vs. $Q_{\hat Y} = (\hat c, \hat A, \hat b)$:
$$1_n^T M[v_k]\cdots M[v_1]\,\pi \;\approx\; \hat c^T \hat A[v_k]\cdots \hat A[v_1]\,\hat b, \qquad v_1,\ldots,v_k \in Y$$
Quasi-realization = probability generator without the HMM restrictions.
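A quasi-realization evaluates the same product with $(\hat c, \hat A, \hat b)$ in place of $(1_n, M, \pi)$; mirroring the sketch above:

```python
import numpy as np

def quasi_path_probability(c_hat, A_hat, b_hat, word):
    """c_hat^T A_hat[v_k] ... A_hat[v_1] b_hat: the matrices may contain
    negative entries, but the generated word probabilities should not."""
    h = np.asarray(b_hat, dtype=float)
    for o in word:
        h = A_hat[o] @ h
    return float(np.asarray(c_hat) @ h)
```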
Slide 22: Reduction Process with A Priori Bound
- Solve Lyapunov equations to obtain Gramian-like quantities $W_c$, $W_o$
- Perform an eigenvalue decomposition of $W_c^{1/2} W_o W_c^{1/2}$ to obtain: singular numbers controlling the error bound, and projection and dilation operators that produce the low-order system matrices
- Uniform bound for any initial condition $\pi$
Theorem [KMD 2008]: the worst-case bound applies to reduction of HMMs (the procedure may produce quasi-realizations):
$$\sqrt{\sum_{v \in Y^*} \big(p_\pi[v] - p_{\hat\pi}[v]\big)^2} \;\le\; d_H(H, \hat H) = 2\,(\sigma_{\hat n + 1} + \cdots + \sigma_n)$$
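A rough sketch of the procedure's shape, under the assumptions that the generalized Lyapunov operators are stable (so fixed-point iteration converges) and that a pseudoinverse stands in for the exact projection/dilation pair; an illustration only, not the KMD 2008 construction:

```python
import numpy as np

def gramians(M, b, c, iters=1000):
    """Gramian-like W_c, W_o from the generalized Lyapunov equations
        W_c = sum_o M[o] W_c M[o]^T + b b^T
        W_o = sum_o M[o]^T W_o M[o] + c c^T
    via fixed-point iteration (assumes the operators are stable)."""
    Wc, Wo = np.outer(b, b), np.outer(c, c)
    for _ in range(iters):
        Wc = sum(A @ Wc @ A.T for A in M.values()) + np.outer(b, b)
        Wo = sum(A.T @ Wo @ A for A in M.values()) + np.outer(c, c)
    return Wc, Wo

def reduce_order(M, b, c, r):
    """Balanced-truncation-style reduction to order r, returning a
    quasi-realization (c_hat, A_hat, b_hat)."""
    Wc, Wo = gramians(M, b, c)
    evals, evecs = np.linalg.eigh(Wc)
    S = evecs @ np.diag(np.sqrt(np.clip(evals, 0, None))) @ evecs.T  # W_c^{1/2}
    sig2, U = np.linalg.eigh(S @ Wo @ S)     # singular numbers = sqrt(sig2)
    keep = np.argsort(sig2)[::-1][:r]        # keep the r largest
    P = S @ U[:, keep]                       # dilation, n x r
    Q = np.linalg.pinv(P)                    # projection, r x n
    A_hat = {o: Q @ A @ P for o, A in M.items()}
    return np.asarray(c) @ P, A_hat, Q @ np.asarray(b)
```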
Slides 23-25: Model reduction on CTF (error plots)
- System: pick any 3 Petri nets from CTF
- Dimension: from 729 states / 364 outputs up to 1331 states / 680 outputs
- Computation time: 122 s to 1092 s
- The plots show cut-off behavior and a clear choice of reduced-order model
Slide 26: Model reduction on CTF (error budget)
Computed error:
$$E^2 = \sum_{v \in Y^*} \big(p_\pi[v] - \hat p_{\hat\pi}[v]\big)^2 \le (0.0028)^2$$
- The bound is valid for any initial condition $\pi \in \mathbb{R}^{1331}$
- The sum is over the whole language $Y^*$: $|Y^1| = 680$, $|Y^2| = 680^2$, $|Y^3| = 680^3$, ...
- Assuming a uniform distribution of a constant relative error: roughly 7.3% error per string
Slide 27: Evaluation of the reduced-order model
Predictive capability: approximate the family of conditional distributions
$$p_\pi[v_k \mid v_{k-1} \ldots v_1] \;\approx\; \hat p_{\hat\pi}[v_k \mid v_{k-1} \ldots v_1]$$
- Observed history: $v_{k-1} \ldots v_1 \in Y^{k-1}$
- Predict or generate the next symbol: $v_k \in Y$
Bound relevancy:
$$p_\pi[v_k \mid v_{k-1}\ldots v_1 v_0] = \frac{p_\pi[v_k v_{k-1}\ldots v_1 v_0]}{p_\pi[v_{k-1}\ldots v_1 v_0]} \;\approx\; \frac{\hat p_{\hat\pi}[v_k v_{k-1}\ldots v_1 v_0]}{\hat p_{\hat\pi}[v_{k-1}\ldots v_1 v_0]}$$
Recursive implementation:
$$p_\pi[v_k \mid v_{k-1}\ldots v_1] = \frac{1_n^T\, M[v_k]\, H_{k-1}}{1_n^T H_{k-1}}, \qquad H_k = M[v_k]\, H_{k-1},\;\; H_0 = \pi$$
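The recursive implementation in code (our naming); `M` is the dictionary of substochastic matrices from the earlier sketch:

```python
import numpy as np

def next_symbol_distribution(M, pi, history):
    """p[v_k | v_{k-1} ... v_1] for every candidate v_k, computed via
    H_k = M[v_k] H_{k-1}, H_0 = pi."""
    h = np.asarray(pi, dtype=float)
    for o in history:                  # history in time order
        h = M[o] @ h
    z = h.sum()                        # 1_n^T H_{k-1}
    return {o: float((A @ h).sum()) / z for o, A in M.items()}
```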
Slide 28: Evaluation of the reduced-order model (plots)
- Convergence of the conditional distribution for two particular histories
- Note the threshold phenomenon
Slide 29: Decision Problem
The reduction algorithm is now used in a decision setting based on the ictf game. The quantity of interest is the state belief:
$$H_k = M[v_k]\, H_{k-1}, \qquad H_0 = \pi$$
Greedy strategy:
- The attacker holds a belief $\pi_t$ over the $n$ network states
- He observes the aggregate output, say $y_{t+1} = \{T_1, T_5, T_{10}\}$, and updates the belief to
$$\pi_{t+1} = \frac{M[y_{t+1}]\,\pi_t}{1_n^T\, M[y_{t+1}]\,\pi_t} = \frac{H_{t+1}}{1_n^T H_{t+1}}$$
- Based on $\pi_{t+1}$, the attacker chooses which services to interrupt (greedy optimization), observes the payoff, renormalizes his belief, and repeats
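The greedy loop's belief update is a single normalized matrix-vector product; a minimal sketch:

```python
import numpy as np

def belief_update(M, belief, y):
    """pi_{t+1} = M[y] pi_t / (1_n^T M[y] pi_t)."""
    h = M[y] @ belief
    return h / h.sum()
```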
Slide 30: Relevance of reduction
Reduced-order model parameters:
$$c^T = 1_n^T V, \qquad A[y] = U\, M[y]\, V \;\;(y \in Y), \qquad b = U\pi$$
where $U \in \mathbb{R}^{\hat n \times n}$ and $V \in \mathbb{R}^{n \times \hat n}$.
Approximate case: the belief recursion lives in the lower-dimensional space. With $\hat H_{t+1} = A[y_{t+1}]\,\hat H_t \in \mathbb{R}^{\hat n}$,
$$\hat\pi_{t+1} = \frac{V\, A[y_{t+1}]\,\hat H_t}{c^T A[y_{t+1}]\,\hat H_t} = \frac{V \hat H_{t+1}}{c^T \hat H_{t+1}} \in \mathbb{R}^n$$
The reduction guarantees that
$$\hat\pi_{t+1} \approx \pi_{t+1} = \frac{H_{t+1}}{1_n^T H_{t+1}}, \qquad 1_n^T H_{t+1} \approx c^T \hat H_{t+1} = 1_n^T V \hat H_{t+1}$$
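The same update in the reduced coordinates, with $V$ lifting the reduced statistic back to an approximate full-dimensional belief; a sketch under the slide's $(c, A, b)$ parameterization:

```python
import numpy as np

def reduced_belief_update(A, c, V, h_hat, y):
    """Propagate the reduced statistic H_hat and lift it:
    pi_hat_{t+1} = V H_hat_{t+1} / (c^T H_hat_{t+1})."""
    h_hat = A[y] @ h_hat                            # recursion in R^{n_hat}
    pi_hat = (V @ h_hat) / float(np.asarray(c) @ h_hat)
    return h_hat, pi_hat
```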
Slide 31: Belief vector approximation (plots)
- Convergence of the sufficient statistic for a particular history
- Percent decision agreement: 96%
Slide 32: Concluding remarks
Recap:
- Asymmetric information repeated games
- Reduced-order models for hidden Markov models
- ictf 2010 implementation
Future work:
- Strategies for the non-informed player (Stackelberg, Nash, etc.)
- Generalization of partial non-revelation
- Optimal probing formulation
- Model reduction for decision problems