Introduction to Sequential Teams
1 Introduction to Sequential Teams. Aditya Mahajan, McGill University. Joint work with Ashutosh Nayyar and Demos Teneketzis, University of Michigan. MITACS Workshop on Fusion and Inference in Networks, 2011.
2 Decentralized systems are ubiquitous
3 Real-time quantization. Block diagram: Source → X → Encoder → Y → Receiver → X̂. Objective: choose the transmission and estimation policies to minimize expected total distortion (over a finite or infinite horizon).
4 Multiaccess broadcast. Transmitter 1 and Transmitter 2 share a multiaccess channel. Objective: choose the transmission policies to maximize throughput (over a finite or infinite horizon).
5 Estimating with active sensing. Block diagram: Environment → Sensor → Estimator. Objective: choose the transmission and estimation policies to minimize a weighted average of expected transmission cost and expected total distortion (over a finite or infinite horizon).
6 Systematic design of decentralized systems. Salient features: multi-stage decision problems; multiple decision makers (agents) with decentralized information. Structure of optimal policy: can an agent, or a group of agents, shed available data, or compress available data, without loss of optimality? Search for optimal policies: brute-force search for an optimal policy has complexity that is doubly exponential in the time horizon. How can we search for an optimal policy efficiently?
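To make the complexity remark concrete, here is a minimal counting sketch. The sizes `n_obs` and `n_act` are illustrative assumptions, not from the talk; the point is that when the decision at time t may depend on the entire observation history, the number of deterministic policies grows doubly exponentially with the horizon.

```python
# Hypothetical sizes (assumed for illustration): observations and actions per stage.
n_obs, n_act = 2, 2

def num_policies(horizon):
    """Count deterministic policies when the decision at time t may depend
    on the entire observation history (all length-t observation sequences)."""
    total = 1
    for t in range(1, horizon + 1):
        n_histories = n_obs ** t       # distinct observation histories at time t
        total *= n_act ** n_histories  # decision rules g_t: histories -> actions
    return total

for T in range(1, 5):
    print(T, num_policies(T))
```

Even for binary observations and actions, the count is 4, 64, 16384, and over 10^9 for horizons 1 through 4, which is why structural results that shrink the domain of the decision rules matter.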
7 Outline. A taxonomy of decentralized systems. Overview of centralized stochastic control: Markov decision processes (MDPs); partially observable Markov decision processes (POMDPs); delayed state observation. Design principle for sequential teams: delayed state observation.
8-11 Classification of decentralized systems. By coupling of objectives, decentralized systems split into teams (a common objective; the relevant question is optimality) and games (conflicting objectives; the relevant question is stability/equilibrium). Teams are either static or dynamic; dynamic teams are further classified by information structure (classical vs. non-classical) and by the ordering of decisions (sequential vs. non-sequential).
12 Sequential dynamic teams. We are interested in sequential dynamic teams of agents with non-classical information structures.
13 A bit of history...
15 Overview of centralized stochastic control
16 Centralized stochastic control. A single decision maker with a classical information structure.
17-20 MDP: Structural properties. A system with state X_t evolves under dynamics f_t, driven by the agent's action U_t; the agent observes the state, acts through a decision rule g_t, and incurs per-stage cost c_t(X_t, U_t) (equivalently, collects reward R_t). Structure of optimal policy: choose the current action based only on the current state, U_t = g_t(X_t).
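The MDP structural result can be seen directly in finite-horizon value iteration: the backward recursion produces decision rules indexed only by the current state. A minimal sketch with a made-up transition kernel and cost (all numbers illustrative, not from the talk):

```python
import numpy as np

# Toy MDP: n_x states, n_u actions, horizon T; random P and c for illustration.
n_x, n_u, T = 3, 2, 10
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_x), size=(n_u, n_x))  # P[u, x, :] = Pr(X' | X=x, U=u)
c = rng.random((n_x, n_u))                        # per-stage cost c(x, u)

V = np.zeros(n_x)                                 # terminal value function
for t in reversed(range(T)):
    # Q[x, u] = c(x, u) + E[V(X') | X=x, U=u]
    Q = c + np.stack([P[u] @ V for u in range(n_u)], axis=1)
    g = Q.argmin(axis=1)                          # decision rule g_t: state -> action
    V = Q.min(axis=1)
print("optimal first-stage decision rule g_1(x):", g)
```

The decision rule `g` is an array of size `n_x`: the optimal action is a function of the current state alone, never of the full history.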
21-24 POMDP: Structural properties. The system has state X_t; the agent sees only a noisy observation Y_t and chooses action U_t through a decision rule g_t. Structure of optimal policies: choose the current action based on the current information state π_t = Pr(state of system | all data at agent), i.e., U_t = g_t(π_t).
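The POMDP information state π_t = Pr(X_t | Y_{1:t}, U_{1:t-1}) can be maintained recursively, so the agent never needs to store the raw history. A predict-then-correct sketch (the matrices below are illustrative; action dependence of the dynamics is omitted for brevity):

```python
import numpy as np

P = np.array([[0.9, 0.1],   # P[x, x'] = Pr(X' = x' | X = x)
              [0.2, 0.8]])
O = np.array([[0.8, 0.2],   # O[x, y] = Pr(Y = y | X = x)
              [0.3, 0.7]])

def belief_update(pi, y):
    """One recursive step of the information state: predict, then correct."""
    pred = pi @ P            # predict: push the belief through the dynamics
    upd = pred * O[:, y]     # correct: weight by the observation likelihood
    return upd / upd.sum()   # normalize back to a probability vector

pi = np.array([0.5, 0.5])    # prior belief
for y in [0, 0, 1]:          # an example observation sequence
    pi = belief_update(pi, y)
```

Because π_t is updated from (π_{t-1}, U_{t-1}, Y_t) alone, it is a fixed-dimension statistic even as the history grows.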
25-28 An example: delayed observation. The agent observes the state of the system after a delay of d steps, so at time t it knows (X_{1:t-d}, U_{1:t-1}). Original form of control laws: U_t = g_t(X_{1:t-d}, U_{1:t-1}). Structure of optimal policies: choose the control action based on π_t = Pr(X_t | X_{1:t-d}, U_{1:t-1}), which is a function of (X_{t-d}, U_{t-d:t-1}). Simplified form of control laws: U_t = g_t(X_{t-d}, U_{t-d:t-1}).
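For delayed observation, the information state is just a d-step prediction: starting from the last observed state and applying one transition per intermediate action. A sketch with illustrative two-state dynamics (the transition matrices are assumptions, not from the talk):

```python
import numpy as np

# Controlled transition matrices: P[u][x, x'] = Pr(X' = x' | X = x, U = u).
P = {0: np.array([[0.9, 0.1], [0.4, 0.6]]),
     1: np.array([[0.5, 0.5], [0.1, 0.9]])}

def info_state(x_delayed, recent_actions):
    """pi_t = Pr(X_t | X_{t-d}, U_{t-d:t-1}): propagate a point mass on the
    last observed state through d transition steps, one per recent action."""
    pi = np.eye(2)[x_delayed]        # point mass on X_{t-d}
    for u in recent_actions:         # apply U_{t-d}, ..., U_{t-1} in order
        pi = pi @ P[u]
    return pi

pi_t = info_state(x_delayed=0, recent_actions=[1, 0])   # example with d = 2
```

Only (X_{t-d}, U_{t-d:t-1}) enters the computation, which is exactly why the older history can be shed without loss of optimality.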
29-30 Structural policies in stochastic control. Structure of optimal policies: shed irrelevant information; compress relevant information into a compact statistic. Ideally, the data kept at the agent does not grow with time. Implication of the results: they simplify the functional form of the decision rules, simplify the search for optimal decision rules, and are a prerequisite for deriving a dynamic programming decomposition.
31 Extending these ideas to decentralized control.
32-33 Delayed observation of state (two agents). The system state is (X^1_t, X^2_t). Agent 1 observes X^1_t immediately and agent 2's data with a one-sided delay of d steps; symmetrically for agent 2. Original form of control laws: U^1_t = g^1_t([X^1_{1:t}, U^1_{1:t-1}], [X^2_{1:t-d}, U^2_{1:t-d}]). Is the structure suggested by the centralized result correct, i.e., U^1_t = g^1_t([X^1_{1:t}, U^1_{1:t-1}], [X^2_{t-d}, U^2_{t-d}])?
34-36 Let's consider delay d = 2.
At agent 1:
U^1_1 = g^1_1(X^1_1)
U^1_2 = g^1_2(X^1_{1:2}, U^1_1)
U^1_3 = g^1_3(X^1_{1:3}, U^1_{1:2}, X^2_1, U^2_1)
U^1_4 = g^1_4(X^1_{1:4}, U^1_{1:3}, X^2_{1:2}, U^2_{1:2})
At agent 2 (symmetrically):
U^2_1 = g^2_1(X^2_1)
U^2_2 = g^2_2(X^2_{1:2}, U^2_1)
U^2_3 = g^2_3(X^2_{1:3}, U^2_{1:2}, X^1_1, U^1_1)
U^2_4 = g^2_4(X^2_{1:4}, U^2_{1:3}, X^1_{1:2}, U^1_{1:2})
At time 4, agent 1 cannot shed X^2_1, because X^2_1 gives some information about agent 2's action U^2_3, which agent 1 has not observed.
37 How does agent 1 figure out how agent 2 will interpret his (agent 1's) actions?
39-41 Solution approach [Mahajan 2008, 2009; Nayyar, Mahajan, Teneketzis 2008, 2011]: adapt based on common knowledge.
Split the observations into two parts (shown for d = 2):
Common data: C_t = (X^1_{1:t-2}, X^2_{1:t-2}, U^1_{1:t-2}, U^2_{1:t-2})
Local data: L^1_t = (X^1_{t-1}, X^1_t, U^1_{t-1}), and similarly L^2_t.
A three-step approach:
1. Consider a coordinated system.
2. Show that the coordinated system is equivalent to the original system.
3. Simplify the coordinated system.
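The key object in the coordinated system is the prescription: a function from local data to actions, chosen using common data only. With finite sets, prescriptions are just lookup tables, so the coordinator optimizes over a finite set of tables instead of over the agents' private data. A toy sketch (all sets and names are illustrative):

```python
from itertools import product

local_values = [0, 1]   # possible values of an agent's local data (toy set)
actions = [0, 1]        # possible actions (toy set)

# A prescription gamma is a table local-data -> action. Enumerate them all:
# the coordinator's "action space" is this finite set of tables.
prescriptions = [dict(zip(local_values, a))
                 for a in product(actions, repeat=len(local_values))]

def apply_prescription(gamma, local_data):
    """A 'dumb' agent simply evaluates the prescription on its local data."""
    return gamma[local_data]

gamma1 = prescriptions[2]                       # coordinator's pick for agent 1
u1 = apply_prescription(gamma1, local_data=1)   # agent 1's resulting action
```

With 2 local values and 2 actions there are 2^2 = 4 prescriptions per agent per stage; the coordinator chooses among these using common data, and the agents contribute no decision-making of their own.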
42 Original system: each agent i acts on its common data C_t and its local data L^i_t.
43-45 Coordinated system. A coordinator observes only the common data C_t = (X^1_{1:t-2}, X^2_{1:t-2}, U^1_{1:t-2}, U^2_{1:t-2}). Its control actions are function sections (prescriptions) γ^1_t, γ^2_t, where γ^i_t(·) = g^i_t(·, C_t). The agents are "dumb" and simply follow the prescription: U^i_t = γ^i_t(L^i_t) = γ^i_t(X^i_{t-1}, X^i_t, U^i_{t-1}). The two systems are equivalent.
46-51 Structure of the coordination policy: (γ^1_t, γ^2_t) = ψ_t(Z_{1:t-1}, γ^1_{1:t-1}, γ^2_{1:t-1}), where Z_t = (X^1_{t-1}, X^2_{t-1}, U^1_{t-1}, U^2_{t-1}).
Sufficient statistic:
π_t = Pr(state | all past data)
    = Pr(X^1_t, X^2_t, L^1_t, L^2_t | Z_{1:t-1}, γ^1_{1:t-1}, γ^2_{1:t-1})
    = Pr(X^1_t, X^2_t, Z_t | Z_{1:t-1}, γ^1_{1:t-1}, γ^2_{1:t-1})
Structural result: (γ^1_t, γ^2_t) = ψ_t(π_t). Or equivalently, U^i_t = g^i_t(π_t, L^i_t).
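Once prescriptions are fixed, the agents' actions are determined by the state, so the coordinator's sufficient statistic π updates like a POMDP belief. A toy one-step sketch, with a single scalar state standing in for the joint state and made-up dynamics (everything below is an illustration, not the talk's model):

```python
import numpy as np

# Controlled transition matrices: P[u][x, x'] = Pr(X' = x' | X = x, U = u).
P = {0: np.array([[0.9, 0.1], [0.3, 0.7]]),
     1: np.array([[0.6, 0.4], [0.2, 0.8]])}

def coordinator_update(pi, gamma):
    """Propagate the coordinator's belief pi one step when the agents apply
    the prescription gamma (a table state -> action)."""
    new_pi = np.zeros_like(pi)
    for x, p in enumerate(pi):
        u = gamma[x]               # action dictated by the prescription at x
        new_pi += p * P[u][x]      # mix in the corresponding transition row
    return new_pi

pi = np.array([0.5, 0.5])
pi = coordinator_update(pi, gamma={0: 1, 1: 0})
```

Because the update depends on the prescription (the coordinator's "action") and not on any private data, π plays the role of the information state in a centralized POMDP for the coordinator.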
52-54 Further simplification. Recall U^1_t = γ^1_t(X^1_{t-1}, X^1_t, U^1_{t-1}). Define the partially applied prescription γ̂^1_t(·) = γ^1_t(X^1_{t-1}, ·, U^1_{t-1}).
We can show that π_t = Pr(X^1_t, X^2_t, Z_t | Z_{1:t-1}, γ^1_{1:t-1}, γ^2_{1:t-1}) is a function of (Z_{t-1}, γ̂^1_{t-1}, γ̂^2_{t-1}).
Equivalent structural result: U^1_t = g^1_t([X^1_{t-2:t}, U^1_{t-2:t-1}], [X^2_{t-2}, U^2_{t-2}], γ̂^1_{t-1}, γ̂^2_{t-1}).
55 Recap: Solution approach [Mahajan 2008, 2009; Nayyar, Mahajan, Teneketzis 2008, 2011]: adapt based on common knowledge. Split the observations into common data C_t = (X^1_{1:t-2}, X^2_{1:t-2}, U^1_{1:t-2}, U^2_{1:t-2}) and local data L^i_t = (X^i_{t-1}, X^i_t, U^i_{t-1}). A three-step approach: 1. consider a coordinated system; 2. show that the coordinated system is equivalent to the original system; 3. simplify the coordinated system.
56 Applications. Delayed sharing information structures (an open problem for 40 years) [Nayyar, Mahajan, Teneketzis 2011]; real-time communication; feedback communication; multi-user communication; decentralized sequential hypothesis testing; multiaccess broadcast; active sensing; ...
57 Future directions. Randomized decision rules; unknown models.
58 Thank you
More informationRestless Bandit Index Policies for Solving Constrained Sequential Estimation Problems
Restless Bandit Index Policies for Solving Constrained Sequential Estimation Problems Sofía S. Villar Postdoctoral Fellow Basque Center for Applied Mathemathics (BCAM) Lancaster University November 5th,
More informationMarkovian Decision Process (MDP): theory and applications to wireless networks
Markovian Decision Process (MDP): theory and applications to wireless networks Philippe Ciblat Joint work with I. Fawaz, N. Ksairi, C. Le Martret, M. Sarkiss Outline Examples MDP Applications A few examples
More informationTemporal Difference Learning & Policy Iteration
Temporal Difference Learning & Policy Iteration Advanced Topics in Reinforcement Learning Seminar WS 15/16 ±0 ±0 +1 by Tobias Joppen 03.11.2015 Fachbereich Informatik Knowledge Engineering Group Prof.
More informationDecentralized decision making with spatially distributed data
Decentralized decision making with spatially distributed data XuanLong Nguyen Department of Statistics University of Michigan Acknowledgement: Michael Jordan, Martin Wainwright, Ram Rajagopal, Pravin Varaiya
More informationCourse 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016
Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the
More informationDecentralized Sequential Change Detection Using Physical Layer Fusion
Decentralized Sequential Change Detection Using hysical Layer Fusion Leena Zacharias ECE Department Indian Institute of Science Bangalore, India Email: zleena@ece.iisc.ernet.in ajesh Sundaresan ECE Department
More informationDecentralized Sequential Hypothesis Testing. Change Detection
Decentralized Sequential Hypothesis Testing & Change Detection Giorgos Fellouris, Columbia University, NY, USA George V. Moustakides, University of Patras, Greece Outline Sequential hypothesis testing
More informationThe Optimality of Beamforming: A Unified View
The Optimality of Beamforming: A Unified View Sudhir Srinivasa and Syed Ali Jafar Electrical Engineering and Computer Science University of California Irvine, Irvine, CA 92697-2625 Email: sudhirs@uciedu,
More informationPower Controlled FCFS Splitting Algorithm for Wireless Networks
Power Controlled FCFS Splitting Algorithm for Wireless Networks Ashutosh Deepak Gore Abhay Karandikar Department of Electrical Engineering Indian Institute of Technology - Bombay COMNET Workshop, July
More informationArtificial Intelligence & Sequential Decision Problems
Artificial Intelligence & Sequential Decision Problems (CIV6540 - Machine Learning for Civil Engineers) Professor: James-A. Goulet Département des génies civil, géologique et des mines Chapter 15 Goulet
More informationRL 14: POMDPs continued
RL 14: POMDPs continued Michael Herrmann University of Edinburgh, School of Informatics 06/03/2015 POMDPs: Points to remember Belief states are probability distributions over states Even if computationally
More informationDecision Theory: Markov Decision Processes
Decision Theory: Markov Decision Processes CPSC 322 Lecture 33 March 31, 2006 Textbook 12.5 Decision Theory: Markov Decision Processes CPSC 322 Lecture 33, Slide 1 Lecture Overview Recap Rewards and Policies
More informationOptimizing Memory-Bounded Controllers for Decentralized POMDPs
Optimizing Memory-Bounded Controllers for Decentralized POMDPs Christopher Amato, Daniel S. Bernstein and Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst, MA 01003
More informationSTRUCTURE AND OPTIMALITY OF MYOPIC SENSING FOR OPPORTUNISTIC SPECTRUM ACCESS
STRUCTURE AND OPTIMALITY OF MYOPIC SENSING FOR OPPORTUNISTIC SPECTRUM ACCESS Qing Zhao University of California Davis, CA 95616 qzhao@ece.ucdavis.edu Bhaskar Krishnamachari University of Southern California
More informationEvent-triggered stabilization of linear systems under channel blackouts
Event-triggered stabilization of linear systems under channel blackouts Pavankumar Tallapragada, Massimo Franceschetti & Jorge Cortés Allerton Conference, 30 Sept. 2015 Acknowledgements: National Science
More informationMarkov Decision Processes and Solving Finite Problems. February 8, 2017
Markov Decision Processes and Solving Finite Problems February 8, 2017 Overview of Upcoming Lectures Feb 8: Markov decision processes, value iteration, policy iteration Feb 13: Policy gradients Feb 15:
More informationLogic, Knowledge Representation and Bayesian Decision Theory
Logic, Knowledge Representation and Bayesian Decision Theory David Poole University of British Columbia Overview Knowledge representation, logic, decision theory. Belief networks Independent Choice Logic
More information