Introduction to Sequential Teams

Size: px
Start display at page:

Download "Introduction to Sequential Teams"

Transcription

1 Introduction to Sequential Teams Aditya Mahajan McGill University Joint work with: Ashutosh Nayyar and Demos Teneketzis, UMichigan MITACS Workshop on Fusion and Inference in Networks, 2011

2 Decentralized systems are ubiquitous

3 Real-time quantization X Source Y ˆX Encoder Receiver Objective Choose transmission and estimation policy to minimize expected total distortion (over a finite or infinite horizon)

4 Multiaccess broadcast Transmitter 1 Transmitter 2 Multiaccess channel Objective Choose transmission policy to maximize throughput (over a finite or infinite horizon)

5 Estimating with active sensing Environment Sensor Estimator Objective Choose transmission and estimation policy to minimize a weighted average of expected transmission cost and expected total distortion (over a finite or infinite horizon)

6 Systematic design of decentralized systems Salient Features Multi-stage decision problems Multiple decision makers (or agents) with decentralized information Structure of optimal policy Can an agent, or a group of agents Shed available data Compress available data without loss of optimality? Search for optimal policies Brute force search of an optimal policy has doubly exponential complexity with time-horizon. How can we search for an optimal policy efficiently?

7 Outline A taxonomy of decentralized systems Overview of centralized stochastic control Markov decision processes (MDP) Partially observable Markov decision processes (POMDP) Delayed state observation Design principle for sequential teams. Delayed state observation

8 Classification of decentralized systems Games Static Objective Stab'lty Coupling Decentralized Systems Teams Dynamic Optim'lty

9 Classification of decentralized systems Games Static Objective Stab'lty Coupling Decentralized Systems Teams Dynamic Optim'lty

10 Classification of decentralized systems Games Static Objective Dynamic Teams Dynamic Coupling Decentralized Systems Teams Optim'lty Stab'lty Sequential Classical info strc Nonsequential Nonclassical

11 Classification of decentralized systems Games Static Objective Dynamic Teams Dynamic Coupling Decentralized Systems Teams Optim'lty Stab'lty Sequential Classical info strc Nonsequential Nonclassical J J J J J J J J

12 Sequential dynamic teams We are interested in A A A A A with non-classical information structures J J J J

13 A bit of history...

14 A bit of history...

15 Overview of centralized stochastic control

16 Centralized stochastic control Single decision maker A A A with classical information structures J J J J

17 MDP: Structural properties Sys X Agent U

18 MDP: Structural properties Sys X Agent U Structure of optimal policy Choose current action based on current state X t

19 MDP: Structural properties R R R Sys X Agent c c c f f f U X U X U X U g g g Structure of optimal policy Choose current action based on current state X t

20 MDP: Structural properties R R R Sys X Agent c c c f f f U X U X U X U g g g Structure of optimal policy R R R Choose current action c c c f f f based on current state X t X U X U X U g g g

21 POMDP: Structural properties X Sys Y Obs Agent U

22 POMDP: Structural properties X Sys Y Obs Agent U Structure of optimal policies Choose current action based on current info state Pr(state of system all data at agent)

23 POMDP: Structural properties r r r X Sys Y Obs Agent c c c f f f U x u x u x u g g g y y y Structure of optimal policies Choose current action based on current info state Pr(state of system all data at agent)

24 POMDP: Structural properties r r r X Sys Y Obs Agent c c c f f f U x u x u x u g g g y y y Structure of optimal policies Choose current action based on current info state Pr(state of system all data at agent) r r r c c c f f f x u x u x u g g g y y y π π π

25 An example: Delayed observation Sys Delay d Agent

26 An example: Delayed observation Sys Delay d Agent Structure of optimal policies Choose control action based on: π = Pr(X X :, U : ) (X, U : )

27 An example: Delayed observation Sys Delay d Agent Original form of control laws U = g (X :, U : ) Structure of optimal policies Choose control action based on: π = Pr(X X :, U : ) (X, U : )

28 An example: Delayed observation Sys Delay d Agent Original form of control laws U = g (X :, U : ) Structure of optimal policies Choose control action based on: π = Pr(X X :, U : ) (X, U : ) Simplified form of control laws U = g (X, U : )

29 Structural policies in stochastic control Structure of optimal policies Shed irrelevant information Compress relevant information to a compact statistic Hopefully, the data at the agent is not increasing with time

30 Structural policies in stochastic control Structure of optimal policies Shed irrelevant information Compress relevant information to a compact statistic Hopefully, the data at the agent is not increasing with time Implication of the results Simplify the functional form of the decision rules Simplify search for optimal decision rules A prerequisite for deriving dynamic programming decomposition.

31 Extending ideas to decentralized control J A A A J J J A A A A J J J J A

32 Delayed observation of state One-sided delay d (X, X ) Agent 1 U Sys (X, X ) One-sided delay d (X, X ) Agent 2 U Original form of control laws U = g ([ X : U : ], [ X : U ]) :

33 Delayed observation of state One-sided delay d (X, X ) Agent 1 U Sys (X, X ) One-sided delay d (X, X ) Agent 2 U Original form of control laws U = g ([ X : U : ], [ X : U ]) : Is this structure correct? U = g ([ X : U : ], [ X U ])

34 Lets consider delay d = 2 At Agent 1 U = g (X ) U = g (X :, U ) U = g (X :, U :, X, U ) U = g (X :, U :, X :, U : )

35 Lets consider delay d = 2 At Agent 1 U = g (X ) U = g (X :, U ) U = g (X :, U :, X, U ) U = g (X :, U :, X :, U : ) At Agent 2 U = g (X ) U = g (X :, U ) U = g (X :, U :, X, U ) U = g (X :, U :, X :, U : )

36 Lets consider delay d = 2 At Agent 1 At time 4, agent 1 can't remove X because X gives some information about U At Agent 2. U = g (X ) U = g (X :, U ) U = g (X :, U :, X, U ) U = g (X :, U :, X :, U : ) U = g (X ) U = g (X :, U ) U = g (X :, U :, X, U ) U = g (X :, U :, X :, U : )

37 How does agent 1 figure out how agent 2 will interpret his (agent 1's) actions?

38 How does agent 1 figure out how agent 2 will interpret his (agent 1's) actions?

39 Solution Approach [Mahajan 2008, 2009; Nayyar Mahajan Teneketzis 2008, 2011] Adapt based on common knowledge

40 Solution Approach [Mahajan 2008, 2009; Nayyar Mahajan Teneketzis 2008, 2011] Adapt based on common knowledge Split observations into two parts: Common data: C = (X :, X :, U :, U : ) Local data: L = (X, X, U ).

41 Solution Approach [Mahajan 2008, 2009; Nayyar Mahajan Teneketzis 2008, 2011] Adapt based on common knowledge Split observations into two parts: Common data: C = (X :, X :, U :, U : ) Local data: L = (X, X, U ). A three step approach: 1. Consider a coordinated system 2. Show that the coordinated system is equivalent to the original system 3. Simplify the coordinated system

42 Original System C, L X, X C, L

43 Coordinated System L X, X C L

44 Coordinated System L X, X C L Observations: C = (X :, X :, U :, U : ) Control actions : Function sections γ, γ γ ( ) = g (, C ) Agents are dumb and simply follow the prescription U = γ (L ) = γ (X, X, U )

45 Coordinated System L X, X C L Observations: C = (X :, X :, U :, U : ) Control actions : Function sections γ, γ γ ( ) = g (, C ) Agents are dumb and simply follow the prescription U = γ (L ) = γ (X, X, U ) The two systems are equivalent

46 Structure of the coordination policy (γ, γ ) = ψ (Z :, γ :, γ : ), where Z = (X, X, U, U )

47 Structure of the coordination policy (γ, γ ) = ψ (Z :, γ :, γ : ), where Z = (X, X, U, U ) Sufficient statistic π = Pr(state all past data)

48 Structure of the coordination policy (γ, γ ) = ψ (Z :, γ :, γ : ), where Z = (X, X, U, U ) Sufficient statistic π = Pr(state all past data) = Pr(X, X, L, L Z :, γ :, γ : )

49 Structure of the coordination policy (γ, γ ) = ψ (Z :, γ :, γ : ), where Z = (X, X, U, U ) Sufficient statistic π = Pr(state all past data) = Pr(X, X, L, L Z :, γ :, γ : ) = Pr(X, X, Z Z :, γ :, γ : )

50 Structure of the coordination policy (γ, γ ) = ψ (Z :, γ :, γ : ), where Z = (X, X, U, U ) Sufficient statistic π = Pr(state all past data) = Pr(X, X, L, L Z :, γ :, γ : ) = Pr(X, X, Z Z :, γ :, γ : ) Structural result: (γ, γ ) = ψ (π )

51 Structure of the coordination policy (γ, γ ) = ψ (Z :, γ :, γ : ), where Z = (X, X, U, U ) Sufficient statistic π = Pr(state all past data) = Pr(X, X, L, L Z :, γ :, γ : ) = Pr(X, X, Z Z :, γ :, γ : ) Structural result: (γ, γ ) = ψ (π ) Or equivalently, U = g (π, L )

52 Further Simplification Recall U = γ (X, X, U ) Define ˆγ ( ) = γ (X, U, )

53 Further Simplification Recall U = γ (X, X, U ) Define ˆγ ( ) = γ (X, U, ) We can show that π = Pr(X, X, Z Z :, γ :, γ : ) (Z, ˆγ, ˆγ )

54 Further Simplification Recall U = γ (X, X, U ) Define ˆγ ( ) = γ (X, U, ) We can show that π = Pr(X, X, Z Z :, γ :, γ : ) (Z, ˆγ, ˆγ ) Equivalent structural result U = g ([ X : U : ], [ X U ], ˆγ, ˆγ )

55 Recap: Solution approach [Mahajan 2008, 2009; Nayyar Mahajan Teneketzis 2008, 2011] Adapt based on common knowledge Split observations into two parts: Common data: C = (X :, X :, U :, U : ) Local data: L = (X, X, U ). A three step approach: 1. Consider a coordinated system 2. Show that the coordinated system is equivalent to the original system 3. Simplify the coordinated system

56 Applications Delayed sharing info structure (Open problem for 40 years) [Nayyar Mahajan Teneketzis 2011] real-time communication, feedback communication, multi-user communication, decentralized sequential hypothesis testing, multiaccess broadcast, active sensing,...

57 Randomized decision rules Unknown model Future directions

58 Thank you

Structure of optimal decentralized control policies An axiomatic approach

Structure of optimal decentralized control policies An axiomatic approach Structure of optimal decentralized control policies An axiomatic approach Aditya Mahajan Yale Joint work with: Demos Teneketzis (UofM), Sekhar Tatikonda (Yale), Ashutosh Nayyar (UofM), Serdar Yüksel (Queens)

More information

Optimal Decentralized Control of Coupled Subsystems With Control Sharing

Optimal Decentralized Control of Coupled Subsystems With Control Sharing IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 58, NO. 9, SEPTEMBER 2013 2377 Optimal Decentralized Control of Coupled Subsystems With Control Sharing Aditya Mahajan, Member, IEEE Abstract Subsystems that

More information

Chapter 4 The Common-Information Approach to Decentralized Stochastic Control

Chapter 4 The Common-Information Approach to Decentralized Stochastic Control Chapter 4 The Common-Information Approach to Decentralized Stochastic Control Ashutosh Nayyar, Aditya Mahajan, and Demosthenis Teneketzis 4.1 Introduction Many modern technological systems, such as cyber-physical

More information

Sufficient Statistics in Decentralized Decision-Making Problems

Sufficient Statistics in Decentralized Decision-Making Problems Sufficient Statistics in Decentralized Decision-Making Problems Ashutosh Nayyar University of Southern California Feb 8, 05 / 46 Decentralized Systems Transportation Networks Communication Networks Networked

More information

Equilibria for games with asymmetric information: from guesswork to systematic evaluation

Equilibria for games with asymmetric information: from guesswork to systematic evaluation Equilibria for games with asymmetric information: from guesswork to systematic evaluation Achilleas Anastasopoulos anastas@umich.edu EECS Department University of Michigan February 11, 2016 Joint work

More information

Problems in the intersection of Information theory and Control

Problems in the intersection of Information theory and Control Problems in the intersection of Information theory and Control Achilleas Anastasopoulos anastas@umich.edu EECS Department University of Michigan Dec 5, 2013 Former PhD student: Junghuyn Bae Current PhD

More information

6196 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 9, SEPTEMBER 2011

6196 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 9, SEPTEMBER 2011 6196 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 9, SEPTEMBER 2011 On the Structure of Real-Time Encoding and Decoding Functions in a Multiterminal Communication System Ashutosh Nayyar, Student

More information

Sequential Decision Making in Decentralized Systems

Sequential Decision Making in Decentralized Systems Sequential Decision Making in Decentralized Systems by Ashutosh Nayyar A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical Engineering:

More information

Decentralized Stochastic Control with Partial Sharing Information Structures: A Common Information Approach

Decentralized Stochastic Control with Partial Sharing Information Structures: A Common Information Approach Decentralized Stochastic Control with Partial Sharing Information Structures: A Common Information Approach 1 Ashutosh Nayyar, Aditya Mahajan and Demosthenis Teneketzis Abstract A general model of decentralized

More information

Common Knowledge and Sequential Team Problems

Common Knowledge and Sequential Team Problems Common Knowledge and Sequential Team Problems Authors: Ashutosh Nayyar and Demosthenis Teneketzis Computer Engineering Technical Report Number CENG-2018-02 Ming Hsieh Department of Electrical Engineering

More information

Remote estimation over a packet-drop channel with Markovian state

Remote estimation over a packet-drop channel with Markovian state 1 Remote estimation over a packet-drop channel with Markovian state Jhelum Chakravorty and Aditya Mahajan Abstract We investigate a remote estimation problem in which a transmitter observes a Markov source

More information

Reinforcement Learning. Introduction

Reinforcement Learning. Introduction Reinforcement Learning Introduction Reinforcement Learning Agent interacts and learns from a stochastic environment Science of sequential decision making Many faces of reinforcement learning Optimal control

More information

Available online at ScienceDirect. IFAC-PapersOnLine (2016) Remote-state estimation with packet drop

Available online at  ScienceDirect. IFAC-PapersOnLine (2016) Remote-state estimation with packet drop Available online at www.sciencedirect.com ScienceDirect IFAC-PapersOnLine 49-22 (2016) 007 012 Remote-state estimation with packet drop Jhelum Chakravorty Aditya Mahajan McGill University, Electrical Computer

More information

Optimal Control Strategies in Delayed Sharing Information Structures

Optimal Control Strategies in Delayed Sharing Information Structures Optimal Control Strategies in Delayed Sharing Information Structures Ashutosh Nayyar, Aditya Mahajan and Demosthenis Teneketzis arxiv:1002.4172v1 [cs.oh] 22 Feb 2010 February 12, 2010 Abstract The n-step

More information

Dynamic Games with Asymmetric Information: Common Information Based Perfect Bayesian Equilibria and Sequential Decomposition

Dynamic Games with Asymmetric Information: Common Information Based Perfect Bayesian Equilibria and Sequential Decomposition Dynamic Games with Asymmetric Information: Common Information Based Perfect Bayesian Equilibria and Sequential Decomposition 1 arxiv:1510.07001v1 [cs.gt] 23 Oct 2015 Yi Ouyang, Hamidreza Tavafoghi and

More information

Optimal Strategies for Communication and Remote Estimation with an Energy Harvesting Sensor

Optimal Strategies for Communication and Remote Estimation with an Energy Harvesting Sensor 1 Optimal Strategies for Communication and Remote Estimation with an Energy Harvesting Sensor A. Nayyar, T. Başar, D. Teneketzis and V. V. Veeravalli Abstract We consider a remote estimation problem with

More information

Partially Observable Markov Decision Processes (POMDPs)

Partially Observable Markov Decision Processes (POMDPs) Partially Observable Markov Decision Processes (POMDPs) Sachin Patil Guest Lecture: CS287 Advanced Robotics Slides adapted from Pieter Abbeel, Alex Lee Outline Introduction to POMDPs Locally Optimal Solutions

More information

Stochastic Nestedness and Information Analysis of Tractability in Decentralized Control

Stochastic Nestedness and Information Analysis of Tractability in Decentralized Control Stochastic Nestedness and Information Analysis of Tractability in Decentralized Control Serdar Yüksel 1 Abstract Communication requirements for nestedness conditions require exchange of very large data

More information

The Simplex and Policy Iteration Methods are Strongly Polynomial for the Markov Decision Problem with Fixed Discount

The Simplex and Policy Iteration Methods are Strongly Polynomial for the Markov Decision Problem with Fixed Discount The Simplex and Policy Iteration Methods are Strongly Polynomial for the Markov Decision Problem with Fixed Discount Yinyu Ye Department of Management Science and Engineering and Institute of Computational

More information

Markov Decision Processes

Markov Decision Processes Markov Decision Processes Noel Welsh 11 November 2010 Noel Welsh () Markov Decision Processes 11 November 2010 1 / 30 Annoucements Applicant visitor day seeks robot demonstrators for exciting half hour

More information

Christopher Watkins and Peter Dayan. Noga Zaslavsky. The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015

Christopher Watkins and Peter Dayan. Noga Zaslavsky. The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015 Q-Learning Christopher Watkins and Peter Dayan Noga Zaslavsky The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015 Noga Zaslavsky Q-Learning (Watkins & Dayan, 1992)

More information

Structure of optimal strategies for remote estimation over Gilbert-Elliott channel with feedback

Structure of optimal strategies for remote estimation over Gilbert-Elliott channel with feedback Structure of optimal strategies for remote estimation over Gilbert-Elliott channel with feedback Jhelum Chakravorty Electrical and Computer Engineering McGill University, Montreal, Canada Email: jhelum.chakravorty@mail.mcgill.ca

More information

Optimally Solving Dec-POMDPs as Continuous-State MDPs

Optimally Solving Dec-POMDPs as Continuous-State MDPs Optimally Solving Dec-POMDPs as Continuous-State MDPs Jilles Dibangoye (1), Chris Amato (2), Olivier Buffet (1) and François Charpillet (1) (1) Inria, Université de Lorraine France (2) MIT, CSAIL USA IJCAI

More information

An Introduction to Markov Decision Processes. MDP Tutorial - 1

An Introduction to Markov Decision Processes. MDP Tutorial - 1 An Introduction to Markov Decision Processes Bob Givan Purdue University Ron Parr Duke University MDP Tutorial - 1 Outline Markov Decision Processes defined (Bob) Objective functions Policies Finding Optimal

More information

Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS

Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS Many slides adapted from Jur van den Berg Outline POMDPs Separation Principle / Certainty Equivalence Locally Optimal

More information

Power Allocation over Two Identical Gilbert-Elliott Channels

Power Allocation over Two Identical Gilbert-Elliott Channels Power Allocation over Two Identical Gilbert-Elliott Channels Junhua Tang School of Electronic Information and Electrical Engineering Shanghai Jiao Tong University, China Email: junhuatang@sjtu.edu.cn Parisa

More information

Decentralized stochastic control

Decentralized stochastic control DOI 0.007/s0479-04-652-0 Decentralized stochastic control Aditya Mahajan Mehnaz Mannan Springer Science+Business Media New York 204 Abstract Decentralized stochastic control refers to the multi-stage optimization

More information

AN EXTERNALITY BASED DECENTRALIZED OPTIMAL POWER ALLOCATION SCHEME FOR WIRELESS MESH NETWORKS

AN EXTERNALITY BASED DECENTRALIZED OPTIMAL POWER ALLOCATION SCHEME FOR WIRELESS MESH NETWORKS AN EXTERNALITY BASED DECENTRALIZED OPTIMAL POWER ALLOCATION SCHEME FOR WIRELESS MESH NETWORKS Shruti Sharma and Demos Teneketzis Department of Electrical Engineering and Computer Science University of

More information

Bayesian Congestion Control over a Markovian Network Bandwidth Process

Bayesian Congestion Control over a Markovian Network Bandwidth Process Bayesian Congestion Control over a Markovian Network Bandwidth Process Parisa Mansourifard 1/30 Bayesian Congestion Control over a Markovian Network Bandwidth Process Parisa Mansourifard (USC) Joint work

More information

IAS. Dec-POMDPs as Non-Observable MDPs. Frans A. Oliehoek 1 and Christopher Amato 2

IAS. Dec-POMDPs as Non-Observable MDPs. Frans A. Oliehoek 1 and Christopher Amato 2 Universiteit van Amsterdam IAS technical report IAS-UVA-14-01 Dec-POMDPs as Non-Observable MDPs Frans A. Oliehoek 1 and Christopher Amato 2 1 Intelligent Systems Laboratory Amsterdam, University of Amsterdam.

More information

Markov decision processes

Markov decision processes CS 2740 Knowledge representation Lecture 24 Markov decision processes Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Administrative announcements Final exam: Monday, December 8, 2008 In-class Only

More information

Decentralized LQG Control of Systems with a Broadcast Architecture

Decentralized LQG Control of Systems with a Broadcast Architecture Decentralized LQG Control of Systems with a Broadcast Architecture Laurent Lessard 1,2 IEEE Conference on Decision and Control, pp. 6241 6246, 212 Abstract In this paper, we consider dynamical subsystems

More information

Sequential team form and its simplification using graphical models

Sequential team form and its simplification using graphical models Forty-Seventh Annual Allerton Conference Allerton House, UIUC, Illinois, USA September 30 - October 2, 2009 Sequential team form and its simplification using graphical models Aditya Mahajan Dept. of Electrical

More information

Decentralized Control of Cooperative Systems: Categorization and Complexity Analysis

Decentralized Control of Cooperative Systems: Categorization and Complexity Analysis Journal of Artificial Intelligence Research 22 (2004) 143-174 Submitted 01/04; published 11/04 Decentralized Control of Cooperative Systems: Categorization and Complexity Analysis Claudia V. Goldman Shlomo

More information

Internet Monetization

Internet Monetization Internet Monetization March May, 2013 Discrete time Finite A decision process (MDP) is reward process with decisions. It models an environment in which all states are and time is divided into stages. Definition

More information

Mathematical Optimization Models and Applications

Mathematical Optimization Models and Applications Mathematical Optimization Models and Applications Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 1, 2.1-2,

More information

Spatial Transformation

Spatial Transformation Spatial Transformation Presented by Liqun Chen June 30, 2017 1 Overview 2 Spatial Transformer Networks 3 STN experiments 4 Recurrent Models of Visual Attention (RAM) 5 Recurrent Models of Visual Attention

More information

Information-Theoretic Privacy for Smart Metering Systems with a Rechargeable Battery

Information-Theoretic Privacy for Smart Metering Systems with a Rechargeable Battery 1 Information-Theoretic Privacy for Smart Metering Systems with a Rechargeable Battery Simon Li, Ashish Khisti, and Aditya Mahajan Abstract Smart-metering systems report electricity usage of a user to

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Formal models of interaction Daniel Hennes 27.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Taxonomy of domains Models of

More information

Probabilistic Model Checking and Strategy Synthesis for Robot Navigation

Probabilistic Model Checking and Strategy Synthesis for Robot Navigation Probabilistic Model Checking and Strategy Synthesis for Robot Navigation Dave Parker University of Birmingham (joint work with Bruno Lacerda, Nick Hawes) AIMS CDT, Oxford, May 2015 Overview Probabilistic

More information

LORD: LOw-complexity, Rate-controlled, Distributed video coding system

LORD: LOw-complexity, Rate-controlled, Distributed video coding system LORD: LOw-complexity, Rate-controlled, Distributed video coding system Rami Cohen and David Malah Signal and Image Processing Lab Department of Electrical Engineering Technion - Israel Institute of Technology

More information

Communication constraints and latency in Networked Control Systems

Communication constraints and latency in Networked Control Systems Communication constraints and latency in Networked Control Systems João P. Hespanha Center for Control Engineering and Computation University of California Santa Barbara In collaboration with Antonio Ortega

More information

THE COMPLEXITY OF DECENTRALIZED CONTROL OF MARKOV DECISION PROCESSES

THE COMPLEXITY OF DECENTRALIZED CONTROL OF MARKOV DECISION PROCESSES MATHEMATICS OF OPERATIONS RESEARCH Vol. 27, No. 4, November 2002, pp. 819 840 Printed in U.S.A. THE COMPLEXITY OF DECENTRALIZED CONTROL OF MARKOV DECISION PROCESSES DANIEL S. BERNSTEIN, ROBERT GIVAN, NEIL

More information

Introduction to Artificial Intelligence (AI)

Introduction to Artificial Intelligence (AI) Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 10 Oct, 13, 2011 CPSC 502, Lecture 10 Slide 1 Today Oct 13 Inference in HMMs More on Robot Localization CPSC 502, Lecture

More information

Marks. bonus points. } Assignment 1: Should be out this weekend. } Mid-term: Before the last lecture. } Mid-term deferred exam:

Marks. bonus points. } Assignment 1: Should be out this weekend. } Mid-term: Before the last lecture. } Mid-term deferred exam: Marks } Assignment 1: Should be out this weekend } All are marked, I m trying to tally them and perhaps add bonus points } Mid-term: Before the last lecture } Mid-term deferred exam: } This Saturday, 9am-10.30am,

More information

(Advances in) Quantum Reinforcement Learning. Vedran Dunjko

(Advances in) Quantum Reinforcement Learning. Vedran Dunjko (Advances in) Quantum Reinforcement Learning Vedran Dunjko vedran.dunjko@mpq.mpg.de 2017 Acknowledgements: Briegel Melnikov Tiersch theoretical physics Makmal Friis Piater Hangl intelligent & interactive

More information

Multimedia Communications. Scalar Quantization

Multimedia Communications. Scalar Quantization Multimedia Communications Scalar Quantization Scalar Quantization In many lossy compression applications we want to represent source outputs using a small number of code words. Process of representing

More information

CSE250A Fall 12: Discussion Week 9

CSE250A Fall 12: Discussion Week 9 CSE250A Fall 12: Discussion Week 9 Aditya Menon (akmenon@ucsd.edu) December 4, 2012 1 Schedule for today Recap of Markov Decision Processes. Examples: slot machines and maze traversal. Planning and learning.

More information

arxiv: v1 [cs.sy] 30 Sep 2015

arxiv: v1 [cs.sy] 30 Sep 2015 Optimal Sensor Scheduling and Remote Estimation over an Additive Noise Channel Xiaobin Gao, Emrah Akyol, and Tamer Başar arxiv:1510.00064v1 cs.sy 30 Sep 015 Abstract We consider a sensor scheduling and

More information

, and rewards and transition matrices as shown below:

, and rewards and transition matrices as shown below: CSE 50a. Assignment 7 Out: Tue Nov Due: Thu Dec Reading: Sutton & Barto, Chapters -. 7. Policy improvement Consider the Markov decision process (MDP) with two states s {0, }, two actions a {0, }, discount

More information

An Estimation Based Allocation Rule with Super-linear Regret and Finite Lock-on Time for Time-dependent Multi-armed Bandit Processes

An Estimation Based Allocation Rule with Super-linear Regret and Finite Lock-on Time for Time-dependent Multi-armed Bandit Processes An Estimation Based Allocation Rule with Super-linear Regret and Finite Lock-on Time for Time-dependent Multi-armed Bandit Processes Prokopis C. Prokopiou, Peter E. Caines, and Aditya Mahajan McGill University

More information

COMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning. Hanna Kurniawati

COMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning. Hanna Kurniawati COMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning Hanna Kurniawati Today } What is machine learning? } Where is it used? } Types of machine learning

More information

Hypothesis Testing with Communication Constraints

Hypothesis Testing with Communication Constraints Hypothesis Testing with Communication Constraints Dinesh Krithivasan EECS 750 April 17, 2006 Dinesh Krithivasan (EECS 750) Hyp. testing with comm. constraints April 17, 2006 1 / 21 Presentation Outline

More information

The Markov Decision Process Extraction Network

The Markov Decision Process Extraction Network The Markov Decision Process Extraction Network Siegmund Duell 1,2, Alexander Hans 1,3, and Steffen Udluft 1 1- Siemens AG, Corporate Research and Technologies, Learning Systems, Otto-Hahn-Ring 6, D-81739

More information

The Complexity of Decentralized Control of Markov Decision Processes

The Complexity of Decentralized Control of Markov Decision Processes University of Massachusetts Amherst From the SelectedWorks of Neil Immerman June, 2000 The Complexity of Decentralized Control of Markov Decision Processes Daniel S. Bernstein Shlolo Zilberstein Neil Immerman,

More information

McGill University Department of Electrical and Computer Engineering

McGill University Department of Electrical and Computer Engineering McGill University Department of Electrical and Computer Engineering ECSE 56 - Stochastic Control Project Report Professor Aditya Mahajan Team Decision Theory and Information Structures in Optimal Control

More information

Decision Theory: Q-Learning

Decision Theory: Q-Learning Decision Theory: Q-Learning CPSC 322 Decision Theory 5 Textbook 12.5 Decision Theory: Q-Learning CPSC 322 Decision Theory 5, Slide 1 Lecture Overview 1 Recap 2 Asynchronous Value Iteration 3 Q-Learning

More information

LECTURE 15. Last time: Feedback channel: setting up the problem. Lecture outline. Joint source and channel coding theorem

LECTURE 15. Last time: Feedback channel: setting up the problem. Lecture outline. Joint source and channel coding theorem LECTURE 15 Last time: Feedback channel: setting up the problem Perfect feedback Feedback capacity Data compression Lecture outline Joint source and channel coding theorem Converse Robustness Brain teaser

More information

Multi-channel Opportunistic Access: A Case of Restless Bandits with Multiple Plays

Multi-channel Opportunistic Access: A Case of Restless Bandits with Multiple Plays Multi-channel Opportunistic Access: A Case of Restless Bandits with Multiple Plays Sahand Haji Ali Ahmad, Mingyan Liu Abstract This paper considers the following stochastic control problem that arises

More information

Markov Decision Processes Chapter 17. Mausam

Markov Decision Processes Chapter 17. Mausam Markov Decision Processes Chapter 17 Mausam Planning Agent Static vs. Dynamic Fully vs. Partially Observable Environment What action next? Deterministic vs. Stochastic Perfect vs. Noisy Instantaneous vs.

More information

Optimality of Walrand-Varaiya Type Policies and. Approximation Results for Zero-Delay Coding of. Markov Sources. Richard G. Wood

Optimality of Walrand-Varaiya Type Policies and. Approximation Results for Zero-Delay Coding of. Markov Sources. Richard G. Wood Optimality of Walrand-Varaiya Type Policies and Approximation Results for Zero-Delay Coding of Markov Sources by Richard G. Wood A thesis submitted to the Department of Mathematics & Statistics in conformity

More information

Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem

Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem Parisa Mansourifard 1/37 Bayesian Congestion Control over a Markovian Network Bandwidth Process:

More information

Surrogate loss functions, divergences and decentralized detection

Surrogate loss functions, divergences and decentralized detection Surrogate loss functions, divergences and decentralized detection XuanLong Nguyen Department of Electrical Engineering and Computer Science U.C. Berkeley Advisors: Michael Jordan & Martin Wainwright 1

More information

Capacity Bounds for Diamond Networks

Capacity Bounds for Diamond Networks Technische Universität München Capacity Bounds for Diamond Networks Gerhard Kramer (TUM) joint work with Shirin Saeedi Bidokhti (TUM & Stanford) DIMACS Workshop on Network Coding Rutgers University, NJ

More information

6 Reinforcement Learning

6 Reinforcement Learning 6 Reinforcement Learning As discussed above, a basic form of supervised learning is function approximation, relating input vectors to output vectors, or, more generally, finding density functions p(y,

More information

ECE Information theory Final

ECE Information theory Final ECE 776 - Information theory Final Q1 (1 point) We would like to compress a Gaussian source with zero mean and variance 1 We consider two strategies In the first, we quantize with a step size so that the

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Dynamic Programming Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: So far we focussed on tree search-like solvers for decision problems. There is a second important

More information

Outline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012

Outline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012 CSE 573: Artificial Intelligence Autumn 2012 Reasoning about Uncertainty & Hidden Markov Models Daniel Weld Many slides adapted from Dan Klein, Stuart Russell, Andrew Moore & Luke Zettlemoyer 1 Outline

More information

Remote Estimation Games over Shared Networks

Remote Estimation Games over Shared Networks October st, 04 Remote Estimation Games over Shared Networks Marcos Vasconcelos & Nuno Martins marcos@umd.edu Dept. of Electrical and Computer Engineering Institute of Systems Research University of Maryland,

More information

Decentralized Control of Stochastic Systems

Decentralized Control of Stochastic Systems Decentralized Control of Stochastic Systems Sanjay Lall Stanford University CDC-ECC Workshop, December 11, 2005 2 S. Lall, Stanford 2005.12.11.02 Decentralized Control G 1 G 2 G 3 G 4 G 5 y 1 u 1 y 2 u

More information

A POMDP Framework for Cognitive MAC Based on Primary Feedback Exploitation

A POMDP Framework for Cognitive MAC Based on Primary Feedback Exploitation A POMDP Framework for Cognitive MAC Based on Primary Feedback Exploitation Karim G. Seddik and Amr A. El-Sherif 2 Electronics and Communications Engineering Department, American University in Cairo, New

More information

A Decentralized Approach to Multi-agent Planning in the Presence of Constraints and Uncertainty

A Decentralized Approach to Multi-agent Planning in the Presence of Constraints and Uncertainty 2011 IEEE International Conference on Robotics and Automation Shanghai International Conference Center May 9-13, 2011, Shanghai, China A Decentralized Approach to Multi-agent Planning in the Presence of

More information

Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS

Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS 63 2.1 Introduction In this chapter we describe the analytical tools used in this thesis. They are Markov Decision Processes(MDP), Markov Renewal process

More information

Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan

Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan Some slides borrowed from Peter Bodik and David Silver Course progress Learning

More information

Symbolic Perseus: a Generic POMDP Algorithm with Application to Dynamic Pricing with Demand Learning

Symbolic Perseus: a Generic POMDP Algorithm with Application to Dynamic Pricing with Demand Learning Symbolic Perseus: a Generic POMDP Algorithm with Application to Dynamic Pricing with Demand Learning Pascal Poupart (University of Waterloo) INFORMS 2009 1 Outline Dynamic Pricing as a POMDP Symbolic Perseus

More information

Probability and Time: Hidden Markov Models (HMMs)

Probability and Time: Hidden Markov Models (HMMs) Probability and Time: Hidden Markov Models (HMMs) Computer Science cpsc322, Lecture 32 (Textbook Chpt 6.5.2) Nov, 25, 2013 CPSC 322, Lecture 32 Slide 1 Lecture Overview Recap Markov Models Markov Chain

More information

Approximate Universal Artificial Intelligence

Approximate Universal Artificial Intelligence Approximate Universal Artificial Intelligence A Monte-Carlo AIXI Approximation Joel Veness Kee Siong Ng Marcus Hutter Dave Silver UNSW / NICTA / ANU / UoA September 8, 2010 General Reinforcement Learning

More information

Decentralized Decision Making!

Decentralized Decision Making! Decentralized Decision Making! in Partially Observable, Uncertain Worlds Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst Joint work with Martin Allen, Christopher

More information

REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning

REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning Ronen Tamari The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (#67679) February 28, 2016 Ronen Tamari

More information

Information Sources. Professor A. Manikas. Imperial College London. EE303 - Communication Systems An Overview of Fundamentals

Information Sources. Professor A. Manikas. Imperial College London. EE303 - Communication Systems An Overview of Fundamentals Information Sources Professor A. Manikas Imperial College London EE303 - Communication Systems An Overview of Fundamentals Prof. A. Manikas (Imperial College) EE303: Information Sources 24 Oct. 2011 1

More information

CS 598 Statistical Reinforcement Learning. Nan Jiang

CS 598 Statistical Reinforcement Learning. Nan Jiang CS 598 Statistical Reinforcement Learning Nan Jiang Overview What s this course about? A grad-level seminar course on theory of RL 3 What s this course about? A grad-level seminar course on theory of RL

More information

Transmission Schemes for Lifetime Maximization in Wireless Sensor Networks: Uncorrelated Source Observations

Transmission Schemes for Lifetime Maximization in Wireless Sensor Networks: Uncorrelated Source Observations Transmission Schemes for Lifetime Maximization in Wireless Sensor Networks: Uncorrelated Source Observations Xiaolu Zhang, Meixia Tao and Chun Sum Ng Department of Electrical and Computer Engineering National

More information

Note that the new channel is noisier than the original two : and H(A I +A2-2A1A2) > H(A2) (why?). min(c,, C2 ) = min(1 - H(a t ), 1 - H(A 2 )).

Note that the new channel is noisier than the original two : and H(A I +A2-2A1A2) > H(A2) (why?). min(c,, C2 ) = min(1 - H(a t ), 1 - H(A 2 )). l I ~-16 / (a) (5 points) What is the capacity Cr of the channel X -> Y? What is C of the channel Y - Z? (b) (5 points) What is the capacity C 3 of the cascaded channel X -3 Z? (c) (5 points) A ow let.

More information

Restless Bandit Index Policies for Solving Constrained Sequential Estimation Problems

Restless Bandit Index Policies for Solving Constrained Sequential Estimation Problems Restless Bandit Index Policies for Solving Constrained Sequential Estimation Problems Sofía S. Villar Postdoctoral Fellow Basque Center for Applied Mathemathics (BCAM) Lancaster University November 5th,

More information

Markovian Decision Process (MDP): theory and applications to wireless networks

Markovian Decision Process (MDP): theory and applications to wireless networks Markovian Decision Process (MDP): theory and applications to wireless networks Philippe Ciblat Joint work with I. Fawaz, N. Ksairi, C. Le Martret, M. Sarkiss Outline Examples MDP Applications A few examples

More information

Temporal Difference Learning & Policy Iteration

Temporal Difference Learning & Policy Iteration Temporal Difference Learning & Policy Iteration Advanced Topics in Reinforcement Learning Seminar WS 15/16 ±0 ±0 +1 by Tobias Joppen 03.11.2015 Fachbereich Informatik Knowledge Engineering Group Prof.

More information

Decentralized decision making with spatially distributed data

Decentralized decision making with spatially distributed data Decentralized decision making with spatially distributed data XuanLong Nguyen Department of Statistics University of Michigan Acknowledgement: Michael Jordan, Martin Wainwright, Ram Rajagopal, Pravin Varaiya

More information

Course 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016

Course 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016 Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the

More information

Decentralized Sequential Change Detection Using Physical Layer Fusion

Decentralized Sequential Change Detection Using Physical Layer Fusion Decentralized Sequential Change Detection Using hysical Layer Fusion Leena Zacharias ECE Department Indian Institute of Science Bangalore, India Email: zleena@ece.iisc.ernet.in ajesh Sundaresan ECE Department

More information

Decentralized Sequential Hypothesis Testing. Change Detection

Decentralized Sequential Hypothesis Testing. Change Detection Decentralized Sequential Hypothesis Testing & Change Detection Giorgos Fellouris, Columbia University, NY, USA George V. Moustakides, University of Patras, Greece Outline Sequential hypothesis testing

More information

The Optimality of Beamforming: A Unified View

The Optimality of Beamforming: A Unified View The Optimality of Beamforming: A Unified View Sudhir Srinivasa and Syed Ali Jafar Electrical Engineering and Computer Science University of California Irvine, Irvine, CA 92697-2625 Email: sudhirs@uciedu,

More information

Power Controlled FCFS Splitting Algorithm for Wireless Networks

Power Controlled FCFS Splitting Algorithm for Wireless Networks Power Controlled FCFS Splitting Algorithm for Wireless Networks Ashutosh Deepak Gore Abhay Karandikar Department of Electrical Engineering Indian Institute of Technology - Bombay COMNET Workshop, July

More information

Artificial Intelligence & Sequential Decision Problems

Artificial Intelligence & Sequential Decision Problems Artificial Intelligence & Sequential Decision Problems (CIV6540 - Machine Learning for Civil Engineers) Professor: James-A. Goulet Département des génies civil, géologique et des mines Chapter 15 Goulet

More information

RL 14: POMDPs continued

RL 14: POMDPs continued RL 14: POMDPs continued Michael Herrmann University of Edinburgh, School of Informatics 06/03/2015 POMDPs: Points to remember Belief states are probability distributions over states Even if computationally

More information

Decision Theory: Markov Decision Processes

Decision Theory: Markov Decision Processes Decision Theory: Markov Decision Processes CPSC 322 Lecture 33 March 31, 2006 Textbook 12.5 Decision Theory: Markov Decision Processes CPSC 322 Lecture 33, Slide 1 Lecture Overview Recap Rewards and Policies

More information

Optimizing Memory-Bounded Controllers for Decentralized POMDPs

Optimizing Memory-Bounded Controllers for Decentralized POMDPs Optimizing Memory-Bounded Controllers for Decentralized POMDPs Christopher Amato, Daniel S. Bernstein and Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst, MA 01003

More information

STRUCTURE AND OPTIMALITY OF MYOPIC SENSING FOR OPPORTUNISTIC SPECTRUM ACCESS

STRUCTURE AND OPTIMALITY OF MYOPIC SENSING FOR OPPORTUNISTIC SPECTRUM ACCESS STRUCTURE AND OPTIMALITY OF MYOPIC SENSING FOR OPPORTUNISTIC SPECTRUM ACCESS Qing Zhao University of California Davis, CA 95616 qzhao@ece.ucdavis.edu Bhaskar Krishnamachari University of Southern California

More information

Event-triggered stabilization of linear systems under channel blackouts

Event-triggered stabilization of linear systems under channel blackouts Event-triggered stabilization of linear systems under channel blackouts Pavankumar Tallapragada, Massimo Franceschetti & Jorge Cortés Allerton Conference, 30 Sept. 2015 Acknowledgements: National Science

More information

Markov Decision Processes and Solving Finite Problems. February 8, 2017

Markov Decision Processes and Solving Finite Problems. February 8, 2017 Markov Decision Processes and Solving Finite Problems February 8, 2017 Overview of Upcoming Lectures Feb 8: Markov decision processes, value iteration, policy iteration Feb 13: Policy gradients Feb 15:

More information

Logic, Knowledge Representation and Bayesian Decision Theory

Logic, Knowledge Representation and Bayesian Decision Theory Logic, Knowledge Representation and Bayesian Decision Theory David Poole University of British Columbia Overview Knowledge representation, logic, decision theory. Belief networks Independent Choice Logic

More information