Markovian Decision Process (MDP): theory and applications to wireless networks

1  Markovian Decision Process (MDP): theory and applications to wireless networks
Philippe Ciblat, joint work with I. Fawaz, N. Ksairi, C. Le Martret, and M. Sarkiss

2  Outline
Part 1: A few examples
Part 2: Markov Decision Process
- Markov chain
- Discounted approach
- Bellman's equation and fixed-point theorem
- Practical algorithms
Part 3: Applications to wireless networks
- Channel exploration
- Hybrid ARQ optimization
- Scheduling with energy harvesting

3  Part 1: A few examples

4  Example 1: Inventory control
During slot $k$, the stock evolves from $x_k$ to $x_{k+1}$, with packets requested and input $a_k$, and packets output $w_k$.
- $x_k$: stock at the beginning of slot $k$
- $a_k$: stock ordered and immediately delivered at the beginning of slot $k$
- $w_k$: stock requested during slot $k$ (i.i.d. process)
Goal: how to optimize $\{a_k\}$? If $a_k$ is too large, the ordering cost grows; if $a_k$ is too small, a stock outage occurs (with $x_{k+1} = 0$).

5  Example 1: Inventory control (cont'd)
Mathematical model 1:
- Reward: $r(x_k, a_k)$
- $\alpha$-discounted (long-term) reward, additive over time: $r = \lim_{N\to\infty} \sum_{n=0}^{N} \alpha^n r(x_n, a_n)$
- Sequential dynamic: $x_{k+1} = f(x_k, a_k, w_k) = (x_k + a_k - w_k)^+$
- $\{x_k\}$ is a Markov chain with transition probability $Q(x_{k+1} \mid x_k, a_k)$
- Policy: $a_{k+1} = \mu^{(k)}(x_k)$
Which $\{\mu^{(k)}\}$? Solve
$$\min_{\mu} \; \mathbb{E}_{x_0,\mu}\Big[\lim_{N\to\infty} \sum_{n=0}^{N} \alpha^n r(x_n, \mu^{(n)}(x_n))\Big]$$
→ Markov Decision Process (MDP)
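To make the discounted objective above concrete, here is a minimal Monte-Carlo sketch of this inventory MDP. The Poisson demand, the per-slot cost (ordering cost plus outage penalty), and the base-stock policy are hypothetical choices for illustration, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_discounted_cost(policy, alpha=0.9, N=500, holding=1.0, outage=10.0):
    """Monte-Carlo estimate of the alpha-discounted cost of `policy`
    for the inventory dynamics x_{k+1} = (x_k + a_k - w_k)^+."""
    x, total = 0, 0.0
    for n in range(N):
        a = policy(x)
        w = rng.poisson(2)                               # assumed i.i.d. demand w_k
        cost = holding * a + outage * (x + a - w <= 0)   # assumed cost r(x_k, a_k)
        total += alpha**n * cost
        x = max(x + a - w, 0)                            # (x_k + a_k - w_k)^+
    return total

# Example: a (hypothetical) base-stock policy "order up to level S"
base_stock = lambda x, S=4: max(S - x, 0)
print(simulate_discounted_cost(base_stock))
```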

6  Example 1: Inventory control (cont'd)
Mathematical model 2:
- Reward/Cost: $c(x_k)$
- Outage indicator: $\bar{x}_k = 0$ if $x_k + a_k - w_k > 0$, and $\bar{x}_k = 1$ otherwise, with $o(\bar{x}_k) = \bar{x}_k$
- Outage constraint: $\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N} o(\bar{x}_n) < O_t$
Let $s_k = [x_k, \bar{x}_k]$ be the system state. Optimal policy $\{\mu^{(k)}\}$:
$$\min_{\mu} \; \mathbb{E}_{s_0,\mu}\Big[\lim_{N\to\infty} \sum_{n=0}^{N} \alpha^n r(s_n, \mu^{(n)}(s_n))\Big] \quad \text{s.t.} \quad \mathbb{E}_{s_0,\mu}\Big[\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N} o(s_n)\Big] < O_t$$
→ Constrained Markov Decision Process (CMDP)

7  Example 2: Water reservoir control
Reservoir with finite capacity $C$: maximize simultaneously the stock and the water release.
- $x_k$: water volume at time $k$
- $a_k$: water volume released at time $k$ (for electricity or irrigation)
- $w_k$: random inflow during slot $k$ (rain and tributary rivers), i.i.d. process
Dynamics: $x_{k+1} = \min(x_k - a_k + w_k, C)$
Cost: $r(x_k, a_k) = x_k + a_k$
$x_k$ is not perfectly known, only estimated: we observe $y_k = g(x_k, \zeta_k)$
→ Partially Observable Markov Decision Process (POMDP)

8  Part 2: Markov Decision Process

9  Markov chain
Let $\{x_k\}$ be a random sequence/process.
Definition: a sequence is said to be a Markov chain iff $P(x_{k+1} \mid x_k, \ldots, x_0) = P(x_{k+1} \mid x_k)$ for all $k$.
If $x_k$ takes a finite number of values $X = \{x^{(1)}, \ldots, x^{(V)}\}$: transition probability matrix $T = (t_{m,n})_{1 \le m,n \le V}$ with $t_{m,n} = \mathrm{Prob}(x_{k+1} = x^{(m)} \mid x_k = x^{(n)}) \ge 0$ and $\sum_{m=1}^{V} t_{m,n} = 1$.
Related tools: graph theory, (stochastic) non-negative matrix theory.
[Figure: transition graph over the states x^(1), x^(2), x^(3), x^(4)]
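As a small illustration of the convention above ($t_{m,n}$ is the probability of going from $x^{(n)}$ to $x^{(m)}$, so columns sum to 1), a sketch with a hypothetical 2-state chain:

```python
import numpy as np

# Hypothetical 2-state chain: column n is the "from" state, row m the "to" state,
# matching the slide's convention t_{m,n} = Prob(x_{k+1} = x^(m) | x_k = x^(n)).
T = np.array([[0.9, 0.4],
              [0.1, 0.6]])
assert np.allclose(T.sum(axis=0), 1.0)   # each column sums to 1

# Distribution after k steps: p_k = T^k p_0
p0 = np.array([1.0, 0.0])
pk = np.linalg.matrix_power(T, 50) @ p0
print(pk)   # approaches the stationary distribution
```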

10  State, Action, and Policy
- State: $x_k$ (system information at time $k$)
- Action: $a_k$
- Disturbance: $w_k$, i.i.d. process
- Transition kernel: $x_{k+1} = f(x_k, a_k, w_k)$ with
  $Q(x_{k+1} \mid x_k, a_k) = \mathrm{Prob}(x_{k+1} = x^{(m)} \mid x_k = x^{(n)}, a_k = a) = \int t_{m,n}(a, w)\, p_w(w)\,dw$
- Reward: $r(x_k, a_k)$, accumulated either as $\mathbb{E}_{x_0}\big[\sum_n \alpha^n r(x_n, a_n)\big]$ (discounted) or as $\mathbb{E}_{x_0}\big[\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N} r(x_n, a_n)\big]$ (average)
- Policy: $\mu_k(x_k) \in A$, with $A$ the set of actions;
  deterministic ($\mu_k$ is a function) or random ($\mu_k$ is a pdf);
  stationary ($\mu_k$ independent of $k$) or nonstationary

11  Bellman's equation
Let $\mu$ be a deterministic stationary policy, and let $R_\mu(x_0)$ be the discounted reward under policy $\mu$ initialized at $x_0$:
$$R_\mu(x_0) = \mathbb{E}_{x_0}\Big[\sum_{n=0}^{\infty} \alpha^n r(x_n, \mu(x_n))\Big]$$
Bellman's equation:
$$R_\mu(x_0) = r(x_0, \mu(x_0)) + \alpha\, \mathbb{E}_{x_0}\Big[\sum_{n=0}^{\infty} \alpha^n r(x_{n+1}, \mu(x_{n+1}))\Big] = r(x_0, \mu(x_0)) + \alpha\, \mathbb{E}_{x_0}[R_\mu(x_1)] = r(x_0, \mu(x_0)) + \alpha \int R_\mu(x)\, Q(x \mid x_0, \mu(x_0))\,dx, \quad \forall x_0$$
Fixed-point form: $R_\mu = T_\mu(R_\mu)$ with $T_\mu(f)(x_0) = r(x_0, \mu(x_0)) + \alpha \int f(x)\, Q(x \mid x_0, \mu(x_0))\,dx$
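For a finite state space, the fixed point $R_\mu = T_\mu(R_\mu)$ is a linear system in the vector $(R_\mu(x^{(m)}))_m$ and can be solved exactly. A minimal sketch, assuming the (hypothetical, introduced here for illustration) array conventions r[m, a] for the reward and Q[a][n, m] = Prob(x^(n) | x^(m), a) for the kernel:

```python
import numpy as np

def policy_evaluation(mu, r, Q, alpha=0.9):
    """Solve R_mu = T_mu(R_mu) exactly for a finite MDP:
    R_mu(m) = r(m, mu(m)) + alpha * sum_n R_mu(n) Q(n | m, mu(m)),
    i.e. the linear system (I - alpha * Q_mu^T) R_mu = r_mu."""
    V = r.shape[0]
    r_mu = r[np.arange(V), mu]                                    # r(x^(m), mu(x^(m)))
    Q_mu = np.stack([Q[mu[m]][:, m] for m in range(V)], axis=1)   # Q(n | m, mu(m))
    return np.linalg.solve(np.eye(V) - alpha * Q_mu.T, r_mu)
```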

12  Fixed-point theorem
We also define $T(f)(x_0) = \max_{a \in A}\big\{ r(x_0, a) + \alpha \int f(x)\, Q(x \mid x_0, a)\,dx \big\}$.
Lemma: there exists a unique function $R$ (resp. $R_\mu$) such that $R = T(R)$ and $R_\mu = T_\mu(R_\mu)$.
Sketch of proof: let $\|f\|_\infty = \sup_x |f(x)|$ (within the set of bounded functions), and assume $r(\cdot,\cdot)$ is a bounded function for any state and action. Then
$$\|T(f) - T(g)\|_\infty \le \alpha \max_{a \in A} \Big\| \int f(x)\, Q(x \mid \cdot, a)\,dx - \int g(x)\, Q(x \mid \cdot, a)\,dx \Big\|_\infty \le \alpha \|f - g\|_\infty$$
so $T$ is an $\alpha$-contraction and Banach's fixed-point theorem applies.

13  Optimal deterministic policy
Theorem: a (deterministic stationary) policy $\mu^*$ is said optimal iff $R_{\mu^*}(x_0) = R^*(x_0)$ for all $x_0$, with $R^*(x_0) = \sup_\mu R_\mu(x_0)$. Equivalently, $R_{\mu^*}$ is the fixed point of $T$.
Sketch of proof: let $\mu^*$ be a policy such that $R_{\mu^*}$ is the fixed point of $T$. Then
$$R_{\mu^*}(x_0) \ge r(x_0, a) + \alpha \int R_{\mu^*}(x)\, Q(x \mid x_0, a)\,dx, \quad \forall (x_0, a) \in X \times A$$
In particular, along any policy $\mu$, $R_{\mu^*}(x_n) \ge r(x_n, a_n) + \alpha \int R_{\mu^*}(x)\, Q(x \mid x_n, a_n)\,dx$. Multiplying by $\alpha^n$ and taking expectations,
$$\mathbb{E}_{x_0,\mu}[\alpha^n R_{\mu^*}(x_n)] - \alpha^{n+1}\, \mathbb{E}_{x_0,\mu}\Big[\int R_{\mu^*}(x)\, Q(x \mid x_n, a_n)\,dx\Big] \ge \mathbb{E}_{x_0,\mu}[\alpha^n r(x_n, a_n)]$$
which, by the Markov property, reads
$$\mathbb{E}_{x_0,\mu}\Big[\alpha^n \int R_{\mu^*}(x)\, Q(x \mid h_{n-1})\,dx\Big] - \mathbb{E}_{x_0,\mu}\Big[\alpha^{n+1} \int R_{\mu^*}(x)\, Q(x \mid h_n)\,dx\Big] \ge \mathbb{E}_{x_0,\mu}[\alpha^n r(x_n, a_n)]$$
Summing over $n$ (the left-hand side telescopes):
$$R_{\mu^*}(x_0) - \alpha^{N+1}\, \mathbb{E}_{x_0,\mu}\Big[\int R_{\mu^*}(x)\, Q(x \mid h_N)\,dx\Big] \ge \mathbb{E}_{x_0,\mu}\Big[\sum_{n=0}^{N} \alpha^n r(x_n, a_n)\Big]$$
and letting $N \to \infty$: $R_{\mu^*}(x_0) \ge \mathbb{E}_{x_0,\mu}\big[\sum_{n=0}^{\infty} \alpha^n r(x_n, a_n)\big] = R_\mu(x_0)$ for all $\mu$. (Q.E.D.)

14  Algorithm 1: Value Iteration (VI)
According to the fixed-point theorem, we have $R_{\mu^*} = \lim_{N\to\infty} (\underbrace{T \circ \cdots \circ T}_{N \text{ times}})(R_0)$ for any initial function $R_0$.
Define the sequence $\{R_n\}$ by $R_{n+1} = T(R_n)$, i.e.
$$R_{n+1}(x) = \max_a \Big\{ r(x, a) + \alpha \int R_n(y)\, Q(y \mid x, a)\,dy \Big\}, \qquad \mu_n(x) = \arg\max_a \Big\{ r(x, a) + \alpha \int R_n(y)\, Q(y \mid x, a)\,dy \Big\}$$
Theorem: $\lim_{N\to\infty} \mu_N(x) = \mu^*(x)$ for all $x$.
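For a finite MDP the two updates above become matrix operations. A minimal value-iteration sketch, under the same assumed array conventions as before (r[m, a] and Q[a][n, m] = Prob(x^(n) | x^(m), a)); the stopping tolerance is an illustrative choice:

```python
import numpy as np

def value_iteration(r, Q, alpha=0.9, tol=1e-8):
    """Value iteration for a finite MDP.
    r[m, a]    : reward r(x^(m), a)
    Q[a][n, m] : Prob(x^(n) | x^(m), a)  (columns sum to 1, as on slide 9)
    Returns the fixed point R and a greedy deterministic policy mu."""
    V, A = r.shape
    R = np.zeros(V)
    while True:
        # Bellman operator: T(R)(m) = max_a { r(m,a) + alpha * sum_n R(n) Q(n|m,a) }
        values = np.array([r[:, a] + alpha * Q[a].T @ R for a in range(A)]).T  # (V, A)
        R_new = values.max(axis=1)
        if np.max(np.abs(R_new - R)) < tol:
            return R_new, values.argmax(axis=1)
        R = R_new
```

The returned policy is the greedy policy $\mu_N$ of the last iterate, which converges to $\mu^*$ by the theorem above.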

15  Algorithm 2: Linear Programming (LP)
Lemma: consider a finite set of states $X = \{x^{(0)}, \ldots, x^{(V)}\}$ and a finite set of actions $A = \{a^{(0)}, a^{(1)}, \ldots\}$, and let $T$ be Bellman's operator on vectors, $W = T(W)$ with
$$W(m) = \max_a \Big\{ r(x^{(m)}, a) + \alpha \sum_{n=0}^{V} W(n)\, Q(x^{(n)} \mid x^{(m)}, a) \Big\}$$
Let $\succeq$ denote elementwise "greater than". Then:
- if $U \succeq V$, then $T(U) \succeq T(V)$;
- if $W^*$ is the fixed point of $T$ and $W \succeq T(W)$, then $W \succeq W^*$.
Since every $W$ with $W \succeq T(W)$ dominates $W^*$, we get
$$W^* = \arg\min_W \sum_{m=0}^{V} W(m) \quad \text{s.t.} \quad W \succeq T(W)$$

16  Algorithm 2: Linear Programming (LP) (cont'd)
Identify the function $R_\mu$ with the vector $R = [R_\mu(x^{(0)}), \ldots, R_\mu(x^{(V)})]$. The condition $R \succeq T(R)$, i.e.
$$R(m) \ge \max_a \Big\{ r(x^{(m)}, a) + \alpha \sum_{n=0}^{V} R(n)\, Q(x^{(n)} \mid x^{(m)}, a) \Big\},$$
is equivalent to one linear constraint per (state, action) pair, so the fixed point of $T$ is obtained by the linear programming algorithm
$$R = \arg\min_R \sum_{m=0}^{V} R(m) \quad \text{s.t.} \quad R(m) \ge r(x^{(m)}, a^{(t)}) + \alpha \sum_{n=0}^{V} R(n)\, Q(x^{(n)} \mid x^{(m)}, a^{(t)}), \quad \forall m, t$$
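The same finite MDP can be handed to an off-the-shelf LP solver; a sketch using scipy.optimize.linprog, with the constraints rewritten in its `A_ub @ R <= b_ub` form (same assumed array conventions as the earlier sketches):

```python
import numpy as np
from scipy.optimize import linprog

def lp_solve(r, Q, alpha=0.9):
    """Solve the MDP of slide 16 as a linear program.
    r[m, a], Q[a][n, m] = Prob(x^(n) | x^(m), a)."""
    V, A = r.shape
    I = np.eye(V)
    # One block of constraints per action: R(m) >= r(m,a) + alpha * sum_n Q(n|m,a) R(n),
    # i.e. -(I - alpha * Q_a^T) @ R <= -r_a in linprog's "A_ub @ R <= b_ub" form.
    A_ub = np.vstack([-(I - alpha * Q[a].T) for a in range(A)])
    b_ub = np.concatenate([-r[:, a] for a in range(A)])
    res = linprog(c=np.ones(V), A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
    return res.x   # the fixed point R of the Bellman operator
```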

17  Extension: CMDP
- A deterministic stationary policy is not optimal anymore
- There exists an optimal random stationary policy
- Computation of the optimal policy is still done through linear programming

18  Part 3: Applications to wireless networks

19  Multiband exploration [Lun15]
A secondary user may use $N_c$ non-contiguous channels (in 4G with carrier aggregation) but can explore only $N_e$ channels simultaneously. Issue: which channels should be selected at any time?
→ Partially Observable Markov Decision Process (POMDP)
- States (at time $n$): empty/occupied channels (not all known)
- Action: the $N_e$ channels to explore
- Transition kernel: $Q(s_{n+1} \mid s_n, a_n)$ (Markovian process)
- Reward: $r(s_n, a_n)$ = number of empty tested channels
- Policy: $a_n = \pi(s_n)$, chosen to maximize $\mathbb{E}\big[\sum_n \alpha^n r(s_n, a_n)\big]$
Extension: competition between secondary users → stochastic game

20  Hybrid ARQ optimization
Type-I HARQ: let $S$ be a packet composed of $N$ coded symbols. The TX sends $S_1$; on NACK it retransmits $S_1$, on ACK it moves on to $S_2$.
Type-II HARQ: memory at the RX and non-identical packets at the TX. The TX sends $S_1(1)$, then $S_1(2)$ on NACK; the RX keeps $Y_1 = S_1(1) + N_1$ and $Y_2 = S_1(2) + N_2$ and performs detection on $Y = [Y_1, Y_2]$ (coding gain).

21  Case 1: Power optimization [Taj13]
We would like to adapt the power packet per packet. After the $n$-th transmission, if a NACK is received:
- New action $A_{n+1}$ to do: here, choosing $P_{n+1}$
- Available information: the accumulated mutual information $I_n = \sum_{l=1}^{n} \log_2(1 + G_l P_l)$
- HARQ state: $K_n \in \{$ACK and new round; 1 attempt and NACK received; $\ldots$; $L$ attempts and NACK received$\}$
- $P(K_{n+1}, I_{n+1} \mid K_n, I_n, \ldots) = P(K_{n+1}, I_{n+1} \mid K_n, I_n)$
→ Markov chain with state $S_n = (K_n, I_n)$; a policy $\mu$ specifies how to select $P_{n+1}$ given $S_n$.
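A one-line illustration of the accumulated mutual information $I_n$; the channel gains and powers below are hypothetical:

```python
import numpy as np

def accumulated_mutual_info(G, P):
    """I_n = sum_{l=1}^{n} log2(1 + G_l P_l) after each HARQ attempt."""
    return np.cumsum(np.log2(1.0 + np.asarray(G) * np.asarray(P)))

# Hypothetical channel gains and per-attempt powers for L = 3 attempts
print(accumulated_mutual_info(G=[0.8, 0.3, 1.1], P=[1.0, 2.0, 1.5]))
```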

22  Optimization problem and numerical results
$$\max_\mu \; \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \underbrace{\mathrm{reward}_\mu(S_n, A_n)}_{\text{number of information bits after ACK}} \quad \text{s.t.} \quad \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \mathrm{power}(S_n, A_n) \le P_{\max}$$
- An optimal random policy exists
- The optimal pdf is obtained through linear programming
[Figure: throughput (in bpcu) versus SNR (dB), comparing constant power with the optimal random policy based power; source: Tajan's PhD]

23  Case 2: Best modulation and coding scheme [Ksa15]
- Action: best modulation and coding for the first packet transmission
- Action: best modulation for each retransmission:
  - BPSK: few redundancy bits sent, but well protected
  - QAM: many redundancy bits sent, but not well protected
- State: current channel impulse response, number of HARQ attempts, effective SNR
[Figure: average throughput (Mbps) versus subframe index, LTE set-up with a correlated channel (25 km/h), comparing the proposed algorithm, an HARQ-aware scheme (not accounting for the future), an HARQ-unaware scheme (not accounting for the future), and a trivial scheme with random MCS selection]

24  Scheduling with energy harvesting [Faw17]
A device with battery of size $B$, buffer of size $P$, and latency constraint $K_0$ receives packet arrivals $a_n$ (rate $\lambda_a$) and energy arrivals $e_n$ (rate $\lambda_e$); overflowing packets go to the trash, and $u_n$ packets are sent at a cost in energy.
- Action: send $u_n$ packets (of length $L$) using the energy $C_n = \frac{\sigma^2}{g_n}\big(2^{u_n L} - 1\big)$
- State $s_n$:
  - $k_n$: age of the packet within the buffer
  - $b_n$: battery level, with $b_{n+1} = \min(b_n - C_n + e_{n+1}, B)$
  - $g_n$: channel gain
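A small sketch of the energy cost and battery recursion above; the packet length L, noise power sigma2, battery capacity B, and the numbers in the usage example are all hypothetical:

```python
def energy_cost(u, g, L=4, sigma2=1.0):
    """Energy needed to send u packets of length L over a channel with gain g:
    C_n = sigma^2 / g_n * (2^{u_n L} - 1)."""
    return sigma2 / g * (2.0 ** (u * L) - 1.0)

def battery_update(b, C, e_next, B=100.0):
    """b_{n+1} = min(b_n - C_n + e_{n+1}, B)."""
    return min(b - C + e_next, B)

# Hypothetical numbers: send u=1 packet over gain g=0.5, then harvest e=3 units
C = energy_cost(u=1, g=0.5)
print(C, battery_update(b=50.0, C=C, e_next=3.0))
```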

25  Scheduling with energy harvesting (cont'd)
Reward: packet loss (buffer overflow, delay non-fulfillment):
$$\min \; \lim_{N\to\infty} \frac{1}{N}\, \mathbb{E}\Big[\sum_{n=0}^{N} \big(\varepsilon_o(s_n, u_n) + \varepsilon_d(s_n, u_n)\big)\Big]$$
with $\varepsilon_o$ the number of dropped packets due to buffer overflow and $\varepsilon_d$ the number of dropped packets due to delay non-fulfillment.
[Figure: dropped-packet probability versus $\lambda_a$, comparing the naive and optimal policies for $\lambda_e = 1$ and $\lambda_e = 3$]

26  Issues not treated
- Curse of dimensionality
- Competition between users (game theory [Lun15])
- Unknown $Q$: reinforcement learning, Q-learning
- Close connection to dynamic programming (Viterbi algorithm for channel decoding or ISI management)

27  References
[Her89] O. Hernández-Lerma, Adaptive Markov Control Processes, 1989.
[Ber95] D. Bertsekas, Dynamic Programming and Optimal Control, 1995.
[Taj13] R. Tajan, HARQ retransmission scheme in cognitive radio, PhD thesis, Univ. Cergy-Pontoise, France, 2013.
[Ksa15] N. Ksairi and P. Ciblat, "Modulation and Coding Schemes Selection for Type-II HARQ in Time-Correlated Fading Channels," IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2015.
[Lun15] J. Lunden, V. Koivunen, and H.V. Poor, "Spectrum Exploration and Exploitation for Cognitive Radio: Recent Advances," IEEE Signal Processing Magazine, March 2015.
[Faw17] I. Fawaz, P. Ciblat, and M. Sarkiss, "Energy minimization based Resource Scheduling for Strict Delay Constrained Wireless Communications," IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2017.
