Markovian Decision Process (MDP): theory and applications to wireless networks
1 Markovian Decision Process (MDP): theory and applications to wireless networks
Philippe Ciblat
Joint work with I. Fawaz, N. Ksairi, C. Le Martret, M. Sarkiss
2 Outline
- A few examples
- Markov Decision Process: Markov chain, discounted approach, Bellman's equation and fixed-point theorem, practical algorithms
- Applications to wireless networks: channel exploration, Hybrid ARQ optimization, scheduling with energy harvesting
Philippe Ciblat, Markovian Decision Process (MDP): theory and applications to wireless networks, 2 / 24
3 Part 1: A few examples
4 Example 1: Inventory control
During slot k, the stock goes from $x_k$ to $x_{k+1}$, with ordered input $a_k$ and requested output $w_k$:
- $x_k$: stock at the beginning of slot k
- $a_k$: stock ordered and immediately delivered at the beginning of slot k
- $w_k$: stock requested during slot k (i.i.d. process)
Goal: optimize $\{a_k\}$. If $a_k$ is too large, the ordering cost is high; if $a_k$ is too small, a stock outage occurs ($x_{k+1} = 0$).
5 Example 1: Inventory control (cont'd)
Mathematical model 1:
- Reward: $r(x_k, a_k)$; the cost is additive over time, with $\alpha$-discounted (long-term) reward
  $r = \lim_{N\to\infty} \sum_{n=0}^{N} \alpha^n r(x_n, a_n)$
- Sequential dynamics: $x_{k+1} = f(x_k, a_k, w_k) = (x_k + a_k - w_k)^+$, so $\{x_k\}$ is a Markov chain with transition probability $Q(x_{k+1} \mid x_k, a_k)$
- Policy: $a_k = \mu^{(k)}(x_k)$
Which $\{\mu^{(k)}\}$? Solve
  $\min_{\mu} \; \mathbb{E}_{x_0,\mu}\Big[\lim_{N\to\infty} \sum_{n=0}^{N} \alpha^n r(x_n, \mu^{(n)}(x_n))\Big]$
=> Markov Decision Process (MDP)
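This model is easy to simulate. Below is a minimal Python sketch of the inventory MDP under a simple base-stock policy $a_k = \max(S - x_k, 0)$; the demand law, the cost figures, and all names are illustrative assumptions, not values from the slides.

```python
import random

def simulate_inventory(threshold, horizon=2000, alpha=0.95, order_cost=2.0,
                       outage_cost=10.0, seed=0):
    """Simulate the inventory MDP x_{k+1} = (x_k + a_k - w_k)^+ under a
    base-stock policy a_k = max(threshold - x_k, 0); return the discounted cost."""
    rng = random.Random(seed)
    x = 0
    total = 0.0
    for k in range(horizon):
        a = max(threshold - x, 0)            # order up to the threshold
        w = rng.randint(0, 3)                # i.i.d. demand w_k
        cost = order_cost * a + (outage_cost if x + a - w <= 0 else 0.0)
        total += (alpha ** k) * cost         # alpha-discounted additive cost
        x = max(x + a - w, 0)                # sequential dynamics
    return total
```

With threshold 0 the system never orders and pays the outage cost every slot, so the discounted cost is exactly $10 \sum_k \alpha^k$; a positive threshold trades ordering cost against outage cost.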
6 Example 1: Inventory control (cont'd)
Mathematical model 2:
- Reward/cost: $c(x_k)$
- Outage indicator: $\bar{x}_k = 0$ if $x_k + a_k - w_k > 0$, and $1$ otherwise, with outage cost $o(\bar{x}_k) = \bar{x}_k$ and constraint $\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N} o(\bar{x}_n) < O_t$
- System state: $s_k = [x_k, \bar{x}_k]$
Optimal policy $\{\mu^{(k)}\}$:
  $\min_{\mu} \; \mathbb{E}_{s_0,\mu}\Big[\lim_{N\to\infty} \sum_{n=0}^{N} \alpha^n r(s_n, \mu^{(n)}(s_n))\Big]$ s.t. $\mathbb{E}_{s_0,\mu}\Big[\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N} o(s_n)\Big] < O_t$
=> Constrained Markov Decision Process (CMDP)
7 Example 2: Water reservoir control
Reservoir with finite capacity C: maximize simultaneously the stock and the water release.
- $x_k$: water volume at time k
- $a_k$: water volume released at time k (for electricity or irrigation)
- $w_k$: random inflow during slot k (rain and tributary rivers), i.i.d. process
Dynamics: $x_{k+1} = \min(x_k - a_k + w_k, C)$; reward: $r(x_k, a_k) = x_k + a_k$
$x_k$ is not perfectly known but estimated: we only observe $y_k = g(x_k, \zeta_k)$
=> Partially Observable Markov Decision Process (POMDP)
8 Part 2: Markov Decision Process
9 Markov chain
Let $\{x_k\}$ be a random sequence/process.
Definition: the sequence is a Markov chain iff $P(x_{k+1} \mid x_k, \ldots, x_0) = P(x_{k+1} \mid x_k)$ for all k.
If $x_k$ takes a finite number of values $X = \{x^{(1)}, \ldots, x^{(V)}\}$: transition probability matrix $T = (t_{m,n})_{1 \le m,n \le V}$ with $t_{m,n} = \mathrm{Prob}(x_{k+1} = x^{(m)} \mid x_k = x^{(n)}) \ge 0$ and $\sum_{m=1}^{V} t_{m,n} = 1$.
Tools: graph theory, (stochastic) non-negative matrix theory.
[Figure: transition graph over the four states $x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}$]
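A quick numerical sketch of this convention in Python (the 2-state matrix below is a hypothetical example, not from the slides): with $t_{m,n} = \mathrm{Prob}(x_{k+1} = x^{(m)} \mid x_k = x^{(n)})$ every column of T sums to one, and iterating $\pi \mapsto T\pi$ yields the stationary distribution.

```python
# Hypothetical 2-state chain in the slide's column-stochastic convention:
# T[m][n] = Prob(x_{k+1} = x^(m) | x_k = x^(n)), so every column sums to 1.
T = [[0.9, 0.2],
     [0.1, 0.8]]

def column_sums(T):
    """Sanity check of the stochasticity constraint sum_m t_{m,n} = 1."""
    return [sum(T[m][n] for m in range(len(T))) for n in range(len(T[0]))]

def stationary(T, iters=500):
    """Power iteration pi <- T pi; converges to the stationary
    distribution for an ergodic chain."""
    V = len(T)
    pi = [1.0 / V] * V
    for _ in range(iters):
        pi = [sum(T[m][n] * pi[n] for n in range(V)) for m in range(V)]
    return pi
```

For this toy matrix the balance equation $0.1\,\pi_1 = 0.2\,\pi_2$ gives $\pi = [2/3, 1/3]$.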
10 State, Action, and Policy
- State: $x_k$ (system information at time k)
- Action: $a_k$
- Disturbance: $w_k$, i.i.d. process
- Transition kernel: $x_{k+1} = f(x_k, a_k, w_k)$ with
  $Q(x_{k+1} \mid x_k, a_k) = \mathrm{Prob}(x_{k+1} = x^{(m)} \mid x_k = x^{(n)}, a_k = a) = \int t_{m,n}(a, w)\, p_w(w)\, dw$
- Reward: $r(x_k, a_k)$, with criterion $\mathbb{E}_{x_0}\big[\sum_n \alpha^n r(x_n, a_n)\big]$ (discounted) or $\mathbb{E}_{x_0}\big[\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N} r(x_n, a_n)\big]$ (average)
- Policy: $\mu_k(x_k) \in A$, with A the set of actions; deterministic ($\mu_k$ is a function) or random ($\mu_k$ is a pdf); stationary ($\mu_k$ independent of k) or nonstationary
11 Bellman's equation
Let $\mu$ be a deterministic stationary policy, and let $R_\mu(x_0)$ be the discounted reward under policy $\mu$ initialized at $x_0$:
  $R_\mu(x_0) = \mathbb{E}_{x_0}\big[\sum_{n=0}^{\infty} \alpha^n r(x_n, \mu(x_n))\big]$
Bellman's equation:
  $R_\mu(x_0) = r(x_0, \mu(x_0)) + \alpha\, \mathbb{E}_{x_0}\big[\sum_{n=0}^{\infty} \alpha^n r(x_{n+1}, \mu(x_{n+1}))\big]$
  $= r(x_0, \mu(x_0)) + \alpha\, \mathbb{E}_{x_0}[R_\mu(x_1)]$
  $= r(x_0, \mu(x_0)) + \alpha \int R_\mu(x)\, Q(x \mid x_0, \mu(x_0))\, dx, \quad \forall x_0$
Hence $R_\mu = T_\mu(R_\mu)$ (fixed point), where for all $x_0$
  $T_\mu(f)(x_0) = r(x_0, \mu(x_0)) + \alpha \int f(x)\, Q(x \mid x_0, \mu(x_0))\, dx$
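On a finite state space the integral becomes a sum, and the fixed point $R_\mu = T_\mu(R_\mu)$ can be found by simply iterating $T_\mu$ until convergence. A minimal Python sketch (the row-oriented kernel convention, the tolerance, and the toy numbers in the usage note are assumptions, not from the slides):

```python
def policy_evaluation(r_mu, Q_mu, alpha=0.9, tol=1e-12, max_iter=10000):
    """Iterate R <- T_mu(R) = r_mu + alpha * Q_mu R until convergence.
    Since T_mu is an alpha-contraction, the iteration converges to the
    unique fixed point R_mu. Convention here: Q_mu[x][y] is the probability
    of moving to y from x under policy mu (rows sum to 1)."""
    V = len(r_mu)
    R = [0.0] * V
    for _ in range(max_iter):
        R_new = [r_mu[x] + alpha * sum(Q_mu[x][y] * R[y] for y in range(V))
                 for x in range(V)]
        if max(abs(R_new[x] - R[x]) for x in range(V)) < tol:
            return R_new
        R = R_new
    return R
```

For instance, with two absorbing states and rewards $r_\mu = [1, 0]$, the self-looping rewarding state gets value $1/(1-\alpha) = 10$, as predicted by the geometric series.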
12 Fixed-point theorem
We also define
  $T(f)(x_0) = \max_{a \in A} \big\{ r(x_0, a) + \alpha \int f(x)\, Q(x \mid x_0, a)\, dx \big\}$
Lemma: there exists a unique function $R$ (resp. $R_\mu$) such that $R = T(R)$ (resp. $R_\mu = T_\mu(R_\mu)$).
Sketch of proof: let $\|f\|_\infty = \sup_x |f(x)|$ on the set of bounded functions, and assume $r(\cdot,\cdot)$ is bounded for any state and action. Then
  $\|T(f) - T(g)\|_\infty \le \alpha \max_{a \in A} \Big| \int f(x)\, Q(x \mid x_0, a)\, dx - \int g(x)\, Q(x \mid x_0, a)\, dx \Big| \le \alpha\, \|f - g\|_\infty$
so $T$ is an $\alpha$-contraction and Banach's fixed-point theorem applies.
13 Optimal deterministic policy
Theorem: a (deterministic stationary) policy $\mu^*$ is said optimal iff $R_{\mu^*}(x_0) = R^*(x_0)$ for all $x_0$, with $R^*(x_0) = \sup_\mu R_\mu(x_0)$. Equivalently, $R_{\mu^*}$ is the fixed point of $T$.
Sketch of proof: let $\mu^*$ be a policy such that $R_{\mu^*}$ is the fixed point of $T$. Then
  $R_{\mu^*}(x_0) \ge r(x_0, a) + \alpha \int R_{\mu^*}(x)\, Q(x \mid x_0, a)\, dx, \quad \forall (x_0, a) \in X \times A$
so along any trajectory
  $R_{\mu^*}(x_n) \ge r(x_n, a_n) + \alpha \int R_{\mu^*}(x)\, Q(x \mid x_n, a_n)\, dx$
Multiplying by $\alpha^n$ and taking expectations,
  $\mathbb{E}_{x_0,\mu}[\alpha^n R_{\mu^*}(x_n)] - \alpha^{n+1}\, \mathbb{E}_{x_0,\mu}\big[\int R_{\mu^*}(x)\, Q(x \mid x_n, a_n)\, dx\big] \ge \mathbb{E}_{x_0,\mu}[\alpha^n r(x_n, a_n)]$
and, by the Markov property,
  $\mathbb{E}_{x_0,\mu}\big[\alpha^n \int R_{\mu^*}(x)\, Q(x \mid h_{n-1})\, dx\big] - \mathbb{E}_{x_0,\mu}\big[\alpha^{n+1} \int R_{\mu^*}(x)\, Q(x \mid h_n)\, dx\big] \ge \mathbb{E}_{x_0,\mu}[\alpha^n r(x_n, a_n)]$
Summing over $n = 0, \ldots, N$ (the left-hand side telescopes),
  $R_{\mu^*}(x_0) - \alpha^{N+1}\, \mathbb{E}_{x_0,\mu}\big[\int R_{\mu^*}(x)\, Q(x \mid h_N)\, dx\big] \ge \mathbb{E}_{x_0,\mu}\big[\sum_{n=0}^{N} \alpha^n r(x_n, a_n)\big]$
Letting $N \to \infty$,
  $R_{\mu^*}(x_0) \ge \mathbb{E}_{x_0,\mu}\big[\sum_n \alpha^n r(x_n, a_n)\big] = R_\mu(x_0), \quad \forall \mu$ (Q.E.D.)
14 Algorithm 1: Value Iteration (VI)
According to the fixed-point theorem, we have
  $R_{\mu^*} = \lim_{N\to\infty} \underbrace{T \circ \cdots \circ T}_{N \text{ times}}(R_0)$
for any function $R_0$. Let $\{R_n\}$ with $R_{n+1} = T(R_n)$:
  $R_{n+1}(x) = \max_a \big\{ r(x, a) + \alpha \int R_n(y)\, Q(y \mid x, a)\, dy \big\}$
  $\mu_n(x) = \arg\max_a \big\{ r(x, a) + \alpha \int R_n(y)\, Q(y \mid x, a)\, dy \big\}$
Theorem: $\lim_{N\to\infty} \mu_N(x) = \mu^*(x)$ for all $x$.
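For a finite MDP the integral becomes a finite sum and value iteration is a few lines of Python. The tabular encoding (r[x][a] for rewards, Q[x][a][y] for the kernel) and the toy two-state instance below are hypothetical choices for illustration:

```python
def value_iteration(r, Q, alpha=0.9, n_iter=1000):
    """Iterate R <- T(R) with T(R)(x) = max_a { r[x][a] + alpha * sum_y Q[x][a][y] R[y] },
    then return the value function and the greedy policy mu."""
    V, A = len(r), len(r[0])
    R = [0.0] * V
    for _ in range(n_iter):
        R = [max(r[x][a] + alpha * sum(Q[x][a][y] * R[y] for y in range(V))
                 for a in range(A)) for x in range(V)]
    mu = [max(range(A), key=lambda a, x=x: r[x][a] +
              alpha * sum(Q[x][a][y] * R[y] for y in range(V))) for x in range(V)]
    return R, mu

# Toy instance: state 1 pays reward 1 and is absorbing; in state 0,
# action 1 jumps to state 1 while action 0 stays put.
r_toy = [[0.0, 0.0], [1.0, 1.0]]
Q_toy = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 1.0], [0.0, 1.0]]]
R_star, mu_star = value_iteration(r_toy, Q_toy)
```

The greedy policy correctly moves to the rewarding state, whose value is $1/(1-\alpha) = 10$; the non-rewarding state is worth one discount factor less, $\alpha \cdot 10 = 9$.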
15 Algorithm 2: Linear Programming (LP)
Lemma: Let a finite set of states $X = \{x^{(0)}, \ldots, x^{(V)}\}$ and a finite set of actions $A = \{a^{(0)}, \ldots, a^{(T)}\}$, and let $T$ be Bellman's operator on vectors, $W = T(W)$ with
  $W(m) = \max_a \big\{ r(x^{(m)}, a) + \alpha \sum_{n=0}^{V} W(n)\, Q(x^{(n)} \mid x^{(m)}, a) \big\}$
Let $\succeq$ be the elementwise "greater than".
- If $U \succeq V$, then $T(U) \succeq T(V)$
- If $W^*$ is the fixed point of $T$ and $W \succeq T(W)$, then $W \succeq W^*$
As $W^* \succeq T(W^*)$, we get
  $W^* = \arg\min_W \sum_{m=0}^{V} W(m)$ s.t. $W \succeq T(W)$
16 Algorithm 2: Linear Programming (LP) (cont'd)
Identify the function $R_\mu$ with the vector $R = [R_\mu(x^{(0)}), \ldots, R_\mu(x^{(V)})]$.
Linear programming algorithm:
  $R^* = \arg\min_R \sum_{m=0}^{V} R(m)$
  s.t. $R(m) \ge r(x^{(m)}, a^{(t)}) + \alpha \sum_{n=0}^{V} R(n)\, Q(x^{(n)} \mid x^{(m)}, a^{(t)}), \quad \forall m, t$
i.e. $R^*$ is the fixed point of $T$.
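This LP can be handed to any solver. A sketch with SciPy's linprog (assuming SciPy is available; the two-state instance is a hypothetical toy), where each constraint $R(m) \ge r(x^{(m)}, a^{(t)}) + \alpha \sum_n R(n)\, Q(x^{(n)} \mid x^{(m)}, a^{(t)})$ is rewritten in the solver's form $A_{ub} R \le b_{ub}$:

```python
from scipy.optimize import linprog

alpha = 0.9
# r[m][a] and Q[m][a][n] for a hypothetical 2-state, 2-action MDP:
# state 1 pays reward 1 and is absorbing; action 1 in state 0 jumps to state 1.
r = [[0.0, 0.0], [1.0, 1.0]]
Q = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
V, A = 2, 2

# Constraint R(m) >= r(m,a) + alpha * sum_n Q(n|m,a) R(n), rewritten as
# (alpha * Q(.|m,a) - e_m) @ R <= -r(m,a) for linprog.
A_ub, b_ub = [], []
for m in range(V):
    for a in range(A):
        A_ub.append([alpha * Q[m][a][n] - (1.0 if n == m else 0.0)
                     for n in range(V)])
        b_ub.append(-r[m][a])

# Minimize sum_m R(m) over all feasible R (R is unbounded a priori).
res = linprog(c=[1.0] * V, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * V)
R_star = list(res.x)
```

The solution coincides with the value-iteration fixed point for this instance: $R^* = [9, 10]$.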
17 Extension: CMDP
- A deterministic stationary policy is not optimal anymore
- An optimal random stationary policy exists
- The optimal policy is still computed through linear programming
18 Part 3: Applications to wireless networks
19 Multiband Exploration [Lun15]
A secondary user may use $N_c$ non-contiguous channels (e.g., with carrier aggregation in 4G) but can explore only $N_e$ channels simultaneously. Issue: which channels should be selected at each time? => Partially Observable Markov Decision Process (POMDP)
- States (at time n, not all known): empty/occupied channels
- Action: the $N_e$ channels to explore
- Transition kernel: $Q(s_{n+1} \mid s_n, a_n)$, Markovian process
- Reward: $r(s_n, a_n)$ = number of explored channels found empty
- Policy: $a_n = \pi(s_n)$ maximizing $\mathbb{E}\big[\sum_n \alpha^n r(s_n, a_n)\big]$
Extension: competition between secondary users => stochastic game
20 Hybrid ARQ optimization
Type-I HARQ: let S be a packet composed of N coded symbols. On NACK, the transmitter resends the same packet and the receiver decodes each copy independently.
Type-II HARQ: the receiver keeps past observations in memory and the transmitter sends non-identical packets. With $Y_1 = S_1(1) + N_1$ and $Y_2 = S_1(2) + N_2$, detection is performed on $Y = [Y_1, Y_2]$ (coding gain).
21 Case 1: Power optimization [Taj13]
We would like to adapt the power packet per packet. After the n-th transmission, if a NACK is received:
- New action $A_{n+1}$: choosing the power $P_{n+1}$
- Available information: accumulated mutual information
  $I_n = \sum_{l=1}^{n} \log_2(1 + G_l P_l)$
- $K_n \in \{$ACK and new round, 1 attempt and NACK received, $\ldots$, L attempts and NACK received$\}$
Since $P(K_{n+1}, I_{n+1} \mid K_n, I_n, \ldots) = P(K_{n+1}, I_{n+1} \mid K_n, I_n)$, the state $S_n = (K_n, I_n)$ is a Markov chain.
A policy $\mu$ = how to select $P_{n+1}$ given $S_n$.
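The accumulated-mutual-information recursion is straightforward to compute. A small Python helper (the function names are mine, and Gaussian signaling is assumed so that round l contributes $\log_2(1 + G_l P_l)$):

```python
import math

def accumulated_mi(gains, powers):
    """I_n = sum_{l=1}^{n} log2(1 + G_l P_l): mutual information
    accumulated across HARQ rounds (Gaussian inputs assumed)."""
    return sum(math.log2(1 + g * p) for g, p in zip(gains, powers))

def rounds_to_decode(rate, gains, powers):
    """First round n at which I_n >= rate (packet decodable), else None."""
    I = 0.0
    for n, (g, p) in enumerate(zip(gains, powers), start=1):
        I += math.log2(1 + g * p)
        if I >= rate:
            return n
    return None
```

A packet becomes decodable once $I_n$ exceeds the coding rate, which is exactly what the state component $I_n$ of $S_n = (K_n, I_n)$ tracks.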
22 Optimization problem and numerical results
  $\max_\mu \; \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \mathrm{reward}_\mu(S_n, A_n)$ (number of information bits after ACK)
  s.t. $\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \mathrm{power}(S_n, A_n) \le P_{\max}$
An optimal random policy exists; the optimal pdf is obtained through linear programming.
[Figure: throughput (in bpcu) versus SNR (dB), optimal random-policy-based power versus constant power; source: Tajan's PhD]
23 Case 2: Best Modulation and Coding Scheme [Ksa15]
- Action: best modulation and coding scheme for the first packet transmission
- Action: best modulation for the retransmission
  - if BPSK: few redundancy bits sent, but well protected
  - if QAM: many redundancy bits sent, but not well protected
- State: current channel impulse response, number of HARQ attempts, effective SNR
[Figure: average throughput (Mbps) versus subframe index (iteration number), LTE set-up with correlated channel (25 km/h); proposed algorithm versus HARQ-aware scheme (not accounting for the future), HARQ-unaware scheme (not accounting for the future), and trivial scheme with random MCS selection]
24 Scheduling with energy harvesting [Faw17]
Device with battery of capacity B, buffer of size P, and latency constraint $K_0$: packets arrive as $a_n$ (rate $\lambda_a$), energy arrives as $e_n$ (rate $\lambda_e$), $u_n$ packets are sent (at an energy cost), and undeliverable packets are dropped.
- Action: send $u_n$ packets (of length L) using the energy
  $C_n = \frac{\sigma^2}{g_n}\big(2^{u_n L} - 1\big)$
- States $s_n$:
  - $k_n$: age of the packet within the buffer
  - $b_n$: battery level, with $b_{n+1} = \min(b_n - C_n + e_{n+1}, B)$
  - $g_n$: channel gain
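The energy cost of an action follows from inverting the capacity-style rate formula above. A small Python sketch (the function names and the battery-feasibility helper are my additions for illustration):

```python
def transmit_energy(u, L, g, sigma2=1.0):
    """Energy C_n = (sigma^2 / g_n) * (2**(u_n * L) - 1) needed to send
    u_n packets of length L over a channel with gain g_n."""
    return (sigma2 / g) * (2 ** (u * L) - 1)

def feasible_actions(b, u_max, L, g, sigma2=1.0):
    """Actions u_n whose energy cost fits in the current battery level b;
    this is the action set the scheduling policy chooses from."""
    return [u for u in range(u_max + 1) if transmit_energy(u, L, g, sigma2) <= b]
```

Note the exponential growth in $u_n L$: sending one extra packet in the same slot can cost far more energy than spreading transmissions over time, which is what makes the scheduling problem non-trivial.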
25 Scheduling with energy harvesting (cont'd)
Reward: packet loss (buffer overflow, delay non-fulfillment)
  $\min \; \lim_{N\to\infty} \mathbb{E}\Big[\frac{1}{N} \sum_{n=0}^{N-1} \big(\varepsilon_o(s_n, u_n) + \varepsilon_d(s_n, u_n)\big)\Big]$
with $\varepsilon_o$ the number of packets dropped due to buffer overflow and $\varepsilon_d$ the number of packets dropped due to delay non-fulfillment.
[Figure: dropped-packet probability versus $\lambda_a$, naive versus optimal policy, for $\lambda_e = 1$ and $\lambda_e = 3$]
26 Issues not treated here
- Curse of dimensionality
- Competition between users (game theory [Lun15])
- Unknown Q: reinforcement learning, Q-learning
- Close links with Dynamic Programming (Viterbi algorithm for channel decoding or ISI management)
27 References
[Her89] O. Hernández-Lerma, Adaptive Markov Control Processes, Springer, 1989.
[Ber95] D. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, 1995.
[Taj13] R. Tajan, HARQ retransmission scheme in cognitive radio, PhD thesis, Univ. Cergy-Pontoise, France, 2013.
[Ksa15] N. Ksairi and P. Ciblat, "Modulation and Coding Schemes Selection for Type-II HARQ in Time-Correlated Fading Channels," IEEE Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2015.
[Lun15] J. Lunden, V. Koivunen, and H.V. Poor, "Spectrum Exploration and Exploitation for Cognitive Radio: Recent Advances," IEEE Signal Processing Magazine, March 2015.
[Faw17] I. Fawaz, P. Ciblat, and M. Sarkiss, "Energy minimization based Resource Scheduling for Strict Delay Constrained Wireless Communications," IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2017.
More informationApplication-Level Scheduling with Deadline Constraints
Application-Level Scheduling with Deadline Constraints 1 Huasen Wu, Xiaojun Lin, Xin Liu, and Youguang Zhang School of Electronic and Information Engineering, Beihang University, Beijing 100191, China
More informationA Restless Bandit With No Observable States for Recommendation Systems and Communication Link Scheduling
2015 IEEE 54th Annual Conference on Decision and Control (CDC) December 15-18, 2015 Osaka, Japan A Restless Bandit With No Observable States for Recommendation Systems and Communication Link Scheduling
More informationTCP over Cognitive Radio Channels
1/43 TCP over Cognitive Radio Channels Sudheer Poojary Department of ECE, Indian Institute of Science, Bangalore IEEE-IISc I-YES seminar 19 May 2016 2/43 Acknowledgments The work presented here was done
More informationOptimal sequential decision making for complex problems agents. Damien Ernst University of Liège
Optimal sequential decision making for complex problems agents Damien Ernst University of Liège Email: dernst@uliege.be 1 About the class Regular lectures notes about various topics on the subject with
More informationSequential decision making under uncertainty. Department of Computer Science, Czech Technical University in Prague
Sequential decision making under uncertainty Jiří Kléma Department of Computer Science, Czech Technical University in Prague /doku.php/courses/a4b33zui/start pagenda Previous lecture: individual rational
More informationReinforcement Learning as Classification Leveraging Modern Classifiers
Reinforcement Learning as Classification Leveraging Modern Classifiers Michail G. Lagoudakis and Ronald Parr Department of Computer Science Duke University Durham, NC 27708 Machine Learning Reductions
More informationCS599 Lecture 1 Introduction To RL
CS599 Lecture 1 Introduction To RL Reinforcement Learning Introduction Learning from rewards Policies Value Functions Rewards Models of the Environment Exploitation vs. Exploration Dynamic Programming
More information1 Markov decision processes
2.997 Decision-Making in Large-Scale Systems February 4 MI, Spring 2004 Handout #1 Lecture Note 1 1 Markov decision processes In this class we will study discrete-time stochastic systems. We can describe
More informationEnergy Efficient Multiuser Scheduling: Statistical Guarantees on Bursty Packet Loss
Energy Efficient Multiuser Scheduling: Statistical Guarantees on Bursty Packet Loss M. Majid Butt, Eduard A. Jorswieck and Amr Mohamed Department of Computer Science and Engineering, Qatar University,
More informationTODAY S mobile Internet is facing a grand challenge to. Application-Level Scheduling with Probabilistic Deadline Constraints
1 Application-Level Scheduling with Probabilistic Deadline Constraints Huasen Wu, Member, IEEE, Xiaojun Lin, Senior Member, IEEE, Xin Liu, Member, IEEE, and Youguang Zhang Abstract Opportunistic scheduling
More informationApproximate Optimal-Value Functions. Satinder P. Singh Richard C. Yee. University of Massachusetts.
An Upper Bound on the oss from Approximate Optimal-Value Functions Satinder P. Singh Richard C. Yee Department of Computer Science University of Massachusetts Amherst, MA 01003 singh@cs.umass.edu, yee@cs.umass.edu
More informationSome AI Planning Problems
Course Logistics CS533: Intelligent Agents and Decision Making M, W, F: 1:00 1:50 Instructor: Alan Fern (KEC2071) Office hours: by appointment (see me after class or send email) Emailing me: include CS533
More informationCognitive Spectrum Access Control Based on Intrinsic Primary ARQ Information
Cognitive Spectrum Access Control Based on Intrinsic Primary ARQ Information Fabio E. Lapiccirella, Zhi Ding and Xin Liu Electrical and Computer Engineering University of California, Davis, California
More informationAdministration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon.
Administration CSCI567 Machine Learning Fall 2018 Prof. Haipeng Luo U of Southern California Nov 7, 2018 HW5 is available, due on 11/18. Practice final will also be available soon. Remaining weeks: 11/14,
More informationInterleave Division Multiple Access. Li Ping, Department of Electronic Engineering City University of Hong Kong
Interleave Division Multiple Access Li Ping, Department of Electronic Engineering City University of Hong Kong 1 Outline! Introduction! IDMA! Chip-by-chip multiuser detection! Analysis and optimization!
More informationAN ABSTRACT OF THE THESIS OF
AN ABSTRACT OF THE THESIS OF Thai Duong for the degree of Master of Science in Computer Science presented on May 20, 2015. Title: Data Collection in Sensor Networks via the Novel Fast Markov Decision Process
More informationElements of Reinforcement Learning
Elements of Reinforcement Learning Policy: way learning algorithm behaves (mapping from state to action) Reward function: Mapping of state action pair to reward or cost Value function: long term reward,
More informationQ-Learning for Markov Decision Processes*
McGill University ECSE 506: Term Project Q-Learning for Markov Decision Processes* Authors: Khoa Phan khoa.phan@mail.mcgill.ca Sandeep Manjanna sandeep.manjanna@mail.mcgill.ca (*Based on: Convergence of
More informationStability Analysis in a Cognitive Radio System with Cooperative Beamforming
Stability Analysis in a Cognitive Radio System with Cooperative Beamforming Mohammed Karmoose 1 Ahmed Sultan 1 Moustafa Youseff 2 1 Electrical Engineering Dept, Alexandria University 2 E-JUST Agenda 1
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Formal models of interaction Daniel Hennes 27.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Taxonomy of domains Models of
More informationINF 5860 Machine learning for image classification. Lecture 14: Reinforcement learning May 9, 2018
Machine learning for image classification Lecture 14: Reinforcement learning May 9, 2018 Page 3 Outline Motivation Introduction to reinforcement learning (RL) Value function based methods (Q-learning)
More informationOn the Optimality of Myopic Sensing. in Multi-channel Opportunistic Access: the Case of Sensing Multiple Channels
On the Optimality of Myopic Sensing 1 in Multi-channel Opportunistic Access: the Case of Sensing Multiple Channels Kehao Wang, Lin Chen arxiv:1103.1784v1 [cs.it] 9 Mar 2011 Abstract Recent works ([1],
More informationCAP Plan, Activity, and Intent Recognition
CAP6938-02 Plan, Activity, and Intent Recognition Lecture 10: Sequential Decision-Making Under Uncertainty (part 1) MDPs and POMDPs Instructor: Dr. Gita Sukthankar Email: gitars@eecs.ucf.edu SP2-1 Reminder
More informationWireless Transmission with Energy Harvesting and Storage. Fatemeh Amirnavaei
Wireless Transmission with Energy Harvesting and Storage by Fatemeh Amirnavaei A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in The Faculty of Engineering
More informationA General Formula for Compound Channel Capacity
A General Formula for Compound Channel Capacity Sergey Loyka, Charalambos D. Charalambous University of Ottawa, University of Cyprus ETH Zurich (May 2015), ISIT-15 1/32 Outline 1 Introduction 2 Channel
More informationReinforcement learning an introduction
Reinforcement learning an introduction Prof. Dr. Ann Nowé Computational Modeling Group AIlab ai.vub.ac.be November 2013 Reinforcement Learning What is it? Learning from interaction Learning about, from,
More informationAmr Rizk TU Darmstadt
Saving Resources on Wireless Uplinks: Models of Queue-aware Scheduling 1 Amr Rizk TU Darmstadt - joint work with Markus Fidler 6. April 2016 KOM TUD Amr Rizk 1 Cellular Uplink Scheduling freq. time 6.
More informationPlanning in Markov Decision Processes
Carnegie Mellon School of Computer Science Deep Reinforcement Learning and Control Planning in Markov Decision Processes Lecture 3, CMU 10703 Katerina Fragkiadaki Markov Decision Process (MDP) A Markov
More informationChapter 3: The Reinforcement Learning Problem
Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which
More informationOn Stochastic Feedback Control for Multi-antenna Beamforming: Formulation and Low-Complexity Algorithms
1 On Stochastic Feedback Control for Multi-antenna Beamforming: Formulation and Low-Complexity Algorithms Sun Sun, Student Member, IEEE, Min Dong, Senior Member, IEEE, and Ben Liang, Senior Member, IEEE
More information