Bits of Machine Learning Part 2: Unsupervised Learning

1 Bits of Machine Learning Part 2: Unsupervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology)

2 Outline of the Course 1. Supervised Learning Regression and Classification Neural Networks (Deep Learning) 2. Unsupervised Learning Clustering Reinforcement Learning 1

4 Clustering Clustering extracts structures from unlabelled data $\{X_i,\ i = 1, \dots, n\}$, i.e., groups similar data points into a small set of clusters 3

5 Clustering: Applications Historical application: find geographical correlations in cholera deaths in London (John Snow, 1854) Image segmentation: identify regions of interest in an image Marketing: detect groups of customers to design targeted ads Insurance: identify groups of users with high risk Online services and web: e.g. cluster webpages based on their content Bioinformatics: group proteins w.r.t. their functionality and chemical structure... 4

6 Algorithms Commonly used clustering algorithms Center based approaches (K-means, K-medians, Gaussian mixture models, affinity propagation) Hierarchical/agglomerative approaches (single linkage, complete linkage, average linkage, Ward's method, BIRCH) Graph based approaches (spectral clustering) Density based approaches (DBSCAN, OPTICS) Bayesian nonparametric approaches (Dirichlet processes, Pitman-Yor processes)... 5

7 Algorithms 6

8 Clustering evaluation 1. Performance benchmarks: artificially generate (clustered) data and evaluate the number of misclassified points 2. From the data only: a hard problem. What is a good clustering? - Intra-cluster cohesion: measures how near data points in the same cluster are to each other. Example: the cohesion of cluster $C$ could be $\sum_{i,j \in C} \|X_i - X_j\|^2$. The K-means algorithm finds, over possible clusters $C_1, \dots, C_K$, local minima of $\sum_{k=1}^K \sum_{i \in C_k} \|X_i - \mu_k\|^2$ where $\mu_k$ is the center of cluster $k$ - Inter-cluster separation: distance between points of different clusters 7

9 Clustering evaluation 2. From the data only: Graph-based approach, $V = \{1, \dots, n\}$ Define a complete weighted graph from the data: the weight on edge $(i, j)$ is $w_{ij} = f(\|X_i - X_j\|)$ Example: Normalized cut. The Ncut of $(C_1, C_2)$ is $\mathrm{Ncut}(C_1, C_2) = \frac{\mathrm{cut}(C_1, C_2)}{\mathrm{cut}(C_1, V)} + \frac{\mathrm{cut}(C_2, C_1)}{\mathrm{cut}(C_2, V)}$, where $\mathrm{cut}(A, B) = \sum_{i \in A, j \in B} w_{ij}$ We may wish to find clusters minimising the Ncut (NP-hard) 8
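
As an illustration, here is a small NumPy function computing the Ncut of a given two-way partition from a weight matrix W (the function name and interface are our choices, not from the lecture):

```python
import numpy as np

def ncut(W, C1, C2):
    """Normalized cut of the partition (C1, C2), with V = C1 ∪ C2.
    W is the symmetric weight matrix; C1, C2 are index lists."""
    cut = W[np.ix_(C1, C2)].sum()   # cut(C1, C2): total weight across the partition
    vol1 = W[C1, :].sum()           # cut(C1, V)
    vol2 = W[C2, :].sum()           # cut(C2, V)
    return cut / vol1 + cut / vol2

# Example on 4 points forming two tight pairs
W = np.array([[0, 1, .1, .1],
              [1, 0, .1, .1],
              [.1, .1, 0, 1],
              [.1, .1, 1, 0]], dtype=float)
print(ncut(W, [0, 1], [2, 3]))  # small Ncut: the natural partition
print(ncut(W, [0, 2], [1, 3]))  # large Ncut: a bad partition
```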

10 K-means (Lloyd 1957, MacQueen 1967) K-means aims at finding clusters $C = \{C_1, \dots, C_K\}$ solving: $\min_C \sum_{k=1}^K \sum_{i \in C_k} \|X_i - \mu_k\|^2$ where $K$ is the number of clusters, and $\mu_k$ is the center of the $k$-th cluster. Solving the K-means clustering problem in $\mathbb{R}^d$ is: 1. NP-hard in general Euclidean space (dimension $d$) even for 2 clusters 2. NP-hard for a general number of clusters $K$ even in the plane 3. When $K$ and $d$ are fixed, the problem can be solved in time $O(n^{dK+1} \log n)$ 9

11 K-means 1. Initialisation: pick K random points as initial cluster centers 2. Cluster improvement: while the cluster assignment changes do a. Assign each data point to the cluster with the closest center b. Change the cluster centers (the average position of points within the cluster) 10
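
A minimal NumPy sketch of this loop (Lloyd's algorithm). The initialisation from random data points and the stopping test on unchanged assignments follow the two steps above; the function name and the toy example are our choices:

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignments and center updates."""
    rng = np.random.default_rng(seed)
    # 1. Initialisation: pick K random data points as the initial centers
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # 2a. Assign each point to the cluster with the closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments unchanged: converged
        labels = new_labels
        # 2b. Move each center to the average position of its assigned points
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers

# Example: two well-separated blobs in the plane
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centers = kmeans(X, K=2)
```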

12 Example 1 11

13 Example 2: Image Segmentation 12

14 K-means algorithm: Properties Guaranteed to converge in a finite number of iterations Running time per iteration: O(nKd) Issues 1. Can get stuck in a bad local minimum 2. Different initialisations yield different solutions 3. The number of clusters K needs to be specified in advance Example of an approach addressing both the initialisation and the choice of the number of clusters: V. Petrosyan and A. Proutiere, Viral Clustering: A Robust Method to Extract Structures in Heterogeneous Datasets, AAAI

15 Gaussian Mixture Models (GMM) and the EM Algorithm Approach: assume that the data points are generated according to a GMM in an i.i.d. manner, and estimate the parameters of the GMM GMM: each point is distributed according to the distribution with density $p(x \mid \theta)$, where $\theta = (\theta_k)_{k=1,\dots,K}$ and $\theta_k = (\alpha_k, \mu_k, \Sigma_k)$: $p(x \mid \theta) = \sum_{k=1}^K \alpha_k\, p_G(x \mid \theta_k)$, where $p_G(x \mid \theta_k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left(-\tfrac{1}{2}(x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k)\right)$ $\alpha_k$: probability that a point is attached to cluster $k$ $\mu_k$: center of cluster $k$ $\Sigma_k$: covariance matrix of the $k$-th component of the GMM 14

16 GMM Example The density of a GMM with three components in R 2 15

17 The Expectation-Maximisation algorithm Dempster-Laird-Rubin, 1977 Aims at maximising the log-likelihood of the model given the data Two alternating steps: 1. E-step: for a given estimate of $\theta$, evaluate the probability that a data point belongs to the various clusters. The membership weight of $X_i$ in cluster $k$ is: $w_{ik} = \frac{\alpha_k\, p_G(X_i \mid \theta_k)}{\sum_{m=1}^K \alpha_m\, p_G(X_i \mid \theta_m)}$ 2. M-step: update $\theta$ so as to increase the log-likelihood 16

18 The Expectation-Maximisation algorithm Initialisation: select $\alpha_k^1, \mu_k^1, \Sigma_k^1$ arbitrarily, $t = 1$ Until convergence, do: 1. E-step: compute for all $i$ and $k$: $w_{ik} = \frac{\alpha_k^t\, p_G(X_i \mid \mu_k^t, \Sigma_k^t)}{\sum_{m=1}^K \alpha_m^t\, p_G(X_i \mid \mu_m^t, \Sigma_m^t)}$, $N_k = \sum_{i=1}^n w_{ik}$ 2. M-step: update the parameters as follows: for all $k$, $\alpha_k^{t+1} = \frac{N_k}{n}$, $\mu_k^{t+1} = \frac{1}{N_k} \sum_{i=1}^n w_{ik} X_i$, $\Sigma_k^{t+1} = \frac{1}{N_k} \sum_{i=1}^n w_{ik} (X_i - \mu_k^{t+1})(X_i - \mu_k^{t+1})^\top$ 3. $t := t + 1$ 17
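
A minimal NumPy sketch of these E- and M-steps. The function names, the initialisation from random data points, and the small diagonal regularisation of the covariances are our choices, not part of the lecture:

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    """Multivariate normal density p_G(x | mu, Sigma), evaluated row-wise."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff))

def em_gmm(X, K, n_iter=100, seed=0):
    """EM for a Gaussian mixture: alternate the E- and M-steps above."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.full(K, 1.0 / K)                      # mixing weights alpha_k
    mu = X[rng.choice(n, K, replace=False)].copy()   # initial centers
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: membership weights w_ik ∝ alpha_k * p_G(X_i | mu_k, Sigma_k)
        W = np.column_stack([alpha[k] * gaussian_pdf(X, mu[k], Sigma[k]) for k in range(K)])
        W /= W.sum(axis=1, keepdims=True)
        N = W.sum(axis=0)                            # N_k = sum_i w_ik
        # M-step: re-estimate alpha_k, mu_k, Sigma_k
        alpha = N / n
        mu = (W.T @ X) / N[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (W[:, k, None] * diff).T @ diff / N[k] + 1e-6 * np.eye(d)
    return alpha, mu, Sigma, W
```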

19 EM algorithm: Properties Convergence: On the Convergence Properties of the EM Algorithm, C.F.J. Wu, Annals of Statistics, 1983 Issues 1. Can get stuck in a bad local minimum 2. Suffers from the curse of dimensionality (not suitable for high dimensional data) 18

20 Spectral Clustering K-means and the EM algorithm for GMM are center-based algorithms. How to deal with clusters that do not look like potatoes?... apply spectral methods on similarity matrices or graphs! Spectral method: work with eigenvalues and eigenvectors of the similarity matrix 19

21 Spectral Clustering Data: $X_1, \dots, X_n \in \mathbb{R}^d$ Similarity matrix: $W = (w_{ij}) \in \mathbb{R}^{n \times n}$ with, for example (Gaussian kernel), $w_{ij} = 1_{i \neq j} \exp(-\|X_i - X_j\|^2 / \sigma)$ Spectral analysis: we get the top-$K$ eigenvectors of $W$ (those corresponding to the highest singular values) The spectral idea: the data points are well represented by their projections on the linear span of these eigenvectors 20

22 Spectral Clustering On spectral clustering: Analysis and an algorithm, A. Ng, M. Jordan, Y. Weiss, NIPS 1. Input: $W \in \mathbb{R}^{n \times n}$, $K \in \{1, 2, \dots, n\}$ 2. Let $D$ be a diagonal matrix s.t. $D_{ii} := \sum_{j=1}^n w_{ij}$ 3. $L := D^{-1/2} W D^{-1/2}$ 4. $E = [e_1, e_2, \dots, e_K] := \mathrm{eigs}(L, K) \in \mathbb{R}^{n \times K}$ 5. Normalize each row of $E$ 6. $y := \text{K-means}(E, K)$ 7. Output: $y$ $\mathrm{eigs}(L, K)$ returns the top $K$ eigenvectors of $L$ 21
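
A sketch of steps 1-7 in NumPy, using scikit-learn's KMeans for the final step; the function name and the use of `eigh` on the symmetric matrix L are our choices:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, K):
    """Ng-Jordan-Weiss spectral clustering: normalise, embed, run K-means."""
    # Step 2: degree matrix D with D_ii = sum_j w_ij
    d = W.sum(axis=1)
    # Step 3: normalised similarity L = D^{-1/2} W D^{-1/2}
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = D_inv_sqrt @ W @ D_inv_sqrt
    # Step 4: top-K eigenvectors of L (L is symmetric, so eigh applies)
    eigvals, eigvecs = np.linalg.eigh(L)
    E = eigvecs[:, np.argsort(eigvals)[-K:]]   # columns = K leading eigenvectors
    # Step 5: normalise each row of E to unit length
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    # Step 6: cluster the rows of the embedding with K-means
    return KMeans(n_clusters=K, n_init=10).fit_predict(E)
```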

23 Example 22

24 Tuning Spectral Clustering We may consider different notions of similarity: 1. Gaussian kernel: $w_{ij} = e^{-\|X_i - X_j\|^2 / \sigma}$ for $i \neq j$ and $w_{ii} = 0$. Choice of $\sigma$? 2. k-nearest neighbour similarity: $w_{ij} = 1$ if $i$ is one of the $k$ nearest neighbours of $j$ or if $j$ is one of the $k$ nearest neighbours of $i$. Choice of $k$? 3. Self-tuning kernel: $w_{ij} = e^{-\frac{\|X_i - X_j\|^2}{\|X_i - X_{i_m}\| \cdot \|X_j - X_{j_m}\|}}$ for $i \neq j$ and $w_{ii} = 0$, where $i_m$ is the index of the $m$-th nearest neighbour of $i$. Choice of $m$? [Play with these parameters in the lab... :-) ] 23
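
For the lab, one possible NumPy construction of these three similarity matrices (the function names and the vectorised distance computations are ours):

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """w_ij = exp(-||X_i - X_j||^2 / sigma), with w_ii = 0."""
    D2 = np.square(X[:, None, :] - X[None, :, :]).sum(axis=2)
    W = np.exp(-D2 / sigma)
    np.fill_diagonal(W, 0.0)
    return W

def knn_similarity(X, k):
    """w_ij = 1 if i is among the k nearest neighbours of j, or vice versa."""
    D2 = np.square(X[:, None, :] - X[None, :, :]).sum(axis=2)
    np.fill_diagonal(D2, np.inf)                 # exclude self-distances
    nn = np.argsort(D2, axis=1)[:, :k]           # indices of the k nearest neighbours
    W = np.zeros_like(D2)
    rows = np.repeat(np.arange(len(X)), k)
    W[rows, nn.ravel()] = 1.0
    return np.maximum(W, W.T)                    # symmetrise (the "or" in the definition)

def self_tuning_kernel(X, m):
    """w_ij = exp(-||X_i - X_j||^2 / (||X_i - X_{i_m}|| * ||X_j - X_{j_m}||)), w_ii = 0."""
    D = np.sqrt(np.square(X[:, None, :] - X[None, :, :]).sum(axis=2))
    D_no_self = D + np.diag(np.full(len(X), np.inf))
    local = np.sort(D_no_self, axis=1)[:, m - 1]  # distance to the m-th nearest neighbour
    W = np.exp(-D ** 2 / np.outer(local, local))
    np.fill_diagonal(W, 0.0)
    return W
```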

25 Outline of the Course 1. Supervised Learning Regression and Classification Neural Networks (Deep Learning) 2. Unsupervised Learning Clustering Reinforcement Learning 24

26 Reinforcement learning Learning optimal sequential behaviour / control from interacting with the environment State dynamics: $s^\pi_{t+1} = F_t(s^\pi_t, a^\pi_t)$ 25

27 Reinforcement learning Learning optimal sequential behaviour / control from interacting with the environment [...] By the time we learn to live It's already too late Our hearts cry in unison at night [...] Louis Aragon 26

28 Reinforcement learning: Applications Making a robot walk Portfolio optimisation Playing games better than humans Helicopter stunt manoeuvres Optimal communication protocols in radio networks Display ads Search engines... 27

29 1. Bandit Optimisation State dynamics: $s^\pi_{t+1} = F_t(s^\pi_t, a^\pi_t)$ Interact with an i.i.d. or adversarial environment The reward is independent of the state and is the only feedback: - i.i.d. environment: $r_t(a, s) = r_t(a)$, a random variable with mean $\theta_a$ - adversarial environment: $r_t(a, s) = r_t(a)$ is arbitrary! 28

30 2. Markov Decision Process (MDP) State dynamics: $s^\pi_{t+1} = F_t(s^\pi_t, a^\pi_t)$ History at $t$: $h^\pi_t = (s^\pi_1, a^\pi_1, \dots, s^\pi_{t-1}, a^\pi_{t-1}, s^\pi_t)$ Markovian environment: $\mathbb{P}[s^\pi_{t+1} = s' \mid h^\pi_t, s^\pi_t = s, a^\pi_t = a] = p(s' \mid s, a)$ Stationary deterministic rewards (for simplicity): $r_t(a, s) = r(a, s)$ 29

31 What is to be learnt and optimised? Bandit optimisation: the average rewards of actions are unknown Information available at time $t$ under $\pi$: $a^\pi_1, r_1(a^\pi_1), \dots, a^\pi_{t-1}, r_{t-1}(a^\pi_{t-1})$ MDP: the state dynamics $p(\cdot \mid s, a)$ and the reward function $r(a, s)$ are unknown Information available at time $t$ under $\pi$: $s^\pi_1, a^\pi_1, r_1(a^\pi_1, s^\pi_1), \dots, s^\pi_{t-1}, a^\pi_{t-1}, r_{t-1}(a^\pi_{t-1}, s^\pi_{t-1}), s^\pi_t$ Objective: maximise the cumulative reward $\sum_{t=1}^T \mathbb{E}[r_t(a^\pi_t, s^\pi_t)]$ or $\sum_{t=1}^\infty \lambda^t \mathbb{E}[r_t(a^\pi_t, s^\pi_t)]$ 30

32 Regret Difference between the cumulative reward of an Oracle policy and that of agent π Regret quantifies the price to pay for learning! Exploration vs. exploitation trade-off: we need to probe all actions to play the best later... 31

33 1. Bandit Optimisation First application: Clinical trial, Thompson A set of possible actions at each step - Unknown sequence of rewards for each action - Bandit feedback: only rewards of chosen actions are observed - Goal: maximise the cumulative reward (up to step T ) Two examples: a. Finite number of actions, stochastic rewards b. Continuous actions, concave adversarial rewards 32

34 a. Stochastic bandits Robbins 1952 Finite set of actions $\mathcal{A}$ (Unknown) rewards of action $a \in \mathcal{A}$: $(r_t(a),\ t \geq 0)$ i.i.d. Bernoulli with $\mathbb{E}[r_t(a)] = \theta_a$ Optimal action: $a^\star \in \arg\max_a \theta_a$ Online policy $\pi$: select action $a^\pi_t$ at time $t$ depending on $a^\pi_1, r_1(a^\pi_1), \dots, a^\pi_{t-1}, r_{t-1}(a^\pi_{t-1})$ Regret up to time $T$: $R^\pi(T) = T\theta_{a^\star} - \sum_{t=1}^T \theta_{a^\pi_t}$ 33

35 a. Stochastic bandits Fundamental performance limits (Lai-Robbins 1985): for any reasonable $\pi$, $\liminf_{T \to \infty} \frac{R^\pi(T)}{\log(T)} \geq \sum_{a \neq a^\star} \frac{\theta_{a^\star} - \theta_a}{KL(\theta_a, \theta_{a^\star})}$ where $KL(a, b) = a \log(\frac{a}{b}) + (1-a) \log(\frac{1-a}{1-b})$ (KL divergence) Algorithms: (i) $\epsilon$-greedy: linear regret (ii) $\epsilon_t$-greedy: logarithmic regret ($\epsilon_t = 1/t$) (iii) Upper Confidence Bound algorithm: select the action with the largest index $b_a(t) = \hat\theta_a(t) + \sqrt{\frac{2 \log(t)}{n_a(t)}}$ $\hat\theta_a(t)$: empirical reward of $a$ up to $t$ $n_a(t)$: nb of times $a$ played up to $t$ 34
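
A hedged NumPy sketch of the UCB index policy above on Bernoulli arms; the simulated environment, the arm means `theta`, and the regret bookkeeping are our illustration, not part of the slides:

```python
import numpy as np

def ucb(theta, T, seed=0):
    """UCB index policy on Bernoulli arms with (unknown to the policy) means theta."""
    rng = np.random.default_rng(seed)
    K = len(theta)
    counts = np.zeros(K)       # n_a(t): number of times arm a was played
    sums = np.zeros(K)         # cumulative reward of arm a
    regret = 0.0
    for t in range(1, T + 1):
        if t <= K:
            a = t - 1          # play each arm once to initialise the indices
        else:
            means = sums / counts                          # empirical rewards
            b = means + np.sqrt(2 * np.log(t) / counts)    # index b_a(t)
            a = int(np.argmax(b))
        r = rng.random() < theta[a]    # Bernoulli reward
        counts[a] += 1
        sums[a] += r
        regret += max(theta) - theta[a]
    return regret

# Example: the regret should grow roughly like log(T)
print(ucb(theta=[0.3, 0.5, 0.7], T=10000))
```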

36 b. Adversarial Convex Bandits At the beginning of each year, Volvo has to select a vector x (in a convex set) representing the relative efforts in producing various models (S60, V70, V90,...). The reward is an arbitrarily varying and unknown concave function of x. How to maximise reward over say 50 years? 35

42 b. Adversarial Convex Bandits Continuous set of actions $\mathcal{A} = [0, 1]$ (Unknown) arbitrary but concave rewards of action $x \in \mathcal{A}$: $r_t(x)$ Online policy $\pi$: select action $x^\pi_t$ at time $t$ depending on $x^\pi_1, r_1(x^\pi_1), \dots, x^\pi_{t-1}, r_{t-1}(x^\pi_{t-1})$ Regret up to time $T$ (defined w.r.t. the best empirical action up to time $T$): $R^\pi(T) = \max_{x \in [0,1]} \sum_{t=1}^T r_t(x) - \sum_{t=1}^T r_t(x^\pi_t)$ Can we do something smart at all? Achieve a sublinear regret? 41

43 b. Adversarial Convex Bandits If $r_t(\cdot) = r(\cdot)$, and if $r(\cdot)$ were known, we could apply a gradient ascent algorithm One-point gradient estimate: with $\hat f(x) = \mathbb{E}_{v \in B}[f(x + \delta v)]$, $B = \{x : \|x\|_2 \leq 1\}$, we have $\mathbb{E}_{u \in S}[f(x + \delta u)\, u] = \delta \nabla \hat f(x)$, $S = \{x : \|x\|_2 = 1\}$ Simulated Gradient Ascent algorithm: at each step $t$, do - $u_t$ uniformly chosen in $S$ - play $y_t = x_t + \delta u_t$ - $x_{t+1} = x_t + \alpha\, r_t(y_t)\, u_t$ Regret: $R(T) = O(T^{5/6})$ 42
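
A one-dimensional sketch of this simulated gradient ascent on A = [0, 1]; the clipping that keeps the iterate feasible and the toy reward sequence are our additions:

```python
import numpy as np

def simulated_gradient_ascent(reward_fns, delta=0.05, alpha=0.01, seed=0):
    """One-point bandit gradient ascent on A = [0, 1] (1-D sketch):
    keep an internal point x, play y_t = x_t + delta*u_t, and move x
    in the direction r_t(y_t) * u_t."""
    rng = np.random.default_rng(seed)
    x = 0.5
    total = 0.0
    for r_t in reward_fns:
        u = rng.choice([-1.0, 1.0])   # uniform on the unit sphere of R, i.e. {-1, +1}
        y = x + delta * u             # action actually played
        reward = r_t(y)
        total += reward
        x = np.clip(x + alpha * reward * u, delta, 1.0 - delta)
        # clipping keeps x + delta*u inside [0, 1]; this projection is our simplification
    return total

# Example: a fixed concave reward, maximised at x = 0.3
rewards = [lambda x: 1.0 - (x - 0.3) ** 2] * 5000
print(simulated_gradient_ascent(rewards))
```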

44 2. Markov Decision Process (MDP) State dynamics: $s^\pi_{t+1} = F_t(s^\pi_t, a^\pi_t)$ Markovian environment: $\mathbb{P}[s^\pi_{t+1} = s' \mid h^\pi_t, s^\pi_t = s, a^\pi_t = a] = p(s' \mid s, a)$ Stationary deterministic rewards (for simplicity): $r_t(a, s) = r(a, s)$ $p(\cdot \mid s, a)$ and $r(\cdot, \cdot)$ are unknown initially 43

45 Example Playing Pacman (Google DeepMind experiment, 2015) State: the current displayed image Action: right, left, down, up Feedback: the score and its increments + state 44

46 Bellman's equation Objective: maximise the average discounted reward $\sum_{t=1}^\infty \lambda^t \mathbb{E}[r(a^\pi_t, s^\pi_t)]$ Assume the transition probabilities and the reward function are known Value function: maps the initial state $s$ to the corresponding maximum reward $v(s)$ Bellman's equation: $v(s) = \max_{a \in \mathcal{A}} \left[ r(a, s) + \lambda \sum_j p(j \mid s, a)\, v(j) \right]$ Solve Bellman's equation. The optimal policy is given by: $a^\star(s) = \arg\max_{a \in \mathcal{A}} \left[ r(a, s) + \lambda \sum_j p(j \mid s, a)\, v(j) \right]$ 45
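
The slide does not fix a solution method; one standard way to solve Bellman's equation when p and r are known is value iteration, sketched below for a small finite MDP stored as arrays (the array layout is our assumption):

```python
import numpy as np

def value_iteration(p, r, lam=0.9, tol=1e-8):
    """Solve v(s) = max_a [ r(a,s) + lam * sum_j p(j|s,a) v(j) ] by fixed-point iteration.
    p has shape (S, A, S) with p[s, a, j] = p(j | s, a); r has shape (S, A) with r[s, a] = r(a, s)."""
    S, A, _ = p.shape
    v = np.zeros(S)
    while True:
        # Q(s, a) = r(a, s) + lam * expected value of the next state
        q = r + lam * np.einsum('saj,j->sa', p, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    policy = q.argmax(axis=1)   # a*(s) = argmax_a [ r(a,s) + lam * sum_j p(j|s,a) v(j) ]
    return v, policy
```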

47 Q-learning What if the transition probabilities and the reward function are unknown? Q-value function: the max expected reward starting from state $s$ and playing action $a$: $Q(s, a) = r(a, s) + \lambda \sum_j p(j \mid s, a) \max_{b \in \mathcal{A}} Q(j, b)$ Note that $v(s) = \max_{a \in \mathcal{A}} Q(s, a)$ Algorithm: update the Q-value estimate sequentially so that it converges to the true Q-value 46

48 Q-learning 1. Initialisation: select $Q \in \mathbb{R}^{S \times A}$ arbitrarily, and $s_0$ 2. Q-value iteration: at each step $t$, select action $a_t$ (each state-action pair must be selected infinitely often) Observe the new state $s_{t+1}$ and the reward $r(s_t, a_t)$ Update $Q(s_t, a_t)$: $Q(s_t, a_t) := Q(s_t, a_t) + \alpha_t \left[ r(s_t, a_t) + \lambda \max_{a \in \mathcal{A}} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$ It converges to $Q$ if $\sum_t \alpha_t = \infty$ and $\sum_t \alpha_t^2 < \infty$ 47
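
A tabular sketch of this update with epsilon-greedy action selection. The environment interface `step(s, a)`, the epsilon-greedy rule and the global step-size schedule are our simplifications, not prescribed by the slide:

```python
import numpy as np

def q_learning(step, S, A, episodes=500, horizon=200, lam=0.95,
               epsilon=0.1, seed=0):
    """Tabular Q-learning. `step(s, a)` is assumed to return (next_state, reward);
    states and actions are integers in range(S) and range(A)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((S, A))
    t = 0
    for _ in range(episodes):
        s = 0                                      # start state (assumption)
        for _ in range(horizon):
            t += 1
            # step sizes 1/t^0.6 satisfy sum a_t = inf, sum a_t^2 < inf
            # (a single global schedule is used here for simplicity)
            alpha = 1.0 / t ** 0.6
            # epsilon-greedy: keeps every state-action pair being explored
            a = rng.integers(A) if rng.random() < epsilon else int(Q[s].argmax())
            s_next, r = step(s, a)
            # Q-value update from the slide
            Q[s, a] += alpha * (r + lam * Q[s_next].max() - Q[s, a])
            s = s_next
    return Q
```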

49 Q-learning: demo The crawling robot

50 Scaling up Q-learning Q-learning converges very slowly, especially when the state and action spaces are large... State-of-the-art algorithms (optimal exploration, ideas from bandit optimisation): regret $O(S\sqrt{AT})$ What if the action and state are continuous variables? Example: Mountain car demo (see Sutton tutorial, NIPS)

51 Q-learning with function approximation Idea: restrict our attention to Q-value functions belonging to a family of functions $\mathcal{Q}$ Examples: 1. Linear functions: $\mathcal{Q} = \{Q_\theta, \theta \in \mathbb{R}^M\}$, $Q_\theta(s, a) = \sum_{i=1}^M \phi_i(s, a)\, \theta_i = \phi(s, a)^\top \theta$ where, for all $i$, $\phi_i$ is a fixed feature function, and the $\phi_i$'s are linearly independent 2. Deep networks: $\mathcal{Q} = \{Q_w, w \in \mathbb{R}^M\}$, $Q_w(s, a)$ given as the output of a neural network with weights $w$ and inputs $(s, a)$ 50

52 Q-learning with linear function approximation 1. Initialisation: select $\theta \in \mathbb{R}^M$ arbitrarily, and $s_0$ 2. Q-value iteration: at each step $t$, select action $a_t$ (each state-action pair must be selected infinitely often) Observe the new state $s_{t+1}$ and the reward $r(s_t, a_t)$ Update $\theta$: $\theta := \theta + \alpha_t\, \delta_t\, \nabla_\theta Q_\theta(s_t, a_t) = \theta + \alpha_t\, \delta_t\, \phi(s_t, a_t)$ where $\delta_t = r(s_t, a_t) + \lambda \max_{a \in \mathcal{A}} Q_\theta(s_{t+1}, a) - Q_\theta(s_t, a_t)$ For convergence results, see An analysis of Reinforcement Learning with Function Approximation, Melo et al., ICML 51
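
A sketch of this update, with the feature map `phi(s, a)` and the environment `step(s, a)` supplied by the caller; both interfaces, the start state, and the epsilon-greedy exploration are assumptions on our part:

```python
import numpy as np

def q_learning_linear(step, phi, M, A, T=10000, lam=0.95, alpha=0.01,
                      epsilon=0.1, seed=0):
    """Q-learning with linear function approximation Q_theta(s, a) = phi(s, a) . theta.
    `phi(s, a)` returns a feature vector of length M; `step(s, a)` returns
    (next_state, reward)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(M)
    q = lambda s, a: phi(s, a) @ theta
    s = 0                                          # start state (assumption)
    for _ in range(T):
        a = rng.integers(A) if rng.random() < epsilon else \
            int(np.argmax([q(s, b) for b in range(A)]))
        s_next, r = step(s, a)
        # TD error: delta = r + lam * max_a Q_theta(s', a) - Q_theta(s, a)
        delta = r + lam * max(q(s_next, b) for b in range(A)) - q(s, a)
        # gradient step: theta := theta + alpha * delta * phi(s, a)
        theta = theta + alpha * delta * phi(s, a)
        s = s_next
    return theta
```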

53 Q-learning with function approximation Success stories: TD-Gammon (Backgammon), Tesauro 1995 (neural nets) Acrobatic helicopter autopilots, Ng et al. Jeopardy, IBM Watson Atari games, pixel-level visual inputs, Google DeepMind 52

54 Questions? Alexandre Proutiere 53
