Bits of Machine Learning Part 2: Unsupervised Learning
|
|
- Esmond Jackson
- 5 years ago
- Views:
Transcription
1 Bits of Machine Learning Part 2: Unsupervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology)
2 Outline of the Course 1. Supervised Learning Regression and Classification Neural Networks (Deep Learning) 2. Unsupervised Learning Clustering Reinforcement Learning 1
3 Outline of the Course 1. Supervised Learning Regression and Classification Neural Networks (Deep Learning) 2. Unsupervised Learning Clustering Reinforcement Learning 2
4 Clustering Clustering extracts structures from unlabelled data {X i, i = 1, n}, i.e., groups similar data points into a small set of clusters 3
5 Clustering: Applications Historical application: find geographical correlations in cholera deaths in London (John Snow 1850) Image segmentation: identify regions of interests in an image Marketing: detect groups of customers to design targeted ads Insurance: Identify groups of users with high risk Online services and web: e.g. cluster webpage based on their content Bioinformatics: group proteins w.r.t. their functionality and chemical structure... 4
6 Algorithms Commonly used clustering algorithms Center based approaches (K-means, K-medians, Gaussian mixture models, affinity propogation) Hierarchical/agglomerative approaches (single linkage, complete linkage, average linkage, Ward s method, BIRCH) Graph based approaches (spectral clustering) Density based approaches (DBSCAN, OPTICS) Bayesian nonparametric approaches (Dirichlet Processes, PitmanYor processes)... 5
7 Algorithms 6
8 Clustering evaluation 1. Performance benchmarks: artificially generate (clustered) data and evaluate the number of misclassified points 2. From the data only: a hard problem. What is a good clustering? - Intra-cluster cohesion: measures how near data points are in the same cluster Example: the cohesion of cluster C could be i,j C Xi Xj 2 K-means algorithm finds over possible clusters C 1,..., C K local minima of K X i µ k 2 i C k k=1 where µ k is the center of cluster k - Inter-cluster separation: distance between points of different clusters 7
9 Clustering evaluation 2. From the data only: Graph-based approach, V = {1,..., n} Define a complete weighted graph from the data: weight on edge (i, j) w ij = f( X i X j ) Example: Normalized cut. The Ncuts of (C 1, C 2 ) is Ncut(C 1, C 2 ) = cut(c 1, C 2 ) cut(c 1, V ) + cut(c 2, C 1 ) cut(c 2, V ), where cut(a, B) = i A,j B w ij We may wish to find clusters minimising the Ncut (NP-hard) 8
10 K-means (Lloyd 1957, MacQueen 1967) K-means aims at finding clusters C = {C 1, C k } solving: min C K k=1 j C k X i µ k 2 where K is the number of clusters, and µ k is the center of the k-th cluster. Solving the K-means clustering problem in R d is: 1. NP-hard in general Euclidean space d even for 2 clusters 2. NP-hard for a general number of clusters K even in the plane 3. When K and d are fixed, the problem can be solved in time O(n dk+1 log n) 9
11 K-means 1. Initialisation: pick K random points as initial cluster centers 2. Cluster improvement: while the cluster assignment changes do a. Assign each data point to the cluster with the closest center b. Change the cluster centers (the average position of points within the cluster) 10
12 Example 1 11
13 Example 2: Image Segmentation 12
14 K-means algorithm: Properties Guaranteed to converge in a finite number of iterations Running time per iteration: O(nKd) Issues 1. Can be stuck in a bad local minimum 2. Different initialisations yield different solutions 3. The number of clusters K needs to be specified in advance Example of approaches to solve the initialization and finding the number of clusters problem: V. Petrosyan and A. Proutiere, Viral Clustering: A Robust Method to Extract Structures in Heterogeneous Datasets, AAAI
15 Gaussian Mixture Models (GMM) and the EM Algorithm Approach: assume that the data points are generated according to a GMM in an i.i.d. manner, and estimate the parameters of the GMM GMM: Each point is distributed according to the distribution with the density p(x θ): θ = (θ k ) k=1,...,k and θ k = (α k, µ k, Σ k ) p(x θ) = K α k p G (x θ k ) k=1 1 where p G (x θ k ) = (2π) d/2 Σ k exp( 1 1/2 2 (x µ k) Σ 1 k (x µ k)) α k : probability that a point is attached to cluster k µ k : center of cluster k Σ k : covariance matrix of the k-th component of the GMM 14
16 GMM Example The density of a GMM with three components in R 2 15
17 The Expectation-Maximisation algorithm Dempster-Laird-Rubin, 1977 Aims at maximising the log-likelihood of the model given the data Two alternating steps: 1. E-step: For a given estimate of θ, evaluate the probability of a data point to belong to the various clusters. The weight membership of X i in cluster k is: w ik = α k p G (X i θ k ) K m=1 α mp G (X i θ m ) 2. M-step: Update θ with increased the log-likelihood 16
18 The Expectation-Maximisation algorithm Initialisation: select α 1 k, µ1 k, Σ1 k arbitrarily, t = 1 Until convergence: do 1. E-step: Compute for all i and k w ik = α t k p G(X i µ t k, Σt k ) K m=1 αt mp G (X i µ t m, Σ t m), N k = n i=1 2. M-step: Update the parameters as follows: for all k, w ik α t+1 k = N k N, µt+1 k = 1 N k n w ik X i i=1 3. t := t + 1 Σ t+1 k = 1 N k n w ik (X i µ t+1 k )(X i µ t+1 k i=1 ) 17
19 EM algorithm: Properties Convergence: On the Convergence Properties of the EM Algorithm, C.F.J. Wu, Annals of Statistics, 1983 Issues 1. Can be stuck in a bad local minimum 2. Suffers from the curse of dimensionality (not suitable for high dimensional data) 18
20 Spectral Clustering K-means and the EM algorithm for GMM are center-based algorithms. How to deal with clusters that do not look like potatoes?... apply spectral methods on similarity matrices or graphs! Spectral method: work with eigenvalues and eigenvectors of the similarity matrix 19
21 Spectral Clustering Data: X 1,... X n R d Similarity matrix: W = (w ij ) R n n with for example (Gaussian kernel) i, j, w ij = 1 i j exp( X i X j 2 ) σ Spectral analysis: we get the top-k eigenvectors of W (those corresponding to the highest singular values) The spectral idea: the data points are well represented by their projections on the linear span these eigenvectors 20
22 Spectral Clustering On spectral clustering: Analysis and an algorithm, A. Ng, M. Jordan, Y. Weiss, NIPS Input: W R n n, K {1, 2,, n} 2. Let D be a diagonal matrix s.t. D ii := n j=1 w ij 3. L := D 1/2 W D 1/2 4. E = [e 1, e 2,, e K ] := eigs(l, K) R n K 5. Normalize each row of E 6. y := K-means(E, K) 7. Output: y eigs(l, K) returns the top K eigenvectors of L 21
23 Example 22
24 Tuning Spectral Clustering We may consider different notions of similarity: 1. Gaussian kernel w ij = e X i X j 2 σ σ? for i j and w ii = 0. Choice of 2. k-nearest neighbour similarity: w ij = 1 if i is one of the k nearest neighbours of j or if j is one of the k nearest neighbours of i. Choice of k? 3. Self tuning kernel: w ij = e X i X j X i X im X j X jm for i j and w i,i = 0, where i m is the index of m-th nearest neighbour of i. Choice of m? [Play with these parameters in the lab... :-) ] 2 23
25 Outline of the Course 1. Supervised Learning Regression and Classification Neural Networks (Deep Learning) 2. Unsupervised Learning Clustering Reinforcement Learning 24
26 Reinforcement learning Learning optimal sequential behaviour / control from interacting with the environment State dynamics: s π t+1 = F t (s π t, a π t ) 25
27 Reinforcement learning Learning optimal sequential behaviour / control from interacting with the environment [...] By the time we learn to live It s already too late Our hearts cry in unison at night [...] Louis Aragon 26
28 Reinforcement learning: Applications Making a robot walk Portfolio optimisation Playing games better than humans Helicopter stunt manoeuvres Optimal communication protocols in radio networks Display ads Search engines... 27
29 1. Bandit Optimisation State dynamics: s π t+1 = F t (s π t, a π t ) Interact with an i.i.d. or adversarial environment The reward is independent of the state and is the only feedback: - i.i.d. environment: r t(a, s) = r t(a) random variable with mean θ a - adversarial environment: r t(a, s) = r t(a) is arbitrary! 28
30 2. Markov Decision Process (MDP) State dynamics: s π t+1 = F t (s π t, a π t ) History at t: h π t = (s π 1, a π 1,..., s π t 1, a π t 1, s π t ) Markovian environment: P[s π t+1 = s h π t, s π t = s, a π t = a] = p(s s, a) Stationary deterministic rewards (for simplicity): r t (a, s) = r(a, s) 29
31 What is to be learnt and optimised? Bandit optimisation: the average rewards of actions are unknown Information available at time t under π: a π 1, r 1 (a π 1 ),..., a π t 1, r t 1 (a π t 1) MDP: The state dynamics p( s, a) and the reward function r(a, s) are unknown Information available at time t under π: s π 1, a π 1, r 1 (a π 1, s π 1 ),..., s π t 1, a π t 1, r t 1 (a π t 1, s π t 1), s π t Objective: maximise the cumulative reward T E[r t (a π t, s π t )] t=1 or λ t E[r t (a π t, s π t )] t=1 30
32 Regret Difference between the cumulative reward of an Oracle policy and that of agent π Regret quantifies the price to pay for learning! Exploration vs. exploitation trade-off: we need to probe all actions to play the best later... 31
33 1. Bandit Optimisation First application: Clinical trial, Thompson A set of possible actions at each step - Unknown sequence of rewards for each action - Bandit feedback: only rewards of chosen actions are observed - Goal: maximise the cumulative reward (up to step T ) Two examples: a. Finite number of actions, stochastic rewards b. Continuous actions, concave adversarial rewards 32
34 a. Stochastic bandits Robbins 1952 Finite set of actions A (Unknown) rewards of action a A: (r t (a), t 0) i.i.d. Bernoulli with E[r t (a)] = θ a Optimal action a arg max a θ a Online policy π: select action a π t a π 1, r 1 (a π 1 ),..., a π t 1, r t 1 (a π t 1) at time t depending on Regret up to time T : R π (T ) = T θ a T t=1 θ a π t 33
35 a. Stochastic bandits Fundamental performance limits: (Lai-Robbins1985) For any reasonable π: lim inf T R π (T ) log(t ) a a θ a θ a KL(θ, θ a ) where KL(a, b) = a log( a 1 a b ) + (1 a) log( 1 b ) (KL divergence) Algorithms: (i) ɛ-greedy: linear regret (ii) ɛ t -greedy: logarithmic regret (ɛ t = 1/t) (iii) Upper Confidence Bound algorithm: b a (t) = ˆθ 2 log(t) a (t) + n a (t) ˆθ(t): empirical reward of a up to t n a (t): nb of times a played up to t 34
36 b. Adversarial Convex Bandits At the beginning of each year, Volvo has to select a vector x (in a convex set) representing the relative efforts in producing various models (S60, V70, V90,...). The reward is an arbitrarily varying and unknown concave function of x. How to maximise reward over say 50 years? 35
37 b. Adversarial Convex Bandits At the beginning of each year, Volvo has to select a vector x (in a convex set) representing the relative efforts in producing various models (S60, V70, V90,...). The reward is an arbitrarily varying and unknown concave function of x. How to maximise reward over say 50 years? 36
38 b. Adversarial Convex Bandits At the beginning of each year, Volvo has to select a vector x (in a convex set) representing the relative efforts in producing various models (S60, V70, V90,...). The reward is an arbitrarily varying and unknown concave function of x. How to maximise reward over say 50 years? 37
39 b. Adversarial Convex Bandits At the beginning of each year, Volvo has to select a vector x (in a convex set) representing the relative efforts in producing various models (S60, V70, V90,...). The reward is an arbitrarily varying and unknown concave function of x. How to maximise reward over say 50 years? 38
40 b. Adversarial Convex Bandits At the beginning of each year, Volvo has to select a vector x (in a convex set) representing the relative efforts in producing various models (S60, V70, V90,...). The reward is an arbitrarily varying and unknown concave function of x. How to maximise reward over say 50 years? 39
41 b. Adversarial Convex Bandits At the beginning of each year, Volvo has to select a vector x (in a convex set) representing the relative efforts in producing various models (S60, V70, V90,...). The reward is an arbitrarily varying and unknown concave function of x. How to maximise reward over say 50 years? 40
42 b. Adversarial Convex Bandits Continuous set of actions A = [0, 1] (Unknown) Arbitrary but concave rewards of action x A: r t (x) Online policy π: select action x π t x π 1, r 1 (x π 1 ),..., x π t 1, r t 1 (x π t 1) at time t depending on Regret up to time T : (defined w.r.t. the best empirical action up to time T ) T T R π (T ) = max r t (x) r t (x π t ) x [0,1] t=1 Can we do something smart at all? Achieve a sublinear regret? t=1 41
43 b. Adversarial Convex Bandits If r t ( ) = r( ), and if r( ) was known, we could apply a gradient ascent algorithm One-point gradient estimate: ˆf(x) = E v B [f(x + δv)], B = {x : x 2 1} E u S [f(x + δu)u] = δ ˆf(x), S = {x : x 2 = 1} Simulated Gradient Ascent algorithm: at each step t, do - u t uniformly chosen in S - y t = x t + δu t - y t+1 = y t + αr t(x t)u t Regret: R(T ) = O(T 5/6 ) 42
44 2. Markov Decision Process (MDP) State dynamics: s π t+1 = F t (s π t, a π t ) Markovian environment: P[s π t+1 = s h π t, s π t = s, a π t = a] = p(s s, a) Stationary deterministic rewards (for simplicity): r t (a, s) = r(a, s) p( s, a) and r(, ) are unknown initially 43
45 Example Playing pacman (Google Deepmind experiment, 2015) State: the current displayed image Action: right, left, down, up Feedback: the score and its increments + state 44
46 Bellman s equation Objective: max the average discounted reward t=1 λt E[r(a π t, s π t )] Assume the transition probabilities and the reward function are known Value function: maps the initial state s to the corresponding maximum reward v(s) Bellman s equation: v(s) = max a A r(a, s) + λ j p(j s, a)v(j) Solve Bellman s equation. The optimal policy is given by: a (s) = arg max a A r(a, s) + λ j p(j s, a)v(j) 45
47 Q-learning What if the transition probabilities and the reward function are unknown? Q-value function: the max expected reward starting from state s and playing action a: Q(s, a) = r(a, s) + λ j p(j s, a) max Q(j, b) b A Note that: v(s) = max a A Q(s, a) Algorithm: update the Q-value estimate sequentially so that it converges to the true Q-value 46
48 Q-learning 1. Initialisation: select Q R S A arbitrarily, and s 0 2. Q-value iteration: at each step t, select action a t (each state-action pair must be selected infinitely often) Observe the new state s t+1 and the reward r(s t, a t ) Update Q(s t, a t ): Q(s t, a t ) :=Q(s t, a t ) ] + α t [r(s t, a t ) + λ max Q(s t+1, a) Q(s t, a t ) a A It converges to Q if t α t = and t α2 t < 47
49 Q-learning: demo The crawling robot
50 Scaling up Q-learning Q-learning converges very slowly, especially when the state and action spaces are large... State-of-the-art algorithms (optimal exploration, ideas from bandit opt.): regret O(S AT ) What if the action and state are continuous variables? Example: Mountain car demo 1 1 See Sutton tutorial, NIPS
51 Q-learning with function approximation Idea: restrict our attention to Q-value functions belonging to a family of functions Q Examples: 1. Linear functions: Q = {Q θ, θ R M }, Q θ (s, a) = M φ i (s, a)θ i = φ θ i=1 where for all i, φ i is linear. The φ i s are linearly independent. 2. Deep networks: Q = {Q w, w R M }, Q w (s, a) given as the output of a neural network with weights w and inputs (s, a) 50
52 Q-learning with linear function approximation 1. Initialisation: select θ R M arbitrarily, and s 0 2. Q-value iteration: at each step t, select action a t (each state-action pair must be selected infinitely often) Observe the new state s t+1 and the reward r(s t, a t ) Update θ: θ := θ + α t t θ Q θ (s t, a t ) = θ + α t t φ(s t, a t ) where t = r(s t, a t ) + λ max a A Q θ (s t+1, a) Q θ (s t, a t ) For convergence results, see An analysis of Reinforcement Learning with Function Approximation, Melo et al., ICML
53 Q-learning with function approximation Success stories: TD-Gammon (Backgammon), Tesauro 1995 (neural nets) Acrobatic helicopter autopilots, Ng et al Jeopardy, IBM Watson, atari games, pixel-level visual inputs, Google Deepmind
54 Questions? Alexandre Proutiere 53
Reinforcement Learning
Reinforcement Learning Lecture 5: Bandit optimisation Alexandre Proutiere, Sadegh Talebi, Jungseul Ok KTH, The Royal Institute of Technology Objectives of this lecture Introduce bandit optimisation: the
More informationReinforcement Learning
Reinforcement Learning Lecture 6: RL algorithms 2.0 Alexandre Proutiere, Sadegh Talebi, Jungseul Ok KTH, The Royal Institute of Technology Objectives of this lecture Present and analyse two online algorithms
More informationBits of Machine Learning Part 1: Supervised Learning
Bits of Machine Learning Part 1: Supervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology) Outline of the Course 1. Supervised Learning Regression and Classification
More informationReinforcement Learning. Yishay Mansour Tel-Aviv University
Reinforcement Learning Yishay Mansour Tel-Aviv University 1 Reinforcement Learning: Course Information Classes: Wednesday Lecture 10-13 Yishay Mansour Recitations:14-15/15-16 Eliya Nachmani Adam Polyak
More informationBandit Algorithms. Zhifeng Wang ... Department of Statistics Florida State University
Bandit Algorithms Zhifeng Wang Department of Statistics Florida State University Outline Multi-Armed Bandits (MAB) Exploration-First Epsilon-Greedy Softmax UCB Thompson Sampling Adversarial Bandits Exp3
More informationMachine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels?
Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone extension 6372 Email: sbh11@cl.cam.ac.uk www.cl.cam.ac.uk/ sbh11/ Unsupervised learning Can we find regularity
More informationMulti-armed bandit models: a tutorial
Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)
More informationReinforcement Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina
Reinforcement Learning Introduction Introduction Unsupervised learning has no outcome (no feedback). Supervised learning has outcome so we know what to predict. Reinforcement learning is in between it
More informationApproximation Methods in Reinforcement Learning
2018 CS420, Machine Learning, Lecture 12 Approximation Methods in Reinforcement Learning Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/cs420/index.html Reinforcement
More informationARTIFICIAL INTELLIGENCE. Reinforcement learning
INFOB2KI 2018-2019 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Reinforcement learning Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html
More informationReinforcement Learning. Introduction
Reinforcement Learning Introduction Reinforcement Learning Agent interacts and learns from a stochastic environment Science of sequential decision making Many faces of reinforcement learning Optimal control
More information, and rewards and transition matrices as shown below:
CSE 50a. Assignment 7 Out: Tue Nov Due: Thu Dec Reading: Sutton & Barto, Chapters -. 7. Policy improvement Consider the Markov decision process (MDP) with two states s {0, }, two actions a {0, }, discount
More informationREINFORCEMENT LEARNING
REINFORCEMENT LEARNING Larry Page: Where s Google going next? DeepMind's DQN playing Breakout Contents Introduction to Reinforcement Learning Deep Q-Learning INTRODUCTION TO REINFORCEMENT LEARNING Contents
More informationReinforcement Learning
1 Reinforcement Learning Chris Watkins Department of Computer Science Royal Holloway, University of London July 27, 2015 2 Plan 1 Why reinforcement learning? Where does this theory come from? Markov decision
More informationChristopher Watkins and Peter Dayan. Noga Zaslavsky. The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015
Q-Learning Christopher Watkins and Peter Dayan Noga Zaslavsky The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015 Noga Zaslavsky Q-Learning (Watkins & Dayan, 1992)
More informationReinforcement Learning
Reinforcement Learning Markov decision process & Dynamic programming Evaluative feedback, value function, Bellman equation, optimality, Markov property, Markov decision process, dynamic programming, value
More informationBandit models: a tutorial
Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses
More informationAlireza Shafaei. Machine Learning Reading Group The University of British Columbia Summer 2017
s s Machine Learning Reading Group The University of British Columbia Summer 2017 (OCO) Convex 1/29 Outline (OCO) Convex Stochastic Bernoulli s (OCO) Convex 2/29 At each iteration t, the player chooses
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1
More informationIntroduction to Reinforcement Learning. CMPT 882 Mar. 18
Introduction to Reinforcement Learning CMPT 882 Mar. 18 Outline for the week Basic ideas in RL Value functions and value iteration Policy evaluation and policy improvement Model-free RL Monte-Carlo and
More informationIntroduction to Reinforcement Learning
CSCI-699: Advanced Topics in Deep Learning 01/16/2019 Nitin Kamra Spring 2019 Introduction to Reinforcement Learning 1 What is Reinforcement Learning? So far we have seen unsupervised and supervised learning.
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Sequential Decision
More informationLecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan Some slides borrowed from Peter Bodik and David Silver Course progress Learning
More information6 Reinforcement Learning
6 Reinforcement Learning As discussed above, a basic form of supervised learning is function approximation, relating input vectors to output vectors, or, more generally, finding density functions p(y,
More informationAdministration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon.
Administration CSCI567 Machine Learning Fall 2018 Prof. Haipeng Luo U of Southern California Nov 7, 2018 HW5 is available, due on 11/18. Practice final will also be available soon. Remaining weeks: 11/14,
More informationHybrid Machine Learning Algorithms
Hybrid Machine Learning Algorithms Umar Syed Princeton University Includes joint work with: Rob Schapire (Princeton) Nina Mishra, Alex Slivkins (Microsoft) Common Approaches to Machine Learning!! Supervised
More informationBasics of reinforcement learning
Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system
More informationComputer Vision Group Prof. Daniel Cremers. 6. Mixture Models and Expectation-Maximization
Prof. Daniel Cremers 6. Mixture Models and Expectation-Maximization Motivation Often the introduction of latent (unobserved) random variables into a model can help to express complex (marginal) distributions
More informationLecture 8: Policy Gradient
Lecture 8: Policy Gradient Hado van Hasselt Outline 1 Introduction 2 Finite Difference Policy Gradient 3 Monte-Carlo Policy Gradient 4 Actor-Critic Policy Gradient Introduction Vapnik s rule Never solve
More informationArtificial Intelligence
Artificial Intelligence Dynamic Programming Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: So far we focussed on tree search-like solvers for decision problems. There is a second important
More informationAdvanced Machine Learning
Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his
More informationLecture 7: Value Function Approximation
Lecture 7: Value Function Approximation Joseph Modayil Outline 1 Introduction 2 3 Batch Methods Introduction Large-Scale Reinforcement Learning Reinforcement learning can be used to solve large problems,
More informationReinforcement Learning
Reinforcement Learning Function approximation Mario Martin CS-UPC May 18, 2018 Mario Martin (CS-UPC) Reinforcement Learning May 18, 2018 / 65 Recap Algorithms: MonteCarlo methods for Policy Evaluation
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationCS181 Midterm 2 Practice Solutions
CS181 Midterm 2 Practice Solutions 1. Convergence of -Means Consider Lloyd s algorithm for finding a -Means clustering of N data, i.e., minimizing the distortion measure objective function J({r n } N n=1,
More information15-780: ReinforcementLearning
15-780: ReinforcementLearning J. Zico Kolter March 2, 2016 1 Outline Challenge of RL Model-based methods Model-free methods Exploration and exploitation 2 Outline Challenge of RL Model-based methods Model-free
More informationReinforcement Learning
Reinforcement Learning Ron Parr CompSci 7 Department of Computer Science Duke University With thanks to Kris Hauser for some content RL Highlights Everybody likes to learn from experience Use ML techniques
More information15-889e Policy Search: Gradient Methods Emma Brunskill. All slides from David Silver (with EB adding minor modificafons), unless otherwise noted
15-889e Policy Search: Gradient Methods Emma Brunskill All slides from David Silver (with EB adding minor modificafons), unless otherwise noted Outline 1 Introduction 2 Finite Difference Policy Gradient
More informationClustering. Léon Bottou COS 424 3/4/2010. NEC Labs America
Clustering Léon Bottou NEC Labs America COS 424 3/4/2010 Agenda Goals Representation Capacity Control Operational Considerations Computational Considerations Classification, clustering, regression, other.
More informationINF 5860 Machine learning for image classification. Lecture 14: Reinforcement learning May 9, 2018
Machine learning for image classification Lecture 14: Reinforcement learning May 9, 2018 Page 3 Outline Motivation Introduction to reinforcement learning (RL) Value function based methods (Q-learning)
More informationMulti-Armed Bandits. Credit: David Silver. Google DeepMind. Presenter: Tianlu Wang
Multi-Armed Bandits Credit: David Silver Google DeepMind Presenter: Tianlu Wang Credit: David Silver (DeepMind) Multi-Armed Bandits Presenter: Tianlu Wang 1 / 27 Outline 1 Introduction Exploration vs.
More informationReinforcement Learning
Reinforcement Learning RL in continuous MDPs March April, 2015 Large/Continuous MDPs Large/Continuous state space Tabular representation cannot be used Large/Continuous action space Maximization over action
More informationAn Introduction to Reinforcement Learning
An Introduction to Reinforcement Learning Shivaram Kalyanakrishnan shivaram@cse.iitb.ac.in Department of Computer Science and Engineering Indian Institute of Technology Bombay April 2018 What is Reinforcement
More informationReinforcement Learning
Reinforcement Learning Function approximation Daniel Hennes 19.06.2017 University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Eligibility traces n-step TD returns Forward and backward view Function
More informationSparse Linear Contextual Bandits via Relevance Vector Machines
Sparse Linear Contextual Bandits via Relevance Vector Machines Davis Gilton and Rebecca Willett Electrical and Computer Engineering University of Wisconsin-Madison Madison, WI 53706 Email: gilton@wisc.edu,
More informationClustering and Gaussian Mixture Models
Clustering and Gaussian Mixture Models Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 25, 2016 Probabilistic Machine Learning (CS772A) Clustering and Gaussian Mixture Models 1 Recap
More informationState Space Abstractions for Reinforcement Learning
State Space Abstractions for Reinforcement Learning Rowan McAllister and Thang Bui MLG RCC 6 November 24 / 24 Outline Introduction Markov Decision Process Reinforcement Learning State Abstraction 2 Abstraction
More informationCS 570: Machine Learning Seminar. Fall 2016
CS 570: Machine Learning Seminar Fall 2016 Class Information Class web page: http://web.cecs.pdx.edu/~mm/mlseminar2016-2017/fall2016/ Class mailing list: cs570@cs.pdx.edu My office hours: T,Th, 2-3pm or
More informationApproximate Q-Learning. Dan Weld / University of Washington
Approximate Q-Learning Dan Weld / University of Washington [Many slides taken from Dan Klein and Pieter Abbeel / CS188 Intro to AI at UC Berkeley materials available at http://ai.berkeley.edu.] Q Learning
More informationTrust Region Policy Optimization
Trust Region Policy Optimization Yixin Lin Duke University yixin.lin@duke.edu March 28, 2017 Yixin Lin (Duke) TRPO March 28, 2017 1 / 21 Overview 1 Preliminaries Markov Decision Processes Policy iteration
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Online Learning
More informationMARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti
1 MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti Historical background 2 Original motivation: animal learning Early
More informationAn online kernel-based clustering approach for value function approximation
An online kernel-based clustering approach for value function approximation N. Tziortziotis and K. Blekas Department of Computer Science, University of Ioannina P.O.Box 1186, Ioannina 45110 - Greece {ntziorzi,kblekas}@cs.uoi.gr
More informationI D I A P. Online Policy Adaptation for Ensemble Classifiers R E S E A R C H R E P O R T. Samy Bengio b. Christos Dimitrakakis a IDIAP RR 03-69
R E S E A R C H R E P O R T Online Policy Adaptation for Ensemble Classifiers Christos Dimitrakakis a IDIAP RR 03-69 Samy Bengio b I D I A P December 2003 D a l l e M o l l e I n s t i t u t e for Perceptual
More informationMachine Learning I Continuous Reinforcement Learning
Machine Learning I Continuous Reinforcement Learning Thomas Rückstieß Technische Universität München January 7/8, 2010 RL Problem Statement (reminder) state s t+1 ENVIRONMENT reward r t+1 new step r t
More informationMachine Learning I Reinforcement Learning
Machine Learning I Reinforcement Learning Thomas Rückstieß Technische Universität München December 17/18, 2009 Literature Book: Reinforcement Learning: An Introduction Sutton & Barto (free online version:
More informationReinforcement Learning: An Introduction
Introduction Betreuer: Freek Stulp Hauptseminar Intelligente Autonome Systeme (WiSe 04/05) Forschungs- und Lehreinheit Informatik IX Technische Universität München November 24, 2004 Introduction What is
More informationDeep Reinforcement Learning: Policy Gradients and Q-Learning
Deep Reinforcement Learning: Policy Gradients and Q-Learning John Schulman Bay Area Deep Learning School September 24, 2016 Introduction and Overview Aim of This Talk What is deep RL, and should I use
More informationDavid Silver, Google DeepMind
Tutorial: Deep Reinforcement Learning David Silver, Google DeepMind Outline Introduction to Deep Learning Introduction to Reinforcement Learning Value-Based Deep RL Policy-Based Deep RL Model-Based Deep
More informationDueling Network Architectures for Deep Reinforcement Learning (ICML 2016)
Dueling Network Architectures for Deep Reinforcement Learning (ICML 2016) Yoonho Lee Department of Computer Science and Engineering Pohang University of Science and Technology October 11, 2016 Outline
More informationExpectation-Maximization
Expectation-Maximization Léon Bottou NEC Labs America COS 424 3/9/2010 Agenda Goals Representation Capacity Control Operational Considerations Computational Considerations Classification, clustering, regression,
More informationReinforcement learning
Reinforcement learning Stuart Russell, UC Berkeley Stuart Russell, UC Berkeley 1 Outline Sequential decision making Dynamic programming algorithms Reinforcement learning algorithms temporal difference
More informationPerformance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project
Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore
More informationArtificial Intelligence & Sequential Decision Problems
Artificial Intelligence & Sequential Decision Problems (CIV6540 - Machine Learning for Civil Engineers) Professor: James-A. Goulet Département des génies civil, géologique et des mines Chapter 15 Goulet
More informationCSE446: Clustering and EM Spring 2017
CSE446: Clustering and EM Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin, Dan Klein, and Luke Zettlemoyer Clustering systems: Unsupervised learning Clustering Detect patterns in unlabeled
More informationQ-Learning in Continuous State Action Spaces
Q-Learning in Continuous State Action Spaces Alex Irpan alexirpan@berkeley.edu December 5, 2015 Contents 1 Introduction 1 2 Background 1 3 Q-Learning 2 4 Q-Learning In Continuous Spaces 4 5 Experimental
More informationLearning Tetris. 1 Tetris. February 3, 2009
Learning Tetris Matt Zucker Andrew Maas February 3, 2009 1 Tetris The Tetris game has been used as a benchmark for Machine Learning tasks because its large state space (over 2 200 cell configurations are
More information(Deep) Reinforcement Learning
Martin Matyášek Artificial Intelligence Center Czech Technical University in Prague October 27, 2016 Martin Matyášek VPD, 2016 1 / 17 Reinforcement Learning in a picture R. S. Sutton and A. G. Barto 2015
More informationReinforcement learning
Reinforcement learning Based on [Kaelbling et al., 1996, Bertsekas, 2000] Bert Kappen Reinforcement learning Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error
More informationClustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.
Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)
More informationProf. Dr. Ann Nowé. Artificial Intelligence Lab ai.vub.ac.be
REINFORCEMENT LEARNING AN INTRODUCTION Prof. Dr. Ann Nowé Artificial Intelligence Lab ai.vub.ac.be REINFORCEMENT LEARNING WHAT IS IT? What is it? Learning from interaction Learning about, from, and while
More informationClustering, K-Means, EM Tutorial
Clustering, K-Means, EM Tutorial Kamyar Ghasemipour Parts taken from Shikhar Sharma, Wenjie Luo, and Boris Ivanovic s tutorial slides, as well as lecture notes Organization: Clustering Motivation K-Means
More informationIntroduction of Reinforcement Learning
Introduction of Reinforcement Learning Deep Reinforcement Learning Reference Textbook: Reinforcement Learning: An Introduction http://incompleteideas.net/sutton/book/the-book.html Lectures of David Silver
More informationCS599 Lecture 1 Introduction To RL
CS599 Lecture 1 Introduction To RL Reinforcement Learning Introduction Learning from rewards Policies Value Functions Rewards Models of the Environment Exploitation vs. Exploration Dynamic Programming
More informationClustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning
Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades
More informationMachine Learning for Data Science (CS4786) Lecture 11
Machine Learning for Data Science (CS4786) Lecture 11 Spectral clustering Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ ANNOUNCEMENT 1 Assignment P1 the Diagnostic assignment 1 will
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationStratégies bayésiennes et fréquentistes dans un modèle de bandit
Stratégies bayésiennes et fréquentistes dans un modèle de bandit thèse effectuée à Telecom ParisTech, co-dirigée par Olivier Cappé, Aurélien Garivier et Rémi Munos Journées MAS, Grenoble, 30 août 2016
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationLatent Variable Models and EM algorithm
Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic
More informationThe Multi-Arm Bandit Framework
The Multi-Arm Bandit Framework A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course In This Lecture A. LAZARIC Reinforcement Learning Algorithms Oct 29th, 2013-2/94
More informationToday s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning
CSE 473: Artificial Intelligence Reinforcement Learning Dan Weld Today s Outline Reinforcement Learning Q-value iteration Q-learning Exploration / exploitation Linear function approximation Many slides
More informationilstd: Eligibility Traces and Convergence Analysis
ilstd: Eligibility Traces and Convergence Analysis Alborz Geramifard Michael Bowling Martin Zinkevich Richard S. Sutton Department of Computing Science University of Alberta Edmonton, Alberta {alborz,bowling,maz,sutton}@cs.ualberta.ca
More informationLecture 10 - Planning under Uncertainty (III)
Lecture 10 - Planning under Uncertainty (III) Jesse Hoey School of Computer Science University of Waterloo March 27, 2018 Readings: Poole & Mackworth (2nd ed.)chapter 12.1,12.3-12.9 1/ 34 Reinforcement
More informationThe information complexity of sequential resource allocation
The information complexity of sequential resource allocation Emilie Kaufmann, joint work with Olivier Cappé, Aurélien Garivier and Shivaram Kalyanakrishan SMILE Seminar, ENS, June 8th, 205 Sequential allocation
More informationBandit Online Convex Optimization
March 31, 2015 Outline 1 OCO vs Bandit OCO 2 Gradient Estimates 3 Oblivious Adversary 4 Reshaping for Improved Rates 5 Adaptive Adversary 6 Concluding Remarks Review of (Online) Convex Optimization Set-up
More informationMarks. bonus points. } Assignment 1: Should be out this weekend. } Mid-term: Before the last lecture. } Mid-term deferred exam:
Marks } Assignment 1: Should be out this weekend } All are marked, I m trying to tally them and perhaps add bonus points } Mid-term: Before the last lecture } Mid-term deferred exam: } This Saturday, 9am-10.30am,
More informationThe No-Regret Framework for Online Learning
The No-Regret Framework for Online Learning A Tutorial Introduction Nahum Shimkin Technion Israel Institute of Technology Haifa, Israel Stochastic Processes in Engineering IIT Mumbai, March 2013 N. Shimkin,
More informationFinal Exam, Fall 2002
15-781 Final Exam, Fall 22 1. Write your name and your andrew email address below. Name: Andrew ID: 2. There should be 17 pages in this exam (excluding this cover sheet). 3. If you need more room to work
More informationCluster Kernels for Semi-Supervised Learning
Cluster Kernels for Semi-Supervised Learning Olivier Chapelle, Jason Weston, Bernhard Scholkopf Max Planck Institute for Biological Cybernetics, 72076 Tiibingen, Germany {first. last} @tuebingen.mpg.de
More informationOnline Learning with Gaussian Payoffs and Side Observations
Online Learning with Gaussian Payoffs and Side Observations Yifan Wu 1 András György 2 Csaba Szepesvári 1 1 Department of Computing Science University of Alberta 2 Department of Electrical and Electronic
More informationCOMS 4721: Machine Learning for Data Science Lecture 20, 4/11/2017
COMS 4721: Machine Learning for Data Science Lecture 20, 4/11/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SEQUENTIAL DATA So far, when thinking
More informationREINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning
REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning Ronen Tamari The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (#67679) February 28, 2016 Ronen Tamari
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Reinforcement learning Daniel Hennes 4.12.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Reinforcement learning Model based and
More informationReinforcement learning an introduction
Reinforcement learning an introduction Prof. Dr. Ann Nowé Computational Modeling Group AIlab ai.vub.ac.be November 2013 Reinforcement Learning What is it? Learning from interaction Learning about, from,
More informationIntroduction to Reinforcement Learning Part 1: Markov Decision Processes
Introduction to Reinforcement Learning Part 1: Markov Decision Processes Rowan McAllister Reinforcement Learning Reading Group 8 April 2015 Note I ve created these slides whilst following Algorithms for
More informationReinforcement Learning
Reinforcement Learning Lecture 3: RL problems, sample complexity and regret Alexandre Proutiere, Sadegh Talebi, Jungseul Ok KTH, The Royal Institute of Technology Objectives of this lecture Introduce the
More informationLecture 23: Reinforcement Learning
Lecture 23: Reinforcement Learning MDPs revisited Model-based learning Monte Carlo value function estimation Temporal-difference (TD) learning Exploration November 23, 2006 1 COMP-424 Lecture 23 Recall:
More informationYevgeny Seldin. University of Copenhagen
Yevgeny Seldin University of Copenhagen Classical (Batch) Machine Learning Collect Data Data Assumption The samples are independent identically distributed (i.i.d.) Machine Learning Prediction rule New
More informationMulti-Armed Bandit: Learning in Dynamic Systems with Unknown Models
c Qing Zhao, UC Davis. Talk at Xidian Univ., September, 2011. 1 Multi-Armed Bandit: Learning in Dynamic Systems with Unknown Models Qing Zhao Department of Electrical and Computer Engineering University
More information