EM-based Reinforcement Learning
1 EM-based Reinforcement Learning. Gerhard Neumann, TU Darmstadt, Intelligent Autonomous Systems. December 21, 2011
2 Outline Expectation Maximization (EM)-based Reinforcement Learning Recap : Modelling data with Maximum Likelihood Expectation Maximization EM for RL Applications
3 Why should we use probabilities for RL? Reinforcement Learning in Continuous State and Action Spaces is a hard problem Value-functions are hard to estimate in continuous spaces Many RL methods rely on discretizations of the state space, action space or both
4 Why should we use probabilities for RL? However : Many probabilistic inference algorithms can be used in continuous spaces Gaussians, Mixture of Gaussians, Linear Gaussian Models, Gaussian Processes We know how to estimate these distributions from data Can we use probabilistic inference for inferring a policy?
5 Quick Recap : Fun from high school... Definitions : Marginal distribution : P(X) = Σ_Y P(X, Y) Conditional distribution : P(X | Y) = P(X, Y) / P(Y)
6 Quick Recap : Fun from high school... Implications : Product rule : P(X, Y) = P(X | Y) P(Y) = P(Y | X) P(X) Chain rule : P(X_1, ..., X_n) = ∏_i P(X_i | X_1, ..., X_{i−1}) Bayes rule : P(Y | X) = P(X | Y) P(Y) / P(X)
7 Quick Recap Gaussian Distribution : P(x | θ) = N(x | µ, Σ) = 1 / ((2π)^{k/2} |Σ|^{1/2}) exp(−(1/2) (x − µ)^T Σ^{−1} (x − µ)) Parameters θ : µ ... mean, Σ ... covariance matrix
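The density above can be checked numerically with a few lines of NumPy (a sketch; the helper name and test points are illustrative, not from the slides):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Evaluate N(x | mu, Sigma) = exp(-0.5 (x-mu)^T Sigma^-1 (x-mu)) / ((2 pi)^(k/2) |Sigma|^(1/2))."""
    k = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (k / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm)

# In 1-d with mu = 0, Sigma = 1 this reduces to the standard normal density.
p0 = gaussian_pdf(np.array([0.0]), np.array([0.0]), np.eye(1))
```

Using `np.linalg.solve` instead of explicitly inverting Σ is the numerically preferred way to evaluate the quadratic form.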
8 Recap : Modelling our data We are given a set of data points y i... and we want to estimate a generative model P (y i ; θ) for these data points
9 Recap : Modelling our data Maximum Likelihood Solutions We want to find the parameters θ that maximize the likelihood P(Y; θ) of the data y_i : argmax_θ P(y_{1:N}; θ) = ∏_{i=1...N} P(y_i; θ) This is often easier in log-space : argmax_θ log P(y_{1:N}; θ) = Σ_{i=1...N} log P(y_i; θ) A piece of cake for all distributions from the exponential family (e.g. Gaussian)
10 Recap : Modelling our data E.g. Gaussian Distribution Given : Set of data-points {x_i}_{i=1...N} Estimate Parameters : µ = Σ_i x_i / N, Σ = Σ_i (x_i − µ)(x_i − µ)^T / N
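These two estimators are one line each in NumPy; a small sketch on synthetic data (sample size and true parameters are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=[1.0, -2.0], scale=0.5, size=(10000, 2))  # data points x_i

mu = X.mean(axis=0)                 # mu = (1/N) sum_i x_i
diff = X - mu
Sigma = diff.T @ diff / len(X)      # Sigma = (1/N) sum_i (x_i - mu)(x_i - mu)^T
```

With 10000 samples the estimates land close to the generating mean (1, −2) and covariance 0.25·I.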
11 Recap : Modelling our data with hidden variables Often we are not given all information... E.g. missing data Mixture Modelling / Clustering : Which mixture component created the data? Reinforcement Learning : Which trajectories create high reward?
12 Recap : Modelling our data with hidden variables Maximum Likelihood Solutions with hidden variables z Given a model P(y, z; θ), find the θ which maximizes the likelihood of the data y_i : argmax_θ L(θ) = log P(y_{1:N}; θ) = Σ_i log P(y_i; θ) = Σ_i log Σ_z P(y_i, z; θ) Since the data for the hidden variables z is missing, we need to marginalize it out!
13 Recap : Modelling our data with hidden variables Maximum Likelihood Solutions with hidden variables z argmax_θ L(θ) = log P(y_{1:N}; θ) = Σ_i log P(y_i; θ) = Σ_i log Σ_z P(y_i, z; θ) oohh... the log of a sum... are we doomed?! At least no closed form solution exists any more...
14 Outline EM-based Reinforcement Learning Recap : Modelling data with Maximum Likelihood Expectation Maximization (EM) EM for RL Applications
15 Iterative Solution : Expectation-Maximization Expectation-Maximization based Algorithms: (E)xpectation Step (M)aximization Step
16 Expectation Step : Use a proposal distribution P_i(z) over the hidden variables What is my belief over the hidden variables given the current model θ^{(t−1)} and the observation y_i? Calculate P_i(z) = P(z | y_i; θ^{(t−1)})
17 Maximization Step : Weight the log-likelihood of the joint by the proposal distribution : Q(θ) = Σ_i Σ_z P_i(z) log P(y_i, z; θ) Set θ^{(t)} to argmax_θ Q(θ)
18 Iterative Solution : EM Comparison : Standard ML Solution : L(θ) = argmax_θ Σ_i log Σ_z P(y_i, z; θ) M-Step : Q(θ) = Σ_i Σ_z P_i(z) log P(y_i, z; θ) Magic of EM : Transformed the log of a sum into a sum of logs The E- and the M-step can be solved in closed form! Both steps are proven to increase the log-likelihood L(θ) or leave it unchanged Thus the algorithm always converges to a (local) maximum
19 Example : Gaussian Mixture Models The distribution is composed of K Gaussian components : P(y) = Σ_{k=1...K} P(k) P(y | k) = Σ_{k=1...K} c_k N(y | µ_k, Σ_k) θ : c_k ... mixture coefficients, µ_k ... means, Σ_k ... covariance matrices
20 Hidden variable k We do not know which component k created our data Joint Distribution : P(y, k) = c_k N(y | µ_k, Σ_k) If we knew k, the task would be easy...
21 EM for Gaussian Mixture Models Expectation Step : Calculate the probability that component k created data point y_i : P_i(k) = P(k | y_i) = P(y_i, k; θ) / Σ_{k'} P(y_i, k'; θ) Called responsibilities... Maximization Step : argmax_{c_{1:K}, µ_{1:K}, Σ_{1:K}} Σ_i Σ_k P_i(k) log P(y_i, k) Each mixture component can be optimized independently!
23 EM for Gaussian Mixture Models Each mixture component can be optimized independently : argmax_{c_k, µ_k, Σ_k} Σ_i P_i(k) (log N(y_i | µ_k, Σ_k) + log c_k) Comparison : Maximum-Likelihood (ML) problem of a single Gaussian : argmax_{µ, Σ} Σ_i log N(y_i | µ, Σ) Weighted ML-Solution : P_i(k) defines a weighting of each data-point
24 EM for Gaussian Mixture Models Comparison : ML-Solution for a single Gaussian : µ = Σ_j y_j / N, Σ = Σ_j (y_j − µ)(y_j − µ)^T / N M-Step : Weighted ML-Solution : µ_k = Σ_j P_j(k) y_j / Σ_j P_j(k), Σ_k = Σ_j P_j(k)(y_j − µ_k)(y_j − µ_k)^T / Σ_j P_j(k)
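The E- and M-step updates for the mixture can be sketched end-to-end in NumPy (illustrative only: the deterministic initialization and the small jitter term are my additions, not part of the slides):

```python
import numpy as np

def em_gmm(Y, K, iters=50):
    """EM for a Gaussian mixture: responsibilities (E-step), weighted ML fits (M-step)."""
    N, d = Y.shape
    # Spread the initial means along the first coordinate (a simple deterministic init).
    order = np.argsort(Y[:, 0])
    mu = Y[order[np.linspace(0, N - 1, K).astype(int)]].copy()
    Sigma = np.stack([np.cov(Y.T) + 1e-6 * np.eye(d)] * K)
    c = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities P_i(k) proportional to c_k N(y_i | mu_k, Sigma_k)
        R = np.empty((N, K))
        for k in range(K):
            diff = Y - mu[k]
            quad = np.einsum('nd,dn->n', diff, np.linalg.solve(Sigma[k], diff.T))
            R[:, k] = c[k] * np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma[k]))
        R /= R.sum(axis=1, keepdims=True)
        # M-step: weighted maximum likelihood, one independent fit per component
        Nk = R.sum(axis=0)
        c = Nk / N
        for k in range(K):
            mu[k] = R[:, k] @ Y / Nk[k]
            diff = Y - mu[k]
            Sigma[k] = (R[:, k] * diff.T) @ diff / Nk[k] + 1e-6 * np.eye(d)
    return c, mu, Sigma
```

On two well-separated clusters the recovered means land on the cluster centers, with no learning rates to tune.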
25 EM for Gaussian Mixture Models Example : from the Bishop book
26 EM in a nutshell EM can be used whenever we need to deal with hidden/unobserved variables Iteratively apply the E- and M-step Both can be computed in closed form No learning rates or similar tuning parameters are needed! Uses a proposal distribution over the hidden variables Belief over the hidden variables under the current model... Used as a weighting in the joint log-likelihood
27 Outline Expectation Maximization (EM)-based Reinforcement Learning Recap : Modelling data with Maximum Likelihood Expectation Maximization EM for RL Applications
28 EM for Reinforcement Learning Ok, nice, but how can I use that for robot learning? Model RL as a Maximum Likelihood Problem! Observed variable : We want to observe a reward event : P(R = 1 | τ) ∝ exp(βR_τ) Binary event of observing a reward, β ... temperature of the distribution, τ ... trajectory, R_τ ... reward of the trajectory A common approach to transform a reward into a distribution
29 EM for Reinforcement Learning Example for reward distribution : Matlab...
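The Matlab example is not reproduced here, but the transform P(R = 1 | τ) ∝ exp(βR_τ) is easy to sketch in Python (the rewards and the temperature are made-up numbers); subtracting the maximum before exponentiating is a standard numerical-stability trick:

```python
import numpy as np

beta = 2.0                                   # temperature of the distribution
R_tau = np.array([-1.0, 0.0, 0.5, 1.0])      # rewards of four sampled trajectories

w = np.exp(beta * (R_tau - R_tau.max()))     # unnormalized P(R = 1 | tau)
w /= w.sum()                                 # normalize into a distribution over trajectories
```

Higher β concentrates the distribution on the best trajectories; β → 0 makes it uniform.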
30 EM for Reinforcement Learning Observed variable : Reward Event : P(R = 1 | τ) ∝ exp(βR_τ) Hidden Variable : We do not know which trajectories generated the reward event Model for trajectories : p(τ; θ) Contains our policy : p(τ; θ) = P(s_0) ∏_t P(s_t | s_{t−1}, a_{t−1}) π(a_{t−1} | s_{t−1}; θ) We want to find a θ which gives high reward!
31 EM for Reinforcement Learning We want to find a θ which gives high reward! Joint Distribution : p(R, τ; θ) = p(R | τ) p(τ; θ) We want to maximize the log-likelihood of our observation (getting a reward) : argmax_θ log p(R = 1; θ) = log ∫_τ P(R = 1 | τ) P(τ; θ) dτ
32 EM for Reinforcement Learning We want to maximize the log-likelihood of our observation (getting a reward) : log p(R) = log ∫_τ P(R | τ) P(τ; θ) dτ High dimensional trajectory space : The integral over all trajectories is intractable Are we doomed again? No... EM can help us out
34 EM for Reinforcement Learning EM can help us out Use a proposal distribution P(τ) over trajectories E-step : Estimate the probability that trajectory τ has created the reward event : P(τ) = P(τ | R; θ^{t−1}) = P(R | τ) P(τ; θ^{t−1}) / P(R; θ^{t−1}) ∝ P(R | τ) P(τ; θ^{t−1}) P(τ) is also called the reward-weighted model distribution.
35 EM for Reinforcement Learning M-step : θ^t = argmax_θ Q(θ) = ∫_τ P(τ) log (P(R | τ) P(τ; θ)) dτ = ∫_τ P(R | τ) P(τ; θ^{t−1}) log P(τ; θ) dτ + const If we use samples τ_j ∼ P(τ; θ^{t−1}), this integral can be efficiently approximated : L(θ) ≈ Σ_{τ_j} P(R | τ_j) log P(τ_j; θ) This is again just the weighted maximum likelihood solution : each trajectory is weighted by its reward probability exp(βR_{τ_j})
36 Summary : EM for Reinforcement Learning Start with an initial distribution P(τ; θ^0) For t = 1 ... L : Sample N trajectories from P(τ; θ^{t−1}) Weight each trajectory by the probability w_i ∝ exp(βR_{τ_i}) that it created the reward event Estimate the new model parameters θ^t by a weighted maximum likelihood estimate
37 Illustration : 1-step RL Problem 2-dimensional action space, no states Reward Function : r(a) = −(a − a*)^T D (a − a*) Show matlab demo...
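The 1-step demo can be sketched as a full reward-weighted EM loop (a*, D, β, and the sample sizes are my choices for illustration, not the values from the Matlab demo):

```python
import numpy as np

rng = np.random.default_rng(0)
a_star = np.array([2.0, -1.0])                 # unknown optimal action (assumed)
D = np.diag([1.0, 3.0])
reward = lambda A: -np.einsum('nd,dk,nk->n', A - a_star, D, A - a_star)

mu, Sigma, beta = np.zeros(2), 4.0 * np.eye(2), 1.0
for _ in range(40):
    A = rng.multivariate_normal(mu, Sigma, size=200)   # sample actions from the current policy
    r = reward(A)
    w = np.exp(beta * (r - r.max()))                   # E-step: reward weights exp(beta * R)
    w /= w.sum()
    mu = w @ A                                         # M-step: weighted ML mean
    diff = A - mu
    Sigma = (w * diff.T) @ diff + 1e-6 * np.eye(2)     # weighted ML covariance (+ jitter)
```

Over the iterations the search distribution contracts around a*; no gradients or learning rates are needed.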
38 Problems : 1-step RL Problem 2-dimensional action space, multi-modal solution space Reward Function : r(a) = max(−(a − a*_1)^T D (a − a*_1), −(a − a*_2)^T D (a − a*_2)) Show matlab demo... Current master thesis of Chris
39 Using Linear Features... 2 different models have been used Reward-Weighted Regression (RWR) : a = θ^T φ(s) + ε Adds noise to the action vector... Policy-learning by Weighting Exploration with Returns (PoWER) : a = (θ + ε)^T φ(s) Adds noise to the parameter vector... with ε ∼ N(0, σ²I) Both will be covered in more detail by Jan...
41 Reward Weighted Regression a = θ^T φ(s) + ε : Model for the Policy : π(a | s; θ) = N(a | θ^T φ(s), σ²I) In the M-step we have to maximize : argmax_θ −Σ_j exp(βR_j) (a_j − θ^T φ(s_j))² i.e. minimize the reward-weighted squared error. Looks familiar...?
42 Reward Weighted Regression In the M-step we have to minimize the reward-weighted squared error : argmin_θ Σ_j exp(βR_j) (a_j − θ^T φ(s_j))² This is just a weighted linear regression problem! θ = (Φ^T R Φ)^{−1} Φ^T R A with... Φ = [φ(s_1), φ(s_2), ..., φ(s_N)]^T R = diag([exp(βR_j)]) A = [a_1, a_2, ..., a_N]^T
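The closed-form solution is a few lines of NumPy; a sketch on synthetic data (features, noise level, reward weights, and the true θ are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta = 500, 1.0
S = rng.uniform(-1, 1, (N, 1))                        # states
Phi = np.hstack([S, np.ones((N, 1))])                 # features phi(s) = [s, 1]
theta_true = np.array([1.5, -0.3])
A = Phi @ theta_true + 0.05 * rng.standard_normal(N)  # a = theta^T phi(s) + noise

R = np.diag(np.exp(beta * rng.uniform(-1, 0, N)))     # reward weights exp(beta * R_j)
theta = np.linalg.solve(Phi.T @ R @ Phi, Phi.T @ R @ A)   # theta = (Phi^T R Phi)^-1 Phi^T R A
```

Using `np.linalg.solve` rather than an explicit matrix inverse is the numerically preferred way to evaluate (Φ^T R Φ)^{−1} Φ^T R A.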
43 Things you can do... Ball in the Cup EM-based Reinforcement Learning Robot Learning, WS 2011
44 Things you can do... Dart : Playing around the clock
45 Things you can do... Robot Balancing for different forces...
46 Extensions / Not covered... Similar EM-based approach to estimate the V-function (Neumann & Peters, 2009) Variational inference approach which has better properties in case of a multi-modal solution-space (Neumann, 2011) How to choose β? Similar, but better : Relative Entropy Policy Search (REPS) (Peters et al., 2010) Bound the distance between two subsequent policies
47 Possible Projects / Bachelor Thesis... Let's play table tennis...! Final Setup : 2 robots playing against each other... We will also get the real robots...
48 Let's play table tennis...! Use EM-based algorithms for... Learning when to intercept the ball Learning to smash Learning to stop the ball Learning to play the ball with spin
49 The end Thanks for your attention!
50 Bibliography I Neumann, G., & Peters, J. (2009). Fitted Q-Iteration by Advantage Weighted Regression. In: Advances in Neural Information Processing Systems 22 (NIPS 2008). Cambridge, MA: MIT Press. Neumann, G. (2011). Variational Inference for Policy Search in Changing Situations. In: Getoor, L., & Scheffer, T. (eds), Proceedings of the 28th International Conference on Machine Learning (ICML-11). New York, NY, USA: ACM.
51 Bibliography II Peters, J., Mülling, K., & Altun, Y. (2010). Relative Entropy Policy Search. In: AAAI.
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationLikelihood, MLE & EM for Gaussian Mixture Clustering. Nick Duffield Texas A&M University
Likelihood, MLE & EM for Gaussian Mixture Clustering Nick Duffield Texas A&M University Probability vs. Likelihood Probability: predict unknown outcomes based on known parameters: P(x q) Likelihood: estimate
More informationLecture 6: Graphical Models: Learning
Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Multivariate Gaussians Mark Schmidt University of British Columbia Winter 2019 Last Time: Multivariate Gaussian http://personal.kenyon.edu/hartlaub/mellonproject/bivariate2.html
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Expectation Maximization (EM) and Mixture Models Hamid R. Rabiee Jafar Muhammadi, Mohammad J. Hosseini Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2 Agenda Expectation-maximization
More informationBayesian Networks Inference with Probabilistic Graphical Models
4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationReinforcement Learning
Reinforcement Learning Function approximation Mario Martin CS-UPC May 18, 2018 Mario Martin (CS-UPC) Reinforcement Learning May 18, 2018 / 65 Recap Algorithms: MonteCarlo methods for Policy Evaluation
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationThe Expectation Maximization or EM algorithm
The Expectation Maximization or EM algorithm Carl Edward Rasmussen November 15th, 2017 Carl Edward Rasmussen The EM algorithm November 15th, 2017 1 / 11 Contents notation, objective the lower bound functional,
More informationCSC411: Final Review. James Lucas & David Madras. December 3, 2018
CSC411: Final Review James Lucas & David Madras December 3, 2018 Agenda 1. A brief overview 2. Some sample questions Basic ML Terminology The final exam will be on the entire course; however, it will be
More informationMachine Learning! in just a few minutes. Jan Peters Gerhard Neumann
Machine Learning! in just a few minutes Jan Peters Gerhard Neumann 1 Purpose of this Lecture Foundations of machine learning tools for robotics We focus on regression methods and general principles Often
More informationAn Introduction to Expectation-Maximization
An Introduction to Expectation-Maximization Dahua Lin Abstract This notes reviews the basics about the Expectation-Maximization EM) algorithm, a popular approach to perform model estimation of the generative
More informationOptimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.
Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may
More informationChapter 14 Combining Models
Chapter 14 Combining Models T-61.62 Special Course II: Pattern Recognition and Machine Learning Spring 27 Laboratory of Computer and Information Science TKK April 3th 27 Outline Independent Mixing Coefficients
More informationIntroduction to Gaussian Process
Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression
More informationPattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM
Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures
More informationClustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning
Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades
More informationLecture 6: April 19, 2002
EE596 Pat. Recog. II: Introduction to Graphical Models Spring 2002 Lecturer: Jeff Bilmes Lecture 6: April 19, 2002 University of Washington Dept. of Electrical Engineering Scribe: Huaning Niu,Özgür Çetin
More information