Outline Lecture 2

Lecture 2: Linear Regression and Classification

"it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones"

Thomas Schön
Division of Automatic Control, Linköping University, Linköping, Sweden.

Outline:
1. Summary of Lecture 1
2. Bayesian linear regression
3. Motivation of kernel methods
4. Linear classification
   1. Problem setup
   2. Discriminant functions (mainly least squares)
   3. Probabilistic generative models
   4. Logistic regression (discriminative model)
   5. Bayesian logistic regression
(Chapter 3.3-4)

Summary of Lecture 1 (I/VI)

The exponential family of distributions over x, parameterized by η,

    p(x | η) = h(x) g(η) exp(η^T u(x)).

One important member is the Gaussian density, which is commonly used as a building block in more sophisticated models. Important basic properties were provided.

The idea underlying maximum likelihood is that the parameters θ should be chosen in such a way that the measurements {x_i}_{i=1}^N are as likely as possible, i.e.,

    θ_hat = arg max_θ p(x_1, ..., x_N | θ).

Summary of Lecture 1 (II/VI)

The three basic steps of Bayesian modeling (all variables are modeled as stochastic):
1. Assign prior distributions p(θ) to all unknown parameters θ.
2. Write down the likelihood p(x_1, ..., x_N | θ) of the data x_1, ..., x_N given the parameters θ.
3. Determine the posterior distribution of the parameters given the data,

    p(θ | x_1, ..., x_N) = p(x_1, ..., x_N | θ) p(θ) / p(x_1, ..., x_N) ∝ p(x_1, ..., x_N | θ) p(θ).

If the posterior p(θ | x_1, ..., x_N) and the prior p(θ) distributions are of the same functional form, they are conjugate distributions and the prior is said to be a conjugate prior for the likelihood. A numerical sketch of these three steps for a conjugate pair is given below.
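As a concrete illustration of the three steps, here is a minimal sketch for the classic conjugate pair: a Gaussian likelihood with known variance and a Gaussian prior on its mean. The model and all numerical values are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

# Hypothetical conjugate example: infer the mean theta of a Gaussian with
# known variance sigma2, using a Gaussian prior N(m0, s0^2) on theta.
rng = np.random.default_rng(0)
sigma2, theta_true = 0.5**2, 1.3
x = rng.normal(theta_true, np.sqrt(sigma2), size=20)   # data x_1, ..., x_N

m0, s02 = 0.0, 1.0          # 1. prior p(theta) = N(theta | m0, s0^2)
N = len(x)                  # 2. likelihood: prod_n N(x_n | theta, sigma2)

# 3. posterior: Gaussian again (conjugacy), with
#    1/sN^2 = 1/s0^2 + N/sigma2 and mN = sN^2 (m0/s0^2 + sum_n x_n/sigma2)
sN2 = 1.0 / (1.0 / s02 + N / sigma2)
mN = sN2 * (m0 / s02 + x.sum() / sigma2)
print(f"posterior p(theta | x) = N({mN:.3f}, {sN2:.4f})")
```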

Summary of Lecture 1 (III/VI)

Modeling heavy tails using the Student's t-distribution,

    St(x | µ, λ, ν) = ∫ N(x | µ, (ηλ)^{-1}) Gam(η | ν/2, ν/2) dη
                    = (Γ(ν/2 + 1/2) / Γ(ν/2)) (λ / (πν))^{1/2} (1 + λ(x − µ)²/ν)^{−ν/2 − 1/2},

which according to the first expression can be interpreted as an infinite mixture of Gaussians with the same mean, but different variances.

[Figure: log densities of the Student's t and the Gaussian, illustrating the heavier tails of the former.]

Poor robustness is due to an unrealistic model; the ML estimator is inherently robust, provided we have the correct model.

Summary of Lecture 1 (IV/VI)

Linear regression models the relationship between a continuous target variable t and a possibly nonlinear function φ(x) of the input variable x,

    t_n = w^T φ(x_n) + ε_n,    where y(x_n, w) = w^T φ(x_n).

We solved this problem using
1. maximum likelihood (ML),
2. a Bayesian approach.

ML with a Gaussian noise model is equivalent to least squares (LS).

Summary of Lecture 1 (V/VI)

Theorem (Gauss-Markov): In a linear regression model T = Φw + E, where E is white noise with zero mean and covariance R, the best linear unbiased estimate (BLUE) of w is

    ŵ = (Φ^T R^{-1} Φ)^{-1} Φ^T R^{-1} T,    Cov(ŵ) = (Φ^T R^{-1} Φ)^{-1}.

Summary of Lecture 1 (VI/VI)

Two potentially biased estimators are ridge regression (p = 2) and the Lasso (p = 1),

    min_w Σ_{n=1}^N (t_n − w^T φ(x_n))²    s.t.  Σ_{j=1}^M |w_j|^p ≤ η,

which using a Lagrange multiplier λ can be stated as

    min_w Σ_{n=1}^N (t_n − w^T φ(x_n))² + λ Σ_{j=1}^M |w_j|^p.

Interpretation: the least squares estimator has the smallest mean square error (MSE) of all linear estimators with no bias, BUT there may exist a biased estimator with lower MSE.

Alternative interpretation: the MAP estimate with the Gaussian likelihood N(t_n | w^T φ(x_n), β^{-1}) together with a Gaussian prior leads to ridge regression, and together with a Laplacian prior it leads to the Lasso.
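As a small illustration of the p = 2 case, the following sketch compares the least squares and ridge estimates on synthetic data; the data, dimensions and λ are illustrative assumptions.

```python
import numpy as np

# Least squares vs. ridge regression (p = 2) on synthetic data.
rng = np.random.default_rng(1)
N, M = 20, 5
Phi = rng.normal(size=(N, M))               # design matrix, rows phi(x_n)^T
w_true = rng.normal(size=M)
t = Phi @ w_true + 0.1 * rng.normal(size=N)

# Least squares: w = (Phi^T Phi)^{-1} Phi^T t
w_ls = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# Ridge: w = (Phi^T Phi + lambda I)^{-1} Phi^T t, a biased but shrunk estimate
lam = 1.0
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ t)

print(np.linalg.norm(w_ls), np.linalg.norm(w_ridge))   # ridge shrinks w
```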

Bayesian Linear Regression - Example (I/VI)

Consider the problem of fitting a straight line to noisy measurements. Let the model be (t_n ∈ R, x_n ∈ R)

    t_n = w_0 + w_1 x_n + ε_n,    n = 1, ..., N,    (1)

where y(x_n, w) = w_0 + w_1 x_n, ε_n ~ N(0, 0.2²), and β = (1/0.2)² = 25. According to (1), the following identity basis functions are used:

    φ_0(x_n) = 1,    φ_1(x_n) = x_n.

Since the example lives in two dimensions, we can plot distributions to illustrate the inference.

Bayesian Linear Regression - Example (II/VI)

Let the true values for w be w = (−0.3, 0.5)^T (plotted using a white circle below). Generate synthetic measurements by

    t_n = w_0 + w_1 x_n + ε_n,    ε_n ~ N(0, 0.2²),    x_n ~ U(−1, 1).

Furthermore, let the prior be

    p(w) = N(w | (0, 0)^T, α^{-1} I),    α = 2.0.

Bayesian Linear Regression - Example (III/VI)

Plot of the situation before any data arrives: the prior p(w) = N(w | (0, 0)^T, α^{-1} I) together with a few realizations (straight lines) drawn from it.

Bayesian Linear Regression - Example (IV/VI)

Plot of the situation after one measurement has arrived:
- the likelihood (plotted as a function of w), p(t_1 | w) = N(t_1 | w_0 + w_1 x_1, β^{-1});
- the posterior/prior, p(w | t_1) = N(w | m_1, S_1), where m_1 = β S_1 Φ^T t_1 and S_1 = (αI + βΦ^T Φ)^{-1};
- a few realizations from the posterior and the first measurement (black circle).

A numerical sketch of these posterior computations is given below.
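A minimal sketch of the posterior computation, using the values of the example (α = 2.0, β = 25, w = (−0.3, 0.5)^T); the number of measurements is an arbitrary choice.

```python
import numpy as np

# Bayesian linear regression for the straight-line example.
rng = np.random.default_rng(2)
alpha, beta = 2.0, 25.0
w_true = np.array([-0.3, 0.5])

N = 30                                         # illustrative sample size
x = rng.uniform(-1.0, 1.0, size=N)
t = w_true[0] + w_true[1] * x + rng.normal(0.0, 0.2, size=N)

Phi = np.column_stack([np.ones(N), x])         # phi_0(x) = 1, phi_1(x) = x

# Posterior p(w | T) = N(w | m_N, S_N) with
#   S_N = (alpha I + beta Phi^T Phi)^{-1},  m_N = beta S_N Phi^T T
S_N = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

print(m_N)                                     # approaches w_true as N grows
```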

Bayesian Linear Regression - Example (V/VI)

Plot of the situation after two measurements have arrived:
- the likelihood (plotted as a function of w), p(t_2 | w) = N(t_2 | w_0 + w_1 x_2, β^{-1});
- the posterior/prior, p(w | T) = N(w | m_2, S_2), where m_2 = β S_2 Φ^T T and S_2 = (αI + βΦ^T Φ)^{-1};
- a few realizations from the posterior and the measurements (black circles).

Bayesian Linear Regression - Example (VI/VI)

Plot of the situation after three measurements have arrived:
- the likelihood (plotted as a function of w), p(t_3 | w) = N(t_3 | w_0 + w_1 x_3, β^{-1});
- the posterior/prior, p(w | T) = N(w | m_3, S_3), where m_3 = β S_3 Φ^T T and S_3 = (αI + βΦ^T Φ)^{-1};
- a few realizations from the posterior and the measurements (black circles).

Predictive Distribution - Example

Investigating the predictive distribution for the example above. [Figure: panels of the predictive distribution for an increasing number of observations, among them N = 5.] Each panel shows:
- the true system (y(x) = −0.3 + 0.5x) generating the data (red line);
- the mean of the predictive distribution (blue line);
- one standard deviation of the predictive distribution (gray shaded area); note that this is the point-wise predictive standard deviation as a function of x;
- the observations (black circles).

A sketch of how the predictive mean and standard deviation are computed follows below.

Posterior Distribution

Recall that the posterior distribution is given by

    p(w | T) = N(w | m_N, S_N),    m_N = β S_N Φ^T T,    S_N = (αI + βΦ^T Φ)^{-1}.

Let us now investigate the posterior mean solution m_N above, which has an interpretation that directly leads to the kernel methods (lecture 4), including Gaussian processes.
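Continuing the sketch above (it reuses m_N, S_N and beta defined there), the predictive mean and point-wise standard deviation plotted in the panels can be computed as follows; the grid of test inputs is an illustrative choice.

```python
import numpy as np

def predictive(x_star, m_N, S_N, beta):
    """Predictive mean and point-wise std of t at the new inputs x_star."""
    Phi_star = np.column_stack([np.ones_like(x_star), x_star])
    mean = Phi_star @ m_N
    # predictive variance: 1/beta + phi(x)^T S_N phi(x), per row phi(x)^T
    var = 1.0 / beta + np.einsum('ij,jk,ik->i', Phi_star, S_N, Phi_star)
    return mean, np.sqrt(var)

x_star = np.linspace(-1.0, 1.0, 101)
mean, std = predictive(x_star, m_N, S_N, beta)   # blue line and gray band
```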

Generative and Discriminative Models

Approaches that model the distributions of both the inputs and the outputs are known as generative models. The reason for the name is the fact that using these models we can generate new samples in the input space. Approaches that model the posterior probability directly are referred to as discriminative models.

ML for Probabilistic Generative Models (I/IV)

Consider the two-class case, where the class-conditional densities p(x | C_k) are Gaussian and the training data is given by {x_n, t_n}_{n=1}^N. Furthermore, assume that p(C_1) = α. The task is now to find the parameters α, µ_1, µ_2, Σ by maximizing the likelihood function,

    p(T, X | α, µ_1, µ_2, Σ) = Π_{n=1}^N (p(x_n, C_1))^{t_n} (p(x_n, C_2))^{1−t_n},

where

    p(x_n, C_1) = p(C_1) p(x_n | C_1) = α N(x_n | µ_1, Σ),
    p(x_n, C_2) = p(C_2) p(x_n | C_2) = (1 − α) N(x_n | µ_2, Σ).

ML for Probabilistic Generative Models (II/IV)

Let us now maximize the logarithm of the likelihood function,

    L(α, µ_1, µ_2, Σ) = ln Π_{n=1}^N (α N(x_n | µ_1, Σ))^{t_n} ((1 − α) N(x_n | µ_2, Σ))^{1−t_n}.

The terms that depend on α are Σ_{n=1}^N (t_n ln α + (1 − t_n) ln(1 − α)), which is maximized by

    α = (1/N) Σ_{n=1}^N t_n = N_1 / (N_1 + N_2)

(as expected), where N_k denotes the number of data points in class C_k. Straightforwardly we get

    µ_1 = (1/N_1) Σ_{n=1}^N t_n x_n,    µ_2 = (1/N_2) Σ_{n=1}^N (1 − t_n) x_n.

ML for Probabilistic Generative Models (III/IV)

The terms that depend on Σ are

    L(Σ) = −(1/2) Σ_{n=1}^N t_n ln det Σ − (1/2) Σ_{n=1}^N t_n (x_n − µ_1)^T Σ^{-1} (x_n − µ_1)
           − (1/2) Σ_{n=1}^N (1 − t_n) ln det Σ − (1/2) Σ_{n=1}^N (1 − t_n)(x_n − µ_2)^T Σ^{-1} (x_n − µ_2).

Using the fact that x^T A x = Tr(A x x^T) we have

    L(Σ) = −(N/2) ln det Σ − (N/2) Tr(Σ^{-1} S),

where

    S = (1/N) Σ_{n=1}^N (t_n (x_n − µ_1)(x_n − µ_1)^T + (1 − t_n)(x_n − µ_2)(x_n − µ_2)^T).

These closed-form estimates are illustrated in the sketch below.
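The following sketch evaluates these closed-form ML estimates on synthetic two-class data; the class means, covariance and sample size are illustrative assumptions.

```python
import numpy as np

# Closed-form ML estimates for the two-class Gaussian generative model.
rng = np.random.default_rng(3)
N = 200
t = (rng.uniform(size=N) < 0.4).astype(float)    # t_n = 1 for class C1
x = np.where(t[:, None] == 1.0,
             rng.normal([1.0, 1.0], 0.5, size=(N, 2)),     # class C1
             rng.normal([-1.0, -1.0], 0.5, size=(N, 2)))   # class C2

N1 = t.sum()
N2 = N - N1
alpha = N1 / N                                   # ML estimate of p(C1)
mu1 = (t[:, None] * x).sum(axis=0) / N1
mu2 = ((1 - t)[:, None] * x).sum(axis=0) / N2

# Shared covariance Sigma = S, the weighted within-class scatter
d1, d2 = x - mu1, x - mu2
S = (t[:, None, None] * np.einsum('ni,nj->nij', d1, d1)
     + (1 - t)[:, None, None] * np.einsum('ni,nj->nij', d2, d2)).sum(axis=0) / N

print(alpha, mu1, mu2, S, sep='\n')
```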

ML for Probabilistic Generative Models (IV/IV)

Lemma (useful matrix derivatives):

    ∂/∂M ln det M = M^{-T},    ∂/∂M Tr(M^{-1} S) = −M^{-T} S^T M^{-T}.

Differentiating L(Σ) = −(N/2) ln det Σ − (N/2) Tr(Σ^{-1} S) results in

    ∂L/∂Σ = −(N/2) Σ^{-T} + (N/2) Σ^{-T} S^T Σ^{-T} = 0.

Hence, Σ = S. More results on matrix derivatives are available in Magnus, J. R., & Neudecker, H. (1999). Matrix Differential Calculus with Applications in Statistics and Econometrics. 2nd Edition, Chichester, UK: Wiley.

Generalized Linear Models for Classification

In linear regression we made use of a linear model,

    t_n = y(x_n, w) + ε_n = w^T φ(x_n) + ε_n.

For classification problems the target variables are discrete or, slightly more generally, posterior probabilities in the range (0, 1). This is achieved using a so-called activation function f (whose inverse f^{-1} must exist),

    y(x) = f(w^T x + w_0).    (2)

Note that the decision surface corresponds to y(x) = constant, implying that w^T x + w_0 = constant. This means that the decision surface is a linear function of x, even if f is nonlinear. Hence the name generalized linear model for (2).

Gradient of L(w) for Logistic Regression (I/II)

The negative log-likelihood is

    L(w) = −Σ_{n=1}^N (t_n ln y_n + (1 − t_n) ln(1 − y_n)),

where

    y_n = σ(a_n) = 1 / (1 + exp(−a_n)),    a_n = w^T φ_n.

Using the chain rule we have

    ∂L/∂w = Σ_{n=1}^N (∂L/∂y_n)(∂y_n/∂a_n)(∂a_n/∂w),
    ∂L/∂y_n = −t_n/y_n + (1 − t_n)/(1 − y_n) = (y_n − t_n) / (y_n (1 − y_n)).

Gradient of L(w) for Logistic Regression (II/II)

Furthermore,

    ∂y_n/∂a_n = ∂σ(a_n)/∂a_n = σ(a_n)(1 − σ(a_n)) = y_n (1 − y_n),    ∂a_n/∂w = φ_n,

which results in the following expression for the gradient,

    ∂L/∂w = Σ_{n=1}^N (y_n − t_n) φ_n = Φ^T (Y − T),

where Φ is the design matrix with rows (φ_0(x_n), φ_1(x_n), ..., φ_M(x_n)), Y = (y_1, ..., y_N)^T and T = (t_1, ..., t_N)^T.
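A minimal sketch of using this gradient, here with plain gradient descent on synthetic data; the data, step size and iteration count are illustrative assumptions (Newton's method with the Hessian of the next slide converges much faster).

```python
import numpy as np

# Gradient descent on the logistic regression negative log-likelihood,
# using the gradient Phi^T (Y - T) derived above.
rng = np.random.default_rng(4)
N = 100
x = rng.normal(size=(N, 2))
t = (x @ np.array([1.5, -1.0]) + 0.3 * rng.normal(size=N) > 0).astype(float)

Phi = np.column_stack([np.ones(N), x])      # phi_0(x) = 1 acts as the bias
w = np.zeros(Phi.shape[1])

step = 0.1                                  # illustrative step size
for _ in range(500):
    y = 1.0 / (1.0 + np.exp(-Phi @ w))      # y_n = sigma(w^T phi_n)
    w -= step * Phi.T @ (y - t)             # gradient step

print(w)
```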

Hessian of L(w) for Logistic Regression

    H = ∂²L/∂w∂w^T = ∂/∂w^T Σ_{n=1}^N (y_n − t_n) φ_n = Σ_{n=1}^N y_n (1 − y_n) φ_n φ_n^T = Φ^T R Φ,

where R = diag(y_1(1 − y_1), ..., y_N(1 − y_N)).

Bayesian Logistic Regression

Recall that

    p(T | w) = Π_{n=1}^N (σ(w^T φ_n))^{t_n} (1 − σ(w^T φ_n))^{1−t_n}.

Hence, computing the posterior density

    p(w | T) = p(T | w) p(w) / p(T)

is intractable. We are forced to an approximation. Three alternatives:
1. Laplace approximation (this lecture)
2. VB & EP (lecture 5)
3. Sampling methods, e.g., MCMC (lecture 6)

Laplace Approximation (I/III)

The Laplace approximation is a simple approximation that is obtained by fitting a Gaussian centered around the (MAP) mode of the distribution. Consider the density function p(z) of a scalar stochastic variable z, given by

    p(z) = (1/Z) f(z),

where Z = ∫ f(z) dz is the normalization coefficient. We start by finding a mode z_0 of the density function,

    df(z)/dz |_{z=z_0} = 0.

Laplace Approximation (II/III)

Consider a Taylor expansion of ln f(z) around the mode z_0,

    ln f(z) ≈ ln f(z_0) + d ln f(z)/dz |_{z=z_0} (z − z_0) + (1/2) d² ln f(z)/dz² |_{z=z_0} (z − z_0)²
            = ln f(z_0) − (A/2)(z − z_0)²,    (3)

where the first-order term vanishes since z_0 is a mode, and

    A = −d² ln f(z)/dz² |_{z=z_0}.

Taking the exponential of both sides in the approximation (3) results in

    f(z) ≈ f(z_0) exp(−(A/2)(z − z_0)²).
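A minimal sketch of the Laplace approximation for a scalar density. The unnormalized density f(z) below is an assumed example (a Gaussian multiplied by a sigmoid, so skewed and non-Gaussian), not taken from the lecture; the second derivative is approximated by a finite difference.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def log_f(z):
    """ln f(z) for f(z) = exp(-z^2/2) * sigmoid(4 z), an assumed example."""
    return -0.5 * z**2 - np.log1p(np.exp(-4.0 * z))

# 1. Find a mode z0 of f (equivalently of ln f).
z0 = minimize_scalar(lambda z: -log_f(z)).x

# 2. A = -d^2 ln f(z)/dz^2 at z0, here via a central finite difference.
h = 1e-4
A = -(log_f(z0 + h) - 2.0 * log_f(z0) + log_f(z0 - h)) / h**2

# The Laplace approximation is q(z) = N(z | z0, 1/A).
print(f"mode z0 = {z0:.4f}, precision A = {A:.4f}")
```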

Laplace Approximation (III/III)

By normalizing this expression we have now obtained a Gaussian approximation,

    q(z) = (A/(2π))^{1/2} exp(−(A/2)(z − z_0)²),    A = −d² ln f(z)/dz² |_{z=z_0}.

The main limitation of the Laplace approximation is that it is a local method that only captures aspects of the true density around a specific value z_0.

Bayesian Logistic Regression (I/II)

The posterior is

    p(w | T) ∝ p(T | w) p(w),    (4)

where we assume a Gaussian prior p(w) = N(w | m_0, S_0) and make use of the Laplace approximation. Taking the logarithm on both sides of (4) gives

    ln p(w | T) = −(1/2)(w − m_0)^T S_0^{-1} (w − m_0) + Σ_{n=1}^N (t_n ln y_n + (1 − t_n) ln(1 − y_n)) + const.,

where y_n = σ(w^T φ_n).

Bayesian Logistic Regression (II/II)

Using the Laplace approximation we can now obtain a Gaussian approximation

    q(w) = N(w | w_MAP, S_N),

where w_MAP is the MAP estimate of p(w | T) and the covariance S_N is obtained from the Hessian of the negative log-posterior,

    S_N^{-1} = −∇∇ ln p(w | T) = S_0^{-1} + Σ_{n=1}^N y_n (1 − y_n) φ_n φ_n^T.

Based on this distribution we can now start making predictions for new input data φ(x), which is typically what we are interested in. Recall that prediction corresponds to marginalization w.r.t. w. (See the sketch following the summary below.)

A Few Concepts to Summarize Lecture 2

Hyperparameter: a parameter of the prior distribution that controls the distribution of the parameters of the model.
Classification: the goal of classification is to assign an input vector x to one of K classes, C_k, k = 1, ..., K.
Discriminant: a function that takes an input x and assigns it to one of K classes.
Generative models: approaches that model the distributions of both the inputs and the outputs. In classification this amounts to modelling the class-conditional densities p(x | C_k), as well as the prior densities p(C_k). The reason for the name is the fact that using these models we can generate new samples in the input space.
Discriminative models: approaches that model the posterior probability directly.
Logistic regression: a discriminative model that makes direct use of a generalized linear model in the form of a logistic sigmoid to solve the classification problem.
Laplace approximation: a local approximation method that finds the mode of the posterior distribution and then fits a Gaussian centered at that mode.
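Putting the pieces together, here is a minimal sketch of Bayesian logistic regression with the Laplace approximation: w_MAP is found with Newton's method on the negative log-posterior, and S_N follows from its Hessian. The prior N(0, α^{-1} I) and the synthetic data are illustrative assumptions.

```python
import numpy as np

# Laplace approximation q(w) = N(w | w_MAP, S_N) for logistic regression.
rng = np.random.default_rng(5)
N, alpha = 100, 1.0
x = rng.normal(size=(N, 2))
t = (x @ np.array([2.0, -1.0]) > 0).astype(float)
Phi = np.column_stack([np.ones(N), x])
M = Phi.shape[1]

S0_inv = alpha * np.eye(M)                     # prior precision, m_0 = 0
w = np.zeros(M)
for _ in range(20):                            # Newton iterations to w_MAP
    y = 1.0 / (1.0 + np.exp(-Phi @ w))
    grad = Phi.T @ (y - t) + S0_inv @ w        # gradient of -ln p(w | T)
    H = Phi.T @ ((y * (1 - y))[:, None] * Phi) + S0_inv
    w -= np.linalg.solve(H, grad)

y = 1.0 / (1.0 + np.exp(-Phi @ w))             # Hessian at (approximate) w_MAP
H = Phi.T @ ((y * (1 - y))[:, None] * Phi) + S0_inv
w_map, S_N = w, np.linalg.inv(H)
print(w_map)
```

Predictions for a new φ(x) can then be approximated by drawing samples of w from q(w) and averaging σ(w^T φ(x)), which approximates the marginalization over w.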
