SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA. Sistemi di Elaborazione dell'Informazione. Regressione. Ruggero Donida Labati

SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA Sistemi di Elaborazione dell'Informazione Regressione Ruggero Donida Labati Dipartimento di Informatica via Bramante 65, 26013 Crema (CR), Italy http://homes.di.unimi.it/donida ruggero.donida@unimi.it

Regression The following slides are from: C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

Polynomial Curve Fitting

Sum-of-Squares Error Function
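For reference, the error function this slide refers to in the cited textbook measures the misfit between the fitted polynomial y(x, w) and the training targets t_n:

E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2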

0th Order Polynomial

1st Order Polynomial

3rd Order Polynomial

9th Order Polynomial

Over-fitting Root-Mean-Square (RMS) Error:
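The root-mean-square error used in the textbook to compare training and test performance on a common scale is

E_{\mathrm{RMS}} = \sqrt{2 E(\mathbf{w}^{\star}) / N}

where \mathbf{w}^{\star} denotes the fitted coefficients and N the number of data points.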

Polynomial Coefficients

9th Order Polynomial Data Set Size:

9th Order Polynomial Data Set Size:

Regularization: penalize large coefficient values
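The penalized error function from the cited textbook adds a quadratic penalty on the coefficients, with \lambda controlling the strength of the penalty:

\tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2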

Regularization:

Regularization:

Regularization: E_RMS vs. ln λ

Polynomial Coefficients

How Bayesian Inference Works http://www.datasciencecentral.com/profiles/blogs/how-bayesian-inference-works

Apples and Oranges Probability Theory

Probability Theory Marginal Probability Joint Probability Conditional Probability

Probability Theory Sum Rule Product Rule

The Rules of Probability Sum Rule Product Rule
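In symbols, the two rules are:

Sum rule: p(X) = \sum_{Y} p(X, Y)
Product rule: p(X, Y) = p(Y \mid X)\, p(X)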

Bayes' Theorem: posterior ∝ likelihood × prior. For data D and model parameters w, p(\mathbf{w} \mid D) = p(D \mid \mathbf{w})\, p(\mathbf{w}) / p(D). Alternatively: posterior = (likelihood × prior) / evidence.

The example: p(B = r) = 4/10, p(B = b) = 6/10, p(F = a | B = r) = 1/4, p(F = o | B = r) = 3/4, p(F = a | B = b) = 3/4, p(F = o | B = b) = 1/4.
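A worked application of the two rules to these numbers (the fruit-box example from the textbook): the marginal probability of drawing an apple, and the posterior probability that the red box was chosen given that an orange was drawn.

p(F = a) = p(F = a | B = r) p(B = r) + p(F = a | B = b) p(B = b) = (1/4)(4/10) + (3/4)(6/10) = 11/20
p(F = o) = 1 - 11/20 = 9/20
p(B = r | F = o) = p(F = o | B = r) p(B = r) / p(F = o) = (3/4)(4/10) / (9/20) = 2/3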

Probability Densities

Expectations Conditional Expectation (discrete) Approximate Expectation (discrete and continuous)

Variances and Covariances

The Gaussian Distribution
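The univariate Gaussian density referred to on this slide:

\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}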

Gaussian Mean and Variance

The Multivariate Gaussian
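Its D-dimensional generalization, with mean vector \boldsymbol{\mu} and covariance matrix \boldsymbol{\Sigma}:

\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right\}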

Gaussian Parameter Estimation Likelihood function

Maximum (Log) Likelihood

Maximum Likelihood: determine \mathbf{w}_{ML} by minimizing the sum-of-squares error E(\mathbf{w}).

Curve Fitting Re-visited

Linear Models for Regression

Linear Basis Function Models (1) Example: Polynomial Curve Fitting

Linear Basis Function Models (2)
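The general model discussed here is linear in the parameters w but can be non-linear in the input x through the basis functions \phi_j:

y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^{\top} \boldsymbol{\phi}(\mathbf{x}), \quad \text{with } \phi_0(\mathbf{x}) = 1 \text{ so that } w_0 \text{ acts as a bias term.}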

Linear Basis Function Models (3) Polynomial basis functions: \phi_j(x) = x^j. These are global: a small change in x affects all basis functions.

Linear Basis Function Models (4) Gaussian basis functions: \phi_j(x) = \exp\{-(x-\mu_j)^2/(2s^2)\}. These are local: a small change in x only affects nearby basis functions. \mu_j and s control location and scale (width).

Linear Basis Function Models (5) Sigmoidal basis functions: \phi_j(x) = \sigma((x-\mu_j)/s), where \sigma(a) = 1/(1+e^{-a}). These too are local: a small change in x only affects nearby basis functions. \mu_j and s control location and scale (slope).
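A minimal NumPy sketch of the three basis-function families just listed, used to build a design matrix Φ with one row per input and one column per basis function. Function and variable names are illustrative, not taken from the slides.

```python
import numpy as np

def polynomial_basis(x, degree):
    """Global basis: phi_j(x) = x**j for j = 0..degree."""
    return np.vstack([x**j for j in range(degree + 1)]).T

def gaussian_basis(x, centers, s):
    """Local basis: phi_j(x) = exp(-(x - mu_j)**2 / (2 * s**2))."""
    return np.exp(-(x[:, None] - centers[None, :])**2 / (2.0 * s**2))

def sigmoidal_basis(x, centers, s):
    """Local basis: phi_j(x) = sigma((x - mu_j) / s), sigma(a) = 1 / (1 + exp(-a))."""
    a = (x[:, None] - centers[None, :]) / s
    return 1.0 / (1.0 + np.exp(-a))

x = np.linspace(0.0, 1.0, 10)                                   # example inputs
Phi = gaussian_basis(x, centers=np.linspace(0, 1, 5), s=0.2)
print(Phi.shape)                                                # (10, 5)
```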

Maximum Likelihood and Least Squares (1) Assume observations from a deterministic function with added Gaussian noise: t = y(\mathbf{x}, \mathbf{w}) + \epsilon, where p(\epsilon \mid \beta) = \mathcal{N}(\epsilon \mid 0, \beta^{-1}), which is the same as saying p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1}). Given observed inputs \mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\} and targets \mathbf{t} = (t_1, \ldots, t_N)^{\top}, we obtain the likelihood function p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid \mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1}).

Maximum Likelihood and Least Squares (2) Taking the logarithm, we get \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi) - \beta E_D(\mathbf{w}), where E_D(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{t_n - \mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_n)\}^2 is the sum-of-squares error.

Maximum Likelihood and Least Squares (3) Computing the gradient with respect to w and setting it to zero yields \nabla \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \beta \sum_{n=1}^{N} \{t_n - \mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_n)\}\, \boldsymbol{\phi}(\mathbf{x}_n)^{\top} = 0. Solving for w, we get \mathbf{w}_{ML} = (\boldsymbol{\Phi}^{\top}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\top}\mathbf{t} = \boldsymbol{\Phi}^{\dagger}\mathbf{t}, where \boldsymbol{\Phi}^{\dagger} \equiv (\boldsymbol{\Phi}^{\top}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\top} is the Moore-Penrose pseudo-inverse and \boldsymbol{\Phi} is the N \times M design matrix with elements \Phi_{nj} = \phi_j(\mathbf{x}_n).
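A small sketch of this closed-form maximum-likelihood solution, assuming a Gaussian design matrix like the one in the previous sketch (data, basis centers and widths are illustrative choices; a stable least-squares solver is used rather than forming the pseudo-inverse explicitly).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=25)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)   # sinusoid + Gaussian noise

centers = np.linspace(0.0, 1.0, 9)
Phi = np.exp(-(x[:, None] - centers[None, :])**2 / (2 * 0.1**2))  # Gaussian design matrix

# w_ML = (Phi^T Phi)^{-1} Phi^T t, computed via least squares
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# Predictions at new inputs
x_new = np.linspace(0.0, 1.0, 5)
Phi_new = np.exp(-(x_new[:, None] - centers[None, :])**2 / (2 * 0.1**2))
print(Phi_new @ w_ml)
```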

Sequential Learning Data items considered one at a time (a.k.a. online learning); use stochastic (sequential) gradient descent: \mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} + \eta \{t_n - \mathbf{w}^{(\tau)\top}\boldsymbol{\phi}(\mathbf{x}_n)\}\, \boldsymbol{\phi}(\mathbf{x}_n). This is known as the least-mean-squares (LMS) algorithm. Issue: how to choose the learning rate \eta?
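A sketch of the LMS update just described, processing one (φ_n, t_n) pair at a time; eta is the learning rate whose choice the slide flags as the open issue, and its value here is purely illustrative.

```python
import numpy as np

def lms_fit(Phi, t, eta=0.05, epochs=50):
    """Least-mean-squares: w <- w + eta * (t_n - w.phi_n) * phi_n, one sample at a time."""
    w = np.zeros(Phi.shape[1])
    for _ in range(epochs):
        for phi_n, t_n in zip(Phi, t):
            w += eta * (t_n - w @ phi_n) * phi_n
    return w
```

With Phi and t as in the previous sketch, w = lms_fit(Phi, t) drifts toward the batch solution w_ml for a suitably small eta.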

Regularized Least Squares Consider the error function E_D(\mathbf{w}) + \lambda E_W(\mathbf{w}) (data term + regularization term). With the sum-of-squares error function and a quadratic regularizer, we get \frac{1}{2}\sum_{n=1}^{N}\{t_n - \mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_n)\}^2 + \frac{\lambda}{2}\mathbf{w}^{\top}\mathbf{w}, which is minimized by \mathbf{w} = (\lambda\mathbf{I} + \boldsymbol{\Phi}^{\top}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\top}\mathbf{t}. \lambda is called the regularization coefficient.

The Bias-Variance Decomposition (1) Recall the expected squared loss E[L] = \int \{y(\mathbf{x}) - h(\mathbf{x})\}^2 p(\mathbf{x})\,d\mathbf{x} + \iint \{h(\mathbf{x}) - t\}^2 p(\mathbf{x}, t)\,d\mathbf{x}\,dt, where h(\mathbf{x}) = E[t \mid \mathbf{x}] = \int t\, p(t \mid \mathbf{x})\,dt. The second term of E[L] corresponds to the noise inherent in the random variable t. What about the first term?

The Bias-Variance Decomposition (2) Suppose we were given multiple data sets, each of size N. Any particular data set, D, will give a particular function y(\mathbf{x}; D). We then have \{y(\mathbf{x}; D) - h(\mathbf{x})\}^2 = \{y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)]\}^2 + 2\{y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)]\}\{E_D[y(\mathbf{x}; D)] - h(\mathbf{x})\} + \{E_D[y(\mathbf{x}; D)] - h(\mathbf{x})\}^2.

The Bias-Variance Decomposition (3) Taking the expectation over D yields (the cross term vanishes) E_D[\{y(\mathbf{x}; D) - h(\mathbf{x})\}^2] = \{E_D[y(\mathbf{x}; D)] - h(\mathbf{x})\}^2 + E_D[\{y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)]\}^2], i.e. (bias)^2 + variance at each point x.

The Bias-Variance Decomposition (4) Thus we can write expected loss = (bias)^2 + variance + noise, where (\text{bias})^2 = \int \{E_D[y(\mathbf{x}; D)] - h(\mathbf{x})\}^2 p(\mathbf{x})\,d\mathbf{x}, \text{variance} = \int E_D[\{y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)]\}^2]\, p(\mathbf{x})\,d\mathbf{x}, and \text{noise} = \iint \{h(\mathbf{x}) - t\}^2 p(\mathbf{x}, t)\,d\mathbf{x}\,dt.

The Bias-Variance Decomposition (5) Example: 25 data sets from the sinusoidal, varying the degree of regularization, λ.

The Bias-Variance Decomposition (6) Example: 25 data sets from the sinusoidal, varying the degree of regularization, λ.

The Bias-Variance Decomposition (7) Example: 25 data sets from the sinusoidal, varying the degree of regularization, λ.

The Bias-Variance Trade-off From these plots, we note that an over-regularized model (large λ) will have a high bias, while an under-regularized model (small λ) will have a high variance.

Bayesian Linear Regression (1) Define a conjugate Gaussian prior over w: p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0). Combining this with the likelihood function and using results for marginal and conditional Gaussian distributions gives the posterior p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N), where \mathbf{m}_N = \mathbf{S}_N(\mathbf{S}_0^{-1}\mathbf{m}_0 + \beta\boldsymbol{\Phi}^{\top}\mathbf{t}) and \mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta\boldsymbol{\Phi}^{\top}\boldsymbol{\Phi}.

Bayesian Linear Regression (2) A common choice for the prior is the zero-mean isotropic Gaussian p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I}), for which \mathbf{m}_N = \beta\mathbf{S}_N\boldsymbol{\Phi}^{\top}\mathbf{t} and \mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\boldsymbol{\Phi}^{\top}\boldsymbol{\Phi}. Next we consider an example.

Bayesian Linear Regression (3) 0 data points observed Prior Data Space

Bayesian Linear Regression (4) 1 data point observed Likelihood Posterior Data Space

Bayesian Linear Regression (5) 2 data points observed Likelihood Posterior Data Space

Bayesian Linear Regression (6) 20 data points observed Likelihood Posterior Data Space
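To make the sequential example concrete, here is a minimal sketch of the posterior update with the isotropic prior: after each batch of observations the posterior mean and covariance are recomputed from S_N^{-1} = αI + βΦᵀΦ and m_N = βS_NΦᵀt. The straight-line basis and the values of alpha, beta and the noise level are illustrative choices mirroring the textbook's example, not taken from the slides.

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """Posterior N(w | m_N, S_N) for the zero-mean isotropic prior."""
    S_N_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

rng = np.random.default_rng(1)
a0, a1 = -0.3, 0.5                                   # true straight-line parameters
x = rng.uniform(-1.0, 1.0, size=20)
t = a0 + a1 * x + rng.normal(scale=0.2, size=x.shape)

Phi = np.column_stack([np.ones_like(x), x])          # basis (1, x): y = w0 + w1 * x
for n in (1, 2, 20):                                 # posterior after 1, 2 and 20 points
    m_N, S_N = posterior(Phi[:n], t[:n], alpha=2.0, beta=25.0)
    print(n, m_N)                                    # the mean approaches (a0, a1) as n grows
```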