Machine Learning. 7. Logistic and Linear Regression

Sapienza University of Rome, Italy - Machine Learning (2017/2018)

University of Rome La Sapienza
Master in Artificial Intelligence and Robotics

Machine Learning
7. Logistic and Linear Regression

Luca Iocchi, Valsamis Ntouskos

Probabilistic Generative Models

Consider first the case of two classes. Find the conditional probability:

P(C_1 | x) = p(x | C_1) P(C_1) / [ p(x | C_1) P(C_1) + p(x | C_2) P(C_2) ] = 1 / (1 + exp(−α)) = σ(α),

with:

α = ln [ p(x | C_1) P(C_1) / ( p(x | C_2) P(C_2) ) ]

and σ(α) = 1 / (1 + exp(−α)) the sigmoid function.

Probabilistic Generative Models

Assume p(x | C_i) = N(x | µ_i, Σ), i.e. Gaussian class-conditionals with the same covariance matrix. We get:

P(C_1 | x) = σ(w^T x + w_0),

with:

w = Σ^{−1} (µ_1 − µ_2),
w_0 = −(1/2) µ_1^T Σ^{−1} µ_1 + (1/2) µ_2^T Σ^{−1} µ_2 + ln [ P(C_1) / P(C_2) ].

Note that: P(C_2 | x) = 1 − P(C_1 | x).

Number of parameters: M(M + 5)/2 + 1 = 2M (for µ_1, µ_2) + M(M + 1)/2 (for Σ) + 1 (for P(C_1)).
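As a minimal sketch of how these generative-model parameters could be estimated from labelled data (assuming NumPy; the names X1, X2 for the per-class sample arrays are illustrative, not from the slides):

```python
import numpy as np

def fit_gaussian_generative(X1, X2):
    """Two-class Gaussian generative model with shared covariance.

    X1, X2: arrays of shape (N1, M) and (N2, M) with the samples of C1 and C2.
    Returns (w, w0) such that P(C1 | x) = sigmoid(w @ x + w0).
    """
    N1, N2 = len(X1), len(X2)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Shared covariance: weighted average of the per-class covariances.
    S1 = np.cov(X1, rowvar=False, bias=True)
    S2 = np.cov(X2, rowvar=False, bias=True)
    Sigma = (N1 * S1 + N2 * S2) / (N1 + N2)
    Sigma_inv = np.linalg.inv(Sigma)
    # Formulas from the slide: w = Sigma^{-1}(mu1 - mu2), plus the bias w0.
    w = Sigma_inv @ (mu1 - mu2)
    w0 = (-0.5 * mu1 @ Sigma_inv @ mu1
          + 0.5 * mu2 @ Sigma_inv @ mu2
          + np.log(N1 / N2))   # priors estimated by class frequencies
    return w, w0
```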

Probabilistic Generative Models

[Figure]

Logistic regression - Model

P(C_1 | x) = σ(w^T x + w_0)

[Figure: two 1D examples of the sigmoid posterior P(C_1 | x), rising from 0 to 1 as x increases.]

Decision rule: c = C_1 ⟺ P(c = C_1 | x) > 0.5.

Number of parameters: M + 1.
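A quick illustration of the model and decision rule above (a sketch assuming NumPy; w and w0 would come from training, e.g. the generative fit sketched earlier):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def predict(x, w, w0):
    """Return True for class C1 iff P(C1 | x) = sigmoid(w^T x + w0) > 0.5,
    which is equivalent to testing w^T x + w0 > 0."""
    return sigmoid(w @ x + w0) > 0.5
```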

Logistic regression - Model

Varying the parameters w in 2D

[Figure: grid of surface plots of σ(w^T x) over (x_1, x_2) for different weight vectors W, showing how the weights control the orientation and steepness of the decision surface.]

Logistic regression - Algorithms

How are the parameters w computed, given a dataset D? We will see this in a while...

Linear Regression

Goal: estimate the value t of a continuous function at x, based on a dataset D composed of N observations {x_n}, n = 1, ..., N, together with the corresponding target values {t_n}.

Ideally: t = y(x, w).

Linear Regression - Model

Linear Basis Function Models

Simplest case:

y(x, w) = w_0 + w_1 x_1 + ... + w_D x_D = w^T x,

with x = (1, x_1, ..., x_D)^T and w = (w_0, w_1, ..., w_D)^T.

Linear both in the model parameters w and in the variables x. Too limiting!

Example - Line fitting

y = w_1 x + w_0

[Figure: fitted line (prediction) plotted against the true line (truth).]

Linear Regression - Model

Linear Basis Function Models

Using nonlinear functions of the input variables:

y(x, w) = Σ_{j=0}^{M} w_j φ_j(x) = w^T φ(x),

with φ_0(x) = 1 and φ = (φ_0, ..., φ_M)^T.

Still linear in the parameters w!
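To make the basis-function model concrete, here is a sketch (assuming NumPy) that builds φ(x) for the polynomial case used on the next slide; the helper name polynomial_features is illustrative:

```python
import numpy as np

def polynomial_features(x, M):
    """phi(x) = (1, x, x^2, ..., x^M) for a scalar input x."""
    return np.array([x ** j for j in range(M + 1)])

def y(x, w, M):
    """Linear-in-w model y(x, w) = w^T phi(x)."""
    return w @ polynomial_features(x, M)
```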

Example - Polynomial curve fitting

y = w_0 + w_1 x + w_2 x^2 + ... + w_M x^M = Σ_{j=0}^{M} w_j x^j

[Figure: polynomial fits of the same data for increasing degree (M = 0, M = 1, M = 3, M = 9), from under-fitting to over-fitting.]

Linear Regression - Model

Examples of basis functions

[Figure: three panels of basis functions: Polynomial, Radial, Sigmoid / Tanh.]

Linear Regression - Algorithms

Maximum likelihood and least squares

The target value t is given by y(x, w) affected by additive noise:

t = y(x, w) + ɛ.

Assume Gaussian noise p(ɛ | β) = N(ɛ | 0, β^{−1}), with β the precision (inverse variance). We have:

p(t | x, w, β) = N(t | y(x, w), β^{−1}).

Linear Regression - Algorithms

Assume the observations are independent and identically distributed (i.i.d.). We seek the maximum of the likelihood function:

p({t_1, ..., t_N} | x, w, β) = Π_{n=1}^{N} N(t_n | w^T φ(x_n), β^{−1}),

or equivalently minimize:

− ln p({t_1, ..., t_N} | w, β) = − Σ_{n=1}^{N} ln N(t_n | w^T φ(x_n), β^{−1})
                               = β E_D(w) + (N/2) ln(2π β^{−1}),

where E_D(w) = (1/2) Σ_{n=1}^{N} [t_n − w^T φ(x_n)]^2 is the sum-of-squares error.

Linear Regression - Algorithms

Note:

E_D = (1/2) (t − Φw)^T (t − Φw),

with t = (t_1, ..., t_N)^T and the design matrix

Φ = ⎡ φ_0(x_1)  φ_1(x_1)  ...  φ_M(x_1) ⎤
    ⎢ φ_0(x_2)  φ_1(x_2)  ...  φ_M(x_2) ⎥
    ⎢   ...       ...     ...    ...    ⎥
    ⎣ φ_0(x_N)  φ_1(x_N)  ...  φ_M(x_N) ⎦.

Optimality condition:

∇E_D = 0  ⟹  Φ^T Φ w = Φ^T t.

Hence:

w_ML = (Φ^T Φ)^{−1} Φ^T t = Φ† t,

where Φ† = (Φ^T Φ)^{−1} Φ^T is the pseudo-inverse of Φ.
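A minimal sketch of the maximum-likelihood solution above, assuming NumPy; lstsq solves the normal equations in a numerically safer way than forming (Φ^T Φ)^{−1} explicitly:

```python
import numpy as np

def fit_least_squares(Phi, t):
    """w_ML = pseudo-inverse(Phi) @ t, i.e. the minimizer of E_D(w).

    Phi: (N, M+1) design matrix with rows phi(x_n)^T.
    t:   (N,) target vector.
    """
    w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w_ml

# Textbook form using the pseudo-inverse explicitly:
#   w_ml = np.linalg.pinv(Phi) @ t
# Example with the polynomial features sketched earlier:
#   Phi = np.stack([polynomial_features(x, M) for x in xs])
```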

Linear Regression - Algorithms

Sequential Learning

Stochastic gradient descent algorithm:

w^(τ+1) = w^(τ) − η ∇E_n,

with τ the iteration number and η the learning rate parameter. Therefore:

w^(τ+1) = w^(τ) + η (t_n − (w^(τ))^T φ(x_n)) φ(x_n).

The algorithm converges for suitable values of η.
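A sketch of this sequential update, assuming NumPy; the learning rate eta and the number of epochs are illustrative choices, not values from the slides:

```python
import numpy as np

def fit_sgd(Phi, t, eta=0.01, epochs=100, rng=None):
    """Least-mean-squares updates: w <- w + eta * (t_n - w^T phi_n) * phi_n."""
    rng = rng or np.random.default_rng(0)
    w = np.zeros(Phi.shape[1])
    for _ in range(epochs):
        for n in rng.permutation(len(t)):   # visit samples in random order
            w += eta * (t[n] - w @ Phi[n]) * Phi[n]
    return w
```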

Linear Regression - Algorithms

Regularization

Why regularize? To control over-fitting. Minimize:

E_D(w) + λ E_W(w).

A common choice:

E_W(w) = (1/2) w^T w.

Other choices:

E_W(w) = Σ_{j=1}^{M} |w_j|^q.
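For the quadratic regularizer E_W = (1/2) w^T w, the regularized problem still has a closed form, w = (λI + Φ^T Φ)^{−1} Φ^T t (standard ridge regression); a sketch assuming NumPy:

```python
import numpy as np

def fit_ridge(Phi, t, lam):
    """Minimize E_D(w) + lam * E_W(w) with E_W = 0.5 * w^T w."""
    D = Phi.shape[1]
    # Solve (lam*I + Phi^T Phi) w = Phi^T t instead of inverting explicitly.
    return np.linalg.solve(lam * np.eye(D) + Phi.T @ Phi, Phi.T @ t)
```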

Linear Regression - Algorithms

Regularization

[Figure: contours of the regularizer in the (w_1, w_2) plane: circular for q = 2, diamond-shaped for q = 1.]

Linear Regression - Algorithms

Regularization

E_W(w) = (1/2) w^T w.

[Figure: polynomial fits for a very small and a large regularization weight (ln λ = −18 and ln λ = 0), showing how λ controls over-fitting.]

Linear Regression - Algorithms

Multiple outputs

y(x, W) = W^T φ(x).

The target variable is given by:

t = y(x, W) + ɛ, with p(ɛ | β) = N(ɛ | 0, β^{−1} I).

Similarly to before, we obtain:

W_ML = (Φ^T Φ)^{−1} Φ^T T.

Logistic regression - Algorithms

Time to see how the parameters of the logistic regression model are estimated. Remember that logistic regression is used for classification!

In the following we will consider the logistic regression problem:

P(C_1 | φ) = y(φ) = σ(w^T φ),

with σ(·) the logistic sigmoid function.

Logistic regression - Algorithms

Given a data set {φ_n, t_n}, with t_n ∈ {0, 1} and φ_n = φ(x_n), n = 1, ..., N.

Likelihood function:

P(t | w) = Π_{n=1}^{N} y_n^{t_n} (1 − y_n)^{1 − t_n},   (*)

with y_n = P(C_1 | φ_n) = σ(w^T φ_n).

(*) Hint: P(c = 1 | φ_n) = y_n and P(c = 0 | φ_n) = 1 − y_n.

Logistic regression - Algorithms

Cross-entropy error function:

E(w) = − ln P(t | w) = − Σ_{n=1}^{N} [t_n ln y_n + (1 − t_n) ln(1 − y_n)].

Gradient of the error with respect to w, using dσ/dα = σ(1 − σ):

∇E(w) = Σ_{n=1}^{N} (y_n − t_n) φ_n.
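This gradient makes batch gradient descent for logistic regression a few lines; a sketch assuming NumPy, with the step size eta and the iteration count as illustrative values:

```python
import numpy as np

def fit_logistic_gd(Phi, t, eta=0.1, iters=1000):
    """Gradient descent on E(w), using grad E = Phi^T (y - t)."""
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        y = 1.0 / (1.0 + np.exp(-Phi @ w))   # y_n = sigmoid(w^T phi_n)
        w -= eta * Phi.T @ (y - t)
    return w
```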

Iterative reweighted least squares

Apply the Newton-Raphson iterative optimization technique to minimize E(w):

w ← w − H^{−1} ∇E(w),

where H = ∇∇E(w) is the Hessian matrix of E(w).

Iterative reweighted least squares

∇E(w) = Φ^T (y − t)
H = Φ^T R Φ

with t = (t_1, ..., t_N)^T, y = (y_1, ..., y_N)^T, R the diagonal matrix with R_{n,n} = y_n (1 − y_n), and

Φ = ⎡ φ_1^T ⎤
    ⎢  ...  ⎥
    ⎣ φ_N^T ⎦.

Iterative reweighted least squares

Iterative method:

w ← w − (Φ^T R Φ)^{−1} Φ^T (y − t).
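Putting the pieces together, an IRLS sketch assuming NumPy; each step solves the Newton system (Φ^T R Φ) Δw = Φ^T (y − t) rather than inverting the Hessian explicitly, and a handful of iterations typically suffices since Newton's method converges quadratically near the optimum:

```python
import numpy as np

def fit_logistic_irls(Phi, t, iters=20):
    """Newton-Raphson / IRLS updates for logistic regression."""
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        y = 1.0 / (1.0 + np.exp(-Phi @ w))
        R = np.diag(y * (1.0 - y))           # R_nn = y_n (1 - y_n)
        H = Phi.T @ R @ Phi                  # Hessian of E(w)
        w -= np.linalg.solve(H, Phi.T @ (y - t))
    return w
```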

Multiclass logistic regression

P(C_k | φ) = y_k(φ) = exp(a_k) / Σ_j exp(a_j)   (softmax),

with a_k = w_k^T φ.

Discriminative model:

P(T | w_1, ..., w_K) = Π_{n=1}^{N} Π_{k=1}^{K} P(C_k | φ_n)^{t_nk} = Π_{n=1}^{N} Π_{k=1}^{K} y_nk^{t_nk},

with y_nk = y_k(φ_n) and T_{n,k} = t_nk.

Multiclass logistic regression

Cross-entropy error function:

E(w_1, ..., w_K) = − ln P(T | w_1, ..., w_K) = − Σ_{n=1}^{N} Σ_{k=1}^{K} t_nk ln y_nk.

Gradient:

∇_{w_j} E(w_1, ..., w_K) = Σ_{n=1}^{N} (y_nj − t_nj) φ_n.

Hessian matrix:

∇_{w_k} ∇_{w_j} E(w_1, ..., w_K) = Σ_{n=1}^{N} y_nk (I_kj − y_nj) φ_n φ_n^T.
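The multiclass gradient ∇_{w_j}E = Σ_n (y_nj − t_nj) φ_n also reduces to a compact matrix expression; a sketch of gradient descent for softmax regression, assuming NumPy, one-hot targets T of shape (N, K), and illustrative values for eta and the iteration count:

```python
import numpy as np

def softmax(A):
    A = A - A.max(axis=1, keepdims=True)     # shift for numerical stability
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

def fit_softmax_gd(Phi, T, eta=0.1, iters=1000):
    """Gradient descent on the multiclass cross-entropy.

    Phi: (N, M+1) design matrix, T: (N, K) one-hot targets.
    Returns W of shape (M+1, K), one weight vector w_k per column.
    """
    W = np.zeros((Phi.shape[1], T.shape[1]))
    for _ in range(iters):
        Y = softmax(Phi @ W)                 # y_nk = softmax(w_k^T phi_n)
        W -= eta * Phi.T @ (Y - T)           # stacks the gradients (y_nj - t_nj) phi_n
    return W
```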