FSAN815/ELEG815: Foundations of Statistical Learning

Gonzalo R. Arce
Chapter 14: Logistic Regression
Fall 2014

Course Objectives & Structure

The course provides an introduction to the mathematics of data analysis and a detailed overview of statistical models for inference and prediction.

Course Structure:
- Weekly lectures [notes: www.ece.udel.edu/~arce/courses]
- Homework & computer assignments [30%]
- Midterm & Final examinations [70%]

Textbooks:
- Papoulis and Pillai, Probability, Random Variables, and Stochastic Processes.
- Hastie, Tibshirani and Friedman, The Elements of Statistical Learning.
- Haykin, Adaptive Filter Theory.

Logistic Regression

The logistic regression model arises from the desire to model the posterior probabilities of the K classes via linear functions in x, while at the same time ensuring that they sum to one and remain in [0,1]:

\log\frac{\Pr(G=1\mid X=x)}{\Pr(G=K\mid X=x)} = \beta_{10} + \beta_1^T x
\log\frac{\Pr(G=2\mid X=x)}{\Pr(G=K\mid X=x)} = \beta_{20} + \beta_2^T x
\vdots
\log\frac{\Pr(G=K-1\mid X=x)}{\Pr(G=K\mid X=x)} = \beta_{(K-1)0} + \beta_{K-1}^T x    (1)

Logistic Regression

Here β_k = [β_k1, β_k2, ..., β_kp]^T and x = [x_1, x_2, ..., x_p]^T. The model is specified in terms of K-1 log-odds or logit transformations (reflecting the constraint that the probabilities sum to one). A simple calculation shows that

\Pr(G=k\mid X=x) = \frac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{l=1}^{K-1}\exp(\beta_{l0} + \beta_l^T x)}, \quad k = 1, 2, \ldots, K-1,
\Pr(G=K\mid X=x) = \frac{1}{1 + \sum_{l=1}^{K-1}\exp(\beta_{l0} + \beta_l^T x)}.

To emphasize the dependence on the entire parameter set θ = {β_10, β_1^T, ..., β_(K-1)0, β_{K-1}^T}, we denote the probabilities

\Pr(G=k\mid X=x) = p_k(x;\theta).    (2)
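As a quick numerical check of (2), here is a minimal R sketch that evaluates the K posterior probabilities for a single input x; the function name and the coefficient values are ours, made up for illustration rather than fitted:

posterior <- function(beta0, B, x) {
  # beta0: (K-1)-vector of intercepts beta_{l0}
  # B: p x (K-1) matrix whose l-th column is beta_l
  eta <- beta0 + drop(t(B) %*% x)    # beta_{l0} + beta_l^T x, l = 1,...,K-1
  denom <- 1 + sum(exp(eta))
  c(exp(eta) / denom, 1 / denom)     # classes 1,...,K-1, then class K
}

# K = 3 classes, p = 2 inputs, made-up coefficients
p <- posterior(c(0.5, -0.2), matrix(c(1.0, -0.5, 0.3, 0.8), 2, 2), c(1.2, -0.7))
sum(p)   # the K probabilities sum to 1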

Fitting Logistic Regression Models

Logistic regression models are usually fit by maximum likelihood, using the conditional likelihood of G given X. The log-likelihood for N observations is

\ell(\theta) = \sum_{i=1}^N \log p_{g_i}(x_i;\theta),    (3)

where p_k(x_i;\theta) = \Pr(G=k\mid X=x_i;\theta).

Fitting Logistic Regression Models

We discuss the two-class case in detail. Code the two-class g_i via a 0/1 response y_i, where y_i = 1 when g_i = 1 and y_i = 0 when g_i = 2. Let p_1(x;θ) = p(x;θ) and p_2(x;θ) = 1 - p(x;θ). The log-likelihood can be written as

\ell(\beta) = \sum_{i=1}^N \{ y_i \log p(x_i;\beta) + (1-y_i)\log(1 - p(x_i;\beta)) \}
            = \sum_{i=1}^N \{ y_i(\log p(x_i;\beta) - \log(1-p(x_i;\beta))) + \log(1-p(x_i;\beta)) \}
            = \sum_{i=1}^N \{ y_i \log\frac{\Pr(G=1\mid X=x_i)}{\Pr(G=2\mid X=x_i)} + \log\Pr(G=2\mid X=x_i) \}
            = \sum_{i=1}^N \{ y_i \beta^T x_i - \log(1 + e^{\beta^T x_i}) \}    (4)
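The last line of (4) is easy to evaluate in code. A minimal R sketch (function name ours), assuming the design matrix X already carries a leading column of 1s as on the next slide:

loglik <- function(beta, X, y) {
  eta <- drop(X %*% beta)            # beta^T x_i for each observation
  sum(y * eta - log(1 + exp(eta)))   # last line of eq. (4)
  # equivalently: p <- plogis(eta); sum(y * log(p) + (1 - y) * log(1 - p))
}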

Fitting Logistic Regression Models

Here β = {β_10, β_1}, and we assume that the vector of inputs x_i includes the constant term 1 to accommodate the intercept. To maximize the log-likelihood, we set its derivatives to zero. These score equations are

\frac{\partial \ell(\beta)}{\partial \beta} = \sum_{i=1}^N x_i (y_i - p(x_i;\beta)) = 0,    (5)

which are p+1 equations nonlinear in β. Since the first component of x_i is 1, the first score equation specifies that \sum_{i=1}^N y_i = \sum_{i=1}^N p(x_i;\beta): the expected number of class ones matches the observed number.

Newton-Raphson Algorithm

To solve the score equations (5), we use the Newton-Raphson algorithm, which requires the second-derivative or Hessian matrix

\frac{\partial^2 \ell(\beta)}{\partial \beta\, \partial \beta^T} = -\sum_{i=1}^N x_i x_i^T\, p(x_i;\beta)(1 - p(x_i;\beta)).    (6)

Starting with β^old, a single Newton update is

\beta^{new} = \beta^{old} - \left( \frac{\partial^2 \ell(\beta)}{\partial \beta\, \partial \beta^T} \right)^{-1} \frac{\partial \ell(\beta)}{\partial \beta},    (7)

where the derivatives are evaluated at β^old.

Newton-Raphson Algorithm

Write the score and Hessian in matrix notation:
- y: N-vector of y_i values.
- x_i: (p+1)-vector of inputs.
- X: N x (p+1) matrix of x_i values.
- p: N-vector of fitted probabilities with ith element p(x_i; β^old).
- W: N x N diagonal matrix of weights with ith diagonal element p(x_i; β^old)(1 - p(x_i; β^old)).

Fitting Logistic Regression Models

Then we have

\frac{\partial \ell(\beta)}{\partial \beta} = X^T (y - p), \qquad \frac{\partial^2 \ell(\beta)}{\partial \beta\, \partial \beta^T} = -X^T W X.    (8)

The Newton step is thus

\beta^{new} = \beta^{old} + (X^T W X)^{-1} X^T (y - p)
            = (X^T W X)^{-1} X^T W (X\beta^{old} + W^{-1}(y - p))
            = (X^T W X)^{-1} X^T W z.    (9)

Fitting Logistic Regression Models

In the second and third lines we have re-expressed the Newton step as a weighted least squares step, with response

z = X\beta^{old} + W^{-1}(y - p),    (10)

sometimes known as the adjusted response. These equations are solved repeatedly, since at each iteration p changes, and hence so do W and z. This algorithm is referred to as iteratively reweighted least squares (IRLS), since each iteration solves the weighted least squares problem

\beta^{new} \leftarrow \arg\min_\beta\, (z - X\beta)^T W (z - X\beta).    (11)
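Putting (9)-(11) together, here is a minimal R implementation of IRLS written directly from the formulas above; the simulated data and variable names are ours, and for simplicity there is no step-halving or safeguard against separation:

irls <- function(X, y, tol = 1e-8, maxit = 25) {
  beta <- rep(0, ncol(X))                    # start at beta = 0
  for (it in 1:maxit) {
    eta <- drop(X %*% beta)
    p <- plogis(eta)                         # fitted probabilities p(x_i; beta)
    W <- diag(p * (1 - p))                   # N x N diagonal weight matrix
    z <- eta + (y - p) / (p * (1 - p))       # adjusted response, eq. (10)
    beta.new <- drop(solve(t(X) %*% W %*% X, t(X) %*% W %*% z))  # eq. (11)
    if (max(abs(beta.new - beta)) < tol) { beta <- beta.new; break }
    beta <- beta.new
  }
  beta
}

# Check against glm on simulated data
set.seed(1)
X <- cbind(1, matrix(rnorm(400), 200, 2))    # leading column of 1s for the intercept
y <- rbinom(200, 1, plogis(drop(X %*% c(-0.5, 1, -2))))
cbind(irls = irls(X, y), glm = coef(glm(y ~ X[, -1], family = binomial)))

The two columns should agree, since R's glm itself fits binomial models by IRLS. At the solution one can also verify the remark below (5): sum(y) equals the sum of the fitted probabilities.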

Exercise 4.10

The Weekly data set is part of the ISLR package. It contains 1089 weekly percentage returns for 21 years, from the beginning of 1990 to the end of 2010.

- Lag1-Lag5: the percentage returns for each of the five previous trading weeks.
- Volume: the number of shares traded on the previous week, in billions.
- Today: the percentage return on the week in question.
- Direction: whether the market was Up or Down on this week.
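A quick look at the data before fitting (assuming the ISLR package is installed):

library(ISLR)
dim(Weekly)                 # 1089 observations of 9 variables
summary(Weekly$Direction)   # counts of Down and Up weeks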

Exercise

(b) Use the full data set to perform a logistic regression.

library(ISLR)
attach(Weekly)
glm.fit = glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
              data = Weekly, family = binomial)
summary(glm.fit)

According to the p-values, Lag2 is the only predictor that appears statistically significant at the 5% level.

Exercise

(c) Compute the confusion matrix and overall fraction of correct predictions.

glm.probs = predict(glm.fit, type = "response")
glm.pred = rep("Down", 1089)
glm.pred[glm.probs > .5] = "Up"
table(glm.pred, Direction)
mean(glm.pred == Direction)

Exercise

(d) Fit the logistic regression model using the training data (1990-2008), then predict.

train = (Year < 2009)
A = (Year == 2009)
B = (Year == 2010)
Weekly.2009 = Weekly[A, ]
Weekly.2010 = Weekly[B, ]
Direction.2009 = Direction[A]
Direction.2010 = Direction[B]
glm.fit = glm(Direction ~ Lag2, data = Weekly, family = binomial, subset = train)
glm.probs = predict(glm.fit, Weekly.2009, type = "response")
glm.pred = rep("Down", 52)
glm.pred[glm.probs > .5] = "Up"
table(glm.pred, Direction.2009)
mean(glm.pred == Direction.2009)

Exercise

Figure: prediction for 2009
Figure: prediction for 2010

Exercise

(e) Fit the LDA model using the training data, then predict.

library(MASS)
lda.fit = lda(Direction ~ Lag2, data = Weekly, subset = train)
lda.pred = predict(lda.fit, Weekly.2009)
lda.class = lda.pred$class
table(lda.class, Direction.2009)
mean(lda.class == Direction.2009)

Figure: prediction for 2009
Figure: prediction for 2010

Exercise

(f) Fit the QDA model using the training data, then predict.

qda.fit = qda(Direction ~ Lag2, data = Weekly, subset = train)
qda.pred = predict(qda.fit, Weekly.2009)
qda.class = qda.pred$class
table(qda.class, Direction.2009)
mean(qda.class == Direction.2009)

Figure: prediction for 2009
Figure: prediction for 2010