ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam


Linear Regression Models - Least Squares
Input vector: X = (X_1, ..., X_p), where each X_j is an attribute / feature / predictor (independent variable).
The linear regression model: f(X) = β_0 + Σ_{j=1}^{p} X_j β_j
The output f(X) is called the response (dependent variable); the β_j's are unknown parameters (coefficients).

Linear Regression Models - Least Squares
A set of training data: (x_1, y_1), ..., (x_N, y_N). Each x_i = (x_i1, x_i2, ..., x_ip) is a vector of attribute values. Each y_i is a class attribute value / a label. We wish to estimate the parameters β.

Linear Regression Models - Least Squares
One common approach is the method of least squares: pick the coefficients β to minimize the residual sum of squares:
RSS(β) = Σ_{i=1}^{N} (y_i − β_0 − Σ_{j=1}^{p} x_ij β_j)²

Linear Regression Models - Least Squares
This criterion is reasonable if the training observations (x_i, y_i) represent independent draws. Even if the x_i's were not drawn randomly, the criterion is still valid if the y_i's are conditionally independent given the inputs x_i.

Linear Regression Models - Least Squares
Least squares makes no assumption about the validity of the model; it simply finds the best linear fit to the data.

Linear Regression Models - Finding the Residual Sum of Squares
Denote by X the N × (p+1) matrix with each row an input vector (with a 1 in the first position), and let y be the N-vector of outputs in the training set. Then the residual sum of squares is a quadratic function in the parameters:
RSS(β) = (y − Xβ)ᵀ(y − Xβ)

Linear Regression Models - Finding the Residual Sum of Squares
Set the first derivative to zero: Xᵀ(y − Xβ) = 0
If XᵀX is nonsingular, we obtain the unique solution: β̂ = (XᵀX)⁻¹ Xᵀ y

Linear Regression Models - Orthogonal Projection
The fitted values at the training inputs are: ŷ = X β̂ = X (XᵀX)⁻¹ Xᵀ y
The matrix H = X (XᵀX)⁻¹ Xᵀ appearing in the above equation is called the hat matrix, because it puts the hat on y.

Linear Regression Models - Example
Training Data:
x           y
(1, 2, 1)   22
(2, 0, 4)   49
(3, 4, 2)   39
(4, 2, 3)   52
(5, 4, 1)   38

Linear Regression Models - Example
    [1 1 2 1]        [22]
    [1 2 0 4]        [49]
X = [1 3 4 2]    y = [39]
    [1 4 2 3]        [52]
    [1 5 4 1]        [38]

β̂ = (XᵀX)⁻¹ Xᵀ y ≈ (8.13, 4.04, 0.51, 8.43)ᵀ   (intercept first)

Linear Regression Models - Example
Fitted values ŷ = X β̂ and residual vector y − ŷ:
 y     ŷ        residual
 22    21.61     0.39
 49    49.91    -0.91
 39    39.13    -0.13
 52    50.57     1.43
 38    38.78    -0.78
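The worked example above can be checked directly with NumPy (a minimal sketch of the normal-equations solution; `numpy` is assumed to be available):

```python
import numpy as np

# Training data from the example, with a leading 1 for the intercept
X = np.array([
    [1, 1, 2, 1],
    [1, 2, 0, 4],
    [1, 3, 4, 2],
    [1, 4, 2, 3],
    [1, 5, 4, 1],
], dtype=float)
y = np.array([22, 49, 39, 52, 38], dtype=float)

# Normal equations: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat        # fitted values (the hat matrix applied to y)
residuals = y - y_hat

print(np.round(beta_hat, 2))    # [8.13 4.04 0.51 8.43]
print(np.round(y_hat, 2))       # [21.61 49.91 39.13 50.57 38.78]
print(np.round(residuals, 2))   # [ 0.39 -0.91 -0.13  1.43 -0.78]
```

Using `np.linalg.solve` on the normal equations (rather than explicitly inverting XᵀX) is the standard numerically safer way to evaluate the closed-form solution.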

Suppose there are K classes, labeled 1, 2, ..., K. A class of methods models a discriminant function δ_k(x) for each class. Then, classify x to the class with the largest value for its discriminant function.

We desire to model the posterior probabilities of the K classes via linear functions in x (a p-dimensional vector), while ensuring they sum to one and remain in [0, 1]. Model:
log( Pr(G = k | X = x) / Pr(G = K | X = x) ) = β_k0 + β_kᵀ x,   k = 1, ..., K−1

The model is specified in terms of K−1 log-odds or logit transformations. The choice of denominator (here class K) is arbitrary: the estimates are equivariant under this choice. Solving for the probabilities:
Pr(G = k | X = x) = exp(β_k0 + β_kᵀ x) / (1 + Σ_{l=1}^{K−1} exp(β_l0 + β_lᵀ x)),   k = 1, ..., K−1
Pr(G = K | X = x) = 1 / (1 + Σ_{l=1}^{K−1} exp(β_l0 + β_lᵀ x))
These clearly sum to 1.

Two-class Classification
For two-class classification, we can label the two classes as 0 and 1. Treating class 1 as the concept of interest, the posterior probability can be regarded as the class membership probability:
Pr(G = 1 | X = x) = exp(β_0 + βᵀx) / (1 + exp(β_0 + βᵀx))     (the logistic function)
As a result, it maps x in p-dimensional space to a value in [0, 1].

Shape of the Sigmoid Curve
Consider the 1-dimensional case:
Pr(G = 1 | X = x) = exp(β_0 + β_1 x) / (1 + exp(β_0 + β_1 x))

An Example in One Dimension
We wish to predict death from the baseline APACHE II score of patients. Let π(x) = Pr(death | x) be the probability that a patient with score x will die. Note that linear regression would not work well here, since it could produce probabilities less than 0 or greater than 1.

An Example in One Dimension
Data with a sharp survival cut-off point between patients who live and patients who die will lead to a large value of β_1.

An Example in One Dimension
On the other hand, if the data has a lengthy transition from survival to death, it will lead to a low value of β_1.
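The sigmoid's shape and the effect of the slope β_1 can be seen numerically (a small illustrative sketch; the coefficients are made up for illustration, not fitted to the APACHE data):

```python
import numpy as np

def sigmoid(a):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

# With beta0 = 0 and beta1 = 1, the curve passes through 0.5 at x = 0
for x in (-4, -2, 0, 2, 4):
    print(x, round(float(sigmoid(x)), 3))   # 0.018, 0.119, 0.5, 0.881, 0.982

# A large beta1 gives a sharp transition; a small beta1 a gradual one
print(round(float(sigmoid(5.0 * 1.0)), 3))   # 0.993: nearly certain at x = 1
print(round(float(sigmoid(0.5 * 1.0)), 3))   # 0.622: still uncertain at x = 1
```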

Model Fitting for General Cases (K Classes, p Dimensions)
Logistic regression models are fit by maximum likelihood, using the conditional likelihood of G given X. Since Pr(G | X) completely specifies the conditional distribution, the multinomial distribution is appropriate.

Model Fitting for General Cases (K Classes, p Dimensions)
Let the entire parameter set be θ = {β_10, β_1, ..., β_(K−1)0, β_(K−1)}. The log-likelihood for N observations of input data and class labels is:
ℓ(θ) = Σ_{i=1}^{N} log p_(g_i)(x_i; θ)
where p_k(x_i; θ) = Pr(G = k | X = x_i; θ). Find the model that maximizes the log-likelihood.
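For the two-class case, the maximum-likelihood fit can be sketched with plain gradient ascent on the log-likelihood (an illustrative sketch on synthetic data; production software typically uses Newton-Raphson / IRLS instead):

```python
import numpy as np

# Synthetic two-class data: true beta0 = 0.5, beta1 = 2.0
rng = np.random.default_rng(0)
x = rng.normal(size=100)
p_true = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))
y = rng.binomial(1, p_true)

X = np.column_stack([np.ones_like(x), x])   # add intercept column
beta = np.zeros(2)
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ beta))         # model probabilities
    grad = X.T @ (y - p)                    # gradient of the log-likelihood
    beta += 0.01 * grad                     # small step uphill

log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
print(beta)      # close to the true (0.5, 2.0), up to sampling noise
```

The gradient Xᵀ(y − p) follows directly from differentiating ℓ(θ); since the log-likelihood is concave in β, gradient ascent with a small step size converges to the unique maximizer.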

Example
The data are a subset of the Coronary Risk-Factor Study (CORIS) baseline survey, carried out in three rural areas of the Western Cape, South Africa. Aim: establish the intensity of ischemic heart disease risk factors in that high-incidence region. The response variable is the presence or absence of myocardial infarction (MI) at the time of the survey. There are 160 cases in the data set, and a sample of 302 controls.


Example
We fit a logistic regression model by maximum likelihood, giving the results shown on the next slide. The z-score for each coefficient in the model is the coefficient divided by its standard error.

Example
Results from a logistic regression fit to the South African heart disease data:

              Coefficient   Std. Error   Z Score
(Intercept)      -4.130        0.964      -4.285
sbp               0.006        0.006       1.023
tobacco           0.080        0.026       3.034
ldl               0.185        0.057       3.219
famhist           0.939        0.225       4.178
obesity          -0.035        0.029      -1.187
alcohol           0.001        0.004       0.136
age               0.043        0.010       4.184

Example
A z-score greater than approximately 2 in absolute value is significant at the 5% level. There are some surprises in the table of coefficients: sbp and obesity appear to be not significant. On their own, both sbp and obesity are significant, with positive sign. However, in the presence of many other correlated variables, they are no longer needed (and can even end up with a negative sign).

Three common transformations / link functions (provided by SAS):
Logit: ln(p / (1 − p)) (the log odds)
Probit: the inverse of the standard normal CDF applied to p (recall the normal table's mapping scheme)
Complementary log-log: ln(−ln(1 − p))
The choice of link function depends on your purpose rather than on performance. They all perform about equally well, but the interpretations are a bit different.
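The three link functions are easy to compare numerically (a minimal sketch using only the Python standard library):

```python
import math
from statistics import NormalDist

p = 0.8

logit = math.log(p / (1 - p))          # log odds
probit = NormalDist().inv_cdf(p)       # inverse standard normal CDF
cloglog = math.log(-math.log(1 - p))   # complementary log-log

print(round(logit, 3))     # 1.386
print(round(probit, 3))    # 0.842
print(round(cloglog, 3))   # 0.476
```

All three map a probability in (0, 1) onto the whole real line, which is what lets a linear predictor β₀ + βᵀx be equated to the transformed probability.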

Related Measures
Akaike's Information Criterion (AIC) and Schwarz's Bayesian Criterion (SBC): both relate to the explanatory power of the model. Both take smaller values when the maximized likelihood is higher, so a smaller value is preferred.
Z-score (t-score): indicates whether a coefficient is significant. It is computed as the coefficient value divided by its standard error.
If we are using the model for predicting the outcome rather than the probability of that outcome, the interpretation of the misclassification rate / profit and loss / lift chart is similar to that for decision trees.
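AIC and SBC can be computed directly from the maximized log-likelihood (a generic sketch; the log-likelihood values and sample size below are made up for illustration):

```python
import math

def aic(log_lik, k):
    """Akaike's Information Criterion: smaller is better."""
    return 2 * k - 2 * log_lik

def sbc(log_lik, k, n):
    """Schwarz's Bayesian Criterion: penalty grows with sample size n."""
    return math.log(n) * k - 2 * log_lik

# Model A: higher log-likelihood but one more parameter than model B
print(aic(-250.0, 4), aic(-253.0, 3))            # 508.0 512.0 -> AIC prefers A
print(sbc(-250.0, 4, 500), sbc(-253.0, 3, 500))  # SBC's heavier penalty flips the choice
```

Because SBC multiplies the parameter count by ln(n) rather than 2, it penalizes extra parameters more heavily in large samples, so the two criteria can disagree on the preferred model, as they do here.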