Lecture 5: LDA and Logistic Regression
Hao Helen Zhang

Outline
- Linear Classification Methods
- Two Popular Linear Models for Classification
  - Linear Discriminant Analysis (LDA)
  - Logistic Regression Models
Take-home message: Both LDA and logistic regression models rely on the linear-odds assumption, indirectly or directly. However, they estimate the coefficients in different ways.

Linear Classifier
Linear methods: the decision boundary is linear.
Common linear classification methods:
- Linear regression methods (covered in Lecture 3)
- Linear log-odds (logit) models
  - Linear logistic models
  - Linear discriminant analysis (LDA)
- Separating hyperplanes (introduced later)
  - Perceptron model (Rosenblatt 1958)
  - Optimal separating hyperplane (Vapnik 1996)
  - SVMs
From now on, we assume equal costs (by default).

Odds, Logit, and Linear Odds Models
Some terminology:
- The ratio $\Pr(Y=1 \mid X=x) / \Pr(Y=0 \mid X=x)$ is called the odds.
- $\log \frac{\Pr(Y=1 \mid X=x)}{\Pr(Y=0 \mid X=x)}$ is called the log of the odds, or logit function.
Linear odds models assume that the logit is linear in $x$, i.e.,
$$\log \frac{\Pr(Y=1 \mid X=x)}{\Pr(Y=0 \mid X=x)} = \beta_0 + \beta_1^T x.$$
Examples: LDA, logistic regression.

Classifier Based on Linear Odds Models
From the linear odds, we can obtain the posterior class probabilities
$$\Pr(Y=1 \mid x) = \frac{\exp(\beta_0 + \beta_1^T x)}{1 + \exp(\beta_0 + \beta_1^T x)}, \qquad \Pr(Y=0 \mid x) = \frac{1}{1 + \exp(\beta_0 + \beta_1^T x)}.$$
Assuming equal costs, the decision boundary is given by
$$\{x : \Pr(Y=1 \mid x) = 0.5\} = \{x : \beta_0 + \beta_1^T x = 0\},$$
which can be interpreted as zero log-odds.

[Figure, two panels. Left: the logit function $g(s) = \log[s/(1-s)]$ plotted against $s$. Right: the logistic curve $p(t) = \exp(t)/(1+\exp(t))$ plotted against $t$.]
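
The two curves in the figure are inverses of each other. A minimal R sketch of that relationship follows; the helper names `logit` and `logistic` are chosen here for illustration, while `qlogis()` and `plogis()` are the base R equivalents.

```r
# logit: maps a probability s in (0, 1) to the real line
logit <- function(s) log(s / (1 - s))
# logistic: maps a real number t back to a probability
logistic <- function(t) exp(t) / (1 + exp(t))

s <- c(0.1, 0.5, 0.9)
logit(s)               # log-odds; the middle value is 0 because s = 0.5
logistic(logit(s))     # recovers s, since the two functions are inverses

# Base R provides the same pair as qlogis() and plogis()
all.equal(logit(s), qlogis(s))
all.equal(logistic(c(-2, 0, 2)), plogis(c(-2, 0, 2)))
```

The point s = 0.5 maps to a logit of zero, which is why the equal-cost decision boundary coincides with zero log-odds.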

Linear Discriminant Analysis (LDA)
LDA assumes:
- Each class density is multivariate Gaussian, i.e., $X \mid Y = j \sim N(\mu_j, \Sigma_j)$, $j = 0, 1$.
- Equal covariance: $\Sigma_j = \Sigma$, $j = 0, 1$.
In other words, both classes are Gaussian and they share the same covariance matrix.

Linear Discriminant Function
Under the mixture Gaussian assumption, the log-odds is
$$\log \frac{\Pr(Y=1 \mid X=x)}{\Pr(Y=0 \mid X=x)} = \log \frac{\pi_1}{\pi_0} - \frac{1}{2} (\mu_1 + \mu_0)^T \Sigma^{-1} (\mu_1 - \mu_0) + x^T \Sigma^{-1} (\mu_1 - \mu_0).$$
Under equal costs, LDA classifies to 1 if and only if
$$\left[ \log \frac{\pi_1}{\pi_0} - \frac{1}{2} (\mu_1 + \mu_0)^T \Sigma^{-1} (\mu_1 - \mu_0) \right] + x^T \Sigma^{-1} (\mu_1 - \mu_0) > 0.$$
It has a linear boundary $\{x : \beta_0 + x^T \beta_1 = 0\}$, with
$$\beta_0 = \log \frac{\pi_1}{\pi_0} - \frac{1}{2} (\mu_1 + \mu_0)^T \Sigma^{-1} (\mu_1 - \mu_0), \qquad \beta_1 = \Sigma^{-1} (\mu_1 - \mu_0).$$
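
The step from the Gaussian densities to the linear log-odds can be sketched as follows, writing $\phi(x; \mu_j, \Sigma)$ for the class-$j$ Gaussian density. This derivation is a reconstruction from the stated assumptions rather than part of the slides.

```latex
\begin{aligned}
\log \frac{\Pr(Y=1 \mid X=x)}{\Pr(Y=0 \mid X=x)}
 &= \log \frac{\pi_1\, \phi(x; \mu_1, \Sigma)}{\pi_0\, \phi(x; \mu_0, \Sigma)} \\
 &= \log \frac{\pi_1}{\pi_0}
    - \tfrac{1}{2} (x - \mu_1)^T \Sigma^{-1} (x - \mu_1)
    + \tfrac{1}{2} (x - \mu_0)^T \Sigma^{-1} (x - \mu_0) \\
 &= \log \frac{\pi_1}{\pi_0}
    + x^T \Sigma^{-1} (\mu_1 - \mu_0)
    - \tfrac{1}{2} \left( \mu_1^T \Sigma^{-1} \mu_1 - \mu_0^T \Sigma^{-1} \mu_0 \right) \\
 &= \log \frac{\pi_1}{\pi_0}
    - \tfrac{1}{2} (\mu_1 + \mu_0)^T \Sigma^{-1} (\mu_1 - \mu_0)
    + x^T \Sigma^{-1} (\mu_1 - \mu_0).
\end{aligned}
```

The normalizing constants and the quadratic term $x^T \Sigma^{-1} x$ cancel only because both classes share the same $\Sigma$; the last line uses the symmetry of $\Sigma^{-1}$.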

Parameter Estimation in LDA
In practice, $\pi_1, \pi_0, \mu_0, \mu_1, \Sigma$ are unknown. We estimate the parameters from the training data, using MLE or the moment estimator:
- $\hat\pi_j = n_j / n$, where $n_j$ is the size of class $j$.
- $\hat\mu_j = \sum_{Y_i = j} x_i / n_j$ for $j = 0, 1$.
- The sample covariance matrix is
$$S_j = \frac{1}{n_j - 1} \sum_{Y_i = j} (x_i - \hat\mu_j)(x_i - \hat\mu_j)^T.$$
- The (unbiased) pooled sample covariance is a weighted average:
$$\hat\Sigma = \frac{n_0 - 1}{(n_0 - 1) + (n_1 - 1)} S_0 + \frac{n_1 - 1}{(n_0 - 1) + (n_1 - 1)} S_1 = \sum_{j=0}^{1} \sum_{Y_i = j} (x_i - \hat\mu_j)(x_i - \hat\mu_j)^T / (n - 2).$$
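
These estimators can be computed by hand in a few lines of R. The sketch below is illustrative only: it assumes the built-in `iris` data restricted to two species (a choice made here, not in the slides), codes virginica as class 1, and plugs the estimates into the formulas for $\beta_0$ and $\beta_1$.

```r
# Two-class subset of the built-in iris data (illustrative choice of data)
d <- subset(iris, Species != "setosa")
X <- as.matrix(d[, 1:4])
y <- as.integer(d$Species == "virginica")   # 1 = virginica, 0 = versicolor

n1 <- sum(y == 1); n0 <- sum(y == 0); n <- n0 + n1
pi1 <- n1 / n; pi0 <- n0 / n                      # class proportions
mu1 <- colMeans(X[y == 1, ]); mu0 <- colMeans(X[y == 0, ])
S1 <- cov(X[y == 1, ]); S0 <- cov(X[y == 0, ])    # per-class sample covariances

# Pooled covariance: weighted average of S0 and S1
Sigma <- ((n0 - 1) * S0 + (n1 - 1) * S1) / (n - 2)

# Plug-in LDA coefficients
beta1 <- solve(Sigma, mu1 - mu0)
beta0 <- log(pi1 / pi0) - 0.5 * sum((mu1 + mu0) * beta1)

# Classify to class 1 when the estimated log-odds is positive
pred <- as.integer(X %*% beta1 + beta0 > 0)
mean(pred != y)                                   # training error rate
```

Using `solve(Sigma, mu1 - mu0)` avoids forming the inverse of the pooled covariance explicitly.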

R Code for Fitting LDA (I)
There are two ways to call the function lda. The first way is to use a formula and an optional data frame.
    library(MASS)
    lda(formula, data, subset)
Arguments:
- formula: a formula of the form groups ~ x1 + x2 + ..., where the response is the grouping factor and the right-hand side specifies the (non-factor) discriminators.
- data: data frame from which the variables specified in formula are taken.
- subset: an index vector specifying the cases to be used in the training sample.
Output: an object of class lda with multiple components.

R Code for Fitting LDA (II)
The second way is to use a matrix and a group factor as the first two arguments.
    library(MASS)
    lda(x, grouping, prior = proportions, CV = FALSE)
Arguments:
- x: a matrix or data frame or Matrix containing the predictors.
- grouping: a factor specifying the class for each observation.
- prior: the prior probabilities of class membership. If unspecified, the class proportions for the training set are used.
Output: if CV = TRUE, the return value is a list with components class (the MAP classification, a factor) and posterior (posterior probabilities for the classes).

R Code for LDA Prediction
We use the predict or predict.lda function to classify multivariate observations with lda.
    predict(object, newdata, ...)
Arguments:
- object: object of class lda.
- newdata: data frame of cases to be classified or, if object has a formula, a data frame with columns of the same names as the variables used.
Output: a list with the components class (the MAP classification, a factor) and posterior (posterior probabilities for the classes).

Fisher's Iris Data (Three-Class Problem)
Fisher (1936), The use of multiple measurements in taxonomic problems.
- Three species: Iris setosa, Iris versicolor, Iris virginica.
- Four features: the length and the width of the sepals and petals, in centimeters.
- 50 samples from each species.
In the following analysis, we randomly select 50% of the data points as the training set, and the rest as the test set.

LDA Illustration 1
    library(MASS)
    # Stack the three 50 x 4 slices of iris3 and add a species label
    Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                       Sp = rep(c("s","c","v"), rep(50,3)))
    # Randomly pick half of the 150 rows as the training set
    train <- sample(1:150, 75)
    table(Iris$Sp[train])
    # Fit LDA on the training rows with equal priors
    z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)

Training Error and Test Error
    # training error rate
    ytrain <- predict(z, Iris[train, ])$class
    table(ytrain, Iris$Sp[train])
    train_err <- mean(ytrain != Iris$Sp[train])
    # test error rate
    ytest <- predict(z, Iris[-train, ])$class
    table(ytest, Iris$Sp[-train])
    test_err <- mean(ytest != Iris$Sp[-train])

LDA Illustration 2
    # Pick 25 of the 50 rows per species for training; the rest form the test set
    tr <- sample(1:50, 25)
    train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])
    test <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])
    cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
    # Matrix interface: predictors first, grouping factor second
    z <- lda(train, cl)
    ytest <- predict(z, test)$class

Logistic Regression
Model assumption: the log-odds is linear in $x$,
$$\log \frac{\Pr(Y=1 \mid X=x)}{\Pr(Y=0 \mid X=x)} = \beta_0 + \beta_1^T x.$$
Define
$$p(x; \beta) = \frac{\exp(\beta_0 + \beta_1^T x)}{1 + \exp(\beta_0 + \beta_1^T x)},$$
and write $\beta = (\beta_0, \beta_1)$. The posterior class probabilities can be calculated as
$$p_1(x) = \Pr(Y=1 \mid x) = p(x; \beta), \qquad p_0(x) = \Pr(Y=0 \mid x) = 1 - p(x; \beta).$$
The classification boundary is given by $\{x : \beta_0 + \beta_1^T x = 0\}$.

Model Interpretation
Denote $\mu = E(Y \mid X) = P(Y=1 \mid X)$.
- By assuming $g(\mu) = \log[\mu/(1-\mu)] = \beta_0 + \beta_1^T X$, the logit $g$ connects $\mu$ with the linear predictor $\beta_0 + \beta_1^T X$. We call $g$ the link function.
- $\mathrm{Var}(Y \mid X) = \mu(1-\mu)$.

Interpretation of $\beta_j$
In logistic regression, the odds ratio for $X_j$ is
$$\text{odds ratio}_{X_j} = \frac{\text{odds}(\ldots, X_j = x+1, \ldots)}{\text{odds}(\ldots, X_j = x, \ldots)} = e^{\beta_j}.$$
- If $X_j = 0$ or $1$, then the odds for the group with $X_j = 1$ are $e^{\beta_j}$ times the odds for the group with $X_j = 0$, with the other predictors held fixed.
- For rare diseases, $\Pr(Y=0) \approx 1$, so the odds are approximately $\Pr(Y=1)$ and
$$e^{\beta_j} = \text{odds ratio}_{X_j} \approx \frac{\Pr(Y=1 \mid \ldots, X_j = x+1, \ldots)}{\Pr(Y=1 \mid \ldots, X_j = x, \ldots)}.$$
- In a cancer study, a log odds ratio of 5 means that smokers are $e^5 \approx 148$ times more likely to develop the cancer.
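
A small numeric check makes the two statements concrete: the odds ratio equals $e^{\beta_j}$ exactly, while the ratio of probabilities is only close to it when the event is rare. The coefficient values below are hypothetical, picked so that $\Pr(Y=1)$ is well under 1%.

```r
# Hypothetical coefficients, chosen only for illustration
beta0  <- -6      # absorbs the contribution of all other (fixed) predictors
beta_j <- 0.7

# Pr(Y = 1) at X_j = 0 and at X_j = 1
p_x  <- plogis(beta0 + beta_j * 0)
p_x1 <- plogis(beta0 + beta_j * 1)

odds <- function(p) p / (1 - p)
odds_ratio <- odds(p_x1) / odds(p_x)   # equals exp(beta_j)
risk_ratio <- p_x1 / p_x               # close to exp(beta_j) because both p's are small

c(odds_ratio = odds_ratio, exp_beta = exp(beta_j), risk_ratio = risk_ratio)
```

Making `beta0` larger (a more common outcome) widens the gap between the risk ratio and `exp(beta_j)`, while the odds ratio stays fixed.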

Maximum Likelihood Estimate (MLE) for Logistic Models
The joint conditional log-likelihood of the $y_i$ given the $x_i$ is
$$l(\beta) = \sum_{i=1}^{n} \log p_{y_i}(x_i; \beta),$$
where
$$p_y(x; \beta) = p(x; \beta)^y [1 - p(x; \beta)]^{1-y}.$$
In detail,
$$l(\beta) = \sum_{i=1}^{n} \{ y_i \log p(x_i; \beta) + (1 - y_i) \log[1 - p(x_i; \beta)] \} = \sum_{i=1}^{n} \{ y_i (\beta_0 + \beta_1^T x_i) - \log[1 + \exp(\beta_0 + \beta_1^T x_i)] \}.$$
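
The last equality in the log-likelihood follows from two identities for the logistic form. The shorthand $\eta_i = \beta_0 + \beta_1^T x_i$ is introduced here only to keep the algebra short.

```latex
\begin{aligned}
\log p(x_i; \beta) &= \log \frac{e^{\eta_i}}{1 + e^{\eta_i}} = \eta_i - \log(1 + e^{\eta_i}), \\
\log[1 - p(x_i; \beta)] &= \log \frac{1}{1 + e^{\eta_i}} = -\log(1 + e^{\eta_i}), \\
y_i \log p(x_i; \beta) + (1 - y_i) \log[1 - p(x_i; \beta)]
  &= y_i \eta_i - \log(1 + e^{\eta_i}).
\end{aligned}
```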

Score Equations
For simplicity, now assume $x_i$ has 1 in its first component. The score equations are
$$\frac{\partial l(\beta)}{\partial \beta} = \sum_{i=1}^{n} x_i [y_i - p(x_i; \beta)] = 0,$$
a total of $(d+1)$ nonlinear equations. The first equation,
$$\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} p(x_i; \beta),$$
says that the expected number of 1's equals the observed number in the sample.
Define
$$y = [y_1, \ldots, y_n]^T, \qquad p = [p(x_1; \beta^{old}), \ldots, p(x_n; \beta^{old})]^T, \qquad W = \mathrm{diag}\{ p(x_i; \beta^{old}) (1 - p(x_i; \beta^{old})) \}.$$
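
Newton-Raphson Algorithm
The second-derivative (Hessian) matrix is
$$\frac{\partial^2 l(\beta)}{\partial \beta\, \partial \beta^T} = -\sum_{i=1}^{n} x_i x_i^T\, p(x_i; \beta) [1 - p(x_i; \beta)].$$
1. Choose an initial value $\beta^0$.
2. Update $\beta$ by
$$\beta^{new} = \beta^{old} - \left[ \frac{\partial^2 l(\beta)}{\partial \beta\, \partial \beta^T} \right]^{-1}_{\beta^{old}} \left. \frac{\partial l(\beta)}{\partial \beta} \right|_{\beta^{old}}.$$
Using matrix notation, we have
$$\frac{\partial l(\beta)}{\partial \beta} = X^T (y - p), \qquad \frac{\partial^2 l(\beta)}{\partial \beta\, \partial \beta^T} = -X^T W X.$$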

Iteratively Re-weighted Least Squares (IRLS)
Newton-Raphson step:
$$\beta^{new} = \beta^{old} + (X^T W X)^{-1} X^T (y - p) = (X^T W X)^{-1} X^T W \left( X \beta^{old} + W^{-1} (y - p) \right) = (X^T W X)^{-1} X^T W z,$$
where we define the adjusted response
$$z = X \beta^{old} + W^{-1} (y - p).$$
Repeatedly solve the weighted least squares problem until convergence:
$$\beta^{new} = \arg\min_{\beta} (z - X\beta)^T W (z - X\beta).$$
- The weight $W$, the response $z$, and $p$ change in each iteration.
- The algorithm can be generalized to the $K \ge 3$ case.
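
The IRLS update can be written directly in R and compared with `glm`. The sketch below is illustrative: the data are simulated, the coefficients (-0.5, 1, -1) are arbitrary, and `irls_logistic` is a name made up for this example.

```r
# Simulated data; the true coefficients below are arbitrary illustrative values
set.seed(1)
n <- 200
x <- cbind(1, matrix(rnorm(2 * n), n, 2))      # first column of 1s for the intercept
y <- rbinom(n, 1, plogis(x %*% c(-0.5, 1, -1)))

irls_logistic <- function(X, y, tol = 1e-10, max_iter = 50) {
  beta <- rep(0, ncol(X))                      # beta = 0 as the starting value
  for (iter in seq_len(max_iter)) {
    eta <- as.vector(X %*% beta)
    p <- plogis(eta)                           # current fitted probabilities
    w <- p * (1 - p)                           # diagonal of W
    z <- eta + (y - p) / w                     # adjusted response
    # Weighted least squares: solve (X'WX) beta = X'Wz
    beta_new <- as.vector(solve(crossprod(X, w * X), crossprod(X, w * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}

beta_irls <- irls_logistic(x, y)
beta_glm  <- unname(coef(glm(y ~ x - 1, family = binomial)))
cbind(beta_irls, beta_glm)                     # the two columns should agree closely
```

Multiplying `X` by the vector `w` scales each row by its weight, so the full $n \times n$ matrix $W$ is never formed.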

Quadratic Approximations
- $\hat\beta$ satisfies a self-consistency relationship: it solves a weighted least squares fit with response
$$z_i = x_i^T \hat\beta + \frac{y_i - \hat p_i}{\hat p_i (1 - \hat p_i)}$$
and weight $w_i = \hat p_i (1 - \hat p_i)$.
- $\beta = 0$ seems to be a good starting value.
- Typically the algorithm converges, but convergence is not guaranteed.
- The weighted residual sum of squares is the Pearson chi-square statistic
$$\sum_{i=1}^{n} \frac{(y_i - \hat p_i)^2}{\hat p_i (1 - \hat p_i)},$$
a quadratic approximation to the deviance.
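
The Pearson statistic can be computed from a fitted model and set beside the deviance. The sketch below uses simulated data with arbitrary settings and is meant only to show the two quantities next to each other.

```r
# Simulated data; sample size and slope are arbitrary illustrative choices
set.seed(2)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(0.5 * x))
fit <- glm(y ~ x, family = binomial)

p_hat <- fitted(fit)                                    # fitted probabilities
pearson <- sum((y - p_hat)^2 / (p_hat * (1 - p_hat)))   # weighted residual sum of squares

c(pearson = pearson, deviance = deviance(fit))
```

The same Pearson value is obtained from `sum(residuals(fit, type = "pearson")^2)`.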

Statistical Inferences
Using the weighted least squares formulation, we have:
- Asymptotic likelihood theory says: if the model is correct, $\hat\beta$ is consistent.
- By the central limit theorem, the distribution of $\hat\beta$ converges to $N(\beta, (X^T W X)^{-1})$.
- Model building is costly due to the iterations. Popular shortcuts:
  - For inclusion of a term, use the Rao score test.
  - For exclusion of a term, use the Wald test.
  Neither of these two tests requires iterative fitting, and both are based on the maximum likelihood fit of the current model.
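
The asymptotic covariance $(X^T W X)^{-1}$ is what standard software reports as standard errors. A hedged sketch, on simulated data with arbitrary coefficients, that rebuilds the standard errors and Wald z-statistics by hand and compares them with `summary()`:

```r
# Simulated data; the coefficients -0.3 and 0.8 are arbitrary
set.seed(3)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.3 + 0.8 * x))
fit <- glm(y ~ x, family = binomial)

X <- model.matrix(fit)                  # design matrix with intercept column
p_hat <- fitted(fit)
W <- diag(p_hat * (1 - p_hat))          # weights at the fitted values

cov_beta <- solve(t(X) %*% W %*% X)     # estimated covariance of beta-hat
se <- sqrt(diag(cov_beta))
wald_z <- coef(fit) / se                # Wald statistic for H0: beta_j = 0

cbind(se_manual = se,
      se_glm = summary(fit)$coefficients[, "Std. Error"],
      wald_z = wald_z)
```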

R Code for Logistic Regression
    logist <- glm(formula, family, data, subset, ...)
    predict(logist)
Arguments:
- formula: an object of class formula.
- family: a description of the error distribution and link function to be used in the model.
- data: an optional data frame, list or environment containing the variables in the model.
- subset: an optional vector specifying a subset of observations to be used in the fitting process.
Output: returns an object of class inheriting from glm and lm.

R Code for Logistic Prediction
    logist <- glm(y ~ x, data = data, family = binomial(link = "logit"))
    predict(logist, newdata)
    summary(logist)
    anova(logist)
- The function predict gives the predicted values (on the log-odds scale by default; use type = "response" for probabilities); newdata is a data frame.
- The function summary is used to obtain or print a summary of the results.
- The function anova produces an analysis of deviance table, the GLM analogue of an analysis of variance table.
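
A short end-to-end usage sketch of these calls follows. The data frame `dat`, its columns, and the new points are all made up for illustration.

```r
# Simulated training data; intercept 1 and slope 2 are arbitrary
set.seed(4)
dat <- data.frame(x = rnorm(150))
dat$y <- rbinom(150, 1, plogis(1 + 2 * dat$x))

logist <- glm(y ~ x, data = dat, family = binomial(link = "logit"))

newdata <- data.frame(x = c(-1, 0, 1))
eta <- predict(logist, newdata)                      # linear predictor (log-odds), the default
p   <- predict(logist, newdata, type = "response")   # estimated Pr(Y = 1 | x)
yhat <- as.integer(p > 0.5)                          # equal-cost classification rule

data.frame(newdata, eta = eta, p = p, yhat = yhat)
```

Thresholding `p` at 0.5 is the same as thresholding `eta` at 0, which is the zero log-odds boundary from earlier in the lecture.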

Relationship between LDA and Least Squares (LS)
For two-class problems, both LDA and least squares fit a linear boundary $\beta_0 + \beta^T x$. Their solutions have the following relationship:
- The least squares regression coefficient $\hat\beta$ is proportional to the LDA direction, i.e.,
$$\hat\beta \propto \hat\Sigma^{-1} (\hat\mu_1 - \hat\mu_0),$$
where $\hat\Sigma$ is the pooled sample covariance matrix, and $\hat\mu_k$ is the sample mean of the points from class $k$, $k = 0, 1$. In other words, their slope coefficients are identical up to a scalar multiple. (Exercise 4.2 in textbook)
- The LS intercept $\hat\beta_0$ is generally different from that of LDA, unless $n_1 = n_0$.
In general, they have different decision rules (unless $n_1 = n_0$).
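
The proportionality of the slopes is easy to check numerically. The sketch below uses simulated two-class data with deliberately unequal class sizes; all settings are illustrative.

```r
# Simulated two-class data with unequal class sizes (illustrative settings)
set.seed(5)
n0 <- 60; n1 <- 40
X <- rbind(matrix(rnorm(2 * n0), n0, 2),
           matrix(rnorm(2 * n1, mean = 1), n1, 2))
y <- rep(c(0, 1), c(n0, n1))

# Least squares on the 0/1 response; drop the intercept
beta_ls <- unname(coef(lm(y ~ X))[-1])

# LDA direction from the pooled covariance
mu0 <- colMeans(X[y == 0, ]); mu1 <- colMeans(X[y == 1, ])
Sigma <- ((n0 - 1) * cov(X[y == 0, ]) + (n1 - 1) * cov(X[y == 1, ])) / (n0 + n1 - 2)
beta_lda <- solve(Sigma, mu1 - mu0)

ratio <- beta_ls / beta_lda   # the same constant in every coordinate
ratio
```

Both entries of `ratio` coincide, which shows the two slope vectors differ only by a scalar multiple; the intercepts are not compared here.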

Common Feature of LDA and Logistic Regression Models
For both LDA and logistic regression, the logit has a linear form
$$\log \frac{\Pr(Y=1 \mid x)}{\Pr(Y=0 \mid x)} = \beta_0 + \beta_1^T x.$$
Equivalently, for both estimators, the posterior class probability can be expressed as
$$\Pr(Y=1 \mid x) = \frac{\exp(\beta_0 + \beta_1^T x)}{1 + \exp(\beta_0 + \beta_1^T x)}.$$
They have exactly the same form. Are they the same estimator?

Major Differences between LDA and Logistic Regression
The main differences between the two estimators are:
- Where does the linear logit come from? What assumptions are made on the data distribution? (Difference in data distribution)
- How are the linear coefficients estimated? (Difference in parameter estimation)
- Is any assumption made on the marginal density of X? (Flexibility of the model)

Where Does the Linear Logit Come From?
For LDA, the linear logit is due to the equal-covariance Gaussian assumption on the data:
$$\log \frac{\Pr(Y=1 \mid x)}{\Pr(Y=0 \mid x)} = \log \frac{\pi_1}{\pi_0} - \frac{1}{2} (\mu_1 + \mu_0)^T \Sigma^{-1} (\mu_1 - \mu_0) + x^T \Sigma^{-1} (\mu_1 - \mu_0) = \beta_0 + \beta_1^T x.$$
For the logistic model, the linear logit holds by construction:
$$\log \frac{\Pr(Y=1 \mid x)}{\Pr(Y=0 \mid x)} = \beta_0 + \beta_1^T x.$$

Difference in Marginal Density Assumption
The assumptions on $\Pr(X)$:
- The logistic model leaves the marginal density of X arbitrary and unspecified.
- The LDA model assumes a Gaussian mixture density
$$\Pr(X) = \sum_{j=0}^{1} \pi_j\, \phi(X; \mu_j, \Sigma).$$
Conclusion: the logistic model makes fewer assumptions about the data, and hence is more general.

Difference in Parameter Estimation
Logistic regression:
- Maximizes the conditional likelihood, i.e., the multinomial likelihood with probabilities $\Pr(Y = k \mid X)$.
- The marginal density $\Pr(X)$ is totally ignored (it is treated fully nonparametrically, using the empirical distribution function, which places mass $1/n$ at each observation).
LDA:
- Maximizes the full log-likelihood based on the joint density
$$\Pr(X, Y = j) = \phi(X; \mu_j, \Sigma)\, \pi_j.$$
- Standard MLE theory leads to the estimators $\hat\mu_j$, $\hat\Sigma$, $\hat\pi_j$.
- The marginal density does play a role.

More Comments
- LDA is easier to compute than logistic regression.
- If the true $f_k(x)$'s are Gaussian, LDA is better. In that case logistic regression may lose about 30% efficiency asymptotically in error rate (Efron 1975).
- Robustness?
  - LDA uses all the points to estimate the covariance matrix: more information, but not robust against outliers.
  - Logistic regression down-weights points far from the decision boundary (recall that the weight is $p_i (1 - p_i)$): more robust and safer.
- In practice, these two methods often give similar results (for approximately normally distributed data).

Two-dimensional Linear Example
Consider the following scenario for a two-class problem:
- $\pi_1 = \pi_0 = 0.5$.
- Class 1: $X \sim N_2((1, -1)^T, 4I)$
- Class 2: $X \sim N_2((-1, 1)^T, 4I)$
The Bayes boundary is $\{x : x_1 - x_2 = 0\}$.
We generate
- a training set of size n = 200 or 400 to fit the classifier,
- a test set of size n = 2000 to evaluate its prediction performance.
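
A rough simulation of this setup is sketched below. It assumes the class means are (1, -1) and (-1, 1), the reading under which the stated Bayes boundary $x_1 - x_2 = 0$ holds, and it codes Class 1 as y = 1 and Class 2 as y = 0. The seed and the helper name `gen` are arbitrary, so the error rates will not match the slides' table.

```r
library(MASS)
set.seed(6)

# Draw n points with equal class priors; covariance 4I means sd = 2 per coordinate
gen <- function(n) {
  y <- rbinom(n, 1, 0.5)
  mu <- cbind(ifelse(y == 1, 1, -1), ifelse(y == 1, -1, 1))   # assumed class means
  x <- mu + matrix(rnorm(2 * n, sd = 2), n, 2)
  data.frame(x1 = x[, 1], x2 = x[, 2], y = factor(y))
}
train <- gen(200)
test  <- gen(2000)

fit_lda <- lda(y ~ x1 + x2, data = train)
fit_log <- glm(y ~ x1 + x2, data = train, family = binomial)

pred_lda   <- predict(fit_lda, test)$class
pred_log   <- ifelse(predict(fit_log, test, type = "response") > 0.5, "1", "0")
pred_bayes <- ifelse(test$x1 - test$x2 > 0, "1", "0")   # Bayes rule for the assumed means

# Test error rates
c(bayes    = mean(pred_bayes != test$y),
  lda      = mean(pred_lda != test$y),
  logistic = mean(pred_log != test$y))
```

With Gaussian classes and a shared covariance, LDA and logistic regression would be expected to land close to each other and not far above the Bayes error.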

[Figure, four panels on axes x1 and x2: Training Scatter Plot; Bayes and Linear Classifiers (bay, lin); Logistic and LDA Classifiers (log, lda); All Linear Classifiers (bay, lin, log, lda).]
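
Performance of Various Classifiers (n = 200)
[Table: columns Bayes, Linear, LDA, Logistic; rows Train Error, Test Error. The numeric entries are missing.]
Fitted classification boundaries are reported for Linear & LDA (which share one line) and for Logistic, each as an equation for $x_2$ in terms of $x_1$; the fitted coefficients are missing.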

[Figure, four panels on axes x1 and x2: Training Scatter Plot; Bayes and Linear Classifiers (bay, lin); Logistic and LDA Classifiers (log, lda); All Linear Classifiers (bay, lin, log, lda).]
Chap 2. Linear Classifiers (FTH, ) Yongdai Kim Seoul National University
Chap 2. Linear Classifiers (FTH, 4.1-4.4) Yongdai Kim Seoul National University Linear methods for classification 1. Linear classifiers For simplicity, we only consider two-class classification problems
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationLecture 5: Classification
Lecture 5: Classification Advanced Applied Multivariate Analysis STAT 2221, Spring 2015 Sungkyu Jung Department of Statistics, University of Pittsburgh Xingye Qiao Department of Mathematical Sciences Binghamton
More informationClassification Methods II: Linear and Quadratic Discrimminant Analysis
Classification Methods II: Linear and Quadratic Discrimminant Analysis Rebecca C. Steorts, Duke University STA 325, Chapter 4 ISL Agenda Linear Discrimminant Analysis (LDA) Classification Recall that linear
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More informationSTA 450/4000 S: January
STA 450/4000 S: January 6 005 Notes Friday tutorial on R programming reminder office hours on - F; -4 R The book Modern Applied Statistics with S by Venables and Ripley is very useful. Make sure you have
More informationAdministration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books
STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,
More informationISyE 6416: Computational Statistics Spring Lecture 5: Discriminant analysis and classification
ISyE 6416: Computational Statistics Spring 2017 Lecture 5: Discriminant analysis and classification Prof. Yao Xie H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology
More informationLDA, QDA, Naive Bayes
LDA, QDA, Naive Bayes Generative Classification Models Marek Petrik 2/16/2017 Last Class Logistic Regression Maximum Likelihood Principle Logistic Regression Predict probability of a class: p(x) Example:
More informationLinear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
More informationGeneralized Linear Models. Kurt Hornik
Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general
More informationLinear Methods for Prediction
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationLogistic Regression. Seungjin Choi
Logistic Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationLogistic Regression and Generalized Linear Models
Logistic Regression and Generalized Linear Models Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/2 Topics Generative vs. Discriminative models In
More informationOutline of GLMs. Definitions
Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density
More informationECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam
ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationLecture 9: Classification, LDA
Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we
More informationA Study of Relative Efficiency and Robustness of Classification Methods
A Study of Relative Efficiency and Robustness of Classification Methods Yoonkyung Lee* Department of Statistics The Ohio State University *joint work with Rui Wang April 28, 2011 Department of Statistics
More informationLecture 9: Classification, LDA
Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationDoes Modeling Lead to More Accurate Classification?
Does Modeling Lead to More Accurate Classification? A Comparison of the Efficiency of Classification Methods Yoonkyung Lee* Department of Statistics The Ohio State University *joint work with Rui Wang
More informationLecture 3: Statistical Decision Theory (Part II)
Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample
More informationA Bahadur Representation of the Linear Support Vector Machine
A Bahadur Representation of the Linear Support Vector Machine Yoonkyung Lee Department of Statistics The Ohio State University October 7, 2008 Data Mining and Statistical Learning Study Group Outline Support
More informationGaussian discriminant analysis Naive Bayes
DM825 Introduction to Machine Learning Lecture 7 Gaussian discriminant analysis Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. is 2. Multi-variate
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationGeneralized linear models
Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models
More informationThe classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know
The Bayes classifier Theorem The classifier satisfies where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know Alternatively, since the maximum it is
More informationThe classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA
The Bayes classifier Linear discriminant analysis (LDA) Theorem The classifier satisfies In linear discriminant analysis (LDA), we make the (strong) assumption that where the min is over all possible classifiers.
More informationContents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)
Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationSupervised Learning: Linear Methods (1/2) Applied Multivariate Statistics Spring 2012
Supervised Learning: Linear Methods (1/2) Applied Multivariate Statistics Spring 2012 Overview Review: Conditional Probability LDA / QDA: Theory Fisher s Discriminant Analysis LDA: Example Quality control:
More informationOutline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationLogistic Regression 21/05
Logistic Regression 21/05 Recall that we are trying to solve a classification problem in which features x i can be continuous or discrete (coded as 0/1) and the response y is discrete (0/1). Logistic regression
More informationFSAN815/ELEG815: Foundations of Statistical Learning
FSAN815/ELEG815: Foundations of Statistical Learning Gonzalo R. Arce Chapter 14: Logistic Regression Fall 2014 Course Objectives & Structure Course Objectives & Structure The course provides an introduction
More informationPOLI 8501 Introduction to Maximum Likelihood Estimation
POLI 8501 Introduction to Maximum Likelihood Estimation Maximum Likelihood Intuition Consider a model that looks like this: Y i N(µ, σ 2 ) So: E(Y ) = µ V ar(y ) = σ 2 Suppose you have some data on Y,
More informationUniversity of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians
Engineering Part IIB: Module F Statistical Pattern Processing University of Cambridge Engineering Part IIB Module F: Statistical Pattern Processing Handout : Multivariate Gaussians. Generative Model Decision
More informationStatistical Machine Learning Hilary Term 2018
Statistical Machine Learning Hilary Term 2018 Pier Francesco Palamara Department of Statistics University of Oxford Slide credits and other course material can be found at: http://www.stats.ox.ac.uk/~palamara/sml18.html
More informationLecture 9: Classification, LDA
Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis Jonathan Taylor, 10/12 Slide credits: Sergio Bacallado 1 / 1 Review: Main strategy in Chapter 4 Find an estimate ˆP
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 305 Part VII
More informationClassification 1: Linear regression of indicators, linear discriminant analysis
Classification 1: Linear regression of indicators, linear discriminant analysis Ryan Tibshirani Data Mining: 36-462/36-662 April 2 2013 Optional reading: ISL 4.1, 4.2, 4.4, ESL 4.1 4.3 1 Classification
More informationSTAT5044: Regression and Anova
STAT5044: Regression and Anova Inyoung Kim 1 / 15 Outline 1 Fitting GLMs 2 / 15 Fitting GLMS We study how to find the maxlimum likelihood estimator ˆβ of GLM parameters The likelihood equaions are usually
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,