Lecture 5: LDA and Logistic Regression


1 Lecture 5: LDA and Logistic Regression

Hao Helen Zhang

2 Outline

Linear Classification Methods
- Two popular linear models for classification:
  - Linear Discriminant Analysis (LDA)
  - Logistic regression models

Take-home message: both LDA and logistic regression rely on the linear-odds assumption, indirectly or directly; however, they estimate the coefficients in different manners.

3 Linear Classifiers

Linear methods: the decision boundary is linear.

Common linear classification methods:
- linear regression methods (covered in Lecture 3)
- linear log-odds (logit) models, linear logistic models
- linear discriminant analysis (LDA)
- separating hyperplanes (introduced later):
  - perceptron model (Rosenblatt 1958)
  - optimal separating hyperplane (Vapnik 1996), SVMs

From now on, we assume equal costs (by default).

4 Odds, Logit, and Linear-Odds Models

Some terminology:
- The ratio \Pr(Y=1 \mid X=x) / \Pr(Y=0 \mid X=x) is called the odds.
- \log [\Pr(Y=1 \mid X=x) / \Pr(Y=0 \mid X=x)] is the log of the odds, or the logit function.

Linear-odds models assume the logit is linear in x, i.e.,

    \log \frac{\Pr(Y=1 \mid X=x)}{\Pr(Y=0 \mid X=x)} = \beta_0 + \beta_1^T x.

Examples: LDA, logistic regression.

5 Classifier Based on Linear-Odds Models

From the linear odds, we can obtain the posterior class probabilities:

    \Pr(Y=1 \mid x) = \frac{\exp(\beta_0 + \beta_1^T x)}{1 + \exp(\beta_0 + \beta_1^T x)},
    \Pr(Y=0 \mid x) = \frac{1}{1 + \exp(\beta_0 + \beta_1^T x)}.

Assuming equal costs, the decision boundary is given by

    \{x : \Pr(Y=1 \mid x) = 0.5\} = \{x : \beta_0 + \beta_1^T x = 0\},

which can be interpreted as the set of zero log-odds.
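As a small numerical illustration (not from the slides), the following R sketch evaluates the posterior probabilities and the equal-cost decision; beta0, beta1, and x are hypothetical values chosen purely for illustration.

# Hypothetical coefficients and input, for illustration only.
beta0 <- -1
beta1 <- c(2, -0.5)
x     <- c(0.8, 1.2)

eta  <- beta0 + sum(beta1 * x)      # linear logit beta0 + beta1' x
p1   <- exp(eta) / (1 + exp(eta))   # Pr(Y = 1 | x)
p0   <- 1 - p1                      # Pr(Y = 0 | x)
yhat <- as.integer(eta > 0)         # equal costs: classify 1 iff the logit > 0

Classifying by the sign of the logit is equivalent to thresholding \Pr(Y=1 \mid x) at 0.5, which is exactly the boundary above.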

6 [Figure: the logit function g(s) = \log[s/(1-s)] and the logistic curve p(t) = \exp(t)/(1+\exp(t)).]

7 Linear Discriminant Analysis (LDA)

LDA assumes:
- Each class density is multivariate Gaussian, i.e.,
      X \mid Y = j \sim N(\mu_j, \Sigma_j), j = 0, 1.
- Equal covariances: \Sigma_j = \Sigma, j = 0, 1.

In other words, both classes are Gaussian and share the same covariance matrix.

8 Linear Discriminant Function

Under the mixture-Gaussian assumption, the log-odds is

    \log \frac{\Pr(Y=1 \mid X=x)}{\Pr(Y=0 \mid X=x)} = \log \frac{\pi_1}{\pi_0} - \frac{1}{2}(\mu_1 + \mu_0)^T \Sigma^{-1}(\mu_1 - \mu_0) + x^T \Sigma^{-1}(\mu_1 - \mu_0).

Under equal costs, LDA classifies to 1 if and only if

    \log \frac{\pi_1}{\pi_0} - \frac{1}{2}(\mu_1 + \mu_0)^T \Sigma^{-1}(\mu_1 - \mu_0) + x^T \Sigma^{-1}(\mu_1 - \mu_0) > 0.

It has a linear boundary \{x : \beta_0 + x^T \beta_1 = 0\}, with

    \beta_0 = \log \frac{\pi_1}{\pi_0} - \frac{1}{2}(\mu_1 + \mu_0)^T \Sigma^{-1}(\mu_1 - \mu_0),  \beta_1 = \Sigma^{-1}(\mu_1 - \mu_0).

9 Parameter Estimation in LDA

In practice, \pi_1, \pi_0, \mu_0, \mu_1, \Sigma are unknown. We estimate them from the training data, using the MLE or moment estimators:
- \hat\pi_j = n_j / n, where n_j is the size of class j;
- \hat\mu_j = \sum_{Y_i = j} x_i / n_j, for j = 0, 1;
- the sample covariance matrix of class j is
      S_j = \frac{1}{n_j - 1} \sum_{Y_i = j} (x_i - \hat\mu_j)(x_i - \hat\mu_j)^T;
- the (unbiased) pooled sample covariance is the weighted average
      \hat\Sigma = \frac{n_0 - 1}{(n_0 - 1) + (n_1 - 1)} S_0 + \frac{n_1 - 1}{(n_0 - 1) + (n_1 - 1)} S_1 = \frac{1}{n - 2} \sum_{j=0}^{1} \sum_{Y_i = j} (x_i - \hat\mu_j)(x_i - \hat\mu_j)^T.
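These estimates are straightforward to compute by hand. Below is a minimal R sketch, on simulated toy data (for illustration only), that forms the priors, class means, pooled covariance, and the resulting LDA coefficients from slide 8.

# Toy two-class Gaussian data, simulated purely for illustration.
set.seed(1)
x <- rbind(matrix(rnorm(100), 50, 2), matrix(rnorm(100, mean = 1), 50, 2))
y <- rep(c(0, 1), each = 50)

n0 <- sum(y == 0); n1 <- sum(y == 1); n <- n0 + n1
pi0 <- n0 / n; pi1 <- n1 / n                                # prior estimates
mu0 <- colMeans(x[y == 0, ]); mu1 <- colMeans(x[y == 1, ])  # class means
S0 <- cov(x[y == 0, ]); S1 <- cov(x[y == 1, ])              # per-class covariances
Sigma <- ((n0 - 1) * S0 + (n1 - 1) * S1) / (n - 2)          # pooled covariance

beta1 <- solve(Sigma, mu1 - mu0)                            # LDA direction
beta0 <- log(pi1 / pi0) - 0.5 * sum((mu1 + mu0) * beta1)    # intercept
# Classify x0 to class 1 iff beta0 + sum(beta1 * x0) > 0, as on slide 8.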

10 R Code for Fitting LDA (I)

There are two ways to call the function lda. The first way is to use a formula and an optional data frame.

library(MASS)
lda(formula, data, subset)

Arguments:
- formula: of the form groups ~ x1 + x2 + ..., where the response is the grouping factor and the right-hand side specifies the (non-factor) discriminators.
- data: the data frame from which the specified variables are taken.
- subset: an index vector specifying the cases to be used in the training sample.

Output: an object of class "lda" with multiple components.

11 R Code for Fitting LDA (II)

The second way is to use a matrix and a grouping factor as the first two arguments.

library(MASS)
lda(x, grouping, prior = proportions, CV = FALSE)

Arguments:
- x: a matrix or data frame containing the predictors.
- grouping: a factor specifying the class of each observation.
- prior: the prior probabilities of class membership; if unspecified, the class proportions of the training set are used.

Output: if CV = TRUE, the return value is a list with components class (the MAP classification, a factor) and posterior (posterior probabilities for the classes).

12 R Code for Prediction

We use the predict (or predict.lda) function to classify multivariate observations with a fitted lda object.

predict(object, newdata, ...)

Arguments:
- object: an object of class "lda".
- newdata: a data frame of cases to be classified or, if object has a formula, a data frame with columns of the same names as the variables used.

Output: a list with components class (the MAP classification, a factor) and posterior (posterior probabilities for the classes).

13 Fisher's Iris Data (a Three-Class Problem)

Fisher (1936), "The use of multiple measurements in taxonomic problems."
- Three species: Iris setosa, Iris versicolor, Iris virginica.
- Four features: the length and the width of the sepals and petals, in centimeters.
- 50 samples from each species.

In the following analysis, we randomly select 50% of the data points as the training set and use the rest as the test set.

14 [Figure]

15 Illustration 1

library(MASS)
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
table(Iris$Sp[train])
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)

16 Training Error and Test Error

# training error rate
ytrain <- predict(z, Iris[train, ])$class
table(ytrain, Iris$Sp[train])
train_err <- mean(ytrain != Iris$Sp[train])

# test error rate
ytest <- predict(z, Iris[-train, ])$class
table(ytest, Iris$Sp[-train])
test_err <- mean(ytest != Iris$Sp[-train])

17 Illustration 2

tr <- sample(1:50, 25)
train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])
test  <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
z <- lda(train, cl)
ytest <- predict(z, test)$class

18 Logistic Regression

Model assumption: the log-odds is linear in x,

    \log \frac{\Pr(Y=1 \mid X=x)}{\Pr(Y=0 \mid X=x)} = \beta_0 + \beta_1^T x.

Define

    p(x; \beta) = \frac{\exp(\beta_0 + \beta_1^T x)}{1 + \exp(\beta_0 + \beta_1^T x)},

and write \beta = (\beta_0, \beta_1). The posterior class probabilities can be calculated as

    p_1(x) = \Pr(Y=1 \mid x) = p(x; \beta),  p_0(x) = \Pr(Y=0 \mid x) = 1 - p(x; \beta).

The classification boundary is given by \{x : \beta_0 + \beta_1^T x = 0\}.

19 Model Interpretation

Denote \mu = E(Y \mid X) = \Pr(Y=1 \mid X).
- By assuming g(\mu) = \log[\mu/(1-\mu)] = \beta_0 + \beta_1^T X, the logit g connects \mu with the linear predictor \beta_0 + \beta_1^T X. We call g the link function.
- \mathrm{Var}(Y \mid X) = \mu(1-\mu).

20 Interpretation of \beta_j

In logistic regression, the odds ratio for X_j is

    \mathrm{odds}(..., X_j = x+1, ...) / \mathrm{odds}(..., X_j = x, ...) = e^{\beta_j}.

- If X_j takes values 0 or 1, then the odds for the group with X_j = 1 are e^{\beta_j} times the odds for the group with X_j = 0, with the other parameters fixed.
- For rare diseases, when the incidence is rare, \Pr(Y=0) \approx 1, so the odds \approx \Pr(Y=1) and
      e^{\beta_j} \approx \frac{\Pr(..., X_j = x+1, ...)}{\Pr(..., X_j = x, ...)},
  i.e., the odds ratio approximates the relative risk.
- In a cancer study, a log-odds-ratio of 5 means that smokers are e^5 \approx 148 times more likely to develop the cancer.
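In R, fitted odds ratios are simply the exponentiated coefficients. A short sketch on simulated toy data (the variable names here are illustrative):

# Simulated toy data, for illustration only.
set.seed(1)
dat   <- data.frame(x1 = rnorm(100))
dat$y <- rbinom(100, 1, plogis(0.5 + 2 * dat$x1))

fit <- glm(y ~ x1, data = dat, family = binomial)
exp(coef(fit))   # e^{beta_j}: multiplicative change in the odds per unit of x1
exp(5)           # a log-odds-ratio of 5 is an odds ratio of about 148.4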

21 Maximum Likelihood Estimation (MLE) for Logistic Models

The joint conditional log-likelihood of the y_i given the x_i is

    l(\beta) = \sum_{i=1}^n \log p_{y_i}(x_i; \beta),
    where p_y(x; \beta) = p(x; \beta)^y [1 - p(x; \beta)]^{1-y}.

In detail,

    l(\beta) = \sum_{i=1}^n \{ y_i \log p(x_i; \beta) + (1 - y_i) \log[1 - p(x_i; \beta)] \}
             = \sum_{i=1}^n \{ y_i (\beta_0 + \beta_1^T x_i) - \log[1 + \exp(\beta_0 + \beta_1^T x_i)] \}.

22 Score Equations

For simplicity, now assume x_i has 1 as its first component, so that \beta_0 is absorbed into \beta. Setting the gradient of l to zero gives the score equations

    \frac{\partial l(\beta)}{\partial \beta} = \sum_{i=1}^n x_i [y_i - p(x_i; \beta)] = 0,

a total of (d+1) nonlinear equations. The first equation,

    \sum_{i=1}^n y_i = \sum_{i=1}^n p(x_i; \beta),

says the expected number of 1's equals the observed number in the sample.

Notation for the next slides:

    y = [y_1, ..., y_n]^T,  p = [p(x_1; \beta_{old}), ..., p(x_n; \beta_{old})]^T,
    W = \mathrm{diag}\{ p(x_i; \beta_{old})(1 - p(x_i; \beta_{old})) \}.

23 Newton-Raphson Algorithm

The second-derivative (Hessian) matrix is

    \frac{\partial^2 l(\beta)}{\partial \beta \partial \beta^T} = -\sum_{i=1}^n x_i x_i^T p(x_i; \beta)[1 - p(x_i; \beta)].

1. Choose an initial value \beta^{(0)}.
2. Update \beta by

    \beta_{new} = \beta_{old} - \left[ \frac{\partial^2 l(\beta)}{\partial \beta \partial \beta^T} \right]^{-1} \frac{\partial l(\beta)}{\partial \beta},  evaluated at \beta_{old}.

In matrix notation,

    \frac{\partial l(\beta)}{\partial \beta} = X^T (y - p),  \frac{\partial^2 l(\beta)}{\partial \beta \partial \beta^T} = -X^T W X.

24 Iteratively Re-weighted Least Squares (IRLS)

The Newton-Raphson step is

    \beta_{new} = \beta_{old} + (X^T W X)^{-1} X^T (y - p)
                = (X^T W X)^{-1} X^T W [X \beta_{old} + W^{-1} (y - p)]
                = (X^T W X)^{-1} X^T W z,

where we define the adjusted response z = X \beta_{old} + W^{-1} (y - p).

Repeatedly solve the weighted least squares problem until convergence:

    \beta_{new} = \arg\min_\beta (z - X\beta)^T W (z - X\beta).

The weight matrix W, the response z, and p change at each iteration. The algorithm can be generalized to the K \ge 3 class case.
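The update above is short enough to implement directly. Here is a bare-bones IRLS sketch in R (no step-halving or other safeguards, so it is an illustration rather than a production implementation), checked against glm on simulated toy data:

# Bare-bones IRLS for binary logistic regression.
# X: n x (d+1) design matrix with a leading column of 1's; y: 0/1 response.
irls_logistic <- function(X, y, tol = 1e-8, maxit = 25) {
  beta <- rep(0, ncol(X))                     # beta = 0 as the starting value
  for (it in seq_len(maxit)) {
    eta <- drop(X %*% beta)
    p   <- plogis(eta)                        # p(x_i; beta)
    w   <- p * (1 - p)                        # IRLS weights
    z   <- eta + (y - p) / w                  # adjusted response
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    if (max(abs(beta_new - beta)) < tol) return(beta_new)
    beta <- beta_new
  }
  beta
}

set.seed(1)
X <- cbind(1, rnorm(200))
y <- rbinom(200, 1, plogis(-0.5 + 1.5 * X[, 2]))
irls_logistic(X, y)                           # should agree with:
coef(glm(y ~ X[, 2], family = binomial))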

25 Quadratic Approximations

- \hat\beta satisfies a self-consistency relationship: it solves a weighted least squares fit with response

      z_i = x_i^T \hat\beta + \frac{y_i - \hat p_i}{\hat p_i (1 - \hat p_i)}

  and weights w_i = \hat p_i (1 - \hat p_i).
- \beta = 0 seems to be a good starting value.
- Typically the algorithm converges, but convergence is never guaranteed.
- The weighted residual sum of squares is the Pearson chi-square statistic

      \sum_{i=1}^n \frac{(y_i - \hat p_i)^2}{\hat p_i (1 - \hat p_i)},

  a quadratic approximation to the deviance.
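For concreteness, a short sketch (simulated toy data) computing the Pearson statistic at the fit, next to the deviance it approximates:

# Simulated toy fit, for illustration only.
set.seed(5)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(x))
fit <- glm(y ~ x, family = binomial)

p_hat <- fitted(fit)
sum((y - p_hat)^2 / (p_hat * (1 - p_hat)))   # Pearson chi-square statistic
deviance(fit)                                # the deviance it approximates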

26 Statistical Inference

From the weighted least squares formulation:
- Asymptotic likelihood theory says that if the model is correct, \hat\beta is consistent.
- By the central limit theorem, the distribution of \hat\beta converges to N(\beta, (X^T W X)^{-1}).

Model building can be costly because of the iterations; popular shortcuts:
- For inclusion of a term, use the Rao score test.
- For exclusion of a term, use the Wald test.
Neither of these requires iterative fitting; both are based on the maximum likelihood fit of the current model.
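The asymptotic covariance can be reproduced directly from a fit. A sketch (simulated toy data) comparing \sqrt{\mathrm{diag}[(X^T W X)^{-1}]} with the standard errors printed by summary.glm:

# Simulated toy fit, for illustration only.
set.seed(2)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.3 + x))
fit <- glm(y ~ x, family = binomial)

Xmat <- model.matrix(fit)
p <- fitted(fit)
V <- solve(t(Xmat) %*% (p * (1 - p) * Xmat))   # (X^T W X)^{-1}
cbind(manual = sqrt(diag(V)),
      glm    = summary(fit)$coefficients[, "Std. Error"])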

27 R Code for Logistic Regression

logist <- glm(formula, family, data, subset, ...)
predict(logist)

Arguments:
- formula: an object of class "formula".
- family: a description of the error distribution and link function to be used in the model.
- data: an optional data frame, list, or environment containing the variables in the model.
- subset: an optional vector specifying a subset of observations to be used in the fitting process.

Output: an object of class inheriting from "glm" and "lm".

28 R Code for Logistic Prediction

logist <- glm(y ~ x, data = dat, family = binomial(link = "logit"))
predict(logist, newdata)
summary(logist)
anova(logist)

- The function predict gives the predicted values; newdata is a data frame.
- The function summary is used to obtain or print a summary of the fit.
- The function anova produces an analysis-of-deviance table.
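One practical point: by default, predict.glm returns the linear predictor (the logit); request type = "response" for probabilities. A usage sketch on simulated toy data (dat and newdat are illustrative names):

# Simulated toy data, for illustration only.
set.seed(3)
dat    <- data.frame(x = rnorm(100))
dat$y  <- rbinom(100, 1, plogis(dat$x))
newdat <- data.frame(x = c(-2, 0, 2))

fit <- glm(y ~ x, data = dat, family = binomial(link = "logit"))
predict(fit, newdat)                      # default: logits beta0 + beta1' x
predict(fit, newdat, type = "response")   # probabilities p(x; beta)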

29 Relationship between LDA and Least Squares (LS)

For two-class problems, both LDA and least squares fit a linear boundary \beta_0 + \beta^T x. Their solutions are related as follows:
- The least-squares regression coefficient \hat\beta is proportional to the LDA direction, i.e.,
      \hat\beta \propto \hat\Sigma^{-1} (\hat\mu_1 - \hat\mu_0),
  where \hat\Sigma is the pooled sample covariance matrix and \hat\mu_k is the sample mean of the points from class k, k = 0, 1. In other words, their slope coefficients are identical up to a scalar multiple (Exercise 4.2 in the textbook).
- The LS intercept \hat\beta_0 is generally different from that of LDA, unless n_1 = n_0.

In general, the two methods have different decision rules (unless n_1 = n_0).
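This proportionality is easy to check numerically. A sketch on simulated two-class data (illustrative only); the last line should print an (approximately) constant elementwise ratio:

# Simulated two-class Gaussian data, for illustration only.
library(MASS)   # for mvrnorm
set.seed(3)
x <- rbind(mvrnorm(60, c(1, 1), diag(2)), mvrnorm(40, c(-1, -1), diag(2)))
y <- rep(c(1, 0), c(60, 40))

b_ls <- coef(lm(y ~ x))[-1]                                 # LS slopes
mu1 <- colMeans(x[y == 1, ]); mu0 <- colMeans(x[y == 0, ])
S <- (59 * cov(x[y == 1, ]) + 39 * cov(x[y == 0, ])) / 98   # pooled covariance
b_lda <- solve(S, mu1 - mu0)                                # LDA direction
b_ls / b_lda                                                # constant ratio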

30 Common Feature of LDA and Logistic Regression

For both LDA and logistic regression, the logit has a linear form:

    \log \frac{\Pr(Y=1 \mid x)}{\Pr(Y=0 \mid x)} = \beta_0 + \beta_1^T x.

Equivalently, for both estimators the posterior class probability can be expressed as

    \Pr(Y=1 \mid x) = \frac{\exp(\beta_0 + \beta_1^T x)}{1 + \exp(\beta_0 + \beta_1^T x)}.

They have exactly the same form. Are they the same estimator?

31 Major Differences between LDA and Logistic Regression

The main differences between the two estimators:
- Where does the linear logit come from? What assumptions are made on the data distribution? (difference in the assumed data distribution)
- How are the linear coefficients estimated? (difference in parameter estimation)
- Is any assumption made on the marginal density of X? (flexibility of the model)

32 Where Does the Linear Logit Come From?

For LDA, the linear logit follows from the equal-covariance Gaussian assumption on the data:

    \log \frac{\Pr(Y=1 \mid x)}{\Pr(Y=0 \mid x)} = \log \frac{\pi_1}{\pi_0} - \frac{1}{2}(\mu_1 + \mu_0)^T \Sigma^{-1}(\mu_1 - \mu_0) + x^T \Sigma^{-1}(\mu_1 - \mu_0) = \beta_0 + \beta_1^T x.

For the logistic model, the linear logit holds by construction:

    \log \frac{\Pr(Y=1 \mid x)}{\Pr(Y=0 \mid x)} = \beta_0 + \beta_1^T x.

33 Difference in the Marginal Density Assumption

The assumptions on \Pr(X):
- The logistic model leaves the marginal density of X arbitrary and unspecified.
- The LDA model assumes a Gaussian mixture density

      \Pr(X) = \sum_{j=0}^{1} \pi_j \phi(X; \mu_j, \Sigma).

Conclusion: the logistic model makes fewer assumptions about the data and hence is more general.

34 Difference in Parameter Estimation

Logistic regression:
- maximizes the conditional likelihood, i.e., the multinomial likelihood with probabilities \Pr(Y=k \mid X);
- the marginal density \Pr(X) is totally ignored (it is treated fully nonparametrically, using the empirical distribution function, which places mass 1/n at each observation).

LDA:
- maximizes the full log-likelihood based on the joint density
      \Pr(X, Y=j) = \phi(X; \mu_j, \Sigma) \pi_j;
- standard MLE theory leads to the estimators \hat\mu_j, \hat\Sigma, \hat\pi_j;
- the marginal density does play a role.

35 More Comments

- LDA is easier to compute than logistic regression.
- If the true f_k(x)'s are Gaussian, LDA is better; logistic regression may lose about 30% efficiency asymptotically in the error rate (Efron 1975).
- Robustness? LDA uses all the points to estimate the covariance matrix: more information, but not robust against outliers. Logistic regression down-weights points far from the decision boundary (recall the weight is p_i(1 - p_i)): more robust and safer.
- In practice, the two methods often give similar results (for approximately normally distributed data).

36 Two-Dimensional Linear Example

Consider the following scenario for a two-class problem with \pi_1 = \pi_0 = 0.5:
- Class 1: X \sim N_2((1, 1)^T, 4I)
- Class 2: X \sim N_2((-1, -1)^T, 4I)

The Bayes boundary is \{x : x_1 + x_2 = 0\}.

We generate
- training sets of n = 200 and 400 to fit the classifiers, and
- a test set of n = 2000 to evaluate their prediction performance.
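A sketch of how this simulation might be coded in R (the code actually used for the slides is not shown; the function and variable names here are illustrative):

library(MASS)   # for mvrnorm and lda
set.seed(4)
gen <- function(n) {                     # equal priors, means (1,1) and (-1,-1)
  y  <- rbinom(n, 1, 0.5)
  mu <- rbind(c(-1, -1), c(1, 1))[y + 1, ]
  x  <- mu + mvrnorm(n, c(0, 0), 4 * diag(2))
  data.frame(x1 = x[, 1], x2 = x[, 2], y = factor(y))
}
train <- gen(200); test <- gen(2000)

fit_lda <- lda(y ~ x1 + x2, data = train)
mean(predict(fit_lda, test)$class != test$y)             # LDA test error

fit_log <- glm(y ~ x1 + x2, data = train, family = binomial)
phat <- predict(fit_log, test, type = "response")
mean((phat > 0.5) != (test$y == "1"))                    # logistic test error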

37 [Figure, four panels: Training Scatter Plot; Bayes and Linear Classifiers (bay, lin); Logistic and LDA Classifiers (log, lda); All Linear Classifiers (bay, lin, log, lda). Axes: x1, x2.]

38 Performance of Various Classifiers (n = 200)

[Table: training and test errors of the Bayes, linear, LDA, and logistic classifiers; the numeric entries did not survive transcription.]

Fitted classification boundaries (the fitted slopes did not survive transcription):
- Linear & LDA: x_2 = ... x_1
- Logistic: x_2 = ... x_1

39 [Figure, four panels (as on slide 37): Training Scatter Plot; Bayes and Linear Classifiers; Logistic and LDA Classifiers; All Linear Classifiers. Axes: x1, x2.]
