Data Mining 2018 Logistic Regression Text Classification


1 Data Mining 2018: Logistic Regression, Text Classification

Ad Feelders, Universiteit Utrecht

2 Two types of approaches to classification

In (probabilistic) classification we are interested in the conditional distribution P(Y | x), so that, for example, when we observe X = x we can predict the class with the highest probability for that value of X. There are two basic approaches to modeling P(Y | x):

Generative models (use Bayes' rule):

P(Y = j | x) = P(x | Y = j) P(Y = j) / P(x),  where  P(x) = Σ_{i=1}^{k} P(x | Y = i) P(Y = i)

Discriminative models: model P(Y | x) directly.

3 Generative Models

Examples of generative classification methods:
- Naive Bayes classifier (discussed in the previous lecture)
- Linear/Quadratic Discriminant Analysis (not discussed)
- ...

4 Discriminative Models

Discriminative methods only model the conditional distribution of Y given X. The probability distribution of X itself is not modeled. For the binary classification problem:

P(Y = 1 | X) = f(X, β)

where f(X, β) is some deterministic function of X.

5 Discriminative Models

Examples of discriminative classification methods:
- Linear probability model
- Logistic regression
- Feed-forward neural networks
- ...

6 Discriminative Models: linear probability model

Consider the linear regression model

E[Y | x] = β'x,  Y ∈ {0, 1},

where β'x = Σ_{j=0}^{m} β_j x_j, with x_0 ≡ 1. But

E[Y | x] = 1 · P(Y = 1 | x) + 0 · P(Y = 0 | x) = P(Y = 1 | x)

So the model assumes that P(Y = 1 | x) = β'x.
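A quick illustration of the problem (my sketch, not from the slides; the data are simulated): fitting ordinary least squares to a 0/1 response can produce fitted "probabilities" outside the unit interval.

> # simulated binary data: the true success probability is logistic in x
> set.seed(1)
> x <- seq(0, 40, length.out = 25)
> y <- rbinom(25, size = 1, prob = plogis(-3 + 0.16 * x))
> lpm <- lm(y ~ x)       # linear probability model
> range(fitted(lpm))     # fitted values need not stay within [0, 1]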

7 Linear Response Function

[Figure: the linear response function, P(Y = 1 | x) plotted against β'x; the line is not confined to the interval [0, 1].]

8 Logistic regression

Logistic response function:

E[Y | x] = P(Y = 1 | x) = e^{β'x} / (1 + e^{β'x})

or (divide numerator and denominator by e^{β'x}):

P(Y = 1 | x) = 1 / (1 + e^{-β'x}) = (1 + e^{-β'x})^{-1}

9 Logistic Response Function

[Figure: the S-shaped logistic response function, P(Y = 1 | x) plotted against β'x, increasing from 0 to 1.]

10 Linearization: the logit transformation

ln{ P(Y = 1 | x) / (1 - P(Y = 1 | x)) }
  = ln{ (1 + e^{-β'x})^{-1} / (1 - (1 + e^{-β'x})^{-1}) }
  = ln{ 1 / ((1 + e^{-β'x}) - 1) }
  = ln{ 1 / e^{-β'x} }
  = ln e^{β'x} = β'x

In the second step, we divided the numerator and the denominator by (1 + e^{-β'x})^{-1}. The ratio

P(Y = 1 | x) / (1 - P(Y = 1 | x)) = P(Y = 1 | x) / P(Y = 0 | x)

is called the odds.
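As a quick check (my addition, not from the slides), R's plogis and qlogis implement the logistic response function and the logit respectively, so applying the logit to a probability recovers the linear predictor:

> beta <- c(-3, 0.16)            # hypothetical coefficients (beta0, beta1)
> eta <- beta[1] + beta[2] * 14  # linear predictor beta'x at x = 14
> p <- plogis(eta)               # P(Y = 1 | x)
> log(p / (1 - p))               # the log-odds: equals eta again
> qlogis(p)                      # same value via the built-in logit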

11 Linear Separation

Assign to class 1 if P(Y = 1 | x) > P(Y = 0 | x), i.e. if

P(Y = 1 | x) / P(Y = 0 | x) > 1.

This is true if

ln{ P(Y = 1 | x) / P(Y = 0 | x) } > 0.

So assign to class 1 if β'x > 0, and to class 0 otherwise.

12 Decision Boundary

[Figure: decision boundary in the (x_1, x_2) plane; the line β_0 + β_1 x_1 + β_2 x_2 = 0 separates Class 1 from Class 0.]

13 Maximum Likelihood Estimation

Y = 1 if heads, Y = 0 if tails; p = P(Y = 1).

One coin flip:

P(y) = p^y (1 - p)^{1-y}

Note that P(1) = p and P(0) = 1 - p, as required.

Sequence of n independent coin flips:

P(y_1, y_2, ..., y_n) = Π_{i=1}^{n} p^{y_i} (1 - p)^{1-y_i}

which defines the likelihood function when viewed as a function of p.

14 Maximum Likelihood Estimation

In a sequence of 10 coin flips we observe y = (1, 0, 1, 1, 0, 1, 1, 1, 1, 0). The corresponding likelihood function is

P(y | p) = p (1-p) p p (1-p) p p p p (1-p) = p^7 (1-p)^3

The corresponding log-likelihood function is

ln P(y | p) = ln(p^7 (1-p)^3) = 7 ln p + 3 ln(1-p)

15 Computing the maximum

To determine the maximum we take the derivative and equate it to zero:

d ln P(y | p) / dp = 7/p - 3/(1-p) = 0

which yields the maximum likelihood estimate p̂ = 0.7. This is just the relative frequency of heads in the sample.
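The same estimate can be found numerically; a one-line check in R (my sketch):

> # log-likelihood for 7 heads and 3 tails
> loglik <- function(p) 7 * log(p) + 3 * log(1 - p)
> optimize(loglik, interval = c(0, 1), maximum = TRUE)$maximum  # approximately 0.7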

16 Log-likelihood function for y = (1, 0, 1, 1, 0, 1, 1, 1, 1, 0)

[Figure: the log-likelihood 7 ln p + 3 ln(1-p) plotted as a function of the success probability, with its maximum at 0.7.]

17 ML estimation for logistic regression

Logistic regression is a bit like the coin tossing example, except that now the probability of success p_i depends on x_i:

p_i = P(Y = 1 | x_i) = (1 + e^{-β'x_i})^{-1}
1 - p_i = P(Y = 0 | x_i) = (1 + e^{β'x_i})^{-1}

Since y_i ∈ {0, 1}, we can represent its probability distribution as follows:

P(y_i) = p_i^{y_i} (1 - p_i)^{1-y_i},  y_i ∈ {0, 1};  i = 1, ..., n

18 ML estimation for logistic regression

Example:

 i   x_i   y_i   P(y_i)
 1    8     0    (1 + e^{β_0 + 8β_1})^{-1}
 2   12     0    (1 + e^{β_0 + 12β_1})^{-1}
 3   15     1    (1 + e^{-β_0 - 15β_1})^{-1}
 4   10     1    (1 + e^{-β_0 - 10β_1})^{-1}

19 LR: likelihood function

Since the y_i observations are independent:

P(y | β) = Π_{i=1}^{n} P(y_i) = Π_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1-y_i}    (1)

Or, taking the natural log:

ln P(y | β) = ln Π_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1-y_i} = Σ_{i=1}^{n} { y_i ln p_i + (1 - y_i) ln(1 - p_i) }

20 LR: log-likelihood function

For the logistic regression model we have

p_i = (1 + e^{-β'x_i})^{-1},  1 - p_i = (1 + e^{β'x_i})^{-1},

so filling in gives

l(β) = Σ_{i=1}^{n} { y_i ln( 1 / (1 + e^{-β'x_i}) ) + (1 - y_i) ln( 1 / (1 + e^{β'x_i}) ) }

- Non-linear function of the parameters.
- No closed-form solution (no nice formulas for the parameter estimates).
- Likelihood function globally concave, so a relatively easy optimization problem.
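The log-likelihood is straightforward to evaluate directly; a minimal R version (my sketch; X is a model matrix whose first column is all ones):

> # log-likelihood l(beta) for logistic regression
> loglik <- function(beta, X, y) {
+   eta <- as.vector(X %*% beta)                # beta'x_i for every i
+   sum(y * plogis(eta, log.p = TRUE) +         # y_i * ln p_i
+       (1 - y) * plogis(-eta, log.p = TRUE))   # (1 - y_i) * ln(1 - p_i)
+ }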

21 First-order conditions

l(β) = Σ_{i=1}^{n} { y_i ln p_i + (1 - y_i) ln(1 - p_i) }

g(β_j) = ∂l(β)/∂β_j = Σ_{i=1}^{n} [ (y_i / p_i) ∂p_i/∂β_j + ((1 - y_i)/(1 - p_i)) ∂(1 - p_i)/∂β_j ]    (1)

∂p_i/∂β_j = p_i (1 - p_i) x_{ij}    (2)

Filling in (2) in equation (1) gives:

g(β_j) = Σ_{i=1}^{n} (y_i - p_i) x_{ij}

22 First-order conditions

First-order condition for a maximum:

g(β_j) = Σ_{i=1}^{n} (y_i - p_i) x_{ij} = 0,  j = 0, ..., m.

For the intercept β_0 we get

∂l(β)/∂β_0 = Σ_{i=1}^{n} (y_i - p_i) = 0,

since x_{i0} ≡ 1. So in the optimal solution we have

Σ_{i=1}^{n} y_i = Σ_{i=1}^{n} p_i

The expected number of cases with Y = 1 is equal to the observed number of cases with Y = 1.

23 First-order conditions

Likewise, for β_j we get

g(β_j) = ∂l(β)/∂β_j = Σ_{i=1}^{n} (y_i - p_i) x_{ij} = 0.

So in the optimal solution we have

Σ_{i=1}^{n} y_i x_{ij} = Σ_{i=1}^{n} p_i x_{ij}

Interpretation?

24 Optimization by Gradient Ascent

The gradient points in the direction of steepest ascent of the function, so we can perform optimization by the method of gradient ascent. The new estimate of β_j based on processing the i-th observation is:

β_j^{(t+1)} = β_j^{(t)} + η (y_i - p_i^{(t)}) x_{ij},

that is, a step in the direction of the gradient contribution of observation i, where

p_i^{(t)} = (1 + e^{-β^{(t)'}x_i})^{-1}

is the current estimate of P(Y = 1 | x_i), that is, using β^{(t)}.

25 Optimization by Gradient Ascent

The basic gradient-ascent algorithm applied to logistic regression:

1. Choose an initial value β^{(0)} (e.g. at random); t ← 0.
2. Determine the gradient ∂l(β)/∂β_j of l(β) at β^{(t)} and update

   β_j^{(t+1)} = β_j^{(t)} + η Σ_{i=1}^{n} (y_i - p_i^{(t)}) x_{ij},  j = 0, ..., m

3. Repeat the previous step until g(β_j)|_{β=β^{(t)}} = 0 for all j = 0, ..., m, and check whether a (local) maximum has been reached.

Here η > 0 is the step size.
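A minimal implementation of these steps in R (my sketch, not the lecture's code; the step size and stopping rule are simplified, stopping after a fixed number of iterations rather than at a zero gradient):

> # batch gradient ascent for logistic regression
> # X: n x (m+1) model matrix with leading column of ones; y: 0/1 vector
> logreg.ga <- function(X, y, eta = 0.01, iters = 5000) {
+   beta <- rep(0, ncol(X))                 # initial value beta^(0)
+   for (it in seq_len(iters)) {
+     p <- plogis(as.vector(X %*% beta))    # p_i^(t) = P(Y = 1 | x_i)
+     beta <- beta + eta * as.vector(crossprod(X, y - p))  # update all beta_j
+   }
+   beta
+ }

In practice glm fits the same model much faster with Fisher scoring (iteratively reweighted least squares), as the R output later in the lecture shows ("Number of Fisher Scoring iterations: 4").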

26 Fitted Response Function

Substitute the maximum likelihood estimates into the response function to obtain the fitted response function:

P̂(Y = 1 | x) = e^{β̂'x} / (1 + e^{β̂'x})

27 Example: Programming Assignment

Model the probability of successfully completing a programming assignment. Explanatory variable: programming experience (in months). We find β̂_0 = -3.0597 and β̂_1 = 0.1615, so

P̂(Y = 1 | x) = e^{-3.0597 + 0.1615x} / (1 + e^{-3.0597 + 0.1615x})

For 14 months of programming experience:

P̂(Y = 1 | x = 14) = e^{-3.0597 + 0.1615(14)} / (1 + e^{-3.0597 + 0.1615(14)}) ≈ 0.31
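The same computation in R (a sketch, using the estimates above):

> b0 <- -3.0597; b1 <- 0.1615
> plogis(b0 + b1 * 14)   # fitted probability of success at 14 months, about 0.31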

28 Interpretation

We have

ln{ P̂(Y = 1 | x) / P̂(Y = 0 | x) } = -3.0597 + 0.1615x,

so with every additional month of programming experience, the log-odds increase by 0.1615. The odds are multiplied by e^{0.1615} ≈ 1.175, so with every additional month of programming experience, the odds increase by 17.5%.

Note that the effect of an increase in x on the probability of success depends on the value of x:
- An increase from 14 to 24 months of programming experience raises the probability of success from 0.31 to 0.69.
- An increase from 34 to 44 months of programming experience raises the probability of success from 0.92 to 0.98.

29 Example: Programming Assignment

[Table: month.exp, success (0/1), and the fitted probability of success for each of the 25 observations; the numeric values were lost in transcription.]

30 Allocation Rule

The probability of the two classes is equal when β̂_0 + β̂_1 x = 0. Solving for x we get x = -β̂_0/β̂_1 ≈ 19.

Allocation rule:
x ≥ 19: predict y = 1
x < 19: predict y = 0

If a person has 19 months or more of programming experience, predict success; otherwise predict failure.

31 Programming Assignment: Confusion Matrix

Cross table of observed and predicted class labels (row: observed, column: predicted). [Cell counts lost in transcription.]

Error rate: 6/25 = 0.24. Default (predict majority class): 11/25 = 0.44.

32 How to in R

> prog.logreg <- glm(success ~ month.exp, data=prog.dat, family=binomial)
> summary(prog.logreg)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -3.0597        ...     ...        *
month.exp     0.1615        ...     ...        *
---
Number of Fisher Scoring iterations: 4

> table(prog.dat$success, as.numeric(prog.logreg$fitted > 0.5))

33 Non-binary classes in logistic regression

Recall the logistic regression model assumption for binary class variable Y ∈ {0, 1}:

P(Y = 1 | x) = exp(β'x) / (1 + exp(β'x))

from which it follows that

P(Y = 0 | x) = 1 / (1 + exp(β'x)),

since P(Y = 1 | x) + P(Y = 0 | x) = 1.

34 Non-binary classes in logistic regression

We can generalize this model to a non-binary class variable Y ∈ {0, 1, ..., K-1} (where K is the number of classes) as follows:

P(Y = k | x) = exp(β_k'x) / Σ_{j=0}^{K-1} exp(β_j'x)

where we now have a weight vector β_k for each class. This is called the multinomial logit model, or multi-class logistic regression.

35 Multi-class logistic regression

We can arrive at this model in the following steps:

1. Assume that P(Y = k | x) is a function of the linear combination β_k'x.
2. To ensure that the probabilities are non-negative, take the exponential: exp(β_k'x).
3. To make sure the probabilities sum to 1, divide exp(β_k'x) by Σ_{j=0}^{K-1} exp(β_j'x):

P(Y = k | x) = exp(β_k'x) / Σ_{j=0}^{K-1} exp(β_j'x)
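These steps translate directly into R (my sketch): given a matrix B whose rows are the class weight vectors β_k, and a feature vector x with a leading 1, the class probabilities are

> # multinomial logit probabilities for one observation
> softmax.probs <- function(B, x) {
+   scores <- as.vector(B %*% x)    # beta_k' x for each class k
+   e <- exp(scores - max(scores))  # shift by the max for numerical stability
+   e / sum(e)                      # normalize so the probabilities sum to 1
+ }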

36 Identification Restriction

Note that

exp((β_k + d)'x) / Σ_{j=0}^{K-1} exp((β_j + d)'x) = exp(d'x) exp(β_k'x) / ( exp(d'x) Σ_{j=0}^{K-1} exp(β_j'x) ),

so adding a vector d to each of the vectors β_j, j = 0, ..., K-1 would yield the same fitted probabilities. To identify the model, we put β_0 = 0.

Verify that binary logistic regression is a special case of the multinomial logit model with K = 2, since exp(β_0'x) = exp(0) = 1.
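A quick numerical check of this invariance (my addition), using made-up weights for K = 3 classes:

> B <- rbind(c(0, 0, 0),           # beta_0 = 0 (identification restriction)
+            c(1, -2, 0.5),
+            c(-0.5, 1, 1))
> x <- c(1, 0.3, -1.2)             # feature vector with leading 1
> d <- c(2, -1, 3)                 # arbitrary shift vector
> p1 <- exp(B %*% x); p1 <- p1 / sum(p1)
> B2 <- sweep(B, 2, d, "+")        # add d to every weight vector
> p2 <- exp(B2 %*% x); p2 <- p2 / sum(p2)
> all.equal(as.vector(p1), as.vector(p2))   # TRUE: same fitted probabilities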

37 Interpretation

We have

ln{ P(Y = k | x) / P(Y = l | x) } = (β_k - β_l)'x,

which allows us to interpret β_{kj} - β_{lj} as follows: for a unit increase in x_j, the log-odds of class k versus class l are expected to change by β_{kj} - β_{lj} units, holding all the other variables constant.

Since β_0 = 0, β_{kj} is the effect of x_j on the log-odds of class k relative to class 0: for a unit increase in x_j, the log-odds of class k versus class 0 are expected to change by β_{kj} units, holding all the other variables constant.

38 Regularization

If we have a large number of predictors, even a linear model estimated with maximum likelihood can be prone to overfitting. This can be controlled by punishing large (positive or negative) weights: the coefficient estimates are shrunken towards zero. Add a penalty term for the size of the coefficients to the objective function. With the LASSO penalty:

E(β) = l(β) + λ Σ_{j=1}^{m} |β_j|,

where E(β) is the new error function that we want to minimize, and l(β) is the negative log-likelihood function. The value of λ is usually selected by cross-validation.

39 Application of Logistic Regression to Movie Reviews

# logistic regression with lasso penalty
> reviews.glmnet <- cv.glmnet(as.matrix(train.dtm), labels[index.train],
      family="binomial", type.measure="class")
> plot(reviews.glmnet)
> coef(reviews.glmnet, s="lambda.1se")
309 x 1 sparse Matrix of class "dgCMatrix"
[selected terms with nonzero coefficients; the estimated values were lost in transcription]
bad
beautiful
best
better
boring
excellent
fun
funny
minutes
perfect
poor
script
stupid
supposed
terrible
wonderful
worst

40 Cross-Validation on lambda

[Figure: cross-validated misclassification error as a function of log(lambda).]

41 Application of Logistic Regression to Movie Reviews

# make predictions on the test set
> reviews.logreg.pred <- predict(reviews.glmnet,
      newx=as.matrix(test.dtm), s="lambda.1se", type="class")
# show confusion matrix
> table(reviews.logreg.pred, labels[-index.train])
[confusion matrix counts lost in transcription]
# compute accuracy: about 81% correct
> ( )/9000

42 Using tf-idf weights

Currently, we use the term frequency as feature value. In information retrieval, other term weighting schemes are popular. For example:

tf-idf(i, j) = tf(i, j) × idf(i),

where tf(i, j) is the number of times term i occurs in document j, and

idf(i) = log( N_doc / |{d : w_i ∈ d}| )

There are many variations on this theme. Such weighting should work well for finding documents relevant to a query, but it is not obvious why it should work in document classification.
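Written out directly from this definition, a small R function (my sketch) that turns a matrix of raw term counts into tf-idf weights:

> # tf-idf from a document-term matrix of raw counts (rows = documents)
> tfidf <- function(tf) {
+   df  <- colSums(tf > 0)       # |{d : w_i in d}|: document frequency of each term
+   idf <- log2(nrow(tf) / df)   # idf(i), log base 2 as on the next slides
+   sweep(tf, 2, idf, "*")       # tf-idf(i, j) = tf(i, j) * idf(i)
+ }

Note that for the test set below, the document frequencies (and hence the idf) come from the training set.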

43 Logistic Regression: using tf-idf weights

# construct training document term matrix with tf-idf weights
> train2.dtm <- DocumentTermMatrix(reviews.all[index.train],
      control=list(weighting=weightTfIdf))
# remove sparse terms
> train2.dtm <- removeSparseTerms(train2.dtm, 0.95)
# perform logistic regression with lasso penalty
> reviews2.glmnet <- cv.glmnet(as.matrix(train2.dtm), labels[index.train],
      family="binomial", type.measure="class")
# compute tf-idf scores on test set: we use the document frequency
# from the training set!
> train3.dtm <- as.matrix(train.dtm)
# convert term frequency counts to binary indicator
> train3.dtm <- matrix(as.numeric(train3.dtm > 0), nrow=16000, ncol=308)

44 Logistic Regression: using tf-idf weights

# sum the columns of the training set (document frequency of each term)
> train3.idf <- apply(train3.dtm, 2, sum)
# compute idf for each term (column)
> train3.idf <- log2(16000/train3.idf)
# term frequencies on the test set
> test3.dtm <- as.matrix(test.dtm)
# compute tf-idf weights on the test set
> for(i in 1:308){ test3.dtm[,i] <- test3.dtm[,i]*train3.idf[i] }

45 Logistic Regression: using tf-idf weights

# make predictions on the test set using lambda=lambda.1se
> reviews.logreg2.pred <- predict(reviews2.glmnet,
      newx=test3.dtm, s="lambda.1se", type="class")
# show confusion matrix
> table(reviews.logreg2.pred, labels[-index.train])
[confusion matrix counts lost in transcription]
# compute accuracy: still about 81% correct
> ( )/9000

46 Including Bigrams

The bigrams in "the spy who loved me" are:

the spy
spy who
who loved
loved me

but not, for example, "spy loved": the two words need to be next to each other.
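The next slides use a BigramTokenizer with tm's DocumentTermMatrix without showing its definition; a common way to write one uses the NLP package (my assumption about the implementation, not necessarily the lecture's version):

> library(NLP)   # provides words() and ngrams()
> # split a document into words and paste consecutive pairs together,
> # e.g. "the spy who loved me" -> "the spy" "spy who" "who loved" "loved me"
> BigramTokenizer <- function(x)
+   unlist(lapply(ngrams(words(x), 2), paste, collapse = " "),
+          use.names = FALSE)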

47 Including Bigrams

# extract bigrams
> train.dtm2 <- DocumentTermMatrix(reviews.all[index.train],
      control = list(tokenize = BigramTokenizer))
# more than one million bigrams!
> dim(train.dtm2)
[1] 16000 ...
> train.dtm2 <- removeSparseTerms(train.dtm2, 0.99)
# after removing sparse bigrams only 152 are left
> dim(train.dtm2)
[1] 16000 152
> train.dat1 <- as.matrix(train.dtm)
> train.dat2 <- as.matrix(train.dtm2)
# combine unigrams and bigrams
> train.dat <- cbind(train.dat1, train.dat2)
> dim(train.dat)
[1] 16000 460

48 Including Bigrams

# fit regularized logistic regression model
# use cross-validation to evaluate different lambda values
> reviews3.glmnet <- cv.glmnet(train.dat, labels[index.train],
      family="binomial", type.measure="class")
# show coefficient estimates for lambda.1se
# (only a selection of the bigram coefficients is shown here)
> coef(reviews3.glmnet, s="lambda.1se")
[estimated values lost in transcription]
bad movie
black white
cant believe
character development
dont want
end movie
even though
felt like
first time
good film
great movie
highly recommend
just plain
make sense
much better
must see
one best
one worst
read book
really good
worst movie

49 Cross-Validation on lambda

[Figure: cross-validated misclassification error as a function of log(lambda), for the model with unigrams and bigrams.]

50 Including Bigrams

# create document term matrix for the test data,
# using the training dictionary
> test3.dtm <- DocumentTermMatrix(reviews.all[-index.train],
      list(dictionary=dimnames(train.dat)[[2]]))
# convert to ordinary matrix
> test3.dat <- as.matrix(test3.dtm)
# get columns in the same order as on the training set
> test3.dat <- test3.dat[, dimnames(train.dat)[[2]]]
# make predictions using lambda.1se
> reviews.logreg3.pred <- predict(reviews3.glmnet, newx=test3.dat,
      s="lambda.1se", type="class")
> table(reviews.logreg3.pred, labels[-index.train])
[confusion matrix counts lost in transcription]
# accuracy does not improve
> ( )/9000
