Lecture 10: Logistic Regression
|
|
- Helen Gaines
- 5 years ago
- Views:
Transcription
1 BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics Lecture 10: Logistic Regression Jie Wang Department of Computational Medicine & Bioinformatics University of Michigan 1
2 Outline An Example Logistic Regression with One Predictor Multiple Logistic Regression Sparse Logistic Regression Tips for real applications One vs All for multi-class classification Imbalanced Data Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R" (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani. 2
3 An Example Predict if a individual will default based on annual income, monthly balance, and status (student?). An subset of the default data set, which contains three predictors with a binary response. Notice that, the response is not continuous, and it takes binary values (Yes or No). This is a classification problem. 3
4 Why Not Linear Regression I For linear regression, the outcome is usually continuous (there is an order in the possible values of the outcome). For classification problem, the outcome takes discrete/categorical values, where order is meaningless. An Example: Suppose you are trying to guess the identity of a superhero based on his/her special talents. He or she can be the Captain America, The Hulk, Black Widow, or Thor. How can we encode the outcome? YY = Captain America The Hulk Black Widow Thor YY = 1 The Hulk 2 Black Widow 3 Thor 4 Captain America Which one is more reasonable? 4
5 Why Not Linear Regression II If we apply linear regression to the classification problems, different encodings of the outcome would produce fundamentally different linear models leading to different predictions for the same test observations. We do not have general principles to convert discrete outcomes with more than two possible levels to continuous ones, which are suitable for linear regression. A good news is that, if the outcome is binary, we can code it simply as YY = 0 1 Captain America Thor 5
6 Logistic Regression with Binary Outcome Consider the example in which the outcome YY is binary: default or not default. As YY is binary, we can code it by YY = 1 1 if default = no If default = yes Notational convenient for later use YY = 0 1 if default = no If default = yes Different from linear regression, logistic regression models the probability of YY = 1 and YY = 1. For example, if balance is given, logistic regression models PP default = yes balance PP default = no balance = 1 PP default = yes balance or 6
7 Simple Logistic Regression For simplicity, we only consider one predictor: balance. As PP default = yes balance [0,1], we make use of the logistic function to model the probability. a simplified form of PP YY = 1 XX PP XX = eeββ 0+ββ 1 XX 1 + ee ββ 0+ββ 1 XX = ee ββ 0+ββ 1 XX the predictor An illustration of the logistic function. The figure is from 7
8 Odds and Log-odds The so-called odds is defined by the probability that the outcome is Yes the probability that the outcome is No PP XX 1 PP XX = eeββ 0+ββ 1 XX Odds is popular in gambling, e.g., horse-racing. The log-odds is defined by the log of the odds log PP XX 1 PP XX = ββ 0 + ββ 1 XX Log-odds is linear regarding to the predictor, although PP XX is not. 8
9 Estimating the Regression Coefficients We estimate the regression coefficients by maximum likelihood. Intuitively, the idea of maximum likelihood is to find a set of coefficients such that the probability of the observations is maximized (under the assumption that the observations are drawn independently). For our example, maximum likelihood means that we try to find ββ 0 and ββ 1 such that PP XX is close to one for all individuals who default and PP XX is close to zero for all individuals who do not default. 9
10 Maximum Likelihood Estimate I Given a set of observations xx ii, yy ii : ii = 1,, nn, the likelihood function is l ββ 0, ββ 1 = pp xx ii 1 pp xx jj ii:yy ii =1 jj:yy jj = 1 = ii:yy ii =1 = ii:yy ii = ee ββ 0+ββ 1 xx ii jj:yy jj = ee ββ 0+ββ 1 xx ii jj:yy jj = ee ββ 0+ββ 1 xx jj ee ββ 0+ββ 1 xx ii = ee yy ii ββ 0 +ββ 1 xx ii Do NOT confuse with the probability function ii 10
11 Maximum Likelihood Estimate II Maximum likelihood aims to maximize the likelihood function. max l ββ 0, ββ 1 = ββ 0,ββ 1 ii The maximization problem is equivalent to a minimization problem (we first convert the product to summation by Logarithm) ee yy ii ββ 0 +ββ 1 xx ii min log l ββ 0, ββ 1 = ββ 0,ββ 1 ii log 1 + ee yy ii ββ 0 +ββ 1 xx ii 11
12 Gradient Descent Let us denote ββ 0, ββ 1 = argmin ββ 0,ββ 1 log l ββ 0, ββ 1 = There is no closed form solution of ββ 0, ββ 1. One of the simplest method is to iteratively decrease the objective function value LL ββ 0, ββ 1 = log l ββ 0, ββ 1 by gradient descent: ββ 0 kk+1 = ββ 0 kk ηη ββ0 LL ββ 0 kk, ββ 1 kk ββ 1 kk+1 = ββ 1 kk ηη ββ1 LL ββ 0 kk, ββ 1 kk For properly chosen learning rate ηη (e.g., back-line tracking), the sequence LL ββ kk kk 0, ββ 1, kk = 1,2, will converge to LL ββ 0, ββ 1, and the following holds LL ββ 0 kk+1, ββ 1 kk+1 ii LL ββ 0 kk, ββ 1 kk log 1 + ee yy ii ββ 0 +ββ 1 xx ii ηη is the so-called learning rate 12
13 Batch Training We write the updates of ββ 0, ββ 1 explicitly by ββ kk+1 0 = ββ kk yy ii ee yy kk kk ii ββ 0 +ββ1 xxii 0 ηη ii 1 + ee yy kk kk ii ββ 0 +ββ1 xxii ββ kk+1 1 = ββ kk yy ii xx ii ee yy kk kk ii ββ 0 +ββ1 xxii 1 ηη ii 1 + ee yy kk kk ii ββ 0 +ββ1 xxii The above updates need to pass through all of the training data. Thus, it is called batch update. If the training data has millions or billions (very common in real applications) of data samples, the updates can be very time-consuming. 13
14 Online Training Different from batch training (that involves all the data samples in each iteration), online training update the parameters by visiting one data sample in each iteration. Let ff ii ββ 0, ββ 1 = log 1 + ee yy ii ββ 0 +ββ 1 xx ii. Then, ββ 0, ββ 1 = argmin ββ 0,ββ 1 log l ββ 0, ββ 1 = For online training, in each iteration, the updates are ii ff ii ββ 0, ββ 1 Stochastic gradient descent ββ 0 kk+1 = ββ 0 kk ηη ββ0 ff kk ββ 0 kk, ββ 1 kk ββ 1 kk+1 = ββ 1 kk ηη ββ1 ff kk ββ 0 kk, ββ 1 kk = ββ kk 0 ηη yy kkee yy kk kk kk ββ 0 +ββ1 xxkk 1 + ee yy kk kk kk ββ 0 +ββ1 xxkk = ββ kk 1 ηη yy kkxx kk ee yy kk kk kk ββ 0 +ββ1 xxkk 1 + ee yy kk ββ 0 kk +ββ1 kk xxkk 14
15 Batch Training vs Online Training I Batch training (gradient descent). Figure is from 15
16 Batch Training vs Online Training II Online training (stochastic gradient descent). Figure is from 16
17 Batch Training vs Online Training III Online training (improved stochastic gradient descent). Figure is from 17
18 Multiple Logistic Regression When there is more than one predictors, logistic regression model the outcome by PP XX = eeββ 0+ββ 1 XX 1 + ββ pp XX pp 1 + ee ββ 0+ββ 1 XX 1 + ββ pp XX pp = The likelihood function is l ββ 0, ββ 1,, ββ pp The maximum likelihood estimates of the parameters are ee ββ 0+ββ 1 XX 1 + ββ pp XX pp 1 = ii 1 + ee yy 1 pp ii ββ 0 +ββ 1 xx ii + ββpp xx ii ββ 0, ββ 1,, ββ pp = argmin log l ββ 0, ββ 1,, ββ pp = ββ 0,ββ 1,,ββ pp ii log 1 + ee yy ii ββ 0 +ββ 1 xx ii 1 + ββpp xx ii pp 18
19 Logistic Regression on the Default Data Std. error is used to compute confidence intervals. That is, there is approximately 95% chance that the interval ββ ii 2SE ββ 1, ββ ii + 2SE ββ 1 contain the true value of ββ ii. Large absolute value of Z-statistic or small value of P-value indicates strong evidence that the corresponding predictor is associated with the outcome. An visualization of the Default data set. The annual incomes and monthly credit card balances of a number of individuals. The individuals who defaulted on their credit card payments are shown in orange, and those who did not are shown in blue. 19
20 Motivation of Sparse Models Revisit of the default data set There is no strong association between the predictor income and the outcome default. The single predictor balance may lead to better generalization performance (on testing data). In many real word applications (especially biomedical studies), the number of features/predictors (genes, SNPs, etc.) is much larger than the number of samples (e.g., patients). Many studies indicate that only a very few of the features contribute to the outcome (symptoms of diseases, etc.). 20
21 Sparse Logistic Regression Introduction of sparsity to the probability PP XX = ee ββ 0+ββ 1 XX 1 +ββ 2 XX 2 + ββ pp XX pp Sparsity: many coefficients are zero, i.e., they have no contributions to the probability We can enforce sparsity by the l 1 regularizer. ββ 0, ββ 1,, ββ pp pp = argmin ββ 0,ββ 1,,ββ pp log l ββ 0, ββ 1,, ββ pp + λλ jj=1 ββ jj Usually, we do not penalize the intercept ββ 0. = argmin ββ 0,ββ 1,,ββ pp log 1 + ee yy 1 pp ii ββ 0 +ββ 1 xx ii + ββpp xx ii ii pp + λλ jj=1 ββ jj 21
22 Projected Gradient Descent Another popular formulation of sparse logistic regression is min ββ 0,ββ 1,,ββ pp s.t. Framework of projected gradient descent in the kk th iteration log 1 + ee yy 1 pp ii ββ 0 +ββ 1 xx ii + ββpp xx ii ii pp jj=1 ββ jj rr Update ββ 0 kk, ββ 1 kk,, ββ pp kk ββ 0 kk 1, ββ 1 kk 1,, ββ pp kk 1 Project ββ 1 kk, ββ 2 kk,, ββ pp kk by gradient descent from to the set SS = pp ββ 1, ββ 2,, ββ pp : jj=1 ββ jj rr 22
23 Logistic Regression for Multi-Class Problems One vs All YY = 1 YY = 1 PP 1 XX Class 1 YY = 1 YY = 1 PP 2 XX Given a new instance xx, we set the class label of xx to kk such that PP kk xx PP ii xx for all ii. Class 2 Class 3 YY = 1 YY = 1 PP 3 XX 23
24 Tip for Unbalanced Data: Undersampling In many (probably most of them) real applications, the data are unbalanced. In other words, the number of positive samples is much larger or smaller than the number of negative samples, which may cause serious problem in training the modeling. The common used technique to handle the unbalanced data is undersampling, i.e., we undersample the larger class of data (the one that has more data samples) such that the two classes have roughly the same number of data samples. 24
25 A Synthetic Data Why synthetic data? We know the ground truth. Positive samples: ~NN 5, 2.5. Negative samples: ~NN 5,
26 Imbalanced Data Fitted Models on imbalanced data. 26
27 Undersampling and Ensembling We undersample the data such that the number of the data samples of two classes are roughly the same. Fitted models on the undersampled data. The fifth figure plots the model obtained by averaing the parameters. The last one plots the model by majority voting. 27
28 Resources Books An Introduction to Statistical Learning with applications in R, Section 4.3. Bayesian Reasoning and Machine Learning, Section Software SLEP: Sparse Learning with Efficient Projections. 28
Lecture 6. Notes on Linear Algebra. Perceptron
Lecture 6. Notes on Linear Algebra. Perceptron COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne This lecture Notes on linear algebra Vectors
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationLecture 3. Linear Regression II Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationIntroduction to Logistic Regression
Introduction to Logistic Regression Guy Lebanon Binary Classification Binary classification is the most basic task in machine learning, and yet the most frequent. Binary classifiers often serve as the
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationStatistical Learning with the Lasso, spring The Lasso
Statistical Learning with the Lasso, spring 2017 1 Yeast: understanding basic life functions p=11,904 gene values n number of experiments ~ 10 Blomberg et al. 2003, 2010 The Lasso fmri brain scans function
More informationECE521 Lecture7. Logistic Regression
ECE521 Lecture7 Logistic Regression Outline Review of decision theory Logistic regression A single neuron Multi-class classification 2 Outline Decision theory is conceptually easy and computationally hard
More informationSTAT 5500/6500 Conditional Logistic Regression for Matched Pairs
STAT 5500/6500 Conditional Logistic Regression for Matched Pairs Motivating Example: The data we will be using comes from a subset of data taken from the Los Angeles Study of the Endometrial Cancer Data
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationLecture 11. Kernel Methods
Lecture 11. Kernel Methods COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne This lecture The kernel trick Efficient computation of a dot product
More informationSupport Vector Machines. CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Support Vector Machines CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 A Linearly Separable Problem Consider the binary classification
More informationMachine Learning Basics: Maximum Likelihood Estimation
Machine Learning Basics: Maximum Likelihood Estimation Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics 1. Learning
More informationLeast Mean Squares Regression
Least Mean Squares Regression Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 Lecture Overview Linear classifiers What functions do linear classifiers express? Least Squares Method
More informationLecture 3. Linear Regression
Lecture 3. Linear Regression COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne Weeks 2 to 8 inclusive Lecturer: Andrey Kan MS: Moscow, PhD:
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationLogistic Regression. COMP 527 Danushka Bollegala
Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will
More informationTufts COMP 135: Introduction to Machine Learning
Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Logistic Regression Many slides attributable to: Prof. Mike Hughes Erik Sudderth (UCI) Finale Doshi-Velez (Harvard)
More informationCSC 578 Neural Networks and Deep Learning
CSC 578 Neural Networks and Deep Learning Fall 2018/19 3. Improving Neural Networks (Some figures adapted from NNDL book) 1 Various Approaches to Improve Neural Networks 1. Cost functions Quadratic Cross
More informationStatistical aspects of prediction models with high-dimensional data
Statistical aspects of prediction models with high-dimensional data Anne Laure Boulesteix Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie February 15th, 2017 Typeset by
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationFundamentals of Machine Learning. Mohammad Emtiyaz Khan EPFL Aug 25, 2015
Fundamentals of Machine Learning Mohammad Emtiyaz Khan EPFL Aug 25, 25 Mohammad Emtiyaz Khan 24 Contents List of concepts 2 Course Goals 3 2 Regression 4 3 Model: Linear Regression 7 4 Cost Function: MSE
More informationLogistic Regression: Online, Lazy, Kernelized, Sequential, etc.
Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 2: Vector Data: Prediction Instructor: Yizhou Sun yzsun@cs.ucla.edu October 8, 2018 TA Office Hour Time Change Junheng Hao: Tuesday 1-3pm Yunsheng Bai: Thursday 1-3pm
More informationMachine Learning Linear Models
Machine Learning Linear Models Outline II - Linear Models 1. Linear Regression (a) Linear regression: History (b) Linear regression with Least Squares (c) Matrix representation and Normal Equation Method
More informationOrdinary Least Squares Linear Regression
Ordinary Least Squares Linear Regression Ryan P. Adams COS 324 Elements of Machine Learning Princeton University Linear regression is one of the simplest and most fundamental modeling ideas in statistics
More informationLeast Mean Squares Regression. Machine Learning Fall 2018
Least Mean Squares Regression Machine Learning Fall 2018 1 Where are we? Least Squares Method for regression Examples The LMS objective Gradient descent Incremental/stochastic gradient descent Exercises
More informationLinear Regression. S. Sumitra
Linear Regression S Sumitra Notations: x i : ith data point; x T : transpose of x; x ij : ith data point s jth attribute Let {(x 1, y 1 ), (x, y )(x N, y N )} be the given data, x i D and y i Y Here D
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationLinear and Logistic Regression. Dr. Xiaowei Huang
Linear and Logistic Regression Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Two Classical Machine Learning Algorithms Decision tree learning K-nearest neighbor Model Evaluation Metrics
More informationPredictive analysis on Multivariate, Time Series datasets using Shapelets
1 Predictive analysis on Multivariate, Time Series datasets using Shapelets Hemal Thakkar Department of Computer Science, Stanford University hemal@stanford.edu hemal.tt@gmail.com Abstract Multivariate,
More informationApplied Machine Learning Annalisa Marsico
Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 22 April, SoSe 2015 Goals Feature Selection rather than Feature
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationECS171: Machine Learning
ECS171: Machine Learning Lecture 4: Optimization (LFD 3.3, SGD) Cho-Jui Hsieh UC Davis Jan 22, 2018 Gradient descent Optimization Goal: find the minimizer of a function min f (w) w For now we assume f
More informationCPSC 340: Machine Learning and Data Mining. Stochastic Gradient Fall 2017
CPSC 340: Machine Learning and Data Mining Stochastic Gradient Fall 2017 Assignment 3: Admin Check update thread on Piazza for correct definition of trainndx. This could make your cross-validation code
More informationKernelized Perceptron Support Vector Machines
Kernelized Perceptron Support Vector Machines Emily Fox University of Washington February 13, 2017 What is the perceptron optimizing? 1 The perceptron algorithm [Rosenblatt 58, 62] Classification setting:
More informationLECTURE # - NEURAL COMPUTATION, Feb 04, Linear Regression. x 1 θ 1 output... θ M x M. Assumes a functional form
LECTURE # - EURAL COPUTATIO, Feb 4, 4 Linear Regression Assumes a functional form f (, θ) = θ θ θ K θ (Eq) where = (,, ) are the attributes and θ = (θ, θ, θ ) are the function parameters Eample: f (, θ)
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters
More informationCS260: Machine Learning Algorithms
CS260: Machine Learning Algorithms Lecture 4: Stochastic Gradient Descent Cho-Jui Hsieh UCLA Jan 16, 2019 Large-scale Problems Machine learning: usually minimizing the training loss min w { 1 N min w {
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationECE521 Lectures 9 Fully Connected Neural Networks
ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance
More informationRegression with Numerical Optimization. Logistic
CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204
More informationMachine Learning for NLP
Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline
More informationCOMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS16
COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS6 Lecture 3: Classification with Logistic Regression Advanced optimization techniques Underfitting & Overfitting Model selection (Training-
More informationMachine Learning Lecture 6 Note
Machine Learning Lecture 6 Note Compiled by Abhi Ashutosh, Daniel Chen, and Yijun Xiao February 16, 2016 1 Pegasos Algorithm The Pegasos Algorithm looks very similar to the Perceptron Algorithm. In fact,
More informationMath, Stats, and Mathstats Review ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD
Math, Stats, and Mathstats Review ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD Outline These preliminaries serve to signal to students what tools they need to know to succeed in ECON 360 and refresh their
More informationCSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18
CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H
More informationCS229 Supplemental Lecture notes
CS229 Supplemental Lecture notes John Duchi Binary classification In binary classification problems, the target y can take on at only two values. In this set of notes, we show how to model this problem
More informationMidterm exam CS 189/289, Fall 2015
Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points
More informationDan Roth 461C, 3401 Walnut
CIS 519/419 Applied Machine Learning www.seas.upenn.edu/~cis519 Dan Roth danroth@seas.upenn.edu http://www.cis.upenn.edu/~danroth/ 461C, 3401 Walnut Slides were created by Dan Roth (for CIS519/419 at Penn
More informationClassification Based on Probability
Logistic Regression These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton s slides and grateful acknowledgement to the many others who made their course materials freely
More informationMachine Learning Basics Lecture 4: SVM I. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 4: SVM I Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d. from distribution
More informationStochastic Gradient Descent
Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular
More informationLogistic Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Logistic Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More information9 Classification. 9.1 Linear Classifiers
9 Classification This topic returns to prediction. Unlike linear regression where we were predicting a numeric value, in this case we are predicting a class: winner or loser, yes or no, rich or poor, positive
More informationECE521 Lecture 7/8. Logistic Regression
ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression
More informationLinear classifiers: Overfitting and regularization
Linear classifiers: Overfitting and regularization Emily Fox University of Washington January 25, 2017 Logistic regression recap 1 . Thus far, we focused on decision boundaries Score(x i ) = w 0 h 0 (x
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationModeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop
Modeling Data with Linear Combinations of Basis Functions Read Chapter 3 in the text by Bishop A Type of Supervised Learning Problem We want to model data (x 1, t 1 ),..., (x N, t N ), where x i is a vector
More informationSelected Topics in Optimization. Some slides borrowed from
Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model
More informationStatistical Machine Learning
Statistical Machine Learning Lecture 9 Numerical optimization and deep learning Niklas Wahlström Division of Systems and Control Department of Information Technology Uppsala University niklas.wahlstrom@it.uu.se
More informationBinary Logistic Regression
The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b
More informationRelevance Vector Machines
LUT February 21, 2011 Support Vector Machines Model / Regression Marginal Likelihood Regression Relevance vector machines Exercise Support Vector Machines The relevance vector machine (RVM) is a bayesian
More informationLinear Regression. CSL603 - Fall 2017 Narayanan C Krishnan
Linear Regression CSL603 - Fall 2017 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis Regularization
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationLinear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan
Linear Regression CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis
More informationLecture 3: Introduction to Complexity Regularization
ECE90 Spring 2007 Statistical Learning Theory Instructor: R. Nowak Lecture 3: Introduction to Complexity Regularization We ended the previous lecture with a brief discussion of overfitting. Recall that,
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationMidterm: CS 6375 Spring 2015 Solutions
Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationDEPARTMENT OF COMPUTER SCIENCE AUTUMN SEMESTER MACHINE LEARNING AND ADAPTIVE INTELLIGENCE
Data Provided: None DEPARTMENT OF COMPUTER SCIENCE AUTUMN SEMESTER 204 205 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE hour Please note that the rubric of this paper is made different from many other papers.
More informationConfidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X s
Chapter 866 Confidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X s Introduction Logistic regression expresses the relationship between a binary response variable and one or
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationNeural Networks Lecturer: J. Matas Authors: J. Matas, B. Flach, O. Drbohlav
Neural Networks 30.11.2015 Lecturer: J. Matas Authors: J. Matas, B. Flach, O. Drbohlav 1 Talk Outline Perceptron Combining neurons to a network Neural network, processing input to an output Learning Cost
More informationConfidence Intervals for the Interaction Odds Ratio in Logistic Regression with Two Binary X s
Chapter 867 Confidence Intervals for the Interaction Odds Ratio in Logistic Regression with Two Binary X s Introduction Logistic regression expresses the relationship between a binary response variable
More informationIntroduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Module 2 Lecture 05 Linear Regression Good morning, welcome
More informationDATA MINING AND MACHINE LEARNING
DATA MINING AND MACHINE LEARNING Lecture 5: Regularization and loss functions Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Loss functions Loss functions for regression problems
More informationBuilding a Prognostic Biomarker
Building a Prognostic Biomarker Noah Simon and Richard Simon July 2016 1 / 44 Prognostic Biomarker for a Continuous Measure On each of n patients measure y i - single continuous outcome (eg. blood pressure,
More informationLecture 4 Logistic Regression
Lecture 4 Logistic Regression Dr.Ammar Mohammed Normal Equation Hypothesis hθ(x)=θ0 x0+ θ x+ θ2 x2 +... + θd xd Normal Equation is a method to find the values of θ operations x0 x x2.. xd y x x2... xd
More informationMIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,
MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run
More informationMachine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationApplied Machine Learning Lecture 5: Linear classifiers, continued. Richard Johansson
Applied Machine Learning Lecture 5: Linear classifiers, continued Richard Johansson overview preliminaries logistic regression training a logistic regression classifier side note: multiclass linear classifiers
More informationDefinition: A sequence is a function from a subset of the integers (usually either the set
Math 3336 Section 2.4 Sequences and Summations Sequences Geometric Progression Arithmetic Progression Recurrence Relation Fibonacci Sequence Summations Definition: A sequence is a function from a subset
More informationOptimization and Gradient Descent
Optimization and Gradient Descent INFO-4604, Applied Machine Learning University of Colorado Boulder September 12, 2017 Prof. Michael Paul Prediction Functions Remember: a prediction function is the function
More informationBANA 7046 Data Mining I Lecture 4. Logistic Regression and Classications 1
BANA 7046 Data Mining I Lecture 4. Logistic Regression and Classications 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013) ISLR Data Mining I
More informationMultilayer Perceptrons (MLPs)
CSE 5526: Introduction to Neural Networks Multilayer Perceptrons (MLPs) 1 Motivation Multilayer networks are more powerful than singlelayer nets Example: XOR problem x 2 1 AND x o x 1 x 2 +1-1 o x x 1-1
More information