Rapid Introduction to Machine Learning / Deep Learning
1 Rapid Introduction to Machine Learning / Deep Learning. Hyeong In Choi, Seoul National University
2 Lecture 1b: Logistic regression & neural network. October 2, 2015
3 Table of contents
1. Bird's-eye view of Lecture 1b
   1.1 Objectives
   1.2 Quick Summary
2. GLM: Generalized linear model
   2.1 Exponential family of distributions
   2.2 Generalized linear model (GLM)
   2.3 Parameter estimation
3. XOR problem and neural network with hidden layer
4. Universal approximation
   4.1 Further construction
   4.2 Universal approximation theorem
   4.3 Deep vs Shallow learning
4 1.1 Objectives
1 Bird's-eye view of Lecture 1b
Objective 1: Understand logistic regression (binary classification) and its multiclass generalization (softmax regression)
Objective 2: Recast logistic and softmax regression in a neural network (perceptron) formalism
5 1.1 Objectives
Objective 3: Learn the limitations of the perceptron by looking at the XOR problem, and learn how to fix it by adding a hidden layer
Objective 4: Introduce the Universal Approximation Theorem, and learn about the clash of the Deep vs Shallow paradigms in machine learning
6 1.2 Quick Summary
Logistic regression
Data $D = \{(x^{(t)}, y^{(t)})\}_{t=1}^{N}$, input $x^{(t)} \in \mathbb{R}^d$, label $y^{(t)} \in \{0, 1\}$.
Given $x = (x_1, \dots, x_d) \in \mathbb{R}^d$, logistic regression outputs the probability of the output label $y$ being equal to 1 by
$$P[y = 1 \mid x] = \mathrm{sigm}(b + w_1 x_1 + \cdots + w_d x_d), \quad \text{where } \mathrm{sigm}(t) = \frac{e^t}{1 + e^t}.$$
7 1.2 Quick Summary
Thus
$$P[y = 1 \mid x] = \frac{e^{b + \sum_j w_j x_j}}{1 + e^{b + \sum_j w_j x_j}}, \qquad P[y = 0 \mid x] = \frac{1}{1 + e^{b + \sum_j w_j x_j}}.$$
Decision: given $x$, decide the output label is $\hat{y}$, where
$$\hat{y} = 1 \text{ if } b + \textstyle\sum_j w_j x_j \ge 0, \qquad \hat{y} = 0 \text{ if } b + \textstyle\sum_j w_j x_j < 0.$$
[Thus the decision boundary is the hyperplane $b + \sum_j w_j x_j = 0$ in $\mathbb{R}^d$.]
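As a concrete illustration of the model and decision rule above, here is a minimal NumPy sketch; the weights `w` and bias `b` are made-up example values, not parameters from the lecture:

```python
import numpy as np

def sigm(t):
    """Sigmoid: e^t / (1 + e^t)."""
    return 1.0 / (1.0 + np.exp(-t))

def logistic_predict(x, w, b):
    """Return (P[y=1|x], decision yhat) for one input vector x."""
    z = b + w @ x                  # b + sum_j w_j x_j
    p1 = sigm(z)                   # probability of label 1
    yhat = 1 if z >= 0 else 0      # decision boundary: z = 0
    return p1, yhat

# Toy example with made-up parameters.
w = np.array([2.0, -1.0])
b = 0.5
print(logistic_predict(np.array([1.0, 0.0]), w, b))  # high P[y=1|x], yhat = 1
```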
8 1.2 Quick Summary
Neural network formulation
Figure: Neural network
9 1.2 Quick Summary
$z$: input to the output neuron, $z = b + w_1 x_1 + \cdots + w_d x_d$
$h$: output of the output neuron, $h = \mathrm{sigm}(z) = \mathrm{sigm}(b + w_1 x_1 + \cdots + w_d x_d)$
10 1.2 Quick Summary
Symmetric (redundant) form of logistic regression
The probabilities $P[y = 1 \mid x]$ and $P[y = 0 \mid x]$ have different forms in logistic regression. We can put them in symmetric form by rewriting them in the following (redundant) form:
$$P[y = 1 \mid x] = \frac{\exp\!\big(b_1 + \sum_j w_{1j} x_j\big)}{\exp\!\big(b_1 + \sum_j w_{1j} x_j\big) + \exp\!\big(b_2 + \sum_j w_{2j} x_j\big)}$$
$$P[y = 0 \mid x] = \frac{\exp\!\big(b_2 + \sum_j w_{2j} x_j\big)}{\exp\!\big(b_1 + \sum_j w_{1j} x_j\big) + \exp\!\big(b_2 + \sum_j w_{2j} x_j\big)}$$
11 1.2 Quick Summary
Decision: given $x$, decide the output label is $\hat{y}$, where
$$\hat{y} = 1 \text{ if } b_1 + \textstyle\sum_j w_{1j} x_j \ge b_2 + \textstyle\sum_j w_{2j} x_j, \qquad \hat{y} = 0 \text{ if } b_1 + \textstyle\sum_j w_{1j} x_j < b_2 + \textstyle\sum_j w_{2j} x_j.$$
The decision boundary is the hyperplane $b_1 + \sum_j w_{1j} x_j = b_2 + \sum_j w_{2j} x_j$ in $\mathbb{R}^d$.
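That the redundant form is the same model can be checked numerically: the two-class softmax probability collapses to the sigmoid form with $b = b_1 - b_2$ and $w_j = w_{1j} - w_{2j}$. A quick sketch, with all parameters randomly generated for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
b1, b2 = rng.normal(size=2)
w1, w2 = rng.normal(size=d), rng.normal(size=d)
x = rng.normal(size=d)

# Redundant (two-class softmax) form.
z1, z2 = b1 + w1 @ x, b2 + w2 @ x
p1_softmax = np.exp(z1) / (np.exp(z1) + np.exp(z2))

# Collapsed (sigmoid) form with b = b1 - b2, w = w1 - w2.
p1_sigmoid = 1.0 / (1.0 + np.exp(-((b1 - b2) + (w1 - w2) @ x)))

print(np.isclose(p1_softmax, p1_sigmoid))  # True
```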
12 1.2 Quick Summary
Neural network formulation
Figure: Neural network
13 1.2 Quick Summary
$z_i$: input to the $i$th neuron in the output layer, $z_i = b_i + \sum_j w_{ij} x_j$, $i = 1, 2$
$h_i$: output of the $i$th neuron in the output layer, $h_i = \dfrac{e^{z_i}}{e^{z_1} + e^{z_2}}$, $i = 1, 2$
14 1.2 Quick Summary
Softmax regression: multiclass classification
There are $K$ output labels, i.e., $y \in \{1, \dots, K\}$.
Probability:
$$P[y = i \mid x] = \frac{\exp\!\big(b_i + \sum_j w_{ij} x_j\big)}{\exp\!\big(b_1 + \sum_j w_{1j} x_j\big) + \cdots + \exp\!\big(b_K + \sum_j w_{Kj} x_j\big)}, \quad i = 1, \dots, K.$$
Decision: given $x$, decide the output label is $\hat{y} = \operatorname*{argmax}_i P[y = i \mid x]$.
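A minimal sketch of softmax-regression prediction, assuming a $K \times d$ weight matrix `W` and a length-$K$ bias `b` (the example values below are made up); the max-subtraction is a standard numerical-stability trick, not part of the slide:

```python
import numpy as np

def softmax_predict(x, W, b):
    """Return (class probabilities, argmax decision) for input x."""
    z = b + W @ x                      # z_i = b_i + sum_j W_ij x_j
    z = z - z.max()                    # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()    # P[y=i|x]
    return p, int(np.argmax(p))        # decision: most probable class

# Toy example: K = 3 classes, d = 2 features, made-up parameters.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
p, yhat = softmax_predict(np.array([2.0, -1.0]), W, b)
print(p, yhat)
```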
15 1.2 Quick Summary
Decision boundary
Figure: example of decision boundary
The decision regions are separated by hyperplanes in $\mathbb{R}^d$.
16 1.2 Quick Summary
Neural network formalism
Figure: Neural network
17 1.2 Quick Summary
$z_i$: input to the $i$th neuron in the output layer, $z_i = b_i + \sum_j w_{ij} x_j$, $i = 1, \dots, K$
$h_i$: output of the $i$th neuron in the output layer, $h_i = \dfrac{e^{z_i}}{\sum_k e^{z_k}} = P[y = i \mid x]$, $i = 1, \dots, K$
In vector notation, we write $(h_1, \dots, h_K) = \operatorname{softmax}(z_1, \dots, z_K)$, or $h = \operatorname{softmax}(z)$.
18 1.2 Quick Summary
XOR problem
Given a data set $D$ consisting of 4 points in $\mathbb{R}^2$ in 2 classes as shown in the following:
Figure: XOR
Note that there is no line that separates these two classes.
19 1.2 Quick Summary
But if we add one more (hidden) layer to the neural network, then this network can separate the two classes.
Figure: hidden layer
20 1.2 Quick Summary
Cybenko-Hornik-Funahashi Theorem
Let $\Sigma = [0, 1]^d$ be the $d$-dimensional hypercube. Then sums of the form
$$f(x) = \sum_i c_i \,\mathrm{sigm}\Big(b_i + \sum_{j=1}^{d} w_{ij} x_j\Big)$$
can approximate any continuous function on $\Sigma$ to any degree of accuracy.
21 1.2 Quick Summary
Universal Approximation
This theorem implies that a neural network with one hidden layer is good enough to do any classification job with small error, at least in principle. In fact, Lecture 2 should be viewed in this spirit.
Then, why deep learning?
22 2.1 Exponential family of distributions
2. GLM: Generalized linear model
2.1 Exponential family of distributions
An exponential family of distributions in canonical form is a probability distribution of the form
$$P_\theta(y) = \frac{1}{Z(\theta)}\, h(y) \exp\Big(\sum_i \theta_i T_i(y)\Big),$$
where $y = (y_1, \dots, y_K) \in \mathbb{R}^K$, $\theta = (\theta_1, \dots, \theta_m) \in \mathbb{R}^m$, and $T : \mathbb{R}^K \to \mathbb{R}^m$.
23 2.1 Exponential family of distributions
Rewrite it in the form
$$P_\theta(y) = \exp\Big[\sum_i \theta_i T_i(y) - A(\theta) + C(y)\Big],$$
where $A(\theta) = \log Z(\theta)$ is the log-partition (cumulant) function and $C(y) = \log h(y)$.
[Remark: here, we assume the dispersion parameter is 1.]
24 2.1 Exponential family of distributions
Bernoulli distribution
Random variable $Y$ with value $y \in \{0, 1\}$. Let $p = P[y = 1]$. Then
$$P(y) = p^y (1-p)^{1-y} = \exp\Big[\, y \log\frac{p}{1-p} + \log(1-p) \Big].$$
In exponential family form:
$$T(y) = y, \qquad \theta = \log\frac{p}{1-p} = \operatorname{logit}(p), \qquad p = \mathrm{sigm}(\theta) = \frac{e^\theta}{1+e^\theta}.$$
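The logit and the sigmoid are mutually inverse, which a short numerical check confirms (the probabilities below are arbitrary example values):

```python
import numpy as np

def logit(p):
    """Natural parameter of the Bernoulli: theta = log(p / (1 - p))."""
    return np.log(p / (1.0 - p))

def sigm(theta):
    """Mean parameter: p = e^theta / (1 + e^theta)."""
    return 1.0 / (1.0 + np.exp(-theta))

p = np.array([0.1, 0.5, 0.9])          # arbitrary probabilities
print(np.allclose(sigm(logit(p)), p))  # True: sigm inverts logit
```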
25 2.1 Exponential family of distributions
Multivariate Bernoulli (multinoulli) distribution
Random variable $Y$ with value $y \in \{1, \dots, K\}$. Let $p_i = P[y = i]$ and define $y_i = I(y = i) \in \{0, 1\}$. Thus $y_1 + \cdots + y_K = 1$, and we have
$$P(y) = p_1^{y_1} \cdots p_K^{y_K} = p_1^{y_1} \cdots p_{K-1}^{y_{K-1}} \, p_K^{\,1 - \sum_{i=1}^{K-1} y_i} = \exp\Big[\sum_{i=1}^{K-1} y_i \log\frac{p_i}{p_K} + \log p_K\Big].$$
[Note: when $K = 2$, this is exactly the Bernoulli distribution.]
26 2.1 Exponential family of distributions
In exponential family form:
$$T_i(y) = y_i, \qquad \theta_i = \log\frac{p_i}{p_K}, \qquad i = 1, \dots, K-1.$$
27 2.1 Exponential family of distributions
Solving for $p_i$, we get the generalized sigmoid (softmax) function
$$p_i = \frac{e^{\theta_i}}{1 + \sum_{k=1}^{K-1} e^{\theta_k}} = P[y = i], \quad i = 1, \dots, K-1, \qquad p_K = \frac{1}{1 + \sum_{k=1}^{K-1} e^{\theta_k}} = P[y = K].$$
The generalized logit function:
$$\theta_i = \log\frac{p_i}{1 - \sum_{k=1}^{K-1} p_k}, \quad i = 1, \dots, K-1.$$
The above expressions show how $p_1, \dots, p_{K-1}$ and $\theta_1, \dots, \theta_{K-1}$ are related; $p_K$ is obtained by setting $p_K = 1 - (p_1 + \cdots + p_{K-1})$.
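A sketch of the two maps on this slide, converting the $K-1$ natural parameters to the full probability vector and back (the example values are made up):

```python
import numpy as np

def theta_to_p(theta):
    """Generalized sigmoid: (K-1) natural params -> K probabilities."""
    e = np.exp(theta)
    denom = 1.0 + e.sum()
    return np.append(e / denom, 1.0 / denom)   # p_1..p_{K-1}, then p_K

def p_to_theta(p):
    """Generalized logit: K probabilities -> (K-1) natural params."""
    return np.log(p[:-1] / p[-1])              # theta_i = log(p_i / p_K)

theta = np.array([0.3, -1.2])                  # arbitrary example, K = 3
p = theta_to_p(theta)
print(p.sum(), np.allclose(p_to_theta(p), theta))  # 1.0 True
```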
28 2.2 Generalized linear model (GLM)
2.2 Generalized linear model (GLM)
GLM
The GLM mechanism is a way to relate the input vector $x = (x_1, \dots, x_d)$ to the parameters $\theta_i$ by setting
$$\theta_i = b_i + \sum_{j=1}^{d} w_{ij} x_j,$$
where $b_i$ and $w_{ij}$ are the GLM parameters to be determined by the data. Thus we get
$$p_i = \frac{\exp\!\big(b_i + \sum_{j=1}^{d} w_{ij} x_j\big)}{1 + \sum_{k=1}^{K-1} \exp\!\big(b_k + \sum_{j=1}^{d} w_{kj} x_j\big)},$$
29 2.2 Generalized linear model (GLM)
i.e., $p_i = P[y = i \mid x]$ for $i = 1, \dots, K-1$, and
$$p_K = P[y = K \mid x] = \frac{1}{1 + \sum_{k=1}^{K-1} \exp\!\big(b_k + \sum_{j=1}^{d} w_{kj} x_j\big)}.$$
30 2.2 Generalized linear model (GLM)
Note: when $K = 2$, it is the logistic regression such that
$$P[y = 1 \mid x] = p_1 = \frac{\exp\!\big(b + \sum_j w_j x_j\big)}{1 + \exp\!\big(b + \sum_j w_j x_j\big)}, \qquad P[y = 0 \mid x] = p_2 = \frac{1}{1 + \exp\!\big(b + \sum_j w_j x_j\big)}.$$
Here, we set $b = b_1$, $w_j = w_{1j}$.
31 2.2 Generalized linear model (GLM)
Symmetric (redundant) form
The expression for $p_K$ is different from those for the $p_i$. To put $p_1, \dots, p_K$ in symmetric form, multiply the numerator and the denominator of each $p_i$ and of $p_K$ by
$$\exp\Big(a + \sum_{j=1}^{d} \alpha_j x_j\Big).$$
Then
$$p_i = \frac{\exp\!\big(a + b_i + \sum_{j=1}^{d} (w_{ij} + \alpha_j) x_j\big)}{\exp\!\big(a + \sum_{j=1}^{d} \alpha_j x_j\big) + \sum_{k=1}^{K-1} \exp\!\big(a + b_k + \sum_{j=1}^{d} (w_{kj} + \alpha_j) x_j\big)}, \quad i = 1, \dots, K-1,$$
32 2.2 Generalized linear model (GLM)
and
$$p_K = \frac{\exp\!\big(a + \sum_{j=1}^{d} \alpha_j x_j\big)}{\exp\!\big(a + \sum_{j=1}^{d} \alpha_j x_j\big) + \sum_{k=1}^{K-1} \exp\!\big(a + b_k + \sum_{j=1}^{d} (w_{kj} + \alpha_j) x_j\big)}.$$
Set
$$b_i \leftarrow b_i + a, \qquad w_{ij} \leftarrow w_{ij} + \alpha_j, \quad j = 1, \dots, d, \text{ for } i = 1, \dots, K-1,$$
and set
$$b_K = a, \qquad w_{Kj} = \alpha_j, \quad j = 1, \dots, d.$$
33 2.2 Generalized linear model (GLM)
Then we have
$$p_i = \frac{\exp\!\big(b_i + \sum_{j=1}^{d} w_{ij} x_j\big)}{\sum_{k=1}^{K} \exp\!\big(b_k + \sum_{j=1}^{d} w_{kj} x_j\big)} = P[y = i \mid x], \quad i = 1, \dots, K.$$
In vector notation,
$$p = (p_1, \dots, p_K) = \operatorname{softmax}(z_1, \dots, z_K) = \operatorname{softmax}(z),$$
where $z_i = b_i + \sum_{j=1}^{d} w_{ij} x_j$, $i = 1, \dots, K$.
34 2.2 Generalized linear model (GLM)
Neural network formalism
Figure: Neural network
35 2.2 Generalized linear model (GLM)
$z_i$: input to the $i$th neuron in the output layer, $z_i = b_i + \sum_j w_{ij} x_j$, $i = 1, \dots, K$
$h_i$: output of the $i$th neuron in the output layer, $h_i = \dfrac{e^{z_i}}{\sum_k e^{z_k}} = P[y = i \mid x]$, $i = 1, \dots, K$
In vector notation, we write $(h_1, \dots, h_K) = \operatorname{softmax}(z_1, \dots, z_K)$, or $h = \operatorname{softmax}(z)$.
36 2.3 Parameter estimation
2.3 Parameter estimation
Determining $W$ and $b$ by MLE
So far the parameters, the $K \times 1$ vector $b = [b_1, \dots, b_K]^T$ and the $K \times d$ matrix $W = [w_{ij}]$, have been regarded as given. But we need to determine $b$ and $W$ from the given data; use MLE (maximum likelihood estimation).
Data $D = \{(x^{(t)}, y^{(t)})\}_{t=1}^{N}$
Probability $P(y \mid x) = p_1^{y_1} \cdots p_K^{y_K}$,
37 2.3 Parameter estimation
where
$$p_i = \frac{\exp\!\big(b_i + \sum_{j=1}^{d} w_{ij} x_j\big)}{\sum_{k=1}^{K} \exp\!\big(b_k + \sum_{j=1}^{d} w_{kj} x_j\big)}.$$
Likelihood function:
$$L(W, b) = \prod_{t=1}^{N} P[y^{(t)} \mid x^{(t)}]$$
Log-likelihood function:
$$\ell(W, b) = \log L(W, b) = \sum_{t=1}^{N} \log P[y^{(t)} \mid x^{(t)}]$$
38 2.3 Parameter estimation
Recall $P(y \mid x) = p_1^{y_1} \cdots p_K^{y_K}$. Thus
$$\log P[y \mid x] = y_1 \log p_1 + \cdots + y_K \log p_K = \sum_{k=1}^{K} I(y = k) \log p_k = \sum_{k=1}^{K} I(y = k) \log P[y = k \mid x] = \sum_{k=1}^{K} I(y = k) \log \frac{e^{z_k}}{\sum_{i=1}^{K} e^{z_i}},$$
39 2.3 Parameter estimation
where $z_i = b_i + \sum_{j=1}^{d} w_{ij} x_j$.
Rewrite the log-likelihood function:
$$\ell(W, b) = \sum_{t=1}^{N} \log P[y^{(t)} \mid x^{(t)}] = \sum_{t=1}^{N} \sum_{k=1}^{K} I(y^{(t)} = k) \log P[y^{(t)} = k \mid x^{(t)}] = \sum_{t=1}^{N} \sum_{k=1}^{K} I(y^{(t)} = k) \log \frac{e^{z_k^{(t)}}}{\sum_{i=1}^{K} e^{z_i^{(t)}}},$$
40 2.3 Parameter estimation
where $z_i^{(t)} = b_i + \sum_{j=1}^{d} w_{ij} x_j^{(t)}$.
MLE is to find $W$ and $b$ that maximize $\ell(W, b)$.
[Note: for softmax regression it turns out that $\ell(W, b)$ is a concave (for generic data sets, strictly concave) function of $W$ and $b$.]
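Note that $-\ell(W, b)$ is exactly the cross-entropy loss familiar from neural-network libraries. A sketch computing it on a toy data set (all parameter and data values are made up; labels are 0-based here, unlike the $1, \dots, K$ convention of the slides):

```python
import numpy as np

def neg_log_likelihood(W, b, X, y):
    """-l(W, b) for softmax regression.

    X: (N, d) inputs; y: (N,) integer labels in {0, ..., K-1}.
    """
    Z = X @ W.T + b                           # (N, K): z_i^(t)
    Z = Z - Z.max(axis=1, keepdims=True)      # stabilize the exponentials
    log_p = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(y)), y].sum() # pick log P[y^(t)|x^(t)]

# Toy data: N = 4, d = 2, K = 3 (made-up values).
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
y = np.array([0, 1, 1, 2])
W, b = np.zeros((3, 2)), np.zeros(3)
print(neg_log_likelihood(W, b, X, y))  # 4 * log(3) for uniform predictions
```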
41 2.3 Parameter estimation
Neural network formalism
Recall
Figure: Neural network
42 2.3 Parameter estimation
For each input $x^{(t)}$:
$z_i^{(t)}$: input to the $i$th neuron in the output layer, $z_i^{(t)} = b_i + \sum_j w_{ij} x_j^{(t)}$, $i = 1, \dots, K$
$h_i^{(t)}$: output of the $i$th neuron in the output layer, $h_i^{(t)} = \dfrac{e^{z_i^{(t)}}}{\sum_k e^{z_k^{(t)}}}$, $i = 1, \dots, K$
43 2.3 Parameter estimation
For neural networks, the error function is set to be $-\ell(W, b)$, and the training is to minimize this error.
[Note: this neural network training is exactly the same as the MLE estimation in softmax regression.]
Training (learning) of a neural network in the case of a single-layer (no hidden layer) network: training is a convex optimization problem, so it is a relatively easy problem.
Three kinds of training (learning) strategies (sketched in code after this list):
- Full-batch learning: train using all the data in $D$ at once
- Mini-batch learning: train using a small portion of $D$ at a time, cycling through the portions
- On-line learning: train using one data point at a time, cycling through the data
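A sketch of mini-batch learning for softmax regression: the per-example gradient of $-\ell$ with respect to $z$ is the standard $h - \mathrm{onehot}(y)$, while the data, learning rate, and batch size below are made-up choices:

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def minibatch_train(X, y, K, lr=0.5, batch=2, epochs=200, seed=0):
    """Minimize -l(W, b) by mini-batch gradient descent (a sketch)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W, b = np.zeros((K, d)), np.zeros(K)
    for _ in range(epochs):
        order = rng.permutation(N)              # cycle through D in random order
        for s in range(0, N, batch):
            idx = order[s:s + batch]
            H = softmax(X[idx] @ W.T + b)       # h^(t) for the mini-batch
            G = H.copy()
            G[np.arange(len(idx)), y[idx]] -= 1.0  # dNLL/dz = h - onehot(y)
            W -= lr * G.T @ X[idx] / len(idx)
            b -= lr * G.mean(axis=0)
    return W, b

# Toy run on linearly separable points (made-up data).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0, 0, 1, 1])
W, b = minibatch_train(X, y, K=2)
print(np.argmax(X @ W.T + b, axis=1))  # [0 0 1 1]
```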
44 3. XOR problem and neural network with hidden layer
XOR Problem
Separate the X's from the O's
45
$$\mathrm{XOR}(x_1, x_2) = x_1 \bar{x}_2 + \bar{x}_1 x_2$$
$x_1 \bar{x}_2$:
46
$$z_1 = a\big(x_1 - x_2 - \tfrac{1}{2}\big), \quad a \text{ large}; \qquad h_1 = \mathrm{sigm}(z_1)$$
48
$\bar{x}_1 x_2$:
49
$$z_2 = a\big(-x_1 + x_2 - \tfrac{1}{2}\big), \quad a \text{ large}; \qquad h_2 = \mathrm{sigm}(z_2)$$
51
$$z_3 = b\big(h_1 + h_2 - \tfrac{1}{2}\big), \quad b \text{ large}; \qquad h_3 = \mathrm{sigm}(z_3)$$
52 This neural network achieves the separation, as the sketch below verifies numerically.
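A sketch of the two-layer network just constructed. The slide's exact constants are illegible in this transcription, so the offsets $-\tfrac{1}{2}$ and the gains $a = b = 10$ are assumptions; any sufficiently large gains give the same picture:

```python
import numpy as np

def sigm(t):
    return 1.0 / (1.0 + np.exp(-t))

def xor_net(x1, x2, a=10.0, b=10.0):
    """Hidden layer computes x1 AND NOT x2, and NOT x1 AND x2; output ORs them."""
    h1 = sigm(a * (x1 - x2 - 0.5))    # fires only on (1, 0)
    h2 = sigm(a * (-x1 + x2 - 0.5))   # fires only on (0, 1)
    h3 = sigm(b * (h1 + h2 - 0.5))    # OR of h1, h2
    return h3

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), round(xor_net(x1, x2), 3))
# Output is near 1 exactly on (0,1) and (1,0): the XOR pattern.
```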
53 4.1 Further construction
4. Universal approximation
4.1 Further construction
Further construction
The NN constructed above has the values shown in the figure.
54 4.1 Further construction
One can also construct another NN.
56 4.1 Further construction
The region where $h_1 = h_2 = h_3 = h_4 = 0$ is shown in the figure.
The neural network: Figure
57 4.1 Further construction
One can easily find a hyperplane in $\mathbb{R}^4$ that separates $(0, 0, 0, 0)$ from the rest; this hyperplane defines $h_5$, which defines a function with value 0 in the center and 1 in the rest.
58 4.1 Further construction
Continuing this way, one can construct any approximate bump function as an output of a neural network with one hidden layer (see the sketch below).
Combining these bump functions, one can approximate any continuous function.
Namely, a neural network with one hidden layer can do any task, at least in principle.
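A 1-D version of this construction: the difference of two shifted steep sigmoids is an approximate bump, and a weighted sum of such bumps approximates a continuous function. The target function and all constants here are made-up illustrations:

```python
import numpy as np

def sigm(t):
    return 1.0 / (1.0 + np.exp(-t))

def bump(x, left, right, a=200.0):
    """Approximate indicator of [left, right]: difference of two steep sigmoids."""
    return sigm(a * (x - left)) - sigm(a * (x - right))

target = lambda x: np.sin(2 * np.pi * x) + x        # made-up continuous target
x = np.linspace(0.02, 0.98, 1000)                   # stay away from the edges
edges = np.linspace(0, 1, 51)                       # 50 bumps over [0, 1]
centers = (edges[:-1] + edges[1:]) / 2
f = sum(target(c) * bump(x, l, r)
        for c, l, r in zip(centers, edges[:-1], edges[1:]))
print(np.max(np.abs(f - target(x))))  # uniform error shrinks as bumps narrow
```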
59 4.2 Universal approximation theorem
4.2 Universal approximation theorem
This heuristic argument can be made rigorous using a Stone-Weierstrass-type argument to get the
Cybenko-Hornik-Funahashi Theorem: Let $\Sigma = [0, 1]^d$ be the $d$-dimensional hypercube. Then sums of the form
$$f(x) = \sum_i c_i \,\mathrm{sigm}\Big(b_i + \sum_{j=1}^{d} w_{ij} x_j\Big)$$
can approximate any continuous function on $\Sigma$ to any degree of accuracy.
60 4.2 Universal approximation theorem
There are many similar results to this effect.
61 4.3 Deep vs Shallow learning
4.3 Deep vs Shallow learning
This theorem says that, at least in principle, one can do any classification with a neural network with one hidden layer.
Deep learning utilizes neural networks with many hidden layers, typically up to 40 or more.
Question: if the Universal Approximation Theorem says one can do the job with only one hidden layer, why does one use so many hidden layers? What is the advantage of doing so?
This is one big question we would like to address in the rest of this lecture series.
62 4.3 Deep vs Shallow learning
To achieve high accuracy, the number of terms has to be huge, and the training (learning) becomes a big problem: this is the typical problem of shallow networks (shallow learning).
In contrast, a deep NN arranges neurons in depth for more efficiency and better training; but training deep networks is a very subtle issue [which will be dealt with later in this lecture series].