Instructor: Dr. Benjamin Thompson Lecture 8: 3 February 2009


1 Instructor: Dr. Benjamin Thompson Lecture 8: 3 February 2009

2 Announcement: Homework 3 is due one week from today.

3 Not so long ago, in a classroom very, very close by: Unconstrained Optimization; The Method of Steepest Descent; Newton's Method; The Gauss-Newton Method. Really? Only four topics? I must be slacking.

4 Episode VIII: A New Trope: The Wiener Filter; The LMS Algorithm; Moving On (Matlab Demo: Linear Regression, Matlab Demo: The Wiener Filter, Matlab Demo: LMS); Data Sets (The Iris Classification Problem, The Sumo-Basketball Player-Jockeys Problem, Stock Market Data).

5 It involves bell-shaped figs.

6 Motivation: The motivation for this approach is to address the complexity issues without taking too big a hit on convergence. Outline for the derivation: we define a new error function as the sum of squared errors accrued up to time n; we then linearize the dependence of these error terms on the weight vector (to make the math easier); given this relationship between the error terms and the weight vector, we may find the weight vector w that minimizes the error function defined in the beginning.

7 Let's Get Started: Define our (new) cost function as the sum of the squares of all the approximation errors from the first iteration up to the current iteration:
$$E(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{n} e^2(i)$$
For the least-squares problem, the error term is $e(i) = d(i) - y(i) = d(i) - \mathbf{w}^T\mathbf{x}(i)$. We linearize each of these error terms, which are functions of the weight vector w (whatever they may be), by a first-order Taylor series expansion:
$$e'(i,\mathbf{w}) = e(i) + \left[\frac{\partial e(i)}{\partial \mathbf{w}}\bigg|_{\mathbf{w}=\mathbf{w}(n)}\right]^T \left(\mathbf{w} - \mathbf{w}(n)\right)$$

8 An Illustration: [Figure: a plot of the error e(i, w) versus w, with a straight line tangent to the curve at the point (w(i), e(i, w(i))) and extended out to another weight value w1.] The slope of this line is the derivative term: this line is the linearized function described on the previous slide!

9 Still Going: We may vectorize this over all e'(n, w) (that is, the e'(i, w) for each i from 1 to n) to get:
$$\mathbf{e}'(n,\mathbf{w}) = \mathbf{e}(n) + \mathbf{J}(n)\left(\mathbf{w} - \mathbf{w}(n)\right)$$
where e(n) is the error vector of the accumulated errors, and J(n) is the Jacobian of the error vector, given as:
$$\mathbf{J}(n) = \begin{bmatrix} \dfrac{\partial e(1)}{\partial w_1} & \dfrac{\partial e(1)}{\partial w_2} & \cdots & \dfrac{\partial e(1)}{\partial w_m} \\ \dfrac{\partial e(2)}{\partial w_1} & \dfrac{\partial e(2)}{\partial w_2} & \cdots & \dfrac{\partial e(2)}{\partial w_m} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial e(n)}{\partial w_1} & \dfrac{\partial e(n)}{\partial w_2} & \cdots & \dfrac{\partial e(n)}{\partial w_m} \end{bmatrix}_{\mathbf{w}=\mathbf{w}(n)}$$
Note that this is n-by-m: the number of observations so far by the size of the parameter vector we are trying to optimize!

10 Still Going: The Jacobian may also be expressed as the transpose of the gradient matrix
$$\nabla\mathbf{e}(n) = \begin{bmatrix} \nabla e(1) & \nabla e(2) & \cdots & \nabla e(n) \end{bmatrix}$$
where each of these elements is a column vector. Now we may find the weight vector w that would minimize the errors of all the previous stimuli and the current input, which gives us the next value of w:
$$\mathbf{w}(n+1) = \arg\min_{\mathbf{w}} \frac{1}{2}\sum_{i=1}^{n} e'^2(i,\mathbf{w}) = \arg\min_{\mathbf{w}} \frac{1}{2}\left\|\mathbf{e}'(n,\mathbf{w})\right\|^2$$
Evaluating this norm gives us:
$$\frac{1}{2}\left\|\mathbf{e}'(n,\mathbf{w})\right\|^2 = \frac{1}{2}\left\|\mathbf{e}(n) + \mathbf{J}(n)\left(\mathbf{w}-\mathbf{w}(n)\right)\right\|^2$$
which expands to:
$$\frac{1}{2}\left\|\mathbf{e}(n)\right\|^2 + \mathbf{e}^T(n)\mathbf{J}(n)\left(\mathbf{w}-\mathbf{w}(n)\right) + \frac{1}{2}\left(\mathbf{w}-\mathbf{w}(n)\right)^T\mathbf{J}^T(n)\mathbf{J}(n)\left(\mathbf{w}-\mathbf{w}(n)\right)$$

11 Still Going: As usual, since we want to find the value of w that minimizes this, we take the derivative and set it equal to zero:
$$\mathbf{J}^T(n)\mathbf{e}(n) + \mathbf{J}^T(n)\mathbf{J}(n)\left(\mathbf{w}-\mathbf{w}(n)\right) = \mathbf{0}$$
And now solve for the w that minimizes this:
$$\mathbf{w} = \mathbf{w}(n+1) = \mathbf{w}(n) - \left(\mathbf{J}^T(n)\mathbf{J}(n)\right)^{-1}\mathbf{J}^T(n)\mathbf{e}(n)$$
So the next value of w is just the previous value of w minus the error times some gradient information!
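The lecture's demos are in Matlab, but as a quick check of the algebra, here is a minimal NumPy sketch of the update just derived; the function and variable names are mine, not the lecture's. Solving the normal equations avoids forming an explicit inverse.

```python
import numpy as np

def gauss_newton_step(w, J, e):
    """One Gauss-Newton update: w(n+1) = w(n) - (J^T J)^{-1} J^T e(n).

    w : (m,)   current weight vector w(n)
    J : (n, m) Jacobian of the error vector with respect to w
    e : (n,)   accumulated error vector e(n)
    """
    # Solve (J^T J) delta = J^T e rather than inverting J^T J explicitly.
    delta = np.linalg.solve(J.T @ J, J.T @ e)
    return w - delta
```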

12 Some Caveats: $\mathbf{J}^T\mathbf{J}$ must be nonsingular so you can calculate its inverse. This is not guaranteed, so frequently $\mathbf{J}^T\mathbf{J} + \delta\mathbf{I}$ is used instead, where δ is a small positive constant that ensures the result is invertible. As it turns out, this is equivalent to minimizing the cost function
$$\frac{1}{2}\sum_{i=1}^{n} e^2(i) + \frac{\delta}{2}\left\|\mathbf{w}-\mathbf{w}(n)\right\|^2$$
where the second term penalizes the deviation from the previous weight vector, which is another form of regularization.
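If $\mathbf{J}^T\mathbf{J}$ might be singular, the δI safeguard above slots directly into the same step. A hedged NumPy sketch (again my own, with an illustrative default for δ):

```python
import numpy as np

def gauss_newton_step_damped(w, J, e, delta=1e-6):
    """Gauss-Newton step using (J^T J + delta*I) in place of J^T J."""
    m = J.shape[1]
    A = J.T @ J + delta * np.eye(m)   # delta > 0 guarantees invertibility
    return w - np.linalg.solve(A, J.T @ e)
```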

13 In Practice: Let's take a closer look at the Jacobian J(n). Recall that for our linear approximation $\mathbf{w}^T\mathbf{x}(i)$, the error is the difference between this approximation and the true output $d(i) = f(\mathbf{x}(i))$, which is given as $e(i) = d(i) - \mathbf{w}^T\mathbf{x}(i)$. So the derivative of this error with respect to the weight vector is just $-\mathbf{x}(i)$! So...

14 In Practice: For the case of linear approximation, the Jacobian becomes:
$$\mathbf{J}(n) = \begin{bmatrix} \dfrac{\partial e(1)}{\partial w_1} & \dfrac{\partial e(1)}{\partial w_2} & \cdots & \dfrac{\partial e(1)}{\partial w_m} \\ \dfrac{\partial e(2)}{\partial w_1} & \dfrac{\partial e(2)}{\partial w_2} & \cdots & \dfrac{\partial e(2)}{\partial w_m} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial e(n)}{\partial w_1} & \dfrac{\partial e(n)}{\partial w_2} & \cdots & \dfrac{\partial e(n)}{\partial w_m} \end{bmatrix}_{\mathbf{w}=\mathbf{w}(n)} = -\begin{bmatrix} x_1(1) & x_2(1) & \cdots & x_m(1) \\ x_1(2) & x_2(2) & \cdots & x_m(2) \\ \vdots & \vdots & & \vdots \\ x_1(n) & x_2(n) & \cdots & x_m(n) \end{bmatrix} = -\begin{bmatrix} \mathbf{x}(1) & \cdots & \mathbf{x}(n) \end{bmatrix}^T$$
So the Jacobian is just the negative of the transpose of all the input data up to time n!

15 In Practice: So here's how Gauss-Newton works: given a weight vector at time n, w(n), calculate the errors for all the data you've used, from x(1) up to x(n), with respect to that weight vector. Hint: this may be simplified by creating an n-by-m matrix of all the data and multiplying it by the (m-by-1) weight vector, which results in a vector of all the estimated outputs up to time n. This is then subtracted from the target vector d(n), whose elements are the target values from 1 to n, to produce e(n). The sharp student will notice that the n-by-m matrix he or she made above is the negative of the Jacobian! Then just turn the crank to get your new answer.
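A minimal sketch of this recipe for the linear case, assuming the data are stacked as rows of an n-by-m array X with target vector d (variable names are mine):

```python
import numpy as np

def gauss_newton_linear(X, d, w):
    """One Gauss-Newton step for the linear model y(i) = w^T x(i).

    X : (n, m) data matrix, rows are x(1)..x(n)
    d : (n,)   desired responses d(1)..d(n)
    w : (m,)   current weight vector w(n)
    """
    e = d - X @ w          # errors of all data under the current weights
    J = -X                 # for the linear model the Jacobian is -X (slide 14)
    return w - np.linalg.solve(J.T @ J, J.T @ e)
```

Because the error is linear in w, a single such step lands directly on the batch least-squares solution, which is exactly where the Wiener filter discussion below ends up.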

16 True story: I bought a dachshund ("Wiener Dog") for the sole purpose of naming him Norbert, after Norbert Wiener, inventor of the Wiener filter. As it happens, I also have a cat named Newton. After Isaac, not Fig. My wife actually let me get away with this twice. I have a plan to buy a second cat, name him Leibniz, and watch them duke it out.

17 The Wiener Filter: Fortunately, we already derived this, sort of. The Wiener Filter simply applies the Gauss-Newton method of optimization to the linear least-squares function approximation. That is, we want a current estimate of a linear approximator $d(n) = f(\mathbf{x}(n)) \approx \mathbf{w}^T(n)\mathbf{x}(n) = y(n)$, given the current error $e(n) = d(n) - y(n)$ and all the errors of the previous inputs applied to the current weight vector. The thing we want to estimate is the weight vector, or parameter vector.

18 To Reiterate: Here's how the Wiener Filter works. Given a current weight vector w(n), and all the previous inputs that led up to that current weight vector, x(1) up to x(n), we form the data matrix X(n) as:
$$\mathbf{X}(n) = \begin{bmatrix} \mathbf{x}(1) & \mathbf{x}(2) & \cdots & \mathbf{x}(n) \end{bmatrix}^T$$
We then calculate the cumulative errors as:
$$\mathbf{e}(n) = \mathbf{d}(n) - \mathbf{X}(n)\mathbf{w}(n)$$
where d(n) is a column vector containing all the desired responses up until now.

19 I Love It When A Plan Comes Together: Recall, to linearize this error with respect to w, we get the Jacobian:
$$\mathbf{J}(n) = \frac{\partial \mathbf{e}(n)}{\partial \mathbf{w}^T} = -\mathbf{X}(n)$$
Since the error is already linear with respect to w, Gauss-Newton will actually converge in a single iteration! So we apply it:
$$\begin{aligned} \mathbf{w}(n+1) &= \mathbf{w}(n) - \left(\mathbf{J}^T(n)\mathbf{J}(n)\right)^{-1}\mathbf{J}^T(n)\mathbf{e}(n) \\ &= \mathbf{w}(n) + \left(\mathbf{X}^T(n)\mathbf{X}(n)\right)^{-1}\mathbf{X}^T(n)\left(\mathbf{d}(n) - \mathbf{X}(n)\mathbf{w}(n)\right) \\ &= \left(\mathbf{X}^T(n)\mathbf{X}(n)\right)^{-1}\mathbf{X}^T(n)\mathbf{d}(n) \end{aligned}$$
Now THAT's simplification!
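A sketch of that closed-form solution in NumPy (my own illustration, not the lecture's Matlab demo), with a tiny synthetic check that a known weight vector is recovered:

```python
import numpy as np

def wiener_solution(X, d):
    """Batch least-squares (Wiener) weights: w = (X^T X)^{-1} X^T d."""
    return np.linalg.solve(X.T @ X, X.T @ d)

# Tiny check: recover a known weight vector from noisy linear data.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])                 # illustrative values
X = rng.standard_normal((200, 3))
d = X @ w_true + 0.01 * rng.standard_normal(200)
print(wiener_solution(X, d))                        # close to w_true
```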

20 Some Notes on Wiener: The multiplier $\left(\mathbf{X}^T(n)\mathbf{X}(n)\right)^{-1}\mathbf{X}^T(n)$ is called the pseudoinverse matrix of X(n), and is denoted $\mathbf{X}^+(n)$. What does the pseudoinverse devolve into when X(n) is a nonsingular square matrix? Hint: $(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$. Obviously, this can be computationally intensive for large data sets: you need to keep a matrix of all the data for all time since the beginning of your estimation! The same caveats about the invertibility of $\mathbf{X}^T\mathbf{X}$ apply as in the case of the Gauss-Newton method.
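For reference, NumPy exposes the pseudoinverse directly; a one-function sketch (assumption: when $\mathbf{X}^T\mathbf{X}$ is invertible this matches the expression above):

```python
import numpy as np

def wiener_solution_pinv(X, d):
    """Least-squares weights via the pseudoinverse: w = X^+ d."""
    return np.linalg.pinv(X) @ d
```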

21 The Wiener To The Limit Everybody Fhqwghads! Let's assume stable statistics for the underlying function we're trying to estimate. Furthermore, let's assume ergodicity, which is BTSOTC, but means that we can perfectly determine the statistics from one (maybe infinitely long) realization of this random process. Let's look at what happens to the nature of the Wiener Filter as our data set becomes infinitely large; that is, what is
$$\mathbf{w}_o = \lim_{n\to\infty} \mathbf{w}(n+1)$$
where $\mathbf{w}_o$ is the Wiener solution to the linear least-squares problem.

22 No Math This Time! Well, not really. Let's look at the Wiener Filter equation again:
$$\mathbf{w}(n+1) = \left(\mathbf{X}^T(n)\mathbf{X}(n)\right)^{-1}\mathbf{X}^T(n)\mathbf{d}(n)$$
and simply rewrite it in sum form:
$$\mathbf{w}_o = \lim_{n\to\infty}\left(\sum_{i=1}^{n} \mathbf{x}(i)\mathbf{x}^T(i)\right)^{-1}\left(\sum_{i=1}^{n} \mathbf{x}(i)\,d(i)\right) = \mathbf{R}_{xx}^{-1}\,\mathbf{r}_{dx}$$
(under ergodicity the normalized sums converge to R_xx and r_dx, and the 1/n factors cancel). What do those terms look like? Looks like the Maximum Likelihood Estimator!
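A sketch of the correlation form using sample estimates (my naming; because the 1/n factors cancel, this returns the same weights as the batch formula above):

```python
import numpy as np

def wiener_limit(X, d):
    """w_o = R_xx^{-1} r_dx using sample estimates of the correlations."""
    n = X.shape[0]
    R_xx = X.T @ X / n     # sample autocorrelation matrix of the inputs
    r_dx = X.T @ d / n     # sample cross-correlation between inputs and target
    return np.linalg.solve(R_xx, r_dx)
```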

23 All that back story for this?

24 LMS Motivation: Recall that the Wiener Filter sought to minimize the cumulative error, $\frac{1}{2}\sum_{i=1}^{n} e^2(i)$. To address the memory and complexity requirements of this algorithm, the LMS algorithm seeks instead to minimize the instantaneous error, $E(\hat{\mathbf{w}}) = \frac{1}{2}e^2(n)$, where $\hat{\mathbf{w}}$ is the desired weight vector estimate. That is, it only looks at the current prediction error rather than all the previous prediction errors with respect to the current parameter estimate.

25 Derivation: We should be used to this by now. We want to minimize the error with respect to some unknown weight vector, so we take the derivative:
$$\frac{\partial E(\hat{\mathbf{w}})}{\partial \hat{\mathbf{w}}} = e(n)\,\frac{\partial e(n)}{\partial \hat{\mathbf{w}}}$$
and our error signal is determined, as usual, by:
$$e(n) = d(n) - \mathbf{x}^T(n)\hat{\mathbf{w}}(n)$$
so
$$\frac{\partial e(n)}{\partial \hat{\mathbf{w}}} = -\mathbf{x}(n)$$

26 Derivation (cont.): This gives us an instantaneous estimate of the gradient as:
$$\hat{\mathbf{g}}(n) = \frac{\partial E(\hat{\mathbf{w}})}{\partial \hat{\mathbf{w}}} = -\mathbf{x}(n)\,e(n)$$
which gives us an estimate to use for the method of steepest descent:
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) - \eta\,\hat{\mathbf{g}}(n) \quad\Longrightarrow\quad \hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \eta\,\mathbf{x}(n)\,e(n)$$
Look familiar? We've just generalized the Rosenblatt Perceptron Learning Rule for all linear approximators!

27 The LMS Algorithm: Given a training sample x(n), a desired response d(n), and some starting weight vector $\hat{\mathbf{w}}(n)$, compute the error as $e(n) = d(n) - \hat{\mathbf{w}}^T(n)\mathbf{x}(n)$. Update the weight vector as $\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \eta\,\mathbf{x}(n)\,e(n)$. Repeat. Yes, it's that simple. And powerful!
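A minimal sketch of one LMS update in NumPy (the lecture's demo is in Matlab; the names and returned error are my choices):

```python
import numpy as np

def lms_step(w, x, d, eta):
    """One LMS update: e(n) = d(n) - w^T(n) x(n); w(n+1) = w(n) + eta * x(n) * e(n)."""
    e = d - w @ x          # instantaneous prediction error
    return w + eta * e * x, e
```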

28 Notes on the LMS Algorithm: While the Rosenblatt Perceptron algorithm generally assumes a fixed training set generated ahead of time, and a fixed structure after training is complete, the LMS algorithm assumes that each training pattern is shown only once and that new data is generated continuously. In this sense, the LMS is an adaptive algorithm. The LMS algorithm converges to a random walk around the optimal (Wiener) solution if the training set is stationary (doesn't statistically change over time).

29 What You Say??? The rate of convergence of the LMS algorithm toward the optimal solution (given the assumption of stationarity) is given by the learning curve
$$J(n) = J_{\min} + \eta\,J_{\min}\sum_{k=1}^{M}\frac{\lambda_k}{2-\eta\lambda_k} + \sum_{k=1}^{M}\lambda_k\,\upsilon_k^2(0)\left(1-\eta\lambda_k\right)^{2n}$$
This should be obvious by inspection, so I will not prove it in class. Just kidding. Here the $\lambda_k$'s are the eigenvalues of the correlation matrix of the data, $J_{\min}$ is the minimum mean-squared error produced by the Wiener Filter, and $\upsilon_k(0)$ is the initial state of the Markov model of the error, premultiplied by a matrix whose rows are the eigenvectors of the correlation matrix! It's all so simple!!!!

30 Which of course is ancient Greek for The Populace of Matlab

31 The Sequence of Interest: Today we'll be examining the autoregressive sequence
$$y(n) = \sum_{i=1}^{m} w_i\,y(n-i) + \varepsilon$$
We'll look at this from several standpoints: parameter estimation of the w vector using ML and MAP estimators; the Wiener filter approximation of the function; the LMS filter approximation of the function.

32 Makin' Data: The data set is constructed as:
$$\mathbf{X} = \begin{bmatrix} y(1) & y(2) & \cdots & y(n) \\ y(2) & y(3) & \cdots & y(n+1) \\ \vdots & \vdots & & \vdots \\ y(m) & y(m+1) & \cdots & y(n+m-1) \end{bmatrix}^T \qquad \mathbf{d} = \begin{bmatrix} y(m+1) & y(m+2) & \cdots & y(m+n) \end{bmatrix}^T$$
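A sketch of this construction (my own helper, not the lecture's Matlab script); it builds X with one training sample per row and the matching target vector d, which is what the transposed matrix above amounts to:

```python
import numpy as np

def make_ar_data(y, m):
    """Build X (n-by-m) and d (n,) for an order-m autoregressive fit.

    Row j of X holds [y(j), ..., y(j+m-1)] and its target is y(j+m),
    matching the slide's X and d (here with 0-based Python indexing).
    """
    n = len(y) - m
    X = np.array([y[k:k + m] for k in range(n)])
    d = np.asarray(y[m:m + n])
    return X, d
```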

33 ML and MAP: Given the entire data set a priori, we want to estimate the parameters w that generated that data, assuming the output was a linear combination of the inputs:
$$\hat{\mathbf{w}}_{ML} = \mathbf{R}_{xx}^{-1}\,\mathbf{r}_{dx} \qquad \hat{\mathbf{w}}_{MAP} = \left(\mathbf{R}_{xx} + \lambda\mathbf{I}\right)^{-1}\mathbf{r}_{dx}$$
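A sketch of the two estimators using sample correlations (my naming; the value of λ is an illustrative prior weight, not one from the lecture):

```python
import numpy as np

def ml_map_estimates(X, d, lam=0.1):
    """ML and MAP (regularized) parameter estimates from the whole data set."""
    n = len(d)
    R_xx = X.T @ X / n                  # sample R_xx
    r_dx = X.T @ d / n                  # sample r_dx
    w_ml = np.linalg.solve(R_xx, r_dx)
    w_map = np.linalg.solve(R_xx + lam * np.eye(X.shape[1]), r_dx)
    return w_ml, w_map
```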

34 Wiener Filter: Given all the data up to time n, we want to estimate the parameters that generated that data. We derived this for the special case of a linear filter:
$$\mathbf{w}(n+1) = \left(\mathbf{X}^T(n)\mathbf{X}(n)\right)^{-1}\mathbf{X}^T(n)\mathbf{d}(n)$$

35 The Least Mean Square Filter: Given only the data at time n and a current weight estimate, we want to update our estimate of the parameters that generated the data:
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \eta\,\mathbf{x}(n)\,e(n)$$
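A short end-to-end sketch tying these pieces together on a synthetic AR(2) sequence: generate the data, then run a single LMS pass over the stream. The coefficients, noise level, and step size are all made up for illustration; the lecture's actual demo is a Matlab script.

```python
import numpy as np

# Generate an illustrative AR(2) sequence y(n) = a1*y(n-1) + a2*y(n-2) + noise.
rng = np.random.default_rng(0)
a1, a2 = 0.6, -0.3                       # made-up coefficients
y = np.zeros(5000)
for n in range(2, len(y)):
    y[n] = a1 * y[n - 1] + a2 * y[n - 2] + 0.1 * rng.standard_normal()

# One pass of LMS over the stream, with x(n) = [y(n-1), y(n-2)].
w = np.zeros(2)
eta = 0.5                                # illustrative step size
for n in range(2, len(y)):
    x = np.array([y[n - 1], y[n - 2]])
    e = y[n] - w @ x                     # instantaneous error e(n)
    w = w + eta * x * e                  # LMS update
print(w)                                 # should wander near [a1, a2]
```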

36 Something useful this way comes.

37 The Nature of Data: Recall that there are two primary tasks for neural networks (and the other tools we've been learning about): classification and function approximation. So far, all we've talked about is two-class classification; that is, given a data point, does it belong to one class or the other? What about more classes?

38 I Borrowed This From Some Website.

39 Data Big and Small: Example: character recognition via neural networks. Suppose we want to train a neural network to recognize handwritten characters digitized into black-and-white, 30x20-pixel images. We could vectorize each image into a SIX HUNDRED ELEMENT input vector. This just might be computationally intensive. OR we could perform feature extraction.

40 Feature Extraction: Consider some features for character recognition: number of black pixels (add them up!): a 'd' probably contains more writing than an 'i'; character height (number of pixels between the bottom-most black pixel and the top-most black pixel): 'l' and 'h' are taller than 'e' and 'c'; character width (number of pixels between the left-most black pixel and the right-most black pixel): 'e' and 'b' are wider than 'i' and 't'. Others? The (ideal) goal is to find the smallest feature set that fully allows for identification of all the input classes.
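A sketch of those three features for a binary image array (my own helper, assuming 1 marks a black pixel):

```python
import numpy as np

def char_features(img):
    """Pixel count, height, and width of a binary (0/1) character image."""
    rows, cols = np.nonzero(img)
    if rows.size == 0:
        return np.zeros(3)
    n_black = rows.size                      # total black pixels
    height = rows.max() - rows.min() + 1     # bottom-most to top-most black pixel
    width = cols.max() - cols.min() + 1      # left-most to right-most black pixel
    return np.array([n_black, height, width])
```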

41 How Much Is Too Much? Rule of thumb: more data is always better. The bigger the training set, the better the trained learning machine will be at generalization. The reality, of course, is that more data equals longer training time! Character recognition example: have lots of people write lots of individual letters of the alphabet.
