Regression and the LMS Algorithm

CSE 5526: Introduction to Neural Networks. Regression and the LMS Algorithm

Problem statement

Linear regression with one variable. Given a set of N pairs of data {x_i, d_i}, approximate d by a linear function of x (the regressor), i.e.
d ≈ y = φ(wx + b) = wx + b
or
d = wx + b + ε = y + ε
where the activation function φ is a linear function, corresponding to a linear neuron. y is the output of the neuron, and ε is called the regression (expectational) error.

Linear regression (cont.) The problem of regression with one variable is how to choose w and b to minimize the regression error. The least squares method aims to minimize the square error E:
E = (1/2) Σ_{i=1..N} ε_i² = (1/2) Σ_{i=1..N} (d_i − y_i)²

Linear regression (cont.) To minimize the two-variable square function, set
∂E/∂w = 0
∂E/∂b = 0

Linear regression (cont.)
∂E/∂b = ∂/∂b [(1/2) Σ_i (d_i − w x_i − b)²] = −Σ_i (d_i − w x_i − b) = 0
∂E/∂w = ∂/∂w [(1/2) Σ_i (d_i − w x_i − b)²] = −Σ_i (d_i − w x_i − b) x_i = 0

Analytic solution approaches:
- Solve one equation for b in terms of w; substitute into the other equation and solve for w; substitute the solution for w back into the equation for b
- Set up the system of equations in matrix notation; solve the matrix equation
- Rewrite the problem in matrix form; compute the matrix gradient; solve for w

Linear regression (cont.) Hence
w = (Σ_i x_i d_i − N x̄ d̄) / (Σ_i x_i² − N x̄²)
b = d̄ − w x̄
where an overbar (e.g. x̄) indicates the mean. Derive it yourself!
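
To make the closed-form result concrete, here is a minimal NumPy sketch, assuming the data are two 1-D arrays x and d; the toy values are made up purely for illustration:

```python
import numpy as np

# Toy data, made up for illustration; any 1-D arrays x and d of equal length work
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

N = len(x)
x_bar, d_bar = x.mean(), d.mean()

# Closed-form least-squares estimates from the slide
w = (np.sum(x * d) - N * x_bar * d_bar) / (np.sum(x ** 2) - N * x_bar ** 2)
b = d_bar - w * x_bar

print(w, b)   # slope and intercept minimizing the squared error
```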

Linear regression in matrix notation. Let
X = [x_1, x_2, ..., x_N]^T
Then the model predictions are
y = Xw
and the mean square error can be written
E(w) = ||d − Xw||²
To find the optimal w, set the gradient of the error with respect to w equal to 0 and solve for w:
∇_w E(w) = 0
See The Matrix Cookbook (Petersen & Pedersen).

Linear regression in matrix notation (cont.)
E(w) = ||d − Xw||² = (d − Xw)^T (d − Xw) = d^T d − 2 w^T X^T d + w^T X^T X w
∇_w E = −2 X^T d + 2 X^T X w = 0
⟹ X^T X w = X^T d
⟹ w = (X^T X)^{-1} X^T d
Much cleaner!
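
The matrix form translates almost directly into code. A minimal sketch with made-up data, solving the normal equations X^T X w = X^T d and, as a cross-check, using NumPy's built-in least-squares solver:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Design matrix with a column of ones so that w[0] plays the role of the bias b
X = np.column_stack([np.ones_like(x), x])

# Normal equations: w = (X^T X)^{-1} X^T d, solved without forming the inverse
w_normal = np.linalg.solve(X.T @ X, X.T @ d)

# Cross-check with NumPy's built-in least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, d, rcond=None)

print(w_normal, w_lstsq)   # both give [b, w]
```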

Finding optimal parameters via search. Often there is no closed-form solution for ∇E(w) = 0. We can still use the gradient in a numerical solution. We'll still use the same example to permit comparison. For simplicity's sake, set b = 0:
E(w) = (1/2) Σ_{i=1..N} (d_i − w x_i)²
E is called a cost function.

Cost function. [Figure: a one-variable cost function E(w), with minimum E_min at w = w*.] Question: how can we update w, starting from an initial value w(0), to minimize E?

Gradient and directional derivatives. Consider a two-variable function f(x, y). Its gradient at the point (x_0, y_0)^T is defined as
∇f(x_0, y_0) = [f_x(x_0, y_0), f_y(x_0, y_0)]^T = f_x(x_0, y_0) u_x + f_y(x_0, y_0) u_y
where u_x and u_y are unit vectors in the x and y directions, and f_x = ∂f/∂x and f_y = ∂f/∂y.

Gradient and directional derivatives (cont.) In a given direction u = a u_x + b u_y, with a² + b² = 1, the directional derivative at (x_0, y_0)^T along the unit vector u is
D_u f(x_0, y_0) = lim_{h→0} [f(x_0 + ha, y_0 + hb) − f(x_0, y_0)] / h
 = lim_{h→0} {[f(x_0 + ha, y_0 + hb) − f(x_0, y_0 + hb)] + [f(x_0, y_0 + hb) − f(x_0, y_0)]} / h
 = a f_x(x_0, y_0) + b f_y(x_0, y_0)
 = ∇f(x_0, y_0)^T u
Which direction has the greatest slope? The gradient, because of the dot product!
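
A quick numerical sanity check of D_u f = ∇f^T u; the sample function, point, and direction below are chosen only for illustration:

```python
import numpy as np

# Sample function and its analytic gradient (chosen here purely for illustration)
f = lambda x, y: x**2 + 3.0 * x * y
grad_f = lambda x, y: np.array([2.0 * x + 3.0 * y, 3.0 * x])

x0, y0 = 1.0, 2.0
u = np.array([3.0, 4.0]) / 5.0          # unit vector: a^2 + b^2 = 1

h = 1e-6
finite_diff = (f(x0 + h * u[0], y0 + h * u[1]) - f(x0, y0)) / h
dot_product = grad_f(x0, y0) @ u        # D_u f = grad f . u

print(finite_diff, dot_product)          # the two values agree up to O(h)
```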

Gradient and directional derivatives (cont.) Example: [plot of a sample two-variable function f(x, y)]

Gradient and directional derivatives (cont.) Example: [plot of the same sample function f(x, y)]

Gradient and directional derivatives (cont.) The level curves of a function f(x, y) are curves such that f(x, y) = k. Thus, the directional derivative along a level curve is
D_u f(x, y) = ∇f(x, y)^T u = 0
and the gradient vector is perpendicular to the level curve.

Gradient and directional derivatives (cont.) The gradient of a cost function E(w) is a vector with the dimension of w that points in the direction of maximum increase of E, with a magnitude equal to the slope of the tangent of the cost function along that direction. Can the slope be negative?

Gradient illustration. [Figure: the cost function E(w) with minimum E_min at w*, showing the gradient at w(n).]
∇E(w) = lim_{Δw→0} [E(w + Δw) − E(w)] / Δw   (gradient)

Gradient descent. Minimize the cost function via gradient (steepest) descent, a case of hill-climbing:
w(n+1) = w(n) − η ∇E(w(n))
n: iteration number
η: learning rate
See the previous figure.
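
A minimal sketch of this update rule for the one-variable, b = 0 cost function above; the learning rate and toy data are assumptions made for illustration:

```python
import numpy as np

# Same toy data as before; b is fixed at 0 as on the earlier slide
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

eta = 0.01      # illustrative learning rate
w = 0.0         # initial weight w(0)

for n in range(100):
    grad = -np.sum((d - w * x) * x)   # dE/dw for E = (1/2) sum_i (d_i - w x_i)^2
    w = w - eta * grad                # w(n+1) = w(n) - eta * grad E(w(n))
```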

Gradient descent (cont.) For the mean-square-error cost function and linear neurons:
E(n) = (1/2) e²(n) = (1/2) [d(n) − y(n)]² = (1/2) [d(n) − w(n) x(n)]²
∂E(n)/∂w(n) = e(n) ∂e(n)/∂w(n) = −e(n) x(n)

Gradient descent (cont.) Hence
w(n+1) = w(n) + η e(n) x(n) = w(n) + η [d(n) − y(n)] x(n)
This is the least-mean-square (LMS) algorithm, or the Widrow-Hoff rule.
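
A sketch of one epoch of the online LMS / Widrow-Hoff update, again with illustrative data and learning rate:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

eta = 0.05      # illustrative learning rate
w = 0.0

# One pass (epoch) of the online LMS / Widrow-Hoff rule: update after every sample
for xn, dn in zip(x, d):
    y = w * xn               # linear neuron output y(n)
    e = dn - y               # error e(n) = d(n) - y(n)
    w = w + eta * e * xn     # w(n+1) = w(n) + eta * e(n) * x(n)
```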

Stochastic gradient descent. If the cost function is of the form
E(w) = Σ_{i=1..N} E_i(w)
then one gradient descent step requires computing
Δw = −η Σ_{i=1..N} ∇E_i(w)
which means computing E_i, or its gradient, for every data point. Many steps may be required to reach an optimum.

Stochastic gradient descent (cont.) It is generally much more computationally efficient to use
Δw = −η Σ_{i=n..n+b−1} ∇E_i(w)
for small values of b (a mini-batch). This update rule may converge in many fewer passes through the data (epochs).
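
A mini-batch SGD sketch under these assumptions; the synthetic data and the values of η and the batch size b are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
d = 2.0 * x + rng.normal(scale=0.1, size=200)   # synthetic targets, true slope 2

eta, batch_size = 0.1, 16    # illustrative hyperparameters
w = 0.0                      # b is kept at 0 as in the earlier one-variable example

for epoch in range(20):
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        # Gradient of E over the mini-batch only, instead of the full sum over N points
        grad = -np.sum((d[idx] - w * x[idx]) * x[idx])
        w -= eta * grad
```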

Stochastic gradient descent: example

Stochastic gradient descent: error functions

Stochastic gradient descent: gradients

Stochastic gradient descent: animation

Gradient descent: animation

Multi-variable LMS. The analysis for the one-variable case extends to the multi-variable case:
E(n) = (1/2) [d(n) − w^T(n) x(n)]²
∇E = [∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_m]^T
where w_0 = b (the bias) and x_0 = 1, as done for perceptron learning.

Multi-variable LMS (cont.) The LMS algorithm:
w(n+1) = w(n) − η ∇E(n) = w(n) + η e(n) x(n) = w(n) + η [d(n) − y(n)] x(n)
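
A sketch of the multi-variable LMS update with the bias absorbed as w_0 and x_0 = 1, as on the previous slide; the synthetic data and learning rate are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                    # 3 input features
d = X @ np.array([1.0, -2.0, 0.5]) + 0.3         # synthetic targets with bias 0.3

# Absorb the bias: prepend x_0 = 1 so that w[0] plays the role of b
Xb = np.column_stack([np.ones(len(X)), X])

eta = 0.01        # illustrative learning rate
w = np.zeros(Xb.shape[1])

for epoch in range(50):
    for xn, dn in zip(Xb, d):
        e = dn - w @ xn          # e(n) = d(n) - w^T(n) x(n)
        w = w + eta * e * xn     # w(n+1) = w(n) + eta * e(n) * x(n)
```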

LMS algorithm remarks. The LMS rule is exactly the same equation as the perceptron learning rule. Perceptron learning is for nonlinear (McCulloch-Pitts) neurons, whereas LMS learning is for linear neurons, i.e. perceptron learning is for classification and LMS is for function approximation. LMS should be less sensitive to noise in the input data than perceptrons. On the other hand, LMS learning converges slowly. Newton's method changes weights in the direction of the minimum of E and leads to fast convergence, but it is not online and is computationally expensive.

Stability of adaptation. When η is too small, learning converges slowly. When η is too large, learning doesn't converge.

Learning rate annealing. Basic idea: start with a large rate but gradually decrease it. Stochastic approximation:
η(n) = c / n
where c is a positive parameter.

Learning rate annealing (cont.) Search-then-converge:
η(n) = η_0 / (1 + n/τ)
where η_0 and τ are positive parameters. When n is small compared to τ, the learning rate is approximately constant. When n is large compared to τ, the schedule roughly follows stochastic approximation.
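
Both schedules are easy to express directly; the parameter values below are illustrative only:

```python
# Two annealing schedules from the slides; c, eta0, and tau are illustrative values
c, eta0, tau = 1.0, 0.5, 100.0

def eta_stochastic(n):
    """Stochastic approximation: eta(n) = c / n."""
    return c / n

def eta_search_then_converge(n):
    """Search-then-converge: eta(n) = eta0 / (1 + n / tau)."""
    return eta0 / (1.0 + n / tau)

for n in (1, 10, 100, 1000):
    print(n, eta_stochastic(n), eta_search_then_converge(n))
```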

Rate annealing illustration

Nonlinear neurons. To extend the LMS algorithm to nonlinear neurons, consider a differentiable activation function φ. At iteration n:
E(n) = (1/2) [d(n) − y(n)]² = (1/2) [d(n) − φ(Σ_j w_j(n) x_j(n))]²

Nonlinear neurons (cont.) By the chain rule of differentiation:
∂E/∂w_j = (∂E/∂y)(∂y/∂v)(∂v/∂w_j) = −[d(n) − y(n)] φ′(v(n)) x_j(n) = −e(n) φ′(v(n)) x_j(n)
where v(n) = Σ_j w_j(n) x_j(n).

Nonlinear neurons (cont.) Gradient descent gives
w_j(n+1) = w_j(n) + η e(n) φ′(v(n)) x_j(n) = w_j(n) + η δ(n) x_j(n)
The above is called the delta (δ) rule. If we choose a logistic sigmoid for φ, then
φ(v) = 1 / (1 + exp(−a v))
φ′(v) = a φ(v) [1 − φ(v)]   (see textbook)
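
A sketch of the delta rule with a logistic sigmoid activation; the synthetic data, the choice a = 1, and the learning rate are assumptions for illustration:

```python
import numpy as np

def phi(v, a=1.0):
    """Logistic sigmoid: phi(v) = 1 / (1 + exp(-a v))."""
    return 1.0 / (1.0 + np.exp(-a * v))

def phi_prime(v, a=1.0):
    """Derivative: phi'(v) = a * phi(v) * (1 - phi(v))."""
    s = phi(v, a)
    return a * s * (1.0 - s)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
d = phi(X @ np.array([1.5, -1.0]))      # synthetic targets in (0, 1)

eta = 0.5        # illustrative learning rate
w = np.zeros(2)

for epoch in range(200):
    for xn, dn in zip(X, d):
        v = w @ xn
        delta = (dn - phi(v)) * phi_prime(v)   # delta(n) = e(n) * phi'(v(n))
        w = w + eta * delta * xn               # delta rule: w_j += eta * delta * x_j
```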

Role of the activation function. [Figure: φ(v) and φ′(v).] The role of φ′: the weight update is most sensitive when v is near zero.