CSE 5526: Introduction to Neural Networks. Linear Regression, Part II

Problem statement [figure]

Problem statement (cont.) [figure]

Linear regression with one variable

Given a set of N pairs of data <x_i, d_i>, approximate d by a linear function of the regressor x, i.e.

d ≈ y = φ(x) = wx + b

or

d = wx + b + ε = y + ε

where the activation function φ is a linear function, and it corresponds to a linear neuron. y is the output of the neuron, and ε is called the regression (expectational) error.

Linear regression (cont.)

The problem of regression with one variable is how to choose w and b to minimize the regression error. The least squares method aims to minimize the square error E:

E = (1/2) Σ_{i=1..N} ε_i² = (1/2) Σ_{i=1..N} (d_i − y_i)²

Linear regression (cont.)

To minimize the two-variable square function, set

∂E/∂w = 0
∂E/∂b = 0

Linear regression (cont.)

∂E/∂b = −Σ_{i=1..N} (d_i − w x_i − b) = 0
∂E/∂w = −Σ_{i=1..N} (d_i − w x_i − b) x_i = 0

Linear regression (cont.)

Hence

w = (Σ_{i=1..N} x_i d_i − N x̄ d̄) / (Σ_{i=1..N} x_i² − N x̄²)
b = d̄ − w x̄

where an overbar (e.g. x̄) indicates the mean. Derive yourself!
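A minimal NumPy sketch of this closed-form solution (the synthetic data and variable names are illustrative, not from the slides):

```python
import numpy as np

# Illustrative synthetic data: d = 2x + 1 + noise
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
d = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(100)

N = len(x)
x_bar, d_bar = x.mean(), d.mean()

# Closed-form least squares, matching the formulas above
w = (np.sum(x * d) - N * x_bar * d_bar) / (np.sum(x**2) - N * x_bar**2)
b = d_bar - w * x_bar
print(w, b)  # should come out close to 2 and 1
```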

Linear regression (cont.)

This method gives an optimal solution, but it can be time- and memory-consuming as a batch solution.

Finding optimal parameters via search

Without loss of generality, set b = 0:

E(w) = (1/2) Σ_{i=1..N} (d_i − w x_i)²

E is called a cost function.

Cost function

[figure: E(w), with minimum E_min at w = w*]

Question: how can we update w to minimize E?

Gradient and directional derivatives

Without loss of generality, consider a two-variable function f(x, y). The gradient of f(x, y) at a given point (x_0, y_0)^T is

∇f(x_0, y_0) = (∂f/∂x, ∂f/∂y)^T |_(x_0, y_0) = f_x(x_0, y_0) u_x + f_y(x_0, y_0) u_y

where u_x and u_y are unit vectors in the x and y directions, and f_x = ∂f/∂x, f_y = ∂f/∂y.

Gradient and directional derivatives (cont.)

At any given direction, u = a u_x + b u_y with a² + b² = 1, the directional derivative at (x_0, y_0)^T along the unit vector u is

D_u f(x_0, y_0) = lim_{h→0} [f(x_0 + ha, y_0 + hb) − f(x_0, y_0)] / h
= lim_{h→0} {[f(x_0 + ha, y_0 + hb) − f(x_0, y_0 + hb)] + [f(x_0, y_0 + hb) − f(x_0, y_0)]} / h
= a f_x(x_0, y_0) + b f_y(x_0, y_0)
= ∇f(x_0, y_0)^T u

Which direction has the greatest slope? The gradient, because of the dot product!
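A quick numeric check of this identity, with a made-up function, point, and direction (everything here is illustrative):

```python
import numpy as np

f = lambda x, y: x**2 + 3.0 * x * y        # example two-variable function
x0, y0 = 1.0, 2.0                          # point (x_0, y_0)
a, b = np.cos(0.3), np.sin(0.3)            # unit direction: a^2 + b^2 = 1

# Analytic partials of f: f_x = 2x + 3y, f_y = 3x
f_x, f_y = 2.0 * x0 + 3.0 * y0, 3.0 * x0

h = 1e-6
numeric = (f(x0 + h * a, y0 + h * b) - f(x0, y0)) / h  # D_u f by its limit definition
analytic = a * f_x + b * f_y                           # gradient dotted with u
print(numeric, analytic)   # both approximately 8.53
```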

Gradient and directional derivatives (cont.)

Example: see blackboard.

Gradient and directional derivatives (cont.)

To find the gradient at a particular point (x_0, y_0)^T, first find the level curve or contour of f(x, y) at that point, C(x_0, y_0). A tangent vector u to C satisfies

D_u f(x_0, y_0) = ∇f(x_0, y_0)^T u = 0

because f(x, y) is constant on a level curve. Hence the gradient vector is perpendicular to the tangent vector.

An illustration of level curves [figure]

Gradient and directional derivatives (cont.)

The gradient of a cost function is a vector with the dimension of w that points in the direction of maximum E increase, and with a magnitude equal to the slope of the tangent of the cost function along that direction.

Can the slope be negative?

Gradient illustration

[figure: E(w), with minimum E_min at w = w*, and the gradient as the slope of the tangent]

∂E/∂w = lim_{Δw→0} [E(w + Δw) − E(w)] / Δw

Gradient descent

Minimize the cost function via gradient (steepest) descent, a case of hill-climbing:

w(n+1) = w(n) − η ∇E(n)

n: iteration number
η: learning rate

See the previous figure.

Gradient descent (cont.)

For the mean-square-error cost function:

E(n) = (1/2) e²(n) = (1/2) [d(n) − y(n)]²

For linear neurons, y(n) = w(n) x(n), so

∂E(n)/∂w(n) = e(n) ∂e(n)/∂w(n) = −e(n) x(n), where e(n) = d(n) − y(n)

Gradient descent (cont.)

Hence

w(n+1) = w(n) + η e(n) x(n) = w(n) + η [d(n) − y(n)] x(n)

This is the least-mean-square (LMS) algorithm, or the Widrow-Hoff rule.

Multi-variable case

The analysis for the one-variable case extends to the multi-variable case:

E(n) = (1/2) [d(n) − w^T(n) x(n)]²
∇E = (∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_m)^T

where w_0 = b (bias) and x_0 = 1, as done for perceptron learning.

Multi-variable case (cont.)

The LMS algorithm:

w(n+1) = w(n) − η ∇E(n)
= w(n) + η e(n) x(n)
= w(n) + η [d(n) − y(n)] x(n)
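A sketch of this online LMS loop in NumPy; the data stream, target weights, and learning rate are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
w_true = np.array([1.0, 2.0, -0.5])   # assumed target weights; w_true[0] is the bias

w = np.zeros(3)                       # w[0] = b, with x_0 = 1 as on the slide
eta = 0.05                            # learning rate

for n in range(2000):
    x = np.concatenate(([1.0], rng.uniform(-1.0, 1.0, 2)))  # x_0 = 1 for the bias
    d = w_true @ x + 0.01 * rng.standard_normal()           # noisy target d(n)
    y = w @ x                                               # linear neuron output
    e = d - y                                               # error e(n)
    w += eta * e * x                                        # LMS / Widrow-Hoff update

print(w)  # converges toward w_true
```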

LMS algorithm: Remarks

The LMS rule is exactly the same in mathematical form as the perceptron learning rule.

Perceptron learning is for McCulloch-Pitts neurons, which are nonlinear, whereas LMS learning is for linear neurons. In other words, perceptron learning is for classification and LMS is for function approximation.

LMS should be less sensitive to noise in the input data than perceptrons. On the other hand, LMS learning converges slowly.

Newton's method changes weights in the direction of the minimum E and leads to fast convergence. But it is not an online version, and it is computationally expensive.

Stability of adaptation

When η is too small, learning converges slowly. [figure]

Stability of adaptation (cont.)

When η is too large, learning doesn't converge. [figure]

Learning rate annealing

Basic idea: start with a large rate but gradually decrease it.

Stochastic approximation:

η(n) = c/n

where c is a positive parameter.

Learning rate annealing (cont.)

Search-then-converge:

η(n) = η_0 / (1 + n/τ)

η_0 and τ are positive parameters. When n is small compared to τ, the learning rate is approximately constant. When n is large compared to τ, the learning rate schedule roughly follows stochastic approximation.
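The two schedules side by side, as a small sketch (the constants c, η_0, and τ are illustrative choices, not values from the slides):

```python
def eta_stochastic(n, c=1.0):
    # Stochastic approximation: eta(n) = c / n
    return c / n

def eta_search_then_converge(n, eta0=0.1, tau=100.0):
    # Roughly constant for n << tau; behaves like eta0 * tau / n for n >> tau
    return eta0 / (1.0 + n / tau)

for n in (1, 10, 100, 1000, 10000):
    print(n, eta_stochastic(n), eta_search_then_converge(n))
```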

Rate annealing illustration [figure]

Nonlinear neurons

To extend the LMS algorithm to nonlinear neurons, consider a differentiable activation function φ at iteration n:

E(n) = (1/2) [d(n) − y(n)]² = (1/2) [d(n) − φ(Σ_j w_j(n) x_j(n))]²

Nonlinear neurons (cont.)

By the chain rule of differentiation:

∂E/∂w_j = (∂E/∂y)(∂y/∂v)(∂v/∂w_j) = −[d(n) − y(n)] φ′(v(n)) x_j(n) = −e(n) φ′(v(n)) x_j(n)

where v(n) = Σ_j w_j(n) x_j(n) is the induced local field.

Nonlinear neurons (cont.)

The gradient descent gives

w_j(n+1) = w_j(n) + η e(n) φ′(v(n)) x_j(n) = w_j(n) + η δ(n) x_j(n)

The above is called the delta (δ) rule.

If we choose a logistic sigmoid for φ, i.e.

φ(v) = 1 / (1 + exp(−a v))

then

φ′(v) = a φ(v) [1 − φ(v)] (see textbook)
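A sketch of the delta rule with the logistic sigmoid (taking a = 1); the data stream and target weights are made up for illustration:

```python
import numpy as np

def phi(v, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * v))   # logistic sigmoid

def phi_prime(v, a=1.0):
    s = phi(v, a)
    return a * s * (1.0 - s)              # phi'(v) = a * phi(v) * [1 - phi(v)]

rng = np.random.default_rng(2)
w_true = np.array([0.5, 1.5, -1.0])       # assumed target weights
w = np.zeros(3)
eta = 0.5

for n in range(5000):
    x = np.concatenate(([1.0], rng.uniform(-1.0, 1.0, 2)))
    d = phi(w_true @ x)                   # target from an assumed "true" neuron
    v = w @ x                             # induced local field v(n)
    y = phi(v)                            # nonlinear neuron output
    delta = (d - y) * phi_prime(v)        # delta(n) = e(n) * phi'(v(n))
    w += eta * delta * x                  # delta rule update

print(w)  # approaches w_true
```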

Role of activation function φ

[figure: φ(v) and φ′(v) plotted against v]

The role of φ′: the weight update is most sensitive when v is near zero.