Neural Network (Basic Ideas) Hung-yi Lee

Learning = Looking for a Function. Speech recognition: $f(\text{speech signal}) =$ "你好". Handwritten recognition: $f(\text{handwritten image}) =$ the written character. Weather forecast: $f(\text{weather today}) =$ "sunny tomorrow". Playing video games: $f(\text{positions and number of enemies}) =$ "fire".

Framework. Model = a hypothesis function set $\{f_1, f_2, \ldots\}$. Training: use the training data $\{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \ldots\}$ (x: function input, ŷ: function output, i.e. the label) to pick the best function $f^*$. Testing: for a new input x, output $y = f^*(x)$.

Outline. 1. What is the model (function hypothesis set)? 2. What is the best function? 3. How to pick the best function?

Task Considered Today: Classification. Binary classification has only two classes: an input object is assigned to Class A (yes) or Class B (no). Examples: spam filtering (is an e-mail spam or not?); recommendation systems (recommend the product to the customer or not?); malware detection (is the software malicious or not?); stock prediction (will the future value of a stock increase or not?).

Task Considered Today: Classification. Binary classification: only two classes, input object goes to Class A (yes) or Class B (no). Multi-class classification: more than two classes, input object goes to Class A, Class B, Class C, ….

Multi-class Classification. Handwriting digit classification: input is an image; the classes are 0, 1, 2, …, 9 (10 classes). Image recognition: input is an image; the classes are dog, cat, book, … (thousands of classes).

Multi-class Classification. Real speech recognition is not multi-class classification: the candidate outputs ("hi", "how are you", "I'm sorry", …) cannot be enumerated. The HW is multi-class classification: the input is one acoustic frame, and the question is which phoneme (e.g. /a/, /i/, /ε/) the frame belongs to; the classes are the phonemes.

1. What is the model?

What is the function we are looking for? Classification: $y = f(x)$, where $f: \mathbb{R}^N \to \mathbb{R}^M$. x is the input object to be classified; y is the class. Assume both x and y can be represented as fixed-size vectors: x is a vector with N dimensions, and y is a vector with M dimensions.

What is the function we are looking for? Handwriting digit classification: $f: \mathbb{R}^N \to \mathbb{R}^M$. x is a 16 × 16 image; each pixel corresponds to an element in the vector (1 for ink, 0 otherwise), so 16 × 16 = 256 dimensions. y is the class: 10 dimensions for digit recognition ("1" or not, "2" or not, "3" or not, …).
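To make the encoding concrete, here is a minimal sketch (illustrative values, not from the slides) of turning a 16 × 16 binary image into the 256-dimensional input vector and a digit label into a 10-dimensional one-hot target:

```python
import numpy as np

# A sketch: encode a 16x16 binary image as the 256-dim input vector x, and a
# digit label as a 10-dim one-hot target. The "image" drawn here is a made-up
# vertical stroke, not data from the lecture.
image = np.zeros((16, 16))
image[4:12, 7] = 1.0            # 1 for ink, 0 otherwise

x = image.reshape(256)          # each pixel becomes one element of x

label = 1                       # hypothetical label for this image
y_hat = np.zeros(10)
y_hat[label] = 1.0              # the "1 or not" dimension is set to 1
print(x.shape, y_hat)
```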

1. What is the model? A Layer of Neurons

Single Neuron. $f: \mathbb{R}^N \to \mathbb{R}$. The neuron computes $z = w_1 x_1 + w_2 x_2 + \cdots + w_N x_N + b$, then applies the activation function, here the sigmoid $\sigma(z) = \frac{1}{1 + e^{-z}}$, to produce the output $y = \sigma(z)$.

Single Neuron. $f: \mathbb{R}^N \to \mathbb{R}$. A single neuron can only do binary classification: e.g. output $y \geq 0.5$ means the image is "2", $y < 0.5$ means it is not "2". It cannot handle multi-class classification.
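A minimal sketch of this single sigmoid neuron in Python (the weights, bias, and inputs below are placeholders, not values from the lecture):

```python
import numpy as np

def sigmoid(z):
    # activation function: sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    z = np.dot(w, x) + b        # z = w1*x1 + ... + wN*xN + b
    return sigmoid(z)           # y = sigma(z), a value in (0, 1)

x = np.array([1.0, -2.0, 0.5])
w = np.array([0.8, 0.2, -0.5])  # hypothetical weights
b = 0.1                         # hypothetical bias
y = neuron(x, w, b)
print(y, "is '2'" if y >= 0.5 else "not '2'")  # binary decision at 0.5
```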

A Layer of Neurons. $f: \mathbb{R}^N \to \mathbb{R}^M$. Handwriting digit classification (classes 0, 1, 2, …, 9; 10 classes): use 10 neurons, one binary decision per class, e.g. $y_1$ = "1" or not, $y_2$ = "2" or not, $y_3$ = "3" or not, …. If $y_2$ is the max, then the image is "2".

1. What is the model? Limitation of a Single Layer

Limitation of a Single Layer. A single neuron computes $z = w_1 x_1 + w_2 x_2 + b$ and outputs "yes" when $z \geq$ threshold, "no" when $z <$ threshold. Can we choose $w_1$, $w_2$, $b$ to realize the following (XOR) table? Input $(x_1, x_2)$: (0, 0) → No; (0, 1) → Yes; (1, 0) → Yes; (1, 1) → No.

Limitation of a Single Layer. No, we can't: $z = w_1 x_1 + w_2 x_2 + b$ defines a single straight decision boundary in the $(x_1, x_2)$ plane, and no straight line separates $\{(0,0), (1,1)\}$ from $\{(0,1), (1,0)\}$.

Limitation of a Single Layer. The XOR table can be decomposed into tables a single neuron can realize: NOT AND (NAND), OR, and AND. Compute NAND$(x_1, x_2)$ and OR$(x_1, x_2)$ first; the desired output is the AND of those two results: (0, 0): Yes, No → No; (0, 1): Yes, Yes → Yes; (1, 0): Yes, Yes → Yes; (1, 1): No, Yes → No.

Neural Network. Wire the pieces together: hidden neuron $a_1$ computes NOT AND of $(x_1, x_2)$, hidden neuron $a_2$ computes OR, and the output neuron computes $a_1$ AND $a_2$. The neurons between input and output, whose outputs are not directly observed, are called hidden neurons; this small network realizes XOR, which a single layer cannot.

Numerical example (values read off the slide's figure): with sigmoid neurons and suitable weights, the hidden layer maps the four inputs $(x_1, x_2)$ to just three distinct points in the $(a_1, a_2)$ plane, e.g. $(0.73, 0.05)$, $(0.27, 0.27)$, and $(0.05, 0.73)$, with the two "Yes" inputs coinciding. In this transformed space a single straight line separates "Yes" from "No", so the output neuron can finish the job.
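The XOR construction can be checked directly. Below is a sketch with hand-picked weights chosen so that hidden neuron $a_1$ behaves like NOT AND and $a_2$ like OR; these weights are illustrative, not the exact numbers on the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = np.array([[-4.0, -4.0],    # a1: NAND-like (fires unless both inputs are 1)
               [ 4.0,  4.0]])   # a2: OR-like (fires if either input is 1)
b1 = np.array([6.0, -2.0])
W2 = np.array([[4.0, 4.0]])     # output neuron: AND of a1 and a2
b2 = np.array([-6.0])

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    a = sigmoid(W1 @ np.array(x, dtype=float) + b1)  # hidden-layer transform
    y = sigmoid(W2 @ a + b2)[0]                      # now linearly separable
    print(x, "->", np.round(a, 2), "->", "Yes" if y >= 0.5 else "No")
```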

1. What is the model? Neural Network

Neural Network as Model. $f: \mathbb{R}^N \to \mathbb{R}^M$. The input vector x ($x_1, \ldots, x_N$, the input layer) passes through Layer 1, Layer 2, …, Layer L; the layers between input and output are hidden layers, and Layer L is the output layer, producing the output vector y ($y_1, \ldots, y_M$). This is a fully connected feedforward network; a Deep Neural Network is one with many hidden layers.

Notation. Layer $l-1$ has $N_{l-1}$ nodes and layer $l$ has $N_l$ nodes. $a_i^l$: output of neuron $i$ at layer $l$. $a^l$: output of the whole layer $l$, a vector.

Notation. $w_{ij}^l$: the weight from neuron $j$ (layer $l-1$) to neuron $i$ (layer $l$). $W^l$: the weight matrix from layer $l-1$ to layer $l$, whose $(i, j)$ entry is $w_{ij}^l$, an $N_l \times N_{l-1}$ matrix.

Notation. $b_i^l$: the bias for neuron $i$ at layer $l$. $b^l$: the biases for all the neurons in layer $l$, a vector.

Notation. $z_i^l$: the input of the activation function for neuron $i$ at layer $l$; $z^l$: the inputs of the activation function for the neurons in layer $l$, a vector. $z_i^l = w_{i1}^l a_1^{l-1} + w_{i2}^l a_2^{l-1} + \cdots + b_i^l = \sum_j w_{ij}^l a_j^{l-1} + b_i^l$.

Notation - Summary. $a_i^l$: output of a neuron; $a^l$: output of a layer (a vector). $w_{ij}^l$: a weight; $W^l$: a weight matrix. $z_i^l$: input of the activation function of a neuron; $z^l$: inputs of the activation function for a layer (a vector). $b_i^l$: a bias; $b^l$: a bias vector.

Relations between Layer Outputs. Layer $l-1$ ($N_{l-1}$ nodes) feeds layer $l$ ($N_l$ nodes): each neuron $i$ in layer $l$ computes $z_i^l = \sum_j w_{ij}^l a_j^{l-1} + b_i^l$ from the outputs $a_j^{l-1}$ of layer $l-1$.

Relations between Layer Outputs. Stacking $z_1^l, z_2^l, \ldots$ into the vector $z^l$ turns the sums into a single matrix-vector product: $z^l = W^l a^{l-1} + b^l$.

Relations between Layer Outputs. Applying the activation function elementwise to $z^l$ gives the layer's outputs: $a_i^l = \sigma(z_i^l)$, i.e. $a^l = \sigma(z^l)$.

Relations between Layer Outputs. Combining the two steps: $a^l = \sigma(z^l) = \sigma(W^l a^{l-1} + b^l)$.
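A one-line numpy version of this relation (the shapes below are chosen arbitrarily for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(W, b, a_prev):
    # a^l = sigma(W^l a^(l-1) + b^l)
    return sigmoid(W @ a_prev + b)

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))   # layer l has 3 neurons, layer l-1 has 4
b = rng.standard_normal(3)
a_prev = rng.standard_normal(4)   # outputs of layer l-1
print(layer(W, b, a_prev))        # 3 values in (0, 1)
```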

Function of Neural Network. The parameters are $\theta = \{W^1, b^1, W^2, b^2, \ldots, W^L, b^L\}$. The input vector x flows through the layers: $a^1 = \sigma(W^1 x + b^1)$, $a^2 = \sigma(W^2 a^1 + b^2)$, …, $y = a^L = \sigma(W^L a^{L-1} + b^L)$.

Function of Neural Network. Written as one nested expression: $y = f(x) = \sigma(W^L \cdots \sigma(W^2\,\sigma(W^1 x + b^1) + b^2) \cdots + b^L)$.
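So the whole network is just the layer operation applied L times. A sketch of the full forward pass, with made-up layer sizes (e.g. a 256-dimensional image in, 10 class scores out):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    # params is [(W^1, b^1), ..., (W^L, b^L)]
    a = x
    for W, b in params:
        a = sigmoid(W @ a + b)   # a^l = sigma(W^l a^(l-1) + b^l)
    return a                     # the network output y = a^L

rng = np.random.default_rng(1)
sizes = [256, 30, 10]            # assumed layer sizes, not from the slides
params = [(0.1 * rng.standard_normal((m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
y = forward(rng.random(256), params)
print(y.shape, int(np.argmax(y)))   # 10 outputs; the argmax is the class
```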

2. What is the best function?

Best Function = Best Parameters. $y = f(x; \theta) = \sigma(W^L \cdots \sigma(W^2\,\sigma(W^1 x + b^1) + b^2) \cdots + b^L)$ defines a function set, because different parameters $W^l$ and $b^l$ lead to different functions: the parameter set $\theta = \{W^1, b^1, \ldots, W^L, b^L\}$ is a formal way to define the function set. Picking the best function $f^*$ is therefore picking the best parameter set $\theta^*$.

Cost Function. Define a function $C(\theta)$ of the parameter set; $C(\theta)$ evaluates how bad a parameter set is, and the best parameter set $\theta^*$ is the one that minimizes it: $\theta^* = \arg\min_\theta C(\theta)$. $C(\theta)$ is called the cost/loss/error function. (If you instead define the goodness of the parameter set by another function $O(\theta)$, then $O(\theta)$ is called an objective function.)

Cost Function. Given training data $\{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \ldots, (x^R, \hat{y}^R)\}$: for handwriting digit classification, $C(\theta) = \sum_{r=1}^{R} \| f(x^r; \theta) - \hat{y}^r \|$, a sum over training examples. Each term is the distance between the network output (e.g. $[0.1, 0.1, 0.4, \ldots]$) and the target (e.g. the one-hot vector for "3"); minimizing $C$ pushes the outputs toward the targets.
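A direct transcription of this cost into code, reusing the forward-pass sketch above and using the squared Euclidean distance as the per-example term (one common choice; the slide's plain distance works the same way):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):              # the network function f(x; theta)
    a = x
    for W, b in params:
        a = sigmoid(W @ a + b)
    return a

def cost(params, xs, y_hats):
    # C(theta) = sum over examples r of || f(x^r; theta) - y_hat^r ||^2
    return sum(np.sum((forward(x, params) - y_hat) ** 2)
               for x, y_hat in zip(xs, y_hats))
```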

3. How to pick the best function? Gradient Descent

Statement of the Problem. There is a function $C(\theta)$, where $\theta$ represents a parameter set $\{\theta_1, \theta_2, \theta_3, \ldots\}$. Find the $\theta^*$ that minimizes $C(\theta)$. Brute force? Enumerating all possible $\theta$ is infeasible. Calculus? Solving $\partial C/\partial \theta_1 = 0$, $\partial C/\partial \theta_2 = 0$, … for $\theta^*$ in closed form is generally not possible for a neural network.

Gradient Descent - Idea. For simplification, first consider a $\theta$ with only one variable. Picture the curve of $C(\theta)$ as a landscape and drop a ball somewhere; it rolls downhill, and where the ball stops, we find a local minimum.

Gradient Descent - Idea. Randomly start at $\theta^0$. Compute $dC(\theta^0)/d\theta$ and update $\theta^1 = \theta^0 - \eta\, dC(\theta^0)/d\theta$; then compute $dC(\theta^1)/d\theta$ and update $\theta^2 = \theta^1 - \eta\, dC(\theta^1)/d\theta$; and so on. $\eta$ is called the learning rate.
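A minimal sketch of these updates on a toy one-variable cost $C(\theta) = (\theta - 3)^2$ (the cost, starting point, and learning rate are made up for illustration):

```python
def dC(theta):
    # derivative of C(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)

theta = -4.0                          # random-ish starting point theta^0
eta = 0.1                             # learning rate
for t in range(50):
    theta = theta - eta * dC(theta)   # theta^(t+1) = theta^t - eta * dC/dtheta
print(theta)                          # close to the minimum at theta = 3
```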

Gradient Descent. Suppose $\theta$ has two variables $\{\theta_1, \theta_2\}$. Randomly start at $\theta^0$. Compute the gradient of $C$ at $\theta^0$: $\nabla C(\theta^0) = \left(\partial C(\theta^0)/\partial\theta_1,\ \partial C(\theta^0)/\partial\theta_2\right)$. Update the parameters: $\theta^1 = \theta^0 - \eta\,\nabla C(\theta^0)$. Then compute the gradient at $\theta^1$ and update $\theta^2 = \theta^1 - \eta\,\nabla C(\theta^1)$, and so on.

Gradient Descent. (Contour plot of $C(\theta_1, \theta_2)$.) Start at position $\theta^0$; compute the gradient at $\theta^0$ and move to $\theta^1 = \theta^0 - \eta\,\nabla C(\theta^0)$; compute the gradient at $\theta^1$ and move to $\theta^2 = \theta^1 - \eta\,\nabla C(\theta^1)$; and so on. The gradient points in the direction of steepest increase of $C$, so each movement is in the opposite direction.

Formal Derivation of Gradient Descent. Suppose $\theta$ has two variables $\{\theta_1, \theta_2\}$. Given a point on the surface $C(\theta)$, we can easily find the point with the smallest value nearby. How?

Formal Derivation of Gradient Descent. Taylor series: let $h(x)$ be infinitely differentiable around $x = x_0$. Then $h(x) = \sum_{k=0}^{\infty} \frac{h^{(k)}(x_0)}{k!}(x - x_0)^k = h(x_0) + h'(x_0)(x - x_0) + \frac{h''(x_0)}{2!}(x - x_0)^2 + \cdots$. When $x$ is close to $x_0$: $h(x) \approx h(x_0) + h'(x_0)(x - x_0)$.

E.g. the Taylor series for $h(x) = \sin(x)$ around $x_0 = \pi/4$: $\sin(x) = \sin\frac{\pi}{4} + \cos\frac{\pi}{4}\left(x - \frac{\pi}{4}\right) - \frac{\sin(\pi/4)}{2!}\left(x - \frac{\pi}{4}\right)^2 - \frac{\cos(\pi/4)}{3!}\left(x - \frac{\pi}{4}\right)^3 + \cdots$. (Figure: the partial sums approximate $\sin(x)$ well around $\pi/4$.)

Multivariable Taylor series: $h(x, y) = h(x_0, y_0) + \frac{\partial h(x_0, y_0)}{\partial x}(x - x_0) + \frac{\partial h(x_0, y_0)}{\partial y}(y - y_0) + {}$ something related to higher powers of $(x - x_0)$ and $(y - y_0)$. When $x$ and $y$ are close to $x_0$ and $y_0$: $h(x, y) \approx h(x_0, y_0) + \frac{\partial h(x_0, y_0)}{\partial x}(x - x_0) + \frac{\partial h(x_0, y_0)}{\partial y}(y - y_0)$.

Formal Derivation of Gradient Descent. Based on the Taylor series: if the red circle around the current point $(a, b)$ is small enough, then inside the circle $C(\theta_1, \theta_2) \approx s + u\,(\theta_1 - a) + v\,(\theta_2 - b)$, where $s = C(a, b)$, $u = \partial C(a, b)/\partial\theta_1$, and $v = \partial C(a, b)/\partial\theta_2$.

Formal Derivation of Gradient Descent. To find the $\theta_1$ and $\theta_2$ yielding the smallest value of $C(\theta)$ in the circle, minimize $u\,(\theta_1 - a) + v\,(\theta_2 - b)$: the best move is the vector pointing opposite to $(u, v)$, i.e. $(\theta_1 - a,\ \theta_2 - b) = -\eta\,(u, v)$, where $\eta$ depends on the radius of the circle. So $(\theta_1, \theta_2) = (a, b) - \eta\,\nabla C(a, b)$: this is exactly how gradient descent updates the parameters.

Gradient Descent for Neural Network. Start from initial parameters $\theta^0$, compute $\nabla C(\theta^0)$ and update $\theta^1 = \theta^0 - \eta\,\nabla C(\theta^0)$, compute $\nabla C(\theta^1)$, and so on. Here $\theta = \{W^1, b^1, \ldots, W^L, b^L\}$, so $\nabla C$ collects $\partial C/\partial w_{ij}^l$ and $\partial C/\partial b_i^l$ for millions of parameters. To compute the gradients efficiently, we use backpropagation.

Stuck at a local minimum? Or a saddle point? See "Who is Afraid of Non-Convex Loss Functions?" (http://videolectures.net/eml07_lecun_wia/) and "Deep Learning: Theoretical Motivations" (http://videolectures.net/deeplearning2015_bengio_theoretical_motivations/).

3. How to pick the best function? Practical Issues for Neural Networks

Practical Issues for Neural Networks: parameter initialization; learning rate; stochastic gradient descent and mini-batch; recipe for learning.

Parameter Initialization. For gradient descent, we need to pick initial parameters $\theta^0$. The initial parameters have some influence on the training; we will come back to this issue in the future. Suggestion for today: do not set all the parameters in $\theta^0$ equal; set them randomly.

Learning Rate. Set the learning rate $\eta$ carefully. (Figure: cost vs. number of parameter updates, alongside the error surface: a very large $\eta$ makes the cost blow up; a large $\eta$ gets stuck at a high cost; a too-small $\eta$ decreases very slowly; a well-chosen $\eta$ drops quickly to a low cost.)

Learning Rate. Set the learning rate $\eta$ carefully. Toy example: a single linear neuron $y = b + w \cdot x$, trained on 20 examples:
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5]
y = [0.1, 0.4, 0.9, 1.6, 2.1, 2.5, 2.8, 3.5, 3.9, 4.7, 5.0, 5.3, 6.3, 6.5, 6.7, 7.5, 8.0, 8.5, 8.9, 9.5]

Learning Rate - Toy Example. (Figure: the error surface $C(w, b)$ for the toy example, with the starting point and the target minimum marked.)

Learning Rate - Toy Example. Different learning rates $\eta$, a factor of 10 apart: the smaller one needs roughly ten times as many updates to reach the target (on the order of ~13k vs. ~1.3k updates).
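A sketch of this toy example in code: fit $y = b + wx$ by gradient descent and count the updates needed for two learning rates. The target values, stopping rule, and learning rates are assumptions for illustration; only the inputs x follow the slide:

```python
import numpy as np

x = np.arange(0.0, 10.0, 0.5)   # the 20 inputs from the slide: 0.0, 0.5, ..., 9.5
y = x + np.random.default_rng(2).normal(0.0, 0.2, x.size)  # assumed noisy line

def fit(eta, max_steps=100_000, tol=1e-6):
    w, b = 0.0, 0.0
    for t in range(max_steps):
        err = (b + w * x) - y                     # residuals of the current fit
        grad_w, grad_b = 2 * (err * x).mean(), 2 * err.mean()
        if grad_w ** 2 + grad_b ** 2 < tol:       # stop when the gradient is tiny
            return w, b, t
        w, b = w - eta * grad_w, b - eta * grad_b
    return w, b, max_steps

for eta in (0.001, 0.01):       # a 10x difference in learning rate
    print(eta, fit(eta))        # the larger eta needs roughly 10x fewer updates
```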

Stochastic Gradient Descent and Mini-batch. Gradient descent: $\theta^{t+1} = \theta^t - \eta\,\nabla C(\theta^t)$, where $C(\theta) = \sum_{r=1}^{R} C^r(\theta)$ and $C^r(\theta) = \| f(x^r; \theta) - \hat{y}^r \|$. Stochastic gradient descent: pick one example $x^r$ and update with its gradient only: $\theta^{t+1} = \theta^t - \eta\,\nabla C^r(\theta^t)$. If each example $x^r$ has an equal probability of being picked, then $E[\nabla C^r(\theta)] = \nabla C(\theta)/R$, so the expected update direction matches gradient descent. Faster! Better!

Stochastic Gradient Descent and Mini-batch. When using stochastic gradient descent, starting at $\theta^0$ with training data $\{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \ldots, (x^R, \hat{y}^R)\}$: pick $x^1$, update $\theta^1 = \theta^0 - \eta\,\nabla C^1(\theta^0)$; pick $x^2$, update $\theta^2 = \theta^1 - \eta\,\nabla C^2(\theta^1)$; …; pick $x^R$, update $\theta^R = \theta^{R-1} - \eta\,\nabla C^R(\theta^{R-1})$. What is an epoch? Having seen all the examples once is one epoch.

Stochastic Gradient Descent and Mini-batch - Toy Example. Gradient descent: see all 20 examples, then update once. Stochastic gradient descent: see only one example, then update; if there are 20 examples, update 20 times in one epoch. (Figure: one epoch of SGD makes progress comparable to many full-gradient updates.)

Stochastic Gradient Descent and Mini-batch. Gradient descent: update with $\nabla C(\theta) = \sum_r \nabla C^r(\theta)$ over all R examples. Stochastic gradient descent: pick one example $x^r$ (shuffle your data) and update with $\nabla C^r(\theta)$. Mini-batch gradient descent: pick B examples as a batch (B is the batch size, $B \ll R$) and average the gradients of the examples in the batch: $\nabla C \approx \frac{1}{B}\sum_{r \in \text{batch}} \nabla C^r(\theta)$.
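A sketch of one epoch of mini-batch updates; grad_Cr stands in for the per-example gradient (computed by backpropagation in a real network), and the toy usage below just shrinks $\theta$ toward zero:

```python
import numpy as np

def minibatch_epoch(theta, R, batch_size, eta, grad_Cr):
    order = np.random.permutation(R)              # shuffle your data
    for start in range(0, R, batch_size):
        batch = order[start:start + batch_size]   # pick B examples
        g = np.mean([grad_Cr(theta, r) for r in batch], axis=0)
        theta = theta - eta * g                   # update on the averaged gradient
    return theta

# toy usage: C(theta) = ||theta||^2 split evenly across R "examples"
theta = minibatch_epoch(np.ones(5), R=20, batch_size=4, eta=0.5,
                        grad_Cr=lambda th, r: 2.0 * th / 20)
print(theta)   # all entries shrink toward the minimum at 0
```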

Stochastic Gradient Descent and Mini-batch - Handwriting Digit Classification. (Figure: training curves comparing full gradient descent with batch size = 1; the per-example updates reach a low cost in far fewer epochs.)

Stochastic Gradient Descent and Mini-batch. Why is mini-batch faster than stochastic gradient descent? Stochastic gradient descent computes $z^1 = W^1 x$ for one example at a time; mini-batch stacks the examples side by side and computes $[\,z^1 \;\; z^1{}'\,] = W^1 [\,x \;\; x'\,]$ as one matrix-matrix product. Practically, which one is faster? The matrix form: one big product is far better optimized and parallelized than many small ones.
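A quick way to see the effect: compare one-example-at-a-time products with a single stacked matrix product (the sizes below are arbitrary assumptions):

```python
import numpy as np, time

rng = np.random.default_rng(3)
W = rng.standard_normal((1000, 256))
X = rng.standard_normal((256, 100))   # 100 examples stacked as columns

t0 = time.perf_counter()
Z_loop = np.stack([W @ X[:, r] for r in range(100)], axis=1)  # one by one
t1 = time.perf_counter()
Z_batch = W @ X                        # the whole mini-batch at once
t2 = time.perf_counter()

print(np.allclose(Z_loop, Z_batch))    # identical results
print(f"loop: {t1 - t0:.5f}s  batched: {t2 - t1:.5f}s")  # batched wins
```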

Recipe for Learning. Data provided in the homework: Training Data, labeled pairs $(x, \hat{y})$ used to pick the best function $f^*$, and Testing Data, which consists of a Validation set and a Real Testing set.

Recipe for Learning. For the validation set you immediately know the accuracy; for the real testing set you do not know the accuracy until the deadline, and that is what really counts.

Recipe for Learning. Do I get good results on the training set? If no, modify your training process. Possible causes: your code has a bug; the optimizer cannot find a good function (stuck at local minima or saddle points: change the training strategy); or a bad model, i.e. there is no good function in the hypothesis function set, so you probably need a bigger network.

Recipe for Learning. Do I get good results on the training set? If yes: do I get good results on the validation set? If yes, done. If no, work on preventing overfitting; your code usually does not have a bug in this situation.

Recipe for Learning - Overfitting. You picked the best parameter set $\theta^*$ for the training data $\{(x^r, \hat{y}^r)\}$, so for all $r$, $f(x^r; \theta^*) \approx \hat{y}^r$. However, for testing data $x^u$ it can still happen that $f(x^u; \theta^*) \neq \hat{y}^u$: training data and testing data have different distributions (e.g. the same digit written in different styles in the two sets).

Recipe for Learning - Overfitting. Panacea: have more training data. You can do that in a real application, but you can't do that in the homework. We will come back to this issue in the future.

Concluding Remarks. 1. What is the model (function hypothesis set)? A neural network. 2. What is the best function? The one minimizing the cost function. 3. How to pick the best function? Gradient descent, plus the practical issues: parameter initialization, learning rate, stochastic gradient descent and mini-batch, and the recipe for learning.

Acknowledgement. Thanks to 余朗祺 for correcting spelling errors on the slides during class, to 吳柏瑜 for correcting notation errors on the slides, and to Yes Hung for correcting typos on the slides.