Neural Network (Basic Ideas) Hung-yi Lee

Learning = Looking for a Function. Speech recognition: $f(\text{speech signal}) =$ "你好". Handwritten recognition: $f(\text{handwritten image}) =$ the written character. Weather forecast: $f(\text{weather today}) =$ "sunny tomorrow". Playing video games: $f(\text{positions and number of enemies}) =$ "fire".

Framework. Model = a hypothesis function set $\{f_1, f_2, \ldots\}$. Training: use the training data $\{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \ldots\}$ (x: function input, ŷ: function output, i.e. the label) to pick the best function $f^*$. Testing: for a new input x, output $y = f^*(x)$.

Outline. 1. What is the model (function hypothesis set)? 2. What is the best function? 3. How to pick the best function?

Task Considered Today: Classification. Binary classification has only two classes: an input object is assigned to Class A (yes) or Class B (no). Examples: spam filtering (is an e-mail spam or not?); recommendation systems (recommend the product to the customer or not?); malware detection (is the software malicious or not?); stock prediction (will the future value of a stock increase or not?).

Task Considered Today: Classification. Binary classification: only two classes, input object goes to Class A (yes) or Class B (no). Multi-class classification: more than two classes, input object goes to Class A, Class B, Class C, ….

Multi-class Classification. Handwriting digit classification: input is an image; the classes are 0, 1, 2, …, 9 (10 classes). Image recognition: input is an image; the classes are dog, cat, book, … (thousands of classes).

Multi-class Classification. Real speech recognition is not multi-class classification: the candidate outputs ("hi", "how are you", "I'm sorry", …) cannot be enumerated. The HW is multi-class classification: the input is one acoustic frame, and the question is which phoneme (e.g. /a/, /i/, /ε/) the frame belongs to; the classes are the phonemes.

1. What is the model?

What is the function we are looking for? Classification: $y = f(x)$, where $f: \mathbb{R}^N \to \mathbb{R}^M$. x is the input object to be classified; y is the class. Assume both x and y can be represented as fixed-size vectors: x is a vector with N dimensions, and y is a vector with M dimensions.

What is the function we are looking for? Handwriting digit classification: $f: \mathbb{R}^N \to \mathbb{R}^M$. x is a 16 × 16 image; each pixel corresponds to an element in the vector (1 for ink, 0 otherwise), so 16 × 16 = 256 dimensions. y is the class: 10 dimensions for digit recognition ("1" or not, "2" or not, "3" or not, …).
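To make the encoding concrete, here is a minimal sketch (illustrative values, not from the slides) of turning a 16 × 16 binary image into the 256-dimensional input vector and a digit label into a 10-dimensional one-hot target:

```python
import numpy as np

# A sketch: encode a 16x16 binary image as the 256-dim input vector x, and a
# digit label as a 10-dim one-hot target. The "image" drawn here is a made-up
# vertical stroke, not data from the lecture.
image = np.zeros((16, 16))
image[4:12, 7] = 1.0            # 1 for ink, 0 otherwise

x = image.reshape(256)          # each pixel becomes one element of x

label = 1                       # hypothetical label for this image
y_hat = np.zeros(10)
y_hat[label] = 1.0              # the "1 or not" dimension is set to 1
print(x.shape, y_hat)
```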

1. What is the model? A Layer of Neurons

Single Neuron. $f: \mathbb{R}^N \to \mathbb{R}$. The neuron computes $z = w_1 x_1 + w_2 x_2 + \cdots + w_N x_N + b$, then applies the activation function, here the sigmoid $\sigma(z) = \frac{1}{1 + e^{-z}}$, to produce the output $y = \sigma(z)$.

Single Neuron. $f: \mathbb{R}^N \to \mathbb{R}$. A single neuron can only do binary classification: e.g. output $y \geq 0.5$ means the image is "2", $y < 0.5$ means it is not "2". It cannot handle multi-class classification.
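A minimal sketch of this single sigmoid neuron in Python (the weights, bias, and inputs below are placeholders, not values from the lecture):

```python
import numpy as np

def sigmoid(z):
    # activation function: sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    z = np.dot(w, x) + b        # z = w1*x1 + ... + wN*xN + b
    return sigmoid(z)           # y = sigma(z), a value in (0, 1)

x = np.array([1.0, -2.0, 0.5])
w = np.array([0.8, 0.2, -0.5])  # hypothetical weights
b = 0.1                         # hypothetical bias
y = neuron(x, w, b)
print(y, "is '2'" if y >= 0.5 else "not '2'")  # binary decision at 0.5
```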

A Layer of Neurons. $f: \mathbb{R}^N \to \mathbb{R}^M$. Handwriting digit classification (classes 0, 1, 2, …, 9; 10 classes): use 10 neurons, one binary decision per class, e.g. $y_1$ = "1" or not, $y_2$ = "2" or not, $y_3$ = "3" or not, …. If $y_2$ is the max, then the image is "2".

1. What is the model? Limitation of a Single Layer

Limitation of a Single Layer. A single neuron computes $z = w_1 x_1 + w_2 x_2 + b$ and outputs "yes" when $z \geq$ threshold, "no" when $z <$ threshold. Can we choose $w_1$, $w_2$, $b$ to realize the following (XOR) table? Input $(x_1, x_2)$: (0, 0) → No; (0, 1) → Yes; (1, 0) → Yes; (1, 1) → No.

Limitation of a Single Layer. No, we can't: $z = w_1 x_1 + w_2 x_2 + b$ defines a single straight decision boundary in the $(x_1, x_2)$ plane, and no straight line separates $\{(0,0), (1,1)\}$ from $\{(0,1), (1,0)\}$.

Limitation of a Single Layer. The XOR table can be decomposed into tables a single neuron can realize: NOT AND (NAND), OR, and AND. Compute NAND$(x_1, x_2)$ and OR$(x_1, x_2)$ first; the desired output is the AND of those two results: (0, 0): Yes, No → No; (0, 1): Yes, Yes → Yes; (1, 0): Yes, Yes → Yes; (1, 1): No, Yes → No.

Neural Network. Wire the pieces together: hidden neuron $a_1$ computes NOT AND of $(x_1, x_2)$, hidden neuron $a_2$ computes OR, and the output neuron computes $a_1$ AND $a_2$. The neurons between input and output, whose outputs are not directly observed, are called hidden neurons; this small network realizes XOR, which a single layer cannot.

Numerical example (values read off the slide's figure): with sigmoid neurons and suitable weights, the hidden layer maps the four inputs $(x_1, x_2)$ to just three distinct points in the $(a_1, a_2)$ plane, e.g. $(0.73, 0.05)$, $(0.27, 0.27)$, and $(0.05, 0.73)$, with the two "Yes" inputs coinciding. In this transformed space a single straight line separates "Yes" from "No", so the output neuron can finish the job.
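The XOR construction can be checked directly. Below is a sketch with hand-picked weights chosen so that hidden neuron $a_1$ behaves like NOT AND and $a_2$ like OR; these weights are illustrative, not the exact numbers on the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = np.array([[-4.0, -4.0],    # a1: NAND-like (fires unless both inputs are 1)
               [ 4.0,  4.0]])   # a2: OR-like (fires if either input is 1)
b1 = np.array([6.0, -2.0])
W2 = np.array([[4.0, 4.0]])     # output neuron: AND of a1 and a2
b2 = np.array([-6.0])

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    a = sigmoid(W1 @ np.array(x, dtype=float) + b1)  # hidden-layer transform
    y = sigmoid(W2 @ a + b2)[0]                      # now linearly separable
    print(x, "->", np.round(a, 2), "->", "Yes" if y >= 0.5 else "No")
```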

1. What is the model? Neural Network

Neural Network as Model. $f: \mathbb{R}^N \to \mathbb{R}^M$. The input vector x ($x_1, \ldots, x_N$, the input layer) passes through Layer 1, Layer 2, …, Layer L; the layers between input and output are hidden layers, and Layer L is the output layer, producing the output vector y ($y_1, \ldots, y_M$). This is a fully connected feedforward network; a Deep Neural Network is one with many hidden layers.

Notation. Layer $l-1$ has $N_{l-1}$ nodes and layer $l$ has $N_l$ nodes. $a_i^l$: output of neuron $i$ at layer $l$. $a^l$: output of the whole layer $l$, a vector.

Notation. $w_{ij}^l$: the weight from neuron $j$ (layer $l-1$) to neuron $i$ (layer $l$). $W^l$: the weight matrix from layer $l-1$ to layer $l$, whose $(i, j)$ entry is $w_{ij}^l$, an $N_l \times N_{l-1}$ matrix.

Notation. $b_i^l$: the bias for neuron $i$ at layer $l$. $b^l$: the biases for all the neurons in layer $l$, a vector.

Notation. $z_i^l$: the input of the activation function for neuron $i$ at layer $l$; $z^l$: the inputs of the activation function for the neurons in layer $l$, a vector. $z_i^l = w_{i1}^l a_1^{l-1} + w_{i2}^l a_2^{l-1} + \cdots + b_i^l = \sum_j w_{ij}^l a_j^{l-1} + b_i^l$.

Notation - Summary. $a_i^l$: output of a neuron; $a^l$: output of a layer (a vector). $w_{ij}^l$: a weight; $W^l$: a weight matrix. $z_i^l$: input of the activation function of a neuron; $z^l$: inputs of the activation function for a layer (a vector). $b_i^l$: a bias; $b^l$: a bias vector.

Relations between Layer Outputs. Layer $l-1$ ($N_{l-1}$ nodes) feeds layer $l$ ($N_l$ nodes): each neuron $i$ in layer $l$ computes $z_i^l = \sum_j w_{ij}^l a_j^{l-1} + b_i^l$ from the outputs $a_j^{l-1}$ of layer $l-1$.

Relations between Layer Outputs. Stacking $z_1^l, z_2^l, \ldots$ into the vector $z^l$ turns the sums into a single matrix-vector product: $z^l = W^l a^{l-1} + b^l$.

Relations between Layer Outputs. Applying the activation function elementwise to $z^l$ gives the layer's outputs: $a_i^l = \sigma(z_i^l)$, i.e. $a^l = \sigma(z^l)$.

Relations between Layer Outputs. Combining the two steps: $a^l = \sigma(z^l) = \sigma(W^l a^{l-1} + b^l)$.
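A one-line numpy version of this relation (the shapes below are chosen arbitrarily for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(W, b, a_prev):
    # a^l = sigma(W^l a^(l-1) + b^l)
    return sigmoid(W @ a_prev + b)

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))   # layer l has 3 neurons, layer l-1 has 4
b = rng.standard_normal(3)
a_prev = rng.standard_normal(4)   # outputs of layer l-1
print(layer(W, b, a_prev))        # 3 values in (0, 1)
```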

Function of Neural Network. The parameters are $\theta = \{W^1, b^1, W^2, b^2, \ldots, W^L, b^L\}$. The input vector x flows through the layers: $a^1 = \sigma(W^1 x + b^1)$, $a^2 = \sigma(W^2 a^1 + b^2)$, …, $y = a^L = \sigma(W^L a^{L-1} + b^L)$.

Function of Neural Network. Written as one nested expression: $y = f(x) = \sigma(W^L \cdots \sigma(W^2\,\sigma(W^1 x + b^1) + b^2) \cdots + b^L)$.
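So the whole network is just the layer operation applied L times. A sketch of the full forward pass, with made-up layer sizes (e.g. a 256-dimensional image in, 10 class scores out):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    # params is [(W^1, b^1), ..., (W^L, b^L)]
    a = x
    for W, b in params:
        a = sigmoid(W @ a + b)   # a^l = sigma(W^l a^(l-1) + b^l)
    return a                     # the network output y = a^L

rng = np.random.default_rng(1)
sizes = [256, 30, 10]            # assumed layer sizes, not from the slides
params = [(0.1 * rng.standard_normal((m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
y = forward(rng.random(256), params)
print(y.shape, int(np.argmax(y)))   # 10 outputs; the argmax is the class
```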

2. What is the best function?

Best Function = Best Parameters. $y = f(x; \theta) = \sigma(W^L \cdots \sigma(W^2\,\sigma(W^1 x + b^1) + b^2) \cdots + b^L)$ defines a function set, because different parameters $W^l$ and $b^l$ lead to different functions: the parameter set $\theta = \{W^1, b^1, \ldots, W^L, b^L\}$ is a formal way to define the function set. Picking the best function $f^*$ is therefore picking the best parameter set $\theta^*$.

Cost Function. Define a function $C(\theta)$ of the parameter set; $C(\theta)$ evaluates how bad a parameter set is, and the best parameter set $\theta^*$ is the one that minimizes it: $\theta^* = \arg\min_\theta C(\theta)$. $C(\theta)$ is called the cost/loss/error function. (If you instead define the goodness of the parameter set by another function $O(\theta)$, then $O(\theta)$ is called an objective function.)

Cost Function. Given training data $\{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \ldots, (x^R, \hat{y}^R)\}$: for handwriting digit classification, $C(\theta) = \sum_{r=1}^{R} \| f(x^r; \theta) - \hat{y}^r \|$, a sum over training examples. Each term is the distance between the network output (e.g. $[0.1, 0.1, 0.4, \ldots]$) and the target (e.g. the one-hot vector for "3"); minimizing $C$ pushes the outputs toward the targets.
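A direct transcription of this cost into code, reusing the forward-pass sketch above and using the squared Euclidean distance as the per-example term (one common choice; the slide's plain distance works the same way):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):              # the network function f(x; theta)
    a = x
    for W, b in params:
        a = sigmoid(W @ a + b)
    return a

def cost(params, xs, y_hats):
    # C(theta) = sum over examples r of || f(x^r; theta) - y_hat^r ||^2
    return sum(np.sum((forward(x, params) - y_hat) ** 2)
               for x, y_hat in zip(xs, y_hats))
```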

3. How to pick the best function? Gradient Descent

Statement of the Problem. There is a function $C(\theta)$, where $\theta$ represents a parameter set $\{\theta_1, \theta_2, \theta_3, \ldots\}$. Find the $\theta^*$ that minimizes $C(\theta)$. Brute force? Enumerating all possible $\theta$ is infeasible. Calculus? Solving $\partial C/\partial \theta_1 = 0$, $\partial C/\partial \theta_2 = 0$, … for $\theta^*$ in closed form is generally not possible for a neural network.

Gradient Descent - Idea. For simplification, first consider a $\theta$ with only one variable. Picture the curve of $C(\theta)$ as a landscape and drop a ball somewhere; it rolls downhill, and where the ball stops, we find a local minimum.

Gradient Descent - Idea. Randomly start at $\theta^0$. Compute $dC(\theta^0)/d\theta$ and update $\theta^1 = \theta^0 - \eta\, dC(\theta^0)/d\theta$; then compute $dC(\theta^1)/d\theta$ and update $\theta^2 = \theta^1 - \eta\, dC(\theta^1)/d\theta$; and so on. $\eta$ is called the learning rate.
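A minimal sketch of these updates on a toy one-variable cost $C(\theta) = (\theta - 3)^2$ (the cost, starting point, and learning rate are made up for illustration):

```python
def dC(theta):
    # derivative of C(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)

theta = -4.0                          # random-ish starting point theta^0
eta = 0.1                             # learning rate
for t in range(50):
    theta = theta - eta * dC(theta)   # theta^(t+1) = theta^t - eta * dC/dtheta
print(theta)                          # close to the minimum at theta = 3
```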

Gradient Descent. Suppose $\theta$ has two variables $\{\theta_1, \theta_2\}$. Randomly start at $\theta^0$. Compute the gradient of $C$ at $\theta^0$: $\nabla C(\theta^0) = \left(\partial C(\theta^0)/\partial\theta_1,\ \partial C(\theta^0)/\partial\theta_2\right)$. Update the parameters: $\theta^1 = \theta^0 - \eta\,\nabla C(\theta^0)$. Then compute the gradient at $\theta^1$ and update $\theta^2 = \theta^1 - \eta\,\nabla C(\theta^1)$, and so on.

Gradient Descent. (Contour plot of $C(\theta_1, \theta_2)$.) Start at position $\theta^0$; compute the gradient at $\theta^0$ and move to $\theta^1 = \theta^0 - \eta\,\nabla C(\theta^0)$; compute the gradient at $\theta^1$ and move to $\theta^2 = \theta^1 - \eta\,\nabla C(\theta^1)$; and so on. The gradient points in the direction of steepest increase of $C$, so each movement is in the opposite direction.

Formal Derivation of Gradient Descent. Suppose $\theta$ has two variables $\{\theta_1, \theta_2\}$. Given a point on the surface $C(\theta)$, we can easily find the point with the smallest value nearby. How?

Formal Derivation of Gradient Descent. Taylor series: let $h(x)$ be infinitely differentiable around $x = x_0$. Then $h(x) = \sum_{k=0}^{\infty} \frac{h^{(k)}(x_0)}{k!}(x - x_0)^k = h(x_0) + h'(x_0)(x - x_0) + \frac{h''(x_0)}{2!}(x - x_0)^2 + \cdots$. When $x$ is close to $x_0$: $h(x) \approx h(x_0) + h'(x_0)(x - x_0)$.

E.g. the Taylor series for $h(x) = \sin(x)$ around $x_0 = \pi/4$: $\sin(x) = \sin\frac{\pi}{4} + \cos\frac{\pi}{4}\left(x - \frac{\pi}{4}\right) - \frac{\sin(\pi/4)}{2!}\left(x - \frac{\pi}{4}\right)^2 - \frac{\cos(\pi/4)}{3!}\left(x - \frac{\pi}{4}\right)^3 + \cdots$. (Figure: the partial sums approximate $\sin(x)$ well around $\pi/4$.)

Multivariable Taylor series: $h(x, y) = h(x_0, y_0) + \frac{\partial h(x_0, y_0)}{\partial x}(x - x_0) + \frac{\partial h(x_0, y_0)}{\partial y}(y - y_0) + {}$ something related to higher powers of $(x - x_0)$ and $(y - y_0)$. When $x$ and $y$ are close to $x_0$ and $y_0$: $h(x, y) \approx h(x_0, y_0) + \frac{\partial h(x_0, y_0)}{\partial x}(x - x_0) + \frac{\partial h(x_0, y_0)}{\partial y}(y - y_0)$.

Formal Derivation of Gradient Descent. Based on the Taylor series: if the red circle around the current point $(a, b)$ is small enough, then inside the circle $C(\theta_1, \theta_2) \approx s + u\,(\theta_1 - a) + v\,(\theta_2 - b)$, where $s = C(a, b)$, $u = \partial C(a, b)/\partial\theta_1$, and $v = \partial C(a, b)/\partial\theta_2$.

Formal Derivation of Gradient Descent. To find the $\theta_1$ and $\theta_2$ yielding the smallest value of $C(\theta)$ in the circle, minimize $u\,(\theta_1 - a) + v\,(\theta_2 - b)$: the best move is the vector pointing opposite to $(u, v)$, i.e. $(\theta_1 - a,\ \theta_2 - b) = -\eta\,(u, v)$, where $\eta$ depends on the radius of the circle. So $(\theta_1, \theta_2) = (a, b) - \eta\,\nabla C(a, b)$: this is exactly how gradient descent updates the parameters.

Gradient Descent for Neural Network. Start from initial parameters $\theta^0$, compute $\nabla C(\theta^0)$ and update $\theta^1 = \theta^0 - \eta\,\nabla C(\theta^0)$, compute $\nabla C(\theta^1)$, and so on. Here $\theta = \{W^1, b^1, \ldots, W^L, b^L\}$, so $\nabla C$ collects $\partial C/\partial w_{ij}^l$ and $\partial C/\partial b_i^l$ for millions of parameters. To compute the gradients efficiently, we use backpropagation.

Stuck at a local minimum? Or a saddle point? See "Who is Afraid of Non-Convex Loss Functions?" (http://videolectures.net/eml07_lecun_wia/) and "Deep Learning: Theoretical Motivations" (http://videolectures.net/deeplearning2015_bengio_theoretical_motivations/).

3. How to pick the best function? Practical Issues for Neural Networks

Practical Issues for Neural Networks: parameter initialization; learning rate; stochastic gradient descent and mini-batch; recipe for learning.

Parameter Initialization. For gradient descent, we need to pick initial parameters $\theta^0$. The initial parameters have some influence on the training; we will come back to this issue in the future. Suggestion for today: do not set all the parameters in $\theta^0$ equal; set them randomly.

Learning Rate. Set the learning rate $\eta$ carefully. (Figure: cost vs. number of parameter updates, alongside the error surface: a very large $\eta$ makes the cost blow up; a large $\eta$ gets stuck at a high cost; a too-small $\eta$ decreases very slowly; a well-chosen $\eta$ drops quickly to a low cost.)

Learning Rate. Set the learning rate $\eta$ carefully. Toy example: a single linear neuron $y = b + w \cdot x$, trained on 20 examples:
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5]
y = [0.1, 0.4, 0.9, 1.6, 2.1, 2.5, 2.8, 3.5, 3.9, 4.7, 5.0, 5.3, 6.3, 6.5, 6.7, 7.5, 8.0, 8.5, 8.9, 9.5]

Learning Rate - Toy Example. (Figure: the error surface $C(w, b)$ for the toy example, with the starting point and the target minimum marked.)

Learning Rate - Toy Example. Different learning rates $\eta$, a factor of 10 apart: the smaller one needs roughly ten times as many updates to reach the target (on the order of ~13k vs. ~1.3k updates).
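A sketch of this toy example in code: fit $y = b + wx$ by gradient descent and count the updates needed for two learning rates. The target values, stopping rule, and learning rates are assumptions for illustration; only the inputs x follow the slide:

```python
import numpy as np

x = np.arange(0.0, 10.0, 0.5)   # the 20 inputs from the slide: 0.0, 0.5, ..., 9.5
y = x + np.random.default_rng(2).normal(0.0, 0.2, x.size)  # assumed noisy line

def fit(eta, max_steps=100_000, tol=1e-6):
    w, b = 0.0, 0.0
    for t in range(max_steps):
        err = (b + w * x) - y                     # residuals of the current fit
        grad_w, grad_b = 2 * (err * x).mean(), 2 * err.mean()
        if grad_w ** 2 + grad_b ** 2 < tol:       # stop when the gradient is tiny
            return w, b, t
        w, b = w - eta * grad_w, b - eta * grad_b
    return w, b, max_steps

for eta in (0.001, 0.01):       # a 10x difference in learning rate
    print(eta, fit(eta))        # the larger eta needs roughly 10x fewer updates
```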

Stochastic Gradient Descent and Mini-batch. Gradient descent: $\theta^{t+1} = \theta^t - \eta\,\nabla C(\theta^t)$, where $C(\theta) = \sum_{r=1}^{R} C^r(\theta)$ and $C^r(\theta) = \| f(x^r; \theta) - \hat{y}^r \|$. Stochastic gradient descent: pick one example $x^r$ and update with its gradient only: $\theta^{t+1} = \theta^t - \eta\,\nabla C^r(\theta^t)$. If each example $x^r$ has an equal probability of being picked, then $E[\nabla C^r(\theta)] = \nabla C(\theta)/R$, so the expected update direction matches gradient descent. Faster! Better!

Stochastic Gradient Descent and Mini-batch. When using stochastic gradient descent, starting at $\theta^0$ with training data $\{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \ldots, (x^R, \hat{y}^R)\}$: pick $x^1$, update $\theta^1 = \theta^0 - \eta\,\nabla C^1(\theta^0)$; pick $x^2$, update $\theta^2 = \theta^1 - \eta\,\nabla C^2(\theta^1)$; …; pick $x^R$, update $\theta^R = \theta^{R-1} - \eta\,\nabla C^R(\theta^{R-1})$. What is an epoch? Having seen all the examples once is one epoch.

Stochastic Gradient Descent and Mini-batch - Toy Example. Gradient descent: see all 20 examples, then update once. Stochastic gradient descent: see only one example, then update; if there are 20 examples, update 20 times in one epoch. (Figure: one epoch of SGD makes progress comparable to many full-gradient updates.)

Stochastic Gradient Descent and Mini-batch. Gradient descent: update with $\nabla C(\theta) = \sum_r \nabla C^r(\theta)$ over all R examples. Stochastic gradient descent: pick one example $x^r$ (shuffle your data) and update with $\nabla C^r(\theta)$. Mini-batch gradient descent: pick B examples as a batch (B is the batch size, $B \ll R$) and average the gradients of the examples in the batch: $\nabla C \approx \frac{1}{B}\sum_{r \in \text{batch}} \nabla C^r(\theta)$.
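A sketch of one epoch of mini-batch updates; grad_Cr stands in for the per-example gradient (computed by backpropagation in a real network), and the toy usage below just shrinks $\theta$ toward zero:

```python
import numpy as np

def minibatch_epoch(theta, R, batch_size, eta, grad_Cr):
    order = np.random.permutation(R)              # shuffle your data
    for start in range(0, R, batch_size):
        batch = order[start:start + batch_size]   # pick B examples
        g = np.mean([grad_Cr(theta, r) for r in batch], axis=0)
        theta = theta - eta * g                   # update on the averaged gradient
    return theta

# toy usage: C(theta) = ||theta||^2 split evenly across R "examples"
theta = minibatch_epoch(np.ones(5), R=20, batch_size=4, eta=0.5,
                        grad_Cr=lambda th, r: 2.0 * th / 20)
print(theta)   # all entries shrink toward the minimum at 0
```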

Stochastic Gradient Descent and Mini-batch - Handwriting Digit Classification. (Figure: training curves comparing full gradient descent with batch size = 1; the per-example updates reach a low cost in far fewer epochs.)

Stochastic Gradient Descent and Mini-batch. Why is mini-batch faster than stochastic gradient descent? Stochastic gradient descent computes $z^1 = W^1 x$ for one example at a time; mini-batch stacks the examples side by side and computes $[\,z^1 \;\; z^1{}'\,] = W^1 [\,x \;\; x'\,]$ as one matrix-matrix product. Practically, which one is faster? The matrix form: one big product is far better optimized and parallelized than many small ones.
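A quick way to see the effect: compare one-example-at-a-time products with a single stacked matrix product (the sizes below are arbitrary assumptions):

```python
import numpy as np, time

rng = np.random.default_rng(3)
W = rng.standard_normal((1000, 256))
X = rng.standard_normal((256, 100))   # 100 examples stacked as columns

t0 = time.perf_counter()
Z_loop = np.stack([W @ X[:, r] for r in range(100)], axis=1)  # one by one
t1 = time.perf_counter()
Z_batch = W @ X                        # the whole mini-batch at once
t2 = time.perf_counter()

print(np.allclose(Z_loop, Z_batch))    # identical results
print(f"loop: {t1 - t0:.5f}s  batched: {t2 - t1:.5f}s")  # batched wins
```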

Recipe for Learning. Data provided in the homework: Training Data, labeled pairs $(x, \hat{y})$ used to pick the best function $f^*$, and Testing Data, which consists of a Validation set and a Real Testing set.

Recipe for Learning. For the validation set you immediately know the accuracy; for the real testing set you do not know the accuracy until the deadline, and that is what really counts.

Recipe for Learning. Do I get good results on the training set? If no, modify your training process. Possible causes: your code has a bug; the optimizer cannot find a good function (stuck at local minima or saddle points: change the training strategy); or a bad model, i.e. there is no good function in the hypothesis function set, so you probably need a bigger network.

Recipe for Learning. Do I get good results on the training set? If yes: do I get good results on the validation set? If yes, done. If no, work on preventing overfitting; your code usually does not have a bug in this situation.

Recipe for Learning - Overfitting. You picked the best parameter set $\theta^*$ for the training data $\{(x^r, \hat{y}^r)\}$, so for all $r$, $f(x^r; \theta^*) \approx \hat{y}^r$. However, for testing data $x^u$ it can still happen that $f(x^u; \theta^*) \neq \hat{y}^u$: training data and testing data have different distributions (e.g. the same digit written in different styles in the two sets).

Recipe for Learning - Overfitting. Panacea: have more training data. You can do that in a real application, but you can't do that in the homework. We will come back to this issue in the future.

Concluding Remarks. 1. What is the model (function hypothesis set)? A neural network. 2. What is the best function? The one minimizing the cost function. 3. How to pick the best function? Gradient descent, plus the practical issues: parameter initialization, learning rate, stochastic gradient descent and mini-batch, and the recipe for learning.

Acknowledgement. Thanks to 余朗祺 for correcting spelling errors on the slides during class, to 吳柏瑜 for correcting notation errors on the slides, and to Yes Hung for correcting typos on the slides.