Fundamentals of Neural Networks


1 Fundamentals of Neural Networks
Xiaodong Cui
IBM T. J. Watson Research Center
Yorktown Heights, NY
EECS 6894, Columbia University
Fall, 2018

2 Outline
Feedforward neural networks
Forward propagation
Neural networks as universal approximators
Back propagation
Jacobian
Vanishing gradient problem
Generalization

3 Feedforward Neural Networks
[Figure: a feedforward network with an input layer x, hidden layers, and an output layer; each unit computes a pre-activation a and an activation z = h(a).]
a_i^{(l)} = \sum_j w_{ij}^{(l)} z_j^{(l-1)} + b_i^{(l)}, \qquad z_i^{(l)} = h^{(l)}(a_i^{(l)})

4 Activation Functions
Sigmoid, also known as the logistic function:
h(a) = \frac{1}{1 + e^{-a}}
h'(a) = \frac{e^{-a}}{(1 + e^{-a})^2} = h(a)\,[1 - h(a)]
[Figure: plot of the sigmoid f and its derivative df/dx.]
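
A minimal plain-Lua sketch (the sample points are arbitrary) evaluating the sigmoid and the identity h'(a) = h(a)[1 - h(a)]; the derivative never exceeds 0.25 and shrinks quickly once the unit saturates:

local function sigmoid(a) return 1 / (1 + math.exp(-a)) end
local function dsigmoid(a) local h = sigmoid(a) return h * (1 - h) end

print(sigmoid(0), dsigmoid(0))   -- 0.5   0.25 (the derivative peaks at 0.25)
print(sigmoid(5), dsigmoid(5))   -- ~0.9933  ~0.0066 (saturates for large |a|)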

5 Activation Functions
Hyperbolic tangent (tanh):
h(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}}
h'(a) = \frac{4}{(e^{a} + e^{-a})^2} = 1 - [h(a)]^2
[Figure: plot of tanh f and its derivative df/dx.]

6 Activation Functions
Rectified Linear Unit (ReLU):
h(a) = \max(0, a), that is, h(a) = a if a > 0 and h(a) = 0 if a \le 0
h'(a) = 1 if a > 0, and h'(a) = 0 if a \le 0
[Figure: plot of the ReLU f and its derivative df/dx.]

7 Activation Functions
Softmax, also known as the normalized exponential function, a generalization of the logistic function to multiple classes:
h(a_k) = \frac{e^{a_k}}{\sum_j e^{a_j}}
\frac{\partial h(a_k)}{\partial a_j} = h(a_k)\,\frac{\partial \log h(a_k)}{\partial a_j} = h(a_k)\,\big(\delta_{kj} - h(a_j)\big)
Why "softmax"? Because the log-sum-exp is a smooth ("soft") approximation of the max:
\max\{x_1, \dots, x_n\} \le \log(e^{x_1} + \dots + e^{x_n}) \le \max\{x_1, \dots, x_n\} + \log n
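
A short plain-Lua sketch (illustrative inputs) computing a numerically stable softmax together with the log-sum-exp that gives the function its name:

-- softmax over a Lua array, shifted by the max for numerical stability
local function softmax(a)
  local m = a[1]
  for i = 2, #a do if a[i] > m then m = a[i] end end
  local s = 0
  for i = 1, #a do s = s + math.exp(a[i] - m) end
  local y = {}
  for i = 1, #a do y[i] = math.exp(a[i] - m) / s end
  return y, m + math.log(s)            -- probabilities and log-sum-exp
end

local a = {1.0, 2.0, 5.0}
local y, lse = softmax(a)
print(y[1], y[2], y[3])   -- ~0.017  0.047  0.936: dominated by the largest input
print(lse)                -- ~5.066: between max(a) = 5 and max(a) + log(3) ~ 6.1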

8 Loss Functions
Regression
Classification

9 Regression: Least Square Error
Let y_{nk} and \bar{y}_{nk} be the output and the target respectively. For regression, the typical loss function is the least square error (LSE):
L(y, \bar{y}) = \frac{1}{2} \sum_n (y_n - \bar{y}_n)^2 = \frac{1}{2} \sum_n \sum_k (y_{nk} - \bar{y}_{nk})^2
where n is the sample index and k is the dimension index. The derivative of the LSE with respect to each dimension of each sample:
\frac{\partial L(y, \bar{y})}{\partial y_{nk}} = y_{nk} - \bar{y}_{nk}
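
A tiny plain-Lua sketch (made-up outputs and targets) of the LSE and its per-dimension gradient:

-- least square error for one sample and its gradient y_k - ybar_k
local y    = {0.2, 0.9, -0.1}    -- network outputs
local ybar = {0.0, 1.0,  0.0}    -- targets
local loss, grad = 0, {}
for k = 1, #y do
  local d = y[k] - ybar[k]
  loss = loss + 0.5 * d * d
  grad[k] = d
end
print(loss)                       -- 0.5 * (0.04 + 0.01 + 0.01) = 0.03
print(grad[1], grad[2], grad[3])  -- 0.2  -0.1  -0.1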

10 Classification: Cross-Entropy (CE)
Suppose there are two (discrete) probability distributions p and q; the "distance" between the two distributions can be measured by the Kullback-Leibler divergence:
D_{KL}(p \| q) = \sum_k p_k \log\frac{p_k}{q_k}
For classification, the targets are given as
\bar{y}_n = \{\bar{y}_{n1}, \dots, \bar{y}_{nk}, \dots, \bar{y}_{nK}\}
The predictions after the softmax layer are posterior probabilities after normalization:
y_n = \{y_{n1}, \dots, y_{nk}, \dots, y_{nK}\}
Now measure the "distance" between these two distributions over all samples:
\sum_n D_{KL}(\bar{y}_n \| y_n) = \sum_n \sum_k \bar{y}_{nk} \log\frac{\bar{y}_{nk}}{y_{nk}} = \sum_n \sum_k \bar{y}_{nk} \log \bar{y}_{nk} - \sum_n \sum_k \bar{y}_{nk} \log y_{nk} = -\sum_n \sum_k \bar{y}_{nk} \log y_{nk} + C

11 Classification: Cross-Entropy (CE)
Cross-entropy as the loss function:
L(\bar{y}, y) = -\sum_n \sum_k \bar{y}_{nk} \log y_{nk}
Typically, targets are given in the form of 1-of-K coding:
\bar{y}_n = \{0, \dots, 1, \dots, 0\}, \quad \text{where } \bar{y}_{nk} = 1 \text{ if } k = k_n \text{ and } 0 \text{ otherwise}
It follows that the cross-entropy has an even simpler form:
L(\bar{y}, y) = -\sum_n \log y_{n k_n}

12 Classification: Cross-Entropy (CE)
The derivative of the composite function of cross-entropy and softmax, with
E_n = -\sum_i \bar{y}_{ni} \log y_{ni}, \qquad y_{ni} = \frac{e^{a_{ni}}}{\sum_j e^{a_{nj}}}
is
\frac{\partial E_n}{\partial a_{nk}} = \sum_i \frac{\partial E_n}{\partial y_{ni}} \frac{\partial y_{ni}}{\partial a_{nk}} = -\sum_i \frac{\bar{y}_{ni}}{y_{ni}} \, y_{ni}\,(\delta_{ik} - y_{nk}) = -\sum_i \bar{y}_{ni}\,(\delta_{ik} - y_{nk}) = -\bar{y}_{nk} + y_{nk} \sum_i \bar{y}_{ni} = y_{nk} - \bar{y}_{nk}
using \sum_i \bar{y}_{ni} = 1.
What is the derivative for an arbitrary differentiable loss function? Left as an exercise.
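
A short plain-Lua check (illustrative numbers) of the result above: the gradient at the pre-activations is simply the softmax output minus the 1-of-K target:

-- error at the output pre-activations for softmax + cross-entropy: y_k - ybar_k
local a    = {1.0, 2.0, 5.0}          -- pre-activations of the output layer
local ybar = {0.0, 0.0, 1.0}          -- 1-of-K target, correct class is k_n = 3

local m = math.max(a[1], a[2], a[3])  -- shift by the max for numerical stability
local s = 0
for k = 1, #a do s = s + math.exp(a[k] - m) end
local y = {}
for k = 1, #a do y[k] = math.exp(a[k] - m) / s end

print(-math.log(y[3]))                           -- cross-entropy E_n, ~0.066
for k = 1, #a do print(k, y[k] - ybar[k]) end    -- dE_n/da_nk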

13 A DNN Example in Torch
Algorithm 1: Definition of a DNN with 3 hidden layers

function create_model()
   -- MODEL
   local n_inputs  = 460
   local n_outputs = 3000
   local n_hidden  = 1024
   local model = nn.Sequential()
   model:add(nn.Linear(n_inputs, n_hidden))
   model:add(nn.Sigmoid())
   model:add(nn.Linear(n_hidden, n_hidden))
   model:add(nn.Sigmoid())
   model:add(nn.Linear(n_hidden, n_hidden))
   model:add(nn.Sigmoid())
   model:add(nn.Linear(n_hidden, n_outputs))
   model:add(nn.LogSoftMax())
   -- LOSS FUNCTION
   local criterion = nn.ClassNLLCriterion()
   return model:cuda(), criterion:cuda()
end
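
A minimal sketch of how this model and criterion might be used for one SGD step (assuming the torch, nn and cunn packages are installed; the minibatch below is random dummy data standing in for real 460-dimensional features and labels in 1..3000):

require 'nn'
require 'cunn'

local model, criterion = create_model()
local x = torch.randn(128, 460):cuda()               -- dummy minibatch of 128 frames
local t = torch.Tensor(128):random(1, 3000):cuda()   -- dummy class labels

model:zeroGradParameters()
local output = model:forward(x)                      -- forward propagation
local loss = criterion:forward(output, t)            -- negative log-likelihood
local gradOutput = criterion:backward(output, t)     -- dL/d(output)
model:backward(x, gradOutput)                        -- back propagation
model:updateParameters(0.01)                         -- SGD step, learning rate 0.01
print(loss)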

14 Neural Networks As Universal Approximators
Universal Approximation Theorem [1][2][3]
Let φ(·) be a nonconstant, bounded, and monotonically-increasing continuous function. Let I_n denote the n-dimensional unit cube [0, 1]^n and C(I_n) the space of continuous functions on I_n. Then, given any function f in C(I_n) and ε > 0, there exist an integer N and real constants a_i, b_i in R and w_i in R^n, i = 1, 2, ..., N, such that one may define
F(x) = \sum_{i=1}^{N} a_i \, φ(w_i^T x + b_i)
as an approximate realization of the function f, where f is independent of φ, such that
|F(x) - f(x)| < ε
for all x in I_n.
[1] Cybenko, G. (1989). "Approximation by Superpositions of a Sigmoidal Function," Math. Control Signals Systems, 2.
[2] Hornik, K., Stinchcombe, M., and White, H. (1989). "Multilayer Feedforward Networks are Universal Approximators," Neural Networks, 2(5).
[3] Hornik, K. (1991). "Approximation Capabilities of Multilayer Feedforward Networks," Neural Networks, 4(2).

15 Neural Networks As Universal Classifiers
Extend f(x) to decision functions of the form [1]
f(x) = j, \quad \text{iff } x \in P_j, \quad j = 1, 2, \dots, K
where the P_j partition A_n into K disjoint measurable subsets and A_n is a compact subset of R^n.
Arbitrary decision regions can be arbitrarily well approximated by continuous feedforward neural networks with only a single internal, hidden layer and any continuous sigmoidal nonlinearity!
[1] Cybenko, G. (1989). "Approximation by Superpositions of a Sigmoidal Function," Math. Control Signals Systems, 2.
[2] Huang, W. Y. and Lippmann, R. P. (1988). "Neural Nets and Traditional Classifiers," in Neural Information Processing Systems (Denver 1987), D. Z. Anderson, Editor. American Institute of Physics, New York.

16 Back Propagation
How do we compute the gradient of the loss function L with respect to a weight w_{ij}?
[Figure: a network fragment with the loss L at the top and weights such as w_{ij} and w_{kj} along paths through the hidden units.]
D. E. Rumelhart, G. E. Hinton and R. J. Williams (1986). "Learning representations by back-propagating errors," Nature, vol. 323.

17 Back Propagation
a_i^{(l)} = \sum_j w_{ij}^{(l)} z_j^{(l-1)}, \qquad z_i^{(l)} = h^{(l)}(a_i^{(l)})
\frac{\partial L}{\partial w_{ij}^{(l)}} = \frac{\partial L}{\partial a_i^{(l)}} \frac{\partial a_i^{(l)}}{\partial w_{ij}^{(l)}}
It follows that
\frac{\partial a_i^{(l)}}{\partial w_{ij}^{(l)}} = z_j^{(l-1)} \qquad \text{and define} \qquad \delta_i^{(l)} \equiv \frac{\partial L}{\partial a_i^{(l)}}
The δ's are often referred to as errors. By the chain rule,
\delta_i^{(l)} = \frac{\partial L}{\partial a_i^{(l)}} = \sum_k \frac{\partial L}{\partial a_k^{(l+1)}} \frac{\partial a_k^{(l+1)}}{\partial a_i^{(l)}} = \sum_k \delta_k^{(l+1)} \frac{\partial a_k^{(l+1)}}{\partial a_i^{(l)}}

18 Back Propagation
a_k^{(l+1)} = \sum_i w_{ki}^{(l+1)} h(a_i^{(l)}), \qquad \text{therefore} \qquad \frac{\partial a_k^{(l+1)}}{\partial a_i^{(l)}} = w_{ki}^{(l+1)} h'(a_i^{(l)})
It follows that
\delta_i^{(l)} = h'(a_i^{(l)}) \sum_k w_{ki}^{(l+1)} \delta_k^{(l+1)}
which tells us that the errors can be evaluated recursively.

19 Back Propagation: An Example
A neural network with multiple layers, a softmax output layer, and the cross-entropy loss function.
Forward propagation: push the input x_n through the network to get the activations of the hidden layers,
a_i^{(l)} = \sum_j w_{ij}^{(l)} z_j^{(l-1)}, \qquad z_i^{(l)} = h^{(l)}(a_i^{(l)})
Back propagation: start from the output layer,
\delta_k^{(L)} = y_{nk} - \bar{y}_{nk}
back-propagate the errors from the output layer all the way down to the input layer,
\delta_i^{(l)} = h'(a_i^{(l)}) \sum_k w_{ki}^{(l+1)} \delta_k^{(l+1)}
and evaluate the gradients:
\frac{\partial L}{\partial w_{ij}^{(l)}} = \delta_i^{(l)} z_j^{(l-1)}
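
A self-contained plain-Lua sketch of these three steps on a tiny made-up network (2 inputs, 3 sigmoid hidden units, 2 softmax outputs; all weights, the input and the target are arbitrary illustrative numbers, and biases are omitted for brevity):

local function sigmoid(a) return 1 / (1 + math.exp(-a)) end

local W1 = {{0.1, -0.2}, {0.4, 0.3}, {-0.5, 0.2}}   -- W1[i][j]: hidden unit i <- input j
local W2 = {{0.7, -0.3, 0.2}, {-0.6, 0.5, 0.1}}     -- W2[k][i]: output unit k <- hidden i
local x, ybar = {1.0, -2.0}, {0.0, 1.0}             -- input and 1-of-K target

-- forward propagation
local a1, z1 = {}, {}
for i = 1, 3 do
  a1[i] = W1[i][1] * x[1] + W1[i][2] * x[2]
  z1[i] = sigmoid(a1[i])
end
local a2, y, s = {}, {}, 0
for k = 1, 2 do
  a2[k] = W2[k][1] * z1[1] + W2[k][2] * z1[2] + W2[k][3] * z1[3]
  s = s + math.exp(a2[k])
end
for k = 1, 2 do y[k] = math.exp(a2[k]) / s end

-- back propagation: output-layer errors for softmax + cross-entropy
local d2 = {}
for k = 1, 2 do d2[k] = y[k] - ybar[k] end

-- propagate errors to the hidden layer: d1_i = h'(a1_i) * sum_k W2[k][i] * d2[k]
local d1 = {}
for i = 1, 3 do
  local back = W2[1][i] * d2[1] + W2[2][i] * d2[2]
  d1[i] = z1[i] * (1 - z1[i]) * back
end

-- gradients: dL/dW2[k][i] = d2[k] * z1[i],   dL/dW1[i][j] = d1[i] * x[j]
for k = 1, 2 do for i = 1, 3 do print("dL/dW2", k, i, d2[k] * z1[i]) end end
for i = 1, 3 do for j = 1, 2 do print("dL/dW1", i, j, d1[i] * x[j]) end end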

20 Gradient Checking
In practice, when you implement the gradient of a network or a module, one way to check the correctness of your implementation is gradient checking with a (symmetric) finite difference:
\frac{\partial L}{\partial w_{ij}} \approx \frac{L(w_{ij} + \epsilon) - L(w_{ij} - \epsilon)}{2\epsilon}
Then check the "closeness" of the two gradients (your own implementation g_1 and the finite-difference estimate g_2):
\frac{\| g_1 - g_2 \|}{\| g_1 \| + \| g_2 \|}
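
A minimal plain-Lua illustration of this check on a one-weight toy model L(w) = 1/2 (sigmoid(w x) - t)^2 (the numbers are made up; the same recipe applies weight by weight to a full network):

local function sigmoid(a) return 1 / (1 + math.exp(-a)) end

local x, t, w = 1.5, 0.2, 0.7
local function loss(w) local y = sigmoid(w * x) return 0.5 * (y - t)^2 end

-- analytic gradient by the chain rule: (y - t) * y * (1 - y) * x
local y  = sigmoid(w * x)
local g1 = (y - t) * y * (1 - y) * x

-- symmetric finite difference
local eps = 1e-5
local g2 = (loss(w + eps) - loss(w - eps)) / (2 * eps)

-- relative "closeness"; should be tiny for a correct implementation
print(g1, g2, math.abs(g1 - g2) / (math.abs(g1) + math.abs(g2)))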

21 Jacobian Matrix
J_{ki} = \frac{\partial y_k}{\partial x_i}
It measures the sensitivity of the outputs with respect to the inputs of a (sub-)network and can be used as a modular operation for error backprop in a larger network.

22 Jacobian Matrix
The Jacobian matrix can be computed using a similar back-propagation procedure:
\frac{\partial y_k}{\partial x_i} = \sum_l \frac{\partial y_k}{\partial a_l} \frac{\partial a_l}{\partial x_i} = \sum_l w_{li} \frac{\partial y_k}{\partial a_l}
where the a_l are the activations having immediate connections with the inputs x_i.
Analogous to the error propagation, define
\delta_{kl} = \frac{\partial y_k}{\partial a_l}
and it can be computed recursively:
\delta_{kl} = \frac{\partial y_k}{\partial a_l} = \sum_j \frac{\partial y_k}{\partial a_j} \frac{\partial a_j}{\partial a_l} = \sum_j \delta_{kj} \left[ w_{jl}\, h'(a_l) \right]
where j sweeps over all the units with connections w_{jl} to unit l.
What is the Jacobian matrix of the softmax outputs with respect to its inputs? Left as an exercise.
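
A plain-Lua sketch (arbitrary weights and input) for a one-layer sigmoid sub-network, where the recursion collapses to dy_k/dx_i = h'(a_k) w_{ki}; the analytic entries are compared against central finite differences:

local function sigmoid(a) return 1 / (1 + math.exp(-a)) end

local W = {{0.3, -0.8}, {0.5, 0.1}, {-0.4, 0.6}}   -- W[k][i]: output k <- input i
local x = {0.7, -1.2}

local function forward(x)
  local y = {}
  for k = 1, #W do
    local a = 0
    for i = 1, #x do a = a + W[k][i] * x[i] end
    y[k] = sigmoid(a)
  end
  return y
end

local y, eps = forward(x), 1e-5
for k = 1, #W do
  for i = 1, #x do
    local analytic = y[k] * (1 - y[k]) * W[k][i]                 -- h'(a_k) * w_ki
    local xp = {x[1], x[2]}; xp[i] = xp[i] + eps
    local xm = {x[1], x[2]}; xm[i] = xm[i] - eps
    local numeric  = (forward(xp)[k] - forward(xm)[k]) / (2 * eps)
    print(k, i, analytic, numeric)                               -- the two should agree
  end
end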

23 Vanishing Gradients

function    nonlinearity                              gradient
sigmoid     x = 1 / (1 + e^{-a})                      dL/da = x(1 - x) dL/dx
tanh        x = (e^{a} - e^{-a}) / (e^{a} + e^{-a})   dL/da = (1 - x^2) dL/dx
softsign    x = a / (1 + |a|)                         dL/da = (1 - |x|)^2 dL/dx

What is the problem? Since |x| \le 1, each of these factors is at most 1, so |dL/da| \le |dL/dx|: the nonlinearity causes vanishing gradients.
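
A quick plain-Lua illustration (made-up pre-activations, weight factors ignored) of how the sigmoid factor x(1 - x) \le 0.25 shrinks a back-propagated signal layer by layer:

local function sigmoid(a) return 1 / (1 + math.exp(-a)) end

local grad = 1.0                                   -- gradient magnitude at the top layer
local preacts = {0.5, -1.0, 2.0, 0.3, -0.7, 1.5, -2.2, 0.9, -0.4, 1.1}
for l, a in ipairs(preacts) do
  local x = sigmoid(a)
  grad = grad * x * (1 - x)                        -- each layer multiplies by at most 0.25
  print(string.format("after layer %2d: %.3e", l, grad))
end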

24 Generalization: Statistical Learning Theory
Statistical learning theory refers to the process of inferring general rules by machines from observed samples. It attempts to answer the following questions about learning:
Which learning tasks can be performed by machines in general?
What kind of assumptions do we have to make so that machine learning can be successful?
What are the key properties a learning algorithm needs to satisfy in order to be successful?
Which performance guarantees can we give on the results of certain learning algorithms?
[1] U. v. Luxburg and B. Scholkopf, "Statistical learning theory: models, concepts, and results," arXiv preprint, v1.

25 Formulation of Supervised Learning
Suppose X is the input space and Y is the output (label) space; learning is to estimate a mapping function f between the two spaces,
f : X \to Y
More specifically, we choose f from a function space F. In order to estimate f, we have access to a set of training samples
(x_1, y_1), \dots, (x_i, y_i), \dots, (x_n, y_n) \in X \times Y
which are independently drawn from the underlying joint distribution p(x, y) on X \times Y.
A loss function l is defined to measure the goodness of f:
l(x, y, f(x))
On top of that, a risk function R(f) is defined to measure the average loss over the underlying data distribution p(x, y):
R(f) = E_p[l(x, y, f(x))] = \int p(x, y)\, l(x, y, f(x))\, dx\, dy
and we pick the minimizer of the risk,
f_F = \arg\min_{f \in F} R(f).

26 Formulation of Supervised Learning
If the function space F includes all functions, then the Bayes risk is the minimal risk we can ever achieve. Assuming we know the underlying distribution p(x, y), we can compute its conditional distribution p(y|x), from which we can then compute the Bayes classifier
f_{Bayes}(x) = \arg\max_{y \in Y} p(y|x).
In the formulation of supervised learning, we make no assumptions on the underlying distribution p(x, y); it can be any distribution on X \times Y. However, we assume p(x, y) is fixed but unknown to us at the time of learning.
In practice, we deal with the empirical risk and pick its minimizer:
R_{emp}(f) = \frac{1}{n} \sum_{i=1}^{n} l(x_i, y_i, f(x_i)), \qquad f_n = \arg\min_{f \in F} R_{emp}(f).

27 The Bias-Variance Trade-off
[Figure: the function space F inside the space of all functions F_all, showing the Bayes classifier f_Bayes, the best classifier in the space f_F, and the learned classifier f_n.]
R(f_n) - R(f_{Bayes}) = \left[ R(f_n) - R(f_F) \right] + \left[ R(f_F) - R(f_{Bayes}) \right]
The first term is the estimation error (variance) and the second is the approximation error (bias).

28 Generalization and Consistency
Let (x_i, y_i) be an infinite sequence of training samples which have been drawn independently from some underlying distribution p(x, y). Let l be a loss function. For each n, let f_n be a classifier constructed by some learning algorithm on the basis of the first n training samples.
1. The learning algorithm is called consistent with respect to F and p if the risk R(f_n) converges in probability to the risk R(f_F) of the best classifier in F, that is, for all ε > 0,
P(R(f_n) - R(f_F) > \epsilon) \to 0 \quad \text{as } n \to \infty
2. The learning algorithm is called Bayes-consistent with respect to F and p if the risk R(f_n) converges in probability to the risk R(f_Bayes) of the Bayes classifier, that is, for all ε > 0,
P(R(f_n) - R(f_{Bayes}) > \epsilon) \to 0 \quad \text{as } n \to \infty
3. The learning algorithm is called universally consistent with respect to F if it is consistent with respect to F for all distributions p.
4. The learning algorithm is called universally Bayes-consistent if it is Bayes-consistent for all distributions p.

29 Empirical Risk Minimization
R_{emp}(f) = \frac{1}{n} \sum_{i=1}^{n} l(x_i, y_i, f(x_i)), \qquad f_n = \arg\min_{f \in F} R_{emp}(f)
As n changes, we are actually dealing with a sequence of classifiers {f_n}. We hope that R(f_n) is consistent with respect to R(f_F):
R(f_n) \to R(f_F) \quad \text{as } n \to \infty
where R(f_F) is the best risk we can achieve given F.

30 An Overfitting Example
Suppose the data space is X = [0, 1], the underlying distribution on X is uniform, and the label y for input x is defined as follows [1]:
y = 1 if x < 0.5, and y = 0 if x \ge 0.5
Obviously, we have R(f_{Bayes}) = 0. Suppose we observe a set of samples (x_i, y_i), i = 1, ..., n, and construct the classifier
f_n(x) = y_i if x = x_i for some i = 1, ..., n, and f_n(x) = 1 otherwise
The constructed classifier f_n perfectly classifies all training samples, which minimizes the empirical risk and drives it to 0. Suppose we draw test samples from the underlying distribution and assume they are not identical to the training samples; then the f_n constructed above simply predicts every sample with label 1, which is wrong on half of the test samples:
1/2 = R(f_n) \nrightarrow R(f_{Bayes}) = 0 \quad \text{as } n \to \infty
The classifier f_n is not consistent. Obviously the classifier does not learn anything from the training samples other than memorizing them.
[1] U. v. Luxburg and B. Scholkopf, "Statistical learning theory: models, concepts, and results," arXiv preprint, v1.
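
A small plain-Lua simulation of this construction (sizes are arbitrary; fresh uniform test draws almost surely differ from the stored training points, so the memorizing classifier answers 1 everywhere on test data):

math.randomseed(0)
local function label(x) if x < 0.5 then return 1 else return 0 end end

-- "training": memorize n samples exactly
local train = {}
for i = 1, 1000 do
  local x = math.random()
  train[x] = label(x)
end

-- the memorizing classifier: recall a stored point, otherwise predict 1
local function f_n(x)
  if train[x] ~= nil then return train[x] else return 1 end
end

-- empirical risk is 0 by construction, but the test risk is about 1/2
local errors = 0
for i = 1, 10000 do
  local x = math.random()
  if f_n(x) ~= label(x) then errors = errors + 1 end
end
print(errors / 10000)   -- close to 0.5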

31 Uniform Convergence of Empirical Risk
For learning consistency with respect to F:
R(f_n) - R(f_F) \le \left[ R(f_n) - R_{emp}(f_n) \right] + \left[ R_{emp}(f_F) - R(f_F) \right]
The Chernoff-Hoeffding inequality:
P\left( \left| \frac{1}{n} \sum_{i=1}^{n} \xi_i - E(\xi) \right| \ge \epsilon \right) \le 2 \exp(-2n\epsilon^2)
It follows that
P\left( |R_{emp}(f) - R(f)| \ge \epsilon \right) \le 2 \exp(-2n\epsilon^2)
which shows that for any fixed function and a sufficiently large number of samples, it is highly probable that the training error provides a good estimate of the test error.
Theorem (Vapnik & Chervonenkis). Uniform convergence
P\left( \sup_{f \in F} |R(f) - R_{emp}(f)| > \epsilon \right) \to 0 \quad \text{as } n \to \infty
for all ε > 0 is a necessary and sufficient condition for consistency of empirical risk minimization with respect to F.
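
A small worked inversion of this bound (with arbitrary choices ε = 0.05 and δ = 0.01, for a single fixed f): requiring 2 exp(-2nε^2) \le δ gives

n \;\ge\; \frac{\log(2/\delta)}{2\epsilon^2} \;=\; \frac{\log 200}{2 \times 0.05^2} \;\approx\; \frac{5.3}{0.005} \;\approx\; 1060,

so on the order of a thousand samples already make the training error of that one classifier a reliable estimate of its test error at this tolerance; taking the sup over all of F is what brings in the capacity term on the next slide.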

32 Capacity of Function Spaces
What kind of property should a function space F have to ensure such uniform convergence?
A larger F gives rise to a larger P(\sup_{f \in F} |R(f) - R_{emp}(f)| > \epsilon), which makes it harder to ensure uniform convergence. This leads to the concept of the capacity of the function space F.
Uniform convergence bound:
P\left( \sup_{f \in F} |R(f) - R_{emp}(f)| > \epsilon \right) \le 2 N(F, 2n) \exp(-n\epsilon^2)
The quantity N(F, n) is referred to as the shattering coefficient of the function class F with respect to sample size n, which is also known as the growth function. It measures the number of ways that the function space can separate the patterns into two classes; that is, it measures the "size" of a function space by counting the effective number of functions given a sample size n.
Theorem (Vapnik & Chervonenkis).
\frac{1}{n} \log N(F, n) \to 0
is a necessary and sufficient condition for consistency of empirical risk minimization on F.

33 Generalization Bounds
Given δ > 0, with probability at least 1 - δ, any function f in F satisfies
R(f) \le R_{emp}(f) + \sqrt{ \frac{1}{n} \left( \log(2 N(F, n)) - \log \delta \right) }
Or, more generally,
R(f) \le R_{emp}(f) + \sqrt{ \frac{C + \log\frac{1}{\delta}}{n} }
holds with probability at least 1 - δ, where C is a constant representing the complexity of the function space F.
Generalization bounds are typically written in the following form, for any f in F:
R(f) \le R_{emp}(f) + \text{capacity}(F) + \text{confidence}(\delta)
Other common capacity measures: the VC dimension and Rademacher complexity.
