Eight more Classic Machine Learning algorithms


Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials. Comments and corrections gratefully received.

Eight more Classic Machine Learning algorithms
Andrew W. Moore, Associate Professor, School of Computer Science, Carnegie Mellon University, awm@cs.cmu.edu
Copyright 2001, Andrew W. Moore

8: Polynomial Regression
So far we've mainly been dealing with linear regression. Given records with input vectors x_k = (x_k1, x_k2) and outputs y_k, stack the inputs into a matrix X and the outputs into a vector y, and prepend a constant 1 to each input to get z_k = (1, x_k1, x_k2), stacked as the matrix Z. Then b = (Z^T Z)^{-1} (Z^T y) and the prediction is y_est = β_0 + β_1 x_1 + β_2 x_2. [The slide works through a small table of X_1, X_2, Y values.]

Quadratic Regression
It's trivial to do linear fits of fixed nonlinear basis functions. Keep the same data, but build each row of Z from the quadratic expansion z = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2). As before b = (Z^T Z)^{-1} (Z^T y), and now y_est = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_1^2 + β_4 x_1 x_2 + β_5 x_2^2.

Quadratic Regression (continued)
Each component of a z vector is called a term. Each column of the Z matrix is called a term column. How many terms are there in a quadratic regression with m inputs?
- 1 constant term
- m linear terms
- (m+1)-choose-2 = m(m+1)/2 quadratic terms
- (m+2)-choose-2 terms in total, which is O(m^2)
Note that solving b = (Z^T Z)^{-1} (Z^T y) is thus O(m^6).
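As a concrete sketch of this recipe (not from the slides; NumPy, with made-up toy data and illustrative names):

```python
import numpy as np

def quadratic_design_matrix(X):
    """Expand each row (x1, x2) into the terms (1, x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 ** 2, x1 * x2, x2 ** 2])

# Made-up toy data: y is roughly quadratic in two inputs, plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))
y = 3 + 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] ** 2 + rng.normal(0, 0.1, size=50)

Z = quadratic_design_matrix(X)
# b = (Z^T Z)^{-1} (Z^T y); lstsq solves the same least-squares problem more stably
# than forming the inverse explicitly.
b, *_ = np.linalg.lstsq(Z, y, rcond=None)
y_est = Z @ b   # fitted values on the training inputs
```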

Q-th degree Polynomial Regression
Same trick again: each row of Z is z = (all products of powers of the inputs in which the sum of the powers is Q or less), b = (Z^T Z)^{-1} (Z^T y), and y_est = β_0 + β_1 x_1 + ...

m inputs, degree Q: how many terms?
= the number of unique terms of the form x_1^{q_1} x_2^{q_2} ... x_m^{q_m} where Σ_{i=1}^m q_i ≤ Q
= the number of unique terms of the form x_0^{q_0} x_1^{q_1} ... x_m^{q_m} where Σ_{i=0}^m q_i = Q
= the number of lists of non-negative integers [q_0, q_1, q_2, ..., q_m] in which Σ q_i = Q
= the number of ways of placing Q red disks on a row of squares of length Q+m
= (Q+m)-choose-Q
[The slide illustrates the disk-placement argument with one concrete list of exponents for m = 4.]
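A quick check of the term count above, using Python's math.comb (the example calls are illustrative):

```python
from math import comb

def num_polynomial_terms(m, Q):
    """Number of terms in an m-input, degree-Q polynomial regression: (Q+m) choose Q."""
    return comb(Q + m, Q)

print(num_polynomial_terms(2, 2))   # 6: the quadratic terms 1, x1, x2, x1^2, x1*x2, x2^2
print(num_polynomial_terms(4, 3))   # 35
```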

7: Radial Basis Functions (RBFs)
Same recipe once more: each row of Z is z = (list of radial basis function evaluations of the input), b = (Z^T Z)^{-1} (Z^T y), and y_est = β_0 + β_1 φ_1(x) + ...

1-d RBFs
With basis-function centers c_1, c_2, c_3 placed along the input axis,
y_est = β_1 φ_1(x) + β_2 φ_2(x) + β_3 φ_3(x), where φ_i(x) = KernelFunction(|x - c_i| / KW).
[The slide shows a 1-d dataset with the three bumps φ_1, φ_2, φ_3 drawn beneath it.]

Example
A particular fitted combination of the three basis functions, e.g. y_est = β_1 φ_1(x) + β_2 φ_2(x) + 0.5 φ_3(x), plotted over the data, where φ_i(x) = KernelFunction(|x - c_i| / KW).

RBFs with Linear Regression
y_est = β_1 φ_1(x) + β_2 φ_2(x) + β_3 φ_3(x), where φ_i(x) = KernelFunction(|x - c_i| / KW).
All the c_i's are held constant (initialized randomly or on a grid in m-dimensional input space). KW is also held constant (initialized to be large enough that there's decent overlap between basis functions; usually much better than the crappy overlap on my diagram).

RBFs with Linear Regression (continued)
Given Q basis functions, define the matrix Z such that Z_kj = KernelFunction(|x_k - c_j| / KW), where x_k is the k-th vector of inputs. And as before, b = (Z^T Z)^{-1} (Z^T y).
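A sketch of RBFs with linear regression in NumPy, assuming a Gaussian kernel, centers on a fixed grid, and a fixed KW (the kernel choice and the data are illustrative assumptions, not from the slides):

```python
import numpy as np

def rbf_design_matrix(X, centers, kw):
    """Z[k, j] = KernelFunction(|x_k - c_j| / KW); here the kernel is a Gaussian bump."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-0.5 * (dists / kw) ** 2)

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(80, 1))                 # toy 1-d inputs
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=80)

centers = np.linspace(0, 10, 8)[:, None]             # c_j held fixed on a grid
kw = 2.0                                             # KW held fixed, wide enough to overlap
Z = rbf_design_matrix(X, centers, kw)
Z = np.column_stack([np.ones(len(X)), Z])            # include a constant term as well
b, *_ = np.linalg.lstsq(Z, y, rcond=None)            # b = (Z^T Z)^{-1} (Z^T y)
y_est = Z @ b
```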

RBFs with Nonlinear Regression
Now allow the c_i's to adapt to the data (initialized randomly or on a grid in m-dimensional input space), and allow KW to adapt to the data as well. (Some folks even let each basis function have its own KW_j, permitting fine detail in dense regions of input space.) But how do we now find all the β_j's, c_i's and KW?
Answer: Gradient Descent.
(But I'd like to see, or hope someone's already done, a hybrid, where the c_i's and KW are updated with gradient descent while the β_j's use matrix inversion.)

Radial Basis Functions in 2-d
Two inputs. Outputs (heights sticking out of the page) not shown. [Figure labels: a center, and the sphere of significant influence of that center.]

Happy RBFs in 2-d
Blue dots denote coordinates of input vectors. [Here the centers and their spheres of influence cover the data well.]

Crabby RBFs in 2-d
Blue dots denote coordinates of input vectors. What's the problem in this example?

More crabby RBFs
And what's the problem in this example?

Hopeless!
Even before seeing the data, you should understand that this is a disaster!

Unhappy
Even before seeing the data, you should understand that this isn't good either.

6: Robust Regression
[A 1-d dataset containing a few gross outliers.]

Robust Regression
This is the best fit that Quadratic Regression can manage...

Robust Regression
...but this is what we'd probably prefer.

LOESS-based Robust Regression
After the initial fit, score each datapoint according to how well it's fitted: "You are a very good datapoint." "You are not too shabby." "But you are pathetic."

Robust Regression
For k = 1 to R:
- Let (x_k, y_k) be the k-th datapoint.
- Let y_k^est be the predicted value of y_k.
- Let w_k be a weight for datapoint k that is large if the datapoint fits well and small if it fits badly: w_k = KernelFn([y_k - y_k^est]^2).
Then redo the regression using weighted datapoints. (I taught you how to do this in the Instance-based lecture; only then the weights depended on distance in input-space.)
Guess what happens next?

Repeat the whole thing until it converges!
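A sketch of this reweight-and-refit loop in NumPy, assuming a quadratic fit and a Gaussian kernel on the residuals whose width is set from the median absolute residual (both are assumptions, not prescribed by the slides):

```python
import numpy as np

def fit_weighted(Z, y, w):
    """Weighted least squares: minimize sum_k w_k (y_k - z_k . b)^2."""
    sw = np.sqrt(w)
    b, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    return b

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=100)
y = 1 + 2 * x - 0.5 * x ** 2 + rng.normal(0, 0.2, size=100)
y[:5] += 20                                  # a few gross outliers

Z = np.column_stack([np.ones_like(x), x, x ** 2])
w = np.ones_like(y)                          # the first fit is an ordinary unweighted regression
for _ in range(10):                          # repeat the whole thing until (roughly) converged
    b = fit_weighted(Z, y, w)
    resid = y - Z @ b
    scale = 2 * np.median(np.abs(resid)) + 1e-12
    w = np.exp(-0.5 * (resid / scale) ** 2)  # w_k = KernelFn([y_k - y_k^est]^2): small for bad fits
```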

Robust Regression: what we're doing
What regular regression does: assume y_k was originally generated using the following recipe: y_k = β_0 + β_1 x_k + β_2 x_k^2 + N(0, σ^2). The computational task is to find the Maximum Likelihood β_0, β_1 and β_2.

What LOESS robust regression does: assume y_k was originally generated using the following recipe: with probability p, y_k = β_0 + β_1 x_k + β_2 x_k^2 + N(0, σ^2); but otherwise y_k ~ N(µ, σ_huge^2). The computational task is to find the Maximum Likelihood β_0, β_1, β_2, p, µ and σ_huge.
Mysteriously, the reweighting procedure does this computation for us. Your first glimpse of two spectacular letters: E.M.

5: Regression Trees
Decision trees for regression.

A regression tree leaf
Predict age = 47 (the mean age of records matching this leaf node).

A one-split regression tree
Split on Gender: the Female leaf predicts the mean age of the matching female records, and the Male leaf predicts the mean age of the matching male records.

Choosing the attribute to split on
[A small table of example records with attributes Gender, Rich?, Num. Children, Num. Beanie Babies and output Age.]
We can't use information gain. What should we use?

Choosing the attribute to split on
MSE(Y | X_i) = the expected squared error if we must predict a record's Y value given only knowledge of the record's X_i value. If we're told x_i = j, the smallest expected error comes from predicting the mean of the Y-values among those records in which x_i = j. Call this mean quantity µ_{x_i = j}. Then

MSE(Y | X_i) = (1/R) Σ_{j=1}^{N_{X_i}} Σ_{k such that x_ki = j} (y_k - µ_{x_i = j})^2.

Regression tree attribute selection: greedily choose the attribute that minimizes MSE(Y | X). Guess what we do about real-valued inputs? Guess how we prevent overfitting?
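A sketch of scoring candidate split attributes by MSE(Y | X), assuming categorical attributes held in NumPy arrays (the toy records are made up for illustration):

```python
import numpy as np

def mse_given_attribute(x, y):
    """MSE(Y | X): squared error of predicting each record's y by the mean y
    among records that share its x value, averaged over all R records."""
    total = 0.0
    for value in np.unique(x):
        ys = y[x == value]
        total += np.sum((ys - ys.mean()) ** 2)
    return total / len(y)

# Made-up records; pick the attribute whose split minimizes MSE(Y | X).
gender = np.array(["F", "M", "M", "F", "M"])
rich = np.array(["N", "N", "Y", "Y", "N"])
age = np.array([38.0, 24.0, 72.0, 31.0, 25.0])

scores = {"Gender": mse_given_attribute(gender, age),
          "Rich?": mse_given_attribute(rich, age)}
best = min(scores, key=scores.get)           # greedy choice of split attribute
```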

Pruning Decision
Suppose the node "property-owner = Yes" has been split on Gender, giving a Female leaf and a Male leaf, each predicting the mean age of its matching property-owning records. Does the split deserve to live? Compare the two populations (their counts, mean ages and age standard deviations): use a standard Chi-squared test of the null hypothesis that these two populations have the same mean, and Bob's your uncle.

Linear Regression Trees
Also known as Model Trees. Leaves contain linear functions rather than constants, e.g. "Predict age = β_0 + β_1 · NumChildren + β_2 · YearsEducation", trained using linear regression on all records matching that leaf. The split attribute is chosen to minimize the MSE of the regressed children. Pruning uses a different Chi-squared test.

Linear Regression Trees (detail)
During the leaf regressions you typically ignore any categorical attribute that has been tested higher up in the tree, but use all untested attributes, and use real-valued attributes even if they've been tested above.

Test your understanding
Assuming regular regression trees, can you sketch a graph of the fitted function y_est(x) over this diagram?

Test your understanding
Assuming linear regression trees, can you sketch a graph of the fitted function y_est(x) over this diagram?

4: GMDH (c.f. BACON, AIM)
GMDH = Group Method Data Handling. A very simple but very good idea:
1. Do linear regression.
2. Use cross-validation to discover whether any quadratic term is good. If so, add it as a basis function and loop.
3. Use cross-validation to discover whether any of a set of familiar functions (log, exp, sin etc.) applied to any previous basis function helps. If so, add it as a basis function and loop.
4. Else stop.
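A rough sketch of this grow-the-basis loop, assuming NumPy, k-fold cross-validated squared error, and a small set of candidate transforms; real GMDH implementations differ in many details, and all names and thresholds here are illustrative:

```python
import numpy as np

def cv_mse(Z, y, folds=5):
    """K-fold cross-validated squared error of a linear fit on basis matrix Z."""
    idx = np.arange(len(y))
    err = 0.0
    for f in range(folds):
        test = (idx % folds) == f
        b, *_ = np.linalg.lstsq(Z[~test], y[~test], rcond=None)
        err += np.sum((y[test] - Z[test] @ b) ** 2)
    return err / len(y)

rng = np.random.default_rng(3)
X = rng.uniform(0.1, 3, size=(120, 2))
y = X[:, 0] ** 2 + np.log(X[:, 1]) + rng.normal(0, 0.05, size=120)

basis = [np.ones(len(y)), X[:, 0], X[:, 1]]        # step 1: plain linear regression
best = cv_mse(np.column_stack(basis), y)
improved = True
while improved:
    improved = False
    candidates = []
    for i in range(1, len(basis)):
        for j in range(i, len(basis)):
            candidates.append(basis[i] * basis[j])  # step 2: candidate quadratic terms
        for f in (np.log, np.exp, np.sin):          # step 3: familiar functions of old terms
            if f is np.log and np.any(basis[i] <= 0):
                continue
            if f is np.exp and np.max(np.abs(basis[i])) > 20:
                continue
            candidates.append(f(basis[i]))
    for cand in candidates:
        score = cv_mse(np.column_stack(basis + [cand]), y)
        if score < 0.99 * best:                     # keep it only if CV clearly improves
            basis.append(cand)
            best = score
            improved = True
            break
```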

GMDH (c.f. BACON, AIM)
A typical learned function has the form: age_est = height − c_1 · sqrt(weight) + c_2 · income / cos(NumCars), for some fitted constants c_1, c_2.

Cascade Correlation
A super-fast form of Neural Network learning:
- Begins with 0 hidden units.
- Incrementally adds hidden units, one by one, doing ingeniously little recomputation between each unit addition.

Cascade beginning
Begin with a regular dataset. Non-standard notation: X^(i) is the i-th attribute; x_k^(i) is the value of the i-th attribute in the k-th record. The data form columns X^(0), X^(1), ..., X^(m) and Y.

Cascade first step
Find weights w_i^(0) to best fit Y, i.e. to minimize
Σ_{k=1}^R (y_k − y_k^(0))^2, where y_k^(0) = Σ_j w_j^(0) x_k^(j).

Consider our errors
Define e_k^(0) = y_k − y_k^(0). This adds columns Y^(0) and E^(0) alongside the original data.

Create a hidden unit
Find weights u_i^(0) to define a new basis function H^(0)(x) of the inputs. Make it specialize in predicting the errors in our original fit: find {u_i^(0)} to maximize the correlation between H^(0)(x_k) and E^(0), where
H^(0)(x) = g(Σ_j u_j^(0) x^(j)).
This adds a column H^(0) with values h_k^(0).

Cascade next step
Find weights w_i^(1) and p_0^(1) to better fit Y, i.e. to minimize
Σ_{k=1}^R (y_k − y_k^(1))^2, where y_k^(1) = Σ_j w_j^(1) x_k^(j) + p_0^(1) h_k^(0).

Now look at the new errors
Define e_k^(1) = y_k − y_k^(1), giving columns Y^(1) and E^(1).

Create the next hidden unit
Find weights u_i^(1) and v_0^(1) to define a new basis function H^(1)(x) of the inputs. Make it specialize in predicting the errors in our previous fit: find {u_i^(1), v_0^(1)} to maximize the correlation between H^(1)(x_k) and E^(1), where
H^(1)(x) = g(Σ_j u_j^(1) x^(j) + v_0^(1) h^(0)).

Cascade n-th step
Find weights w_i^(n) and p_j^(n) to better fit Y, i.e. to minimize
Σ_{k=1}^R (y_k − y_k^(n))^2, where y_k^(n) = Σ_{j=1}^m w_j^(n) x_k^(j) + Σ_j p_j^(n) h_k^(j),
the second sum running over the hidden units built so far.

Now look at the new errors
Define e_k^(n) = y_k − y_k^(n).

Create the n-th hidden unit
Find weights u_i^(n) and v_j^(n) to define a new basis function H^(n)(x) of the inputs. Make it specialize in predicting the errors in our previous fit: find {u_i^(n), v_j^(n)} to maximize the correlation between H^(n)(x_k) and E^(n), where
H^(n)(x) = g(Σ_{j=1}^m u_j^(n) x^(j) + Σ_j v_j^(n) h^(j)),
again with the second sum over the previously built hidden units.
Continue until satisfied with the fit.
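A rough sketch of the cascade idea, assuming NumPy, tanh hidden units, and plain gradient ascent on the covariance between a candidate unit and the current errors (a stand-in for the correlation objective); real Cascade-Correlation trains a pool of candidates with Quickprop, so this is only a schematic:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.sin(X[:, 0]) * X[:, 1] + rng.normal(0, 0.05, size=200)

def refit_output(Z, y):
    """Output weights: plain least squares on the current columns (inputs + frozen hidden units)."""
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return w

Z = np.column_stack([np.ones(len(y)), X])        # start with just a bias and the raw inputs
for n in range(3):                               # add hidden units one by one
    w = refit_output(Z, y)
    e = y - Z @ w                                # current errors E^(n)
    u = rng.normal(0, 0.1, size=Z.shape[1])      # candidate unit sees inputs and earlier hidden units
    for _ in range(500):                         # gradient ascent on |cov(H^(n), E^(n))|
        h = np.tanh(Z @ u)
        cov = np.dot(h - h.mean(), e - e.mean())
        grad = ((e - e.mean()) * np.sign(cov) * (1 - h ** 2)) @ Z
        u += 0.01 * grad / len(y)
    Z = np.column_stack([Z, np.tanh(Z @ u)])     # freeze the unit; its output is a new input column
w = refit_output(Z, y)                           # final output-layer fit
y_est = Z @ w
```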

Visualizing the first iteration
Visualizing the second iteration
Visualizing the third iteration
[Figures showing the fit after each iteration.]

Example: Cascade Correlation for Classification

Training two spirals
[Figures showing the learned decision regions on the two-spirals problem at earlier and later training steps.]

If you like Cascade Correlation, see also:
- Projection Pursuit, in which you add together many non-linear non-parametric scalar functions of carefully chosen directions. Each direction is chosen to maximize the error-reduction from the best scalar function.
- AdaBoost: an additive combination of regression trees in which the (n+1)-th tree learns the error of the n-th tree.

Multilinear Interpolation
Consider this dataset. Suppose we wanted to create a continuous and piecewise linear fit to the data.

Multilinear Interpolation
Create a set of knot points: selected X-coordinates (usually equally spaced) that cover the data, q_1, q_2, q_3, q_4, q_5.

Multilinear Interpolation
We are going to assume the data was generated by a noisy version of a function that can only bend at the knots. [The slide shows several such piecewise-linear functions, none of which fits the data well.]

How to find the best fit?
Idea: simply perform a separate regression in each segment, for each part of the curve. What's the problem with this idea? (The pieces wouldn't join up at the knots, so the fit wouldn't be continuous.)

How to find the best fit?
Let's look at what goes on in the red segment between q_2 and q_3, with heights h_2 and h_3 at those knots:
y_est(x) = h_2 (q_3 − x)/w + h_3 (x − q_2)/w, where w = q_3 − q_2.

Equivalently, in the red segment,
y_est(x) = h_2 f_2(x) + h_3 f_3(x), where f_2(x) = 1 − |x − q_2|/w and f_3(x) = 1 − |x − q_3|/w.
[The slide builds up plots of the triangular basis functions φ_2(x) and φ_3(x) centered at q_2 and q_3.]

How to find the best fit?
In general,
y_est(x) = Σ_{i=1}^{N_K} h_i f_i(x), where f_i(x) = 1 − |x − q_i|/w if |x − q_i| < w, and 0 otherwise.
And this is simply a basis function regression problem! We know how to find the least-squares h_i's.
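A sketch of the 1-d case in NumPy, with equally spaced knots and the triangular f_i defined above (the data and knot count are illustrative):

```python
import numpy as np

def hat_basis(x, knots):
    """f_i(x) = 1 - |x - q_i| / w where that is positive, and 0 otherwise."""
    w = knots[1] - knots[0]                      # equal knot spacing
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / w, 0.0, None)

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 10, size=120))
y = np.sin(x) + 0.3 * x + rng.normal(0, 0.1, size=120)

knots = np.linspace(0, 10, 6)                    # q_1 ... q_6 covering the data
F = hat_basis(x, knots)
h, *_ = np.linalg.lstsq(F, y, rcond=None)        # least-squares knot heights h_i
y_est = F @ h                                    # continuous, piecewise-linear fit
```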

In two dimensions
Blue dots show locations of input vectors (outputs not depicted).

In two dimensions
Each purple dot is a knot point. It will contain the height of the estimated surface. But how do we do the interpolation to ensure that the surface is continuous?

In two dimensions
To predict the value at a query point: first interpolate its value on two opposite edges of the cell containing it, then interpolate between those two values.
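A sketch of that two-stage interpolation for a single query point, assuming NumPy and an already-regressed grid of knot heights (all names and values are illustrative):

```python
import numpy as np

def predict_bilinear(x1, x2, q1, q2, H):
    """q1, q2: knot coordinates along each axis; H[i, j]: fitted height at knot (q1[i], q2[j])."""
    i = int(np.clip(np.searchsorted(q1, x1) - 1, 0, len(q1) - 2))
    j = int(np.clip(np.searchsorted(q2, x2) - 1, 0, len(q2) - 2))
    t1 = (x1 - q1[i]) / (q1[i + 1] - q1[i])              # position within the cell along axis 1
    t2 = (x2 - q2[j]) / (q2[j + 1] - q2[j])              # position within the cell along axis 2
    lo = (1 - t1) * H[i, j] + t1 * H[i + 1, j]           # interpolate along one edge of the cell
    hi = (1 - t1) * H[i, j + 1] + t1 * H[i + 1, j + 1]   # interpolate along the opposite edge
    return (1 - t2) * lo + t2 * hi                       # then interpolate between those two values

q1 = np.linspace(0.0, 1.0, 4)
q2 = np.linspace(0.0, 1.0, 4)
H = np.arange(16.0).reshape(4, 4)                        # stand-in for regressed knot heights
print(predict_bilinear(0.5, 0.25, q1, q2, H))
```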

In two dimensions: notes
1. This can easily be generalized to m dimensions.
2. It should be easy to see that it ensures continuity.
3. The patches are not linear.

Doing the regression
Given data, how do we find the optimal knot heights? Happily, it's simply a two-dimensional basis function problem. (Working out the basis functions is tedious, unilluminating, and easy.) What's the problem in higher dimensions?

MARS: Multivariate Adaptive Regression Splines
Invented by Jerry Friedman (one of Andrew's heroes). Simplest version: let's assume the function we are learning is of the form
y_est(x) = Σ_{k=1}^m g_k(x_k).
Instead of a linear combination of the inputs, it's a linear combination of non-linear functions of individual inputs.

MARS
Idea: each g_k is a piecewise-linear function that can only bend at a set of knots q_1, ..., q_{N_K} in its own dimension. Concretely,
y_est(x) = Σ_{k=1}^m Σ_{j=1}^{N_K} h_j^k f_j^k(x_k), where f_j^k(x) = 1 − |x − q_j^k|/w_k if |x − q_j^k| < w_k, and 0 otherwise.
Here q_j^k is the location of the j-th knot in the k-th dimension, h_j^k is the regressed height of the j-th knot in the k-th dimension, and w_k is the spacing between knots in the k-th dimension.

That's not complicated enough!
Okay, now let's get serious. We'll allow arbitrary two-way interactions:
y_est(x) = Σ_{k=1}^m g_k(x_k) + Σ_{k=1}^m Σ_{t=k+1}^m g_{kt}(x_k, x_t).
The function we're learning is allowed to be a sum of non-linear functions over all one-d and two-d subsets of attributes. It can still be expressed as a linear combination of basis functions, and is thus learnable by linear regression.
Full MARS: uses cross-validation to choose a subset of subspaces, the knot resolution and other parameters.
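A sketch of the simplest (purely additive) version above, assuming NumPy, equally spaced knots in every dimension, and a joint least-squares fit of all the knot heights (data and knot placement are illustrative):

```python
import numpy as np

def hat_basis(x, knots):
    """Triangular basis f_j(x) = 1 - |x - q_j| / w where positive, 0 otherwise."""
    w = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / w, 0.0, None)

rng = np.random.default_rng(6)
X = rng.uniform(0, 10, size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * (X[:, 1] - 5) ** 2 + rng.normal(0, 0.1, size=200)

# One set of knots per input dimension; stack every dimension's basis into one design matrix,
# so all the knot heights h_j^k are fitted jointly by a single linear regression.
knots = [np.linspace(0, 10, 7), np.linspace(0, 10, 7)]
Z = np.column_stack([hat_basis(X[:, k], knots[k]) for k in range(X.shape[1])])
h, *_ = np.linalg.lstsq(Z, y, rcond=None)
y_est = Z @ h                                    # y_est(x) = sum_k g_k(x_k)
```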

If you like MARS...
...see also CMAC (Cerebellar Model Articulation Controller) by James Albus (another of Andrew's heroes). Many of the same gut-level intuitions, but entirely in a neural-network, biologically plausible way. (All the low-dimensional functions are by means of lookup tables, trained with a delta-rule and using a clever blurred update and hash-tables.)

Where are we now?
- Inputs → Inference Engine → learn P(E_1 | E_2): Joint DE, Bayes Net Structure Learning.
- Inputs → Classifier → predict category: Dec Tree, Sigmoid Perceptron, Sigmoid N.Net, Gauss/Joint BC, Gauss Naïve BC, N.Neigh, Bayes Net Based BC, Cascade Correlation.
- Inputs → Density Estimator → probability: Joint DE, Naïve DE, Gauss/Joint DE, Gauss Naïve DE, Bayes Net Structure Learning.
- Inputs → Regressor → predict real no.: Linear Regression, Polynomial Regression, Perceptron, Neural Net, N.Neigh, Kernel, LWR, RBFs, Robust Regression, Cascade Correlation, Regression Trees, GMDH, Multilinear Interp, MARS.

What You Should Know
For each of the eight methods you should be able to summarize briefly what they do and outline how they work. You should understand them well enough that, given access to the notes, you can quickly re-understand them at a moment's notice. But you don't have to memorize all the details. In the right context any one of these eight might end up being really useful to you one day! You should be able to recognize this when it occurs.

Citations
Radial Basis Functions: T. Poggio and F. Girosi, "Regularization Algorithms for Learning That Are Equivalent to Multilayer Networks", Science, 247, 978-982, 1990.
LOESS: W. S. Cleveland, "Robust Locally Weighted Regression and Smoothing Scatterplots", Journal of the American Statistical Association, 74(368), 829-836, December 1979.
GMDH etc.: P. Langley, G. L. Bradshaw and H. A. Simon, "Rediscovering Chemistry with the BACON System", in Machine Learning: An Artificial Intelligence Approach, R. S. Michalski, J. G. Carbonell and T. M. Mitchell (eds.), Morgan Kaufmann, 1983.
Regression Trees etc.: L. Breiman, J. H. Friedman, R. A. Olshen and C. J. Stone, Classification and Regression Trees, Wadsworth, 1984. J. R. Quinlan, "Combining Instance-Based and Model-Based Learning", Machine Learning: Proceedings of the Tenth International Conference, 1993.
Cascade Correlation etc.: S. E. Fahlman and C. Lebiere, "The Cascade-Correlation Learning Architecture", Technical Report CMU-CS-90-100, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 1990. http://citeseer.nj.nec.com/fahlman9cascadecorrelation.html. J. H. Friedman and W. Stuetzle, "Projection Pursuit Regression", Journal of the American Statistical Association, 76(376), December 1981.
MARS: J. H. Friedman, "Multivariate Adaptive Regression Splines", Department of Statistics, Stanford University, Technical Report No. 102, 1988.


More information

Gaussian Process Vine Copulas for Multivariate Dependence

Gaussian Process Vine Copulas for Multivariate Dependence Gaussian Process Vine Copulas for Multivariate Dependence José Miguel Hernández-Lobato 1,2 joint work with David López-Paz 2,3 and Zoubin Ghahramani 1 1 Department of Engineering, Cambridge University,

More information

Neural Networks and Deep Learning

Neural Networks and Deep Learning Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost

More information

f(x) = 2x 2 + 2x - 4

f(x) = 2x 2 + 2x - 4 4-1 Graphing Quadratic Functions What You ll Learn Scan the tet under the Now heading. List two things ou will learn about in the lesson. 1. Active Vocabular 2. New Vocabular Label each bo with the terms

More information

INF Introduction to classifiction Anne Solberg

INF Introduction to classifiction Anne Solberg INF 4300 8.09.17 Introduction to classifiction Anne Solberg anne@ifi.uio.no Introduction to classification Based on handout from Pattern Recognition b Theodoridis, available after the lecture INF 4300

More information

15. Eigenvalues, Eigenvectors

15. Eigenvalues, Eigenvectors 5 Eigenvalues, Eigenvectors Matri of a Linear Transformation Consider a linear ( transformation ) L : a b R 2 R 2 Suppose we know that L and L Then c d because of linearit, we can determine what L does

More information

Neural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35

Neural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35 Neural Networks David Rosenberg New York University July 26, 2017 David Rosenberg (New York University) DS-GA 1003 July 26, 2017 1 / 35 Neural Networks Overview Objectives What are neural networks? How

More information

Section 7.3: Parabolas, from College Algebra: Corrected Edition by Carl Stitz, Ph.D. and Jeff Zeager, Ph.D. is available under a Creative Commons

Section 7.3: Parabolas, from College Algebra: Corrected Edition by Carl Stitz, Ph.D. and Jeff Zeager, Ph.D. is available under a Creative Commons Section 7.: Parabolas, from College Algebra: Corrected Edition b Carl Stitz, Ph.D. and Jeff Zeager, Ph.D. is available under a Creative Commons Attribution-NonCommercial-ShareAlike.0 license. 0, Carl Stitz.

More information

Classification Using Decision Trees

Classification Using Decision Trees Classification Using Decision Trees 1. Introduction Data mining term is mainly used for the specific set of six activities namely Classification, Estimation, Prediction, Affinity grouping or Association

More information

1 What a Neural Network Computes

1 What a Neural Network Computes Neural Networks 1 What a Neural Network Computes To begin with, we will discuss fully connected feed-forward neural networks, also known as multilayer perceptrons. A feedforward neural network consists

More information

Announcements. CS 188: Artificial Intelligence Spring Classification. Today. Classification overview. Case-Based Reasoning

Announcements. CS 188: Artificial Intelligence Spring Classification. Today. Classification overview. Case-Based Reasoning CS 188: Artificial Intelligence Spring 21 Lecture 22: Nearest Neighbors, Kernels 4/18/211 Pieter Abbeel UC Berkeley Slides adapted from Dan Klein Announcements On-going: contest (optional and FUN!) Remaining

More information

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17 3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural

More information