Eight more Classic Machine Learning algorithms
Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http:// Comments and corrections gratefully received.

Eight more Classic Machine Learning algorithms
Andrew W. Moore
Associate Professor
School of Computer Science, Carnegie Mellon University
awm@cs.cmu.edu
Copyright © 2001, Andrew W. Moore (Machine Learning Favorites). Oct 9th, 2001.

8: Polynomial Regression

So far we've mainly been dealing with linear regression: collect the inputs into a matrix Z whose k-th row is z_k = (1, x_k1, x_k2), solve b = (Z^T Z)^-1 (Z^T Y), and predict y_est = β0 + β1 x1 + β2 x2. [The slide's worked numeric table is garbled in this transcription.]
Quadratic Regression

It's trivial to do linear fits of fixed nonlinear basis functions. For two inputs, use z = (1, x1, x2, x1^2, x1 x2, x2^2), so that b = (Z^T Z)^-1 (Z^T Y) and
y_est = β0 + β1 x1 + β2 x2 + β3 x1^2 + β4 x1 x2 + β5 x2^2

Each component of a z vector is called a term. Each column of the Z matrix is called a term column. How many terms in a quadratic regression with m inputs?
- 1 constant term
- m linear terms
- (m+1)-choose-2 = m(m+1)/2 quadratic terms
- (m+2)-choose-2 terms in total = O(m^2)
Note that solving b = (Z^T Z)^-1 (Z^T Y) is thus O(m^6).
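The quadratic fit above can be sketched with NumPy (a minimal sketch for the two-input basis on the slide; `lstsq` is used instead of forming (Z^T Z)^-1 explicitly, for numerical stability):

```python
import numpy as np

def quadratic_basis(X):
    """Expand each input row (x1, x2) into the slide's quadratic term vector
    z = (1, x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])

def fit(X, y):
    # b = (Z^T Z)^-1 Z^T Y, solved via least squares
    Z = quadratic_basis(X)
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return b

# Sanity check: with noiseless quadratic data, the fit recovers the coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
true_b = np.array([1.0, 2.0, -1.0, 0.5, 3.0, -2.0])
y = quadratic_basis(X) @ true_b
b = fit(X, y)
print(np.allclose(b, true_b, atol=1e-6))
```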
Q-th degree Polynomial Regression

z = (all products of powers of inputs in which the sum of powers is Q or less), b = (Z^T Z)^-1 (Z^T Y), y_est = β0 + β1 x1 + ...

m inputs, degree Q: how many terms?
= the number of unique terms of the form x1^q1 x2^q2 ... xm^qm where Σ_{i=1}^m q_i ≤ Q
= the number of unique terms of the form x0^q0 x1^q1 ... xm^qm where Σ_{i=0}^m q_i = Q
= the number of lists of non-negative integers [q0, q1, q2, ..., qm] in which Σ q_i = Q
= the number of ways of placing Q red disks on a row of squares of length Q + m
= (Q+m)-choose-Q
[Disk-placement diagram omitted; its example values for Q and m are garbled.]
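The (Q+m)-choose-Q count can be checked directly; for two inputs and degree 2 it matches the six terms (1, x1, x2, x1^2, x1 x2, x2^2) listed above:

```python
from math import comb

def num_terms(m, Q):
    # Number of monomials in m inputs whose total degree is Q or less
    return comb(Q + m, Q)

print(num_terms(2, 2))  # quadratic in 2 inputs: 1 constant + 2 linear + 3 quadratic = 6
print(num_terms(4, 3))
```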
7: Radial Basis Functions (RBFs)

z = (list of radial basis function evaluations), b = (Z^T Z)^-1 (Z^T Y), y_est = β0 + β1 x1 + ...

1-d RBFs: with centers c1, c2, c3,
y_est = β1 φ1(x) + β2 φ2(x) + β3 φ3(x), where φi(x) = KernelFunction(|x − ci| / KW)
Example

A fitted curve of the form y_est = β1 φ1(x) + β2 φ2(x) + 0.5 φ3(x), where φi(x) = KernelFunction(|x − ci| / KW).

RBFs with Linear Regression

All ci's are held constant (initialized randomly or on a grid in m-dimensional input space). KW is also held constant (initialized to be large enough that there's decent overlap between basis functions; usually much better than the crappy overlap on my diagram).
RBFs with Linear Regression (continued)

Given Q basis functions, define the matrix Z such that Z_kj = KernelFunction(|x_k − c_j| / KW), where x_k is the k-th vector of inputs. And as before, b = (Z^T Z)^-1 (Z^T Y).

RBFs with Nonlinear Regression

Allow the ci's to adapt to the data (initialized randomly or on a grid in m-dimensional input space). KW is allowed to adapt to the data too. (Some folks even let each basis function have its own KW_j, permitting fine detail in dense regions of input space.) But how do we now find all the βj's, ci's and KW?
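The fixed-center linear-regression case above (Z_kj = KernelFunction(|x_k − c_j| / KW), then b = (Z^T Z)^-1 Z^T Y) can be sketched as follows. The Gaussian kernel, the grid of centers, the kernel width, and the added intercept column are my assumptions:

```python
import numpy as np

def rbf_features(X, centers, kw):
    """Z[k, j] = KernelFunction(|x_k - c_j| / KW); Gaussian kernel assumed."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-((d / kw) ** 2))

def fit_rbf(X, y, centers, kw):
    # Intercept column included for convenience; solve by least squares
    Z = np.column_stack([np.ones(len(X)), rbf_features(X, centers, kw)])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return b

def predict(X, centers, kw, b):
    Z = np.column_stack([np.ones(len(X)), rbf_features(X, centers, kw)])
    return Z @ b

# 1-d example: centers on a grid, KW large enough for decent overlap
X = np.linspace(0, 1, 40)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
centers = np.linspace(0, 1, 8)[:, None]
b = fit_rbf(X, y, centers, kw=0.25)
err = np.max(np.abs(predict(X, centers, 0.25, b) - y))
print(err < 0.1)  # 8 overlapping Gaussians fit one period of sin closely
```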
RBFs with Nonlinear Regression (continued)

But how do we now find all the βj's, ci's and KW? Answer: Gradient Descent. (But I'd like to see, or hope someone's already done, a hybrid, where the ci's and KW are updated with gradient descent while the βj's use matrix inversion.)
Radial Basis Functions in 2-d

Two inputs. Outputs (heights sticking out of the page) not shown. Each basis function has a center and a sphere of significant influence around that center.

Happy RBFs in 2-d

Blue dots denote coordinates of input vectors; the centers and their spheres of influence cover the data well.
Crabby RBFs in 2-d

Blue dots denote coordinates of input vectors. What's the problem in this example?

More crabby RBFs

And what's the problem in this example?
Hopeless!

Even before seeing the data, you should understand that this is a disaster!

Unhappy

Even before seeing the data, you should understand that this isn't good either.
6: Robust Regression

This is the best fit that Quadratic Regression can manage.
...but this is what we'd probably prefer.

LOESS-based Robust Regression

After the initial fit, score each datapoint according to how well it's fitted: "You are a very good datapoint."
The scoring continues: "You are not too shabby." ... "But you are pathetic."
Robust Regression

For k = 1 to R:
- Let (x_k, y_k) be the k-th datapoint.
- Let y_k^est be the predicted value of y_k.
- Let w_k be a weight for datapoint k that is large if the datapoint fits well and small if it fits badly: w_k = KernelFn([y_k − y_k^est]^2).

Then redo the regression using weighted datapoints. (I taught you how to do this in the Instance-based lecture; only then the weights depended on distance in input-space.) Guess what happens next?
Repeat the whole thing until converged!

Robust Regression: what we're doing

What regular regression does: assume y_k was originally generated using the following recipe:
y_k = β0 + β1 x_k1 + β2 x_k2 + N(0, σ^2)
The computational task is to find the Maximum Likelihood β0, β1 and β2.
What LOESS robust regression does: assume y_k was originally generated using the following recipe. With probability p,
y_k = β0 + β1 x_k1 + β2 x_k2 + N(0, σ^2)
but otherwise
y_k ~ N(µ, σ_huge^2)
The computational task is to find the Maximum Likelihood β0, β1, β2, p, µ and σ_huge. Mysteriously, the reweighting procedure does this computation for us. Your first glimpse of two spectacular letters: E.M.
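The fit-score-reweight-refit loop above can be sketched as iteratively reweighted least squares. This is a minimal sketch: a Gaussian KernelFn and a hand-set kernel width are my assumptions, and the model is a plain line rather than a quadratic:

```python
import numpy as np

def kernel_weight(sq_resid, width):
    # Near 1 for well-fitted points, near 0 for outliers; Gaussian kernel assumed
    return np.exp(-sq_resid / (2 * width ** 2))

def robust_linear_fit(X, y, width=1.0, iters=10):
    """Fit, score each point by its squared residual, refit with weights, repeat."""
    Z = np.column_stack([np.ones(len(X)), X])
    w = np.ones(len(y))
    for _ in range(iters):
        # Weighted least squares: minimize sum_k w_k (y_k - z_k b)^2
        sw = np.sqrt(w)
        b, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
        w = kernel_weight((y - Z @ b) ** 2, width)
    return b

# Line y = 2x + 1 with one gross outlier; the loop should shrug it off
X = np.linspace(0, 1, 20)[:, None]
y = 2 * X[:, 0] + 1
y[5] += 50.0  # pathetic datapoint
b = robust_linear_fit(X, y)
print(np.allclose(b, [1.0, 2.0], atol=0.05))
```

After the first refit the outlier's weight collapses toward zero, so the remaining collinear points pin down the true intercept and slope.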
5: Regression Trees

"Decision trees for regression."

A regression tree leaf: Predict age = 47 (the mean age of records matching this leaf node).
A one-split regression tree: Gender? Female → Predict age = 39. Male → Predict age = 36.

Choosing the attribute to split on

[Example table with columns Gender, Rich?, Num. Children, Num. Bean Babies, Age; the row values are garbled in this transcription.]

We can't use information gain. What should we use?
MSE(Y|X) = the expected squared error if we must predict a record's Y value given only knowledge of the record's X value. If we're told x = j, the smallest expected error comes from predicting the mean of the Y-values among those records in which x = j. Call this mean quantity µ_{x=j}. Then

MSE(Y|X) = (1/R) Σ_{j=1}^{N_X} Σ_{k such that x_k = j} (y_k − µ_{x=j})^2

Regression tree attribute selection: greedily choose the attribute that minimizes MSE(Y|X). Guess what we do about real-valued inputs? Guess how we prevent overfitting?
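The MSE(Y|X) split criterion can be sketched directly in plain Python; the toy records and attribute names below are made up for illustration:

```python
def mse_after_split(xs, ys):
    """MSE(Y|X) for a categorical attribute: within each value j of X, predict
    the mean of Y over records with x = j; average the squared errors."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    total = 0.0
    for vals in groups.values():
        mu = sum(vals) / len(vals)
        total += sum((v - mu) ** 2 for v in vals)
    return total / len(ys)

def best_attribute(records, ys):
    """records: list of dicts attribute -> value. Greedily pick the attribute
    minimizing MSE(Y|X), the selection rule above."""
    return min(records[0].keys(),
               key=lambda a: mse_after_split([r[a] for r in records], ys))

# Hypothetical toy data: gender predicts age perfectly, rich does not
records = [{"gender": "F", "rich": "n"}, {"gender": "F", "rich": "y"},
           {"gender": "M", "rich": "n"}, {"gender": "M", "rich": "y"}]
ages = [39, 39, 36, 36]
print(best_attribute(records, ages))  # gender: within-group error is zero
```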
Pruning decision (property-owner = Yes)

After splitting on Gender, does the split deserve to live? Compare the two leaf populations: the property-owning females (count = 567, mean age = 39) versus the property-owning males (mean age = 36); the remaining counts and age standard deviations are garbled in this transcription. Use a standard Chi-squared test of the null hypothesis "these two populations have the same mean," and Bob's your uncle.

Linear Regression Trees

Also known as Model Trees. Leaves contain linear functions (trained using linear regression on all records matching that leaf), e.g. "Predict age = (some coefficient)·NumChildren − (some coefficient)·YearsEducation," with different coefficients in each leaf. The split attribute is chosen to minimize the MSE of the regressed children. Pruning uses a different Chi-squared test.
Detail: you typically ignore, during the regression, any categorical attribute that has been tested on higher up in the tree. But use all untested attributes, and use real-valued attributes even if they've been tested above.

Test your understanding

Assuming regular regression trees, can you sketch a graph of the fitted function y_est(x) over this diagram?
Test your understanding

Assuming linear regression trees, can you sketch a graph of the fitted function y_est(x) over this diagram?

4: GMDH (c.f. BACON, AIM)

Group Method Data Handling. A very simple but very good idea:
1. Do linear regression.
2. Use cross-validation to discover whether any quadratic term is good. If so, add it as a basis function and loop.
3. Use cross-validation to discover whether any of a set of familiar functions (log, exp, sin etc.) applied to any previous basis function helps. If so, add it as a basis function and loop.
4. Else stop.
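A greedy basis-growing loop in this spirit can be sketched as follows. This is a simplified sketch, not full GMDH: candidate terms are products of existing basis columns plus sin/tanh transforms, and a candidate is kept only if it lowers k-fold cross-validation error; all function names are mine:

```python
import numpy as np

def cv_mse(Z, y, folds=5):
    """K-fold cross-validation error of a linear fit on basis matrix Z."""
    idx = np.arange(len(y))
    err = 0.0
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        b, *_ = np.linalg.lstsq(Z[train], y[train], rcond=None)
        err += np.sum((y[test] - Z[test] @ b) ** 2)
    return err / len(y)

def gmdh_sketch(X, y, max_rounds=5):
    Z = np.column_stack([np.ones(len(y)), X])  # start from linear terms
    best = cv_mse(Z, y)
    for _ in range(max_rounds):
        improved = False
        cands = [Z[:, i] * Z[:, j]
                 for i in range(Z.shape[1]) for j in range(i, Z.shape[1])]
        cands += [f(Z[:, i]) for f in (np.sin, np.tanh) for i in range(1, Z.shape[1])]
        for c in cands:
            trial = np.column_stack([Z, c])
            score = cv_mse(trial, y)
            if score < best - 1e-9:      # keep only genuine CV improvements
                Z, best, improved = trial, score, True
                break
        if not improved:
            break                         # step 4: else stop
    return Z, best

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(60, 1))
y = 1 + X[:, 0] + 2 * X[:, 0] ** 2       # quadratic target
Z, score = gmdh_sketch(X, y)
print(score < 1e-6)                       # the x^2 term should be discovered
```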
A typical learned function: age_est = height − a·sqrt(weight) + b·income / cos(NumCars), where a and b are numeric coefficients that are garbled in this transcription.

Cascade Correlation

A super-fast form of Neural Network learning. Begins with 0 hidden units. Incrementally adds hidden units, one by one, doing ingeniously little recomputation between each unit addition.
Cascade beginning

Begin with a regular dataset. Nonstandard notation: X^(i) is the i-th attribute; x_k^(i) is the value of the i-th attribute in the k-th record.

Cascade first step

Find weights w_i^(0) to best fit Y, i.e. to minimize
Σ_{k=1}^R (y_k − y_k^(0))^2, where y_k^(0) = Σ_{j=1}^m w_j^(0) x_k^(j)
Consider our errors

Define e_k^(0) = y_k − y_k^(0), and call the column of these errors E^(0).

Create a hidden unit

Find weights u_i^(0) to define a new basis function H^(0)(x) of the inputs. Make it specialize in predicting the errors in our original fit: find {u_i^(0)} to maximize the correlation between H^(0)(x) and E^(0), where
H^(0)(x_k) = g(Σ_{j=1}^m u_j^(0) x_k^(j))
Cascade next step

Find weights w_i^(1) and p_0^(1) to better fit Y, i.e. to minimize
Σ_{k=1}^R (y_k − y_k^(1))^2, where y_k^(1) = Σ_{j=1}^m w_j^(1) x_k^(j) + p_0^(1) h_k^(0)

Now look at the new errors

Define e_k^(1) = y_k − y_k^(1).
Create the next hidden unit

Find weights u_i^(1) and v_0^(1) to define a new basis function H^(1)(x) of the inputs. Make it specialize in predicting the errors in our previous fit: find {u_i^(1), v_0^(1)} to maximize the correlation between H^(1)(x) and E^(1), where
H^(1)(x_k) = g(Σ_{j=1}^m u_j^(1) x_k^(j) + v_0^(1) h_k^(0))

Cascade n-th step

Find weights w_i^(n) and p_j^(n) to better fit Y, i.e. to minimize
Σ_{k=1}^R (y_k − y_k^(n))^2, where y_k^(n) = Σ_{j=1}^m w_j^(n) x_k^(j) + Σ_{j=0}^{n−1} p_j^(n) h_k^(j)
Now look at the new errors

Define e_k^(n) = y_k − y_k^(n).

Create the n-th hidden unit

Find weights u_i^(n) and v_j^(n) to define a new basis function H^(n)(x) of the inputs. Make it specialize in predicting the errors in our previous fit: find {u_i^(n), v_j^(n)} to maximize the correlation between H^(n)(x) and E^(n), where
H^(n)(x_k) = g(Σ_{j=1}^m u_j^(n) x_k^(j) + Σ_{j=0}^{n−1} v_j^(n) h_k^(j))

Continue until satisfied with the fit.
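The loop above can be sketched end-to-end. This is a simplified sketch, not the real cascade-correlation algorithm: instead of training each candidate unit by gradient ascent on the correlation, it draws a pool of random tanh units and keeps the one best correlated with the current errors; the target function and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def lstsq_fit(Z, y):
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return b

def cascade_sketch(X, y, n_units=8, n_candidates=200):
    Z = np.column_stack([np.ones(len(y)), X])   # running features: bias, inputs, units
    for _ in range(n_units):
        b = lstsq_fit(Z, y)
        e = y - Z @ b                            # current errors E^(n)
        W = rng.normal(size=(Z.shape[1], n_candidates))
        H = np.tanh(Z @ W)                       # candidates g(sum u_j x^(j) + sum v_j h^(j))
        num = np.abs((H - H.mean(0)).T @ (e - e.mean()))
        corr = num / (H.std(0) * len(e) + 1e-12)
        h = H[:, np.argmax(corr)]                # keep the best-correlated candidate
        Z = np.column_stack([Z, h])              # cascade: later units may use it
    b = lstsq_fit(Z, y)
    return Z, b

X = np.linspace(-2, 2, 80)[:, None]
y = np.sign(np.sin(2 * X[:, 0]))                 # a wiggly target a linear fit can't match
Z, b = cascade_sketch(X, y)
mse = np.mean((y - Z @ b) ** 2)
print(round(mse, 3))
```

Each added unit sees the inputs plus every earlier unit's output, which is the "cascade" in the name; only the output weights are refitted between additions.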
Visualizing the first iteration. Visualizing the second iteration. Visualizing the third iteration.

Example: Cascade Correlation for Classification
Training two spirals: Steps 1–6, then the subsequent steps.
If you like Cascade Correlation, see also:

Projection Pursuit, in which you add together many non-linear non-parametric scalar functions of carefully chosen directions. Each direction is chosen to maximize error-reduction from the best scalar function.

ADA-Boost, an additive combination of regression trees in which the (n+1)-th tree learns the error of the n-th tree.

Multilinear Interpolation

Consider this dataset. Suppose we wanted to create a continuous and piecewise linear fit to the data.
Create a set of knot points: selected X-coordinates (usually equally spaced) that cover the data: q1, q2, q3, q4, q5.

We are going to assume the data was generated by a noisy version of a function that can only bend at the knots. Here are examples (none fits the data well).
How to find the best fit?

Idea 1: simply perform a separate regression in each segment, for each part of the curve. What's the problem with this idea?

Let's look at what goes on in the red segment, between q2 and q3. Writing w = q3 − q2 and letting h2, h3 be the heights at those knots,
y_est(x) = h2 (q3 − x)/w + h3 (x − q2)/w
In the red segment, y_est(x) = h2 f2(x) + h3 f3(x), where f2(x) = (q3 − x)/w and f3(x) = (x − q2)/w. Plotted over the whole axis, each such φi(x) is a triangular "hat" that peaks at its own knot and falls to zero at the neighboring knots.

In general,
y_est(x) = Σ_{i=1}^{N_K} h_i f_i(x), where f_i(x) = 1 − |x − q_i|/w if |x − q_i| < w, and 0 otherwise.

And this is simply a basis function regression problem! We know how to find the least squares h_i's.
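The hat-basis regression just described can be sketched as follows (uniform knot spacing assumed, as on the slides; when the target is already piecewise linear with bends only at the knots, the regression recovers the knot heights exactly):

```python
import numpy as np

def hat_basis(x, knots):
    """f_i(x) = 1 - |x - q_i|/w inside [q_i - w, q_i + w], else 0,
    with w the (uniform) knot spacing."""
    w = knots[1] - knots[0]
    F = 1 - np.abs(x[:, None] - knots[None, :]) / w
    return np.clip(F, 0, None)

def fit_knot_heights(x, y, knots):
    # y_est(x) = sum_i h_i f_i(x): plain basis-function least squares
    F = hat_basis(x, knots)
    h, *_ = np.linalg.lstsq(F, y, rcond=None)
    return h

knots = np.linspace(0, 1, 6)
x = np.linspace(0, 1, 100)
heights = [0, 1, 0.5, 0.5, 2, -1]
y = np.interp(x, knots, heights)   # a function that bends only at the knots
h = fit_knot_heights(x, y, knots)
print(np.allclose(h, heights, atol=1e-6))
```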
In two dimensions

Blue dots show locations of input vectors (outputs not depicted).
Each purple dot is a knot point. It will contain the height of the estimated surface. But how do we do the interpolation to ensure that the surface is continuous? To predict the value at a query point: first interpolate its value on two opposite edges of the surrounding cell of knot points, then interpolate between those two values.

Notes:
1. This can easily be generalized to m dimensions.
2. It should be easy to see that it ensures continuity.
3. The patches are not linear.

Doing the regression

Given the data, how do we find the optimal knot heights? Happily, it's simply a two-dimensional basis function problem. (Working out the basis functions is tedious, unilluminating, and easy.) What's the problem in higher dimensions?
MARS: Multivariate Adaptive Regression Splines

Invented by Jerry Friedman (one of Andrew's heroes). Simplest version: let's assume the function we are learning is of the following form:
y_est(x) = Σ_{k=1}^m g_k(x_k)
Instead of a linear combination of the inputs, it's a linear combination of non-linear functions of individual inputs.

Idea: each g_k is one of the piecewise linear, knotted curves from Multilinear Interpolation, with knots q1 ... q5 in that dimension.
Concretely,
y_est(x) = Σ_{k=1}^m Σ_{j=1}^{N_K} h_j^k f_j^k(x_k), where f_j^k(x) = 1 − |x − q_j^k|/w_k if |x − q_j^k| < w_k, and 0 otherwise.
Here q_j^k is the location of the j-th knot in the k-th dimension, h_j^k is the regressed height of the j-th knot in the k-th dimension, and w_k is the spacing between knots in the k-th dimension.

That's not complicated enough!

Okay, now let's get serious. We'll allow arbitrary two-way interactions:
y_est(x) = Σ_{k=1}^m g_k(x_k) + Σ_{k=1}^m Σ_{t=k+1}^m g_{kt}(x_k, x_t)
The function we're learning is allowed to be a sum of non-linear functions over all one-d and two-d subsets of attributes. It can still be expressed as a linear combination of basis functions, and is thus learnable by linear regression.

Full MARS: uses cross-validation to choose a subset of subspaces, knot resolution and other parameters.
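The simplest-version additive model above (no interactions) is then ordinary basis-function least squares over per-dimension hat functions. A sketch, with the knot grids and the kinked test target as my assumptions:

```python
import numpy as np

def hats(x, knots):
    # Per-dimension hat functions f_j(x) = max(0, 1 - |x - q_j|/w), uniform spacing w
    w = knots[1] - knots[0]
    return np.clip(1 - np.abs(x[:, None] - knots[None, :]) / w, 0, None)

def mars_additive_features(X, knots_per_dim):
    """Basis matrix for y_est(x) = sum_k g_k(x_k), each g_k piecewise linear
    on that dimension's knots (the simplest-version MARS form above)."""
    return np.column_stack([hats(X[:, k], q) for k, q in enumerate(knots_per_dim)])

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(200, 2))
y = np.abs(X[:, 0] - 0.5) + 0.25 * X[:, 1]   # additive target, kink at a knot
knots = [np.linspace(0, 1, 9), np.linspace(0, 1, 9)]
F = mars_additive_features(X, knots)
h, *_ = np.linalg.lstsq(F, y, rcond=None)    # learnable by linear regression
resid = np.max(np.abs(F @ h - y))
print(resid < 1e-6)  # target bends only at knots, so it lies in the basis span
```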
If you like MARS...

...see also CMAC (Cerebellar Model Articulated Controller) by James Albus (another of Andrew's heroes). Many of the same gut-level intuitions, but entirely in a neural-network, biologically plausible way. (All the low-dimensional functions are by means of lookup tables, trained with a delta-rule and using a clever blurred update and hash-tables.)

Where are we now?

- Inputs → Inference Engine → Learn P(E1|E2): Joint DE, Bayes Net Structure Learning
- Inputs → Classifier → Predict category: Dec Tree, Sigmoid Perceptron, Sigmoid N.Net, Gauss/Joint BC, Gauss Naïve BC, N.Neigh, Bayes Net Based BC, Cascade Correlation
- Inputs → Density Estimator → Probability: Joint DE, Naïve DE, Gauss/Joint DE, Gauss Naïve DE, Bayes Net Structure Learning
- Inputs → Regressor → Predict real no.: Linear Regression, Polynomial Regression, Perceptron, Neural Net, N.Neigh, Kernel, LWR, RBFs, Robust Regression, Cascade Correlation, Regression Trees, GMDH, Multilinear Interp, MARS
What You Should Know

For each of the eight methods you should be able to summarize briefly what they do and outline how they work. You should understand them well enough that, given access to the notes, you can quickly re-understand them at a moment's notice. But you don't have to memorize all the details. In the right context any one of these eight might end up being really useful to you one day! You should be able to recognize this when it occurs.

Citations

Radial Basis Functions: T. Poggio and F. Girosi, "Regularization Algorithms for Learning That Are Equivalent to Multilayer Networks," Science, 247, 1989.

LOESS: W. S. Cleveland, "Robust Locally Weighted Regression and Smoothing Scatterplots," Journal of the American Statistical Association, 74(368), 829-836, December 1979.

GMDH etc: http:// P. Langley, G. L. Bradshaw and H. A. Simon, "Rediscovering Chemistry with the BACON System," in Machine Learning: An Artificial Intelligence Approach, R. S. Michalski, J. G. Carbonell and T. M. Mitchell (eds.), Morgan Kaufmann, 1983.

Regression Trees etc: L. Breiman, J. H. Friedman, R. A. Olshen and C. J. Stone, Classification and Regression Trees, Wadsworth, 1984. J. R. Quinlan, "Combining Instance-Based and Model-Based Learning," Machine Learning: Proceedings of the Tenth International Conference, 1993.

Cascade Correlation etc: S. E. Fahlman and C. Lebiere, "The cascade-correlation learning architecture," Technical Report CMU-CS-90-100, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 1990. http://citeseer.nj.nec.com/fahlman91cascadecorrelation.html J. H. Friedman and W. Stuetzle, "Projection Pursuit Regression," Journal of the American Statistical Association, 76(376), December 1981.

MARS: J. H. Friedman, "Multivariate Adaptive Regression Splines," Department of Statistics, Stanford University, Technical Report No. 102, 1988.
More informationCS599 Lecture 2 Function Approximation in RL
CS599 Lecture 2 Function Approximation in RL Look at how experience with a limited part of the state set be used to produce good behavior over a much larger part. Overview of function approximation (FA)
More informationCSE 190: Reinforcement Learning: An Introduction. Chapter 8: Generalization and Function Approximation. Pop Quiz: What Function Are We Approximating?
CSE 190: Reinforcement Learning: An Introduction Chapter 8: Generalization and Function Approximation Objectives of this chapter: Look at how experience with a limited part of the state set be used to
More informationAlgorithms for Playing and Solving games*
Algorithms for Playing and Solving games* Andrew W. Moore Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm awm@cs.cmu.edu 412-268-7599 * Two Player Zero-sum Discrete
More informationProbabilistic and Bayesian Analytics
Probabilistic and Bayesian Analytics Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these
More informationA Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007
Decision Trees, cont. Boosting Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 1 st, 2007 1 A Decision Stump 2 1 The final tree 3 Basic Decision Tree Building Summarized
More informationChapter 8: Generalization and Function Approximation
Chapter 8: Generalization and Function Approximation Objectives of this chapter: Look at how experience with a limited part of the state set be used to produce good behavior over a much larger part. Overview
More informationFinal Exam, Fall 2002
15-781 Final Exam, Fall 22 1. Write your name and your andrew email address below. Name: Andrew ID: 2. There should be 17 pages in this exam (excluding this cover sheet). 3. If you need more room to work
More informationVBM683 Machine Learning
VBM683 Machine Learning Pinar Duygulu Slides are adapted from Dhruv Batra Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data
More informationSPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks
Topics in Machine Learning-EE 5359 Neural Networks 1 The Perceptron Output: A perceptron is a function that maps D-dimensional vectors to real numbers. For notational convenience, we add a zero-th dimension
More informationCSE 190: Reinforcement Learning: An Introduction. Chapter 8: Generalization and Function Approximation. Pop Quiz: What Function Are We Approximating?
CSE 190: Reinforcement Learning: An Introduction Chapter 8: Generalization and Function Approximation Objectives of this chapter: Look at how experience with a limited part of the state set be used to
More informationAN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009
AN INTRODUCTION TO NEURAL NETWORKS Scott Kuindersma November 12, 2009 SUPERVISED LEARNING We are given some training data: We must learn a function If y is discrete, we call it classification If it is
More informationHTF: Ch4 B: Ch4. Linear Classifiers. R Greiner Cmput 466/551
HTF: Ch4 B: Ch4 Linear Classifiers R Greiner Cmput 466/55 Outline Framework Eact Minimize Mistakes Perceptron Training Matri inversion LMS Logistic Regression Ma Likelihood Estimation MLE of P Gradient
More informationMachine Learning. Nonparametric Methods. Space of ML Problems. Todo. Histograms. Instance-Based Learning (aka non-parametric methods)
Machine Learning InstanceBased Learning (aka nonparametric methods) Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Non parametric CSE 446 Machine Learning Daniel Weld March
More informationLinear Discriminant Functions
Linear Discriminant Functions Linear discriminant functions and decision surfaces Definition It is a function that is a linear combination of the components of g() = t + 0 () here is the eight vector and
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression (Continued) Generative v. Discriminative Decision rees Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University January 31 st, 2007 2005-2007 Carlos Guestrin 1 Generative
More informationTDT4173 Machine Learning
TDT4173 Machine Learning Lecture 3 Bagging & Boosting + SVMs Norwegian University of Science and Technology Helge Langseth IT-VEST 310 helgel@idi.ntnu.no 1 TDT4173 Machine Learning Outline 1 Ensemble-methods
More informationInformation Gain. Andrew W. Moore Professor School of Computer Science Carnegie Mellon University.
te to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them
More information10-701/15-781, Machine Learning: Homework 4
10-701/15-781, Machine Learning: Homewor 4 Aarti Singh Carnegie Mellon University ˆ The assignment is due at 10:30 am beginning of class on Mon, Nov 15, 2010. ˆ Separate you answers into five parts, one
More information6.867 Machine learning
6.867 Machine learning Mid-term eam October 8, 6 ( points) Your name and MIT ID: .5.5 y.5 y.5 a).5.5 b).5.5.5.5 y.5 y.5 c).5.5 d).5.5 Figure : Plots of linear regression results with different types of
More informationDecision Trees. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. February 5 th, Carlos Guestrin 1
Decision Trees Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 5 th, 2007 2005-2007 Carlos Guestrin 1 Linear separability A dataset is linearly separable iff 9 a separating
More informationBayesian Approach 2. CSC412 Probabilistic Learning & Reasoning
CSC412 Probabilistic Learning & Reasoning Lecture 12: Bayesian Parameter Estimation February 27, 2006 Sam Roweis Bayesian Approach 2 The Bayesian programme (after Rev. Thomas Bayes) treats all unnown quantities
More informationSample questions for Fundamentals of Machine Learning 2018
Sample questions for Fundamentals of Machine Learning 2018 Teacher: Mohammad Emtiyaz Khan A few important informations: In the final exam, no electronic devices are allowed except a calculator. Make sure
More informationBiostatistics in Research Practice - Regression I
Biostatistics in Research Practice - Regression I Simon Crouch 30th Januar 2007 In scientific studies, we often wish to model the relationships between observed variables over a sample of different subjects.
More informationCOMS 4771 Introduction to Machine Learning. Nakul Verma
COMS 4771 Introduction to Machine Learning Nakul Verma Announcements HW1 due next lecture Project details are available decide on the group and topic by Thursday Last time Generative vs. Discriminative
More informationMachine Learning, Fall 2009: Midterm
10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationBayesian Networks: Independencies and Inference
Bayesian Networks: Independencies and Inference Scott Davies and Andrew Moore Note to other teachers and users of these slides. Andrew and Scott would be delighted if you found this source material useful
More informationDecision Trees. CS57300 Data Mining Fall Instructor: Bruno Ribeiro
Decision Trees CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Classification without Models Well, partially without a model } Today: Decision Trees 2015 Bruno Ribeiro 2 3 Why Trees? } interpretable/intuitive,
More informationMachine Learning (CSE 446): Neural Networks
Machine Learning (CSE 446): Neural Networks Noah Smith c 2017 University of Washington nasmith@cs.washington.edu November 6, 2017 1 / 22 Admin No Wednesday office hours for Noah; no lecture Friday. 2 /
More informationMidterm, Fall 2003
5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are
More informationLearning Decision Trees
Learning Decision Trees CS194-10 Fall 2011 Lecture 8 CS194-10 Fall 2011 Lecture 8 1 Outline Decision tree models Tree construction Tree pruning Continuous input features CS194-10 Fall 2011 Lecture 8 2
More informationAE = q < H(p < ) + (1 q < )H(p > ) H(p) = p lg(p) (1 p) lg(1 p)
1 Decision Trees (13 pts) Data points are: Negative: (-1, 0) (2, 1) (2, -2) Positive: (0, 0) (1, 0) Construct a decision tree using the algorithm described in the notes for the data above. 1. Show the
More informationPolynomial approximation and Splines
Polnomial approimation and Splines 1. Weierstrass approimation theorem The basic question we ll look at toda is how to approimate a complicated function f() with a simpler function P () f() P () for eample,
More informationLecture 13 - Handling Nonlinearity
Lecture 3 - Handling Nonlinearit Nonlinearit issues in control practice Setpoint scheduling/feedforward path planning repla - linear interpolation Nonlinear maps B-splines Multivariable interpolation:
More information) (d o f. For the previous layer in a neural network (just the rightmost layer if a single neuron), the required update equation is: 2.
1 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.034 Artificial Intelligence, Fall 2011 Recitation 8, November 3 Corrected Version & (most) solutions
More informationLecture 6: Backpropagation
Lecture 6: Backpropagation Roger Grosse 1 Introduction So far, we ve seen how to train shallow models, where the predictions are computed as a linear function of the inputs. We ve also observed that deeper
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationCSE446: non-parametric methods Spring 2017
CSE446: non-parametric methods Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin and Luke Zettlemoyer Linear Regression: What can go wrong? What do we do if the bias is too strong? Might want
More informationSingle- Parameter Linear Regression
Predctng Real-valued outputs an ntroducton to Regresson Note to other teachers and users of these sldes. Andrew would be delghted f ou found ths source materal useful n gvng our own lectures. Feel free
More informationFINAL: CS 6375 (Machine Learning) Fall 2014
FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for
More informationPattern Recognition and Machine Learning. Perceptrons and Support Vector machines
Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lessons 6 10 Jan 2017 Outline Perceptrons and Support Vector machines Notation... 2 Perceptrons... 3 History...3
More informationSupervised Learning! Algorithm Implementations! Inferring Rudimentary Rules and Decision Trees!
Supervised Learning! Algorithm Implementations! Inferring Rudimentary Rules and Decision Trees! Summary! Input Knowledge representation! Preparing data for learning! Input: Concept, Instances, Attributes"
More informationMethods for Advanced Mathematics (C3) Coursework Numerical Methods
Woodhouse College 0 Page Introduction... 3 Terminolog... 3 Activit... 4 Wh use numerical methods?... Change of sign... Activit... 6 Interval Bisection... 7 Decimal Search... 8 Coursework Requirements on
More informationMachine Learning (CS 567) Lecture 2
Machine Learning (CS 567) Lecture 2 Time: T-Th 5:00pm - 6:20pm Location: GFS118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationMachine Learning, Midterm Exam
10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have
More informationNotes on Discriminant Functions and Optimal Classification
Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem
More informationCS540 ANSWER SHEET
CS540 ANSWER SHEET Name Email 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 1 2 Final Examination CS540-1: Introduction to Artificial Intelligence Fall 2016 20 questions, 5 points
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple
More informationMidterm: CS 6375 Spring 2015 Solutions
Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an
More informationINF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification
INF 4300 151014 Introduction to classifiction Anne Solberg anne@ifiuiono Based on Chapter 1-6 in Duda and Hart: Pattern Classification 151014 INF 4300 1 Introduction to classification One of the most challenging
More information1 GSW Gaussian Elimination
Gaussian elimination is probabl the simplest technique for solving a set of simultaneous linear equations, such as: = A x + A x + A x +... + A x,,,, n n = A x + A x + A x +... + A x,,,, n n... m = Am,x
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationLESSON 35: EIGENVALUES AND EIGENVECTORS APRIL 21, (1) We might also write v as v. Both notations refer to a vector.
LESSON 5: EIGENVALUES AND EIGENVECTORS APRIL 2, 27 In this contet, a vector is a column matri E Note 2 v 2, v 4 5 6 () We might also write v as v Both notations refer to a vector (2) A vector can be man
More informationLearning in State-Space Reinforcement Learning CIS 32
Learning in State-Space Reinforcement Learning CIS 32 Functionalia Syllabus Updated: MIDTERM and REVIEW moved up one day. MIDTERM: Everything through Evolutionary Agents. HW 2 Out - DUE Sunday before the
More informationDecision Trees. Machine Learning CSEP546 Carlos Guestrin University of Washington. February 3, 2014
Decision Trees Machine Learning CSEP546 Carlos Guestrin University of Washington February 3, 2014 17 Linear separability n A dataset is linearly separable iff there exists a separating hyperplane: Exists
More information1.2 Relations. 20 Relations and Functions
0 Relations and Functions. Relations From one point of view, all of Precalculus can be thought of as studing sets of points in the plane. With the Cartesian Plane now fresh in our memor we can discuss
More informationGreen s Theorem Jeremy Orloff
Green s Theorem Jerem Orloff Line integrals and Green s theorem. Vector Fields Vector notation. In 8.4 we will mostl use the notation (v) = (a, b) for vectors. The other common notation (v) = ai + bj runs
More informationSupport Vector Machines
Support Vector Machines Some material on these is slides borrowed from Andrew Moore's excellent machine learning tutorials located at: http://www.cs.cmu.edu/~awm/tutorials/ Where Should We Draw the Line????
More information<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation)
Learning for Deep Neural Networks (Back-propagation) Outline Summary of Previous Standford Lecture Universal Approximation Theorem Inference vs Training Gradient Descent Back-Propagation
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationMachine Learning Basics III
Machine Learning Basics III Benjamin Roth CIS LMU München Benjamin Roth (CIS LMU München) Machine Learning Basics III 1 / 62 Outline 1 Classification Logistic Regression 2 Gradient Based Optimization Gradient
More informationMachine Learning Lecture 3
Announcements Machine Learning Lecture 3 Eam dates We re in the process of fiing the first eam date Probability Density Estimation II 9.0.207 Eercises The first eercise sheet is available on L2P now First
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationMultilayer Neural Networks
Pattern Recognition Multilaer Neural Networs Lecture 4 Prof. Daniel Yeung School of Computer Science and Engineering South China Universit of Technolog Outline Introduction (6.) Artificial Neural Networ
More informationCS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS
CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS LAST TIME Intro to cudnn Deep neural nets using cublas and cudnn TODAY Building a better model for image classification Overfitting
More informationUnit 3 NOTES Honors Common Core Math 2 1. Day 1: Properties of Exponents
Unit NOTES Honors Common Core Math Da : Properties of Eponents Warm-Up: Before we begin toda s lesson, how much do ou remember about eponents? Use epanded form to write the rules for the eponents. OBJECTIVE
More informationAffine transformations
Reading Optional reading: Affine transformations Brian Curless CSE 557 Autumn 207 Angel and Shreiner: 3., 3.7-3. Marschner and Shirle: 2.3, 2.4.-2.4.4, 6..-6..4, 6.2., 6.3 Further reading: Angel, the rest
More informationFINAL EXAM: FALL 2013 CS 6375 INSTRUCTOR: VIBHAV GOGATE
FINAL EXAM: FALL 2013 CS 6375 INSTRUCTOR: VIBHAV GOGATE You are allowed a two-page cheat sheet. You are also allowed to use a calculator. Answer the questions in the spaces provided on the question sheets.
More information10-701/ Machine Learning - Midterm Exam, Fall 2010
10-701/15-781 Machine Learning - Midterm Exam, Fall 2010 Aarti Singh Carnegie Mellon University 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 15 numbered pages in this exam
More informationData Mining. CS57300 Purdue University. Bruno Ribeiro. February 8, 2018
Data Mining CS57300 Purdue University Bruno Ribeiro February 8, 2018 Decision trees Why Trees? interpretable/intuitive, popular in medical applications because they mimic the way a doctor thinks model
More informationLinear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights
Linear Discriminant Functions and Support Vector Machines Linear, threshold units CSE19, Winter 11 Biometrics CSE 19 Lecture 11 1 X i : inputs W i : weights θ : threshold 3 4 5 1 6 7 Courtesy of University
More informationThe Perceptron. Volker Tresp Summer 2016
The Perceptron Volker Tresp Summer 2016 1 Elements in Learning Tasks Collection, cleaning and preprocessing of training data Definition of a class of learning models. Often defined by the free model parameters
More informationGaussian Process Vine Copulas for Multivariate Dependence
Gaussian Process Vine Copulas for Multivariate Dependence José Miguel Hernández-Lobato 1,2 joint work with David López-Paz 2,3 and Zoubin Ghahramani 1 1 Department of Engineering, Cambridge University,
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationf(x) = 2x 2 + 2x - 4
4-1 Graphing Quadratic Functions What You ll Learn Scan the tet under the Now heading. List two things ou will learn about in the lesson. 1. Active Vocabular 2. New Vocabular Label each bo with the terms
More informationINF Introduction to classifiction Anne Solberg
INF 4300 8.09.17 Introduction to classifiction Anne Solberg anne@ifi.uio.no Introduction to classification Based on handout from Pattern Recognition b Theodoridis, available after the lecture INF 4300
More information15. Eigenvalues, Eigenvectors
5 Eigenvalues, Eigenvectors Matri of a Linear Transformation Consider a linear ( transformation ) L : a b R 2 R 2 Suppose we know that L and L Then c d because of linearit, we can determine what L does
More informationNeural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35
Neural Networks David Rosenberg New York University July 26, 2017 David Rosenberg (New York University) DS-GA 1003 July 26, 2017 1 / 35 Neural Networks Overview Objectives What are neural networks? How
More informationSection 7.3: Parabolas, from College Algebra: Corrected Edition by Carl Stitz, Ph.D. and Jeff Zeager, Ph.D. is available under a Creative Commons
Section 7.: Parabolas, from College Algebra: Corrected Edition b Carl Stitz, Ph.D. and Jeff Zeager, Ph.D. is available under a Creative Commons Attribution-NonCommercial-ShareAlike.0 license. 0, Carl Stitz.
More informationClassification Using Decision Trees
Classification Using Decision Trees 1. Introduction Data mining term is mainly used for the specific set of six activities namely Classification, Estimation, Prediction, Affinity grouping or Association
More information1 What a Neural Network Computes
Neural Networks 1 What a Neural Network Computes To begin with, we will discuss fully connected feed-forward neural networks, also known as multilayer perceptrons. A feedforward neural network consists
More informationAnnouncements. CS 188: Artificial Intelligence Spring Classification. Today. Classification overview. Case-Based Reasoning
CS 188: Artificial Intelligence Spring 21 Lecture 22: Nearest Neighbors, Kernels 4/18/211 Pieter Abbeel UC Berkeley Slides adapted from Dan Klein Announcements On-going: contest (optional and FUN!) Remaining
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural
More information