Statistical machine learning and kernel methods. Primary references: John Shawe-Taylor and Nello Cristianini, Kernel Methods for Pattern Analysis
1 Part 5 (MA 751) Statistical machine learning and kernel methods. Primary references: John Shawe-Taylor and Nello Cristianini, Kernel Methods for Pattern Analysis; Christopher Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2 (1998).
2 Other references: N. Aronszajn, Theory of reproducing kernels. Transactions of the American Mathematical Society 68 (1950); Felipe Cucker and Steve Smale, On the mathematical foundations of learning. Bulletin of the American Mathematical Society (2002); Theodoros Evgeniou, Massimo Pontil and Tomaso Poggio, Regularization networks and support vector machines. Advances in Computational Mathematics (2000).
3 1. Linear functionals. Given a vector space V, we define a map f : V → ℝ from V to the real numbers to be a functional. If f is linear, i.e., if for real a, b we have f(ax + by) = af(x) + bf(y), then we say f is a linear functional. If V is an inner product space (so each v has a length ‖v‖), we say that f is bounded if |f(x)| ≤ C‖x‖ for some number C > 0 and all x ∈ V.
4 Reproducing kernel Hilbert spaces. 2. Reproducing kernel Hilbert spaces: Definition 1: An n × n matrix M is symmetric if M_ij = M_ji for all i, j. A symmetric M is positive if all of its eigenvalues are nonnegative.
5 Reproducing kernel Hilbert spaces. Equivalently, M is positive if ⟨a, Ma⟩ = aᵀMa ≥ 0 for all vectors a = (a₁, ..., aₙ)ᵀ, with ⟨·, ·⟩ the standard inner product on ℝⁿ. Above, aᵀ = (a₁, ..., aₙ) is the transpose of a.
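A quick numerical check of the equivalence between the two definitions (the matrix values here are invented for illustration):

```python
import numpy as np

# A symmetric matrix M; illustrative values only.
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
assert np.allclose(M, M.T)

# Definition 1: positivity via eigenvalues (all nonnegative).
eigenvalues = np.linalg.eigvalsh(M)
assert eigenvalues.min() >= 0

# Equivalent characterization: a^T M a >= 0 for every vector a.
rng = np.random.default_rng(0)
for _ in range(1000):
    a = rng.standard_normal(2)
    assert a @ M @ a >= 0
```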
6 Reproducing kernel Hilbert spaces. Definition 2: Let X ⊂ ℝᵖ be compact (i.e., a closed bounded subset). A (real) reproducing kernel Hilbert space (RKHS) H on X is a Hilbert space of functions on X (i.e., a complete collection of functions which is closed under addition and scalar multiplication, and for which an inner product is defined). H also needs the property: for any fixed x ∈ X, the evaluation functional e_x : H → ℝ defined by e_x(f) = f(x) is a bounded linear functional on H.
7 Reproducing kernel Hilbert spaces. Definition 3: We define a kernel to be a function K : X × X → ℝ which is symmetric, i.e., K(x, y) = K(y, x) for x, y ∈ X. We say that K is positive if for any fixed collection {x₁, ..., xₙ} ⊂ X, the n × n matrix K = (K_ij) = (K(x_i, x_j)) is positive (i.e., nonnegative).
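A numerical sketch of this definition (the Gaussian kernel exp(-‖x - y‖²/2σ²) is used here as a standard example of a positive kernel; the points are random, and everything in the snippet is an illustration rather than material from the notes):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: a standard example of a symmetric positive kernel."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

# Any finite collection {x_1, ..., x_n} in X.
rng = np.random.default_rng(1)
points = rng.standard_normal((6, 2))
n = len(points)

# The n x n matrix K_ij = K(x_i, x_j): symmetric ...
K = np.array([[gaussian_kernel(points[i], points[j]) for j in range(n)]
              for i in range(n)])
assert np.allclose(K, K.T)

# ... and positive: all eigenvalues nonnegative (up to floating-point roundoff).
eigs = np.linalg.eigvalsh(K)
assert eigs.min() > -1e-10
```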
8 Kernel existence. We now have the reason these are called RKHS: Theorem 1: Given a reproducing kernel Hilbert space H of functions on X, there exists a unique symmetric positive kernel function K(x, y) such that for all f ∈ H, f(x) = ⟨f(·), K(·, x)⟩_H (the inner product above is in the variable ·; x is fixed). Note this means that evaluation of f at x is equivalent to taking the inner product of f with the fixed function K(·, x).
9 Kernel existence. Proof (please look at this on your own): For any fixed x ∈ X, recall e_x is a bounded linear functional on H. By the Riesz representation theorem¹ there exists a fixed function, call it K_x(·), such that for all f ∈ H (recall x is fixed, now f is varying) f(x) = e_x(f) = ⟨f(·), K_x(·)⟩.  (1) (All inner products are in H, not in L², i.e., ⟨f, g⟩ = ⟨f, g⟩_H.) ¹Riesz representation theorem: If φ : H → ℝ is a bounded linear functional on H, there exists a unique y ∈ H such that for all x ∈ H, φ(x) = ⟨y, x⟩.
10 Kernel existence. That is, evaluation of f at x is equivalent to an inner product with the function K_x. Define K(x, y) = K_x(y). Note by (1), the functions K_x(·) and K_y(·) satisfy ⟨K_x(·), K_y(·)⟩ = K_y(x) = K_x(y), so K(x, y) is symmetric.
11 Kernel existence. To prove K(x, y) is positive definite: let {x₁, ..., xₙ} be a fixed collection. If K_ij = K(x_i, x_j), K = (K_ij) is the corresponding matrix, and c = (c₁, ..., cₙ)ᵀ, then
12 Kernel existence. ⟨c, Kc⟩ = cᵀKc = Σ_{i,j=1}^n c_i c_j K(x_i, x_j) = Σ_{i,j=1}^n c_i c_j ⟨K_{x_i}(·), K_{x_j}(·)⟩ = ⟨Σ_{i=1}^n c_i K_{x_i}(·), Σ_{j=1}^n c_j K_{x_j}(·)⟩ = ‖Σ_{i=1}^n c_i K_{x_i}(·)‖²_H ≥ 0.
13 Kernel existence. Definition 4: We call the above kernel K(x, y) the reproducing kernel of H. Definition 5: A Mercer kernel is a positive definite kernel K(x, y) which is also continuous as a function of x and y, and bounded. Definition 6: For a continuous function f on a compact set X we define ‖f‖_∞ = max_{x∈X} |f(x)|.
14 Kernel existence. Theorem 2: (i) For every Mercer kernel K : X × X → ℝ, there exists a unique Hilbert space H of functions on X such that K is its reproducing kernel. (ii) Moreover, this H consists of continuous functions, and for any f ∈ H, ‖f‖_∞ ≤ M_K ‖f‖_H, where M_K = max_{x,y∈X} √K(x, y).
15 Kernel existence. Proof (please look at this on your own): Let K(x, y) : X × X → ℝ be a Mercer kernel. We will construct a reproducing kernel Hilbert space H with reproducing kernel K as follows. Define H₀ = span{K_x(·) : x ∈ X} = {Σ_i c_i K_{x_i}(·) : {x_i} ⊂ X is any finite subset; c_i ∈ ℝ}.
16 Kernel existence. Now we define the inner product ⟨f, g⟩ for f, g ∈ H₀. Assume f(·) = Σ_{i=1}^ℓ a_i K_{x_i}(·), g(·) = Σ_{i=1}^ℓ b_i K_{x_i}(·). [Note we may assume f, g both use the same set {x_i}, since if not we may take a union without loss.] [Note again that here ⟨·, ·⟩ = ⟨·, ·⟩_H.]
17 Kernel existence. Then define ⟨K_x(·), K_y(·)⟩ = K(x, y), so that ⟨f(·), g(·)⟩ = ⟨Σ_{i=1}^ℓ a_i K(x_i, ·), Σ_{j=1}^ℓ b_j K(x_j, ·)⟩ = Σ_{i,j=1}^ℓ a_i b_j ⟨K(x_i, ·), K(x_j, ·)⟩ = Σ_{i,j=1}^ℓ a_i b_j K(x_i, x_j).
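On H₀ this inner product is just a quadratic form in the coefficient vectors: ⟨f, g⟩ = aᵀKb with K the Gram matrix. A small numerical sketch (the Gaussian kernel and all point and coefficient values are invented for illustration), which also checks the reproducing property f(x_j) = ⟨f, K(x_j, ·)⟩ on the generating set:

```python
import numpy as np

def k(x, y, sigma=1.0):
    # Gaussian kernel as a stand-in Mercer kernel (an assumption for illustration).
    return np.exp(-((x - y) ** 2) / (2 * sigma ** 2))

# f = sum_i a_i K(x_i, .) and g = sum_i b_i K(x_i, .) over the same point set.
x = np.array([-1.0, 0.0, 0.5, 2.0])
a = np.array([1.0, -2.0, 0.5, 1.5])
b = np.array([0.3, 1.0, -1.0, 2.0])

K = k(x[:, None], x[None, :])          # Gram matrix K_ij = K(x_i, x_j)
inner_fg = a @ K @ b                   # <f, g> = sum_ij a_i b_j K(x_i, x_j)
assert np.isclose(inner_fg, b @ K @ a) # symmetry of the inner product

# Reproducing property on H_0: <f, K(x_j, .)> recovers f(x_j).
f_vals = K @ a                         # f(x_j) = sum_i a_i K(x_i, x_j)
for j in range(len(x)):
    e_j = np.eye(len(x))[j]            # coefficient vector of K(x_j, .)
    assert np.isclose(a @ K @ e_j, f_vals[j])
```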
18 Kernel existence. It is easy to check that with the above inner product, H₀ is an inner product space (i.e., satisfies properties (a)-(d)). Now form the completion² of this space into the Hilbert space H. Note that for f = Σ_i a_i K_{x_i}(·) as above,
²The completion of a non-complete inner product space H₀ is the (unique) smallest complete inner product (Hilbert) space H which contains H₀. That is, H₀ ⊂ H, the inner product on H₀ is the same as on H, and there is no smaller complete Hilbert space which contains H₀. Example 1: H = ℓ² = {a = (a₁, a₂, ...) | Σ_i a_i² < ∞} with inner product ⟨a, b⟩ = Σ_i a_i b_i was discussed in class. The inner product space H₀ = {(a₁, a₂, ...) ∈ H | all but a finite number of the a_i are 0} is an example of an incomplete space. H is its completion. Example 2: H = L²(−π, π) with the standard inner product for functions. We know if f(x) ∈ H then f(x) = a₀/2 + Σ_{k=1}^∞ a_k cos kx + b_k sin kx. Define H₀ to be all f ∈ H for which the above sum is finite (i.e., all but a finite number of terms are 0). Then H is the completion of H₀.
19 Kernel existence. |f(x)| = |⟨f(·), K(x, ·)⟩| ≤ ‖f(·)‖ ‖K(x, ·)‖ = ‖f‖ √⟨K(x, ·), K(x, ·)⟩ = ‖f‖ √K(x, x)
20 Kernel existence. ≤ M_K ‖f‖_H. [Note again here we write ‖f‖ = ‖f‖_H by definition; similarly ⟨f, g⟩ = ⟨f, g⟩_H.] The above shows that the embedding I : H₀ → C(X) (the latter is the continuous functions on X) is bounded. By this we mean that I maps a function f in H₀ to itself as a function in C(X); in C(X) the norm of f is ‖f‖_∞ = sup_{x∈X} |f(x)|. By bounded we mean that ‖If‖ = ‖f‖_∞ ≤ d‖f‖_H for some constant d > 0.
21 Kernel existence. Thus any Cauchy sequence in H₀ is also Cauchy in C(X), and so it follows easily that the completion H of H₀ exists as a subset of C(X). That K is a reproducing kernel for H follows by approximation from H₀.
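As a numerical sanity check of the bound ‖f‖_∞ ≤ M_K ‖f‖_H for a function in H₀ (kernel, points, and coefficients are invented; for the Gaussian kernel M_K = 1, since K(x, y) attains its maximum value 1 at x = y). The H-norm of f = Σ_i c_i K_{x_i} is computed as √(cᵀKc), per the inner product defined above:

```python
import numpy as np

def k(x, y, sigma=0.7):
    # Gaussian kernel on X = [-1, 1]; an illustrative Mercer kernel.
    return np.exp(-((x - y) ** 2) / (2 * sigma ** 2))

x_pts = np.array([-0.8, -0.2, 0.3, 0.9])
c = np.array([2.0, -1.0, 0.5, -1.5])

K = k(x_pts[:, None], x_pts[None, :])
norm_H = np.sqrt(c @ K @ c)            # ||f||_H^2 = c^T K c for f = sum_i c_i K_{x_i}

# Sup norm of f over a fine grid on X.
grid = np.linspace(-1, 1, 2001)
f_on_grid = k(grid[:, None], x_pts[None, :]) @ c
sup_norm = np.abs(f_on_grid).max()

M_K = 1.0    # max over x, y of sqrt(K(x, y)) for the Gaussian kernel
assert sup_norm <= M_K * norm_H
```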
22 Regularization methods. 3. Regularization methods for choosing f. Finding f from Nf = D = {(x_i, y_i)}_{i=1}^n is an ill-posed problem: the inverse operator N⁻¹ does not exist because N is not one-to-one. Need to combine both: (a) the data Nf; (b) a priori information, e.g. "f is smooth", expressing a preference for smooth over wiggly solutions as seen earlier. How to incorporate? Using Tikhonov regularization methods.
23 Regularization methods. We introduce a regularization loss functional L(f) representing a penalty for the choice of an "unrealistic" f, e.g. a wiggly solution as above. Assume we want to find the correct function f₀(x) from the data Nf₀ = ((x₁, y₁), ..., (xₙ, yₙ)) = D.
24 Regularization methods. Suppose we are given f(x) as a candidate for approximating f₀(x) from the information in D. We score f as a good or bad approximation based on a combination of its error on the known points {x_i}_{i=1}^n, together with its "plausibility", i.e., how low the Lagrangian
ℒ = (1/n) Σ_{i=1}^n V(f(x_i), y_i) + λ L(f)
is. Here V(f(x_i), y_i) is a measure of the loss whenever f(x_i) is far from y_i, e.g. V(f(x_i), y_i) = |f(x_i) − y_i|².
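A toy numerical version of this trade-off, with square loss and a simplified roughness penalty P(f) = ∫ (f'')² dx standing in for L(f) (the grid, data, and candidate functions are all invented for illustration): two candidates that fit the data equally well are separated by the penalty term.

```python
import numpy as np

lam = 0.1                                  # regularization weight (lambda)
x_grid = np.linspace(0.0, 1.0, 1001)
dx = x_grid[1] - x_grid[0]
x_data = np.linspace(0.05, 0.95, 10)
rng = np.random.default_rng(3)
y_data = np.sin(2 * np.pi * x_data) + 0.05 * rng.standard_normal(10)

def lagrangian(f_on_grid):
    # (1/n) sum_i (f(x_i) - y_i)^2  +  lam * integral of (f'')^2 dx
    data_term = np.mean((np.interp(x_data, x_grid, f_on_grid) - y_data) ** 2)
    f2 = np.gradient(np.gradient(f_on_grid, dx), dx)   # second derivative
    penalty = np.sum(f2 ** 2) * dx
    return data_term + lam * penalty

smooth = np.sin(2 * np.pi * x_grid)
wiggly = smooth + 0.2 * np.sin(40 * np.pi * x_grid)    # agrees with smooth at the x_i

# The wiggly candidate fits the data essentially as well,
# but pays a large a priori penalty.
assert lagrangian(smooth) < lagrangian(wiggly)
```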
25 Regularization methods. And L(f) measures the a priori loss, i.e., a measure of the discrepancy between the prospective choice f and our prior expectation about f.
26 Examples: Regularization methods. Examples: L(f) = ‖Df‖²_{L²} = ∫ dx |Df(x)|², where Df = Δf + f; here Δf = ∂²f/∂x₁² + ... + ∂²f/∂x_p². Note Δf, and thus L(f), measures the degree of nonsmoothness that f has (i.e., we prefer smoother functions a priori).
27 Examples: Regularization methods. Example 7: Consider the case L(f) = ‖Df‖²_{L²} above. The norm ‖f‖_H = ‖Df‖_{L²} is a reproducing kernel Hilbert space norm (at least if the dimension d is small). That is, it comes from an inner product ⟨f, g⟩ = ∫_X (Df)(x)(Dg)(x) dx, and with this inner product H is an RKHS. If this is the case, in general things become easier.
28 Examples: Regularization methods. Example 8: Consider the case f = f(x), x ∈ ℝ¹. Suppose we choose Df = d²f/dx² + f = (d²/dx² + 1)f, so that L(f) = ‖Df‖² = ∫ ((d²/dx² + 1)f)² dx, and ‖Df‖² is a measure of the "lack of smoothness" of f.
29 Examples: Regularization methods 4. More about using the Laplacian to measure smoothness (Sobolev smoothness) Basic definitions: Recall the Laplacian operator? on a function 0 on : 0Ð Ñ œ 0ÐB ßáßB Ñ is defined by x " : # # ` `?0 œ 0 á 0Þ `B# `B# " :
30 Using the Laplacian for kernels. For s > 0 an even integer, we can define the Sobolev space H^s by H^s = {f ∈ L²(ℝᵖ) : (1 − Δ)^{s/2} f ∈ L²(ℝᵖ)}, i.e., the functions in L²(ℝᵖ) which are still in L² after the derivative operation (1 − Δ)^{s/2}, i.e., (I − Δ) repeated s/2 times (the operator 1 is always the identity). For f, g ∈ H^s define the new inner product
31 Using the Laplacian for kernels. ⟨f, g⟩_s = ⟨(1 − Δ)^{s/2} f, (1 − Δ)^{s/2} g⟩_{L²}; [note ⟨h(x), k(x)⟩_{L²} = ∫_X h(x) k(x) dx]. One can show that H^s is an RKHS with reproducing kernel
K(z) = F⁻¹ [ 1 / (|ω|² + 1)^s ]   (3)
32 Using the Laplacian for kernels. where F⁻¹ denotes the inverse Fourier transform. The function 1/(|ω|² + 1)^s is a function of ω = (ω₁, ..., ω_p), where |ω|² = ω₁² + ... + ω_p².
33 Using the Laplacian for kernels. Fig. 1: K(z) in one dimension (a smooth kernel).
34 Using the Laplacian for kernels. K(z) is called a radial basis function. Note: the kernel K(x, y) (as a function of two variables) is defined in terms of the above K by K(x, y) = K(x − y).
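One can check (3) numerically in one dimension by inverting the Fourier transform on a grid. A sketch for s = 2 (grid sizes and FFT conventions here are assumptions of this illustration; the closed form (1/4)(1 + |z|)e^{−|z|} is the known one-dimensional inverse transform for s = 2):

```python
import numpy as np

# Numerically invert the Fourier transform in (3), one dimension, s = 2.
s = 2
N = 2 ** 14
L = 200.0                                   # length of the spatial window
z = (np.arange(N) - N // 2) * (L / N)       # spatial grid, centered at 0
omega = 2 * np.pi * np.fft.fftfreq(N, d=L / N)

K_hat = 1.0 / (omega ** 2 + 1.0) ** s       # Fourier transform (|omega|^2 + 1)^(-s)
# Discrete inverse transform with the continuum (1/2pi) d(omega) normalization:
K = np.fft.fftshift(np.fft.ifft(K_hat)).real * (N / L)

# For s = 2 in one dimension the inverse transform has the known closed form
# K(z) = (1/4)(1 + |z|) e^{-|z|}: a smooth, radially symmetric bump.
closed_form = 0.25 * (1.0 + np.abs(z)) * np.exp(-np.abs(z))
assert np.max(np.abs(K - closed_form)) < 1e-6
assert np.argmax(K) == N // 2               # peak at z = 0
```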
35 The Representer Theorem for RKHS. 1. An application: using RKHS for regularization. Assume again we have an unknown function f on X, with only the data Nf = ((x₁, y₁), ..., (xₙ, yₙ)) = D. To find the best guess f̂ for f, approximate it by the minimizer
36 RKHS and regularization.
f̂ = arg min_{f ∈ H^s} [ (1/n) Σ_{i=1}^n |f(x_i) − y_i|² + λ‖f‖²_{H^s} ]   (1)
where λ > 0 can be some constant. Note we are finding an f which balances minimizing Σ_{i=1}^n |f(x_i) − y_i|², i.e., the data error, with minimizing ‖f‖²_{H^s}, i.e., maximizing the smoothness. The solution to such a problem will look like this:
37 RKHS and regularization It will compromise between fitting the data (which may have error) and trying to be smooth.
38 RKHS and regularization. The amazing thing: f̂ can be found explicitly using radial basis functions. 2. Solving the minimization. Now consider the optimization problem (1). We claim that we can solve it explicitly. To see how this works in general for RKHS, return to the general problem. Given an unknown f ∈ H = RKHS, try to find the "best" approximation f̂ to f fitting the data
39 RKHS and regularization. Nf = ((x₁, y₁), ..., (xₙ, yₙ)), but ALSO satisfying the a priori knowledge that ‖f₀‖_H is small. Specifically, we want to find
f̂ = arg min_{f ∈ H} [ (1/n) Σ_{i=1}^n V(f(x_i), y_i) + λ‖f‖²_H ]   (2)
Note we can have, e.g., V(f(x_i), y_i) = (f(x_i) − y_i)².
40 RKHS and regularization. In that case (2) reduces to the earlier problem (1). Consider the general case (2), with an arbitrary error measure V. We have the
41 RKHS and regularization. Representer theorem: A solution of the Tikhonov optimization problem (2) can be written
f̂(x) = Σ_{i=1}^n a_i K(x, x_i),   (3)
where K is the reproducing kernel of the RKHS H. This is an important theorem: it says we only need to find a set of n numbers a_i to optimize the infinite-dimensional problem (2) above.
42 RKHS and regularization. Proof: Use the calculus of variations. If a minimizer f₁ exists, then for all g ∈ H, assuming that the derivatives with respect to ε exist:
43 Representer theorem proof.
0 = d/dε [ (1/n) Σ_{i=1}^n V((f₁ + εg)(x_i), y_i) + λ‖f₁ + εg‖²_H ] |_{ε=0}
= (1/n) Σ_{i=1}^n (∂V/∂f₁(x_i))(f₁(x_i), y_i) g(x_i) + λ d/dε [ ⟨f₁, f₁⟩ + 2ε⟨f₁, g⟩ + ε²⟨g, g⟩ ] |_{ε=0}
44 Representer theorem proof.
= (1/n) Σ_{i=1}^n V₁(f₁(x_i), y_i) g(x_i) + 2λ⟨f₁, g⟩,
where V₁(a, b) = (∂/∂a)V(a, b) and all inner products are in H. Since the above is true for all g ∈ H, it follows that if we let g = K_x we get:
45 Representer theorem proof.
0 = (1/n) Σ_{i=1}^n V₁(f₁(x_i), y_i) K_x(x_i) + 2λ⟨f₁, K_x⟩
= (1/n) Σ_{i=1}^n V₁(f₁(x_i), y_i) K_{x_i}(x) + 2λ f₁(x),
or
f₁(x) = −(1/(2λn)) Σ_{i=1}^n V₁(f₁(x_i), y_i) K(x, x_i).
46 Representer theorem proof. Thus if a minimizer f̂ = f₁ exists for (2), it can be written in the form (3) as claimed, with a_i = −(1/(2λn)) V₁(f₁(x_i), y_i). Note that this does not solve the problem, since the a_i are expressed in terms of the solution itself. But it does reduce the possibilities for what a solution looks like.
More informationAN IDENTIFICATION ALGORITHM FOR ARMAX SYSTEMS
AN IDENTIFICATION ALGORITHM FOR ARMAX SYSTEMS First the X, then the AR, finally the MA Jan C. Willems, K.U. Leuven Workshop on Observation and Estimation Ben Gurion University, July 3, 2004 p./2 Joint
More informationSupport Vector Machines Explained
December 23, 2008 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),
More informationBinomial Randomization 3/13/02
Binomial Randomization 3/13/02 Define to be the product space 0,1, á, m â 0,1, á, m and let be the set of all vectors Ðs, á,s Ñ in such that s á s t. Suppose that \, á,\ is a multivariate hypergeometric
More informationKernels and the Kernel Trick. Machine Learning Fall 2017
Kernels and the Kernel Trick Machine Learning Fall 2017 1 Support vector machines Training by maximizing margin The SVM objective Solving the SVM optimization problem Support vectors, duals and kernels
More informationOverview of normed linear spaces
20 Chapter 2 Overview of normed linear spaces Starting from this chapter, we begin examining linear spaces with at least one extra structure (topology or geometry). We assume linearity; this is a natural
More informationEXCERPTS FROM ACTEX CALCULUS REVIEW MANUAL
EXCERPTS FROM ACTEX CALCULUS REVIEW MANUAL Table of Contents Introductory Comments SECTION 6 - Differentiation PROLEM SET 6 TALE OF CONTENTS INTRODUCTORY COMMENTS Section 1 Set Theory 1 Section 2 Intervals,
More informationKernel Methods. Machine Learning A W VO
Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance
More informationÆ Å not every column in E is a pivot column (so EB œ! has at least one free variable) Æ Å E has linearly dependent columns
ßÞÞÞß @ is linearly independent if B" @" B# @# ÞÞÞ B: @: œ! has only the trivial solution EB œ! has only the trivial solution Ðwhere E œ Ò@ @ ÞÞÞ@ ÓÑ every column in E is a pivot column E has linearly
More informationExample: A Markov Process
Example: A Markov Process Divide the greater metro region into three parts: city such as St. Louis), suburbs to include such areas as Clayton, University City, Richmond Heights, Maplewood, Kirkwood,...)
More informationOnline Gradient Descent Learning Algorithms
DISI, Genova, December 2006 Online Gradient Descent Learning Algorithms Yiming Ying (joint work with Massimiliano Pontil) Department of Computer Science, University College London Introduction Outline
More informationRelations. Relations occur all the time in mathematics. For example, there are many relations between # and % À
Relations Relations occur all the time in mathematics. For example, there are many relations between and % À Ÿ% % Á% l% Ÿ,, Á ß and l are examples of relations which might or might not hold between two
More informationConstructive Decision Theory
Constructive Decision Theory Joe Halpern Cornell University Joint work with Larry Blume and David Easley Economics Cornell Constructive Decision Theory p. 1/2 Savage s Approach Savage s approach to decision
More informationLecture 10: Support Vector Machine and Large Margin Classifier
Lecture 10: Support Vector Machine and Large Margin Classifier Applied Multivariate Analysis Math 570, Fall 2014 Xingye Qiao Department of Mathematical Sciences Binghamton University E-mail: qiao@math.binghamton.edu
More informationOC330C. Wiring Diagram. Recommended PKH- P35 / P50 GALH PKA- RP35 / RP50. Remarks (Drawing No.) No. Parts No. Parts Name Specifications
G G " # $ % & " ' ( ) $ * " # $ % & " ( + ) $ * " # C % " ' ( ) $ * C " # C % " ( + ) $ * C D ; E @ F @ 9 = H I J ; @ = : @ A > B ; : K 9 L 9 M N O D K P D N O Q P D R S > T ; U V > = : W X Y J > E ; Z
More information2 Hallén s integral equation for the thin wire dipole antenna
Ú Ð Ð ÓÒÐ Ò Ø ØØÔ»» Ѻ Ö Ùº º Ö ÁÒغ º ÁÒ Ù ØÖ Ð Å Ø Ñ Ø ÎÓк ÆÓº ¾ ¾¼½½µ ½ ¹½ ¾ ÆÙÑ Ö Ð Ñ Ø Ó ÓÖ Ò ÐÝ Ó Ö Ø ÓÒ ÖÓÑ Ø Ò Û Ö ÔÓÐ ÒØ ÒÒ Ëº À Ø ÑÞ ¹Î ÖÑ ÞÝ Ö Åº Æ Ö¹ÅÓ Êº Ë Þ ¹Ë Ò µ Ô ÖØÑ ÒØ Ó Ð ØÖ Ð Ò Ò
More informationPlanning for Reactive Behaviors in Hide and Seek
University of Pennsylvania ScholarlyCommons Center for Human Modeling and Simulation Department of Computer & Information Science May 1995 Planning for Reactive Behaviors in Hide and Seek Michael B. Moore
More informationChapter 9. Support Vector Machine. Yongdai Kim Seoul National University
Chapter 9. Support Vector Machine Yongdai Kim Seoul National University 1. Introduction Support Vector Machine (SVM) is a classification method developed by Vapnik (1996). It is thought that SVM improved
More informationMultinomial Allocation Model
Multinomial Allocation Model Theorem 001 Suppose an experiment consists of independent trials and that every trial results in exactly one of 8 distinct outcomes (multinomial trials) Let : equal the probability
More information(Kernels +) Support Vector Machines
(Kernels +) Support Vector Machines Machine Learning Torsten Möller Reading Chapter 5 of Machine Learning An Algorithmic Perspective by Marsland Chapter 6+7 of Pattern Recognition and Machine Learning
More informationThe Transpose of a Vector
8 CHAPTER Vectors The Transpose of a Vector We now consider the transpose of a vector in R n, which is a row vector. For a vector u 1 u. u n the transpose is denoted by u T = [ u 1 u u n ] EXAMPLE -5 Find
More informationDr. Nestler - Math 2 - Ch 3: Polynomial and Rational Functions
Dr. Nestler - Math 2 - Ch 3: Polynomial and Rational Functions 3.1 - Polynomial Functions We have studied linear functions and quadratic functions Defn. A monomial or power function is a function of the
More informationUniversal Multi-Task Kernels
Journal of Machine Learning Research 9 (2008) 1615-1646 Submitted 7/07; Revised 1/08; Published 7/08 Universal Multi-Task Kernels Andrea Caponnetto Department of Mathematics City University of Hong Kong
More informationQUESTIONS ON QUARKONIUM PRODUCTION IN NUCLEAR COLLISIONS
International Workshop Quarkonium Working Group QUESTIONS ON QUARKONIUM PRODUCTION IN NUCLEAR COLLISIONS ALBERTO POLLERI TU München and ECT* Trento CERN - November 2002 Outline What do we know for sure?
More informationMIT 9.520/6.860, Fall 2017 Statistical Learning Theory and Applications. Class 19: Data Representation by Design
MIT 9.520/6.860, Fall 2017 Statistical Learning Theory and Applications Class 19: Data Representation by Design What is data representation? Let X be a data-space X M (M) F (M) X A data representation
More information10-701/ Recitation : Kernels
10-701/15-781 Recitation : Kernels Manojit Nandi February 27, 2014 Outline Mathematical Theory Banach Space and Hilbert Spaces Kernels Commonly Used Kernels Kernel Theory One Weird Kernel Trick Representer
More informationThéorie Analytique des Probabilités
Théorie Analytique des Probabilités Pierre Simon Laplace Book II 5 9. pp. 203 228 5. An urn being supposed to contain the number B of balls, e dra from it a part or the totality, and e ask the probability
More information9.520: Class 20. Bayesian Interpretations. Tomaso Poggio and Sayan Mukherjee
9.520: Class 20 Bayesian Interpretations Tomaso Poggio and Sayan Mukherjee Plan Bayesian interpretation of Regularization Bayesian interpretation of the regularizer Bayesian interpretation of quadratic
More informationThe Heat Equation John K. Hunter February 15, The heat equation on a circle
The Heat Equation John K. Hunter February 15, 007 The heat equation on a circle We consider the diffusion of heat in an insulated circular ring. We let t [0, ) denote time and x T a spatial coordinate
More informationCellular Automaton Growth on # : Theorems, Examples, and Problems
Cellular Automaton Growth on : Theorems, Examples, and Problems (Excerpt from Advances in Applied Mathematics) Exactly 1 Solidification We will study the evolution starting from a single occupied cell
More informationEngineering Mathematics (E35 317) Final Exam December 15, 2006
Engineering Mathematics (E35 317) Final Exam December 15, 2006 This exam contains six free-resonse roblems orth 36 oints altogether, eight short-anser roblems orth one oint each, seven multile-choice roblems
More information