11 KERNEL METHODS


Many who have never had an opportunity of knowing any more about mathematics confuse it with arithmetic, and consider it an arid science. In reality, however, it is a science which requires a great amount of imagination. -- Sofia Kovalevskaya

Learning Objectives:
- Explain how kernels generalize both feature combinations and basis functions.
- Contrast dot products with kernel products.
- Implement kernelized perceptron.
- Derive a kernelized version of regularized least squares regression.
- Implement a kernelized version of the perceptron.
- Derive the dual formulation of the support vector machine.

Dependencies:

Linear models are great because they are easy to understand and easy to optimize. They suffer because they can only learn very simple decision boundaries. Neural networks can learn more complex decision boundaries, but lose the nice convexity properties of many linear models. One way of getting a linear model to behave non-linearly is to transform the input, for instance by adding feature pairs as additional inputs. Learning a linear model on such a representation is convex, but is computationally prohibitive in all but very low dimensional spaces. You might ask: instead of explicitly expanding the feature space, is it possible to stay with our original data representation and do all the feature blow up implicitly? Surprisingly, the answer is often yes, and the family of techniques that makes this possible is known as kernel approaches.

11.1 From Feature Combinations to Kernels

In Section 5.4, you learned one method for increasing the expressive power of linear models: explode the feature space. For instance, a quadratic feature explosion might map a feature vector x = ⟨x_1, x_2, x_3, ..., x_D⟩ to an expanded version denoted φ(x):

φ(x) = ⟨1, √2 x_1, √2 x_2, √2 x_3, ..., √2 x_D,
        x_1², x_1 x_2, x_1 x_3, ..., x_1 x_D,
        x_2 x_1, x_2², x_2 x_3, ..., x_2 x_D,
        x_3 x_1, x_3 x_2, x_3², ..., x_3 x_D,
        ...,
        x_D x_1, x_D x_2, x_D x_3, ..., x_D²⟩   (11.1)

(Note that there are repetitions here, but hopefully most learning algorithms can deal well with redundant features; in particular, the √2 x_1 terms are due to collapsing some repetitions.)

You could then train a classifier on this expanded feature space. There are two primary concerns in doing so. The first is computational: if your learning algorithm scales linearly in the number of features, then you've just squared the amount of computation you need to perform; you've also squared the amount of memory you'll need. The second is statistical: if you go by the heuristic that you should have about two examples for every feature, then you will now need quadratically many training examples in order to avoid overfitting.

This chapter is all about dealing with the computational issue. It will turn out in Chapter 12 that you can also deal with the statistical issue: for now, you can just hope that regularization will be sufficient to attenuate overfitting.

The key insight in kernel-based learning is that you can rewrite many linear models in a way that doesn't require you to ever explicitly compute φ(x). To start with, you can think of this purely as a computational trick that enables you to use the power of a quadratic feature mapping without actually having to compute and store the mapped vectors. Later, you will see that it's actually quite a bit deeper. Most algorithms we discuss involve a product of the form w · φ(x), after performing the feature mapping. The goal is to rewrite these algorithms so that they only ever depend on dot products between two examples, say x and z; namely, they depend on φ(x) · φ(z). To understand why this is helpful, consider the quadratic expansion from above, and the dot-product between two vectors. You get:

φ(x) · φ(z) = 1 + 2x_1 z_1 + 2x_2 z_2 + ... + 2x_D z_D + x_1² z_1² + x_1 x_2 z_1 z_2 + ... + x_1 x_D z_1 z_D + ... + x_D x_1 z_D z_1 + x_D x_2 z_D z_2 + ... + x_D² z_D²   (11.2)
            = 1 + 2 Σ_d x_d z_d + Σ_d Σ_e x_d x_e z_d z_e   (11.3)
            = 1 + 2 x·z + (x·z)²   (11.4)
            = (1 + x·z)²   (11.5)

Thus, you can compute φ(x) · φ(z) in exactly the same amount of time as you can compute x·z (plus the time it takes to perform an addition and a multiply, about 0.02 nanoseconds on a circa 2011 processor).

The rest of the practical challenge is to rewrite your algorithms so that they only depend on dot products between examples and not on any explicit weight vectors.
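To see the bookkeeping concretely, here is a quick numerical check of Eq (11.5), a minimal NumPy sketch (the function name phi and the toy data are our choices, not the book's): the explicit O(D²)-dimensional map and the implicit O(D) kernel evaluation agree.

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map of Eq (11.1): 1, sqrt(2)*x_d, and all x_d*x_e."""
    D = len(x)
    feats = [1.0] + [np.sqrt(2.0) * xd for xd in x]
    feats += [x[d] * x[e] for d in range(D) for e in range(D)]
    return np.array(feats)

rng = np.random.default_rng(0)
x, z = rng.normal(size=5), rng.normal(size=5)

explicit = phi(x) @ phi(z)       # O(D^2) work through the expanded features
implicit = (1.0 + x @ z) ** 2    # O(D) work through the kernel, Eq (11.5)
print(np.isclose(explicit, implicit))   # True
```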

11.2 Kernelized Perceptron

Consider the original perceptron algorithm from Chapter 4, repeated in Algorithm 29 using linear algebra notation and using feature expansion notation φ(x). In this algorithm, there are two places where φ(x) is used explicitly. The first is in computing the activation (line 4) and the second is in updating the weights (line 6).

Algorithm 29 PerceptronTrain(D, MaxIter)
1:  w ← 0, b ← 0                       // initialize weights and bias
2:  for iter = 1 ... MaxIter do
3:    for all (x, y) ∈ D do
4:      a ← w · φ(x) + b               // compute activation for this example
5:      if ya ≤ 0 then
6:        w ← w + y φ(x)               // update weights
7:        b ← b + y                    // update bias
8:      end if
9:    end for
10: end for
11: return w, b

The goal is to remove the explicit dependence of this algorithm on φ and on the weight vector. To do so, you can observe that at any point in the algorithm, the weight vector w can be written as a linear combination of expanded training data. In particular, at any point, w = Σ_n α_n φ(x_n) for some parameters α_n. Initially, w = 0, so choosing α = 0 yields this. If the first update occurs on the nth training example, then the resulting weight vector is simply y_n φ(x_n), which is equivalent to setting α_n = y_n. If the second update occurs on the nth training example, then all you need to do is update α_n ← α_n + y_n. This is true, even if you make multiple passes over the data. This observation leads to the following representer theorem, which states that the weight vector of the perceptron lies in the span of the training data.

MATH REVIEW | SPANS: If U = {u_i}_{i=1}^I is a set of vectors in R^D, then the span of U is the set of vectors that can be written as linear combinations of the u_i's; namely: span(U) = { Σ_i a_i u_i : a_1 ∈ R, ..., a_I ∈ R }. If all of the u_i's are linearly independent, then the dimension of span(U) is I; in particular, if there are D-many linearly independent vectors then they span R^D.

Theorem 12 (Perceptron Representer Theorem). During a run of the perceptron algorithm, the weight vector w is always in the span of the (assumed non-empty) training data, φ(x_1), ..., φ(x_N).

Proof of Theorem 12. By induction. Base case: the span of any non-empty set contains the zero vector, which is the initial weight vector. Inductive case: suppose that the theorem is true before the kth update, and suppose that the kth update happens on example n. By the inductive hypothesis, you can write w = Σ_i α_i φ(x_i) before the update. The new weight vector is [Σ_i α_i φ(x_i)] + y_n φ(x_n) = Σ_i (α_i + y_n [i = n]) φ(x_i), which is still in the span of the training data.

Now that you know that you can always write w = Σ_n α_n φ(x_n) for some α_n's, you can additionally compute the activations (line 4) as:

w · φ(x) + b = ( Σ_n α_n φ(x_n) ) · φ(x) + b      definition of w   (11.6)
             = Σ_n α_n [ φ(x_n) · φ(x) ] + b      dot products are linear   (11.7)

This now depends only on dot-products between data points, and never explicitly requires a weight vector. You can now rewrite the entire perceptron algorithm so that it never refers explicitly to the weights and only ever depends on pairwise dot products between examples. This is shown in Algorithm 30.

Algorithm 30 KernelizedPerceptronTrain(D, MaxIter)
1:  α ← 0, b ← 0                               // initialize coefficients and bias
2:  for iter = 1 ... MaxIter do
3:    for all (x_n, y_n) ∈ D do
4:      a ← Σ_m α_m φ(x_m) · φ(x_n) + b        // compute activation for this example
5:      if y_n a ≤ 0 then
6:        α_n ← α_n + y_n                      // update coefficients
7:        b ← b + y_n                          // update bias
8:      end if
9:    end for
10: end for
11: return α, b

The advantage to this "kernelized" algorithm is that you can perform feature expansions like the quadratic feature expansion from the introduction for free. For example, for exactly the same cost as the quadratic features, you can use a cubic feature map, computed as φ(x) · φ(z) = (1 + x·z)³, which corresponds to three-way interactions between variables. (And, in general, you can do so for any polynomial degree p at the same computational complexity.)
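Algorithm 30 is short enough to transcribe directly. The following is a minimal sketch in NumPy (the function names and the Gram-matrix precomputation are our choices, not the book's); it assumes labels in {−1, +1} and any kernel function K(x, z).

```python
import numpy as np

def kernelized_perceptron_train(X, y, kernel, max_iter=100):
    """Kernelized perceptron training (a sketch of Algorithm 30).

    X: (N, D) array; y: (N,) labels in {-1, +1}; kernel: function K(x, z)."""
    N = len(X)
    alpha, b = np.zeros(N), 0.0
    # Precompute the Gram matrix so each activation is a single dot product.
    K = np.array([[kernel(xm, xn) for xn in X] for xm in X])
    for _ in range(max_iter):
        for n in range(N):
            a = alpha @ K[:, n] + b     # activation: sum_m alpha_m K(x_m, x_n) + b
            if y[n] * a <= 0:           # mistake on example n
                alpha[n] += y[n]        # update coefficient
                b += y[n]               # update bias
    return alpha, b

def kernelized_perceptron_predict(alpha, b, X_train, kernel, x):
    """Predict a label for x using the expansion w = sum_n alpha_n phi(x_n)."""
    return np.sign(sum(a * kernel(xn, x) for a, xn in zip(alpha, X_train)) + b)

# A cubic polynomial kernel: same cost to evaluate as a plain dot product.
cubic = lambda x, z: (1.0 + x @ z) ** 3
```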

11.3 Kernelized K-means

For a complete change of pace, consider the K-means algorithm from Section 3. This algorithm is for clustering, where there is no notion of training labels. Instead, you want to partition the data into coherent clusters. For data in R^D, it involves randomly initializing K-many cluster means µ^(1), ..., µ^(K). The algorithm then alternates between the following two steps until convergence, with x replaced by φ(x) since that is the eventual goal:

1. For each example n, set cluster label z_n = argmin_k ||φ(x_n) − µ^(k)||².
2. For each cluster k, update µ^(k) = (1/N_k) Σ_{n : z_n = k} φ(x_n), where N_k is the number of n with z_n = k.

The question is whether you can perform these steps without explicitly computing φ(x_n). The representer theorem is more straightforward here than in the perceptron. The mean of a set of data is, almost by definition, in the span of that data (choose the a_i's all to be equal to 1/N). Thus, so long as you initialize the means in the span of the data, you are guaranteed always to have the means in the span of the data. Given this, you know that you can write each mean as an expansion of the data; say that µ^(k) = Σ_n α_n^(k) φ(x_n) for some parameters α_n^(k) (there are N×K-many such parameters). Given this expansion, in order to execute step (1), you need to compute norms. This can be done as follows:

z_n = argmin_k ||φ(x_n) − µ^(k)||²                                            definition of z_n   (11.8)
    = argmin_k ||φ(x_n) − Σ_m α_m^(k) φ(x_m)||²                               definition of µ^(k)   (11.9)
    = argmin_k ||φ(x_n)||² + ||Σ_m α_m^(k) φ(x_m)||² − 2 φ(x_n) · [Σ_m α_m^(k) φ(x_m)]   expand quadratic term   (11.10)
    = argmin_k Σ_m Σ_{m'} α_m^(k) α_{m'}^(k) φ(x_m) · φ(x_{m'}) − 2 Σ_m α_m^(k) φ(x_n) · φ(x_m) + const   linearity and constant   (11.11)

This computation can replace the assignments in step (1) of K-means. The mean updates are more direct in step (2):

µ^(k) = (1/N_k) Σ_{n : z_n = k} φ(x_n)   ⟺   α_n^(k) = 1/N_k if z_n = k, and 0 otherwise   (11.12)
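The two steps translate directly into Gram-matrix operations. Below is a minimal vectorized sketch (our own rendering, with hypothetical function names): kernel_kmeans_assign implements Eq (11.11), dropping the ||φ(x_n)||² term since it is constant across clusters, and kernel_kmeans_update implements Eq (11.12).

```python
import numpy as np

def kernel_kmeans_assign(K, alpha):
    """One assignment step of kernelized K-means, per Eq (11.11).

    K: (N, N) Gram matrix; alpha: (C, N) mean-expansion coefficients.
    Returns z: (N,) cluster labels."""
    cross = alpha @ K                                  # cross[k, n] = sum_m alpha[k,m] K[m,n]
    quad = np.einsum('km,kn,mn->k', alpha, alpha, K)   # quad[k] = ||mu^(k)||^2 in feature space
    dists = quad[:, None] - 2.0 * cross                # K[n,n] is constant over k, so omitted
    return np.argmin(dists, axis=0)

def kernel_kmeans_update(z, n_clusters):
    """Mean update, per Eq (11.12): alpha[k, m] = 1/N_k if z_m == k, else 0."""
    alpha = np.zeros((n_clusters, len(z)))
    for k in range(n_clusters):
        members = (z == k)
        if members.any():
            alpha[k, members] = 1.0 / members.sum()
    return alpha
```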

11.4 What Makes a Kernel

A kernel is just a form of generalized dot product. You can also think of it as simply shorthand for φ(x) · φ(z), which is commonly written K_φ(x, z). Or, when φ is clear from context, simply K(x, z). This is often referred to as the kernel product between x and z (under the mapping φ).

In this view, what you've seen in the preceding two sections is that you can rewrite both the perceptron algorithm and the K-means algorithm so that they only ever depend on kernel products between data points, and never on the actual data points themselves. This is a very powerful notion, as it has enabled the development of a large number of non-linear algorithms essentially for free (by applying the so-called kernel trick, which you've just seen twice).

This raises an interesting question. If you have rewritten these algorithms so that they only depend on the data through a function K : X × X → R, can you stick any function K in these algorithms, or are there some K that are forbidden? In one sense, you could use any K, but the real question is: for what types of functions K do these algorithms retain the properties that we expect them to have (like convergence, optimality, etc.)?

One way to answer this question is to say that K(·, ·) is a valid kernel if it corresponds to the inner product between two vectors. That is, K is valid if there exists a function φ such that K(x, z) = φ(x) · φ(z). This is a direct definition and it should be clear that if K satisfies this, then the algorithms go through as expected (because this is how we derived them).

You've already seen the general class of polynomial kernels, which have the form:

K_d^(poly)(x, z) = (1 + x·z)^d   (11.13)

where d is a hyperparameter of the kernel. These kernels correspond to polynomial feature expansions.

There is an alternative characterization of a valid kernel function that is more mathematical. It states that K : X × X → R is a kernel if K is positive semi-definite (or, in shorthand, psd). This property is also sometimes called Mercer's condition. In this context, this means that for all functions f that are square integrable (i.e., ∫ f(x)² dx < ∞), other than the zero function, the following property holds:

∫∫ f(x) K(x, z) f(z) dx dz > 0   (11.14)

This likely seems like it came out of nowhere. Unfortunately, the connection is well beyond the scope of this book, but is covered well in external sources. For now, simply take it as a given that this is an equivalent requirement. (For those so inclined, the appendix of this book gives a proof, but it requires a bit of knowledge of function spaces to understand.)
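A useful finite-sample consequence of this condition: for a valid kernel, the N×N Gram matrix with entries K(x_n, x_m) over any dataset is positive semi-definite, so all of its eigenvalues are non-negative. This is easy to check numerically; the sketch below (our construction, not from the book) does so for polynomial and RBF kernels on random data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))            # 50 random points in R^4

def gram(kernel, X):
    """Gram matrix with entries K[n, m] = kernel(x_n, x_m)."""
    return np.array([[kernel(xn, xm) for xm in X] for xn in X])

poly = lambda x, z: (1.0 + x @ z) ** 3                    # Eq (11.13) with d = 3
rbf = lambda x, z: np.exp(-0.5 * np.sum((x - z) ** 2))    # Eq (11.18) with gamma = 0.5

for name, k in [("poly", poly), ("rbf", rbf)]:
    eigs = np.linalg.eigvalsh(gram(k, X))
    # For a valid kernel, the smallest eigenvalue is nonnegative
    # up to floating-point error (measured relative to the largest).
    print(name, eigs.min() >= -1e-10 * eigs.max())   # True for both
```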

The question is: why is this alternative characterization useful? It is useful because it gives you an alternative way to construct kernel functions. For instance, using it you can easily prove the following, which would be difficult from the definition of kernels as inner products after feature mappings.

Theorem 13 (Kernel Addition). If K_1 and K_2 are kernels, then K defined by K(x, z) = K_1(x, z) + K_2(x, z) is also a kernel.

Proof of Theorem 13. You need to verify the positive semi-definite property on K. You can do this as follows:

∫∫ f(x) K(x, z) f(z) dx dz
  = ∫∫ f(x) [K_1(x, z) + K_2(x, z)] f(z) dx dz   (11.15)   definition of K
  = ∫∫ f(x) K_1(x, z) f(z) dx dz + ∫∫ f(x) K_2(x, z) f(z) dx dz   (11.16)   distributive rule
  > 0   (11.17)   K_1 and K_2 are psd

More generally, any positive linear combination of kernels is still a kernel. Specifically, if K_1, ..., K_M are all kernels, and α_1, ..., α_M ≥ 0, then K(x, z) = Σ_m α_m K_m(x, z) is also a kernel.

You can also use this property to show that the following Gaussian kernel (also called the RBF kernel) is also psd:

K_γ^(RBF)(x, z) = exp( −γ ||x − z||² )   (11.18)

Here γ is a hyperparameter that controls the width of these Gaussian-like bumps. To gain an intuition for what the RBF kernel is doing, consider what prediction looks like in the perceptron:

f(x̂) = Σ_n α_n K(x_n, x̂) + b   (11.19)
     = Σ_n α_n exp( −γ ||x_n − x̂||² ) + b   (11.20)

In this computation, each training example is getting to "vote" on the label of the test point x̂. The amount of "vote" that the nth training example gets is proportional to the negative exponential of the distance between the test point and itself. This is very much like an RBF neural network, in which there is a Gaussian bump at each training example, with variance 1/(2γ), and where the α_n's act as the weights connecting these RBF bumps to the output.

Showing that this kernel is positive definite is a bit of an exercise in analysis (particularly, integration by parts), but otherwise not difficult. Again, the proof is provided in the appendix.

So far, you have seen two basic classes of kernels: polynomial kernels (K(x, z) = (1 + x·z)^d), which include the linear kernel (K(x, z) = x·z), and RBF kernels (K(x, z) = exp[−γ ||x − z||²]). The former have a direct connection to feature expansion; the latter to RBF networks. You also know how to combine kernels to get new kernels by addition. In fact, you can do more than that: the product of two kernels is also a kernel.

As far as a library of kernels goes, there are many. Polynomial and RBF are by far the most popular. A commonly used, but technically invalid, kernel is the hyperbolic-tangent kernel, which mimics the behavior of a two-layer neural network. It is defined as:

K^(tanh)(x, z) = tanh(1 + x·z)    Warning: not psd   (11.21)

A final example, which is not very common, but is nonetheless interesting, is the all-subsets kernel. Suppose that your D features are all binary: all take values 0 or 1. Let A ⊆ {1, 2, ..., D} be a subset of features, and let f_A(x) = Π_{d∈A} x_d be the conjunction of all the features in A. Let φ(x) be a feature vector over all such A's, so that there are 2^D features in the vector φ. You can compute the kernel associated with this feature mapping as:

K^(subs)(x, z) = Π_d (1 + x_d z_d)   (11.22)

Verifying the relationship between this kernel and the all-subsets feature mapping is left as an exercise (but closely resembles the expansion for the quadratic kernel).
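The exercise can at least be checked numerically. The sketch below is our code (phi_subsets enumerates one conjunction feature per subset A, with the empty subset as the constant feature 1); it confirms Eq (11.22) against the explicit 2^D-dimensional map for small D.

```python
import itertools
import numpy as np

def phi_subsets(x):
    """Explicit all-subsets feature map: one conjunction f_A(x) per subset A
    of {0, ..., D-1}; the empty subset gives the constant feature 1."""
    D = len(x)
    feats = []
    for r in range(D + 1):
        for A in itertools.combinations(range(D), r):
            feats.append(np.prod([x[d] for d in A]) if A else 1.0)
    return np.array(feats)

def k_subsets(x, z):
    """Implicit version, Eq (11.22): a product of D terms instead of 2^D features."""
    return np.prod(1.0 + x * z)

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=6).astype(float)    # binary feature vectors
z = rng.integers(0, 2, size=6).astype(float)
print(np.isclose(phi_subsets(x) @ phi_subsets(z), k_subsets(x, z)))   # True
```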

11.5 Support Vector Machines

Kernelization predated support vector machines, but SVMs are definitely the model that popularized the idea. Recall the definition of the soft-margin SVM from Chapter 8.7 and in particular the optimization problem (8.38), which attempts to balance a large margin (small ||w||²) with a small loss (small ξ_n's, where ξ_n is the slack on the nth training example). This problem is repeated below:

min_{w,b,ξ}  ½ ||w||² + C Σ_n ξ_n   (11.23)
subj. to  y_n (w · x_n + b) ≥ 1 − ξ_n   (∀n)
          ξ_n ≥ 0   (∀n)

Previously, you optimized this by explicitly computing the slack variables ξ_n, given a solution to the decision boundary, w and b. However, you are now an expert with using Lagrange multipliers to optimize constrained problems! The overall goal is going to be to rewrite the SVM optimization problem in a way that it no longer explicitly depends on the weights w and only depends on the examples x_n through kernel products.

There are 2N constraints in this optimization, one for each slack constraint and one for the requirement that the slacks are non-negative. Unlike the last time, these constraints are now inequalities, which require a slightly different solution. First, you rewrite all the inequalities so that they read as something ≥ 0 and then add corresponding Lagrange multipliers. The main difference is that the Lagrange multipliers are now constrained to be non-negative, and their sign in the augmented objective function matters. The second set of constraints is already in the proper form; the first set can be rewritten as y_n (w · x_n + b) − 1 + ξ_n ≥ 0. You're now ready to construct the Lagrangian, using multipliers α_n for the first set of constraints and β_n for the second set:

L(w, b, ξ, α, β) = ½ ||w||² + C Σ_n ξ_n − Σ_n β_n ξ_n   (11.24)
                   − Σ_n α_n [ y_n (w · x_n + b) − 1 + ξ_n ]   (11.25)

The new optimization problem is:

min_{w,b,ξ} max_{α≥0} max_{β≥0} L(w, b, ξ, α, β)   (11.26)

The intuition is exactly the same as before. If you are able to find a solution that satisfies the constraints (e.g., the slack ξ_n is properly non-negative), then the β_n's cannot do anything to hurt the solution. On the other hand, if ξ_n is negative, then the corresponding β_n can go to +∞, breaking the solution.

You can solve this problem by taking gradients. This is a bit tedious, but an important step to realize how everything fits together. Since your goal is to remove the dependence on w, the first step is to take a gradient with respect to w, set it equal to zero, and solve for w in terms of the other variables:

∇_w L = w − Σ_n α_n y_n x_n = 0   ⟺   w = Σ_n α_n y_n x_n   (11.27)

At this point, you should immediately recognize a similarity to the kernelized perceptron: the optimal weight vector takes exactly the same form in both algorithms.

You can now take this new expression for w and plug it back in to the expression for L, thus removing w from consideration.

To avoid subscript overloading, you should replace the index n in the expression for w with, say, m. This yields:

L(b, ξ, α, β) = ½ || Σ_m α_m y_m x_m ||² + C Σ_n ξ_n − Σ_n β_n ξ_n   (11.28)
                − Σ_n α_n [ y_n ( ( Σ_m α_m y_m x_m ) · x_n + b ) − 1 + ξ_n ]   (11.29)

At this point, it's convenient to rewrite these terms; be sure you understand where the following comes from:

L(b, ξ, α, β) = ½ Σ_m Σ_n α_m α_n y_m y_n x_m · x_n + Σ_n (C − β_n) ξ_n   (11.30)
                − Σ_m Σ_n α_m α_n y_m y_n x_m · x_n − Σ_n α_n (y_n b − 1 + ξ_n)   (11.31)
              = −½ Σ_m Σ_n α_m α_n y_m y_n x_m · x_n + Σ_n (C − β_n) ξ_n   (11.32)
                − b Σ_n α_n y_n − Σ_n α_n (ξ_n − 1)   (11.33)

Things are starting to look good: you've successfully removed the dependence on w, and everything is now written in terms of dot products between input vectors! This might still be a difficult problem to solve, so you need to continue and attempt to remove the remaining variables b and ξ. The derivative with respect to b is:

∂L/∂b = −Σ_n α_n y_n = 0   (11.34)

This doesn't allow you to substitute b with something (as you did with w), but it does mean that the term b Σ_n α_n y_n goes to zero at the optimum. The last of the original variables is ξ_n; the derivatives in this case look like:

∂L/∂ξ_n = C − β_n − α_n = 0   ⟺   C − β_n = α_n   (11.35)

Again, this doesn't allow you to substitute, but it does mean that you can rewrite the second term, which was Σ_n (C − β_n) ξ_n, as Σ_n α_n ξ_n. This then cancels with (most of) the final term. However, you need to be careful to remember something. When we optimize, both α_n and β_n are constrained to be non-negative. What this means is that since we are dropping β from the optimization, we need to ensure that α_n ≤ C, otherwise the corresponding β_n would need to be negative, which is not allowed.

You finally wind up with the following, where x_m · x_n has been replaced by K(x_m, x_n):

L(α) = Σ_n α_n − ½ Σ_m Σ_n α_m α_n y_m y_n K(x_m, x_n)   (11.36)

If you are comfortable with matrix notation, this has a very compact form. Let 1 denote the N-dimensional vector of all 1s, let y denote the vector of labels, and let G be the N×N matrix where G_{m,n} = y_m y_n K(x_m, x_n); then this has the following form:

L(α) = αᵀ1 − ½ αᵀ G α   (11.37)

The resulting optimization problem is to maximize L(α) as a function of α, subject to the constraint that the α_n's are all non-negative and less than C (because of the constraint added when removing the β variables). Thus, your problem is:

min_α  −L(α) = ½ Σ_m Σ_n α_m α_n y_m y_n K(x_m, x_n) − Σ_n α_n   (11.38)
subj. to  0 ≤ α_n ≤ C   (∀n)

One way to solve this problem is gradient descent on α. The only complication is making sure that the α's satisfy the constraints. In this case, you can use a projected gradient algorithm: after each gradient update, you adjust your parameters to satisfy the constraints by projecting them into the feasible region. In this case, the projection is trivial: if, after a gradient step, any α_n < 0, simply set it to 0; if any α_n > C, set it to C.
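Here is a minimal sketch of that projected gradient approach (our own code, with a hypothetical function name). Note it handles only the box constraint from (11.38), as the text describes; a production solver such as SMO would additionally enforce the equality constraint Σ_n α_n y_n = 0 that arose when eliminating b.

```python
import numpy as np

def svm_dual_projected_gd(K, y, C=1.0, lr=1e-3, iters=2000):
    """Projected gradient descent on the SVM dual, Eq (11.38).

    Minimizes 0.5 * a^T G a - 1^T a subject to 0 <= a_n <= C, where
    G[m, n] = y_m * y_n * K[m, n].  Only the box constraint is handled,
    matching the text's simplified problem."""
    G = np.outer(y, y) * K
    a = np.zeros(len(y))
    for _ in range(iters):
        grad = G @ a - 1.0        # gradient of the negated dual objective
        a = a - lr * grad         # unconstrained gradient step
        a = np.clip(a, 0.0, C)    # projection: clip into the feasible box
    return a
```

Prediction then uses the kernelized form from above: f(x̂) = sign(Σ_n α_n y_n K(x_n, x̂) + b).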

11.6 Understanding Support Vector Machines

The prior discussion involved quite a bit of math to derive a representation of the support vector machine in terms of the Lagrange variables. This mapping is actually sufficiently standard that everything in it has a name. The original problem variables (w, b, ξ) are called the primal variables; the Lagrange variables are called the dual variables. The optimization problem that results after removing all of the primal variables is called the dual problem.

A succinct way of saying what you've done is: you found that after converting the SVM into its dual, it is possible to kernelize.

To understand SVMs, a first step is to peek into the dual formulation, Eq (11.38). The objective has two terms: the first depends on the data, and the second depends only on the dual variables. The first thing to notice is that, because of the second term, the α's want to get as large as possible. The constraint ensures that they cannot exceed C, which means that the general tendency is for the α's to grow as close to C as possible.

To further understand the dual optimization problem, it is useful to think of the kernel as being a measure of similarity between two data points. This analogy is most clear in the case of RBF kernels, but even in the case of linear kernels, if your examples all have unit norm, then their dot product is still a measure of similarity. Since you can write the prediction function as f(x̂) = sign(Σ_n α_n y_n K(x_n, x̂) + b), it is natural to think of α_n as the importance of training example n, where α_n = 0 means that it is not used at all at test time.

Consider two data points that have the same label; namely, y_n = y_m. This means that y_n y_m = +1 and the objective function has a term that looks like α_n α_m K(x_n, x_m). Since the goal is to make this term small, then one of two things has to happen: either K has to be small, or α_n α_m has to be small. If K is already small, then this doesn't affect the setting of the corresponding α's. But if K is large, then this strongly encourages at least one of α_n or α_m to go to zero. So if you have two data points that are very similar and have the same label, at least one of the corresponding α's will be small. This makes intuitive sense: if you have two data points that are basically the same (both in the x and y sense) then you only need to keep one of them around.

Suppose that you have two data points with different labels: y_n y_m = −1. Again, if K(x_n, x_m) is small, nothing happens. But if it is large, then the corresponding α's are encouraged to be as large as possible. In other words, if you have two similar examples with different labels, you are strongly encouraged to keep the corresponding α's as large as C.
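You can watch this happen on data. Continuing from the svm_dual_projected_gd sketch above (the toy blobs and thresholds here are our choices), the learned α's typically split into the three regimes just described: exactly zero, strictly between 0 and C, and pinned at C.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two slightly overlapping blobs, so that some slack is actually used.
X = np.vstack([rng.normal(-1.0, 1.0, (20, 2)), rng.normal(1.0, 1.0, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)

K = X @ X.T                             # linear kernel Gram matrix
C = 1.0
a = svm_dual_projected_gd(K, y, C=C)    # from the sketch above

tol = 1e-6
print("alpha = 0 (ignored at test time):", np.sum(a < tol))
print("0 < alpha < C (on the margin):   ", np.sum((a >= tol) & (a <= C - tol)))
print("alpha = C (margin violators):    ", np.sum(a > C - tol))
```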

An alternative way of understanding the SVM dual problem is geometric. Remember that the whole point of introducing the variable α_n was to ensure that the nth training example was correctly classified, modulo slack. More formally, the goal of α_n is to ensure that y_n (w · x_n + b) − 1 + ξ_n ≥ 0. Suppose that this constraint is not satisfied. There is an important result in optimization theory, called the Karush-Kuhn-Tucker conditions (or KKT conditions, for short), which states that at the optimum, the product of the Lagrange multiplier for a constraint and the value of that constraint will equal zero. In this case, this says that at the optimum, you have:

α_n [ y_n (w · x_n + b) − 1 + ξ_n ] = 0   (11.39)

In order for this to be true, it means that (at least) one of the following must be true:

α_n = 0   or   y_n (w · x_n + b) − 1 + ξ_n = 0   (11.40)

A reasonable question to ask is: under what circumstances will α_n be non-zero? From the KKT conditions, you can discern that α_n can be non-zero only when the constraint holds exactly; namely, that y_n (w · x_n + b) − 1 + ξ_n = 0. When does that constraint hold exactly? It holds exactly only for those points precisely on the margin of the hyperplane. In other words, the only training examples for which α_n ≠ 0 are those that lie precisely 1 unit away from the maximum margin decision boundary! (Or those that are moved there by the corresponding slack.) These points are called the support vectors because they "support" the decision boundary. In general, the number of support vectors is far smaller than the number of training examples, and therefore you naturally end up with a solution that only uses a subset of the training data.

From the first discussion, you know that the points that wind up being support vectors are exactly those that are confusable, in the sense that you have two examples that are nearby but have different labels. This is completely in line with the previous discussion. If you have a decision boundary, it will pass between these confusable points, and therefore they will end up being part of the set of support vectors.

11.7 Further Reading

TODO: further reading
