Contents

12 Support Vector Machines and Flexible Discriminants
  12.1 Introduction
  12.2 The Support Vector Classifier
    12.2.1 Computing the Support Vector Classifier
    12.2.2 Mixture Example (Continued)
  12.3 Support Vector Machines
    12.3.1 Computing the SVM for Classification
    12.3.2 The SVM as a Penalization Method
    12.3.3 Function Estimation and Reproducing Kernels
    12.3.4 SVMs and the Curse of Dimensionality
    12.3.5 A Path Algorithm for the SVM Classifier
    12.3.6 Support Vector Machines for Regression
    12.3.7 Regression and Kernels
    12.3.8 Discussion
  12.4 Generalizing Linear Discriminant Analysis
  12.5 Flexible Discriminant Analysis
    12.5.1 Computing the FDA Estimates
  12.6 Penalized Discriminant Analysis
  12.7 Mixture Discriminant Analysis
    12.7.1 Example: Waveform Data
  Bibliographic Notes
  Exercises
  References


The Elements of Statistical Learning: Data Mining, Inference and Prediction
Chapter 12: Support Vector Machines and Flexible Discriminants
Jerome Friedman, Trevor Hastie, Robert Tibshirani
© Friedman, Hastie & Tibshirani

12 Support Vector Machines and Flexible Discriminants

12.1 Introduction

In this chapter we describe generalizations of linear decision boundaries for classification. Optimal separating hyperplanes are introduced in Chapter 4 for the case when two classes are linearly separable. Here we cover extensions to the nonseparable case, where the classes overlap. These techniques are then generalized to what is known as the support vector machine, which produces nonlinear boundaries by constructing a linear boundary in a large, transformed version of the feature space. The second set of methods generalize Fisher's linear discriminant analysis (LDA). The generalizations include flexible discriminant analysis, which facilitates construction of nonlinear boundaries in a manner very similar to the support vector machines; penalized discriminant analysis, for problems such as signal and image classification where the large numbers of features are highly correlated; and mixture discriminant analysis, for irregularly shaped classes.

12.2 The Support Vector Classifier

In Chapter 4 we discussed a technique for constructing an optimal separating hyperplane between two perfectly separated classes. We review this and generalize to the nonseparable case, where the classes may not be separable by a linear boundary.

FIGURE 12.1. Support vector classifiers. The left panel shows the separable case. The decision boundary is the solid line, while broken lines bound the shaded maximal margin of width $2M = 2/\|\beta\|$. The right panel shows the nonseparable (overlap) case. The points labeled $\xi_j^*$ are on the wrong side of their margin by an amount $\xi_j^* = M\xi_j$; points on the correct side have $\xi_j^* = 0$. The margin is maximized subject to a total budget $\sum \xi_i \le \text{constant}$. Hence $\sum \xi_j^*$ is the total distance of points on the wrong side of their margin.

Our training data consist of $N$ pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, with $x_i \in \mathbb{R}^p$ and $y_i \in \{-1, 1\}$. Define a hyperplane by
$$\{x : f(x) = x^T\beta + \beta_0 = 0\},$$
where $\beta$ is a unit vector: $\|\beta\| = 1$. A classification rule induced by $f(x)$ is
$$G(x) = \mathrm{sign}[x^T\beta + \beta_0].$$
The geometry of hyperplanes is reviewed in Section 4.5, where we show that $f(x)$ gives the signed distance from a point $x$ to the hyperplane $f(x) = x^T\beta + \beta_0 = 0$. Since the classes are separable, we can find a function $f(x) = x^T\beta + \beta_0$ with $y_i f(x_i) > 0\ \forall i$. Hence we are able to find the hyperplane that creates the biggest margin between the training points for class 1 and $-1$ (see Figure 12.1). The optimization problem
$$\max_{\beta, \beta_0, \|\beta\| = 1} M \quad \text{subject to } y_i(x_i^T\beta + \beta_0) \ge M,\ i = 1, \ldots, N,$$
captures this concept. The band in the figure is $M$ units away from the hyperplane on either side, and hence $2M$ units wide. It is called the margin. We showed that this problem can be more conveniently rephrased as
$$\min_{\beta, \beta_0} \|\beta\| \quad \text{subject to } y_i(x_i^T\beta + \beta_0) \ge 1,\ i = 1, \ldots, N,$$

where we have dropped the norm constraint on $\beta$. Note that $M = 1/\|\beta\|$. This is the usual way of writing the support vector criterion for separated data. It is a convex optimization problem (quadratic criterion, linear inequality constraints), and the solution is characterized in Section 4.5.2.

Suppose now that the classes overlap in feature space. One way to deal with the overlap is to still maximize $M$, but allow for some points to be on the wrong side of the margin. Define the slack variables $\xi = (\xi_1, \xi_2, \ldots, \xi_N)$. There are two natural ways to modify the constraint above:
$$y_i(x_i^T\beta + \beta_0) \ge M - \xi_i,$$
or
$$y_i(x_i^T\beta + \beta_0) \ge M(1 - \xi_i),$$
$\forall i$, with $\xi_i \ge 0$ and $\sum_{i=1}^N \xi_i \le \text{constant}$. The two choices lead to different solutions. The first choice seems more natural, since it measures overlap in actual distance from the margin; the second choice measures the overlap in relative distance, which changes with the width of the margin $M$. However, the first choice results in a nonconvex optimization problem, while the second is convex; thus the second leads to the standard support vector classifier, which we use from here on.

Here is the idea of the formulation. The value $\xi_i$ in the constraint $y_i(x_i^T\beta + \beta_0) \ge M(1 - \xi_i)$ is the proportional amount by which the prediction $f(x_i) = x_i^T\beta + \beta_0$ is on the wrong side of its margin. Hence by bounding the sum $\sum \xi_i$, we bound the total proportional amount by which predictions fall on the wrong side of their margin. Misclassifications occur when $\xi_i > 1$, so bounding $\sum \xi_i$ at a value $K$, say, bounds the total number of training misclassifications at $K$.

As in the separable case, we can drop the norm constraint on $\beta$, define $M = 1/\|\beta\|$, and write the criterion in the equivalent form
$$\min \|\beta\| \quad \text{subject to} \quad \begin{cases} y_i(x_i^T\beta + \beta_0) \ge 1 - \xi_i\ \forall i, \\ \xi_i \ge 0,\ \sum \xi_i \le \text{constant}. \end{cases}$$
This is the usual way the support vector classifier is defined for the nonseparable case. However, we find confusing the presence of the fixed scale 1 in the constraint $y_i(x_i^T\beta + \beta_0) \ge 1 - \xi_i$, and prefer to start with the margin form above. The right panel of Figure 12.1 illustrates this overlapping case.

By the nature of this criterion, points well inside their class boundary do not play a big role in shaping the boundary. This seems like an attractive property, and one that differentiates it from linear discriminant analysis (Section 4.3). In LDA, the decision boundary is determined by the covariance of the class distributions and the positions of the class centroids. We will see in Section 12.3.2 that logistic regression is more similar to the support vector classifier in this regard.
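The soft-margin criterion can be handed to a general convex solver. Below is a minimal sketch using the cvxpy modeling library (our choice of software, not the book's), written in the equivalent cost form $\min \tfrac{1}{2}\|\beta\|^2 + C\sum\xi_i$ introduced in the next section; the simulated data and the value of $C$ are illustrative.

```python
# Sketch: the soft-margin primal min 0.5*||beta||^2 + C*sum(xi)
# subject to y_i*(x_i'beta + beta0) >= 1 - xi_i and xi_i >= 0,
# modeled with cvxpy (an assumption of this sketch; the book does
# not prescribe a solver).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
N, p, C = 100, 2, 1.0
X = np.vstack([rng.normal(-1, 1, (N // 2, p)),   # two overlapping classes
               rng.normal(+1, 1, (N // 2, p))])
y = np.hstack([-np.ones(N // 2), np.ones(N // 2)])

beta, beta0, xi = cp.Variable(p), cp.Variable(), cp.Variable(N)
objective = cp.Minimize(0.5 * cp.sum_squares(beta) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ beta + beta0) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

f = X @ beta.value + beta0.value          # fitted f(x_i)
print("training misclassifications:", int(np.sum(np.sign(f) != y)))
```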

12.2.1 Computing the Support Vector Classifier

The problem above is quadratic with linear inequality constraints, hence it is a convex optimization problem. We describe a quadratic programming solution using Lagrange multipliers. Computationally it is convenient to re-express the problem in the equivalent form
$$\min_{\beta, \beta_0} \tfrac{1}{2}\|\beta\|^2 + C\sum_{i=1}^N \xi_i \quad \text{subject to } \xi_i \ge 0,\ y_i(x_i^T\beta + \beta_0) \ge 1 - \xi_i\ \forall i,$$
where the cost parameter $C$ replaces the constant budget; the separable case corresponds to $C = \infty$.

The Lagrange (primal) function is
$$L_P = \tfrac{1}{2}\|\beta\|^2 + C\sum_{i=1}^N \xi_i - \sum_{i=1}^N \alpha_i\bigl[y_i(x_i^T\beta + \beta_0) - (1 - \xi_i)\bigr] - \sum_{i=1}^N \mu_i\xi_i,$$
which we minimize w.r.t. $\beta$, $\beta_0$ and $\xi_i$. Setting the respective derivatives to zero, we get
$$\beta = \sum_{i=1}^N \alpha_iy_ix_i, \qquad 0 = \sum_{i=1}^N \alpha_iy_i, \qquad \alpha_i = C - \mu_i\ \forall i,$$
as well as the positivity constraints $\alpha_i, \mu_i, \xi_i \ge 0\ \forall i$. By substituting these conditions into the primal, we obtain the Lagrangian (Wolfe) dual objective function
$$L_D = \sum_{i=1}^N \alpha_i - \tfrac{1}{2}\sum_{i=1}^N\sum_{i'=1}^N \alpha_i\alpha_{i'}y_iy_{i'}x_i^Tx_{i'},$$
which gives a lower bound on the primal objective for any feasible point. We maximize $L_D$ subject to $0 \le \alpha_i \le C$ and $\sum_{i=1}^N \alpha_iy_i = 0$. In addition to the stationarity conditions above, the Karush–Kuhn–Tucker conditions include the constraints
$$\alpha_i\bigl[y_i(x_i^T\beta + \beta_0) - (1 - \xi_i)\bigr] = 0, \qquad \mu_i\xi_i = 0, \qquad y_i(x_i^T\beta + \beta_0) - (1 - \xi_i) \ge 0,$$
for $i = 1, \ldots, N$. Together these equations uniquely characterize the solution to the primal and dual problem.
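For illustration, the Wolfe dual can be maximized with a general-purpose constrained optimizer; a minimal sketch assuming scipy is available (a dedicated QP solver would be preferable in practice, and the tolerances below are arbitrary):

```python
# Sketch: maximize L_D = sum(alpha) - 0.5 * alpha' Q alpha, with
# Q_{ii'} = y_i y_i' <x_i, x_i'>, subject to 0 <= alpha_i <= C and
# sum(alpha_i y_i) = 0, by minimizing the negative dual with SLSQP.
import numpy as np
from scipy.optimize import minimize

def fit_svc_dual(X, y, C=1.0):
    N = len(y)
    Yx = y[:, None] * X                    # rows y_i * x_i
    Q = Yx @ Yx.T                          # Q_{ii'} = y_i y_i' <x_i, x_i'>
    neg_dual = lambda a: 0.5 * a @ Q @ a - a.sum()
    grad = lambda a: Q @ a - 1.0
    res = minimize(neg_dual, np.zeros(N), jac=grad, method="SLSQP",
                   bounds=[(0.0, C)] * N,
                   constraints={"type": "eq", "fun": lambda a: a @ y})
    alpha = res.x
    beta = (alpha * y) @ X                 # beta = sum_i alpha_i y_i x_i
    # beta0 from margin points (0 < alpha_i < C); assumes some exist
    on_margin = (alpha > 1e-6) & (alpha < C - 1e-6)
    beta0 = np.mean(y[on_margin] - X[on_margin] @ beta)
    return beta, beta0, alpha
```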

From the stationarity condition for $\beta$ we see that the solution has the form
$$\hat\beta = \sum_{i=1}^N \hat\alpha_iy_ix_i,$$
with nonzero coefficients $\hat\alpha_i$ only for those observations $i$ for which the inequality constraints are exactly met (due to complementary slackness). These observations are called the support vectors, since $\hat\beta$ is represented in terms of them alone. Among these support points, some will lie on the edge of the margin ($\hat\xi_i = 0$), and hence will be characterized by $0 < \hat\alpha_i < C$; the remainder ($\hat\xi_i > 0$) have $\hat\alpha_i = C$. Any of these margin points ($0 < \hat\alpha_i$, $\hat\xi_i = 0$) can be used to solve for $\beta_0$, and we typically use an average of all the solutions for numerical stability.

Maximizing the dual is a simpler convex quadratic programming problem than the primal, and can be solved with standard techniques (Murray et al., for example). Given the solutions $\hat\beta_0$ and $\hat\beta$, the decision function can be written as
$$\hat G(x) = \mathrm{sign}[\hat f(x)] = \mathrm{sign}[x^T\hat\beta + \hat\beta_0].$$
The tuning parameter of this procedure is the cost parameter $C$.

12.2.2 Mixture Example (Continued)

Figure 12.2 shows the support vector boundary for the mixture example of Chapter 2, with two overlapping classes, for two different values of the cost parameter $C$. The classifiers are rather similar in their performance. Points on the wrong side of the boundary are support vectors. In addition, points on the correct side of the boundary but close to it (in the margin) are also support vectors. The margin is larger for $C = 0.01$ than it is for $C = 10{,}000$. Hence larger values of $C$ focus attention more on (correctly classified) points near the decision boundary, while smaller values involve data further away. Either way, misclassified points are given weight, no matter how far away. In this example the procedure is not very sensitive to choices of $C$, because of the rigidity of a linear boundary.

The optimal value for $C$ can be estimated by cross-validation, as discussed in Chapter 7. Interestingly, the leave-one-out cross-validation error can be bounded above by the proportion of support points in the data. The reason is that leaving out an observation that is not a support vector will not change the solution. Hence these observations, being classified correctly by the original boundary, will be classified correctly in the cross-validation process. However, this bound tends to be too high, and not generally useful for choosing $C$ (in our examples, the two proportions of support points are those reported in Figure 12.2).
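A quick way to see the effect of $C$ on the number of support points is to fit the classifier at two extreme values; a sketch using scikit-learn's SVC, whose C argument plays the same role as the cost parameter here (the simulated data are ours, not the mixture data):

```python
# Sketch: the fraction of observations that are support points at a
# small and a large value of the cost parameter C.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1.5, (100, 2)), rng.normal(1, 1.5, (100, 2))])
y = np.hstack([-np.ones(100), np.ones(100)])

for C in (0.01, 10_000):
    fit = SVC(kernel="linear", C=C).fit(X, y)
    frac = len(fit.support_) / len(y)     # support_ indexes the support vectors
    print(f"C = {C}: {100 * frac:.0f}% of observations are support points")
```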

FIGURE 12.2. The linear support vector boundary for the mixture data example with two overlapping classes, for two different values of $C$. The broken lines indicate the margins, where $f(x) = \pm 1$. The support points ($\alpha_i > 0$) are all the points on the wrong side of their margin. The black solid dots are those support points falling exactly on the margin ($\xi_i = 0$, $\alpha_i > 0$). In the upper panel ($C = 10{,}000$) 62% of the observations are support points, while in the lower panel ($C = 0.01$) 85% are. The broken purple curve in the background is the Bayes decision boundary.

12.3 Support Vector Machines

The support vector classifier described so far finds linear boundaries in the input feature space. As with other linear methods, we can make the procedure more flexible by enlarging the feature space using basis expansions such as polynomials or splines (Chapter 5). Generally linear boundaries in the enlarged space achieve better training-class separation, and translate to nonlinear boundaries in the original space. Once the basis functions $h_m(x)$, $m = 1, \ldots, M$ are selected, the procedure is the same as before. We fit the SV classifier using input features $h(x_i) = (h_1(x_i), h_2(x_i), \ldots, h_M(x_i))$, $i = 1, \ldots, N$, and produce the (nonlinear) function $\hat f(x) = h(x)^T\hat\beta + \hat\beta_0$. The classifier is $\hat G(x) = \mathrm{sign}(\hat f(x))$ as before.

The support vector machine classifier is an extension of this idea, where the dimension of the enlarged space is allowed to get very large, infinite in some cases. It might seem that the computations would become prohibitive. It would also seem that with sufficient basis functions, the data would be separable, and overfitting would occur. We first show how the SVM technology deals with these issues. We then see that in fact the SVM classifier is solving a function-fitting problem using a particular criterion and form of regularization, and is part of a much bigger class of problems that includes the smoothing splines of Chapter 5. The reader may wish to consult Section 5.8, which provides background material and overlaps somewhat with the next two sections.

12.3.1 Computing the SVM for Classification

We can represent the optimization problem and its solution in a special way that only involves the input features via inner products. We do this directly for the transformed feature vectors $h(x_i)$. We then see that for particular choices of $h$, these inner products can be computed very cheaply. The Lagrange dual function has the form
$$L_D = \sum_{i=1}^N \alpha_i - \tfrac{1}{2}\sum_{i=1}^N\sum_{i'=1}^N \alpha_i\alpha_{i'}y_iy_{i'}\langle h(x_i), h(x_{i'})\rangle.$$
We also see that the solution function $f(x)$ can be written
$$f(x) = h(x)^T\beta + \beta_0 = \sum_{i=1}^N \alpha_iy_i\langle h(x), h(x_i)\rangle + \beta_0.$$
As before, given $\alpha_i$, $\beta_0$ can be determined by solving $y_if(x_i) = 1$ for any (or all) $x_i$ for which $0 < \alpha_i < C$.
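A sketch contrasting an explicit basis expansion with the equivalent kernel route, using scikit-learn (PolynomialFeatures supplies an $h(x)$ for this illustration; the two fits are closely related but not identical, since the explicit basis is not scaled to reproduce the kernel exactly):

```python
# Sketch: fit a linear SV classifier in an enlarged feature space h(x),
# and compare with a kernel SVC on the original inputs.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.4, 1.0, -1.0)  # nonlinear truth

H = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
fit_basis = SVC(kernel="linear", C=1.0).fit(H, y)           # linear SVC in h(x)
fit_kernel = SVC(kernel="poly", degree=2, coef0=1.0, C=1.0).fit(X, y)
print(fit_basis.score(H, y), fit_kernel.score(X, y))        # training accuracies
```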

So both the dual and the solution involve $h(x)$ only through inner products. In fact, we need not specify the transformation $h(x)$ at all, but require only knowledge of the kernel function
$$K(x, x') = \langle h(x), h(x')\rangle$$
that computes inner products in the transformed space. $K$ should be a symmetric positive (semi-) definite function; see Section 5.8.1. Three popular choices for $K$ in the SVM literature are
$$\begin{aligned} \text{$d$th-degree polynomial:}\quad & K(x, x') = (1 + \langle x, x'\rangle)^d, \\ \text{Radial basis:}\quad & K(x, x') = \exp(-\gamma\|x - x'\|^2), \\ \text{Neural network:}\quad & K(x, x') = \tanh(\kappa_1\langle x, x'\rangle + \kappa_2). \end{aligned}$$
Consider for example a feature space with two inputs $X_1$ and $X_2$, and a polynomial kernel of degree 2. Then
$$K(X, X') = (1 + \langle X, X'\rangle)^2 = 1 + 2X_1X_1' + 2X_2X_2' + (X_1X_1')^2 + (X_2X_2')^2 + 2X_1X_1'X_2X_2'.$$
Then $M = 6$, and if we choose $h_1(X) = 1$, $h_2(X) = \sqrt{2}X_1$, $h_3(X) = \sqrt{2}X_2$, $h_4(X) = X_1^2$, $h_5(X) = X_2^2$, and $h_6(X) = \sqrt{2}X_1X_2$, then $K(X, X') = \langle h(X), h(X')\rangle$. From the representation above we see that the solution can be written
$$\hat f(x) = \sum_{i=1}^N \hat\alpha_iy_iK(x, x_i) + \hat\beta_0.$$

The role of the parameter $C$ is clearer in an enlarged feature space, since perfect separation is often achievable there. A large value of $C$ will discourage any positive $\xi_i$, and lead to an overfit wiggly boundary in the original feature space; a small value of $C$ will encourage a small value of $\|\beta\|$, which in turn causes $f(x)$ and hence the boundary to be smoother. Figure 12.3 shows two nonlinear support vector machines applied to the mixture example of Chapter 2. The regularization parameter was chosen in both cases to achieve good test error. The radial basis kernel produces a boundary quite similar to the Bayes optimal boundary for this example.

In the early literature on support vectors, there were claims that the kernel property of the support vector machine is unique to it and allows one to finesse the curse of dimensionality. Neither of these claims is true, and we go into both of these issues in the next three subsections.
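The kernels above are one-liners, and the degree-2 identity can be verified numerically; a sketch in plain numpy (the $\sqrt{2}$ scalings in $h$ are exactly those listed in the text):

```python
# Sketch: the three kernels quoted in the text, plus a numerical check
# that the degree-2 polynomial kernel equals the inner product of the
# explicit M = 6 dimensional basis h.
import numpy as np

def polynomial_kernel(x, xp, d=2):
    return (1.0 + x @ xp) ** d                     # dth-degree polynomial

def radial_kernel(x, xp, gamma=1.0):
    return np.exp(-gamma * np.sum((x - xp) ** 2))  # radial basis

def neural_net_kernel(x, xp, k1=1.0, k2=0.0):
    return np.tanh(k1 * (x @ xp) + k2)             # "neural network" kernel

def h(x):                                          # explicit degree-2 basis
    X1, X2 = x
    return np.array([1.0, np.sqrt(2) * X1, np.sqrt(2) * X2,
                     X1 ** 2, X2 ** 2, np.sqrt(2) * X1 * X2])

x, xp = np.array([0.3, -1.2]), np.array([2.0, 0.5])
assert np.isclose(polynomial_kernel(x, xp), h(x) @ h(xp))
```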

SVM - Degree-4 Polynomial in Feature Space / SVM - Radial Kernel in Feature Space

FIGURE 12.3. Two nonlinear SVMs for the mixture data. The upper plot uses a 4th-degree polynomial kernel, the lower a radial basis kernel (with $\gamma = 1$). In each case $C$ was tuned to approximately achieve the best test error performance, and $C = 1$ worked well in both cases. The radial basis kernel performs the best (close to Bayes optimal), as might be expected given the data arise from mixtures of Gaussians. The broken purple curve in the background is the Bayes decision boundary.

FIGURE 12.4. The support vector loss function (hinge loss), compared to the negative log-likelihood loss (binomial deviance) for logistic regression, squared-error loss, and a Huberized version of the squared hinge loss. All are shown as a function of $yf$ rather than $f$, because of the symmetry between the $y = +1$ and $y = -1$ case. The deviance and Huber have the same asymptotes as the SVM loss, but are rounded in the interior. All are scaled to have the limiting left-tail slope of $-1$.

12.3.2 The SVM as a Penalization Method

With $f(x) = h(x)^T\beta + \beta_0$, consider the optimization problem
$$\min_{\beta_0, \beta} \sum_{i=1}^N [1 - y_if(x_i)]_+ + \frac{\lambda}{2}\|\beta\|^2,$$
where the subscript "+" indicates positive part. This has the form loss + penalty, which is a familiar paradigm in function estimation. It is easy to show (see the exercises) that the solution, with $\lambda = 1/C$, is the same as that for the support vector classifier.

Examination of the hinge loss function $L(y, f) = [1 - yf]_+$ shows that it is reasonable for two-class classification, when compared to other more traditional loss functions. Figure 12.4 compares it to the log-likelihood loss for logistic regression, as well as squared-error loss and a variant thereof.
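The four losses of Figure 12.4 are easy to write down; a sketch following the formulas of Table 12.1 below (the figure additionally rescales them to a common left-tail slope, which we omit here):

```python
# Sketch: the classification losses compared in Figure 12.4, each as a
# function of the margin yf.
import numpy as np

def hinge(yf):
    return np.maximum(0.0, 1.0 - yf)                 # SVM hinge loss

def binomial_deviance(yf):
    return np.log1p(np.exp(-yf))                     # negative log-likelihood

def squared_error(yf):
    return (1.0 - yf) ** 2                           # [y - f]^2 = [1 - yf]^2

def huberized_square_hinge(yf):
    return np.where(yf < -1.0, -4.0 * yf,            # linear far left tail
                    np.maximum(0.0, 1.0 - yf) ** 2)  # squared hinge otherwise
```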

TABLE 12.1. The population minimizers for the different loss functions in Figure 12.4. Logistic regression uses the binomial log-likelihood or deviance. Linear discriminant analysis (see the exercises) uses squared-error loss. The SVM hinge loss estimates the mode of the posterior class probabilities, whereas the others estimate a linear transformation of these probabilities.

Loss Function | $L[y, f(x)]$ | Minimizing Function
Binomial Deviance | $\log[1 + e^{-yf(x)}]$ | $f(x) = \log\dfrac{\Pr(Y = +1 \mid x)}{\Pr(Y = -1 \mid x)}$
SVM Hinge Loss | $[1 - yf(x)]_+$ | $f(x) = \mathrm{sign}\bigl[\Pr(Y = +1 \mid x) - \tfrac{1}{2}\bigr]$
Squared Error | $[y - f(x)]^2 = [1 - yf(x)]^2$ | $f(x) = 2\Pr(Y = +1 \mid x) - 1$
Huberized Square Hinge Loss | $-4yf(x)$ if $yf(x) < -1$; $[1 - yf(x)]_+^2$ otherwise | $f(x) = 2\Pr(Y = +1 \mid x) - 1$

The (negative) log-likelihood or binomial deviance has similar tails as the SVM loss, giving zero penalty to points well inside their margin, and a linear penalty to points on the wrong side and far away. Squared-error, on the other hand, gives a quadratic penalty, and points well inside their own margin have a strong influence on the model as well. The squared hinge loss $L(y, f) = [1 - yf]_+^2$ is like the quadratic, except it is zero for points inside their margin. It still rises quadratically in the left tail, and will be less robust than hinge or deviance to misclassified observations. Recently Rosset and Zhu proposed a Huberized version of the squared hinge loss, which converts smoothly to a linear loss at $yf = -1$.

We can characterize these loss functions in terms of what they are estimating at the population level. We consider minimizing $\mathrm{E}\,L(Y, f(X))$. Table 12.1 summarizes the results. Whereas the hinge loss estimates the classifier $G(x)$ itself, all the others estimate a transformation of the class posterior probabilities. The Huberized square hinge loss shares attractive properties of logistic regression (smooth loss function, estimates probabilities), as well as the SVM hinge loss (support points).

The penalization formulation above casts the SVM as a regularized function estimation problem, where the coefficients of the linear expansion $f(x) = \beta_0 + h(x)^T\beta$ are shrunk toward zero (excluding the constant).

If $h(x)$ represents a hierarchical basis having some ordered structure (such as ordered in roughness), then the uniform shrinkage makes more sense if the rougher elements $h_j$ in the vector $h$ have smaller norm.

All the loss functions in Table 12.1 except squared-error are so-called margin maximizing loss functions (Rosset et al.). This means that if the data are separable, then the limit of $\hat\beta_\lambda$ as $\lambda \to 0$ defines the optimal separating hyperplane. (For logistic regression with separable data, $\hat\beta_\lambda$ diverges, but $\hat\beta_\lambda/\|\hat\beta_\lambda\|$ converges to the optimal separating direction.)

12.3.3 Function Estimation and Reproducing Kernels

Here we describe SVMs in terms of function estimation in reproducing kernel Hilbert spaces, where the kernel property abounds. This material is discussed in some detail in Section 5.8. This provides another view of the support vector classifier, and helps to clarify how it works.

Suppose the basis $h$ arises from the (possibly finite) eigen-expansion of a positive definite kernel $K$,
$$K(x, x') = \sum_{m=1}^\infty \phi_m(x)\phi_m(x')\delta_m,$$
and $h_m(x) = \sqrt{\delta_m}\,\phi_m(x)$. Then with $\theta_m = \sqrt{\delta_m}\,\beta_m$, we can write the penalization problem as
$$\min_{\beta_0, \theta} \sum_{i=1}^N \Bigl[1 - y_i\Bigl(\beta_0 + \sum_{m=1}^\infty \theta_m\phi_m(x_i)\Bigr)\Bigr]_+ + \frac{\lambda}{2}\sum_{m=1}^\infty \frac{\theta_m^2}{\delta_m}.$$
Now this is identical in form to the reproducing-kernel criterion of Section 5.8, and the theory of reproducing kernel Hilbert spaces described there guarantees a finite-dimensional solution of the form
$$f(x) = \beta_0 + \sum_{i=1}^N \alpha_iK(x, x_i).$$
In particular we see there an equivalent version of the optimization criterion [see also Wahba et al.],
$$\min_{\beta_0, \alpha} \sum_{i=1}^N \bigl(1 - y_if(x_i)\bigr)_+ + \frac{\lambda}{2}\alpha^TK\alpha,$$
where $K$ is the $N \times N$ matrix of kernel evaluations for all pairs of training features (see the exercises).
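This finite-dimensional criterion can be minimized directly in $\alpha$; a rough sketch using plain subgradient descent (the step size and iteration count are arbitrary illustrative choices, not part of the text's development):

```python
# Sketch: fit f(x) = beta0 + sum_i alpha_i K(x, x_i) by subgradient
# descent on the hinge loss plus the penalty (lambda/2) * alpha' K alpha.
import numpy as np

def fit_kernel_hinge(K, y, lam=1.0, lr=1e-3, n_iter=2000):
    """K: N x N kernel matrix; y in {-1, +1}."""
    N = K.shape[0]
    alpha, beta0 = np.zeros(N), 0.0
    for _ in range(n_iter):
        f = K @ alpha + beta0
        g = -y * (y * f < 1)                   # subgradient of the hinge in f
        alpha -= lr * (K @ (g + lam * alpha))  # K is symmetric, so K'g = Kg
        beta0 -= lr * g.sum()
    return alpha, beta0
```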

These models are quite general, and include, for example, the entire family of smoothing splines, and the additive and interaction spline models discussed in Chapters 5 and 9, and in more detail in Wahba and in Hastie and Tibshirani. They can be expressed more generally as
$$\min_{f \in \mathcal{H}} \sum_{i=1}^N [1 - y_if(x_i)]_+ + \lambda J(f),$$
where $\mathcal{H}$ is the structured space of functions, and $J(f)$ an appropriate regularizer on that space. For example, suppose $\mathcal{H}$ is the space of additive functions $f(x) = \sum_{j=1}^p f_j(x_j)$, and $J(f) = \sum_j \int \{f_j''(x_j)\}^2\,dx_j$. Then the solution is an additive cubic spline, and has a kernel representation with $K(x, x') = \sum_{j=1}^p K_j(x_j, x_j')$. Each of the $K_j$ is the kernel appropriate for the univariate smoothing spline in $x_j$ (Wahba).

Conversely this discussion also shows that, for example, any of the kernels described earlier can be used with any convex loss function, and will also lead to a finite-dimensional representation of the same form. Figure 12.5 uses the same kernel functions as in Figure 12.3, except using the binomial log-likelihood as a loss function. (Ji Zhu assisted in the preparation of these examples.) The fitted function is hence an estimate of the log-odds,
$$\hat f(x) = \log\frac{\hat\Pr(Y = +1 \mid x)}{\hat\Pr(Y = -1 \mid x)} = \hat\beta_0 + \sum_{i=1}^N \hat\alpha_iK(x, x_i),$$
or conversely we get an estimate of the class probabilities
$$\hat\Pr(Y = +1 \mid x) = \frac{1}{1 + e^{-\hat\beta_0 - \sum_{i=1}^N \hat\alpha_iK(x, x_i)}}.$$
The fitted models are quite similar in shape and performance. Examples and more details are given in Section 5.8.

It does happen that for SVMs, a sizeable fraction of the $N$ values of $\alpha_i$ can be zero (the nonsupport points), as in the two examples in Figure 12.3. This is a consequence of the piecewise-linear nature of the hinge part of the criterion. The lower the class overlap (on the training data), the greater this fraction will be. Reducing $\lambda$ will generally reduce the overlap (allowing a more flexible $f$). A small number of support points means that $\hat f(x)$ can be evaluated more quickly, which is important at lookup time. Of course, reducing the overlap too much can lead to poor generalization.
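A kernel logistic regression of this kind can be fit the same way, with the smooth deviance in place of the hinge; a sketch (again with illustrative step sizes), whose fitted $f$ yields posterior probabilities via the logistic transform. Note that, in contrast to the SVM, none of the fitted $\alpha_i$ will typically be exactly zero.

```python
# Sketch: kernel logistic regression by gradient descent on the binomial
# deviance plus the quadratic penalty (lambda/2) * alpha' K alpha.
import numpy as np

def fit_kernel_logistic(K, y, lam=1.0, lr=1e-3, n_iter=5000):
    """K: N x N kernel matrix; y in {-1, +1}."""
    N = K.shape[0]
    alpha, beta0 = np.zeros(N), 0.0
    for _ in range(n_iter):
        f = K @ alpha + beta0
        g = -y / (1.0 + np.exp(y * f))         # d/df of log(1 + exp(-y f))
        alpha -= lr * (K @ (g + lam * alpha))
        beta0 -= lr * g.sum()
    return alpha, beta0

def posterior_prob(Kx, alpha, beta0):
    """Pr(Y = +1 | x) for kernel evaluations Kx[i] = K(x, x_i)."""
    return 1.0 / (1.0 + np.exp(-(Kx @ alpha + beta0)))
```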

LR - Degree-4 Polynomial in Feature Space / LR - Radial Kernel in Feature Space

FIGURE 12.5. The logistic regression versions of the SVM models in Figure 12.3, using the identical kernels and hence penalties, but the log-likelihood loss instead of the SVM loss function. The two broken contours correspond to posterior probabilities of 0.75 and 0.25 for the +1 class (or vice versa). The broken purple curve in the background is the Bayes decision boundary.

TABLE 12.2. Skin of the orange: shown are mean (standard error of the mean) of the test error over 50 simulations. BRUTO fits an additive spline model adaptively, while MARS fits a low-order interaction model adaptively. (The numerical entries are not recoverable from this transcription.)

Method | Test Error (SE): No Noise Features | Test Error (SE): Six Noise Features
1 SV Classifier | |
2 SVM/poly 2 | |
3 SVM/poly 5 | |
4 SVM/poly 10 | |
5 BRUTO | |
6 MARS | |
Bayes | |

12.3.4 SVMs and the Curse of Dimensionality

In this section, we address the question of whether SVMs have some edge on the curse of dimensionality. Notice that in the polynomial expansion earlier we are not allowed a fully general inner product in the space of powers and products. For example, all terms of the form $2X_jX_{j'}$ are given equal weight, and the kernel cannot adapt itself to concentrate on subspaces. If the number of features $p$ were large, but the class separation occurred only in the linear subspace spanned by say $X_1$ and $X_2$, this kernel would not easily find the structure and would suffer from having many dimensions to search over. One would have to build knowledge about the subspace into the kernel; that is, tell it to ignore all but the first two inputs. If such knowledge were available a priori, much of statistical learning would be made much easier. A major goal of adaptive methods is to discover such structure.

We support these statements with an illustrative example. We generated 100 observations in each of two classes. The first class has four standard normal independent features $X_1, X_2, X_3, X_4$. The second class also has four standard normal independent features, but conditioned on $9 \le \sum_j X_j^2 \le 16$. This is a relatively easy problem. As a second harder problem, we augmented the features with an additional six standard Gaussian noise features. Hence the second class almost completely surrounds the first, like the skin surrounding the orange, in a four-dimensional subspace. The Bayes error rate for this problem is 0.029 (irrespective of dimension). We generated 1000 test observations to compare different procedures. The average test errors over 50 simulations, with and without noise features, are shown in Table 12.2.
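A sketch of the simulation, using rejection sampling for the conditioned class (the bounds $9 \le \sum_j X_j^2 \le 16$ are as quoted above; the rejection approach and sample sizes are our illustrative choices):

```python
# Sketch: generate the "skin of the orange" data, optionally with extra
# standard Gaussian noise features.
import numpy as np

def skin_of_orange(n_per_class=100, noise_features=0, rng=None):
    rng = rng or np.random.default_rng(0)
    inner = rng.standard_normal((n_per_class, 4))    # class 1: N(0, I_4)
    outer = []
    while len(outer) < n_per_class:                  # class 2: conditioned shell
        x = rng.standard_normal(4)
        if 9.0 <= np.sum(x ** 2) <= 16.0:
            outer.append(x)
    X = np.vstack([inner, np.array(outer)])
    if noise_features:                               # the "harder" problem
        X = np.hstack([X, rng.standard_normal((2 * n_per_class, noise_features))])
    y = np.hstack([-np.ones(n_per_class), np.ones(n_per_class)])
    return X, y

X, y = skin_of_orange(noise_features=6)              # harder version
```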

Line 1 uses the support vector classifier in the original feature space. Lines 2–4 refer to the support vector machine with a 2-, 5- and 10-dimensional polynomial kernel. For all support vector procedures, we chose the cost parameter $C$ to minimize the test error, to be as fair as possible to the method. Line 5 fits an additive spline model to the $(-1, +1)$ response by least squares, using the BRUTO algorithm for additive models, described in Hastie and Tibshirani. Line 6 uses MARS (multivariate adaptive regression splines) allowing interactions of all orders, as described in Chapter 9; as such it is comparable with SVM/poly 10. Both BRUTO and MARS have the ability to ignore redundant variables. Test error was not used to choose the smoothing parameters in either of lines 5 or 6.

In the original feature space, a hyperplane cannot separate the classes, and the support vector classifier (line 1) does poorly. The polynomial support vector machine makes a substantial improvement in test error rate, but is adversely affected by the six noise features. It is also very sensitive to the choice of kernel: the second-degree polynomial kernel (line 2) does best, since the true decision boundary is a second-degree polynomial. However, higher-degree polynomial kernels (lines 3 and 4) do much worse. BRUTO performs well, since the boundary is additive. BRUTO and MARS adapt well: their performance does not deteriorate much in the presence of noise.

12.3.5 A Path Algorithm for the SVM Classifier

The regularization parameter for the SVM classifier is the cost parameter $C$, or its inverse $\lambda$ in the penalization formulation. Common usage is to set $C$ high, leading often to somewhat overfit classifiers. Figure 12.6 shows the test error on the mixture data as a function of $C$, using different radial-kernel parameters $\gamma$. When $\gamma = 5$ (narrow peaked kernels), the heaviest regularization (small $C$) is called for. With $\gamma = 1$ (the value used in Figure 12.3), an intermediate value of $C$ is required. Clearly in situations such as these, we need to determine a good choice for $C$, perhaps by cross-validation.

FIGURE 12.6. Test-error curves as a function of the cost parameter $C$ for the radial-kernel SVM classifier on the mixture data. At the top of each plot is the scale parameter $\gamma$ for the radial kernel: $K_\gamma(x, y) = \exp(-\gamma\|x - y\|^2)$. The optimal value for $C$ depends quite strongly on the scale of the kernel. The Bayes error rate is indicated by the broken horizontal lines.
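Cross-validating over a grid of $(C, \gamma)$ pairs is the practical counterpart of Figure 12.6; a sketch with scikit-learn's GridSearchCV (the grids and simulated data are illustrative):

```python
# Sketch: choose C and the radial-kernel scale gamma by 5-fold
# cross-validation over a small grid.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1.2, (100, 2)), rng.normal(1, 1.2, (100, 2))])
y = np.hstack([-np.ones(100), np.ones(100)])

grid = {"C": [0.01, 0.1, 1, 10, 100, 1000],
        "gamma": [5, 1, 0.5, 0.1]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```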

FIGURE 12.7. A simple example illustrates the SVM path algorithm. (Left panel:) This plot illustrates the state of the model at a particular value of $\lambda$. The "+1" points are orange, the "-1" points blue, and the width of the soft margin is $2/\|\beta\|$. Two blue points are misclassified, while two orange points are correctly classified but on the wrong side of their margin $f(x) = +1$; each of these has $y_if(x_i) < 1$. The three square-shaped points are exactly on their margins. (Right panel:) This plot shows the piecewise-linear profiles $\alpha_i(\lambda)$. The horizontal broken line marks the value of $\lambda$ for the model in the left plot.

Here we describe a path algorithm (in the spirit of the regularization path algorithms of Chapter 3) for efficiently fitting the entire sequence of SVM models obtained by varying $C$. It is convenient to use the loss + penalty formulation of Section 12.3.2, along with Figure 12.4. This leads to a solution for $\beta$ at a given value of $\lambda$:
$$\beta_\lambda = \frac{1}{\lambda}\sum_{i=1}^N \alpha_iy_ix_i.$$
The $\alpha_i$ are again Lagrange multipliers, but in this case they all lie in $[0, 1]$. Figure 12.7 illustrates the setup. It can be shown that the KKT optimality conditions imply that the labeled points $(x_i, y_i)$ fall into three distinct groups:

- Observations correctly classified and outside their margins. They have $y_if(x_i) > 1$, and Lagrange multipliers $\alpha_i = 0$.
- Observations sitting on their margins, with $y_if(x_i) = 1$ and Lagrange multipliers $\alpha_i \in [0, 1]$. These are the square-shaped points in Figure 12.7.
- Observations inside their margins, with $y_if(x_i) < 1$ and $\alpha_i = 1$.

The idea for the path algorithm is as follows. Initially $\lambda$ is large, the margin $1/\|\beta_\lambda\|$ is wide, and all points are inside their margin and have $\alpha_i = 1$. As $\lambda$ decreases, $1/\|\beta_\lambda\|$ decreases, and the margin gets narrower. Some points will move from inside their margins to outside their margins, and their $\alpha_i$ will change from 1 to 0. By continuity of the $\alpha_i(\lambda)$, these points will linger on the margin during this transition. From the expression for $\beta_\lambda$ we see that the points with $\alpha_i = 1$ make fixed contributions to $\beta(\lambda)$, and those with $\alpha_i = 0$ make no contribution. So all that changes as $\lambda$ decreases are the $\alpha_i \in [0, 1]$ of those (small number of) points on the margin. Since all these points have $y_if(x_i) = 1$, this results in a small set of linear equations that prescribe how the $\alpha_i(\lambda)$, and hence $\beta_\lambda$, change during these transitions. This results in piecewise-linear paths for each of the $\alpha_i(\lambda)$. The breaks occur when points cross the margin. Figure 12.7 (right panel) shows the $\alpha_i(\lambda)$ profiles for the small example in the left panel.

Although we have described this for linear SVMs, exactly the same idea works for nonlinear models, in which the fitted function is replaced by
$$f_\lambda(x) = \frac{1}{\lambda}\sum_{i=1}^N \alpha_iy_iK(x, x_i).$$
Details can be found in Hastie et al. An R package svmpath is available on CRAN for fitting these models.
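Lacking the exact path algorithm, one can approximate the $\alpha_i(\lambda)$ profiles by refitting on a grid of $C = 1/\lambda$ values; a rough sketch with scikit-learn (the rescaling by $C$ maps its dual coefficients onto $[0, 1]$, matching the parameterization above; this only samples the piecewise-linear paths rather than tracking them):

```python
# Sketch: approximate alpha_i(lambda) by refitting a linear SVC on a
# grid of C values and recording the rescaled dual coefficients.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1, 1, (20, 2)), rng.normal(1, 1, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

Cs = np.logspace(-2, 2, 25)
paths = np.zeros((len(Cs), len(y)))
for j, C in enumerate(Cs):
    fit = SVC(kernel="linear", C=C).fit(X, y)
    # dual_coef_ holds alpha_i * y_i for the support vectors only
    paths[j, fit.support_] = np.abs(fit.dual_coef_).ravel() / C
print(paths.max(axis=0)[:10])   # each profile stays within [0, 1]
```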

12.3.6 Support Vector Machines for Regression

In this section we show how SVMs can be adapted for regression with a quantitative response, in ways that inherit some of the properties of the SVM classifier. We first discuss the linear regression model
$$f(x) = x^T\beta + \beta_0,$$
and then handle nonlinear generalizations. To estimate $\beta$, we consider minimization of
$$H(\beta, \beta_0) = \sum_{i=1}^N V\bigl(y_i - f(x_i)\bigr) + \frac{\lambda}{2}\|\beta\|^2,$$
where
$$V_\epsilon(r) = \begin{cases} 0 & \text{if } |r| < \epsilon, \\ |r| - \epsilon & \text{otherwise.} \end{cases}$$

FIGURE 12.8. The left panel shows the $\epsilon$-insensitive error function used by the support vector regression machine. The right panel shows the error function used in Huber's robust regression (blue curve). Beyond $|c|$, the function changes from quadratic to linear.

This is an $\epsilon$-insensitive error measure, ignoring errors of size less than $\epsilon$ (left panel of Figure 12.8). There is a rough analogy with the support vector classification setup, where points on the correct side of the decision boundary and far away from it are ignored in the optimization. In regression, these low-error points are the ones with small residuals.

It is interesting to contrast this with error measures used in robust regression in statistics. The most popular, due to Huber, has the form
$$V_H(r) = \begin{cases} r^2/2 & \text{if } |r| \le c, \\ c|r| - c^2/2 & \text{if } |r| > c, \end{cases}$$
shown in the right panel of Figure 12.8. This function reduces from quadratic to linear the contributions of observations with absolute residual greater than a prechosen constant $c$. This makes the fitting less sensitive to outliers. The support vector error measure $V_\epsilon$ also has linear tails (beyond $\epsilon$), but in addition it flattens the contributions of those cases with small residuals.
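The two error measures, written directly from their definitions (the default $c$ below anticipates the value mentioned at the end of this section):

```python
# Sketch: the epsilon-insensitive loss and Huber's loss of Figure 12.8.
import numpy as np

def v_eps(r, eps=0.5):
    """Zero inside the epsilon tube, linear outside."""
    return np.where(np.abs(r) < eps, 0.0, np.abs(r) - eps)

def v_huber(r, c=1.345):
    """Quadratic for |r| <= c, linear beyond c."""
    return np.where(np.abs(r) <= c, r ** 2 / 2.0,
                    c * np.abs(r) - c ** 2 / 2.0)
```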

If $\hat\beta$, $\hat\beta_0$ are the minimizers of $H$, the solution function can be shown to have the form
$$\hat\beta = \sum_{i=1}^N (\hat\alpha_i^* - \hat\alpha_i)x_i, \qquad \hat f(x) = \sum_{i=1}^N (\hat\alpha_i^* - \hat\alpha_i)\langle x, x_i\rangle + \beta_0,$$
where $\hat\alpha_i$, $\hat\alpha_i^*$ are positive and solve the quadratic programming problem
$$\min_{\alpha_i, \alpha_i^*}\ \epsilon\sum_{i=1}^N (\alpha_i^* + \alpha_i) - \sum_{i=1}^N y_i(\alpha_i^* - \alpha_i) + \frac{1}{2}\sum_{i, i' = 1}^N (\alpha_i^* - \alpha_i)(\alpha_{i'}^* - \alpha_{i'})\langle x_i, x_{i'}\rangle$$
subject to the constraints
$$0 \le \alpha_i,\ \alpha_i^* \le 1/\lambda, \qquad \sum_{i=1}^N (\alpha_i^* - \alpha_i) = 0, \qquad \alpha_i\alpha_i^* = 0.$$
Due to the nature of these constraints, typically only a subset of the solution values $(\hat\alpha_i^* - \hat\alpha_i)$ are nonzero, and the associated data values are called the support vectors. As was the case in the classification setting, the solution depends on the input values only through the inner products $\langle x_i, x_{i'}\rangle$. Thus we can generalize the methods to richer spaces by defining an appropriate inner product, for example, one of the kernels given earlier.

Note that there are two parameters, $\epsilon$ and $\lambda$, associated with the criterion $H$. These seem to play different roles. $\epsilon$ is a parameter of the loss function $V_\epsilon$, just like $c$ is for $V_H$. Note that both $V_\epsilon$ and $V_H$ depend on the scale of $y$ and hence $r$. If we scale our response (and hence use $V_H(r/\sigma)$ and $V_\epsilon(r/\sigma)$ instead), then we might consider using preset values for $c$ and $\epsilon$ (the value $c = 1.345$ achieves 95% efficiency for the Gaussian). The quantity $\lambda$ is a more traditional regularization parameter, and can be estimated for example by cross-validation.
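scikit-learn's SVR implements this $\epsilon$-insensitive regression; a sketch, with the caveat that its C parameter plays the role of $1/\lambda$ in the parameterization above (that mapping is our reading, and all values below are illustrative):

```python
# Sketch: epsilon-insensitive support vector regression on a noisy sine.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(5)
X = np.sort(rng.uniform(-3, 3, (80, 1)), axis=0)
y = np.sin(X).ravel() + 0.2 * rng.standard_normal(80)

fit = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print("support vectors:", len(fit.support_), "of", len(y))
```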

12.3.7 Regression and Kernels

As discussed in Section 12.3.4, the kernel property is not unique to support vector machines. Suppose we consider approximation of the regression function in terms of a set of basis functions $\{h_m(x)\}$, $m = 1, 2, \ldots, M$:
$$f(x) = \sum_{m=1}^M \beta_mh_m(x) + \beta_0.$$
To estimate $\beta$ and $\beta_0$ we minimize
$$H(\beta, \beta_0) = \sum_{i=1}^N V\bigl(y_i - f(x_i)\bigr) + \lambda\sum_m \beta_m^2$$
for some general error measure $V(r)$. For any choice of $V(r)$, the solution $\hat f(x) = \sum_m \hat\beta_mh_m(x) + \hat\beta_0$ has the form
$$\hat f(x) = \sum_{i=1}^N \hat a_iK(x, x_i)$$
with $K(x, y) = \sum_{m=1}^M h_m(x)h_m(y)$. Notice that this has the same form as both the radial basis function expansion and a regularization estimate, discussed in Chapters 5 and 6.

For concreteness, let's work out the case $V(r) = r^2$. Let $\mathbf{H}$ be the $N \times M$ basis matrix with $im$th element $h_m(x_i)$, and suppose that $M > N$ is large. For simplicity we assume that $\beta_0 = 0$, or that the constant is absorbed in $h$; see the exercises for an alternative. We estimate $\beta$ by minimizing the penalized least squares criterion
$$H(\beta) = (\mathbf{y} - \mathbf{H}\beta)^T(\mathbf{y} - \mathbf{H}\beta) + \lambda\|\beta\|^2.$$
The solution is $\hat{\mathbf{y}} = \mathbf{H}\hat\beta$, with $\hat\beta$ determined by
$$-\mathbf{H}^T(\mathbf{y} - \mathbf{H}\hat\beta) + \lambda\hat\beta = 0.$$
From this it appears that we need to evaluate the $M \times M$ matrix of inner products in the transformed space. However, we can premultiply by $\mathbf{H}$ to give
$$\mathbf{H}\hat\beta = (\mathbf{H}\mathbf{H}^T + \lambda\mathbf{I})^{-1}\mathbf{H}\mathbf{H}^T\mathbf{y}.$$
The $N \times N$ matrix $\mathbf{H}\mathbf{H}^T$ consists of inner products between pairs of observations $i, i'$; that is, the evaluation of an inner product kernel $\{\mathbf{H}\mathbf{H}^T\}_{i,i'} = K(x_i, x_{i'})$. It is easy to show directly in this case that the predicted values at an arbitrary $x$ satisfy
$$\hat f(x) = h(x)^T\hat\beta = \sum_{i=1}^N \hat\alpha_iK(x, x_i),$$
where $\hat\alpha = (\mathbf{H}\mathbf{H}^T + \lambda\mathbf{I})^{-1}\mathbf{y}$. As in the support vector machine, we need not specify or evaluate the large set of functions $h_1(x), h_2(x), \ldots, h_M(x)$. Only the inner product kernel $K(x_i, x_{i'})$ need be evaluated, at the $N$ training points for each $i, i'$ and at points $x$ for predictions there. Careful choice of $h_m$ (such as the eigenfunctions of particular, easy-to-evaluate kernels $K$) means, for example, that $\mathbf{H}\mathbf{H}^T$ can be computed at a cost of $N^2/2$ evaluations of $K$, rather than the direct cost $N^2M$. Note, however, that this property depends on the choice of squared norm $\|\beta\|^2$ in the penalty. It does not hold, for example, for the $L_1$ norm $\sum_m |\beta_m|$, which may lead to a superior model.
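The squared-error case reduces to a linear solve; a minimal numpy sketch taken directly from the formula $\hat\alpha = (\mathbf{H}\mathbf{H}^T + \lambda\mathbf{I})^{-1}\mathbf{y}$, with a radial kernel standing in for $\mathbf{H}\mathbf{H}^T$ (the toy data and $\lambda$ are illustrative):

```python
# Sketch: kernel ridge regression; predictions need only kernel
# evaluations, never the basis h itself.
import numpy as np

def fit_kernel_ridge(K, y, lam=1.0):
    """K is the N x N matrix of inner-product kernel evaluations HH'."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)   # alpha_hat

def predict(K_new, alpha):
    """K_new[i, j] = K(x_new_i, x_train_j)."""
    return K_new @ alpha

X = np.linspace(-2, 2, 50)[:, None]
y = np.cos(X).ravel()
K = np.exp(-((X - X.T) ** 2))          # K(x, x') = exp(-(x - x')^2)
alpha = fit_kernel_ridge(K, y, lam=0.1)
print(np.abs(predict(K, alpha) - y).max())   # small training residuals
```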

12.3.8 Discussion

The support vector machine can be extended to multiclass problems, essentially by solving many two-class problems. A classifier is built for each pair of classes, and the final classifier is the one that dominates the most (Kressel; Friedman; Hastie and Tibshirani). Alternatively, one could use the multinomial loss function along with a suitable kernel, as in Section 12.3.3. SVMs have applications in many other supervised and unsupervised learning problems. At the time of this writing, empirical evidence suggests that the SVM performs well in many real learning problems.

Finally, we mention the connection of the support vector machine with structural risk minimization (Chapter 7). Suppose the training points (or their basis expansion) are contained in a sphere of radius $R$, and let $G(x) = \mathrm{sign}[f(x)] = \mathrm{sign}[\beta^Tx + \beta_0]$ as earlier. Then one can show that the class of functions $\{G(x), \|\beta\| \le A\}$ has VC-dimension $h$ satisfying
$$h \le R^2A^2.$$
If $f(x)$ separates the training data, optimally for $\|\beta\| \le A$, then with probability at least $1 - \eta$ over training sets (Vapnik):
$$\mathrm{Error}_{\mathrm{Test}} \le 4\,\frac{h[\log(2N/h) + 1] - \log(\eta/4)}{N}.$$
The support vector classifier was one of the first practical learning procedures for which useful bounds on the VC dimension could be obtained, and hence the SRM program could be carried out. However, in the derivation, balls are put around the data points, a process that depends on the observed values of the features. Hence in a strict sense, the VC complexity of the class is not fixed a priori, before seeing the features. The regularization parameter $C$ controls an upper bound on the VC dimension of the classifier. Following the SRM paradigm, we could choose $C$ by minimizing the upper bound on the test error given above. However, it is not clear that this has any advantage over the use of cross-validation for the choice of $C$.

12.4 Generalizing Linear Discriminant Analysis

In Section 4.3 we discussed linear discriminant analysis (LDA), a fundamental tool for classification. For the remainder of this chapter we discuss a class of techniques that produce better classifiers than LDA by directly generalizing LDA. Some of the virtues of LDA are as follows:

- It is a simple prototype classifier. A new observation is classified to the class with the closest centroid. A slight twist is that distance is measured in the Mahalanobis metric, using a pooled covariance estimate.

- LDA is the estimated Bayes classifier if the observations are multivariate Gaussian in each class, with a common covariance matrix. Since this assumption is unlikely to be true, this might not seem to be much of a virtue.
- The decision boundaries created by LDA are linear, leading to decision rules that are simple to describe and implement.
- LDA provides natural low-dimensional views of the data. For example, a figure later in this chapter gives an informative two-dimensional view of data in 256 dimensions with ten classes.
- Often LDA produces the best classification results, because of its simplicity and low variance. LDA was among the top three classifiers for many of the datasets studied in the STATLOG project (Michie et al.; this study predated the emergence of SVMs).

Unfortunately the simplicity of LDA causes it to fail in a number of situations as well:

- Often linear decision boundaries do not adequately separate the classes. When $N$ is large, it is possible to estimate more complex decision boundaries. Quadratic discriminant analysis (QDA) is often useful here, and allows for quadratic decision boundaries. More generally we would like to be able to model irregular decision boundaries.
- The aforementioned shortcoming of LDA can often be paraphrased by saying that a single prototype per class is insufficient. LDA uses a single prototype (class centroid) plus a common covariance matrix to describe the spread of the data in each class. In many situations, several prototypes are more appropriate.
- At the other end of the spectrum, we may have far too many (correlated) predictors, for example, in the case of digitized analogue signals and images. In this case LDA uses too many parameters, which are estimated with high variance, and its performance suffers. In cases such as this we need to restrict or regularize LDA even further.

In the remainder of this chapter we describe a class of techniques that attend to all these issues by generalizing the LDA model. This is achieved largely by three different ideas.

The first idea is to recast the LDA problem as a linear regression problem. Many techniques exist for generalizing linear regression to more flexible, nonparametric forms of regression. This in turn leads to more flexible forms of discriminant analysis, which we call FDA.

In most cases of interest, the regression procedures can be seen to identify an enlarged set of predictors via basis expansions. FDA amounts to LDA in this enlarged space, the same paradigm used in SVMs.

In the case of too many predictors, such as the pixels of a digitized image, we do not want to expand the set: it is already too large. The second idea is to fit an LDA model, but penalize its coefficients to be smooth or otherwise coherent in the spatial domain, that is, as an image. We call this procedure penalized discriminant analysis or PDA. With FDA itself, the expanded basis set is often so large that regularization is also required (again as in SVMs). Both of these can be achieved via a suitably regularized regression in the context of the FDA model.

The third idea is to model each class by a mixture of two or more Gaussians with different centroids, but with every component Gaussian, both within and between classes, sharing the same covariance matrix. This allows for more complex decision boundaries, and allows for subspace reduction as in LDA. We call this extension mixture discriminant analysis or MDA.

All three of these generalizations use a common framework by exploiting their connection with LDA.

12.5 Flexible Discriminant Analysis

In this section we describe a method for performing LDA using linear regression on derived responses. This in turn leads to nonparametric and flexible alternatives to LDA. As in Chapter 4, we assume we have observations with a qualitative response $G$ falling into one of $K$ classes $\mathcal{G} = \{1, \ldots, K\}$, each having measured features $X$. Suppose $\theta : \mathcal{G} \mapsto \mathbb{R}^1$ is a function that assigns scores to the classes, such that the transformed class labels are optimally predicted by linear regression on $X$: if our training sample has the form $(g_i, x_i)$, $i = 1, 2, \ldots, N$, then we solve
$$\min_{\beta, \theta} \sum_{i=1}^N \bigl(\theta(g_i) - x_i^T\beta\bigr)^2,$$
with restrictions on $\theta$ to avoid a trivial solution (mean zero and unit variance over the training data). This produces a one-dimensional separation between the classes.

More generally, we can find up to $L \le K - 1$ sets of independent scorings for the class labels, $\theta_1, \theta_2, \ldots, \theta_L$, and $L$ corresponding linear maps $\eta_l(X) = X^T\beta_l$, $l = 1, \ldots, L$, chosen to be optimal for multiple regression in $\mathbb{R}^p$. The scores $\theta_l(g)$ and the maps $\beta_l$ are chosen to minimize the average squared residual,
$$\mathrm{ASR} = \frac{1}{N}\sum_{l=1}^L \Bigl[\sum_{i=1}^N \bigl(\theta_l(g_i) - x_i^T\beta_l\bigr)^2\Bigr].$$

The set of scores are assumed to be mutually orthogonal and normalized with respect to an appropriate inner product to prevent trivial zero solutions.

Why are we going down this road? It can be shown that the sequence of discriminant (canonical) vectors $\nu_l$ derived in Section 4.3 are identical to the sequence $\beta_l$ up to a constant (Mardia et al.; Hastie et al.). Moreover, the Mahalanobis distance of a test point $x$ to the $k$th class centroid $\hat\mu_k$ is given by
$$\delta_J(x, \hat\mu_k) = \sum_{l=1}^{K-1} w_l\bigl(\hat\eta_l(x) - \bar\eta_l^k\bigr)^2 + D(x),$$
where $\bar\eta_l^k$ is the mean of the $\hat\eta_l(x_i)$ in the $k$th class, and $D(x)$ does not depend on $k$. Here the $w_l$ are coordinate weights that are defined in terms of the mean squared residual $r_l^2$ of the $l$th optimally scored fit:
$$w_l = \frac{1}{r_l^2(1 - r_l^2)}.$$
In Section 4.3 we saw that these canonical distances are all that is needed for classification in the Gaussian setup, with equal covariances in each class. To summarize:

LDA can be performed by a sequence of linear regressions, followed by classification to the closest class centroid in the space of fits. The analogy applies both to the reduced rank version, and to the full rank case when $L = K - 1$.

The real power of this result is in the generalizations that it invites. We can replace the linear regression fits $\eta_l(x) = x^T\beta_l$ by far more flexible, nonparametric fits, and by analogy achieve a more flexible classifier than LDA. We have in mind generalized additive fits, spline functions, MARS models and the like. In this more general form the regression problems are defined via the criterion
$$\mathrm{ASR}\bigl(\{\theta_l, \eta_l\}_{l=1}^L\bigr) = \frac{1}{N}\sum_{l=1}^L \Bigl[\sum_{i=1}^N \bigl(\theta_l(g_i) - \eta_l(x_i)\bigr)^2 + \lambda J(\eta_l)\Bigr],$$
where $J$ is a regularizer appropriate for some forms of nonparametric regression, such as smoothing splines, additive splines and lower-order ANOVA spline models. Also included are the classes of functions and associated penalties generated by kernels, as in Section 12.3.3.

Before we describe the computations involved in this generalization, let us consider a very simple example. Suppose we use degree-two polynomial regression for each $\eta_l$. The decision boundaries implied by this model will be quadratic surfaces, since each of the fitted functions is quadratic, and as in LDA their squares cancel out when comparing distances.
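Stepping back to the linear, one-scoring case, the optimal scoring problem can be computed by alternating between the two least squares problems in $\beta$ and $\theta$; a sketch below. This power-method-style iteration is our illustration, not the book's computational procedure, and it recovers only the leading scoring.

```python
# Sketch: the first optimal scoring by alternating least squares,
# illustrating how LDA reduces to linear regression on scored responses.
import numpy as np

def first_scoring(X, g, K, n_iter=100, seed=0):
    """X: (N, p) centered predictors; g: (N,) integer labels in {0,...,K-1}."""
    theta = np.random.default_rng(seed).standard_normal(K)
    for _ in range(n_iter):
        y = theta[g]                                   # scored response theta(g_i)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # regress scores on X
        eta = X @ beta                                 # fitted values eta(x_i)
        theta = np.array([eta[g == k].mean() for k in range(K)])
        theta -= theta[g].mean()                       # mean zero over the data
        theta /= theta[g].std()                        # unit variance over the data
    return theta, beta
```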

FIGURE 12.9. The data consist of points generated from each of $N(0, \mathbf{I})$ and $N(0, c\mathbf{I})$ with $c > 1$. The solid black ellipse is the decision boundary found by FDA using degree-two polynomial regression. The dashed purple circle is the Bayes decision boundary.

We could have achieved identical quadratic boundaries in a more conventional way, by augmenting our original predictors with their squares and cross-products. In the enlarged space one performs an LDA, and the linear boundaries in the enlarged space map down to quadratic boundaries in the original space. A classic example is a pair of multivariate Gaussians centered at the origin, one having covariance matrix $\mathbf{I}$, and the other $c\mathbf{I}$ for $c > 1$; Figure 12.9 illustrates. The Bayes decision boundary is the sphere $\|x\|^2 = \frac{pc\log c}{c - 1}$, which is a linear boundary in the enlarged space.

Many nonparametric regression procedures operate by generating a basis expansion of derived variables, and then performing a linear regression in the enlarged space. The MARS procedure (Chapter 9) is exactly of this form. Smoothing splines and additive spline models generate an extremely large basis set ($N \times p$ basis functions for additive splines), but then perform a penalized regression fit in the enlarged space. SVMs do as well; see also the kernel-based regression example in Section 12.3.7. FDA in this case can be shown to perform a penalized linear discriminant analysis in the enlarged space. We elaborate in Section 12.6. Linear boundaries in the enlarged space map down to nonlinear boundaries in the reduced space. This is exactly the same paradigm that is used with support vector machines (Section 12.3).
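The augmented-feature route just described is easy to reproduce; a sketch with scikit-learn (PolynomialFeatures adds the squares and cross-products, and $c = 4$ and the sample sizes are illustrative choices):

```python
# Sketch: quadratic decision boundaries from LDA in an augmented feature
# space; linear boundaries there map down to quadratics in the original
# coordinates.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(6)
p, c = 2, 4.0
X = np.vstack([rng.standard_normal((50, p)),                 # N(0, I)
               np.sqrt(c) * rng.standard_normal((50, p))])   # N(0, cI)
g = np.hstack([np.zeros(50), np.ones(50)])

H = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
fit = LinearDiscriminantAnalysis().fit(H, g)   # LDA in the enlarged space
print("training accuracy:", fit.score(H, g))
```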

We illustrate FDA on the speech recognition example used in Chapter 4, with $K = 11$ classes and $p = 10$ predictors. The classes correspond to 11 vowel sounds, each contained in 11 different words. Here are the words, preceded by the symbols that represent them:

Vowel | Word
i: | heed
I | hid
E | head
A | had
a: | hard
Y | hud
O | hod
C: | hoard
U | hood
u: | who'd
3: | heard

Each of eight speakers spoke each word six times in the training set, and likewise seven speakers in the test set. The ten predictors are derived from the digitized speech in a rather complicated way, but standard in the speech recognition world. There are thus 528 training observations, and 462 test observations.

FIGURE 12.10. The left plot shows the first two LDA canonical variates for the vowel training data. The right plot shows the corresponding projection when FDA/BRUTO is used to fit the model; plotted are the fitted regression functions $\hat\eta_1(x_i)$ and $\hat\eta_2(x_i)$. Notice the improved separation. The letters label the vowel sounds.

Figure 12.10 shows two-dimensional projections produced by LDA and FDA. The FDA model used adaptive additive-spline regression functions to model the $\eta_l(x)$, and the points plotted in the right plot have coordinates $\hat\eta_1(x_i)$ and $\hat\eta_2(x_i)$. The routine used in S-PLUS is called bruto, hence the heading on the plot and in Table 12.3. We see that flexible modeling has helped to separate the classes in this case. Table 12.3 shows training and test error rates for a number of classification techniques. FDA/MARS refers to Friedman's multivariate adaptive regression splines; degree = 2 means pairwise products are permitted. Notice that for FDA/MARS, the best classification results are obtained in a reduced-rank subspace.


T Algorithmic methods for data mining. Slide set 6: dimensionality reduction T-61.5060 Algrithmic methds fr data mining Slide set 6: dimensinality reductin reading assignment LRU bk: 11.1 11.3 PCA tutrial in mycurses (ptinal) ptinal: An Elementary Prf f a Therem f Jhnsn and Lindenstrauss,

More information

SURVIVAL ANALYSIS WITH SUPPORT VECTOR MACHINES

SURVIVAL ANALYSIS WITH SUPPORT VECTOR MACHINES 1 SURVIVAL ANALYSIS WITH SUPPORT VECTOR MACHINES Wlfgang HÄRDLE Ruslan MORO Center fr Applied Statistics and Ecnmics (CASE), Humbldt-Universität zu Berlin Mtivatin 2 Applicatins in Medicine estimatin f

More information

ENSC Discrete Time Systems. Project Outline. Semester

ENSC Discrete Time Systems. Project Outline. Semester ENSC 49 - iscrete Time Systems Prject Outline Semester 006-1. Objectives The gal f the prject is t design a channel fading simulatr. Upn successful cmpletin f the prject, yu will reinfrce yur understanding

More information

Building to Transformations on Coordinate Axis Grade 5: Geometry Graph points on the coordinate plane to solve real-world and mathematical problems.

Building to Transformations on Coordinate Axis Grade 5: Geometry Graph points on the coordinate plane to solve real-world and mathematical problems. Building t Transfrmatins n Crdinate Axis Grade 5: Gemetry Graph pints n the crdinate plane t slve real-wrld and mathematical prblems. 5.G.1. Use a pair f perpendicular number lines, called axes, t define

More information

CS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007

CS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007 CS 477/677 Analysis f Algrithms Fall 2007 Dr. Gerge Bebis Curse Prject Due Date: 11/29/2007 Part1: Cmparisn f Srting Algrithms (70% f the prject grade) The bjective f the first part f the assignment is

More information

Kinetic Model Completeness

Kinetic Model Completeness 5.68J/10.652J Spring 2003 Lecture Ntes Tuesday April 15, 2003 Kinetic Mdel Cmpleteness We say a chemical kinetic mdel is cmplete fr a particular reactin cnditin when it cntains all the species and reactins

More information

Differentiation Applications 1: Related Rates

Differentiation Applications 1: Related Rates Differentiatin Applicatins 1: Related Rates 151 Differentiatin Applicatins 1: Related Rates Mdel 1: Sliding Ladder 10 ladder y 10 ladder 10 ladder A 10 ft ladder is leaning against a wall when the bttm

More information

Math Foundations 20 Work Plan

Math Foundations 20 Work Plan Math Fundatins 20 Wrk Plan Units / Tpics 20.8 Demnstrate understanding f systems f linear inequalities in tw variables. Time Frame December 1-3 weeks 6-10 Majr Learning Indicatrs Identify situatins relevant

More information

CHAPTER 24: INFERENCE IN REGRESSION. Chapter 24: Make inferences about the population from which the sample data came.

CHAPTER 24: INFERENCE IN REGRESSION. Chapter 24: Make inferences about the population from which the sample data came. MATH 1342 Ch. 24 April 25 and 27, 2013 Page 1 f 5 CHAPTER 24: INFERENCE IN REGRESSION Chapters 4 and 5: Relatinships between tw quantitative variables. Be able t Make a graph (scatterplt) Summarize the

More information

SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical model for microarray data analysis

SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical model for microarray data analysis SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical mdel fr micrarray data analysis David Rssell Department f Bistatistics M.D. Andersn Cancer Center, Hustn, TX 77030, USA rsselldavid@gmail.cm

More information

4th Indian Institute of Astrophysics - PennState Astrostatistics School July, 2013 Vainu Bappu Observatory, Kavalur. Correlation and Regression

4th Indian Institute of Astrophysics - PennState Astrostatistics School July, 2013 Vainu Bappu Observatory, Kavalur. Correlation and Regression 4th Indian Institute f Astrphysics - PennState Astrstatistics Schl July, 2013 Vainu Bappu Observatry, Kavalur Crrelatin and Regressin Rahul Ry Indian Statistical Institute, Delhi. Crrelatin Cnsider a tw

More information

STATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours

STATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours STATS216v Intrductin t Statistical Learning Stanfrd University, Summer 2016 Practice Final (Slutins) Duratin: 3 hurs Instructins: (This is a practice final and will nt be graded.) Remember the university

More information

Department of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets

Department of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets Department f Ecnmics, University f alifrnia, Davis Ecn 200 Micr Thery Prfessr Giacm Bnann Insurance Markets nsider an individual wh has an initial wealth f. ith sme prbability p he faces a lss f x (0

More information

1 The limitations of Hartree Fock approximation

1 The limitations of Hartree Fock approximation Chapter: Pst-Hartree Fck Methds - I The limitatins f Hartree Fck apprximatin The n electrn single determinant Hartree Fck wave functin is the variatinal best amng all pssible n electrn single determinants

More information

Distributions, spatial statistics and a Bayesian perspective

Distributions, spatial statistics and a Bayesian perspective Distributins, spatial statistics and a Bayesian perspective Dug Nychka Natinal Center fr Atmspheric Research Distributins and densities Cnditinal distributins and Bayes Thm Bivariate nrmal Spatial statistics

More information

Sequential Allocation with Minimal Switching

Sequential Allocation with Minimal Switching In Cmputing Science and Statistics 28 (1996), pp. 567 572 Sequential Allcatin with Minimal Switching Quentin F. Stut 1 Janis Hardwick 1 EECS Dept., University f Michigan Statistics Dept., Purdue University

More information

CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS

CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS 1 Influential bservatins are bservatins whse presence in the data can have a distrting effect n the parameter estimates and pssibly the entire analysis,

More information

Admissibility Conditions and Asymptotic Behavior of Strongly Regular Graphs

Admissibility Conditions and Asymptotic Behavior of Strongly Regular Graphs Admissibility Cnditins and Asympttic Behavir f Strngly Regular Graphs VASCO MOÇO MANO Department f Mathematics University f Prt Oprt PORTUGAL vascmcman@gmailcm LUÍS ANTÓNIO DE ALMEIDA VIEIRA Department

More information

Module 4: General Formulation of Electric Circuit Theory

Module 4: General Formulation of Electric Circuit Theory Mdule 4: General Frmulatin f Electric Circuit Thery 4. General Frmulatin f Electric Circuit Thery All electrmagnetic phenmena are described at a fundamental level by Maxwell's equatins and the assciated

More information

A Matrix Representation of Panel Data

A Matrix Representation of Panel Data web Extensin 6 Appendix 6.A A Matrix Representatin f Panel Data Panel data mdels cme in tw brad varieties, distinct intercept DGPs and errr cmpnent DGPs. his appendix presents matrix algebra representatins

More information

Lead/Lag Compensator Frequency Domain Properties and Design Methods

Lead/Lag Compensator Frequency Domain Properties and Design Methods Lectures 6 and 7 Lead/Lag Cmpensatr Frequency Dmain Prperties and Design Methds Definitin Cnsider the cmpensatr (ie cntrller Fr, it is called a lag cmpensatr s K Fr s, it is called a lead cmpensatr Ntatin

More information

Lecture 8: Multiclass Classification (I)

Lecture 8: Multiclass Classification (I) Bayes Rule fr Multiclass Prblems Traditinal Methds fr Multiclass Prblems Linear Regressin Mdels Lecture 8: Multiclass Classificatin (I) Ha Helen Zhang Fall 07 Ha Helen Zhang Lecture 8: Multiclass Classificatin

More information

Statistical Learning. 2.1 What Is Statistical Learning?

Statistical Learning. 2.1 What Is Statistical Learning? 2 Statistical Learning 2.1 What Is Statistical Learning? In rder t mtivate ur study f statistical learning, we begin with a simple example. Suppse that we are statistical cnsultants hired by a client t

More information

The Solution Path of the Slab Support Vector Machine

The Solution Path of the Slab Support Vector Machine CCCG 2008, Mntréal, Québec, August 3 5, 2008 The Slutin Path f the Slab Supprt Vectr Machine Michael Eigensatz Jachim Giesen Madhusudan Manjunath Abstract Given a set f pints in a Hilbert space that can

More information

PSU GISPOPSCI June 2011 Ordinary Least Squares & Spatial Linear Regression in GeoDa

PSU GISPOPSCI June 2011 Ordinary Least Squares & Spatial Linear Regression in GeoDa There are tw parts t this lab. The first is intended t demnstrate hw t request and interpret the spatial diagnstics f a standard OLS regressin mdel using GeDa. The diagnstics prvide infrmatin abut the

More information

Homology groups of disks with holes

Homology groups of disks with holes Hmlgy grups f disks with hles THEOREM. Let p 1,, p k } be a sequence f distinct pints in the interir unit disk D n where n 2, and suppse that fr all j the sets E j Int D n are clsed, pairwise disjint subdisks.

More information

MODULE FOUR. This module addresses functions. SC Academic Elementary Algebra Standards:

MODULE FOUR. This module addresses functions. SC Academic Elementary Algebra Standards: MODULE FOUR This mdule addresses functins SC Academic Standards: EA-3.1 Classify a relatinship as being either a functin r nt a functin when given data as a table, set f rdered pairs, r graph. EA-3.2 Use

More information

Preparation work for A2 Mathematics [2017]

Preparation work for A2 Mathematics [2017] Preparatin wrk fr A2 Mathematics [2017] The wrk studied in Y12 after the return frm study leave is frm the Cre 3 mdule f the A2 Mathematics curse. This wrk will nly be reviewed during Year 13, it will

More information

Part 3 Introduction to statistical classification techniques

Part 3 Introduction to statistical classification techniques Part 3 Intrductin t statistical classificatin techniques Machine Learning, Part 3, March 07 Fabi Rli Preamble ØIn Part we have seen that if we knw: Psterir prbabilities P(ω i / ) Or the equivalent terms

More information

We say that y is a linear function of x if. Chapter 13: The Correlation Coefficient and the Regression Line

We say that y is a linear function of x if. Chapter 13: The Correlation Coefficient and the Regression Line Chapter 13: The Crrelatin Cefficient and the Regressin Line We begin with a sme useful facts abut straight lines. Recall the x, y crdinate system, as pictured belw. 3 2 1 y = 2.5 y = 0.5x 3 2 1 1 2 3 1

More information

22.54 Neutron Interactions and Applications (Spring 2004) Chapter 11 (3/11/04) Neutron Diffusion

22.54 Neutron Interactions and Applications (Spring 2004) Chapter 11 (3/11/04) Neutron Diffusion .54 Neutrn Interactins and Applicatins (Spring 004) Chapter (3//04) Neutrn Diffusin References -- J. R. Lamarsh, Intrductin t Nuclear Reactr Thery (Addisn-Wesley, Reading, 966) T study neutrn diffusin

More information

and the Doppler frequency rate f R , can be related to the coefficients of this polynomial. The relationships are:

and the Doppler frequency rate f R , can be related to the coefficients of this polynomial. The relationships are: Algrithm fr Estimating R and R - (David Sandwell, SIO, August 4, 2006) Azimith cmpressin invlves the alignment f successive eches t be fcused n a pint target Let s be the slw time alng the satellite track

More information

Comparing Several Means: ANOVA. Group Means and Grand Mean

Comparing Several Means: ANOVA. Group Means and Grand Mean STAT 511 ANOVA and Regressin 1 Cmparing Several Means: ANOVA Slide 1 Blue Lake snap beans were grwn in 12 pen-tp chambers which are subject t 4 treatments 3 each with O 3 and SO 2 present/absent. The ttal

More information

Admin. MDP Search Trees. Optimal Quantities. Reinforcement Learning

Admin. MDP Search Trees. Optimal Quantities. Reinforcement Learning Admin Reinfrcement Learning Cntent adapted frm Berkeley CS188 MDP Search Trees Each MDP state prjects an expectimax-like search tree Optimal Quantities The value (utility) f a state s: V*(s) = expected

More information

MODULE 1. e x + c. [You can t separate a demominator, but you can divide a single denominator into each numerator term] a + b a(a + b)+1 = a + b

MODULE 1. e x + c. [You can t separate a demominator, but you can divide a single denominator into each numerator term] a + b a(a + b)+1 = a + b . REVIEW OF SOME BASIC ALGEBRA MODULE () Slving Equatins Yu shuld be able t slve fr x: a + b = c a d + e x + c and get x = e(ba +) b(c a) d(ba +) c Cmmn mistakes and strategies:. a b + c a b + a c, but

More information

Modelling of Clock Behaviour. Don Percival. Applied Physics Laboratory University of Washington Seattle, Washington, USA

Modelling of Clock Behaviour. Don Percival. Applied Physics Laboratory University of Washington Seattle, Washington, USA Mdelling f Clck Behaviur Dn Percival Applied Physics Labratry University f Washingtn Seattle, Washingtn, USA verheads and paper fr talk available at http://faculty.washingtn.edu/dbp/talks.html 1 Overview

More information

This section is primarily focused on tools to aid us in finding roots/zeros/ -intercepts of polynomials. Essentially, our focus turns to solving.

This section is primarily focused on tools to aid us in finding roots/zeros/ -intercepts of polynomials. Essentially, our focus turns to solving. Sectin 3.2: Many f yu WILL need t watch the crrespnding vides fr this sectin n MyOpenMath! This sectin is primarily fcused n tls t aid us in finding rts/zers/ -intercepts f plynmials. Essentially, ur fcus

More information

EDA Engineering Design & Analysis Ltd

EDA Engineering Design & Analysis Ltd EDA Engineering Design & Analysis Ltd THE FINITE ELEMENT METHOD A shrt tutrial giving an verview f the histry, thery and applicatin f the finite element methd. Intrductin Value f FEM Applicatins Elements

More information

Overview of Supervised Learning

Overview of Supervised Learning 2 Overview f Supervised Learning 2.1 Intrductin The first three examples described in Chapter 1 have several cmpnents in cmmn. Fr each there is a set f variables that might be dented as inputs, which are

More information

Computational modeling techniques

Computational modeling techniques Cmputatinal mdeling techniques Lecture 11: Mdeling with systems f ODEs In Petre Department f IT, Ab Akademi http://www.users.ab.fi/ipetre/cmpmd/ Mdeling with differential equatins Mdeling strategy Fcus

More information

Margin Distribution and Learning Algorithms

Margin Distribution and Learning Algorithms ICML 03 Margin Distributin and Learning Algrithms Ashutsh Garg IBM Almaden Research Center, San Jse, CA 9513 USA Dan Rth Department f Cmputer Science, University f Illinis, Urbana, IL 61801 USA ASHUTOSH@US.IBM.COM

More information

February 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA

February 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA February 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA Mental Experiment regarding 1D randm walk Cnsider a cntainer f gas in thermal

More information

Module 3: Gaussian Process Parameter Estimation, Prediction Uncertainty, and Diagnostics

Module 3: Gaussian Process Parameter Estimation, Prediction Uncertainty, and Diagnostics Mdule 3: Gaussian Prcess Parameter Estimatin, Predictin Uncertainty, and Diagnstics Jerme Sacks and William J Welch Natinal Institute f Statistical Sciences and University f British Clumbia Adapted frm

More information

Aerodynamic Separability in Tip Speed Ratio and Separability in Wind Speed- a Comparison

Aerodynamic Separability in Tip Speed Ratio and Separability in Wind Speed- a Comparison Jurnal f Physics: Cnference Series OPEN ACCESS Aerdynamic Separability in Tip Speed Rati and Separability in Wind Speed- a Cmparisn T cite this article: M L Gala Sants et al 14 J. Phys.: Cnf. Ser. 555

More information

Interference is when two (or more) sets of waves meet and combine to produce a new pattern.

Interference is when two (or more) sets of waves meet and combine to produce a new pattern. Interference Interference is when tw (r mre) sets f waves meet and cmbine t prduce a new pattern. This pattern can vary depending n the riginal wave directin, wavelength, amplitude, etc. The tw mst extreme

More information

NUMBERS, MATHEMATICS AND EQUATIONS

NUMBERS, MATHEMATICS AND EQUATIONS AUSTRALIAN CURRICULUM PHYSICS GETTING STARTED WITH PHYSICS NUMBERS, MATHEMATICS AND EQUATIONS An integral part t the understanding f ur physical wrld is the use f mathematical mdels which can be used t

More information

MATHEMATICS SYLLABUS SECONDARY 5th YEAR

MATHEMATICS SYLLABUS SECONDARY 5th YEAR Eurpean Schls Office f the Secretary-General Pedaggical Develpment Unit Ref. : 011-01-D-8-en- Orig. : EN MATHEMATICS SYLLABUS SECONDARY 5th YEAR 6 perid/week curse APPROVED BY THE JOINT TEACHING COMMITTEE

More information

Slide04 (supplemental) Haykin Chapter 4 (both 2nd and 3rd ed): Multi-Layer Perceptrons

Slide04 (supplemental) Haykin Chapter 4 (both 2nd and 3rd ed): Multi-Layer Perceptrons Slide04 supplemental) Haykin Chapter 4 bth 2nd and 3rd ed): Multi-Layer Perceptrns CPSC 636-600 Instructr: Ynsuck Che Heuristic fr Making Backprp Perfrm Better 1. Sequential vs. batch update: fr large

More information

A Correlation of. to the. South Carolina Academic Standards for Mathematics Precalculus

A Correlation of. to the. South Carolina Academic Standards for Mathematics Precalculus A Crrelatin f Suth Carlina Academic Standards fr Mathematics Precalculus INTRODUCTION This dcument demnstrates hw Precalculus (Blitzer), 4 th Editin 010, meets the indicatrs f the. Crrelatin page references

More information

Emphases in Common Core Standards for Mathematical Content Kindergarten High School

Emphases in Common Core Standards for Mathematical Content Kindergarten High School Emphases in Cmmn Cre Standards fr Mathematical Cntent Kindergarten High Schl Cntent Emphases by Cluster March 12, 2012 Describes cntent emphases in the standards at the cluster level fr each grade. These

More information

Lecture 10, Principal Component Analysis

Lecture 10, Principal Component Analysis Principal Cmpnent Analysis Lecture 10, Principal Cmpnent Analysis Ha Helen Zhang Fall 2017 Ha Helen Zhang Lecture 10, Principal Cmpnent Analysis 1 / 16 Principal Cmpnent Analysis Lecture 10, Principal

More information

Lab 1 The Scientific Method

Lab 1 The Scientific Method INTRODUCTION The fllwing labratry exercise is designed t give yu, the student, an pprtunity t explre unknwn systems, r universes, and hypthesize pssible rules which may gvern the behavir within them. Scientific

More information

LHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers

LHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers LHS Mathematics Department Hnrs Pre-alculus Final Eam nswers Part Shrt Prblems The table at the right gives the ppulatin f Massachusetts ver the past several decades Using an epnential mdel, predict the

More information

The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition

The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition The Kullback-Leibler Kernel as a Framewrk fr Discriminant and Lcalized Representatins fr Visual Recgnitin Nun Vascncels Purdy H Pedr Mren ECE Department University f Califrnia, San Dieg HP Labs Cambridge

More information

Preparation work for A2 Mathematics [2018]

Preparation work for A2 Mathematics [2018] Preparatin wrk fr A Mathematics [018] The wrk studied in Y1 will frm the fundatins n which will build upn in Year 13. It will nly be reviewed during Year 13, it will nt be retaught. This is t allw time

More information

Least Squares Optimal Filtering with Multirate Observations

Least Squares Optimal Filtering with Multirate Observations Prc. 36th Asilmar Cnf. n Signals, Systems, and Cmputers, Pacific Grve, CA, Nvember 2002 Least Squares Optimal Filtering with Multirate Observatins Charles W. herrien and Anthny H. Hawes Department f Electrical

More information

Thermodynamics and Equilibrium

Thermodynamics and Equilibrium Thermdynamics and Equilibrium Thermdynamics Thermdynamics is the study f the relatinship between heat and ther frms f energy in a chemical r physical prcess. We intrduced the thermdynamic prperty f enthalpy,

More information

Eric Klein and Ning Sa

Eric Klein and Ning Sa Week 12. Statistical Appraches t Netwrks: p1 and p* Wasserman and Faust Chapter 15: Statistical Analysis f Single Relatinal Netwrks There are fur tasks in psitinal analysis: 1) Define Equivalence 2) Measure

More information

A New Evaluation Measure. J. Joiner and L. Werner. The problems of evaluation and the needed criteria of evaluation

A New Evaluation Measure. J. Joiner and L. Werner. The problems of evaluation and the needed criteria of evaluation III-l III. A New Evaluatin Measure J. Jiner and L. Werner Abstract The prblems f evaluatin and the needed criteria f evaluatin measures in the SMART system f infrmatin retrieval are reviewed and discussed.

More information

Linear Methods for Regression

Linear Methods for Regression 3 Linear Methds fr Regressin This is page 43 Printer: Opaque this 3.1 Intrductin A linear regressin mdel assumes that the regressin functin E(Y X) is linear in the inputs X 1,...,X p. Linear mdels were

More information

Tutorial 4: Parameter optimization

Tutorial 4: Parameter optimization SRM Curse 2013 Tutrial 4 Parameters Tutrial 4: Parameter ptimizatin The aim f this tutrial is t prvide yu with a feeling f hw a few f the parameters that can be set n a QQQ instrument affect SRM results.

More information

Determining the Accuracy of Modal Parameter Estimation Methods

Determining the Accuracy of Modal Parameter Estimation Methods Determining the Accuracy f Mdal Parameter Estimatin Methds by Michael Lee Ph.D., P.E. & Mar Richardsn Ph.D. Structural Measurement Systems Milpitas, CA Abstract The mst cmmn type f mdal testing system

More information