Stats Classification. Ji Zhu, Michigan Statistics.
Classification
Ji Zhu
445C West Hall
jizhu@umich.edu
Examples of Classification
- Predicting tumor cells as benign or malignant.
- Classifying credit card transactions as legitimate or fraudulent.
- Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil.
- Categorizing news stories as finance, weather, entertainment, sports, etc.
Classification: Definition
Given a collection of data points. Each data point contains a set of variables; one of the variables is the class (categorical, qualitative). Find a model for the class variable as a function of the values of the other variables. Goal: previously unseen data points should be assigned a class as accurately as possible. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
Illustrating Classification (figure)
Mathematical Setup
Class label Y; input variables X = (X_1, X_2, ..., X_p). Y takes values in a finite, unordered set (survived/died, cancer class of tissue sample, ...).
- Two-class: Y ∈ {c_1, c_2}
- Multi-class: Y ∈ {c_1, c_2, ..., c_K}
We have training data, which are observations (examples, instances) of these measurements.
Objectives
On the basis of the training data we would like to:
- Produce a classifier Ĉ(x) that accurately predicts unseen test cases.
- Understand which inputs affect the output, and how.
Optimal Classifier
(X, Y) have a joint probability distribution. Choose Ĉ(x) to have small misclassification error
R(Ĉ) = Pr(Ĉ(X) ≠ Y).
The Bayes optimal classifier is
C*(x) = arg min_C R(C) = arg max_k Pr(Y = c_k | X = x).
Generative Methods
Estimate f(x | Y = c_k), then use the Bayes rule
Pr(Y = c_k | X = x) ∝ f(x | Y = c_k) Pr(Y = c_k).
Examples: linear discriminant analysis (LDA); quadratic discriminant analysis (QDA); naive Bayes.
(Figure: two-class data plotted in the X_1 and X_2 coordinates.)
Discriminative Methods
Estimate Pr(Y = c_k | X = x) directly.
- Logistic regression
- K-nearest neighbor (KNN)
- Support vector machines (SVM)
- Classification tree (CART)
- Ensemble methods: boosting, random forest
Linear Discriminant Analysis
Let π_k be the prior probability of class k, and let f_k(x) be the class-conditional density of X in class k. The posterior probability is
Pr(Y = c_k | X = x) = f_k(x) π_k / Σ_{l=1}^K f_l(x) π_l.
We model each class density as multivariate Gaussian
N(μ_k, Σ_k): (2π)^{-p/2} |Σ_k|^{-1/2} exp(-(1/2)(x - μ_k)^T Σ_k^{-1} (x - μ_k)).
Assume Σ_k = Σ for all k. For each k, the discriminant function is
δ_k(x) = x^T Σ^{-1} μ_k - (1/2) μ_k^T Σ^{-1} μ_k + log π_k,
with decision rule Ĉ(x) = arg max_k δ_k(x).
Remarks
Classify x to the class with the closest centroid to x, using the squared Mahalanobis distance. Special case Σ = I (then Euclidean distance is used):
δ_k(x) = -(1/2) ||x - μ_k||² + log π_k.
Comparing class k and class k′, the log-ratio is
log [Pr(Y = c_k | X = x) / Pr(Y = c_{k′} | X = x)]
= x^T Σ^{-1} (μ_k - μ_{k′}) + log(π_k / π_{k′}) - (1/2)(μ_k + μ_{k′})^T Σ^{-1} (μ_k - μ_{k′}).
This gives a linear decision boundary, with directional vector Σ^{-1}(μ_k - μ_{k′}); generally not in the direction of (μ_k - μ_{k′}).
Parameter Estimation of LDA
In practice, we estimate the parameters from the training data:
- π̂_k = n_k / n, where n_k is the number of observations in class k.
- μ̂_k = Σ_{y_i = c_k} x_i / n_k.
- The pooled covariance Σ̂ = Σ_{k=1}^K Σ_{y_i = c_k} (x_i - μ̂_k)(x_i - μ̂_k)^T / (n - K).
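The estimates above translate directly into code. A minimal NumPy sketch on made-up two-class data (the toy data and variable names are illustrative assumptions, not from the slides):

```python
import numpy as np

def lda_fit(X, y):
    """Estimate LDA parameters: priors, class means, pooled covariance."""
    classes = np.unique(y)
    n, p = X.shape
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    Sigma = np.zeros((p, p))
    for c, mu in zip(classes, means):
        R = X[y == c] - mu
        Sigma += R.T @ R                  # within-class scatter
    Sigma /= (n - len(classes))           # pooled covariance, divided by n - K
    return classes, priors, means, Sigma

def lda_predict(X, classes, priors, means, Sigma):
    """Classify by the largest linear discriminant delta_k(x)."""
    Sinv = np.linalg.inv(Sigma)
    # delta_k(x) = x^T Sinv mu_k - 0.5 mu_k^T Sinv mu_k + log pi_k
    deltas = X @ Sinv @ means.T - 0.5 * np.sum(means @ Sinv * means, axis=1) + np.log(priors)
    return classes[np.argmax(deltas, axis=1)]

# Toy data: two well-separated Gaussian classes with a shared covariance
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.3, (50, 2)), rng.normal([2, 2], 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
params = lda_fit(X, y)
train_acc = np.mean(lda_predict(X, *params) == y)
```

On data this well separated, the linear rule recovers essentially every training label.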
Quadratic Discriminant Analysis
The Σ_k's are allowed to be different. The discriminant function is
δ_k(x) = -(1/2) log |Σ_k| - (1/2)(x - μ_k)^T Σ_k^{-1} (x - μ_k) + log π_k
= x^T W_k x + x^T w_k + b_k.
The decision boundary between class k and class k′ is a quadratic function
{x : x^T (W_k - W_{k′}) x + x^T (w_k - w_{k′}) + (b_k - b_{k′}) = 0}.
There are more parameters in QDA than in LDA, especially when p is large.
Both LDA and QDA perform well on many real classification problems.
Naive Bayes Classifier
Assume independence among input variables when the class is given:
f(x_1, ..., x_p | Y = c_k) = f(x_1 | c_k) f(x_2 | c_k) ... f(x_p | c_k).
Estimate f(x_j | c_k) for all j and c_k. A new point is classified to c_k if Π_{j=1}^p f(x_j | c_k) π_k is maximal. This is a strong assumption, but naive Bayes often classifies well when p is large.
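The slides leave the marginals f(x_j | c_k) generic; taking each to be a univariate Gaussian gives the common "Gaussian naive Bayes" variant. A sketch under that assumption, with made-up data:

```python
import numpy as np

def gnb_fit(X, y):
    """Estimate priors and per-class, per-coordinate Gaussian parameters."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    return classes, priors, means, variances

def gnb_predict(X, classes, priors, means, variances):
    """Pick the class maximizing log pi_k + sum_j log f(x_j | c_k)."""
    scores = []
    for pi, mu, v in zip(priors, means, variances):
        loglik = -0.5 * np.sum(np.log(2 * np.pi * v) + (X - mu) ** 2 / v, axis=1)
        scores.append(loglik + np.log(pi))
    return classes[np.argmax(np.array(scores), axis=0)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.3, (50, 2)), rng.normal([2, 2], 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
acc = np.mean(gnb_predict(X, *gnb_fit(X, y)) == y)
```

Working in log space avoids underflow from the product of many small densities.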
Generative methods vs. discriminative methods (figure).
Discriminative Methods
- Logistic regression
- K-nearest neighbor (KNN)
- Support vector machines (SVM)
- Classification tree (CART)
- Ensemble methods: boosting, random forest
Logistic Regression
Two-class case: Y ∈ {c_1, c_2} (K = 2). Use the logit transformation
log [Pr(Y = c_1 | X = x) / Pr(Y = c_2 | X = x)] = β_0 + β^T x.
The probabilities are
Pr(Y = c_1 | X = x) = e^{β_0 + β^T x} / (1 + e^{β_0 + β^T x}),
Pr(Y = c_2 | X = x) = 1 / (1 + e^{β_0 + β^T x}),
which ensures the probabilities are in [0, 1].
Fitting the Logistic Regression Model
Maximum likelihood estimation. Denote θ = (β_0, β) and let x ← (1, x). The conditional log-likelihood of Y given X is
l(θ) = Σ_{i=1}^n log Pr(Y = y_i | X = x_i; θ).
Code c_1 as 1 and c_2 as 0; then
l(θ) = Σ_{i=1}^n [y_i log Pr(c_1 | x_i; θ) + (1 - y_i) log Pr(c_2 | x_i; θ)]
= Σ_{i=1}^n [y_i θ^T x_i - log(1 + e^{θ^T x_i})].
Partial derivative (score equations):
∂l(θ)/∂θ = Σ_{i=1}^n x_i (y_i - p(x_i; θ)) = 0,
where p(x_i; θ) = e^{θ^T x_i} / (1 + e^{θ^T x_i}). There are (p + 1) equations, nonlinear in θ.
Newton-Raphson Algorithm
The second-derivative (Hessian) matrix is
∂²l(θ)/∂θ∂θ^T = -Σ_{i=1}^n x_i x_i^T p(x_i; θ)[1 - p(x_i; θ)].
1. Choose an initial value θ^0.
2. Update θ by
θ^new = θ^old - [∂²l(θ)/∂θ∂θ^T]^{-1} ∂l(θ)/∂θ.
Iteratively Reweighted Least Squares
Using vector and matrix notation, let W = diag[p(x_i; θ^old)(1 - p(x_i; θ^old))]. Then
∂l(θ)/∂θ = X^T (y - p),
∂²l(θ)/∂θ∂θ^T = -X^T W X.
The Newton-Raphson step is
θ^new = θ^old + (X^T W X)^{-1} X^T (y - p) = (X^T W X)^{-1} X^T W z,
where z is the adjusted response
z = X θ^old + W^{-1} (y - p), i.e. z_i = x_i^T θ^old + (y_i - p_i) / [p_i (1 - p_i)].
In each iteration, we solve the weighted least squares problem
θ^new = arg min_θ (z - Xθ)^T W (z - Xθ).
θ = 0 can be used as a starting point.
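The IRLS loop above is short in code. A NumPy sketch on simulated (non-separable) data; the iteration count and the simulated coefficients (-1, 2) are arbitrary illustrative choices:

```python
import numpy as np

def logistic_irls(X, y, n_iter=25):
    """Fit logistic regression by IRLS. X has a leading column of ones; y is 0/1."""
    theta = np.zeros(X.shape[1])               # theta = 0 as the starting point
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        w = p * (1 - p)                        # diagonal of W
        z = X @ theta + (y - p) / w            # adjusted response
        # Weighted least squares step: theta = (X^T W X)^{-1} X^T W z
        theta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return theta

rng = np.random.default_rng(1)
x = rng.normal(size=500)
p_true = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x)))     # true logit: -1 + 2x
y = (rng.random(500) < p_true).astype(float)
X = np.column_stack([np.ones(500), x])
theta_hat = logistic_irls(X, y)
# At convergence the score equations X^T (y - p) = 0 hold
score = X.T @ (y - 1.0 / (1.0 + np.exp(-X @ theta_hat)))
```

The score residual is the natural convergence check: Newton's method drives it to numerical zero in a handful of iterations on well-behaved data.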
Inference
If the model is correct, θ̂ is consistent. Using the central limit theorem, the distribution of θ̂ converges to N(θ, (X^T W X)^{-1}).
Multi-class Case
Use class K as a reference:
log [Pr(Y = c_1 | X = x) / Pr(Y = c_K | X = x)] = β_{10} + β_1^T x
log [Pr(Y = c_2 | X = x) / Pr(Y = c_K | X = x)] = β_{20} + β_2^T x
...
log [Pr(Y = c_{K-1} | X = x) / Pr(Y = c_K | X = x)] = β_{(K-1)0} + β_{K-1}^T x.
This is multinomial logistic regression.
Logistic Regression vs LDA
For LDA, the log-posterior odds between class k and class K is linear:
log [Pr(Y = c_k | X = x) / Pr(Y = c_K | X = x)]
= x^T Σ^{-1}(μ_k - μ_K) + log(π_k / π_K) - (1/2)(μ_k + μ_K)^T Σ^{-1}(μ_k - μ_K)
= α_{k0} + α_k^T x.
The logistic model has linear logits by construction:
log [Pr(Y = c_k | X = x) / Pr(Y = c_K | X = x)] = β_{k0} + β_k^T x.
The same form. Are they the same estimator?
Where is the Linearity From
For LDA, the linearity is a consequence of the Gaussian assumption for the class densities and the assumption of a common covariance matrix. For logistic regression, the linearity comes by construction. The difference lies in the way the linear coefficients are estimated.
Common Component
The joint density of (X, Y) is
Pr(X, Y = c_k) = Pr(X) Pr(Y = c_k | X),
where Pr(X) is the marginal density of the input X. For both LDA and logistic regression, the second term Pr(Y = c_k | X) has the same logit-linear form
Pr(Y = c_k | X = x) = exp(θ_{k0} + θ_k^T x) / (1 + Σ_{l=1}^{K-1} exp(θ_{l0} + θ_l^T x)).
Which Model is More General
However, they make different assumptions about Pr(X). The logistic model leaves the marginal density of X arbitrary and unspecified. The LDA model assumes a Gaussian mixture density
Pr(x) = Σ_{k=1}^K π_k φ(x; μ_k, Σ).
The logistic model makes fewer assumptions about the data, and is more general.
Parameter Estimation
Logistic regression: maximize the conditional likelihood, the multinomial likelihood with probabilities Pr(Y = c_k | X). The marginal density Pr(X) is ignored (treated fully nonparametrically, using the empirical distribution function which places mass 1/n at each observation).
LDA: maximize the full likelihood based on the joint density
Pr(x, Y = c_k) = φ(x; μ_k, Σ) π_k.
The marginal density does play a role.
Remarks
- LDA is easier to compute than logistic regression.
- If the true f_k(x)'s are Gaussian, LDA is better. Logistic regression may lose around 30% efficiency asymptotically in error rate (Efron 1975).
- Robustness: LDA uses all the data points to estimate the covariance matrix, so it uses more information but is not robust against outliers. Logistic regression down-weights points far from the decision boundary, so it is more robust.
Discriminative Methods
- Logistic regression
- K-nearest neighbor (KNN)
- Support vector machines (SVM)
- Classification tree (CART)
- Ensemble methods: boosting, random forest
K-nearest Neighbor Method
Code Y = 1 if Red, and Y = -1 if Green. A natural way to classify a new point x_0 is to look at its neighbors and take a vote:
f̂(x_0) = (1/K) Σ_{x_i ∈ N_K(x_0)} y_i,
where N_K(x_0) contains the K closest points to x_0 in the training data (the K-nearest neighborhood).
If there is a clear dominance of one of the classes in the neighborhood of an observation x_0, then it is likely that the observation itself would belong to that class, too. Thus the classification rule is majority voting among the members of N_K(x_0):
Ĉ(x_0) = Red if f̂(x_0) > 0, Green if f̂(x_0) < 0.
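The vote f̂(x_0) is a one-liner. A sketch with a tiny hand-made training set (labels coded +1 for Red, -1 for Green as above; the points themselves are made up):

```python
import numpy as np

def knn_classify(X_train, y_train, x0, K):
    """Majority vote among the K nearest training points (y coded +1/-1)."""
    d = np.sum((X_train - x0) ** 2, axis=1)   # squared Euclidean distances
    neighbors = np.argsort(d)[:K]             # indices of the K closest points
    fhat = np.mean(y_train[neighbors])        # average of the +1/-1 labels
    return 1 if fhat > 0 else -1

X_train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0]])
y_train = np.array([1, 1, 1, -1, -1])
```

A query near (0, 0) is outvoted to +1; a query near (1, 1) to -1.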
Oracle
The data in each class are generated from a mixture of Gaussians. The density for each class was an equal mixture of 10 Gaussians. For the Green class, its 10 means were generated from a N((1, 0)^T, I) distribution (and considered fixed). For the Red class, the 10 means were generated from a N((0, 1)^T, I). The within-cluster variances were 1/5.
K-NN tries to implement conditional expectations directly, by approximating expectations with sample averages, relaxing the notion of conditioning at a point to conditioning in a region close to the target point. In theory, when n → ∞ and K → ∞ such that K/n → 0, the K-nearest neighbor estimate is consistent:
f̂(x) → f(x) = E(Y | X = x).
Degrees of Freedom for K-NN
How many parameters does K-nearest neighbors use to describe the fit? One, the value of K? More realistically, K-nearest neighbors uses n/K effective parameters. K controls the model complexity: the smaller K, the more complex the model. In general n/K > p, thus K-NN is more flexible than linear models.
How to choose the optimal K? Can we minimize the training error? No: when K = 1, the training error is zero (overfitting). Choose K to minimize the misclassification error. Generate an independent test set, and use the test error to estimate the misclassification error.
Model Selection
Suppose the data arise from a model Y = f(X) + ε, with E(ε) = 0 and Var(ε) = σ_ε². Let Γ = {(x_i, y_i), i = 1, ..., n} and ŷ_0 = (1/K) Σ_{l=1}^K y_(l), where the subscript (l) indicates the sequence of nearest neighbors to x_0. Then the expected prediction error at x_0 is
EPE(x_0) = E_{y_0|x_0} E_Γ (y_0 - ŷ_0)²
= σ_ε² + (f(x_0) - E_Γ(ŷ_0))² + Var_Γ(ŷ_0).
For simplicity, assume the x_i's in the sample are fixed (nonrandom). Then
E_Γ(y_(l)) = f(x_(l)), Var_Γ(y_(l)) = σ_ε²,
EPE(x_0) = σ_ε² + (f(x_0) - (1/K) Σ_{l=1}^K f(x_(l)))² + σ_ε²/K.
The first term is an irreducible error. The second and third terms make up the mean squared error (MSE) at x_0.
Bias-Variance Tradeoff
The squared bias term tends to increase with K. For small K, the closest neighbors have values f(x_(l)) similar to f(x_0). For large K, more faraway points are counted as neighbors. The variance term decreases as the inverse of K when K increases. Bias-variance tradeoff: as the model complexity increases, the variance tends to increase and the squared bias tends to decrease. We choose the model complexity to minimize the test error.
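With fixed neighbors and a known f, the decomposition above is plain arithmetic, and the tradeoff can be seen numerically. A sketch using a hypothetical f(x) = x², a hand-picked neighbor sequence, and σ_ε² = 1 (all illustrative choices, not the slides' example):

```python
import numpy as np

f = lambda x: x ** 2                    # hypothetical true regression function
sigma2 = 1.0                            # noise variance sigma_eps^2
x0 = 0.0
# Training inputs sorted by distance to x0 (nearest first)
x_nn = np.array([0.05, -0.1, 0.2, -0.3, 0.4, -0.5, 0.6, -0.7, 0.8, -0.9])

def epe(K):
    """EPE(x0) = sigma^2 + (f(x0) - mean of f over K nearest)^2 + sigma^2/K."""
    bias = f(x0) - np.mean(f(x_nn[:K]))
    return sigma2 + bias ** 2 + sigma2 / K

sq_bias = [(f(x0) - np.mean(f(x_nn[:K]))) ** 2 for K in range(1, 11)]
variance = [sigma2 / K for K in range(1, 11)]
```

Here the squared bias grows with K (neighbors drift away from x_0) while the variance shrinks as σ²/K, exactly the tradeoff described above.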
Objectives: Model Assessment
1. Choose a value of a tuning parameter for a technique.
2. Estimate the prediction performance of a given model.
For both of these purposes, the best approach is to run the procedure on an independent test set, if one is available. If possible one should use different test data for (1) and (2) above: a validation set for (1) and a test set for (2).
Cross-Validation
Often there is insufficient data to create a separate validation or test set; setting some data aside for validation is possible, but affects the accuracy of training estimates. In this instance, V-fold cross-validation is useful.
(Diagram: Train | Train | Test | Train | Train)
1. Divide the data into V disjoint subsets.
2. Use subsets 2, ..., V as training data and subset 1 as validation data. Compute the PE on subset 1.
3. Repeat for each subset.
4. Average the results.
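The four steps above can be sketched as a short routine. The nearest-centroid classifier plugged in for illustration is a stand-in (any fit-and-predict procedure would do), and the toy data are made up:

```python
import numpy as np

def v_fold_cv(X, y, fit_predict, V=5, seed=0):
    """Average validation error over V disjoint folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, V)                  # step 1: V disjoint subsets
    errors = []
    for v in range(V):                              # step 3: repeat for each fold
        va = folds[v]
        tr = np.concatenate([folds[u] for u in range(V) if u != v])
        yhat = fit_predict(X[tr], y[tr], X[va])     # step 2: train on the rest,
        errors.append(np.mean(yhat != y[va]))       # compute PE on fold v
    return np.mean(errors)                          # step 4: average

def nearest_centroid(X_tr, y_tr, X_va):
    """Toy classifier: assign each point to the closest class mean."""
    classes = np.unique(y_tr)
    mus = np.array([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = ((X_va[:, None, :] - mus[None, :, :]) ** 2).sum(-1)
    return classes[np.argmin(d, axis=1)]

rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0], 0.3, (50, 2)), rng.normal([2, 2], 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
cv_error = v_fold_cv(X, y, nearest_centroid, V=5)
```

Because each point is held out exactly once, the V fold errors average to an almost-unbiased estimate of the prediction error.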
Curse of Dimensionality
K-nearest neighbors can fail in high dimensions, because it becomes difficult to gather K observations close to a target point x_0: near neighborhoods tend to be spatially large, and estimates are biased; reducing the spatial size of the neighborhood means reducing K, and the variance of the estimate increases.
Illustrating Example
Suppose the points are uniformly distributed in a p-dimensional unit hypercube. To construct a hypercube neighborhood of x_0 that captures a fraction ρ of the observations, what is the edge length of this cube? Since the volume of the cube is l^p = ρ, we have l = ρ^{1/p}.
- When p = 1: if ρ = 0.01, l = 0.01, and if ρ = 0.1, l = 0.1.
- When p = 10: if ρ = 0.01, l ≈ 0.63, and if ρ = 0.1, l ≈ 0.80.
When p = 10, in order to capture 10% of the data, we must cover 80% of the range of each input.
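The edge-length arithmetic is easy to check numerically:

```python
def edge_length(rho, p):
    """Edge length l of a hypercube capturing a fraction rho of uniform data: l^p = rho."""
    return rho ** (1.0 / p)

# p = 1: the neighborhood edge equals the captured fraction.
# p = 10: capturing even 1% of the data needs an edge of about 0.63,
# and capturing 10% needs about 0.80 of each input's range.
lengths = {(rho, p): edge_length(rho, p) for rho in (0.01, 0.1) for p in (1, 10)}
```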
Local methods are no longer local when the dimension p increases. Sampling density is proportional to n^{1/p}; if 100 points are sufficient to estimate a function in R^1, then 100^10 points are needed to achieve similar accuracy in R^10.
Discriminative Methods
- Logistic regression
- K-nearest neighbor (KNN)
- Support vector machines (SVM)
- Classification tree (CART)
- Ensemble methods: boosting, random forest
Constrained Optimization
Constrained optimization has the form
min Q(θ) subject to θ ∈ S ⊆ R^d,
where Q(θ) is the objective function and S is the feasible set. Convex optimization: both the objective function and the feasible set are convex.
Lagrange Multiplier
Consider
min Q(θ) subject to R(θ) = 0.
S = {θ : R(θ) = 0} is a (d - 1)-dimensional surface in R^d. For every θ such that R(θ) = 0, the gradient ∇R(θ) is orthogonal to the surface. If θ* is a local minimum, then ∇Q is orthogonal to the surface at θ*.
Conclusion: at a local minimum, there exists λ ∈ R such that
∇Q(θ*) = λ ∇R(θ*).
This leads us to introduce the Lagrangian
L(θ, λ) = Q(θ) - λ R(θ),
where λ is the Lagrange multiplier. We have argued that a local minimum corresponds to a stationary point of the Lagrangian. Furthermore, we can reverse our logic to deduce that a stationary point of the Lagrangian is a local optimum.
Now consider (the primal problem)
min Q(θ) subject to R(θ) ≥ 0.
Suppose θ* is a local minimum. There are two cases:
- Inactive constraint: R(θ*) > 0. Then ∇Q(θ*) = 0, a stationary point of L(θ, λ) with λ = 0.
- Active constraint: R(θ*) = 0. Same as the equality constraint, except we require λ > 0.
In either case, we have λ R(θ*) = 0. Therefore, a local minimum satisfies the Karush-Kuhn-Tucker conditions:
∇L(θ*) = ∇Q(θ*) - λ ∇R(θ*) = 0,
λ R(θ*) = 0,
λ ≥ 0.
Often the KKT conditions may be used to transform the primal problem to an equivalent dual problem, where the variables being optimized are the Lagrange multipliers.
Outline
- Maximum margin classifier
- Kernel trick
- SVM & function estimation
Maximum Margin Classifier
(Figure: separating hyperplane β_0 + x^T β = 0 with margin m on each side.)
max_{β, β_0, ||β|| = 1} m subject to y_i(β_0 + x_i^T β) ≥ m for all i.
Maximize the minimum distance; the constraint ||β|| = 1 is needed. Vapnik (1995).
Signed Distance to Hyperplanes
The hyperplane is defined by {x : β_0 + x^T β = 0}. For any point x_0 in the hyperplane, x_0^T β = -β_0. The signed distance of a point x to the plane is ⟨β/||β||, x - x_0⟩, where x_0 is any point in the plane.
Equivalently, a quadratic programming problem:
min_{β_0, β} (1/2) ||β||² subject to y_i(β_0 + x_i^T β) ≥ 1, i = 1, ..., n.
The Lagrange primal is
L_P = (1/2) ||β||² - Σ_{i=1}^n α_i [y_i(β_0 + x_i^T β) - 1],
where α_i ≥ 0.
Setting the derivatives to zero, we get
∂/∂β: β = Σ_{i=1}^n α_i y_i x_i,
∂/∂β_0: 0 = Σ_{i=1}^n α_i y_i.
Substituting into the Lagrange primal, we obtain the Lagrange dual
L_D = Σ_{i=1}^n α_i - (1/2) Σ_{i=1}^n Σ_{i′=1}^n α_i α_{i′} y_i y_{i′} x_i^T x_{i′}.
We maximize L_D subject to α_i ≥ 0 and Σ_{i=1}^n α_i y_i = 0.
We minimize L_P with respect to the primal variables β_0, β, and maximize L_D with respect to the dual variables α_i. Maximizing the dual is often a simpler convex QP than the primal.
Support Vectors
The Karush-Kuhn-Tucker conditions include
α̂_i [y_i(β̂_0 + x_i^T β̂) - 1] = 0.
These imply:
- If y_i f̂(x_i) > 1, then α̂_i = 0.
- If α̂_i > 0, then y_i f̂(x_i) = 1; in other words, x_i is on the boundary of the slab.
The solution β̂ is defined in terms of a linear combination of the support points.
Overlapping Classes
(Figure: hyperplane β_0 + x^T β = 0 with margin m and slack variables ξ_i.)
max_{β, β_0, ||β|| = 1} m subject to y_i(β_0 + x_i^T β) ≥ m(1 - ξ_i), ξ_i ≥ 0, Σ ξ_i ≤ B.
The ξ_i are slack variables, and B is a tuning parameter.
Equivalently, a quadratic programming problem:
min_{β_0, β, ξ} (1/2) ||β||² + C Σ_{i=1}^n ξ_i subject to y_i(β_0 + x_i^T β) ≥ 1 - ξ_i, ξ_i ≥ 0.
The Lagrange primal is
L_P = (1/2) ||β||² + C Σ_{i=1}^n ξ_i - Σ_{i=1}^n α_i [y_i(β_0 + x_i^T β) - (1 - ξ_i)] - Σ_{i=1}^n γ_i ξ_i,
where α_i, γ_i ≥ 0.
Setting the derivatives to zero, we get
∂/∂β: β = Σ_{i=1}^n α_i y_i x_i,
∂/∂β_0: 0 = Σ_{i=1}^n α_i y_i,
∂/∂ξ_i: α_i = C - γ_i.
Substituting into the Lagrange primal, we obtain the Lagrange dual
L_D = Σ_{i=1}^n α_i - (1/2) Σ_{i=1}^n Σ_{i′=1}^n α_i α_{i′} y_i y_{i′} ⟨x_i, x_{i′}⟩.
We maximize L_D subject to 0 ≤ α_i ≤ C and Σ_{i=1}^n α_i y_i = 0.
Support Vectors
The Karush-Kuhn-Tucker conditions include
α̂_i [y_i(β̂_0 + x_i^T β̂) - (1 - ξ_i)] = 0, γ_i ξ_i = 0.
These imply:
- y_i f̂(x_i) > 1 ⇒ α̂_i = 0,
- y_i f̂(x_i) < 1 ⇒ α̂_i = C,
- y_i f̂(x_i) = 1 ⇒ 0 ≤ α̂_i ≤ C.
Solution
The solution is expressed in terms of the fitted Lagrange multipliers α̂_i:
β̂ = Σ_{i=1}^n α̂_i y_i x_i.
Some fraction of the α̂_i are exactly zero (from the KKT conditions); the x_i for which α̂_i > 0 are called the support points S.
f̂(x) = β̂_0 + x^T β̂ = β̂_0 + Σ_{i ∈ S} α̂_i y_i ⟨x, x_i⟩.
Example: Bayes Optimal Classifier
Mixture of Gaussians. Red class: 10 centers μ_k generated from N((-1, 1)^T, I); then randomly pick one center, and generate a data point from N(μ_k, I/5). The Green class is similar, with centers from N((1, 1)^T, I). Bayes error: 0.21.
Linear SVMs
(Figures: linear SVM fits for two values of C, one of them C = 0.01; one fit has training error 0.26, test error 0.30; Bayes error 0.21.)
The resulting classifier is sign(β̂_0 + x^T β̂).
Outline
- Maximum margin classifier
- Kernel trick
- SVM & function estimation
Flexible Classifiers
Enlarge the input space via basis expansion (p → q):
h(x) = (h_1(x), h_2(x), ..., h_q(x)).
The Lagrange dual and solution become
L_D = Σ_{i=1}^n α_i - (1/2) Σ_{i=1}^n Σ_{i′=1}^n α_i α_{i′} y_i y_{i′} ⟨h(x_i), h(x_{i′})⟩
and
f̂(x) = β̂_0 + Σ_{i ∈ S} α̂_i y_i ⟨h(x), h(x_i)⟩.
Example
A 2nd-degree polynomial in R². We choose:
h_1(x) = 1, h_2(x) = √2 x_1, h_3(x) = √2 x_2, h_4(x) = x_1², h_5(x) = x_2², h_6(x) = √2 x_1 x_2.
Kernels
L_D and the constraints involve h(x) only through inner products
K(x, x′) = ⟨h(x), h(x′)⟩.
Given a suitable kernel function K(x, x′), we don't need h(x) at all:
f̂(x) = β̂_0 + Σ_{i ∈ S} α̂_i y_i K(x, x_i).
Example Continued
If we choose K(x, x′) = (1 + ⟨x, x′⟩)², then
K(x, x′) = (1 + x_1 x_1′ + x_2 x_2′)²
= 1 + 2 x_1 x_1′ + 2 x_2 x_2′ + (x_1 x_1′)² + (x_2 x_2′)² + 2 x_1 x_1′ x_2 x_2′
= ⟨h(x), h(x′)⟩.
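The identity between the kernel and the explicit basis expansion can be verified numerically; the √2 scaling in h is exactly what makes the inner products match. A quick check on random test points:

```python
import math
import random

def h(x):
    """Explicit degree-2 feature map on R^2 (with sqrt(2) scalings)."""
    x1, x2 = x
    s = math.sqrt(2.0)
    return [1.0, s * x1, s * x2, x1 * x1, x2 * x2, s * x1 * x2]

def K(x, xp):
    """Polynomial kernel (1 + <x, x'>)^2."""
    return (1.0 + x[0] * xp[0] + x[1] * xp[1]) ** 2

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(2)]
xp = [random.uniform(-1, 1) for _ in range(2)]
lhs = K(x, xp)
rhs = sum(a * b for a, b in zip(h(x), h(xp)))   # <h(x), h(x')>
```

Evaluating K costs a handful of operations regardless of the implicit feature dimension, which is the point of the kernel trick.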
Popular Kernels
- dth-degree polynomial: K(x, x′) = (1 + ⟨x, x′⟩)^d
- radial basis: K(x, x′) = exp(-||x - x′||²/σ²)
K(x, x′) is a symmetric, positive (semi-)definite function: for every n = 1, 2, ..., and every set of real numbers {a_1, a_2, ..., a_n} and points x_1, x_2, ..., x_n, we have Σ_{i,i′=1}^n a_i a_{i′} K(x_i, x_{i′}) ≥ 0.
Nonlinear SVMs
(Figures: SVM with a degree-4 polynomial kernel and with a radial kernel in feature space; training and test errors shown; Bayes error 0.210.)
Outline
- Maximum margin classifier
- Kernel trick
- SVM & function estimation
SVM via Loss + Penalty
(Figure: the hinge loss and the binomial log-likelihood loss plotted against y f(x).)
With f(x) = β_0 + x^T β, consider
min_{β_0, β} Σ_{i=1}^n [1 - y_i f(x_i)]_+ + (λ/2) ||β||².
The solution is identical to the SVM solution, with λ = 1/C.
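The loss + penalty form suggests a direct, if crude, fitting method: subgradient descent on the hinge loss plus ridge penalty. A sketch on made-up separable data; the step size, iteration count, and 1/n scaling of the loss are arbitrary choices, and this illustrates the criterion rather than the QP solver the slides describe:

```python
import numpy as np

def svm_subgradient(X, y, lam=0.01, lr=0.1, n_iter=2000):
    """Minimize (1/n) sum_i [1 - y_i f(x_i)]_+ + (lam/2)||beta||^2, f = b0 + x^T beta."""
    n, p = X.shape
    beta, b0 = np.zeros(p), 0.0
    for _ in range(n_iter):
        margin = y * (b0 + X @ beta)
        viol = margin < 1                       # points with positive hinge loss
        # Subgradient of the mean hinge term plus the ridge penalty on beta
        g_beta = -(y[viol, None] * X[viol]).sum(axis=0) / n + lam * beta
        g_b0 = -y[viol].sum() / n
        beta -= lr * g_beta
        b0 -= lr * g_b0
    return b0, beta

rng = np.random.default_rng(3)
X = np.vstack([rng.normal([-1, -1], 0.2, (20, 2)), rng.normal([1, 1], 0.2, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
b0, beta = svm_subgradient(X, y)
train_acc = np.mean(np.sign(b0 + X @ beta) == y)
```

Only margin violators contribute to the subgradient, which mirrors the KKT picture: points with y_i f(x_i) > 1 do not influence the solution.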
SVM and Function Estimation
SVM with a general kernel K(·, ·) minimizes
Σ_{i=1}^n [1 - y_i f(x_i)]_+ + (λ/2) ||f||²_{H_K}
over f ∈ H_K, where H_K is the reproducing kernel Hilbert space (RKHS) of functions generated by the kernel K(·, ·).
RKHS
The function space H_K is generated by a positive (semi-)definite function K(x, x′). Eigen-expansion (Mercer's theorem):
K(x, x′) = Σ_{j=1}^∞ γ_j φ_j(x) φ_j(x′),
where γ_j ≥ 0 and Σ_{j=1}^∞ γ_j² < ∞.
Define H_K to be the set of functions of the form
f(x) = Σ_{j=1}^∞ θ_j φ_j(x),
and define the inner product
⟨Σ_j θ_j φ_j(x), Σ_j δ_j φ_j(x)⟩_{H_K} := Σ_{j=1}^∞ θ_j δ_j / γ_j.
Then the squared norm of f is
||f||²_{H_K} = Σ_{j=1}^∞ θ_j² / γ_j,
which is generally viewed as a roughness penalty.
The Representer Theorem
More generally, we can optimize
min_{f ∈ H_K} [Σ_{i=1}^n L(y_i, f(x_i)) + (λ/2) ||f||²_{H_K}].
The solution has the finite form (Wahba 1990)
f̂(x) = Σ_{i=1}^n α̂_i K(x, x_i),
a finite expansion in the representers K(x, x_i).
Loss Functions
SVM: L[y, f(x)] = (1 - y f(x))_+, called the hinge loss. It estimates the classifier (threshold)
sign[Pr(Y = 1 | x) - Pr(Y = -1 | x)].
Binomial deviance: L[y, f(x)] = log(1 + e^{-y f(x)}), the (negative) binomial log-likelihood. It estimates the logit
log [Pr(Y = 1 | x) / Pr(Y = -1 | x)].
Why not the squared error loss?
Kernel Logistic Regression
Replace (1 - y f)_+ with log(1 + e^{-y f}), the binomial deviance.
- Similar classification performance to the SVM.
- Provides estimates of class probabilities.
- Natural generalization to the multi-class case.
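By the representer theorem, a minimal kernel logistic regression can be fit by gradient descent on α in the expansion f(x) = Σ_i α_i K(x, x_i). A sketch with a radial kernel on made-up data; the learning rate, λ, γ, and iteration count are all arbitrary illustrative choices:

```python
import numpy as np

def klr_fit(X, y, gamma=1.0, lam=0.01, lr=0.05, n_iter=5000):
    """Penalized binomial deviance with f = K alpha; y coded 0/1."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Kmat = np.exp(-gamma * d2)                      # radial kernel matrix
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Kmat @ alpha))
        # Gradient of (1/n) * deviance + (lam/2) alpha^T K alpha
        g = Kmat @ ((p - y) / n + lam * alpha)
        alpha -= lr * g
    return alpha, Kmat

rng = np.random.default_rng(4)
X = np.vstack([rng.normal([-1, -1], 0.3, (20, 2)), rng.normal([1, 1], 0.3, (20, 2))])
y = np.array([0.0] * 20 + [1.0] * 20)
alpha, Kmat = klr_fit(X, y)
prob = 1.0 / (1.0 + np.exp(-Kmat @ alpha))          # class-1 probability estimates
train_acc = np.mean((prob > 0.5) == (y == 1))
```

Unlike the SVM, the fitted values here are genuine probabilities, which is the practical advantage listed above; the price is that every α_i is typically nonzero, so there is no support-point compression.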
KLR vs SVM
(Figures: kernel logistic regression and SVM, each with a radial kernel in feature space; training and test errors shown; Bayes error 0.210.)
Remark
SVM can be viewed as regularized fitting with a particular loss function: the hinge loss. The hinge loss allows for compression in terms of basis functions, from n down to some fraction of n. Regularized logistic regression gives a very similar fit, using the binomial deviance as the loss.
Discriminative Methods
- Logistic regression
- K-nearest neighbor (KNN)
- Support vector machines (SVM)
- Classification tree (CART)
- Ensemble methods: boosting, random forest
Example of a Classification Tree (figure)
Classify Test Data (figures: a test observation is passed down the tree, one split at a time)
More informationChapter 3: Cluster Analysis
Chapter 3: Cluster Analysis } 3.1 Basic Cncepts f Clustering 3.1.1 Cluster Analysis 3.1. Clustering Categries } 3. Partitining Methds 3..1 The principle 3.. K-Means Methd 3..3 K-Medids Methd 3..4 CLARA
More informationA Matrix Representation of Panel Data
web Extensin 6 Appendix 6.A A Matrix Representatin f Panel Data Panel data mdels cme in tw brad varieties, distinct intercept DGPs and errr cmpnent DGPs. his appendix presents matrix algebra representatins
More informationTree Structured Classifier
Tree Structured Classifier Reference: Classificatin and Regressin Trees by L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stne, Chapman & Hall, 98. A Medical Eample (CART): Predict high risk patients
More informationSimple Linear Regression (single variable)
Simple Linear Regressin (single variable) Intrductin t Machine Learning Marek Petrik January 31, 2017 Sme f the figures in this presentatin are taken frm An Intrductin t Statistical Learning, with applicatins
More informationLinear Classification
Linear Classificatin CS 54: Machine Learning Slides adapted frm Lee Cper, Jydeep Ghsh, and Sham Kakade Review: Linear Regressin CS 54 [Spring 07] - H Regressin Given an input vectr x T = (x, x,, xp), we
More informationSupport Vector Machines and Flexible Discriminants
Supprt Vectr Machines and Flexible Discriminants This is page Printer: Opaque this. Intrductin In this chapter we describe generalizatins f linear decisin bundaries fr classificatin. Optimal separating
More informationComputational modeling techniques
Cmputatinal mdeling techniques Lecture 4: Mdel checing fr ODE mdels In Petre Department f IT, Åb Aademi http://www.users.ab.fi/ipetre/cmpmd/ Cntent Stichimetric matrix Calculating the mass cnservatin relatins
More informationSTATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours
STATS216v Intrductin t Statistical Learning Stanfrd University, Summer 2016 Practice Final (Slutins) Duratin: 3 hurs Instructins: (This is a practice final and will nt be graded.) Remember the university
More informationModule 3: Gaussian Process Parameter Estimation, Prediction Uncertainty, and Diagnostics
Mdule 3: Gaussian Prcess Parameter Estimatin, Predictin Uncertainty, and Diagnstics Jerme Sacks and William J Welch Natinal Institute f Statistical Sciences and University f British Clumbia Adapted frm
More informationT Algorithmic methods for data mining. Slide set 6: dimensionality reduction
T-61.5060 Algrithmic methds fr data mining Slide set 6: dimensinality reductin reading assignment LRU bk: 11.1 11.3 PCA tutrial in mycurses (ptinal) ptinal: An Elementary Prf f a Therem f Jhnsn and Lindenstrauss,
More informationComparing Several Means: ANOVA. Group Means and Grand Mean
STAT 511 ANOVA and Regressin 1 Cmparing Several Means: ANOVA Slide 1 Blue Lake snap beans were grwn in 12 pen-tp chambers which are subject t 4 treatments 3 each with O 3 and SO 2 present/absent. The ttal
More informationOverview of Supervised Learning
2 Overview f Supervised Learning 2.1 Intrductin The first three examples described in Chapter 1 have several cmpnents in cmmn. Fr each there is a set f variables that might be dented as inputs, which are
More informationInternal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9.
Sectin 7 Mdel Assessment This sectin is based n Stck and Watsn s Chapter 9. Internal vs. external validity Internal validity refers t whether the analysis is valid fr the ppulatin and sample being studied.
More informationMATCHING TECHNIQUES. Technical Track Session VI. Emanuela Galasso. The World Bank
MATCHING TECHNIQUES Technical Track Sessin VI Emanuela Galass The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Emanuela Galass fr the purpse f this wrkshp When can we use
More information3.4 Shrinkage Methods Prostate Cancer Data Example (Continued) Ridge Regression
3.3.4 Prstate Cancer Data Example (Cntinued) 3.4 Shrinkage Methds 61 Table 3.3 shws the cefficients frm a number f different selectin and shrinkage methds. They are best-subset selectin using an all-subsets
More informationInference in the Multiple-Regression
Sectin 5 Mdel Inference in the Multiple-Regressin Kinds f hypthesis tests in a multiple regressin There are several distinct kinds f hypthesis tests we can run in a multiple regressin. Suppse that amng
More information4th Indian Institute of Astrophysics - PennState Astrostatistics School July, 2013 Vainu Bappu Observatory, Kavalur. Correlation and Regression
4th Indian Institute f Astrphysics - PennState Astrstatistics Schl July, 2013 Vainu Bappu Observatry, Kavalur Crrelatin and Regressin Rahul Ry Indian Statistical Institute, Delhi. Crrelatin Cnsider a tw
More informationMATCHING TECHNIQUES Technical Track Session VI Céline Ferré The World Bank
MATCHING TECHNIQUES Technical Track Sessin VI Céline Ferré The Wrld Bank When can we use matching? What if the assignment t the treatment is nt dne randmly r based n an eligibility index, but n the basis
More informationElements of Machine Intelligence - I
ECE-175A Elements f Machine Intelligence - I Ken Kreutz-Delgad Nun Vascncels ECE Department, UCSD Winter 2011 The curse The curse will cver basic, but imprtant, aspects f machine learning and pattern recgnitin
More informationCHAPTER 24: INFERENCE IN REGRESSION. Chapter 24: Make inferences about the population from which the sample data came.
MATH 1342 Ch. 24 April 25 and 27, 2013 Page 1 f 5 CHAPTER 24: INFERENCE IN REGRESSION Chapters 4 and 5: Relatinships between tw quantitative variables. Be able t Make a graph (scatterplt) Summarize the
More informationThe Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition
The Kullback-Leibler Kernel as a Framewrk fr Discriminant and Lcalized Representatins fr Visual Recgnitin Nun Vascncels Purdy H Pedr Mren ECE Department University f Califrnia, San Dieg HP Labs Cambridge
More informationPSU GISPOPSCI June 2011 Ordinary Least Squares & Spatial Linear Regression in GeoDa
There are tw parts t this lab. The first is intended t demnstrate hw t request and interpret the spatial diagnstics f a standard OLS regressin mdel using GeDa. The diagnstics prvide infrmatin abut the
More informationLocalized Model Selection for Regression
Lcalized Mdel Selectin fr Regressin Yuhng Yang Schl f Statistics University f Minnesta Church Street S.E. Minneaplis, MN 5555 May 7, 007 Abstract Research n mdel/prcedure selectin has fcused n selecting
More informationSUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical model for microarray data analysis
SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical mdel fr micrarray data analysis David Rssell Department f Bistatistics M.D. Andersn Cancer Center, Hustn, TX 77030, USA rsselldavid@gmail.cm
More informationPart 3 Introduction to statistical classification techniques
Part 3 Intrductin t statistical classificatin techniques Machine Learning, Part 3, March 07 Fabi Rli Preamble ØIn Part we have seen that if we knw: Psterir prbabilities P(ω i / ) Or the equivalent terms
More informationStatistical classifiers: Bayesian decision theory and density estimation
3 rd NOSE Shrt Curse Alpbach, st 6 th Mar 004 Statistical classifiers: Bayesian decisin thery and density estimatin Ricard Gutierrez- Department f Cmputer Science rgutier@cs.tamu.edu http://research.cs.tamu.edu/prism
More informationLyapunov Stability Stability of Equilibrium Points
Lyapunv Stability Stability f Equilibrium Pints 1. Stability f Equilibrium Pints - Definitins In this sectin we cnsider n-th rder nnlinear time varying cntinuus time (C) systems f the frm x = f ( t, x),
More informationChapter 15 & 16: Random Forests & Ensemble Learning
Chapter 15 & 16: Randm Frests & Ensemble Learning DD3364 Nvember 27, 2012 Ty Prblem fr Bsted Tree Bsted Tree Example Estimate this functin with a sum f trees with 9-terminal ndes by minimizing the sum
More informationEnhancing Performance of MLP/RBF Neural Classifiers via an Multivariate Data Distribution Scheme
Enhancing Perfrmance f / Neural Classifiers via an Multivariate Data Distributin Scheme Halis Altun, Gökhan Gelen Nigde University, Electrical and Electrnics Engineering Department Nigde, Turkey haltun@nigde.edu.tr
More information, which yields. where z1. and z2
The Gaussian r Nrmal PDF, Page 1 The Gaussian r Nrmal Prbability Density Functin Authr: Jhn M Cimbala, Penn State University Latest revisin: 11 September 13 The Gaussian r Nrmal Prbability Density Functin
More informationMargin Distribution and Learning Algorithms
ICML 03 Margin Distributin and Learning Algrithms Ashutsh Garg IBM Almaden Research Center, San Jse, CA 9513 USA Dan Rth Department f Cmputer Science, University f Illinis, Urbana, IL 61801 USA ASHUTOSH@US.IBM.COM
More informationCS 109 Lecture 23 May 18th, 2016
CS 109 Lecture 23 May 18th, 2016 New Datasets Heart Ancestry Netflix Our Path Parameter Estimatin Machine Learning: Frmally Many different frms f Machine Learning We fcus n the prblem f predictin Want
More informationSlide04 (supplemental) Haykin Chapter 4 (both 2nd and 3rd ed): Multi-Layer Perceptrons
Slide04 supplemental) Haykin Chapter 4 bth 2nd and 3rd ed): Multi-Layer Perceptrns CPSC 636-600 Instructr: Ynsuck Che Heuristic fr Making Backprp Perfrm Better 1. Sequential vs. batch update: fr large
More informationNUMBERS, MATHEMATICS AND EQUATIONS
AUSTRALIAN CURRICULUM PHYSICS GETTING STARTED WITH PHYSICS NUMBERS, MATHEMATICS AND EQUATIONS An integral part t the understanding f ur physical wrld is the use f mathematical mdels which can be used t
More information7 TH GRADE MATH STANDARDS
ALGEBRA STANDARDS Gal 1: Students will use the language f algebra t explre, describe, represent, and analyze number expressins and relatins 7 TH GRADE MATH STANDARDS 7.M.1.1: (Cmprehensin) Select, use,
More informationMath Foundations 20 Work Plan
Math Fundatins 20 Wrk Plan Units / Tpics 20.8 Demnstrate understanding f systems f linear inequalities in tw variables. Time Frame December 1-3 weeks 6-10 Majr Learning Indicatrs Identify situatins relevant
More informationSequential Allocation with Minimal Switching
In Cmputing Science and Statistics 28 (1996), pp. 567 572 Sequential Allcatin with Minimal Switching Quentin F. Stut 1 Janis Hardwick 1 EECS Dept., University f Michigan Statistics Dept., Purdue University
More informationLecture 3: Principal Components Analysis (PCA)
Lecture 3: Principal Cmpnents Analysis (PCA) Reading: Sectins 6.3.1, 10.1, 10.2, 10.4 STATS 202: Data mining and analysis Jnathan Taylr, 9/28 Slide credits: Sergi Bacallad 1 / 24 The bias variance decmpsitin
More informationChecking the resolved resonance region in EXFOR database
Checking the reslved resnance regin in EXFOR database Gttfried Bertn Sciété de Calcul Mathématique (SCM) Oscar Cabells OECD/NEA Data Bank JEFF Meetings - Sessin JEFF Experiments Nvember 0-4, 017 Bulgne-Billancurt,
More informationMaximum A Posteriori (MAP) CS 109 Lecture 22 May 16th, 2016
Maximum A Psteriri (MAP) CS 109 Lecture 22 May 16th, 2016 Previusly in CS109 Game f Estimatrs Maximum Likelihd Nn spiler: this didn t happen Side Plt argmax argmax f lg Mther f ptimizatins? Reviving an
More informationCAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank
CAUSAL INFERENCE Technical Track Sessin I Phillippe Leite The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Phillippe Leite fr the purpse f this wrkshp Plicy questins are causal
More information1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp
THE POWER AND LIMIT OF NEURAL NETWORKS T. Y. Lin Department f Mathematics and Cmputer Science San Jse State University San Jse, Califrnia 959-003 tylin@cs.ssu.edu and Bereley Initiative in Sft Cmputing*
More informationinitially lcated away frm the data set never win the cmpetitin, resulting in a nnptimal nal cdebk, [2] [3] [4] and [5]. Khnen's Self Organizing Featur
Cdewrd Distributin fr Frequency Sensitive Cmpetitive Learning with One Dimensinal Input Data Aristides S. Galanpuls and Stanley C. Ahalt Department f Electrical Engineering The Ohi State University Abstract
More informationMATHEMATICS SYLLABUS SECONDARY 5th YEAR
Eurpean Schls Office f the Secretary-General Pedaggical Develpment Unit Ref. : 011-01-D-8-en- Orig. : EN MATHEMATICS SYLLABUS SECONDARY 5th YEAR 6 perid/week curse APPROVED BY THE JOINT TEACHING COMMITTEE
More informationcfl Cpyright by Ji Zhu 2003 All Rights Reserved ii
FLEXIBLE STATISTICAL MODELING a dissertatin submitted t the department f statistics and the cmmittee n graduate studies f stanfrd university in partial fulfillment f the requirements fr the degree f dctr
More informationFebruary 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA
February 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA Mental Experiment regarding 1D randm walk Cnsider a cntainer f gas in thermal
More informationSAMPLING DYNAMICAL SYSTEMS
SAMPLING DYNAMICAL SYSTEMS Melvin J. Hinich Applied Research Labratries The University f Texas at Austin Austin, TX 78713-8029, USA (512) 835-3278 (Vice) 835-3259 (Fax) hinich@mail.la.utexas.edu ABSTRACT
More informationStatistical Learning. 2.1 What Is Statistical Learning?
2 Statistical Learning 2.1 What Is Statistical Learning? In rder t mtivate ur study f statistical learning, we begin with a simple example. Suppse that we are statistical cnsultants hired by a client t
More informationEric Klein and Ning Sa
Week 12. Statistical Appraches t Netwrks: p1 and p* Wasserman and Faust Chapter 15: Statistical Analysis f Single Relatinal Netwrks There are fur tasks in psitinal analysis: 1) Define Equivalence 2) Measure
More informationAdmissibility Conditions and Asymptotic Behavior of Strongly Regular Graphs
Admissibility Cnditins and Asympttic Behavir f Strngly Regular Graphs VASCO MOÇO MANO Department f Mathematics University f Prt Oprt PORTUGAL vascmcman@gmailcm LUÍS ANTÓNIO DE ALMEIDA VIEIRA Department
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationECEN 4872/5827 Lecture Notes
ECEN 4872/5827 Lecture Ntes Lecture #5 Objectives fr lecture #5: 1. Analysis f precisin current reference 2. Appraches fr evaluating tlerances 3. Temperature Cefficients evaluatin technique 4. Fundamentals
More informationFall 2013 Physics 172 Recitation 3 Momentum and Springs
Fall 03 Physics 7 Recitatin 3 Mmentum and Springs Purpse: The purpse f this recitatin is t give yu experience wrking with mmentum and the mmentum update frmula. Readings: Chapter.3-.5 Learning Objectives:.3.
More informationEDA Engineering Design & Analysis Ltd
EDA Engineering Design & Analysis Ltd THE FINITE ELEMENT METHOD A shrt tutrial giving an verview f the histry, thery and applicatin f the finite element methd. Intrductin Value f FEM Applicatins Elements
More informationYou need to be able to define the following terms and answer basic questions about them:
CS440/ECE448 Sectin Q Fall 2017 Midterm Review Yu need t be able t define the fllwing terms and answer basic questins abut them: Intr t AI, agents and envirnments Pssible definitins f AI, prs and cns f
More informationLogistic Regression. and Maximum Likelihood. Marek Petrik. Feb
Lgistic Regressin and Maximum Likelihd Marek Petrik Feb 09 2017 S Far in ML Regressin vs Classificatin Linear regressin Bias-variance decmpsitin Practical methds fr linear regressin Simple Linear Regressin
More informationand the Doppler frequency rate f R , can be related to the coefficients of this polynomial. The relationships are:
Algrithm fr Estimating R and R - (David Sandwell, SIO, August 4, 2006) Azimith cmpressin invlves the alignment f successive eches t be fcused n a pint target Let s be the slw time alng the satellite track
More informationAP Statistics Notes Unit Two: The Normal Distributions
AP Statistics Ntes Unit Tw: The Nrmal Distributins Syllabus Objectives: 1.5 The student will summarize distributins f data measuring the psitin using quartiles, percentiles, and standardized scres (z-scres).
More informationThe Solution Path of the Slab Support Vector Machine
CCCG 2008, Mntréal, Québec, August 3 5, 2008 The Slutin Path f the Slab Supprt Vectr Machine Michael Eigensatz Jachim Giesen Madhusudan Manjunath Abstract Given a set f pints in a Hilbert space that can
More informationA Scalable Recurrent Neural Network Framework for Model-free
A Scalable Recurrent Neural Netwrk Framewrk fr Mdel-free POMDPs April 3, 2007 Zhenzhen Liu, Itamar Elhanany Machine Intelligence Lab Department f Electrical and Cmputer Engineering The University f Tennessee
More informationHypothesis Tests for One Population Mean
Hypthesis Tests fr One Ppulatin Mean Chapter 9 Ala Abdelbaki Objective Objective: T estimate the value f ne ppulatin mean Inferential statistics using statistics in rder t estimate parameters We will be
More information22.54 Neutron Interactions and Applications (Spring 2004) Chapter 11 (3/11/04) Neutron Diffusion
.54 Neutrn Interactins and Applicatins (Spring 004) Chapter (3//04) Neutrn Diffusin References -- J. R. Lamarsh, Intrductin t Nuclear Reactr Thery (Addisn-Wesley, Reading, 966) T study neutrn diffusin
More informationLHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers
LHS Mathematics Department Hnrs Pre-alculus Final Eam nswers Part Shrt Prblems The table at the right gives the ppulatin f Massachusetts ver the past several decades Using an epnential mdel, predict the
More informationDepartment of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets
Department f Ecnmics, University f alifrnia, Davis Ecn 200 Micr Thery Prfessr Giacm Bnann Insurance Markets nsider an individual wh has an initial wealth f. ith sme prbability p he faces a lss f x (0
More informationthe results to larger systems due to prop'erties of the projection algorithm. First, the number of hidden nodes must
M.E. Aggune, M.J. Dambrg, M.A. El-Sharkawi, R.J. Marks II and L.E. Atlas, "Dynamic and static security assessment f pwer systems using artificial neural netwrks", Prceedings f the NSF Wrkshp n Applicatins
More informationKinetic Model Completeness
5.68J/10.652J Spring 2003 Lecture Ntes Tuesday April 15, 2003 Kinetic Mdel Cmpleteness We say a chemical kinetic mdel is cmplete fr a particular reactin cnditin when it cntains all the species and reactins
More informationLecture 10, Principal Component Analysis
Principal Cmpnent Analysis Lecture 10, Principal Cmpnent Analysis Ha Helen Zhang Fall 2017 Ha Helen Zhang Lecture 10, Principal Cmpnent Analysis 1 / 16 Principal Cmpnent Analysis Lecture 10, Principal
More informationON-LINE PROCEDURE FOR TERMINATING AN ACCELERATED DEGRADATION TEST
Statistica Sinica 8(1998), 207-220 ON-LINE PROCEDURE FOR TERMINATING AN ACCELERATED DEGRADATION TEST Hng-Fwu Yu and Sheng-Tsaing Tseng Natinal Taiwan University f Science and Technlgy and Natinal Tsing-Hua
More informationENSC Discrete Time Systems. Project Outline. Semester
ENSC 49 - iscrete Time Systems Prject Outline Semester 006-1. Objectives The gal f the prject is t design a channel fading simulatr. Upn successful cmpletin f the prject, yu will reinfrce yur understanding
More informationINSTRUMENTAL VARIABLES
INSTRUMENTAL VARIABLES Technical Track Sessin IV Sergi Urzua University f Maryland Instrumental Variables and IE Tw main uses f IV in impact evaluatin: 1. Crrect fr difference between assignment f treatment
More informationChapter 11: Neural Networks
Chapter 11: Neural Netwrks DD3364 December 16, 2012 Prjectin Pursuit Regressin Prjectin Pursuit Regressin mdel: Prjectin Pursuit Regressin f(x) = M g m (wmx) t i=1 where X R p and have targets Y R. Additive
More informationComputational Statistics
Cmputatinal Statistics Spring 2008 Peter Bühlmann and Martin Mächler Seminar für Statistik ETH Zürich February 2008 (February 23, 2011) ii Cntents 1 Multiple Linear Regressin 1 1.1 Intrductin....................................
More informationLinear Methods for Regression
3 Linear Methds fr Regressin This is page 43 Printer: Opaque this 3.1 Intrductin A linear regressin mdel assumes that the regressin functin E(Y X) is linear in the inputs X 1,...,X p. Linear mdels were
More informationDetermining the Accuracy of Modal Parameter Estimation Methods
Determining the Accuracy f Mdal Parameter Estimatin Methds by Michael Lee Ph.D., P.E. & Mar Richardsn Ph.D. Structural Measurement Systems Milpitas, CA Abstract The mst cmmn type f mdal testing system
More information