
FLEXIBLE STATISTICAL MODELING

A dissertation submitted to the Department of Statistics and the Committee on Graduate Studies of Stanford University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Ji Zhu
August 2003

© Copyright by Ji Zhu 2003
All Rights Reserved

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Trevor Hastie (Principal Advisor)

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Bradley Efron

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Robert Tibshirani

Approved for the University Committee on Graduate Studies:

Abstract

The support vector machine is known for its good performance in two-class classification. The topic of this thesis is based on two variations of the standard 2-norm support vector machine. In the first part of the thesis, we replace the hinge loss of the support vector machine with the negative binomial log-likelihood and consider the kernel logistic regression model. We show that kernel logistic regression performs as well as the support vector machine in two-class classification. Furthermore, kernel logistic regression provides an estimate of the underlying probability. Based on the kernel logistic regression model, we propose a new approach for classification, called the import vector machine. Similar to the support points of the support vector machine, the import vector machine model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the support vector machine. This gives the import vector machine a computational advantage over the support vector machine, especially when the size of the training data set is large. In the second part, we replace the L_2-norm penalty term of the support vector machine with the L_1-norm penalty, and consider the 1-norm support vector machine. We argue that the 1-norm support vector machine may have some advantage over the standard 2-norm support vector machine, especially when there are redundant noise features. We also propose an efficient algorithm that computes the whole solution path of the 1-norm support vector machine, hence facilitating adaptive selection of the tuning parameter for the 1-norm support vector machine.

Acknowledgments

First and foremost, I wish to express my great appreciation to my advisor Trevor Hastie. He suggested the topic for this thesis and provided invaluable advice and constant encouragement throughout the course of my research. Without his support and guidance, this work would not have been completed. Trevor has been more than an academic advisor to me. When my son was one month old, my wife became infected with mastitis, which resulted in a very difficult time for our family. I still vividly remember how Trevor immediately called us after he heard the news and offered his help. I feel very fortunate to have Trevor as a mentor and a friend. I shall always appreciate his guidance which led me to the wonderful world of statistics.

I am very lucky to have met Saharon Rosset here at Stanford; I now have a talented collaborator. He is a constant inspiration to me and I value our relationship greatly. I am also very grateful to Professor Rob Tibshirani, whose stimulating suggestions, sharp comments and warm encouragement accompanied me throughout the dissertation process.

I owe my gratitude to Professor Susan Holmes and Professor Persi Diaconis; I would probably not have chosen to study statistics or come to Stanford had I not met Susan and Persi when I was at Cornell. I want to thank Susan again for writing me two important recommendation letters at two different stages, one when I applied to the graduate school and the other when I applied for a job.

I wish to thank Professor Brad Efron, Professor Jerry Friedman and Professor Art Owen for acting as members of my oral committee and providing useful suggestions on my research.

I also wish to thank my friends and classmates at Stanford, too many to name, and the

Monday fish-bowl group for helping me out in many ways and making my life here colorful and enjoyable. My special thanks go to Xizhao, whose companionship and unselfish support ever since we began our journey together have brightened my life.

Finally, I would like to thank my parents for their love and confidence in me. My parents' support is a big part of everything that I accomplish.

Contents

Abstract
Acknowledgments

1 Introduction
  1.1 The Classification Problem
    1.1.1 Handwritten Digit Recognition
    1.1.2 DNA Microarray Classification
  1.2 Support Vector Machines
    1.2.1 Margin Maximizer
    1.2.2 Regularized Function Fitting
    1.2.3 Multi-class Support Vector Machine
  1.3 Outline of the Thesis

2 Import Vector Machines
  2.1 Kernel Logistic Regression
    2.1.1 KLR as Margin Maximizer
    2.1.2 Newton-Raphson Method
    2.1.3 Sequential Minimal Optimization Method
  2.2 Import Vector Machines
    2.2.1 Algorithm
    2.2.2 Selection of λ
    2.2.3 Simulation Results
    2.2.4 Real Data Results
  2.3 Multi-class Case
    2.3.1 Multi-class KLR and Multi-class SVM
    2.3.2 Multi-class IVM
  2.4 Summary

3 1-norm Support Vector Machines
  3.1 Introduction
  3.2 Regularized support vector machines
  3.3 Algorithm
    3.3.1 Piece-wise linearity
    3.3.2 Initial solution (i.e. s = 0)
    3.3.3 Main algorithm
    3.3.4 Remarks
    3.3.5 Computational cost
  3.4 Numerical results
    3.4.1 Simulation results
    3.4.2 Real data results
  3.5 Summary

4 Microarray Classification
  4.1 Introduction
  4.2 Penalized Logistic Regression
    4.2.1 Penalized logistic regression
    4.2.2 Formulation
    4.2.3 Computational Issues
  4.3 Feature Selection
    4.3.1 Univariate Ranking
    4.3.2 Recursive Feature Elimination
  4.4 Results
    4.4.1 Choosing λ
    4.4.2 Leukemia Data
    4.4.3 SRBCT Data
    4.4.4 Ramaswamy Data
  4.5 Discussion
  4.6 Summary

5 Summary of Thesis

A Theorems and Proofs

Bibliography

List of Tables

2.1 Summary of the ten benchmark datasets
2.2 Classification performance of SVMs vs IVMs
2.3 Number of kernel basis used by SVMs vs IVMs
3.1 Simulation results of 1-norm and 2-norm support vector machine
4.1 Results on Microarray Classification
4.2 Comparison of leukemia classification methods
4.3 Comparison of SRBCT classification methods
4.4 Comparison of Ramaswamy data classification methods
4.5 Comparison of leukemia classification methods
4.6 Comparison of SRBCT classification methods

List of Figures

1.1 Handwritten digits
1.2 SRBCT data
1.3 Linear SVMs
2.1 Some loss functions
2.2 SVM vs KLR
2.3 Choose λ
2.4 Choose import points
2.5 SVM vs IVM
2.6 Effect of training data size
2.7 Multi-class IVM
3.1 Piece-wise linearity
3.2 Solution path of 1-norm SVMs
4.1 Choose λ
4.2 Leukemia data
4.3 SRBCT data
4.4 Ramaswamy data
4.5 Ramaswamy data

Chapter 1

Introduction

1.1 The Classification Problem

In standard classification problems, we are given a set of training data (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), where the input (predictor variable) x_i ∈ R^p and the output (response variable) y_i is qualitative and assumes values in a finite set, e.g. C = {1, 2, ..., K}. The aim is to find a classification rule from the training data, so that when given a new input x, we can assign a class c(x) from C to it. The question, then, is what is the best possible classification rule. To answer this question, we need to define what we mean by best. A common definition of best is to achieve the lowest misclassification error rate. Usually it is assumed that the training data are independently and identically distributed samples from an unknown probability distribution P(X, Y). Then the misclassification error rate is:

    E_{X,Y} 1_{c(X) ≠ Y} = E_X P(c(X) ≠ Y | X)                                (1.1)
                         = 1 − E_X P(c(X) = Y | X)                             (1.2)
                         = 1 − E_X [ Σ_{k=1}^{K} 1_{c(X)=k} P(Y = k | X) ].    (1.3)

It is clear that c(x) = argmax_k P(Y = k | X = x) will minimize this quantity, with the misclassification error rate equal to 1 − E_X max_k P(Y = k | X). This classifier is known as the Bayes classifier, and the error rate it achieves is the Bayes error rate. Below we see two real classification examples that are discussed in greater detail in the thesis.

1.1.1 Handwritten Digit Recognition

Consider the digits in Figure (1.1) (reprinted from Hastie, Tibshirani & Friedman (2001)). These are scanned images of handwritten zip codes from the US Postal Service Zip Code Data Set. The images are 16 × 16 greyscale maps. We thus have a p = 256 dimensional predictor space, x ∈ R^256, and a response space containing 10 possible outcomes, C = {0, 1, ..., 9}. The task is to use a set of training data (pairs of x_i and y_i) to form a rule to predict y (the digit) given x (the pixels).

1.1.2 DNA Microarray Classification

DNA microarrays measure the expression of a gene in a cell by measuring the amount of mRNA present for that gene. Microarrays are considered a breakthrough technology in biology, for they facilitate the quantitative study of thousands of genes simultaneously from a single sample of cells. Figure (1.2) displays the heat map of a data set containing 2,308 genes (rows) and 63 samples (columns). The samples are a collection of small round blue cell tumors (SRBCT) from children, and each sample belongs to one of four tumor classes. The goal is to predict the diagnostic tumor category of a sample on the basis of its gene expression profile. Here we have a situation where the number of inputs (genes) (p = 2,308) is much larger than the number of samples (n = 63). Hence, besides predicting the correct tumor class for a given sample, another challenge in microarray diagnosis is to identify relevant genes that contribute most to the classification.

Figure 1.1: Examples of handwritten digits from U.S. postal envelopes.

1.2 Support Vector Machines

The support vector machine (SVM) has been a popular tool for classification problems in the machine learning field. Recently, it has also gained increasing attention from the statistics community. Below we briefly go over the support vector machine for two-class classification from two perspectives. See Vapnik (1995), Burges (1998), Evgeniou, Pontil & Poggio (1999) and Hastie, Tibshirani & Friedman (2001) for details.

1.2.1 Margin Maximizer

Recall the training data are (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), x_i ∈ R^p. In two-class classification, y_i ∈ {−1, 1}.

[Figure 1.2: heat map of the SRBCT gene expression data; the samples fall into the four tumor classes EWS, BL, NB and RMS.]

Let us first consider the case when the training data can be perfectly separated by a hyperplane in R^p. Define the hyperplane by

    {x : f(x) = β_0 + x^T β = 0},

where β is a unit vector, ||β||_2 = 1; then f(x) gives the signed distance from a point x to the hyperplane. Since the training data are linearly separable, we are able to find a hyperplane such that

    y_i f(x_i) > 0,  ∀i.    (1.4)

Indeed, there are infinitely many such hyperplanes. Among the hyperplanes satisfying (1.4), the support vector machine looks for the one that maximizes the margin. The margin is defined as the shortest distance from the training data to the hyperplane. Hence we can write the support vector machine problem as:

    max_{β, β_0, ||β||_2 = 1}  C    (1.5)
    subject to  y_i(β_0 + x_i^T β) ≥ C,  i = 1, ..., n.    (1.6)

When the training data are not linearly separable, we allow some training data to be on the wrong side of the edges of the margin and introduce slack variables ξ_i, ξ_i ≥ 0. The support vector machine problem then becomes

    max_{β, β_0, ||β||_2 = 1}  C    (1.7)
    subject to  y_i(β_0 + x_i^T β) ≥ C(1 − ξ_i),  i = 1, ..., n,    (1.8)
                ξ_i ≥ 0,  Σ_i ξ_i ≤ B,    (1.9)

where B is a prespecified positive number, which can be regarded as a tuning parameter.

[Figure 1.3 shows the separating hyperplane x^T β + β_0 = 0 with the margin of width C on either side; the right panel marks the slack variables ξ*_1, ..., ξ*_5 for the non-separable case.]

Figure 1.3: Linear support vector machine classifiers.

(1.7)–(1.9) can be turned into a quadratic programming problem, and the solution has the form:

    f(x) = β_0 + Σ_{i=1}^{n} α_i y_i ⟨x_i, x⟩,    (1.10)

where the α_i's are Lagrangian multipliers for the quadratic programming problem, and

    β = Σ_{i=1}^{n} α_i y_i x_i.

Notice each training data point has a corresponding α_i. Using the Karush-Kuhn-Tucker (KKT) conditions of the quadratic programming problem, one can show that a sizeable fraction of the n values of α_i are zero. This seems to be an attractive property, because only the data points near the classification boundary (including those on the wrong side of the boundary) have an influence in determining the position of the boundary, and hence have non-zero α_i's. The corresponding x_i's are called support points (support vectors). Figure (1.3) illustrates both the linearly separable and non-separable cases.
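To make (1.10) concrete, here is a minimal sketch (an illustration only, not code from the thesis; the variable names are hypothetical) that evaluates the linear SVM decision function from a set of support points, their labels and their Lagrange multipliers.

```python
import numpy as np

def svm_decision(x, support_x, support_y, alpha, beta0):
    """Evaluate f(x) = beta0 + sum_i alpha_i y_i <x_i, x> as in (1.10).

    Only support points (alpha_i > 0) contribute, so the sum can be
    restricted to them without changing the value of f."""
    inner = support_x @ x                      # inner products <x_i, x>
    return beta0 + np.sum(alpha * support_y * inner)

# toy usage: two support points in R^2 with made-up multipliers
support_x = np.array([[1.0, 0.5], [-0.5, -1.0]])
support_y = np.array([1.0, -1.0])
alpha = np.array([0.8, 0.8])
x_new = np.array([0.3, 0.2])
label = np.sign(svm_decision(x_new, support_x, support_y, alpha, beta0=0.1))
```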

As with other linear models, we can make the support vector machine more flexible by enlarging the feature space using basis expansions such as polynomials or splines. Generally, linear boundaries in the enlarged space achieve better training data separation and translate to nonlinear boundaries in the original input space. Suppose the dictionary of the basis functions of the enlarged feature space is

    D = {h_1(x), h_2(x), ..., h_q(x)},

where q is the dimension of the enlarged feature space. Note if q = p and h_j(x) is the jth component of x, the enlarged feature space is reduced to the original input space. The classification boundary in the enlarged feature space is given by

    {x : f(x) = β_0 + h(x)^T β = 0}.

Note that in the linear support vector machine, the solution (1.10) depends on the basis functions only through their inner product. Hence, when using the enlarged basis functions, the new solution will have the form:

    f(x) = β_0 + Σ_{i=1}^{n} α_i y_i ⟨h(x_i), h(x)⟩.    (1.11)

This implies that if the enlarged basis functions are basis functions of a reproducing kernel Hilbert space (RKHS) (Wahba (1990)), with

    K(x, x') = ⟨h(x), h(x')⟩,

then we do not need to know the enlarged basis functions h(x) explicitly; all we need to know is the kernel function K(x, x') that generates the reproducing kernel Hilbert space. This kernel trick allows the enlarged feature space to be even infinite dimensional, i.e., q = ∞, without causing any further computational burden, since the solution (1.11) has a finite dimensional form in terms of the kernel bases K(x_i, x), and the number of parameters is always n + 1 (n for the α_i's and 1 for β_0).

Three popular choices of K(·,·) in the support vector machine literature are:

    dth degree polynomial:  K(x, x') = (1 + ⟨x, x'⟩)^d,    (1.12)
    Radial basis:           K(x, x') = exp(−||x − x'||² / (2σ²)),    (1.13)
    Neural network:         K(x, x') = tanh(κ_1 ⟨x, x'⟩ + κ_2),    (1.14)

where d, σ, κ_1 and κ_2 are pre-specified parameters.

1.2.2 Regularized Function Fitting

Section (1.2.1) views the support vector machine from a geometric point of view, i.e., a hyperplane in an enlarged reproducing kernel Hilbert space that maximizes the margin of the training data. It turns out that the support vector machine is also equivalent to a regularized function fitting problem. With f(x) = β_0 + h(x)^T β, consider the optimization problem:

    min_{β_0, β}  Σ_{i=1}^{n} [1 − y_i f(x_i)]_+ + λ ||β||_2²,    (1.15)

where the subscript "+" indicates the positive part and λ is a tuning parameter. One can show that the solution to (1.15) is the same as the support vector machine (1.7)–(1.9). Furthermore, if h(x) are appropriate basis functions of a reproducing kernel Hilbert space, (1.15) can be written as:

    min_{f ∈ H_K}  Σ_{i=1}^{n} [1 − y_i f(x_i)]_+ + λ J(f),    (1.16)

where H_K is the reproducing kernel Hilbert space, and J(f) = ||f||²_{H_K} is the square of the L_2 norm of f(x) defined on H_K.
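As a small illustration of the kernels (1.12)–(1.14), the sketch below (not from the thesis; the parameter values d, σ, κ_1 and κ_2 are arbitrary) computes the three kernel functions for a pair of input vectors.

```python
import numpy as np

def poly_kernel(x, xp, d=3):
    """dth-degree polynomial kernel (1.12): (1 + <x, x'>)^d."""
    return (1.0 + x @ xp) ** d

def radial_kernel(x, xp, sigma=1.0):
    """Radial basis kernel (1.13): exp(-||x - x'||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma ** 2))

def neural_net_kernel(x, xp, kappa1=1.0, kappa2=0.0):
    """Neural network kernel (1.14): tanh(kappa1 <x, x'> + kappa2)."""
    return np.tanh(kappa1 * (x @ xp) + kappa2)

x, xp = np.array([1.0, 2.0]), np.array([0.5, -1.0])
values = (poly_kernel(x, xp), radial_kernel(x, xp), neural_net_kernel(x, xp))
```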

Similar to Section (1.2.1), although H_K can be infinite dimensional, the solution of (1.16) has a finite dimensional form:

    f(x) = β_0 + Σ_{i=1}^{n} α_i y_i K(x_i, x),    (1.17)

where K(·,·) is the positive definite kernel function that generates the reproducing kernel Hilbert space H_K.

Notice both (1.15) and (1.16) have the form loss + penalty, which is a familiar paradigm to statisticians in function estimation. The loss function (1 − yf)_+ is called the hinge loss. Lin (2002) shows:

    argmin_f E[(1 − Y f(x))_+] = sign(p_1(x) − 1/2),  or  sign( log [ p_1(x) / (1 − p_1(x)) ] ),

where p_1(x) = P(Y = 1 | X = x) is the conditional probability of a point being in class 1 given X = x. Hence the support vector machine tries to implement the optimal Bayes classification rule without estimating the actual conditional probability p_1(x).

1.2.3 Multi-class Support Vector Machine

The support vector machine classifier so far described is for two-class classification. In going from two-class to multi-class classification, many researchers have proposed various procedures. In practice, the one-vs-rest scheme is often used: given K classes, the problem is divided into a series of K one-vs-rest problems, and each one-vs-rest problem is addressed by a different class-specific support vector machine classifier (e.g., "class 1" vs. "not class 1"); then a new sample takes the class of the classifier with the largest real valued output, c = argmax_{k=1,...,K} f_k, where f_k is the real valued output of the kth support vector machine classifier.

Instead of solving K problems, Weston & Watkins (1999) and Vapnik (1998) generalize

(1.7)–(1.9) by solving one single optimization problem:

    max_{β_k, β_{0k}}  C    (1.18)
    subject to  (β_{0 y_i} − β_{0k}) + x_i^T (β_{y_i} − β_k) ≥ C(1 − ξ_{ik}),    (1.19)
                i = 1, ..., n,  k = 1, ..., K,  k ≠ y_i,    (1.20)
                ξ_{ik} ≥ 0,  Σ_i Σ_{k ≠ y_i} ξ_{ik} ≤ B,    (1.21)
                Σ_{k=1}^{K} ||β_k||_2² = 1.    (1.22)

Similar to section 1.2.1, this setup also has a nice geometric interpretation. Recently, Lee, Lin & Wahba (2002) proposed an algorithm that implements the Bayes classification rule and estimates argmax_k P(Y = k | X = x) directly, but the geometric picture of their algorithm is still not clear.
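The one-vs-rest scheme described in this section reduces prediction to an argmax over K real-valued classifier outputs. A minimal sketch, assuming K already-trained decision functions (hypothetical stand-ins here, not trained SVMs), is:

```python
import numpy as np

def one_vs_rest_predict(decision_functions, x):
    """Assign x to the class whose one-vs-rest classifier gives the largest
    real-valued output, c = argmax_k f_k(x).

    decision_functions: list of K callables f_k(x) -> real value, each meant
    to separate class k from the rest (here only hypothetical examples)."""
    scores = np.array([f(x) for f in decision_functions])
    return int(np.argmax(scores)) + 1          # classes labeled 1, ..., K

# toy usage with three linear scoring functions standing in for SVM outputs
fs = [lambda x: x[0] - x[1], lambda x: -x[0], lambda x: x[1] - 0.5]
predicted_class = one_vs_rest_predict(fs, np.array([0.2, 0.9]))
```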

1.3 Outline of the Thesis

In Chapter 2 we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the import vector machine not only performs as well as the support vector machine in two-class classification, but also can naturally be generalized to the multi-class case. Furthermore, the import vector machine provides an estimate of the underlying probability. Similar to the support points of the support vector machine, the import vector machine model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the support vector machine. This gives the import vector machine a computational advantage over the support vector machine, especially when the size of the training data set is large. The import vector machine is based on kernel logistic regression, which replaces the hinge loss of the support vector machine with the negative binomial log-likelihood.

In Chapter 3, we replace the penalty term ||β||_2², the square of the L_2 norm of β, in (1.15) of the support vector machine, with the L_1 norm ||β||_1, and we fit the 1-norm support vector machine. The motivation for doing such a replacement is that in addition to shrinking the fitted coefficients β̂ towards zero, just as the L_2 penalty does, the L_1 penalty also tends to set some of the fitted coefficients exactly equal to zero. Hence the 1-norm support vector machine fits a model that does automatic feature selection. We show that the fitted coefficient path β̂ as a function of the tuning parameter λ is piece-wise linear, and we give an efficient algorithm that computes the whole coefficient path. This facilitates efficient adaptive selection of the tuning parameter.

In Chapter 4, we concentrate on DNA microarray classification using techniques developed in Chapter 2 and Chapter 3. Often a primary goal in microarray diagnosis is to identify the genes responsible for the classification, rather than class prediction. Besides the automatic gene selection done by the L_1 penalty as described in Chapter 3, we consider two gene selection methods used in the literature, univariate ranking (UR) and recursive feature elimination (RFE). Empirical results indicate that the recursive feature elimination method tends to select fewer genes than other methods and also performs well in both cross-validation and test data.

In the Appendix we give proofs for all the theorems contained in the thesis.

Chapter 2

Import Vector Machines

In this chapter, we propose a new approach for classification, the import vector machine (IVM), that is built on kernel logistic regression (KLR). We show that the import vector machine not only performs as well as the support vector machine in two-class classification, but also can naturally be generalized to the multi-class case. Furthermore, the import vector machine provides an estimate of the underlying probability. Similar to the support points of the support vector machine, the import vector machine model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the support vector machine. This gives the import vector machine a potential computational advantage over the support vector machine, especially when the size of the training data set is large.

2.1 Kernel Logistic Regression

As described in Chapter 1, the standard support vector machine produces a non-linear classification boundary in the original input space by constructing a linear boundary in an enlarged version of the original input space. The dimension of the enlarged space can be very large, even infinite, in some cases. This seemingly prohibitive computation is achieved through a positive definite reproducing kernel K(·,·), which gives the inner product in the enlarged space.

In Chapter 1 we have also noted the relationship between the support vector machine and regularized function fitting in reproducing kernel Hilbert spaces (RKHS). An overview can be found in Evgeniou, Pontil & Poggio (1999), Wahba (1999) and Hastie, Tibshirani & Friedman (2001). Fitting a support vector machine is equivalent to:

    min_{f ∈ H_K}  Σ_{i=1}^{n} [1 − y_i f(x_i)]_+ + λ ||f||²_{H_K},    (2.1)

where H_K is the reproducing kernel Hilbert space generated by a kernel K(·,·). By the representer theorem (Kimeldorf & Wahba (1971)), the optimal f(x) has the form:

    f(x) = β_0 + Σ_{i=1}^{n} α_i K(x_i, x),    (2.2)

and only the support points will have non-zero α_i's.

Note that (2.1) has the form loss + penalty. The loss function (1 − yf)_+ is plotted in Figure 2.1, along with several traditional loss functions. As we can see, the negative log-likelihood (NLL) of the binomial distribution has a shape similar to that of the support vector machine: both increase linearly as yf gets very small (negative) and encourage y and f to have the same sign. If we replace (1 − yf)_+ in (2.1) with ln(1 + e^{−yf}), the negative log-likelihood of the binomial distribution, the problem becomes a kernel logistic regression problem:

    min_{f ∈ H_K}  Σ_{i=1}^{n} ln(1 + e^{−y_i f(x_i)}) + λ ||f||²_{H_K}.    (2.3)

Because of the similarity between the two loss functions, we expect that the fitted function performs similarly to the support vector machine for two-class classification.
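To see the similarity of shape between the hinge loss and the binomial negative log-likelihood that motivates (2.3), the following sketch (illustrative only, not from the thesis) evaluates both losses on a grid of margin values yf.

```python
import numpy as np

def hinge_loss(margin):
    """SVM hinge loss (1 - yf)_+ as a function of the margin yf."""
    return np.maximum(0.0, 1.0 - margin)

def binomial_nll(margin):
    """Negative binomial log-likelihood ln(1 + exp(-yf))."""
    return np.log1p(np.exp(-margin))

# both losses grow roughly linearly for very negative margins and both
# encourage y and f(x) to share the same sign
margins = np.linspace(-3, 3, 7)
hinge_vals, nll_vals = hinge_loss(margins), binomial_nll(margins)
```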

[Figure 2.1 plots the loss as a function of yf(x) for the binomial log-likelihood, squared error and support vector (hinge) losses.]

Figure 2.1: Some loss functions; y ∈ {−1, 1}.

There are two immediate advantages of making such a replacement: (a) Kernel logistic regression estimates the log-odds:

    f(x) = log [ P(Y = 1 | X = x) / P(Y = −1 | X = x) ]    (2.4)
         = β_0 + Σ_{i=1}^{n} α_i K(x_i, x).    (2.5)

Hence, besides giving a classification rule, kernel logistic regression also offers a natural estimate of the probability

    p_1(x) = e^{f(x)} / (1 + e^{f(x)}),

while the support vector machine only estimates (Lin (2002))

    sign[p_1(x) − 1/2],  or  sign( log [ p_1(x) / (1 − p_1(x)) ] ),

where p_1(x) = P(Y = 1 | X = x) is the conditional probability of a point being in class 1

given X = x; (b) kernel logistic regression can naturally be generalized to the multi-class case through kernel multi-logit regression, whereas this is not the case for the support vector machine. However, because kernel logistic regression compromises the hinge loss function of the support vector machine, it no longer has the support points property; in other words, all the α_i's in (2.5) are non-zero. Kernel logistic regression is a well-studied problem; see Wahba, Gu, Wang & Chappell (1995), Green & Yandell (1985), Hastie & Tibshirani (1990) and references therein; however, they are all under the smoothing spline analysis of variance scheme.

We use a simulation example to illustrate the similar performance of kernel logistic regression and the support vector machine. The data in each class are simulated from a mixture of Gaussian distributions (Hastie, Tibshirani & Friedman (2001)): first we generate 10 means μ_k from a bivariate Gaussian distribution N((1, 0)^T, I) and label this class +1. Similarly, 10 more are drawn from N((0, 1)^T, I) and labeled class −1. Then for each class, we generate 100 observations as follows: for each observation, we pick a μ_k at random with probability 1/10, and then generate a N(μ_k, I/5), thus leading to a mixture of Gaussian clusters for each class. We use a radial basis kernel (1.13). The regularization parameter λ is chosen to achieve good misclassification error. The results are shown in Figure 2.2. The radial basis kernel produces a boundary quite close to the Bayes optimal boundary for this simulation. We see that the fitted model of kernel logistic regression is quite similar in classification performance to that of the support vector machine. In addition to a classification boundary, since kernel logistic regression estimates the log-odds of class probabilities, it can also produce probability contours (Figure 2.2).
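The mixture-of-Gaussians simulation just described can be reproduced with a short script. The sketch below follows the recipe in the text (10 means per class, 100 observations per class, covariance I/5); it is an illustration, not the code used for the thesis figures.

```python
import numpy as np

def simulate_mixture(n_per_class=100, n_means=10, seed=0):
    """Generate the two-class mixture-of-Gaussians data described above."""
    rng = np.random.default_rng(seed)
    # 10 means per class: class +1 centred at (1, 0), class -1 at (0, 1)
    means_pos = rng.multivariate_normal([1.0, 0.0], np.eye(2), n_means)
    means_neg = rng.multivariate_normal([0.0, 1.0], np.eye(2), n_means)
    X, y = [], []
    for means, label in [(means_pos, +1), (means_neg, -1)]:
        for _ in range(n_per_class):
            mu = means[rng.integers(n_means)]   # pick a mean with prob 1/10
            X.append(rng.multivariate_normal(mu, np.eye(2) / 5.0))
            y.append(label)
    return np.array(X), np.array(y)

X, y = simulate_mixture()
```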

[Figure 2.2 (two panels): the SVM with 130 support points and KLR with a radial basis kernel; each panel reports the training, test and Bayes errors.]

Figure 2.2: The solid black lines are classification boundaries; the dashed purple lines are Bayes optimal boundaries. For the SVM, the dashed black lines are the edges of the margins and the black points are the points exactly on the edges of the margin. For KLR, the dashed black lines are the p_1(x) = 0.25 and 0.75 lines.

2.1.1 KLR as Margin Maximizer

Recall that in Chapter 1 we described the support vector machine from two perspectives: (a) geometrically as the margin maximizer, and (b) as a regularized function fitting problem. The motivation for fitting a kernel logistic regression model is the similarity in shape between the negative log-likelihood of the binomial distribution and the hinge loss of the support vector machine, and this motivation is from the regularized function fitting perspective. Since the support vector machine was initiated as a method to maximize the margin of the training data, a natural question is: what does kernel logistic regression do with the margin? It turns out that kernel logistic regression can also be regarded as a margin maximizer.

Similar to section 1.2.1, let D = {h_1(x), h_2(x), ..., h_q(x)} be the dictionary of the basis functions of the enlarged feature space, where q is the dimension of the enlarged feature space. The classification boundary, a hyperplane in this enlarged feature space, is given by:

    {x : f(x) = β_0 + h(x)^T β = 0}.

Suppose the enlarged feature space is so rich that the training data are separable; then the margin-maximizing support vector machine can be written as:

    max_{β_0, β, ||β||_2 = 1}  C    (2.6)
    subject to  y_i(β_0 + h(x_i)^T β) ≥ C,  i = 1, ..., n,    (2.7)

where C is the shortest distance from the training data to the separating hyperplane, and is defined as the margin. Now consider an equivalent setup of kernel logistic regression:

    min_{β_0, β}  Σ_{i=1}^{n} ln(1 + e^{−y_i f(x_i)})    (2.8)
    subject to  ||β||_2² ≤ s,    (2.9)
                f(x_i) = β_0 + h(x_i)^T β,  i = 1, ..., n.    (2.10)

Then we have the following theorem:

Theorem 2.1 Suppose the training data are separable, i.e. ∃ β_0, β, s.t. y_i(β_0 + h(x_i)^T β) > 0, ∀i. Let the solution of (2.8)–(2.10) be denoted by β̂(s); then

    β̂(s)/s → β*  as  s → ∞,

where β* is the solution of the margin-maximizing support vector machine (2.6)–(2.7), if β* is unique. If β* is not unique, then β̂(s)/s may have multiple convergence points, but they will all represent margin-maximizing separating hyperplanes.

The proof of the theorem is deferred to the Appendix. Theorem 2.1 implies that kernel logistic regression, similar to the support vector machine, is also a kind of margin maximizer.

2.1.2 Newton-Raphson Method

In this section, we describe how to solve kernel logistic regression using the Newton-Raphson method. Let

    H = Σ_{i=1}^{n} ln(1 + e^{−y_i f(x_i)}) + (λ/2) ||f||²_{H_K}.    (2.11)

Then, by the representer theorem (Kimeldorf & Wahba (1971)), the solution that minimizes H has the form:

    f(x) = β_0 + Σ_{i=1}^{n} α_i K(x_i, x).

Let

    p_i = 1 / (1 + e^{−y_i f(x_i)}),  i = 1, ..., n,    (2.12)
    α̃ = (β_0, α_1, ..., α_n)^T,    (2.13)
    p̃ = (p_1, ..., p_n)^T,    (2.14)
    ỹ = (y_1, ..., y_n)^T,    (2.15)
    K_1 = [ 1̃, (K(x_i, x_{i'}))_{i,i'=1}^{n} ],    (2.16)
    K_2 = [ 0  0^T ; 0  (K(x_i, x_{i'}))_{i,i'=1}^{n} ],    (2.17)
    W = diag( p_1(1 − p_1), ..., p_n(1 − p_n) ).    (2.18)

Notice K_1 is an n × (n + 1) matrix, K_2 is an (n + 1) × (n + 1) matrix and W is an n × n matrix. With some abuse of notation, (2.11) can be written in a finite dimensional form:

    H = 1̃^T ln( 1 + e^{−ỹ ∘ (K_1 α̃)} ) + (λ/2) α̃^T K_2 α̃,    (2.19)

where ∘ denotes the element-wise product.

Now we have

    ∂H/∂α̃ = −K_1^T (ỹ − p̃) + λ K_2 α̃,    (2.20)
    ∂²H/∂α̃ ∂α̃^T = K_1^T W K_1 + λ K_2.    (2.21)

The Newton-Raphson method then iterates as in Algorithm 2.1:

Algorithm 2.1 Newton-Raphson Method for KLR
1. Initialize α̃^0.
2. Compute p̃, K_1, K_2 and W.
3. Update

    α̃^t = α̃^{t−1} − ( ∂²H/∂α̃ ∂α̃^T )^{−1} ∂H/∂α̃    (2.22)
         = α̃^{t−1} + ( K_1^T W K_1 + λ K_2 )^{−1} [ K_1^T (ỹ − p̃) − λ K_2 α̃^{t−1} ]    (2.23)
         = ( K_1^T W K_1 + λ K_2 )^{−1} K_1^T W z̃,    (2.24)

   where z̃ = K_1 α̃^{t−1} + W^{−1} (ỹ − p̃).
4. Repeat steps (2) and (3) until α̃^t converges.

In (2.24), we have re-expressed the Newton-Raphson step as a weighted least-squares step. z̃ is sometimes known as the adjusted response, and the step is referred to as iteratively reweighted least squares. α̃^0 = 0 is usually a good starting value. Since H is convex, the algorithm typically does converge, but overshooting can occur. In the rare case that overshooting occurs, step size halving will guarantee convergence.
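A minimal sketch of one iteratively reweighted least-squares step in the spirit of (2.24); it is illustrative only. For concreteness it uses the equivalent 0/1 coding of the response and a dense solve of the (n+1) × (n+1) system, rather than anything optimized.

```python
import numpy as np

def klr_irls_step(K, y01, alpha, lam):
    """One IRLS update for kernel logistic regression, in the spirit of (2.24).

    K     : n x n kernel matrix [K(x_i, x_i')]
    y01   : responses coded 0/1 (an equivalent coding of y = -1/+1)
    alpha : current (n+1)-vector (beta_0, alpha_1, ..., alpha_n)
    lam   : regularization parameter lambda"""
    n = K.shape[0]
    K1 = np.hstack([np.ones((n, 1)), K])       # regressor matrix, n x (n+1)
    K2 = np.zeros((n + 1, n + 1))              # penalty matrix; beta_0 unpenalized
    K2[1:, 1:] = K
    f = K1 @ alpha                             # fitted values f(x_i)
    p = 1.0 / (1.0 + np.exp(-f))               # fitted probabilities
    w = np.clip(p * (1.0 - p), 1e-6, None)     # IRLS weights
    z = f + (y01 - p) / w                      # adjusted response
    A = K1.T @ (w[:, None] * K1) + lam * K2
    b = K1.T @ (w * z)
    return np.linalg.solve(A, b)               # weighted ridge solve

# usage: iterate from alpha = 0 until successive alphas stop changing
```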

2.1.3 Sequential Minimal Optimization Method

The drawback of the Newton-Raphson method is that in each iteration, an n × n matrix needs to be inverted. The corresponding computational cost can be high when n is large. Recently, Keerthi, Duan, Shevade & Poo (2002) proposed a dual algorithm for kernel logistic regression that avoids inverting huge matrices. It follows the spirit of the popular sequential minimal optimization (SMO) algorithm (Platt (1999)). Let C = 1/λ. Let

    F_i = Σ_{i'=1}^{n} α_{i'} y_{i'} K(x_{i'}, x_i) + y_i ln[ α_i / (C − α_i) ],    (2.25)
    f(α) = (1/2) Σ_i Σ_{i'} α_i α_{i'} y_i y_{i'} K(x_i, x_{i'}) + C Σ_i [ (α_i/C) ln(α_i/C) + (1 − α_i/C) ln(1 − α_i/C) ].    (2.26)

Then, the sequential minimal optimization algorithm proceeds as Algorithm 2.2:

Algorithm 2.2 Sequential Minimal Optimization Algorithm for KLR
1. Initialize the α_i such that 0 < α_i < C and Σ_i α_i y_i = 0.
2. Let

    F_up = max_i F_i,    (2.27)
    F_low = min_i F_i,    (2.28)
    i_up = argmax_i F_i,    (2.29)
    i_low = argmin_i F_i.    (2.30)

   If F_up = F_low, stop. If not, let

    α̃_{i_up}(s) = α_{i_up} − s / y_{i_up},    (2.31)
    α̃_{i_low}(s) = α_{i_low} + s / y_{i_low},    (2.32)
    α̃_i(s) = α_i,  ∀ i ≠ i_up, i_low.    (2.33)

   Find s* that minimizes f(α̃(s)).
3. Update α ← α̃(s*). Go to step (2).

At the end of the algorithm, we have β_0 = F_up = F_low and

    f(x) = β_0 + Σ_{i=1}^{n} α_i y_i K(x_i, x).    (2.34)

Preliminary computational experiments show that this sequential minimal optimization algorithm is robust and fast (Keerthi, Duan, Shevade & Poo (2002)). We have generalized this algorithm to the multi-class case. A detailed description of the multi-class case algorithm is given in the Appendix.

2.2 Import Vector Machines

Although the sequential minimal optimization method helps reduce the computational cost of kernel logistic regression, in the fitted model

    f(x) = β_0 + Σ_{i=1}^{n} α_i K(x_i, x),    (2.35)

as mentioned in section 2.1, all the α_i's are non-zero. This is not the case for the support vector machine, for only support points have non-zero α_i's. So the support vector machine allows for data compression and has the advantage of less storage and quicker evaluation. In this section, we propose an import vector machine model that finds a sub-model to approximate the full model (2.35) given by kernel logistic regression. The sub-model has the form:

    f(x) = β_0 + Σ_{x_i ∈ S} α_i K(x_i, x),    (2.36)

where S is a subset of the training data {x_1, x_2, ..., x_n}, and the data in S are called import points. The advantage of this sub-model is that the computational cost is reduced, especially for large training data sets, while not jeopardizing the performance in classification; and since only a subset of the training data are used to index the fitted model, data compression is achieved.

Several other researchers have also investigated techniques for selecting the subset S. Lin, Wahba, Xiang, Gao, Klein & Klein (2000) divide the training data into several clusters, then randomly select a representative from each cluster to make up S. Smola & Schölkopf (2000) develop a greedy technique to sequentially select m columns of the kernel matrix [K(x_i, x_{i'})]_{n×n}, such that the span of these m columns approximates the span of [K(x_i, x_{i'})]_{n×n} well in the Frobenius norm. Williams & Seeger (2001) propose randomly selecting m points of the training data, then using the Nyström method to approximate the eigen-decomposition of the kernel matrix [K(x_i, x_{i'})]_{n×n}, and expanding the results back up to n dimensions. None of these methods uses the output y_i in selecting the subset S (i.e., the procedure involves only x_i). The import vector machine algorithm uses both the output y_i and the input x_i to select the subset S, in such a way that the resulting fit approximates the full model well.

2.2.1 Algorithm

As mentioned before, we want to find a subset S of {x_1, x_2, ..., x_n}, such that the sub-model (2.36) is a good approximation of the full model (2.35). Since it is computationally impossible to search over every subset S, we use a greedy forward strategy as described in Algorithm 2.3. We call the points in S import points.

Algorithm 2.3 Basic IVM Algorithm
1. Let S = ∅, R = {x_1, x_2, ..., x_n}, t = 1.
2. For each x_l ∈ R, let

    f_l(x) = β_0 + Σ_{x_i ∈ S ∪ {x_l}} α_i K(x_i, x).

   Find α̃ to minimize

    H(x_l) = Σ_{i=1}^{n} ln( 1 + exp(−y_i f_l(x_i)) ) + (λ/2) ||f_l(x)||²_{H_K}    (2.37)
           = 1̃^T ln( 1 + exp(−ỹ ∘ (K_1^l α̃)) ) + (λ/2) α̃^T K_2^l α̃,    (2.38)

   where the regressor matrix K_1^l = [1̃, K(x_i, x_{i'})]_{n×(m+1)}, x_i ∈ {x_1, ..., x_n}, x_{i'} ∈ S ∪ {x_l}; the regularization matrix K_2^l = [ 0  0^T ; 0  K(x_i, x_{i'}) ]_{(m+1)×(m+1)}, x_i, x_{i'} ∈ S ∪ {x_l}; and m = |S|.
3. Find

    x_{l*} = argmin_{x_l ∈ R} H(x_l).

   Let S = S ∪ {x_{l*}}, R = R \ {x_{l*}}, H_t = H(x_{l*}), t ← t + 1.
4. Repeat steps (2) and (3) until H_t converges.

Algorithm 2.3 is computationally feasible, but in step (2) we need to use the Newton-Raphson method or the sequential minimal optimization method to find α̃ iteratively. When the number of import points m becomes large, the computation can be expensive. To reduce this computation, we use a further approximation. Instead of iteratively computing α̃ until it converges, we can simply do a one-step Newton-Raphson iteration, and use it as an approximation to the converged one. To get a good approximation, we take advantage of the fitted result from the current "optimal" S, i.e., the sub-model when |S| = m, and use it as the initial value. This one-step update is similar to the score test in generalized linear models (GLM), but the latter does not have a penalty term. The updating formula allows the weighted regression (2.24) to be computed in O(nm) time. Hence, we have the revised step (2) of Algorithm 2.3 in Algorithm 2.4.

Algorithm 2.4 Revised Step (2)
(2*) For each x_l ∈ R, correspondingly augment K_1 with a column, and K_2 with a column and a row. Use the updating formula to find α̃ in (2.24). Compute (2.37).

In step (4) of Algorithm 2.3, we need to decide when to stop the algorithm. A natural stopping rule is to look at the regularized negative log-likelihood. Let H_1, H_2, ... be the sequence of regularized negative log-likelihoods obtained in step (3). At each step t, we compare H_t with H_{t−Δt}, where Δt is a pre-chosen small integer, for example, Δt = 1. If the ratio |H_t − H_{t−Δt}| / |H_t| is less than some pre-chosen small number ε, for example, ε = 0.001, we stop adding new import points to S.
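A compact sketch of the greedy forward search of Algorithm 2.3 with the stopping rule above, using the regularized negative log-likelihood (2.37) as the selection criterion. For simplicity it refits each candidate sub-model with a few IRLS steps instead of the one-step update of Algorithm 2.4; it illustrates the idea and is not the thesis implementation.

```python
import numpy as np

def reg_nll(Kfull, y01, idx, lam, n_irls=10):
    """Fit the sub-model indexed by the points in idx and return the
    regularized negative log-likelihood (2.37)."""
    n = Kfull.shape[0]
    K1 = np.hstack([np.ones((n, 1)), Kfull[:, idx]])       # n x (m+1)
    K2 = np.zeros((len(idx) + 1, len(idx) + 1))             # beta_0 unpenalized
    K2[1:, 1:] = Kfull[np.ix_(idx, idx)]
    alpha = np.zeros(len(idx) + 1)
    for _ in range(n_irls):                                  # a few IRLS steps
        f = K1 @ alpha
        p = 1.0 / (1.0 + np.exp(-f))
        w = np.clip(p * (1.0 - p), 1e-6, None)
        z = f + (y01 - p) / w
        alpha = np.linalg.solve(K1.T @ (w[:, None] * K1) + lam * K2,
                                K1.T @ (w * z))
    f = K1 @ alpha
    nll = np.sum(np.log1p(np.exp(-np.where(y01 == 1, f, -f))))
    return nll + 0.5 * lam * alpha @ K2 @ alpha

def greedy_ivm(Kfull, y01, lam, eps=1e-3):
    """Greedy forward selection of import points (Algorithm 2.3)."""
    S, R, H_path = [], list(range(Kfull.shape[0])), []
    while R:
        scores = [reg_nll(Kfull, y01, S + [l], lam) for l in R]
        best = R[int(np.argmin(scores))]
        S.append(best); R.remove(best); H_path.append(min(scores))
        # stopping rule from the text, with Delta t = 1
        if len(H_path) > 1 and abs(H_path[-1] - H_path[-2]) < eps * abs(H_path[-1]):
            break
    return S, H_path
```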

2.2.2 Selection of λ

So far, we have assumed that the tuning parameter λ is fixed. In practice, we also need to choose an "optimal" λ. We can randomly split all the data into a training set and a tuning set, and use the misclassification error on the tuning set as a criterion for choosing λ. To reduce the computation, we take advantage of the fact that the regularized negative log-likelihood converges faster for a larger λ. Thus, instead of running the entire revised algorithm for each λ, we propose Algorithm 2.5, which combines both adding import points to S and choosing the optimal λ.

Algorithm 2.5 Simultaneous Selection of S and λ
1. Start with a large tuning parameter λ.
2. Let S = ∅, R = {x_1, ..., x_n}, t = 1.
3. Run steps (2*), (3) and (4) of the revised Algorithm 2.4, until the stopping criterion is satisfied at S = {x_{i_1}, ..., x_{i_t}}. Along the way, also compute the misclassification error on the tuning set.
4. Decrease λ to a smaller value.
5. Repeat steps (3) and (4), starting with S = {x_{i_1}, ..., x_{i_t}}.

We choose the optimal λ as the one that corresponds to the minimum misclassification error on the tuning set.

2.2.3 Simulation Results

In this section, we use a simulation to illustrate the import vector machine method. The data are generated in the same way as in Figure 2.2. The simulation results are shown in Figure 2.3 – Figure 2.5. Figure 2.3 shows how the tuning parameter λ is selected. The optimal λ is found to be equal to 1 and corresponds to a misclassification rate of 0.262. Figure 2.4 fixes the tuning parameter at λ = 1 and finds 19 import points. Figure 2.5 compares the results of the support vector machine and the import vector machine: the support vector machine has 130 support points, and the import vector machine uses 19 import points; they give similar classification boundaries. Figure 2.6 is for the same simulation but different sizes of training data: n = 200, 400, 600, 800. We see that as the training data size n increases, the number of import points does not tend to increase.

[Figure 2.3 (two panels, "Choosing Lambda"): regularized deviance and misclassification rate plotted against the number of import points, with the optimal λ = 1 marked.]

Figure 2.3: Radial kernel is used. n = 200, σ² = 0.7, Δt = 3, ε = 0.001, λ decreases from e^{10} to e^{−10}. The minimum misclassification rate 0.262 is found to correspond to λ = 1.

Remark: The support points of the support vector machine are those which are close to the classification boundary or misclassified, and usually have large weights [p(x)(1 − p(x))]. The import points of the import vector machine are those that decrease the regularized negative log-likelihood the most, and can be either close to or far from the classification boundary. The difference in properties is natural, because the support vector machine is only concerned with the classification sign[p(x) − 1/2], while the import vector machine also focuses on the unknown probability p(x). Though points away from the classification boundary do not contribute to determining the position of the classification boundary, they may contribute to estimating the unknown probability p(x). The total computational cost of the support vector machine is O(n²s), where s is the number of support points, while the computational cost of the import vector machine method is O(n²m²), where m is the number of import points. Since m does not tend to increase as n increases, as illustrated in Figure 2.6, the computational cost of the import vector machine can be smaller than that of the support vector machine, especially for large training data sets.

[Figure 2.4 (two panels, λ = 1): regularized deviance and misclassification rate plotted against the number of import points; the number of import points is 19, and the test error is marked.]

Figure 2.4: Radial kernel is used. n = 200, σ² = 0.7, Δt = 3, ε = 0.001, λ = 1. The stopping criterion is satisfied when |S| = 19.

2.2.4 Real Data Results

In this section, we compare the performance of the import vector machine and the support vector machine on some real datasets. Ten benchmark datasets are used for this purpose: Banana, Breast-cancer, Flare-solar, German, Heart, Image, Ringnorm, Splice, Thyroid, Titanic, Twonorm and Waveform. Detailed information about these datasets can be found in Rätsch, T. Onda & K.-R. Müller (2000) or is available at http://ida.first.gmd.edu/~raetsch/data.

[Figure 2.5 (two panels): the SVM with 130 support points and the IVM with 19 import points; each panel reports the training, test and Bayes errors.]

Figure 2.5: The solid black lines are classification boundaries; the dashed purple lines are Bayes optimal boundaries. For the SVM, the dashed black lines are the edges of the margins, and the black points are the support points exactly on the margin. For the IVM, the dashed black lines are the p_1(x) = 0.25 and 0.75 lines, and the black points are the import points.

Table 2.1 contains a summary of these datasets. The radial kernel (1.13), K(x, x') = e^{−||x − x'||² / (2σ²)}, is used throughout these datasets. The parameters σ and λ are fixed at specific values that are optimal for the support vector machine's generalization performance (Rätsch, T. Onda & K.-R. Müller (2000)). Each dataset has 20 realizations of the training and test data. The results are in Table 2.2 and Table 2.3. The number outside each bracket is the mean over 20 realizations of the training and test data, and the number in each bracket is the standard deviation. From Table 2.2, we can see that the import vector machine performs as well as the support vector machine in classification on these benchmark datasets. From Table 2.3, we can see that the import vector machine typically uses a much smaller fraction of the training data than the support vector machine to index kernel basis functions. This may give the import vector machine a computational advantage over the support vector machine, especially when the size of the training data is large.

[Figure 2.6 (four panels): regularized deviance plotted against the number of import points for training data sizes n = 200, 400, 600 and 800; the corresponding numbers of import points are 19, 22, 26 and 22.]

Figure 2.6: The data are generated in the same way as in Figure 2.3 – Figure 2.5. Radial kernel is used. σ = 0.7, λ = 1, Δt = 3, ε = 0.001. The sizes of training data are n = 200, 400, 600, 800, and the corresponding numbers of import points are 19, 22, 26, 22.

2.3 Multi-class Case

In this section, we briefly describe a generalization of the import vector machine to multi-class classification. Suppose there are K classes. The conditional probability of a point being in class k given X = x is denoted as p_k(x) = P(Y = k | X = x). Hence the Bayes classification rule is given by:

    c(x) = argmax_{k ∈ {1,...,K}} p_k(x).

Table 2.1: Summary of the ten benchmark datasets. n is the size of the training data, p is the dimension of the original input, σ² is the parameter of the radial kernel, λ is the tuning parameter, and N is the size of the test data.

    Dataset          n    p    σ²    λ    N
    Banana
    Breast-cancer
    Flare-solar
    German
    Heart
    Image
    Ringnorm
    Thyroid
    Titanic
    Twonorm
    Waveform

Table 2.2: Comparison of classification performance of SVM and IVM on ten benchmark datasets.

    Dataset         SVM Error (%)     IVM Error (%)
    Banana          10.78 (±0.68)     10.34 (±0.46)
    Breast-cancer   25.58 (±4.50)     25.92 (±4.79)
    Flare-solar     32.65 (±1.42)     33.66 (±1.64)
    German          22.88 (±2.28)     23.53 (±2.48)
    Heart           15.95 (±3.14)     15.80 (±3.49)
    Image            3.34 (±0.70)      3.31 (±0.80)
    Ringnorm         2.03 (±0.19)      1.97 (±0.29)
    Thyroid          4.80 (±2.98)      5.00 (±3.02)
    Titanic         22.16 (±0.60)     22.39 (±1.03)
    Twonorm          2.90 (±0.25)      2.45 (±0.15)
    Waveform         9.98 (±0.43)     10.13 (±0.47)

Table 2.3: Comparison of number of kernel basis used by SVM and IVM on ten benchmark datasets.

    Dataset         # of SV      # of IV
    Banana           90 (±10)    21 (±7)
    Breast-cancer   115 (±5)     14 (±3)
    Flare-solar     597 (±8)      9 (±1)
    German          407 (±10)    17 (±2)
    Heart            90 (±4)     12 (±2)
    Image           221 (±11)    72 (±18)
    Ringnorm         89 (±5)     72 (±30)
    Thyroid          21 (±2)     22 (±3)
    Titanic          69 (±9)      8 (±2)
    Twonorm          70 (±5)     24 (±4)
    Waveform        151 (±9)     26 (±3)

The model has the form:

    p_1(x) = e^{f_1(x)} / Σ_{k=1}^{K} e^{f_k(x)},    (2.39)
    p_2(x) = e^{f_2(x)} / Σ_{k=1}^{K} e^{f_k(x)},    (2.40)
        ⋮                                            (2.41)
    p_K(x) = e^{f_K(x)} / Σ_{k=1}^{K} e^{f_k(x)},    (2.42)

where f_k(x) ∈ H_K, and H_K is the reproducing kernel Hilbert space generated by a positive definite kernel K(·,·). Notice that f_1(x), ..., f_K(x) are not identifiable in this model, for if we add a common term to each f_k(x), then p_1(x), ..., p_K(x) will not change. To make f_k(x) identifiable, we consider the symmetric constraint

    Σ_{k=1}^{K} f_k(x) = 0.    (2.43)

Then the multi-class kernel logistic regression fits a model to minimize the regularized

negative log-likelihood

    H = −Σ_{i=1}^{n} ln p_{y_i}(x_i) + (λ/2) ||f||²_{H_K}    (2.44)
      = Σ_{i=1}^{n} [ −y_i^T f(x_i) + ln( e^{f_1(x_i)} + ··· + e^{f_K(x_i)} ) ] + (λ/2) ||f||²_{H_K},    (2.45)

where y_i is a binary K-vector with values all zero except a 1 in position k if the class is k, and

    f(x_i) = (f_1(x_i), ..., f_K(x_i))^T,    (2.46)
    ||f||²_{H_K} = Σ_{k=1}^{K} ||f_k||²_{H_K}.    (2.47)

Using the representer theorem (Kimeldorf & Wahba (1971)), one can show that f_k(x), which minimizes H, has the form

    f_k(x) = β_{0k} + Σ_{i=1}^{n} α_{ik} K(x_i, x).    (2.48)

Hence, (2.44) becomes

    H = Σ_{i=1}^{n} [ −y_i^T (K_1(i,·) A)^T + ln( 1^T e^{(K_1(i,·) A)^T} ) ] + (λ/2) Σ_{k=1}^{K} α_k^T K_2 α_k,    (2.49)

where A = (α_1 ... α_K), K_1 and K_2 are defined in the same way as in the two-class case, and K_1(i,·) is the ith row of K_1. Notice that in this model, the constraint (2.43) is not necessary anymore, for at the minimum of (2.44), Σ_{k=1}^{K} f_k(x) = 0 is automatically satisfied.
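A small sketch of the multi-class model (2.39)–(2.42) and the regularized negative log-likelihood (2.44)–(2.45), evaluated for a given matrix of fitted values; illustrative only, with toy inputs.

```python
import numpy as np

def class_probabilities(F):
    """Softmax probabilities p_k(x_i) = exp(f_k(x_i)) / sum_l exp(f_l(x_i)),
    as in (2.39)-(2.42). F is an n x K matrix of fitted values f_k(x_i)."""
    F = F - F.max(axis=1, keepdims=True)        # stabilize the exponentials
    E = np.exp(F)
    return E / E.sum(axis=1, keepdims=True)

def regularized_nll(F, Y, penalty, lam):
    """Regularized negative log-likelihood (2.44)-(2.45).

    Y is the n x K indicator matrix (a 1 in the column of the observed class);
    penalty stands in for ||f||^2_{H_K} = sum_k ||f_k||^2_{H_K}."""
    P = class_probabilities(F)
    nll = -np.sum(Y * np.log(P))
    return nll + 0.5 * lam * penalty

# toy usage: three observations, K = 3 classes
F = np.array([[2.0, 0.1, -1.0], [0.0, 1.5, 0.2], [-0.3, 0.0, 0.8]])
Y = np.eye(3)
value = regularized_nll(F, Y, penalty=1.0, lam=0.5)
```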

2.3.1 Multi-class KLR and Multi-class SVM

Similar to Theorem 2.1, a connection between the multi-class kernel logistic regression and the multi-class support vector machine also exists. Let D = {h_1(x), ..., h_q(x)} be the dictionary of the basis functions of the enlarged feature space. Consider

    min_{β_{0k}, β_k}  Σ_{i=1}^{n} [ −y_i^T f(x_i) + ln( e^{f_1(x_i)} + ··· + e^{f_K(x_i)} ) ]    (2.50)
    subject to  f_k(x_i) = β_{0k} + h(x_i)^T β_k,  i = 1, ..., n,    (2.51)
                Σ_{k=1}^{K} ||β_k||_2² ≤ s.    (2.52)

Theorem 2.2 Suppose the training data are pairwise separable, i.e. ∃ β_{0k}, β_k, s.t. (β_{0 y_i} − β_{0k}) + h(x_i)^T (β_{y_i} − β_k) > 0, ∀i, ∀k ≠ y_i. Let the solution of (2.50)–(2.52) be denoted by β̂_k(s); then

    β̂_k(s)/s → β*  as  s → ∞,

where β* is the solution of the multi-class support vector machine (1.18)–(1.22), if β* is unique. If β* is not unique, then β̂_k(s)/s may have multiple convergence points, but they are all solutions of (1.18)–(1.22).

The proof of the theorem is very similar to that of Theorem 2.1; we omit it here.

2.3.2 Multi-class IVM

The multi-class import vector machine procedure is similar to the two-class case, and the computational cost is O(Kn²m²). Figure 2.7 is a simulation of the multi-class import vector machine. The data in each class are generated from a mixture of Gaussians (Hastie,

Tibshirani & Friedman (2001)).

[Figure 2.7 (panel title: "Multi-class IVM - with 32 import points"): the panel reports the training, test and Bayes errors.]

Figure 2.7: Radial kernel is used. K = 3, N = 300, λ = 0.368, |S| = 32.

2.4 Summary

We have discussed the import vector machine method in two-class and multi-class classification. We showed that it not only performs as well as the support vector machine, but also provides an estimate of the probability p(x). The computational cost of the import vector machine is O(n²m²) for the two-class case and O(Kn²m²) for the multi-class case, where m is the number of import points.

Chapter 3

1-norm Support Vector Machines

In Chapter 2, we replace the hinge loss function of the support vector machine with the binomial log-likelihood and fit kernel logistic regression and import vector machine models. In this chapter, we replace the L_2-norm penalty of the support vector machine with the L_1-norm, and consider the 1-norm support vector machine. We argue that the 1-norm support vector machine may have some advantage over the standard 2-norm support vector machine, especially when there are redundant noise features. We also propose an efficient algorithm that computes the whole solution path of the 1-norm support vector machine, hence facilitating adaptive selection of the tuning parameter for the 1-norm support vector machine.

3.1 Introduction

In standard two-class classification problems, we are given a set of training data (x_1, y_1), ..., (x_n, y_n), where the input x_i ∈ R^p, and the output y_i ∈ {1, −1} is binary. We wish to find a classification rule from the training data, so that when given a new input x, we can assign a class y from {1, −1} to it.
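As a preview of this chapter, the sketch below (illustrative only, not the path algorithm developed later) evaluates the 1-norm support vector machine objective obtained by swapping the L_2 penalty in (1.15) for an L_1 penalty on a linear decision function.

```python
import numpy as np

def one_norm_svm_objective(beta0, beta, X, y, lam):
    """Hinge loss plus L1 penalty: sum_i (1 - y_i f(x_i))_+ + lam * ||beta||_1,
    with f(x) = beta0 + x^T beta.  The L1 penalty tends to set some fitted
    coefficients exactly to zero, giving automatic feature selection."""
    f = beta0 + X @ beta
    hinge = np.maximum(0.0, 1.0 - y * f).sum()
    return hinge + lam * np.abs(beta).sum()

# toy usage with two redundant noise features (columns 2 and 3)
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
y = np.sign(X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=20))
value = one_norm_svm_objective(0.0, np.array([1.0, -1.0, 0.0, 0.0]), X, y, lam=0.5)
```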

Pattern Recognition 2014 Support Vector Machines

Pattern Recognition 2014 Support Vector Machines Pattern Recgnitin 2014 Supprt Vectr Machines Ad Feelders Universiteit Utrecht Ad Feelders ( Universiteit Utrecht ) Pattern Recgnitin 1 / 55 Overview 1 Separable Case 2 Kernel Functins 3 Allwing Errrs (Sft

More information

COMP 551 Applied Machine Learning Lecture 11: Support Vector Machines

COMP 551 Applied Machine Learning Lecture 11: Support Vector Machines COMP 551 Applied Machine Learning Lecture 11: Supprt Vectr Machines Instructr: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted fr this curse

More information

COMP 551 Applied Machine Learning Lecture 9: Support Vector Machines (cont d)

COMP 551 Applied Machine Learning Lecture 9: Support Vector Machines (cont d) COMP 551 Applied Machine Learning Lecture 9: Supprt Vectr Machines (cnt d) Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Class web page: www.cs.mcgill.ca/~hvanh2/cmp551 Unless therwise

More information

IAML: Support Vector Machines

IAML: Support Vector Machines 1 / 22 IAML: Supprt Vectr Machines Charles Suttn and Victr Lavrenk Schl f Infrmatics Semester 1 2 / 22 Outline Separating hyperplane with maimum margin Nn-separable training data Epanding the input int

More information

What is Statistical Learning?

What is Statistical Learning? What is Statistical Learning? Sales 5 10 15 20 25 Sales 5 10 15 20 25 Sales 5 10 15 20 25 0 50 100 200 300 TV 0 10 20 30 40 50 Radi 0 20 40 60 80 100 Newspaper Shwn are Sales vs TV, Radi and Newspaper,

More information

Support Vector Machines and Flexible Discriminants

Support Vector Machines and Flexible Discriminants 12 Supprt Vectr Machines and Flexible Discriminants This is page 417 Printer: Opaque this 12.1 Intrductin In this chapter we describe generalizatins f linear decisin bundaries fr classificatin. Optimal

More information

In SMV I. IAML: Support Vector Machines II. This Time. The SVM optimization problem. We saw:

In SMV I. IAML: Support Vector Machines II. This Time. The SVM optimization problem. We saw: In SMV I IAML: Supprt Vectr Machines II Nigel Gddard Schl f Infrmatics Semester 1 We sa: Ma margin trick Gemetry f the margin and h t cmpute it Finding the ma margin hyperplane using a cnstrained ptimizatin

More information

Support-Vector Machines

Support-Vector Machines Supprt-Vectr Machines Intrductin Supprt vectr machine is a linear machine with sme very nice prperties. Haykin chapter 6. See Alpaydin chapter 13 fr similar cntent. Nte: Part f this lecture drew material

More information

The blessing of dimensionality for kernel methods

The blessing of dimensionality for kernel methods fr kernel methds Building classifiers in high dimensinal space Pierre Dupnt Pierre.Dupnt@ucluvain.be Classifiers define decisin surfaces in sme feature space where the data is either initially represented

More information

Resampling Methods. Cross-validation, Bootstrapping. Marek Petrik 2/21/2017

Resampling Methods. Cross-validation, Bootstrapping. Marek Petrik 2/21/2017 Resampling Methds Crss-validatin, Btstrapping Marek Petrik 2/21/2017 Sme f the figures in this presentatin are taken frm An Intrductin t Statistical Learning, with applicatins in R (Springer, 2013) with

More information

COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification

COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification COMP 551 Applied Machine Learning Lecture 5: Generative mdels fr linear classificatin Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Jelle Pineau Class web page: www.cs.mcgill.ca/~hvanh2/cmp551

More information

COMP 551 Applied Machine Learning Lecture 4: Linear classification

COMP 551 Applied Machine Learning Lecture 4: Linear classification COMP 551 Applied Machine Learning Lecture 4: Linear classificatin Instructr: Jelle Pineau (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted

More information

Tree Structured Classifier

Tree Structured Classifier Tree Structured Classifier Reference: Classificatin and Regressin Trees by L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stne, Chapman & Hall, 98. A Medical Eample (CART): Predict high risk patients

More information

3.4 Shrinkage Methods Prostate Cancer Data Example (Continued) Ridge Regression

3.4 Shrinkage Methods Prostate Cancer Data Example (Continued) Ridge Regression 3.3.4 Prstate Cancer Data Example (Cntinued) 3.4 Shrinkage Methds 61 Table 3.3 shws the cefficients frm a number f different selectin and shrinkage methds. They are best-subset selectin using an all-subsets

More information

Contents. This is page i Printer: Opaque this

Contents. This is page i Printer: Opaque this Cntents This is page i Printer: Opaque this Supprt Vectr Machines and Flexible Discriminants. Intrductin............. The Supprt Vectr Classifier.... Cmputing the Supprt Vectr Classifier........ Mixture

More information

Resampling Methods. Chapter 5. Chapter 5 1 / 52

Resampling Methods. Chapter 5. Chapter 5 1 / 52 Resampling Methds Chapter 5 Chapter 5 1 / 52 1 51 Validatin set apprach 2 52 Crss validatin 3 53 Btstrap Chapter 5 2 / 52 Abut Resampling An imprtant statistical tl Pretending the data as ppulatin and

More information

, which yields. where z1. and z2

, which yields. where z1. and z2 The Gaussian r Nrmal PDF, Page 1 The Gaussian r Nrmal Prbability Density Functin Authr: Jhn M Cimbala, Penn State University Latest revisin: 11 September 13 The Gaussian r Nrmal Prbability Density Functin

More information

A Matrix Representation of Panel Data

A Matrix Representation of Panel Data web Extensin 6 Appendix 6.A A Matrix Representatin f Panel Data Panel data mdels cme in tw brad varieties, distinct intercept DGPs and errr cmpnent DGPs. his appendix presents matrix algebra representatins

More information

Chapter 3: Cluster Analysis

Chapter 3: Cluster Analysis Chapter 3: Cluster Analysis } 3.1 Basic Cncepts f Clustering 3.1.1 Cluster Analysis 3.1. Clustering Categries } 3. Partitining Methds 3..1 The principle 3.. K-Means Methd 3..3 K-Medids Methd 3..4 CLARA

More information

k-nearest Neighbor How to choose k Average of k points more reliable when: Large k: noise in attributes +o o noise in class labels

k-nearest Neighbor How to choose k Average of k points more reliable when: Large k: noise in attributes +o o noise in class labels Mtivating Example Memry-Based Learning Instance-Based Learning K-earest eighbr Inductive Assumptin Similar inputs map t similar utputs If nt true => learning is impssible If true => learning reduces t

More information

Bootstrap Method > # Purpose: understand how bootstrap method works > obs=c(11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38) > n=length(obs) >

Bootstrap Method > # Purpose: understand how bootstrap method works > obs=c(11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38) > n=length(obs) > Btstrap Methd > # Purpse: understand hw btstrap methd wrks > bs=c(11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38) > n=length(bs) > mean(bs) [1] 21.64625 > # estimate f lambda > lambda = 1/mean(bs);

More information

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeff Reading: Chapter 2 STATS 202: Data mining and analysis September 27, 2017 1 / 20 Supervised vs. unsupervised learning In unsupervised

More information

Stats Classification Ji Zhu, Michigan Statistics 1. Classification. Ji Zhu 445C West Hall

Stats Classification Ji Zhu, Michigan Statistics 1. Classification. Ji Zhu 445C West Hall Stats 415 - Classificatin Ji Zhu, Michigan Statistics 1 Classificatin Ji Zhu 445C West Hall 734-936-2577 jizhu@umich.edu Stats 415 - Classificatin Ji Zhu, Michigan Statistics 2 Examples f Classificatin

More information

Linear Classification

Linear Classification Linear Classificatin CS 54: Machine Learning Slides adapted frm Lee Cper, Jydeep Ghsh, and Sham Kakade Review: Linear Regressin CS 54 [Spring 07] - H Regressin Given an input vectr x T = (x, x,, xp), we

More information

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint Biplts in Practice MICHAEL GREENACRE Prfessr f Statistics at the Pmpeu Fabra University Chapter 13 Offprint CASE STUDY BIOMEDICINE Cmparing Cancer Types Accrding t Gene Epressin Arrays First published:

More information

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeff Reading: Chapter 2 STATS 202: Data mining and analysis September 27, 2017 1 / 20 Supervised vs. unsupervised learning In unsupervised

More information

x 1 Outline IAML: Logistic Regression Decision Boundaries Example Data

x 1 Outline IAML: Logistic Regression Decision Boundaries Example Data Outline IAML: Lgistic Regressin Charles Suttn and Victr Lavrenk Schl f Infrmatics Semester Lgistic functin Lgistic regressin Learning lgistic regressin Optimizatin The pwer f nn-linear basis functins Least-squares

More information

Distributions, spatial statistics and a Bayesian perspective
Doug Nychka, National Center for Atmospheric Research. Distributions and densities; conditional distributions and Bayes' theorem; the bivariate normal; spatial statistics...

Support Vector Machines and Flexible Discriminants
Introduction: in this chapter we describe generalizations of linear decision boundaries for classification. Optimal separating...

T-61.5060 Algorithmic methods for data mining. Slide set 6: dimensionality reduction
Reading assignment: LRU book, 11.1-11.3; PCA tutorial in MyCourses (optional); optional: An Elementary Proof of a Theorem of Johnson and Lindenstrauss...
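The reading list above mentions the Johnson-Lindenstrauss theorem. A minimal R sketch of the random-projection idea behind it follows; the dimensions and data are assumptions for illustration.

# Gaussian random projection: pairwise distances are roughly preserved.
set.seed(4)
n <- 50; p <- 1000; k <- 100                   # original and reduced dimensions (assumed)
X <- matrix(rnorm(n * p), nrow = n)
R <- matrix(rnorm(p * k), nrow = p) / sqrt(k)  # random projection matrix
Z <- X %*% R                                   # projected data, n x k
summary(as.vector(dist(Z) / dist(X)))          # distance ratios concentrate near 1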

Midwest Big Data Summer School: Machine Learning I: Introduction
Kris De Brabanter (kbrabant@iastate.edu), Iowa State University, Department of Statistics and Department of Computer Science. June 24, 2016. Outline...

AP Statistics Notes, Unit Two: The Normal Distributions
Syllabus objectives: 1.5 The student will summarize distributions of data, measuring position using quartiles, percentiles, and standardized scores (z-scores).
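A short R sketch of the z-score and normal-percentile calculations the objective above refers to; the data values are invented for illustration.

# Standardized scores and normal percentiles (illustrative data).
x <- c(62, 70, 74, 81, 85, 90, 95)
z <- (x - mean(x)) / sd(x)        # z-scores
round(z, 2)
pnorm(1.5)                        # proportion below z = 1.5 under a standard normal
qnorm(0.90)                       # z-score marking the 90th percentile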

Kinetic Model Completeness
5.68J/10.652J Spring 2003 Lecture Notes, Tuesday April 15, 2003. We say a chemical kinetic model is complete for a particular reaction condition when it contains all the species and reactions...

Section 3.2: Finding roots/zeros/x-intercepts of polynomials
Many of you WILL need to watch the corresponding videos for this section on MyOpenMath! This section is primarily focused on tools to aid us in finding roots/zeros/x-intercepts of polynomials. Essentially, our focus turns to solving...
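A minimal R sketch of finding polynomial roots numerically with base R's polyroot(); the example polynomial is invented for illustration.

# Roots of -6 + 11x - 6x^2 + x^3 = (x - 1)(x - 2)(x - 3).
# polyroot() takes coefficients in increasing order of power.
polyroot(c(-6, 11, -6, 1))        # complex roots, numerically 1, 2, 3
Re(polyroot(c(-6, 11, -6, 1)))    # drop the (numerically zero) imaginary parts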

Math Foundations 20 Work Plan
Units/Topics: 20.8 Demonstrate understanding of systems of linear inequalities in two variables. Time frame: December, 1-3 weeks. Major learning indicators: identify situations relevant...

Dynamic and static security assessment of power systems using artificial neural networks
M.E. Aggoune, M.J. Damborg, M.A. El-Sharkawi, R.J. Marks II and L.E. Atlas, Proceedings of the NSF Workshop on Applications... "...the results to larger systems due to properties of the projection algorithm. First, the number of hidden nodes must..."

Least Squares Optimal Filtering with Multirate Observations
Charles W. Therrien and Anthony H. Hawes, Department of Electrical... Proc. 36th Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, November 2002.

Chapter 3: Inequalities
Copyright - The Institute of Chartered Accountants of India. Learning objectives: one of the widely used decision-making problems, nowadays, is to decide on the optimal mix of scarce...

Chapter 24: Inference in Regression (MATH 1342, April 25 and 27, 2013)
Make inferences about the population from which the sample data came. Chapters 4 and 5: relationships between two quantitative variables. Be able to make a graph (scatterplot), summarize the...

Simple Linear Regression (single variable)
Introduction to Machine Learning, Marek Petrik, January 31, 2017. Some of the figures in this presentation are taken from An Introduction to Statistical Learning, with applications...
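A minimal R sketch of a single-variable linear regression of the kind the last entry covers; the data are simulated for illustration.

# Simple linear regression with one predictor (illustrative data).
set.seed(5)
x <- runif(60, 0, 10)
y <- 2 + 0.8 * x + rnorm(60)
fit <- lm(y ~ x)
summary(fit)$coefficients           # intercept and slope with standard errors
predict(fit, newdata = data.frame(x = 5), interval = "confidence")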

ENSC 49 - Discrete Time Systems: Project Outline, Semester 006-1
Objectives: the goal of the project is to design a channel fading simulator. Upon successful completion of the project, you will reinforce your understanding...

Linear programming III
Review of what we covered in the previous two classes. LP problem setup: linear objective function, linear constraints; there exists an extreme-point optimal solution. Simplex method: go through extreme point...
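A minimal R sketch of setting up and solving a small LP, in the spirit of the review above; the example problem and the use of the lpSolve package are assumptions for illustration.

# Maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x >= 0, y >= 0.
library(lpSolve)                    # assumed available
out <- lp(direction    = "max",
          objective.in = c(3, 2),
          const.mat    = rbind(c(1, 1), c(1, 3)),
          const.dir    = c("<=", "<="),
          const.rhs    = c(4, 6))
out$solution        # optimal (x, y), an extreme point of the feasible region
out$objval          # optimal objective value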

A New Evaluation Measure
J. Joiner and L. Werner. Abstract: The problems of evaluation and the needed criteria of evaluation measures in the SMART system of information retrieval are reviewed and discussed.

Supplementary Material. GaGa: a simple and flexible hierarchical model for microarray data analysis
David Rossell, Department of Biostatistics, M.D. Anderson Cancer Center, Houston, TX 77030, USA. rosselldavid@gmail.com

Smoothing, penalized least squares and splines
Douglas Nychka, www.image.ucar.edu/~nychka. Locally weighted averages; penalized least squares smoothers; properties of smoothers; splines and reproducing kernels; the interpolation...
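A minimal R sketch of a penalized least squares smoother using base R's smooth.spline(); the simulated curve is an assumption for illustration.

# Smoothing spline fit to noisy data (illustrative data).
set.seed(6)
x <- seq(0, 1, length.out = 100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)
fit <- smooth.spline(x, y)          # penalty chosen by generalized cross-validation
fit$lambda                          # selected smoothing parameter
plot(x, y)
lines(predict(fit), col = "blue")   # fitted smooth over the data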

Matching Techniques. Technical Track Session VI
Emanuela Galasso, The World Bank. These slides were developed by Christel Vermeersch and modified by Emanuela Galasso for the purpose of this workshop. When can we use...

Numbers, Mathematics and Equations
Australian Curriculum Physics: Getting Started with Physics. An integral part to the understanding of our physical world is the use of mathematical models which can be used to...

Survival Analysis with Support Vector Machines
Wolfgang Härdle, Ruslan Moro. Center for Applied Statistics and Economics (CASE), Humboldt-Universität zu Berlin. Motivation: applications in medicine, estimation of...

CN700 Additive Models and Trees, Chapter 9: Hastie et al. (2001)
Madhusudana Shashanka, Department of Cognitive and Neural Systems, Boston University. March 02, 2004. Overview...

Determining the Accuracy of Modal Parameter Estimation Methods
by Michael Lee, Ph.D., P.E. and Mark Richardson, Ph.D., Structural Measurement Systems, Milpitas, CA. Abstract: The most common type of modal testing system...

Reinforcement Learning: MDP Search Trees and Optimal Quantities
Content adapted from Berkeley CS188. MDP search trees: each MDP state projects an expectimax-like search tree. Optimal quantities: the value (utility) of a state s, V*(s) = expected...
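The last line above truncates the definition of V*(s). For completeness, the standard Bellman optimality equation that this kind of slide usually states is given below; the notation (transition probabilities T, rewards R, discount factor gamma) is an assumption, not taken from the excerpt.

V^*(s) = \max_{a} \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma \, V^*(s') \right]

That is, the value of a state is the expected discounted return obtained by acting optimally from that state onward.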

MATHEMATICS SYLLABUS SECONDARY 5th YEAR (6 period/week course)
European Schools, Office of the Secretary-General, Pedagogical Development Unit. Ref.: 011-01-D-8-en. Orig.: EN. Approved by the Joint Teaching Committee...

5th grade Common Core Standards
In Grade 5, instructional time should focus on three critical areas: (1) developing fluency with addition and subtraction of fractions, and developing understanding of the multiplication...

Differentiation Applications 1: Related Rates
Model 1: Sliding Ladder. A 10 ft ladder is leaning against a wall when the bottom...

Lead/Lag Compensator Frequency Domain Properties and Design Methods (Lectures 6 and 7)
Definition: consider the compensator (i.e., controller); for ... it is called a lag compensator, and for ... it is called a lead compensator. Notation...

Preparation work for A2 Mathematics [2018]
The work studied in Y12 will form the foundations on which we will build in Year 13. It will only be reviewed during Year 13; it will not be retaught. This is to allow time...

Chapter IV - Exponents and Logarithms (B. Definition of an exponential)
A. Introduction: Starting with addition and defining the notations for subtraction, multiplication and division, we discovered negative numbers and fractions.

The Solution Path of the Slab Support Vector Machine
CCCG 2008, Montréal, Québec, August 3-5, 2008. Michael Eigensatz, Joachim Giesen, Madhusudan Manjunath. Abstract: Given a set of points in a Hilbert space that can...

Homology groups of disks with holes
THEOREM. Let {p_1, ..., p_k} be a sequence of distinct points in the interior unit disk D^n, where n >= 2, and suppose that for all j the sets E_j ⊂ Int D^n are closed, pairwise disjoint subdisks.

Fall 2013 Physics 172 Recitation 3: Momentum and Springs
Purpose: The purpose of this recitation is to give you experience working with momentum and the momentum update formula. Readings: Chapter .3-.5. Learning objectives...

Enhancing Performance of MLP/RBF Neural Classifiers via a Multivariate Data Distribution Scheme
Halis Altun, Gökhan Gelen. Nigde University, Electrical and Electronics Engineering Department, Nigde, Turkey. haltun@nigde.edu.tr

Section 7: Model Assessment. Internal vs. external validity
This section is based on Stock and Watson's Chapter 9. Internal validity refers to whether the analysis is valid for the population and sample being studied.

A Scalable Recurrent Neural Network Framework for Model-free POMDPs
Zhenzhen Liu, Itamar Elhanany. Machine Intelligence Lab, Department of Electrical and Computer Engineering, The University of Tennessee. April 3, 2007.

CS 477/677 Analysis of Algorithms, Fall 2007, Dr. George Bebis. Course Project, Due Date: 11/29/2007
Part 1: Comparison of Sorting Algorithms (70% of the project grade). The objective of the first part of the assignment is...

Chapter 4: Diagnostics for Influential Observations
Influential observations are observations whose presence in the data can have a distorting effect on the parameter estimates and possibly the entire analysis.

STATS216v Introduction to Statistical Learning, Stanford University, Summer 2016. Practice Final (Solutions)
Duration: 3 hours. Instructions: (This is a practice final and will not be graded.) Remember the university...

Module Four
This module addresses functions. SC Academic Standards: EA-3.1 Classify a relationship as being either a function or not a function when given data as a table, set of ordered pairs, or graph. EA-3.2 Use...

Part 3: Introduction to statistical classification techniques
Machine Learning, Part 3, March 07, Fabio Roli. Preamble: In Part ... we have seen that if we know the posterior probabilities P(ω_i | x), or the equivalent terms...

Blue Valley District Curriculum: Mathematics, Third Grade
The standards are taught in the following sequence. In grade 3, instructional time should focus on four critical areas: (1) developing understanding of multiplication and division and...

Slide04 (supplemental). Haykin Chapter 4 (both 2nd and 3rd ed.): Multi-Layer Perceptrons
CPSC 636-600, Instructor: Yoonsuck Choe. Heuristics for making backprop perform better: 1. Sequential vs. batch update: for large...

The Power and Limit of Neural Networks
T. Y. Lin, Department of Mathematics and Computer Science, San Jose State University, San Jose, California, tylin@cs.ssu.edu, and Berkeley Initiative in Soft Computing. 1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp...

Computational modeling techniques. Lecture 4: Model checking for ODE models
Ion Petre, Department of IT, Åbo Akademi, http://www.users.ab.fi/ipetre/cmpmd/. Content: the stoichiometric matrix; calculating the mass conservation relations...

Building to Transformations on Coordinate Axis. Grade 5: Geometry
Graph points on the coordinate plane to solve real-world and mathematical problems. 5.G.1. Use a pair of perpendicular number lines, called axes, to define...

LHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers
Part: Short Problems. The table at the right gives the population of Massachusetts over the past several decades. Using an exponential model, predict the...

Thermodynamics and Equilibrium
Thermodynamics is the study of the relationship between heat and other forms of energy in a chemical or physical process. We introduced the thermodynamic property of enthalpy...

Causal Inference. Technical Track Session I
Phillippe Leite, The World Bank. These slides were developed by Christel Vermeersch and modified by Phillippe Leite for the purpose of this workshop. Policy questions are causal...

Computational modeling techniques. Lecture 2: Modeling change
Ion Petre, Department of IT, Åbo Akademi, http://users.ab.fi/ipetre/cmpmd/. Content of the lecture: the basic paradigm of modeling change; examples; linear dynamical...

Modelling of Clock Behaviour
Don Percival, Applied Physics Laboratory, University of Washington, Seattle, Washington, USA. Overheads and paper for the talk available at http://faculty.washingtn.edu/dbp/talks.html. Overview...

Comparing Several Means: ANOVA. Group Means and Grand Mean
STAT 511, ANOVA and Regression. Blue Lake snap beans were grown in 12 open-top chambers which are subject to 4 treatments, 3 each, with O3 and SO2 present/absent. The total...
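A minimal R sketch of a one-way ANOVA comparing group means, echoing the snap-bean example above; the data, group labels, and effect sizes are simulated assumptions, not the study's values.

# One-way ANOVA: compare group means against the grand mean (illustrative data).
set.seed(7)
growth <- data.frame(
  treatment = factor(rep(c("control", "O3", "SO2", "O3+SO2"), each = 3)),
  yield     = c(rnorm(3, 10), rnorm(3, 9), rnorm(3, 8.5), rnorm(3, 7.5))
)
fit <- aov(yield ~ treatment, data = growth)
summary(fit)                        # F test for equality of the group means
model.tables(fit, type = "means")   # group means and the grand mean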

Insurance Markets
Department of Economics, University of California, Davis. Econ 200C Micro Theory, Professor Giacomo Bonanno. Consider an individual who has an initial wealth of ... With some probability p he faces a loss of x (0...

Matching Techniques. Technical Track Session VI
Céline Ferré, The World Bank. When can we use matching? What if the assignment to the treatment is not done randomly or based on an eligibility index, but on the basis...

Note on the Analysis of a Randomized Block Design
Junjiro Ogawa, University of North Carolina. This research was supported by the Office of Naval Research under Contract No. Nonr-855(06) for research in probability...

Preparation work for A2 Mathematics [2017]
The work studied in Y12 after the return from study leave is from the Core 3 module of the A2 Mathematics course. This work will only be reviewed during Year 13; it will...

Emphases in Common Core Standards for Mathematical Content, Kindergarten - High School
Content Emphases by Cluster, March 12, 2012. Describes content emphases in the standards at the cluster level for each grade. These...

Section 6-2: Simplex Method: Maximization with Problem Constraints of the Form ≤
Note: This method was developed by George B. Dantzig in 1947 while on assignment to the U.S. Department of the Air Force. Definition: Standard...

Pipetting 101 (Developed by BSU CityLab)
Discover the Microbes Within: The Wolbachia Project. Color Comparisons Pipetting Exercise #1. Student objectives: students will be able to choose the correct size micropipette for...

Statistical Learning. 2.1 What Is Statistical Learning?
In order to motivate our study of statistical learning, we begin with a simple example. Suppose that we are statistical consultants hired by a client to...

8th Grade Math: Pre-Algebra
Hardin County Middle School (2013-2014). Course description: the purpose of this course is to enhance student understanding, participation, and real-life application of middle-school mathematics...

Correlation and Regression
4th Indian Institute of Astrophysics - PennState Astrostatistics School, July 2013, Vainu Bappu Observatory, Kavalur. Rahul Roy, Indian Statistical Institute, Delhi. Correlation: consider a two...

Principal Components
Suppose we have N measurements on each of p variables X_j, j = 1, ..., p. There are several equivalent approaches to principal components: given X = (X_1, ..., X_p), produce a derived (and small)...
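A minimal R sketch of computing principal components with prcomp(), matching the setup above of N measurements on p variables; the data are simulated for illustration.

# Principal components of N x p data (illustrative data).
set.seed(8)
N <- 100; p <- 5
X <- matrix(rnorm(N * p), nrow = N) %*% matrix(runif(p * p), p, p)  # correlated columns
pc <- prcomp(X, scale. = TRUE)       # center, scale, then rotate to principal axes
summary(pc)                          # proportion of variance explained per component
head(pc$x[, 1:2])                    # scores on the first two derived variables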

Analysis on the Stability of Reservoir Soil Slope Based on Fuzzy Artificial Neural Network
Research Journal of Applied Sciences, Engineering and Technology 5(2): 465-469, 2013. ISSN: 2040-7459; E-ISSN: 2040-7467. Maxwell Scientific Organization, 2013. Submitted: May 08, 2012; Accepted: May 29, 2012; Published:...

22.54 Neutron Interactions and Applications (Spring 2004), Chapter 11 (3/11/04): Neutron Diffusion
References: J. R. Lamarsh, Introduction to Nuclear Reactor Theory (Addison-Wesley, Reading, 1966). To study neutron diffusion...

Lesson Plan
Recode: they will do a graphic organizer to sequence the steps of the scientific method. Reach: ask the students if they have ever popped a bag of microwave popcorn and noticed how many kernels were unpopped at the bottom of the bag, which made you wonder if other brands pop better than the one you are...

Algorithm for Estimating R and R-dot (David Sandwell, SIO, August 4, 2006)
Azimuth compression involves the alignment of successive echoes to be focused on a point target. Let s be the slow time along the satellite track... The Doppler frequency rate f_R can be related to the coefficients of this polynomial. The relationships are:...

Comments on Diffusion, Diffusivity and Derivation of Hyperbolic Equations Describing the Diffusion Phenomena (February 28, 2013)
Mental experiment regarding a 1D random walk: consider a container of gas in thermal...

Codeword Distribution for Frequency Sensitive Competitive Learning with One Dimensional Input Data
Aristides S. Galanopoulos and Stanley C. Ahalt, Department of Electrical Engineering, The Ohio State University. Abstract: ...initially located away from the data set never win the competition, resulting in a nonoptimal final codebook [2], [3], [4] and [5]. Kohonen's Self Organizing Feature...

Section 4: Sequential Circuits (Department of Electrical Engineering, University of Waterloo)
Major topics: types of sequential circuits; flip-flops; analysis of clocked sequential circuits; Moore and Mealy machines; design of clocked sequential circuits; state transition design method...

Reinforcement Learning (CMPSCI 383, Nov 29, 2011)
Today's lecture: review of Chapter 17, Making Complex Decisions; sequential decision problems; the motivation and advantages of reinforcement learning; passive learning...

Multiple Source Multiple Destination Topology Inference using Network Coding
Pegah Sattari, EECS, UC Irvine. Joint work with Athina Markopoulou (UCI) and Christina Fragouli (EPFL, Lausanne). Outline: network tomography, goal...

2004 AP Chemistry Free-Response Questions
6. An electrochemical cell is constructed with an open switch, as shown in the diagram above. A strip of Sn and a strip of an unknown metal, X, are used as electrodes.