COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification


1 COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification Instructor: Herke van Hoof Slides mostly by: Joelle Pineau Class web page: Unless otherwise noted, all material posted for this course is copyright of the instructors and cannot be reused or reposted without the instructor's written permission.

2 Modeling for binary classification Two probabilistic approaches: 1. Generative learning: Separately model P(x|y) and P(y). Use Bayes rule to estimate P(y|x): P(y=1|x) = P(x|y=1)P(y=1) / P(x) 2. Discriminative learning: Directly estimate P(y|x).

3 How about other types of data? Last lecture, we saw one generative approach (LDA). LDA works with continuous data. What about other types of data?

4 How about other types of data? LDA only works with continuous input data. Let's look at an approach for handling other types of data (mainly: binary).

5 Generative learning with binary input data Generative learning: Estimate P(x|y), P(y). Then calculate P(y|x). Simple principle: for every class y, estimate the conditional probability P(x|y) of every input pattern x. What happens if the number of input variables m is large?

6 Generative learning with binary input data Generative learning: Estimate P(x|y), P(y). Then calculate P(y|x). Simple principle: for every class y, estimate the conditional probability P(x|y) of every input pattern x. What happens if the number of input variables m is large? O(2^m) parameters necessary to describe the model! Need an additional assumption on the structure of the input to keep this manageable!

7 Naïve Bayes assumption Generative learning: Estimate P(x|y), P(y). Then calculate P(y|x). Naïve Bayes: Assume the x_j are conditionally independent given y. In other words: P(x_j|y) = P(x_j|y, x_k), for all j, k.

8 Naïve Bayes assumption Generative learning: Estimate P(x|y), P(y). Then calculate P(y|x). Naïve Bayes: Assume the x_j are conditionally independent given y. In other words: P(x_j|y) = P(x_j|y, x_k), for all j, k. Generative model structure: P(x|y) = P(x_1, x_2, ..., x_m|y) = P(x_1|y) P(x_2|y, x_1) P(x_3|y, x_1, x_2) ... P(x_m|y, x_1, x_2, ..., x_{m-1}) (from the general rules of probability) = P(x_1|y) P(x_2|y) P(x_3|y) ... P(x_m|y) (from the Naïve Bayes assumption above).
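To make the factorization concrete, here is a minimal sketch (not from the slides; the function name and all probabilities are illustrative) that evaluates P(x|y) as a product of per-feature Bernoulli conditionals:

```python
# Minimal sketch of the Naive Bayes factorization for binary features.
# theta_c[j] = P(x_j = 1 | y = c); the numbers below are made up for illustration.

def class_conditional(x, theta_c):
    """P(x | y = c) = prod_j P(x_j | y = c) under the Naive Bayes assumption."""
    p = 1.0
    for x_j, theta_j in zip(x, theta_c):
        p *= theta_j if x_j == 1 else (1.0 - theta_j)
    return p

theta = {0: [0.1, 0.1], 1: [0.5, 0.5]}      # e.g. P(word_j = 1 | regular), P(word_j = 1 | spam)
print(class_conditional([1, 0], theta[1]))  # 0.5 * 0.5 = 0.25
```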

9 Conditional independence example: "offer" and "opportunity" might both occur often in spam e-mails. Let's say we get 50% spam and 50% regular e-mail. Let's say spam e-mails contain "offer" 50% of the time and "opportunity" 50% of the time, independently. Let's say it's 10% for either in regular e-mail. The resulting percentages (derived from these assumptions):

                              Spam   Regular   Together %   Expected %
Contains only "offer"          25       9          17           21
Contains only "opportunity"    25       9          17           21
Contains neither               25      81          53           49
Contains both                  25       1          13            9

("Expected %" is what independence over all e-mails would predict, using the "Together" marginals of 30% for each word.) "Offer" and "opportunity" are not independent over all e-mails! We say they are conditionally independent given the class.
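The table entries follow from the stated assumptions; a short check (my own, not from the slides) mixes the two classes 50/50 and compares the joint frequency of "both" against the product of the overall marginals:

```python
# Check of the spam example: independent within each class,
# but not independent over all e-mails.
p_spam = 0.5
p_offer = {"spam": 0.5, "regular": 0.1}   # P(offer | class)
p_opp = {"spam": 0.5, "regular": 0.1}     # P(opportunity | class)

# P(both words) over all e-mails, mixing the two classes.
p_both = sum(p * p_offer[c] * p_opp[c]
             for c, p in [("spam", p_spam), ("regular", 1 - p_spam)])
# Overall marginals of each word.
p_offer_all = p_spam * p_offer["spam"] + (1 - p_spam) * p_offer["regular"]
p_opp_all = p_spam * p_opp["spam"] + (1 - p_spam) * p_opp["regular"]

print(p_both)                   # 0.13
print(p_offer_all * p_opp_all)  # 0.09, what independence would predict
```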

10 Naïve Bayes graphical model [Figure: class node y with arrows to feature nodes x_1, x_2, x_3, ..., x_m] How many parameters to estimate? Assume m binary features.

11 Naïve Bayes graphical model [Figure: class node y with arrows to feature nodes x_1, x_2, x_3, ..., x_m] How many parameters to estimate? Assume m binary features. Without the Naïve Bayes assumption: O(2^m) numbers to describe the model. With the Naïve Bayes assumption: O(m) numbers to describe the model. Useful when the number of features is high.

12 Training a Naïve Bayes classifier Assume x, y are binary variables, m=1. Estimate the parameters P(x|y) and P(y) from data. [Figure: node y with an arrow to node x] Define: Θ_1 = Pr(y=1), Θ_{j,1} = Pr(x_j=1|y=1), Θ_{j,0} = Pr(x_j=1|y=0).

13 Training a Naïve Bayes classifier Assume x, y are binary variables, m=1. Estimate the parameters P(x|y) and P(y) from data. [Figure: node y with an arrow to node x] Define: Θ_1 = Pr(y=1), Θ_{j,1} = Pr(x_j=1|y=1), Θ_{j,0} = Pr(x_j=1|y=0). Evaluation criteria: Find parameters that maximize the log-likelihood function. Likelihood: Pr(y|x) ∝ Pr(y)Pr(x|y) = ∏_{i=1:n} ( P(y_i) ∏_{j=1:m} P(x_{i,j}|y_i) ). Samples i are independent, so we take the product over n. Input features are independent (conditioned on y), so we take the product over m.

14 Training a Naïve Bayes classifier Likelihood for binary output variable: L(Θ_1|y) = Θ_1^y (1-Θ_1)^{1-y}. Log-likelihood for all parameters (like before): log L(Θ_1, Θ_{j,1}, Θ_{j,0} | D) = Σ_{i=1:n} [ log P(y_i) + Σ_{j=1:m} log P(x_{i,j}|y_i) ]

15 Training a Naïve Bayes classifier Likelihood for binary output variable: L(Θ_1|y) = Θ_1^y (1-Θ_1)^{1-y}. Log-likelihood for all parameters (like before): log L(Θ_1, Θ_{j,1}, Θ_{j,0} | D) = Σ_{i=1:n} [ log P(y_i) + Σ_{j=1:m} log P(x_{i,j}|y_i) ] = Σ_{i=1:n} [ y_i log Θ_1 + (1-y_i) log(1-Θ_1) + Σ_{j=1:m} y_i ( x_{i,j} log Θ_{j,1} + (1-x_{i,j}) log(1-Θ_{j,1}) ) + Σ_{j=1:m} (1-y_i) ( x_{i,j} log Θ_{j,0} + (1-x_{i,j}) log(1-Θ_{j,0}) ) ]

16 Training a Naïve Bayes classifier Likelihood for binary output variable: L(Θ_1|y) = Θ_1^y (1-Θ_1)^{1-y}. Log-likelihood for all parameters (like before): log L(Θ_1, Θ_{j,1}, Θ_{j,0} | D) = Σ_{i=1:n} [ log P(y_i) + Σ_{j=1:m} log P(x_{i,j}|y_i) ] = Σ_{i=1:n} [ y_i log Θ_1 + (1-y_i) log(1-Θ_1) + Σ_{j=1:m} y_i ( x_{i,j} log Θ_{j,1} + (1-x_{i,j}) log(1-Θ_{j,1}) ) + Σ_{j=1:m} (1-y_i) ( x_{i,j} log Θ_{j,0} + (1-x_{i,j}) log(1-Θ_{j,0}) ) ] (This will have another form if the parameters P(x|y) have another form, e.g. Gaussian.)

17 Training a Naïve Bayes classifier Likelihood for binary output variable: L(Θ_1|y) = Θ_1^y (1-Θ_1)^{1-y}. Log-likelihood for all parameters (like before): log L(Θ_1, Θ_{j,1}, Θ_{j,0} | D) = Σ_{i=1:n} [ log P(y_i) + Σ_{j=1:m} log P(x_{i,j}|y_i) ] = Σ_{i=1:n} [ y_i log Θ_1 + (1-y_i) log(1-Θ_1) + Σ_{j=1:m} y_i ( x_{i,j} log Θ_{j,1} + (1-x_{i,j}) log(1-Θ_{j,1}) ) + Σ_{j=1:m} (1-y_i) ( x_{i,j} log Θ_{j,0} + (1-x_{i,j}) log(1-Θ_{j,0}) ) ] (This will have another form if the parameters P(x|y) have another form, e.g. Gaussian.) Maximize to estimate Θ_1: take the derivative of log L and set it to 0: ∂L/∂Θ_1 = Σ_{i=1:n} ( y_i/Θ_1 - (1-y_i)/(1-Θ_1) ) = 0
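The step from this stationarity condition to the estimator on the next slide is routine algebra; written out (my addition, not on the slide):

```latex
\sum_{i=1}^{n}\left(\frac{y_i}{\Theta_1}-\frac{1-y_i}{1-\Theta_1}\right)=0
\;\Longrightarrow\;(1-\Theta_1)\sum_{i=1}^{n}y_i=\Theta_1\Bigl(n-\sum_{i=1}^{n}y_i\Bigr)
\;\Longrightarrow\;\Theta_1=\frac{1}{n}\sum_{i=1}^{n}y_i
```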

18 Training a Naïve Bayes classifier Solving for Θ_1 we get: Θ_1 = (1/n) Σ_{i=1:n} y_i = (number of examples where y=1) / (number of examples). Similarly, we get: Θ_{j,1} = (number of examples where x_j=1 and y=1) / (number of examples where y=1); Θ_{j,0} = (number of examples where x_j=1 and y=0) / (number of examples where y=0).
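These counting formulas translate directly into code; a minimal sketch (my own, with NumPy and toy data) for binary features:

```python
import numpy as np

def fit_naive_bayes(X, y):
    """Maximum likelihood estimates for binary Naive Bayes.
    X: (n, m) 0/1 feature matrix; y: (n,) 0/1 labels."""
    theta1 = y.mean()                   # Θ_1 = (# examples with y=1) / n
    theta_j1 = X[y == 1].mean(axis=0)   # Θ_j,1 = (# with x_j=1 and y=1) / (# with y=1)
    theta_j0 = X[y == 0].mean(axis=0)   # Θ_j,0 = (# with x_j=1 and y=0) / (# with y=0)
    return theta1, theta_j1, theta_j0

X = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
y = np.array([1, 1, 0, 0])
print(fit_naive_bayes(X, y))  # (0.5, [1.0, 0.5], [0.0, 0.5])
```

Note that Θ_{j,1} = 1.0 here: an unseen event gets probability zero, which motivates the Laplace smoothing discussed later.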

19 Naïve Bayes decision boundary Decision boundary where the probabilities of the classes are equal: log-odds ratio = 0. log [Pr(y=1|x) / Pr(y=0|x)] = log [Pr(x|y=1)P(y=1) / (Pr(x|y=0)P(y=0))] = log [P(y=1)/P(y=0)] + log [∏_{j=1:m} P(x_j|y=1) / ∏_{j=1:m} P(x_j|y=0)] = log [P(y=1)/P(y=0)] + Σ_{j=1:m} log [P(x_j|y=1) / P(x_j|y=0)]

20 Naïve Bayes decision boundary Consider the case where features are binary: x_j ∈ {0, 1}. Define: w_{j,0} = log [P(x_j=0|y=1) / P(x_j=0|y=0)]; w_{j,1} = log [P(x_j=1|y=1) / P(x_j=1|y=0)]. Now we have: log [Pr(y=1|x) / Pr(y=0|x)] = log [P(y=1)/P(y=0)] + Σ_{j=1:m} log [P(x_j|y=1) / P(x_j|y=0)] = log [P(y=1)/P(y=0)] + Σ_{j=1:m} ( w_{j,0}(1-x_j) + w_{j,1} x_j ) = log [P(y=1)/P(y=0)] + Σ_{j=1:m} w_{j,0} + Σ_{j=1:m} (w_{j,1} - w_{j,0}) x_j. This is a linear decision boundary! Constant + linear in x.
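As a sanity check, one can compute the bias and slope from estimated Naïve Bayes parameters and confirm they reproduce the direct log-odds; a small sketch (my own, following the slide's w_{j,0} and w_{j,1} definitions):

```python
import numpy as np

def nb_linear_weights(theta1, theta_j1, theta_j0):
    """Express the Naive Bayes log-odds as bias + w . x for binary features."""
    w_j1 = np.log(theta_j1 / theta_j0)               # w_{j,1}
    w_j0 = np.log((1 - theta_j1) / (1 - theta_j0))   # w_{j,0}
    bias = np.log(theta1 / (1 - theta1)) + w_j0.sum()
    return bias, w_j1 - w_j0                         # constant + linear in x

theta1 = 0.5
theta_j1 = np.array([0.8, 0.6])   # made-up P(x_j=1 | y=1)
theta_j0 = np.array([0.2, 0.5])   # made-up P(x_j=1 | y=0)
bias, w = nb_linear_weights(theta1, theta_j1, theta_j0)

# Direct log-odds for x = [1, 0] agrees with the linear form.
x = np.array([1, 0])
direct = (np.log(theta1 / (1 - theta1))
          + np.log(np.where(x == 1, theta_j1, 1 - theta_j1)).sum()
          - np.log(np.where(x == 1, theta_j0, 1 - theta_j0)).sum())
print(np.isclose(direct, bias + w @ x))  # True
```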

21 Text classification example Using Naïve Bayes, we can compute probabilities for all the words which appear in the document collection. P(y=c) is the probability of class c. P(x_j|y=c) is the probability of seeing word j in documents of class c. [Figure: class node c with arrows to word 1, word 2, word 3, ..., word m]

22 Text classification example Using Naïve Bayes, we can compute probabilities for all the words which appear in the document collection. P(y=c) is the probability of class c. P(x_j|y=c) is the probability of seeing word j in documents of class c. The set of classes depends on the application, e.g. topic modeling: each class corresponds to documents on a given topic, e.g. {Politics, Finance, Sports, Arts}. [Figure: class node c with arrows to word 1, word 2, word 3, ..., word m] What happens when a word is not observed in the training data?

23 Laplace smoothing Replace the maximum likelihood estimator: Pr(x_j=1|y=1) = (number of instances with x_j=1 and y=1) / (number of examples with y=1)

24 Laplace smoothing Replace the maximum likelihood estimator: Pr(x_j=1|y=1) = (number of instances with x_j=1 and y=1) / (number of examples with y=1) with the following: Pr(x_j=1|y=1) = ((number of instances with x_j=1 and y=1) + 1) / ((number of examples with y=1) + 2)

25 Laplace smoothing Replace the maximum likelihood estimator: Pr(x_j=1|y=1) = (number of instances with x_j=1 and y=1) / (number of examples with y=1) with the following: Pr(x_j=1|y=1) = ((number of instances with x_j=1 and y=1) + 1) / ((number of examples with y=1) + 2). If there is no example from that class, it reduces to a prior probability of Pr = 1/2. If all examples have x_j=1, then Pr(x_j=0|y) has Pr = 1 / (#examples + 2). If a word appears frequently, the new estimate is only slightly biased. This is a form of regularization (it decreases variance at the cost of bias).
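A minimal sketch of the smoothed estimator (same toy setup as the earlier counting code; variable names are my own):

```python
import numpy as np

def smoothed_estimates(X, y):
    """Laplace-smoothed estimates Pr(x_j=1 | y=c) = (count + 1) / (n_c + 2)."""
    est = {}
    for c in (0, 1):
        Xc = X[y == c]
        est[c] = (Xc.sum(axis=0) + 1) / (len(Xc) + 2)
    return est

X = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
y = np.array([1, 1, 0, 0])
print(smoothed_estimates(X, y))
# {0: [0.25, 0.5], 1: [0.75, 0.5]} -- no probability is exactly 0 or 1
```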

26 Example: 20 newsgroups Given 1000 training documents from each group, learn to classify new documents according to which newsgroup they came from: comp.graphics, comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, comp.windows.x, alt.atheism, soc.religion.christian, talk.religion.misc, talk.politics.mideast, talk.politics.misc, misc.forsale, rec.autos, rec.motorcycles, rec.sport.baseball, rec.sport.hockey, sci.space, sci.crypt, sci.electronics, sci.med, talk.politics.guns. Naïve Bayes: 89% classification accuracy (comparable to other state-of-the-art methods).

27 Gaussian Naïve Bayes Extending Naïve Bayes to continuous inputs: P(y) is still assumed to be a binomial distribution. P(x|y) is assumed to be a multivariate Gaussian (normal) distribution with mean μ ∈ R^n and covariance matrix Σ ∈ R^{n×n}.

28 Gaussian Naïve Bayes Extending Naïve Bayes to continuous inputs: P(y) is still assumed to be a binomial distribution. P(x|y) is assumed to be a multivariate Gaussian (normal) distribution with mean μ ∈ R^n and covariance matrix Σ ∈ R^{n×n}. If we assume the same Σ for all classes: linear discriminant analysis. If Σ is distinct between classes: quadratic discriminant analysis. If Σ is diagonal (i.e. features are independent): Gaussian Naïve Bayes (linear if the same for all classes).

29 Gaussian Naïve Bayes Extending Naïve Bayes to continuous inputs: P(y) is still assumed to be a binomial distribution. P(x|y) is assumed to be a multivariate Gaussian (normal) distribution with mean μ ∈ R^n and covariance matrix Σ ∈ R^{n×n}. If we assume the same Σ for all classes: linear discriminant analysis. If Σ is distinct between classes: quadratic discriminant analysis. If Σ is diagonal (i.e. features are independent): Gaussian Naïve Bayes (linear if the same for all classes). How do we estimate the parameters? Derive the maximum likelihood estimators for μ and Σ.
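For the diagonal (Gaussian Naïve Bayes) case, the maximum likelihood estimators reduce to per-class, per-feature means and variances; a brief sketch under that assumption, with made-up data:

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Per-class MLEs under Gaussian Naive Bayes (diagonal covariance):
    the class mean and the per-feature variance within the class
    (np.var divides by n, i.e. the MLE rather than the unbiased estimate)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0))  # mu_c, diagonal of Sigma_c
    return params

X = np.array([[1.0, 2.0], [1.2, 1.8], [3.0, 0.5], [2.8, 0.7]])
y = np.array([0, 0, 1, 1])
print(fit_gaussian_nb(X, y))
```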

30 Modeling for binary classification Two probabilistic approaches: 1. Generative learning: Separately model P(x|y) and P(y). Use Bayes rule to estimate P(y|x): P(y=1|x) = P(x|y=1)P(y=1) / P(x) 2. Discriminative learning: Directly estimate P(y|x).

31 Discriminative learning We have seen that under several assumptions, we get linear decision boundaries: p(x|y) are Gaussian with shared covariance (LDA); p(x|y) are independent Bernoulli distributions (Naïve Bayes). Do we really need to estimate p(x|y) and p(y)? Can we directly find the parameters of the best decision boundary? E.g. the covariance matrix requires estimating O(m^2) parameters, but the decision boundary only requires O(m) parameters.

32 Probabilistic view of discriminative learning Suppose we have 2 classes: y ∈ {0, 1}. What is the probability of a given input x having class y=1? Consider Bayes rule: P(y=1|x) = P(x, y=1) / P(x) = P(x|y=1)P(y=1) / ( P(x|y=1)P(y=1) + P(x|y=0)P(y=0) )

33 Probabilistic view of discriminative learning Suppose we have 2 classes: y ∈ {0, 1}. What is the probability of a given input x having class y=1? Consider Bayes rule: P(y=1|x) = P(x, y=1) / P(x) = P(x|y=1)P(y=1) / ( P(x|y=1)P(y=1) + P(x|y=0)P(y=0) ) = 1 / ( 1 + P(x|y=0)P(y=0) / (P(x|y=1)P(y=1)) ) = 1 / ( 1 + exp( ln [P(x|y=0)P(y=0) / (P(x|y=1)P(y=1))] ) ) = 1 / (1 + exp(-a)) = σ(a)

34 Probabilistic view of discriminative learning Suppose we have 2 classes: y ∈ {0, 1}. What is the probability of a given input x having class y=1? Consider Bayes rule: P(y=1|x) = P(x, y=1) / P(x) = P(x|y=1)P(y=1) / ( P(x|y=1)P(y=1) + P(x|y=0)P(y=0) ) = 1 / ( 1 + P(x|y=0)P(y=0) / (P(x|y=1)P(y=1)) ) = 1 / ( 1 + exp( ln [P(x|y=0)P(y=0) / (P(x|y=1)P(y=1))] ) ) = 1 / (1 + exp(-a)) = σ(a), where a = ln [P(x|y=1)P(y=1) / (P(x|y=0)P(y=0))] = ln [P(y=1|x) / P(y=0|x)] (by Bayes rule; P(x) on top and bottom cancels out).

35 Probabilistic view of discriminative learning Suppose we have 2 classes: y ∈ {0, 1}. What is the probability of a given input x having class y=1? Consider Bayes rule: P(y=1|x) = P(x, y=1) / P(x) = P(x|y=1)P(y=1) / ( P(x|y=1)P(y=1) + P(x|y=0)P(y=0) ) = 1 / ( 1 + P(x|y=0)P(y=0) / (P(x|y=1)P(y=1)) ) = 1 / ( 1 + exp( ln [P(x|y=0)P(y=0) / (P(x|y=1)P(y=1))] ) ) = 1 / (1 + exp(-a)) = σ(a), where a = ln [P(x|y=1)P(y=1) / (P(x|y=0)P(y=0))] = ln [P(y=1|x) / P(y=0|x)] (by Bayes rule; P(x) on top and bottom cancels out). Here σ has a special form, called the logistic function, and a is the log-odds ratio of the data being class 1 vs. class 0.

36 Discriminative learning: Logistic regression The logistic function (= sigmoid curve): σ(w^T x) = 1 / (1 + e^{-w^T x}). Transforms the learned function s.t. it can be interpreted as a probability.

37 Discriminative learning: Logistic regression The logistic function (= sigmoid curve): σ(w^T x) = 1 / (1 + e^{-w^T x}). Transforms the learned function s.t. it can be interpreted as a probability. The decision boundary is the set of points for which a=0. Idea: Directly model the log-odds with a linear function: a = ln [P(x|y=1)P(y=1) / (P(x|y=0)P(y=0))] = w_0 + w_1 x_1 + ... + w_m x_m
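A tiny sketch of this model (weights and input are made up): the score a = w^T x, with a constant 1 appended to x so that w_0 acts as the bias, passes through the logistic function, and the boundary sits exactly where the score crosses zero:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

w = np.array([-1.0, 2.0])  # [w_0, w_1]: hypothetical weights
x = np.array([1.0, 0.5])   # leading 1 is the bias feature
a = w @ x                  # linear log-odds: w_0 + w_1 * x_1
print(sigmoid(a))          # 0.5 exactly on the boundary a = 0
```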

38 Discriminative learning: Logistic regression The logistic function (= sigmoid curve): σ(w^T x) = 1 / (1 + e^{-w^T x}). Transforms the learned function s.t. it can be interpreted as a probability. The decision boundary is the set of points for which a=0. Idea: Directly model the log-odds with a linear function: a = ln [P(x|y=1)P(y=1) / (P(x|y=0)P(y=0))] = w_0 + w_1 x_1 + ... + w_m x_m. How do we find the weights? We need an objective function!

39 Fitting the weights Recall: σ(w^T x_i) is the probability that y_i=1 (given x_i); 1-σ(w^T x_i) is the probability that y_i=0. For y ∈ {0, 1}, the likelihood function, Pr(x_1, y_1, ..., x_n, y_n | w), is: ∏_{i=1:n} σ(w^T x_i)^{y_i} (1-σ(w^T x_i))^{1-y_i} (samples are i.i.d.)

40 Fitting the weights Recall: σ(w^T x_i) is the probability that y_i=1 (given x_i); 1-σ(w^T x_i) is the probability that y_i=0. For y ∈ {0, 1}, the likelihood function, Pr(x_1, y_1, ..., x_n, y_n | w), is: ∏_{i=1:n} σ(w^T x_i)^{y_i} (1-σ(w^T x_i))^{1-y_i} (samples are i.i.d.) Goal: Minimize the negative log-likelihood (also called the cross-entropy error function): - Σ_{i=1:n} [ y_i log(σ(w^T x_i)) + (1-y_i) log(1-σ(w^T x_i)) ]

41 Gradient descent for logistic regression Error fn: Err(w) = - Σ_{i=1:n} [ y_i log(σ(w^T x_i)) + (1-y_i) log(1-σ(w^T x_i)) ] Take the derivative:

42 Gradient descent for logistic regression Error fn: Err(w) = - Σ_{i=1:n} [ y_i log(σ(w^T x_i)) + (1-y_i) log(1-σ(w^T x_i)) ] Take the derivative (chain rule piece: ∂log(σ)/∂σ = 1/σ): ∂Err(w)/∂w = - [ Σ_{i=1:n} y_i (1/σ(w^T x_i)) (1-σ(w^T x_i)) σ(w^T x_i) x_i + ...

43 Gradient descent for logistic regression Error fn: Err(w) = - Σ_{i=1:n} [ y_i log(σ(w^T x_i)) + (1-y_i) log(1-σ(w^T x_i)) ] Take the derivative (chain rule piece: ∂σ(a)/∂a = σ(a)(1-σ(a))): ∂Err(w)/∂w = - [ Σ_{i=1:n} y_i (1/σ(w^T x_i)) (1-σ(w^T x_i)) σ(w^T x_i) x_i + ...

44 Gradient descent for logistic regression Error fn: Err(w) = - Σ_{i=1:n} [ y_i log(σ(w^T x_i)) + (1-y_i) log(1-σ(w^T x_i)) ] Take the derivative (chain rule piece: ∂(w^T x)/∂w = x): ∂Err(w)/∂w = - [ Σ_{i=1:n} y_i (1/σ(w^T x_i)) (1-σ(w^T x_i)) σ(w^T x_i) x_i + ...

45 Gradient descent for logistic regression Error fn: Err(w) = - Σ_{i=1:n} [ y_i log(σ(w^T x_i)) + (1-y_i) log(1-σ(w^T x_i)) ] Take the derivative (chain rule piece: ∂(1-σ(a))/∂a = -(1-σ(a))σ(a)): ∂Err(w)/∂w = - [ Σ_{i=1:n} y_i (1/σ(w^T x_i)) (1-σ(w^T x_i)) σ(w^T x_i) x_i + (1-y_i) (1/(1-σ(w^T x_i))) (1-σ(w^T x_i)) σ(w^T x_i) (-1) x_i ]

46 Gradient descent for logistic regression Error fn: Err(w) = - Σ_{i=1:n} [ y_i log(σ(w^T x_i)) + (1-y_i) log(1-σ(w^T x_i)) ] Take the derivative: ∂Err(w)/∂w = - [ Σ_{i=1:n} y_i (1/σ(w^T x_i)) (1-σ(w^T x_i)) σ(w^T x_i) x_i + (1-y_i) (1/(1-σ(w^T x_i))) (1-σ(w^T x_i)) σ(w^T x_i) (-1) x_i ] = - Σ_{i=1:n} x_i ( y_i (1-σ(w^T x_i)) - (1-y_i) σ(w^T x_i) ) = - Σ_{i=1:n} x_i ( y_i - σ(w^T x_i) )

47 Gradient descent for logistic regression Error fn: Err(w) = - Σ_{i=1:n} [ y_i log(σ(w^T x_i)) + (1-y_i) log(1-σ(w^T x_i)) ] Take the derivative: ∂Err(w)/∂w = - [ Σ_{i=1:n} y_i (1/σ(w^T x_i)) (1-σ(w^T x_i)) σ(w^T x_i) x_i + (1-y_i) (1/(1-σ(w^T x_i))) (1-σ(w^T x_i)) σ(w^T x_i) (-1) x_i ] = - Σ_{i=1:n} x_i ( y_i (1-σ(w^T x_i)) - (1-y_i) σ(w^T x_i) ) = - Σ_{i=1:n} x_i ( y_i - σ(w^T x_i) ) Now apply iteratively: w_{k+1} = w_k + α_k Σ_{i=1:n} x_i ( y_i - σ(w_k^T x_i) )
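Putting the pieces together, a minimal batch gradient descent loop on the cross-entropy error, implementing exactly this update (step size, iteration count, and toy data are illustrative choices of mine):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(X, y, alpha=0.1, iters=1000):
    """w_{k+1} = w_k + alpha * sum_i x_i (y_i - sigma(w_k^T x_i))."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = w + alpha * X.T @ (y - sigmoid(X @ w))
    return w

# Toy 1-D problem with a bias column prepended to X.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = fit_logistic(X, y)
print(np.round(sigmoid(X @ w)))  # [0. 0. 1. 1.]
```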

48 Gradient descent for logistic regression Error fn: Err(w) = - Σ_{i=1:n} [ y_i log(σ(w^T x_i)) + (1-y_i) log(1-σ(w^T x_i)) ] Take the derivative: ∂Err(w)/∂w = - [ Σ_{i=1:n} y_i (1/σ(w^T x_i)) (1-σ(w^T x_i)) σ(w^T x_i) x_i + (1-y_i) (1/(1-σ(w^T x_i))) (1-σ(w^T x_i)) σ(w^T x_i) (-1) x_i ] = - Σ_{i=1:n} x_i ( y_i (1-σ(w^T x_i)) - (1-y_i) σ(w^T x_i) ) = - Σ_{i=1:n} x_i ( y_i - σ(w^T x_i) ) Now apply iteratively: w_{k+1} = w_k + α_k Σ_{i=1:n} x_i ( y_i - σ(w_k^T x_i) ) We can also apply other iterative methods, e.g. Newton's method, coordinate descent, L-BFGS, etc.

49 Multi-class classification Generally two options: 1. Learn a single classifier that can produce 20 distinct output values. 2. Learn 20 different 1-vs-all binary classifiers.

50 Multi-class classification Generally two options: 1. Learn a single classifier that can produce 20 distinct output values. 2. Learn 20 different 1-vs-all binary classifiers. Option 1 assumes you have a multi-class version of the classifier. For Naïve Bayes, compute P(y|x) for each class, and select the class with the highest probability.

51 Multi-class classification Generally two options: 1. Learn a single classifier that can produce 20 distinct output values. 2. Learn 20 different 1-vs-all binary classifiers. Option 1 assumes you have a multi-class version of the classifier. For Naïve Bayes, compute P(y|x) for each class, and select the class with the highest probability. Option 2 applies to all binary classifiers, so it is more flexible. But: it is often slower (need to learn many classifiers); it creates a class imbalance problem (say, 5% vs 95% for 20 classes); and what if two classifiers say the example belongs to their class? Or zero do?
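A schematic of Option 2 with logistic regression as the base classifier (the trainer from the previous sketch is repeated so this is self-contained; data and names are made up). Ties and abstentions are resolved by taking the class whose classifier is most confident:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(X, y, alpha=0.1, iters=500):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = w + alpha * X.T @ (y - sigmoid(X @ w))
    return w

def one_vs_all(X, y, n_classes):
    # One binary classifier per class: class c vs. everything else.
    return [fit_logistic(X, (y == c).astype(float)) for c in range(n_classes)]

def predict(ws, X):
    # Resolve "two classifiers claim it" / "none do" by taking the
    # highest-probability class.
    scores = np.stack([sigmoid(X @ w) for w in ws], axis=1)
    return scores.argmax(axis=1)

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0, 0, 1, 1])
ws = one_vs_all(X, y, n_classes=2)
print(predict(ws, X))  # [0 0 1 1]
```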

52 Comparing linear classification methods [FIGURE 4.4 (Hastie et al.): A two-dimensional plot of the vowel training data (Coordinate 1 vs. Coordinate 2 for the training data). There are eleven classes with X ∈ IR^10, and this is the best view in terms of an LDA model (Section 4.3.3). The heavy circles are the projected mean vectors for each class. The class overlap is considerable. An accompanying table lists training and test error rates for linear regression, linear discriminant analysis, quadratic discriminant analysis, and logistic regression.]

53 Discriminative vs generative Discriminative classifiers often have fewer parameters to estimate. Discriminative classifiers often do better, but: a generative model might give us more insight into the data; it can tell us when all classes are bad (low probability). With many classes, discriminative models need to find the decision boundary between every pair.

54 What you should know Naïve Bayes assumption. Log-odds ratio decision boundary. How to estimate parameters for Naïve Bayes. Laplace smoothing. Relation between Naïve Bayes, LDA, QDA, Gaussian Naïve Bayes. Derivation of logistic regression. Worth reading further: relation between logistic regression and LDA (Hastie et al., 4.4.5).
