Midwest Big Data Summer School: Machine Learning I: Introduction. Kris De Brabanter
|
|
- Debra Sparks
- 5 years ago
- Views:
Transcription
1 Midwest Big Data Summer Schl: Machine Learning I: Intrductin Kris De Brabanter Iwa State University Department f Statistics Department f Cmputer Science June 24, /24
2 Outline 1 Intrductin t Pattern Recgnitin 2 Bayes classifier and variants 3 k-nearest Neighbr Classifier 4 Crss-validatin Leave-ne-ut Crss-validatin v-fld Crss-Validatin Drawbacks & ther methds 5 Beynd simple linear regressin: Shrinkage methds Ridge regressin LASSO and variable selectin 2/24
3 What is Pattern Recgnitin? Devrye, Györfy & Lugsi (1996): Pattern recgnitin r discriminatin is abut guessing r predicting the unknwn nature f an bservatin, a discrete quantity such as black r white, ne r zer, sick r healthy, real r fake. 3/24
4 Tw imprtant cncepts: Bias and Variance 4/24
5 Bayes classifier and variants 1 Intrductin t Pattern Recgnitin 2 Bayes classifier and variants 3 k-nearest Neighbr Classifier 4 Crss-validatin Leave-ne-ut Crss-validatin v-fld Crss-Validatin Drawbacks & ther methds 5 Beynd simple linear regressin: Shrinkage methds Ridge regressin LASSO and variable selectin 5/24
6 Bayes rule: An intuitive classifier Key idea: Assign an bservatin x t class k such that P[Y = k X = x] is largest. 6/24
7 Bayes rule: An intuitive classifier Key idea: Assign an bservatin x t class k such that P[Y = k X = x] is largest. Prblem I: Hw t determine P[Y = k X = x]? 6/24
8 Bayes rule: An intuitive classifier Key idea: Assign an bservatin x t class k such that P[Y = k X = x] is largest. Prblem I: Hw t determine P[Y = k X = x]? This prbability is usually nt knwn/nt readily available in practice. 6/24
9 Bayes rule: An intuitive classifier Key idea: Assign an bservatin x t class k such that P[Y = k X = x] is largest. Prblem I: Hw t determine P[Y = k X = x]? This prbability is usually nt knwn/nt readily available in practice. Bayes rule (therem): P[Y = k X = x] }{{} psterir prbability = = P[Y = k]p[x = x Y = k] P[X = x] P[Y = k]p[x = x Y = k] K i=1p[y = i]p[x = x Y = i] 6/24
10 Bayes rule: An intuitive classifier Key idea: Assign an bservatin x t class k such that P[Y = k X = x] is largest. Prblem I: Hw t determine P[Y = k X = x]? This prbability is usually nt knwn/nt readily available in practice. Bayes rule (therem): P[Y = k X = x] }{{} psterir prbability r fr cntinuus randm variables X = = P[Y = k X = x] = P[Y = k]p[x = x Y = k] P[X = x] P[Y = k]p[x = x Y = k] K i=1p[y = i]p[x = x Y = i] P[Y = k]f(x = x Y = k) K i=1 P[Y = i]f(x = x Y = i) 6/24
11 Bayes rule: An intuitive classifier Since lg is a strictly mntnic functin and the denminatr des nt depend n k, assign class k s.t. the psterir prbability is maximized 7/24
12 Bayes rule: An intuitive classifier Since lg is a strictly mntnic functin and the denminatr des nt depend n k, assign class k s.t. the psterir prbability is maximized Ŷ(x) = argmaxlgp[y = k X = x] k = argmaxlg[p[y = k]f(x = x Y = k)] k = argmax[lgp[y = k]+lgf(x = x Y = k)] k 7/24
13 Bayes rule: An intuitive classifier Since lg is a strictly mntnic functin and the denminatr des nt depend n k, assign class k s.t. the psterir prbability is maximized Ŷ(x) = argmaxlgp[y = k X = x] k = argmaxlg[p[y = k]f(x = x Y = k)] k = argmax[lgp[y = k]+lgf(x = x Y = k)] k Prblem II: Still 2 unknwn quantities (P[Y = k] and f(x = x Y = k)) 7/24
14 Bayes rule: An intuitive classifier Since lg is a strictly mntnic functin and the denminatr des nt depend n k, assign class k s.t. the psterir prbability is maximized Ŷ(x) = argmaxlgp[y = k X = x] k = argmaxlg[p[y = k]f(x = x Y = k)] k = argmax[lgp[y = k]+lgf(x = x Y = k)] k Prblem II: Still 2 unknwn quantities (P[Y = k] and f(x = x Y = k)) P[Y = k] = n k n and fr f(x = x Y = k), assume a density (usually nrmal) r estimate using kernel density estimatin (r histgram) 7/24
15 Bayes rule: An intuitive classifier Assume f(x = x Y = k) multivariate nrmal with equal variance-cvariance matrix fr each class LDA 8/24
16 Bayes rule: An intuitive classifier Assume f(x = x Y = k) multivariate nrmal with equal variance-cvariance matrix fr each class LDA f(x = x Y = k) multivariate nrmal with unequal variance-cvariance matrix fr each class QDA 8/24
17 Bayes rule: An intuitive classifier Assume f(x = x Y = k) multivariate nrmal with equal variance-cvariance matrix fr each class LDA f(x = x Y = k) multivariate nrmal with unequal variance-cvariance matrix fr each class QDA feature X i is cnditinally independent f every ther feature X j fr i j given class k Naive Bayes Classifier 8/24
18 Example: QDA vs. naive Bayes X X X1 X1 (a) QDA (b) Naive Bayes 9/24
19 k-nearest Neighbr Classifier 1 Intrductin t Pattern Recgnitin 2 Bayes classifier and variants 3 k-nearest Neighbr Classifier 4 Crss-validatin Leave-ne-ut Crss-validatin v-fld Crss-Validatin Drawbacks & ther methds 5 Beynd simple linear regressin: Shrinkage methds Ridge regressin LASSO and variable selectin 10/24
20 k-nearest Neighbr Classifier Figure: Illustratin f k-nearest Neighbr Classifier. Taken frm Gareth, Witten, Hastie & Tibshirani, An Intrductin t Statistical Learning with Applicatins in R, Springer, /24
21 Effect f k KNN: K=1 KNN: K=100 Figure: Effect f k in the Nearest Neighbr Classifier. Taken frm Gareth, Witten, Hastie & Tibshirani, An Intrductin t Statistical Learning with Applicatins in R, Springer, /24
22 Effect f k KNN: K=1 KNN: K=100 Figure: Effect f k in the Nearest Neighbr Classifier. Taken frm Gareth, Witten, Hastie & Tibshirani, An Intrductin t Statistical Learning with Applicatins in R, Springer, 2013 Hw t chse k? 12/24
23 Crss-validatin 1 Intrductin t Pattern Recgnitin 2 Bayes classifier and variants 3 k-nearest Neighbr Classifier 4 Crss-validatin Leave-ne-ut Crss-validatin v-fld Crss-Validatin Drawbacks & ther methds 5 Beynd simple linear regressin: Shrinkage methds Ridge regressin LASSO and variable selectin 13/24
24 Leave-ne-ut Crss-validatin Figure: Leave-ne-ut Crss-Validatin idea. Taken frm Gareth, Witten, Hastie & Tibshirani, An Intrductin t Statistical Learning with Applicatins in R, Springer, /24
25 Leave-ne-ut Crss-validatin Lw bias, high variance Figure: Leave-ne-ut Crss-Validatin idea. Taken frm Gareth, Witten, Hastie & Tibshirani, An Intrductin t Statistical Learning with Applicatins in R, Springer, /24
26 v-fld Crss-Validatin Figure: v-fld Crss-Validatin idea. Taken frm Gareth, Witten, Hastie & Tibshirani, An Intrductin t Statistical Learning with Applicatins in R, Springer, /24
27 v-fld Crss-Validatin higher bias, lw variance Figure: v-fld Crss-Validatin idea. Taken frm Gareth, Witten, Hastie & Tibshirani, An Intrductin t Statistical Learning with Applicatins in R, Springer, /24
28 Crss-validatin Crss-validatin gives an ESTIMATE f the predictin errr 16/24
29 Crss-validatin Crss-validatin gives an ESTIMATE f the predictin errr Figure: 10-fld CV errr (black), test errr (brwn) and training errr (blue). Taken frm Gareth, Witten, Hastie & Tibshirani, An Intrductin t Statistical Learning with Applicatins in R, Springer, /24
30 Drawbacks & ther methds Easy t implement Can be applied t many situatins Data-driven N explicit variance estimate f the errrs needed 17/24
31 Drawbacks & ther methds Easy t implement Can be applied t many situatins Data-driven N explicit variance estimate f the errrs needed Cmputatinally expensive slw cnvergence 17/24
32 Drawbacks & ther methds Easy t implement Can be applied t many situatins Data-driven N explicit variance estimate f the errrs needed Cmputatinally expensive slw cnvergence There exist ther methds e.g. cmplexity criteria (AIC, BIC, Mallw s C p,...), generalized CV, etc. 17/24
33 Beynd simple linear regressin: Shrinkage methds 1 Intrductin t Pattern Recgnitin 2 Bayes classifier and variants 3 k-nearest Neighbr Classifier 4 Crss-validatin Leave-ne-ut Crss-validatin v-fld Crss-Validatin Drawbacks & ther methds 5 Beynd simple linear regressin: Shrinkage methds Ridge regressin LASSO and variable selectin 18/24
34 Shrinkage methds When p > n, OLS des nt have a unique slutin. Remember OLS (ˆβ 0,..., ˆβ p ) = argmin β 0,...,β p R p+1 n Y i β 0 i=1 p β i x ij j= /24
35 Shrinkage methds When p > n, OLS des nt have a unique slutin. Remember OLS (ˆβ 0,..., ˆβ p ) = argmin β 0,...,β p R p+1 In matrix frm: β = (X T X) 1 X T y n Y i β 0 i=1 p β i x ij j=1 y = (Y 1,...,Y n ) T,β = (β 0,...,β p ) T 2. and 1 x 11 x 1p X =... 1 x n1 x np 19/24
36 Shrinkage methds When p > n, OLS des nt have a unique slutin. Remember OLS (ˆβ 0,..., ˆβ p ) = argmin β 0,...,β p R p+1 In matrix frm: β = (X T X) 1 X T y n Y i β 0 i=1 p β i x ij j=1 y = (Y 1,...,Y n ) T,β = (β 0,...,β p ) T 2. and 1 x 11 x 1p X =... 1 x n1 x np X T X will be (clse t) singular and hence nt invertible 19/24
37 Ridge regressin Slutin t previus prblem: make matrix X T X invertible by adding a little nise n the main diagnal 20/24
38 Ridge regressin Slutin t previus prblem: make matrix X T X invertible by adding a little nise n the main diagnal ˆβ ridge = (X T X+λI p ) 1 X T y, λ 0. 20/24
39 Ridge regressin Slutin t previus prblem: make matrix X T X invertible by adding a little nise n the main diagnal ˆβ ridge = (X T X+λI p ) 1 X T y, λ 0. This crrespnds t slving the fllwing penalized residual sums f squares n ( p ) 2 p ˆβ ridge = argmin Y β 0,...,β p i β 0 x ij β j +λ βj 2. i=1 j=1 j=1 20/24
40 Ridge regressin Slutin t previus prblem: make matrix X T X invertible by adding a little nise n the main diagnal ˆβ ridge = (X T X+λI p ) 1 X T y, λ 0. This crrespnds t slving the fllwing penalized residual sums f squares n ( p ) 2 p ˆβ ridge = argmin Y β 0,...,β p i β 0 x ij β j +λ βj 2. i=1 j=1 j=1 as λ 0, ˆβ ridge ˆβ OLS as λ, ˆβ ridge 0 shrinks the clumn space f X having small variance the mst λ depends n σ 2 e,p and β 20/24
41 Ridge regressin Slutin t previus prblem: make matrix X T X invertible by adding a little nise n the main diagnal ˆβ ridge = (X T X+λI p ) 1 X T y, λ 0. This crrespnds t slving the fllwing penalized residual sums f squares n ( p ) 2 p ˆβ ridge = argmin Y β 0,...,β p i β 0 x ij β j +λ βj 2. i=1 j=1 j=1 as λ 0, ˆβ ridge ˆβ OLS as λ, ˆβ ridge 0 shrinks the clumn space f X having small variance the mst λ depends n σ 2 e,p and β Find λ via crss-validatin 20/24
42 Why Ridge regressin? Therem (Existence Therem (Herl & Kennard, 1970)) There always exist a λ s.t. that TMSE(ˆβ ridge ) < TMSE(ˆβ OLS ) 21/24
43 Why Ridge regressin? Therem (Existence Therem (Herl & Kennard, 1970)) There always exist a λ s.t. that TMSE(ˆβ ridge ) < TMSE(ˆβ OLS ) Figure: bias T bias and trace(variance) f ridge regressin 21/24
44 The existence therem in practice Y th rder plynmial regressin x data OLS Ridge, λ = 0.1 OLS ceff. Ridge ceff /24
45 LASSO and variable selectin (Tibshirani, 1996) n ( ˆβ lass = argmin Y β 0,...,β p i β 0 i=1 p ) 2 x ij β j +λ j=1 p β j j=1 23/24
46 LASSO and variable selectin (Tibshirani, 1996) n ( ˆβ lass = argmin Y β 0,...,β p i β 0 i=1 p ) 2 x ij β j +λ j=1 N clsed frm, numerical ptimizatin needed cnvex prblem Sets certain cefficients β exactly equal t zer λ can be determined via crss-validatin p β j j=1 23/24
47 LASSO and variable selectin (Tibshirani, 1996) n ( ˆβ lass = argmin Y β 0,...,β p i β 0 i=1 p ) 2 x ij β j +λ j=1 N clsed frm, numerical ptimizatin needed cnvex prblem Sets certain cefficients β exactly equal t zer λ can be determined via crss-validatin p β j j=1 23/24
48 References Bühlmann, P & van de Geer S. (2011), Statistics fr High-Dimensinal Data: Methds, Thery and Applicatins, Springer Devrye L., Györfi L. & Lugsi G. (1996). A Prbabilistic Thery f Pattern Recgnitin, Springer Gareth J., Witten D., Hastie T. & Tibshirani R. (2013). An Intrductin t Statistical Learning with Applicatins in R, Springer Hastie T., Tibshirani R. & Friedman J. (2009), The Elements f Statistical Learning: Data Mining, Inference, and Predictin (2nd Ed.), Springer Herl A.E. & Kennard R. (1974), Ridge regressin: biased estimatin fr nnrthgnal prblems, Technmetrics, 12: Györfi L, Khler M., Krzyżak A. & Walk H. (2002). A Distributin-Free Thery f Nnparametric Regressin, Springer Hampel F. R., Rnchetti E. M., Russeeuw P. J. & Werner A. (1986), Rbust Statistics: The Apprach Based n Influence Functins, Jhn Wiley & Sns Silverman B. W., Density Estimatin fr Statistics and Data Analysis, Chapman & Hall, 1986 Simnff J.S. (1996), Smthing Methds in Statistics, Springer R. Tibshirani (1996), Regressin shrinkage and selectin via the lass, J. Ryal. Statist. Sc B., 58(1): /24
Resampling Methods. Cross-validation, Bootstrapping. Marek Petrik 2/21/2017
Resampling Methds Crss-validatin, Btstrapping Marek Petrik 2/21/2017 Sme f the figures in this presentatin are taken frm An Intrductin t Statistical Learning, with applicatins in R (Springer, 2013) with
More informationSimple Linear Regression (single variable)
Simple Linear Regressin (single variable) Intrductin t Machine Learning Marek Petrik January 31, 2017 Sme f the figures in this presentatin are taken frm An Intrductin t Statistical Learning, with applicatins
More informationCOMP 551 Applied Machine Learning Lecture 4: Linear classification
COMP 551 Applied Machine Learning Lecture 4: Linear classificatin Instructr: Jelle Pineau (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted
More informationResampling Methods. Chapter 5. Chapter 5 1 / 52
Resampling Methds Chapter 5 Chapter 5 1 / 52 1 51 Validatin set apprach 2 52 Crss validatin 3 53 Btstrap Chapter 5 2 / 52 Abut Resampling An imprtant statistical tl Pretending the data as ppulatin and
More informationBootstrap Method > # Purpose: understand how bootstrap method works > obs=c(11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38) > n=length(obs) >
Btstrap Methd > # Purpse: understand hw btstrap methd wrks > bs=c(11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38) > n=length(bs) > mean(bs) [1] 21.64625 > # estimate f lambda > lambda = 1/mean(bs);
More informationCOMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification
COMP 551 Applied Machine Learning Lecture 5: Generative mdels fr linear classificatin Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Jelle Pineau Class web page: www.cs.mcgill.ca/~hvanh2/cmp551
More informationk-nearest Neighbor How to choose k Average of k points more reliable when: Large k: noise in attributes +o o noise in class labels
Mtivating Example Memry-Based Learning Instance-Based Learning K-earest eighbr Inductive Assumptin Similar inputs map t similar utputs If nt true => learning is impssible If true => learning reduces t
More informationLinear Classification
Linear Classificatin CS 54: Machine Learning Slides adapted frm Lee Cper, Jydeep Ghsh, and Sham Kakade Review: Linear Regressin CS 54 [Spring 07] - H Regressin Given an input vectr x T = (x, x,, xp), we
More information3.4 Shrinkage Methods Prostate Cancer Data Example (Continued) Ridge Regression
3.3.4 Prstate Cancer Data Example (Cntinued) 3.4 Shrinkage Methds 61 Table 3.3 shws the cefficients frm a number f different selectin and shrinkage methds. They are best-subset selectin using an all-subsets
More informationWhat is Statistical Learning?
What is Statistical Learning? Sales 5 10 15 20 25 Sales 5 10 15 20 25 Sales 5 10 15 20 25 0 50 100 200 300 TV 0 10 20 30 40 50 Radi 0 20 40 60 80 100 Newspaper Shwn are Sales vs TV, Radi and Newspaper,
More informationx 1 Outline IAML: Logistic Regression Decision Boundaries Example Data
Outline IAML: Lgistic Regressin Charles Suttn and Victr Lavrenk Schl f Infrmatics Semester Lgistic functin Lgistic regressin Learning lgistic regressin Optimizatin The pwer f nn-linear basis functins Least-squares
More informationLecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff
Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeff Reading: Chapter 2 STATS 202: Data mining and analysis September 27, 2017 1 / 20 Supervised vs. unsupervised learning In unsupervised
More informationLecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff
Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeff Reading: Chapter 2 STATS 202: Data mining and analysis September 27, 2017 1 / 20 Supervised vs. unsupervised learning In unsupervised
More informationSmoothing, penalized least squares and splines
Smthing, penalized least squares and splines Duglas Nychka, www.image.ucar.edu/~nychka Lcally weighted averages Penalized least squares smthers Prperties f smthers Splines and Reprducing Kernels The interplatin
More informationPart 3 Introduction to statistical classification techniques
Part 3 Intrductin t statistical classificatin techniques Machine Learning, Part 3, March 07 Fabi Rli Preamble ØIn Part we have seen that if we knw: Psterir prbabilities P(ω i / ) Or the equivalent terms
More informationIAML: Support Vector Machines
1 / 22 IAML: Supprt Vectr Machines Charles Suttn and Victr Lavrenk Schl f Infrmatics Semester 1 2 / 22 Outline Separating hyperplane with maimum margin Nn-separable training data Epanding the input int
More informationDistributions, spatial statistics and a Bayesian perspective
Distributins, spatial statistics and a Bayesian perspective Dug Nychka Natinal Center fr Atmspheric Research Distributins and densities Cnditinal distributins and Bayes Thm Bivariate nrmal Spatial statistics
More informationCOMP 551 Applied Machine Learning Lecture 11: Support Vector Machines
COMP 551 Applied Machine Learning Lecture 11: Supprt Vectr Machines Instructr: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted fr this curse
More informationInternal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9.
Sectin 7 Mdel Assessment This sectin is based n Stck and Watsn s Chapter 9. Internal vs. external validity Internal validity refers t whether the analysis is valid fr the ppulatin and sample being studied.
More informationModule 3: Gaussian Process Parameter Estimation, Prediction Uncertainty, and Diagnostics
Mdule 3: Gaussian Prcess Parameter Estimatin, Predictin Uncertainty, and Diagnstics Jerme Sacks and William J Welch Natinal Institute f Statistical Sciences and University f British Clumbia Adapted frm
More informationPattern Recognition 2014 Support Vector Machines
Pattern Recgnitin 2014 Supprt Vectr Machines Ad Feelders Universiteit Utrecht Ad Feelders ( Universiteit Utrecht ) Pattern Recgnitin 1 / 55 Overview 1 Separable Case 2 Kernel Functins 3 Allwing Errrs (Sft
More informationStats Classification Ji Zhu, Michigan Statistics 1. Classification. Ji Zhu 445C West Hall
Stats 415 - Classificatin Ji Zhu, Michigan Statistics 1 Classificatin Ji Zhu 445C West Hall 734-936-2577 jizhu@umich.edu Stats 415 - Classificatin Ji Zhu, Michigan Statistics 2 Examples f Classificatin
More informationStatistical classifiers: Bayesian decision theory and density estimation
3 rd NOSE Shrt Curse Alpbach, st 6 th Mar 004 Statistical classifiers: Bayesian decisin thery and density estimatin Ricard Gutierrez- Department f Cmputer Science rgutier@cs.tamu.edu http://research.cs.tamu.edu/prism
More informationTree Structured Classifier
Tree Structured Classifier Reference: Classificatin and Regressin Trees by L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stne, Chapman & Hall, 98. A Medical Eample (CART): Predict high risk patients
More informationElements of Machine Intelligence - I
ECE-175A Elements f Machine Intelligence - I Ken Kreutz-Delgad Nun Vascncels ECE Department, UCSD Winter 2011 The curse The curse will cver basic, but imprtant, aspects f machine learning and pattern recgnitin
More informationChapter 3: Cluster Analysis
Chapter 3: Cluster Analysis } 3.1 Basic Cncepts f Clustering 3.1.1 Cluster Analysis 3.1. Clustering Categries } 3. Partitining Methds 3..1 The principle 3.. K-Means Methd 3..3 K-Medids Methd 3..4 CLARA
More informationT Algorithmic methods for data mining. Slide set 6: dimensionality reduction
T-61.5060 Algrithmic methds fr data mining Slide set 6: dimensinality reductin reading assignment LRU bk: 11.1 11.3 PCA tutrial in mycurses (ptinal) ptinal: An Elementary Prf f a Therem f Jhnsn and Lindenstrauss,
More informationThe blessing of dimensionality for kernel methods
fr kernel methds Building classifiers in high dimensinal space Pierre Dupnt Pierre.Dupnt@ucluvain.be Classifiers define decisin surfaces in sme feature space where the data is either initially represented
More information4th Indian Institute of Astrophysics - PennState Astrostatistics School July, 2013 Vainu Bappu Observatory, Kavalur. Correlation and Regression
4th Indian Institute f Astrphysics - PennState Astrstatistics Schl July, 2013 Vainu Bappu Observatry, Kavalur Crrelatin and Regressin Rahul Ry Indian Statistical Institute, Delhi. Crrelatin Cnsider a tw
More informationDiscussion on Regularized Regression for Categorical Data (Tutz and Gertheiss)
Discussin n Regularized Regressin fr Categrical Data (Tutz and Gertheiss) Peter Bühlmann, Ruben Dezeure Seminar fr Statistics, Department f Mathematics, ETH Zürich, Switzerland Address fr crrespndence:
More informationLecture 8: Multiclass Classification (I)
Bayes Rule fr Multiclass Prblems Traditinal Methds fr Multiclass Prblems Linear Regressin Mdels Lecture 8: Multiclass Classificatin (I) Ha Helen Zhang Fall 07 Ha Helen Zhang Lecture 8: Multiclass Classificatin
More informationCOMP 551 Applied Machine Learning Lecture 9: Support Vector Machines (cont d)
COMP 551 Applied Machine Learning Lecture 9: Supprt Vectr Machines (cnt d) Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Class web page: www.cs.mcgill.ca/~hvanh2/cmp551 Unless therwise
More information, which yields. where z1. and z2
The Gaussian r Nrmal PDF, Page 1 The Gaussian r Nrmal Prbability Density Functin Authr: Jhn M Cimbala, Penn State University Latest revisin: 11 September 13 The Gaussian r Nrmal Prbability Density Functin
More informationIn SMV I. IAML: Support Vector Machines II. This Time. The SVM optimization problem. We saw:
In SMV I IAML: Supprt Vectr Machines II Nigel Gddard Schl f Infrmatics Semester 1 We sa: Ma margin trick Gemetry f the margin and h t cmpute it Finding the ma margin hyperplane using a cnstrained ptimizatin
More informationChapter 15 & 16: Random Forests & Ensemble Learning
Chapter 15 & 16: Randm Frests & Ensemble Learning DD3364 Nvember 27, 2012 Ty Prblem fr Bsted Tree Bsted Tree Example Estimate this functin with a sum f trees with 9-terminal ndes by minimizing the sum
More informationThe general linear model and Statistical Parametric Mapping I: Introduction to the GLM
The general linear mdel and Statistical Parametric Mapping I: Intrductin t the GLM Alexa Mrcm and Stefan Kiebel, Rik Hensn, Andrew Hlmes & J-B J Pline Overview Intrductin Essential cncepts Mdelling Design
More informationCN700 Additive Models and Trees Chapter 9: Hastie et al. (2001)
CN700 Additive Mdels and Trees Chapter 9: Hastie et al. (2001) Madhusudana Shashanka Department f Cgnitive and Neural Systems Bstn University CN700 - Additive Mdels and Trees March 02, 2004 p.1/34 Overview
More informationA Matrix Representation of Panel Data
web Extensin 6 Appendix 6.A A Matrix Representatin f Panel Data Panel data mdels cme in tw brad varieties, distinct intercept DGPs and errr cmpnent DGPs. his appendix presents matrix algebra representatins
More informationComparing Several Means: ANOVA. Group Means and Grand Mean
STAT 511 ANOVA and Regressin 1 Cmparing Several Means: ANOVA Slide 1 Blue Lake snap beans were grwn in 12 pen-tp chambers which are subject t 4 treatments 3 each with O 3 and SO 2 present/absent. The ttal
More informationSupport-Vector Machines
Supprt-Vectr Machines Intrductin Supprt vectr machine is a linear machine with sme very nice prperties. Haykin chapter 6. See Alpaydin chapter 13 fr similar cntent. Nte: Part f this lecture drew material
More informationLinear programming III
Linear prgramming III Review 1/33 What have cvered in previus tw classes LP prblem setup: linear bjective functin, linear cnstraints. exist extreme pint ptimal slutin. Simplex methd: g thrugh extreme pint
More informationSupport Vector Machines and Flexible Discriminants
12 Supprt Vectr Machines and Flexible Discriminants This is page 417 Printer: Opaque this 12.1 Intrductin In this chapter we describe generalizatins f linear decisin bundaries fr classificatin. Optimal
More informationAdmissibility Conditions and Asymptotic Behavior of Strongly Regular Graphs
Admissibility Cnditins and Asympttic Behavir f Strngly Regular Graphs VASCO MOÇO MANO Department f Mathematics University f Prt Oprt PORTUGAL vascmcman@gmailcm LUÍS ANTÓNIO DE ALMEIDA VIEIRA Department
More informationLeast Squares Optimal Filtering with Multirate Observations
Prc. 36th Asilmar Cnf. n Signals, Systems, and Cmputers, Pacific Grve, CA, Nvember 2002 Least Squares Optimal Filtering with Multirate Observatins Charles W. herrien and Anthny H. Hawes Department f Electrical
More informationPrincipal Components
Principal Cmpnents Suppse we have N measurements n each f p variables X j, j = 1,..., p. There are several equivalent appraches t principal cmpnents: Given X = (X 1,... X p ), prduce a derived (and small)
More informationEnhancing Performance of MLP/RBF Neural Classifiers via an Multivariate Data Distribution Scheme
Enhancing Perfrmance f / Neural Classifiers via an Multivariate Data Distributin Scheme Halis Altun, Gökhan Gelen Nigde University, Electrical and Electrnics Engineering Department Nigde, Turkey haltun@nigde.edu.tr
More informationLogistic Regression. and Maximum Likelihood. Marek Petrik. Feb
Lgistic Regressin and Maximum Likelihd Marek Petrik Feb 09 2017 S Far in ML Regressin vs Classificatin Linear regressin Bias-variance decmpsitin Practical methds fr linear regressin Simple Linear Regressin
More informationBayesian nonparametric modeling approaches for quantile regression
Bayesian nnparametric mdeling appraches fr quantile regressin Athanasis Kttas Department f Applied Mathematics and Statistics University f Califrnia, Santa Cruz Department f Statistics Athens University
More informationLinear Methods for Regression
3 Linear Methds fr Regressin This is page 43 Printer: Opaque this 3.1 Intrductin A linear regressin mdel assumes that the regressin functin E(Y X) is linear in the inputs X 1,...,X p. Linear mdels were
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationMaximum A Posteriori (MAP) CS 109 Lecture 22 May 16th, 2016
Maximum A Psteriri (MAP) CS 109 Lecture 22 May 16th, 2016 Previusly in CS109 Game f Estimatrs Maximum Likelihd Nn spiler: this didn t happen Side Plt argmax argmax f lg Mther f ptimizatins? Reviving an
More informationLocalized Model Selection for Regression
Lcalized Mdel Selectin fr Regressin Yuhng Yang Schl f Statistics University f Minnesta Church Street S.E. Minneaplis, MN 5555 May 7, 007 Abstract Research n mdel/prcedure selectin has fcused n selecting
More informationHow do we solve it and what does the solution look like?
Hw d we slve it and what des the slutin l lie? KF/PFs ffer slutins t dynamical systems, nnlinear in general, using predictin and update as data becmes available. Tracing in time r space ffers an ideal
More information7 TH GRADE MATH STANDARDS
ALGEBRA STANDARDS Gal 1: Students will use the language f algebra t explre, describe, represent, and analyze number expressins and relatins 7 TH GRADE MATH STANDARDS 7.M.1.1: (Cmprehensin) Select, use,
More informationMath Foundations 20 Work Plan
Math Fundatins 20 Wrk Plan Units / Tpics 20.8 Demnstrate understanding f systems f linear inequalities in tw variables. Time Frame December 1-3 weeks 6-10 Majr Learning Indicatrs Identify situatins relevant
More informationMATCHING TECHNIQUES. Technical Track Session VI. Emanuela Galasso. The World Bank
MATCHING TECHNIQUES Technical Track Sessin VI Emanuela Galass The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Emanuela Galass fr the purpse f this wrkshp When can we use
More informationinitially lcated away frm the data set never win the cmpetitin, resulting in a nnptimal nal cdebk, [2] [3] [4] and [5]. Khnen's Self Organizing Featur
Cdewrd Distributin fr Frequency Sensitive Cmpetitive Learning with One Dimensinal Input Data Aristides S. Galanpuls and Stanley C. Ahalt Department f Electrical Engineering The Ohi State University Abstract
More informationBuilding to Transformations on Coordinate Axis Grade 5: Geometry Graph points on the coordinate plane to solve real-world and mathematical problems.
Building t Transfrmatins n Crdinate Axis Grade 5: Gemetry Graph pints n the crdinate plane t slve real-wrld and mathematical prblems. 5.G.1. Use a pair f perpendicular number lines, called axes, t define
More informationHigh-dimensional regression modeling
High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making
More informationPSU GISPOPSCI June 2011 Ordinary Least Squares & Spatial Linear Regression in GeoDa
There are tw parts t this lab. The first is intended t demnstrate hw t request and interpret the spatial diagnstics f a standard OLS regressin mdel using GeDa. The diagnstics prvide infrmatin abut the
More informationCAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank
CAUSAL INFERENCE Technical Track Sessin I Phillippe Leite The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Phillippe Leite fr the purpse f this wrkshp Plicy questins are causal
More informationEASTERN ARIZONA COLLEGE Introduction to Statistics
EASTERN ARIZONA COLLEGE Intrductin t Statistics Curse Design 2014-2015 Curse Infrmatin Divisin Scial Sciences Curse Number PSY 220 Title Intrductin t Statistics Credits 3 Develped by Adam Stinchcmbe Lecture/Lab
More informationQuantum Harmonic Oscillator, a computational approach
IOSR Jurnal f Applied Physics (IOSR-JAP) e-issn: 78-4861.Vlume 7, Issue 5 Ver. II (Sep. - Oct. 015), PP 33-38 www.isrjurnals Quantum Harmnic Oscillatr, a cmputatinal apprach Sarmistha Sahu, Maharani Lakshmi
More informationBiplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint
Biplts in Practice MICHAEL GREENACRE Prfessr f Statistics at the Pmpeu Fabra University Chapter 13 Offprint CASE STUDY BIOMEDICINE Cmparing Cancer Types Accrding t Gene Epressin Arrays First published:
More informationA Scalable Recurrent Neural Network Framework for Model-free
A Scalable Recurrent Neural Netwrk Framewrk fr Mdel-free POMDPs April 3, 2007 Zhenzhen Liu, Itamar Elhanany Machine Intelligence Lab Department f Electrical and Cmputer Engineering The University f Tennessee
More informationSlide04 (supplemental) Haykin Chapter 4 (both 2nd and 3rd ed): Multi-Layer Perceptrons
Slide04 supplemental) Haykin Chapter 4 bth 2nd and 3rd ed): Multi-Layer Perceptrns CPSC 636-600 Instructr: Ynsuck Che Heuristic fr Making Backprp Perfrm Better 1. Sequential vs. batch update: fr large
More informationMidwest Big Data Summer School: Introduction to Statistics. Kris De Brabanter
Midwest Big Data Summer School: Introduction to Statistics Kris De Brabanter kbrabant@iastate.edu Iowa State University Department of Statistics Department of Computer Science June 20, 2016 1/27 Outline
More informationSIZE BIAS IN LINE TRANSECT SAMPLING: A FIELD TEST. Mark C. Otto Statistics Research Division, Bureau of the Census Washington, D.C , U.S.A.
SIZE BIAS IN LINE TRANSECT SAMPLING: A FIELD TEST Mark C. Ott Statistics Research Divisin, Bureau f the Census Washingtn, D.C. 20233, U.S.A. and Kenneth H. Pllck Department f Statistics, Nrth Carlina State
More informationContents. This is page i Printer: Opaque this
Cntents This is page i Printer: Opaque this Supprt Vectr Machines and Flexible Discriminants. Intrductin............. The Supprt Vectr Classifier.... Cmputing the Supprt Vectr Classifier........ Mixture
More informationSome Theory Behind Algorithms for Stochastic Optimization
Sme Thery Behind Algrithms fr Stchastic Optimizatin Zelda Zabinsky University f Washingtn Industrial and Systems Engineering May 24, 2010 NSF Wrkshp n Simulatin Optimizatin Overview Prblem frmulatin Theretical
More informationSource Coding and Compression
Surce Cding and Cmpressin Heik Schwarz Cntact: Dr.-Ing. Heik Schwarz heik.schwarz@hhi.fraunhfer.de Heik Schwarz Surce Cding and Cmpressin September 22, 2013 1 / 60 PartI: Surce Cding Fundamentals Heik
More informationArtificial Neural Networks MLP, Backpropagation
Artificial Neural Netwrks MLP, Backprpagatin 01001110 01100101 01110101 01110010 01101111 01101110 01101111 01110110 01100001 00100000 01110011 01101011 01110101 01110000 01101001 01101110 01100001 00100000
More informationEDA Engineering Design & Analysis Ltd
EDA Engineering Design & Analysis Ltd THE FINITE ELEMENT METHOD A shrt tutrial giving an verview f the histry, thery and applicatin f the finite element methd. Intrductin Value f FEM Applicatins Elements
More informationAP Statistics Notes Unit Two: The Normal Distributions
AP Statistics Ntes Unit Tw: The Nrmal Distributins Syllabus Objectives: 1.5 The student will summarize distributins f data measuring the psitin using quartiles, percentiles, and standardized scres (z-scres).
More informationOnline Model Racing based on Extreme Performance
Online Mdel Racing based n Extreme Perfrmance Tiantian Zhang, Michael Gergipuls, Gergis Anagnstpuls Electrical & Cmputer Engineering University f Central Flrida Overview Racing Algrithm Offline vs Online
More informationProbability, Random Variables, and Processes. Probability
Prbability, Randm Variables, and Prcesses Prbability Prbability Prbability thery: branch f mathematics fr descriptin and mdelling f randm events Mdern prbability thery - the aximatic definitin f prbability
More informationCHAPTER 24: INFERENCE IN REGRESSION. Chapter 24: Make inferences about the population from which the sample data came.
MATH 1342 Ch. 24 April 25 and 27, 2013 Page 1 f 5 CHAPTER 24: INFERENCE IN REGRESSION Chapters 4 and 5: Relatinships between tw quantitative variables. Be able t Make a graph (scatterplt) Summarize the
More informationFunctional Form and Nonlinearities
Sectin 6 Functinal Frm and Nnlinearities This is a gd place t remind urselves f Assumptin #0: That all bservatins fllw the same mdel. Levels f measurement and kinds f variables There are (at least) three
More informationModeling of ship structural systems by events
UNIVERISTY OF SPLIT FACULTY OF ELECTRICAL ENGINEERING, MECHANICAL ENGINEERING AND NAVAL ARCHITECTURE Mdeling ship structural systems by events Brank Blagjević Mtivatin Inclusin the cncept entrpy rm inrmatin
More informationLecture 10, Principal Component Analysis
Principal Cmpnent Analysis Lecture 10, Principal Cmpnent Analysis Ha Helen Zhang Fall 2017 Ha Helen Zhang Lecture 10, Principal Cmpnent Analysis 1 / 16 Principal Cmpnent Analysis Lecture 10, Principal
More informationSupport Vector Machines and Flexible Discriminants
Supprt Vectr Machines and Flexible Discriminants This is page Printer: Opaque this. Intrductin In this chapter we describe generalizatins f linear decisin bundaries fr classificatin. Optimal separating
More informationCOMP9444 Neural Networks and Deep Learning 3. Backpropagation
COMP9444 Neural Netwrks and Deep Learning 3. Backprpagatin Tetbk, Sectins 4.3, 5.2, 6.5.2 COMP9444 17s2 Backprpagatin 1 Outline Supervised Learning Ockham s Razr (5.2) Multi-Layer Netwrks Gradient Descent
More informationSTATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours
STATS216v Intrductin t Statistical Learning Stanfrd University, Summer 2016 Practice Final (Slutins) Duratin: 3 hurs Instructins: (This is a practice final and will nt be graded.) Remember the university
More informationIntroduction to Regression
Intrductin t Regressin Administrivia Hmewrk 6 psted later tnight. Due Friday after Break. 2 Statistical Mdeling Thus far we ve talked abut Descriptive Statistics: This is the way my sample is Inferential
More informationFunction notation & composite functions Factoring Dividing polynomials Remainder theorem & factor property
Functin ntatin & cmpsite functins Factring Dividing plynmials Remainder therem & factr prperty Can d s by gruping r by: Always lk fr a cmmn factr first 2 numbers that ADD t give yu middle term and MULTIPLY
More informationStatistics, Numerical Models and Ensembles
Statistics, Numerical Mdels and Ensembles Duglas Nychka, Reinhard Furrer,, Dan Cley Claudia Tebaldi, Linda Mearns, Jerry Meehl and Richard Smith (UNC). Spatial predictin and data assimilatin Precipitatin
More informationComputational Statistics
Cmputatinal Statistics Spring 2008 Peter Bühlmann and Martin Mächler Seminar für Statistik ETH Zürich February 2008 (February 23, 2011) ii Cntents 1 Multiple Linear Regressin 1 1.1 Intrductin....................................
More informationModelling of Clock Behaviour. Don Percival. Applied Physics Laboratory University of Washington Seattle, Washington, USA
Mdelling f Clck Behaviur Dn Percival Applied Physics Labratry University f Washingtn Seattle, Washingtn, USA verheads and paper fr talk available at http://faculty.washingtn.edu/dbp/talks.html 1 Overview
More informationHiding in plain sight
Hiding in plain sight Principles f stegangraphy CS349 Cryptgraphy Department f Cmputer Science Wellesley Cllege The prisners prblem Stegangraphy 1-2 1 Secret writing Lemn juice is very nearly clear s it
More informationMATHEMATICS SYLLABUS SECONDARY 5th YEAR
Eurpean Schls Office f the Secretary-General Pedaggical Develpment Unit Ref. : 011-01-D-8-en- Orig. : EN MATHEMATICS SYLLABUS SECONDARY 5th YEAR 6 perid/week curse APPROVED BY THE JOINT TEACHING COMMITTEE
More information22.54 Neutron Interactions and Applications (Spring 2004) Chapter 11 (3/11/04) Neutron Diffusion
.54 Neutrn Interactins and Applicatins (Spring 004) Chapter (3//04) Neutrn Diffusin References -- J. R. Lamarsh, Intrductin t Nuclear Reactr Thery (Addisn-Wesley, Reading, 966) T study neutrn diffusin
More informationINSTRUMENTAL VARIABLES
INSTRUMENTAL VARIABLES Technical Track Sessin IV Sergi Urzua University f Maryland Instrumental Variables and IE Tw main uses f IV in impact evaluatin: 1. Crrect fr difference between assignment f treatment
More informationYou need to be able to define the following terms and answer basic questions about them:
CS440/ECE448 Sectin Q Fall 2017 Midterm Review Yu need t be able t define the fllwing terms and answer basic questins abut them: Intr t AI, agents and envirnments Pssible definitins f AI, prs and cns f
More informationOptimization of frequency quantization. VN Tibabishev. Keywords: optimization, sampling frequency, the substitution frequencies.
UDC 519.21 Otimizatin f frequency quantizatin VN Tibabishev Asvt51@nard.ru We btain the functinal defining the rice and quality f samle readings f the generalized velcities. It is shwn that the timal samling
More informationGroup Report Lincoln Laboratory. The Angular Resolution of Multiple Target. J. R. Sklar F. C. Schweppe. January 1964
Grup Reprt 1964-2 The Angular Reslutin f Multiple Target J. R. Sklar F. C. Schweppe January 1964 Prepared. tract AF 19 (628)-500 by Lincln Labratry MASSACHUSETTS INSTITUTE OF TECHNOLOGY Lexingtn, M The
More informationMethods for Determination of Mean Speckle Size in Simulated Speckle Pattern
0.478/msr-04-004 MEASUREMENT SCENCE REVEW, Vlume 4, N. 3, 04 Methds fr Determinatin f Mean Speckle Size in Simulated Speckle Pattern. Hamarvá, P. Šmíd, P. Hrváth, M. Hrabvský nstitute f Physics f the Academy
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 7/8 - High-dimensional modeling part 1 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Classification
More informationKinetics of Particles. Chapter 3
Kinetics f Particles Chapter 3 1 Kinetics f Particles It is the study f the relatins existing between the frces acting n bdy, the mass f the bdy, and the mtin f the bdy. It is the study f the relatin between
More informationData Mining: Concepts and Techniques. Classification and Prediction. Chapter February 8, 2007 CSE-4412: Data Mining 1
Data Mining: Cncepts and Techniques Classificatin and Predictin Chapter 6.4-6 February 8, 2007 CSE-4412: Data Mining 1 Chapter 6 Classificatin and Predictin 1. What is classificatin? What is predictin?
More informationCS 109 Lecture 23 May 18th, 2016
CS 109 Lecture 23 May 18th, 2016 New Datasets Heart Ancestry Netflix Our Path Parameter Estimatin Machine Learning: Frmally Many different frms f Machine Learning We fcus n the prblem f predictin Want
More information