Weighted Principal Support Vector Machines for Sufficient Dimension Reduction in Binary Classification


1 Weighted Principal Support Vector Machines for Sufficient Dimension Reduction in Binary Classification. A joint work with Seung Jun Shin, Hao Helen Zhang, and Yufeng Liu.

2 Outline

3 Outline

4 Sufficient Dimension Reduction. For a given pair $(Y, X) \in \mathbb{R} \times \mathbb{R}^p$, Sufficient Dimension Reduction (SDR) seeks a matrix $B = (b_1, \ldots, b_d) \in \mathbb{R}^{p \times d}$ which satisfies
$Y \perp\!\!\!\perp X \mid B^\top X.$ (1)

5 Central Subspace. A Dimension Reduction Subspace (DRS) is any subspace $\mathrm{span}(B) \subseteq \mathbb{R}^p$ for which (1) holds. The central subspace $\mathcal{S}_{Y|X}$ is the intersection of all DRSes. $\mathcal{S}_{Y|X}$ has the minimum dimension among all DRSes and exists uniquely under very mild conditions (Cook, 1998, Prop. 6.4). We assume $\mathcal{S}_{Y|X} = \mathrm{span}(B)$. The dimension of $\mathcal{S}_{Y|X}$, denoted $d$, is called the structure dimension.

6 Estimation of $\mathcal{S}_{Y|X}$. Seminal paper in the early 1990s: K.-C. Li (1991), Sliced Inverse Regression for Dimension Reduction (with discussion), JASA, 86. Many other methods followed: Sliced Average Variance Estimation (SAVE, 1991), Principal Hessian Directions (pHd, 1992), Contour Regression (2005), Fourier-Transformation-Based Estimation (2005), Directional Regression (2007), Cumulative Sliced Regression (CUME, 2008), and many others.

7 Sliced Inverse Regression. Foundation of SIR: under the linearity condition,
$E(Z \mid Y) \in \mathcal{S}_{Y|Z} = \Sigma^{1/2}\mathcal{S}_{Y|X}$, where $Z = \Sigma^{-1/2}\{X - E(X)\}$.
Slice the response into $H$ non-overlapping intervals $I_1, \ldots, I_H$ and compute
$\hat m_h := E_n(Z \mid Y \in I_h) = \frac{1}{n_h}\sum_{Y_i \in I_h} z_i, \quad h = 1, \ldots, H.$
$B$ is estimated by premultiplying the first $d$ leading eigenvectors of $\sum_{h=1}^H \hat m_h \hat m_h^\top$ by $\hat\Sigma^{-1/2}$.
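Below is a minimal numerical sketch of the SIR estimator just described, assuming NumPy/SciPy and an equal-count slicing scheme with an illustrative number of slices H; it is not the presenters' implementation.

```python
import numpy as np
from numpy.linalg import inv, eigh
from scipy.linalg import sqrtm

def sir_directions(X, y, H=10, d=2):
    """Minimal SIR sketch: standardize X, average Z within response slices,
    eigen-decompose the slice-mean outer-product matrix, and back-transform."""
    n, p = X.shape
    Sigma = np.cov(X, rowvar=False)
    Sigma_inv_sqrt = inv(sqrtm(Sigma).real)
    Z = (X - X.mean(axis=0)) @ Sigma_inv_sqrt          # standardized predictors
    edges = np.quantile(y, np.linspace(0, 1, H + 1))   # slice boundaries
    edges[-1] += 1e-12                                 # make the top slice inclusive
    M = np.zeros((p, p))
    for h in range(H):
        idx = (y >= edges[h]) & (y < edges[h + 1])
        if idx.sum() == 0:
            continue
        m_h = Z[idx].mean(axis=0)                      # slice mean of Z, as on the slide
        M += np.outer(m_h, m_h)
    _, vecs = eigh(M)                                  # ascending eigenvalues
    return Sigma_inv_sqrt @ vecs[:, ::-1][:, :d]       # premultiply leading d eigenvectors
```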

8 SIR with Binary Response. If $Y \in \{-1, 1\}$ is binary, there is only one possible way to slice: $I_1 = \{i : y_i = 1\}$ and $I_2 = \{i : y_i = -1\}$. The associated slice means $\bar z_1$ and $\bar z_2$ are linearly dependent since $\bar z_n = 0$. Hence SIR can estimate at most ONE direction.

9 Illustration: Wisconsin Diagnostic Breast Cancer Data. Figure panels: (a) SIR ($Y$ vs. $\hat b_1^\top X$); (b) SAVE ($\hat b_1^\top X$ vs. $\hat b_2^\top X$).

10 Outline

11 Principal Support Vector Machine. For $(Y, X) \in \mathbb{R} \times \mathbb{R}^p$, the PSVM (Li et al., 2011, AOS) solves the following SVM-like problem:
$(a_{0,c}, b_{0,c}) = \arg\min_{a,b}\ \underbrace{b^\top \Sigma b}_{\mathrm{Var}(b^\top X)} + \lambda E\big[1 - \tilde Y_c\underbrace{\{a + b^\top (X - EX)\}}_{f(X)}\big]_+,$
where $\tilde Y_c = 1\{Y \ge c\} - 1\{Y < c\}$ for a given constant $c$, $\Sigma = \mathrm{cov}(X)$, and $[u]_+ = \max(0, u)$.
Foundation of the PSVM: under the linearity condition, $b_{0,c} \in \mathcal{S}_{Y|X}$ for any given $c$.

12 PSVM: Sample Estimation. Given a set of data $(X_i, Y_i),\ i = 1, \ldots, n$:
1. For a given grid $\min Y_i < c_1 < \cdots < c_H < \max Y_i$, solve a sequence of PSVMs for the different values of $c_h$:
$(\hat a_{n,h}, \hat b_{n,h}) = \arg\min_{a,b}\ b^\top \hat\Sigma_n b + \frac{\lambda}{n}\sum_{i=1}^n \big[1 - \tilde Y_{i,c_h}\{a + b^\top (X_i - \bar X_n)\}\big]_+.$
2. The first $k$ leading eigenvectors of $\hat M_n^{L} = \sum_{h=1}^H \hat b_{n,h}\hat b_{n,h}^\top$ estimate a basis set of $\mathcal{S}_{Y|X}$.

13 PSVM: Remarks Pros: Outperforms SIR. Can be extended to kernel PSVM to handle nonlinear SDR. Cons: Estimates only one direction if Y is binary.

14 Weighted Principal Support Vector Machines. Toward SDR with binary $Y$, the WPSVM minimizes
$\Lambda_\pi(\theta) = \beta^\top \Sigma \beta + \lambda E\Big[\pi(Y)\big\{1 - Y(\alpha + \beta^\top (X - EX))\big\}_+\Big],$
where $\theta = (\alpha, \beta^\top)^\top \in \mathbb{R} \times \mathbb{R}^p$, $\pi(Y) = 1 - \pi$ if $Y = 1$ and $\pi$ otherwise for a given $\pi \in (0, 1)$, and $Y$ itself is binary (no need for $\tilde Y_c$). Let $\theta_{0,\pi} = (\alpha_{0,\pi}, \beta_{0,\pi}^\top)^\top = \arg\min_\theta \Lambda_\pi(\theta)$.
Foundation of the Weighted PSVM: under the linearity condition, $\beta_{0,\pi} \in \mathcal{S}_{Y|X}$ for any given $\pi \in (0, 1)$.

15 Sample Estimation. Given $(X_i, Y_i) \in \mathbb{R}^p \times \{1, -1\},\ i = 1, \ldots, n$:
1. For a given grid of $\pi$, $0 < \pi_1 < \cdots < \pi_H < 1$, solve a sequence of WPSVMs,
$\hat\Lambda_{n,\pi_h}(\theta) = \beta^\top \hat\Sigma_n \beta + \frac{\lambda}{n}\sum_{i=1}^n \pi_h(Y_i)\big[1 - Y_i\{\alpha + \beta^\top (X_i - \bar X_n)\}\big]_+,$
and let $\hat\theta_{n,h} = (\hat\alpha_{n,h}, \hat\beta_{n,h}^\top)^\top = \arg\min_\theta \hat\Lambda_{n,\pi_h}(\theta)$.
2. The first $k$ leading eigenvectors of the WPSVM candidate matrix $\hat M_n^{WL} = \sum_{h=1}^H \hat\beta_{n,h}\hat\beta_{n,h}^\top$ estimate a basis set of $\mathcal{S}_{Y|X}$.

16 Computation. Let $\eta = \hat\Sigma_n^{1/2}\beta$ and $U_i = \hat\Sigma_n^{-1/2}(X_i - \bar X_n)$. The WPSVM objective function $\hat\Lambda_{n,\pi_h}(\theta)$ becomes
$\eta^\top \eta + \frac{\lambda}{n}\sum_{i=1}^n \pi_h(Y_i)\big[1 - Y_i(\alpha + \eta^\top U_i)\big]_+.$
This is equivalent to solving the linear WSVM with respect to $(U_i, Y_i)$. Solve the WSVM $H$ times for the different weights $\pi_h,\ h = 1, \ldots, H$.
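A hedged sketch of slides 15 and 16: each WPSVM reduces to a weighted linear SVM on the standardized predictors $U_i$, which can be fit, for example, with scikit-learn's SVC and per-sample weights. The π grid, the mapping $C = \lambda/(2n)$, and the function name are illustrative assumptions, not the authors' code.

```python
import numpy as np
from numpy.linalg import inv, eigh
from scipy.linalg import sqrtm
from sklearn.svm import SVC

def wpsvm_directions(X, y, lam=1.0, pis=np.linspace(0.1, 0.9, 9), k=2):
    """Linear WPSVM sketch: for each weight pi, fit a weighted linear SVM on
    the standardized predictors, map the coefficients back to the X scale,
    and eigen-decompose the accumulated candidate matrix."""
    n, p = X.shape
    y = np.where(y > 0, 1, -1)                       # labels coded as {-1, +1}
    S_inv_half = inv(sqrtm(np.cov(X, rowvar=False)).real)
    U = (X - X.mean(axis=0)) @ S_inv_half            # U_i = Sigma_n^{-1/2}(X_i - Xbar_n)
    M = np.zeros((p, p))
    for pi in pis:
        w = np.where(y == 1, 1.0 - pi, pi)           # pi(Y): 1 - pi for Y = +1, pi for Y = -1
        svm = SVC(kernel="linear", C=lam / (2 * n))  # assumed lambda-to-C scaling
        svm.fit(U, y, sample_weight=w)
        eta = svm.coef_.ravel()
        beta = S_inv_half @ eta                      # back-transform: beta = Sigma_n^{-1/2} eta
        M += np.outer(beta, beta)                    # WPSVM candidate matrix
    _, vecs = eigh(M)
    return vecs[:, ::-1][:, :k]                      # first k leading eigenvectors
```

The returned columns estimate a basis of $\mathcal{S}_{Y|X}$; the binary response should be coded as ±1 before calling the function.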

17 π-path. Wang et al. (2008, Biometrika) show that the WSVM solution moves piecewise-linearly as a function of $\pi$. Shin et al. (2013, JCGS) implemented the π-path algorithm in R while developing a two-dimensional solution surface for weighted SVMs. Figure: solution paths of the linear WPSVM, $\hat\beta_n$ versus $\pi$.

18 Asymptotic Results (1). Standard approach based on the M-estimation scheme, similar to the results for the linear SVM (Koo et al., 2008, JMLR; Jiang et al., 2008, JMLR). Consistency of $\hat\theta_n$: if $\Sigma$ is positive definite, then $\hat\theta_n \to \theta_0$ in probability.

19 Asymptotic Results (2). Asymptotic normality of $\hat\theta_n$ (a Bahadur representation). Under regularity conditions ensuring the existence of both the gradient vector $D_\theta$ and the Hessian matrix $H_\theta$ of $\Lambda_\pi(\theta)$,
$\sqrt{n}(\hat\theta_n - \theta_0) = -n^{-1/2} H_{\theta_0}^{-1}\sum_{i=1}^n D_{\theta_0}(Z_i) + o_p(1),$
where
$D_\theta(Z) = \big(0, (2\Sigma\beta)^\top\big)^\top - \lambda\,\pi(Y)\,\tilde X Y\, 1\{\theta^\top \tilde X Y < 1\}$
and
$H_\theta = 2\,\mathrm{diag}(0, \Sigma) + \lambda \sum_{y = -1, 1} P(Y = y)\,\pi(y)\, f_{\beta^\top X \mid Y}(y - \alpha \mid y)\, E(\tilde X \tilde X^\top \mid \theta^\top \tilde X = y),$
with $\tilde X = (1, X^\top)^\top$.

20 Asymptotic Results (3). For a given grid $\pi_1 < \cdots < \pi_H$, define the population WPSVM kernel matrix $M_0^{WL} = \sum_{h=1}^H \beta_{0,h}\beta_{0,h}^\top$.
Asymptotic normality of $\hat M_n^{WL}$: suppose $\mathrm{rank}(M_0^{WL}) = k$. Under the regularity conditions,
$\sqrt{n}\big\{\mathrm{vec}(\hat M_n^{WL}) - \mathrm{vec}(M_0^{WL})\big\} \to N(0, \Sigma_M),$
where $\Sigma_M$ is explicitly provided. Asymptotic normality of the eigenvectors of $\hat M_n^{WL}$ then follows from the asymptotic normality of $\hat M_n^{WL}$ (Bura & Pfeiffer, 2008).

21 Structure Dimensionality $k$ Selection. We estimate $k$ as
$\hat k = \arg\max_{k \in \{1,\ldots,p\}}\ \Big\{\sum_{j=1}^{k} \upsilon_j - \rho\,\frac{k \log n}{n}\,\upsilon_1\Big\},$
where $\upsilon_1 \ge \cdots \ge \upsilon_p$ are the eigenvalues of $\hat M_n$. Then $\lim_{n\to\infty} P(\hat k = k) = 1$.
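A small sketch of this order-determination rule, assuming the penalty has the displayed form $\rho\, k \log(n)/n \cdot \upsilon_1$; the tuning constant ρ would be chosen by the ρ-selection procedure given later.

```python
import numpy as np

def select_k(eigvals, n, rho):
    """Pick k maximizing the sum of the k largest eigenvalues minus the
    rho * k * log(n)/n * v_1 penalty from the slide above."""
    v = np.sort(eigvals)[::-1]                 # v_1 >= ... >= v_p
    scores = [v[:k].sum() - rho * k * np.log(n) / n * v[0]
              for k in range(1, len(v) + 1)]
    return int(np.argmax(scores)) + 1
```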

22 Outline

23 Nonlinear SDR. Nonlinear SDR assumes $Y \perp\!\!\!\perp X \mid \phi(X)$, where $\phi: \mathbb{R}^p \to \mathbb{R}^k$ is an arbitrary function of $X$ living in $\mathcal{H}$, a Hilbert space of functions of $X$. SDR is achieved by estimating $\phi$.

24 Kernel WPSVM: Objective Function. The kernel WPSVM objective function is
$\Lambda_\pi(\alpha, \psi) = \mathrm{var}\{\psi(X)\} + \lambda E\big\{\pi(Y)\big[1 - Y\{\alpha + \psi(X) - E\psi(X)\}\big]_+\big\}.$
The kernel WPSVM solves $(\alpha_{0,\pi}, \psi_{0,\pi}) = \arg\min_{\alpha \in \mathbb{R},\, \psi \in \mathcal{H}} \Lambda_\pi(\alpha, \psi)$.

25 Kernel WPSVM: Foundation. Foundation of the Kernel WPSVM: for a given $\pi$, $\psi_{0,\pi}$ has a version that is $\sigma\{\phi(X)\}$-measurable. Roughly speaking, $\psi_{0,\pi}$ is a function of $\phi$. This is a nonlinear generalization of the linear SDR result that $\beta_{0,\pi} \in \mathcal{S}_{Y|X} = \mathrm{span}(B)$, i.e., that $\beta_{0,\pi}^\top X$ is a linear function of $B^\top X$.

26 Kernel WPSVM: Sample Estimation. Use a reproducing kernel Hilbert space. Using the linear operator $\Sigma$ defined by $\langle \psi_1, \Sigma\psi_2\rangle_{\mathcal H} = \mathrm{cov}\{\psi_1(X), \psi_2(X)\}$,
$\Lambda_\pi(\alpha, \psi) = \langle \psi, \Sigma\psi\rangle_{\mathcal H} + \lambda E\big\{\pi(Y)\big[1 - Y\{\alpha + \psi(X) - E\psi(X)\}\big]_+\big\}.$
Li et al. (2011) proposed using, as a basis set, the first $d$ leading eigenfunctions of the operator $\Sigma_n: \mathcal H \to \mathcal H$ defined by $\langle \psi_1, \Sigma_n\psi_2\rangle_{\mathcal H} = \mathrm{cov}_n\{\psi_1(X), \psi_2(X)\}$. By Proposition 2 of Li et al. (2011), $\omega_j(X),\ j = 1, \ldots, d$, can be readily obtained from the eigen-decomposition of $(I_n - J_n)K_n(I_n - J_n)$. We chose $d = n/4$.

27 Kernel WPSVM: Sample Estimation. The sample version of $\Lambda_\pi(\alpha, \psi)$ is
$\hat\Lambda_{n,\pi}(\alpha, \gamma) = \gamma^\top \Omega^\top \Omega \gamma + \lambda\sum_{i=1}^n \pi(Y_i)\big[1 - Y_i\{\alpha + \gamma^\top \Omega_i\}\big]_+.$
Let $\omega_1, \ldots, \omega_d$ be the first $d$ leading eigenfunctions of the operator $\Sigma_n$. Then
$\Omega = \begin{pmatrix} \omega_1^*(X_1) & \cdots & \omega_d^*(X_1) \\ \vdots & & \vdots \\ \omega_1^*(X_n) & \cdots & \omega_d^*(X_n) \end{pmatrix},$
where $\omega_j^*(X) = \omega_j(X) - n^{-1}\sum_{i=1}^n \omega_j(X_i)$.
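A sketch of how the matrix Ω can be formed in practice from a kernel matrix, following the eigen-decomposition of $(I_n - J_n)K_n(I_n - J_n)$ mentioned on slide 26; the Gaussian kernel and its bandwidth are illustrative assumptions.

```python
import numpy as np
from numpy.linalg import eigh

def build_basis(X, d, bandwidth=1.0):
    """Evaluate d leading basis functions at the sample points by
    eigen-decomposing the doubly centered kernel matrix (I - J) K (I - J)."""
    n = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-bandwidth * sq_dists)          # Gaussian kernel (assumed choice)
    J = np.ones((n, n)) / n
    Kc = (np.eye(n) - J) @ K @ (np.eye(n) - J)
    _, vecs = eigh(Kc)                         # ascending eigenvalues
    Psi = vecs[:, ::-1][:, :d]                 # leading d eigenvectors ~ omega_j(X_i)
    return Psi - Psi.mean(axis=0)              # omega_j*(X_i): column-centered
```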

28 Kernel WPSVM: Dual Problem. Dual formulation:
$\hat\nu = \arg\max_{\nu_1,\ldots,\nu_n}\ \sum_{i=1}^n \nu_i - \frac{1}{4}\sum_{i=1}^n\sum_{j=1}^n \nu_i\nu_j Y_i Y_j P_\Omega^{(i,j)}$
subject to (i) $0 \le \nu_i \le \lambda\pi(Y_i),\ i = 1, \ldots, n$, and (ii) $\sum_{i=1}^n \nu_i Y_i = 0$, where $P_\Omega^{(i,j)}$ is the $(i,j)$th element of $P_\Omega = \Omega(\Omega^\top\Omega)^{-1}\Omega^\top$. The kernel WPSVM solution is given by
$\hat\gamma_n = \frac{\lambda}{2}\sum_{i=1}^n \hat\nu_i Y_i\,\{(\Omega^\top\Omega)^{-1}\Omega_i\}.$

29 Kernel WPSVM: Algorithm.
1. For a given grid $\pi_1 < \cdots < \pi_H$, compute a sequence of kernel WPSVM solutions: $(\hat\alpha_{n,h}, \hat\gamma_{n,h}) = \arg\min_{\alpha,\gamma} \hat\Lambda_{n,\pi_h}(\alpha, \gamma)$.
2. The corresponding candidate matrix is $\sum_{h=1}^H \hat\gamma_{n,h}\hat\gamma_{n,h}^\top$. (2)
3. Let $\hat V_n = (\hat v_1, \ldots, \hat v_k)$ denote the first $k$ leading eigenvectors of (2); then $\hat\phi(x) = \hat V_n^\top\big(\omega_1^*(x), \ldots, \omega_d^*(x)\big)^\top$.
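To tie the pieces together, here is a hedged sketch of the kernel WPSVM fit that reuses `build_basis` from the earlier sketch and the same weighted-SVM whitening reduction as in the linear case, rather than the dual on slide 28; the λ-to-C scaling and the π grid are assumptions.

```python
import numpy as np
from numpy.linalg import eigh, inv
from scipy.linalg import sqrtm
from sklearn.svm import SVC

def kernel_wpsvm(X, y, lam=1.0, d=50, k=2, pis=np.linspace(0.1, 0.9, 9)):
    """Kernel WPSVM sketch: weighted linear SVMs on the basis expansion Omega,
    then the leading eigenvectors of the accumulated gamma gamma' matrix."""
    Omega = build_basis(X, d)                       # basis matrix from the earlier sketch
    n = X.shape[0]
    G_inv_half = inv(sqrtm(Omega.T @ Omega).real)   # whitening, as in the linear computation
    U = Omega @ G_inv_half                          # so ||eta||^2 = gamma' Omega'Omega gamma
    M = np.zeros((d, d))
    for pi in pis:
        w = np.where(y == 1, 1.0 - pi, pi)          # pi(Y) weights
        svm = SVC(kernel="linear", C=lam / (2 * n)) # assumed lambda-to-C scaling
        svm.fit(U, y, sample_weight=w)
        gamma = G_inv_half @ svm.coef_.ravel()      # back to the gamma scale
        M += np.outer(gamma, gamma)                 # candidate matrix (2)
    _, vecs = eigh(M)
    V = vecs[:, ::-1][:, :k]                        # first k leading eigenvectors
    return Omega @ V                                # phi_hat evaluated at the sample points
```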

30 Outline

31 Simulation - Set Up. $X_i = (X_{i1}, \ldots, X_{ip})^\top \sim N_p(0, I)$, $i = 1, \ldots, n$, where $(n, p) = (500, 10)$. We consider five models:
Model I: $Y = \mathrm{sign}\{X_1/[0.5 + (X_2 + 1)^2] + 0.2\epsilon\}$.
Model II: $Y = \mathrm{sign}\{(X_1 + 0.5)(X_2 - 0.5)^2 + 0.2\epsilon\}$.
Model III: $Y = \mathrm{sign}\{\sin(X_1)/e^{X_2} + 0.2\epsilon\}$.
Model IV: $Y = \mathrm{sign}\{X_1(X_1 + X_2 + 1) + 0.2\epsilon\}$.
Model V: $Y = \mathrm{sign}\{(X_1^2 + X_2^2)^{1/2}\log(X_1^2 + X_2^2)^{1/2} + 0.2\epsilon\}$.
$B = (e_1, e_2)$ such that $e_i^\top X = X_i$, $i = 1, 2$ (so $k = 2$). Performance is measured by $\|P_{\hat B} - P_B\|_F$, where $P_A = A(A^\top A)^{-1}A^\top$ and $\|\cdot\|_F$ denotes the Frobenius norm.
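A small helper for the accuracy measure above, assuming NumPy; it simply forms the two projection matrices and their Frobenius distance.

```python
import numpy as np

def proj(A):
    """Projection matrix onto the column space of A: A (A'A)^{-1} A'."""
    return A @ np.linalg.inv(A.T @ A) @ A.T

def frobenius_distance(B_hat, B):
    """|| P_{B_hat} - P_B ||_F, the performance measure on the slide above."""
    return np.linalg.norm(proj(B_hat) - proj(B), "fro")
```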

32 True Classification Function. Figure: surface plots of Models IV and V ((c) Model IV, (d) Model V).

33 Results - Linear WPSVM. Table: averaged F-distance measures over 100 independent repetitions, with associated standard deviations in parentheses.
Model   SAVE    pHd     Fourier  IHT     LWPSVM
I       (.161)  (.193)  (.156)   (.254)  (.171)
II      (.187)  (.186)  (.214)   (.199)  (.198)
III     (.186)  (.198)  (.163)   (.232)  (.180)
IV      (.272)  (.194)  (.103)   (.105)  (.101)
V       (.052)  (.053)  (.241)   (.011)  (.171)

34 Results - Structure Dimensionality. Table: empirical probabilities (in percentage) of correctly estimating the true k, based on 100 independent repetitions, for SAVE and WPSVM at p = 10 and p = 20, by model and sample size n. SAVE uses the permutation test (Cook and Yin, 2001).

35 Results - Kernel WPSVM. Figure: nonlinear SDR results for a random data set from Model V; panels (a) original predictors ($X_1$ vs. $X_2$), (b) SAVE ($\hat b_1^\top X$ vs. $\hat b_2^\top X$), (c) linear WPSVM ($\hat b_1^\top X$ vs. $\hat b_2^\top X$).

36 Results - Kernel WPSVM. Figure: kernel WPSVM results for a random data set from Model V; panels (a) kernel WPSVM ($\hat\phi_1(X)$ vs. $\hat\phi_2(X)$), (b) estimated vs. true: $\hat\phi_1(X)$ vs. $(X_1^2 + X_2^2)^{1/2}$.

37 Results - Kernel WPSVM. Two-sample Hotelling's $T^2$ test statistic:
$T_n^2 = (\bar X_1 - \bar X_{-1})^\top\Big\{\hat\Sigma\Big(\frac{1}{n_1} + \frac{1}{n_{-1}}\Big)\Big\}^{-1}(\bar X_1 - \bar X_{-1}).$
Table: averaged $T_n^2$ computed from the first two estimated sufficient predictors over 100 independent repetitions, with standard deviations in parentheses.
Model   SAVE    pHd     FCN     IHT     LWPSVM  KWPSVM
IV      (30.7)  (20.6)  (25.7)  (24.5)  (25.7)  (71.9)
V       (1.2)   (1.1)   (4.2)   (4.5)   (4.6)   (78.1)
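A sketch of the two-sample statistic above computed on estimated sufficient predictors; the pooled-covariance estimate of $\hat\Sigma$ is an assumption about the exact form used.

```python
import numpy as np

def hotelling_t2(S, y):
    """Two-sample Hotelling T^2 on sufficient predictors S (n x k),
    comparing the Y = +1 and Y = -1 groups with a pooled covariance."""
    S1, S2 = S[y == 1], S[y == -1]
    n1, n2 = len(S1), len(S2)
    diff = S1.mean(axis=0) - S2.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(S1, rowvar=False)
              + (n2 - 1) * np.cov(S2, rowvar=False)) / (n1 + n2 - 2)
    return float(diff @ np.linalg.inv(pooled * (1 / n1 + 1 / n2)) @ diff)
```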

38 WDBC data - SDR results. Figure panels: (a) k-selection ($G_n(M)$ vs. structure dimension), (b) linear WPSVM ($\hat b_1^\top X$, $\hat b_2^\top X$, $\hat b_3^\top X$), (c) kernel WPSVM ($\hat\phi_1(X)$ vs. $\hat\phi_2(X)$).

39 WDBC data - Classification Accuracy. 3-NN test error rate for the raw data: 7.7% (1.2).
Table: averaged test error rates (in percentage) of the kNN classifier (κ = 3) over 100 random partitions for the WDBC data, using the first k sufficient predictors (k = 1, 2, 3, 4, 5) estimated by each SDR method; standard deviations in parentheses.
k   SAVE    pHd     FCN     IHT     LWPSVM  KWPSVM
1   (2.2)   (4.5)   (2.3)   (3.1)   (1.1)   (2.0)
2   (1.8)   (4.8)   (2.0)   (2.8)   (1.1)   (2.0)
3   (1.7)   (5.0)   (1.8)   (1.5)   (1.2)   (2.0)
4   (1.6)   (4.6)   (1.9)   (1.4)   (1.3)   (1.9)
5   (1.8)   (4.2)   (1.8)   (1.3)   (1.5)   (1.9)

40 Outline

41 Concluding Remarks. Most existing SDR methods suffer if Y is binary. The proposed WPSVM preserves all the merits of the PSVM and performs very well in binary classification. Its computational efficiency can be further improved by employing the π-path algorithm.

42 Thank you!!!

43 Selected References
Li, K.-C. (1991), "Sliced Inverse Regression for Dimension Reduction" (with discussion), Journal of the American Statistical Association, 86.
Li, B., Artemiou, A., and Li, L. (2011), "Principal Support Vector Machines for Linear and Nonlinear Sufficient Dimension Reduction," Annals of Statistics, 39.
Wang, J., Shen, X., and Liu, Y. (2008), "Probability Estimation for Large-Margin Classifiers," Biometrika, 95.
Shin, S. J., Wu, Y., and Zhang, H. H. (2013), "Two-Dimensional Solution Surface of the Weighted Support Vector Machines," Journal of Computational and Graphical Statistics, to appear.
Koo, J.-Y., Lee, Y., Kim, Y., and Park, C. (2008), "A Bahadur Representation of the Linear Support Vector Machine," Journal of Machine Learning Research, 9 (Jul).
Jiang, B., Zhang, X., and Cai, T. (2008), "Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers," Journal of Machine Learning Research, 9 (Mar).
Bura, E. and Pfeiffer, C. (2008), "On the Distribution of the Left Singular Vectors of a Random Matrix and Its Applications," Statistics and Probability Letters, 78.

44 Linearity Condition. Linearity condition: for any $a \in \mathbb{R}^p$, $E(a^\top X \mid B^\top X)$ is a linear function of $B^\top X$; equivalently,
$E(X \mid B^\top X) = P_\Sigma(B)X, \quad P_\Sigma(B) = B(B^\top \Sigma B)^{-1}B^\top \Sigma.$
This is a common and essential assumption in SDR. It is hard to check directly since $B$ is unknown. It holds if $X$ is elliptically symmetric (e.g., multivariate normal), and it holds approximately as $p$ grows for fixed $d$ (Hall and Li, 1993). The assumption concerns only the marginal distribution of $X$.

45 ρ Selection
1. Randomly split the data into training and testing sets.
2. Apply the WPSVM to the training set and compute its candidate matrix $\tilde M_n^{tr}$.
3. For a given ρ:
3.a Compute $\hat k^{tr} = \arg\max_{k \in \{1,\ldots,p\}} G_n(k; \rho, \tilde M_n^{tr})$.
3.b Transform the training predictors, $\tilde X_j^{tr} = (\tilde V_n^{tr})^\top X_j^{tr}$, where $\tilde V_n^{tr} = (\tilde v_1^{tr}, \ldots, \tilde v_{\hat k^{tr}}^{tr})$ are the first $\hat k^{tr}$ leading eigenvectors of $\tilde M_n^{tr}$.
3.c For each $\pi_h,\ h = 1, \ldots, H$, apply the WSVM to $\{(\tilde X_j^{tr}, Y_j^{tr}) : j = 1, \ldots, n_{tr}\}$ to predict $Y_j^{ts}$.
3.d Denoting the predicted labels by $\hat Y_j^{ts}$, compute the total cost on the test data set:
$TC(\rho) = \sum_{h=1}^H \sum_{j=1}^{n_{ts}} \pi_h(Y_j^{ts})\, 1(\hat Y_j^{ts} \ne Y_j^{ts}).$
4. Repeat steps 3.a through 3.d over a grid of ρ and select the ρ that minimizes $TC(\rho)$.
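A compact sketch of step 3.d, assuming the π(Y) weighting defined on slide 14; `y_pred_by_pi` holds the WSVM test-set predictions for each π_h and is a hypothetical input name.

```python
import numpy as np

def total_cost(y_true, y_pred_by_pi, pis):
    """TC(rho): sum over the pi grid of pi(Y)-weighted test misclassifications.
    y_pred_by_pi[h] holds the test predictions from the WSVM fitted with pi_h."""
    tc = 0.0
    for pi, y_hat in zip(pis, y_pred_by_pi):
        w = np.where(y_true == 1, 1.0 - pi, pi)     # pi(Y) as defined earlier
        tc += np.sum(w * (y_hat != y_true))
    return tc
```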
