Kernel-Based Contrast Functions for Sufficient Dimension Reduction


Slide 1: Kernel-Based Contrast Functions for Sufficient Dimension Reduction
Michael I. Jordan, Departments of Statistics and EECS, University of California, Berkeley
Joint work with Kenji Fukumizu and Francis Bach

Slide 2: Outline
- Introduction: dimension reduction and conditional independence
- Conditional covariance operators on RKHS
- Kernel Dimensionality Reduction for regression
- Manifold KDR
- Summary

Slide 3: Sufficient Dimension Reduction
- Regression setting: observe (X, Y) pairs, where the covariate X is high-dimensional
- Find a (hopefully small) subspace S of the covariate space that retains the information pertinent to the response Y
- Semiparametric formulation: treat the conditional distribution p(y | X) nonparametrically, and estimate the parameter S

Slide 4: Perspectives
- Classically the covariate vector X has been treated as ancillary in regression
- The sufficient dimension reduction (SDR) literature has aimed at making use of the randomness in X (in settings where this is reasonable)
- This has generally been achieved via inverse regression, at the cost of introducing strong assumptions on the distribution of the covariate X
- We'll make use of the randomness in X without employing inverse regression

Slide 5: Dimension Reduction for Regression
- Regression: p(Y | X), where Y is the response variable and X = (X_1, ..., X_m) is an m-dimensional covariate
- Goal: find the central subspace, which is defined via
  p(Y | X) = p~(Y | B^T X) = p~(Y | b_1^T X, ..., b_d^T X)

Slide 6: Example
- Y ~ N(0, exp(X_1) + 1): the conditional law of Y depends on X_1 only
- Central subspace = the X_1 axis
- [Figure: scatter plots of Y against X_1 and against X_2]
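
A minimal numpy sketch of a synthetic data set in the spirit of this slide (an illustration only: the model is read as Y | X ~ N(0, exp(X_1) + 1) with the second argument taken as a variance, and the sample size and covariate range are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
N = 200
X = rng.uniform(-1.5, 1.5, size=(N, 2))              # 2-dimensional covariate (X_1, X_2)
Y = rng.normal(0.0, np.sqrt(np.exp(X[:, 0]) + 1.0))   # noise level driven by X_1 only
# p(Y | X) depends on X only through X_1, so the central subspace
# is spanned by B = (1, 0)^T, i.e., the X_1 axis.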

Slide 7: Some Existing Methods
- Sliced Inverse Regression (SIR; Li, 1991): PCA of E[X | Y], using slices of Y; elliptic assumption on the distribution of X
- Principal Hessian Directions (pHd; Li, 1992): the average Hessian Σ_yxx = E[(Y − E[Y])(X − E[X])(X − E[X])^T] is used; if X is Gaussian, its eigenvectors give the central subspace; Gaussian assumption on X, and Y must be one-dimensional
- Projection pursuit approach (e.g., Friedman et al., 1981): the additive model E[Y | X] = g_1(b_1^T X) + ... + g_d(b_d^T X) is used
- Canonical Correlation Analysis (CCA) / Partial Least Squares (PLS): linear assumption on the regression
- Contour Regression (Li, Zha & Chiaromonte, 2004): elliptic assumption on the distribution of X

Slide 8: Dimension Reduction and Conditional Independence
- (U, V) = (B^T X, C^T X), where C is m x (m − d) with columns orthogonal to B; B gives the projector onto the central subspace
- p_{Y|X}(y | x) = p_{Y|U}(y | B^T x)  ⇔  p_{Y|U,V}(y | u, v) = p_{Y|U}(y | u) for all y, u, v  ⇔  Y ⊥ V | U (conditional independence)
- Our approach: characterize conditional independence

Slide 9: Outline
- Introduction: dimension reduction and conditional independence
- Conditional covariance operators on RKHS
- Kernel Dimensionality Reduction for regression
- Manifold KDR
- Summary

Slide 10: Reproducing Kernel Hilbert Spaces
- Kernel methods: RKHSs have generally been used to provide basis expansions for regression and classification (e.g., the support vector machine)
- Kernelization: map data into the RKHS and apply linear or second-order methods in the RKHS
- But RKHSs can also be used to characterize independence and conditional independence
- [Figure: feature maps Φ_X: Ω_X → H_X and Φ_Y: Ω_Y → H_Y from the input spaces into the two RKHSs]

Slide 11: Positive Definite Kernels and RKHS
- Positive definite (p.d.) kernel k: Ω x Ω → R: k is positive definite if k(x, y) = k(y, x) and, for any n ∈ N and x_1, ..., x_n ∈ Ω, the Gram matrix ( k(x_i, x_j) )_{i,j} is positive semidefinite
- Example: Gaussian RBF kernel k(x, y) = exp( −||x − y||^2 / σ^2 )
- Reproducing kernel Hilbert space (RKHS): for a p.d. kernel k on Ω, H is the RKHS such that
  1) k(·, x) ∈ H for all x ∈ Ω
  2) span{ k(·, x) : x ∈ Ω } is dense in H
  3) ⟨ k(·, x), f ⟩_H = f(x) (reproducing property)
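
As a quick numerical illustration of the definition above (not from the slides; the kernel width and sample points are arbitrary), the Gaussian RBF Gram matrix of any finite point set should have no negative eigenvalues:

import numpy as np

def gaussian_gram(X, sigma):
    # Gram matrix K_ij = exp(-||x_i - x_j||^2 / sigma^2)
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / sigma**2)

X = np.random.default_rng(0).standard_normal((50, 3))
K = gaussian_gram(X, sigma=1.0)
print(np.linalg.eigvalsh(K).min())   # >= 0 up to round-off: the Gram matrix is PSD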

Slide 12: Functional Data
- Feature map Φ: Ω → H, x ↦ k(·, x), i.e., Φ(x) = k(·, x)
- Data X_1, ..., X_N become functional data Φ_X(X_1), ..., Φ_X(X_N)
- Why RKHS? By the reproducing property, computing the inner product on the RKHS is easy: ⟨ Φ(x), Φ(y) ⟩ = k(x, y)
- For f = ∑_{i=1}^N a_i Φ(x_i) = ∑_i a_i k(·, x_i) and g = ∑_{j=1}^N b_j Φ(x_j) = ∑_j b_j k(·, x_j),
  ⟨ f, g ⟩ = ∑_i ∑_j a_i b_j k(x_i, x_j)
- The computational cost essentially depends on the sample size; advantageous for high-dimensional data with small sample size
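
A small sketch of the last identity (coefficients and points chosen arbitrarily): the inner product of two expansions reduces to the quadratic form a^T K b in the Gram matrix, so explicit feature vectors are never needed.

import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 5))                  # points x_1, ..., x_10 in R^5
a = rng.standard_normal(10)                       # coefficients of f
b = rng.standard_normal(10)                       # coefficients of g

sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T))   # Gaussian Gram matrix, sigma = 1

# <f, g>_H for f = sum_i a_i k(., x_i) and g = sum_j b_j k(., x_j)
print(a @ K @ b)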

Slide 13: Covariance Operators on RKHS
- X, Y: random variables on Ω_X and Ω_Y, respectively
- Prepare RKHSs (H_X, k_X) and (H_Y, k_Y) defined on Ω_X and Ω_Y, respectively
- Define random variables in the RKHSs H_X and H_Y by Φ_X(X) = k_X(·, X) and Φ_Y(Y) = k_Y(·, Y)
- Define the covariance operator Σ_YX:
  Σ_YX = E[ Φ_Y(Y) ⟨ Φ_X(X), · ⟩ ] − E[ Φ_Y(Y) ] E[ ⟨ Φ_X(X), · ⟩ ]
- [Figure: Σ_YX maps H_X to H_Y, with Φ_X and Φ_Y the feature maps from Ω_X and Ω_Y]

Slide 14: Covariance Operators on RKHS
- Definition: Σ_YX = E[ Φ_Y(Y) ⟨ Φ_X(X), · ⟩ ] − E[ Φ_Y(Y) ] E[ ⟨ Φ_X(X), · ⟩ ]
- Σ_YX is an operator from H_X to H_Y such that
  ⟨ g, Σ_YX f ⟩ = E[ g(Y) f(X) ] − E[ g(Y) ] E[ f(X) ]  ( = Cov[ f(X), g(Y) ] )  for all f ∈ H_X, g ∈ H_Y
- cf. the Euclidean case: the covariance matrix V_YX = E[Y X^T] − E[Y] E[X]^T satisfies ( b, V_YX a ) = Cov[ (b, Y), (a, X) ]

Slide 15: Characterization of Independence
- Independence and cross-covariance operators: if the RKHSs are rich enough,
  X ⊥ Y  ⇔  Σ_XY = O
- The implication "X ⊥ Y ⟹ Σ_XY = O" is always true; the converse requires an assumption on the kernel (universality), e.g., Gaussian RBF kernels k(x, y) = exp( −||x − y||^2 / σ^2 ) are universal
- Σ_XY = O means Cov[ f(X), g(Y) ] = 0, i.e., E[ g(Y) f(X) ] = E[ g(Y) ] E[ f(X) ], for all f ∈ H_X, g ∈ H_Y
- cf. for Gaussian variables, X ⊥ Y  ⇔  V_XY = O, i.e., uncorrelated
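
To see numerically why the RKHS criterion catches what correlation misses, here is a rough check that is not on the slide: for Y = X^2 the ordinary covariance is essentially zero, while a Hilbert-Schmidt-style statistic Tr(G_X G_Y)/N^2 built from centered Gaussian Gram matrices (one common empirical surrogate for testing Σ_XY ≠ O) is clearly nonzero.

import numpy as np

def centered_gram(z, sigma=1.0):
    z = z.reshape(-1, 1)
    K = np.exp(-((z - z.T)**2) / sigma**2)
    N = len(z)
    H = np.eye(N) - np.ones((N, N)) / N
    return H @ K @ H

rng = np.random.default_rng(0)
X = rng.standard_normal(500)
Y = X**2                                   # dependent on X, but uncorrelated with X

print(np.cov(X, Y)[0, 1])                  # ~ 0: linear covariance misses the dependence
Gx, Gy = centered_gram(X), centered_gram(Y)
print(np.trace(Gx @ Gy) / 500**2)          # clearly > 0: the kernel statistic detects it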

Slide 16: Independence and Characteristic Functions
- Random variables X and Y are independent  ⇔  E_XY[ e^{iω^T X} e^{iη^T Y} ] = E_X[ e^{iω^T X} ] E_Y[ e^{iη^T Y} ] for all ω and η; i.e., e^{iω^T x} and e^{iη^T y} work as test functions
- RKHS characterization: random variables X on Ω_X and Y on Ω_Y are independent  ⇔  E_XY[ f(X) g(Y) ] = E_X[ f(X) ] E_Y[ g(Y) ] for all f ∈ H_X, g ∈ H_Y
- The RKHS approach is a generalization of the characteristic-function approach

Slide 17: RKHS and Conditional Independence
- Conditional covariance operator: X and Y are random vectors; H_X, H_Y are RKHSs with kernels k_X, k_Y, respectively
- Definition: Σ_YY|X := Σ_YY − Σ_YX Σ_XX^{−1} Σ_XY (conditional covariance operator)
- Under a universality assumption on the kernel, ⟨ g, Σ_YY|X g ⟩ = E[ Var[ g(Y) | X ] ]
- cf. for Gaussian variables, Var[ a^T Y | X = x ] = a^T ( V_YY − V_YX V_XX^{−1} V_XY ) a
- Monotonicity of conditional covariance operators: for X = (U, V), Σ_YY|U ⪰ Σ_YY|X in the sense of self-adjoint operators

Slide 18: Conditional Independence
- Theorem: X = (U, V) and Y are random vectors; H_X, H_U, H_Y are RKHSs with Gaussian kernels k_X, k_U, k_Y, respectively. Then
  Y ⊥ V | U  ⇔  Σ_YY|U = Σ_YY|X
- This theorem provides a new methodology for solving the sufficient dimension reduction problem

Slide 19: Outline
- Introduction: dimension reduction and conditional independence
- Conditional covariance operators on RKHS
- Kernel Dimensionality Reduction for regression
- Manifold KDR
- Summary

Slide 20: Kernel Dimension Reduction
- Use a universal kernel for B^T X and Y
- Σ_YY|B^T X ⪰ Σ_YY|X (⪰: the partial order of self-adjoint operators)
- Σ_YY|B^T X = Σ_YY|X  ⇔  Y ⊥ X | B^T X
- KDR objective function: min_{B: B^T B = I_d} Tr[ Σ_YY|B^T X ], which is an optimization over the Stiefel manifold

Slide 21: Estimator
- Empirical cross-covariance operator:
  Σ̂_YX^(N) = (1/N) ∑_{i=1}^N { k_Y(·, Y_i) − m̂_Y } ⟨ k_X(·, X_i) − m̂_X, · ⟩,
  where m̂_X = (1/N) ∑_{i=1}^N k_X(·, X_i) and m̂_Y = (1/N) ∑_{i=1}^N k_Y(·, Y_i)
- Σ̂_YX^(N) gives the empirical covariance:
  ⟨ g, Σ̂_YX^(N) f ⟩ = (1/N) ∑_{i=1}^N f(X_i) g(Y_i) − ( (1/N) ∑_{i=1}^N f(X_i) ) ( (1/N) ∑_{i=1}^N g(Y_i) )
- Empirical conditional covariance operator:
  Σ̂_YY|X^(N) = Σ̂_YY^(N) − Σ̂_YX^(N) ( Σ̂_XX^(N) + ε_N I )^{−1} Σ̂_XY^(N),
  where ε_N is a regularization coefficient

Slide 22: Estimating Function for KDR
- With U = B^T X,
  Tr[ Σ̂_YY|U^(N) ] = Tr[ Σ̂_YY^(N) − Σ̂_YU^(N) ( Σ̂_UU^(N) + ε_N I )^{−1} Σ̂_UY^(N) ]
                    = Tr[ G_Y − G_Y G_U ( G_U + N ε_N I_N )^{−1} ],
  where G_U = ( I_N − (1/N) 1_N 1_N^T ) K_U ( I_N − (1/N) 1_N 1_N^T ), with K_U = ( k(B^T X_i, B^T X_j) ), is the centered Gram matrix
- Optimization problem:
  min_{B: B^T B = I_d} Tr[ G_Y ( G_{B^T X} + N ε_N I_N )^{−1} ]
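
A minimal numpy sketch of this optimization, offered as an illustration rather than the authors' implementation: it uses a crude finite-difference gradient with a QR retraction onto the Stiefel manifold, and the kernel width sigma, regularization eps, step size, and iteration count are arbitrary choices.

import numpy as np

def centered_gram(Z, sigma):
    # centered Gaussian Gram matrix G = H K H, with H = I - (1/N) 1 1^T
    sq = np.sum(Z**2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T) / sigma**2)
    N = Z.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    return H @ K @ H

def kdr_contrast(B, X, G_y, eps, sigma):
    # Tr[ G_Y (G_{B^T X} + N*eps*I)^{-1} ]
    N = X.shape[0]
    G_u = centered_gram(X @ B, sigma)
    return np.trace(G_y @ np.linalg.inv(G_u + N * eps * np.eye(N)))

def kdr_fit(X, Y, d, sigma=1.0, eps=1e-3, step=1e-2, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    N, m = X.shape
    G_y = centered_gram(Y.reshape(N, -1), sigma)
    B, _ = np.linalg.qr(rng.standard_normal((m, d)))     # random point on the Stiefel manifold
    for _ in range(iters):
        f0, h = kdr_contrast(B, X, G_y, eps, sigma), 1e-5
        grad = np.zeros_like(B)
        for i in range(m):                               # crude finite-difference gradient
            for j in range(d):
                Bp = B.copy(); Bp[i, j] += h
                grad[i, j] = (kdr_contrast(Bp, X, G_y, eps, sigma) - f0) / h
        B, _ = np.linalg.qr(B - step * grad)             # descent step + retraction (B^T B = I_d)
    return B

# toy usage on the earlier synthetic example: the estimate should align with (1, 0)^T
rng = np.random.default_rng(0)
X = rng.uniform(-1.5, 1.5, size=(100, 2))
Y = rng.normal(0.0, np.sqrt(np.exp(X[:, 0]) + 1.0))
print(kdr_fit(X, Y, d=1))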

Slide 23: Experiments with KDR
- Wine data: 13-dimensional, 178 data points, 3 classes; 2-dimensional projection
- Gaussian kernel k(z_1, z_2) = exp( −||z_1 − z_2||^2 / σ^2 )
- [Figure: 2-dimensional projections found by KDR, CCA, Partial Least Squares, and Sliced Inverse Regression]

Slide 24: Consistency of KDR
- Theorem: Suppose k_d is bounded and continuous, and ε_N → 0, N^{1/2} ε_N → ∞ (N → ∞). Let S_0 be the set of optimal parameters:
  S_0 = { B : B^T B = I_d, Tr[ Σ_YY|X^B ] = min_{B'} Tr[ Σ_YY|X^{B'} ] }.
  Then, under some conditions, for any open set U ⊇ S_0,
  Pr( B̂^(N) ∈ U ) → 1 (N → ∞).

Slide 25: Lemma
- Suppose k_d is bounded and continuous, and ε_N → 0, N^{1/2} ε_N → ∞ (N → ∞). Then, under some conditions,
  sup_{B: B^T B = I_d} | Tr[ Σ̂_YY|X^{B(N)} ] − Tr[ Σ_YY|X^B ] | → 0 (N → ∞)
  in probability.

Slide 26: Conclusions
- Introduction: dimension reduction and conditional independence
- Conditional covariance operators on RKHS
- Kernel Dimensionality Reduction for regression
- Technical report available at:

Slide 27: Regression on Manifolds using Kernel Dimension Reduction
Jens Nilsson (Centre for Mathematical Sciences, Lund University), Fei Sha (Computer Science Division, University of California, Berkeley), Michael I. Jordan (Computer Science Division and Department of Statistics, University of California, Berkeley)
June 28, 2008

Slide 28: Dimensionality Reduction in Regression
- Find a lower-dimensional subspace Z ⊂ X and a mapping X ∋ x_i ↦ z_i ∈ Z, i = 1, ..., m, such that {z_i} retains maximal predictive power w.r.t. {y_i}
- Supervised dimensionality reduction

Slide 29: Sufficient Dimension Reduction
- Parameterize Z by B ∈ R^{D x d} where B^T B = I
- Find B such that Y ⊥ X | B^T X   (1)
- Under weak conditions, the intersection of all such Z_B defines the central subspace S

Slide 30: Kernel Dimension Reduction
- Measure conditional independence in the RKHS: map X and Y to reproducing kernel Hilbert spaces H_X, H_Y, with functions f ∈ H_X of X and g ∈ H_Y of Y
- The cross-covariance C_fg between f and g can be represented by an operator Σ_YX: H_X → H_Y such that
  ⟨ g, Σ_YX f ⟩_{H_Y} = C_fg for all f, g   (2)
- Conditional covariance operator:
  Σ_YY|X = Σ_YY − Σ_YX Σ_XX^{−1} Σ_XY   (3)

Slide 31: KDR Theorem
1. Σ_YY|X ⪯ Σ_YY|B^T X
2. Σ_YY|X = Σ_YY|B^T X  ⇔  Y ⊥ X | B^T X
- The central subspace can be found by minimizing Σ_YY|B^T X

Slide 32: KDR Algorithm
- The minimization of Σ_YY|B^T X can be formulated as
  min Tr[ K^c_Y ( K^c_{B^T X} + NɛI )^{−1} ] such that B^T B = I   (4)
  where K^c_Y and K^c_{B^T X} are centered Gram matrices

Slide 33: Reduction to Manifolds?
- Large literature on manifold learning: the goal is to uncover the intrinsic geometry underlying a data set, often for visualization
- This is usually done without taking into account a response variable
- As before, we're motivated to find a way to estimate manifolds while taking a response into account; e.g., this can help provide guidance for visualization
- We'll combine (normalized) graph Laplacian technology with KDR

Slide 34: Eigenvectors of the Graph Laplacian
- [Figure: first four (non-constant) eigenvectors of the graph Laplacian on a torus]
- Harmonics on the manifold; reflect intrinsic coordinates
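
A compact sketch of one way to compute such eigenvectors (the k-nearest-neighbor graph, median-heuristic Gaussian edge weights, and dense eigendecomposition are assumed choices, not the authors' code):

import numpy as np

def laplacian_eigenvectors(X, n_neighbors=10, n_vectors=4):
    # pairwise squared distances and Gaussian edge weights
    D2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    W = np.exp(-D2 / np.median(D2))
    # symmetrized k-nearest-neighbor graph
    idx = np.argsort(D2, axis=1)[:, 1:n_neighbors + 1]
    M = np.zeros_like(W)
    M[np.repeat(np.arange(len(X)), n_neighbors), idx.ravel()] = 1.0
    W = W * np.maximum(M, M.T)
    # normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(X)) - Dinv @ W @ Dinv
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:n_vectors + 1]      # drop the (near-)constant leading eigenvector

# points sampled from a torus embedded in R^3
rng = np.random.default_rng(0)
theta, phi = rng.uniform(0, 2 * np.pi, (2, 500))
X = np.column_stack([(2 + np.cos(phi)) * np.cos(theta),
                     (2 + np.cos(phi)) * np.sin(theta),
                     np.sin(phi)])
V = laplacian_eigenvectors(X)            # columns approximate harmonics in (theta, phi)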

Slide 35: mKDR Formulation (Nilsson, Sha, and Jordan, 2007)
- Compute the eigenvectors v_i, i = 1, ..., M, of the normalized graph Laplacian
- Define an RKHS explicitly as the span of these eigenvectors
- Approximate the image of the central subspace with a linear transformation ΦV^T

Slide 36: mKDR Algorithm
- mKDR minimization problem:
  min Tr[ K^c_Y ( VΩV^T + NɛI )^{−1} ] such that Ω ⪰ 0, Tr(Ω) = 1   (5)
- Φ is recovered from Ω (in this formulation the linear kernel on ΦV^T gives the Gram matrix VΩV^T with Ω = Φ^T Φ)
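
A rough numpy sketch of the two ingredients of problem (5), again only an illustration under assumptions: V would come from a Laplacian computation like the sketch above, and the feasibility step below is a simple eigenvalue-clipping heuristic rather than the exact projection or the solver used in the paper.

import numpy as np

def mkdr_objective(Omega, V, G_y, eps):
    # Tr[ K^c_Y (V Omega V^T + N*eps*I)^{-1} ]
    N = V.shape[0]
    return np.trace(G_y @ np.linalg.inv(V @ Omega @ V.T + N * eps * np.eye(N)))

def make_feasible(Omega):
    # crude feasibility step for {Omega >= 0, Tr(Omega) = 1}:
    # symmetrize, clip negative eigenvalues, rescale the trace
    w, U = np.linalg.eigh((Omega + Omega.T) / 2.0)
    w = np.clip(w, 0.0, None)
    P = (U * w) @ U.T
    return P / np.trace(P)

Wrapping a projected-gradient (or semidefinite-programming) loop around these two pieces yields an Ω whose near-rank-one structure, Ω ≈ aa^T, is what the following slides exploit.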

Slide 37: Regression on a Torus
- {x_i} have intrinsic coordinates [θ_i, φ_i] ∈ S^1 x S^1
- y is a logistic function of (θ, φ)

Slide 39: mKDR Finds the Central Subspace
- Ω is nearly rank 1: Ω ≈ aa^T; project onto a
- [Figures: y plotted against Va — left: uniform grid sampling; right: uniformly random sampling with additive noise]

Slide 40: mKDR Can Be Used to Guide Visualization
- Map {x_i} onto the eigenvectors {v_i} with largest weight in Φ
- Predictive eigenvectors, in contrast to the principal eigenvectors used in, e.g., Laplacian eigenmaps
- [Figures: embedding onto the principal eigenvectors (v_1, v_2, v_3) vs. the predictive eigenvectors (v_1, v_3, v_5)]

Slide 41: Global Temperature Data
- {y_i} are satellite measurements of atmospheric temperatures around the globe
- The observation points {x_i} lie on a spheroid in R^3
- Regress the temperature y on x

Slide 42: Regression Model of Temperature Distribution
- Compute the central space ΦV^T and use linear regression to model E[ Y | ΦV^T ]
- [Figures: predicted temperature; central space coordinate; prediction error]

Slide 43: Visualization of an Image Data Manifold
- {x_i} are a set of 1000 grayscale images with 4 degrees of freedom: rotation angle, tilt angle, and translations in the image plane
- The data lie on a 4-dimensional manifold in the high-dimensional pixel space
- Create a lower-dimensional embedding that captures the variation in rotation angle

Slide 44: Unsupervised Embedding
- Project onto the principal eigenvectors, i.e., Laplacian Eigenmaps
- [Figures: principal eigenvectors, colored by tilt angle; principal eigenvectors, colored by rotation angle]

Slide 45: Predictive Embedding Guided by mKDR
- Apply mKDR with the rotation angle as response
- Map the data onto the predictive eigenvectors of the graph Laplacian
- [Figure: predictive eigenvectors]

Slide 46: Summary
- mKDR discovers manifolds that optimally preserve predictive power w.r.t. response variables
- mKDR enables flexible regression modeling and supervised exploration of nonlinear data manifolds
- mKDR extends sufficient dimension reduction to nonlinear manifolds, and manifold learning to the supervised setting

More information