Learning gradients: prescriptive models
1 Department of Statistical Science, Institute for Genome Sciences & Policy, and Department of Computer Science, Duke University. May 11, 2007
2 Relevant papers
Learning Coordinate Covariances via Gradients. Sayan Mukherjee, Ding-Xuan Zhou; Journal of Machine Learning Research, 7(Mar), 2006.
Estimation of Gradients and Coordinate Covariation in Classification. Sayan Mukherjee, Qiang Wu; Journal of Machine Learning Research, 7(Nov), 2006.
Learning Gradients and Feature Selection on Manifolds. Sayan Mukherjee, Qiang Wu, Ding-Xuan Zhou; Annals of Statistics, submitted.
Learning Gradients: simultaneous regression and inverse regression. Mauro Maggioni, Qiang Wu; Journal of Machine Learning Research, in preparation.
3 Table of contents
1. Regression and inverse regression. Learning gradients: a justification for inverse regression.
2. Nonparametric kernel model. Convergence of estimate.
4 Generative vs. predictive modelling. Given data $D = \{Z_i = (x_i, y_i)\}_{i=1}^n$ with $Z_i \stackrel{iid}{\sim} \rho(X, Y)$, where $X \in \mathcal{X} \subset \mathbb{R}^p$, $Y \in \mathbb{R}$, and $p \gg n$.
5 Generative vs. predictive modelling. Given data $D = \{Z_i = (x_i, y_i)\}_{i=1}^n$ with $Z_i \stackrel{iid}{\sim} \rho(X, Y)$, where $X \in \mathcal{X} \subset \mathbb{R}^p$, $Y \in \mathbb{R}$, and $p \gg n$. Two options: (1) discriminative or regression, $Y \mid X$; (2) generative, $X \mid Y$ (sometimes called inverse regression).
6 Motivation and related work. Data generated by measuring thousands of variables lies on or near a low-dimensional manifold.
7 Motivation and related work. Data generated by measuring thousands of variables lies on or near a low-dimensional manifold. Manifold learning: LLE, ISOMAP, Laplacian Eigenmaps, Hessian Eigenmaps.
8 Motivation and related work. Data generated by measuring thousands of variables lies on or near a low-dimensional manifold. Manifold learning: LLE, ISOMAP, Laplacian Eigenmaps, Hessian Eigenmaps. Simultaneous dimensionality reduction and regression: SIR, MAVE, SAVE.
9 Regression. Given $X \in \mathcal{X} \subset \mathbb{R}^p$, $Y \in \mathbb{R}$, $p \gg n$, and $\rho(X, Y)$, we want $Y \mid X$. A natural idea: $f_r = \arg\min_f \mathrm{Var}(f) = \arg\min_f \mathbb{E}(Y - f(X))^2$, and $f_r(x) = \mathbb{E}[Y \mid X = x]$ provides a summary of $Y \mid X$.
10 Inverse regression. Given $X \in \mathcal{X} \subset \mathbb{R}^p$, $Y \in \mathbb{R}$, $p \gg n$, and $\rho(X, Y)$, we want $X \mid Y$. $\Omega = \mathrm{cov}(X \mid Y)$ provides a summary of $X \mid Y$.
11 Inverse regression. Given $X \in \mathcal{X} \subset \mathbb{R}^p$, $Y \in \mathbb{R}$, $p \gg n$, and $\rho(X, Y)$, we want $X \mid Y$. $\Omega = \mathrm{cov}(X \mid Y)$ provides a summary of $X \mid Y$: (1) $\Omega_{ii}$ is the relevance of variable $i$ with respect to the label; (2) $\Omega_{ij}$ is the covariation of variables $i$ and $j$ with respect to the label.
12 Learning gradients. Given data $D = \{Z_i = (x_i, y_i)\}_{i=1}^n$ with $Z_i \stackrel{iid}{\sim} \rho(X, Y)$. We will simultaneously estimate $f_r(x)$ and $\nabla f_r = \left(\frac{\partial f_r}{\partial x_1}, \ldots, \frac{\partial f_r}{\partial x_p}\right)^T$.
13 Learning gradients. Given data $D = \{Z_i = (x_i, y_i)\}_{i=1}^n$ with $Z_i \stackrel{iid}{\sim} \rho(X, Y)$. We will simultaneously estimate $f_r(x)$ and $\nabla f_r = \left(\frac{\partial f_r}{\partial x_1}, \ldots, \frac{\partial f_r}{\partial x_p}\right)^T$: (1) regression, $f_r(x)$; (2) inverse regression, the outer product of gradients $\Gamma = \mathbb{E}\!\left[\nabla f_r \, (\nabla f_r)^T\right]$, or $\Gamma_{ij} = \left\langle \frac{\partial f_r}{\partial x_i}, \frac{\partial f_r}{\partial x_j} \right\rangle$.
14 Linear case. We start with the linear case $y = w \cdot x + \varepsilon$, $\varepsilon \stackrel{iid}{\sim} N(0, \sigma^2)$. With $\Sigma_X = \mathrm{cov}(X)$ and $\sigma_Y^2 = \mathrm{var}(Y)$, $\Omega = \mathrm{cov}(X \mid Y) = \frac{1}{\sigma_Y^2} \Sigma_X \Gamma \Sigma_X$.
15 Linear case. We start with the linear case $y = w \cdot x + \varepsilon$, $\varepsilon \stackrel{iid}{\sim} N(0, \sigma^2)$. With $\Sigma_X = \mathrm{cov}(X)$ and $\sigma_Y^2 = \mathrm{var}(Y)$, $\Omega = \mathrm{cov}(X \mid Y) = \frac{1}{\sigma_Y^2} \Sigma_X \Gamma \Sigma_X$. $\Gamma$ and $\Omega$ are equivalent modulo rotation and scale.
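The linear-case identity can be checked numerically. Below is a minimal sketch, assuming the slide's $\mathrm{cov}(X \mid Y)$ is read as the covariance of the inverse regression $\mathbb{E}[X \mid Y]$ (the sliced-inverse-regression matrix); the dimensions, noise level, and slicing scheme are illustrative choices, not from the talk.

```python
# Numerical check of Omega = (1/sigma_Y^2) Sigma_X Gamma Sigma_X in the linear
# case, with Omega read as cov(E[X|Y]); all constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200_000, 5
w = rng.normal(size=p)
A = rng.normal(size=(p, p))
Sigma_X = A @ A.T / p                                  # an arbitrary covariance for X

X = rng.multivariate_normal(np.zeros(p), Sigma_X, size=n)
y = X @ w + rng.normal(scale=0.5, size=n)              # y = w.x + eps

Gamma = np.outer(w, w)                                 # outer product of the constant gradient
Omega_formula = Sigma_X @ Gamma @ Sigma_X / y.var()    # (1/sigma_Y^2) Sigma_X Gamma Sigma_X

# Empirical cov(E[X|Y]): slice y into narrow bins and take slice means of X.
n_slices = 50
edges = np.quantile(y, np.linspace(0, 1, n_slices + 1))
idx = np.clip(np.digitize(y, edges) - 1, 0, n_slices - 1)
means = np.array([X[idx == k].mean(axis=0) for k in range(n_slices)]) - X.mean(axis=0)
weights = np.array([(idx == k).mean() for k in range(n_slices)])
Omega_hat = (weights[:, None, None] * means[:, :, None] * means[:, None, :]).sum(axis=0)

print(np.abs(Omega_hat - Omega_formula).max())         # small, up to sampling error
```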
16 Nonlinear case. For smooth $f(x)$, with $y = f(x) + \varepsilon$, $\varepsilon \stackrel{iid}{\sim} N(0, \sigma^2)$, the meaning of $\Omega = \mathrm{cov}(X \mid Y)$ is not so clear.
17 Nonlinear case. Partition into sections and compute local quantities: $\mathcal{X} = \bigcup_{i=1}^{I} \chi_i$, $\Omega_i = \mathrm{cov}(X_{\chi_i} \mid Y_{\chi_i})$, $\Sigma_i = \mathrm{cov}(X_{\chi_i})$, $\sigma_i^2 = \mathrm{var}(Y_{\chi_i})$.
18 Nonlinear case. Partition into sections and compute local quantities: $\mathcal{X} = \bigcup_{i=1}^{I} \chi_i$, $\Omega_i = \mathrm{cov}(X_{\chi_i} \mid Y_{\chi_i})$, $\Sigma_i = \mathrm{cov}(X_{\chi_i})$, $\sigma_i^2 = \mathrm{var}(Y_{\chi_i})$. Then $\Gamma = \sum_{i=1}^{I} \sigma_i^2 \, \Sigma_i^{-1} \Omega_i \Sigma_i^{-1}$.
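A purely illustrative sketch of the sectioned construction: the sections $\chi_i$ are taken here as slabs along the first coordinate, and $\Omega_i$ is again read as the covariance of the within-section inverse regression $\mathbb{E}[X \mid Y]$; the section and slice counts are arbitrary choices.

```python
# Sketch of the local construction: partition along the first coordinate
# (an arbitrary choice of sections chi_i), compute the local quantities,
# and accumulate sigma_i^2 Sigma_i^{-1} Omega_i Sigma_i^{-1}.
import numpy as np

def local_gamma(X, y, n_sections=10, n_slices=10):
    n, p = X.shape
    Gamma = np.zeros((p, p))
    edges = np.quantile(X[:, 0], np.linspace(0, 1, n_sections + 1))
    for i in range(n_sections):
        in_sec = (X[:, 0] >= edges[i]) & (X[:, 0] <= edges[i + 1])
        Xc, yc = X[in_sec], y[in_sec]
        Sigma_i = np.cov(Xc.T)                              # cov(X_chi)
        sigma2_i = yc.var()                                 # var(Y_chi)
        # Omega_i: covariance of the within-section slice means of X over y.
        ybins = np.quantile(yc, np.linspace(0, 1, n_slices + 1))
        idx = np.clip(np.digitize(yc, ybins) - 1, 0, n_slices - 1)
        groups = [Xc[idx == k] for k in range(n_slices) if (idx == k).any()]
        means = np.array([g.mean(axis=0) for g in groups]) - Xc.mean(axis=0)
        weights = np.array([len(g) for g in groups]) / len(yc)
        Omega_i = (weights[:, None, None] * means[:, :, None] * means[:, None, :]).sum(0)
        S_inv = np.linalg.pinv(Sigma_i)
        Gamma += sigma2_i * S_inv @ Omega_i @ S_inv          # the summand on the slide
    return Gamma
```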
19 Taylor expansion and gradients. Given $D = (Z_1, \ldots, Z_n)$, the variance of $f$ may be approximated as $\mathrm{Var}_n(f) = \frac{1}{n^2} \sum_{i,j=1}^{n} w_{ij} \left[ y_i - f(x_j) - \nabla f(x_j) \cdot (x_i - x_j) \right]^2$, where the weight $w_{ij}$ ensures the locality $x_i \approx x_j$.
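A small sketch of this empirical first-order loss; the Gaussian weight function, the bandwidth, and the test functions are illustrative assumptions. A correct pair $(f, \nabla f)$ makes the residual small, a wrong pair does not.

```python
# Sketch of Var_n(f) with Gaussian locality weights w_ij.
import numpy as np

def var_n(X, y, f, grad_f, s2=1.0):
    """(1/n^2) sum_{i,j} w_ij [ y_i - f(x_j) - grad_f(x_j).(x_i - x_j) ]^2."""
    n = len(y)
    diffs = X[:, None, :] - X[None, :, :]                  # (i, j, :) = x_i - x_j
    w = np.exp(-(diffs ** 2).sum(-1) / (2.0 * s2))         # locality weights w_ij
    fx = np.array([f(x) for x in X])                       # f(x_j)
    gx = np.array([grad_f(x) for x in X])                  # grad f(x_j)
    first_order = fx[None, :] + np.einsum('ijk,jk->ij', diffs, gx)
    return np.sum(w * (y[:, None] - first_order) ** 2) / n ** 2

rng = np.random.default_rng(1)
beta = np.array([1.0, -2.0, 0.0])
X = rng.normal(size=(100, 3))
y = X @ beta
print(var_n(X, y, lambda x: x @ beta, lambda x: beta))     # ~ 0 for the true (f, grad f)
print(var_n(X, y, lambda x: 0.0, lambda x: np.zeros(3)))   # much larger
```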
20 Penalized loss estimator. $\hat{f}(x) = \arg\min_{f \in \text{big function space}} \left[ \text{error on data} + \text{smoothness of function} \right]$.
21 Penalized loss estimator. $\hat{f}(x) = \arg\min_{f \in \text{big function space}} \left[ \text{error on data} + \text{smoothness of function} \right]$, where error on data $= L(f, \text{data}) = \mathrm{Var}_n(f)$, smoothness of function $= \|f\|_K^2$ (e.g. $\int \|\nabla f(x)\|^2 \, dx$), and big function space = reproducing kernel Hilbert space $= \mathcal{H}_K$.
22 Penalized loss estimator. $\hat{f}(x) = \arg\min_{f \in \mathcal{H}_K} \left[ L(f, \text{data}) + \lambda \|f\|_K^2 \right]$. The kernel: $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, e.g. $K(u, v) = e^{-\|u - v\|^2}$. The RKHS: $\mathcal{H}_K = \overline{\left\{ f \,\middle|\, f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i),\ x_i \in \mathcal{X},\ \alpha_i \in \mathbb{R},\ l \in \mathbb{N} \right\}}$.
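To make the RKHS machinery concrete, here is a minimal sketch of the penalized estimator with the Gaussian kernel above, but with an ordinary squared-error loss (plain kernel ridge regression) rather than the first-order loss $\mathrm{Var}_n$ used in the talk; the bandwidth and $\lambda$ are arbitrary choices.

```python
# Gaussian kernel and the representer-theorem form f(x) = sum_i alpha_i K(x_i, x),
# fit by kernel ridge regression (squared-error loss plus lambda * RKHS norm).
import numpy as np

def gaussian_kernel(U, V, s2=1.0):
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / s2)

def fit_krr(X, y, lam=1e-2, s2=1.0):
    K = gaussian_kernel(X, X, s2)
    alpha = np.linalg.solve(K + lam * len(y) * np.eye(len(y)), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, s2) @ alpha

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(80, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=80)
f_hat = fit_krr(X, y)
print(f_hat(np.array([[0.5]])))   # roughly sin(1.0) = 0.84
```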
23 Gradient estimate. Nonparametric model: $(f_D, \vec{f}_D) := \arg\min_{(f, \vec{f}) \in \mathcal{H}_K^{p+1}} \left[ \sum_{i,j=1}^{n} w_{ij} \left( y_j - f(x_i) - \vec{f}(x_i) \cdot (x_j - x_i) \right)^2 + \lambda_1 \|f\|_K^2 + \lambda_2 \|\vec{f}\|_K^2 \right]$, where $\mathcal{H}_K^p$ is the space of $p$-vectors of functions $\vec{f} = (f_1, \ldots, f_p)$ with $f_i \in \mathcal{H}_K$, $\|\vec{f}\|_K^2 = \sum_{i=1}^{p} \|f_i\|_K^2$, and $\lambda_1, \lambda_2 > 0$.
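The following is not the RKHS solver from the papers but a simplified surrogate in the same spirit: with $\lambda_1, \lambda_2 \to 0$ and a free value and slope at each point, the objective reduces pointwise to a locally weighted first-order (local linear) fit, which already yields estimates of $f$ and $\nabla f$ at every sample.

```python
# Simplified surrogate for the estimator above (not the RKHS solution):
# weighted local linear regression gives f_hat(x_i) and grad_hat(x_i) per point.
import numpy as np

def local_gradients(X, y, s2=1.0, ridge=1e-6):
    n, p = X.shape
    f_hat, g_hat = np.zeros(n), np.zeros((n, p))
    for i in range(n):
        d = X - X[i]                                       # rows x_j - x_i
        w = np.exp(-(d ** 2).sum(1) / (2.0 * s2))          # locality weights w_ij
        A = np.hstack([np.ones((n, 1)), d])                # design: [1, x_j - x_i]
        AtW = A.T * w                                      # A^T W, W = diag(w)
        beta = np.linalg.solve(AtW @ A + ridge * np.eye(p + 1), AtW @ y)
        f_hat[i], g_hat[i] = beta[0], beta[1:]             # intercept ~ f, slope ~ grad f
    return f_hat, g_hat
```

For $p \gg n$ this pointwise fit is poorly conditioned, which is exactly where the RKHS penalties $\lambda_1, \lambda_2$ in the real estimator matter.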
24 Computational efficiency. The computation requires fewer than $n^2$ parameters and is $O(n^6)$ time and $O(pn)$ memory: $f_D(x) = \sum_{i=1}^{n} a_{i,D} K(x_i, x)$ and $\vec{f}_D(x) = \sum_{i=1}^{n} c_{i,D} K(x_i, x)$, with $a_D = (a_{1,D}, \ldots, a_{n,D}) \in \mathbb{R}^n$ and $c_D = (c_{1,D}, \ldots, c_{n,D})^T \in \mathbb{R}^{np}$.
25 Consistency. Theorem: Under mild regularity conditions on the distribution and the corresponding density, with probability $1 - \delta$, $\|f_D - f_r\|_{\rho_X} \le C \log\!\left(\tfrac{1}{\delta}\right) n^{-1/p}$ and $\|\vec{f}_D - \nabla f_r\|_{\rho_X} \le C \log\!\left(\tfrac{1}{\delta}\right) n^{-1/p}$.
26 Linear example. Samples from class $-1$ were drawn from $x_j \sim N(1.5, 1)$ for $j = 1, \ldots, 10$; $x_j \sim N(-3, 1)$ for $j = 11, \ldots, 20$; $x_j \sim N(0, \sigma_{noise})$ for $j = 21, \ldots, 80$. Samples from class $+1$ were drawn from $x_j \sim N(1.5, 1)$ for $j = 41, \ldots, 50$; $x_j \sim N(-3, 1)$ for $j = 51, \ldots, 60$; $x_j \sim N(0, \sigma_{noise})$ for $j = 1, \ldots, 40, 61, \ldots, 80$.
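A sketch reproducing this sampling scheme; the number of samples per class and the value of $\sigma_{noise}$ are not on the slide, so both are assumptions (indices are shifted to 0-based).

```python
# Synthetic linear example: informative dimensions differ between the classes,
# all remaining dimensions are noise. n_per_class and sigma_noise are assumed.
import numpy as np

def sample_linear_example(n_per_class=30, sigma_noise=0.5, p=80, seed=0):
    rng = np.random.default_rng(seed)
    def draw(label):
        x = rng.normal(0.0, sigma_noise, size=p)        # noise dimensions
        if label == -1:
            x[0:10] = rng.normal(1.5, 1.0, size=10)     # slide's j = 1..10
            x[10:20] = rng.normal(-3.0, 1.0, size=10)   # slide's j = 11..20
        else:
            x[40:50] = rng.normal(1.5, 1.0, size=10)    # slide's j = 41..50
            x[50:60] = rng.normal(-3.0, 1.0, size=10)   # slide's j = 51..60
        return x
    labels = [-1] * n_per_class + [+1] * n_per_class
    X = np.array([draw(l) for l in labels])
    return X, np.array(labels)
```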
27 Linear example. [Figure: RKHS norm of the estimated gradient components per dimension; samples vs. dimensions.]
28 Linear example. [Figure: estimated class probability per sample; values per dimension.]
29 Nonlinear example. Samples from class $+1$ were drawn from $(x_1, x_2) = (r \sin\theta, r \cos\theta)$ with $r \sim U[0, 1]$ and $\theta \sim U[0, 2\pi]$, and $x_j \sim N(0, 0.2)$ for $j = 3, \ldots, 200$. Samples from class $-1$ were drawn from $(x_1, x_2) = (r \sin\theta, r \cos\theta)$ with $r \sim U[2, 3]$ and $\theta \sim U[0, 2\pi]$, and $x_j \sim N(0, 0.2)$ for $j = 3, \ldots, 200$.
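The same kind of sketch for the two-ring example; the sample counts are again an assumption, and $N(0, 0.2)$ is read here as a standard deviation of 0.2.

```python
# Synthetic nonlinear example: two concentric rings in the first two dimensions,
# Gaussian noise in dimensions 3..200. n_per_class is assumed.
import numpy as np

def sample_nonlinear_example(n_per_class=50, p=200, seed=0):
    rng = np.random.default_rng(seed)
    def draw(label):
        r = rng.uniform(0, 1) if label == +1 else rng.uniform(2, 3)
        theta = rng.uniform(0, 2 * np.pi)
        x = rng.normal(0.0, 0.2, size=p)                 # noise dimensions
        x[0], x[1] = r * np.sin(theta), r * np.cos(theta)
        return x
    labels = [+1] * n_per_class + [-1] * n_per_class
    X = np.array([draw(l) for l in labels])
    return X, np.array(labels)
```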
30 Nonlinear example. [Figure: the two classes plotted in the first two dimensions.]
31 Nonlinear example. [Figure: RKHS norm of the estimated gradient components per dimension, for each class.]
32 Nonlinear example. [Figure: estimated class probability per sample, for each class.]
33 Restriction to a manifold. Assume the data is concentrated on a manifold $\mathcal{M} \subset \mathbb{R}^p$ with $\dim \mathcal{M} = d$, and that there exists an isometric embedding $\varphi : \mathcal{M} \to \mathbb{R}^p$.
34 Restriction to a manifold. Assume the data is concentrated on a manifold $\mathcal{M} \subset \mathbb{R}^p$ with $\dim \mathcal{M} = d$, and that there exists an isometric embedding $\varphi : \mathcal{M} \to \mathbb{R}^p$. Given a smooth orthonormal vector field $\{e_1, \ldots, e_d\}$ we can define the gradient on the manifold, $\nabla_{\mathcal{M}} f = (e_1 f, \ldots, e_d f)$.
35 Restriction to a manifold. Assume the data is concentrated on a manifold $\mathcal{M} \subset \mathbb{R}^p$ with $\dim \mathcal{M} = d$, and that there exists an isometric embedding $\varphi : \mathcal{M} \to \mathbb{R}^p$. Given a smooth orthonormal vector field $\{e_1, \ldots, e_d\}$ we can define the gradient on the manifold, $\nabla_{\mathcal{M}} f = (e_1 f, \ldots, e_d f)$. For $q \in U \subset \mathcal{M}$ there exists a chart $u : U \to \mathbb{R}^d$ satisfying $\frac{\partial}{\partial u_i}(q) = e_i(q)$. The Taylor expansion on the manifold around $q$: $f(q') \approx f(q) + \nabla_{\mathcal{M}} f(q) \cdot (u(q') - u(q))$ for $q' \approx q$.
36 A problem with all manifold methods. The Taylor expansion on the manifold around $q$: $f(q') \approx f(q) + \nabla_{\mathcal{M}} f(q) \cdot (u(q') - u(q))$ for $q' \approx q$.
37 A problem with all manifold methods. The Taylor expansion on the manifold around $q$: $f(q') \approx f(q) + \nabla_{\mathcal{M}} f(q) \cdot (u(q') - u(q))$ for $q' \approx q$. The samples $\{(q_i, y_i)\}_{i=1}^n \subset \mathcal{M} \times Y$ are drawn from the manifold, but neither $\mathcal{M}$ nor a local expression of $\mathcal{M}$ is given. Also, we only observe the image $x_i = \varphi(q_i) \in \mathbb{R}^p$.
38 The standard solution. The Taylor expansion on the manifold around $q$: $f(q') \approx f(q) + \nabla_{\mathcal{M}} f(q) \cdot (u(q') - u(q))$ for $q' \approx q$.
39 The standard solution. The Taylor expansion on the manifold around $q$: $f(q') \approx f(q) + \nabla_{\mathcal{M}} f(q) \cdot (u(q') - u(q))$ for $q' \approx q$. The Taylor expansion around $x$ in terms of $f \circ \varphi^{-1}$ on $\mathbb{R}^p$: $(f \circ \varphi^{-1})(u) \approx (f \circ \varphi^{-1})(x) + \nabla (f \circ \varphi^{-1})(x) \cdot (u - x)$ for $u \approx x$.
40 The standard solution. The Taylor expansion on the manifold around $q$: $f(q') \approx f(q) + \nabla_{\mathcal{M}} f(q) \cdot (u(q') - u(q))$ for $q' \approx q$. The Taylor expansion around $x$ in terms of $f \circ \varphi^{-1}$ on $\mathbb{R}^p$: $(f \circ \varphi^{-1})(u) \approx (f \circ \varphi^{-1})(x) + \nabla (f \circ \varphi^{-1})(x) \cdot (u - x)$ for $u \approx x$. Due to this equivalence, $\vec{f}_D \approx d\varphi(\nabla_{\mathcal{M}} f)$.
41 Improved rate of convergence. Theorem: Under mild regularity conditions on the distribution and the corresponding density, with probability $1 - \delta$, $\|(d\varphi)^* \vec{f}_D - \nabla_{\mathcal{M}} f_r\|_{\rho_X} \le C \log\!\left(\tfrac{1}{\delta}\right) n^{-1/d}$ and $\|f_D - f_r\|_{\rho_X} \le C \log\!\left(\tfrac{1}{\delta}\right) n^{-1/d}$, where $(d\varphi)^*$ is the dual of the map $d\varphi$.
42 Sensitive features. Proposition: Let $f$ be a smooth function on $\mathbb{R}^p$ with gradient $\nabla f$. The sensitivity of the function $f$ along a (unit-normalized) direction $u$ is $\|u \cdot \nabla f\|_K$. The $d$ most sensitive features are those $\{u_1, \ldots, u_d\}$ that are orthogonal and maximize $\|u_i \cdot \nabla f\|_K$. A spectral decomposition of $\Gamma$ is used to compute these features: the eigenvectors corresponding to the $d$ top eigenvalues.
43 Projection of data. The matrix $\hat{\Gamma} = \vec{f}_D \, \vec{f}_D^{\,T} = c_D^T K c_D$ is an empirical estimate of $\Gamma$.
44 Projection of data. The matrix $\hat{\Gamma} = \vec{f}_D \, \vec{f}_D^{\,T} = c_D^T K c_D$ is an empirical estimate of $\Gamma$. Geometry is preserved by projection onto the top $k$ eigenvectors. There is no need to compute the $p \times p$ matrix; the method is $O(n^2 p + n^3)$ time and $O(pn)$ memory.
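A sketch of the feature extraction: if $G$ is the $n \times p$ matrix of estimated gradients at the samples (for instance from the local-gradient sketch earlier), then $\hat{\Gamma} \propto G^T G$, so its top eigenvectors are the right singular vectors of $G$ and the $p \times p$ matrix is never formed.

```python
# Sensitive features via the SVD of the n x p gradient matrix G, avoiding the
# p x p matrix Gamma_hat = G^T G / n.
import numpy as np

def sensitive_features(G, X, d=2):
    _, svals, Vt = np.linalg.svd(G, full_matrices=False)   # right sing. vectors = eigvecs of G^T G
    U_d = Vt[:d].T                                          # p x d: the d most sensitive directions
    return X @ U_d, U_d, svals ** 2 / len(G)                # projections, directions, eigenvalues
```

The thin SVD of an $n \times p$ matrix with $n \ll p$ costs $O(n^2 p)$, consistent with the $O(n^2 p + n^3)$ count on the slide, whereas forming and diagonalizing $\hat{\Gamma}$ would cost $O(p^3)$.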
45 Linear example. [Figure: eigenvalues of $\hat{\Gamma}$ (index of eigenvalues) and the samples across dimensions.]
46 Linear example. [Figure: the data projected onto Feature 1, per dimension.]
47 Linear example. [Figure: eigenvalues of $\hat{\Gamma}$ and the samples across dimensions.]
48 Nonlinear example. [Figure: the data projected onto Feature 1 and Feature 2.]
49 Nonlinear example. [Figure: eigenvalues of $\hat{\Gamma}$ (index of eigenvalues) and values per dimension.]
50 Nonlinear example. [Figure: the data projected onto Feature 1, per dimension.]
51 Nonlinear example. [Figure: values across dimensions.]
52 Digits: 6 vs. 9. [Figure.]
53 Digits: 6 vs. 9. [Figure: norm per dimension and eigenvalues of $\hat{\Gamma}$ (index of eigenvalues).]
54 Digits: 6 vs. 9. [Figure: the digits projected onto Feature 1 and Feature 2.]
55 Leukemia. 48 samples of AML, 25 samples of ALL, $p = 7{,}129$. The dataset is split into a training set of 38 samples and a test set of 35 samples.
56 Leukemia. [Figure: projections of the samples; axis scales $\times 10^4$.]
57 Leukemia. [Figure: eigenvalues of $\hat{\Gamma}$ (index of eigenvalues) and distance to the separating hyperplane for each sample.]
58 Gauss-Markov graphical models. Given a multivariate normal distribution with covariance matrix $C$, the matrix $P = C^{-1}$ is the conditional independence matrix: $P_{ij}$ = dependence of $i$ on $j$ given all other variables.
59 Gauss-Markov graphical models. Given a multivariate normal distribution with covariance matrix $C$, the matrix $P = C^{-1}$ is the conditional independence matrix: $P_{ij}$ = dependence of $i$ on $j$ given all other variables. Set $C = \hat{\Gamma}$. However, many entries are near zero, and it is not clear how to truncate them.
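A sketch of turning $\hat{\Gamma}$ into a conditional-dependence graph: invert with a small ridge (many eigenvalues of $\hat{\Gamma}$ are tiny) and threshold the partial correlations. Both the ridge and the threshold are arbitrary choices here, which is exactly the truncation problem the slide points out.

```python
# Precision matrix and thresholded partial correlations from Gamma_hat,
# treated as a covariance matrix; ridge and threshold are ad hoc.
import numpy as np

def precision_graph(Gamma_hat, ridge=1e-3, threshold=0.1):
    p = Gamma_hat.shape[0]
    P = np.linalg.inv(Gamma_hat + ridge * np.eye(p))        # P = C^{-1}
    d = np.sqrt(np.diag(P))
    partial_corr = -P / np.outer(d, d)                      # standardized off-diagonals
    np.fill_diagonal(partial_corr, 1.0)
    edges = np.abs(partial_corr) > threshold                 # crude truncation rule
    np.fill_diagonal(edges, False)
    return P, partial_corr, edges
```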
60 Multiscale graphical models. Assume for now that $\hat{\Gamma}$ has all positive entries and is normalized to be a Markov matrix: $T = \Lambda^{-1/2} \hat{\Gamma} \Lambda^{-1/2}$, where $\Lambda$ is the eigenvalue matrix for $\hat{\Gamma}$.
61 Multiscale graphical models. Note the harmonic expansion $(I - T)^{-1} = \sum_{k=0}^{\infty} T^k = \prod_{k=0}^{\infty} (I + T^{2^k})$, where $k$ is path length in the first equality: $T^k_{ij}$ = probability of going from $i$ to $j$ in path length $k$. The factors $(I + T^{2^k})$ factorize $(I - T)^{-1}$ into low-rank matrices with fewer entries.
62 Multiscale graphical models. Note the harmonic expansion $(I - T)^{-1} = \sum_{k=0}^{\infty} T^k = \prod_{k=0}^{\infty} (I + T^{2^k})$, where $k$ is path length in the first equality: $T^k_{ij}$ = probability of going from $i$ to $j$ in path length $k$. The factors $(I + T^{2^k})$ factorize $(I - T)^{-1}$ into low-rank matrices with fewer entries. This is the idea of diffusion wavelets.
63 Summary and toy example. Key quantities: $C \to T$ and $C^{-1} \to (I - T)^{-1}$, so $\prod_k (I + T^{2^k})$ factorizes $C^{-1}$.
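A numerical check of the dyadic factorization for a small substochastic matrix $T$; the product is taken here over $k = 0, 1, 2, \ldots$, which is the standard identity $(1 - x)^{-1} = \prod_k (1 + x^{2^k})$.

```python
# Verify (I - T)^{-1} = prod_k (I + T^(2^k)) for a small matrix with
# spectral radius < 1 (a random row-stochastic matrix, shrunk by 0.9).
import numpy as np

rng = np.random.default_rng(3)
A = rng.random((6, 6))
T = 0.9 * A / A.sum(axis=1, keepdims=True)      # substochastic: row sums 0.9

inverse = np.linalg.inv(np.eye(6) - T)           # (I - T)^{-1} directly

product = np.eye(6)
Tk = T.copy()                                    # holds T^(2^k)
for _ in range(30):                              # prod_k (I + T^(2^k)), k = 0, 1, ...
    product = product @ (np.eye(6) + Tk)
    Tk = Tk @ Tk                                 # square to get the next dyadic power

print(np.abs(inverse - product).max())           # ~ machine precision
```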
64 Summary and toy example. [Toy data matrix $X$ over variables $a, b, c, d$ and the corresponding estimate $\hat{\Gamma}$.]
65 Summary and toy example. [Toy data matrix $X$ over variables $a, b, c, d$, the estimate $\hat{\Gamma}$, and the resulting relations among the variables.]
66 General covariance matrices. The math goes through as above; the interpretation is slightly different. Path coefficients (Wright 1921) and path regression (Tukey 1954): $X_1 = c_{12} X_2 + c_{13} X_3 + \cdots + c_{1m} X_m$, where $c_{12}$ is the contribution of $X_2$ to $X_1$.
67 General covariance matrices. The math goes through as above; the interpretation is slightly different. Path coefficients (Wright 1921) and path regression (Tukey 1954): $X_1 = c_{12} X_2 + c_{13} X_3 + \cdots + c_{1m} X_m$, where $c_{12}$ is the contribution of $X_2$ to $X_1$. The matrix $\Gamma$ can be thought of as a matrix of path coefficients. Powers of order $k$ compute path coefficients of path length $k$, as in the Markov case.
68 An early graphical model. [Figure: Wright's path diagram relating weight at 33 days, gain 0-33 days, weight at birth, rate of growth, heredity, external conditions, condition of dam, gestation period, size of litter, and heredity of dam.]
69 Discussion Lots of work left: Semi-supervised setting.
70 Discussion Lots of work left: Semi-supervised setting. Multi-task setting.
71 Discussion Lots of work left: Semi-supervised setting. Multi-task setting. Bayesian formulation.
72 Discussion Lots of work left: Semi-supervised setting. Multi-task setting. Bayesian formulation. Nonlinear projections via diffusion maps.
73 Discussion Lots of work left: Semi-supervised setting. Multi-task setting. Bayesian formulation. Nonlinear projections via diffusion maps. Noise on and off the manifold.