Supervised Dimension Reduction:
1 Supervised Dimension Reduction: A Tale of Two Manifolds
S. Mukherjee, K. Mao, F. Liang, Q. Wu, M. Maggioni, D.-X. Zhou
Department of Statistical Science, Institute for Genome Sciences & Policy, Department of Computer Science, Department of Mathematics, Duke University
November 4, 2009
2-3 Supervised dimension reduction / Information and sufficiency
A fundamental idea in statistical thought is to reduce data to relevant information. This was the paradigm of R.A. Fisher (beloved Bayesian) and goes back to at least Adcock 1878 and Edgeworth 1884.
Example: $X_1, \ldots, X_n$ drawn iid from a Gaussian can be reduced to $(\mu, \sigma^2)$.
4-5 Supervised dimension reduction / Regression
Assume the model
$$Y = f(X) + \varepsilon, \qquad \mathbb{E}\varepsilon = 0,$$
with $X \in \mathcal{X} \subset \mathbb{R}^p$ and $Y \in \mathbb{R}$.
Data: $D = \{(x_i, y_i)\}_{i=1}^n \stackrel{iid}{\sim} \rho(X, Y)$.
6-7 Supervised dimension reduction / Dimension reduction
If the data lives in a $p$-dimensional space $\mathcal{X} \subset \mathbb{R}^p$, replace $X$ with $\Theta(X) \in \mathbb{R}^d$, $d \ll p$.
My belief: physical, biological and social systems are inherently low dimensional, and the variation of interest in these systems can be captured by a low-dimensional submanifold.
8-9 Supervised dimension reduction / Supervised dimension reduction (SDR)
Given response variables $Y_1, \ldots, Y_n \in \mathbb{R}$ and explanatory variables or covariates $X_1, \ldots, X_n \in \mathcal{X} \subset \mathbb{R}^p$,
$$Y_i = f(X_i) + \varepsilon_i, \qquad \varepsilon_i \stackrel{iid}{\sim} \mathrm{No}(0, \sigma^2).$$
Is there a submanifold $S_{Y|X}$ such that $Y \perp X \mid P_S(X)$?
10 Supervised dimension reduction / Visualization of SDR
[Figure: (a) data, (b) diffusion map, (c) GOP, (d) GDM; axes labeled Dimension 1 and Dimension 2.]
11-12 Supervised dimension reduction / Linear projections capture nonlinear manifolds
In this talk $P_S(X) = B^T X$ where $B = (b_1, \ldots, b_d)$.
Semiparametric model
$$Y_i = f(X_i) + \varepsilon_i = g(b_1^T X_i, \ldots, b_d^T X_i) + \varepsilon_i,$$
$\mathrm{span}\,B$ is the dimension reduction (d.r.) subspace.
13-14 Learning gradients / SDR model
Semiparametric model
$$Y_i = f(X_i) + \varepsilon_i = g(b_1^T X_i, \ldots, b_d^T X_i) + \varepsilon_i,$$
$\mathrm{span}\,B$ is the dimension reduction (d.r.) subspace.
Assume the marginal distribution $\rho_X$ is concentrated on a manifold $\mathcal{M} \subset \mathbb{R}^p$ of dimension $d \ll p$.
15-16 Learning gradients / Gradients and outer products
Given a smooth function $f$ the gradient is
$$\nabla f(x) = \left(\frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_p}\right)^T.$$
Define the gradient outer product (GOP) matrix $\Gamma$:
$$\Gamma_{ij} = \int_{\mathcal{X}} \frac{\partial f(x)}{\partial x_i} \, \frac{\partial f(x)}{\partial x_j} \, d\rho_X(x), \qquad \Gamma = \mathbb{E}\left[(\nabla f)(\nabla f)^T\right].$$
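The definition above can be illustrated with a minimal numpy sketch. For a linear model the gradient is constant, so $\Gamma = \beta\beta^T$ and its leading eigenvector recovers the d.r. direction; the particular $\beta$ and sample size are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 2000

# A linear model y = beta^T x: the gradient is constant, grad f(x) = beta,
# so the gradient outer product matrix is Gamma = beta beta^T exactly.
beta = np.array([1.0, 2.0, 0.0, 0.0, 0.0])
X = rng.standard_normal((n, p))

grads = np.tile(beta, (n, 1))      # grad f(x_i) = beta for every sample
Gamma = grads.T @ grads / n        # Monte Carlo estimate of E[(grad f)(grad f)^T]

# The d.r. subspace is spanned by the top eigenvectors of Gamma.
eigvals, eigvecs = np.linalg.eigh(Gamma)
b1 = eigvecs[:, -1]                # leading eigenvector

# b1 is proportional to beta (up to sign).
cos = abs(b1 @ beta) / np.linalg.norm(beta)
assert cos > 0.999
```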
17-19 Learning gradients / GOP captures the d.r. space
Suppose $y = f(X) + \varepsilon = g(b_1^T X, \ldots, b_d^T X) + \varepsilon$.
Note that for $B = (b_1, \ldots, b_d)$, $\lambda_i b_i = \Gamma b_i$.
For $i = 1, \ldots, d$:
$$\frac{\partial f(x)}{\partial b_i} = b_i^T (\nabla f(x)) \not\equiv 0 \;\Rightarrow\; b_i^T \Gamma b_i > 0.$$
If $w \perp b_i$ for all $i$, then $w^T \Gamma w = 0$.
20-21 Learning gradients / Statistical interpretation
Linear case: $y = \beta^T x + \varepsilon$, $\varepsilon \stackrel{iid}{\sim} \mathrm{No}(0, \sigma^2)$. Let
$$\Omega = \mathrm{cov}(\mathbb{E}[X \mid Y]), \qquad \Sigma_X = \mathrm{cov}(X), \qquad \sigma_Y^2 = \mathrm{var}(Y).$$
Then
$$\Gamma = \sigma_Y^2 \left(1 - \frac{\sigma^2}{\sigma_Y^2}\right)^2 \Sigma_X^{-1} \Omega \Sigma_X^{-1} \approx \sigma_Y^2 \, \Sigma_X^{-1} \Omega \Sigma_X^{-1}.$$
22 Learning gradients / Statistical interpretation
For a general smooth $f(x)$, $y = f(x) + \varepsilon$, $\varepsilon \stackrel{iid}{\sim} \mathrm{No}(0, \sigma^2)$, the meaning of $\Omega = \mathrm{cov}(\mathbb{E}[X \mid Y])$ is not so clear.
23-28 Learning gradients / Nonlinear case
Partition into sections and compute local quantities:
$$\mathcal{X} = \bigcup_{i=1}^{I} \chi_i,$$
$$\Omega_i = \mathrm{cov}\left(\mathbb{E}[X|_{\chi_i} \mid Y|_{\chi_i}]\right), \quad \Sigma_i = \mathrm{cov}(X|_{\chi_i}), \quad \sigma_i^2 = \mathrm{var}(Y|_{\chi_i}), \quad m_i = \rho_X(\chi_i).$$
Then
$$\Gamma \approx \sum_{i=1}^{I} m_i \, \sigma_i^2 \, \Sigma_i^{-1} \Omega_i \Sigma_i^{-1}.$$
29-30 Learning gradients / Estimating the gradient
Taylor expansion:
$$y_i \approx f(x_i) \approx f(x_j) + \langle \nabla f(x_j), x_i - x_j \rangle \approx y_j + \langle \nabla f(x_j), x_i - x_j \rangle \quad \text{if } x_i \approx x_j.$$
Let $\vec{f} \approx \nabla f$; the following should be small:
$$\sum_{i,j} w_{ij} \left( y_i - y_j - \langle \vec{f}(x_j), x_i - x_j \rangle \right)^2,$$
where $w_{ij} = \frac{1}{s^{p+2}} \exp\left( -\|x_i - x_j\|^2 / 2s^2 \right)$ enforces $x_i \approx x_j$.
31-32 Learning gradients / Estimating the gradient
The gradient estimate:
$$\vec{f}_D = \arg\min_{\vec{f} \in \mathcal{H}_K^p} \left[ \frac{1}{n^2} \sum_{i,j=1}^{n} w_{ij} \left( y_i - y_j - \vec{f}(x_j)^T (x_i - x_j) \right)^2 + \lambda \|\vec{f}\|_K^2 \right],$$
where $\|\vec{f}\|_K$ is a smoothness penalty, the reproducing kernel Hilbert space norm.
33-34 Learning gradients / Computational efficiency
The computation requires fewer than $n^2$ parameters and is $O(n^6)$ time and $O(pn)$ memory:
$$\vec{f}_D(x) = \sum_{i=1}^{n} c_{i,D} \, K(x_i, x), \qquad c_D = (c_{1,D}, \ldots, c_{n,D})^T \in \mathbb{R}^{np}.$$
Define the Gram matrix $K$ with $K_{ij} = K(x_i, x_j)$; then
$$\hat{\Gamma} = c_D K c_D^T.$$
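As a rough numerical sketch of the idea, the RKHS estimator $\vec{f}_D$ can be stood in for by kernel-weighted local linear fits: around each $x_j$, a ridge-regularized weighted least-squares slope approximates $\nabla f(x_j)$, and the empirical GOP is formed from these slopes. The bandwidth, ridge parameter, and test function are made up for illustration; this is a simplification, not the estimator on the slide.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, s = 300, 4, 0.7

# Data from y = g(b^T x) + noise with b = e1: the d.r. direction is axis 1.
X = rng.uniform(-1, 1, size=(n, p))
y = np.sin(2 * X[:, 0]) + 0.05 * rng.standard_normal(n)

# Local weighted linear fit around each x_j; its slope approximates grad f(x_j).
grads = np.zeros((n, p))
for j in range(n):
    diff = X - X[j]                                    # x_i - x_j
    w = np.exp(-np.sum(diff**2, axis=1) / (2 * s**2))  # Gaussian weights w_ij
    A = diff * w[:, None]
    # ridge-regularized weighted least squares for the local slope
    grads[j] = np.linalg.solve(diff.T @ A + 1e-3 * np.eye(p),
                               A.T @ (y - y[j]))

# Empirical gradient outer product and its leading eigenvector.
Gamma_hat = grads.T @ grads / n
lead = np.linalg.eigh(Gamma_hat)[1][:, -1]
print(np.round(np.abs(lead), 2))   # mass concentrated on the first coordinate
```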
35 Learning gradients / Estimates on manifolds
The marginal distribution $\rho_X$ is concentrated on a $d$-dimensional compact Riemannian manifold $\mathcal{M}$ with isometric embedding $\varphi : \mathcal{M} \to \mathbb{R}^p$, metric $d_{\mathcal{M}}$, and uniform measure $d\mu$ on $\mathcal{M}$. Assume a regular distribution:
(i) The density $\nu(x) = \frac{d\rho_X(x)}{d\mu}$ exists and is Hölder continuous ($c_1 > 0$ and $0 < \theta \le 1$):
$$|\nu(x) - \nu(u)| \le c_1 \, d_{\mathcal{M}}^{\theta}(x, u) \qquad \forall x, u \in \mathcal{M}.$$
(ii) The measure along the boundary is small ($c_2 > 0$):
$$\rho_{\mathcal{M}}\left( \left\{ x \in \mathcal{M} : d_{\mathcal{M}}(x, \partial\mathcal{M}) \le t \right\} \right) \le c_2 t \qquad \forall t > 0.$$
36 Learning gradients / Convergence to gradient on manifold
Theorem. Under the above regularity conditions on $\rho_X$, and for $f \in C^2(\mathcal{M})$, with probability $1 - \delta$,
$$\left\| (d\varphi)^* \left( \vec{f}_D \right) - \nabla_{\mathcal{M}} f \right\|_{L^2_{\rho_{\mathcal{M}}}} \le C \left( \frac{\log(1/\delta)}{n} \right)^{1/(2d)},$$
where $(d\varphi)^*$ (projection onto the tangent space) is the dual of the map $d\varphi$.
37-38 Bayesian Mixture of Inverses / Principal components analysis (PCA)
Algorithmic view of PCA:
1. Given $X = (X_1, \ldots, X_n)$, a $p \times n$ matrix, construct $\hat{\Sigma} = (X - \bar{X})(X - \bar{X})^T$.
2. Eigen-decomposition of $\hat{\Sigma}$: $\lambda_i v_i = \hat{\Sigma} v_i$.
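The two steps above translate directly into numpy; the anisotropic toy data (variance 9 along the first axis) is made up so the top principal direction is known in advance.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 500

# Anisotropic Gaussian cloud: standard deviation 3 along axis 1, 1 elsewhere.
X = rng.standard_normal((p, n)) * np.array([[3.0], [1.0], [1.0]])

# Step 1: center the columns and form the (unnormalized) covariance.
Xc = X - X.mean(axis=1, keepdims=True)
Sigma_hat = Xc @ Xc.T

# Step 2: eigendecomposition; columns of V are the principal directions.
lam, V = np.linalg.eigh(Sigma_hat)
top = V[:, -1]          # direction of greatest variance
assert abs(top[0]) > 0.99   # aligned with the high-variance axis
```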
39-40 Bayesian Mixture of Inverses / Probabilistic PCA
$X \in \mathbb{R}^p$ is characterized by a multivariate normal, with $\mu \in \mathbb{R}^p$, $A \in \mathbb{R}^{p \times d}$, $\Delta \in \mathbb{R}^{p \times p}$, $\nu \in \mathbb{R}^d$:
$$X \sim \mathrm{No}(\mu + A\nu, \Delta), \qquad \nu \sim \mathrm{No}(0, I_d),$$
where $\nu$ is a latent variable.
41 Bayesian Mixture of Inverses / SDR model
Semiparametric model
$$Y_i = f(X_i) + \varepsilon_i = g(b_1^T X_i, \ldots, b_d^T X_i) + \varepsilon_i,$$
$\mathrm{span}\,B$ is the dimension reduction (d.r.) subspace.
42-44 Bayesian Mixture of Inverses / Principal fitted components (PFC)
Define $X_y \equiv (X \mid Y = y)$ and specify a multivariate normal distribution, with $\mu \in \mathbb{R}^p$, $A \in \mathbb{R}^{p \times d}$, $\nu_y \in \mathbb{R}^d$:
$$X_y \sim \mathrm{No}(\mu_y, \Delta), \qquad \mu_y = \mu + A\nu_y, \qquad B = \Delta^{-1} A.$$
Captures global linear predictive structure. Does not generalize to manifolds.
45-47 Bayesian Mixture of Inverses / Mixture models and localization
A driving idea in manifold learning is that manifolds are locally Euclidean.
A driving idea in probabilistic modeling is that mixture models are flexible and can capture nonparametric distributions.
Mixture models can capture local nonlinear predictive manifold structure.
48 Bayesian Mixture of Inverses / Model specification
$$X_y \sim \mathrm{No}(\mu_{yx}, \Delta), \qquad \mu_{yx} = \mu + A\nu_{yx}, \qquad \nu_{yx} \sim G_y,$$
where $G_y$ is a density indexed by $y$ having multiple clusters, $\mu \in \mathbb{R}^p$, $\varepsilon \sim \mathrm{N}(0, \Delta)$ with $\Delta \in \mathbb{R}^{p \times p}$, $A \in \mathbb{R}^{p \times d}$, $\nu_{yx} \in \mathbb{R}^d$.
49 Bayesian Mixture of Inverses / Dimension reduction space
Proposition. For this model the d.r. space is the span of $B = \Delta^{-1} A$:
$$Y \mid X \;\stackrel{d}{=}\; Y \mid (\Delta^{-1} A)^T X.$$
50 Bayesian Mixture of Inverses / Sampling distribution
Define $\nu_i \equiv \nu_{y_i x_i}$. Sampling distribution for the data:
$$x_i \mid (y_i, \mu, \nu_i, A, \Delta) \sim \mathrm{N}(\mu + A\nu_i, \Delta), \qquad \nu_i \sim G_{y_i}.$$
51-53 Bayesian Mixture of Inverses / Categorical response: modeling G_y
$\mathcal{Y} = \{1, \ldots, C\}$, so each category has a distribution
$$\nu_i \mid (y_i = c) \sim G_c, \qquad c = 1, \ldots, C.$$
$\nu_i$ is modeled as a mixture of $C$ distributions $G_1, \ldots, G_C$, with a Dirichlet process prior for each distribution:
$$G_c \sim \mathrm{DP}(\alpha_0, G_0).$$
Go to board.
54 Bayesian Mixture of Inverses / Likelihood
$$\mathrm{Lik}(\text{data} \mid \theta) \equiv \mathrm{Lik}(\text{data} \mid A, \Delta, \nu_1, \ldots, \nu_n, \mu),$$
$$\mathrm{Lik}(\text{data} \mid \theta) \propto \det(\Delta^{-1})^{n/2} \exp\left[ -\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu - A\nu_i)^T \Delta^{-1} (x_i - \mu - A\nu_i) \right].$$
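The likelihood above is straightforward to evaluate numerically. In this sketch `Delta` stands in for the model's covariance matrix, and all parameter values (`mu`, `A`, `nu`) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, d = 50, 3, 1

# Illustrative parameter values for the model x_i = mu + A nu_i + eps_i.
mu = np.zeros(p)
A = np.array([[1.0], [0.5], [0.0]])
Delta = np.eye(p)
nu = rng.standard_normal((n, d))
x = (mu + nu @ A.T) + rng.multivariate_normal(np.zeros(p), Delta, n)

# Log of the likelihood on the slide (up to the proportionality constant).
Dinv = np.linalg.inv(Delta)
resid = x - mu - nu @ A.T
loglik = (n / 2) * np.log(np.linalg.det(Dinv)) \
         - 0.5 * np.einsum('ij,jk,ik->', resid, Dinv, resid)
print(loglik)
```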
55-58 Bayesian Mixture of Inverses / Posterior inference
Given data,
$$P_\theta \equiv \mathrm{Post}(\theta \mid \text{data}) \propto \mathrm{Lik}(\text{data} \mid \theta) \, \pi(\theta).$$
1. $P_\theta$ provides an estimate of (un)certainty on $\theta$.
2. Requires a prior on $\theta$.
3. How to sample from $P_\theta$?
59-61 Bayesian Mixture of Inverses / Markov chain Monte Carlo
No closed form for $P_\theta$.
1. Specify a Markov transition kernel $K(\theta_t, \theta_{t+1})$ with stationary distribution $P_\theta$.
2. Run the Markov chain to obtain $\theta_1, \ldots, \theta_T$.
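The two steps above can be sketched generically with a random-walk Metropolis kernel, which is one standard way to build such a transition kernel. The target here is a stand-in one-dimensional density (not the model's posterior), and the step size and chain length are made up:

```python
import numpy as np

rng = np.random.default_rng(3)

def log_post(theta):
    """Stand-in unnormalized log target: No(1, 1)."""
    return -0.5 * (theta - 1.0) ** 2

# Random-walk Metropolis: a kernel K(theta_t, theta_{t+1}) whose
# stationary distribution is the target above.
theta, draws = 0.0, []
for t in range(20000):
    prop = theta + 0.8 * rng.standard_normal()       # propose a move
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop                                  # accept
    draws.append(theta)                               # else keep current state

samples = np.array(draws[5000:])                      # discard burn-in
print(samples.mean())                                 # close to the target mean 1.0
```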
62-63 Bayesian Mixture of Inverses / Sampling from the posterior
Inference consists of drawing samples $\theta^{(t)} = (\mu^{(t)}, A^{(t)}, \Delta^{-1(t)}, \nu^{(t)})$ from the posterior. Define
$$\theta^{(t)}_{/\mu} \equiv (A^{(t)}, \Delta^{-1(t)}, \nu^{(t)}), \qquad \theta^{(t)}_{/A} \equiv (\mu^{(t)}, \Delta^{-1(t)}, \nu^{(t)}),$$
$$\theta^{(t)}_{/\Delta^{-1}} \equiv (\mu^{(t)}, A^{(t)}, \nu^{(t)}), \qquad \theta^{(t)}_{/\nu} \equiv (\mu^{(t)}, A^{(t)}, \Delta^{-1(t)}).$$
64-67 Bayesian Mixture of Inverses / Gibbs sampling
Conditional probabilities can be used to sample $\mu$, $\Delta^{-1}$, $A$:
$$\mu^{(t+1)} \mid \left( \text{data}, \theta^{(t)}_{/\mu} \right) \sim \mathrm{No}\left( \text{data}, \theta^{(t)}_{/\mu} \right),$$
$$\Delta^{-1(t+1)} \mid \left( \text{data}, \theta^{(t)}_{/\Delta^{-1}} \right) \sim \mathrm{Wishart}\left( \text{data}, \theta^{(t)}_{/\Delta^{-1}} \right),$$
$$A^{(t+1)} \mid \left( \text{data}, \theta^{(t)}_{/A} \right) \sim \mathrm{No}\left( \text{data}, \theta^{(t)}_{/A} \right).$$
Sampling $\nu^{(t)}$ is more involved.
68-69 Bayesian Mixture of Inverses / Posterior draws from the Grassmann manifold
Given samples $(\Delta^{-1(t)}, A^{(t)})_{t=1}^{m}$, compute $B^{(t)} = \Delta^{-1(t)} A^{(t)}$.
Each $B^{(t)}$ spans a subspace, which is a point in the Grassmann manifold $\mathcal{G}(d, p)$. There is a Riemannian metric on this manifold. This has two implications.
70-71 Bayesian Mixture of Inverses / Posterior mean and variance
Given draws $(B^{(t)})_{t=1}^{m}$, the posterior mean and variance should be computed with respect to the Riemannian metric.
Given two subspaces $\mathcal{W}$ and $\mathcal{V}$ spanned by orthonormal bases $X$ and $Y$, the geodesic distance is computed from the principal angles:
$$\left( I - X(X^T X)^{-1} X^T \right) Y (X^T Y)^{-1} = U \Sigma V^T, \qquad \Theta = \mathrm{atan}(\Sigma), \qquad \mathrm{dist}(\mathcal{W}, \mathcal{V}) = \mathrm{Tr}(\Theta^2).$$
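A minimal numpy sketch of this subspace distance, using the equivalent standard computation of principal angles from the singular values of $W^T V$ (rather than the tangent formula on the slide); the two bases are toy examples:

```python
import numpy as np

def grassmann_dist(Bw, Bv):
    """Distance between span(Bw) and span(Bv) via principal angles,
    using Tr(Theta^2) as on the slide."""
    # Orthonormalize; the singular values of W^T V are the cosines
    # of the principal angles between the two subspaces.
    W, _ = np.linalg.qr(Bw)
    V, _ = np.linalg.qr(Bv)
    sv = np.clip(np.linalg.svd(W.T @ V, compute_uv=False), -1.0, 1.0)
    theta = np.arccos(sv)          # principal angles Theta
    return np.sum(theta ** 2)      # Tr(Theta^2)

# Identical subspaces are at distance 0; orthogonal lines in R^2 are
# separated by a single principal angle of pi/2.
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
print(grassmann_dist(B1, B1))   # 0.0
print(grassmann_dist(B1, B2))   # (pi/2)**2 ~ 2.4674
```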
72-73 Bayesian Mixture of Inverses / Posterior mean and variance
The posterior mean subspace (Karcher mean):
$$B_{\mathrm{Bayes}} = \arg\min_{B \in \mathcal{G}(d,p)} \sum_{i=1}^{m} \mathrm{dist}(B_i, B).$$
Uncertainty:
$$\mathrm{var}\left( \{B_1, \ldots, B_m\} \right) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{dist}(B_i, B_{\mathrm{Bayes}}).$$
74 Bayesian Mixture of Inverses / Distribution theory on Grassmann manifolds
If $\mathcal{B}$ is the linear span of $d$ central normal vectors in $\mathbb{R}^p$ with covariance matrix $\Sigma$, the density of the Grassmannian distribution $G_\Sigma$ w.r.t. the reference measure $G_I$ is
$$\frac{dG_\Sigma}{dG_I}(\langle X \rangle) = \left( \frac{\det(X^T X)}{\det(X^T \Sigma^{-1} X)} \right)^{d/2},$$
where $\langle X \rangle = \mathrm{span}(X)$ and $X = (x_1, \ldots, x_d)$.
75 Results on data / Swiss roll
$$X_1 = t \cos(t), \quad X_2 = h, \quad X_3 = t \sin(t), \quad X_4, \ldots, X_{10} \stackrel{iid}{\sim} \mathrm{No}(0, 1),$$
where $t = \frac{3\pi}{2}(1 + 2\theta)$, $\theta \sim \mathrm{Unif}(0,1)$, $h \sim \mathrm{Unif}(0,1)$, and
$$Y = \sin(5\pi\theta) + h^2 + \varepsilon, \qquad \varepsilon \sim \mathrm{No}(0, 0.01).$$
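The simulation above translates directly into numpy (assuming the $0.01$ in $\mathrm{No}(0, 0.01)$ is a variance; the sample size is made up):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000

# Swiss roll regression data as specified on the slide.
theta = rng.uniform(0, 1, n)
h = rng.uniform(0, 1, n)
t = 3 * np.pi / 2 * (1 + 2 * theta)

X = np.column_stack([t * np.cos(t),                  # X1
                     h,                              # X2
                     t * np.sin(t),                  # X3
                     rng.standard_normal((n, 7))])   # X4..X10: noise dimensions
y = np.sin(5 * np.pi * theta) + h**2 + rng.normal(0, np.sqrt(0.01), n)

print(X.shape)   # (1000, 10)
```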
76 Results on data / Swiss roll
[Figure: pictures of the Swiss roll data.]
77 Results on data / Swiss roll
Metric: projection of the estimated d.r. space $\hat{B} = (\hat{b}_1, \ldots, \hat{b}_d)$ onto $B$:
$$\frac{1}{d} \sum_{i=1}^{d} \left\| P_B \hat{b}_i \right\|^2 = \frac{1}{d} \sum_{i=1}^{d} \left\| (B B^T) \hat{b}_i \right\|^2.$$
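This accuracy metric is a few lines of numpy; the example bases are toy values, and the true basis is orthonormalized so that $BB^T$ is the projection $P_B$:

```python
import numpy as np

def dr_accuracy(B, B_hat):
    """(1/d) sum_i ||B B^T b_hat_i||^2 for an orthonormalized true basis B."""
    B, _ = np.linalg.qr(B)          # orthonormal basis of the true d.r. space
    P = B @ B.T                     # projection P_B onto span(B)
    proj = P @ B_hat                # project each estimated direction
    return np.mean(np.sum(proj ** 2, axis=0))

# Perfect recovery of unit directions scores 1; an orthogonal estimate scores 0.
B_true = np.array([[1.0], [0.0], [0.0]])
print(dr_accuracy(B_true, B_true))                          # 1.0
print(dr_accuracy(B_true, np.array([[0.], [1.], [0.]])))    # 0.0
```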
78 Results on data / Swiss roll
[Figure: comparison of algorithms, accuracy vs. sample size, for BMI, BAGL, SIR, LSIR, PHD, SAVE.]
79 Results on data / Swiss roll
[Figure: posterior variance, boxplot of the distances.]
80 Results on data / Swiss roll
[Figure: error as a function of the number of e.d.r. directions $d$.]
81-83 Results on data / Digits
Two classification problems: 3 vs. 8 and 5 vs. 8, with training samples drawn from each class.
84 Results on data / Digits
[Figure: BMI projections for the digits data.]
85 Results on data / Digits
All ten digits:
Table: average classification error rate and standard deviation on the digits data, per digit (0-9), for the nonlinear vs. linear models.
86 Results on data / Cancer
Cancer classification: $n = 38$ samples with expression levels for $p = 7129$ genes or ESTs. 19 samples are Acute Myeloid Leukemia (AML); 19 are Acute Lymphoblastic Leukemia (ALL), and these fall into two subclusters, B-cell and T-cell.
87 Results on data / Cancer
[Figure: substructure captured by the estimated projections.]
88 The end / Funding
IGSP Center for Systems Biology at Duke; NSF DMS.
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationModelling geoadditive survival data
Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model
More informationRandomized Algorithms
Randomized Algorithms Saniv Kumar, Google Research, NY EECS-6898, Columbia University - Fall, 010 Saniv Kumar 9/13/010 EECS6898 Large Scale Machine Learning 1 Curse of Dimensionality Gaussian Mixture Models
More informationClustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation
Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based clustering Model estimation 1 Clustering A basic tool in data mining/pattern recognition: Divide
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationSTATISTICS 407 METHODS OF MULTIVARIATE ANALYSIS TOPICS
STATISTICS 407 METHODS OF MULTIVARIATE ANALYSIS TOPICS Principal Component Analysis (PCA): Reduce the, summarize the sources of variation in the data, transform the data into a new data set where the variables
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More informationBayesian Regularization
Bayesian Regularization Aad van der Vaart Vrije Universiteit Amsterdam International Congress of Mathematicians Hyderabad, August 2010 Contents Introduction Abstract result Gaussian process priors Co-authors
More informationThe Learning Problem and Regularization Class 03, 11 February 2004 Tomaso Poggio and Sayan Mukherjee
The Learning Problem and Regularization 9.520 Class 03, 11 February 2004 Tomaso Poggio and Sayan Mukherjee About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing
More informationGaussian Processes (10/16/13)
STA561: Probabilistic machine learning Gaussian Processes (10/16/13) Lecturer: Barbara Engelhardt Scribes: Changwei Hu, Di Jin, Mengdi Wang 1 Introduction In supervised learning, we observe some inputs
More informationGaussian Processes for Machine Learning
Gaussian Processes for Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics Tübingen, Germany carl@tuebingen.mpg.de Carlos III, Madrid, May 2006 The actual science of
More informationCSci 8980: Advanced Topics in Graphical Models Gaussian Processes
CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian
More informationA Selective Review of Sufficient Dimension Reduction
A Selective Review of Sufficient Dimension Reduction Lexin Li Department of Statistics North Carolina State University Lexin Li (NCSU) Sufficient Dimension Reduction 1 / 19 Outline 1 General Framework
More informationCIS 520: Machine Learning Oct 09, Kernel Methods
CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More informationAdvanced Introduction to Machine Learning
10-715 Advanced Introduction to Machine Learning Homework Due Oct 15, 10.30 am Rules Please follow these guidelines. Failure to do so, will result in loss of credit. 1. Homework is due on the due date
More informationWeb Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.
Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationPoint spread function reconstruction from the image of a sharp edge
DOE/NV/5946--49 Point spread function reconstruction from the image of a sharp edge John Bardsley, Kevin Joyce, Aaron Luttman The University of Montana National Security Technologies LLC Montana Uncertainty
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA
More informationAdvanced Introduction to Machine Learning
10-715 Advanced Introduction to Machine Learning Homework 3 Due Nov 12, 10.30 am Rules 1. Homework is due on the due date at 10.30 am. Please hand over your homework at the beginning of class. Please see
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationLikelihood NIPS July 30, Gaussian Process Regression with Student-t. Likelihood. Jarno Vanhatalo, Pasi Jylanki and Aki Vehtari NIPS-2009
with with July 30, 2010 with 1 2 3 Representation Representation for Distribution Inference for the Augmented Model 4 Approximate Laplacian Approximation Introduction to Laplacian Approximation Laplacian
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationGaussian Process Regression
Gaussian Process Regression 4F1 Pattern Recognition, 21 Carl Edward Rasmussen Department of Engineering, University of Cambridge November 11th - 16th, 21 Rasmussen (Engineering, Cambridge) Gaussian Process
More informationInformation geometry for bivariate distribution control
Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic
More informationA Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness
A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model
More informationApproximate Kernel PCA with Random Features
Approximate Kernel PCA with Random Features (Computational vs. Statistical Tradeoff) Bharath K. Sriperumbudur Department of Statistics, Pennsylvania State University Journées de Statistique Paris May 28,
More informationMachine Learning for Data Science (CS4786) Lecture 12
Machine Learning for Data Science (CS4786) Lecture 12 Gaussian Mixture Models Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016fa/ Back to K-means Single link is sensitive to outliners We
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationShort Questions (Do two out of three) 15 points each
Econometrics Short Questions Do two out of three) 5 points each ) Let y = Xβ + u and Z be a set of instruments for X When we estimate β with OLS we project y onto the space spanned by X along a path orthogonal
More informationLecture 2: Statistical Decision Theory (Part I)
Lecture 2: Statistical Decision Theory (Part I) Hao Helen Zhang Hao Helen Zhang Lecture 2: Statistical Decision Theory (Part I) 1 / 35 Outline of This Note Part I: Statistics Decision Theory (from Statistical
More informationThe Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model
Thai Journal of Mathematics : 45 58 Special Issue: Annual Meeting in Mathematics 207 http://thaijmath.in.cmu.ac.th ISSN 686-0209 The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationCOMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017
COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY
More informationVariable Selection in Structured High-dimensional Covariate Spaces
Variable Selection in Structured High-dimensional Covariate Spaces Fan Li 1 Nancy Zhang 2 1 Department of Health Care Policy Harvard University 2 Department of Statistics Stanford University May 14 2007
More informationExpression Data Exploration: Association, Patterns, Factors & Regression Modelling
Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Exploring gene expression data Scale factors, median chip correlation on gene subsets for crude data quality investigation
More information