Supervised Dimension Reduction: A Tale of Two Manifolds


1 Supervised Dimension Reduction: A Tale of Two Manifolds S. Mukherjee, K. Mao, F. Liang, Q. Wu, M. Maggioni, D-X. Zhou Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Department of Mathematics Duke University November 4, 2009

2 Supervised dimension reduction Information and sufficiency A fundamental idea in statistical thought is to reduce data to relevant information. This was the paradigm of R.A. Fisher (beloved Bayesian) and goes back to at least Adcock 1878 and Edgeworth 1884.

3 Supervised dimension reduction Information and sufficiency A fundamental idea in statistical thought is to reduce data to relevant information. This was the paradigm of R.A. Fisher (beloved Bayesian) and goes back to at least Adcock 1878 and Edgeworth 1884. $X_1, \ldots, X_n$ drawn iid from a Gaussian can be reduced to $(\mu, \sigma^2)$.

4 Supervised dimension reduction Regression Assume the model $Y = f(X) + \varepsilon$, $\mathbb{E}\varepsilon = 0$, with $X \in \mathcal{X} \subseteq \mathbb{R}^p$ and $Y \in \mathbb{R}$.

5 Supervised dimension reduction Regression Assume the model $Y = f(X) + \varepsilon$, $\mathbb{E}\varepsilon = 0$, with $X \in \mathcal{X} \subseteq \mathbb{R}^p$ and $Y \in \mathbb{R}$. Data $D = \{(x_i, y_i)\}_{i=1}^n \stackrel{iid}{\sim} \rho(X, Y)$.

6 Supervised dimension reduction Dimension reduction If the data lives in a p-dimensional space $\mathcal{X} \subseteq \mathbb{R}^p$, replace $X$ with $\Theta(X) \in \mathbb{R}^d$, $d \ll p$.

7 Supervised dimension reduction Dimension reduction If the data lives in a p-dimensional space $\mathcal{X} \subseteq \mathbb{R}^p$, replace $X$ with $\Theta(X) \in \mathbb{R}^d$, $d \ll p$. My belief: physical, biological and social systems are inherently low dimensional, and the variation of interest in these systems can be captured by a low-dimensional submanifold.

8 Supervised dimension reduction Supervised dimension reduction (SDR) Given response variables $Y_1, \ldots, Y_n \in \mathbb{R}$ and explanatory variables or covariates $X_1, \ldots, X_n \in \mathcal{X} \subseteq \mathbb{R}^p$: $Y_i = f(X_i) + \varepsilon_i$, $\varepsilon_i \stackrel{iid}{\sim} \mathrm{No}(0, \sigma^2)$.

9 Supervised dimension reduction Supervised dimension reduction (SDR) Given response variables $Y_1, \ldots, Y_n \in \mathbb{R}$ and explanatory variables or covariates $X_1, \ldots, X_n \in \mathcal{X} \subseteq \mathbb{R}^p$: $Y_i = f(X_i) + \varepsilon_i$, $\varepsilon_i \stackrel{iid}{\sim} \mathrm{No}(0, \sigma^2)$. Is there a submanifold $S \equiv S_{Y|X}$ such that $Y \perp X \mid P_S(X)$?

10 Supervised dimension reduction Visualization of SDR [Figure: four panels, (a) Data, (b) Diffusion map, (c) GOP, (d) GDM, each showing the data in two embedding dimensions.]

11 Supervised dimension reduction Linear projections capture nonlinear manifolds In this talk $P_S(X) = B^T X$ where $B = (b_1, \ldots, b_d)$.

12 Supervised dimension reduction Linear projections capture nonlinear manifolds In this talk $P_S(X) = B^T X$ where $B = (b_1, \ldots, b_d)$. Semiparametric model $Y_i = f(X_i) + \varepsilon_i = g(b_1^T X_i, \ldots, b_d^T X_i) + \varepsilon_i$; span $B$ is the dimension reduction (d.r.) subspace.

13 Learning gradients SDR model Semiparametric model $Y_i = f(X_i) + \varepsilon_i = g(b_1^T X_i, \ldots, b_d^T X_i) + \varepsilon_i$; span $B$ is the dimension reduction (d.r.) subspace.

14 Learning gradients SDR model Semiparametric model $Y_i = f(X_i) + \varepsilon_i = g(b_1^T X_i, \ldots, b_d^T X_i) + \varepsilon_i$; span $B$ is the dimension reduction (d.r.) subspace. Assume the marginal distribution $\rho_X$ is concentrated on a manifold $\mathcal{M} \subset \mathbb{R}^p$ of dimension $d \ll p$.

15 Learning gradients Gradients and outer products Given a smooth function $f$, the gradient is $\nabla f(x) = \Big(\frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_p}\Big)^T$.

16 Learning gradients Gradients and outer products Given a smooth function $f$, the gradient is $\nabla f(x) = \Big(\frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_p}\Big)^T$. Define the gradient outer product matrix $\Gamma$: $\Gamma_{ij} = \int_{\mathcal{X}} \frac{\partial f}{\partial x_i}(x)\, \frac{\partial f}{\partial x_j}(x)\, d\rho_X(x)$, that is, $\Gamma = \mathbb{E}\big[(\nabla f)(\nabla f)^T\big]$.

17 Learning gradients GOP captures the d.r. space Suppose $y = f(X) + \varepsilon = g(b_1^T X, \ldots, b_d^T X) + \varepsilon$.

18 Learning gradients GOP captures the d.r. space Suppose $y = f(X) + \varepsilon = g(b_1^T X, \ldots, b_d^T X) + \varepsilon$. Note that for $B = (b_1, \ldots, b_d)$, $\lambda_i b_i = \Gamma b_i$.

19 Learning gradients GOP captures the d.r. space Suppose $y = f(X) + \varepsilon = g(b_1^T X, \ldots, b_d^T X) + \varepsilon$. Note that for $B = (b_1, \ldots, b_d)$, $\lambda_i b_i = \Gamma b_i$. For $i = 1, \ldots, d$, $\frac{\partial f(x)}{\partial b_i} = b_i^T \nabla f(x) \not\equiv 0 \;\Rightarrow\; b_i^T \Gamma b_i \neq 0$. If $w \perp b_i$ for all $i$ then $w^T \Gamma w = 0$.
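
A minimal synthetic check, not from the talk, that the eigenvectors of the gradient outer product matrix recover the d.r. space; the regression function, its gradient, and the directions $b_1, b_2$ below are illustrative choices of mine, with $f(x) = \sin(b_1^T x) + (b_2^T x)^2$.

```python
# Synthetic check: eigenvectors of Gamma = E[(grad f)(grad f)^T] span the
# d.r. space. Directions and regression function are illustrative.
import numpy as np

rng = np.random.default_rng(0)
p, n = 10, 5000
b1 = np.zeros(p); b1[0] = 1.0
b2 = np.zeros(p); b2[1] = 1.0
X = rng.standard_normal((n, p))

# Gradient of f via the chain rule: grad f(x) = cos(b1'x) b1 + 2 (b2'x) b2.
u, v = X @ b1, X @ b2
grads = np.cos(u)[:, None] * b1 + 2.0 * v[:, None] * b2      # n x p

# Monte Carlo estimate of Gamma and its eigendecomposition.
Gamma = grads.T @ grads / n
eigvals, eigvecs = np.linalg.eigh(Gamma)
B_hat = eigvecs[:, -2:]            # eigenvectors of the two largest eigenvalues
print(np.round(eigvals[-4:], 3))   # only two eigenvalues are far from zero
```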

20 Learning gradients Statistical interpretation Linear case: $y = \beta^T x + \varepsilon$, $\varepsilon \stackrel{iid}{\sim} \mathrm{No}(0, \sigma^2)$. $\Omega = \mathrm{cov}\big(\mathbb{E}[X \mid Y]\big)$, $\Sigma_X = \mathrm{cov}(X)$, $\sigma_Y^2 = \mathrm{var}(Y)$.

21 Learning gradients Statistical interpretation Linear case: $y = \beta^T x + \varepsilon$, $\varepsilon \stackrel{iid}{\sim} \mathrm{No}(0, \sigma^2)$. $\Omega = \mathrm{cov}\big(\mathbb{E}[X \mid Y]\big)$, $\Sigma_X = \mathrm{cov}(X)$, $\sigma_Y^2 = \mathrm{var}(Y)$. $\Gamma = \sigma_Y^2 \Big(1 - \frac{\sigma^2}{\sigma_Y^2}\Big)^2 \Sigma_X^{-1} \Omega \Sigma_X^{-1} \approx \sigma_Y^2\, \Sigma_X^{-1} \Omega \Sigma_X^{-1}$.

22 Learning gradients Statistical interpretation For smooth $f(x)$: $y = f(x) + \varepsilon$, $\varepsilon \stackrel{iid}{\sim} \mathrm{No}(0, \sigma^2)$. Here $\Omega = \mathrm{cov}\big(\mathbb{E}[X \mid Y]\big)$ is not so clear.

23 Learning gradients Nonlinear case Partition into sections and compute local quantities: $\mathcal{X} = \bigcup_{i=1}^I \chi_i$

24 Learning gradients Nonlinear case Partition into sections and compute local quantities: $\mathcal{X} = \bigcup_{i=1}^I \chi_i$, $\Omega_i = \mathrm{cov}\big(\mathbb{E}[X_{\chi_i} \mid Y_{\chi_i}]\big)$

25 Learning gradients Nonlinear case Partition into sections and compute local quantities: $\mathcal{X} = \bigcup_{i=1}^I \chi_i$, $\Omega_i = \mathrm{cov}\big(\mathbb{E}[X_{\chi_i} \mid Y_{\chi_i}]\big)$, $\Sigma_i = \mathrm{cov}(X_{\chi_i})$

26 Learning gradients Nonlinear case Partition into sections and compute local quantities: $\mathcal{X} = \bigcup_{i=1}^I \chi_i$, $\Omega_i = \mathrm{cov}\big(\mathbb{E}[X_{\chi_i} \mid Y_{\chi_i}]\big)$, $\Sigma_i = \mathrm{cov}(X_{\chi_i})$, $\sigma_i^2 = \mathrm{var}(Y_{\chi_i})$

27 Learning gradients Nonlinear case Partition into sections and compute local quantities: $\mathcal{X} = \bigcup_{i=1}^I \chi_i$, $\Omega_i = \mathrm{cov}\big(\mathbb{E}[X_{\chi_i} \mid Y_{\chi_i}]\big)$, $\Sigma_i = \mathrm{cov}(X_{\chi_i})$, $\sigma_i^2 = \mathrm{var}(Y_{\chi_i})$, $m_i = \rho_X(\chi_i)$.

28 Learning gradients Nonlinear case Partition into sections and compute local quantities: $\mathcal{X} = \bigcup_{i=1}^I \chi_i$, $\Omega_i = \mathrm{cov}\big(\mathbb{E}[X_{\chi_i} \mid Y_{\chi_i}]\big)$, $\Sigma_i = \mathrm{cov}(X_{\chi_i})$, $\sigma_i^2 = \mathrm{var}(Y_{\chi_i})$, $m_i = \rho_X(\chi_i)$. $\Gamma \approx \sum_{i=1}^I m_i\, \sigma_i^2\, \Sigma_i^{-1} \Omega_i \Sigma_i^{-1}$.
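
A rough numerical sketch of the local approximation above; the choices here are my own assumptions, not the talk's: the partition $\chi_i$ is built with k-means, $\Omega_i = \mathrm{cov}(\mathbb{E}[X \mid Y])$ is estimated by slicing $Y$ within each cell (as in sliced inverse regression), and a small ridge keeps the local covariances invertible.

```python
# Sketch (assumptions mine) of Gamma ~ sum_i m_i sigma_i^2 Sigma_i^{-1} Omega_i Sigma_i^{-1}.
import numpy as np
from sklearn.cluster import KMeans

def local_gop(X, y, n_cells=5, n_slices=5, ridge=1e-3):
    n, p = X.shape
    labels = KMeans(n_clusters=n_cells, n_init=10, random_state=0).fit_predict(X)
    Gamma = np.zeros((p, p))
    for i in range(n_cells):
        idx = labels == i
        if idx.sum() < 2 * n_slices:          # skip cells that are too small
            continue
        Xi, yi = X[idx], y[idx]
        m_i = idx.mean()                       # rho_X(chi_i)
        sigma2_i = yi.var()                    # var(Y_{chi_i})
        Sigma_i = np.cov(Xi, rowvar=False) + ridge * np.eye(p)
        # cov(E[X | Y]) within the cell, estimated from slice means of X.
        edges = np.quantile(yi, np.linspace(0.0, 1.0, n_slices + 1))
        sl = np.clip(np.searchsorted(edges, yi, side="right") - 1, 0, n_slices - 1)
        centered_means, weights = [], []
        for s in range(n_slices):
            sel = sl == s
            if sel.any():
                centered_means.append(Xi[sel].mean(axis=0) - Xi.mean(axis=0))
                weights.append(sel.mean())
        M = np.asarray(centered_means)
        Omega_i = (M * np.asarray(weights)[:, None]).T @ M
        Sinv = np.linalg.inv(Sigma_i)
        Gamma += m_i * sigma2_i * Sinv @ Omega_i @ Sinv
    return Gamma
```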

29 Learning gradients Estimating the gradient Taylor expansion: $y_i \approx f(x_i) \approx f(x_j) + \langle \nabla f(x_j),\, x_i - x_j \rangle \approx y_j + \langle \nabla f(x_j),\, x_i - x_j \rangle$ if $x_i \approx x_j$.

30 Learning gradients Estimating the gradient Taylor expansion: $y_i \approx f(x_i) \approx f(x_j) + \langle \nabla f(x_j),\, x_i - x_j \rangle \approx y_j + \langle \nabla f(x_j),\, x_i - x_j \rangle$ if $x_i \approx x_j$. Let $\vec{f} \approx \nabla f$; the following should be small: $\sum_{i,j} w_{ij}\,\big(y_i - y_j - \langle \vec{f}(x_j),\, x_i - x_j \rangle\big)^2$, where $w_{ij} = \frac{1}{s^{p+2}} \exp\big(-\|x_i - x_j\|^2 / 2s^2\big)$ enforces $x_i \approx x_j$.

31 Learning gradients Estimating the gradient The gradient estimate $\hat{\vec{f}}_D = \arg\min_{\vec{f} \in \mathcal{H}_K^p} \frac{1}{n^2} \sum_{i,j=1}^n w_{ij}\big(y_i - y_j - \vec{f}(x_j)^T (x_i - x_j)\big)^2 + \lambda \|\vec{f}\|_K^2$, where $\|\vec{f}\|_K$ is a smoothness penalty, the reproducing kernel Hilbert space norm.

32 Learning gradients Estimating the gradient The gradient estimate $\hat{\vec{f}}_D = \arg\min_{\vec{f} \in \mathcal{H}_K^p} \frac{1}{n^2} \sum_{i,j=1}^n w_{ij}\big(y_i - y_j - \vec{f}(x_j)^T (x_i - x_j)\big)^2 + \lambda \|\vec{f}\|_K^2$, where $\|\vec{f}\|_K$ is a smoothness penalty, the reproducing kernel Hilbert space norm.
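
This is not the RKHS estimator above, but a simplified surrogate of my own: drop the RKHS norm and solve, independently at each $x_j$, the weighted ridge problem $g_j = \arg\min_g \sum_i w_{ij}(y_i - y_j - g^T(x_i - x_j))^2 + \lambda\|g\|^2$, then form the empirical GOP from the local gradient estimates. Function names and defaults are mine.

```python
# Local weighted-ridge surrogate for the RKHS gradient estimate (simplification mine).
import numpy as np

def gradient_outer_product(X, y, s=1.0, lam=1e-2):
    n, p = X.shape
    Gamma = np.zeros((p, p))
    for j in range(n):
        D = X - X[j]                                       # rows: x_i - x_j
        w = np.exp(-np.sum(D**2, axis=1) / (2 * s**2)) / s**(p + 2)
        dy = y - y[j]                                      # y_i - y_j
        A = (D * w[:, None]).T @ D + lam * np.eye(p)
        b = (D * w[:, None]).T @ dy
        g = np.linalg.solve(A, b)                          # estimate of grad f(x_j)
        Gamma += np.outer(g, g)
    return Gamma / n
```

The top eigenvectors of the returned matrix then estimate the d.r. directions, as in the synthetic check earlier.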

33 Learning gradients Computational efficiency The computation requires fewer than $n^2$ parameters and is $O(n^6)$ time and $O(pn)$ memory: $\hat{\vec{f}}_D(x) = \sum_{i=1}^n c_{i,D}\, K(x_i, x)$, $c_D = (c_{1,D}, \ldots, c_{n,D}) \in \mathbb{R}^{p \times n}$.

34 Learning gradients Computational efficiency The computation requires fewer than $n^2$ parameters and is $O(n^6)$ time and $O(pn)$ memory: $\hat{\vec{f}}_D(x) = \sum_{i=1}^n c_{i,D}\, K(x_i, x)$, $c_D = (c_{1,D}, \ldots, c_{n,D}) \in \mathbb{R}^{p \times n}$. Define the Gram matrix $K$ with $K_{ij} = K(x_i, x_j)$; then $\hat\Gamma = c_D K c_D^T$.

35 Learning gradients Estimates on manifolds The marginal distribution $\rho_X$ is concentrated on a compact Riemannian manifold $\mathcal{M}$ of dimension $d$ with isometric embedding $\varphi : \mathcal{M} \to \mathbb{R}^p$ and metric $d_{\mathcal{M}}$, and $d\mu$ is the uniform measure on $\mathcal{M}$. Assume a regular distribution: (i) the density $\nu(x) = \frac{d\rho_X(x)}{d\mu}$ exists and is Hölder continuous (for some $c_1 > 0$ and $0 < \theta \le 1$): $|\nu(x) - \nu(u)| \le c_1\, d_{\mathcal{M}}^{\theta}(x, u)$ for all $x, u \in \mathcal{M}$; (ii) the measure along the boundary is small (for some $c_2 > 0$): $\rho_X\big(\{x \in \mathcal{M} : d_{\mathcal{M}}(x, \partial\mathcal{M}) \le t\}\big) \le c_2 t$ for all $t > 0$.

36 Learning gradients Convergence to gradient on manifold Theorem: under the above regularity conditions on $\rho_X$, and for $f \in C^2(\mathcal{M})$, with probability $1 - \delta$, $\big\| (d\varphi)^* \hat{\vec{f}}_D - \nabla_{\mathcal{M}} f \big\|^2_{L^2_{\rho_{\mathcal{M}}}} \le C \Big(\frac{\log(1/\delta)}{n}\Big)^{\frac{1}{2d}}$, where $(d\varphi)^*$ (projection onto the tangent space) is the dual of the map $d\varphi$.

37 Bayesian Mixture of Inverses Principal components analysis (PCA) Algorithmic view of PCA: 1. Given $X = (X_1, \ldots, X_n)$, a $p \times n$ matrix, construct $\hat\Sigma = (X - \bar{X})(X - \bar{X})^T$

38 Bayesian Mixture of Inverses Principal components analysis (PCA) Algorithmic view of PCA: 1. Given $X = (X_1, \ldots, X_n)$, a $p \times n$ matrix, construct $\hat\Sigma = (X - \bar{X})(X - \bar{X})^T$ 2. Eigen-decomposition of $\hat\Sigma$: $\lambda_i v_i = \hat\Sigma v_i$.
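
A direct numpy transcription of the two algorithmic steps above (function and variable names are mine).

```python
# Algorithmic PCA: center the p x n data matrix, form Sigma_hat, eigendecompose.
import numpy as np

def pca(X, d):
    """X is p x n (columns are observations); returns the top-d eigenvectors and eigenvalues."""
    Xc = X - X.mean(axis=1, keepdims=True)        # X - Xbar
    Sigma_hat = Xc @ Xc.T                         # (X - Xbar)(X - Xbar)^T
    eigvals, eigvecs = np.linalg.eigh(Sigma_hat)  # lambda_i v_i = Sigma_hat v_i
    return eigvecs[:, ::-1][:, :d], eigvals[::-1][:d]
```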

39 Bayesian Mixture of Inverses Probabilistic PCA $X \in \mathbb{R}^p$ is characterized by a multivariate normal, with $\mu \in \mathbb{R}^p$, $A \in \mathbb{R}^{p \times d}$, $\Delta \in \mathbb{R}^{p \times p}$, $\nu \in \mathbb{R}^d$: $X \sim \mathrm{No}(\mu + A\nu, \Delta)$, $\nu \sim \mathrm{No}(0, I_d)$

40 Bayesian Mixture of Inverses Probabilistic PCA $X \in \mathbb{R}^p$ is characterized by a multivariate normal, with $\mu \in \mathbb{R}^p$, $A \in \mathbb{R}^{p \times d}$, $\Delta \in \mathbb{R}^{p \times p}$, $\nu \in \mathbb{R}^d$; $\nu$ is a latent variable: $X \sim \mathrm{No}(\mu + A\nu, \Delta)$, $\nu \sim \mathrm{No}(0, I_d)$

41 Bayesian Mixture of Inverses SDR model Semiparametric model $Y_i = f(X_i) + \varepsilon_i = g(b_1^T X_i, \ldots, b_d^T X_i) + \varepsilon_i$; span $B$ is the dimension reduction (d.r.) subspace.

42 Bayesian Mixture of Inverses Principal fitted components (PFC) Define $X_y \equiv (X \mid Y = y)$ and specify a multivariate normal distribution, with $\mu \in \mathbb{R}^p$, $A \in \mathbb{R}^{p \times d}$, $\nu_y \in \mathbb{R}^d$: $X_y \sim \mathrm{No}(\mu_y, \Delta)$, $\mu_y = \mu + A\nu_y$

43 Bayesian Mixture of Inverses Principal fitted components (PFC) Define $X_y \equiv (X \mid Y = y)$ and specify a multivariate normal distribution, with $\mu \in \mathbb{R}^p$, $A \in \mathbb{R}^{p \times d}$, $\nu_y \in \mathbb{R}^d$: $X_y \sim \mathrm{No}(\mu_y, \Delta)$, $\mu_y = \mu + A\nu_y$, $B = \Delta^{-1} A$.

44 Bayesian Mixture of Inverses Principal fitted components (PFC) Define $X_y \equiv (X \mid Y = y)$ and specify a multivariate normal distribution, with $\mu \in \mathbb{R}^p$, $A \in \mathbb{R}^{p \times d}$, $\nu_y \in \mathbb{R}^d$: $X_y \sim \mathrm{No}(\mu_y, \Delta)$, $\mu_y = \mu + A\nu_y$, $B = \Delta^{-1} A$. Captures global linear predictive structure. Does not generalize to manifolds.

45 Bayesian Mixture of Inverses Mixture models and localization A driving idea in manifold learning is that manifolds are locally Euclidean.

46 Bayesian Mixture of Inverses Mixture models and localization A driving idea in manifold learning is that manifolds are locally Euclidean. A driving idea in probabilistic modeling is that mixture models are flexible and can capture nonparametric distributions.

47 Bayesian Mixture of Inverses Mixture models and localization A driving idea in manifold learning is that manifolds are locally Euclidean. A driving idea in probabilistic modeling is that mixture models are flexible and can capture nonparametric distributions. Mixture models can capture local nonlinear predictive manifold structure.

48 Bayesian Mixture of Inverses Model specification $X_y \sim \mathrm{No}(\mu_{yx}, \Delta)$, $\mu_{yx} = \mu + A\nu_{yx}$, $\nu_{yx} \sim G_y$, where $G_y$ is a density indexed by $y$ having multiple clusters; $\mu \in \mathbb{R}^p$, $\varepsilon \sim \mathrm{N}(0, \Delta)$ with $\Delta \in \mathbb{R}^{p \times p}$, $A \in \mathbb{R}^{p \times d}$, $\nu_{yx} \in \mathbb{R}^d$.

49 Bayesian Mixture of Inverses Dimension reduction space Proposition: for this model the d.r. space is the span of $B = \Delta^{-1} A$, that is, $Y \mid X \stackrel{d}{=} Y \mid (\Delta^{-1} A)^T X$.

50 Bayesian Mixture of Inverses Sampling distribution Define $\nu_i \equiv \nu_{y_i x_i}$. Sampling distribution for the data: $x_i \mid (y_i, \mu, \nu_i, A, \Delta) \sim \mathrm{N}(\mu + A\nu_i, \Delta)$, $\nu_i \sim G_{y_i}$.

51 Bayesian Mixture of Inverses Categorical response: modeling G y $\mathcal{Y} = \{1, \ldots, C\}$, so each category has a distribution: $\nu_i \mid (y_i = c) \sim G_c$, $c = 1, \ldots, C$.

52 Bayesian Mixture of Inverses Categorical response: modeling G y $\mathcal{Y} = \{1, \ldots, C\}$, so each category has a distribution: $\nu_i \mid (y_i = c) \sim G_c$, $c = 1, \ldots, C$. $\nu_i$ is modeled as a mixture of $C$ distributions $G_1, \ldots, G_C$ with a Dirichlet process model for each distribution: $G_c \sim \mathrm{DP}(\alpha_0, G_0)$.

53 Bayesian Mixture of Inverses Categorical response: modeling G y $\mathcal{Y} = \{1, \ldots, C\}$, so each category has a distribution: $\nu_i \mid (y_i = c) \sim G_c$, $c = 1, \ldots, C$. $\nu_i$ is modeled as a mixture of $C$ distributions $G_1, \ldots, G_C$ with a Dirichlet process model for each distribution: $G_c \sim \mathrm{DP}(\alpha_0, G_0)$. Go to board.
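
A generative sketch of the model for a categorical response: $\nu_i \mid (y_i = c)$ is drawn from a per-class mixture $G_c$, with a finite Gaussian mixture standing in for the Dirichlet process, then $x_i \sim \mathrm{N}(\mu + A\nu_i, \Delta)$. All dimensions, parameter values, and the finite truncation are illustrative choices of mine, not the talk's.

```python
# Generative sketch of the mixture-of-inverses model (finite mixture stands in for the DP).
import numpy as np

rng = np.random.default_rng(1)
p, d, C, n = 20, 2, 3, 300

mu = rng.standard_normal(p)
A = rng.standard_normal((p, d))
Delta = 0.1 * np.eye(p)

# Each class c gets its own (truncated) mixture G_c over the latent nu.
n_comp = 4
comp_means = 3.0 * rng.standard_normal((C, n_comp, d))
comp_weights = rng.dirichlet(np.ones(n_comp), size=C)

y = rng.integers(0, C, size=n)
nu = np.empty((n, d))
for i in range(n):
    k = rng.choice(n_comp, p=comp_weights[y[i]])
    nu[i] = comp_means[y[i], k] + 0.3 * rng.standard_normal(d)
X = mu + nu @ A.T + rng.multivariate_normal(np.zeros(p), Delta, size=n)
```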

54 Bayesian Mixture of Inverses Likelihood $\mathrm{Lik}(\text{data} \mid \theta) \equiv \mathrm{Lik}(\text{data} \mid A, \Delta, \nu_1, \ldots, \nu_n, \mu)$: $\mathrm{Lik}(\text{data} \mid \theta) \propto \det(\Delta^{-1})^{\frac{n}{2}} \exp\Big[-\frac{1}{2} \sum_{i=1}^n (x_i - \mu - A\nu_i)^T \Delta^{-1} (x_i - \mu - A\nu_i)\Big]$.
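
The displayed likelihood on the log scale, up to an additive constant, evaluated for given $\mu$, $A$, $\Delta$ and latent $\nu_1, \ldots, \nu_n$; the function name is mine.

```python
# Log of the likelihood above (up to an additive constant).
import numpy as np

def log_likelihood(X, mu, A, nu, Delta):
    n = X.shape[0]
    resid = X - mu - nu @ A.T                        # rows: x_i - mu - A nu_i
    Dinv = np.linalg.inv(Delta)
    _, logdet_Dinv = np.linalg.slogdet(Dinv)         # log det(Delta^{-1})
    quad = np.einsum("ij,jk,ik->", resid, Dinv, resid)
    return 0.5 * n * logdet_Dinv - 0.5 * quad
```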

55 Bayesian Mixture of Inverses Posterior inference Given data, $P_\theta \equiv \mathrm{Post}(\theta \mid \text{data}) \propto \mathrm{Lik}(\text{data} \mid \theta)\, \pi(\theta)$.

56 Bayesian Mixture of Inverses Posterior inference Given data, $P_\theta \equiv \mathrm{Post}(\theta \mid \text{data}) \propto \mathrm{Lik}(\text{data} \mid \theta)\, \pi(\theta)$. 1. $P_\theta$ provides an estimate of (un)certainty on $\theta$

57 Bayesian Mixture of Inverses Posterior inference Given data, $P_\theta \equiv \mathrm{Post}(\theta \mid \text{data}) \propto \mathrm{Lik}(\text{data} \mid \theta)\, \pi(\theta)$. 1. $P_\theta$ provides an estimate of (un)certainty on $\theta$ 2. Requires a prior on $\theta$

58 Bayesian Mixture of Inverses Posterior inference Given data, $P_\theta \equiv \mathrm{Post}(\theta \mid \text{data}) \propto \mathrm{Lik}(\text{data} \mid \theta)\, \pi(\theta)$. 1. $P_\theta$ provides an estimate of (un)certainty on $\theta$ 2. Requires a prior on $\theta$ 3. Sample from $P_\theta$?

59 Bayesian Mixture of Inverses Markov chain Monte Carlo No closed form for $P_\theta$.

60 Bayesian Mixture of Inverses Markov chain Monte Carlo No closed form for $P_\theta$. 1. Specify a Markov transition kernel $K(\theta_t, \theta_{t+1})$ with stationary distribution $P_\theta$.

61 Bayesian Mixture of Inverses Markov chain Monte Carlo No closed form for $P_\theta$. 1. Specify a Markov transition kernel $K(\theta_t, \theta_{t+1})$ with stationary distribution $P_\theta$. 2. Run the Markov chain to obtain $\theta_1, \ldots, \theta_T$.

62 Bayesian Mixture of Inverses Sampling from the posterior Inference consists of drawing samples $\theta^{(t)} = (\mu^{(t)}, A^{(t)}, \Delta^{-1(t)}, \nu^{(t)})$ from the posterior.

63 Bayesian Mixture of Inverses Sampling from the posterior Inference consists of drawing samples $\theta^{(t)} = (\mu^{(t)}, A^{(t)}, \Delta^{-1(t)}, \nu^{(t)})$ from the posterior. Define $\theta^{(t)}_{/\mu} \equiv (A^{(t)}, \Delta^{-1(t)}, \nu^{(t)})$, $\theta^{(t)}_{/A} \equiv (\mu^{(t)}, \Delta^{-1(t)}, \nu^{(t)})$, $\theta^{(t)}_{/\Delta^{-1}} \equiv (\mu^{(t)}, A^{(t)}, \nu^{(t)})$, $\theta^{(t)}_{/\nu} \equiv (\mu^{(t)}, A^{(t)}, \Delta^{-1(t)})$.

64 Bayesian Mixture of Inverses Gibbs sampling Conditional probabilities can be used to sample $\mu$, $\Delta^{-1}$, $A$: $\mu^{(t+1)} \mid \big(\text{data}, \theta^{(t)}_{/\mu}\big) \sim \mathrm{No}\big(\text{data}, \theta^{(t)}_{/\mu}\big)$,

65 Bayesian Mixture of Inverses Gibbs sampling Conditional probabilities can be used to sample $\mu$, $\Delta^{-1}$, $A$: $\mu^{(t+1)} \mid \big(\text{data}, \theta^{(t)}_{/\mu}\big) \sim \mathrm{No}\big(\text{data}, \theta^{(t)}_{/\mu}\big)$, $\Delta^{-1(t+1)} \mid \big(\text{data}, \theta^{(t)}_{/\Delta^{-1}}\big) \sim \mathrm{InvWishart}\big(\text{data}, \theta^{(t)}_{/\Delta^{-1}}\big)$

66 Bayesian Mixture of Inverses Gibbs sampling Conditional probabilities can be used to sample $\mu$, $\Delta^{-1}$, $A$: $\mu^{(t+1)} \mid \big(\text{data}, \theta^{(t)}_{/\mu}\big) \sim \mathrm{No}\big(\text{data}, \theta^{(t)}_{/\mu}\big)$, $\Delta^{-1(t+1)} \mid \big(\text{data}, \theta^{(t)}_{/\Delta^{-1}}\big) \sim \mathrm{InvWishart}\big(\text{data}, \theta^{(t)}_{/\Delta^{-1}}\big)$, $A^{(t+1)} \mid \big(\text{data}, \theta^{(t)}_{/A}\big) \sim \mathrm{No}\big(\text{data}, \theta^{(t)}_{/A}\big)$.

67 Bayesian Mixture of Inverses Gibbs sampling Conditional probabilities can be used to sample $\mu$, $\Delta^{-1}$, $A$: $\mu^{(t+1)} \mid \big(\text{data}, \theta^{(t)}_{/\mu}\big) \sim \mathrm{No}\big(\text{data}, \theta^{(t)}_{/\mu}\big)$, $\Delta^{-1(t+1)} \mid \big(\text{data}, \theta^{(t)}_{/\Delta^{-1}}\big) \sim \mathrm{InvWishart}\big(\text{data}, \theta^{(t)}_{/\Delta^{-1}}\big)$, $A^{(t+1)} \mid \big(\text{data}, \theta^{(t)}_{/A}\big) \sim \mathrm{No}\big(\text{data}, \theta^{(t)}_{/A}\big)$. Sampling $\nu^{(t)}$ is more involved.
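
A stripped-down Gibbs sketch, not the talk's full sampler: only $\mu$ and $\Delta$ are resampled, from the standard conjugate conditionals under a $\mathrm{N}(0, \tau^2 I)$ prior on $\mu$ and an inverse-Wishart$(\mathrm{df}_0, S_0)$ prior on $\Delta$, with $A$ and the latent $\nu$ held fixed; the full sampler also updates $A$ and the DP-mixed $\nu$. The prior settings and function name are my own assumptions.

```python
# Two-block conjugate Gibbs sketch for mu and Delta (A and nu held fixed; priors illustrative).
import numpy as np
from scipy.stats import invwishart

def gibbs_mu_delta(X, A, nu, n_iter=500, tau2=100.0, df0=None, S0=None):
    n, p = X.shape
    df0 = p + 2 if df0 is None else df0
    S0 = np.eye(p) if S0 is None else S0
    mu = X.mean(axis=0)
    Delta = np.cov(X, rowvar=False) + 1e-6 * np.eye(p)
    draws = []
    for t in range(n_iter):
        # mu | rest ~ No(m, V) with precision n Delta^{-1} + I / tau2.
        Dinv = np.linalg.inv(Delta)
        V = np.linalg.inv(n * Dinv + np.eye(p) / tau2)
        m = V @ (Dinv @ (X - nu @ A.T).sum(axis=0))
        mu = np.random.multivariate_normal(m, V)
        # Delta | rest ~ InvWishart(df0 + n, S0 + sum_i r_i r_i^T).
        R = X - mu - nu @ A.T
        Delta = invwishart(df=df0 + n, scale=S0 + R.T @ R).rvs()
        draws.append((mu.copy(), Delta.copy()))
    return draws
```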

68 Bayesian Mixture of Inverses Posterior draws from the Grassmann manifold Given samples $\big(\Delta^{-1(t)}, A^{(t)}\big)_{t=1}^m$, compute $B^{(t)} = \Delta^{-1(t)} A^{(t)}$.

69 Bayesian Mixture of Inverses Posterior draws from the Grassmann manifold Given samples $\big(\Delta^{-1(t)}, A^{(t)}\big)_{t=1}^m$, compute $B^{(t)} = \Delta^{-1(t)} A^{(t)}$. Each $B^{(t)}$ spans a subspace, which is a point in the Grassmann manifold $\mathcal{G}(d, p)$. There is a Riemannian metric on this manifold. This has two implications.

70 Bayesian Mixture of Inverses Posterior mean and variance Given draws $(B^{(t)})_{t=1}^m$, the posterior mean and variance should be computed with respect to the Riemannian metric.

71 Bayesian Mixture of Inverses Posterior mean and variance Given draws $(B^{(t)})_{t=1}^m$, the posterior mean and variance should be computed with respect to the Riemannian metric. Given two subspaces $\mathcal{W}$ and $\mathcal{U}$ spanned by orthonormal bases $X$ and $Y$, the distance underlying the Karcher mean is computed from $\big(I - X(X^T X)^{-1} X^T\big)\, Y\, (X^T Y)^{-1} = U \Sigma V^T$, $\Theta = \operatorname{atan}(\Sigma)$, $\operatorname{dist}(\mathcal{W}, \mathcal{U}) = \operatorname{Tr}(\Theta^2)$.
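
A direct implementation of the displayed principal-angle computation, assuming $X$ and $Y$ have full column rank and $X^T Y$ is invertible (no right angle between the subspaces); the function name is mine.

```python
# (I - X(X'X)^{-1}X') Y (X'Y)^{-1} = U Sigma V', Theta = atan(Sigma), dist = Tr(Theta^2).
import numpy as np

def subspace_dist(X, Y):
    P = np.eye(X.shape[0]) - X @ np.linalg.solve(X.T @ X, X.T)  # projector onto span(X)^perp
    T = P @ Y @ np.linalg.inv(X.T @ Y)
    sigma = np.linalg.svd(T, compute_uv=False)
    theta = np.arctan(sigma)                                    # principal angles
    return np.sum(theta ** 2)                                   # Tr(Theta^2)
```

The angles can be sanity-checked against scipy.linalg.subspace_angles, which computes the same principal angles between the two column spaces.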

72 Bayesian Mixture of Inverses Posterior mean and variance The posterior mean subspace: $B_{\mathrm{Bayes}} = \arg\min_{B \in \mathcal{G}(d,p)} \sum_{i=1}^m \operatorname{dist}(B_i, B)$.

73 Bayesian Mixture of Inverses Posterior mean and variance The posterior mean subspace: $B_{\mathrm{Bayes}} = \arg\min_{B \in \mathcal{G}(d,p)} \sum_{i=1}^m \operatorname{dist}(B_i, B)$. Uncertainty: $\operatorname{var}(\{B_1, \ldots, B_m\}) = \frac{1}{m} \sum_{i=1}^m \operatorname{dist}(B_i, B_{\mathrm{Bayes}})$.
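
The minimization over the whole Grassmannian requires an iterative Karcher-mean algorithm; the sketch below is a crude simplification of mine, not the talk's algorithm: restrict the objective to the posterior draws themselves, report the best draw as an approximate posterior mean, and compute the corresponding spread.

```python
# Approximate posterior mean subspace: best draw under the Karcher-mean objective.
import numpy as np
from scipy.linalg import subspace_angles

def dist2(B1, B2):
    return float(np.sum(subspace_angles(B1, B2) ** 2))   # Tr(Theta^2)

def approx_posterior_mean(B_draws):
    m = len(B_draws)
    totals = [sum(dist2(B_draws[i], B) for B in B_draws) for i in range(m)]
    B_bayes = B_draws[int(np.argmin(totals))]
    spread = np.mean([dist2(B, B_bayes) for B in B_draws])
    return B_bayes, spread
```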

74 Bayesian Mixture of Inverses Distribution theory on Grassmann manifolds If $\mathcal{B}$ is the linear span of $d$ central normal vectors in $\mathbb{R}^p$ with covariance matrix $\Sigma$, the density of the Grassmannian distribution $G_\Sigma$ w.r.t. the reference measure $G_I$ is $\frac{dG_\Sigma}{dG_I}(\mathcal{X}) = \left(\frac{\det(X^T X)}{\det(X^T \Sigma^{-1} X)}\right)^{d/2}$, where $\mathcal{X} = \operatorname{span}(X)$ and $X = (x_1, \ldots, x_d)$.

75 Results on data Swiss roll Swiss roll: $X_1 = t\cos(t)$, $X_2 = h$, $X_3 = t\sin(t)$, $X_4, \ldots, X_{10} \stackrel{iid}{\sim} \mathrm{No}(0, 1)$, where $t = \frac{3\pi}{2}(1 + 2\theta)$, $\theta \sim \mathrm{Unif}(0,1)$, $h \sim \mathrm{Unif}(0,1)$, and $Y = \sin(5\pi\theta) + h^2 + \varepsilon$, $\varepsilon \sim \mathrm{No}(0, 0.01)$.
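
A generator for the Swiss roll example exactly as specified above; the sample size and seed are mine, and $\mathrm{No}(0, 0.01)$ is read as variance 0.01.

```python
# Swiss roll data generator following the displayed formulas.
import numpy as np

def swiss_roll(n=500, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 1, n)
    h = rng.uniform(0, 1, n)
    t = 1.5 * np.pi * (1 + 2 * theta)
    X = np.column_stack([t * np.cos(t), h, t * np.sin(t),
                         rng.standard_normal((n, 7))])       # X4,...,X10 ~ No(0,1)
    eps = rng.normal(0.0, np.sqrt(0.01), n)                   # No(0, 0.01), variance 0.01
    Y = np.sin(5 * np.pi * theta) + h**2 + eps
    return X, Y
```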

76 Results on data Swiss roll Pictures

77 Results on data Swiss roll Metric: projection of the estimated d.r. space $\hat{B} = (\hat{b}_1, \ldots, \hat{b}_d)$ onto $B$, $\frac{1}{d} \sum_{i=1}^d \|P_B \hat{b}_i\|^2 = \frac{1}{d} \sum_{i=1}^d \|(B B^T)\hat{b}_i\|^2$.
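
The accuracy metric displayed above, assuming the columns of $B$ are orthonormal (so $BB^T$ is the projector $P_B$) and the estimated directions $\hat{b}_i$ are unit vectors; the function name is mine.

```python
# Mean squared norm of the projections of the estimated directions onto the true d.r. space.
import numpy as np

def dr_accuracy(B, B_hat):
    P = B @ B.T                            # P_B = B B' for orthonormal B
    proj = P @ B_hat                       # columns: (B B') b_hat_i
    return np.mean(np.sum(proj ** 2, axis=0))
```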

78 Results on data Swiss roll Comparison of algorithms [Figure: accuracy versus sample size for BMI, BAGL, SIR, LSIR, PHD, and SAVE.]

79 Results on data Swiss roll Posterior variance [Figure: boxplot for the distances of the posterior subspace draws.]

80 Results on data Swiss roll Error as a function of d [Figure: error versus the number of e.d.r. directions.]

81 Results on data Digits Digits

82 Results on data Digits Two classification problems: 3 vs. 8 and 5 vs. 8.

83 Results on data Digits Two classification problems: 3 vs. 8 and 5 vs. 8, with training samples drawn from each class.

84 Results on data Digits BMI [Figure: two-dimensional BMI projections of the digit data.]

85 Results on data Digits All ten digits [Table: average classification error rate and standard deviation on the digits data, for each of the ten digits, under the nonlinear and linear models.]

86 Results on data Cancer Cancer classification $n = 38$ samples with expression levels for $p = 7129$ genes or ESTs; 19 samples are Acute Myeloid Leukemia (AML), 19 are Acute Lymphoblastic Leukemia (ALL); these fall into two subclusters, B-cell and T-cell.

87 Results on data Cancer Substructure captured [Figure: two-dimensional projection of the expression data showing the leukemia substructure.]

88 The end Funding IGSP Center for Systems Biology at Duke NSF DMS
