Bayesian simultaneous regression and dimension reduction

1 Bayesian simultaneous regression and dimension reduction MCMski II Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Duke University January 10, 2008

2 Table of contents: 1 Statistical principles; 2 Simulated data, Digits; 3 Pathways and gene sets, Progression in prostate cancer; 4

3 Motivation and related work Data generated by measuring thousands of variables often lies on or near a low-dimensional manifold, or exhibits strong dependencies between variables.

4 Motivation and related work Data generated by measuring thousands of variables often lies on or near a low-dimensional manifold, or exhibits strong dependencies between variables. Manifold learning: LLE, ISOMAP, Laplacian Eigenmaps, Hessian Eigenmaps.

5 Motivation and related work Data generated by measuring thousands of variables often lies on or near a low-dimensional manifold, or exhibits strong dependencies between variables. Manifold learning: LLE, ISOMAP, Laplacian Eigenmaps, Hessian Eigenmaps. Simultaneous dimensionality reduction and regression: SIR, MAVE, SAVE.

6 Generative vs. predictive modelling Given data $= \{Z_i = (x_i, y_i)\}_{i=1}^n$ with $Z_i \overset{iid}{\sim} \rho(X, Y)$, where $X \in \mathcal{X} \subseteq \mathbb{R}^p$, $Y \in \mathbb{R}$, and $p \gg n$.

7 Generative vs. predictive modelling Given data $= \{Z_i = (x_i, y_i)\}_{i=1}^n$ with $Z_i \overset{iid}{\sim} \rho(X, Y)$, where $X \in \mathcal{X} \subseteq \mathbb{R}^p$, $Y \in \mathbb{R}$, and $p \gg n$. Two options: 1 discriminative or regression, $Y \mid X$; 2 generative, $X \mid Y$ (sometimes called inverse regression).

8 Regression Statistical principles Given $X \in \mathcal{X} \subseteq \mathbb{R}^p$, $Y \in \mathbb{R}$, $p \gg n$, and $\rho(X, Y)$, we want $Y \mid X$. A natural idea: $f_r = \arg\min_f \operatorname{Var}(f) = \arg\min_f \mathbb{E}_Y (Y - f(X))^2$, and $f_r(x) = \mathbb{E}_Y[Y \mid x]$ provides a summary of $Y \mid X$.

9 Inverse regression Statistical principles Given $X \in \mathcal{X} \subseteq \mathbb{R}^p$, $Y \in \mathbb{R}$, $p \gg n$, and $\rho(X, Y)$, we want $X \mid Y$. $\Omega = \operatorname{cov}(X \mid Y)$ provides a summary of $X \mid Y$.

10 Inverse regression Statistical principles Given $X \in \mathcal{X} \subseteq \mathbb{R}^p$, $Y \in \mathbb{R}$, $p \gg n$, and $\rho(X, Y)$, we want $X \mid Y$. $\Omega = \operatorname{cov}(X \mid Y)$ provides a summary of $X \mid Y$. 1 $\Omega_{ii}$: relevance of variable $i$ with respect to the label; 2 $\Omega_{ij}$: covariation of variables $i$ and $j$ with respect to the label.

11 Statistical principles Model simultaneously $f_r(x)$ and $\nabla f_r = \big(\tfrac{\partial f_r}{\partial x_1}, \ldots, \tfrac{\partial f_r}{\partial x_p}\big)^T$.

12 Statistical principles Model simultaneously $f_r(x)$ and $\nabla f_r = \big(\tfrac{\partial f_r}{\partial x_1}, \ldots, \tfrac{\partial f_r}{\partial x_p}\big)^T$. 1 regression: $f_r(x)$; 2 inverse regression: the gradient outer product (GOP) $\Gamma = \mathbb{E}[\nabla f_r \, \nabla f_r^T]$, i.e. $\Gamma_{ij} = \big\langle \tfrac{\partial f_r}{\partial x_i}, \tfrac{\partial f_r}{\partial x_j} \big\rangle$.
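A minimal Monte Carlo sketch of this definition, assuming a known toy regression function and a standard normal $\rho_X$ (both stand-ins, not part of the talk):

```python
# Hypothetical example: Monte Carlo approximation of the GOP for a known f.
import numpy as np

rng = np.random.default_rng(0)
p, n_mc = 5, 100_000

def grad_f(x):
    """Gradient of f(x) = sin(x_1) + x_2**2; coordinates 3..p are irrelevant."""
    g = np.zeros_like(x)
    g[:, 0] = np.cos(x[:, 0])
    g[:, 1] = 2.0 * x[:, 1]
    return g

X = rng.normal(size=(n_mc, p))      # draws from rho_X (here standard normal)
G = grad_f(X)                       # n_mc x p matrix of gradients
Gamma = G.T @ G / n_mc              # Monte Carlo estimate of E[grad f grad f^T]

# Only the 2 x 2 block for the relevant coordinates is essentially non-zero,
# so Gamma has numerical rank 2: the predictive subspace is spanned by e_1, e_2.
print(np.round(Gamma, 3))
```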

13 Linear case Statistical principles We start with the linear case $y = w \cdot x + \varepsilon$, $\varepsilon \overset{iid}{\sim} \mathrm{No}(0, \sigma^2)$, with $\Sigma_X = \operatorname{cov}(X)$ and $\sigma_Y^2 = \operatorname{var}(Y)$. Then $\Gamma = \sigma_Y^2 \big(1 - \tfrac{\sigma^2}{\sigma_Y^2}\big)^2 \Sigma_X^{-1} \Omega \Sigma_X^{-1} \approx \sigma_Y^2 \, \Sigma_X^{-1} \Omega \Sigma_X^{-1}$. $\Gamma$ and $\Omega$ are equivalent modulo rotation and scale.

14 Nonlinear case Statistical principles For a smooth nonlinear model $y = f(x) + \varepsilon$, $\varepsilon \overset{iid}{\sim} \mathrm{No}(0, \sigma^2)$, the meaning of $\Omega = \operatorname{cov}(X \mid Y)$ is not so clear.

15 Nonlinear case Statistical principles Partition into sections and compute local quantities: $\mathcal{X} = \bigcup_{i=1}^I \chi_i$.

16 Nonlinear case Statistical principles Partition into sections and compute local quantities: $\mathcal{X} = \bigcup_{i=1}^I \chi_i$, $\Omega_i = \operatorname{cov}(X_{\chi_i} \mid Y_{\chi_i})$.

17 Nonlinear case Statistical principles Partition into sections and compute local quantities: $\mathcal{X} = \bigcup_{i=1}^I \chi_i$, $\Omega_i = \operatorname{cov}(X_{\chi_i} \mid Y_{\chi_i})$, $\Sigma_i = \operatorname{cov}(X_{\chi_i})$.

18 Nonlinear case Statistical principles Partition into sections and compute local quantities: $\mathcal{X} = \bigcup_{i=1}^I \chi_i$, $\Omega_i = \operatorname{cov}(X_{\chi_i} \mid Y_{\chi_i})$, $\Sigma_i = \operatorname{cov}(X_{\chi_i})$, $\sigma_i^2 = \operatorname{var}(Y_{\chi_i})$.

19 Nonlinear case Statistical principles Partition into sections and compute local quantities: $\mathcal{X} = \bigcup_{i=1}^I \chi_i$, $\Omega_i = \operatorname{cov}(X_{\chi_i} \mid Y_{\chi_i})$, $\Sigma_i = \operatorname{cov}(X_{\chi_i})$, $\sigma_i^2 = \operatorname{var}(Y_{\chi_i})$, $m_i = \rho_X(\chi_i)$.

20 Nonlinear case Statistical principles Partition into sections and compute local quantities: $\mathcal{X} = \bigcup_{i=1}^I \chi_i$, $\Omega_i = \operatorname{cov}(X_{\chi_i} \mid Y_{\chi_i})$, $\Sigma_i = \operatorname{cov}(X_{\chi_i})$, $\sigma_i^2 = \operatorname{var}(Y_{\chi_i})$, $m_i = \rho_X(\chi_i)$. Then $\Gamma \approx \sum_{i=1}^I m_i \sigma_i^2 \, \Sigma_i^{-1} \Omega_i \Sigma_i^{-1}$.
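A rough numerical sketch of this local construction, assuming k-means cells stand in for the partition $\chi_i$ and a sliced-mean plug-in stands in for $\Omega_i$; these choices are illustrative, not the estimator developed next:

```python
# Hypothetical sketch: assemble Gamma from local covariances over a partition.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n, p, I = 2000, 4, 10
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

labels = KMeans(n_clusters=I, n_init=10, random_state=0).fit_predict(X)

Gamma = np.zeros((p, p))
for i in range(I):
    idx = labels == i
    Xi, yi = X[idx], y[idx]
    m_i = idx.mean()                                   # rho_X(chi_i)
    sigma2_i = yi.var()                                # var(Y | chi_i)
    Sigma_i = np.cov(Xi, rowvar=False) + 1e-6 * np.eye(p)
    # crude plug-in for Omega_i on chi_i: covariance of slice means of X,
    # with slices defined by quantiles of y within the cell
    bins = np.quantile(yi, np.linspace(0, 1, 6))
    slice_id = np.clip(np.digitize(yi, bins[1:-1]), 0, 4)
    means = np.array([Xi[slice_id == s].mean(axis=0) for s in range(5)
                      if np.any(slice_id == s)])
    Omega_i = np.cov(means, rowvar=False)
    Si = np.linalg.inv(Sigma_i)
    Gamma += m_i * sigma2_i * Si @ Omega_i @ Si

print(np.round(Gamma, 2))   # mass should concentrate on the first two coordinates
```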

21 Gradient estimate for regression Taylor expanding $f(x)$ around the data should give $\big(f(x_j) - f(x_i) - \nabla f(x_i) \cdot (x_j - x_i)\big)^2 \approx 0$ for $x_i \approx x_j$.

22 Gradient estimate for regression Taylor expanding $f(x)$ around the data should give $\big(f(x_j) - f(x_i) - \nabla f(x_i) \cdot (x_j - x_i)\big)^2 \approx 0$ for $x_i \approx x_j$. $L(f, \nabla f, \text{data}) = \sum_{ij} w_{ij} \big(y_j - f(x_i) - \nabla f(x_i) \cdot (x_j - x_i)\big)^2$.

23 Gradient estimate for regression Taylor expanding $f(x)$ around the data should give $\big(f(x_j) - f(x_i) - \nabla f(x_i) \cdot (x_j - x_i)\big)^2 \approx 0$ for $x_i \approx x_j$. $L(f, \nabla f, \text{data}) = \sum_{ij} w_{ij} \big(y_j - f(x_i) - \nabla f(x_i) \cdot (x_j - x_i)\big)^2$. A similar idea holds for classification, using a link function.
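A small sketch of evaluating this loss for candidate values of $f$ and $\nabla f$ at the data, assuming Gaussian locality weights $w_{ij}$ and toy data as stand-ins:

```python
# Hypothetical sketch: the weighted first-order Taylor loss L(f, grad f, data).
import numpy as np
from scipy.spatial.distance import cdist

def taylor_loss(X, y, f_vals, grads, bandwidth):
    """L = sum_ij w_ij (y_j - f(x_i) - grad f(x_i) . (x_j - x_i))^2."""
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * bandwidth ** 2))  # w_ij
    # D[i, j] = grad f(x_i) . (x_j - x_i)
    D = grads @ X.T - np.sum(grads * X, axis=1, keepdims=True)
    R = y[None, :] - f_vals[:, None] - D                            # residuals
    return np.sum(W * R ** 2)

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=30)
# candidate f and its gradient evaluated at the data (here the ground truth)
print(taylor_loss(X, y, X[:, 0] ** 2, np.c_[2 * X[:, 0], np.zeros((30, 2))], 1.0))
```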

24 Gradient estimate Statistical principles Optimization problem: $(f_D, \nabla f_D) = \arg\min_{(f, \nabla f) \in \mathcal{H}_K^{p+1}} \big\{ L(f, \nabla f, \text{data}) + \lambda_1 \|f\|_K^2 + \lambda_2 \|\nabla f\|_K^2 \big\}$, where $\nabla f$ is the vector of gradient functions, $\lambda_1, \lambda_2$ are regularization parameters, and $L(\cdot)$ is the empirical error using a convex loss function.

25 Gradient estimate Statistical principles Optimization problem: $(f_D, \nabla f_D) = \arg\min_{(f, \nabla f) \in \mathcal{H}_K^{p+1}} \big\{ L(f, \nabla f, \text{data}) + \lambda_1 \|f\|_K^2 + \lambda_2 \|\nabla f\|_K^2 \big\}$. Representer form: $f_D(x) = \sum_{i=1}^n a_{i,D} K(x_i, x)$ and $\nabla f_D(x) = \sum_{i=1}^n c_{i,D} K(x_i, x)$, with $a_D = (a_{1,D}, \ldots, a_{n,D}) \in \mathbb{R}^n$ and $c_D = (c_{1,D}, \ldots, c_{n,D})^T \in \mathbb{R}^{n \times p}$.
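A minimal numerical sketch of this optimization, assuming a Gaussian kernel and a generic quasi-Newton solver over the representer coefficients $(a_D, c_D)$ rather than the closed-form linear system one would use in practice:

```python
# Hypothetical sketch: regularized gradient learning via the representer form.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

rng = np.random.default_rng(3)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)       # only the first coordinate matters

s = np.median(cdist(X, X))
K = np.exp(-cdist(X, X, "sqeuclidean") / (2 * s ** 2))          # kernel matrix
W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * (s / 2) ** 2))    # locality weights w_ij
lam1, lam2 = 1e-3, 1e-3

def objective(theta):
    a, C = theta[:n], theta[n:].reshape(n, p)
    f_vals = K @ a                       # f_D(x_i) = sum_k a_k K(x_k, x_i)
    G = K @ C                            # row i: grad f_D(x_i)
    D = G @ X.T - np.sum(G * X, axis=1, keepdims=True)
    R = y[None, :] - f_vals[:, None] - D
    emp = np.sum(W * R ** 2) / n ** 2
    return emp + lam1 * a @ K @ a + lam2 * np.trace(C.T @ K @ C)

res = minimize(objective, np.zeros(n * (p + 1)), method="L-BFGS-B")
C_hat = res.x[n:].reshape(n, p)
Gamma_hat = C_hat.T @ K @ C_hat          # plug-in GOP estimate (next slide)
print(np.round(np.diag(Gamma_hat), 4))   # should be largest for the first coordinate
```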

26 Gradient Outer Product (GOP) A central quantity in this talk will be the GOP. Definition (GOP): $\hat\Gamma = \langle \nabla f_D, \nabla f_D \rangle = c_D^T K c_D \approx \mathbb{E}(\nabla f \, \nabla f^T)$.

27 Dimension reduction Statistical principles Proposition: the eigenvectors corresponding to the $d$ non-zero eigenvalues of $\Gamma$ span the subspace relevant to prediction. Gradients provide information on the predictive directions $b_i$, $i = 1, \ldots, d$.
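A short sketch of the resulting supervised dimension reduction step, assuming a GOP estimate $\hat\Gamma$ (for example from the sketch above) is already available:

```python
# Hypothetical sketch: predictive directions from the eigenvectors of Gamma.
import numpy as np

def predictive_directions(Gamma_hat, d):
    """Return the top-d eigenvectors (and eigenvalues) of a symmetric GOP estimate."""
    evals, evecs = np.linalg.eigh(Gamma_hat)        # ascending eigenvalues
    order = np.argsort(evals)[::-1]
    return evecs[:, order[:d]], evals[order[:d]]

# toy GOP with a rank-1 predictive subspace along (1, 1, 0)/sqrt(2)
b = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
B, lams = predictive_directions(np.outer(b, b) + 1e-8 * np.eye(3), d=1)
print(np.round(B.ravel(), 3), np.round(lams, 3))

# project data onto the estimated predictive subspace: X_reduced = X @ B
```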

28 Gaussian Markov distributions over graphs Given a multivariate normal distribution with covariance matrix $\Sigma$, the matrix $P = \Sigma^{-1}$ is the conditional independence matrix: $P_{ij}$ captures the dependence of variables $i$ and $j$ given all other variables.

29 Gaussian Markov distributions over graphs Given a multivariate normal distribution with covariance matrix $\Sigma$, the matrix $P = \Sigma^{-1}$ is the conditional independence matrix: $P_{ij}$ captures the dependence of variables $i$ and $j$ given all other variables. Note that by construction $\hat\Gamma$ is a covariance matrix of a Gaussian process.

30 Gaussian Markov distributions over graphs Given a multivariate normal distribution with covariance matrix $\Sigma$, the matrix $P = \Sigma^{-1}$ is the conditional independence matrix: $P_{ij}$ captures the dependence of variables $i$ and $j$ given all other variables. Note that by construction $\hat\Gamma$ is a covariance matrix of a Gaussian process. $J = \hat\Gamma^{-1}$ is the inferred conditional independence matrix.
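A sketch of reading conditional (in)dependence off $\hat\Gamma$; since a GOP estimate is typically low rank, the sketch assumes a pseudo-inverse with a small ridge, which is a stand-in rather than anything prescribed in the talk:

```python
# Hypothetical sketch: conditional independence matrix from a GOP estimate.
import numpy as np

def conditional_independence(Gamma_hat, ridge=1e-6):
    """J = inv(Gamma): small off-diagonal |J_ij| suggests variables i and j are
    conditionally independent given the rest (treating Gamma as a covariance)."""
    p = Gamma_hat.shape[0]
    return np.linalg.pinv(Gamma_hat + ridge * np.eye(p))

Gamma_hat = np.array([[2.0, 0.8, 0.0],
                      [0.8, 1.0, 0.1],
                      [0.0, 0.1, 0.5]])
print(np.round(conditional_independence(Gamma_hat), 2))
```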

31 Restriction to a manifold Assume the data is concentrated on a manifold $\mathcal{M} \subset \mathbb{R}^p$ with $\dim \mathcal{M} = d$ and there exists an isometric embedding $\varphi: \mathcal{M} \to \mathbb{R}^p$.

32 Restriction to a manifold Assume the data is concentrated on a manifold $\mathcal{M} \subset \mathbb{R}^p$ with $\dim \mathcal{M} = d$ and there exists an isometric embedding $\varphi: \mathcal{M} \to \mathbb{R}^p$. Theorem: under mild regularity conditions on the distribution and corresponding density, with probability $1 - \delta$, $\big\| (d\varphi)^* \nabla f_D - \nabla_{\mathcal{M}} f_\rho \big\|_{\rho_X} \le C \log\big(\tfrac{1}{\delta}\big)\, n^{-1/d}$, where $(d\varphi)^*$ is the dual of the map $d\varphi$.

33 Bayesian kernel model for regression $y_i = f(x_i) + \varepsilon$, $\varepsilon \overset{iid}{\sim} \mathrm{No}(0, \sigma^2)$, with $f(x) = \int_{\mathcal{X}} K(x, u)\, Z(du)$, where $Z(du) \in \mathcal{M}(\mathcal{X})$ is a signed measure on $\mathcal{X}$.

34 Bayesian kernel model for regression $y_i = f(x_i) + \varepsilon$, $\varepsilon \overset{iid}{\sim} \mathrm{No}(0, \sigma^2)$, with $f(x) = \int_{\mathcal{X}} K(x, u)\, Z(du)$, where $Z(du) \in \mathcal{M}(\mathcal{X})$ is a signed measure on $\mathcal{X}$. A prior $\pi(Z)$ with $\pi(Z \mid \text{data}) \propto L(\text{data} \mid Z)\, \pi(Z)$ implies a posterior on $f$.

35 Priors and integral operators Integral operator $L_K: \Gamma \to \mathcal{G}$, $\mathcal{G} = \big\{ f \mid f(x) := L_K[\gamma](x) = \int_{\mathcal{X}} K(x, u)\, d\gamma(u), \; \gamma \in \Gamma \big\}$, with $\Gamma \subseteq \mathcal{B}(\mathcal{X})$.

36 Priors and integral operators Integral operator $L_K: \Gamma \to \mathcal{G}$, $\mathcal{G} = \big\{ f \mid f(x) := L_K[\gamma](x) = \int_{\mathcal{X}} K(x, u)\, d\gamma(u), \; \gamma \in \Gamma \big\}$, with $\Gamma \subseteq \mathcal{B}(\mathcal{X})$. A prior on $\Gamma$ implies a prior on $\mathcal{G}$.

37 Equivalence with RKHS For what $\Gamma$ is $\mathcal{H}_K = \operatorname{span}(\mathcal{G})$? What is $L_K^{-1}(\mathcal{H}_K)$? This is hard to characterize.

38 Equivalence with RKHS For what $\Gamma$ is $\mathcal{H}_K = \operatorname{span}(\mathcal{G})$? What is $L_K^{-1}(\mathcal{H}_K)$? This is hard to characterize. An appropriate choice for $\Gamma$ is the union of integrable functions and discrete measures.

39 Signed measures are (almost) just right Nonsingular measures: $\mathcal{M} = L^1(\mathcal{X}) \oplus \mathcal{M}_D$. Proposition: $L_K(\mathcal{M})$ is dense in $\mathcal{H}_K$ with respect to the RKHS norm.

40 Signed measures are (almost) just right Nonsingular measures: $\mathcal{M} = L^1(\mathcal{X}) \oplus \mathcal{M}_D$. Proposition: $L_K(\mathcal{M})$ is dense in $\mathcal{H}_K$ with respect to the RKHS norm. Proposition: $\mathcal{B}(\mathcal{X}) \subset L_K^{-1}(\mathcal{H}_K(\mathcal{X}))$.

41 The implication Statistical principles Take-home message: we need priors on signed measures. This gives a function-theoretic foundation for random signed measures such as Gaussian, Dirichlet, and Lévy process priors.

42 Bayesian kernel model $y_i = f(x_i) + \varepsilon$, $\varepsilon \overset{iid}{\sim} \mathrm{No}(0, \sigma^2)$, with $f(x) = \int_{\mathcal{X}} K(x, u)\, Z(du)$, where $Z(du) \in \mathcal{M}(\mathcal{X})$ is a signed measure on $\mathcal{X}$.

43 Bayesian kernel model $y_i = f(x_i) + \varepsilon$, $\varepsilon \overset{iid}{\sim} \mathrm{No}(0, \sigma^2)$, with $f(x) = \int_{\mathcal{X}} K(x, u)\, Z(du)$, where $Z(du) \in \mathcal{M}(\mathcal{X})$ is a signed measure on $\mathcal{X}$. A prior $\pi(Z)$ with $\pi(Z \mid \text{data}) \propto L(\text{data} \mid Z)\, \pi(Z)$ implies a posterior on $f$.

44 Dirichlet process prior $f(x) = \int_{\mathcal{X}} K(x, u)\, Z(du) = \int_{\mathcal{X}} K(x, u)\, w(u)\, F(du)$, where $F(du)$ is a distribution and $w(u)$ a coefficient function.

45 Dirichlet process prior $f(x) = \int_{\mathcal{X}} K(x, u)\, Z(du) = \int_{\mathcal{X}} K(x, u)\, w(u)\, F(du)$, where $F(du)$ is a distribution and $w(u)$ a coefficient function. Model $F$ using a Dirichlet process prior: $F \sim \mathrm{DP}(\alpha, F_0)$.

46 Bayesian representer form Given $X^n = (x_1, \ldots, x_n) \overset{iid}{\sim} F$: $F \mid X^n \sim \mathrm{DP}(\alpha + n, F_n)$ with $F_n = \big(\alpha F_0 + \sum_{i=1}^n \delta_{x_i}\big)/(\alpha + n)$, and $\mathbb{E}[f \mid X^n] = a_n \int K(x, u)\, w(u)\, dF_0(u) + n^{-1}(1 - a_n) \sum_{i=1}^n w(x_i)\, K(x, x_i)$, where $a_n = \alpha/(\alpha + n)$.

47 Bayesian representer form Taking $\lim_{\alpha \to 0}$ to represent a non-informative prior: Proposition (Bayesian representer form) $\hat f_n(x) = \sum_{i=1}^n w_i K(x, x_i)$, $w_i = w(x_i)/n$.
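A tiny sketch of evaluating this limiting representer form, assuming a Gaussian kernel and an arbitrary coefficient function $w$ as stand-ins:

```python
# Hypothetical sketch: the alpha -> 0 Bayesian representer form.
import numpy as np

def f_hat(x, X, w, bandwidth):
    """f_hat(x) = sum_i (w(x_i)/n) K(x, x_i) with a Gaussian kernel."""
    n = len(X)
    k = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * bandwidth ** 2))
    return np.sum(w(X) / n * k)

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 2))
w = lambda X: X[:, 0]                    # an arbitrary coefficient function w(u)
print(f_hat(np.zeros(2), X, w, bandwidth=1.0))
```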

48 Bayesian kernel model for gradient estimates By Taylor expansion, $y_i = f(x_j) + \nabla f(x_i) \cdot (x_i - x_j) + \varepsilon_{x_i, x_j}$.

49 Bayesian kernel model for gradient estimates By Taylor expansion, $y_i = f(x_j) + \nabla f(x_i) \cdot (x_i - x_j) + \varepsilon_{x_i, x_j}$. By the representer form, $y_i = \alpha_0 + K\alpha + (\iota x_i - X)\, C K_i + \varepsilon_i$, where $\iota = (1, \ldots, 1)$, $\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathbb{R}^n$, $C = (c_1, \ldots, c_n) \in \mathbb{R}^{p \times n}$, $X$ is the $n \times p$ data matrix, and $K_i$ is the $i$th column of $K$.

50 Likelihood: error term and spatial statistics Intuition: consider a spatial model (similarity matrix) $w_{ij} := \theta \exp\{-\phi \|x_i - x_j\|\}$, where $\theta$ and $\phi$ are parameters of the spatial model.

51 Likelihood: error term and spatial statistics Intuition: consider a spatial model (similarity matrix) $w_{ij} := \theta \exp\{-\phi \|x_i - x_j\|\}$, where $\theta$ and $\phi$ are parameters of the spatial model. A natural modeling assumption is $\operatorname{var}(\varepsilon_{x_i, x_j}) \propto w_{ij}^{-1}$.

52 Likelihood: error term and spatial statistics Intuition: consider a spatial model (similarity matrix) $w_{ij} := \theta \exp\{-\phi \|x_i - x_j\|\}$, where $\theta$ and $\phi$ are parameters of the spatial model. A natural modeling assumption is $\operatorname{var}(\varepsilon_{x_i, x_j}) \propto w_{ij}^{-1}$. Given this spatial structure, $\varepsilon_i \sim \mathrm{No}_n(0, W_i^{-1})$, where $W_i = \mathrm{diag}(w_{x_i, x_1}, \ldots, w_{x_i, x_n})$.
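A sketch of building these weights and the induced error covariances for a toy design, assuming arbitrary values for $\theta$ and $\phi$:

```python
# Hypothetical sketch: spatial similarity weights and the induced error model.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(5)
X = rng.normal(size=(20, 3))
theta, phi = 1.0, 2.0

w = theta * np.exp(-phi * cdist(X, X))        # w_ij = theta * exp(-phi ||x_i - x_j||)

# For expansion point x_i the errors are eps_i ~ No_n(0, W_i^{-1}) with
# W_i = diag(w_{i1}, ..., w_{in}): far-away points get large error variance,
# so the Taylor expansion is trusted only locally.
i = 0
W_i = np.diag(w[i])
eps_i = rng.multivariate_normal(np.zeros(len(X)), np.linalg.inv(W_i))
print(np.round(np.sqrt(1.0 / w[i])[:5], 2))   # marginal error sds for the first 5 points
```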

53 Likelihood Statistical principles Given the error model, the likelihood is $L(\text{data} \mid f, \nabla f) \propto \big(\prod_{ij} w_{ij}\big)^{1/2} \exp\big\{-\tfrac{1}{2} \sum_i e_i^T W_i e_i\big\}$

54 Likelihood Statistical principles Given the error model, the likelihood is $L(\text{data} \mid f, \nabla f) \propto \big(\prod_{ij} w_{ij}\big)^{1/2} \exp\big\{-\tfrac{1}{2} \sum_i e_i^T W_i e_i\big\}$ with $K = F \Lambda F^T$, $\Lambda := \mathrm{diag}(\lambda_1^2, \ldots, \lambda_n^2)$, $\alpha = F^{-1}\beta$, and $e_i = y_i - \alpha_0 - F\beta - (\iota x_i - X)\, C K_i$.

55 Prior specification Statistical principles $\pi(\alpha_0, \theta) \propto 1/\theta$

56 Prior specification Statistical principles $\pi(\alpha_0, \theta) \propto 1/\theta$, $\beta \sim \mathrm{No}(0, T)$

57 Prior specification Statistical principles $\pi(\alpha_0, \theta) \propto 1/\theta$, $\beta \sim \mathrm{No}(0, T)$, $T := \mathrm{diag}(\tau_1, \ldots, \tau_n)$, $\tau_i^{-1} \sim \mathrm{Ga}(a_\tau/2, b_\tau/2)$

58 Prior specification Statistical principles $\pi(\alpha_0, \theta) \propto 1/\theta$, $\beta \sim \mathrm{No}(0, T)$, $T := \mathrm{diag}(\tau_1, \ldots, \tau_n)$, $\tau_i^{-1} \sim \mathrm{Ga}(a_\tau/2, b_\tau/2)$, $C_{kj} \sim (1 - \pi_k)\delta_0 + \pi_k \mathrm{No}(0, \phi_k^{-1})$

59 Prior specification Statistical principles $\pi(\alpha_0, \theta) \propto 1/\theta$, $\beta \sim \mathrm{No}(0, T)$, $T := \mathrm{diag}(\tau_1, \ldots, \tau_n)$, $\tau_i^{-1} \sim \mathrm{Ga}(a_\tau/2, b_\tau/2)$, $C_{kj} \sim (1 - \pi_k)\delta_0 + \pi_k \mathrm{No}(0, \phi_k^{-1})$, $\phi_k \sim \mathrm{Ga}(\alpha_c/2, \beta_c/2)$

60 Prior specification Statistical principles $\pi(\alpha_0, \theta) \propto 1/\theta$, $\beta \sim \mathrm{No}(0, T)$, $T := \mathrm{diag}(\tau_1, \ldots, \tau_n)$, $\tau_i^{-1} \sim \mathrm{Ga}(a_\tau/2, b_\tau/2)$, $C_{kj} \sim (1 - \pi_k)\delta_0 + \pi_k \mathrm{No}(0, \phi_k^{-1})$, $\phi_k \sim \mathrm{Ga}(\alpha_c/2, \beta_c/2)$, $\pi_k \sim \mathrm{Beta}(\alpha_\pi, \beta_\pi)$

61 Prior specification Statistical principles $\pi(\alpha_0, \theta) \propto 1/\theta$, $\beta \sim \mathrm{No}(0, T)$, $T := \mathrm{diag}(\tau_1, \ldots, \tau_n)$, $\tau_i^{-1} \sim \mathrm{Ga}(a_\tau/2, b_\tau/2)$, $C_{kj} \sim (1 - \pi_k)\delta_0 + \pi_k \mathrm{No}(0, \phi_k^{-1})$, $\phi_k \sim \mathrm{Ga}(\alpha_c/2, \beta_c/2)$, $\pi_k \sim \mathrm{Beta}(\alpha_\pi, \beta_\pi)$, $\phi \sim \mathrm{Ga}(a_\phi/2, b_\phi/2)$

62 Prior specification Statistical principles $\pi(\alpha_0, \theta) \propto 1/\theta$, $\beta \sim \mathrm{No}(0, T)$, $T := \mathrm{diag}(\tau_1, \ldots, \tau_n)$, $\tau_i^{-1} \sim \mathrm{Ga}(a_\tau/2, b_\tau/2)$, $C_{kj} \sim (1 - \pi_k)\delta_0 + \pi_k \mathrm{No}(0, \phi_k^{-1})$, $\phi_k \sim \mathrm{Ga}(\alpha_c/2, \beta_c/2)$, $\pi_k \sim \mathrm{Beta}(\alpha_\pi, \beta_\pi)$, $\phi \sim \mathrm{Ga}(a_\phi/2, b_\phi/2)$. A standard Gibbs sampler simulates $p(\alpha, \alpha_0, C, \phi, \theta \mid \text{data})$.
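A sketch of forward-simulating the spike-and-slab part of this hierarchy for the gradient coefficients $C$, assuming placeholder hyperparameter values:

```python
# Hypothetical sketch: draws of C from the spike-and-slab prior hierarchy.
import numpy as np

rng = np.random.default_rng(6)
p, n = 8, 30
a_pi, b_pi = 1.0, 4.0          # Beta hyperparameters (placeholders)
a_c, b_c = 2.0, 2.0            # Gamma hyperparameters for phi_k (placeholders)

pi_k = rng.beta(a_pi, b_pi, size=p)             # pi_k ~ Beta(a_pi, b_pi)
phi_k = rng.gamma(a_c / 2, 2.0 / b_c, size=p)   # phi_k ~ Ga(a_c/2, b_c/2), rate b_c/2

# C_kj ~ (1 - pi_k) delta_0 + pi_k No(0, 1/phi_k): row k is sparse when pi_k is small
nonzero = rng.random((p, n)) < pi_k[:, None]
C = nonzero * rng.normal(0.0, 1.0 / np.sqrt(phi_k)[:, None], size=(p, n))

print("nonzero entries per variable:", nonzero.sum(axis=1))
```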

63 Linear example Statistical principles Simulated data Digits (figure; axes: Dimensions, RKHS norm, samples)

64 Linear example Statistical principles Simulated data Digits (figure; axes: Dimensions vs. Dimensions)

65 Nonlinear example Statistical principles Simulated data Digits (figure; axis labels: Dimension, Feature, Feature 1)

66 Digit classification Statistical principles Simulated data Digits Input: MNIST handwritten digits database, $X_i \in \mathbb{R}^{784}$ (a 28 × 28 gray-scale pixel image).

67 Digit classification Statistical principles Simulated data Digits Input: MNIST handwritten digits database, $X_i \in \mathbb{R}^{784}$ (a 28 × 28 gray-scale pixel image). Formulation: Problem 1, 3 vs. 8 with 50 3s and 50 8s; Problem 2, 5 vs. 8 with 50 5s and 50 8s.

68 Digit classification Statistical principles Simulated data Digits Input: MNIST handwritten digits database, $X_i \in \mathbb{R}^{784}$ (a 28 × 28 gray-scale pixel image). Formulation: Problem 1, 3 vs. 8 with 50 3s and 50 8s; Problem 2, 5 vs. 8 with 50 5s and 50 8s. Goal: learn features for a predictive model, 3 vs. 8 and 5 vs. 8.
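A sketch of assembling the 3 vs. 8 subsample, assuming the data are pulled through scikit-learn's fetch_openml as a stand-in for however they were actually obtained (requires network access):

```python
# Hypothetical sketch: a 50 + 50 subsample of MNIST 3s and 8s.
import numpy as np
from sklearn.datasets import fetch_openml

X_all, y_all = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
rng = np.random.default_rng(7)

def subsample(digit, size=50):
    idx = np.flatnonzero(y_all == str(digit))
    return X_all[rng.choice(idx, size=size, replace=False)]

X3, X8 = subsample(3), subsample(8)
X = np.vstack([X3, X8]) / 255.0                  # each row in R^784 (28 x 28 image)
y = np.r_[np.ones(50), -np.ones(50)]             # +1 for 3, -1 for 8
print(X.shape, y.shape)
```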

69 3, 5, 8 Classification problem Simulated data Digits

70 Top features: 3 vs 8 Statistical principles Simulated data Digits

71 Top features: 5 vs 8 Statistical principles Simulated data Digits

72 Genes don't do things Pathways and gene sets Progression in prostate cancer

73 Diabetes Oxphos Statistical principles Pathways and gene sets Progression in prostate cancer

74 Gender Statistical principles Pathways and gene sets Progression in prostate cancer

75 Gene set database Statistical principles Pathways and gene sets Progression in prostate cancer The gene sets in the database are defined by 1 Positional gene sets: cytogenetic bands, 3 megabase windows;

76 Gene set database Statistical principles Pathways and gene sets Progression in prostate cancer The gene sets in the database are defined by 1 Positional gene sets: cytogenetic bands, 3 megabase windows; 2 Motif gene sets: TRANSFAC motifs, Representative motifs;

77 Gene set database Statistical principles Pathways and gene sets Progression in prostate cancer The gene sets in the database are defined by 1 Positional gene sets: cytogenetic bands, 3 megabase windows; 2 Motif gene sets: TRANSFAC motifs, Representative motifs; 3 Curated gene sets: Pathways, Literature reviews, Animal models, Clinical phenotypes, Expert curations, Chemical or genetic perturbations.

78 Progression of prostate cancer Pathways and gene sets Progression in prostate cancer Gene expression from 22,283 genes. 71 people: 22 benign (b) prostate epithelium, 32 primary (p) prostate cancer, 17 metastatic (m) prostate cancer.

79 Progression of prostate cancer Pathways and gene sets Progression in prostate cancer Gene expression from 22,283 genes. 71 people: 22 benign (b) prostate epithelium, 32 primary (p) prostate cancer, 17 metastatic (m) prostate cancer. Progression: {b → p → m}.

80 Progression of prostate cancer Pathways and gene sets Progression in prostate cancer Gene expression from 22,283 genes. 71 people: 22 benign (b) prostate epithelium, 32 primary (p) prostate cancer, 17 metastatic (m) prostate cancer. Progression: {b → p → m}. 523 pathway-defined gene sets.

81 Progression of prostate cancer Pathways and gene sets Progression in prostate cancer Gene expression from 22,283 genes. 71 people: 22 benign (b) prostate epithelium, 32 primary (p) prostate cancer, 17 metastatic (m) prostate cancer. Progression: {b → p → m}. 523 pathway-defined gene sets. 1 Which pathways are involved in all or some stages of progression?

82 Progression of prostate cancer Pathways and gene sets Progression in prostate cancer Gene expression from 22,283 genes. 71 people: 22 benign (b) prostate epithelium, 32 primary (p) prostate cancer, 17 metastatic (m) prostate cancer. Progression: {b → p → m}. 523 pathway-defined gene sets. 1 Which pathways are involved in all or some stages of progression? 2 What are the pathway dependencies (inferring pathway networks)?

83 Progression of prostate cancer Pathways and gene sets Progression in prostate cancer Gene expression from 22,283 genes. 71 people: 22 benign (b) prostate epithelium, 32 primary (p) prostate cancer, 17 metastatic (m) prostate cancer. Progression: {b → p → m}. 523 pathway-defined gene sets. 1 Which pathways are involved in all or some stages of progression? 2 What are the pathway dependencies (inferring pathway networks)? 3 For each relevant pathway, infer the gene network for that pathway.

84 Pathways relevant in progression Pathways and gene sets Progression in prostate cancer (figure: pathways relevant in the b → p and p → m transitions; panels A, B, C showing gene sets such as TRANS, CCC, GHD, KREB, HORM, GLY with weights, e.g. 0.8 and 0.2)

85 Pathways and gene sets Progression in prostate cancer Pathway dependencies: benign to primary (figure: inferred pathway dependency network, panels A and B)

86 Refinement of gene sets Pathways and gene sets Progression in prostate cancer 1 Not all genes in a gene set are relevant in the specific context studied.

87 Refinement of gene sets Pathways and gene sets Progression in prostate cancer 1 Not all genes in a gene set are relevant in the specific context studied. 2 Genes not included in the gene set may be relevant to the specific context studied.

88 Gene network for ERK pathway Pathways and gene sets Progression in prostate cancer (figure: inferred gene network for the ERK pathway, with nodes including NGF, PTPR, ELK1, SOS1, NGFB, DPM2, GRB2, PPP2CA, GNB1, MKNK2, MKNK1, EGFR, RPS, RAF1, SHC1, STAT, TGFB, MYC, RPS6KAS, MAPK1, MAP2K2, PDG, MAP2K1, GNAS)

89 Relevant papers
Learning Coordinate Covariances via Gradients. S. Mukherjee, D-X. Zhou; Journal of Machine Learning Research, 7(Mar).
Estimation of Gradients and Coordinate Covariation in Classification. S. Mukherjee, Q. Wu; Journal of Machine Learning Research, 7(Nov).
Characterizing the Function Space for Bayesian Kernel Models. N. Pillai, Q. Wu, F. Liang, S. Mukherjee, R.L. Wolpert; Journal of Machine Learning Research, 8(Aug).
Non-parametric Bayesian kernel models. F. Liang, K. Mao, M. Liao, S. Mukherjee, M. West; Biometrika, in submission.
Learning Gradients: predictive models that infer geometry and dependence. Qiang Wu, Justin Guinney, Mauro Maggioni; Journal of Machine Learning Research, submitted.
Modeling Cancer Progression via Pathway Dependencies. E. Edelman, J. Guinney, J-T. Chi, P.G. Febbo, S. Mukherjee; PLoS Computational Biology, in press.
Bayesian simultaneous dimension reduction and regression. K. Mao, F. Liang, S. Mukherjee, Q. Wu; in preparation.

90 Acknowledgements People that did the work: Gradients Q Wu, D-X Zhou, K Mao, J Guinney

91 Acknowledgements People that did the work: Gradients Q Wu, D-X Zhou, K Mao, J Guinney Computational biology E Edelman, J Guinney, P Febbo, J-T Chi

92 Acknowledgements People that did the work: Gradients Q Wu, D-X Zhou, K Mao, J Guinney Computational biology E Edelman, J Guinney, P Febbo, J-T Chi Bayesian modeling N Pillai, K Mao, F Liang, M West, R Wolpert

93 Acknowledgements People that did the work: Gradients Q Wu, D-X Zhou, K Mao, J Guinney Computational biology E Edelman, J Guinney, P Febbo, J-T Chi Bayesian modeling N Pillai, K Mao, F Liang, M West, R Wolpert Funding: IGSP Center for Systems Biology at Duke NSF DMS
