Inverse regression approach to (robust) non-linear high-to-low dimensional mapping


1 Inverse regression approach to (robust) non-linear high-to-low dimensional mapping
Emeline Perthame, joint work with Florence Forbes
INRIA, team MISTIS, Grenoble
LMNO, Caen, October 27

2 Outline
1. Non-linear mapping problem
2. GLLiM/SLLiM: an inverse regression approach
3. Estimation of parameters
4. Results and conclusion

3 Outline (Part 1: Non-linear mapping problem)

4 A non-linear mapping problem
Prediction of X from Y through a non-linear regression function g, with $Y \in \mathbb{R}^D$, $X \in \mathbb{R}^L$ and $D \gg L$:
$y = (y_1, \dots, y_D)^T \mapsto g(y) = x = (x_1, \dots, x_L)^T$
$E(X \mid Y = y) = g(y)$

5 A non-linear mapping problem
Application: the OMEGA mission on Mars, a spectrometer launched into orbit around Mars (Mars Express - OMEGA, 2004)
Problem: retrieving physical properties from hyperspectral images
- Y: spectrum (D = 184)
- X: composition of the ground (L = 3): proportion of dust, proportion of CO2 ice, proportion of water ice
(Figure: reflectance as a function of wavelength for an observed spectrum)

6 Some approaches
Difficulty: D large, hence the curse of dimensionality
Solutions via dimensionality reduction:
- Reduce the dimension of y before regression, e.g. PCA on y. Risk: poor prediction of x
- Take x into account: PLS, SIR, kernel SIR, PC-based methods. These two-step approaches are not expressed as a single optimization problem
Our approach: inverse regression to reduce dimension

7 Outline (Part 2: GLLiM/SLLiM, an inverse regression approach)

8 Proposed method: an inverse regression strategy
$x \in \mathbb{R}^L$ low-dimensional space, $y \in \mathbb{R}^D$ high-dimensional space; $(y, x)$ are realizations of $(Y, X) \sim p(Y, X; \theta)$, with $\theta$ the parameters
Inverse conditional density $p(Y \mid X; \theta)$:
- Y is a noisy function of X
- modeled via mixtures
- tractable estimation of $\theta$
Forward conditional density $p(X \mid Y; \theta^*)$, with $\theta^* = f(\theta)$:
- high-to-low prediction, e.g. $\hat{X} = E[X \mid Y = y; \theta^*]$

9 Student Locally-linear Mapping (SLLiM)
A piecewise-affine model: introduce a missing variable Z such that Z = k when Y is the image of X by the k-th affine transformation,
$Y = \sum_{k=1}^K \mathbb{I}(Z = k)(A_k X + b_k + E_k)$
Definition of SLLiM:
$p(Y \mid X, Z = k; \theta) = \mathcal{S}(Y; A_k X + b_k, \Sigma_k, \alpha^y_k, \gamma^y_k)$
Affine transformations are local, via a mixture of K generalized Student distributions:
$p(X \mid Z = k; \theta) = \mathcal{S}(X; c_k, \Gamma_k, \alpha_k, 1)$, $\quad p(Z = k; \theta) = \pi_k$
The set of all model parameters is $\theta = \{\pi_k, c_k, \Gamma_k, A_k, b_k, \Sigma_k, \alpha_k,\ k = 1 \dots K\}$

10 Why a Student mixture? Dealing with outliers
Generalized Student distribution for the joint density of (X, Y):
$\mathcal{S}_M(y; \mu, \Sigma, \alpha, \gamma) = \frac{\Gamma(\alpha + M/2)}{|\Sigma|^{1/2}\, \Gamma(\alpha)\, (2\pi\gamma)^{M/2}} \left[1 + \delta(y, \mu, \Sigma)/(2\gamma)\right]^{-(\alpha + M/2)}$
Gaussian scale-mixture representation, using a weight variable U distributed according to a Gamma distribution:
$\mathcal{S}_M(y; \mu, \Sigma, \alpha, \gamma) = \int_0^\infty \mathcal{N}_M(y; \mu, \Sigma/u)\, \mathcal{G}(u; \alpha, \gamma)\, du$
Parameter estimation is tractable by an EM algorithm
(Figure: Gaussian vs. Student densities; the Student has heavier tails)
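A minimal generative sketch of the two previous slides, with made-up toy values (K = 2, L = 1, D = 3, and $\alpha^y_k, \gamma^y_k$ simply set to $\alpha_k$ and 1 for brevity; none of these settings come from the talk): the generalized Student is sampled through its Gamma scale-mixture representation, then one (x, y) pair is drawn from the SLLiM hierarchy Z, then X, then Y.

```python
# Generative sketch of SLLiM with made-up toy parameters (not fitted values).
import numpy as np

rng = np.random.default_rng(0)

def sample_gen_student(mu, Sigma, alpha, gamma, rng):
    # S_M(mu, Sigma, alpha, gamma) = int N(mu, Sigma/u) G(u; alpha, gamma) du:
    # draw the Gamma weight U, then a Gaussian with rescaled covariance.
    u = rng.gamma(shape=alpha, scale=1.0 / gamma)
    return rng.multivariate_normal(mu, Sigma / u)

K, L, D = 2, 1, 3                                      # toy sizes
pi    = np.array([0.5, 0.5])                           # pi_k
c     = [np.array([-1.0]), np.array([2.0])]            # c_k
Gamma = [np.eye(L), 0.5 * np.eye(L)]                   # Gamma_k
A     = [rng.normal(size=(D, L)) for _ in range(K)]    # A_k
b     = [rng.normal(size=D) for _ in range(K)]         # b_k
Sigma = [0.1 * np.eye(D), 0.1 * np.eye(D)]             # Sigma_k
alpha = [2.0, 2.0]                                     # alpha_k (alpha_k^y := alpha_k, gamma_k^y := 1 here)

k = rng.choice(K, p=pi)                                # Z = k
x = sample_gen_student(c[k], Gamma[k], alpha[k], 1.0, rng)            # X | Z = k
y = sample_gen_student(A[k] @ x + b[k], Sigma[k], alpha[k], 1.0, rng) # Y | X, Z = k
print(k, x, y)
```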

11 Low-to-high (inverse) regression
If X and Y are both observed, the parameter vector $\theta$ can be estimated in closed form using an EM inference procedure.
This yields the inverse conditional density, which is a Student mixture parameterized by $\theta$:
$p(Y \mid X; \theta) = \sum_{k=1}^K \frac{\pi_k \mathcal{S}(X; c_k, \Gamma_k, \alpha_k, 1)}{\sum_{j=1}^K \pi_j \mathcal{S}(X; c_j, \Gamma_j, \alpha_j, 1)}\, \mathcal{S}(Y; A_k X + b_k, \Sigma_k, \alpha^y_k, \gamma^y_k)$
and hence a low-to-high inverse regression function:
$E[Y \mid X = x; \theta] = \sum_{k=1}^K \frac{\pi_k \mathcal{S}(x; c_k, \Gamma_k, \alpha_k, 1)}{\sum_{j=1}^K \pi_j \mathcal{S}(x; c_j, \Gamma_j, \alpha_j, 1)} (A_k x + b_k)$
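A sketch of how this regression function could be evaluated, reusing the toy parameters pi, c, Gamma, A, b, alpha from the sampling sketch above; `gen_student_logpdf` and `predict_y_from_x` are hypothetical helper names, with the density taken from slide 10.

```python
# Sketch: E[Y | X = x] as a Student-mixture-weighted sum of affine maps.
import numpy as np
from scipy.special import gammaln

def gen_student_logpdf(x, mu, Sigma, alpha, gamma):
    M = len(mu)
    diff = x - mu
    delta = diff @ np.linalg.solve(Sigma, diff)        # Mahalanobis term delta(x, mu, Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    return (gammaln(alpha + M / 2) - gammaln(alpha) - 0.5 * logdet
            - (M / 2) * np.log(2 * np.pi * gamma)
            - (alpha + M / 2) * np.log1p(delta / (2 * gamma)))

def predict_y_from_x(x, pi, c, Gamma, A, b, alpha):
    K = len(pi)
    logw = np.array([np.log(pi[k]) + gen_student_logpdf(x, c[k], Gamma[k], alpha[k], 1.0)
                     for k in range(K)])
    w = np.exp(logw - logw.max())                      # stable normalization of the weights
    w /= w.sum()
    return sum(w[k] * (A[k] @ x + b[k]) for k in range(K))

print(predict_y_from_x(np.array([0.5]), pi, c, Gamma, A, b, alpha))
```

The forward predictor of the next slide has exactly the same form, with the starred parameters and the roles of x and y exchanged.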

12 High-to-low (forward) regression
The forward conditional density is a Student mixture as well:
$p(X \mid Y; \theta^*) = \sum_{k=1}^K \frac{\pi^*_k \mathcal{S}(Y; c^*_k, \Gamma^*_k, \alpha_k, 1)}{\sum_{j=1}^K \pi^*_j \mathcal{S}(Y; c^*_j, \Gamma^*_j, \alpha_j, 1)}\, \mathcal{S}(X; A^*_k Y + b^*_k, \Sigma^*_k, \alpha^x_k, \gamma^x_k)$
The forward parameter vector $\theta^*$ has an analytic expression as a function of $\theta$, which yields the high-to-low forward regression function:
$E[X \mid Y = y; \theta^*] = \sum_{k=1}^K \frac{\pi^*_k \mathcal{S}(y; c^*_k, \Gamma^*_k, \alpha_k, 1)}{\sum_{j=1}^K \pi^*_j \mathcal{S}(y; c^*_j, \Gamma^*_j, \alpha_j, 1)} (A^*_k y + b^*_k)$

13 The forward parameter vector $\theta^*$ from $\theta$
$c^*_k = A_k c_k + b_k$
$\Gamma^*_k = \Sigma_k + A_k \Gamma_k A_k^T$
$\Sigma^*_k = (\Gamma_k^{-1} + A_k^T \Sigma_k^{-1} A_k)^{-1}$
$A^*_k = \Sigma^*_k A_k^T \Sigma_k^{-1}$
$b^*_k = \Sigma^*_k (\Gamma_k^{-1} c_k - A_k^T \Sigma_k^{-1} b_k)$
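These formulas translate directly into code. A sketch with plain numpy (toy shapes, not the authors' implementation); the final assertion checks the mapping against the identity $\Gamma_k = \Sigma^*_k + A^*_k \Gamma^*_k (A^*_k)^T$, i.e. both parameterizations imply the same marginal covariance of X.

```python
# Inverse-to-forward parameter mapping, one component at a time.
import numpy as np

def forward_params(c_k, Gamma_k, A_k, b_k, Sigma_k):
    Gamma_inv = np.linalg.inv(Gamma_k)
    Sigma_inv = np.linalg.inv(Sigma_k)
    c_star = A_k @ c_k + b_k                                           # c*_k
    Gamma_star = Sigma_k + A_k @ Gamma_k @ A_k.T                       # Gamma*_k (D x D)
    Sigma_star = np.linalg.inv(Gamma_inv + A_k.T @ Sigma_inv @ A_k)    # Sigma*_k (L x L)
    A_star = Sigma_star @ A_k.T @ Sigma_inv                            # A*_k (L x D)
    b_star = Sigma_star @ (Gamma_inv @ c_k - A_k.T @ Sigma_inv @ b_k)  # b*_k
    return c_star, Gamma_star, A_star, b_star, Sigma_star

rng = np.random.default_rng(1)
L_dim, D_dim = 2, 5
A_k = rng.normal(size=(D_dim, L_dim))
c_s, G_s, A_s, b_s, S_s = forward_params(rng.normal(size=L_dim), np.eye(L_dim),
                                         A_k, rng.normal(size=D_dim),
                                         0.1 * np.eye(D_dim))
# Consistency check: Var(X) computed in both parameterizations must agree,
# i.e. Gamma_k == Sigma*_k + A*_k Gamma*_k A*_k^T.
assert np.allclose(np.eye(L_dim), S_s + A_s @ G_s @ A_s.T)
```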

14 A joint model approach to reduce the number of parameters
Joint model:
$p(X = x, Y = y \mid Z = k) = \mathcal{S}_{L+D}\!\left(\begin{bmatrix} x \\ y \end{bmatrix}; m_k, V_k, \alpha_k, 1\right)$
with $m_k = \begin{bmatrix} c_k \\ A_k c_k + b_k \end{bmatrix}$ and $V_k = \begin{bmatrix} \Gamma_k & \Gamma_k A_k^T \\ A_k \Gamma_k & \Sigma_k + A_k \Gamma_k A_k^T \end{bmatrix}$
This reduces the number of parameters to estimate (per component):
- forward strategy (full $D \times D$ covariance $\Gamma^*_k$, diagonal $\Sigma^*_k$): nb. par. $= \frac{1}{2}D(D+1) + DL + 2L + D$
- inverse strategy (full $L \times L$ covariance $\Gamma_k$, diagonal $\Sigma_k$): nb. par. $= \frac{1}{2}L(L+1) + DL + 2D + L$
e.g. D = 500, L = 2 (see the count sketch below)
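A small sketch evaluating the per-component counts as reconstructed above; the exact bookkeeping (which terms are counted per component, whether $\pi_k$ is included) is an assumption.

```python
# Per-component parameter counts for the two strategies.
def n_params_forward(D, L):
    # full D x D mixture covariance + D-dim mean + affine map + L-dim noise and offset
    return D * (D + 1) // 2 + D * L + 2 * L + D

def n_params_inverse(D, L):
    # full L x L mixture covariance + L-dim mean + affine map + diagonal D x D noise
    return L * (L + 1) // 2 + D * L + 2 * D + L

print(n_params_forward(500, 2))  # 126754
print(n_params_inverse(500, 2))  # 2005
```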

15 Extension to partially observed responses
Incorporate a latent component into the low-dimensional variable:
$X = \begin{bmatrix} T \\ W \end{bmatrix}$, where $T \in \mathbb{R}^{L_t}$ is observed and $W \in \mathbb{R}^{L_w}$ is latent ($L = L_t + L_w$)
Example on the Mars data: lighting? temperature? grain size?
Observed pairs $\{(y_n, t_n), n = 1 \dots N\}$ with $T \in \mathbb{R}^{L_t}$, plus the additional latent variable $W \in \mathbb{R}^{L_w}$
Assuming the independence of T and W given Z:
$p(X = (T, W) \mid Z = k) = \mathcal{S}_L((T, W); c_k, \Gamma_k, \alpha_k, 1)$
with $c_k = \begin{bmatrix} c^t_k \\ 0 \end{bmatrix}$ and $\Gamma_k = \begin{bmatrix} \Gamma^t_k & 0 \\ 0 & I_{L_w} \end{bmatrix}$

16 Extension to partially observed responses
This extends SLLiM to a more general covariance structure. With $A_k = [A^t_k\ A^w_k]$,
$Y = \sum_{k=1}^K \mathbb{I}(Z = k)(A^t_k T + A^w_k W + b_k + E_k)$
rewrites as
$Y = \sum_{k=1}^K \mathbb{I}(Z = k)(A^t_k T + b_k + E'_k)$
with $\text{Var}(E'_k) = \Sigma_k + A^w_k (A^w_k)^T$
Diagonal $\Sigma_k$ then gives a factor-analysis structure with (at most) $L_w$ factors: a compromise between full $O(D^2)$ and diagonal $O(D)$ covariances
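A sketch of this low-rank-plus-diagonal covariance, with toy sizes echoing the Mars spectrum dimension (values below are illustrative only); it shows the parameter saving, not the estimation.

```python
# Low-rank-plus-diagonal noise covariance: Sigma_k diagonal plus A^w_k (A^w_k)^T
# of rank L_w, so only O(D * L_w) free parameters instead of O(D^2).
import numpy as np

rng = np.random.default_rng(2)
D, L_w = 184, 3                                  # e.g. the Mars spectrum dimension
sigma_diag = rng.uniform(0.1, 1.0, size=D)       # diagonal of Sigma_k (D parameters)
A_w = rng.normal(size=(D, L_w))                  # latent loadings A^w_k (D * L_w parameters)
cov = np.diag(sigma_diag) + A_w @ A_w.T          # full D x D covariance implied by the model
print(cov.shape, D + D * L_w, D * (D + 1) // 2)  # free parameters vs an unconstrained covariance
```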

17 Outline (Part 3: Estimation of parameters)

18 Estimation of $\theta = (c_k, \Gamma_k, A_k, b_k, \Sigma_k, \pi_k, \alpha_k)_{1 \le k \le K}$ by an EM algorithm
E-step: update the posterior distributions
- (E-Z) $p(Z = k \mid t, y, \theta^{(i)})$: SMM-like
- (E-W) $p(W \mid Z = k, t, y, \theta^{(i)})$: probabilistic PCA / factor analysis-like
- (E-U) $E(U \mid Z = k, t, y, \theta^{(i)})$: down-weights extreme/atypical values in the estimators, hence a more robust M-step (see the weight sketch below)
M-step:
- (M-X) $(\pi_k, c_k, \Gamma_k)$: SMM-like
- (M-YX) $(A_k, b_k, \Sigma_k)$: hybrid between linear regression and PPCA/FA, e.g. $\tilde{A}_k = \tilde{Y}_k \tilde{X}_k^T \left(\begin{bmatrix} 0 & 0 \\ 0 & \tilde{S}^w_k \end{bmatrix} + \tilde{X}_k \tilde{X}_k^T\right)^{-1}$
- (M-$\alpha$) $\alpha_k$: not in closed form but standard (specific to the Student case)
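For the (E-U) step, a sketch of the standard Student EM weight, assuming the scale-mixture representation of slide 10: the posterior of U given an observation is Gamma$(\alpha + M/2,\ \gamma + \delta/2)$, so its mean shrinks as the Mahalanobis distance $\delta$ grows, which is what makes the M-step robust to outliers.

```python
# E-U step sketch: posterior mean of the Gamma weight variable, which
# down-weights observations with a large Mahalanobis distance.
import numpy as np

def e_u_weight(y, mu, Sigma, alpha, gamma):
    M = len(mu)
    diff = y - mu
    delta = diff @ np.linalg.solve(Sigma, diff)   # delta(y, mu, Sigma)
    return (alpha + M / 2) / (gamma + delta / 2)  # E[U | y] under Gamma(alpha + M/2, gamma + delta/2)

mu, Sigma = np.zeros(2), np.eye(2)
print(e_u_weight(np.array([0.1, 0.2]), mu, Sigma, 1.0, 1.0))   # inlier  -> weight ~ 1.95
print(e_u_weight(np.array([10.0, 10.0]), mu, Sigma, 1.0, 1.0)) # outlier -> weight ~ 0.02
```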

19 Outline (Part 4: Results and conclusion)

20 Application with L = D = 1
RATP subway in Paris: measures of air quality at the Châtelet station, line 4, March 2015; N = 341 measurements
Prediction of NO (L = 1) from NO2 (D = 1), illustrating the robustness of SLLiM
(Figure: scatter plot of NO against NO2)

21 Application with L = D = 1: SLLiM compared to GLLiM
(Figure: fitted regression curves of GLLiM and SLLiM on the NO against NO2 data)
Illustration of the robustness of the proposed model

22 Application with L = D = 1: SLLiM compared to GLLiM
(Figure: NRMSE as a function of K for GLLiM, SLLiM, GLLiM-WO and SLLiM-WO, i.e. the fits with outliers removed)
- SLLiM achieves better prediction rates than GLLiM on the complete data
- SLLiM becomes equivalent to GLLiM when outliers are removed

23 Other applications and an augmented version of SLLiM
Application when $D \gg L$: hyperspectral data on Mars (D = 184, L = 2, N = 6983); comparison with other non-linear regression methods
Table: Mars data, average NRMSE with standard deviations in parentheses, for the proportions of CO2 ice and dust over 100 runs.
Method        | Prop. of CO2 ice | Prop. of dust
SLLiM (K=10)  | (0.019)          | (0.020)
GLLiM (K=10)  | (0.023)          | (0.023)
MARS          | (0.016)          | (0.021)
SIR           | (0.025)          | (0.016)
RVM           | (0.021)          | (0.034)

24 Results: application to hyperspectral image analysis
(Figure: estimated maps of the proportion of CO2 ice and of the proportion of dust, for GLLiM, SLLiM, and splines)

25 Conclusion and future work
- A mixture model used for prediction
- Addition of latent variables for partially observed responses
- Selection of K and $L_w$: K fixed, or selected by BIC? $L_w$ selected by BIC?
Thank you for your attention! Any questions?
