Inverse regression approach to (robust) non-linear high-to-low dimensional mapping
Slide 1: Inverse regression approach to (robust) non-linear high-to-low dimensional mapping
Emeline Perthame, joint work with Florence Forbes
INRIA, team MISTIS, Grenoble
LMNO, Caen, October 27
Slide 2: Outline
1. Non-linear mapping problem
2. GLLiM/SLLiM: inverse regression approach
3. Estimation of parameters
4. Results and conclusion
Slide 3: Outline (section divider; same outline as slide 2)
Slide 4: A non-linear mapping problem
Prediction of $X$ from $Y$ through a non-linear regression function $g$, with $Y \in \mathbb{R}^D$, $X \in \mathbb{R}^L$ and $D \gg L$:
$$E(X \mid Y = y) = g(y),$$
where $y = (y_1, \dots, y_D)^T$ is mapped by $g$ to $x = (x_1, \dots, x_L)^T$.
Slide 5: A non-linear mapping problem. Application: OMEGA mission on Mars
Launch of a spectrometer around Mars (Mars Express, OMEGA, 2004).
Problem: retrieving physical properties from hyperspectral images.
- $Y$: spectrum ($D = 184$)
- $X$: composition of the ground ($L = 3$): proportion of dust, proportion of CO2 ice, proportion of water ice
[Figure: reflectance as a function of wavelength]
Slide 6: Some approaches
Difficulty: $D$ large, curse of dimensionality. Solutions via dimensionality reduction:
- Reduce the dimension of $y$ before regression, e.g. PCA on $y$. Risk: poor prediction of $x$.
- Take $x$ into account: PLS, SIR, kernel SIR, PC-based methods. These two-step approaches are not expressed as a single optimization problem.
Our approach: inverse regression to reduce dimension.
Slide 7: Outline (section divider; same outline as slide 2)
Slide 8: Proposed method: an inverse regression strategy
$x \in \mathbb{R}^L$: low-dimensional space; $y \in \mathbb{R}^D$: high-dimensional space; $(y, x)$ are realizations of $(Y, X) \sim p(y, x; \theta)$, with $\theta$ the parameters.
- Inverse conditional density $p(y \mid x; \theta)$: $Y$ is a noisy function of $X$, modeled via mixtures; tractable estimation of $\theta$.
- Forward conditional density $p(x \mid y; \theta^*)$, with $\theta^* = f(\theta)$: high-to-low prediction, e.g. $\hat{X} = E[X \mid Y = y; \theta^*]$.
Slide 9: Student Locally-linear Mapping (SLLiM)
A piecewise affine model: introduce a missing variable $Z$ such that $Z = k$ means $Y$ is the image of $X$ by an affine transformation,
$$Y = \sum_{k=1}^K \mathbb{I}(Z = k)\,(A_k X + b_k + E_k).$$
Definition of SLLiM:
$$p(Y \mid X, Z = k; \theta) = \mathcal{S}(Y; A_k X + b_k, \Sigma_k, \alpha^y_k, \gamma^y_k).$$
Affine transformations are local: mixture of $K$ Student laws,
$$p(X \mid Z = k; \theta) = \mathcal{S}(X; c_k, \Gamma_k, \alpha_k, 1), \qquad p(Z = k; \theta) = \pi_k.$$
The set of all model parameters is $\theta = \{\pi_k, c_k, \Gamma_k, A_k, b_k, \Sigma_k, \alpha_k,\ k = 1 \dots K\}$.
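To make the generative definition concrete, here is a minimal sampling sketch in Python. It is an illustration under assumptions, not the authors' code: it draws a scale variable $U \sim \mathcal{G}(\alpha_k, 1)$ shared by $X$ and $Y$ within each component (matching the joint model of slide 14), and all sizes, parameter values and names (`sample_sllim`) are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
K, L, D = 2, 1, 3                        # hypothetical sizes

# Made-up SLLiM parameters theta = {pi_k, c_k, Gamma_k, A_k, b_k, Sigma_k, alpha_k}
pi = np.array([0.5, 0.5])
c = np.array([[-2.0], [2.0]])            # K x L component means
Gamma = np.stack([np.eye(L)] * K)        # K x L x L covariances of X
A = rng.normal(size=(K, D, L))           # K local affine maps
b = rng.normal(size=(K, D))
Sigma = np.stack([0.1 * np.eye(D)] * K)  # K x D x D noise covariances
alpha = np.array([2.0, 2.0])             # Student tail parameters

def sample_sllim(n):
    """Sample (x, y, z) from the SLLiM generative model via the Gaussian
    scale mixture representation, with U ~ Gamma(alpha_k, 1) shared by
    X and Y within each component."""
    z = rng.choice(K, size=n, p=pi)
    u = rng.gamma(shape=alpha[z], scale=1.0)
    x = np.stack([rng.multivariate_normal(c[k], Gamma[k] / ui)
                  for k, ui in zip(z, u)])              # X | U, Z = k
    mean_y = np.einsum('ndl,nl->nd', A[z], x) + b[z]    # A_k X + b_k
    y = np.stack([rng.multivariate_normal(m, Sigma[k] / ui)
                  for m, k, ui in zip(mean_y, z, u)])   # Y | X, U, Z = k
    return x, y, z

x, y, z = sample_sllim(500)
print(x.shape, y.shape)  # (500, 1) (500, 3)
```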
Slide 10: Why a Student mixture? Dealing with outliers
Generalized Student distribution for the joint density of $(X, Y)$:
$$\mathcal{S}_M(y; \mu, \Sigma, \alpha, \gamma) = \frac{\Gamma(\alpha + M/2)}{|\Sigma|^{1/2}\,\Gamma(\alpha)\,(2\pi\gamma)^{M/2}} \left[1 + \delta(y, \mu, \Sigma)/(2\gamma)\right]^{-(\alpha + M/2)}.$$
Gaussian scale mixture representation (using a weight variable $U$ distributed according to a Gamma distribution):
$$\mathcal{S}_M(y; \mu, \Sigma, \alpha, \gamma) = \int_0^\infty \mathcal{N}_M(y; \mu, \Sigma/u)\, \mathcal{G}(u; \alpha, \gamma)\, du.$$
Parameter estimation is tractable by an EM algorithm.
[Figure: comparison of Gaussian and Student densities]
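The two representations can be checked against each other numerically. The sketch below is hedged: it assumes $\mathcal{G}(u; \alpha, \gamma)$ denotes a Gamma density with shape $\alpha$ and rate $\gamma$ (the parameterization under which the integral reproduces the closed form above), and the helper name `student_logpdf` is ours.

```python
import numpy as np
from scipy.special import gammaln

def student_logpdf(y, mu, Sigma, alpha, gamma):
    """Log of the generalized Student density S_M(y; mu, Sigma, alpha, gamma)."""
    M = len(mu)
    diff = y - mu
    delta = diff @ np.linalg.solve(Sigma, diff)  # Mahalanobis term delta(y, mu, Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    return (gammaln(alpha + M / 2) - gammaln(alpha) - 0.5 * logdet
            - (M / 2) * np.log(2 * np.pi * gamma)
            - (alpha + M / 2) * np.log1p(delta / (2 * gamma)))

rng = np.random.default_rng(1)
mu, Sigma, alpha, gamma = np.zeros(2), np.eye(2), 1.5, 1.0
y = np.array([0.3, -1.2])
M = len(mu)

# Monte Carlo evaluation of the scale mixture integral, U ~ Gamma(alpha, rate=gamma)
u = rng.gamma(shape=alpha, scale=1.0 / gamma, size=200_000)
delta = (y - mu) @ np.linalg.solve(Sigma, y - mu)
gauss = (u / (2 * np.pi)) ** (M / 2) / np.sqrt(np.linalg.det(Sigma)) * np.exp(-u * delta / 2)

print(np.exp(student_logpdf(y, mu, Sigma, alpha, gamma)), gauss.mean())  # should agree
```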
Slide 11: Low-to-high (inverse) regression
If $X$ and $Y$ are both observed, the parameter vector $\theta$ can be estimated using an EM inference procedure whose updates are in closed form. This yields the inverse conditional density, which is a Student mixture:
$$p(y \mid x; \theta) = \sum_{k=1}^K \frac{\pi_k\, \mathcal{S}(x; c_k, \Gamma_k, \alpha_k, 1)}{\sum_{j=1}^K \pi_j\, \mathcal{S}(x; c_j, \Gamma_j, \alpha_j, 1)}\; \mathcal{S}(y; A_k x + b_k, \Sigma_k, \alpha^y_k, \gamma^y_k),$$
and hence a low-to-high inverse regression function:
$$E[Y \mid X = x; \theta] = \sum_{k=1}^K \frac{\pi_k\, \mathcal{S}(x; c_k, \Gamma_k, \alpha_k, 1)}{\sum_{j=1}^K \pi_j\, \mathcal{S}(x; c_j, \Gamma_j, \alpha_j, 1)}\; (A_k x + b_k).$$
Slide 12: High-to-low (forward) regression
The forward conditional density is a Student mixture as well:
$$p(x \mid y; \theta^*) = \sum_{k=1}^K \frac{\pi^*_k\, \mathcal{S}(y; c^*_k, \Gamma^*_k, \alpha_k, 1)}{\sum_{j=1}^K \pi^*_j\, \mathcal{S}(y; c^*_j, \Gamma^*_j, \alpha_j, 1)}\; \mathcal{S}(x; A^*_k y + b^*_k, \Sigma^*_k, \alpha^x_k, \gamma^x_k).$$
The forward parameter vector $\theta^*$ has an analytic expression as a function of $\theta$, and yields the high-to-low forward regression function:
$$E[X \mid Y = y; \theta^*] = \sum_{k=1}^K \frac{\pi^*_k\, \mathcal{S}(y; c^*_k, \Gamma^*_k, \alpha_k, 1)}{\sum_{j=1}^K \pi^*_j\, \mathcal{S}(y; c^*_j, \Gamma^*_j, \alpha_j, 1)}\; (A^*_k y + b^*_k).$$
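As a sketch under our own naming (with the Student log-density inlined), the forward regression function is a responsibility-weighted sum of local affine predictions; it assumes the starred parameters are already available, e.g. from the mapping on the next slide.

```python
import numpy as np
from scipy.special import gammaln, logsumexp

def student_logpdf(y, mu, Sigma, alpha, gamma=1.0):
    M = len(mu)
    diff = y - mu
    delta = diff @ np.linalg.solve(Sigma, diff)
    _, logdet = np.linalg.slogdet(Sigma)
    return (gammaln(alpha + M / 2) - gammaln(alpha) - 0.5 * logdet
            - (M / 2) * np.log(2 * np.pi * gamma)
            - (alpha + M / 2) * np.log1p(delta / (2 * gamma)))

def forward_predict(y, pi_s, c_s, Gamma_s, A_s, b_s, alpha):
    """E[X | Y = y; theta*]: Student-mixture weights times local affine maps.
    Starred parameters: pi_s (K,), c_s (K, D), Gamma_s (K, D, D),
    A_s (K, L, D), b_s (K, L); alpha (K,)."""
    K = len(pi_s)
    logw = np.array([np.log(pi_s[k]) + student_logpdf(y, c_s[k], Gamma_s[k], alpha[k])
                     for k in range(K)])
    w = np.exp(logw - logsumexp(logw))                       # normalized weights
    preds = np.stack([A_s[k] @ y + b_s[k] for k in range(K)])
    return w @ preds                                         # (L,) prediction

# Illustrative call on random starred parameters (K = 3, D = 4, L = 2)
rng = np.random.default_rng(2)
K, D, L = 3, 4, 2
theta_s = (np.full(K, 1 / K), rng.normal(size=(K, D)), np.stack([np.eye(D)] * K),
           rng.normal(size=(K, L, D)), rng.normal(size=(K, L)), np.full(K, 2.0))
print(forward_predict(rng.normal(size=D), *theta_s))  # a 2-dimensional prediction
```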
Slide 13: The forward parameter vector $\theta^*$ from $\theta$
$$c^*_k = A_k c_k + b_k, \qquad \Gamma^*_k = \Sigma_k + A_k \Gamma_k A_k^T,$$
$$A^*_k = \Sigma^*_k A_k^T \Sigma_k^{-1}, \qquad b^*_k = \Sigma^*_k\,(\Gamma_k^{-1} c_k - A_k^T \Sigma_k^{-1} b_k), \qquad \Sigma^*_k = (\Gamma_k^{-1} + A_k^T \Sigma_k^{-1} A_k)^{-1}.$$
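These formulas translate line for line into numpy. The sketch below (our naming) computes the starred parameters for one component; the mixture weights carry over unchanged ($\pi^*_k = \pi_k$) in the GLLiM construction this follows.

```python
import numpy as np

def forward_parameters(c, Gamma, A, b, Sigma):
    """(c*, Gamma*, A*, b*, Sigma*) from (c, Gamma, A, b, Sigma), one component."""
    c_s = A @ c + b
    Gamma_s = Sigma + A @ Gamma @ A.T
    Sigma_s = np.linalg.inv(np.linalg.inv(Gamma) + A.T @ np.linalg.solve(Sigma, A))
    A_s = Sigma_s @ A.T @ np.linalg.inv(Sigma)
    b_s = Sigma_s @ (np.linalg.solve(Gamma, c) - A.T @ np.linalg.solve(Sigma, b))
    return c_s, Gamma_s, A_s, b_s, Sigma_s

# Shape check with random parameters (D = 4, L = 2)
rng = np.random.default_rng(3)
L, D = 2, 4
c_s, Gamma_s, A_s, b_s, Sigma_s = forward_parameters(
    rng.normal(size=L), np.eye(L), rng.normal(size=(D, L)),
    rng.normal(size=D), 0.5 * np.eye(D))
print(c_s.shape, Gamma_s.shape, A_s.shape, b_s.shape, Sigma_s.shape)
# (4,) (4, 4) (2, 4) (2,) (2, 2)
```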
Slide 14: A joint model approach to reduce the number of parameters
Joint model:
$$p(X = x, Y = y \mid Z = k) = \mathcal{S}_{L+D}\!\left(\begin{bmatrix} x \\ y \end{bmatrix}; m_k, V_k, \alpha_k, 1\right)$$
with
$$m_k = \begin{bmatrix} c_k \\ A_k c_k + b_k \end{bmatrix} \quad \text{and} \quad V_k = \begin{bmatrix} \Gamma_k & \Gamma_k A_k^T \\ A_k \Gamma_k & \Sigma_k + A_k \Gamma_k A_k^T \end{bmatrix}.$$
This reduces the number of parameters to estimate (per component):
- Forward strategy + diagonal $\Sigma^*_k$: nb. par. $= \frac{1}{2}D(D+1) + DL + 2L + D$
- Inverse strategy + diagonal $\Sigma_k$: nb. par. $= \frac{1}{2}L(L+1) + DL + 2D + L$
With $D = 500$, the forward count is dominated by the $O(D^2)$ covariance term, while the inverse count grows only as $O(DL)$; see the sketch below.
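The contrast is easy to quantify. A small sketch with the slide's $D = 500$ and a hypothetical $L = 3$ (not from the slide):

```python
def n_params_forward(D, L):
    # full D x D covariance + affine map + means + diagonal L x L noise covariance
    return D * (D + 1) // 2 + D * L + 2 * L + D

def n_params_inverse(D, L):
    # full L x L covariance + affine map + means + diagonal D x D noise covariance
    return L * (L + 1) // 2 + D * L + 2 * D + L

D, L = 500, 3  # D from the slide; L = 3 is a hypothetical choice
print(n_params_forward(D, L), n_params_inverse(D, L))  # 127256 vs 2509 per component
```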
Slide 15: Extension to partially observed responses
Incorporate a latent component into the low-dimensional variable:
$$X = \begin{bmatrix} T \\ W \end{bmatrix},$$
where $T \in \mathbb{R}^{L_t}$ is observed and $W \in \mathbb{R}^{L_w}$ is latent ($L = L_t + L_w$).
Example on the Mars data: lighting? temperature? grain size?
- Observed pairs $\{(y_n, t_n),\ n = 1 \dots N\}$ ($T \in \mathbb{R}^{L_t}$)
- Additional latent variable $W \in \mathbb{R}^{L_w}$
Assuming the independence of $T$ and $W$ given $Z$:
$$p(X = (T, W) \mid Z = k) = \mathcal{S}_L((T, W); c_k, \Gamma_k, \alpha_k, 1) \quad \text{with} \quad c_k = \begin{bmatrix} c^t_k \\ 0 \end{bmatrix}, \quad \Gamma_k = \begin{bmatrix} \Gamma^t_k & 0 \\ 0 & I_{L_w} \end{bmatrix}.$$
Slide 16: Extension to partially observed responses
Extension of SLLiM to a more general covariance structure. With $A_k = [A^t_k\ A^w_k]$,
$$Y = \sum_{k=1}^K \mathbb{I}(Z = k)\,(A^t_k T + A^w_k W + b_k + E_k)$$
rewrites as
$$Y = \sum_{k=1}^K \mathbb{I}(Z = k)\,(A^t_k T + b_k + E_k) \quad \text{with} \quad \mathrm{Var}(E_k) = \Sigma_k + A^w_k (A^w_k)^T.$$
With diagonal $\Sigma_k$, this is factor analysis with (at most) $L_w$ factors: a compromise between full $O(D^2)$ and diagonal $O(D)$ covariances, as illustrated below.
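A quick numerical illustration of this compromise (a sketch under assumptions: $D = 184$ is the spectrum dimension from slide 5, and $L_w = 2$ is a hypothetical number of latent factors):

```python
import numpy as np

D, L_w = 184, 2                    # spectrum dimension; hypothetical factor count
rng = np.random.default_rng(4)

sigma_diag = rng.uniform(0.1, 1.0, size=D)  # diagonal part of Sigma_k: D parameters
A_w = rng.normal(size=(D, L_w))             # factor loadings A_k^w: D * L_w parameters

var_Ek = np.diag(sigma_diag) + A_w @ A_w.T  # Var(E_k) = Sigma_k + A_k^w (A_k^w)^T
print(D + D * L_w, D * (D + 1) // 2)        # 552 free parameters vs 17020 for a full covariance
```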
Slide 17: Outline (section divider; same outline as slide 2)
Slide 18: Estimation of $\theta = (c_k, \Gamma_k, A_k, b_k, \Sigma_k, \pi_k, \alpha_k)_{1 \le k \le K}$ by an EM algorithm
E-step: update the posterior distributions
- (E-Z) $p(Z = k \mid t, y; \theta^{(i)})$: SMM-like
- (E-W) $p(w \mid Z = k, t, y; \theta^{(i)})$: probabilistic PCA / factor analysis like
- (E-U) $E(U \mid Z = k, t, y; \theta^{(i)})$: down-weights extreme/atypical values in the estimators, hence a more robust M-step (see the sketch below)
M-step:
- (M-X) $(\pi_k, c_k, \Gamma_k)$: SMM-like
- (M-YX) $(A_k, b_k, \Sigma_k)$: hybrid between linear regression and PPCA/FA, e.g.
$$\tilde{A}_k = \tilde{Y}_k \tilde{X}_k^T \left( \begin{bmatrix} 0 & 0 \\ 0 & \tilde{S}^w_k \end{bmatrix} + \tilde{X}_k \tilde{X}_k^T \right)^{-1}$$
- (M-$\alpha$) $\alpha_k$: not in closed form but standard (specific to the Student case)
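The (E-U) step is what produces the robustness. Below is a hedged sketch of the generic Student scale weight (the exact SLLiM conditioning on $Z = k$ and on the joint $(t, y)$ changes the constants, but the mechanism is the same): given an observation at squared Mahalanobis distance $\delta$ in $M$ dimensions, the Gamma prior on $U$ updates to $\mathcal{G}(\alpha + M/2,\ \gamma + \delta/2)$, so its expectation shrinks as $\delta$ grows.

```python
def expected_scale_weight(delta, alpha, gamma, M):
    """E[U | observation]: the Gamma(alpha, gamma) prior on U updates to
    Gamma(alpha + M/2, gamma + delta/2) given a point at squared
    Mahalanobis distance delta, so outliers receive small weights."""
    return (alpha + M / 2) / (gamma + delta / 2)

# Inliers keep a weight near 1; outliers are strongly down-weighted in the M-step.
for delta in (1.0, 10.0, 100.0):
    print(delta, round(expected_scale_weight(delta, alpha=1.0, gamma=1.0, M=2), 3))
# 1.0 -> 1.333, 10.0 -> 0.333, 100.0 -> 0.039
```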
Slide 19: Outline (section divider; same outline as slide 2)
Slide 20: Application with L = D = 1
RATP subway in Paris: measurements of air quality at the Châtelet station, line 4, in March 2015 ($N = 341$ measurements).
Prediction of NO ($L = 1$) from NO2 ($D = 1$); robustness of SLLiM.
[Figure: scatter plot of NO against NO2]
Slide 21: Application with L = D = 1, SLLiM compared to GLLiM
[Figure: GLLiM and SLLiM regression curves on the NO against NO2 data]
Illustration of the robustness of the proposed model.
Slide 22: Application with L = D = 1, SLLiM compared to GLLiM
[Figure: NRMSE as a function of K for GLLiM, SLLiM, GLLiM-WO and SLLiM-WO]
- SLLiM achieves better prediction rates than GLLiM on the complete data
- SLLiM becomes equivalent to GLLiM when outliers are removed (the WO curves)
Slide 23: Other applications and an augmented version of SLLiM
Application when $D \gg L$: hyperspectral data on Mars ($D = 184$, $L = 2$, $N = 6983$). Comparison with other non-linear regression methods.

Table: Mars data, average NRMSE (standard deviations in parentheses) for the proportions of CO2 ice and dust over 100 runs.

Method        | Prop. of CO2 ice | Prop. of dust
SLLiM (K=10)  | … (0.019)        | … (0.020)
GLLiM (K=10)  | … (0.023)        | … (0.023)
MARS          | … (0.016)        | … (0.021)
SIR           | … (0.025)        | … (0.016)
RVM           | … (0.021)        | … (0.034)
Slide 24: Results, application to hyperspectral image analysis
[Figure: maps of the proportion of CO2 ice and the proportion of dust estimated by GLLiM, SLLiM and splines]
Slide 25: Conclusion and future work
- Mixture model used for prediction
- Addition of latent variables for partially observed responses
- Selection of $K$ and $L_w$: $K$ fixed or selected by BIC? $L_w$ selected by BIC?
Thank you for your attention! Any questions?