Minimum Message Length Inference and Mixture Modelling of Inverse Gaussian Distributions

Size: px
Start display at page:

Download "Minimum Message Length Inference and Mixture Modelling of Inverse Gaussian Distributions"

Transcription

1 Minimum Message Length Inference and Mixture Modelling of Inverse Gaussian Distributions Daniel F. Schmidt Enes Makalic Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population Health University of Melbourne 25th Australasian Joint Conference on Artificial Intelligence 2012 (The University of Melbourne) AI / 19

2 Content Mixture Modelling 1 Mixture Modelling Problem Description MML Mixture Models 2 MML Inverse Gaussian Distributions Inverse Gaussian Distributions MML Inference of Inverse Gaussians 3 Example (The University of Melbourne) AI / 19

3 Problem Description Mixture Modelling Problem Description We have n items, each with q associated attributes, formed into a matrix y 1 y 1,1 y 1,2... y 1,q y 2 Y =. = y 2,1 y 2,2... y 2,q y n y n,1 y n,2... y n,q Group together, or cluster, similar items A form of unsupervised learning Sometimes called intrinsic classification Class labels are learned from the data (The University of Melbourne) AI / 19

4 Mixture Modelling Problem Description Mixture Modelling (1) Models data as a mixture of probability distributions p(y i,j ; Φ) = K α k p(y i,j ; θ k,j ) k=1 where K is the number of classes α = (α 1,..., α K ) are the mixing (population) weights θ k,j are the parameters of the distributions Φ = {K, α, θ 1,1,..., θ K,q } denotes the complete mixture model Has an explicit probabilistic form allows for statistical interpretion (The University of Melbourne) AI / 19

5 Mixture Modelling Problem Description Mixture Modelling (2) How is this related to clustering? Each class is a cluster Class-specific probability distributions over each attribute e.g., normal, inverse Gaussian, Poisson, etc. Mixing weight is prevalance of the classes in the population Measure of similarity of item to class q p k (y i ) = p(y i,j ; θ k,j ) j=1 probability of item s attributes under class distributions (The University of Melbourne) AI / 19

6 Mixture Modelling Problem Description Mixture Modelling (2) How is this related to clustering? Each class is a cluster Class-specific probability distributions over each attribute e.g., normal, inverse Gaussian, Poisson, etc. Mixing weight is prevalance of the classes in the population Measure of similarity of item to class q p k (y i ) = p(y i,j ; θ k,j ) j=1 probability of item s attributes under class distributions (The University of Melbourne) AI / 19

7 Mixture Modelling (3) Mixture Modelling Problem Description Membership of items to classes is soft r i,k = α k p k (y i ) Kl=1 α l p l (y i ) Posterior probability of belonging to class k α k is a priori probability item belongs to class k p k (y i ) is probability of data item y i under class k Assign to class with highest posterior probability Total number of samples in a class is then n n k = r i,k i=1 (The University of Melbourne) AI / 19

8 Mixture Modelling (3) Mixture Modelling Problem Description Membership of items to classes is soft r i,k = α k p k (y i ) Kl=1 α l p l (y i ) Posterior probability of belonging to class k α k is a priori probability item belongs to class k p k (y i ) is probability of data item y i under class k Assign to class with highest posterior probability Total number of samples in a class is then n n k = r i,k i=1 (The University of Melbourne) AI / 19

9 Mixture Modelling MML Mixture Models MML Mixture Models (1) Minimum Message Length goodness-of-fit criterion Popular criterion for mixture modelling Based on the idea of compression Message length of data is our yardstick; comprised of 1 Length of codeword needed to state model Φ Number of classes: I(K) Relative abundances: I(α) Parameters for each distribution in each class: I(θ k,j ) 2 Length of codeword needed to state data, given model: I(Y Φ) (The University of Melbourne) AI / 19

10 Mixture Modelling MML Mixture Models MML Mixture Models (1) Minimum Message Length goodness-of-fit criterion Popular criterion for mixture modelling Based on the idea of compression Message length of data is our yardstick; comprised of 1 Length of codeword needed to state model Φ Number of classes: I(K) Relative abundances: I(α) Parameters for each distribution in each class: I(θ k,j ) 2 Length of codeword needed to state data, given model: I(Y Φ) (The University of Melbourne) AI / 19

11 Mixture Modelling MML Mixture Models (2) MML Mixture Models Total message length: K q I(Y, Φ) = I(K) + I(α) + I(θ k,j ) + I(Y Φ) k=1 j=1 balances model complexity against model fit Estimate Φ by minimising message length ˆα and ˆθ j,k found by expectation-maximisation Find ˆK by splitting/merging classes (The University of Melbourne) AI / 19

12 Mixture Modelling MML Mixture Models (2) MML Mixture Models Total message length: K q I(Y, Φ) = I(K) + I(α) + I(θ k,j ) + I(Y Φ) k=1 j=1 balances model complexity against model fit Estimate Φ by minimising message length ˆα and ˆθ j,k found by expectation-maximisation Find ˆK by splitting/merging classes (The University of Melbourne) AI / 19

13 Content MML Inverse Gaussian Distributions 1 Mixture Modelling Problem Description MML Mixture Models 2 MML Inverse Gaussian Distributions Inverse Gaussian Distributions MML Inference of Inverse Gaussians 3 Example (The University of Melbourne) AI / 19

14 MML Inverse Gaussian Distributions Inverse Gaussian Distributions (1) Inverse Gaussian Distributions Distribution for positive, continuous data We say Y i IG(µ, λ) if p.d.f. for Y i = y i is p(y i ; µ, λ) = ( 1 2πλy 3 i where µ > 0 is the mean parameter λ > 0 is the inverse-shape parameter Suitable for positively skewed data ) 1 ( 2 exp (y i µ) 2 ) 2µ 2, λy i Derive the message length formula for use in mixture modelling (The University of Melbourne) AI / 19

15 MML Inverse Gaussian Distributions Inverse Gaussian Distributions (2) Inverse Gaussian Distributions Example of inverse Gaussian distributions µ=1, λ=1 µ=1, λ=3 µ=3, λ=1 p(y; µ, λ) y (The University of Melbourne) AI / 19

16 MML Inverse Gaussian Distributions MML Inference of Inverse Gaussians MML Inference of Inverse Gaussians (1) Use Wallace Freeman approximation Bayesian; we chose uninformative priors π(µ, λ) 1 λµ 3 2 Message length component for use in mixture models I(θ k,j ) = log n k 1 ( ) 2 log ˆλ 2 2aj k,j + log bj where ˆλ k,j is the MML estimate of λ for class k and variable j n k is number of samples in class k a j, b j are hyper-parameters Details may be found in the paper (The University of Melbourne) AI / 19

17 MML Inverse Gaussian Distributions MML Inference of Inverse Gaussians MML Inference of Inverse Gaussians (1) Use Wallace Freeman approximation Bayesian; we chose uninformative priors π(µ, λ) 1 λµ 3 2 Message length component for use in mixture models I(θ k,j ) = log n k 1 ( ) 2 log ˆλ 2 2aj k,j + log bj where ˆλ k,j is the MML estimate of λ for class k and variable j n k is number of samples in class k a j, b j are hyper-parameters Details may be found in the paper (The University of Melbourne) AI / 19

18 MML Inverse Gaussian Distributions MML Inference of Inverse Gaussians MML Inference of Inverse Gaussians (2) Let y = (y 1,..., y n ) be data from an inverse Gaussian Define sufficient statistics n n 1 S 1 = y i, S 2 =, y i i=1 Compare maximum likelihood estimates i=1 ˆµ ML = S 1 n, ˆλML = S 1S 2 n 2 ns 1 to minimum message length estimates ˆµ 87 = S 1 n, ˆλ87 = S 1S 2 n 2 (n 1)S 1 MML estimates: 1 Are Unbiased 2 Strictly dominate ML estimates in terms of KL risk (The University of Melbourne) AI / 19

19 MML Inverse Gaussian Distributions MML Inference of Inverse Gaussians MML Inference of Inverse Gaussians (2) Let y = (y 1,..., y n ) be data from an inverse Gaussian Define sufficient statistics n n 1 S 1 = y i, S 2 =, y i i=1 Compare maximum likelihood estimates i=1 ˆµ ML = S 1 n, ˆλML = S 1S 2 n 2 ns 1 to minimum message length estimates ˆµ 87 = S 1 n, ˆλ87 = S 1S 2 n 2 (n 1)S 1 MML estimates: 1 Are Unbiased 2 Strictly dominate ML estimates in terms of KL risk (The University of Melbourne) AI / 19

20 Content Example 1 Mixture Modelling Problem Description MML Mixture Models 2 MML Inverse Gaussian Distributions Inverse Gaussian Distributions MML Inference of Inverse Gaussians 3 Example (The University of Melbourne) AI / 19

21 Example (1) Example Compared inverse Gaussian mixture models against standard Gaussian mixture models Used several well known, real, datasets 1 Enzyme 2 Acidity 3 Galaxy Results shown for enzyme n = 245 samples See paper for acidity and galaxy results (The University of Melbourne) AI / 19

22 Example (2) Example Histogram of enzyme data (The University of Melbourne) AI / 19

23 Example (3) Example Gaussian mixture model (K = 2, I = 86.19) (The University of Melbourne) AI / 19

24 Example (4) Example Inverse Gaussian mixture model (K = 3, I = 69.34) (The University of Melbourne) AI / 19

25 Example References Wallace, C. S., Boulton, D. M. An information measure for classification. Computer Journal, 1968, Vol. 11, pp Wallace, C. S., Dowe, D. L. MML mixture modelling of multi-state, Poisson, von Mises circular and Gaussian distributions. Proceedings of the 6th International Workshop on Artificial Intelligence and Statistics, 1997, pp Wallace, C. S. Intrinsic Classification of Spatially Correlated Data. The Computer Journal, 1998, Vol. 41, pp Wallace, C. S., Dowe, D. L., MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Statistics and Computing, 2000, Vol. 10, pp Wallace, C. S. Statistical and Inductive Inference by Minimum Message Length, Springer, 2005 (The University of Melbourne) AI / 19

The Minimum Message Length Principle for Inductive Inference

The Minimum Message Length Principle for Inductive Inference The Principle for Inductive Inference Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population Health University of Melbourne University of Helsinki, August 25,

More information

Minimum Message Length Analysis of the Behrens Fisher Problem

Minimum Message Length Analysis of the Behrens Fisher Problem Analysis of the Behrens Fisher Problem Enes Makalic and Daniel F Schmidt Centre for MEGA Epidemiology The University of Melbourne Solomonoff 85th Memorial Conference, 2011 Outline Introduction 1 Introduction

More information

Minimum Message Length Analysis of Multiple Short Time Series

Minimum Message Length Analysis of Multiple Short Time Series Minimum Message Length Analysis of Multiple Short Time Series Daniel F. Schmidt and Enes Makalic Abstract This paper applies the Bayesian minimum message length principle to the multiple short time series

More information

Lecture 4: Probabilistic Learning

Lecture 4: Probabilistic Learning DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods

More information

Minimum Message Length Clustering of Spatially-Correlated Data with Varying Inter-Class Penalties

Minimum Message Length Clustering of Spatially-Correlated Data with Varying Inter-Class Penalties Minimum Message Length Clustering of Spatially-Correlated Data with Varying Inter-Class Penalties Gerhard Visser Clayton School of I.T., Monash University, Clayton, Vic. 3168, Australia, (gvis1@student.monash.edu.au)

More information

Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions

Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions Parthan Kasarapu & Lloyd Allison Monash University, Australia September 8, 25 Parthan Kasarapu

More information

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K

More information

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models

The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population Health

More information

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator Estimation Theory Estimation theory deals with finding numerical values of interesting parameters from given set of data. We start with formulating a family of models that could describe how the data were

More information

MML Invariant Linear Regression

MML Invariant Linear Regression MML Invariant Linear Regression Daniel F. Schmidt and Enes Makalic The University of Melbourne Centre for MEGA Epidemiology Carlton VIC 3053, Australia {dschmidt,emakalic}@unimelb.edu.au Abstract. This

More information

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

More information

Statistical and Inductive Inference by Minimum Message Length

Statistical and Inductive Inference by Minimum Message Length C.S. Wallace Statistical and Inductive Inference by Minimum Message Length With 22 Figures Springer Contents Preface 1. Inductive Inference 1 1.1 Introduction 1 1.2 Inductive Inference 5 1.3 The Demise

More information

Logistic Regression with the Nonnegative Garrote

Logistic Regression with the Nonnegative Garrote Logistic Regression with the Nonnegative Garrote Enes Makalic Daniel F. Schmidt Centre for MEGA Epidemiology The University of Melbourne 24th Australasian Joint Conference on Artificial Intelligence 2011

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

More information

Minimum Message Length Grouping of Ordered Data

Minimum Message Length Grouping of Ordered Data Minimum Message Length Grouping of Ordered Data Leigh J. Fitzgibbon, Lloyd Allison, and David L. Dowe School of Computer Science and Software Engineering Monash University, Clayton, VIC 3168 Australia

More information

Lecture 9: PGM Learning

Lecture 9: PGM Learning 13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and

More information

Lecture 6: Graphical Models: Learning

Lecture 6: Graphical Models: Learning Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)

More information

Machine Learning Summer School

Machine Learning Summer School Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,

More information

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

Shrinkage and Denoising by Minimum Message Length

Shrinkage and Denoising by Minimum Message Length 1 Shrinkage and Denoising by Minimum Message Length Daniel F. Schmidt and Enes Makalic Abstract This paper examines orthonormal regression and wavelet denoising within the Minimum Message Length (MML framework.

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 00 MODULE : Statistical Inference Time Allowed: Three Hours Candidates should answer FIVE questions. All questions carry equal marks. The

More information

Variable selection for model-based clustering

Variable selection for model-based clustering Variable selection for model-based clustering Matthieu Marbac (Ensai - Crest) Joint works with: M. Sedki (Univ. Paris-sud) and V. Vandewalle (Univ. Lille 2) The problem Objective: Estimation of a partition

More information

Model Based Clustering of Count Processes Data

Model Based Clustering of Count Processes Data Model Based Clustering of Count Processes Data Tin Lok James Ng, Brendan Murphy Insight Centre for Data Analytics School of Mathematics and Statistics May 15, 2017 Tin Lok James Ng, Brendan Murphy (Insight)

More information

Density Estimation: ML, MAP, Bayesian estimation

Density Estimation: ML, MAP, Bayesian estimation Density Estimation: ML, MAP, Bayesian estimation CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Introduction Maximum-Likelihood Estimation Maximum

More information

Bayesian Learning (II)

Bayesian Learning (II) Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP

More information

arxiv: v3 [stat.me] 11 Feb 2018

arxiv: v3 [stat.me] 11 Feb 2018 arxiv:1708.02742v3 [stat.me] 11 Feb 2018 Minimum message length inference of the Poisson and geometric models using heavy-tailed prior distributions Chi Kuen Wong, Enes Makalic, Daniel F. Schmidt February

More information

High Dimensional Discriminant Analysis

High Dimensional Discriminant Analysis High Dimensional Discriminant Analysis Charles Bouveyron LMC-IMAG & INRIA Rhône-Alpes Joint work with S. Girard and C. Schmid High Dimensional Discriminant Analysis - Lear seminar p.1/43 Introduction High

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

Probabilistic Machine Learning. Industrial AI Lab.

Probabilistic Machine Learning. Industrial AI Lab. Probabilistic Machine Learning Industrial AI Lab. Probabilistic Linear Regression Outline Probabilistic Classification Probabilistic Clustering Probabilistic Dimension Reduction 2 Probabilistic Linear

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

On the identification of outliers in a simple model

On the identification of outliers in a simple model On the identification of outliers in a simple model C. S. Wallace 1 Introduction We suppose that we are given a set of N observations {y i i = 1,...,N} which are thought to arise independently from some

More information

2.6.3 Generalized likelihood ratio tests

2.6.3 Generalized likelihood ratio tests 26 HYPOTHESIS TESTING 113 263 Generalized likelihood ratio tests When a UMP test does not exist, we usually use a generalized likelihood ratio test to verify H 0 : θ Θ against H 1 : θ Θ\Θ It can be used

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 20: Expectation Maximization Algorithm EM for Mixture Models Many figures courtesy Kevin Murphy s

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

Lecture 13: Data Modelling and Distributions. Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1

Lecture 13: Data Modelling and Distributions. Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1 Lecture 13: Data Modelling and Distributions Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1 Why data distributions? It is a well established fact that many naturally occurring

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014. Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)

More information

SYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I

SYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I SYDE 372 Introduction to Pattern Recognition Probability Measures for Classification: Part I Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 Why use probability

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

Non-Parametric Bayes

Non-Parametric Bayes Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

More information

Lecture 6: Model Checking and Selection

Lecture 6: Model Checking and Selection Lecture 6: Model Checking and Selection Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de May 27, 2014 Model selection We often have multiple modeling choices that are equally sensible: M 1,, M T. Which

More information

Latent Variable Models and EM Algorithm

Latent Variable Models and EM Algorithm SC4/SM8 Advanced Topics in Statistical Machine Learning Latent Variable Models and EM Algorithm Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/atsml/

More information

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,

More information

Mathematical statistics

Mathematical statistics October 1 st, 2018 Lecture 11: Sufficient statistic Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means

More information

Introduction to Machine Learning

Introduction to Machine Learning Outline Introduction to Machine Learning Bayesian Classification Varun Chandola March 8, 017 1. {circular,large,light,smooth,thick}, malignant. {circular,large,light,irregular,thick}, malignant 3. {oval,large,dark,smooth,thin},

More information

CSC321 Lecture 18: Learning Probabilistic Models

CSC321 Lecture 18: Learning Probabilistic Models CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling

More information

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means

More information

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model (& discussion on the GPLVM tech. report by Prof. N. Lawrence, 06) Andreas Damianou Department of Neuro- and Computer Science,

More information

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion Today Probability and Statistics Naïve Bayes Classification Linear Algebra Matrix Multiplication Matrix Inversion Calculus Vector Calculus Optimization Lagrange Multipliers 1 Classical Artificial Intelligence

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Week 1: Introduction, Statistical Basics, and a bit of Information Theory Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent

More information

Expectation Propagation Algorithm

Expectation Propagation Algorithm Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,

More information

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a

More information

Problem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30

Problem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30 Problem Set 2 MAS 622J/1.126J: Pattern Recognition and Analysis Due: 5:00 p.m. on September 30 [Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain

More information

Latent Variable Models and EM algorithm

Latent Variable Models and EM algorithm Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic

More information

Chapter 10. Semi-Supervised Learning

Chapter 10. Semi-Supervised Learning Chapter 10. Semi-Supervised Learning Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Outline

More information

Clustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation

Clustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based clustering Model estimation 1 Clustering A basic tool in data mining/pattern recognition: Divide

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1

More information

Minimum Message Length Shrinkage Estimation

Minimum Message Length Shrinkage Estimation Minimum Message Length Shrinage Estimation Enes Maalic, Daniel F. Schmidt Faculty of Information Technology, Monash University, Clayton, Australia, 3800 Abstract This note considers estimation of the mean

More information

STATS 306B: Unsupervised Learning Spring Lecture 2 April 2

STATS 306B: Unsupervised Learning Spring Lecture 2 April 2 STATS 306B: Unsupervised Learning Spring 2014 Lecture 2 April 2 Lecturer: Lester Mackey Scribe: Junyang Qian, Minzhe Wang 2.1 Recap In the last lecture, we formulated our working definition of unsupervised

More information

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model Thai Journal of Mathematics : 45 58 Special Issue: Annual Meeting in Mathematics 207 http://thaijmath.in.cmu.ac.th ISSN 686-0209 The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for

More information

Primer on statistics:

Primer on statistics: Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 32 Lecture 14 : Variational Bayes

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

Bayesian Interpretations of Regularization

Bayesian Interpretations of Regularization Bayesian Interpretations of Regularization Charlie Frogner 9.50 Class 15 April 1, 009 The Plan Regularized least squares maps {(x i, y i )} n i=1 to a function that minimizes the regularized loss: f S

More information

Mixture Models and Expectation-Maximization

Mixture Models and Expectation-Maximization Mixture Models and Expectation-Maximiation David M. Blei March 9, 2012 EM for mixtures of multinomials The graphical model for a mixture of multinomials π d x dn N D θ k K How should we fit the parameters?

More information

Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26

Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26 Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar

More information

Tutorial on Approximate Bayesian Computation

Tutorial on Approximate Bayesian Computation Tutorial on Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki Aalto University Helsinki Institute for Information Technology 16 May 2016

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Handling imprecise and uncertain class labels in classification and clustering

Handling imprecise and uncertain class labels in classification and clustering Handling imprecise and uncertain class labels in classification and clustering Thierry Denœux 1 1 Université de Technologie de Compiègne HEUDIASYC (UMR CNRS 6599) COST Action IC 0702 Working group C, Mallorca,

More information

Machine Learning. Probabilistic KNN.

Machine Learning. Probabilistic KNN. Machine Learning. Mark Girolami girolami@dcs.gla.ac.uk Department of Computing Science University of Glasgow June 21, 2007 p. 1/3 KNN is a remarkably simple algorithm with proven error-rates June 21, 2007

More information

HCOC: hierarchical classifier with overlapping class groups

HCOC: hierarchical classifier with overlapping class groups HCOC: hierarchical classifier with overlapping class groups Igor T Podolak Group of Machine Learning Methods GMUM Theoretical Foundations of Machine Learning, Będlewo 2015 20th February 2015 1 / Igor T

More information

Estimation of Optimally-Combined-Biomarker Accuracy in the Absence of a Gold-Standard Reference Test

Estimation of Optimally-Combined-Biomarker Accuracy in the Absence of a Gold-Standard Reference Test Estimation of Optimally-Combined-Biomarker Accuracy in the Absence of a Gold-Standard Reference Test L. García Barrado 1 E. Coart 2 T. Burzykowski 1,2 1 Interuniversity Institute for Biostatistics and

More information

COMS 4721: Machine Learning for Data Science Lecture 1, 1/17/2017

COMS 4721: Machine Learning for Data Science Lecture 1, 1/17/2017 COMS 4721: Machine Learning for Data Science Lecture 1, 1/17/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University OVERVIEW This class will cover model-based

More information

Expectation Maximization

Expectation Maximization Expectation Maximization Bishop PRML Ch. 9 Alireza Ghane c Ghane/Mori 4 6 8 4 6 8 4 6 8 4 6 8 5 5 5 5 5 5 4 6 8 4 4 6 8 4 5 5 5 5 5 5 µ, Σ) α f Learningscale is slightly Parameters is slightly larger larger

More information

Gaussian Mixture Models

Gaussian Mixture Models Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some

More information

Probabilistic & Unsupervised Learning

Probabilistic & Unsupervised Learning Probabilistic & Unsupervised Learning Week 2: Latent Variable Models Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College

More information

Bayesian Decision and Bayesian Learning

Bayesian Decision and Bayesian Learning Bayesian Decision and Bayesian Learning Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 30 Bayes Rule p(x ω i

More information

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007

More information

CS-E3210 Machine Learning: Basic Principles

CS-E3210 Machine Learning: Basic Principles CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction

More information

Classification for High Dimensional Problems Using Bayesian Neural Networks and Dirichlet Diffusion Trees

Classification for High Dimensional Problems Using Bayesian Neural Networks and Dirichlet Diffusion Trees Classification for High Dimensional Problems Using Bayesian Neural Networks and Dirichlet Diffusion Trees Rafdord M. Neal and Jianguo Zhang Presented by Jiwen Li Feb 2, 2006 Outline Bayesian view of feature

More information

MML Mixture Models of Heterogeneous Poisson Processes with Uniform Outliers for Bridge Deterioration

MML Mixture Models of Heterogeneous Poisson Processes with Uniform Outliers for Bridge Deterioration MML Mixture Models of Heterogeneous Poisson Processes with Uniform Outliers for Bridge Deterioration T. Maheswaran, J.G. Sanjayan 2, David L. Dowe 3 and Peter J. Tan 3 VicRoads, Metro South East Region,

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Probabilistic Time Series Classification

Probabilistic Time Series Classification Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

Machine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels?

Machine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels? Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone extension 6372 Email: sbh11@cl.cam.ac.uk www.cl.cam.ac.uk/ sbh11/ Unsupervised learning Can we find regularity

More information

Probabilistic modeling. The slides are closely adapted from Subhransu Maji s slides

Probabilistic modeling. The slides are closely adapted from Subhransu Maji s slides Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework

More information

LECTURE NOTE #3 PROF. ALAN YUILLE

LECTURE NOTE #3 PROF. ALAN YUILLE LECTURE NOTE #3 PROF. ALAN YUILLE 1. Three Topics (1) Precision and Recall Curves. Receiver Operating Characteristic Curves (ROC). What to do if we do not fix the loss function? (2) The Curse of Dimensionality.

More information

F & B Approaches to a simple model

F & B Approaches to a simple model A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 215 http://www.astro.cornell.edu/~cordes/a6523 Lecture 11 Applications: Model comparison Challenges in large-scale surveys

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Bayesian Classification Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574

More information

Probabilistic and Bayesian Machine Learning

Probabilistic and Bayesian Machine Learning Probabilistic and Bayesian Machine Learning Day 4: Expectation and Belief Propagation Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/

More information

Bayesian Inference Course, WTCN, UCL, March 2013

Bayesian Inference Course, WTCN, UCL, March 2013 Bayesian Course, WTCN, UCL, March 2013 Shannon (1948) asked how much information is received when we observe a specific value of the variable x? If an unlikely event occurs then one would expect the information

More information