Variable selection for model-based clustering of categorical data

1 Variable selection for model-based clustering of categorical data. Brendan Murphy. Wirtschaftsuniversität Wien Seminar. 1 / 44

2 Alzheimer Dataset Data were collected on early onset Alzheimer patient symptoms in St. James Hospital, Dublin. Two hundred and forty patients had six behavioural and psychological symptoms (Hallucination, Activity, Aggression, Agitation, Diurnal and Affective) recorded. The number of distinct groups of patients gives an idea of the number of subclasses or syndromes. Which symptoms distinguish the groups? Can some subset better distinguish the syndromes? Previous studies had difficulty determining whether two or three groups are more suitable to describe the data. 2 / 44

3 Back Pain Dataset A study to investigate the use of a mechanisms-based classification of musculoskeletal pain in clinical practice. The aim of the study was to assess the discriminative power of the taxonomy of pain into Nociceptive, Peripheral Neuropathic and Central Sensitization for low-back disorders. There are N = 464 patients who were assessed according to a list of 36 binary clinical indicators (Present / Absent). Some of the indicators carry the same information about the pain categories, so the interest here is in selecting a subset of the most relevant clinical criteria while partitioning the patients. Does the partition of the patients agree with the clinical taxonomy? 3 / 44

4 Clustering and Variable Selection The motivating examples show the need for: Clustering: Can we establish the existence of subgroups? How can we characterize these subgroups? Variable Selection: Can we use a subset of the variables to distinguish the subgroups? 4 / 44

5 Model-Based Clustering/Mixture Models Denote the N × M data matrix by X; the nth observation is denoted by X_n. Model-based clustering assumes that X_n arises from a finite mixture model. Assuming G classes (components), with τ_g the mixture weights,

p(x_n | τ, θ, G) = \sum_{g=1}^{G} τ_g p(x_n | θ_g),

where p(x_n | θ_g) is the component distribution. 5 / 44
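
A minimal sketch (not from the talk; names are illustrative) of evaluating this mixture density, assuming the component densities are supplied as a callable:

```python
def mixture_density(x, tau, component_density, thetas):
    """p(x | tau, theta, G) = sum_g tau_g * p(x | theta_g).

    tau: sequence of G mixture weights summing to one.
    thetas: list of G component parameter objects.
    component_density: callable returning p(x | theta_g)."""
    return float(sum(t * component_density(x, theta) for t, theta in zip(tau, thetas)))
```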

6 Latent Class Analysis (LCA) model Latent Class Analysis (LCA) is a model for clustering categorical data. Let X_n = (X_{n1}, X_{n2}, ..., X_{nM}), where X_{nm} takes a value from {1, 2, ..., C_m}. In LCA we assume that there is local independence between variables, so that if we knew X_n was in class g we could write its density as

p(x_n | θ_g) = \prod_{m=1}^{M} \prod_{c=1}^{C_m} θ_{gmc}^{I(x_{nm}=c)},

where {θ_{gm1}, ..., θ_{gmC_m}} gives the probabilities of observing the categories {1, ..., C_m} in variable m. The θ_g characterize and embody the differences between the groups. 6 / 44
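
A sketch of this component density, assuming integer-coded observations (x[m] in {0, ..., C_m - 1}) and a nested list theta_g with theta_g[m][c] the probability of category c for variable m in class g; the names are illustrative:

```python
import numpy as np

def lca_component_density(x, theta_g):
    """p(x | theta_g) = prod_m theta_{g,m,x_m} under local independence."""
    return float(np.prod([theta_g[m][x_m] for m, x_m in enumerate(x)]))
```

Combined with the mixture weights from the previous sketch, this gives the LCA likelihood of the next slide.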

7 Example: Alzheimer Dataset Result for a three-group model from the BayesLCA package (White & Murphy, 2014). (Figure: estimated item probabilities for each symptom in the three groups.) 7 / 44

8 LCA model (general) The model likelihood is of the form

p(x_n | θ, τ, G) = \sum_{g=1}^{G} τ_g \prod_{m=1}^{M} \prod_{c=1}^{C_m} θ_{gmc}^{I(x_{nm}=c)}.

It is more convenient to work with the completed data. Augment the data with class labels Z_n = (Z_{n1}, Z_{n2}, ..., Z_{nG}), where Z_{ng} = 1 if observation n belongs to group g and Z_{ng} = 0 otherwise. Then we can write down the completed-data likelihood for an observation:

p(x_n, z_n | θ, τ, G) = \prod_{g=1}^{G} \{ τ_g \prod_{m=1}^{M} \prod_{c=1}^{C_m} θ_{gmc}^{I(x_{nm}=c)} \}^{z_{ng}}. 8 / 44

9 LCA model (general) Estimation is by the EM algorithm or variational Bayes (see the BayesLCA package). Note that G must be chosen in advance; it is possible to discriminate the best G for the data using information criteria (e.g. BIC). Bayesian approaches: Pandolfi, Bartolucci and Friel (2014) use reversible jump MCMC to get a posterior probability for G. 9 / 44
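
For concreteness, a hedged sketch of one EM iteration for the LCA model (this is not the BayesLCA implementation; it assumes integer-coded data and a common number of categories C for all variables):

```python
import numpy as np

def lca_em_step(X, tau, theta):
    """One EM iteration for an LCA model (sketch).

    X: (N, M) integer-coded data with categories 0..C-1.
    tau: (G,) class weights.
    theta: (G, M, C) item probabilities."""
    N, M = X.shape
    G, _, C = theta.shape

    # E-step: responsibilities proportional to tau_g * prod_m theta[g, m, X[n, m]]
    log_resp = np.tile(np.log(tau), (N, 1))            # (N, G)
    for m in range(M):
        log_resp += np.log(theta[:, m, X[:, m]]).T     # add log theta[g, m, x_nm]
    log_resp -= log_resp.max(axis=1, keepdims=True)    # stabilise before exponentiating
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: update tau and theta from the expected counts
    tau_new = resp.mean(axis=0)
    theta_new = np.zeros_like(theta)
    for c in range(C):
        theta_new[:, :, c] = resp.T @ (X == c)         # expected count of category c per (g, m)
    theta_new /= resp.sum(axis=0)[:, None, None]
    return tau_new, theta_new, resp
```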

10 Bayesian variable selection in LCA model Consider the variables that are useful for clustering in the model. Let ν_cl be a vector containing the indexes of the set of variables used for clustering the data, and let ν_n contain the remaining indexes. This splits the observed categorical variables into those with discriminating power and those without. 10 / 44

11 Bayesian variable selection in LCA model Then for the variables used in clustering

p_cl(X, Z | θ, ν, τ, G) = \prod_{n=1}^{N} \prod_{g=1}^{G} \{ τ_g \prod_{m ∈ ν_cl} \prod_{c=1}^{C_m} θ_{gmc}^{I(x_{nm}=c)} \}^{z_{ng}}

... and for those not used

p_n(X | ρ, ν) = \prod_{n=1}^{N} \prod_{m ∈ ν_n} \prod_{c=1}^{C_m} ρ_{mc}^{I(x_{nm}=c)},

where ρ_mc is the probability of variable m having category c; it is the same for all observations, regardless of class. 11 / 44

12 Bayesian variable selection in LCA model Priors are Dirichlet on the item probabilities in each class for the discriminating variables, and also on the non-discriminating variables:

p(θ_gm | β) = \frac{Γ(C_m β)}{Γ(β)^{C_m}} \prod_{c=1}^{C_m} θ_{gmc}^{β-1},

p(ρ_m | β) = \frac{Γ(C_m β)}{Γ(β)^{C_m}} \prod_{c=1}^{C_m} ρ_{mc}^{β-1}.

The prior on the class probabilities is also Dirichlet:

p(τ | α, G) = \frac{Γ(Gα)}{Γ(α)^{G}} \prod_{g=1}^{G} τ_g^{α-1}. 12 / 44

13 Bayesian variable selection in LCA model We aim to explore uncertainty in the number of groups G and in the variables used for clustering. We take a prior on G also; we employ the prior of Nobile and Fearnside (2007), which was justified for this problem in a similar context:

p(G) \propto 1/G!, normalized over 1, ..., G_max.

In fact, the work we present here brings that of Nobile and Fearnside (2007) (for Gaussian mixtures) into the categorical data domain. 13 / 44

14 Bayesian variable selection in LCA model Variables are assumed to be included a priori following a Bernoulli with parameter π:

p(ν | π) = \prod_{m ∈ ν_cl} π \prod_{m ∈ ν_n} (1 - π).

Usually there will only be enough information to set something like π = 0.5 in a practical situation. Ley and Steel (2009) investigate putting a Beta(a_0, b_0) hyperprior on π; we tried this but found no notable difference in results. 14 / 44

15 Bayesian variable selection in LCA model If we write down the model in its full form, we get a joint posterior on the item probabilities over classes, the class probabilities, the labels and the number of classes. The full completed likelihood is

p_full(X, Z | θ, ρ, ν, τ, G) = p_cl(X, Z | θ, ν, τ, G) p_n(X | ρ, ν)

and the posterior is

p(G, Z, θ, ρ, ν, τ | X, α, π, β) \propto p_full(X, Z | θ, ρ, ν, τ, G) p(τ | α, G) p(ν | π) \prod_{m ∈ ν_n} p(ρ_m | β) \prod_{g=1}^{G} \prod_{m ∈ ν_cl} p(θ_gm | β) p(G). 15 / 44

16 Marginalization approach Using normalizing constants for the Dirichlet distribution, it turns out that

p(G, Z, ν | X, α, π, β) \propto p(G) \int p(Z, θ, ρ, ν, τ | X, G, α, π, β) dθ dρ dτ

is actually available in closed form. Instead of doing a trans-dimensional search like the reversible jump algorithm, why not search over the discrete space defined by (G, Z, ν)? 16 / 44

17 Marginalization approach Doing the algebra gives

p(G, Z, ν | X, α, π, β) \propto p(G) p(ν | π) \frac{Γ(Gα) \prod_{g=1}^{G} Γ(N_g + α)}{Γ(α)^{G} Γ(N + Gα)} \prod_{m ∈ ν_n} \frac{Γ(C_m β) \prod_{c=1}^{C_m} Γ(N_{mc} + β)}{Γ(β)^{C_m} Γ(N + C_m β)} \prod_{m ∈ ν_cl} \prod_{g=1}^{G} \frac{Γ(C_m β) \prod_{c=1}^{C_m} Γ(N_{gmc} + β)}{Γ(β)^{C_m} Γ(N_g + C_m β)},

where N_g is the number of observations clustered to group g, N_mc is the number of times variable m takes category c, and N_gmc is the number of observations in group g that have category c for variable m. 17 / 44
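
As a hedged sketch (the function and variable names are illustrative, not from the paper), the logarithm of this collapsed posterior can be evaluated up to an additive constant directly from the counts:

```python
import numpy as np
from scipy.special import gammaln

def log_collapsed_posterior(X, z, G, nu_cl, C, alpha, beta, pi):
    """log p(G, Z, nu | X, alpha, pi, beta) up to an additive constant (sketch).

    X: (N, M) integer-coded data; z: (N,) labels in 0..G-1;
    nu_cl: boolean (M,) indicator of clustering variables; C: (M,) categories per variable."""
    N, M = X.shape
    N_g = np.bincount(z, minlength=G)

    lp = -gammaln(G + 1)                                                 # p(G) propto 1/G!
    lp += nu_cl.sum() * np.log(pi) + (M - nu_cl.sum()) * np.log(1 - pi)  # p(nu | pi)
    lp += (gammaln(G * alpha) - G * gammaln(alpha)                       # class-weight term
           + gammaln(N_g + alpha).sum() - gammaln(N + G * alpha))

    for m in range(M):
        dir_const = gammaln(C[m] * beta) - C[m] * gammaln(beta)
        if nu_cl[m]:                                                     # clustering variable
            for g in range(G):
                N_gmc = np.bincount(X[z == g, m], minlength=C[m])
                lp += (dir_const + gammaln(N_gmc + beta).sum()
                       - gammaln(N_g[g] + C[m] * beta))
        else:                                                            # non-clustering variable
            N_mc = np.bincount(X[:, m], minlength=C[m])
            lp += dir_const + gammaln(N_mc + beta).sum() - gammaln(N + C[m] * beta)
    return lp
```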

18 MCMC sampling algorithm Class memberships are sampled using a Gibbs sampling step which exploits the full conditional distribution of the class label for observation n, n = 1, ..., N. A component is added or removed with probability 0.5. To add, a component k is chosen at random to eject a new component from; a draw u ~ Beta(a, a) is made, and each element of the ejecting component is assigned to the new component with probability u. Components are removed by putting the elements of two randomly drawn clusters into a single cluster. 18 / 44
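
A hedged sketch of the Gibbs step for a single label, treating the collapsed posterior sketched above as the unnormalized full conditional (a real implementation would update the counts incrementally rather than recomputing everything, and the eject/absorb moves for adding or removing a component are not shown):

```python
import numpy as np

def gibbs_update_label(n, X, z, G, nu_cl, C, alpha, beta, pi, rng):
    """Sample a new label for observation n from its full conditional (sketch).

    Relies on log_collapsed_posterior() from the previous sketch;
    rng is, e.g., np.random.default_rng()."""
    log_probs = np.empty(G)
    for g in range(G):
        z[n] = g                                        # tentatively assign label g
        log_probs[g] = log_collapsed_posterior(X, z, G, nu_cl, C, alpha, beta, pi)
    probs = np.exp(log_probs - log_probs.max())         # normalise on the log scale
    probs /= probs.sum()
    z[n] = rng.choice(G, p=probs)
    return z
```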

19 MCMC sampling algorithm To sample the clustering variables, a variable j is chosen randomly from {1, ..., M}. If j ∈ ν_n, it is proposed to move it to ν_cl; alternatively, if j ∈ ν_cl, it is proposed to move it to ν_n. The acceptance probability for inclusion in ν_cl is min(1, R) with

R = \frac{p(G, Z, ν* | X, α, π, β)}{p(G, Z, ν | X, α, π, β)} = \left( \frac{Γ(C_j β)}{Γ(β)^{C_j}} \right)^{G-1} \prod_{g=1}^{G} \frac{\prod_{c=1}^{C_j} Γ(N_{gjc} + β)}{Γ(N_g + C_j β)} \left( \frac{\prod_{c=1}^{C_j} Γ(N_{jc} + β)}{Γ(N + C_j β)} \right)^{-1} \frac{π}{1 - π},

where ν* denotes ν with variable j moved into ν_cl. 19 / 44
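
A sketch of log R using log-gamma functions (illustrative names; it accumulates G copies of the Γ(C_j β)/Γ(β)^{C_j} constant in the numerator and one in the denominator, which is equivalent to the (G - 1) power above):

```python
import numpy as np
from scipy.special import gammaln

def log_accept_ratio_include(j, X, z, G, C, beta, pi):
    """log R for proposing to move variable j into the clustering set (sketch)."""
    N = X.shape[0]
    dir_const = gammaln(C[j] * beta) - C[j] * gammaln(beta)

    # variable j treated as a clustering variable (proposed state)
    log_num = 0.0
    for g in range(G):
        in_g = (z == g)
        N_gjc = np.bincount(X[in_g, j], minlength=C[j])
        log_num += (dir_const + gammaln(N_gjc + beta).sum()
                    - gammaln(in_g.sum() + C[j] * beta))

    # variable j treated as a non-clustering variable (current state)
    N_jc = np.bincount(X[:, j], minlength=C[j])
    log_den = dir_const + gammaln(N_jc + beta).sum() - gammaln(N + C[j] * beta)

    return log_num - log_den + np.log(pi) - np.log(1.0 - pi)
```

The move is accepted with probability min(1, exp(log R)); the reverse proposal, moving j out of ν_cl, uses the reciprocal ratio.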

20 Label switching Because the LCA likelihood is invariant to relabelling of the components, we need to deal with the label switching problem. The reason is that p(G, Z, ν | X, α, π, β) = p(G, Z_δ, ν | X, α, π, β), where Z_δ denotes the indicator matrix obtained by applying any permutation δ of 1, ..., G to the columns of Z. We need to post-process the samples of labels to undo any label switching that may have occurred; this has to be done to get the posterior probability of cluster membership. 20 / 44
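
One common post-processing scheme (not necessarily the one used in this work) relabels each retained sample so that it agrees as closely as possible with a reference labelling, e.g. via the Hungarian algorithm on the cluster agreement matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relabel_to_reference(z_sample, z_ref, G):
    """Permute the labels in z_sample to maximise agreement with z_ref (sketch)."""
    agreement = np.zeros((G, G), dtype=int)
    for g_ref in range(G):
        for g in range(G):
            agreement[g_ref, g] = np.sum((z_ref == g_ref) & (z_sample == g))
    row, col = linear_sum_assignment(-agreement)   # maximise total agreement
    perm = np.empty(G, dtype=int)
    perm[col] = row                                # sampled label -> reference label
    return perm[z_sample]
```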

21 Post-hoc parameter estimation Use the conditional expectation and variance formulae

E[A] = E[E[A | B]],    Var[A] = E[Var[A | B]] + Var[E[A | B]].

Let N_g^{(t)} := \sum_{n=1}^{N} Z_{ng}^{(t)} and S_{gmc}^{(t)} := \sum_{n=1}^{N} Z_{ng}^{(t)} I(X_{nm} = c). Then we can estimate the expected values

E[θ_{gmc} | X, β] = E[E[θ_{gmc} | X, Z, β]] ≈ \frac{1}{T} \sum_{t=1}^{T} E[θ_{gmc} | X, Z^{(t)}, β],

and similarly for the variance. 21 / 44

22 Post-hoc parameter estimation This leads to nice formulae to estimate the posterior mean and variance of the item probabilities:

E[θ_{gmc} | X, β] ≈ \frac{1}{T} \sum_{t=1}^{T} \frac{S_{gmc}^{(t)} + β}{N_g^{(t)} + C_m β},

Var[θ_{gmc} | X, β] ≈ \frac{1}{T} \sum_{t=1}^{T} \frac{(S_{gmc}^{(t)} + β)(N_g^{(t)} + (C_m - 1)β - S_{gmc}^{(t)})}{(N_g^{(t)} + C_m β)^2 (N_g^{(t)} + C_m β + 1)} + \frac{1}{T} \sum_{t=1}^{T} \left( \frac{S_{gmc}^{(t)} + β}{N_g^{(t)} + C_m β} \right)^2 - \left( \frac{1}{T} \sum_{t=1}^{T} \frac{S_{gmc}^{(t)} + β}{N_g^{(t)} + C_m β} \right)^2.

Similar formulae are available for τ_g. 22 / 44
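
A sketch of the posterior-mean formula (the variance estimate is analogous), averaging the conditional Dirichlet means over the retained label samples; names are illustrative:

```python
import numpy as np

def posterior_mean_theta(X, z_samples, G, C, beta):
    """Estimate E[theta_gmc | X, beta] as the average over retained samples of
    (S_gmc + beta) / (N_g + C_m * beta)  (sketch).

    z_samples: (T, N) array of retained, relabelled label vectors."""
    T = z_samples.shape[0]
    M = X.shape[1]
    est = np.zeros((G, M, max(C)))
    for z in z_samples:
        N_g = np.bincount(z, minlength=G)
        for g in range(G):
            for m in range(M):
                S_gmc = np.bincount(X[z == g, m], minlength=C[m])
                est[g, m, :C[m]] += (S_gmc + beta) / (N_g[g] + C[m] * beta)
    return est / T
```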

23 Alzheimer data The sampler was run for 100,000 iterations and thinned by subsampling every twentieth iterate. (Figure: trace of the log posterior over the iterations for the Alzheimer data.) 23 / 44

24 Alzheimer data (Figure: Alzheimer data results, plotted by variable and number of groups.) 24 / 44

25 Alzheimer data The posterior probability for the number of syndromes in early onset Alzheimer's, where p_G is the estimated posterior probability of G classes. (Table: p_2, p_3, p_4, p_5, p_6 under two settings for π: a fixed value of π and π ~ Beta(1, 1.5).) 25 / 44

26 Alzheimer data
Collapsed Gibbs sampler post-hoc estimates:
          Hallucination  Activity     Aggression   Agitation    Diurnal      Affective
Group 1   -    (0.03)    0.54 (0.06)  0.10 (0.04)  0.14 (0.06)  0.13 (0.05)  0.59 (0.08)
Group 2   -    (0.04)    0.80 (0.06)  0.40 (0.08)  0.64 (0.12)  0.39 (0.07)  0.94 (0.04)
Full model Gibbs sampler estimates:
          Hallucination  Activity     Aggression   Agitation    Diurnal      Affective
Group 1   -    (0.03)    0.54 (0.06)  0.11 (0.05)  0.14 (0.06)  0.14 (0.05)  0.59 (0.08)
Group 2   -    (0.04)    0.79 (0.07)  0.39 (0.08)  0.64 (0.12)  0.38 (0.07)  0.93 (0.07)
26 / 44

27 Back Pain Data (Figure: physiotherapy data results, plotted by variable and group. Table: estimated posterior probabilities p_7, p_8, p_9, p_10, p_11, ... of the number of classes.) 27 / 44

28 Back Pain Data / Clinical Taxonomy The clustering closely follows the clinical taxonomy, but with the taxonomy groups subdivided into subtypes. (Table: cross-tabulation of the seven estimated groups against the clinical taxonomy categories.) 28 / 44
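
The agreement between an estimated partition and the clinical taxonomy can be summarised by a cross-tabulation like the one above, or by an agreement index such as the adjusted Rand index used later in the talk. A sketch, assuming both labelings are available as integer arrays and using scikit-learn for the ARI:

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def compare_partitions(cluster_labels, taxonomy_labels):
    """Cross-tabulate an estimated partition against the clinical taxonomy
    and report the adjusted Rand index (sketch)."""
    groups = np.unique(cluster_labels)
    classes = np.unique(taxonomy_labels)
    table = np.array([[np.sum((cluster_labels == g) & (taxonomy_labels == c))
                       for c in classes] for g in groups])
    return table, adjusted_rand_score(taxonomy_labels, cluster_labels)
```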

29 Local Independence (A Problem?) When analyzing the back pain data, we achieved very little data reduction. In fact, only one variable was labeled as non-clustering. An explanation for this is the local independence assumption in the model. Suppose we have two variables that are highly dependent and both exhibit clustering. The variable selection method will include both variables in the model, even if one variable contains no extra clustering information. 29 / 44

30 Dean & Raftery's Greedy Search Dean & Raftery (2010) proposed a greedy stepwise variable selection algorithm for LCA. The observation vector X_n is partitioned as X_n = (X_n^C, X_n^P, X_n^O), where X_n^C are the current clustering variables, X_n^P is proposed to be added to the clustering variables, and X_n^O are the other variables. 30 / 44

31 Dean & Raftery's Greedy Search Two competing models are compared (graphical models M_1 and M_2 over the latent class variable z and the variable blocks X^C, X^P, X^O). M_1 assumes that the proposed variable has clustering structure. M_2 assumes that the proposed variable has no clustering structure. This framework reduces the independence assumption of the previously described approach. 31 / 44

32 Novel Extension: Relaxing Independence Further It is unrealistic to assume that X_n^C and X_n^P are conditionally independent. We propose replacing M_2 with a different model (in the new M_2, the proposed variable X^P depends on the clustering variables X^C, through a subset X^R, rather than on z). M_1 assumes that the proposed variable has clustering structure. M_2 assumes that the proposed variable has no clustering structure beyond that explained by the clustering variables. 32 / 44

33 Stepwise Search Algorithm We propose a stepwise search algorithm to find an optimal set of variables for clustering. The algorithm involves the following steps: Add: Add a variable to the current clustering variables. Remove: Remove a variable from the current clustering variables. Swap: Swap a proposed variable with one already in the clustering variables. Model selection is implemented using BIC. 33 / 44
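
A hedged skeleton of the add/remove/swap search (not the authors' implementation); `score` stands in for the BIC-based comparison of the competing models M_1 and M_2 for a candidate set of clustering variables:

```python
def stepwise_search(variables, initial, score):
    """Greedy add/remove/swap search over sets of clustering variables (sketch).

    variables: iterable of all variable indices.
    initial: starting set of clustering variables.
    score: callable returning the criterion to maximise (e.g. a BIC-based score)."""
    current = set(initial)
    current_score = score(current)
    while True:
        # all single-move neighbours of the current set
        neighbours = [current | {v} for v in variables if v not in current]   # add
        neighbours += [current - {v} for v in current]                        # remove
        neighbours += [(current - {v}) | {w}                                  # swap
                       for v in current for w in variables if w not in current]
        scored = [(score(s), s) for s in neighbours]
        best_score, best_set = max(scored, key=lambda t: t[0])
        if best_score <= current_score:                                       # no improvement: stop
            return current, current_score
        current, current_score = best_set, best_score
```

In practice the number of latent classes is also chosen for each candidate model; here that choice is folded into the hypothetical `score` callback.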

34 Back Pain Data The proposed model was applied to the back pain data. (Table: number of latent classes, BIC and ARI for models using all 36 criteria versus selected subsets of criteria.) The new model achieves much greater data reduction. 34 / 44

35 Algorithm Run (Table: trace of the stepwise search over 27 iterations, recording for each iteration the proposed removal and swap, the BIC difference and the accept/reject decision. Removals were accepted at every iteration until the final two, and a minority of the swap proposals were accepted.) 35 / 44

36 Clustering / Clinical Taxonomy The clustering closely follows the clinical taxonomy. (Table: cross-tabulation of Classes 1-3 against the Nociceptive, Peripheral Neuropathic and Central Sensitization categories.) It is not unusual for patients diagnosed as Nociceptive to have Peripheral Neuropathic aspects to their back pain. 36 / 44

37 Clustering Variables The selected variables exhibit strong clustering across the three groups. (Figure: profiles of the selected criteria, Crit.2, Crit.5, Crit.8, Crit.9, Crit.13, Crit.15, Crit.19, Crit.26, Crit.28, Crit.33 and Crit.37, in each of the three classes.) 37 / 44

38 Chosen Variables with Descriptions The chosen variables have the following descriptions:
Crit.2: Pain associated to trauma, pathologic process or dysfunction.
Crit.5: Usually intermittent and sharp with movement/mechanical provocation.
Crit.8: Pain localized to the area of injury/dysfunction.
Crit.9: Pain referred in a dermatomal or cutaneous distribution.
Crit.13: Disproportionate, nonmechanical, unpredictable pattern of pain.
Crit.15: Pain in association with other dysesthesias.
Crit.19: Night pain/disturbed sleep.
Crit.26: Pain in association with high levels of functional disability.
Crit.28: Clear, consistent and proportionate pattern of pain.
Crit.33: Diffuse/nonanatomic areas of pain/tenderness on palpation.
Crit.37: Pain/symptom provocation on palpation of relevant neural tissues. 38 / 44

39 Discarded Variables Many of the discarded variables are related to the clustering variables. (Figure: relationship between the discarded criteria, Crit.1, Crit.3, Crit.4, Crit.6, Crit.7, Crit.10, Crit.11, Crit.12, Crit.14, Crit.16, Crit.18, Crit.20, Crit.22, Crit.23, Crit.24, Crit.25, Crit.27, Crit.29, Crit.30, Crit.31, Crit.32, Crit.34, Crit.35, Crit.36 and Crit.38, and the selected criteria.) These are not clustering variables because they don't exhibit clustering beyond what can be explained by the clustering variables. 39 / 44

40 Summary Model-based approaches to clustering and variable selection achieve excellent performance. The collapsed MCMC scheme explores the model space effectively. Removing independence assumptions in the model improves variable selection. Care is needed when interpreting the chosen/discarded variables. 40 / 44

41 Simulation 1 (Figure: graphical model for Simulation 1, with latent class variable z and variables X_1, ..., X_12.) 41 / 44

42 Simulation 1 Results (Figure: results for Simulation 1 across variables X_1, ..., X_12.) 42 / 44

43 Simulation 2 (Figure: graphical model for Simulation 2, with latent class variable z and variables X_1, ..., X_10.) 43 / 44

44 Simulation 2 Results (Figure: results for Simulation 2 across variables X_1, ..., X_10.) 44 / 44
