39th Annual ISMS Marketing Science Conference University of Southern California, June 8, 2017


1 Permuted and Augmented Stick-Breaking Bayesian Multinomial Regression. Mingyuan Zhou, IROM Department, McCombs School of Business, The University of Texas at Austin. 39th Annual ISMS Marketing Science Conference, University of Southern California, June 8, 2017. 1 / 36

2 Joint work with Quan Zhang, McCombs PhD student in Risk Analysis and Decision Making. 2 / 36

3 Outline: 1 Overview; 2 Stick-breaking logistic regression; 3 Permuted and augmented stick-breaking logistic regression; 4 Binary SVM; 5 Binary softplus regression. 3 / 36

4 Stick-breaking construction. The stick-breaking construction provides a size-biased random permutation of the weights of a random draw from a Dirichlet process (Sethuraman 1994). The stick success probabilities can depend on covariates (Dunson and Park 2008; Chung and Dunson 2009; Ren et al. 2011). Dependencies between the stick probabilities can be modeled with a logit-Gaussian distribution/process (Linderman et al. 2015). 4 / 36

5 Drawing $y_i \sim (p_{i1}, \ldots, p_{iS})$ is equivalent to drawing a sequence of binary random variables as
$$b_{is} \mid \{b_{ij}\}_{j<s} \sim \mathrm{Bernoulli}\Big[\Big(1 - \sum_{j<s} b_{ij}\Big)\pi_{is}\Big], \qquad \pi_{is} = \frac{p_{is}}{1 - \sum_{j<s} p_{ij}}, \qquad s = 1, 2, \ldots, S.$$
If we define $y_i = s$ if and only if $b_{is} = 1$ and $b_{ij} = 0$ for all $j \neq s$, then we have
$$p_{is} = P\big(y_i = s \mid \{\pi_{ij}\}_{j=1}^{S}\big) = P(b_{is} = 1) \prod_{j<s} P(b_{ij} = 0) = (\pi_{is})^{1(s<S)} \prod_{j<s} (1 - \pi_{ij}).$$
5 / 36

6 Theorem (1). Suppose $y_i \sim \sum_{s=1}^{S} p_{is}\,\delta_s$, where $[p_{i1}, \ldots, p_{iS}]$ is a probability vector whose elements are constructed as $p_{is} = (\pi_{is})^{1(s<S)} \prod_{j<s} (1 - \pi_{ij})$; then $y_i$ can be equivalently generated from the augmented stick-breaking (aSB) construction as
$$y_i \sim \sum_{s=1}^{S} \big[1(b_{is} = 1)\big]^{1(s<S)} \prod_{j<s} 1(b_{ij} = 0)\,\delta_s, \qquad b_{is} \sim \mathrm{Bernoulli}(\pi_{is}), \quad s \in \{1, \ldots, S\}.$$
6 / 36
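
As a quick check of this equivalence, the minimal NumPy sketch below (illustrative code, not from the talk) draws y via the augmented Bernoulli sticks and compares the empirical frequencies with the stick-breaking probabilities; with S categories only the first S-1 sticks matter, the last category taking the leftover mass.

```python
import numpy as np

def stick_probs(pi):
    """Map stick probabilities (pi_1, ..., pi_{S-1}) to category probabilities
    p_s = pi_s * prod_{j<s}(1 - pi_j) for s < S, with p_S taking the leftover mass."""
    p, remaining = [], 1.0
    for pi_s in pi:
        p.append(pi_s * remaining)
        remaining *= 1.0 - pi_s
    p.append(remaining)                      # category S = leftover mass
    return np.array(p)

def sample_via_sticks(pi, rng):
    """Augmented construction: y is the first stick s with b_s = 1, else the last category."""
    for s, pi_s in enumerate(pi, start=1):
        if rng.random() < pi_s:              # b_s ~ Bernoulli(pi_s)
            return s
    return len(pi) + 1                       # all S-1 sticks failed

rng = np.random.default_rng(0)
pi = np.array([0.3, 0.5, 0.2])               # S = 4 categories, S - 1 = 3 sticks
draws = np.array([sample_via_sticks(pi, rng) for _ in range(200_000)])
print(np.bincount(draws, minlength=5)[1:] / draws.size)   # empirical frequencies
print(stick_probs(pi))                                     # stick-breaking probabilities
```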

7 The augmented stick-breaking construction transforms the problem of classification with S categories into the problem of S conditionally independent binary classifications.
Gibbs sampling:
- Sample $b_{is}$ for $s \in \{1, \ldots, S\}$: $b_{is} = 0$ if $s < y_i$; $b_{is} = 1$ if $s = y_i$; $b_{is} \sim \mathrm{Bernoulli}(\pi_{is})$ if $s > y_i$.
- Solve $b_{is} \sim \mathrm{Bernoulli}(\pi_{is})$ for $s \in \{1, \ldots, S\}$, where the covariate-dependent stick probability $\pi_{is}$ for the $s$th stick/category is a deterministic function of $x_i'\beta_s$.
Any binary classification model (with cross-entropy loss) can be generalized to a multi-class one under the augmented stick-breaking construction, but a naive combination may not work well.
7 / 36
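
A minimal sketch of the conditional augmentation step just described (illustrative, not from the talk): given $y_i$, the sticks up to and including $y_i$ are deterministic, and the later sticks are imputed from their Bernoulli priors.

```python
import numpy as np

def sample_b_given_y(y_i, pi_i, rng):
    """Augment one observation: b_is = 0 for s < y_i, b_is = 1 at s = y_i,
    and b_is ~ Bernoulli(pi_is) for s > y_i (categories are 1-indexed)."""
    S = len(pi_i)
    b = np.zeros(S, dtype=int)
    b[y_i - 1] = 1                                     # the y_i-th stick succeeds
    for s in range(y_i, S):                            # sticks after y_i
        b[s] = int(rng.random() < pi_i[s])
    return b

rng = np.random.default_rng(1)
print(sample_b_given_y(y_i=2, pi_i=np.array([0.3, 0.5, 0.2, 0.9]), rng=rng))
```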

8 Augmented stick-breaking logistic regression. If we let $\pi_{is} = \frac{e^{x_i'\beta_s}}{1 + e^{x_i'\beta_s}}$, which means $\mathrm{logit}(\pi_{is}) = x_i'\beta_s$, then we have
$$p_{is} = \frac{e^{x_i'\beta_s}}{\big(1 + e^{x_i'\beta_s}\big)\prod_{j<s}\big(1 + e^{x_i'\beta_j}\big)}.$$
Augmented stick-breaking logistic regression:
$$y_i \sim \sum_{s=1}^{S} \big[1(b_{is} = 1)\big]^{1(s<S)} \prod_{j<s} 1(b_{ij} = 0)\,\delta_s, \qquad b_{is} \sim \mathrm{Bernoulli}\Big(\pi_{is} = \frac{e^{x_i'\beta_s}}{1 + e^{x_i'\beta_s}}\Big), \quad s \in \{1, \ldots, S\}.$$
8 / 36

9 Gibbs sampling:
- Sample $b_{is}$ for $s \in \{1, \ldots, S\}$: $b_{is} = 0$ if $s < y_i$; $b_{is} = 1$ if $s = y_i$; $b_{is} \sim \mathrm{Bernoulli}(\pi_{is})$ if $s > y_i$.
- Solve $b_{is} \sim \mathrm{Bernoulli}\big(\pi_{is} = \frac{e^{x_i'\beta_s}}{1 + e^{x_i'\beta_s}}\big)$: $\beta_s$ can be inferred with the Polya-Gamma data augmentation, with closed-form Gibbs sampling update equations.
Problem solved? End of the talk?? Not really...
9 / 36
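
For concreteness, here is a sketch of one Polya-Gamma update for a single stick's coefficient vector, following the standard Polson-Scott-Windle augmentation; it is an illustration of the generic PG update, not code from the talk. The `pg_draw` sampler is assumed to be supplied externally (e.g., by the pypolyagamma package), and a zero-mean Gaussian prior is assumed.

```python
import numpy as np

def update_beta_stick(X, b, beta, prior_prec, pg_draw, rng):
    """One Gibbs update of beta for a single stick in the augmented model.

    X          : (n, p) covariate matrix (rows x_i')
    b          : (n,)  binary stick outcomes b_is for this stick
    prior_prec : (p, p) prior precision of beta (zero prior mean assumed)
    pg_draw    : callable (b, c) -> one PG(b, c) draw; assumed external sampler
    """
    psi = X @ beta                                         # linear predictors x_i' beta
    omega = np.array([pg_draw(1.0, c) for c in psi])       # omega_i ~ PG(1, psi_i)
    post_prec = X.T @ (omega[:, None] * X) + prior_prec    # X' Omega X + B0^{-1}
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (X.T @ (b - 0.5))               # kappa_i = b_i - 1/2
    return rng.multivariate_normal(post_mean, post_cov)
```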

10 The number of geometric constraints increases in s.
- $p_{i1} = \big(1 + e^{-x_i'\beta_1}\big)^{-1}$ is larger than 0.5 if $x_i'\beta_1 > 0$.
- $p_{i2} = \big(1 + e^{x_i'\beta_1}\big)^{-1}\big(1 + e^{-x_i'\beta_2}\big)^{-1}$ can be larger than 0.5 only if both $x_i'\beta_1 < 0$ and $x_i'\beta_2 > 0$.
- $p_{i3} = \big(1 + e^{x_i'\beta_1}\big)^{-1}\big(1 + e^{x_i'\beta_2}\big)^{-1}\big(1 + e^{-x_i'\beta_3}\big)^{-1}$ can be larger than 0.5 only if $x_i'\beta_1 < 0$, $x_i'\beta_2 < 0$, and $x_i'\beta_3 > 0$.
10 / 36

11 Stick-breaking logistic regression is not invariant to label permutation, and may or may not work depending on how the categories are labeled (ordered). For the Iris data (sepal and petal lengths as covariates), it does not work well if: 1st category/stick: blue points (middle); 2nd category/stick: black points (bottom); 3rd category/stick: gray points (top). It works well as long as the blue points are not labeled as the first category (stick). 11 / 36

12 If some categories are not linearly separable, augmented stick-breaking logistic regression may not work well no matter how the categories are labeled. How to address the sensitivity to label permutation? How to separate the categories that are not linearly separable? 12 / 36

13 Permuted and augmented stick-breaking construction. Denote $z = (z_1, \ldots, z_S)$ as a permutation of $(1, \ldots, S)$, where $z_s \in \{1, \ldots, S\}$ is the index of the latent stick that category $s$ is uniquely mapped to. There are $S!$ possible permutations: $6! = 720$; $7! = 5{,}040$; $\ldots$; $10! = 3{,}628{,}800$; $\ldots$ Fortunately, the effective search space can be significantly smaller than $S!$. We will discuss why and provide examples. 13 / 36

14 Theorem (2). Suppose $y_i \sim \sum_{s=1}^{S} p_{is}(z)\,\delta_s$, where $[p_{i1}(z), \ldots, p_{iS}(z)]$ is a probability vector whose elements are constructed as
$$p_{is}(z) = (\pi_{i z_s})^{1(z_s < S)} \prod_{j < z_s} (1 - \pi_{ij}),$$
then $y_i$ can be equivalently generated under the permuted and augmented stick-breaking construction as
$$y_i \sim \sum_{s=1}^{S} \big[1(b_{i z_s} = 1)\big]^{1(z_s < S)} \prod_{j < z_s} 1(b_{ij} = 0)\,\delta_s, \qquad b_{ij} \sim \mathrm{Bernoulli}(\pi_{ij}), \quad j \in \{1, \ldots, S\}.$$
The S categories are randomly one-to-one mapped to the S sticks. Given $\{\pi_{is}\}_s$, the $\{b_{is}\}_s$ are mutually independent in the prior.
14 / 36
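
The construction in Theorem 2 is straightforward to compute directly; the sketch below (illustrative, 0-indexed, not from the talk) returns the category probabilities for one observation given its stick probabilities and a category-to-stick mapping z.

```python
import numpy as np

def category_probs_under_mapping(pi_i, z):
    """p_is(z) = pi_{i, z_s}^{1(z_s < S)} * prod_{j < z_s} (1 - pi_{ij}).

    pi_i : (S,) stick success probabilities for observation i
    z    : (S,) permutation of 0..S-1; z[s] is the stick assigned to category s
    """
    S = len(pi_i)
    log_fail = np.concatenate([[0.0], np.cumsum(np.log1p(-pi_i))])  # sum_{j<k} log(1 - pi_ij)
    p = np.empty(S)
    for s in range(S):
        zs = z[s]
        head = pi_i[zs] if zs < S - 1 else 1.0      # the last stick takes the leftover mass
        p[s] = head * np.exp(log_fail[zs])
    return p

pi_i = np.array([0.3, 0.5, 0.2, 0.0])               # last entry unused (remainder stick)
print(category_probs_under_mapping(pi_i, np.array([0, 1, 2, 3])))  # identity mapping
print(category_probs_under_mapping(pi_i, np.array([2, 0, 3, 1])))  # a permuted mapping
```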

15 Gibbs sampling for the permuted and augmented stick-breaking construction:
- Sample $b_{is}$ for $s \in \{1, \ldots, S\}$: category $y_i$ is mapped to stick $z_{y_i}$; $b_{is} = 0$ if $s < z_{y_i}$; $b_{is} = 1$ if $s = z_{y_i}$; $b_{is} \sim \mathrm{Bernoulli}(\pi_{is})$ if $s > z_{y_i}$.
- Solve $b_{is} \sim \mathrm{Bernoulli}(\pi_{is})$ for $s \in \{1, \ldots, S\}$, where the covariate-dependent stick probability $\pi_{is}$ for the $s$th stick is a deterministic function of $x_i'\beta_s$.
- Sample $(z_1, \ldots, z_S)$, the one-to-one mapping between the category and stick indices, using Metropolis-Hastings.
15 / 36

16 Sample the label-stick one-to-one mapping:
- In the prior, $(z_1, \ldots, z_S)$ is selected uniformly at random from the $S!$ possible permutations.
- Propose to change $z = (z_1, \ldots, z_j, \ldots, z_{j'}, \ldots, z_S)$ to $z' = (z_1, \ldots, z_{j'}, \ldots, z_j, \ldots, z_S)$, i.e., swap two entries.
- Accept the proposal with probability
$$\min\left\{\prod_i \frac{\prod_{s=1}^{S} [p_{is}(z')]^{1(y_i=s)}}{\prod_{s=1}^{S} [p_{is}(z)]^{1(y_i=s)}},\, 1\right\} = \min\left\{\prod_i \frac{\prod_{s=1}^{S} \big[(\pi_{i z'_s})^{1(z'_s<S)} \prod_{j<z'_s}(1-\pi_{ij})\big]^{1(y_i=s)}}{\prod_{s=1}^{S} \big[(\pi_{i z_s})^{1(z_s<S)} \prod_{j<z_s}(1-\pi_{ij})\big]^{1(y_i=s)}},\, 1\right\}.$$
- Proposing two indices $z_j$ and $z_{j'}$ to switch in each iteration is effective for escaping from a set of poor mappings. The probability that a given $z_j$ has not yet been proposed to switch is $[(S-2)/S]^t$ after $t$ MCMC iterations; even if $S = 100$, this probability drops below $10^{-8}$ within about 1,000 iterations. $S/2$ is the expected number of iterations until a given $z_j$ is proposed to switch.
16 / 36
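
A compact sketch of the swap proposal described above (illustrative NumPy, 0-indexed, not from the talk): swap two entries of z and accept with the ratio of the data likelihoods under the two mappings.

```python
import numpy as np

def log_lik_mapping(z, pi, y):
    """Sum over observations of log p_{i, y_i}(z) under the permuted construction.

    z  : (S,) permutation of 0..S-1; z[s] is the stick of category s
    pi : (n, S) stick success probabilities pi_is
    y  : (n,)  observed categories in 0..S-1
    """
    n, S = pi.shape
    rows = np.arange(n)
    zs = z[y]                                                   # stick assigned to each y_i
    cum = np.hstack([np.zeros((n, 1)), np.cumsum(np.log1p(-pi), axis=1)])
    head = np.log(np.where(zs < S - 1, pi[rows, zs], 1.0))      # last stick has no success term
    return float(np.sum(head + cum[rows, zs]))

def mh_swap(z, pi, y, rng):
    """One Metropolis-Hastings step: propose swapping two entries of z."""
    j, k = rng.choice(len(z), size=2, replace=False)
    z_new = z.copy()
    z_new[j], z_new[k] = z_new[k], z_new[j]
    log_ratio = log_lik_mapping(z_new, pi, y) - log_lik_mapping(z, pi, y)
    return z_new if np.log(rng.random()) < log_ratio else z

rng = np.random.default_rng(3)
n, S = 200, 4
pi = rng.uniform(0.05, 0.95, size=(n, S))    # toy stick probabilities
y = rng.integers(0, S, size=n)               # toy labels
z = np.arange(S)
for _ in range(50):
    z = mh_swap(z, pi, y, rng)
print(z)
```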

17 Permuted and augmented stick-breaking logistic regression. Model:
$$y_i \sim \sum_{s=1}^{S} \big[1(b_{i z_s} = 1)\big]^{1(z_s<S)} \prod_{j<z_s} 1(b_{ij} = 0)\,\delta_s, \qquad b_{is} \sim \mathrm{Bernoulli}\Big(\pi_{is} = \frac{e^{x_i'\beta_s}}{1 + e^{x_i'\beta_s}}\Big), \quad s \in \{1, \ldots, S\}.$$
The number of geometric constraints increases in $z_s$:
- If $z_s = 1$, then $p_{is} = \big(1 + e^{-x_i'\beta_1}\big)^{-1}$ is larger than 0.5 if $x_i'\beta_1 > 0$.
- If $z_s = 2$, then $p_{is} = \big(1 + e^{x_i'\beta_1}\big)^{-1}\big(1 + e^{-x_i'\beta_2}\big)^{-1}$ can be larger than 0.5 only if both $x_i'\beta_1 < 0$ and $x_i'\beta_2 > 0$.
- If $z_s = 3$, then $p_{is} = \big(1 + e^{x_i'\beta_1}\big)^{-1}\big(1 + e^{x_i'\beta_2}\big)^{-1}\big(1 + e^{-x_i'\beta_3}\big)^{-1}$ can be larger than 0.5 only if $x_i'\beta_1 < 0$, $x_i'\beta_2 < 0$, and $x_i'\beta_3 > 0$.
17 / 36

18 Sequential decision making and relaxing the independence of irrelevant alternatives (IIA) assumption: a one-vs-remaining decision is made at each of the stick-breaking steps.
Lemma. Under the permuted and augmented stick-breaking construction, the probability ratio of two choices is influenced by the success probabilities of the sticks that lie between the two choices' corresponding sticks. In other words, the probability ratio of two choices will be influenced by some other choices if the two are not mapped to adjacent sticks.
18 / 36

19 Stick-breaking logistic regression as a discrete choice model. The stick-breaking logistic regression that assigns choice $s \in \{1, \ldots, S\}$ to individual $i$ with probability
$$p_{is} = (\pi_{is})^{1(s<S)} \prod_{j<s} (1 - \pi_{ij}), \qquad \pi_{is} = 1/(1 + e^{-W_{is}}),$$
is equivalent to a sequential random utility maximization model which selects choice $s$ once $U_{is} > \sum_{j>s} U_{ij}$ is observed, where
$$U_{i1} = U_{i2} + \cdots + U_{iS} + W_{i1} + \varepsilon_{i1}, \quad \ldots, \quad U_{is} = \sum_{j>s} U_{ij} + W_{is} + \varepsilon_{is}, \quad \ldots, \quad U_{i(S-1)} = W_{i(S-1)} + \varepsilon_{i(S-1)}, \quad U_{iS} = 0,$$
and $\varepsilon_{is} \overset{iid}{\sim} \mathrm{Logistic}(0, 1)$.
19 / 36

20 Binary support vector machine (SVM):
$$\ell(\beta, \nu) = \sum_{i=1}^{n} \max(1 - y_i x_i'\beta,\, 0) + \nu\, r(\beta), \qquad y_i \in \{-1, 1\}.$$
20 / 36

21 Bayesian binary SVM [Polson & Scott (2011)]. Mixture representation:
$$L(y_i \mid x_i, \beta) = \exp\{-2\max(1 - y_i x_i'\beta, 0)\} = \int_0^{\infty} \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left(-\frac{(1 + \lambda_i - y_i x_i'\beta)^2}{2\lambda_i}\right) d\lambda_i.$$
Gibbs sampling for $\beta$ is available under this data augmentation.
Decision rule [Sollich (2002) and Mallick et al. (2005)]:
$$P(y_i \mid x_i, \beta) = \begin{cases} \big(1 + e^{-2 y_i x_i'\beta}\big)^{-1}, & \text{for } |x_i'\beta| \le 1;\\ \big(1 + e^{-y_i[x_i'\beta + \mathrm{sign}(x_i'\beta)]}\big)^{-1}, & \text{for } |x_i'\beta| > 1. \end{cases}$$
21 / 36

22 Under the permuted and augmented stick-breaking construction, given the covariate vector $x_i$ and the category-stick mapping $z$, the support vector machine (SVM) parameterizes $p_{is}$, the probability of category $s$, as
$$p_{is}(z) = \big[\pi_{\mathrm{SVM}}(x_i, \beta_s)\big]^{1(z_s<S)} \prod_{j: z_j < z_s} \big[1 - \pi_{\mathrm{SVM}}(x_i, \beta_j)\big],$$
where
$$\pi_{\mathrm{SVM}}(x_i, \beta_j) = \begin{cases} \big(1 + e^{-2 x_i'\beta_j}\big)^{-1}, & \text{for } |x_i'\beta_j| \le 1;\\ \big(1 + e^{-x_i'\beta_j - \mathrm{sign}(x_i'\beta_j)}\big)^{-1}, & \text{for } |x_i'\beta_j| > 1. \end{cases}$$
22 / 36
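
A small sketch of this piecewise stick probability (illustrative names, not from the talk), including a check that the two branches meet at $|x_i'\beta| = 1$.

```python
import numpy as np

def pi_svm(score):
    """SVM-based stick success probability as a function of the margin score x'beta:
    1/(1 + exp(-2*score)) when |score| <= 1, and
    1/(1 + exp(-(score + sign(score)))) when |score| > 1."""
    score = np.asarray(score, dtype=float)
    inner = 1.0 / (1.0 + np.exp(-2.0 * score))
    outer = 1.0 / (1.0 + np.exp(-(score + np.sign(score))))
    return np.where(np.abs(score) <= 1.0, inner, outer)

# the two branches agree at the boundary score = +/- 1
print(pi_svm([1.0 - 1e-9, 1.0 + 1e-9]))   # both approximately 1/(1 + e^{-2})
print(pi_svm([-3.0, 0.0, 3.0]))
```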

23 Binary softplus regression:
$$b_{is} \sim \mathrm{Bernoulli}\left[1 - \prod_{k=1}^{K} \Big(1 + e^{x_i'\beta^{(T+1)}_{sk}} \ln\Big\{1 + e^{x_i'\beta^{(T)}_{sk}} \ln\Big[\cdots \ln\big(1 + e^{x_i'\beta^{(2)}_{sk}}\big)\Big]\Big\}\Big)^{-r_{sk}}\right].$$
Equivalently [Zhou (2016)], it has the augmented representation
$$b_{is} = 1(m_{is} \ge 1), \qquad m_{is} = \sum_{k=1}^{K} m^{(1)}_{isk}, \qquad m^{(1)}_{isk} \sim \mathrm{Pois}\big(\theta^{(1)}_{isk}\big),$$
$$\theta^{(1)}_{isk} \sim \mathrm{Gamma}\big(\theta^{(2)}_{isk},\, e^{x_i'\beta^{(2)}_{sk}}\big), \quad \ldots, \quad \theta^{(t)}_{isk} \sim \mathrm{Gamma}\big(\theta^{(t+1)}_{isk},\, e^{x_i'\beta^{(t+1)}_{sk}}\big), \quad \ldots, \quad \theta^{(T)}_{isk} \sim \mathrm{Gamma}\big(r_{sk},\, e^{x_i'\beta^{(T+1)}_{sk}}\big).$$
K is supported by the gamma process.
23 / 36
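
Under the marginal form above, the success probability of one stick can be evaluated by nesting the log terms from the inside out; the sketch below (illustrative, finite K and T, not overflow-safe, and based on my reading of the reconstructed formula above) is one way to compute it.

```python
import numpy as np

def softplus_stick_prob(x, betas, r):
    """P(b = 1 | x) = 1 - prod_k (1 + e^{x'b_k^(T+1)} ln{1 + e^{x'b_k^(T)} ln[... ln(1 + e^{x'b_k^(2)})]})^{-r_k}.

    x     : (P+1,) covariate vector (intercept included)
    betas : length-K list; betas[k] is a (T, P+1) array stacking beta_k^{(2)}, ..., beta_k^{(T+1)}
    r     : (K,) positive weights r_k
    """
    log_fail = 0.0
    for beta_k, r_k in zip(betas, r):
        inner = np.log1p(np.exp(x @ beta_k[0]))               # ln(1 + e^{x'beta^{(2)}})
        for beta_t in beta_k[1:]:                             # beta^{(3)}, ..., beta^{(T+1)}
            inner = np.log1p(np.exp(x @ beta_t) * inner)      # ln(1 + e^{x'beta^{(t)}} * inner)
        log_fail -= r_k * inner                               # log of (1 + ...)^{-r_k}
    return 1.0 - np.exp(log_fail)

rng = np.random.default_rng(2)
x = np.array([1.0, 0.5, -0.3])                                # intercept plus two covariates
betas = [rng.normal(size=(3, 3)) for _ in range(2)]           # K = 2 experts, T = 3 layers
print(softplus_stick_prob(x, betas, r=np.array([1.0, 0.5])))
```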

24 Properties of binary softplus regression:
- K > 1 and T = 1: using the interaction of up to K hyperplanes to enclose negative examples.
- K = 1 and T > 1: using the interaction of up to T hyperplanes to enclose positive examples.
- K > 1 and T > 1: using the union of convex-polytope-like confined spaces to enclose positive examples.
- K and T together control the nonlinear capacity of the model; K is supported by the gamma process.
24 / 36

25 Binary softplus regression, K = 20, T = 1. Label Class A as 1 and Class B as 0 (figure). 25 / 36

26 Binary softplus regression, K = 1, T = 20. Label Class A as 0 and Class B as 1 (figure). 26 / 36

27 Binary softplus regression, K = 20, T = 20. Label Class A as 1 and Class B as 0 (figure). 27 / 36

28 Multinomial softplus regression (MSR). With a draw from a gamma process for each category $s$ that consists of countably infinite atoms $\beta^{(2:T+1)}_{sk}$ with weights $r_{sk} > 0$, where $\beta^{(t)}_{sk} \in \mathbb{R}^{P+1}$, given the covariate vector $x_i$ and the category-stick mapping $z$, MSR parameterizes $p_{is}$, the probability of category $s$, under the permuted and augmented stick-breaking construction as
$$p_{is}(z) = \left[1 - \prod_{k=1}^{\infty} \Big(1 + e^{x_i'\beta^{(T+1)}_{sk}} \ln\Big\{1 + e^{x_i'\beta^{(T)}_{sk}} \ln\Big[\cdots \ln\big(1 + e^{x_i'\beta^{(2)}_{sk}}\big)\Big]\Big\}\Big)^{-r_{sk}}\right]^{1(z_s<S)} \prod_{j: z_j < z_s}\, \prod_{k=1}^{\infty} \Big(1 + e^{x_i'\beta^{(T+1)}_{jk}} \ln\Big\{1 + e^{x_i'\beta^{(T)}_{jk}} \ln\Big[\cdots \ln\big(1 + e^{x_i'\beta^{(2)}_{jk}}\big)\Big]\Big\}\Big)^{-r_{jk}}.$$
28 / 36

29 Label blue, black, and gray (middle, bottom, and top) points as classes 1, 2, and 3, respectively. Fix z = [1, 2, 3]. Row 1: K = 1, T = 1. Row 2: K = 1, T = 3. Row 3: K = 5, T = 1. Row 4: K = 5, T = 3. 29 / 36

30 Label blue, black, and gray (inside left, outside, and inside right) points as classes 1, 2, and 3, respectively, with K = T = 10. Row 1: z = [1, 2, 3]. Row 2: z = [2, 1, 3]. Row 3: z = [3, 1, 2]. Row 4: z sampled during the MCMC iterations. 30 / 36

31 Model comparison. Table: comparison of classification error rates of the stick-breaking SVM, MSR with different K and T (K = 1, T = 1; K = 1, T = 3; K = 5, T = 1; K = 5, T = 3), L2-MLR, and AMM on the square, iris, wine, glass, vehicle, waveform, segment, vowel, dna, and satimage datasets. 31 / 36

32 Number of active experts. Figure: boxplots of the number of active experts for each dataset (square, iris, wine, glass, vehicle, waveform, segment, vowel, dna, satimage). 32 / 36

33 Likelihoods for the $S!$ different permutations. Figure: histogram of the $S! = 6! = 720$ log-likelihoods for Satimage, using augmented stick-breaking MSR with K = 5 and T = 3. The blue line indicates the average log-likelihood of the collected MCMC samples of MSR with the permutation z sampled via the proposed Metropolis-Hastings step. 33 / 36

34 Softplus regression with support hyperplanes. Figure: Row 1: softplus regression with K = 5, T = 3. Row 2: softplus regression with K = 5, T = 3 and data transformation of support hyperplanes. 34 / 36

35 Discussions.
- A general framework to transform a binary classifier into a multi-class one.
- Fully Bayesian inference via data augmentation; the coefficient vectors of different categories can be sampled in parallel in each MCMC iteration.
- Not invariant to label permutation if the label-stick mapping is fixed: asymmetric geometric constraints (more constraints for a category mapped to a larger-indexed stick).
35 / 36

36 Thank You! 36 / 36
