39th Annual ISMS Marketing Science Conference, University of Southern California, June 8, 2017
1 Permuted and Augmented Stick-Breaking Bayesian Multinomial Regression. Mingyuan Zhou, IROM Department, McCombs School of Business, The University of Texas at Austin. 39th Annual ISMS Marketing Science Conference, University of Southern California, June 8, 2017.
2 Joint work with Quan Zhang, McCombs PhD student in Risk Analysis and Decision Making.
3 Outline: 1. Overview. 2. Stick-breaking multinomial logistic regression; permuted and augmented (paSB) multinomial logistic regression. 3. Binary SVM and multinomial SVM. 4. Binary softplus regression and multinomial softplus regression (MSR). 5. Experiments and discussion.
4 Stick-breaking construction. Provides a size-biased random permutation of the random draw from a Dirichlet process (Sethuraman 1994). Stick success probabilities can depend on covariates (Dunson and Park 2008, Chung and Dunson 2009, Ren et al. 2011). Can be used to model the dependencies between stick probabilities by a logit-Gaussian distribution/process (Linderman et al. 2015).
5 Stick-breaking for the multinomial distribution. Drawing $y_i \sim \sum_{s=1}^S p_{is}\delta_s$ is equivalent to drawing a sequence of binary random variables as
$$b_{is} \mid \{b_{ij}\}_{j<s} \sim \text{Bernoulli}\Big[\Big(1-\sum_{j<s} b_{ij}\Big)\pi_{is}\Big], \qquad \pi_{is} = \frac{p_{is}}{1-\sum_{j<s} p_{ij}}, \quad s = 1, 2, \ldots, S.$$
If we define $y_i = s$ if and only if $b_{is} = 1$ and $b_{ij} = 0$ for all $j < s$, then we have
$$p_{is} = P(y_i = s \mid \{\pi_{ij}\}) = P(b_{is}=1)\prod_{j<s} P(b_{ij}=0) = (\pi_{is})^{1(s<S)} \prod_{j<s}(1-\pi_{ij}).$$
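To make the equivalence concrete, here is a minimal Python sketch (numpy only, with hypothetical stick probabilities) that draws $y_i$ by breaking sticks left to right and checks the empirical frequencies against the closed-form $p_{is}$:

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_probs(pi):
    """Closed-form category probabilities:
    p_s = pi_s^{1(s<S)} * prod_{j<s} (1 - pi_j)  (0-indexed)."""
    S = len(pi)
    p, remaining = np.empty(S), 1.0
    for s in range(S):
        p[s] = remaining * (pi[s] if s < S - 1 else 1.0)
        remaining *= 1.0 - pi[s]
    return p

def draw_y_sequential(pi):
    """Break sticks left to right; the first success determines y,
    and the last category absorbs the case of no earlier success."""
    for s in range(len(pi) - 1):
        if rng.random() < pi[s]:
            return s
    return len(pi) - 1

pi = np.array([0.3, 0.5, 0.2, 0.9])   # hypothetical stick probabilities
emp = np.bincount([draw_y_sequential(pi) for _ in range(100_000)],
                  minlength=len(pi)) / 100_000
print(np.round(stick_probs(pi), 3), np.round(emp, 3))  # should agree closely
```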
6 Theorem 1. Suppose $y_i \sim \sum_{s=1}^S p_{is}\delta_s$, where $[p_{i1}, \ldots, p_{iS}]$ is a probability vector whose elements are constructed as $p_{is} = (\pi_{is})^{1(s<S)}\prod_{j<s}(1-\pi_{ij})$; then $y_i$ can be equivalently generated under augmented stick-breaking (aSB) as
$$y_i \sim \sum_{s=1}^S \big[1(b_{is}=1)\big]^{1(s<S)}\Big[\prod_{j<s} 1(b_{ij}=0)\Big]\delta_s, \qquad b_{is} \sim \text{Bernoulli}(\pi_{is}), \; s \in \{1, \ldots, S\}.$$
7 Augmented stick-breaking transforms the problem of multinomial classification with S categories into the problem of S conditionally independent binary classifications. Gibbs sampling:
Sample $b_{is}$ for $s \in \{1, \ldots, S\}$: $b_{is} = 0$ if $s < y_i$; $b_{is} = 1$ if $s = y_i$; $b_{is} \sim \text{Bernoulli}(\pi_{is})$ if $s > y_i$.
Solve $b_{is} \sim \text{Bernoulli}(\pi_{is})$ for $s \in \{1, \ldots, S\}$, where the covariate-dependent stick probability $\pi_{is}$ for the sth stick/category is a deterministic function of $x_i'\beta_s$.
Any binary classification model (with cross-entropy loss) can be generalized to a multinomial one under augmented stick-breaking, but a naive combination may not work well.
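As a sketch of the first Gibbs step above (assuming 0-indexed categories and a generic vector of stick probabilities), the conditional draw of the binary stick variables given $y_i$ is:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_b_given_y(y, pi):
    """Conditional posterior of b given y (0-indexed): zeros before
    stick y, a one at stick y, fresh Bernoulli(pi_s) draws after it."""
    S = len(pi)
    b = np.zeros(S, dtype=int)
    b[y] = 1
    b[y + 1:] = rng.random(S - y - 1) < pi[y + 1:]
    return b
```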
8 Example: augmented stick-breaking logistic regression. If we let $\pi_{is} = \frac{e^{x_i'\beta_s}}{1+e^{x_i'\beta_s}}$, which means $\text{logit}(\pi_{is}) = x_i'\beta_s$, then we have
$$p_{is} = \frac{e^{x_i'\beta_s}}{\prod_{j \le s}\big(1+e^{x_i'\beta_j}\big)}.$$
Stick-breaking multinomial logistic regression:
$$y_i \sim \sum_{s=1}^S \big[1(b_{is}=1)\big]^{1(s<S)}\Big[\prod_{j<s} 1(b_{ij}=0)\Big]\delta_s, \qquad b_{is} \sim \text{Bernoulli}\Big(\pi_{is} = \frac{e^{x_i'\beta_s}}{1+e^{x_i'\beta_s}}\Big), \; s \in \{1, \ldots, S\}.$$
9 Gibbs sampling: Sample $b_{is}$ for $s \in \{1, \ldots, S\}$: $b_{is} = 0$ if $s < y_i$; $b_{is} = 1$ if $s = y_i$; $b_{is} \sim \text{Bernoulli}(\pi_{is})$ if $s > y_i$. Solve $b_{is} \sim \text{Bernoulli}\big(\pi_{is} = e^{x_i'\beta_s}/(1+e^{x_i'\beta_s})\big)$: $\beta_s$ can be inferred with the Polya-Gamma data augmentation, yielding closed-form Gibbs sampling update equations. Problem solved? End of the talk?? Not really...
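A minimal sketch of the $\beta_s$ update, assuming the third-party `polyagamma` package (its `random_polyagamma` sampler) and a zero-mean Gaussian prior with precision `prior_prec`; this is one way to realize the Polya-Gamma step, not necessarily the authors' exact implementation:

```python
import numpy as np
from polyagamma import random_polyagamma  # assumed PG(1, c) sampler (PyPI: polyagamma)

rng = np.random.default_rng(2)

def update_beta_s(X, b_s, beta_s, prior_prec=1.0):
    """One Polya-Gamma Gibbs update for a single stick's coefficients:
    omega_i ~ PG(1, x_i' beta_s), then beta_s | omega is Gaussian."""
    omega = random_polyagamma(z=X @ beta_s, random_state=rng)
    kappa = b_s - 0.5                          # b_s in {0, 1}
    V = np.linalg.inv(X.T @ (omega[:, None] * X)
                      + prior_prec * np.eye(X.shape[1]))
    m = V @ (X.T @ kappa)                      # zero prior mean assumed
    return rng.multivariate_normal(m, V)
```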
10 The number of geometric constraints increases in s.
$p_{i1} = \big(1+e^{-x_i'\beta_1}\big)^{-1}$ is larger than 0.5 if $x_i'\beta_1 > 0$.
$p_{i2} = \big(1+e^{x_i'\beta_1}\big)^{-1}\big(1+e^{-x_i'\beta_2}\big)^{-1}$ can be larger than 0.5 only if both $x_i'\beta_1 < 0$ and $x_i'\beta_2 > 0$.
$p_{i3} = \big(1+e^{x_i'\beta_1}\big)^{-1}\big(1+e^{x_i'\beta_2}\big)^{-1}\big(1+e^{-x_i'\beta_3}\big)^{-1}$ can be larger than 0.5 only if $x_i'\beta_1 < 0$, $x_i'\beta_2 < 0$, and $x_i'\beta_3 > 0$.
11 Stick-breaking multinomial logistic regression is not invariant to label permutation, and may or may not work depending on how the categories are labeled (ordered). For the Iris data (sepal and petal lengths as covariates), it does not work well if the 1st category/stick is the blue points (middle), the 2nd is the black points (bottom), and the 3rd is the gray points (top). It works well as long as the blue points are not labeled as the first category (stick).
12 If some categories are not linearly separable, augmented stick-breaking multinomial logistic regression may not work well no matter how the categories are labeled. How to address the sensitivity to label permutation? How to separate the categories that are not linearly separable?
13 Permuted and augmented (paSB) stick-breaking. Denote $z = (z_1, \ldots, z_S)$ as a permutation of $(1, \ldots, S)$: $z_s \in \{1, \ldots, S\}$ is the index of the latent stick that category s is uniquely mapped to. There are S! possible permutations: 6! = 720; 7! = 5,040; ...; 10! = 3,628,800; ... Fortunately, the effective search space can be significantly smaller than S!. We will discuss why and provide examples.
14 Theorem 2. Suppose $y_i \sim \sum_{s=1}^S p_{is}(z)\delta_s$, where $[p_{i1}(z), \ldots, p_{iS}(z)]$ is a probability vector whose elements are constructed as
$$p_{is}(z) = (\pi_{i z_s})^{1(z_s<S)} \prod_{j<z_s}(1-\pi_{ij});$$
then $y_i$ can be equivalently generated under the permuted and augmented stick-breaking (paSB) construction as
$$y_i \sim \sum_{s=1}^S \big[1(b_{i z_s}=1)\big]^{1(z_s<S)}\Big[\prod_{j<z_s} 1(b_{ij}=0)\Big]\delta_s, \qquad b_{ij} \sim \text{Bernoulli}(\pi_{ij}), \; j \in \{1, \ldots, S\}.$$
S categories are randomly one-to-one mapped to S sticks. $\{b_{is}\}_s$ given $\{\pi_{is}\}_s$ are mutually independent in the prior.
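A minimal numpy sketch of the probability map in Theorem 2 (0-indexed sticks and categories, hypothetical inputs): compute the probability that each stick "wins" the stick-breaking race, then read off categories through the permutation z:

```python
import numpy as np

def category_probs(pi, z):
    """p_s(z) = pi_{z_s}^{1(z_s < S)} * prod_{j < z_s} (1 - pi_j),
    with pi indexed by stick and z mapping category -> stick (0-indexed)."""
    S = len(pi)
    cum = np.cumprod(np.append(1.0, 1.0 - pi[:-1]))  # prod_{j<m} (1 - pi_j)
    head = np.where(np.arange(S) < S - 1, pi, 1.0)   # pi_m, or 1 for the last stick
    p_stick = head * cum                             # probability that stick m wins
    return p_stick[np.asarray(z)]

pi = np.array([0.4, 0.6, 0.5])        # hypothetical stick probabilities
print(category_probs(pi, [2, 0, 1]))  # category 0 -> stick 2, etc.; sums to 1
```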
15 Gibbs sampling for paSB: Sample $b_{is}$ for $s \in \{1, \ldots, S\}$, where category $y_i$ is mapped to stick $z_{y_i}$: $b_{is} = 0$ if $s < z_{y_i}$; $b_{is} = 1$ if $s = z_{y_i}$; $b_{is} \sim \text{Bernoulli}(\pi_{is})$ if $s > z_{y_i}$. Solve $b_{is} \sim \text{Bernoulli}(\pi_{is})$ for $s \in \{1, \ldots, S\}$, where the covariate-dependent stick probability $\pi_{is}$ for the sth stick is a deterministic function of $x_i'\beta_s$. Sample $(z_1, \ldots, z_S)$, the one-to-one mapping between the category and stick indices, using Metropolis-Hastings.
16 Sample the label-stick one-to-one mapping: Let $(z_1, \ldots, z_S)$ be uniformly at random selected from the S! possible permutations in the prior. Propose to change $z = (z_1, \ldots, z_j, \ldots, z_{j'}, \ldots, z_S)$ to $z' := (z_1, \ldots, z_{j'}, \ldots, z_j, \ldots, z_S)$, i.e., swap two entries. Accept the proposal with probability
$$\min\left\{ \prod_i \frac{\prod_{s=1}^S [p_{is}(z')]^{1(y_i=s)}}{\prod_{s=1}^S [p_{is}(z)]^{1(y_i=s)}},\; 1 \right\} = \min\left\{ \prod_i \frac{\prod_{s=1}^S \big[(\pi_{i z'_s})^{1(z'_s<S)}\prod_{j<z'_s}(1-\pi_{ij})\big]^{1(y_i=s)}}{\prod_{s=1}^S \big[(\pi_{i z_s})^{1(z_s<S)}\prod_{j<z_s}(1-\pi_{ij})\big]^{1(y_i=s)}},\; 1 \right\}.$$
Proposing two indices $z_j$ and $z_{j'}$ to switch in each iteration is effective for escaping from the set of poor mappings. The probability that a given $z_j$ has not been proposed to switch is $[(S-2)/S]^t$ after t MCMC iterations; even if S = 100, this probability falls below $10^{-8}$ after roughly 1,000 iterations. S/2 is the expected number of iterations for a $z_j$ to be proposed to switch.
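A minimal sketch of this Metropolis-Hastings swap (uniform prior over permutations and a symmetric proposal, so only the likelihood ratio matters; `Pi[i, j]` holds the stick probabilities $\pi_{ij}$, everything 0-indexed):

```python
import numpy as np

rng = np.random.default_rng(3)

def log_lik(z, y, Pi):
    """Log-likelihood of labels y under category-to-stick mapping z."""
    n, S = Pi.shape
    ll = 0.0
    for i in range(n):
        zs = z[y[i]]                          # stick of observation i's category
        ll += np.sum(np.log1p(-Pi[i, :zs]))   # earlier sticks must fail
        if zs < S - 1:                        # the last stick wins by default
            ll += np.log(Pi[i, zs])
    return ll

def mh_swap(z, y, Pi):
    """Propose swapping two randomly chosen entries of z; accept with
    probability min(likelihood ratio, 1)."""
    j, k = rng.choice(len(z), size=2, replace=False)
    z_new = z.copy()
    z_new[j], z_new[k] = z_new[k], z_new[j]
    if np.log(rng.random()) < log_lik(z_new, y, Pi) - log_lik(z, y, Pi):
        return z_new
    return z
```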
17 paSB multinomial logistic regression model:
$$y_i \sim \sum_{s=1}^S \big[1(b_{i z_s}=1)\big]^{1(z_s<S)}\Big[\prod_{j<z_s} 1(b_{ij}=0)\Big]\delta_s, \qquad b_{is} \sim \text{Bernoulli}\Big(\pi_{is} = \frac{e^{x_i'\beta_s}}{1+e^{x_i'\beta_s}}\Big), \; s \in \{1, \ldots, S\}.$$
The number of geometric constraints increases in $z_s$. If $z_s = 1$, then $p_{is} = \big(1+e^{-x_i'\beta_1}\big)^{-1}$ is larger than 0.5 if $x_i'\beta_1 > 0$. If $z_s = 2$, then $p_{is} = \big(1+e^{x_i'\beta_1}\big)^{-1}\big(1+e^{-x_i'\beta_2}\big)^{-1}$ can be larger than 0.5 only if both $x_i'\beta_1 < 0$ and $x_i'\beta_2 > 0$. If $z_s = 3$, then $p_{is} = \big(1+e^{x_i'\beta_1}\big)^{-1}\big(1+e^{x_i'\beta_2}\big)^{-1}\big(1+e^{-x_i'\beta_3}\big)^{-1}$ can be larger than 0.5 only if $x_i'\beta_1 < 0$, $x_i'\beta_2 < 0$, and $x_i'\beta_3 > 0$.
18 Sequential decision making and relaxing the independence of irrelevant alternatives (IIA) assumption: a one-vs-remaining decision is made at each of the stick-breaking steps. Lemma: Under the paSB construction, the probability ratio of two choices is influenced by the success probabilities of the sticks that lie between these two choices' corresponding sticks. In other words, the probability ratio of two choices will be influenced by some other choices if they are not mapped to adjacent sticks.
19 Stick-breaking multinomial logistic regression as a discrete choice model. The stick-breaking multinomial logistic regression that assigns choice $s \in \{1, \ldots, S\}$ to individual i with probability $p_{is} = (\pi_{is})^{1(s<S)}\prod_{j<s}(1-\pi_{ij})$, $\pi_{is} = 1/(1+e^{-W_{is}})$, is equivalent to a sequential random utility maximization model which selects choice s once $U_{is} > \sum_{j>s} U_{ij}$ is observed, where
$$U_{i1} = U_{i2} + \cdots + U_{iS} + W_{i1} + \varepsilon_{i1}, \;\ldots,\; U_{is} = \sum_{j>s} U_{ij} + W_{is} + \varepsilon_{is}, \;\ldots,\; U_{i(S-1)} = W_{i(S-1)} + \varepsilon_{i(S-1)}, \; U_{iS} = 0,$$
and $\varepsilon_{is} \overset{iid}{\sim} \text{Logistic}(0, 1)$.
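A quick simulation check of this equivalence, with hypothetical utilities $W_{i1}, \ldots, W_{i(S-1)}$: since $U_{is} - \sum_{j>s} U_{ij} = W_{is} + \varepsilon_{is}$, the sequential rule reduces to the stick-breaking draws above:

```python
import numpy as np

rng = np.random.default_rng(4)

def choice_by_utility(W):
    """Pick the first s with U_s > sum_{j>s} U_j, which simplifies to
    the first s with W_s + eps_s > 0; category S wins if none does."""
    eps = rng.logistic(size=len(W))
    for s in range(len(W)):
        if W[s] + eps[s] > 0:
            return s
    return len(W)                            # U_S = 0 catches the remainder

W = np.array([0.2, -1.0, 0.7])               # hypothetical W_{i1}..W_{i,S-1}
pi = 1 / (1 + np.exp(-W))
p = np.append(pi, 1.0) * np.cumprod(np.append(1.0, 1 - pi))
emp = np.bincount([choice_by_utility(W) for _ in range(100_000)],
                  minlength=len(W) + 1) / 100_000
print(np.round(p, 3), np.round(emp, 3))      # stick-breaking vs. utility draws
```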
20 Binary support vector machine (SVM):
$$\ell(\beta, \nu) = \sum_{i=1}^n \max(1 - y_i x_i'\beta,\, 0) + \nu R(\beta), \qquad y_i \in \{-1, 1\}.$$
21 Bayesian binary SVM [Polson & Scott (2011)]. Mixture representation:
$$L(y_i \mid x_i, \beta) = \exp\{-2\max(1 - y_i x_i'\beta,\, 0)\} = \int_0^\infty \frac{1}{\sqrt{2\pi\lambda_i}} \exp\Big(-\frac{(1 + \lambda_i - y_i x_i'\beta)^2}{2\lambda_i}\Big)\, d\lambda_i.$$
Gibbs sampling for $\beta$ is available under data augmentation. Decision rule [Sollich (2002) and Mallick et al. (2005)]:
$$P(y_i \mid x_i, \beta) = \begin{cases} \big(1 + e^{-2 y_i x_i'\beta}\big)^{-1}, & \text{for } |x_i'\beta| \le 1; \\ \big(1 + e^{-y_i [x_i'\beta + \text{sign}(x_i'\beta)]}\big)^{-1}, & \text{for } |x_i'\beta| > 1. \end{cases}$$
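A small helper implementing this decision rule as a stick success probability (a sketch; `f` stands for the SVM score $x_i'\beta$):

```python
import numpy as np

def pi_svm(f):
    """P(y = 1 | f) implied by the hinge pseudo-likelihood: logistic in
    2f inside the margin, logistic in f + sign(f) outside it."""
    f = np.asarray(f, dtype=float)
    g = np.where(np.abs(f) <= 1.0, 2.0 * f, f + np.sign(f))
    return 1.0 / (1.0 + np.exp(-g))
```

Plugging `1 - pi_svm(...)` into the paSB product on the next slide gives the multinomial SVM probabilities.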
22 Under the paSB construction, given the covariate vector $x_i$ and category-stick mapping z, the multinomial support vector machine (SVM) parameterizes $p_{is}$, the probability of category s, as
$$p_{is}(z) = [\pi_{i z_s, \text{SVM}}(x_i, \beta_s)]^{1(z_s<S)} \prod_{j:\, z_j < z_s} \big[1 - \pi_{i z_j, \text{SVM}}(x_i, \beta_j)\big],$$
where
$$1 - \pi_{i z_j, \text{SVM}}(x_i, \beta_j) = \begin{cases} \big(1 + e^{2 x_i'\beta_j}\big)^{-1}, & \text{for } |x_i'\beta_j| \le 1; \\ \big(1 + e^{x_i'\beta_j + \text{sign}(x_i'\beta_j)}\big)^{-1}, & \text{for } |x_i'\beta_j| > 1. \end{cases}$$
23 Binary softplus regression [Zhou (2016)]:
$$b_{is} \sim \text{Bernoulli}\left(1 - \prod_{k=1}^K \Big[1 + e^{x_i'\beta_{sk}^{(T+1)}} \ln\Big\{1 + e^{x_i'\beta_{sk}^{(T)}} \ln\Big(\cdots \ln\big[1 + e^{x_i'\beta_{sk}^{(2)}}\big] \cdots\Big)\Big\}\Big]^{-r_{sk}}\right).$$
Equivalently, in augmented form:
$$\theta_{isk}^{(T)} \sim \text{Gamma}\big(r_{sk},\, e^{x_i'\beta_{sk}^{(T+1)}}\big), \quad \theta_{isk}^{(t)} \sim \text{Gamma}\big(\theta_{isk}^{(t+1)},\, e^{x_i'\beta_{sk}^{(t+1)}}\big), \;\ldots,\; \theta_{isk}^{(1)} \sim \text{Gamma}\big(\theta_{isk}^{(2)},\, e^{x_i'\beta_{sk}^{(2)}}\big),$$
$$b_{is} = 1(m_{is} \ge 1), \qquad m_{is} = \sum_{k=1}^K m_{isk}^{(1)}, \qquad m_{isk}^{(1)} \sim \text{Pois}\big(\theta_{isk}^{(1)}\big).$$
K is supported by the gamma process.
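A numpy sketch of the marginal Bernoulli probability with a finite truncation K (hypothetical shapes: `Beta[k, t]` stores $\beta_k^{(t+2)}$ for t = 0..T-1, and `r[k] > 0`); the nested logs are evaluated from the innermost $\beta^{(2)}$ outward:

```python
import numpy as np

def p_sum_stack_softplus(X, Beta, r):
    """P(b_i = 1) = 1 - prod_k [1 + e^{x'b^{(T+1)}} ln(1 + e^{x'b^{(T)}}
    ln(... ln(1 + e^{x'b^{(2)}}) ...))]^{-r_k}, truncated at K experts."""
    K, T, _ = Beta.shape
    log_p0 = np.zeros(X.shape[0])              # accumulates log P(m_i = 0)
    for k in range(K):
        u = np.ones(X.shape[0])
        for t in range(T - 1):                 # inner layers beta^{(2)}..beta^{(T)}
            u = np.log1p(np.exp(X @ Beta[k, t]) * u)
        log_p0 -= r[k] * np.log1p(np.exp(X @ Beta[k, T - 1]) * u)
    return 1.0 - np.exp(log_p0)
```

For T = 1 this reduces to the sum-softplus form $1 - \prod_k \big(1 + e^{x_i'\beta_k^{(2)}}\big)^{-r_k}$.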
24 Properties of binary softplus regression. K > 1 and T = 1: using the intersection of up to K hyperplanes to enclose negative examples. K = 1 and T > 1: using the intersection of up to T hyperplanes to enclose positive examples. K > 1 and T > 1: using the union of convex-polytope-like confined spaces to enclose positive examples. K and T together control the nonlinear capacity of the model; K is supported by the gamma process.
25 Binary softplus regression with K = 20, T = 1; label Class A as 1 and Class B as 0. [Figure: decision boundaries]
26 Binary softplus regression with K = 1, T = 20; label Class A as 0 and Class B as 1. [Figure: decision boundaries]
27 Binary softplus regression with K = 20, T = 20; label Class A as 1 and Class B as 0. [Figure: decision boundaries]
28 Multinomial softplus regression (MSR). With a draw from a gamma process for each category that consists of countably infinite atoms $\beta_{sk}^{(2:T+1)}$ with weights $r_{sk} > 0$, where $\beta_{sk}^{(t)} \in \mathbb{R}^{P+1}$, given the covariate vector $x_i$ and category-stick mapping z, MSR parameterizes $p_{is}$, the probability of category s, under the paSB construction as
$$p_{is}(z) = \left[1 - \prod_{k=1}^\infty \Big(1 + e^{x_i'\beta_{sk}^{(T+1)}} \ln\Big\{1 + e^{x_i'\beta_{sk}^{(T)}} \ln\Big(\cdots \ln\big[1 + e^{x_i'\beta_{sk}^{(2)}}\big] \cdots\Big)\Big\}\Big)^{-r_{sk}}\right]^{1(z_s<S)} \prod_{j:\, z_j < z_s} \prod_{k=1}^\infty \Big(1 + e^{x_i'\beta_{jk}^{(T+1)}} \ln\Big\{1 + e^{x_i'\beta_{jk}^{(T)}} \ln\Big(\cdots \ln\big[1 + e^{x_i'\beta_{jk}^{(2)}}\big] \cdots\Big)\Big\}\Big)^{-r_{jk}}.$$
29 Permuted and augmented stick-breaking MSR. Label blue, black, and gray (middle, bottom, and top) points as classes 1, 2, and 3, respectively; fix z = [1, 2, 3]. [Figure rows: Row 1: K = 1, T = 1. Row 2: K = 1, T = 3. Row 3: K = 5, T = 1. Row 4: K = 5, T = 3.]
30 Label blue, black, and gray (inside left, outside, and inside right) points as classes 1, 2, and 3, respectively; MSR with K = T = 10. [Figure rows: Row 1: z = [1, 2, 3]. Row 2: z = [2, 1, 3]. Row 3: z = [3, 1, 2]. Row 4: z sampled during MCMC iterations.]
31 Model comparison. Table: comparison of classification error rates of paSB-SVM, MSR with (K, T) in {(1, 1), (1, 3), (5, 1), (5, 3)}, L2-MLR, and AMM on the square, iris, wine, glass, vehicle, waveform, segment, vowel, dna, and satimage datasets. [The numeric error rates were lost in the transcription.]
32 Number of active experts. Figure: boxplots of the number of active experts on square, iris, wine, glass, vehicle, waveform, segment, vowel, dna, and satimage.
33 Likelihoods for the S! different permutations. Figure: histogram of the S! = 6! = 720 log-likelihoods for Satimage, using permuted and augmented stick-breaking MSR with K = 5 and T = 3. The blue line indicates the average log-likelihood of the collected MCMC samples of MSR, with the permutation z sampled via the proposed Metropolis-Hastings step.
34 Softplus regression with support hyperplanes. Figure: Row 1: softplus regression with K = 5, T = 3. Row 2: softplus regression with K = 5, T = 3 and data transformation with support hyperplanes.
35 Discussion. A general framework to transform a binary classifier into a multi-class one. Fully Bayesian inference via data augmentation. The coefficient vectors of different categories can be sampled in parallel in each MCMC iteration. Not invariant to label permutation if the label-stick mapping is fixed. Asymmetric geometric constraints (more constraints for a category mapped to a larger-indexed stick).
36 Thank You!
Published version: Quan Zhang (quan.zhang@mccombs.utexas.edu). Permuted and Augmented Stick-Breaking Bayesian Multinomial Regression. Journal of Machine Learning Research 18 (2018) 1-33. Submitted 7/17; Revised 12/17; Published 4/18.