Dynamic Generalized Linear Models
Jesse Windle

Oct. 24, 2012

Contents

1 Introduction
2 Binary Data (Static Case)
3 Data Augmentation (de-marginalization) by 4 examples
  3.1 Example 1: CDF method
  3.2 Example 2: Mixture of Normals
  3.3 Example 3: Discrete Mixture of Normals
  3.4 Example 4: Laplace Method
4 Aside: Expectation Maximization
5 Dynamic Generalized Linear Models
  5.1 The Old School Way
  5.2 Using data augmentation

1 Introduction

Useful when the response is categorical or count data. Used in ecology, finance, economics, political science, neuroscience, epidemiology.

Ralph Reed is data mining your Kindle [Becker, 2012]. Response: vote Republican.
Predictors: hunting license, read the Bible, has read Going Rogue by Sarah Palin, drive a pickup, married, income.

Sometimes such models evolve in time, e.g. spike train data in neuroscience.

Outline: posterior inference via data augmentation in the static case, then posterior inference for the dynamic case.

2 Binary Data (Static Case)

$y_i \in \{0,1\}$ is the response, $x_i$ a row vector of predictors, and $P(y_i = 1) = p_i$. A linear model $p_i = x_i \beta$ does not respect $p_i \in [0,1]$, so transform to some other coordinate system:

\[ \psi_i = g(p_i), \qquad p_i = \sigma(\psi_i), \]

where $\sigma$ is a sigmoidal function, and take $\psi_i = x_i \beta$: a linear model in the new coordinate system. A term in the likelihood:

\[ p_i^{y_i} (1 - p_i)^{1 - y_i} = \sigma(\psi_i)^{y_i} (1 - \sigma(\psi_i))^{1 - y_i}. \]

The full likelihood:

\[ \prod_{i=1}^n \sigma(\psi_i)^{y_i} (1 - \sigma(\psi_i))^{1 - y_i}. \]

Problem: we do not know how to sample the resulting posterior for $\beta$ directly. Possible solutions:

- Metropolis-Hastings (requires picking a proposal, tuning).
- Data augmentation + Gibbs sampling (don't even have to think... once you have the augmentation).
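To make the first option concrete: a minimal random-walk Metropolis sketch for the logistic model with a flat prior (the function names, prior, and step size are illustrative assumptions, not from the notes). Note how `step` has to be tuned by hand, which is exactly the annoyance mentioned above.

```python
import numpy as np

def logit_loglike(beta, y, X):
    """Bernoulli log-likelihood on the log-odds scale psi = X beta."""
    psi = X @ beta
    return np.sum(y * psi - np.logaddexp(0.0, psi))

def rw_metropolis(y, X, n_iter=4000, step=0.2, seed=0):
    """Random-walk Metropolis for the logistic posterior under a flat prior."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    beta = np.zeros(p)
    ll = logit_loglike(beta, y, X)
    draws = np.empty((n_iter, p))
    n_acc = 0
    for it in range(n_iter):
        prop = beta + step * rng.standard_normal(p)
        ll_prop = logit_loglike(prop, y, X)
        # symmetric proposal: accept with probability min(1, posterior ratio)
        if np.log(rng.random()) < ll_prop - ll:
            beta, ll = prop, ll_prop
            n_acc += 1
        draws[it] = beta
    return draws, n_acc / n_iter
```

An acceptance rate far from roughly 20-50% is the usual sign that `step` needs retuning.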
3 Data Augmentation (de-marginalization) by 4 examples

Idea: $p(\beta \mid y)$ is hard to simulate. Find a joint distribution so that

\[ p(\beta \mid y) = \int p(\beta, \omega \mid y)\, d\omega \]

and $p(\beta \mid \omega, y)$ and $p(\omega \mid \beta, y)$ are easy(er) to simulate. Gibbs sample.

3.1 Example 1: CDF method

Albert and Chib [1993]: take $\sigma(\psi_i) = \Phi(\psi_i)$, the Gaussian CDF. So

\[ P(y_i = 1) = p_i, \qquad p_i = \Phi(\psi_i), \qquad \psi_i = x_i \beta. \]

Data generating density:

\[ p(y_i \mid \psi_i) = \delta_1(y_i)\, \Phi(\psi_i) + \delta_0(y_i)\, (1 - \Phi(\psi_i)). \]

De-marginalize: $\Phi(\psi_i) = \int N(z_i; \psi_i, 1)\, 1_{(0,\infty)}(z_i)\, dz_i$, and $1 - \Phi(\psi_i)$ corresponds to $1 - 1_{(0,\infty)}(z_i) = 1_{(-\infty,0)}(z_i)$. So

\[ p(y_i \mid \psi_i) = \int \big[ \delta_1(y_i)\, 1_{(0,\infty)}(z_i) + \delta_0(y_i)\, 1_{(-\infty,0)}(z_i) \big]\, N(z_i; \psi_i, 1)\, dz_i. \]
Thus

\[ p(y_i, z_i \mid \psi_i) = p(y_i \mid z_i)\, p(z_i \mid \psi_i) = \big[ \delta_1(y_i)\, 1_{(0,\infty)}(z_i) + \delta_0(y_i)\, 1_{(-\infty,0)}(z_i) \big]\, N(z_i; \psi_i, 1). \]

Augmented model:

\[ y_i = 1\{z_i > 0\}, \qquad z_i = x_i \beta + \varepsilon_i, \quad \varepsilon_i \sim N(0, 1). \]

Gibbs sample:

- $p(\beta \mid z, y)$: normal, given a normal prior.
- $p(z \mid y, \beta) = \prod_i p(z_i \mid y_i, \beta)$, where $p(z_i \mid y_i, \beta)$ is truncated normal.

Check it out for yourself.

3.2 Example 2: Mixture of Normals

Holmes and Held [2006]. Goal: take an intractable distribution and represent it as a mixture of normals. Previously $\psi_i$ had no interpretation; in logistic regression it does.

Same: $P(y_i = 1) = p_i$.

Different:

\[ \psi_i = \log \frac{p_i}{1 - p_i} \quad \text{(log-odds scale)} \qquad \text{iff} \qquad p_i = \frac{e^{\psi_i}}{1 + e^{\psi_i}}. \]

Play the same game...

\[ y_i = 1\{z_i > 0\}, \qquad z_i = x_i \beta + \varepsilon_i, \quad \varepsilon_i \sim \mathrm{Lo}(0, 1). \]
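A minimal sketch of the resulting Albert-Chib sampler, assuming a $N(0, 100 I)$ prior on $\beta$ (my choice, not from the notes) and scipy's `truncnorm` for the truncated normal draws:

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(y, X, n_iter=500, seed=0):
    """Albert-Chib Gibbs sampler for probit regression, as sketched above."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    B0_inv = np.eye(p) / 100.0             # prior precision: beta ~ N(0, 100 I)
    V = np.linalg.inv(B0_inv + X.T @ X)    # posterior cov of beta | z (unit obs. variance)
    L = np.linalg.cholesky(V)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for it in range(n_iter):
        psi = X @ beta
        # z_i | y_i, beta: N(psi_i, 1) truncated to (0, inf) if y_i = 1, else (-inf, 0);
        # truncnorm takes standardized bounds
        lo = np.where(y == 1, -psi, -np.inf)
        hi = np.where(y == 1, np.inf, -psi)
        z = psi + truncnorm.rvs(lo, hi, size=n, random_state=rng)
        # beta | z, y: ordinary conjugate normal draw
        m = V @ (X.T @ z)
        beta = m + L @ rng.standard_normal(p)
        draws[it] = beta
    return draws
```

Because the $z$ update restores a Gaussian likelihood, the $\beta$ step is a plain normal draw; no tuning anywhere.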
Lo is the logistic distribution. What is $p(\beta \mid z, y)$ now?

Andrews and Mallows [1974]:

\[ \varepsilon \sim \mathrm{Lo}(0,1) \iff \varepsilon \mid \lambda \sim N(0, \xi), \quad \xi = 4\lambda^2, \quad \lambda \sim \mathrm{KS}, \]

where KS is the Kolmogorov-Smirnov distribution. That is,

\[ p(\varepsilon) = \int_0^\infty p(\varepsilon \mid \lambda)\, p(\lambda)\, d\lambda. \]

Augmented model:

\[ y_i = 1\{z_i > 0\}, \qquad z_i = x_i \beta + \varepsilon_i, \quad \varepsilon_i \sim N(0, \xi_i), \qquad \xi_i = 4\lambda_i^2, \quad \lambda_i \sim \mathrm{KS}. \]

Gibbs sample:

- $p(z_i \mid y_i, \beta, \xi)$: truncated normal.
- $p(\beta \mid y, z, \xi)$: normal.
- $p(\xi_i \mid \beta, z, y)$: something tractable.

3.3 Example 3: Discrete Mixture of Normals

Frühwirth-Schnatter and Frühwirth [2007, 2010]. Logistic continued...

\[ y_i = 1\{z_i > 0\}, \qquad z_i = x_i \beta + \varepsilon_i, \quad \varepsilon_i \sim \mathrm{Lo}(0, 1). \]

Approximate Lo using a discrete mixture of normals:

\[ \varepsilon_i \sim N(0, \sigma^2_{r_i}), \qquad r_i \sim \mathrm{MN}(1, w). \]
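The Andrews-Mallows representation can be checked by simulation: draw $\lambda$ from the Kolmogorov-Smirnov distribution (scipy's `kstwobign`), set $\varepsilon = 2\lambda Z$ with $Z$ standard normal, and compare against Lo(0,1). A sketch (the function name is my own):

```python
import numpy as np
from scipy.stats import kstwobign

def rlogistic_via_ks(n, seed=0):
    """Draw approximately Lo(0,1) variates via the Andrews-Mallows scale mixture:
    lambda ~ KS, epsilon | lambda ~ N(0, 4 lambda^2)."""
    rng = np.random.default_rng(seed)
    lam = kstwobign.rvs(size=n, random_state=rng)   # Kolmogorov-Smirnov draws
    return 2.0 * lam * rng.standard_normal(n)
```

The draws should have mean 0, variance $\pi^2/3$, and CDF $1/(1 + e^{-x})$, all properties of the standard logistic.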
Then

\[ y_i = 1\{z_i > 0\}, \qquad z_i = x_i \beta + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2_{r_i}), \qquad r_i \sim \mathrm{MN}(1, w). \]

The only thing that changes is that $p(r_i \mid z_i, \beta, y)$ is multinomial.

3.4 Example 4: Laplace Method

Logistic continued...

\[ p(y_i \mid p_i) = p_i^{y_i} (1 - p_i)^{1 - y_i} = \left( \frac{e^{\psi_i}}{1 + e^{\psi_i}} \right)^{y_i} \left( \frac{1}{1 + e^{\psi_i}} \right)^{1 - y_i} = \frac{e^{\psi_i y_i}}{1 + e^{\psi_i}} = \frac{1}{2}\, e^{\psi_i (y_i - 1/2)}\, \cosh^{-1}(\psi_i / 2). \]

So

\[ p(y_i \mid \psi_i) \propto e^{\psi_i \kappa_i}\, \cosh^{-1}(\psi_i / 2), \]

where $\kappa_i = y_i - 1/2$. De-marginalize (via the Laplace transform):

\[ \cosh^{-1}(\psi_i / 2) = \int_0^\infty e^{-\xi_i \psi_i^2 / 2}\, p(\xi_i)\, d\xi_i, \]

where $\xi_i \sim \mathrm{PG}(1, 0)$. Then

\[ p(y_i \mid \psi_i) \propto e^{\psi_i \kappa_i} \int_0^\infty e^{-\xi_i \psi_i^2 / 2}\, p(\xi_i)\, d\xi_i, \]

so

\[ p(y_i, \xi_i \mid \psi_i) \propto e^{\psi_i \kappa_i - \xi_i \psi_i^2 / 2}\, p(\xi_i). \]

Stacking observations,

\[ p(y, \xi \mid \beta) \propto e^{\kappa' X \beta - \frac{1}{2} \beta' X' \Xi X \beta}\, p(\xi), \]

where $\Xi = \mathrm{diag}(\xi)$ and $p(\xi) = \prod_i p(\xi_i)$.
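The PG(1, c) variable has a sum-of-gammas representation, $\xi = \frac{1}{2\pi^2} \sum_k g_k / \big((k - 1/2)^2 + c^2/(4\pi^2)\big)$ with $g_k \sim \mathrm{Exp}(1)$. Truncating the sum gives a quick approximate sampler (exact samplers exist; this shortcut is purely illustrative) with which the Laplace-transform identity $E[e^{-\xi \psi^2/2}] = \cosh^{-1}(\psi/2)$ for $\xi \sim \mathrm{PG}(1,0)$ can be checked:

```python
import numpy as np

def rpg_approx(c, n, n_terms=200, seed=0):
    """Approximate PG(1, c) draws by truncating the sum-of-gammas
    representation  xi = (1/(2 pi^2)) sum_k g_k / ((k - 1/2)^2 + c^2/(4 pi^2)),
    g_k ~ Exp(1).  Truncation at n_terms is an approximation."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, n_terms + 1)
    denom = (k - 0.5) ** 2 + c ** 2 / (4.0 * np.pi ** 2)
    g = rng.exponential(size=(n, n_terms))
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)
```

For $c = 0$ the draws should average about $1/4$ (the PG(1, 0) mean), and the Monte Carlo average of $e^{-\xi \psi^2/2}$ should match $1/\cosh(\psi/2)$.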
Gibbs sample:

- $p(\beta \mid \xi, y) \propto p(\beta)\, e^{\kappa' X \beta - \frac{1}{2} \beta' X' \Xi X \beta}$: a normal kernel.
- $p(\xi_i \mid y, \beta) \propto e^{-\xi_i \psi_i^2 / 2}\, p(\xi_i) = \mathrm{PG}(1, \psi_i)$.

ONLY ONE LAYER OF LATENTS! Data generating process:

\[ \xi_i \sim \mathrm{PG}(1, \psi_i), \qquad y_i \sim \mathrm{Binom}(1, p_i), \qquad p_i = \frac{e^{\psi_i}}{1 + e^{\psi_i}}, \qquad \psi_i = x_i \beta. \]

4 Aside: Expectation Maximization

Tangential, but a great opportunity to discuss EM, a.k.a. a way to find the posterior mode. Following Gelman's red book.

We want $\arg\max_\beta p(\beta \mid y)$. We will assume everything is conditioned on $y$ and suppress it. We have some sort of augmentation variable $\xi$. Then

\[ p(\beta) = \frac{p(\beta, \xi)}{p(\xi \mid \beta)}, \]

so

\[ \log p(\beta) = \log p(\beta, \xi) - \log p(\xi \mid \beta). \]

Think of these as functions of $\beta$ and $\xi$, and take the expectation over $(\xi \mid \beta^{\mathrm{old}})$:

\[ \log p(\beta) = E_{\xi \mid \beta^{\mathrm{old}}}[\log p(\beta, \xi)] - E_{\xi \mid \beta^{\mathrm{old}}}[\log p(\xi \mid \beta)]. \]

Fact:

\[ E_{\xi \mid \beta^{\mathrm{old}}}[\log p(\xi \mid \beta)] \le E_{\xi \mid \beta^{\mathrm{old}}}[\log p(\xi \mid \beta^{\mathrm{old}})] \quad \text{for all } \beta. \]

Thus if $\beta$ satisfies

\[ E_{\xi \mid \beta^{\mathrm{old}}}[\log p(\beta, \xi)] \ge E_{\xi \mid \beta^{\mathrm{old}}}[\log p(\beta^{\mathrm{old}}, \xi)], \]

then $\log p(\beta) \ge \log p(\beta^{\mathrm{old}})$.

In the Laplace transform method:

\[ p(\beta, \xi) = c(y)\, p(\beta)\, e^{\kappa' X \beta - \frac{1}{2} \beta' X' \Xi X \beta}\, p(\xi). \]

So

\[ E_{\xi \mid \beta^{\mathrm{old}}}[\log p(\beta, \xi)] = \log c(y) + \log p(\beta) + \underbrace{\kappa' X \beta - \tfrac{1}{2}\, \beta' X'\, E_{\xi \mid \beta^{\mathrm{old}}}(\Xi)\, X \beta}_{\text{quadratic form}} + E_{\xi \mid \beta^{\mathrm{old}}}[\log p(\xi)]. \]

We can calculate $E[\xi_i]$ using the MGF, i.e. the Laplace transform. Then iterate: $\beta^{t-1} \to E_{\xi \mid \beta^{t-1}}(\xi) \to \beta^t$.

5 Dynamic Generalized Linear Models

Now our index is time.

5.1 The Old School Way

West et al. [1985]. Still: $\psi_t = \log \frac{p_t}{1 - p_t}$ and $P(y_t = 1) = p_t$. We will evolve the filtered distribution of $\psi_t$ by linear Bayes.

Observation equation: $P(y_t = 1) = p_t$.

Evolution equation:

\[ \psi_t = x_t \beta_t, \qquad \beta_t = G_t \beta_{t-1} + \omega_t, \quad \omega_t \sim [0, W]. \]

We only specify the first two moments! Just like the DLM: once you understand how to do one update you are done.
Prior: $\beta_{t-1} \mid D_{t-1} \sim [m_{t-1}, C_{t-1}]$.

Evolution: $\beta_t \mid D_{t-1} \sim [a_t, R_t]$, where $a_t = G_t m_{t-1}$ and $R_t = G_t C_{t-1} G_t' + W$.

Forecast: $\psi_t \mid D_{t-1} \sim [f_t, q_t]$, where $f_t = x_t a_t$ and $q_t = x_t R_t x_t'$.

Update: now we get to specify a distribution. $\psi_t \mid D_{t-1}$ is the prior for the observation of $(y_t \mid p_t)$. The conjugate prior for $p_t$ is a Beta distribution. There is a one-to-one relationship between $(f_t, q_t)$ and $(r_t, s_t)$ so that

\[ \psi_t \mid D_{t-1} \sim [f_t, q_t] \quad \text{iff} \quad p_t \mid D_{t-1} \sim \mathrm{Beta}(r_t, s_t). \]

Use that to pick $r_t$ and $s_t$. Hit the prior with the new observation and we have $p_t \mid D_t \sim \mathrm{Beta}(r_t + y_t,\, s_t + 1 - y_t)$. Now go backwards to get $\psi_t \mid D_t \sim [f_t^*, q_t^*]$.

This is exact in the sense that once you specify that $p_t \mid D_{t-1}$ is Beta, everything follows without any approximation. However, we only have an approximate update for $\beta_t$. We can use the distributions

\[ \begin{pmatrix} \psi_t \\ \beta_t \end{pmatrix} \Big|\, D_{t-1} \sim \left[ \begin{pmatrix} f_t \\ a_t \end{pmatrix}, \begin{pmatrix} q_t & x_t R_t \\ R_t x_t' & R_t \end{pmatrix} \right] \qquad \text{and} \qquad \psi_t \mid D_t \sim [f_t^*, q_t^*] \]

to update $\beta_t$ by linear Bayes, which gives an approximation to the two moments of $\beta_t \mid D_t$.

5.2 Using data augmentation

Instead of a normal prior on $\beta$ we now have a stochastic process prior for $\{\beta_t\}$. Just change $\psi_i = x_i \beta$ to $\psi_t = x_t \beta_t$.
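To make the Section 5.1 recursion concrete before moving to the augmentation samplers: one filtering step, sketched in code. The $(f, q) \leftrightarrow (r, s)$ match uses $E[\operatorname{logit} p] = \psi(r) - \psi(s)$ and $V[\operatorname{logit} p] = \psi'(r) + \psi'(s)$ for $p \sim \mathrm{Beta}(r, s)$, solved numerically with scipy; the function name and the use of `fsolve` are my own choices, not from the notes.

```python
import numpy as np
from scipy.special import digamma, polygamma
from scipy.optimize import fsolve

def binary_dglm_step(m, C, x, G, W, y):
    """One West-Harrison-Migon filtering step for a Bernoulli series with
    logit link, following the prior -> evolution -> forecast -> conjugate
    update -> linear Bayes recursion of Section 5.1."""
    a = G @ m                               # evolution: beta_t | D_{t-1} ~ [a, R]
    R = G @ C @ G.T + W
    f = x @ a                               # forecast: psi_t | D_{t-1} ~ [f, q]
    q = x @ R @ x
    def gap(v):                             # match (f, q) to a Beta(r, s) prior
        r, s = np.exp(v)                    # exp keeps r, s positive
        return [digamma(r) - digamma(s) - f,
                polygamma(1, r) + polygamma(1, s) - q]
    r, s = np.exp(fsolve(gap, [0.0, 0.0]))
    r_new, s_new = r + y, s + 1 - y         # conjugate Beta update
    f_star = digamma(r_new) - digamma(s_new)          # back to moments of psi_t | D_t
    q_star = polygamma(1, r_new) + polygamma(1, s_new)
    A = R @ x / q                           # linear Bayes update of beta_t | D_t
    m_new = a + A * (f_star - f)
    C_new = R - np.outer(A, A) * (q - q_star)
    return m_new, C_new
```

Observing $y_t = 1$ pulls the state mean up, $y_t = 0$ pulls it down, and the update shrinks the covariance, mirroring the Kalman update of a standard DLM.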
Probit: CDF method. Augmented model:

\[ y_t = 1\{z_t > 0\}, \qquad z_t = x_t \beta_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, 1), \qquad \beta_t = G_t \beta_{t-1} + \omega_t, \quad \omega_t \sim N(0, W). \]

Gibbs:

- $p(z_t \mid \beta_t, y_t)$: same as before.
- $p(\{\beta_t\} \mid z)$: FFBS (forward filtering, backward sampling).

Logistic: mixture of normals. Augmented model:

\[ y_t = 1\{z_t > 0\}, \qquad z_t = x_t \beta_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \xi_t), \qquad \beta_t = G_t \beta_{t-1} + \omega_t, \quad \omega_t \sim N(0, W), \qquad \xi_t = 4\lambda_t^2, \quad \lambda_t \sim \mathrm{KS}. \]

Gibbs: all the same, but $p(\{\beta_t\} \mid z, \xi)$ by FFBS.

Logistic: discrete mixture of normals. Augmented model:

\[ y_t = 1\{z_t > 0\}, \qquad z_t = x_t \beta_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2_{r_t}), \qquad \beta_t = G_t \beta_{t-1} + \omega_t, \quad \omega_t \sim N(0, W), \qquad r_t \sim \mathrm{MN}(1, w). \]

Gibbs: all the same, but $p(\{\beta_t\} \mid z, r)$ by FFBS.
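All of the dynamic samplers above share one computational core: FFBS for a conditionally Gaussian state space model. A sketch, assuming time-invariant $G$ and $W$ for simplicity (the function and its arguments are my own framing):

```python
import numpy as np

def ffbs(z, X, G, W, m0, C0, V=1.0, rng=None):
    """Forward-filtering backward-sampling for the conditionally Gaussian model
        z_t    = x_t beta_t + eps_t,     eps_t   ~ N(0, V)
        beta_t = G beta_{t-1} + omega_t, omega_t ~ N(0, W).
    Returns one joint draw of beta_1, ..., beta_T.  No numerical hardening."""
    rng = rng if rng is not None else np.random.default_rng()
    T, p = X.shape
    ms = np.empty((T, p))
    Cs = np.empty((T, p, p))
    m, C = m0, C0
    for t in range(T):                          # forward Kalman filter
        a = G @ m
        R = G @ C @ G.T + W
        x = X[t]
        q = x @ R @ x + V                       # forecast variance of z_t
        A = R @ x / q                           # Kalman gain
        m = a + A * (z[t] - x @ a)
        C = R - np.outer(A, A) * q
        ms[t], Cs[t] = m, C
    beta = np.empty((T, p))                     # backward sampling
    beta[-1] = rng.multivariate_normal(ms[-1], Cs[-1])
    for t in range(T - 2, -1, -1):
        R1 = G @ Cs[t] @ G.T + W                # Var(beta_{t+1} | D_t)
        B = Cs[t] @ G.T @ np.linalg.inv(R1)
        h = ms[t] + B @ (beta[t + 1] - G @ ms[t])
        H = Cs[t] - B @ R1 @ B.T
        H = 0.5 * (H + H.T) + 1e-9 * np.eye(p)  # symmetrize for the sampler
        beta[t] = rng.multivariate_normal(h, H)
    return beta
```

In the samplers above, $V$ is 1 for the probit case and is replaced by the per-observation variances $\xi_t$ or $\sigma^2_{r_t}$ in the logistic cases.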
Logistic: Polya-Gamma method. Look at the posterior for $\{\beta_t\}$:

\[ p(\{\beta_t\} \mid \xi, y) \propto e^{\kappa' X \beta - \frac{1}{2} \beta' X' \Xi X \beta}\, p(\beta) \propto e^{-\frac{1}{2} (z - X\beta)' \Xi (z - X\beta)}\, p(\beta) \propto p(z \mid \beta)\, p(\beta), \]

where $\Xi z = \kappa$, i.e. the working observations $z_t = \kappa_t / \xi_t$ follow

\[ z_t = x_t \beta_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, 1/\xi_t), \qquad \beta_t = G_t \beta_{t-1} + \eta_t, \quad \eta_t \sim N(0, W). \]

So, again, we can just FFBS for $\{\beta_t\}$.

References

J. H. Albert and S. Chib. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), June 1993.

D. F. Andrews and C. Mallows. Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B, 36:99-102, 1974.

J. Becker. An Evangelical is back from exile, lifting Romney. The New York Times, September 2012.

S. Frühwirth-Schnatter and R. Frühwirth. Auxiliary mixture sampling with applications to logistic models. Computational Statistics and Data Analysis, 51, 2007.

S. Frühwirth-Schnatter and R. Frühwirth. Data augmentation and MCMC for binary and multinomial logit models. In Statistical Modelling and Regression Structures. Springer-Verlag, 2010. Available from the UT library online.

C. C. Holmes and L. Held. Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1(1), 2006.

M. West, J. Harrison, and H. S. Migon. Dynamic generalized linear models and Bayesian forecasting. Journal of the American Statistical Association, 80(389):73-83, March 1985.
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning
More informationProbability and Estimation. Alan Moses
Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.
More informationOnline Appendix to: Marijuana on Main Street? Estimating Demand in Markets with Limited Access
Online Appendix to: Marijuana on Main Street? Estating Demand in Markets with Lited Access By Liana Jacobi and Michelle Sovinsky This appendix provides details on the estation methodology for various speci
More informationChris Bishop s PRML Ch. 8: Graphical Models
Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular
More informationExpectation Propagation in Dynamical Systems
Expectation Propagation in Dynamical Systems Marc Peter Deisenroth Joint Work with Shakir Mohamed (UBC) August 10, 2012 Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1 Motivation Figure : Complex
More informationModern Methods of Statistical Learning sf2935 Lecture 5: Logistic Regression T.K
Lecture 5: Logistic Regression T.K. 10.11.2016 Overview of the Lecture Your Learning Outcomes Discriminative v.s. Generative Odds, Odds Ratio, Logit function, Logistic function Logistic regression definition
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationCurve Fitting Re-visited, Bishop1.2.5
Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the
More informationProbability Review. September 25, 2015
Probability Review September 25, 2015 We need a tool to 1) Formulate a model of some phenomenon. 2) Learn an instance of the model from data. 3) Use it to infer outputs from new inputs. Why Probability?
More informationBayesian Prediction of Code Output. ASA Albuquerque Chapter Short Course October 2014
Bayesian Prediction of Code Output ASA Albuquerque Chapter Short Course October 2014 Abstract This presentation summarizes Bayesian prediction methodology for the Gaussian process (GP) surrogate representation
More information