5.4 wells in Bangladesh Chris Parrish July 3, 2016

Contents

- wells in Bangladesh
  - data
- logistic regression with one predictor
  - figure 5.8
- first logistic model: switched ~ dist
  - model
  - fit
- more reasonable model: switched ~ dist/100
  - model
  - fit
  - figure 5.9
- logistic regression with second input variable
  - figure 5.10
- model: switched ~ dist/100 + arsenic
  - model
  - fit
  - figure 5.11a
  - figure 5.11b

wells in Bangladesh

reference: ARM chapter 05, github

```r
library(rstan)
rstan_options(auto_write = TRUE)             # save compiled models to disk for reuse
options(mc.cores = parallel::detectCores())  # run the chains in parallel
library(ggplot2)
```

data

```r
# Data
source("wells.data.R", echo = TRUE)
```

```
> N <- ...
> switched <- c(1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 0, 0, 1, 0, 1, 1, 0,... [TRUNCATED]
> arsenic <- c(2.36, 0.71, 2.07, 1.15, 1.1, 3.9, 2.97, ... [TRUNCATED]
> dist <- c(... [TRUNCATED]
> assoc <- c(0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1,
+ 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0,
+ 1, 0, 0, 1, 1, 0, 1, 0, 0,... [TRUNCATED]
> educ <- c(0, 0, 10, 12, 14, 9, 4, 10, 0, 0, 5, 0,
+ 0, 0, 0, 7, 7, 7, 0, 10, 7, 0, 5, 0, 8, 8, 10, 16, 10, 10,
+ 10, 10, 0, 0, 0, 3, 0, ... [TRUNCATED]
```

scatterplot

```r
data <- data.frame(dist, arsenic)
ggplot(data, aes(dist, arsenic)) +
  geom_point(shape = 20, color = "darkred") +
  geom_smooth()
```

[Figure: scatterplot of arsenic against dist, with a smoothed trend line]

summary statistics

```r
tbl <- rbind(summary(dist), summary(arsenic))
row.names(tbl) <- c("dist", "arsenic")
tbl
```

[Output: Min., 1st Qu., Median, Mean, 3rd Qu., and Max. for dist and arsenic]

```r
apply(cbind(dist, arsenic), 2, sd)
```

[Output: standard deviations of dist and arsenic]

logistic regression with one predictor

figure 5.8

```r
# Logistic regression with one predictor
# Figure 5.8
p1 <- ggplot(data.frame(dist)) +
  geom_histogram(aes(dist), color = "seashell", fill = "wheat", binwidth = 10) +
  scale_x_continuous("Distance (in meters) to the nearest safe well") +
  scale_y_continuous("")
print(p1)
```

[Figure 5.8: histogram of distance (in meters) to the nearest safe well]
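Every fitted curve in this document is drawn as `1 / (1 + exp(-z))`, the inverse logit of a linear predictor z, and the Stan models below use `bernoulli_logit(z)`, which is the Bernoulli distribution with success probability equal to that same inverse logit. A minimal sketch of both facts, assuming a hypothetical helper name `invlogit` and made-up inputs `y` and `eta` (not the wells data): it checks the hand-written form against base R's `plogis()` and computes a Bernoulli-logit log-likelihood two equivalent ways.

```r
# Hypothetical helper: inverse logit, mapping z in (-Inf, Inf) to (0, 1).
# Base R's plogis() computes the same function.
invlogit <- function(z) 1 / (1 + exp(-z))

z <- seq(-6, 6, by = 0.5)
stopifnot(isTRUE(all.equal(invlogit(z), plogis(z))))

invlogit(0)   # 0.5: a linear predictor of 0 means even odds

# Bernoulli-logit log-likelihood for made-up outcomes y and linear predictors eta.
# Direct form: log p(y | eta) = y * eta - log(1 + exp(eta)), summed over observations.
y   <- c(1, 0, 1, 1, 0)
eta <- c(0.4, -1.2, 2.0, -0.3, 0.8)
ll_logit <- sum(y * eta - log1p(exp(eta)))

# Same quantity via the Bernoulli (binomial, size 1) density with p = invlogit(eta)
ll_bern <- sum(dbinom(y, size = 1, prob = invlogit(eta), log = TRUE))
stopifnot(isTRUE(all.equal(ll_logit, ll_bern)))
```

Working on the logit scale, as `bernoulli_logit` does, avoids computing probabilities that round to exactly 0 or 1 for large |z|.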

first logistic model: switched ~ dist

model

wells_dist.stan

```stan
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> switched;  // array[N] syntax needs Stan 2.26 or later
  vector[N] dist;
}
parameters {
  vector[2] beta;
}
model {
  // no prior statements, so beta gets flat (improper) priors
  switched ~ bernoulli_logit(beta[1] + beta[2] * dist);
}
```

fit

```r
# First logistic model: switched ~ dist
data.list.1 <- c("N", "switched", "dist")   # names of workspace objects passed to stan()
wells_dist.sf <- stan(file = 'wells_dist.stan', data = data.list.1,
                      iter = 1000, chains = 4)
plot(wells_dist.sf)
```

```
ci_level: 0.8 (80% intervals)
outer_level: 0.95 (95% intervals)
```

[Figure: 80% and 95% posterior intervals for beta[1] and beta[2]]

```r
pairs(wells_dist.sf)
```

[Figure: pairs plot of beta[1], beta[2], and lp__]

```r
print(wells_dist.sf, pars = c("beta", "lp__"))
```

```
Inference for Stan model: wells_dist.
4 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=2000.

        mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta[1]
beta[2]
lp__

Samples were drawn using NUTS(diag_e) at Tue Jul 5 10:11:...
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
The estimated Bayesian Fraction of Missing Information is a measure of
the efficiency of the sampler with values close to 1 being ideal.
For each chain, these estimates are ...
```

more reasonable model: switched ~ dist/100

model

wells_dist100.stan

```stan
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> switched;
  vector[N] dist;
}
transformed data {
  vector[N] dist100;   // rescaling: distance in units of 100 meters
  dist100 = dist / 100.0;
}
parameters {
  vector[2] beta;
}
model {
  switched ~ bernoulli_logit(beta[1] + beta[2] * dist100);
}
```

fit

```r
# More reasonable model: switched ~ dist/100
wells_dist100.sf <- stan(file = 'wells_dist100.stan', data = data.list.1,
                         iter = 1000, chains = 4)
plot(wells_dist100.sf)
```

```
ci_level: 0.8 (80% intervals)
outer_level: 0.95 (95% intervals)
```

[Figure: 80% and 95% posterior intervals for beta[1] and beta[2]]

```r
pairs(wells_dist100.sf)
```

[Figure: pairs plot of beta[1], beta[2], and lp__]

```r
print(wells_dist100.sf, pars = c("beta", "lp__"))
```

```
Inference for Stan model: wells_dist100.
4 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=2000.

        mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta[1]
beta[2]
lp__

Samples were drawn using NUTS(diag_e) at Tue Jul 5 10:11:...
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
The estimated Bayesian Fraction of Missing Information is a measure of
the efficiency of the sampler with values close to 1 being ideal.
For each chain, these estimates are ...
```

figure 5.9

```r
# Figure 5.9
beta.post.2 <- extract(wells_dist100.sf, "beta")$beta   # draws x 2 matrix
beta.mean.2 <- colMeans(beta.post.2)                     # posterior means
```

```r
# dev.new()
p2 <- ggplot(data.frame(switched, dist), aes(dist, switched)) +
  geom_jitter(position = position_jitter(width = 0.2, height = 0.01),
              shape = 20, color = "darkred") +
  # fitted Pr(switching); dist is divided by 100 to match the model's dist100
  stat_function(fun = function(x)
    1 / (1 + exp(- beta.mean.2[1] - beta.mean.2[2] * x / 100))) +
  scale_x_continuous("Distance (in meters) to the nearest safe well",
                     breaks = seq(from = 0, by = 50, length.out = 7)) +
  scale_y_continuous("Pr(switching)", breaks = seq(0, 1, 0.2))
print(p2)
```

[Figure 5.9: Pr(switching) against distance (in meters) to the nearest safe well, jittered data with the fitted logistic curve]

logistic regression with second input variable

figure 5.10

```r
# Logistic regression with second input variable
# Figure 5.10
# dev.new()
p3 <- ggplot(data.frame(arsenic)) +
  geom_histogram(aes(arsenic), color = "seashell", fill = "wheat", binwidth = 0.25) +
  scale_x_continuous("Arsenic concentration in well water") +
  scale_y_continuous("")
print(p3)
```

[Figure 5.10: histogram of arsenic concentration in well water]

model: switched ~ dist/100 + arsenic

model

wells_d100ars.stan

```stan
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> switched;
  vector[N] dist;
  vector[N] arsenic;
}
transformed data {
  vector[N] dist100;   // rescaling: distance in units of 100 meters
  dist100 = dist / 100.0;
}
parameters {
  vector[3] beta;
}
model {
  switched ~ bernoulli_logit(beta[1] + beta[2] * dist100 + beta[3] * arsenic);
}
```
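Like wells_dist100.stan, this model divides dist by 100 in its transformed data block. Rescaling a predictor by a constant only rescales its coefficient: the coefficient on dist/100 is 100 times the coefficient on dist, while the intercept and the fitted probabilities stay the same. A minimal sketch of that equivalence, assuming simulated data (not the wells data, with arbitrary true coefficients) and using maximum-likelihood `glm()` rather than Stan so it runs in a second; the names `dist_sim` and `switched_sim` are hypothetical.

```r
# Simulated data only: the intercept 0.6 and slope -0.6 per 100 m are arbitrary
# choices for this sketch, not estimates from the wells data.
set.seed(1)
n <- 1000
dist_sim <- runif(n, 0, 300)                                      # distance in meters
switched_sim <- rbinom(n, 1, plogis(0.6 - 0.6 * dist_sim / 100))

# The same logistic model, with distance in meters and in units of 100 meters
fit_m   <- glm(switched_sim ~ dist_sim, family = binomial)
fit_100 <- glm(switched_sim ~ I(dist_sim / 100), family = binomial)

coef(fit_m)     # slope is per meter: small and hard to read
coef(fit_100)   # slope is per 100 meters

# The slopes differ by the scale factor; intercepts and fitted probabilities
# agree up to numerical tolerance.
slope_ratio <- unname(coef(fit_100)[2] / coef(fit_m)[2])
```

With the flat priors used here, the same reasoning carries over to the Stan fits: the posterior for beta[2] in wells_dist100 should be close to 100 times the one in wells_dist, with beta[1] essentially unchanged. The rescaled version mainly makes the coefficient easier to read and report.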

fit

```r
# Model: switched ~ dist/100 + arsenic
data.list.3 <- c("N", "switched", "dist", "arsenic")
wells_d100ars.sf <- stan(file = 'wells_d100ars.stan', data = data.list.3,
                         iter = 1000, chains = 4)
plot(wells_d100ars.sf)
```

```
ci_level: 0.8 (80% intervals)
outer_level: 0.95 (95% intervals)
```

[Figure: 80% and 95% posterior intervals for beta[1], beta[2], and beta[3]]

```r
pairs(wells_d100ars.sf)
```

[Figure: pairs plot of beta[1], beta[2], beta[3], and lp__]

```r
print(wells_d100ars.sf, pars = c("beta", "lp__"))
```

```
Inference for Stan model: wells_d100ars.
4 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=2000.

        mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta[1]
beta[2]
beta[3]
lp__

Samples were drawn using NUTS(diag_e) at Tue Jul 5 10:11:...
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
The estimated Bayesian Fraction of Missing Information is a measure of
the efficiency of the sampler with values close to 1 being ideal.
For each chain, these estimates are ...
```

```r
beta.post.3 <- extract(wells_d100ars.sf, "beta")$beta   # draws x 3 matrix
beta.mean.3 <- colMeans(beta.post.3)                     # posterior means
```

figure 5.11a

```r
# Figure 5.11 (a)
# dev.new()
p4 <- ggplot(data.frame(switched, dist), aes(dist, switched)) +
  geom_jitter(position = position_jitter(width = 0.2, height = 0.01),
              shape = 20, color = "darkred") +
  # fitted Pr(switching) as a function of distance, with arsenic fixed at 0.5
  stat_function(fun = function(x)
    1 / (1 + exp(- beta.mean.3[1] - beta.mean.3[2] * x / 100 - beta.mean.3[3] * 0.5))) +
  # ... and with arsenic fixed at 1.0
  stat_function(fun = function(x)
    1 / (1 + exp(- beta.mean.3[1] - beta.mean.3[2] * x / 100 - beta.mean.3[3] * 1.0))) +
  annotate("text", x = c(50, 75), y = c(0.35, 0.55),
           label = c("if As = 0.5", "if As = 1.0"), size = 4) +
  scale_x_continuous("Distance (in meters) to the nearest safe well",
                     breaks = seq(from = 0, by = 50, length.out = 7)) +
  scale_y_continuous("Pr(switching)", breaks = seq(0, 1, 0.2))
plot(p4)
```

[Figure 5.11a: Pr(switching) against distance (in meters) to the nearest safe well, with fitted curves labeled "if As = 0.5" and "if As = 1.0"]

figure 5.11b

```r
# Figure 5.11 (b)
# dev.new()
p5 <- ggplot(data.frame(switched, arsenic), aes(arsenic, switched)) +
  geom_jitter(position = position_jitter(width = 0.2, height = 0.01),
              shape = 20, color = "darkred") +
  # fitted Pr(switching) as a function of arsenic, with dist = 0
  stat_function(fun = function(x)
    1 / (1 + exp(- beta.mean.3[1] - beta.mean.3[3] * x))) +
  # ... and with dist = 50 meters, i.e. dist100 = 0.5
  stat_function(fun = function(x)
    1 / (1 + exp(- beta.mean.3[1] - beta.mean.3[2] * 0.5 - beta.mean.3[3] * x))) +
  annotate("text", x = c(1.7, 2.5), y = c(0.78, 0.56),
           label = c("if dist = 0", "if dist = 50"), size = 4) +
  scale_x_continuous("Arsenic concentration in well water",
                     breaks = seq(from = 0, by = 2, length.out = 5)) +
  scale_y_continuous("Pr(switching)", breaks = seq(0, 1, 0.2))
print(p5)
```

[Figure 5.11b: Pr(switching) against arsenic concentration in well water, with fitted curves labeled "if dist = 0" and "if dist = 50"]
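The four curves in Figure 5.11 all evaluate the same linear predictor, beta[1] + beta[2] * dist/100 + beta[3] * arsenic, with one input held fixed. A minimal sketch, assuming a hypothetical helper `pr_switch()` (not part of rstan or ggplot2) that evaluates the model's Pr(switching) for any distances and arsenic levels, given either a coefficient vector such as beta.mean.3 or a matrix of posterior draws such as beta.post.3. Writing the predictor once avoids retyping it in every `stat_function()` call; the coefficients in the example are placeholders, not fitted values.

```r
# Hypothetical helper: Pr(switching) under the model
#   switched ~ bernoulli_logit(beta[1] + beta[2] * dist / 100 + beta[3] * arsenic)
# beta:    length-3 vector (e.g. beta.mean.3) or S x 3 matrix of draws (e.g. beta.post.3)
# dist:    distance in meters; arsenic: arsenic concentration (length-1 inputs recycle)
pr_switch <- function(beta, dist, arsenic) {
  X <- cbind(1, dist / 100, arsenic)   # one row per (dist, arsenic) pair
  if (is.matrix(beta)) {
    P <- X %*% t(beta)                 # linear predictors: rows = inputs, cols = draws
    P[] <- plogis(P)                   # inverse logit, keeping the matrix shape
    P
  } else {
    as.vector(plogis(X %*% beta))      # one probability per input pair
  }
}

# Placeholder coefficients for illustration only (not fitted values):
beta_demo <- c(0, -1, 0.5)
pr_switch(beta_demo, dist = c(0, 50), arsenic = c(0, 1))   # 0.5 0.5

# In the Figure 5.11 (a) code, a curve could then be written as, e.g.,
#   stat_function(fun = function(x) pr_switch(beta.mean.3, x, 0.5))
# and in Figure 5.11 (b), for dist = 50,
#   stat_function(fun = function(x) pr_switch(beta.mean.3, 50, x))
```

Averaging the columns of `pr_switch(beta.post.3, dist, arsenic)` (for example with `rowMeans()`) gives the posterior mean probability at each input. Because the inverse logit is nonlinear, that average is generally not identical to the plug-in curve from beta.mean.3 drawn in the figures, although the two are often close when the posterior is narrow.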


More information

Introductory Statistics with R: Simple Inferences for continuous data

Introductory Statistics with R: Simple Inferences for continuous data Introductory Statistics with R: Simple Inferences for continuous data Statistical Packages STAT 1301 / 2300, Fall 2014 Sungkyu Jung Department of Statistics University of Pittsburgh E-mail: sungkyu@pitt.edu

More information

The Central Limit Theorem

The Central Limit Theorem The Central Limit Theorem Suppose n tickets are drawn at random with replacement from a box of numbered tickets. The central limit theorem says that when the probability histogram for the sum of the draws

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Coupled Hidden Markov Models: Computational Challenges

Coupled Hidden Markov Models: Computational Challenges .. Coupled Hidden Markov Models: Computational Challenges Louis J. M. Aslett and Chris C. Holmes i-like Research Group University of Oxford Warwick Algorithms Seminar 7 th March 2014 ... Hidden Markov

More information

Weakly informative priors

Weakly informative priors Department of Statistics and Department of Political Science Columbia University 21 Oct 2011 Collaborators (in order of appearance): Gary King, Frederic Bois, Aleks Jakulin, Vince Dorie, Sophia Rabe-Hesketh,

More information

Preferences in college applications

Preferences in college applications Preferences in college applications A non-parametric Bayesian analysis of top-10 rankings Alnur Ali 1 Thomas Brendan Murphy 2 Marina Meilă 3 Harr Chen 4 1 Microsoft 2 University College Dublin 3 University

More information

Regression and Models with Multiple Factors. Ch. 17, 18

Regression and Models with Multiple Factors. Ch. 17, 18 Regression and Models with Multiple Factors Ch. 17, 18 Mass 15 20 25 Scatter Plot 70 75 80 Snout-Vent Length Mass 15 20 25 Linear Regression 70 75 80 Snout-Vent Length Least-squares The method of least

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

A Handbook of Statistical Analyses Using R 3rd Edition. Torsten Hothorn and Brian S. Everitt

A Handbook of Statistical Analyses Using R 3rd Edition. Torsten Hothorn and Brian S. Everitt A Handbook of Statistical Analyses Using R 3rd Edition Torsten Hothorn and Brian S. Everitt CHAPTER 12 Quantile Regression: Head Circumference for Age 12.1 Introduction 12.2 Quantile Regression 12.3 Analysis

More information

Hmms with variable dimension structures and extensions

Hmms with variable dimension structures and extensions Hmm days/enst/january 21, 2002 1 Hmms with variable dimension structures and extensions Christian P. Robert Université Paris Dauphine www.ceremade.dauphine.fr/ xian Hmm days/enst/january 21, 2002 2 1 Estimating

More information

Gibbs Sampling in Latent Variable Models #1

Gibbs Sampling in Latent Variable Models #1 Gibbs Sampling in Latent Variable Models #1 Econ 690 Purdue University Outline 1 Data augmentation 2 Probit Model Probit Application A Panel Probit Panel Probit 3 The Tobit Model Example: Female Labor

More information

Electric Field Mapping Lab 2. Precautions

Electric Field Mapping Lab 2. Precautions TS 2-12-12 Electric Field Mapping Lab 2 1 Electric Field Mapping Lab 2 Equipment: mapping board, U-probe, resistive boards, templates, dc voltmeter (431B), 4 long leads, 16 V dc for wall strip Reading:

More information

Statistical View of Least Squares

Statistical View of Least Squares May 23, 2006 Purpose of Regression Some Examples Least Squares Purpose of Regression Purpose of Regression Some Examples Least Squares Suppose we have two variables x and y Purpose of Regression Some Examples

More information

Quantitative Bivariate Data

Quantitative Bivariate Data Statistics 211 (L02) - Linear Regression Quantitative Bivariate Data Consider two quantitative variables, defined in the following way: X i - the observed value of Variable X from subject i, i = 1, 2,,

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Markov Chain Monte Carlo Lecture 4

Markov Chain Monte Carlo Lecture 4 The local-trap problem refers to that in simulations of a complex system whose energy landscape is rugged, the sampler gets trapped in a local energy minimum indefinitely, rendering the simulation ineffective.

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

Models with qualitative explanatory variables p216

Models with qualitative explanatory variables p216 Models with qualitative explanatory variables p216 Example gen = 1 for female Row gpa hsm gen 1 3.32 10 0 2 2.26 6 0 3 2.35 8 0 4 2.08 9 0 5 3.38 8 0 6 3.29 10 0 7 3.21 8 0 8 2.00 3 0 9 3.18 9 0 10 2.34

More information

Package horseshoe. November 8, 2016

Package horseshoe. November 8, 2016 Title Implementation of the Horseshoe Prior Version 0.1.0 Package horseshoe November 8, 2016 Description Contains functions for applying the horseshoe prior to highdimensional linear regression, yielding

More information

Variance Reduction and Ensemble Methods

Variance Reduction and Ensemble Methods Variance Reduction and Ensemble Methods Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Last Time PAC learning Bias/variance tradeoff small hypothesis

More information

Cheat Sheet: factorial ANOVA

Cheat Sheet: factorial ANOVA Cheat Sheet: factorial ANOVA Measurement and Evaluation of HCC Systems Scenario Use factorial ANOVA if you want to test the effect of two (or more) nominal variables varx1 and varx2 on a continuous outcome

More information

BEGINNING BAYES IN R. Comparing two proportions

BEGINNING BAYES IN R. Comparing two proportions BEGINNING BAYES IN R Comparing two proportions Learning about many parameters Chapters 2-3: single parameter (one proportion or one mean) Chapter 4: multiple parameters Two proportions from independent

More information

Modelling Biochemical Reaction Networks. Lecture 21: Phase diagrams

Modelling Biochemical Reaction Networks. Lecture 21: Phase diagrams Modelling Biochemical Reaction Networks Lecture 21: Phase diagrams Marc R. Roussel Department of Chemistry and Biochemistry Phase diagrams Bifurcation diagrams show us, among other things, where bifurcations

More information

Measurement error as missing data: the case of epidemiologic assays. Roderick J. Little

Measurement error as missing data: the case of epidemiologic assays. Roderick J. Little Measurement error as missing data: the case of epidemiologic assays Roderick J. Little Outline Discuss two related calibration topics where classical methods are deficient (A) Limit of quantification methods

More information

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis ESP 178 Applied Research Methods 2/23: Quantitative Analysis Data Preparation Data coding create codebook that defines each variable, its response scale, how it was coded Data entry for mail surveys and

More information

13 Notes on Markov Chain Monte Carlo

13 Notes on Markov Chain Monte Carlo 13 Notes on Markov Chain Monte Carlo Markov Chain Monte Carlo is a big, and currently very rapidly developing, subject in statistical computation. Many complex and multivariate types of random data, useful

More information

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

More information