5.4 wells in Bangladesh Chris Parrish July 3, 2016
Contents

wells in Bangladesh
    data
logistic regression with one predictor
    figure 5.8
first logistic model: switched ~ dist
    model
    fit
more reasonable model: switched ~ dist/100
    model
    fit
    figure 5.9
logistic regression with second input variable
    figure 5.10
model: switched ~ dist/100 + arsenic
    model
    fit
    figure 5.11a
    figure 5.11b

wells in Bangladesh

reference:
- ARM chapter 05, github

    library(rstan)
    rstan_options(auto_write = TRUE)
    options(mc.cores = parallel::detectCores())
    library(ggplot2)

data

    # Data
    source("wells.data.r", echo = TRUE)

    > N <- [...]
    > switched <- c(1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    +     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    +     1, 1, 0, 0, 1, 0, 1, 1, 0,... [TRUNCATED]
    > arsenic <- c(2.36, 0.71, 2.07, 1.15, 1.1, 3.9, 2.97, [...], 3.28, 2.52,
    +     3.13, 3.04, 2.91, 3.21, 1.7, 1.8, 1.44, [...], 2.33, 2.83, 1.79,... [TRUNCATED]
    > dist <- c([...],
    +     ... [TRUNCATED]
    > assoc <- c(0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1,
    +     1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0,
    +     1, 0, 0, 1, 1, 0, 1, 0, 0,... [TRUNCATED]
    > educ <- c(0, 0, 10, 12, 14, 9, 4, 10, 0, 0, 5, 0,
    +     0, 0, 0, 7, 7, 7, 0, 10, 7, 0, 5, 0, 8, 8, 10, 16, 10, 10,
    +     10, 10, 0, 0, 0, 3, 0, [TRUNCATED]

scatterplot

    data <- data.frame(dist, arsenic)
    ggplot(data, aes(dist, arsenic)) +
      geom_point(shape = 20, color = "darkred") +
      geom_smooth()

[Figure: scatterplot of arsenic (y axis) against dist (x axis) with a smoothed trend line]

summary statistics
    tbl <- rbind(summary(dist), summary(arsenic))
    row.names(tbl) <- c("dist", "arsenic")
    tbl

            Min. 1st Qu. Median Mean 3rd Qu. Max.
    dist    [...]
    arsenic [...]

    apply(cbind(dist, arsenic), 2, sd)

    dist    arsenic
    [...]   [...]

logistic regression with one predictor

figure 5.8

    # Logistic regression with one predictor
    # Figure 5.8
    p1 <- ggplot(data.frame(dist)) +
      geom_histogram(aes(dist), color = "seashell", fill = "wheat", binwidth = 10) +
      scale_x_continuous("Distance (in meters) to the nearest safe well") +
      scale_y_continuous("")
    print(p1)

[Figure 5.8: histogram of distance (in meters) to the nearest safe well]
first logistic model: switched ~ dist

model

wells_dist.stan

    data {
      int<lower=0> N;
      int<lower=0,upper=1> switched[N];
      vector[N] dist;
    }
    parameters {
      vector[2] beta;
    }
    model {
      switched ~ bernoulli_logit(beta[1] + beta[2] * dist);
    }

fit

    # First logistic model: switched ~ dist
    data.list.1 <- c("N", "switched", "dist")
    wells_dist.sf <- stan(file='wells_dist.stan', data=data.list.1,
                          iter=1000, chains=4)
    plot(wells_dist.sf)

    ci_level: 0.8 (80% intervals)
    outer_level: 0.95 (95% intervals)

[Figure: posterior interval plot for beta[1] and beta[2]]

    pairs(wells_dist.sf)
[Figure: pairs plot of beta[1], beta[2] and lp__]

    print(wells_dist.sf, pars = c("beta", "lp__"))

    Inference for Stan model: wells_dist.
    4 chains, each with iter=1000; warmup=500; thin=1;
    post-warmup draws per chain=500, total post-warmup draws=2000.

            mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
    beta[1] [...]
    beta[2] [...]
    lp__    [...]

    Samples were drawn using NUTS(diag_e) at Tue Jul 5 10:11:[...]
    For each parameter, n_eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor on split chains (at
    convergence, Rhat=1).
    The estimated Bayesian Fraction of Missing Information is a measure of
    the efficiency of the sampler with values close to 1 being ideal.
    For each chain, these estimates are [...]

more reasonable model: switched ~ dist/100

model

wells_dist100.stan
    data {
      int<lower=0> N;
      int<lower=0,upper=1> switched[N];
      vector[N] dist;
    }
    transformed data {
      vector[N] dist100;  // rescaling
      dist100 = dist / 100.0;
    }
    parameters {
      vector[2] beta;
    }
    model {
      switched ~ bernoulli_logit(beta[1] + beta[2] * dist100);
    }

fit

    # More reasonable model: switched ~ dist/100
    wells_dist100.sf <- stan(file='wells_dist100.stan', data=data.list.1,
                             iter=1000, chains=4)
    plot(wells_dist100.sf)

    ci_level: 0.8 (80% intervals)
    outer_level: 0.95 (95% intervals)

[Figure: posterior interval plot for beta[1] and beta[2]]

    pairs(wells_dist100.sf)
[Figure: pairs plot of beta[1], beta[2] and lp__]

    print(wells_dist100.sf, pars = c("beta", "lp__"))

    Inference for Stan model: wells_dist100.
    4 chains, each with iter=1000; warmup=500; thin=1;
    post-warmup draws per chain=500, total post-warmup draws=2000.

            mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
    beta[1] [...]
    beta[2] [...]
    lp__    [...]

    Samples were drawn using NUTS(diag_e) at Tue Jul 5 10:11:[...]
    For each parameter, n_eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor on split chains (at
    convergence, Rhat=1).
    The estimated Bayesian Fraction of Missing Information is a measure of
    the efficiency of the sampler with values close to 1 being ideal.
    For each chain, these estimates are [...]

figure 5.9

    # Figure 5.9
    # Posterior draws of beta (one row per draw) and their column means
    beta.post.2 <- extract(wells_dist100.sf, "beta")$beta
    beta.mean.2 <- colMeans(beta.post.2)
    # dev.new()
    # Jittered 0/1 outcomes with the fitted curve at the posterior means;
    # x is in meters, so it is divided by 100 to match the model's scale.
    p2 <- ggplot(data.frame(switched, dist), aes(dist, switched)) +
      geom_jitter(position = position_jitter(width = 0.2, height = 0.01),
                  shape = 20, color = "darkred") +
      stat_function(fun = function(x)
        1 / (1 + exp(- beta.mean.2[1] - beta.mean.2[2] * x / 100))) +
      scale_x_continuous("Distance (in meters) to the nearest safe well",
                         breaks = seq(from = 0, by = 50, length.out = 7)) +
      scale_y_continuous("Pr(switching)", breaks = seq(0, 1, 0.2))
    print(p2)

[Figure 5.9: Pr(switching) against distance (in meters) to the nearest safe well, with the fitted logistic curve]

logistic regression with second input variable

figure 5.10

    # Logistic regression with second input variable
    # Figure 5.10
    # dev.new()
    p3 <- ggplot(data.frame(arsenic)) +
      geom_histogram(aes(arsenic), color = "seashell", fill = "wheat",
                     binwidth = 0.25) +
      scale_x_continuous("Arsenic concentration in well water") +
      scale_y_continuous("")
    print(p3)
[Figure 5.10: histogram of arsenic concentration in well water]

model: switched ~ dist/100 + arsenic

model

wells_d100ars.stan

    data {
      int<lower=0> N;
      int<lower=0,upper=1> switched[N];
      vector[N] dist;
      vector[N] arsenic;
    }
    transformed data {
      vector[N] dist100;  // rescaling
      dist100 = dist / 100.0;
    }
    parameters {
      vector[3] beta;
    }
    model {
      switched ~ bernoulli_logit(beta[1] + beta[2] * dist100 + beta[3] * arsenic);
    }
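The transformed data block above only changes the units of dist, from meters to hundreds of meters. A minimal sketch of what that does to the coefficients is given below. It uses simulated stand-in data and base R's glm() as a quick maximum-likelihood substitute for the Stan fits; the sample size, seed, true coefficients and all names ending in _sim are assumptions made for illustration, not part of the original analysis.

```r
# Simulated stand-in data (NOT the wells data): sample size, seed and the
# "true" coefficients are arbitrary assumptions for illustration.
set.seed(1)
n_sim <- 2000
dist_sim <- runif(n_sim, 0, 300)                 # distances in meters
p_sim <- plogis(0.6 - 0.006 * dist_sim)          # inverse logit of the linear predictor
switched_sim <- rbinom(n_sim, size = 1, prob = p_sim)

# Maximum-likelihood logistic regressions, once per meter and once per 100 meters.
fit_m   <- glm(switched_sim ~ dist_sim, family = binomial(link = "logit"))
fit_100 <- glm(switched_sim ~ I(dist_sim / 100), family = binomial(link = "logit"))

coef_m   <- unname(coef(fit_m))
coef_100 <- unname(coef(fit_100))

# Intercepts agree; the slope per 100 meters is 100 times the slope per meter.
print(rbind(per_meter = coef_m, per_100_meters = coef_100))
print(coef_100[2] / coef_m[2])
```

The fitted probabilities are the same under both parameterizations; rescaling only makes the slope easier to read, since a coefficient per 100 meters is not a tiny number per meter.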
fit

    # Model: switched ~ dist/100 + arsenic
    data.list.3 <- c("N", "switched", "dist", "arsenic")
    wells_d100ars.sf <- stan(file='wells_d100ars.stan', data=data.list.3,
                             iter=1000, chains=4)
    plot(wells_d100ars.sf)

    ci_level: 0.8 (80% intervals)
    outer_level: 0.95 (95% intervals)

[Figure: posterior interval plot for beta[1], beta[2] and beta[3]]

    pairs(wells_d100ars.sf)

[Figure: pairs plot of beta[1], beta[2], beta[3] and lp__]

    print(wells_d100ars.sf, pars = c("beta", "lp__"))

    Inference for Stan model: wells_d100ars.
    4 chains, each with iter=1000; warmup=500; thin=1;
    post-warmup draws per chain=500, total post-warmup draws=2000.

            mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
    beta[1] [...]
    beta[2] [...]
    beta[3] [...]
    lp__    [...]

    Samples were drawn using NUTS(diag_e) at Tue Jul 5 10:11:[...]
    For each parameter, n_eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor on split chains (at
    convergence, Rhat=1).
    The estimated Bayesian Fraction of Missing Information is a measure of
    the efficiency of the sampler with values close to 1 being ideal.
    For each chain, these estimates are [...]

    beta.post.3 <- extract(wells_d100ars.sf, "beta")$beta
    beta.mean.3 <- colMeans(beta.post.3)

figure 5.11a

    # Figure 5.11 (a)
    # dev.new()
    # Two curves over distance: arsenic held at 0.5 and at 1.0.
    p4 <- ggplot(data.frame(switched, dist), aes(dist, switched)) +
      geom_jitter(position = position_jitter(width = 0.2, height = 0.01),
                  shape = 20, color = "darkred") +
      stat_function(fun = function(x)
        1 / (1 + exp(- beta.mean.3[1] - beta.mean.3[2] * x / 100
                     - beta.mean.3[3] * 0.5))) +
      stat_function(fun = function(x)
        1 / (1 + exp(- beta.mean.3[1] - beta.mean.3[2] * x / 100
                     - beta.mean.3[3] * 1.0))) +
      annotate("text", x = c(50,75), y = c(0.35, 0.55),
               label = c("if As = 0.5", "if As = 1.0"), size = 4) +
      scale_x_continuous("Distance (in meters) to the nearest safe well",
                         breaks = seq(from = 0, by = 50, length.out = 7)) +
      scale_y_continuous("Pr(switching)", breaks = seq(0, 1, 0.2))
    plot(p4)
[Figure 5.11a: Pr(switching) against distance (in meters) to the nearest safe well, with curves labeled "if As = 0.5" and "if As = 1.0"]

figure 5.11b

    # Figure 5.11 (b)
    # dev.new()
    # Two curves over arsenic: dist held at 0 and at 50 meters (50 / 100 = 0.5).
    p5 <- ggplot(data.frame(switched, arsenic), aes(arsenic, switched)) +
      geom_jitter(position = position_jitter(width = 0.2, height = 0.01),
                  shape = 20, color = "darkred") +
      stat_function(fun = function(x)
        1 / (1 + exp(- beta.mean.3[1] - beta.mean.3[3] * x))) +
      stat_function(fun = function(x)
        1 / (1 + exp(- beta.mean.3[1] - beta.mean.3[2] * 0.5
                     - beta.mean.3[3] * x))) +
      annotate("text", x = c(1.7,2.5), y = c(0.78, 0.56),
               label = c("if dist = 0", "if dist = 50"), size = 4) +
      scale_x_continuous("Arsenic concentration in well water",
                         breaks = seq(from = 0, by = 2, length.out = 5)) +
      scale_y_continuous("Pr(switching)", breaks = seq(0, 1, 0.2))
    print(p5)
[Figure 5.11b: Pr(switching) against arsenic concentration in well water, with curves labeled "if dist = 0" and "if dist = 50"]
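The four curves in figures 5.11a and 5.11b are slices of one fitted surface, the inverse logit of beta[1] + beta[2] * dist/100 + beta[3] * arsenic. As a rough sketch, the repeated expressions could be collected into a single helper. The name pr_switch and the coefficient vector beta_demo are hypothetical; beta_demo holds made-up numbers, not the fitted posterior means.

```r
# Inverse-logit prediction for the model switched ~ dist/100 + arsenic.
# beta: length-3 vector (intercept, coefficient of dist/100, coefficient of arsenic)
# dist: distance in meters; arsenic: arsenic concentration
pr_switch <- function(beta, dist, arsenic) {
  plogis(beta[1] + beta[2] * dist / 100 + beta[3] * arsenic)
}

# Hypothetical coefficients for illustration only (NOT the fitted values).
beta_demo <- c(0, -1, 0.5)

# Slices as in figure 5.11a: vary distance, hold arsenic at 0.5 and 1.0.
dist_grid <- seq(0, 300, by = 50)
curve_as_05 <- pr_switch(beta_demo, dist_grid, arsenic = 0.5)
curve_as_10 <- pr_switch(beta_demo, dist_grid, arsenic = 1.0)

# Slices as in figure 5.11b: vary arsenic, hold distance at 0 and 50 meters.
as_grid <- seq(0, 8, by = 2)
curve_d_0  <- pr_switch(beta_demo, 0, as_grid)
curve_d_50 <- pr_switch(beta_demo, 50, as_grid)

print(round(cbind(dist_grid, curve_as_05, curve_as_10), 3))
print(round(cbind(as_grid, curve_d_0, curve_d_50), 3))
```

With the fitted model in hand, the same helper would be called as pr_switch(beta.mean.3, 50, 1.0), and each stat_function() call could then use, for example, function(x) pr_switch(beta.mean.3, x, 0.5).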
More informationPackage horseshoe. November 8, 2016
Title Implementation of the Horseshoe Prior Version 0.1.0 Package horseshoe November 8, 2016 Description Contains functions for applying the horseshoe prior to highdimensional linear regression, yielding
More informationVariance Reduction and Ensemble Methods
Variance Reduction and Ensemble Methods Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Last Time PAC learning Bias/variance tradeoff small hypothesis
More informationCheat Sheet: factorial ANOVA
Cheat Sheet: factorial ANOVA Measurement and Evaluation of HCC Systems Scenario Use factorial ANOVA if you want to test the effect of two (or more) nominal variables varx1 and varx2 on a continuous outcome
More informationBEGINNING BAYES IN R. Comparing two proportions
BEGINNING BAYES IN R Comparing two proportions Learning about many parameters Chapters 2-3: single parameter (one proportion or one mean) Chapter 4: multiple parameters Two proportions from independent
More informationModelling Biochemical Reaction Networks. Lecture 21: Phase diagrams
Modelling Biochemical Reaction Networks Lecture 21: Phase diagrams Marc R. Roussel Department of Chemistry and Biochemistry Phase diagrams Bifurcation diagrams show us, among other things, where bifurcations
More informationMeasurement error as missing data: the case of epidemiologic assays. Roderick J. Little
Measurement error as missing data: the case of epidemiologic assays Roderick J. Little Outline Discuss two related calibration topics where classical methods are deficient (A) Limit of quantification methods
More informationESP 178 Applied Research Methods. 2/23: Quantitative Analysis
ESP 178 Applied Research Methods 2/23: Quantitative Analysis Data Preparation Data coding create codebook that defines each variable, its response scale, how it was coded Data entry for mail surveys and
More information13 Notes on Markov Chain Monte Carlo
13 Notes on Markov Chain Monte Carlo Markov Chain Monte Carlo is a big, and currently very rapidly developing, subject in statistical computation. Many complex and multivariate types of random data, useful
More informationStatistical Methods III Statistics 212. Problem Set 2 - Answer Key
Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423
More information