Models for Count and Binary Data. Poisson and Logistic GWR Models. 24/07/2008 GWR Workshop 1


1 Models for Count and Binary Data Poisson and Logistic GWR Models 24/07/2008 GWR Workshop 1

2 Outline
I: Modelling counts: Poisson regression
II: Modelling binary events: logistic regression
III: Poisson regression in GWR: premature mortality in Tokyo
IV: Logistic regression in GWR: landslides in Clearwater National Forest, Idaho

3 Background. Standard GWR uses OLS (Ordinary Least Squares) methods. These are not always the best option, because OLS assumes a Normal (Gaussian) error term.

4 Non-Normality (1). Count data need a model form that cannot predict negative values: the Poisson model. Examples: cases of an illness, sightings of a rare animal, number of crimes, number of earth tremors.

5 Non-Normality (2). Dichotomous (binary; yes/no; 1/0) outcomes need a model form that predicts the probability that an observation is 1, and hence must give probabilities between 0 and 1: the logistic, or binary logit, model. Examples: Does an individual have a disease or not? Is a house detached or not? Was a crime committed at this location within the past week?

6 I: Models for Count Data. Poisson Regression

7 The Poisson Model. Lambda is the expected count of objects given the conditions at location i. The betas are regression coefficients. The x's are predictor variables. Note that lambda will always be greater than or equal to zero.
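The formula on this slide was an image and is missing from the transcript. A standard log-linear form that matches the description above (an assumption, not recovered from the slide) would be:

```latex
y_i \sim \mathrm{Poisson}(\lambda_i), \qquad
\lambda_i = \exp\left(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_m x_{mi}\right)
```

Because the exponential of any real number is positive, a model of this form cannot predict a negative count.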

8 Offsets. Often we have a population at risk for count data, for example the population susceptible to a disease, or the number of households (for rates of household burglaries). For zone-based data, this quantity changes from zone to zone and we need to allow for this in our model. We do this by using an offset.

9 Adding the Offset. $P_i$ is a variable (not a parameter to calibrate). It represents the population at risk. Calibration of the betas is by an iterative process, so it takes longer!
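The offset formula on this slide is also missing from the transcript. Assuming the usual form $\lambda_i = P_i \exp(\beta_0 + \beta_1 x_{1i} + \dots)$, the iterative calibration can be sketched with iteratively reweighted least squares (IRLS). The function name, arguments and starting values below are illustrative choices, not taken from the workshop software.

```python
import numpy as np

def fit_poisson_irls(X, y, offset=None, max_iter=50, tol=1e-8):
    """Fit a log-link Poisson regression by IRLS (illustrative sketch).

    X      : (n, k) design matrix, including a column of ones for the intercept
    y      : (n,) observed counts
    offset : (n,) population at risk P_i, entering the model as log(P_i)
    Returns the coefficient vector and the number of iterations used.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if offset is None:
        log_offset = np.zeros(len(y))
    else:
        log_offset = np.log(np.asarray(offset, dtype=float))

    beta = np.zeros(X.shape[1])            # assumed starting values
    for iteration in range(1, max_iter + 1):
        eta = X @ beta + log_offset        # linear predictor including the offset
        mu = np.exp(eta)                   # current expected counts
        z = (eta - log_offset) + (y - mu) / mu   # working response, offset removed
        weights = mu                       # Poisson working weights
        XtW = X.T * weights
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, iteration
        beta = beta_new
    return beta, max_iter
```

Each pass solves a weighted least squares problem, which is why a Poisson fit costs several Gaussian fits.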

10 Example. Household burglary counts for 43 Police Forces in the UK (the $y_i$'s). Offset = number of households ($P_i$). Predictors (these are the $x_{ij}$'s): population density (persons/sq km); unemployed males aged … as % of total population.

11 Results
Parameter     Estimate   Std. Error   z value   P-value
(Intercept)   …          …            …         <0.001
Youngunemp    9.601e…    …            …         <0.001
Density       1.884e…    …            …         <…

12 Question. Why are the z-values so high? Sometimes this is due to the variation in the counts being greater than expected for a Poisson distribution (overdispersion). Maybe this is because the parameters vary over space?
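One common way to check whether the counts vary more than a Poisson model expects is the Pearson dispersion statistic. The sketch below is a generic diagnostic, not something shown on the slides; the function name is hypothetical.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by the residual degrees of freedom.

    Values near 1 are consistent with a Poisson model; values well above 1
    suggest overdispersion, in which case the reported standard errors are
    too small and the z-values too large.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / dof
```

For the burglary example this would be called with the 43 observed counts, the 43 fitted counts and `n_params=3`.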

13 Geographically Weighted Poisson Regression. The above is still a global model. Question: do the same relationships hold everywhere? Is the linkage between unemployed young males and burglary the same everywhere? We need to extend the previous model to a geographically weighted version.

14 Geographically Weighted Poisson Model. $u_i$ and $v_i$ are the coordinates of observation i.
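The model equation on this slide is missing from the transcript. By analogy with the geographically weighted logistic model given later in the deck, a plausible form (an assumption) is the Poisson model with an offset, with each coefficient replaced by a function of location:

```latex
\lambda_i = P_i \exp\left(\beta_0(u_i, v_i) + \beta_1(u_i, v_i)\, x_{1i} + \beta_2(u_i, v_i)\, x_{2i} + \dots\right)
```

Setting every $\beta_j(u_i, v_i)$ to a constant $\beta_j$ would recover the global model.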

15 Results for Burglary Data

16 II: Models for Binary Data. Logistic Regression

17 Logistic Models. $\Pr(y_i = 1) = p_i$ and $\Pr(y_i = 0) = 1 - p_i$. Here, $p_i$ is the mean of the distribution for a dichotomous $y_i$, where $y_i$ is the dependent variable. We need to pay attention to $p_i$: this depends on the explanatory variables. How does $p_i$ depend on $(x_{1i}, x_{2i}, \dots, x_{mi})$?

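18 The logistic model. $p_i = \mathrm{logit}(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots)$, where $\mathrm{logit}(z) = \dfrac{\exp(z)}{1 + \exp(z)}$. (This function is more commonly called the logistic, or inverse-logit, function; these slides use the name logit for it.)

19 Graph of the logit function [figure: logit(x) plotted against x]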

20 Alternative form. $\log\left(\dfrac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots$, where the left-hand side of the equation is the log odds for $y_i = 1$.
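The two forms are inverses of one another, which a few lines of code can confirm. This is a minimal sketch; the function names are illustrative, and `logistic` is the function the slides call logit.

```python
import math

def logistic(z):
    """exp(z) / (1 + exp(z)), arranged so that large |z| does not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_odds(p):
    """log(p / (1 - p)), the left-hand side of the alternative form."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

# Applying one function and then the other returns the starting value.
z = 1.3
round_trip = log_odds(logistic(z))
```

Whatever value the linear predictor takes, `logistic` maps it into the interval from 0 to 1, which is what a probability model requires.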

21 Interpreting the parameters. As with Poisson, parameters make more sense if you take antilogs: we can write $\dfrac{p_i}{1 - p_i} = \exp(\beta_0)\exp(\beta_1 x_{1i})\exp(\beta_2 x_{2i})\dots$ Each $\exp(\beta_j)$ gives a multiplicative factor for the odds that $y_i = 1$ when the corresponding predictor increases by 1 unit. Note that the multiplicative factor is for the ODDS, not the PROBABILITY.
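A small numerical illustration of the odds-versus-probability point follows. The coefficient value is made up for the example; it is not one of the workshop results.

```python
import math

def shift_probability(p, beta_j, delta=1.0):
    """Probability after the predictor rises by `delta` units.

    The odds p / (1 - p) are multiplied by exp(beta_j * delta);
    the new odds are then converted back to a probability.
    """
    odds = p / (1.0 - p)
    new_odds = odds * math.exp(beta_j * delta)
    return new_odds / (1.0 + new_odds)

beta_j = math.log(3.0)   # hypothetical coefficient: the odds triple per unit
for p in (0.1, 0.5, 0.9):
    print(p, round(shift_probability(p, beta_j), 4))
```

The odds are tripled in every case, but the change in probability depends on the starting value, which is why exp(β) should not be read as a multiplier for the probability.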

22 An Example: Housing in the UK. Dependent variable: does a house have more than 1 bathroom? This is a dichotomous (binary) value. Independent variable: floor area (sq. m).

23 Results
Parameter   Estimate   Std. Error   z value   Pr(>|z|)
Intercept   …          …            …         <0.001
FloorArea   …          …            …         <0.001

Parameter   Exp(beta) as % increase   Pr(>|z|)
FloorArea   …                         <…

24 Geographically Weighted Logistic Regression. $\log\left(\dfrac{p_i}{1 - p_i}\right) = \beta_0(u_i, v_i) + \beta_1(u_i, v_i)\, x_{1i} + \beta_2(u_i, v_i)\, x_{2i} + \dots$, where $(u_i, v_i)$ are the coordinates of observation i. Note, as before, that the betas are functions of location, i.e. $\beta_0(u, v)$ and so on, rather than single coefficients; they are estimated using non-parametric methods.
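To make the idea of location-specific betas concrete, the sketch below fits a logistic regression at a single regression point, weighting each observation by its distance from that point. The Gaussian kernel, the fixed bandwidth and all names are assumptions for illustration; the slides do not say which kernel the software uses for this example.

```python
import numpy as np

def local_logistic_fit(X, y, coords, point, bandwidth, max_iter=25, tol=1e-8):
    """Geographically weighted logistic fit at one location (illustrative).

    X         : (n, k) design matrix including a column of ones
    y         : (n,) responses in [0, 1]
    coords    : (n, 2) observation coordinates (u_i, v_i)
    point     : (u, v) location at which the betas are estimated
    bandwidth : kernel bandwidth, in the same units as the coordinates
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    coords = np.asarray(coords, dtype=float)

    dist = np.hypot(coords[:, 0] - point[0], coords[:, 1] - point[1])
    geo_w = np.exp(-0.5 * (dist / bandwidth) ** 2)   # assumed Gaussian kernel

    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        p = np.clip(p, 1e-10, 1.0 - 1e-10)   # guard against fitted 0 or 1
        var = p * (1.0 - p)
        z = eta + (y - p) / var              # working response
        XtW = X.T * (geo_w * var)            # geographic x IRLS weights
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Repeating the fit at every regression point gives one set of betas per location, which is what gets mapped.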


26 Interpretation. Second bathrooms are more likely in houses in South Wales and southern England: a general North/South effect. Perhaps because property is more expensive in the south of the UK, people tend to add second bathrooms to smaller houses.

27 Issues. Convergence problems can arise if there are areas dominated by $y_i$'s all equal to zero or all equal to one. Take care with automatic bandwidth selection. Possibly the best policy in some cases is trial and error with manual bandwidth control.

28 Poisson and Logistic Regression in GWR

29 III: Poisson Regression: premature mortality in Tokyo. IV: Logistic Regression: landslides in Clearwater National Forest, Idaho

30 III Poisson GW Regression

31 Poisson Regression. To the user this looks as if it is implemented in GWR in almost the same way as ordinary Gaussian regression. The dependent variable must be a count variable (i.e. the values must be integers [whole numbers]).

32 Rates: Counts & Offsets. If you wish to model counts which relate to areas with a varying underlying population, you can do this with an offset variable: $y = n e^{X\beta}$ (n is the offset).

33 The Offset Variable. To keep the Model Editor simple, a variable entered as the weight variable in Poisson regression is treated as the offset.

34 Outputs. These are similar to those obtained for ordinary Gaussian regression. The interpretation of the parameters is slightly different.

35 The data. We will use data for 261 municipalities in the Tokyo Metropolitan Area. We are considering determinants of premature mortality. The independent variables are the proportions of the elderly, professionals, home owners, and the unemployed.

36 Offset. The premature mortality count will obviously vary according to the size of the zone. As an offset, we will use the expected number of premature deaths.


43 The data. The data are in the folder \SampleData\Tokyo. There are shapefiles for both the municipality and prefecture boundaries.

44 The data. There are some dBASE files with socioeconomic data. There is a data file for GWR3. There is a table with information on the various geographies; we will need this for mapping the GWR results.

45 The model editor. After you have selected the data, you complete the model editor thus. Notice the offset variable entered as the weight.

46 Patience. The Poisson model is fitted using a method known as iteratively reweighted least squares. This takes about 5 times as long as fitting a Gaussian model.

47 Header
***************************************************************
*                                                             *
*         GEOGRAPHICALLY WEIGHTED POISSON REGRESSION          *
*                                                             *
***************************************************************
Number of data cases read: 262
Sample data file read...
*Number of observations, nobs= 262
*Number of predictors, nvar= 4
Observation Easting extent:
Observation Northing extent:

48 Calibration
*Finding bandwidth using all regression points
This can take some time...
*Calibration will be based on 262 cases
*Adaptive kernel sample size limits:
*Crossvalidation begins...
Bandwidth CV Score
** Convergence after 9 function calls
** Convergence: Local Sample Size= 79
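With an adaptive kernel, the bandwidth is expressed as a number of nearest observations (the "Local Sample Size" in the listing) rather than a distance. The sketch below shows one way such weights could be computed, assuming a bi-square kernel whose reach is the distance to the N-th nearest observation; the kernel form and the function name are assumptions, not taken from the listing.

```python
import math

def adaptive_bisquare_weights(coords, point, n_neighbours):
    """Bi-square weights whose bandwidth adapts to the local data density.

    coords       : list of (x, y) observation coordinates
    point        : (x, y) regression point
    n_neighbours : local sample size, e.g. the value chosen by cross-validation
    """
    dists = [math.hypot(x - point[0], y - point[1]) for x, y in coords]
    if not 1 <= n_neighbours <= len(dists):
        raise ValueError("n_neighbours must be between 1 and the sample size")
    bandwidth = sorted(dists)[n_neighbours - 1]   # distance to N-th nearest
    if bandwidth == 0.0:
        raise ValueError("bandwidth is zero; increase n_neighbours")
    return [
        (1.0 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0
        for d in dists
    ]
```

In sparse areas the N-th nearest observation is far away, so the kernel widens; in dense areas it narrows.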

49 Global Model
********** Global Poisson Model Diagnostics **********
Convergence after 3 iterations
Log-likelihood:
Deviance (-2LogLikelihood):
Trace of the Hat Matrix:
Number of parameters in model:
Akaike Information Criterion:
Corrected AIC (AICc)
Bayesian Information Criterion:
Parameter Estimate Std Err T Exp(B) Sd(Exp(B))
Intercept
Professl
Elderly
OwnHome
Unemply
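The numerical values in this listing did not survive transcription, but the relationship between the diagnostics can still be illustrated. The sketch below uses the standard textbook definitions of AIC, AICc and BIC; whether GWR3 uses exactly these forms is an assumption.

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """Textbook AIC, AICc and BIC from a log-likelihood (illustrative)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc needs n_obs > n_params + 1")
    minus_2ll = -2.0 * log_likelihood
    aic = minus_2ll + 2.0 * n_params
    # Small-sample correction: the penalty grows as n_params approaches n_obs.
    aicc = aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    bic = minus_2ll + n_params * math.log(n_obs)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}
```

For a local model the number of parameters would be replaced by the effective number of parameters, which need not be an integer.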

50 Local Model
********** Local Poisson Model Diagnostics **********
Log Likelihood:
Deviance:
Trace of the hat matrix
Residual sum of squares
Effective number of parameters
Akaike Information Criterion
Corrected AIC
Bayesian Information Criterion

51 5-number summaries
**********************************************************
*              PARAMETER 5-NUMBER SUMMARIES              *
**********************************************************
Label Minimum Lwr Quartile Median Upr Quartile Maximum
Intrcept
Professl
Elderly
OwnHome
Unemply

52 Mapping. The municipality boundaries and the data used for the GWR model are in two different map projections. However, there is a lookup table to link the two attribute tables, so this time we do not use a spatial join.

53 GeogIndex.csv. The lookup table is a comma-separated values file. The ID field contains the IDs used in the municipality shapefile. The GWR_ID field contains the sequence numbers assigned by GWR3.
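The attribute join through the lookup table can be sketched outside the GIS as well. In the example below only the ID and GWR_ID field names come from the slide; the sample rows, the PARM column and the assumption that the results table carries a GWR_ID column are made up for illustration.

```python
import csv
import io

def attach_municipality_ids(lookup_rows, gwr_rows):
    """Add the shapefile ID to each GWR result row via its sequence number."""
    id_by_sequence = {row["GWR_ID"]: row["ID"] for row in lookup_rows}
    joined = []
    for row in gwr_rows:
        out = dict(row)
        out["ID"] = id_by_sequence[row["GWR_ID"]]
        joined.append(out)
    return joined

# Made-up sample content standing in for GeogIndex.csv and a results table.
lookup_csv = "ID,GWR_ID\n13101,1\n13102,2\n"
results_csv = "GWR_ID,PARM\n1,0.12\n2,-0.05\n"

lookup = list(csv.DictReader(io.StringIO(lookup_csv)))
results = list(csv.DictReader(io.StringIO(results_csv)))
joined = attach_municipality_ids(lookup, results)
```

Each joined row now carries the municipality ID, so it can be matched to the polygon attribute table by key rather than by location.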

54 TMABSU attributes

55 GeogIndex.csv

56 Tokyo point attributes

58 The result. You can now map the parameter values and the other diagnostics (map shown: Professionals).

59 IV Logistic GW Regression

60 Logistic Regression. You use logistic regression when your dependent variable is binary or dichotomous: the values should be 0 or 1. The independent variables can be continuous or binary-valued dummy variables.

61 Predicted π. The predicted value is the probability that the dependent variable is 1. It is continuous, lying between 0 and 1.

62 Landslides. In November 1995 and February 1996 there were some 865 landslides in Clearwater National Forest, Idaho. Gorsevski et al. (TGIS, 10(3)) suggested that topographic factors may influence landslide occurrence.

63 The Data. We have extracted data for a subset of 239 observations: 138 landslide sites and 101 control sites.

64 The Data. Our dataset also contains some topographic indicators for sites in Clearwater National Forest where there have been landslides. A similar number of locations where there have not been landslides have been chosen randomly in the study area: the control sites. A binary variable, Landslid, indicates whether the sample is a landslide or a control site, coded 1 or 0 respectively.

65 Landslide sites are yellow, control sites are black

66 Topographic variables. The variables have mostly been generated from a digital elevation model. This was created from Shuttle Radar Topography Mission data. The grid size is 25 m.

67 Predictor Variables. Elevation (metres); Slope (% - 0=flat, 10=vertical); Sine of the aspect; Cosine of the aspect; Absolute deviation from due south (degrees); Distance to the nearest watercourse (metres).
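Aspect is a circular quantity (359° and 1° face almost the same way), which is why it enters the model through its sine and cosine and through a deviation from south rather than as a raw angle. A sketch of how the three aspect-based predictors could be derived, assuming aspect is measured in degrees clockwise from north (the slides do not state the convention, and the names are illustrative):

```python
import math

def aspect_predictors(aspect_deg):
    """Derive the three aspect-based predictors from an aspect in degrees."""
    aspect = aspect_deg % 360.0          # wrap into the range [0, 360)
    radians = math.radians(aspect)
    return {
        "sin_aspect": math.sin(radians),       # east (+1) to west (-1)
        "cos_aspect": math.cos(radians),       # north (+1) to south (-1)
        "abs_dev_south": abs(180.0 - aspect),  # 0 = due south, 180 = due north
    }
```

Under this convention a south-facing slope has a deviation of 0 and a cosine of -1.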

68 The data file. The data file is named landslides.csv and is in the \SampleData\Clearwater folder. There is also a georeferenced scanned map of the study area called basemap.jpg.

69 Input and Output Files

70 Model Editor. For the first model use Landslid as the Dependent Variable, and Elev and Slope as the Independent Variables. Use an adaptive kernel; we have Cartesian coordinates.

71 Running the Model. As with Poisson regression, the model is fitted using iteratively reweighted least squares. This means it will take a little longer to run than an ordinary GWR, so be patient.

72 Control and Listing Files

73 Header
***************************************************************
*                                                             *
*         GEOGRAPHICALLY WEIGHTED LOGISTIC REGRESSION         *
*                                                             *
***************************************************************
Number of data cases read: 239
Sample data file read...
*Number of observations, nobs= 239
*Number of predictors, nvar= 2
P(Landslid=1 X) =
Observation Easting extent:
Observation Northing extent:
We have 239 observations with 2 predictors. The study area extent is 33.5km by …

74 Calibration
*Adaptive kernel sample size limits:
*AICc minimisation begins...
Bandwidth AICc
** Convergence after 8 function calls
** Convergence: Local Sample Size= 85
This is quite a large bandwidth; notice that there is not much variation between the AICc values for a range of bandwidths.

75 Global Model
********** Global Logistic Model Diagnostics **********
Convergence after 5 iterations
Log-likelihood:
Deviance (-2LogLikelihood):
Number of parameters in model:
Akaike Information Criterion:
Corrected AIC (AICc)
Bayesian Information Criterion:
Parameter Estimate Std Err T Exp(B) Sd(Exp(B))
Intercept
Elev
Slope
The global AICc is … The elevation and slope parameters are both significant: higher elevations decrease, and steeper slopes increase, the probability of a landslide.

76 Local Model
********** Local Logistic Model Diagnostics **********
Log Likelihood:
Deviance:
Residual sum of squares
Effective number of parameters
Akaike Information Criterion
Corrected AIC
Bayesian Information Criterion
There is a smaller AICc with the local model, so we have some improvement in fit.

77 5-number summaries
**********************************************************
*              PARAMETER 5-NUMBER SUMMARIES              *
**********************************************************
Label Minimum Lwr Quartile Median Upr Quartile Maximum
Intrcept
Elev
Slope
Most of the local elevation parameters are negative, and most of the local slope parameters are negative

78 Visualising. As before, you use ArcMap. The scanned basemap is basemap.jpg. The Interchange File must be converted to a coverage. Visualisation is as before.

79 Predicted/Observed. The residual is the difference between the observed y (0/1) and the predicted probability of 1. Values from −1 to −0.5 are locations where no landslide occurred but one has been predicted. Values from 0.5 to 1 are landslide sites where no landslide has been predicted; we might look at these to see what other characteristics they might have which are not in the model.
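The residual classes described on this slide can be written as a small rule. This is a sketch using breaks at −0.5 and 0.5; the labels and the function name are illustrative.

```python
def classify_residual(observed, predicted_prob):
    """Return the residual (observed minus predicted) and its class.

    observed       : 0 (control site) or 1 (landslide site)
    predicted_prob : fitted probability that the site is a landslide
    """
    if observed not in (0, 1):
        raise ValueError("observed must be 0 or 1")
    residual = observed - predicted_prob
    if residual < -0.5:
        label = "false positive"   # no landslide, but one was predicted
    elif residual > 0.5:
        label = "false negative"   # landslide, but none was predicted
    else:
        label = "consistent"       # prediction agrees with the observation
    return residual, label
```

Mapping the three classes makes clusters of misclassified sites easy to spot.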

80 The default symbology choices for the RESID variable: to change these, click on Classify

81 The blue bars representing the breaks are removed: click to highlight, then right-click to choose Delete Break

82 3 breaks at -0.5, 0.5 and the top value

83 Small cluster of 3 false negatives near Sheep Mountain Work Center

84 End of presentation


More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

BMI 541/699 Lecture 22

BMI 541/699 Lecture 22 BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based

More information

Logistic Regression - problem 6.14

Logistic Regression - problem 6.14 Logistic Regression - problem 6.14 Let x 1, x 2,, x m be given values of an input variable x and let Y 1,, Y m be independent binomial random variables whose distributions depend on the corresponding values

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Marginal models: based on the consequences of dependence on estimating model parameters.

More information

PASS Sample Size Software. Poisson Regression

PASS Sample Size Software. Poisson Regression Chapter 870 Introduction Poisson regression is used when the dependent variable is a count. Following the results of Signorini (99), this procedure calculates power and sample size for testing the hypothesis

More information

Applying cluster analysis to 2011 Census local authority data

Applying cluster analysis to 2011 Census local authority data Applying cluster analysis to 2011 Census local authority data Kitty.Lymperopoulou@manchester.ac.uk SPSS User Group Conference November, 10 2017 Outline Basic ideas of cluster analysis How to choose variables

More information
