A methodology to model the number of completeness errors using count data regression models


1 Malta, January 2015. International Workshop on Spatial Data and Map Quality.
José Rodríguez-Avi (1) and Francisco Javier Ariza-López (2)
(1) Dpto. Estadística e Investigación Operativa (Department of Statistics and Operations Research), jravi@ujaen.es
(2) Dpto. Ingeniería Cartográfica, Geodésica y Fotogrametría (Department of Cartographic, Geodetic and Photogrammetric Engineering), fjariza@ujaen.es
Universidad de Jaén, Paraje de las Lagunillas S/N, Jaén, Spain

2 Completeness errors are very important in a spatial data set. Completeness is related to the quality of the survey and to the age of the data (it is affected by time). Following ISO, we can distinguish:
Omissions: an element (such as a bridge, a building, a cross or a power plant) exists in the terrain but does not appear in the map.
Commissions: an element appears in the map but does not exist in the terrain.
The greater the number of these errors, the worse the quality of the product. We also want to obtain additional information about the product related to its structural aspects, and to investigate whether errors and structural aspects are related.
Figure: examples of omissions (a pool and a building).
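Once map features and terrain features have been matched, the two definitions above reduce to set differences within a tile. The sketch below assumes every matched object carries the same hypothetical identifier in both lists; in practice that matching is the manual, field-survey part of the work.

```python
def completeness_errors(terrain_ids, map_ids):
    """Count completeness errors in one tile.

    terrain_ids: identifiers of features found on the ground (field survey).
    map_ids: identifiers of features present in the map.
    Assumes (hypothetically) that a real-world object has the same
    identifier in both collections after matching.
    """
    terrain, mapped = set(terrain_ids), set(map_ids)
    omissions = len(terrain - mapped)    # on the terrain, missing from the map
    commissions = len(mapped - terrain)  # in the map, absent from the terrain
    return omissions, commissions


# Hypothetical tile: a pool and a building are not mapped,
# and a demolished shed is still in the map.
terrain = ["bridge_1", "pool_7", "building_2", "road_4"]
mapped = ["bridge_1", "road_4", "shed_9"]
print(completeness_errors(terrain, mapped))  # (2, 1)
```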

3 Linear regression models are commonly used when the response variable is continuous. In our case, the number of errors is not continuous but a count random variable, so it should be modelled with discrete distributions: Poisson, negative binomial, Waring, and so on. Additionally, structural aspects of the spatial data set may be taken into account as exogenous covariates to explain the number of errors. Hence the use of count data regression models.
Figure: Omissions.

4 Count Data Regression Models. We start with a dependent count variable (for instance, the number of omission errors in a tile) and add some structural information (covariates) for each tile. We define a cell as the set of tiles that have the same covariate values: all the tiles in a cell are indistinguishable in terms of covariates. Applying a count data regression model, we propose a residual discrete distribution for each cell, in such a way that the parameters of the discrete distribution depend on the covariates. We propose several models and choose the best one according to a selection criterion. Once the best model is selected, we can:
determine which covariates are related to the number of errors, and how;
obtain the parameters of the residual distribution for any cell and calculate any probability.
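Grouping tiles into cells is a simple keyed aggregation. This Python sketch uses hypothetical tile records with two binary covariates; with a continuous covariate such as density, most cells will contain a single tile.

```python
from collections import defaultdict


def group_into_cells(tiles, covariate_names):
    """Group tiles that share identical covariate values into cells.

    tiles: list of dicts holding covariate values and an "errors" count
    (hypothetical field names). Returns a dict mapping the tuple of
    covariate values to the list of error counts observed in that cell.
    """
    cells = defaultdict(list)
    for tile in tiles:
        key = tuple(tile[name] for name in covariate_names)
        cells[key].append(tile["errors"])
    return dict(cells)


# Hypothetical tiles with two binary covariates.
tiles = [
    {"urban": 1, "littoral": 0, "errors": 5},
    {"urban": 1, "littoral": 0, "errors": 8},
    {"urban": 0, "littoral": 1, "errors": 2},
    {"urban": 0, "littoral": 0, "errors": 0},
]
cells = group_into_cells(tiles, ["urban", "littoral"])
print(cells)  # {(1, 0): [5, 8], (0, 1): [2], (0, 0): [0]}
```

Each list is a sample from the residual distribution of its cell, which is what the regression model describes.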

5 Methodological procedure: Count Data Regression Models. Let $X_1, \dots, X_m$ be the set of covariates. Given $\mathbf{X} = \mathbf{x} = (x_1, \dots, x_m)$, the response variable $Y$ has a discrete distribution with $q$ parameters whose mean is a function of the $x_j$: $\ln \mu_{\mathbf{x}} = \beta_0 + \beta_1 x_1 + \dots + \beta_m x_m$, where $\beta_0, \dots, \beta_m$ are coefficients to be estimated. The remaining $q - 1$ parameters of the distribution are estimated independently of the covariates. Consequently, the number of parameters to be estimated is $m$ (number of covariates) + 1 (intercept) + $(q - 1)$.
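A minimal Python sketch of the mean of a cell and of the parameter count, assuming the log link $\ln \mu_{\mathbf{x}} = \beta_0 + \sum_j \beta_j x_j$. The coefficients are hypothetical; the count of 11 covariates corresponds to 7 province dummies plus Urban, Littoral, ln(features) and Density.

```python
import math


def cell_mean(intercept, coefs, x):
    """Mean of the residual distribution for a cell with covariates x.

    Assumes the log link: ln(mu_x) = beta_0 + sum_j beta_j * x_j.
    """
    if len(coefs) != len(x):
        raise ValueError("one coefficient per covariate is required")
    eta = intercept + sum(b * xj for b, xj in zip(coefs, x))
    return math.exp(eta)


def n_parameters(n_covariates, q):
    """Covariates + intercept + the (q - 1) parameters not tied to covariates."""
    return n_covariates + 1 + (q - 1)


# Hypothetical coefficients: mu = exp(0.5 + 0.7 * 1 - 0.03 * 10) = exp(0.9)
print(cell_mean(0.5, [0.7, -0.03], [1, 10]))
# q = 1 (Poisson), 2 (negative binomial), 3 (generalized Waring)
print([n_parameters(11, q) for q in (1, 2, 3)])  # [12, 13, 14]
```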

6 Methodological procedure: Count Data Regression Models. The coefficient $\beta_j$ shows the relation, if any, between the independent variable $X_j$ and the dependent variable $Y$. Using the properties of the maximum likelihood estimator (MLE), we can test $H_0\colon \beta_j = 0$ for each $\beta_j$ by dividing the estimated coefficient by its standard error, $z_j = \hat\beta_j / \widehat{\mathrm{se}}(\hat\beta_j)$, which is asymptotically standard normal under $H_0$ (the Wald test). If the coefficient can be considered equal to 0, we conclude that the corresponding variable is not related to $Y$. Otherwise, both variables are related, and the sign of the coefficient indicates whether the relation is positive (if $X_j$ increases, the mean of $Y$ increases) or negative (if $X_j$ increases, the mean of $Y$ decreases).
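The Wald test needs only the estimate, its standard error and the normal tail probability. A small Python sketch; the estimate and standard error below are hypothetical.

```python
import math


def wald_test(estimate, std_error):
    """Wald test of H0: beta_j = 0.

    Returns the z statistic and the two-sided p-value, using the
    asymptotic normality of the ML estimator.
    """
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical estimate and standard error, for illustration only.
z, p = wald_test(0.66, 0.25)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value rejects $\beta_j = 0$; the sign of $z$ then gives the direction of the relation.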

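7 Count Data Regression Models. Statistical models: residual density.
Equidispersion, Poisson Regression Model (PRM):
$P(Y = y \mid \mathbf{x}) = e^{-\mu_{\mathbf{x}}}\,\dfrac{\mu_{\mathbf{x}}^{y}}{y!}$
Overdispersion, Negative Binomial Regression Model (NBRM):
$P(Y = y \mid \mathbf{x}) = \dfrac{\Gamma(k+y)}{\Gamma(k)\,y!}\left(\dfrac{k}{k+\mu_{\mathbf{x}}}\right)^{k}\left(\dfrac{\mu_{\mathbf{x}}}{k+\mu_{\mathbf{x}}}\right)^{y}$
Overdispersion, Generalized Waring Regression Model (GWRM):
$P(Y = y \mid \mathbf{x}) = \dfrac{\Gamma(a_{\mathbf{x}}+\rho)\,\Gamma(k+\rho)}{\Gamma(\rho)\,\Gamma(a_{\mathbf{x}}+k+\rho)}\,\dfrac{(a_{\mathbf{x}})_{y}\,(k)_{y}}{(a_{\mathbf{x}}+k+\rho)_{y}}\,\dfrac{1}{y!}$, where $(\alpha)_{y} = \Gamma(\alpha+y)/\Gamma(\alpha)$ and $y = 0, 1, 2, \dots$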
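The three residual densities can be evaluated on the log scale with `math.lgamma` to avoid overflow. This Python sketch assumes the negative binomial is parameterised by its mean $\mu$ and $k$ (variance $\mu + \mu^2/k$) and the generalized Waring by $\mu = a k/(\rho - 1)$ with $\rho > 1$; the parameter values in the check are hypothetical.

```python
import math


def poisson_pmf(y, mu):
    """Poisson probability P(Y = y) with mean mu > 0."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def negbin_pmf(y, mu, k):
    """Negative binomial with mean mu and variance mu + mu**2 / k."""
    return math.exp(
        math.lgamma(k + y) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu))
    )


def gw_pmf(y, mu, k, rho):
    """Generalized Waring with mean mu = a * k / (rho - 1); requires rho > 1."""
    a = mu * (rho - 1.0) / k
    # Gamma(a+rho)Gamma(k+rho) / (Gamma(rho)Gamma(a+k+rho)) times the Pochhammer
    # ratio (a)_y (k)_y / (a+k+rho)_y / y!; the Gamma(a+k+rho) terms cancel.
    log_p = (
        math.lgamma(a + rho) + math.lgamma(k + rho) - math.lgamma(rho)
        + math.lgamma(a + y) - math.lgamma(a)
        + math.lgamma(k + y) - math.lgamma(k)
        - math.lgamma(a + k + rho + y) - math.lgamma(y + 1)
    )
    return math.exp(log_p)


# Each pmf should sum to (almost) 1; hypothetical mu = 3, k = 2, rho = 6.
print(sum(poisson_pmf(y, 3.0) for y in range(100)))
print(sum(negbin_pmf(y, 3.0, 2.0) for y in range(300)))
print(sum(gw_pmf(y, 3.0, 2.0, 6.0) for y in range(3000)))
```

The generalized Waring tail decays only polynomially (like $y^{-(\rho+1)}$), which is why its check needs many more terms than the other two.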

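8 Count Data Regression Models. Statistical models: estimation of parameters (by maximum likelihood).
Equidispersion, PRM: $\mu_{\mathbf{x}} = \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_m x_m)$.
Overdispersion, NBRM: $\mu_{\mathbf{x}} = \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_m x_m)$, with $k$ estimated independently of the covariates.
Overdispersion, GWRM: $\mu_{\mathbf{x}} = \dfrac{a_{\mathbf{x}}\,k}{\rho - 1} = \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_m x_m)$, with $\rho > 1$, so that $a_{\mathbf{x}} = \mu_{\mathbf{x}}(\rho - 1)/k$; $k$ and $\rho$ are estimated independently of the covariates.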
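To make the estimation step concrete for the simplest case, here is a pure-Python Newton-Raphson maximum likelihood fit of a Poisson regression with log link (for this canonical link it coincides with Fisher scoring, i.e. the IRLS that R's `glm` uses). It is a sketch only: it does not cover the extra $k$ and $\rho$ parameters of the NBRM and GWRM, and the data are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def poisson_means(X, beta):
    return [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]


def fisher_information(X, mu):
    p = len(X[0])
    return [[sum(m * row[j] * row[c] for row, m in zip(X, mu)) for c in range(p)]
            for j in range(p)]


def fit_poisson(X, y, max_iter=100, tol=1e-10):
    """ML fit of ln(mu_i) = x_i' beta by Newton-Raphson.

    X: rows of covariate values, each starting with 1.0 for the intercept.
    Returns (coefficients, standard errors).
    """
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # start at the mean
    for _ in range(max_iter):
        mu = poisson_means(X, beta)
        score = [sum((yi - m) * row[j] for row, yi, m in zip(X, y, mu))
                 for j in range(p)]
        step = solve(fisher_information(X, mu), score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    info = fisher_information(X, poisson_means(X, beta))
    # Standard errors: square roots of the diagonal of the inverse information
    se = [math.sqrt(solve(info, [1.0 if i == j else 0.0 for i in range(p)])[j])
          for j in range(p)]
    return beta, se


# Hypothetical data: one covariate plus intercept.
X = [[1.0, float(x)] for x in range(6)]
y = [1, 2, 2, 4, 6, 9]
beta, se = fit_poisson(X, y)
print(beta, se)
```

At convergence the score equations hold, so the fitted means reproduce the observed total count, a handy sanity check for any PRM fit.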

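9 Count Data Regression Models. Statistical models: residual variance.
Equidispersion, PRM: $\operatorname{Var}(Y \mid \mathbf{x}) = \mu_{\mathbf{x}}$
Overdispersion, NBRM: $\operatorname{Var}(Y \mid \mathbf{x}) = \mu_{\mathbf{x}}\left(1 + \dfrac{\mu_{\mathbf{x}}}{k}\right)$
Overdispersion, GWRM: $\operatorname{Var}(Y \mid \mathbf{x}) = \mu_{\mathbf{x}}\,\dfrac{k + \rho - 1}{\rho - 2}\left(1 + \dfrac{\mu_{\mathbf{x}}}{k}\right)$, for $\rho > 2$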
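The three variance functions differ in how fast the variance grows with the mean. The Python sketch below evaluates them for hypothetical $k$ and $\rho$, and adds the sample dispersion index (variance over mean), a common descriptive check: values well above 1 suggest that the Poisson model's equidispersion is not tenable.

```python
def var_poisson(mu):
    return mu


def var_negbin(mu, k):
    return mu * (1.0 + mu / k)


def var_gw(mu, k, rho):
    """Generalized Waring variance; finite only for rho > 2."""
    if rho <= 2:
        raise ValueError("the GW variance is finite only for rho > 2")
    return mu * (k + rho - 1.0) / (rho - 2.0) * (1.0 + mu / k)


def dispersion_index(counts):
    """Sample variance / sample mean; values well above 1 suggest overdispersion."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


# Hypothetical k and rho: the three variance functions at increasing means.
for mu in (1.0, 5.0, 20.0):
    print(mu, var_poisson(mu), var_negbin(mu, k=2.0), var_gw(mu, k=2.0, rho=6.0))
```

As $\rho \to \infty$ the factor $(k+\rho-1)/(\rho-2)$ tends to 1 and the GWRM variance approaches the NBRM one.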

10 Count Data Regression Models. Model selection: the Akaike Information Criterion, $AIC = -2 \ln L + 2p$, which is based on the log-likelihood $\ln L$ of the model fitted by the ML method, with a penalty related to the number of parameters $p$ of the model. The model with the lower AIC is preferred; consequently, once all the models are fitted, we select the one with the lowest AIC. The AIC only has comparative value: its absolute value means nothing by itself.
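AIC-based selection is a minimum over penalised log-likelihoods. In this Python sketch the log-likelihoods and parameter counts are hypothetical and chosen to show the penalty at work: the GWRM has the highest log-likelihood but loses to the NBRM because of its extra parameter.

```python
def aic(loglik, n_params):
    """Akaike Information Criterion: -2 ln L + 2 p."""
    return -2.0 * loglik + 2.0 * n_params


def select_by_aic(fits):
    """fits maps a model name to (log-likelihood, number of parameters).

    Returns the name of the model with the lowest AIC and all AIC values.
    """
    scores = {name: aic(ll, p) for name, (ll, p) in fits.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fits, for illustration only (not the values of the study).
fits = {"PRM": (-300.0, 3), "NBRM": (-250.0, 4), "GWRM": (-249.5, 5)}
best, scores = select_by_aic(fits)
print(best, scores)  # NBRM {'PRM': 606.0, 'NBRM': 508.0, 'GWRM': 509.0}
```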

11 Count Data Regression Models. Model selection. To test whether a set of variables must be included in or excluded from a model, we propose the likelihood ratio test (LRT) in its asymptotic version. If we denote by Model 1 the one with fewer covariates and by Model 2 the one with more covariates, the LRT statistic is $LRT = 2(\ln L_2 - \ln L_1)$, where $L_1$ and $L_2$ are the likelihood values of the estimated Models 1 and 2. The $LRT$ value is asymptotically distributed as a $\chi^2_f$, where $f$ is the difference between the numbers of estimated parameters in the two models.
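The LRT only needs the two log-likelihoods and a chi-square tail probability. To stay dependency-free, this Python sketch computes the chi-square survival function for integer degrees of freedom through the regularized incomplete gamma recurrence; the log-likelihood values in the example are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square variable with integer df >= 1.

    Evaluates the regularized upper incomplete gamma Q(df/2, x/2) through the
    recurrence Q(s + 1, t) = Q(s, t) + t**s * exp(-t) / Gamma(s + 1).
    """
    if x <= 0:
        return 1.0
    t = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-t)             # Q(1, t)
    else:
        s, q = 0.5, math.erfc(math.sqrt(t))  # Q(1/2, t)
    while s < df / 2.0:
        q += math.exp(s * math.log(t) - t - math.lgamma(s + 1.0))
        s += 1.0
    return q


def likelihood_ratio_test(loglik_small, loglik_large, df):
    """LRT for nested models; df = difference in the number of parameters."""
    stat = 2.0 * (loglik_large - loglik_small)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods: Model 1 without two covariates, Model 2 with them.
stat, p = likelihood_ratio_test(-250.0, -245.0, df=2)
print(f"LRT = {stat:.1f}, p = {p:.4f}")  # LRT = 10.0, p = 0.0067
```

The test is only valid when Model 1 is nested in Model 2, i.e. obtained by dropping covariates from it.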

12 Count Data Regression Models. Software. The three models can be fitted with the statistical package R: the glm function of the stats package (with family = poisson) for the PRM, the glm.nb function of the MASS package for the NBRM, and the GWRM package for the GWRM. In all cases, these programs provide parameter estimates with Wald tests, log-likelihood values and AIC values.

13 The Data for the Analysis. This study is based on an actual case in which a published GDS was assessed by means of a field survey. The GDS is the MTA10v (from Mapa Topográfico de Andalucía E10k vectorial), which is the official cartography of Andalusia (Spain). It is a map series produced between 1987 and 2007 by the Instituto Cartográfico de Andalucía (nowadays Instituto de Estadística y Cartografía de Andalucía). The MTA10v is a topographic vector database derived from a topographic paper map designed at the beginning of the 1980s. It is the base for the different thematic datasets and maps of the Regional Government of Andalusia. It has complete territorial coverage at a semi-detailed scale (1:10,000) and is updated sheet by sheet in a four-year cycle. It is composed of 2745 sheets obtained by manual photogrammetric restitution of flights at 1:25,000 scale and updated with flights at 1:20,000 scale.

14 The Data for the Analysis. Because of the large area covered by the MTA10v (87,000 km²), the region is divided into four quadrants (ICA, 2002b), so that, under the cyclic updating strategy, a quarter of the region is updated each year. The MTA10v is in the UTM zone 30N projected coordinate system, referenced to the ED50 datum. The declared positional accuracy of the MTA10v is an RMSE of 3 m (Corral and García, 2000). An independent accuracy study (Ariza-López, 2005), based on a random sample of 930 points surveyed by means of differential GPS in fast-static mode, reports a positional accuracy of 10.65 m at the 95% confidence level.

15 The Data for the Analysis. Figure: general view of the content of the MTA10v.

16 The Data for the Analysis. Count of elements per layer in the MTA10v series. Layers and the elements they include:
Edifications: buildings, storage (except water), shed, unique building, antenna, spotlight, transformer building, residential blocks, railway stations, port, airport, airfield, heliport, electrical substation, pumping station, fence, cave, etc.
Hydrography: river, stream, runway, tide, reservoir, sea, lake, etc.
Energy infrastructure: electricity pylon, transformer, power line, pipeline, etc.
Water infrastructure: canal, ditch, water supply, water systems, dam, water reservoir, water tank, water treatment plant, pool, well, fountain, spring, siphon, etc.
Communication routes: street, road, crossing, highway, junctions, footpath, firebreak, railway (all types), tunnel, bridge, cable cars, funicular, chairlift, lift, etc.
Vegetation: overstory, parterre, garden, golf course, etc.

17 The Data for the Analysis. Figure: density map (count of elements per 1 × 1 km² tile).

18 Figure: spatial distribution of the sample of 192 clusters.

19 Application. We count completeness errors (omissions and commissions of features) in the 192 analysis items (1 × 1 km² spatial tiles). Each item is a sampled cluster in which both the MTA10v and the ground truth were exhaustively revised and cross-compared, and the differences in presences and absences were exhaustively counted. The dependent random variables are the number of omissions per tile and the number of commissions per tile.
Figure: Omissions.
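Once errors are located, building the two dependent variables is a per-tile tally. This Python sketch assumes, hypothetically, that each recorded error has UTM coordinates in metres and that tiles follow the 1000 m UTM grid; the field-surveyed tiles in the study are not exactly 1 × 1 km², so this is an idealisation.

```python
from collections import Counter


def tile_key(easting, northing, size=1000.0):
    """Index of the size x size metre tile containing a UTM point."""
    return int(easting // size), int(northing // size)


def count_errors_per_tile(errors):
    """errors: iterable of (kind, easting, northing), kind "omission" or "commission".

    Returns two Counters with the omissions and commissions per tile.
    """
    omissions, commissions = Counter(), Counter()
    for kind, e, n in errors:
        key = tile_key(e, n)
        if kind == "omission":
            omissions[key] += 1
        elif kind == "commission":
            commissions[key] += 1
        else:
            raise ValueError(f"unknown error kind: {kind}")
    return omissions, commissions


# Hypothetical error records (UTM zone 30N coordinates, metres).
errors = [
    ("omission", 430250.0, 4180900.0),
    ("omission", 430990.0, 4180010.0),
    ("commission", 431500.0, 4180500.0),
]
omissions, commissions = count_errors_per_tile(errors)
print(omissions, commissions)
```

A `Counter` returns 0 for tiles without errors, but those sampled error-free tiles must still be kept as observations: zeros are valid counts for the regression models.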

20 Application. Covariates:
Province: categorical variable with 8 values, coded as 7 dummy variables, Cádiz (CA), Córdoba (CO), Granada (GR), Huelva (HU), Jaén (JA), Málaga (MA) and Seville (SE), with Almería as the reference value.
Urban: discriminates between urban (1) and rural (0) tiles.
Littoral: discriminates between littoral (1) and interior (0) tiles.
Features: quantitative variable giving the number of features per tile. It enters the model as ln(features), so its effect is that of an offset (an exposure term).
Density: quantitative variable giving the density of features per hectare/km². Because the field-surveyed tiles are not exactly 1 × 1 km², it is introduced as a correction factor.
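The covariates above translate into one row of the design matrix per tile. A Python sketch with Almería as the reference level; the ASCII province labels, argument names and example values are hypothetical.

```python
import math

# Seven dummies; Almeria is the reference level (all dummies equal to 0).
PROVINCES = ["Cadiz", "Cordoba", "Granada", "Huelva", "Jaen", "Malaga", "Seville"]


def design_row(province, urban, littoral, features, density):
    """Covariate vector, intercept first, for one tile.

    Order: intercept, 7 province dummies, Urban, Littoral, ln(features), Density.
    """
    if province != "Almeria" and province not in PROVINCES:
        raise ValueError(f"unknown province: {province}")
    if features <= 0:
        raise ValueError("ln(features) needs at least one feature in the tile")
    dummies = [1.0 if province == p else 0.0 for p in PROVINCES]
    return ([1.0] + dummies
            + [float(urban), float(littoral), math.log(features), float(density)])


# Hypothetical tile: urban, interior, 120 features, density 1.2.
row = design_row("Jaen", urban=1, littoral=0, features=120, density=1.2)
print(len(row))  # 12
```

Dropping one province as reference avoids perfect collinearity with the intercept; each dummy coefficient then compares that province with Almería.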

21 Application. Figure: Omissions.

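22 Application. Omissions.
Model: Y ~ Province + Urban + Littoral + ln(Features) + Density
Mean expression: $\ln \mu_{\mathbf{x}} = \beta_0 + \sum_{j} \beta_{j}\,\mathrm{Province}_{j} + \beta_U\,\mathrm{Urban} + \beta_L\,\mathrm{Littoral} + \beta_F \ln(\mathrm{Features}) + \beta_D\,\mathrm{Density}$
Summary of model fits (variables in every model: Province + Urban + Littoral + ln(features) + Density; columns: log-likelihood, AIC, number of parameters):
Poisson Regression Model: 1148,
Negative Binomial Regression Model
Generalized Waring Regression Model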

23 Coefficients for the GWRM (omissions). Columns: estimate, standard deviation, z, p-value. Covariates: (Intercept); Cádiz, Córdoba, Granada, Huelva, Jaén, Málaga and Seville (province); Urban; Littoral; log(features); Density.


25 Application. Provided that the remaining covariates are equal, the effect of each qualitative covariate may be analysed in terms of odds ratios (for these count models, ratios of means), defined as $OR_j = \mu_{x_j = 1} / \mu_{x_j = 0} = e^{\beta_j}$. Using these ratios, we observe that:
The mean number of errors in littoral tiles is 94.72% higher than in interior tiles: $\mu_{\mathrm{Littoral}} / \mu_{\mathrm{Interior}} = 1.9472$.
Urban tiles have a mean number of errors 124.25% higher than rural tiles: $\mu_{\mathrm{Urban}} / \mu_{\mathrm{Rural}} = 2.2425$.
For continuous covariates, again provided that the remaining covariates are equal:
If the number of features in a tile increases by 1%, the mean increases by approximately $\beta_F$% (the coefficient of ln(features)).
If Density in a tile increases by 1, the mean decreases by 3.28%.
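Under the log link these interpretations follow from $e^{\beta}$: a 0/1 covariate multiplies the mean by $e^{\beta}$, a unit increase in Density does the same, and a 1% increase in a log-entered covariate multiplies it by $1.01^{\beta}$. This Python sketch back-calculates the coefficients implied by the percentages quoted above; they are derived from those percentages, not read from the coefficient table.

```python
import math


def pct_change_dummy(beta):
    """% change in the mean when a 0/1 covariate switches from 0 to 1
    (also the % change per unit increase of a linearly entered covariate)."""
    return 100.0 * (math.exp(beta) - 1.0)


def pct_change_log_covariate(beta, pct=1.0):
    """% change in the mean when a log-entered covariate grows by pct %."""
    return 100.0 * ((1.0 + pct / 100.0) ** beta - 1.0)


def beta_from_pct(pct):
    """Coefficient implied by a reported % change (inverse of pct_change_dummy)."""
    return math.log(1.0 + pct / 100.0)


# Coefficients implied by the percentages reported above.
for label, pct in [("Littoral", 94.72), ("Urban", 124.25), ("Density (+1)", -3.28)]:
    b = beta_from_pct(pct)
    print(f"{label}: beta = {b:.4f}, ratio of means = {math.exp(b):.4f}")
```

For small percentage changes, $1.01^{\beta} - 1 \approx 0.01\,\beta$, which is why a 1% increase in features raises the mean by roughly $\beta_F$%.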

26 Figure: estimated count of errors for each tile.

27 Figure: probability map for the case of having 10 or more grown features per tile.

28 Figure: probability maps for the case of having 10% or more grown elements per tile.

29 Figure: evolution of the variance and partition of the variance versus the mean (absolute values).

30 Figure: evolution of the variance and partition of the variance versus the mean (% values).

31 Figure: nomogram showing the relation between the count of errors, probability and % of population.

32 Figure: frequency of the number of commission errors.

33 Summary of model fits (dependent variable: commissions). Models: Poisson Regression Model, Negative Binomial Regression Model, Generalized Waring Regression Model. Variables in every model: Province + Urban + Littoral + ln(features) + Density. Columns: log-likelihood, AIC, number of parameters.

34 Coefficients (dependent variable: commissions). Columns: estimate, std. error, z value, Pr(>|z|). Covariates: (Intercept); Cádiz, Córdoba, Granada, Huelva, Jaén, Málaga and Seville (province); Urban; Littoral; log(features); Density.


More information

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models Optimum Design for Mixed Effects Non-Linear and generalized Linear Models Cambridge, August 9-12, 2011 Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

KENTUCKY HAZARD MITIGATION PLAN RISK ASSESSMENT

KENTUCKY HAZARD MITIGATION PLAN RISK ASSESSMENT KENTUCKY HAZARD MITIGATION PLAN RISK ASSESSMENT Presentation Outline Development of the 2013 State Hazard Mitigation Plan Risk Assessment Determining risk assessment scale Census Data Aggregation Levels

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response) Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV

More information

Generalized Linear Mixed-Effects Models. Copyright c 2015 Dan Nettleton (Iowa State University) Statistics / 58

Generalized Linear Mixed-Effects Models. Copyright c 2015 Dan Nettleton (Iowa State University) Statistics / 58 Generalized Linear Mixed-Effects Models Copyright c 2015 Dan Nettleton (Iowa State University) Statistics 510 1 / 58 Reconsideration of the Plant Fungus Example Consider again the experiment designed to

More information

Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study

Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study 1.4 0.0-6 7 8 9 10 11 12 13 14 15 16 17 18 19 age Model 1: A simple broken stick model with knot at 14 fit with

More information

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00 Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section

More information

2. Linear regression with multiple regressors

2. Linear regression with multiple regressors 2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

More information

1.5 Testing and Model Selection

1.5 Testing and Model Selection 1.5 Testing and Model Selection The EViews output for least squares, probit and logit includes some statistics relevant to testing hypotheses (e.g. Likelihood Ratio statistic) and to choosing between specifications

More information

Chapter 22: Log-linear regression for Poisson counts

Chapter 22: Log-linear regression for Poisson counts Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure

More information

High-Throughput Sequencing Course

High-Throughput Sequencing Course High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an

More information

Digital Change Detection Using Remotely Sensed Data for Monitoring Green Space Destruction in Tabriz

Digital Change Detection Using Remotely Sensed Data for Monitoring Green Space Destruction in Tabriz Int. J. Environ. Res. 1 (1): 35-41, Winter 2007 ISSN:1735-6865 Graduate Faculty of Environment University of Tehran Digital Change Detection Using Remotely Sensed Data for Monitoring Green Space Destruction

More information

R Package glmm: Likelihood-Based Inference for Generalized Linear Mixed Models

R Package glmm: Likelihood-Based Inference for Generalized Linear Mixed Models R Package glmm: Likelihood-Based Inference for Generalized Linear Mixed Models Christina Knudson, Ph.D. University of St. Thomas user!2017 Reviewing the Linear Model The usual linear model assumptions:

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Land Administration and Cadastre

Land Administration and Cadastre Geomatics play a major role in hydropower, land and water resources and other infrastructure projects. Lahmeyer International s (LI) worldwide projects require a wide range of approaches to the integration

More information

ECON 5350 Class Notes Functional Form and Structural Change

ECON 5350 Class Notes Functional Form and Structural Change ECON 5350 Class Notes Functional Form and Structural Change 1 Introduction Although OLS is considered a linear estimator, it does not mean that the relationship between Y and X needs to be linear. In this

More information

Urban form, resource intensity & renewable energy potential of cities

Urban form, resource intensity & renewable energy potential of cities Urban form, resource intensity & renewable energy potential of cities Juan J. SARRALDE 1 ; David QUINN 2 ; Daniel WIESMANN 3 1 Department of Architecture, University of Cambridge, 1-5 Scroope Terrace,

More information

Application of Poisson and Negative Binomial Regression Models in Modelling Oil Spill Data in the Niger Delta

Application of Poisson and Negative Binomial Regression Models in Modelling Oil Spill Data in the Niger Delta International Journal of Science and Engineering Investigations vol. 7, issue 77, June 2018 ISSN: 2251-8843 Application of Poisson and Negative Binomial Regression Models in Modelling Oil Spill Data in

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

Introduction to the Generalized Linear Model: Logistic regression and Poisson regression

Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk November 4, 2013 Gilles Guillot (gigu@dtu.dk)

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,

More information

DEVELOPMENT OF CRASH PREDICTION MODEL USING MULTIPLE REGRESSION ANALYSIS Harshit Gupta 1, Dr. Siddhartha Rokade 2 1

DEVELOPMENT OF CRASH PREDICTION MODEL USING MULTIPLE REGRESSION ANALYSIS Harshit Gupta 1, Dr. Siddhartha Rokade 2 1 DEVELOPMENT OF CRASH PREDICTION MODEL USING MULTIPLE REGRESSION ANALYSIS Harshit Gupta 1, Dr. Siddhartha Rokade 2 1 PG Student, 2 Assistant Professor, Department of Civil Engineering, Maulana Azad National

More information

GIS modelling of intermodal networks: a comparison of two methods

GIS modelling of intermodal networks: a comparison of two methods Urban Transport XXI 475 GIS modelling of intermodal networks: a comparison of two methods J. G. Moreno-Navarro 1, A. Medianero-Coza 1 & I. Hilal 2 1 Department of Geography, University of Seville, Spain

More information

Comparing CORINE Land Cover with a more detailed database in Arezzo (Italy).

Comparing CORINE Land Cover with a more detailed database in Arezzo (Italy). Comparing CORINE Land Cover with a more detailed database in Arezzo (Italy). Javier Gallego JRC, I-21020 Ispra (Varese) ITALY e-mail: javier.gallego@jrc.it Keywords: land cover, accuracy assessment, area

More information

Poisson Regression. The Training Data

Poisson Regression. The Training Data The Training Data Poisson Regression Office workers at a large insurance company are randomly assigned to one of 3 computer use training programmes, and their number of calls to IT support during the following

More information

D.N.D. Hettiarachchi (Hetti) Survey Department, Sri Lanka.

D.N.D. Hettiarachchi (Hetti) Survey Department, Sri Lanka. ADMINISTRATION OF GEOGRAPHICAL NAMES IN SRI LANKA D.N.D. Hettiarachchi (Hetti) Survey Department, Sri Lanka. hettiarachchidnd@gmail.com Country: Sri Lanka What is the official language(s)? Sinhala and

More information

Logistic Regression. Continued Psy 524 Ainsworth

Logistic Regression. Continued Psy 524 Ainsworth Logistic Regression Continued Psy 524 Ainsworth Equations Regression Equation Y e = 1 + A+ B X + B X + B X 1 1 2 2 3 3 i A+ B X + B X + B X e 1 1 2 2 3 3 Equations The linear part of the logistic regression

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

PRESENTATION OUTLINE. FIG Working Week INTRODUCTION 2.0 LITERATURE REVIEW 3.0 RESEARCH METHODOLOGY 4.0 RESULTS AND DISCUSSION

PRESENTATION OUTLINE. FIG Working Week INTRODUCTION 2.0 LITERATURE REVIEW 3.0 RESEARCH METHODOLOGY 4.0 RESULTS AND DISCUSSION Geospatial Techniques in Water Distribution Network Mapping and Modelling in Warri Port Complex (Nigeria) PRESENTED BY Henry Agbomemeh AUDU 1, Nigeria and Jacob Odeh EHIOROBO 2, Nigeria PRESENTATION OUTLINE

More information

Louisiana Transportation Engineering Conference. Monday, February 12, 2007

Louisiana Transportation Engineering Conference. Monday, February 12, 2007 Louisiana Transportation Engineering Conference Monday, February 12, 2007 Agenda Project Background Goal of EIS Why Use GIS? What is GIS? How used on this Project Other site selection tools I-69 Corridor

More information

The Flight of the Space Shuttle Challenger

The Flight of the Space Shuttle Challenger The Flight of the Space Shuttle Challenger On January 28, 1986, the space shuttle Challenger took off on the 25 th flight in NASA s space shuttle program. Less than 2 minutes into the flight, the spacecraft

More information

Brief Sketch of Solutions: Tutorial 3. 3) unit root tests

Brief Sketch of Solutions: Tutorial 3. 3) unit root tests Brief Sketch of Solutions: Tutorial 3 3) unit root tests.5.4.4.3.3.2.2.1.1.. -.1 -.1 -.2 -.2 -.3 -.3 -.4 -.4 21 22 23 24 25 26 -.5 21 22 23 24 25 26.8.2.4. -.4 - -.8 - - -.12 21 22 23 24 25 26 -.2 21 22

More information

Automatic Geo-Referencing of Provisional Cadastral Maps: Towards a Survey-Accurate Cadastral Database for a National Spatial Data Infrastructure

Automatic Geo-Referencing of Provisional Cadastral Maps: Towards a Survey-Accurate Cadastral Database for a National Spatial Data Infrastructure Institute of Cartography and Geoinformatics Leibniz Universität Hannover Automatic Geo-Referencing of Provisional Cadastral Maps: Towards a Survey-Accurate Cadastral Database for a National Spatial Data

More information