A methodology to model the number of completeness errors using count data regression models
1 Malta, January 2015. INTERNATIONAL WORKSHOP ON SPATIAL DATA AND MAP QUALITY. A methodology to model the number of completeness errors using count data regression models. José Rodríguez-Avi (1) & Francisco Javier Ariza-López (2). (1) Dpto. Estadística e Investigación Operativa, jravi@ujaen.es. (2) Dpto. Ingeniería Cartográfica, Geodésica y Fotogrametría, fjariza@ujaen.es. Universidad de Jaén, Paraje de las Lagunillas S/N, E Jaén
2 Completeness errors: They are very important in a spatial data set. Completeness is related to the quality of the survey and to the age of the data (it is affected by time). Following ISO we can distinguish: Omissions: an element (such as a bridge, a building, a cross, a power plant) is on the terrain but does not appear on the map. Commissions: an element appears on the map but does not exist on the terrain. The greater the number of these errors, the worse the quality of the product. We also want to obtain additional information about the product, related to structural aspects, and to investigate whether relationships exist between errors and structural aspects. [Figure: examples of the omission of a pool and the omission of a building]
3 We are used to employing linear regression models when the response variable is continuous. In our case, the number of errors is not a continuous variable but a count random variable, so it should be modelled with discrete distributions: Poisson, Negative Binomial, Waring, and so on. Additionally, structural aspects of the spatial data set may be taken into account as exogenous covariates in order to explain the number of errors. Hence the use of Count Data Regression Models.
4 Count Data Regression Models. We start with a dependent count variable (for instance, the number of omission errors in a tile). We add some structural information for each tile (covariates). We define a cell (of tiles) as the set of tiles that have the same covariate values: all the tiles in the same cell are indistinguishable in terms of covariates. Applying a count data regression model, we propose a residual discrete distribution for each cell, in such a way that the parameters of the discrete distribution depend on the covariates. We propose several models and choose the best one according to a selection criterion. Once the best model is selected, we can: determine which covariates are related to the response and what form this relation takes; obtain the parameters of the residual distribution for any cell and calculate any probability.
5 Methodological procedure: Count Data Regression Models. Let $X_1, \dots, X_k$ be the set of covariates. The response variable $Y$, given $X_1, \dots, X_k$, has a discrete distribution with $q$ parameters whose mean is a function of the $X_i$: $\mu = E[Y \mid X_1, \dots, X_k] = \exp(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)$, where $\beta_0, \beta_1, \dots, \beta_k$ are coefficients to be estimated. The rest of the distribution's parameters are estimated independently of the covariates. Consequently, the number of parameters to be estimated is $k$ (number of covariates) + 1 (intercept) + $q - 1$.
6 Methodological procedure: Count Data Regression Models. The coefficient $\beta_i$ shows the relation, if any, between the independent variable $X_i$ and the dependent variable $Y$. Using properties of the MLE method, we can test a hypothesis for each $\beta_i$ by dividing the estimated coefficient by its standard deviation (the Wald test). If the coefficient can be considered equal to 0, we conclude that the corresponding variable has no relation with $Y$. Otherwise, both variables are related, and the sign of the coefficient indicates whether the relation is positive (if $X_i$ increases, $Y$ increases) or negative (if $X_i$ increases, $Y$ decreases).
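The Wald test described on this slide can be sketched in a few lines of Python. This is only an illustration under the usual asymptotic-normality assumption for ML estimates; the function name `wald_test` and the numbers in the usage lines are hypothetical and are not taken from the study.

```python
import math

def wald_test(estimate, std_error):
    """Two-sided Wald test of H0: coefficient = 0.

    Returns the z statistic (estimate / standard error) and the
    p-value under a standard normal reference distribution.
    """
    z = estimate / std_error
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical coefficient and standard error, for illustration only
z, p = wald_test(0.80, 0.25)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

A small p-value suggests that the covariate is related to the count; the sign of the estimate then gives the direction of the relation.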
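7 Count Data Regression Models. Statistical models: residual density (here $\mu$ denotes the mean of $Y$ given the covariates).
Equidispersion: Poisson Regression model (PRM): $P(Y = y \mid X) = \dfrac{e^{-\mu}\,\mu^{y}}{y!}$
Overdispersion: Negative Binomial Regression model (NBRM), with dispersion parameter $\theta > 0$: $P(Y = y \mid X) = \dfrac{\Gamma(\theta + y)}{\Gamma(\theta)\, y!} \left(\dfrac{\theta}{\theta + \mu}\right)^{\theta} \left(\dfrac{\mu}{\theta + \mu}\right)^{y}$
Overdispersion: Generalized Waring Regression model (GWRM), with parameters $a, k, \rho > 0$: $P(Y = y \mid X) = \dfrac{\Gamma(a + \rho)\,\Gamma(k + \rho)}{\Gamma(a)\,\Gamma(k)\,\Gamma(\rho)} \cdot \dfrac{\Gamma(a + y)\,\Gamma(k + y)}{\Gamma(a + k + \rho + y)} \cdot \dfrac{1}{y!}$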
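8 Count Data Regression Models. Statistical models: estimation of parameters.
Equidispersion: Poisson Regression model (PRM): $\mu = \exp(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)$
Overdispersion: Negative Binomial Regression model (NBRM): $\mu = \exp(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)$, with the dispersion parameter estimated independently of the covariates.
Overdispersion: Generalized Waring Regression model (GWRM): $\mu = \exp(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)$ and $a = \dfrac{\mu\,(\rho - 1)}{k}$, with $k$ and $\rho$ estimated independently of the covariates.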
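9 Count Data Regression Models. Statistical models: residual variance.
Equidispersion: Poisson Regression model (PRM): $\mathrm{Var}(Y \mid X) = \mu$
Overdispersion: Negative Binomial Regression model (NBRM), with dispersion parameter $\theta$: $\mathrm{Var}(Y \mid X) = \mu\left(1 + \dfrac{\mu}{\theta}\right)$
Overdispersion: Generalized Waring Regression model (GWRM), for $\rho > 2$: $\mathrm{Var}(Y \mid X) = \mu\,\dfrac{(a + \rho - 1)(k + \rho - 1)}{(\rho - 1)(\rho - 2)}$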
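As a numerical illustration of the three residual distributions, the sketch below evaluates their probability functions and checks the mean and variance by direct summation. It assumes the standard parameterizations (a negative binomial with dispersion parameter `theta`, and a univariate generalized Waring distribution with parameters `a`, `k`, `rho` where `a = mu * (rho - 1) / k`); the symbols on the original slides may differ, and the parameter values are hypothetical.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson distribution with mean mu."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def negbin_pmf(y, mu, theta):
    """P(Y = y) for a negative binomial with mean mu and dispersion theta."""
    log_p = (math.lgamma(theta + y) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)

def gw_pmf(y, mu, k, rho):
    """P(Y = y) for a generalized Waring distribution with mean mu.

    The parameter a is tied to the mean through a = mu * (rho - 1) / k.
    """
    a = mu * (rho - 1.0) / k
    log_p = (math.lgamma(a + rho) + math.lgamma(k + rho)
             - math.lgamma(a) - math.lgamma(k) - math.lgamma(rho)
             + math.lgamma(a + y) + math.lgamma(k + y)
             - math.lgamma(a + k + rho + y) - math.lgamma(y + 1))
    return math.exp(log_p)

def moments(pmf, upper=2000):
    """Total probability, mean and variance by summing over 0..upper-1."""
    probs = [pmf(y) for y in range(upper)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum(y * y * p for y, p in enumerate(probs)) - mean ** 2
    return total, mean, var

# Hypothetical parameter values, chosen only to show the overdispersion
mu = 3.0
for name, pmf in [
    ("PRM", lambda y: poisson_pmf(y, mu)),
    ("NBRM", lambda y: negbin_pmf(y, mu, theta=2.0)),
    ("GWRM", lambda y: gw_pmf(y, mu, k=2.0, rho=5.0)),
]:
    total, mean, var = moments(pmf)
    print(f"{name}: total={total:.6f} mean={mean:.4f} variance={var:.4f}")
```

With the same mean, the Poisson variance equals the mean, while the other two distributions have a larger variance, which is why they are candidates for overdispersed counts.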
10 Count Data Regression Models. Model selection: Akaike Information Criterion (AIC), $\mathrm{AIC} = -2 \ln L + 2p$, which is based on the log-likelihood $\ln L$ of the model fitted by the ML method, with a penalization related to the number of parameters of the model, $p$. The model with the lower AIC is preferred. Consequently, once all the models have been fitted, we select the one with the lowest AIC. The AIC has only a comparative value; it does not signify anything by itself.
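A minimal sketch of AIC-based selection follows. The helper names `aic` and `best_by_aic` and the log-likelihood values are hypothetical; they do not reproduce the fits reported in this presentation.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 ln L + 2 p."""
    return -2.0 * log_likelihood + 2.0 * n_params

def best_by_aic(fits):
    """Return the name of the model with the lowest AIC.

    `fits` maps a model name to a (log-likelihood, number of parameters) pair.
    """
    return min(fits, key=lambda name: aic(*fits[name]))

# Hypothetical fitted log-likelihoods and parameter counts
fits = {
    "PRM": (-520.0, 12),
    "NBRM": (-450.0, 13),
    "GWRM": (-445.0, 14),
}
for name, (loglik, p) in fits.items():
    print(f"{name}: AIC = {aic(loglik, p):.1f}")
print("Preferred model:", best_by_aic(fits))
```

Only differences in AIC between models fitted to the same data are meaningful, which matches the comparative reading given on the slide.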
11 Count Data Regression Models. Model selection: to test whether a set of variables should be included in or excluded from a model, we propose employing the Likelihood Ratio Test (LRT) in its asymptotic version. If Model 1 denotes the model with fewer covariates and Model 2 the one with more covariates, the LRT statistic is given by $\mathrm{LRT} = -2\,(\ln L_1 - \ln L_2)$, where $L_1$ and $L_2$ are the corresponding likelihood values for the estimated Models 1 and 2. The statistic is asymptotically distributed as a $\chi^2_f$ with $f$ degrees of freedom (where $f$ is the difference between the numbers of estimated parameters in the two models).
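The asymptotic LRT can be sketched with the standard library alone. The chi-square tail probability is computed here from the closed-form series that holds for integer degrees of freedom; the function names and the log-likelihood values in the usage lines are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability P(X > x) of a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i < df/2} h^i / i!
        terms = sum(h ** i / math.gamma(i + 1) for i in range(df // 2))
        return math.exp(-h) * terms
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_i h^(i + 1/2) / Gamma(i + 3/2)
    terms = sum(h ** (i + 0.5) / math.gamma(i + 1.5) for i in range((df - 1) // 2))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * terms

def likelihood_ratio_test(loglik_small, loglik_large, df):
    """LRT statistic -2 (ln L1 - ln L2) and its asymptotic p-value."""
    stat = -2.0 * (loglik_small - loglik_large)
    return stat, chi2_sf(stat, df)

# Hypothetical nested fits: the larger model has 7 extra parameters
stat, p = likelihood_ratio_test(-460.0, -445.0, df=7)
print(f"LRT = {stat:.2f}, p-value = {p:.5f}")
```

A small p-value favours keeping the additional covariates of the larger model.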
12 Count Data Regression Models. Software: the three models can be fitted using the statistical package R: the glm function of the stats package for the PRM, the glm.nb function of the MASS package for the NBRM, and the GWRM package for the GWRM. In all cases, these programs provide the parameter estimates with their Wald tests, the log-likelihood, and the AIC values.
13 The Data for the Analysis. This study is based on an actual case where a published GDS was assessed by means of a field survey. The GDS is called MTA10v (from "Mapa Topográfico de Andalucía E10k vectorial"), which is the official cartography of Andalusia (Spain). It is a map series produced between 1987 and 2007 by the Instituto Cartográfico de Andalucía (nowadays Instituto de Estadística y Cartografía de Andalucía). The MTA10v is a topographic vector database derived from a topographic paper map designed at the beginning of the eighties. The MTA10v is the base for the different thematic datasets and maps of the Regional Government of Andalusia. It has complete territorial coverage at a semi-detailed scale (1:10000) and is updated in a four-year cycle, sheet by sheet. It is composed of 2745 sheets obtained by manual photogrammetric restitution of flights at 1:25000 scale and updated with flights at 1:20000 scale.
14 The Data for the Analysis. Because of the large area (87000 km²) covered by the MTA10v, the region is divided into four quadrants (ICA, 2002b), so the cyclic updating strategy updates a fourth of the region each year. The MTA10v is in the UTM 30N projected coordinate system, referenced to the ED50 datum. The declared positional accuracy of the MTA10v is RMSE 3 m (Corral and García, 2000). An independent accuracy study (Ariza López 2005), based on a random sample of 930 points surveyed by means of a differential GPS fast static survey, reports a positional accuracy of 10.65 m at the 95% confidence level.
15 The Data for the Analysis. [Figure: general view of the content of the MTA]
16 The Data for the Analysis. Count of elements per layer in the MTA10v series. Columns: Layer; Count of total elements in the spatial database (n).
Edifications: buildings, storage (except water), shed, unique building, antenna, spotlight, transformer building, residential blocks, railway stations, port, airport, airfield, heliport, electrical substation, pumping station, fence, cave, etc.
Hydrography: river, stream, runway, tide, reservoir, sea, lake, etc.
Energy Infrastructure: electric turret, transformer, power line, pipeline, etc.
Water Infrastructure: canal, ditch, water supply, water systems, dam, water reservoir, water tank, water treatment plant, pool, well, fountain, spring water, siphon, etc.
Communication routes: street, road, crossing, highway, knots, footpath, firewall, railway (all types), tunnel, bridge, cable cars, funicular, chairlift, lift, etc.
Vegetation: overstory, parterre, garden, golf course, etc.
Total
17 The Data for the Analysis. [Figure: density map (count of elements per tile of 1 × 1 km²)]
18 [Figure: spatial distribution of the sample of 192 clusters]
19 Application. We count completeness errors (omissions and commissions of features) in the 192 analysis items (spatial tiles of 1 × 1 km²). Each item is a sampled cluster in which both the MTA10v and the ground truth were exhaustively reviewed and cross-compared, and the differences in presences and absences were exhaustively counted. The dependent random variables are the number of omissions and the number of commissions per tile.
20 Application. Covariates: Province. Categorical variable with 8 different values, coded as 7 dummy variables, Cadiz (CA), Cordoba (CO), Granada (GR), Huelva (HU), Jaén (JA), Málaga (MA) and Seville (SE), with Almeria as the reference value. Urban. Discriminates between urban (1) and rural (0) tiles. Littoral. Discriminates between littoral (1) and interior (0) tiles. Features. Quantitative variable that takes into account the number of features per tile. We introduce this variable as ln(features), and its effect is that of an offset. Density. Quantitative variable that takes into account the density of features per hectare/kilometre. In our case, because the field-surveyed tiles are not exactly 1 × 1 km², it is introduced as a correction factor.
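The coding of these covariates can be illustrated by building the row of regressors for a single tile. This is a sketch only: the function `design_row`, the code "AL" for the reference province, and the example tile are hypothetical and are not taken from the study data.

```python
import math

# Province dummies in the order listed on the slide; Almeria is the
# reference level and therefore has no dummy (code "AL" is an assumption).
PROVINCE_DUMMIES = ["CA", "CO", "GR", "HU", "JA", "MA", "SE"]
REFERENCE_PROVINCE = "AL"

def design_row(province, urban, littoral, features, density):
    """Regressors for one tile: intercept, 7 province dummies, Urban,
    Littoral, ln(features) and Density (12 values in total)."""
    if province != REFERENCE_PROVINCE and province not in PROVINCE_DUMMIES:
        raise ValueError(f"unknown province code: {province}")
    dummies = [1.0 if province == code else 0.0 for code in PROVINCE_DUMMIES]
    return ([1.0] + dummies
            + [float(urban), float(littoral), math.log(features), float(density)])

# Hypothetical tile: rural, interior, in Jaén, 150 features, density 1.5
row = design_row("JA", urban=0, littoral=0, features=150, density=1.5)
print(len(row), row)
```

A tile in the reference province has all seven dummies equal to zero, so each province coefficient is read as a contrast with Almeria.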
21 Application. Omissions [Figure]
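22 Application. Omissions. Model: Y ~ Province + Urban + Littoral + ln(Features) + Density. Mean expression: $\ln \mu = \beta_0 + \sum_{j=1}^{7} \beta_j\,\mathrm{Province}_j + \beta_8\,\mathrm{Urban} + \beta_9\,\mathrm{Littoral} + \beta_{10} \ln(\mathrm{Features}) + \beta_{11}\,\mathrm{Density}$.
Variables in all three models: Province + Urban + Littoral + ln(features) + density. Columns: Model; loglikelihood; AIC; Number of parameters.
Poisson Regression Model: 1148,
Negative Binomial Regression Model
Generalized Waring Regression Model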
23 Application. Coefficients for the GWRM. Columns: covariate; estimate; standard deviation; z; p value. Rows: (Intercept); Cadiz (province); Cordoba (province); Granada (province); Huelva (province); Jaen (province); Malaga (province); Seville (province); Urban; Littoral; log(features); Density.
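24 Application. Omissions: fitted mean expression, with ln(mean) written as a linear function of the covariates, including ln(features).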
25 Application. Provided that the remaining covariates are equal, the effect of each qualitative covariate may be analysed in terms of odds ratios, defined here as the ratio of the mean number of errors with and without the attribute: $\mu_{X_i = 1} / \mu_{X_i = 0} = e^{\beta_i}$. Using these ratios we can observe that: the mean number of errors for littoral tiles is 94.72% higher than for interior tiles (ratio 1.9472); urban tiles have a mean number of errors 124.25% higher than rural tiles (ratio 2.2425). For continuous covariates, again provided that the remaining covariates are equal: if the number of features in a tile increases by 1%, the mean increases by approximately $\hat\beta$ %, where $\hat\beta$ is the estimated coefficient of ln(features); if Density in a tile increases by 1, the mean decreases by 3.28%.
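The percentage effects quoted on this slide follow from the log link: a coefficient $\beta$ multiplies the mean by $e^{\beta}$. The sketch below converts between coefficients and percentage changes. The helper names are hypothetical, and the coefficients are back-calculated from the percentages quoted above (94.72%, 124.25%, 3.28%), not read from the coefficient table.

```python
import math

def percent_change_for_unit_step(beta):
    """Percentage change in the mean when a covariate increases by 1
    (or a dummy switches from 0 to 1), other covariates held equal."""
    return (math.exp(beta) - 1.0) * 100.0

def coefficient_from_percent(pct):
    """Inverse of the above: the coefficient implied by a percentage change."""
    return math.log(1.0 + pct / 100.0)

def percent_change_for_log_covariate(beta, pct_increase=1.0):
    """Percentage change in the mean when a covariate entered as ln(x)
    increases by pct_increase percent."""
    return ((1.0 + pct_increase / 100.0) ** beta - 1.0) * 100.0

# Coefficients implied by the percentages quoted on the slide
for label, pct in [("Littoral", 94.72), ("Urban", 124.25), ("Density", -3.28)]:
    beta = coefficient_from_percent(pct)
    print(f"{label}: implied coefficient {beta:.4f}, ratio of means {math.exp(beta):.4f}")
```

For a covariate entered as a logarithm, a 1% increase changes the mean by roughly the coefficient itself expressed in percent, which is the reading given for ln(features).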
26 [Figure: count of errors estimated for each tile]
27 [Figure: probability map for the case of having 10 or more wrong features per tile]
28 [Figure: probability maps for the case of having 10% or more wrong elements per tile]
29 [Figure: evolution of variance and partition of the variance versus mean (absolute values)]
30 [Figure: evolution of variance and partition of the variance versus mean (% values)]
31 [Figure: nomogram showing the relation between the count of errors, probability and % of population]
32 [Figure: frequency of the number of commission errors]
33 Summary of model fits (dependent variable: Commissions). Variables in all three models: Province + Urban + Littoral + ln(features) + density. Columns: Model; loglikelihood; AIC; Number of parameters. Rows: Poisson Regression Model; Negative Binomial Regression Model; Generalized Waring Regression Model.
34 Coefficients (dependent variable: Commissions). Columns: Covariates; Estimate; Std. Error; z value; Pr(>|z|). Rows: (Intercept); Cadiz (province); Cordoba (province); Granada (province); Huelva (province); Jaen (province); Malaga (province); Sevilla (province); Urban; Littoral; log(features) e 06; Density.
Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions
More informationLand Administration and Cadastre
Geomatics play a major role in hydropower, land and water resources and other infrastructure projects. Lahmeyer International s (LI) worldwide projects require a wide range of approaches to the integration
More informationECON 5350 Class Notes Functional Form and Structural Change
ECON 5350 Class Notes Functional Form and Structural Change 1 Introduction Although OLS is considered a linear estimator, it does not mean that the relationship between Y and X needs to be linear. In this
More informationUrban form, resource intensity & renewable energy potential of cities
Urban form, resource intensity & renewable energy potential of cities Juan J. SARRALDE 1 ; David QUINN 2 ; Daniel WIESMANN 3 1 Department of Architecture, University of Cambridge, 1-5 Scroope Terrace,
More informationApplication of Poisson and Negative Binomial Regression Models in Modelling Oil Spill Data in the Niger Delta
International Journal of Science and Engineering Investigations vol. 7, issue 77, June 2018 ISSN: 2251-8843 Application of Poisson and Negative Binomial Regression Models in Modelling Oil Spill Data in
More informationUNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator
UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages
More informationSTA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3
STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae
More informationIntroduction to the Generalized Linear Model: Logistic regression and Poisson regression
Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk November 4, 2013 Gilles Guillot (gigu@dtu.dk)
More informationStat 579: Generalized Linear Models and Extensions
Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,
More informationDEVELOPMENT OF CRASH PREDICTION MODEL USING MULTIPLE REGRESSION ANALYSIS Harshit Gupta 1, Dr. Siddhartha Rokade 2 1
DEVELOPMENT OF CRASH PREDICTION MODEL USING MULTIPLE REGRESSION ANALYSIS Harshit Gupta 1, Dr. Siddhartha Rokade 2 1 PG Student, 2 Assistant Professor, Department of Civil Engineering, Maulana Azad National
More informationGIS modelling of intermodal networks: a comparison of two methods
Urban Transport XXI 475 GIS modelling of intermodal networks: a comparison of two methods J. G. Moreno-Navarro 1, A. Medianero-Coza 1 & I. Hilal 2 1 Department of Geography, University of Seville, Spain
More informationComparing CORINE Land Cover with a more detailed database in Arezzo (Italy).
Comparing CORINE Land Cover with a more detailed database in Arezzo (Italy). Javier Gallego JRC, I-21020 Ispra (Varese) ITALY e-mail: javier.gallego@jrc.it Keywords: land cover, accuracy assessment, area
More informationPoisson Regression. The Training Data
The Training Data Poisson Regression Office workers at a large insurance company are randomly assigned to one of 3 computer use training programmes, and their number of calls to IT support during the following
More informationD.N.D. Hettiarachchi (Hetti) Survey Department, Sri Lanka.
ADMINISTRATION OF GEOGRAPHICAL NAMES IN SRI LANKA D.N.D. Hettiarachchi (Hetti) Survey Department, Sri Lanka. hettiarachchidnd@gmail.com Country: Sri Lanka What is the official language(s)? Sinhala and
More informationLogistic Regression. Continued Psy 524 Ainsworth
Logistic Regression Continued Psy 524 Ainsworth Equations Regression Equation Y e = 1 + A+ B X + B X + B X 1 1 2 2 3 3 i A+ B X + B X + B X e 1 1 2 2 3 3 Equations The linear part of the logistic regression
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationLecture 14: Introduction to Poisson Regression
Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why
More informationModelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
More informationPRESENTATION OUTLINE. FIG Working Week INTRODUCTION 2.0 LITERATURE REVIEW 3.0 RESEARCH METHODOLOGY 4.0 RESULTS AND DISCUSSION
Geospatial Techniques in Water Distribution Network Mapping and Modelling in Warri Port Complex (Nigeria) PRESENTED BY Henry Agbomemeh AUDU 1, Nigeria and Jacob Odeh EHIOROBO 2, Nigeria PRESENTATION OUTLINE
More informationLouisiana Transportation Engineering Conference. Monday, February 12, 2007
Louisiana Transportation Engineering Conference Monday, February 12, 2007 Agenda Project Background Goal of EIS Why Use GIS? What is GIS? How used on this Project Other site selection tools I-69 Corridor
More informationThe Flight of the Space Shuttle Challenger
The Flight of the Space Shuttle Challenger On January 28, 1986, the space shuttle Challenger took off on the 25 th flight in NASA s space shuttle program. Less than 2 minutes into the flight, the spacecraft
More informationBrief Sketch of Solutions: Tutorial 3. 3) unit root tests
Brief Sketch of Solutions: Tutorial 3 3) unit root tests.5.4.4.3.3.2.2.1.1.. -.1 -.1 -.2 -.2 -.3 -.3 -.4 -.4 21 22 23 24 25 26 -.5 21 22 23 24 25 26.8.2.4. -.4 - -.8 - - -.12 21 22 23 24 25 26 -.2 21 22
More informationAutomatic Geo-Referencing of Provisional Cadastral Maps: Towards a Survey-Accurate Cadastral Database for a National Spatial Data Infrastructure
Institute of Cartography and Geoinformatics Leibniz Universität Hannover Automatic Geo-Referencing of Provisional Cadastral Maps: Towards a Survey-Accurate Cadastral Database for a National Spatial Data
More information