Poisson Data. Handout #4

Similar documents
Q30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only

You can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials.

Cohen s s Kappa and Log-linear Models

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Overdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion

The GENMOD Procedure. Overview. Getting Started. Syntax. Details. Examples. References. SAS/STAT User's Guide. Book Contents Previous Next

ESTIMATE PROP. IMPAIRED PRE- AND POST-INTERVENTION FOR THIN LIQUID SWALLOW TASKS. The SURVEYFREQ Procedure

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016

Wrap-up. The General Linear Model is a special case of the Generalized Linear Model. Consequently, we can carry out any GLM as a GzLM.

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

The GENMOD Procedure (Book Excerpt)

Using PROC GENMOD to Analyse Ratio to Placebo in Change of Dactylitis. Irmgard Hollweck / Meike Best 13.OCT.2013

Regression Models for Risks(Proportions) and Rates. Proportions. E.g. [Changes in] Sex Ratio: Canadian Births in last 60 years

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Generalized Linear Models for Count, Skewed, and If and How Much Outcomes

Introduction to SAS proc mixed

SAS/STAT 14.2 User s Guide. The GENMOD Procedure

Where have all the puffins gone 1. An Analysis of Atlantic Puffin (Fratercula arctica) Banding Returns in Newfoundland

Introduction to SAS proc mixed

unadjusted model for baseline cholesterol 22:31 Monday, April 19,

Simple logistic regression

STAT 5200 Handout #23. Repeated Measures Example (Ch. 16)

Appendix: Computer Programs for Logistic Regression

Testing Indirect Effects for Lower Level Mediation Models in SAS PROC MIXED

SAS Commands. General Plan. Output. Construct scatterplot / interaction plot. Run full model

Generalized linear models

Product Held at Accelerated Stability Conditions. José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013

Inference for Binomial Parameters

Chapter 5: Logistic Regression-I

Section Poisson Regression

1) Answer the following questions with one or two short sentences.

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum

A CASE STUDY IN HANDLING OVER-DISPERSION IN NEMATODE COUNT DATA SCOTT EDWIN DOUGLAS KREIDER. B.S., The College of William & Mary, 2008 A THESIS

ZERO INFLATED POISSON REGRESSION

Mixed Models Lecture Notes By Dr. Hanford page 199 More Statistics& SAS Tutorial at

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios

SAS Software to Fit the Generalized Linear Model

Answer to exercise: Blood pressure lowering drugs

4.8 Alternate Analysis as a Oneway ANOVA

Testing Independence

Using PROC GENMOD to Analyse Ratio to Placebo in Change of Dactylitis

ANOVA Longitudinal Models for the Practice Effects Data: via GLM

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

Count data page 1. Count data. 1. Estimating, testing proportions

BIOSTATS Intermediate Biostatistics Spring 2017 Exam 2 (Units 3, 4 & 5) Practice Problems SOLUTIONS

Generalized Linear Models

Topic 12. The Split-plot Design and its Relatives (continued) Repeated Measures

Week 7.1--IES 612-STA STA doc

Outline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013

STA6938-Logistic Regression Model

Correlation and the Analysis of Variance Approach to Simple Linear Regression

STAT 7030: Categorical Data Analysis

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models

dm'log;clear;output;clear'; options ps=512 ls=99 nocenter nodate nonumber nolabel FORMCHAR=" = -/\<>*"; ODS LISTING;

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley

A course in statistical modelling. session 06b: Modelling count data

Laboratory Topics 4 & 5

COMPLEMENTARY LOG-LOG MODEL

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION

Chapter 11: Robust & Quantile regression

Multinomial Logistic Regression Models

ssh tap sas913, sas

Case-control studies C&H 16

Sections 4.1, 4.2, 4.3

Confirmatory and Exploratory Data Analyses Using PROC GENMOD: Factors Associated with Red Light Running Crashes

Some comments on Partitioning

Topic 23: Diagnostics and Remedies

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Faculty of Health Sciences. Correlated data. Count variables. Lene Theil Skovgaard & Julie Lyng Forman. December 6, 2016

PubHlth Intermediate Biostatistics Spring 2015 Exam 2 (Units 3, 4 & 5) Study Guide

Topic 28: Unequal Replication in Two-Way ANOVA

Survival Regression Models

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of

Analysis of binary repeated measures data with R

STAT 501 EXAM I NAME Spring 1999

General Linear Model (Chapter 4)

Chapter 4: Generalized Linear Models-I

STAT 5200 Handout #26. Generalized Linear Mixed Models

BIOS 625 Fall 2015 Homework Set 3 Solutions

BMI 541/699 Lecture 22

Model selection and comparison

a = 4 levels of treatment A = Poison b = 3 levels of treatment B = Pretreatment n = 4 replicates for each treatment combination

Epidemiology Wonders of Biostatistics Chapter 13 - Effect Measures. John Koval

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

PARTICLE COUNTS MISLEAD SCIENTISTS: A JMP AND SAS INTEGRATION CASE STUDY INVESTIGATING MODELING

Chapter 4: Generalized Linear Models-II

11 November 2011 Department of Biostatistics, University of Copengen. 9:15 10:00 Recap of case-control studies. Frequency-matched studies.

EVALUATION OF WILDLIFE REFLECTORS IN REDUCING VEHICLE-DEER COLLISIONS ON INDIANA INTERSTATE 80/90

Generalized linear models

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Stat 529 (Winter 2011) Experimental Design for the Two-Sample Problem. Motivation: Designing a new silver coins experiment

Repeated Measures Part 2: Cartoon data

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs

STAT 525 Fall Final exam. Tuesday December 14, 2010

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

Transcription:

Poisson Data The other response variable of interest records the number of blue spots observed after incubation. This type of data, i.e. count data, is often skewed showing numerous small values with occasional large ones. Note that these counts differ from the binomial case above. There, the count of successes was bounded by the number of experimental units: 3 reps x 8 explants = 24. In this example, the number of spots observed is unbounded or open ended. The maximum number possible is not known. A probability distribution which can describe this type of data is the Poisson. The SAS code to analyze Poisson data is similar to that of the Binomial: proc genmod; class treat; model spots = treat/dist=poisson link = log type3; lsmeans treat/diff; contrast 'Osmotic Effect' treat -4 1 1 1 1; by cultivar; with the obvious exceptions of the DIST and LINK options. In the 1/6

case of the Poisson distribution, the log LINK is most appropriate (log here designates natural log, ln). However, as was seen with the binomial, a count of 0 can cause errors. Therefore, a data step can be used prior to the analysis to adjust these values before PROC GENMOD is called: data osmotic; set osmotic; if spots = 0 then spots =.5; Again, a small value is added to those observations having a count of 0 spots. The output for cultivar MUM follows. ------------------------------------------- cult=mum ------------------------------------------- Class Level Information Class Levels Values treat 5 0 0.2 0.4 0.6 1 Source DF Square Pr > ChiSq treat 5 6895.92 <.0001 2/6

Least Squares Means Standard Effect treat Estimate Error DF Square Pr > ChiSq treat 0 1.3652 0.1031 1 175.20 <.0001 treat 0.2 3.2974 0.0393 1 7056.4 <.0001 treat 0.4 2.7267 0.0533 1 2613.4 <.0001 treat 0.6 3.1481 0.0423 1 5540.0 <.0001 treat 1 0.9328 0.1280 1 53.08 <.0001 Differences of Least Squares Means Standard Effect treat _treat Estimate Error DF Square Pr > ChiSq treat 0 0.2-1.9321 0.1104 1 306.52 <.0001 treat 0 0.4-1.3615 0.1161 1 137.48 <.0001 treat 0 0.6-1.7829 0.1115 1 255.78 <.0001 treat 0 1 0.4324 0.1644 1 6.92 0.0085 treat 0.2 0.4 0.5707 0.0662 1 74.25 <.0001 treat 0.2 0.6 0.1493 0.0577 1 6.69 0.0097 treat 0.2 1 2.3646 0.1339 1 311.76 <.0001 treat 0.4 0.6-0.4214 0.0681 1 38.32 <.0001 treat 0.4 1 1.7939 0.1387 1 167.27 <.0001 treat 0.6 1 2.2153 0.1348 1 269.90 <.0001 3/6

Contrast Results Contrast DF Square Pr > ChiSq Type Osmotic Effect 1 144.89 <.0001 LR The treatment level information is given followed by the overall test of model effects. Here, TREAT is highly significant with a p-value less than 0.0001. The means are given in the next table. As was the case above, these reflect the transformed (log) values. All the treatment means are significantly different from zero indicating that all treatments had more than one spot. Pair-wise comparison of the means is given next. In this example, all the pair-wise comparisons are significant, i.e. all treatments are apparently different from one another. Finally, the contrast for any osmotic effect is also highly significant with a chi-square value of 144.89. Given this significance, it might be of interest to actually estimate the mean number of spots for each component of the contrast, i.e. a mean for no osmotic pressure (equilibrium) and a mean where osmotic pressure was applied. This can be accomplished with the ESTIMATE statement. However, because SAS implicitly includes an overall intercept in the model, the actual ESTIMATE statement can be 4/6

rather tricky to formulate. One method of avoiding this is to rerun the model without the intercept term: proc genmod; class treat; model spots = treat/dist=poisson link = log type3 noint; lsmeans treat/diff; contrast 'Osmotic Effect' treat -4 1 1 1 1; estimate 'Equilibrium' treat 1 0 0 0 0; estimate 'Osmotic Applied' treat 0.25.25.25.25; by cult; The intercept is omitted with the NOINT option in the MODEL statement. Two ESTIMATE statements have been added; one for the osmotic treatment of 0 and another for the average of the remaining treatments. Note that, like CONTRAST, each statement has a unique label given in quotes. The resulting output is: Contrast Estimate Results Standard Label Estimate Error Alpha Confidence Limits Square Pr > ChiSq Equilibrium 1.3652 0.1031 0.05 1.1631 1.5674 175.20 <.0001 5/6

Osmotic Applied 2.5263 0.0376 0.05 2.4526 2.5999 4524.6 <.0001 Each ESTIMATE statement produces the value of its estimate, the associated standard error, and confidence limits. Also given are test results for the hypothesis that the estimate is equal to 0. (Notice that the Equilibrium estimate is the same as that for TREAT = 0 above). The results show that the application of osmotic pressure increased the number of spots over the equilibrium condition. Since these are still the log transformed values, the actual count estimates can be determined by utilizing the exponential (anti-log) function: and Equilibrium number of spots = e 1.3652 = 3.92 Avg. number of spots with treatment = e 2.5263 = 12.51. This shows the osmotic treatments on average gave more than a 3 fold increase in the number of spots. The associated standard errors and confidence limits may also be back-transformed in a similar manner. 6/6