Munitions Example: Negative Binomial to Predict Number of Accidents


Math 3070 § 1. Treibergs
Munitions Example: Negative Binomial to Predict Number of Accidents
Name: Example
July 17, 2011

This problem comes from Bulmer, Principles of Statistics, Dover, 1979, which is a republication of the 1967 second edition published by Oliver & Boyd. The Poisson random variable describes the number of occurrences of rare events in a period of time, but it sometimes fails to account best for an observed variable. The data in point come from a 1920 study of Greenwood & Yule. They counted the number of accidents in five weeks that occurred to a group of women working on high-explosive shells in a munitions factory during the 1914-1918 War. One could regard an accident as a rare event, and as such it would be governed by Poisson's law. However, Poisson's law does not fit the data very well; there are too many women in the extreme groups. Indeed, the variance is 0.69 but the mean is 0.47, and these should be the same for a Poisson variable.

Let $X_1, X_2, \ldots, X_n$ be observations of the number of accidents. If these were a random sample from $\mathrm{Pois}(\lambda)$, then since $E(X) = V(X) = \lambda$, an estimator for $\lambda$ is the sample mean
$$\hat\lambda = \bar X = \frac{1}{n}\sum_{i=1}^n X_i.$$
The sample mean is the maximum likelihood estimator $\hat\lambda$. We use this to compute the Poisson approximation.

Greenwood & Yule explained the inaccuracy by supposing that the women were not homogeneous, but differed in their accident proneness. This would spread out the distribution of accident numbers. Let us suppose that the accident rate is itself a random variable $\Lambda$. Then for any individual, the number of accidents follows a Poisson distribution with rate constant $\Lambda$. We may say that the conditional distribution of the number of accidents $X$ given $\Lambda = \lambda$ is
$$p(x \mid \lambda) = \frac{e^{-\lambda}\lambda^x}{x!}, \qquad \text{for } x = 0, 1, 2, \ldots$$
Also, suppose that there is a constant $a$ such that the rate satisfies a $\chi^2$-distribution with $f$ degrees of freedom, where $f$ is not necessarily an integer:
$$Y = a\Lambda \sim \chi^2(f).$$
The pdf of $Y$ is
$$f_Y(y) = \frac{1}{2^{f/2}\,\Gamma(f/2)}\, y^{f/2-1} e^{-y/2},$$
so that the pdf for $\Lambda$ is
$$f_\Lambda(\lambda) = \frac{a}{2^{f/2}\,\Gamma(f/2)}\,(a\lambda)^{f/2-1} e^{-a\lambda/2}.$$
The population distribution of $X$ is thus the marginal pmf: for $x = 0, 1, 2, \ldots$,
$$p_X(x) = \int_0^\infty f(x,\lambda)\,d\lambda = \int_0^\infty p(x \mid \lambda)\, f_\Lambda(\lambda)\,d\lambda
= \int_0^\infty \frac{e^{-\lambda}\lambda^x}{x!}\cdot\frac{a}{2^{f/2}\,\Gamma(f/2)}\,(a\lambda)^{f/2-1} e^{-a\lambda/2}\,d\lambda
= \frac{a^{f/2}}{2^{f/2}\,\Gamma(f/2)\,x!}\int_0^\infty \lambda^{f/2+x-1} e^{-(a/2+1)\lambda}\,d\lambda.$$
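The gamma mixture of Poissons can be checked numerically: integrating the Poisson pmf against the rescaled chi-squared density of the rate should reproduce the negative binomial pmf with r = f/2 and p = a/(a+2). A minimal sketch in Python (the handout itself uses R); the values a = 3, f = 2 are illustrative choices for the check, not from the data:

```python
import math

def poisson_pmf(x, lam):
    # p(x | lambda) = e^{-lambda} lambda^x / x!
    return math.exp(-lam) * lam ** x / math.factorial(x)

def rate_pdf(lam, a, f):
    # pdf of Lambda when Y = a * Lambda ~ chi-squared(f)
    return a / (2 ** (f / 2) * math.gamma(f / 2)) \
        * (a * lam) ** (f / 2 - 1) * math.exp(-a * lam / 2)

def mixture_pmf(x, a, f, top=30.0, steps=60000):
    # marginal p_X(x): midpoint-rule integration of the mixture over lambda
    h = top / steps
    return h * sum(poisson_pmf(x, (i + 0.5) * h) * rate_pdf((i + 0.5) * h, a, f)
                   for i in range(steps))

def nb_pmf(x, r, p):
    # closed form: Gamma(r+x) / (Gamma(r) x!) * p^r * (1-p)^x
    return math.gamma(r + x) / (math.gamma(r) * math.factorial(x)) \
        * p ** r * (1 - p) ** x

# a = 3, f = 2  =>  r = f/2 = 1 and p = a/(a+2) = 0.6
for x in range(5):
    assert abs(mixture_pmf(x, 3, 2) - nb_pmf(x, 1, 0.6)) < 1e-6
```

With f = 2 the rate density is exponential, so the marginal can also be integrated in closed form, which is why this case makes a convenient test.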

Using the gamma distribution, whose pdf for $y > 0$ is
$$f(y; \alpha, \beta) = \frac{1}{\beta^\alpha\,\Gamma(\alpha)}\, y^{\alpha-1} e^{-y/\beta},$$
with $\alpha = f/2 + x$ and $1/\beta = a/2 + 1$, and the fact that its total probability is one,
$$p_X(x) = \frac{a^{f/2}\,\Gamma(f/2+x)}{(a/2+1)^{f/2+x}\, 2^{f/2}\,\Gamma(f/2)\,x!}
= \frac{\Gamma(f/2+x)}{\Gamma(f/2)\,x!}\left(\frac{a}{a+2}\right)^{f/2}\left(\frac{2}{a+2}\right)^{x}
= \frac{\Gamma(r+x)}{\Gamma(r)\,x!}\, p^r (1-p)^x = \mathrm{nb}(x; r, p)$$
is negative binomial with parameters $r = f/2$ and $p = a/(a+2)$. In case $r$ is an integer, $\Gamma(r) = (r-1)!$ and the coefficient equals the usual $\binom{x+r-1}{r-1}$.

Let's compute the mean and variance of a negative binomial variable. Observe that since $\Gamma(z+1) = z\,\Gamma(z)$, we have for $x = 0, 1, 2, \ldots$,
$$\frac{\Gamma(r+x)}{\Gamma(r)\,x!} = \frac{(r+x-1)(r+x-2)\cdots r}{x!} = \binom{r+x-1}{x}
= (-1)^x\,\frac{(-r)(-r-1)\cdots(-r-x+1)}{x!} = (-1)^x \binom{-r}{x}.$$
Remembering the negative binomial formula from Calculus II,
$$\sum_{x=0}^\infty \binom{-r}{x}(-q)^x = (1-q)^{-r},$$
we see that, first of all, with $q = 1 - p$, $\mathrm{nb}(x; r, p)$ is a pmf:
$$\sum_{x=0}^\infty p_X(x) = \sum_{x=0}^\infty \binom{-r}{x}\, p^r (-q)^x = p^r (1-q)^{-r} = p^r p^{-r} = 1.$$
Differentiating with respect to $q$,
$$r(1-q)^{-r-1} = \frac{d}{dq}(1-q)^{-r} = \sum_{x=0}^\infty x \binom{-r}{x}(-1)^x q^{x-1},$$
gives the first moment
$$E(X) = \sum_{x=0}^\infty x \binom{-r}{x}\, p^r (-q)^x = q\,r(1-q)^{-r-1}\, p^r = \frac{rq}{p}.$$
Differentiating again,
$$r(1-q)^{-r-1} + r(r+1)\,q\,(1-q)^{-r-2} = \frac{d}{dq}\left[\, r q (1-q)^{-r-1} \right] = \sum_{x=0}^\infty x^2 \binom{-r}{x}(-1)^x q^{x-1},$$
so that
$$E(X^2) = \sum_{x=0}^\infty x^2 \binom{-r}{x}\, p^r (-q)^x = \left[\, qr(1-q)^{-r-1} + r(r+1)q^2(1-q)^{-r-2}\right] p^r = \frac{qr}{p} + \frac{r(r+1)q^2}{p^2}.$$
It follows that
$$V(X) = E(X^2) - E(X)^2 = \frac{qr}{p} + \frac{r(r+1)q^2}{p^2} - \frac{r^2q^2}{p^2} = \frac{qr\,[p+q]}{p^2} = \frac{qr}{p^2}.$$
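As a sanity check on the moment formulas, the series can be summed numerically. A short Python sketch (the handout's own computations are in R); the pmf recurrence nb(x+1) = nb(x) · q(r+x)/(x+1) follows from Γ(r+x+1) = (r+x)Γ(r+x):

```python
def nb_moments(r, p, terms=500):
    # Sum the negative binomial series for total mass, mean, and variance.
    q = 1.0 - p
    pmf = p ** r                      # nb(0; r, p) = p^r
    total = mean = second = 0.0
    for x in range(terms):
        total += pmf
        mean += x * pmf
        second += x * x * pmf
        pmf *= q * (r + x) / (x + 1)  # nb(x+1; r, p) from nb(x; r, p)
    return total, mean, second - mean ** 2

# compare with E(X) = rq/p and V(X) = rq/p^2 for a non-integer r
r, p, q = 2.5, 0.4, 0.6
total, mean, var = nb_moments(r, p)
assert abs(total - 1.0) < 1e-12
assert abs(mean - r * q / p) < 1e-9
assert abs(var - r * q / p ** 2) < 1e-9
```

The geometric decay of the tail makes 500 terms far more than enough for double precision here.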

We have shown that if $X \sim \mathrm{nb}(r, p)$, then
$$E(X) = \frac{r(1-p)}{p}, \qquad V(X) = \frac{r(1-p)}{p^2}.$$
We fit the best negative binomial distribution using the method of moments. In other words, we equate the theoretical first and second moments to the empirical ones, which is equivalent to equating the mean and variance of the theoretical population to the sample mean and variance, and solve for the parameters. Equating to $\bar x$ and $s^2$, we find
$$\hat p = \frac{\bar x}{s^2}, \qquad \hat r = \frac{\bar x^2}{s^2 - \bar x}.$$
Using these parameter values in the negative binomial seems to fit the data better.

R Session:

R version 2.10.1 (2009-12-14)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[R.app GUI 1.31 (5538) powerpc-apple-darwin8.11.1]

[Workspace restored from /Users/andrejstreibergs/.RData]

###### ENTER ACCIDENT DATA FOR WOMEN AT THE MUNITIONS PLANT #################
# acc records the freq for each number of accidents
acc <- scan()
1: 447 132 42 21 3 2 0
8: 
Read 7 items

# noacc records the corresponding number of accidents ("6" means 6 or more)
noacc <- 0:6

# The total number of women is n
n <- sum(acc); n
[1] 647
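The method-of-moments estimates can be reproduced outside R as well. A Python sketch using the Greenwood & Yule counts entered in the session (the variable names `p_hat` and `r_hat` are mine):

```python
# Frequencies of women with 0, 1, ..., 6-or-more accidents
acc = [447, 132, 42, 21, 3, 2, 0]
n = sum(acc)                        # 647 women in total

# Sample mean and sample variance of the number of accidents
xbar = sum(x * a for x, a in enumerate(acc)) / n
s2 = (sum(x * x * a for x, a in enumerate(acc)) - xbar ** 2 * n) / (n - 1)

# Method of moments: p-hat = xbar / s^2, r-hat = xbar^2 / (s^2 - xbar)
p_hat = xbar / s2
r_hat = xbar ** 2 / (s2 - xbar)
# matches the R session: xbar = 0.4652241, s2 = 0.6919002
```

Note that `s2 > xbar` here, which is exactly the overdispersion that rules out the Poisson fit and makes `r_hat` positive.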

# The average number of accidents per woman
xbar <- sum(acc*noacc)/n; xbar
[1] 0.4652241

# The variance of the number of accidents
s2 <- (sum(noacc^2*acc) - xbar^2*n)/(n-1); s2
[1] 0.6919002

########## METHOD OF MOMENTS ESTIMATOR FOR PARAMETERS #######################
pp <- xbar/s2
rr <- xbar^2/(s2 - xbar)

# Set up vector of probabilities for neg. binomial.
probnb <- dnbinom(0:6, rr, pp)
# Replace probnb[7] by total of six or more.
# lower.tail=F means probability is given for X > 5
probnb[7] <- pnbinom(5, rr, pp, lower.tail=F)
n
[1] 647

# Distribute the total of 647 women among their number of accidents.
# enb is expected no. accidents using neg. binom.
enb <- probnb*n

########## COMPARE TO BEST POISSON ESTIMATE #################################
# Set up vector of probabilities for Poisson.
probpois <- dpois(0:6, xbar)
probpois[7] <- ppois(5, xbar, lower.tail=F)

# Distribute the total of 647 women among their number of accidents.
# epois is expected no. accidents using Poisson.
epois <- probpois*n

################## BUILD TABLE TO COMPARE OBS. TO EXPECTED ##################
m <- matrix( c(acc, sum(acc), round(epois, 3), sum(round(epois, 3)),
+ round(enb, 3), sum(round(enb, 3))), ncol = 3)
colnames(m) <- c("Observed", " Poisson", " Neg. Binom.")
rownames(m) <- c(0:5, "Over 5", "Sum")
m
       Observed  Poisson  Neg. Binom.
0           447  406.312      442.907
1           132  189.026      138.546
2            42   43.970       44.364
3            21    6.819       14.315
4             3    0.793        4.637
5             2    0.074        1.505
Over 5        0    0.006        0.726
Sum         647  647.000      647.000
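The expected-count columns of the table can be recomputed directly from the two fitted pmfs. A Python sketch; the values of `r` and `p` are hard-coded from the method-of-moments fit above, so the last digits are approximate:

```python
import math

n = 647
lam = 0.4652241             # Poisson fit: the sample mean
r, p = 0.954814, 0.672386   # neg. binomial fit (method of moments, approximate)

# Cell probabilities for 0..5 accidents, with "6 or more" lumped at the end
pois = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(6)]
pois.append(1.0 - sum(pois))
nb = [math.gamma(r + x) / (math.gamma(r) * math.factorial(x))
      * p ** r * (1 - p) ** x for x in range(6)]
nb.append(1.0 - sum(nb))

# Distribute the 647 women among the cells
epois = [n * pr for pr in pois]   # expected counts under the Poisson fit
enb = [n * pr for pr in nb]       # expected counts under the neg. binomial fit
```

The extremes (0 and 3-or-more accidents) are where the Poisson column falls visibly short of the observed counts while the negative binomial column tracks them.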

############ PLOT COMPARATIVE HISTOGRAMS ####################################
clr <- rainbow(15)
plot(c(0:6, 0:6, 0:6), c(acc, epois, enb), type = "n",
+ main = "Distribution of Number of Accidents", ylab = "Number of Women",
+ xlab = "Number of Accidents")
points((0:6) - 0.15, acc, type = "h", col = clr[1], lwd = 10)
points((0:6), epois, type = "h", col = clr[5], lwd = 10)
points((0:6) + 0.15, enb, type = "h", col = clr[11], lwd = 10)
legend(3.5, 425, legend = c("Observed", "Poisson Pred.",
+ "Neg. Binomial Pred."), fill = c(2, 3, 4))
# M3074Munitions1.pdf

[Figure: "Distribution of Number of Accidents" — comparative histogram of observed counts vs. Poisson and negative binomial predictions; x-axis "Number of Accidents" (0 to 6), y-axis "Number of Women"; legend: Observed, Poisson Pred., Neg. Binomial Pred.]

################# DO CHI-SQUARED TESTS ON FIT ###############################
# Lump 4 or over into single bin
acc1 <- c(acc[1:4], acc[5]+acc[6]+acc[7])
acc1
[1] 447 132  42  21   5

probpois1 <- c(probpois[1:4], probpois[5]+probpois[6]+probpois[7])
sum(probpois1)
[1] 1
probnb1 <- c(probnb[1:4], probnb[5]+probnb[6]+probnb[7])
sum(probnb1)
[1] 1

chisq.test(acc1, p=probpois1)

        Chi-squared test for given probabilities

data:  acc1
X-squared = 70.3724, df = 4, p-value = 1.894e-14

Warning message:
In chisq.test(acc1, p = probpois1) :
  Chi-squared approximation may be incorrect

# Since one parameter was estimated, the critical 0.05 chisq and p-value are
qchisq(.05, 3, lower.tail=F)
[1] 7.814728
pchisq(70.3724, 3, lower.tail=F)
[1] 3.552333e-15

chisq.test(acc1, p=probnb1)

        Chi-squared test for given probabilities

data:  acc1
X-squared = 4.1026, df = 4, p-value = 0.3923

# Since two parameters were estimated, the critical 0.05 chisq and p-value are
qchisq(.05, 2, lower.tail=F)
[1] 5.991465
pchisq(4.1026, 2, lower.tail=F)
[1] 0.1285677

# It seems that we cannot reject H0: distribution is neg. binom.,
# but we do reject H0: distribution is Poisson. However, the test is not
# valid: the parameters for nb were not MLE estimators for binned data (for
# that matter, neither was the parameter for Poisson!)
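The two X-squared statistics can be recomputed from the definition X² = Σ (Oᵢ − Eᵢ)²/Eᵢ over the lumped bins. A Python sketch; the cell probabilities are rebuilt as in the session, and the method-of-moments estimates for `r` and `p` are hard-coded approximately:

```python
import math

obs = [447, 132, 42, 21, 5]   # observed counts; "4 or more" lumped at the end
lam = 0.4652241               # Poisson fit
r, p = 0.954814, 0.672386     # neg. binomial fit (approximate)

# Cell probabilities for 0..3 accidents plus P(X >= 4)
pois = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(4)]
pois.append(1.0 - sum(pois))
nb = [math.gamma(r + x) / (math.gamma(r) * math.factorial(x))
      * p ** r * (1 - p) ** x for x in range(4)]
nb.append(1.0 - sum(nb))

def chisq_stat(obs, probs):
    # X^2 = sum over cells of (observed - expected)^2 / expected
    n = sum(obs)
    return sum((o - n * pr) ** 2 / (n * pr) for o, pr in zip(obs, probs))

print(round(chisq_stat(obs, pois), 2))   # ~70.4: Poisson fits badly
print(round(chisq_stat(obs, nb), 2))     # ~4.1: neg. binomial fits well
```

Comparing these against the χ² critical values in the session requires reducing the degrees of freedom by the number of estimated parameters, as the handout's closing comment notes.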