Confidence and prediction intervals for generalised linear accident models


G.R. Wood

September 8, 2004

Department of Statistics, Macquarie University, NSW 2109, Australia

Abstract

Generalised linear models, with log link and either Poisson or negative binomial errors, are commonly used for relating accident rates to explanatory variables. This paper adds to the toolkit for such models. It describes how confidence intervals (for example, for the true accident rate at given flows) and prediction intervals (for example, for the number of accidents at a new site with given flows) can be produced using spreadsheet technology.

Running head: Confidence intervals for accident models

Key words and phrases: Generalised linear model; negative binomial; Poisson

1 Introduction

Generalised linear models have gained recognition in recent years (Maycock and Hall, 1984; Hauer et al., 1988; Maher and Summersgill, 1996) as useful tools for relating the number of accidents, of a specified type, to explanatory variables such as vehicle flows. For the single flow model, the true mean number of accidents $\mu$ is modelled as $\beta_0 x^{\beta_1}$, where $x$ denotes the flow. The distribution of the observed number of accidents, for a given flow, is assumed to be either Poisson or, more generally, negative binomial about this mean value. The negative binomial distribution occurs naturally when variation of safety $M$ between sites with a given flow is modelled by a gamma distribution, and variation of the number of accidents $Y$ within a site with safety $M$ is modelled by a Poisson distribution with mean $M$. A detailed description of these models has been given in the companion paper (Wood, 2002), where methods for assessing goodness of fit were described.

Once goodness of fit is established for a model it is of interest to provide confidence intervals (for model parameters) and prediction intervals (for dependent variables); this is routinely carried out when working with linear models. Such intervals provide information about the extent of variation in these quantities. In this context the intervals of interest, for a given flow, are:

i) A confidence interval for $\mu$, the true accident rate.
ii) a) For a Poisson model, a prediction interval for $y$, the number of accidents at a new site.
    b) For a negative binomial model, a prediction interval for $m$, the safety of a new site, and a prediction interval for $y$, the number of accidents at a new site.

The purpose of this paper is to provide formulae, in Section 2, which enable construction of these intervals, and to illustrate their use with real accident data in Section 3. The required calculations can be carried out on a spreadsheet. Exposition is generally in terms of models with a single flow; models with more than one explanatory variable are handled in an extended, but similar, fashion. Notation and terminology used in this paper are as in Wood (2002).

Standard texts, for example McCullagh and Nelder (1989), discuss confidence intervals for generalised linear model parameters; the author, however, has not found the approach discussed here in the literature, other than in Maher and Summersgill (1996). Here we clarify, amplify and extend that work. Specifically, approximate confidence and prediction intervals appropriate for a given flow are developed. We caution that the confidence level necessarily decreases if we wish to make statements about many flow values at once. For this, so-called simultaneous (and necessarily wider) confidence bands are needed. The development of simultaneous confidence bands is a topic of current research; the work of Sun et al. (2000) produces such confidence bands for the mean in a generalised linear model.

This paper can be read in two ways. A reader interested in the practical construction of confidence and prediction intervals should skim Section 2, then work carefully through the examples of Section 3, referring to Section 2 and the appendix for formulae as needed (Table 4 provides an overall summary). For the reader interested in the underlying theory, careful reading of Section 2 and the appendix is recommended.

2 Confidence and prediction intervals

A confidence interval for the true mean, for both the Poisson and negative binomial models, is developed in Section 2.1. In Section 2.2 a prediction interval for a predicted number of accidents at a new site is derived for the Poisson model, while in Section 2.3 prediction intervals for safety and predicted number of accidents at a new site are produced for negative binomial models.

2.1 Confidence interval for µ

The generalised linear model we have described uses a log link function; the logarithm of $\mu$ is linear in the model parameters $\beta_0' = \log\beta_0$ and $\beta_1$, since

$$\eta = \log\mu = \log\beta_0 + \beta_1 \log x = \beta_0' + \beta_1 \log x$$

for the single flow model. Standard generalised linear model theory gives that asymptotically the estimates $b_0'$ and $b_1$, of $\beta_0'$ and $\beta_1$ respectively, have a bivariate normal distribution (Dobson, 1990), in particular

$$\begin{pmatrix} b_0' \\ b_1 \end{pmatrix} \sim N\left( \begin{pmatrix} \beta_0' \\ \beta_1 \end{pmatrix},\; I^{-1} \right),$$

so they are asymptotically unbiased, with covariance matrix the inverse of the information matrix $I$. It follows that $\hat\eta = b_0' + b_1 \log x$ has asymptotically a normal distribution and, since $\hat\eta = \log\hat\mu$, where $\hat\mu = e^{b_0'} x^{b_1}$, $\hat\mu$ has an approximately lognormal distribution.
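The estimates and their covariance matrix can be obtained numerically. The following is a minimal sketch rather than the author's procedure: it assumes a Poisson model with log link fitted by Fisher scoring, for which the information matrix is $X^T W X$ with $W = \mathrm{diag}(\hat\mu_i)$. The helper name `fit_poisson_log_link` and the ten flow and accident values are invented for illustration and are not taken from the paper's data.

```python
import math

def fit_poisson_log_link(flows, counts, max_iter=50, tol=1e-10):
    """Fit log(mu) = b0' + b1*log(x) by Fisher scoring (hypothetical helper).

    Returns (b, cov): b = [b0', b1] and cov, the inverse of the
    information matrix X^T W X with W = diag(mu_i).
    """
    logs = [math.log(x) for x in flows]
    # Start from the intercept-only fit: constant mean, zero slope.
    b = [math.log(sum(counts) / len(counts)), 0.0]
    for _ in range(max_iter):
        mu = [math.exp(b[0] + b[1] * l) for l in logs]
        # Entries of the 2x2 information matrix X^T W X.
        s0 = sum(mu)
        s1 = sum(m * l for m, l in zip(mu, logs))
        s2 = sum(m * l * l for m, l in zip(mu, logs))
        det = s0 * s2 - s1 * s1
        cov = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]
        # Score vector X^T (y - mu).
        u0 = sum(y - m for y, m in zip(counts, mu))
        u1 = sum((y - m) * l for y, m, l in zip(counts, mu, logs))
        step = [cov[0][0] * u0 + cov[0][1] * u1,
                cov[1][0] * u0 + cov[1][1] * u1]
        b = [b[0] + step[0], b[1] + step[1]]
        if max(abs(step[0]), abs(step[1])) < tol:
            break
    # cov was evaluated just before the final (negligible) step.
    return b, cov

# Invented data: flows and accident counts at ten sites.
flows = [300, 500, 800, 1200, 2000, 3000, 4500, 6000, 8000, 10000]
counts = [0, 1, 0, 1, 2, 1, 3, 2, 4, 5]

b_hat, cov = fit_poisson_log_link(flows, counts)
eta_hat = b_hat[0] + b_hat[1] * math.log(600.0)  # linear predictor at x = 600
mu_hat = math.exp(eta_hat)                       # fitted mean at x = 600
print("b0', b1 =", b_hat)
print("inverse information matrix =", cov)
print("fitted mean at x = 600:", mu_hat)
```

A statistical package would normally report the same two quantities directly; the sketch only makes explicit where the matrix $I^{-1}$ comes from.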

This enables us to write down an approximate 95% confidence interval for $\eta$, when the flow is $x$, as

$$b_0' + b_1 \log x \pm 1.96\sqrt{\mathrm{Var}(b_0' + b_1 \log x)},$$

whence a 95% confidence interval for $\mu = e^{\eta}$ is given by

$$\left[\, e^{\,b_0' + b_1 \log x - 1.96\sqrt{\mathrm{Var}(b_0' + b_1 \log x)}},\;\; e^{\,b_0' + b_1 \log x + 1.96\sqrt{\mathrm{Var}(b_0' + b_1 \log x)}} \,\right].$$

The lower boundary is closer to the estimate $\hat\mu$ of $\mu$ than is the higher boundary, reflecting the right-skewed lognormal distribution of the estimate $\hat\mu$. Here

$$\mathrm{Var}(b_0' + b_1 \log x) = \mathrm{Var}(b_0') + 2\log x\,\mathrm{Cov}(b_0', b_1) + (\log x)^2\,\mathrm{Var}(b_1) = (I^{-1})_{11} + 2\log x\,(I^{-1})_{12} + (\log x)^2 (I^{-1})_{22}.$$

Illustrative real examples are given in Section 3. Note that $\hat\eta = (1, \log x)(b_0', b_1)^T$ (where $T$ denotes transpose), so in practice $\mathrm{Var}(\hat\eta)$ is most easily calculated as

$$\mathrm{Var}(\hat\eta) = (1, \log x)\, I^{-1}\, (1, \log x)^T.$$

There are two ways to find the components of $I^{-1}$. If the model is fitted using a statistical package then options are generally available which output the covariance matrix $I^{-1}$ of the parameters. On the other hand, if using the first principles method described in Wood (2002, A.3), then the required covariance matrix is $(X^T W X)^{-1}$, where $X$ is the design matrix and $W$ a diagonal matrix.

A final remark in this subsection: the lognormal distribution of $\hat\mu$ discussed can be approximated by a normal distribution, or

$$\hat\mu \sim N\left(\mu_0 = \mu,\; \sigma_0^2 = \mu^2\,\mathrm{Var}(\hat\eta)\right)$$

(as in Maher and Summersgill (1996), Equation (14)). This approximate sampling distribution for $\hat\mu$ is fundamental in the sequel.

2.2 Poisson model

We consider the case of the Poisson model and an interval for a predicted number of accidents, $y$. Under the model, given a true mean accident rate of $\mu$, the conditional distribution of accidents $Y$ is Poisson with mean $\mu$. A prediction interval for the number of accidents $Y$, however, must now accommodate the approximately normal variation in $\hat\mu$, our estimator of $\mu$, as $N(\mu_0, \sigma_0^2)$. Table 1 summarises the variables involved.

| Variable description | Variable notation | Distribution |
|---|---|---|
| Accident rate, given true rate $\mu$ | $Y \mid \mu$ | Poisson($\mu$) |
| Estimator of true mean accident rate | $\hat\mu$ | $N(\mu_0, \sigma_0^2)$ |

Table 1: The two levels of variation, first in $\mu$, then in $Y$ given $\mu$, to be considered when forming a prediction interval for $y$ in the Poisson model.

The marginal distribution of $Y$ is thus a mixture of Poisson distributions, on the mean, by a normal distribution. It can be shown that the distribution of $Y$, supported by $\{0, 1, 2, \ldots\}$, has mean $\mu_0$ and variance $\sigma_0^2 + \mu_0$. (The key to this calculation is the observation that a moment about the origin of a mixture is the mixture of the corresponding moments of the distributions being mixed.) Our intuition does tell us that this variance should depend on that of $\hat\mu$, namely $\sigma_0^2$, and also should increase as $\mu_0$ increases, since Poisson distributions with larger mean have greater variance.
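The calculations of Sections 2.1 and 2.2 can be sketched in a few lines. This is an illustrative sketch only: the parameter estimates `b` and the covariance matrix `cov` below are hypothetical values chosen to give a valid (positive definite) matrix, not the values fitted in the paper, and the function names are assumptions made for this example.

```python
import math

def var_eta(a, cov):
    """Quadratic form a * cov * a^T for a row vector a and a square matrix cov."""
    n = len(a)
    return sum(a[i] * cov[i][j] * a[j] for i in range(n) for j in range(n))

def mu_confidence_interval(b, cov, x, z=1.96):
    """Point estimate, Var(eta_hat) and approximate CI for mu at flow x."""
    a = [1.0, math.log(x)]
    eta_hat = sum(ai * bi for ai, bi in zip(a, b))
    v = var_eta(a, cov)
    half_width = z * math.sqrt(v)
    interval = (math.exp(eta_hat - half_width), math.exp(eta_hat + half_width))
    return math.exp(eta_hat), v, interval

def poisson_predictive_moments(mu_hat, v):
    """Mean and variance of Y for the Poisson model: mu_0 and sigma_0^2 + mu_0."""
    sigma0_sq = mu_hat ** 2 * v
    return mu_hat, sigma0_sq + mu_hat

# Hypothetical estimates (b0', b1) and covariance matrix I^{-1}.
b = [-6.0, 0.6]
cov = [[0.64, -0.085], [-0.085, 0.0115]]

mu_hat, v, (lower, upper) = mu_confidence_interval(b, cov, x=600.0)
mean_y, var_y = poisson_predictive_moments(mu_hat, v)
print("mu_hat:", mu_hat, "Var(eta_hat):", v)
print("95% confidence interval for mu:", (lower, upper))
print("mean and variance of predicted Y:", mean_y, var_y)
```

Because the interval is symmetric on the log scale, the product of its end points equals $\hat\mu^2$, which is one way to see why the lower limit sits closer to $\hat\mu$ than the upper limit does.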

Chebyshev's inequality (Feller, 1966), namely

$$P(|Y - \mu_Y| \ge t\sigma_Y) \le \frac{1}{t^2}$$

for $t > 0$, is a most useful result and can be used to produce a prediction interval. When the interval is one-sided (so for low mean values) Chebyshev's one-sided inequality (Feller, 1966, Section V.7, Example (a)), namely

$$P(Y - \mu_Y \ge t\sigma_Y) \le \frac{1}{1 + t^2}$$

for $t > 0$, provides a tighter interval. A formula, exploiting the discreteness of the distribution of $Y$, however, has been developed and offers a slight strengthening of this approach. It is described in the appendix and its use illustrated in Section 3. Further tightening of this prediction interval is doubtless still possible, for example, by making use of third and higher moments.

2.3 Negative binomial model

We now develop intervals for safety and predicted number of accidents, for the negative binomial model; there are three mixtures involved. We first study construction of a prediction interval for site safety $M$, which rests on a mixing process analogous to the Poisson case. We then see that the negative binomial conditional distribution of $Y$, given $\mu$ and $k$, is a mixture, as is the marginal distribution of $Y$.
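Returning briefly to the bounds of Section 2.2, which are reused for the negative binomial model, the upper limit of a one-sided prediction interval for $y$ can be sketched as follows. The first function applies the one-sided Chebyshev inequality with $1/(1 + t^2) = \alpha$; the second applies the strengthened limits of Table 5 in the appendix, as read from the derivation given there. The function names and the example mean and variance are assumptions made for illustration.

```python
import math

def chebyshev_one_sided_upper(mean, var, alpha=0.05):
    """Largest integer not exceeding mean + t*sd, where 1/(1 + t^2) = alpha."""
    t = math.sqrt(1.0 / alpha - 1.0)  # sqrt(19) for alpha = 0.05, 3 for alpha = 0.10
    return math.floor(mean + t * math.sqrt(var))

def appendix_upper(mean, var, alpha=0.05):
    """Upper limit following Table 5 of the appendix; requires 0 <= mean < 1."""
    if not 0.0 <= mean < 1.0:
        raise ValueError("the appendix formulae apply only for 0 <= mean < 1")
    if mean <= alpha:
        return 0  # the interval is {0}
    if mean <= 0.5:
        radicand = mean ** 2 - (mean ** 2 - var) / alpha
    else:
        radicand = 1.0 + mean ** 2 + (mean ** 2 + var - mean * (1.0 + 2.0 * alpha)) / alpha
    # max() only guards against inputs that no distribution on {0, 1, 2, ...} can produce.
    return math.floor(mean + math.sqrt(max(radicand, 0.0)))

# Hypothetical mean and variance of a predicted Y.
mean_y, var_y = 0.3, 0.4
print("Chebyshev upper limit:", chebyshev_one_sided_upper(mean_y, var_y))
print("Appendix upper limit: ", appendix_upper(mean_y, var_y))
```

The appendix limit is not smaller for every mean and variance pair, so in practice it seems reasonable to compute both and quote the smaller.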

Prediction interval for safety m

Here we find a prediction interval for $m$, the underlying site safety described in Hauer et al. (1988), as the flow $x$ varies. We answer the question "If we selected another site with flow $x$, where would $m$ lie?" Table 2 summarises the variables involved.

| Variable description | Variable notation | Distribution |
|---|---|---|
| Safety, given true rate $\mu$ | $M \mid (\mu, k)$ | Gamma($k$, $\mu/k$) |
| Estimator of true mean accident rate | $\hat\mu$ | $N(\mu_0, \sigma_0^2)$ |

Table 2: The two levels of variation, first in $\mu$, then in $M$ given $\mu$ and $k$, needed when producing a prediction interval for safety $m$, for the negative binomial model. The gamma distribution has shape $k$ and scale $\mu/k$, so mean $\mu$ and variance $\mu^2/k$.

Here we mix gamma distributions with a normal; it can be shown that the marginal distribution of $M$ has mean $\mu_0$ and variance $\sigma_0^2 + (\sigma_0^2 + \mu_0^2)/k$. If we assume approximate normality (this will improve as $\mu$ increases), and recalling that $k$ is estimated as $\hat k$ during the fitting process, an approximate 95% prediction interval for site safety $m$ is

$$\hat\mu \pm 1.96\sqrt{\hat\mu^2\,\mathrm{Var}(\hat\eta) + \frac{\hat\mu^2\,\mathrm{Var}(\hat\eta) + \hat\mu^2}{\hat k}}.$$

Estimates $\hat\mu$ and $\mathrm{Var}(\hat\eta)$ can be readily evaluated, as described earlier. The lower limit of this interval may be negative, due to the use of a normal approximation to the lognormally distributed $\hat\mu$; in this case the lower limit should be set to zero. Simulation tests, using typical parameter values, have shown this to provide a satisfactory prediction interval approximation.
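The interval for $m$, together with a small Monte Carlo check of the variance formula, can be sketched as below. This is not the simulation study referred to in the text: the parameter values are hypothetical, the gamma distribution is taken with shape $k$ and scale $\mu/k$, and the function names are assumptions made for this example.

```python
import math
import random

def safety_variance(mu0, sigma0_sq, k):
    """Variance of M: sigma_0^2 + (sigma_0^2 + mu_0^2) / k."""
    return sigma0_sq + (sigma0_sq + mu0 ** 2) / k

def safety_prediction_interval(mu_hat, var_eta, k_hat, z=1.96):
    """Approximate prediction interval for m, with the lower limit raised to zero."""
    sigma0_sq = mu_hat ** 2 * var_eta
    half_width = z * math.sqrt(safety_variance(mu_hat, sigma0_sq, k_hat))
    return max(0.0, mu_hat - half_width), mu_hat + half_width

def simulate_safety(mu0, sigma0_sq, k, n=100_000, seed=1):
    """Draw mu from the normal, then M from Gamma(shape k, scale mu/k)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        mu = rng.gauss(mu0, math.sqrt(sigma0_sq))
        if mu <= 0.0:
            continue  # discard non-positive draws allowed by the normal approximation
        draws.append(rng.gammavariate(k, mu / k))
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
    return mean, var

# Hypothetical values: mu_hat = 1, Var(eta_hat) = 0.01, k_hat = 2.
lo, hi = safety_prediction_interval(1.0, 0.01, 2.0)
sim_mean, sim_var = simulate_safety(1.0, 0.01, 2.0)
print("95% prediction interval for m:", (lo, hi))
print("formula variance:", safety_variance(1.0, 0.01, 2.0))
print("simulated mean and variance:", sim_mean, sim_var)
```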

Prediction interval for number of accidents y

Here we find a prediction interval for the number of accidents $y$ at a site, randomly chosen from those with flow $x$. The relevant variables are summarised in Table 3.

| Variable description | Variable notation | Distribution |
|---|---|---|
| Accident rate, given safety $m$ | $Y \mid m$ | Poisson($m$) |
| Safety, given true rate $\mu$ | $M \mid \mu, k$ | Gamma($k$, $\mu/k$) |
| Estimator of true mean accident rate | $\hat\mu$ | $N(\mu_0, \sigma_0^2)$ |

Table 3: The three levels of variation involved in forming a prediction interval for $y$ in the negative binomial model; first in $\mu$, then in $M$ given $\mu$ and $k$, and finally in $Y$ given $m$. The gamma distribution has shape $k$ and scale $\mu/k$.

The model builds the distribution of accidents across all sites with flow $x$, first as a mixture of the Poisson within-site variation $Y \mid m$ by the gamma across-site variation $M \mid \mu, k$, well known to be negative binomial with parameters $k$ and $p = k/(\mu + k)$, as described in Wood (2002, p. 425). Second, we must now recognise that $\mu$ itself is unknown, so the accident rate is really a mixture of negative binomial $Y \mid k, p$ variables by a normal distribution on $\mu$. The marginal distribution of $Y$, the number of accidents at a site with flow $x$, having support in $\{0, 1, 2, \ldots\}$, can be shown to have mean $\mu_0$ and variance $\sigma_0^2 + (\sigma_0^2 + \mu_0^2)/k + \mu_0$. All quantities can be estimated during model fitting, as earlier. Note that as $k$ increases the variance shrinks to that found in the Poisson case. Chebyshev's one-sided inequality, or the slightly stronger method of the appendix, can be used to produce a prediction interval for $y$.

3 Examples

Illustrations of each of the intervals described are now given, using three New Zealand accident datasets.

3.1 Poisson model with one independent variable

A Poisson model was used to relate loss-of-control accidents to incoming flow at 289 arms of both priority and uncontrolled T-intersections throughout New Zealand, yielding $b_0 =$ […] and $b_1 =$ […]. The grouped $G^2$ method described in Wood (2002, Section 4.3) was used to test the goodness of fit of the model; there was no evidence of poor fit.

In order to find a confidence interval for $\mu$, the covariance matrix $(X^T W X)^{-1}$ was obtained as a by-product of the fitting process, as $I^{-1} =$ […]. For a flow of $x = 600$, for example, this gives $\mathrm{Var}(b_0' + b_1 \log x) =$ […], whence an approximate 95% confidence interval for $\mu$ is [0.0251, […]]. The full confidence band, as $x$ varies, is shown (dashed line) around the fitted curve (solid line) in Figure 1.

For each flow, $\mu_0 = \hat\mu$ and $\sigma_0^2 = \hat\mu^2\,\mathrm{Var}(\hat\eta)$ were then calculated, as described in Section 2.2. The formula given in the appendix was then applied, yielding the 95% band for a predicted $y$ value, shown as the stepped line in Figure 1. (The 90% band was everywhere {0}, so coincides with the horizontal axis in the figure.)

Figure 1 here

Figure 1: A Poisson model (solid curve) relates accident rate to flow for the loss-of-control data. A 95% confidence band for the true accident rate $\mu$ is shown with the dashed lines, while a 95% prediction band for the number of accidents $y$ at a new site is shown with the stepped line.

3.2 Negative binomial model with one independent variable

Rear-end accidents at 392 arms of signalised crossroads throughout New Zealand were related to flow using the negative binomial model, producing $b_0 =$ […], $b_1 =$ […] and $\hat k = 0.60$. Goodness of fit of the model was tested using the method presented in Wood (2002, Section 4.4), revealing no evidence of poor fit.

Again, the covariance matrix for $b_0'$ and $b_1$, $(X^T W X)^{-1}$, was obtained as $I^{-1} =$ […], allowing construction of a 95% confidence band for $\mu$, as described for the Poisson model. For example, for $x = 10000$, we find $\hat\mu =$ […] and $\mathrm{Var}(\hat\eta) = \mathrm{Var}(b_0' + b_1 \log x) =$ […], whence an approximate 95% confidence interval is [0.2036, […]]. The full confidence band is shown (long dashes) around the fitted curve (solid line) in Figure 2.

The variance $\sigma_0^2 + (\sigma_0^2 + \mu_0^2)/k$ of $M$, for a given flow, was then estimated as described in Section 2.3. Continuing the example, for $x = 10000$, and recalling that $\hat k = 0.60$, this leads to a 95% prediction interval for $m$ of [0, 1.003]. The full prediction band is shown (short dashes) in Figure 2.

Finally, the variance of a predicted $Y$ is calculated using the formula given in Section 2.3, for each flow level, and the formula described in the appendix used to calculate the upper limit of the prediction interval. For example, for $x = 10000$ and using $\hat\mu$, $\mathrm{Var}(\hat\eta)$ and $\hat k$ as before, we find that {0, 1, 2} provides a 90% interval. The full prediction band is shown in Figure 2 (stepped horizontal lines).

Figure 2 here

Figure 2: A negative binomial model (solid curve) relates accident rate to flow for the rear-end data. A 95% confidence band for the true accident rate $\mu$ is shown (long dashes) and a 95% prediction band for the safety $m$ (short dashes), while a 90% prediction band for the number of accidents at a new site is shown (stepped horizontal lines).

3.3 Negative binomial model with two independent variables

Right turn against vehicle accidents at four-arm, two-way signalised intersections were related to the two traffic flows (annual average daily through flow $x_1$ and annual average daily right-turning flow $x_2$) in a recent New Zealand study, using a negative binomial model with $\mu = \beta_0 x_1^{\beta_1} x_2^{\beta_2}$. Maximum likelihood estimation of the parameters gave $b_0'\,(= \log b_0) =$ […], $b_1 =$ […], $b_2 =$ […], $\hat k = 1.8$ and a covariance matrix for the three parameters of $I^{-1} =$ […].

Using centrally placed flows of $x_1 =$ […] and $x_2 = 5000$, a 95% confidence interval for the associated true accident rate $\mu$ will be

$$\left[\, e^{\hat\eta - 1.96\sqrt{\mathrm{Var}(\hat\eta)}},\;\; e^{\hat\eta + 1.96\sqrt{\mathrm{Var}(\hat\eta)}} \,\right]$$

where $\hat\eta = \log\hat\mu = b_0' + b_1 \log x_1 + b_2 \log x_2$ and

$$\mathrm{Var}(\hat\eta) = \mathrm{Var}(b_0') + (\log x_1)^2\,\mathrm{Var}(b_1) + (\log x_2)^2\,\mathrm{Var}(b_2) + 2\log x_1\,\mathrm{Cov}(b_0', b_1) + 2\log x_2\,\mathrm{Cov}(b_0', b_2) + 2\log x_1 \log x_2\,\mathrm{Cov}(b_1, b_2).$$

As remarked earlier, these calculations are most easily carried out by noting that

$$\hat\eta = (1, \log x_1, \log x_2)(b_0', b_1, b_2)^T$$

whence

$$\mathrm{Var}(\hat\eta) = (1, \log x_1, \log x_2)\, I^{-1}\, (1, \log x_1, \log x_2)^T.$$

Substituting values gives $\hat\eta =$ […] and $\mathrm{Var}(\hat\eta) =$ […], whence a 95% confidence interval for $\mu$ is [1.0907, […]].

A 95% prediction interval for safety $m$ and the given flows is still given by

$$\hat\mu \pm 1.96\sqrt{\hat\mu^2\,\mathrm{Var}(\hat\eta) + \frac{\hat\mu^2\,\mathrm{Var}(\hat\eta) + \hat\mu^2}{\hat k}}.$$
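The agreement between the expanded six-term expression for $\mathrm{Var}(\hat\eta)$ and the matrix form can be checked numerically. The sketch below uses a hypothetical positive definite covariance matrix and a hypothetical through flow $x_1$, since the fitted values are not reproduced here; only $x_2 = 5000$ comes from the example. The function names are assumptions made for this illustration.

```python
import math

def var_eta_matrix(a, cov):
    """Matrix form: a * cov * a^T for a row vector a."""
    n = len(a)
    return sum(a[i] * cov[i][j] * a[j] for i in range(n) for j in range(n))

def var_eta_expanded(x1, x2, cov):
    """Expanded form: three variance terms plus three covariance terms."""
    l1, l2 = math.log(x1), math.log(x2)
    return (cov[0][0]
            + l1 ** 2 * cov[1][1]
            + l2 ** 2 * cov[2][2]
            + 2.0 * l1 * cov[0][1]
            + 2.0 * l2 * cov[0][2]
            + 2.0 * l1 * l2 * cov[1][2])

# Hypothetical covariance matrix for (b0', b1, b2); symmetric and positive definite.
cov = [[2.0, -0.15, -0.08],
       [-0.15, 0.02, 0.0005],
       [-0.08, 0.0005, 0.01]]
x1, x2 = 10000.0, 5000.0  # x1 is an assumed value for illustration

a = [1.0, math.log(x1), math.log(x2)]
v_matrix = var_eta_matrix(a, cov)
v_expanded = var_eta_expanded(x1, x2, cov)
print("matrix form:  ", v_matrix)
print("expanded form:", v_expanded)
```

On a spreadsheet the matrix form is typically a single nested matrix-product formula, which is why it is the more convenient of the two once there are several explanatory variables.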

Since $\hat\mu = e^{\hat\eta} =$ […], substituting values and raising the lower boundary to zero produces the wider interval of [0, […]].

Finally, the number of accidents $Y$ at a site with the given flows will have mean estimated as $\hat\mu =$ […] and variance as

$$\hat\mu^2\,\mathrm{Var}(\hat\eta) + \frac{\hat\mu^2\,\mathrm{Var}(\hat\eta) + \hat\mu^2}{\hat k} + \hat\mu =\ \text{[…]}.$$

Since $\hat\mu > 1$ we must use the one-sided Chebyshev inequality to find the upper limit of a 95% prediction interval for $y$. This yields

$$P\left(Y \ge \mu_Y + \sqrt{19}\,\sigma_Y\right) \le \frac{1}{20},$$

whence substitution of the estimated mean and standard deviation for $Y$ gives an upper limit of […]. Thus we can quote [0, 9] as a 95% prediction interval for the number of right turn against accidents at a signalised intersection with the given flows.

4 Discussion and summary

Comment on the approximations used to produce the intervals is now presented. First, model parameter estimates ($b_0'$, $b_1$, etc.) are only approximately normally distributed, the approximation improving with sample size; this will influence the accuracy of a confidence interval produced for $\mu$. Second, the lognormal distribution of $\hat\mu$ is approximated by a normal distribution, so tending to widen a prediction interval for $m$ and $y$. Finally, use of the Chebyshev inequality is conservative, so again tending to widen a prediction interval. The method given in the appendix was developed in an attempt to improve on the Chebyshev inequality.

Once the model is fitted, for any finite number of independent variables (whether measurement or categorical), all intervals discussed can be calculated using a spreadsheet. In general we have $\hat\eta = a b^T$, $\mathrm{Var}(\hat\eta) = a I^{-1} a^T$ and $\hat\mu = e^{\hat\eta}$, where $a$ is the row vector of coefficients of the parameter estimates in row vector $b$. For example, for the model $\mu = \beta_0 x^{\beta_1} e^{z\beta_2}$, with one measurement variable $x$ and one categorical variable $z$, $a = (1, \log x, z)$ while $b = (b_0', b_1, b_2)$.

The form of all 95% confidence and prediction intervals discussed in this paper is summarised in Table 4, with intervals for $y$ produced using the one-sided Chebyshev inequality. For a 90% prediction interval for $y$, the factor in the formulae is 3, rather than $\sqrt{19}$, being then the solution to $1/(1 + t^2) = 1/10$; for a 99% prediction interval for $y$, the factor would be $\sqrt{99}$. When $\hat\mu < 1$ it may be possible to tighten the interval for $y$ using the formulae in the appendix; for high $\hat\mu$ values a prediction interval for $y$ excluding zero may possibly be found using the two-sided Chebyshev inequality.

| Model | Quantity | 95% interval |
|---|---|---|
| Poisson | $\mu$ | $\left[\hat\mu / e^{1.96\sqrt{\mathrm{Var}(\hat\eta)}},\; \hat\mu\, e^{1.96\sqrt{\mathrm{Var}(\hat\eta)}}\right]$ |
| Poisson | $y$ | $\left[0,\; \left\lfloor \hat\mu + \sqrt{19}\sqrt{\hat\mu^2\,\mathrm{Var}(\hat\eta) + \hat\mu} \right\rfloor\right]$ |
| Negative binomial | $\mu$ | $\left[\hat\mu / e^{1.96\sqrt{\mathrm{Var}(\hat\eta)}},\; \hat\mu\, e^{1.96\sqrt{\mathrm{Var}(\hat\eta)}}\right]$ |
| Negative binomial | $m$ | $\left[\max\left(0,\; \hat\mu - 1.96\sqrt{\hat\mu^2\,\mathrm{Var}(\hat\eta) + \frac{\hat\mu^2\,\mathrm{Var}(\hat\eta) + \hat\mu^2}{\hat k}}\right),\; \hat\mu + 1.96\sqrt{\hat\mu^2\,\mathrm{Var}(\hat\eta) + \frac{\hat\mu^2\,\mathrm{Var}(\hat\eta) + \hat\mu^2}{\hat k}}\right]$ |
| Negative binomial | $y$ | $\left[0,\; \left\lfloor \hat\mu + \sqrt{19}\sqrt{\hat\mu^2\,\mathrm{Var}(\hat\eta) + \frac{\hat\mu^2\,\mathrm{Var}(\hat\eta) + \hat\mu^2}{\hat k} + \hat\mu} \right\rfloor\right]$ |

Table 4: A summary of the 95% confidence and prediction intervals discussed in the paper ($\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$).

An assumption underlying the models used is that the explanatory variables are measured without error. As was pointed out in Section 8 of Maher and Summersgill (1996), for geometric and control variables this presents no problem. Flow variables, however, should be annual average daily traffic (AADT) over the study period. In practice, they are often scaled figures based on a count in a single day, and so are only estimates of AADT. Maher and Summersgill (1996) ran simulation experiments and concluded that for their data, 12 or 16 hour single day counts provided adequate accuracy. The intervals discussed in this paper rely not only on $\hat\eta$ (and hence the flows and parameter estimates) but also on $\mathrm{Var}(\hat\eta)$ and $\hat k$. The effect of error in the AADTs on estimation of the covariance matrix (and hence $\mathrm{Var}(\hat\eta)$) and the parameter $k$ remains to be investigated. As a general guide, however, counts taken over longer periods will provide for greater confidence and prediction interval accuracy.

Collinearity of columns of the design matrix leads to uncertainty in parameter estimates. This uncertainty (seen as larger entries in the covariance matrix $I^{-1}$) is incorporated into the intervals constructed. The upshot is that use of independent explanatory variables will tend to yield tighter confidence and prediction intervals. For example, the through and right-turning flows used in the two variable example in Section 3.3 exhibited extremely low correlation, leading to the low second and third figures on the diagonal of the covariance matrix.

To summarise, approximate confidence intervals for the true mean have been presented (Section 2.1) for Poisson and negative binomial generalised linear accident models. An approximate prediction interval for a predicted accident rate has been developed (Section 2.2) for the Poisson model. The form of this interval for the negative binomial model has also been presented (Section 2.3), together with a prediction interval for a predicted safety. Examples have been used (Section 3) to illustrate the ideas.

Acknowledgements

Dr. Shane Turner is thanked for alerting the author to the importance of this problem and for kindly supplying (together with Aaron Roozenburg) the three data sets. The research was partly supported by the Land Transport Safety Authority of New Zealand. Pertinent comments from a referee led to improvements in the paper.

Appendix

A strengthening of the one-sided Chebyshev inequality, exploiting the integer support of $Y$ and appropriate for the typically low mean values ($\mu < 1$) encountered in accident modelling, is presented. It provides a formula for the upper limit of a one-sided prediction interval for $y$. In the remainder, $Y$ is a random variable taking values in $\{0, 1, 2, \ldots\}$, with mean $\mu$ and variance $\sigma^2$. Table 5, in summary form, shows the $100(1 - \alpha)\%$ prediction interval as $\mu$ varies.

| $\mu$ interval | $100(1 - \alpha)\%$ prediction interval |
|---|---|
| $0 \le \mu \le \alpha$ | $\{0\}$ |
| $\alpha < \mu \le 0.5$ | $\left\{0, 1, \ldots, \left\lfloor \mu + \sqrt{\mu^2 - (\mu^2 - \sigma^2)/\alpha} \right\rfloor\right\}$ |
| $0.5 < \mu < 1$ | $\left\{0, 1, \ldots, \left\lfloor \mu + \sqrt{1 + \mu^2 + (\mu^2 + \sigma^2 - \mu(1 + 2\alpha))/\alpha} \right\rfloor\right\}$ |

Table 5: Upper limits for the prediction interval for $y$, as $\mu$ varies ($\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$).

The reasoning behind these formulae now follows. We let $p_i = \Pr(Y = i)$ for $i = 0, 1, \ldots$. Then

$$\mu = \sum_{i=0}^{\infty} i p_i = \sum_{i=1}^{\infty} i p_i \ge \sum_{i=1}^{\infty} p_i = 1 - p_0.$$

So if $0 \le \mu \le \alpha$, then $p_0 \ge 1 - \mu \ge 1 - \alpha$, whence $\{0\}$ serves as a $100(1 - \alpha)\%$ prediction interval.

Now suppose that $\alpha < \mu \le 0.5$. It is always the case that

$$\sigma^2 = p_0 (0 - \mu)^2 + \sum_{i=1}^{x-1} p_i (i - \mu)^2 + \sum_{i=x}^{\infty} p_i (i - \mu)^2$$

where $x \ge 1$ is the largest positive integer such that $\Pr(Y \ge x) \ge \alpha$. At least probability $1 - \mu$ sits at zero and at least probability $\alpha$ sits to the right of and including $x$. By placing the maximum possible balance of the probability, $\mu - \alpha$, at the domain point closest to $\mu$, namely zero, we can conservatively find $x$ as the largest solution of

$$\sigma^2 \ge (1 - \mu)\mu^2 + (\mu - \alpha)\mu^2 + \alpha (x - \mu)^2$$

or, with manipulation, the largest solution of

$$x^2 - 2\mu x + \frac{1}{\alpha}(\mu^2 - \sigma^2) \le 0.$$

The larger root of this quadratic is $\mu + \sqrt{\mu^2 - (\mu^2 - \sigma^2)/\alpha}$, from which the result follows.

When $0.5 < \mu < 1$ we follow the same path, but must place the balance of probability at the closest domain point to $\mu$, now 1, so must find the largest integer $x$ for which

$$\sigma^2 \ge (1 - \mu)\mu^2 + (\mu - \alpha)(1 - \mu)^2 + \alpha (x - \mu)^2$$

or

$$x \le \mu + \sqrt{1 + \mu^2 + (\mu^2 + \sigma^2 - \mu(1 + 2\alpha))/\alpha},$$

so demonstrating the result.

References

[1] Dobson, A., 1990. An Introduction to Generalized Linear Models. Chapman and Hall, London.

[2] Feller, W., 1966. An Introduction to Probability Theory and its Applications, Vol. 2. Wiley, New York.

[3] Hauer, E., Ng, J.C.N., Lovell, J., 1988. Estimation of safety at signalized intersections. Transportation Research Record 1185.

[4] Maher, M.J., Summersgill, I., 1996. A comprehensive methodology for the fitting of predictive accident models. Accident Analysis and Prevention 28.

[5] Maycock, G., Hall, R.D., 1984. Accidents at 4-arm roundabouts. Laboratory Report LR1120. Crowthorne, Berks, U.K. Transport Research Laboratory.

[6] McCullagh, P., Nelder, J.A., 1989. Generalised Linear Models (2nd edition). Chapman and Hall, London, New York.

[7] Sun, J., Loader, C., McCormick, W.P., 2000. Confidence bands in generalized linear models. The Annals of Statistics 28.

[8] Wood, G.R., 2002. Generalised linear accident models and goodness of fit testing. Accident Analysis and Prevention 34.


More information

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Probability Sampling Procedures Collection of Data Measures

More information

X = X X n, + X 2

X = X X n, + X 2 CS 70 Discrete Mathematics for CS Fall 2003 Wagner Lecture 22 Variance Question: At each time step, I flip a fair coin. If it comes up Heads, I walk one step to the right; if it comes up Tails, I walk

More information

FINITE MIXTURES OF LOGNORMAL AND GAMMA DISTRIBUTIONS

FINITE MIXTURES OF LOGNORMAL AND GAMMA DISTRIBUTIONS The 7 th International Days of Statistics and Economics, Prague, September 9-, 03 FINITE MIXTURES OF LOGNORMAL AND GAMMA DISTRIBUTIONS Ivana Malá Abstract In the contribution the finite mixtures of distributions

More information

Statistical Data Analysis

Statistical Data Analysis DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the

More information

ABSTRACT KEYWORDS 1. INTRODUCTION

ABSTRACT KEYWORDS 1. INTRODUCTION THE SAMPLE SIZE NEEDED FOR THE CALCULATION OF A GLM TARIFF BY HANS SCHMITTER ABSTRACT A simple upper bound for the variance of the frequency estimates in a multivariate tariff using class criteria is deduced.

More information

2008 Winton. Statistical Testing of RNGs

2008 Winton. Statistical Testing of RNGs 1 Statistical Testing of RNGs Criteria for Randomness For a sequence of numbers to be considered a sequence of randomly acquired numbers, it must have two basic statistical properties: Uniformly distributed

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

STA 2201/442 Assignment 2

STA 2201/442 Assignment 2 STA 2201/442 Assignment 2 1. This is about how to simulate from a continuous univariate distribution. Let the random variable X have a continuous distribution with density f X (x) and cumulative distribution

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

Sampling: A Brief Review. Workshop on Respondent-driven Sampling Analyst Software

Sampling: A Brief Review. Workshop on Respondent-driven Sampling Analyst Software Sampling: A Brief Review Workshop on Respondent-driven Sampling Analyst Software 201 1 Purpose To review some of the influences on estimates in design-based inference in classic survey sampling methods

More information

Freeway rear-end collision risk for Italian freeways. An extreme value theory approach

Freeway rear-end collision risk for Italian freeways. An extreme value theory approach XXII SIDT National Scientific Seminar Politecnico di Bari 14 15 SETTEMBRE 2017 Freeway rear-end collision risk for Italian freeways. An extreme value theory approach Gregorio Gecchele Federico Orsini University

More information

SuperMix2 features not available in HLM 7 Contents

SuperMix2 features not available in HLM 7 Contents SuperMix2 features not available in HLM 7 Contents Spreadsheet display of.ss3 files... 2 Continuous Outcome Variables: Additional Distributions... 3 Additional Estimation Methods... 5 Count variables including

More information

ACTEX CAS EXAM 3 STUDY GUIDE FOR MATHEMATICAL STATISTICS

ACTEX CAS EXAM 3 STUDY GUIDE FOR MATHEMATICAL STATISTICS ACTEX CAS EXAM 3 STUDY GUIDE FOR MATHEMATICAL STATISTICS TABLE OF CONTENTS INTRODUCTORY NOTE NOTES AND PROBLEM SETS Section 1 - Point Estimation 1 Problem Set 1 15 Section 2 - Confidence Intervals and

More information

TRB Paper # Examining the Crash Variances Estimated by the Poisson-Gamma and Conway-Maxwell-Poisson Models

TRB Paper # Examining the Crash Variances Estimated by the Poisson-Gamma and Conway-Maxwell-Poisson Models TRB Paper #11-2877 Examining the Crash Variances Estimated by the Poisson-Gamma and Conway-Maxwell-Poisson Models Srinivas Reddy Geedipally 1 Engineering Research Associate Texas Transportation Instute

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Math 180B Problem Set 3

Math 180B Problem Set 3 Math 180B Problem Set 3 Problem 1. (Exercise 3.1.2) Solution. By the definition of conditional probabilities we have Pr{X 2 = 1, X 3 = 1 X 1 = 0} = Pr{X 3 = 1 X 2 = 1, X 1 = 0} Pr{X 2 = 1 X 1 = 0} = P

More information

Standard Error of Technical Cost Incorporating Parameter Uncertainty

Standard Error of Technical Cost Incorporating Parameter Uncertainty Standard Error of Technical Cost Incorporating Parameter Uncertainty Christopher Morton Insurance Australia Group This presentation has been prepared for the Actuaries Institute 2012 General Insurance

More information

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Libraries 1997-9th Annual Conference Proceedings ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Eleanor F. Allan Follow this and additional works at: http://newprairiepress.org/agstatconference

More information

Including Statistical Power for Determining. How Many Crashes Are Needed in Highway Safety Studies

Including Statistical Power for Determining. How Many Crashes Are Needed in Highway Safety Studies Including Statistical Power for Determining How Many Crashes Are Needed in Highway Safety Studies Dominique Lord Assistant Professor Texas A&M University, 336 TAMU College Station, TX 77843-336 Phone:

More information

Hypothesis Testing for Var-Cov Components

Hypothesis Testing for Var-Cov Components Hypothesis Testing for Var-Cov Components When the specification of coefficients as fixed, random or non-randomly varying is considered, a null hypothesis of the form is considered, where Additional output

More information

EVALUATION OF SAFETY PERFORMANCES ON FREEWAY DIVERGE AREA AND FREEWAY EXIT RAMPS. Transportation Seminar February 16 th, 2009

EVALUATION OF SAFETY PERFORMANCES ON FREEWAY DIVERGE AREA AND FREEWAY EXIT RAMPS. Transportation Seminar February 16 th, 2009 EVALUATION OF SAFETY PERFORMANCES ON FREEWAY DIVERGE AREA AND FREEWAY EXIT RAMPS Transportation Seminar February 16 th, 2009 By: Hongyun Chen Graduate Research Assistant 1 Outline Introduction Problem

More information

BOOTSTRAPPING WITH MODELS FOR COUNT DATA

BOOTSTRAPPING WITH MODELS FOR COUNT DATA Journal of Biopharmaceutical Statistics, 21: 1164 1176, 2011 Copyright Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543406.2011.607748 BOOTSTRAPPING WITH MODELS FOR

More information

Biased Urn Theory. Agner Fog October 4, 2007

Biased Urn Theory. Agner Fog October 4, 2007 Biased Urn Theory Agner Fog October 4, 2007 1 Introduction Two different probability distributions are both known in the literature as the noncentral hypergeometric distribution. These two distributions

More information

D-optimal Designs for Factorial Experiments under Generalized Linear Models

D-optimal Designs for Factorial Experiments under Generalized Linear Models D-optimal Designs for Factorial Experiments under Generalized Linear Models Jie Yang Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago Joint research with Abhyuday

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

Conditional distributions (discrete case)

Conditional distributions (discrete case) Conditional distributions (discrete case) The basic idea behind conditional distributions is simple: Suppose (XY) is a jointly-distributed random vector with a discrete joint distribution. Then we can

More information

Conjugate Predictive Distributions and Generalized Entropies

Conjugate Predictive Distributions and Generalized Entropies Conjugate Predictive Distributions and Generalized Entropies Eduardo Gutiérrez-Peña Department of Probability and Statistics IIMAS-UNAM, Mexico Padova, Italy. 21-23 March, 2013 Menu 1 Antipasto/Appetizer

More information

Sample size calculations for logistic and Poisson regression models

Sample size calculations for logistic and Poisson regression models Biometrika (2), 88, 4, pp. 93 99 2 Biometrika Trust Printed in Great Britain Sample size calculations for logistic and Poisson regression models BY GWOWEN SHIEH Department of Management Science, National

More information

Statistical Methods for Astronomy

Statistical Methods for Astronomy Statistical Methods for Astronomy Probability (Lecture 1) Statistics (Lecture 2) Why do we need statistics? Useful Statistics Definitions Error Analysis Probability distributions Error Propagation Binomial

More information

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ). .8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

The Negative Binomial Lindley Distribution as a Tool for Analyzing Crash Data Characterized by a Large Amount of Zeros

The Negative Binomial Lindley Distribution as a Tool for Analyzing Crash Data Characterized by a Large Amount of Zeros The Negative Binomial Lindley Distribution as a Tool for Analyzing Crash Data Characterized by a Large Amount of Zeros Dominique Lord 1 Associate Professor Zachry Department of Civil Engineering Texas

More information

Algebra and Trigonometry 2006 (Foerster) Correlated to: Washington Mathematics Standards, Algebra 2 (2008)

Algebra and Trigonometry 2006 (Foerster) Correlated to: Washington Mathematics Standards, Algebra 2 (2008) A2.1. Core Content: Solving problems The first core content area highlights the type of problems students will be able to solve by the end of, as they extend their ability to solve problems with additional

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

Practical Statistics for the Analytical Scientist Table of Contents

Practical Statistics for the Analytical Scientist Table of Contents Practical Statistics for the Analytical Scientist Table of Contents Chapter 1 Introduction - Choosing the Correct Statistics 1.1 Introduction 1.2 Choosing the Right Statistical Procedures 1.2.1 Planning

More information

arxiv: v1 [stat.me] 5 Apr 2013

arxiv: v1 [stat.me] 5 Apr 2013 Logistic regression geometry Karim Anaya-Izquierdo arxiv:1304.1720v1 [stat.me] 5 Apr 2013 London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK e-mail: karim.anaya@lshtm.ac.uk and Frank Critchley

More information

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21 CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about

More information

Discrete Mathematics and Probability Theory Fall 2013 Vazirani Note 16. A Brief Introduction to Continuous Probability

Discrete Mathematics and Probability Theory Fall 2013 Vazirani Note 16. A Brief Introduction to Continuous Probability CS 7 Discrete Mathematics and Probability Theory Fall 213 Vazirani Note 16 A Brief Introduction to Continuous Probability Up to now we have focused exclusively on discrete probability spaces Ω, where the

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn!

Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Questions?! C. Porciani! Estimation & forecasting! 2! Cosmological parameters! A branch of modern cosmological research focuses

More information

Sample size determination for logistic regression: A simulation study

Sample size determination for logistic regression: A simulation study Sample size determination for logistic regression: A simulation study Stephen Bush School of Mathematical Sciences, University of Technology Sydney, PO Box 123 Broadway NSW 2007, Australia Abstract This

More information

APPENDICES APPENDIX A. STATISTICAL TABLES AND CHARTS 651 APPENDIX B. BIBLIOGRAPHY 677 APPENDIX C. ANSWERS TO SELECTED EXERCISES 679

APPENDICES APPENDIX A. STATISTICAL TABLES AND CHARTS 651 APPENDIX B. BIBLIOGRAPHY 677 APPENDIX C. ANSWERS TO SELECTED EXERCISES 679 APPENDICES APPENDIX A. STATISTICAL TABLES AND CHARTS 1 Table I Summary of Common Probability Distributions 2 Table II Cumulative Standard Normal Distribution Table III Percentage Points, 2 of the Chi-Squared

More information

2. Variance and Higher Moments

2. Variance and Higher Moments 1 of 16 7/16/2009 5:45 AM Virtual Laboratories > 4. Expected Value > 1 2 3 4 5 6 2. Variance and Higher Moments Recall that by taking the expected value of various transformations of a random variable,

More information

Chapter 5 continued. Chapter 5 sections

Chapter 5 continued. Chapter 5 sections Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Two hours. Statistical Tables to be provided THE UNIVERSITY OF MANCHESTER. 14 January :45 11:45

Two hours. Statistical Tables to be provided THE UNIVERSITY OF MANCHESTER. 14 January :45 11:45 Two hours Statistical Tables to be provided THE UNIVERSITY OF MANCHESTER PROBABILITY 2 14 January 2015 09:45 11:45 Answer ALL four questions in Section A (40 marks in total) and TWO of the THREE questions

More information

Progression Factors in the HCM 2000 Queue and Delay Models for Traffic Signals

Progression Factors in the HCM 2000 Queue and Delay Models for Traffic Signals Akcelik & Associates Pty Ltd TECHNICAL NOTE Progression Factors in the HCM 2000 Queue and Delay Models for Traffic Signals Author: Rahmi Akçelik September 2001 Akcelik & Associates Pty Ltd DISCLAIMER:

More information

Mixtures and Random Sums

Mixtures and Random Sums Mixtures and Random Sums C. CHATFIELD and C. M. THEOBALD, University of Bath A brief review is given of the terminology used to describe two types of probability distribution, which are often described

More information

THE PRINCIPLES AND PRACTICE OF STATISTICS IN BIOLOGICAL RESEARCH. Robert R. SOKAL and F. James ROHLF. State University of New York at Stony Brook

THE PRINCIPLES AND PRACTICE OF STATISTICS IN BIOLOGICAL RESEARCH. Robert R. SOKAL and F. James ROHLF. State University of New York at Stony Brook BIOMETRY THE PRINCIPLES AND PRACTICE OF STATISTICS IN BIOLOGICAL RESEARCH THIRD E D I T I O N Robert R. SOKAL and F. James ROHLF State University of New York at Stony Brook W. H. FREEMAN AND COMPANY New

More information

2010 HSC NOTES FROM THE MARKING CENTRE MATHEMATICS EXTENSION 1

2010 HSC NOTES FROM THE MARKING CENTRE MATHEMATICS EXTENSION 1 Contents 2010 HSC NOTES FROM THE MARKING CENTRE MATHEMATICS EXTENSION 1 Introduction... 1 Question 1... 1 Question 2... 2 Question 3... 3 Question 4... 4 Question 5... 5 Question 6... 5 Question 7... 6

More information

3.3 Estimator quality, confidence sets and bootstrapping

3.3 Estimator quality, confidence sets and bootstrapping Estimator quality, confidence sets and bootstrapping 109 3.3 Estimator quality, confidence sets and bootstrapping A comparison of two estimators is always a matter of comparing their respective distributions.

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part II Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

I used college textbooks because they were the only resource available to evaluate measurement uncertainty calculations.

I used college textbooks because they were the only resource available to evaluate measurement uncertainty calculations. Introduction to Statistics By Rick Hogan Estimating uncertainty in measurement requires a good understanding of Statistics and statistical analysis. While there are many free statistics resources online,

More information

The Not-Formula Book for C2 Everything you need to know for Core 2 that won t be in the formula book Examination Board: AQA

The Not-Formula Book for C2 Everything you need to know for Core 2 that won t be in the formula book Examination Board: AQA Not The Not-Formula Book for C Everything you need to know for Core that won t be in the formula book Examination Board: AQA Brief This document is intended as an aid for revision. Although it includes

More information

Modeling Recurrent Events in Panel Data Using Mixed Poisson Models

Modeling Recurrent Events in Panel Data Using Mixed Poisson Models Modeling Recurrent Events in Panel Data Using Mixed Poisson Models V. Savani and A. Zhigljavsky Abstract This paper reviews the applicability of the mixed Poisson process as a model for recurrent events

More information

CS 5014: Research Methods in Computer Science. Bernoulli Distribution. Binomial Distribution. Poisson Distribution. Clifford A. Shaffer.

CS 5014: Research Methods in Computer Science. Bernoulli Distribution. Binomial Distribution. Poisson Distribution. Clifford A. Shaffer. Department of Computer Science Virginia Tech Blacksburg, Virginia Copyright c 2015 by Clifford A. Shaffer Computer Science Title page Computer Science Clifford A. Shaffer Fall 2015 Clifford A. Shaffer

More information

Probability and Stochastic Processes

Probability and Stochastic Processes Probability and Stochastic Processes A Friendly Introduction Electrical and Computer Engineers Third Edition Roy D. Yates Rutgers, The State University of New Jersey David J. Goodman New York University

More information

This appendix provides a very basic introduction to linear algebra concepts.

This appendix provides a very basic introduction to linear algebra concepts. APPENDIX Basic Linear Algebra Concepts This appendix provides a very basic introduction to linear algebra concepts. Some of these concepts are intentionally presented here in a somewhat simplified (not

More information

Solution: chapter 2, problem 5, part a:

Solution: chapter 2, problem 5, part a: Learning Chap. 4/8/ 5:38 page Solution: chapter, problem 5, part a: Let y be the observed value of a sampling from a normal distribution with mean µ and standard deviation. We ll reserve µ for the estimator

More information

Introduction to the Mathematical and Statistical Foundations of Econometrics Herman J. Bierens Pennsylvania State University

Introduction to the Mathematical and Statistical Foundations of Econometrics Herman J. Bierens Pennsylvania State University Introduction to the Mathematical and Statistical Foundations of Econometrics 1 Herman J. Bierens Pennsylvania State University November 13, 2003 Revised: March 15, 2004 2 Contents Preface Chapter 1: Probability

More information

CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA

CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA STATISTICS IN MEDICINE, VOL. 17, 59 68 (1998) CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA J. K. LINDSEY AND B. JONES* Department of Medical Statistics, School of Computing Sciences,

More information

Approximation of Posterior Means and Variances of the Digitised Normal Distribution using Continuous Normal Approximation

Approximation of Posterior Means and Variances of the Digitised Normal Distribution using Continuous Normal Approximation Approximation of Posterior Means and Variances of the Digitised Normal Distribution using Continuous Normal Approximation Robert Ware and Frank Lad Abstract All statistical measurements which represent

More information

STATISTICS; An Introductory Analysis. 2nd hidition TARO YAMANE NEW YORK UNIVERSITY A HARPER INTERNATIONAL EDITION

STATISTICS; An Introductory Analysis. 2nd hidition TARO YAMANE NEW YORK UNIVERSITY A HARPER INTERNATIONAL EDITION 2nd hidition TARO YAMANE NEW YORK UNIVERSITY STATISTICS; An Introductory Analysis A HARPER INTERNATIONAL EDITION jointly published by HARPER & ROW, NEW YORK, EVANSTON & LONDON AND JOHN WEATHERHILL, INC.,

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information

Unsupervised Learning with Permuted Data

Unsupervised Learning with Permuted Data Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University

More information

Chapter 5. Chapter 5 sections

Chapter 5. Chapter 5 sections 1 / 43 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Probability Distributions

Probability Distributions CONDENSED LESSON 13.1 Probability Distributions In this lesson, you Sketch the graph of the probability distribution for a continuous random variable Find probabilities by finding or approximating areas

More information

Algebra 2 Early 1 st Quarter

Algebra 2 Early 1 st Quarter Algebra 2 Early 1 st Quarter CCSS Domain Cluster A.9-12 CED.4 A.9-12. REI.3 Creating Equations Reasoning with Equations Inequalities Create equations that describe numbers or relationships. Solve equations

More information

Lecture 4: Two-point Sampling, Coupon Collector s problem

Lecture 4: Two-point Sampling, Coupon Collector s problem Randomized Algorithms Lecture 4: Two-point Sampling, Coupon Collector s problem Sotiris Nikoletseas Associate Professor CEID - ETY Course 2013-2014 Sotiris Nikoletseas, Associate Professor Randomized Algorithms

More information

TRB Paper Examining Methods for Estimating Crash Counts According to Their Collision Type

TRB Paper Examining Methods for Estimating Crash Counts According to Their Collision Type TRB Paper 10-2572 Examining Methods for Estimating Crash Counts According to Their Collision Type Srinivas Reddy Geedipally 1 Engineering Research Associate Texas Transportation Institute Texas A&M University

More information

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Data Analysis Stat 3: p-values, parameter estimation Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

More information

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 Contents Preface to Second Edition Preface to First Edition Abbreviations xv xvii xix PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 1 The Role of Statistical Methods in Modern Industry and Services

More information