Reparametrization of COM-Poisson Regression Models with Applications in the Analysis of Experimental Count Data
|
|
- Samantha Holland
- 5 years ago
- Views:
Transcription
1 Reparametrization of COM-Poisson Regression Models with Applications in the Analysis of Experimental Count Data Eduardo Elias Ribeiro Junior 1 2 Walmes Marques Zeviani 1 Wagner Hugo Bonat 1 Clarice Garcia Borges Demétrio 2 John Hinde 3 1 Statistics and Geoinformation Laboratory (LEG-UFPR) 2 Department of Exact Sciences (ESALQ-USP) 3 School of Mathematics, Statistics and Applied Mathematics (NUI-Galway) 27th June 2018 jreduardo@usp.br edujrrib@gmail.com
2 Outline 1. Background 2. Reparametrization 3. Simulation study Final remarks Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 0
3 Background 1 Background Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 0
4 Count data Background Number of times an event occurs in the observation unit. Random variables that assume non-negative integer values. Let Y be a counting random variable, so that y = 0, 1, 2,... Examples in experimental researches: number of grains produced by a plant; number of fruits produced by a tree; number of insects on a particular cell; others. Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 1
5 Background Poisson model and limitations GLM framework (Nelder & Wedderburn 1972) Provide suitable distribution for a counting random variables; Efficient algorithm for estimation and inference; Implemented in many software. Poisson model Relationship between mean and variance, E(Y) = Var(Y); Main limitations Overdispersion (more common), E(Y) < Var(Y) Underdispersion (less common), E(Y) > Var(Y) Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 2
6 Background COM-Poisson distribution Probability mass function (Shmueli et al. 2005) takes the form Pr(Y = y λ, ν) = where λ > 0 and ν 0. Moments are not available in closed form; λ y (y!) ν Z(λ, ν), Z(λ, ν) = λ j (j!) j=0 ν, Expectation and variance can be closely approximated by E(Y) λ 1/ν ν 1 2ν and Var(Y) λ1/ν ν with accurate approximations for ν 1 or λ > 10 ν (Shmueli et al. 2005, Sellers et al. 2012). Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 3
7 Background COM-Poisson regression models Model definition Modelling the relationship between E(Y i ) and x i indirectly (Sellers & Shmueli 2010); Y i x i COM-Poisson(λ i, ν) η(e(y i x i )) = log(λ i ) = x i β Main goals Study distribution properties in terms of i) modelling real count data and ii) inference aspects. Propose a reparametrization in order to model the expectation of the response variable as a function of the covariate values directly. Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 4
8 Reparametrization 2 Reparametrization Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 4
9 Reparametrization Reparametrized COM-Poisson Reparametrization Introduced new parameter µ, using the mean approximation µ = λ 1/ν ν 1 ( ) (ν 1) ν λ = µ + ; 2ν 2ν Precision parameter is taken on the log scale to avoid restrictions on the parameter space Probability mass function φ = log(ν) φ R; Replacing λ and ν as function of µ and φ in the pmf of COM-Poisson ( ) ye φ Pr(Y = y µ, φ) = µ + eφ 1 (y!) e φ 2e φ Z(µ, φ). Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 5
10 Reparametrization Study of the moments approximations µ 15 µ (a) φ 5 (b) φ Figure: Quadratic errors for the approximation of the (a) expectation and (b) variance. Dotted lines represent the restriction for suitable approximations given by Shmueli et al. (2005). Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 6
11 Reparametrization COM-Poisson µ distribution φ = 0.7 φ = 0.0 φ = 0.7 µ = 2 µ = 8 µ = P(Y = y) y Figure: Shapes of the COM-Poisson distribution for different parameter values. Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 7
12 Reparametrization Properties of COM-Poisson distribution To explore the flexibility of the COM-Poisson distribution, we consider the follow indexes: Dispersion index: DI = Var(Y)/E(Y); Zero-inflation index: ZI = 1 + log Pr(Y = 0)/E(Y); Heavy-tail index: HT = Pr(Y = y + 1)/ Pr(Y = y), for y. These indexes are interpreted in relation to the Poisson distribution: over- (DI > 1), under- (DI < 1) and equidispersion (DI = 1); zero-inflation (ZI > 0) and zero-deflation (ZI < 0) and heavy-tail distribution for HT 1 when y. Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 8
13 Reparametrization Properties of COM-Poisson distribution φ φ φ φ Mean Variance Dispersion index Zero inflation index Heavy tail index µ (a) µ (b) µ (c) y Figure: Indexes for COM-Poisson distribution. (a) Mean and variance relationship, (b d) dispersion, zero-inflation and heavy-tail indexes for different parameter values. Dotted lines represents the Poisson special case. (d) Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 9
14 Reparametrization COM-Poisson µ regression models Let y i a set of independent observations from the COM-Poisson and x i = (x i1, x i2,..., x ip ) is a vector of known covariates, i = 1, 2,..., n. Model definition Modelling relationship between E(Y i ) and x i directly Y i x i COM-Poisson µ (µ i, φ) log(e(y i x i )) = log(µ i ) = x i β Log-likelihood function (l = l(β, φ y)) [ n ( ) ] l = e φ y i log µ i + eφ n 1 2e i=1 φ log(y i!) where µ i = exp(x i β) i=1 n i=1 log(z(µ i, φ)) Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 10
15 Reparametrization Estimation and inference The estimation and inference is based on the method of maximum likelihood. Let θ = (β, φ) the model parameters. Parameter estimates are obtained by numerical maximization of the log-likelihood function (by BFGS algorithm); l( ˆθ) = max l(θ), θ R p+1 ; Standard errors for regression coefficients are obtained based on the observed information matrix; Var( ˆθ) = H 1, where H is the matrix of second partial derivatives at ˆθ; Confidence intervals for ˆµ i are obtained by delta method.. Var[g( ˆθ)] = G Var( ˆθ)G, where G = ( g/ β 1,..., g/ β p ) ; The Hessian matrix H is obtained numerically by finite differences. Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 11
16 Simulation study 3 Simulation study Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 11
17 Simulation study Definitions on the simulation study Objective: assess the properties of maximum likelihood estimators and orthogonality in the reparametrized model; Simulation: we consider counts generated according a regression model with a continuous and categorical covariates and different dispersion scenarios. Algorithm 1: Steps in simulation study. for n {50, 100, 300, 1000} do set x 1 as a sequence, with n elements, between 0 and 1; set x 2 as a repetition, with n elements, of three categories; compute µ using µ = exp(β 0 + β 1 x 1 + β 21 x 21 + β 22 x 22 ); for φ { 1.6, 1.0, 0.0, 1.8} do repeat simulate y from COM-Poisson distribution with µ and φ parameters; fit COM-Poisson µ regression model to simulated y; get ˆθ = ( ˆφ, ˆβ 0, ˆβ 1, ˆβ 21, ˆβ 22 ); get confidence intervals for ˆθ based on the observed information matrix. until 1000 times; Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 12
18 Simulation study Definitions on the simulation study Level of categorical variable (x 2 ) a b c φ = φ = Average count Dispersion index 4 3 φ = 0 φ = Values of continous variable (x 1 ) Values of continous variable (x 1 ) Figure: Average counts (left) and dispersion indexes (right) for each scenario considered in the simulation study. Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 13
19 Simulation study Bias of the estimators Sample size n=50 n=100 n=300 n= φ = 1.6 φ = 1 φ = 0 φ = 1.8 ^ β22 ^ β21 ^ β1 ^ β0 ^ φ Standardized Bias Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 14
20 Simulation study Coverage rate of the confidence intervals φ = 1.6 φ = 1 φ = 0 φ = 1.8 ^ φ ^ β0 ^ β ^ β21 ^ β22 Coverage rate Sample size Figure: Coverage rate based on confidence intervals obtained by quadratic approximation for different sample sizes and dispersion levels. Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 15
21 Simulation study Orthogonality property of the MLEs φ φ Original parametrization Proposed parametrization φ = φ = φ = φ = λ µ φ = φ = φ = φ = Figure: Deviance surfaces contour plots under original and proposed parametrization. The ellipses are confidence regions (90, 95 and 99%), dotted lines are the maximum likelihood estimates, and points are the real parameters used in the simulation. Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 16
22 4 Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 16
23 Motivating data sets and data analysis Three illustrative examples of count data analysis are reported. Assessing toxicity of nitrofen in aquatic systems, an equidispersed example; Soil moisture and potassium doses on soybean culture, an overdispersed example; and Artificial defoliation in cotton phenology, an underdispersed example. In the data analysis, we consider the COM-Poisson model in the two forms (original and new parametrization) and the quasi-poisson regression model as alternative models for the standard Poisson regression model. Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 17
24 Artificial defoliation in cotton phenology 4.1 Artificial defoliation in cotton phenology Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 17
25 Cotton bolls data Artificial defoliation in cotton phenology Aim: to assess the effects of five defoliation levels on the bolls produced at five growth stages; Design: factorial 5 5, with 5 replicates; Experimental unit: a plot with 2 plants; Factors: Artificial defoliation (des): Growth stage (est): Response variable: Total number of cotton bolls; Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 18
26 Model specification Artificial defoliation in cotton phenology Linear predictor: following Zeviani et al. (2014) log(µ ij ) = β 0 + β 1j def i + β 2j def 2 i i varies in the levels of artificial defoliation; j varies in the levels of growth stages. Alternative models: Poisson (µ ij ); COM-Poisson (λ ij = η(µ ij ), φ) COM-Poisson µ (µ ij, φ) Quasi-Poisson (var(y ij ) = σµ ij ) Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 19
27 Parameter estimates Artificial defoliation in cotton phenology Table: Parameter estimates (Est) and ratio between estimate and standard error (SE). Poisson COM-Poisson COM-Poisson µ Quasi-Poisson Est Est/SE Est Est/SE Est Est/SE Est Est/SE φ, σ β β β β β β β β β β β LogLik AIC BIC Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 20
28 Fitted curves Artificial defoliation in cotton phenology vegetative Poisson COM Poisson COM Poisson µ Quasi Poisson flower bud blossom boll boll open Number of bolls produced Artificial defoliation level Figure: Scatterplots of the observed data and curves of fitted values with 95% confidence intervals as functions of the defoliation level for each growth stage. Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 21
29 Soil moisture and potassium doses on soybean culture 4.2 Soil moisture and potassium doses on soybean culture Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 21
30 Soybean data Soil moisture and potassium doses on soybean culture Aim: evaluate the effects of potassium doses applied to soil in different soil moisture levels; Design: factorial 5 3 in a randomized complete block design (5 blocks); Experimental unit: a pot with a plant; Factors: Potassium fertilization dose (K): Soil moisture level (umid): Response variable: Total number of bean seeds per pot; Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 22
31 Model specification Soil moisture and potassium doses on soybean culture Linear predictor: based on descriptive analysis, log(µ ijk ) = β 0 + γ i + τ j + β 1 K k + β 2 K 2 k + β 3jK k i varies according the blocks; j varies in the levels of soil moisture; k varies in the levels of potassium fertilization. Alternative models: Poisson (µ ij ); COM-Poisson (λ ij = η(µ ij ), φ) COM-Poisson µ (µ ij, φ) Quasi-Poisson (var(y ij ) = σµ ij ) Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 23
32 Parameter estimates Soil moisture and potassium doses on soybean culture Table: Parameter estimates (Est) and ratio between estimate and standard error (SE). Poisson COM-Poisson COM-Poisson µ Quasi-Poisson Est Est/SE Est Est/SE Est Est/SE Est Est/SE φ, σ β γ γ γ γ τ τ β β β β LogLik AIC BIC Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 24
33 Fitted curves Soil moisture and potassium doses on soybean culture Poisson COM Poisson COM Poisson µ Quasi Poisson moisture : 37.5% moisture : 50% moisture : 62.5% Number of grains per pot Potassium fertilization level Figure: Dispersion diagrams of been seeds counts as function of potassium doses and humidity levels with fitted curves and confidence intervals (95%). Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 25
34 Assessing toxicity of nitrofen in aquatic systems 4.3 Assessing toxicity of nitrofen in aquatic systems Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 25
35 Nitrofen data Assessing toxicity of nitrofen in aquatic systems Aim: measure the reproductive toxicity of the herbicide nitrofen on a species of zooplankton (Ceriodaphnia dubia); Design: completely randomized design, with 10 replicates; Experimental unit: zooplankton animal; Factors: herbicide nitrofen dose (dose); Response variable: Total number of live offspring; Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 26
36 Model specification Assessing toxicity of nitrofen in aquatic systems Linear predictors: Linear: log(µ i ) = β 0 + β 1 dose i, Quadratic: log(µ i ) = β 0 + β 1 dose i + β 2 dose 2 i and Cubic: log(µ i ) = β 0 + β 1 dose i + β 2 dose 2 i + β 3dose 3 i. Alternative models: Poisson (µ ij ); COM-Poisson (λ ij = η(µ ij ), φ) COM-Poisson µ (µ ij, φ) Quasi-Poisson (var(y ij ) = σµ ij ) Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 27
37 Likelihood ratio tests Assessing toxicity of nitrofen in aquatic systems Table: Model fit measures and comparisons between linear predictors. Poisson np l AIC 2(diff l) diff np P(> χ 2 ) Linear Quadratic E 16 Cubic E 02 COM-Poisson np l AIC 2(diff l) diff np P(> χ 2 ) ˆφ Linear Quadratic E Cubic E COM-Poisson µ np l AIC 2(diff l) diff np P(> χ 2 ) ˆφ Linear Quadratic E Cubic E Quasi-Poisson np QDev AIC F diff np P(> F) ˆσ Linear Quadratic E Cubic E Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 28
38 Assessing toxicity of nitrofen in aquatic systems Parameter estimates and fitted values Table: Parameter estimates (Est) and ratio between estimate and standard error (SE). Poisson COM-Poisson COM-Poisson µ Quasi-Poisson Est Est/SE Est Est/SE Est Est/SE Est Est/SE β β β β Number of live offspring Poisson COM Poisson COM Poisson µ Quasi Poisson Nitrofen concentration level Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 29
39 4.4 Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 29
40 Additional results To compare the computational times on the two parametrizations we repeat the fitting 50 times. Time (seconds) 1.5e e e e e+09 Cotton (under) COM Poisson COM Poisson µ Nitrofen (equi) COM Poisson COM Poisson µ 1.0e e e+10 Soybean (over) COM Poisson COM Poisson µ Figure: Computational times to fit the models under original and reparametrized versions based on the fifty repetitions. Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 30
41 Final remarks 5 Final remarks Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 30
42 Final remarks Concluding remarks Summary Over/under-dispersion needs caution; COM-Poisson is a suitable choice for these situations; The proposed reparametrization, COM-Poisson µ has some advantages: Simple transformation of the parameter space; Leads to the orthogonality of the parameters (seen empirically); Full parametric approach; Empirical correlation between the estimators was practically null; Faster for fitting; Allows interpretation of the coefficients directly (like GLM-Poisson model). Future work Simulation study to assess model robustness against distribution miss specification; Assess theoretical approximations for Z(λ, ν) (or Z(µ, φ)), in order to avoid the selection of sum s upper bound; Propose a mixed GLM based on the COM-Poisson µ model. Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 31
43 Notes Full-text article is available on arxiv All codes (in R) and source files are available on GitHub Acknowledgements National Council for Scientific and Technological Development (CNPq), for their support. Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 32
44 References Bibliography Nelder, J. A. & Wedderburn, R. W. M. (1972), Generalized Linear Models, Journal of the Royal Statistical Society. Series A (General) 135, Sellers, K. F., Borle, S. & Shmueli, G. (2012), The com-poisson model for count data: a survey of methods and applications, Applied Stochastic Models in Business and Industry 28(2), Sellers, K. F. & Shmueli, G. (2010), A flexible regression model for count data, Annals of Applied Statistics 4(2), Shmueli, G., Minka, T. P., Kadane, J. B., Borle, S. & Boatwright, P. (2005), A useful distribution for fitting discrete data: Revival of the Conway-Maxwell-Poisson distribution, Journal of the Royal Statistical Society. Series C: Applied Statistics 54(1), Zeviani, W. M., Ribeiro Jr, P. J., Bonat, W. H., Shimakura, S. E. & Muniz, J. A. (2014), The Gamma-count distribution in the analysis of experimental underdispersed data, Journal of Applied Statistics pp Clarice Garcia Borges Demétrio Reparametrization of COM-Poisson Regression Models Slide 33
Extended Poisson Tweedie: Properties and regression models for count data
Extended Poisson Tweedie: Properties and regression models for count data Wagner H. Bonat 1,2, Bent Jørgensen 2,Célestin C. Kokonendji 3, John Hinde 4 and Clarice G. B. Demétrio 5 1 Laboratory of Statistics
More informationExtended Poisson-Tweedie: properties and regression models for count data
Extended Poisson-Tweedie: properties and regression models for count data arxiv:1608.06888v2 [stat.me] 11 Sep 2016 Wagner H. Bonat and Bent Jørgensen and Célestin C. Kokonendji and John Hinde and Clarice
More informationThe Gamma-count distribution in the analysis of. experimental underdispersed data
The Gamma-count distribution in the analysis of experimental underdispersed data arxiv:1312.2423v1 [stat.ap] 9 Dec 2013 Walmes Marques Zeviani 1, Paulo Justiniano Ribeiro Jr 1, Wagner Hugo Bonat 1, Silvia
More informationKey Words: Conway-Maxwell-Poisson (COM-Poisson) regression; mixture model; apparent dispersion; over-dispersion; under-dispersion
DATA DISPERSION: NOW YOU SEE IT... NOW YOU DON T Kimberly F. Sellers Department of Mathematics and Statistics Georgetown University Washington, DC 20057 kfs7@georgetown.edu Galit Shmueli Indian School
More informationCharacterizing the Performance of the Conway-Maxwell Poisson Generalized Linear Model
Characterizing the Performance of the Conway-Maxwell Poisson Generalized Linear Model Royce A. Francis 1,2, Srinivas Reddy Geedipally 3, Seth D. Guikema 2, Soma Sekhar Dhavala 5, Dominique Lord 4, Sarah
More informationApproximating the Conway-Maxwell-Poisson normalizing constant
Filomat 30:4 016, 953 960 DOI 10.98/FIL1604953S Published by Faculty of Sciences and Mathematics, University of Niš, Serbia Available at: http://www.pmf.ni.ac.rs/filomat Approximating the Conway-Maxwell-Poisson
More informationPoisson regression 1/15
Poisson regression 1/15 2/15 Counts data Examples of counts data: Number of hospitalizations over a period of time Number of passengers in a bus station Blood cells number in a blood sample Number of typos
More informationOutline of GLMs. Definitions
Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density
More informationMultivariate Regression Models in R: The mcglm package
Multivariate Regression Models in R: The mcglm package Prof. Wagner Hugo Bonat R Day Laboratório de Estatística e Geoinformação - LEG Universidade Federal do Paraná - UFPR 15 de maio de 2018 Introduction
More informationThe LmB Conferences on Multivariate Count Analysis
The LmB Conferences on Multivariate Count Analysis Title: On Poisson-exponential-Tweedie regression models for ultra-overdispersed count data Rahma ABID, C.C. Kokonendji & A. Masmoudi Email Address: rahma.abid.ch@gmail.com
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationFlexiblity of Using Com-Poisson Regression Model for Count Data
STATISTICS, OPTIMIZATION AND INFORMATION COMPUTING Stat., Optim. Inf. Comput., Vol. 6, June 2018, pp 278 285. Published online in International Academic Press (www.iapress.org) Flexiblity of Using Com-Poisson
More informationGeneralized Linear Mixed-Effects Models. Copyright c 2015 Dan Nettleton (Iowa State University) Statistics / 58
Generalized Linear Mixed-Effects Models Copyright c 2015 Dan Nettleton (Iowa State University) Statistics 510 1 / 58 Reconsideration of the Plant Fungus Example Consider again the experiment designed to
More informationParametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1
Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson
More informationSemiparametric Generalized Linear Models
Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student
More informationarxiv: v1 [stat.ap] 9 Nov 2010
The Annals of Applied Statistics 2010, Vol. 4, No. 2, 943 961 DOI: 10.1214/09-AOAS306 c Institute of Mathematical Statistics, 2010 A FLEXIBLE REGRESSION MODEL FOR COUNT DATA arxiv:1011.2077v1 [stat.ap]
More informationPoisson regression: Further topics
Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to
More informationGeneralized Linear Models. Kurt Hornik
Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general
More informationGeneralized Linear Models and Extensions
Session 1 1 Generalized Linear Models and Extensions Clarice Garcia Borges Demétrio ESALQ/USP Piracicaba, SP, Brasil March 2013 email: Clarice.demetrio@usp.br Session 1 2 Course Outline Session 1 - Generalized
More informationGauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA
JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter
More informationStandard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j
Standard Errors & Confidence Intervals β β asy N(0, I( β) 1 ), where I( β) = [ 2 l(β, φ; y) ] β i β β= β j We can obtain asymptotic 100(1 α)% confidence intervals for β j using: β j ± Z 1 α/2 se( β j )
More informationLinear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationif n is large, Z i are weakly dependent 0-1-variables, p i = P(Z i = 1) small, and Then n approx i=1 i=1 n i=1
Count models A classical, theoretical argument for the Poisson distribution is the approximation Binom(n, p) Pois(λ) for large n and small p and λ = np. This can be extended considerably to n approx Z
More information1/15. Over or under dispersion Problem
1/15 Over or under dispersion Problem 2/15 Example 1: dogs and owners data set In the dogs and owners example, we had some concerns about the dependence among the measurements from each individual. Let
More informationPoisson Regression. Ryan Godwin. ECON University of Manitoba
Poisson Regression Ryan Godwin ECON 7010 - University of Manitoba Abstract. These lecture notes introduce Maximum Likelihood Estimation (MLE) of a Poisson regression model. 1 Motivating the Poisson Regression
More informationGeneralized Linear Models
Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n
More informationGeneralized Linear Models. Last time: Background & motivation for moving beyond linear
Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered
More informationThe Conway Maxwell Poisson Model for Analyzing Crash Data
The Conway Maxwell Poisson Model for Analyzing Crash Data (Discussion paper associated with The COM Poisson Model for Count Data: A Survey of Methods and Applications by Sellers, K., Borle, S., and Shmueli,
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationBayesian density regression for count data
Bayesian density regression for count data Charalampos Chanialidis 1, Ludger Evers 2, and Tereza Neocleous 3 arxiv:1406.1882v1 [stat.me] 7 Jun 2014 1 University of Glasgow, c.chanialidis1@research.gla.ac.uk
More informationReview. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis
Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,
More informationHigh-Throughput Sequencing Course
High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an
More informationSTA216: Generalized Linear Models. Lecture 1. Review and Introduction
STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general
More informationPh.D. Qualifying Exam Friday Saturday, January 3 4, 2014
Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Put your solution to each problem on a separate sheet of paper. Problem 1. (5166) Assume that two random samples {x i } and {y i } are independently
More informationFractional Imputation in Survey Sampling: A Comparative Review
Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical
More informationUsing Historical Experimental Information in the Bayesian Analysis of Reproduction Toxicological Experimental Results
Using Historical Experimental Information in the Bayesian Analysis of Reproduction Toxicological Experimental Results Jing Zhang Miami University August 12, 2014 Jing Zhang (Miami University) Using Historical
More informationQuasi-likelihood Scan Statistics for Detection of
for Quasi-likelihood for Division of Biostatistics and Bioinformatics, National Health Research Institutes & Department of Mathematics, National Chung Cheng University 17 December 2011 1 / 25 Outline for
More informationST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples
ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will
More informationPoisson Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University
Poisson Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Poisson Regression 1 / 49 Poisson Regression 1 Introduction
More informationModeling Overdispersion
James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 Introduction 2 Introduction In this lecture we discuss the problem of overdispersion in
More informationMSH3 Generalized linear model Ch. 6 Count data models
Contents MSH3 Generalized linear model Ch. 6 Count data models 6 Count data model 208 6.1 Introduction: The Children Ever Born Data....... 208 6.2 The Poisson Distribution................. 210 6.3 Log-Linear
More informationModel Selection for Semiparametric Bayesian Models with Application to Overdispersion
Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS020) p.3863 Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Jinfang Wang and
More informationA strategy for modelling count data which may have extra zeros
A strategy for modelling count data which may have extra zeros Alan Welsh Centre for Mathematics and its Applications Australian National University The Data Response is the number of Leadbeater s possum
More informationStatistical Methods III Statistics 212. Problem Set 2 - Answer Key
Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423
More informationA brief introduction to mixed models
A brief introduction to mixed models University of Gothenburg Gothenburg April 6, 2017 Outline An introduction to mixed models based on a few examples: Definition of standard mixed models. Parameter estimation.
More informationLattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III)
Title: Spatial Statistics for Point Processes and Lattice Data (Part III) Lattice Data Tonglin Zhang Outline Description Research Problems Global Clustering and Local Clusters Permutation Test Spatial
More informationTRB Paper # Examining the Crash Variances Estimated by the Poisson-Gamma and Conway-Maxwell-Poisson Models
TRB Paper #11-2877 Examining the Crash Variances Estimated by the Poisson-Gamma and Conway-Maxwell-Poisson Models Srinivas Reddy Geedipally 1 Engineering Research Associate Texas Transportation Instute
More informationGeneralized Linear Models and Extensions
Session 1 1 Generalized Linear Models and Extensions Clarice Garcia Borges Demétrio ESALQ/USP Piracicaba, SP, Brasil email: Clarice.demetrio@usp.br March 2013 International Year of Statistics http://www.statistics2013.org/
More informationChapter 4: Generalized Linear Models-II
: Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay
More informationHomework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.
EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests
More informationGeneralized linear models
Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models
More informationGeneralized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.
Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint
More informationA Useful Distribution for Fitting Discrete Data: Revival of the COM-Poisson
A Useful Distribution for Fitting Discrete Data: Revival of the COM-Poisson Galit Shmueli Department of Decision and Information Technologies, Robert H. Smith School of Business, University of Maryland,
More informationA Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46
A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response
More informationLecture 14: Introduction to Poisson Regression
Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why
More informationModelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
More informationGeneralized Quasi-likelihood versus Hierarchical Likelihood Inferences in Generalized Linear Mixed Models for Count Data
Sankhyā : The Indian Journal of Statistics 2009, Volume 71-B, Part 1, pp. 55-78 c 2009, Indian Statistical Institute Generalized Quasi-likelihood versus Hierarchical Likelihood Inferences in Generalized
More informationMixed models in R using the lme4 package Part 5: Generalized linear mixed models
Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates Madison January 11, 2011 Contents 1 Definition 1 2 Links 2 3 Example 7 4 Model building 9 5 Conclusions 14
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationAnalysis of Extra Zero Counts using Zero-inflated Poisson Models
Analysis of Extra Zero Counts using Zero-inflated Poisson Models Saranya Numna A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Mathematics and Statistics
More informationUNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018 Work all problems. 60 points are needed to pass at the Masters Level and 75
More informationModel comparison and selection
BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)
More informationMixtures of Negative Binomial distributions for modelling overdispersion in RNA-Seq data
Mixtures of Negative Binomial distributions for modelling overdispersion in RNA-Seq data Cinzia Viroli 1 joint with E. Bonafede 1, S. Robin 2 & F. Picard 3 1 Department of Statistical Sciences, University
More informationGeneralized Linear Models I
Statistics 203: Introduction to Regression and Analysis of Variance Generalized Linear Models I Jonathan Taylor - p. 1/16 Today s class Poisson regression. Residuals for diagnostics. Exponential families.
More informationBIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation
BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)
More informationLikelihood Inference in the Presence of Nuisance Parameters
PHYSTAT2003, SLAC, September 8-11, 2003 1 Likelihood Inference in the Presence of Nuance Parameters N. Reid, D.A.S. Fraser Department of Stattics, University of Toronto, Toronto Canada M5S 3G3 We describe
More informationNow consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.
Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)
More informationLikelihood Inference in the Presence of Nuisance Parameters
Likelihood Inference in the Presence of Nuance Parameters N Reid, DAS Fraser Department of Stattics, University of Toronto, Toronto Canada M5S 3G3 We describe some recent approaches to likelihood based
More informationMixed models in R using the lme4 package Part 5: Generalized linear mixed models
Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates 2011-03-16 Contents 1 Generalized Linear Mixed Models Generalized Linear Mixed Models When using linear mixed
More informationApproximate Median Regression via the Box-Cox Transformation
Approximate Median Regression via the Box-Cox Transformation Garrett M. Fitzmaurice,StuartR.Lipsitz, and Michael Parzen Median regression is used increasingly in many different areas of applications. The
More informationDELTA METHOD and RESERVING
XXXVI th ASTIN COLLOQUIUM Zurich, 4 6 September 2005 DELTA METHOD and RESERVING C.PARTRAT, Lyon 1 university (ISFA) N.PEY, AXA Canada J.SCHILLING, GIE AXA Introduction Presentation of methods based on
More informationStatistics 203: Introduction to Regression and Analysis of Variance Course review
Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying
More informationUsing Estimating Equations for Spatially Correlated A
Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship
More informationThe combined model: A tool for simulating correlated counts with overdispersion. George KALEMA Samuel IDDI Geert MOLENBERGHS
The combined model: A tool for simulating correlated counts with overdispersion George KALEMA Samuel IDDI Geert MOLENBERGHS Interuniversity Institute for Biostatistics and statistical Bioinformatics George
More informationGeneralized Estimating Equations
Outline Review of Generalized Linear Models (GLM) Generalized Linear Model Exponential Family Components of GLM MLE for GLM, Iterative Weighted Least Squares Measuring Goodness of Fit - Deviance and Pearson
More informationEmpirical Bayes Analysis for a Hierarchical Negative Binomial Generalized Linear Model
International Journal of Contemporary Mathematical Sciences Vol. 9, 2014, no. 12, 591-612 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ijcms.2014.413 Empirical Bayes Analysis for a Hierarchical
More informationMISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30
MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)
More informationOutline. Statistical inference for linear mixed models. One-way ANOVA in matrix-vector form
Outline Statistical inference for linear mixed models Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark general form of linear mixed models examples of analyses using linear mixed
More informationMS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari
MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind
More informationModeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses
Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association
More informationStatistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More informationStatistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach
Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score
More informationChapter 22: Log-linear regression for Poisson counts
Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure
More informationAnalytics Software. Beyond deterministic chain ladder reserves. Neil Covington Director of Solutions Management GI
Analytics Software Beyond deterministic chain ladder reserves Neil Covington Director of Solutions Management GI Objectives 2 Contents 01 Background 02 Formulaic Stochastic Reserving Methods 03 Bootstrapping
More informationSTA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random
STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression
More informationPh.D. Qualifying Exam Friday Saturday, January 6 7, 2017
Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a
More informationWhat is a meta-analysis? How is a meta-analysis conducted? Model Selection Approaches to Inference. Meta-analysis. Combining Data
Combining Data IB/NRES 509 Statistical Modeling What is a? A quantitative synthesis of previous research Studies as individual observations, weighted by n, σ 2, quality, etc. Can combine heterogeneous
More informationTECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study
TECHNICAL REPORT # 59 MAY 2013 Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study Sergey Tarima, Peng He, Tao Wang, Aniko Szabo Division of Biostatistics,
More informationCategorical Predictor Variables
Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively
More informationFractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling
Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction
More informationGeneralized linear mixed models (GLMMs) for dependent compound risk models
Generalized linear mixed models (GLMMs) for dependent compound risk models Emiliano A. Valdez joint work with H. Jeong, J. Ahn and S. Park University of Connecticut 52nd Actuarial Research Conference Georgia
More informationTento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/
Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/28.0018 Statistical Analysis in Ecology using R Linear Models/GLM Ing. Daniel Volařík, Ph.D. 13.
More informationLecture 5: LDA and Logistic Regression
Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant
More informationREGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University
REGRESSION ITH SPATIALL MISALIGNED DATA Lisa Madsen Oregon State University David Ruppert Cornell University SPATIALL MISALIGNED DATA 10 X X X X X X X X 5 X X X X X 0 X 0 5 10 OUTLINE 1. Introduction 2.
More informationLecture 8. Poisson models for counts
Lecture 8. Poisson models for counts Jesper Rydén Department of Mathematics, Uppsala University jesper.ryden@math.uu.se Statistical Risk Analysis Spring 2014 Absolute risks The failure intensity λ(t) describes
More informationGeneralized Linear Models (GLZ)
Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the
More informationExtending the Linear Model
Extending the Linear Model G.Janacek School of Computing Sciences, UEA Part I generalized linear models 1 Contents I generalized linear models 1 0.1 Introduction................................... 4 0.1.1
More informationOn the Importance of Dispersion Modeling for Claims Reserving: Application of the Double GLM Theory
On the Importance of Dispersion Modeling for Claims Reserving: Application of the Double GLM Theory Danaïl Davidov under the supervision of Jean-Philippe Boucher Département de mathématiques Université
More informationCount and Duration Models
Count and Duration Models Stephen Pettigrew April 2, 2015 Stephen Pettigrew Count and Duration Models April 2, 2015 1 / 1 Outline Stephen Pettigrew Count and Duration Models April 2, 2015 2 / 1 Logistics
More information