Model Selection for Semiparametric Bayesian Models with Application to Overdispersion

Jinfang Wang and Yiping Tang
Department of Mathematics and Informatics, Chiba University
1-33 Yayoi-cho, Inage-ku, Chiba 263-8522, Japan
E-mail: wang@math.s.chiba-u.ac.jp

Abstract

In analyzing complicated data, we are often unwilling, or lack the confidence, to impose a parametric model on the data-generating structure. An important example is the analysis of proportional or count data with overdispersion. The obvious advantage of assuming a full parametric model is that one can resort to likelihood analyses, for instance using AIC or BIC to choose the most appropriate regression model. For overdispersed proportional data, possible parametric models include the beta-binomial models, the double exponential models, and so on. In this paper, we extend generalized linear models by replacing the full parametric model with a finite number of moment restrictions on both the data and the structural parameters. For such semiparametric Bayesian models, we propose a method for selecting the best possible regression model within the semiparametric model class. We apply the proposed model selection technique to overdispersed data and demonstrate the use of the proposed semiparametric information criterion on the well-known data on germination of Orobanche.

Key Words: Bayes linear methods, generalized linear models, Kullback-Leibler information, overdispersion, quasi-likelihood

1 Introduction

Let y_i (i = 1, ..., n) be one-dimensional independent response variables having mean µ_i and variance φ v(µ_i), where v(·) is a known positive function of the mean µ_i and φ is an unknown constant dispersion parameter. Although the y_i may be either discrete or continuous, in this paper we are mainly interested in responses that are either proportions or counts. As in the usual generalized linear models (GLM; McCullagh and Nelder, 1989), we assume in addition that g(µ_i) = η_i, where η_i = x_i′β is a linear predictor, with x_i = (x_{i1}, ..., x_{ip})′ a p-vector of covariates, β = (β_1, ..., β_p)′ a p-vector of unknown regression parameters, and g(·) a known (monotonic) link function.

In many situations it is difficult to impose a specific distributional assumption on y_i as in the classical GLM. In such cases we may use the quasi-likelihood method (Wedderburn, 1974) to estimate the regression parameters β. Under weak regularity conditions, the quasi-likelihood estimator β̂ is consistent and asymptotically normally distributed (Moore, 1986).
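For readers who want to see the computation behind the quasi-likelihood estimator β̂ and the Pearson-based estimate of φ referred to above, the following Python sketch implements the standard iteratively reweighted least squares fit for a logit link with binomial-type variance. It is an illustrative sketch under those standard assumptions, not code from the paper; the function name and interface are ours.

```python
import numpy as np

def quasi_logit_fit(X, y, m, max_iter=50, tol=1e-10):
    """IRLS sketch of the quasi-likelihood fit for overdispersed proportions.

    Assumed setup (standard, not quoted from the paper): y_i are observed
    proportions, m_i the group sizes, E[y_i] = pi_i,
    var[y_i] = phi * pi_i * (1 - pi_i) / m_i, and logit(pi_i) = x_i' beta.
    The dispersion phi cancels from the estimating equation for beta and is
    estimated afterwards from the Pearson statistic.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        eta = X @ beta
        pi = 1.0 / (1.0 + np.exp(-eta))
        dmu = pi * (1.0 - pi)                 # d mu / d eta for the logit link
        v = pi * (1.0 - pi) / m               # variance function v(pi_i)
        w = dmu ** 2 / v                      # IRLS weights
        z = eta + (y - pi) / dmu              # working response
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
    pearson = np.sum((y - pi) ** 2 / (pi * (1.0 - pi) / m))
    phi = pearson / (n - p)                   # Pearson (moment) estimate of phi
    return beta, phi
```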

However, work on model selection based on quasi-likelihoods is rather limited. Existing work follows mainly two approaches. The first restricts the choice of the variance function v(·) so that the quasi-score function can be integrated to a scalar quasi-likelihood function, which is then formally substituted for the likelihood in AIC (e.g., Lebreton et al., 1992; Anderson, Burnham and White, 1994; Qian, Gabor and Gupta, 1996; Pan, 2001). The second approach places no restriction on the variance function; instead, the quasi-score vector is projected onto a subspace of estimating functions so that the projection becomes integrable (e.g., McLeish and Small, 1992; Li, 1993; Hanfelt and Liang, 1995), after which the first approach can be used to formally construct an information criterion (e.g., Lin, 2011).

In this paper we propose a different approach: we project the true distribution function onto the subspace of probability distributions satisfying the first and second moment assumptions. This projection is used as the semiparametric predictive distribution for the underlying statistical model. Using the obtained semiparametric predictive distribution, we then formally extend the idea underlying the derivation of AIC to construct a new semiparametric information criterion, SIC. We demonstrate the use of the proposed criterion on the well-known data on germination of Orobanche.

2 Semiparametric Model Selection Criteria

Akaike (1973) proposed a model selection criterion, AIC, which has been widely used in many areas of application. The derivation of AIC is based on minimizing the Kullback-Leibler information between the predictive distribution and the true distribution generating the data. In this section we extend this idea to the semiparametric setting, with the full distributional assumption replaced by assumptions on the mean and variance of the response variables.

Let H(y) denote the true but unknown distribution function with density function h(y). Let F(y) be any distribution function having mean µ and variance φ v(µ), with corresponding density (probability) function f(·). Let g(µ) = x′β, where x is a p-vector of covariates and β a p-vector of regression parameters. Let ℱ denote the space of all density functions satisfying these two assumptions on the mean and variance. We assume that there exists a unique density function f* in ℱ which minimizes the Kullback-Leibler information with respect to the true density h. Given the covariate vector x, the Kullback-Leibler information between h and f* depends only on β and φ. Let

    w(β, φ) = ∫ h(y) log{h(y)/f*(y)} dy = inf_{f∈ℱ} ∫ h(y) log{h(y)/f(y)} dy.   (2.1)

We look for the value θ* = (β*, φ*)′ which minimizes w(β, φ) in (2.1). Let t = (t_1, t_2)′ and define

    l(y, θ, t) = log{1 + t′m(y, θ)},

where m(y, θ) is a two-dimensional vector defined by

    m(y, θ) = (y − µ(β), {y − µ(β)}² − φ v(µ(β)))′.
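To make these building blocks concrete, the short Python sketch below codes the moment vector m(y, θ) and the kernel l(y, θ, t), specialized to the binomial-type variance function v(π_i) = π_i(1 − π_i)/m_i used later in Section 3. This is an illustration under that assumed variance function; the componentwise reading of m(y, θ) follows the standard mean and variance restrictions, and the function names are ours.

```python
import numpy as np

def m_fun(y, pi, m_sizes, phi):
    """Moment functions m(y_i, theta), one row per observation.

    Assumed variance function v(pi_i) = pi_i (1 - pi_i) / m_i, matching the
    proportions setting of Section 3: the first column encodes the mean
    restriction, the second the overdispersed variance restriction.
    """
    v = pi * (1.0 - pi) / m_sizes
    return np.column_stack([y - pi, (y - pi) ** 2 - phi * v])

def ell(t, M):
    """l(y_i, theta, t) = log(1 + t'm(y_i, theta)) for all observations at once.

    Values of t with 1 + t'm_i <= 0 lie outside T(theta); they are mapped
    to -inf so an optimizer will avoid them.
    """
    s = 1.0 + M @ t
    out = np.full_like(s, -np.inf, dtype=float)
    ok = s > 0
    out[ok] = np.log(s[ok])
    return out
```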

Using results from convex analysis (e.g., Kitamura, 2006), it can be shown that the minimizer θ* of (2.1) is given by

    θ* = arg min_{θ∈Θ} sup_{t∈T(θ)} E_H[l(y, θ, t)],   (2.2)

where Θ is the parameter space for θ, and T(θ) is the two-dimensional region defined by T(θ) = {t : t′m(y, θ) > −1}. The value θ* is not computable since it involves the expectation with respect to the true unknown density function h(·). One natural way to approximate θ* is by the value θ̃ which minimizes the empirical version of the expected semiparametric Kullback-Leibler information:

    θ̃ = arg min_{θ∈Θ} (1/n) Σ_{i=1}^n l(y_i, θ, t(θ)),   (2.3)

where t(θ) is a function of θ defined by

    t(θ) = arg sup_{t∈ℝ²} ∫ l(y, θ, t) h(y) dy.   (2.4)

Since the right-hand side of (2.4) again depends on the unknown true density function h(·), we approximate the value of t(θ), for given θ, using the plug-in principle:

    t̂(θ) = arg sup_{t∈ℝ²} (1/n) Σ_{i=1}^n l(y_i, θ, t).   (2.5)

Now we state the main result of the paper.

PROPOSITION 2.1. Let k = dim(θ) be the dimension of θ. Define

    SIC = Σ_{i=1}^n l(y_i, θ̂, t̂(θ̂)) + k,   (2.6)

where θ̂ = (β̂, φ̂)′, with β̂ the quasi-likelihood estimator and φ̂ the estimator based on the usual Pearson residuals. Under suitable regularity conditions, SIC is an asymptotically unbiased estimator of

    E_H[SKL(θ̂)] = E_H[ ∫ l(y, θ̂, t̂(θ̂)) h(y) dy ].   (2.7)

We shall call SIC of (2.6) the semiparametric information criterion in the rest of the paper.

Sometimes information about the parameter θ is also available a priori, before observing the data. By imposing restrictions on the first two moments of θ, Godambe (1999) advocated the use of optimum Bayesian estimating functions; more detailed accounts of optimum Bayesian estimating functions can be found in Small and Wang (2003). If we let m(y, θ) denote the optimum Bayesian estimating functions, the theory developed so far extends to a semiparametric Bayesian information criterion. We will report these results at the conference.
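For concreteness, here is a small Python sketch of how (2.5) and (2.6) could be computed from the moment matrix M whose rows are m(y_i, θ̂) (for instance produced by the m_fun sketch above). The inner objective in (2.5) is concave in t on its domain, so a damped Newton iteration suffices. This is only an illustrative implementation under those assumptions, not code from the paper.

```python
import numpy as np

def t_hat(M, max_iter=100, tol=1e-10):
    """Plug-in solution of (2.5): argmax_t (1/n) sum_i log(1 + t'm_i).

    The objective is concave where 1 + t'm_i > 0, so a damped Newton
    iteration with step-halving to stay inside T(theta) is enough.
    """
    n, d = M.shape
    t = np.zeros(d)
    for _ in range(max_iter):
        s = 1.0 + M @ t
        grad = (M / s[:, None]).sum(axis=0) / n
        if np.linalg.norm(grad) < tol:
            break
        hess = -(M / (s ** 2)[:, None]).T @ M / n   # sum of -m_i m_i' / s_i^2, over n
        step = np.linalg.solve(hess, -grad)         # ascent direction -H^{-1} g
        alpha = 1.0
        while np.any(1.0 + M @ (t + alpha * step) <= 0):
            alpha *= 0.5                            # keep 1 + t'm_i > 0
        t = t + alpha * step
    return t

def sic(M, k):
    """SIC of (2.6): sum_i l(y_i, theta_hat, t_hat(theta_hat)) + k; smaller is better."""
    t = t_hat(M)
    return float(np.sum(np.log(1.0 + M @ t))) + k
```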

3 Data Analysis

We now illustrate the use of SIC with the well-known Orobanche data on seed germination, considered in Crowder (1978, Table 3). In this experiment, two varieties of Orobanche seeds (O. aegyptiaca 75 and O. aegyptiaca 73) were tested for germination on two different root extracts (bean and cucumber). The goal of the experiment was to determine the effect of the root extract on inhibiting the growth of this parasitic plant. Overdispersion is suspected in this data set, and a number of authors have used it to study methodologies that extend the classical GLM to account for overdispersion. For instance, Breslow and Clayton (1993) analyzed these data to illustrate the use of generalized linear mixed models for overdispersion. These studies, however, did not consider model selection. We now reanalyze the data by applying the quasi-likelihood-based model selection technique proposed in the previous section.

Let y_i be the proportion of germinated seeds and n_i the number of seeds in the ith batch (i = 1, ..., 21). Suppose that E[y_i] = π_i and var[y_i] = φ v(π_i) = φ π_i(1 − π_i)/n_i, where φ is the overdispersion parameter. Further, suppose that logit(π_i) = log{π_i/(1 − π_i)} = x_i′β, where x_i = (1, x_{1i}, x_{2i}, x_{1i}x_{2i})′, with x_{1i} = 1 if the root extract is cucumber and 0 if it is bean, and x_{2i} = 1 if the seed variety is O. aegyptiaca 75 and 0 if it is O. aegyptiaca 73. We compare the following five candidate models:

(1) logit(π_i) = β_0
(2) logit(π_i) = β_0 + β_1 x_{1i}
(3) logit(π_i) = β_0 + β_2 x_{2i}
(4) logit(π_i) = β_0 + β_1 x_{1i} + β_2 x_{2i}
(5) logit(π_i) = β_0 + β_1 x_{1i} + β_2 x_{2i} + β_3 x_{1i}x_{2i}

Table 1: Model selection criteria for the Orobanche data

  Models    (1)      (2)      (3)      (4)      (5)
  SIC       3.821    2.732    5.971    3.881    5.521
  SIC_T     0.551    1.260    0.305    1.860    3.380
  AIC_BB    133.0    120.4    133.1    119.7    117.5

We computed the values of the various information criteria; the results are summarized in Table 1. The proposed SIC attains its smallest value for model (2) among all candidate models.
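Putting the pieces together, a sketch like the following could carry out this comparison, building the five design matrices and reusing the quasi_logit_fit, m_fun, and sic sketches introduced earlier. The data arrays (germinated proportions, batch sizes, and the two indicators) are assumed to be loaded from Crowder (1978, Table 3) and are not reproduced here; the helper names and the choice k = dim(β) + 1 for dim(θ) are our assumptions.

```python
import numpy as np

# Assumed inputs, taken from Crowder (1978, Table 3) but not reproduced here:
# y  = observed germination proportions per batch, n = seeds per batch,
# x1 = 1 for cucumber extract, x2 = 1 for O. aegyptiaca 75.

def design(x1, x2, model):
    """Design matrix for candidate models (1)-(5)."""
    one = np.ones_like(x1, dtype=float)
    cols = {1: [one],
            2: [one, x1],
            3: [one, x2],
            4: [one, x1, x2],
            5: [one, x1, x2, x1 * x2]}[model]
    return np.column_stack(cols)

def compare_models(y, n, x1, x2):
    """Quasi-likelihood fit and SIC (smaller is better) for each candidate model."""
    out = {}
    for model in (1, 2, 3, 4, 5):
        X = design(x1, x2, model)
        beta, phi = quasi_logit_fit(X, y, n)   # IRLS sketch from Section 1
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        M = m_fun(y, pi, n, phi)               # moment functions from Section 2
        k = X.shape[1] + 1                     # dim(theta): beta plus phi (assumed)
        out[model] = sic(M, k)
    return out
```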

SIC ranks the models in the following way: (2) > (1) > (4) > (5) > (3). This result is consistent with that obtained by Crawley (2005, pp. 255-260), who analyzed the Orobanche data by the quasi-likelihood method and applied F-tests to compare the full model with simpler models. He found no compelling evidence that the seed variety or the interaction should be kept in the model, and concluded that the minimal adequate model is model (2).

Table 1 also shows the values of AIC_BB, the AIC based on the beta-binomial model. AIC_BB attains its minimum for model (5), the most complicated model considered here. The values of SIC_T are also shown; SIC_T is a modified version of SIC, not discussed here, which corresponds to the Takeuchi-type information criterion in full likelihood analysis.

References

[1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory, B. N. Petrov and F. Csaki (eds), 267-281, Budapest: Akademiai Kiado.

[2] Anderson, D. R., Burnham, K. P. and White, G. C. (1994). AIC model selection in overdispersed capture-recapture data. Ecology, 75, 1780-1793.

[3] Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. J. Am. Statist. Ass., 88, 9-25.

[4] Crawley, M. J. (2005). Statistics: An Introduction Using R. John Wiley & Sons, Ltd.

[5] Crowder, M. J. (1978). Beta-binomial anova for proportions. Appl. Statist., 27, 34-37.

[6] Godambe, V. P. (1999). Linear Bayes and optimal estimation. Annals of the Institute of Statistical Mathematics, 51, 201-215.

[7] Hanfelt, J. J. and Liang, K.-Y. (1995). Approximate likelihood ratios for general estimating functions. Biometrika, 82, 461-477.

[8] Lebreton, J. D., Burnham, K. P., Clobert, J. and Anderson, D. R. (1992). Modeling survival and testing biological hypotheses using marked animals: a unified approach with case studies. Ecological Monographs, 62, 67-118.

[9] Li, B. (1993). A deviance function for the quasi-likelihood method. Biometrika, 80, 741-753.

[10] Lin, P.-S. (2011). Quasi-deviance functions for spatially correlated data. Statistica Sinica, 21, 1785-1806.

[11] McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Chapman and Hall: London.

[12] McLeish, D. L. and Small, C. G. (1992). A projected likelihood function for semiparametric models. Biometrika, 79, 93-102.

[13] Moore, D. F. (1986). Asymptotic properties of moment estimators for overdispersed counts and proportions. Biometrika, 73, 283-288.

[14] Pan, W. (2001). Akaike's information criterion in generalized estimating equations. Biometrics, 57, 120-125.

[15] Qian, G., Gabor, G. and Gupta, R. P. (1996). Generalized linear model selection by the predictive least quasi-deviance criterion. Biometrika, 83, 41-54.

[16] Small, C. G. and Wang, J. (2003). Numerical Methods for Nonlinear Estimating Equations. Clarendon Press: Oxford.

[17] Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61, 439-447.