# arxiv: v2 [stat.me] 8 Jun 2016

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Orthogonality of the Mean and Error Distribution in Generalized Linear Models 1 BY ALAN HUANG 2 and PAUL J. RATHOUZ 3 University of Technology Sydney and University of Wisconsin Madison 4th August, 2013 arxiv: v2 [stat.me] 8 Jun 2016 Abstract We show that the mean-model parameter is always orthogonal to the error distribution in generalized linear models. Thus, the maximum likelihood estimator of the meanmodel parameter will be asymptotically efficient regardless of whether the error distribution is known completely, known up to a finite vector of parameters, or left completely unspecified, in which case the likelihood is taken to be an appropriate semiparametric likelihood. Moreover, the maximum likelihood estimator of the mean-model parameter will be asymptotically independent of the maximum likelihood estimator of the error distribution. This generalizes some well-known results for the special cases of normal, gamma and multinomial regression models, and, perhaps more interestingly, suggests that asymptotically efficient estimation and inferences can always be obtained if the error distribution is nonparametrically estimated along with the mean. In contrast, estimation and inferences using misspecified error distributions or variance functions are generally not efficient. Keywords: regression model; nuisance tangent space; semiparametric model. 1 Introduction It is well-known that in the normal linear regression model, Y i X i Xi T i.i.d. β +ǫ i, ǫ i N(0,σ 2 ), the mean-model parameter β is orthogonal to the error variance σ 2 (e.g. Cox & Reid, 1987, Section 3.3). There are two important implications of this. First, the maximum likelihood estimator (MLE) ofβ is asymptotically efficient regardless of whetherσ 2 is known or estimated simultaneously from the data. Second, the MLE ofβ is independent of the MLE of σ 2, which is central to deriving the usual t-tests for inferences on β. (Note that orthogonal parameters are only asymptotically independent in general; finite-sample independence is special to the normal distribution.) When interest lies primarily in the mean-model, the error variance is often deemed a nuisance parameter. Similar orthogonality results hold for other generalized linear models (GLMs). Specific examples include the gamma regression model, in which the mean-model parameter is orthogonal to a nuisance shape parameter (Cox & Reid, 1987, Section 3.2), and multinomial models for polytomous data, in which the mean-model parameter is orthogonal to a nuisance vector of baseline probability masses (Rathouz & Gao, 2009). Note that the orthogonality property holds for any link function. 1 Accepted for publication in Communications in Statistics - Theory and Methods, on 20 Sep DOI: / Corresponding Author. Address: School of Mathematics and Physics, University of Queensland, AUS- TRALIA, Address: K6/446a, Clinical Science Center, 600 Highland Ave, Madison, WI, USA, Phone:

2 In each of the above settings, the error distribution is characterized by a finite vector of nuisance parameters and orthogonality is established by showing that the Fisher information matrix is block-diagonal, with the blocks corresponding to the vector of mean-model parameters and the vector of nuisance parameters. It is usually straightforward to perform these calculations on a case-by-case basis, working with the specific family of distributions under consideration, but a general result for parametric GLMs can be found in Jørgensen & Knudsen (2004). When we move away from specific parametric families to consider the class of all GLMs, we find that a general orthogonality property, although expected, may not be so easy to establish. This is because such a class constitutes a semiparametric model, and it is no longer feasible to compute and examine the Fisher information matrix in the presence of an infinitedimensional parameter. In this note, we show that the mean-model parameter is always orthogonal to the error distribution in GLMs, even when the error distribution is treated as an infinite-dimensional parameter, belonging to the space of all distributions having a Laplace transform in some neighborhood of zero. This class includes, as special cases, the classical normal, Poisson, gamma and multinomial distributions, as well as many interesting and non-standard distributions, such as the generalized Poisson distribution of Wang & Famoye (1997) for overdispersed counts and the class of all exponential dispersion models with constant dispersion (Jørgensen, 1987). We note in Section 2 that this class of distributions is as large as possible for GLMs, so the result here is indeed the most general possible. That a general orthogonality property should hold is alluded to in Jørgensen & Knudsen (2004, Section 6.2). In that paper, the notion of orthogonality between a finite-dimensional and infinite-dimensional parameter being considered is that orthogonality holds, in the usual Fisher information matrix sense, for every finite-dimensional submodel. In this note, we use a slightly stronger notion of orthogonality, namely, that the score function for the finite-dimensional parameter is orthogonal to the nuisance tangent space of the infinitedimensional parameter. Recalling that the nuisance tangent space is the closure of all finitedimensional submodel tangent spaces, we see that the notion of orthogonality here implies the notion considered in Jørgensen & Knudsen (2004). Our proof proceeds along the following lines. We first use an exponential tilt representation of GLMs, introduced in Rathouz & Gao (2009) and expanded upon in Huang (2013), to index any GLM by just two parameters, namely, a finite-dimensional mean-model parameter β and an infinite-dimensional error distribution parameter F. The orthogonality of the two parameters is then characterized by the orthogonality of the score function for β to the nuisance tangent space for F. As it turns out, the nuisance tangent space for F is rather difficult to work with directly, but by embedding the model into a larger class of models, we find that the required calculations become particularly simple. The exponential tilt representation is derived in Section 2 and the general orthogonality property is proven in Section 3. A connection with mean-variance models is outlined in Section 4. A brief discussion of the theoretical and practical implications of the our findings is given in Section 5, which concludes the note. 2 An exponential tilt representation of generalized linear models Recall that a GLM (McCullagh & Nelder, 1989) for independent data pairs (X 1,Y 1 ),...,(X n,y n ) is defined by two components. First, there is a conditional mean model for the responses, E(Y i X i ) = µ(x T i β), (1) 2

3 whereµis a user-specified inverse-link function andβ R q is a vector of unknown regression parameters. Second, the conditional distributions F i of each response Y i given covariate X i are assumed to come from some exponential family. Assuming the distributions F i have densitiesdf i with respect to some dominating measure, the second component can be written in the exponential tilt form df i (y) = exp{b(x i ;β,f)+θ(x i ;β,f)y}df(y) (2) for some reference distributionf, where b(x i ;β,f) = log exp{θ(x i ;β,f)y}df(y) (3) is a normalizing function and, in order to satisfy (1), the tilt θ(x i ;β,f) is implicitly defined as the solution to the mean constraint µ(x T i β) = yexp{b(x i ;β,f)+θ(x i ;β,f)y}df(y). (4) It is easy to see that the exponential tilt representation (1) (4) encompasses all classical GLMs. For example, normal, Poisson and gamma regression models can be recovered by choosing df to be a Gaussian, Poisson or gamma kernel, respectively. The main advantage of this representation is that it naturally allows for the reference distribution F to be considered as an infinite-dimensional nuisance parameter, along with the finite-dimensional parameter β, in the model. It is this novel representation that allows us to conveniently characterize any GLM using just the two parametersβ and F. As with any GLM, the reference distributionf is required to have a Laplace transform in some neighborhood of the origin so that the cumulant generating function (3) is well-defined. Thus, the parameter space forf is the class of all distributions that have a Laplace transform in some neighborhood of the origin. Note that this class of distribution functions is as large as it can be for GLMs, because any distribution outside this class cannot be used to generate a valid model. The exponential tilt representation (1) (4) was first introduced in Rathouz & Gao (2009). In that paper, the representation is used to derive a useful alternative parametrization of the multinomial regression model for polytomous responses. The representation is also used in Huang (2013) to motivate a semiparametric extension of GLMs for arbitrary responses. 3 The orthogonality of parameters In parametric models, orthogonality of parameters can be characterized by the Fisher information matrix being block-diagonal. In semiparametric models however, orthogonality between a finite-dimensional parameter β and an infinite-dimensional parameter F cannot be characterized in this way. Rather, it is characterized through the score function for β being orthogonal to the nuisance tangent space for F. Intuitively speaking, the projection of the score function for β on to the nuisance tangent space is a measure of the loss of information about β due to the presence of the nuisance parameterf this is zero if and only if the score function is orthogonal to the nuisance tangent space. Note that this general notion of orthogonality reduces to the Fisher information matrix criterion when the nuisance parameter is finite-dimensional. Now, the loglikelihood function corresponding to model (1) (4) is l(β, F X, Y) = logdf(y)+b(x;β,f) +θ(x;β,f)y. Thus, the score function for β is given by S β,f (X,Y) := β l(β,f) = β b(x;β,f)+ β θ(x;β,f)y. 3

4 Implicit differentiation of the defining equations forband θ leads to the identities β b(x;β,f) = µ(xt β) θ(x;β,f) and β β θ(x;β,f) = µ (X T β) X, where = E β,f [(Y µ(x T β)) 2 X] is the conditional variance of Y given X under parameter value (β,f). The score function for β therefore reduces to S β,f (X,Y) = X µ (X T β) ( Y µ(x T β) ), (5) which is of the same weighted least-squares form as for a parametric GLM. The difference here is that the variance function is not known becausef is not specified. The orthogonality between β and F now reduces to the score function (5) being orthogonal to the nuisance tangent space for F. Although it is not hard to derive a score function for F (e.g. Huang, 2013, Section 3.3), it turns out to be rather difficult to compute the nuisance tangent space explicitly. This is also noted in Jørgensen & Knudsen (2004, Section 6.2). We can work around this, however, by embedding model (1) (4) into a more general class of semiparametric restricted moment models (e.g. Tsiatis, 2006, Section 4.5) for which the required calculations are much easier. This class is given by Y = µ(x,β)+ǫ, (6) where the conditional distribution ofǫgiven X is specified only up to the moment condition E(ǫ X) = 0. It is clear that the semiparametric extension (1) (4) is a subclass of the restricted moment model, with µ(x,β) = µ(x T β) and E(ǫ X) = E(Y µ(x T β) X) = 0 by construction. The nuisance tangent space for F in the semiparametric model (1) (4) must therefore be a subspace of the nuisance tangent space for the restricted moment model. Elementary calculations (see Tsiatis, 2006, pp.81 83) show that the nuisance tangent space for the restricted moment model is given by Λ = {all q 1 functions a(x,y)such that E β,f [(Y µ(x,β))a(x,y ) X] = 0} and the projection operatorπ β,f onto this nuisance tangent space is given by Π β,f s = s E β,f [(Y µ(x,β))s X] (Y µ(x,β)). Applying this operator to the score function (5), withµ(x,β) = µ(x T β), gives Π β,f S β,f = X µ (X T β) ( Y µ(x T β) ) [ X µ (X T β) E β,f = 0 for all (β,f), (Y µ(x T β)) 2 ] X (Y µ(x T β)) because = E β,f [(Y µ(x T β)) 2 X] by definition. Thus, the score function (5) is orthogonal to the nuisance tangent space in the restricted moment model (6) and therefore necessarily orthogonal to the nuisance tangent space in the semiparametric model (1) (4) also. We summarize as follows: Proposition 1 (Orthogonality) The mean-model parameter β and the error distribution F in any generalized linear model are orthogonal. 4

5 Note that the nuisance tangent space for any parametric model (that is, a model in which F is characterized by a finite number of parameters) is necessarily a subspace of the semiparametric nuisance tangent space. We therefore have the following corollary: Corollary 1 (Orthogonality in parametric models) If the error distribution is characterized by a finite vector of nuisance parameters φ, then the mean-model parameter β is orthogonal to φ. 4 A connection with quasilikelihood models A popular extension of GLMs is the class of quasilikelihood (QL) models, also known meanvariance models (e.g. Wedderburn, 1974). These models make the assumption that E(Y X) = µ(x,β) for some mean function µ and Var(Y X) = v(µ) for some positive variance function v. Such models can be characterized by their quasi-score functions forβ, µ β = Y µ v(µ). (7) In classical QL literature, the functional forms of both µ and v are usually specified, although there is growing literature on adaptive estimation in which the variance function is left unspecified and estimated nonparametrically from data (e.g. Dewanji & Zhao, 2002). By comparing score equations (5) and (7), note that GLMs form a subset of QL models. For models characterized by (7), Jørgensen & Knudsen (2004) showed that the meanmodel parameter β is orthogonal to the variance function v whenever the latter can be characterized by a finite number of parameters. A general orthogonality result for arbitrary, infinitedimensional v remains elusive, however, perhaps because of the fact that not all QL score functions (7) correspond to actual probability models. Indeed, such correspondences are atypical. Nevertheless, there is an interesting connection between QL models with unspecified variance functions and GLMs with unspecified error distributions in a certain asymptotic sense made more precise below. The connection is based on a rather remarkable, but relatively obscure, result from Hiejima (1997), who showed that GLMs can be considered dense in the class of QL models in the following asymptotic sense: for any mean-variance relationship, there exists an exponential family of distributions (i.e. a GLM with some error distribution F ) whose score equations for β admit roots that are arbitrarily close to the roots of the corresponding QL score equation, as the sample size increases. Thus, for large enough sample sizes, any adaptive QL model with unspecified variance function v can be approximated arbitrarily well by a GLM with unspecified error distribution F, with the latter possessing the orthogonality property 1. We conjecture that this connection may be the best possible for adaptive QL models, mainly because of the aforementioned fact that QL score functions typically do not correspond to actual probability models. 5 Practical implications The orthogonality property 1 naturally suggests the idea of estimating the error distribution nonparametrically and simultaneously with the mean-model, leading to a kind of adaptive GLM. Indeed, if the joint estimation procedure is based on maximum semiparametric likelihood, then the estimator for β is guaranteed to be asymptotically efficient and asymptotically independent of the estimated error distribution. In other words, both estimation and inferences on β are asymptotically unaffected by having to also estimate the error distribution. In contrast, estimation and inferences in GLMs with misspecified error distributions, or QL models with misspecified variance functions, are generally not efficient. 5

6 Table 1: Relative root mean-square errors of a semiparametric MLE (ˆβ SP ) and the usual MLE (ˆβ MLE ) in three simulation settings, based on 5000 replications each. Estimator Data n Parameter ˆβSP ˆβMLE Exponential 33 Intercept Group effect Common slope Intercept Group effect Common slope Poisson 44 Intercept Coefficient of X Coefficient of X Coefficient of X The idea of jointly estimating the mean and error distribution in GLMs is considered in more detail in Huang (2013). In that paper, it is demonstrated that inferences on β based on profiling out the error distribution F in the likelihood can be more accurate than inferences based on QL methods. Here, we focus our attention on the accuracy of the point estimates of β. The results here complement those found in Huang (2013, Section 6). In Table 1, we compare the relative root mean-square error of a semiparametric MLE ofβ (with F unknown) to that of the usual MLE (with F set to the true distribution) from three sets of simulations. Recall that the relative root mean-square error of an estimator ˆβ is defined as the root mean-square error of ˆβ divided by the absolute value β of the parameter. The simulation settings are based on a leukemia survival dataset from Davison & Hinkley (1997) and a mine injury dataset from Myers et al. (2010), and are described in more detail in Sections 6.1 and 6.2 of Huang (2013). The particular semiparametric estimation approach we use for estimating β is based on empirical likelihood and is described in more detail in Section 4 of Huang (2013). We see from Table 1 that the relative root mean-square errors of the two estimators are essentially the same, even for moderately small sample sizes. This supports the claim that maximum likelihood estimation of β is asymptotically efficient regardless of whether F is known or completely unknown and estimated nonparametrically from data. 6 Conclusion In this note, we have shown that orthogonality between the mean-model parameter and the error distribution holds for any GLM, parametric or nonparametric. This confirms, in greatest generality, what is well-known for the special cases of normal, gamma and multinomial regression. The result also has implications for applied statistical work, with our numerical results suggesting that little is lost by treating the error distribution nonparametrically, even in moderately sized problems. (It can also be said that little is gained by knowing the error distribution completely!) Nonparametric estimation of the error distribution can therefore safeguard against biases due to parametric model misspecification, without sacrificing much in terms of efficiency. Acknowledgments The authors thank the Associate Editor and an anonymous referee for suggestions that improved the paper. Paul. J. Rathouz was funded by NIH grant R01 HL

7 References Cox, D. R. and Reid, N. (1987). Parameter orthogonality and approximate conditional inference, J. Roy. Stat. Soc. Ser. B, 49, Davison, A. C. and Hinkley, D. V. (1997). Bootstrap methods and their applications, Cambridge University Press, Cambridge, UK. Dewanji, A. and Zhao, L. P. (2002). An optimal estimating equation with unspecified variances Sankhyā, 64, Hiejima, Y. (1997). Interpretation of the quasi-likelihood via the tilted exponential family, J. Japan Statist. Soc, 27, Huang, A. (2013). Joint estimation of the mean and error distribution in generalized linear models, J. Amer. Stat. Assoc., (to appear). Jørgensen, B. (1987). Exponential dispersion models, J. Roy. Stat. Soc. Ser. B, 49, , with discussion and a reply by the author. Jørgensen, B. and Knudsen, S. J. (2004). Parameter orthogonality and adjustment for estimating functions. Scand. J. Stat., 31, McCullagh, P. and Nelder, J. A. (1989). Generalized linear models, Chapman & Hall, London, UK. Myers, R. H., Montgomery, D. C., Vining, G. G., and Robinson, T. J. (2010). Generalized linear models, John Wiley & Sons Inc., Hoboken, USA. Rathouz, P. J. and Gao, L. P. (2009). Generalized linear models with unspecified reference distribution, Biostatistics, 10, Tsiatis, A. A. (2006). Semiparametric theory and missing data, Springer, New York, USA. Wang, W. and Famoye, F. (1997). Modeling household fertility decisions with generalized Poisson regression, J. Population. Econ., 10, Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss- Newton method, Biometrika, 61,

### Outline of GLMs. Definitions

Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

### FULL LIKELIHOOD INFERENCES IN THE COX MODEL

October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

### Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

### LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

### arxiv: v1 [stat.me] 9 Mar 2016

Semiparametric generalized linear models for time-series data Thomas Fung a and Alan Huang b a Macquarie University and b University of Queensland thomas.fung@mq.edu.au, alan.huang@uq.edu.au 3 November,

### A PARADOXICAL EFFECT OF NUISANCE PARAMETERS ON EFFICIENCY OF ESTIMATORS

J. Japan Statist. Soc. Vol. 34 No. 1 2004 75 86 A PARADOXICAL EFFECT OF NUISANCE PARAMETERS ON EFFICIENCY OF ESTIMATORS Masayuki Henmi* This paper is concerned with parameter estimation in the presence

### Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

### Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

### A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. Linear-in-Parameters Models: IV versus Control Functions 2. Correlated

### Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data

Journal of Modern Applied Statistical Methods Volume 4 Issue Article 8 --5 Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data Sudhir R. Paul University of

### Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

### Full likelihood inferences in the Cox model: an empirical likelihood approach

Ann Inst Stat Math 2011) 63:1005 1018 DOI 10.1007/s10463-010-0272-y Full likelihood inferences in the Cox model: an empirical likelihood approach Jian-Jian Ren Mai Zhou Received: 22 September 2008 / Revised:

### Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

### Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

### Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

### University of California, Berkeley

University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 251 Nonparametric population average models: deriving the form of approximate population

### Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

### Modern Likelihood-Frequentist Inference. Donald A Pierce, OHSU and Ruggero Bellio, Univ of Udine

Modern Likelihood-Frequentist Inference Donald A Pierce, OHSU and Ruggero Bellio, Univ of Udine Shortly before 1980, important developments in frequency theory of inference were in the air. Strictly, this

### Logistic Regression and Generalized Linear Models

Logistic Regression and Generalized Linear Models Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/2 Topics Generative vs. Discriminative models In

### Regression models for multivariate ordered responses via the Plackett distribution

Journal of Multivariate Analysis 99 (2008) 2472 2478 www.elsevier.com/locate/jmva Regression models for multivariate ordered responses via the Plackett distribution A. Forcina a,, V. Dardanoni b a Dipartimento

### The International Journal of Biostatistics

The International Journal of Biostatistics Volume 2, Issue 1 2006 Article 2 Statistical Inference for Variable Importance Mark J. van der Laan, Division of Biostatistics, School of Public Health, University

### Theory and Methods of Statistical Inference. PART I Frequentist likelihood methods

PhD School in Statistics XXV cycle, 2010 Theory and Methods of Statistical Inference PART I Frequentist likelihood methods (A. Salvan, N. Sartori, L. Pace) Syllabus Some prerequisites: Empirical distribution

### PRINCIPLES OF STATISTICAL INFERENCE

Advanced Series on Statistical Science & Applied Probability PRINCIPLES OF STATISTICAL INFERENCE from a Neo-Fisherian Perspective Luigi Pace Department of Statistics University ofudine, Italy Alessandra

### Theory and Methods of Statistical Inference. PART I Frequentist theory and methods

PhD School in Statistics cycle XXVI, 2011 Theory and Methods of Statistical Inference PART I Frequentist theory and methods (A. Salvan, N. Sartori, L. Pace) Syllabus Some prerequisites: Empirical distribution

### Lecture 3. Truncation, length-bias and prevalence sampling

Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in

### Generalized Quasi-likelihood versus Hierarchical Likelihood Inferences in Generalized Linear Mixed Models for Count Data

Sankhyā : The Indian Journal of Statistics 2009, Volume 71-B, Part 1, pp. 55-78 c 2009, Indian Statistical Institute Generalized Quasi-likelihood versus Hierarchical Likelihood Inferences in Generalized

### Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey

### Theory and Methods of Statistical Inference

PhD School in Statistics cycle XXIX, 2014 Theory and Methods of Statistical Inference Instructors: B. Liseo, L. Pace, A. Salvan (course coordinator), N. Sartori, A. Tancredi, L. Ventura Syllabus Some prerequisites:

### A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL

Discussiones Mathematicae Probability and Statistics 36 206 43 5 doi:0.75/dmps.80 A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Tadeusz Bednarski Wroclaw University e-mail: t.bednarski@prawo.uni.wroc.pl

### GLS and FGLS. Econ 671. Purdue University. Justin L. Tobias (Purdue) GLS and FGLS 1 / 22

GLS and FGLS Econ 671 Purdue University Justin L. Tobias (Purdue) GLS and FGLS 1 / 22 In this lecture we continue to discuss properties associated with the GLS estimator. In addition we discuss the practical

### A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky Empirical likelihood with right censored data were studied by Thomas and Grunkmier (1975), Li (1995),

### Discussion of Maximization by Parts in Likelihood Inference

Discussion of Maximization by Parts in Likelihood Inference David Ruppert School of Operations Research & Industrial Engineering, 225 Rhodes Hall, Cornell University, Ithaca, NY 4853 email: dr24@cornell.edu

### UNIVERSITÄT POTSDAM Institut für Mathematik

UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam

### BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

### Pseudo-score confidence intervals for parameters in discrete statistical models

Biometrika Advance Access published January 14, 2010 Biometrika (2009), pp. 1 8 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asp074 Pseudo-score confidence intervals for parameters

### Poisson Regression. Ryan Godwin. ECON University of Manitoba

Poisson Regression Ryan Godwin ECON 7010 - University of Manitoba Abstract. These lecture notes introduce Maximum Likelihood Estimation (MLE) of a Poisson regression model. 1 Motivating the Poisson Regression

### STAT331. Cox s Proportional Hazards Model

STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

### Random Effects Models for Network Data

Random Effects Models for Network Data Peter D. Hoff 1 Working Paper no. 28 Center for Statistics and the Social Sciences University of Washington Seattle, WA 98195-4320 January 14, 2003 1 Department of

### Machine Learning 2017

Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section

### Research Projects. Hanxiang Peng. March 4, Department of Mathematical Sciences Indiana University-Purdue University at Indianapolis

Hanxiang Department of Mathematical Sciences Indiana University-Purdue University at Indianapolis March 4, 2009 Outline Project I: Free Knot Spline Cox Model Project I: Free Knot Spline Cox Model Consider

### BFF Four: Are we Converging?

BFF Four: Are we Converging? Nancy Reid May 2, 2017 Classical Approaches: A Look Way Back Nature of Probability BFF one to three: a look back Comparisons Are we getting there? BFF Four Harvard, May 2017

### Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007

### Approximate Median Regression via the Box-Cox Transformation

Approximate Median Regression via the Box-Cox Transformation Garrett M. Fitzmaurice,StuartR.Lipsitz, and Michael Parzen Median regression is used increasingly in many different areas of applications. The

### Quantitative Empirical Methods Exam

Quantitative Empirical Methods Exam Yale Department of Political Science, August 2016 You have seven hours to complete the exam. This exam consists of three parts. Back up your assertions with mathematics

### D-optimal Designs for Factorial Experiments under Generalized Linear Models

D-optimal Designs for Factorial Experiments under Generalized Linear Models Jie Yang Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago Joint research with Abhyuday

### A nonparametric two-sample wald test of equality of variances

University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

### i=1 h n (ˆθ n ) = 0. (2)

Stat 8112 Lecture Notes Unbiased Estimating Equations Charles J. Geyer April 29, 2012 1 Introduction In this handout we generalize the notion of maximum likelihood estimation to solution of unbiased estimating

### asymptotic normality of nonparametric M-estimators with applications to hypothesis testing for panel count data

asymptotic normality of nonparametric M-estimators with applications to hypothesis testing for panel count data Xingqiu Zhao and Ying Zhang The Hong Kong Polytechnic University and Indiana University Abstract:

### BAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS

BAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS Andrew A. Neath 1 and Joseph E. Cavanaugh 1 Department of Mathematics and Statistics, Southern Illinois University, Edwardsville, Illinois 606, USA

### Lawrence D. Brown* and Daniel McCarthy*

Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

### Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference

1 / 171 Bootstrap inference Francisco Cribari-Neto Departamento de Estatística Universidade Federal de Pernambuco Recife / PE, Brazil email: cribari@gmail.com October 2013 2 / 171 Unpaid advertisement

### Complements on Simple Linear Regression

Complements on Simple Linear Regression Terry R. McConnell Syracuse University March 16, 2015 Abstract We present a simple-minded approach to a variant of simple linear regression that seeks to minimize

### Gaussian processes. Basic Properties VAG002-

Gaussian processes The class of Gaussian processes is one of the most widely used families of stochastic processes for modeling dependent data observed over time, or space, or time and space. The popularity

### Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52

Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two

### Parametric Techniques

Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

### Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

### A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

### On Consistency of Estimators in Simple Linear Regression 1

On Consistency of Estimators in Simple Linear Regression 1 Anindya ROY and Thomas I. SEIDMAN Department of Mathematics and Statistics University of Maryland Baltimore, MD 21250 (anindya@math.umbc.edu)

### GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The

### Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures

Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures David Hunter Pennsylvania State University, USA Joint work with: Tom Hettmansperger, Hoben Thomas, Didier Chauveau, Pierre Vandekerkhove,

### Lecture 22: Variance and Covariance

EE5110 : Probability Foundations for Electrical Engineers July-November 2015 Lecture 22: Variance and Covariance Lecturer: Dr. Krishna Jagannathan Scribes: R.Ravi Kiran In this lecture we will introduce

### Sigmaplot di Systat Software

Sigmaplot di Systat Software SigmaPlot Has Extensive Statistical Analysis Features SigmaPlot is now bundled with SigmaStat as an easy-to-use package for complete graphing and data analysis. The statistical

### Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

### Large Sample Properties of Estimators in the Classical Linear Regression Model

Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in

### Linear Regression Models P8111

Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

### Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference

1 / 172 Bootstrap inference Francisco Cribari-Neto Departamento de Estatística Universidade Federal de Pernambuco Recife / PE, Brazil email: cribari@gmail.com October 2014 2 / 172 Unpaid advertisement

### On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori

On Properties of QIC in Generalized Estimating Equations Shinpei Imori Graduate School of Engineering Science, Osaka University 1-3 Machikaneyama-cho, Toyonaka, Osaka 560-8531, Japan E-mail: imori.stat@gmail.com

### Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives

TR-No. 14-06, Hiroshima Statistical Research Group, 1 11 Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives Mariko Yamamura 1, Keisuke Fukui

### Multinomial Logistic Regression Models

Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

### WEIGHTED LIKELIHOOD NEGATIVE BINOMIAL REGRESSION

WEIGHTED LIKELIHOOD NEGATIVE BINOMIAL REGRESSION Michael Amiguet 1, Alfio Marazzi 1, Victor Yohai 2 1 - University of Lausanne, Institute for Social and Preventive Medicine, Lausanne, Switzerland 2 - University

### DELTA METHOD and RESERVING

XXXVI th ASTIN COLLOQUIUM Zurich, 4 6 September 2005 DELTA METHOD and RESERVING C.PARTRAT, Lyon 1 university (ISFA) N.PEY, AXA Canada J.SCHILLING, GIE AXA Introduction Presentation of methods based on

### Negative Multinomial Model and Cancer. Incidence

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence S. Lahiri & Sunil K. Dhar Department of Mathematical Sciences, CAMS New Jersey Institute of Technology, Newar,

### Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

### Lattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III)

Title: Spatial Statistics for Point Processes and Lattice Data (Part III) Lattice Data Tonglin Zhang Outline Description Research Problems Global Clustering and Local Clusters Permutation Test Spatial

### Numerical Analysis for Statisticians

Kenneth Lange Numerical Analysis for Statisticians Springer Contents Preface v 1 Recurrence Relations 1 1.1 Introduction 1 1.2 Binomial CoefRcients 1 1.3 Number of Partitions of a Set 2 1.4 Horner's Method

### Linear Regression With Special Variables

Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:

### General Linear Model: Statistical Inference

Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter 4), least

### Chapter 3: Maximum Likelihood Theory

Chapter 3: Maximum Likelihood Theory Florian Pelgrin HEC September-December, 2010 Florian Pelgrin (HEC) Maximum Likelihood Theory September-December, 2010 1 / 40 1 Introduction Example 2 Maximum likelihood

### Frailty Models and Copulas: Similarities and Differences

Frailty Models and Copulas: Similarities and Differences KLARA GOETHALS, PAUL JANSSEN & LUC DUCHATEAU Department of Physiology and Biometrics, Ghent University, Belgium; Center for Statistics, Hasselt

### Asymptotic Relative Efficiency in Estimation

Asymptotic Relative Efficiency in Estimation Robert Serfling University of Texas at Dallas October 2009 Prepared for forthcoming INTERNATIONAL ENCYCLOPEDIA OF STATISTICAL SCIENCES, to be published by Springer

### Discrete Dependent Variable Models

Discrete Dependent Variable Models James J. Heckman University of Chicago This draft, April 10, 2006 Here s the general approach of this lecture: Economic model Decision rule (e.g. utility maximization)

### Lecture Notes 15 Prediction Chapters 13, 22, 20.4.

Lecture Notes 15 Prediction Chapters 13, 22, 20.4. 1 Introduction Prediction is covered in detail in 36-707, 36-701, 36-715, 10/36-702. Here, we will just give an introduction. We observe training data

### A COEFFICIENT OF DETERMINATION FOR LOGISTIC REGRESSION MODELS

A COEFFICIENT OF DETEMINATION FO LOGISTIC EGESSION MODELS ENATO MICELI UNIVESITY OF TOINO After a brief presentation of the main extensions of the classical coefficient of determination ( ), a new index

### Kernel methods and the exponential family

Kernel methods and the exponential family Stephane Canu a Alex Smola b a 1-PSI-FRE CNRS 645, INSA de Rouen, France, St Etienne du Rouvray, France b Statistical Machine Learning Program, National ICT Australia

### A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

### Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

### Models as Approximations A Conspiracy of Random Regressors and Model Misspecification Against Classical Inference in Regression

Submitted to Statistical Science Models as Approximations A Conspiracy of Random Regressors and Model Misspecification Against Classical Inference in Regression Andreas Buja,,, Richard Berk, Lawrence Brown,,

### Gaussian Processes (10/16/13)

STA561: Probabilistic machine learning Gaussian Processes (10/16/13) Lecturer: Barbara Engelhardt Scribes: Changwei Hu, Di Jin, Mengdi Wang 1 Introduction In supervised learning, we observe some inputs

### Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart

Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural

### AFT Models and Empirical Likelihood

AFT Models and Empirical Likelihood Mai Zhou Department of Statistics, University of Kentucky Collaborators: Gang Li (UCLA); A. Bathke; M. Kim (Kentucky) Accelerated Failure Time (AFT) models: Y = log(t

### Local Mixtures and Exponential Dispersion Models

Local Mixtures and Exponential Dispersion Models by Paul Marriott Department of Statistics & Applied Probability, National University of Singapore, 3 Science Drive, Singapore 117543. email: stapkm@nus.edu.sg

PENALIZING YOUR MODELS AN OVERVIEW OF THE GENERALIZED REGRESSION PLATFORM Michael Crotty & Clay Barker Research Statisticians JMP Division, SAS Institute Copyr i g ht 2012, SAS Ins titut e Inc. All rights

### Regression Analysis for Data Containing Outliers and High Leverage Points

Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

### Filters in Analysis and Topology

Filters in Analysis and Topology David MacIver July 1, 2004 Abstract The study of filters is a very natural way to talk about convergence in an arbitrary topological space, and carries over nicely into

### On Least Squares Linear Regression Without Second Moment

On Least Squares Linear Regression Without Second Moment BY RAJESHWARI MAJUMDAR University of Connecticut If \ and ] are real valued random variables such that the first moments of \, ], and \] exist and

### Variable selection and machine learning methods in causal inference

Variable selection and machine learning methods in causal inference Debashis Ghosh Department of Biostatistics and Informatics Colorado School of Public Health Joint work with Yeying Zhu, University of

### Continuous Time Survival in Latent Variable Models

Continuous Time Survival in Latent Variable Models Tihomir Asparouhov 1, Katherine Masyn 2, Bengt Muthen 3 Muthen & Muthen 1 University of California, Davis 2 University of California, Los Angeles 3 Abstract