The In-and-Out-of-Sample (IOS) Likelihood Ratio Test for Model Misspecification


Brett Presnell, Department of Statistics, University of Florida
Dennis Boos, Department of Statistics, North Carolina State University

The Problem

Question: Given independent data $Y_1, \ldots, Y_n$ with predictors $x_1, \ldots, x_n$, and a parametric model $f(y_i; x_i, \theta)$ for the density of $Y_i$, with $\theta$ a $p$-vector of unknown model parameters, how do we test for model misspecification?

Answer: It depends. Are the data i.i.d. (no $x_i$'s)? Are there replications? Are the $Y_i$'s continuous or categorical? Univariate or multivariate? Is $f$ of some specific form that I have a test for? ...

Some Answers

- Kolmogorov-Smirnov, Cramér-von Mises, Anderson-Darling, Shapiro-Wilk, Lilliefors, ...
- Pearson chi-square, deviance, power divergence, ...
- Mardia's skewness and kurtosis tests, ...
- Find (or contrive) a bigger model (but we are NOT trying to do variable selection here).

However, there is often no obvious answer, especially if we have $x_i$'s and no replication. Might try: graphical methods; the information matrix test?

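Motivation

- $\hat\theta$ = MLE of $\theta$; $\hat\theta_{(i)}$ = MLE of $\theta$ with the $i$th observation deleted from the sample.
- $f(y_i; x_i, \hat\theta_{(i)})$ measures how well the model predicts the $i$th observation. Compare it to $f(y_i; x_i, \hat\theta)$.
- Always $f(y_i; x_i, \hat\theta_{(i)}) \le f(y_i; x_i, \hat\theta)$: writing $l_j(\theta) = \log f(y_j; x_j, \theta)$,
$$\sum_{j=1}^n l_j(\hat\theta) - l_i(\hat\theta) \le \sum_{j=1}^n l_j(\hat\theta_{(i)}) - l_i(\hat\theta_{(i)}) \le \sum_{j=1}^n l_j(\hat\theta) - l_i(\hat\theta_{(i)}).$$
The first inequality holds because $\hat\theta_{(i)}$ maximizes the likelihood of the other $n-1$ observations, the second because $\hat\theta$ maximizes the full likelihood; comparing the outer terms gives $l_i(\hat\theta_{(i)}) \le l_i(\hat\theta)$.
- If $f(y_i; x_i, \hat\theta_{(i)}) \ll f(y_i; x_i, \hat\theta)$, then the fitted model must shift appreciably to accommodate the $i$th observation.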
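The chain of inequalities implies that each leave-one-out contribution $l_i(\hat\theta) - l_i(\hat\theta_{(i)})$ is nonnegative. The Python sketch below checks this numerically under an assumed i.i.d. Poisson model, chosen only because $\hat\theta = \bar y$ and $\hat\theta_{(i)}$ have closed forms; the counts are hypothetical, with one deliberately large value.

```python
import math


def poisson_loglik(y, lam):
    """log f(y; lam) for one Poisson(lam) count."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def loo_contributions(ys):
    """Return l_i(theta_hat) - l_i(theta_hat_(i)) for each observation (i.i.d. Poisson model)."""
    n = len(ys)
    total = sum(ys)
    lam_hat = total / n                      # full-sample MLE: the sample mean
    contribs = []
    for y in ys:
        lam_loo = (total - y) / (n - 1)      # MLE with this observation deleted
        contribs.append(poisson_loglik(y, lam_hat) - poisson_loglik(y, lam_loo))
    return contribs


ys = [2, 3, 1, 4, 2, 3, 2, 15]   # hypothetical counts; the last one is an outlier
d = loo_contributions(ys)
print([round(v, 3) for v in d])
```

All entries come out nonnegative, and the outlying count has by far the largest contribution: the fitted model shifts most to accommodate it.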

An Idea

Compare the in-sample and out-of-sample likelihoods as a global measure of lack of fit:
$$\mathrm{IOS} = \log\left(\frac{\prod_{i=1}^n f(y_i; x_i, \hat\theta)}{\prod_{i=1}^n f(y_i; x_i, \hat\theta_{(i)})}\right) = \sum_{i=1}^n \left\{ l(y_i; x_i, \hat\theta) - l(y_i; x_i, \hat\theta_{(i)}) \right\}.$$
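Because IOS needs only the full-sample fit and the $n$ leave-one-out fits, it can be computed by brute force for any model that has a fitting routine and a log-density. A minimal Python sketch; `ios_statistic`, `normal_fit`, and `normal_loglik` are hypothetical helper names, and the i.i.d. normal model with closed-form MLEs is used only as an illustration on made-up data:

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

Theta = TypeVar("Theta")


def ios_statistic(
    data: Sequence[float],
    fit: Callable[[Sequence[float]], Theta],
    loglik: Callable[[float, Theta], float],
) -> Tuple[float, List[float]]:
    """Brute-force IOS: refit the model n times, once without each observation.

    Returns the total sum_i {l(y_i; theta_hat) - l(y_i; theta_hat_(i))}
    together with the per-observation contributions.
    """
    ys = list(data)
    theta_full = fit(ys)
    contribs = []
    for i, y in enumerate(ys):
        theta_loo = fit(ys[:i] + ys[i + 1:])   # MLE with observation i deleted
        contribs.append(loglik(y, theta_full) - loglik(y, theta_loo))
    return sum(contribs), contribs


# Illustrative model: i.i.d. N(mu, sigma^2), MLEs in closed form (variance divisor n).
def normal_fit(ys):
    n = len(ys)
    mu = sum(ys) / n
    return mu, sum((y - mu) ** 2 for y in ys) / n


def normal_loglik(y, theta):
    mu, var = theta
    return -0.5 * math.log(2 * math.pi * var) - (y - mu) ** 2 / (2 * var)


total, parts = ios_statistic([4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 9.5], normal_fit, normal_loglik)
print(round(total, 3), [round(p, 3) for p in parts])
```

For models without closed-form MLEs, `fit` would call a numerical optimizer, so the cost is roughly $n + 1$ model fits.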

Connections

- The cross-validated (out-of-sample) log-likelihood (Stone, 1977) and its Bayesian analogues (Geisser and Eddy, 1979) are popular in model selection.
- Geisser (1990) uses a Bayesian version of the individual terms of IOS to test for outliers (discordancy).
- The motivation is similar to that of Cook's distance.
- The asymptotic form of IOS is related to the information matrix test (White, 1982).
- In simple examples, IOS approximates well-known, intuitive test statistics.

Example: IID Poisson

For the model $X_1, \ldots, X_n$ i.i.d. Poisson($\lambda$),
$$\mathrm{IOS} \approx \frac{s^2}{\bar X}.$$
- The obvious comparison of moments (Fisher, 1973).
- $\mathrm{IOS} \xrightarrow{P} 1$ if the model is correctly specified.
- Asymptotically normal.
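A quick numerical check of the Poisson approximation, written as a Python sketch on simulated data (the sampler `rpois` and the other function names are illustrative, not from the slides). Writing $u_i = (x_i - \bar X)/\{(n-1)\bar X\}$, the leave-one-out MLE is $\bar X(1 - u_i)$, the exact statistic is $-\sum_i x_i \log(1 - u_i)$, and its first-order term $\sum_i x_i u_i$ equals $s^2/\bar X$ exactly, so the two should differ only by a remainder of order $1/n$:

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's multiplication method; fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def poisson_ios(xs):
    """Exact IOS for the i.i.d. Poisson model, whose MLE is the sample mean."""
    n = len(xs)
    total = sum(xs)
    lam = total / n
    ios = 0.0
    for x in xs:
        lam_loo = (total - x) / (n - 1)
        # l(x; lam) = x log(lam) - lam - log(x!); the log(x!) terms cancel
        ios += x * math.log(lam / lam_loo) - (lam - lam_loo)
    return ios


def dispersion_ratio(xs):
    """s^2 / xbar, with the usual n - 1 divisor in s^2."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return s2 / xbar


rng = random.Random(1)
xs = [rpois(4.0, rng) for _ in range(200)]   # simulated data from the null model
ios_value = poisson_ios(xs)
ratio = dispersion_ratio(xs)
print(round(ios_value, 4), round(ratio, 4))
```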

Example: IID Exponential

For the model $X_1, \ldots, X_n$ i.i.d. exponential($\mu$),
$$\mathrm{IOS} \approx \frac{s^2}{\bar X^2}.$$
- Obvious comparison of moments again.
- $\mathrm{IOS} \xrightarrow{P} 1$ if the model is correctly specified.
- Asymptotically normal.
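The same kind of check for the exponential model, as a Python sketch on simulated data (function names are illustrative). Here too the first-order term of the exact leave-one-out statistic reduces to $s^2/\bar X^2$ when $s^2$ uses the $n-1$ divisor:

```python
import math
import random


def exponential_ios(xs):
    """Exact IOS for i.i.d. exponential data with mean mu (MLE = sample mean)."""
    n = len(xs)
    total = sum(xs)
    mu = total / n
    ios = 0.0
    for x in xs:
        mu_loo = (total - x) / (n - 1)
        # l(x; m) = -log(m) - x / m
        ios += (-math.log(mu) - x / mu) - (-math.log(mu_loo) - x / mu_loo)
    return ios


def squared_cv(xs):
    """s^2 / xbar^2, with the usual n - 1 divisor in s^2."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return s2 / xbar ** 2


rng = random.Random(2)
xs = [rng.expovariate(1 / 3.0) for _ in range(200)]   # simulated exponential data, mean 3
ios_value = exponential_ios(xs)
ratio = squared_cv(xs)
print(round(ios_value, 4), round(ratio, 4))
```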

Example: IID Normal

For the model $X_1, \ldots, X_n$ i.i.d. $N(\mu, \sigma^2)$,
$$\mathrm{IOS} \approx \frac{1}{2}\left(\frac{\hat\mu_4}{\hat\sigma^4} + 1\right),$$
where $\hat\mu_4$ is the sample fourth central moment and $\hat\sigma^2$ is the MLE of $\sigma^2$.
- IOS is approximately a kurtosis test.
- $\mathrm{IOS} \xrightarrow{P} 2$ if the model is correctly specified (since $\mu_4/\sigma^4 = 3$ for the normal).
- Asymptotically normal.
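A Python sketch comparing the exact leave-one-out IOS for the normal model with the kurtosis form above, on simulated normal data (function names are illustrative). Under the model both should be close to each other and near 2:

```python
import math
import random


def normal_loglik(y, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (y - mu) ** 2 / (2 * var)


def normal_ios(ys):
    """Exact IOS for i.i.d. N(mu, sigma^2) via leave-one-out refits (MLE variance, divisor = sample size)."""
    n = len(ys)
    mu = sum(ys) / n
    var = sum((y - mu) ** 2 for y in ys) / n
    ios = 0.0
    for i, y in enumerate(ys):
        rest = ys[:i] + ys[i + 1:]
        mu_i = sum(rest) / (n - 1)
        var_i = sum((r - mu_i) ** 2 for r in rest) / (n - 1)
        ios += normal_loglik(y, mu, var) - normal_loglik(y, mu_i, var_i)
    return ios


def kurtosis_form(ys):
    """(mu4_hat / sigma_hat^4 + 1) / 2 with moments about the sample mean (divisor n)."""
    n = len(ys)
    mu = sum(ys) / n
    m2 = sum((y - mu) ** 2 for y in ys) / n
    m4 = sum((y - mu) ** 4 for y in ys) / n
    return 0.5 * (m4 / m2 ** 2 + 1)


rng = random.Random(3)
ys = [rng.gauss(0.0, 1.0) for _ in range(400)]
ios_value = normal_ios(ys)
approx = kurtosis_form(ys)
print(round(ios_value, 3), round(approx, 3))
```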

Example: Normal Regression

For the model $Y_i$ independent $N(x_i^T\beta, \sigma^2)$, let $\hat\beta$ be the least-squares estimate. Then
$$\mathrm{IOS} \approx \sum_{i=1}^n \frac{h_i \hat e_i^2}{\hat\sigma^2} + \frac{\hat\mu_4}{2\hat\sigma^4} - \frac{1}{2},$$
where $h_i = x_i^T (X^T X)^{-1} x_i$ is the leverage of the $i$th observation, $\hat e_i = Y_i - x_i^T\hat\beta$, $\hat\sigma^2 = n^{-1}\sum_{i=1}^n \hat e_i^2$, and $\hat\mu_4 = n^{-1}\sum_{i=1}^n \hat e_i^4$.
- Looks for heterogeneity of variance and for kurtosis.
- $\mathrm{IOS} \xrightarrow{P} p = \dim(\beta) + 1$ if the model is correctly specified.
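A NumPy sketch comparing the exact IOS (refitting by least squares without each observation) with the leverage-plus-kurtosis approximation, on simulated data with an intercept and two predictors, so that $\dim(\beta) + 1 = 4$. The design, coefficients, and function names are made up for illustration:

```python
import numpy as np


def _fit(X, y):
    """Least squares for beta plus the MLE of sigma^2 (divisor = number of rows)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, np.mean(resid ** 2)


def _loglik(yi, xi, beta, var):
    return -0.5 * np.log(2 * np.pi * var) - (yi - xi @ beta) ** 2 / (2 * var)


def regression_ios_exact(X, y):
    """Exact IOS for Y_i ~ N(x_i^T beta, sigma^2), refitting once without each observation."""
    n = len(y)
    beta, var = _fit(X, y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, var_i = _fit(X[keep], y[keep])
        total += _loglik(y[i], X[i], beta, var) - _loglik(y[i], X[i], beta_i, var_i)
    return float(total)


def regression_ios_approx(X, y):
    """sum_i h_i e_i^2 / sigma^2 + mu4 / (2 sigma^4) - 1/2, with divisor-n moments of the residuals."""
    beta, var = _fit(X, y)
    e = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)   # leverages h_i
    mu4 = np.mean(e ** 4)
    return float(np.sum(h * e ** 2) / var + mu4 / (2 * var ** 2) - 0.5)


rng = np.random.default_rng(0)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
exact, approx = regression_ios_exact(X, y), regression_ios_approx(X, y)
print(round(exact, 3), round(approx, 3))
```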

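Approximate Form

For the theory, the regularity conditions are similar to those for consistency and asymptotic normality of MLEs. Details are provided for the i.i.d. case:
$$\mathrm{IOS} - \mathrm{IOS}_A = o_p(n^{-1/2}),$$
where (i.i.d. case)
$$\mathrm{IOS}_A = \frac{1}{n}\sum_{i=1}^n \dot l(y_i; \hat\theta)^T \hat A(\hat\theta)^{-1} \dot l(y_i; \hat\theta) = \mathrm{tr}\{\hat A(\hat\theta)^{-1}\hat B(\hat\theta)\},$$
with
$$\hat A(\hat\theta) = n^{-1}\sum_{i=1}^n \{-\ddot l(y_i; \hat\theta)\}, \qquad \hat B(\hat\theta) = n^{-1}\sum_{i=1}^n \dot l(y_i; \hat\theta)\,\dot l(y_i; \hat\theta)^T,$$
and $\dot l$, $\ddot l$ denoting the gradient and Hessian of $l$ with respect to $\theta$.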
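$\mathrm{IOS}_A$ needs only the per-observation scores and Hessians at $\hat\theta$, with no refitting. A NumPy sketch, assuming the caller supplies these arrays; the hypothetical helper `normal_score_hessian` works them out by hand for the i.i.d. $N(\mu, \sigma^2)$ model with $\theta = (\mu, \sigma^2)$, where $\hat A(\hat\theta)$ is diagonal and the trace reduces exactly to $\frac{1}{2}(\hat\mu_4/\hat\sigma^4 + 1)$, the kurtosis form of the IID Normal example:

```python
import numpy as np


def ios_a(scores, neg_hessians):
    """IOS_A = tr{A_hat^{-1} B_hat}.

    scores:       (n, p) array; row i is the gradient of l(y_i; theta) at theta_hat
    neg_hessians: (n, p, p) array; slice i is minus the Hessian of l(y_i; theta) at theta_hat
    """
    n = scores.shape[0]
    A_hat = neg_hessians.mean(axis=0)        # observed information, averaged over observations
    B_hat = scores.T @ scores / n            # average outer product of the scores
    return float(np.trace(np.linalg.solve(A_hat, B_hat)))


def normal_score_hessian(y):
    """Scores and negative Hessians for N(mu, sigma^2), theta = (mu, sigma^2), at the MLE."""
    mu, var = y.mean(), y.var()              # np.var defaults to divisor n, i.e. the MLE
    e = y - mu
    scores = np.column_stack([e / var, (e ** 2 - var) / (2 * var ** 2)])
    neg_h = np.empty((len(y), 2, 2))
    neg_h[:, 0, 0] = 1 / var
    neg_h[:, 0, 1] = e / var ** 2
    neg_h[:, 1, 0] = e / var ** 2
    neg_h[:, 1, 1] = -1 / (2 * var ** 2) + e ** 2 / var ** 3
    return scores, neg_h


rng = np.random.default_rng(7)
y = rng.standard_t(df=5, size=500)         # heavier-tailed than normal
value = ios_a(*normal_score_hessian(y))
e = y - y.mean()
kurt = np.mean(e ** 4) / np.mean(e ** 2) ** 2
print(value, 0.5 * (kurt + 1))
```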

Some Theory

$$\mathrm{IOS} \xrightarrow{P} \mathrm{IOS}^* = E\{\dot l(Y_1; \theta_0)^T A(\theta_0)^{-1} \dot l(Y_1; \theta_0)\} = \mathrm{tr}\{A(\theta_0)^{-1} B(\theta_0)\},$$
where $\theta_0$ is the probability limit of $\hat\theta$ and
$$A(\theta) = E\{-\ddot l(Y_1; \theta)\}, \qquad B(\theta) = E\{\dot l(Y_1; \theta)\,\dot l(Y_1; \theta)^T\}.$$
Under $H_0$ (correct specification), $A(\theta_0) = B(\theta_0)$, so $\mathrm{IOS}^* = p$.
- $n^{1/2}(\mathrm{IOS} - \mathrm{IOS}^*)$ is asymptotically normally distributed (but the asymptotic variance is complicated and convergence is slow).
- In i.i.d. location-scale models, the null distributions of IOS and $\mathrm{IOS}_A$ do not depend on the parameter values, so parametric bootstrap p-values are exact up to Monte Carlo error.
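A sketch of the parametric bootstrap for the i.i.d. normal model, using the closed form of $\mathrm{IOS}_A$ for that model as the statistic. It assumes an upper-tail test (large values signal misspecification), the $(1 + \#\{T^* \ge T\})/(B + 1)$ convention for the p-value, and simulated data sets; none of these choices come from the slides:

```python
import numpy as np


def ios_a_normal(y):
    """IOS_A for the i.i.d. normal model, which reduces to (mu4_hat / sigma_hat^4 + 1) / 2."""
    e = y - y.mean()
    s2 = np.mean(e ** 2)
    return 0.5 * (np.mean(e ** 4) / s2 ** 2 + 1)


def parametric_bootstrap_pvalue(y, stat, n_boot=999, seed=0):
    """Upper-tail parametric bootstrap p-value under the fitted i.i.d. normal model."""
    rng = np.random.default_rng(seed)
    observed = stat(y)
    mu_hat, sd_hat = y.mean(), y.std()       # MLEs (np.std defaults to divisor n)
    exceed = 0
    for _ in range(n_boot):
        # Simulate a data set of the same size from the fitted null model. For a
        # location-scale model the null distribution of stat does not depend on
        # (mu, sigma), so this p-value is exact up to Monte Carlo error.
        y_star = rng.normal(mu_hat, sd_hat, size=len(y))
        if stat(y_star) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


y_null = np.random.default_rng(11).normal(10.0, 2.0, size=200)
y_alt = np.random.default_rng(12).lognormal(size=200)      # strongly skewed, heavy right tail
p_null = parametric_bootstrap_pvalue(y_null, ios_a_normal, n_boot=199)
p_alt = parametric_bootstrap_pvalue(y_alt, ios_a_normal, n_boot=199)
print(p_null, p_alt)
```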

Discussion of $\mathrm{IOS}_A$

- $\mathrm{IOS}_A/n$ is the trace term appearing in model selection criteria based on Kullback-Leibler discrepancy (Linhart and Zucchini, 1986). This is related to the fact that AIC can be viewed as an approximation to the out-of-sample log-likelihood in which the trace term is replaced by $p/n$ (Stone, 1977).
- $\mathrm{IOS}_A$ is the trace of the "ratio" $\hat A(\hat\theta)^{-1}\hat B(\hat\theta)$ of two estimates of $\mathrm{cov}\{\dot l(Y_1; \theta_0)\}$: the observed Fisher information matrix $\hat A(\hat\theta)$, a model-dependent estimate, and $\hat B(\hat\theta)$, the sample covariance of the $\dot l(y_i; \hat\theta)$, a model-free estimate.

Information Matrix Test

White's 1982 (668) information matrix (IM) test is based on a quadratic form in the vector of differences between elements of $\hat B(\hat\theta)$ and $\hat A(\hat\theta)$.

Similarities:
- Both $\mathrm{IOS}_A$ and the IM test look at something important, e.g., should we use a sandwich estimator of $\mathrm{var}(\hat\theta)$, or can we trust the model and use the inverse Fisher information?
- Asymptotic distributions are not very useful for either IOS/$\mathrm{IOS}_A$ or IM. Use the parametric bootstrap instead.
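The sandwich-versus-model-based question can be made concrete in the one-parameter i.i.d. Poisson model, where $\hat A = 1/\bar x$ and $\hat B = \hat m_2/\bar x^2$, with $\hat m_2$ the second central moment (divisor $n$). The ratio of the sandwich variance $\hat A^{-1}\hat B\hat A^{-1}/n$ to the model-based variance $\hat A^{-1}/n$ is then exactly $\mathrm{IOS}_A = \hat B/\hat A$. A Python sketch with hypothetical counts:

```python
def poisson_variance_estimates(xs):
    """Model-based and sandwich variance estimates for the Poisson MLE lambda_hat = xbar.

    Returns (model_var, sandwich_var, ios_a).
    """
    n = len(xs)
    lam = sum(xs) / n
    scores = [x / lam - 1 for x in xs]             # d/dlambda of x log(lambda) - lambda
    A_hat = sum(x / lam ** 2 for x in xs) / n      # average of minus the second derivative; equals 1 / xbar
    B_hat = sum(s * s for s in scores) / n         # average squared score
    model_var = 1 / (n * A_hat)                    # inverse Fisher information: xbar / n
    sandwich_var = B_hat / (n * A_hat ** 2)        # A^{-1} B A^{-1} / n
    ios_a = B_hat / A_hat                          # tr(A^{-1} B) for a scalar parameter
    return model_var, sandwich_var, ios_a


counts = [0, 1, 1, 2, 3, 3, 4, 7, 9, 12]   # hypothetical, visibly overdispersed counts
model_var, sandwich_var, ios_a_value = poisson_variance_estimates(counts)
print(model_var, sandwich_var, ios_a_value)
```

Here $\mathrm{IOS}_A$ is well above 1, so the model-based standard error would understate the sandwich one.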

IOS vs IM

Consistency:
- Is the IM test consistent against any alternative with $A(\theta_0) \ne B(\theta_0)$?
- The form of $\mathrm{IOS}^* = \mathrm{tr}\{A(\theta_0)^{-1}B(\theta_0)\}$ suggests how to look for alternatives that IOS will miss: those with $A(\theta_0) \ne B(\theta_0)$ but $\mathrm{tr}\{A(\theta_0)^{-1}B(\theta_0)\} = p$.

Can I use it this afternoon?
- IM requires a lot of prior analysis/programming for each type of problem, even if we bootstrap.
- IOS is automatic if we bootstrap.
- $\mathrm{IOS}_A$ requires much less analysis than IM.

Example: Hurricane Rainfall (Larsen and Marx, 2001)

- Maximum 24-hour precipitation for 36 hurricanes.
- Fit a gamma distribution ($p = 2$): $\widehat{\text{shape}} = 2.2$, $\widehat{\text{scale}} = 3.3$.
- IOS = 3.6 (the largest observation contributes 1.73).
- Bootstrap p-values were compared for IOS, $\mathrm{IOS}_A$, Anderson-Darling (A-D), and Kolmogorov-Smirnov (K-S), using all data, dropping the largest observation, and dropping the two largest observations.

[Figure: Hurricane Rainfall Data; axis: Max Rainfall (inches)]

Example: Board Stiffness (Johnson and Wichern, 1998)

- Four measurements of stiffness on 30 boards.
- Model: 4-dimensional normal ($p = 14$: 4 means plus 10 distinct covariance entries).
- IOS = 30.7, bootstrap p-value = .002.
- J&W identify two outliers (which contribute 5.1 and 13.5 to IOS). After deleting them: IOS = 27.2, bootstrap p-value = .006.
- For testing the i.i.d. multivariate normal model, IOS with the parametric bootstrap is exact.
- IOS is very much like Mardia's multivariate kurtosis test.
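A NumPy sketch of the exact IOS for the i.i.d. $d$-variate normal model, computed alongside Mardia's kurtosis $b_{2,d}$ on simulated data with $d = 4$ ($p = 14$); the data are not the board-stiffness measurements. A first-order expansion, which is not on the slide, suggests $\mathrm{IOS} \approx (d + b_{2,d})/2$, and under the model $b_{2,d} \approx d(d+2)$, giving $\mathrm{IOS} \approx p$. Because the MLEs are affine equivariant, IOS is affine invariant, so its null distribution does not depend on the mean or covariance:

```python
import numpy as np


def mvn_fit(Y):
    """MLEs of the mean vector and covariance matrix (divisor n)."""
    mean = Y.mean(axis=0)
    D = Y - mean
    return mean, D.T @ D / len(Y)


def mvn_loglik(y, mean, cov):
    """log density of one observation y under N_d(mean, cov)."""
    d = len(mean)
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))


def mvn_ios(Y):
    """Exact IOS by refitting without each row of Y."""
    mean, cov = mvn_fit(Y)
    total = 0.0
    for i in range(len(Y)):
        mean_i, cov_i = mvn_fit(np.delete(Y, i, axis=0))
        total += mvn_loglik(Y[i], mean, cov) - mvn_loglik(Y[i], mean_i, cov_i)
    return float(total)


def mardia_b2(Y):
    """Mardia's b_{2,d}: the mean of m_i^2, m_i = squared Mahalanobis distance of row i."""
    mean, cov = mvn_fit(Y)
    D = Y - mean
    m = np.einsum("ij,jk,ik->i", D, np.linalg.inv(cov), D)
    return float(np.mean(m ** 2))


rng = np.random.default_rng(5)
Y = rng.normal(size=(800, 4))      # simulated 4-variate normal data
ios_val = mvn_ios(Y)
b2 = mardia_b2(Y)
print(round(ios_val, 3), round((4 + b2) / 2, 3))
```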

Example: Leukemia Survival (Feigl and Zelen, 1965)

- Survival times of 33 leukemia patients.
- Predictors: WBC = log of white blood count; AG = a binary factor (17 AG positive, 16 AG negative).
- Both models include the WBC:AG interaction ($p = 5$ for both):
  - Gamma GLM with log link: IOS = 15.7, bootstrap p-value = .03.
  - Linear regression with log(survival) as response: IOS = 7.29, bootstrap p-value = .22.

Leukemia Survival (continued)

Does it matter which model we use?
- The interaction is non-significant in the gamma model and marginally significant in the lognormal model.
- The important scientific finding in the original paper was that the WBC slope for the AG-negative group is not significantly different from zero.

Simulations suggest that .05-level tests maintain nominal size for both models. When testing the lognormal model, power against the gamma model is only about .20. Not great, but maybe not bad for $n = 33$, $p = 5$. Where's the competition?

Example: Horseshoe Crabs (Agresti, 1996)

- Number of satellites counted for 173 nesting females.
- Model: Poisson GLM with log link, carapace width as predictor ($p = 2$).
- Agresti tries Pearson chi-square and deviance tests: he must pool data over ranges of carapace width, and finds no evidence of lack of fit. He later finds other evidence of overdispersion.
- IOS = 5.6, bootstrap p-value ≈ 0.
- Negative binomial model ($p = 3$): IOS = 2.66, p-value = .91.
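For a Poisson GLM with log link the per-observation score is $(y_i - \mu_i)x_i$ and minus the Hessian is $\mu_i x_i x_i^T$, so $\mathrm{IOS}_A$ is cheap to compute. A NumPy sketch on simulated data (not the crab data): under the Poisson model $\mathrm{IOS}_A$ should be near $p = 2$, while negative binomial counts with the same mean inflate it. The Newton-Raphson fit, the covariate, and the negative binomial size $k = 1$ are assumptions for illustration:

```python
import numpy as np


def poisson_glm_fit(X, y, n_iter=25):
    """Poisson regression with log link via Newton-Raphson (no convergence checks; a sketch)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                 # assumes column 0 of X is the intercept
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)                   # total score
        info = X.T @ (X * mu[:, None])          # total observed information
        beta = beta + np.linalg.solve(info, grad)
    return beta


def poisson_glm_ios_a(X, y):
    """IOS_A = tr(A_hat^{-1} B_hat) from the analytic scores and negative Hessians."""
    beta = poisson_glm_fit(X, y)
    mu = np.exp(X @ beta)
    n = len(y)
    A_hat = X.T @ (X * mu[:, None]) / n         # mean of mu_i x_i x_i^T
    S = X * (y - mu)[:, None]                   # row i: score (y_i - mu_i) x_i
    B_hat = S.T @ S / n
    return float(np.trace(np.linalg.solve(A_hat, B_hat)))


rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(-1, 1, size=n)
X = np.column_stack([np.ones(n), x])
mu_true = np.exp(1.0 + 0.5 * x)
y_pois = rng.poisson(mu_true)
k = 1.0                                         # negative binomial size: variance = mu + mu^2 / k
y_nb = rng.negative_binomial(k, k / (k + mu_true))
ia_pois, ia_nb = poisson_glm_ios_a(X, y_pois), poisson_glm_ios_a(X, y_nb)
print(round(ia_pois, 2), round(ia_nb, 2))
```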

Example: Toxicology (Slaton, Piegorsch and Durham, 2000)

- 107 rat litters, 4 dose levels.
- Model: Heckman-Willis (HW), a beta-binomial regression with implicit logit link: $\alpha = \exp(a_0 + a_1 x)$, $\beta = \exp(b_0 + b_1 x)$, $p = 4$. (The mean response $\alpha/(\alpha + \beta)$ has logit $(a_0 - b_0) + (a_1 - b_1)x$.)
- Slaton et al. tested the HW model against a larger ($p = 6$) model and found no evidence against HW.
- The larger model allows the intralitter correlation to vary freely among the 4 dose levels, BUT it still has a beta-binomial response and implicit logit link.

Toxicology (continued)

For the HW model: IOS = 6.34 (bootstrap p-value = .04); $\mathrm{IOS}_A$ = 5.19 (bootstrap p-value = .03).

Further analysis (using IOS and other more standard tests) suggests that:
- the logit link is inappropriate;
- the response at the three lowest doses is adequately modeled as binomial with the same $p$ for all three doses;
- the response at the highest dose is NOT adequately modeled by the beta-binomial.

Toxicology (continued)

[Figure: Overall Proportion Dead vs. Dose]

Conclusions

IOS is:
- computer intensive ($\mathrm{IOS}_A$ less so);
- automatic and easy to employ;
- applicable to a variety of problems, where sometimes IM is the only obvious competitor.

Other things to work on:
- dependent data (time series, spatial data);
- censored data;
- models without a fully specified likelihood.

References

AGRESTI, A. (1996). An Introduction to Categorical Data Analysis. Wiley, New York.

FEIGL, P. and ZELEN, M. (1965). Estimation of exponential survival probabilities with concomitant information. Biometrics.

FISHER, R. A. (1973). Statistical Methods for Research Workers. 14th ed. Hafner, New York.

GEISSER, S. (1990). Predictive approaches to discordancy testing. In Bayesian and Likelihood Methods in Statistics and Econometrics: Essays in Honor of George A. Barnard (S. Geisser, J. S. Hodges, S. J. Press and A. Zellner, eds.). North-Holland Publishing Co., Amsterdam.

GEISSER, S. and EDDY, W. F. (1979). A predictive approach to model selection. J. Am. Statist. Ass.

JOHNSON, R. A. and WICHERN, D. W. (1998). Applied Multivariate Statistical Analysis. 4th ed. Prentice Hall, Upper Saddle River, NJ.

LARSEN, R. J. and MARX, M. L. (2001). An Introduction to Mathematical Statistics and Its Applications. 3rd ed. Prentice-Hall Inc., Englewood Cliffs, NJ.

LINHART, H. and ZUCCHINI, W. (1986). Model Selection. Wiley, New York.

SLATON, T. L., PIEGORSCH, W. W. and DURHAM, S. D. (2000). Estimation and testing with overdispersed proportions using the beta-logistic regression model of Heckman and Willis. Biometrics.

STONE, M. (1977). An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion. J. R. Statist. Soc. B.

WHITE, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica.


More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. Linear-in-Parameters Models: IV versus Control Functions 2. Correlated

More information

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i, A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

Generalized linear models

Generalized linear models Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis ESP 178 Applied Research Methods 2/23: Quantitative Analysis Data Preparation Data coding create codebook that defines each variable, its response scale, how it was coded Data entry for mail surveys and

More information

11. Generalized Linear Models: An Introduction

11. Generalized Linear Models: An Introduction Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and

More information

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c Inference About the Slope ffl As with all estimates, ^fi1 subject to sampling var ffl Because Y jx _ Normal, the estimate ^fi1 _ Normal A linear combination of indep Normals is Normal Simple Linear Regression

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

Investigating Models with Two or Three Categories

Investigating Models with Two or Three Categories Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might

More information

Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal

Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal Hypothesis testing, part 2 With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal 1 CATEGORICAL IV, NUMERIC DV 2 Independent samples, one IV # Conditions Normal/Parametric Non-parametric

More information

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007

More information

Model Fitting and Model Selection. Model Fitting in Astronomy. Model Selection in Astronomy.

Model Fitting and Model Selection. Model Fitting in Astronomy. Model Selection in Astronomy. Model Fitting and Model Selection G. Jogesh Babu Penn State University http://www.stat.psu.edu/ babu http://astrostatistics.psu.edu Model Fitting Non-linear regression Density (shape) estimation Parameter

More information

Investigation of goodness-of-fit test statistic distributions by random censored samples

Investigation of goodness-of-fit test statistic distributions by random censored samples d samples Investigation of goodness-of-fit test statistic distributions by random censored samples Novosibirsk State Technical University November 22, 2010 d samples Outline 1 Nonparametric goodness-of-fit

More information

Generalized Estimating Equations

Generalized Estimating Equations Outline Review of Generalized Linear Models (GLM) Generalized Linear Model Exponential Family Components of GLM MLE for GLM, Iterative Weighted Least Squares Measuring Goodness of Fit - Deviance and Pearson

More information

Spline Density Estimation and Inference with Model-Based Penalities

Spline Density Estimation and Inference with Model-Based Penalities Spline Density Estimation and Inference with Model-Based Penalities December 7, 016 Abstract In this paper we propose model-based penalties for smoothing spline density estimation and inference. These

More information

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T.

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T. Exam 3 Review Suppose that X i = x =(x 1,, x k ) T is observed and that Y i X i = x i independent Binomial(n i,π(x i )) for i =1,, N where ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T x) This is called the

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Akaike Information Criterion

Akaike Information Criterion Akaike Information Criterion Shuhua Hu Center for Research in Scientific Computation North Carolina State University Raleigh, NC February 7, 2012-1- background Background Model statistical model: Y j =

More information

36-463/663: Multilevel & Hierarchical Models

36-463/663: Multilevel & Hierarchical Models 36-463/663: Multilevel & Hierarchical Models (P)review: in-class midterm Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 In-class midterm Closed book, closed notes, closed electronics (otherwise I have

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models

The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population Health

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Categorical data analysis Chapter 5

Categorical data analysis Chapter 5 Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases

More information

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori On Properties of QIC in Generalized Estimating Equations Shinpei Imori Graduate School of Engineering Science, Osaka University 1-3 Machikaneyama-cho, Toyonaka, Osaka 560-8531, Japan E-mail: imori.stat@gmail.com

More information

Introduction to Generalized Models

Introduction to Generalized Models Introduction to Generalized Models Today s topics: The big picture of generalized models Review of maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical

More information

COMPLEMENTARY LOG-LOG MODEL

COMPLEMENTARY LOG-LOG MODEL COMPLEMENTARY LOG-LOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementary-log-log model. They all follow the same form π ( x) =Φ ( α

More information

Estimating prediction error in mixed models

Estimating prediction error in mixed models Estimating prediction error in mixed models benjamin saefken, thomas kneib georg-august university goettingen sonja greven ludwig-maximilians-university munich 1 / 12 GLMM - Generalized linear mixed models

More information

Variability within multi-component systems. Bayesian inference in probabilistic risk assessment The current state of the art

Variability within multi-component systems. Bayesian inference in probabilistic risk assessment The current state of the art PhD seminar series Probabilistics in Engineering : g Bayesian networks and Bayesian hierarchical analysis in engeering g Conducted by Prof. Dr. Maes, Prof. Dr. Faber and Dr. Nishijima Variability within

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

Answer Keys to Homework#10

Answer Keys to Homework#10 Answer Keys to Homework#10 Problem 1 Use either restricted or unrestricted mixed models. Problem 2 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

Bayesian Model Diagnostics and Checking

Bayesian Model Diagnostics and Checking Earvin Balderama Quantitative Ecology Lab Department of Forestry and Environmental Resources North Carolina State University April 12, 2013 1 / 34 Introduction MCMCMC 2 / 34 Introduction MCMCMC Steps in

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Discrete Dependent Variable Models

Discrete Dependent Variable Models Discrete Dependent Variable Models James J. Heckman University of Chicago This draft, April 10, 2006 Here s the general approach of this lecture: Economic model Decision rule (e.g. utility maximization)

More information

Large Sample Properties of Estimators in the Classical Linear Regression Model

Large Sample Properties of Estimators in the Classical Linear Regression Model Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

REVISED PAGE PROOFS. Logistic Regression. Basic Ideas. Fundamental Data Analysis. bsa350

REVISED PAGE PROOFS. Logistic Regression. Basic Ideas. Fundamental Data Analysis. bsa350 bsa347 Logistic Regression Logistic regression is a method for predicting the outcomes of either-or trials. Either-or trials occur frequently in research. A person responds appropriately to a drug or does

More information