Model Checking and Improvement

Statistics 220, Spring 2005
Copyright © 2005 by Mark E. Irwin

Model Checking

"All models are wrong but some models are useful" (George E. P. Box)

So far we have looked at a number of models and examined them with example data sets. Do the models used accurately describe the data? In standard analyses, we often check model assumptions. For example, in standard regression we check for:

- Correct form of the regression function (e.g. linear vs. quadratic)
- Constant variance of the residuals
- Independence of the residuals
- Normality of the residuals

Basic question: How sensitive are our posterior inferences to our modelling assumptions?

Rat Example: Will the following models give significantly different answers about the tumor rates in each group?

1. Original model

Data model: $y_i$ = number of tumors in group $i$,
$$y_i \mid \theta_i \stackrel{\text{ind}}{\sim} \text{Bin}(n_i, \theta_i), \quad i = 1, \ldots, 71$$

Process model: $\theta_i$ = tumor rate in group $i$,
$$\theta_i \mid \alpha, \beta \stackrel{\text{ind}}{\sim} \text{Beta}(\alpha, \beta)$$

Parameter model:
$$p(\alpha, \beta) \propto \frac{1}{(\alpha + \beta)^{5/2}}$$

2. Alternative model 1

Data model: $y_i$ = number of tumors in group $i$,
$$y_i \mid \theta_i \stackrel{\text{ind}}{\sim} \text{Bin}(n_i, \theta_i), \quad i = 1, \ldots, 71$$

Process model: $\theta_i$ = tumor rate in group $i$,
$$\text{logit}(\theta_i) \mid \mu, \sigma^2 \stackrel{\text{ind}}{\sim} N(\mu, \sigma^2)$$

Parameter model:
$$p(\mu, \sigma^2) \propto \frac{1}{\sigma^2}$$

3. Alternative model 2

Data model: $y_i$ = number of tumors in group $i$,
$$y_i \mid \alpha_i, \beta_i \stackrel{\text{ind}}{\sim} \text{Beta-bin}(n_i, \alpha_i, \beta_i), \quad i = 1, \ldots, 71$$

Process model: $(\alpha_i, \beta_i)$ = tumor rate parameters in group $i$,
$$\alpha_i, \beta_i \mid \gamma_\alpha, \delta_\alpha, \gamma_\beta, \delta_\beta \stackrel{\text{ind}}{\sim} \text{Gamma}(\alpha_i \mid \gamma_\alpha, \delta_\alpha)\, \text{Gamma}(\beta_i \mid \gamma_\beta, \delta_\beta)$$

The tumor rate for group $i$ is
$$E\left[\frac{y_i}{n_i} \,\Big|\, \alpha_i, \beta_i\right] = \frac{\alpha_i}{\alpha_i + \beta_i}$$

Parameter model:
$$p(\gamma_\alpha, \delta_\alpha, \gamma_\beta, \delta_\beta) \propto 1$$

Note that we will not be trying to answer the question of whether our model is correct. It's not (see Box). We are interested in whether the inaccuracies matter.

Examples you may have seen in the past where deviations from assumptions don't hurt much (at least in big samples):

- $t$-test of $H_0: \mu = \mu_0$ vs $H_A: \mu \ne \mu_0$: normality often isn't important, though large skewness can hurt.

- Linear regression: $Y = X\beta + \epsilon$
  - $\hat{\beta} = (X^T X)^{-1} X^T Y$ is unbiased if $E[\epsilon] = 0$.
  - $\hat{\beta}$ is the minimum variance linear unbiased estimator if $E[\epsilon] = 0$ and the errors have constant variance (Gauss-Markov theorem).

  Neither of these results requires normality of $\epsilon$.

There are cases where assumptions can matter. For example, consider the $F$-test for examining $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_A: \sigma_1^2 \ne \sigma_2^2$. The results of this test can be highly dependent on the iid normal assumption for each group.

One approach is to build a super-model that contains all of our models of interest as special cases. This approach usually isn't taken, as it is usually difficult to build such a super-model, and computation is usually infeasible even when the model can be built.

Instead we will base these checks on the posterior predictive distribution: does our data look like our fitted model says it should? This can be done by either

- External validation: future data are compared with the posterior predictive distribution.
- Internal validation: observed data are compared with the posterior predictive distribution.

Posterior Predictive Checking

Idea: If the model fits, replicated data generated under the model should look similar to the observed data. If we see some discrepancy, is it due to model misspecification or due to chance?

Approach: Generate $L$ datasets $y^{rep}_1, \ldots, y^{rep}_L$ from the posterior predictive distribution $p(y^{rep} \mid y)$.

$y^{rep}$ corresponds to replicated data, so any covariates that are conditioned on in the original data must be kept the same. For example, in the rat tumor example, we need to use the same group sample sizes as in the original data set (see the sketch below).

$\tilde{y}$ represents any future outcome, whereas $y^{rep}$ indicates a replication exactly like the observed $y$; $\tilde{y}$ does not need to have the same covariate structure as the original data.
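As a concrete sketch of this simulation step for the rat tumor example, the Python code below draws one replicated dataset per posterior draw while holding the original group sizes fixed. The names `theta_draws` and `n` are hypothetical; the posterior draws are assumed to already exist from fitting the model.

```python
import numpy as np

rng = np.random.default_rng(1)

def replicate_datasets(theta_draws, n, rng):
    """Draw one replicated dataset per posterior draw of theta.

    theta_draws : (L, 71) posterior samples of the group tumor rates
                  (assumed already simulated from p(theta | y)).
    n           : (71,) ORIGINAL group sample sizes, reused so the
                  replicates match the observed design exactly.
    Returns an (L, 71) array; row l is y_rep_l with
    y_rep_l[i] ~ Bin(n[i], theta_draws[l, i])."""
    L = theta_draws.shape[0]
    return rng.binomial(n, theta_draws, size=(L, n.size))
```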

The approach has a similar feel to hypothesis testing, in that a test statistic $T(y, \theta)$ needs to be defined to measure the discrepancy between the data and the predictive simulations. Note that the test statistic can depend on both the data $y$ and the parameters and hyperparameters $\theta$, which differs from standard hypothesis testing, where the test statistic depends only on the data.

Tail-area probabilities

The lack of fit of the data relative to the posterior predictive distribution can be measured by a tail-area probability (i.e. a p-value) of the test statistic $T(y, \theta)$. To calculate this probability we will use the replicates sampled from $p(y^{rep} \mid y)$.

Classical p-value:
$$p_C = P[T(y^{rep}) \ge T(y) \mid \theta]$$
where the probability is calculated over the distribution of $y^{rep}$ given a fixed $\theta$. In the classical testing setting, $\theta$ would correspond to the null hypothesis value. It could also be a point estimate (say the MLE).

Posterior predictive p-values

To evaluate the fit of a Bayesian model, we need to consider what possible data sets are plausible under the model. When doing this we need to consider not only the observations $y$, but also the parameter values $\theta$. Thus the p-value of interest is
$$p_B = P[T(y^{rep}, \theta) \ge T(y, \theta) \mid y] = \iint I\big(T(y^{rep}, \theta) \ge T(y, \theta)\big)\, p(y^{rep} \mid \theta)\, p(\theta \mid y)\, dy^{rep}\, d\theta$$

Usually we can't calculate the Bayesian p-value exactly, but we can do it by simulation. Suppose that we have $L$ simulations $\theta_1, \ldots, \theta_L$ from the posterior distribution $p(\theta \mid y)$. Then for each of these $\theta$ samples, generate one sample $y^{rep,l}$ from $p(y^{rep} \mid \theta_l)$.

We want to compare each of the $T(y^{rep,l}, \theta_l)$ with $T(y, \theta_l)$. Then
$$\hat{p}_B = \frac{1}{L} \sum_{l=1}^{L} I\big(T(y^{rep,l}, \theta_l) \ge T(y, \theta_l)\big)$$
(i.e. the proportion of samples where $T(y^{rep,l}, \theta_l) \ge T(y, \theta_l)$) is an estimate of $p_B$.

Note that the test statistic $T(y, \theta)$ needs to be chosen to investigate deviations of interest. This is similar to choosing a powerful test statistic when conducting a hypothesis test. For example, in the analysis of Newcomb's speed of light experiment discussed in the text, a worry was the effect of outliers. Thus $T(y, \theta)$ needs to be chosen to focus on this issue.
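A minimal sketch of this estimate in Python (hypothetical names; `T` is whatever discrepancy function is chosen, and `y_rep[l]` and `theta_draws[l]` must come from the same posterior draw `l`):

```python
import numpy as np

def posterior_predictive_pvalue(T, y, y_rep, theta_draws):
    """Estimate p_B as the proportion of posterior draws with
    T(y_rep_l, theta_l) >= T(y, theta_l)."""
    hits = [T(y_rep[l], theta_draws[l]) >= T(y, theta_draws[l])
            for l in range(len(theta_draws))]
    return np.mean(hits)
```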

In the book, $T(y, \theta) = \min_i y_i$ was used (they were worried about low outliers). Another possibility would be $T(y, \theta) = \max_i |y_i - \mu|$ (i.e. the biggest residual in magnitude). This would be appropriate if the worry was either big positive or big negative residuals, which might occur if in fact $y_i \mid \mu, \sigma^2 \sim t_\nu(\mu, \sigma^2)$ instead of the $y_i \mid \mu, \sigma^2 \sim N(\mu, \sigma^2)$ used in the analysis.
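In code, these two discrepancies could look like the following sketch (an assumption of this sketch: each posterior draw $\theta_l$ is stored as a `(mu, sigma2)` pair):

```python
import numpy as np

def T_min(y, theta):
    """Sensitive to low outliers (the statistic used in the book)."""
    return np.min(y)

def T_absmax(y, theta):
    """Sensitive to large residuals in either direction."""
    mu, sigma2 = theta   # one posterior draw of (mu, sigma^2)
    return np.max(np.abs(y - mu))
```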

While the approach is a bit more focused on test statistics, it has a similar feel to residual analysis in regression:

- Is there any pattern in the residual plot ($e_i$ vs $\hat{y}_i$)?
- Plotting $e_i$ vs $e_{i-1}$, or the Durbin-Watson test, to examine whether residuals are correlated over time.
- Normal scores plot or Anderson-Darling test for normality of residuals.

For example, if we see some curvature (but constant variance) in the residual plot, it suggests we might be missing an $x^2$ term in the model. If there is some curvature and non-constant variance in the residual plot, maybe we need to transform $y$.

Examples: For the two random effects model examples (detergent filling and sodium level in beer), two concerns might be

1. Conditional normality of the observations (e.g. $y_{ij} \sim N(\theta_j, \sigma^2)$)
2. Constant variance of observations within each group

Note that these are probably of limited concern in both of these examples, as the total sample size is fairly large and there are equal numbers of observations in each group for both data sets.

Possible test statistics to evaluate these are

1. Normality: Let $e_{ij} = y_{ij} - \theta_j$ and let $e_{(1)} \le e_{(2)} \le \cdots \le e_{(n)}$ be the ordered residuals. Let
$$T(y, \theta) = \text{Corr}\left( e_{(i)}, \Phi^{-1}\!\left(\frac{i}{n+1}\right) \right)$$
(i.e. the correlation of the points in a normal scores plot). If the data are conditionally normal, this correlation should be close to one. Otherwise the normal scores plot will have some non-linearity, which will pull this correlation down from one. (This test statistic has the feel of the Shapiro-Wilk normality test.)
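A sketch of this statistic in Python, under the simplifying assumption that the data are balanced and stored with one column per group, so `y - theta` subtracts each group's $\theta_j$ from its column:

```python
import numpy as np
from scipy.stats import norm

def T_normal_scores(y, theta):
    """Correlation of the ordered residuals e_ij = y_ij - theta_j
    with the normal quantiles Phi^{-1}(i / (n + 1)).

    y     : (n_per_group, J) array, one column per group (balanced).
    theta : (J,) group means from one posterior draw."""
    e = np.sort((y - theta).ravel())             # pooled, ordered residuals
    n = e.size
    q = norm.ppf(np.arange(1, n + 1) / (n + 1))  # expected normal scores
    return np.corrcoef(e, q)[0, 1]
```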

2. Equal variance: Let $s_i^2$ be the sample variance of the observations in group $i$. If the constant variance assumption is reasonable,
$$T(y, \theta) = \frac{\max_i s_i^2}{\min_i s_i^2}$$
should not be much bigger than one.
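And a matching sketch for the variance-ratio discrepancy (same assumed data layout as above):

```python
import numpy as np

def T_variance_ratio(y, theta):
    """Ratio of the largest to the smallest group sample variance.
    y is (n_per_group, J), one column per group."""
    s2 = y.var(axis=0, ddof=1)   # sample variance within each group
    return s2.max() / s2.min()
```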

Detergent example:

[Figure: side-by-side boxplots of fill volume (roughly 31.8 to 32.6) for machines 1 through 6.]

[Figure: posterior predictive histograms of the normal scores correlation and of the variance ratio for the detergent data.]

Normality test: $\hat{p}_B = 0.4886$
Equal variance test: $\hat{p}_B = 0.9866$

Beer example:

[Figure: side-by-side boxplots of sodium level (roughly 10 to 25) for beers 1 through 6.]

[Figure: posterior predictive histograms of the normal scores correlation and of the variance ratio for the beer data.]

Normality test: $\hat{p}_B = 0.5270$
Equal variance test: $\hat{p}_B = 0.3520$