Generalized Cp (GCp) in a Model Lean Framework

Generalized Cp (GCp) in a Model Lean Framework
Linda Zhao, University of Pennsylvania
Dedicated to Lawrence Brown (1940-2018)
September 9th, 2018, WHOA 3
Joint work with Larry Brown, Juhui Cai, Arun Kumar Kuchibhotla, and the Wharton Team: Richard Berk, Andreas Buja, Ed George, Weijie Su

Table of Contents

1 Introduction: Conventional Linear Model; Assumption Lean Framework
2 OLS and Predictive Risk under the Model Lean Framework
3 Generalized C_p (GC_p): Definition; Properties; An alternative: GC_p^boot
4 Distribution of the Difference in GC_p's
5 Simulations
6 Summary and Ongoing Research


Conventional Linear Model

The conventional linear model assumes

Y = Xβ + ε,  (1)

where
- Y (N x 1) is the response vector
- X (N x r) contains the r predictors
- β (r x 1) is the vector of parameters
- ε ~ N(0, σ² I_{N x N})

Linear Model Violation

OFTEN, the model assumptions may not hold:
- Nonlinearity
- Heteroscedasticity
- Missing important variables

Assumption Lean Setup

We proceed without many of the restrictions:
- Without assuming a well-specified linear model
- To include a random design
- Without homoscedasticity

Assumption Lean Setup: Well-defined β

Assumption Lean Framework:
- Observe a sample (X_i, Y_i) with X_i ∈ IR^r, (X_i, Y_i) iid ~ F
- No assumptions about F, other than the existence of low-order moments
- A well-defined parameter β:

β = argmin_b E_F[(Y - X'b)²] = [E(XX')]^{-1} E[XY].  (2)
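To make the projection interpretation of eq. (2) concrete, here is a minimal numpy sketch (not from the talk; the data-generating process, seed, and sample size are invented for illustration). It approximates β by plugging sample moments from a large i.i.d. draw into [E(XX')]^{-1}E[XY], even though E(Y | X) is nonlinear and the errors are heteroscedastic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000                                            # large sample to approximate population moments
x = rng.uniform(-1, 1, size=n)
X = np.column_stack([np.ones(n), x])                   # intercept plus one predictor
Y = np.sin(2 * x) + rng.normal(scale=0.5 + 0.5 * np.abs(x), size=n)  # nonlinear, heteroscedastic

# sample analogue of beta = [E(XX')]^{-1} E[XY], eq. (2)
beta = np.linalg.solve(X.T @ X / n, X.T @ Y / n)
print(beta)                                            # best linear approximation to Y given X
```

The fitted β here is a well-defined population quantity (up to Monte Carlo error), even though no linear model holds.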

Interpretation of the β

β = argmin_b E_F[(Y - X'b)²] = [E(XX')]^{-1} E[XY].

- It is a statistical functional
- The best linear approximation, or
- The best linear prediction, or
- The linear portion in a semi-parametric model
- It has the same meaning as in the linear model when all the usual assumptions hold

See Buja et al. (2014, 2016).

β_p of a Sub-Model

Let M_p be a sub-model where
- M_p = {i_1, i_2, ..., i_p} ⊆ {1, ..., r}, p ≤ r
- X_p contains only (x_{i_1}, x_{i_2}, ..., x_{i_p})

β_p = argmin_b E_F[(Y - X_p'b)²] = [E(X_p X_p')]^{-1} E[X_p Y]

Note:
- For simplicity, the submodel subscript will be dropped when unnecessary
- β_p is defined within M_p


Model Lean Framework: OLS Estimate β̂

Given data X and Y in the usual sample-matrix form, the natural estimate of β is the least squares estimate

β̂ = (X'X)^{-1} X'Y.  (3)

Goals:
1 Properties of the OLS estimator β̂
2 A criterion to choose a good submodel
3 Properties of the criterion

Properties of OLS

Asymptotic sandwich formula:

√n (β̂ - β) →_d N(0, Σ_sand),  (4)

where

Σ_sand = [E(XX')]^{-1} E[XX'(Y - X'β)²] [E(XX')]^{-1}.

See White, Halbert (1980 a, b).

The Sandwich Estimator

A simple (and rather naïve) plug-in yields the sandwich estimator

Σ̂_sand = [n^{-1} X'X]^{-1} { n^{-1} Σ_i ρ̂_i² X_i X_i' } [n^{-1} X'X]^{-1},  (5)

where ρ̂ = Y - Xβ̂.

See White, Halbert (1980 a, b).
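A minimal numpy sketch of the plug-in formula (5); the function name and interface are illustrative, not from the paper.

```python
import numpy as np

def sandwich_estimator(X, Y):
    """Plug-in sandwich estimator of Sigma_sand, following eq. (5)."""
    n = X.shape[0]
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)     # OLS, eq. (3)
    rho_hat = Y - X @ beta_hat                       # residuals
    bread = np.linalg.inv(X.T @ X / n)               # [n^{-1} X'X]^{-1}
    meat = (X * rho_hat[:, None] ** 2).T @ X / n     # n^{-1} sum_i rho_i^2 X_i X_i'
    return bread @ meat @ bread
```

The explicit bread-meat-bread product mirrors eq. (5); in practice one might prefer linear solves over the explicit inverse for numerical stability.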

The Sandwich Estimator

Theorem (Kuchibhotla et al. 2018)
Under mild assumptions, the sandwich estimator Σ̂_sand is a consistent estimator of Σ_sand, i.e., Σ̂_sand →_P Σ_sand. Moreover, Σ̂_sand is a semi-parametrically efficient estimator of Σ_sand.

Model Lean Framework: Predictive Risk

Contemplate a future observation (X, Y) ~ F. For any submodel M_p, the predictive risk of the LS estimator is

R_p ≡ E_F[(Y - X_p'β̂_p)²].  (6)

We next:
- Propose a good estimator GC_p for R_p
- Study the properties of GC_p
- Derive the distribution of the GC_p difference
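Since R_p averages over both the training sample (through β̂_p) and an independent future draw from F, it can be approximated by brute-force Monte Carlo. A rough sketch, assuming a hypothetical sampler draw_xy(rng, n) that returns an i.i.d. sample from F; all names and the replication sizes are illustrative.

```python
import numpy as np

def predictive_risk(draw_xy, cols, n, reps=2000, n_test=20_000, seed=0):
    """Monte Carlo approximation of R_p = E_F[(Y - X_p' beta_hat_p)^2] for submodel `cols`."""
    rng = np.random.default_rng(seed)
    risk = 0.0
    for _ in range(reps):
        X, Y = draw_xy(rng, n)                             # training sample of size n
        beta_p = np.linalg.lstsq(X[:, cols], Y, rcond=None)[0]
        Xt, Yt = draw_xy(rng, n_test)                      # fresh draws approximate E_F
        risk += np.mean((Yt - Xt[:, cols] @ beta_p) ** 2)
    return risk / reps
```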


Estimation of Predictive Risk: GC_p

Define the Generalized C_p (GC_p) as follows:

GC_p = n^{-1} SSE + 2 n^{-1} ξ̂²,  (7)

where

SSE = ||Y - Xβ̂||²,  (8a)
ξ̂² = tr( (X'X/n)^{-1} (X'D_r²X/n) ),  (8b)

and D_r² is the diagonal matrix with D²_{r,ii} = (Y_i - X_i'β̂)².
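A direct numpy transcription of eqs. (7)-(8) might look as follows (a sketch; the function name is illustrative).

```python
import numpy as np

def generalized_cp(X, Y):
    """GC_p for the submodel whose design matrix is X, following eqs. (7)-(8)."""
    n = X.shape[0]
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ beta_hat
    sse = np.sum(resid ** 2)                                       # eq. (8a)
    # eq. (8b): xi_hat^2 = tr( (X'X/n)^{-1} (X' D_r^2 X / n) ), D_r^2 = diag(resid^2)
    xi2 = np.trace(np.linalg.solve(X.T @ X / n, (X * resid[:, None] ** 2).T @ X / n))
    return sse / n + 2 * xi2 / n                                   # eq. (7)
```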

Properties of GC_p

Theorem I
GC_p is a consistent estimator of the predictive risk R, i.e., GC_p →_P R.

Remark: The theorems hold under mild assumptions, such as the existence of moments.

GC_p: Derivation

R ≡ E[(Y - X'β̂)²]
  = E[(Y - X'β)²] + E[(X'(β̂ - β))²]
  ≈ E[(Y - X'β)²] + n^{-1} E[X'Σ_sand X]                              (Sandwich)
  ≈ n^{-1} ||Y - Xβ||² + n^{-1} tr(Σ̂_sand X'X/n)                      (Empirical moment)
  ≈ n^{-1} ||Y - Xβ̂||² + n^{-1} tr(Σ̂_sand X'X/n) + n^{-1} tr(Σ̂_sand X'X/n)
  = n^{-1} ||Y - Xβ̂||² + 2 n^{-1} tr( (X'X/n)^{-1} (X'D_r²X/n) )
  = n^{-1} ||Y - Xβ̂||² + 2 n^{-1} ξ̂²
  ≡ GC_p.

GC_p^boot: GC_p through the Bootstrap

An alternative estimator is obtained through the M-of-N bootstrap:

GC_p^boot ≡ n^{-1} ||Y - Xβ̂||² + 2 tr(n^{-1} X'X Σ̂_boot),  (9)

where

Σ̂_boot = (1/n_boot) Σ_{i=1}^{n_boot} (β̂_i^bt - β̂)(β̂_i^bt - β̂)'

and β̂_i^bt is an M-of-N bootstrap OLS estimator.
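A sketch of eq. (9) using the pairs bootstrap. For simplicity it defaults to resampling m = n pairs; the M-of-N variant in the talk resamples m rows, and the choices of m and n_boot here are assumptions for illustration only.

```python
import numpy as np

def gcp_boot(X, Y, m=None, n_boot=1000, seed=0):
    """Bootstrap version GC_p^boot of eq. (9); resamples m of the n (X_i, Y_i) pairs.

    Defaults to m = n (ordinary pairs bootstrap); the appropriate m for the
    M-of-N version is not specified here.
    """
    n, p = X.shape
    m = n if m is None else m
    rng = np.random.default_rng(seed)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ beta_hat
    betas = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=m)                  # resample pairs with replacement
        betas[b] = np.linalg.lstsq(X[idx], Y[idx], rcond=None)[0]
    sigma_boot = (betas - beta_hat).T @ (betas - beta_hat) / n_boot
    return np.sum(resid ** 2) / n + 2 * np.trace(X.T @ X @ sigma_boot) / n
```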

GC_p and GC_p^boot

To compare GC_p and GC_p^boot we have

Theorem II
GC_p is the limit of the M-of-N bootstrap GC_p^boot as M → ∞ for a fixed sample of size n, i.e.,

lim_{M→∞} GC_p^boot = GC_p.

Note: for a fixed n and finite M, GC_p and GC_p^boot are different.

Remark: GC_p and Mallows' C_p

Mallows' version for a sub-model of size p is

C_p = SSE_p / σ̂_r² - n + 2p.  (10)

C_p^U, an alternate form of C_p:

C_p^U = n^{-1} SSE_p + 2 n^{-1} p σ̂_r².  (11)

GC_p and Mallows' C_p are very different!
- Mallows' C_p is for a fixed design, and all the related results hold only under the strict linear model assumptions.
- Comparisons and examples are presented in our paper.
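For comparison, a sketch of the alternate form C_p^U in eq. (11), assuming σ̂_r² is the usual residual-variance estimate from the full r-predictor model (the slide leaves its definition implicit); names are illustrative.

```python
import numpy as np

def mallows_cp_u(X_full, cols, Y):
    """C_p^U = n^{-1} SSE_p + 2 n^{-1} p sigma_hat_r^2, eq. (11).

    sigma_hat_r^2 is taken to be the residual variance of the full model
    (an assumption, not spelled out on the slide).
    """
    n, r = X_full.shape
    p = len(cols)
    resid_full = Y - X_full @ np.linalg.lstsq(X_full, Y, rcond=None)[0]
    sigma2_r = np.sum(resid_full ** 2) / (n - r)
    resid_p = Y - X_full[:, cols] @ np.linalg.lstsq(X_full[:, cols], Y, rcond=None)[0]
    return np.sum(resid_p ** 2) / n + 2 * p * sigma2_r / n
```

Unlike GC_p, the penalty here uses a single full-model variance estimate rather than the residual-weighted sandwich term, which is why the two criteria can diverge under misspecification or heteroscedasticity.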


Comparison of Sub-Models

For simplicity, let M_p ⊂ M_{p+q} be two nested sub-models, where
- M_p = {1, ..., p} with β^p = (β_1^p, ..., β_p^p)
- M_{p+q} = {1, ..., p+q} with β^{p+q} = (β_1^{p+q}, ..., β_{p+q}^{p+q})

Goal: choose the model with min{R_p, R_{p+q}}.

Question: how good is the decision based on ∆ = GC_{p+q} - GC_p?

Contiguity Setup

Question: how good is the decision based on ∆ = GC_{p+q} - GC_p?

Decisions based on ∆ work well when the predictive risks of the two nested submodels are well separated, i.e., R_p - R_{p+q} = O(1).

The problem of interest is when the predictive risks of the two nested submodels are close, i.e., under the contiguity condition R_p - R_{p+q} = O(1/n).

Distribution of ∆ = GC_{p+q} - GC_p

WLOG assume all the X_i's are in their canonical form, i.e., E(X_i) = 0, E(X_i X_j) = 0 for i ≠ j, and E(X_i²) = 1. Consider two nested models M_p ⊂ M_{p+q}.

Theorem III
Under the contiguous setting, i.e., R_{p+q} - R_p = O(1/n), consider the two nested models M_p ⊂ M_{p+q} and assume the canonical conditions for X. Then

n (GC_{p+q} - GC_p) →_d c_1 ||Z||² + c_2

in distribution.

Note: Z follows a multivariate normal distribution and G denotes the CDF of ||Z||².

Distribution of ∆ = GC_{p+q} - GC_p

As a special case of Theorem III, we have the following

Corollary
In addition to the canonical form, assume
- the full model is well-specified, and
- homoscedasticity, i.e., Var(Y_i | X_i) = σ² = 1.

Then

n (GC_{p+q} - GC_p) →_d -χ²_q( n ||β_{[p+1,...,p+q]}||² ) + 2q

and

n (R_{p+q} - R_p) = q - n ||β_{[p+1,...,p+q]}||².

P(GC_{p+q} - GC_p < 0) as a function of ρ and q

[Figure: probability curves plotted against ρ = n ||β_{[p+1,...,p+q]}||² / q]

- ρ ≥ 1 corresponds to R_{p+q} ≤ R_p
- ρ = 0 corresponds to ||β_{[p+1,...,p+q]}||² = 0, i.e., M_p = M_{p+q}

P(Choosing the model with smaller R)

[Figure: probability of choosing the model with the smaller predictive risk, plotted against ρ = n ||β_{[p+1,...,p+q]}||² / q; as above, ρ ≥ 1 corresponds to R_{p+q} ≤ R_p and ρ = 0 to M_p = M_{p+q}]


Set-up

- Sample size n = 100, 1000
- X_1 = 1, X_i = √2 cos(π(i-1)U), i = 2, ..., m+1 = r, where U ~ i.i.d. Unif(-1, 1)
- The design is in canonical form: E(X_i²) = 1 for i = 1, ..., r and E(X_i X_j) = 0 for i ≠ j
- β_p² = [ E(X_p² σ²(X)) + ||β_{[>p]}||² ] / (n + p - 1)
- Y = X'β + ε, where ε ~ i.i.d. N(0, 1)
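A minimal sketch of this cosine design (the Unif(-1, 1) support is read off the slide and should be treated as an assumption); the empirical second-moment matrix confirms the canonical form.

```python
import numpy as np

def cosine_design(rng, n, r):
    """Draw the canonical cosine design: X_1 = 1, X_i = sqrt(2) cos(pi (i-1) U)."""
    U = rng.uniform(-1.0, 1.0, size=n)
    cols = [np.ones(n)] + [np.sqrt(2) * np.cos(np.pi * i * U) for i in range(1, r)]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = cosine_design(rng, n=100_000, r=6)
print(np.round(X.T @ X / X.shape[0], 2))   # approximately the identity: canonical form
```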

Goal

For each of the models M_1, M_2, ..., M_r, compare the predictive risk R_{M_i} with the Monte Carlo average of GC_p over 10,000 replications:

R_{M_i} ≈ Σ_{j=1}^{10,000} GC_{p,M_i}^{(j)} / 10,000
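A self-contained sketch of this comparison, reusing the GC_p formula of eqs. (7)-(8) and the cosine design above; the coefficient vector and the reduced replication sizes are arbitrary choices for illustration, not the settings used in the talk.

```python
import numpy as np

def gcp(X, Y):
    """GC_p for design X, eqs. (7)-(8); also returns the OLS fit used to estimate risk."""
    n = X.shape[0]
    beta = np.linalg.solve(X.T @ X, X.T @ Y)
    res = Y - X @ beta
    xi2 = np.trace(np.linalg.solve(X.T @ X / n, (X * res[:, None] ** 2).T @ X / n))
    return np.sum(res ** 2) / n + 2 * xi2 / n, beta

def draw(rng, n, r, beta):
    """Cosine design of the previous slide with eps ~ N(0, 1)."""
    U = rng.uniform(-1.0, 1.0, size=n)
    X = np.column_stack([np.ones(n)] + [np.sqrt(2) * np.cos(np.pi * i * U) for i in range(1, r)])
    return X, X @ beta + rng.normal(size=n)

rng = np.random.default_rng(1)
n, r, reps = 100, 6, 1000                       # 10,000 replications in the slides; reduced here
beta = 1.0 / np.arange(1, r + 1) ** 2           # hypothetical decaying coefficients
for p in range(1, r + 1):                       # nested submodels M_1, ..., M_r
    gcps, risks = [], []
    for _ in range(reps):
        X, Y = draw(rng, n, r, beta)
        g, bhat = gcp(X[:, :p], Y)
        Xt, Yt = draw(rng, 5_000, r, beta)      # fresh sample approximates R_{M_p}
        gcps.append(g)
        risks.append(np.mean((Yt - Xt[:, :p] @ bhat) ** 2))
    print(f"M_{p}: mean GC_p = {np.mean(gcps):.3f}, R = {np.mean(risks):.3f}")
```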


Summary

- Set up the assumption lean framework to explore the relationship between Y and X.
- Studied the OLS estimator and the predictive risk R under the assumption lean framework.
- Proposed the Generalized C_p (GC_p) and an alternative, GC_p^boot, to estimate the predictive risk.
- Derived the distribution of the GC_p difference between nested models.

Ongoing and Future Research

- Optimality of GC_p-based decision rules.
- A general formulation of the distribution of the GC_p difference between non-nested models.
- GC_p for generalized linear models (GLMs).
- Semi-supervised regression.

References

Buja, A., Berk, R., Brown, L., George, E., Pitkin, E., Traskin, M., ... & Zhao, L. (2014). Models as approximations, Part I: A conspiracy of nonlinearity and random regressors in linear regression. arXiv preprint arXiv:1404.1578.

Buja, A., Berk, R., Brown, L., George, E., Kuchibhotla, A. K., & Zhao, L. (2016). Models as approximations, Part II: A general theory of model-robust regression. arXiv preprint arXiv:1612.03257.

Kuchibhotla, A. K., Brown, L. D., Buja, A., George, E. I., & Zhao, L. (2018). Valid post-selection inference in assumption-lean linear regression. arXiv preprint arXiv:1806.04119.