Confidence Sets Based on Shrinkage Estimators

Confidence Sets Based on Shrinkage Estimators Mikkel Plagborg-Møller Harvard University June 2017

Shrinkage estimators in applied work
$\hat\beta_{\text{shrink}} = \operatorname{argmin}_{\beta} \{ \hat Q(\beta) + \lambda c(\beta) \}$
Shrinkage/penalized estimators popular in economics:
Random effects.
High-dimensional prediction.
Smoothing jagged functions. Shiller (1973); Hodrick & Prescott (1981); Breitung & Roling (2015); Barnichon & Brownlees (2017)
Estimating fixed effects. Chetty et al. (2014); Chamberlain (2016)
Shrinking toward theory. Hansen (2016); Fessler & Kasy (2017)
Shrinkage parameter $\lambda$ often data-dependent.

Challenges of shrinkage inference
How to calculate SEs for shrinkage estimators?
With data-dependent shrinkage parameter $\lambda$, the asymptotic distribution is often discontinuous in the true parameters.
For finite-dimensional parameters, it is impossible to estimate the CDF of $\hat\beta_{\text{shrink}}$ uniformly consistently. Leeb & Pötscher (2005)
Standard bootstrap typically doesn't work. Beran (2010)
Applied researchers often just undersmooth (i.e., use the SE for the usual point estimator). Not always valid.

This project
Class of generalized ridge regression estimators: Vinod (1978)
$\hat\beta_{M,W}(\lambda) = \operatorname{argmin}_{\beta \in \mathbb{R}^n} \{ \|\beta - \hat\beta\|_W^2 + \lambda \|M\beta\|^2 \}$.
Shrinkage parameter $\lambda$ selected by unbiased risk estimate.
Gaussian location model: $\hat\beta \sim N_n(\beta, \Sigma)$, known $\Sigma$.
Conditional QLR test for linear hypothesis on $\beta$. Exact size.
Conditional QLR confidence set by test inversion.
Simulations show favorable average length/area of CSs.
Uniform asymptotic validity even when data are non-Gaussian.

Relationship to literature
Large stats literature uses analytically convenient transformations and priors. Casella & Hwang (1982, 1984, 1987, 2012); Tseng & Brown (1997)
My starting point: How to calculate SEs for a given ridge estimator?
Arbitrary correlation structure, arbitrary shrinkage hypothesis.
CSs tied to (and always contain) a meaningful point estimator.
Tests/CSs have Empirical Bayes (random effects) interpretation.
But I do not start from decision-theoretic first principles.
Impossible to uniformly dominate expected volume of Wald ellipsoid for 1-D or 2-D problems. Stein (1962); Brown (1966); Joshi (1969)

Other related literature
Shrinkage: Stein (1956); James & Stein (1961); Bock (1975); Oman (1982); Casella & Hwang (1987)
Unbiased risk estimate: Mallows (1973); Stein (1973, 1981); Berger (1985); Claeskens & Hjort (2003); Hansen (2010)
Asymptotics for shrinkage: Leeb & Pötscher (2005); Hansen (2016)
Uniformity: Andrews, Cheng & Guggenberger (2011); McCloskey (2015)
Post-regularization inference: Chernozhukov, Hansen & Spindler (2015)
Conditional inference: Andrews & Mikusheva (2016)
Adaptive confidence sets: Pratt (1961); Brown, Casella & Hwang (1995); Wasserman (2006); Armstrong & Kolesár (2016)

Outline
1. Shrinkage estimators and unbiased risk estimate
2. Testing
3. Confidence sets
4. Simulation study
5. Uniform asymptotic validity
6. Applications
7. Summary

Gaussian location model
For now, consider the finite-sample Gaussian location model $\hat\beta \sim N_n(\beta, \Sigma)$.
$\beta \in \mathbb{R}^n$ unknown.
$\Sigma$ symmetric p.d. and known.
Will later consider an asymptotic framework for which the Gaussian model is the right limit experiment. Plug in consistent estimator $\hat\Sigma$.

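General shrinkage estimator class
$\hat\beta_{M,W}(\lambda) = \operatorname{argmin}_{\beta \in \mathbb{R}^n} \{ \|\beta - \hat\beta\|_W^2 + \lambda \|M\beta\|^2 \} = \Theta_{M,W}(\lambda)\hat\beta$,
$\Theta_{M,W}(\lambda) = (I_n + \lambda W^{-1} M'M)^{-1}$.
$M \in \mathbb{R}^{m \times n}$, $W \in \mathbb{R}^{n \times n}$ symmetric p.d.
Example: the second-difference matrix
$M = \begin{pmatrix} 1 & -2 & 1 & & & \\ & 1 & -2 & 1 & & \\ & & \ddots & \ddots & \ddots & \\ & & & 1 & -2 & 1 \end{pmatrix} \in \mathbb{R}^{(n-2) \times n}$.
Penalizes jaggedness. Whittaker (1923); Shiller (1972); Hodrick & Prescott (1981); Wahba (1990)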
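The estimator class on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's code: the function names, the simulated `beta_hat`, the choice $W = I_n$ and the value $\lambda = 5$ are all assumptions made for the example.

```python
import numpy as np

def second_difference_matrix(n):
    """Return the (n-2) x n matrix whose rows are (1, -2, 1) second differences."""
    M = np.zeros((n - 2, n))
    for i in range(n - 2):
        M[i, i:i + 3] = [1.0, -2.0, 1.0]
    return M

def theta(lam, M, W):
    """Shrinkage matrix Theta_{M,W}(lambda) = (I_n + lambda W^{-1} M'M)^{-1}."""
    n = W.shape[0]
    return np.linalg.inv(np.eye(n) + lam * np.linalg.solve(W, M.T @ M))

def generalized_ridge(beta_hat, lam, M, W):
    """Minimizer of ||beta - beta_hat||_W^2 + lam * ||M beta||^2."""
    return theta(lam, M, W) @ beta_hat

# Illustrative data (made up): a smooth curve plus jagged noise.
rng = np.random.default_rng(0)
n = 12
M = second_difference_matrix(n)
W = np.eye(n)
beta_hat = np.sin(np.linspace(0.0, np.pi, n)) + 0.3 * rng.standard_normal(n)
beta_smooth = generalized_ridge(beta_hat, 5.0, M, W)

# First-order condition of the penalized problem:
# W (beta - beta_hat) + lam * M'M beta = 0 at the minimizer.
foc = W @ (beta_smooth - beta_hat) + 5.0 * M.T @ M @ beta_smooth
print(np.max(np.abs(foc)))
```

A vector that is exactly linear in the index has zero second differences, so this penalty leaves it unchanged for every $\lambda$.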

[Figure: impulse response in basis points (vertical axis) against horizon in months, 0 to 48 (horizontal axis).] $y_t$: GZ excess bond premium. $x_t$: high-freq. FFF shock. Controls: 2 lags of $y_t$, $x_t$, log(CPI), log(IP), 1yrTreas. Sample: 1991-2012.

Projection shrinkage
Shrinkage particularly tractable when $W = I_n$ and $M = P \in \mathbb{R}^{n \times n}$ is an orthogonal projection matrix: $P = P' = P^2$.
Projection shrinkage towards linear subspace $\operatorname{span}(I_n - P)$. Stein (1956); Oman (1982a,b); Bock (1985); Casella & Hwang (1987)
$\hat\beta_P(\lambda) = \operatorname{argmin}_{\beta \in \mathbb{R}^n} \{ \|\beta - \hat\beta\|^2 + \lambda \|P\beta\|^2 \} = \frac{1}{1+\lambda} P\hat\beta + (I_n - P)\hat\beta$.
Example: $I_n - P$ = projection matrix from regression onto basis functions.
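As a quick numerical check of the closed form, the sketch below builds $P$ as the residual-maker matrix of a quadratic polynomial basis and compares the general ridge solution with $\frac{1}{1+\lambda}P\hat\beta + (I_n - P)\hat\beta$. The basis, dimension, $\lambda$ and simulated $\hat\beta$ are assumptions chosen only for illustration.

```python
import numpy as np

n, lam = 8, 2.0
# Hypothetical target subspace: quadratic polynomials in the index i.
X = np.vander(np.arange(n, dtype=float), 3, increasing=True)  # columns 1, i, i^2
P = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)             # P = P' = P^2

rng = np.random.default_rng(1)
beta_hat = rng.standard_normal(n)

# General formula Theta(lambda) beta_hat with W = I_n and M = P ...
ridge = np.linalg.solve(np.eye(n) + lam * P.T @ P, beta_hat)
# ... versus the closed form P beta_hat / (1 + lambda) + (I_n - P) beta_hat.
closed_form = P @ beta_hat / (1.0 + lam) + (np.eye(n) - P) @ beta_hat
print(np.max(np.abs(ridge - closed_form)))
```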

[Figure: impulse response in basis points (vertical axis) against horizon in months, 0 to 48 (horizontal axis).] $y_t$: GZ excess bond premium. $x_t$: high-freq. FFF shock. Controls: 2 lags of $y_t$, $x_t$, log(CPI), log(IP), 1yrTreas. Sample: 1991-2012.

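Unbiased risk estimate
MSE risk criterion: $R_{M,W}(\lambda; \beta) = E_\beta\left( \|\hat\beta_{M,W}(\lambda) - \beta\|_W^2 \right)$.
Unbiased risk estimate (URE): Mallows (1973); Stein (1973, 1981); Berger (1985); Hansen (2010)
$\hat R_{M,W}(\lambda) = \|\hat\beta_{M,W}(\lambda) - \hat\beta\|_W^2 + 2 \operatorname{tr}\{ W \Theta_{M,W}(\lambda) \Sigma \}$.
Define $\hat\lambda_{M,W} = \operatorname{argmin}_{\lambda \ge 0} \hat R_{M,W}(\lambda)$. May equal $\infty$.
$\lim_{\lambda \to \infty} \hat\beta_{M,W}(\lambda)$ well defined if $M$ is full rank or a projection.
Sufficient condition for unique minimum: All nonzero eigenvalues of $M W^{-1} M'$ are equal (e.g., projection shrinkage).
Assume a.s. unique minimum for rest of talk.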
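A minimal sketch of the URE and a grid search for $\hat\lambda_{M,W}$ follows. The grid over $x = \lambda/(1+\lambda) \in [0, 1)$ is an assumption of this example (it does not search $\lambda = \infty$), as are the function names and the simulated inputs.

```python
import numpy as np

def ure(beta_hat, lam, M, W, Sigma):
    """Unbiased risk estimate: ||beta(lam) - beta_hat||_W^2 + 2 tr(W Theta(lam) Sigma)."""
    n = len(beta_hat)
    Theta = np.linalg.inv(np.eye(n) + lam * np.linalg.solve(W, M.T @ M))
    d = Theta @ beta_hat - beta_hat
    return d @ W @ d + 2.0 * np.trace(W @ Theta @ Sigma)

def lambda_hat(beta_hat, M, W, Sigma, num=400):
    """Grid minimizer over x = lam / (1 + lam) in [0, 1); lam = infinity is not searched."""
    xs = np.linspace(0.0, 1.0, num, endpoint=False)
    lams = xs / (1.0 - xs)
    values = [ure(beta_hat, lam, M, W, Sigma) for lam in lams]
    return lams[int(np.argmin(values))]

# Illustrative inputs (made up): second-difference penalty, W = I_n, Sigma = 0.1 I_n.
rng = np.random.default_rng(3)
n = 10
M = np.diff(np.eye(n), n=2, axis=0)   # rows (1, -2, 1)
W = np.eye(n)
Sigma = 0.1 * np.eye(n)
beta_hat = np.linspace(0.0, 1.0, n) ** 2 + rng.multivariate_normal(np.zeros(n), Sigma)
lam_star = lambda_hat(beta_hat, M, W, Sigma)
print(lam_star, ure(beta_hat, lam_star, M, W, Sigma))
```

At $\lambda = 0$ there is no shrinkage, so the URE reduces to $2\operatorname{tr}(W\Sigma)$; any selected $\lambda$ can only lower the estimate from there.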

[Figure: normalized estimated MSE $\hat R_P\left(\frac{x}{1-x}\right)$, $x \in [0, 1)$ (vertical axis, 0 to 1) against $\lambda/(1+\lambda)$ (horizontal axis, 0 to 1).] $y_t$: GZ excess bond premium. $x_t$: high-freq. FFF shock. Controls: 2 lags of $y_t$, $x_t$, log(CPI), log(IP), 1yrTreas. Sample: 1991-2012.

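Optimal projection shrinkage
For projection shrinkage, can minimize URE in closed form:
$\hat\beta_P(\hat\lambda_P) = \left( 1 - \frac{\operatorname{tr}(\Sigma_P)}{\|P\hat\beta\|^2} \right)_+ P\hat\beta + (I_n - P)\hat\beta$, where $\Sigma_P = P \Sigma P$.
James-Stein shrinkage towards linear subspace. Stein (1956); James & Stein (1961); Oman (1982a,b); Bock (1985)
Proposition (Hansen, 2016): If $\operatorname{tr}(\Sigma_P) > 4\rho(\Sigma_P)$, then
$E_\beta\left( \|\hat\beta_P(\hat\lambda_P) - \beta\|^2 \right) < E_\beta\left( \|\hat\beta - \beta\|^2 \right)$ for all $\beta$.
Necessary condition: $\operatorname{rk}(P) > 4$. E.g., if $I_n - P$ is the projection onto $p$ basis functions, then need $n > p + 4$.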
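The closed form is easy to check on a small example. The sketch below shrinks a made-up vector toward its average ($P = I_n - \mathbf{1}\mathbf{1}'/n$) with an assumed $\Sigma = 0.5 I_n$, and compares the positive-part weight with a brute-force minimization of the URE over $w = 1/(1+\lambda)$; all names and numbers are illustrative.

```python
import numpy as np

def optimal_projection_shrinkage(beta_hat, P, Sigma):
    """URE-optimal projection shrinkage; returns the estimate and the weight on P beta_hat.

    Assumes P beta_hat is nonzero, which holds almost surely in the Gaussian model.
    """
    Pb = P @ beta_hat
    weight = max(0.0, 1.0 - np.trace(P @ Sigma @ P) / (Pb @ Pb))
    return weight * Pb + (beta_hat - Pb), weight

n = 6
P = np.eye(n) - np.ones((n, n)) / n          # shrink toward the average
Sigma = 0.5 * np.eye(n)
beta_hat = np.array([2.0, -1.0, 0.5, 3.0, -2.5, 1.0])
estimate, weight = optimal_projection_shrinkage(beta_hat, P, Sigma)

# Brute force: with w = 1 / (1 + lambda), the URE equals
# (1 - w)^2 ||P beta_hat||^2 + 2 w tr(Sigma_P) plus a constant in w.
Pb = P @ beta_hat
ws = np.linspace(0.0, 1.0, 1001)
ure_values = (1.0 - ws) ** 2 * (Pb @ Pb) + 2.0 * ws * np.trace(P @ Sigma @ P)
w_grid = ws[int(np.argmin(ure_values))]
print(weight, w_grid)
```

Here $\|P\hat\beta\|^2 = 20$ and $\operatorname{tr}(\Sigma_P) = 2.5$, so the weight on $P\hat\beta$ is $1 - 2.5/20 = 0.875$.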

Hypothesis testing in shrinkage applications
$H_0 : R\beta = b$, $H_1 : R\beta \ne b$, with $R \in \mathbb{R}^{r \times n}$ of full row rank.
No UMP test exists.
Wald test is UMP unbiased ($r = 1$), UMP invariant, and admissible.
If we're already using a shrinkage point estimator, might want a hypothesis test tied to this estimator as well. Obtain CS by inversion.
My proposed test is biased and non-invariant, so may achieve higher power than the usual Wald test for some DGPs.

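Empirical Bayes quasi-likelihood ratio test
Base hypothesis test on (negative) quasi-log-likelihood
$\hat L_{M,W}(\beta) = \|\beta - \hat\beta\|_W^2 + \hat\lambda_{M,W} \|M\beta\|^2$.
Empirical Bayes (random effects) interpretation:
$\beta \mid \text{data} \sim N\left( \hat\beta_{M,W}(\hat\lambda_{M,W}),\ (W + \hat\lambda_{M,W} M'M)^{-1} \right)$.
QLR test statistic of $R\beta = b$:
$\min_{\beta : R\beta = b} \hat L_{M,W}(\beta) - \min_{\beta} \hat L_{M,W}(\beta) = \| R\hat\beta_{M,W}(\hat\lambda_{M,W}) - b \|^2_{(R(W + \hat\lambda_{M,W} M'M)^{-1} R')^{-1}}$.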
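The equality between the difference of minimized quasi-log-likelihoods and the weighted quadratic form can be verified numerically. In this sketch $\hat\lambda$ is treated as a given number `lam`, and the matrices are random placeholders; both are assumptions of the example rather than part of the slides.

```python
import numpy as np

def quasi_loglik(beta, beta_hat, lam, M, W):
    """Negative quasi-log-likelihood ||beta - beta_hat||_W^2 + lam * ||M beta||^2."""
    d = beta - beta_hat
    return d @ W @ d + lam * np.sum((M @ beta) ** 2)

def qlr_statistic(beta_shrunk, lam, R, b, M, W):
    """||R beta_shrunk - b||^2 weighted by (R (W + lam M'M)^{-1} R')^{-1}."""
    A = W + lam * M.T @ M
    V = R @ np.linalg.solve(A, R.T)
    diff = R @ beta_shrunk - b
    return diff @ np.linalg.solve(V, diff)

rng = np.random.default_rng(4)
n, m, r, lam = 5, 3, 2, 1.7
M = rng.standard_normal((m, n))
G = rng.standard_normal((n, n))
W = G @ G.T + n * np.eye(n)                  # symmetric p.d. weight matrix
R = rng.standard_normal((r, n))
b = rng.standard_normal(r)
beta_hat = rng.standard_normal(n)

A = W + lam * M.T @ M
beta_shrunk = np.linalg.solve(A, W @ beta_hat)           # unconstrained minimizer
A_inv_Rt = np.linalg.solve(A, R.T)
adjust = A_inv_Rt @ np.linalg.solve(R @ A_inv_Rt, R @ beta_shrunk - b)
beta_null = beta_shrunk - adjust                         # minimizer subject to R beta = b

direct = (quasi_loglik(beta_null, beta_hat, lam, M, W)
          - quasi_loglik(beta_shrunk, beta_hat, lam, M, W))
formula = qlr_statistic(beta_shrunk, lam, R, b, M, W)
print(direct, formula)
```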

Null distribution impractical
$LR_{M,W}(b) = \| R\hat\beta_{M,W}(\hat\lambda_{M,W}) - b \|^2_{(R(W + \hat\lambda_{M,W} M'M)^{-1} R')^{-1}}$
Assume $\operatorname{Var}(RZ \mid MZ)$ nonsingular, $Z \sim N_n(0, I_n)$. Then LR is well defined even when $\hat\lambda_{M,W} = \infty$. Holds in many cases.
If $\operatorname{Var}(RZ \mid MZ)$ is singular, can use the ad hoc LR statistic
$LR^{\text{ad hoc}}_{M,W}(b) = \| R\hat\beta_{M,W}(\hat\lambda_{M,W}) - b \|^2_{(R W^{-1} R')^{-1}}$.
Practical problem: Null distribution of LR statistic depends on $M\beta$.
Solution: Condition on sufficient statistic for the $n - r$ nuisance parameters.

Sufficient statistic for nuisance parameters
Define $\zeta = \Sigma R' (R \Sigma R')^{-1} \in \mathbb{R}^{n \times r}$ and $P = \zeta R \in \mathbb{R}^{n \times n}$.
Statistic $\hat\nu = (I_n - P)\hat\beta$ is S-ancillary with respect to $R\beta$:
$\hat\beta \mid \hat\nu \sim F_{R\beta, \Sigma}$, $\hat\nu \sim F_{(I_n - P)\beta, \Sigma}$.
It would be uncontroversial to condition on $\hat\nu$ in the absence of prior information linking $R\beta$ and $(I_n - P)\beta$.
In practice, the prior information about $M\beta$ may not substantially constrain the relationship between $R\beta$ and $(I_n - P)\beta$. Then conditioning wastes little information. Severini (1995)
I condition on $\hat\nu$. Later: connection to Empirical Bayes HPD set.
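The decomposition behind the conditioning argument can be sketched as follows. The covariance matrix, the restriction matrix and the name `P_zeta` (used here for $\zeta R$, to keep it apart from the projection-shrinkage matrix) are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 5, 2
G = rng.standard_normal((n, n))
Sigma = G @ G.T + n * np.eye(n)          # arbitrary p.d. covariance (made up)
R = rng.standard_normal((r, n))          # full row rank with probability one

zeta = Sigma @ R.T @ np.linalg.inv(R @ Sigma @ R.T)    # n x r
P_zeta = zeta @ R                                      # n x n, idempotent
I_n = np.eye(n)

# Cov(R beta_hat, nu_hat) = R Sigma (I_n - P_zeta)' = 0, so under normality
# R beta_hat and nu_hat are independent.
cross_cov = R @ Sigma @ (I_n - P_zeta).T

# Any draw decomposes as beta_hat = zeta (R beta_hat) + nu_hat.
beta_hat = rng.multivariate_normal(np.ones(n), Sigma)
nu_hat = (I_n - P_zeta) @ beta_hat
print(np.max(np.abs(cross_cov)))
```

This is why the conditional null distribution can be simulated by redrawing only the $r$-dimensional component $U$ and holding $\hat\nu$ fixed.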

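Critical value by simulation
Conditional QLR test rejects $H_0$ if $LR_{M,W}(b) > q_{1-\alpha,M,W}(b, \hat\nu)$.
Conditional critical value given $\hat\nu = \nu$:
$q_{1-\alpha,M,W}(b, \nu) = \operatorname{quantile}_{1-\alpha}\left( \| R\tilde\beta(\tilde\lambda(U); U) - b \|^2_{(R(W + \tilde\lambda(U) M'M)^{-1} R')^{-1}} \right)$,
where $U \sim N_r(b, R \Sigma R')$,
$\tilde\beta(\lambda; U) = \Theta_{M,W}(\lambda)(\zeta U + \nu)$ for all $\lambda \ge 0$,
$\tilde\lambda(U) = \operatorname{argmin}_{\lambda \ge 0} \left\{ \|\tilde\beta(\lambda; U) - (\zeta U + \nu)\|_W^2 + 2 \operatorname{tr}(W \Theta_{M,W}(\lambda) \Sigma) \right\}$.
By design, the test accepts a true null with conditional (and thus unconditional) probability $1 - \alpha$, i.e., it has exact size $\alpha$.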
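The simulation can be sketched for the projection-shrinkage case with $W = I_n$, where $\tilde\lambda(U)$ has a closed form. This is an illustrative implementation under stated assumptions, not the author's code: the function names, the number of draws, and the example inputs are made up, and $R(I_n - P)R'$ is assumed nonsingular so that the statistic is defined when $\tilde\lambda = \infty$.

```python
import numpy as np

def shrink_and_lr(beta_hat, b, R, P, Sigma):
    """URE-optimal projection shrinkage (W = I_n, M = P) and the QLR statistic at b."""
    Pb = P @ beta_hat
    w = max(0.0, 1.0 - np.trace(P @ Sigma) / (Pb @ Pb))   # w = 1 / (1 + lambda)
    beta_shrunk = beta_hat - (1.0 - w) * Pb
    A_inv = np.eye(len(beta_hat)) - (1.0 - w) * P          # (I_n + lambda P)^{-1}
    diff = R @ beta_shrunk - b
    return diff @ np.linalg.solve(R @ A_inv @ R.T, diff)

def conditional_critical_value(b, nu, R, P, Sigma, alpha=0.1, draws=2000, seed=0):
    """Simulated 1 - alpha quantile of the QLR statistic given nu_hat = nu under R beta = b."""
    rng = np.random.default_rng(seed)
    RSR = R @ Sigma @ R.T
    zeta = Sigma @ R.T @ np.linalg.inv(RSR)
    U = rng.multivariate_normal(b, RSR, size=draws)
    stats = [shrink_and_lr(zeta @ u + nu, b, R, P, Sigma) for u in U]
    return np.quantile(stats, 1.0 - alpha)

# Made-up example: shrink toward the average, test the first coefficient.
n = 6
P = np.eye(n) - np.ones((n, n)) / n
Sigma = np.eye(n)
R = np.zeros((1, n))
R[0, 0] = 1.0
b = np.array([0.0])

nu_far = 100.0 * np.arange(n)    # first entry is 0, so R @ nu_far = 0 as required
nu_near = np.zeros(n)
crit_far = conditional_critical_value(b, nu_far, R, P, Sigma)
crit_near = conditional_critical_value(b, nu_near, R, P, Sigma)
print(crit_far, crit_near)
```

When $\nu$ is far from the shrinkage subspace almost no shrinkage takes place, so in this example (where $R\Sigma R' = 1$) the simulated critical value should sit near the $\chi^2_1$ quantile used by the Wald test; near the subspace it can differ.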

Confidence set by test inversion
Invert CQLR test to obtain CS for $b = R\beta$:
$\hat C_{M,W} = \left\{ b \in \mathbb{R}^r : LR_{M,W}(b) \le q_{1-\alpha,M,W}(b, \hat\nu) \right\}$.
Do this by grid search. Simulate quantile at each point.
Feasible in one or two dimensions (projection shrinkage fast).
If $M$ is full rank or a projection, can compute a simple, finite upper bound on the critical value.
$\hat C_{M,W}$ is contained in a bounded ellipsoid centered at $R\hat\beta_{M,W}(\hat\lambda_{M,W})$. Limits grid search.
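Grid-search inversion itself is generic: given a test statistic and a critical-value function, keep the grid points that are not rejected. The sketch below shows the mechanics with a hypothetical helper `invert_test`, demonstrated on a Wald statistic with a constant critical value so the answer is known; for the shrinkage CS, `critical_value(b)` would instead run the conditional simulation at each grid point. The estimate 1.3 and standard error 0.5 are made up.

```python
from statistics import NormalDist
import numpy as np

def invert_test(statistic, critical_value, grid):
    """Grid points b at which the test does not reject: statistic(b) <= critical_value(b)."""
    return np.array([b for b in grid if statistic(b) <= critical_value(b)])

# Demonstration with a 1-D Wald test (made-up estimate and standard error).
estimate, se, alpha = 1.3, 0.5, 0.1
z = NormalDist().inv_cdf(1.0 - alpha / 2.0)

def wald(b):
    return ((estimate - b) / se) ** 2

def chi2_1_quantile(b):
    return z ** 2   # constant in b; a simulated quantile would vary with b

grid = np.linspace(-2.0, 4.0, 6001)
accepted = invert_test(wald, chi2_1_quantile, grid)
print(accepted.min(), accepted.max())
```

Returning the accepted points rather than two endpoints matters because the shrinkage confidence set need not be convex.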

Properties of shrinkage confidence set
1. $\hat C_{M,W}$ always contains the shrinkage point estimate.
2. Generally not symmetric around the point estimate.
3. Not always convex.
4. Converges a.s. to the usual Wald ellipsoid as $\|M\beta\| \to \infty$, $M$ fixed.
5. Expected volume depends on $\beta$ only through $M\beta$.
Appears difficult to characterize expected volume. Even for projection shrinkage, conditional power of the CQLR test depends on 6 parameters.

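Empirical Bayes HPD set
$\hat L_{M,W}(\beta) = \|\beta - \hat\beta\|_W^2 + \hat\lambda_{M,W} \|M\beta\|^2$,
$\beta \mid \text{data} \sim N\left( \hat\beta_{M,W}(\hat\lambda_{M,W}),\ (W + \hat\lambda_{M,W} M'M)^{-1} \right)$.
Empirical Bayes $1 - \alpha$ Highest Posterior Density set for $R\beta$:
$\hat C_{EB} = \left\{ b \in \mathbb{R}^r : LR_{M,W}(b) \le \chi^2_{r,1-\alpha} \right\}$.
Doesn't control frequentist coverage.
Like shrinkage CS, but non-random critical value.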

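Minimum coverage discrepancy with EB HPD set
Symmetric set difference: $A \,\triangle\, B = (A \cup B) \setminus (A \cap B)$.
Proposition (following Andrews & Mikusheva, 2016)
Let $C$ be any similar confidence set for $R\beta$ (like $\hat C_{M,W}$): $P_\beta(R\beta \in C) = 1 - \alpha$ for all $\beta \in \mathbb{R}^n$. Then
$P_\beta\left( R\beta \in \hat C_{M,W} \,\triangle\, \hat C_{EB} \right) \le P_\beta\left( R\beta \in C \,\triangle\, \hat C_{EB} \right)$ for all $\beta \in \mathbb{R}^n$.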

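Illustration: bivariate shrinkage toward average
Bivariate model, projection shrinkage toward average: Lindley (1962)
$\hat\beta = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} \sim N_2\left( \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$,
$e_1' \hat\beta_P(\hat\lambda_P) = \frac{\hat\beta_1 + \hat\beta_2}{2} + \left( 1 - \frac{2(1-\rho)}{(\hat\beta_1 - \hat\beta_2)^2} \right)_+ \frac{\hat\beta_1 - \hat\beta_2}{2}$.
Parameter of interest: $\beta_1$.
Both the MSE of the shrinkage estimator and the expected length of the shrinkage CI depend on the DGP only through $\beta_2 - \beta_1$ and $\rho$.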

Illustration: bivariate shrinkage toward average
[Figure: two panels, RMSE (top) and average length of 90% CI (bottom), each with curves for 0.0, 0.3 and 0.9; horizontal axis from 0 to 8.]

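Simulation study of confidence intervals
$\hat\beta \sim N_n(\beta, \Sigma)$,
$\beta_i = 1 - \frac{i-1}{n-1}$ if $K = 0$, and $\beta_i = \sin\frac{2\pi K (i-1)}{n-1}$ if $K > 0$,
$\Sigma_{ij} = \sigma_i \sigma_j \kappa^{|i-j|}$, $\sigma_i = \sigma_0 \left( 1 + (i-1)\frac{\varphi - 1}{n-1} \right)$.
Consider projection shrinkage toward quadratic polynomial.
Lower bound on expected length relative to Wald CI: Pratt (1961)
$\frac{(1-\alpha)\Phi^{-1}(1-\alpha) + (2\pi)^{-1/2} e^{-\frac{1}{2}(\Phi^{-1}(1-\alpha))^2}}{\Phi^{-1}(1-\alpha/2)} \approx 0.808$ for $\alpha = 0.1$.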
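The lower bound on relative expected length is a one-line computation with the standard normal quantile and density. The sketch below uses only the Python standard library; the function name is an assumption of this example.

```python
from statistics import NormalDist

def pratt_bound_1d(alpha):
    """Lower bound on expected CI length relative to the two-sided Wald interval."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - alpha)
    numerator = (1.0 - alpha) * z + nd.pdf(z)   # nd.pdf(z) = (2 pi)^{-1/2} exp(-z^2 / 2)
    return numerator / nd.inv_cdf(1.0 - alpha / 2.0)

bound_1d = pratt_bound_1d(0.1)
print(round(bound_1d, 3))  # 0.808
```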

Simulation study of confidence intervals
Columns: design parameters; MSE of $\hat\beta(\hat\lambda)$ (Tot, 1st, Mid); length of $\hat C$ (1st, Mid).

                               MSE                   Length
 n    K     κ     σ_0    ϕ     Tot    1st    Mid     1st    Mid
 10   0.5   0.5   0.25   1     0.63   0.95   0.56    0.97   0.85
 25   0.5   0.5   0.25   1     0.34   0.69   0.29    0.88   0.86
 50   0.5   0.5   0.25   1     0.19   0.46   0.16    0.83   0.88
 25   0     0.5   0.25   1     0.34   0.68   0.29    0.87   0.86
 25   1     0.5   0.25   1     0.93   1.29   0.77    1.10   0.88
 25   2     0.5   0.25   1     0.96   0.93   0.86    0.98   0.90
 25   0.5   0     0.25   1     0.16   0.35   0.13    0.83   0.84
 25   0.5   0.9   0.25   1     0.81   1.11   0.76    1.05   0.91
 25   0.5   0.5   0.5    1     0.34   0.66   0.28    0.88   0.86
 25   0.5   0.5   0.25   3     0.35   1.19   0.30    0.96   0.85

MSE relative to $\hat\beta$, average length relative to Wald. Conf. level = 90%. 1st = $\beta_1$, Mid = $\beta_{1+[n/2]}$.

Simulation study of 2-D confidence sets
Same design, but now construct 2-D confidence set for $(\beta_1, \beta_{1+[n/2]})$.
Lower bound on expected area relative to Wald ellipse: Pratt (1961); Brown, Casella & Hwang (1995)
$\frac{2 \int_0^\infty r\, \Phi\left( \Phi^{-1}(1-\alpha) - r \right) dr}{\chi^2_{1-\alpha,2}} \approx 0.565$ for $\alpha = 0.1$.
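The 2-D bound can be reproduced by numerical integration. This sketch uses a plain trapezoid rule truncated at an assumed upper limit of 12 (the integrand is negligible beyond it) and the fact that the $1-\alpha$ quantile of a chi-squared distribution with 2 degrees of freedom is $-2\ln\alpha$; the function name and tuning constants are illustrative.

```python
import math
from statistics import NormalDist

def pratt_bound_2d(alpha, upper=12.0, steps=12000):
    """Lower bound on expected area relative to the Wald ellipse (trapezoid rule)."""
    nd = NormalDist()
    c = nd.inv_cdf(1.0 - alpha)
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        r = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * r * nd.cdf(c - r)      # integrand r * Phi(c - r)
    integral = h * total
    chi2_quantile = -2.0 * math.log(alpha)       # 1 - alpha quantile, 2 degrees of freedom
    return 2.0 * integral / chi2_quantile

bound_2d = pratt_bound_2d(0.1)
print(round(bound_2d, 3))  # 0.565
```

Integrating by parts gives the closed form $2\int_0^\infty r\,\Phi(c - r)\,dr = (c^2 + 1)\Phi(c) + c\,\phi(c)$ with $c = \Phi^{-1}(1-\alpha)$, which provides an independent check on the numerical value.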

Simulation study of 2-D confidence sets
Columns: design parameters; area of $\hat C$ and of the ad hoc set $\hat C_{\text{adhoc}}$.

                               Area
 n    K     κ     σ_0    ϕ     Ĉ      Ĉ_adhoc
 10   0.5   0.5   0.25   1     0.91   0.88
 25   0.5   0.5   0.25   1     0.86   0.76
 50   0.5   0.5   0.25   1     0.81   0.70
 25   0     0.5   0.25   1     0.84   0.76
 25   1     0.5   0.25   1     1.01   1.02
 25   2     0.5   0.25   1     0.90   0.94
 25   0.5   0     0.25   1     0.69   0.70
 25   0.5   0.9   0.25   1     1.32   1.05
 25   0.5   0.5   0.5    1     0.85   0.76
 25   0.5   0.5   0.25   3     1.20   0.86

Average area relative to Wald. Conf. level = 90%.

Takeaways from simulations
Shrinkage CS works well when shrinkage point estimator works well.
Shrinkage may be harmful when...
1. $M\beta$ conveys little info about $R\beta$.
2. $\|M\beta\|$ neither small nor large.
3. Correlations are high.
4. Variance of MLE of nuisance parameters large relative to variance of MLE of parameter of interest (e.g., small $n$).

Uniform asymptotic size control
CQLR test achieves uniform asymptotic size control, provided $\hat\beta$ is uniformly asymptotically normal and $\hat\Sigma$ is uniformly consistent for $\Sigma$.
Uniform frequentist validity contrasts with other approaches.
Undersmoothing: Pretend $\lambda$ is small, ignore bias of shrinkage estimator as well as variability in $\hat\lambda$.
Switching rule: Use Wald SE if $\|M\hat\beta\| > c$, otherwise use asymptotics under assumption $M\beta = 0$.
Random effects: Treat random effects assumption as part of the DGP rather than just a prior. Size control with respect to random effects distribution.

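Assumption: Preliminary estimator well-behaved
Assumption
Define $S = \{ A \in S^n_+ : \underline{c} \le 1/\rho(A^{-1}) \le \rho(A) \le \bar{c} \}$ for fixed $\underline{c}, \bar{c} > 0$, where $S^n_+$ = set of symmetric positive definite $n \times n$ matrices.
The distribution of the data $F_T$ for sample size $T$ is indexed by three parameters $\beta \in B \subseteq \mathbb{R}^n$, $\Sigma \in S$, and $\gamma \in \Gamma$.
The estimators $(\hat\beta, \hat\Sigma) \in \mathbb{R}^n \times S^n_+$ satisfy the following: For every sequence $\{\beta_T, \Sigma_T, \gamma_T\}_{T \ge 1} \in B \times S \times \Gamma$ and every subsequence $\{k_T\}_{T \ge 1}$ of $\{T\}_{T \ge 1}$, there exists a further subsequence $\{\tilde k_T\}_{T \ge 1}$ such that, under $F_{\tilde k_T}(\beta_{\tilde k_T}, \Sigma_{\tilde k_T}, \gamma_{\tilde k_T})$ and as $T \to \infty$,
$\sqrt{\tilde k_T}\, \hat\Sigma^{-1/2} (\hat\beta - \beta_{\tilde k_T}) \xrightarrow{d} N_n(0, I_n)$,
$\hat\Sigma - \Sigma_{\tilde k_T} \xrightarrow{p} 0$.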

Shrinkage test is uniformly valid
Let $\widehat{LR}$ and $\hat q_{1-\alpha}$ denote the CQLR test statistic and quantile obtained by plugging in $T^{-1}\hat\Sigma$ in place of $\Sigma$. (Suppress $M$, $W$.)
Proposition
Let the previous assumption hold. Assume either $\operatorname{rk}(M) = m$ or $M = P$. Assume also $\operatorname{Var}(RZ \mid MZ)$ is nonsingular, $Z \sim N_n(0, I_n)$. Then
$\liminf_{T \to \infty} \inf_{(\beta,\Sigma,\gamma) \in B \times S \times \Gamma} \operatorname{Prob}_{F_T(\beta,\Sigma,\gamma)}\left( \widehat{LR}(R\beta) \le \hat q_{1-\alpha}(R\beta, \hat\nu) \right) = 1 - \alpha$.
Caveat: I have only written down the full proof for projection shrinkage. I believe I have the arguments worked out for the general case.
Proof idea: Consider drifting parameters $\beta_T$...
1. If $\sqrt{T}\|M\beta_T\| \to \infty$, we converge to the non-shrinkage case.
2. If $\sqrt{T}\|M\beta_T\|$ is bounded, we're in the Gaussian model in the limit.

Treatment effect heterogeneity
NSW job training experiment. Lalonde (1986); Dehejia & Wahba (1999)
Outcome: earnings (absolute $) 3 years after treatment assignment.
297 treated, 425 control.
Bin subjects by age decile. 52-98 subjects per bin.
$\hat\beta \in \mathbb{R}^{10}$: ATE estimate by bin.
Projection shrinkage toward average $\hat\beta$.

Treatment effect heterogeneity: confidence intervals
[Figure: two panels of confidence intervals by age bin (17-18, 19, 20-21, 22, 23-24, 25-26, 27-28, 29-32, 33+).] Conf. level = 90%. Vertical axis = ATE ($), horizontal axis = age (years).

Treatment effect heterogeneity: 2-D confidence set
[Figure: 2-D confidence set; vertical axis "ages 33+", horizontal axis "ages 17-".] Conf. level = 90%. Axes = ATE ($). Ad hoc QLR statistic.

MIDAS forecasting
Predict monthly PCE inflation using daily commodity prices, 1991:2-2017:2.
MIDAS specification (lag lengths chosen by AIC):
$p_{PCE,t} = \mu + \sum_{l=1}^{6} \gamma_l\, p_{PCE,t-l} + \sum_{j=1}^{25} \beta_j z_{t,j} + \varepsilon_t$.
$z_{t,j}$: $j$-th daily observation of log Bloomberg commodity price index (BCOM) on or after 1st day of month $t$.
$\hat\beta \in \mathbb{R}^{25}$: least-squares estimator.
Projection shrinkage toward straight line. Breitung & Roling (2015)

MIDAS forecasting: confidence intervals
[Figure: two panels of confidence intervals by lag, 0 to 25.] Conf. level = 90%. Vertical axis = inflation (log points), horizontal axis = lags (days).

MIDAS forecasting: 2-D confidence set
[Figure: 2-D confidence set.] Conf. level = 90%. Axes = inflation (log points).

Summary
Considered setting where generalized ridge regression point estimator is of interest: smoothing, shrinking toward average, etc.
Proposed conditional QLR test based on same quasi-log-likelihood as shrinkage point estimator.
Exact conditional size in Gaussian location model.
Asymptotic uniform size control more generally.
Shrinkage confidence set by test inversion.
Contains shrinkage point estimate.
Minimum coverage discrepancy with EB HPD set among similar CSs.
Computationally feasible in 1-2 dimensions. Projection shrinkage fast.
Promising simulation evidence.

Thank you

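Non-standard asymptotics: example
$\hat\beta \sim N_n(\beta, T^{-1} I_n)$
James-Stein estimator of $\beta \in \mathbb{R}^n$:
$\hat\beta_{JS} = \left( 1 - \frac{n-2}{T\|\hat\beta\|^2} \right) \hat\beta$.
If $\beta \ne 0$: $\sqrt{T}(\hat\beta_{JS} - \beta) \xrightarrow{d} N_n(0, I_n)$.
If $\beta = 0$: $\sqrt{T}(\hat\beta_{JS} - \beta) \xrightarrow{d} \left( 1 - \frac{n-2}{\|Z\|^2} \right) Z$, $Z \sim N_n(0, I_n)$.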
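The discontinuity can be seen in a small Monte Carlo experiment that compares the scaled squared error of the James-Stein estimator at $\beta \ne 0$ and at $\beta = 0$. The dimension, sample size, number of draws and function name below are assumptions chosen for illustration.

```python
import numpy as np

def scaled_js_risk(beta, T, draws=20000, seed=0):
    """Monte Carlo estimate of E || sqrt(T) (beta_JS - beta) ||^2 when beta_hat ~ N(beta, I_n / T)."""
    rng = np.random.default_rng(seed)
    n = len(beta)
    beta_hat = beta + rng.standard_normal((draws, n)) / np.sqrt(T)
    shrink = 1.0 - (n - 2) / (T * np.sum(beta_hat ** 2, axis=1))
    beta_js = shrink[:, None] * beta_hat
    return T * np.mean(np.sum((beta_js - beta) ** 2, axis=1))

n, T = 10, 10_000
risk_away = scaled_js_risk(np.ones(n), T)    # beta != 0: behaves like N_n(0, I_n)
risk_zero = scaled_js_risk(np.zeros(n), T)   # beta = 0: non-normal limit
print(risk_away, risk_zero)
```

Away from the origin the scaled risk is close to $n$, as for the unshrunk estimator. At $\beta = 0$ it equals $n - (n-2)^2 E(1/\|Z\|^2) = 2$ for every $T$, which is why a single normal approximation cannot hold uniformly in $\beta$.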

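URE captures bias/variance tradeoff
$W = I_n$ for simplicity.
Risk decomposition: Claeskens & Hjort (2003)
$R_{M,I_n}(\lambda) = \underbrace{\operatorname{tr}\{ [I_n - \Theta_{M,I_n}(\lambda)]^2 \beta\beta' \}}_{\text{bias squared}} + \underbrace{\operatorname{tr}\{ \Theta_{M,I_n}(\lambda)^2 \Sigma \}}_{\text{variance}}$.
Unbiased estimate: $\beta\beta' = E(\hat\beta\hat\beta') - \Sigma$.
Plug in $\hat\beta\hat\beta' - \Sigma$ for $\beta\beta'$ to estimate $R_{M,I_n}(\lambda)$ by
$\operatorname{tr}\{ [I_n - \Theta_{M,I_n}(\lambda)]^2 (\hat\beta\hat\beta' - \Sigma) \} + \operatorname{tr}\{ \Theta_{M,I_n}(\lambda)^2 \Sigma \} = \hat R_{M,I_n}(\lambda) - \operatorname{tr}(\Sigma)$.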

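Bound on critical value
Triangle inequality:
$\sqrt{LR_{M,W}(R\beta)} \le \| R(\hat\beta_{M,W}(\hat\lambda_{M,W}) - \hat\beta) \|_{V(\hat\lambda)^{-1}} + \| R(\hat\beta - \beta) \|_{V(\hat\lambda)^{-1}}$.
Let $Z \sim N_n(0, W^{-1})$. For any $\beta \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$ symmetric p.d.,
$\| R(\beta - \hat\beta) \|^2_{V(\hat\lambda)^{-1}} \le \|\beta - \hat\beta\|^2_A \, \rho\left( R A^{-1} R' \operatorname{Var}(RZ \mid MZ)^{-1} \right)$.
Since $\hat R_{M,W}(\hat\lambda_{M,W}) \le \hat R_{M,W}(0)$,
$\| \hat\beta_{M,W}(\hat\lambda_{M,W}) - \hat\beta \|_W^2 \le 2 \operatorname{tr}\{ M \Sigma M' (M W^{-1} M')^{-1} \}$.
Under the null hypothesis, $\| R(\hat\beta - \beta) \|^2_{(R \Sigma R')^{-1}} \sim \chi^2(r)$.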

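Uniform confidence band
Supremum test statistic of the hypothesis that the true parameter equals a candidate $\beta = (\beta_1, \dots, \beta_n)'$:
$\widehat{SLR}_{M,W}(\beta) = \sup_{i=1,\dots,n} \frac{ | \hat\beta_{i,M,W}(\hat\lambda_{M,W}) - \beta_i | }{ \sqrt{ e_i' (W + \hat\lambda_{M,W} M'M)^{-1} e_i } }$.
Simulate null critical value $q_{1-\alpha,M,W}(\beta)$ for any $\beta$.
Simultaneous confidence band: rectangular envelope of inverted test.
$\hat C_{M,W} = \prod_{i=1}^{n} \left[ \inf_{\beta : \widehat{SLR}(\beta) \le q_{1-\alpha}(\beta)} \beta_i,\ \sup_{\beta : \widehat{SLR}(\beta) \le q_{1-\alpha}(\beta)} \beta_i \right]$.
Computationally challenging. Can sample from band. Inoue & Kilian (2016)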

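Coverage discrepancy: proof sketch
Proof is a confidence set reinterpretation of Andrews & Mikusheva (2016) result on conditional testing.
$P_\beta\left( R\beta \in C \,\triangle\, \hat C_{EB} \right) = \underbrace{E_\beta\left[ 1(R\beta \in C) \right]}_{= 1 - \alpha} + E_\beta\left[ 1(R\beta \in \hat C_{EB}) \right] - 2 E_\beta\left[ 1(R\beta \in C)\, 1(R\beta \in \hat C_{EB}) \right]$
$P_\beta\left( R\beta \in C \,\triangle\, \hat C_{EB} \right) - P_\beta\left( R\beta \in \hat C_{M,W} \,\triangle\, \hat C_{EB} \right)$
$= 2 E_\beta\left[ \left\{ 1(R\beta \in \hat C_{M,W}) - 1(R\beta \in C) \right\} 1(R\beta \in \hat C_{EB}) \right]$
$= 2 E_\beta\left[ \left\{ 1(R\beta \in \hat C_{M,W}) - 1(R\beta \in C) \right\} 1\left( LR_{M,W}(R\beta) \le \chi^2_{r,1-\alpha} \right) \right]$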

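Similarity of $C$ and completeness of the Gaussian family imply conditional similarity (like $\hat C_{M,W}$):
$P_\beta\left( R\beta \in C \mid \hat\nu \right) = 1 - \alpha$.
By law of iterated expectations,
$E_\beta\left[ \left\{ 1(R\beta \in \hat C_{M,W}) - 1(R\beta \in C) \right\} 1\left( q_{1-\alpha,M,W}(R\beta, \hat\nu) \le \chi^2_{r,1-\alpha} \right) \right] = 0$.
Hence
$P_\beta\left( R\beta \in C \,\triangle\, \hat C_{EB} \right) - P_\beta\left( R\beta \in \hat C_{M,W} \,\triangle\, \hat C_{EB} \right)$
$= 2 E_\beta\left[ \left\{ 1(R\beta \in \hat C_{M,W}) - 1(R\beta \in C) \right\} \left\{ 1\left( LR_{M,W}(R\beta) \le \chi^2_{r,1-\alpha} \right) - 1\left( q_{1-\alpha,M,W}(R\beta, \hat\nu) \le \chi^2_{r,1-\alpha} \right) \right\} \right]$.
Variable inside the expectation is a.s. nonnegative by definition of $\hat C_{M,W}$.
Crucial: EB set inverts the same test statistic, but with a non-random critical value.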