Confidence Sets Based on Shrinkage Estimators


Confidence Sets Based on Shrinkage Estimators Mikkel Plagborg-Møller April 12, 2017

Shrinkage estimators in applied work
$$\hat\beta_{\text{shrink}} = \operatorname{argmin}_{\beta} \{ \hat Q(\beta) + \lambda c(\beta) \}$$
Shrinkage/penalized estimators are popular in economics:
Random effects.
High-dimensional prediction.
Smoothing jagged functions. Shiller (1973); Barnichon & Brownlees (2017)
Estimating fixed effects. Chetty et al. (2014); Chamberlain (2016)
Shrinking toward theory. Hansen (2016); Fessler & Kasy (2017)
The shrinkage parameter $\lambda$ is often data-dependent.

Challenges of shrinkage inference
How to calculate SEs for shrinkage estimators?
With a data-dependent shrinkage parameter $\lambda$, the asymptotic distribution is often discontinuous in the true parameters.
It is impossible to estimate the CDF of $\hat\beta_{\text{shrink}}$ uniformly consistently. Leeb & Pötscher (2005)
The standard bootstrap typically doesn't work. Beran (2010)
Applied researchers often just undersmooth (i.e., report the SE for the usual point estimator). Not always valid.

This project
Class of generalized ridge regression estimators: Vinod (1978)
$$\hat\beta_{M,W}(\lambda) = \operatorname{argmin}_{\beta \in \mathbb{R}^n} \{ \|\beta - \hat\beta\|_W^2 + \lambda \|M\beta\|^2 \}.$$
Shrinkage parameter $\lambda$ selected by unbiased risk estimate.
Gaussian location model: $\hat\beta \sim N_n(\beta, \Sigma)$, known $\Sigma$.
Conditional QLR test for linear hypothesis on $\beta$. Exact size.
Conditional QLR confidence region by test inversion.
Simulations show favorable average length of CIs.
Uniform asymptotic validity even when data is non-Gaussian.

Relationship to literature
Large stats literature uses analytically convenient transformations and priors. Casella & Hwang (1982, 1984, 1987, 2012); Tseng & Brown (1997)
My starting point: How to calculate SEs for a given ridge estimator?
Arbitrary correlation structure, arbitrary shrinkage hypothesis.
CSs tied to (and always contain) a meaningful point estimator.
Tests/CSs have Empirical Bayes (random effects) interpretation.
But I do not start from decision-theoretic first principles.
Impossible to uniformly dominate expected volume of Wald ellipsoid for 1-D or 2-D problems. Stein (1962); Brown (1966); Joshi (1969)

Other related literature
Shrinkage: Stein (1956); James & Stein (1961)
Projection shrinkage: Bock (1975); Oman (1982); Casella & Hwang (1987)
Unbiased risk estimate: Mallows (1973); Stein (1973, 1981); Berger (1985); Claeskens & Hjort (2003); Hansen (2010)
Asymptotics for shrinkage: Hansen (2016)
Uniform inference: Andrews et al. (2011); McCloskey (2015)
Post-regularization inference: Chernozhukov et al. (2015)

Outline
1. Shrinkage estimators and Unbiased Risk Estimate
2. Testing
3. Confidence sets (and simulations)
4. Uniform asymptotic validity
5. Summary and next steps

Gaussian location model
For now, consider the finite-sample Gaussian location model
$$\hat\beta \sim N_n(\beta, \Sigma).$$
$\beta \in \mathbb{R}^n$ unknown.
$\Sigma$ symmetric p.d. and known.
Will later consider an asymptotic framework for which the Gaussian model is the right limit experiment. Plug in consistent estimator $\hat\Sigma$.

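General shrinkage estimator class
$$\hat\beta_{M,W}(\lambda) = \operatorname{argmin}_{\beta \in \mathbb{R}^n} \{ \|\beta - \hat\beta\|_W^2 + \lambda \|M\beta\|^2 \} = \Theta_{M,W}(\lambda)\hat\beta,$$
$$\Theta_{M,W}(\lambda) = (I_n + \lambda W^{-1} M'M)^{-1}.$$
$M \in \mathbb{R}^{m \times n}$, $W \in \mathbb{R}^{n \times n}$ symmetric p.d.
Example: second-difference matrix, which penalizes jaggedness.
$$M = \begin{pmatrix} 1 & -2 & 1 & & & \\ & 1 & -2 & 1 & & \\ & & \ddots & \ddots & \ddots & \\ & & & 1 & -2 & 1 \end{pmatrix} \in \mathbb{R}^{(n-2) \times n}.$$
Whittaker (1923); Shiller (1972); Hodrick & Prescott (1981); Wahba (1990)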
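As a rough illustration of the generalized ridge formula on this slide, the following NumPy sketch builds the second-difference penalty matrix and applies $\Theta_{M,W}(\lambda) = (I_n + \lambda W^{-1}M'M)^{-1}$ to a preliminary estimate. The function names (`second_difference_matrix`, `theta`, `ridge_estimate`) are illustrative assumptions and do not come from the slides.

```python
import numpy as np

def second_difference_matrix(n):
    """(n-2) x n matrix whose rows contain the pattern (1, -2, 1)."""
    M = np.zeros((n - 2, n))
    for i in range(n - 2):
        M[i, i:i + 3] = [1.0, -2.0, 1.0]
    return M

def theta(lam, M, W):
    """Shrinkage matrix (I_n + lam * W^{-1} M'M)^{-1}."""
    n = W.shape[0]
    return np.linalg.inv(np.eye(n) + lam * np.linalg.solve(W, M.T @ M))

def ridge_estimate(beta_hat, lam, M, W):
    """Generalized ridge estimate Theta(lam) @ beta_hat."""
    return theta(lam, M, W) @ beta_hat
```

A vector that lies on a straight line has zero second differences, so under this sketch it is left unchanged for every value of `lam`; only the jagged component is shrunk.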

[Figure: response in basis points (vertical axis) against horizon in months, 0 to 48 (horizontal axis). $y_t$: GZ excess bond premium. $x_t$: high-freq. FFF shock. Controls: 2 lags of $y_t$, $x_t$, log(cpi), log(ip), 1yrTreas. Sample: 1991-2012.]

Projection shrinkage
Shrinkage is particularly tractable when $W = I_n$ and $M = P \in \mathbb{R}^{n \times n}$ is an orthogonal projection matrix: $P = P' = P^2$.
Projection shrinkage towards the linear subspace $\operatorname{span}(I_n - P)$. Stein (1956); Oman (1982a,b); Bock (1985); Casella & Hwang (1987)
$$\hat\beta_P(\lambda) = \operatorname{argmin}_{\beta \in \mathbb{R}^n} \{ \|\beta - \hat\beta\|^2 + \lambda \|P\beta\|^2 \} = \frac{1}{1+\lambda} P\hat\beta + (I_n - P)\hat\beta.$$
Example: $I_n - P$ = projection matrix from regression onto basis functions.
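The closed form for projection shrinkage can be checked directly from the general formula $\Theta_{M,W}(\lambda) = (I_n + \lambda W^{-1}M'M)^{-1}$ with $W = I_n$ and $M = P$. The short verification below is a sketch that uses only $P = P' = P^2$:

```latex
\begin{align*}
(I_n + \lambda P)\Big[(I_n - P) + \tfrac{1}{1+\lambda} P\Big]
  &= (I_n - P) + \tfrac{1}{1+\lambda} P
     + \lambda P (I_n - P) + \tfrac{\lambda}{1+\lambda} P^2 \\
  &= (I_n - P) + \tfrac{1}{1+\lambda} P + 0 + \tfrac{\lambda}{1+\lambda} P \\
  &= I_n ,
\end{align*}
% hence (I_n + \lambda P)^{-1} = (I_n - P) + \tfrac{1}{1+\lambda} P .
```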

[Figure: response in basis points (vertical axis) against horizon in months, 0 to 48 (horizontal axis). $y_t$: GZ excess bond premium. $x_t$: high-freq. FFF shock. Controls: 2 lags of $y_t$, $x_t$, log(cpi), log(ip), 1yrTreas. Sample: 1991-2012.]

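Unbiased Risk Estimate
MSE risk criterion:
$$R_{M,W}(\lambda) = E\big( \|\hat\beta_{M,W}(\lambda) - \beta\|_W^2 \big).$$
Unbiased Risk Estimate (URE): Mallows (1973); Stein (1973, 1981); Berger (1985); Hansen (2010)
$$\hat R_{M,W}(\lambda) = \|\hat\beta_{M,W}(\lambda) - \hat\beta\|_W^2 + 2\operatorname{tr}\{W \Theta_{M,W}(\lambda) \Sigma\}.$$
If $\operatorname{rk}(M) = m$ or $M = P$, the URE is strictly convex in $\lambda/(1+\lambda)$.
Define $\hat\lambda_{M,W} = \operatorname{argmin}_{\lambda \geq 0} \hat R_{M,W}(\lambda)$. May equal $\infty$.
$\lim_{\lambda \to \infty} \hat\beta_{M,W}(\lambda)$ is well defined if $M$ has full rank or is a projection.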
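A minimal sketch of how the URE could be evaluated and minimized numerically is given below. It assumes a plain grid search over $x = \lambda/(1+\lambda) \in [0, 1)$, which is a choice made here for illustration rather than the method used in the slides; the names `ure` and `ure_minimizer` are hypothetical, and the grid does not cover the case $\hat\lambda = \infty$.

```python
import numpy as np

def ure(lam, beta_hat, M, W, Sigma):
    """URE: ||Theta b - b||_W^2 + 2 tr(W Theta Sigma), with b = beta_hat."""
    n = len(beta_hat)
    Theta = np.linalg.inv(np.eye(n) + lam * np.linalg.solve(W, M.T @ M))
    resid = Theta @ beta_hat - beta_hat
    return resid @ W @ resid + 2.0 * np.trace(W @ Theta @ Sigma)

def ure_minimizer(beta_hat, M, W, Sigma, num=2001):
    """Grid search over x = lam / (1 + lam); lam = infinity is not covered."""
    xs = np.linspace(0.0, 0.999, num)
    lams = xs / (1.0 - xs)
    values = [ure(lam, beta_hat, M, W, Sigma) for lam in lams]
    return lams[int(np.argmin(values))]
```

Searching over $x$ rather than $\lambda$ keeps the grid bounded and matches the parametrization in which the slide states convexity.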

[Figure: estimated MSE (normalized), $\hat R_P(x/(1-x))$ for $x \in [0, 1)$, plotted against $\lambda/(1+\lambda)$. $y_t$: GZ excess bond premium. $x_t$: high-freq. FFF shock. Controls: 2 lags of $y_t$, $x_t$, log(cpi), log(ip), 1yrTreas. Sample: 1991-2012.]


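Optimal projection shrinkage
For projection shrinkage, can minimize the URE in closed form:
$$\hat\beta_P(\hat\lambda_P) = \left( 1 - \frac{\operatorname{tr}(\Sigma_P)}{\|P\hat\beta\|^2} \right)_+ P\hat\beta + (I_n - P)\hat\beta, \qquad \Sigma_P = P \Sigma P.$$
James-Stein shrinkage towards linear subspace. Stein (1956); James & Stein (1961); Oman (1982a,b); Bock (1985)
Proposition (Hansen, 2016): If $\operatorname{tr}(\Sigma_P) > 4\rho(\Sigma_P)$,
$$E_\beta\big( \|\hat\beta_P(\hat\lambda_P) - \beta\|^2 \big) < E_\beta\big( \|\hat\beta - \beta\|^2 \big) \quad \text{for all } \beta.$$
Necessary condition: $\operatorname{rk}(P) > 4$. E.g., if $I_n - P$ is the projection onto $p$ basis functions, then need $n > p + 4$.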
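The closed-form rule can be sketched in a few lines of NumPy. The example assumes $W = I_n$ and takes $I_n - P$ to be the projection onto a quadratic polynomial basis; the helper names `quadratic_residual_projection` and `js_projection_shrinkage` are illustrative and not taken from the slides.

```python
import numpy as np

def quadratic_residual_projection(n):
    """P = I - X (X'X)^{-1} X' for a quadratic polynomial basis X on [0, 1]."""
    X = np.vander(np.linspace(0.0, 1.0, n), 3)
    return np.eye(n) - X @ np.linalg.pinv(X)

def js_projection_shrinkage(beta_hat, P, Sigma):
    """Return the URE-optimal projection shrinkage estimate and its factor.

    The factor is (1 - tr(P Sigma P) / ||P beta_hat||^2)_+ .
    Assumes P @ beta_hat is not exactly zero.
    """
    Pb = P @ beta_hat
    factor = max(0.0, 1.0 - np.trace(P @ Sigma @ P) / (Pb @ Pb))
    return factor * Pb + (beta_hat - Pb), factor
```

The component of `beta_hat` inside the target subspace, `(I - P) @ beta_hat`, passes through unchanged; only the part orthogonal to the polynomial basis is scaled down.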


Hypothesis testing in shrinkage applications
$$H_0\colon R\beta = b, \qquad H_1\colon R\beta \neq b,$$
with $R \in \mathbb{R}^{r \times n}$ of full row rank.
No UMP test exists.
The usual Wald test is UMP unbiased/invariant and admissible.
If we're already using a shrinkage point estimator, we might want a hypothesis test tied to this estimator as well. Obtain CS by inversion.
My proposed test is biased and noninvariant, so it may achieve higher power than the usual Wald test for some DGPs.

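Empirical Bayes quasi-likelihood ratio test
Base hypothesis test on (negative) quasi-log-likelihood
$$\hat L_{M,W}(\beta) = \|\beta - \hat\beta\|_W^2 + \hat\lambda_{M,W} \|M\beta\|^2.$$
Empirical Bayes (random effects) interpretation:
$$\beta \mid \text{data} \sim N\big( \hat\beta_{M,W}(\hat\lambda_{M,W}),\ (W + \hat\lambda_{M,W} M'M)^{-1} \big).$$
QLR test statistic of $R\beta = b$:
$$\min_{\beta \colon R\beta = b} \hat L_{M,W}(\beta) - \min_{\beta} \hat L_{M,W}(\beta) = \big\| R\hat\beta_{M,W}(\hat\lambda_{M,W}) - b \big\|^2_{(R(W + \hat\lambda_{M,W} M'M)^{-1} R')^{-1}}.$$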
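To make the QLR statistic concrete, here is a small NumPy sketch that evaluates the weighted quadratic form on the right-hand side, together with the quasi-log-likelihood itself so that the identity with the constrained-minus-unconstrained minimum can be checked numerically. The function names are assumptions for illustration, and the sketch requires a finite `lam_hat`.

```python
import numpy as np

def quasi_loglik(beta, beta_hat, lam_hat, M, W):
    """Negative quasi-log-likelihood ||beta - beta_hat||_W^2 + lam ||M beta||^2."""
    d = beta - beta_hat
    Mb = M @ beta
    return float(d @ W @ d + lam_hat * (Mb @ Mb))

def qlr_statistic(beta_shrunk, lam_hat, b, R, M, W):
    """Quadratic form in R beta_shrunk - b, weighted by (R A^{-1} R')^{-1},
    where A = W + lam_hat * M'M and beta_shrunk minimizes quasi_loglik."""
    A = W + lam_hat * (M.T @ M)
    V = R @ np.linalg.solve(A, R.T)
    d = R @ beta_shrunk - b
    return float(d @ np.linalg.solve(V, d))
```

Because the quasi-log-likelihood is quadratic in $\beta$ with curvature matrix $W + \hat\lambda M'M$, the gap between its constrained and unconstrained minima reduces to this single quadratic form.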


Null distribution impractical
$$LR_{M,W}(b) = \big\| R\hat\beta_{M,W}(\hat\lambda_{M,W}) - b \big\|^2_{(R(W + \hat\lambda_{M,W} M'M)^{-1} R')^{-1}}$$
Assume $\operatorname{Var}(RZ \mid MZ)$ is nonsingular, $Z \sim N_n(0, W^{-1})$. Then LR is well defined even when $\hat\lambda_{M,W} = \infty$. Holds in many cases.
If $\operatorname{Var}(RZ \mid MZ)$ is singular, can use the ad hoc LR statistic
$$LR_{M,W}(b) = \big\| R\hat\beta_{M,W}(\hat\lambda_{M,W}) - b \big\|^2_{(R W^{-1} R')^{-1}}.$$
Practical problem: The null distribution of the LR statistic depends on the entire $n$-dimensional parameter vector $\beta$.
Proposed solution: Condition on a sufficient statistic for the $n - r$ nuisance parameters. Andrews & Mikusheva (2016)

Sufficient statistic for nuisance parameters
Define $\zeta = \Sigma R'(R \Sigma R')^{-1} \in \mathbb{R}^{n \times r}$ and $P = \zeta R \in \mathbb{R}^{n \times n}$.
The statistic $\hat\nu = (I_n - P)\hat\beta$ is S-ancillary with respect to $R\beta$:
$$\hat\beta \mid \hat\nu \sim F_{R\beta, \Sigma}, \qquad \hat\nu \sim F_{(I_n - P)\beta, \Sigma}.$$
It would be uncontroversial to condition on $\hat\nu$ in the absence of prior information linking $R\beta$ and $(I_n - P)\beta$.
In practice, the prior information about $M\beta$ may not substantially constrain the relationship between $R\beta$ and $(I_n - P)\beta$. Then conditioning wastes little information. Severini (1995)
I condition on $\hat\nu$.

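Critical value by simulation
The conditional QLR test rejects $H_0$ if $LR_{M,W}(b) > q_{1-\alpha,M,W}(b, \hat\nu)$.
Conditional critical value given $\hat\nu = \nu$:
$$q_{1-\alpha,M,W}(b, \nu) = \operatorname{quantile}_{1-\alpha}\Big( \big\| R\beta(\lambda(U); U) - b \big\|^2_{(R(W + \lambda(U) M'M)^{-1} R')^{-1}} \Big),$$
where $U \sim N_r(b, R \Sigma R')$,
$$\beta(\lambda; U) = \Theta_{M,W}(\lambda)(\zeta U + \nu) \quad \text{for all } \lambda \geq 0,$$
$$\lambda(U) = \operatorname{argmin}_{\lambda \geq 0} \big\{ \|\beta(\lambda; U) - (\zeta U + \nu)\|_W^2 + 2\operatorname{tr}(W \Theta_{M,W}(\lambda) \Sigma) \big\}.$$
By design, the conditional (and thus unconditional) probability of not rejecting a true $H_0$ is $1 - \alpha$, i.e., the test has size $\alpha$.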
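The simulation recipe above can be sketched as follows for the special case of projection shrinkage with $W = I_n$, where the URE minimizer is available in closed form and each draw is cheap. This is an illustrative sketch under those assumptions, not the author's implementation; `cqlr_projection`, `proj`, `draws`, and `seed` are hypothetical names, and `proj` denotes the shrinkage projection matrix (not $\zeta R$).

```python
import numpy as np

def cqlr_projection(beta_hat, b, R, proj, Sigma, alpha=0.1, draws=2000, seed=0):
    """Conditional QLR test of R beta = b for projection shrinkage, W = I.

    Returns (test statistic, simulated conditional critical value, reject flag).
    """
    rng = np.random.default_rng(seed)
    n = len(beta_hat)
    RSR = R @ Sigma @ R.T
    zeta = Sigma @ R.T @ np.linalg.inv(RSR)
    nu = beta_hat - zeta @ (R @ beta_hat)   # conditioning statistic
    tr_SP = np.trace(proj @ Sigma @ proj)

    def lr(bh):
        Pb = proj @ bh
        f = max(0.0, 1.0 - tr_SP / (Pb @ Pb))        # f = 1 / (1 + lambda_hat)
        shrunk = f * Pb + (bh - Pb)
        # (I + lambda_hat * proj)^{-1} = (I - proj) + f * proj
        V = R @ ((np.eye(n) - proj) + f * proj) @ R.T
        d = R @ shrunk - b
        return float(d @ np.linalg.solve(V, d))

    stat = lr(beta_hat)
    U = rng.multivariate_normal(b, RSR, size=draws)  # draws of R beta_hat under H0
    sims = np.array([lr(zeta @ u + nu) for u in U])
    crit = float(np.quantile(sims, 1 - alpha))
    return stat, crit, bool(stat > crit)
```

Each simulated draw rebuilds a full estimate `zeta @ u + nu` with the observed `nu` held fixed, so the data-dependent shrinkage factor is recomputed inside every draw rather than treated as known.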


Confidence set by test inversion
Invert the CQLR test to obtain a CS for $b = R\beta$:
$$\hat C_{M,W} = \big\{ b \in \mathbb{R}^r \colon LR_{M,W}(b) \leq q_{1-\alpha,M,W}(b, \hat\nu) \big\}.$$
Do this by grid search. Simulate the quantile at each point.
Feasible in one or two dimensions (projection shrinkage is fast).
If $M$ has full rank or is a projection, can compute a simple, finite upper bound on the critical value.
$\hat C_{M,W}$ is contained in a bounded ellipsoid centered at $R\hat\beta_{M,W}(\hat\lambda_{M,W})$. Limits the grid search.

Properties of shrinkage confidence set
1. $\hat C_{M,W}$ always contains the shrinkage point estimate.
2. Generally not symmetric around the point estimate.
3. Empirical Bayes intuition: CS should have small volume for DGPs where the shrinkage estimator has low MSE.
4. Appears to not always be convex in simulations.
5. Converges a.s. to the usual Wald ellipsoid as $\|M\beta\| \to \infty$, $M$ fixed.
Appears difficult to characterize expected volume. Even for projection shrinkage, conditional power of the CQLR test depends on 6 parameters.

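Simulation study of confidence intervals
$$\hat\beta \sim N_n(\beta, \Sigma), \qquad \beta_i = \begin{cases} 1 - \frac{i-1}{n-1} & \text{if } K = 0, \\ \sin\big( 2\pi K \frac{i-1}{n-1} \big) & \text{if } K > 0, \end{cases}$$
$$\Sigma_{ij} = \sigma_i \sigma_j \kappa^{|i-j|}, \qquad \sigma_i = \sigma_0 \Big( 1 + (i-1)\frac{\phi - 1}{n-1} \Big).$$
Consider projection shrinkage toward a quadratic polynomial.
Lower bound on expected length relative to Wald CI: Pratt (1961)
$$\frac{(1-\alpha)\Phi^{-1}(1-\alpha) + (2\pi)^{-1/2} e^{-\frac{1}{2}(\Phi^{-1}(1-\alpha))^2}}{\Phi^{-1}(1-\alpha/2)} \approx 0.808 \quad \text{for } \alpha = 0.1.$$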
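The numerical value of the Pratt (1961) lower bound quoted on this slide can be reproduced with the Python standard library. The function name `pratt_lower_bound` is an assumption for illustration; the formula is the one stated above.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def pratt_lower_bound(alpha):
    """Lower bound on expected CI length relative to the two-sided Wald CI."""
    z_one_sided = NormalDist().inv_cdf(1 - alpha)
    z_two_sided = NormalDist().inv_cdf(1 - alpha / 2)
    normal_pdf = exp(-0.5 * z_one_sided ** 2) / sqrt(2 * pi)
    return ((1 - alpha) * z_one_sided + normal_pdf) / z_two_sided

print(round(pratt_lower_bound(0.1), 3))
```

For $\alpha = 0.1$ this evaluates to roughly 0.808, the benchmark against which the relative lengths in the simulation table are read.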

                                 MSE of $\hat\beta(\hat\lambda)$    Length of $\hat C$
 n    K     κ     σ_0    ϕ      Tot     1st     Mid       1st     Mid
10    0.5   0.5   0.25   1      0.63    0.95    0.56      0.97    0.85
25    0.5   0.5   0.25   1      0.34    0.69    0.29      0.88    0.86
50    0.5   0.5   0.25   1      0.19    0.46    0.16      0.83    0.88
25    0     0.5   0.25   1      0.34    0.68    0.29      0.87    0.86
25    1     0.5   0.25   1      0.93    1.29    0.77      1.10    0.88
25    2     0.5   0.25   1      0.96    0.93    0.86      0.98    0.90
25    0.5   0     0.25   1      0.16    0.35    0.13      0.83    0.84
25    0.5   0.9   0.25   1      0.81    1.11    0.76      1.05    0.91
25    0.5   0.5   0.5    1      0.34    0.66    0.28      0.88    0.86
25    0.5   0.5   0.25   3      0.35    1.19    0.30      0.96    0.85
MSE relative to $\hat\beta$, average length relative to Wald. Level = 90%. 1st = $\beta_1$, Mid = $\beta_{1+[n/2]}$.

Takeaways from simulation
$\beta_{1+[n/2]}$: Expected length of CI close to performance limit.
$\beta_1$: Expected length competitive with Wald CI, but sometimes slightly wider. Intuition: Fewer relevant parameters to average across.
Shrinkage works less well when...
1. $n$ is small.
2. The shrinkage hypothesis $M\beta = 0$ is neither approximately true nor dramatically false.
3. Correlations are high.
4. The variance of the MLE of the nuisance parameters is large relative to the variance of the MLE of the parameter of interest.

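Empirical Bayes HPD set
$$\hat L_{M,W}(\beta) = \|\beta - \hat\beta\|_W^2 + \hat\lambda_{M,W} \|M\beta\|^2,$$
$$\beta \mid \text{data} \sim N\big( \hat\beta_{M,W}(\hat\lambda_{M,W}),\ (W + \hat\lambda_{M,W} M'M)^{-1} \big).$$
Empirical Bayes $1 - \alpha$ Highest Posterior Density set for $R\beta$:
$$\hat C_{EB} = \big\{ b \in \mathbb{R}^r \colon LR_{M,W}(b) \leq \chi^2_{r,1-\alpha} \big\}.$$
Like the shrinkage CS, but with a non-random critical value.
Doesn't control frequentist coverage.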

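Minimum coverage discrepancy with EB HPD set
Symmetric set difference: $A \,\triangle\, B = (A \cup B) \setminus (A \cap B)$.
Proposition (following Andrews & Mikusheva, 2016)
Let $C$ be any similar confidence set for $R\beta$ (like $\hat C_{M,W}$):
$$P_\beta( R\beta \in C ) = 1 - \alpha \quad \text{for all } \beta \in \mathbb{R}^n.$$
Then
$$P_\beta\big( R\beta \in \hat C_{M,W} \,\triangle\, \hat C_{EB} \big) \leq P_\beta\big( R\beta \in C \,\triangle\, \hat C_{EB} \big) \quad \text{for all } \beta \in \mathbb{R}^n.$$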


Uniform asymptotic size control
The CQLR test achieves uniform asymptotic size control, provided $\hat\beta$ is uniformly asymptotically normal and $\hat\Sigma$ is uniformly consistent for $\Sigma$.
Uniform frequentist validity stands in stark contrast to other approaches.
Undersmoothing: Pretend $\lambda$ is small, ignore the bias of the shrinkage estimator as well as the variability in $\hat\lambda$.
Switching rule: Use the Wald SE if $\|M\hat\beta\| > c$, otherwise use asymptotics under the assumption $M\beta = 0$.
Random effects: Treat the random effects assumption as part of the DGP rather than just a prior. Size control with respect to the random effects distribution.

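Assumption: Preliminary estimator well-behaved
Assumption
Define $\mathcal{S} = \{ A \in S^n_+ \colon \underline{c} \leq 1/\rho(A^{-1}) \leq \rho(A) \leq \overline{c} \}$ for fixed $\underline{c}, \overline{c} > 0$.
The distribution of the data $F_T$ for sample size $T$ is indexed by three parameters $\beta \in B \subset \mathbb{R}^n$, $\Sigma \in \mathcal{S}$, and $\gamma \in \Gamma$.
The estimators $(\hat\beta, \hat\Sigma) \in \mathbb{R}^n \times S^n_+$ satisfy the following: For all sequences $\{\beta_T, \Sigma_T, \gamma_T\}_{T \geq 1} \in B \times \mathcal{S} \times \Gamma$ and all subsequences $\{k_T\}_{T \geq 1}$ of $\{T\}_{T \geq 1}$,
$$\sqrt{k_T}\, \hat\Sigma^{-1/2} (\hat\beta - \beta_{k_T}) \xrightarrow{d} N_n(0, I_n), \qquad (\hat\Sigma - \Sigma_{k_T}) \xrightarrow{p} 0, \qquad \text{as } T \to \infty,$$
with both limits taken under $F_{k_T}(\beta_{k_T}, \Sigma_{k_T}, \gamma_{k_T})$.
$S^n_+$ = set of symmetric positive definite $n \times n$ matrices.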


Shrinkage test is uniformly valid
Let $\widehat{LR}$ and $\hat q_{1-\alpha}$ denote the CQLR test statistic and quantile obtained by plugging in $T^{-1}\hat\Sigma$ in place of $\Sigma$. (Suppress $M$, $W$.)
Proposition
Let the previous assumption hold. Assume either $\operatorname{rk}(M) = m$ or $M = P$. Assume also $\operatorname{Var}(RZ \mid MZ)$ is nonsingular, $Z \sim N_n(0, W^{-1})$. Then
$$\liminf_{T \to \infty}\ \inf_{(\beta, \Sigma, \gamma) \in \mathbb{R}^n \times \mathcal{S} \times \Gamma} \operatorname{Prob}_{F_T(\beta, \Sigma, \gamma)}\big( \widehat{LR}(R\beta) \leq \hat q_{1-\alpha}(R\beta, \hat\nu) \big) = 1 - \alpha.$$
Caveat: I have only written down the full proof for projection shrinkage. I believe I have the arguments worked out for the general case.
Proof idea: Consider drifting parameters $\beta_T$...
1. If $\sqrt{T}\,\|M\beta_T\| \to \infty$, we converge to the non-shrinkage case.
2. If $\sqrt{T}\,\|M\beta_T\|$ is bounded, we're in the Gaussian model in the limit.


Summary
Considered setting where a generalized ridge regression point estimator is of interest: smoothing, shrinking toward average, penalization, etc.
Proposed conditional QLR test based on the same quasi-log-likelihood as the shrinkage point estimator.
Exact conditional size in the Gaussian location model.
Asymptotic uniform size control more generally.
Shrinkage confidence set by test inversion.
Contains the shrinkage point estimate.
Minimum coverage discrepancy with EB HPD set among similar CSs.
Computationally feasible in 1-2 dimensions. Projection shrinkage is fast.
Promising simulation evidence.

Next steps
More simulation evidence. Comparison of 2-D ellipse with infeasible optimum.
Empirics: impulse responses, MIDAS, exchangeable parameters,...?
Analytical/low-dimensional power/volume comparisons. Probably only feasible for special cases, e.g., $\Sigma = I_n$.

Thank you

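URE captures bias/variance tradeoff
$W = I_n$ for simplicity.
Risk decomposition: Claeskens & Hjort (2003)
$$R_{M,I_n}(\lambda) = \underbrace{\operatorname{tr}\big\{ [I_n - \Theta_{M,I_n}(\lambda)]^2 \beta\beta' \big\}}_{\text{bias squared}} + \underbrace{\operatorname{tr}\big\{ \Theta_{M,I_n}(\lambda)^2 \Sigma \big\}}_{\text{variance}}.$$
Unbiased estimate: $\beta\beta' = E(\hat\beta\hat\beta') - \Sigma$.
Plugging $\hat\beta\hat\beta' - \Sigma$ in for $\beta\beta'$ gives an unbiased estimate of $R_{M,I_n}(\lambda)$:
$$\operatorname{tr}\big\{ [I_n - \Theta_{M,I_n}(\lambda)]^2 (\hat\beta\hat\beta' - \Sigma) \big\} + \operatorname{tr}\big\{ \Theta_{M,I_n}(\lambda)^2 \Sigma \big\} = \hat R_{M,I_n}(\lambda) - \operatorname{tr}(\Sigma).$$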
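The plug-in identity on this slide is purely algebraic, so it can be checked numerically for any symmetric shrinkage matrix. The sketch below assumes $W = I_n$ (so that $\Theta$ is symmetric); the function names `risk_plug_in` and `ure_identity_weight` are illustrative.

```python
import numpy as np

def risk_plug_in(beta_hat, Theta, Sigma):
    """tr{(I - Theta)^2 (b b' - Sigma)} + tr{Theta^2 Sigma}, with b = beta_hat."""
    n = len(beta_hat)
    B = np.eye(n) - Theta
    bias_part = np.trace(B @ B @ (np.outer(beta_hat, beta_hat) - Sigma))
    return bias_part + np.trace(Theta @ Theta @ Sigma)

def ure_identity_weight(beta_hat, Theta, Sigma):
    """URE with W = I: ||Theta b - b||^2 + 2 tr(Theta Sigma)."""
    resid = Theta @ beta_hat - beta_hat
    return resid @ resid + 2.0 * np.trace(Theta @ Sigma)
```

Under these assumptions, `risk_plug_in(...)` and `ure_identity_weight(...) - np.trace(Sigma)` agree up to floating-point error, which is why minimizing the URE is equivalent to minimizing the plug-in risk estimate.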

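Bound on critical value
Write $V(\lambda) = R(W + \lambda M'M)^{-1}R'$, so that $LR_{M,W}(R\beta) = \|R\hat\beta_{M,W}(\hat\lambda_{M,W}) - R\beta\|^2_{V(\hat\lambda)^{-1}}$.
Triangle inequality:
$$\sqrt{LR_{M,W}(R\beta)} \leq \big\| R(\hat\beta_{M,W}(\hat\lambda_{M,W}) - \hat\beta) \big\|_{V(\hat\lambda)^{-1}} + \big\| R(\hat\beta - \beta) \big\|_{V(\hat\lambda)^{-1}}.$$
Let $Z \sim N_n(0, W^{-1})$. For any $\beta \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$ symmetric p.d.,
$$\big\| R(\beta - \hat\beta) \big\|^2_{V(\hat\lambda)^{-1}} \leq \|\beta - \hat\beta\|^2_A \ \rho\big( R A^{-1} R' \operatorname{Var}(RZ \mid MZ)^{-1} \big).$$
Since $\hat R_{M,W}(\hat\lambda_{M,W}) \leq \hat R_{M,W}(0)$,
$$\big\| \hat\beta_{M,W}(\hat\lambda_{M,W}) - \hat\beta \big\|^2_W \leq 2\operatorname{tr}\big\{ M \Sigma M' (M W^{-1} M')^{-1} \big\}.$$
Under the null, $\|R(\hat\beta - \beta)\|^2_{(R \Sigma R')^{-1}} \sim \chi^2(r)$.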

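Uniform confidence band
Supremum test statistic of the joint null hypothesis on all $\beta_i$, $i = 1, \dots, n$:
$$\widehat{SLR}_{M,W}(\beta) = \sup_{i = 1, \dots, n} \frac{\big| \hat\beta_{i,M,W}(\hat\lambda_{M,W}) - \beta_i \big|}{\sqrt{e_i'(W + \hat\lambda_{M,W} M'M)^{-1} e_i}}.$$
Simulate the null critical value $q_{1-\alpha,M,W}(\beta)$ for any $\beta$.
Simultaneous confidence band: rectangular envelope of the inverted test.
$$C_{M,W} = \prod_{i=1}^{n} \Big[ \inf_{\beta \colon \widehat{SLR}(\beta) \leq q_{1-\alpha}(\beta)} \beta_i,\ \sup_{\beta \colon \widehat{SLR}(\beta) \leq q_{1-\alpha}(\beta)} \beta_i \Big].$$
Computationally challenging. Can sample from band. Inoue & Kilian (2016)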


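Coverage discrepancy: proof sketch
Proof reinterprets Andrews & Mikusheva (2016) result on conditional testing.
$$P_\beta\big( R\beta \in C \,\triangle\, \hat C_{EB} \big) = \underbrace{E_\beta\big[ 1(R\beta \in C) \big]}_{= 1 - \alpha} + E_\beta\big[ 1(R\beta \in \hat C_{EB}) \big] - 2 E_\beta\big[ 1(R\beta \in C)\, 1(R\beta \in \hat C_{EB}) \big]$$
$$P_\beta\big( R\beta \in C \,\triangle\, \hat C_{EB} \big) - P_\beta\big( R\beta \in \hat C_{M,W} \,\triangle\, \hat C_{EB} \big) = 2 E_\beta\Big[ \big\{ 1(R\beta \in \hat C_{M,W}) - 1(R\beta \in C) \big\}\, 1(R\beta \in \hat C_{EB}) \Big]$$
$$= 2 E_\beta\Big[ \big\{ 1(R\beta \in \hat C_{M,W}) - 1(R\beta \in C) \big\}\, 1\big( LR_{M,W}(R\beta) \leq \chi^2_{r,1-\alpha} \big) \Big]$$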



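Similarity of $C$ and completeness of the Gaussian family imply conditional similarity (like $\hat C_{M,W}$):
$$P_\beta\big( R\beta \in C \mid \hat\nu \big) = 1 - \alpha.$$
By the law of iterated expectations,
$$E_\beta\Big[ \big\{ 1(R\beta \in \hat C_{M,W}) - 1(R\beta \in C) \big\}\, 1\big( q_{1-\alpha,M,W}(R\beta, \hat\nu) \leq \chi^2_{r,1-\alpha} \big) \Big] = 0.$$
Hence
$$P_\beta\big( R\beta \in C \,\triangle\, \hat C_{EB} \big) - P_\beta\big( R\beta \in \hat C_{M,W} \,\triangle\, \hat C_{EB} \big) = 2 E_\beta\Big[ \big\{ 1(R\beta \in \hat C_{M,W}) - 1(R\beta \in C) \big\} \big\{ 1\big( LR_{M,W}(R\beta) \leq \chi^2_{r,1-\alpha} \big) - 1\big( q_{1-\alpha,M,W}(R\beta, \hat\nu) \leq \chi^2_{r,1-\alpha} \big) \big\} \Big].$$
The variable inside the expectation is a.s. nonnegative by definition of $\hat C_{M,W}$.
Crucial: The EB set inverts the same test statistic, but with a non-random critical value.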