Empirical Model Discovery


1 Automatic Methods for Empirical Model Discovery p.1/151 Empirical Model Discovery David F. Hendry Department of Economics, Oxford University Statistical Science meets Philosophy of Science Conference LSE, June 2010 Research jointly with Jennifer Castle, Jurgen Doornik, Bent Nielsen and Søren Johansen Any sufficiently advanced technology is indistinguishable from magic. Arthur C. Clarke, Profiles of The Future, 1961

2 Introduction Automatic Methods for Empirical Model Discovery p.2/151 Many features of models are not derivable from theory. Need empirical evidence on: which variables are actually relevant, any lagged responses (dynamic reactions), functional forms of connections (non-linearity), structural breaks and unit roots (non-stationarities), simultaneity (or exogeneity), expectations, etc. Analyses must almost always be data-based on the available sample: need to discover what matters empirically: see Phillips (2005) and Doornik and Hendry (2009). Four key steps: (1) define the framework: the target for modelling; (2) embed that target in a general formulation; (3) search for the simplest acceptable representation; (4) evaluate the findings.

3 Automatic methods can outperform David F. Hendry Automatic Methods for Empirical Model Discovery p.3/151 [A] formulation: many candidate variables, long lag lengths, non-linearities, outliers, and parameter shifts [B] selection: handle more variables than observations, yet deliver high success rates by multi-path search [C] estimation: near-unbiased estimates despite selection [D] evaluation: automatically conduct a range of pertinent tests of specification and mis-specification Now routinely create models with N > 200 variables: and select successfully even if only T = 150 observations. This talk aims to explain how we do so: it only appears to be magic! Approach embodied in Autometrics: see Doornik (2009). So let's perform some magic.

4 Basis of approach Automatic Methods for Empirical Model Discovery p.4/151 Data generation process (DGP): joint density of all variables in the economy. Impossible to theorize about accurately or model precisely: too high dimensional and far too non-stationary. Need to reduce to manageable size in the local DGP (LDGP): the DGP in the space of the n variables $\{x_t\}$ being modelled. The theory of reduction explains the derivation of the LDGP: joint density $\mathsf{D}_x(x_1 \ldots x_T \mid \theta)$. Acts as the DGP, but the parameter $\theta$ may be time varying. Knowing the LDGP, can generate look-alikes for the relevant $\{x_t\}$ which only deviate from the actual data by unpredictable noise. Cannot do better than knowing $\mathsf{D}_x(\cdot)$, so the LDGP $\mathsf{D}_x(\cdot)$ is the target for model selection

5 Formulating a good LDGP David F. Hendry Automatic Methods for Empirical Model Discovery p.5/151 Choice of the n variables, $\{x_t\}$, to analyze is fundamental: determines the modelling target LDGP, $\mathsf{D}_x(\cdot)$, and its properties. Prior reasoning, theoretical analysis, previous evidence, historical and institutional knowledge all important. Should be 90+% of the effort in an empirical analysis. Aim to avoid complicated and non-constant LDGPs. Crucial not to omit substantively important variables: a small set $\{x_t\}$ is more likely to do so. Given $\{x_t\}$, have defined the target $\mathsf{D}_x(\cdot)$ in (1). Now embed that target in a general model formulation.

6 Extensions for discovering a good model David F. Hendry Automatic Methods for Empirical Model Discovery p.6/151 Second of four key steps: extensions of {x t } determine how well LDGP is approximated. Four main groups of automatic extensions: additional candidate variables that might be relevant; lag formulation, implementing a sequential factorization; functional form transformations for non-linearity; impulse-indicator saturation (IIS) for parameter non-constancy and data contamination. Must also handle mapping to non-integrated data, conditional factorizations, and simultaneity. Good choices facilitate locating a congruent parsimonious-encompassing model of LDGP. Congruence: matches the evidence on desired criteria; parsimonious: as small a model as viable; encompassing: explains the results of all other models.

7 How to select an empirical model? Automatic Methods for Empirical Model Discovery p.7/151 Many grounds on which to select empirical models: theoretical empirical aesthetic philosophical Within each category, many criteria: theory: generality; internal consistency; invariance empirical: goodness-of-fit; congruence; parsimony; consistency with theory; constancy; encompassing; forecast accuracy aesthetic: elegance; relevance; tell a story philosophical: novelty; excess content; making money...

8 How to resolve? Automatic Methods for Empirical Model Discovery p.8/151 Obvious solution: match all the criteria Cannot do so as: many of the criteria conflict human knowledge is limited economy too complex, high dimensional, heterogeneous data inaccurate, samples small, and non-stationary Every decision about: a theory formulation its implementation its empirical evidential base and its evaluation involves selection Selection is inevitable, unavoidable and ubiquitous

9 Implications Automatic Methods for Empirical Model Discovery p.9/151 Any test + decision = selection, so ubiquitous Most decisions undocumented often not recognized as selection Unfortunately, model selection theory is difficult: all statistics have interdependent distributions altered by every modelling decision Fortunately, computer selection algorithms allow operational studies of alternative strategies Mainly consider Gets: General-to-specific modelling Explain approach and review progress

10 How to judge performance? Automatic Methods for Empirical Model Discovery p.10/151 Many ways to judge success of selection algorithms (A) Maximizing the goodness of fit Traditional criterion for fitting a given model, but does not lead to useful selections (B) Matching a theory-derived specification Widely used, and must work well if the LDGP coincides with the theory model, but otherwise need not (C) Frequency of discovery of the LDGP Overly demanding: may be nearly impossible even if commenced from the LDGP (e.g., when a relevant variable has $|t| < 0.1$) (D) Improved inference about parameters Seek small, accurate uncertainty regions around parameters of interest but the oracle principle is invalid

11 Operational criteria Automatic Methods for Empirical Model Discovery p.11/151 (E) Improved forecasting over other methods Many contenders: other selections, factors, model averages, robust devices...but forecasting is different (F) Works for realistic LDGPs Unclear what those are but many claimed contenders. (G) Relative frequency of recovering LDGP starting from GUM as against starting from LDGP Costs of search additional to commencing from LDGP (H) Operating characteristics match theory Nominal null rejection frequency matches actual; retained parameters of interest unbiasedly estimated (I) Find well-specified undominated model of LDGP Internal criterion algorithm could not do better

12 Which criteria? Automatic Methods for Empirical Model Discovery p.12/151 (G), (H) and (I) are main basis: aim to satisfy all three Two costs of selection: costs of inference and search First inevitable if tests have non-zero null and non-unit rejection frequencies under alternative Applies even if commence from LDGP. Measure costs of inference by RMSE of selecting or conducting inference on LDGP When a GUM nests the LDGP, additional costs of search: calculate by increase in RMSEs for relevant variables when starting from the GUM as against the LDGP, plus those for retained irrelevant variables Also see if Autometrics outperforms other automatic methods: Information Criteria, Step-wise, Lasso...

13 Selecting and evaluating the model David F. Hendry Automatic Methods for Empirical Model Discovery p.13/151 Extensions create the general unrestricted model (GUM). The GUM should nest the LDGP, making it a special case; reductions commence from GUM to locate a specific model. Selection is step (3): search for the simplest acceptable representation. Much of this talk concerns how that selection is done, and checking how well it works. Finally, step (4): evaluate the findings and the selection process. Includes tests of new aspects, such as super exogeneity (essentially causality) for policy, and parameter invariance (constancy across regimes).

14 Route map Automatic Methods for Empirical Model Discovery p.14/151 (1) Discovery (2) Automatic model extension (3) 1-cut model selection (4) Automatic estimation (5) Automatic model selection (6) Model evaluation (7) Excess numbers of variables N > T (8) An empirical example: food expenditure (9) Improving nowcasts Conclusions

15 Discovery Automatic Methods for Empirical Model Discovery p.15/151 Discovery: learning something previously unknown. Cannot know how to discover what is not known: unlikely there is a best way of doing so. Many discoveries have an element of chance: luck: Fleming, penicillin from a dirty petri dish; serendipity: Becquerel, discovery of radioactivity; natural experiment: Dicke, role of gluten in celiac disease; trial and error: Edison, incandescent lamp; brilliant intuition: Faraday, dynamo from electric shock; false theories: Kepler, regular solids for planetary laws; valid theories: Pasteur, germs not spontaneous generation; systematic exploration: Lavoisier, oxygen not phlogiston; careful observation: Harvey, circulation of blood; new instruments: Galileo, moons around Jupiter; self testing: Marshall, ulcers caused by Helicobacter pylori.

16 Common aspects of discovery Automatic Methods for Empirical Model Discovery p.16/151 Theoretical reasoning also frequent: Newton, Einstein. Science is both inductive and deductive. Must distinguish between: the context of discovery, where anything goes, and the context of evaluation, rigorous attempts to refute. Accumulation and consolidation of evidence crucial: data reduction a key attribute of science (think $E = mc^2$). Seven aspects in common to the above examples of discovery. First, theoretical context or framework of ideas. Second, going outside the existing state. Third, search for something. Fourth, recognition of the significance of what is found. Fifth, quantification of what is found. Sixth, evaluating the discovery to ascertain its reality. Seventh, parsimoniously summarize a vast information set.

17 Discovery in economics Automatic Methods for Empirical Model Discovery p.17/151 Discoveries in economics mainly from theory. But all economic theories are: (a) incomplete; (b) incorrect; and (c) mutable. (a) Need strong ceteris paribus assumptions: inappropriate in a non-stationary, evolving world. (b) Consider an economic analysis which suggests: $y = f(x)$ (1) where y depends on n explanatory variables x. Form of $f(\cdot)$ in (1) depends on: utility or loss functions of agents, constraints they face, & information they possess. Analyses arbitrarily assume: a form for $f(\cdot)$, that $f(\cdot)$ is constant, that only x matters, & that the xs are exogenous. Yet must aggregate across heterogeneous individuals whose endowments shift over time, often abruptly.

18 Theory evolves Automatic Methods for Empirical Model Discovery p.18/151 (c) Economic analyses have changed the world, and our understanding: from the invisible hand in Adam Smith's Theory of Moral Sentiments (1759, p.350) onwards, theory has progressed dramatically: key insights into option pricing, auctions and contracts, principal-agent and game theories, trust and moral hazard, asymmetric information, institutions: major impacts on market functioning, industrial, and even political, organization. Imagine imposing 1900s economic theory in empirical research today. Much past applied econometrics research is forgotten: discard the economic theory that it quantified and you discard the associated empirical evidence. Hence fads & fashions, cycles and schools in economics.

19 But the impossible is not possible David F. Hendry Automatic Methods for Empirical Model Discovery p.19/151 Why, sometimes I've believed as many as six impossible things before breakfast. Quote from the White Queen in Lewis Carroll (1899), Through the Looking-Glass, Macmillan. If only it were just 6!

20 Empirical econometrics David F. Hendry Automatic Methods for Empirical Model Discovery p.20/151 To establish truth requires at least these 12 assumptions: 1. correct, comprehensive, & immutable economic theory; 2. correct, complete choice of all relevant variables & lags; 3. validity & relevance of all regressors & instruments; 4. precise functional forms for all variables; 5. absence of hidden dependencies; 6. all expectations formulations correct; 7. all parameters identified, constant over time, & invariant; 8. exact data measurements on every variable; 9. errors are independent & homoscedastic; 10. error distributions constant over time; 11. appropriate estimator at relevant sample sizes; 12. valid and non-distortionary method of model selection. If truth is not on offer what is?

21 Data matters for empirical implementation David F. Hendry Automatic Methods for Empirical Model Discovery p.21/151 Sample of T observations, {y t,x t }: but no theory specification of a unit of time, observations may be contaminated (measurement errors), underlying processes integrated, abrupt shifts induce various forms of breaks. Need to discover what matters empirically. So how to utilize economic analyses efficiently if cannot impose theory empirically? Answer: embed theory specification in vastly more general empirical formulation. Even if truth is not on offer, theory-guided, congruent, parsimonious encompassing models with parameters invariant to relevant policies may be. But not by conventional routes...

22 Classical econometrics: covert discovery David F. Hendry Automatic Methods for Empirical Model Discovery p.22/151 Postulated: $y_t = \beta' x_t + \epsilon_t$, $t = 1, \ldots, T$ (2) Aim to obtain the best estimate of the constant parameters $\beta$, given the n correct variables x, independent of $\{\epsilon_t\}$, and T uncontaminated observations, with $\epsilon_t \sim \mathsf{IID}[0, \sigma_\epsilon^2]$. Need severe testing (Mayo and Spanos, 2006) to discover if the model is mis-specified (see Mayo, 1981). Many tests to discover departures from the assumptions of (2), followed by recipes for fixing them: covert and unstructured empirical model discovery.

23 Model selection: discovering the best model David F. Hendry Automatic Methods for Empirical Model Discovery p.23/151 Start from (2): $y_t = \beta' x_t + \epsilon_t$, $t = 1, \ldots, T$, assuming the N correct initial x & T, and valid conditioning, where x includes many candidate regressors. Aim to discover the subset of relevant variables in $x_t$, then estimate the associated parameters $\beta$. Departures from assumptions often simply ignored: information criteria usually do not even check. May select the best model in the set, yet be a poor approximation to the LDGP.

24 Robust statistics: discovering the best sample David F. Hendry Automatic Methods for Empirical Model Discovery p.24/151 Same start (2), but aim to find a robust estimate of $\beta$ by selecting over T. Worry about data contamination and outliers, so select the sample T where outliers are least in evidence, given the correct set of relevant variables x. Other difficulties still need separate tests, and must be fixed if found. The set of variables is rarely selected jointly with the sample, so the given x is assumed to coincide with the relevant set. Similarly for non-parametric methods: aim to discover the best functional form & distribution assuming correct x, no data contamination, constant $\beta$, etc., all rarely checked.

25 Automatic empirical model discovery David F. Hendry Automatic Methods for Empirical Model Discovery p.25/151 Re-frame empirical modelling as a discovery process: part of a progressive research strategy. Starting from T observations on N > n variables x, aim to find $\beta$ for s lagged functions $g(x_t), \ldots, g(x_{t-s})$ of a subset of n variables x, jointly with T and indicators $1_{\{t=t_i\}}$ for breaks, outliers etc. Final model is a congruent, parsimonious-encompassing representation: $\sum_{i=0}^{s} \lambda_i y_{t-i} = \sum_{i=0}^{s} \beta_i' g(x_{t-i}) + \sum_{j=1}^{m} \gamma_j d_{t,j} + v_t$ (3) where $\lambda_0 = 1$ and $\{v_t\}$ is an innovation process. Embeds the initial $y = f(x)$, but much more general. Globally, learning must be simple to general; but locally, need not be.

26 Implications for automatic methods Automatic Methods for Empirical Model Discovery p.26/151 Same seven stages as for discovery in general. First, theoretical derivation of the relevant set x. Second, going outside current view by automatic creation of a general model from x embedding y = f (x). Third, search by automatic selection to find viable representations: too large for manual labor. Fourth, criteria to recognize when search is completed: congruent parsimonious-encompassing model. Fifth, quantification of the outcome: translated into unbiasedly estimating the resulting model. Sixth, evaluate discovery to check its reality: new data, new tests or new procedures. Can also evaluate the selection process itself. Seventh, summarize vast information set in parsimonious but undominated model.

27 Retaining economic theory insights David F. Hendry Automatic Methods for Empirical Model Discovery p.27/151 Approach is not atheoretic. Theory formulations should be embedded in the GUM, and can be retained without selection. Call such imposed variables 'forcing' variables: this ensures they are retained, but does not guarantee they will be significant. Can also ensure theory-derived signs of the long-run relation are maintained, if not significantly rejected by the evidence. But much observed data variability in economics is due to features absent from most economic theories: which empirical models must handle. Extension of the LDGP candidates, $x_t$, in the GUM allows the theory formulation as a special case, yet protects against contaminating influences (like outliers) absent from theory. Extras can be selected at tight significance levels.

28 Possible economic theory outcomes David F. Hendry Automatic Methods for Empirical Model Discovery p.28/151 1] Theory kept exactly: all aspects significant with anticipated signs, no other variables kept. 2] Theory kept but only part of explanation: all aspects significant with anticipated signs, but other variables also kept as substantively relevant. 3] Theory partially kept: only some aspects significant with anticipated signs, and other variables also kept as substantively relevant. 4] Theory not kept: no aspects significant and other variables do all explanation. Win-win situation: theory kept if valid and complete; yet learn when it is not correct so can automatic model selection work?

29 Route map Automatic Methods for Empirical Model Discovery p.29/151 (1) Discovery (2) Automatic model extension (3) 1-cut model selection (4) Automatic estimation (5) Automatic model selection (6) Model evaluation (7) Excess numbers of variables N > T (8) An empirical example: food expenditure (9) Improving nowcasts Conclusions

30 Extensions outside standard information David F. Hendry Automatic Methods for Empirical Model Discovery p.30/151 Extensions determine how well the LDGP is approximated. Create three extensions automatically: (i) lag formulation to implement sequential factorization; (ii) functional form transformations for non-linearity; (iii) impulse-indicator saturation (IIS) for parameter non-constancy and data contamination. (i) Create s lags $x_t, \ldots, x_{t-s}$ to formulate the general linear model: $y_t = \beta_0 + \sum_{i=1}^{s} \lambda_i y_{t-i} + \sum_{i=1}^{r} \sum_{j=0}^{s} \beta_{i,j} x_{i,t-j} + \epsilon_t$ (4) $x_t$ could also be modelled as a system: $x_t = \gamma + \sum_{j=1}^{s} \Gamma_j x_{t-j} + \epsilon_t$ (5)
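As a concrete illustration of the lag formulation in (4), here is a minimal numpy sketch (the function name and interface are mine, not part of the talk or of Autometrics) that stacks lags 1, ..., s of y and lags 0, ..., s of each candidate regressor into a single design matrix:

```python
import numpy as np

def lag_gum_design(y, X, s):
    """Build the linear part of the GUM in (4): an intercept, lags 1..s of y,
    and lags 0..s of every column of X; the first s rows are lost to lagging."""
    T, r = X.shape
    blocks = [np.ones((T - s, 1))]                                      # intercept
    blocks += [y[s - i:T - i].reshape(-1, 1) for i in range(1, s + 1)]  # y_{t-1}, ..., y_{t-s}
    blocks += [X[s - j:T - j, :] for j in range(0, s + 1)]              # x_t, ..., x_{t-s}
    return y[s:], np.hstack(blocks)

# Example: r = 3 candidate variables and s = 2 lags give 1 + 2 + 3*(2+1) = 12 columns.
rng = np.random.default_rng(1)
y_dep, X_cand = rng.standard_normal(100), rng.standard_normal((100, 3))
y_eff, Z = lag_gum_design(y_dep, X_cand, s=2)
print(Z.shape)   # (98, 12)
```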

31 Automatic non-linear extensions Automatic Methods for Empirical Model Discovery p.31/151 Test for non-linearity in the general linear model by the low-dimensional portmanteau test in Castle and Hendry (2010) (cubics of principal components $z_t$ of the $x_t$). (ii) If reject, create $g(z_t)$, otherwise $g(x_t) = x_t$: presently, implemented general cubics with exponential functions. Number of potential regressors for cubic polynomials is: $M_K = K(K+1)(K+5)/6$. Explosion in the number of terms as $K = r(s+1)$ increases: for example, K = 10 gives $M_K = 275$ and K = 40 gives $M_K = 12{,}300$. Quickly reach a huge $M_K$: but only 3K terms if $z^k_{i,t-j}$, k = 1, 2, 3, are used instead. (Investigating squashing functions, to better approximate non-linearity in economics, suggested by Hal White.)
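The combinatorial explosion is easy to see by evaluating the formula directly (a trivial check, not from the talk):

```python
def cubic_terms(K):
    """Number of regressors in a general cubic polynomial of K variables: K(K+1)(K+5)/6."""
    return K * (K + 1) * (K + 5) // 6

for K in (10, 20, 30, 40):
    print(K, cubic_terms(K), 3 * K)   # full cubic versus only z, z^2, z^3 per principal component
```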

32 Impulse-indicator saturation Automatic Methods for Empirical Model Discovery p.32/151 (iii) To tackle multiple breaks & data contamination (outliers), add T impulse indicators to the candidate set for T observations. Consider $x_i \sim \mathsf{IID}[\mu, \sigma_\epsilon^2]$ for $i = 1, \ldots, T$, where $\mu$ is the parameter of interest. Uncertain of outliers, so add T indicators $1_{\{t=t_i\}}$ to the set of candidate regressors. First, include half of the indicators and record the significant ones: just dummying out T/2 observations for estimating $\mu$. Then omit them, include the other half, and record again. Combine the sub-sample indicators, & select the significant ones. $\alpha T$ indicators are selected on average at significance level $\alpha$. Feasible split-sample impulse-indicator saturation (IIS) algorithm: see Hendry, Johansen and Santos (2008)
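A minimal numpy sketch of the split-half idea for this location model (illustrative only: the IIS implemented in Autometrics uses its multi-block tree search with diagnostics, and all function names here are mine):

```python
import numpy as np
from scipy import stats

def ols_tstats(y, X):
    """OLS estimates and t-statistics for each column of X."""
    T, k = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (T - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

def split_half_iis(y, alpha=0.01):
    """Split-half IIS for y_t = mu + e_t: add half the indicators, keep the significant
    ones, repeat for the other half, then combine and re-select (normal critical values)."""
    T = len(y)
    c = stats.norm.ppf(1 - alpha / 2)
    const = np.ones((T, 1))
    kept = []
    for idx in (np.arange(0, T // 2), np.arange(T // 2, T)):
        D = np.zeros((T, len(idx)))
        D[idx, np.arange(len(idx))] = 1.0
        _, t = ols_tstats(y, np.hstack([const, D]))
        kept += [int(idx[j]) for j in range(len(idx)) if abs(t[1 + j]) > c]
    if kept:                                     # final combined pass over retained indicators
        D = np.zeros((T, len(kept)))
        D[kept, np.arange(len(kept))] = 1.0
        _, t = ols_tstats(y, np.hstack([const, D]))
        kept = [kept[j] for j in range(len(kept)) if abs(t[1 + j]) > c]
    return sorted(kept)

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, 100)
y[75:] += 10.0                                   # 10-SD shift at 0.75T, as in the example below
print(split_half_iis(y))                         # flags the shifted observations 75..99
```

For the 10-SD location shift used in the structural-break example below, the second half-sample pass picks up the shifted observations and the combined pass retains them.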

33 Dynamic generalizations Automatic Methods for Empirical Model Discovery p.33/151 Johansen and Nielsen (2009) extend IIS to both stationary and unit-root autoregressions. When the distribution is symmetric, adding T impulse-indicators to a regression with n variables, coefficient $\beta$ (not selected) and second moment $\Sigma$: $T^{1/2}(\tilde\beta - \beta) \overset{D}{\rightarrow} \mathsf{N}_n[0, \sigma_\epsilon^2 \Sigma^{-1} \Omega_\beta]$ Efficiency of the IIS estimator $\tilde\beta$ with respect to OLS $\hat\beta$, measured by $\Omega_\beta$, depends on $c_\alpha$ and the distribution. Must lose efficiency under the null: but the loss is small: $\alpha T$ indicators retained on average is only 1% of the sample at $\alpha = 1/T$ when T = 100, despite T extra candidates. Potential for major gain under alternatives of breaks and/or data contamination: a variant of robust estimation

34 Efficiency of IIS estimator David F. Hendry Automatic Methods for Empirical Model Discovery p.34/151 [Figure: efficiency of the indicator-saturated estimator, plotted against the critical value c (left panel) and against the tail probability for c (right panel).] Figure from Johansen and Nielsen (2009) shows the outcome for an equal split when $\epsilon_i \sim \mathsf{IID}[\mu, \sigma_\epsilon^2]$. Left panel shows efficiency as a function of $c_\alpha$; right panel as a function of the tail probability $\mathsf{P}(|\epsilon_t| > \sigma_\epsilon c_\alpha)$

35 Structural break example Automatic Methods for Empirical Model Discovery p.35/151 [Figure: Y and fitted values; scaled residuals.] Size of the break is 10 standard errors at 0.75T. There are no outliers in this mis-specified model, as all residuals lie within [-2, 2] SDs: outliers are not the same as structural breaks, so step-wise regression has zero power. Let's see what Autometrics reports

36 Split-sample search in IIS David F. Hendry Automatic Methods for Empirical Model Discovery p.36/151 [Figure: dummies included in the GUM for each block; dummies selected in block 1, block 2 and the final model; final model actual and fitted values.]

37 Specification of GUM Automatic Methods for Empirical Model Discovery p.37/151 Most major formulation decisions now made: which r variables ($z_t$, after transforming $x_t$); their lag lengths (s); functional forms (cubics); structural breaks (any number, anywhere). Leads to the general unrestricted model (GUM): $y_t = \sum_{j=1}^{s} \lambda_j y_{t-j} + \sum_{i=1}^{r} \sum_{j=0}^{s} \beta_{i,j} x_{i,t-j} + \sum_{i=1}^{r} \sum_{j=0}^{s} \rho_{i,j} z_{i,t-j} + \sum_{i=1}^{r} \sum_{j=0}^{s} \theta_{i,j} z_{i,t-j}^2 + \sum_{i=1}^{r} \sum_{j=0}^{s} \gamma_{i,j} z_{i,t-j}^3 + \sum_{i=1}^{T} \delta_i 1_{\{i=t\}} + \epsilon_t$ K = 4r(s+1) + s potential regressors, plus T indicators: includes both factors z and variables x despite collinearity. Bound to have N > T: consider exogeneity later.

38 Route map Automatic Methods for Empirical Model Discovery p.38/151 (1) Discovery (2) Automatic model extension (3) 1-cut model selection (4) Automatic estimation (5) Automatic model selection (6) Model evaluation (7) Excess numbers of variables N > T (8) An empirical example: food expenditure (9) Improving nowcasts Conclusions

39 Model selection Automatic Methods for Empirical Model Discovery p.39/151 To successfully determine what matters and how it enters, all main determinants must be included: omitting key variables adversely affects selected models. Especially forceful issue when data are wide-sense non-stationary: both integrated and not time invariant. Catch-22: have more variables N than observations T, so all cannot be entered from the outset. Requires expanding as well as contracting searches. Have to select: not an option just to estimate. To resolve the conundrum, the analysis proceeds in 5 stages.

40 Five main stages, then evaluation Automatic Methods for Empirical Model Discovery p.40/151 1] 1-cut selection for orthogonal designs with N << T. 2] Selection matters, so consider effects of bias correction on distributions of estimates 3] Compare 1-cut with Autometrics, which works in non-orthogonal models, still with N << T. 4] More variables N than observations T follows as IIS. 5] Multiple breaks, using IIS Having resolved selection, next consider evaluation: 6] Impact of diagnostic testing 7] Role of encompassing in automatic selection 8] Testing exogeneity and invariance

41 Understanding model selection Automatic Methods for Empirical Model Discovery p.41/151 Consider a perfectly orthogonal regression model: $y_t = \sum_{i=1}^{N} \beta_i z_{i,t} + \epsilon_t$ (6) where $\mathsf{E}[z_{i,t} z_{j,t}] = \lambda_{i,i}$ for $i = j$ and $0$ for $i \neq j$, $\epsilon_t \sim \mathsf{IN}[0, \sigma_\epsilon^2]$ and $T \gg N$. Order the N sample $t^2$-statistics testing $\mathsf{H}_0{:}\ \beta_j = 0$: $t^2_{(N)} \geq t^2_{(N-1)} \geq \cdots \geq t^2_{(1)}$ The cut-off m between included and excluded variables is: $t^2_{(m)} \geq c_\alpha^2 > t^2_{(m-1)}$ Larger values retained: all others eliminated. Only one decision needed even for N = 1000: repeated testing does not occur, and goodness of fit is never considered. Maintain average false null retention at one variable by $\alpha \leq 1/N$, with $\alpha$ declining as $T \rightarrow \infty$
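A minimal sketch of the 1-cut rule (the names and the normal approximation to $c_\alpha$ are my own choices, not from the talk):

```python
import numpy as np
from scipy import stats

def one_cut_select(y, X, alpha):
    """1-cut selection for a (near-)orthogonal regression: estimate the full model once,
    then keep every regressor whose squared t-statistic is at least c_alpha^2."""
    T, N = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (T - N)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    t2 = (beta / se) ** 2
    c2 = stats.norm.ppf(1 - alpha / 2) ** 2       # normal approximation to the critical value
    return np.flatnonzero(t2 >= c2), beta, t2      # one decision, no path search
```

With $\alpha \leq 1/N$, the expected number of falsely retained irrelevant variables is about one at most.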

42 Interpretation Automatic Methods for Empirical Model Discovery p.42/151 Path search gives the impression of repeated testing. Confused with selecting from $2^N$ possible models (here $2^{1000} \approx 10^{301}$, an impossible task). We are selecting variables, not models, & only N variables. But selection matters, as only significant outcomes are retained. Sampling variation also entails retaining irrelevant, or missing relevant, variables by chance near the selection margin. Conditional on selecting, estimates are biased away from the origin: but can bias correct as $c_\alpha$ is known. Small efficiency cost under the null for examining many candidate regressors, even N >> T. Almost as good as commencing from the LDGP at the same $c_\alpha$.

43 Monte Carlo simulations for N = 1000 David F. Hendry Automatic Methods for Empirical Model Discovery p.43/151 We now illustrate the above theory by simulating selection of 10 relevant from 1000 candidate variables. DGP is given by: $y_t = \beta_1 x_{1,t} + \cdots + \beta_{10} x_{10,t} + \epsilon_t$, (7) $x_t \sim \mathsf{IN}_{1000}[0, \Omega]$, (8) $\epsilon_t \sim \mathsf{IN}[0, 1]$, (9) where $x_t = (x_{1,t}, \ldots, x_{1000,t})'$. Set $\Omega = \mathsf{I}_{1000}$ for simplicity, keeping regressors fixed between experiments. Use T = 2000 observations. DGP coefficients, $\beta$, and non-centralities, $\psi$, are in Table 1, along with the theoretical powers of t-tests on the individual coefficients.

44 Non-centralities Automatic Methods for Empirical Model Discovery p.44/151 [Table 1: coefficients $\beta_i$, non-centralities $\psi_i$, and theoretical retention probabilities $\mathsf{P}_{\alpha,i}$ at the two significance levels, for $x_1, \ldots, x_{10}$.] GUM contains all 1000 regressors and an intercept: $y_t = \beta_0 + \beta_1 x_{1,t} + \cdots + \beta_{1000} x_{1000,t} + u_t$, $t = 1, \ldots, 2000$. DGP has the first n = 10 variables relevant, so 991 variables are irrelevant in the GUM (with intercept). Report outcomes for $\alpha$ = 1% and 0.1%, with M = 1000 replications, where $1(\cdot)$ is the indicator function.

45 Gauge and potency Automatic Methods for Empirical Model Discovery p.45/151 Let $\tilde\beta_i$ denote the estimate of $\beta_i$ after model selection. Gauge: average retention frequency of irrelevant variables. Potency: average retention frequency of relevant variables. In simulation experiments with M replications: retention rate: $\tilde p_k = \frac{1}{M} \sum_{i=1}^{M} 1(\tilde\beta_{k,i} \neq 0)$, $k = 0, \ldots, N$; potency $= \frac{1}{n} \sum_{k=1}^{n} \tilde p_k$; gauge $= \frac{1}{N-n+1} \left( \tilde p_0 + \sum_{k=n+1}^{N} \tilde p_k \right)$. All retained variables are significant at $c_\alpha$ in 1-cut. But not necessarily the case with automated Gets. Irrelevant variables may be retained because of: (a) diagnostic checking, when a variable is insignificant but its deletion makes a diagnostic test significant, or (b) with encompassing, a variable can be individually insignificant, but not jointly with all variables deleted so far.
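A scaled-down simulation sketch of these definitions under 1-cut selection (the dimensions, seed and non-centralities are illustrative, not the Table 1 values; the talk's experiment uses N = 1000, T = 2000 and M = 1000):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
N, n, T, M, alpha = 100, 10, 300, 200, 0.01
psi = np.linspace(2.0, 6.0, n)                  # illustrative non-centralities
beta_dgp = np.zeros(N)
beta_dgp[:n] = psi / np.sqrt(T)                 # sigma_eps = 1 and near-orthonormal regressors
c2 = stats.norm.ppf(1 - alpha / 2) ** 2

X = rng.standard_normal((T, N))                 # fixed across replications, Omega = I
XtX_inv = np.linalg.inv(X.T @ X)
retained = np.zeros((M, N), dtype=bool)
for m in range(M):
    y = X @ beta_dgp + rng.standard_normal(T)
    b = XtX_inv @ (X.T @ y)
    s2 = (y - X @ b) @ (y - X @ b) / (T - N)
    t2 = b ** 2 / (s2 * np.diag(XtX_inv))
    retained[m] = t2 >= c2                      # 1-cut selection

p_k = retained.mean(axis=0)                     # retention rate of each candidate variable
potency, gauge = p_k[:n].mean(), p_k[n:].mean()   # the slide's gauge also counts the intercept
print(f"gauge = {gauge:.2%} (nominal {alpha:.1%}), potency = {potency:.1%}")
```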

46 Simulation outcomes Automatic Methods for Empirical Model Discovery p.46/151 Simulation gauges and potencies are recorded in Table 2. [Table 2: potency and gauge for 1-cut selection with 1000 variables: at $\alpha$ = 1%, gauge 1.01% and potency 81%; at $\alpha$ = 0.1%, gauge 0.10% and potency 69%.] Gauges not significantly different from the nominal sizes $\alpha$: selection is not over-sized even with N = 1000 variables. Potencies close to average powers even when selecting just n = 10 relevant regressors from 1000 variables.

47 Calculating MSEs Automatic Methods for Empirical Model Discovery p.47/151 Also report MSEs after model selection. $\hat\beta_{k,i}$ is the OLS estimate of the coefficient on $x_{k,t}$ in the GUM for replication i. $\tilde\beta_{k,i}$ is the OLS estimate after model selection, with $\tilde\beta_{k,i} = 0$ when $x_{k,t}$ is not selected in the final model. Calculate the following MSEs: $\mathsf{MSE}_k = M^{-1} \sum_{i=1}^{M} (\hat\beta_{k,i} - \beta_k)^2$, $\mathsf{UMSE}_k = \frac{1}{M} \sum_{i=1}^{M} (\tilde\beta_{k,i} - \beta_k)^2$, $\mathsf{CMSE}_k = \frac{\sum_{i=1}^{M} (\tilde\beta_{k,i} - \beta_k)^2\, 1(\tilde\beta_{k,i} \neq 0)}{\sum_{i=1}^{M} 1(\tilde\beta_{k,i} \neq 0)}$ (set to $\beta_k^2$ if $\sum_{i=1}^{M} 1(\tilde\beta_{k,i} \neq 0) = 0$). Unconditional MSE (UMSE): sets $\tilde\beta_{k,i} = 0$ when a variable is not selected. Conditional MSE (CMSE) is over retained variables only.
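These definitions translate directly into a few lines of numpy (a sketch; the array layout is my own convention):

```python
import numpy as np

def umse_cmse(beta_sel, beta_true):
    """UMSE and CMSE per variable after selection.
    beta_sel:  (M, N) post-selection estimates, equal to 0 where a variable was not retained.
    beta_true: (N,) DGP coefficients."""
    err2 = (beta_sel - beta_true) ** 2
    umse = err2.mean(axis=0)                          # unselected cases count as estimates of 0
    kept = beta_sel != 0
    n_kept = kept.sum(axis=0)
    cmse = np.where(n_kept > 0,
                    (err2 * kept).sum(axis=0) / np.maximum(n_kept, 1),
                    beta_true ** 2)                   # convention when a variable is never retained
    return umse, cmse
```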

48 1-cut selection, N = 1000, n = 10 David F. Hendry Automatic Methods for Empirical Model Discovery p.48/151 [Figure: retention rates of the relevant variables at $\alpha$ = 1% and 0.1%, Monte Carlo versus theory; UMSEs and CMSEs of $\tilde\beta_k$ at 1% and 0.1%.]

49 Simulation MSEs Automatic Methods for Empirical Model Discovery p.49/151 Figure 48 shows that retention rates for individual relevant variables are as expected from the theory. CMSEs are always below UMSEs for relevant variables (bottom graphs in Fig. 48), with the exception of $\tilde\beta_1$ at 0.1%. As a baseline, the UMSE for every estimated coefficient of a variable in the GUM is approximately $\sigma_\epsilon^2/T$ (about 0.0005 here). Selection is generally worse than the GUM for relevant variables in MSE terms: but the model is reduced by about 990 variables on average. Main gains for irrelevant variables, as we will see. First consider the impact of bias-correcting for selecting just those variables where $|t| \geq c_\alpha$

50 Route map Automatic Methods for Empirical Model Discovery p.50/151 (1) Discovery (2) Automatic model extension (3) 1-cut model selection (4) Automatic estimation (5) Automatic model selection (6) Model evaluation (7) Excess numbers of variables N > T (8) An empirical example: food expenditure (9) Improving nowcasts Conclusions

51 Bias corrections Automatic Methods for Empirical Model Discovery p.51/151 Selection matters as only significant variables are retained: so correct the final estimates for selection. Convenient approximation that: $t_{\hat\beta} = \frac{\hat\beta}{\hat\sigma_{\hat\beta}} \approx \frac{\hat\beta}{\sigma_{\hat\beta}} \sim \mathsf{N}\left[\frac{\beta}{\sigma_{\hat\beta}}, 1\right] = \mathsf{N}[\psi, 1]$ when the non-centrality of the t-test is $\psi = \beta/\sigma_{\hat\beta}$. Use a Gaussian approximation (IIS helps ensure): $\phi(w) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2} w^2\right)$ with $\Phi(w) = \int_{-\infty}^{w} \phi(x)\,\mathrm{d}x$. The doubly-truncated distribution's expected t-value is: $\mathsf{E}\left[t_{\hat\beta} \mid |t_{\hat\beta}| > c_\alpha; \psi\right] = \psi^*$ (10) so the observed t-value is an unbiased estimator for $\psi^*$

52 Truncation correction Automatic Methods for Empirical Model Discovery p.52/151 Sample selection induces: $\psi^* = \psi + \frac{\phi(c_\alpha - \psi) - \phi(-c_\alpha - \psi)}{1 - \Phi(c_\alpha - \psi) + \Phi(-c_\alpha - \psi)} = \psi + r(\psi, c_\alpha)$ (11) Correct $\hat\beta$ once $\psi$ is known: for $\beta > 0$, say: $\mathsf{E}\left[\hat\beta \,\middle|\, \frac{|\hat\beta|}{\sigma_{\hat\beta}} \geq c_\alpha; \psi\right] = \beta\left(1 + \frac{r(\psi, c_\alpha)}{\psi}\right) = \beta^*$ (12) Let $\widehat{\psi^*} = t_{\hat\beta}$, then solve: $\tilde\psi = t_{\hat\beta} - r(\tilde\psi, c_\alpha)$ (13) leading to the bias-corrected parameter estimate, from inverting (12): $\tilde\beta = \hat\beta \left(\tilde\psi / t_{\hat\beta}\right)$ (14)
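A minimal numerical sketch of (11)-(14) (the function names are mine, and the correction actually implemented in Autometrics may differ in detail):

```python
import numpy as np
from scipy import stats, optimize

def r(psi, c):
    """Truncation-induced shift r(psi, c_alpha) from equation (11)."""
    num = stats.norm.pdf(c - psi) - stats.norm.pdf(-c - psi)
    den = 1.0 - stats.norm.cdf(c - psi) + stats.norm.cdf(-c - psi)
    return num / den

def bias_correct(beta_hat, t_obs, c):
    """Solve t_obs = psi + r(psi, c) for psi (eqs. 10-13), then rescale beta_hat (eq. 14).
    Assumes a retained coefficient with t_obs >= c > 0; the negative case is symmetric."""
    psi_tilde = optimize.brentq(lambda p: p + r(p, c) - t_obs, 0.0, t_obs)
    return beta_hat * psi_tilde / t_obs

c_01 = stats.norm.ppf(0.995)                  # two-sided 1% critical value, about 2.58
print(bias_correct(0.35, 3.1, c_01))          # shrinks the estimate towards zero (roughly halved)
```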

53 Implementing bias correction Automatic Methods for Empirical Model Discovery p.53/151 Bias corrects closely, not exactly, for relevant: over-corrects for some t-values. Some increase in MSEs of relevant variables. Correction exacerbates downward bias in unconditional estimates of relevant coefficients & increases MSEs slightly. No impact on bias of estimated parameters of irrelevant variables as their β i = 0, so unbiased with or without selection But remarkable decrease in MSEs of irrelevant variables First free lunch of new approach. Obvious why in retrospect most correction for t near c α, which occurs for retained irrelevant variables.

54 Simulation MSEs Automatic Methods for Empirical Model Discovery p.54/151 Impact of bias corrections on retained irrelevant and relevant variables, for N = 1000 and n = 10 in (6). [Table 3: average CMSEs, times 100, for retained relevant and irrelevant variables (excluding $\beta_0$), with and without bias correction, at $\alpha$ = 1% and 0.1%, averaged over the 990 irrelevant and the 10 relevant variables.] Correction greatly reduces the MSEs of irrelevant variables in both unconditional and conditional distributions. Coefficients of variables with $|t| < c_\alpha$ are not bias corrected: insignificant estimates are set to zero.

55 Bias correcting conditional distributions at 5% David F. Hendry Automatic Methods for Empirical Model Discovery p.55/151 [Figure: conditional distributions of estimates before and after bias correction at $\alpha$ = 5%, for panels (a) $\psi$ = 2, (b) and (c) larger non-centralities, and (d) the intercept.]

56 Implications of selection Automatic Methods for Empirical Model Discovery p.56/151 Despite selecting from N = 1000 potential variables when only n = 10 are relevant: (1) nearly unbiased estimates of coefficients and equation standard errors can be obtained; (2) little loss of efficiency from checking many irrelevant variables; (3) some loss from not retaining relevant variables at large c α ; (4) huge gain by not commencing from an under-specified model; (5) even works well for fat-tailed errors at tight α when IIS used see below. Now add path search for non-orthogonal data sets.

57 Route map Automatic Methods for Empirical Model Discovery p.57/151 (1) Discovery (2) Automatic model extension (3) 1-cut model selection (4) Automatic estimation (5) Automatic model selection (6) Model evaluation (7) Excess numbers of variables N > T (8) An empirical example: food expenditure (9) Improving nowcasts Conclusions

58 Autometrics improves on previous algorithms David F. Hendry Automatic Methods for Empirical Model Discovery p.58/151 Search paths: Autometrics examines whole search space; discards irrelevant routes systematically. Likelihood-based: Autometrics implemented in likelihood framework. Efficiency: Autometrics improves computational efficiency: avoids repeated estimation & diagnostic testing, remembers terminal models. Structured: Autometrics separates estimation criterion, search algorithm, evaluation, & termination decision. Generality: Autometrics can handle N > T. If GUM is congruent, so are all terminals: undominated, mutually-encompassing representations. If several terminal models, all reported: can combine, or one selected (by, e.g., Schwarz, 1978, criterion).

59 Autometrics tree search Automatic Methods for Empirical Model Discovery p.59/151 Search follows branches till no insignificant variables; tests for congruence and parsimonious encompassing; backtracks if either fails, till first non-rejection found.

60 Selecting by Autometrics Automatic Methods for Empirical Model Discovery p.60/151 Even when 1-cut is applicable, there is little loss, and often a gain, from using the path-search algorithm Autometrics. Autometrics is applicable to non-orthogonal problems, and to N > T. Gauge (average retention rate of irrelevant variables) is close to $\alpha$. Potency (average retention rate of relevant variables) is near the theory value for a 1-off test. Goodness-of-fit is not directly used to select models & no attempt is made to prove that a given set of variables matters, but the choice of $c_\alpha$ affects $R^2$ and n through retention by $|t_{(n)}| \geq c_\alpha$. Conclude: repeated testing is not a concern.

61 Moving from 1-cut to Autometrics Automatic Methods for Empirical Model Discovery p.61/151 For more detail about selection outcomes, we consider 10 experiments with N = 10 candidate regressors and T = 75 based on the design in Castle, Qin and Reed (2009): $y_t = \beta_0 + \beta_1 x_{1,t} + \cdots + \beta_{10} x_{10,t} + \epsilon_t$, (15) $x_t \sim \mathsf{IN}_{10}[0, \mathsf{I}_{10}]$, (16) $\epsilon_t \sim \mathsf{IN}\left[0, (\lambda \sqrt{n})^2\right]$, $n = 1, \ldots, 10$, $t = 1, \ldots, T$ (17) where $x_t = (x_{1,t}, \ldots, x_{10,t})'$, fixed across replications. Equations (15)-(17) specify 10 different DGPs, indexed by n, each having n relevant variables with $\beta_1 = \cdots = \beta_n = 1$ and 10 - n irrelevant variables ($\beta_{n+1} = \cdots = \beta_{10} = 0$). Throughout, they set $\beta_0 = 5$. $\lambda = 0.4$ is their $R^2 = 0.9$ experiment. Table 4 reports the non-centralities, $\psi$.

62 Experimental design Automatic Methods for Empirical Model Discovery p.62/151 [Table 4: non-centralities $\psi_1, \ldots, \psi_n$ for the simulation experiments (15)-(17), for each n.] The GUM is the same for all 10 DGPs: $y_t = \beta_0 + \beta_1 x_{1,t} + \cdots + \beta_{10} x_{10,t} + u_t$. We first investigate how the general search algorithm Autometrics, without diagnostic checking, performs relative to 1-cut. Comparative gauges are recorded in Figure 63. The 1-cut gauge is very accurate, and Autometrics is quite close, especially at the tighter significance levels. Potency ratios are unity for all significance levels.

63 Gauges for 1-cut & Autometrics Automatic Methods for Empirical Model Discovery p.63/151 [Figure: gauges for the 1-cut rule and Autometrics as functions of n, at $\alpha$ = 10%, 5% and 1%.]

64 MSEs for 1-cut & Autometrics Automatic Methods for Empirical Model Discovery p.64/151 Figure 65 records ratios of MSEs of Autometrics selection to 1-cut for both unconditional and conditional distributions, but with no diagnostic tests and no bias correction. Lines labelled Relevant report the ratios of average MSEs over all relevant variables for a given n. Lines labelled Irrelevant are based on average MSEs of irrelevant variables for each DGP (none when n = 10). Unconditionally, ratios are close to 1 for irrelevant variables; but there is some advantage to using Autometrics for relevant variables, as ratios are uniformly less than 1. Benefits to selection are largest when there are few relevant variables that are highly significant. Conditionally, Autometrics outperforms 1-cut in most cases.

65 Ratios of MSEs for Autometrics to 1-cut David F. Hendry Automatic Methods for Empirical Model Discovery p.65/151 [Figure: UMSE and CMSE ratios of Autometrics to 1-cut, for relevant and irrelevant variables, as functions of n, at $\alpha$ = 5% and 1%.]

66 Selecting non-linear models Automatic Methods for Empirical Model Discovery p.66/151 Transpires there are four major sub-problems: (A) specify the general form of non-linearity (B) non-normality: non-linear functions capture outliers (C) excess numbers of irrelevant variables (D) potentially more variables than observations Have solutions to all four sub-problems: (A) investigator's preferred general function, followed by encompassing tests against specific ogive forms (B) remove outliers by IIS (C) super-conservative selection strategy (D) multi-stage combinatorial selection for N > T Automatic algorithm for up to cubic polynomials with polynomials times exponentials in Castle and Hendry (2009).

67 Information criteria Automatic Methods for Empirical Model Discovery p.67/151 Performance of information-criteria selection algorithms is well known for stationary and ergodic autoregressions. BIC and HQ are consistent: the DGP model is selected with $p \rightarrow 1$ as $T \rightarrow \infty$ relative to n: $2n \log(\log(T))/T$ is the minimum penalty rate. AIC is asymptotically efficient: good for selecting stationary AR forecasting models. In Gets terms, the non-centralities $\psi$ of t-tests diverge, and the significance levels $\alpha$ converge on zero. Figure 68 illustrates the rules for N = 10. BIC and HQ differ substantively for small T. Deliberate choice of blocks for Gets

68 Significance levels of selection rules Automatic Methods for Empirical Model Discovery p.68/151 [Figure: implied significance levels of AIC, HQ and BIC, and of the Liberal and Conservative Gets strategies, as functions of T.]

69 AIC, SIC, HQ, t² = 2 significance levels Automatic Methods for Empirical Model Discovery p.69/151 [Figure: implied significance levels of AIC (k = 10, 40), SIC (k = 10, 40) and the t² = 2 rule (k = 10), as functions of T.]

70 Information criteria selection Automatic Methods for Empirical Model Discovery p.70/151 BIC selects n from N regressors in: $y_t = \sum_{i=1}^{N} \beta_i z_{i,t} + u_t$ (18) to minimize: $\mathsf{BIC}_n = \ln \tilde\sigma_n^2 + \frac{n \ln T}{T}$, where: $\tilde\sigma_n^2 = \frac{1}{T} \sum_{t=1}^{T} \left( y_t - \sum_{i=1}^{n} \tilde\beta_i z_{i,t} \right)^2 = \frac{1}{T} \sum_{t=1}^{T} \tilde u_t^2$ (19) Full search over all $n \in (0, N)$ entails $2^N$ models. When N = 40, then $2^{40} \approx 1.1 \times 10^{12}$ possible models. AIC and HQ also penalize the log-likelihood by f(n, T) for n parameters and sample T
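The criterion in (18)-(19) is easy to compute; a brief sketch (the exhaustive-search helper is illustrative and only feasible for tiny subset sizes, which is exactly the point):

```python
import numpy as np
from itertools import combinations

def bic(y, Z):
    """BIC_n = ln(sigma_tilde_n^2) + n*ln(T)/T for the regression of y on the columns of Z."""
    T, n = Z.shape
    beta, _, _, _ = np.linalg.lstsq(Z, y, rcond=None)
    sigma2 = np.mean((y - Z @ beta) ** 2)
    return np.log(sigma2) + n * np.log(T) / T

def best_subset_bic(y, X, max_n):
    """Exhaustive BIC search over subsets of at most max_n regressors; the unrestricted
    2^N search is infeasible (N = 40 already gives about 10^12 models)."""
    N = X.shape[1]
    best = (np.inf, ())
    for n in range(1, max_n + 1):
        for subset in combinations(range(N), n):
            best = min(best, (bic(y, X[:, list(subset)]), subset))
    return best
```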

71 Comparisons with BIC Automatic Methods for Empirical Model Discovery p.71/151 When T = 140, with N = 40: $\mathsf{BIC}_{41} < \mathsf{BIC}_{40}$ whenever $t^2_{(41)} \geq 3.63$, i.e. $|t_{(41)}| \geq 1.9$. To select no variables when the null is true requires $t^2_{(k)} \leq (T - k)(T^{1/T} - 1)$ for every candidate, so require every $|t_{(i)}| < 1.9$, which occurs with probability: $\mathsf{P}\left(|t_{(i)}| < 1.9,\ i = 1, \ldots, 40\right) \approx (0.94)^{40} = 0.09$ (20) BIC would retain some null variables 91% of the time. If we discount highly-insignificant variables, $|t_{(i)}| < 2.2$, then $\mathsf{P}(|t_{(i)}| < 2.2,\ i = 1, \ldots, 40) \approx (0.97)^{40} = 0.3$. Still keeps some 70% of the time, but much improved: explains why BIC does well in Hansen (1999)
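These probabilities are quick to verify under the normal approximation (a check, not part of the talk):

```python
from scipy import stats

p19 = 2 * stats.norm.cdf(1.9) - 1      # P(|t| < 1.9), about 0.94
p22 = 2 * stats.norm.cdf(2.2) - 1      # P(|t| < 2.2), about 0.97
print(round(p19 ** 40, 3), round(p22 ** 40, 3))   # ~0.09 and ~0.32: chances of retaining nothing
```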

72 Improving ICs Automatic Methods for Empirical Model Discovery p.72/151 [1] ICs do not ensure an adequate initial model specification: Gets commences by testing the GUM for congruency [2] Pre-selection alters IC calculations and helps locate the DGP: Gets elimination tests at tight $\alpha$ reduce from N [3] The trade-off of retaining irrelevant versus losing relevant variables remains: changing IC weights alters the implicit significance level [4] Asymptotic comfort of efficient or consistent selection does not greatly restrict strategy in small samples [5] Selection criteria too loose as $N \rightarrow T$: ad hoc fix [6] Unclear how to use when N > T Gets corrects all of these drawbacks

73 Lasso Automatic Methods for Empirical Model Discovery p.73/151 Writing: $\epsilon = y - X\beta$, the Lasso minimizes the RSS subject to the sum of the absolute coefficients being less than a constant $\tau$: $\min_\beta\ \epsilon'\epsilon$ subject to $\sum_{i=1}^{k} |\beta_i| \leq \tau$ (21) Effects: Shrinkage: some coefficients shrunk towards zero; Selection: some coefficients set to zero. Usage: Regression shrinkage (predictions, etc.); Model selection: keep regressors with non-zero coefficients, but use OLS estimates instead
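A minimal scikit-learn sketch of that last usage, Lasso for selection followed by OLS re-estimation of the retained regressors (the simulated data and the cross-validated choice of shrinkage are illustrative; the slide after next mentions AIC, SC, C_p or generalized cross-validation for that choice):

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
T, N, n_rel = 150, 40, 5
X = rng.standard_normal((T, N))
y = X[:, :n_rel] @ np.full(n_rel, 0.5) + rng.standard_normal(T)

lasso = LassoCV(cv=5).fit(X, y)                    # shrinkage tau chosen by cross-validation
selected = np.flatnonzero(lasso.coef_ != 0)        # selection: the non-zero coefficients
ols = LinearRegression().fit(X[:, selected], y)    # then re-estimate by OLS, as on the slide
print(selected, np.round(ols.coef_, 2))
```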

74 Computational aspects Automatic Methods for Empirical Model Discovery p.74/151 Forward stepwise: find the regressor most correlated with y, partial it out from y and the remaining xs, repeat. Forward stagewise: a less greedy version of stepwise: take many tiny steps. Least angle regression: find the regressor most correlated with y, say $x_1$; only take a step up to the point that the next most correlated regressor, say $x_2$, has equal correlation; increase both $\beta_1$ and $\beta_2$ until the third in line has equal correlation with the current residuals; repeat.
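A simplified sketch of the forward-stepwise idea (greedy search on residual correlations; the algorithms on this slide differ in their exact partialling-out and step sizes, and the function name is mine):

```python
import numpy as np

def forward_stepwise(y, X, k_max):
    """Greedy forward selection: repeatedly add the regressor most correlated with the
    current residuals, refitting by OLS after each addition."""
    T, N = X.shape
    active, resid = [], y - y.mean()
    for _ in range(k_max):
        corr = np.abs(X.T @ resid)
        corr[active] = -np.inf                        # ignore regressors already selected
        active.append(int(np.argmax(corr)))
        Xa = np.column_stack([np.ones(T), X[:, active]])
        beta, _, _, _ = np.linalg.lstsq(Xa, y, rcond=None)
        resid = y - Xa @ beta
    return active
```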

75 Inference Automatic Methods for Empirical Model Discovery p.75/151 Tibshirani (1996): quadratic programming for fixed $\tau$, requiring 2k + 1 steps. Efron, Hastie, Johnstone and Tibshirani (2004): algorithm for least angle regression; a small modification produces the Lasso. But no standard errors, so use the bootstrap, or sandwich-type estimates from a penalized likelihood. Selecting the degree of shrinkage by AIC, SC, $C_p$ or generalized cross-validation. Need to know the degrees of freedom for the ICs: approximate by k, the number of non-zero coefficients. Basically an attempt to revive stepwise

76 Lasso on the JEDC experiments Automatic Methods for Empirical Model Discovery p.76/151 [Table: gauge (%), potency (%), percentage of replications in which the DGP is found, and final-model rejection rate at a nominal 5%, for Stepwise, Lasso and Autometrics, first selecting by SC or at nominal sizes of 5%-15%, and then enforcing the true model size k; rows correspond to different error autocorrelations $\rho$.]

77 Dynamic correlation experiments Automatic Methods for Empirical Model Discovery p.77/151 [Figure: null and non-null rejection percentages for Autometrics (1% and 5%), Stepwise (1% and 5%) and Lasso (BIC).]

78 Correlation range experiments Automatic Methods for Empirical Model Discovery p.78/151 [Figure: null rejection, non-null rejection and final-model rejection percentages against $\rho$, for Autometrics (1% and 5%), Stepwise (1% and 5%) and Lasso (BIC), with the nominal sizes marked.]

79 HP experiments: Lasso Automatic Methods for Empirical Model Discovery p.79/151 [Table: gauge, potency and DGP-recovery rates for Lasso selection (BIC) versus Autometrics (5% nominal size) on the HP2, HP7, HP8 and HP9 experiments, and the same comparison when the true model size k is enforced.] Enforcing the true model size does not help Lasso selection.

80 Route map Automatic Methods for Empirical Model Discovery p.80/151 (1) Discovery (2) Automatic model extension (3) 1-cut model selection (4) Automatic estimation (5) Automatic model selection (6) Model evaluation (7) Excess numbers of variables N > T (8) An empirical example: food expenditure (9) Improving nowcasts Conclusions

81 Model evaluation criteria Automatic Methods for Empirical Model Discovery p.81/151 Given the transformed system version of (3): $\Lambda(L) h(y)_t = \mathsf{B}(L) g(z)_t + \Gamma d_t + v_t$ (22) where $v_t \sim \mathsf{D}_{n_1}[0, \Sigma_v]$ when $x_t = (y_t : z_t)$ of dimensions $(n_1 : n_2)$: [1] homoscedastic innovation $v_t$ [2] weak exogeneity of $z_t$ for the parameters of interest $\mu$ [3] constant, invariant parameter $\mu$ [4] data-admissible formulations on accurate observations [5] theory-consistent, identifiable structures [6] encompass rival models Exhaustive nulls to test, but many alternatives. Models which satisfy the first four are congruent. An encompassing, congruent, theory-consistent model satisfies all six criteria

82 Integrated data Automatic Methods for Empirical Model Discovery p.82/151 Autometrics conducts inferences for I(0) Most selection tests remain valid: see Sims, Stock and Watson (1990) Only tests for a unit root need non-standard critical values Implemented PcGive cointegration test in PcGets Quick Modeler Most diagnostic tests also valid for integrated series: see Wooldridge (1999) Heteroscedasticity tests an exception: powers of variables then behave oddly see Caceres (2007)

83 Role of mis-specification testing Automatic Methods for Empirical Model Discovery p.83/151 Under the null of a congruent GUM, Figure 84 compares gauges for Autometrics with diagnostic checking on versus off (n = 1, ..., N = 10; $R^2$ = 0.9). Gauge is close to $\alpha$ if diagnostic tests are not checked. Gauge is larger than $\alpha$ with diagnostics on, when checking to ensure a congruent reduction. Difference due to retaining insignificant irrelevant variables which proxy chance departures from the null of the mis-specification tests.

84 Gauges with diagnostic tests off & on David F. Hendry Automatic Methods for Empirical Model Discovery p.84/151 [Figure: gauges for Autometrics with and without diagnostic checking, at $\alpha$ = 10%, 5% and 1%, as functions of n.]

85 Impact of mis-specification testing on MSEs David F. Hendry Automatic Methods for Empirical Model Discovery p.85/151 Figure 86 records ratios of UMSEs and CMSEs with diagnostic tests switched off to on, averaging within relevant and irrelevant variables. Switching diagnostics off generally improves UMSEs, but worsens results conditionally, with the impact coming through the irrelevant variables. Switching diagnostics off leads to fewer irrelevant regressors being retained overall, improving UMSEs, but retained irrelevant variables are now more significant than with diagnostics on. Impact is largest at tight significance levels: at 10%, MSE ratios so close to unity they are not plotted.

86 Ratios of MSEs for diagnostic tests off & on David F. Hendry Automatic Methods for Empirical Model Discovery p.86/151 [Figure: UMSE and CMSE ratios with diagnostics off relative to on, for relevant and irrelevant variables, as functions of n, at $\alpha$ = 5% and 1%.]

87 Monte Carlo experimental designs Automatic Methods for Empirical Model Discovery p.87/151 [Table 5: Monte Carlo designs (HP2, HP7, HP8, HP9, JEDC and the S experiments), listing for each the sample size T, the numbers of relevant and candidate variables, and the DGP t-values, e.g. (10.9, 16.7, 8.2) for an HP design, (2, 3, 4, 6, 8) for JEDC, and t-values of 2, 3 or 4 on eight relevant variables for the S designs.]

88 Selection effects on tests Automatic Methods for Empirical Model Discovery p.88/151 Figure 89 shows calibration null rejection distributions (QQ plots) of all tests Also studied impact of selection on heteroscedasticity test statistics across experiments Figure 90 shows ratios of actual to nominal sizes Little change in quantiles above nominal significance: but increasing impact as quantile decreases Bound to occur: models with significant heteroscedasticity not selected Not a distortion of sampling properties: decision is taken for GUM Conditional on that, no change should occur


More information

Volume 03, Issue 6. Comparison of Panel Cointegration Tests

Volume 03, Issue 6. Comparison of Panel Cointegration Tests Volume 03, Issue 6 Comparison of Panel Cointegration Tests Deniz Dilan Karaman Örsal Humboldt University Berlin Abstract The main aim of this paper is to compare the size and size-adjusted power properties

More information

New Developments in Automatic General-to-specific Modelling

New Developments in Automatic General-to-specific Modelling New Developments in Automatic General-to-specific Modelling David F. Hendry and Hans-Martin Krolzig Economics Department, Oxford University. December 8, 2001 Abstract We consider the analytic basis for

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Econ 623 Econometrics II Topic 2: Stationary Time Series

Econ 623 Econometrics II Topic 2: Stationary Time Series 1 Introduction Econ 623 Econometrics II Topic 2: Stationary Time Series In the regression model we can model the error term as an autoregression AR(1) process. That is, we can use the past value of the

More information

A Practitioner s Guide to Cluster-Robust Inference

A Practitioner s Guide to Cluster-Robust Inference A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode

More information

The Linear Regression Model

The Linear Regression Model The Linear Regression Model Carlo Favero Favero () The Linear Regression Model 1 / 67 OLS To illustrate how estimation can be performed to derive conditional expectations, consider the following general

More information

Applied Microeconometrics (L5): Panel Data-Basics

Applied Microeconometrics (L5): Panel Data-Basics Applied Microeconometrics (L5): Panel Data-Basics Nicholas Giannakopoulos University of Patras Department of Economics ngias@upatras.gr November 10, 2015 Nicholas Giannakopoulos (UPatras) MSc Applied Economics

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models

Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models Prof. Massimo Guidolin 019 Financial Econometrics Winter/Spring 018 Overview ARCH models and their limitations Generalized ARCH models

More information

Testing for Unit Roots with Cointegrated Data

Testing for Unit Roots with Cointegrated Data Discussion Paper No. 2015-57 August 19, 2015 http://www.economics-ejournal.org/economics/discussionpapers/2015-57 Testing for Unit Roots with Cointegrated Data W. Robert Reed Abstract This paper demonstrates

More information

Simultaneous Equation Models Learning Objectives Introduction Introduction (2) Introduction (3) Solving the Model structural equations

Simultaneous Equation Models Learning Objectives Introduction Introduction (2) Introduction (3) Solving the Model structural equations Simultaneous Equation Models. Introduction: basic definitions 2. Consequences of ignoring simultaneity 3. The identification problem 4. Estimation of simultaneous equation models 5. Example: IS LM model

More information

Lecture 2: Univariate Time Series

Lecture 2: Univariate Time Series Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes Prof. Massimo Guidolin 20192 Financial Econometrics Spring/Winter 2017 Overview Motivation:

More information

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Econ 423 Lecture Notes: Additional Topics in Time Series 1 Econ 423 Lecture Notes: Additional Topics in Time Series 1 John C. Chao April 25, 2017 1 These notes are based in large part on Chapter 16 of Stock and Watson (2011). They are for instructional purposes

More information

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 4 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 23 Recommended Reading For the today Serial correlation and heteroskedasticity in

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

TESTING FOR CO-INTEGRATION

TESTING FOR CO-INTEGRATION Bo Sjö 2010-12-05 TESTING FOR CO-INTEGRATION To be used in combination with Sjö (2008) Testing for Unit Roots and Cointegration A Guide. Instructions: Use the Johansen method to test for Purchasing Power

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

Post-Selection Inference

Post-Selection Inference Classical Inference start end start Post-Selection Inference selected end model data inference data selection model data inference Post-Selection Inference Todd Kuffner Washington University in St. Louis

More information

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"

Ninth ARTNeT Capacity Building Workshop for Trade Research Trade Flows and Trade Policy Analysis Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis" June 2013 Bangkok, Thailand Cosimo Beverelli and Rainer Lanz (World Trade Organization) 1 Selected econometric

More information

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H.

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H. ACE 564 Spring 2006 Lecture 8 Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information by Professor Scott H. Irwin Readings: Griffiths, Hill and Judge. "Collinear Economic Variables,

More information

DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND

DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND Testing For Unit Roots With Cointegrated Data NOTE: This paper is a revision of

More information

CHAPTER 21: TIME SERIES ECONOMETRICS: SOME BASIC CONCEPTS

CHAPTER 21: TIME SERIES ECONOMETRICS: SOME BASIC CONCEPTS CHAPTER 21: TIME SERIES ECONOMETRICS: SOME BASIC CONCEPTS 21.1 A stochastic process is said to be weakly stationary if its mean and variance are constant over time and if the value of the covariance between

More information

MLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project

MLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project MLR Model Selection Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en

More information

Volatility. Gerald P. Dwyer. February Clemson University

Volatility. Gerald P. Dwyer. February Clemson University Volatility Gerald P. Dwyer Clemson University February 2016 Outline 1 Volatility Characteristics of Time Series Heteroskedasticity Simpler Estimation Strategies Exponentially Weighted Moving Average Use

More information

Linear Models (continued)

Linear Models (continued) Linear Models (continued) Model Selection Introduction Most of our previous discussion has focused on the case where we have a data set and only one fitted model. Up until this point, we have discussed,

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Non-Stationary Time Series, Cointegration, and Spurious Regression

Non-Stationary Time Series, Cointegration, and Spurious Regression Econometrics II Non-Stationary Time Series, Cointegration, and Spurious Regression Econometrics II Course Outline: Non-Stationary Time Series, Cointegration and Spurious Regression 1 Regression with Non-Stationarity

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

The Generalized Cochrane-Orcutt Transformation Estimation For Spurious and Fractional Spurious Regressions

The Generalized Cochrane-Orcutt Transformation Estimation For Spurious and Fractional Spurious Regressions The Generalized Cochrane-Orcutt Transformation Estimation For Spurious and Fractional Spurious Regressions Shin-Huei Wang and Cheng Hsiao Jan 31, 2010 Abstract This paper proposes a highly consistent estimation,

More information

Forecast comparison of principal component regression and principal covariate regression

Forecast comparison of principal component regression and principal covariate regression Forecast comparison of principal component regression and principal covariate regression Christiaan Heij, Patrick J.F. Groenen, Dick J. van Dijk Econometric Institute, Erasmus University Rotterdam Econometric

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

DEPARTMENT OF ECONOMICS DISCUSSION PAPER SERIES

DEPARTMENT OF ECONOMICS DISCUSSION PAPER SERIES ISSN 1471-0498 DEPARTMENT OF ECONOMICS DISCUSSION PAPER SERIES Deciding Between Alternative Approaches In Macroeconomics David Hendry Number 778 January 2016 Manor Road Building, Oxford OX1 3UQ Deciding

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

EC4051 Project and Introductory Econometrics

EC4051 Project and Introductory Econometrics EC4051 Project and Introductory Econometrics Dudley Cooke Trinity College Dublin Dudley Cooke (Trinity College Dublin) Intro to Econometrics 1 / 23 Project Guidelines Each student is required to undertake

More information

MS-C1620 Statistical inference

MS-C1620 Statistical inference MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents

More information

Lecture 7: Dynamic panel models 2

Lecture 7: Dynamic panel models 2 Lecture 7: Dynamic panel models 2 Ragnar Nymoen Department of Economics, UiO 25 February 2010 Main issues and references The Arellano and Bond method for GMM estimation of dynamic panel data models A stepwise

More information

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han Econometrics Honor s Exam Review Session Spring 2012 Eunice Han Topics 1. OLS The Assumptions Omitted Variable Bias Conditional Mean Independence Hypothesis Testing and Confidence Intervals Homoskedasticity

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Time Series and Forecasting Lecture 4 NonLinear Time Series

Time Series and Forecasting Lecture 4 NonLinear Time Series Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y

More information

Instrumental Variables and the Problem of Endogeneity

Instrumental Variables and the Problem of Endogeneity Instrumental Variables and the Problem of Endogeneity September 15, 2015 1 / 38 Exogeneity: Important Assumption of OLS In a standard OLS framework, y = xβ + ɛ (1) and for unbiasedness we need E[x ɛ] =

More information

Classification and Regression Trees

Classification and Regression Trees Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity

More information

Interpreting Regression Results

Interpreting Regression Results Interpreting Regression Results Carlo Favero Favero () Interpreting Regression Results 1 / 42 Interpreting Regression Results Interpreting regression results is not a simple exercise. We propose to split

More information

Section 2 NABE ASTEF 65

Section 2 NABE ASTEF 65 Section 2 NABE ASTEF 65 Econometric (Structural) Models 66 67 The Multiple Regression Model 68 69 Assumptions 70 Components of Model Endogenous variables -- Dependent variables, values of which are determined

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Model Selection. Frank Wood. December 10, 2009

Model Selection. Frank Wood. December 10, 2009 Model Selection Frank Wood December 10, 2009 Standard Linear Regression Recipe Identify the explanatory variables Decide the functional forms in which the explanatory variables can enter the model Decide

More information

Alfredo A. Romero * College of William and Mary

Alfredo A. Romero * College of William and Mary A Note on the Use of in Model Selection Alfredo A. Romero * College of William and Mary College of William and Mary Department of Economics Working Paper Number 6 October 007 * Alfredo A. Romero is a Visiting

More information

USING MODEL SELECTION ALGORITHMS TO OBTAIN RELIABLE COEFFICIENT ESTIMATES

USING MODEL SELECTION ALGORITHMS TO OBTAIN RELIABLE COEFFICIENT ESTIMATES doi: 10.1111/j.1467-6419.2011.00704.x USING MODEL SELECTION ALGORITHMS TO OBTAIN RELIABLE COEFFICIENT ESTIMATES Jennifer L. Castle Oxford University Xiaochuan Qin University of Colorado W. Robert Reed

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 4 Jakub Mućk Econometrics of Panel Data Meeting # 4 1 / 30 Outline 1 Two-way Error Component Model Fixed effects model Random effects model 2 Non-spherical

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. Linear-in-Parameters Models: IV versus Control Functions 2. Correlated

More information

SUPPLEMENTARY TABLES FOR: THE LIKELIHOOD RATIO TEST FOR COINTEGRATION RANKS IN THE I(2) MODEL

SUPPLEMENTARY TABLES FOR: THE LIKELIHOOD RATIO TEST FOR COINTEGRATION RANKS IN THE I(2) MODEL SUPPLEMENTARY TABLES FOR: THE LIKELIHOOD RATIO TEST FOR COINTEGRATION RANKS IN THE I(2) MODEL Heino Bohn Nielsen and Anders Rahbek Department of Economics University of Copenhagen. heino.bohn.nielsen@econ.ku.dk

More information

Non-Stationary Time Series and Unit Root Testing

Non-Stationary Time Series and Unit Root Testing Econometrics II Non-Stationary Time Series and Unit Root Testing Morten Nyboe Tabor Course Outline: Non-Stationary Time Series and Unit Root Testing 1 Stationarity and Deviation from Stationarity Trend-Stationarity

More information

A nonparametric test for seasonal unit roots

A nonparametric test for seasonal unit roots Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna To be presented in Innsbruck November 7, 2007 Abstract We consider a nonparametric test for the

More information

FinQuiz Notes

FinQuiz Notes Reading 10 Multiple Regression and Issues in Regression Analysis 2. MULTIPLE LINEAR REGRESSION Multiple linear regression is a method used to model the linear relationship between a dependent variable

More information

ECON 4160, Spring term Lecture 12

ECON 4160, Spring term Lecture 12 ECON 4160, Spring term 2013. Lecture 12 Non-stationarity and co-integration 2/2 Ragnar Nymoen Department of Economics 13 Nov 2013 1 / 53 Introduction I So far we have considered: Stationary VAR, with deterministic

More information

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population

More information

Forecasting Levels of log Variables in Vector Autoregressions

Forecasting Levels of log Variables in Vector Autoregressions September 24, 200 Forecasting Levels of log Variables in Vector Autoregressions Gunnar Bårdsen Department of Economics, Dragvoll, NTNU, N-749 Trondheim, NORWAY email: gunnar.bardsen@svt.ntnu.no Helmut

More information

Cointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56

Cointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56 Cointegrated VAR s Eduardo Rossi University of Pavia November 2013 Rossi Cointegrated VAR s Financial Econometrics - 2013 1 / 56 VAR y t = (y 1t,..., y nt ) is (n 1) vector. y t VAR(p): Φ(L)y t = ɛ t The

More information

Assessing Structural VAR s

Assessing Structural VAR s ... Assessing Structural VAR s by Lawrence J. Christiano, Martin Eichenbaum and Robert Vigfusson Zurich, September 2005 1 Background Structural Vector Autoregressions Address the Following Type of Question:

More information

Variable Selection in Predictive Regressions

Variable Selection in Predictive Regressions Variable Selection in Predictive Regressions Alessandro Stringhi Advanced Financial Econometrics III Winter/Spring 2018 Overview This chapter considers linear models for explaining a scalar variable when

More information