Empirical Model Discovery


1 Automatic Methods for Empirical Model Discovery p.1/151 Empirical Model Discovery David F. Hendry Department of Economics, Oxford University Statistical Science meets Philosophy of Science Conference LSE, June 2010 Research jointly with Jennifer Castle, Jurgen Doornik, Bent Nielsen and Søren Johansen Any sufficiently advanced technology is indistinguishable from magic. Arthur C. Clarke, Profiles of The Future, 1961

2 Introduction Automatic Methods for Empirical Model Discovery p.2/151 Many features of models are not derivable from theory. Need empirical evidence on: which variables are actually relevant, any lagged responses (dynamic reactions), functional forms of connections (non-linearity), structural breaks and unit roots (non-stationarities), simultaneity (or exogeneity), expectations, etc. Analyses must almost always be data-based on the available sample: need to discover what matters empirically: see Phillips (2005) and Doornik and Hendry (2009). Four key steps: (1) define the framework: the target for modelling; (2) embed that target in a general formulation; (3) search for the simplest acceptable representation; (4) evaluate the findings.

3 Automatic methods can outperform David F. Hendry Automatic Methods for Empirical Model Discovery p.3/151 [A] formulation: many candidate variables, long lag lengths, non-linearities, outliers, and parameter shifts [B] selection: handle more variables than observations, yet deliver high success rates by multi-path search [C] estimation: near-unbiased estimates despite selection [D] evaluation: automatically conduct a range of pertinent tests of specification and mis-specification Now routinely create models with N > 200 variables: and select successfully even if only T = 150 observations. This talk aims to explain how we do so: it only appears to be magic! Approach embodied in Autometrics: see Doornik (2009). So let's perform some magic.

4 Basis of approach Automatic Methods for Empirical Model Discovery p.4/151 Data generation process (DGP): joint density of all variables in the economy. Impossible to theorize about accurately or model precisely: too high dimensional and far too non-stationary. Need to reduce to manageable size in the local DGP (LDGP): the DGP in the space of the n variables $\{x_t\}$ being modelled. The theory of reduction explains the derivation of the LDGP: joint density $\mathsf{D}_x(x_1 \ldots x_T \mid \theta)$. Acts as the DGP, but the parameter $\theta$ may be time varying. Knowing the LDGP, can generate look-alikes for the relevant $\{x_t\}$ which only deviate from the actual data by unpredictable noise. Cannot do better than knowing $\mathsf{D}_x(\cdot)$, so the LDGP $\mathsf{D}_x(\cdot)$ is the target for model selection

5 Formulating a good LDGP David F. Hendry Automatic Methods for Empirical Model Discovery p.5/151 Choice of the n variables, $\{x_t\}$, to analyze is fundamental: determines the modelling target LDGP, $\mathsf{D}_x(\cdot)$, and its properties. Prior reasoning, theoretical analysis, previous evidence, historical and institutional knowledge all important. Should be 90+% of the effort in an empirical analysis. Aim to avoid complicated and non-constant LDGPs. Crucial not to omit substantively important variables: a small set $\{x_t\}$ is more likely to do so. Given $\{x_t\}$, have defined the target $\mathsf{D}_x(\cdot)$ in (1). Now embed that target in a general model formulation.

6 Extensions for discovering a good model David F. Hendry Automatic Methods for Empirical Model Discovery p.6/151 Second of four key steps: extensions of {x t } determine how well LDGP is approximated. Four main groups of automatic extensions: additional candidate variables that might be relevant; lag formulation, implementing a sequential factorization; functional form transformations for non-linearity; impulse-indicator saturation (IIS) for parameter non-constancy and data contamination. Must also handle mapping to non-integrated data, conditional factorizations, and simultaneity. Good choices facilitate locating a congruent parsimonious-encompassing model of LDGP. Congruence: matches the evidence on desired criteria; parsimonious: as small a model as viable; encompassing: explains the results of all other models.

7 How to select an empirical model? Automatic Methods for Empirical Model Discovery p.7/151 Many grounds on which to select empirical models: theoretical empirical aesthetic philosophical Within each category, many criteria: theory: generality; internal consistency; invariance empirical: goodness-of-fit; congruence; parsimony; consistency with theory; constancy; encompassing; forecast accuracy aesthetic: elegance; relevance; tell a story philosophical: novelty; excess content; making money...

8 How to resolve? Automatic Methods for Empirical Model Discovery p.8/151 Obvious solution: match all the criteria Cannot do so as: many of the criteria conflict human knowledge is limited economy too complex, high dimensional, heterogeneous data inaccurate, samples small, and non-stationary Every decision about: a theory formulation its implementation its empirical evidential base and its evaluation involves selection Selection is inevitable, unavoidable and ubiquitous

9 Implications Automatic Methods for Empirical Model Discovery p.9/151 Any test + decision = selection, so ubiquitous Most decisions undocumented often not recognized as selection Unfortunately, model selection theory is difficult: all statistics have interdependent distributions altered by every modelling decision Fortunately, computer selection algorithms allow operational studies of alternative strategies Mainly consider Gets: General-to-specific modelling Explain approach and review progress

10 How to judge performance? Automatic Methods for Empirical Model Discovery p.10/151 Many ways to judge success of selection algorithms (A) Maximizing the goodness of fit Traditional criterion for fitting a given model, but does not lead to useful selections (B) Matching a theory-derived specification Widely used, and must work well if the LDGP coincides with the theory model, but otherwise need not (C) Frequency of discovery of the LDGP Overly demanding: may be nearly impossible even if commenced from the LDGP (e.g., when a relevant variable has $|t| < 0.1$) (D) Improved inference about parameters Seek small, accurate uncertainty regions around parameters of interest but the oracle principle is invalid

11 Operational criteria Automatic Methods for Empirical Model Discovery p.11/151 (E) Improved forecasting over other methods Many contenders: other selections, factors, model averages, robust devices...but forecasting is different (F) Works for realistic LDGPs Unclear what those are but many claimed contenders. (G) Relative frequency of recovering LDGP starting from GUM as against starting from LDGP Costs of search additional to commencing from LDGP (H) Operating characteristics match theory Nominal null rejection frequency matches actual; retained parameters of interest unbiasedly estimated (I) Find well-specified undominated model of LDGP Internal criterion algorithm could not do better

12 Which criteria? Automatic Methods for Empirical Model Discovery p.12/151 (G), (H) and (I) are main basis: aim to satisfy all three Two costs of selection: costs of inference and search First inevitable if tests have non-zero null and non-unit rejection frequencies under alternative Applies even if commence from LDGP. Measure costs of inference by RMSE of selecting or conducting inference on LDGP When a GUM nests the LDGP, additional costs of search: calculate by increase in RMSEs for relevant variables when starting from the GUM as against the LDGP, plus those for retained irrelevant variables Also see if Autometrics outperforms other automatic methods: Information Criteria, Step-wise, Lasso...

13 Selecting and evaluating the model David F. Hendry Automatic Methods for Empirical Model Discovery p.13/151 Extensions create the general unrestricted model (GUM). The GUM should nest the LDGP, making it a special case; reductions commence from GUM to locate a specific model. Selection is step (3): search for the simplest acceptable representation. Much of this talk concerns how that selection is done, and checking how well it works. Finally, step (4): evaluate the findings and the selection process. Includes tests of new aspects, such as super exogeneity (essentially causality) for policy, and parameter invariance (constancy across regimes).

14 Route map Automatic Methods for Empirical Model Discovery p.14/151 (1) Discovery (2) Automatic model extension (3) 1-cut model selection (4) Automatic estimation (5) Automatic model selection (6) Model evaluation (7) Excess numbers of variables N > T (8) An empirical example: food expenditure (9) Improving nowcasts Conclusions

15 Discovery Automatic Methods for Empirical Model Discovery p.15/151 Discovery: learning something previously unknown. Cannot know how to discover what is not known: unlikely there is a best way of doing so. Many discoveries have an element of chance: luck: Fleming, penicillin from a dirty petri dish; serendipity: Becquerel, discovery of radioactivity; natural experiment: Dicke, role of gluten in celiac disease; trial and error: Edison, incandescent lamp; brilliant intuition: Faraday, dynamo from electric shock; false theories: Kepler, regular solids for planetary laws; valid theories: Pasteur, germs not spontaneous generation; systematic exploration: Lavoisier, oxygen not phlogiston; careful observation: Harvey, circulation of blood; new instruments: Galileo, moons around Jupiter; self testing: Marshall, ulcers caused by Helicobacter pylori.

16 Common aspects of discovery Automatic Methods for Empirical Model Discovery p.16/151 Theoretical reasoning also frequent: Newton, Einstein. Science is both inductive and deductive. Must distinguish between: the context of discovery, where anything goes, and the context of evaluation, rigorous attempts to refute. Accumulation and consolidation of evidence crucial: data reduction a key attribute of science (think $E = mc^2$). Seven aspects in common to the above examples of discovery. First, theoretical context or framework of ideas. Second, going outside the existing state. Third, search for something. Fourth, recognition of the significance of what is found. Fifth, quantification of what is found. Sixth, evaluating the discovery to ascertain its reality. Seventh, parsimoniously summarize a vast information set.

17 Discovery in economics Automatic Methods for Empirical Model Discovery p.17/151 Discoveries in economics mainly from theory. But all economic theories are: (a) incomplete; (b) incorrect; and (c) mutable. (a) Need strong ceteris paribus assumptions: inappropriate in a non-stationary, evolving world. (b) Consider an economic analysis which suggests: $y = f(x)$ (1) where y depends on n explanatory variables x. Form of $f(\cdot)$ in (1) depends on: utility or loss functions of agents, constraints they face, & information they possess. Analyses arbitrarily assume: a form for $f(\cdot)$, that $f(\cdot)$ is constant, that only x matters, & that the xs are exogenous. Yet must aggregate across heterogeneous individuals whose endowments shift over time, often abruptly.

18 Theory evolves Automatic Methods for Empirical Model Discovery p.18/151 (c) Economic analyses have changed the world, and our understanding: from the invisible hand in Adam Smith's Theory of Moral Sentiments (1759, p.350) onwards, theory has progressed dramatically: key insights into option pricing, auctions and contracts, principal-agent and game theories, trust and moral hazard, asymmetric information, institutions: major impacts on market functioning, industrial, and even political, organization. Imagine imposing 1900s economic theory in empirical research today. Much past applied econometrics research is forgotten: discard the economic theory that it quantified and you discard the associated empirical evidence. Hence fads & fashions, cycles and schools in economics.

19 But the impossible is not possible David F. Hendry Automatic Methods for Empirical Model Discovery p.19/151 Why, sometimes I've believed as many as six impossible things before breakfast. Quote from the White Queen in Lewis Carroll (1899), Through the Looking-Glass, Macmillan. If only it were just 6!

20 Empirical econometrics David F. Hendry Automatic Methods for Empirical Model Discovery p.20/151 To establish truth requires at least these 12 assumptions: 1. correct, comprehensive, & immutable economic theory; 2. correct, complete choice of all relevant variables & lags; 3. validity & relevance of all regressors & instruments; 4. precise functional forms for all variables; 5. absence of hidden dependencies; 6. all expectations formulations correct; 7. all parameters identified, constant over time, & invariant; 8. exact data measurements on every variable; 9. errors are independent & homoscedastic; 10. error distributions constant over time; 11. appropriate estimator at relevant sample sizes; 12. valid and non-distortionary method of model selection. If truth is not on offer what is?

21 Data matters for empirical implementation David F. Hendry Automatic Methods for Empirical Model Discovery p.21/151 Sample of T observations, {y t,x t }: but no theory specification of a unit of time, observations may be contaminated (measurement errors), underlying processes integrated, abrupt shifts induce various forms of breaks. Need to discover what matters empirically. So how to utilize economic analyses efficiently if cannot impose theory empirically? Answer: embed theory specification in vastly more general empirical formulation. Even if truth is not on offer, theory-guided, congruent, parsimonious encompassing models with parameters invariant to relevant policies may be. But not by conventional routes...

22 Classical econometrics: covert discovery David F. Hendry Automatic Methods for Empirical Model Discovery p.22/151 Postulated: $y_t = \beta' x_t + \epsilon_t$, $t = 1, \ldots, T$ (2) Aim to obtain the best estimate of the constant parameters $\beta$, given the n correct variables x, independent of $\{\epsilon_t\}$, and T uncontaminated observations, with $\epsilon_t \sim \mathsf{IID}[0, \sigma_\epsilon^2]$. Need severe testing (Mayo and Spanos, 2006) to discover if the model is mis-specified (see Mayo, 1981). Many tests to discover departures from the assumptions of (2), followed by recipes for fixing them: covert and unstructured empirical model discovery.

23 Model selection: discovering the best model David F. Hendry Automatic Methods for Empirical Model Discovery p.23/151 Start from (2): $y_t = \beta' x_t + \epsilon_t$, $t = 1, \ldots, T$, assuming the N correct initial x & T, and valid conditioning, where x includes many candidate regressors. Aim to discover the subset of relevant variables in $x_t$, then estimate the associated parameters $\beta$. Departures from assumptions often simply ignored: information criteria usually do not even check. May select the best model in the set, yet be a poor approximation to the LDGP.

24 Robust statistics: discovering the best sample David F. Hendry Automatic Methods for Empirical Model Discovery p.24/151 Same start (2), but aim to find a robust estimate of $\beta$ by selecting over T. Worry about data contamination and outliers, so select the sample T where outliers are least in evidence, given the correct set of relevant variables x. Other difficulties still need separate tests, and must be fixed if found. The set of variables is rarely selected jointly with the sample, so the given x is assumed to coincide with the relevant set. Similarly for non-parametric methods: aim to discover the best functional form & distribution assuming correct x, no data contamination, constant $\beta$, etc., all rarely checked.

25 Automatic empirical model discovery David F. Hendry Automatic Methods for Empirical Model Discovery p.25/151 Re-frame empirical modelling as a discovery process: part of a progressive research strategy. Starting from T observations on N > n variables x, aim to find $\beta$ for s lagged functions $g(x_t), \ldots, g(x_{t-s})$ of a subset of n variables x, jointly with T and indicators $1_{\{t=t_i\}}$ for breaks, outliers etc. Final model is a congruent, parsimonious-encompassing representation: $\sum_{i=0}^{s} \lambda_i y_{t-i} = \sum_{i=0}^{s} \beta_i' g(x_{t-i}) + \sum_{j=1}^{m} \gamma_j d_{t,j} + v_t$ (3) where $\lambda_0 = 1$ and $\{v_t\}$ is an innovation process. Embeds the initial $y = f(x)$, but much more general. Globally, learning must be simple to general; but locally, need not be.

26 Implications for automatic methods Automatic Methods for Empirical Model Discovery p.26/151 Same seven stages as for discovery in general. First, theoretical derivation of the relevant set x. Second, going outside current view by automatic creation of a general model from x embedding y = f (x). Third, search by automatic selection to find viable representations: too large for manual labor. Fourth, criteria to recognize when search is completed: congruent parsimonious-encompassing model. Fifth, quantification of the outcome: translated into unbiasedly estimating the resulting model. Sixth, evaluate discovery to check its reality: new data, new tests or new procedures. Can also evaluate the selection process itself. Seventh, summarize vast information set in parsimonious but undominated model.

27 Retaining economic theory insights David F. Hendry Automatic Methods for Empirical Model Discovery p.27/151 Approach is not atheoretic. Theory formulations should be embedded in the GUM, and can be retained without selection. Call such imposed variables 'forcing' variables: this ensures they are retained, but does not guarantee they will be significant. Can also ensure theory-derived signs of the long-run relation are maintained, if not significantly rejected by the evidence. But much observed data variability in economics is due to features absent from most economic theories: which empirical models must handle. Extension of the LDGP candidates, $x_t$, in the GUM allows the theory formulation as a special case, yet protects against contaminating influences (like outliers) absent from theory. Extras can be selected at tight significance levels.

28 Possible economic theory outcomes David F. Hendry Automatic Methods for Empirical Model Discovery p.28/151 1] Theory kept exactly: all aspects significant with anticipated signs, no other variables kept. 2] Theory kept but only part of explanation: all aspects significant with anticipated signs, but other variables also kept as substantively relevant. 3] Theory partially kept: only some aspects significant with anticipated signs, and other variables also kept as substantively relevant. 4] Theory not kept: no aspects significant and other variables do all explanation. Win-win situation: theory kept if valid and complete; yet learn when it is not correct so can automatic model selection work?

29 Route map Automatic Methods for Empirical Model Discovery p.29/151 (1) Discovery (2) Automatic model extension (3) 1-cut model selection (4) Automatic estimation (5) Automatic model selection (6) Model evaluation (7) Excess numbers of variables N > T (8) An empirical example: food expenditure (9) Improving nowcasts Conclusions

30 Extensions outside standard information David F. Hendry Automatic Methods for Empirical Model Discovery p.30/151 Extensions determine how well the LDGP is approximated. Create three extensions automatically: (i) lag formulation to implement sequential factorization; (ii) functional form transformations for non-linearity; (iii) impulse-indicator saturation (IIS) for parameter non-constancy and data contamination. (i) Create s lags $x_t, \ldots, x_{t-s}$ to formulate the general linear model: $y_t = \beta_0 + \sum_{i=1}^{s} \lambda_i y_{t-i} + \sum_{i=1}^{r} \sum_{j=0}^{s} \beta_{i,j} x_{i,t-j} + \epsilon_t$ (4) $x_t$ could also be modelled as a system: $x_t = \gamma + \sum_{j=1}^{s} \Gamma_j x_{t-j} + \epsilon_t$ (5)
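As a concrete illustration of the lag formulation in (4), here is a minimal numpy sketch (the function name and interface are mine, not part of the talk or of Autometrics) that stacks lags 1, ..., s of y and lags 0, ..., s of each candidate regressor into a single design matrix:

```python
import numpy as np

def lag_gum_design(y, X, s):
    """Build the linear part of the GUM in (4): an intercept, lags 1..s of y,
    and lags 0..s of every column of X; the first s rows are lost to lagging."""
    T, r = X.shape
    blocks = [np.ones((T - s, 1))]                                      # intercept
    blocks += [y[s - i:T - i].reshape(-1, 1) for i in range(1, s + 1)]  # y_{t-1}, ..., y_{t-s}
    blocks += [X[s - j:T - j, :] for j in range(0, s + 1)]              # x_t, ..., x_{t-s}
    return y[s:], np.hstack(blocks)

# Example: r = 3 candidate variables and s = 2 lags give 1 + 2 + 3*(2+1) = 12 columns.
rng = np.random.default_rng(1)
y_dep, X_cand = rng.standard_normal(100), rng.standard_normal((100, 3))
y_eff, Z = lag_gum_design(y_dep, X_cand, s=2)
print(Z.shape)   # (98, 12)
```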

31 Automatic non-linear extensions Automatic Methods for Empirical Model Discovery p.31/151 Test for non-linearity in the general linear model by the low-dimensional portmanteau test in Castle and Hendry (2010) (cubics of principal components $z_t$ of the $x_t$). (ii) If reject, create $g(z_t)$, otherwise $g(x_t) = x_t$: presently, implemented general cubics with exponential functions. Number of potential regressors for cubic polynomials is: $M_K = K(K+1)(K+5)/6$. Explosion in the number of terms as $K = r(s+1)$ increases: for example, K = 10 gives $M_K = 275$ and K = 40 gives $M_K = 12{,}300$. Quickly reach a huge $M_K$: but only 3K terms if $z^k_{i,t-j}$, k = 1, 2, 3, are used instead. (Investigating squashing functions, to better approximate non-linearity in economics, suggested by Hal White.)
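The combinatorial explosion is easy to see by evaluating the formula directly (a trivial check, not from the talk):

```python
def cubic_terms(K):
    """Number of regressors in a general cubic polynomial of K variables: K(K+1)(K+5)/6."""
    return K * (K + 1) * (K + 5) // 6

for K in (10, 20, 30, 40):
    print(K, cubic_terms(K), 3 * K)   # full cubic versus only z, z^2, z^3 per principal component
```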

32 Impulse-indicator saturation Automatic Methods for Empirical Model Discovery p.32/151 (iii) To tackle multiple breaks & data contamination (outliers), add T impulse indicators to the candidate set for T observations. Consider $x_i \sim \mathsf{IID}[\mu, \sigma_\epsilon^2]$ for $i = 1, \ldots, T$, where $\mu$ is the parameter of interest. Uncertain of outliers, so add T indicators $1_{\{t=t_i\}}$ to the set of candidate regressors. First, include half of the indicators and record the significant ones: just dummying out T/2 observations for estimating $\mu$. Then omit them, include the other half, and record again. Combine the sub-sample indicators, & select the significant ones. $\alpha T$ indicators are selected on average at significance level $\alpha$. Feasible split-sample impulse-indicator saturation (IIS) algorithm: see Hendry, Johansen and Santos (2008)
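A minimal numpy sketch of the split-half idea for this location model (illustrative only: the IIS implemented in Autometrics uses its multi-block tree search with diagnostics, and all function names here are mine):

```python
import numpy as np
from scipy import stats

def ols_tstats(y, X):
    """OLS estimates and t-statistics for each column of X."""
    T, k = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (T - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

def split_half_iis(y, alpha=0.01):
    """Split-half IIS for y_t = mu + e_t: add half the indicators, keep the significant
    ones, repeat for the other half, then combine and re-select (normal critical values)."""
    T = len(y)
    c = stats.norm.ppf(1 - alpha / 2)
    const = np.ones((T, 1))
    kept = []
    for idx in (np.arange(0, T // 2), np.arange(T // 2, T)):
        D = np.zeros((T, len(idx)))
        D[idx, np.arange(len(idx))] = 1.0
        _, t = ols_tstats(y, np.hstack([const, D]))
        kept += [int(idx[j]) for j in range(len(idx)) if abs(t[1 + j]) > c]
    if kept:                                     # final combined pass over retained indicators
        D = np.zeros((T, len(kept)))
        D[kept, np.arange(len(kept))] = 1.0
        _, t = ols_tstats(y, np.hstack([const, D]))
        kept = [kept[j] for j in range(len(kept)) if abs(t[1 + j]) > c]
    return sorted(kept)

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, 100)
y[75:] += 10.0                                   # 10-SD shift at 0.75T, as in the example below
print(split_half_iis(y))                         # flags the shifted observations 75..99
```

For the 10-SD location shift used in the structural-break example below, the second half-sample pass picks up the shifted observations and the combined pass retains them.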

33 Dynamic generalizations Automatic Methods for Empirical Model Discovery p.33/151 Johansen and Nielsen (2009) extend IIS to both stationary and unit-root autoregressions. When the distribution is symmetric, adding T impulse-indicators to a regression with n variables, coefficient $\beta$ (not selected) and second moment $\Sigma$: $T^{1/2}(\tilde\beta - \beta) \overset{D}{\rightarrow} \mathsf{N}_n[0, \sigma_\epsilon^2 \Sigma^{-1} \Omega_\beta]$ Efficiency of the IIS estimator $\tilde\beta$ with respect to OLS $\hat\beta$, measured by $\Omega_\beta$, depends on $c_\alpha$ and the distribution. Must lose efficiency under the null: but the loss is small: $\alpha T$ indicators retained on average is only 1% of the sample at $\alpha = 1/T$ when T = 100, despite T extra candidates. Potential for major gain under alternatives of breaks and/or data contamination: a variant of robust estimation

34 Efficiency of IIS estimator David F. Hendry Automatic Methods for Empirical Model Discovery p.34/151 [Figure: efficiency of the indicator-saturated estimator, plotted against the critical value c (left panel) and against the tail probability for c (right panel).] Figure from Johansen and Nielsen (2009) shows the outcome for an equal split when $\epsilon_i \sim \mathsf{IID}[\mu, \sigma_\epsilon^2]$. Left panel shows efficiency as a function of $c_\alpha$; right panel as a function of the tail probability $\mathsf{P}(|\epsilon_t| > \sigma_\epsilon c_\alpha)$

35 Structural break example Automatic Methods for Empirical Model Discovery p.35/151 [Figure: Y and fitted values; scaled residuals.] Size of the break is 10 standard errors at 0.75T. There are no outliers in this mis-specified model, as all residuals lie within [-2, 2] SDs: outliers are not the same as structural breaks, so step-wise regression has zero power. Let's see what Autometrics reports

36 Split-sample search in IIS David F. Hendry Automatic Methods for Empirical Model Discovery p.36/151 [Figure: dummies included in the GUM for each block; dummies selected in block 1, block 2 and the final model; final model actual and fitted values.]

37 Specification of GUM Automatic Methods for Empirical Model Discovery p.37/151 Most major formulation decisions now made: which r variables ($z_t$, after transforming $x_t$); their lag lengths (s); functional forms (cubics); structural breaks (any number, anywhere). Leads to the general unrestricted model (GUM): $y_t = \sum_{j=1}^{s} \lambda_j y_{t-j} + \sum_{i=1}^{r} \sum_{j=0}^{s} \beta_{i,j} x_{i,t-j} + \sum_{i=1}^{r} \sum_{j=0}^{s} \rho_{i,j} z_{i,t-j} + \sum_{i=1}^{r} \sum_{j=0}^{s} \theta_{i,j} z_{i,t-j}^2 + \sum_{i=1}^{r} \sum_{j=0}^{s} \gamma_{i,j} z_{i,t-j}^3 + \sum_{i=1}^{T} \delta_i 1_{\{i=t\}} + \epsilon_t$ K = 4r(s+1) + s potential regressors, plus T indicators: includes both factors z and variables x despite collinearity. Bound to have N > T: consider exogeneity later.

38 Route map Automatic Methods for Empirical Model Discovery p.38/151 (1) Discovery (2) Automatic model extension (3) 1-cut model selection (4) Automatic estimation (5) Automatic model selection (6) Model evaluation (7) Excess numbers of variables N > T (8) An empirical example: food expenditure (9) Improving nowcasts Conclusions

39 Model selection Automatic Methods for Empirical Model Discovery p.39/151 To successfully determine what matters and how it enters, all main determinants must be included: omitting key variables adversely affects selected models. Especially forceful issue when data are wide-sense non-stationary: both integrated and not time invariant. Catch-22: have more variables N than observations T, so all cannot be entered from the outset. Requires expanding as well as contracting searches. Have to select: not an option just to estimate. To resolve the conundrum, the analysis proceeds in 5 stages.

40 Five main stages, then evaluation Automatic Methods for Empirical Model Discovery p.40/151 1] 1-cut selection for orthogonal designs with N << T. 2] Selection matters, so consider effects of bias correction on distributions of estimates 3] Compare 1-cut with Autometrics, which works in non-orthogonal models, still with N << T. 4] More variables N than observations T follows as IIS. 5] Multiple breaks, using IIS Having resolved selection, next consider evaluation: 6] Impact of diagnostic testing 7] Role of encompassing in automatic selection 8] Testing exogeneity and invariance

41 Understanding model selection Automatic Methods for Empirical Model Discovery p.41/151 Consider a perfectly orthogonal regression model: $y_t = \sum_{i=1}^{N} \beta_i z_{i,t} + \epsilon_t$ (6) where $\mathsf{E}[z_{i,t} z_{j,t}] = \lambda_{i,i}$ for $i = j$ and $0$ for $i \neq j$, $\epsilon_t \sim \mathsf{IN}[0, \sigma_\epsilon^2]$ and $T \gg N$. Order the N sample $t^2$-statistics testing $\mathsf{H}_0{:}\ \beta_j = 0$: $t^2_{(N)} \geq t^2_{(N-1)} \geq \cdots \geq t^2_{(1)}$ The cut-off m between included and excluded variables is: $t^2_{(m)} \geq c_\alpha^2 > t^2_{(m-1)}$ Larger values retained: all others eliminated. Only one decision needed even for N = 1000: repeated testing does not occur, and goodness of fit is never considered. Maintain average false null retention at one variable by $\alpha \leq 1/N$, with $\alpha$ declining as $T \rightarrow \infty$
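A minimal sketch of the 1-cut rule (the names and the normal approximation to $c_\alpha$ are my own choices, not from the talk):

```python
import numpy as np
from scipy import stats

def one_cut_select(y, X, alpha):
    """1-cut selection for a (near-)orthogonal regression: estimate the full model once,
    then keep every regressor whose squared t-statistic is at least c_alpha^2."""
    T, N = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (T - N)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    t2 = (beta / se) ** 2
    c2 = stats.norm.ppf(1 - alpha / 2) ** 2       # normal approximation to the critical value
    return np.flatnonzero(t2 >= c2), beta, t2      # one decision, no path search
```

With $\alpha \leq 1/N$, the expected number of falsely retained irrelevant variables is about one at most.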

42 Interpretation Automatic Methods for Empirical Model Discovery p.42/151 Path search gives the impression of repeated testing. Confused with selecting from $2^N$ possible models (here $2^{1000} \approx 10^{301}$, an impossible task). We are selecting variables, not models, & only N variables. But selection matters, as only significant outcomes are retained. Sampling variation also entails retaining irrelevant, or missing relevant, variables by chance near the selection margin. Conditional on selecting, estimates are biased away from the origin: but can bias correct as $c_\alpha$ is known. Small efficiency cost under the null for examining many candidate regressors, even N >> T. Almost as good as commencing from the LDGP at the same $c_\alpha$.

43 Monte Carlo simulations for N = 1000 David F. Hendry Automatic Methods for Empirical Model Discovery p.43/151 We now illustrate the above theory by simulating selection of 10 relevant from 1000 candidate variables. DGP is given by: $y_t = \beta_1 x_{1,t} + \cdots + \beta_{10} x_{10,t} + \epsilon_t$, (7) $x_t \sim \mathsf{IN}_{1000}[0, \Omega]$, (8) $\epsilon_t \sim \mathsf{IN}[0, 1]$, (9) where $x_t = (x_{1,t}, \ldots, x_{1000,t})'$. Set $\Omega = \mathsf{I}_{1000}$ for simplicity, keeping regressors fixed between experiments. Use T = 2000 observations. DGP coefficients, $\beta$, and non-centralities, $\psi$, are in Table 1, along with the theoretical powers of t-tests on the individual coefficients.

44 Non-centralities Automatic Methods for Empirical Model Discovery p.44/151 [Table 1: coefficients $\beta_i$, non-centralities $\psi_i$, and theoretical retention probabilities $\mathsf{P}_{\alpha,i}$ at the two significance levels, for $x_1, \ldots, x_{10}$.] GUM contains all 1000 regressors and an intercept: $y_t = \beta_0 + \beta_1 x_{1,t} + \cdots + \beta_{1000} x_{1000,t} + u_t$, $t = 1, \ldots, 2000$. DGP has the first n = 10 variables relevant, so 991 variables are irrelevant in the GUM (with intercept). Report outcomes for $\alpha$ = 1% and 0.1%, with M = 1000 replications, where $1(\cdot)$ is the indicator function.

45 Gauge and potency Automatic Methods for Empirical Model Discovery p.45/151 Let $\tilde\beta_i$ denote the estimate of $\beta_i$ after model selection. Gauge: average retention frequency of irrelevant variables. Potency: average retention frequency of relevant variables. In simulation experiments with M replications: retention rate: $\tilde p_k = \frac{1}{M} \sum_{i=1}^{M} 1(\tilde\beta_{k,i} \neq 0)$, $k = 0, \ldots, N$; potency $= \frac{1}{n} \sum_{k=1}^{n} \tilde p_k$; gauge $= \frac{1}{N-n+1} \left( \tilde p_0 + \sum_{k=n+1}^{N} \tilde p_k \right)$. All retained variables are significant at $c_\alpha$ in 1-cut. But not necessarily the case with automated Gets. Irrelevant variables may be retained because of: (a) diagnostic checking, when a variable is insignificant but its deletion makes a diagnostic test significant, or (b) with encompassing, a variable can be individually insignificant, but not jointly with all variables deleted so far.
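A scaled-down simulation sketch of these definitions under 1-cut selection (the dimensions, seed and non-centralities are illustrative, not the Table 1 values; the talk's experiment uses N = 1000, T = 2000 and M = 1000):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
N, n, T, M, alpha = 100, 10, 300, 200, 0.01
psi = np.linspace(2.0, 6.0, n)                  # illustrative non-centralities
beta_dgp = np.zeros(N)
beta_dgp[:n] = psi / np.sqrt(T)                 # sigma_eps = 1 and near-orthonormal regressors
c2 = stats.norm.ppf(1 - alpha / 2) ** 2

X = rng.standard_normal((T, N))                 # fixed across replications, Omega = I
XtX_inv = np.linalg.inv(X.T @ X)
retained = np.zeros((M, N), dtype=bool)
for m in range(M):
    y = X @ beta_dgp + rng.standard_normal(T)
    b = XtX_inv @ (X.T @ y)
    s2 = (y - X @ b) @ (y - X @ b) / (T - N)
    t2 = b ** 2 / (s2 * np.diag(XtX_inv))
    retained[m] = t2 >= c2                      # 1-cut selection

p_k = retained.mean(axis=0)                     # retention rate of each candidate variable
potency, gauge = p_k[:n].mean(), p_k[n:].mean()   # the slide's gauge also counts the intercept
print(f"gauge = {gauge:.2%} (nominal {alpha:.1%}), potency = {potency:.1%}")
```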

46 Simulation outcomes Automatic Methods for Empirical Model Discovery p.46/151 Simulation gauges and potencies are recorded in Table 2. [Table 2: potency and gauge for 1-cut selection with 1000 variables: at $\alpha$ = 1%, gauge 1.01% and potency 81%; at $\alpha$ = 0.1%, gauge 0.10% and potency 69%.] Gauges not significantly different from the nominal sizes $\alpha$: selection is not over-sized even with N = 1000 variables. Potencies close to average powers even when selecting just n = 10 relevant regressors from 1000 variables.

47 Calculating MSEs Automatic Methods for Empirical Model Discovery p.47/151 Also report MSEs after model selection. $\hat\beta_{k,i}$ is the OLS estimate of the coefficient on $x_{k,t}$ in the GUM for replication i. $\tilde\beta_{k,i}$ is the OLS estimate after model selection, with $\tilde\beta_{k,i} = 0$ when $x_{k,t}$ is not selected in the final model. Calculate the following MSEs: $\mathsf{MSE}_k = M^{-1} \sum_{i=1}^{M} (\hat\beta_{k,i} - \beta_k)^2$, $\mathsf{UMSE}_k = \frac{1}{M} \sum_{i=1}^{M} (\tilde\beta_{k,i} - \beta_k)^2$, $\mathsf{CMSE}_k = \frac{\sum_{i=1}^{M} (\tilde\beta_{k,i} - \beta_k)^2\, 1(\tilde\beta_{k,i} \neq 0)}{\sum_{i=1}^{M} 1(\tilde\beta_{k,i} \neq 0)}$ (set to $\beta_k^2$ if $\sum_{i=1}^{M} 1(\tilde\beta_{k,i} \neq 0) = 0$). Unconditional MSE (UMSE): sets $\tilde\beta_{k,i} = 0$ when a variable is not selected. Conditional MSE (CMSE) is over retained variables only.
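These definitions translate directly into a few lines of numpy (a sketch; the array layout is my own convention):

```python
import numpy as np

def umse_cmse(beta_sel, beta_true):
    """UMSE and CMSE per variable after selection.
    beta_sel:  (M, N) post-selection estimates, equal to 0 where a variable was not retained.
    beta_true: (N,) DGP coefficients."""
    err2 = (beta_sel - beta_true) ** 2
    umse = err2.mean(axis=0)                          # unselected cases count as estimates of 0
    kept = beta_sel != 0
    n_kept = kept.sum(axis=0)
    cmse = np.where(n_kept > 0,
                    (err2 * kept).sum(axis=0) / np.maximum(n_kept, 1),
                    beta_true ** 2)                   # convention when a variable is never retained
    return umse, cmse
```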

48 1-cut selection, N = 1000, n = 10 David F. Hendry Automatic Methods for Empirical Model Discovery p.48/151 [Figure: retention rates of the relevant variables at $\alpha$ = 1% and 0.1%, Monte Carlo versus theory; UMSEs and CMSEs of $\tilde\beta_k$ at 1% and 0.1%.]

49 Simulation MSEs Automatic Methods for Empirical Model Discovery p.49/151 Figure 48 shows that retention rates for individual relevant variables are as expected from the theory. CMSEs are always below UMSEs for relevant variables (bottom graphs in Fig. 48), with the exception of $\tilde\beta_1$ at 0.1%. As a baseline, the UMSE for every estimated coefficient of a variable in the GUM is approximately $\sigma_\epsilon^2/T$ (about 0.0005 here). Selection is generally worse than the GUM for relevant variables in MSE terms: but the model is reduced by about 990 variables on average. Main gains for irrelevant variables, as we will see. First consider the impact of bias-correcting for selecting just those variables where $|t| \geq c_\alpha$

50 Route map Automatic Methods for Empirical Model Discovery p.50/151 (1) Discovery (2) Automatic model extension (3) 1-cut model selection (4) Automatic estimation (5) Automatic model selection (6) Model evaluation (7) Excess numbers of variables N > T (8) An empirical example: food expenditure (9) Improving nowcasts Conclusions

51 Bias corrections Automatic Methods for Empirical Model Discovery p.51/151 Selection matters as only significant variables are retained: so correct the final estimates for selection. Convenient approximation that: $t_{\hat\beta} = \frac{\hat\beta}{\hat\sigma_{\hat\beta}} \approx \frac{\hat\beta}{\sigma_{\hat\beta}} \sim \mathsf{N}\left[\frac{\beta}{\sigma_{\hat\beta}}, 1\right] = \mathsf{N}[\psi, 1]$ when the non-centrality of the t-test is $\psi = \beta/\sigma_{\hat\beta}$. Use a Gaussian approximation (IIS helps ensure): $\phi(w) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2} w^2\right)$ with $\Phi(w) = \int_{-\infty}^{w} \phi(x)\,\mathrm{d}x$. The doubly-truncated distribution's expected t-value is: $\mathsf{E}\left[t_{\hat\beta} \mid |t_{\hat\beta}| > c_\alpha; \psi\right] = \psi^*$ (10) so the observed t-value is an unbiased estimator for $\psi^*$

52 Truncation correction Automatic Methods for Empirical Model Discovery p.52/151 Sample selection induces: $\psi^* = \psi + \frac{\phi(c_\alpha - \psi) - \phi(-c_\alpha - \psi)}{1 - \Phi(c_\alpha - \psi) + \Phi(-c_\alpha - \psi)} = \psi + r(\psi, c_\alpha)$ (11) Correct $\hat\beta$ once $\psi$ is known: for $\beta > 0$, say: $\mathsf{E}\left[\hat\beta \,\middle|\, \frac{|\hat\beta|}{\sigma_{\hat\beta}} \geq c_\alpha; \psi\right] = \beta\left(1 + \frac{r(\psi, c_\alpha)}{\psi}\right) = \beta^*$ (12) Let $\widehat{\psi^*} = t_{\hat\beta}$, then solve: $\tilde\psi = t_{\hat\beta} - r(\tilde\psi, c_\alpha)$ (13) leading to the bias-corrected parameter estimate, from inverting (12): $\tilde\beta = \hat\beta \left(\tilde\psi / t_{\hat\beta}\right)$ (14)
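A minimal numerical sketch of (11)-(14) (the function names are mine, and the correction actually implemented in Autometrics may differ in detail):

```python
import numpy as np
from scipy import stats, optimize

def r(psi, c):
    """Truncation-induced shift r(psi, c_alpha) from equation (11)."""
    num = stats.norm.pdf(c - psi) - stats.norm.pdf(-c - psi)
    den = 1.0 - stats.norm.cdf(c - psi) + stats.norm.cdf(-c - psi)
    return num / den

def bias_correct(beta_hat, t_obs, c):
    """Solve t_obs = psi + r(psi, c) for psi (eqs. 10-13), then rescale beta_hat (eq. 14).
    Assumes a retained coefficient with t_obs >= c > 0; the negative case is symmetric."""
    psi_tilde = optimize.brentq(lambda p: p + r(p, c) - t_obs, 0.0, t_obs)
    return beta_hat * psi_tilde / t_obs

c_01 = stats.norm.ppf(0.995)                  # two-sided 1% critical value, about 2.58
print(bias_correct(0.35, 3.1, c_01))          # shrinks the estimate towards zero (roughly halved)
```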

53 Implementing bias correction Automatic Methods for Empirical Model Discovery p.53/151 Bias corrects closely, not exactly, for relevant: over-corrects for some t-values. Some increase in MSEs of relevant variables. Correction exacerbates downward bias in unconditional estimates of relevant coefficients & increases MSEs slightly. No impact on bias of estimated parameters of irrelevant variables as their β i = 0, so unbiased with or without selection But remarkable decrease in MSEs of irrelevant variables First free lunch of new approach. Obvious why in retrospect most correction for t near c α, which occurs for retained irrelevant variables.

54 Simulation MSEs Automatic Methods for Empirical Model Discovery p.54/151 Impact of bias corrections on retained irrelevant and relevant variables, for N = 1000 and n = 10 in (6). [Table 3: average CMSEs, times 100, for retained relevant and irrelevant variables (excluding $\beta_0$), with and without bias correction, at $\alpha$ = 1% and 0.1%, averaged over the 990 irrelevant and the 10 relevant variables.] Correction greatly reduces the MSEs of irrelevant variables in both unconditional and conditional distributions. Coefficients of variables with $|t| < c_\alpha$ are not bias corrected: insignificant estimates are set to zero.

55 Bias correcting conditional distributions at 5% David F. Hendry Automatic Methods for Empirical Model Discovery p.55/151 [Figure: conditional distributions of estimates before and after bias correction at $\alpha$ = 5%, for panels (a) $\psi$ = 2, (b) and (c) larger non-centralities, and (d) the intercept.]

56 Implications of selection Automatic Methods for Empirical Model Discovery p.56/151 Despite selecting from N = 1000 potential variables when only n = 10 are relevant: (1) nearly unbiased estimates of coefficients and equation standard errors can be obtained; (2) little loss of efficiency from checking many irrelevant variables; (3) some loss from not retaining relevant variables at large c α ; (4) huge gain by not commencing from an under-specified model; (5) even works well for fat-tailed errors at tight α when IIS used see below. Now add path search for non-orthogonal data sets.

57 Route map Automatic Methods for Empirical Model Discovery p.57/151 (1) Discovery (2) Automatic model extension (3) 1-cut model selection (4) Automatic estimation (5) Automatic model selection (6) Model evaluation (7) Excess numbers of variables N > T (8) An empirical example: food expenditure (9) Improving nowcasts Conclusions

58 Autometrics improves on previous algorithms David F. Hendry Automatic Methods for Empirical Model Discovery p.58/151 Search paths: Autometrics examines whole search space; discards irrelevant routes systematically. Likelihood-based: Autometrics implemented in likelihood framework. Efficiency: Autometrics improves computational efficiency: avoids repeated estimation & diagnostic testing, remembers terminal models. Structured: Autometrics separates estimation criterion, search algorithm, evaluation, & termination decision. Generality: Autometrics can handle N > T. If GUM is congruent, so are all terminals: undominated, mutually-encompassing representations. If several terminal models, all reported: can combine, or one selected (by, e.g., Schwarz, 1978, criterion).

59 Autometrics tree search Automatic Methods for Empirical Model Discovery p.59/151 Search follows branches till no insignificant variables; tests for congruence and parsimonious encompassing; backtracks if either fails, till first non-rejection found.

60 Selecting by Autometrics Automatic Methods for Empirical Model Discovery p.60/151 Even when 1-cut is applicable, there is little loss, and often a gain, from using the path-search algorithm Autometrics. Autometrics is applicable to non-orthogonal problems, and to N > T. Gauge (average retention rate of irrelevant variables) is close to $\alpha$. Potency (average retention rate of relevant variables) is near the theory value for a 1-off test. Goodness-of-fit is not directly used to select models & no attempt is made to prove that a given set of variables matters, but the choice of $c_\alpha$ affects $R^2$ and n through retention by $|t_{(n)}| \geq c_\alpha$. Conclude: repeated testing is not a concern.

61 Moving from 1-cut to Autometrics Automatic Methods for Empirical Model Discovery p.61/151 For more detail about selection outcomes, we consider 10 experiments with N = 10 candidate regressors and T = 75 based on the design in Castle, Qin and Reed (2009): $y_t = \beta_0 + \beta_1 x_{1,t} + \cdots + \beta_{10} x_{10,t} + \epsilon_t$, (15) $x_t \sim \mathsf{IN}_{10}[0, \mathsf{I}_{10}]$, (16) $\epsilon_t \sim \mathsf{IN}\left[0, (\lambda \sqrt{n})^2\right]$, $n = 1, \ldots, 10$, $t = 1, \ldots, T$ (17) where $x_t = (x_{1,t}, \ldots, x_{10,t})'$, fixed across replications. Equations (15)-(17) specify 10 different DGPs, indexed by n, each having n relevant variables with $\beta_1 = \cdots = \beta_n = 1$ and 10 - n irrelevant variables ($\beta_{n+1} = \cdots = \beta_{10} = 0$). Throughout, they set $\beta_0 = 5$. $\lambda = 0.4$ is their $R^2 = 0.9$ experiment. Table 4 reports the non-centralities, $\psi$.

62 Experimental design Automatic Methods for Empirical Model Discovery p.62/151 [Table 4: non-centralities $\psi_1, \ldots, \psi_n$ for the simulation experiments (15)-(17), for each n.] The GUM is the same for all 10 DGPs: $y_t = \beta_0 + \beta_1 x_{1,t} + \cdots + \beta_{10} x_{10,t} + u_t$. We first investigate how the general search algorithm Autometrics, without diagnostic checking, performs relative to 1-cut. Comparative gauges are recorded in Figure 63. The 1-cut gauge is very accurate, and Autometrics is quite close, especially at the tighter significance levels. Potency ratios are unity for all significance levels.

63 Gauges for 1-cut & Autometrics Automatic Methods for Empirical Model Discovery p.63/151 [Figure: gauges for the 1-cut rule and Autometrics as functions of n, at $\alpha$ = 10%, 5% and 1%.]

64 MSEs for 1-cut & Autometrics Automatic Methods for Empirical Model Discovery p.64/151 Figure 65 records ratios of MSEs of Autometrics selection to 1-cut for both unconditional and conditional distributions, but with no diagnostic tests and no bias correction. Lines labelled Relevant report the ratios of average MSEs over all relevant variables for a given n. Lines labelled Irrelevant are based on average MSEs of irrelevant variables for each DGP (none when n = 10). Unconditionally, ratios are close to 1 for irrelevant variables; but there is some advantage to using Autometrics for relevant variables, as ratios are uniformly less than 1. Benefits to selection are largest when there are few relevant variables that are highly significant. Conditionally, Autometrics outperforms 1-cut in most cases.

65 Ratios of MSEs for Autometrics to 1-cut David F. Hendry Automatic Methods for Empirical Model Discovery p.65/151 [Figure: UMSE and CMSE ratios of Autometrics to 1-cut, for relevant and irrelevant variables, as functions of n, at $\alpha$ = 5% and 1%.]

66 Selecting non-linear models Automatic Methods for Empirical Model Discovery p.66/151 Transpires there are four major sub-problems: (A) specify the general form of non-linearity (B) non-normality: non-linear functions capture outliers (C) excess numbers of irrelevant variables (D) potentially more variables than observations Have solutions to all four sub-problems: (A) investigator's preferred general function, followed by encompassing tests against specific ogive forms (B) remove outliers by IIS (C) super-conservative selection strategy (D) multi-stage combinatorial selection for N > T Automatic algorithm for up to cubic polynomials with polynomials times exponentials in Castle and Hendry (2009).

67 Information criteria Automatic Methods for Empirical Model Discovery p.67/151 Performance of information-criteria selection algorithms is well known for stationary and ergodic autoregressions. BIC and HQ are consistent: the DGP model is selected with $p \rightarrow 1$ as $T \rightarrow \infty$ relative to n: $2n \log(\log(T))/T$ is the minimum penalty rate. AIC is asymptotically efficient: good for selecting stationary AR forecasting models. In Gets terms, the non-centralities $\psi$ of t-tests diverge, and the significance levels $\alpha$ converge on zero. Figure 68 illustrates the rules for N = 10. BIC and HQ differ substantively for small T. Deliberate choice of blocks for Gets

68 Significance levels of selection rules Automatic Methods for Empirical Model Discovery p.68/151 [Figure: implied significance levels of AIC, HQ and BIC, and of the Liberal and Conservative Gets strategies, as functions of T.]

69 AIC, SIC, HQ, t² = 2 significance levels Automatic Methods for Empirical Model Discovery p.69/151 [Figure: implied significance levels of AIC (k = 10, 40), SIC (k = 10, 40) and the t² = 2 rule (k = 10), as functions of T.]

70 Information criteria selection Automatic Methods for Empirical Model Discovery p.70/151 BIC selects n from N regressors in: $y_t = \sum_{i=1}^{N} \beta_i z_{i,t} + u_t$ (18) to minimize: $\mathsf{BIC}_n = \ln \tilde\sigma_n^2 + \frac{n \ln T}{T}$, where: $\tilde\sigma_n^2 = \frac{1}{T} \sum_{t=1}^{T} \left( y_t - \sum_{i=1}^{n} \tilde\beta_i z_{i,t} \right)^2 = \frac{1}{T} \sum_{t=1}^{T} \tilde u_t^2$ (19) Full search over all $n \in (0, N)$ entails $2^N$ models. When N = 40, then $2^{40} \approx 1.1 \times 10^{12}$ possible models. AIC and HQ also penalize the log-likelihood by f(n, T) for n parameters and sample T
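The criterion in (18)-(19) is easy to compute; a brief sketch (the exhaustive-search helper is illustrative and only feasible for tiny subset sizes, which is exactly the point):

```python
import numpy as np
from itertools import combinations

def bic(y, Z):
    """BIC_n = ln(sigma_tilde_n^2) + n*ln(T)/T for the regression of y on the columns of Z."""
    T, n = Z.shape
    beta, _, _, _ = np.linalg.lstsq(Z, y, rcond=None)
    sigma2 = np.mean((y - Z @ beta) ** 2)
    return np.log(sigma2) + n * np.log(T) / T

def best_subset_bic(y, X, max_n):
    """Exhaustive BIC search over subsets of at most max_n regressors; the unrestricted
    2^N search is infeasible (N = 40 already gives about 10^12 models)."""
    N = X.shape[1]
    best = (np.inf, ())
    for n in range(1, max_n + 1):
        for subset in combinations(range(N), n):
            best = min(best, (bic(y, X[:, list(subset)]), subset))
    return best
```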

71 Comparisons with BIC Automatic Methods for Empirical Model Discovery p.71/151 When T = 140, with N = 40: $\mathsf{BIC}_{41} < \mathsf{BIC}_{40}$ whenever $t^2_{(41)} \geq 3.63$, i.e. $|t_{(41)}| \geq 1.9$. To select no variables when the null is true requires $t^2_{(k)} \leq (T - k)(T^{1/T} - 1)$ for every candidate, so require every $|t_{(i)}| < 1.9$, which occurs with probability: $\mathsf{P}\left(|t_{(i)}| < 1.9,\ i = 1, \ldots, 40\right) \approx (0.94)^{40} = 0.09$ (20) BIC would retain some null variables 91% of the time. If we discount highly-insignificant variables, $|t_{(i)}| < 2.2$, then $\mathsf{P}(|t_{(i)}| < 2.2,\ i = 1, \ldots, 40) \approx (0.97)^{40} = 0.3$. Still keeps some 70% of the time, but much improved: explains why BIC does well in Hansen (1999)
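These probabilities are quick to verify under the normal approximation (a check, not part of the talk):

```python
from scipy import stats

p19 = 2 * stats.norm.cdf(1.9) - 1      # P(|t| < 1.9), about 0.94
p22 = 2 * stats.norm.cdf(2.2) - 1      # P(|t| < 2.2), about 0.97
print(round(p19 ** 40, 3), round(p22 ** 40, 3))   # ~0.09 and ~0.32: chances of retaining nothing
```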

72 Improving ICs Automatic Methods for Empirical Model Discovery p.72/151 [1] ICs do not ensure an adequate initial model specification: Gets commences by testing the GUM for congruency [2] Pre-selection alters IC calculations and helps locate the DGP: Gets elimination tests at tight $\alpha$ reduce from N [3] The trade-off of retaining irrelevant versus losing relevant variables remains: changing IC weights alters the implicit significance level [4] Asymptotic comfort of efficient or consistent selection does not greatly restrict strategy in small samples [5] Selection criteria too loose as $N \rightarrow T$: ad hoc fix [6] Unclear how to use when N > T Gets corrects all of these drawbacks

73 Lasso Automatic Methods for Empirical Model Discovery p.73/151 Writing: $\epsilon = y - X\beta$, the Lasso minimizes the RSS subject to the sum of the absolute coefficients being less than a constant $\tau$: $\min_\beta\ \epsilon'\epsilon$ subject to $\sum_{i=1}^{k} |\beta_i| \leq \tau$ (21) Effects: Shrinkage: some coefficients shrunk towards zero; Selection: some coefficients set to zero. Usage: Regression shrinkage (predictions, etc.); Model selection: keep regressors with non-zero coefficients, but use OLS estimates instead
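A minimal scikit-learn sketch of that last usage, Lasso for selection followed by OLS re-estimation of the retained regressors (the simulated data and the cross-validated choice of shrinkage are illustrative; the slide after next mentions AIC, SC, C_p or generalized cross-validation for that choice):

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
T, N, n_rel = 150, 40, 5
X = rng.standard_normal((T, N))
y = X[:, :n_rel] @ np.full(n_rel, 0.5) + rng.standard_normal(T)

lasso = LassoCV(cv=5).fit(X, y)                    # shrinkage tau chosen by cross-validation
selected = np.flatnonzero(lasso.coef_ != 0)        # selection: the non-zero coefficients
ols = LinearRegression().fit(X[:, selected], y)    # then re-estimate by OLS, as on the slide
print(selected, np.round(ols.coef_, 2))
```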

74 Computational aspects Automatic Methods for Empirical Model Discovery p.74/151 Forward stepwise: find the regressor most correlated with y, partial it out from y and the remaining xs, repeat. Forward stagewise: a less greedy version of stepwise: take many tiny steps. Least angle regression: find the regressor most correlated with y, say $x_1$; only take a step up to the point that the next most correlated regressor, say $x_2$, has equal correlation; increase both $\beta_1$ and $\beta_2$ until the third in line has equal correlation with the current residuals; repeat.
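A simplified sketch of the forward-stepwise idea (greedy search on residual correlations; the algorithms on this slide differ in their exact partialling-out and step sizes, and the function name is mine):

```python
import numpy as np

def forward_stepwise(y, X, k_max):
    """Greedy forward selection: repeatedly add the regressor most correlated with the
    current residuals, refitting by OLS after each addition."""
    T, N = X.shape
    active, resid = [], y - y.mean()
    for _ in range(k_max):
        corr = np.abs(X.T @ resid)
        corr[active] = -np.inf                        # ignore regressors already selected
        active.append(int(np.argmax(corr)))
        Xa = np.column_stack([np.ones(T), X[:, active]])
        beta, _, _, _ = np.linalg.lstsq(Xa, y, rcond=None)
        resid = y - Xa @ beta
    return active
```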

75 Inference Automatic Methods for Empirical Model Discovery p.75/151 Tibshirani (1996): quadratic programming for fixed $\tau$, requiring 2k + 1 steps. Efron, Hastie, Johnstone and Tibshirani (2004): algorithm for least angle regression; a small modification produces the Lasso. But no standard errors, so use the bootstrap, or sandwich-type estimates from a penalized likelihood. Selecting the degree of shrinkage by AIC, SC, $C_p$ or generalized cross-validation. Need to know the degrees of freedom for the ICs: approximate by k, the number of non-zero coefficients. Basically an attempt to revive stepwise

76 Lasso on the JEDC experiments Automatic Methods for Empirical Model Discovery p.76/151 [Table: gauge (%), potency (%), percentage of replications in which the DGP is found, and final-model rejection rate at a nominal 5%, for Stepwise, Lasso and Autometrics, first selecting by SC or at nominal sizes of 5%-15%, and then enforcing the true model size k; rows correspond to different error autocorrelations $\rho$.]

77 Dynamic correlation experiments Automatic Methods for Empirical Model Discovery p.77/151 [Figure: null and non-null rejection percentages for Autometrics (1% and 5%), Stepwise (1% and 5%) and Lasso (BIC).]

78 Correlation range experiments Automatic Methods for Empirical Model Discovery p.78/151 [Figure: null rejection, non-null rejection and final-model rejection percentages against $\rho$, for Autometrics (1% and 5%), Stepwise (1% and 5%) and Lasso (BIC), with the nominal sizes marked.]

79 HP experiments: Lasso Automatic Methods for Empirical Model Discovery p.79/151 [Table: gauge, potency and DGP-recovery rates for Lasso selection (BIC) versus Autometrics (5% nominal size) on the HP2, HP7, HP8 and HP9 experiments, and the same comparison when the true model size k is enforced.] Enforcing the true model size does not help Lasso selection.

80 Route map Automatic Methods for Empirical Model Discovery p.80/151 (1) Discovery (2) Automatic model extension (3) 1-cut model selection (4) Automatic estimation (5) Automatic model selection (6) Model evaluation (7) Excess numbers of variables N > T (8) An empirical example: food expenditure (9) Improving nowcasts Conclusions

81 Model evaluation criteria Automatic Methods for Empirical Model Discovery p.81/151 Given the transformed system version of (3): $\Lambda(L) h(y)_t = \mathsf{B}(L) g(z)_t + \Gamma d_t + v_t$ (22) where $v_t \sim \mathsf{D}_{n_1}[0, \Sigma_v]$ when $x_t = (y_t : z_t)$ of dimensions $(n_1 : n_2)$: [1] homoscedastic innovation $v_t$ [2] weak exogeneity of $z_t$ for the parameters of interest $\mu$ [3] constant, invariant parameter $\mu$ [4] data-admissible formulations on accurate observations [5] theory-consistent, identifiable structures [6] encompass rival models Exhaustive nulls to test, but many alternatives. Models which satisfy the first four are congruent. An encompassing, congruent, theory-consistent model satisfies all six criteria

82 Integrated data Automatic Methods for Empirical Model Discovery p.82/151 Autometrics conducts inferences for I(0) Most selection tests remain valid: see Sims, Stock and Watson (1990) Only tests for a unit root need non-standard critical values Implemented PcGive cointegration test in PcGets Quick Modeler Most diagnostic tests also valid for integrated series: see Wooldridge (1999) Heteroscedasticity tests an exception: powers of variables then behave oddly see Caceres (2007)

83 Role of mis-specification testing Automatic Methods for Empirical Model Discovery p.83/151 Under the null of a congruent GUM, Figure 84 compares gauges for Autometrics with diagnostic checking on versus off (n = 1, ..., N = 10; $R^2$ = 0.9). Gauge is close to $\alpha$ if diagnostic tests are not checked. Gauge is larger than $\alpha$ with diagnostics on, when checking to ensure a congruent reduction. Difference due to retaining insignificant irrelevant variables which proxy chance departures from the null of the mis-specification tests.

84 Gauges with diagnostic tests off & on David F. Hendry Automatic Methods for Empirical Model Discovery p.84/151 [Figure: gauges for Autometrics with and without diagnostic checking, at $\alpha$ = 10%, 5% and 1%, as functions of n.]

85 Impact of mis-specification testing on MSEs David F. Hendry Automatic Methods for Empirical Model Discovery p.85/151 Figure 86 records ratios of UMSEs and CMSEs with diagnostic tests switched off to on, averaging within relevant and irrelevant variables. Switching diagnostics off generally improves UMSEs, but worsens results conditionally, with the impact coming through the irrelevant variables. Switching diagnostics off leads to fewer irrelevant regressors being retained overall, improving UMSEs, but retained irrelevant variables are now more significant than with diagnostics on. Impact is largest at tight significance levels: at 10%, MSE ratios so close to unity they are not plotted.

86 Ratios of MSEs for diagnostic tests off & on David F. Hendry Automatic Methods for Empirical Model Discovery p.86/151 [Figure: UMSE and CMSE ratios with diagnostics off relative to on, for relevant and irrelevant variables, as functions of n, at $\alpha$ = 5% and 1%.]

87 Monte Carlo experimental designs Automatic Methods for Empirical Model Discovery p.87/151 [Table 5: Monte Carlo designs (HP2, HP7, HP8, HP9, JEDC and the S experiments), listing for each the sample size T, the numbers of relevant and candidate variables, and the DGP t-values, e.g. (10.9, 16.7, 8.2) for an HP design, (2, 3, 4, 6, 8) for JEDC, and t-values of 2, 3 or 4 on eight relevant variables for the S designs.]

88 Selection effects on tests Automatic Methods for Empirical Model Discovery p.88/151 Figure 89 shows calibration null rejection distributions (QQ plots) of all tests Also studied impact of selection on heteroscedasticity test statistics across experiments Figure 90 shows ratios of actual to nominal sizes Little change in quantiles above nominal significance: but increasing impact as quantile decreases Bound to occur: models with significant heteroscedasticity not selected Not a distortion of sampling properties: decision is taken for GUM Conditional on that, no change should occur


More information

Volume 03, Issue 6. Comparison of Panel Cointegration Tests

Volume 03, Issue 6. Comparison of Panel Cointegration Tests Volume 03, Issue 6 Comparison of Panel Cointegration Tests Deniz Dilan Karaman Örsal Humboldt University Berlin Abstract The main aim of this paper is to compare the size and size-adjusted power properties

More information

New Developments in Automatic General-to-specific Modelling

New Developments in Automatic General-to-specific Modelling New Developments in Automatic General-to-specific Modelling David F. Hendry and Hans-Martin Krolzig Economics Department, Oxford University. December 8, 2001 Abstract We consider the analytic basis for

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Econ 623 Econometrics II Topic 2: Stationary Time Series

Econ 623 Econometrics II Topic 2: Stationary Time Series 1 Introduction Econ 623 Econometrics II Topic 2: Stationary Time Series In the regression model we can model the error term as an autoregression AR(1) process. That is, we can use the past value of the

More information

A Practitioner s Guide to Cluster-Robust Inference

A Practitioner s Guide to Cluster-Robust Inference A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode

More information

The Linear Regression Model

The Linear Regression Model The Linear Regression Model Carlo Favero Favero () The Linear Regression Model 1 / 67 OLS To illustrate how estimation can be performed to derive conditional expectations, consider the following general

More information

Applied Microeconometrics (L5): Panel Data-Basics

Applied Microeconometrics (L5): Panel Data-Basics Applied Microeconometrics (L5): Panel Data-Basics Nicholas Giannakopoulos University of Patras Department of Economics ngias@upatras.gr November 10, 2015 Nicholas Giannakopoulos (UPatras) MSc Applied Economics

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models

Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models Prof. Massimo Guidolin 019 Financial Econometrics Winter/Spring 018 Overview ARCH models and their limitations Generalized ARCH models

More information

Testing for Unit Roots with Cointegrated Data

Testing for Unit Roots with Cointegrated Data Discussion Paper No. 2015-57 August 19, 2015 http://www.economics-ejournal.org/economics/discussionpapers/2015-57 Testing for Unit Roots with Cointegrated Data W. Robert Reed Abstract This paper demonstrates

More information

Simultaneous Equation Models Learning Objectives Introduction Introduction (2) Introduction (3) Solving the Model structural equations

Simultaneous Equation Models Learning Objectives Introduction Introduction (2) Introduction (3) Solving the Model structural equations Simultaneous Equation Models. Introduction: basic definitions 2. Consequences of ignoring simultaneity 3. The identification problem 4. Estimation of simultaneous equation models 5. Example: IS LM model

More information

Lecture 2: Univariate Time Series

Lecture 2: Univariate Time Series Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes Prof. Massimo Guidolin 20192 Financial Econometrics Spring/Winter 2017 Overview Motivation:

More information

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Econ 423 Lecture Notes: Additional Topics in Time Series 1 Econ 423 Lecture Notes: Additional Topics in Time Series 1 John C. Chao April 25, 2017 1 These notes are based in large part on Chapter 16 of Stock and Watson (2011). They are for instructional purposes

More information

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 4 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 23 Recommended Reading For the today Serial correlation and heteroskedasticity in

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

TESTING FOR CO-INTEGRATION

TESTING FOR CO-INTEGRATION Bo Sjö 2010-12-05 TESTING FOR CO-INTEGRATION To be used in combination with Sjö (2008) Testing for Unit Roots and Cointegration A Guide. Instructions: Use the Johansen method to test for Purchasing Power

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

Post-Selection Inference

Post-Selection Inference Classical Inference start end start Post-Selection Inference selected end model data inference data selection model data inference Post-Selection Inference Todd Kuffner Washington University in St. Louis

More information

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"

Ninth ARTNeT Capacity Building Workshop for Trade Research Trade Flows and Trade Policy Analysis Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis" June 2013 Bangkok, Thailand Cosimo Beverelli and Rainer Lanz (World Trade Organization) 1 Selected econometric

More information

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H.

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H. ACE 564 Spring 2006 Lecture 8 Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information by Professor Scott H. Irwin Readings: Griffiths, Hill and Judge. "Collinear Economic Variables,

More information

DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND

DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND Testing For Unit Roots With Cointegrated Data NOTE: This paper is a revision of

More information

CHAPTER 21: TIME SERIES ECONOMETRICS: SOME BASIC CONCEPTS

CHAPTER 21: TIME SERIES ECONOMETRICS: SOME BASIC CONCEPTS CHAPTER 21: TIME SERIES ECONOMETRICS: SOME BASIC CONCEPTS 21.1 A stochastic process is said to be weakly stationary if its mean and variance are constant over time and if the value of the covariance between

More information

MLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project

MLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project MLR Model Selection Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en

More information

Volatility. Gerald P. Dwyer. February Clemson University

Volatility. Gerald P. Dwyer. February Clemson University Volatility Gerald P. Dwyer Clemson University February 2016 Outline 1 Volatility Characteristics of Time Series Heteroskedasticity Simpler Estimation Strategies Exponentially Weighted Moving Average Use

More information

Linear Models (continued)

Linear Models (continued) Linear Models (continued) Model Selection Introduction Most of our previous discussion has focused on the case where we have a data set and only one fitted model. Up until this point, we have discussed,

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Non-Stationary Time Series, Cointegration, and Spurious Regression

Non-Stationary Time Series, Cointegration, and Spurious Regression Econometrics II Non-Stationary Time Series, Cointegration, and Spurious Regression Econometrics II Course Outline: Non-Stationary Time Series, Cointegration and Spurious Regression 1 Regression with Non-Stationarity

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

The Generalized Cochrane-Orcutt Transformation Estimation For Spurious and Fractional Spurious Regressions

The Generalized Cochrane-Orcutt Transformation Estimation For Spurious and Fractional Spurious Regressions The Generalized Cochrane-Orcutt Transformation Estimation For Spurious and Fractional Spurious Regressions Shin-Huei Wang and Cheng Hsiao Jan 31, 2010 Abstract This paper proposes a highly consistent estimation,

More information

Forecast comparison of principal component regression and principal covariate regression

Forecast comparison of principal component regression and principal covariate regression Forecast comparison of principal component regression and principal covariate regression Christiaan Heij, Patrick J.F. Groenen, Dick J. van Dijk Econometric Institute, Erasmus University Rotterdam Econometric

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

DEPARTMENT OF ECONOMICS DISCUSSION PAPER SERIES

DEPARTMENT OF ECONOMICS DISCUSSION PAPER SERIES ISSN 1471-0498 DEPARTMENT OF ECONOMICS DISCUSSION PAPER SERIES Deciding Between Alternative Approaches In Macroeconomics David Hendry Number 778 January 2016 Manor Road Building, Oxford OX1 3UQ Deciding

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

EC4051 Project and Introductory Econometrics

EC4051 Project and Introductory Econometrics EC4051 Project and Introductory Econometrics Dudley Cooke Trinity College Dublin Dudley Cooke (Trinity College Dublin) Intro to Econometrics 1 / 23 Project Guidelines Each student is required to undertake

More information

MS-C1620 Statistical inference

MS-C1620 Statistical inference MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents

More information

Lecture 7: Dynamic panel models 2

Lecture 7: Dynamic panel models 2 Lecture 7: Dynamic panel models 2 Ragnar Nymoen Department of Economics, UiO 25 February 2010 Main issues and references The Arellano and Bond method for GMM estimation of dynamic panel data models A stepwise

More information

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han Econometrics Honor s Exam Review Session Spring 2012 Eunice Han Topics 1. OLS The Assumptions Omitted Variable Bias Conditional Mean Independence Hypothesis Testing and Confidence Intervals Homoskedasticity

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Time Series and Forecasting Lecture 4 NonLinear Time Series

Time Series and Forecasting Lecture 4 NonLinear Time Series Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y

More information

Instrumental Variables and the Problem of Endogeneity

Instrumental Variables and the Problem of Endogeneity Instrumental Variables and the Problem of Endogeneity September 15, 2015 1 / 38 Exogeneity: Important Assumption of OLS In a standard OLS framework, y = xβ + ɛ (1) and for unbiasedness we need E[x ɛ] =

More information

Classification and Regression Trees

Classification and Regression Trees Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity

More information

Interpreting Regression Results

Interpreting Regression Results Interpreting Regression Results Carlo Favero Favero () Interpreting Regression Results 1 / 42 Interpreting Regression Results Interpreting regression results is not a simple exercise. We propose to split

More information

Section 2 NABE ASTEF 65

Section 2 NABE ASTEF 65 Section 2 NABE ASTEF 65 Econometric (Structural) Models 66 67 The Multiple Regression Model 68 69 Assumptions 70 Components of Model Endogenous variables -- Dependent variables, values of which are determined

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Model Selection. Frank Wood. December 10, 2009

Model Selection. Frank Wood. December 10, 2009 Model Selection Frank Wood December 10, 2009 Standard Linear Regression Recipe Identify the explanatory variables Decide the functional forms in which the explanatory variables can enter the model Decide

More information

Alfredo A. Romero * College of William and Mary

Alfredo A. Romero * College of William and Mary A Note on the Use of in Model Selection Alfredo A. Romero * College of William and Mary College of William and Mary Department of Economics Working Paper Number 6 October 007 * Alfredo A. Romero is a Visiting

More information

USING MODEL SELECTION ALGORITHMS TO OBTAIN RELIABLE COEFFICIENT ESTIMATES

USING MODEL SELECTION ALGORITHMS TO OBTAIN RELIABLE COEFFICIENT ESTIMATES doi: 10.1111/j.1467-6419.2011.00704.x USING MODEL SELECTION ALGORITHMS TO OBTAIN RELIABLE COEFFICIENT ESTIMATES Jennifer L. Castle Oxford University Xiaochuan Qin University of Colorado W. Robert Reed

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 4 Jakub Mućk Econometrics of Panel Data Meeting # 4 1 / 30 Outline 1 Two-way Error Component Model Fixed effects model Random effects model 2 Non-spherical

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. Linear-in-Parameters Models: IV versus Control Functions 2. Correlated

More information

SUPPLEMENTARY TABLES FOR: THE LIKELIHOOD RATIO TEST FOR COINTEGRATION RANKS IN THE I(2) MODEL

SUPPLEMENTARY TABLES FOR: THE LIKELIHOOD RATIO TEST FOR COINTEGRATION RANKS IN THE I(2) MODEL SUPPLEMENTARY TABLES FOR: THE LIKELIHOOD RATIO TEST FOR COINTEGRATION RANKS IN THE I(2) MODEL Heino Bohn Nielsen and Anders Rahbek Department of Economics University of Copenhagen. heino.bohn.nielsen@econ.ku.dk

More information

Non-Stationary Time Series and Unit Root Testing

Non-Stationary Time Series and Unit Root Testing Econometrics II Non-Stationary Time Series and Unit Root Testing Morten Nyboe Tabor Course Outline: Non-Stationary Time Series and Unit Root Testing 1 Stationarity and Deviation from Stationarity Trend-Stationarity

More information

A nonparametric test for seasonal unit roots

A nonparametric test for seasonal unit roots Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna To be presented in Innsbruck November 7, 2007 Abstract We consider a nonparametric test for the

More information

FinQuiz Notes

FinQuiz Notes Reading 10 Multiple Regression and Issues in Regression Analysis 2. MULTIPLE LINEAR REGRESSION Multiple linear regression is a method used to model the linear relationship between a dependent variable

More information

ECON 4160, Spring term Lecture 12

ECON 4160, Spring term Lecture 12 ECON 4160, Spring term 2013. Lecture 12 Non-stationarity and co-integration 2/2 Ragnar Nymoen Department of Economics 13 Nov 2013 1 / 53 Introduction I So far we have considered: Stationary VAR, with deterministic

More information

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population

More information

Forecasting Levels of log Variables in Vector Autoregressions

Forecasting Levels of log Variables in Vector Autoregressions September 24, 200 Forecasting Levels of log Variables in Vector Autoregressions Gunnar Bårdsen Department of Economics, Dragvoll, NTNU, N-749 Trondheim, NORWAY email: gunnar.bardsen@svt.ntnu.no Helmut

More information

Cointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56

Cointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56 Cointegrated VAR s Eduardo Rossi University of Pavia November 2013 Rossi Cointegrated VAR s Financial Econometrics - 2013 1 / 56 VAR y t = (y 1t,..., y nt ) is (n 1) vector. y t VAR(p): Φ(L)y t = ɛ t The

More information

Assessing Structural VAR s

Assessing Structural VAR s ... Assessing Structural VAR s by Lawrence J. Christiano, Martin Eichenbaum and Robert Vigfusson Zurich, September 2005 1 Background Structural Vector Autoregressions Address the Following Type of Question:

More information

Variable Selection in Predictive Regressions

Variable Selection in Predictive Regressions Variable Selection in Predictive Regressions Alessandro Stringhi Advanced Financial Econometrics III Winter/Spring 2018 Overview This chapter considers linear models for explaining a scalar variable when

More information