Forecast Evaluation: A Likelihood Scoring Method, by Matthew A. Diersen and Mark R. Manfredo

Forecast Evaluation: A Likelihood Scoring Method
by Matthew A. Diersen and Mark R. Manfredo

Suggested citation format: Diersen, M. A., and M. R. Manfredo. 1998. "Forecast Evaluation: A Likelihood Scoring Method." Proceedings of the NCR-134 Conference on Applied Commodity Price Analysis, Forecasting, and Market Risk Management. Chicago, IL. [http://www.farmdoc.uiuc.edu/nccc134].

Forecast Evaluation: A Likelihood Scoring Method

Matthew A. Diersen and Mark R. Manfredo*

* Diersen and Manfredo are doctoral candidates in the Department of Agricultural and Consumer Economics, University of Illinois at Urbana-Champaign.

While many forecast evaluation techniques are available, most are designed for the end user of forecasts. Most statistical evaluation procedures rely on a particular loss function. Forecast evaluation procedures that have different underlying loss functions, such as mean squared error and mean absolute error, may provide conflicting results. This paper develops a new approach to evaluating forecasts, a likelihood scoring method, that does not rely on a particular loss function. The method takes a Bayesian approach to forecast evaluation and uses information from forecast prediction intervals. This method is used to evaluate structural econometric and ARIMA forecasting models of quarterly hog prices.

Introduction

What makes a good forecast? The easiest response is that the forecast must be accurate - that is, it must predict well. Most evaluation measures in use today are simply summary statistics of the forecast errors under the assumption of a particular loss function (i.e., mean squared error assumes a quadratic loss function; mean absolute error assumes a linear loss function). A shortcoming of common statistical evaluation procedures is the possibility of obtaining conflicting results from measures which depend on alternative loss functions. For example, Ferris (1998) compares simple hog forecasts using conventional evaluation methods. His results show an example of contradictory rankings between adjusted mean absolute percentage error (AMAPE), root mean square percentage error (RMSPE), and other measures (see Ferris, 1998, pp. 147-8). Preference for a forecast model thus depends on the evaluation measure used and the loss function of the decision maker.

Forecasters are concerned that the evaluation measures match the loss function of the forecast user. Generally, forecast users can simply use the forecast model that optimizes the evaluation measure corresponding to their loss function, e.g., mean squared error. Because of this, forecasters often report many forecasting models and evaluation measures. The forecaster himself may not have a well-defined loss function and/or may not know the forecast user's loss function, causing the forecaster to struggle over which model to use. Even without contradictory evaluation measures, the forecaster may struggle since the measures employed tend to give only a synopsis of performance. For example, a mean squared error (MSE) of zero represents a perfect forecasting model. However, because MSE is simply relative to zero, no benchmark level of MSE exists to tell a forecaster when the model is no good.
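To make the possibility of conflicting rankings concrete, the following small Python example (not from the paper; the error values are hypothetical) shows two forecast error series that mean squared error and mean absolute error rank in opposite order:

    import numpy as np

    # Forecast errors from two hypothetical models over four periods
    errors_a = np.array([3.0, 0.0, 0.0, 0.0])   # one large miss, otherwise perfect
    errors_b = np.array([1.4, 1.4, 1.4, 1.4])   # consistent moderate misses

    for name, e in [("A", errors_a), ("B", errors_b)]:
        print(name, "MSE =", np.mean(e**2), "MAE =", np.mean(np.abs(e)))

    # MSE (quadratic loss) prefers model B (1.96 vs. 2.25), while
    # MAE (linear loss) prefers model A (0.75 vs. 1.40).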

Forecasters usually have good knowledge of their models, and it seems shortsighted to discard or keep a forecast based solely on MSE or other summary measures.

The objective of this research is to examine evaluation from a forecaster's perspective and specifically 1) to develop an additional tool that measures forecast performance using the information from the prediction interval and 2) to apply the new method to a set of actual forecasts. The proposed evaluation method uses information easily obtainable by forecasters - the prediction interval of a forecast. Each forecasted point is assigned a score based on the likelihood of it coming from a distribution that is consistent with the forecasting model.

The remainder of the paper begins with a brief literature review that provides background information and motivation for the development of the likelihood scoring technique. The likelihood scoring method is then formally developed. Quarterly hog prices are modeled using regression and ARIMA methods, and the models are used to generate out-of-sample forecasts. The forecasting models are then compared using the likelihood scoring and traditional techniques. Finally, implications and avenues for further research are discussed.

Literature Review

Allen (1994) provides a general survey of forecasting methods; there are as many evaluation techniques as there are models. Allen (1994) further summarizes evaluation measures commonly used in price analysis, such as turning point measures and accuracy measures. While turning points are useful when taking market positions, in this paper we focus solely on accuracy.

Kennedy (1992) provides an overview of forecast error sources and an outline of the loss functions, or costs of being wrong, assumed by different evaluation measures. Specification, conditioning, sampling, and random error may lead to inaccurate forecasts. Forecasters have to work backward from forecast errors to improve their models by eliminating errors. For example, mean squared error can be decomposed to find the bias component. If this component is large, then the forecasting model can be examined to correct the bias.

In addition to MSE and similar measures, a simple forecast evaluation technique is to count the number of actual outcomes that fall within given confidence intervals. Makridakis et al. (1987) use this approach to assess a large number of forecast models. They found that most confidence intervals used were too narrow. Chatfield (1993), however, points out that some of the interval formulas used by Makridakis et al. (1987) were inaccurate. The counting method has been formalized by Christoffersen (1998), who addresses the confidence intervals in a calibration framework: forecasts are evaluated based on how well different confidence intervals encompass actual observations. Implementing this technique seems to require a large sample size, as repeated observations at various confidence levels are needed.
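The counting approach described by Makridakis et al. (1987) and formalized by Christoffersen (1998) can be sketched in a few lines of Python (an illustration with hypothetical numbers, not code from any of the cited studies):

    import numpy as np

    # Hypothetical one-step-ahead forecasts, 95% interval half-widths, and realized values
    forecasts   = np.array([48.4, 47.8, 46.6, 47.5, 47.0])
    half_widths = np.array([10.4, 10.4, 10.4, 10.4, 10.4])   # t critical value times forecast std. error
    actuals     = np.array([56.1, 59.6, 54.0, 50.4, 52.4])

    inside = np.abs(actuals - forecasts) <= half_widths
    print("Empirical coverage of nominal 95% intervals:", inside.mean())

If the empirical coverage falls well below the nominal level, the intervals are too narrow, which is exactly what Makridakis et al. (1987) found for most of the models they examined.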

Diebold et al. (1997) move beyond confidence intervals to predictive densities. Their method begins by assuming an unknown forecast density. If the density is calibrated with actual outcomes, then probability transforms of the forecasts will follow a uniform distribution. This technique also seems to require a large sample size. While the technique says when a forecast density is accurate, it gives no direction about moving from an inaccurate to an accurate density. A simpler form of this technique is presented in Bessler and Kling (1989).

Development of Likelihood Scoring Technique

This section outlines the conceptual framework necessary for developing the likelihood scoring technique. The likelihood scoring technique proposed here relies on information from forecast prediction intervals. In creating prediction intervals, estimates of the variance of a forecast are needed. This section therefore reviews forecast error variance statistics for regression and time series models. A graphical approach to forecast evaluation, a precursor to the likelihood scoring technique, is then described, followed by an explanation of the prediction density in a Bayesian framework. Finally, an exact description of the likelihood scoring technique is provided.

Forecast Error Variance: Regression Models

Before computing forecast prediction intervals, the variance of a forecast is needed. Consider a simple regression forecasting model

(1)   \hat{y}_{T+1} = \hat{\alpha} + \hat{\beta} x_{T+1}

where x_{T+1} is known. In classical regressions the variance of a forecast is estimated and conditioned on the currently observed independent variable(s). The formula for the univariate case of the estimated forecast error variance (see Pindyck and Rubinfeld, 1991) is

(2)   s_{Econ}^2 = s^2 \left[ 1 + \frac{1}{T} + \frac{(x_{T+1} - \bar{x})^2}{\sum_{t=1}^{T}(x_t - \bar{x})^2} \right]

where T is the number of in-sample observations and s^2 is the estimated variance of the regression, defined as

(3)   s^2 = \frac{1}{T-k} \sum_{t=1}^{T} (y_t - \hat{y}_t)^2

where k is the number of estimated parameters, so that T - k is the degrees of freedom. The forecast error variance in (2) accounts for the distance of the current independent variable from its mean, accounts for differences in sample size, and shows that there is always remaining variability. This univariate example is the most intuitive for motivating the likelihood scoring technique; however, equation (2) can also be extended to the multivariate framework such that

(4)   s_{Econ}^2 = s^2 \left[ 1 + x_m' (X'X)^{-1} x_m \right]

where x_m is the vector of values of the independent variables in the period for which the forecast is made, and X is the matrix of in-sample independent variables.
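As a concrete illustration of equations (1)-(3) (a sketch with made-up data, not code from the paper), the one-step-ahead forecast and its error variance for a univariate regression can be computed as follows:

    import numpy as np

    def regression_forecast(x, y, x_new):
        """Point forecast (eq. 1) and forecast error variance (eq. 2-3) for y = a + b*x."""
        T, k = len(y), 2                          # k estimated parameters: intercept and slope
        b, a = np.polyfit(x, y, 1)                # slope, intercept
        s2 = np.sum((y - (a + b * x))**2) / (T - k)                                  # eq. (3)
        s2_f = s2 * (1 + 1/T + (x_new - x.mean())**2 / np.sum((x - x.mean())**2))    # eq. (2)
        return a + b * x_new, s2_f

    # Hypothetical data
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
    forecast, var_f = regression_forecast(x, y, x_new=7.0)
    print(forecast, var_f**0.5)   # point forecast and forecast standard error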

Forecast Error Variance: Time Series Models

The forecast error variance is somewhat different for time series models, since it depends on the forecasting horizon.(1) The estimated forecast error variance for one-period-ahead Box-Jenkins ARIMA models is defined as

(5)   s_{ARIMA}^2 = s^2 = \frac{1}{T-p-q} \sum_{t=1}^{T} (y_t - \hat{y}_t)^2

where p and q are the number of autoregressive and moving average parameters, respectively.(2) For multiple-period-ahead forecast horizons, the forecast error variance is

(6)   s_{ARIMA}^2(l) = E[e_T^2(l)] = (\hat{\psi}_0^2 + \hat{\psi}_1^2 + ... + \hat{\psi}_{l-1}^2) s^2

where E[e_T^2(l)] is the expectation of the squared forecast errors l periods ahead and \hat{\psi}_0, \hat{\psi}_1, ..., \hat{\psi}_{l-1} are weights that minimize the mean square forecast error.(3) Note that for one-period-ahead forecasts \hat{\psi}_0^2 = 1.

Prediction Intervals

Prediction intervals are key to understanding the graphical approach and the likelihood scoring technique to be presented. Under the assumption that the standardized error of a forecast follows a t distribution, with degrees of freedom corresponding to the estimated model, the prediction interval around a forecast is defined as

(7)   P.I._f = \hat{y} \pm t_{\alpha/2} s_f

where f refers to the forecasting model. The classical interpretation of (7) is that, in repeated sampling, (1 - \alpha)% of computed prediction intervals contain the true value of y. For regression models, prediction intervals are conditioned on the current independent variable(s).(4)

(1) For a review of estimation techniques for the forecast error variance of time-series models, see Chatfield (1993).
(2) This form of the forecast error variance for time series models illustrates the unconditional case; the current independent variable(s) are not accounted for. Chatfield (1993) states that using the unconditional case could misstate variances. Conditional forms have been proposed but are not widely used. Chatfield (1993) also criticizes approximations to the error variance for time-series models.
(3) See Pindyck and Rubinfeld, chapter 18, for the derivation of the optimal \psi weights.
(4) The regression prediction interval widens as the independent variables move away from their means, while the ARIMA prediction interval has a constant width.
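A short Python sketch of the t-based interval in equation (7) follows (illustrative values, not from the paper); the forecast, its standard error, and the degrees of freedom are assumed to come from one of the models above:

    from scipy import stats

    forecast = 48.37     # point forecast, y_hat
    s_f = 5.16           # forecast error standard deviation, e.g., from eq. (2) or (5)
    df = 47              # degrees of freedom of the estimated model
    alpha = 0.05

    t_crit = stats.t.ppf(1 - alpha / 2, df)                           # t_{alpha/2}
    lower, upper = forecast - t_crit * s_f, forecast + t_crit * s_f   # eq. (7)
    print(f"95% prediction interval: [{lower:.2f}, {upper:.2f}]")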

Forecast Evaluation: A Graphical Approach

Once the prediction intervals are estimated, it is possible to analyze forecasts graphically. The graphical approach, while not necessarily new, is a precursor to the likelihood scoring technique. The procedure plots the forecasts and prediction intervals and attempts to identify observations with large errors that may amplify differences (and may cause contradictions) among traditional statistical evaluation measures (i.e., MSE-style criteria). Paying attention to prediction intervals allows the forecaster to examine each forecast relative to its out-of-sample realization, instead of relying on summary measures to determine the superiority of a forecasting methodology. The best forecasts are identified as those that fall within their widest respective prediction intervals, while poor forecasts are identified as those in which the actual outcome(s) fall outside of the widest prediction intervals. Visually examining where a realized value falls relative to its prediction interval also provides insight into the penalty nature of existing statistical evaluation methods, which assume a specific loss function.

The Likelihood Scoring Technique

The likelihood scoring technique developed in this paper uses ideas from the Bayesian framework. Rather than continuously changing prediction intervals and counting outcomes inside and outside of the intervals, a less pedantic approach is taken. Start with a regression equation such as

(8)   y_{T+1} = \hat{\alpha} + \hat{\beta} x_{T+1} + e_{T+1}

where e_{T+1} ~ (0, s^2). Using E(e_{T+1}) = 0 and inserting x_{T+1} into (8) gives \hat{y}_{T+1}, the expected value of the forecast. In a Bayesian framework, the estimated coefficients and the error term follow t distributions. This implies that other values of y_{T+1} can also be characterized by a t distribution (see Hey, 1983, and Zellner, 1971). Because y_{T+1} is a combination of t distributions,

(9)   \frac{y_{T+1} - \hat{y}_{T+1}}{s_f} \sim t_{T-k}

where s_f is taken from the corresponding error variance in (2) or (5), depending on the forecasting model, and k is the number of estimated parameters. Therefore, (9) provides a prediction density instead of a prediction interval and allows a forecaster to model forecasted points as coming from a t distribution. Hey (1983, p. 233) calls this prediction density a "posterior assessment" and stresses that it is a "complete characterization" of y_{T+1}.

Using this framework, once a prediction density is known or assumed, actual outcomes can be assessed relative to it. In (9), note that the expected outcome and standard error are fixed when considering the density. The prediction density characterized by (9) is shown in figure 1, which depicts one forecast period. Outcomes can be tested to determine the likelihood that actual outcomes come from the underlying forecast distribution.(5) Once a forecasted outcome, y_{T+1}^{Actual}, is observed, it can be compared to the distribution (as shown at point l on the horizontal axis in figure 1). If the forecast were perfect, the actual would match the mean forecast and lie at the center of the t distribution, where \hat{y}_{T+1} = y_{T+1}^{Actual}. The actual outcome is then used to compute a t-score using (9). The probability density function (p.d.f.) of the t distribution, evaluated at this t-score, defines the likelihood score for that observation (shown at the point f(l) on the vertical axis in figure 1).

(5) This approach is conceptually similar to the veridical approach described in Bunn (1988).
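A minimal Python sketch of this scoring step (an illustration; the paper itself presents no code) computes the t-score from equation (9) and evaluates the t p.d.f. at that value. The numbers are the first out-of-sample econometric forecast reported later in table 3, with the 47 degrees of freedom from table 1:

    from scipy import stats

    def likelihood_score(actual, forecast, std_error, df):
        """Likelihood score of one observation: the t p.d.f. evaluated at the
        standardized forecast error (eq. 9)."""
        t_score = (actual - forecast) / std_error
        return stats.t.pdf(t_score, df)

    print(likelihood_score(actual=56.07, forecast=48.37, std_error=5.16, df=47))

Summing these scores across the out-of-sample period, as described next, gives an overall score for a forecasting model.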

By comparing likelihood scores for different forecasting models (i.e., causal vs. time series), a "best" forecast model can be chosen. We suggest using a summation of scores over the out-of-sample period. The forecasting model which produces the highest likelihood score would be considered the best model; intuitively, this best model is the one most likely to have produced the observed outcomes. Unlike traditional statistical evaluation procedures, this approach is not dependent on a particular loss function. The approach is particularly useful since it accounts for the different standard errors and the past history or knowledge of the separate forecasting models. Such a perspective rewards a forecast when it occurs within its learned history instead of penalizing mistakes.

Empirical Procedure

The following section provides an empirical application of the likelihood scoring method. Quarterly hog prices are modeled using both a structural econometric model and a Box-Jenkins ARIMA model. Out-of-sample forecasts from these models are computed. Subsequently, the likelihood score is shown for these two models. Forecast performance is then evaluated using the likelihood score in addition to traditional methods.

Data

The data used in this study are quarterly data spanning from 1976.1 to 1997.3.(6) The data come from various government publications. Quarterly hog prices are for 230 pound barrows and gilts for 7 markets as reported in Livestock, Dairy and Poultry Monthly (ERS). Other data used in the estimation of the structural model include commercial cattle slaughter (1,000 hd.) and federally inspected, ready-to-cook, young chicken slaughter (Poultry Slaughter, NASS). Ten-state sow farrowings (1,000 hd.) are taken from the quarterly Hogs and Pigs report (NASS), while the hog-corn ratio is computed as the ratio of the quarterly hog price to the U.S. average price of number 2 yellow corn received by farmers (Agricultural Prices, NASS).

Structural Econometric Model

The econometric forecasting model is a single price equation based on a reduced form specification inspired by Brandt (1985) and Leuthold et al. (1989). The future price of hogs, P_{t+1}, depends on beef slaughter (BS_t), chicken slaughter (CS_t), the hog-corn ratio (HC_t), sow farrowings lagged one period (SF_{t-1}) and two periods (SF_{t-2}), and quarterly dummy variables (D2_t, D3_t, and D4_t).(7)

(6) Quarterly data are consistent with hog quarters as reported in the Hogs and Pigs report (NASS).

In-sample estimates are computed from 1976.1 to 1990.1. Table 1 presents the in-sample estimates of the econometric forecasting model. The signs are as expected for the slaughter variables (negative), for the hog-corn ratio (positive), and for farrowings lagged one period (negative), and these parameter estimates are all statistically significant. The exceptions are sow farrowings lagged two periods and the second and fourth quarterly dummy variables. Sow farrowings lagged two periods was kept in the model since it alleviated some minor first-order autocorrelation. The model is adequate in explaining the variability in hog prices, as evidenced by an adjusted R2 of 0.44. The Durbin-Watson statistic is in the indeterminate range.

ARIMA Model

The time series model for the price of hogs is an ARIMA(3,1,0)x(0,0,1)_4. A number of specifications were checked during the identification stage prior to the final model presented. The final model passes the Ljung-Box Q-test for a white noise process, and the residual autocorrelation and partial autocorrelation plots are clean of any identifiable effects. The parameter estimates of the ARIMA forecasting model are shown in table 2. The AR(2), AR(3), and seasonal MA terms are all statistically significant.

Forecasts

Both the econometric and ARIMA models are estimated over the time period from 1976.1 to 1990.1. The forecasting models are tested out-of-sample from 1990.2 to 1997.3, giving 30 out-of-sample observations. All forecasts are for one period ahead. Two separate sets of out-of-sample forecasts are developed for both the econometric and ARIMA specifications to illustrate the forecast evaluation procedures put forth in this paper. One set of forecasts uses the parameter estimates shown in tables 1 and 2 for each period (fixed), while another set is developed from updated models (updated). For the updated models, the parameter estimates for each model are re-estimated at each out-of-sample period, but the specifications remain unchanged. Table 3 presents forecasts for both the econometric and ARIMA models using fixed parameter estimates, and table 4 gives forecasts from the updated models.

(7) Brandt (1985) and Leuthold et al. (1989) included per-capita income in their specifications. However, per-capita income was found to be highly correlated with chicken slaughter over the entire sample period and was subsequently dropped.
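One way to generate the two sets of one-step-ahead ARIMA forecasts described above is sketched below in Python (a rough illustration under stated assumptions, not the authors' code). statsmodels' SARIMAX is used here as a stand-in for the Box-Jenkins estimation, and the quarterly price series is a random placeholder:

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Placeholder quarterly price series, 1976Q1-1997Q3 (random data for illustration)
    rng = np.random.default_rng(0)
    idx = pd.period_range("1976Q1", "1997Q3", freq="Q")
    prices = pd.Series(45 + np.cumsum(rng.normal(0, 1.5, len(idx))), index=idx)

    train, test = prices.loc[:"1990Q1"], prices.loc["1990Q2":]
    spec = dict(order=(3, 1, 0), seasonal_order=(0, 0, 1, 4), trend="c")

    fixed_res = SARIMAX(train, **spec).fit(disp=False)   # coefficients frozen at 1990.1
    fixed_fc, updated_fc, history = [], [], train.copy()
    for quarter, actual in test.items():
        # Fixed coefficients: forecast, then extend the sample without re-estimating
        fixed_fc.append(fixed_res.forecast(steps=1).iloc[0])
        new_obs = pd.Series([actual], index=pd.PeriodIndex([quarter], freq="Q"))
        fixed_res = fixed_res.append(new_obs, refit=False)

        # Updated coefficients: re-estimate the same specification each quarter
        updated_res = SARIMAX(history, **spec).fit(disp=False)
        updated_fc.append(updated_res.forecast(steps=1).iloc[0])
        history = pd.concat([history, new_obs])

The corresponding forecast standard errors, needed for the likelihood scores, are available from each result's get_forecast(1).se_mean.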

Figures 2 through 5 illustrate the graphical approach described earlier. In figure 2, econometric forecasts (fixed estimates) with their corresponding 95% prediction intervals are plotted along with the actual values of hog price. In the early part of the sample the predictions look good, while towards the end they fall apart. The prediction interval widens as the standard error becomes larger, due to the independent variables being far from their means. Even with the wider interval, the econometric model does a poor job of predicting those later prices. Figure 3 shows the same information for the ARIMA forecasts (fixed estimates). At first glance the ARIMA model looks more consistent over the entire period. The prediction intervals are fixed for this model and are narrower than the econometric intervals. Throughout the period there are actual outcomes that fall near or outside the prediction interval; quite often the actual outcomes occur where the ARIMA model says they should not. Measures such as MSE would not provide this type of insight and would not say when the model was falling apart.

Figures 4 and 5 show similar plots of forecasts, 95% prediction intervals, and actual values for the updated models (both econometric and ARIMA). Notice the improved performance of the econometric model for observations in the latter part of the sample due to updating the parameter estimates. The prediction interval is narrower relative to the fixed-estimate forecasts, the magnitude of the errors declines, and more actual values fall within the prediction intervals. The ARIMA model is little changed. Although this graphical method is not new, it illustrates what other information is available to forecasters by showing where the realized values fall relative to the forecast distribution.

Likelihood Scores

Table 3 shows the likelihood scores for the (fixed) econometric and ARIMA models and provides the information needed to compute the likelihood score. To illustrate the likelihood scoring technique, consider the first (1990.1) forecast in table 3. The actual price, P_{T+1}^{Actual}, is $56.07; the econometric forecast, \hat{P}_{T+1}, is $48.37, with a standard error, s_{Econ}, of 5.16. Dividing the forecast error (48.37 - 56.07) by the standard error (5.16) normalizes it, giving a t-score of -1.493. Using the Bayesian approach and assuming a naive prior, the t-score follows a t distribution (see equation 8). The likelihood score is the p.d.f. value of the t-score, 0.1. Intuitively, this is the likelihood that the actual price came from the econometric model. (The maximum height of t distributions increases with the degrees of freedom, but is close to 0.4.)

The likelihood scoring technique formalizes the graphical approach and helps forecasters identify how a particular model performs on its own and relative to other models. For instance, starting in 1996 it becomes readily apparent that the fixed econometric model no longer predicts prices well: the individual likelihood scores become very small for those forecasts. It is not just that the forecast errors are large, but that the actual outcomes lie outside of the realm the econometric model predicts. Likelihood scores for the updated models are shown in table 4. The scores are higher for more of the later econometric forecasts; updating the estimates gave the econometric model more information, or history, to consider. The ARIMA scores are similar for the fixed and updated models. The individual scores are then summed to provide an overall likelihood score for each forecasting model. This total likelihood score can then be compared against more traditional forecast evaluation procedures.

The likelihood scoring technique is similar to more traditional approaches in some respects. Measures such as root mean square error (RMSE) and mean absolute error (MAE) both weight forecast errors. Likelihood scoring weights errors by standardizing each error and then plugging the resulting t-score into the corresponding t distribution.
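To show how the summed likelihood score sits alongside RMSE and MAE, the short Python sketch below computes all three for a set of one-step-ahead forecasts (the arrays are illustrative stand-ins, not the full series from tables 3 through 5):

    import numpy as np
    from scipy import stats

    def evaluate(actuals, forecasts, std_errors, df):
        errors = forecasts - actuals
        t_scores = errors / std_errors
        return {
            "likelihood score": stats.t.pdf(t_scores, df).sum(),  # higher is better
            "RMSE": float(np.sqrt(np.mean(errors**2))),           # lower is better
            "MAE": float(np.mean(np.abs(errors))),                # lower is better
        }

    actuals = np.array([56.07, 59.56, 53.98, 50.36])
    econ  = evaluate(actuals, np.array([48.37, 47.76, 46.64, 47.53]),
                     np.array([5.16, 5.18, 5.18, 5.16]), df=47)
    arima = evaluate(actuals, np.array([46.41, 57.39, 55.96, 50.01]),
                     np.array([5.16, 5.24, 5.38, 5.34]), df=53)
    print(econ)
    print(arima)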

Table 5 gives the summed likelihood score, RMSE, and MAE of the different forecasting models. The likelihood score is consistent with the results of the more traditional measures for these forecasts. Note that the likelihood score rewards good performance, and a higher score is preferred; for RMSE and MAE, low values are better because these measures penalize errors. Between the fixed models, the ARIMA is better than the econometric forecasting model. The updated results show a large improvement in the econometric model and little change in the ARIMA model's performance. The discrepancies between the MAE numbers, the RMSE numbers, and the overall likelihood scores all become smaller. The likelihood scoring technique thus provides additional information in the forecast evaluation process, augmenting traditional measures. The likelihood scoring technique outlined is unique, however, since it in essence rewards good forecasts instead of just penalizing forecasts with large errors.

Summary and Implications

This paper presented and developed a new forecast evaluation technique, a likelihood scoring method. Similar to a graphical analysis, this technique utilizes information contained in forecast prediction intervals. Forecast error variance statistics for both regression models and ARIMA models were also presented. The foundations of the graphical and likelihood scoring techniques were established and demonstrated for both econometric and ARIMA forecasts of hog prices.

The forecast evaluation procedure put forth in this paper is unique in many ways. First, it is intuitive and easy to implement, incorporating readily available statistics. The method also encourages forecasters to pay more attention to the information in forecast prediction intervals, as suggested by Makridakis et al. (1987), and makes forecasters pay more attention to individual forecasts. The graphical and likelihood scoring procedures can stand alone or can be used in conjunction with more traditional forecast evaluation procedures such as mean squared error. They can also work with a small number of out-of-sample observations. The likelihood scoring technique is particularly useful for forecast evaluation in cases where traditional measures, especially those with different loss functions, provide conflicting results. Most importantly, however, is the approach likelihood scoring takes: it rewards good forecasts instead of penalizing poor forecasts. This gives insight into the underlying performance of the forecasting model instead of blindly summarizing errors.

Like the graphical approach, the likelihood scoring technique can be extended for use with a variety of forecasting procedures. For instance, a composite forecast could be evaluated (but its prediction density must account for any correlation between forecasts). Subjective forecasts, such as expert opinion forecasts, may also be evaluated using likelihood scoring methods, assuming a triangular distribution if high, average, and low forecasts are provided. Similarly, for futures forecasts, the distribution derived from corresponding option premiums may be used.
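The triangular-distribution idea for subjective high/average/low forecasts can be sketched the same way (hypothetical values; scipy's triangular distribution is used here purely for illustration):

    from scipy import stats

    # Hypothetical subjective forecast: low, most likely, and high values
    low, mode, high = 40.0, 47.0, 55.0
    density = stats.triang(c=(mode - low) / (high - low), loc=low, scale=high - low)

    actual = 52.0
    print(density.pdf(actual))   # likelihood score under the triangular prediction density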

References

Allen, P. G. "Economic Forecasting in Agriculture." International Journal of Forecasting 10(1994):81-135.

Bessler, David A., and John L. Kling. "The Forecast and Policy Analysis." American Journal of Agricultural Economics 71(1989):503-506.

Brandt, Jon A. "Forecasting and Hedging: An Illustration of Risk Reduction in the Hog Industry." American Journal of Agricultural Economics 67(1985):24-31.

Bunn, Derek W. "Combining Forecasts." European Journal of Operational Research 33(1988):223-229.

Chatfield, Chris. "Calculating Interval Forecasts." Journal of Business & Economic Statistics 11(1993):121-135.

Christoffersen, Peter F. "Evaluating Interval Forecasts." International Economic Review, in press, 1998.

Diebold, Francis X., Todd A. Gunther, and Anthony S. Tay. "Evaluating Density Forecasts." Working paper, 23 August 1997. http://www.ssc.upenn.edu/diebold/.

Ferris, John. Agricultural Prices and Commodity Market Analysis. Boston, MA: WCB/McGraw-Hill, 1998.

Hey, John D. Data in Doubt. Oxford: Martin Robertson, 1983.

Kennedy, Peter. A Guide to Econometrics. Cambridge, MA: The MIT Press, 1992.

Leuthold, Raymond M., Philip Garcia, Brian D. Adam, and Wayne I. Park. "An Examination of the Necessary and Sufficient Conditions for Market Efficiency: The Case of Hogs." Applied Economics 21(1989):193-204.

Makridakis, Spyros, Michele Hibon, Ed Lusk, and Moncef Belhadjali. "Confidence Intervals: An Empirical Investigation of the Series in the M-Competition." International Journal of Forecasting 3(1987):489-508.

Pindyck, Robert S., and Daniel L. Rubinfeld. Econometric Models and Economic Forecasts. New York, NY: McGraw-Hill, Inc., 1991.

Zellner, Arnold. An Introduction to Bayesian Inference in Econometrics. New York: John Wiley & Sons, Inc., 1971.

Figure 1. Illustration of the likelihood score for a single forecasted point. [The figure plots the prediction density (y_{T+1} - \hat{y}_{T+1})/s_f ~ t_{T-k}; the realized value y_{T+1}^{Actual} maps to the t-score (y_{T+1}^{Actual} - \hat{y}_{T+1})/s_f on the horizontal axis and to its likelihood score on the vertical axis.]

Table 1. Structural Econometric Forecasting Model of Quarterly Hog Prices

P_{T+1} = 142.68 - 0.00459 BS_T - 0.00009 CS_T + 0.294 HC_T - 0.0323 SF_{T-1} + 0.0115 SF_{T-2}
          (3.59)   (-2.29)        (-1.95)        (2.08)       (-4.5)           (1.66)
         - 2.449 D2_T + 1.756 D3_T + 0.251 D4_T
          (-0.94)       (3.73)       (0.08)

Adjusted R2 = 0.44    F-Statistic = 6.39    D.W. = 1.33    d.f. = 47
Note: Values in parentheses are t-statistics.

Table 2. ARIMA Forecasting Model of Quarterly Hog Prices

P_{T+1} - P_T = 0.107 - 0.193 (P_T - P_{T-1}) - 0.344 (P_{T-1} - P_{T-2}) - 0.443 (P_{T-2} - P_{T-3}) - 0.876 e_{T-4}
                (0.6)   (-1.51)                 (-2.94)                    (-3.59)                     (18.56)

Note: Values in parentheses are t-statistics.

[Figure 2. Forecasts from econometric model using fixed coefficients]

[Figure 3. Forecasts from ARIMA model using fixed coefficients]

[Figure 4. Forecasts from econometric model using quarterly updated coefficients]

[Figure 5. Forecasts from ARIMA model using quarterly updated coefficients]

Table 3. Forecasts and Likelihood Scores from Models with Fixed Coefficients

[Columns: Quarter; Actual; Econometric Model: Forecast, Std. Error, t-score, Likelihood Score; ARIMA Model: Forecast, Std. Error, t-score, Likelihood Score; one row per out-of-sample quarter, 1990.2-1997.3.]

Table 4. Forecasts and Likelihood Scores from Models with Updated Coefficients

[Columns as in table 3: Quarter; Actual; Econometric Model: Forecast, Std. Error, t-score, Likelihood Score; ARIMA Model: Forecast, Std. Error, t-score, Likelihood Score; one row per out-of-sample quarter, 1990.2-1997.3.]

Table 5. Comparison of Evaluation Measures

[For each of the econometric and ARIMA models (fixed and updated): summed Likelihood Score, RMSE, and MAE.]