Stochastic population forecast: a supra-bayesian approach to combine experts opinions and observed past forecast errors

Similar documents
Stochastic Population Forecasting based on a Combination of Experts Evaluations and accounting for Correlation of Demographic Components

Regional Probabilistic Fertility Forecasting by Modeling Between-Country Correlations

Uncertainty quantification of world population growth: A self-similar PDF model

A comparison of official population projections with Bayesian time series forecasts for England and Wales

ESRC Centre for Population Change Working Paper Number 7 A comparison of official population projections

DEMOGRAPHIC RESEARCH A peer-reviewed, open-access journal of population sciences

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

DEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

EXPERIENCE-BASED LONGEVITY ASSESSMENT

Statistical Methods in Particle Physics

European Demographic Forecasts Have Not Become More Accurate Over the Past 25 Years

What Do Bayesian Methods Offer Population Forecasters? Guy J. Abel Jakub Bijak Jonathan J. Forster James Raymer Peter W.F. Smith.

STA 4273H: Statistical Machine Learning

Bayesian Inference. p(y)

Intro to Probability. Andrei Barbu

What Do Bayesian Methods Offer Population Forecasters? Guy J. Abel, Jakub Bijak, Jonathan J. Forster, James Raymer and Peter W.F. Smith.

Series 6, May 14th, 2018 (EM Algorithm and Semi-Supervised Learning)

Density Estimation. Seungjin Choi

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

Bayesian Inference for the Multivariate Normal

Approximate Bayesian Computation

Uncertainty in energy system models

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

arxiv: v1 [stat.ap] 27 Mar 2015

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Bayesian Models in Machine Learning

Other Noninformative Priors

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Nonparametric Bayesian Methods (Gaussian Processes)

Cromwell's principle idealized under the theory of large deviations

Review of Maximum Likelihood Estimators

Generative classifiers: The Gaussian classifier. Ata Kaban School of Computer Science University of Birmingham

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

STA 4273H: Sta-s-cal Machine Learning

Prequential Analysis

How to build an automatic statistician

A BAYESIAN SOLUTION TO INCOMPLETENESS

Statistical Models with Uncertain Error Parameters (G. Cowan, arxiv: )

Quiz 1. Name: Instructions: Closed book, notes, and no electronic devices.

Introduction to Bayesian Inference

Introduction. Chapter 1

New Insights into History Matching via Sequential Monte Carlo

Statistical Data Analysis Stat 3: p-values, parameter estimation

Empirical Prediction Intervals for County Population Forecasts

Bayesian Approaches Data Mining Selected Technique

VCMC: Variational Consensus Monte Carlo

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

SYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I

Machine Learning 4771

Introduction to Machine Learning CMU-10701

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

Stochastic Processes, Kernel Regression, Infinite Mixture Models

Lecture 25: Review. Statistics 104. April 23, Colin Rundel

Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com

Bayesian analysis in nuclear physics

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Bayesian Regression Linear and Logistic Regression

STA 4273H: Statistical Machine Learning

ECE521 week 3: 23/26 January 2017

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Introduction to Machine Learning Midterm Exam Solutions

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

Probability theory basics

ICES REPORT Model Misspecification and Plausibility

Bayesian Inference of Noise Levels in Regression

Bayesian Social Learning with Random Decision Making in Sequential Systems

L09. PARTICLE FILTERING. NA568 Mobile Robotics: Methods & Algorithms

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber

E. Santovetti lesson 4 Maximum likelihood Interval estimation

CMU-Q Lecture 24:

BAYESIAN PROCESSOR OF OUTPUT: A NEW TECHNIQUE FOR PROBABILISTIC WEATHER FORECASTING. Roman Krzysztofowicz

FORECASTING STANDARDS CHECKLIST

Overall Objective Priors

STA414/2104 Statistical Methods for Machine Learning II

Statistics. Lent Term 2015 Prof. Mark Thomson. 2: The Gaussian Limit

Structural Uncertainty in Health Economic Decision Models

Statistics and Data Analysis

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Frequentist-Bayesian Model Comparisons: A Simple Example

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

1 Using standard errors when comparing estimated values

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

The Expectation-Maximization Algorithm

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Sensor Tasking and Control

Bayesian linear regression

Relative Merits of 4D-Var and Ensemble Kalman Filter

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dynamic System Identification using HDMR-Bayesian Technique

Bayesian Calibration of Simulators with Structured Discretization Uncertainty

Machine Learning Lecture 2

Bayesian Inference and MCMC

Diffusion Process of Fertility Transition in Japan Regional Analysis using Spatial Panel Econometric Model

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

Beta statistics. Keywords. Bayes theorem. Bayes rule

Bayesian Methods for Machine Learning

Transcription:

Stochastic population forecast: a supra-bayesian approach to combine experts opinions and observed past forecast errors Rebecca Graziani and Eugenio Melilli 1 Abstract We suggest a method for developing stochastic population forecasts based on a combination of experts opinions and observed past forecast errors. The supra-bayesian approach (Lindley, 1983) is used. The experts opinions on a given vital are treated by the researcher as observed data. The researcher specifies a likelihood, as the distribution of the experts evaluations given the rate value, parametrized on the basis of the observed past forecast errors of the experts and expresses his prior opinion on the vital rate by assigning a prior distribution to it. Therefore a posterior distribution for the rate can be obtained, in which the researchers prior opinion is updated on the ground of the evaluations expressed by the experts. Such posterior distribution is used to describe the future probabilistic behavior of the vital rates so to derive probabilistic population forecasts in the framework of the traditional cohort component model. 1 Introduction Population forecasts are strongly requested both by public and private institutions, as main ingredients for long-range planning. Traditionally official national and international agencies derive population projections in a deterministic way: in general three deterministic scenarios are specified, low, medium and high scenarios, based on combinations of assumptions on vital rates and separate forecasts are derived by applying the cohort-component method. In this way, uncertainty is not incorporated so that the expected accuracy of the forecasts cannot be assessed: prediction intervals for any population size or index of interest cannot be computed. Yet the high-low scenario interval is generally portrayed as containing likely future population sizes. In recent years stochastic (or probabilistic) population forecasting has, finally, received a great attention by researchers. In the literature on stochastic population forecasting, three main approaches have been developed (Keilman et al., 2002). The first approach builds on time series models and it derives stochastic population forecasts by estimating the parameters of population dynamics (e.g., vital rates) on the basis of past data. The second approach is based on the extrapolation of empirical errors, with observed errors from historical forecasts used in the assessment of uncertainty 1 Rebecca Graziani, Department of Decision Sciences and Carlo F. Dondena Center for Research on Social Dynamics, Bocconi University, Milan, Italy; email rebecca.graziani@unibocconi.it. Eugenio Melilli, Department of Decision Sciences Bocconi University, Milan, Italy; email eugenio.melilli@unibocconi.it.

in forecasts (e.g., Stoto, 1983). In particular Alho and Spencer (1997) proposed in this framework the so-called Scaled Model of Error, which was used for deriving stochastic population forecasts within the Uncertainty Population of Europe, (UPE) project. Roughly speaking the Scaled Model of Error defines a generic age-specific rate (in logarithm scale) as the sum of a point forecast and a gaussian error term, with variance and correlation across age and time estimated on the basis of the past forecast errors. Finally, the third approach referred to as random scenario defines the probabilistic distribution of each vital rate on the basis of expert opinions. In Lutz et al. (1998), the forecast of a vital rate at a given future time T is assumed to be the realization of a random variable, having Gaussian distribution with parameters specified on the basis of expert opinions. For each time t in the forecasting interval [0, T ] the vital rate forecast is obtained by interpolation from the starting known and final random rate. In Billari et al. (2010) the full probability distribution of forecasts is specified by expert opinions on future developments, elicited conditional on the realization of high, central, low scenarios, in such a way to allow for not perfect correlation across time. The method, we suggest in this paper, draws on both the extrapolation of empirical errors and the random scenario approaches. The forecasts are based on a combination of expert opinions and past forecast errors. Such combination is the result of a formal approach known as supra-bayesian introduced by Lindley in 1983. The experts opinions are treated as data to be used for updating a prior opinion on the distribution law of each vital rate. The derived posterior is then used as the rate forecast distribution. The method is described in detail in the next section. 2 The Proposal. Let R be the quantity to be projected over the time period [0, T ]; R can be an overall rate such as the total fertility rate or the male or female life expectancy, but it can also be an absolute quantity such as the sex-specific net migration flow. For convenience, we will refer to R as a rate. Denote by R 0 the (known) value of R at the starting time t = 0. In order to define the random process {R t, t [0, T ]}, i.e. to determine the joint distribution of all values or R in the interval [0, T ], we proceed as follows: we define the distribution of the random variable R T ; by linear interpolation of R 0 and R T, we define the whole distribution of the process. To perform the first step, we resort to the supra-bayesian approach, introduced by Lindley in 1983 and used, among others, by Winkler (1981) and Gelfand et al (1995) to model and combine experts opinions; Roback and Givens (2001) apply it in the framework of deterministic simulation models. Let x 1,... x k, be the evaluations on the rate R T provided by k experts, usually expressed in terms of central scenarios.

The idea is to treat such evaluations as data provided by the experts. As in any inferential problem, the analyst is, then, asked to specify the joint distribution f(x 1,..., x k θ) indexed by a vector of parameters θ, of which one component is the rate R T. Moreover in a Bayesian approach the analyst assigns a prior distribution to θ expressing his prior beliefs and knowledge on it. The Bayes theorem makes it possible to derive the posterior distribution of π(θ x 1,..., x k ) and the marginal posterior distribution law of R T can then be used as the required distribution of the forecasts of the rate R at time T. In the following, for the sake of simplicity we consider k = 2 experts providing therefore two central scenarios for the rate at time T, x 1, x 2. We assume that the observed sample vector (x 1, x 2 ) is the realization of a bivariate normal distribution having vector mean µ T = (R T, R T ) and covariance matrix Σ = σ2 1 ρσ 1 σ 2. With such specification of the mean vector, the analyst states that he expects ρσ 1 σ 2 σ2 2 the experts to be unbiased in their evaluations, excluding a systematic underestimation or overestimation. A prior distribution has then to be assigned to R T and Σ. It is is reasonable to assume that R T and Σ are independent and to choose a flat prior for R T. Indeed if the analyst has prior information on the future value of the rate of interest, he can convey it into the analysis by considering such information as provided by an additional expert. As for Σ, if series of past forecasts of the experts along with the past observed values of the rate are available, then the analyst might decide to set the variances σ 2 1 and σ2 2 equal to the observed mean squared errors of the forecasts of the first and the second expert respectively (s 2 1 and s2 2 ); the correlation can be fixed at the correlation r of the forecast errors of the two experts. This is equivalent to assume a prior distribution degenerate on s 2 1, s2 2 and r. Of course the same values can result from the estimation of more complex models fitted to the series of past forecast errors (as in Scaled Model of Error). Under such assignment of prior distribution, the posterior distribution π(r T x 1, x 2 ) of R T is derived: π(r T x 1, x 2 ) = N ( x 1 σ 2 2 ρσ 1σ 2 σ 2 1 + σ2 2 2ρσ 1σ 2 + x 2 σ 2 1 ρσ 1σ 2 σ 2 1 + σ2 2 2ρσ 1σ 2, σ1 2σ2 2 (1 ρ2 ) σ1 2 + σ2 2 2ρσ 1σ 2 The posterior mean of the rate is a linear combination of the experts evaluations. The weights of the combination are functions of the covariance matrix Σ. The mixture gives more weight to the value given by the expert who has shown more accuracy in its past evaluations, that is a smaller mean squared forecast error. The combination is not convex, that is one of the weights can be negative and therefore the other greater than one. If for instance σ 2 1 < σ2 2, then the weight associated to x 2 is negative, whenever ρ > σ 1 σ 2. If the correlation between the forecast errors of the two experts is positive and high and the forecasts of expert 2 have shown a greater variability than the forecasts of expert 1, then the mixture will give a negative weight to the evaluation of the expert who has shown to be less accurate (and greater than one to the other evaluation). Indeed in presence of a high positive correlation between the past forecast ).

errors, the errors of the two experts evaluations are expected to have the same sign (both experts are expected to underestimate or both to overestimate the future value of the rate) and the difference between the evaluations, that is x 1 x 2, informs on the sign of the error. If negative, both experts are expected to overestimate the value rate and the linear combination shrinks the evaluation of expert 1 towards the future unknown value of R T. The distribution law of R T in this way is derived from both the experts evaluations and the observed forecast errors of the same experts, so that the suggested method can be seen as an attempt to combine the extrapolation of errors method and the random scenario approach. Other assignments of prior distributions on Σ are of course possible. An inverted Wishart distribution,centered at Σ 0 with δ 0 degrees of freedom can be used, where Σ 0 can be defined on the basis of the past forecast errors, if available, or on information from any source. In this case the posterior distribution of R T turns out to be a Student-t distribution, with mean having the same expression as in the previous case, with σ2 1, σ2 2 and ρ being the corresponding elements of Σ 0. 3 An application In this Section we apply the method previously described to obtain the forecast distribution at 2030 of the following summary indicators: Total Fertility Rate, Male and Female Life Expectancies at Birth. Indeed, age-specific fertility and mortality rates can be derived by resorting to specific models and then used as inputs for the cohort-component method of population forecast. We consider two experts: the Italian National Statistical Office (ISTAT) and an expert whose forecasts are based on the so-called naive forecast method (see Keifitz, 1981). As for the second expert, forecasts over a time span of 20 years are obtained as follows: the Total Fertility Rate is held constant at the value observed at the beginning of the forecast period, while forecasts of the Male and Female Life Expectancies are derived by linear extrapolation, on the basis of series of past data. The 2030 forecasts of the summary indicators are derived on the same way. As for ISTAT, we use as forecast of each summary indicator, the central scenario at 2030 of the latest projections released by the Statistical Institute with starting year 2010. Moreover the responsible of the office uncharged of the Italian population projections provided us with series of past forecast errors.table 1 shows for each indicator, the forecasts, x 1 and x 2 of the two experts along with their past forecast mean squared errors and the correlation between their past forecast errors to be used as σ 2 1, σ2 2 and ρ respectively. The mean and variance of the 2030 forecast distribution of each summary indicator is shown in Table 2. As we can observe from Table 1, for the Total Fertility Rate, σ1 2, the mean squared forecast error of ISTAT, is smaller than σ 2 2, the mean squared forecast error of the naive expert. Moreover the ratio σ 1 σ 2 is smaller than ρ. Therefore, in the determination of the posterior mean of Total Fertility Rate, ISTAT

forecast has a weight greater than 1 (1.095) and the naive forecast a weight smaller than 0 ( 0.095), this leading to a posterior mean (as shown in Table 2) equal to 1.58. For both Male and Female Life Expectancies at Birth, the posterior mean arises as a convex combination of the evaluations of the two experts. In all three cases the posterior variances are smaller than the mean squared errors of the past forecasts of the two experts. Table 1: Evaluations for indicators, past forecast mean squared errors and correlation of the errors. T F R is the Total Fertility Rate, EM and EF are respectively Male and Female Life Expectancy at Birth. Indicator Istat Naive expert x 1 σ1 2 x 2 σ2 2 ρ T F R 1.57 0.010 1.42 0.276 0.50 EM 82.20 0.480 83.02 0.248 0.65 EF 87.50 0.545 87.35 0.253 0.66 Table 2: Posterior mean and posterior variance of the 2030 forecast distribution of the Total Fertility Rate, T F R, the Male and Female Life Expectancy at Birth, EM and EF respectively. Indicator posterior mean posterior variance T F R 1.58 0.009 EM 82.96 0.239 EF 87.35 0.252 References Alho, J.M. and Spencer, B.D. (1997) The practical specification of the expected error of population forecasts, Journal of Official Statistics, 13, 204 225. Billari, F.C., Graziani R. and Melilli, E. (2010) Stochastic population forecasts based on conditional expert opinions, Working Paper 33. Dynamics, Bocconi University. Milan: Carlo F. Dondena Centre for Research on Social Gelfand, A. E., Mallick, B. K. and Dey, D. K. (1995) Modeling Expert Opinion Arising as Partial Probabilistic Specification, Journal of the American Statistical Association, 90, 430, 598 604. Keilman, N., Pham, D.Q. and Hetland, A. (2002) Why population forecasts should be probabilistic - illustrated by the case of Norway, Demographic Research, 6, 15, 409 454.

Keyfitz, N. (1981) The limits of Population Forecasting, Population and Development Review, 7, 4, 579 593. Lindley, D. (1983) Reconciliation of Probability Distributions, Operations Research, 31, 5, 866 880. Lutz, W., Sanderson, W.C. and Scherbov, S. (1998) Expert-Based Probabilistic Population Projections, Population and Development Review, 24, 139 155. Roback, P.J and Givens, G.H. (2001) Supra-Bayesian pooling of priors linked by a deterministic simulation model, Communications in Statistics Simulation and Computation, 30, 3, 381, 447 476. Stoto, M.A. (1983) The Accuracy of Population Projections, Journal of the American Statistical Association, 78, 381, 13 20. Winkler, R. L. (1981) Combining Probability Distribution from Dependent Information Sources, Management Science, 27, 4, 479 488.