
Chaos, Complexity, and Inference (36-462)
Lecture 19: Inference from Simulations 2
Cosma Shalizi
24 March 2009

Inference from Simulations 2: Mostly Parameter Estimation
Indirect inference
Method of simulated generalized moments
Reading: Smith (forthcoming) is comparatively easy to read; Gouriéroux et al. (1993) and (especially) Gouriéroux and Monfort (1996) are harder to read but more detailed; Kendall et al. (2005) is a nice application which does not require knowing any econometrics

Method of Simulated Moments
1. Pick your favorite test statistics T ("generalized moments")
2. Calculate them from the data: t_obs
3. Now pick a parameter value θ:
   (a) simulate multiple times
   (b) calculate the average of T, estimating E_θ[T]
4. Adjust θ so the expectations are close to t_obs
The last step is a stochastic approximation problem (Robbins and Monro, 1951; Nevel'son and Has'minskiĭ, 1972/1976)

Works if those expectations are enough to characterize the parameter
Why expectations rather than medians, modes, ...? Basically: easier to prove convergence. The mean is not always the most probable value!
Practicality: much faster & easier to optimize if the same set of random draws can be easily re-used for different parameter values

Example: use the mean and variance for the logistic map; choose r where the simulated moments are closest (Euclidean distance) to the observed ones:

r̂_MSM = argmin_{r ∈ [0,1]} (m − μ_r)² + (s² − σ²_r)²

where m and s² are the observed mean and variance and μ_r, σ²_r are the simulated moments. (No particular reason to weight both moments equally.)

[Figure: simulated mean μ̂_r and variance σ̂²_r of X, plotted as functions of r over [0, 1]]
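A minimal R sketch of this estimate, assuming the logistic.map.ts simulator from the earlier simulation lectures; the number of runs S is an illustrative choice, and the initial conditions are fixed once so the same random draws are re-used for every value of r (the practicality point above):

# Sketch of MSM for the logistic map; logistic.map.ts() is assumed to
# simulate a noise-free logistic-map trajectory (from earlier lectures).
msm.logistic <- function(y, S=10) {
  T <- length(y)
  m.obs <- mean(y); v.obs <- var(y)
  x0 <- runif(S)   # fix initial conditions once, re-used for every r
  moment.discrep <- function(r) {
    sims <- sapply(x0, function(x) logistic.map.ts(T, r, initial.cond=x))
    (m.obs - mean(sims))^2 + (v.obs - mean(apply(sims, 2, var)))^2
  }
  optimize(moment.discrep, lower=0, upper=1)
}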

[Figure: density of the simulated-moment estimates r̂_MSM, time series length 100, true r = 0.9 (kernel density estimate, N = 500, bandwidth = 0.00342)]

Kinks in the curve of the moments: potentially confusing to the optimizer, and they reduce sensitivity (a big change in the parameter leads to a negligible change in the moments); the curve crossing itself means non-identifiability
[Figure: simulated variance plotted against simulated mean as r varies, showing kinks and self-crossings]

The Progress of Statistical Methods
First stage: calculate the likelihood, solve explicitly for the MLE
Second stage: can't solve for the MLE, but can still write down the likelihood, calculate it, and maximize numerically
Third stage: even calculating the likelihood is intractable
Outstanding example: hidden or latent variables Y_1, Y_2, ... plus observed X_1, X_2, ...

Why Finding the Likelihood Becomes Hard
The likelihood becomes an integral/sum over all possible combinations of latent variables compatible with the observations:

Pr_θ(X_1^n = x_1^n) = ∫ dy_1^n Pr_θ(X_1^n = x_1^n, Y_1^n = y_1^n)
                    = ∫ dy_1^n Pr_θ(Y_1^n = y_1^n) ∏_{i=1}^n Pr_θ(X_i = x_i | Y_1^n = y_1^n, X_1^{i-1} = x_1^{i-1})

Evaluating this sum-over-histories is, itself, a hard problem. One approach: the Expectation-Maximization algorithm, which tries to simultaneously estimate the latent variables and the parameters (Neal and Hinton, 1998). Standard, clever, often messy.

We have a model with parameter θ from which we can simulate; also: data y
Introduce an auxiliary model which is wrong but easy to fit
Fit the auxiliary model to the data, get parameters β̂
Simulate from the model to produce y_θ^S, different simulations for different values of θ
Fit the auxiliary model to the simulations, get β̂_θ^S
Pick θ such that β̂_θ^S is as close as possible to β̂
Improvement: do several simulation runs at each θ, average β̂_θ^S over runs
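Schematically, in R (a sketch only: simulate.model and fit.auxiliary are hypothetical placeholders for whatever model and auxiliary fit you are using; a concrete version for the noisy logistic map appears later in the lecture):

# Generic indirect-inference skeleton with hypothetical helper functions;
# assumes a scalar theta (hence optimize) and Omega = identity.
indirect.inference <- function(y, S=10, lower, upper) {
  beta.data <- fit.auxiliary(y)       # auxiliary parameters from the real data
  T <- length(y)
  discrepancy <- function(theta) {
    # fit the auxiliary model to each of S simulation runs at this theta,
    # then average the auxiliary estimates over runs
    sims <- replicate(S, fit.auxiliary(simulate.model(T, theta)))
    beta.sim <- if (is.matrix(sims)) rowMeans(sims) else mean(sims)
    sum((beta.data - beta.sim)^2)     # squared distance to the data fit
  }
  optimize(discrepancy, lower=lower, upper=upper)
}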

What's going on here?
The auxiliary model says: the data has these sorts of patterns. Pick parameters of the real model which come as close as possible to matching those patterns.
For this to work, those patterns must be enough to pin down the original parameter; this requires at a minimum that dim β ≥ dim θ

A More Formal Statement
Auxiliary objective function ψ, depending on the data and β:

β̂_T = argmax_β ψ_T(β)    (1)
β̂_{T,S,θ} = argmax_β ψ_{T,S,θ}(β)    (2)
θ̂_II = argmin_θ (β̂_{T,S,θ} − β̂_T)' Ω (β̂_{T,S,θ} − β̂_T)    (3)

Ω is some positive definite matrix; which one doesn't matter asymptotically. The optimal choice gives most weight to the most-informative auxiliary parameters (Gouriéroux and Monfort, 1996, §4.2.3); the identity matrix is usually OK.

Assume:
1. As T → ∞, ψ_{T,S,θ}(β) → ψ(β, θ), uniformly in β and θ.
2. For each θ, the limiting objective function has a unique optimum in β; call this b(θ).
3. As T → ∞, β̂_T → b(θ_0).
4. The equation β = b(θ) has a unique solution, i.e., b^{-1} is well-defined.
Then, as T → ∞, θ̂_II → θ_0 in probability.

Asymptotic Distribution of Indirect Estimates (Gouriéroux and Monfort, 1996, §4.2.3)
Under additional (long, technical) regularity conditions, θ̂_II − θ_0 is asymptotically Gaussian with mean 0.
Variance ∝ (1/T)(1 + 1/S)
The variance depends on something like the Fisher information matrix, only with ∂b/∂θ in the role of ∂p_θ/∂θ: basically, how sensitive is the auxiliary parameter to shifts in the underlying true parameter?

Checking
Given the real and auxiliary models, will indirect inference work, i.e., be consistent?
Do the math: provides a proof; often hard (because the simulation model leads to difficult-to-manipulate distributions)
Simulate some more: simulate from the model for a particular θ, apply II, check that the estimates get closer to θ as the simulation grows; repeat for multiple θ. Not as fool-proof, but it just requires time (you have all the code already)

Autoregressive Models
Like it sounds: regress X_t on its past X_{t-1}, X_{t-2}, ...

X_t = β_0 + β_1 X_{t-1} + β_2 X_{t-2} + ... + β_p X_{t-p} + ε_t,   ε_t ~ N(0, σ²)

Common as auxiliary models for time series (as well as models in their own right)
Auxiliary objective function is the residual sum of squares, for a given order p
R command: ar
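A quick illustration of the command (the arima.sim call just makes a synthetic series for the example; note that ar's default fitting method is Yule-Walker rather than literal least squares, which method="ols" would give):

# Fit an AR(2) and pull out the coefficient vector (beta_1, beta_2):
# these are the auxiliary parameters used below.
x <- arima.sim(model=list(ar=c(0.5, -0.2)), n=200)   # synthetic series, for illustration only
fit <- ar(x, aic=FALSE, order.max=2)
fit$ar   # estimated (beta_1, beta_2)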

Example: Logistic Map + Noise
Take the logistic map and add Gaussian noise to each observation:

X_t = Y_t + ε_t,   ε_t ~ N(0, σ²)
Y_{t+1} = 4 r Y_t (1 − Y_t)

Any sequence x_1^T could be produced by any r

logistic.noisy.ts <- function(timelength, r,
                              initial.cond=NULL, noise.sd=0.1) {
  x <- logistic.map.ts(timelength, r, initial.cond)
  return(x + rnorm(timelength, 0, noise.sd))
}

Assume that σ² is known: this simplifies plotting, since there is only one unknown parameter! Set the noise standard deviation to 0.1 (noise.sd in the code).
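logistic.map.ts itself is not shown on this slide (it comes from the earlier simulation lectures); a minimal sketch consistent with how it is called here:

# Sketch of the noise-free simulator assumed above: iterate the logistic
# map y_{t+1} = 4 r y_t (1 - y_t), from a random start if none is given.
logistic.map.ts <- function(timelength, r, initial.cond=NULL) {
  y <- vector(mode="numeric", length=timelength)
  y[1] <- if (is.null(initial.cond)) runif(1) else initial.cond
  for (t in 1:(timelength-1)) {
    y[t+1] <- 4 * r * y[t] * (1 - y[t])
  }
  return(y)
}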

1. Fix p for the AR model
2. Fit AR(p) to the data, get β̂ = (β̂_1, ..., β̂_p)
3. Simulate S sample trajectories with parameter r, calculate (β̂_1, ..., β̂_p) for each, average over trajectories to get β̄_r^S
4. Minimize ‖β̂ − β̄_r^S‖²

logistic.map.ii <- function(y, order=2, S=10) {
  T <- length(y)
  # auxiliary fit: coefficients of an AR(order) model
  ar.fit <- function(x) {
    return(ar(x, aic=FALSE, order.max=order)$ar)
  }
  beta.data <- ar.fit(y)
  beta.discrep <- function(r) {
    # average the auxiliary coefficients over S simulated trajectories at this r
    beta.S <- rowMeans(replicate(S, ar.fit(logistic.noisy.ts(T, r))))
    return(sum((beta.data - beta.S)^2))
  }
  return(optimize(beta.discrep, lower=0.75, upper=1))
}

To see how well this does, simulate it:

y <- logistic.noisy.ts(1e3, 0.8)
plot(density(replicate(100, logistic.map.ii(y, order=2)$minimum)),
     main="Density of indirect estimates")

[Figure: density of the indirect estimates (N = 100, bandwidth = 0.006528)]
Some bias (here upward), but it shrinks as T grows, and it's pretty tight around the true value (r = 0.8)
Notice: fixed data set, so all the variability is from the simulation
Also: p = 2 is arbitrary; one could use more simulation to pick a good/best order

r = 0.8 is periodic; what about chaos, say r = 0.9?

plot(density(replicate(30,
       logistic.map.ii(logistic.noisy.ts(1e3, r=0.9),
                       order=2)$minimum)),
     main="Density of indirect estimates, r=0.9")

Here the data are re-generated each time.

[Figure: density of the indirect estimates for r = 0.9 (N = 30, bandwidth = 0.003931)]

I promised to check that the inference is working by seeing that the errors are shrinking:

mse.logistic.ii <- function(T, r=0.9, reps=30, order=2, S=10) {
  II <- replicate(reps, logistic.map.ii(logistic.noisy.ts(T, r),
                                        order=order, S=S)$minimum)
  II.error = II - r # Uses recycling
  return(mean(II.error^2))
}
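A usage sketch along the lines of the next plot (the particular grid of T values is an illustrative choice, not from the slide, and the full run takes a while):

# Sketch: MSE of the indirect estimate at several sample sizes, plus a
# reference curve proportional to 1/T through the last point.
T.grid <- c(250, 500, 1000, 2000)
mse <- sapply(T.grid, mse.logistic.ii)
plot(T.grid, mse, xlab="T", ylab="MSE", type="b")
curve(mse[length(mse)] * T.grid[length(T.grid)] / x, add=TRUE, col="blue")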

[Figure: mean squared error of indirect inference, r = 0.9, as a function of T. Black: mean squared error of θ̂_II, S = 10, averaged over 30 replications each of length T, all with r = 0.9; blue: a curve ∝ 1/T, fitted through the last data point]

MORE SCIENCE, FEWER F-TESTS
Craft a really good scientific model: represent your actual knowledge/assumptions/guesswork ("it's in my regression textbook" isn't a scientific justification); you must be able to simulate it
Pick a reasonable auxiliary model: works on your observable data; easy to fit; predicting well is nice but not necessary
Estimate the parameters of the complex model by indirect inference
Test hypotheses by indirect inference as well (Gouriéroux et al., 1993; Kendall et al., 2005)

Gouriéroux, Christian and Alain Monfort (1996). Simulation-Based Econometric Methods. Oxford, England: Oxford University Press.

Gouriéroux, Christian, Alain Monfort and E. Renault (1993). Indirect Inference. Journal of Applied Econometrics, 8: S85–S118. URL http://www.jstor.org/pss/2285076.

Kendall, Bruce E., Stephen P. Ellner, Edward McCauley, Simon N. Wood, Cheryl J. Briggs, William W. Murdoch and Peter Turchin (2005). Population Cycles in the Pine Looper Moth: Dynamical Tests of Mechanistic Hypotheses. Ecological Monographs, 75: 259–276. URL http://www.eeb.cornell.edu/ellner/pubs/CPDBupalusEcolMonog05.pdf.

Neal, Radford M. and Geoffrey E. Hinton (1998). A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants. In Learning in Graphical Models (Michael I. Jordan, ed.), pp. 355–368. Dordrecht: Kluwer Academic. URL http://www.cs.toronto.edu/~radford/em.abstract.html.

Nevel'son, M. B. and R. Z. Has'minskiĭ (1972/1976). Stochastic Approximation and Recursive Estimation, vol. 47 of Translations of Mathematical Monographs. Providence, Rhode Island: American Mathematical Society. Translated by the Israel Program for Scientific Translations and B. Silver from Stokhasticheskaia approksimatsia i rekurrentnoe otsenivanie, Moscow: Nauka.

Robbins, Herbert and Sutton Monro (1951). A Stochastic Approximation Method. Annals of Mathematical Statistics, 22: 400–407. URL http://projecteuclid.org/euclid.aoms/1177729586.

Smith, Jr., Anthony A. (forthcoming). Indirect Inference. In New Palgrave Dictionary of Economics (Stephen Durlauf and Lawrence Blume, eds.). London: Palgrave Macmillan, 2nd edn. URL http://www.econ.yale.edu/smith/palgrave7.pdf.