Simulation. Alberto Ceselli MSc in Computer Science Univ. of Milan. Part 4 - Statistical Analysis of Simulated Data



Statistical analysis of simulated data

Outline:
- estimators and interval estimates
- bootstrapping
- variance reduction by antithetic variables
- variance reduction by control variates
- variance reduction by conditioning
- variance reduction by stratified sampling
- goodness of fit tests

Statistical analysis of simulated data

Recall:
- sample mean: \bar{X} = \frac{\sum_{i=1}^n X_i}{n}
- def. unbiased and reliable estimators (blackboard)
- sample variance: S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}
- thm: the sample variance is an unbiased estimator of the variance (E[S^2] = \sigma^2) (proof on the blackboard)
- sample standard deviation: S = \sqrt{S^2}

Statistical analysis of simulated data

A data generation stopping criterion

Suppose you need to estimate some parameter \theta up to an acceptable value d for the standard deviation of the estimator:
- fix a confidence (e.g. 95%) and a precision (e.g. 1.96 d)
- generate at least 100 data values X_i (empirical rule)
- keep generating, recomputing S, and stop as soon as you have k data values with S/\sqrt{k} < d
- estimate \theta as \bar{X} = \frac{\sum_{i=1}^k X_i}{k}

Observation (efficiency): \bar{X}_{j+1} = \frac{j\bar{X}_j + X_{j+1}}{j+1} = \bar{X}_j + \frac{X_{j+1} - \bar{X}_j}{j+1}

Observation (efficiency): S^2_{j+1} = \left(1 - \frac{1}{j}\right) S^2_j + (j+1)\left(\bar{X}_{j+1} - \bar{X}_j\right)^2
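The incremental update formulas above avoid rescanning all generated data at every step. A minimal Python sketch of the whole stopping rule (the function name and the Uniform(0,1) test stream are illustrative assumptions, not from the slides):

```python
import random

def estimate_until_precise(sample, d, min_runs=100):
    """Generate data until the standard deviation of the sample mean,
    S / sqrt(k), drops below the target d, updating the running mean
    and variance incrementally as on the slide."""
    mean = sample()              # X_bar for j = 1
    s2 = 0.0                     # S^2 starts at 0 for a single point
    k = 1
    while k < min_runs or (s2 / k) ** 0.5 >= d:
        x = sample()
        new_mean = mean + (x - mean) / (k + 1)
        s2 = (1.0 - 1.0 / k) * s2 + (k + 1) * (new_mean - mean) ** 2
        mean = new_mean
        k += 1
    return mean, k

random.seed(42)
theta_hat, runs = estimate_until_precise(random.random, d=0.01)
```

For Uniform(0,1) data (\sigma \approx 0.289) the rule needs roughly (\sigma/d)^2 \approx 800 runs before S/\sqrt{k} falls below d = 0.01.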

Statistical analysis of simulated data

Interval estimates

Idea: instead of giving a single (mean) value as the estimate, provide a range in which we are confident the parameter value lies.

Definition: if the observed values of the sample mean and the sample standard deviation are \bar{X} = \bar{x} and S = s, call the interval \bar{x} \pm z_{\alpha/2} \, s/\sqrt{n} an (approximate) 100(1-\alpha) percent confidence interval estimate of \theta.

Discussion about normal distributions and Slutsky's theorem (blackboard).
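As a concrete illustration of the definition, a minimal Python sketch computing such an interval from simulated data (the Exponential test data and the fixed z = 1.96, i.e. \alpha = 0.05, are illustrative choices, not from the slides):

```python
import math
import random

def confidence_interval(data, z=1.96):
    """Approximate 100(1 - alpha)% CI: x_bar +/- z * s / sqrt(n);
    z = 1.96 corresponds to alpha = 0.05."""
    n = len(data)
    x_bar = sum(data) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

random.seed(1)
data = [random.expovariate(1.0) for _ in range(1000)]   # true mean is 1.0
lo, hi = confidence_interval(data)
```

The Exponential data are deliberately non-normal: the interval is still approximately valid by the CLT and Slutsky's theorem mentioned above.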

Statistical analysis of simulated data

Bootstrapping for MSE

What if the parameter to estimate is not the mean? (e.g. the median or the variance)

Let X_1, ..., X_n be our observations (i.i.d. random variables with CDF F); let \theta(F) be the parameter to estimate; let g(X_1, ..., X_n) be a corresponding estimator. To control its quality we measure (estimate) its mean square error:

MSE(F) = E_F[(g(X_1, ..., X_n) - \theta(F))^2]

Discussion: empirical CDF and bootstrap approximation of the MSE (blackboard).
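The bootstrap approximation discussed on the blackboard can be sketched in Python: treat the empirical CDF as if it were F, resample from it with replacement, and average the squared error of the estimator around its value on the original sample. The median example, the sample size, and B = 500 replications are illustrative assumptions:

```python
import random
import statistics

def bootstrap_mse(data, estimator, B=500, seed=0):
    """Bootstrap approximation of MSE(F): resample B times from the
    empirical CDF and average the squared error of the estimator
    around its value on the original sample, which plays the role
    of theta(F)."""
    rng = random.Random(seed)
    theta_hat = estimator(data)
    n = len(data)
    sq_errors = [
        (estimator([rng.choice(data) for _ in range(n)]) - theta_hat) ** 2
        for _ in range(B)
    ]
    return sum(sq_errors) / B

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
mse_median = bootstrap_mse(data, statistics.median)
```

For standard normal data with n = 200, the asymptotic MSE of the median is \pi/(2n) \approx 0.008, which the bootstrap estimate should roughly track.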

Variance reduction

Let
- \theta be a parameter to estimate
- X_i be n measurements of that value from simulation runs
- \bar{X} = \sum_{i=1}^n X_i / n be the sample mean, an unbiased estimate of \theta

MSE = E[(\bar{X} - \theta)^2] = Var[\bar{X}] = Var[X]/n

Idea: if you can obtain a different unbiased estimate, it may have a smaller variance! Unfortunately, it is not so simple (Quality Control example, blackboard).

Variance reduction

Antithetic variables

Let \theta = E[X] and X_1, X_2 be i.i.d. R.V. with expected value \theta:

Var\left[\frac{X_1 + X_2}{2}\right] = \frac{1}{4}\left(Var[X_1] + Var[X_2] + 2\,Cov[X_1, X_2]\right)

i.e. if X_1 and X_2 were negatively correlated (instead of independent), the variance would be smaller.

Question: how to make them negatively correlated?

Idea: see X_1 as a function h() of the set U_1, ..., U_m of random numbers used in the simulation; instead of simulating X_2 with new random numbers, take (1 - U_1), ..., (1 - U_m).

Theorem: when h() is a monotone function, X_1 and X_2 are negatively correlated.

Example: simulating a reliability function (blackboard)
Example: computing e (blackboard)
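A common textbook version of the "computing e" example (assumed here, since it was worked on the blackboard) estimates \theta = E[e^U] = e - 1 with U uniform on (0,1); since h(u) = e^u is monotone, pairing each U with 1 - U gives negatively correlated replicates. A Python sketch:

```python
import math
import random

def mc_plain(n, rng):
    """Crude Monte Carlo estimate of theta = E[e^U] = e - 1."""
    return sum(math.exp(rng.random()) for _ in range(n)) / n

def mc_antithetic(n, rng):
    """Antithetic variables: reuse each U as 1 - U.  Since h(u) = e^u
    is monotone, e^U and e^(1-U) are negatively correlated, so the
    pair average has smaller variance than two independent draws."""
    pairs = n // 2
    total = 0.0
    for _ in range(pairs):
        u = rng.random()
        total += (math.exp(u) + math.exp(1.0 - u)) / 2.0
    return total / pairs

rng = random.Random(7)
plain = mc_plain(10_000, rng)
antithetic = mc_antithetic(10_000, rng)   # same number of e^u evaluations
```

Both estimators use 10,000 evaluations of e^u, but the antithetic one typically lands much closer to e - 1 \approx 1.718.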

Variance reduction

Control Variates

Let \theta = E[X], where X is a R.V. output of the simulation. Suppose you have another output R.V. Y whose expected value \mu you already know.

def. The R.V. Y is a control variate for the simulation estimator X.

claim: for any constant c, Z = X + c(Y - \mu) is also an unbiased estimator of \theta (discussion on the blackboard).

claim: for the best choice of c, the variance of Z is not greater than that of X (discussion on the blackboard).
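A minimal Python sketch of the control-variate idea, under the illustrative assumption that X = e^U and Y = U with known \mu = 1/2; the variance-minimizing constant c = -Cov(X, Y)/Var(Y) is estimated from the same runs, a standard practical shortcut:

```python
import math
import random

rng = random.Random(3)
n = 10_000
us = [rng.random() for _ in range(n)]
xs = [math.exp(u) for u in us]   # X: the simulation output of interest
ys = us                          # Y = U: control variate with known mu = 0.5

x_bar = sum(xs) / n
y_bar = sum(ys) / n
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
var_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
c = -cov_xy / var_y              # estimate of the variance-minimizing constant

# Z = X + c (Y - mu) is still unbiased, since E[Y - mu] = 0
z_bar = x_bar + c * (y_bar - 0.5)
```

Here X and Y are strongly positively correlated, so c comes out negative and the correction (Y - \mu) cancels much of the sampling noise in \bar{X}.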

Variance reduction

Variance Reduction by Conditioning

Let \theta = E[X], where X is a R.V. output of the simulation. Suppose you have another output R.V. Y such that E[X|Y] is known and takes a value that can be obtained through a simulation run.

claim: E[X|Y] is also an unbiased estimator of \theta (discussion on the blackboard).

claim: the variance of E[X|Y] is not greater than that of X (discussion on the blackboard).

Example: estimating \pi (discussion on the blackboard).
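One common version of the "estimating \pi" example (assumed here, since it was worked on the blackboard): with V_1, V_2 uniform on (-1, 1) and X the indicator that (V_1, V_2) falls in the unit circle, E[X] = \pi/4 and E[X | V_1] = \sqrt{1 - V_1^2}, which can be evaluated exactly. A Python sketch:

```python
import math
import random

rng = random.Random(11)
n = 100_000

# Raw estimator: X = 1 if (V1, V2) lands inside the unit circle,
# with V1, V2 ~ Uniform(-1, 1), so E[X] = pi / 4.
# Conditioning: E[X | V1] = P(V2^2 <= 1 - V1^2) = sqrt(1 - V1^2),
# which is known exactly -- no V2 has to be simulated at all.
total = 0.0
for _ in range(n):
    v1 = rng.uniform(-1.0, 1.0)
    total += math.sqrt(1.0 - v1 * v1)

pi_hat = 4.0 * total / n
```

Averaging the smooth function \sqrt{1 - V_1^2} instead of a 0/1 indicator is exactly the variance drop promised by the second claim above.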

Variance reduction

Variance Reduction by Stratified Sampling

Let \theta = E[X], where X is a R.V. output of the simulation. Suppose you have another discrete output R.V. Y with values y_1, ..., y_k such that:
- the probabilities p_i = P[Y = y_i] are known for each i = 1, ..., k
- we can simulate the value of X conditional on Y = y_i

Instead of taking \bar{X} = \sum_{i=1}^n X_i / n after n simulation runs, take

\varepsilon = \sum_{j=1}^k \bar{X}_j p_j

with \bar{X}_j obtained by simulating with Y = y_j.

claim: the variance of \varepsilon is never higher than that of \bar{X} (proof on the blackboard).
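A Python sketch of stratified sampling, under the illustrative assumption that Y is the index of the subinterval of (0,1) containing U (so p_j = 1/k) and X = e^U, whose mean e - 1 is known for checking:

```python
import math
import random

rng = random.Random(5)
k = 10                       # strata: Y = j when U falls in [j/k, (j+1)/k)
p_j = 1.0 / k                # stratum probabilities, known exactly
runs_per_stratum = 1000

epsilon = 0.0
for j in range(k):
    # simulate X = e^U conditional on Y = j by forcing U into stratum j
    x_bar_j = sum(
        math.exp(rng.uniform(j / k, (j + 1) / k))
        for _ in range(runs_per_stratum)
    ) / runs_per_stratum
    epsilon += x_bar_j * p_j   # epsilon = sum_j X_bar_j p_j
```

Each stratum sees only a narrow slice of the range of e^U, so the within-stratum variances, and hence the variance of \varepsilon, are much smaller than the crude Monte Carlo variance at the same total number of runs.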

Statistical Validation Techniques

Our simulations are often hypothesis-driven. We have a conjecture about the probability distribution of random elements of our system, e.g. the daily number of accidents in a road network. We check by simulation whether the observations match the conjecture, i.e. whether the simulation data are consistent with the distribution we suppose to have: goodness of fit tests.

Statistical Validation Techniques

Chi-square test for discrete data

When data is discrete (e.g. categories): Chi-square test (see R code)
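The course's R code is not included in the transcript; a self-contained Python sketch of the Pearson chi-square test on simulated discrete data (the fair-die conjecture and the hard-coded 5%-level critical value 11.07 for 5 degrees of freedom are illustrative assumptions):

```python
import random

def chi_square_stat(observed, expected):
    """Pearson statistic: sum over cells of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Conjecture: a fair six-sided die.  Simulate throws and test the fit.
rng = random.Random(2)
n = 6000
counts = [0] * 6
for _ in range(n):
    counts[rng.randrange(6)] += 1

expected = [n / 6] * 6
t = chi_square_stat(counts, expected)
# 6 - 1 = 5 degrees of freedom; the 5%-level critical value is about 11.07
consistent = t < 11.07
```

In R the same test is a one-liner (`chisq.test(counts)`); the sketch just makes the statistic and the rejection rule explicit.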

Statistical Validation Techniques

Kolmogorov-Smirnov test for continuous data

When data is continuous: Kolmogorov-Smirnov test (see R code)
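Again the R code is not in the transcript; a Python sketch of the Kolmogorov-Smirnov statistic against a fully specified CDF (the Exponential(1) conjecture and the asymptotic 5%-level threshold 1.36/\sqrt{n} are illustrative assumptions):

```python
import math
import random

def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov statistic: the largest distance between
    the empirical CDF of the data and the conjectured CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # the empirical CDF jumps from i/n to (i+1)/n at x
        d = max(d, abs((i + 1) / n - fx), abs(i / n - fx))
    return d

# Conjecture: the data are Exponential(1), with F(x) = 1 - e^{-x}.
rng = random.Random(9)
data = [rng.expovariate(1.0) for _ in range(500)]
d_n = ks_statistic(data, lambda x: 1.0 - math.exp(-x))
# asymptotic 5%-level threshold for a fully specified F: 1.36 / sqrt(n)
consistent = d_n < 1.36 / math.sqrt(len(data))
```

In R this corresponds to `ks.test(data, pexp)`; note the threshold above is only valid when F is fully specified in advance, which connects to the next slide.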

Statistical Validation Techniques

Missing parameters

What if we have a conjecture about the distribution, but not about its parameters? E.g. we conjecture accidents to be distributed as in a Poisson process whose rate is unknown. (see R code)
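When the rate is estimated from the data, a standard adjustment (assumed here; the slide defers to R code) is to fit the parameter first and then reduce the degrees of freedom of the chi-square test by one per estimated parameter. A Python sketch with a conjectured Poisson model whose rate is fit by maximum likelihood; the binning choice and the Binomial test data are illustrative assumptions:

```python
import math
import random

def pois_pmf(k, lam):
    """Poisson probability mass function."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Counts that are approximately Poisson(3): Binomial(100, 0.03) draws
rng = random.Random(4)
data = [sum(1 for _ in range(100) if rng.random() < 0.03)
        for _ in range(1000)]

lam = sum(data) / len(data)        # maximum-likelihood estimate of the rate

# Bin the counts 0..7, lumping everything >= 8 into a final cell
k_max = 8
observed = [0] * (k_max + 1)
for x in data:
    observed[min(x, k_max)] += 1
expected = [len(data) * pois_pmf(k, lam) for k in range(k_max)]
expected.append(len(data) * (1.0 - sum(pois_pmf(k, lam) for k in range(k_max))))

t = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# degrees of freedom: 9 cells - 1 - 1 estimated parameter = 7
```

The statistic t would then be compared against a chi-square quantile with 7 degrees of freedom rather than 8, since one parameter was fit from the same data being tested.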