Random Numbers and Simulation


Generating random numbers: It is typically impossible or infeasible to obtain truly random numbers. Instead, programs have been developed to generate pseudo-random numbers: values produced by a complicated deterministic algorithm that can nevertheless pass standard statistical tests for randomness, so they appear to be independent and identically distributed. Random number generators for common distributions are built into R. For less common distributions, more complicated methods have been developed (e.g., accept-reject sampling, the Metropolis-Hastings algorithm); STAT 740 covers these. University of South Carolina Page 1

(Monte Carlo) Simulation

Some common uses of simulation:
1. Optimization (example: finding MLEs)
2. Calculating definite integrals (example: finding posterior distributions)
3. Approximating the sampling distribution of a statistic (example: constructing CIs)

Optimization

Finding the x that maximizes (or minimizes) a complicated function h(x) can be difficult analytically. The situation is even tougher if x is multidimensional: find x to maximize h(x_1, x_2, ..., x_p).

OTHER OPTIONS: Simple stochastic search: If the maximum is to be taken over a bounded region, say [0, 1]^p, then generate many uniform random observations in that region, plug each into h(·), and pick the one that gives the largest h(x). Advantage: easy to program. Disadvantage: very slow, especially for multidimensional problems; requires much computation. Example: Maximize h(x_1, x_2) = (x_1^2 + 4 x_2^2) exp(1 - x_1^2 - x_2^2) over [-3, 3]^2.
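The stochastic-search recipe above can be sketched as follows. This is a minimal illustration in Python (the course tools are in R), using the slide's example objective; the function and variable names are choices made here, not from the slides.

```python
import math
import random

# Objective from the slide's example: h(x1, x2) = (x1^2 + 4 x2^2) exp(1 - x1^2 - x2^2).
# Its true maximum over [-3, 3]^2 is h(0, ±1) = 4.
def h(x1, x2):
    return (x1**2 + 4 * x2**2) * math.exp(1 - x1**2 - x2**2)

def stochastic_search(n=100_000, lo=-3.0, hi=3.0, seed=1):
    """Draw n uniform points in [lo, hi]^2; keep the one giving the largest h."""
    rng = random.Random(seed)
    best_x, best_val = None, float("-inf")
    for _ in range(n):
        x = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        val = h(*x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

best_x, best_val = stochastic_search()
print(best_x, best_val)
```

With 100,000 draws the best value found should be close to (but slightly below) the true maximum of 4, illustrating both the ease and the inefficiency of the method.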

More advanced: Gradient methods, which use derivative information to determine which area of the region to search next. Rule: go up the slope. Disadvantage: can get stuck at local maxima. Simulated annealing: tries a sequence of x values x_0, x_1, .... If h(x_{i+1}) >= h(x_i), move to x_{i+1}; if h(x_{i+1}) < h(x_i), move to x_{i+1} with a certain probability that depends on h(x_{i+1}) - h(x_i).
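The annealing rule above can be sketched in Python. The acceptance probability exp(delta/temp) with a slowly decreasing temperature is one common choice, assumed here for illustration; the proposal scheme (small Gaussian perturbations) and all tuning constants are likewise assumptions, not from the slides.

```python
import math
import random

def h(x1, x2):
    """Objective from the earlier example: (x1^2 + 4 x2^2) exp(1 - x1^2 - x2^2)."""
    return (x1**2 + 4 * x2**2) * math.exp(1 - x1**2 - x2**2)

def simulated_annealing(n_iter=20_000, seed=1):
    rng = random.Random(seed)
    x = (rng.uniform(-3, 3), rng.uniform(-3, 3))   # starting point x_0
    val = h(*x)
    best_x, best_val = x, val                      # track the best point seen
    for i in range(1, n_iter + 1):
        temp = 1.0 / math.log(i + 1)               # slowly decreasing temperature
        # Propose x_{i+1} by a small random perturbation of x_i
        cand = (x[0] + rng.gauss(0, 0.3), x[1] + rng.gauss(0, 0.3))
        cand_val = h(*cand)
        delta = cand_val - val
        # Move if h does not decrease; otherwise move with probability exp(delta/temp)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            x, val = cand, cand_val
            if val > best_val:
                best_x, best_val = x, val
    return best_x, best_val

sa_x, sa_val = simulated_annealing()
print(sa_x, sa_val)
```

Early on the temperature is high, so downhill moves are often accepted and the chain can escape local maxima; as the temperature falls, the rule behaves more and more like pure hill climbing.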

R functions that perform optimization: optim() for general optimization; optimize() for one-dimensional optimization. Example (Nelder-Mead, the default method; fnscale = -1 makes optim() maximize rather than minimize):
optim(par = c(0, 0), fn = my.fcn, control = list(fnscale = -1, maxit = 100000))
Other choices: method = "CG", method = "BFGS", method = "SANN".

Calculating Definite Integrals

In statistics, we often have to calculate difficult definite integrals (posterior distributions, expected values):
I = ∫_a^b h(x) dx (here, x could be multidimensional).
Example 1: Find ∫_0^1 4/(1 + x^2) dx.
Example 2: Find ∫_0^1 ∫_0^1 (4 - x_1^2 - 2 x_2^2) dx_2 dx_1.

Hit-or-Miss Method

Example 1: h(x) = 4/(1 + x^2). Determine c such that c >= h(x) across the entire region of interest (here, c = 4). Generate n random uniform (X_i, Y_i) pairs, the X_i from U[a, b] (here, U[0, 1]) and the Y_i from U[0, c] (here, U[0, 4]). Count the number of times (call this m) that Y_i is less than h(X_i). Then I ≈ c(b - a)(m/n). [This is (height)(width)(proportion in shaded region).]
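The hit-or-miss recipe above, applied to Example 1 (whose exact value is ∫_0^1 4/(1 + x^2) dx = π), can be sketched in Python for illustration (the course tools are in R):

```python
import random

def hit_or_miss(n=100_000, a=0.0, b=1.0, c=4.0, seed=1):
    """Estimate the integral of h over [a, b] by uniform darts in the box [a, b] x [0, c]."""
    rng = random.Random(seed)
    h = lambda x: 4.0 / (1.0 + x**2)   # integrand from Example 1; note h(x) <= c = 4 on [0, 1]
    m = 0                              # number of "hits": points falling under the curve
    for _ in range(n):
        x = rng.uniform(a, b)
        y = rng.uniform(0.0, c)
        if y < h(x):
            m += 1
    return c * (b - a) * m / n         # (height)(width)(proportion of hits)

est = hit_or_miss()
print(est)  # should be close to pi = 3.14159...
```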

Classical Monte Carlo Integration

I = ∫_a^b h(x) dx. Take n random uniform values U_1, ..., U_n (could be vectors) over [a, b]. Then I ≈ ((b - a)/n) Σ_{i=1}^n h(U_i).
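Classical Monte Carlo on the same Example 1 integral, as an illustrative Python sketch (the helper name is an assumption made here):

```python
import random

def mc_integrate(h, a, b, n=100_000, seed=1):
    """Classical Monte Carlo: I ≈ ((b - a)/n) * sum of h(U_i) for U_i ~ U[a, b]."""
    rng = random.Random(seed)
    total = sum(h(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

# Example 1: integral of 4/(1 + x^2) over [0, 1], whose exact value is pi.
est = mc_integrate(lambda x: 4.0 / (1.0 + x**2), 0.0, 1.0)
print(est)
```

Unlike hit-or-miss, this estimator averages the function values themselves, which typically gives a smaller variance for the same n.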

Expected Value of a Function of a Random Variable

Suppose X is a random variable with density f. Find E[h(X)] for some function h, e.g., E[X^2], E[√X], E[sin(X)]. Note E[h(X)] = ∫ h(x) f(x) dx over whatever the support of f is. Take n random values X_1, ..., X_n from the distribution of X (i.e., with density f). Then E[h(X)] ≈ (1/n) Σ_{i=1}^n h(X_i).

Examples

Example 3: If X is a random variable with a N(10, 1) distribution, find E(X^2). Example 4: If Y is a beta random variable with parameters a = 5 and b = 1, find E(-log_e Y). There are more advanced methods of integration using simulation (e.g., importance sampling). Note: the R function integrate() does numerical integration for functions of a single variable (not using simulation techniques); adapt() in the adapt package does multivariate numerical integration.
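Examples 3 and 4 can be checked by simulation, sketched here in Python. The exact answers used for comparison: E(X^2) = σ^2 + µ^2 = 101 for X ~ N(10, 1), and E(-log Y) = 1/5 for Y ~ Beta(5, 1), since -log Y is exponential with rate 5. The inversion sampler for the beta is an assumption made for self-containedness (its CDF on (0, 1) is F(y) = y^5).

```python
import math
import random

rng = random.Random(1)
n = 100_000

# Example 3: X ~ N(10, 1); estimate E(X^2).  True value: 1 + 10^2 = 101.
xs = [rng.gauss(10.0, 1.0) for _ in range(n)]
e_x2 = sum(x**2 for x in xs) / n

# Example 4: Y ~ Beta(5, 1); estimate E(-log Y).  Since F(y) = y^5 on (0, 1),
# sample by inversion: Y = U^(1/5).  True value: 1/5.
ys = [rng.random() ** (1.0 / 5.0) for _ in range(n)]
e_neglog = sum(-math.log(y) for y in ys) / n

print(e_x2, e_neglog)
```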

Approximating the Sampling Distribution of a Statistic

To perform inference based on sample statistics, we typically need to know the sampling distribution of the statistic. Example: X_1, ..., X_n iid N(µ, σ^2). Then T = (X̄ - µ)/(s/√n) has a t(n - 1) distribution, and if σ^2 is known, Z = (X̄ - µ)/(σ/√n) has a N(0, 1) distribution. We can then use these sampling distributions for inference (CIs, hypothesis tests).

What if the data's distribution is not normal? 1. Large sample: Central Limit Theorem. 2. Small sample: nonparametric procedures based on the permutation distribution.

If the population distribution is known, we can approximate the sampling distribution with simulation: repeatedly (m times) generate a random sample of size n from the population distribution and calculate the statistic (say, S) each time. The empirical distribution of the S-values approximates the statistic's true sampling distribution.

Example 1: X_1, ..., X_4 ~ Expon(1). What is the sampling distribution of X̄? What is the sampling distribution of the sample midrange?
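This example can be simulated directly, sketched here in Python (the midrange is taken to be (min + max)/2; the number of replications m is an arbitrary choice):

```python
import random

rng = random.Random(1)
m, n = 20_000, 4   # m simulated samples, each of size n = 4

xbars, midranges = [], []
for _ in range(m):
    sample = [rng.expovariate(1.0) for _ in range(n)]  # Expon(1) draws
    xbars.append(sum(sample) / n)                      # sample mean
    midranges.append((min(sample) + max(sample)) / 2)  # sample midrange

# The empirical distributions of these m values approximate each statistic's
# sampling distribution; e.g., E(Xbar) = 1 for Expon(1), so the simulated
# means should center near 1.
mean_of_xbars = sum(xbars) / m
print(mean_of_xbars)
```

A histogram of xbars or midranges (e.g., via a plotting library) would display the approximate sampling distribution itself.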

What if we don't know the exact population distribution (the more likely case)? We can use bootstrap methods: resample, i.e., randomly select n values from the original sample, with replacement. These bootstrap samples together mimic the population. For each of, say, m bootstrap samples, calculate the statistic of interest; these m values will approximate the sampling distribution. Example 2: Observe 7, 9, 13, 12, 4, 6, 8, 10, 10, 7 from an unknown population type.
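The bootstrap recipe, applied to Example 2's data with the sample mean as the statistic, can be sketched in Python (the percentile CI shown at the end is one simple option; the slides note that fancier intervals exist):

```python
import random

data = [7, 9, 13, 12, 4, 6, 8, 10, 10, 7]   # observed sample from Example 2; mean = 8.6
rng = random.Random(1)
m, n = 10_000, len(data)

# Each bootstrap sample: n draws from the observed data, with replacement.
boot_means = []
for _ in range(m):
    resample = [rng.choice(data) for _ in range(n)]
    boot_means.append(sum(resample) / n)

# The m bootstrap means approximate the sampling distribution of the mean.
# A simple percentile CI takes the 2.5% and 97.5% quantiles.
boot_means.sort()
ci = (boot_means[int(0.025 * m)], boot_means[int(0.975 * m)])
center = sum(boot_means) / m
print(center, ci)
```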

Bootstrap sampling is built into R in the boot package; try library(boot); help(boot) for details. If you know the form of the population distribution but not the parameters, a parametric bootstrap can be used. Simple bootstrap CIs have some drawbacks, so more complicated bias-corrected bootstrap methods have been developed.