Computer intensive statistical methods Lecture 1


Computer intensive statistical methods, Lecture 1. Jonas Wallin (jonwal@chalmers.se), Chalmers, Gothenburg.

People and literature

The following people are involved in the course:

Function   Name           Room   E-mail
Lecturer   Jonas Wallin   2112   jonwal@chalmers.se

The following material will be used:
- Slides, available online (around the day) after each lecture.
- Introducing Monte Carlo Methods with R, by Christian Robert and George Casella.

Course schedule and homepage

The course schedule is as follows:

                      Weekday     Time         Room
Lecture I             Tuesday     13.15-15.00  Pascal
Computer session I    Tuesday     15.15-17.00  MVF522
Office hours          Wednesday   10.15-11.00  L2112 (my office)
Lecture II            Thursday    13.15-15.00  Pascal
Computer session II   Thursday    15.15-17.00  MVF522

During the first week the computer lab will be an introduction to R. Information and R files will be available at the homepage (http://www.math.chalmers.se/stat/grundutb/cth/mve186/1415/).

Examination

The examination comprises three larger projects, handed out during weeks 2, 4, and 6, and a written exam. Each project requires the submission of a report. The projects, which are solved in pairs, concern:
1. Simulation and Monte Carlo integration.
2. Bayesian modeling and inference.
3. Markov chain Monte Carlo methods, together with Bayesian modeling and inference.
The projects give bonus points for the written exam.

Course contents

- Simulation and Monte Carlo integration
- Bayesian modeling and inference
- Markov chain Monte Carlo (MCMC) methods
- Other methods, such as the EM algorithm and INLA (if time permits)

Bayesian statistics

Unlike frequentist statistics (think of a first course in statistics), Bayesian statistics does not consider the parameters fixed, but random.

Bayesian modelling: a Bayesian model consists of
- a prior (a priori) model for the parameters, $\Theta$, given by the probability density $\pi(\theta)$;
- a conditional model for the data, $y$, given reality, with density $f(y \mid \theta)$.
The prior can be expanded into several layers, creating a Bayesian hierarchical model.

Bayes' formula

How should the prior and likelihood be combined to make inference about $\Theta$, given observations of $y$? Bayes' formula:
$$f(\theta \mid y) = \frac{f(y \mid \theta)\,\pi(\theta)}{f(y)} = \frac{f(y \mid \theta)\,\pi(\theta)}{\int_\chi f(y \mid \theta')\,\pi(\theta')\,\mathrm{d}\theta'}.$$
$f(\theta \mid y)$ is called the posterior (a posteriori) distribution. Often, only the proportionality relation
$$f(\theta \mid y) \propto \pi(\theta, y) = f(y \mid \theta)\,\pi(\theta),$$
seen as a function of $\theta$, is needed.
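As a concrete illustration of the formula (a toy example of mine, not from the lecture), consider a Beta(2, 2) prior with a binomial likelihood; the normalizing constant $f(y)$ can be approximated by a Riemann sum over a grid, and the result checked against the known conjugate posterior:

# A minimal sketch of Bayes' formula on a grid (toy example, not course code):
# Beta(2, 2) prior, binomial likelihood with y = 7 successes out of 10 trials.
# The exact posterior is Beta(9, 5), so the grid evaluation can be verified.
theta <- seq(0, 1, length.out = 1001)
prior <- dbeta(theta, 2, 2)                    # pi(theta)
lik   <- dbinom(7, size = 10, prob = theta)    # f(y | theta)
post  <- lik * prior                           # unnormalized posterior
post  <- post / (sum(post) * (theta[2] - theta[1]))  # divide by a Riemann-sum estimate of f(y)
max(abs(post - dbeta(theta, 9, 5)))            # small, so it matches Beta(9, 5)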

Korsbetning

In 1361 the Danish king Valdemar Atterdag conquered Gotland and captured the rich Hanseatic town of Visby. In 1929-1930 the grave site was excavated. A total of 493 femurs (237 right, 256 left) were found. How many people were buried there?

Korsbetning

Using Bayesian inference, we get: [Figure: posterior probability of the number of buried individuals $N$, plotted for $N$ between 500 and 2000.]
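One way such a posterior could be computed is sketched below; the model and every number except the femur counts are assumptions of mine, not necessarily the model used in the lecture. Each of the $N$ individuals' right (and, independently, left) femur is assumed recovered with unknown probability $\phi \sim \mathrm{unif}(0,1)$, with a flat prior on $N$ over a grid; the marginal likelihood in $N$ is estimated by Monte Carlo integration over $\phi$:

# Hedged sketch, not the lecturer's model: flat prior on N, phi ~ Unif(0, 1),
# right and left femur counts conditionally independent Bin(N, phi).
set.seed(1)
y_right <- 237; y_left <- 256
N_grid  <- 260:2500                      # grid for N (must exceed the largest count)
phi     <- runif(1e4)                    # Monte Carlo draws from the prior on phi
post    <- sapply(N_grid, function(N)
  mean(dbinom(y_right, N, phi) * dbinom(y_left, N, phi)))  # marginal likelihood in N
post <- post / sum(post)                 # normalize over the grid
plot(N_grid, post, type = "l", xlab = "N", ylab = "posterior probability")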

Image recovery

Suppose we have a corrupted image; how can we use Bayesian inference to recover it? At what level of corruption can we still recover the image? [Figures: the image corrupted at levels p = 0.1, 0.3, 0.6 and 0.9.]

Change of intensity, Poisson processes

The figure shows the cumulative number of coal mining accidents for the years 1851 to 1963. [Figure: accident number (0 to 150) versus year (1851 to 1963).]

Mixture models, nonparametric Bayesian

[Figure: Old Faithful eruptions; density versus minutes (40 to 100).]

Inference, parameter estimation

It is easy to set up the models for the examples presented above, but how do you make inference for them? The major tool for estimation and prediction in complex models is Markov chain Monte Carlo (MCMC) methods. We will study how and why they work, and also how to use them to make inference for the examples presented above.

The principal aim of MC simulation

The main problem of this course is to compute some expectation
$$\tau = \mathrm{E}[h(X)] = \int_\chi h(x)\,f(x)\,\mathrm{d}x,$$
where
- $X$ is a random variable taking values in $\chi$,
- $f : \chi \to \mathbb{R}_+$ is the probability density of $X$, and
- $h : \chi \to \mathbb{R}$ is a function such that the expectation above is finite.
This might seem like a very limited problem; however, as we will see, it covers a large set of problems in statistics and scientific modeling.

The curse of dimensionality

Most numerical integration methods are accurate to order $O(N^{-c/d})$, where $N$ is the number of function evaluations used to approximate the integral, $d$ is the dimension, and $c > 0$ is a constant depending on the numerical method; for the trapezoidal rule, $c = 2$. Thus the error of our numerical approximation $\tau_N$ of the integral is
$$\epsilon_N = |\tau - \tau_N| \le C N^{-c/d},$$
where $C > 0$ is a constant depending on the function. To guarantee that the error is less than $\delta$, $N$ must satisfy
$$C N^{-c/d} \le \delta \iff \frac{c}{d}\log N \ge \log\frac{C}{\delta} \iff N \ge e^{\frac{d}{c}\log\frac{C}{\delta}} = \left(\frac{C}{\delta}\right)^{d/c}.$$
This means that for a fixed error the number of function evaluations grows exponentially with the dimension of the problem.
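To see the exponential growth numerically, here is a small illustration in R; the constants $C = 1$ and $\delta = 0.01$ are chosen purely for illustration:

# Function evaluations needed so that C * N^(-c/d) <= delta for the
# trapezoidal rule (c = 2); the constants C and delta are arbitrary.
C <- 1; delta <- 0.01; cc <- 2
d <- c(1, 2, 5, 10, 20)
N_needed <- (C / delta)^(d / cc)         # N >= (C/delta)^(d/c)
data.frame(dimension = d, N_needed = N_needed)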

The Monte Carlo (MC) method in a nutshell

Theorem (law of large numbers*). Let $X_1, X_2, \ldots, X_N$ be independent random variables with density $f$. Then, if $V[h(X)] < \infty$, as $N$ tends to infinity,
$$\tau_N \stackrel{\text{def.}}{=} \frac{1}{N}\sum_{i=1}^N h(X_i) \to \mathrm{E}[h(X)].$$

Inspired by this result, we formulate the following basic MC sampler (Stanisław Ulam, John von Neumann, and Nicholas Metropolis; Los Alamos Scientific Laboratory; 1940s):

for i = 1, ..., N do
    draw X_i ~ f
end for
set tau_N <- sum_{i=1}^{N} h(X_i) / N
return tau_N
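In R, the sampler is a couple of lines; the wrapper below is mine, not course code, and assumes rdist(N) returns N draws from the density f:

# A direct R transcription of the basic MC sampler above (my wrapper).
mc_estimate <- function(h, rdist, N) {
  x <- rdist(N)        # draw X_1, ..., X_N ~ f
  mean(h(x))           # tau_N = (1/N) * sum_i h(X_i)
}
mc_estimate(function(x) x^2, rnorm, 1e5)   # E[X^2] = 1 for X ~ N(0, 1)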

The first thoughts and attempts I made to practice [the Monte Carlo method] were suggested by a question which occurred to me in 1946 as I was convalescing from an illness and playing solitaires. The question was what are the chances that a Canfield solitaire laid out with 52 cards will come out successfully? After spending a lot of time trying to estimate them by pure combinatorial calculations, I wondered whether a more practical method than abstract thinking might not be to lay it out say one hundred times and simply observe and count the number of successful plays. This was already possible to envisage with the beginning of the new era of fast computers, and I immediately thought of problems of neutron diffusion and other questions of mathematical physics, and more generally how to change processes described by certain differential equations into an equivalent form interpretable as a succession of random operations. Later [in 1946], I described the idea to John von Neumann, and we began to plan actual calculations.
Stanisław Ulam

Example: Integration

The problem of computing an integral of the form $\int_{(0,1)^d} h(x)\,\mathrm{d}x$ can be cast into our framework by letting
$$\chi = (0,1)^d, \qquad f = \mathbf{1}_{(0,1)^d} \quad (= \text{the unif}(0,1)^d \text{ density}).$$

Example: Integration (cont.)

As an example for $d = 1$, let $h(x) = \sin^2(1/\cos(\log(1 + 2\pi x)))$. [Figure: plot of $h(x)$ for $x \in (0,1)$.]

Example: Integration (cont.)

As an example for $d = 1$, let $h(x) = \sin^2(1/\cos(\log(1 + 2\pi x)))$:

N <- 1000
h <- function(x) { sin(1/(cos(log(1 + 2*pi*x))))^2 }
tau_N <- mean(h(runif(N)))    # draw N uniforms and average h over them
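The convergence plot on the next slide can be reproduced with a few extra lines; this is a sketch of mine, reusing h and N from above:

# Running estimate tau_n for n = 1, ..., N, reusing h and N from above.
x <- h(runif(N))
tau_running <- cumsum(x) / seq_len(N)    # tau_n = (1/n) * sum_{i<=n} h(X_i)
plot(tau_running, type = "l", xlab = "iter", ylab = "tau")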

Example: Integration (cont.)

[Figure: running estimate tau (roughly between 0.1 and 0.6) versus iteration, up to 1000 iterations.]

Rate of convergence of MC

So what about the rate of convergence? For the MC method, the error is random. However, the central limit theorem implies, under the assumption that $V(h(X)) < \infty$,
$$\sqrt{N}\,(\tau_N - \tau) \stackrel{d.}{\to} N(0, V(h(X))).$$
This means that for large $N$,
$$V\!\left(\sqrt{N}\,(\tau_N - \tau)\right) = N\,V(\tau_N - \tau) \approx V(h(X)),$$
implying that
$$D(\tau_N - \tau) \stackrel{\text{def.}}{=} \sqrt{V(\tau_N - \tau)} \approx \sqrt{\frac{V(h(X))}{N}} = \frac{D(h(X))}{\sqrt{N}}.$$
Thus, the MC convergence rate $O(N^{-1/2})$ is independent of $d$!
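In practice $V(h(X))$ is unknown, but it can be estimated by the sample variance of the $h(X_i)$; a sketch of mine, reusing h and N from the integration example:

# CLT-based error assessment: estimate D(tau_N - tau) = D(h(X)) / sqrt(N)
# by plugging in the sample standard deviation of the h(X_i).
x  <- h(runif(N))
se <- sd(x) / sqrt(N)
c(estimate = mean(x), CLT_std_error = se)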

Central limit theorem example

The CLT implies that $\sqrt{N / V(h(X))}\,(\tau_N - \tau)$ should be approximately $N(0, 1)$ for large $N$. So let's examine this for our example above, by running 20000 independent repetitions of the Monte Carlo simulation.
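A sketch of the experiment behind the following figures; the repetition code is mine, and since $\tau$ and $V(h(X))$ have no simple closed form for this h, they are estimated from one large auxiliary sample:

# 20000 independent MC estimates for each N, standardized with large-sample
# estimates of tau and V(h(X)).
h <- function(x) { sin(1/(cos(log(1 + 2*pi*x))))^2 }
set.seed(2)
x_big   <- h(runif(1e6))
tau_hat <- mean(x_big); sd_hat <- sd(x_big)
reps <- 20000
for (N in c(1, 2, 3, 4, 5, 10, 20)) {
  z <- replicate(reps, sqrt(N) * (mean(h(runif(N))) - tau_hat) / sd_hat)
  hist(z, breaks = 50, freq = FALSE, main = paste("N =", N))
  curve(dnorm(x), add = TRUE)            # N(0, 1) reference density
}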

Central limit theorem example

[Figures: histograms of the standardized estimate for N = 1, 2, 3, 4, 5, 10 and 20; the histograms approach the N(0, 1) density as N grows.]

What do we need to know?

OK, so what do we need to master to have practical use of the MC method? Well, for instance, the following questions should be answered:
1. How do we generate the needed input random variables?
2. How many computer experiments should we do? What can be said about the error?
3. Can we exploit problem structure to speed up the computation?

Next lecture

Next time we will deal with the first two issues and discuss pseudo-random number generation and MC output analysis. See you!