Simulations

(NRH: Sections 2.6, 2.7, 2.11 and 2.12; at this point in the course the sections will be difficult to follow.)

Simulations

Computer simulation of realizations of random variables has become indispensable as a supplement to theoretical investigations and practical applications.

We can easily investigate a large number of scenarios (different P's) and a large number of replications.

We can compute transformed distributions numerically.

We can compute (complicated) mean values; this is known as Monte Carlo simulation.

We can investigate the behaviour of methods for statistical inference.

But how can the deterministic computer generate the outcome from a probability measure?

Generic simulation

A two-step procedure lies behind the simulation of random variables:

The computer emulates the generation of independent, identically distributed random variables with the uniform distribution on the unit interval [0, 1].

The emulated uniformly distributed random variables are then turned into variables with the desired distribution by a transformation.

Theorem behind

The following result is behind the generic procedure for simulation from any P on E:

Theorem: Let P_0 denote the uniform distribution on [0, 1] and h : [0, 1] → E a map such that the transformed probability measure on E is P = h(P_0). If U_1, U_2, ..., U_n are iid with distribution P_0, then X_1, X_2, ..., X_n defined by X_i = h(U_i) are n iid random variables, each with distribution P.

The real problem

What we need in practice is thus the construction of a transformation that turns the uniform distribution on [0, 1] into the desired probability distribution.

We focus here on two cases:

A general method for discrete distributions.

A general method for probability measures on R given in terms of the distribution function.

But what about...

...the simulation of the independent, uniformly distributed random variables?

That's a completely different story. Read D. E. Knuth, The Art of Computer Programming, Chapter 3, or trust that R behaves well and that runif works correctly.

We rely on a sufficiently good pseudo-random number generator: as long as we cannot statistically detect differences between what the generator produces and true iid uniformly distributed random variables on [0, 1], we live happily in ignorance.

Discrete random variables

If P is a probability measure on a discrete sample space E given by point probabilities p(x), x ∈ E, choose for each x ∈ E an interval

I(x) = (a(x), b(x)] ⊆ [0, 1]

such that

the length, b(x) − a(x), of I(x) equals p(x), and

the intervals are mutually disjoint: I(x) ∩ I(y) = ∅ for x ≠ y.

Letting u_1, ..., u_n be generated by a pseudo-random number generator, we define x_i = x if u_i ∈ I(x) for i = 1, ..., n. Then x_1, ..., x_n is a realization of n iid random variables with the distribution having point probabilities p(x), x ∈ E. A small R sketch of this construction is given below.
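
As an illustration (my sketch, not part of the original slides), the intervals can be taken consecutively, with right endpoints given by the cumulative point probabilities; findInterval then assigns each uniform variate to its interval:

> rdiscrete <- function(n, x, p) {
+     b <- cumsum(p)              # right endpoints b(x); left endpoints are c(0, b[-length(b)])
+     u <- runif(n)               # pseudo-random uniforms on [0, 1]
+     x[findInterval(u, b) + 1]   # u between b[k-1] and b[k] is mapped to x[k]
+ }
> rdiscrete(5, c("A", "C", "G", "T"), c(0.2, 0.3, 0.3, 0.2))

The boundary convention of findInterval differs from (a(x), b(x)] only on a set of probability zero.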

Generalized inverse

Definition: Let F : R → [0, 1] be a distribution function. A function F^← : (0, 1) → R that satisfies

F(x) ≥ y  ⇔  x ≥ F^←(y)

for all x ∈ R and y ∈ (0, 1) is called a generalized inverse of F.

Generalized inverse

If F has a true inverse (F is strictly increasing and continuous), then F^← equals the inverse, F^(-1), of F.

All distribution functions have a generalized inverse; we find it by solving the inequality F(x) ≥ y.
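
For example (a standard computation, not spelled out on the slide), for the exponential distribution with intensity parameter λ we have F(x) = 1 − exp(−λx) for x ≥ 0, and solving F(x) ≥ y gives

1 − exp(−λx) ≥ y  ⇔  x ≥ −log(1 − y)/λ,

so F^←(y) = −log(1 − y)/λ. This is precisely the transformation used by my.rexp in the exercise below.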

Continuous sample space

We will simulate from P on R having distribution function F.

First find the generalized inverse, F^← : (0, 1) → R, of F.

Then let u_1, ..., u_n be generated by a pseudo-random number generator and define x_i = F^←(u_i) for i = 1, ..., n. Then x_1, ..., x_n is a realization of n iid random variables with the distribution having distribution function F.
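
As a sketch (my example, not from the slides): the Gumbel distribution, which reappears below, has F(x) = exp(−exp(−(x − μ))) with location parameter μ, so F^←(y) = μ − log(−log(y)), and the inverse transform method is one line of R:

> rgumbel <- function(n, location = 0) {
+     location - log(-log(runif(n)))   # F^<-(u) = location - log(-log(u))
+ }
> summary(rgumbel(1000, location = log(10)))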

Local alignments

Assume that X_1, ..., X_n and Y_1, ..., Y_m are in total n + m iid random variables with values in the 20-letter amino acid alphabet

E = {A, R, N, D, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y, V}.

We want to find the optimal local alignment, and in particular we are interested in the score of the optimal local alignment. This is a function h : E^(n+m) → R.

An X-subsequence and a Y-subsequence are matched letter by letter; matched letters are given a score, positive or negative, and gaps in the subsequences are given a penalty.

Local alignment scores

Denote by S_{n,m} = h(X_1, ..., X_n, Y_1, ..., Y_m) the transformed, real-valued random variable. What is the distribution of S_{n,m}?

We can in principle compute its discrete distribution from the distribution of the X- and Y-variables, but this is futile and not possible in practice.

It is possible to use simulations, but it may be quite time-consuming and not a practical solution for current database usage.

Develop a good theoretical approximation.

Local alignment scores

Under certain conditions on the scoring mechanism and the letter distribution, a valid approximation for n and m large is, for parameters λ, K > 0,

P(S_{n,m} ≤ x) ≈ exp(−Knm exp(−λx)).

This is a scale-location transformation,

S_{n,m} = log(Knm)/λ + S'_{n,m}/λ,

where S'_{n,m} has a Gumbel distribution.
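
A tiny R helper (my illustration, not from the slides) turns the approximation into an approximate p-value for an observed score x; the parameters lambda and K must come from theory or from fitting, and the numbers below are purely hypothetical:

> palign <- function(x, n, m, lambda, K) {
+     exp(-K * n * m * exp(-lambda * x))   # approximate P(S_{n,m} <= x)
+ }
> 1 - palign(52, 200, 300, lambda = 0.27, K = 0.04)   # approximate p-value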

Statistical models

Example: We measure the expression of our favorite gene number i on a microarray. The additive noise model says that our measurement can be written as

X_i = μ_i + σ_i ε_i,

where μ_i ∈ R, σ_i > 0, and ε_i has mean 0 and variance 1.

If ε_i ~ N(0, 1) we have X_i ~ N(μ_i, σ_i^2), and we have fully specified our model with unknown parameters (μ_i, σ_i) ∈ R × (0, ∞).

Statistical models

Example: We want to consider pairs of nucleotides (X_i, Y_i) that are evolutionarily related. We assume that they are independent and identically distributed and that

P(X_1 = x, Y_1 = y) = p(x)P_t(x, y)
                    = p(x)(0.25 + 0.75 exp(−4αt))   if x = y,
                    = p(x)(0.25 − 0.25 exp(−4αt))   if x ≠ y.

The unknown parameters are α > 0 and the four-dimensional probability vector p. Perhaps t is also an unknown parameter.
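
A minimal simulation sketch (my code, not from the slides), using the conditional probabilities implied by the model: given X = x, we have Y = x with probability 0.25 + 0.75 exp(−4αt), and Y equal to each of the three other nucleotides with probability 0.25 − 0.25 exp(−4αt):

> rjc.pair <- function(n, p, alpha, t) {
+     nuc  <- c("A", "C", "G", "T")
+     x    <- sample(nuc, n, replace = TRUE, prob = p)   # X drawn from p
+     same <- 0.25 + 0.75 * exp(-4 * alpha * t)          # P(Y = x | X = x)
+     y    <- ifelse(runif(n) < same, x,
+                    sapply(x, function(z) sample(setdiff(nuc, z), 1)))
+     cbind(x, y)
+ }
> head(rjc.pair(1000, p = rep(0.25, 4), alpha = 1, t = 0.1))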

Statistical models

We need a sample space E.

We need a parameter space Θ of unknown parameters...

...and for each θ ∈ Θ we need a probability measure P_θ on E.

We call (P_θ)_{θ ∈ Θ} a parameterized family of probability measures.

Exponential distribution

Let E_0 = [0, ∞), let θ ∈ (0, ∞), and let P_θ be the distribution of n iid exponentially distributed random variables X_1, ..., X_n with intensity parameter θ.

The distribution of X_i has density f_θ(x) = θ exp(−θx) for x ≥ 0. The probability measure P_θ on E = E_0^n has density

f_θ(x_1, ..., x_n) = θ exp(−θx_1) · ... · θ exp(−θx_n) = θ^n exp(−θ(x_1 + ... + x_n)).

With Θ = (0, ∞), the family (P_θ)_{θ ∈ Θ} of probability measures is a statistical model on E.

Estimators

Definition: An estimator is a map θ̂ : E → Θ. For a given observation x ∈ E, the value θ̂(x) of θ̂ at x is called the estimate of θ.

If X has distribution P_θ, the transformed random variable θ̂(X) is also called the estimator; it has distribution θ̂(P_θ).
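
A concrete example (standard, but not spelled out on the slide): in the exponential model above, maximizing the density θ^n exp(−θ(x_1 + ... + x_n)) over θ, for instance by setting the derivative of its logarithm, n/θ − (x_1 + ... + x_n), equal to zero, gives the maximum-likelihood estimator

θ̂(x_1, ..., x_n) = n/(x_1 + ... + x_n).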

Identifiability

Definition: The parameter θ is said to be identifiable if the map θ ↦ P_θ is one-to-one, that is, if for two different parameters θ_1 and θ_2 the corresponding measures P_{θ_1} and P_{θ_2} differ.

We cannot in any meaningful way estimate an unknown parameter that is not identifiable!

Simulations

Write a function, my.rexp, that takes two parameters such that

> tmp <- my.rexp(10, 1)

generates the realization of 10 iid random variables with the exponential distribution with parameter λ = 1.

How do you make the second parameter equal to 1 by default, such that

> tmp <- my.rexp(10)

produces the same result?

Solution

> my.rexp <- function(n, lambda) {
+     -log(runif(n))/lambda
+ }

To make λ = 1 the default we define instead

> my.rexp <- function(n, lambda = 1) {
+     -log(runif(n))/lambda
+ }

Note that we have used that if U is uniformly distributed on [0, 1], then 1 − U is also uniformly distributed on [0, 1]; this lets us compute −log(u)/λ instead of −log(1 − u)/λ and get rid of an unnecessary computation.
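
A quick sanity check (my suggestion, not part of the solution) compares my.rexp with R's built-in generator; the points of the Q-Q plot should fall close to the identity line:

> qqplot(my.rexp(10000), rexp(10000, 1))
> abline(0, 1)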

Maximum of random variables

Use

> tmp <- replicate(1000, max(rexp(10, 1)))

to generate 1000 replications of the maximum of 10 independent exponential random variables.

Plot the distribution function of the Gumbel distribution with location parameter log(10) and compare it with the empirical distribution function

> emdf <- function(x) sapply(x, function(x) sum(tmp <= x)/1000)

What if we take the maximum of 100 exponential random variables?
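
Why location parameter log(10)? A standard computation (not on the slide): with X_1, ..., X_10 iid exponential with intensity 1,

P(max ≤ x) = (1 − exp(−x))^10 = exp(10 log(1 − exp(−x))) ≈ exp(−10 exp(−x)) = exp(−exp(−(x − log(10))))

for large x, which is the Gumbel distribution function with location parameter log(10). The same argument with 100 variables gives location log(100).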

Solutions

> x <- seq(0, 5, by = 0.1)
> plot(x, exp(-exp(-(x - log(10)))), type = "l")
> points(x, emdf(x), type = "p", pch = 20, col = "red")

Solutions

[Figure: the Gumbel distribution function exp(−exp(−(x − log(10)))) plotted as a solid line for x in [0, 5], with the empirical distribution function of the 1000 simulated maxima overlaid as red points.]

Solutions

> tmp <- replicate(1000, max(rexp(100, 1)))
> emdf <- function(x) sapply(x, function(x) sum(tmp <= x)/1000)
> x <- seq(0, 8, by = 0.1)
> plot(x, exp(-exp(-(x - log(100)))), type = "l")
> points(x, emdf(x), type = "p", pch = 20, col = "red")

Solutions

[Figure: the Gumbel distribution function exp(−exp(−(x − log(100)))) plotted as a solid line for x in [0, 8], with the empirical distribution function of the 1000 simulated maxima of 100 exponentials overlaid as red points.]