Simulations
Computer simulation of realizations of random variables has become indispensable as a supplement to theoretical investigations and practical applications.

- We can easily investigate a large number of scenarios (different P's) and a large number of replications.
- We can compute transformed distributions numerically.
- We can compute (complicated) mean values; this is known as Monte Carlo simulation.
- We can investigate the behaviour of methods for statistical inference.

But how can the deterministic computer generate the outcome from a probability measure?

Generic simulation

A two-step procedure is behind the simulation of random variables:

- The computer emulates the generation of independent, identically distributed random variables with the uniform distribution on the unit interval [0, 1].
- The emulated uniformly distributed random variables are turned into variables with the desired distribution by transformation.

Theorem behind

The following result is behind the generic simulation procedure for simulation from any P on E:

Theorem: Let P_0 denote the uniform distribution on [0, 1], let U_1, U_2, ..., U_n be iid random variables with distribution P_0, and let h : [0, 1] → E be a map such that the transformed probability measure on E is P = h(P_0). Then X_1, X_2, ..., X_n defined by X_i = h(U_i) are n iid random variables, each with distribution P.
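As a small illustration (a sketch, not from the slides): with h(u) = -log(u) the transformed measure h(P_0) is the exponential distribution with intensity 1, so the two steps look as follows in R.

> u <- runif(5)             # step 1: emulate iid uniform variables on [0, 1]
> h <- function(u) -log(u)  # a map with h(P_0) the exponential distribution
> x <- h(u)                 # step 2: transform; a realization of 5 iid Exp(1) variables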

The real problem

What we need in practice is thus the construction of a transformation that can transform the uniform distribution on [0, 1] into the desired probability distribution.

We focus here on two cases:

- A general method for discrete distributions.
- A general method for probability measures on R given in terms of the distribution function.

But what about the simulation of the independent, uniformly distributed random variables?

That's a completely different story. Read D. E. Knuth, ACP, Chapter 3, or trust that R behaves well and that runif works correctly.

We rely on a sufficiently good pseudo random number generator with the property that as long as we cannot statistically detect differences between what the generator produces and true iid [0, 1]-uniformly distributed random variables, we live happily in ignorance.
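A practical aside not on the slides: the pseudo random number generator in R is deterministic given its internal state, so fixing the seed makes a simulation reproducible.

> set.seed(271828)  # fix the state of the pseudo random number generator
> runif(3)          # produces the same three numbers every time after this seed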

Discrete random variables

If P is a probability measure on a discrete sample space E given by point probabilities p(x), x ∈ E, choose for each x ∈ E an interval

    I(x) = (a(x), b(x)] ⊆ [0, 1]

such that

- the length, b(x) - a(x), of I(x) equals p(x), and
- the intervals I(x) are mutually disjoint: I(x) ∩ I(y) = ∅ for x ≠ y.

Letting u_1, ..., u_n be generated by a pseudo random number generator, we define x_i = x if u_i ∈ I(x) for i = 1, ..., n. Then x_1, ..., x_n is a realization of n iid random variables with distribution having point probabilities p(x), x ∈ E.
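A minimal R sketch of this construction (not from the slides), stacking the intervals consecutively so that the right endpoints are the cumulative probabilities; the built-in sample function achieves the same thing.

> p <- c(0.2, 0.5, 0.3)                                  # point probabilities on E = {1, 2, 3}
> u <- runif(10)                                         # pseudo random uniform numbers
> x <- findInterval(u, cumsum(p), left.open = TRUE) + 1  # x_i = j when u_i is in the j'th interval (a, b]
> x2 <- sample(1:3, 10, replace = TRUE, prob = p)        # the idiomatic equivalent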

Generalized inverse

Definition: Let F : R → [0, 1] be a distribution function. A function F⁻ : (0, 1) → R that satisfies

    F(x) ≥ y  ⇔  x ≥ F⁻(y)

for all x ∈ R and y ∈ (0, 1) is called a generalized inverse of F.

Generalized inverse

If F has a true inverse (F is strictly increasing and continuous), then F⁻ equals the inverse, F⁻¹, of F.

All distribution functions have a generalized inverse; we find it by solving the inequality F(x) ≥ y.
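A worked example (not on the slides): for the exponential distribution with intensity λ,

    F(x) = 1 - exp(-λx) ≥ y  ⇔  exp(-λx) ≤ 1 - y  ⇔  x ≥ -log(1 - y)/λ,

so F⁻(y) = -log(1 - y)/λ.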

Continuous sample space

We will simulate from P on R having distribution function F.

- First find the generalized inverse, F⁻ : (0, 1) → R, of F.
- Then let u_1, ..., u_n be generated by a pseudo random number generator and define x_i = F⁻(u_i) for i = 1, ..., n.

Then x_1, ..., x_n is a realization of n iid random variables with distribution having distribution function F.
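As an illustration (a sketch, not from the slides): the Cauchy distribution has F(x) = 1/2 + arctan(x)/π, which is strictly increasing and continuous, so F⁻(y) = F⁻¹(y) = tan(π(y - 1/2)).

> Finv <- function(y) tan(pi * (y - 0.5))  # inverse distribution function of the Cauchy
> x <- Finv(runif(1000))                   # a realization of 1000 iid standard Cauchy variables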

Local alignments

Assume that X_1, ..., X_n and Y_1, ..., Y_m are in total n + m iid random variables with values in the 20-letter amino acid alphabet

    E = {A, R, N, D, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y, V}.

We want to find an optimal local alignment, and in particular we are interested in the score of the optimal local alignment. This is a function h : E^(n+m) → R.

An X- and a Y-subsequence are matched letter by letter; matched letters are given a score, positive or negative, and gaps in the subsequences are given a penalty.

Local alignment scores

Denote by S_{n,m} = h(X_1, ..., X_n, Y_1, ..., Y_m) the transformed, real-valued random variable. What is the distribution of S_{n,m}?

- We can in principle compute its discrete distribution from the distribution of the X- and Y-variables; this is futile and not possible in practice.
- It is possible to use simulations, but it may be quite time-consuming and not a practical solution for current database usage.
- Develop a good theoretical approximation.

Local alignment scores

Under certain conditions on the scoring mechanism and the letter distribution, a valid approximation for n and m large is, for parameters λ, K > 0,

    P(S_{n,m} ≤ x) ≈ exp(-Knm exp(-λx)).

This is a scale-location transformation,

    S_{n,m} = log(Knm)/λ + S'_{n,m}/λ,

where S'_{n,m} has a Gumbel distribution.
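A sketch of how the approximation is used in practice; λ = 0.25 and K = 0.1 below are made-up values for illustration, not estimated parameters.

> lambda <- 0.25; K <- 0.1                # hypothetical parameter values
> n <- 500; m <- 400                      # sequence lengths
> s <- 45                                 # an observed local alignment score
> 1 - exp(-K * n * m * exp(-lambda * s))  # approximate p-value P(S_{n,m} >= s)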

Statistical models

Example: We measure the expression of our favorite gene number i on a microarray. The additive noise model says that our measurement can be written as

    X_i = μ_i + σ_i ε_i,

where μ_i ∈ R, σ_i > 0, and ε_i has mean 0 and variance 1.

If ε_i ~ N(0, 1) we have X_i ~ N(μ_i, σ_i²), and we have fully specified our model with unknown parameters (μ_i, σ_i) ∈ R × (0, ∞).

Statistical models

Example: We want to consider pairs of nucleotides (X_i, Y_i) that are evolutionarily related. We assume that they are independent and identically distributed and that

    P(X_1 = x, Y_1 = y) = p(x)P_t(x, y),  where
    P_t(x, y) = 1/4 + 3/4 exp(-4αt)   if x = y,
    P_t(x, y) = 1/4 - 1/4 exp(-4αt)   if x ≠ y.

The unknown parameters are α > 0 and the four-dimensional probability vector p. Perhaps t is also an unknown parameter.
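A sketch (not on the slides) of how such pairs could be simulated with the discrete method from earlier: draw X from p and then Y conditionally on X. The concrete numbers are made up for illustration.

> p <- c(A = 0.3, C = 0.2, G = 0.2, T = 0.3)  # hypothetical base distribution
> alpha <- 0.2; t <- 1
> same <- 0.25 + 0.75 * exp(-4 * alpha * t)   # P_t(x, x)
> nuc <- names(p)
> x <- sample(nuc, 1000, replace = TRUE, prob = p)
> y <- sapply(x, function(xi) {
+     prob <- rep((1 - same)/3, 4)            # each of the 3 other bases is equally likely
+     names(prob) <- nuc; prob[xi] <- same
+     sample(nuc, 1, prob = prob)
+ })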

Statistical models

- We need a sample space E.
- We need a parameter space Θ of unknown parameters.
- And for each θ ∈ Θ we need a probability measure P_θ on E.

We call (P_θ)_{θ∈Θ} a parameterized family of probability measures.

Exponential distribution

Let E_0 = [0, ∞), let θ ∈ (0, ∞), and let P_θ be the distribution of n iid exponentially distributed random variables X_1, ..., X_n with intensity parameter θ.

The distribution of X_i has density f_θ(x) = θ exp(-θx) for x ≥ 0. The probability measure P_θ on E = E_0^n has density

    f_θ(x_1, ..., x_n) = θ exp(-θx_1) · ... · θ exp(-θx_n) = θ^n exp(-θ(x_1 + ... + x_n)).

With Θ = (0, ∞), the family (P_θ)_{θ∈Θ} of probability measures is a statistical model on E.

Estimators

Definition: An estimator is a map ˆθ : E → Θ. For a given observation x ∈ E, the value of ˆθ at x, ˆϑ = ˆθ(x), is called the estimate of θ.

If X has distribution P_θ, the transformed random variable ˆθ(X) is also called the estimator; it has distribution ˆθ(P_θ).
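For instance (an illustration, not from the slides), in the exponential model above the maximum likelihood estimator is ˆθ(x_1, ..., x_n) = n/(x_1 + ... + x_n).

> theta.hat <- function(x) length(x)/sum(x)  # MLE of the intensity parameter
> x <- rexp(100, rate = 2)                   # a simulated observation with true theta = 2
> theta.hat(x)                               # the estimate; close to 2 for large n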

Identifiability

Definition: The parameter θ is said to be identifiable if the map θ ↦ P_θ is one-to-one. That is, for two different parameters θ_1 and θ_2 the corresponding measures P_{θ_1} and P_{θ_2} differ.

We cannot in a meaningful way estimate an unknown parameter that is not identifiable!

Simulations

Write a function, my.rexp, that takes two parameters such that

> tmp <- my.rexp(10, 1)

generates the realization of 10 iid random variables with the exponential distribution with parameter λ = 1. How do you make the second parameter equal to 1 by default, such that

> tmp <- my.rexp(10)

produces the same result?

Solution

> my.rexp <- function(n, lambda) {
+     -log(runif(n))/lambda
+ }

To make λ = 1 by default we define instead

> my.rexp <- function(n, lambda = 1) {
+     -log(runif(n))/lambda
+ }

Note that we have used that if U is uniformly distributed on [0, 1], then 1 - U is also uniformly distributed on [0, 1]: the generalized inverse is F⁻(y) = -log(1 - y)/λ, but replacing 1 - U by U gets rid of an unnecessary computation.
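A quick sanity check (a sketch, not from the slides) is to compare the simulated values with R's built-in rexp via a QQ-plot:

> qqplot(my.rexp(10000), rexp(10000), pch = 20)
> abline(0, 1)  # the points should fall close to this line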

Maximum of random variables

Use

> tmp <- replicate(1000, max(rexp(10, 1)))

to generate 1000 replications of the maximum of 10 independent exponential random variables. Plot the distribution function for the Gumbel distribution with location parameter log(10) and compare it with the empirical distribution function

> emdf <- function(x) sapply(x, function(x) sum(tmp <= x)/1000)

What if we take the max of 100 exponential random variables?
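Why log(10)? A short calculation (not on the slide): for n iid Exp(1) variables,

    P(max(X_1, ..., X_n) ≤ x) = (1 - exp(-x))^n ≈ exp(-n exp(-x)) = exp(-exp(-(x - log n))),

which is the Gumbel distribution function with location parameter log n.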

Solutions

> x <- seq(0, 5, by = 0.1)
> plot(x, exp(-exp(-(x - log(10)))), type = "l")
> points(x, emdf(x), type = "p", pch = 20, col = "red")

Solutions

[Figure: the Gumbel distribution function exp(-exp(-(x - log(10)))) (curve) together with the empirical distribution function (red points).]

Solutions

> tmp <- replicate(1000, max(rexp(100, 1)))
> emdf <- function(x) sapply(x, function(x) sum(tmp <= x)/1000)
> x <- seq(0, 8, by = 0.1)
> plot(x, exp(-exp(-(x - log(100)))), type = "l")
> points(x, emdf(x), type = "p", pch = 20, col = "red")

Solutions

[Figure: the Gumbel distribution function exp(-exp(-(x - log(100)))) (curve) together with the empirical distribution function (red points).]

Curriculum, second lecture: Niels Richard Hansen, November 23, 2011. NRH: Handout pages 1-13. PD: Pages 55-75. (NRH: Sections 2.6, 2.7, 2.11, 2.12; at this point in the course these sections will be difficult to follow.)
