Stat 516, Homework 1

Stat 516, Homework 1
Due date: October 7

1. Consider an urn with n distinct balls numbered 1, ..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball that we have seen before.

(a) Find an expression for Pr(N > k) for k = 2, 3, ..., n + 1.

(b) Show that
$$\Pr(N = k) = \frac{n!\,(k-1)}{(n-k+1)!\,n^k} \quad \text{for } k = 2, 3, \ldots, n + 1.$$

(c) Prove that
$$E(N) = 2 + \sum_{k=2}^{n} \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{k-1}{n}\right).$$

Hint: (b) and (c) follow directly from (a), so skipping (a) will make your life more difficult. (A simulation sanity check of these formulas appears after this problem set.)

2. (Gamma-Poisson mixture) Recall that if Z ~ Poisson(λ), then E(Z) = Var(Z) = λ, where λ is the intensity parameter. This property of the Poisson distribution is dissatisfying, because in practice the variance of observed count data often exceeds the mean. One possible solution to this discrepancy is to use a Gamma-Poisson mixture, which is generated by first randomly drawing X ~ Gamma(α, β), where α, β > 0 (I am using the inverse-scale parameterization, so that E(X) = α/β and Var(X) = α/β²). Then we generate Y | X ~ Poisson(X). Derive E(Y) and Var(Y) and show that E(Y) < Var(Y).

3. (Lack of memory property)

(a) Let T be a geometric random variable. Show that for any integers k, k_0 ≥ 1, we have Pr(T = k + k_0 | T > k_0) = Pr(T = k).

(b) Let X be an exponential random variable with rate parameter λ. Show that for all t, t_0 > 0, we have Pr(X ≥ t + t_0 | X ≥ t_0) = Pr(X ≥ t).
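The closed forms in problem 1 are easy to sanity-check numerically. Below is a minimal Python sketch (my own, not part of the assignment; the function names are hypothetical) that draws balls with replacement until the first repeat and compares the empirical distribution of N with the formula in (b).

```python
import math, random

def draw_until_repeat(n, rng):
    """Sample balls 1..n with replacement; return the index of the first repeated draw."""
    seen = set()
    while True:
        ball = rng.randint(1, n)
        if ball in seen:
            return len(seen) + 1  # this draw revealed a previously seen ball
        seen.add(ball)

def pr_N_equals(n, k):
    """Closed form from part (b): n!(k-1) / ((n-k+1)! n^k)."""
    return math.factorial(n) * (k - 1) / (math.factorial(n - k + 1) * n**k)

rng = random.Random(1)
n, reps = 10, 200_000
samples = [draw_until_repeat(n, rng) for _ in range(reps)]
for k in range(2, n + 2):
    emp = sum(s == k for s in samples) / reps
    print(f"k={k:2d}  empirical={emp:.4f}  formula={pr_N_equals(n, k):.4f}")
```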

Stat 516, Homework 2
Due date: October 14

1. Let {X_n}_{n≥0} be a homogeneous Markov chain with state space E and transition matrix P. Let τ be the first time n for which X_n ≠ X_0, where τ = +∞ if X_n = X_0 for all n ≥ 0. Express E[τ | X_0 = i] in terms of p_ii.

2. Let {X_n}_{n≥0} be a homogeneous Markov chain with state space E = {1, 2, 3, 4} and transition matrix
$$P = \begin{pmatrix} 0.2 & 0.3 & 0.5 & 0 \\ 0 & 0.2 & 0.3 & 0.5 \\ 0.5 & 0 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0 & 0.2 \end{pmatrix}.$$
What is the probability that, starting from state 1, the chain hits state 3 before it hits state 4?

3. Write a routine to simulate realizations of the gambler's ruin chain {X_n} with probabilities p_{i,i+1} = p, p_{i,i-1} = q, p + q = 1. The routine should stop the simulation as soon as the chain hits one of the absorbing states. Your input will consist of an initial state i, state space size N, and probability p of increasing the gambler's fortune. The routine should return the vector of Markov chain states visited until absorption. (One possible implementation is sketched after this problem set.)

(a) Provide the source code in any computer language of your choice and the output of your routine in the form of 20 random realizations of the Markov chain for input parameters N = 10, i = 3, and p = 0.27.

(b) Use your simulation routine to estimate the probability of reaching the largest state N = 10 starting at state 5, u(5, p), for probabilities p_{i,i+1} = p = 0.1, 0.2, ..., 0.9. Turn in a graph with the estimated u(5, p) plotted against p. In your graph, include the values of u(5, p) computed using the formulae that we derived in class.
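A minimal Python sketch for problem 3 (any language is acceptable per the problem statement; the function name `gamblers_ruin` and the seed are my own choices):

```python
import random

def gamblers_ruin(i, N, p, rng=random):
    """Simulate the gambler's ruin chain started at state i on {0, 1, ..., N}.

    At each step the fortune increases by 1 with probability p and decreases
    by 1 with probability q = 1 - p; states 0 and N are absorbing.
    Returns the list of visited states, ending at 0 or N.
    """
    path = [i]
    while 0 < path[-1] < N:
        step = 1 if rng.random() < p else -1
        path.append(path[-1] + step)
    return path

# Part (a): 20 realizations for N = 10, i = 3, p = 0.27.
rng = random.Random(516)
for _ in range(20):
    print(gamblers_ruin(3, 10, 0.27, rng))

# Part (b): estimate u(5, p) = Pr(hit N before 0 | X_0 = 5) by simulation.
for p in [k / 10 for k in range(1, 10)]:
    wins = sum(gamblers_ruin(5, 10, p, rng)[-1] == 10 for _ in range(5000))
    print(f"p = {p:.1f}: estimated u(5, p) = {wins / 5000:.3f}")
```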

Stat 516, Homework 3
Due date: October 21

1. Consider a sequence of L nucleotides (A, G, C, and T). We model evolution of this sequence as a discrete-time Markov chain. At each step, we randomly choose one of the L nucleotides and replace it with one of the three equally probable alternatives. Notice that the randomly chosen sequence position must change its state. We assume that one of the 4^L possible nucleotide sequences of length L has a special property of being able to bind regulatory proteins and control expression of one or more genes nearby. Let {X_n} be a Markov chain that counts the number of positions where our randomly evolving sequence at step n matches the special, regulatory sequence.

(a) For which i, j ∈ {0, ..., L} are the transition probabilities p_ij = Pr(X_1 = j | X_0 = i) not equal to zero? Provide algebraic expressions for these non-zero transition probabilities.

(b) Show that the stationary distribution of {X_n}, π = (π_0, ..., π_L), is binomial with L trials and probability of success 1/4. Explain why this stationary distribution is unique. (An empirical check of this claim appears after this problem set.)

(c) Let T_L = inf{n ≥ 1 : X_n = L} be the first time X_n matches the target. Using your knowledge of π, show that E(T_L | X_0 = L) = 4^L.

(d) Let μ_n(i) = E(X_n | X_0 = i) be the mean number of matches in the evolving sequence at step n, given i matches at step 0. Show that for n ≥ 1, μ_n(i) satisfies the following recursive equations:
$$\mu_n(0) = \mu_{n-1}(0)\,\frac{2}{3} + \mu_{n-1}(1)\,\frac{1}{3},$$
$$\mu_n(i) = \mu_{n-1}(i-1)\,\frac{i}{L} + \mu_{n-1}(i)\,\frac{L-i}{L}\,\frac{2}{3} + \mu_{n-1}(i+1)\,\frac{L-i}{L}\,\frac{1}{3}, \quad i = 1, \ldots, L-1,$$
$$\mu_n(L) = \mu_{n-1}(L-1).$$
What initial conditions do these recursive equations satisfy?

2. Prove that recurrence is a communication class property: i ↔ j and i recurrent imply j recurrent.

3. Let {X_n} be a homogeneous Markov chain with state space E and transition matrix P. Define Y_n = (X_n, X_{n+1}). The process {Y_n} is also a homogeneous Markov chain with state space F = {(i_0, i_1) ∈ E² : p_{i_0 i_1} > 0}.

(a) Derive the general entry of the transition matrix of {Y_n}.

(b) Show that if {X_n} is irreducible, then so is {Y_n}.

(c) Show that if {X_n} has a stationary distribution π, then {Y_n} also has a stationary distribution. Express the general entry of this stationary distribution in terms of π and P.
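A quick empirical check of problem 1(b), not required by the assignment: the Python sketch below (my own; it encodes nucleotides as 0-3 and takes the all-zeros sequence as the target) simulates the evolving sequence directly and compares the long-run frequencies of the match count with the Binomial(L, 1/4) probabilities.

```python
import math, random

def simulate_matches(L, steps, rng):
    """Track how often the match-count chain {X_n} visits each state 0..L.

    X_n is the number of positions equal to the target letter (coded 0).
    """
    seq = [rng.randrange(4) for _ in range(L)]
    visits = [0] * (L + 1)
    for _ in range(steps):
        pos = rng.randrange(L)
        # Replace with one of the three *other* nucleotides, uniformly.
        seq[pos] = (seq[pos] + rng.randrange(1, 4)) % 4
        visits[seq.count(0)] += 1
    return [v / steps for v in visits]

L, steps = 6, 500_000
freqs = simulate_matches(L, steps, random.Random(0))
for i, f in enumerate(freqs):
    binom = math.comb(L, i) * 0.25**i * 0.75**(L - i)
    print(f"i={i}: empirical {f:.4f}  Binomial(L, 1/4) {binom:.4f}")
```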

Stat 516, Homework 4
Due date: October 28

1. Prove that an irreducible homogeneous Markov chain on a finite state space is positive recurrent.
Hint: The main step in the proof is to establish recurrence of the Markov chain. Try to complete this part of the proof by contradiction.

2. Let {X_n} be an irreducible positive recurrent homogeneous Markov chain with stationary distribution π. Define k(n) as the number of returns of the chain to a subset A ⊆ E of the state space during the first n steps. Prove that
$$\frac{k(n)}{n} \xrightarrow{\text{a.s.}} \sum_{i \in A} \pi_i.$$

3. Consider a Markov chain with transition probability matrix
$$P = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$

(a) Show that this Markov chain has a limiting distribution and find this distribution analytically.

(b) Take three arbitrary numbers x_1, x_2, and x_3 and form the successive running averages x_n = (x_{n-3} + x_{n-2} + x_{n-1})/3, starting with x_4. Using what you know about lim_{n→∞} P^n, prove that
$$\lim_{n \to \infty} x_n = \frac{x_1 + 2x_2 + 3x_3}{6}.$$

4. Consider the Ehrenfest model of diffusion with N = 100 gas molecules. From our derivations we know that the stationary distribution of the chain is Bin(0.5, N). We also know that the chain is irreducible and positive recurrent. Use simulations and the ergodic theorem to approximate the variance of the stationary distribution and compare your approximation with the true value of the stationary variance. (A simulation sketch follows this problem set.)

5. Square matrices A and B are called similar if there exists a non-singular matrix T such that A = T⁻¹BT. Prove that the transition probability matrix of an irreducible and reversible Markov chain defined on a finite state space is similar to a symmetric matrix.
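For problem 4, a minimal Python sketch of the ergodic-average approach (the function name and seed are mine): simulate the Ehrenfest chain and use time averages of X_n and X_n² to approximate the stationary variance, which should be close to the true Bin(0.5, N) value N/4 = 25.

```python
import random

def ehrenfest_variance(N, steps, rng):
    """Approximate the stationary variance of the Ehrenfest chain via the ergodic theorem.

    X_n = number of molecules in the left chamber; at each step a molecule is
    chosen uniformly at random and moved to the other chamber, so
    X -> X - 1 with probability X/N and X -> X + 1 otherwise.
    """
    x = N // 2
    s1 = s2 = 0.0
    for _ in range(steps):
        x += -1 if rng.random() < x / N else 1
        s1 += x
        s2 += x * x
    mean = s1 / steps
    return s2 / steps - mean**2  # time-average estimate of Var(X)

N = 100
est = ehrenfest_variance(N, 1_000_000, random.Random(4))
print(f"simulated variance: {est:.2f}; true Bin(0.5, {N}) variance: {N / 4}")
```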

Stat 516, Homework 5
Due date: November 16

1. In this exercise, you will statistically analyze the Wright-Fisher model with mutations. To simplify the analysis, assume that Pr(a → A) = Pr(A → a) = u, so that the transition probabilities of {X_n} are
$$p_{ij} = \binom{2m}{j} p_i^j (1 - p_i)^{2m - j}, \quad \text{where } p_i = \frac{i}{2m}(1 - u) + \left(1 - \frac{i}{2m}\right) u.$$

(a) Write a simulation routine to generate realizations from the Markov chain. Setting the mutation probability u = 0.5 and the gene number 2m = 10, generate 200 iterations of the chain starting from state 0. (A simulation sketch follows this problem set.)

(b) Using your simulated data, compute the maximum likelihood estimate of the mutation probability u. I suggest doing this numerically.

(c) Obtain a 95% confidence interval for u using the asymptotic results discussed in class. You will need to estimate the stationary distribution.

(d) Check your asymptotics-based answers by repeating the simulation and estimation 1000 times and reporting relevant summaries of the resulting empirical distribution of the estimates of u.

(e) Test the null hypothesis H_0: u = 0.4 against the alternative H_1: u ≠ 0.4 using a likelihood ratio test.

Attach the source code with comments describing your steps.
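For parts (a) and (b), a minimal Python sketch (function names and the grid maximizer are my own choices): each new state is drawn from Binomial(2m, p_i) as in the transition kernel above, and the log-likelihood of u is maximized over a grid.

```python
import math, random

def wright_fisher(u, two_m, n_iter, x0, rng):
    """Simulate the Wright-Fisher chain with symmetric mutation probability u.

    Given the current count i of A alleles among 2m genes, the next state is
    Binomial(2m, p_i) with p_i = (i / 2m)(1 - u) + (1 - i / 2m) u.
    """
    path = [x0]
    for _ in range(n_iter):
        i = path[-1]
        p = (i / two_m) * (1 - u) + (1 - i / two_m) * u
        # Binomial(2m, p) draw as a sum of Bernoulli trials.
        path.append(sum(rng.random() < p for _ in range(two_m)))
    return path

def log_lik(u, path, two_m):
    """Observed-data log-likelihood of u given the simulated path (part (b))."""
    ll = 0.0
    for i, j in zip(path, path[1:]):
        p = (i / two_m) * (1 - u) + (1 - i / two_m) * u
        ll += math.log(math.comb(two_m, j)) + j * math.log(p) + (two_m - j) * math.log(1 - p)
    return ll

chain = wright_fisher(u=0.5, two_m=10, n_iter=200, x0=0, rng=random.Random(516))
print(chain[:20])
u_hat = max((u / 200 for u in range(1, 200)), key=lambda u: log_lik(u, chain, two_m=10))
print("grid MLE of u:", u_hat)
```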

Stat 516, Homework 6
Due date: November 23

1. Suppose we want to estimate the mean of the standard normal distribution N(0, 1), so that our target density is
$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.$$

(a) Use the double-exponential distribution with density g(x) = (1/2) e^{-|x|} and importance sampling to estimate the mean of the standard normal distribution. Compare the Monte Carlo errors of importance sampling and naive Monte Carlo. (A sketch follows this problem set.)

(b) Implement the Metropolis-Hastings algorithm from the notes to approximate the mean of the standard normal distribution. Adjust the tuning parameter δ so that your acceptance probability is between 0.3 and 0.4.

2. Consider a toric Ising model with state space Ω = {x = (x_1, ..., x_k) : x_i = ±1} and
$$\pi(x) = \frac{1}{Z} e^{\beta \sum_{i=1}^{k} x_i x_{i+1}},$$
where x_{k+1} is understood to be equal to x_1. Set k = 50 and β = 0.9. Implement the Metropolis-Hastings sampler discussed in class to approximate E[M(x)] and Var[M(x)], where M(x) = Σ_{i=1}^k x_i is the total magnetization. In each algorithm, start from a random state x = (x_1, ..., x_k), obtained by flipping k independent fair coins and assigning values -1 or 1 to each component of x. Run your MCMC chains for N iterations. During the first L < N iterations, do not save the sampled states of the system; L is the length of a burn-in period, needed for the Markov chain to achieve stationarity (hopefully).
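For problem 1(a), a minimal Python sketch (my own; it uses standard-library sampling only): a Laplace draw is an Exponential(1) magnitude with a random sign, and each draw is weighted by f/g before averaging.

```python
import math, random

rng = random.Random(6)
n = 100_000

def f(x):
    """Standard normal target density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Naive Monte Carlo: average of X_i ~ N(0, 1).
naive = [rng.gauss(0, 1) for _ in range(n)]

# Importance sampling from the double exponential g(x) = (1/2) exp(-|x|).
def weighted_draw():
    x = -math.log(1 - rng.random()) * rng.choice([-1, 1])  # Laplace(0, 1) draw
    w = f(x) / (0.5 * math.exp(-abs(x)))                   # importance weight f/g
    return x * w

imp = [weighted_draw() for _ in range(n)]

for name, s in [("naive MC", naive), ("importance", imp)]:
    mean = sum(s) / n
    se = math.sqrt((sum(v * v for v in s) / n - mean**2) / n)
    print(f"{name}: estimate {mean:+.4f}, Monte Carlo s.e. {se:.4f}")
```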

Stat 516, Homework 7
Due date: December 7

1. For the ABO blood type example:

(a) Implement the EM algorithm and apply it to the data n = (n_A, n_AB, n_B, n_O) = (6, 4, 55, 5). (An EM sketch follows this problem set.)

(b) The nonparametric bootstrap is a Monte Carlo technique for studying sampling properties of statistics (data summaries). Suppose we observe iid data y = (y_1, ..., y_n). We would like to study distributional properties of a statistic T(y) (e.g., maximum likelihood estimators). The bootstrap prescribes creating synthetic data sets y_{rep,1}, ..., y_{rep,N} by drawing n samples from y with replacement. Distributional properties of T(y) are then obtained via the sampling properties of T(y_{rep,1}), ..., T(y_{rep,N}). Use the nonparametric bootstrap with 1000 synthetic blood type counts to compute 95% confidence intervals for p_A, p_B, and p_O. Report your estimates and confidence intervals in tabular form.

(c) Implement the Bayesian data augmentation algorithm assuming a priori that (p_A, p_B, p_O) ~ Dirichlet(1, 1, 1). Use the data from part (a) to approximate the posterior distribution Pr(p_A, p_B, p_O, m_AA, m_BB | n). Report histograms of posterior samples for the parameters and missing data, and include posterior medians and Bayesian credible intervals in tabular form.

2. Show that if (x, y) form a hidden Markov model with
$$\Pr(x, y) = \Pr(x_1) \prod_{t=2}^{n} \Pr(x_t \mid x_{t-1}) \prod_{t=1}^{n} \Pr(y_t \mid x_t), \quad (1)$$
then Pr(y_t | x_{1:t}, y_{1:t-1}) = Pr(y_t | x_t) for t = 1, ..., n. In your derivation, you are allowed to use only the factorization (1) and elementary manipulations of conditional probabilities, marginal probabilities, etc.
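For 1(a), a minimal Python sketch of the standard EM formulation for ABO allele frequencies under Hardy-Weinberg proportions (this is how the ABO example is usually set up; the function and variable names are my own). The E-step imputes the expected genotype counts m_AA, m_AO, m_BB, m_BO given the current allele frequencies; the M-step re-estimates the frequencies by allele counting.

```python
def abo_em(n_A, n_AB, n_B, n_O, iters=100):
    """EM for ABO allele frequencies (p_A, p_B, p_O) under Hardy-Weinberg.

    Phenotype probabilities: A: p_A^2 + 2 p_A p_O;  B: p_B^2 + 2 p_B p_O;
    AB: 2 p_A p_B;  O: p_O^2.
    """
    n = n_A + n_AB + n_B + n_O
    pA, pB, pO = 1 / 3, 1 / 3, 1 / 3  # uniform starting values
    for _ in range(iters):
        # E-step: expected genotype counts within the A and B phenotypes.
        m_AA = n_A * pA**2 / (pA**2 + 2 * pA * pO)
        m_AO = n_A - m_AA
        m_BB = n_B * pB**2 / (pB**2 + 2 * pB * pO)
        m_BO = n_B - m_BB
        # M-step: allele counting over the 2n genes.
        pA = (2 * m_AA + m_AO + n_AB) / (2 * n)
        pB = (2 * m_BB + m_BO + n_AB) / (2 * n)
        pO = (2 * n_O + m_AO + m_BO) / (2 * n)
    return pA, pB, pO

print(abo_em(6, 4, 55, 5))
```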