Confidence intervals for the mixing time of a reversible Markov chain from a single sample path

Slide 1: Confidence intervals for the mixing time of a reversible Markov chain from a single sample path

Daniel Hsu (Columbia University), Aryeh Kontorovich (Ben-Gurion University), Csaba Szepesvári (University of Alberta)

ITA

Slide 2: Problem

Irreducible, aperiodic, time-homogeneous Markov chain X_1, X_2, X_3, ...

There is a unique stationary distribution π with

    \lim_{t \to \infty} \mathcal{L}(X_t \mid X_1 = x) = \pi \quad \text{for all } x \in \mathcal{X}.

The mixing time t_mix is the earliest time t with

    \sup_{x \in \mathcal{X}} \big\| \mathcal{L}(X_t \mid X_1 = x) - \pi \big\|_{\mathrm{tv}} \le 1/4.

Problem: Given δ ∈ (0, 1) and X_{1:t}, determine a non-trivial interval I_t ⊆ [0, ∞] with

    \mathbb{P}(t_{\mathrm{mix}} \in I_t) \ge 1 - \delta.
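
To make the definition concrete, here is a minimal numpy sketch (an editorial illustration, not part of the talk) that brute-forces t_mix for a small chain whose transition matrix is known; `mixing_time` and the toy birth-death chain are arbitrary examples.

```python
import numpy as np

def mixing_time(P, threshold=0.25, t_max=10_000):
    """Brute-force t_mix: smallest t with max_x ||P^t(x, .) - pi||_tv <= threshold."""
    d = P.shape[0]
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi /= pi.sum()
    Pt = np.eye(d)
    for t in range(1, t_max + 1):
        Pt = Pt @ P
        # Worst-case total variation distance over starting states.
        tv = 0.5 * np.abs(Pt - pi).sum(axis=1).max()
        if tv <= threshold:
            return t
    raise RuntimeError("no mixing within t_max steps")

# A small reversible (birth-death) chain as an arbitrary example.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
print(mixing_time(P))  # -> 1 for this toy chain
```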

Slide 3: Some motivation from machine learning and statistics

Chernoff bounds for Markov chains X_1, X_2, ...: for suitably well-behaved f : X → R, with probability at least 1 − δ,

    \left| \frac{1}{t} \sum_{i=1}^{t} f(X_i) - \mathbb{E}_\pi f \right| \;\le\; \underbrace{\tilde{O}\!\left( \sqrt{\frac{t_{\mathrm{mix}} \log(1/\delta)}{t}} \right)}_{\text{deviation bound}}.

The bound depends on t_mix, which may be unknown a priori.

Examples:
- Bayesian inference: posterior means & variances via MCMC.
- Reinforcement learning: mean action rewards in an MDP.
- Supervised learning: error rates of hypotheses from non-iid data.

We need observable deviation bounds.
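
As a worked instance of the bound's shape (a sketch only: the constants and extra logarithmic factors hidden by Õ(·) are dropped, so this is an order-of-magnitude illustration rather than a rigorous certificate, and `deviation_bound` is a hypothetical helper name):

```python
import numpy as np

def deviation_bound(t_mix, delta, t):
    """Shape of the Markov-chain Chernoff deviation bound
    (constants and log factors hidden by the O-tilde are omitted)."""
    return np.sqrt(t_mix * np.log(1.0 / delta) / t)

# E.g., with t_mix = 100 and delta = 0.05, averaging over t = 10^6 steps
# gives a deviation on the order of:
print(deviation_bound(100, 0.05, 10**6))  # ~0.017
```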

Slide 4: Observable deviation bounds from mixing time bounds?

Suppose an estimator ˆt_mix = ˆt_mix(X_{1:t}) of t_mix satisfies

    \mathbb{P}(t_{\mathrm{mix}} \le \hat{t}_{\mathrm{mix}} + \varepsilon_t) \ge 1 - \delta.

Then with probability at least 1 − 2δ,

    \left| \frac{1}{t} \sum_{i=1}^{t} f(X_i) - \mathbb{E}_\pi f \right| \;\le\; \tilde{O}\!\left( \sqrt{\frac{(\hat{t}_{\mathrm{mix}} + \varepsilon_t) \log(1/\delta)}{t}} \right).

But ˆt_mix is computed from X_{1:t}, so ε_t may itself depend on t_mix.

Deviation bounds for point estimators are therefore insufficient: we need (observable) confidence intervals for t_mix.

Slide 5: What we do

1. Shift focus to the relaxation time t_relax to enable spectral methods.
2. Lower/upper bounds on the sample path length needed for point estimation of t_relax.
3. A new algorithm for constructing confidence intervals for t_relax.

Slide 6: Relaxation time

Let P be the transition operator of the Markov chain, and let λ be its second-largest eigenvalue modulus (i.e., the largest eigenvalue modulus other than 1).

Spectral gap: γ := 1 − λ. Relaxation time: t_relax := 1/γ.

    (t_{\mathrm{relax}} - 1) \ln 2 \;\le\; t_{\mathrm{mix}} \;\le\; t_{\mathrm{relax}} \ln\frac{4}{\pi_\star}

for π_⋆ := min_{x ∈ X} π(x).

The assumptions on P ensure γ, π_⋆ ∈ (0, 1).

Spectral approach: construct confidence intervals for γ and π_⋆.
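
A numpy sketch (editorial illustration; `relaxation_quantities` is a hypothetical helper) computing γ, t_relax, and both sides of the sandwich for a known reversible P:

```python
import numpy as np

def relaxation_quantities(P):
    """Spectral gap, relaxation time, and the t_mix sandwich for a reversible P."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi /= pi.sum()
    # For a reversible chain the eigenvalues of P are real.
    lams = np.sort(np.real(np.linalg.eigvals(P)))[::-1]
    lam = max(lams[1], abs(lams[-1]))   # second-largest eigenvalue modulus
    gamma = 1.0 - lam
    t_relax = 1.0 / gamma
    lower = (t_relax - 1.0) * np.log(2.0)
    upper = t_relax * np.log(4.0 / pi.min())
    return gamma, t_relax, lower, upper

P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
print(relaxation_quantities(P))  # gamma = 0.5, t_relax = 2 for this toy chain
```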

Slide 7: Our results (point estimation)

We restrict to reversible Markov chains on finite state spaces. Let d be the (known a priori) cardinality of the state space X.

1. Lower bound: To estimate γ within a constant multiplicative factor, every algorithm needs (with probability 1/4) a sample path of length

    \Omega\!\left( \frac{d \log d}{\gamma} + \frac{1}{\pi_\star} \right).

2. Upper bound: A simple algorithm estimates γ and π_⋆ within a constant multiplicative factor (w.h.p.) from a sample path of length

    \tilde{O}\!\left( \frac{\log d}{\pi_\star \gamma^3} \right) \text{ (for } \gamma \text{)}, \qquad \tilde{O}\!\left( \frac{\log d}{\pi_\star \gamma} \right) \text{ (for } \pi_\star \text{)}.

But a point estimator is not a confidence interval.

Slide 8: Our results (confidence intervals)

3. New algorithm: Given δ ∈ (0, 1) and X_{1:t} as input, it constructs intervals I_t^γ and I_t^{π_⋆} such that

    \mathbb{P}(\gamma \in I_t^{\gamma}) \ge 1 - \delta \quad \text{and} \quad \mathbb{P}(\pi_\star \in I_t^{\pi_\star}) \ge 1 - \delta.

The widths of the intervals converge a.s. to zero at a \sqrt{(\log\log t)/t} rate.

4. Hybrid approach: Use the new algorithm to turn error bounds for point estimators into observable confidence intervals. (This improves the asymptotic rate for the π_⋆ interval.)

Slide 9: Plug-in estimator

Reversibility grants the symmetry of

    M := \operatorname{diag}(\pi)\, P = \big( \mathbb{P}_{X_1 \sim \pi}(X_1 = x,\, X_2 = x') \big)_{x, x' \in \mathcal{X}}

(the doublet state probabilities in the stationary chain).

Moreover, the eigenvalues of

    L := \operatorname{diag}(\pi)^{-1/2}\, M\, \operatorname{diag}(\pi)^{-1/2}

are real and satisfy

    1 = \lambda_1 > \lambda_2 \ge \cdots \ge \lambda_d > -1, \qquad \gamma = 1 - \max\{\lambda_2, |\lambda_d|\}.

Plug-in estimator: estimate π and M from X_{1:t} (using empirical frequencies), then plug in to the formula for γ.
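
A minimal sketch of this plug-in estimator (editorial illustration: states are assumed to be encoded 0..d−1 and each visited at least once, and the explicit symmetrization is a pragmatic guard, since the empirical L̂ need not be exactly symmetric):

```python
import numpy as np

def plugin_gap_estimate(path, d):
    """Plug-in estimate of the spectral gap from a single sample path.
    Assumes every state appears at least once in the path."""
    # Empirical doublet frequencies M-hat and state frequencies pi-hat.
    M = np.zeros((d, d))
    for x, x2 in zip(path[:-1], path[1:]):
        M[x, x2] += 1.0
    M /= M.sum()
    pi = M.sum(axis=1)
    # L-hat = diag(pi)^{-1/2} M diag(pi)^{-1/2}; symmetrize against noise.
    Dinv = 1.0 / np.sqrt(pi)
    L = Dinv[:, None] * M * Dinv[None, :]
    L = 0.5 * (L + L.T)
    lams = np.sort(np.linalg.eigvalsh(L))[::-1]
    return 1.0 - max(lams[1], abs(lams[-1]))
```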

Slide 10: Chicken-and-egg problem

A (matrix) Chernoff bound for Markov chains gives error bounds for the estimates of π and M (and ultimately of L and γ): e.g., w.h.p.,

    |\hat{\gamma} - \gamma| \;\le\; \|\hat{L} - L\| \;\le\; O\!\left( \sqrt{\frac{\log(d) \log(t/\pi_\star)}{\gamma\, \pi_\star\, t}} \right).

This has an inverse dependence on γ, so the bound cannot be solved for γ (unlike with empirical Bernstein inequalities).

Slide 11: Direct estimation of P

Alternative: directly estimate P from X_{1:t}.

Key advantage: observable confidence intervals for P via an empirical Bernstein inequality for martingales.

Two problems:

1. Without appealing to the symmetry structure, one can argue that

    \|\hat{P} - P\| \le \varepsilon \;\implies\; |\hat{\gamma} - \gamma| \le O(\varepsilon^{1/(2d)}),

but this implies an exponential slow-down in the rate.

2. A direct appeal to the symmetry structure of L = \operatorname{diag}(\pi)^{1/2}\, P\, \operatorname{diag}(\pi)^{-1/2} gives bounds that depend on π_⋆, which is unknown.

Our approach: directly estimate P, and indirectly estimate π via ˆP (see the sketch below).
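
A sketch of the direct estimate (editorial illustration; `estimate_P` and the smoothing constant alpha are assumptions, with the Laplace smoothing anticipating step 1 of the next slide by making every entry of ˆP positive, hence ˆP ergodic):

```python
import numpy as np

def estimate_P(path, d, alpha=1.0):
    """Row-normalized transition counts with Laplace smoothing.
    Smoothing (alpha > 0) makes every entry positive, so P-hat is ergodic."""
    counts = np.full((d, d), alpha)
    for x, x2 in zip(path[:-1], path[1:]):
        counts[x, x2] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)
```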

Slide 12: Indirect estimation of π

1. Ensure that ˆP is the transition operator of an ergodic chain (easy via Laplace smoothing).

2. Key step: estimate π from ˆP via the group inverse Â^# of Â := I − ˆP.

   "Â^# contains virtually everything that one would want to know about the chain [with transition operator ˆP]" (Meyer, 1975).

   - It reveals the unique stationary distribution ˆπ of ˆP; this is our indirect estimate of π.
   - It tells us how to bound ‖ˆπ − π‖ in terms of ‖ˆP − P‖; from this, we construct a confidence interval for π_⋆ (see the sketch below).
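
A sketch of the group-inverse computation (editorial illustration, using the classical identity Â^# = (Â + 1ˆπᵀ)^{-1} − 1ˆπᵀ for an ergodic chain; here ˆπ is obtained by a direct linear solve, whereas the paper's analysis reads the relevant quantities off the group-inverse machinery itself):

```python
import numpy as np

def stationary(P):
    """Solve pi P = pi, sum(pi) = 1 as an overdetermined linear system."""
    d = P.shape[0]
    A = np.vstack([P.T - np.eye(d), np.ones((1, d))])
    b = np.zeros(d + 1); b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def group_inverse(P, pi):
    """Group inverse of A = I - P for an ergodic chain (Meyer, 1975):
    A# = (A + 1 pi^T)^{-1} - 1 pi^T."""
    d = P.shape[0]
    Pi = np.outer(np.ones(d), pi)
    return np.linalg.inv(np.eye(d) - P + Pi) - Pi
```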

Slide 13: Overall algorithm (outline)

1. Form an empirical estimate of, and confidence intervals for, P (exploiting the Markov property and empirical-Bernstein-type bounds; illustrated below).
2. Form an estimate of, and confidence intervals for, π (via the group inverse of I − ˆP).
3. Form an estimate of, and a confidence interval for, γ (via the confidence intervals for π and P, and eigenvalue perturbation theory).
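
To illustrate step 1, a sketch of an empirical-Bernstein interval for a single transition probability P(x, x'), using the iid form of Maurer & Pontil (2009) as a stand-in for the martingale version the algorithm actually relies on; `eb_interval` is a hypothetical helper, and in practice one would union-bound over all d² entries:

```python
import numpy as np

def eb_interval(indicators, delta):
    """Two-sided empirical-Bernstein interval (Maurer & Pontil, 2009) for the
    mean of iid [0,1] observations, e.g. indicators of the transition x -> x'
    among the visits to x. Uses log(4/delta) to cover both sides."""
    n = len(indicators)
    mean = float(np.mean(indicators))
    var = float(np.var(indicators, ddof=1))  # unbiased sample variance
    log_term = np.log(4.0 / delta)
    slack = np.sqrt(2.0 * var * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))
    return max(0.0, mean - slack), min(1.0, mean + slack)
```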

Slide 14: Recap and future work

We resolve the chicken-and-egg problem of obtaining observable confidence intervals for the mixing time from a single sample path, strongly exploiting the Markov property and ergodicity in the confidence intervals for P and π.

Problem #1: close the gap between the lower and upper bounds on sample path length (for point estimation).

Problem #2: overcome the computational bottlenecks from matrix operations.

Problem #3: handle large/continuous state spaces under suitable assumptions.

Thanks!
