Confidence intervals for the mixing time of a reversible Markov chain from a single sample path
Daniel Hsu (Columbia University), Aryeh Kontorovich (Ben-Gurion University), Csaba Szepesvári (University of Alberta). ITA.
Problem
Consider an irreducible, aperiodic, time-homogeneous Markov chain X_1, X_2, X_3, ... on a finite state space X. There is a unique stationary distribution π with
    lim_{t → ∞} L(X_t | X_1 = x) = π for all x ∈ X.
The mixing time t_mix is the earliest time t with
    sup_{x ∈ X} ‖L(X_t | X_1 = x) − π‖_tv ≤ 1/4.
Problem (informal): determine, confidently, whether t ≥ t_mix after seeing X_1, X_2, ..., X_t.
Problem (formal): given δ ∈ (0, 1) and X_{1:t}, determine a non-trivial interval I_t ⊆ [0, ∞] with P(t_mix ∈ I_t) ≥ 1 − δ.
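For a small chain with a known transition matrix, t_mix can be computed by brute force directly from the definition, which makes it concrete. A minimal sketch (not from the talk; the function name and the two-state example chain are our own illustrations):

```python
import numpy as np

def mixing_time(P, tol=0.25, t_max=10_000):
    """Brute-force t_mix for a small chain with transition matrix P:
    the earliest t such that max_x ||P^t(x, .) - pi||_tv <= tol."""
    d = P.shape[0]
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1))])
    pi /= pi.sum()
    Pt = np.eye(d)
    for t in range(1, t_max + 1):
        Pt = Pt @ P
        # Worst-case total variation distance over starting states.
        tv = 0.5 * np.abs(Pt - pi).sum(axis=1).max()
        if tv <= tol:
            return t
    return None

# A lazy, slightly asymmetric two-state chain.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(mixing_time(P))
```

This exhaustive computation requires knowing P exactly; the point of the talk is what can be done from a single observed trajectory.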
Some motivation from machine learning and statistics
Chernoff bounds for Markov chains X_1, X_2, ...: for suitably well-behaved f : X → R, with probability at least 1 − δ,
    | (1/t) Σ_{i=1}^t f(X_i) − E_π f | ≤ Õ( √( t_mix log(1/δ) / t ) )   (a deviation bound).
The bound depends on t_mix, which may be unknown a priori.
Examples:
- Bayesian inference: posterior means and variances via MCMC.
- Reinforcement learning: mean action rewards in an MDP.
- Supervised learning: error rates of hypotheses from non-iid data.
We therefore need observable deviation bounds.
Observable deviation bounds from mixing time bounds?
Suppose an estimator t̂_mix = t̂_mix(X_{1:t}) of t_mix satisfies
    P(t_mix ≤ t̂_mix + ε_t) ≥ 1 − δ.
Then with probability at least 1 − 2δ,
    | (1/t) Σ_{i=1}^t f(X_i) − E_π f | ≤ Õ( √( (t̂_mix + ε_t) log(1/δ) / t ) ).
But t̂_mix is computed from X_{1:t}, so ε_t may itself depend on t_mix. Deviation bounds for point estimators are therefore insufficient: we need (observable) confidence intervals for t_mix.
What we do
1. Shift focus to the relaxation time t_relax, to enable spectral methods.
2. Prove lower and upper bounds on the sample path length needed for point estimation of t_relax.
3. Give a new algorithm for constructing confidence intervals for t_relax.
Relaxation time
Let P be the transition operator of the Markov chain, and let λ* be its second-largest eigenvalue modulus (i.e., the largest eigenvalue modulus other than 1).
Spectral gap: γ* := 1 − λ*. Relaxation time: t_relax := 1/γ*. The relaxation time controls the mixing time:
    (t_relax − 1) ln 2 ≤ t_mix ≤ t_relax ln(4/π*),   where π* := min_{x ∈ X} π(x).
The assumptions on P ensure γ*, π* ∈ (0, 1).
Spectral approach: construct confidence intervals for γ* and π*.
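When P is known, these spectral quantities and the resulting sandwich bound on t_mix are directly computable, which gives a quick sanity check. A sketch, assuming NumPy and a reversible P (so the spectrum is real); the function name is our own:

```python
import numpy as np

def relaxation_bounds(P):
    """Spectral gap, relaxation time, and the implied two-sided bounds
    (t_relax - 1) ln 2 <= t_mix <= t_relax ln(4 / pi_min)
    for a reversible transition matrix P."""
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1))])
    pi /= pi.sum()
    # Second-largest eigenvalue modulus.
    evals = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    lam = evals[1]
    gamma = 1.0 - lam          # spectral gap
    t_relax = 1.0 / gamma      # relaxation time
    lower = (t_relax - 1.0) * np.log(2.0)
    upper = t_relax * np.log(4.0 / pi.min())
    return gamma, t_relax, (lower, upper)
```

For the two-state chain [[0.9, 0.1], [0.2, 0.8]] this gives γ* = 0.3 and t_relax ≈ 3.33, and the returned interval brackets the brute-force mixing time.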
Our results (point estimation)
We restrict to reversible Markov chains on finite state spaces. Let d be the (known a priori) cardinality of the state space X.
1. Lower bound: to estimate γ* within a constant multiplicative factor, every algorithm needs (with probability 1/4) a sample path of length
    Ω( d log d / γ* + 1/π* ).
2. Upper bound: a simple algorithm estimates γ* and π* within a constant multiplicative factor (w.h.p.) with a sample path of length
    Õ( log d / (π* γ*^3) ) for γ*,   Õ( log d / (π* γ*) ) for π*.
But a point estimator is not a confidence interval.
Our results (confidence intervals)
3. New algorithm: given δ ∈ (0, 1) and X_{1:t} as input, it constructs intervals I_t^γ and I_t^π such that
    P(γ* ∈ I_t^γ) ≥ 1 − δ   and   P(π* ∈ I_t^π) ≥ 1 − δ.
The widths of the intervals converge a.s. to zero at a √(log log t / t) rate.
4. Hybrid approach: use the new algorithm to turn error bounds for point estimators into observable confidence intervals. (This improves the asymptotic rate for the π* interval.)
Plug-in estimator
Reversibility grants the symmetry of
    M := diag(π) P = { P_{X_1 ∼ π}(X_1 = x, X_2 = x′) }_{x, x′ ∈ X}
(the doublet state probabilities in the stationary chain). Moreover, the eigenvalues of
    L := diag(π)^{−1/2} M diag(π)^{−1/2}
are real and satisfy
    1 = λ_1 > λ_2 ≥ ... ≥ λ_d > −1,   γ* = 1 − max{λ_2, |λ_d|}.
Plug-in estimator: estimate π and M from X_{1:t} (using empirical frequencies), then plug in to the formula for γ*.
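The plug-in estimator can be sketched in a few lines: count doublet frequencies, symmetrize, and read off the spectrum. A sketch under our own naming, with a simulated path from an illustrative two-state chain whose true gap is 0.3:

```python
import numpy as np

def plugin_gap_estimate(path, d):
    """Plug-in estimate of the spectral gap from one sample path:
    empirical doublet frequencies -> symmetrized L -> eigenvalues."""
    path = np.asarray(path)
    # Empirical doublet probabilities M[x, x'] ~ P(X_i = x, X_{i+1} = x').
    M = np.zeros((d, d))
    for a, b in zip(path[:-1], path[1:]):
        M[a, b] += 1.0
    M /= len(path) - 1
    pi_hat = M.sum(axis=1)                     # empirical state frequencies
    Dinv = np.diag(1.0 / np.sqrt(np.maximum(pi_hat, 1e-12)))
    L = Dinv @ M @ Dinv
    L = 0.5 * (L + L.T)                        # enforce exact symmetry
    lam = np.sort(np.linalg.eigvalsh(L))[::-1]
    return 1.0 - max(lam[1], abs(lam[-1]))     # gamma-hat

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1], [0.2, 0.8]])         # true spectral gap: 0.3
x, path = 0, [0]
for _ in range(100_000):
    x = rng.choice(2, p=P[x])
    path.append(x)
print(plugin_gap_estimate(path, 2))
```

With a long enough path the estimate is close to the true gap; the difficulty discussed next is certifying, from the data alone, how close.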
Chicken-and-egg problem
A (matrix) Chernoff bound for Markov chains gives error bounds for the estimates of π and M (and ultimately of L and γ*): e.g., w.h.p.,
    |γ̂ − γ*| ≤ ‖L̂ − L‖ ≤ O( √( log(d) log(t/π*) / (γ* π* t) ) ).
This bound has an inverse dependence on γ*, so we can't solve the bound for γ* (unlike empirical Bernstein inequalities).
Direct estimation of P
Alternative: directly estimate P from X_{1:t}.
Key advantage: observable confidence intervals for P via an empirical Bernstein inequality for martingales.
Two problems:
1. Without appealing to the symmetry structure, one can argue that ‖P̂ − P‖ ≤ ε implies |γ̂ − γ*| ≤ O(ε^{1/(2d)}), but this implies an exponential slow-down in the rate.
2. A direct appeal to the symmetry structure of L = diag(π)^{1/2} P diag(π)^{−1/2} gives bounds that depend on π*, which is unknown.
Our approach: directly estimate P, and indirectly estimate π via P̂.
Indirect estimation of π
1. We ensure that P̂ is the transition operator of an ergodic chain (easy via Laplace smoothing).
2. Key step: estimate π from P̂ via the group inverse Â# of Â := I − P̂.
"Â# contains virtually everything that one would want to know about the chain [with transition operator P̂]" (Meyer, 1975). In particular:
- It reveals the unique stationary distribution π̂ of P̂. This is our indirect estimate of π.
- It tells us how to bound ‖π̂ − π‖ in terms of ‖P̂ − P‖. Hence, from this, we construct a confidence interval for π*.
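Both steps can be made concrete. The sketch below (our own function names) forms the Laplace-smoothed estimate P̂, then builds the group inverse via Meyer's identity Â# = (Â + 1π̂ᵀ)^{−1} − 1π̂ᵀ, valid for ergodic chains; the projector I − Â Â# then has every row equal to π̂, which is the sense in which the group inverse "reveals" the stationary distribution:

```python
import numpy as np

def smoothed_transition_estimate(path, d, alpha=1.0):
    """Empirical transition matrix with Laplace smoothing: every entry
    gets pseudo-count alpha, so the estimated chain is ergodic."""
    counts = np.full((d, d), float(alpha))
    for a, b in zip(path[:-1], path[1:]):
        counts[a, b] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def group_inverse_and_pi(P_hat):
    """Group inverse A# of A = I - P_hat for an ergodic chain, plus the
    stationary distribution read off the projector I - A A#."""
    d = P_hat.shape[0]
    A = np.eye(d) - P_hat
    # Stationary distribution of P_hat (left eigenvector for eigenvalue 1).
    w, V = np.linalg.eig(P_hat.T)
    pi_hat = np.real(V[:, np.argmin(np.abs(w - 1))])
    pi_hat /= pi_hat.sum()
    W = np.outer(np.ones(d), pi_hat)        # limiting matrix 1 pi^T
    A_sharp = np.linalg.inv(A + W) - W      # Meyer's identity
    # I - A A# equals 1 pi^T: every row of this projector is pi_hat.
    proj = np.eye(d) - A @ A_sharp
    return A_sharp, proj[0]
```

In the paper's algorithm, Â# also supplies the perturbation constants needed to turn a confidence interval for P into one for π; we only sketch the stationary-distribution step here.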
Overall algorithm (outline)
1. Form an empirical estimate of, and confidence intervals for, P (exploiting the Markov property and empirical-Bernstein-type bounds).
2. Form an estimate of, and confidence intervals for, π (via the group inverse of I − P̂).
3. Form an estimate of, and a confidence interval for, γ* (via the confidence intervals for π and P, and eigenvalue perturbation theory).
Recap and future work
We resolve the chicken-and-egg problem of obtaining observable confidence intervals for the mixing time from a single sample path, strongly exploiting the Markov property and ergodicity in the confidence intervals for P and π.
Problem #1: close the gap between the lower and upper bounds on sample path length (for point estimation).
Problem #2: overcome the computational bottlenecks from matrix operations.
Problem #3: handle large/continuous state spaces under suitable assumptions.
Thanks!
More informationErgodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.
Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions
More informationGraph Partitioning Using Random Walks
Graph Partitioning Using Random Walks A Convex Optimization Perspective Lorenzo Orecchia Computer Science Why Spectral Algorithms for Graph Problems in practice? Simple to implement Can exploit very efficient
More informationBasics of reinforcement learning
Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system
More informationSubsampling, Concentration and Multi-armed bandits
Subsampling, Concentration and Multi-armed bandits Odalric-Ambrym Maillard, R. Bardenet, S. Mannor, A. Baransi, N. Galichet, J. Pineau, A. Durand Toulouse, November 09, 2015 O-A. Maillard Subsampling and
More informationThe Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.
Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface
More informationCS-E3210 Machine Learning: Basic Principles
CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction
More informationStochastic process. X, a series of random variables indexed by t
Stochastic process X, a series of random variables indexed by t X={X(t), t 0} is a continuous time stochastic process X={X(t), t=0,1, } is a discrete time stochastic process X(t) is the state at time t,
More informationMixing times via super-fast coupling
s via super-fast coupling Yevgeniy Kovchegov Department of Mathematics Oregon State University Outline 1 About 2 Definition Coupling Result. About Outline 1 About 2 Definition Coupling Result. About Coauthor
More informationTheory and Applications of Stochastic Systems Lecture Exponential Martingale for Random Walk
Instructor: Victor F. Araman December 4, 2003 Theory and Applications of Stochastic Systems Lecture 0 B60.432.0 Exponential Martingale for Random Walk Let (S n : n 0) be a random walk with i.i.d. increments
More informationGeometric ergodicity of the Bayesian lasso
Geometric ergodicity of the Bayesian lasso Kshiti Khare and James P. Hobert Department of Statistics University of Florida June 3 Abstract Consider the standard linear model y = X +, where the components
More informationWeb Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.
Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we
More informationOnline Learning Schemes for Power Allocation in Energy Harvesting Communications
Online Learning Schemes for Power Allocation in Energy Harvesting Communications Pranav Sakulkar and Bhaskar Krishnamachari Ming Hsieh Department of Electrical Engineering Viterbi School of Engineering
More informationVCMC: Variational Consensus Monte Carlo
VCMC: Variational Consensus Monte Carlo Maxim Rabinovich, Elaine Angelino, Michael I. Jordan Berkeley Vision and Learning Center September 22, 2015 probabilistic models! sky fog bridge water grass object
More informationLecture 10. Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point in M.
Lecture 10 1 Ergodic decomposition of invariant measures Let T : (Ω, F) (Ω, F) be measurable, and let M denote the space of T -invariant probability measures on (Ω, F). Then M is a convex set, although
More informationUniversity of Toronto Department of Statistics
Norm Comparisons for Data Augmentation by James P. Hobert Department of Statistics University of Florida and Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0704
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationThe sample complexity of agnostic learning with deterministic labels
The sample complexity of agnostic learning with deterministic labels Shai Ben-David Cheriton School of Computer Science University of Waterloo Waterloo, ON, N2L 3G CANADA shai@uwaterloo.ca Ruth Urner College
More informationIntroduction to Reinforcement Learning Part 1: Markov Decision Processes
Introduction to Reinforcement Learning Part 1: Markov Decision Processes Rowan McAllister Reinforcement Learning Reading Group 8 April 2015 Note I ve created these slides whilst following Algorithms for
More informationDisjointness and Additivity
Midterm 2: Format Midterm 2 Review CS70 Summer 2016 - Lecture 6D David Dinh 28 July 2016 UC Berkeley 8 questions, 190 points, 110 minutes (same as MT1). Two pages (one double-sided sheet) of handwritten
More informationLong-Run Covariability
Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips
More informationLect4: Exact Sampling Techniques and MCMC Convergence Analysis
Lect4: Exact Sampling Techniques and MCMC Convergence Analysis. Exact sampling. Convergence analysis of MCMC. First-hit time analysis for MCMC--ways to analyze the proposals. Outline of the Module Definitions
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationXVI International Congress on Mathematical Physics
Aug 2009 XVI International Congress on Mathematical Physics Underlying geometry: finite graph G=(V,E ). Set of possible configurations: V (assignments of +/- spins to the vertices). Probability of a configuration
More informationApproximate Bayesian Computation and Particle Filters
Approximate Bayesian Computation and Particle Filters Dennis Prangle Reading University 5th February 2014 Introduction Talk is mostly a literature review A few comments on my own ongoing research See Jasra
More informationCSE 473: Artificial Intelligence Autumn Topics
CSE 473: Artificial Intelligence Autumn 2014 Bayesian Networks Learning II Dan Weld Slides adapted from Jack Breese, Dan Klein, Daphne Koller, Stuart Russell, Andrew Moore & Luke Zettlemoyer 1 473 Topics
More informationOn the Complexity of Best Arm Identification in Multi-Armed Bandit Models
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models Aurélien Garivier Institut de Mathématiques de Toulouse Information Theory, Learning and Big Data Simons Institute, Berkeley, March
More informationLEARNING DYNAMIC SYSTEMS: MARKOV MODELS
LEARNING DYNAMIC SYSTEMS: MARKOV MODELS Markov Process and Markov Chains Hidden Markov Models Kalman Filters Types of dynamic systems Problem of future state prediction Predictability Observability Easily
More informationLecture 8: The Metropolis-Hastings Algorithm
30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:
More informationThe Metropolis-Hastings Algorithm. June 8, 2012
The Metropolis-Hastings Algorithm June 8, 22 The Plan. Understand what a simulated distribution is 2. Understand why the Metropolis-Hastings algorithm works 3. Learn how to apply the Metropolis-Hastings
More informationTWO PROBLEMS IN NETWORK PROBING
TWO PROBLEMS IN NETWORK PROBING DARRYL VEITCH The University of Melbourne 1 Temporal Loss and Delay Tomography 2 Optimal Probing in Convex Networks Paris Networking 27 Juin 2007 TEMPORAL LOSS AND DELAY
More information