Consistency of the maximum likelihood estimator for general hidden Markov models


1 Consistency of the maximum likelihood estimator for general hidden Markov models
Jimmy Olsson, Centre for Mathematical Sciences, Lund University
Nordstat 2012, Umeå, Sweden

2 Collaborators. This is joint work with Randal Douc (Télécom SudParis), Eric Moulines (Télécom ParisTech), and Ramon van Handel (Princeton University). Ann. Statist., Volume 39, Number 1 (2011).

3 Plan of the talk: 1. Hidden Markov models; 2.-4. [section titles not recovered]

4 We are here: outline slide (cf. slide 3).

5 Hidden Markov models (HMMs). An HMM comprises
- an unobservable Markov chain $X = (X_n)_{n \ge 0}$ on $(\mathsf{X}, \mathcal{X})$ with transition kernel $Q_\theta$ and initial distribution $\nu$, i.e. $X_0 \sim \nu$ and $X_{n+1} \mid X_n \sim Q_\theta(X_n, \cdot)$;
- observations $(Y_n)_{n \ge 0}$ in $(\mathsf{Y}, \mathcal{Y})$ that are conditionally independent given $X$ and such that $Y_n \mid X \overset{d}{=} Y_n \mid X_n \sim G_\theta(X_n, \cdot)$.
We assume that $G_\theta$ has a transition density $g_\theta$, i.e. $G_\theta(x, A) = \int_A g_\theta(x, y)\, \mu(dy)$, $A \in \mathcal{Y}$. Here $\theta$ is a parameter vector belonging to some compact metric space $\Theta$.
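
To make the definition concrete in the simplest setting, here is a minimal simulation sketch for a finite state space; the transition matrix, the Gaussian choice of emission density $g_\theta$, and all parameter values below are illustrative assumptions, not part of the talk.

```python
import numpy as np

def simulate_finite_hmm(n, Q, means, sd, nu, rng=None):
    """Simulate a toy finite-state HMM: the hidden chain moves according to
    the transition matrix Q (playing the role of Q_theta) and, given the
    states, the observations are conditionally independent with
    Y_k | X_k ~ N(means[X_k], sd^2) (a Gaussian stand-in for g_theta)."""
    rng = np.random.default_rng(rng)
    m = Q.shape[0]
    x = np.empty(n + 1, dtype=int)
    y = np.empty(n + 1)
    x[0] = rng.choice(m, p=nu)                   # X_0 ~ nu
    for k in range(n + 1):
        y[k] = rng.normal(means[x[k]], sd)       # Y_k | X_k ~ G_theta(X_k, .)
        if k < n:
            x[k + 1] = rng.choice(m, p=Q[x[k]])  # X_{k+1} | X_k ~ Q_theta(X_k, .)
    return x, y
```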

6 HMMs (cont.). Graphical representation: [figure: the chain $\cdots \to X_{n-1} \to X_n \to X_{n+1} \to \cdots$ (Markov chain), with each observation $Y_{n-1}, Y_n, Y_{n+1}$ attached to its own state]. In symbols: $Y_n \mid X_n \sim G_\theta(X_n, \cdot)$, $X_{n+1} \mid X_n \sim Q_\theta(X_n, \cdot)$, and $X_0 \sim \nu$.

7 Likelihood estimation in HMMs. Throughout this talk we fix a distinguished element $\theta^\star \in \Theta$, interpreted as the true parameter value. Having done this, we presume that $Q_{\theta^\star}$ possesses a unique invariant distribution $\pi_{\theta^\star}$, denote by $\mathbb{P}_{\theta^\star}$ the law of the stationary HMM under $\theta^\star$, and assume that we are given observations $Y_0^n = (Y_0, \ldots, Y_n)$ sampled from the distribution $\mathbb{P}_{\theta^\star}$. In this setting, we estimate $\theta^\star$ using the maximum likelihood estimator (MLE), i.e. the argument $\hat{\theta}_n$ maximizing $\Theta \ni \theta \mapsto p^\nu_\theta(Y_0^n)$, where $p^\nu_\theta(y_0^n)$ denotes the density of the measure $\mathbb{P}^\nu_\theta(Y_0^n \in \cdot)$.
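
When the state space is finite, $p^\nu_\theta(Y_0^n)$ can be evaluated exactly with the classical forward recursion, which accumulates the one-step predictive densities $p_\theta(y_k \mid y_0^{k-1})$. The sketch below assumes the Gaussian emissions of the previous snippet, and a simple grid search stands in for a proper optimizer.

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(y, Q, means, sd, nu):
    """Exact log p_theta^nu(y_0^n) for a finite-state HMM via the normalized
    forward recursion: Q is the transition matrix and (means, sd) parametrize
    Gaussian emission densities g_theta."""
    alpha = nu * norm.pdf(y[0], means, sd)         # unnormalized filter at time 0
    ll = np.log(alpha.sum())                       # log p(y_0)
    alpha /= alpha.sum()
    for obs in y[1:]:
        alpha = (alpha @ Q) * norm.pdf(obs, means, sd)
        ll += np.log(alpha.sum())                  # log p(y_k | y_0^{k-1})
        alpha /= alpha.sum()                       # renormalize for stability
    return ll

# Toy MLE for a one-dimensional parametrization means = (-theta, theta):
# Q = np.array([[0.9, 0.1], [0.2, 0.8]]); nu = np.array([2/3, 1/3])
# _, y = simulate_finite_hmm(1_000, Q, np.array([-1.0, 1.0]), 1.0, nu, rng=0)
# grid = np.linspace(0.1, 3.0, 60)
# theta_hat = grid[np.argmax([log_likelihood(y, Q, np.array([-t, t]), 1.0, nu)
#                             for t in grid])]
```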

8 Consistency of the MLE. Main problem: we want to show that the MLE is strongly consistent in the sense that $\hat{\theta}_n \to \theta^\star$, $\mathbb{P}_{\theta^\star}$-a.s., as $n \to \infty$. For HMMs, this problem has been addressed in a long series of papers, where the most significant quantum leaps are:
X       | Y       | Contributors
finite  | finite  | Baum & Petrie (1966)
finite  | general | Leroux (1992)
compact | general | Douc & Matias (2001)

9 We are here: outline slide (cf. slide 3).

10 Common thread of previous contributions:
(1) Show that for all $\theta \in \Theta$ there is a constant $H(\theta, \theta^\star)$, the asymptotic contrast, such that, $\mathbb{P}_{\theta^\star}$-a.s.,
$\lim_{n} n^{-1} \log p^\nu_\theta(Y_0^n) = \lim_{n} n^{-1} \bar{\mathbb{E}}_{\theta^\star}\bigl[\log p^\nu_\theta(Y_0^n)\bigr] = H(\theta, \theta^\star).$
(2) Establish identifiability, i.e. that the relative entropy $E(\theta, \theta^\star) \triangleq H(\theta^\star, \theta^\star) - H(\theta, \theta^\star)$ is minimized only at $\theta = \theta^\star$.
(3) Prove that $\hat{\theta}_n$ converges (a.s.) to the maximizer of $\theta \mapsto H(\theta, \theta^\star)$.
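
To see why $E(\theta, \theta^\star)$ deserves the name relative entropy, note that (granting the limits in step (1); this is a sketch in the talk's notation) it is the limiting normalized KL divergence between the observation laws:

```latex
% E(theta, theta*) as a limiting normalized KL divergence (sketch)
\frac{1}{n}\operatorname{KL}\!\left(\mathbb{P}_{\theta^\star}(Y_0^n \in \cdot)\,\middle\|\,\mathbb{P}^\nu_\theta(Y_0^n \in \cdot)\right)
  = \frac{1}{n}\,\bar{\mathbb{E}}_{\theta^\star}\!\left[\log p_{\theta^\star}(Y_0^n) - \log p^\nu_\theta(Y_0^n)\right]
  \xrightarrow[n\to\infty]{} H(\theta^\star, \theta^\star) - H(\theta, \theta^\star)
  = E(\theta, \theta^\star).
```

In particular $E(\theta, \theta^\star) \ge 0$ always, so identifiability amounts to strict positivity for $\theta \neq \theta^\star$.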

11 In the literature, establishing the existence of $H(\theta, \theta^\star)$ has (for $\theta \neq \theta^\star$) required very restrictive assumptions that are rarely satisfied in practice (particularly in the non-compact case), such as the following.
Assumption (global Doeblin condition): there are constants $0 < \epsilon^- \le \epsilon^+$ and a probability measure $\eta$ on $(\mathsf{X}, \mathcal{X})$ such that for all $x \in \mathsf{X}$ and $A \in \mathcal{X}$,
$\epsilon^- \eta(A) \le Q(x, A) \le \epsilon^+ \eta(A).$
Thus, a general consistency result has hitherto been lacking.

12 We are here: outline slide (cf. slide 3).

13 Exponential separability.
Definition (exponential separability): for each $n$, let $Q_n$ and $P_n$ be probability measures on some measurable space $(\mathsf{Z}_n, \mathcal{Z}_n)$. Then $(Q_n)$ is exponentially separated from $(P_n)$, denoted $(Q_n) \lhd (P_n)$, if there exists a sequence $(A_n)$ of sets $A_n \in \mathcal{Z}_n$ such that
$\liminf_{n} P_n(A_n) > 0 \quad \text{and} \quad \limsup_{n} n^{-1} \log Q_n(A_n) < 0.$
Using standard properties of the Kullback-Leibler (KL) divergence one may prove the following.
Theorem: if $(Q_n) \lhd (P_n)$, then $\liminf_{n} n^{-1} \operatorname{KL}(P_n \,\|\, Q_n) > 0$.
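
The proof of the theorem is short; here is a sketch using two standard facts: the KL divergence can only decrease when both measures are restricted to the partition $\{A_n, A_n^c\}$, and the binary entropy term is at most $\log 2$.

```latex
% Sketch: data processing over the partition {A_n, A_n^c}, then bound the
% binary entropy -p log p - (1-p) log(1-p) by log 2 and drop a positive term.
\operatorname{KL}(P_n \,\|\, Q_n)
  \;\ge\; P_n(A_n) \log\frac{P_n(A_n)}{Q_n(A_n)}
        + \bigl(1 - P_n(A_n)\bigr) \log\frac{1 - P_n(A_n)}{1 - Q_n(A_n)}
  \;\ge\; P_n(A_n) \log\frac{1}{Q_n(A_n)} - \log 2 .
```

If eventually $P_n(A_n) \ge \delta > 0$ and $Q_n(A_n) \le e^{-cn}$, the right-hand side is at least $\delta c n - \log 2$, whence $\liminf_n n^{-1} \operatorname{KL}(P_n \,\|\, Q_n) \ge \delta c > 0$.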

14 Main result. Assumptions:
(1) The Markov kernel $Q_{\theta^\star}$ is positive Harris recurrent.
(2) For some integer $l \ge 1$, each $Q^l_\theta$ has a bounded density $q_\theta$ with respect to some $\sigma$-finite measure $\lambda$.
(3) For every $\theta \neq \theta^\star$, $(\mathbb{P}^\nu_\theta(Y_0^n \in \cdot)) \lhd (\mathbb{P}_{\theta^\star}(Y_0^n \in \cdot))$.
(4) ...
Theorem: under the assumptions above, the MLE is strongly consistent.

15 Verifying separability. The assumption (3) of exponential separability is nontrivial. However, assume that for all $\theta \neq \theta^\star$, (i) $Q_\theta$ is ergodic and (ii) $\mathbb{P}^Y_\theta \neq \mathbb{P}^Y_{\theta^\star}$. By (ii) there are $s < \infty$ and $h : \mathsf{Y}^{s+1} \to \mathbb{R}$ such that
$\bar{\mathbb{E}}_\theta\bigl(h(Y_0^s)\bigr) = 0 \quad \text{and} \quad \bar{\mathbb{E}}_{\theta^\star}\bigl(h(Y_0^s)\bigr) = 1.$
Then for the sets
$A_n \triangleq \Bigl\{ y_0^n \in \mathsf{Y}^{n+1} : \frac{1}{n-s} \sum_{i=0}^{n-s} h\bigl(y_i^{i+s}\bigr) > \frac{1}{2} \Bigr\}$
it holds, by (i), that $\mathbb{P}_{\theta^\star}(A_n) \to 1$ and $\mathbb{P}^\nu_\theta(A_n) \to 0$ as $n \to \infty$.
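
A toy numerical check of this construction (an illustration under simplifying assumptions, not the talk's setting): take window length $s = 0$ and $h(y) = y$, with i.i.d. observations distributed as $N(0, 1)$ under $\theta$ and as $N(1, 1)$ under $\theta^\star$, so that the two expectations above are $0$ and $1$. The sample mean of $h$ then separates the two laws at an exponential rate, computable here in closed form from the Gaussian tail.

```python
import numpy as np
from scipy.stats import norm

# Separating sets A_n = { n^{-1} sum_i h(y_i) > 1/2 } in a toy i.i.d. case:
# h(y) = y, with Y_i ~ N(0, 1) under theta and Y_i ~ N(1, 1) under theta*.
# The sample mean is N(mu, 1/n), so both probabilities are exact Gaussian tails.
for n in [10, 50, 100, 500]:
    p_star = norm.sf(0.5, loc=1.0, scale=1.0 / np.sqrt(n))    # P_theta*(A_n) -> 1
    log_q = norm.logsf(0.5, loc=0.0, scale=1.0 / np.sqrt(n))  # log P_theta(A_n)
    print(n, f"{p_star:.4f}", f"{log_q / n:.4f}")  # last column -> -1/8 < 0
```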

16 A useful deviation inequality. Exponential separability follows if we can show that $\mathbb{P}^\nu_\theta(A_n) \to 0$ at an exponential rate. We thus provide the following result.
Proposition (Azuma-Hoeffding inequality for Markov chains): assume that $Q$ is $V$-uniformly ergodic with invariant distribution $\pi$. Then there is a constant $c > 0$ such that for all $t > 0$, all initial distributions $\eta$, and all bounded functions $h : \mathsf{X} \to \mathbb{R}$,
$\mathbb{P}_\eta\Bigl( \Bigl| \frac{1}{n} \sum_{i=1}^{n} h(X_i) - \pi(h) \Bigr| \ge t \Bigr) \le c\, \eta(V) \exp\Bigl( -\frac{n}{c} \Bigl( \frac{t^2}{\|h\|^2} \wedge \frac{t}{\|h\|} \Bigr) \Bigr).$
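
A quick Monte Carlo sanity check of the exponential rate (the AR(1) kernel, the bounded test function $\tanh$, and all constants below are illustrative assumptions): for a $V$-uniformly ergodic chain, $n^{-1} \log$ of the deviation probability should settle at a strictly negative value as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
a, t = 0.5, 0.25
h = np.tanh  # bounded test function; pi(h) = 0 since the stationary law is symmetric

def deviation_prob(n, reps=100_000):
    """Estimate P_eta(|n^{-1} sum_{i=1}^n h(X_i) - pi(h)| >= t) for the AR(1)
    chain X_{k+1} = a X_k + xi_{k+1} (V-uniformly ergodic when |a| < 1),
    started from its stationary distribution N(0, 1 / (1 - a^2))."""
    x = rng.normal(0.0, 1.0 / np.sqrt(1 - a**2), size=reps)
    s = np.zeros(reps)
    for _ in range(n):
        x = a * x + rng.normal(size=reps)
        s += h(x)
    return np.mean(np.abs(s / n) >= t)

for n in [25, 50, 100, 200]:
    p = deviation_prob(n)
    print(n, p, np.log(max(p, 1e-300)) / n)  # last column stabilizes below 0
```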

17 Verifying separability (cont.). Write the quantity of interest as a telescoping sum of the form
$\sum_{i=1}^{n-s} h\bigl(Y_i^{i+s}\bigr) = \sum_{j=0}^{s} \sum_{i} \xi_{i,j} + \sum_{i=1}^{n-s} \underbrace{\mathbb{E}^\nu_\theta\bigl( h(Y_i^{i+s}) \mid X_0^{i-1}, Y_0^{i-1} \bigr)}_{H(X_i)},$
where $(\xi_{i,j})_i$ is a martingale increment sequence for each $j$. Separability is then verified by combining the inequality above (applied to the sum of the $H(X_i)$) with the standard Azuma-Hoeffding inequality for martingale increment sequences (applied to each $\sum_i \xi_{i,j}$). Thus, $V$-uniform ergodicity $\Rightarrow$ exponential separability!

18 Some examples. Our assumptions can be straightforwardly verified for, e.g., very general classes of HMMs with finite state space and nonlinear HMMs with possibly non-compact state space, e.g. the popular stochastic volatility model
$X_{k+1} = \alpha X_k + \sigma \epsilon_{k+1}, \qquad Y_k = \beta \exp(X_k / 2)\, \varepsilon_k,$
where $(\epsilon_k)$ and $(\varepsilon_k)$ are sequences of i.i.d. Gaussian variables and $\theta = (\alpha, \beta, \sigma)$ are unknown parameters with $|\alpha| < 1$ (Taylor, 1982).
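
For concreteness, a minimal simulation sketch of this model; the parameter values are illustrative assumptions. Unlike the finite-state example above, $p^\nu_\theta(Y_0^n)$ has no closed form here, which is exactly the regime the general consistency result is aimed at.

```python
import numpy as np

def simulate_sv(n, alpha=0.95, sigma=0.25, beta=0.6, rng=None):
    """Simulate the stochastic volatility model of Taylor (1982):
    X_{k+1} = alpha X_k + sigma eps_{k+1}, Y_k = beta exp(X_k / 2) eps'_k,
    with independent standard Gaussian noise sequences and |alpha| < 1."""
    rng = np.random.default_rng(rng)
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma / np.sqrt(1 - alpha**2))  # stationary start
    for k in range(n - 1):
        x[k + 1] = alpha * x[k] + sigma * rng.normal()
    y = beta * np.exp(x / 2) * rng.normal(size=n)          # observed series
    return x, y

# x, y = simulate_sv(2_000, rng=0)
```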

19 We are here: outline slide (cf. slide 3).

20 Our approach: main ideas. Instead of proving (1), i.e. that for all $\theta \in \Theta$,
$\lim_{n} n^{-1} \log p^\nu_\theta(Y_0^n) = H(\theta, \theta^\star) \quad \text{(a.s.)},$
we note that it is enough to establish
1. the limit $\lim_{n} n^{-1} \log p^\nu_{\theta^\star}(Y_0^n) = H(\theta^\star, \theta^\star)$ (a.s.), which is easily obtained using the Shannon-Breiman-McMillan (SBM) theorem, and
2. the following bound: for all closed sets $C \subset \Theta$ with $\theta^\star \notin C$,
$\limsup_{n} n^{-1} \sup_{\theta \in C} \log p^\nu_\theta(Y_0^n) < H(\theta^\star, \theta^\star) \quad \text{(a.s.)}.$

21 Our approach: a blocking technique. To establish the bound, consider blocks of observations of length $l$ and bound (roughly) the likelihood according to
$\log p^\nu_\theta(Y_0^n) \le \sum_{k=0}^{\lfloor n/l \rfloor - 1} \log p^\lambda_\theta\bigl(Y_{kl}^{kl+l}\bigr) + o_{\mathrm{a.s.}}(n).$
This implies directly, via Birkhoff's ergodic theorem,
$\limsup_{n} n^{-1} \log p^\nu_\theta(Y_0^n) \le \bar{\mathbb{E}}_{\theta^\star}\bigl( l^{-1} \log p^\lambda_\theta(Y_0^l) \bigr) \quad \text{(a.s.)}.$

22 Our approach: identifiability. To complete the proof we use the assumed exponential separability and its connection to the KL divergence: for all $\theta \neq \theta^\star$,
$\bigl(\mathbb{P}^\nu_\theta(Y_0^n \in \cdot)\bigr) \lhd \bigl(\mathbb{P}_{\theta^\star}(Y_0^n \in \cdot)\bigr)$
$\Rightarrow \liminf_{n} \bar{\mathbb{E}}_{\theta^\star}\Bigl( n^{-1} \log \frac{p_{\theta^\star}(Y_0^n)}{p^\lambda_\theta(Y_0^n)} \Bigr) > 0$
$\Rightarrow \limsup_{n} \bar{\mathbb{E}}_{\theta^\star}\bigl( n^{-1} \log p^\lambda_\theta(Y_0^n) \bigr) < H(\theta^\star, \theta^\star)$
$\Rightarrow$ Lemma: for all $\theta \neq \theta^\star$ there exists $l_\theta$ such that $\bar{\mathbb{E}}_{\theta^\star}\bigl( l_\theta^{-1} \log p^\lambda_\theta(Y_0^{l_\theta}) \bigr) < H(\theta^\star, \theta^\star)$.

23 Our approach: identifiability (cont.). Combining this with the blocking bound of the previous slide (applied with block length $l_\theta$) gives
$\limsup_{n} n^{-1} \log p^\nu_\theta(Y_0^n) < H(\theta^\star, \theta^\star) \quad \text{(a.s.)}$
and, after some more work using that the parameter space $\Theta$ is compact, for all closed sets $C \subset \Theta$ with $\theta^\star \notin C$,
$\limsup_{n} n^{-1} \sup_{\theta \in C} \log p^\nu_\theta(Y_0^n) < H(\theta^\star, \theta^\star) \quad \text{(a.s.)}.$
This completes the proof of consistency.

24 Summary. We have
- introduced HMMs and the problem of maximum likelihood-based inference in such models;
- proved that the MLE is strongly consistent under (what we believe to be) minimal assumptions;
- discussed an information-theoretic device, exponential separability, which is used efficiently in our proof to establish identifiability;
- shown how exponential separability can be verified using a novel Azuma-Hoeffding-type inequality for $V$-uniformly ergodic Markov chains, a result of independent interest.
