Consistency of the maximum likelihood estimator for general hidden Markov models
|
|
- Abel Robertson
- 5 years ago
- Views:
Transcription
1 Consistency of the maximum likelihood estimator for general hidden Markov models Jimmy Olsson Centre for Mathematical Sciences Lund University Nordstat 2012 Umeå, Sweden
2 Collaborators Hidden Markov models This is a joint work with Randal Douc, Télécom SudParis, Eric Moulines, Télécom ParisTech, Ramon van Handel, Princeton University. Ann. Statist. Volume 39, Number 1 (2011),
3 Plan of the talk 1 Hidden Markov models 2 3 4
4 We are here 1 Hidden Markov models 2 3 4
5 Hidden Markov models (HMMs) An HMM comprises an unobservable Markov chain X (X n ) n 0 on (X, X ) with transition kernel Q θ and initial distribution ν, i.e. X 0 ν and X n+1 X n Q θ (X n, ). observations (Y n ) n 0 in (Y, Y) being conditionally independent given X such that Y n X = d. Y n X n G θ (X n, ). We assume that G θ has a transition density g θ, i.e. G θ (x, A) = g θ (x, y) µ(dy), A Y. A Here θ is a parameter vector belonging to some compact metric space Θ.
6 HMMs (cont.) Hidden Markov models Graphical representation: Y n 1 Y n Y n+1 (Observations)... X n 1 X n X n+1... (Markov chain) Y n X n G θ (X n, ) X n+1 X n Q θ (X n, ) X 0 ν
7 Likelihood estimation in HMMs Throughout this talk we fix a distinguished element θ Θ, interpreted as the true parameter value. Having done this, we presume that Q θ possesses a unique invariant distribution π θ, denote by P θ the law of the stationary HMM under θ, and assume that we are given observations Y n 0 = (Y 0,... Y n ) sampled from the distribution P θ. In this setting, we estimate θ using the maximum likelihood estimator (MLE), i.e. the argument θ n maximizing Θ θ p ν θ (Y n 0 ), where p ν θ (yn 0 ) denotes the density of the measure Pν θ (Y n 0 ).
8 Consistency of the MLE Main problem We want to show that the MLE is strongly consistent in the sense that θ n θ, P θ -a.s., as n. For HMMs, this problem has been addressed in a long series of papers, where the most significant quantum leaps are: X Y Contributors finite finite Baum & Petrie (1966) finite general Leroux (1992) compact general Douc & Mathias (2001)
9 We are here 1 Hidden Markov models 2 3 4
10 Hidden Markov models Common thread of previous contributions: (1) Show that for all θ Θ there is a constant H(θ, θ ), the asymptotic contrast, such that, P θ -a.s., lim n n 1 log p ν θ (Y 0 n ) = lim n n 1 Ē θ (log p ν θ (Y 0 n )) = H(θ, θ ). (2) Establish identifiability, i.e. that the relative entropy is minimized only at θ = θ. E(θ, θ ) H(θ, θ ) H(θ, θ ) (3) Prove that θ n converges (a.s.) to the maximizer of H(θ, θ ).
11 Hidden Markov models In the literature, establishing the existence of H(θ, θ ) has (for θ θ ) required very restrictive assumptions that are rarely satisfied in practice (particularly in the non-compact case), such as the following. Assumptions (Global Doeblin) There are constants 0 < ɛ < ɛ + and a probability measure η on (X, X ) such that for all x X and A X, ɛ η(a) Q(x, A) ɛ + η(a). Thus, a general consistency result has hitherto remained lacking.
12 We are here 1 Hidden Markov models 2 3 4
13 Exponential separability Definition (Exponential separability) For each n, Let Q n and P n be probability measures on some measurable space (Z n, Z n ). Then (Q n ) is exponentially separated from (P n ), denoted (Q n ) (P n ), if there exists a sequence (A n ) of sets A n Z n such that lim inf n P n(a n ) > 0, lim sup n 1 log Q n (A n ) < 0. n Using standard properties of the Kullback-Leibler (KL) divergence one may prove the following. Theorem If (Q n ) (P n ), then lim inf n n 1 KL(P n Q n ) > 0.
14 Main result Hidden Markov models Assumptions (1) The Markov kernel Q θ is positive Harris recurrent. (2) For some integer l 1, each Q l θ has a bounded density q θ with respect to some σ-finite measure λ. (3) For every θ θ, ( P ν θ (Y n 0 ) ) ( Pθ (Y n 0 ) ). (4)... Theorem Under the assumptions above, the MLE is strongly consistent.
15 Verifying separability The assumption (3) of exponential separability is nontrivial. However, assume that for all θ θ, (i) Q θ is ergodic and (ii) P Y θ P Y θ. By (ii) there are s < and h : Y s+1 R such that Then for sets Ē θ (h(y s 0 )) = 0 and Ē θ (h(y s 0 )) = 1; A n { y n 0 Y n+1 : } 1 n s h(yi i+s ) > 1 n s 2 it holds, by (i), that P θ (A n ) 1 and P ν θ (A n) 0 as n. i=s
16 A useful deviation inequality Exponential separability follows if we can show that P ν θ (A n) 0 occurs at an exponential rate. We thus provide the following result. Proposition (Azuma-Hoeffding inequality for Markov chains) Assume that Q is V -uniformly ergodic with invariant distribution π. Then there is a constant c > 0 such that for all t > 0 and all bounded functions h : X R, P η ( 1 n ) n h(x i ) π(h) t i=1 c ν(v ) exp ( n ( t 2 c h 2 t h )).
17 Verifying separability Write the quantity of interest as a telescoping of form ( n s n ) n h(yi i+s ) = ξ i,j + E ν i+s θ (h(yi ) X0 i 1, Y0 i 1 ), }{{} i=1 j=0 i=1 i=1 H(X i ) where (ξ i,j ) is a martingale increment sequence for each j, separability is verified by combining the inequality above with the standard Azuma-Hoeffding inequality for martingale increment sequences. Thus, V -uniform ergodicity exponential separability!
18 Some examples Hidden Markov models Our assumptions can be straightforwardly verified for, e.g., very general classes of HMMs with finite state space and nonlinear HMMs with possibly non-compact state space, e.g. the popular stochastic volatility model { X k+1 = αx k + σɛ k+1, Y k = β exp (X k /2) ε k, where (ɛ k ) and (ε k ) are sequences of i.i.d. Gaussian variables and θ = (α, β, σ) are unknown parameters with α < 1 (Taylor, 1982).
19 We are here 1 Hidden Markov models 2 3 4
20 Our approach: Main ideas Instead of proving (1), i.e. for all θ Θ, lim n n 1 log p ν θ (Y 0 n ) = H(θ, θ ), (a.s.) we note that it is enough to establish 1 the limit lim n n 1 log p ν θ (Y 0 n ) = H(θ, θ ), (a.s.) which is easily obtained using the SBM theorem, and 2 the following bound: For all closed sets C Θ with θ / C, lim sup n n 1 sup log p ν θ (Y 0 n ) < H(θ, θ ) (a.s.) θ C
21 Our approach: A blocking technique To establish the bound, consider blocks of observations of length l and bound (roughly) the likelihood according to log p ν θ (Y 0 n ) 1 n l l k=0 log p λ θ k+l (Yk ) + o a.s. (n) for some constant c > 0. This implies directly, via Birkhoff s ergodic theorem, ( ) lim sup n 1 log p ν θ (Y 0 n ) Ēθ l 1 log p λ θ (Y 0 l ) (a.s.) n
22 Our approach: Identifiability To complete the proof we use the assumed exponential separability and its connection to the KL divergence: For all θ θ, ( P ν θ (Y0 n ) ) ( Pθ (Y0 n ) ) lim inf Ē θ (n 1 log p θ (Y 0 n) ) n p λ θ (Y 0 n) > 0 ( ) lim sup Ē θ n 1 log p λ θ (Y 0 n ) < H(θ, θ ) n ( ) l θ : Ēθ l 1 log p λ θ (Y l θ 0 ) < H(θ, θ ). Lemma θ
23 Our approach: Identifiability Combining this with the previous gives lim sup n 1 log p ν θ (Y 0 n ) < H(θ, θ ) (a.s.) n and, after some more work, using that the parameter space Θ is compact, for all closed sets C Θ with θ / C, lim sup n n 1 sup log p ν θ (Y 0 n ) < H(θ, θ ) (a.s.) θ C This completes the proof of consistency.
24 Summary Hidden Markov models We have introduced HMMs and the problem of maximum likelihood-based inference in such models. proved that the MLE is strongly consistent under (what we believe) minimal assumptions. discussed an information-theoretic device, exponential separability, which is efficiently used in our proof to establish identifiability. Shown how exponential separability can be verified using a novel Azuma-Hoeffding-type inequality for V -uniformly ergodic Markov chains a result of independent interest.
Introduction. log p θ (y k y 1:k 1 ), k=1
ESAIM: PROCEEDINGS, September 2007, Vol.19, 115-120 Christophe Andrieu & Dan Crisan, Editors DOI: 10.1051/proc:071915 PARTICLE FILTER-BASED APPROXIMATE MAXIMUM LIKELIHOOD INFERENCE ASYMPTOTICS IN STATE-SPACE
More informationA Recursive Learning Algorithm for Model Reduction of Hidden Markov Models
A Recursive Learning Algorithm for Model Reduction of Hidden Markov Models Kun Deng, Prashant G. Mehta, Sean P. Meyn, and Mathukumalli Vidyasagar Abstract This paper is concerned with a recursive learning
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of
More informationBasic math for biology
Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 5 Sequential Monte Carlo methods I 31 March 2017 Computer Intensive Methods (1) Plan of today s lecture
More informationConsistency of Quasi-Maximum Likelihood Estimators for the Regime-Switching GARCH Models
Consistency of Quasi-Maximum Likelihood Estimators for the Regime-Switching GARCH Models Yingfu Xie Research Report Centre of Biostochastics Swedish University of Report 2005:3 Agricultural Sciences ISSN
More informationBayesian Regularization
Bayesian Regularization Aad van der Vaart Vrije Universiteit Amsterdam International Congress of Mathematicians Hyderabad, August 2010 Contents Introduction Abstract result Gaussian process priors Co-authors
More informationCoding on Countably Infinite Alphabets
Coding on Countably Infinite Alphabets Non-parametric Information Theory Licence de droits d usage Outline Lossless Coding on infinite alphabets Source Coding Universal Coding Infinite Alphabets Enveloppe
More informationPerturbed Proximal Gradient Algorithm
Perturbed Proximal Gradient Algorithm Gersende FORT LTCI, CNRS, Telecom ParisTech Université Paris-Saclay, 75013, Paris, France Large-scale inverse problems and optimization Applications to image processing
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationCS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 7: Learning Fully Observed BNs Theo Rekatsinas 1 Exponential family: a basic building block For a numeric random variable X p(x ) =h(x)exp T T (x) A( ) = 1
More informationAsymptotic Properties of the Maximum Likelihood Estimator in Regime Switching Econometric Models
CIRJE-F-1049 Asymptotic Properties of the Maximum Likelihood Estimator in Regime Switching Econometric Models Hiroyuki Kasahara University of British Columbia Katsumi Shimotsu The University of Tokyo May
More informationExpectation Maximization
Expectation Maximization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 /
More informationA minimalist s exposition of EM
A minimalist s exposition of EM Karl Stratos 1 What EM optimizes Let O, H be a random variables representing the space of samples. Let be the parameter of a generative model with an associated probability
More informationU Logo Use Guidelines
Information Theory Lecture 3: Applications to Machine Learning U Logo Use Guidelines Mark Reid logo is a contemporary n of our heritage. presents our name, d and our motto: arn the nature of things. authenticity
More informationWeek 3: The EM algorithm
Week 3: The EM algorithm Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Term 1, Autumn 2005 Mixtures of Gaussians Data: Y = {y 1... y N } Latent
More informationPriors for the frequentist, consistency beyond Schwartz
Victoria University, Wellington, New Zealand, 11 January 2016 Priors for the frequentist, consistency beyond Schwartz Bas Kleijn, KdV Institute for Mathematics Part I Introduction Bayesian and Frequentist
More information13 : Variational Inference: Loopy Belief Propagation and Mean Field
10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationBioinformatics: Biology X
Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA Model Building/Checking, Reverse Engineering, Causality Outline 1 Bayesian Interpretation of Probabilities 2 Where (or of what)
More informationP (A G) dp G P (A G)
First homework assignment. Due at 12:15 on 22 September 2016. Homework 1. We roll two dices. X is the result of one of them and Z the sum of the results. Find E [X Z. Homework 2. Let X be a r.v.. Assume
More informationConcentration inequalities for Feynman-Kac particle models. P. Del Moral. INRIA Bordeaux & IMB & CMAP X. Journées MAS 2012, SMAI Clermond-Ferrand
Concentration inequalities for Feynman-Kac particle models P. Del Moral INRIA Bordeaux & IMB & CMAP X Journées MAS 2012, SMAI Clermond-Ferrand Some hyper-refs Feynman-Kac formulae, Genealogical & Interacting
More informationStrong approximation for additive functionals of geometrically ergodic Markov chains
Strong approximation for additive functionals of geometrically ergodic Markov chains Florence Merlevède Joint work with E. Rio Université Paris-Est-Marne-La-Vallée (UPEM) Cincinnati Symposium on Probability
More informationPractical conditions on Markov chains for weak convergence of tail empirical processes
Practical conditions on Markov chains for weak convergence of tail empirical processes Olivier Wintenberger University of Copenhagen and Paris VI Joint work with Rafa l Kulik and Philippe Soulier Toronto,
More informationLecture 8: The Metropolis-Hastings Algorithm
30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:
More informationInference in Hidden Markov Models. Olivier Cappé, Eric Moulines and Tobias Rydén
Inference in Hidden Markov Models Olivier Cappé, Eric Moulines and Tobias Rydén June 17, 2009 2 Contents 1 Main Definitions and Notations 1 1.1 Markov Chains.............................. 1 1.1.1 Transition
More informationContext tree models for source coding
Context tree models for source coding Toward Non-parametric Information Theory Licence de droits d usage Outline Lossless Source Coding = density estimation with log-loss Source Coding and Universal Coding
More informationGeneralized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses
Ann Inst Stat Math (2009) 61:773 787 DOI 10.1007/s10463-008-0172-6 Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Taisuke Otsu Received: 1 June 2007 / Revised:
More informationMinimax lower bounds I
Minimax lower bounds I Kyoung Hee Kim Sungshin University 1 Preliminaries 2 General strategy 3 Le Cam, 1973 4 Assouad, 1983 5 Appendix Setting Family of probability measures {P θ : θ Θ} on a sigma field
More informationBayesian estimation of Hidden Markov Models
Bayesian estimation of Hidden Markov Models Laurent Mevel, Lorenzo Finesso IRISA/INRIA Campus de Beaulieu, 35042 Rennes Cedex, France Institute of Systems Science and Biomedical Engineering LADSEB-CNR
More informationSupplement to Hidden Rust Models
Supplement to Hidden Rust Models Benjamin Connault May 2016 1 Appendix for section 3: Identification 1.1 Proof of heorem 1: generic identification structure First, we prove the existence part of heorem
More informationCS229T/STATS231: Statistical Learning Theory. Lecturer: Tengyu Ma Lecture 11 Scribe: Jongho Kim, Jamie Kang October 29th, 2018
CS229T/STATS231: Statistical Learning Theory Lecturer: Tengyu Ma Lecture 11 Scribe: Jongho Kim, Jamie Kang October 29th, 2018 1 Overview This lecture mainly covers Recall the statistical theory of GANs
More informationSelected Exercises on Expectations and Some Probability Inequalities
Selected Exercises on Expectations and Some Probability Inequalities # If E(X 2 ) = and E X a > 0, then P( X λa) ( λ) 2 a 2 for 0 < λ
More informationLecture 35: December The fundamental statistical distances
36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose
More informationInformation geometry for bivariate distribution control
Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic
More informationSome Results on the Ergodicity of Adaptive MCMC Algorithms
Some Results on the Ergodicity of Adaptive MCMC Algorithms Omar Khalil Supervisor: Jeffrey Rosenthal September 2, 2011 1 Contents 1 Andrieu-Moulines 4 2 Roberts-Rosenthal 7 3 Atchadé and Fort 8 4 Relationship
More informationDepartment of Econometrics and Business Statistics
ISSN 440-77X Australia Department of Econometrics and Business Statistics http://www.buseco.monash.edu.au/depts/ebs/pubs/wpapers/ Nonlinear Regression with Harris Recurrent Markov Chains Degui Li, Dag
More information10708 Graphical Models: Homework 2
10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves
More informationStatistical Inference
Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Week 12. Testing and Kullback-Leibler Divergence 1. Likelihood Ratios Let 1, 2, 2,...
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationHidden Markov model likelihoods and their derivatives behave like i.i.d. ones : Details.
Hidden Markov model likelihoods and their derivatives behave like i.i.d. ones : Details. Peter J. Bickel Ya acov Ritov Tobias Rydén Abstract We consider the log-likelihood function of hidden Markov models,
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More informationNote Set 5: Hidden Markov Models
Note Set 5: Hidden Markov Models Probabilistic Learning: Theory and Algorithms, CS 274A, Winter 2016 1 Hidden Markov Models (HMMs) 1.1 Introduction Consider observed data vectors x t that are d-dimensional
More informationInformation Theory and Statistics Lecture 3: Stationary ergodic processes
Information Theory and Statistics Lecture 3: Stationary ergodic processes Łukasz Dębowski ldebowsk@ipipan.waw.pl Ph. D. Programme 2013/2014 Measurable space Definition (measurable space) Measurable space
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science Transmission of Information Spring 2006
MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.44 Transmission of Information Spring 2006 Homework 2 Solution name username April 4, 2006 Reading: Chapter
More informationIntroduction to Statistical Learning Theory
Introduction to Statistical Learning Theory In the last unit we looked at regularization - adding a w 2 penalty. We add a bias - we prefer classifiers with low norm. How to incorporate more complicated
More informationOn the Complexity of Best Arm Identification with Fixed Confidence
On the Complexity of Best Arm Identification with Fixed Confidence Discrete Optimization with Noise Aurélien Garivier, Emilie Kaufmann COLT, June 23 th 2016, New York Institut de Mathématiques de Toulouse
More informationInformation Theory. Lecture 5 Entropy rate and Markov sources STEFAN HÖST
Information Theory Lecture 5 Entropy rate and Markov sources STEFAN HÖST Universal Source Coding Huffman coding is optimal, what is the problem? In the previous coding schemes (Huffman and Shannon-Fano)it
More informationSimultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms
Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms Yan Bai Feb 2009; Revised Nov 2009 Abstract In the paper, we mainly study ergodicity of adaptive MCMC algorithms. Assume that
More informationAdaptive Monte Carlo methods
Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert
More informationA regeneration proof of the central limit theorem for uniformly ergodic Markov chains
A regeneration proof of the central limit theorem for uniformly ergodic Markov chains By AJAY JASRA Department of Mathematics, Imperial College London, SW7 2AZ, London, UK and CHAO YANG Department of Mathematics,
More informationStratégies bayésiennes et fréquentistes dans un modèle de bandit
Stratégies bayésiennes et fréquentistes dans un modèle de bandit thèse effectuée à Telecom ParisTech, co-dirigée par Olivier Cappé, Aurélien Garivier et Rémi Munos Journées MAS, Grenoble, 30 août 2016
More informationThe Discretization Filter: A Simple Way to Estimate Nonlinear State Space Models
The Discretization Filter: A Simple Way to Estimate Nonlinear State Space Models Leland E. Farmer Department of Economics University of Virginia This Version: August 7, 217 First Version: August 214 Link
More informationHands-On Learning Theory Fall 2016, Lecture 3
Hands-On Learning Theory Fall 016, Lecture 3 Jean Honorio jhonorio@purdue.edu 1 Information Theory First, we provide some information theory background. Definition 3.1 (Entropy). The entropy of a discrete
More informationProbabilistic and Bayesian Machine Learning
Probabilistic and Bayesian Machine Learning Lecture 1: Introduction to Probabilistic Modelling Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Why a
More informationLimit theorems for dependent regularly varying functions of Markov chains
Limit theorems for functions of with extremal linear behavior Limit theorems for dependent regularly varying functions of In collaboration with T. Mikosch Olivier Wintenberger wintenberger@ceremade.dauphine.fr
More informationC.7. Numerical series. Pag. 147 Proof of the converging criteria for series. Theorem 5.29 (Comparison test) Let a k and b k be positive-term series
C.7 Numerical series Pag. 147 Proof of the converging criteria for series Theorem 5.29 (Comparison test) Let and be positive-term series such that 0, for any k 0. i) If the series converges, then also
More information1 Continuous-time chains, finite state space
Université Paris Diderot 208 Markov chains Exercises 3 Continuous-time chains, finite state space Exercise Consider a continuous-time taking values in {, 2, 3}, with generator 2 2. 2 2 0. Draw the diagramm
More informationWeak convergence and Brownian Motion. (telegram style notes) P.J.C. Spreij
Weak convergence and Brownian Motion (telegram style notes) P.J.C. Spreij this version: December 8, 2006 1 The space C[0, ) In this section we summarize some facts concerning the space C[0, ) of real
More informationMonte Carlo methods for sampling-based Stochastic Optimization
Monte Carlo methods for sampling-based Stochastic Optimization Gersende FORT LTCI CNRS & Telecom ParisTech Paris, France Joint works with B. Jourdain, T. Lelièvre, G. Stoltz from ENPC and E. Kuhn from
More informationLower Tail Probabilities and Normal Comparison Inequalities. In Memory of Wenbo V. Li s Contributions
Lower Tail Probabilities and Normal Comparison Inequalities In Memory of Wenbo V. Li s Contributions Qi-Man Shao The Chinese University of Hong Kong Lower Tail Probabilities and Normal Comparison Inequalities
More informationAdaptive Population Monte Carlo
Adaptive Population Monte Carlo Olivier Cappé Centre Nat. de la Recherche Scientifique & Télécom Paris 46 rue Barrault, 75634 Paris cedex 13, France http://www.tsi.enst.fr/~cappe/ Recent Advances in Monte
More informationOn Bayesian bandit algorithms
On Bayesian bandit algorithms Emilie Kaufmann joint work with Olivier Cappé, Aurélien Garivier, Nathaniel Korda and Rémi Munos July 1st, 2012 Emilie Kaufmann (Telecom ParisTech) On Bayesian bandit algorithms
More informationMIT Spring 2016
MIT 18.655 Dr. Kempthorne Spring 2016 1 MIT 18.655 Outline 1 2 MIT 18.655 Decision Problem: Basic Components P = {P θ : θ Θ} : parametric model. Θ = {θ}: Parameter space. A{a} : Action space. L(θ, a) :
More informationGradient Estimation for Attractor Networks
Gradient Estimation for Attractor Networks Thomas Flynn Department of Computer Science Graduate Center of CUNY July 2017 1 Outline Motivations Deterministic attractor networks Stochastic attractor networks
More informationLECTURE 15 Markov chain Monte Carlo
LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte
More informationVerifying Regularity Conditions for Logit-Normal GLMM
Verifying Regularity Conditions for Logit-Normal GLMM Yun Ju Sung Charles J. Geyer January 10, 2006 In this note we verify the conditions of the theorems in Sung and Geyer (submitted) for the Logit-Normal
More informationLECTURE 3. Last time:
LECTURE 3 Last time: Mutual Information. Convexity and concavity Jensen s inequality Information Inequality Data processing theorem Fano s Inequality Lecture outline Stochastic processes, Entropy rate
More informationGoodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach
Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach By Shiqing Ling Department of Mathematics Hong Kong University of Science and Technology Let {y t : t = 0, ±1, ±2,
More informationCorrelation Detection and an Operational Interpretation of the Rényi Mutual Information
Correlation Detection and an Operational Interpretation of the Rényi Mutual Information Masahito Hayashi 1, Marco Tomamichel 2 1 Graduate School of Mathematics, Nagoya University, and Centre for Quantum
More informationBounds on Convergence of Entropy Rate Approximations in Hidden Markov Processes
Bounds on Convergence of Entropy Rate Approximations in Hidden Markov Processes By Nicholas F. Travers B.S. Haverford College 2005 DISSERTATION Submitted in partial satisfaction of the requirements for
More informationMGMT 69000: Topics in High-dimensional Data Analysis Falll 2016
MGMT 69000: Topics in High-dimensional Data Analysis Falll 2016 Lecture 14: Information Theoretic Methods Lecturer: Jiaming Xu Scribe: Hilda Ibriga, Adarsh Barik, December 02, 2016 Outline f-divergence
More informationOn Convergence of Recursive Monte Carlo Filters in Non-Compact State Spaces
Convergence of Particle Filters 1 On Convergence of Recursive Monte Carlo Filters in Non-Compact State Spaces Jing Lei and Peter Bickel Carnegie Mellon University and University of California, Berkeley
More informationBayesian Machine Learning - Lecture 7
Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationVARIATIONAL PRINCIPLE FOR THE ENTROPY
VARIATIONAL PRINCIPLE FOR THE ENTROPY LUCIAN RADU. Metric entropy Let (X, B, µ a measure space and I a countable family of indices. Definition. We say that ξ = {C i : i I} B is a measurable partition if:
More informationG8325: Variational Bayes
G8325: Variational Bayes Vincent Dorie Columbia University Wednesday, November 2nd, 2011 bridge Variational University Bayes Press 2003. On-screen viewing permitted. Printing not permitted. http://www.c
More informationNonparametric regression with martingale increment errors
S. Gaïffas (LSTA - Paris 6) joint work with S. Delattre (LPMA - Paris 7) work in progress Motivations Some facts: Theoretical study of statistical algorithms requires stationary and ergodicity. Concentration
More informationFoundations of Statistical Inference
Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 32 Lecture 14 : Variational Bayes
More informationis a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.
Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable
More informationErgodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.
Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions
More informationLecture 2. We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales.
Lecture 2 1 Martingales We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales. 1.1 Doob s inequality We have the following maximal
More informationRare event simulation for the ruin problem with investments via importance sampling and duality
Rare event simulation for the ruin problem with investments via importance sampling and duality Jerey Collamore University of Copenhagen Joint work with Anand Vidyashankar (GMU) and Guoqing Diao (GMU).
More informationErgodicity in data assimilation methods
Ergodicity in data assimilation methods David Kelly Andy Majda Xin Tong Courant Institute New York University New York NY www.dtbkelly.com April 15, 2016 ETH Zurich David Kelly (CIMS) Data assimilation
More informationMixing time for a random walk on a ring
Mixing time for a random walk on a ring Stephen Connor Joint work with Michael Bate Paris, September 2013 Introduction Let X be a discrete time Markov chain on a finite state space S, with transition matrix
More informationIdentifiability and Inference of Hidden Markov Models
Identifiability and Inference of Hidden Markov Models Yonghong An Texas A& M University Yingyao Hu Johns Hopkins University Matt Shum California Institute of Technology Abstract This paper considers the
More informationThe information complexity of sequential resource allocation
The information complexity of sequential resource allocation Emilie Kaufmann, joint work with Olivier Cappé, Aurélien Garivier and Shivaram Kalyanakrishan SMILE Seminar, ENS, June 8th, 205 Sequential allocation
More informationStatistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach
Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationdans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés
Inférence pénalisée dans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés Gersende Fort Institut de Mathématiques de Toulouse, CNRS and Univ. Paul Sabatier Toulouse,
More informationSemi-Parametric Importance Sampling for Rare-event probability Estimation
Semi-Parametric Importance Sampling for Rare-event probability Estimation Z. I. Botev and P. L Ecuyer IMACS Seminar 2011 Borovets, Bulgaria Semi-Parametric Importance Sampling for Rare-event probability
More informationA PARAMETRIC MODEL FOR DISCRETE-VALUED TIME SERIES. 1. Introduction
tm Tatra Mt. Math. Publ. 00 (XXXX), 1 10 A PARAMETRIC MODEL FOR DISCRETE-VALUED TIME SERIES Martin Janžura and Lucie Fialová ABSTRACT. A parametric model for statistical analysis of Markov chains type
More informationarxiv: v1 [math.st] 12 Aug 2017
Existence of infinite Viterbi path for pairwise Markov models Jüri Lember and Joonas Sova University of Tartu, Liivi 2 50409, Tartu, Estonia. Email: jyril@ut.ee; joonas.sova@ut.ee arxiv:1708.03799v1 [math.st]
More informationPhenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012
Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 202 BOUNDS AND ASYMPTOTICS FOR FISHER INFORMATION IN THE CENTRAL LIMIT THEOREM
More informationHidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing
Hidden Markov Models By Parisa Abedi Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed data Sequential (non i.i.d.) data Time-series data E.g. Speech
More informationMi-Hwa Ko. t=1 Z t is true. j=0
Commun. Korean Math. Soc. 21 (2006), No. 4, pp. 779 786 FUNCTIONAL CENTRAL LIMIT THEOREMS FOR MULTIVARIATE LINEAR PROCESSES GENERATED BY DEPENDENT RANDOM VECTORS Mi-Hwa Ko Abstract. Let X t be an m-dimensional
More informationCOMPSCI 650 Applied Information Theory Jan 21, Lecture 2
COMPSCI 650 Applied Information Theory Jan 21, 2016 Lecture 2 Instructor: Arya Mazumdar Scribe: Gayane Vardoyan, Jong-Chyi Su 1 Entropy Definition: Entropy is a measure of uncertainty of a random variable.
More informationLarge deviations and averaging for systems of slow fast stochastic reaction diffusion equations.
Large deviations and averaging for systems of slow fast stochastic reaction diffusion equations. Wenqing Hu. 1 (Joint work with Michael Salins 2, Konstantinos Spiliopoulos 3.) 1. Department of Mathematics
More informationRobustness and duality of maximum entropy and exponential family distributions
Chapter 7 Robustness and duality of maximum entropy and exponential family distributions In this lecture, we continue our study of exponential families, but now we investigate their properties in somewhat
More information