Priors for the frequentist, consistency beyond Schwartz

1 Priors for the frequentist, consistency beyond Schwartz. Bas Kleijn, KdV Institute for Mathematics. Victoria University, Wellington, New Zealand, 11 January 2016.

2 Part I Introduction

3 Bayesian and Frequentist statistics

sample space: a measurable space $(\mathscr{X}, \mathscr{B})$
i.i.d. data: $X^n = (X_1, \ldots, X_n) \in \mathscr{X}^n$
frequentist/Bayesian model: $(\mathscr{P}, \mathscr{G})$, model subsets $B, V \in \mathscr{G}$
prior: a probability measure $\Pi : \mathscr{G} \to [0,1]$
posterior: $\Pi(\,\cdot \mid X^n) : \mathscr{G} \to [0,1]$, Bayes's rule, inference

Frequentist: assume there is a true $P_0$, with $X^n \sim P_0^n$. Bayes: assume $P \sim \Pi$, with $X^n \mid P \sim P^n$.

4 Definition of the posterior

Definition 4.1 Assume that all $P \mapsto P^n(A)$ are $\mathscr{G}$-measurable. Given a prior $\Pi$, a posterior is any $\Pi(\,\cdot \mid X^n = \cdot\,) : \mathscr{G} \times \mathscr{X}^n \to [0,1]$ such that
(i) for any $G \in \mathscr{G}$, $x^n \mapsto \Pi(G \mid X^n = x^n)$ is $\mathscr{B}^n$-measurable;
(ii) (Disintegration) for all $A \in \mathscr{B}^n$ and $G \in \mathscr{G}$,
$$\int_A \Pi(G \mid X^n)\, dP_n^\Pi = \int_G P^n(A)\, d\Pi(P),$$
where $P_n^\Pi = \int P^n\, d\Pi(P)$ is the prior predictive distribution.

Remark 4.2 For frequentists $(X_1, \ldots, X_n) \sim P_0^n$, so assume $P_0^n \ll P_n^\Pi$.
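
When the model is dominated, the disintegration is solved by density ratios; a standard identity, not spelled out on the slide: with $p = dP/d\mu$ for a $\sigma$-finite dominating measure $\mu$,
$$\Pi(G \mid X^n) = \frac{\int_G \prod_{i=1}^n p(X_i)\, d\Pi(P)}{\int_{\mathscr{P}} \prod_{i=1}^n p(X_i)\, d\Pi(P)}, \qquad G \in \mathscr{G},$$
which is the familiar form of Bayes's rule.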

5 Asymptotic consistency of the posterior

[Figure: posterior densities concentrating around the truth as the sample size grows, panels at $n = 0, 1, \ldots, 9, 16, \ldots, 36, 64, \ldots, 144, 256, \ldots$]

Definition 5.1 Given a model $\mathscr{P}$ with a topology and a Borel prior $\Pi$, the posterior is consistent at $P \in \mathscr{P}$ if, for every open neighbourhood $U$ of $P$,
$$\Pi(U \mid X^n) \xrightarrow{\,P\,} 1.$$
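
The picture can be reproduced in a few lines. Below is a minimal simulation in the conjugate Beta-Bernoulli model (a hypothetical example, not part of the talk), tracking the posterior mass $\Pi(U \mid X^n)$ of a fixed neighbourhood $U = (p_0 - \epsilon, p_0 + \epsilon)$ of the truth:

```python
import numpy as np
from scipy.stats import beta

# Posterior consistency in the Beta-Bernoulli model (hypothetical example):
# prior Beta(1,1) on the success probability, data i.i.d. Bernoulli(p0).
rng = np.random.default_rng(0)
p0, eps = 0.3, 0.05                                 # truth, neighbourhood radius
for n in [0, 1, 4, 9, 16, 36, 64, 144, 256]:
    x = rng.binomial(1, p0, size=n)
    post = beta(1 + x.sum(), 1 + n - x.sum())       # conjugate posterior Beta(1+S, 1+n-S)
    mass = post.cdf(p0 + eps) - post.cdf(p0 - eps)  # Pi(U | X^n)
    print(f"n={n:4d}  Pi(U|X^n)={mass:.3f}")
```

As $n$ grows the printed mass tends to 1, which is exactly the convergence required by Definition 5.1.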

6 Doob's and Schwartz's consistency theorems

Theorem 6.1 (Doob (1948)) Let $\mathscr{P}$ and $\mathscr{X}$ be Polish spaces. Assume that $P \mapsto P^n(A)$ is Borel measurable for all $n, A$. Then for any prior $\Pi$, the posterior is consistent at $P$, for $\Pi$-almost-all $P \in \mathscr{P}$.

Remark 6.2 (Schwartz (1961), Freedman (1963)) Not frequentist!

Theorem 6.3 (Schwartz (1965)) Let $X_1, X_2, \ldots$ be an i.i.d. sample from $P_0 \in \mathscr{P}$. Let $\mathscr{P}$ be Hellinger totally bounded and let $\Pi$ be a Kullback-Leibler (KL-)prior, i.e.
$$\Pi\bigl(P \in \mathscr{P} : -P_0 \log dP/dP_0 < \epsilon\bigr) > 0 \quad \text{for all } \epsilon > 0.$$
Then the posterior is consistent at $P_0$ in the Hellinger topology.
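
To see what the KL-condition asks of a prior, consider the normal location model $P_\mu = N(\mu, 1)$ (an illustration not on the slide): here
$$-P_0 \log \frac{dP_\mu}{dP_{\mu_0}} = \mathrm{KL}(P_{\mu_0} \,\|\, P_\mu) = \tfrac12 (\mu - \mu_0)^2,$$
so $\Pi(P : -P_0 \log dP/dP_0 < \epsilon) = \Pi(|\mu - \mu_0| < \sqrt{2\epsilon})$, which is positive for every $\epsilon > 0$ whenever the prior on $\mu$ has full support.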

7 The Dirichlet process

Definition 7.1 (Dirichlet distribution) A random variable $p = (p_1, \ldots, p_k)$ with $p_l \ge 0$ and $\sum_l p_l = 1$ is Dirichlet distributed with parameter $\alpha = (\alpha_1, \ldots, \alpha_k)$, $p \sim D_\alpha$, if it has density
$$f_\alpha(p) = C(\alpha) \prod_{l=1}^k p_l^{\alpha_l - 1}.$$

Definition 7.2 (Dirichlet process, Ferguson (1973)) Let $\alpha$ be a finite measure on $(\mathscr{X}, \mathscr{B})$. The Dirichlet process $P \sim D_\alpha$ is defined by, for all finite measurable partitions $A = \{A_1, \ldots, A_k\}$ of $\mathscr{X}$,
$$\bigl(P(A_1), \ldots, P(A_k)\bigr) \sim D_{(\alpha(A_1), \ldots, \alpha(A_k))}.$$
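
Draws from $D_\alpha$ can be generated with Sethuraman's stick-breaking representation; a minimal sketch in Python (the truncation tolerance and function name are our own choices):

```python
import numpy as np

def dirichlet_process_sample(alpha_mass, base_sampler, tol=1e-8, rng=None):
    """Truncated stick-breaking draw from D_alpha (Sethuraman's construction):
    weights w_k = v_k * prod_{j<k}(1 - v_j) with v_k ~ Beta(1, alpha_mass),
    atoms drawn i.i.d. from the normalized base measure."""
    rng = rng or np.random.default_rng()
    weights, atoms, remaining = [], [], 1.0
    while remaining > tol:                 # stop once the leftover stick is negligible
        v = rng.beta(1.0, alpha_mass)
        weights.append(remaining * v)
        atoms.append(base_sampler(rng))
        remaining *= 1.0 - v
    return np.array(weights), np.array(atoms)

# Example: alpha = 2 * N(0,1); a draw P is a random discrete distribution.
w, a = dirichlet_process_sample(2.0, lambda r: r.standard_normal())
print(len(w), w.sum())                     # number of retained atoms, total mass ~ 1
```

Each draw is a (countably) discrete random probability measure with atoms sampled from the base measure.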

8 Weak consistency with Dirichlet priors

Theorem 8.1 (Dirichlet consistency) Let $X_1, X_2, \ldots$ be an i.i.d. sample from $P_0$. If $\Pi$ is a Dirichlet prior $D_\alpha$ with finite $\alpha$ such that $\mathrm{supp}(P_0) \subset \mathrm{supp}(\alpha)$, the posterior is consistent at $P_0$ in the weak model topology.

Remark 8.2 Priors do not have to be KL-priors for consistency.

Remark 8.3 (Freedman (1965)) Dirichlet distributions are tailfree: if $A'$ refines $A$ with $A'_{i1} \cup \ldots \cup A'_{il_i} = A_i$, then $(P(A'_{ij} \mid A_i) : 1 \le j \le l_i,\ 1 \le i \le k)$ is independent of $(P(A_1), \ldots, P(A_k))$.

Remark 8.4 $x^n \mapsto \Pi(P(A) \in \cdot \mid X^n)$ is $\sigma_n(A)$-measurable, where $\sigma_n(A)$ is generated by products of the form $\cap_{i=1}^n B_i$ with $B_i = \{X_i \in A\}$ or $B_i = \{X_i \notin A\}$.
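
Behind Theorem 8.1 lies the conjugacy of the Dirichlet process (Ferguson (1973)): given $X_1, \ldots, X_n$, the posterior is again a Dirichlet process,
$$P \mid X^n \sim D_{\alpha + \sum_{i=1}^n \delta_{X_i}}, \qquad E[P(A) \mid X^n] = \frac{\alpha(A) + n\,\mathbb{P}_n(A)}{\alpha(\mathscr{X}) + n} \longrightarrow P_0(A) \quad P_0\text{-a.s.}$$
by the law of large numbers, which already suggests the weak consistency asserted above.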

9 Bayesian and Frequentist testability

Let $B, V$ be two (disjoint) model subsets.

Definition 9.1 Uniform (or minimax) testability:
$$\sup_{P \in B} P^n \phi_n \to 0, \qquad \sup_{Q \in V} Q^n(1 - \phi_n) \to 0.$$

Definition 9.2 Pointwise testability: for all $P \in B$, $Q \in V$,
$$\phi_n \xrightarrow{P\text{-a.s.}} 0, \qquad \phi_n \xrightarrow{Q\text{-a.s.}} 1.$$

Definition 9.3 Bayesian testability: for $\Pi$-almost-all $P \in B$, $Q \in V$,
$$\phi_n \xrightarrow{P\text{-a.s.}} 0, \qquad \phi_n \xrightarrow{Q\text{-a.s.}} 1.$$

10 Part II Bayesian testability and prior-a.s.-consistency

11 A posterior concentration inequality

Lemma 11.1 Let $(\mathscr{P}, \mathscr{G})$ be given. For any prior $\Pi$, any test function $\phi$ and any $B, V \in \mathscr{G}$ such that $B \cap V = \emptyset$ and $\Pi(B) > 0$,
$$\int_B P\,\Pi(V \mid X)\, d\Pi(P) \le \int_B P\phi\, d\Pi(P) + \int_V Q(1 - \phi)\, d\Pi(Q).$$

Corollary 11.2 Consequently, in the i.i.d. context, for any sequences $(\Pi_n), (B_n), (V_n)$ such that $B_n \cap V_n = \emptyset$ and $\Pi_n(B_n) > 0$, we have
$$\int P^n\,\Pi(V_n \mid X^n)\, d\Pi_n(P \mid B_n) \le \frac{1}{\Pi_n(B_n)} \Bigl( \int_{B_n} P^n \phi_n\, d\Pi_n(P) + \int_{V_n} Q^n(1 - \phi_n)\, d\Pi_n(Q) \Bigr).$$
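
A sketch of the argument, reconstructed from Definition 4.1(ii): since $\Pi(V \mid X) \le 1$ and $\phi + (1 - \phi) = 1$,
$$\int_B P\,\Pi(V \mid X)\, d\Pi(P) \le \int_B P\phi\, d\Pi(P) + \int (1 - \phi)\,\Pi(V \mid X)\, dP^\Pi,$$
and the disintegration property, applied with $G = V$ and test function $1 - \phi$, turns the last term into $\int_V Q(1 - \phi)\, d\Pi(Q)$.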

12 Martingale convergence

Proposition 12.1 Let $(\mathscr{P}, \mathscr{G}, \Pi)$ be given. For any $B, V \in \mathscr{G}$, the following are equivalent:
(i) there exist Bayesian tests $(\phi_n)$ for $B$ versus $V$;
(ii) there exist tests $(\phi_n)$ such that
$$\int_B P^n \phi_n\, d\Pi(P) \to 0, \qquad \int_V Q^n(1 - \phi_n)\, d\Pi(Q) \to 0;$$
(iii) for $\Pi$-almost-all $P \in B$, $Q \in V$,
$$\Pi(V \mid X^n) \xrightarrow{P\text{-a.s.}} 0, \qquad \Pi(B \mid X^n) \xrightarrow{Q\text{-a.s.}} 0.$$

Remark 12.2 Interpretation: distinctions between model subsets are Bayesian testable, iff they are picked up by the posterior asymptotically, if(f) the Bayes factor for $B$ versus $V$ is consistent.

13 Prior-almost-sure consistency

Theorem 13.1 Let a Hausdorff model $\mathscr{P}$ with Borel prior $\Pi$ be given. Assume that for $\Pi$-almost-all $P \in \mathscr{P}$ and any open neighbourhood $U$ of $P$, there exist a $B \subset U$ with $\Pi(B) > 0$ and Bayesian tests $(\phi_n)$ for $B$ versus $\mathscr{P} \setminus U$. Then the posterior is consistent at $\Pi$-almost-all $P \in \mathscr{P}$.

Remark 13.2 Let $\mathscr{P}$ be a Polish space and assume that all $P \mapsto P^n(A)$ are Borel measurable. Then, for any prior $\Pi$, any Borel set $V \subset \mathscr{P}$ is Bayesian testable versus $\mathscr{P} \setminus V$.

Corollary 13.3 Doob's theorem (1948), and much more!

14 Part III Frequentism

15 Le Cam's inequality

Definition 15.1 For $B \in \mathscr{G}$ such that $\Pi(B) > 0$, the local prior predictive distribution is $P_n^{\Pi \mid B} = \int P^n\, d\Pi(P \mid B)$.

Remark 15.2 (Le Cam, unpublished (197?) and (1986)) Rewrite the posterior concentration inequality:
$$P_0^n \Pi(V_n \mid X^n) \le \bigl\| P_0^n - P_n^{\Pi \mid B_n} \bigr\| + \int P^n \phi_n\, d\Pi(P \mid B_n) + \frac{\Pi(V_n)}{\Pi(B_n)} \int Q^n(1 - \phi_n)\, d\Pi(Q \mid V_n).$$

Remark 15.3 For some $b_n \downarrow 0$, take $B_n = \{P \in \mathscr{P} : \|P^n - P_0^n\| \le b_n\}$ with $a_n^{-1} \Pi(B_n) \to \infty$.

Remark 15.4 Useful in parametric models, but "a considerable nuisance" (Le Cam (1986)) in the non-parametric context.

16 Schwartz's theorem revisited

Remark 16.1 Suppose that for all $\delta > 0$ there is a $B$ s.t. $\Pi(B) > 0$ and, for all $P \in B$ and large enough $n$,
$$P_0^n \Pi(V \mid X^n) \le e^{n\delta}\, P^n \Pi(V \mid X^n);$$
then (by Fatou) for large enough $m$,
$$\sup_{n \ge m} \Bigl[ \bigl( P_0^n - e^{n\delta} P_n^{\Pi \mid B} \bigr) \Pi(V \mid X^n) \Bigr] \le 0.$$

Theorem 16.2 Let $\mathscr{P}$ be a model with KL-prior $\Pi$; $P_0 \in \mathscr{P}$. Let $B, V \in \mathscr{G}$ be given and assume that $B$ contains a KL-neighbourhood of $P_0$. If there exist Bayesian tests for $B$ versus $V$ of exponential power, then $\Pi(V \mid X^n) \to 0$, $P_0$-almost surely.

Corollary 16.3 (Schwartz's theorem (1965))

17 Remote contiguity

Definition 17.1 Given sequences $(P_n), (Q_n)$ of probability measures, $Q_n$ is contiguous w.r.t. $P_n$ ($Q_n \lhd P_n$) if, for any $(\psi_n)$ with $\psi_n : \mathscr{X}^n \to [0,1]$,
$$P_n \psi_n = o(1) \;\Rightarrow\; Q_n \psi_n = o(1).$$

Definition 17.2 Given $(P_n), (Q_n)$ and $a_n \downarrow 0$, $Q_n$ is $a_n$-remotely contiguous w.r.t. $P_n$ ($Q_n \lhd a_n^{-1} P_n$) if, for any $(\psi_n)$ with $\psi_n : \mathscr{X}^n \to [0,1]$,
$$P_n \psi_n = o(a_n) \;\Rightarrow\; Q_n \psi_n = o(1).$$

Remark 17.3 Contiguity is stronger than remote contiguity: note that $Q_n \lhd P_n$ iff $Q_n \lhd a_n^{-1} P_n$ for all $a_n \downarrow 0$.

Definition 17.4 Hellinger transform: $\rho_n(\alpha) = \int (dP_n)^\alpha (dQ_n)^{1 - \alpha}$.
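
Two orienting examples, the first standard and the second read off from Remark 16.1: in the LAN setting, $Q_n = N(h/\sqrt{n}, 1)^{\otimes n}$ and $P_n = N(0,1)^{\otimes n}$ are mutually contiguous; and for a KL-prior, the inequality of Remark 16.1 amounts to $P_0^n \lhd e^{n\delta}\, P_n^{\Pi \mid B}$ for every $\delta > 0$, i.e. remote contiguity at exponential rates.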

18 Le Cam's first lemma

Lemma 18.1 Given $(P_n), (Q_n)$ as above, $Q_n \lhd P_n$ iff any of the following holds:
(i) if $T_n \xrightarrow{P_n} 0$, then $T_n \xrightarrow{Q_n} 0$;
(ii) given $\epsilon > 0$, there is a $b > 0$ such that $Q_n(dQ_n/dP_n > b) < \epsilon$;
(iii) given $\epsilon > 0$, there is a $c > 0$ such that $\|Q_n - Q_n \wedge c\,P_n\| < \epsilon$;
(iv) if $dP_n/dQ_n \xrightarrow{Q_n\text{-w}} f$ along a subsequence, then $P(f > 0) = 1$;
(v) if $dQ_n/dP_n \xrightarrow{P_n\text{-w}} g$ along a subsequence, then $Eg = 1$;
(vi) $\liminf_n \rho_n(\alpha) \to 1$ as $\alpha \uparrow 1$.

19 Criteria for remote contiguity

Lemma 19.1 Given $(P_n), (Q_n)$ and $a_n \downarrow 0$, $Q_n \lhd a_n^{-1} P_n$ if any of the following holds:
(i) if, for all $\epsilon > 0$, $P_n(T_n > \epsilon) = o(a_n)$, then $T_n \xrightarrow{Q_n} 0$;
(ii) given $\epsilon > 0$, there is a $b > 0$ such that $Q_n(dQ_n/dP_n > b\,a_n^{-1}) < \epsilon$;
(iii) given $\epsilon > 0$, there is a $c > 0$ such that $\|Q_n - Q_n \wedge c\,a_n^{-1} P_n\| < \epsilon$;
(iv) if $a_n^{-1}\, dP_n/dQ_n \xrightarrow{Q_n\text{-w}} f$ along a subsequence, then $P(f > 0) = 1$;
(v) if $a_n\, dQ_n/dP_n \xrightarrow{P_n\text{-w}} g$ along a subsequence, then $Eg = 1$;
(vi) $\limsup_n \inf_\alpha a_n^{-\alpha} \rho_n(\alpha) = 0$.

20 Beyond Schwartz

Theorem 20.1 Let $(\mathscr{P}, \mathscr{G})$ with priors $(\Pi_n)$ and $(X_1, \ldots, X_n) \sim P_0^n$ be given. Assume there are $B_n, V_n \in \mathscr{G}$ and $a_n, b_n > 0$, $a_n \downarrow 0$, s.t.
(i) there exist Bayesian tests for $B_n$ versus $V_n$ of power $a_n$,
$$\int_{B_n} P^n \phi_n\, d\Pi_n(P) + \int_{V_n} Q^n(1 - \phi_n)\, d\Pi_n(Q) \le a_n;$$
(ii) the prior mass of $B_n$ is lower-bounded by $b_n$, i.e. $\Pi_n(B_n) \ge b_n$;
(iii) the sequence $P_0^n$ satisfies $P_0^n \lhd b_n a_n^{-1}\, P_n^{\Pi \mid B_n}$.
Then $\Pi_n(V_n \mid X^n) \to 0$ in $P_0$-probability.
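
How the three conditions combine (a reconstruction of the proof idea): by Corollary 11.2, conditions (i) and (ii) give
$$P_n^{\Pi \mid B_n}\, \Pi(V_n \mid X^n) \le \frac{a_n}{\Pi_n(B_n)} \le \frac{a_n}{b_n},$$
and the remote contiguity of condition (iii) transfers this smallness under $P_n^{\Pi \mid B_n}$ to the true $P_0^n$, so that the posterior mass of $V_n$ vanishes in $P_0$-probability.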

21 Application to consistency I

Remark 21.1 (Schwartz (1965)) Take $P_0 \in \mathscr{P}$, and define
$$V_n = V := \{P \in \mathscr{P} : H(P, P_0) \ge \epsilon\}, \qquad B_n = B := \{P : -P_0 \log dP/dP_0 < \epsilon^2\},$$
with $a_n$ and $b_n$ of the form $e^{-nK}$. With $N(\epsilon, \mathscr{P}, H) < \infty$, the theorem proves Hellinger consistency with KL-priors.

Remark 21.2 (Ghosal-Ghosh-van der Vaart (2000)) Take $P_0 \in \mathscr{P}$, and define
$$V_n = \{P \in \mathscr{P} : H(P, P_0) \ge \epsilon_n\}, \qquad B_n = \{P : -P_0 \log dP/dP_0 < \epsilon_n^2,\; P_0 \log^2 dP/dP_0 < \epsilon_n^2\},$$
with $a_n$ and $b_n$ of the form $e^{-Kn\epsilon_n^2}$. With $\log N(\epsilon_n, \mathscr{P}, H) \le n\epsilon_n^2$, the theorem then proves Hellinger consistency at rate $\epsilon_n$ with GGV-priors. Other choices of $B_n$ are possible! (See Kleijn and Zhao (201x).)

22 Application to consistency II

Remark 22.1 Dirichlet posteriors $x^n \mapsto \Pi(P(A) \in \cdot \mid X^n)$ are measurable w.r.t. $\sigma_n(A)$, where $\sigma_n(A)$ is generated by products of the form $\cap_{i=1}^n B_i$ with $B_i = \{X_i \in A\}$ or $B_i = \{X_i \notin A\}$.

Remark 22.2 (Freedman (1965), Ferguson (1973), Lo (1984), ...) Take $P_0 \in \mathscr{P}$, and define
$$V_n = V := \{P \in \mathscr{P} : |(P_0 - P)f| \ge 2\epsilon\}, \qquad B_n = B := \{P : |(P_0 - P)f| < \epsilon\}$$
for some bounded, measurable $f$. Impose remote contiguity only for $\psi_n$ that are $\sigma_n(A)$-measurable! Take $a_n$ and $b_n$ of the form $e^{-nK}$. The theorem then proves $T_1$-consistency with a Dirichlet prior $D_\alpha$, if $\mathrm{supp}(P_0) \subset \mathrm{supp}(\alpha)$.

23 Stochastic Block Model

Definition 23.1 At step $n$, nodes belong to one of $K_n$ unobserved classes: $\theta_i$. We estimate $\theta = (\theta_1, \ldots, \theta_n) \in \Theta_n$ upon observation of $X_n = \{X_{ij} : 1 \le i < j \le n\}$. Edges $X_{ij}$ occur independently with probabilities $Q_{ij}(\theta) = Q(\theta_i, \theta_j)$. The (expected) degree is denoted $\lambda_n$.

[Figure: an SBM network realisation with $n = 17$, $K_n = 3$.]
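
A minimal simulation of Definition 23.1, with hypothetical parameter values chosen to echo the pictured realisation ($n = 17$, $K_n = 3$):

```python
import numpy as np

def sample_sbm(n, K, Q, rng=None):
    """Sample a stochastic block model: theta_i uniform on {0,...,K-1},
    edge X_ij ~ Bernoulli(Q[theta_i, theta_j]) independently for i < j."""
    rng = rng or np.random.default_rng()
    theta = rng.integers(K, size=n)            # unobserved class labels
    X = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            X[i, j] = X[j, i] = rng.binomial(1, Q[theta[i], theta[j]])
    return theta, X

# Hypothetical edge probabilities: 0.7 within a class, 0.1 between classes.
Q = np.full((3, 3), 0.1) + 0.6 * np.eye(3)
theta, X = sample_sbm(17, 3, Q)
print("mean degree:", X.sum(axis=1).mean())    # empirical counterpart of lambda_n
```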

24 Testing in the Stochastic Block Model

Assume there is a $q$ s.t. $0 < q < Q_{ij} < 1 - q < 1$.

Lemma 24.1 For given $\theta, \theta' \in \Theta_n$, there exists a test $\phi_n$ s.t.
$$P_{\theta,n} \phi_n \vee P_{\theta',n}(1 - \phi_n) \le e^{-8q(1-q) \sum_{i<j} (Q_{ij}(\theta) - Q_{ij}(\theta'))^2}.$$

Lemma 24.2 For given $B_n, V_n \subset \Theta_n$, there exists a test $\phi_n$ s.t.
$$\sup_{\theta \in B_n} P_{\theta,n} \phi_n \le e^{-8q(1-q)\, a_n^2 + \log \#(V_n)}, \qquad \sup_{\theta' \in V_n} P_{\theta',n}(1 - \phi_n) \le e^{-8q(1-q)\, a_n^2 + \log \#(B_n)},$$
where $a_n^2 = \inf_{\theta \in B_n} \inf_{\theta' \in V_n} \sum_{i<j} (Q_{ij}(\theta) - Q_{ij}(\theta'))^2$.

Note: $\log \#(V_n), \log \#(B_n) \le n \log(K_n)$, since $\#\Theta_n = K_n^n$.

25 Consistency in the Stochastic Block Model

Theorem 25.1 (Bickel, Chen (2009)) If $K_n = K$ and $\lambda_n / \log(n) \to \infty$, the ML estimator for $\theta$ is consistent. Consistency requires not a single mistake in $\theta = (\theta_1, \theta_2, \ldots)$.

Conjecture 25.2 Give $\Theta_n$ a prior $\Pi_n$ s.t. $\Pi_n(\{\theta\}) > 0$ for all $\theta \in \Theta_n$. Unless very special conditions on $q$, $K_n$, $\lambda_n$ are satisfied, the posterior is not consistent.

Theorem 25.3 (see also Choi, Wolfe, Airoldi (2011)) Given uniform priors $\Pi_n$ on $\Theta_n$, posteriors are consistent for hypotheses $B_n, V_n \subset \Theta_n$ if, for all $n$ large enough,
$$4q_n(1 - q_n)\, a_n^2 \ge n \log(K_n).$$

26 Open questions

Tailfreeness is too strong for (weak or $T_1$) consistency; the Dirichlet example allows generalization. Can we show that Pitman-Yor is inconsistent? What about inverse-Gaussian? Gibbs-type? Can we construct a family of consistent priors around Dirichlet?

Which pairs of (FDR-type) hypotheses in the SBM are testable and which are not? Can the method be applied to other network models?

What is the extent of the usefulness of remote contiguity?

27 The weak and $T_1$ topologies

Uniformity $\mathscr{U}_n$: basis is finite intersections of the $W_{n,f}$,
$$W_{n,f} = \{(P, Q) : |(P^n - Q^n)f| < \epsilon\} \qquad (\text{bounded measurable } f : \mathscr{X}^n \to [0,1]),$$
and $\mathscr{U}_\infty = \bigcup_n \mathscr{U}_n \subset \mathscr{U}_H$. Weak: $\mathscr{U}_C \subset \mathscr{U}_1$, for bounded continuous $f : \mathscr{X} \to [0,1]$.

$T_C \subset T_1 \subset T_n \subset T_\infty \subset T_H$ are completely regular.
$(\mathscr{P}, T_1)$ separable $\Leftrightarrow$ $(\mathscr{P}, T_\infty)$ separable $\Leftrightarrow$ $(\mathscr{P}, T_H)$ separable $\Leftrightarrow$ $\mathscr{P}$ is dominated.
Any model $\mathscr{P}$ is pre-compact in $T_C, T_1, T_n, T_\infty$; any complete $\mathscr{P}$ is compact.
$T_C$ is metrizable, even Polish if $\mathscr{X}$ is Polish.
$T_1, T_n, T_\infty$ are metrizable iff $\mathscr{X}$ is discrete.
