Priors for the frequentist, consistency beyond Schwartz
1 Victoria University, Wellington, New Zealand, 11 January 2016

Priors for the frequentist, consistency beyond Schwartz

Bas Kleijn, KdV Institute for Mathematics
2 Part I Introduction
3 Bayesian and Frequentist statistics

sample space: (X, B), a measurable space; i.i.d. data X^n = (X_1, ..., X_n) ∈ X^n
model: (P, G), with model subsets B, V ∈ G
prior: Π : G → [0, 1], a probability measure
posterior: Π(· | X^n) : G → [0, 1], via Bayes's rule; basis for inference

Frequentist: assume there is a true P_0; X^n ~ P_0^n
Bayesian: assume P ~ Π; X^n | P ~ P^n
4 Definition of the posterior

Definition 4.1 Assume that all P ↦ P^n(A) are G-measurable. Given a prior Π, a posterior is any Π(· | X^n = ·) : G × X^n → [0, 1] such that

(i) for any G ∈ G, x^n ↦ Π(G | X^n = x^n) is B^n-measurable;
(ii) (disintegration) for all A ∈ B^n and G ∈ G,

  ∫_A Π(G | X^n) dP^Π_n = ∫_G P^n(A) dΠ(P),

where P^Π_n = ∫ P^n dΠ(P) is the prior predictive distribution.

Remark 4.2 For frequentists (X_1, ..., X_n) ~ P_0^n, so assume P_0^n ≪ P^Π_n.
5 Asymptotic consistency of the posterior

[Figure: posterior densities concentrating as n grows, panels for n = 0, 1, ..., 9, 16, ..., 36, 64, ..., 144, 256, ...]

Definition 5.1 Given a model P with a topology and a Borel prior Π, the posterior is consistent at P ∈ P if, for every open neighbourhood U of P,

  Π(U | X^n) → 1 in P-probability.
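For intuition, posterior consistency can be observed numerically in a toy setting. The sketch below (plain Python, a hypothetical illustration not taken from the talk) computes the posterior over a finite Bernoulli model by Bayes's rule and checks that it concentrates on a neighbourhood of the true parameter:

```python
import math

def posterior(model, prior, data):
    # Bayes's rule over a finite model: Pi(p | x^n) is proportional to
    # prior(p) * likelihood(p), normalised to a probability vector.
    weights = []
    for p, w in zip(model, prior):
        loglik = sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)
        weights.append(w * math.exp(loglik))
    total = sum(weights)
    return [w / total for w in weights]

# Finite Bernoulli model with uniform prior; the data stream has empirical
# frequency exactly 0.3, mimicking an i.i.d. sample from P_0 = Bernoulli(0.3).
model = [k / 10 for k in range(1, 10)]
prior = [1 / 9] * 9
data = ([1] * 3 + [0] * 7) * 20                  # n = 200 observations

post = posterior(model, prior, data)
mass_U = post[1] + post[2] + post[3]             # neighbourhood U = {0.2, 0.3, 0.4}
```

With n = 200 nearly all posterior mass sits on the neighbourhood U of p_0 = 0.3, in line with Definition 5.1; shrinking U and growing n reproduces the concentration pictured on the slide.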
6 Doob's and Schwartz's consistency theorems

Theorem 6.1 (Doob (1948)) Let P and X be Polish spaces. Assume that P ↦ P^n(A) is Borel measurable for all n, A. Then, for any prior Π, the posterior is consistent at P for Π-almost-all P ∈ P.

Remark 6.2 (Schwartz (1961), Freedman (1963)) Not frequentist!

Theorem 6.3 (Schwartz (1965)) Let X_1, X_2, ... be an i.i.d. sample from P_0 ∈ P. Let P be Hellinger totally bounded and let Π be a Kullback-Leibler (KL) prior, i.e.

  Π( P ∈ P : -P_0 log dP/dP_0 < ε ) > 0  for all ε > 0.

Then the posterior is consistent at P_0 in the Hellinger topology.
7 The Dirichlet process

Definition 7.1 (Dirichlet distribution) A random vector p = (p_1, ..., p_k) with p_l ≥ 0 and Σ_l p_l = 1 is Dirichlet distributed with parameter α = (α_1, ..., α_k), written p ~ D_α, if it has density

  f_α(p) = C(α) ∏_{l=1}^k p_l^{α_l - 1}.

Definition 7.2 (Dirichlet process, Ferguson (1973)) Let α be a finite measure on (X, B). The Dirichlet process P ~ D_α is defined by requiring, for all finite measurable partitions A = {A_1, ..., A_k} of X,

  ( P(A_1), ..., P(A_k) ) ~ D_{(α(A_1), ..., α(A_k))}.
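A Dirichlet vector can be simulated through the standard Gamma representation: Y_l ~ Gamma(α_l, 1) independently and p_l = Y_l / Σ_m Y_m gives p ~ D_α. A minimal stdlib-Python sketch (a hypothetical illustration, not part of the slides) checking the marginal means E p_l = α_l / |α|:

```python
import random

def sample_dirichlet(alpha, rng):
    # Gamma representation of the Dirichlet distribution:
    # Y_l ~ Gamma(alpha_l, 1) independent, p_l = Y_l / sum_m Y_m.
    ys = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(ys)
    return [y / s for y in ys]

rng = random.Random(42)
alpha = [2.0, 3.0, 5.0]
draws = [sample_dirichlet(alpha, rng) for _ in range(20000)]
means = [sum(d[l] for d in draws) / len(draws) for l in range(len(alpha))]
# E p_l = alpha_l / |alpha|, here (0.2, 0.3, 0.5)
```

The same Gamma construction underlies samplers for the Dirichlet process on finite partitions, via Definition 7.2.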
8 Weak consistency with Dirichlet priors

Theorem 8.1 (Dirichlet consistency) Let X_1, X_2, ... be an i.i.d. sample from P_0. If Π is a Dirichlet prior D_α with finite α such that supp(P_0) ⊂ supp(α), the posterior is consistent at P_0 in the weak model topology.

Remark 8.2 Priors are not necessarily KL priors for consistency.

Remark 8.3 (Freedman (1965)) Dirichlet distributions are tailfree: if A' refines A and A_{i,1} ∪ ... ∪ A_{i,l_i} = A_i, then

  ( P(A_{i,1} | A_i), ..., P(A_{i,l_i} | A_i) : 1 ≤ i ≤ k )

is independent of (P(A_1), ..., P(A_k)).

Remark 8.4 x^n ↦ Π(P(A) ∈ · | X^n) is σ_n(A)-measurable, where σ_n(A) is generated by products of the form ∩_{i=1}^n B_i with B_i = {X_i ∈ A} or B_i = {X_i ∉ A}.
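Conjugacy makes Theorem 8.1 tangible on a finite partition: given multinomial cell counts n_l, the D_α posterior for (P(A_1), ..., P(A_k)) is D_{α + n}, so the posterior mean (α_l + n_l)/(|α| + n) tracks the empirical frequencies. A small stdlib-Python sketch (hypothetical illustration; the partition and parameters are invented for the demo):

```python
import random

def dirichlet_posterior_mean(alpha, counts):
    # D_alpha prior + multinomial counts n -> D_{alpha + n} posterior;
    # posterior mean of P(A_l) is (alpha_l + n_l) / (|alpha| + n).
    a, n = sum(alpha), sum(counts)
    return [(alpha[l] + counts[l]) / (a + n) for l in range(len(alpha))]

rng = random.Random(7)
p0 = [0.5, 0.3, 0.2]                 # true cell probabilities of P_0
alpha = [1.0, 1.0, 1.0]              # prior base measure on the partition
counts = [0, 0, 0]
for _ in range(5000):                # i.i.d. draws from P_0, tallied per cell
    u, cum = rng.random(), 0.0
    for l, q in enumerate(p0):
        cum += q
        if u < cum:
            counts[l] += 1
            break

post_mean = dirichlet_posterior_mean(alpha, counts)
```

As n grows, the posterior mean of each P(A_l) converges to P_0(A_l): exactly the weak-topology concentration the theorem asserts, seen one partition at a time.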
9 Bayesian and Frequentist testability

Let B, V ∈ G be two (disjoint) model subsets.

Definition 9.1 (Uniform (or minimax) testability) There are tests (φ_n) with

  sup_{P ∈ B} P^n φ_n → 0,  sup_{Q ∈ V} Q^n (1 - φ_n) → 0.

Definition 9.2 (Pointwise testability) For all P ∈ B, Q ∈ V,

  φ_n → 0 P-a.s.,  φ_n → 1 Q-a.s.

Definition 9.3 (Bayesian testability) For Π-almost-all P ∈ B, Q ∈ V,

  φ_n → 0 P-a.s.,  φ_n → 1 Q-a.s.
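A concrete instance of Definition 9.1: for B = {Bernoulli(0.3)} and V = {Bernoulli(0.7)}, the threshold test φ_n = 1{Σ_i X_i ≥ n/2} is uniformly consistent, with both error probabilities vanishing exponentially fast (Hoeffding). The sketch below (stdlib Python, a hypothetical illustration) evaluates the exact binomial error probabilities:

```python
import math

def binom_tail(n, p, k):
    # Exact upper tail P(Bin(n, p) >= k).
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# phi_n = 1{ sum_i X_i >= n/2 }; errors of the test at sample size n:
#   type I  = P^n phi_n        for P = Bernoulli(0.3)  (the set B)
#   type II = Q^n (1 - phi_n)  for Q = Bernoulli(0.7)  (the set V)
errors = {}
for n in (10, 50, 200):
    k = math.ceil(n / 2)
    errors[n] = (binom_tail(n, 0.3, k), 1 - binom_tail(n, 0.7, k))
```

Both error sequences decrease to zero as n grows, so the pair (B, V) is uniformly testable; pointwise and Bayesian testability follow a fortiori.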
10 Part II Bayesian testability and prior-a.s.-consistency
11 A posterior concentration inequality

Lemma 11.1 Let (P, G) be given. For any prior Π, any test function φ and any B, V ∈ G such that B ∩ V = ∅ and Π(B) > 0,

  ∫_B P Π(V | X) dΠ(P) ≤ ∫_B P φ dΠ(P) + ∫_V Q (1 - φ) dΠ(Q).

Corollary 11.2 Consequently, in the i.i.d. context, for any sequences (Π_n), (B_n), (V_n) such that B_n ∩ V_n = ∅ and Π_n(B_n) > 0, we have

  ∫ P^n Π(V_n | X^n) dΠ_n(P | B_n) ≤ (1 / Π_n(B_n)) ( ∫_{B_n} P^n φ_n dΠ_n(P) + ∫_{V_n} Q^n (1 - φ_n) dΠ_n(Q) ).
12 Martingale convergence

Proposition 12.1 Let (P, G, Π) be given. For any B, V ∈ G, the following are equivalent:

(i) there exist Bayesian tests (φ_n) for B versus V;
(ii) there exist tests (φ_n) such that

  ∫_B P^n φ_n dΠ(P) → 0,  ∫_V Q^n (1 - φ_n) dΠ(Q) → 0;

(iii) for Π-almost-all P ∈ B, Q ∈ V,

  Π(V | X^n) → 0 P-a.s.,  Π(B | X^n) → 0 Q-a.s.

Remark 12.2 (Interpretation) Distinctions between model subsets are Bayesian testable, iff they are picked up by the posterior asymptotically, if(f) the Bayes factor for B versus V is consistent.
13 Prior-almost-sure consistency

Theorem 13.1 Let a Hausdorff model P with Borel prior Π be given. Assume that, for Π-almost-all P ∈ P and any open neighbourhood U of P, there exist a B ⊂ U with Π(B) > 0 and Bayesian tests (φ_n) for B versus P \ U. Then the posterior is consistent at Π-almost-all P ∈ P.

Remark 13.2 Let P be a Polish space and assume that all P ↦ P^n(A) are Borel measurable. Then, for any prior Π, any Borel set V ⊂ P is Bayesian testable versus P \ V.

Corollary 13.3 Doob's theorem (1948), and much more!
14 Part III Frequentism
15 Le Cam's inequality

Definition 15.1 For B ∈ G such that Π(B) > 0, the local prior predictive distribution is P^Π_{n|B} = ∫ P^n dΠ(P | B).

Remark 15.2 (Le Cam, unpublished (197?) and (1986)) Rewrite the posterior concentration inequality:

  P_0^n Π(V_n | X^n) ≤ ‖P_0^n - P^Π_{n|B_n}‖ + ∫ P^n φ_n dΠ(P | B_n) + (Π(V_n) / Π(B_n)) ∫ Q^n (1 - φ_n) dΠ(Q | V_n).

Remark 15.3 For some b_n ↓ 0, take B_n = {P ∈ P : ‖P^n - P_0^n‖ ≤ b_n}, with prior mass lower bound a_n ≤ Π(B_n).

Remark 15.4 Useful in parametric models, but "a considerable nuisance" (Le Cam (1986)) in the non-parametric context.
16 Schwartz's theorem revisited

Remark 16.1 Suppose that for all δ > 0 there is a B s.t. Π(B) > 0 and, for all P ∈ B and large enough n,

  P_0^n Π(V | X^n) ≤ e^{nδ} P^n Π(V | X^n);

then (by Fatou) for large enough m,

  sup_{n ≥ m} [ (P_0^n - e^{nδ} P^Π_{n|B}) Π(V | X^n) ] ≤ 0.

Theorem 16.2 Let P be a model with KL prior Π and P_0 ∈ P. Let B, V ∈ G be given and assume that B contains a KL neighbourhood of P_0. If there exist Bayesian tests for B versus V of exponential power, then

  Π(V | X^n) → 0, P_0-almost surely.

Corollary 16.3 Schwartz's theorem (1965).
17 Remote contiguity

Definition 17.1 Given sequences (P_n), (Q_n) of probability measures, Q_n is contiguous w.r.t. P_n (notation Q_n ◁ P_n) if, for any (ψ_n), ψ_n : X^n → [0, 1],

  P_n ψ_n = o(1)  implies  Q_n ψ_n = o(1).

Definition 17.2 Given (P_n), (Q_n) of probability measures and a_n ↓ 0, Q_n is a_n-remotely contiguous w.r.t. P_n (notation Q_n ◁ a_n^{-1} P_n) if, for any sequence (ψ_n), ψ_n : X^n → [0, 1],

  P_n ψ_n = o(a_n)  implies  Q_n ψ_n = o(1).

Remark 17.3 Contiguity is stronger than remote contiguity: note that Q_n ◁ P_n iff Q_n ◁ a_n^{-1} P_n for all a_n ↓ 0.

Definition 17.4 The Hellinger transform is ρ_n(α) = ∫ (dP_n)^α (dQ_n)^{1-α}.
18 Le Cam's first lemma

Lemma 18.1 Given (P_n), (Q_n) as above, Q_n ◁ P_n iff any of the following holds:

(i) if T_n → 0 in P_n-probability, then T_n → 0 in Q_n-probability;
(ii) given ε > 0, there is a b > 0 such that Q_n(dQ_n/dP_n > b) < ε;
(iii) given ε > 0, there is a c > 0 such that ‖Q_n - Q_n ∧ c P_n‖ < ε;
(iv) if dP_n/dQ_n → f Q_n-weakly along a subsequence, then P(f > 0) = 1;
(v) if dQ_n/dP_n → g P_n-weakly along a subsequence, then E g = 1;
(vi) lim inf_n ρ_n(α) → 1 as α ↑ 1.
19 Criteria for remote contiguity

Lemma 19.1 Given (P_n), (Q_n) and a_n ↓ 0, Q_n ◁ a_n^{-1} P_n if any of the following holds:

(i) if P_n(T_n > ε) = o(a_n) for all ε > 0, then T_n → 0 in Q_n-probability;
(ii) given ε > 0, there is a b > 0 such that Q_n(dQ_n/dP_n > b a_n^{-1}) < ε;
(iii) given ε > 0, there is a c > 0 such that ‖Q_n - Q_n ∧ c a_n^{-1} P_n‖ < ε;
(iv) if a_n^{-1} dP_n/dQ_n → f Q_n-weakly along a subsequence, then P(f > 0) = 1;
(v) if a_n dQ_n/dP_n → g P_n-weakly along a subsequence, then E g = 1;
(vi) lim sup_n inf_α a_n^{-α} ρ_n(α) = 0.
20 Beyond Schwartz

Theorem 20.1 Let (P, G) with priors (Π_n) and (X_1, ..., X_n) ~ P_0^n be given. Assume there are B_n, V_n ∈ G and a_n, b_n > 0 with a_n ↓ 0 s.t.

(i) there exist Bayesian tests for B_n versus V_n of power a_n,

  ∫_{B_n} P^n φ_n dΠ_n(P) + ∫_{V_n} Q^n (1 - φ_n) dΠ_n(Q) ≤ a_n;

(ii) the prior mass of B_n is lower-bounded by b_n, Π_n(B_n) ≥ b_n;

(iii) the sequence P_0^n satisfies P_0^n ◁ b_n a_n^{-1} P^Π_{n|B_n}.

Then Π_n(V_n | X^n) → 0 in P_0^n-probability.
21 Application to consistency I

Remark 21.1 (Schwartz (1965)) Take P_0 ∈ P, and define

  V_n = V := {P ∈ P : H(P, P_0) ≥ ε},
  B_n = B := {P : -P_0 log dP/dP_0 < ε²},

with a_n and b_n of the form exp(-nK). With N(ε, P, H) < ∞, the theorem proves Hellinger consistency with KL priors.

Remark 21.2 (Ghosal-Ghosh-van der Vaart (2000)) Take P_0 ∈ P, and define

  V_n = {P ∈ P : H(P, P_0) ≥ ε_n},
  B_n = {P : -P_0 log dP/dP_0 < ε_n², P_0 log² dP/dP_0 < ε_n²},

with a_n and b_n of the form exp(-K n ε_n²). With log N(ε_n, P, H) ≤ n ε_n², the theorem then proves Hellinger consistency at rate ε_n with GGV priors. Other choices of B_n are possible! (See Kleijn and Zhao (201x).)
22 Application to consistency II

Remark 22.1 Dirichlet posteriors x^n ↦ Π(P(A) ∈ · | X^n) are measurable w.r.t. σ_n(A), where σ_n(A) is generated by products of the form ∩_{i=1}^n B_i with B_i = {X_i ∈ A} or B_i = {X_i ∉ A}.

Remark 22.2 (Freedman (1965), Ferguson (1973), Lo (1984), ...) Take P_0 ∈ P, and define

  V_n = V := {P ∈ P : |(P_0 - P) f| ≥ 2ε},
  B_n = B := {P : |(P_0 - P) f| < ε},

for some bounded, measurable f. Impose remote contiguity only for ψ_n that are σ_n(A)-measurable! Take a_n and b_n of the form exp(-nK). The theorem then proves T_1 consistency with a Dirichlet prior D_α, if supp(P_0) ⊂ supp(α).
23 Stochastic Block Model

Definition 23.1 At step n, nodes belong to one of K_n unobserved classes: θ_i. We estimate θ = (θ_1, ..., θ_n) ∈ Θ_n upon observation of X^n = {X_ij : 1 ≤ i < j ≤ n}. Edges X_ij occur independently with probabilities Q_ij(θ) = Q(θ_i, θ_j). The (expected) degree is denoted λ_n.

[Figure: an SBM network realisation with n = 17, K_n = 3 and expected degree λ_n]
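A realisation like the one pictured can be generated directly from Definition 23.1: draw each edge X_ij independently with probability Q(θ_i, θ_j). A stdlib-Python sketch (hypothetical illustration; the two-class connectivity matrix below is an invented choice satisfying 0 < q < Q_ij < 1 - q):

```python
import random

def sample_sbm(theta, Q, rng):
    # Independent edge indicators X_ij ~ Bernoulli(Q(theta_i, theta_j)),
    # one per unordered pair 1 <= i < j <= n.
    n = len(theta)
    return {(i, j): int(rng.random() < Q[theta[i]][theta[j]])
            for i in range(n) for j in range(i + 1, n)}

rng = random.Random(0)
Q = [[0.6, 0.1],                     # assortative: within-class edges likelier
     [0.1, 0.6]]
theta = [0] * 10 + [1] * 10          # n = 20 nodes in K = 2 balanced classes
X = sample_sbm(theta, Q, rng)

within = sum(x for (i, j), x in X.items() if theta[i] == theta[j])
between = sum(x for (i, j), x in X.items() if theta[i] != theta[j])
```

The within/between edge counts make the block structure visible in a single draw, which is what the testing bounds on the next slide exploit through the separation Σ_{i<j} (Q_ij(θ) - Q_ij(θ'))².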
24 Testing in the Stochastic Block Model

Assume there is a q s.t. 0 < q < Q_ij < 1 - q < 1.

Lemma 24.1 For given θ, θ' ∈ Θ_n, there exists a test φ_n s.t.

  P_{θ,n} φ_n ∨ P_{θ',n} (1 - φ_n) ≤ e^{-8 q(1-q) Σ_{i<j} (Q_ij(θ) - Q_ij(θ'))²}.

Lemma 24.2 For given B_n, V_n ⊂ Θ_n, there exists a test φ_n s.t.

  sup_{θ ∈ B_n} P_{θ,n} φ_n ≤ e^{-8 q(1-q) a_n² + log #(V_n)},
  sup_{θ' ∈ V_n} P_{θ',n} (1 - φ_n) ≤ e^{-8 q(1-q) a_n² + log #(B_n)},

where a_n² = inf_{θ ∈ B_n} inf_{θ' ∈ V_n} Σ_{i<j} (Q_ij(θ) - Q_ij(θ'))².

Note: log #(V_n), log #(B_n) ≤ n log(K_n).
25 Consistency in the Stochastic Block Model

Theorem 25.1 (Bickel, Chen (2009)) If K_n = K and λ_n / log(n) → ∞, the ML estimator for θ is consistent. Consistency requires not a single mistake in θ = (θ_1, θ_2, ...).

Conjecture 25.2 Give Θ_n a prior Π_n s.t. Π_n({θ}) > 0 for all θ ∈ Θ_n. Unless very special conditions on q, K_n, λ_n are satisfied, the posterior is not consistent.

Theorem 25.3 (see also Choi, Wolfe, Airoldi (2011)) Given uniform priors Π_n on Θ_n, posteriors are consistent for hypotheses B_n, V_n ⊂ Θ_n if, for all n large enough,

  4 q_n (1 - q_n) a_n² ≥ n log(K_n).
26 Open questions

- Tailfreeness is too strong for (weak or T_1) consistency; the Dirichlet example allows generalization. Can we show that Pitman-Yor is inconsistent? What about inverse-Gaussian? Gibbs-type? Can we construct a family of consistent priors around Dirichlet?
- Which pairs of (FDR-type) hypotheses in the SBM are testable and which are not? Can the method be applied to other network models?
- What is the extent of the usefulness of remote contiguity?
27 The weak and T_1 topologies

Uniformity U_n: a basis is given by finite intersections of the sets

  W_{n,f} = {(P, Q) : |(P^n - Q^n) f| < ε},  (bounded measurable f : X^n → [0, 1]),

and U_∞ = ∪_n U_n ⊂ U_H. Weak uniformity: U_C ⊂ U_1, for bounded continuous f : X → [0, 1].

T_C ⊂ T_1 ⊂ T_n ⊂ T_∞ ⊂ T_H are completely regular.

(P, T_1) separable ⇐ (P, T_∞) separable ⇐ (P, T_H) separable ⇐ P is dominated.

Any model P is pre-compact in T_C, T_1, T_n, T_∞; any complete P is compact.

T_C is metrizable, even Polish if X is Polish. T_1, T_n, T_∞ are metrizable iff X is discrete.
Probability and Measure Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Convergence of Random Variables 1. Convergence Concepts 1.1. Convergence of Real
More informationInformation Theory and Statistics, Part I
Information Theory and Statistics, Part I Information Theory 2013 Lecture 6 George Mathai May 16, 2013 Outline This lecture will cover Method of Types. Law of Large Numbers. Universal Source Coding. Large
More informationOn the Uniform Asymptotic Validity of Subsampling and the Bootstrap
On the Uniform Asymptotic Validity of Subsampling and the Bootstrap Joseph P. Romano Departments of Economics and Statistics Stanford University romano@stanford.edu Azeem M. Shaikh Department of Economics
More informationST5215: Advanced Statistical Theory
Department of Statistics & Applied Probability Wednesday, October 5, 2011 Lecture 13: Basic elements and notions in decision theory Basic elements X : a sample from a population P P Decision: an action
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d
More informationBandits : optimality in exponential families
Bandits : optimality in exponential families Odalric-Ambrym Maillard IHES, January 2016 Odalric-Ambrym Maillard Bandits 1 / 40 Introduction 1 Stochastic multi-armed bandits 2 Boundary crossing probabilities
More informationNonparametric one-sided testing for the mean and related extremum problems
Nonparametric one-sided testing for the mean and related extremum problems Norbert Gaffke University of Magdeburg, Faculty of Mathematics D-39016 Magdeburg, PF 4120, Germany E-mail: norbert.gaffke@mathematik.uni-magdeburg.de
More informationTranslation Invariant Experiments with Independent Increments
Translation Invariant Statistical Experiments with Independent Increments (joint work with Nino Kordzakhia and Alex Novikov Steklov Mathematical Institute St.Petersburg, June 10, 2013 Outline 1 Introduction
More informationThe Multi-Arm Bandit Framework
The Multi-Arm Bandit Framework A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course In This Lecture A. LAZARIC Reinforcement Learning Algorithms Oct 29th, 2013-2/94
More informationOn Reparametrization and the Gibbs Sampler
On Reparametrization and the Gibbs Sampler Jorge Carlos Román Department of Mathematics Vanderbilt University James P. Hobert Department of Statistics University of Florida March 2014 Brett Presnell Department
More informationCompressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery
Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad
More informationStanford Mathematics Department Math 205A Lecture Supplement #4 Borel Regular & Radon Measures
2 1 Borel Regular Measures We now state and prove an important regularity property of Borel regular outer measures: Stanford Mathematics Department Math 205A Lecture Supplement #4 Borel Regular & Radon
More informationMinimax Estimation of Kernel Mean Embeddings
Minimax Estimation of Kernel Mean Embeddings Bharath K. Sriperumbudur Department of Statistics Pennsylvania State University Gatsby Computational Neuroscience Unit May 4, 2016 Collaborators Dr. Ilya Tolstikhin
More informationLecture 8: Information Theory and Statistics
Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and Estimation I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 22, 2015
More informationStatistical Approaches to Learning and Discovery. Week 4: Decision Theory and Risk Minimization. February 3, 2003
Statistical Approaches to Learning and Discovery Week 4: Decision Theory and Risk Minimization February 3, 2003 Recall From Last Time Bayesian expected loss is ρ(π, a) = E π [L(θ, a)] = L(θ, a) df π (θ)
More informationUseful Probability Theorems
Useful Probability Theorems Shiu-Tang Li Finished: March 23, 2013 Last updated: November 2, 2013 1 Convergence in distribution Theorem 1.1. TFAE: (i) µ n µ, µ n, µ are probability measures. (ii) F n (x)
More information6 Bayesian Methods for Function Estimation
hs25 v.2005/04/21 Prn:19/05/2005; 15:11 F:hs25013.tex; VTEX/VJ p. 1 1 Handbook of Statistics, Vol. 25 ISSN: 0169-7161 2005 Elsevier B.V. All rights reserved. DOI 10.1016/S0169-7161(05)25013-7 1 6 Bayesian
More informationP (A G) dp G P (A G)
First homework assignment. Due at 12:15 on 22 September 2016. Homework 1. We roll two dices. X is the result of one of them and Z the sum of the results. Find E [X Z. Homework 2. Let X be a r.v.. Assume
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationBayesian nonparametrics
Bayesian nonparametrics 1 Some preliminaries 1.1 de Finetti s theorem We will start our discussion with this foundational theorem. We will assume throughout all variables are defined on the probability
More information