Differential Stein operators for multivariate continuous distributions and applications
Gesine Reinert

A French/American Collaborative Colloquium on Concentration Inequalities, High Dimensional Statistics and Stein's Method, July 4th, 2017

Joint work with Guillaume Mijoule and Yvik Swan (Liège)
Outline

1. Stein's method
2. The score function and the Stein kernel
3. Higher dimensions
4. Stein operators T_p F = div(F p)/p
5. Last remarks
Stein's method in a nutshell

For \mu a target distribution with support I:

1. Find a suitable operator A (called a Stein operator) and a wide class of functions F(A) (called a Stein class) such that X \sim \mu if and only if E[A f(X)] = 0 for all functions f \in F(A).

2. Let H(I) be a measure-determining class on I. For each h \in H find a solution f = f_h \in F(A) of the Stein equation

  h(x) - E h(X) = A f(x),  where X \sim \mu.

If the solution exists and is unique in F(A), then we can write f(x) = A^{-1}(h(x) - E h(X)). We call A^{-1} the inverse Stein operator (for \mu).
Example: mean-zero normal

Stein (1972, 1986); see also Chen, Goldstein, Shao (2011). Z \sim N(0, \sigma^2) if and only if for all smooth functions f,

  E[Z f(Z)] = \sigma^2 E[f'(Z)].

Given a test function h, let Z \sim N(0, \sigma^2); the Stein equation is

  \sigma^2 f'(w) - w f(w) = h(w) - E h(Z),

which has as unique bounded solution

  f(y) = \frac{1}{\sigma^2} e^{y^2/(2\sigma^2)} \int_{-\infty}^{y} (h(x) - E h(Z)) e^{-x^2/(2\sigma^2)} dx.
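The characterising identity E[Z f(Z)] = \sigma^2 E[f'(Z)] is easy to check numerically. A minimal Monte Carlo sketch (not from the slides; the test function f and the value of \sigma are arbitrary choices):

```python
# Monte Carlo sanity check of E[Z f(Z)] = sigma^2 E[f'(Z)] for Z ~ N(0, sigma^2).
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.5
z = rng.normal(0.0, sigma, size=2_000_000)

f = np.sin                      # a smooth, bounded test function
fprime = np.cos

lhs = np.mean(z * f(z))
rhs = sigma**2 * np.mean(fprime(z))
print(abs(lhs - rhs))           # small: Monte Carlo error only
```

Any smooth f with E|Z f(Z)| < \infty would do here; the identity holds exactly, so the printed discrepancy is pure sampling noise.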
Example: the sum of independent random variables

Let X_1, \ldots, X_n be independent with mean zero and Var(X_i) = 1/n; W = \sum_{i=1}^n X_i. Then

  E f'(W) - E[W f(W)]
    = E f'(W) - \sum_{i=1}^n E[X_i f(W)]
    = E f'(W) - \sum_{i=1}^n E[X_i f(W - X_i)] - \sum_{i=1}^n E[X_i^2 f'(W - X_i)] + R
    = \frac{1}{n} \sum_{i=1}^n ( E f'(W) - E f'(W - X_i) ) + R,

using that E[X_i f(W - X_i)] = 0 and E[X_i^2 f'(W - X_i)] = \frac{1}{n} E f'(W - X_i) by independence. Bound this expression by Taylor expansion to give that for any smooth h,

  |E h(W) - N h| \le \|h''\| \Big( \frac{2}{n} + E \sum_{i=1}^n |X_i|^3 \Big).

Note: nothing goes to infinity.
Comparison of distributions

Let X and Y have distributions \mu_X and \mu_Y with Stein operators A_X and A_Y, so that F(A_X) \cap F(A_Y) \neq \emptyset, and choose H(I) such that all solutions f of the Stein equation belong to this intersection. Then

  E h(X) - E h(Y) = E A_Y f(X) = E A_Y f(X) - E A_X f(X)

and

  \sup_{h \in H(I)} |E h(X) - E h(Y)| \le \sup_{f \in F(A_X) \cap F(A_Y)} |E A_X f(X) - E A_Y f(X)|.

If H(I) is the set of all Lipschitz-1 functions, then the resulting distance is d_W, the Wasserstein distance.

For examples: Holmes (2004), Eichelsbacher and R. (2008), Döbler (2012), Ley, Swan and R. (2015), ...
A Stein operator for continuous real-valued variables

Let X be continuous with pdf p and support I = [a, b] \subseteq R. The Stein class of X is the class F(p) of functions f: R \to R such that

(i) x \mapsto f(x) p(x) is differentiable on R,
(ii) (fp)' is integrable and \int (fp)' = 0.

To p we associate the Stein operator T_p:

  T_p f = \frac{(fp)'}{p}.

(Stein 1986, Stein et al. 2004, Ley and Swan 2013)

By the product rule,

  E[g'(X) f(X)] = -E[g(X) T_p f(X)]

for all f \in F(p) and all differentiable functions g such that \int (gfp)' dx = 0 and \int |g'| f p \, dx < \infty; we then say that g \in dom((\cdot)', p, f).
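This integration-by-parts identity can be checked by Monte Carlo for a concrete density. A sketch with X \sim Exp(1) (my own choice, not from the slides): there T_p f = (f p)'/p = f' - f, and taking f(x) = x makes the boundary term at 0 vanish; g(x) = e^{-x} is an arbitrary test function.

```python
# Check E[g'(X) f(X)] = -E[g(X) T_p f(X)] for X ~ Exp(1), T_p f = f' - f.
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(1.0, size=2_000_000)

f = lambda t: t
Tpf = lambda t: 1.0 - t          # (f p)'/p for p(t) = exp(-t), f(t) = t
g = lambda t: np.exp(-t)
gprime = lambda t: -np.exp(-t)

lhs = np.mean(gprime(x) * f(x))  # exact value: -1/4
rhs = -np.mean(g(x) * Tpf(x))    # exact value: -1/4
print(abs(lhs - rhs))
```

Both sides equal -1/4 in closed form here, which makes the agreement easy to verify by hand as well.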
Stein characterisations

Let Y be continuous with density q and the same support as X.

1. Suppose that q/p is differentiable. Take g \in \bigcap_{f \in F(p)} dom((\cdot)', p, f) such that g is p-a.s. never 0 and g q/p is differentiable. Then Y =_D X if and only if

  E[f(Y) g'(Y)] = -E[g(Y) T_p f(Y)]  for all f \in F(p).

2. Let f \in F(p) be p-a.s. never zero and assume that dom((\cdot)', p, f) is dense in L^1(p). Then Y =_D X if and only if

  E[f(Y) g'(Y)] = -E[g(Y) T_p f(Y)]  for all g \in dom((\cdot)', p, f).
The inverse Stein operator

Let F^{(0)}(p) be the class of mean-zero smooth test functions; the inverse Stein operator T_p^{-1}: F^{(0)}(p) \to F(p) is

  T_p^{-1} h(x) = \frac{1}{p(x)} \int_a^x p(y) h(y) dy = -\frac{1}{p(x)} \int_x^b p(y) h(y) dy.

The equation

  h(x) - E h(X) = f(x) g'(x) + g(x) T_p f(x),  x \in I,

is a Stein equation for the target p. Solutions of this equation (for h such that a solution exists) are pairs of functions (f, g) such that fg = T_p^{-1}(h - E_p h). Although fg is unique, the individual f and g are not.
Special Stein operators

Our general Stein operator is an operator on pairs of functions (f, g):

  A(f, g)(x) = T_p(fg)(x) = f(x) g'(x) + g(x) T_p f(x).

Suppose that 1 \in F(p). Then taking f(x) = 1 we get

  A_p g(x) = g'(x) + g(x) \rho(x),  with \rho(x) = T_p 1(x) = \frac{p'(x)}{p(x)}

the so-called score function of p; see for example Stein (2004).

If X has finite mean \nu, taking f(x) = T_p^{-1}(\nu - x) we get

  A_X g(x) = \tau(x) g'(x) + (\nu - x) g(x),  with \tau = T_p^{-1}(\nu - Id)

the Stein kernel of p; see Stein (1986) and Cacoullos et al. (1992).
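The Stein-kernel operator A_X g = \tau g' + (\nu - x) g has mean zero under p, which can be illustrated numerically. A sketch for X \sim Exp(1) (my example, not from the slides), where \nu = 1 and the Stein kernel is the standard \tau(x) = x; g is an arbitrary smooth choice:

```python
# The exponential(1) Stein kernel is tau(x) = x; check that
# E[tau(X) g'(X) + (nu - X) g(X)] = 0 with nu = E[X] = 1.
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(1.0, size=2_000_000)

g = lambda t: np.exp(-t)
gprime = lambda t: -np.exp(-t)

val = np.mean(x * gprime(x) + (1.0 - x) * g(x))
print(abs(val))   # close to 0
```

Swapping in other smooth, integrable g leaves the estimate near zero, consistent with the characterisation.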
Example: Normal

For a N(0, \sigma^2) random variable,

  T_N f(x) = f'(x) - \frac{1}{\sigma^2} x f(x),

which contrasts with \sigma^2 f'(x) - x f(x), the standard Stein operator for this case. The score function is -x/\sigma^2. The Stein kernel is \tau(x) = \sigma^2, giving the standard Stein operator.
Notation

Let e_1, \ldots, e_d be the canonical basis for Cartesian coordinates in R^d. The gradient of \phi: R^d \to R is

  \nabla \phi = \Big( \frac{\partial \phi}{\partial x_1}, \ldots, \frac{\partial \phi}{\partial x_d} \Big)^T = \sum_{i=1}^d (\partial_i \phi) e_i.

The gradient of a vector field v: R^d \to R^r, x \mapsto (v_1(x), v_2(x), \ldots, v_r(x)) (a row vector) is the matrix

  \nabla v = ( \nabla v_1 \; \nabla v_2 \; \cdots \; \nabla v_r ) = \Big( \frac{\partial v_j}{\partial x_i} \Big)_{1 \le i \le d, 1 \le j \le r}.

If r = d, then the divergence of v is

  div(v) = \nabla^T v = \sum_{i=1}^d \frac{\partial v_i}{\partial x_i} = Tr(\nabla v),

with Tr the trace operator and x \cdot y = x^T y = \langle x, y \rangle the Euclidean scalar product between x and y.
More generally, the divergence of a q \times d tensor field

  F: R^d \to R^{q \times d},  x \mapsto F(x) = \begin{pmatrix} F_1(x) \\ \vdots \\ F_q(x) \end{pmatrix} = \begin{pmatrix} F_{11}(x) & \cdots & F_{1d}(x) \\ \vdots & \ddots & \vdots \\ F_{q1}(x) & \cdots & F_{qd}(x) \end{pmatrix}

is

  div(F) := \nabla \cdot F = \begin{pmatrix} div(F_1) \\ \vdots \\ div(F_q) \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^d \partial F_{1i}/\partial x_i \\ \vdots \\ \sum_{i=1}^d \partial F_{qi}/\partial x_i \end{pmatrix}.

The divergence maps matrix-valued functions F: R^d \to R^{q \times d} onto vector-valued functions div(F): R^d \to R^q.
Product rule for divergence

Let F: R^d \to R^{q \times d} be a q \times d tensor field and \phi: R^d \to R. Then, under appropriate regularity conditions,

  div(F \phi) = div(F) \phi + F \nabla \phi.

Similarly, if F is a q \times d tensor field and G is a d \times d tensor field, then FG is a q \times d tensor field and

  (div(FG))_j = F_j \, div(G) + Tr( grad(F_j) \, G )

for j = 1, \ldots, q.
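The first product rule is simple to verify by finite differences at a point. An illustrative sketch with q = 1 (so F is a vector field on R^2); the particular F, \phi, and evaluation point are arbitrary choices of mine:

```python
# Finite-difference check of div(F phi) = div(F) phi + F . grad(phi)
# for a vector field F: R^2 -> R^2 and a scalar phi: R^2 -> R.
import numpy as np

F = lambda p: np.array([p[0] ** 2, p[0] * p[1]])
phi = lambda p: np.sin(p[0] + p[1])

def div_num(field, p, h=1e-5):
    """Central-difference divergence of a vector field at point p."""
    return sum(
        (field(p + h * e)[i] - field(p - h * e)[i]) / (2 * h)
        for i, e in enumerate(np.eye(2))
    )

def grad_num(f, p, h=1e-5):
    """Central-difference gradient of a scalar function at point p."""
    return np.array([(f(p + h * e) - f(p - h * e)) / (2 * h) for e in np.eye(2)])

p0 = np.array([0.7, -0.3])
lhs = div_num(lambda q: F(q) * phi(q), p0)
rhs = div_num(F, p0) * phi(p0) + F(p0) @ grad_num(phi, p0)
print(abs(lhs - rhs))   # ~0 up to finite-difference error
```

The same check extends row by row to the q > 1 case, since the tensor rule acts on each row F_j separately.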
What is known: multivariate normal

Y \in R^d is multivariate normal MVN(0, \Sigma) if and only if

  E[Y^T \nabla f(Y)] = E[\nabla^T \Sigma \nabla f(Y)]

for all smooth f: R^d \to R.

Assume that h: R^d \to R has 3 bounded derivatives. Then, if \Sigma \in R^{d \times d} is symmetric and positive definite and Z \sim MVN(0, I_d), there is a solution f: R^d \to R to the Stein equation

  \nabla^T \Sigma \nabla f(w) - w^T \nabla f(w) = h(w) - E h(\Sigma^{1/2} Z),

which holds for every w \in R^d.
The Mehler formula

To solve \nabla^T \Sigma \nabla f(w) - w^T \nabla f(w) = h(w) - E h(\Sigma^{1/2} Z), for t \in [0, 1] put

  Z_{w,t} = \sqrt{t} \, w + \sqrt{1 - t} \, \Sigma^{1/2} Z;

then

  f(w) = -\int_0^1 \frac{1}{2t} \big[ E h(Z_{w,t}) - E h(\Sigma^{1/2} Z) \big] dt,  w \in R^d,

is a solution to the Stein equation. This solution f satisfies the bounds

  \sup_w \Big| \frac{\partial^k f(w)}{\partial w_{i_1} \cdots \partial w_{i_k}} \Big| \le \frac{1}{k} \sup_w \Big| \frac{\partial^k h(w)}{\partial w_{i_1} \cdots \partial w_{i_k}} \Big|

for every w \in R^d.

(Barbour 1990, Götze 1993, Rinott and Rotar 1996, Goldstein and Rinott 1996, R. and Röllin 2007, Meckes 2009, Chen, Goldstein and Shao 2011)
What is known: strictly log-concave (Mackey and Gorham 2016)

For a continuous density p on R^d such that log p \in C^4(R^d) is k-strictly concave, the operator

  A f(w) = \frac{1}{2} \langle \nabla f(w), \nabla \log p(w) \rangle + \frac{1}{2} \Delta f(w)

is the generator of an overdamped Langevin diffusion. The Stein equation A f(w) = h(w) - E_p h is solved by

  f(w) = \int_0^\infty \big[ E_p h - E h(Z_{w,t}) \big] dt,

with (Z_{w,t})_{t \ge 0} the overdamped Langevin diffusion with generator A and Z_{w,0} = w. The first three derivatives of f can be bounded in terms of same- and lower-order derivatives of h.
What is known: score functions (Nourdin et al. 2013, 2014)

Let X \in R^d have mean 0 and pdf p: R^d \to R. The score of p is the random vector \rho_p(X) \in R^d which satisfies

  E[\rho_p(X) \phi(X)] = -E[\nabla \phi(X)]  for all \phi \in C_c^\infty(R^d).

If p has a score, then it is uniquely defined through \rho_p(x) = \nabla \log p(x).
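The defining identity of the score is easy to test by simulation. A sketch for the standard normal on R^2 (my choice of example and test function; for this p, \rho_p(x) = \nabla \log p(x) = -x):

```python
# Monte Carlo check of E[rho_p(X) phi(X)] = -E[grad phi(X)] for X ~ N(0, I_2),
# where the score is rho_p(x) = -x.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=(2_000_000, 2))

phi = lambda v: np.sin(v[:, 0]) * np.cos(v[:, 1])
grad_phi = lambda v: np.stack(
    [np.cos(v[:, 0]) * np.cos(v[:, 1]), -np.sin(v[:, 0]) * np.sin(v[:, 1])],
    axis=1,
)

lhs = np.mean(-x * phi(x)[:, None], axis=0)   # E[rho_p(X) phi(X)], componentwise
rhs = -np.mean(grad_phi(x), axis=0)           # -E[grad phi(X)]
print(np.abs(lhs - rhs).max())                # small
```

The identity is just multivariate Gaussian integration by parts in this case, so the two vectors agree up to Monte Carlo noise.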
What is known: Stein kernels (Nourdin et al. 2013, 2014)

A random d \times d matrix \tau_p(X) such that

  E[\tau_p(X) \nabla \phi(X)] = E[X \phi(X)]  for all \phi \in C_c^\infty(R^d)

is called a strong Stein kernel for p.

Ledoux et al. 2015: \tau_p(X) is a weak Stein kernel if for all \phi \in C_c^\infty(R^d)

  E[ Tr( \tau_p(X) \{Hess \, \phi(X)\}^T ) ] = E[X \cdot \nabla \phi(X)].

There is no reason to assume uniqueness for the Stein kernel, or existence. If \tau_1 and \tau_2 are two Stein kernels for p, then for all \phi \in C_c^\infty(R^d),

  E[(\tau_1(X) - \tau_2(X)) \nabla \phi(X)] = 0;

then div(p(x)(\tau_1(x) - \tau_2(x))) = 0, from which we get uniqueness only in the one-dimensional case.
The general multivariate density case

Let X \in R^d have pdf p: R^d \to R with respect to the Lebesgue measure on R^d. Let \Omega be the support of p.

1. Let q \in N_0. The q-Stein class for X is the class F_q(X) of all F: R^d \to R^{q \times d} such that pF is (i) differentiable on \Omega, in the sense that its gradient exists, (ii) div(pF) is integrable, and (iii) \int_\Omega div(pF) = 0.

2. We propose as Stein operator of p the operator

  T_p F = \frac{div(F p)}{p}

acting on test functions F \in F_q(X). If F \in F_q(X), then T_p F: R^d \to R^q.
Stein-type integration by parts

To each F \in F_q(p) we associate dom(\nabla, p, F), the vector space of functions g: R^d \to R such that Fg \in F_q(p) and F \nabla g \in L^1(p).

Proposition: E_p[F \nabla g] = -E_p[(T_p F) g] for all F \in F_q(p) and all g \in dom(\nabla, p, F).

Proof: Apply the product rule for divergence, div(F \phi) = div(F) \phi + F \nabla \phi, to (F g) p with \phi = g, to show that for T_p F = div(F p)/p,

  T_p(F g) = (T_p F) g + F \nabla g,

and then take expectations, using that \int_\Omega div(F g p) = 0 and hence the left-hand side has mean 0.
Stein operators

As in the one-dimensional case, our Stein operators depend on two test functions, F and g, and are obtained from

  T_p(F g) = (T_p F) g + F \nabla g

either by fixing F and considering g as the (scalar-valued) test function, or by fixing g and considering F as the (matrix-valued) test function.
F = I_d fixed

Suppose that the identity matrix I_d \in F_d(p) (e.g. if p is log-concave and vanishes at \partial\Omega). Then T_p I_d = \nabla \log p = \rho_p, and the Stein operator is A_p g: R^d \to R^d,

  A_p g = T_p(I_d g) = \nabla g + \rho_p \, g,

acting on g: R^d \to R belonging to dom(\nabla, p, I_d).
F = \tau_p fixed

Let X have mean \nu and suppose that there exists a d \times d matrix-valued function F = \tau_p (a Stein kernel) satisfying

  T_p(\tau_p)(x) = \nu - x

at all x. Then A_p g: R^d \to R^d,

  A_p g(x) = T_p(\tau_p g)(x) = (\nu - x) g(x) + \tau_p(x) \nabla g(x),

acting on differentiable functions g: R^d \to R belonging to dom(\nabla, p, \tau_p).
g = 1 fixed

For g: R^d \to R, g(x) = 1, we obtain for F \in F_q(p) the vector-valued operator

  A_p F(x) = T_p F(x) \in R^q.

The Stein equation for a zero-mean function h: R^d \to R^q is then

  A_p F(x) = \frac{div(F p)}{p}(x) = h(x),

which gives div(F p)(x) = p(x) h(x). There is not a unique solution. If q = d, then we could choose a solution F such that F_{i,j} = 0 for i \neq j.
Special case: q = 1

Let v = (v_1, \ldots, v_d): R^d \to R^d be a vector field in the 0-Stein class for p: R^d \to R. Then our Stein operator of p is

  T_p v = \frac{(\nabla \cdot v) p + v \cdot \nabla p}{p} = \sum_{i=1}^d \frac{\partial v_i}{\partial x_i} + \frac{1}{p} \sum_{i=1}^d v_i \frac{\partial p}{\partial x_i}.

This is a function from R^d to R. Take as vector field v = \nabla f for a smooth function f: R^d \to R. This choice gives

  A_p(f) = T_p \nabla f = \Delta f + \langle \nabla \log p, \nabla f \rangle,

interpreted as an operator on f rather than v. This is the operator considered by Mackey and Gorham (2016), except for a factor 1/2.
g = 1/p fixed

For g: R^d \to R, g(x) = 1/p(x), we obtain for F \in F_q(p) the vector-valued operator

  A_p F = \frac{div(F p)}{p^2} + F \nabla(1/p) \in R^q.

The Stein equation for a zero-mean function h: R^d \to R^q is then

  \frac{div(F p)}{p^2}(x) + F \nabla(1/p)(x) = h(x),

which gives div(F)(x) = p(x) h(x). Again there is not a unique solution. If q = d, then we could choose a solution F such that F_{i,j} = 0 for i \neq j.
Example: multivariate normal

Consider Z \sim MVN_d(0, \Sigma). Then \rho_p(x) = -\Sigma^{-1} x and \tau_p(x) = \Sigma (linear score and constant Stein kernel). These lead to the Stein operator, for g: R^d \to R,

  A_p g(x) = \Sigma \nabla g(x) - g(x) x.
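That E[A_p g(Z)] = E[\Sigma \nabla g(Z) - Z g(Z)] = 0 under MVN(0, \Sigma) is Stein's lemma, and a quick Monte Carlo sketch confirms it; the particular \Sigma and g below are arbitrary choices of mine:

```python
# Check that the multivariate-normal Stein operator A_p g = Sigma grad g - x g
# has expectation zero under N(0, Sigma).
import numpy as np

rng = np.random.default_rng(5)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
L = np.linalg.cholesky(Sigma)
z = rng.normal(size=(2_000_000, 2)) @ L.T      # rows ~ N(0, Sigma)

g = lambda v: np.exp(-0.5 * np.sum(v**2, axis=1))
grad_g = lambda v: -v * g(v)[:, None]          # gradient of g, row by row

val = np.mean(grad_g(z) @ Sigma.T - z * g(z)[:, None], axis=0)
print(np.abs(val).max())                       # close to 0
```

Replacing g by any smooth function with integrable gradient leaves the estimate near zero, which is exactly the characterising property.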
Example: elliptical distributions

A d-dimensional random vector has multivariate elliptical distribution E_d(\mu, \Sigma, \phi) if its density is given by

  p(x) = \kappa \, |\Sigma|^{-1/2} \phi\Big( \frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \Big)  on R^d,

for \phi a smooth function and \kappa the normalising constant. Elliptical distributions (here with \mu = 0) have score function

  \rho_p(x) = \Sigma^{-1} x \, \frac{\phi'(x^T \Sigma^{-1} x / 2)}{\phi(x^T \Sigma^{-1} x / 2)},

and

  \tau_p(x) = \frac{1}{\phi(x^T \Sigma^{-1} x / 2)} \Big( \int_{x^T \Sigma^{-1} x / 2}^{\infty} \phi(u) du \Big) \Sigma

is a strong Stein kernel for p (Landsman, Vanduffel, Yao 2014).
Bounds on the solution of the Stein equation

So we have Stein equations, but when are the solutions well behaved? In the multivariate normal case: the Mehler formula. In the case of strictly log-concave distributions: the overdamped Langevin diffusion. The bounds will be distribution-specific.
Bounds using a Poincaré constant

We say that C_p is a Poincaré constant associated to \mu_X if for every smooth function \varphi \in L^2(\mu_X) such that E \varphi(X) = 0,

  E[\varphi^2(X)] \le C_p \, E[ |\nabla \varphi(X)|^2 ].

For example, when X has a k-log-concave density, the law of X satisfies a Poincaré inequality with C_p = 1/k.

Using the Lax-Milgram theorem we can show the following result. Let h be a smooth, 1-Lipschitz function. Let X be a random vector with density p, and assume C_p < \infty is a Poincaré constant for p(x)dx. Then there exists a weak solution u to

  \Delta u + \nabla \log p \cdot \nabla u = h - p(h),

such that \int |\nabla u|^2 p \le C_p^2.
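The Poincaré inequality itself is easy to illustrate numerically. A sketch for the standard normal on R, whose optimal Poincaré constant is the classical C_p = 1; the test function is an arbitrary choice of mine:

```python
# Monte Carlo illustration of E[phi^2] <= C_p E[|phi'|^2] for the standard
# normal, with (optimal) Poincare constant C_p = 1.
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=2_000_000)

phi_raw = np.sin(x)
phi = phi_raw - phi_raw.mean()     # centre so that E[phi(X)] ~ 0
phi_prime = np.cos(x)

lhs = np.mean(phi**2)              # ~ (1 - e^{-2})/2
rhs = np.mean(phi_prime**2)        # ~ (1 + e^{-2})/2
print(lhs, rhs)
```

Here both sides are available in closed form, and the inequality holds with a visible gap; equality is attained only by linear test functions.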
Application: nested densities

The Wasserstein distance between (the distributions of) X and Y is

  d_W(X, Y) = \sup_{h \in Lip(1)} |E h(X) - E h(Y)|.

Compare the Wasserstein distance between P_1 and P_2 on R^d, with densities p_1, assumed k-log-concave, and p_2 = \pi_0 p_1. Put

  A_1 u = \frac{1}{2} \nabla \log p_1 \cdot \nabla u + \frac{1}{2} \Delta u,  and  A_2 u = \frac{1}{2} \nabla \log p_2 \cdot \nabla u + \frac{1}{2} \Delta u.

Then

  A_2 u = A_1 u + \frac{1}{2} \nabla \log \pi_0 \cdot \nabla u.
Let h: R^d \to R be a 1-Lipschitz function, and u_h a solution to A_1 u_h = h - \int h p_1. Let X_1 (resp. X_2) have distribution P_1 (resp. P_2). Then, as A_2 u = A_1 u + \frac{1}{2} \nabla \log \pi_0 \cdot \nabla u,

  E[h(X_2)] - E[h(X_1)] = E[A_1 u_h(X_2)]
    = E\Big[ A_2 u_h(X_2) - \frac{1}{2} \nabla \log \pi_0(X_2) \cdot \nabla u_h(X_2) \Big]
    = -\frac{1}{2} E[ \nabla \log \pi_0(X_2) \cdot \nabla u_h(X_2) ].

Using the Poincaré bounds we obtain

  d_W(X_1, X_2) \le \frac{1}{k} E[ |\nabla \pi_0(X_1)| ].
Example: copulas

Let (V_1, V_2) be a 2-dimensional random vector such that the marginals V_1 and V_2 have uniform U[0,1] distribution. The copula of (V_1, V_2) is

  C(x_1, x_2) = P[V_1 \le x_1, V_2 \le x_2],  (x_1, x_2) \in [0,1]^2,

and we assume that the copula density c = \frac{\partial^2 C}{\partial x_1 \partial x_2} exists. Let (U_1, U_2) be independent U[0,1]; the copula of (U_1, U_2) is (x_1, x_2) \mapsto x_1 x_2.

Payne (1960): an optimal Poincaré constant for U[0,1]^2 is C_p = 2/\pi^2. Now we can show:

  d_W[(V_1, V_2), (U_1, U_2)] \le \frac{2}{\pi^2} \Big( \int_{[0,1]^2} |\nabla c(x_1, x_2)|^2 \, dx_1 dx_2 \Big)^{1/2}.
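Reading the copula bound as d_W \le \frac{2}{\pi^2} (\int_{[0,1]^2} |\nabla c|^2)^{1/2}, it can be evaluated in closed form for concrete families. A hypothetical worked example of mine (not on the slides) using the Farlie-Gumbel-Morgenstern copula density c(x_1, x_2) = 1 + \theta(1 - 2x_1)(1 - 2x_2):

```python
# For the FGM copula, |grad c|^2 = 4 theta^2 [(1-2x_2)^2 + (1-2x_1)^2], and
# each squared factor integrates to 1/3 over [0,1], so
# int |grad c|^2 = 8 theta^2 / 3 exactly.
import numpy as np

theta = 0.4
integral = 8 * theta**2 / 3               # closed-form Dirichlet energy of c
bound = (2 / np.pi**2) * np.sqrt(integral)
print(round(bound, 4))
```

The bound scales linearly in |\theta|, matching the intuition that small \theta means near-independence.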
Example: the effect of the prior on the posterior

Consider a normal model with mean \theta \in R^d and positive definite covariance matrix \Sigma. The likelihood of \theta given a sample (x_1, \ldots, x_n) is

  (2\pi)^{-nd/2} \det(\Sigma)^{-n/2} \exp\Big( -\frac{1}{2} \sum_{i=1}^n (x_i - \theta)^T \Sigma^{-1} (x_i - \theta) \Big).

We want to compare the posterior distribution P_1 = N(\bar{x}, n^{-1}\Sigma) of \theta under a uniform prior with the posterior P_2 under a normal prior with parameters (\mu, \Sigma_2); \Sigma_2 is assumed positive definite.
The operator norm of a matrix A is \|A\| = \sup_{\|x\|=1} \|Ax\|. The normal density p_1 is n/\|\Sigma\|-log-concave. Moreover, P_2 = N(\tilde{\mu}, \Sigma_n) with

  \tilde{\mu} = \mu + n \Sigma_n \Sigma^{-1} (\bar{x} - \mu),  \Sigma_n = (\Sigma_2^{-1} + n \Sigma^{-1})^{-1}.

After some calculation we find

  d_W(P_1, P_2) \le \|\Sigma (\Sigma + n\Sigma_2)^{-1}\| \Big( \|\bar{x} - \mu\| + \frac{2\Gamma(d/2 + 1/2)}{\Gamma(d/2)} \frac{\| \Sigma + (\Sigma_2 + n\Sigma_2 \Sigma^{-1} \Sigma_2) \|^{1/2}}{\sqrt{n}} \Big).

The closer \bar{x} is to \mu, the smaller the bound. The influence of \Sigma_2 vanishes as n \to \infty.
Last remarks

Solving and bounding the Stein equation is crucial for applying the method. Our framework gives a large (indeed infinite) choice of Stein equations to choose from. The effect of the prior on the posterior will be studied in more detail. We are thinking about the multivariate discrete case, too. Note that Barbour et al. give an approximation by a discretised multivariate normal, using Markov process arguments.
More informationThe Stein and Chen-Stein Methods for Functionals of Non-Symmetric Bernoulli Processes
ALEA, Lat. Am. J. Probab. Math. Stat. 12 (1), 309 356 (2015) The Stein Chen-Stein Methods for Functionals of Non-Symmetric Bernoulli Processes Nicolas Privault Giovanni Luca Torrisi Division of Mathematical
More informationNotes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed
18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,
More informationTHE L 2 -HODGE THEORY AND REPRESENTATION ON R n
THE L 2 -HODGE THEORY AND REPRESENTATION ON R n BAISHENG YAN Abstract. We present an elementary L 2 -Hodge theory on whole R n based on the minimization principle of the calculus of variations and some
More informationApproximation of fluid-structure interaction problems with Lagrange multiplier
Approximation of fluid-structure interaction problems with Lagrange multiplier Daniele Boffi Dipartimento di Matematica F. Casorati, Università di Pavia http://www-dimat.unipv.it/boffi May 30, 2016 Outline
More informationZ-estimators (generalized method of moments)
Z-estimators (generalized method of moments) Consider the estimation of an unknown parameter θ in a set, based on data x = (x,...,x n ) R n. Each function h(x, ) on defines a Z-estimator θ n = θ n (x,...,x
More informationIntroduction to Machine Learning
Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB
More informationStein s Method and Stochastic Geometry
1 / 39 Stein s Method and Stochastic Geometry Giovanni Peccati (Luxembourg University) Firenze 16 marzo 2018 2 / 39 INTRODUCTION Stein s method, as devised by Charles Stein at the end of the 60s, is a
More informationToday. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion
Today Probability and Statistics Naïve Bayes Classification Linear Algebra Matrix Multiplication Matrix Inversion Calculus Vector Calculus Optimization Lagrange Multipliers 1 Classical Artificial Intelligence
More informationLecture 12: Detailed balance and Eigenfunction methods
Lecture 12: Detailed balance and Eigenfunction methods Readings Recommended: Pavliotis [2014] 4.5-4.7 (eigenfunction methods and reversibility), 4.2-4.4 (explicit examples of eigenfunction methods) Gardiner
More informationRandom Variables. P(x) = P[X(e)] = P(e). (1)
Random Variables Random variable (discrete or continuous) is used to derive the output statistical properties of a system whose input is a random variable or random in nature. Definition Consider an experiment
More informationCalculation of Bayes Premium for Conditional Elliptical Risks
1 Calculation of Bayes Premium for Conditional Elliptical Risks Alfred Kume 1 and Enkelejd Hashorva University of Kent & University of Lausanne February 1, 13 Abstract: In this paper we discuss the calculation
More informationLecture 3: Expected Value. These integrals are taken over all of Ω. If we wish to integrate over a measurable subset A Ω, we will write
Lecture 3: Expected Value 1.) Definitions. If X 0 is a random variable on (Ω, F, P), then we define its expected value to be EX = XdP. Notice that this quantity may be. For general X, we say that EX exists
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationLecture 25: Review. Statistics 104. April 23, Colin Rundel
Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April
More informationEE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm
EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm 1. Feedback does not increase the capacity. Consider a channel with feedback. We assume that all the recieved outputs are sent back immediately
More informationSeparation of Variables in Linear PDE: One-Dimensional Problems
Separation of Variables in Linear PDE: One-Dimensional Problems Now we apply the theory of Hilbert spaces to linear differential equations with partial derivatives (PDE). We start with a particular example,
More informationAn inverse source problem in optical molecular imaging
An inverse source problem in optical molecular imaging Plamen Stefanov 1 Gunther Uhlmann 2 1 2 University of Washington Formulation Direct Problem Singular Operators Inverse Problem Proof Conclusion Figure:
More informationA New Look at First Order Methods Lifting the Lipschitz Gradient Continuity Restriction
A New Look at First Order Methods Lifting the Lipschitz Gradient Continuity Restriction Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with H. Bauschke and J. Bolte Optimization
More informationProbability and Distributions
Probability and Distributions What is a statistical model? A statistical model is a set of assumptions by which the hypothetical population distribution of data is inferred. It is typically postulated
More informationLecture 7. 1 Notations. Tel Aviv University Spring 2011
Random Walks and Brownian Motion Tel Aviv University Spring 2011 Lecture date: Apr 11, 2011 Lecture 7 Instructor: Ron Peled Scribe: Yoav Ram The following lecture (and the next one) will be an introduction
More informationIntroduction to Nonlinear Control Lecture # 3 Time-Varying and Perturbed Systems
p. 1/5 Introduction to Nonlinear Control Lecture # 3 Time-Varying and Perturbed Systems p. 2/5 Time-varying Systems ẋ = f(t, x) f(t, x) is piecewise continuous in t and locally Lipschitz in x for all t
More information[2] (a) Develop and describe the piecewise linear Galerkin finite element approximation of,
269 C, Vese Practice problems [1] Write the differential equation u + u = f(x, y), (x, y) Ω u = 1 (x, y) Ω 1 n + u = x (x, y) Ω 2, Ω = {(x, y) x 2 + y 2 < 1}, Ω 1 = {(x, y) x 2 + y 2 = 1, x 0}, Ω 2 = {(x,
More informationROOT FINDING REVIEW MICHELLE FENG
ROOT FINDING REVIEW MICHELLE FENG 1.1. Bisection Method. 1. Root Finding Methods (1) Very naive approach based on the Intermediate Value Theorem (2) You need to be looking in an interval with only one
More informationQuasi-Monte Carlo Methods for Applications in Statistics
Quasi-Monte Carlo Methods for Applications in Statistics Weights for QMC in Statistics Vasile Sinescu (UNSW) Weights for QMC in Statistics MCQMC February 2012 1 / 24 Quasi-Monte Carlo Methods for Applications
More informationThe Stein and Chen-Stein methods for functionals of non-symmetric Bernoulli processes
The Stein and Chen-Stein methods for functionals of non-symmetric Bernoulli processes Nicolas Privault Giovanni Luca Torrisi Abstract Based on a new multiplication formula for discrete multiple stochastic
More informationChp 4. Expectation and Variance
Chp 4. Expectation and Variance 1 Expectation In this chapter, we will introduce two objectives to directly reflect the properties of a random variable or vector, which are the Expectation and Variance.
More informationStein Couplings for Concentration of Measure
Stein Couplings for Concentration of Measure Jay Bartroff, Subhankar Ghosh, Larry Goldstein and Ümit Işlak University of Southern California [arxiv:0906.3886] [arxiv:1304.5001] [arxiv:1402.6769] Borchard
More informationFerromagnets and the classical Heisenberg model. Kay Kirkpatrick, UIUC
Ferromagnets and the classical Heisenberg model Kay Kirkpatrick, UIUC Ferromagnets and the classical Heisenberg model: asymptotics for a mean-field phase transition Kay Kirkpatrick, Urbana-Champaign June
More informationMeasuring Sample Quality with Stein s Method
Measuring Sample Quality with Stein s Method Lester Mackey Joint work with Jackson Gorham, Andrew Duncan, Sebastian Vollmer Microsoft Research, Opendoor Labs, University of Sussex, University of Warwick
More informationChapter 6. Stein s method, Malliavin calculus, Dirichlet forms and the fourth moment theorem
November 20, 2014 18:45 BC: 9129 - Festschrift Masatoshi Fukushima chapter6 page 107 Chapter 6 Stein s method, Malliavin calculus, Dirichlet forms and the fourth moment theorem Louis H.Y. Chen and Guillaume
More informationChapter 2: Fundamentals of Statistics Lecture 15: Models and statistics
Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:
More informationPerhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.
Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage
More informationGaussian Phase Transitions and Conic Intrinsic Volumes: Steining the Steiner formula
Gaussian Phase Transitions and Conic Intrinsic Volumes: Steining the Steiner formula Larry Goldstein, University of Southern California Nourdin GIoVAnNi Peccati Luxembourg University University British
More informationNormal approximation of geometric Poisson functionals
Institut für Stochastik Karlsruher Institut für Technologie Normal approximation of geometric Poisson functionals (Karlsruhe) joint work with Daniel Hug, Giovanni Peccati, Matthias Schulte presented at
More informationApplied Math Qualifying Exam 11 October Instructions: Work 2 out of 3 problems in each of the 3 parts for a total of 6 problems.
Printed Name: Signature: Applied Math Qualifying Exam 11 October 2014 Instructions: Work 2 out of 3 problems in each of the 3 parts for a total of 6 problems. 2 Part 1 (1) Let Ω be an open subset of R
More informationx log x, which is strictly convex, and use Jensen s Inequality:
2. Information measures: mutual information 2.1 Divergence: main inequality Theorem 2.1 (Information Inequality). D(P Q) 0 ; D(P Q) = 0 iff P = Q Proof. Let ϕ(x) x log x, which is strictly convex, and
More informationDistance-Divergence Inequalities
Distance-Divergence Inequalities Katalin Marton Alfréd Rényi Institute of Mathematics of the Hungarian Academy of Sciences Motivation To find a simple proof of the Blowing-up Lemma, proved by Ahlswede,
More informationDifferential Equations Preliminary Examination
Differential Equations Preliminary Examination Department of Mathematics University of Utah Salt Lake City, Utah 84112 August 2007 Instructions This examination consists of two parts, called Part A and
More informationMODERATE DEVIATIONS IN POISSON APPROXIMATION: A FIRST ATTEMPT
Statistica Sinica 23 (2013), 1523-1540 doi:http://dx.doi.org/10.5705/ss.2012.203s MODERATE DEVIATIONS IN POISSON APPROXIMATION: A FIRST ATTEMPT Louis H. Y. Chen 1, Xiao Fang 1,2 and Qi-Man Shao 3 1 National
More informationLinear and non-linear programming
Linear and non-linear programming Benjamin Recht March 11, 2005 The Gameplan Constrained Optimization Convexity Duality Applications/Taxonomy 1 Constrained Optimization minimize f(x) subject to g j (x)
More information