On conditional moments of high-dimensional random vectors given lower-dimensional projections

Submitted to Bernoulli
arXiv:1405.2183v2 [math.ST] 6 Sep 2016

On conditional moments of high-dimensional random vectors given lower-dimensional projections

LUKAS STEINBERGER and HANNES LEEB
Department of Statistics, University of Vienna, Oskar-Morgenstern-Platz 1, A-1090 Vienna, Austria, Tel.: +43-1-4277-38620
E-mail: lukas.steinberger@univie.ac.at; hannes.leeb@univie.ac.at

One of the most widely used properties of the multivariate Gaussian distribution, besides its tail behavior, is the fact that conditional means are linear and that conditional variances are constant. We here show that this property is also shared, in an approximate sense, by a large class of non-Gaussian distributions. We allow for several conditioning variables and we provide explicit non-asymptotic results, whereby we extend earlier findings of Hall and Li [7] and Leeb [13].

Keywords: high-dimensional distribution, conditional moments, linear conditional mean, constant conditional variance.

1. Introduction

1.1. Informal summary

The property of the multivariate Gaussian law, that conditional means are linear and that conditional variances are constant, is used by several fundamental statistical methods, even if these methods per se do not require Gaussianity: the generic linear model is built on the assumption that the conditional mean of the response is linear in the (conditioning) explanatory variables; and the generic homoskedastic linear model rests on the additional assumption that the conditional variance is constant. Linear conditional means and/or constant conditional variances are also assumed, for example, by methods for sufficient dimension reduction such as SIR [15] or SAVE [5], or by certain imputation techniques [18]. Elliptically contoured distributions are characterized by linear conditional means [6]. And methods for spatial statistics such as Kriging rely on Gaussianity mainly through the property that conditional means are linear and that conditional variances are constant.¹

But even though these properties are widely used, in a sense the only distribution that has both linear conditional means and constant conditional variances is the Gaussian (see also Section 1.2). In this paper, we show that conditional means are approximately linear and that conditional variances are approximately constant, for a large class of multivariate distributions, when the conditioning is on lower-dimensional projections. To illustrate our results, consider a random d-vector Z that has a Lebesgue density, and a d × p matrix B. Conditional on B′Z, we show that the mean of Z is linear in B′Z, and that the variance/covariance matrix of Z is constant, in an approximate sense. Typically, our approximation error bounds are small if d is sufficiently large relative to p. Our results extend recent findings of [13], where the case p = 1 is considered (which, from a modeling perspective, covers only models with one explanatory variable). More precisely, we extend and refine the results of [13] in three directions: First, we allow for the case where p > 1, thereby also proving a result that is outlined in [7, Sect. 5]. Second, we derive non-asymptotic and explicit error bounds that hold for fixed d, whereas [7] and [13] only give asymptotic results that hold as d → ∞; cf. Theorem 2.1. And third, we also give asymptotic results where p is allowed to increase with d; see Corollary 2.4. In many cases, our error bounds go to zero if p/log d → 0.

The rest of the paper is organized as follows: We continue this section with a more detailed description of the results that we derive. Our main results are then stated in Section 2. In Section 3 we provide a number of examples where the assumptions of our main theorem are satisfied and we discuss further extensions of our work. Finally, Section 4 gives a high-level description of the proof. The more technical low-level parts of the proof as well as the proofs of Section 3 are collected in the supplementary material [21].

1.2. Outline of results

Consider a random d-vector Z that has a Lebesgue density, and that is centered and standardized so that EZ = 0 and EZZ′ = I_d. And take a d × p matrix B with orthonormal columns. [While we do rule out degenerate distributions, the requirement that Z is centered and standardized, and the requirement that the columns of B are orthonormal, are inconsequential; cf. Remark 1.1 as well as Section 3.3.] Our objective is to show that the conditional mean and the conditional variance of Z given B′Z are close to what they would be if Z were Gaussian. In the following, we use the notation ||·|| to denote the Euclidean norm of vectors and the spectral norm of matrices; the meaning of ||·|| will always be clear from the context.

¹ Distributions with linear conditional means and/or constant conditional variances are also studied, for example, in [1, 4, 11, 17, 23, 24].
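To get a first numerical feel for this phenomenon, the following is a minimal simulation sketch (an illustration only, not part of the formal development; the distribution, the dimensions, the sample size, and the nearest-neighbour rule are all arbitrary choices made here): it draws Z with independent standardized exponential components as a stand-in for a non-Gaussian distribution, draws B with orthonormal columns via a QR decomposition, and compares a crude estimate of E[Z | B′Z = x] with the linear prediction Bx.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, n = 100, 2, 50_000            # illustration only; arbitrary choices

# Z has independent standardized exponential components: mean 0, variance 1.
Z = rng.exponential(size=(n, d)) - 1.0

# Draw B with orthonormal columns (QR of a Gaussian matrix; up to column signs,
# this is a draw from the uniform distribution nu_{d,p} on such matrices).
B, _ = np.linalg.qr(rng.standard_normal((d, p)))

x = np.array([0.5, -1.0])           # conditioning value for B'Z
proj = Z @ B                        # the n projections B'Z_i

# Crude nearest-neighbour estimate of E[Z | B'Z = x]: average those Z_i whose
# projection falls closest to x, and compare with the linear prediction Bx.
idx = np.argsort(np.linalg.norm(proj - x, axis=1))[:500]
cond_mean_hat = Z[idx].mean(axis=0)
print(np.linalg.norm(cond_mean_hat - B @ x))   # small when d is large relative to p
```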

Instead of the conditional mean and variance, it will be convenient to focus on the first two conditional moments, i.e., on E[Z | B′Z] and on E[ZZ′ | B′Z]. If both the expressions

    E[Z | B′Z] − BB′Z    and    E[ZZ′ | B′Z] − (I_d − BB′ + BB′ZZ′BB′)

are equal to zero, then the conditional mean of Z given B′Z is linear in B′Z, and the corresponding conditional variance is constant in B′Z. But the only distribution which satisfies this for all B is the Gaussian law; cf. the discussion in [13, p. 466]. We will show that a weaker form of this requirement, namely that the expressions in the preceding display are close to zero in probability for most B, is satisfied by a much larger class of distributions, provided mainly that d is sufficiently large relative to p.

For the case where p = 1, it was shown in [13], for each t > 0, that

    sup_{B ∈ G} P( ||E[Z | B′Z] − BB′Z|| > t )    and    (1.1)

    sup_{B ∈ G} P( ||E[ZZ′ | B′Z] − (I_d − BB′ + BB′ZZ′BB′)|| > t )    (1.2)

converge to zero as d → ∞, under some conditions, where the sets G are collections of d × p matrices with orthonormal columns that become large as d → ∞. More precisely, for ν_{d,p}(·) denoting the uniform distribution on the set of all such matrices (i.e., the Haar measure on the Stiefel manifold V_{d,p}), the sets G satisfy ν_{d,p}(G) → 1 as d → ∞. [Obviously, G depends on d and also on p, although this dependence is not shown explicitly in our notation.] In the case where p = 1 covered in [13], the sets G are collections of unit vectors, and ν_{d,1}(·) is the uniform distribution on the unit sphere in R^d.

We derive a non-asymptotic version of this result, i.e., explicit upper bounds on (1.1) and (1.2), and also on 1 − ν_{d,p}(G), that hold for fixed d and p, where we allow for p > 1; see Theorem 2.1. Moreover, we also provide an asymptotic result where our upper bounds go to zero as d → ∞, where p may increase with d; cf. Corollary 2.4. In many cases, our upper bounds are small provided that p/log d is small. Both our non-asymptotic and our asymptotic result, i.e., both Theorem 2.1 and Corollary 2.4, hold uniformly over classes of distributions for Z, as outlined in Remark 2.2.

Of course, our results rely on further conditions on the distribution of Z (in addition to the existence of a Lebesgue density and the requirements that EZ = 0 and EZZ′ = I_d). In particular, we require that the mean of certain functions of Z, and of i.i.d. copies of Z, is bounded; see the bounds (b1).(a) and (b2), as well as the attending discussion in Section 2. And we require that certain moments of Z are close to what they would be in the Gaussian case; see (b1).(b-c). From a statistical perspective, we stress that our results rely on bounds that can be estimated from appropriate data, as outlined in the discussion leading up to Theorem 2.1. One particularly simple example, where these bounds hold, and where the error bounds in Theorem 2.1 get small as d gets large, is the case where the components of Z are independent, with bounded marginal densities and bounded marginal moments of sufficiently large order; see Example 3.1. Finally, we emphasize that (b1) and (b2) do not require that the components of Z are independent.

The results in this paper demonstrate that the requirement of linear conditional means and constant conditional variances (which is quite restrictive, as discussed in the second paragraph of this subsection) is actually satisfied, in an approximate sense, by a rather large class of distributions. Some implications, namely to sparse linear modeling, and to sufficient dimension reduction methods like SIR or SAVE, are discussed in [13, Sect. 1.4]. And while the discussion in [13] is hampered by the fact that only situations with p = 1, i.e., only models with one explanatory variable, are covered in that paper, our results show that these considerations extend also to the case where p > 1, i.e., to more complex models with several explanatory variables.

Remark 1.1. (i) Our requirements, that the random d-vector Z is centered and standardized, and that the matrix B has orthonormal columns, are inconsequential in the following sense: Consider a random d-vector Y such that E[Y] = µ and Var[Y] = Σ are both well-defined and finite, and such that Y has a Lebesgue density (which also entails that Σ is invertible). Moreover, consider a d × p matrix A with linearly independent columns. If Y were Gaussian, we would have E[Y | A′Y] = µ + ΣA(A′ΣA)^{-1}A′(Y − µ). In general, one easily verifies that

    ||E[Y | A′Y] − (µ + ΣA(A′ΣA)^{-1}A′(Y − µ))|| ≤ ||Σ^{1/2}|| ||E[Z | B′Z] − BB′Z||

holds for Z = Σ^{-1/2}(Y − µ) and B = Σ^{1/2}A(A′ΣA)^{-1/2}. Note that Z has a Lebesgue density; that Z is centered and standardized so that E[Z] = 0 and E[ZZ′] = I_d; and that the columns of B are orthonormal. In particular, we see that the conditional mean of Y given A′Y is approximately linear if the same is true for the conditional mean of Z given B′Z, provided only that the largest eigenvalue of Σ is not too large. A similar consideration applies, mutatis mutandis, to the conditional variance of Y given A′Y and that of Z given B′Z. For further details, in particular about the role of Σ, see Section 3.3.

(ii) Conditioning on B′Z is equivalent to conditioning on BB′Z, which is the orthogonal projection of Z onto the column space of B. Therefore, we could formulate Theorem 2.1 for collections of p-dimensional subspaces S of R^d (elements of the Grassmann manifold G_{d,p}) instead of matrices B (from the Stiefel manifold V_{d,p}), and thus replace (1.1) by

    sup_{S ∈ H} P( ||E[Z | P_S Z] − P_S Z|| > t ),

with P_S denoting the orthogonal projection matrix for the subspace S. Here, H denotes the image of the set G ⊆ V_{d,p} from (1.1) under the mapping that maps a matrix B into its column space S. Note that the image of the Haar measure on V_{d,p} under this mapping is the Haar measure on G_{d,p}; see also [3, Theorem 2.2.2(iii)]. In a similar manner, one can also write (1.2) in terms of the Grassmann manifold, namely as

    sup_{S ∈ H} P( ||E[ZZ′ | P_S Z] − (I_d − P_S + P_S Z(P_S Z)′)|| > t ).
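As a concrete illustration of the reduction in Remark 1.1(i), the following sketch (illustrative only; the specific µ, Σ, and A are arbitrary choices made here) computes Z = Σ^{-1/2}(Y − µ) and B = Σ^{1/2}A(A′ΣA)^{-1/2} with numpy, checks that B has orthonormal columns, and checks that conditioning on A′Y carries the same information as conditioning on B′Z.

```python
import numpy as np

rng = np.random.default_rng(1)
d, p = 50, 3                        # arbitrary illustration sizes

# Hypothetical mean, covariance, and conditioning matrix A (full column rank).
mu = rng.standard_normal(d)
M = rng.standard_normal((d, d))
Sigma = M @ M.T + np.eye(d)         # positive definite
A = rng.standard_normal((d, p))

def mat_power(S, a):
    """Symmetric matrix power S^a via the eigendecomposition."""
    w, U = np.linalg.eigh(S)
    return (U * w**a) @ U.T

Sigma_half = mat_power(Sigma, 0.5)

# Standardization of Remark 1.1(i): Z = Sigma^{-1/2}(Y - mu) and
# B = Sigma^{1/2} A (A' Sigma A)^{-1/2}.
B = Sigma_half @ A @ mat_power(A.T @ Sigma @ A, -0.5)
print(np.allclose(B.T @ B, np.eye(p)))          # True: B has orthonormal columns

# For a draw Y, conditioning on A'Y is the same as conditioning on B'Z,
# since A'(Y - mu) = (A' Sigma A)^{1/2} B'Z.
Y = mu + Sigma_half @ rng.standard_normal(d)
Z = mat_power(Sigma, -0.5) @ (Y - mu)
print(np.allclose(A.T @ (Y - mu), mat_power(A.T @ Sigma @ A, 0.5) @ (B.T @ Z)))
```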

2. Results

We first present our main non-asymptotic result, i.e., Theorem 2.1, and the bounds (b1) and (b2) that it relies on. These bounds depend on a constant k that will be chosen as needed later. In Corollary 2.4 and the attending discussion, we then present asymptotic scenarios in which the constants in (b1) and (b2) can be controlled, such that the error bounds in Theorem 2.1 become small. Throughout the following, consider a random d-vector Z that has a Lebesgue density and that satisfies EZ = 0 as well as EZZ′ = I_d. For k ∈ N, write Z_1, ..., Z_k for i.i.d. copies of Z, and write S_k for the k × k Gram matrix S_k = (Z_i′Z_j/d)_{i,j=1}^k. For g ≥ 0, a monomial of degree g in the elements of S_k − I_k is an expression of the form G = ∏_{l=1}^g (S_k − I_k)_{i_l,j_l} for (i_l,j_l) ∈ {1,...,k}^2, 1 ≤ l ≤ g (with the convention that G = 1 in case g = 0). We say that G has a linear (resp. quadratic) factor if one of the pairs, say (i_1,j_1), occurs exactly once (resp. twice).

(b1) Fix k ∈ N.
(a) There are constants ε ∈ [0,1/2] and α ≥ 1 so that E ||S_k − I_k||^{2k+1+ε} ≤ α.
(b) There are constants β > 0 and ξ ∈ (0,1/2] that satisfy the following: For any monomial G = G(S_k − I_k) in the elements of S_k − I_k, whose degree g satisfies g ≤ 2k, we have |d^{g/2} EG − 1| ≤ β/d^ξ if G consists only of quadratic factors in elements above the diagonal, and |d^{g/2} EG| ≤ β/d^ξ if G contains a linear factor.
(c) The constants β and ξ in (b) also satisfy the following: Consider two monomials G = G(S_k − I_k) and H = H(S_k − I_k) of degree g and h, respectively, in the elements of S_k − I_k. If G is given by Z_1′Z_2 Z_2′Z_3 ⋯ Z_{g−1}′Z_g Z_g′Z_1 / d^g, if H = ∏_{l=1}^h (S_k − I_k)_{i_l,j_l} with {i_1,j_1,...,i_h,j_h} ⊆ {1,...,g}, and if 2 ≤ h < g ≤ k, then |d^g E GH| ≤ β/d^ξ.

(b2) For fixed k ∈ N, there is a constant D ≥ 1, such that the following holds true: If R is an orthogonal d × d matrix, then a marginal density of the first k+1 components of RZ is bounded by (d/(d − k − 1))^{1/2} D^{k+1}.

The bounds in (b1) and (b2) essentially guarantee that moments of certain functions of the Gram matrix S_k are either bounded (in (b1).(a) and (b2)) or not too different from what they would be if Z were Gaussian (in (b1).(b-c)). We will impose (b1) and (b2) with k = 2 when considering conditional means, and with k = 4 when considering conditional variances. Clearly, (b1) becomes stronger as k increases. The specific requirements in (b1) are minimal for our current method of proof, and the bound in (b2) is chosen in such a way that certain constants γ_1 and γ_2 appearing in Theorem 2.1 do not depend on the dimension d. Other methods of proof, if such can be found, may rely on different conditions. For further discussion and specific examples where our conditions apply, see Section 3.1.

The bounds in (b1) are non-asymptotic versions of condition (t1) in [13], and the bound in (b2) coincides with condition (t2) in that reference. The bounds in (b1).(b-c) are written as β/d^ξ, because in Corollary 2.4 we will consider situations where these bounds hold for constants β and ξ that either are both independent of d, or that are such that β is independent of d while ξ depends on d so that 1/d^ξ → 0.

In (b2), note that the upper bound on the marginal densities can increase in d. The bound in (b2) appears to be qualitatively different from (b1) in that it does not directly impose restrictions on moments involving the standardized Gram matrix S_k − I_k. However, (b2) is used only to bound the p-th moment of det S_l^{−4(k+1)} for l = 1,...,k; cf. Lemma E.5 and the proof of Proposition 4.4 in Appendix E of the supplement. Just like condition (b1), the requirement of a uniform bound on max_{l ≤ k} E det S_l^{−4p(k+1)} becomes more restrictive if k increases.

From a statistical perspective, we note that the moment bounds discussed here can be estimated from a sample of independent copies of Z. Indeed, population means like E ||S_k − I_k||^{2k+1+ε}, EG, EGH, or E det S_l^{−4p(k+1)} as above are readily estimated by appropriate sample means. In this sense, we rely on bounds that can be estimated from data.

Theorem 2.1. For fixed d, consider a random d-vector Z that has a Lebesgue density f_Z and that is standardized such that EZ = 0 and EZZ′ = I_d.

(i) Suppose that (b1).(a-b) and (b2) hold with k = 2. Then, for each p < d and for each τ ∈ (0,1), there is a Borel set G ⊆ V_{d,p} such that (1.1) is bounded by

    1/(t d^{τξ_1}) + γ_1 p / ((1 − τ) 3ξ_1 log d)    (2.1)

for each t > 0, and such that

    ν_{d,p}(G^c) ≤ κ_1 d^{−τξ_1} ( 1 + γ_1 p / (τ ξ_1 log d) ),    (2.2)

where ξ_1 is given by ξ_1 = min{ξ, ε/2 + 1/4, 1/2}/3 and γ_1 = max{g_1, 6 + 2 log(2D√(πe))}. Here, the constant κ_1 depends only on α and β and g_1 is a global constant.

(ii) Suppose that (b1).(a-c) and (b2) hold with k = 4. Then, for each p < d and for each τ ∈ (0,1), there is a Borel set G ⊆ V_{d,p} so that both (1.1) and (1.2) are bounded by

    1/(t d^{τξ_2}) + γ_2 p / ((1 − τ) 5ξ_2 log d)    (2.3)

for each t > 0, and such that

    ν_{d,p}(G^c) ≤ 2κ_2 d^{−τξ_2} ( 1 + γ_2 p / (τ ξ_2 log d) ).    (2.4)

Here, ξ_2 is given by ξ_2 = min{ξ, ε/2 + 1/4, 1/2}/5 and γ_2 = max{g_2, 10 + 4 log(2D√(πe))}. The constant κ_2 depends only on α and β and g_2 is a global constant.

(iii) The set G in both parts (i) and (ii) can be chosen to have the following additional properties: G is right-invariant under the action of the orthogonal group of order p and it is left orthogonally equivariant, i.e., G = G(f_Z) depends on the distribution of Z in such a way that G(f_{RZ}) = RG(f_Z), for every orthogonal d × d matrix R.
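Returning to the remark above that the quantities appearing in (b1) can be estimated by sample means, here is a minimal Monte Carlo sketch (an illustration under simplifying assumptions, not part of the formal argument): it estimates E ||S_k − I_k||^{2k+1+ε} for k = 2 and ε = 1/2 from independent copies of Z, using Z with independent standardized exponential components; the sample size and the distribution are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, eps = 200, 2, 0.5            # illustration only
n_rep = 2000                        # Monte Carlo repetitions (arbitrary)

def draw_Z(size):
    # independent standardized exponential components (mean 0, variance 1)
    return rng.exponential(size=size) - 1.0

vals = np.empty(n_rep)
for r in range(n_rep):
    Zs = draw_Z((k, d))                       # Z_1, ..., Z_k as rows
    S_k = Zs @ Zs.T / d                       # Gram matrix (Z_i' Z_j / d)
    dev = np.linalg.norm(S_k - np.eye(k), 2)  # spectral norm of S_k - I_k
    vals[r] = dev ** (2 * k + 1 + eps)

# Sample-mean estimate of E ||S_k - I_k||^{2k+1+eps}, cf. condition (b1).(a).
print(vals.mean())
```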

The constants g_1, g_2, κ_1 and κ_2 in parts (i) and (ii) can be obtained explicitly upon detailed inspection of the proof. With Theorem 2.1, we aimed to obtain the best possible upper bounds for (1.1), (1.2) and ν_{d,p}(G^c) that our current technique of proof delivers. It is likely that better bounds can be obtained under stronger assumptions (like in the case where the components of Z are independent) together with an alternative method of proof. In particular, when bounding (1.1) in Theorem 2.1(i), the term γ_1 p/((1−τ)3ξ_1 log d) is obtained by bounding P( ||B′Z||^2 > (1−τ)3ξ_1 log(d)/γ_1 ) using Chebyshev's inequality; cf. the proof of Lemma B.2 in the supplement. Under appropriate additional assumptions on the tails of B′Z, this bound can be dramatically improved. The bound on both (1.1) and (1.2) in Theorem 2.1(ii) can be improved in a similar fashion (cf. Section 3.2). When proving Theorem 2.1, we derive upper bounds for (1.1) and (1.2), on the one hand, and for ν_{d,p}(G^c), on the other hand, that are antagonistic in the sense that one can be reduced at the expense of the other (namely in the proof of Lemma B.2). For Theorem 2.1, we have balanced these bounds so that both are of the same leading order in d, i.e., d^{−τξ_1} in part (i) and d^{−τξ_2} in part (ii).

Remark 2.2. Because the error bounds in Theorem 2.1 depend on Z only through the constants that occur in (b1) and (b2), the theorem a fortiori holds uniformly over the class of all distributions for Z that satisfy (b1) and (b2). For example: Fix constants ε, α, β and ξ as in (b1), fix D as in (b2), and write Z for the class of all random d-vectors Z that satisfy the bounds (b1).(a-c) and (b2) for k = 4, that have a Lebesgue density, and that are centered and standardized. Then, for each Z ∈ Z and for each p < d, there exists a Borel set G ⊆ V_{d,p} (that depends on Z), so that (1.1), (1.2) and also ν_{d,p}(G^c) are bounded as in Theorem 2.1(ii). Similar considerations also apply, mutatis mutandis, to Theorem 2.1(i) and to the following corollary.

Remark 2.3. Our results provide conditions under which conditional means are approximately linear and conditional variances are approximately constant, provided that p/log d is small. Theorem 2.1 provides such a statement for a fixed distribution of Z and for many B. By a slight change of perspective, this also leads to a similar statement that holds for fixed B and many distributions of Z, cf. [22]. We can not deal with a fixed matrix B and a fixed distribution of Z with our methods. Whether, say, the conditional variance is approximately constant for given B and Z depends on the particulars of B and Z, irrespective of p and d. A few trivial examples, however, are well known. For instance, if the distribution of Z is spherically symmetric, then the conditional expectation of Z given B′Z is exactly linear for every matrix B. Moreover, the conditional expectation is linear and the conditional variance is constant if the components of Z are independent and B = (e_{j_1},...,e_{j_p}), where e_j is the j-th element of the standard basis in R^d. See also Section 2.3.4 in Chapter 2 of [20] for a non-trivial example with p = 1 and d = 2.

Corollary 2.4. For each d, consider a random d-vector Z^{(d)} that has a Lebesgue density and that satisfies EZ^{(d)} = 0 and E[Z^{(d)}(Z^{(d)})′] = I_d.

And for each d, suppose that (b1).(a-c) and (b2) hold with Z^{(d)} replacing Z and with k = 4, such that the constants ε, α, β, and D in these bounds do not depend on d, while the constant ξ = ξ_d in (b1) may depend on d as long as d^{−ξ_d} → 0 as d → ∞. Moreover, consider a sequence of integers p_d < d such that p_d/(ξ_d log d) → 0. Then Theorem 2.1(ii) applies for each d, with Z^{(d)} and p_d replacing Z and p, respectively, and the error bounds provided in the theorem go to zero as d → ∞.

Corollary 2.4 provides an asymptotic version of Theorem 2.1(ii). Similarly, an asymptotic version of Theorem 2.1(i) can also be obtained, mutatis mutandis. This provides a direct extension of Theorem 2.1 of [13] from the case p = 1 covered in that reference to the case where p > 1, also allowing for p to grow with d. [Indeed, it is elementary to verify that conditions (t1) and (t2) with k = 4 in [13] imply that the conditions of the corollary are satisfied with ε = 0 and for some sequence ξ_d such that d^{−ξ_d} → 0 as d → ∞. And if (t1) and (t2) hold with k = 2, one obtains conditions that imply an asymptotic version of Theorem 2.1(i).]

If Corollary 2.4 applies with constants ξ_d satisfying ξ_d ≥ ξ > 0 (e.g., in the case where the ξ_d do not depend on d, which is also discussed in Example 3.1), the corollary's requirement on p_d reduces to p_d = o(log d). In that case, the bounds on ν_{d,p}(G^c) in Theorem 2.1 are of polynomial order in d. But if Corollary 2.4 applies with ξ_d → 0, then the stronger requirement p_d = o(ξ_d log d) is needed, and the bounds on ν_{d,p}(G^c) in Theorem 2.1 can be of slower order in d. Note that d^{−ξ_d} → 0 entails that ξ_d log d → ∞, so that the constant sequence p_d = p always satisfies the growth condition in Corollary 2.4.

3. Examples and extensions

3.1. Examples

In this section we discuss a few simple examples of multivariate distributions for which our assumptions (b1) and (b2) are satisfied and explicit values for the quantities ε and ξ can be given. First, we consider the case of a product distribution on R^d with moments of sufficiently high order and bounded component densities. For the proof we refer the reader to Example A.1 in [14].

Example 3.1 (Leeb 2013). Suppose that the random d-vector Z = (z_1,...,z_d)′ has independent components and satisfies EZ = 0, EZZ′ = I_d, and fix k ∈ N.

(i) If E|z_i|^{4k+4} ≤ µ_{4k+4}, for some universal constant µ_{4k+4} > 0 and for all i = 1,...,d, then the bounds in (b1).(a-b) hold with k as chosen here, with ε = ξ = 1/2, and the constants α and β depend only on k and µ_{4k+4}.

(ii) If E|z_i|^{2k+1} ≤ µ_{2k+1}, for some universal constant µ_{2k+1} > 0 and for all i = 1,...,d, then the bounds in (b1).(c) hold with k as chosen here, with ξ = 1/2, and the constant β depends only on k and µ_{2k+1}.

(iii) If all the marginal Lebesgue densities of the components of Z exist and are bounded by a constant D ≥ 1, then condition (b2) holds true for the same constant D and every value of k ∈ {1,...,d}.

From Example 3.1 we see, in particular, that if the random vector Z has independent components with bounded densities and bounded 12-th marginal moments, then the bounds of Theorem 2.1(i) hold, with ξ_1 = 1/6 (note that k has to be chosen as k = 2 here). If the components of Z even have 20 marginal moments bounded by a universal constant, then also the bounds of Theorem 2.1(ii) hold, with ξ_2 = 1/10 (in this case k = 4). The assumptions of Theorem 2.1, however, are not limited to product distributions, as the following examples show. See Appendix A in the supplementary material [21] for the proofs.

Example 3.2. Suppose that the random d-vector Z satisfies EZ = 0 and EZZ′ = I_d.

(i) If R is a fixed orthogonal d × d matrix and Z satisfies any of the bounds (b1).(a,b,c) or (b2) for some values of k, α, β, ε and ξ, then the random vector Z̃ = RZ satisfies the same bound with the same constants.

(ii) If r is a scalar random variable taking values in {−1,1} that is independent of Z, and Z satisfies any of the bounds (b1).(a,b,c) or (b2) for some values of k, α, β, ε and ξ, then the random vector Z̃ = rZ satisfies the same bound with the same constants.

Examples 3.1 and 3.2 can be combined to produce many multivariate distributions with dependent components that still satisfy the assumptions of Theorem 2.1. For instance, if Z has independent non-Gaussian components with moment and density bounds as in Example 3.1 and R is orthogonal with no zero entry, then, by the Darmois-Skitovich theorem, Z̃ = RZ can not have independent components. Alternatively, if Z = (z_1,...,z_d)′ is as in Example 3.1 and such that, say, the first two components have non-symmetric distributions, then the first two components of Z̃ = rZ = (z̃_1,...,z̃_d)′, for some non-degenerate random variable r with values in {−1,1}, may be dependent. Indeed, for example, take z_1, z_2 ∼ Exp(1) − 1 and note that P(z̃_2 < −1 | z̃_1 > 1) = 0 ≠ P(z̃_2 < −1).

As our last example we discuss a specific case of a spherical distribution. Recall that every spherically symmetric distribution with independent components must be Gaussian. So every spherical non-Gaussian distribution constitutes an example of a multivariate distribution with dependent components. Also, if Z is spherical, then E[Z | B′Z] = BB′Z, almost surely, for every B ∈ V_{d,p}. Hence, the following example is only of interest in connection with Theorem 2.1(ii) on the conditional second moment of Z, since Theorem 2.1(i) is trivially true in this case.

Example 3.3. If Z is uniformly distributed on the d-ball of radius √(d+2), then EZ = 0 and EZZ′ = I_d. Moreover, for k ∈ {2,4}, at least for all sufficiently large d, Z satisfies conditions (b1) and (b2) with constants ε = ξ = 1/2, D = 1, and constants α and β that depend only on k.

Finally, it is worth mentioning that in the case of spherically symmetric Z the structure of the set G from Theorem 2.1 simplifies dramatically. Indeed, from Theorem 2.1(iii) we see that if Z is spherical, then G is both left and right-invariant under the action of the appropriate orthogonal groups and thus is either empty or equal to the whole Stiefel manifold V_{d,p}.

3.2. Improved bounds

At the current state of research we can not say if the bounds provided by Theorem 2.1 are tight, or at least if they are of the optimal rate in d, in the sense that this rate is achieved for some multivariate distribution satisfying conditions (b1) and (b2). However, there are certain distributions for which the bounds of the theorem can be improved substantially.²

First, consider the bounds (2.1) and (2.3), which are only of logarithmic order in d. As mentioned in Section 2, they can be improved considerably if one imposes an appropriate condition on Z. Here, we only consider (2.1) as an example. This bound is derived in the proof of Lemma B.2(i) by the following simple argument involving the cut-off point M_d = √( 3ξ_1(1−τ)(log d)/γ_1 ). For t > 0,

    P( ||E[Z | B′Z] − BB′Z|| > t ) ≤ P( ||E[Z | B′Z] − BB′Z|| > t, ||B′Z|| ≤ M_d ) + P( ||B′Z|| > M_d )
      ≤ (1/t) ∫_{||x|| ≤ M_d} ||E[Z | B′Z = x] − Bx|| dP_{B′Z}(x) + p/M_d^2.    (3.1)

In the proof of Theorem 2.1(i) we choose G ⊆ V_{d,p} such that for B ∈ G the bound in (3.1) turns into (2.1), while, at the same time, ν_{d,p}(G^c) is bounded as in (2.2). Of course, using Markov's inequality to bound P( ||B′Z|| > M_d ) in the preceding display is far from optimal if we have more information on the tails of Z.

² Moreover, for each specific distribution, there are typically subsets of the set G from the theorem, for which the probabilities in (1.1) and (1.2) are substantially smaller than their respective upper bounds (2.1) and (2.3). For instance, if Z has independent components and the columns of B are elements of the standard basis in R^d, then both probabilities in (1.1) and (1.2) are equal to zero, for all t > 0.

Suppose now that the random vector Z, in addition to the assumptions of Theorem 2.1(i), also satisfies the sub-Gaussian tail condition

    E exp(α′Z) ≤ exp( ||α||^2 σ^2 / 2 ),    (3.2)

for every α ∈ R^d and some constant σ > 0.³ Under this condition, the tail inequality for quadratic forms by [10] applies and yields

    P( ||B′Z||^2 > σ^2 (p + 2√p s + 2s^2) ) ≤ e^{−s^2},    for all s > 0.

Suppose that p < M_d^2/(8σ^2). Since this restriction also entails that p < M_d^2/σ^2, the equation σ^2(p + 2√p s + 2s^2) = M_d^2 has a real positive solution s_0 = −√p/2 + √( M_d^2/(2σ^2) − p/4 ). Thus, after expanding the square and rearranging terms, we obtain

    s_0^2 = M_d^2/(2σ^2) − √( p ( M_d^2/(2σ^2) − p/4 ) ) ≥ M_d^2/(4σ^2) = (log d) 3ξ_1(1−τ)/(4σ^2 γ_1),

where we have used our restriction on p again. Hence,

    P( ||B′Z||^2 > M_d^2 ) ≤ e^{−s_0^2} ≤ d^{−3ξ_1(1−τ)/(4σ^2 γ_1)},

and we have managed to replace the term in (2.1) that is only of logarithmic order in d by something that is decreasing polynomially in d. However, since the squared cut-off point M_d^2 is only of logarithmic order in d, the condition that p < M_d^2/(8σ^2) still requires p/log d to be small. At the moment, we do not see a way how to increase the cut-off point to polynomial order in d without simultaneously ruining the bound in (2.2).

Concerning the bounds (2.2) and (2.4), we believe that polynomial rates in d of arbitrarily high order can be achieved under more restrictive assumptions than those maintained here and upon using a more elaborate method of proof. First results in that direction, regarding only the conditional expectation, are in preparation, cf. [16].

3.3. The case of a general covariance matrix

The proof of Theorem 2.1 crucially relies on the assumptions that EZ = 0 and EZZ′ = I_d. However, Theorem 2.1, as it stands, can already be used to generalize substantially beyond the mean zero and unit covariance case. In particular, we can provide a large class of positive definite covariance matrices such that for each element Σ of that class the conclusions of Theorem 2.1 remain valid, provided, of course, that all the relevant quantities are modified to reflect the general covariance structure Σ. The key to this extension is the following observation.

³ This is satisfied, for instance, if Z has independent components which all satisfy the one-dimensional analogue of (3.2) with the same value of σ.

If Y is Gaussian with mean µ ∈ R^d and positive definite covariance matrix Σ, and A ∈ V_{d,p}, then E[Y | A′Y] = µ + Σ^{1/2} P_{Σ^{1/2}A} Σ^{−1/2}(Y − µ) and

    E[(Y−µ)(Y−µ)′ | A′Y] = Σ^{1/2}[ I_d − P_{Σ^{1/2}A} + P_{Σ^{1/2}A} Σ^{−1/2}(Y−µ)(Y−µ)′ Σ^{−1/2} P_{Σ^{1/2}A} ] Σ^{1/2},

where P_{...} is the projection matrix corresponding to the column span of the matrix in the subscript. These are our target quantities. Now assume that Y is not necessarily Gaussian but satisfies Y = µ + Σ^{1/2}Z, with Z as in Theorem 2.1. One easily verifies that

    E[Y | A′Y] − ( µ + Σ^{1/2} P_{Σ^{1/2}A} Σ^{−1/2}(Y−µ) ) = Σ^{1/2} ( E[Z | B′Z] − BB′Z )

and

    E[(Y−µ)(Y−µ)′ | A′Y] − Σ^{1/2}( I_d − P_{Σ^{1/2}A} + P_{Σ^{1/2}A} Σ^{−1/2}(Y−µ)(Y−µ)′ Σ^{−1/2} P_{Σ^{1/2}A} ) Σ^{1/2}
      = Σ^{1/2} ( E[ZZ′ | B′Z] − (I_d − BB′ + BB′ZZ′BB′) ) Σ^{1/2},

where B = Σ^{1/2}A(A′ΣA)^{−1/2} ∈ V_{d,p}. Ideally, the norm of these quantities should become small if d is large. Ignoring the additional scaling by the matrix Σ^{1/2} of these error terms⁴, there remains the question of whether the theorem also applies to

    P( ||E[Z | B′Z] − BB′Z|| > t )    (3.3)

and

    P( ||E[ZZ′ | B′Z] − (I_d − BB′ + BB′ZZ′BB′)|| > t ),    (3.4)

instead of (1.1) and (1.2), i.e., if B = B(Σ,A) ∈ G. This raises two questions: For given Σ, how large is the collection of good matrices A, i.e., how large is the set of A for which B(Σ,A) ∈ G? And: How large is the family of matrices Σ for which the collection of good matrices A is large? The latter question is answered by the next result, the proof of which is deferred to Appendix A in the supplement.

Proposition 3.4. If Z satisfies the assumptions of Theorem 2.1(i) (or Theorem 2.1(ii)) and G ⊆ V_{d,p} is the corresponding subset of the Stiefel manifold, then, for each diagonal positive definite matrix Λ, there exists a collection U(Λ) = U(G,Λ) of orthogonal d × d matrices, such that the sets

    S := S(G) := { UΛU′ : Λ = diag(λ_i) > 0, U ∈ U(G,Λ) }

and

    J(Σ) := J(Σ,G) := { A ∈ V_{d,p} : Σ^{1/2}A(A′ΣA)^{−1/2} ∈ G }

have the following properties:

⁴ Whether the scaling by Σ^{1/2} matters depends on the specific context of application for these results. Also, the problem can always be circumvented by imposing a boundedness assumption on Σ. However, in the context of [22], for example, no such bound is required.

    sup_{Λ: Λ = diag(λ_i) > 0} ν_{d,d}( U^c(Λ) )    and    sup_{Σ ∈ S} ν_{d,p}( J^c(Σ) )

are bounded by the square root of the right-hand side of (2.2) (resp. (2.4)). By definition, if Σ is any positive definite covariance matrix and A ∈ J(Σ), then (3.3) (resp. (3.4)) is bounded by (2.1) (resp. (2.3)) for every t > 0.

To understand the message of Proposition 3.4, suppose for now that the assumptions of Theorem 2.1(i) are satisfied. Then the set J(Σ) is constructed such that the following holds: If Σ is any positive definite covariance matrix and A is taken from the collection J(Σ), then, for B = Σ^{1/2}A(A′ΣA)^{−1/2}, the expression in (3.3) is bounded by (2.1). In other words, J(Σ) is a collection of good matrices A as discussed just before the proposition. Now Proposition 3.4 shows that J(Σ) is large provided that Σ ∈ S, and also that the set S itself is large. Similar considerations apply, mutatis mutandis, to the conditional variance under the assumptions of Theorem 2.1(ii). In short, for a large class of d-dimensional distributions of Z (cf. conditions (b1) and (b2)), for a large set of covariance matrices Σ (given by S) and for most matrices A from the Stiefel manifold (those contained in J(Σ)), the first two conditional moments of Y = µ + Σ^{1/2}Z given A′Y are close to what they would be in the Gaussian case, all provided that p/log d is small.

4. Proof of Theorem 2.1

The rest of the paper and the on-line supplementary material comprise the proof of Theorem 2.1. The basic strategy of the proof is non-standard and is described in this section. To implement this strategy, we need to deal with several intricate technical challenges. But those can be handled by standard methods from multivariate analysis and probability theory. To keep the main paper short, such technical details are relegated to the on-line supplementary material. Our arguments have the same basic structure as those used in [13]. To prove Theorem 2.1, however, the arguments from [13] require substantial extensions and modifications, because many of the arguments used in that reference are of an asymptotic nature and do not provide explicit error bounds, and because all of these arguments rely heavily on the assumption that p is fixed and equal to 1.

4.1. Two crucial bounds

Throughout, fix d ∈ N and let Z be as in Theorem 2.1, i.e., a random d-vector that has a Lebesgue density and that is standardized so that EZ = 0 and EZZ′ = I_d. [The particular assumptions of parts (i) and (ii) of Theorem 2.1 will be imposed as needed later.]

We will study the following quantities: For a positive integer p < d, for x ∈ R^p, and for B ∈ V_{d,p}, set

    µ_{x|B} = E[Z | B′Z = x],
    Δ_{x|B} = E[ZZ′ | B′Z = x] − (I_d + B(xx′ − I_p)B′),    and
    h(x|B) = E[ f(W_{x|B}) / φ(W_{x|B}) ],

where f = f_Z is a Lebesgue density of Z, W_{x|B} = Bx + (I_d − BB′)V, and φ denotes a Lebesgue density of V ∼ N(0, I_d). Note that these quantities, if considered as functions with domain R^p × V_{d,p}, can be chosen to be measurable; cf. Lemma B.1. Finally, write S_{M,p} for the closed ball of radius M in R^p, i.e., S_{M,p} = {x ∈ R^p : ||x|| ≤ M}.

We now introduce two bounds which will play an essential role in the proof of Theorem 2.1. In each bound, the quantity of interest, which will be introduced shortly, will be bounded by an expression of the form

    p^{2k+1+ε} e^{gM^2} ( 2D√(πe) )^{pk} d^{−min{ξ, ε/2+1/4, 1/2}} κ    (4.1)

for some even integer k, where the precise value of the constants in the bound will depend on the context, i.e., these constants will be chosen as needed later.

The first crucial bound implies the first part of Theorem 2.1: Under the assumptions of Theorem 2.1(i), we will show that

    sup_{x ∈ S_{M,p}} ∫ [ ||µ_{x|B}||^2 − ||x||^2 ] h(x|B)^2 dν_{d,p}(B)    (4.2)

is bounded by (4.1) for k = 2, for every M > 1 and every p ∈ N such that d > max{ 4(k+p+1)M^4, 2k + p(2k+2)2^{k+3}, p^2 }, where κ = κ_1 ≥ 1 is a constant that depends only on α and β, and where g = g_1 is a global constant. The remaining constants occurring here, i.e., ε, α, β, ξ, and D, are those that appear in the bounds (b1).(a-b) and (b2) imposed by Theorem 2.1(i). Once that statement has been derived, the proof of Theorem 2.1(i) is easily completed by standard arguments (that are detailed in Lemma B.2(i)).

The second crucial bound similarly delivers the second part of Theorem 2.1: Under the assumptions of Theorem 2.1(ii), we will show that (4.2) and

    sup_{x ∈ S_{M,p}} ∫ trace( Δ_{x|B}^k ) h(x|B)^k dν_{d,p}(B)    (4.3)

are both bounded by (4.1) for k = 4, for every M > 1 and every p ∈ N such that d > max{ 4(k+p+1)M^4, 2k + p(2k+2)2^{k+3}, p^2 }, where κ = κ_2 ≥ 1 depends only on α and β, and where g = g_2 is a global constant. Again, the remaining constants ε, α, β, ξ, and D are those that appear in the bounds (b1) and (b2) imposed by Theorem 2.1(ii). From this statement, standard arguments complete the proof of Theorem 2.1(ii); cf. Lemma B.2(ii).

It turns out that in order to derive the upper bounds for (4.2) and (4.3), it will be instrumental to show that

    sup_{x ∈ S_{M,p}} ∫ [ h(x|B) − 1 ]^2 dν_{d,p}(B)    (4.4)

is finite.

In particular, we will need to establish finiteness of (4.4) for every M and p as in (4.2) to derive the desired bound on (4.2), and for every M and p as in (4.3) for the bound on (4.3). We will in fact show more than that, namely that (4.4) is also bounded by (4.1), under the assumptions of Theorem 2.1(i) and for constants as in (4.2), and also under the assumptions of Theorem 2.1(ii) and for constants as in (4.3).

We pause here for a moment to discuss a weaker version of Theorem 2.1 which also allows us to better appreciate the importance of (4.4) (the exact role of (4.4) in the main argument will become apparent later, after Proposition 4.1): Assume in this paragraph that (4.2), (4.3), and (4.4) are all bounded by (4.1) with k = 4, for each M > 1, and for each sufficiently large d. [The other constants in the bound, i.e., p, ε, g, D, ξ and κ, are assumed to be fixed, independent of d here.] Under this assumption, we immediately obtain the following weaker version of Corollary 2.4: For each x ∈ R^p, we have

    ||E[Z | B′Z = x] − Bx||^2 → 0    and    ||E[ZZ′ | B′Z = x] − (I_d + B(xx′ − I_p)B′)|| → 0

in probability as d → ∞, if B is a random matrix that is uniformly distributed on V_{d,p}. After noting that ||E[Z | B′Z = x] − Bx||^2 can also be written as ||E[Z | B′Z = x]||^2 − ||x||^2, this is an easy consequence of Markov's inequality and Slutzky's Lemma.⁵ [Choose M ≥ ||x|| and observe that the upper bound (4.1) converges to zero as d → ∞, so that also the three quantities in (4.2), (4.3), and (4.4) converge to zero. Now convergence of (4.4) to zero entails that h(x|B) converges to one in square mean. Convergence of (4.2) to zero implies that [ ||E[Z | B′Z = x]||^2 − ||x||^2 ] h(x|B)^2 converges to zero in expectation. Similarly, convergence of (4.3) to zero implies that trace( Δ_{x|B}^4 ) h(x|B)^4 converges to zero in expectation. Because the involved random variables are all non-negative, the first relation in the preceding display follows from Markov's inequality and Slutzky's Lemma. The second relation follows in a similar fashion upon observing that the symmetry of Δ_{x|B} entails that ||Δ_{x|B}||^4 is bounded from above by trace( Δ_{x|B}^4 ).]

In this subsection, we have seen that to prove Theorem 2.1(i), it suffices to show, under the assumptions maintained there, that both (4.2) and (4.4) are bounded by (4.1) for constants as in (4.2). And, similarly, to prove Theorem 2.1(ii), it remains to show, under the assumptions maintained there, that (4.2), (4.3) and (4.4) are all bounded by (4.1) for constants as in (4.3).

⁵ A proof of the first statement in the preceding display was already sketched in [7] as an immediate generalization of the case where p = 1 proved therein. See also [13] for further discussion of that result.

4.2. Changing the reference measure

Throughout the following, set

    W_j = Bx + (I_d − BB′)V_j,    (4.5)

for j = 1,...,k, where B, V_1,...,V_k are independent such that B is a random d × p matrix with distribution ν_{d,p} and such that each of the V_i is distributed as N(0, I_d). We call W_1,...,W_k the rotational clones, in analogy to the name rotational twins that the authors of [7] used for the pair W_1, W_2 in case p = 1. With this, we may re-write the integral in (4.4) as

    ∫ [ h(x|B) − 1 ]^2 dν_{d,p}(B)
      = ∫ ( E[ f(W_{x|B})/φ(W_{x|B}) ] )^2 dν_{d,p}(B) − 2 ∫ E[ f(W_{x|B})/φ(W_{x|B}) ] dν_{d,p}(B) + 1
      = E[ (f(W_1)/φ(W_1)) (f(W_2)/φ(W_2)) ] − 2 E[ f(W_1)/φ(W_1) ] + 1
      = E[ (f(W_1)/φ(W_1)) (f(W_2)/φ(W_2)) − 1 ] − 2 E[ f(W_1)/φ(W_1) − 1 ],    (4.6)

provided that the expected values in (4.6) are all finite. And, clearly, if both expected values in (4.6) are bounded by (4.1) in absolute value, then (4.6) is bounded by three times the expression in (4.1).

To establish the desired bounds on (4.2) and (4.3) it will be convenient to also express the integrals in (4.2) and (4.3) in terms of the W_i. This can be accomplished by virtue of the following proposition.

Proposition 4.1. Fix d > p ≥ 1, and consider a random d-vector Z with Lebesgue density f. Let V ∼ N(0, I_d), and write φ(·) for a Lebesgue density of V. Moreover, for a fixed d × p matrix B ∈ V_{d,p} and for any x ∈ R^p, set W_{x|B} = Bx + (I_d − BB′)V. Then the function h(·|B) : R^p → R defined by

    h(x|B) = E[ f(W_{x|B}) / φ(W_{x|B}) ]    for x ∈ R^p

is a density of B′Z with respect to the p-variate standard Gaussian measure (i.e., h(x|B) φ_p(x) is a Lebesgue density of B′Z if φ_p denotes a N_p(0, I_p)-density). Moreover, if Ψ : R^d → R is such that Ψ(Z) is integrable, then a conditional expectation E[Ψ(Z) | B′Z = x] of Ψ(Z) given B′Z = x satisfies

    E[ Ψ(Z) | B′Z = x ] h(x|B) = E[ Ψ(W_{x|B}) f(W_{x|B})/φ(W_{x|B}) ]

whenever x ∈ R^p is such that h(x|B) < ∞.

Note that this proposition applies under the assumptions of both parts of Theorem 2.1. Assume therefore that Proposition 4.1 is applicable throughout the rest of this subsection.
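To make the change of reference measure in Proposition 4.1 concrete, the following minimal Monte Carlo sketch (an illustration only, not part of the proof) approximates h(x|B) by averaging f(W_{x|B})/φ(W_{x|B}) over draws of V. The Gaussian choice of f below is an assumption made here purely so that the law of B′Z is available in closed form as a benchmark; the dimensions, the mean, the variance, and the sample size are likewise arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
d, p = 40, 2                        # illustration only
mu0, s2 = 0.1 * np.ones(d), 0.9     # Z ~ N(mu0, s2*I_d), a convenient benchmark case

def log_f(w):                        # log density of Z, evaluated row-wise
    return -0.5 * np.sum((w - mu0)**2, axis=-1) / s2 - 0.5 * d * np.log(2*np.pi*s2)

def log_phi(w):                      # log density of N(0, I_d), evaluated row-wise
    return -0.5 * np.sum(w**2, axis=-1) - 0.5 * d * np.log(2*np.pi)

B, _ = np.linalg.qr(rng.standard_normal((d, p)))
x = np.array([0.3, -0.7])

# Proposition 4.1: h(x|B) = E[ f(W_{x|B}) / phi(W_{x|B}) ], W_{x|B} = Bx + (I - BB')V.
V = rng.standard_normal((200_000, d))
W = x @ B.T + (V - (V @ B) @ B.T)
h_hat = np.mean(np.exp(log_f(W) - log_phi(W)))

# Benchmark: here B'Z ~ N_p(B'mu0, s2*I_p), so h(x|B) equals the density of B'Z
# at x divided by the N_p(0, I_p) density at x.
m = B.T @ mu0
dens_BZ = np.exp(-0.5 * np.sum((x - m)**2) / s2) / (2*np.pi*s2)**(p/2)
phi_p = np.exp(-0.5 * x @ x) / (2*np.pi)**(p/2)
print(h_hat, dens_BZ / phi_p)        # the two numbers should roughly agree
```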

The integral in (4.2) can then be re-written as

    ∫ [ ||µ_{x|B}||^2 − ||x||^2 ] h(x|B)^2 dν_{d,p}(B)
      = ∫ [ ||µ_{x|B} h(x|B)||^2 − ||x||^2 h(x|B)^2 ] dν_{d,p}(B)
      = ∫ [ ||E[ W_{x|B} f(W_{x|B})/φ(W_{x|B}) ]||^2 − ||x||^2 h(x|B)^2 ] dν_{d,p}(B)
      = E[ ( W_1′W_2 − ||x||^2 ) (f(W_1)/φ(W_1)) (f(W_2)/φ(W_2)) ],    (4.7)

provided that the expected values in (4.6) and (4.7) are all finite. [Indeed, finiteness of the expected values in (4.6) entails that ∫ [h(x|B) − 1]^2 dν_{d,p}(B) is finite, so that ν_{d,p}{ B : h(x|B) = ∞ } = 0, whence Proposition 4.1 can be used to obtain the second equality in the preceding display. The first and the third equality follow from finiteness of the expected values in (4.6) and (4.7).]

To express the integral in (4.3) in a similar way, define Δ_{x|B}(z) : R^p × V_{d,p} × R^d → R^{d×d} by Δ_{x|B}(z) = zz′ − (I_d + B(xx′ − I_p)B′), and use Proposition 4.1 component-wise with Ψ_{i,j}(Z) = [ Δ_{x|B}(Z) ]_{i,j} for all i,j = 1,...,d to obtain

    ∫ trace( Δ_{x|B}^k ) h(x|B)^k dν_{d,p}(B)
      = ∫ trace{ ( E[ Δ_{x|B}(Z) | B′Z = x ] h(x|B) )^k } dν_{d,p}(B)
      = ∫ trace{ ( E[ Δ_{x|B}(W_{x|B}) f(W_{x|B})/φ(W_{x|B}) ] )^k } dν_{d,p}(B)
      = E[ trace( Δ_{x|B}(W_1) ⋯ Δ_{x|B}(W_k) ) (f(W_1)/φ(W_1)) ⋯ (f(W_k)/φ(W_k)) ],    (4.8)

provided that the expected values in (4.6) and (4.8) are all finite.

Lemma C.1 describes how the expression in (4.8) can be written as a weighted sum of expressions that, similarly to (4.7), involve only inner products of the W_i and a product of density ratios. In particular, we find that (4.8) can be written as a weighted sum of terms of the form

    Σ_{j=1}^k (−1)^{k−j} (k choose j) E[ ( W_1′W_2 W_2′W_3 ⋯ W_j′W_1 + p − ||x||^{2j} ) (f(W_1)/φ(W_1)) ⋯ (f(W_k)/φ(W_k)) ]    and

    E[ ( ∏_{i=1}^m W_{j_{i−1}+1}′W_{j_{i−1}+2} W_{j_{i−1}+2}′W_{j_{i−1}+3} ⋯ W_{j_i−1}′W_{j_i} ) (f(W_1)/φ(W_1)) ⋯ (f(W_k)/φ(W_k)) ] − ||x||^{2(j_m − m)}    (4.9)

for m ≥ 1 and indices j_0,...,j_m satisfying j_0 = 0, j_m < k, and j_{i−1} + 1 < j_i whenever 1 ≤ i ≤ m, provided that the expected values in (4.9) are all finite.

[Note that these requirements entail that m ≤ k/2, and that there are no more than (k choose m) choices for the indices j_0,...,j_m in the second expected value in (4.9).] Lemma C.1 also shows that the weights in this expansion of (4.8) only depend on k and on x, and are polynomials in ||x||^2. Note that hence all the weights are bounded, in absolute value and uniformly in x ∈ S_{M,p}, by e^{c(k)M^2} for some constant c(k) that depends only on k. In particular, if the expected values in (4.9) are all bounded by (4.1) in absolute value, then the same is true for (4.8) or, equivalently, (4.3), upon replacing the constants g and κ in (4.1) by, say, g + c(k) and (1 + (k/2)(k choose k/2))κ, respectively.

In this subsection, we have seen how the integrals in (4.2), (4.3) and (4.4) can be re-written as weighted sums of expected values involving the rotational clones, provided these expected values are all finite. To prove Theorem 2.1(i), it thus remains to show, under the maintained assumptions, that the expected values in (4.6) and (4.7) are all bounded by (4.1) in absolute value, uniformly in x ∈ S_{M,p}, and for constants as in (4.2). And Theorem 2.1(ii) follows if, under the assumptions of that theorem, the expected values in (4.6) and (4.7) as well as all the expressions in (4.9) are bounded by (4.1) in absolute value, uniformly in x ∈ S_{M,p}, and for constants as in (4.3).

4.3. The joint density of the rotational clones

Proposition 4.2. For integers 1 ≤ p < d and 1 ≤ k ≤ d − p, let x ∈ R^p, and let W_1,...,W_k be as in (4.5). Then W_1,...,W_k have a joint density ϕ_x(w_1,...,w_k) with respect to Lebesgue measure which satisfies

    ϕ_x(w_1,...,w_k) = φ(w_1) ⋯ φ(w_k) (2/d)^{pk/2} ∏_{i=1}^k [ Γ((d−i+1)/2) / Γ((d−p−i+1)/2) ] det(S_k)^{−p/2} ( 1 − ||x||^2 ι′S_k^{−1}ι / d )^{(d−p−k−1)/2} e^{(k/2)||x||^2}

if S_k is invertible with ||x||^2 ι′S_k^{−1}ι < d, and ϕ_x(w_1,...,w_k) = 0 otherwise, where S_k = (w_i′w_j/d)_{i,j=1}^k denotes the k × k matrix of scaled inner products of the w_i, and ι = (1,...,1)′ denotes an appropriate vector of ones. If, in addition, k < d − p − 1, then the normalizing constant in the preceding display, i.e., the quantity

    η(d,p,k) = (2/d)^{kp/2} ∏_{i=1}^k Γ((d−i+1)/2) / Γ((d−p−i+1)/2),

satisfies

    exp[ −(pk/2) (p+k−1)/(d−p−k+1) ] ≤ η(d,p,k) ≤ 1.
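As a small numerical illustration of the construction in (4.5) (an informal check with arbitrary sizes, not part of the formal development), the following sketch draws rotational clones W_1,...,W_k and verifies two immediate consequences of the definition: B′W_j = x for every j, and the Gram matrix (W_i′W_j/d) is close to I_k when d is large relative to p and ||x||.

```python
import numpy as np

rng = np.random.default_rng(4)
d, p, k = 500, 2, 4                 # illustration only
x = np.array([1.0, -0.5])

# B uniformly distributed on V_{d,p} (QR of a Gaussian matrix, up to column signs),
# and V_1,...,V_k ~ N(0, I_d), all independent.
B, _ = np.linalg.qr(rng.standard_normal((d, p)))
V = rng.standard_normal((k, d))

# Rotational clones, cf. (4.5): W_j = Bx + (I_d - BB')V_j.
W = x @ B.T + (V - (V @ B) @ B.T)

print(np.allclose(W @ B, np.tile(x, (k, 1))))  # B'W_j = x for every j, by construction
print(np.round(W @ W.T / d, 2))                # (W_i'W_j / d), close to I_k for large d
```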

Note that Proposition 4.2 applies whenever p, d, and k are as in (4.2) or (4.3). For p, d, and k as in Proposition 4.2, we can re-express the expression in (4.6) as

    E[ (f(W_1)/φ(W_1)) (f(W_2)/φ(W_2)) − 1 ] − 2 E[ f(W_1)/φ(W_1) − 1 ]
      = ∫_{R^d} ∫_{R^d} ( f(w_1)f(w_2)/(φ(w_1)φ(w_2)) − 1 ) ϕ_x(w_1,w_2) dw_1 dw_2 − 2 ∫_{R^d} ( f(w_1)/φ(w_1) − 1 ) ϕ_x(w_1) dw_1
      = E[ ϕ_x(Z_1,Z_2)/(φ(Z_1)φ(Z_2)) − 1 ] − 2 E[ ϕ_x(Z_1)/φ(Z_1) − 1 ],    (4.10)

we can re-write (4.7) as

    E[ ( Z_1′Z_2 − ||x||^2 ) ϕ_x(Z_1,Z_2)/(φ(Z_1)φ(Z_2)) ],    (4.11)

and the expressions in (4.9) can be written as

    Σ_{j=1}^k (−1)^{k−j} (k choose j) E[ ( Z_1′Z_2 Z_2′Z_3 ⋯ Z_j′Z_1 + p − ||x||^{2j} ) ϕ_x(Z_1,...,Z_k)/(φ(Z_1)⋯φ(Z_k)) ],    and

    E[ ( ∏_{i=1}^m Z_{j_{i−1}+1}′Z_{j_{i−1}+2} ⋯ Z_{j_i−1}′Z_{j_i} ) ϕ_x(Z_1,...,Z_k)/(φ(Z_1)⋯φ(Z_k)) ] − ||x||^{2(j_m − m)},    (4.12)

for m ≥ 1 and indices j_0,...,j_m satisfying j_0 = 0, j_m < k, and j_{i−1} + 1 < j_i whenever 1 ≤ i ≤ m.

In this subsection, we have seen how the integrals in (4.2), (4.3) and (4.4) can be re-written as weighted sums of expected values involving i.i.d. copies of Z and the density of the rotational clones, provided these expected values are all finite. Theorem 2.1(i) now follows if we can show that the expected values in (4.10) and (4.11) are all bounded by (4.1) in absolute value, uniformly in x ∈ S_{M,p}, and for constants as in (4.2), under the assumptions of that theorem. Similarly, Theorem 2.1(ii) follows, provided the expected values in (4.10) and (4.11) as well as all the expressions in (4.12) are bounded by (4.1) in absolute value, uniformly in x ∈ S_{M,p}, and for constants as in (4.3), under the assumptions of that theorem.

4.4. Two sufficient conditions

For an even integer k, consider the quantities

    E[ ( ∏_{i=1}^m Z_{j_{i−1}+1}′Z_{j_{i−1}+2} ⋯ Z_{j_i−1}′Z_{j_i} ) ϕ_x(Z_1,...,Z_l)/(φ(Z_1)⋯φ(Z_l)) ] − ||x||^{2(j_m − m)}    (4.13)

for l = 1,...,k, for each m ≥ 0, and for each set of indices j_0,...,j_m that satisfies j_0 = 0, j_m ≤ l and j_{i−1} + 1 < j_i whenever 1 ≤ i ≤ m.

And, again for even k, consider

    Σ_{j=1}^k (−1)^{k−j} (k choose j) E[ ( Z_1′Z_2 Z_2′Z_3 ⋯ Z_j′Z_1 + p − 1 ) ϕ_x(Z_1,...,Z_k)/(φ(Z_1)⋯φ(Z_k)) ] − (1 − ||x||^2)^k.    (4.14)

If the expressions of the form (4.13) are all bounded by (4.1), in absolute value and with constants as in (4.2), then both the expected values in (4.10) and (4.11) are also bounded by (4.1), again in absolute value and for constants as in (4.2). Indeed, the two expected values in (4.10) are special cases of (4.13), namely with m = 0 and l = 1, and with m = 0 and l = 2, respectively. Similarly, one sees that (4.11) equals

    E[ ( Z_1′Z_2 ) ϕ_x(Z_1,Z_2)/(φ(Z_1)φ(Z_2)) ] − ||x||^2 − ||x||^2 E[ ϕ_x(Z_1,Z_2)/(φ(Z_1)φ(Z_2)) − 1 ].

Note that the two expected values in the preceding display are special cases of (4.13), namely with m = 1, l = 2 and with m = 0, l = 2. If these special cases of (4.13) are both bounded by (4.1) in absolute value, uniformly in x ∈ S_{M,p}, and for constants as in (4.2), then the expression in the preceding display is similarly bounded by the product of (4.1) and 1 + M^2. It is now easy to see that the resulting upper bound on the expression in the preceding display, and hence also on (4.11), is itself upper bounded by an expression of the form (4.1) for constants as in (4.2).

Similarly, if the expressions of the form (4.13) and also (4.14) are bounded by (4.1), in absolute value and with constants as in (4.3), then the expected values in (4.10) and (4.11) as well as all the expressions in (4.12) are also bounded by (4.1), again in absolute value and for constants as in (4.3). Indeed, (4.10) can be bounded as claimed by arguing as in the preceding paragraph.

For (4.12), we re-write the first expression in that display as

    Σ_{j=1}^k (−1)^{k−j} (k choose j) E[ ( Z_1′Z_2 Z_2′Z_3 ⋯ Z_j′Z_1 + p − ||x||^{2j} ) ϕ_x(Z_1,...,Z_k)/(φ(Z_1)⋯φ(Z_k)) ]
      = Σ_{j=1}^k (−1)^{k−j} (k choose j) E[ ( Z_1′Z_2 Z_2′Z_3 ⋯ Z_j′Z_1 + p − 1 ) ϕ_x(Z_1,...,Z_k)/(φ(Z_1)⋯φ(Z_k)) ]
        + E[ ϕ_x(Z_1,...,Z_k)/(φ(Z_1)⋯φ(Z_k)) ] Σ_{j=1}^k (−1)^{k−j} (k choose j) ( 1 − ||x||^{2j} )
      = (4.14) + (1 − ||x||^2)^k ( 1 − E[ ϕ_x(Z_1,...,Z_k)/(φ(Z_1)⋯φ(Z_k)) ] )
      = (4.14) − (1 − ||x||^2)^k E[ ϕ_x(Z_1,...,Z_k)/(φ(Z_1)⋯φ(Z_k)) − 1 ],

where the second equality is obtained from the binomial formula upon recalling that k is even. From this, the first expression in (4.12) can be bounded by an expression of the form (4.1), namely by first bounding both (4.14) and (4.13) with m = 0 and l = k by (4.1), by using the fact that (1 − ||x||^2)^k ≤ 2^k M^{2k} for x ∈ S_{M,p}, and by adjusting the constants κ and g in (4.1) accordingly. The second expression in (4.12) can be bounded in a similar fashion upon using appropriate bounds on (4.13).

In this subsection, we have seen how bounds on the expressions of the form (4.13) and on (4.14) can be used to prove both parts of Theorem 2.1. In particular, Theorem 2.1(i) follows if, under the assumptions of that theorem, the expressions of the form (4.13) are all bounded by (4.1), in absolute value, uniformly in x ∈ S_{M,p}, and for constants as in (4.2). And Theorem 2.1(ii) follows if the expressions of the form (4.13) and also (4.14) are all bounded by (4.1), in absolute value, uniformly in x ∈ S_{M,p}, and for constants as in (4.3), under the assumptions of that theorem.

4.5. Approximating the density ratio

Proposition 4.3. Fix M > 1, positive integers k, d, and p, such that d > p^2 and d > 4(k+p+1)M^4, and let x ∈ S_{M,p}. For a collection of d-vectors w_1,...,w_k, define the k × k matrix S_k = (w_i′w_j/d)_{i,j=1}^k. Then the density ratio ϕ_x(w_1,...,w_k)/(φ(w_1)⋯φ(w_k)) can be expanded as

    ϕ_x(w_1,...,w_k)/(φ(w_1)⋯φ(w_k)) = ψ_x(S_k − I_k) + ∆,

where the quantities on the right-hand side have the following properties: ψ_x is a polynomial of degree k in the elements of S_k − I_k whose coefficients are bounded by p^k M^{2(k+2)} C_ψ(k), where C_ψ(k) depends only on k.

In particular, we may write

    ψ_x(S_k − I_k) = Σ_{H ∈ M_k} C(H) H(S_k − I_k),

where M_k is the set of all monomials in the entries of a symmetric k × k matrix (i.e., in k(k+1)/2 variables) up to degree k and C(H) ∈ R is the coefficient in ψ_x corresponding to the monomial H, which satisfies |C(H)| ≤ p^k M^{2(k+2)} C_ψ(k). In addition, the coefficients C(H) are invariant under permutations in the following sense: Define the function g by g(w_1,...,w_k) = S_k − I_k. If H, G ∈ M_k are such that H∘g(w_1,...,w_k) = G∘g(w_{π(1)},...,w_{π(k)}), for some permutation π of k elements and every choice of w_1,...,w_k ∈ R^d, then C(H) = C(G). Moreover, there exists a constant ξ(k) > 2k that depends only on the value of k, such that whenever ||S_k − I_k|| < 1/(pξ(k)), the remainder term satisfies

    |∆| ≤ p^{k+1} M^{2(k+2)} e^{(k/2)M^2} ||S_k − I_k||^{k+1} C_∆(k),

where C_∆(k) is a constant that depends only on k.

Note that Proposition 4.3 applies whenever M, k, d, and p are either as in (4.2) or as in (4.3). The proposition suggests to replace the density ratio ϕ_x(w_1,...,w_k)/(φ(w_1)⋯φ(w_k)) by the polynomial ψ_x(S_k − I_k). For a fixed even integer k, we therefore consider, as approximations to the expressions in (4.13), the quantities

    E[ ( ∏_{i=1}^m Z_{j_{i−1}+1}′Z_{j_{i−1}+2} ⋯ Z_{j_i−1}′Z_{j_i} ) ψ_x(S_l − I_l) ] − ||x||^{2(j_m − m)}    (4.15)

for l = 1,...,k, for each m ≥ 0, and for each set of indices j_0,...,j_m that satisfies j_0 = 0, j_m ≤ l and j_{i−1} + 1 < j_i whenever 1 ≤ i ≤ m. And as approximation to (4.14), we consider

    Σ_{j=1}^k (−1)^{k−j} (k choose j) E[ ( Z_1′Z_2 Z_2′Z_3 ⋯ Z_j′Z_1 + p − 1 ) ψ_x(S_k − I_k) ] − (1 − ||x||^2)^k.    (4.16)

In order for these approximations to be useful we have to make sure that the difference between (4.14) and (4.16) as well as the difference between (4.13) and (4.15) can be controlled. The following proposition provides us with the appropriate tool.

Proposition 4.4. Fix positive integers d and k. Moreover, let M > 1 and p ∈ N be such that d > max{ 4(k+p+1)M^4, 2k + p(2k+2)2^{k+3}, p^2 }. Let Z be a random d-vector such that EZ = 0 and EZZ′ = I_d, and such that bounds (b1).(a) and (b2) obtain with k as chosen here. Write Z_1,...,Z_k for i.i.d. copies of Z. Finally, fix l ∈ {1,...,k}, let S_l = (Z_i′Z_j/d)_{i,j=1}^l, and let H(S_l − I_l) be a (fixed) monomial in the elements of S_l − I_l whose degree, denoted by deg(H), satisfies 0 ≤ deg(H) ≤ l. Then

    sup_{x ∈ S_{M,p}} E[ d^{(l+deg(H))/2} H(S_l − I_l) ( ϕ_x(Z_1,...,Z_l)/(φ(Z_1)⋯φ(Z_l)) − ψ_x(S_l − I_l) ) ]