Conditional moment representations for dependent random variables


Włodzimierz Bryc
Department of Mathematics
University of Cincinnati
Cincinnati, OH 45221-0025
bryc@ucbeh.san.uc.edu

November 9, 1995

Abstract

The question considered in this paper is which sequences of p-integrable random variables can be represented as conditional expectations of a fixed random variable with respect to a given sequence of σ-fields. For finite families of σ-fields, an explicit inequality equivalent to solvability is stated; sufficient conditions are given for finite and infinite families of σ-fields, and explicit expansions are presented.

1 Introduction

We analyze which sequences of random variables {X_j} can be represented as conditional expectations

E(Z | F_j) = X_j (1)

of a p-integrable random variable Z with respect to a given sequence (F_j) of σ-fields. Martingale theory answers this question for increasing σ-fields (F_j). We are more interested in other cases, which include σ-fields generated by single independent, or, say, Markov dependent, random variables. In particular, given a random sequence (ξ_j) and p-integrable random variables X_j = f_j(ξ_j), we analyze when there is Z ∈ L_p such that

X_j = E(Z | ξ_j). (2)

This is motivated by our previous results for independent random variables and by the alternating conditional expectations (ACE) algorithm of Breiman & Friedman [3]. In [3] the authors are interested in the L_2-best additive prediction Z of a random variable Y based on a finite number of predictor variables ξ_1, …, ξ_d. The solution (ACE) is based on the fact that the best additive predictor Z = φ_1(ξ_1) + … + φ_d(ξ_d) satisfies the conditional moment constraints (2).

Relation (1) defines an inverse problem and shares many characteristics of other inverse problems, cf. Groetsch [8]. Accordingly, our methods partially rely on (non-constructive) functional analysis. We give sufficient conditions for the solvability of (1) in terms of maximal correlations. We also show that (2) has a solution for finite d < ∞ if the joint density of ξ_1, …, ξ_d with respect to the product of marginals is bounded away from zero and EX_i = EX_j.

Partially supported by C. P. Taft Memorial Fund and Office of Naval Research Grant N00014-93-1-0043.
Key Words: alternating conditional expectation, inverse problems, ACE
AMS (1991) Subject Classification: Primary: 62J12; Secondary: 60E05, 62J02
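The constraint (2) behind ACE is easy to observe numerically. The following is a minimal sketch (an illustration, not from the paper; plain NumPy, with discrete predictors so that empirical conditional expectations are exact group means, and cond_mean is a helper introduced here): backfitting drives Z = φ_1(ξ_1) + φ_2(ξ_2) toward the best additive predictor of Y, and at the fixed point E(Z | ξ_j) = E(Y | ξ_j), i.e., (2) holds with X_j = E(Y | ξ_j).

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Dependent discrete predictors: xi2 is a noisy copy of xi1.
    xi1 = rng.integers(0, 5, size=n)
    xi2 = (xi1 + rng.integers(0, 2, size=n)) % 5
    Y = np.sin(xi1) + 0.5 * xi2 + rng.normal(size=n)

    def cond_mean(values, groups):
        """Empirical E(values | groups) for a discrete conditioning variable."""
        sums = np.bincount(groups, weights=values)
        counts = np.bincount(groups)
        return (sums / counts)[groups]

    # ACE-style backfitting: phi_j <- E(Y - sum_{k != j} phi_k | xi_j).
    phi1 = np.zeros(n)
    phi2 = np.zeros(n)
    for _ in range(50):
        phi1 = cond_mean(Y - phi2, xi1)
        phi2 = cond_mean(Y - phi1, xi2)

    Z = phi1 + phi2
    # At the fixed point, the conditional moment constraints (2) hold:
    for xi in (xi1, xi2):
        print(np.max(np.abs(cond_mean(Z, xi) - cond_mean(Y, xi))))  # ~0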

We are interested in both finite and infinite sequences, extending our previous results in [4, 5]. In this paper we concentrate on the p-integrable case with 1 < p < ∞. The extremes p = 1 and p = ∞ seem to require different assumptions. For infinite sequences of independent r.v. all three cases 1 < p < ∞, p = 1, and p = ∞ are completely solved in [5]. For finite sequences of dependent σ-fields, Kellerer [9] and Strassen [15] can be quoted in connection with the conditional expectations problem with Z ≥ 0, which can be modified to cover the bounded random variables (p = ∞) case. For pairs of σ-fields the case 1 < p < ∞ is solved in [4].

2 Notation and results

For 2 ≤ d ≤ ∞, let {F_j}_{1≤j≤d} be a given family of σ-fields. By L^0_p(F) we denote the Banach space of all p-integrable F-measurable centered random variables, p ≥ 1. By E_j we denote the conditional expectation with respect to F_j. For d < ∞, by Σ_{j=1}^d L_p(F_j) we denote the set of sums Z = Z_1 + … + Z_d, where Z_j ∈ L_p(F_j). We shall analyze the following problems.

For all consistent X_j ∈ L_p find Z ∈ L_p satisfying (1) and with minimal norm

E|Z|^p = min. (3)

For all consistent X_j ∈ L_p find additive Z ∈ L_p satisfying (1); additive means that

Z = Σ_{j=1}^d Z_j, where Z_j ∈ L_p(F_j) (4)

(for d = ∞ the series in (4) is assumed to converge absolutely in L_p).

The above statements do not spell out the consistency conditions; these will be explicit in the theorems.

Remark 2.1 If (1) can be solved, then there is a minimal solution Z. This can be easily recovered from the Komlós law of large numbers [10].

2.1 Maximal correlations

Maximal correlation coefficients play a prominent role below; for another use see also [3, Section 5]. The following maximal correlation coefficient is defined in [4]. Let

ρ(F, G) = sup {corr(X, Y) : X ∈ L_2(F), Y ∈ L_2(G), E(X | F ∩ G) = 0}.

Notice that ρ(F, G) = 0 for independent F, G, but also for increasing σ-fields F ⊂ G. If the intersection F ∩ G is trivial, ρ coincides with the usual maximal correlation coefficient, defined in general by

ρ(F, G) = sup_{X ∈ L_2(F), Y ∈ L_2(G)} corr(X, Y). (5)

Note that if ρ(F, G) < 1 then F ∩ G is trivial. Given d, σ-fields {F_j}_{1≤j≤d}, and a finite subset T ⊂ I := {1, 2, …, d}, put

F_T = σ(F_j : j ∈ T).

Define the pairwise maximal correlation r by

r = sup_{i ≠ j} ρ(F_i, F_j)
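For a pair of finitely-valued random variables, the maximal correlation (5) can be computed exactly: by the classical Hirschfeld-Gebelein-Rényi characterization (a standard fact, used here only as an illustration, not a result of this paper), it equals the second-largest singular value of D_r^{-1/2} P D_c^{-1/2}, where P is the joint probability matrix and D_r, D_c are the diagonal matrices of its marginals. A short NumPy sketch:

    import numpy as np

    def maximal_correlation(P):
        """Maximal correlation (5) of a finite joint pmf matrix P: the
        second-largest singular value of D_r^{-1/2} P D_c^{-1/2}."""
        r = P.sum(axis=1)   # marginal of the row variable
        c = P.sum(axis=0)   # marginal of the column variable
        Q = P / np.sqrt(np.outer(r, c))
        s = np.linalg.svd(Q, compute_uv=False)
        return s[1]         # s[0] == 1 corresponds to constant functions

    # Independent coordinates: coefficient 0.
    print(maximal_correlation(np.outer([0.3, 0.7], [0.5, 0.5])))  # ~0.0

    # A dependent 2x2 example; here the value is 0.6.
    print(maximal_correlation(np.array([[0.4, 0.1],
                                        [0.1, 0.4]])))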

and the global maximal correlation

R = sup_{T ∩ S = ∅} ρ(F_T, F_S).

For p = 2 a version of R based on additive r.v. will also play a role. Let

R_l = sup { corr(U, V) : U = Σ_{j∈T} X_j, V = Σ_{j∈S} X_j, X_j ∈ L_2(F_j), T ∩ S = ∅ }.

Clearly, r ≤ R_l ≤ R. All three coefficients coincide in the two σ-fields case d = 2. One can easily see that R_l = 0 and R = 1 can happen already for d = 3.

2.2 Main results

In Section 2.4 we present a complete solution of (1) for the two σ-fields case d = 2. For general families of σ-fields, there seems to be little hope to get existence and uniqueness results as precise as for d = 2. As Logan & Shepp [11] point out, complications arise even in relatively simple situations. One possible source of such complications is linear dependence between the L^0_p(F_j). Suitable assumptions on maximal correlation coefficients exclude the latter. The following result extends [5, Corollary 1] to infinite sequences of dependent families of σ-fields.

Theorem 2.1 (i) Fix 1 < p < ∞ and suppose R < 1. Then equation (1) is solvable for Z for all X_j ∈ L^0_p(F_j) such that E(Σ_j X_j²)^{p/2} < ∞, and the solution is unique.
(ii) If R_l < 1, then for all X_j ∈ L^0_2(F_j) such that Σ_j EX_j² < ∞ there is the unique additive solution Z to (1), and it satisfies

E|Z|² ≤ (1 + R_l)/(1 − R_l) Σ_j E|X_j|².

Moreover, if d < ∞ then Z is given by the ACE formula [3].

If one is not interested in sharp moment estimates for Z, and only finite families d < ∞ are of interest, then one can iterate Theorem 2.1 for a pair of σ-fields, relaxing R < 1. By Lemma 3.2, this yields the following.

Theorem 2.2 If d < ∞,

ρ_1 = max_{1≤j≤d−1} ρ(F_{1,…,j}, F_{j+1}) < 1, (6)

and 1 < p < ∞, then equation (1) has an additive (4) solution Z for all X_j ∈ L^0_p(F_j). Moreover, inequality (12) holds with q = 2 and δ = (1 − ρ_1)^{d/2}.

The following criterion for solvability of the additive version of (1) uses the pairwise maximal correlation r and gives an explicit alternative to ACE. For d = 2 the assumptions are close to [4], except that we assume p = 2 and (implicitly) linear independence.

Theorem 2.3 If d < ∞, r < 1/(d−1), and p = 2, then for all X_j ∈ L_2(F_j) with EX_i = EX_j there is a unique Z ∈ L_2 such that (1) and (4) hold. Moreover, the solution is given by the explicit series expansion

Z = EX_1 + Σ_{k=0}^∞ (−1)^k Σ_{i_1∈I} Σ_{i_2∈I∖i_1} … Σ_{i_k∈I∖i_{k−1}} E_{i_1} … E_{i_k} Σ_{j∈I∖i_k} (X_j − EX_j) (7)

(with the convention Σ_{j∈∅} X_j = 0). Furthermore, Var(Z) ≤ 1/(1 − r(d−1)) Σ_{j=1}^d Var(X_j).

For finite families of σ-fields, inequality (12) is equivalent to solvability of (1). Lemma 3.3 gives a pairwise condition for (12) and was motivated by Breiman & Friedman [3]. This implies [3, Proposition 5.2].

Corollary 2.4 ([3]) If d < ∞, the vector spaces L^0_2(F_j) are linearly independent, and for all 1 ≤ j ≤ d, k ≠ j, the operators E_jE_k : L^0_2(F_k) → L^0_2(F_j) are compact, then for all square integrable X_1, …, X_d with equal means EX_i = EX_j there is the unique additive solution Z of (1).

2.3 Conditioning with respect to random variables

We now state sufficient conditions for solvability of (1) in terms of joint distributions for finite families d < ∞ of σ-fields generated by random variables, F_j = σ(ξ_j), where ξ_1, …, ξ_d is a given random sequence.

We begin with the density criterion that gives an explicit estimate for R, and was motivated by [14]. By Lemma 3.2, it implies that (1) has a unique additive solution Z for all 1 < p < ∞. Although it applies both to discrete and continuous distributions (typically, the density in the statement is with respect to the product of marginals), it is clear that the result is far from being optimal.

Theorem 2.5 Suppose there is a product probability measure μ = μ_1 ⊗ … ⊗ μ_d such that the distribution of ξ_1, …, ξ_d on IR^d is absolutely continuous with respect to μ and its density f is bounded away from zero and infinity,

0 < b ≤ f(x_1, …, x_d) ≤ B < ∞.

Then R ≤ 1 − b/B².

The next result follows from Corollary 2.4 by [13, page 106, Exercise 5]; it is stated for completeness.

Proposition 2.6 ([3]) Suppose d < ∞ and for every pair i ≠ j the density f_{i,j} of the distribution of (ξ_i, ξ_j) with respect to the product measure μ_{i,j} = μ_i ⊗ μ_j of the marginals exists and

max_{i≠j} ∫∫ f_{i,j}²(x, y) dμ_i(x) dμ_j(y) < ∞. (8)

If the vector spaces L^0_2(F_j) are linearly independent (p = 2), then R_l < 1. In particular, (1) has the unique additive solution Z for all square integrable X_1, …, X_d with equal means EX_i = EX_j.

In general, linear independence is difficult to verify (vide [11], where it fails). The following consequence of Proposition 2.6 gives a relevant density criterion.

Corollary 2.7 Suppose the density f of the distribution of ξ_1, …, ξ_d (d < ∞) with respect to the product of marginals μ = μ_1 ⊗ … ⊗ μ_d exists. If f is strictly positive, i.e., μ({(x_1,…,x_d) : f(x_1,…,x_d) = 0}) = 0, and ∫ f² dμ < ∞, then there is an additive solution to (1) for all X_j ∈ L_2(F_j) such that EX_i = EX_j.

In relation to Theorem 2.5, one should note that the lower bound on the density is of more relevance. (On the other hand, in Theorem 2.5 we use the density with respect to an arbitrary product measure rather than the product of marginals.)

Proposition 2.8 Let f be the density of the absolutely continuous part of the distribution of ξ_1, …, ξ_d (d < ∞) with respect to the product of marginals μ = μ_1 ⊗ … ⊗ μ_d. If f is bounded away from zero, i.e., there is b > 0 such that μ({(x_1,…,x_d) : f(x_1,…,x_d) ≥ b}) = 1, then (12) holds for all 1 < q < ∞. In particular, for 1 < p < ∞ and X_j ∈ L_p(F_j) such that EX_i = EX_j there is an additive solution to (1).
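Condition (8) can be checked in closed form in simple cases. For a bivariate normal pair with correlation r, Mehler's expansion of the density f_{i,j} with respect to the product of the (standard normal) marginals gives ∫∫ f_{i,j}² dμ_i dμ_j = Σ_k r^{2k} = 1/(1 − r²) < ∞. The sketch below (an illustration, not from the paper) confirms this by Monte Carlo, using the identity ∫ f² d(μ_i ⊗ μ_j) = E_P f(ξ_i, ξ_j), where P is the joint law:

    import numpy as np

    rng = np.random.default_rng(1)
    r, n = 0.3, 1_000_000

    # Bivariate normal sample with correlation r.
    x = rng.normal(size=n)
    y = r * x + np.sqrt(1 - r**2) * rng.normal(size=n)

    # Density of the joint law w.r.t. the product of marginals:
    # f(x, y) = phi_2(x, y) / (phi(x) phi(y)).
    f = np.exp(-(r**2 * (x**2 + y**2) - 2 * r * x * y)
               / (2 * (1 - r**2))) / np.sqrt(1 - r**2)

    # Since dP = f d(mu_i x mu_j), the integral in (8) equals E_P f.
    print(f.mean(), 1 / (1 - r**2))  # Monte Carlo vs. closed form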

2.4 Results for two σ-fields

This case is rather completely settled. Most of the results occurred in various guises in the literature. They are collected below for completeness, and to point out what to aim for in the more general case. The following shows that for d = 2 there is at most one solution of (1) and (4). (Clearly, there is no Z if X_1, X_2 are not consistent, e.g., if EX_1 ≠ EX_2.)

Proposition 2.9 Given X_j ∈ L_p(F_j), p ≥ 1, there is at most one Z = Z_1 + Z_2 + Z̄ ∈ L_1 such that (1) holds and E(Z_j | F_1 ∩ F_2) = 0.

Since best additive approximations satisfy (1), uniqueness allows one to consider the inverse problem (1) instead. This is well known, cf. [7].

Corollary 2.10 If p = 2 and the best additive approximation Z = Z_1 + Z_2 + Z̄ of Y ∈ L_2 (i.e., Z minimizing E(Y − E(Y | F_1 ∩ F_2) − (Z_1 + Z_2))²) exists, then it is given by the solution to the additive problem (4) (if solvable).

The following result points out the role of maximal correlation and comes from [4].

Theorem 2.11 ([4]) Suppose 1 < p < ∞ is fixed. The following conditions are equivalent:
1. There is a minimal solution to (1) for all consistent X_1, X_2 in L_p(F_1), L_p(F_2) respectively;
2. There is an additive solution to (1) for all consistent X_1, X_2 in L_p(F_1), L_p(F_2) respectively;
3. ρ < 1.
Moreover, the explicit consistency condition is E{X_1 | F_1 ∩ F_2} = E{X_2 | F_1 ∩ F_2}. Furthermore, if E(Z | F_1 ∩ F_2) = 0, the minimum norm in (3) is bounded by

E|Z|² ≤ 1/(1 − ρ) (EX_1² + EX_2²)

and the bound is sharp. The solution is explicit:

Z = E(X_1 | F_1 ∩ F_2) + Σ_{k=0}^∞ (E_2E_1)^k (X_2 − E_2X_1) + Σ_{k=0}^∞ (E_1E_2)^k (X_1 − E_1X_2) (9)

and both series converge in L_p.

Remark 2.12 For p = 2, formula (9) resembles the following explicit expansion for the orthogonal projection of L_2 onto cl(L_2(F_1) + L_2(F_2)) (see [1]):

Z = E{Y | F_1 ∩ F_2} + Σ_{k=1}^∞ ( E_1(E_2E_1)^{k−1} + E_2(E_1E_2)^{k−1} − (E_1E_2)^k − (E_2E_1)^k ) Y (10)

(the series converges in L_2).
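The expansion (9) is easy to test on a finite probability space, where E_1 and E_2 are averaging operators over rows and columns. The sketch below (a sanity check, not from the paper) uses a 3 x 3 full-support joint pmf, so that F_1 ∩ F_2 is trivial and ρ < 1; it sums the two alternating series and confirms E_1(Z) = X_1 and E_2(Z) = X_2:

    import numpy as np

    rng = np.random.default_rng(2)

    # Finite probability space: states (a, b) with a full-support joint pmf.
    p = rng.uniform(0.5, 1.5, size=(3, 3))
    p /= p.sum()

    def E1(X):
        """E(X | a): average over b for each fixed a."""
        return ((X * p).sum(axis=1) / p.sum(axis=1))[:, None] * np.ones((1, 3))

    def E2(X):
        """E(X | b): average over a for each fixed b."""
        return np.ones((3, 1)) * ((X * p).sum(axis=0) / p.sum(axis=0))[None, :]

    def mean(X):
        return (X * p).sum()

    # Consistent data: centered X_1 = f(a), X_2 = h(b).
    X1 = np.array([1.0, -2.0, 0.5])[:, None] * np.ones((1, 3))
    X2 = np.array([0.3, 1.0, -1.0])[None, :] * np.ones((3, 1))
    X1 -= mean(X1)
    X2 -= mean(X2)

    # Series (9) with E(X_1 | F_1 cap F_2) = E X_1 = 0:
    # Z = sum_k (E2 E1)^k (X2 - E2 X1) + sum_k (E1 E2)^k (X1 - E1 X2).
    Z = np.zeros((3, 3))
    T2, T1 = X2 - E2(X1), X1 - E1(X2)
    for _ in range(200):
        Z += T2 + T1
        T2, T1 = E2(E1(T2)), E1(E2(T1))

    print(np.max(np.abs(E1(Z) - X1)), np.max(np.abs(E2(Z) - X2)))  # both ~0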

3 Proofs

The following uniqueness result is proved in [3] for the square-integrable case p = 2 (the new part is 1 < p < 2).

Lemma 3.1 (i) If the L^0_p(F_j) are linearly independent and d < ∞, then for every {X_j} in L_p(F_j), p ≥ 2, there is at most one solution of (1) in the additive class (4).
(ii) If (12) holds with q = 2, then for every {X_j} in L_p(F_j), p ≥ 2, there is at most one solution of (1) in the additive class (4).
(iii) Fix 1 < p < ∞. If there are constants c, C such that for all centered {X_j}, X_j ∈ L_q(F_j),

c E(Σ X_j²)^{q/2} ≤ E|Σ_{j=1}^∞ X_j|^q ≤ C E(Σ X_j²)^{q/2} (11)

holds for q = p and for the conjugate exponent q = p/(p−1), then for every {X_j} in L_p(F_j) there is at most one solution of (1) in the additive class (4).

Proof of Lemma 3.1. The case p = 2 goes as follows. Suppose Z = Z_1 + Z_2 + … has E_j(Z) = 0 for all j. Then EZ² = Σ_j E(ZZ_j) = Σ_j E(Z_jE_j(Z)) = 0. This implies that Z_j = 0 for all j, either by linear independence or by (12).

The second part uses the existence part of the proof of Theorem 2.1. Take Z = Σ_j Z_j (an L_p-convergent series) such that E_j(Z) = 0. Then by (11)

‖Z‖_p ≤ C (E(Σ_j Z_j²)^{p/2})^{1/p} = C Σ_j E(Z_jX_j),

where E(Σ_j X_j²)^{q/2} = 1, 1/p + 1/q = 1, and X_j ∈ L^0_q(F_j). The latter holds because the conjugate space to L^0_q(l_2(F_j)) is L^0_p(l_2(F_j)). The existence part of the proof of Theorem 2.1 implies that there is Z̃ ∈ L_q such that E_j(Z̃) = X_j and Z̃ = Σ_j Z̃_j with Z̃_j ∈ L^0_q(F_j). Therefore

Σ_j E(Z_jX_j) = Σ_j E(Z_jZ̃) = E(ZZ̃) = Σ_j E(Z̃_jZ) = Σ_j E(Z̃_jE_j(Z)) = 0.

This shows E|Z|^p = 0, and by the left hand side of (11) we have Z_j = 0 a.s. for all j.

Proof of Proposition 2.9. Clearly Z̄ = E{Z | F_1 ∩ F_2} is uniquely determined by Z, and without losing generality we may assume Z̄ = 0. Suppose that a particular Z = Z_1 + Z_2 has E_jZ = 0. Then Z_1 = E_1E_2Z_1. Using this iteratively, by the alternierende Verfahren (see [12]) we get Z_1 = (E_1E_2)^kZ_1 → E(Z_1 | F_1 ∩ F_2) = 0. By symmetry, Z_2 = 0 and uniqueness follows.

Proof of Corollary 2.10. Without loss of generality we may assume E(Y | F_1 ∩ F_2) = 0. For optimal Z = Z_1 + Z_2 we have

min = E(Y − (Z_1 + Z_2))² = E(Y − (E_1(Y) − E_1(Z_2) + Z_2))² + E(E_1(Y) − E_1(Z))² ≥ min + E(E_1(Y) − E_1(Z))².

Since the same analysis applies to E_2, the optimal Z has to satisfy (1). By Proposition 2.9, there is only one such Z, so this one has to be the optimal one.

Proof of Theorem 2.11. For E(Y_j | F_1 ∩ F_2) = 0 we have

E(Y_1 + Y_2)² ≥ EY_1² + EY_2² − 2ρ(EY_1² EY_2²)^{1/2} ≥ (1 − ρ)(EY_1² + EY_2²).

Therefore the linear operator T : L^0_2 → L^0_2(F_1) × L^0_2(F_2) given by T(Y) = (E_1(Y), E_2(Y)) is onto, and the norm of its left inverse is bounded by (1 − ρ)^{−1/2} (here L^0_p denotes the null space of the linear operator E(· | F_1 ∩ F_2) on L_p). This proves the bound EZ² ≤ (EX_1² + EX_2²)/(1 − ρ). Because of the explicit formula for Z, it is clear that 3 ⇒ 2; implication 2 ⇒ 1 holds by general principles (see Remark 2.1). The equivalence 1 ⇔ 3 is in [4].

It is easy to see that if for given {X_j} equation (1) is solvable in the additive class (4), then there is also a minimal solution (3), see Remark 2.1. The following shows that for finite families of σ-fields the solvability of both problems is actually equivalent, at least when the trivial constraints EX_i = EX_j are the only ones to be used.

Lemma 3.2 Fix 1 < p < ∞ and suppose d < ∞. The following conditions are equivalent:
(i) Equation (1) has an additive (4) solution Z for all X_j ∈ L^0_p(F_j);
(ii) Equation (1) has a minimal (3) solution Z for all X_j ∈ L^0_p(F_j);
(iii) There is δ = δ(q) > 0 such that for all X_j ∈ L^0_q(F_j)

E|Σ_j X_j|^q ≥ δ^q Σ_j E|X_j|^q, (12)

where 1/p + 1/q = 1.
Moreover, if inequality (12) holds, then there is an additive solution Z to (1) with E|Z|^p ≤ δ^{−p} Σ_j E|X_j|^p.

Remark 3.1 In addition, if the L^0_q(F_j) are linearly independent, then the following equivalent condition can be added:
(iv) L^0_q(F_1) + … + L^0_q(F_d) is a closed subspace of L_q(F_I).

Proof of Lemma 3.2. (iii) ⇒ (i): Consider the linear bounded operator T : L_p → l_p(L^0_p(F_j)) defined by Z ↦ (E(Z | F_j) : j = 1, …, d). The conjugate operator T* : l_q → L_q is given by (X_j) ↦ Σ_{j=1}^d X_j. The coercivity criterion for T being onto is ‖T*(X_j)‖_{L_q} ≥ δ ‖(X_j)‖_{l_q}, which is (12), see [13, Theorem 4.15]. Therefore (i) follows. The left-inverse operator has l_p → L_p operator norm at most 1/δ, which gives the estimate of the norm of Z as claimed.

(i) ⇒ (ii): If there is an additive solution, then the X_j are consistent and Remark 2.1 implies that there is the minimal norm solution.

(ii) ⇒ (iii): This is a simple coercivity analysis of linear operators, see [13, Theorem 4.15]. Namely, if for all X_j there is Z such that (1) holds, then the linear operator T : L^0_p → L^0_p(F_1) × … × L^0_p(F_d) given by Z ↦ (E_j(Z)) is onto. Therefore the conjugate operator satisfies

‖T*(X_1, …, X_d)‖_q ≥ δ ‖(X_1, …, X_d)‖_{l_q(L_q(F_j))}

and inequality (12) follows.

Proof of Remark 3.1. (iv) ⇒ (iii): If L^0_q(F_1) + … + L^0_q(F_d) is a closed subspace of L_q(F_I), then (12) holds. Indeed, by linear independence, the linear operator (X_1 + … + X_d) ↦ (X_1, …, X_d) is an injection of the Banach space L^0_q(F_1) + … + L^0_q(F_d) into L^0_q(F_1) × … × L^0_q(F_d) with the l_q norm. Since the range is closed, the open mapping theorem ([13, Theorem 2.11]) implies (12). (iii) ⇒ (iv) is trivial.

Proof of Theorem 2.1. From the proof of Bryc & Smoleński [6, (7)] we have the left hand side of the inequality

c E|Σ_{j=1}^∞ ε_jX_j|^q ≤ E|Σ_{j=1}^∞ X_j|^q ≤ C E|Σ_{j=1}^∞ ε_jX_j|^q,

where (ε_j) is a sequence of independent random signs, independent of {X_j}.
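The coercivity criterion invoked here and below ([13, Theorem 4.15]: a bounded operator T is onto iff its adjoint satisfies ‖T*y‖ ≥ δ‖y‖ for some δ > 0, and then the minimal-norm preimage of x has norm at most ‖x‖/δ) has a transparent finite-dimensional analogue in which δ is the smallest singular value and the minimal-norm solution comes from the pseudoinverse. A toy NumPy illustration (Euclidean case only, not the L_p setting of the lemma):

    import numpy as np

    rng = np.random.default_rng(3)
    T = rng.normal(size=(3, 7))   # T : R^7 -> R^3

    # Coercivity constant of the adjoint: delta = min over |y| = 1 of |T^T y|,
    # i.e., the smallest singular value of T.
    delta = np.linalg.svd(T, compute_uv=False).min()
    print(delta > 0)              # T is onto iff delta > 0

    # Minimal-norm preimage via the pseudoinverse (the analogue of the
    # left inverse used in the proof); its norm is at most |x| / delta.
    x = rng.normal(size=3)
    z = np.linalg.pinv(T) @ x
    print(np.allclose(T @ z, x), np.linalg.norm(z) <= np.linalg.norm(x) / delta)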

(The right hand side is [6, (7)].) By the Khinchin inequality this implies (11). Note that a more careful analysis gives explicit estimates for the constants involved. For q = 2 the above is replaced by

(1 − R_l)/(1 + R_l) Σ_{j=1}^∞ EX_j² ≤ E|Σ_{j=1}^∞ X_j|² ≤ (1 + R_l)/(1 − R_l) Σ_{j=1}^∞ EX_j²,

which is given in [2, Lemma 1].

Existence of the solution follows now from functional analysis: Consider the linear bounded (cf. (11)) operator T : L^0_p → L^0_p(l_2(F_j)) defined by Z ↦ (E(Z | F_j) : j = 1, 2, …). The conjugate operator T* : L^0_q(l_2) → L^0_q is given by (X_j) ↦ Σ_{j=1}^∞ X_j. The coercivity criterion for T being onto, see [13, Theorem 4.15], is ‖T*(X_j)‖_{L_q} ≥ δ ‖(X_j)‖_{L_q(l_2)}, which follows from (11). Therefore the existence of a solution to (1) follows, and the minimal solution exists by Remark 2.1.

For p = 2, inequalities (11) show that every (X_j) ∈ l_2(L^0_2(F_j)) generates an L_2-convergent series Σ_j X_j. Denote by H the set of random variables represented by such series. By (11), H is closed, and since the orthogonal projection onto H shrinks the norm, the minimal solution to (1) has to be in H; thus it is additive (4). The left-inverse operator has l_2 → L_2 operator norm at most ((1 + R_l)/(1 − R_l))^{1/2}, which gives the estimate for the norm of Z as claimed. The uniqueness follows from (11) by Lemma 3.1.

Proof of Theorem 2.2. Use Theorem 2.11 to produce recurrently an F_{1,2}-measurable Z_1 such that E_1(Z_1) = X_1, E_2(Z_1) = X_2; an F_{1,2,3}-measurable Z_2 such that E_{1,2}(Z_2) = Z_1, E_3(Z_2) = X_3; …; an F_{1,…,d}-measurable Z_{d−1} such that E_{1,…,d−1}(Z_{d−1}) = Z_{d−2}, E_d(Z_{d−1}) = X_d. This shows that for all d < ∞ there is a solution to (1), and hence there is a minimal solution. Therefore, by Lemma 3.2 there is an additive solution (4), and (12) holds. Notice that for d < ∞ inequality (12) implies (11), which by Lemma 3.1 implies uniqueness. The inequality (12) for q = 2 follows recurrently from

E(Σ_{j=1}^k X_j + X_{k+1})² ≥ (1 − ρ_1)(E(Σ_{j=1}^k X_j)² + EX_{k+1}²).

Proof of Theorem 2.3. To verify that the series in (7) converges, notice that for j ≠ i_k

‖E_{i_1} … E_{i_k}‖_{L^0_2(F_j) → L^0_2} ≤ r^k.

Therefore

‖Σ_{i_1∈I} Σ_{i_2∈I∖i_1} … Σ_{i_k∈I∖i_{k−1}} E_{i_1} … E_{i_k} Σ_{j∈I∖i_k} (X_j − EX_j)‖_2 ≤ 2d(d−1)^k r^k max_j ‖X_j‖_2,

which is summable in k since r(d−1) < 1. Clearly, (4) holds true. We check now that Z defined by (7) satisfies (1). To this end, without loss of generality we assume EX_j = 0, and we verify (1) for j = 1 only. Splitting the sum (7) we get

E_1(Z) = Σ_{k=0}^∞ (−1)^k Σ_{i_2∈I∖1} … Σ_{i_k∈I∖i_{k−1}} E_1E_{i_2} … E_{i_k} Σ_{j∈I∖i_k} (X_j − EX_j)

+ Σ_{k=0}^∞ (−1)^k Σ_{i_1∈I∖1} Σ_{i_2∈I∖i_1} … Σ_{i_k∈I∖i_{k−1}} E_1E_{i_1} … E_{i_k} Σ_{j∈I∖i_k} (X_j − EX_j).

The 0-th term of the sum on the left is X_1, and the k-th term of the sum on the left cancels the (k−1)-st term of the sum on the right. Therefore E_1(Z) = X_1.

To prove the uniqueness, it suffices to notice that r < 1/(d−1) implies linear independence. Alternatively, suppose that both Z = Z_1 + … + Z_d and Z̃ = Z̃_1 + … + Z̃_d have the same conditional moments E_j. Then

‖Z_1 − Z̃_1‖_2 = ‖E_1(Z_2 + … + Z_d) − E_1(Z̃_2 + … + Z̃_d)‖_2 ≤ r Σ_{j=2}^d ‖Z_j − Z̃_j‖_2,

and the similar estimate holds for all other components. Therefore Σ_{j=1}^d ‖Z_j − Z̃_j‖_2 ≤ r(d−1) Σ_{j=1}^d ‖Z_j − Z̃_j‖_2. Since r < 1/(d−1), this implies the sum vanishes, proving uniqueness.

To prove the variance estimate, notice that r < 1/(d−1) implies (12) with p = 2 and δ² = 1 − r(d−1). Indeed,

E|Σ_{j=1}^d X_j|² ≥ Σ_{j=1}^d EX_j² − r Σ_{j≠k} (EX_j² EX_k²)^{1/2}.

The estimate now follows from the elementary inequality

Σ_{j=1}^d Σ_{k=1,k≠j}^d |a_ka_j| ≤ Σ_{j=1}^d Σ_{k=1,k≠j}^d ½(a_k² + a_j²) = (d−1) Σ_{j=1}^d a_j²,

valid for arbitrary numbers a_1, …, a_d.

Lemma 3.3 If d < ∞, the vector spaces L^0_2(F_j) are linearly independent, and for all 1 ≤ j ≤ d, k ≠ j, the operators E_jE_k : L^0_2(F_k) → L^0_2(F_j) are compact, then R_l < 1; hence inequality (12) holds for q = 2.

Proof of Lemma 3.3. The proof is similar to the proof of Proposition 2.9 with T = P_SP_Q, where S, Q are disjoint and P_Q denotes the orthogonal projection onto the L_2-closure of Σ_{j∈Q} L^0_2(F_j); the operator T is compact, compare [3, Proposition 5.3]. Details are omitted.

Proof of Theorem 2.5. Take U ∈ L_2(F_S), V ∈ L_2(F_T) with disjoint S, T ⊂ I and such that EU = EV = 0, EU² = EV² = 1, EUV = ρ. Then E(U − V)² = 2 − 2ρ ≥ 2b/B². Indeed, we have

E(U − V)² = ∫_{IR^d} (U(x) − V(x))² f(x) dμ(x) ≥ b ∫_{IR^d} (U(x) − V(x))² dμ(x)
= b ∫_{IR^d} ∫_{IR^d} (U(x) − V(y))² dμ(y) dμ(x)
≥ (b/B²) ∫_{IR^d} ∫_{IR^d} (U(x) − V(y))² f(y) dμ(y) f(x) dμ(x) = 2b/B²,

where the middle equality holds because U and V depend on disjoint sets of coordinates and μ is a product measure. Since the above analysis can also be carried through for E(U + V)², we get the following.

Corollary 3.4 (cf. [14, Lemma 1]) Under the assumptions of Theorem 2.5, for V_j ∈ L^0_2(F_j) we have

E|V_1 + … + V_d|² ≥ b/(2B² − b) (EV_1² + … + EV_d²).

Lemma 3.5 Let f be the density of the absolutely continuous part of the distribution of ξ_1, …, ξ_d (d < ∞) with respect to the product of marginals μ = μ_1 ⊗ … ⊗ μ_d. If f is strictly positive, i.e., μ({(x_1,…,x_d) : f(x_1,…,x_d) = 0}) = 0, then the vector spaces L^0_2(F_j) are linearly independent.

Proof of Lemma 3.5. Suppose X_1 = X_1(ξ_1), …, X_d = X_d(ξ_d) ∈ L^0_2 are non-zero. Denote by μ the product of the marginal measures on IR^d and let A_ε = {(x_1,…,x_d) : f(x_1,…,x_d) > ε}. Choose ε > 0 such that ∫_{A_ε^c} |Σ_j X_j|² dμ < ½ ∫ |Σ_j X_j|² dμ. Then

E|Σ_j X_j|² ≥ ∫_{A_ε} |Σ_j X_j(x_j)|² f(x_1,…,x_d) dμ ≥ ε ∫_{A_ε} |Σ_j X_j|² dμ ≥ (ε/2) ∫ |Σ_j X_j|² dμ > 0,

where the last quantity is positive because under μ the X_j are independent and centered, so ∫ |Σ_j X_j|² dμ = Σ_j ∫ X_j² dμ > 0. This proves linear independence of the X_j.

Proof of Proposition 2.8. This follows the proof of Lemma 3.5 with ε = b. Namely,

E|Σ_j X_j|^q ≥ b ∫_{IR^d} |Σ_j X_j|^q dμ ≥ c E_μ(Σ_j X_j²)^{q/2}.

The last inequality holds by the Marcinkiewicz-Zygmund inequality, because under μ the random variables X_j are independent and centered.

4 Example

The following simple example illustrates the sharpness of some moment estimates.

Example 4.1 Let d < ∞. Suppose X_1, …, X_d are in L_2, centered, and have linear regressions, i.e., there are constants a_{i,j} such that E(X_i | X_j) = a_{i,j}X_j for all i, j (for instance, this holds true for (X_1, …, X_d) with elliptically contoured distributions, or when each X_j is two-valued). Let C = [C_{i,j}] be the covariance matrix. Clearly, if either R_l < 1 or r < 1/(d−1), then C is non-degenerate. Explicit solutions illustrating Theorems 2.1, 2.11, and 2.3 are then possible: it is easy to check that Z = Σ_{j=1}^d θ_jX_j, where

(θ_1, …, θ_d)^T = C^{−1}(C_{1,1}, …, C_{d,d})^T,

satisfies E(Z | X_j) = X_j for all j, and it clearly is additive. Moreover, if corr(X_i, X_j) = ρ does not depend on i ≠ j, then Z = (X_1 + … + X_d)/(1 + (d−1)ρ). In particular, for d = 2 we have Z = (X_1 + X_2)/(1 + ρ). For negative ρ, those point out the sharpness of the estimates for EZ². (It is easy to see directly from E(X_1 + … + X_d)² = d + d(d−1)ρ ≥ d(1 − R_l)/(1 + R_l) that 1 + (d−1)ρ ≥ (1 − R_l)/(1 + R_l) > 0, provided R_l < 1. Therefore Z is well defined.)

Acknowledgement: The subject of this paper arose in a conversation with M. Pourahmadi, who also provided relevant references, helpful comments and additional materials. Part of the research was done at the Institute for Mathematics and its Applications of the University of Minnesota and at the Center for Stochastic Processes in Chapel Hill.
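A quick numerical check of Example 4.1 in the jointly Gaussian, unit-variance, equicorrelated case (an illustration, not from the paper): θ = C^{−1} diag(C) has equal entries 1/(1 + (d−1)ρ); the defining identity Cθ = diag(C) says Cov(Z, X_j) = Var X_j, which for Gaussian vectors is exactly E(Z | X_j) = X_j; and EZ² = d/(1 + (d−1)ρ) blows up as ρ decreases to −1/(d−1):

    import numpy as np

    d, rho = 4, -0.2   # needs 1 + (d - 1) * rho > 0
    C = (1 - rho) * np.eye(d) + rho * np.ones((d, d))   # unit variances

    theta = np.linalg.solve(C, np.diag(C))   # theta = C^{-1} (C_11, ..., C_dd)^T
    print(theta)                             # each entry = 1/(1 + (d-1) rho)

    # Z = sum_j theta_j X_j satisfies (1) iff Cov(Z, X_j) = Var X_j:
    print(C @ theta)                         # equals diag(C) = (1, ..., 1)

    # Sharpness of the moment estimates: E Z^2 = d / (1 + (d-1) rho).
    print(theta @ C @ theta, d / (1 + (d - 1) * rho))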

References

[1] N. Aronszajn, Theory of reproducing kernels. Trans. Amer. Math. Soc. 68 (1950), 337-404.
[2] R. Bradley, On the spectral density and asymptotic normality of weakly dependent random fields. J. Theoret. Probab. 5 (1992), 355-373.
[3] L. Breiman & J. H. Friedman, Estimating optimal transformations for multiple regression and correlation. JASA 80 (1985), 580-598.
[4] W. Bryc, Conditional expectation with respect to dependent σ-fields. In: Proceedings of the Seventh Conference on Probability Theory (Brasov), Ed. M. Iosifescu, Bucuresti, 1984, 409-411.
[5] W. Bryc & S. Kwapień, On the conditional expectation with respect to a sequence of independent σ-fields. Z. Wahrsch. Verw. Gebiete 46 (1979), 221-225.
[6] W. Bryc & W. Smoleński, Moment conditions for almost sure convergence of weakly correlated random variables. Proc. Amer. Math. Soc. 119 (1993), 629-635.
[7] A. Buja, T. Hastie & R. Tibshirani, Linear smoothers and additive models. Ann. Statist. 17 (1989), 453-510.
[8] C. Groetsch, Inverse Problems in the Mathematical Sciences. Vieweg, Braunschweig (Germany), 1993.
[9] H. G. Kellerer, Masstheoretische Marginalprobleme. Math. Ann. 153 (1964), 168-198.
[10] J. Komlós, A generalization of a problem of Steinhaus. Acta Math. Hung. 18 (1967), 217-229.
[11] B. F. Logan & L. A. Shepp, Optimal reconstruction of a function from its projections. Duke Math. J. 42 (1975), 645-659.
[12] G.-C. Rota, An "Alternierende Verfahren" for general positive operators. Bull. Amer. Math. Soc. 68 (1962), 95-102.
[13] W. Rudin, Functional Analysis. McGraw-Hill, New York, 1973.
[14] C. J. Stone, Additive regression and other nonparametric models. Ann. Statist. 13 (1985), 689-705.
[15] V. Strassen, The existence of probability measures with given marginals. Ann. Math. Statist. 36 (1965), 423-439.