Representation of bi-parameter singular integrals by dyadic operators

Available online at wwwsciencedirectcom Advances in Mathematics 229 2012 1734 1761 wwwelseviercom/locate/aim Representation of bi-parameter singular integrals by dyadic operators Henri Martikainen 1 Department of Mathematics and Statistics, University of Helsinki, POB 68, FI-00014 Helsinki, Finland Received 26 October 2011; accepted 16 December 2011 Available online 23 December 2011 Communicated by Charles Fefferman Abstract We prove a dyadic representation theorem for bi-parameter singular integrals That is, we represent certain bi-parameter operators as averages of rapidly decaying sums of what we call bi-parameter shifts A new version of the product space T 1 theorem is established as a consequence 2011 Elsevier Inc All rights reserved MSC: 42B20 Keywords: Haar shift; Bi-parameter singular integral; T 1 theorem; Non-homogeneous analysis 1 Introduction We study certain bi-parameter singular integrals T acting on some class of functions with product domain R n+m = R n R m Our aim is to prove a representation theorem for them as an average of bi-parameter shifts S: Tf,g=C T E wn E wm i 1,i 2 Z 2 + j 1,j 2 Z 2 + 2 maxi 1,i 2 δ/2 2 maxj 1,j 2 δ/2 S i 1i 2 j 1 j 2 D n D m f,g E-mail address: henrimartikainen@helsinkifi 1 The author is supported by the Academy of Finland through the project L p methods in harmonic analysis 0001-8708/$ see front matter 2011 Elsevier Inc All rights reserved doi:101016/jaim201112019

H Martikainen / Advances in Mathematics 229 2012 1734 1761 1735 Here the average is taken over all the dyadic grids D n in R n parametrized by the random parameter w n and all the dyadic grids D m in R m parametrized by the random parameter w m An exact formulation of everything is given after the introduction Such a general representation theorem exists for ordinary Calderón Zygmund operators, and this was proven by Hytönen [7] in connection with the proof of the A 2 conjecture for singular integrals Various earlier representation theorems also exist For example, the sharp weighted bound for the Hilbert transform was obtained by Petermichl [18] using, among other things, a representation theorem for the Hilbert transform The general Haar shift philosophy was introduced by Lacey, Petermichl and Reguera [14] In the one-parameter case the general representation theorem by Hytönen has already been utilized several times after [7] The simplified proof of the A 2 conjecture by Hytönen, Pérez, Treil and Volberg [11] offered among other things a bit easier formulation of the representation theorem In [9] the author together with Hytönen, Lacey, Orponen, Reguera, Sawyer and Uriarte- Tuero used the representation theorem to study sharp weak and strong type weighted bounds for maximal truncations T # Modifying the metric randomization by Hytönen and the author [10] these representation theorems were lifted to the generality of metric spaces by Nazarov, Reznikov and Volberg [16] Several other applications in the weighted context also already exist The reason why the representation theorem is so useful in the one-parameter case is that it can be used to reduce problems considering a general singular integral T into purely dyadic problems considering shifts only Because of this, there is no particular reason why the applications should be limited to weighted questions This just happens to be the case, since the representation theorem was originally developed for this purpose and is still a very new result All in all, there is good motivation for us to develop the analogous representation theory in the bi-parameter case As is well known, multi-parameter analysis is generally quite a bit more difficult than one-parameter analysis It would, of course, be interesting to study sharp weighted theory in the bi-parameter setting Our theorem might be useful for this, however, it should be a very difficult problem Regarding multi-parameter singular integrals, and multi-parameter harmonic analysis in general, there is a very large existing theory After the classical T 1 and Tbtype theory by David and Journé [2] and David, Journé and Semmes [3], the first T 1 type theorem for product spaces was proved by Journé [12] Regarding other classical theory, we only mention the work of Chang and Fefferman [1], Fefferman [4] and Fefferman and Stein [5] These three concern singular integrals and various spaces, like the BMO, on the product setting There is a wide body of more recent developments of which we here only mention the papers by Ferguson and Lacey [6], Lacey and Metcalfe [13] and Muscalu, Pipher, Tao and Thiele [15] These have to do with various multi-parameter paraproducts and characterizations for some product spaces Some bi-parameter paraproducts also appear in our proof Thus, the product BMO space is important for us The classical multi-parameter singular integral theory of Journé [12] involves formulations written in the language of vector-valued Calderón Zygmund theory Very recently Pott and Villarroya [19] formulated and proved a new type of T 1 theorem for product spaces There such vector-valued formulations are replaced by several new mixed type conditions Here we define our bi-parameter operators inspired by [19] The conditions we use are not exactly the same We, for example, do not work with smooth testing conditions Establishing the correct shift structure is our primary task However, we do get, as a pleasant by-product, a pretty nice formulation and proof of the product space T 1 theorem In this paper we use the superbly useful machinery of non-homogeneous analysis pioneered by Nazarov, Treil and Volberg see for example [17] in the context of bi-parameter theory The use of non-homogeneous analysis gives additional decay for certain matrix elements in-

1736 H Martikainen / Advances in Mathematics 229 2012 1734 1761 volved in the expansion of Tf,g Just like in Hytönen s proof of the representation theorem for one-parameter singular integrals, the proof is a T 1 style proof with ingredients from nonhomogeneous analysis We follow the basic idea from Hytönen s recent lecture notes [8] However, we have to deal with the much added complexity of the bi-parameter situation Indeed, there are more cases than in the one-parameter setting, and many of these are interesting mixed type phenomena The non-homogeneous analysis makes this splitting into cases nicely transparent getting rid of rare geometric complications 2 Definitions, strategy and the main result 21 Structural assumptions Let us formulate the Calderón Zygmund structure of our operators The basic assumption is that if f = f 1 f 2 meaning fx= f 1 x 1 f 2 x 2 for x = x 1,x 2 and g = g 1 g 2 with f 1,g 1 : R n C, f 2,g 2 : R m C, sptf 1 spt g 1 = and spt f 2 spt g 2 =, then we have the kernel representation Tf,g= R n+m R n+m Kx,yfygxdx dy The kernel K: R n+m R n+m \{x, y R n+m R n+m : x 1 = y 1 or x 2 = y 2 } C is assumed to satisfy the size condition and the Hölder conditions Kx,y C 1 x 1 y 1 n 1 x 2 y 2 m Kx,y K x, y 1,y 2 K x, y 1,y 2 + K x,y C y 1 y 1 δ x 1 y 1 n+δ y 2 y 2 δ x 2 y 2 m+δ whenever y 1 y 1 x 1 y 1 /2 and y 2 y 2 x 2 y 2 /2, Kx,y K x1,x 2,y K x 1,x 2,y + K x,y C x 1 x 1 δ x 1 y 1 n+δ x 2 x 2 δ x 2 y 2 m+δ whenever x 1 x 1 x 1 y 1 /2 and x 2 x 2 x 2 y 2 /2, Kx,y K x 1,x 2,y K x, y 1,y 2 + K x1,x 2, y 1,y 2 C y 1 y 1 δ x 1 y 1 n+δ x 2 x 2 δ x 2 y 2 m+δ

H Martikainen / Advances in Mathematics 229 2012 1734 1761 1737 whenever y 1 y 1 x 1 y 1 /2 and x 2 x 2 x 2 y 2 /2, and Kx,y K x, y 1,y 2 K x 1,x 2,y + K x 1,x 2, y1,y 2 C x 1 x 1 δ x 1 y 1 n+δ y 2 y 2 δ x 2 y 2 m+δ whenever x 1 x 1 x 1 y 1 /2 and y 2 y 2 x 2 y 2 /2 Furthermore, we assume the mixed Hölder and size conditions whenever x 1 x 1 x 1 y 1 /2, whenever y 1 y 1 x 1 y 1 /2, Kx,y K x 1,x 2,y C x 1 x 1 δ 1 x 1 y 1 n+δ x 2 y 2 m Kx,y K x, y 1,y 2 C y 1 y 1 δ 1 x 1 y 1 n+δ x 2 y 2 m Kx,y K x1,x 2,y 1 x 2 x C 2 δ x 1 y 1 n x 2 y 2 m+δ whenever x 2 x 2 x 2 y 2 /2, and Kx,y K x, y 1,y 2 1 y 2 y 2 C δ x 1 y 1 n x 2 y 2 m+δ whenever y 2 y 2 x 2 y 2 /2 We use, for minor convenience, l metrics on R n and R m We also need some Calderón Zygmund structure on R n and R m separately If f = f 1 f 2 and g = g 1 g 2 with spt f 1 spt g 1 =, then we assume the kernel representation Tf,g= R n R n K f2,g 2 x 1,y 1 f 1 y 1 g 1 x 1 dx 1 dy 1 The kernel K f2,g 2 : R n R n \{x 1,y 1 R n R n : x 1 = y 1 } is assumed to satisfy the size condition and the Hölder conditions K f2,g 2 x 1,y 1 1 Cf 2,g 2 x 1 y 1 n K f2,g 2 x 1,y 1 K f2,g 2 x 1,y 1 Cf 2,g 2 x 1 x 1 δ x 1 y 1 n+δ

1738 H Martikainen / Advances in Mathematics 229 2012 1734 1761 whenever x 1 x 1 x 1 y 1 /2, and K f2,g 2 x 1,y 1 K f2,g 2 x1,y 1 Cf 2,g 2 y 1 y 1 δ x 1 y 1 n+δ whenever y 1 y 1 x 1 y 1 /2 Let A denote the Lebesgue measure of a set A and χ A be the characteristic function of A We need the above representations and some control for Cf 2,g 2 only in the diagonal in the following sense For every cube V R m we assume that there holds Cχ V,χ V + Cχ V,u V + Cu V,χ V C V, whenever u V is such a function that spt u V V, u V 1 and u V = 0 Functions u V are called V -adapted with zero-mean so V -adapted means just the first two conditions on the support and size We also assume the analogous representation and properties with a kernel K f1,g 1 in the case spt f 2 spt g 2 = 22 Boundedness and cancellation assumptions Define the partial adjoint T 1 of T by setting T1 f 1 f 2, g 1 g 2 = Tg1 f 2, f 1 g 2 We assume that T 1,T 1,T 1 1 and T 1 1 belong to the product BMO on Rn R m We recall the definition of this space later in this section We assume that Tχ K χ V, χ K χ V C V for every cube K R n and V R m This is the weak boundedness property for T We also assume the following diagonal BMO conditions: for every cube K R n and V R m and for every zero-mean functions a K and b V which are K and V adapted respectively one has spt a K K, a K 1 and a K = 0, and similarly for b V : i Ta K χ V, χ K χ V C V, ii Tχ K χ V, a K χ V C V, iii Tχ K b V, χ K χ V C V, iv Tχ K χ V, χ K b V C V 23 Haar functions Let h I be an L 2 normalized Haar function related to I D n, where D n is a dyadic grid on R n With this we mean that h I, I = I 1 I n, is one of the 2 n functions h η I, η = η 1,,η n {0, 1} n, defined by h η I = hη 1 I 1 h η n I n, where h 0 I i = I i 1/2 χ Ii and h 1 I i = I i 1/2 χ Ii,l χ Ii,r for every i = 1,,nHereI i,l and I i,r are the left and right halves of the interval I i respectively If η 0 the Haar function is cancellative: h I = 0 All the cancellative Haar functions form an orthonormal basis of L 2 R n If a L 2 R n we may thus write a = I D n η {0,1} n \{0} a,hη I hη I However, we suppress the finite η summation and just write a = I a,h I h I Given a dyadic grid D m on R m and a cube J D m, we denote an L 2 normalized Haar function on J by u J

H Martikainen / Advances in Mathematics 229 2012 1734 1761 1739 24 Product BMO on R n R m Let us be given a dyadic grid D n in R n and a dyadic grid D m in R m We define the square function [ S Dn D m f = K D n V D m f,hk u V 2 χ K χ V V ] 1/2 Then the product Hardy space HD 1 n D m R n R m consists of the locally integrable functions f with f H 1 DnDm R n R m = S D n D m f 1 < The dual of this space is the product BMO space BMO Dn D m R n R m For us, the condition that b {T1,T 1,T 1 1, T1 1} is in the product BMO is defined to mean that b BMODnDm R n R m C with every dyadic grid D n in R n and every dyadic grid D m in R m For a detailed discussion, with emphasis on the dyadic setting, about Hardy and BMO spaces in the product setting see Treil [20] 25 Bi-parameter shifts A bi-parameter shift on R n R m is tied to a dyadic grid D n on R n, a dyadic grid D m on R m and non-negative integers i 1,i 2,j 1,j 2 Such an operator is denoted by S i 1i 2 j 1 j 2 D n D m and is of the form S i 1i 2 j 1 j 2 D n D m f = A i1i2j1j2 KV f, V D m K D n where A i 1i 2 j 1 j 2 KV f = I 1,I 2 K li 1 =2 i 1 lk li 2 =2 i 2 lk J 1,J 2 V lj 1 =2 j 1 lv lj 2 =2 j 2 lv a I1 I 2 KJ 1 J 2 V f,h I1 u J1 h I2 u J2 with a I1 I 2 KJ 1 J 2 V I 1 1/2 I 2 1/2 J 1 1/2 J 2 1/2 V Here, of course, I 1,I 2 D n and J 1,J 2 D m, and li denotes the side length of a cube I Itis also required that all the subshifts S i 1i 2 j 1 j 2 AB = K A V B A i 1i 2 j 1 j 2 KV f, A D n, B D m, map L 2 R n R m L 2 R n R m with norm at most one If all of the Haar functions h I1, h I2, u J1, u J2 appearing are cancellative, the shift is called cancellative Otherwise, it is called noncancellative The last requirement concerning the L 2 boundedness of all of the subshifts follows from the other conditions for cancellative shifts

1740 H Martikainen / Advances in Mathematics 229 2012 1734 1761 In practice, it is useful to observe that a bi-parameter shift S of type i 1,i 2,j 1,j 2 related to some dyadic grids is simply of the form Sf x = A KV fx= 1 K AKV x, yf y dy K V K,V K,V K V = K S x, yf y dy, R n+m where first of all spt K AKV K V K V and K AKV x, y 1 Moreover, K AKV is constant with respect to x on dyadic rectangles I J K V for which li < 2 i 2lK and lj < 2 j 2lV, and K AKV is constant with respect to y on dyadic rectangles I J K V for which li < 2 i 1lK and lj < 2 j 1lV Note also that clearly K S x, y 1 1 C x 1 y 1 n x 2 y 2 m 26 Random dyadic grids and the basic averaging formula Let w n = wn i i Z and w m = wm j j Z, where wn i {0, 1}n and wm j {0, 1} m LetDn 0 and Dm 0 be the standard dyadic grids on Rn and R m respectively In R n we define the new dyadic grid D n ={I + i:2 i <li 2 i wn i : I D0 n }={I + w n: I Dn 0 }, where we simply have defined I + w n := I + i:2 i <li 2 i wn i The dyadic grid D m in R m is similarly defined There is a natural product probability structure on {0, 1} n Z and {0, 1} m Z So we have independent random dyadic grids D n and D m in R n and R m respectively Even if n = m we need two independent grids A cube I D n is called bad if there exists I D n so that l I 2 r li and di, I 2lI γ nl I 1 γ nhereγ n = δ/2n + 2δ, where δ>0appears in the kernel estimates One notes that πgood n := P w n I + w n is good is independent of I Dn 0 The parameter r is a fixed constant so that πgood n,πm good > 0 Furthermore, it is important to note that for a fixed I D0 n the set I +w n depends on wn i with 2 i <li, while the goodness of I + w n depends on wn i with 2 i li In particular, these notions are independent Analogous definitions and remarks related to D m hold We prove the basic averaging formula of Hytönen [7] but in the bi-parameter setting This is the only part of the proof where probabilistic arguments are needed, and here independence plays a big role, even more so in the bi-parameter setting We note that the functions f and g in this paper are always taken from some particularly nice dense subset of functions 21 Proposition There holds Tf,g=CE χ good I 1,I 2 D n J 1,J 2 D m smalleri1,i 2 χ good smallerj1,j 2 Th I1 u J1, h I2 u J2 f,hi1 u J1 g,h I2 u J2, where E = E wn E wm and C = 1/π n good π m good

H Martikainen / Advances in Mathematics 229 2012 1734 1761 1741 22 Remark Here all the appearing Haar functions are, of course, cancellative and we recall that the finite summations over the 2 n 1or2 m 1 different cancellative Haar functions per cube are simply suppressed from the notation Proof of Proposition 21 Define f,h I 1 y = fx,yh I x dx, y R m We may write f = h I1 f,h I1 1 = h I1 +w n f,h I1 +w n 1 I 1 D n I 1 Dn 0 so that by independence Tf,g=E wn T hi1 +w n f,h I1 +w n 1,g = 1 π n good I 1 D 0 n E wn I 1 D 0 n χ good I 1 + w n T h I1 +w n f,h I1 +w n 1,g After expanding g similarly as f above, one sees that this equals 1 πgood n E wn I 1,I 2 Dn 0 = 1 πgood n E wn + E wn χ good I 1 + w n T h I1 +w n f,h I1 +w n 1,hI2 +w n g,h I2 +w n 1 I 1,I 2 D 0 n li 1 li 2 I 1,I 2 D 0 n li 1 >li 2 χ good I 1 + w n T h I1 +w n f,h I1 +w n 1,hI2 +w n g,h I2 +w n 1 T hi1 +w n f,h I1 +w n 1,hI2 +w n g,h I2 +w n 1 Here we again used independence in the latter summation Comparing to the trivial representation Tf,g=E wn I 1,I 2 D 0 n T hi1 +w n f,h I1 +w n 1,hI2 +w n g,h I2 +w n 1 we conclude that π n good E w n T hi1 +w n f,h I1 +w n 1,hI2 +w n g,h I2 +w n 1 I 1,I 2 D 0 n li 1 li 2 = E wn I 1,I 2 D 0 n li 1 li 2 χ good I 1 + w n T h I1 +w n f,h I1 +w n 1,hI2 +w n g,h I2 +w n 1 First expanding g and proceeding like above one gets the symmetric formula

1742 H Martikainen / Advances in Mathematics 229 2012 1734 1761 π n good E w n T hi1 +w n f,h I1 +w n 1,hI2 +w n g,h I2 +w n 1 I 1,I 2 D 0 n li 2 <li 1 = E wn I 1,I 2 D 0 n li 2 <li 1 χ good I 2 + w n T h I1 +w n f,h I1 +w n 1,hI2 +w n g,h I2 +w n 1 Splitting the trivial representation into these two parts allows us to conclude that Tf,g= 1 π n good E wn I 1,I 2 D n χ good We now expand on R m One may write f,h I1 1 = smalleri1,i 2 T h I1 f,h I1 1,hI2 g,h I2 1 J 1 D m f,h I1 u J1 u J1 so that h I1 f,h I1 1 = J 1 D m f,h I1 u J1 h I1 u J1 We may then follow the recipe from above: insert this to the above formula for Tf,g, add goodness to J 1 by independence, expand h I2 g,h I2 1, split the summation to lj 1 lj 2 and lj 1 > lj 2, remove the goodness from J 1 in the latter summation by independence and, finally, compare to the appropriate trivial identity One also does the symmetric thing, where one first expands h I2 g,h I2 1 and adds the goodness to J 2 Combining these gives the claim of the proposition 23 Remark One may also use full expansions like f = I 1 D n J 1 D m f,h I1 u J1 h I1 u J1 in the beginning of the proof Following the usual trickery this leads to the formula Tf,g= 1 πgood n E χ good I 1,I 2 D n J 1,J 2 D m smalleri1,i 2 Th I1 u J1, h I2 u J2 f,hi1 u J1 g,h I2 u J2 Here it may at first seem that there is no longer enough independence to add the goodness to J 1 However, one may simply write the summation as smalleri1,i 2 Th I1 u J1, g I2 f,hi1 u J1, χ good I 1,I 2 D n J 1 D m where one realizes that g I2 = J 2 D m g,h I2 u J2 h I2 u J2 = h I2 g,h I2 1

H Martikainen / Advances in Mathematics 229 2012 1734 1761 1743 does not depend on w m Then one may add the goodness to J 1 using independence and repeat the basic recipe to get the proposition 27 Strategy and formulation of the main theorem We fix the random variables w n and w m which fix the dyadic grids D n and D m respectively Then we study the summation ThI1 u J1, h I2 u J2 f,hi1 u J1 g,h I2 u J2 li 1 li 2 I 1 good lj 1 lj 2 J 1 good We more often than not suppress from the notation the important fact that I 1 and J 1 are good Then we perform the splitting li 1 li 2 = li 1 li 2 di 1,I 2 >li 1 γn li 2 1 γn + + + I 1 I 2 I 1 =I 2 li 1 li 2 di 1,I 2 li 1 γn li 2 1 γn I 1 I 2 = and similarly for the summation over the grid D m HeredA,B denotes the distance of the sets A and B recall that we use the l metric The first sum is the separated sum, then we have the inside sum, the equal sum and the nearby sum The summation over both the grids is split in to various types which also includes several mixed types The list is: separated/separated, separated/inside, separated/equal, separated/nearby, inside/inside, inside/equal, inside/nearby, equal/equal, equal/nearby, nearby/nearby and some symmetric mixed sums It seems reasonable to deal with these separately Note that actually the mixed sums where li 1 li 2 and lj 1 > lj 2 or li 1 > li 2 and lj 1 lj 2 are not completely symmetrical to this case However, the relevant difference is only in the full paraproduct that appears in the corresponding inside/inside part There one gets a bit different paraproducts, which are related to the assumptions that T 1 1 and T1 1 belong to the product BMO of R n R m We comment more on this on Remark 72 The goal is to represent all of these different parts as a sum of shifts with a good decay factor in front Combining all these cases together leads to our main theorem: 24 Theorem For a bi-parameter singular integral operator T as defined above, there holds for some bi-parameter shifts S i 1i 2 j 1 j 2 D n D m that Tf,g=C T E wn E wm i 1,i 2 Z 2 + j 1,j 2 Z 2 + 2 maxi 1,i 2 δ/2 2 maxj 1,j 2 δ/2 S i 1i 2 j 1 j 2 D n D m f,g, where non-cancellative shifts may only appear if i 1,i 2 = 0, 0 or j 1,j 2 = 0, 0 25 Corollary A bi-parameter singular integral T as defined above is L 2 bounded,

1744 H Martikainen / Advances in Mathematics 229 2012 1734 1761 We note that all of the appearing non-cancellative shifts will have a certain paraproduct structure, and this structure is explicit in the proof For example in [9], where the one-parameter representation theorem is applied, it is important to know the explicit structure of the noncancellative shifts The rest of the paper is dedicated to the piece by piece proof of this theorem We use X Y to mean X CY for some constant C and X Y to mean Y X Y Of course, we cannot absorb just any constants, but only ones that depend on the dimensions or the various constants from the assumptions concerning T 3 Separated/separated Let I 1 I 2 = K D n,k I 1 I 2 K and J 1 J 2 = V D m,v J 1 J 2 V By [8, Lemma 37] the separation conditions together with goodness imply that such minimal cubes exist, li 1 γ nli 1 I 2 1 γ n di 1,I 2 and lj 1 γ mlj 1 J 2 1 γ m dj 1,J 2 Let us write li 1 li 2 di 1,I 2 >li 1 γn li 2 1 γn = i 2 1 j 2 1 i 1 i 2 j 1 j 2 K D n V D m lj 1 lj 2 dj 1,J 2 >lj 1 γm lj 2 1 γm di 1,I 2 >li 1 γn li 2 1 γn I 1 I 2 =K li 1 =2 i 1 lk,li 2 =2 i 2 lk dj 1,J 2 >lj 1 γm lj 2 1 γm J 1 J 2 =V lj 1 =2 j 1 lv,lj 2 =2 j 2 lv 31 Lemma For I 1,I 2,J 1,J 2 in the above summation, we have the estimate Th I1 u J1, h I2 u J2 I 1 1/2 I 2 1/2 = 2 i 1δ/2 I 1 1/2 I 2 1/2 J 1 1/2 J 2 1/2 V li1 δ/2 lj1 δ/2 lk lv 2 j 1δ/2 J 1 1/2 J 2 1/2 V Proof Given a cube I we denote by c I its center We may write ThI1 u J1, h I2 u J2 = I 1 J 1 I 2 J 2 Kx,yh I1 y 1 u J1 y 2 h I2 x 1 u J2 x 2 dxdy, where we may, using cancellation, replace Kx,y by Kx,y K x,y 1,c J1 K x,c I1,y 2 + K x,c I1,c J1

H Martikainen / Advances in Mathematics 229 2012 1734 1761 1745 Since y 1 c I1 li 1 /2 1 2 li 1 γ nli 2 1 γ n di 1,I 2 /2 x 1 c I1 /2 and similarly y 2 c J1 x 2 c J1 /2, we have Kx,y K x,y1,c J1 K x,c I1,y 2 + K x,c I1,c J1 y 1 c I1 δ y 2 c J1 δ x 1 c I1 n+δ x 2 c J1 m+δ li 1 δ di 1,I 2 n δ lj 1 δ dj 1,J 2 m δ li 1 δ[ li 1 γ n lk 1 γ n ] n δ lj1 δ[ lj 1 γ m lv 1 γ m ] m δ = li 1 δ/2 lk δ/2 1 lj 1 δ/2 lv δ/2 V 1 Here we used li 1 γ nlk 1 γ n di 1,I 2 and γ n n + γ n δ = δ/2 and the analogous estimates involving J 1, J 2, V and m Recalling the L 2 normalization of the Haar functions and the fact that li 1 /lk = 2 i 1 and lj 1 /lv = 2 j 1 completes the proof We write ThI1 u J1, h I2 u J2 f,hi1 u J1 g,h I2 u J2 = C2 i1δ/2 2 j 1δ/2 Th I 1 u J1, h I2 u J2 C2 i1δ/2 2 j f,hi1 u J1 h I2 u J2,g 1δ/2 Define a I1 I 2 KJ 1 J 2 V = Th I 1 u J1, h I2 u J2 C2 i 1δ/2 2 j 1δ/2 if all the various goodness and separation conditions appearing in the summations are satisfied, and otherwise set a I1 I 2 KJ 1 J 2 V = 0 This enables us to write ThI1 u J1, h I2 u J2 li 1 li 2 di 1,I 2 >li 1 γn li 2 1 γn f,h I1 u J1 g,h I2 u J2 lj 1 lj 2 dj 1,J 2 >lj 1 γm lj 2 1 γm in the form C i 2 1 j 2 1 i 1 i 2 j 1 j 2 2 i 1δ/2 2 j 1δ/2 K,V A i 1 i 2 j 1 j 2 KV f,g, where A i 1i 2 j 1 j 2 KV f = I 1,I 2 K li 1 =2 i 1 lk li 2 =2 i 2 lk J 1,J 2 V lj 1 =2 j 1 lv lj 2 =2 j 2 lv a I1 I 2 KJ 1 J 2 V f,h I1 u J1 h I2 u J2

1746 H Martikainen / Advances in Mathematics 229 2012 1734 1761 with a I1 I 2 KJ 1 J 2 V I 1 1/2 I 2 1/2 J 1 1/2 J 2 1/2 V The corresponding bi-parameter shift with indices i 1,i 2,j 1,j 2 is by definition S i 1i 2 j 1 j 2 f = K,V A i 1i 2 j 1 j 2 KV f 4 Separated/inside As J 1 J 2, there is a child J 2,1 of J 2 such that J 1 J 2,1 We decompose ThI1 u J1, h I2 u J2 = ThI1 u J1, h I2 s J1 J 2 +u J2 J1 ThI1 u J1, h I2 1, where s J1 J 2 = χ J c 2,1 [u J2 u J2 J2,1 ] The relevant properties of s J1 J 2 are s J1 J 2 2 J 2 1/2 and spt s J1 J 2 J2,1 c We write li 1 li 2 J 1 J 2 di 1,I 2 >li 1 γn li 2 1 γn = i 2 1 i 1 i 2 j 1 1 K D n J 2 D m di 1,I 2 >li 1 γn li 2 1 γn I 1 I 2 =K li 1 =2 i 1 lk,li 2 =2 i 2 lk J 1 J 2 lj 1 =2 j 1 lj 2 41 Lemma For I 1,I 2,J 1,J 2 in the above summation, we have the estimate Th I1 u J1, h I2 s J1 J 2 I 1 1/2 I 2 1/2 J 1 1/2 J 2 1/2 li1 lk = 2 i 1δ/2 I 1 1/2 I 2 1/2 2 j 1δ/2 J 1 1/2 J 2 1/2 δ/2 lj1 δ/2 lj 2 Proof There is good separation by the goodness of J 1 if lj 1 <2 r lj 2 Indeed, in this case there holds dj 1,J c 2,1 2lJ 1 γ mlj 2,1 1 γ m lj 1 γ mlj 2 1 γ m Then we may write ThI1 u J1, h I2 s J1 J 2 = I 1 J 1 I 2 J2,1 c Kx,yh I1 y 1 u J1 y 2 h I2 x 1 s J1 J 2 x 2 dxdy,

H Martikainen / Advances in Mathematics 229 2012 1734 1761 1747 and replace Kx,y by Kx,y Kx,y 1,c J1 Kx,c I1,y 2 + Kx,c I1,c J1 using the cancellation of u J1 and h I1 We may utilize the kernel estimates to get Kx,y K x,y 1,c J1 K x,c I1,y 2 + K x,c I1,c J1 li 1 δ/2 lk δ/2 1 lj 1 δ 1 x 2 c J1 m+δ This yields where Th I1 u J1, h I2 s J1 J 2 I 1 1/2 I 2 1/2 J c 2,1 dx 2 x 2 c J1 li1 δ/2 J 1 1/2 lk m+δ J 2 1/2 lj 1 δ R m \Bc J1,dJ 1,J c 2,1 J c 2,1 dx 2 x 2 c J1 m+δ, dx 2 x 2 c J1 m+δ d J 1,J c 2,1 δ lj1 δ/2 lj 2 δ/2 Therefore, we have Th I1 u J1, h I2 s J1 J 2 I 1 1/2 I 2 1/2 li1 lk δ/2 J 1 1/2 J 2 1/2 lj1 δ/2 lj 2 We still need to deal with the case 2 r lj 2 lj 1 lj 2 This time we split ThI1 u J1, h I2 s J1 J 2 = ThI1 u J1, h I2 χ 3J1 s J1 J 2 + Th I1 u J1, h I2 χ 3J1 cs J 1 J 2 We have that Th I1 u J1, h I2 χ 3J1 s J1 J 2 equals [ Kx,y K x,ci1,y 2 ] h I1 y 1 u J1 y 2 h I2 x 1 s J1 J 2 x 2 dxdy I 1 J 1 I 2 3J 1 \J 2,1 so we can estimate using the mixed Hölder and size estimate that Th I1 u J1, h I2 χ 3J1 s J1 J 2 I 1 1/2 I 2 1/2 li1 δ/2 J 1 1/2 J 2 1/2 lk J 1 3J 1 \J 1 1 x 2 y 2 m dx 2 dy 2

1748 H Martikainen / Advances in Mathematics 229 2012 1734 1761 I 1 1/2 I 2 1/2 I 1 1/2 I 2 1/2 li1 δ/2 J 1 1/2 lk J 2 1/2 li1 δ/2 J 1 1/2 lk J 2 1/2 lj1 δ/2 lj 2 In the term Th I1 u J1, h I2 χ 3J1 cs J 1 J 2 we have good separation everywhere, so the Hölder estimate for K yields Th I1 u J1, h I2 χ 3J1 cs J 1 J 2 I 1 1/2 I 2 1/2 I 1 1/2 I 2 1/2 I 1 1/2 I 2 1/2 li1 δ/2 J 1 1/2 lk J 2 1/2 lj 1 δ li1 δ/2 J 1 1/2 lk J 2 1/2 li1 δ/2 J 1 1/2 lk J 2 1/2 3J 1 c dx 2 x 2 c J1 m+δ lj1 δ/2 lj 2 The above lemma enables us to write ThI1 u J1, h I2 s J1 J 2 f,hi1 u J1 g,h I2 u J2 li 1 li 2 di 1,I 2 >li 1 γn li 2 1 γn J 1 J 2 in the form C i 2 1 2 i1δ/2 2 j 1δ/2 S i 1i 2 j 1 0 f,g i 1 i 2 j 1 1 Next, we deal with the series with the term u J2 J1 Th I1 u J1, h I2 1 This will yield shifts of the type i 1,i 2, 0, 0 which are non-cancellative their R m parts are paraproducts in a certain sense As these shifts will be non-cancellative, we will also have to worry about their L 2 boundedness properties Write u J2 J1 ThI1 u J1, h I2 1 f,h I1 u J1 g,h I2 u J2 J 1 J 2 = J 1 J 2 g,h I2 u J2 u J2 J 1 ThI1 u J1, h I2 1 f,h I1 u J1 = V g,hi2 1 V ThI1 u V, h I2 1 f,h I1 u V The summands can further be written in the form V 1/2 Th I1 u V, h I2 1 f,h I1 u V h I2 u 0 V,g,

H Martikainen / Advances in Mathematics 229 2012 1734 1761 1749 where u 0 V = V 1/2 χ V Written in this way it is evident that we will have the required shift structure of the type i 1,i 2, 0, 0 42 Lemma The correct normalization holds Proof Let us first split Th I1 u V, h I2 1 I 1 1/2 I 2 1/2 li1 δ/2 V 1/2 lk ThI1 u V, h I2 1 = Th I1 u V, h I2 χ 3V + ThI1 u V, h I2 χ 3V c We have Th I1 u V, h I2 χ 3V V 1/2 V chv [ ThI1 χ V, h I2 χ 3V \V + Th I1 χ V, h I2 χ V ] For the first time, we use the kernel representations in R n to write Th I1 χ V, h I2 χ V in the form [ KχV,χ V x 1,y 1 K χv,χ V x 1,c I1 ] h I1 y 1 h I2 x 1 dx 1 dy 1 I 1 I 2 This gives that Th I1 χ V, h I2 χ V Cχ V,χ V I 1 1/2 I 2 1/2 V I 1 1/2 I 2 1/2 li1 lk li1 δ/2 lk δ/2 Notice that by the mixed Hölder and size estimates for K we have the same bound also for the term Th I1 χ V, h I2 χ 3V \V, and so there holds Th I1 u V, h I2 χ 3V I 1 1/2 I 2 1/2 li1 δ/2 V 1/2 lk The term Th I1 u V, h I2 χ 3V c is in control by the full kernel representation and the Hölder estimate for K These are non-cancellative shifts so we must separately demonstrate the L 2 boundedness For this, we prefer to write things in a different way:

1750 H Martikainen / Advances in Mathematics 229 2012 1734 1761 g,hi2 1 ThI1 u V, h I2 1 f,h I1 u V V V = g,hi2 1 V T h I2 1, h I1 1,u V f,hi1 1,u V V = C2 i 1δ/2 f,h I1 1, g,hi2 1 V b I 1 I 2,u V u V V = C2 i 1δ/2 f,h I1 1,Π bi1 I g,hi2 2 1 = C2 i 1δ/2 Πb f,hi1 I1 I 1, g,hi2 1 2 = C2 i 1δ/2 h I2 Π b I1 I 2 f,hi1 1,g, is the related paraproduct on R m de- where b I1 I 2 =T h I2 1, h I1 1 /C2 i1δ/2 and Π bi1 I 2 fined by the general formula Π b a = V a V b,u V u V 43 Lemma We have b I1 I 2 BMOR m with the bound b I1 I 2 BMOR m c I 1 1/2 I 2 1/2 Proof Let V be any cube in R m and a be any function in R m such that spt a V, a 1 and a = 0 It suffices to show that Th I1 a,h I2 1 I 1 1/2 I 2 1/2 li1 δ/2 V lk This is done by splitting 1 = χ 3V + χ 3V c before and using kernel estimates in a similar fashion as 44 Remark The strengthening of Lemma 42 to the related BMO estimate of Lemma 43 requires one to have the control Cu V,χ V C V for V -adapted functions u V with zero-mean It is precisely for these type of BMO reasons that merely the assumption Cχ V,χ V C V does not seem to be enough for the results of this paper Let us abbreviate I 1,I 2 K li 1 =2 i 1 lk,li 2 =2 i 2 lk = i 1,i 2 I 1,I 2 K We are ready to show the boundedness of our non-cancellative shifts of type i 1,i 2, 0, 0

H Martikainen / Advances in Mathematics 229 2012 1734 1761 1751 45 Proposition There holds K i 1,i 2 I 1,I 2 K Proof There holds by orthogonality that h I2 Π b I1 I 2 f,hi1 1 2 f 2 K i 1,i 2 I 1,I 2 K h I2 Π b I1 I 2 f,hi1 1 2 2 = K K i 2 i 1 I 2 K I 1 K i 2 i1 I 2 K Π b I1 I 2 f,hi1 1 2 2 Πb f,hi1 I1 2 I 1 2 I 1 K 2 Let p i 1 K be the orthogonal projection from L 2 R n to span{h I1 : I 1 K, li 1 = 2 i 1lK} Write also f y x = fx,y There holds by the boundedness of paraproducts defined by BMO functions and the previous lemma that Therefore, we have Π bi1 f,hi1 2 I 1 I 1 1/2 I 2 1/2 f,hi1 2 1 2 I 1 1/2 I 2 1/2 p i 1K f y x 1/2 dxdy 2 K i 1,i 2 I 1,I 2 K R m I 1 h I2 Π b I1 I 2 f,hi1 1 2 1 i1 I 1 1/2 K I 1 K R m I 1 i1 1 i1 I 1 K I 1 K I 1 K p i 1 K f y x 2 dxdy K = R m R n R m f y 2 2 dy = f 2 2, 2 p i 1K f y x 1/2 2 2 dxdy R m I 1 p i 1 K f y x 2 dxdy where we again utilized orthogonality

1752 H Martikainen / Advances in Mathematics 229 2012 1734 1761 We end this section by concluding that u J2 J1 ThI1 u J1, h I2 1 f,h I1 u J1 g,h I2 u J2 li 1 li 2 J 1 J 2 di 1,I 2 >li 1 γn li 2 1 γn = C 2 i1δ/2 S i1i200 f,g i 2 1 i 1 i 2 5 Separated/equal There holds that Th I1 u V, h I2 u V I 1 1/2 I 2 1/2 li1 δ/2 lk Indeed, to see this, first estimate [ Th I1 u V, h I2 u V V 1 + V chv We have by the kernel representation in R n that For V V the estimate Th I1 χ V, h I2 χ V Th I1 χ V, h I2 χ V V,V chv V V Th I1 χ V, h I2 χ V ] Th I1 χ V, h I2 χ V Cχ V,χ V I 1 1/2 I 2 1/2 V I 1 1/2 I 2 1/2 V I 1 1/2 I 2 1/2 li1 lk li1 δ/2 lk δ/2 li1 δ/2 lk follows from the full kernel representation using the mixed Hölder and size estimate of K We may thus immediately write that ThI1 u V, h I2 u V f,hi1 u V g,h I2 u V li 1 li 2 di 1,I 2 >li 1 γn li 2 1 γn = C 2 i1δ/2 S i1i200 f,g, i 2 1 i 1 i 2 where in this case S i 1i 2 00 are cancellative shifts V

H Martikainen / Advances in Mathematics 229 2012 1734 1761 1753 6 Separated/nearby For the J 1 and J 2 in the nearby summation it follows by [8, Lemma 37] that V = J 1 J 2 satisfies lv 2 r lj 1 Thus, we may write li 1 li 2 di 1,I 2 >li 1 γn li 2 1 γn = i 2 1 r j 1 i 1 i 2 j 1 =1 j 2 =1 K lj 1 lj 2 dj 1,J 2 lj 1 γm lj 2 1 γm J 1 J 2 = It is easy to get the required estimate V di 1,I 2 >li 1 γn li 2 1 γn I 1 I 2 =K li 1 =2 i 1 lk,li 2 =2 i 2 lk Th I1 u J1, h I2 u J2 I 1 1/2 I 2 1/2 li1 δ/2 lk dj 1,J 2 lj 1 γm lj 2 1 γm,j 1 J 2 = J 1 J 2 =V lj 1 =2 j 1 lv,lj 2 =2 j 2 lv by using the full kernel representation and the mixed Hölder and size estimate of K Therefore, we are able to realize this part in the form 7 Inside/inside We decompose C i 2 1 r j 1 i 1 i 2 j 1 =1 j 2 =1 2 i 1δ/2 2 j 1δ/2 S i 1i 2 j 1 j 2 f,g ThI1 u J1, h I2 u J2 = ThI1 u J1, s I1 I 2 s J1 J 2 +u J2 J1 ThI1 u J1, s I1 I 2 1 +h I2 I1 ThI1 u J1, 1 s J1 J 2 +h I2 I1 u J2 J1 ThI1 u J1, 1, where s I1 I 2 = χ I c 2,1 h I2 h I2 I2,1 and s J1 J 2 = χ J c 2,1 [u J2 u J2 J2,1 ] The relevant properties are spt s I1 I 2 I2,1 c,spts J 1 J 2 J2,1 c, s I 1 I 2 2 I 2 1/2 and s J1 J 2 2 J 2 1/2 71 Lemma There holds Th I1 u J1, s I1 I 2 s J1 J 2 I 1 1/2 li1 I 2 1/2 li 2 δ/2 J 1 1/2 J 2 1/2 lj1 δ/2 lj 2

1754 H Martikainen / Advances in Mathematics 229 2012 1734 1761 Proof In the case that li 1 <2 r li 2 and lj 1 <2 r lj 2 one may use the Hölder estimate of K In the case 2 r li 2 li 1 li 2 and 2 r lj 2 lj 1 lj 2 one splits ThI1 u J1, s I1 I 2 s J1 J 2 = ThI1 u J1, χ 3I1 s I1 I 2 χ 3J1 s J1 J 2 + Th I1 u J1, χ 3I1 s I1 I 2 χ 3J1 cs J 1 J 2 + Th I1 u J1, χ 3I1 cs I 1 I 2 χ 3J1 s J1 J 2 + Th I1 u J1, χ 3I1 cs I 1 I 2 χ 3J1 cs J 1 J 2 The first term is controlled by the size estimate of the full kernel: Th I1 u J1, χ 3I1 s I1 I 2 χ 3J1 s J1 J 2 I 1 1/2 I 2 1/2 dx 1 dy 1 x 1 y 1 n J 1 1/2 J 2 1/2 I 1 3I 1 \I 1 I 1 1/2 J 1 1/2 I 2 1/2 J 2 1/2 I 1 1/2 li1 I 2 1/2 li 2 δ/2 J 1 1/2 J 2 1/2 lj1 δ/2 lj 2 J 1 3J 1 \J 1 dx 2 dy 2 x 2 y 2 m The two terms after that are controlled using the mixed size and Hölder estimates of K The last term is controlled using the Hölder estimate of K The mixed cases where 2 r li 2 li 1 li 2 and lj 1 <2 r lj 2 or li 1 <2 r li 2 and 2 r lj 2 lj 1 lj 2 are handled similarly The above lemma shows that ThI1 u J1, s I1 I 2 s J1 J 2 f,hi1 u J1 g,h I2 u J2 I 1 I 2 J 1 J 2 can be realized in the form C 2 i1δ/2 2 j 1δ/2 S i 10j 1 0 f,g I 1 I 2 i 1 =1 j 1 =1 The part u J2 J1 ThI1 u J1, s I1 I 2 1 f,h I1 u J1 g,h I2 u J2 J 1 J 2 can be written in the form C 2 i 1δ/2 S i1000 f,g, where i 1 =1

H Martikainen / Advances in Mathematics 229 2012 1734 1761 1755 S i 1000 f = K I 1 K li 1 =2 i 1 lk h K Π b I1 K f,hi1 1 and b I1 K =T s I1 K 1, h I1 1 /C2 i 1δ/2 Since one can check b I1 K BMOR m c I 1 1/2 / 1/2, it is similarly as has already been done in the separated/inside case seen that S i 1000 f 2 f 2 The proof of the BMO estimate is similar to the proof of the previous lemma Completely analogously one can write in the form h I2 I1 ThI1 u J1, 1 s J1 J 2 f,hi1 u J1 g,h I2 u J2 J 1 J 2 I 1 I 2 C 2 j 1δ/2 S 00j10 f,g, j 1 =1 where S 00j 10 is a non-cancellative L 2 bounded shift The last part collapses to where h I2 I1 u J2 J1 ThI1 u J1, 1 f,h I1 u J1 g,h I2 u J2 J 1 J 2 I 1 I 2 g K V T 1,h K u V f,hk u V =C ΠT 1/C f,g, K,V Π b f = K,Vf K V b,h K u V h K u V is a bounded shift of the type 0, 0, 0, 0 for b in the product BMO of R n R m For the boundedness of such paraproducts see, for example, [19, p 41] So here we can set S 0000 = Π T 1/C Note that the correct normalization for this shift would follow just from the various kernel estimates and the weak boundedness property 72 Remark In the proof of this representation theorem there are paraproducts of essentially three different types We have seen two types already: the full paraproduct Π b f = K,Vf K V b,h K u V h K u V

1756 H Martikainen / Advances in Mathematics 229 2012 1734 1761 and some half paraproducts, like f K I 1 K li 1 =2 i 1 lk h K Π b I1 K f,hi1 1, which have a paraproduct part only in the R n or R m variable The third type of paraproduct does not surface in our current sum, where li 1 li 2 and lj 1 lj 2 However, for example in the mixed case, where li 1 li 2 and lj 1 > lj 2, one has in the corresponding inside/inside part the mixed full paraproduct f K V 1 T 1 1, h K u V f,hk χ V χ K u V K,V = K,V T1 1, h K u V f,hk u 2 V h 2 K u V, which is L 2 bounded as T 1 1 belongs to the product BMO of R n R m by assumption Indeed, the boundedness of such a mixed full paraproduct defined by some product BMO function is also known, see for example [19, pp 45 46] 8 Inside/equal One splits ThI1 u V, h I2 u V = ThI1 u V, s I1 I 2 u V +hi2 I1 ThI1 u V, 1 u V, where s I1 I 2 = χ I c 2,1 h I2 h I2 I2,1 satisfies spt s I1 I 2 I c 2,1 and s I 1 I 2 2 I 2 1/2 One may write I 1 I 2 ThI1 u V, s I1 I 2 u V f,hi1 u V g,h I2 u V V in the form C 2 i 1δ/2 S i1000 f,g i 1 =1 with cancellative shifts For this one needs that Estimate Th I1 u V, s I1 I 2 u V I 1 1/2 li1 δ/2 I 2 1/2 li 2

H Martikainen / Advances in Mathematics 229 2012 1734 1761 1757 [ Th I1 u V, s I1 I 2 u V V 1 + V chv V,V chv V V Th I1 χ V, s I1 I 2 χ V ] Th I1 χ V, s I1 I 2 χ V In the case V V use the full kernel representation In the diagonal case use the kernel representation in R n IflI 1 <2 r li 2, use the mixed size and Hölder estimate of K in the case V V or the Hölder estimate for the kernel K χv,χ V in the case V = V In the case 2 r li 2 li 1 split s I1 I 2 = χ 3I1 s I1 I 2 + χ 3I1 cs I 1 I 2 ForV V use the size estimate of K for the first term and the mixed size and Hölder estimate of K for the second term In the case V = V use the size estimate of K χv,χ V for the first term, and the Hölder estimate of K χv,χ V for the second term One writes h I2 I1 ThI1 u V, 1 u V f,hi1 u V g,h I2 u V in the form I 1 I 2 where in this case V C S 0000 f,g, S 0000 f = V Π b V f,uv 2 uv and b V =T 1 u V, u V 2 /C This is indeed a non-cancellative shift of the type 0, 0, 0, 0 81 Lemma There holds b V BMOR n c Proof Fix a cube K R n and a function a so that spt a K, a 1 and a = 0 We need to show that Ta u V, 1 u V We begin with the split Ta uv, 1 u V = Ta uv, χ K u V + Ta uv, χ 3K\K u V There holds [ Ta uv, χ 3K\K u V V 1 + Ta u V, χ 3K c u V + V chv V,V chv V V Ta χ V, χ 3K\K χ V Ta χ V, χ 3K\K χ V ],

1758 H Martikainen / Advances in Mathematics 229 2012 1734 1761 where and Ta χ V, χ 3K\K χ V 1 x 1 y 1 n dx 1 dy 1 K 3K\K V V Ta χ V, χ 3K\K χ V CχV,χ V 1 x 2 y 2 m dx 2 dy 2 V K 3K\K 1 x 1 y 1 n dx 1 dy 1 V Furthermore, we have [ Ta u V, χ 3K c u V V 1 + V chv V,V chv V V Ta χ V, χ 3K c χ V ] Ta χ V, χ 3K c χ V, where Ta χ V, χ 3K c χ V lk δ dx 1 x 1 c K n+δ 3K c V V 1 x 2 y 2 m dx 2 dy 2 V and Ta χ V, χ 3K c χ V Cχ V,χ V K 3K c For the first term we again begin with the estimate Ta u V, χ K u V V 1 V,V chv lk δ x 1 c K n+δ dx 1 dy 1 V Ta χ V, χ K χ V Let us consider the case V V Inthiscasewehave Ta χ V, χ K χ V = K a,χk x 2,y 2 dx 2 dy 2 V V 1 Ca,χ K x 2 y 2 m dx 2 dy 2 V V V

H Martikainen / Advances in Mathematics 229 2012 1734 1761 1759 Thus, we are only left with the need for the estimate Ta χ V, χ K χ V V but this is one of the diagonal BMO assumptions Because of this lemma, one can show, similarly but with a bit less effort than in Proposition 45, that S 0000 is L 2 bounded 9 Inside/nearby This goes very much so in the same vein as the inside/equal case In fact, this is easier since the nearby cubes do not intersect by definition From the series with the matrix element Th I1 u J1, s I1 I 2 u J2 we get C r j 1 i 1 =1 j 1 =1 j 2 =1 2 i 1δ/2 2 j 1δ/2 S i 10j 1 j 2 f,g From the series with the matrix element h I2 I1 Th I1 u J1, 1 u J2 we get with bounded non-cancellative shifts 10 Equal/equal C r j 1 j 1 =1 j 2 =1 2 j 1δ/2 S 00j 1j 2 f,g This part can be realized in the form CS 0000 f,g for a cancellative shift, since one can just estimate Th K u V, h K u V 1 This estimate is an easy consequence of the weak boundedness property and the size estimates of our kernels 11 Equal/nearby This part is clearly of the form C r j 1 j 1 =1 j 2 =1 2 j 1δ/2 S 00j 1j 2 f,g, where the shifts are cancellative Here one can again just use the estimate Th K u J1, h K u J2 1, which follows just from the size estimates of our kernels 12 Nearby/nearby This part is of the form C r i 1 r j 1 i 1 =1 i 2 =1 j 1 =1 j 2 =1 2 i 1δ/2 2 j 1δ/2 S i 1i 2 j 1 j 2 f,g

1760 H Martikainen / Advances in Mathematics 229 2012 1734 1761 once again because of the easy estimate Th I1 u J1, h I2 u J2 1 This follows from the size estimate for the full kernel 13 Concluding remarks It would be interesting to prove the analogous result in the case of three or more parameters The core techniques of this paper should prove to be useful in proving such results However, also new ideas might be needed With more parameters comes more paraproducts of increasing variety, and these may very well cause additional problems A careful extension is an interesting further development The possible applications of this paper to the study of weighted questions in the bi-parameter setting form an ongoing collaborative project It is our understanding that even the qualitative theory is not currently quite satisfactory Fefferman [4] proves a qualitative weighted L p w estimate for w A p/2 R n R m and p>2 Perhaps the result could hold with A p/2 replaced by A p and for every p 1, Here the product A p is defined by the condition w Ap R n R m = sup y R m w,y Ap R n sup x R n wx, Ap R m < Regarding possible sharp theory one could also consider the equivalent rectangular definition of this characteristic Acknowledgments The question concerning the representation formula for bi-parameter singular integrals was first asked by M Lacey and communicated to T Hytönen by S Pott The author wishes to thank T Hytönen for suggesting the study of this topic, for useful discussions, and for reading through the manuscript References [1] SYA Chang, R Fefferman, Some recent developments in Fourier analysis and H p theory on product domains, Bull Amer Math Soc 12 1 1985 1 43 [2] G David, J-L Journé, A boundedness criterion for generalized Calderón Zygmund operators, Ann of Math 120 2 1984 371 397 [3] G David, J-L Journé, S Semmes, Opérateurs de Calderón Zygmund, fonctions para-accretives et interpolation, Rev Mat Iberoam 1 4 1985 1 56 [4] R Fefferman, Harmonic analysis on product spaces, Ann of Math 126 1 1987 109 130 [5] R Fefferman, E Stein, Singular integrals on product spaces, Adv Math 45 2 1982 117 143 [6] S Ferguson, M Lacey, A characterization of product BMO by commutators, Acta Math 189 2 2002 143 160 [7] T Hytönen, The sharp weighted bound for general Calderón Zygmund operators, Ann of Math 2011, in press, arxiv:10074330, 2010 [8] T Hytönen, Representation of singular integrals by dyadic operators, and the A 2 theorem, preprint, arxiv: 11085119, 2011 [9] T Hytönen, M Lacey, H Martikainen, T Orponen, MC Reguera, E Sawyer, I Uriarte-Tuero, Weak and strong type estimates for maximal truncations of Calderón Zygmund operators on A p weighted spaces, J Anal Math 2011, in press, arxiv:11035229, 2011 [10] T Hytönen, H Martikainen, Non-homogeneous Tb theorem and random dyadic cubes on metric measure spaces, J Geom Anal 2011, doi:101007/s12220-011-9230-z, in press, arxiv:09114387, 2009 [11] T Hytönen, C Pérez, S Treil, A Volberg, Sharp weighted estimates for dyadic shifts and the A 2 conjecture, preprint, arxiv:10100755, 2010 [12] J-L Journé, Calderón Zygmund operators on product spaces, Rev Mat Iberoam 1 3 1985 55 91 [13] M Lacey, J Metcalfe, Paraproducts in one and several parameters, Forum Math 19 2 2007 325 351 [14] M Lacey, S Petermichl, MC Reguera, Sharp A 2 inequality for Haar shift operators, Math Ann 348 1 2010 127 141

H Martikainen / Advances in Mathematics 229 2012 1734 1761 1761 [15] C Muscalu, J Pipher, T Tao, C Thiele, Bi-parameter paraproducts, Acta Math 193 2 2004 269 296 [16] F Nazarov, A Reznikov, A Volberg, The proof of A 2 conjecture in a geometrically doubling metric space, preprint, arxiv:11061342, 2011 [17] F Nazarov, S Treil, A Volberg, The Tb-theorem on non-homogeneous spaces, Acta Math 190 2 2003 151 239 [18] S Petermichl, The sharp bound for the Hilbert transform on weighted Lebesgue spaces in terms of the classical A p characteristic, Amer J Math 129 5 2007 1355 1375 [19] S Pott, P Villarroya, A T1 theorem on product spaces, preprint, arxiv:11052516, 2011 [20] S Treil, H 1 and dyadic H 1, in: Linear and Complex Analysis, in: Amer Math Soc Transl Ser 2, vol 226, Amer Math Soc, Providence, RI, 2009, pp 179 193