A VARIATION NORM CARLESON THEOREM. 1. Introduction The partial (inverse) Fourier integral of a Schwartz function f on R is defined as

A VARIATION NORM CARLESON THEOREM R. OBERLIN, A. SEEGER, T. TAO, C. THIELE, AND J. WRIGHT 1. Introduction The partial (inverse) Fourier integral of a Schwartz function f on R is defined as ξ S[f](ξ, x) = ˆf(ξ )e 2πiξ x dξ where ˆf denotes the Fourier transform of f. The behaviour of the partial Fourier integrals as ξ tends to has been a subject of interest for a long time. The following uniform control is well known: Theorem 1.1. Suppose f is a Schwartz function and 1 < p <, then (1) sup S[f](ξ, ) L p (R) C p f L p (R). ξ R By a standard approximation argument it follows that S[f] may be meaningfully defined as a continuous function in ξ for almost every x whenever f L p and the a priori bound of the theorem continues to hold for such functions. Theorem 1.1 is intimately related to almost everywhere convergence of partial Fourier sums for functions in L p [0, 1]. Via a transference principle [12], it is indeed equivalent to the celebrated theorem by Carleson [2] for p = 2 and the extension of Carleson s theorem by Hunt [9] for 1 < p < ; see also [7],[15], and [8]. The main purpose of this paper is to sharpen Theorem 1.1 towards control of the variation norm in the parameter ξ. Thus we consider mixed L p and V r norms of the type: S[f] L p (V r ) = R sup K N ξ 0 <...<ξ K R ( K k=1 We will prove the following, where r = r/(r 1): Theorem 1.2. Suppose r > 2 and r < p <. Then (2) S[f] L p (V r ) C p,r f L p (R). At the endpoint r = p we have the result: S[f](ξ k, x) S[f](ξ k 1, x) r ) p r dx Theorem 1.3. Suppose 2 < r < and r = p. Then for all measurable functions f and sets F with f 1 F, we have λ p {x : S[f](, x) V r ξ λ} C p F. Note that if in the above definition of the mixed L p and V r norm we interchange the order between integration in the x variable and taking the supremum over the choices of K and the points ξ 0 to ξ K so that these choices become independent of the variable x, then the estimates corresponding to Theorems 1.2 and 1.3 are weaker and follow by an inequality of Rubio de Francia [22], see also the proof [13] 1 1 p.

which is closer to the methods of this paper. As will be discussed in Section 2, the conditions on the exponents in Theorem 1.2 are sharp, and in the range of Lorentz norms no better than the stated weak-type estimate is possible in Theorem 1.3. While the concept of r-variation norm is at least as old as Wiener s 1920s paper on quadratic variation [25], such norms and related oscillation norms have been pioneered by Bourgain [1] as a tool to prove convergence results for ergodic averages. Bourgain s simple motivation is that the variational estimate, rather than the weaker L estimate, allows him to prove pointwise convergence without previous knowledge that pointwise convergence holds for a dense subclass of functions. Such dense subclasses of functions, while usually available in the setting of analysis on Euclidean space, are less abundant in the ergodic theory setting. In Appendix D we demonstrate the use of Theorem 1.2 in the setting of Wiener-Wintner type theorems as developed in [14]. Additionally, we are motivated by the fact that variation norms are in certain situations more stable under nonlinear perturbation than supemum norms. For example one can deduce bounds for certain r-variational lengths of curves in Lie groups from the corresponding lengths of the trace of the curves in the corresponding Lie algebras, see Appendix C for definitions and details. What we have in mind is proving Carleson type theorems for nonlinear perturbations of the Fourier transform as discussed in [19], [20]. Unfortunately the naive approach fails and the ultimate goal remains unattained since we only know the correlation between lengths of the trace and the original curve for r < 2, while the variational Carleson theorem only holds for r > 2. Nonetheless, this method allows one to see that a variational version of the Christ-Kiselev theorem [4] follows from a variational Menshov-Paley-Zygmund theorem which we prove in Appendix B. The variational Carleson theorem can be viewed as an endpoint estimate in this theory. The Carleson-Hunt theorem has previously been generalized by using other norms in place of the variation norm, see for example [14], [5], [6]. Our proof of Theorem 1.2 will follow the method of [15] as refined in [8]. In Section 3 we reduce the problem to that of bounding certain model operators which map f to linear combinations of wave-packets associated to collections of multitiles. In Section 5 we bound the model operators when the collection of multitiles is of a certain type called a tree; this bound is in terms of two quantities, energy and density, which are associated to the tree. These quantities are defined in Section 4 and an algorithm is given to decompose an arbitrary collection of multitiles into a union of trees with controlled energy and density. These ingredients are combined to complete the proof in Section 6. Finally, a variational estimate which is crucial for the proof of the model operator bound for trees is given in Appendix A. The first author was partially supported by NSF VIGRE grant DMS 0502315. The second author was partially supported by NSF grant DMS 0652890. The third author was partially supported by NSF grant DMS 0701302. The fourth author was partially supported by a grant from the McArthur Foundation, and by NSF grant DMS 0649473. The fifth author was partially supported by an EPSRC grant. 2. Optimality of the exponents In [11] it was shown that the condition r > 2 is necessary for the Fourier series analog of the bound (2) to hold; we begin by noting that similar considerations apply to the Fourier transform on the real line. For any integer k, consider the 2

dyadic averaging operator E k [f](x) = 1 I k (x) I k (x) f(y) dy where I k (x) is the dyadic interval of length 2 k containing x. From arguments in [21] and [11] one sees that E is unbounded from L p L p x (V k 2 ). Applying the square-function estimate from Appendix A it then follows that for 1 2, and where ψ k = 2 k ψ(2 k ). Letting S t [f](x) = S[f](t, x) S[f]( t, x) one applies the standard estimates ( ) 1/2 ( ) 1/2 S 2 k[g k ] 2 L p (R) C p g k 2 L p (R) and k Z k Z ( ) 1/2 (ψ k ψ k+1 ) f 2 L p (R) C p f L p (R) k Z with g k = (ψ k ψ k+1 ) f to see that ( ) 1/2 S 2 k[f] ψ k+1 f 2 L p (R) C p f L p (R) k Z for 1 r is a consequence of the following argument. First note that, for 1 t 2 we have S t [ψ 1 ](x) = sin(tx) πx. For integers n let t x,n = π/2+nπ so that x sin(t x,n x) sin(t x,n+1x) x x = 2 x. For each x let E(x) = [1, 2] {t x,n : n Z} and note that for large x, the cardinality of E(x) is C x and so S t [ψ 1 ](x) V r t C(1 + x ) 1/r. It follows that S[ψ 1 ] L p x (Vt r) = for p < r, and in fact the Lorentz norm S[ψ 1 ] L r,s = for s <. x (V r ) t 3. The model operators To start the proof of Theorems 1.2 and 1.3, we first linearize the variation norm. Fix K, measurable real valued functions ξ 0 (x) <... < ξ K (x), and measurable complex valued functions a 1 (x),...,a K (x) satisfying a 1 (x) r +...+ a K (x) r = 1. Letting K S [f](x) = (S[f](ξ k (x), x) S[f](ξ k 1 (x), x))a k (x) k=1 3

Theorem 1.2 will follow by standard arguments from the estimate (3) S [f] L p (R) C f L p (R) where C is independent of K and the linearizing functions, and where f is any Schwartz function (an analogous statement holds for the endpoint p = r result, all such considerations for Theorem 1.3 will henceforth remain implicit). Let D = {[2 k m, 2 k (m + 1)) : m, k Z} be the set of dyadic intervals. A tile will be any rectangle I ω where I, ω are dyadic intervals, and I ω = 1/2. We will write S as the sum of wave packets adapted to tiles, and then decompose the operator into a finite sum of model operators by sorting the wave packets into a finite number of classes. For each k, S[f](ξ k, x) S[f](ξ k 1, x) = 1 (ξk 1,ξ k )(ξ) ˆf(ξ)e 2πiξx dξ. To suitably express the difference above as a sum of wave packets, we will first need to construct a partition of 1 (ξk 1,ξ k ) adapted to certain dyadic intervals. The fact that (ξ k 1, ξ k ) has two boundary points instead of the one from (, ξ k ) will necessitate a slightly more involved discretization argument than that in [15]. For any ξ < ξ, let J ξ,ξ be the set of maximal dyadic intervals J such that J (ξ, ξ ) and dist(j, ξ), dist(j, ξ ) J. Let ν be a smooth function from R to [0, 1] which vanishes on (, 1/100] and is identically equal to 1 on [1/100, ). Given an interval J = [a, b) and i { 1, 0, 1}, define ϕ J,i (ξ) = ν ( ξ a 2 i J ) ν ( ξ b J For each J J ξ,ξ, one may check that there is a unique interval J J ξ,ξ which lies strictly to the left of J and satisfies dist(j, J) = 0, and one may check that J has size J /2, J, or 2 J. We define ϕ J = ϕ J,i(J) where i(j) is chosen so that J = 2 i(j) J. Then 1 (ξ,ξ ) = ϕ J. J J ξ,ξ We now write each multiplier ϕ J as the sum of wave packets. For every tile P = I J, define φ P (x) = I ˇ ϕj (x c(i)) where c(i) denotes the center of the interval I andˇdenotes the inverse Fourier transform. For each J, we then have f, φ I J φ I J = ˆfϕ J. This gives: S [f](x) = K k=1 I =1/(2 J ) J J ξk 1 (x),ξ k (x) I =(1/(2 J )) ). f, φ I J φ I J a k (x). The wave packets will be sorted into a finite number of classes, each well suited for further analysis. Sorting is accomplished by dividing every J ξ,ξ into a finite number of disjoint sets. These sets will be indexed by a fixed subset of {1, 2, 3} {1, 2, 3, 4} 2 {left, right}. Specifically, for each (m, n, side) {1, 2, 3, 4} 2 {left, right}, we define J ξ,ξ,(1,m,n,side) = {J D : J (ξ, ξ ), ξ is in the interval J (m+1) J, ξ is in the interval J + (n + 1) J, and J is the side-child of its dyadic parent}. 4

J ξ,ξ,(2,m,n,side) = {J D : J (ξ, ξ ), ξ is in the interval J (m + 1) J, dist(ξ, J) n J, and J is the side-child of its dyadic parent}. J ξ,ξ,(3,m,n,side) = {J D : J (ξ, ξ ), dist(ξ, J) > m J, ξ is in the interval J + (n + 1) J, and J is the side-child of its dyadic parent}. We will choose R {1, 2, 3} {1, 2, 3, 4} 2 {left, right} so that for each ξ, ξ, the collection {J ξ,ξ,ρ} ρ R is pairwise disjoint and J ξ,ξ = ρ R J ξ,ξ,ρ. We will also assume that for each ρ R there is an i(ρ) { 1, 0, 1} such that J = 2 i(ρ) J for every ξ < ξ, J J ξ,ξ,ρ and J J ξ,ξ with J strictly to the left of J and dist(j, J ) = 0. One may check that these conditions are satisfied, say, for R = {(1, 2, 1, left), (1, 2, 2, left), (1, 3, 1, left), (1, 3, 2, left), (2, 1, 1, left), (2, 1, 1, right), (2, 2, 1, right), (3, 4, 1, left), (3, 3, 1, right), (3, 4, 2, left)}. It now follows that S [f] = ρ R S ρ [f] where S ρ [f](x) = K f, φ I J φ I J a k (x). k=1 J J ξk 1 (x),ξ k (x),ρ I =1/(2 J ) It will be convenient to rewrite each operator S ρ in terms of multitiles. A multitile will be a subset of R 2 of the form I ω where I D and where ω is the union of three intervals ω l, ω u, ω h. For each ρ = (l, m, n, side) R, we consider a set of ρ-multitiles which is parameterized by {(I, ω u ) : I, ω u D, I ω u = 1/2, and ω u is the side-child of its parent}. Specifically, given ω u = [a, b) If ρ = (1, m, n, side) then ω l = ω u (m + 1) ω u and ω h = ω u +(n + 1) ω u. If ρ = (2, m, n, side) then ω l = ω u (m+1) ω u and ω h = [a+(n+1) ω u, ) If ρ = (3, m, n, side) then ω l = (, b (m + 1) ω u ) and ω h = ω u + (n + 1) ω u. For every ρ-multitile P, let a P (x) = a k (x) if k satisfies 1 k K and ξ k 1 (x) ω l and ξ k ω h (such a k would clearly be unique), and a P (x) = 0 if there is no such k. Then, using P ρ to denote the set of ρ-multitiles, we have S ρ [f](x) = P P ρ f, φ P φ P (x)a P (x) where, for each ρ-multitile P, φ P (x) = I ϕωu,i(ρ)(x ˇ c(i)). Inequality (3) and hence Theorem 1.2 will then follow after proving the bound (4) S ρ [f] L p C f L p for each ρ R. The argument for the case ρ = (3, m, n, side) is analogous to that for the case ρ = (2, m, n, side), so below we will assume ρ = (2, m, n, side) in which case we say that ρ is a 2-index or ρ = (1, m, n, side) in which case we say that ρ is a 1-index. 5

We want to prove 4. Energy and density P f, φ P φ P a P L p (R) C f L p (R) where P ranges over an arbitrary finite collection of ρ-multitiles, ρ is a 1 or 2-index, and C does not depend on this collection or on the linearizing functions (which were used to define the functions a P ). By a standard limiting argument, this is sufficient to prove (4) and hence Theorem 1.2. The wave packets φ P are adapted to the multitiles P in the following sense. For each P, ˆφ P is supported on the interval with the same center as ω u and 11 the 10 diameter, which we denote 11 we have (5) ω 10 u. Fixing a large C and N and defining, for each I, ( ) N 1 + w I (x) = C 1 I x c(i) I d n φ dx n(e 2πic(ωu)x P )(x) C I (1/2) n w I (x) for n 0, where the constant above may depend on n. Fix 1 C 3 < C 2 < C 1 such that for every multitile P, ˆφ P is supported on C 3 ω u, C 2 ω u C 2 ω l =, C 2 ω u ω h =, C 2 ω l C 1 ω u, C 2 ω u C 1 ω l. One may check that the values C 3 = 11/10, C 2 = 2, and C 1 = 12 satisfy these properties. Given a dyadic interval I T and a point ξ T R, we say that a collection T of multitiles is a tree with top interval I T and top frequency ξ T if I I T and ω T ω m for every P T where ω T is the interval [ξ T (C 2 1)/(4 I T ), ξ T +(C 2 1)/(4 I T )) and ω m is the convex hull of C 2 ω u C 2 ω l. A tree T will be said to be l-overlapping if for every P T, ξ T C 2 ω l ; it will be said to be l-lacunary if for every P T, ξ T C 2 ω l. We split our arbitrary finite collection of multitiles into a bounded number of subcollections (i.e. henceforth all multitiles will be assumed to belong to a fixed subcollection) to obtain the following two separation properties. (6) If P, P satisfy ω u < ω u, then ω u C 2 C 3 2C 1 ω u. (7) If P, P satisfy C 1 ω u C 1 ω u and ω u = ω u then ω u = ω u. From (6), it follows that if P, P T, T is a l-lacunary tree, and ω u < ω u then C 3 ω l C 3 ω l =, and that if P, P T, T is an l-overlapping tree, and ω u < ω u then C 3 ω u C 3 ω u =. From (7), it follows that if P, P T, T is a tree, and ω u = ω u, then I I =. Given any collection of multitiles P, we define energy(p) = sup T 1 f, φ P I T 2 P T where the sup ranges over all l-overlapping trees T P. We set density(p) = 6

= sup T ( 1 I T E K (1 + x c(i T ) / I T ) 4 a k (x) r 1 ωt (ξ k 1 (x)) dx k=1 ) 1/r where the sup is over all non-empty trees T P, and where E R is a fixed set which will be chosen later. The following proposition allows one to decompose an arbitrary collection of multitiles into the union of trees, where the trees are divided into collections T j with the energy of trees from T j bounded by 2 j. The control over energy is balanced by an L q bound for the functions N j,l = T T j 1 2 l I T. In contrast to [15] and [8], it is necessary here to consider q > 1 and l > 0 in order to effectively use the tree estimate Proposition 5.1 with q > 1. The bound (11) permits one to make further decompositions to take advantage of large F in the L q bound for the N j,l while maintaining compatibility with bounds for trees with a fixed density obtained from Proposition 4.2. Proposition 4.1. Let P be a collection of multitiles with energy bounded above by e, and let f be bounded above by 1 F. Then, there is a collection of trees T such that (8) I T Ce 2 F and T T and such that, for every integer l 0, energy (P \ T T T) e/2. (9) T T1 2 l I T BMO C2 2l e 2. Furthermore, if for some collection of trees T, (10) P = T T T then (11) I T C T T T T I T. Above, and subsequently, BMO denotes the dyadic BMO norm. Proof. Without loss of generality, assume that e > 0. We select trees through an iterative procedure. Suppose that trees S k, T k have been chosen for k = 1,...,j. Set j P j = P \ If energy(p j ) e/2 then we terminate the procedure, set T = {T k } 1 k j and n = j. Otherwise, we may find an l-overlapping tree S P j such that 1 (12) f, φ P 2 e 2 /4. I S P S Choose such a tree S j+1 with ξ Sj+1 maximal in the sense that for any l-overlapping tree S satisfying (12) with ξ S > ξ Sj+1 we have that (S j+1, ξ S, I Sj+1 ) is an l- overlapping tree. Let T j+1 be the maximal, with respect to inclusion, tree with 7 k=1 T k

top data (ξ Sj+1, I Sj+1 ). This process will eventually stop since each T j is nonempty and P is finite. To verify (8) it suffices to show ( e 2 F ) 2 n I Sj C e2 F j=1 Since the S j satisfy (12), we have ( 2 e 2 n I Sj ) 4 F f j=1 n j=1 P S j Since F 1/2 L 2 1, the right hand side above is 16 n n j=1 k=1 P S j P S k n I Sj. j=1 f F 1/2, φ P 2 f F 1/2, φ f P F 1/2, φ P φ P, φ P. By symmetry, it remains, for (8), to show that n n (13) f, φ P f, φ P φ P, φ P Ce 2 and (14) j=1 k=1 P S j P S k : I = I n n j=1 k=1 P S j P S k : I < I In both cases, we will use the estimate (15) φ P, φ P C f, φ P f, φ P φ P, φ P Ce 2 ( ) 1/2 I w I I, 1 I. 2 n I Sj j=1 n I Sj. which holds whenever I I. Estimating the product of two terms by the square of their maximum, we see that the left side of (13) is n n 2 f, φ P 2 φ P, φ P. j=1 k=1 P S j P S k : I = I Recall that φ P, φ P = 0 unless C 3ω u C 3 ω u. Thus, by (7), (15) and the fact that the S k are pairwise disjoint, we have that the display above is n n 2 f, φ P 2 w I, 1 I 2C f, φ P 2. j=1 P S j j=1 P S j I : I = I Since the energy of P is bounded above by e, the right side above is n 2C e 2 I Sj which finishes the proof of (13). j=1 8 j=1

Applying Cauchy-Schwarz, we see that the left side of (14) is 1/2 2 n n f, φ P 2 f, φ P φ P, φ P j=1 P Sj P Sj k=1 P S k : I < I Twice using the fact that the energy of P is bounded by e, we see that the display above is 2 1/2 n n e 2 I Sj 1/2 φ P, I P 1/2 φ P. j=1 P Sj k=1 P S k : I < I Thus, to prove (14) it remains to show that, for each j, n 2 φ P, I P 1/2 φ P C I Sj. P S j k=1 P S k : I < I Again, we only have φ P, I P 1/2 φ P nonzero when C3 ω u C 3 ω u which can only happen if sup C 3 ω u C 3 ω u or inf C 3ω u C 3 ω u. Applying (15), we thus see that the left side of above is 2 2 n I P 1/2 w I, 1 I P S j k=1 P S k : I < I sup C 3 ω u C 3 ω u + 2 n P S j k=1 P S k : I < I inf C 3 ω u C 3 ω u 1/2 I P 1/2 w I, 1 I Suppose P S k, P S k, P P, I I and C 3 ω u C 3 ω u. If I = I, then from (7) it follows that ω u = ω u and hence, since P P, we have I I =. If I < I, then from (6) it follows that ξ Sk > ξ Sk and ξ Sk / C 2 ω l, and hence k < k. But ω Sk ω m and P / T k, so I I Sk =. We conclude that each of the two terms above is (16) 2 2 I P w I, 1 R\ISj 2 2 w l I, 1 R\ISj P S j One may check that for each l P S j : I =2 l l:2 l I Sj w I, 1 R\ISj C P S j : I =2 l and so the right side of (16) is C I Sj, which finishes the proof of (14) and thus (8). For (9), we need to show that for each dyadic interval J, we have 1 J J T T 1 2 l I T (x) 1 J 1 2 l I T (y) dy dx C2 2l e 2. J T T 9 2..

To this end, it will suffice to show that (17) I T Ce 2 2 l J. T T J 2 l I T,J Let T = {T T : I T 2 l+1 J, I T J } and note that if T T with 2 l I T J, J then T T. Write f = f + f where f 1 2 l+5 J and f 1 R\2 J. l+5 We will write T as the union of collections of trees T T 0 T 1... each of which will have certain properties related to the energy. For each tree T T there is an l-overlapping tree S chosen in the algorithm above with I S = I T and (18) 1 I S f, φ P 2 e 2 /4. P S Let T 0 = {T T 1 : I S P S f, φ P 2 e 2 /16}. For j 1, define T j = {T T 1 : sup f, φ P 2 e 2 /16} S S I S I S =2 j P S I T where, for each T, the sup above is taken over all l-overlapping trees S with S S. We then let T = {T T \ (T 0 T 1...)}. For each j, we have I T C 2 j e 2 f, φ P 2. T T j T T j P S: I 2 j J Since the S above are pairwise disjoint, the right hand side is (19) C2 j e 2 f, φ P 2. k j P: I =2 k J I 2 l+1 J Fixing k, we apply Minkowski s inequality to obtain P: I =2 k J I 2 l+1 J f, φ P 2 K: K =2 1 k J K 2 l+2 J= P: I =2 k J I 2 l+1 J 1/2 1 K f, φ P 2 where above, we sum over dyadic intervals K and use the fact that f is supported on R\2 l+5 J. Since φ P = ce ) c(ωu)) φ 2πi(c(ω u P when I = I, we may use orthogonality and the fact that f 1 to see that the right side above is 2 C2 k J K: K =2 1 k J K 2 l+2 J= I: I =2 k J I 2 l+1 J 1/2 1 K φ PI 2 L 2 where for each I, P I is any multitile with time interval I. Using (5) gives and so we see that the display above is 1 K φ PI 2 L 2 C(1 + dist(k, I)/ I ) N, C2 k(n 2) J 10 2

Summing over k and j, we conclude that Thus, to prove (17), it suffices to show T j T j I T Ce 2 J. T T I T Ce 2 2 l J. Let T T and let S be any l-overlapping tree contained in S satisfying I S I S. Since the energy of P is bounded by e and since T is not in any T j, we have 1 f, φ S P 2 2 1 f, φ S P 2 + 2 1 f, φ S P 2 Ce 2. P S P S P S From (12) and the fact that T / T 0, we have 1 f, φ P 2 e 2 /8 e 2 /16 = e 2 /16. S P S By the same reasoning as in the proof of (8), we thus have T T I T Ce 2 f 2 2 C e 2 2 l J. Moving on to (11), for each T T, let S be the corresponding l-overlapping tree from the selection algorithm above and recall I T (e/2) 2 f, w u 2. T T P S T T S Since P = T T T, the right side above is f, w u 2 + T T P T S T T S ξ T C 2 ω l Since P has energy bounded by e T T P T S T T S ξ T C 2 ω l T T P T S T T S ξ T inf C 3 ω u + f, w u 2 f, w u 2 T T P T S T T S sup C 2 ω l ξ T <inf C 3 ω u T T e 2 I T. f, w u 2. Since the rectangles {I [inf C 3 ω u, sup C 2 ω u ) : P T T S} are pairwise disjoint, we apply the energy bound again to see that f, w u 2 e 2 I T T P T S T T S ξ T inf C 3 ω u T T P T S T T S ξ T inf C 3 ω u Now, suppose P T S, P T S where T, T T and ξ T [sup C 2 ω l, inf C 3 ω u ) [sup C 2 ω l, inf C 3 ω u ), T T e 2 I T. and suppose I Ĩ and P P. From (7) we have I Ĩ. We also have inf C 2 ω l < sup C 2 ω l since otherwise it would follow that S was selected prior to S and hence 11

P T which is impossible. From (6), we have inf C 2 ω l sup C 3 ω l and so P is in the maximal l-overlapping tree with top data (Ĩ, inf C 2 ω l ). For each T T let T be the collection of multitiles P T T T S with ξ T [sup C 2 ω l, inf C 3 ω u ) and I maximal among such multitiles. Then f, w u 2 f, w u 2 T T P T S T T S sup C 2 ω l ξ T <inf C 3 ω u T T P T P T S T T S sup C 2 ω l ξ T <inf C 3 ω u I I Considering the discussion in the preceding paragraph, we may apply the energy bound to see that the right side above is 2e 2 I 2e 2 I T. T T P T T T We thus obtain (11). The proposition below is for use in tandem with Proposition 4.1. Proposition 4.2. Let P be a collection of multitiles and d > 0. Then, there is a collection of trees T such that (20) I T Cd r E and such that T T density (P \ T T T) d/2. Proof. We select trees through an iterative procedure. Suppose that trees T j, T + j, T j have been chosen for j = 1,..., k. Let k P k = P \ T j T j + Tj. If density(p k ) d/2 then we terminate the procedure and set j=1 T = {T 1, T + 1, T 1,...,T k, T + k, T k }. Otherwise, we may find a nonempty tree T P k such that 1 (21) (1 + x c(i T ) / I T ) 4 a k (x) r dx > (d/2) r. I T E k:ξ k 1 (x) ω T Choose T k+1 P k so that I Tk+1 is maximal among all nonempty trees contained in P k which satisfy (21), and so that T k+1 is the maximal, with respect to inclusion, tree contained in P k with top data (I Tk+1, ξ Tk+1 ). Let T + k+1 P k be the maximal tree contained in P k with top data (I Tk+1, ξ Tk+1 +(C 2 1)/(2 I Tk+1 )) and T k+1 P k be the maximal tree contained in P k with top data (I Tk+1, ξ Tk+1 (C 2 1)/(2 I Tk+1 )). Since each T j is nonempty and P is finite, this process will eventually stop. To prove (20), it will suffice to verify (22) I Tj Cd r E. j To this end, we first observe that the tiles I Tj ω Tj are pairwise disjoint. Indeed, suppose that (I Tj ω Tj ) (I Tj ω Tj ) and j < j. Then, by the first maximality condition, we have I Tj I Tj and so I Tj I Tj and ω Tj ω Tj. From the 12

latter inequality, it follows that for every P T j, either ω Tj ω m, ω T + ω m, or j ω m. Thus, T j T j T + j T j which contradicts the selection algorithm. ω T j Breaking the integral up into pieces and applying a pigeonhole argument, it follows from (21) that for each j there is a positive integer l j such that (23) I Tj C2 3l j d r a k (x) r dx. E 2 lj I Tj k:ξ k 1 (x) ω Tj For each l we let T (l) = {T j : l j = l} and choose elements of T (l) : T (l) 1, T (l) and subsets of T (l) : T (l) 1,T (l) 2,... as follows. Suppose T (l) j for j = 1,..., k. If T (l) \ k j=1 T(l) j Otherwise, let T (l) k+1 be an element of T(l) \ k j=1 T(l) j T (l) k+1 = {T T(l) \ k j=1 2,... have been chosen and T (l) j is empty, then terminate the selection procedure. T (l) j By construction, T (l) = j T(l) j and so (24) I T T T (l) j with I (l) T maximal, and let k+1 : (2 l I T ω T ) (2 l I (l) T ω (l) k+1 T ) }. k+1 T T (l) j I T. Using the fact that the tiles I Tj ω Tj are pairwise disjoint, and (twice) the fact that I T I (l) T for every T T (l) j, we see that for each j j T T (l) j I T C2 l I (l) T. j From (23), we thus see that the right side of (24) is C2 2l d r (x) E j 1 2 l I T (l) j k:ξ k 1 (x) ω T (l) j Since each K k=1 a k(x) r 1 and the tiles 2 l I (l) T ω (l) j T j display above is C2 2l d r E. Summing over l, we thus obtain (22). 5. The tree estimate a k (x) r dx. are pairwise disjoint, the The following bound allows us to estimate the model operator in the special case where the collection of multitiles is a tree. The bound will be applied in Section 6 with q = r and q = 1. Proposition 5.1. Let T be a tree with energy bounded above by e and density bounded above by d. Then, for each 1 q 2 (25) P T f, φ P φ P a P 1 E L q Ced min(1,r /q) I T 1/q. 13

Furthermore, for l 0 we have (26) P T f, φ P φ P a P 1 E L q (R\2 l I T ) C2 l(n 10) ed min(1,r /q) I T 1/q. The bounds above also hold for 2 < q <, but we omit the proof for this range of exponents since it requires an additional L p estimate for P T f, φ P φ P, and is not required for our purposes. Proof. Let J be the collection of dyadic intervals J which are maximal with respect to the property that I 3J for every P T. Our first goal is to prove (27) f, φ P φ P a P 1 E L q (J) Ced min(1,r /q) J 1/q (1 + dist(i T, J)/ I T ) (N 6) P T: I C J for each J J, where C 1 is a constant to be determined later. By Hölder s inequality, we may assume that q r. Fix P T with I C J. From the energy bound, we have (28) f, φ P φ P a P 1 E L q (J) Ce(1 + dist(i, J)/ I ) N a P 1 E L q (J). From the density bound applied to 1/(C 2 1) nonempty trees, each with top time interval I, we obtain 1 (1 + x c(i) / I ) 4 a k (x) r dx Cd r. I E k:ξ k 1 (x) ω l Since I 3J, it follows that 1 + x y / I C(1 + dist(i, J)/ I ) for every x J and y I. Thus a P 1 E q L q (J) a P1 E r L r (J) C(1 + dist(i, J)/ I )4 I d r where, above, we use the fact that a P 1. Since I C J the right side above is C(1 + dist(i, J)/ I ) 4 J d r. and so the right side of (28) is Ced r /q J 1/q (1 + dist(i, J)/ I ) (N 4). Summing this estimate and using the fact that T is a tree, we have f, φ P φ P a P 1 E L q (J) C2 k (1 + dist(i T, J)/ I T ) (N 6) ed r /q J 1/q P T: I =2 k J and summing over k gives (27). Using the maximality of each J, we see that if l 4 and J (R \ 2 l I T ) then dist(i T, J) J /2 and J 2 l 3 I T. It thus follows from (27) that P T f, φ P φ P a P 1 E L q (J) C( I T / J )(dist(i T, J)/ J ) 2 ed min(1,r /q) I T 1/q 2 l(n 10) whenever J (R \ 2 l I T ). Summing over all J, we thus obtain (26) for l 4. It remains to prove (29) P T f, φ P φ P a P 1 E L q (16I T ) Ced min(1,r /q) I T 1/q, 14

and, again, we may assume that q r. The first step will be to demonstrate (30) dx Cd r J J E k:ξ k 1 (x) ω J a k (x) r where ω J = P T: I C J ω l. We will say that an l-overlapping tree T is l -overlapping if for every P T, ξ T inf ω l. We will say that an l-overlapping tree T is l + -overlapping if for every P T, ξ T > inf ω l. For the remainder of the proof, we assume without loss of generality that T is either l + -overlapping, l -overlapping, or l-lacunary. By the maximality of J there is a multitile P T with I 3 J where J is the dyadic double of J. This implies that there is a dyadic interval J with J J 4 J and dist(j, J ) J and I J. If T is l + -overlapping then T = ({P }, ξ T, J ) is a tree. For every P T with I C J, we have ω l [ξ T C 1 /(2C J ), ξ T +C 1 /(2C J )). Thus, by choosing C 8C 1 /(C 2 1), we have ω J ω T. If T is l -overlapping then T = ({P }, ξ T + (C 2 1)/(4 J ), J ) is a tree. Using the fact that T is l -overlapping, we see that ω l [ξ T, ξ T +C 2 /(2C J )) for every P T with I C J. Thus, by choosing C 4C 2 /(C 2 1), we have ω J ω T. If T is l-lacunary then T = ({P }, ξ T (C 2 1)/(4 J )) is a tree. Using the fact that T is l-lacunary, we see that ω l [ξ T C 1 /(2C J ), ξ T ) for every P T with I C J. Thus, by choosing C 4C 1 /(C 2 1), we have ω J ω T. In any of the three cases, the density bound gives 1 J E (1 + x c(j ) / J ) 4 k:ξ k 1 (x) ω T a k (x) r dx d r and hence (30). We now show that if T is l-lacunary then (29) follows from (30). We start by observing that for each x there is at most one integer m and at most one integer k such that there exists a P T with I = 2 m, ξ k 1 ω l, and ξ k ω h. Indeed suppose such a P exists, and P T with I > I. Since T is l-lacunary, we have inf(ω l ) > sup(ω l) by (6), and so ξ k 1 < inf(ω l ). We also have ξ k > sup(ω T ) > sup(ω l ) since C 2ω u ω h = and ξ T C 2 ω l. It follows that there is no k with ξ k 1 ω l. We thus have f, φ P φ P a P 1 E q L q (J) J E P T: I C J a(x) f, φ P φ P (x) P T: I =2 m(x) where, a(x) = a k (x) if there exists an m(x) as in the previous paragraph with 2 m(x) C J, and a(x) = 0 otherwise. From the energy bound and the bound for φ P, the right side above is q a(x) dx. J E P T: I =2 m(x) e(1 + x c(i) / I ) N 15 q dx.

Noting that P T: I =2 (1+ x c(i) / I ) N C, we see that the display above m(x) is Ce q a(x) q dx J E and by our choice of a(x), the display above is Ce q a k (x) r J E k:ξ k 1 (x) ω J q/r dx Using (30) and the fact that a k (x) r 1, the display above is k:ξ k 1 (x) ω J Ce q d r J. Summing over J gives (29). It remains to consider the case when T is l-overlapping. For each J, we have q/r (31) f, φ P φ P a P 1 E q L q (J) a k (x) r P T: I C J J E k:ξ k 1 (x) ω J r q/r f, φ P φ P (x) dx P T: I C J,ξ k 1 (x) ω l,ξ k (x) ω h By breaking up T into a bounded number of subtrees, we may assume without loss of generality that for each P T, ξ T ω l + j ω l for some integer j with j C 2. We will show that, for any ξ k 1 < ξ k, there exist integers l 1 l 2 with 2 l 1 J such that (32) f, φ P φ P = (e (ψ 2πiξT l1 ψ l2 )) f, φ P φ P P T: I C J,ξ k 1 ω l,ξ k ω h P T where ψ l = 2 l ψ(2 l ), and ψ is any Schwartz function with ˆψ(ξ) = 1 for ξ C 1 +C 3 and ˆψ(ξ) = 0 for ξ 2C 1. From (6) we have, for each l such that 2 l = I for some multitile P, (e 2πiξ ψ T l ) f, φ P φ P = f, φ P φ P. P T P T: I 2 l Thus, to prove (32) it will suffice to show that there exist integers l 1 and l 2 such that (33) {P T : I C J, ξ k 1 ω l, ξ k ω h } = {P T : 2 l 1 I 2 l 2 }. Again using (6), we see that for P, P T with I < I we have inf ω h < inf ω h, and if we are in the setting of ρ-multitiles where ρ is a 1-index, we have the stronger inequality sup ω h < inf ω h. Thus, (33) will follow after finding l 1 and l 2 with (34) {P T : ξ k 1 ω l } = {P T : 2 l 1 I 2 l 2 }. The equation above follows when j > 1 from the fact that ω l ω l = if P, P T and I < I ; it follows when j = 0 from the fact that the intervals {ω l : P T } are nested. Finally, when j = ±1 it follows from the property that if P, P, P T, I, I I and ω l ω l then ω l ω l ω l. 16

Using (32), we have r f, φ P φ P (x) P T: I C J,ξ k 1 (x) ω l,ξ k (x) ω h k:ξ k 1 (x) ω J (e 2πiξ ψ T k ) φ P φ P (x) V r k (Z P T f, + +log 2 ( J )). For log 2 ( J ) k 1 < k 2, we have 1/r (e 2πiξ T (ψ k1 ψ k2 )) P T f, φ P φ P and so, for x J = (e 2πiξ T ψ C+log2 ( J )) (e 2πiξ T (ψ k1 ψ k2 )) P T f, φ P φ P (e 2πiξ T ψ k ) P T f, φ P φ P (x) V r k (Z + +log 2 ( J )) C sup x J sup R J 2 x+r (e 2πiξ ψ T k ) φ P φ P (y) V r R k (Z x R P T f, + +log 2 ( J )) dy. Denoting the right side of the inequality above by M J, we see that the right side of (31) is M q J J d dr r M[ ψ k (e 2πiξ T f, φ P φ P ) V r k ](x) q dx J where M is the Hardy-Littlewood maximal operator. Summing over J gives P T P T f, φ P φ P a P 1 E q L q (16I T ) Ce q d r I T + d r M[ ψ k (e 2πiξ T f, φ P φ P ) V r k ](x) q. L q x(16i T ) Since q 2, it follows from Hölder s inequality that the right side above is Ce q d r I T + Cd r I T (2 q)/2 M[ ψ k (e 2πiξ T f, φ P φ P ) V r k ](x) q L 2 x (16I T ). Applying the variation estimate (44) from Appendix A with p = 2 and the L 2 estimate for M one sees that the display above is P T P T Ce q d r I T + Cd r I T (2 q)/2 P T f, φ P φ P q L 2. To finish the proof, it only remains to see that P T f, φ P φ P 2 L Ce 2 I 2 T. The left side of this inequality is f, φ P f, φ P φ P, φ P P T P T 2 f, φ P 2 φ P, φ P. P T P T 17

Since T is an l-overlapping tree, we have φ P, φ P unless I = I, in which case, we have φ P, φ P C(1 + dist(i, I )/ I ) N. It follows that the right side above is C f, φ P 2 Ce 2 I T. P T 6. Main argument To prove Theorem 1.2, it will suffice by interpolation and monotonicity of the V r norms to prove the restricted weak type estimate { P P f, φ P φ P a P > λ} C F λ p where P is a finite collection of multitiles as in Section 4, F R, f 1 F, λ > 0, 2 < r <, and r p < (1/2 1/r) 1. This is equivalent to proving that, for every E R, { (35) x E : ( ) } 1/p F f, φ P φ P a P > C E E /2. P P After possibly rescaling, we assume that 1 E 2. It will suffice, by Chebyshev s inequality to show (36) 1 E f, φ P φ P a P L 1 (R\G) C F 1/p P P for some exceptional set G with G 1/4. The density of P (which will henceforth be defined with respect to the set E above) is clearly bounded above by a universal constant. Let T be any l-overlapping tree. Writing f = f + f where f = 1 3IT f and f = f f, it follows from arguments in the proof of (8) that f, φ P 2 C f 2 L C I 2 T. P T Furthermore, since f 1 R\3IT, we have the estimate f, φ P C I 1/2 (1 + dist(i, R \ 3I T )/ I ) (N 1) C I 1/2 ( I / I T ) N 1 Summing the inequality above, we obtain f, φ P 2 C I T P T and so the energy of P with respect to f is bounded above by a universal constant. We first consider the case when F > 1. Repeatedly applying Propositions 4.1 and 4.2 we write P as the disjoint union P = j 0 T T j T where each T j is a collection of trees T each of which have energy bounded by C2 j/2 F 1/2, density bounded by C2 j/r, and satisfy T T j I T 2 j. 18

For each j we apply Proposition 4.1 again, this time using (9) and (11) to write T = T T j T k 0 T T j,k where each tree T T j,k has energy bounded by C2 (j+k)/2 F 1/2, density bounded by C2 j/r, and satisfies (37) I T C2 j. T T j,k and for every l 0 (38) T T j,k 1 2 l I T BMO C2 2l 2 j+k F 1 From (37), (38), and a standard technique involving the sharp maximal function, it follows that for 1 q < 1 2 l I T q C2 j+k+2l F 1/q. T T j,k Let ǫ > 0 be small and C > 0 be large, depending on p, q, r. For each j, k, l define G j,k,l = { 1 2 l I T C F 1/q 2 (1+ǫ)(j+k+2l) }. T T j,k By Chebyshev s inequality, we have G j,k,l c 2 ǫ(j+k+2l), so setting G = j,k,l 0 G j,k,l we have G 1/4. Applying Minkowski s inequality gives 1 E f, φ P φ P a P L 1 (R\G) P P 1 E j,k 0 + l 1 T T j,k 1 IT 1 E f, φ P φ P a P L 1 (R\G j,k,0 ) P T 1 2 l I T \2 l 1 I T T T j,k P T f, φ P φ P a P L 1 (R\G j,k,l ). From Hölder s inequality, Fubini s theorem, and the definition of G j,k,l, it follows that the right side above is C(S 1 + S 2 ) where S 1 = F 1/(q r) 2 (1+ǫ)(j+k)/r 1/r 1 E f, φ P φ P a P r L r (R) T T j,k P T j,k 0 l 1 j,k 0 and S 2 = F 1/(q r) 2 (1+ǫ)(j+k+2l)/r 1 E f, φ P φ P a P r T T j,k P T 19 L r (R\2 l 1 I T ) 1/r

Applying Proposition 5.1 with the energy and density bounds for trees T T j,k, we see that S 2 C F 1/(q r) 2 (1+ǫ)(j+k+2l)/r 2 l(n 10) 2 (j+k)/2 F 1/2 2 j/r 1/r I T T T j,k j,k 0 l 1 C 2 (j+k)((1+ǫ)(2/r) 1)/2 2 l(n 14) F 1/2 1/(q r) j,k 0 l 1 Choosing ǫ small enough and q large enough so that (1 + ǫ)(2/r) 1 < 0 and 1/2 1/(q r) < 1/p we have S 2 C F 1/p. We similarly obtain S 1 C F 1/p, thus giving (36). We will finish by proving (36) for F 1. Here, we let G = {M[1 F ] > C F } where M is the Hardy-Littlewood maximal operator and C is chosen large enough so that the weak-type 1-1 estimate for M guarantees G 1/4. From the proposition below, which is a special case of an estimate from [8] (we will provide a proof for convenience), and the fact that p r, it will remain to show that (39) 1 E P P f, φ P φ P a P L 1 (R\G) C F 1/p. where P = {P P : I G}. Proposition 6.1. Let P be a finite set of multitiles, and let λ > 0, F R, and f 1 F. Then (40) f, φ P φ P a P L 1 (R\Ω) C F λ 1/r where Ω = {M[1 F ] > λ}. P P:I Ω Finally, it follows from the proposition below, the proof of which may be found on page 12 of [24] or as a special case of a lemma from [8], that the energy of P is bounded above by C F. Proposition 6.2. Let T be an l-overlapping tree. Then 1 f, φ P 2 Cλ 2 I T P T:I Ω D where Ω D = {M D [1 F ] > λ} and M D is the maximal dyadic average operator. Repeatedly applying Propositions 4.1 and 4.2 we write P as the disjoint union P = j 0 T T j T where each T j is a collection of trees T each of which have energy bounded by C2 j/2 F 1/2, density bounded by C2 j/r, and satisfy T T j I T 2 j. We then have 1 E f, φ P φ P a P L 1 1 E f, φ P φ P a P L 1. P P j 0 T T j P T 20

Applying Proposition 5.1, we see that the right side above is C min(2 j/2 F 1/2, F )2 j/r I T C 2 j/r min(2 j/2 F 1/2, F ). j 0 T T j j 0 Summing over j, we see that the right side above is C F 1/r. This finishes the proof, since p r. Proof of Proposition 6.1. Fix l and let I l Ω be a dyadic interval satisfying (41) 2 l I l Ω and 2 l+1 I l Ω. We consider ( P:I=I l f, φ P 2 ) 1/2. Applying Minkowski s inequality, the display above is ( ) 1/2 ( 1 4Il f, φ P 2 + 1 2 j+1 I l \2 j I l f, φ P 2 P:I=I l j=2 P:I=I l Using orthogonality, the display above is (4 I l ) 1/2 1 4Il fφ P0 L 2 + (2 j+1 I l ) 1/2 1 2 j+1 I l \2 j I l fφ P0 L 2 j=2 ) 1/2 where P 0 is any multitile with I = I l. Applying the bounds (5) and f 1 F, we see that the display above is C F 4I l 1/2 + C2 j(n 1) F 2 j+1 I l 1/2. j=2 Since 2 l+1 I l Ω, we have F 2 j+1 I l C2 max(l,j) I l λ for each j. Thus, the display above is C(2 l λ I l ) 1/2. Similarly, and so, by interpolation, ( (42) sup f, φ P C2 l λ I l 1/2 P:I=I l P:I=I l f, φ P r ) 1/r (2 l λ) 1/r I l 1/2 whenever 2 r. For each ξ,i l there is at most one P P with ξ ω l and I = I l. Thus, using the fact that, for each x, K k=1 a k(x) r 1, we see that I l 1/2 φ P0 L 1 (R\Ω) P P:I=I l f, φ P φ P a P L 1 (R\Ω) C(2 l λ) 1/r where P 0 is any multitile with I 0 = I l. Using the fact that 2 l I l Ω, it follows that the right side above is C2 l(n 2) λ 1/r I l. 21

For l 0 let I l be the set of all dyadic intervals satisfying (41). If I I l then for each j > 0 there are at most 2 intervals I I l with I I and I = 2 j I. By considering the collection of maximal dyadic intervals in I l, one sees that I I l I C Ω Thus, f, φ P φ P a P L 1 (R\Ω) C2 l(n 2) λ 1/r Ω. P P:I I l Summing over l and applying the weak-type 1-1 estimate for M then gives (40). A. Variational estimates for averages The purpose of this appendix is to give the bound (44), which may be considered as a lacunary- smooth cutoff version of the main result Theorem 1.2 (a nonsmooth version follows from the smooth version by the square function argument in Section 2). Although this estimate seems to be well-known, we provide a proof for the convenience of the reader. We will follow a method from [10], see also the references therein. For any integer k, we consider the dyadic averaging operator E k [f](x) = 1 f(y) dy I k (x) where I k (x) is the dyadic interval of length 2 k containing x. It is a special case of Lépingle s inequality [16] (of which alternative proofs, using Doob s jump inequality, may be found for example in [1],[6]) that I k (x) (43) E k [f](x) L p x (V r k ) C p,r f L p whenever 1 2, where g V r = sup N,k 1 <k 2 <...<k N ( N 1 j=1 g(k j+1 ) g(k j ) r ) 1/r Let ψ be a Schwartz function on R with ψ = 1, and for each k let ψ k = 2 k ψ(2 k ). Our aim is to see that the bound (44) ψ k f(x) L p x (V r k ) C p,r f L p follows from (43) whenever 1 2. Letting ( ) 1/2 S[f](x) = ψ k f(x) E k [f](x) 2, it will suffice to show that k= (45) S[f] L p C p f L p holds whenever 1 < p <. We let D k [f] = E k 1 [f] E k [f] = I =2 k f, h I h I, where, for every dyadic interval I, h I = (1 [inf(i),c(i)) 1 [c(i),sup(i)) )/ I 1/2 is the L 2 normalized Haar function associated to I. 22.

The case p = 2 of (45), which is the case used in the proof of Theorems 1.2 and 1.3, will follow from Lemma A.1. Suppose ψ is a Schwartz function with ψ = 1. Then for every f L 2 (46) ψ k D j [f] E k [D j [f]] L 2 C2 j k /4 D j [f] L 2 where C may depend on ψ. Indeed S[f] L 2 = ψ k f(x) E k [f](x) l 2 k (L 2 x) = ψ k ( D j [f])(x) E k ( j= j= D j [f])(x) l 2 k (L 2 x) where the second equation follows from the fact that the Haar functions are a complete orthonormal system in L 2. After applying (46), the right side above is C j 2 j k /4 D j [f] L 2 l 2 k C 2 j k /8 D j [f](x) l 2 k (l 2 j (L2 x )) C D j [f](x) l 2 j (L 2 (x)) = f L 2. Proof of lemma A.1. First, suppose k j. Then, for every x R ψ k D j [f](x) = f, h I (ψ k (x y) ψ k (x c(i)))h I (y) dy. I =2 j I Applying the triangle inequality and mean value theorem, the absolute value of the right side above is f, h I 2 j k sup 2 k ψ k (x y) I 1/2 dy. I =2 j I y I Since ψ is a Schwartz function, we have 2 k ψ k (x y) C2 k (1 + 2 k x y ) 2. Thus, the display above is C2 k (1 + 2 k x y ) 2 I 1/2 dy I =2 j f, h I 2 j k = C2 j k 2 k (1 + 2 k ) 2 D j [f]. I Since k j, we have E k [D j [f]] = 0, and we thus obtain (46) from Young s inequality. For the case k < j, we write ψ k = ψ (0) k +ψ(1) k where ψ (0) k = ψ k 1 [ 2 (j+k 2)/2,2 ]. (j+k 2)/2 Since ψ is a Schwartz function, ψ k (x) C2 k (1 + 2 k x ) 2 and so ψ (1) k C2 j k /2. Since ψ k = 1 and E k [D j [f]] = D j [f], we have ψ k D j [f](x) E k [D j [f]](x) = ψ k (y) f, h I (h I (x y) h I (x)) dy. I =2 j 23

Since h I (x y) = h I (x) unless x y and y are in different dyadic intervals of length 2 j, we have ψ (0) k (y)( f, h I (h I (x y) h I (x)) dy I =2 j supported on m Z (m2 j 2 (j+k 2)/2, m2 j +2 (j+k 2)/2 ). Using an L 1 estimate of ψ (0) k and, again using its support property, we see that 1 (m2 j 2 (j+k 2)/2,m2 j +2 )(x) (j+k 2)/2 Thus ψ (0) k (y) ψ (0) k (y) f, h I (h I (x y) h I (x)) dy I =2 j C( f, h [m2 j,(m+1)2 j ) + f, h[(m 1)2 j,m2 j ) )2 j/2. I =2 j f, h I (h I (x y) h I (x)) dy L 2 (x) C2 j k /4 D j [f] L 2. From the L 1 estimate of ψ (1) k, we have ψ (1) k (y) I =2 j f, h I (h I (x y) h I (x)) dy L 2 x ψ (1) k D j [f] L 2 + C2 j k /2 D j [f] L 2 C2 j k /2 D j [f] L 2 and thus (46). To demonstrate (45) for 1 α} C α f L 1. To obtain this estimate, we perform a dyadic Calderón-Zygmund decomposition of f at height α, that is we write f = g + b where g L α, g L 1 C f L 1, and b = I I b I where I is a collection of disjoint dyadic intervals with I I I C f L 1/α, and where each b I (x) = 1 I (x)(f(x) 1 f). I I The bound for g follows from the L 2 estimate for S {x : S[g] > α/2} C S[g] 2 L 2/α2 C g 2 L 2/α2 C g L 1/α C f L 1/α. Thus, (47) will follow from the bound (48) S[b] L 1 (R\ I I 2I) C b L 1. The left side above is I I k= ψ k b I E k [b I ] L 1 (R\2I). 24

Any dyadic interval intersecting both I and R \ 2I must contain I. Thus, since each b I is supported on I and has mean zero, the display above (49) = ψ k b I L 1 (R\2I). I I k= For x R \ 2I, Since ψ is a Schwartz function, and so ψ k b I (x) = (1 R\[ I /2, I /2] ψ k ) b I (x) 1 R\[ I /2, I /2] ψ k L 1 C2 k / I (50) ψ k b I L 1 (R\2I) C(2 k / I ) b I L 1. Since each b I has mean zero and is supported on I, it follows as in the case k j of the proof of lemma A.1 that, whenever 2 k I, we have and so ψ k b I (x) C( I /2 k )2 k (1 + 2 k ) 2 b I (x) (51) ψ k b I L 1 C( I /2 k ) b I L 1. Combining (50) and (51), it follows that (49) is C I I b I L 1. Thus, since the b I have disjoint supports, we obtain (48). After minor modifications, the same argument gives a bound from L 1 (l 2 ) to weak L 1 for the dual operator, and so (45) also holds for 2 < p <. For ξ, x R let B. A variational Menshov-Paley-Zygmund theorem C[f](ξ, x) = x e 2πiξx f(x ) dx. Menshov, Paley, and Zygmund extended the Hausdorff-Young inequality by proving a version of the bound (52) C[f] L p ξ (L x ) C p f L p (R) for 1 p < 2. The bound at p = 2 is a special case of the much more difficult Theorem 1.1 proved by Carleson and Hunt. Interpolating the variational version, Theorem 1.2, at p = 2 with a trivial estimate at p = 1, one sees that (52) may be strengthened to the bound (53) C[f] L p ξ (V r x ) C p,r f L p (R) for 1 p 2 and r > p. It follows from the same arguments given in Section 2 that this range of r is the best possible. Our interest in this variational bound primarily stems from the fact, which will be proven in Appendix C, that it may be transferred, when r < 2, to give a corresponding estimate for certain nonlinear Fourier summation operators. The purpose of the present appendix is to give an easier alternate proof of (53) when p < 2. 25

A now-famous lemma of Christ and Kiselev [3] asserts that if an integral operator Tf(x) = K(x, y)f(y) dy R is bounded from L p (R) to L q (X) for some measure space X and some q > p, thus Tf L q (X) A f L p (R), then automatically the maximal function T f(x) = sup N R y<n K(x, y)f(y) dy is also bounded from L p (R) to L q (X), with a slightly larger constant. Another way to phrase this is as follows. If we define the partial integrals T f(x, N) = K(x, y)f(y) dy then we have y<n (54) T f L q x (L N ) C p,q A f L p (R). As was observed by Christ and Kiselev, this may be applied in conjunction with the Hausdorff-Young inequality to obtain (52) for p < 2. The L N norm can also be interpreted as the V N norm, and we will now see that V can be replaced by V r for r > p, thus giving (53) from the Hausdorff-Young inequality. Lemma B.1. Under the same assumptions, we have for any r > p. T f L q x (V r N ) C p,q,r A f L p (R) Proof. This follows by an adaption of the argument by Christ and Kiselev, or by the following argument. Without loss of generality we may take r < q, in particular r <. We use a bootstrap argument. Let us make the a priori assumption that (55) T f L q x (V r N ) BA f L p (R) for some constant 0 < B < ; this can be accomplished for instance by truncating the kernel K appropriately. We will show that this a priori bound automatically implies the bound (56) T f L q x (V r N ) (2 1/r 1/p BA + C p,q,r A) f L p (R) for some C p,q,r > 0. This implies that the best bound B in the above inequality will necessarily obey the inequality B 2 1/r 1/p B + C p,q,r ; since r > p, this implies B C p,q,r for some finite C p,q,r, and the claim follows. It remains to deduce (56) from (55). Fix f; we may normalize f L p (R) = 1. We find a partition point N 0 in the real line which halves the L p norm of f: N0 f(y) p dy = + N 0 f(y) p dy = 1 2. 26

Write f (y) = f(y)1 (,N0 ](y) and f + (y) = f(y)1 [N0,+ )(y), thus f L p (R) = f + L p (R) = 2 1/p. We observe that { T f T f(x, N) = (x, N) when N N 0 Tf (x) + T f + (x, N) when N > N 0 Furthermore, T f (x, ) and T f + (x, ) are bounded in L norm by O(T f(x)). Thus we have T f(x, ) V r N ( T f (x, ) r V r N + T f + (x, ) r V r N )1/r + O(T f(x)). (The O(T f(x)) error comes because the partition used to define T f(x, ) V r N may have one interval which straddles N 0 ). We take L q norms of both sides to obtain T f L q x V r N ( T f (x, ) r V r N + T f (x, ) r V r N )1/r L q x + O( T f L q x ). The error term is at most C p,q A by the ordinary Christ-Kiselev lemma. For the main term, we take advantage of the fact that r < q to interchange the l r and L q norms, thus obtaining T f L q x VN r ( T f r L q + T xvn r f + r L q xvn) 1/r + O(C r p,q A). By inductive hypothesis we thus have and the claim follows. T f L q x V r N ((2 1/p BA) r + (2 1/p BA) r ) 1/r + O(C p,q A), C. Variation norms on Lie groups In this appendix, we will show that certain r-variation norms for curves on Lie groups can be controlled by the corresponding variation norms of their traces on the Lie algebra as long as r < 2. This follows from work of Terry Lyons [17], we present a self contained proof in this appendix. Combining this fact with the variational Menshov-Paley-Zygmund theorem of Appendix B, we rederive the Christ-Kiselev theorem on the pointwise convergence of the nonlinear Fourier summation operator for L p (R) functions, 1 p < 2. Let G be a connected finite-dimensional Lie group with Lie algebra g. We give g any norm g, and push forward this norm using left multiplication by the Lie group to define a norm x TgG = g 1 x g on each tangent space T g G of the group. Observe that this norm structure is preserved under left group multiplication. We can now define the length γ of a continuously differentiable path γ : [a, b] G by the usual formula γ = b a γ (t) Tγ(t) G dt. Observe that this notion of length is invariant under left group multiplication, and also under reparameterization of the path γ. From this notion of length, we can define a metric d(g, g ) on G as d(g, g ) = inf γ γ:γ(a)=g,γ(b)=g where γ ranges over all differentiable paths from g to g. It is easy to see that this does indeed give a metric on G. 27

Given any continuous path γ : [a, b] G and 1 r <, we define the r-variation γ V r of γ to be the quantity γ V r = n 1 sup ( a=t 0 <t 1 <...<t n=b j=0 d(γ(t j+1 ), γ(t j )) r ) 1/r where the infimum ranges over all partitions of [a, b] by finitely many times a = t 0, t 1,...,t n = b. We can extend this to the r = case in the usual manner as γ V = sup sup a=t 0 <t 1 <...<t n=b 0 j n 1 d(γ(t j+1 ), γ(t j )), and indeed it is clear that the V norm of γ is simply the diameter of the range of γ. The V 1 norm of γ is finite precisely when γ is rectifiable, and when γ is differentiable it corresponds exactly with the length γ of γ defined earlier. It is easy to see the monotonicity property and the triangle inequalities γ V p γ V r whenever 1 r p ( γ 1 r V r + γ 2 r V r)1/r γ 1 + γ 2 V r γ 1 V r + γ 2 V r where γ 1 + γ 2 is the concatenation of γ 1 and γ 2. A key fact about the V r norms is that they can be subdivided: Lemma C.1. Let γ : [a, b] G be a continuously differentiable curve with finite V r norm. Then there exists a decomposition γ = γ 1 + γ 2 of the curve into two sub-curves such that γ 1 V r, γ 2 V r 2 1/r γ V r. Proof. Let t = sup{t [a, b] : γ [a,t] V r 2 1/r γ V r}. Letting γ 1 = γ [a,t ] we have γ 1 V r = 2 1/r γ V r. The bound for γ 2 = γ [t,b] follows from the left triangle inequality above. Given a continuously differentiable curve γ : [a, b] G, we can define its left trace γ l : [a, b] G by the formula γ l (t) = t a γ(s) 1 γ (s) ds Note that the trace is also a continuously differentiable curve, but taking values now in the Lie algebra g instead of G. Clearly γ l is determined uniquely from γ. The converse is also true after specifying the initial point γ(a) of γ, since γ can then be recovered by solving the ordinary differential equation (57) γ (t) = γ(t)γ l (t). This equation is fundamental in the theory of eigenfunctions of a one-dimensional Schrodinger or Dirac operator, or equivalently in the study of the nonlinear Fourier transform; see, for example, [23],[19] for a full discussion. Basically for a fixed potential f(t) and a frequency k, the nonlinear Fourier transform traces out a curve γ(t) (depending on k) taking values in a Lie group (e.g. SU(1, 1)), and the corresponding left trace is essentially the ordinary linear Fourier transform. It is easy to see that these curves have the same length (i.e. they have the same V 1 norm): (58) γ = γ l. 28