LINEAR EQUATIONS IN PRIMES: AFTER GREEN AND TAO


ROLAND GRINIS

This is an essay written after attending a graduate course on higher order Fourier analysis given by Ben Green at Oxford in Winter 2015. We would like to outline here the argument of Ben Green and Terence Tao from their work [4], which establishes the asymptotic count of solutions in primes to finite complexity affine-linear systems of equations, as predicted by the generalized Hardy-Littlewood conjecture.

We apologize straightaway to the reader for the rather heavy style of our presentation, and warn that we do not aim to step back from the paper and offer a new perspective, nor to give a survey discussing how it fits into the bigger picture of the field. Readers seeking such an overview are invited to have a look at the beautiful article of Green [1] or at the relevant sections of Tao's book [7]. Our focus is simply to present some important ingredients that go into [4], fitting them into the main line of the argument, and to discuss informally some ideas behind the proofs that we could appreciate the most. These are mainly restricted to the transference principle, for which we relied on [3] as well. (Our background is in geometric and dispersive PDEs, and so we have enjoyed encountering some (very rough) analogues of ideas like the profile decomposition, Bourgain's induction on energy paradigm, or the duality approach to collapsing results; we decided, however, to abstain from drawing a parallel here, even at an informal level.)

The essay is organized as follows: in Section 1 we briefly present the generalized Hardy-Littlewood conjecture and introduce the notion of complexity for systems of affine-linear forms, leading to the Green-Tao result [4]; Section 2 is devoted to higher order Fourier analysis, where we present the inverse theorems and discuss the generalized von Neumann and Koopman-von Neumann theorems from [4] and [3]; and finally in Section 3 we say how this theory is applied to prime numbers. We shall explain the notation as we encounter it, and we have tried to remain as close as possible to [4], but the notation that is used most frequently (e.g. asymptotic notation) is gathered in an appendix at the end of the essay for the reader's convenience.

We are very grateful to Ben Green for delivering a very interesting and inspiring graduate course, kindly sending us a copy of his book in preparation [2], and suggesting that we read [4] and [1] for the broadening dissertation.

Date: August 2015.

1. Generalized Hardy-Littlewood conjecture

A classical result in number theory, the prime number theorem in arithmetic progressions (a more refined version of which is the Siegel-Walfisz theorem), yields for the von Mangoldt function $\Lambda$ (defined on the integers by setting $\Lambda(n) = \log p$ when $n = p^k$ for some prime $p$ and some $k \ge 1$, and zero otherwise) the asymptotic:
$$\sum_{n \in [N]} \Lambda(qn + b) = \Lambda_{\mathbb{Z}_q}(b)\, N + o_q(N),$$

where $q \ge 1$, $b \le q$, and $\Lambda_{\mathbb{Z}_q}$ denotes the local von Mangoldt function: a $q$-periodic function that assigns $q/\varphi(q)$ to $b \in \mathbb{Z}$ whenever $b$ is coprime to $q$ and zero otherwise ($\varphi(q) = |(\mathbb{Z}/q\mathbb{Z})^{\times}|$ is the Euler totient function).

The generalized Hardy-Littlewood conjecture seeks to obtain a similar asymptotic when the affine-linear form $n \mapsto qn + b$ is replaced by a whole system of affine-linear forms:
$$\Psi = (\psi_1, \ldots, \psi_t) : \mathbb{Z}^d \to \mathbb{Z}^t,$$
where $\psi_i = \dot\psi_i + \psi_i(0)$, with $\psi_i(0)$ constant and $\dot\psi_i : \mathbb{Z}^d \to \mathbb{Z}$ linear, chosen such that none of the $\psi_i$ is constant and no two are linearly dependent (to avoid unnecessary degeneracies), and having size at scale $N$:
$$\|\Psi\|_N := \sum_{i \in [t]} \sum_{j \in [d]} |\dot\psi_i(e_j)| + \sum_{i \in [t]} \left| \frac{\psi_i(0)}{N} \right| \le L$$
bounded by some constant $L > 0$ independent of $N$. We therefore obtain an affine sub-lattice $\Psi(\mathbb{Z}^d)$ in $\mathbb{Z}^t$, which one typically restricts to $\Psi(K)$ for some fixed convex body $K \subset [-N, N]^d$, and one would like to count the points $(p_1, \ldots, p_t) \in \Psi(K)$ all of whose coordinates are primes $p_i \in \mathcal{P} := \{2, 3, 5, \ldots\}$. The following asymptotic is conjectured:

Conjecture 1. (Generalized Hardy-Littlewood conjecture). With notation as above one has:
$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} \Lambda(\psi_i(n)) = \beta_\infty \prod_{p \in \mathcal{P}} \beta_p + o_{s,d,t,L}(N^d), \tag{1.1}$$
where
$$\beta_\infty := \mathrm{vol}_d\big(K \cap \Psi^{-1}((\mathbb{R}^+)^t)\big)$$
is the archimedean factor, and
$$\beta_p := \mathbb{E}_{n \in \mathbb{Z}_p^d} \prod_{i \in [t]} \Lambda_{\mathbb{Z}_p}(\psi_i(n))$$
are the local factors.

The term $\prod_{p \in \mathcal{P}} \beta_p$ is called the singular product and can be checked to converge (this is done in Lemma 1.3 of [4]). The above conjecture has some very spectacular consequences, for example the twin prime conjecture, if one considers the system $\Psi(n) = (n, n + 2)$ with $n$ running through the positive integers (the impressive progress on this problem in recent years has spread the excitement well beyond the field of analytic number theory, which is our case!).
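To make the local factors concrete, here is a minimal sketch that evaluates $\beta_p$ by brute force straight from its definition as an average over $\mathbb{Z}_p^d$. The function names and the encoding of a form as a pair (coefficients, constant) are our own illustrative choices and are not taken from [4]. For the twin prime system $(n, n+2)$ one finds $\beta_2 = 2$ and $\beta_p = p(p-2)/(p-1)^2$ for odd $p$, so the truncated singular product approaches twice the twin prime constant, roughly 1.32.

```python
from fractions import Fraction
from itertools import product
from math import gcd


def local_von_mangoldt(b, q):
    """Lambda_{Z_q}(b): q / phi(q) if b is coprime to q, and 0 otherwise."""
    if gcd(b, q) != 1:
        return Fraction(0)
    phi = sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)
    return Fraction(q, phi)


def local_factor(forms, d, p):
    """beta_p = average over n in Z_p^d of prod_i Lambda_{Z_p}(psi_i(n)).

    `forms` is a list of pairs (coeffs, const) encoding
    psi(n) = coeffs[0]*n[0] + ... + coeffs[d-1]*n[d-1] + const
    (a hypothetical encoding chosen for this sketch).
    """
    total = Fraction(0)
    for n in product(range(p), repeat=d):
        term = Fraction(1)
        for coeffs, const in forms:
            value = sum(c * x for c, x in zip(coeffs, n)) + const
            term *= local_von_mangoldt(value % p, p)
        total += term
    return total / p ** d


def small_primes(bound):
    """Primes up to `bound` by trial division (fine for small bounds)."""
    return [m for m in range(2, bound + 1)
            if all(m % r for r in range(2, int(m ** 0.5) + 1))]


def truncated_singular_product(forms, d, bound):
    """Product of beta_p over primes p <= bound, returned as a float."""
    result = Fraction(1)
    for p in small_primes(bound):
        result *= local_factor(forms, d, p)
    return float(result)


# The twin prime system Psi(n) = (n, n + 2), with d = 1 and t = 2.
TWIN = [((1,), 0), ((1,), 2)]
print(truncated_singular_product(TWIN, 1, 200))
```

The brute-force average costs $p^d$ evaluations per prime, so this is only meant for small $d$ and small primes; it says nothing about the convergence argument of Lemma 1.3 in [4].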
In this essay, however, we would like to discuss the groundbreaking work of Green and Tao [4], which establishes Conjecture 1 for the class of systems $\Psi$ which have finite complexity as defined just below (this unfortunately excludes the cases $d = 1$, $t > 1$, as needed for the twin prime conjecture, but enables one, among many other things, to obtain an asymptotic for the number of arithmetic progressions of arbitrary length in the primes, for example).

Definition 2. We say that the system $\Psi = (\psi_1, \ldots, \psi_t)$ has $i$-complexity $s_i$ if $\{\psi_j : j \ne i\}$ can be covered by $s_i + 1$ classes so that $\psi_i$ does not lie in the affine-linear span of any of those classes (take the smallest such $s_i$). If all the $s_i$ exist, we let the complexity of $\Psi$ be $s = \max_i s_i$, that is, the least $s$ such that every $i$-complexity is at most $s$; otherwise we set $s = \infty$.

A more geometric way to think about the notion of complexity is to note that the system $\Psi$ will have finite complexity if no two of the $\psi_i$ are affinely dependent, in which case the complexity is bounded by (but not necessarily equal to) the codimension of $\Psi(\mathbb{Z}^d)$ in $\mathbb{Z}^t$ (this is proved in Lemma 1.6 of [4]).
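The $i$-complexity can be found by exhaustive search for small systems. The sketch below is an illustration under a stated assumption: we read "$\psi_i$ lies in the affine-linear span of a class" as "the linear part $\dot\psi_i$ lies in the rational linear span of the linear parts of the forms in that class", since constants are free in an affine-linear span. The helper names are hypothetical. For the $k$-term progression system with linear parts $(1, j)$, $j = 0, \ldots, k-1$, every $i$-complexity comes out as $k - 2$, while for the twin prime system no admissible covering exists.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank over Q of a list of integer vectors, by Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                factor = m[r][col] / m[rk][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk


def in_span(v, vectors):
    """True if v lies in the Q-linear span of `vectors`."""
    if not vectors:
        return all(x == 0 for x in v)
    return rank(list(vectors) + [v]) == rank(list(vectors))


def i_complexity(linear_parts, i):
    """Smallest s such that the forms j != i split into s + 1 classes, none of
    whose spans contains form i; None if no such covering exists."""
    others = [v for j, v in enumerate(linear_parts) if j != i]
    target = linear_parts[i]
    if not others:
        return 0
    for num_classes in range(1, len(others) + 1):
        for labels in product(range(num_classes), repeat=len(others)):
            classes = [[v for v, lab in zip(others, labels) if lab == c]
                       for c in range(num_classes)]
            if all(not in_span(target, cls) for cls in classes):
                return num_classes - 1
    return None


# Linear parts of the 4-term progression system n1 + j*n2, j = 0, 1, 2, 3.
AP4 = [(1, 0), (1, 1), (1, 2), (1, 3)]
# Linear parts of the twin prime system (n, n + 2).
TWIN = [(1,), (1,)]
print([i_complexity(AP4, i) for i in range(4)], i_complexity(TWIN, 0))
```

The search enumerates all labelings of the remaining forms, so it is exponential in $t$ and only suitable for toy systems.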

Theorem 3. (Green-Tao [4]). The generalized Hardy-Littlewood conjecture holds for systems of finite complexity.

One beautiful corollary of the above theorem that we would like to mention is the asymptotic for the number of $k$-tuples of primes below $N$ lying in arithmetic progression, which Theorem 3 of Green and Tao implies to be:
$$\left( \frac{1}{2(k-1)} \prod_{p \in \mathcal{P}} \beta_p + o_k(1) \right) \frac{N^2}{\log^k N},$$
where the local factors are given by:
$$\beta_p = \begin{cases} \dfrac{1}{p} \left( \dfrac{p}{p-1} \right)^{k-1} & \text{if } p \le k, \\[2ex] \left( 1 - \dfrac{k-1}{p} \right) \left( \dfrac{p}{p-1} \right)^{k-1} & \text{if } p \ge k, \end{cases}$$
and one would be applying Theorem 3 with
$$\Psi(n_1, n_2) := (n_1,\ n_1 + n_2,\ n_1 + 2n_2,\ \ldots,\ n_1 + (k-1)n_2),$$
setting $K := \{(n_1, n_2) : 1 \le n_1 \le n_1 + (k-1)n_2 \le N\}$ (see Conjecture 1.4 and Examples 5 and 8 in [4]). In contrast to the above asymptotic, the work [3] of Green and Tao, establishing that there are infinitely many arithmetic progressions of arbitrary length in the primes, would replace the singular product above with some positive constant $c_k > 0$, perhaps very small (but independent of $N$, of course).

2. Inverse theorem for the Gowers norms and the transference principle

In this section we would like to discuss the machinery of inverse theorems and the transference principle from [3] and [4]. Our account here is more abstract, and we will move away from the problem for quite a while, always keeping it in the background via the connection provided by the generalized von Neumann theorem. There are some minor technical differences between the inverse Gowers-norm conjecture as used in [4] and the now established inverse theorem of Green, Tao and Ziegler [6], and more notably the refinements brought by Ben Green in [2] (concerning the regularity of nilsequences, which leads to an actual simplification of the argument in [4]; we thank Ben Green for pointing this out to us).

For our discussion below, we shall pick a prime $N' = O_{s,d,t,L}(N)$, bigger than $CN$ for some constant $C \ge 20$ large enough (which is convenient when multiplying expressions by cut-off functions like the de la Vallée Poussin kernel) but to be ultimately fixed once the enveloping sieve is constructed in Section 3.2; this is simply a technical issue, as one needs to go from $[N]$ to $\mathbb{Z}_{N'}$ and back with the estimates. See also Remark 10 in Section 2.3. For this matter, we also assume that $K \subset [-\tfrac{1}{4}N', \tfrac{1}{4}N']^d$ and $\Psi(K) \subset [N]^t$, after enlarging $N$ to $O_L(N)$ if necessary and replacing $K$ with $K \cap \Psi^{-1}((\mathbb{R}^+)^t)$, recalling that the contribution of $\Psi$ to the negative integers is not relevant for the asymptotic (1.1) (in Section 3.1 $K$ will get even smaller); this way we can naturally view $K \subset \mathbb{Z}_{N'}^d$ and $\Psi : \mathbb{Z}_{N'}^d \to \mathbb{Z}_{N'}^t$.

2.1. Pseudorandom measures and the generalized von Neumann theorem.

In this section we would like to introduce the principle, rather central to the whole theory and going back to the roots of higher order Fourier analysis, of which the generalized von Neumann theorem below is a direct manifestation: the Gowers uniformity norms of order $s$ are universal among multilinear averages of complexity less than or equal to $s$ (in the sense that contributions
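Before moving on, the corollary of Theorem 3 on progressions in the primes (Section 1) can be made concrete. The following sketch evaluates the closed-form local factors $\beta_p$ for $k$-term progressions, truncates the singular product at a chosen prime bound, and counts progressions of primes directly for comparison. The function names, the truncation bound and the restriction to common difference $n_2 \ge 1$ (the degenerate case $n_2 = 0$ contributes only $O(N/\log N)$ tuples) are assumptions of this illustration.

```python
from math import log


def primes_up_to(n):
    """Sieve of Eratosthenes; assumes n >= 1."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def ap_local_factor(p, k):
    """Local factor beta_p for k-term progressions (the two cases agree at p = k)."""
    base = (p / (p - 1)) ** (k - 1)
    if p <= k:
        return base / p
    return (1 - (k - 1) / p) * base


def ap_singular_product(k, prime_bound):
    """Product of beta_p over primes p <= prime_bound."""
    result = 1.0
    for p in primes_up_to(prime_bound):
        result *= ap_local_factor(p, k)
    return result


def count_prime_aps(k, N):
    """Number of progressions n1, n1+n2, ..., n1+(k-1)n2 of primes <= N, n2 >= 1."""
    ps = primes_up_to(N)
    pset = set(ps)
    count = 0
    for a in ps:
        step = 1
        while a + (k - 1) * step <= N:
            if all(a + j * step in pset for j in range(1, k)):
                count += 1
            step += 1
    return count


def predicted_count(k, N, prime_bound=10_000):
    """Main term (1 / (2(k-1))) * prod beta_p * N^2 / log^k N, product truncated."""
    return ap_singular_product(k, prime_bound) / (2 * (k - 1)) * N ** 2 / log(N) ** k


print(count_prime_aps(3, 2000), predicted_count(3, 2000))
```

For $k = 3$ the singular product equals twice the twin prime constant. The agreement between the direct count and the main term improves only slowly with $N$, because the error term is merely $o_k(1)$ relative to the main term and the weights are logarithmic.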

to those averages coming from functions with small Gowers norm are negligible), providing us with a strategy to approach problems in which one seeks to prove an asymptotic like (1.1).

We recall that the Gowers $U^{s+1}$-norms are defined for functions $f : [N] \to \mathbb{C}$ by
$$\|f\|_{U^{s+1}[N]} := c\, \|f\|_{U^{s+1}(\mathbb{Z}_{N'})},$$
where $c^{-1} = \|1_{[N]}\|_{U^{s+1}(\mathbb{Z}_{N'})}$ is bounded above and below by constants that do not depend on $N$ in our set-up (see Lemma B.5 in [4]), and where we are naturally viewing $f$ as a function on $\mathbb{Z}_{N'}$, with the $U^{s+1}$-norm on cyclic groups being given by the following fabulous formula:
$$\|f\|^{2^{s+1}}_{U^{s+1}(\mathbb{Z}_{N'})} = \mathbb{E}_{x^{(0)}, x^{(1)} \in \mathbb{Z}_{N'}^{s+1}} \prod_{\omega \in \{0,1\}^{s+1}} \mathcal{C}^{|\omega|} f\Big( \sum_{j=1}^{s+1} x_j^{(\omega_j)} \Big),$$
where $\mathcal{C}$ denotes the complex conjugation operator. One central property of those norms is the Gowers-Cauchy-Schwarz inequality, holding for a collection of functions $f_\omega : \mathbb{Z}_{N'} \to \mathbb{C}$ indexed by $\omega \in \{0,1\}^{s+1}$:
$$\Big| \mathbb{E}_{x^{(0)}, x^{(1)} \in \mathbb{Z}_{N'}^{s+1}} \prod_{\omega \in \{0,1\}^{s+1}} \mathcal{C}^{|\omega|} f_\omega\Big( \sum_{j=1}^{s+1} x_j^{(\omega_j)} \Big) \Big| \le \prod_{\omega \in \{0,1\}^{s+1}} \|f_\omega\|_{U^{s+1}(\mathbb{Z}_{N'})}. \tag{2.1}$$
It is not very hard then to obtain the following estimate for our $\Psi = (\psi_1, \ldots, \psi_t)$:
$$\Big| \mathbb{E}_{x \in \mathbb{Z}_{N'}^d} \prod_{i \in [t]} f_i(\psi_i(x)) \Big| \le \min_{i \in [t]} \|f_i\|_{U^{s+1}(\mathbb{Z}_{N'})}, \tag{2.2}$$
provided the functions $f_i$ are uniformly bounded in magnitude by 1.

In applying this to the asymptotic (1.1) one runs into trouble, as the von Mangoldt function $\Lambda$ is not uniformly bounded. Green and Tao had the idea in [3] and [4] that it is possible to prove a version of the above estimate provided that the functions $f_i$ are majorized by some other function $\nu$ for which the desired asymptotic is known to hold for all systems of complexity less than some $D_s \ge s$ (plus some other technical requirements, see just below), which is in itself a manifestation of another principle, the transference (cf. Section 2.3). Such an enveloping sieve is known to exist for $\Lambda$, as we discuss in Section 3.2 (and is, roughly speaking, a regularized version of $\Lambda$).

Definition 4.
A $D$-pseudorandom measure is a function $\nu : \mathbb{Z}_{N'} \to \mathbb{R}^+$ satisfying the linear forms condition:
$$\mathbb{E}_{x \in \mathbb{Z}_{N'}^d} \prod_{i \in [t]} \nu(\varphi_i(x)) = 1 + o_D(1)$$
for any finite complexity affine-linear system $\Phi = (\varphi_1, \ldots, \varphi_t)$ with $d$, $t$ and $\|\Phi\|_N$ less than $D$ (in particular we note the identity $\mathbb{E}_{x \in \mathbb{Z}_{N'}} \nu(x) = 1 + o(1)$), and the correlation condition (whose presence has only technical reasons, arising only in the actual transference principle discussed in Section 2.3), stating that for every $1 \le m \le D$ there exists a weight function $\tau = \tau_m : \mathbb{Z}_{N'} \to \mathbb{R}^+$ having bounded $L^q$-norm for all $1 \le q < \infty$:
$$\mathbb{E}_{x \in \mathbb{Z}_{N'}} \tau^q(x) \ll_{m,q} 1,$$
and such that for any $h_1, \ldots, h_m \in \mathbb{Z}_{N'}$:
$$\mathbb{E}_{x \in \mathbb{Z}_{N'}} \prod_{i \in [m]} \nu(x + h_i) \le \sum_{1 \le i < j \le m} \tau(h_i - h_j).$$

To state the generalized von Neumann theorem it is convenient to assume that the affine-linear system $\Psi = (\psi_1, \ldots, \psi_t)$ is in normal form, that is, for every $i \in [t]$ one can find a collection $J_i \subset \{e_1, \ldots, e_d\}$ of at least $s + 1$ standard basis vectors such that $\prod_{e \in J_i} \dot\psi_j(e)$ is non-zero only for $j = i$. The following is Proposition 7.1 in [4].

Proposition 5. (Generalized von Neumann theorem [4]). There exists a constant $D_{s,d,t,L}$ such that for a collection of functions $f_1, \ldots, f_t : [N] \to \mathbb{R}$ dominated by some $D$-pseudorandom measure $\nu$, that is $|f_i(n)| \le \nu(n)$ for all $n \in [N]$ and $i \in [t]$, with $D \ge D_{s,d,t,L}$, we have that $\min_{i \in [t]} \|f_i\|_{U^{s+1}[N]} \le \delta$ for some $\delta > 0$ implies the asymptotic:
$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} f_i(\psi_i(n)) = o_\delta(N^d) + \kappa(\delta) N^d.$$
See the appendix for the $\kappa$-notation.

We should say a few words about the proof (contained in Appendix B of [4]), but not much more, as it is rather technical. One already has to deal with the characteristic function $1_K$ (coming from the restriction in the summation over $n$): one needs to pick a careful decomposition $1_K = F_\varepsilon + O(G_\varepsilon)$, with $F_\varepsilon$, $G_\varepsilon$ Lipschitz functions of magnitude at most 1 and Lipschitz constant $O(\varepsilon^{-1})$, such that after writing the Fourier series for $F_\varepsilon$ and $G_\varepsilon$ with some quantitatively controlled errors one can absorb $1_K$ into the functions $f_i$ (upon changing the measure $\nu$ accordingly) and reduce to showing something which looks more like (2.2):
$$\mathbb{E}_{x \in \mathbb{Z}_{N'}^d} \prod_{i \in [t]} f_i(\psi_i(x)) = o_\delta(1) + \kappa(\delta),$$
under the assumption that $\min_{i \in [t]} \|f_i\|_{U^{s+1}(\mathbb{Z}_{N'})} \le \delta$ and $|f_i(x)| \le \nu(x)$ for some $D$-pseudorandom $\nu$ (recalling our construction of the prime $N'$ at the beginning of the section). At this point we should just mention that one key ingredient is again the Gowers-Cauchy-Schwarz inequality (2.1), or rather a very intricate version of it proved in Corollary B.4 of [4] that we would not even dare to state.

As for the requirement that the affine-linear system $\Psi$ should be in normal form: assuming that $f_1$ has the smallest $U^{s+1}$-norm, one is going to use this fact in trying to make $\|f_1\|_{U^{s+1}(\mathbb{Z}_{N'})}$ appear when applying (2.1). Assuming as well that $\prod_{j=1}^{s+1} \dot\psi_i(e_j)$ is non-zero only for $i = 1$ (by the definition of normal form one is guaranteed to have at least $s + 1$ vectors in the product), one can suitably dilate the first $s + 1$ variables, obtaining:
$$\psi_1(x_1, \ldots, x_d) = x_1 + \cdots + x_{s+1} + \psi_1(0, \ldots, 0, x_{s+2}, \ldots, x_d),$$
with the other forms $\psi_i$, $i \ge 2$, not involving all of the variables $x_1, \ldots, x_{s+1}$. One applies the Gowers-Cauchy-Schwarz inequality from Corollary B.4 of [4] around $f_1(\psi_1)$, which, thanks to the above preliminary work on $\Psi$, can be reparametrized to make $\|f_1\|_{U^{s+1}(\mathbb{Z}_{N'})}$ appear, after which one is left with estimating all the other terms via $|f_i(x)| \le \nu(x)$ and the linear forms condition (which is quite a lot of work). Green and Tao prove in the end the following version of (2.2):
$$\Big| \mathbb{E}_{x \in \mathbb{Z}_{N'}^d} \prod_{i \in [t]} f_i(\psi_i(x)) \Big| \le \|f_1\|_{U^{s+1}(\mathbb{Z}_{N'})} + o(1),$$
see Proposition 7.1 in Appendix B of [4].
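The Gowers norms of this subsection can be computed by brute force on a small cyclic group. The sketch below uses the equivalent parametrization by a base point $x$ and differences $h_1, \ldots, h_k$ (obtained from the cube formula through $x = \sum_j x_j^{(0)}$ and $h_j = x_j^{(1)} - x_j^{(0)}$); the function name and the choice of modulus 7 are ours. It illustrates a standard phenomenon: the quadratic phase $n \mapsto e^{2\pi i n^2/N}$ is very uniform in $U^2$, with $\|f\|_{U^2}^4 = 1/N$ for odd $N$, yet has $U^3$-norm exactly 1.

```python
import cmath
from itertools import product


def gowers_norm_power(f, N, k):
    """Return ||f||_{U^k(Z_N)}^{2^k}, computed as the average over x, h_1..h_k
    of prod over omega in {0,1}^k of C^{|omega|} f(x + omega . h)."""
    total = 0
    for point in product(range(N), repeat=k + 1):
        x, hs = point[0], point[1:]
        term = 1
        for omega in product((0, 1), repeat=k):
            shift = sum(w * h for w, h in zip(omega, hs))
            value = f[(x + shift) % N]
            # Conjugate exactly when |omega| is odd.
            term *= value.conjugate() if sum(omega) % 2 else value
        total += term
    return (total / N ** (k + 1)).real


N = 7
quadratic_phase = [cmath.exp(2j * cmath.pi * n * n / N) for n in range(N)]
constant_one = [1.0] * N

print(gowers_norm_power(quadratic_phase, N, 2))
print(gowers_norm_power(quadratic_phase, N, 3))
```

The cost is $N^{k+1} 2^k$ evaluations, so this is only a sanity check on toy moduli and not how one would handle the functions arising in [4].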

2.2. The Inverse Theorem.

Once the problem is reduced to proving (or disproving) that a given function is Gowers uniform (i.e. has small $U^{s+1}$-norm), one moves into the realm of higher order Fourier analysis, where there is a quantitatively precise relation (the inverse theorem) between how much of the Gowers norm a bounded function carries and how strongly it correlates with objects generalizing the standard characters $e^{2\pi i x}$, called nilsequences (hence the name of the theory). The purpose of this section is to quickly introduce some of the relevant notions and to state the linear version of the inverse theorem established in [6], together with the refinements that appeared in [2].

Let us start by introducing linear nilsequences (note that the work [4] does not use the more general notion of nilsequence, so we will not mention it); all of our definitions are taken from Ben Green's book [2]. Let $G$ be a simply-connected Lie group, nilpotent of class $s$, together with a filtration by closed connected subgroups
$$G_\bullet : \quad G = G_0 = G_1 \supseteq G_2 \supseteq \cdots \supseteq G_s \supseteq G_{s+1} = 1.$$
The fact that groups of nilpotency class $s$ arise in the study of multilinear averages of complexity $s$, such as the Gowers norms $U^{s+1}$, is very natural if one notes that for an arithmetic progression $h g^n$ in such a group the terms $h g^{s+1}$ and higher are all determined by the first ones $h, hg, \ldots, hg^s$. We also assume that the group $G$ is equipped with a lattice: a discrete cocompact subgroup $\Gamma \le G$ such that $G_\bullet$ is rational with respect to $\Gamma$, in the sense that $G_i \cap \Gamma$ is a lattice in $G_i$. We also fix a basis $B$ for the Lie algebra $\mathfrak{g}$ compatible with the induced filtration $\mathfrak{g}_\bullet$, and attach to it a complexity quantity $\#_B(G_\bullet, \Gamma) = M$, defined as the smallest constant such that $|a_{ijk}| \le M$ and $M a_{ijk} \in \mathbb{Z}$, where the $a_{ijk}$ are the Lie structure constants with respect to $B$, and such that $M \cdot \mathbb{Z}[B] \subseteq \log \Gamma \subseteq \frac{1}{M} \cdot \mathbb{Z}[B]$.

Finally, attached to our choice of basis $B$ are also Sobolev spaces $W^{m,\infty}(B)$, for which one measures in $L^\infty(G)$ the $m$-fold derivatives of a function on $G$ along the vector fields generated by $B$. The higher order analogues of $e^{2\pi i x}$ for this new Fourier analysis are then the following objects:

Definition 6. Given $F \in C^\infty(G)$ which is automorphic (i.e. $\Gamma$-periodic), and $g \in G$, we call the function $n \mapsto F(g^n)$, $n \in \mathbb{Z}$, a (smooth) linear nilsequence of class $s$.

One important property of smooth nilsequences (as opposed to the ones used in [4], where $F$ is only assumed to be Lipschitz) is that they have bounded dual Gowers norms:
$$\|F(g^n)\|_{U^{s+1}[N]^*} := \sup \big\{ |\mathbb{E}_{n \in [N]} f(n) F(g^n)| : \|f\|_{U^{s+1}[N]} \le 1 \big\}$$
(this is proved in [2]). This then has a direct application to questions about Gowers uniformity (see also Corollary 11.6 in [4]):

Proposition 7. (Nilsequences obstruct uniformity [4]). Let $f : [N] \to \mathbb{R}$ be a function with $\mathbb{E}_{n \in [N]} |f(n)| \le 1$ such that its $U^{s+1}$-norm is well-defined, and suppose that
$$|\mathbb{E}_{n \in [N]} f(n) F(g^n)| \ge \delta > 0$$
holds for some smooth nilsequence of class $s$ with $\|F(g^n)\|_{U^{s+1}[N]^*} \ll 1$. Then
$$\|f\|_{U^{s+1}[N]} \gg_\delta 1.$$

It is a very deep fact that for bounded functions $f$ the converse holds.

Theorem 8. (Inverse theorem in linear form [6], [2]). Assume that $f : [N] \to \mathbb{R}$ satisfies:
$$|f(n)| \le 1, \qquad \|f\|_{U^{s+1}[N]} \ge \delta,$$
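A standard example of the set-up just described, chosen by us for illustration and not discussed in the essay's sources in this form, is the Heisenberg group of upper unitriangular $3 \times 3$ matrices: it is nilpotent of class 2, its commutators are central, and the integer points form a lattice $\Gamma$. The sketch below checks with exact arithmetic that the coordinates of $g^n$ are polynomials in $n$ of degree at most 2, which is one way to see why class 2 nilsequences generalize quadratic phases. All helper names are hypothetical.

```python
from fractions import Fraction


def heis(a, b, c):
    """Upper unitriangular matrix with entries a, b above the diagonal and c in the corner."""
    return ((1, a, c), (0, 1, b), (0, 0, 1))


def matmul(A, B):
    """Product of two 3x3 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def power(g, n):
    """g^n for n >= 0 by repeated multiplication."""
    result = heis(0, 0, 0)
    for _ in range(n):
        result = matmul(result, g)
    return result


a, b, c = Fraction(1, 3), Fraction(2, 5), Fraction(1, 7)
g = heis(a, b, c)
h = heis(Fraction(1, 2), Fraction(3, 4), Fraction(5, 6))

# Coordinates of g^n: linear in n on the first superdiagonal, quadratic in the corner.
for n in range(6):
    assert power(g, n) == heis(n * a, n * b, n * c + Fraction(n * (n - 1), 2) * a * b)

# g*h and h*g differ by a central element, so the group has nilpotency class 2.
central = heis(0, 0, a * h[1][2] - h[0][1] * b)
assert matmul(g, h) == matmul(matmul(h, g), central)
```

Composing $n \mapsto g^n$ with a smooth $\Gamma$-periodic function $F$ would give a class 2 nilsequence in the sense of Definition 6; constructing such an $F$ explicitly is beyond this sketch.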

then there exists a smooth linear nilsequence $F(g^n)$ of class at most $s$ such that
$$\#_B(G_\bullet, \Gamma) \ll_{s,\delta} 1, \qquad \dim G \ll_{s,\delta} 1, \qquad \|F\|_{W^{m,\infty}(B)} \ll_{s,\delta,m} 1$$
(in particular $\|F(g^n)\|_{U^{s+1}[N]^*} \ll_{s,\delta} 1$), which correlates with $f$:
$$|\mathbb{E}_{n \in [N]} f(n) F(g^n)| \gg_{s,\delta} 1.$$

Note that this theorem reduces the problem of showing Gowers uniformity for a given function $f$ to establishing a discorrelation estimate against any nilsequence. This is certainly easier: instead of testing the function against itself (which might be quite a complicated and rough object, as is the case with the von Mangoldt function $\Lambda$), we are testing it against a smooth linear nilsequence, which is a relatively more elementary object (even if it looks quite intimidating when one encounters it for the first time). The problem that we face again in applying this to the asymptotic (1.1), as with the generalized von Neumann theorem, is that $\Lambda$ is not uniformly bounded. But as we discuss in the next section, the transference principle developed by Green and Tao in [3] and [4] enables one to relax the condition $|f(n)| \le 1$ to $|f(n)| \le \nu(n)$ for a pseudorandom measure $\nu$.

2.3. The transference principle.

Here we would like to describe some ideas that go into the transference principle of Green and Tao, namely the Koopman-von Neumann theorem and the Furstenberg tower used to establish it.

Proposition 9. (Koopman-von Neumann theorem [3], [4]). Suppose that $f : [N] \to \mathbb{R}$ is dominated by an $(s + 2)2^{s+1}$-pseudorandom measure $\nu$. Then we can decompose $f = f_1 + f_2$ with:
$$\|f_1\|_{L^\infty(\mathbb{Z}_{N'})} \le 2 + o_s(1), \qquad \|f_2\|_{U^{s+1}(\mathbb{Z}_{N'})} = o_s(1),$$
and both $f_1$ and $f_2$ can be chosen to be supported on $\{-2N, \ldots, 2N\}$.

Remark 10.
The $(s + 2)2^{s+1}$ for the pseudorandomness of $\nu$ is there for concreteness: the point is that it depends on $s$ only, and one will need to make sure that the enveloping sieve constructed in Section 3.2 has pseudorandomness constant larger than both $(s+2)2^{s+1}$ and $D_{s,t,d,L}$ from Proposition 5, which will be true upon picking the prime $N'$ large enough, i.e. $N' = CN$ with $C = C_{s,d,t,L} \gg 1$. In this case $\|f\|_{U^{s+1}[N]} \asymp_{s,t,d,L} \|f\|_{U^{s+1}(\mathbb{Z}_{N'})}$, and one can safely transfer estimates between $[N]$ and $\mathbb{Z}_{N'}$. As far as Propositions 5 and 9 are concerned, $C \ge 20$ is certainly good enough.

One can now use the Inverse Theorem 8 for the first component $f_1$ (the factor of 2 is of course irrelevant, as it can simply be absorbed into the implicit constants after rescaling; it comes from the fact that what one actually proves is $\|f_1^{\pm}\|_{L^\infty} \le 1$ for the positive and negative parts of $f_1$) and the duality $U^{s+1}$-$(U^{s+1})^*$ for the second term $f_2$ (together with the fact that the nilsequences produced by the Inverse Theorem have bounded dual norms) to relax the boundedness condition from the last subsection (this is Proposition 10.1 in [4]):

Proposition 11. (Relative Inverse Theorem [4]). With the assumptions as in Proposition 9, together with $\|f\|_{U^{s+1}[N]} \ge \delta$, there exists a smooth nilsequence $F(g^n)$ as in Theorem 8 satisfying:
$$|\mathbb{E}_{n \in [N]} f(n) F(g^n)| \gg_{s,\delta} 1.$$

Proposition 9 has a beautiful proof, essentially contained in [3], which has some vague analogues in ergodic theory (the actual Koopman-von Neumann theory), harmonic analysis (profile decomposition) and PDEs (the soliton resolution conjecture and induction on energy arguments). Let us outline the main steps in the argument, referring to [3] for the details.

Our aim will be the following structure theorem (Proposition 8.1 in [3]). Before stating it, let us recall that a $\sigma$-algebra $\mathcal{B}$ on a finite set like $\mathbb{Z}_{N'}$ is just a partition of this set, whose irreducible sets are called atoms, and that the conditional expectation $\mathbb{E}(f \mid \mathcal{B})$ of $f$ with respect to $\mathcal{B}$ is simply the function obtained by averaging $f$ on each atom (also recall our definition of Lebesgue spaces from the appendix; we will be writing $L^p(\mathbb{Z}_{N'}) = L^p$, and similarly for $U^{s+1}$, below).

Proposition 12. (Generalized Koopman-von Neumann structure theorem [3]). Suppose that $f : \mathbb{Z}_{N'} \to \mathbb{R}$ is dominated by $\nu$ as in Proposition 9, and fix $0 < \varepsilon < 1$. Then for $N$ large enough (and so for $N'$) there exist a $\sigma$-algebra $\mathcal{B}$ on $\mathbb{Z}_{N'}$ and an exceptional set $\Omega \in \mathcal{B}$ satisfying:
(i) smallness of the defect: $\|1_\Omega \nu\|_{L^1} = o_\varepsilon(1)$;
(ii) $L^\infty$-estimate away from $\Omega$: $\|(1 - 1_\Omega)\, \mathbb{E}(\nu - 1 \mid \mathcal{B})\|_{L^\infty} = o_\varepsilon(1)$;
(iii) Gowers uniformity estimate away from $\Omega$: $\|(1 - 1_\Omega)(f - \mathbb{E}(f \mid \mathcal{B}))\|_{U^{s+1}} \le \kappa_s(\varepsilon)$.

As mentioned above, the proof relies on the construction of a Furstenberg tower, and one important technical ingredient here is the notion of a dual function, which is defined by:
$$\mathcal{D}f(x) := \mathbb{E}_{h \in \mathbb{Z}_{N'}^{s+1}} \prod_{\omega \in \{0,1\}^{s+1} \setminus \{0\}} f(x + \omega \cdot h).$$
One important property of $\mathcal{D}f$ (given $|f| \le \nu$, in which case we call $\mathcal{D}f$ a basic Gowers anti-uniform function), proved in Lemma 6.1 of [3], is that it is uniformly bounded:
$$\|\mathcal{D}f\|_{L^\infty} \ll_s 1 + o(1). \tag{2.3}$$
We can associate to such a $\mathcal{D}f$ a $\sigma$-algebra $\mathcal{B}_{\varepsilon,\eta}(\mathcal{D}f)$, for a parameter $0 < \eta < 1/2$ ($\varepsilon$ is from the proposition), by defining the atoms to be given by $\mathcal{D}f^{-1}\big([\varepsilon(n + \alpha), \varepsilon(n + 1 + \alpha))\big)$, for a suitably chosen $0 \le \alpha \le 1$, where $n \in \mathbb{Z}$ runs through finitely many, $O_s(\varepsilon^{-1})$, values (see Proposition 7.2 of [3] for the exact choice of $\alpha$ and some other properties of $\mathcal{B}_{\varepsilon,\eta}(\mathcal{D}f)$).

In building the tower, we will have to consider $\sigma$-algebras generated by several basic Gowers anti-uniform functions $\mathcal{D}f_i$, $i = 1, \ldots, K$:
$$\mathcal{B}_K := \mathcal{B}_{\varepsilon,\eta}(\mathcal{D}f_1) \vee \cdots \vee \mathcal{B}_{\varepsilon,\eta}(\mathcal{D}f_K), \tag{2.4}$$
for which we shall define the exceptional subsets $\Omega'_K \in \mathcal{B}_K$ to be the union of those atoms $A \in \mathcal{B}_K$ for which $\|1_A \nu\|_{L^1} \le \eta^{1/2}$ (one calls such atoms small). Those sets $\Omega'_K$ will ultimately make up $\Omega$, and we can already check, using the bound on the number of atoms, that:
$$\|1_{\Omega'_K} \nu\|_{L^1} = O_{K,\varepsilon}(\eta^{1/2}). \tag{2.5}$$
One central property of $\mathcal{B}_K$ and $\Omega'_K$, proved in Proposition 7.3 of [3], is the following $L^\infty$-estimate for $\nu$:
$$\|(1 - 1_{\Omega'_K})\, \mathbb{E}(\nu - 1 \mid \mathcal{B}_K)\|_{L^\infty} = O_{K,\varepsilon}(\eta^{1/2}), \tag{2.6}$$
holding for $N$ large enough: for atoms $A$ which are not small we have $\|1_A\|_{L^1} \ge \eta^{1/2} - \|1_A(\nu - 1)\|_{L^1}$, and one needs the right-hand side to be bounded away from zero (which holds by the linear forms condition provided $N$ is large depending on $\eta$) in order to reduce the estimate to $\|\mathbb{E}_{\mathbb{Z}_{N'}} 1_A(\nu - 1)\|_{L^1} = O_{K,\varepsilon}(\eta^{1/2})$ by the definition of conditional expectation. To quickly overview the proof of the last statement, we simply say that one shows that characteristic functions of atoms from $\mathcal{B}_K$ can be suitably approximated by functions of the form $\Phi(\mathcal{D}f_1, \ldots, \mathcal{D}f_K)$ with $\Phi$ continuous (this is in Proposition 7.2 of [3]), and so by the Weierstrass approximation theorem one is reduced to showing that for a monomial $P_m$ of degree $m$:
$$\mathbb{E}_{\mathbb{Z}_{N'}} (\nu - 1) P_m(\mathcal{D}f_1, \ldots, \mathcal{D}f_K) = o_{\varepsilon,K,m}(1)$$
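The two finite notions recalled above, a $\sigma$-algebra as a partition into atoms and conditional expectation as averaging over atoms, are easy to make concrete. In this sketch the atoms are the level sets $g^{-1}([\varepsilon(n+\alpha), \varepsilon(n+1+\alpha)))$ of an arbitrary bounded function $g$, which merely stands in for a dual function $\mathcal{D}f$; the function names, the stand-in $g$ and the values of $\varepsilon$ and $\alpha$ are assumptions of the illustration, and the parameter $\eta$ and the careful choice of $\alpha$ from Proposition 7.2 of [3] play no role here.

```python
from collections import defaultdict
from math import floor


def level_set_atoms(values, eps, alpha):
    """Partition the indices x by the integer n with
    eps * (n + alpha) <= values[x] < eps * (n + 1 + alpha)."""
    atoms = defaultdict(list)
    for x, v in enumerate(values):
        atoms[floor(v / eps - alpha)].append(x)
    return list(atoms.values())


def conditional_expectation(f, atoms):
    """E(f | B): replace f by its average on each atom of the partition."""
    out = [0.0] * len(f)
    for atom in atoms:
        avg = sum(f[x] for x in atom) / len(atom)
        for x in atom:
            out[x] = avg
    return out


N = 11
g = [((x * x) % N) / N for x in range(N)]   # stand-in for a bounded dual function
f = [float((3 * x + 1) % 5) for x in range(N)]

atoms = level_set_atoms(g, eps=0.25, alpha=0.5)
projected = conditional_expectation(f, atoms)
print(atoms)
print(projected)
```

By construction the projected function is constant on each atom and has the same mean as $f$, which is all that the definition of conditional expectation on a finite set requires.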

which one proves using the $U^{s+1}$-$(U^{s+1})^*$ duality, noting that $\|\nu - 1\|_{U^{s+1}} = o(1)$ by the linear forms condition, together with the fact that $\|P_m(\mathcal{D}f_1, \ldots, \mathcal{D}f_K)\|_{(U^{s+1})^*} = O_{\varepsilon,K,m}(1)$ (this is proved in Lemma 6.3 of [3] and is the only place where the weird correlation condition for $\nu$ is used!).

We are now ready to build the Furstenberg tower. One starts the induction with $\mathcal{B}_0 := \{\emptyset, \mathbb{Z}_{N'}\}$, $\Omega_0 = \emptyset$ and $f_0 = f - \mathbb{E}_{\mathbb{Z}_{N'}}(f)$, and then constructs the tower up to level $K$ for $K = 0, \ldots, K_0 - 1$, with $K_0$ fixed to be the least integer larger than $100/g_s(\varepsilon)$, say, where $g_s(\varepsilon) = O_s(\varepsilon)$ is the energy gap (see below), so that $\mathcal{B}_K$ is given by (2.4) with functions satisfying:
$$|f_i| \le \nu + 2 + O_{K,\varepsilon}(\eta^{1/2}), \tag{2.7}$$
hence the $\mathcal{D}f_i$ are basic Gowers anti-uniform up to an additive constant, which is certainly harmless to the theory discussed above, and with a defect set $\Omega_K$ containing $\Omega'_K$ (so in particular (2.6) holds with $\Omega_K$) satisfying:
$$\|1_{\Omega_K} \nu\|_{L^1} = O_{K,\varepsilon}(\eta^{1/2}). \tag{2.8}$$
All of this is of course easy to see for $K = 0$. One then defines
$$f_{K+1} := (1 - 1_{\Omega_K})(f - \mathbb{E}(f \mid \mathcal{B}_K))$$
and asks whether $\|f_{K+1}\|_{U^{s+1}} \le \kappa_s(\varepsilon)$ holds. The constant $\kappa_s(\varepsilon)$ (note that we are using the $\kappa$-notation here) is related to the energy increment estimate presented below, and exact expressions for both the energy gap $g_s(\varepsilon)$ and $\kappa_s(\varepsilon)$ can be worked out from the proof of Proposition 8.2 in [3] (which unfortunately we are not going to present, as it is quite technical). If the answer is yes, we stop the algorithm, and this yields Proposition 12. If not, then we carry on to the next step, for which, by considering the decomposition $f = f_+ - f_-$ into positive and negative parts, we can use the $L^\infty$-estimate (2.6) outside $\Omega_K$ with $0 \le f_\pm \le \nu$ to get:
$$\|(1 - 1_{\Omega_K})\, \mathbb{E}(f \mid \mathcal{B}_K)\|_{L^\infty} \le 2 + O_{K,\varepsilon}(\eta^{1/2}), \tag{2.9}$$
and so one obtains the bound (2.7) for $f_{K+1}$.

We are therefore in a position to build the next step of the Furstenberg tower: $\mathcal{D}f_{K+1}$ is a basic Gowers anti-uniform function, and from the theory above we get the $\sigma$-algebra $\mathcal{B}_{K+1} := \mathcal{B}_{\varepsilon,\eta}(\mathcal{D}f_{K+1}) \vee \mathcal{B}_K$ together with its exceptional set $\Omega'_{K+1} \in \mathcal{B}_{K+1}$, which we use to update the defect set, $\Omega_{K+1} := \Omega'_{K+1} \cup \Omega_K$, for which one still has $\|1_{\Omega_{K+1}} \nu\|_{L^1} = O_{K,\varepsilon}(\eta^{1/2})$ by (2.5) for $\Omega'_{K+1}$ and (2.8) for $\Omega_K$.

Moving from one step to the next in the tower costs a definite amount of energy whenever the Gowers norm is large enough. Formally, there exist constants $\kappa_s(\varepsilon)$ and $g_s(\varepsilon)$, which we were already mentioning and using above, such that whenever $\|f_{K+1}\|_{U^{s+1}} > \kappa_s(\varepsilon)$ the energy increment estimate holds:
$$\|(1 - 1_{\Omega_{K+1}})\, \mathbb{E}(f \mid \mathcal{B}_{K+1})\|^2_{L^2} \ge \|(1 - 1_{\Omega_K})\, \mathbb{E}(f \mid \mathcal{B}_K)\|^2_{L^2} + g_s(\varepsilon).$$
This is proved by Green and Tao in Proposition 8.2 of [3] (we shall not say anything more about it). This estimate is absolutely central to the whole construction, as it enables one to conclude that the algorithm cannot run for an indefinite number of steps: from the estimate (2.9) we have an a priori bound on the energies,
$$0 \le \|(1 - 1_{\Omega_K})\, \mathbb{E}(f \mid \mathcal{B}_K)\|_{L^2} \le 2 + O_{K,\varepsilon}(\eta^{1/2}),$$
and so, choosing $\eta = \eta(K_0, \varepsilon)$ sufficiently small, one would have to violate this bound for some $K \le K_0$ (recall that $K_0 = K_0(\varepsilon, s)$, and so one really has $\eta = \eta(s, \varepsilon)$). In other words, the algorithm has to terminate at some $K_{\mathrm{fin}} \le K_0$, and so, making $\eta$ even smaller if necessary, depending on $\varepsilon$, one obtains Proposition 12 by setting $\mathcal{B} := \mathcal{B}_{K_{\mathrm{fin}}}$ and $\Omega := \Omega_{K_{\mathrm{fin}}}$.
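The mechanism behind the termination argument is that the energy $\|\mathbb{E}(f \mid \mathcal{B})\|_{L^2}^2$ can only go up when the $\sigma$-algebra is refined, and never exceeds $\|f\|_{L^2}^2$. The toy sketch below checks exactly this monotonicity for the join of two partitions of a small cyclic group. It deliberately ignores the exceptional sets $\Omega_K$ and says nothing about the quantitative gap $g_s(\varepsilon)$, which is the actual content of Proposition 8.2 of [3]; the function names and the test function are our own.

```python
def join(atoms_a, atoms_b):
    """Common refinement of two partitions: atoms are the nonempty intersections."""
    label_a = {x: i for i, atom in enumerate(atoms_a) for x in atom}
    label_b = {x: i for i, atom in enumerate(atoms_b) for x in atom}
    cells = {}
    for x in label_a:
        cells.setdefault((label_a[x], label_b[x]), []).append(x)
    return list(cells.values())


def energy(f, atoms):
    """||E(f | B)||_{L^2}^2 with respect to the uniform probability measure."""
    total = 0.0
    for atom in atoms:
        avg = sum(f[x] for x in atom) / len(atom)
        total += len(atom) * avg ** 2
    return total / len(f)


N = 12
f = [(-1) ** x * (x % 5) for x in range(N)]

trivial = [list(range(N))]
mod3 = [[x for x in range(N) if x % 3 == r] for r in range(3)]
mod4 = [[x for x in range(N) if x % 4 == r] for r in range(4)]

energies = [energy(f, trivial), energy(f, mod3), energy(f, join(mod3, mod4))]
print(energies, sum(v * v for v in f) / N)
```

In the actual tower each refinement is guaranteed to gain at least $g_s(\varepsilon)$ as long as the uniformity test fails, so the a priori bound on the energy caps the number of steps.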

To close things up, let us remark that Proposition 9 follows pretty much directly from Proposition 12 (this is Proposition 10.3 in [4]). One sets:
$$f_1 := \psi \cdot (1 - 1_\Omega)\,\mathbb{E}(f \mid \mathcal{B}), \qquad f_2 := \psi \cdot \bigl((1 - 1_\Omega)(f - \mathbb{E}(f \mid \mathcal{B})) + 1_\Omega f\bigr),$$
where $\psi$ is a smooth cut-off function identically 1 on $[-N, N]$ and vanishing outside $[-2N, 2N]$ (Green and Tao use a de la Vallée Poussin kernel). Multiplying by such functions, roughly speaking, does not perturb the Gowers norms too much, as one can decompose $\psi$ into a Fourier series and then use the phase invariance of the $U^{s+1}$-norm. The main issue here, however, is why we should expect $1_\Omega f$ to be Gowers uniform. In fact, this follows from the following inequality, which uses dual functions again and was noted by Green and Tao in the proof of Proposition 10.3 in [4]:
$$\|1_\Omega f\|_{U^{s+1}}^{2^{s+1}} \le \|\mathcal{D}\nu\|_{L^\infty}\,\|1_\Omega f\|_{L^1}.$$
The claim then follows from the $L^\infty$-estimate (2.3) for basic Gowers anti-uniform functions and the fact that $\|1_\Omega f\|_{L^1} \le \|1_\Omega \nu\|_{L^1} = o_\varepsilon(1)$. Proposition 9 then follows upon choosing $\varepsilon$ to be a sufficiently slowly decaying function of $N$.

3. The application to primes

In this section we discuss, without great depth, what kind of results one needs from analytic number theory in order to apply the higher order Fourier analysis techniques presented in the last section to the actual problem of obtaining the asymptotic (1.1) in the generalized Hardy-Littlewood conjecture. In the first subsection we quickly mention how to reduce the problem described in Section 1 to a question of existence of an enveloping sieve (i.e. the pseudorandom measure) so that, after applying the generalized von Neumann theorem and the transference principle, one is left with showing a discorrelation estimate against nilsequences, which ultimately follows from a deep fact from number theory established by Ben Green and Terence Tao in [5].

3.1. Reduction to a discorrelation estimate.
The reductions presented here are all made at the beginning of the paper [4], in Section 4, so please refer there for the details. In doing them, one should really put on the hat of the people who originally formulated the conjecture and needed to compute the constant term in the asymptotic (1.1).

Remark 13. Throughout Sections 1 and 2 the affine-linear system $\Psi$ was fixed. While reducing the problem to a discorrelation estimate here, we will sometimes need to change $\Psi$, with the modifications generating new affine-linear systems of complexity $s$ which still map into $\mathbb{Z}^t$, but with the parameters $d$ (for the domain) and $L$ (the size) getting multiplied by some constant $C_{s,d,t,L} > 0$ (this happens for example when we put $\Psi$ into normal form or apply the $W$-trick). One should fix $C_{s,d,t,L}$ at the end of this section, once all the reductions are done, and note that it does not depend on $N$ (so it is basically universal for our purposes). Therefore, any asymptotic that we claim implies (1.1) for our initial $\Psi$ must hold for any such affine-linear system, and we shall abuse notation by not changing the symbols $\Psi$, $d$ and $L$. Of course, when applying the theory from Section 2, we will just pick one system and fix it.

Let us start now. First, we recall that in using the generalized von Neumann theorem one needs to put the affine-linear system $\Psi$ into normal form. This is indeed possible by building the following extension $\Psi' : \mathbb{Z}^{d'} \to \mathbb{Z}^t$ of $\Psi$: for each $i \in [t]$, by definition of complexity, there are $k_i$ classes $A^i_1, \ldots, A^i_{k_i}$ partitioning the forms $\{\psi_j : j \ne i\}$, with $k_i \le s + 1$, and so one can pick vectors $f^i_1, \ldots, f^i_{k_i} \in \mathbb{Z}^d$ of universally bounded magnitude such that $\psi_j(f^i_k) = 0$ for $\psi_j \in A^i_k$ and $\psi_i(f^i_k) \ne 0$ for all $k \in [k_i]$. One then sets:
$$\Psi'\bigl(n, (m_{i,j_i})_{i \in [t],\, j_i \in [k_i]}\bigr) = \Psi\Bigl(n + \sum_{i \in [t],\, j_i \in [k_i]} m_{i,j_i} f^i_{j_i}\Bigr), \qquad n \in \mathbb{Z}^d,$$
and it can easily be checked that $\Psi'$ is in normal form, $\|\Psi'\|_N = O_{s,d}(L)$ and $\Psi'(\mathbb{Z}^{d'}) = \Psi(\mathbb{Z}^d)$ (see Lemma 4.4 in [4]). One should then carefully pick a convex body $K$ in $[-N, N]^d$, upon enlarging $N$ to $O(N)$, such that the asymptotic (1.1) follows from:
$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} \Lambda(\psi_i(n)) = \beta_\infty \prod_{p \in \mathcal{P}} \beta_p + o_{t,d,L}(N^d),$$
as is done in [4] while deducing the Main theorem from Theorem 4.5 (recalling our convention discussed in Remark 13, we can therefore assume that $\Psi$ is in normal form straightaway and drop the prime from the notation).

We would like to apply the machinery from Section 2 to the $o$-term in the asymptotic, and for that we had better present it in the form of a multilinear average. This can be achieved upon noting that one has, for the archimedean factor $\beta_\infty$, the relation:
$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} 1_{\mathbb{R}_+}(\psi_i(n)) = \beta_\infty + o_{d,t,L}(N^d),$$
proved in Appendix A of [4], and so replacing $K$ with $K \cap \Psi^{-1}((\mathbb{R}_+)^t)$. Actually we shall go even further and replace $K$ with $K \cap \Psi^{-1}\bigl(\{x \in \mathbb{R}^t : x_i > N^{9/10}\}\bigr)$, where the whole action is essentially happening, as the contribution of $0 \le \psi_i \le N^{9/10}$ to the LHS of (1.1) can be shown directly to be $o(N^d)$. We can thus deduce (1.1) from:
$$\sum_{n \in K \cap \mathbb{Z}^d} \Bigl(\prod_{i \in [t]} \Lambda(\psi_i(n)) - \prod_{p \in \mathcal{P}} \beta_p\Bigr) = o_{t,d,L}(N^d), \qquad \psi_i > N^{9/10} \text{ on } K, \tag{3.1}$$
where, up to enlarging $N$ to $O_L(N)$, we can assume $\Psi(K) \subset [N]^t$ as well (in fact this has already been done in the preliminary discussion to Section 2). Note that we haven't quite got a true multilinear average on the LHS yet, but this needs just a couple more steps. Note that the contributions from powers of primes in the von Mangoldt function can be neglected, and we can replace $\Lambda$ by $\Lambda'$, which takes the value $\log p$ at a prime $p$ and zero otherwise.
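The claim that prime powers can be neglected is easy to illustrate numerically in the simplest one-variable situation. The following is only an illustrative sketch (the function names and the cutoff $10^5$ are our own choices, and it does not reproduce the multilinear argument of [4]): it compares $\sum_{n \le x} \Lambda(n)$ with $\sum_{n \le x} \Lambda'(n)$ and reports the difference as a fraction of $x$.

```python
import math

def smallest_prime_factor_table(limit):
    """spf[n] = smallest prime factor of n, for 2 <= n <= limit (sieve)."""
    spf = list(range(limit + 1))
    for p in range(2, math.isqrt(limit) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    return spf

def von_mangoldt_sums(limit):
    """Return (sum of Lambda(n), sum of Lambda'(n)) over 1 <= n <= limit.

    Lambda(n) = log p if n is a power of a prime p, else 0.
    Lambda'(n) = log p if n = p is prime, else 0.
    """
    spf = smallest_prime_factor_table(limit)
    full, primes_only = 0.0, 0.0
    for n in range(2, limit + 1):
        p = spf[n]
        m = n
        while m % p == 0:
            m //= p
        if m == 1:                 # n is a power of the prime p
            full += math.log(p)
            if n == p:             # n is itself prime
                primes_only += math.log(p)
    return full, primes_only

full, primes_only = von_mangoldt_sums(10**5)
# The relative contribution of proper prime powers is tiny.
print(full, primes_only, (full - primes_only) / 10**5)
```

The difference is carried by squares, cubes, and higher powers of primes, of which there are only about $\sqrt{x}$ up to $x$, which is why it is negligible against $x$.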
It is also convenient to rescale $\Lambda'$ in such a way that the singular product $\prod_{p \in \mathcal{P}} \beta_p$ gets replaced by 1: this is called the $W$-trick and is very well explained in Section 5 of [4]; we shall just mention the outcome. One defines $w(N) := \log\log\log N$ and uses it to obtain a slowly growing product of primes:
$$W := \prod_{p \in \mathcal{P},\; p \le w} p.$$
Since $W \ll \log N$, one has for the singular product:
$$\prod_{p \in \mathcal{P}} \beta_p = \beta_W + o(1),$$
which can be deduced from Lemma 1.3 in [4], and hence the expected value of:
$$\Lambda'_{b,W}(n) := \frac{\phi(W)}{W}\,\Lambda'(Wn + b),$$

defined for $b \in [W]$ coprime to $W$, should be asymptotically one. This can be exploited to reduce the asymptotic (3.1) to the question of establishing:
$$\sum_{n \in K \cap \mathbb{Z}^d} \Bigl(\prod_{i \in [t]} \Lambda'_{b_i,W}(\psi_i(n)) - 1\Bigr) = o_{t,d,L}(N^d), \qquad N^{7/10} < \psi_i < N \text{ on } K, \tag{3.2}$$
for any $b_1, \ldots, b_t$ less than and coprime to $W$, and any affine-linear system in normal form with bounds as stated in Remark 13. One last manipulation is needed in order to put ourselves in a position to use Section 2: we write $\Lambda'_{b_i,W}(\psi_i(n))$ as $(\Lambda'_{b_i,W}(\psi_i(n)) - 1) + 1$ and expand the product in $i$, in which case the asymptotic for each individual term will be a consequence of the following factorized version of (3.2):
$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} \bigl(\Lambda'_{b_i,W}(\psi_i(n)) - 1\bigr) = o_{t,d,L}(N^d), \tag{3.3}$$
with $K$ such that $N^{7/10} < \psi_i < N$. This is the discorrelation estimate to which we wanted to reduce the asymptotic (1.1), and which appears in Proposition 5. The aim is now to find a pseudorandom measure dominating the functions $\Lambda'_{b_i,W}(\psi_i(n)) - 1$ and to prove that they are Gowers uniform.

3.2. The enveloping sieve and correlation estimates for the Möbius function and nilsequences.

In this section we simply mention a few of the ingredients from analytic number theory needed to close our account of the Green-Tao resolution of the finite complexity generalized Hardy-Littlewood conjecture. We start by introducing the Möbius function $\mu$, defined by $\mu(n) = (-1)^d$ whenever $n \in \mathbb{Z}$ is a product of $d$ distinct primes and $\mu(n) = 0$ otherwise. We then have the following classical identity relating $\mu$ to the von Mangoldt function $\Lambda$:
$$\Lambda(n) = -\sum_{d \mid n} \mu(d) \log d, \qquad n \ge 1. \tag{3.4}$$
One constructs the enveloping sieve for $\Lambda'_{b_i,W} - 1$ by smoothing out $\Lambda$ in the following way: one picks a smooth even function $\chi : \mathbb{R} \to \mathbb{R}$ supported on the interval $[-1, 1]$, attaining the value 1 at 0 and having unit energy $\int_0^1 |\chi'(t)|^2\,dt = 1$. Fixing auxiliary parameters $\gamma > 0$ and $l$ a positive integer, one then defines:
$$\Lambda_{\chi,\gamma,l}(n) := (\log N^\gamma)\Bigl(\sum_{d \mid n} \mu(d)\,\chi\Bigl(\frac{\log d}{\log N^\gamma}\Bigr)\Bigr)^{l}, \qquad n \in \mathbb{Z}. \tag{3.5}$$
The following is Proposition 6.4 from [4]:

Proposition 14. (Domination by a pseudorandom measure [4]). Fix $D > 1$. Then there exists a constant $C_0(D)$ such that for $C \ge C_0$ one can find a constant $\gamma = \gamma(C, D) \in (0, 3/5)$ and a prime $N' \in [CN, 2CN]$, for $N$ large enough, such that the sieve:
$$\nu(n) = \frac{1}{2} + \frac{1}{2}\,\mathbb{E}_{i \in [t]}\,\frac{\phi(W)}{W}\,\Lambda_{\chi,\gamma,2}(Wn + b_i), \qquad n \in [N],$$
extended by 1 to $\mathbb{Z}_{N'} \setminus [N]$, is a $D$-pseudorandom measure dominating:
$$1 + \Lambda'_{b_1,W}(\psi_1(n)) + \cdots + \Lambda'_{b_t,W}(\psi_t(n)) \ll_{D,C} \nu(n)$$
for $n \in [N^{3/5}, N]$.
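Two of the ingredients above lend themselves to a quick numerical sanity check: the classical identity (3.4), and the normalisation by $\phi(W)/W$ in the $W$-trick of Section 3.1. The sketch below is illustrative only. The function names are ours, and we fix $W = 30$ and $b = 7$ by hand, which is an assumption made for the experiment (in the text $W$ is the product of the primes up to $w(N)$ and grows slowly with $N$).

```python
import math

def is_prime(m):
    """Trial-division primality test, adequate for the small inputs used here."""
    if m < 2:
        return False
    for p in range(2, math.isqrt(m) + 1):
        if m % p == 0:
            return False
    return True

def factorize(n):
    """Prime factorisation of n >= 1 by trial division, as {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def mobius(n):
    """mu(n) = (-1)^d if n is a product of d distinct primes, else 0."""
    f = factorize(n)
    if any(e > 1 for e in f.values()):
        return 0
    return -1 if len(f) % 2 else 1

def von_mangoldt(n):
    """Lambda(n) = log p if n is a power of a prime p, else 0."""
    f = factorize(n)
    if len(f) == 1:
        return math.log(next(iter(f)))
    return 0.0

def divisor_sum(n):
    """Right-hand side of (3.4): -sum over d | n of mu(d) log d."""
    return -sum(mobius(d) * math.log(d) for d in range(1, n + 1) if n % d == 0)

def w_trick_average(N, W, b):
    """Average over n in [N] of phi(W)/W * Lambda'(W n + b)."""
    phi = sum(1 for a in range(1, W + 1) if math.gcd(a, W) == 1)
    total = 0.0
    for n in range(1, N + 1):
        m = W * n + b
        if is_prime(m):            # Lambda' is supported on primes
            total += math.log(m)
    return (phi / W) * total / N

# Identity (3.4) on a small range, up to floating-point error.
worst = max(abs(von_mangoldt(n) - divisor_sum(n)) for n in range(1, 200))
print("largest deviation in (3.4):", worst)

# W-trick normalisation with the hand-picked values W = 30, b = 7.
print("average of Lambda'_{b,W}:", w_trick_average(5000, 30, 7))
```

The second printed value should be close to 1, reflecting the fact that the primes are equidistributed among the $\phi(W)$ residue classes coprime to $W$.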

We will not say anything about the proof of this proposition, except that the verification of the linear forms condition for $\nu$ involves, among other things, the so-called Goldston-Yıldırım estimate, proved in Theorem D.3 of [4], which, loosely speaking, establishes a suitably modified version of the asymptotic (1.1) from the generalized Hardy-Littlewood conjecture for the smoothed version (3.5) of the von Mangoldt function, provided $\gamma$ is small enough. These types of methods are used by analytic number theorists to establish upper bounds on the number of primes solving linear equations (but for lower bounds, and the actual asymptotic, the machinery of higher order Fourier analysis is required).

Fixing $D$ in Proposition 14 larger than the ones used in Propositions 12 and 9, we can plug the problem of establishing the asymptotic (1.1) into the theory discussed in Section 2, which leads to the following. By the reductions we made in Section 3.1 and the generalized von Neumann theorem, we need to prove the Gowers uniformity estimate:
$$\|\Lambda'_{b,W} - 1\|_{U^{s+1}[N]} = o(1).$$
Suppose, for contradiction, that the above fails, so that there exists some $\delta > 0$ such that $\|\Lambda'_{b,W} - 1\|_{U^{s+1}[N]} > \delta$ no matter how large $N$ gets. Then, applying the transference principle, Proposition 11, to reach a contradiction it is enough to show the following correlation estimate:
$$\mathbb{E}_{n \in [N]} \bigl(\Lambda'_{b,W}(n) - 1\bigr) F(g^n) = o(1) \tag{3.6}$$
for a nilsequence $F$ as arising in the Inverse Theorem 8 (in particular it has the fabulous property that its dual Gowers norm is bounded, which allows us to easily use $U^{s+1}$-$(U^{s+1})^*$ duality, and the preliminary technical averaging procedure from Section 11 of [4] is therefore not required). It will be more convenient to work here with the actual von Mangoldt function, because of the identity (3.4), and establish:
$$\mathbb{E}_{n \in [N]} \Bigl(\frac{\phi(W)}{W}\,\Lambda(Wn + b) - 1\Bigr) F(g^n) = o(1),$$
which certainly implies (3.6).
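As a reminder of what the $U^{s+1}$-$(U^{s+1})^*$ duality mentioned above amounts to, here is the inequality in question. It is a sketch under assumptions: $h$ is a placeholder name for any function on $[N]$, and we assume the dual norm on $[N]$ is normalised as the supremum of the pairing over the unit ball of $U^{s+1}[N]$, so that the inequality holds by definition.

```latex
% h is a placeholder for a function on [N]; the normalisation of the dual
% norm is an assumption made for this sketch.
\[
  \| F \|_{(U^{s+1}[N])^*}
  := \sup \Bigl\{ \bigl| \mathbb{E}_{n \in [N]}\, h(n)\, F(g^n) \bigr|
       \;:\; \| h \|_{U^{s+1}[N]} \le 1 \Bigr\},
\]
\[
  \bigl| \mathbb{E}_{n \in [N]}\, h(n)\, F(g^n) \bigr|
  \;\le\; \| h \|_{U^{s+1}[N]} \, \| F \|_{(U^{s+1}[N])^*} .
\]
```

In particular, whenever $\|F\|_{(U^{s+1}[N])^*} = O(1)$, any $h$ with $\|h\|_{U^{s+1}[N]} = o(1)$ automatically satisfies $\mathbb{E}_{n \in [N]} h(n) F(g^n) = o(1)$.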
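From there, one proceeds by splitting $\Lambda$ into a smooth part $\Lambda^\sharp$ and a rough part $\Lambda^\flat$ according to the cut-off functions $\chi = \chi^\sharp + \chi^\flat$, where $\chi : \mathbb{R}_+ \to \mathbb{R}_+$ is the identity function $\chi(x) = x$, and $\chi^\sharp$ and $\chi^\flat$ vanish for $x \ge 1$ and $x \le 1/2$ respectively. We then set $\gamma_s := \frac{1}{10}\,2^{-s}$ (this concrete constant is obtained in the proof) and:
$$\Lambda^\sharp(n) := -(\log N^{\gamma_s}) \sum_{d \mid n} \mu(d)\,\chi^\sharp\Bigl(\frac{\log d}{\gamma_s \log N}\Bigr), \qquad n \in [N],$$
and similarly for $\Lambda^\flat$ with $\chi^\flat$ in place of $\chi^\sharp$, so that $\Lambda = \Lambda^\sharp + \Lambda^\flat$ by the identity (3.4). We remark that $\Lambda^\sharp = -\Lambda_{\chi^\sharp,\gamma_s,1}$, and in fact Green and Tao again use the Goldston-Yıldırım estimate to obtain that (see Appendix D in [4]):
$$\Bigl\|\frac{\phi(W)}{W}\,\Lambda^\sharp(Wn + b) - 1\Bigr\|_{U^{s+1}[N]} = o(1),$$
hence, using the $U^{s+1}$-$(U^{s+1})^*$ duality, one has:
$$\mathbb{E}_{n \in [N]} \Bigl(\frac{\phi(W)}{W}\,\Lambda^\sharp(Wn + b) - 1\Bigr) F(g^n) = o(1),$$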

which says roughly that the asymptotic (1.1) is driven mainly by the smooth part of the von Mangoldt function, and one needs to show that the rough part can be treated as an error:
$$\mathbb{E}_{n \in [N]}\,\frac{\phi(W)}{W}\,\Lambda^\flat(Wn + b)\,F(g^n) = o(1).$$
This is a deep fact from analytic number theory, and after some further reductions carried out in Section 12 of [4], on which we won't report, it can be deduced from the strong orthogonality property of the Möbius function with nilsequences established by Ben Green and Terence Tao in [5].

Appendix: notation and conventions

For two quantities $X$ and $Y$ we write $X = O_{a_1,\ldots,a_k}(Y)$ if there exists a constant $C_{a_1,\ldots,a_k} > 0$, depending on the auxiliary parameters $a_1, \ldots, a_k$ only, so that $|X| \le C_{a_1,\ldots,a_k} Y$; in the same spirit we write $X \ll_{a_1,\ldots,a_k} Y$ if $|X| \le C_{a_1,\ldots,a_k} Y$. The parameter $N$ is reserved for a large integer that is ultimately tending to infinity, and with respect to which we shall write $X = o_{a_1,\ldots,a_k}(Y)$, meaning that there exists a sequence of constants $c_{a_1,\ldots,a_k}(N)$ tending to zero as $N \to \infty$ with $|X| \le c_{a_1,\ldots,a_k}(N)\,Y$. We will be using the κ-notation for constants tending to zero as their parameters do so, i.e. $\kappa(\delta) \to 0$ whenever $\delta \to 0$, for example.

Concerning sets, we shall denote by $[N] := \{1, \ldots, N\}$ the corresponding discrete interval, and we also set $\mathbb{Z}_p := \mathbb{Z}/p\mathbb{Z}$ for cyclic groups (not to be confused with $p$-adic integers). For a function $f : A \to \mathbb{C}$, where $A$ is some non-empty finite set, we shall denote the expectation of $f$ over $A$ by:
$$\mathbb{E}_{a \in A} f(a) = \frac{1}{|A|} \sum_{a \in A} f(a),$$
where $|A|$ stands for the cardinality of $A$. We define the Lebesgue spaces $L^p(A)$ with respect to the normalized measure $B \subset A \mapsto \mathbb{E}_{a \in A} 1_B(a) = |B|/|A|$.

References

[1] B. Green. Generalising the Hardy-Littlewood method for primes. International Congress of Mathematicians, Eur. Math. Soc., Zurich, Vol. II:373-399, 2006.
[2] B. Green. Higher-Order Fourier Analysis, I. Book in preparation, 2015.
[3] B. Green and T. Tao. The primes contain arbitrarily long arithmetic progressions. Annals of Math., 167, no. 2:481-547, 2008.
[4] B. Green and T. Tao. Linear equations in primes. Annals of Math., 171, no. 3:1753-1850, 2010.
[5] B. Green and T. Tao. The Möbius function is strongly orthogonal to nilsequences. Annals of Math., 175, no. 2:541-566, 2012.
[6] B. Green, T. Tao, and T. Ziegler. An inverse theorem for the Gowers $U^{s+1}[N]$-norm. Annals of Math., 176, no. 2:1231-1372, 2012.
[7] T. Tao. Higher order Fourier analysis. Volume 142 of Graduate Studies in Mathematics, American Mathematical Society, Providence, RI, 2012.

E-mail address: roland.grinis@maths.ox.ac.uk