Some approximation theorems in Math 522

I. Approximations of continuous functions by smooth functions

The goal is to approximate continuous functions vanishing at $\pm\infty$ by smooth functions. In a previous homework problem a $C^\infty$-function $\phi$ was constructed with the property that $\phi$ is positive on $(-1,1)$ and $\phi(t)=0$ for $|t|\ge 1$. If we divide by a suitable constant we may achieve and assume
\[ \int_{-\infty}^{\infty}\phi(t)\,dt = 1, \]
and we may also write
\[ \int_{-1}^{1}\phi(t)\,dt = 1 \]
since $\phi$ vanishes off $[-1,1]$. Now for $s>0$ define
\[ \phi_s(t) = \frac{1}{s}\,\phi\Big(\frac{t}{s}\Big). \]
Then we also have $\int_{-\infty}^{\infty}\phi_s(t)\,dt = 1$, by the substitution $u=t/s$. Graph the function $\phi_s$ for small values of the parameter $s$.

Definition. For continuous $f\in C(\mathbb{R})$ we define
\[ A_s f(x) = \int \phi_s(x-t)\,f(t)\,dt. \]
We shall be interested in the behavior of $A_s f$ for $s\to 0$. Note that the $t$-integral extends over a compact interval depending on $x$ and $s$. The integral is also called a convolution of the functions $\phi_s$ and $f$. (The convolution of two functions defined on $\mathbb{R}$ is given by $f*g(x)=\int f(y)\,g(x-y)\,dy$ whenever this makes sense; one checks $f*g=g*f$.)

Exercise: Let $f\in C(\mathbb{R})$. Show that for every $s>0$ the function $x\mapsto A_s f(x)$ is a $C^\infty$ function on $(-\infty,\infty)$. If $\lim_{|x|\to\infty} f(x)=0$ then show also that $\lim_{|x|\to\infty} A_s f(x)=0$.

Theorem. (a) Let $f\in C(\mathbb{R})$ and let $J$ be any compact interval. Then, as $s\to 0$, $A_s f$ converges to $f$ uniformly on $J$.
(b) Let $f$ be as in (a) and assume in addition that $\lim_{|x|\to\infty} f(x)=0$. Then $A_s f$ converges to $f$ uniformly on $\mathbb{R}$.

Proof. We shall only prove part (b). As an exercise you can prove part (a) in the same way, or alternatively, deduce it from part (b). One may change variables to write
\[ A_s f(x) = \int \phi_s(t)\,f(x-t)\,dt. \]
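The operator $A_s$ can be explored numerically. The following is only a sketch under stated assumptions: the bump $\phi(t)=c\,e^{-1/(1-t^2)}$ is one possible choice of $\phi$ (the notes only require the properties listed above), the test function $f(x)=1/(1+x^2)$ and the helper names `phi_raw`, `midpoint`, `A` and `sup_error` are hypothetical, and the integrals are replaced by a midpoint rule.

```python
import math

def phi_raw(t):
    """Unnormalized smooth bump: positive on (-1, 1), zero for |t| >= 1."""
    d = 1.0 - t * t
    return math.exp(-1.0 / d) if d > 0.0 else 0.0

def midpoint(g, a, b, m=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (i + 0.5) * h) for i in range(m))

# Constant that makes the integral of phi over [-1, 1] equal to 1.
NORM = midpoint(phi_raw, -1.0, 1.0)

def phi_s(t, s):
    """phi_s(t) = (1/s) * phi(t/s); it vanishes for |t| >= s."""
    return phi_raw(t / s) / (s * NORM)

def A(f, s, x, m=400):
    """A_s f(x), written as the integral over [-s, s] of phi_s(t) f(x - t) dt."""
    return midpoint(lambda t: phi_s(t, s) * f(x - t), -s, s, m)

def f(x):
    """Assumed test function: continuous and tending to 0 as |x| grows."""
    return 1.0 / (1.0 + x * x)

def sup_error(s, grid):
    """Largest value of |A_s f(x) - f(x)| over the sample points in grid."""
    return max(abs(A(f, s, x) - f(x)) for x in grid)

grid = [i / 10.0 for i in range(-50, 51)]
for s in (1.0, 0.5, 0.1):
    print(s, sup_error(s, grid))
```

The printed sampled errors should shrink as $s$ decreases, in line with the Theorem; a finite grid can of course only suggest uniform convergence, not prove it.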

Since $\int\phi_s(t)\,dt=1$ we see that
\[ A_s f(x)-f(x) = \int \phi_s(t)\,[f(x-t)-f(x)]\,dt. \]
Note that, since $\phi_s(t)=0$ for $|t|>s$, the $t$-integral is really an integral over $[-s,s]$. The assumptions that $f$ is continuous and that $\lim_{|x|\to\infty}f(x)=0$ imply that $f$ is uniformly continuous on $\mathbb{R}$ (prove this!). Thus given $\varepsilon>0$ there is a $\delta>0$ so that $|f(x-t)-f(x)|<\varepsilon/2$ for all $t$ with $|t|\le\delta$ and for all $x\in\mathbb{R}$. If $0<s<\delta$ we have, by the nonnegativity of $\phi_s$,
\[ |A_s f(x)-f(x)| \le \int_{-s}^{s}\phi_s(t)\,|f(x-t)-f(x)|\,dt \le \frac{\varepsilon}{2}\int_{-s}^{s}\phi_s(t)\,dt = \frac{\varepsilon}{2} \]
for all $x\in\mathbb{R}$. This proves part (b).

Terminology: The linear transformations (aka linear operators) $A_s$ are called approximations of the identity. The identity operator $\mathrm{Id}$ is simply given by $\mathrm{Id}(f)=f$, and the above Theorem says that the operators $A_s$ approximate, in a certain sense, the identity operator as $s\to 0$.

One can use other approximations of the identity defined like the one above, where $\phi$ is replaced by a not necessarily compactly supported function. If one drops the compact support the proofs get slightly more involved. Other types of approximations of the identity (with a parameter $n\to\infty$) are given by the families of linear operators $L_n$ in II below and $B_n$ in III below. For each $f$ these linear operators will produce families of polynomials depending on $f$.

II. The Weierstrass approximation theorem

Theorem. Let $f$ be a continuous function on an interval $[a,b]$. Then $f$ can be uniformly approximated by polynomials on $[a,b]$. In other words: Given $\varepsilon>0$ there exists a polynomial $P$ (depending on $\varepsilon$) so that
\[ \max_{x\in[a,b]}|f(x)-P(x)|\le\varepsilon. \]
Here $f$ may be complex valued, and then a polynomial is a function of the form $\sum_{k=0}^{N}a_k x^k$ with complex coefficients $a_k$ (considered for $x\in[a,b]$). If $f$ is real-valued, the polynomial can be chosen real-valued.

The Landau polynomials. Define
\[ Q_n(x) = c_n(1-x^2)^n \quad\text{where}\quad c_n = \Big(\int_{-1}^{1}(1-s^2)^n\,ds\Big)^{-1}, \]
so that $\int_{-1}^{1}Q_n(t)\,dt=1$.

Let $f$ be continuous on the interval $[-1/2,1/2]$. The sequence of Landau polynomials associated to $f$ is defined by
\[ L_n f(x) = \int_{-1/2}^{1/2} f(t)\,Q_n(t-x)\,dt. \]
Verify that $L_n f$ is a polynomial of degree at most $2n$.

We shall prove the Weierstrass approximation theorem for the interval $I_\gamma := [-\tfrac12+\gamma,\ \tfrac12-\gamma]$ for small $\gamma>0$, by using the Landau polynomials. By a change of variable one can then obtain the Weierstrass approximation theorem on any compact interval $[a,b]$; see the last paragraph in this section. Our first concrete version of the Weierstrass approximation theorem is

Theorem. Let $\gamma>0$ and let $I_\gamma=[-1/2+\gamma,\ 1/2-\gamma]$. The sequence $L_n f$ converges to $f$ uniformly on the interval $I_\gamma$, i.e.
\[ \max_{x\in I_\gamma}|L_n f(x)-f(x)|\to 0 \quad\text{as } n\to\infty. \]

Proof. (The proof here is essentially the same as the proof of the Weierstrass theorem in Theorem 7.26 of W. Rudin's book.) We first need some information about the size of the polynomials $Q_n$. Consider $c_n^{-1}=\int_{-1}^{1}(1-s^2)^n\,ds$. We use the inequality
\[ (1-x^2)^n \ge 1-nx^2 \quad\text{for } 0\le x\le 1. \]
To see this let $h(x)=(1-x^2)^n-1+nx^2$. The derivative of $h$ is
\[ h'(x) = -2nx(1-x^2)^{n-1}+2nx = 2nx\big(1-(1-x^2)^{n-1}\big), \]
which is nonnegative for $x\in[0,1]$. Thus $h$ is increasing on $[0,1]$, and since $h(0)=0$ we see that $h(x)\ge 0$ for $x\in[0,1]$. Since $h$ is even we have $h(x)\ge 0$ for $x\in[-1,1]$.

We use the last displayed inequality in the integral defining the constant $c_n$ and get
\[ \frac{1}{c_n} = \int_{-1}^{1}(1-x^2)^n\,dx = 2\int_0^1(1-x^2)^n\,dx \ge 2\int_0^{n^{-1/2}}(1-x^2)^n\,dx \ge 2\int_0^{n^{-1/2}}(1-nx^2)\,dx = \frac{4}{3\sqrt n} > \frac{1}{\sqrt n}, \]
and from this we obtain $c_n<\sqrt n$, hence
\[ (*)\qquad Q_n(x)\le\sqrt n\,(1-x^2)^n \quad\text{for } |x|\le 1. \]

Given $\varepsilon>0$ the goal is to show that $\max_{x\in I_\gamma}|L_n f(x)-f(x)|<\varepsilon$ for sufficiently large $n$. Let $\varepsilon>0$. Since $f$ is uniformly continuous on $[-1/2,1/2]$ we can find $\delta>0$ so that $\delta<\gamma$ and so that for all $x\in I_\gamma$ and all $t$ with $|t|\le\delta$ we have
\[ |f(x+t)-f(x)|<\varepsilon/4. \]

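Write (with the change of variables $t=s-x$)
\[ L_n f(x) = \int_{-1/2}^{1/2} f(s)\,Q_n(s-x)\,ds = \int_{-1/2-x}^{1/2-x} f(t+x)\,Q_n(t)\,dt. \]
Since $x\in I_\gamma=[-1/2+\gamma,\ 1/2-\gamma]$ and since $\delta<\gamma$ we have $-1/2-x\le-\gamma<-\delta<\delta<\gamma\le 1/2-x$. We may thus split the integral as
\[ \int_{-1/2-x}^{-\delta}+\int_{-\delta}^{\delta}+\int_{\delta}^{1/2-x} f(t+x)\,Q_n(t)\,dt. \]
The idea is that the first and the third term will be small for large $n$. We modify the middle integral further to write
\[ \int_{-\delta}^{\delta} f(t+x)\,Q_n(t)\,dt = \int_{-\delta}^{\delta}[f(t+x)-f(x)]\,Q_n(t)\,dt + f(x)\int_{-\delta}^{\delta}Q_n(t)\,dt \]
and finally (using $\int_{-1}^{1}Q_n(t)\,dt=1$)
\[ f(x)\int_{-\delta}^{\delta}Q_n(t)\,dt = f(x) - f(x)\int_{[-1,1]\setminus[-\delta,\delta]}Q_n(t)\,dt. \]
Putting it all together we get
\[ L_n f(x)-f(x) = I_n(x)+II_n(x)+III_n(x) \]
where
\[ I_n(x) = \int_{-\delta}^{\delta}[f(t+x)-f(x)]\,Q_n(t)\,dt, \]
\[ II_n(x) = \int_{-1/2-x}^{-\delta} f(t+x)\,Q_n(t)\,dt + \int_{\delta}^{1/2-x} f(t+x)\,Q_n(t)\,dt, \]
\[ III_n(x) = -f(x)\int_{[-1,1]\setminus[-\delta,\delta]}Q_n(t)\,dt. \]
Estimate
\[ |I_n(x)| \le \int_{-\delta}^{\delta}|f(t+x)-f(x)|\,Q_n(t)\,dt \le \frac{\varepsilon}{4}\int_{-\delta}^{\delta}Q_n(t)\,dt \le \frac{\varepsilon}{4}\int_{-1}^{1}Q_n(t)\,dt = \frac{\varepsilon}{4}; \]
this estimate is true for all $n$.

Now let $M=\max_{x\in[-1/2,1/2]}|f(x)|$. The integrals in $II_n(x)$ and $III_n(x)$ extend over subsets of $[-1,-\delta]\cup[\delta,1]$, a set of total length at most $2$. Then by our estimate (*) for $Q_n$ we see that
\[ |II_n(x)|+|III_n(x)| \le 2M\int_{\delta\le|t|\le 1}Q_n(t)\,dt \le 4M\max_{t\in[-1,-\delta]\cup[\delta,1]}Q_n(t) \le 4M\sqrt n\,(1-\delta^2)^n, \]
and since $4M\sqrt n\,(1-\delta^2)^n$ tends to $0$ as $n\to\infty$ we see that there is $N$ so that
\[ \max_{x\in I_\gamma}|II_n(x)+III_n(x)|<\varepsilon/2 \quad\text{for } n\ge N. \]
If we combine this with the estimate for $I_n(x)$ we see that $|L_n f(x)-f(x)|<\varepsilon$ for $n\ge N$ and all $x\in I_\gamma$.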
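To see the theorem at work one can evaluate $L_n f$ numerically. This is a minimal sketch, not part of the proof: the test function $f(t)=|t|$, the choice $\gamma=0.1$, the midpoint quadrature and the helper names `landau_constant` and `landau_operator` are all assumptions made for illustration.

```python
import math

def midpoint(g, a, b, m=4000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (i + 0.5) * h) for i in range(m))

def landau_constant(n):
    """c_n = 1 / (integral over [-1, 1] of (1 - s^2)^n ds), by quadrature."""
    return 1.0 / midpoint(lambda s: (1.0 - s * s) ** n, -1.0, 1.0)

def landau_operator(f, n):
    """Return the function x -> L_n f(x) = integral of f(t) Q_n(t - x) over [-1/2, 1/2]."""
    c_n = landau_constant(n)

    def Lnf(x):
        return midpoint(lambda t: f(t) * c_n * (1.0 - (t - x) ** 2) ** n, -0.5, 0.5)

    return Lnf

def f(t):
    """Assumed test function: continuous on [-1/2, 1/2], not differentiable at 0."""
    return abs(t)

gamma = 0.1
grid = [-0.5 + gamma + i * 0.1 for i in range(9)]   # sample points in I_gamma

def sup_error(n):
    """Largest sampled value of |L_n f(x) - f(x)| on I_gamma."""
    Lnf = landau_operator(f, n)
    return max(abs(Lnf(x) - f(x)) for x in grid)

for n in (20, 200):
    print(n, landau_constant(n), math.sqrt(n), sup_error(n))
```

The second and third printed columns compare $c_n$ with the bound $\sqrt n$ behind (*), and the last column should decrease as $n$ grows. For this particular $f$ the decay is slow, which is consistent with the theorem giving convergence but no rate.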

Arbitrary compact intervals. Consider an interval $[a,b]$ and let $g\in C([a,b])$. Let $l(t)=Ct+D$ so that $l(-\tfrac12+\gamma)=a$ and $l(\tfrac12-\gamma)=b$ (you can compute $C$, $D$ explicitly, and $C\neq 0$; the inverse of $l$ is given by $l^{-1}(x)=C^{-1}x-C^{-1}D$). The function $g\circ l$ is in $C(I_\gamma)$ (and it can be extended to a continuous function on $[-1/2,1/2]$, for instance by constants). Thus there exists a polynomial $P$ such that
\[ \max_{t\in I_\gamma}|g(l(t))-P(t)|<\varepsilon, \]
and therefore if we set $Q(x)=P(l^{-1}(x))$ then $Q$ is a polynomial and we have
\[ \max_{x\in[a,b]}|g(x)-Q(x)|<\varepsilon. \]

Remark: A much more general version of the Weierstrass approximation theorem is due to Marshall Stone, the Stone-Weierstrass theorem. We will prove it in class and the proof can be found in Rudin's book.

III. The Bernstein polynomials: A second proof of Weierstrass' theorem

Here we consider the interval $[0,1]$. For $n=1,2,\dots$ define
\[ B_n f(t) = \sum_{k=0}^{n} f\Big(\frac kn\Big)\binom nk t^k(1-t)^{n-k}, \]
the sequence of Bernstein polynomials associated to $f$. Here $\binom nk=\frac{n!}{k!\,(n-k)!}$ are the binomial coefficients. For each $n$, $B_n f$ is a polynomial of degree at most $n$.

Theorem. If $f\in C([0,1])$ then the polynomials $B_n f$ converge to $f$ uniformly on $[0,1]$.

For the proof we will use the following auxiliary

Lemma. For $0\le t\le 1$,
\[ \sum_{0\le k\le n}\Big(\frac kn-t\Big)^2\binom nk t^k(1-t)^{n-k} \le \frac{1}{4n}. \]

We shall first prove the Theorem based on the Lemma and then give a proof of the Lemma. There is also a probabilistic interpretation of the Lemma which is appended below.

Proof of the theorem. By the binomial theorem
\[ 1 = (t+(1-t))^n = \sum_{k=0}^{n}\binom nk t^k(1-t)^{n-k} \]

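and thus we may write
\[ B_n f(t)-f(t) = \sum_{k=0}^{n} f\Big(\frac kn\Big)\binom nk t^k(1-t)^{n-k}-f(t) = \sum_{k=0}^{n}\Big[f\Big(\frac kn\Big)-f(t)\Big]\binom nk t^k(1-t)^{n-k}. \]
Given $\varepsilon>0$ find $\delta>0$ so that $|f(t+h)-f(t)|\le\varepsilon/4$ if $t,\,t+h\in[0,1]$ and $|h|\le\delta$. For the terms with $|\frac kn-t|\le\delta$ we will exploit the smallness of $|f(\frac kn)-f(t)|$, and for the terms with $|\frac kn-t|>\delta$ we will exploit the smallness of the term in the Lemma, for large $n$. We thus split
\[ B_n f(t)-f(t) = I_n(t)+II_n(t) \]
where
\[ I_n(t) = \sum_{\substack{0\le k\le n\\ |\frac kn-t|\le\delta}}\Big[f\Big(\frac kn\Big)-f(t)\Big]\binom nk t^k(1-t)^{n-k}, \]
\[ II_n(t) = \sum_{\substack{0\le k\le n\\ |\frac kn-t|>\delta}}\Big[f\Big(\frac kn\Big)-f(t)\Big]\binom nk t^k(1-t)^{n-k}; \]
the decomposition depends of course on $\delta$, but $\delta$ does not depend on $n$.

We show that $|I_n(t)|\le\varepsilon/4$ for all $n=2,3,\dots$. Indeed, since $|f(\frac kn)-f(t)|\le\varepsilon/4$ for $|\frac kn-t|\le\delta$ we compute
\[ |I_n(t)| \le \sum_{\substack{0\le k\le n\\ |\frac kn-t|\le\delta}}\Big|f\Big(\frac kn\Big)-f(t)\Big|\binom nk t^k(1-t)^{n-k} \le \frac{\varepsilon}{4}\sum_{0\le k\le n}\binom nk t^k(1-t)^{n-k} = \frac{\varepsilon}{4}, \]
where we have used again the binomial theorem.

Concerning $II_n$ we observe that $1\le\delta^{-2}(\frac kn-t)^2$ for $|\frac kn-t|\ge\delta$ and estimate $|f(\frac kn)-f(t)|\le 2\max|f|$. Thus
\[ |II_n(t)| \le \delta^{-2}\sum_{\substack{0\le k\le n\\ |\frac kn-t|>\delta}}\Big(\frac kn-t\Big)^2\Big|f\Big(\frac kn\Big)-f(t)\Big|\binom nk t^k(1-t)^{n-k} \le \delta^{-2}\,2\max|f|\sum_{0\le k\le n}\Big(\frac kn-t\Big)^2\binom nk t^k(1-t)^{n-k}. \]
By the Lemma,
\[ |II_n(t)| \le (4n)^{-1}\delta^{-2}\,2\max|f|, \]
and for sufficiently large $n$ this is $\le\varepsilon/2$. Hence $|B_n f(t)-f(t)|\le 3\varepsilon/4<\varepsilon$ for all $t\in[0,1]$ and all sufficiently large $n$, and we are done.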
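The Bernstein polynomials are easy to evaluate directly from their definition. Here is a small sketch, assuming the test function $f(t)=|t-\tfrac12|$ and a uniform sample grid on $[0,1]$; the names `bernstein` and `sup_error` are hypothetical.

```python
import math

def bernstein(f, n, t):
    """B_n f(t) = sum over k of f(k/n) * C(n, k) * t^k * (1 - t)^(n - k)."""
    return sum(
        f(k / n) * math.comb(n, k) * t ** k * (1.0 - t) ** (n - k)
        for k in range(n + 1)
    )

def f(t):
    """Assumed test function: continuous on [0, 1] with a kink at t = 1/2."""
    return abs(t - 0.5)

grid = [i / 100.0 for i in range(101)]   # sample points in [0, 1]

def sup_error(n):
    """Largest sampled value of |B_n f(t) - f(t)|."""
    return max(abs(bernstein(f, n, t) - f(t)) for t in grid)

for n in (10, 50, 200):
    print(n, sup_error(n))
```

The printed errors should decrease with $n$. Note that $B_n f$ only uses the $n+1$ values $f(k/n)$, whereas the Landau polynomials $L_n f$ require integrals of $f$.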

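Proof of the Lemma. We set $\psi_0(t)=1$, $\psi_1(t)=t$ and $\psi_2(t)=t^2$, etc. Then we can explicitly compute the polynomials $B_n\psi_0$, $B_n\psi_1$, $B_n\psi_2$ for $n=1,2,\dots$. First, by the binomial theorem (as used before)
\[ B_n\psi_0(t) = \sum_{k=0}^{n}\binom nk t^k(1-t)^{n-k} = 1, \]
thus $B_n\psi_0=\psi_0$. Next, for $n\ge 1$,
\[ B_n\psi_1(t) = \sum_{k=0}^{n}\frac kn\binom nk t^k(1-t)^{n-k} = \sum_{k=1}^{n}\frac{(n-1)!}{(k-1)!\,(n-k)!}\,t^k(1-t)^{n-k} \]
\[ = t\sum_{k=1}^{n}\binom{n-1}{k-1}t^{k-1}(1-t)^{(n-1)-(k-1)} = t\sum_{j=0}^{n-1}\binom{n-1}{j}t^j(1-t)^{n-1-j} = t, \]
which means $B_n\psi_1=\psi_1$ for $n\ge 1$.

To compute $B_n\psi_2$ we observe that $B_1\psi_2(t)=\psi_2(0)(1-t)+\psi_2(1)\,t=t=\psi_1(t)$ and, for $n\ge 2$,
\[ B_n\psi_2(t) = \sum_{k=0}^{n}\frac{k^2}{n^2}\binom nk t^k(1-t)^{n-k} = \sum_{k=1}^{n}\frac kn\,\frac{(n-1)!}{(k-1)!\,(n-k)!}\,t^k(1-t)^{n-k} \]
\[ = \sum_{k=2}^{n}\frac{k-1}{n}\binom{n-1}{k-1}t^k(1-t)^{n-k} + \frac1n\sum_{k=1}^{n}\binom{n-1}{k-1}t^k(1-t)^{n-k} \]
\[ = \frac{n-1}{n}\,t^2\sum_{k=2}^{n}\binom{n-2}{k-2}t^{k-2}(1-t)^{(n-2)-(k-2)} + \frac tn\sum_{k=1}^{n}\binom{n-1}{k-1}t^{k-1}(1-t)^{(n-1)-(k-1)} \]
\[ = \frac{n-1}{n}\,t^2+\frac tn = t^2+\frac{t-t^2}{n}. \]
We summarize: For $n\ge 2$ we have
\[ B_n\psi_0=\psi_0,\qquad B_n\psi_1=\psi_1,\qquad B_n\psi_2=\psi_2+\frac1n(\psi_1-\psi_2). \]
(By the computation of $B_1\psi_2$ above, the last formula also holds for $n=1$.)

To prove the assertion in the Lemma let's multiply out
\[ \Big(\frac kn-t\Big)^2 = \Big(\frac kn\Big)^2-2t\,\frac kn+t^2 \]
and use that the transformation $f\mapsto B_n f(t)$ is linear (i.e. we have $B_n[c_1f_1+c_2f_2](x)=c_1B_nf_1(x)+c_2B_nf_2(x)$ for functions $f_1,f_2$ and scalars $c_1,c_2$). We compute, for $n\ge 2$,
\[ \sum_{0\le k\le n}\Big(\frac kn-t\Big)^2\binom nk t^k(1-t)^{n-k} = B_n\psi_2(t)-2t\,B_n\psi_1(t)+t^2B_n\psi_0(t) = t^2+\frac{t-t^2}{n}-2t\cdot t+t^2 = \frac{t-t^2}{n}, \]
and since $\max_{0\le t\le 1}(t-t^2)=1/4$ we get the assertion of the Lemma.

Remark. Let's consider an arbitrary compact interval $[a,b]$ and let $f\in C([a,b])$. Then the polynomials
\[ P_n f(x) = \sum_{k=0}^{n} f\Big(a+\frac kn(b-a)\Big)\binom nk\frac{(x-a)^k(b-x)^{n-k}}{(b-a)^n} \]
converge to $f$ uniformly on $[a,b]$. Using a change of variable, derive this statement from the above theorem.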
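The identity behind the Lemma, $\sum_k(\frac kn-t)^2\binom nk t^k(1-t)^{n-k}=\frac{t-t^2}{n}$, can be checked in exact rational arithmetic. This is only a sanity check for rational values of $t$, not a replacement for the proof; the function name `second_moment` is hypothetical.

```python
from fractions import Fraction
from math import comb

def second_moment(n, t):
    """Exact value of sum_k (k/n - t)^2 * C(n, k) * t^k * (1 - t)^(n - k) for rational t."""
    t = Fraction(t)
    return sum(
        (Fraction(k, n) - t) ** 2 * comb(n, k) * t ** k * (1 - t) ** (n - k)
        for k in range(n + 1)
    )

# Example: n = 5, t = 1/3 gives (t - t^2)/n = (2/9)/5 = 2/45.
print(second_moment(5, Fraction(1, 3)))   # prints 2/45

# Check the identity and the bound 1/(4n) on a small rational grid.
for n in range(1, 9):
    for i in range(8):
        t = Fraction(i, 7)
        value = second_moment(n, t)
        assert value == (t - t * t) / n
        assert value <= Fraction(1, 4 * n)
```

Because `Fraction` arithmetic is exact, the assertions test equality rather than closeness up to rounding.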

Addendum: Probabilistic interpretation of the Bernstein polynomials.

You might have seen the expressions $B_n f(t)$ in a course on probability. In what follows the parameter $t$ is a parameter for a probability (between $0$ and $1$).

Let's consider a series of trials of an experiment. Each trial is supposed to have two possible outcomes (either success or failure). Each integer in $X_n:=\{1,\dots,n\}$ represents a trial; we label the $j$th trial as $T_j$. Let $t\in[0,1]$ be fixed. In each trial the probability of success is assumed to be $t$, and the probability of failure is then $(1-t)$. The trials are supposed to be independent.

Let $A$ be a specific subset of $\{1,\dots,n\}$ which is of cardinality $k$, i.e. $A$ is of the form $\{j_1,j_2,\dots,j_k\}$ for mutually different integers $j_1,\dots,j_k$; if $k=0$ then $A=\emptyset$. Then the event $\Omega_A$ that for each $j\in A$ the trial $T_j$ results in a success and for each $j\in X_n\setminus A$ the trial $T_j$ results in a failure has probability $t^k(1-t)^{n-k}$. There are exactly $\binom nk$ subsets $A$ of $X_n$ which have cardinality $k$, and they represent mutually exclusive (aka disjoint) events. Let $\Omega_k$ be the event that the $n$ trials result in $k$ successes; then the probability of $\Omega_k$ is
\[ P(\Omega_k) = \binom nk t^k(1-t)^{n-k}. \]
The probabilities of the mutually exclusive events $\Omega_k$ add up to $1$:
\[ \sum_{k=0}^{n}P(\Omega_k) = 1 \]
(cf. the binomial theorem).

Let now $X$ be the number of successes in a series of $n$ trials ($X$ is a random variable which depends on the outcome of each trial). The event $\Omega_k$ is just the event that $X$ assumes the value $k$ (one writes $P(\Omega_k)$ also as $P(X=k)$). The random variable $X/n$ is the ratio of successes and total number of trials, and it takes values in $[0,1]$ (more precisely in $\{0,\frac1n,\dots,\frac nn\}$). The expected value of $X/n$ is by definition
\[ E[X/n] = \sum_{k=0}^{n}\frac kn\,P(\Omega_k), \]
and in the proof of the Lemma we computed it to be
\[ E[X/n] = \sum_{k=0}^{n}\frac kn\binom nk t^k(1-t)^{n-k} = B_n\psi_1(t) = t. \]
Generally, if $f$ is a function on $[0,1]$, the expected value of $f(X/n)$ is equal to
\[ E[f(X/n)] = \sum_{k=0}^{n} f\Big(\frac kn\Big)P(\Omega_k) = \sum_{k=0}^{n} f\Big(\frac kn\Big)\binom nk t^k(1-t)^{n-k}; \]
that gives the probabilistic interpretation of the Bernstein polynomials evaluated at $t$: $B_n f(t)=E[f(X/n)]$.

The variance of $X/n$ is given by
\[ E\Big[\Big(\frac Xn-E\Big[\frac Xn\Big]\Big)^2\Big] = \sum_{k=0}^{n}\Big(\frac kn-t\Big)^2\binom nk t^k(1-t)^{n-k} = \frac{t-t^2}{n}, \]
as computed in the proof of the Lemma.

Let $\delta>0$ be a small number. The probability that the number of successes deviates from the expected value $tn$ by at least $\delta n$ is given by
\[ \sum_{\substack{0\le k\le n\\ |k-nt|\ge\delta n}}P(\Omega_k) = \sum_{\substack{0\le k\le n\\ |k-nt|\ge\delta n}}\binom nk t^k(1-t)^{n-k}. \]
The smallness of this quantity (uniformly in $t$) played an important role in the Bernstein proof of Weierstrass' theorem. It was estimated by
\[ E\Big[\Big(\frac{X-E[X]}{\delta n}\Big)^2\Big] = \delta^{-2}\sum_{k=0}^{n}\Big(\frac kn-t\Big)^2\binom nk t^k(1-t)^{n-k} = \frac{t-t^2}{\delta^2 n}. \]
Thus, by the statement of the Lemma, the event that the number of successes deviates from the expected value $tn$ by at least $\delta n$ has probability no more than $(4\delta^2n)^{-1}$.

IV. Approximation by trigonometrical polynomials

Prelude: Basic facts and formulas for the partial sum operator for Fourier series.

Consider the partial sums of the Fourier series
\[ S_n f(x) = \sum_{k=-n}^{n}c_k e^{ikx} \]
where $c_k=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(t)\,e^{-ikt}\,dt$ are the Fourier coefficients. We can write
\[ S_n f(x) = \sum_{k=-n}^{n}\frac{1}{2\pi}\int_{-\pi}^{\pi}f(y)\,e^{-iky}\,dy\;e^{ikx} = \frac{1}{2\pi}\int_{-\pi}^{\pi}f(y)\,D_n(x-y)\,dy \]
where
\[ D_n(t) = \sum_{k=-n}^{n}e^{ikt}. \]

Definition. The convolution of two $2\pi$-periodic functions $f$, $g$ is defined as
\[ f*g(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi}f(y)\,g(x-y)\,dy. \]

Note that the convolution of $2\pi$-periodic continuous functions is well defined and is again a $2\pi$-periodic continuous function, and we also have the commutativity property
\[ f*g(x) = g*f(x). \]
To see this we first note that for a $2\pi$-periodic integrable function $F$ we have
\[ \int_{-\pi}^{\pi}F(t)\,dt = \int_{a-\pi}^{a+\pi}F(t)\,dt \]
for any $a$. The commutativity property follows if in the definition of $f*g$ we change variables $t=x-y$ (with $dt=-dy$) and get
\[ f*g(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi}f(y)\,g(x-y)\,dy = \frac{1}{2\pi}\int_{x+\pi}^{x-\pi}f(x-t)\,g(t)\,(-dt) = \frac{1}{2\pi}\int_{x-\pi}^{x+\pi}f(x-t)\,g(t)\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi}g(t)\,f(x-t)\,dt = g*f(x), \]
where in the last formula we have used the $2\pi$-periodicity of $f$ and $g$.

Going back to the partial sum of the Fourier series we have
\[ S_n f(x) = f*D_n(x) = D_n*f(x) \quad\text{where}\quad D_n(t) = \sum_{k=-n}^{n}e^{ikt}. \]
Below we will need a more explicit expression for $D_n$, namely
\[ D_n(t) = \frac{\sin\big((n+\frac12)t\big)}{\sin\frac t2}. \]
To see this we use
\[ \sum_{k=0}^{n}e^{ikt} = \frac{e^{i(n+1)t}-1}{e^{it}-1} \quad\text{and}\quad \sum_{k=-n}^{-1}e^{ikt} = \sum_{k=1}^{n}e^{-ikt} = e^{-it}\,\frac{e^{-int}-1}{e^{-it}-1}, \]
and the second sum can be simplified to $\frac{1-e^{-int}}{e^{it}-1}$. Thus
\[ D_n(t) = \frac{e^{i(n+1)t}-e^{-int}}{e^{it}-1}. \]
Multiplying numerator and denominator with $e^{-it/2}$ yields
\[ D_n(t) = \frac{e^{i(n+1/2)t}-e^{-i(n+1/2)t}}{e^{it/2}-e^{-it/2}}, \]
and this yields the displayed formula.

Fejér's theorem

We would like to prove that every continuous $2\pi$-periodic function can be approximated by trigonometric polynomials, uniformly on $[-\pi,\pi]$. One may think that, in view of Theorem 8.11 in Rudin's book, the partial sums $S_n f$ of the Fourier series are good candidates for such an approximation. Unfortunately, for merely continuous $f$ and given $x$, the partial sums $S_n f(x)$ may not converge to $f(x)$ (and then of course $S_n f$ cannot converge uniformly). (The situation is even worse: given $x\in[-\pi,\pi]$ one can show that in a certain sense the convergence of $S_n f(x)$ fails for typical $f$. I hope to return to this point later in the class.)

However, instead of $S_n f$ we consider the better behaved arithmetic means (or Cesàro means) of the partial sums. Define
\[ \sigma_N f(x) = \frac{1}{N+1}\sum_{n=0}^{N}S_n f(x). \]
The means $\sigma_N f$ are also called the Fejér means of the Fourier series, in tribute to the Hungarian mathematician Leopold Fejér who in 1900 published the following

Theorem. Let $f$ be a continuous $2\pi$-periodic function. Then the means $\sigma_N f$ converge to $f$ uniformly, i.e.
\[ \max_{x\in\mathbb{R}}|\sigma_N f(x)-f(x)|\to 0 \quad\text{as } N\to\infty. \]

If we use the convolution formula $S_n f=D_n*f$ then it follows that
\[ \sigma_N f(x) = K_N*f(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi}K_N(x-y)\,f(y)\,dy = \frac{1}{2\pi}\int_{-\pi}^{\pi}K_N(t)\,f(x-t)\,dt \]
where
\[ K_N(t) = \frac{1}{N+1}\sum_{n=0}^{N}D_n(t). \]
$K_N$ is called the $N$th Fejér kernel. We need the following properties of $K_N$.

Lemma. (a) Explicit formulas for $K_N$ on $[-\pi,\pi]$ are given by
\[ K_N(x) = \frac{1}{N+1}\,\frac{1-\cos((N+1)x)}{1-\cos x} = \frac{1}{N+1}\Big(\frac{\sin\frac{N+1}{2}x}{\sin\frac x2}\Big)^2 \]
if $x$ is not an integer multiple of $2\pi$. Also $K_N(0)=N+1$.
(b) $K_N(x)\ge 0$ for all $x$.
(c) $\frac{1}{2\pi}\int_{-\pi}^{\pi}K_N(t)\,dt=1$.
(d) $K_N(x)\le\frac{1}{N+1}\,\frac{2}{1-\cos\delta}$ for $0<\delta\le|x|\le\pi$.

By (c), (d) most of $K_N$ is concentrated near $0$ for large $N$. Properties (b), (c), (d) are important, the explicit expressions for $K_N$ much less so.
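The properties in the Lemma can be checked numerically before reading the proof. The sketch below is an illustration under assumptions: it evaluates $D_n$ as a real cosine sum (valid since $e^{ikt}+e^{-ikt}=2\cos kt$), it uses a midpoint rule for the mean value in (c), and the function names are hypothetical.

```python
import math

def dirichlet(n, t):
    """D_n(t) = sum of e^{ikt} over k = -n..n, written as 1 + 2 * sum of cos(kt)."""
    return 1.0 + 2.0 * sum(math.cos(k * t) for k in range(1, n + 1))

def fejer(N, t):
    """K_N(t) from the definition: the mean of D_0(t), ..., D_N(t)."""
    return sum(dirichlet(n, t) for n in range(N + 1)) / (N + 1)

def fejer_closed(N, t):
    """Second explicit formula in (a); requires that t is not a multiple of 2*pi."""
    return (math.sin((N + 1) * t / 2.0) / math.sin(t / 2.0)) ** 2 / (N + 1)

def mean_value(N, m=400):
    """Midpoint approximation of (1/(2*pi)) * integral of K_N over [-pi, pi]."""
    h = 2.0 * math.pi / m
    return sum(fejer(N, -math.pi + (j + 0.5) * h) for j in range(m)) * h / (2.0 * math.pi)

N, delta = 7, 0.5
print(fejer(N, 1.3), fejer_closed(N, 1.3))   # (a): the two values should agree
print(fejer(N, 0.0))                         # (a): K_N(0) = N + 1
print(mean_value(N))                         # (c): close to 1
bound = 2.0 / ((N + 1) * (1.0 - math.cos(delta)))
ts = [delta + j * (math.pi - delta) / 50.0 for j in range(51)]
print(max(fejer(N, t) for t in ts), bound)   # (d): the maximum stays below the bound
```

The values of `N` and `delta` are arbitrary; increasing `N` with `delta` fixed makes the bound in (d) smaller, which is the concentration near $0$ mentioned after the Lemma.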

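Proof of the Lemma. We use and rewrite the above explicit formula for the Dirichlet kernel, namely
\[ D_n(x) = \frac{\sin\big((n+\frac12)x\big)}{\sin\frac x2} = \frac{\sin\frac x2\,\sin\big((n+\frac12)x\big)}{\sin^2\frac x2}. \]
Observe that $2\sin a\sin b=\cos(a-b)-\cos(a+b)$ and apply this with $a=(n+\frac12)x$, $b=\frac x2$ to get
\[ D_n(x) = \frac{\cos nx-\cos((n+1)x)}{2\sin^2\frac x2}. \]
Thus, since the sum telescopes,
\[ K_N(x) = \frac{1}{N+1}\sum_{n=0}^{N}D_n(x) = \frac{1}{N+1}\sum_{n=0}^{N}\frac{\cos nx-\cos((n+1)x)}{2\sin^2\frac x2} = \frac{1}{N+1}\,\frac{1-\cos((N+1)x)}{2\sin^2\frac x2}. \]
Now recall the formula $\cos 2a=\cos^2a-\sin^2a=1-2\sin^2a$, hence $2\sin^2a=1-\cos 2a$. If we use this for $a=x/2$ we get the first claimed formula for $K_N$, and if we use it for $a=(N+1)\frac x2$ then we get the second claimed formula. Compute the limit as $x\to 0$; this yields $K_N(0)=N+1$.

Property (d) is immediate from the first explicit formula. Estimate $1-\cos((N+1)x)\le 2$ and $1-\cos x\ge 1-\cos\delta$ for $\delta\le x\le\pi$, and also use that the cosine is an even function to get the same estimate for $-\pi\le x\le-\delta$. The nonnegativity of $K_N$ is also clear from the explicit formulas. The property (c) follows from $\frac{1}{2\pi}\int_{-\pi}^{\pi}D_n(t)\,dt=1$ (and taking the arithmetic mean of $1$'s gives a $1$).

Proof of Fejér's theorem. Given $\varepsilon>0$ we have to show that there is $M=M(\varepsilon)$ so that for all $N\ge M$,
\[ |\sigma_N f(x)-f(x)|\le\varepsilon \quad\text{for all } x. \]
Now we write
\[ \sigma_N f(x)-f(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi}K_N(t)\,f(x-t)\,dt-f(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi}K_N(t)\,[f(x-t)-f(x)]\,dt; \]
here we have used property (c). $f$ is continuous and therefore uniformly continuous on any compact interval. Since $f$ is also $2\pi$-periodic, $f$ is uniformly continuous on $\mathbb{R}$. This means that there is a $\delta>0$ such that
\[ |f(x-t)-f(x)|\le\frac{\varepsilon}{4} \quad\text{for } |t|\le\delta \text{ and all } x\in\mathbb{R}. \]
We split the integral into two parts:
\[ \frac{1}{2\pi}\int_{-\pi}^{\pi}K_N(t)\,[f(x-t)-f(x)]\,dt = I_N(x)+II_N(x) \]
where
\[ I_N(x) = \frac{1}{2\pi}\int_{-\delta}^{\delta}K_N(t)\,[f(x-t)-f(x)]\,dt, \]
\[ II_N(x) = \frac{1}{2\pi}\int_{[-\pi,\pi]\setminus[-\delta,\delta]}K_N(t)\,[f(x-t)-f(x)]\,dt. \]
We give an estimate of $I_N$ which holds for all $N$. Namely
\[ |I_N(x)| \le \frac{1}{2\pi}\int_{-\delta}^{\delta}K_N(t)\,|f(x-t)-f(x)|\,dt \le \frac{1}{2\pi}\int_{-\delta}^{\delta}K_N(t)\,\frac{\varepsilon}{4}\,dt \le \frac{1}{2\pi}\int_{-\pi}^{\pi}K_N(t)\,\frac{\varepsilon}{4}\,dt = \frac{\varepsilon}{4}, \]
by (b) and (c). Since this estimate holds for all $N$ we may now choose $N$ large to estimate the second term $II_N(x)$. We use property (d) to estimate the integrand for $t\in[\delta,\pi]\cup[-\pi,-\delta]$. We crudely bound $|f(x-t)-f(x)|\le|f(x-t)|+|f(x)|\le 2\max|f|$. Thus
\[ |II_N(x)| \le 2\max|f|\;\frac{1}{2\pi}\int_{[-\pi,\pi]\setminus[-\delta,\delta]}\frac{1}{N+1}\,\frac{2}{1-\cos\delta}\,dt \le \frac{1}{N+1}\,\frac{4\max|f|}{1-\cos\delta}. \]
As $\frac{1}{N+1}\to 0$ as $N\to\infty$ we may choose $N_0$ so that for $N\ge N_0$ the quantity $\frac{1}{N+1}\,\frac{4\max|f|}{1-\cos\delta}$ is less than $\varepsilon/4$. Thus for $N\ge N_0$ both quantities $|I_N(x)|$ and $|II_N(x)|$ are $\le\varepsilon/4$ for all $x$, and thus we conclude that
\[ \max_{x\in\mathbb{R}}|\sigma_N f(x)-f(x)|\le\varepsilon/2 \quad\text{for } N\ge N_0. \]

An application for the partial sum operator

Theorem. Let $f$ be a continuous $2\pi$-periodic function. Then
\[ \lim_{n\to\infty}\Big(\int_{-\pi}^{\pi}|S_n f(x)-f(x)|^2\,dx\Big)^{1/2} = 0, \]
i.e., $S_n f$ converges to $f$ in the $L^2$-norm in the space of square-integrable functions. (Recall: this norm is given by $\|f\|=\big(\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx\big)^{1/2}$ and is derived from the scalar product $\langle f,g\rangle=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(x)\,\overline{g(x)}\,dx$; the factor $\frac{1}{2\pi}$ does not affect the convergence statement.)

Proof. By Theorem 8.11 in Rudin (which is linear algebra) we have $S_N t_M=t_M$ for every trigonometric polynomial $t_M(x)=\sum_{k=-M}^{M}\gamma_k e^{ikx}$, provided that $N\ge M$. Now let $\varepsilon>0$. By Fejér's theorem we can find such a trigonometric polynomial $t_M$ (of some degree $M$ depending on $\varepsilon$) so that
\[ \max_x|f(x)-t_M(x)|<\varepsilon/2. \]
Then for $n\ge M$ we have $S_n f-f=S_n(f-t_M)-(f-t_M)$. We also have $\|S_n(f-t_M)\|\le\|f-t_M\|$; this is just (76) in 8.13 in Rudin. Thus
\[ \|S_n f-f\| \le \|S_n(f-t_M)\|+\|f-t_M\| \le 2\|f-t_M\|. \]
But we have
\[ \|f-t_M\| = \Big(\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)-t_M(x)|^2\,dx\Big)^{1/2} \le \max|f-t_M| < \frac{\varepsilon}{2}, \]
so $\|S_n f-f\|<\varepsilon$ for $n\ge M$, and we are done.

Fejér's theorem implies the Weierstrass approximation theorem

A very short proof of the Weierstrass approximation theorem can be given assuming Fejér's theorem. By a change of variable it suffices to consider the interval $[a,b]=[-\frac\pi2,\frac\pi2]$. Extend the function $f$ to a continuous function $F$ on $[-\pi,\pi]$ so that $F(x)=f(x)$ on $[-\frac\pi2,\frac\pi2]$ and $F(-\pi)=F(\pi)=0$. Then we can extend $F$ to a continuous $2\pi$-periodic function on $\mathbb{R}$. Let $\varepsilon>0$. By Fejér's theorem we can find a trigonometric polynomial
\[ T(x) = a_0+\sum_{k=1}^{N}[a_k\cos kx+b_k\sin kx] \]
so that
\[ \max_{x\in\mathbb{R}}|F(x)-T(x)|<\varepsilon/2. \]
Now the Taylor series for $\cos$ and $\sin$ converge uniformly on every compact interval. Thus we can find a polynomial $P$ so that
\[ \max_{x\in[-\pi,\pi]}|T(x)-P(x)|<\varepsilon/2. \]
Combining the two estimates (and using that $f=F$ on $[-\frac\pi2,\frac\pi2]$) yields
\[ \max_{|x|\le\pi/2}|f(x)-P(x)| = \max_{|x|\le\pi/2}|F(x)-P(x)| < \varepsilon. \]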
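Fejér's theorem can also be observed numerically. The following is a rough sketch under assumptions: the test function is $f(x)=|x|$ on $[-\pi,\pi]$ (its periodic extension is continuous), the Fourier coefficients are approximated by a midpoint rule, and the partial sums are written in the equivalent real form $S_n f(x)=\frac{a_0}{2}+\sum_{k=1}^{n}(a_k\cos kx+b_k\sin kx)$. The names `fourier_coefficients`, `fejer_mean` and `sup_error` are hypothetical.

```python
import math

def fourier_coefficients(f, n_max, m=2000):
    """Approximate a_k = (1/pi) * int f(t) cos(kt) dt and b_k = (1/pi) * int f(t) sin(kt) dt."""
    h = 2.0 * math.pi / m
    nodes = [-math.pi + (j + 0.5) * h for j in range(m)]
    vals = [f(t) for t in nodes]
    a = [h / math.pi * sum(v * math.cos(k * t) for v, t in zip(vals, nodes))
         for k in range(n_max + 1)]
    b = [h / math.pi * sum(v * math.sin(k * t) for v, t in zip(vals, nodes))
         for k in range(n_max + 1)]
    return a, b

def fejer_mean(a, b, N, x):
    """sigma_N f(x): the arithmetic mean of the partial sums S_0 f(x), ..., S_N f(x)."""
    partial = a[0] / 2.0          # S_0 f(x)
    total = partial
    for n in range(1, N + 1):
        partial += a[n] * math.cos(n * x) + b[n] * math.sin(n * x)   # now S_n f(x)
        total += partial
    return total / (N + 1)

def f(x):
    """Assumed test function on [-pi, pi]."""
    return abs(x)

a, b = fourier_coefficients(f, 100)
grid = [-math.pi + 2.0 * math.pi * j / 200 for j in range(201)]

def sup_error(N):
    """Largest sampled value of |sigma_N f(x) - f(x)| on [-pi, pi]."""
    return max(abs(fejer_mean(a, b, N, x) - f(x)) for x in grid)

for N in (10, 100):
    print(N, sup_error(N))
```

The sampled errors should decrease as $N$ grows. For this particular $f$ the partial sums $S_n f$ converge uniformly as well, so the example illustrates the statement of the theorem rather than the failure of $S_n f$ for general continuous $f$.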