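Week 10. An Introduction to Wavelet Regression

1. Definition: A wavelet is a function $\psi$ such that $\{\psi_{jk} = 2^{j/2}\psi(2^j\cdot - k);\ j,k\in\mathbb{Z}\}$ is an orthonormal basis for $L^2(\mathbb{R})$. This function is called the mother wavelet, and it can often be constructed from the father wavelet $\varphi$. The father wavelet $\varphi$ is not a wavelet, but we can construct wavelets from it, so it is as important as the mother wavelet.

Example: Haar wavelet (A. Haar, Math. Ann. (1910)).
$$\psi(x) = \begin{cases} 1, & x\in[0,1/2),\\ -1, & x\in[1/2,1),\end{cases}$$
and
$$\psi_{jk}(x) = 2^{j/2}\psi(2^jx-k) = \begin{cases} 2^{j/2}, & x\in\big[2^{-j}k,\ 2^{-j}k+2^{-(j+1)}\big),\\ -2^{j/2}, & x\in\big[2^{-j}k+2^{-(j+1)},\ 2^{-j}k+2^{-j}\big).\end{cases}$$

Let $\varphi(t)$ be the indicator function of the interval $[0,1)$. Define
$$V_j = \overline{\mathrm{span}}\{\varphi_{jk} = 2^{j/2}\varphi(2^j\cdot-k),\ k\in\mathbb{Z}\}.$$
Since $\varphi(x) = \varphi(2x) + \varphi(2x-1)$, the following four properties are satisfied:
(i) $\cdots\subset V_{-2}\subset V_{-1}\subset V_0\subset V_1\subset V_2\subset\cdots$;
(ii) $f(x)\in V_j \iff f(2x)\in V_{j+1}$;
(iii) $\overline{\bigcup_{j\in\mathbb{Z}}V_j} = L^2(\mathbb{R})$;
(iv) there is a function $\varphi$ such that $\{\varphi(\cdot-k);\ k\in\mathbb{Z}\}$ is an orthonormal basis for $V_0$.

An equivalent form of property (iii): Recall that
$$\hat f(\xi) = \int f(x)\exp(-i\xi x)\,dx.$$
It can be shown that, for any $\varphi$, property (iii) is equivalent to $\hat\varphi(0)\neq 0$ and $|\hat\varphi|$ being continuous at $0$. A sketch of the proof is as follows. First, $\overline{\bigcup_{j\in\mathbb{Z}}V_j}$ is invariant under all translations. Second, if $g$ is orthogonal to $\bigcup_{j\in\mathbb{Z}}V_j$, then $g(x)$ is orthogonal to $\varphi_j(x+t)$ for all $t$, and the Plancherel formula implies $\hat g(\xi)\,\hat\varphi(2^{-j}\xi) = 0$ a.e. We know $\hat\varphi\neq 0$ around $0$. Letting $j\to\infty$, we conclude $\hat g(\xi) = 0$ a.e.

From $\varphi$ to $\psi$: Observe that
$$\psi(x) = \varphi(2x) - \varphi(2x-1).$$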
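The orthonormality claim for the Haar family can be checked numerically. The following is a minimal sketch, assuming only the Haar definition above; the helper names `haar_psi`, `psi_jk` and `inner` are illustrative and not part of the notes. Because Haar functions are piecewise constant on dyadic cells, a midpoint sum on a fine enough dyadic grid evaluates the inner product exactly up to rounding.

```python
def haar_psi(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def psi_jk(j, k, x):
    """psi_{jk}(x) = 2^{j/2} psi(2^j x - k)."""
    return 2.0 ** (j / 2) * haar_psi(2.0 ** j * x - k)


def inner(j1, k1, j2, k2):
    """<psi_{j1,k1}, psi_{j2,k2}> via a midpoint sum on dyadic cells.

    Both functions are constant on cells of width 2^{-(max(j1, j2) + 1)},
    so the midpoint rule is exact apart from floating-point rounding.
    """
    level = max(j1, j2) + 1
    width = 2.0 ** (-level)
    lo = min(k1 / 2.0 ** j1, k2 / 2.0 ** j2)
    hi = max((k1 + 1) / 2.0 ** j1, (k2 + 1) / 2.0 ** j2)
    n_cells = round((hi - lo) / width)
    total = 0.0
    for i in range(n_cells):
        x = lo + (i + 0.5) * width
        total += psi_jk(j1, k1, x) * psi_jk(j2, k2, x)
    return total * width
```

For instance, `inner(2, 1, 2, 1)` should be close to 1, while `inner(0, 0, 1, 0)` and `inner(1, 0, 1, 1)` should be close to 0.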

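We also have $\langle\varphi,\psi\rangle = 0$. If we define
$$W_j = \overline{\mathrm{span}}\{\psi_{jk} = 2^{j/2}\psi(2^j\cdot-k),\ k\in\mathbb{Z}\},$$
then
$$V_0\oplus W_0 = V_1,$$
and more generally
$$V_j\oplus W_j = V_{j+1}.$$
We also see that
$$\bigoplus_{j=-\infty}^{J}W_j = V_{J+1}\to L^2(\mathbb{R}),\quad J\to\infty,$$
in other words, $\{\psi_{jk} = 2^{j/2}\psi(2^jx-k),\ j,k\in\mathbb{Z}\}$ is an orthonormal basis for $L^2(\mathbb{R})$.

2. Multiresolution analysis (MRA): a general framework to construct wavelet functions

More generally, suppose there is $\varphi$ such that $\{\varphi(\cdot-k);\ k\in\mathbb{Z}\}$ is an orthonormal system,
$$\frac{1}{\sqrt2}\varphi\Big(\frac{x}{2}\Big) = \sum_k h_k\varphi(x-k),$$
and $|\hat\varphi|$ is continuous at $0$ with $\hat\varphi(0)\neq0$. Define
$$V_j = \overline{\mathrm{span}}\{2^{j/2}\varphi(2^j\cdot-k),\ k\in\mathbb{Z}\}.$$
Then the following four properties are satisfied:
i) $\cdots\subset V_{-2}\subset V_{-1}\subset V_0\subset V_1\subset V_2\subset\cdots$;
ii) $f(x)\in V_j\iff f(2x)\in V_{j+1}$;
iii) $\overline{\bigcup_{j\in\mathbb{Z}}V_j} = L^2(\mathbb{R})$;
iv) there is a function $\varphi$ such that $\{\varphi(\cdot-k);\ k\in\mathbb{Z}\}$ is an orthonormal basis for $V_0$.
This is called an MRA in $L^2(\mathbb{R})$.

It is easy to see that property (iv) is equivalent to
$$\delta_{k,0} = \int\varphi(x)\varphi(x-k)\,dx = \frac{1}{2\pi}\int|\hat\varphi(\xi)|^2e^{ik\xi}\,d\xi = \frac{1}{2\pi}\int_0^{2\pi}\sum_{l=-\infty}^{\infty}|\hat\varphi(\xi+2\pi l)|^2e^{ik\xi}\,d\xi$$
for all $k$, i.e., $\sum_{l=-\infty}^{\infty}|\hat\varphi(\xi+2\pi l)|^2 = 1$ a.e.

From $\varphi$ to $\psi$: There is a sequence $\{h_k\}$ such that
$$\frac{1}{\sqrt2}\varphi\Big(\frac{x}{2}\Big) = \sum_kh_k\varphi(x-k),$$
then
$$\hat\varphi(2\xi) = \frac{1}{\sqrt2}\hat\varphi(\xi)\hat h(\xi) \triangleq \frac{1}{\sqrt2}\hat\varphi(\xi)\sum_kh_k\exp(-ik\xi).$$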
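The identity $\sum_l|\hat\varphi(\xi+2\pi l)|^2 = 1$ can be illustrated for the Haar scaling function $\varphi = 1_{[0,1)}$, for which $|\hat\varphi(\xi)|^2 = \big(\sin(\xi/2)/(\xi/2)\big)^2$ under the Fourier convention used above. The sketch below truncates the sum at $|l|\le L$; the function names and the truncation level are assumptions made for illustration, and the truncation error is of order $1/L$.

```python
import math


def phi_hat_sq(xi):
    """|phi_hat(xi)|^2 for the Haar scaling function phi = 1_[0,1)."""
    if xi == 0.0:
        return 1.0
    half = xi / 2.0
    return (math.sin(half) / half) ** 2


def periodized_sum(xi, L=20000):
    """Truncated sum over l = -L..L of |phi_hat(xi + 2*pi*l)|^2."""
    return sum(phi_hat_sq(xi + 2.0 * math.pi * l) for l in range(-L, L + 1))
```

Evaluating `periodized_sum` at a few values of `xi` should return numbers close to 1, with the gap shrinking as `L` grows.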

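It is easy to see that $\hat h(0) = \sqrt2$ and
$$|\hat h(\xi)|^2 + |\hat h(\xi+\pi)|^2 = 2,$$
which is due to the identity $\sum_{l=-\infty}^{\infty}|\hat\varphi(\xi+2\pi l)|^2 = 1$. If $\psi$ satisfies
$$\frac{1}{\sqrt2}\psi\Big(\frac{x}{2}\Big) = \sum_kg_k\varphi(x-k),$$
then
$$\hat\psi(2\xi) = \frac{1}{\sqrt2}\hat\varphi(\xi)\hat g(\xi) \triangleq \frac{1}{\sqrt2}\hat\varphi(\xi)\sum_kg_k\exp(-ik\xi).$$
Define
$$g_k = (-1)^kh_{1-k},$$
i.e.,
$$\hat g(\xi) = \sum_k(-1)^kh_{1-k}\exp(-ik\xi) = \sum_k(-1)^{1-k}h_k\exp(-i(1-k)\xi) = -e^{-i\xi}\,\overline{\hat h(\xi+\pi)};$$
then
$$\sum_{l=-\infty}^{\infty}|\hat\psi(\xi+2\pi l)|^2 = 1,$$
which implies that $\{\psi(\cdot-k);\ k\in\mathbb{Z}\}$ is an orthonormal system, and
$$\hat g(\xi)\overline{\hat h(\xi)} + \hat g(\xi+\pi)\overline{\hat h(\xi+\pi)} = 0,$$
which implies
$$\mathrm{span}\{\psi(\cdot-k);\ k\in\mathbb{Z}\}\ \perp\ \mathrm{span}\{\varphi(\cdot-k);\ k\in\mathbb{Z}\}$$
because
$$\big\langle e^{-ik\xi}\hat\psi(\xi),\hat\varphi(\xi)\big\rangle = \frac12\Big\langle e^{-ik\xi}\hat\varphi(\xi/2)\hat g(\xi/2),\ \hat\varphi(\xi/2)\hat h(\xi/2)\Big\rangle = \int_0^{\pi}e^{-i2k\xi}\Big[\hat g(\xi)\overline{\hat h(\xi)} + \hat g(\xi+\pi)\overline{\hat h(\xi+\pi)}\Big]d\xi = 0,$$
and it is easy to see
$$\mathrm{span}\{\psi(\cdot-k);\ k\in\mathbb{Z}\}\oplus\mathrm{span}\{\varphi(\cdot-k);\ k\in\mathbb{Z}\} = V_1.$$
More generally
$$V_j\oplus W_j = V_{j+1}$$
and
$$\bigoplus_{j=-\infty}^{J}W_j = V_{J+1}\to L^2(\mathbb{R}),\quad J\to\infty,$$
where $W_j = \overline{\mathrm{span}}\{\psi_{jk} = 2^{j/2}\psi(2^j\cdot-k);\ k\in\mathbb{Z}\}$.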
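The filter identities above can be verified numerically for concrete filters. This is a minimal sketch, assuming the Haar filter $h_0 = h_1 = 1/\sqrt2$ and the standard four-tap Daubechies filter as test cases (neither filter is written out in the notes); `high_pass`, `symbol` and `check_filter` are illustrative names. It checks $\hat h(0)=\sqrt2$, $|\hat h(\xi)|^2+|\hat h(\xi+\pi)|^2=2$, the orthogonality relation between $\hat g$ and $\hat h$, and $\hat g(\xi) = -e^{-i\xi}\overline{\hat h(\xi+\pi)}$ on a grid.

```python
import cmath
import math

SQRT3 = math.sqrt(3.0)
HAAR = {0: 1.0 / math.sqrt(2.0), 1: 1.0 / math.sqrt(2.0)}
# Four-tap Daubechies low-pass filter, indexed k = 0..3 (assumed test case).
D4 = {k: c / (4.0 * math.sqrt(2.0))
      for k, c in enumerate([1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3])}


def high_pass(h):
    """g_k = (-1)^k h_{1-k}, returned as a dict {k: g_k}."""
    return {1 - k: (-1.0) ** (1 - k) * hk for k, hk in h.items()}


def symbol(c, xi):
    """c_hat(xi) = sum_k c_k exp(-i k xi)."""
    return sum(ck * cmath.exp(-1j * k * xi) for k, ck in c.items())


def check_filter(h, n_grid=32):
    """Largest deviation from the four identities over a grid on [0, 2*pi)."""
    g = high_pass(h)
    worst = abs(symbol(h, 0.0) - math.sqrt(2.0))
    for i in range(n_grid):
        xi = 2.0 * math.pi * i / n_grid
        h0, h1 = symbol(h, xi), symbol(h, xi + math.pi)
        g0, g1 = symbol(g, xi), symbol(g, xi + math.pi)
        worst = max(
            worst,
            abs(abs(h0) ** 2 + abs(h1) ** 2 - 2.0),
            abs(g0 * h0.conjugate() + g1 * h1.conjugate()),
            abs(g0 + cmath.exp(-1j * xi) * h1.conjugate()),
        )
    return worst
```

Both `check_filter(HAAR)` and `check_filter(D4)` should return values at the level of rounding error.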

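Construct $\varphi$ (Meyer, Mallat): If $\hat h(\xi)$ is $2\pi$-periodic, $C^1$ near $\xi = 0$, and
(i) $\hat h(0) = \sqrt2$ and $|\hat h(\xi)|^2 + |\hat h(\xi+\pi)|^2 = 2$,
(ii) $\inf_{\xi\in[-\pi/2,\pi/2]}|\hat h(\xi)| > 0$,
then
$$\hat\varphi(\xi) = \prod_{j=1}^{\infty}\frac{1}{\sqrt2}\hat h(2^{-j}\xi)$$
is the Fourier transform of a scaling function $\varphi\in L^2$ that generates an MRA.

3. Wavelets on the interval

The Haar wavelet can be modified to be an orthogonal basis for $L^2(0,1)$:
$$\big\{\varphi(x);\ 2^{j/2}\psi(2^jx-k),\ j = 0,1,2,\ldots;\ k = 0,1,\ldots,2^j-1\big\}.$$

Some important developments:
1) Meyer (1985), $C^\infty$ wavelets.
2) Mallat and Meyer (1987), multiresolution analysis, which gives an easy way to construct wavelets and also a fast algorithm (the so-called pyramid algorithm, or Mallat algorithm).
3) Daubechies (1988-1992), compactly supported $C^r$ wavelets ($r$ is a positive number), for example, Daubechies wavelets, Symlets, Coiflets.
4) Cohen, Daubechies and Vial (1993), smooth wavelets on an interval.

4. Good wavelets: vanishing moments

The wavelet $\psi$ is said to have $r$ vanishing moments if
$$\int x^k\psi(x)\,dx = 0,\quad k = 0,1,2,\ldots,r-1.$$
Thus $\psi$ is orthogonal to all polynomials of degree $\le r-1$. For instance, $r = 1$ for the Haar wavelet.

A function $f$ is said to be $C^\alpha$ ($\alpha>0$) on the interval $I\subset\mathbb{R}$ if there exists a constant $C$ and, for every $x\in I$, a polynomial $p_x(y)$ of degree $\lfloor\alpha\rfloor$ such that
$$|f(x+y) - p_x(y)|\le C|y|^\alpha,\quad x+y\in I.$$

Lemma: If $f$ is $C^\alpha$ on $\mathbb{R}$ and $\psi$ has at least $r = \lfloor\alpha\rfloor+1$ vanishing moments, then
$$|\langle f,\psi_{jk}\rangle|\le c_\psi C\,2^{-j(\alpha+1/2)},\quad c_\psi = \int|v|^\alpha|\psi(v)|\,dv.$$

Proof: Using a change of variable and the vanishing moments property,
$$|\langle f,\psi_{jk}\rangle| = \Big|2^{-j/2}\int\big[f(2^{-j}v+2^{-j}k) - p_{2^{-j}k}(2^{-j}v)\big]\psi(v)\,dv\Big| \le C\,2^{-j(\alpha+1/2)}\int|v|^\alpha|\psi(v)|\,dv.$$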
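The decay bound in the lemma can be illustrated with the Haar wavelet ($r = 1$) and a Hölder function of order $\alpha = 1/2$. This is a sketch under stated assumptions: the test function $f(x) = |x - 1/3|^{1/2}$, which satisfies the $C^\alpha$ condition with $C = 1$ and a constant polynomial, is chosen here for illustration and does not appear in the notes. For Haar, $c_\psi = \int_0^1 v^\alpha\,dv = 1/(\alpha+1)$. The coefficients are computed exactly from an antiderivative of $f$.

```python
import math

A = 1.0 / 3.0   # location of the kink of the illustrative test function
ALPHA = 0.5     # Hoelder exponent of f(x) = |x - A|^{1/2}


def antiderivative(x):
    """F with F'(x) = |x - A|^{1/2}."""
    d = x - A
    return math.copysign((2.0 / 3.0) * abs(d) ** 1.5, d)


def haar_coeff(j, k):
    """<f, psi_{jk}> for the Haar wavelet, computed from the antiderivative."""
    left = k / 2.0 ** j
    mid = (k + 0.5) / 2.0 ** j
    right = (k + 1.0) / 2.0 ** j
    plus = antiderivative(mid) - antiderivative(left)
    minus = antiderivative(right) - antiderivative(mid)
    return 2.0 ** (j / 2) * (plus - minus)


def max_abs_coeff(j):
    """Largest |<f, psi_{jk}>| over k = 0, ..., 2^j - 1."""
    return max(abs(haar_coeff(j, k)) for k in range(2 ** j))


def lemma_bound(j, C=1.0):
    """c_psi * C * 2^{-j(alpha + 1/2)} with c_psi = 1 / (alpha + 1) for Haar."""
    return C / (ALPHA + 1.0) * 2.0 ** (-j * (ALPHA + 0.5))
```

At every level `j`, `max_abs_coeff(j)` should stay below `lemma_bound(j)`, with both shrinking by about a factor of 2 per level.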

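The scaling function $\varphi$ is said to have $r$ vanishing moments if
$$\int x^k\varphi(x)\,dx = 0,\quad k = 1,2,\ldots,r-1.$$
For $k = 0$ recall that $\int x^0\varphi(x)\,dx = 1$.

Lemma: If $f$ is $C^\alpha$ on $\mathbb{R}$ and $\varphi$ has at least $r = \lfloor\alpha\rfloor+1$ vanishing moments, then
$$\Big|\langle f,\varphi_{jk}\rangle - 2^{-j/2}f\Big(\frac{k}{2^j}\Big)\Big|\le c_\varphi C\,2^{-j(\alpha+1/2)}.$$

Proof: Using a change of variable and the vanishing moments property,
$$\Big|\langle f,\varphi_{jk}\rangle - 2^{-j/2}f\Big(\frac{k}{2^j}\Big)\Big| = \Big|2^{-j/2}\int\big[f(2^{-j}v+2^{-j}k) - p_{2^{-j}k}(2^{-j}v)\big]\varphi(v)\,dv\Big| \le C\,2^{-j(\alpha+1/2)}\int|v|^\alpha|\varphi(v)|\,dv.$$

5. Discrete wavelet transformation

Multiresolution analysis (MRA): An MRA is a sequence of closed subspaces $\{V_j;\ j\in\mathbb{Z}\}$ in $L^2(\mathbb{R})$ such that
i) $\cdots\subset V_{-2}\subset V_{-1}\subset V_0\subset V_1\subset V_2\subset\cdots$;
ii) $f(x)\in V_j\iff f(2x)\in V_{j+1}$;
iii) $\overline{\bigcup_{j\in\mathbb{Z}}V_j} = L^2(\mathbb{R})$;
iv) there is a function $\varphi$ such that $\{\varphi(\cdot-k);\ k\in\mathbb{Z}\}$ is an orthonormal basis for $V_0$.

There is a sequence $\{h_k\}$ such that
$$\frac{1}{\sqrt2}\varphi\Big(\frac{x}{2}\Big) = \sum_kh_k\varphi(x-k),$$
and for the sequence $\{g_k\} = \{(-1)^kh_{1-k}\}$ we have
$$\frac{1}{\sqrt2}\psi\Big(\frac{x}{2}\Big) = \sum_kg_k\varphi(x-k)$$
and
$$V_j\oplus W_j = V_{j+1}.$$
More generally
$$\bigoplus_{j=-\infty}^{J}W_j = V_{J+1}\to L^2(\mathbb{R}),\quad J\to\infty,$$
where $W_j = \overline{\mathrm{span}}\{\psi_{jk} = 2^{j/2}\psi(2^j\cdot-k);\ k\in\mathbb{Z}\}$.

Suppose $f$ is in $V_J$. Then
$$f = \sum_k\tilde\theta_{Jk}\varphi_{Jk} = \sum_k\langle f,\varphi_{Jk}\rangle\varphi_{Jk}\approx\sum_k2^{-J/2}f\Big(\frac{k}{2^J}\Big)\varphi_{Jk}.$$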

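We see
$$\frac{1}{\sqrt2}\,2^{j/2}\varphi\Big(\frac{2^jx}{2}\Big) = \sum_kh_k\,2^{j/2}\varphi(2^jx-k),$$
i.e.,
$$2^{(j-1)/2}\varphi(2^{j-1}x) = \sum_kh_k\,2^{j/2}\varphi(2^jx-k).$$
Taking inner products with $f$, and writing $\tilde\theta_{jk} = \langle f,\varphi_{jk}\rangle$ and $\theta_{jk} = \langle f,\psi_{jk}\rangle$, we get
$$\tilde\theta_{j-1,0} = \sum_kh_k\tilde\theta_{j,k},$$
or more generally,
$$\tilde\theta_{j-1,m} = \sum_kh_k\tilde\theta_{j,k+2m}.$$
Similarly,
$$2^{(j-1)/2}\psi(2^{j-1}x) = \sum_kg_k\,2^{j/2}\varphi(2^jx-k),$$
then
$$\theta_{j-1,0} = \sum_kg_k\tilde\theta_{j,k},$$
or more generally,
$$\theta_{j-1,m} = \sum_kg_k\tilde\theta_{j,k+2m}.$$

6. Besov Balls

The Besov sequence norms are defined as follows. Let
$$f = \sum_k\tilde\theta_{j_0,k}\varphi_{j_0,k} + \sum_{j\ge j_0}\sum_k\theta_{jk}\psi_{jk}.$$
Suppose $\alpha\in\mathbb{R}$ and $0<p=q\le\infty$, and write $s = \alpha+1/2-1/p$. Then
$$\|\theta\|_{b^\alpha_{p,q}} = \Big(\|\tilde\theta_{j_0,\cdot}\|_{l_p}^p + \sum_{j\ge j_0}2^{jsp}\sum_k|\theta_{jk}|^p\Big)^{1/p},$$
and the general definition for $0<p,q\le\infty$ is
$$\|\theta\|_{b^\alpha_{p,q}} = \Big(\|\tilde\theta_{j_0,\cdot}\|_{l_p}^q + \sum_{j\ge j_0}2^{jsq}\Big(\sum_k|\theta_{jk}|^p\Big)^{q/p}\Big)^{1/q}.$$
A control of the Besov norm $\|\theta\|_{b^\alpha_{p,q}}\le M$ is equivalent to
$$2^{js}\Big(\sum_k|\theta_{jk}|^p\Big)^{1/p}\le M_j,$$
or
$$\Big\|\sum_k\theta_{jk}\psi_{jk}\Big\|_p\le M'_j\,2^{-j\alpha}$$
due to the Bernstein-type inequality below, or [...].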
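Returning to the recursions of Section 5, $\tilde\theta_{j-1,m} = \sum_kh_k\tilde\theta_{j,k+2m}$ and $\theta_{j-1,m} = \sum_kg_k\tilde\theta_{j,k+2m}$ with $g_k = (-1)^kh_{1-k}$, one level of the pyramid (Mallat) algorithm can be sketched as follows. The periodic wrap-around of indices is an assumption made here to handle a finite coefficient vector; the notes work on the whole line. The function names and the four-tap Daubechies filter used as a second test case are also illustrative.

```python
import math


def high_pass(h):
    """g_k = (-1)^k h_{1-k}, returned as a dict {k: g_k}."""
    return {1 - k: (-1.0) ** (1 - k) * hk for k, hk in h.items()}


def pyramid_step(fine, h):
    """One analysis step from level j to level j-1.

    fine: scaling coefficients at level j (even length).
    h:    low-pass filter as a dict {k: h_k}.
    Indices k + 2m are wrapped modulo len(fine) (periodic assumption).
    Returns (scaling, detail) coefficients at level j-1.
    """
    n = len(fine)
    g = high_pass(h)
    half = n // 2
    scaling = [sum(hk * fine[(k + 2 * m) % n] for k, hk in h.items())
               for m in range(half)]
    detail = [sum(gk * fine[(k + 2 * m) % n] for k, gk in g.items())
              for m in range(half)]
    return scaling, detail


HAAR = {0: 1.0 / math.sqrt(2.0), 1: 1.0 / math.sqrt(2.0)}
SQRT3 = math.sqrt(3.0)
D4 = {k: c / (4.0 * math.sqrt(2.0))
      for k, c in enumerate([1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3])}
```

Applying `pyramid_step` repeatedly to the returned scaling coefficients gives the full transform. Because the filters are orthonormal, each step preserves the sum of squares of the coefficients.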

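Bernstein-type Inequality

Let $\psi_{jk}$, $k\in K$, be an orthonormal sequence of functions satisfying
(i) $\sum_k|\psi_{jk}(x)|\le a_\infty2^{j/2}$,
(ii) $\max_k\int|\psi_{jk}|\le a_12^{-j/2}$.
Then for all $1\le p\le\infty$, there exist constants $C_1 = a_1^{-1}(a_1/a_\infty)^{1/p}$ and $C_2 = a_\infty(a_1/a_\infty)^{1/p}$ such that for any sequence $\theta = (\theta_k;\ k\in K)$,
$$C_1\,2^{j(1/2-1/p)}\|\theta\|_p\le\Big\|\sum_k\theta_k\psi_{jk}\Big\|_p\le C_2\,2^{j(1/2-1/p)}\|\theta\|_p.$$

7. Nearly adaptive rate minimaxity

We observe
$$y_i = \theta_i + z_i,\quad i = 1,2,\ldots,d,$$
where $z_i\sim N(0,1)$. Define the soft thresholding estimator $\hat\theta_\lambda$ as follows:
$$\hat\theta_{\lambda,i} = \mathrm{sign}(y_i)\,(|y_i|-\lambda)_+,\quad i = 1,2,\ldots,d,$$
where $\lambda>0$. We have proved the following result.

Theorem:
$$E\big\|\hat\theta_{\sqrt{2\log d}}-\theta\big\|^2\le(1+2\log d)\Big(1+\sum_{i=1}^d\theta_i^2\wedge1\Big).$$

Question: If $\Theta$ is an ellipsoid,
$$\Theta(m,M) = \Big\{\theta:\ \sum_{i=1}^\infty a_i^2\theta_i^2\le M,\ a_{2k} = a_{2k+1} = (2k)^m\Big\},$$
can you prove that
$$E\big\|\hat\theta_{\sqrt{2\log n}}-\theta\big\|^2\le C\Big(\frac{\log n}{n}\Big)^{2m/(2m+1)}?$$

In this lecture, we consider Besov balls
$$\Theta = b^\alpha_{p,q}(M) = \Big\{\theta:\ \Big(\|\tilde\theta_{j_0,\cdot}\|_{l_p}^q + \sum_{j\ge j_0}2^{j(\alpha+1/2-1/p)q}\Big(\sum_k|\theta_{jk}|^p\Big)^{q/p}\Big)^{1/q}\le M\Big\}.$$
If we apply $\hat\theta_{\sqrt{2\log2^j}}$ to estimate $(\theta_{jk})_{k=1,2,\ldots,2^j}$ in each resolution, we show
$$\sup_{b^\alpha_{p,q}(M)}E\|\hat\theta-\theta\|^2\le C\Big(\frac{\log n}{n}\Big)^{2\alpha/(2\alpha+1)}.$$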
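The soft thresholding estimator and the oracle-type bound of the theorem can be sketched as follows. This is an illustrative Monte Carlo check under the model $y_i = \theta_i + z_i$ with unit noise; the function names, the number of replications and the seed are assumptions, and the simulated risk is only an estimate of $E\|\hat\theta-\theta\|^2$.

```python
import math
import random


def soft_threshold(y, lam):
    """Coordinatewise sign(y_i) * (|y_i| - lam)_+."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in y]


def oracle_bound(theta):
    """(1 + 2 log d) * (1 + sum_i min(theta_i^2, 1))."""
    d = len(theta)
    return (1.0 + 2.0 * math.log(d)) * (1.0 + sum(min(t * t, 1.0) for t in theta))


def mc_risk(theta, reps=200, seed=0):
    """Monte Carlo estimate of E||theta_hat - theta||^2 at lam = sqrt(2 log d)."""
    rng = random.Random(seed)
    d = len(theta)
    lam = math.sqrt(2.0 * math.log(d))
    total = 0.0
    for _ in range(reps):
        y = [t + rng.gauss(0.0, 1.0) for t in theta]
        est = soft_threshold(y, lam)
        total += sum((e - t) ** 2 for e, t in zip(est, theta))
    return total / reps
```

For a sparse vector such as `[0.5] * 10 + [0.0] * 90`, `mc_risk` should fall well below `oracle_bound`, which reflects that most coordinates are set to zero at negligible cost.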

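Equivalently,
$$\sup_{f\in B^\alpha_{p,q}(M)}E\|\hat f-f\|^2\le C\Big(\frac{\log n}{n}\Big)^{2\alpha/(2\alpha+1)}.$$
However, the linear minimax rate is $n^{-2\alpha'/(2\alpha'+1)}$ with $\alpha' = \alpha+1/2-1/p$ when $p<2$ and $q$ [...]. We will assume [...] for a technical reason.

Proof of nearly adaptive rate minimaxity

Observe that
$$\tilde y_{j_0,k} = \tilde\theta_{j_0,k} + n^{-1/2}z_{j_0,k},\quad j = j_0;\ k = 1,2,\ldots,2^{j_0},$$
$$y_{jk} = \theta_{jk} + n^{-1/2}z_{jk},\quad j = j_0,j_0+1,\ldots;\ k = 1,2,\ldots,2^j,$$
where $z_{jk}\sim N(0,1)$. In practice, $j_0 = 4$ or $5$, and we use $\tilde y_{j_0,k}$ to estimate $\tilde\theta_{j_0,k}$; $J$ satisfies $2^J = n$, and we use $0$ to estimate $\theta_{jk}$ for $j\ge J$. Write
$$E\|\hat\theta-\theta\|^2 = \frac{2^{j_0}}{n} + \sum_{j_0\le j<J}E\|\hat\theta_j-\theta_j\|^2 + \sum_{j\ge J}\|\theta_j\|^2 \le \frac{2^{j_0}}{n} + (1+2\log n)\sum_{j_0\le j<J}\Big(\frac1n+\sum_k\theta_{jk}^2\wedge\frac1n\Big) + \sum_{j\ge J}\|\theta_j\|^2.$$
Under weak assumptions, it is true that
$$\sum_{j\ge J}\|\theta_j\|^2 = o\big(n^{-2\alpha/(2\alpha+1)}\big),$$
which means that neglecting $y_{jk}$ for $j\ge J$ in practice is fine. Then it is enough to show
$$\sum_{j_0\le j<J}\sum_k\theta_{jk}^2\wedge\frac1n\le C\,n^{-2\alpha/(2\alpha+1)}.$$

Let's start with $p<2$, which is more exciting. Define $j_*$ such that $2^{j_*} = n^{1/(2\alpha+1)}$; then we have
$$\sum_{j_0\le j<J}\sum_k\theta_{jk}^2\wedge\frac1n = \sum_{j_0\le j<j_*}\sum_k\theta_{jk}^2\wedge\frac1n + \sum_{j_*\le j<J}\sum_k\theta_{jk}^2\wedge\frac1n \le n^{-2\alpha/(2\alpha+1)} + \sum_{j_*\le j<J}\sum_k\theta_{jk}^2\wedge\frac1n.$$
From the definition of Besov balls,
$$\|\theta_j\|_p\le M\,2^{-j(\alpha+1/2-1/p)},$$
which implies
$$\sum_{j_*\le j<J}\sum_k\theta_{jk}^2\wedge\frac1n\le n^{-1+p/2}\sum_{j\ge j_*}\sum_k|\theta_{jk}|^p = O\big(n^{-1+p/2}\,2^{-j_*p(\alpha+1/2-1/p)}\big) = O\big(n^{-2\alpha/(2\alpha+1)}\big).$$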
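The last equality in the $p<2$ case is pure exponent arithmetic; the following worked step spells it out, using only $2^{j_*} = n^{1/(2\alpha+1)}$ and the identity $p(\alpha+1/2) = \tfrac{p}{2}(2\alpha+1)$.

```latex
\begin{aligned}
n^{-1+p/2}\,2^{-j_* p(\alpha+1/2-1/p)}
  &= n^{-1+p/2}\, n^{-\frac{p(\alpha+1/2)-1}{2\alpha+1}} \\
  &= n^{-1+p/2}\, n^{-\frac{p}{2}+\frac{1}{2\alpha+1}} \\
  &= n^{-1+\frac{1}{2\alpha+1}}
   = n^{-\frac{2\alpha}{2\alpha+1}} .
\end{aligned}
```

The same choice of $j_*$ also gives the first term of the split, since $\sum_{j<j_*}2^j/n\le2^{j_*}/n = n^{-2\alpha/(2\alpha+1)}$.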

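For $p\ge2$, Jensen's inequality gives
$$\sum_{j\ge j_*}\sum_k\theta_{jk}^2\le\sum_{j\ge j_*}2^{j(1-2/p)}\Big(\sum_k|\theta_{jk}|^p\Big)^{2/p} = O\big(2^{j_*(1-2/p)}\,2^{-2j_*(\alpha+1/2-1/p)}\big) = O\big(2^{-2j_*\alpha}\big).$$

Remark. Linear minimaxity for $p<2$? Let's consider the case $p = q<2$; then
$$R_L\big(B^\alpha_{p,q}(M)\big) = R_L\big(B^{\alpha'}_{2,2}(M)\big),$$
where $\alpha' = \alpha+1/2-1/p$, and
$$R_L\big(B^{\alpha'}_{2,2}(M)\big)\asymp n^{-2\alpha'/(2\alpha'+1)}.$$
For example, when $\alpha = p = q = 1$, we have
$$R_L\big(B^1_{1,1}(M)\big)\asymp n^{-1/2},$$
but the optimal minimax rate is $n^{-2/3}$. Why? Recall that for $p<2$, in each resolution $j$,
$$R_{L,j} = \inf_cE\|cy_j-\theta_j\|^2 = \frac{\frac{2^j}{n}\sum_k\theta_{jk}^2}{\frac{2^j}{n}+\sum_k\theta_{jk}^2},$$
$$R_L = \sup_{B^\alpha_{p,q}(M)}\ \cdots\ \approx\frac{2^{j_0}}{n} + \sum_{j\ge j_0}\frac{\frac{2^j}{n}\sum_k\theta_{jk}^2}{\frac{2^j}{n}+\sum_k\theta_{jk}^2}.$$
The maximum of $\sum_k\theta_{jk}^2$ under the $l_p$ constraint $\|\theta_j\|_p\le M\,2^{-j(\alpha+1/2-1/p)}$ is $M^2\,2^{-2j\alpha'}$ with $\alpha' = \alpha+1/2-1/p$.

8. SURE Estimation

Donoho and Johnstone (1995, SURE). Sketch of the proof. Consider the sequence model $y_i = \theta_i+z_i$, $i = 1,\ldots,d$, where the $z_i$ are independent normal $N(0,1)$ variables. Set
$$r(\lambda) = d^{-1}E\|\hat\theta_\lambda-\theta\|^2.$$
Then Stein's unbiased estimator of risk gives
$$r(\lambda) = E\big(U(\lambda)\big)$$
with
$$U(\lambda) = d^{-1}\sum_{i=1}^d\Big[1-2I(|y_i|\le\lambda)+y_i^2I(|y_i|\le\lambda)+\lambda^2I(|y_i|>\lambda)\Big] = 1-2d^{-1}\#\{i:\ |y_i|\le\lambda\}+d^{-1}\sum_{i=1}^d(|y_i|\wedge\lambda)^2.$$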
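The criterion $U(\lambda)$ can be evaluated and minimized directly. Between consecutive ordered values of $|y_i|$ the count term is constant and $\sum_i(|y_i|\wedge\lambda)^2$ is nondecreasing, so a minimizer can be searched among $0$ and the observed $|y_i|$. The sketch below restricts candidates to $[0,\sqrt{2\log d}]$, which is an assumption borrowed from common practice rather than something stated at this point in the notes; the function names are illustrative.

```python
import math


def sure(y, lam):
    """U(lam) = 1 - 2 #{i: |y_i| <= lam}/d + sum_i min(|y_i|, lam)^2 / d."""
    d = len(y)
    n_small = sum(1 for v in y if abs(v) <= lam)
    clipped = sum(min(abs(v), lam) ** 2 for v in y)
    return 1.0 - 2.0 * n_small / d + clipped / d


def sure_termwise(y, lam):
    """Same criterion written coordinate by coordinate (first form of U)."""
    d = len(y)
    total = 0.0
    for v in y:
        total += (v * v - 1.0) if abs(v) <= lam else (lam * lam + 1.0)
    return total / d


def sure_threshold(y):
    """Minimize U over {0} and the |y_i| not exceeding sqrt(2 log d)."""
    cap = math.sqrt(2.0 * math.log(len(y)))
    candidates = [0.0] + [abs(v) for v in y if abs(v) <= cap]
    return min(candidates, key=lambda lam: sure(y, lam))
```

With `y = [0.1, -0.2, 5.0]`, the candidates are 0, 0.1 and 0.2, and the criterion is smallest at 0.2.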

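When $|y|_{(i)}<\lambda<|y|_{(i+1)}$, $U(\lambda)$ is an increasing function of $\lambda$. This implies
$$\lambda_s = \arg\min_\lambda U(\lambda)\in\{0,|y_1|,\ldots,|y_d|\}.$$
Proposition:
$$E\sup_{0\le\lambda\le\sqrt{2\log d}}|U(\lambda)-r(\lambda)|\le C\,\frac{\log^{3/2}d}{d^{1/2}}.$$

9. Functional Estimation

Quadratic functional estimation

Model: Observe the sequence model
$$y_i = \theta_i+n^{-1/2}z_i,$$
where $z_i\overset{i.i.d.}{\sim}N(0,1)$. The model comes from the white noise model (or many other models):
$$dY(t) = f(t)\,dt+n^{-1/2}\,dB(t),\quad t\in[0,1].$$
Let $\{\varphi_i(t);\ i = 1,2,\ldots\}$ be an orthonormal basis of $L^2([0,1])$. The white noise problem is then equivalent to the sequence model with $y_i = \langle\varphi_i,Y\rangle$, $\theta_i = \langle\varphi_i,f\rangle$, and $z_i = \langle\varphi_i,B\rangle\overset{i.i.d.}{\sim}N(0,1)$.

Assumption: Let $\theta = (\theta_1,\theta_2,\ldots)$. Assume
$$\theta\in\Theta_{\alpha,M} = \Big\{\theta:\ \sum_{i=1}^\infty i^{2\alpha}\theta_i^2\le M\Big\}.$$
If the orthonormal basis is the Fourier basis, this assumption corresponds to the periodic Sobolev ball with smoothness $\alpha$. Now we may write the sequence model as
$$\mathcal{P} = \{P_{\theta,n};\ \theta\in\Theta_{\alpha,M}\},$$
where $P_{\theta,n}$ is the joint distribution of independent $y_i\sim N(\theta_i,1/n)$.

Problem: Estimate $Q(\theta) = \sum_{i=1}^\infty\theta_i^2$ (or $\int f^2$ with $f = \sum_{i=1}^\infty\theta_i\varphi_i$), and determine the optimal minimax rate, i.e. the rate of
$$\inf_{\hat Q}\sup_{\theta\in\Theta_{\alpha,M}}E\big(\hat Q-Q\big)^2$$
as $n\to\infty$.

Solution: Step 1: Achieving the optimal rate. Define
$$\hat Q_m = \sum_{i=1}^m\Big(y_i^2-\frac1n\Big),$$
where $m$ will be specified later. We know
$$E\hat Q_m = \sum_{i=1}^m\theta_i^2.$$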
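The truncated estimator $\hat Q_m$ of Step 1 can be simulated to illustrate that it is unbiased for $\sum_{i\le m}\theta_i^2$. This is a minimal sketch: the coefficient sequence $\theta_i = 1/i$, the values of $n$ and $m$, and the helper names are assumptions chosen for illustration only.

```python
import math
import random


def simulate(theta, n, rng):
    """Draw y_i = theta_i + n^{-1/2} z_i with independent standard normal z_i."""
    scale = 1.0 / math.sqrt(n)
    return [t + scale * rng.gauss(0.0, 1.0) for t in theta]


def q_hat(y, m, n):
    """Q_hat_m = sum_{i <= m} (y_i^2 - 1/n)."""
    return sum(v * v - 1.0 / n for v in y[:m])


def mc_mean_q_hat(theta, n, m, reps=2000, seed=1):
    """Monte Carlo average of Q_hat_m over independent replications."""
    rng = random.Random(seed)
    return sum(q_hat(simulate(theta, n, rng), m, n) for _ in range(reps)) / reps
```

For example, with `theta = [1.0 / i for i in range(1, 51)]`, `n = 100` and `m = 20`, the Monte Carlo average should be close to `sum(t * t for t in theta[:20])`; the remaining gap to the full $Q(\theta)$ is the truncation bias $\sum_{i>m}\theta_i^2$.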

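Moreover,
$$\mathrm{var}\,\hat Q_m = \sum_{i=1}^m\mathrm{var}\,y_i^2 = \sum_{i=1}^m\mathrm{var}\Big(\frac{z_i^2}{n}+\frac{2\theta_iz_i}{\sqrt n}\Big) = \sum_{i=1}^m\mathrm{var}\Big(\frac{z_i^2}{n}\Big)+\sum_{i=1}^m\mathrm{var}\Big(\frac{2\theta_iz_i}{\sqrt n}\Big) = \frac{2m}{n^2}+\frac4n\sum_{i=1}^m\theta_i^2.$$
Then
$$\mathrm{bias}^2\le\Big(\sum_{i\ge m+1}\theta_i^2\Big)^2\le M^2m^{-4\alpha}$$
and
$$\mathrm{var}\,\hat Q_m\le\frac{2m}{n^2}+\frac{4M}{n}.$$
Choose $m$ such that
$$\frac{m}{n^2} = M^2m^{-4\alpha},$$
i.e.,
$$m = (M^2n^2)^{1/(1+4\alpha)}\asymp n^{2/(1+4\alpha)}.$$
Thus we have an upper bound on the mean square error,
$$E\big(\hat Q-Q\big)^2\le M^2m^{-4\alpha}+\frac{2m}{n^2}+\frac{4M}{n}\le C\Big(n^{-8\alpha/(1+4\alpha)}+\frac1n\Big).$$
So the optimal minimax convergence rate is $1/n$ when $\alpha\ge1/4$, or $n^{-8\alpha/(1+4\alpha)}$ when $\alpha<1/4$.

Step 2: Cannot do better. We take $\mathcal{P}_0 = \{P_\theta,\ \theta = 0\}$ and $\mathcal{P}_1 = \{P_\theta;\ \theta\in\Theta_{sub}\}$ with
$$\Theta_{sub} = \{\theta:\ |\theta_i| = a,\ i\le m;\ \theta_i = 0,\ i\ge m+1\},$$
where $a$ and $m$ will be specified later. Simple calculations give
$$\sum_{i=1}^mi^{2\alpha}\theta_i^2\le m^{2\alpha+1}a^2,\qquad\Big(\sum_{i=1}^m\theta_i^2\Big)^2 = m^2a^4,\qquad\sum_{i=1}^m\theta_i^4 = ma^4.$$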
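The three "simple calculations" for the sub-family can be checked with a few lines of code. This is only a sanity check for a generic choice of $m$, $a$ and $\alpha$; the function name and the numerical values are illustrative assumptions.

```python
def sub_family_sums(m, a, alpha):
    """Return (sum i^{2 alpha} theta_i^2, (sum theta_i^2)^2, sum theta_i^4)
    for theta with |theta_i| = a for i <= m and theta_i = 0 otherwise.
    The signs of theta_i do not affect any of the three quantities.
    """
    theta = [a] * m
    weighted = sum(i ** (2.0 * alpha) * t * t
                   for i, t in enumerate(theta, start=1))
    squared_sum = sum(t * t for t in theta) ** 2
    fourth = sum(t ** 4 for t in theta)
    return weighted, squared_sum, fourth
```

For instance, with `m = 16`, `a = 0.1` and `alpha = 0.25`, the weighted sum stays below $m^{2\alpha+1}a^2 = 0.64$, while the other two quantities match $m^2a^4$ and $ma^4$.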

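Let $m = n^{2/(4\alpha+1)}$ and $a = c\,(1/n)^{(2\alpha+1)/(4\alpha+1)}$. Then
$$\sum_{i=1}^mi^{2\alpha}\theta_i^2\le c^2;\qquad\Big(\sum_{i=1}^m\theta_i^2\Big)^2 = c^4\Big(\frac1n\Big)^{8\alpha/(4\alpha+1)};\qquad n^2\sum_{i=1}^m\theta_i^4 = c^4.$$
Let $c$ be a small constant; then $\Theta_{sub}\subset\Theta_{\alpha,M}$, and the affinity satisfies $\|P_0\wedge P_1\|\ge c_1$ for some positive constant $c_1$ (why?). So
$$\sup_\theta E\big(\hat Q-Q\big)^2\ge c_2\Big(\sum_{i=1}^m\theta_i^2\Big)^2 = c_2\,c^4\Big(\frac1n\Big)^{8\alpha/(4\alpha+1)}.$$
(More details in class.)

10. High dimensional estimation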