Wavelet Methods for Time Series Analysis. Part III: Wavelet-Based Signal Extraction and Denoising

1 Wavelet Methods for Time Series Analysis
Part III: Wavelet-Based Signal Extraction and Denoising
- overview of key ideas behind the wavelet-based approach
- description of four basic models for signal estimation
- discussion of why wavelets can help estimate certain signals
- simple thresholding & shrinkage schemes for signal estimation
- wavelet-based thresholding and shrinkage
- discussion of some extensions to the basic approach

2 Wavelet-Based Signal Estimation: I
- DWT analysis of X yields the coefficients W = 𝒲X, where 𝒲 is the DWT matrix
- DWT synthesis X = 𝒲^T W yields a multiresolution analysis by splitting 𝒲^T W into pieces associated with different scales
- DWT synthesis can also estimate a signal hidden in X:
  - if we can modify W to get rid of the noise in the wavelet domain
  - if W' is a noise-reduced version of W, can form a signal estimate via 𝒲^T W'

3 Wavelet-Based Signal Estimation: II
- key ideas behind simple wavelet-based signal estimation:
  - certain signals can be efficiently described by the DWT using
    - all of the scaling coefficients
    - a small number of large wavelet coefficients
  - noise is manifested in a large number of small wavelet coefficients
  - can either threshold or shrink the wavelet coefficients to eliminate noise in the wavelet domain
- these key ideas led to wavelet thresholding and shrinkage, proposed by Donoho, Johnstone and coworkers in the 1990s

4 Models for Signal Estimation: I
- will consider two types of signals:
  1. D, an N-dimensional deterministic signal
  2. C, an N-dimensional stochastic signal, i.e., a vector of random variables (RVs) with covariance matrix Σ_C
- will consider two types of noise:
  1. ε, an N-dimensional vector of independent and identically distributed (IID) RVs with mean 0 and covariance matrix Σ_ε = σ_ε² I_N
  2. η, an N-dimensional vector of non-IID RVs with mean 0 and covariance matrix Σ_η
     - one form: RVs independent, but with different variances
     - another form of non-IID: RVs are correlated

5 Models for Signal Estimation: II
- leads to four basic 'signal + noise' models for X:
  1. X = D + ε
  2. X = D + η
  3. X = C + ε
  4. X = C + η
- in the latter two cases, the stochastic signal C is assumed to be independent of the associated noise

6 Signal Representation via Wavelets: I
- consider deterministic signals D first
- the signal estimation problem is simplified if we can assume that the important part of D is in its large values
- this assumption is not usually viable in the original (i.e., time-domain) representation D, but might be true in another domain
- an orthonormal transform 𝒪 might be useful because the coefficients O = 𝒪D are equivalent to D (since D = 𝒪^T O)
- we might be able to find 𝒪 such that the signal is isolated in M ≪ N large transform coefficients
- Q: how can we judge whether a particular 𝒪 might be useful for representing D?

7 Signal Representation via Wavelets: II
- let O_j be the jth transform coefficient in O = 𝒪D
- let |O_(0)|, |O_(1)|, ..., |O_(N−1)| be the |O_j|'s reordered by magnitude:
    |O_(0)| ≥ |O_(1)| ≥ ... ≥ |O_(N−1)|
- example: if O = [−3, 1, −4, 7, −2, 1]^T, then |O_(0)| = |O_3| = 7, |O_(1)| = |O_2| = 4, |O_(2)| = |O_0| = 3, etc.
- define a normalized partial energy sequence (NPES):
    C_{M−1} ≡ (Σ_{j=0}^{M−1} |O_(j)|²) / (Σ_{j=0}^{N−1} |O_(j)|²) = (energy in the largest M terms) / (total energy in the signal)
- let I_M be the N × N diagonal matrix whose jth diagonal term is 1 if |O_j| is one of the M largest magnitudes and is 0 otherwise

8 Signal Representation via Wavelets: III
- form D̂_M ≡ 𝒪^T I_M O, which is an approximation to D
- when O = [−3, 1, −4, 7, −2, 1]^T and M = 3, we have I_3 = diag(1, 0, 1, 1, 0, 0) and thus
    D̂_3 = 𝒪^T [−3, 0, −4, 7, 0, 0]^T
- one interpretation for the NPES:
    C_{M−1} = 1 − ‖D − D̂_M‖² / ‖D‖² = 1 − relative approximation error
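As an illustration (my own sketch, not part of the overheads), the NPES and the "keep the M largest" coefficient selection can be computed with a few lines of numpy; the inverse transform 𝒪^T would then be applied to the truncated coefficient vector to obtain D̂_M:

```python
import numpy as np

def npes(O):
    """Normalized partial energy sequence C_0, ..., C_{N-1} of a coefficient vector O."""
    energies = np.sort(np.abs(O))[::-1] ** 2        # |O_(0)|^2 >= |O_(1)|^2 >= ...
    return np.cumsum(energies) / np.sum(energies)

def keep_largest(O, M):
    """Return I_M O: zero out all but the M largest-magnitude coefficients."""
    keep = np.argsort(np.abs(O))[::-1][:M]
    out = np.zeros_like(O)
    out[keep] = O[keep]
    return out

O = np.array([-3.0, 1.0, -4.0, 7.0, -2.0, 1.0])     # toy example from the overheads
print(npes(O))            # C_2 = (49 + 16 + 9) / 80 = 0.925
print(keep_largest(O, 3)) # [-3.  0. -4.  7.  0.  0.]
```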

9 Signal Representation via Wavelets: IV
[Plots of three signals D_1, D_2 and D_3 versus t]
- consider the three signals plotted above
- D_1 is a sinusoid, which can be represented succinctly by the discrete Fourier transform (DFT)
- D_2 is a bump (only a few nonzero values in the time domain)
- D_3 is a linear combination of D_1 and D_2

10 Signal Representation via Wavelets: V
- consider three different orthonormal transforms:
  - the identity transform I (time)
  - the orthonormal DFT F (frequency), where F has (k, t)th element exp(−i2πtk/N)/√N for 0 ≤ k, t ≤ N−1
  - the LA(8) DWT 𝒲 (wavelet)
- the # of terms M needed to achieve relative error < 1% is tabulated for D_1, D_2 and D_3 under the DFT, the identity transform and the LA(8) wavelet transform

11 Signal Representation via Wavelets: VI
[Plots: NPESs for D_1, D_2 and D_3 versus M]
- use NPESs to see how well these three signals are represented in the time, frequency (DFT) and wavelet (LA(8)) domains
- time (solid curves), frequency (dotted) and wavelet (dashed)

12 Signal Representation via Wavelets: VII
- example: vertical ocean shear time series
  - has frequency-domain fluctuations
  - also has time-domain turbulent activity
- the next overhead shows
  - the signal D itself
  - its approximation D̂_100 from 100 LA(8) DWT coefficients
  - D̂_300 from 300 LA(8) DWT coefficients, giving C_299 ≈ 0.9983
  - D̂_300 from 300 DFT coefficients, giving C_299 ≈ 0.9973
- note that 300 coefficients is less than 5% of N = 6784!

13 Signal Representation via Wavelets: VIII
[Plots: the ocean shear series D (versus depth in meters) and its LA(8) DWT and DFT approximations D̂]
- need 123 additional ODFT coefficients to match C_299 for the DWT

14 Signal Representation via Wavelets: IX
[Plots: NMR series X and approximations D̂_M from the DFT (left column) and the LA(8) DWT (right column), versus t]
- 2nd example: DFT D̂_M (left-hand column) & J_0 = 6 LA(8) DWT D̂_M (right) for the NMR series X (A. Maudsley, UCSF)

15 Signal Estimation via Thresholding: I
- assume a model of deterministic signal plus IID noise: X = D + ε
- let 𝒪 be an N × N orthonormal matrix
- form O = 𝒪X = 𝒪D + 𝒪ε ≡ d + e
- component-wise, have O_l = d_l + e_l
- define the signal-to-noise ratio (SNR):
    ‖D‖² / E{‖ε‖²} = ‖d‖² / E{‖e‖²} = (Σ_{l=0}^{N−1} d_l²) / (Σ_{l=0}^{N−1} E{e_l²})
- assume that the SNR is large
- assume that d has just a few large coefficients; i.e., large signal coefficients dominate O

16 Signal Estimation via Thresholding: II
- recall the simple estimator D̂_M ≡ 𝒪^T I_M O and the previous example:
    D̂_3 = 𝒪^T I_3 [O_0, O_1, O_2, O_3, O_4, O_5]^T = 𝒪^T [O_0, 0, O_2, O_3, 0, 0]^T
- let J_m be the set of m indices corresponding to the places where the jth diagonal element of I_m is 1
- in the example above, we have J_3 = {0, 2, 3}
- the strategy in forming D̂_M is to keep a coefficient O_j if j ∈ J_m but to replace it with 0 if j ∉ J_m ('kill or keep' strategy)

17 Signal Estimation via Thresholding: III
- can pose a simple optimization problem whose solution
  1. is a 'kill or keep' strategy (and hence justifies this strategy)
  2. dictates that we use the coefficients with the largest magnitudes
  3. tells us what M should be (once we set a certain parameter)
- optimization problem: find D̂_m such that
    γ_m ≡ ‖X − D̂_m‖² + mδ²
  is minimized over all possible I_m, m = 0, ..., N
- in the above, δ² is a fixed parameter (set a priori)

18 Signal Estimation via Thresholding: IV
- ‖X − D̂_m‖² is a measure of fidelity
  - rationale for this term: under our assumption of a high SNR, D̂_m shouldn't stray too far from X
  - fidelity increases (the measure decreases) as m increases
  - in minimizing γ_m, consideration of this term alone suggests that m should be large
- mδ² is a penalty for too many terms
  - rationale: the heuristic says d has only a few large coefficients
  - the penalty increases as m increases
  - in minimizing γ_m, consideration of this term alone suggests that m should be small
- optimization problem: balance off fidelity & parsimony

19 Signal Estimation via Thresholding: V
- claim: γ_m = ‖X − D̂_m‖² + mδ² is minimized when m is set to the number of coefficients O_j such that O_j² > δ²
- proof of claim: since X = 𝒪^T O & D̂_m ≡ 𝒪^T I_m O, have
    γ_m = ‖X − D̂_m‖² + mδ²
        = ‖𝒪^T O − 𝒪^T I_m O‖² + mδ²
        = ‖𝒪^T (I_N − I_m) O‖² + mδ²
        = ‖(I_N − I_m) O‖² + mδ²
        = Σ_{j ∉ J_m} O_j² + Σ_{j ∈ J_m} δ²
- for any given j, if j ∉ J_m, we contribute O_j² to the first sum; on the other hand, if j ∈ J_m, we contribute δ² to the second sum
- to minimize γ_m, we need to put j in J_m if O_j² > δ², thus establishing the claim
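As a quick numerical check (my own sketch, not from the overheads), a brute-force search over every possible index set on a toy coefficient vector confirms that the minimizing choice keeps exactly the coefficients with O_j² > δ²:

```python
import numpy as np
from itertools import combinations

O = np.array([-3.0, 1.0, -4.0, 7.0, -2.0, 1.0])   # toy transform coefficients
delta2 = 5.0                                       # penalty parameter delta^2

best = None
for m in range(len(O) + 1):
    for J_m in combinations(range(len(O)), m):     # every possible index set of size m
        kill = [j for j in range(len(O)) if j not in J_m]
        gamma = np.sum(O[kill] ** 2) + m * delta2  # gamma_m for this choice
        if best is None or gamma < best[0]:
            best = (gamma, set(J_m))

print(best)                                        # minimized by keeping {0, 2, 3}
print({j for j in range(len(O)) if O[j] ** 2 > delta2})
```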

20 Thresholding Functions: I
- more generally, thresholding schemes involve
  1. computing O ≡ 𝒪X
  2. defining O^(t) as the vector with lth element
       O^(t)_l = 0 if |O_l| ≤ δ; some nonzero value otherwise,
     where the nonzero values are yet to be defined
  3. estimating D via D̂^(t) ≡ 𝒪^T O^(t)
- the simplest scheme is hard thresholding ('kill/keep' strategy):
    O^(ht)_l = 0 if |O_l| ≤ δ; O_l otherwise

21 Thresholding Functions: II
[Plot: mapping from O_l to O^(ht)_l, with both axes running from −3δ to 3δ]

22 Thresholding Functions: III
- an alternative scheme is soft thresholding:
    O^(st)_l = sign{O_l} (|O_l| − δ)_+,
  where
    sign{O_l} ≡ +1 if O_l > 0; 0 if O_l = 0; −1 if O_l < 0
  and
    (x)_+ ≡ x if x ≥ 0; 0 if x < 0
- one rationale for soft thresholding is that it fits into Stein's class of estimators (will discuss later on)

23 Thresholding Functions: IV
[Plot: mapping from O_l to O^(st)_l, with both axes running from −3δ to 3δ]

24 Thresholding Functions: V
- a third scheme is mid thresholding:
    O^(mt)_l = sign{O_l} (|O_l| − δ)_{++},
  where
    (|O_l| − δ)_{++} ≡ 2(|O_l| − δ)_+ if |O_l| < 2δ; |O_l| otherwise
- provides a compromise between hard and soft thresholding
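For reference, here is a small numpy sketch (my own, not from the overheads) of the three thresholding rules as defined above:

```python
import numpy as np

def hard_threshold(O, delta):
    """O^(ht): set coefficients with |O_l| <= delta to zero, keep the rest."""
    return np.where(np.abs(O) <= delta, 0.0, O)

def soft_threshold(O, delta):
    """O^(st): sign(O_l) * (|O_l| - delta)_+ ."""
    return np.sign(O) * np.maximum(np.abs(O) - delta, 0.0)

def mid_threshold(O, delta):
    """O^(mt): 2*(|O_l| - delta)_+ for |O_l| < 2*delta, |O_l| otherwise (with sign)."""
    mag = np.abs(O)
    plus_plus = np.where(mag < 2 * delta, 2 * np.maximum(mag - delta, 0.0), mag)
    return np.sign(O) * plus_plus

O = np.array([-3.0, 0.5, 1.5, 2.5])
for rule in (hard_threshold, soft_threshold, mid_threshold):
    print(rule.__name__, rule(O, delta=1.0))
```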

25 Thresholding Functions: VI
[Plot: mapping from O_l to O^(mt)_l, with both axes running from −3δ to 3δ]

26 Thresholding Functions: VII
- example of mid thresholding with δ = 1
[Plots: X_t, the transform coefficients O_l, the thresholded coefficients O^(mt)_l and the estimate D̂^(mt)_t, versus t or l]

27 Universal Threshold: I
- Q: how do we go about setting δ?
- specialize to IID Gaussian noise ε with covariance σ_ε² I_N
- can argue that e ≡ 𝒪ε is also IID Gaussian with covariance σ_ε² I_N
- Donoho & Johnstone (1995) proposed the universal threshold δ^(u) ≡ [2σ_ε² log(N)]^{1/2} ('log' here is log base e)
- rationale for δ^(u): because of Gaussianity, can argue that
    P[ max_l |e_l| > δ^(u) ] ≤ 1/[4π log(N)]^{1/2},
  so P[ max_l |e_l| ≤ δ^(u) ] → 1 as N → ∞, and hence no noise will exceed the threshold in the limit
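A small sanity check (my own sketch, with assumed values of N and σ_ε = 1) of this rationale: the exact probability that pure IID Gaussian noise exceeds δ^(u) anywhere can be computed directly, and it does decay as N grows, though slowly:

```python
import math

sigma = 1.0
for N in (128, 1024, 8192, 2**20):
    delta_u = math.sqrt(2 * sigma**2 * math.log(N))             # universal threshold
    p_one = math.erfc(delta_u / (sigma * math.sqrt(2.0)))       # P[|e_l| > delta^(u)] for one l
    p_any = 1.0 - (1.0 - p_one) ** N                            # P[max_l |e_l| > delta^(u)]
    print(N, round(delta_u, 2), round(p_any, 3))
```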

28 Universal Threshold: II
- suppose D is a vector of zeros so that O_l = e_l
  - implies that O^(ht) = 0 with high probability as N → ∞
  - hence will estimate the correct D with high probability
- critique of δ^(u): consider lots of IID Gaussian series with N = 128: only 13% will have any values exceeding δ^(u)
  - δ^(u) is slanted toward eliminating the vast majority of the noise, but, if we use, e.g., hard thresholding, any nonzero signal transform coefficient of a fixed magnitude will eventually get set to 0 as N → ∞
- nonetheless: δ^(u) works remarkably well

29 Minimum Unbiased Risk: I
- a second approach for setting δ is data-adaptive, but only works for selected thresholding functions
- assume a model of deterministic signal plus non-IID noise: X = D + η, so that O ≡ 𝒪X = 𝒪D + 𝒪η ≡ d + n
- component-wise, have O_l = d_l + n_l
- further assume that n_l is an N(0, σ²_{n_l}) RV, where σ²_{n_l} is assumed to be known, but we allow the possibility that the n_l's are correlated
- let O^(δ)_l be an estimator of d_l based on a (yet to be determined) threshold δ
- want to make E{(O^(δ)_l − d_l)²} as small as possible

30 Minimum Unbiased Risk: II
- Stein (1981) considered estimators restricted to be of the form
    O^(δ)_l = O_l + A^(δ)(O_l),
  where A^(δ)(·) must be weakly differentiable (basically, piecewise continuous plus a bit more)
- since O_l = d_l + n_l, the above yields O^(δ)_l − d_l = n_l + A^(δ)(O_l), so
    E{(O^(δ)_l − d_l)²} = σ²_{n_l} + 2E{n_l A^(δ)(O_l)} + E{[A^(δ)(O_l)]²}
- because of Gaussianity, can reduce the middle term:
    E{n_l A^(δ)(O_l)} = σ²_{n_l} E{ (d/dx) A^(δ)(x) |_{x=O_l} }

31 Minimum Unbiased Risk: III
- practical scheme: given realizations o_l of O_l, find the δ minimizing an estimate of Σ_{l=0}^{N−1} E{(O^(δ)_l − d_l)²}, which, in view of
    E{(O^(δ)_l − d_l)²} = σ²_{n_l} + 2σ²_{n_l} E{ (d/dx) A^(δ)(x) |_{x=O_l} } + E{[A^(δ)(O_l)]²},
  is
    Σ_{l=0}^{N−1} R(σ_{n_l}, o_l, δ) ≡ Σ_{l=0}^{N−1} ( σ²_{n_l} + 2σ²_{n_l} (d/dx) A^(δ)(x)|_{x=o_l} + [A^(δ)(o_l)]² )
- for a given δ, the above is Stein's unbiased risk estimator (SURE)

32 Minimum Unbiased Risk: IV
- example: if we set
    A^(δ)(O_l) = −O_l if |O_l| < δ; −δ sign{O_l} if |O_l| ≥ δ,
  we obtain O^(δ)_l = O_l + A^(δ)(O_l) = O^(st)_l, i.e., soft thresholding
- for this case, can argue that
    R(σ_{n_l}, O_l, δ) = O_l² − σ²_{n_l} + (2σ²_{n_l} − O_l² + δ²) 1_{[δ²,∞)}(O_l²),
  where
    1_{[δ²,∞)}(x) ≡ 1 if δ² ≤ x < ∞; 0 otherwise
- only the last term depends on δ, and, as a function of δ, SURE is minimized when the last term is minimized

33 Minimum Unbiased Risk: V
- the data-adaptive scheme is to replace O_l with its realization, say o_l, and to set δ equal to the value, say δ^(S), minimizing
    Σ_{l=0}^{N−1} (2σ²_{n_l} − o_l² + δ²) 1_{[δ²,∞)}(o_l²)
- must have δ^(S) = |o_l| for some l, so the minimization is easy
- if the n_l have a common variance, i.e., σ²_{n_l} = σ² for all l, need to find the minimizer of the following function of δ:
    Σ_{l=0}^{N−1} (2σ² − o_l² + δ²) 1_{[δ²,∞)}(o_l²)
- (in practice, σ² is usually unknown, so later on we will consider how to estimate this also)
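A minimal numpy sketch (my own, assuming a common known noise variance) of this SURE-based threshold selection for soft thresholding, searching over the candidate values δ = |o_l|:

```python
import numpy as np

def sure_threshold(o, sigma):
    """Return delta^(S), the threshold minimizing the SURE expression above,
    assuming a common noise variance sigma^2 for all coefficients."""
    o2 = o ** 2
    candidates = np.abs(o)                              # minimizer equals some |o_l|
    risks = [np.sum((2 * sigma**2 - o2 + d**2) * (o2 >= d**2)) for d in candidates]
    return candidates[np.argmin(risks)]

rng = np.random.default_rng(1)
d = np.zeros(256); d[:8] = 10.0                         # sparse signal coefficients
o = d + rng.normal(0.0, 1.0, size=256)                  # observed coefficients
print(sure_threshold(o, sigma=1.0))
```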

34 Signal Estimation via Shrinkage
- so far, we have only considered signal estimation via thresholding rules, which map some O_l to zero
- will now consider shrinkage rules, which differ from thresholding only in that nonzero coefficients are mapped to nonzero values rather than exactly zero (but the values can be very close to zero!)
- there are three approaches that lead us to shrinkage rules:
  1. linear mean square estimation
  2. conditional mean and median
  3. Bayesian approach
- will only consider 1 and 2, but one form of the Bayesian approach turns out to be identical to 2

35 Linear Mean Square Estimation: I
- assume a model of stochastic signal plus non-IID noise: X = C + η, so that O = 𝒪X = 𝒪C + 𝒪η ≡ R + n
- component-wise, have O_l = R_l + n_l
- assume C and η are multivariate Gaussian with covariance matrices Σ_C and Σ_η
- implies R and n are also Gaussian RVs, but now with covariance matrices 𝒪Σ_C𝒪^T and 𝒪Σ_η𝒪^T
- assume that E{R_l} = 0 for any component of interest and that R_l & n_l are uncorrelated
- suppose we estimate R_l via a simple scaling of O_l: R̂_l ≡ a_l O_l, where a_l is a constant to be determined

36 Linear Mean Square Estimation: II
- let us select a_l by making E{(R_l − R̂_l)²} as small as possible, which occurs when we set
    a_l = E{R_l O_l} / E{O_l²}
- because R_l and n_l are uncorrelated with means 0, and because O_l = R_l + n_l, we have E{R_l O_l} = E{R_l²} and E{O_l²} = E{R_l²} + E{n_l²}, yielding
    R̂_l = [E{R_l²} / (E{R_l²} + E{n_l²})] O_l = [σ²_{R_l} / (σ²_{R_l} + σ²_{n_l})] O_l
- note: the optimum a_l shrinks O_l toward zero, with the shrinkage increasing as the noise variance increases

37 Background on Conditional PDFs: I
- let X and Y be RVs with probability density functions (PDFs) f_X(·) and f_Y(·)
- let f_{X,Y}(x, y) be their joint PDF at the point (x, y)
- f_X(·) and f_Y(·) are called marginal PDFs and can be obtained from the joint PDF via integration:
    f_X(x) = ∫ f_{X,Y}(x, y) dy
- the conditional PDF of Y given X = x is defined as
    f_{Y|X=x}(y) = f_{X,Y}(x, y) / f_X(x)
  ('|' is read as 'given' or 'conditional on')

38 Background on Conditional PDFs: II
- by definition, RVs X and Y are said to be independent if
    f_{X,Y}(x, y) = f_X(x) f_Y(y),
  in which case
    f_{Y|X=x}(y) = f_{X,Y}(x, y) / f_X(x) = f_X(x) f_Y(y) / f_X(x) = f_Y(y)
- thus X and Y are independent if knowing X doesn't allow us to alter our probabilistic description of Y
- f_{Y|X=x}(·) is a PDF, so its mean value is
    E{Y | X = x} = ∫ y f_{Y|X=x}(y) dy;
  the above is called the conditional mean of Y, given X

39 Background on Conditional PDFs: III
- suppose RVs X and Y are related, but we can only observe X
- suppose we want to approximate the unobservable Y based on some function of the observable X
- example: we observe part of a time series containing a signal buried in noise, and we want to approximate the unobservable signal component based upon a function of what we observed
- suppose we want our approximation to be the function of X, say U_2(X), such that the mean square difference between Y and U_2(X) is as small as possible; i.e., we want
    E{(Y − U_2(X))²}
  to be as small as possible

40 Background on Conditional PDFs: IV
- the solution is to use U_2(X) = E{Y | X}; i.e., the conditional mean of Y given X is our best guess at Y in the sense of minimizing the mean square error (related to the fact that E{(Y − a)²} is smallest when a = E{Y})
- on the other hand, suppose we want the function U_1(X) such that the mean absolute error E{|Y − U_1(X)|} is as small as possible
- the solution now is to let U_1(X) be the conditional median; i.e., we must solve
    ∫_{−∞}^{U_1(x)} f_{Y|X=x}(y) dy = 0.5
  to figure out what U_1(x) should be when X = x

41 Conditional Mean and Median Approach: I
- assume a model of stochastic signal plus non-IID noise: X = C + η, so that O = 𝒪X = 𝒪C + 𝒪η ≡ R + n
- component-wise, have O_l = R_l + n_l
- because C and η are independent, R and n must be also
- suppose we approximate R_l via R̂_l ≡ U_2(O_l), where U_2(O_l) is selected to minimize E{(R_l − U_2(O_l))²}
- the solution is to set U_2(O_l) equal to E{R_l | O_l}, so let's work out what form this conditional mean takes
- to get E{R_l | O_l}, need the PDF of R_l given O_l, which is
    f_{R_l|O_l=o_l}(r_l) = f_{R_l,O_l}(r_l, o_l) / f_{O_l}(o_l)

42 Conditional Mean and Median Approach: II
- the joint PDF of R_l and O_l is related to the joint PDF f_{R_l,n_l}(·,·) of R_l and n_l via
    f_{R_l,O_l}(r_l, o_l) = f_{R_l,n_l}(r_l, o_l − r_l) = f_{R_l}(r_l) f_{n_l}(o_l − r_l),
  with the 2nd equality following since R_l & n_l are independent
- the marginal PDF for O_l can be obtained from the joint PDF f_{R_l,O_l}(·,·) by integrating out the first argument:
    f_{O_l}(o_l) = ∫ f_{R_l,O_l}(r_l, o_l) dr_l = ∫ f_{R_l}(r_l) f_{n_l}(o_l − r_l) dr_l
- putting all these pieces together yields the conditional PDF
    f_{R_l|O_l=o_l}(r_l) = f_{R_l,O_l}(r_l, o_l) / f_{O_l}(o_l) = f_{R_l}(r_l) f_{n_l}(o_l − r_l) / ∫ f_{R_l}(r_l) f_{n_l}(o_l − r_l) dr_l

43 Conditional Mean and Median Approach: III
- the mean value of f_{R_l|O_l=o_l}(·) yields the estimator R̂_l = E{R_l | O_l}:
    E{R_l | O_l = o_l} = ∫ r_l f_{R_l|O_l=o_l}(r_l) dr_l = ∫ r_l f_{R_l}(r_l) f_{n_l}(o_l − r_l) dr_l / ∫ f_{R_l}(r_l) f_{n_l}(o_l − r_l) dr_l
- to make further progress, we need a model for the wavelet-domain representation R_l of the signal
- the heuristic that the signal in the wavelet domain has a few large values and lots of small values suggests a Gaussian mixture model

44 Conditional Mean and Median Approach: IV
- let I_l be an RV such that P[I_l = 1] = p_l & P[I_l = 0] = 1 − p_l
- under the Gaussian mixture model, R_l has the same distribution as
    I_l N(0, γ_l² σ²_{G_l}) + (1 − I_l) N(0, σ²_{G_l}),
  where N(0, σ²) denotes a Gaussian RV with mean 0 and variance σ²
- the 2nd component models the small # of large signal coefficients
- the 1st component models the large # of small coefficients (γ_l² ≪ 1)
- example: [plot of the PDFs of the two mixture components]

45 Conditional Mean and Median Approach: V
- to complete the model, let n_l obey a Gaussian distribution with mean 0 and variance σ²_{n_l}
- the conditional mean estimator of the signal RV R_l is given by
    E{R_l | O_l = o_l} = [ a_l A_l(o_l) + b_l B_l(o_l) ] / [ A_l(o_l) + B_l(o_l) ] · o_l,
  where
    a_l ≡ γ_l² σ²_{G_l} / (γ_l² σ²_{G_l} + σ²_{n_l}),   b_l ≡ σ²_{G_l} / (σ²_{G_l} + σ²_{n_l}),
    A_l(o_l) ≡ p_l / (2π[γ_l² σ²_{G_l} + σ²_{n_l}])^{1/2} · e^{−o_l² / [2(γ_l² σ²_{G_l} + σ²_{n_l})]},
    B_l(o_l) ≡ (1 − p_l) / (2π[σ²_{G_l} + σ²_{n_l}])^{1/2} · e^{−o_l² / [2(σ²_{G_l} + σ²_{n_l})]}

46 Conditional Mean and Median Approach: VI
- let's simplify to a sparse signal model by setting γ_l = 0; i.e., the large # of small coefficients are all zero
- the distribution for R_l is then the same as (1 − I_l) N(0, σ²_{G_l})
- the conditional mean estimator becomes
    E{R_l | O_l = o_l} = b_l / (1 + c_l) · o_l,
  where
    c_l = [ p_l (σ²_{G_l} + σ²_{n_l})^{1/2} / ((1 − p_l) σ_{n_l}) ] · e^{−o_l² b_l / (2σ²_{n_l})}
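A small numpy sketch (my own, not from the overheads) of this sparse-model conditional mean shrinkage rule; the parameter values below echo the next overhead (p_l = 0.95, σ²_{n_l} = 1, σ²_{G_l} = 5):

```python
import numpy as np

def cond_mean_shrink(o, p, sigma_G, sigma_n):
    """E{R_l | O_l = o_l} under the sparse signal model (gamma_l = 0), with a
    common p, sigma_G and sigma_n for every coefficient."""
    b = sigma_G**2 / (sigma_G**2 + sigma_n**2)
    c = (p * np.sqrt(sigma_G**2 + sigma_n**2) / ((1 - p) * sigma_n)
         * np.exp(-o**2 * b / (2 * sigma_n**2)))
    return b / (1 + c) * o

o = np.linspace(-6.0, 6.0, 7)
print(cond_mean_shrink(o, p=0.95, sigma_G=np.sqrt(5.0), sigma_n=1.0))
```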

47 Conditional Mean and Median Approach: VII
[Plot: E{R_l | O_l = o_l} versus o_l for three values of σ²_{G_l}]
- conditional mean shrinkage rule for p_l = 0.95 (i.e., 95% of signal coefficients are 0); σ²_{n_l} = 1; and σ²_{G_l} = 5 (curve furthest from the dotted diagonal), 10 and 25 (curve nearest to the diagonal)
- as σ²_{G_l} gets large (i.e., the large signal coefficients increase in size), the shrinkage rule starts to resemble a mid thresholding rule

48 Conditional Mean and Median Approach: VIII
- now suppose we estimate R_l via R̂_l = U_1(O_l), where U_1(O_l) is selected to minimize E{|R_l − U_1(O_l)|}
- the solution is to set U_1(o_l) to the median of the PDF for R_l given O_l = o_l
- to find U_1(o_l), need to solve for it in the equation
    ∫_{−∞}^{U_1(o_l)} f_{R_l|O_l=o_l}(r_l) dr_l = ∫_{−∞}^{U_1(o_l)} f_{R_l}(r_l) f_{n_l}(o_l − r_l) dr_l / ∫_{−∞}^{∞} f_{R_l}(r_l) f_{n_l}(o_l − r_l) dr_l = 1/2

49 Conditional Mean and Median Approach: IX
- simplifying to the sparse signal model, Godfrey & Rocca (1981) show that
    U_1(O_l) ≈ 0 if |O_l| ≤ δ; b_l O_l otherwise,
  where
    δ = σ_{n_l} [ 2 log( p_l σ_{G_l} / [(1 − p_l) σ_{n_l}] ) ]^{1/2}  and  b_l = σ²_{G_l} / (σ²_{G_l} + σ²_{n_l})
- the above approximation is valid if p_l/(1 − p_l) ≫ σ²_{n_l}/(σ_{G_l} δ) and σ²_{G_l} ≫ σ²_{n_l}
- note that U_1(·) is approximately a hard thresholding rule

50 Wavelet-Based Thresholding
- assume a model of deterministic signal plus IID Gaussian noise with mean 0 and variance σ_ε²: X = D + ε
- using a DWT matrix 𝒲, form W = 𝒲X = 𝒲D + 𝒲ε ≡ d + e
- because ε is IID Gaussian, so is e
- Donoho & Johnstone (1994) advocate the following:
  - form a partial DWT of level J_0: W_1, ..., W_{J_0} and V_{J_0}
  - threshold the W_j's but leave V_{J_0} alone (i.e., administratively, all N/2^{J_0} scaling coefficients are assumed to be part of d)
  - use the universal threshold δ^(u) = [2σ_ε² log(N)]^{1/2}
  - use a thresholding rule to form W^(t)_j (hard, etc.)
  - estimate D by inverse transforming W^(t)_1, ..., W^(t)_{J_0} and V_{J_0}
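A sketch of this recipe in Python using the PyWavelets package (the overheads do not reference any software, so the package, the 'sym4' filter as a stand-in for LA(8), and the toy signal are all my assumptions); note that PyWavelets' default boundary handling is not the exact orthonormal DWT of WMTSA, so this is illustrative only:

```python
import numpy as np
import pywt

def dwt_universal_hard(X, sigma, wavelet="sym4", level=6):
    """Partial DWT of the given level, hard thresholding of the wavelet
    coefficients at the universal threshold, scaling coefficients left alone."""
    coeffs = pywt.wavedec(X, wavelet, level=level)      # [V_J0, W_J0, ..., W_1]
    delta_u = np.sqrt(2 * sigma**2 * np.log(len(X)))    # universal threshold
    thresholded = [coeffs[0]] + [pywt.threshold(W, delta_u, mode="hard")
                                 for W in coeffs[1:]]
    return pywt.waverec(thresholded, wavelet)

rng = np.random.default_rng(2)
t = np.arange(1024)
D = np.exp(-0.5 * ((t - 512) / 20.0) ** 2)              # a bump signal
X = D + rng.normal(0.0, 0.1, size=t.size)               # noise sd assumed known here
D_hat = dwt_universal_hard(X, sigma=0.1)
print(np.mean((D_hat[:t.size] - D) ** 2) < np.mean((X - D) ** 2))   # usually True
```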

51 MAD Scale Estimator: I
- the procedure assumes σ_ε is known, which is not usually the case
- if unknown, use the median absolute deviation (MAD) scale estimator to estimate σ_ε using W_1:
    σ̂_(mad) ≡ median{ |W_{1,0}|, |W_{1,1}|, ..., |W_{1,N/2−1}| } / 0.6745
- heuristic: the bulk of the W_{1,t}'s should be due to noise
- 0.6745 yields an estimator such that E{σ̂_(mad)} = σ_ε when the W_{1,t}'s are IID Gaussian with mean 0 and variance σ_ε²
- designed to be robust against large W_{1,t}'s due to signal

52 MAD Scale Estimator: II
- example: suppose W_1 has 7 small noise coefficients & 2 large signal coefficients (say, a & b, with 2 ≪ |a| < |b|):
    W_1 = [1.23, 1.72, 0.8, 0.1, a, 0.3, 0.67, b, 1.33]^T
- ordering these by their magnitudes yields
    0.1, 0.3, 0.67, 0.8, 1.23, 1.33, 1.72, |a|, |b|
- the median of these absolute deviations is 1.23, so σ̂_(mad) = 1.23/0.6745 ≈ 1.82
- σ̂_(mad) is not influenced adversely by a and b; i.e., the scale estimate depends largely on the many small coefficients due to noise
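The same toy example in numpy (my own sketch; a and b are set to arbitrary large values just for illustration):

```python
import numpy as np

def mad_sigma(W1):
    """MAD scale estimate of sigma_eps from the unit-scale wavelet coefficients W_1."""
    return np.median(np.abs(W1)) / 0.6745

a, b = 8.0, 9.0        # any values with 2 << |a| < |b| give the same estimate
W1 = np.array([1.23, 1.72, 0.8, 0.1, a, 0.3, 0.67, b, 1.33])
print(mad_sigma(W1))   # 1.23 / 0.6745 ~ 1.82
```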

53 Examples of DWT-Based Thresholding: I
[Plots: the NMR spectrum X and two hard-thresholded signal estimates D̂^(ht), one from the LA(8) DWT and one from the D(4) DWT, versus t]
- top plot: NMR spectrum X
- middle: signal estimate using a J_0 = 6 partial LA(8) DWT with hard thresholding and the universal threshold level estimated by δ̂^(u) = [2σ̂²_(mad) log(N)]^{1/2} ≈ 6.13
- bottom: same, but now using the D(4) DWT, with δ̂^(u) ≈ 6.49

54 Examples of DWT-Based Thresholding: II
[Plots: three signal estimates D̂ from the J_0 = 6 partial LA(8) DWT, versus t]
- top: signal estimate using the J_0 = 6 partial LA(8) DWT with hard thresholding (repeat of the middle plot of the previous overhead)
- middle: same, but now with soft thresholding
- bottom: same, but now with mid thresholding

55 Examples of MODWT-Based Thresholding
[Plots: MODWT-based signal estimates D̃^(ht), D̃^(st) and D̃^(mt) from the LA(8) filter, versus t]
- as in the previous overhead, but using the MODWT rather than the DWT
- because the MODWT filters are normalized differently, the universal threshold must be adjusted for each level:
    δ̃^(u)_j ≡ [2σ̂²_(mad) log(N)/2^j]^{1/2} ≈ 6.5/2^{j/2}
- the results are identical to what cycle spinning would yield

56 VisuShrink
- the Donoho & Johnstone (1994) recipe with soft thresholding is known as VisuShrink (but it is really thresholding, not shrinkage)
- rather than using the universal threshold, can also determine δ for VisuShrink by finding the value δ̂^(S) that minimizes SURE, i.e.,
    Σ_{j=1}^{J_0} Σ_{t=0}^{N_j−1} (2σ̂²_(mad) − W²_{j,t} + δ²) 1_{[δ²,∞)}(W²_{j,t}),
  as a function of δ, with σ²_ε estimated via the MAD

57 Examples of DWT-Based Thresholding: III
[Plots: two VisuShrink/SURE signal estimates, with their estimated thresholds δ̂^(S), versus t]
- top: VisuShrink estimate based upon the level J_0 = 6 partial LA(8) DWT and SURE, with the MAD estimate based upon W_1
- bottom: same, but now with the MAD estimate based upon W_1, W_2, ..., W_6 (SURE assumes a variance common to all wavelet coefficients)
- the resulting signal estimate in the bottom plot is less noisy than the one in the top plot

58 Wavelet-Based Shrinkage: I
- assume a model of stochastic signal plus Gaussian IID noise: X = C + ε, so that W = 𝒲X = 𝒲C + 𝒲ε ≡ R + e
- component-wise, have W_{j,t} = R_{j,t} + e_{j,t}
- form a partial DWT of level J_0; shrink the W_j's, but leave V_{J_0} alone
- assume E{R_{j,t}} = 0 (reasonable for W_j, but not for V_{J_0})
- use a conditional mean approach with the sparse signal model:
  - R_{j,t} has the distribution dictated by (1 − I_{j,t}) N(0, σ²_G), where P[I_{j,t} = 1] = p and P[I_{j,t} = 0] = 1 − p
  - the R_{j,t}'s are assumed to be IID
  - the model for e_{j,t} is Gaussian with mean 0 and variance σ²_ε
- note: the parameters do not vary with j or t

59 Wavelet-Based Shrinkage: II
- the model has three parameters σ²_G, p and σ²_ε, which need to be set
- let σ²_R and σ²_W be the variances of the RVs R_{j,t} and W_{j,t}
- have the relationships σ²_R = (1 − p)σ²_G and σ²_W = σ²_R + σ²_ε
- set σ̂²_ε = σ̂²_(mad) using W_1
- let σ̂²_W be the sample mean of all the W²_{j,t}
- given p, let σ̂²_G = (σ̂²_W − σ̂²_ε)/(1 − p)
- p is usually chosen subjectively, keeping in mind that p is the proportion of noise-dominated coefficients (can set it based on a rough estimate of the proportion of small coefficients)
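A short numpy sketch (my own, under the stated model) of this parameter-setting scheme, given the wavelet coefficient vectors and a subjectively chosen p:

```python
import numpy as np

def estimate_shrinkage_params(W_list, p):
    """Given [W_1, ..., W_J0] and a chosen p, return (sigma_eps^2, sigma_G^2)
    using the MAD estimator on W_1 and the sample mean of the squared coefficients."""
    sigma_eps2 = (np.median(np.abs(W_list[0])) / 0.6745) ** 2
    sigma_W2 = np.mean(np.concatenate(W_list) ** 2)
    sigma_G2 = (sigma_W2 - sigma_eps2) / (1.0 - p)
    return sigma_eps2, sigma_G2

rng = np.random.default_rng(3)
W1 = rng.normal(0.0, 1.0, size=512); W1[:10] += 20.0    # a few large signal coefficients
W2 = rng.normal(0.0, 1.0, size=256); W2[:5] += 20.0
print(estimate_shrinkage_params([W1, W2], p=0.98))
```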

60 Examples of Wavelet-Based Shrinkage
[Plots: three shrinkage estimates of the NMR signal, for p = 0.9, 0.95 and 0.99, versus t]
- shrinkage signal estimates of the NMR spectrum based upon the level J_0 = 6 partial LA(8) DWT and the conditional mean, with p = 0.9 (top plot), 0.95 (middle) and 0.99 (bottom)
- as p → 1, we declare there to be proportionately fewer significant signal coefficients, implying the need for heavier shrinkage

61 Comments on Next Generation Denoising: I
[Plot: the ECG series X and its DWT scaling coefficients V_6 and wavelet coefficients W_6, ..., W_1 (each circularly shifted), versus t in seconds]
- classical denoising looks at each W_{j,t} alone; for real-world signals, coefficients often cluster within a given level and persist across adjacent levels (the ECG series offers an example)

62 Comments on Next Generation Denoising: II
- here are some next generation approaches that exploit these real-world properties:
  - Crouse et al. (1998) use hidden Markov models for the stochastic signal DWT coefficients to handle clustering, persistence and non-Gaussianity
  - Huang and Cressie (2000) consider scale-dependent multiscale graphical models to handle clustering and persistence
  - Cai and Silverman (2001) consider block thresholding, in which coefficients are thresholded in blocks rather than individually (handles clustering)
  - Dragotti and Vetterli (2003) introduce the notion of wavelet footprints to track discontinuities in a signal across different scales (handles persistence)

63 Comments on Next Generation Denoising: III
- classical denoising also suffers from the problem of the overall significance of multiple hypothesis tests
- next generation work integrates the idea of the false discovery rate (Benjamini and Hochberg, 1995) into denoising (see Wink and Roerdink, 2004, for an applications-oriented discussion)
- for more recent developments (there are a lot!!!), see
  - the review article by Antoniadis (2007)
  - Chapters 3 and 4 of the book by Nason (2008)
  - the October 2009 issue of Statistica Sinica, which has a special section entitled 'Multiscale Methods and Statistics: A Productive Marriage'

64 References
A. Antoniadis (2007), "Wavelet Methods in Statistics: Some Recent Developments and Their Applications," Statistical Surveys, 1.
Y. Benjamini and Y. Hochberg (1995), "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing," Journal of the Royal Statistical Society, Series B, 57.
T. Cai and B. W. Silverman (2001), "Incorporating Information on Neighboring Coefficients into Wavelet Estimation," Sankhya, Series B, 63.
M. S. Crouse, R. D. Nowak and R. G. Baraniuk (1998), "Wavelet-Based Statistical Signal Processing Using Hidden Markov Models," IEEE Transactions on Signal Processing, 46.
P. L. Dragotti and M. Vetterli (2003), "Wavelet Footprints: Theory, Algorithms, and Applications," IEEE Transactions on Signal Processing, 51.
H. C. Huang and N. Cressie (2000), "Deterministic/Stochastic Wavelet Decomposition for Recovery of Signal from Noisy Data," Technometrics, 42.
G. P. Nason (2008), Wavelet Methods in Statistics with R, Springer, Berlin.

A. M. Wink and J. B. T. M. Roerdink (2004), "Denoising Functional MR Images: A Comparison of Wavelet Denoising and Gaussian Smoothing," IEEE Transactions on Medical Imaging, 23(3).


More information

Wavelet Based Image Denoising Technique

Wavelet Based Image Denoising Technique (IJACSA) International Journal of Advanced Computer Science and Applications, Wavelet Based Image Denoising Technique Sachin D Ruikar Dharmpal D Doye Department of Electronics and Telecommunication Engineering

More information

5 Operations on Multiple Random Variables

5 Operations on Multiple Random Variables EE360 Random Signal analysis Chapter 5: Operations on Multiple Random Variables 5 Operations on Multiple Random Variables Expected value of a function of r.v. s Two r.v. s: ḡ = E[g(X, Y )] = g(x, y)f X,Y

More information

Lecture 13: Data Modelling and Distributions. Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1

Lecture 13: Data Modelling and Distributions. Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1 Lecture 13: Data Modelling and Distributions Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1 Why data distributions? It is a well established fact that many naturally occurring

More information

A Data-Driven Block Thresholding Approach To Wavelet Estimation

A Data-Driven Block Thresholding Approach To Wavelet Estimation A Data-Driven Block Thresholding Approach To Wavelet Estimation T. Tony Cai 1 and Harrison H. Zhou University of Pennsylvania and Yale University Abstract A data-driven block thresholding procedure for

More information

Comments on \Wavelets in Statistics: A Review" by. A. Antoniadis. Jianqing Fan. University of North Carolina, Chapel Hill

Comments on \Wavelets in Statistics: A Review by. A. Antoniadis. Jianqing Fan. University of North Carolina, Chapel Hill Comments on \Wavelets in Statistics: A Review" by A. Antoniadis Jianqing Fan University of North Carolina, Chapel Hill and University of California, Los Angeles I would like to congratulate Professor Antoniadis

More information

MACHINE LEARNING ADVANCED MACHINE LEARNING

MACHINE LEARNING ADVANCED MACHINE LEARNING MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions 22 MACHINE LEARNING Discrete Probabilities Consider two variables and y taking discrete

More information

A review of probability theory

A review of probability theory 1 A review of probability theory In this book we will study dynamical systems driven by noise. Noise is something that changes randomly with time, and quantities that do this are called stochastic processes.

More information

A New Poisson Noisy Image Denoising Method Based on the Anscombe Transformation

A New Poisson Noisy Image Denoising Method Based on the Anscombe Transformation A New Poisson Noisy Image Denoising Method Based on the Anscombe Transformation Jin Quan 1, William G. Wee 1, Chia Y. Han 2, and Xuefu Zhou 1 1 School of Electronic and Computing Systems, University of

More information

Alpha-Investing. Sequential Control of Expected False Discoveries

Alpha-Investing. Sequential Control of Expected False Discoveries Alpha-Investing Sequential Control of Expected False Discoveries Dean Foster Bob Stine Department of Statistics Wharton School of the University of Pennsylvania www-stat.wharton.upenn.edu/ stine Joint

More information

Multiresolution Analysis

Multiresolution Analysis Multiresolution Analysis DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Frames Short-time Fourier transform

More information

Nonparametric Methods

Nonparametric Methods Nonparametric Methods Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania July 28, 2009 Michael R. Roberts Nonparametric Methods 1/42 Overview Great for data analysis

More information