Sparse Representation and the K-SVD Algorithm
The CS Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel
University of Erlangen-Nürnberg, April 2008

Noise Removal?
Our story begins with image denoising: removing additive noise.
- A practical application.
- A convenient platform (being the simplest inverse problem) for testing basic ideas in image processing.

Denoising by Energy Minimization
Many of the proposed denoising algorithms are related to the minimization of an energy function of the form
  f(x) = (1/2) ||x - y||_2^2 + Pr(x),
where y is the given measurements and x is the unknown to be recovered. The first term enforces sanity (the relation to the measurements); the second is the prior, or regularization. This is in fact a Bayesian point of view, adopting Maximum A-posteriori Probability (MAP) estimation. [Thomas Bayes, 1702-1761]
Clearly, the wisdom in such an approach lies in the choice of the prior, i.e., in modeling the images of interest.

The Evolution of Pr(x)
During the past several decades we have made all sorts of guesses about the prior Pr(x) for images:
- Energy: Pr(x) = λ ||x||_2^2
- Smoothness: Pr(x) = λ ||Lx||_2^2
- Adapt+Smooth: Pr(x) = λ ||Lx||_W^2 (related to the bilateral filter)
- Robust statistics: Pr(x) = λ ρ{Lx}
- Total Variation: Pr(x) = λ ||∇x||_1
- Wavelet sparsity: Pr(x) = λ ||Wx||_1
- ... and, in this talk, priors built on sparse & redundant representations.
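Since the whole talk builds on this energy-minimization view, here is a minimal sketch (not from the talk) of MAP denoising with the quadratic smoothness prior Pr(x) = λ||Lx||_2^2. The test signal, noise level, and λ are illustrative assumptions; the point is that a quadratic f(x) has a closed-form minimizer.

```python
# MAP denoising sketch with the "smoothness" prior Pr(x) = lam * ||L x||^2.
# All sizes and the lambda value are illustrative choices, not from the talk.
import numpy as np

n = 100
rng = np.random.default_rng(0)

# Smooth test signal plus additive noise.
x_true = np.sin(np.linspace(0, 3 * np.pi, n))
y = x_true + 0.3 * rng.standard_normal(n)

# First-order difference operator L (maps R^n -> R^(n-1)).
L = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)

# f(x) = 0.5||x - y||^2 + lam||Lx||^2 is quadratic, so setting the
# gradient to zero gives the normal equations (I + 2*lam*L^T L) x = y.
lam = 2.0
x_hat = np.linalg.solve(np.eye(n) + 2 * lam * L.T @ L, y)

print("noisy MSE   :", np.mean((y - x_true) ** 2))
print("denoised MSE:", np.mean((x_hat - x_true) ** 2))
```

The sparse-representation priors discussed next give up this closed form in exchange for a far more expressive image model.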
Agenda
1. A Visit to Sparseland - introducing sparsity & overcompleteness
2. Transforms & Regularizations - how & why should this work?
3. What about the dictionary? - the quest for the origin of signals
4. Putting it all together - image filling-in, denoising, compression, ...

Welcome to Sparseland: Generating Signals
A signal x in R^N is generated by the model M as x = D α, where:
- D is a fixed N x K dictionary (K > N); every column of D is a prototype signal (an atom).
- α is a sparse & random vector: generated randomly with few non-zeros, in random locations and with random values.

Sparseland Signals Are Special
- Simple: every generated signal is built as a linear combination of only a few atoms from the dictionary D.
- Effective: recent works adopt this model and successfully deploy it to applications.
- Empirically established: neurological studies show similarity between this model and early vision processes [Olshausen & Field ('96)].

Transforms in Sparseland?
Assume that x in R^N is known to emerge from M. How about "Given x, find the α in R^K that generated it in M"?
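As a concrete companion to the generation model above, here is a minimal sketch of drawing a Sparseland signal x = Dα. The sizes N, K and the sparsity level k are illustrative assumptions.

```python
# Generating a Sparseland signal x = D @ alpha.
import numpy as np

rng = np.random.default_rng(1)
N, K, k = 30, 60, 4          # signal length, number of atoms, non-zeros

# Random overcomplete dictionary with L2-normalized columns (atoms).
D = rng.standard_normal((N, K))
D /= np.linalg.norm(D, axis=0)

# Sparse representation: k random locations, random values.
alpha = np.zeros(K)
support = rng.choice(K, size=k, replace=False)
alpha[support] = rng.standard_normal(k)

x = D @ alpha                # the generated Sparseland signal
print("non-zeros in alpha:", np.count_nonzero(alpha))
```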
Problem Statement
We need to solve an under-determined linear system of equations: D α = x. Among all (infinitely many) possible solutions we want the sparsest!

Measure of Sparsity?
We will measure sparsity using the L0 "norm": ||α||_0. Consider the family
  ||α||_p^p = Σ_j |α_j|^p.
As p → 0 we get ||α||_0, a count of the non-zeros in the vector.

Where We Are
A sparse & random vector α is multiplied by D to give x = Dα; we then solve
  (Q): min ||α||_0 s.t. D α = x
to obtain α̂.

Inverse Problems in Sparseland?
Assume that x is known to emerge from M. Suppose we observe y = Hx + v, a degraded and noisy version of x, with ||v||_2 ≤ ε. How do we recover x? How about "find the α that generated y"?

3 Major Questions
- Is α̂ = α?
- The problem is NP-hard: are there practical ways to get α̂?
- How do we know D?
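The following sketch illustrates the l_p measure above on a toy vector of my choosing: as p shrinks, ||α||_p^p approaches the L0 count of non-zeros.

```python
# The l_p "norm" ||a||_p^p = sum_j |a_j|^p tends to the L0 count as p -> 0.
import numpy as np

a = np.array([0.0, 1.5, 0.0, -0.2, 3.0])
for p in [2.0, 1.0, 0.5, 0.1, 0.01]:
    lp = np.sum(np.abs(a[a != 0]) ** p)   # |0|^p = 0 for any p > 0
    print(f"p = {p:5.2f}  ->  ||a||_p^p = {lp:6.3f}")
print("L0 count of non-zeros:", np.count_nonzero(a))
```

For p = 0.01 the sum is already close to 3, the number of non-zero entries.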
Inverse Problem Statement
A sparse & random vector α is multiplied by D, "blurred" by H, and corrupted by noise: y = H D α + v. We solve
  min ||α||_0 s.t. ||y - H D α||_2 ≤ ε
to obtain α̂, and recover x̂ = D α̂.
3 Major Questions (again!)
- Is α̂ ≈ α?
- How can we compute α̂?
- What D should we use?

Agenda
1. A Visit to Sparseland - introducing sparsity & overcompleteness
2. Transforms & Regularizations - how & why should this work?
3. What about the dictionary? - the quest for the origin of signals
4. Putting it all together - image filling-in, denoising, compression, ...

The Sparse Coding Problem
Our dream for now: find the sparsest solution to
  (P0): min ||α||_0 s.t. D α = x,
with both D and x known.

Question 1 - Uniqueness?
Suppose we can solve this exactly. Why should we necessarily get α̂ = α? It might happen that eventually ||α̂||_0 < ||α||_0.
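To make the combinatorial nature of (P0) tangible, here is a brute-force sketch (illustrative sizes, not from the talk) that searches supports of growing size; the exponential blow-up of this search is exactly why (P0) is NP-hard in general.

```python
# Brute-force P0: search over supports of D alpha = x, smallest first.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
N, K = 8, 16
D = rng.standard_normal((N, K))
D /= np.linalg.norm(D, axis=0)

# Plant a 2-sparse ground truth.
alpha_true = np.zeros(K)
alpha_true[[3, 11]] = [1.0, -2.0]
x = D @ alpha_true

found = None
for s in range(1, N + 1):            # try supports of size 1, 2, ...
    for supp in combinations(range(K), s):
        cols = list(supp)
        coef = np.linalg.lstsq(D[:, cols], x, rcond=None)[0]
        if np.linalg.norm(D[:, cols] @ coef - x) < 1e-10:
            found = supp
            break
    if found:
        break
print("sparsest support:", found)    # expected: (3, 11)
```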
atri "Spar" efinition: Given a matri, σspar{} is the smallest and and number of columns that are linearly dependent. By definition, if v then v σ Say I have and you have, and the two are different representations of the same : ( ) σ onoho & Elad ( ) Uniqueness Rule σ Now, what if my satisfies <? σ he rule σ implies that! > Uniqueness onoho & Elad ( ) If we have a representation that satisfies σ > then necessarily it is the sparsest. So, if generates signals using "sparse enough", the solution of P : s.t. will find them eactly. 7 8 Question Practical P Solver? atching Pursuit (P) allat & Zhang (993) ultiply by s.t. ˆ he P is a greedy algorithm that finds one atom at a time. Step : find the one atom that best matches the signal. Net steps: given the previously found atoms, find the net one to best fit Are there reasonable ways to find ˆ? he Orthogonal P (OP) is an improved version that re-evaluates the coefficients after each round. 9
Basis Pursuit (BP) [Chen, Donoho & Saunders (1995)]
Instead of solving min ||α||_0 s.t. D α = x, solve min ||α||_1 s.t. D α = x. The newly defined problem is convex (a linear program). Very efficient solvers can be deployed:
- Interior-point methods [Chen, Donoho & Saunders ('95)],
- Iterated shrinkage [Figueiredo & Nowak ('03), Daubechies, Defrise & De Mol ('04), Elad ('05), Elad, Matalon & Zibulevsky ('06)].

Question 3 - Approximation Quality?
How effective are MP/BP?

BP and MP Performance [Donoho & Elad ('02), Gribonval & Nielsen ('03), Tropp ('03), Temlyakov ('03)]
Given a signal x with a representation x = Dα, if ||α||_0 < (some threshold), then BP and MP are guaranteed to find the sparsest solution.
- MP and BP are different in general (it is hard to say which is better).
- The above results correspond to the worst case. Average-performance results are available too, showing much better bounds [Donoho ('04), Candes et al. ('04), Tanner et al. ('05), Tropp et al. ('06)].
- Similar results exist for general inverse problems [Donoho, Elad & Temlyakov ('04), Tropp ('04), Fuchs ('04), Gribonval et al. ('05)].

Agenda
1. A Visit to Sparseland - introducing sparsity & overcompleteness
2. Transforms & Regularizations - how & why should this work?
3. What about the dictionary? - the quest for the origin of signals
4. Putting it all together - image filling-in, denoising, compression, ...
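Before moving to the dictionary question, here is a sketch of the BP problem above cast as a linear program via the standard split α = u - v with u, v ≥ 0, solved with SciPy's `linprog`. The split and the solver choice are my assumptions for illustration, not the talk's implementation.

```python
# Basis Pursuit:  min ||alpha||_1  s.t.  D alpha = x,  as an LP.
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, x):
    N, K = D.shape
    c = np.ones(2 * K)            # sum(u) + sum(v) = ||alpha||_1 on the split
    A_eq = np.hstack([D, -D])     # D u - D v = x
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=(0, None), method="highs")
    assert res.success, res.message
    u, v = res.x[:K], res.x[K:]
    return u - v

# Usage on the toy (D, x) from the earlier sketches:
# alpha_bp = basis_pursuit(D, x)
```

At the optimum, at most one of u_j, v_j is non-zero for each j, so u - v really attains the l1 minimum.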
Problem Setting
Given P examples X = {x_j}, j = 1, ..., P, and a fixed-size (N x K) dictionary D, how would we find D?
- The examples are linear combinations of atoms from D: X ≈ D A.
- Each example has a sparse representation with no more than L atoms.

The Objective Function
  min_{D,A} ||D A - X||_F^2 s.t. for all j, ||α_j||_0 ≤ L
(N, K, and L are assumed known; D has normalized columns.)

K-SVD: An Overview [Aharon, Elad & Bruckstein ('04)]
- Initialize D.
- Sparse coding: use MP or BP.
- Dictionary update: column-by-column, by SVD computation.
- Repeat.

K-SVD: Sparse Coding Stage
Fixing D, minimize ||D A - X||_F^2 s.t. for all j, ||α_j||_0 ≤ L. For the j-th example we solve
  min_α ||x_j - D α||_2^2 s.t. ||α||_0 ≤ L.
Ordinary sparse coding!
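A minimal sketch of this stage, reusing the `omp()` routine sketched earlier (an assumption; any pursuit method would do, as the slide notes):

```python
# K-SVD sparse-coding stage: code every example column of X with <= L atoms.
import numpy as np

def sparse_coding_stage(D, X, L):
    """Return A whose columns satisfy x_j ~ D alpha_j, ||alpha_j||_0 <= L."""
    K, P = D.shape[1], X.shape[1]
    A = np.zeros((K, P))
    for j in range(P):
        A[:, j] = omp(D, X[:, j], L)   # omp() as defined in the earlier sketch
    return A
```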
K-SVD: Dictionary Update Stage
For the k-th atom we solve
  min_{d_k, a_k} ||E_k - d_k a_k^T||_F^2,
where a_k is the k-th row of A and
  E_k = X - Σ_{j ≠ k} d_j a_j^T
is the residual without atom k's contribution. This rank-1 approximation is solved exactly with the SVD. But wait! What about sparsity? Only some of the examples use the column d_k. When updating a_k, we recompute only the coefficients corresponding to those examples: restrict E_k to the relevant columns and apply the SVD there. This preserves the supports while still reducing the error, so we can do better than a blind rank-1 fit.

The K-SVD Algorithm: Summary
- Initialize D.
- Sparse coding: use MP or BP.
- Dictionary update: column-by-column, by SVD computation.
- Repeat.
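A sketch of the dictionary-update stage exactly as described above (restrict to the examples using each atom, strip its contribution, take the leading singular pair); this is a readable sketch, not the authors' reference implementation.

```python
# K-SVD dictionary update: one rank-1 SVD fit per atom, supports preserved.
import numpy as np

def ksvd_dictionary_update(D, A, X):
    for k in range(D.shape[1]):
        users = np.nonzero(A[k, :])[0]        # examples that use atom k
        if users.size == 0:
            continue                          # unused atom: leave as-is
        # Residual of those examples with atom k's contribution removed.
        E_k = X[:, users] - D @ A[:, users] + np.outer(D[:, k], A[k, users])
        # Best rank-1 fit E_k ~ d_k a_k^T via the leading singular pair.
        U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
        D[:, k] = U[:, 0]                     # unit-norm by construction
        A[k, users] = s[0] * Vt[0, :]         # update only the used coefficients
    return D, A
```

Alternating `sparse_coding_stage` and `ksvd_dictionary_update` gives the full training loop summarized above.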
Agenda
1. A Visit to Sparseland - introducing sparsity & overcompleteness
2. Transforms & Regularizations - how & why should this work?
3. What about the dictionary? - the quest for the origin of signals
4. Putting it all together - image filling-in, denoising, compression, ...

Image Inpainting: Theory
Assumption: the signal x was created by x = D α0 with a very sparse α0. Missing values in x imply missing rows in this linear system. By removing these rows, we get a reduced system D̃ α = x̃. Now solve
  min ||α||_0 s.t. D̃ α = x̃.
If α0 was sparse enough, it will be the solution of the above problem! Thus, computing D α̂ recovers x perfectly.

Inpainting: The Practice
We define a diagonal mask operator W representing the lost samples, so that y = W x + v, with w_{i,i} ∈ {0,1}. Given y, we try to recover the representation of x by solving
  α̂ = Argmin_α ||α||_0 s.t. ||y - W D α||_2 ≤ ε, and then x̂ = D α̂.
We use a dictionary that is the combination of two dictionaries, to get an effective representation of both texture and cartoon content. This also leads to image separation [Elad, Starck & Donoho ('05)].

Inpainting Results
[Figure: source image and inpainted outcome; dictionary: Curvelet (cartoon) + global DCT (texture).]
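A sketch of the inpainting theory above for a 1-D signal: drop the rows of D at the missing positions, sparse-code the surviving samples, and predict the full signal as D α̂. It reuses the toy `omp()` from earlier and, for simplicity, skips re-normalizing the reduced atoms (a simplification of this sketch).

```python
# Inpainting sketch: solve on the observed rows, synthesize the full signal.
import numpy as np

def inpaint(D, x_observed, mask, L):
    """mask: boolean vector, True where samples of x survive."""
    D_tilde = D[mask, :]                    # remove rows of missing samples
    alpha_hat = omp(D_tilde, x_observed[mask], L)
    return D @ alpha_hat                    # recovered full-length signal
```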
Inpainting Results
[Figure: source image and inpainted outcome; dictionary: Curvelet (cartoon) + overlapped DCT (texture).]

Inpainting Results
[Figure: inpainting with 20%, 50%, and 80% missing samples, and the corresponding outcomes.]

Denoising: Theory and Practice
Given a noisy image y, we can clean it by solving
  α̂ = Argmin_α ||α||_0 s.t. ||y - D α||_2 ≤ ε, and then x̂ = D α̂.
Can we use the K-SVD dictionary? With K-SVD, we cannot train a dictionary for an entire image. How do we go from local treatment of patches to a global prior? Solution: force shift-invariant sparsity on each N x N patch of the image, including overlaps.

From Local to Global Treatment
For patches, our MAP penalty becomes
  x̂ = Argmin_{x, {α_ij}} (1/2) ||x - y||_2^2 + μ Σ_ij ||R_ij x - D α_ij||_2^2 s.t. ||α_ij||_0 ≤ L,
where R_ij is the operator that extracts the (i,j)-th patch; the sum over patches is our prior.
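To make the notation concrete, here is a sketch of the patch-extraction operator R_ij and of evaluating the global MAP penalty above. The representation of {α_ij} as a dict keyed by patch corner is my assumption for illustration.

```python
# R_ij and the global MAP penalty over overlapping patches.
import numpy as np

def extract_patch(image, i, j, n):
    """R_ij x: the n x n patch with top-left corner (i, j), flattened."""
    return image[i:i + n, j:j + n].reshape(-1)

def map_penalty(x_img, y_img, D, alphas, mu, n):
    """0.5||x - y||^2 + mu * sum_ij ||R_ij x - D alpha_ij||^2."""
    val = 0.5 * np.sum((x_img - y_img) ** 2)
    for (i, j), a in alphas.items():        # alphas: {(i, j): alpha_ij}
        val += mu * np.sum((extract_patch(x_img, i, j, n) - D @ a) ** 2)
    return val
```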
What Data to Train On?
- Option 1: use a database of images. Works quite well (~0.5-1 dB below the state of the art).
- Option 2: use the corrupted image itself! Simply sweep through all N x N patches (with overlaps) and use them to train. An image of 1000 x 1000 pixels yields ~10^6 examples, more than enough. This works much better!

Image Denoising: The Algorithm
  x̂ = Argmin_{x, D, {α_ij}} (1/2) ||x - y||_2^2 + μ Σ_ij ||R_ij x - D α_ij||_2^2 s.t. ||α_ij||_0 ≤ L
- With x and D known: compute α_ij per patch by solving min_α ||R_ij x - D α||_2^2 s.t. ||α||_0 ≤ L, using matching pursuit.
- With x and the α_ij known: compute D to minimize Σ_ij ||R_ij x - D α_ij||_2^2 using the SVD, updating one column at a time (K-SVD).
- With D and the α_ij known: compute x by
  x̂ = (I + μ Σ_ij R_ij^T R_ij)^{-1} (y + μ Σ_ij R_ij^T D α_ij),
which is a simple averaging of the shifted patches.

Denoising Results
[Figure: source, noisy image (PSNR 22.1 dB), and result (PSNR 30.829 dB); initial dictionary: overcomplete DCT (64 x 256); obtained dictionary after 10 iterations.]

Denoising Results: 3D
[Figure: source: Vis. Male Head (slice #137); 2D-KSVD vs. 3D-KSVD results, with the 3D variant achieving the higher PSNR.]
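The closed-form image update in the algorithm above looks daunting, but both operators in it are diagonal accumulations, so it reduces to averaging the denoised overlapping patches back into the image. A sketch, with {α_ij} again assumed stored as a dict keyed by patch corner:

```python
# Image update: x = (I + mu*sum R^T R)^(-1) (y + mu*sum R^T D alpha).
import numpy as np

def average_patches(y_img, D, alphas, mu, n):
    acc = y_img.copy()                      # the y term
    weight = np.ones_like(y_img)            # the identity term
    for (i, j), a in alphas.items():        # alphas: {(i, j): alpha_ij}
        patch = (D @ a).reshape(n, n)
        acc[i:i + n, j:j + n] += mu * patch     # R_ij^T D alpha_ij
        weight[i:i + n, j:j + n] += mu          # diagonal of R_ij^T R_ij
    return acc / weight                     # element-wise diagonal solve
```

Each pixel ends up as a weighted average of the noisy value and all the denoised patches that cover it.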
Image Compression
The problem: compressing photo-ID images. General-purpose methods (JPEG, JPEG2000) do not take the specific image family into account. By adapting to the image content, better results can be obtained.

Compression: The Algorithm
Training (on a set of 2500 images):
- Detect the main features and align the images to a common reference (20 parameters).
- Divide each image into disjoint 15 x 15 patches, and for each patch position compute a unique dictionary.
Compression (per image):
- Detect features and align.
- Divide into the disjoint patches, and sparse-code each patch.
- Quantize and entropy-code.

Compression Results
[Figure: original vs. JPEG, JPEG2000, PCA, and K-SVD reconstructions at 820 bytes per image; RMSE values at the bottom of each image.]
[Figure: the same comparison at 550 bytes per image; RMSE values at the bottom of each image.]
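A sketch of the per-image compression loop described above, after alignment. The patch size default, sparsity level, uniform quantization step, and the dict of per-position dictionaries are all illustrative assumptions, not the system's actual parameters.

```python
# Compression sketch: disjoint patches, per-position sparse coding,
# uniform quantization of the surviving coefficients.
import numpy as np

def compress_image(image, dictionaries, L=4, step=0.05, n=15):
    """dictionaries: {(i, j): D_ij}, one trained per disjoint patch position."""
    code = {}
    for (i, j), D_ij in dictionaries.items():
        patch = image[i:i + n, j:j + n].reshape(-1)
        alpha = omp(D_ij, patch, L)                   # sparse-code the patch
        idx = np.nonzero(alpha)[0]
        q = np.round(alpha[idx] / step).astype(int)   # quantize the values
        code[(i, j)] = (idx, q)                       # entropy-code these in practice
    return code
```

Decoding reverses the steps: dequantize, synthesize each patch as D_ij @ alpha, and undo the alignment.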
Today We Have Discussed
1. A Visit to Sparseland - introducing sparsity & overcompleteness
2. Transforms & Regularizations - how & why should this work?
3. What about the dictionary? - the quest for the origin of signals
4. Putting it all together - image filling-in, denoising, compression, ...

Summary
- Sparsity and overcompleteness are important ideas for designing better tools in signal and image processing.
- Coping with an NP-hard problem: approximation algorithms can be used; they are theoretically established and work well in practice.
- What dictionary to use? Several fixed dictionaries already exist; we have shown how to practically train D using the K-SVD.
- How is all this used? We have seen inpainting, denoising, and compression algorithms.
- What next? (a) Generalizations: multiscale, nonnegative, ...; (b) speed-ups and improved algorithms; (c) deployment to other applications.

Why Over-Completeness?
[Figure: a test signal synthesized from a few atoms (e.g., {φ1 + 0.3φ2} of DCT content plus spikes); its DCT coefficients alone are dense, shown over 64 and 128 coefficient axes.]

Desired Decomposition
[Figure: the desired decomposition over the combined DCT + Spike (Identity) dictionary, where only the few truly active DCT and spike coefficients are non-zero.]
Inpainting Results
[Figure: 70% missing samples, reconstructed with DCT, Haar, and K-SVD dictionaries; 90% missing samples, the same comparison. RMSE values are shown per result, with K-SVD attaining the lowest error in both cases.]