
1 Bhaskar Rao, Department of Electrical and Computer Engineering, University of California, San Diego
David Wipf, Biomedical Imaging Laboratory, University of California, San Francisco
Special Thanks to Yuzhe Jin

2 Outline: Part 1
Motivation for Tutorial
Sparse Signal Recovery Problem
Applications
Computational Algorithms: Greedy Search, ℓ1 norm minimization
Performance Guarantees

3 Outline: Part 2
Motivation: Limitations of popular inverse methods
Maximum a posteriori (MAP) estimation
Bayesian Inference
Analysis of Bayesian inference and connections with MAP
Applications to neuroimaging

4 Outline: Part 1
Motivation for Tutorial
Sparse Signal Recovery Problem
Applications
Computational Algorithms: Greedy Search, ℓ1 norm minimization
Performance Guarantees

5 Early Works
R. R. Hocking and R. N. Leslie, "Selection of the Best Subset in Regression Analysis," Technometrics.
S. Singhal and B. S. Atal, "Amplitude Optimization and Pitch Estimation in Multipulse Coders," IEEE Trans. Acoust., Speech, Signal Processing, 1989.
S. D. Cabrera and T. W. Parks, "Extrapolation and spectral estimation with iterative weighted norm modification," IEEE Trans. Acoust., Speech, Signal Processing, April 1991.
Many more works.
Our first work: I. F. Gorodnitsky, B. D. Rao and J. George, "Source Localization in Magnetoencephalography using an Iterative Weighted Minimum Norm Algorithm," IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, pages 167-171, Oct. 1992.

6 Early Session on Sparsity
Organized with Prof. Bresler a special session at the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing:
SPEC-DSP: SIGNAL PROCESSING WITH SPARSENESS CONSTRAINT
Signal Processing with the Sparseness Constraint, B. Rao (University of California, San Diego, USA)
Application of Basis Pursuit in Spectrum Estimation, S. Chen (IBM, USA); D. Donoho (Stanford University, USA)
Parsimony and Wavelet Method for Denoising, H. Krim (MIT, USA); J. Pesquet (University Paris Sud, France); I. Schick (GTE Internetworking and Harvard Univ., USA)
Parsimonious Side Propagation, P. Bradley, O. Mangasarian (University of Wisconsin-Madison, USA)
Fast Optimal and Suboptimal Algorithms for Sparse Solutions to Linear Inverse Problems, G. Harikumar (Tellabs Research, USA); C. Couvreur, Y. Bresler (University of Illinois, Urbana-Champaign, USA)
Measures and Algorithms for Best Basis Selection, K. Kreutz-Delgado, B. Rao (University of California, San Diego, USA)
Sparse Inverse Solution Methods for Signal and Image Processing Applications, B. Jeffs (Brigham Young University, USA)
Image Denoising Using Multiple Compaction Domains, P. Ishwar, K. Ratakonda, P. Moulin, N. Ahuja (University of Illinois, Urbana-Champaign, USA)

7 Motivation for Tutorial
Sparse Signal Recovery is an interesting area with many potential applications. Unification of the theory will provide synergy.
Methods developed for solving the Sparse Signal Recovery problem can be a valuable tool for signal processing practitioners.
Many interesting developments in the recent past make the subject timely.

8 Outline: Part 1
Motivation for Tutorial
Sparse Signal Recovery Problem
Applications
Computational Algorithms: Greedy Search, ℓ1 norm minimization
Performance Guarantees

9 Problem Description
y = Φx + ε
y is the n×1 measurement vector.
Φ is the n×m dictionary matrix, with m >> n.
x is the m×1 desired vector, which is sparse with k nonzero entries.
ε is the additive noise, modeled as additive white Gaussian.

10 Problem Statement
Noise-Free Case: Given a target signal y and a dictionary Φ, find the weights x that solve:
min_x Σ_{i=1}^m I(x_i ≠ 0) subject to y = Φx
where I(.) is the indicator function.
Noisy Case: Given a target signal y and a dictionary Φ, find the weights x that solve:
min_x Σ_{i=1}^m I(x_i ≠ 0) subject to ||y − Φx||_2 ≤ β, for some noise-dependent tolerance β.

11 Complexity
Search over all possible subsets, which would mean a search over a total of C(m, k) subsets. Combinatorial complexity.
With m = 30, n = 20, and k = 10 there are C(30, 10) ≈ 3 × 10^7 subsets (very complex).
A branch and bound algorithm can be used to find the optimal solution. The space of subsets searched is pruned, but the search may still be very complex.
The indicator function is not continuous and so is not amenable to standard optimization tools.
Challenge: Find low-complexity methods with acceptable performance.
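To make the combinatorial cost concrete, here is a minimal, self-contained sketch (not from the tutorial) of the exhaustive search over all C(m, k) supports; the dictionary, measurement, and sparsity level below are illustrative placeholders.

```python
import itertools
import numpy as np

def exhaustive_l0(Phi, y, k):
    """Brute-force search over all k-column supports of Phi.

    Returns the support and coefficients with the smallest least-squares
    residual -- feasible only for tiny problems, since the loop visits
    C(m, k) subsets."""
    n, m = Phi.shape
    best_support, best_coeffs, best_res = None, None, np.inf
    for support in itertools.combinations(range(m), k):
        cols = Phi[:, support]
        coeffs, *_ = np.linalg.lstsq(cols, y, rcond=None)
        res = np.linalg.norm(y - cols @ coeffs)
        if res < best_res:
            best_support, best_coeffs, best_res = support, coeffs, res
    return best_support, best_coeffs, best_res

# Tiny illustrative example (m = 10, k = 2): already 45 subsets to check.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((8, 10))
x_true = np.zeros(10); x_true[[2, 7]] = [1.0, -0.5]
y = Phi @ x_true
print(exhaustive_l0(Phi, y, k=2)[0])   # expected support: (2, 7)
```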

12 Outline: Part 1
Motivation for Tutorial
Sparse Signal Recovery Problem
Applications
Computational Algorithms: Greedy Search, ℓ1 norm minimization
Performance Guarantees

13 Applications
Signal Representation (Mallat, Coifman, Wickerhauser, Donoho, ...)
EEG/MEG (Leahy, Gorodnitsky, Ioannides, ...)
Functional Approximation and Neural Networks (Chen, Natarajan, Cun, Hassibi, ...)
Bandlimited extrapolation and spectral estimation (Papoulis, Lee, Cabrera, Parks, ...)
Speech Coding (Ozawa, Ono, Kroon, Atal, ...)
Sparse channel equalization (Fevrier, Greenstein, Proakis, ...)
Compressive Sampling (Donoho, Candes, Tao, ...)
Magnetic Resonance Imaging (Lustig, ...)

14 DFT Example
Measurement: y[l] = 2(cos ω0·l + cos ω1·l), l = 0, 1, 2, ..., n−1, for two closely spaced frequencies ω0, ω1.
Dictionary elements: φ_l^(m) = [1, e^{jω_l}, e^{j2ω_l}, ..., e^{j(n−1)ω_l}]^T, with ω_l = 2πl/m.
Consider m = 64, 128, 256 and 512.
Questions:
What is the result of a zero-padded DFT?
When viewed as the problem of solving a linear system of equations with dictionary Φ^(m), what solution does the DFT give us?
Are there more desirable solutions for this problem?

15 DFT Example
Note that y can be written exactly as a combination of a few columns of Φ^(128), of Φ^(256), and of Φ^(512) (four columns of each, two per cosine).
Consider the linear system of equations y = Φ^(m) x.
The frequency components in the data are in the dictionaries Φ^(m) for m = 128, 256, 512.
What solution among all possible solutions does the DFT compute?

16 DFT Example
(Figure: zero-padded DFT solutions for m = 64, 128, 256, and 512.)
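A hedged illustration of the question above: the sketch below builds a two-tone signal and inspects how many DFT coefficients are significant for several zero-padding lengths m. The specific frequencies and the record length n = 64 are illustrative choices, not the tutorial's values.

```python
import numpy as np

n = 64
# Illustrative closely spaced frequencies (placeholders)
w0, w1 = 2 * np.pi * 0.10, 2 * np.pi * 0.11
l = np.arange(n)
y = 2 * (np.cos(w0 * l) + np.cos(w1 * l))

for m in (64, 128, 256, 512):
    # Zero-padded DFT: fitting y with the m-column Fourier dictionary
    Y = np.fft.fft(y, n=m)
    mags = np.abs(Y)
    # The DFT solution is not sparse: many coefficients carry energy
    big = np.sum(mags > 0.1 * mags.max())
    print(f"m = {m:4d}: {big} coefficients above 10% of the peak magnitude")
```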

17 Sparse Channel Estimation
Received sequence: r(i) = Σ_{j=0}^{m−1} s(i − j) c(j) + ε(i), i = 0, 1, ..., n−1
s: training sequence; c: channel impulse response; ε: noise.

18 Example: Sparse Channel Estimation
Formulated as a sparse signal recovery problem:
[r(0); r(1); ...; r(n−1)] = [s(0) s(−1) ... s(−m+1); s(1) s(0) ... s(−m+2); ...; s(n−1) s(n−2) ... s(n−m)] · [c(0); c(1); ...; c(m−1)] + [ε(0); ε(1); ...; ε(n−1)]
Can use any relevant algorithm to estimate the sparse channel coefficients.
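As a rough sketch of how this becomes a matrix sparse recovery problem, the code below assembles the convolution matrix S so that r ≈ S c. The training sequence, channel taps, and noise level are made-up values, and the training sequence is assumed to be zero outside the observed record.

```python
import numpy as np

def convolution_matrix(s, m, n):
    """Build the n-by-m matrix with entries s[i - j] (zero outside the record),
    so that r = S @ c + noise models the received sequence above."""
    S = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            if 0 <= i - j < len(s):
                S[i, j] = s[i - j]
    return S

rng = np.random.default_rng(1)
m, n = 20, 60
s = rng.choice([-1.0, 1.0], size=n)                 # training sequence
c = np.zeros(m); c[[1, 7, 15]] = [0.9, -0.4, 0.2]   # sparse channel
S = convolution_matrix(s, m, n)
r = S @ c + 0.01 * rng.standard_normal(n)           # received sequence
# Any sparse recovery algorithm (e.g., OMP later in this part) can now estimate c from (S, r).
```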

19 Compressive Sampling
D. Donoho, "Compressed Sensing," IEEE Trans. on Information Theory, 2006.
E. Candes and T. Tao, "Near Optimal Signal Recovery from Random Projections: Universal Encoding Strategies," IEEE Trans. on Information Theory, 2006.

20 Compressive Sampling
Transform coding: represent the signal x by a sparse coefficient vector b in a transform basis.
What is the problem here? Sampling at the Nyquist rate, then keeping only a small number of nonzero coefficients.
Can we directly acquire the signal below the Nyquist rate?

21 Compressive Sampling
Transform coding: x is represented by its sparse transform coefficients b.
Compressive sampling: acquire y = Ax directly, where the sampling matrix A has far fewer rows than the signal dimension.

22 Compressive Sampling
Computation:
1. Solve for the sparse coefficient vector w consistent with the compressed measurements y.
2. Reconstruction: map w back to the signal x through the transform.
Issues:
Need to recover the sparse signal w under the constraint imposed by y.
Need to design the sampling matrix A.

23 Outline: Part 1
Motivation for Tutorial
Sparse Signal Recovery Problem
Applications
Computational Algorithms: Greedy Search, ℓ1 norm minimization
Performance Guarantees

24 Potential Approaches
Combinatorial complexity, so alternate strategies are needed:
Greedy Search Techniques: Matching Pursuit, Orthogonal Matching Pursuit.
Minimizing Diversity Measures: The indicator function is not continuous. Define surrogate cost functions that are more tractable and whose minimization leads to sparse solutions, e.g., ℓ1 minimization.
Bayesian Methods: Make appropriate statistical assumptions on the solution and apply estimation techniques to identify the desired sparse solution.

25 Greedy Search Method: Matching Pursuit
Select the column that is most aligned with the current residual:
l = argmax_{1 ≤ j ≤ m} |φ_j^T r^(i−1)|, with r^(0) = y and S^(i) the set of indices selected so far.
Remove its contribution from the residual:
Update S^(i): S^(i) = S^(i−1) ∪ {l} if l ∉ S^(i−1); otherwise keep S^(i) the same.
Update r^(i): r^(i) = r^(i−1) − P_{φ_l} r^(i−1) = r^(i−1) − φ_l φ_l^T r^(i−1)
Practical stopping criteria: a fixed number of iterations, or ||r^(i)||_2 smaller than a threshold.
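A minimal Matching Pursuit sketch following the steps above (numpy only; the iteration cap and tolerance are illustrative choices):

```python
import numpy as np

def matching_pursuit(Phi, y, max_iter=50, tol=1e-6):
    """Plain Matching Pursuit: repeatedly pick the column most correlated
    with the residual (the slides assume unit-norm columns) and subtract
    its rank-one projection from the residual."""
    n, m = Phi.shape
    x = np.zeros(m)
    r = y.copy()
    for _ in range(max_iter):
        corr = Phi.T @ r
        l = int(np.argmax(np.abs(corr)))
        coef = corr[l] / (Phi[:, l] @ Phi[:, l])   # division handles non-unit norms
        x[l] += coef                                # the same index may be revisited
        r = r - coef * Phi[:, l]                    # remove the projection onto phi_l
        if np.linalg.norm(r) < tol:
            break
    return x, r
```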

26 Greedy Search Method: Orthogonal Matching Pursuit (OMP)
Select the column that is most aligned with the current residual:
l = argmax_{1 ≤ j ≤ m} |φ_j^T r^(i−1)|, with r^(0) = y and S^(i) the set of indices selected so far.
Remove its contribution from the residual:
Update S^(i): S^(i) = S^(i−1) ∪ {l}
Update r^(i): r^(i) = r^(i−1) − P_{[φ_{l_1}, φ_{l_2}, ..., φ_{l_i}]} r^(i−1), the projection of the residual onto the orthogonal complement of all columns selected so far.
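A corresponding OMP sketch; unlike Matching Pursuit it refits all selected coefficients by least squares at every step, which is the orthogonalization described above (the sparsity level k is assumed known here):

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal Matching Pursuit: after each new index is selected, refit the
    coefficients on all selected columns so the residual stays orthogonal to
    the chosen subspace."""
    n, m = Phi.shape
    support = []
    r = y.copy()
    for _ in range(k):
        l = int(np.argmax(np.abs(Phi.T @ r)))
        if l not in support:
            support.append(l)
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        r = y - Phi[:, support] @ coeffs
    x = np.zeros(m)
    x[support] = coeffs
    return x, support
```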

27 Greedy Search Method: Order Recursive OMP
Select the column that maximizes the normalized correlation with the current residual:
l = argmax_{1 ≤ j ≤ m} |φ̃_j^{(i−1)T} r^(i−1)|² / ||φ̃_j^(i−1)||²_2, with r^(0) = y and S^(i) the set of indices selected so far.
Remove its contribution from the residual:
Update S^(i): S^(i) = S^(i−1) ∪ {l}
Update r^(i): r^(i) = r^(i−1) − P_{[φ_{l_1}, ..., φ_{l_i}]} r^(i−1)
Update φ̃_j^(i) = (I − P_{[φ_{l_1}, ..., φ_{l_i}]}) φ_j; this can be computed recursively.

28 Deficiency of Matching Pursuit Type Algorithms
If the algorithm picks a wrong index at an iteration, there is no way to correct this error in subsequent iterations.
Some Recent Algorithms:
Stagewise Orthogonal Matching Pursuit (Donoho, Tsaig, ...)
CoSaMP (Needell, Tropp)

29 Inverse Techniques
For the system of equations Φx = y, the solution set is characterized by {x_s : x_s = Φ⁺y + v, v ∈ N(Φ)}, where N(Φ) denotes the null space of Φ and Φ⁺ = Φ^T(ΦΦ^T)⁻¹.
Minimum Norm Solution: The minimum ℓ2 norm solution x_mn = Φ⁺y is a popular solution.
Noisy Case: The regularized ℓ2 norm solution is often employed and is given by x_reg = Φ^T(ΦΦ^T + λI)⁻¹y.
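The two closed-form solutions above translate directly into code; a small sketch (with illustrative sizes) also confirms the point of the next slide, that the minimum-norm solution is not sparse:

```python
import numpy as np

def min_norm_solution(Phi, y):
    """Minimum l2-norm solution x_mn = Phi^T (Phi Phi^T)^{-1} y for a fat Phi."""
    return Phi.T @ np.linalg.solve(Phi @ Phi.T, y)

def regularized_solution(Phi, y, lam):
    """Regularized l2 solution x_reg = Phi^T (Phi Phi^T + lam I)^{-1} y."""
    n = Phi.shape[0]
    return Phi.T @ np.linalg.solve(Phi @ Phi.T + lam * np.eye(n), y)

rng = np.random.default_rng(2)
Phi = rng.standard_normal((20, 50))
x_true = np.zeros(50); x_true[[3, 17, 41]] = [1.0, -2.0, 0.5]
y = Phi @ x_true
x_mn = min_norm_solution(Phi, y)
print(np.sum(np.abs(x_mn) > 1e-6))   # typically 50: every entry is nonzero
```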

30 Minimum ℓ2 Norm Solution
Problem: The minimum ℓ2 norm solution is not sparse.
Example: for a small underdetermined system Φx = y, the minimum-norm solution x_mn has all entries nonzero, whereas the sparse solution x_0 concentrates the energy in a few entries.
DFT: also computes the minimum ℓ2 norm solution.

31 Diversity Measures
Recall: min_x Σ_{i=1}^m I(x_i ≠ 0) subject to y = Φx
Functionals whose minimization leads to sparse solutions.
Many examples are found in the fields of economics, social science and information theory.
These functionals are usually concave, which leads to difficult optimization problems.

32 Examples of Diversity Measures
ℓ(p≤1) Diversity Measure: E^(p)(x) = Σ_{i=1}^m |x_i|^p, p ≤ 1
As p → 0: lim_{p→0} E^(p)(x) = lim_{p→0} Σ_{i=1}^m |x_i|^p = Σ_{i=1}^m I(x_i ≠ 0)
ℓ1 norm, the convex relaxation of ℓ0: E^(1)(x) = Σ_{i=1}^m |x_i|

33 ℓ1 Diversity Measure
Noiseless case: min_x Σ_{i=1}^m |x_i| subject to Φx = y
Noisy case:
ℓ1 regularization [Candes, Romberg, Tao]: min_x Σ_{i=1}^m |x_i| subject to ||y − Φx||_2 ≤ β
Lasso [Tibshirani], Basis Pursuit Denoising [Chen, Donoho, Saunders]: min_x ||y − Φx||²_2 + λ Σ_{i=1}^m |x_i|
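Many solvers exist for the Lasso / Basis Pursuit Denoising objective above; as one minimal, self-contained sketch (ISTA, iterative soft thresholding, used here as a stand-in rather than an algorithm from the tutorial), the following minimizes ||y − Φx||²_2 + λ||x||_1:

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(Phi, y, lam, n_iter=500):
    """Iterative soft thresholding for min_x ||y - Phi x||_2^2 + lam * ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2        # spectral norm squared: step-size control
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)       # half the gradient of the data-fit term
        x = soft_threshold(x - grad / L, lam / (2 * L))
    return x
```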

34 ℓ1 Norm Minimization and MAP Estimation
MAP estimation: x̂ = argmax_x p(x|y) = argmax_x [log p(y|x) + log p(x)]
If we assume ε is zero-mean, i.i.d. Gaussian noise and p(x) = Π_i p(x_i), where p(x_i) ∝ exp(−λ|x_i|),
then x̂ = argmin_x ||y − Φx||²_2 + λ Σ_i |x_i|

35 Attractiveness of ℓ1 Methods
Convex optimization, with an associated rich class of optimization algorithms:
Interior point methods
Coordinate descent methods
Question: What is the ability to find the sparse solution?

36 Why Does the Diversity Measure Encourage Sparse Solutions?
min_x ||[x_1, x_2]^T||_p subject to x_1 φ_1 + x_2 φ_2 = y
(Figure: the constraint line x_1 φ_1 + x_2 φ_2 = y together with equal-norm contours ||x||_p = const for 0 < p < 1, p = 1, and p > 1. For p ≤ 1 the pointed contours first touch the constraint line on a coordinate axis, giving a sparse solution; for p > 1 the rounded contours do not.)

37 ℓ1 Norm and Linear Programming
Linear Program (LP): min_x c^T x subject to Φx = y
Key result in LP (Luenberger):
a) If there is a feasible solution, there is a basic feasible solution*.
b) If there is an optimal feasible solution, there is an optimal basic feasible solution.
* If Φ is n×m, then a basic feasible solution is a solution with n nonzero entries.

38 Example with ℓ1 Diversity Measure
(Small example: a 2×3 dictionary Φ and corresponding measurement y.)
Noiseless Case: x_BP = [1, 0, 0]^T (to machine precision)
Noisy Case: assume the measurement noise ε = [0.01, 0.01]^T
ℓ1 regularization result: x_l1r = [0.986, 0, …]^T
Lasso result (λ = 0.05): x_lasso = [0.975, 0, …]^T

39 Example with ℓ1 Diversity Measure
Continue with the DFT example: y[l] = 2(cos ω0·l + cos ω1·l), l = 0, 1, 2, ..., n−1.
The m = 64, 128, 256, 512 DFT cannot separate the adjacent frequency components.
Using ℓ1 diversity measure minimization (m = 256), the two frequency components are resolved.

40 Outline: Part 1
Motivation for Tutorial
Sparse Signal Recovery Problem
Applications
Computational Algorithms: Greedy Search, ℓ1 norm minimization
Performance Guarantees

41 Important Questions
When is the ℓ0 solution unique?
When is the ℓ1 solution equivalent to that of ℓ0?
Noiseless case
Noisy measurements

42 Uniqueness
Definition of Spark: the smallest number of columns of Φ that are linearly dependent. (Spark(Φ) ≤ n + 1)
Uniqueness of the sparsest solution:
If Σ_{i=1}^m I(x_i ≠ 0) < Spark(Φ)/2, then x is the unique solution to
argmin_x Σ_{i=1}^m I(x_i ≠ 0) subject to Φx = y

43 Mutual Coherence
For a given matrix Φ = [φ_1, φ_2, ..., φ_m], the mutual coherence µ(Φ) is defined as
µ(Φ) = max_{1 ≤ i, j ≤ m, i ≠ j} |φ_i^T φ_j| / (||φ_i||_2 ||φ_j||_2)
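Mutual coherence is straightforward to compute; the sketch below estimates it for Gaussian random matrices, in the spirit of the empirical histograms a few slides ahead (the 50-row size and the number of trials are illustrative assumptions, not the tutorial's settings):

```python
import numpy as np

def mutual_coherence(Phi):
    """Largest absolute normalized inner product between distinct columns."""
    G = Phi / np.linalg.norm(Phi, axis=0)   # normalize columns to unit norm
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)             # ignore the diagonal (i == j)
    return gram.max()

rng = np.random.default_rng(3)
for m in (200, 2000):
    mus = [mutual_coherence(rng.standard_normal((50, m))) for _ in range(20)]
    print(f"m = {m}: mean coherence over 20 trials = {np.mean(mus):.3f}")
```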

44 Performance of ℓ1 Diversity Minimization Algorithms
Noiseless Case [Donoho & Elad '03]:
If Σ_{i=1}^m I(x_i ≠ 0) < (1/2)(1 + 1/µ(Φ)), then x is the unique solution to
argmin_x Σ_{i=1}^m |x_i| subject to Φx = y

45 Performance of ℓ1 Diversity Minimization Algorithms
Noisy Case [Donoho et al. '06]:
Assume y = Φx + ε with ||ε||_2 ≤ β and Σ_{i=1}^m I(x_i ≠ 0) ≤ (1/4)(1 + 1/µ(Φ)).
Then the solution x* = argmin_d Σ_{i=1}^m |d_i| subject to ||y − Φd||_2 ≤ β satisfies
||x* − x||²_2 ≤ 4β² / (1 − µ(Φ)(4||x||_0 − 1))

46 Empirical Distribution of Mutual Coherence (Gaussian Matrices)
Gaussian matrices: entries N(0,1), column norms normalized to 1; randomly generated matrices.
(Figure: histograms of mutual coherence. For …×200 matrices, mean = 0.4025; for …×2000 matrices, mean = 0.161.)

47 Performance of Orthogonal Matching Pursuit
Noiseless Case [Tropp '04]:
If Σ_{i=1}^m I(x_i ≠ 0) < (1/2)(1 + 1/µ(Φ)), then OMP guarantees recovery of x after Σ_{i=1}^m I(x_i ≠ 0) iterations.

48 Performance of Orthogonal Matching Pursuit
Noisy Case [Donoho et al.]:
Assume y = Φx + ε with ||ε||_2 ≤ β, let x_min = min_i |x_i| over the nonzero entries, and assume
Σ_{i=1}^m I(x_i ≠ 0) ≤ (1/2)(1 + 1/µ(Φ)) − β/(µ(Φ) x_min).
Stop OMP when the residual error falls below β. Then the solution of OMP satisfies
||x_OMP − x||²_2 ≤ β² / (1 − µ(Φ)(||x||_0 − 1))

49 Restricted Isometry Constant
Definition [Candes et al.]: For a matrix Φ, the restricted isometry constant δ_k is the smallest constant such that
(1 − δ_k)||x||²_2 ≤ ||Φx||²_2 ≤ (1 + δ_k)||x||²_2
for any k-sparse signal x.

50 Performance of ℓ1 Diversity Measure Minimization Algorithms
[Candes '08]: Assume y = Φx + ε with ||ε||_2 ≤ β, x is k-sparse, and δ_2k < √2 − 1.
Then the solution x* = argmin_d Σ_{i=1}^m |d_i| subject to ||y − Φd||_2 ≤ β satisfies
||x* − x||_2 ≤ C β, where C only depends on δ_2k.

51 Performance of ℓ1 Diversity Measure Minimization Algorithms
[Candes '08]: Assume y = Φx + ε with ||ε||_2 ≤ β, and δ_2k < √2 − 1.
Then the solution x* = argmin_d Σ_{i=1}^m |d_i| subject to ||y − Φd||_2 ≤ β satisfies
||x* − x||_2 ≤ C_1 ||x − x_k||_1 / √k + C_2 β
where C_1, C_2 only depend on δ_2k, and x_k is the vector x with all but the k largest entries set to zero.

52 Matrices with Good RIC
It turns out that random matrices can satisfy the requirement (say, δ_2k < √2 − 1) with high probability.
For an n×m matrix Φ, generate each element Φ_ij ~ N(0, 1/n), i.i.d.
[Candes et al.] If n = O(k log(m/k)), then δ_2k < √2 − 1 with probability approaching 1.
Observations:
A large Gaussian random matrix will have good RIC with high probability.
Similar results can be obtained using other probability ensembles: Bernoulli, random Fourier, ...
For ℓ1-based procedures, the number of measurements required is roughly n > k log m.

53 More General Question
What are the limits of recovery in the presence of noise?
No constraint on the recovery algorithm.
Information theoretic approaches are useful in this case (Wainwright, Fletcher, Akcakaya, ...).
Can connect the problem of support recovery to channel coding over the Gaussian multiple access channel. Capacity regions become useful (Jin).

54 Performance for Support Recovery
Noisy measurements: y = Φx + ε, ε_i ~ N(0, σ²), i.i.d.
Random matrix Φ, Φ_ij ~ N(0, σ_a²), i.i.d.
Performance metric: exact support recovery.
Consider a sequence of problems with increasing sizes.
k = 2 (two-user MAC capacity):
c(x) = min_{T ⊆ {1,2}, T nonempty} (1/(2|T|)) log(1 + (σ_a²/σ²) Σ_{i∈T} x_i²)
Sufficient condition: If limsup_m (log m)/n < c(x), then there exists a sequence of support recovery methods (for the different problems, respectively) such that lim_m P(supp(x̂) ≠ supp(x)) = 0.
Necessary condition: If there exists a sequence of support recovery methods such that lim_m P(supp(x̂) ≠ supp(x)) = 0, then limsup_m (log m)/n ≤ c(x).
The approach can deal with general scaling among (m, n, k).

55 Network Information Theory Perspective
Connection to channel coding (K = 1): y = Φx + ε with a single nonzero entry x_i, so y = x_i φ_i + ε and the index i indicates which column of Φ is selected.

56 (Figure) y = x_i φ_i + ε: the measurement is a scaled, noisy copy of the selected column φ_i.

57 Additive White Gaussian Noise (AWGN) Channel
b = h·a + e
a: channel input; h: channel gain; e: additive Gaussian noise; b: channel output.
Recall y = x_i φ_i + ε.

58-62 (Figure sequence) The AWGN channel analogy is built up step by step: the selected column φ_i plays the role of the channel input a (a codeword), the nonzero value x_i the channel gain h, the noise ε the channel noise e, and the measurement y the channel output b.

63 Connection to Channel Coding (K > 1)
y = x_{s_1} φ_{s_1} + ... + x_{s_K} φ_{s_K} + ε

64 Multiple Access Channel (MAC)
Sparse recovery: y = x_{s_1} φ_{s_1} + ... + x_{s_K} φ_{s_K} + ε
MAC: b = h_1 a_1 + ... + h_K a_K + e

65 Connection Between the Two Problems
y = Φx + ε (nonzeros at positions s_1, ..., s_K) ↔ b = h_1 a_1 + ... + h_K a_K + e
Φ: dictionary / measurement matrix ↔ codebook
φ_i: a column ↔ a codeword
m different positions in x ↔ message set {1, 2, ..., m}
s_1, ..., s_k: indices of nonzeros ↔ the messages selected
x_{s_i}: source activity ↔ channel gain h_i

66 Differences Between the Two Problems
Channel coding: Each sender works with a codebook designed only for that sender.
Support recovery: All senders share the same codebook Φ; different senders select different codewords. (Common codebook)
Channel coding: Capacity results are available when the channel gain h_i is known at the receiver.
Support recovery: We don't know the nonzero signal activities x_i. (Unknown channel)
Proposed approach [Jin, Kim and Rao]:
Leverage MAC capacity results to shed light on the performance limits of exact support recovery.
Conventional methods + customized methods: distance decoding, nonzero signal value estimation, Fano's inequality.

67

68 Outline 1. Motivation: Limitations of popular inverse methods 2. Maximum a posteriori (MAP) estimation 3. Bayesian Inference 4. Analysis of Bayesian inference and connections with MAP 5. Applications to neuroimaging

69 Section I: Motivation

70 Limitation I Most sparse recovery results, using either greedy methods (e.g., OMP) or convex ℓ1 minimization (e.g., BP), place heavy restrictions on the form of the dictionary Φ. While in some situations we can satisfy these restrictions (e.g., compressive sensing), for many applications we cannot (e.g., source localization, feature selection). When the assumptions on the dictionary are violated, performance may be very suboptimal

71 Limitation II The distribution of nonzero magnitudes in the maximally sparse solution x_0 can greatly affect the difficulty of canonical sparse recovery problems. This effect is strongly algorithm-dependent

72 Examples: ℓ1-norm Solutions and OMP
With ℓ1-norm solutions, performance is independent of the magnitudes of nonzero coefficients [Malioutov et al., 2004]. Problem: Performance does not improve when the situation is favorable.
OMP is highly sensitive to these magnitudes. Problem: Performance degrades heavily with unit magnitudes.

73 In General
If the magnitudes of the non-zero elements in x_0 are highly scaled, then the canonical sparse recovery problem should be easier.
(Illustration: an x_0 with scaled coefficients is easy; an x_0 with uniform coefficients is hard.)
The (approximate) Jeffreys distribution produces sufficiently scaled coefficients such that the best solution can always be easily computed.

74 Jeffreys Distribution
Density: p(x) ∝ 1/x
(Figure: the shaded regions A, B, and C all have equal area, reflecting the scale invariance of the density, so samples span many orders of magnitude.)

75 Empirical Example
For each test case:
1. Generate a random dictionary Φ with 50 rows and 100 columns.
2. Generate a sparse coefficient vector x_0.
3. Compute the signal via y = Φx_0 (noiseless).
4. Run BP and OMP, as well as a competing Bayesian method called SBL (more on this later), to try and correctly estimate x_0.
5. Average over 1000 trials to compute the empirical probability of failure.
Repeat with different sparsity values, i.e., ||x_0||_0 ranging from 10 to 30.

76 Sample Results (n = 50, m = 100)
(Figure: error rate vs. ||x_0||_0 for approximate Jeffreys coefficients and for unit coefficients.)

77 Limitation III It is not immediately clear how to use these methods to assess uncertainty in coefficient estimates (e.g., covariances). Such estimates can be useful for designing an optimal (non-random) dictionary Φ. For example, it is well known in the machine learning and image processing communities that under-sampled random projections of natural scenes are very suboptimal.

78 Section II: MAP Estimation Using the Sparse Linear Model

79 Overview Can be viewed as sparse penalized regression with general sparsity-promoting penalties (the ℓ1 penalty is a special case). These penalties can be chosen to overcome some previous limitations when minimized with simple, efficient algorithms. Theory is somewhat harder to come by, but practical results are promising. In some cases we can guarantee improvement over the ℓ1 solution on sparse recovery problems.

80 Sparse Linear Model
Linear generative model: y = Φx + ε
y: observed n-dimensional data vector; Φ: matrix of m basis vectors; x: unknown coefficients; ε: Gaussian noise with variance λ.
Objective: Estimate the unknown x given the following assumptions:
1. Φ is overcomplete, meaning the number of columns m is greater than the signal dimension n.
2. x is maximally sparse, i.e., many elements equal zero.

81 Sparse Inverse Problem
Noiseless case (ε = 0): x_0 = argmin_x ||x||_0 s.t. y = Φx, where the ℓ0 norm is the number of nonzeros in x.
Noisy case (ε > 0):
x_0(λ) = argmin_x ||y − Φx||²_2 + λ||x||_0 = argmax_x exp(−(1/2λ)||y − Φx||²_2) · exp(−(1/2)||x||_0)
(likelihood × prior)

82 Difficulties 1. Combinatorial number of local minima 2. Objective is discontinuous A variety of existing approximate methods can be viewed as MAP estimation using a flexible class of sparse priors.

83 Sparse Priors: 2-D Example [Tipping, 2001] Gaussian Distribution Sparse Distribution

84 Basic MAP Estimation
x̂ = argmax_x p(x|y) = argmin_x [−log p(y|x) − log p(x)]
= argmin_x ||y − Φx||²_2 + λ Σ_i g(x_i)
(data fit + sparsity penalty), where g(x_i) = h(x_i²) and h is a nondecreasing, concave function.
Note: the Bayesian interpretation will be useful later.

85 Example Sparsity Penalties
With g(x_i) = I[x_i ≠ 0] we have the canonical sparsity penalty and its associated problems.
Practical selections:
g(x_i) = log(x_i² + ε)  [Chartrand and Yin, 2008]
g(x_i) = log(|x_i| + ε)  [Candes et al., 2008]
g(x_i) = |x_i|^p  [Rao et al., 2003]
Different choices favor different levels of sparsity.

86 Example 2-D Contours
(Figure: contours of the penalty g(x_i) = |x_i|^p in the (x_1, x_2) plane for several values of p, including p = 0.2 and p = 0.5.)

87 Which Penalty Should We Choose?
Two competing issues:
1. If the prior is too sparse (e.g., p → 0), then we may get stuck in a local minimum: convergence error.
2. If the prior is not sparse enough (e.g., p → 1), then the global minimum may be found, but it might not equal x_0: structural error.
The answer is ultimately application- and algorithm-dependent.

88 Convergence Errors vs. Structural Errors
(Figure: −log p(x|y) along the feasible set. Convergence error (p → 0): the iterate x' is trapped in a local minimum away from x_0. Structural error (p → 1): the global minimum x' is found but differs from x_0.)
x' = solution we have converged to; x_0 = maximally sparse solution.

89 Desirable Properties of Algorithms
Can be implemented via relatively simple primitives already in use, e.g., ℓ1 solvers, etc.
Improved performance over OMP, BP, etc.
Naturally extends to more general problems:
1. Constrained sparse estimation (e.g., finding non-negative sparse solutions)
2. Group sparsity problems

90 Example Extension: Group Sparsity
The simultaneous sparse estimation problem: the goal is to recover a matrix X with maximal row sparsity [Cotter et al., 2005; Tropp, 2006], given an observation matrix Y produced via Y = ΦX + E.
Optimization Problem:
X_0(λ) = argmin_X ||Y − ΦX||²_F + λ Σ_{i=1}^m I[row i of X ≠ 0]
where the sum counts the number of nonzero rows in X.
Can be efficiently solved/approximated by replacing the indicator function with an alternative function g.

91 Reweighted ℓ2 Optimization
Assume: g(x_i) = h(x_i²), h concave.
Updates:
x^(k+1) ← argmin_x ||y − Φx||²_2 + λ Σ_i w_i^(k) x_i² = W^(k) Φ^T (λI + Φ W^(k) Φ^T)^{-1} y, with W^(k) ≜ diag[w^(k)]^{-1}
w_i^(k+1) ← ∂g(x_i)/∂(x_i²) evaluated at x_i = x_i^(k+1)
Based on a simple 1st order approximation to g(x_i) [Palmer et al., 2006].
Guaranteed not to increase the objective function.
Global convergence assured given additional assumptions.
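A minimal sketch of these reweighted ℓ2 updates, specialized to the log(x_i² + ε) penalty of the next slide; the ε-annealing schedule and λ value are illustrative choices:

```python
import numpy as np

def reweighted_l2(Phi, y, lam=1e-6, eps0=1.0, n_iter=30):
    """Iterative reweighted l2 with the log(x_i^2 + eps) penalty
    (Chartrand-and-Yin-style weight update); eps is slowly reduced
    toward zero to smooth out local minima early on."""
    n, m = Phi.shape
    w = np.ones(m)
    eps = eps0
    x = np.zeros(m)
    for _ in range(n_iter):
        W = np.diag(1.0 / w)                 # W = diag[w]^{-1}
        x = W @ Phi.T @ np.linalg.solve(lam * np.eye(n) + Phi @ W @ Phi.T, y)
        w = 1.0 / (x ** 2 + eps)             # weight update for the log penalty
        eps = max(eps / 10.0, 1e-8)          # anneal eps toward zero
    return x
```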

92 Examples
1. FOCUSS algorithm [Rao et al., 2003]:
Penalty: g(x_i) = |x_i|^p, 0 ≤ p ≤ 2
Weight update: w_i^(k+1) ← |x_i^(k+1)|^{p−2}
Properties: Well-characterized convergence rates; very susceptible to local minima when p is small.
2. Chartrand and Yin (2008) algorithm:
Penalty: g(x_i) = log(x_i² + ε), ε ≥ 0
Weight update: w_i^(k+1) ← ((x_i^(k+1))² + ε)^{-1}
Properties: Slowly reducing ε to zero smooths out local minima, initially allowing better solutions to be found; very useful for recovering scaled coefficients.

93 Empirical Comparison
For each test case:
1. Generate a random dictionary Φ with 50 rows and 250 columns.
2. Generate a sparse coefficient vector x_0.
3. Compute the signal via y = Φx_0 (noiseless).
4. Compare Chartrand and Yin's reweighted ℓ2 method with the ℓ1 norm solution with regard to estimating x_0.
5. Average over 1000 independent trials.
Repeat with different sparsity levels and different nonzero coefficient distributions.

94 Empirical Results: Unit Nonzeros
(Figure: probability of success vs. ||x_0||_0.)

95 Results: Gaussian Nonzeros
(Figure: probability of success vs. ||x_0||_0.)

96 Reweighted ℓ1 Optimization
Assume: g(x_i) = h(|x_i|), h concave.
Updates:
x^(k+1) ← argmin_x ||y − Φx||²_2 + λ Σ_i w_i^(k) |x_i|
w_i^(k+1) ← ∂g(x_i)/∂|x_i| evaluated at x_i = x_i^(k+1)
Based on a simple 1st order approximation to g(x_i) [Fazel et al., 2003].
Global convergence given minimal assumptions [Zangwill, 1969].
Per-iteration cost is expensive, but few iterations are needed (and each is sparse).
Easy to incorporate alternative data fit terms or constraints on x.
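A sketch of the reweighted ℓ1 scheme above. The slides leave the choice of ℓ1 solver open; here each weighted subproblem is solved with a simple ISTA loop (a substitution of convenience), and the weight update follows the Candes et al. rule shown on the next slide. The λ, ε, and iteration counts are illustrative.

```python
import numpy as np

def weighted_ista(Phi, y, lam, w, n_iter=300):
    """Solve min_x ||y - Phi x||_2^2 + lam * sum_i w_i |x_i| by soft thresholding."""
    L = np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    thresh = lam * w / (2 * L)                    # per-coordinate threshold
    for _ in range(n_iter):
        z = x - Phi.T @ (Phi @ x - y) / L
        x = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return x

def reweighted_l1(Phi, y, lam=1e-3, eps=0.1, n_outer=10):
    """Candes-et-al-style reweighting: solve a weighted l1 problem,
    then set w_i = 1 / (|x_i| + eps), and repeat."""
    w = np.ones(Phi.shape[1])
    for _ in range(n_outer):
        x = weighted_ista(Phi, y, lam, w)
        w = 1.0 / (np.abs(x) + eps)
    return x
```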

97 Example [Candes et al., 2008]
Penalty: g(x_i) = log(|x_i| + ε), ε ≥ 0
Updates:
x^(k+1) ← argmin_x ||y − Φx||²_2 + λ Σ_i w_i^(k) |x_i|
w_i^(k+1) ← (|x_i^(k+1)| + ε)^{-1}
When the nonzeros in x_0 are scaled, this works much better than regular ℓ1, depending on how ε is chosen.
Local minima exist, but since each iteration is sparse, local solutions are not so bad (no worse than the regular ℓ1 solution).

98 Empirical Comparison
For each test case:
1. Generate a random dictionary Φ with 50 rows and 100 columns.
2. Generate a sparse coefficient vector x_0 with 30 truncated Gaussian, strictly positive nonzero coefficients.
3. Compute the signal via y = Φx_0 (noiseless).
4. Compare Candes et al.'s reweighted ℓ1 method (10 iterations) with the ℓ1 norm solution, both constrained to be non-negative, to try and estimate x_0.
5. Average over 1000 independent trials.
Repeat with different values of the parameter ε.

99 Empirical Comparison
(Figure: probability of success vs. ε.)

100 Conclusions
In practice, MAP estimation addresses some limitations of standard methods (although not Limitation III, assessing uncertainty).
Simple updates are possible using either iterative reweighted ℓ1 or ℓ2 minimization.
More generally, iterative reweighted f minimization, where f is a convex function, is possible.

101 Section III: Bayesian Inference Using the Sparse Linear Model

102 Note MAP estimation is really just standard/classical penalized regression. So the Bayesian interpretation has not really contributed much as of yet

103 Posterior Modes vs. Posterior Mass
Previous methods focus on finding the implicit mode of p(x|y) by maximizing the joint distribution p(x, y) = p(y|x) p(x).
Bayesian inference uses posterior information beyond the mode, i.e., posterior mass. Examples:
1. Posterior mean: Can have attractive properties when used as a sparse point estimate (more on this later).
2. Posterior covariance: Useful for assessing uncertainty in estimates, e.g., experimental design, learning new projection directions for compressive sensing measurements.

104 Posterior Modes vs. Posterior Mass
(Figure: p(x|y) vs. x, with the mode marked and the bulk of the probability mass located away from it.)

105 Problem
For essentially all sparse priors, we cannot compute the normalized posterior
p(x|y) = p(y|x) p(x) / ∫ p(y|x) p(x) dx
We also cannot compute posterior moments, e.g.,
µ_x = E[x|y], Σ_x = Cov[x|y]
So efficient approximations are needed.

106 Approximating Posterior Mass
Goal: Approximate p(x, y) with some distribution p̂(x, y) that
1. Reflects the significant mass in p(x, y).
2. Can be normalized to get the posterior p(x|y).
3. Has easily computable moments, e.g., can compute E[x|y] or Cov[x|y].
Optimization Problem: Find the p̂(x, y) that minimizes the sum of the misaligned mass:
∫ |p(x, y) − p̂(x, y)| dx = ∫ p(y|x) |p(x) − p̂(x)| dx

107 Recipe
1. Start with a Gaussian likelihood: p(y|x) = (2πλ)^{-n/2} exp(−(1/2λ)||y − Φx||²_2)
2. Pick an appropriate prior p(x) that encourages sparsity.
3. Choose a convenient parameterized class of approximate priors p̂(x) = p(x; γ).
4. Solve: γ̂ = argmin_γ ∫ p(y|x) |p(x) − p(x; γ)| dx
5. Normalize: p(x|y; γ̂) = p(y|x) p(x; γ̂) / ∫ p(y|x) p(x; γ̂) dx

108 Step 2: Prior Selection
Assume a sparse prior distribution on each coefficient:
−log p(x_i) ∝ g(x_i) = h(x_i²), h concave, non-decreasing.
(2-D example [Tipping, 2001]: Gaussian distribution vs. sparse distribution.)

109 Step 3: Forming Approximate Priors p(x; γ)
Any sparse prior can be expressed via the dual form [Palmer et al., 2006]:
p(x_i) = max_{γ_i ≥ 0} (2πγ_i)^{-1/2} exp(−x_i²/(2γ_i)) φ(γ_i)
Two options:
1. Start with p(x_i) and then compute φ(γ_i) via convexity results, or
2. Choose φ(γ_i) directly and then compute p(x_i); this procedure will always produce a sparse prior [Palmer et al., 2006].
Dropping the maximization gives the strict variational lower bound
p(x_i) ≥ p(x_i; γ_i) = (2πγ_i)^{-1/2} exp(−x_i²/(2γ_i)) φ(γ_i)
Yields a convenient class of scaled Gaussian approximations: p(x; γ) = Π_i p(x_i; γ_i)

110 Example: Approximations to a Sparse Prior
(Figure: a sparse prior p(x_i) together with several scaled Gaussian lower bounds p(x_i; γ_i).)

111 Step 4: Solving for the Optimal γ
To find the best approximation, we must solve
γ̂ = argmin_{γ ≥ 0} ∫ p(y|x) |p(x) − p(x; γ)| dx
By virtue of the strict lower bound, this is equivalent to
γ̂ = argmax_{γ ≥ 0} ∫ p(y|x) p(x; γ) dx
= argmin_{γ ≥ 0} log|Σ_y| + y^T Σ_y^{-1} y − 2 Σ_{i=1}^m log φ(γ_i)
where Σ_y = λI + ΦΓΦ^T and Γ = diag(γ).

112 How difficult is finding the optimal parameters γ?
If the original MAP estimation problem is convex, then so is the Bayesian inference cost function [Nickisch and Seeger, 2009].
In other situations, the Bayesian inference cost is generally much smoother than the associated MAP estimation (more on this later).

113 Step 5: Posterior Approximation
We have found the approximation p(y|x) p(x; γ̂) = p̂(x, y; γ̂) ≈ p(x, y).
By design, this approximation reflects significant mass in the full distribution p(x, y).
Also, it is easily normalized to form p(x|y; γ̂) = N(µ_x, Σ_x), with
µ_x = E[x|y; γ̂] = Γ̂Φ^T(λI + ΦΓ̂Φ^T)^{-1} y
Σ_x = Cov[x|y; γ̂] = Γ̂ − Γ̂Φ^T(λI + ΦΓ̂Φ^T)^{-1}ΦΓ̂

114 Applications of Bayesian Inference 1. Finding maximally sparse representations Replace MAP estimate with posterior mean estimator. For certain prior selections, this can be very effective (next section) 2. Active learning, experimental design

115 Experimental Design
Basic Idea [Shihao Ji et al., 2007; Seeger and Nickisch, 2008]: Use the approximate posterior p(x|y; γ̂) = N(µ_x, Σ_x) to learn new rows of the design matrix Φ such that uncertainty about x is reduced.
Choose each additional row to minimize the differential entropy H:
H = (1/2) log|Σ_x|, with Σ_x = Γ̂ − Γ̂Φ^T(λI + ΦΓ̂Φ^T)^{-1}ΦΓ̂

116 Experimental Design Cont. Drastic improvement over random projections is possible in a variety of domains. Examples: Reconstructing natural scenes [Seeger and Nickisch, 2008] Undersampled MRI reconstruction [Seeger et al., 2009]

117 Section IV: Analysis of Bayesian Inference and Connections with MAP

118 Overview Bayesian inference can be recast as a general form of MAP estimation in x-space. This is useful for several reasons: 1. Allows us to leverage same algorithmic formulations as with iterative reweighted methods for MAP estimation. 2. Reveals that Bayesian inference can actually be an easier computational task than searching for the mode as with MAP. 3. Provides theoretical motivation for posterior mean estimator when searching for maximally sparse solutions. 4. Allows modifications to Bayesian inference cost (e.g., adding constraints), and inspires new non-bayesian sparse estimators.

119 Reformulation of the Posterior Mean Estimator
Theorem [Wipf and Nagarajan, 2008]:
x̄ = E[x|y; γ̂] = argmin_x ||y − Φx||²_2 + λ g_infer(x)
with Bayesian inference penalty function
g_infer(x) = min_{γ ≥ 0} x^T Γ^{-1} x + log|λI + ΦΓΦ^T| − 2 Σ_i log φ(γ_i)
So the posterior mean can be obtained by minimizing a penalized regression problem, just like MAP.
Posterior covariance is a natural byproduct as well.

120 Property I of Penalty g_infer(x)
Penalty g_infer(x) is formed from a minimum of upper-bounding hyperplanes with respect to each x_i².
This implies:
1. Concavity in x_i² for all i [Boyd, 2004].
2. Can be implemented via iterative reweighted ℓ2 minimization (multiple possibilities using various bounding techniques) [Seeger, 2009; Wipf and Nagarajan, 2009].
3. Note: The posterior covariance can be obtained easily too, therefore the entire approximation can be computed for full Bayesian inference.

121 Student's t Example
Assume the following sparse distribution for each unknown coefficient:
p(x_i) ∝ (b + x_i²/2)^{−(a + 1/2)}
Note:
When a = b → ∞, the prior approaches a Gaussian (not sparse).
When a = b → 0, the prior approaches a Jeffreys prior (highly sparse).
Using convex bounding techniques to approximate the required derivatives leads to simple reweighted ℓ2 update rules [Seeger, 2009; Wipf and Nagarajan, 2009].
The algorithm can also be derived via EM [Tipping, 2001].

122 Reweighted ℓ2 Implementation Example
x^(k+1) ← argmin_x ||y − Φx||²_2 + λ Σ_i w_i^(k) x_i² = W^(k) Φ^T (λI + Φ W^(k) Φ^T)^{-1} y, with W^(k) = diag[w^(k)]^{-1}
w_i^(k+1) ← (1 + 2a) / [ (x_i^(k+1))² + (w_i^(k))^{-1} − (w_i^(k))^{-2} φ_i^T (λI + Φ W^(k) Φ^T)^{-1} φ_i + 2b ]
Guaranteed to reduce or leave unchanged the objective function at each iteration.
Other variants are possible using different bounding techniques.
Upon convergence, the posterior covariance is given by
Σ_x = (λ^{-1} Φ^T Φ + W^(k))^{-1}, with W^(k) = diag[w^(k)]
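For comparison, here is a minimal EM-style SBL sketch in the spirit of the Tipping (2001) derivation mentioned on the previous slide, written for the a = b = 0 case: it alternates the Gaussian posterior moments from Step 5 with the EM hyperparameter update γ_i = µ_i² + Σ_ii. This simpler update is used in place of the slide's exact weight recursion, and the λ value and iteration count are illustrative.

```python
import numpy as np

def sbl_em(Phi, y, lam=1e-3, n_iter=100):
    """Minimal EM-style sparse Bayesian learning sketch (a = b = 0):
    alternate the posterior moments of x given gamma with the EM update
    gamma_i = mu_i^2 + Sigma_ii."""
    n, m = Phi.shape
    gamma = np.ones(m)
    for _ in range(n_iter):
        Gamma = np.diag(gamma)
        Sigma_y = lam * np.eye(n) + Phi @ Gamma @ Phi.T
        K = Gamma @ Phi.T @ np.linalg.inv(Sigma_y)
        mu = K @ y                          # posterior mean E[x | y; gamma]
        Sigma_x = Gamma - K @ Phi @ Gamma   # posterior covariance
        gamma = mu ** 2 + np.diag(Sigma_x)  # EM update of the hyperparameters
        gamma = np.maximum(gamma, 1e-12)    # keep the variances positive
    return mu, Sigma_x, gamma
```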

123 Property II of Penalty g_infer(x)
If −2 log φ(γ_i) is concave in γ_i, then:
1. g_infer(x) is concave in |x_i| for all i — sparsity-inducing.
2. This implies the posterior mean will always have at least m − n elements equal to exactly zero, as occurs with MAP.
3. Can be useful for canonical sparse recovery problems.
4. Can be implemented via reweighted ℓ1 minimization [Wipf, 2006; Wipf and Nagarajan, 2009].

124 Reweighted ℓ1 Implementation Example
Assume −2 log φ(γ_i) = a γ_i, a ≥ 0.
x^(k+1) ← argmin_x ||y − Φx||²_2 + λ Σ_i w_i^(k) |x_i|
w_i^(k+1) ← [ φ_i^T (σ²I + Φ W^(k) diag[|x^(k+1)|] Φ^T)^{-1} φ_i + a ]^{1/2}, with W^(k) = diag[w^(k)]^{-1}
Guaranteed to converge to a minimum of the Bayesian inference objective function.
Easy to outfit with additional constraints.
In the noiseless case, sparsity will not increase from iteration to iteration, i.e., ||x^(k+1)||_0 ≤ ||x^(k)||_0.

125 Property III of Penalty g_infer(x)
Bayesian inference is most sensitive to posterior mass, therefore it is less sensitive to spurious local peaks than MAP estimation.
Consequently, in x-space, the Bayesian inference penalized regression problem
min_x ||y − Φx||²_2 + λ g_infer(x)
is generally smoother than the associated MAP problem.

126 Student's t Example
Assume the Student's t distribution for each unknown coefficient:
p(x_i) ∝ (b + x_i²/2)^{−(a + 1/2)}
Note:
When a = b → ∞, the prior approaches a Gaussian (not sparse).
When a = b → 0, the prior approaches a Jeffreys prior (highly sparse).
Goal: Compare Bayesian inference and MAP for different sparsity levels to show the smoothing effect.

127 Visualizing Local Minima Smoothing
Consider when y = Φx has a 1-D feasible region, i.e., m = n + 1.
Any feasible solution x will satisfy:
x = x_true + α v, v ∈ Null(Φ)
where α is a scalar and x_true is the true generative coefficient vector.
We can plot penalty functions vs. α to view the local minima profile over the 1-D feasible region.

128 Empirical Example
Generate an i.i.d. Gaussian random dictionary Φ with 10 rows and 11 columns.
Generate a sparse coefficient vector x_true with 9 nonzeros and Gaussian i.i.d. amplitudes.
Compute the signal y = Φx_true.
Assume a Student's t prior on x with varying degrees of sparsity.
Plot the MAP/Bayesian inference penalties vs. α to compare local minima profiles over the 1-D feasible region.
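A sketch of this experiment, using the log(x² + ε) penalty from the MAP section as a stand-in for the Student's t penalty and a simple grid over α to count local minima along the 1-D feasible region:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 10, 11
Phi = rng.standard_normal((n, m))
x_true = np.zeros(m)
x_true[rng.choice(m, size=9, replace=False)] = rng.standard_normal(9)
y = Phi @ x_true

# 1-D feasible region: x(alpha) = x_true + alpha * v, with v spanning Null(Phi)
_, _, Vt = np.linalg.svd(Phi)
v = Vt[-1]                                   # null-space direction (m - n = 1)

alphas = np.linspace(-5, 5, 2001)
eps = 1e-3
penalties = np.array([np.sum(np.log((x_true + a * v) ** 2 + eps)) for a in alphas])
# Count interior local minima of the penalty along the feasible line
is_min = (penalties[1:-1] < penalties[:-2]) & (penalties[1:-1] < penalties[2:])
print("local minima found along the 1-D feasible region:", int(is_min.sum()))
```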

129 Local Minima Smoothing Example #1
(Figure: normalized penalty vs. α for a low-sparsity Student's t prior.)

130 Local Minima Smoothing Example #2
(Figure: normalized penalty vs. α for a medium-sparsity Student's t prior.)

131 Local Minima Smoothing Example #3
(Figure: normalized penalty vs. α for a high-sparsity Student's t prior.)

132 Property IV of Penalty g_infer(x)
Non-separable, meaning g_infer(x) ≠ Σ_i g_i(x_i).
Non-separable penalty functions can have an advantage over separable penalties (i.e., MAP) when it comes to canonical sparse recovery problems [Wipf and Nagarajan, 2010].

133 Example
Consider the original sparse estimation problem: x_0 = argmin_x ||x||_0 s.t. y = Φx
Problem: combinatorial number of local minima:
C(m−1, n) + 1 ≤ number of local minima ≤ C(m, n)
Local minima occur at each basic feasible solution (BFS): ||x||_0 ≤ n s.t. y = Φx

134 Visualization of Local Minima in the ℓ0 Norm
Generate an i.i.d. Gaussian random dictionary Φ with 10 rows and 11 columns.
Generate a sparse coefficient vector x_true with 9 nonzeros and Gaussian i.i.d. amplitudes.
Compute the signal via y = Φx_true.
Plot ||x||_0 vs. α (the 1-D null space dimension) to view the local minima profile of the ℓ0 norm over the 1-D feasible region.

135 ℓ0 Norm in the 1-D Feasible Region
(Figure: number of nonzeros vs. α.)

136 Non-Separable Penalty Example
Would like to smooth local minima while retaining the same global solution as ℓ0 at all times (unlike the ℓ1 norm).
This can be accomplished by a simple modification of the ℓ0 penalty.
Truncated ℓ0 penalty:
x̂ = argmin_x ||x̄||_0 s.t. y = Φx, where x̄ = the k largest elements of x
If k < m, then there will necessarily be fewer local minima; however, the implicit prior/penalty function is non-separable.

137 Truncated ℓ0 Norm in the 1-D Feasible Region
(Figure: number of nonzeros among the k = 10 largest elements vs. α.)

138 Using the Posterior Mean Estimator for Finding Maximally Sparse Solutions
Summary of why this could be a good idea:
1. If −2 log φ(γ_i) is concave, then the posterior mean will be sparse (local minima of the Bayesian inference cost will also be sparse).
2. The implicit Bayesian inference cost function can be much smoother than the associated MAP objective.
3. Potential advantages of non-separable penalty functions.

139 Choosing the Function φ
For sparsity, require that −2 log φ(γ_i) is concave.
To avoid adding extra local minima (i.e., to maximally exploit the smoothing effect), require that −2 log φ(γ_i) is convex.
So −2 log φ(γ_i) = a γ_i, a ≥ 0 is well-motivated [Wipf et al., 2007].
Assume the simplest case: −2 log φ(γ_i) = 0, sometimes referred to as sparse Bayesian learning (SBL) [Tipping, 2001].
We denote the penalty function in this case g_SBL(x).

140 Advantages of the Posterior Mean Estimator
Theorem [Wipf and Nagarajan, 2008]: In the low noise limit (λ → 0), and assuming ||x_0||_0 < spark[Φ] − 1, the SBL penalty satisfies
argmin_{x: y=Φx} g_SBL(x) = argmin_{x: y=Φx} ||x||_0
No separable penalty g(x) = Σ_i g_i(x_i) satisfies this condition and has fewer minima than the SBL penalty in the feasible region.

141 Conditions for a Single Minimum
Theorem [Wipf and Nagarajan, 2009]: Assume ||x_0||_0 < spark[Φ] − 1. If the magnitudes of the non-zero elements in x_0 are sufficiently scaled, then the SBL cost (λ → 0) has a single minimum, which is located at x_0.
(Illustration: an x_0 with scaled coefficients is easy; an x_0 with uniform coefficients is hard.)
No possible separable penalty (standard MAP) satisfies this condition.

142 Empirical Example
Generate an i.i.d. Gaussian random dictionary Φ with 10 rows and 11 columns.
Generate a maximally sparse coefficient vector x_0 with 9 nonzeros and either
1. amplitudes of similar scales, or
2. amplitudes with very different scales.
Compute the signal via y = Φx_0.
Plot the MAP/Bayesian inference penalty functions vs. α to compare local minima profiles over the 1-D feasible region and see the effect of coefficient scaling.

143 Smoothing Example: Similar Scales
(Figure: normalized penalty vs. α.)

144 Smoothing Example: Different Scales
(Figure: normalized penalty vs. α.)

145 Always Room for Improvement
Theorem [Wipf and Nagarajan, 2010]: Consider the noiseless sparse recovery problem x_0 = argmin_x ||x||_0 s.t. y = Φx.
Under very mild conditions, SBL with a reweighted ℓ1 implementation will:
1. Never do worse than the regular ℓ1-norm solution.
2. For any dictionary and sparsity profile, there will always be cases where it does better.

146 Empirical Example: Simultaneous Sparse Approximation
Generate a data matrix via Y = ΦX_0 (noiseless):
X_0 is 100-by-5 with random nonzero rows.
Φ is 50-by-100 with Gaussian i.i.d. entries.
Check if X_0 is recovered using various algorithms:
1. Generalized SBL, reweighted ℓ2 implementation [Wipf and Nagarajan, 2010]
2. Candes et al. (2008) reweighted ℓ1 method
3. Chartrand and Yin (2008) reweighted ℓ2 method
4. ℓ1 solution via Group Lasso [Yuan and Lin, 2006]

147 Empirical Results (1000 Trials)
(Figure: probability of success vs. row sparsity for the four methods, with the SBL curve labeled.)

148 Conclusions
Posterior information beyond the mode can be very useful in a wide variety of applications.
Variational approximation provides useful estimates of posterior means and covariances, which can be computed efficiently using standard iterative reweighting algorithms.
In certain situations, the posterior mean estimate can be an effective substitute for ℓ0 norm minimization.
In simulation tests, it out-performs a wide variety of MAP-based algorithms [Wipf and Nagarajan, 2010].

149 Section V: Application Examples in Neuroimaging

150 Applications of Sparse Bayesian Methods
1. Recovering fiber track geometry from diffusion weighted MR images [Ramirez-Manzanares et al., 2007].
2. Multivariate autoregressive modeling of fMRI time series for functional connectivity analyses [Harrison et al., 2003].
3. Compressive sensing for rapid MRI [Lustig et al., 2007].
4. MEG/EEG source localization [Sato et al., 2004; Friston et al., 2008].

151 MEG/EEG Source Localization
(Figure: Maxwell's equations map activity in the source space (X) to measurements in the sensor space (Y); source localization is the inverse problem.)

152 The Dictionary Φ Can be computed using a boundary element brain model and Maxwell s equations. Will be dependent on location of sensors and whether we are doing MEG, EEG, or both. Unlike compressive sensing domain, columns of Φ will be highly correlated regardless of where sensors are placed.

153 Source Localization Given multiple measurement vectors Y, MAP or Bayesian inference algorithms can be run to estimate X. The estimated nonzero rows should correspond with the location of active brain areas (also called sources). Like compressive sensing, may apply algorithms in appropriate transform domain where row-sparsity assumption holds.

154 Empirical Results 1. Simulations with real brain noise/interference: Generate damped sinusoidal sources Map to sensors using Φ and apply real brain noise, artifacts 2. Data from real-world experiments: Auditory evoked fields from binaurally presented tones (which produce correlated, bilateral activations) Compare localization results using MAP estimation and SBL posterior mean from Bayesian inference

155 MEG Source Reconstruction Example
(Figure panels: Ground Truth, SBL, Group Lasso.)

156 Real Data: Auditory Evoked Field (AEF)
(Figure panels: SBL, Beamformer, sLORETA, Group Lasso.)

157 Conclusion
MEG/EEG source localization demonstrates the effectiveness of Bayesian inference on problems where the dictionary is:
Highly overcomplete, meaning m >> n, e.g., n = 275 and m = 100,000.
Very ill-conditioned and coherent, i.e., columns are highly correlated.

158 Final Thoughts
Sparse Signal Recovery is an interesting area with many potential applications.
Methods developed for solving the Sparse Signal Recovery problem can be valuable tools for signal processing practitioners.
Rich set of computational algorithms, e.g.:
Greedy search (OMP)
ℓ1 norm minimization (Basis Pursuit, Lasso)
MAP methods (reweighted ℓ1 and ℓ2 methods)
Bayesian inference methods like SBL (show great promise)
Potential for great theory in support of performance guarantees for algorithms.
Expectation is that there will be continued growth in the application domain as well as in algorithm development.

159 Thank You


More information

Inverse problems and sparse models (1/2) Rémi Gribonval INRIA Rennes - Bretagne Atlantique, France

Inverse problems and sparse models (1/2) Rémi Gribonval INRIA Rennes - Bretagne Atlantique, France Inverse problems and sparse models (1/2) Rémi Gribonval INRIA Rennes - Bretagne Atlantique, France remi.gribonval@inria.fr Structure of the tutorial Session 1: Introduction to inverse problems & sparse

More information

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014. Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)

More information

Recovery Guarantees for Rank Aware Pursuits

Recovery Guarantees for Rank Aware Pursuits BLANCHARD AND DAVIES: RECOVERY GUARANTEES FOR RANK AWARE PURSUITS 1 Recovery Guarantees for Rank Aware Pursuits Jeffrey D. Blanchard and Mike E. Davies Abstract This paper considers sufficient conditions

More information

Simultaneous Multi-frame MAP Super-Resolution Video Enhancement using Spatio-temporal Priors

Simultaneous Multi-frame MAP Super-Resolution Video Enhancement using Spatio-temporal Priors Simultaneous Multi-frame MAP Super-Resolution Video Enhancement using Spatio-temporal Priors Sean Borman and Robert L. Stevenson Department of Electrical Engineering, University of Notre Dame Notre Dame,

More information

A Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases

A Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases 2558 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 48, NO 9, SEPTEMBER 2002 A Generalized Uncertainty Principle Sparse Representation in Pairs of Bases Michael Elad Alfred M Bruckstein Abstract An elementary

More information

One Picture and a Thousand Words Using Matrix Approximtions October 2017 Oak Ridge National Lab Dianne P. O Leary c 2017

One Picture and a Thousand Words Using Matrix Approximtions October 2017 Oak Ridge National Lab Dianne P. O Leary c 2017 One Picture and a Thousand Words Using Matrix Approximtions October 2017 Oak Ridge National Lab Dianne P. O Leary c 2017 1 One Picture and a Thousand Words Using Matrix Approximations Dianne P. O Leary

More information

Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint

Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with Ronny Luss Optimization and

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:

More information

Generalized Orthogonal Matching Pursuit- A Review and Some

Generalized Orthogonal Matching Pursuit- A Review and Some Generalized Orthogonal Matching Pursuit- A Review and Some New Results Department of Electronics and Electrical Communication Engineering Indian Institute of Technology, Kharagpur, INDIA Table of Contents

More information

wissen leben WWU Münster

wissen leben WWU Münster MÜNSTER Sparsity Constraints in Bayesian Inversion Inverse Days conference in Jyväskylä, Finland. Felix Lucka 19.12.2012 MÜNSTER 2 /41 Sparsity Constraints in Inverse Problems Current trend in high dimensional

More information

CSCI5654 (Linear Programming, Fall 2013) Lectures Lectures 10,11 Slide# 1

CSCI5654 (Linear Programming, Fall 2013) Lectures Lectures 10,11 Slide# 1 CSCI5654 (Linear Programming, Fall 2013) Lectures 10-12 Lectures 10,11 Slide# 1 Today s Lecture 1. Introduction to norms: L 1,L 2,L. 2. Casting absolute value and max operators. 3. Norm minimization problems.

More information

Perspectives on Sparse Bayesian Learning

Perspectives on Sparse Bayesian Learning Perspectives on Sparse Bayesian Learning David Wipf, Jason Palmer, and Bhaskar Rao Department of Electrical and Computer Engineering University of California, San Diego, CA 909 dwipf,japalmer@ucsd.edu,

More information

Sparse Linear Contextual Bandits via Relevance Vector Machines

Sparse Linear Contextual Bandits via Relevance Vector Machines Sparse Linear Contextual Bandits via Relevance Vector Machines Davis Gilton and Rebecca Willett Electrical and Computer Engineering University of Wisconsin-Madison Madison, WI 53706 Email: gilton@wisc.edu,

More information

Oslo Class 6 Sparsity based regularization

Oslo Class 6 Sparsity based regularization RegML2017@SIMULA Oslo Class 6 Sparsity based regularization Lorenzo Rosasco UNIGE-MIT-IIT May 4, 2017 Learning from data Possible only under assumptions regularization min Ê(w) + λr(w) w Smoothness Sparsity

More information

Inverse problems and sparse models (6/6) Rémi Gribonval INRIA Rennes - Bretagne Atlantique, France.

Inverse problems and sparse models (6/6) Rémi Gribonval INRIA Rennes - Bretagne Atlantique, France. Inverse problems and sparse models (6/6) Rémi Gribonval INRIA Rennes - Bretagne Atlantique, France remi.gribonval@inria.fr Overview of the course Introduction sparsity & data compression inverse problems

More information

Approximate Message Passing

Approximate Message Passing Approximate Message Passing Mohammad Emtiyaz Khan CS, UBC February 8, 2012 Abstract In this note, I summarize Sections 5.1 and 5.2 of Arian Maleki s PhD thesis. 1 Notation We denote scalars by small letters

More information

Sparsity Regularization

Sparsity Regularization Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 6, JUNE

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 6, JUNE IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 6, JUNE 2010 2935 Variance-Component Based Sparse Signal Reconstruction and Model Selection Kun Qiu, Student Member, IEEE, and Aleksandar Dogandzic,

More information

Clustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning

Clustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades

More information

Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees

Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html Acknowledgement: this slides is based on Prof. Emmanuel Candes and Prof. Wotao Yin

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

Large-Scale L1-Related Minimization in Compressive Sensing and Beyond

Large-Scale L1-Related Minimization in Compressive Sensing and Beyond Large-Scale L1-Related Minimization in Compressive Sensing and Beyond Yin Zhang Department of Computational and Applied Mathematics Rice University, Houston, Texas, U.S.A. Arizona State University March

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

Inferring Sparsity: Compressed Sensing Using Generalized Restricted Boltzmann Machines. Eric W. Tramel. itwist 2016 Aalborg, DK 24 August 2016

Inferring Sparsity: Compressed Sensing Using Generalized Restricted Boltzmann Machines. Eric W. Tramel. itwist 2016 Aalborg, DK 24 August 2016 Inferring Sparsity: Compressed Sensing Using Generalized Restricted Boltzmann Machines Eric W. Tramel itwist 2016 Aalborg, DK 24 August 2016 Andre MANOEL, Francesco CALTAGIRONE, Marylou GABRIE, Florent

More information

The Analysis Cosparse Model for Signals and Images

The Analysis Cosparse Model for Signals and Images The Analysis Cosparse Model for Signals and Images Raja Giryes Computer Science Department, Technion. The research leading to these results has received funding from the European Research Council under

More information

OWL to the rescue of LASSO

OWL to the rescue of LASSO OWL to the rescue of LASSO IISc IBM day 2018 Joint Work R. Sankaran and Francis Bach AISTATS 17 Chiranjib Bhattacharyya Professor, Department of Computer Science and Automation Indian Institute of Science,

More information

Lecture 9: PGM Learning

Lecture 9: PGM Learning 13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and

More information

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS

More information

Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.

Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods

More information

Reconstruction from Anisotropic Random Measurements

Reconstruction from Anisotropic Random Measurements Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013

More information

Sparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Sparse regression. Optimization-Based Data Analysis.   Carlos Fernandez-Granda Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic

More information

Sparse representation classification and positive L1 minimization

Sparse representation classification and positive L1 minimization Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng

More information

Detecting Sparse Structures in Data in Sub-Linear Time: A group testing approach

Detecting Sparse Structures in Data in Sub-Linear Time: A group testing approach Detecting Sparse Structures in Data in Sub-Linear Time: A group testing approach Boaz Nadler The Weizmann Institute of Science Israel Joint works with Inbal Horev, Ronen Basri, Meirav Galun and Ery Arias-Castro

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Introduction. Basic Probability and Bayes Volkan Cevher, Matthias Seeger Ecole Polytechnique Fédérale de Lausanne 26/9/2011 (EPFL) Graphical Models 26/9/2011 1 / 28 Outline

More information

Pre-weighted Matching Pursuit Algorithms for Sparse Recovery

Pre-weighted Matching Pursuit Algorithms for Sparse Recovery Journal of Information & Computational Science 11:9 (214) 2933 2939 June 1, 214 Available at http://www.joics.com Pre-weighted Matching Pursuit Algorithms for Sparse Recovery Jingfei He, Guiling Sun, Jie

More information

A hierarchical Krylov-Bayes iterative inverse solver for MEG with physiological preconditioning

A hierarchical Krylov-Bayes iterative inverse solver for MEG with physiological preconditioning A hierarchical Krylov-Bayes iterative inverse solver for MEG with physiological preconditioning D Calvetti 1 A Pascarella 3 F Pitolli 2 E Somersalo 1 B Vantaggi 2 1 Case Western Reserve University Department

More information

Signal Recovery from Permuted Observations

Signal Recovery from Permuted Observations EE381V Course Project Signal Recovery from Permuted Observations 1 Problem Shanshan Wu (sw33323) May 8th, 2015 We start with the following problem: let s R n be an unknown n-dimensional real-valued signal,

More information

Lecture 3: Pattern Classification

Lecture 3: Pattern Classification EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures

More information

Sensing systems limited by constraints: physical size, time, cost, energy

Sensing systems limited by constraints: physical size, time, cost, energy Rebecca Willett Sensing systems limited by constraints: physical size, time, cost, energy Reduce the number of measurements needed for reconstruction Higher accuracy data subject to constraints Original

More information

Nonconvex penalties: Signal-to-noise ratio and algorithms

Nonconvex penalties: Signal-to-noise ratio and algorithms Nonconvex penalties: Signal-to-noise ratio and algorithms Patrick Breheny March 21 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/22 Introduction In today s lecture, we will return to nonconvex

More information

Uniform Uncertainty Principle and signal recovery via Regularized Orthogonal Matching Pursuit

Uniform Uncertainty Principle and signal recovery via Regularized Orthogonal Matching Pursuit Uniform Uncertainty Principle and signal recovery via Regularized Orthogonal Matching Pursuit arxiv:0707.4203v2 [math.na] 14 Aug 2007 Deanna Needell Department of Mathematics University of California,

More information

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions

More information

The Pros and Cons of Compressive Sensing

The Pros and Cons of Compressive Sensing The Pros and Cons of Compressive Sensing Mark A. Davenport Stanford University Department of Statistics Compressive Sensing Replace samples with general linear measurements measurements sampled signal

More information

Learning Multiple Tasks with a Sparse Matrix-Normal Penalty

Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1

More information

TTIC 31230, Fundamentals of Deep Learning David McAllester, Winter Generalization and Regularization

TTIC 31230, Fundamentals of Deep Learning David McAllester, Winter Generalization and Regularization TTIC 31230, Fundamentals of Deep Learning David McAllester, Winter 2019 Generalization and Regularization 1 Chomsky vs. Kolmogorov and Hinton Noam Chomsky: Natural language grammar cannot be learned by

More information

IEOR 265 Lecture 3 Sparse Linear Regression

IEOR 265 Lecture 3 Sparse Linear Regression IOR 65 Lecture 3 Sparse Linear Regression 1 M Bound Recall from last lecture that the reason we are interested in complexity measures of sets is because of the following result, which is known as the M

More information

Type II variational methods in Bayesian estimation

Type II variational methods in Bayesian estimation Type II variational methods in Bayesian estimation J. A. Palmer, D. P. Wipf, and K. Kreutz-Delgado Department of Electrical and Computer Engineering University of California San Diego, La Jolla, CA 9093

More information