Special Thanks to Yuzhe Jin
1 Bhaskar Rao, Department of Electrical and Computer Engineering, University of California, San Diego. David Wipf, Biomedical Imaging Laboratory, University of California, San Francisco. Special Thanks to Yuzhe Jin
2 Outline: Part 1. Motivation for Tutorial. Sparse Signal Recovery Problem. Applications. Computational Algorithms: Greedy Search, ℓ1-norm minimization. Performance Guarantees.
3 Outline: Part 2. Motivation: limitations of popular inverse methods. Maximum a posteriori (MAP) estimation. Bayesian inference. Analysis of Bayesian inference and connections with MAP. Applications to neuroimaging.
4 Outline: Part 1. Motivation for Tutorial. Sparse Signal Recovery Problem. Applications. Computational Algorithms: Greedy Search, ℓ1-norm minimization. Performance Guarantees.
5 Early Works. R.R. Hocking and R.N. Leslie, "Selection of the Best Subset in Regression Analysis," Technometrics. S. Singhal and B.S. Atal, "Amplitude Optimization and Pitch Estimation in Multipulse Coders," IEEE Trans. Acoust., Speech, Signal Processing, 1989. S.D. Cabrera and T.W. Parks, "Extrapolation and spectral estimation with iterative weighted norm modification," IEEE Trans. Acoust., Speech, Signal Processing, April 1991. Many more works. Our first work: I.F. Gorodnitsky, B.D. Rao and J. George, "Source Localization in Magnetoencephalography using an Iterative Weighted Minimum Norm Algorithm," IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, pages 167-171, Oct. 1992.
6 Early Session on Sparsity. Organized with Prof. Bresler a special session at the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing: SPEC-DSP: Signal Processing with the Sparseness Constraint. Papers: "Signal Processing with the Sparseness Constraint," B. Rao (University of California, San Diego, USA); "Application of Basis Pursuit in Spectrum Estimation," S. Chen (IBM, USA), D. Donoho (Stanford University, USA); "Parsimony and Wavelet Method for Denoising," H. Krim (MIT, USA), J. Pesquet (University Paris Sud, France), I. Schick (GTE Internetworking and Harvard Univ., USA); "Parsimonious Side Propagation," P. Bradley, O. Mangasarian (University of Wisconsin-Madison, USA); "Fast Optimal and Suboptimal Algorithms for Sparse Solutions to Linear Inverse Problems," G. Harikumar (Tellabs Research, USA), C. Couvreur, Y. Bresler (University of Illinois, Urbana-Champaign, USA); "Measures and Algorithms for Best Basis Selection," K. Kreutz-Delgado, B. Rao (University of California, San Diego, USA); "Sparse Inverse Solution Methods for Signal and Image Processing Applications," B. Jeffs (Brigham Young University, USA); "Image Denoising Using Multiple Compaction Domains," P. Ishwar, K. Ratakonda, P. Moulin, N. Ahuja (University of Illinois, Urbana-Champaign, USA).
7 Motivation for Tutorial. Sparse Signal Recovery is an interesting area with many potential applications; unification of the theory will provide synergy. Methods developed for solving the Sparse Signal Recovery problem can be a valuable tool for signal processing practitioners. Many interesting developments in the recent past make the subject timely.
8 Outline: Part 1. Motivation for Tutorial. Sparse Signal Recovery Problem. Applications. Computational Algorithms: Greedy Search, ℓ1-norm minimization. Performance Guarantees.
9 Problem Description: y = Φx + ε. y is the n×1 measurement vector. Φ is the n×m dictionary matrix, with m >> n. x is the m×1 desired vector, which is sparse with k nonzero entries. ε is the additive noise, modeled as additive white Gaussian.
10 Problem Statement. Noise-free case: given a target signal y and a dictionary Φ, find the weights x that solve min_x Σ_{i=1..m} I(x_i ≠ 0) subject to y = Φx, where I(.) is the indicator function. Noisy case: given a target signal y and a dictionary Φ, find the weights x that solve min_x Σ_{i=1..m} I(x_i ≠ 0) subject to ||y − Φx||_2 ≤ β.
11 Complexity. A search over all possible subsets would mean a search over a total of C(m, k) subsets: combinatorial complexity. With m = 30, n = 20, and k = 10 there are about 3×10^7 subsets (very complex). A branch and bound algorithm can be used to find the optimal solution; the space of subsets searched is pruned, but the search may still be very complex. The indicator function is not continuous and so is not amenable to standard optimization tools. Challenge: find low-complexity methods with acceptable performance.
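The exhaustive subset search described above is easy to state in code. Below is a minimal, illustrative sketch (the function name and the small problem sizes are my own choices, not from the tutorial) that enumerates all C(m, k) supports and keeps the least-squares fit with the smallest residual:

```python
import itertools
import numpy as np

def l0_brute_force(Phi, y, k):
    """Exhaustive l0 search: try every k-column subset of Phi and keep the
    least-squares fit with the smallest residual. Cost grows as C(m, k)."""
    m = Phi.shape[1]
    best_err, best_x = np.inf, None
    for S in itertools.combinations(range(m), k):
        cols = list(S)
        xs, *_ = np.linalg.lstsq(Phi[:, cols], y, rcond=None)
        err = np.linalg.norm(y - Phi[:, cols] @ xs)
        if err < best_err:
            best_err = err
            best_x = np.zeros(m)
            best_x[cols] = xs
    return best_x

# Tiny demo: n = 4 measurements, m = 8 atoms, k = 2 nonzeros (28 subsets).
rng = np.random.default_rng(0)
Phi = rng.standard_normal((4, 8))
x_true = np.zeros(8)
x_true[[1, 5]] = [1.0, -2.0]
y = Phi @ x_true
x_hat = l0_brute_force(Phi, y, 2)
```

Even this toy problem touches 28 subsets; at the slide's m = 30, k = 10 the same loop would visit roughly 3×10^7 subsets, which is exactly the combinatorial blow-up being described.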
12 Outline: Part 1. Motivation for Tutorial. Sparse Signal Recovery Problem. Applications. Computational Algorithms: Greedy Search, ℓ1-norm minimization. Performance Guarantees.
13 Applications. Signal Representation (Mallat, Coifman, Wickerhauser, Donoho, ...). EEG/MEG (Leahy, Gorodnitsky, Ioannides, ...). Functional Approximation and Neural Networks (Chen, Natarajan, Cun, Hassibi, ...). Bandlimited extrapolation and spectral estimation (Papoulis, Lee, Cabrera, Parks, ...). Speech Coding (Ozawa, Ono, Kroon, Atal, ...). Sparse channel equalization (Fevrier, Greenstein, Proakis, ...). Compressive Sampling (Donoho, Candes, Tao, ...). Magnetic Resonance Imaging (Lustig, ...).
14 DFT Example. Measurement: y[l] = 2(cos ω_0 l + cos ω_1 l), l = 0, 1, 2, ..., n−1. Dictionary elements: φ_l = [1, e^{jω_l}, e^{j2ω_l}, ..., e^{j(n−1)ω_l}]^T with ω_l = 2πl/m. Consider m = 64, 128, 256 and 512. Questions: What is the result of a zero-padded DFT? When viewed as the problem of solving a linear system of equations with dictionary Φ^(m), what solution does the DFT give us? Are there more desirable solutions for this problem?
15 DFT Example. Note that y lies in the span of a few columns of Φ^(128), Φ^(256) and Φ^(512) (each cosine contributes a pair of complex exponentials). Consider the linear system of equations y = Φ^(m) x. The frequency components in the data are in the dictionaries Φ^(m) for m = 128, 256, 512. What solution among all possible solutions does the DFT compute?
16 DFT Example. [Figure: zero-padded DFT magnitudes for m = 64, 128, 256 and 512.]
17 Sparse Channel Estimation. Received sequence: r(i) = Σ_{j=0..m−1} s(i−j) c(j) + ε(i), i = 0, 1, ..., n−1, where s is the training sequence, c is the channel impulse response, and ε is the noise.
18 Example: Sparse Channel Estimation. Formulated as a sparse signal recovery problem: [r(0); r(1); ...; r(n−1)] = S c + ε, where S is the n×m matrix whose i-th row is [s(i), s(i−1), ..., s(i−m+1)] (entries with negative argument set to zero) and c = [c(0); c(1); ...; c(m−1)]. Can use any relevant algorithm to estimate the sparse channel coefficients.
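As a sketch of how the convolution above becomes a matrix-vector product, the snippet below (function name and problem sizes are illustrative, not from the slides) builds the n×m matrix with entries s(i−j) and checks it against direct convolution:

```python
import numpy as np

def convolution_matrix(s, m):
    """Build the n x m matrix Phi with Phi[i, j] = s[i - j] (zero for i < j),
    so that r = Phi @ c is the linear convolution of the training sequence s
    with a length-m channel impulse response c, truncated to n samples."""
    n = len(s)
    Phi = np.zeros((n, m))
    for j in range(m):
        Phi[j:, j] = s[: n - j]
    return Phi

rng = np.random.default_rng(1)
s = rng.choice([-1.0, 1.0], size=32)       # binary training sequence
c = np.zeros(8)
c[[0, 5]] = [1.0, 0.4]                     # sparse channel: two taps
Phi = convolution_matrix(s, 8)
r = Phi @ c                                # noiseless received sequence
r_direct = np.convolve(s, c)[:32]          # sanity check vs. direct convolution
```

With noise added to r, estimating the two taps of c from (Phi, r) is exactly the sparse recovery problem on this slide.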
19 Compressive Sampling. D. Donoho, "Compressed Sensing," IEEE Trans. on Information Theory, 2006. E. Candes and T. Tao, "Near Optimal Signal Recovery from Random Projections: Universal Encoding Strategies," IEEE Trans. on Information Theory, 2006.
20 Compressive Sampling. Transform coding: x = Ψb, with b sparse. What is the problem here? Sampling at the Nyquist rate, then keeping only a small number of nonzero coefficients. Can we directly acquire the signal below the Nyquist rate?
21 Compressive Sampling. Transform coding: x = Ψb. Compressive sampling: y = Ax = AΨb, where A is the sampling matrix.
22 Compressive Sampling: y = Ax = AΨb. Computation: 1. Solve for a sparse w such that AΨw = y. 2. Reconstruction: x̂ = Ψw. Issues: need to recover the sparse signal w under the constraint AΨw = y; need to design the sampling matrix A.
23 Outline: Part 1. Motivation for Tutorial. Sparse Signal Recovery Problem. Applications. Computational Algorithms: Greedy Search, ℓ1-norm minimization. Performance Guarantees.
24 Potential Approaches. Combinatorial complexity, so we need alternate strategies. Greedy Search Techniques: Matching Pursuit, Orthogonal Matching Pursuit. Minimizing Diversity Measures: the indicator function is not continuous, so define surrogate cost functions that are more tractable and whose minimization leads to sparse solutions, e.g., ℓ1 minimization. Bayesian Methods: make appropriate statistical assumptions on the solution and apply estimation techniques to identify the desired sparse solution.
25 Greedy Search Method: Matching Pursuit. Select the column that is most aligned with the current residual: l = argmax_{1≤j≤m} |φ_j^T r^(i−1)|, with r^(0) = y and S^(i) the set of indices selected. Remove its contribution from the residual. Update S^(i): S^(i) = S^(i−1) ∪ {l} if l is new, otherwise keep S^(i) the same. Update r^(i): r^(i) = r^(i−1) − P_{φ_l} r^(i−1) = r^(i−1) − φ_l φ_l^T r^(i−1) (for unit-norm columns). Practical stopping criteria: a fixed number of iterations, or ||r^(i)||_2 smaller than a threshold.
26 Greedy Search Method: Orthogonal Matching Pursuit (OMP). Select the column that is most aligned with the current residual: l = argmax_{1≤j≤m} |φ_j^T r^(i−1)|, with r^(0) = y and S^(i) the set of indices selected. Remove its contribution from the residual. Update S^(i): S^(i) = S^(i−1) ∪ {l}. Update r^(i): r^(i) = r^(i−1) − P_{[φ_{l_1}, φ_{l_2}, ..., φ_{l_i}]} r^(i−1), i.e., project out the span of all columns selected so far.
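The OMP loop above can be sketched in a few lines; this is a minimal illustration (not the authors' implementation), using least squares for the projection step and a fixed iteration count as the stopping rule, with sizes and coefficient values chosen for the demo:

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal Matching Pursuit sketch: at each step pick the column most
    aligned with the residual, then re-fit y on all selected columns."""
    n, m = Phi.shape
    S = []                      # selected support S^(i)
    r = y.copy()                # residual r^(0) = y
    x = np.zeros(m)
    for _ in range(k):
        l = int(np.argmax(np.abs(Phi.T @ r)))   # column most aligned with r
        if l not in S:
            S.append(l)
        # Orthogonal projection step: least-squares fit on selected columns.
        xs, *_ = np.linalg.lstsq(Phi[:, S], y, rcond=None)
        r = y - Phi[:, S] @ xs
        x = np.zeros(m)
        x[S] = xs
    return x, sorted(S)

rng = np.random.default_rng(2)
Phi = rng.standard_normal((25, 50))
Phi /= np.linalg.norm(Phi, axis=0)          # unit-norm columns
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [10.0, -5.0, 2.0]     # strongly scaled nonzeros
y = Phi @ x_true
x_hat, support = omp(Phi, y, 3)
```

The strongly scaled coefficients make the greedy selection reliable here; with equal-magnitude coefficients the same loop is more likely to pick a wrong index, which foreshadows the deficiency discussed two slides below.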
27 Greedy Search Method: Order-Recursive OMP. Select the column that is most aligned with the current residual, normalizing by the deflated column norms: l = argmax_{1≤j≤m} |(φ_j^(i−1))^T r^(i−1)|² / ||φ_j^(i−1)||²_2. Update S^(i): S^(i) = S^(i−1) ∪ {l}. Update r^(i): r^(i) = r^(i−1) − P_{[φ_{l_1}, ..., φ_{l_i}]} r^(i−1). Update φ_j^(i) = φ_j^(i−1) − P_{[φ_{l_1}, ..., φ_{l_i}]} φ_j^(i−1), which can be computed recursively.
28 Deficiency of Matching Pursuit Type Algorithms. If the algorithm picks a wrong index at an iteration, there is no way to correct this error in subsequent iterations. Some recent algorithms address this: Stagewise Orthogonal Matching Pursuit (Donoho, Tsaig, ...), CoSaMP (Needell, Tropp).
29 Inverse Techniques. For the system of equations Φx = y, the solution set is characterized by {x_s : x_s = Φ^+ y + v, v ∈ N(Φ)}, where N(Φ) denotes the null space of Φ and Φ^+ = Φ^T(ΦΦ^T)^(−1). Minimum-norm solution: the minimum ℓ2-norm solution x_mn = Φ^+ y is a popular solution. Noisy case: the regularized ℓ2-norm solution is often employed, given by x_reg = Φ^T(ΦΦ^T + λI)^(−1) y.
30 Minimum ℓ2-Norm Solution. Problem: the minimum ℓ2-norm solution is not sparse. [Numerical example comparing x_mn with the sparse solution x lost in transcription.] DFT: also computes the minimum ℓ2-norm solution.
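A quick numerical illustration of this point (dimensions chosen arbitrarily): the minimum ℓ2-norm solution Φ^+ y fits the data exactly, yet spreads energy over essentially all coefficients even when the generating vector is 2-sparse:

```python
import numpy as np

rng = np.random.default_rng(3)
Phi = rng.standard_normal((10, 40))
x_true = np.zeros(40)
x_true[[2, 9]] = [1.0, -1.0]          # 2-sparse ground truth
y = Phi @ x_true

# Minimum l2-norm solution: x_mn = Phi^+ y = Phi^T (Phi Phi^T)^{-1} y
x_mn = Phi.T @ np.linalg.solve(Phi @ Phi.T, y)

fit_err = np.linalg.norm(Phi @ x_mn - y)       # exact fit, up to round-off
nonzeros = int(np.sum(np.abs(x_mn) > 1e-8))    # but almost nothing is zero
```

The data are explained perfectly, yet the ℓ2 solution bears no resemblance to the sparse generator, which is why diversity measures are introduced next.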
31 Diversity Measures. Recall: min_x Σ_{i=1..m} I(x_i ≠ 0) subject to y = Φx. Diversity measures are functionals whose minimization leads to sparse solutions. Many examples are found in the fields of economics, social science and information theory. These functionals are usually concave, which leads to difficult optimization problems.
32 Examples of Diversity Measures. ℓ_p diversity measure (p ≤ 1): E^(p)(x) = Σ_{i=1..m} |x_i|^p, 0 ≤ p ≤ 1. As p → 0: lim_{p→0} E^(p)(x) = lim_{p→0} Σ_{i=1..m} |x_i|^p = Σ_{i=1..m} I(x_i ≠ 0). ℓ1 norm, the convex relaxation of ℓ0: E^(1)(x) = Σ_{i=1..m} |x_i|.
33 ℓ1 Diversity Measure. Noiseless case: min_x Σ_{i=1..m} |x_i| subject to Φx = y. Noisy case, ℓ1 regularization [Candes, Romberg, Tao]: min_x Σ_{i=1..m} |x_i| subject to ||y − Φx||_2 ≤ β. Lasso [Tibshirani], Basis Pursuit Denoising [Chen, Donoho, Saunders]: min_x ||y − Φx||²_2 + λ Σ_{i=1..m} |x_i|.
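As one concrete way to solve the Lasso / Basis Pursuit Denoising objective above, here is a small iterative soft-thresholding (ISTA) sketch. ISTA is a standard solver for this objective, though the tutorial itself does not prescribe a particular algorithm, and the sizes, seed and parameter values below are my own:

```python
import numpy as np

def ista(Phi, y, lam, n_iter=2000):
    """Iterative soft-thresholding sketch for the Lasso objective
    min_x 0.5 * ||y - Phi x||_2^2 + lam * ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        g = Phi.T @ (Phi @ x - y)            # gradient of the quadratic term
        z = x - g / L                        # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(4)
Phi = rng.standard_normal((25, 60))
Phi /= np.linalg.norm(Phi, axis=0)           # unit-norm columns
x_true = np.zeros(60)
x_true[[5, 20, 44]] = [3.0, -2.0, 1.5]
y = Phi @ x_true
x_hat = ista(Phi, y, lam=0.01)
```

With a small λ the solution is close to the Basis Pursuit solution and the three true atoms carry the dominant coefficients.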
34 ℓ1-norm minimization and MAP estimation. MAP estimation: x̂ = argmax_x p(x|y) = argmax_x [log p(y|x) + log p(x)]. If we assume ε is zero-mean i.i.d. Gaussian noise and p(x) = Π_i p(x_i) with p(x_i) ∝ exp(−λ|x_i|) (a Laplacian prior), then x̂ = argmin_x ||y − Φx||²_2 + λ Σ_i |x_i|.
35 Attractiveness of ℓ1 methods. Convex optimization, with an associated rich class of optimization algorithms: interior point methods, coordinate descent methods. Question: what is their ability to find the sparse solution?
36 Why does the diversity measure encourage sparse solutions? min_x ||[x_1, x_2]||_p subject to Φx = y. [Figure: equal-norm contours for 0 < p < 1, p = 1 and p > 1 meeting the constraint line Φx = y; for p ≤ 1 the contour touches the line on a coordinate axis, giving a sparse solution.]
37 ℓ1 norm and Linear Programming. Linear Program (LP): min_x c^T x subject to Φx = y. Key result in LP (Luenberger): (a) if there is a feasible solution, there is a basic feasible solution*; (b) if there is an optimal feasible solution, there is an optimal basic feasible solution. *If Φ is n×m, then a basic feasible solution is a solution with at most n nonzero entries.
38 Example with ℓ1 diversity measure. [Dictionary and measurement values lost in transcription.] Noiseless case: x_BP = [1, 0, 0]^T (machine precision). Noisy case: assume the measurement noise ε = [0.01, 0.01]^T. ℓ1 regularization result: x_l1r = [0.986, 0, ...]^T. Lasso result (λ = 0.05): x_lasso = [0.975, 0, ...]^T.
39 Example with ℓ1 diversity measure. Continue with the DFT example: y[l] = 2(cos ω_0 l + cos ω_1 l), l = 0, 1, 2, ..., n−1. The m = 64, 128, 256, 512 DFTs cannot separate the adjacent frequency components. Using ℓ1 diversity measure minimization (m = 256), the two components are resolved. [Figure lost in transcription.]
40 Outline: Part 1. Motivation for Tutorial. Sparse Signal Recovery Problem. Applications. Computational Algorithms: Greedy Search, ℓ1-norm minimization. Performance Guarantees.
41 Important Questions. When is the ℓ0 solution unique? When is the ℓ1 solution equivalent to that of ℓ0? Noiseless case; noisy measurements.
42 Uniqueness. Definition of Spark: the smallest number of columns from Φ that are linearly dependent (Spark(Φ) ≤ n + 1). Uniqueness of the sparsest solution: if Σ_{i=1..m} I(x_i ≠ 0) < (1/2) Spark(Φ), then x is the unique solution to argmin_x Σ_{i=1..m} I(x_i ≠ 0) subject to Φx = y.
43 Mutual Coherence. For a given matrix Φ = [φ_1, φ_2, ..., φ_m], the mutual coherence µ(Φ) is defined as µ(Φ) = max_{1≤i,j≤m, i≠j} |φ_i^T φ_j| / (||φ_i||_2 ||φ_j||_2).
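The definition above translates directly into code; a small sketch (function name and sizes are my own):

```python
import numpy as np

def mutual_coherence(Phi):
    """Largest absolute inner product between distinct normalized columns."""
    G = Phi / np.linalg.norm(Phi, axis=0)     # unit-norm columns
    gram = np.abs(G.T @ G)                    # |phi_i^T phi_j| for all pairs
    np.fill_diagonal(gram, 0.0)               # ignore the i = j terms
    return gram.max()

rng = np.random.default_rng(5)
mu = mutual_coherence(rng.standard_normal((200, 400)))
```

For an orthonormal basis the coherence is 0; for any overcomplete dictionary it is strictly positive, and the guarantees on the following slides degrade as it grows.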
44 Performance of ℓ1 diversity minimization algorithms. Noiseless case [Donoho & Elad 03]: if Σ_{i=1..m} I(x_i ≠ 0) < (1/2)(1 + 1/µ(Φ)), then x is the unique solution to argmin_x Σ_{i=1..m} |x_i| subject to Φx = y.
45 Performance of ℓ1 diversity minimization algorithms. Noisy case [Donoho et al. 06]: assume y = Φx + ε with ||ε||_2 ≤ β, and k = Σ_{i=1..m} I(x_i ≠ 0) ≤ (1/4)(1 + 1/µ(Φ)). Then the solution x* = argmin_d Σ_{i=1..m} |d_i| subject to ||y − Φd||_2 ≤ β satisfies ||x* − x||²_2 ≤ 4β² / (1 − µ(Φ)(4k − 1)).
46 Empirical distribution of mutual coherence (Gaussian matrices). Gaussian matrices with N(0,1) entries, column norms normalized to 1; randomly generated matrices. [Figure: histograms of mutual coherence for matrices with 200 columns (mean = 0.4025) and 2000 columns (mean = 0.161).]
47 Performance of Orthogonal Matching Pursuit. Noiseless case [Tropp 04]: if Σ_{i=1..m} I(x_i ≠ 0) < (1/2)(1 + 1/µ(Φ)), then OMP guarantees recovery of x after Σ_{i=1..m} I(x_i ≠ 0) iterations.
48 Performance of Orthogonal Matching Pursuit. Noisy case [Donoho et al.]: assume y = Φx + ε with ||ε||_2 ≤ β, let x_min = min_i |x_i| over the nonzero entries, and let k = Σ_{i=1..m} I(x_i ≠ 0) ≤ (1/2)(1 + 1/µ(Φ)) − β/(µ(Φ) x_min). Stop OMP when the residual error drops below β. Then the solution of OMP satisfies ||x_OMP − x||²_2 ≤ β² / (1 − µ(Φ)(k − 1)).
49 Restricted Isometry Constant. Definition [Candes et al.]: for a matrix Φ, the smallest constant δ_k such that (1 − δ_k)||x||²_2 ≤ ||Φx||²_2 ≤ (1 + δ_k)||x||²_2 for any k-sparse signal x.
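Computing δ_k exactly is combinatorial, but a Monte-Carlo probe over random k-sparse vectors gives an empirical lower bound on it and shows the concentration that makes random matrices work. This is a sketch; the function name, sizes and trial count are my own choices:

```python
import numpy as np

def empirical_rip_delta(Phi, k, n_trials=500, seed=0):
    """Monte-Carlo lower bound on the restricted isometry constant delta_k:
    the largest observed | ||Phi x||^2 / ||x||^2 - 1 | over random k-sparse x."""
    rng = np.random.default_rng(seed)
    m = Phi.shape[1]
    delta = 0.0
    for _ in range(n_trials):
        S = rng.choice(m, size=k, replace=False)   # random k-sparse support
        x = np.zeros(m)
        x[S] = rng.standard_normal(k)
        ratio = np.linalg.norm(Phi @ x) ** 2 / np.linalg.norm(x) ** 2
        delta = max(delta, abs(ratio - 1.0))
    return delta

n, m, k = 128, 256, 5
Phi = np.random.default_rng(8).standard_normal((n, m)) / np.sqrt(n)  # N(0, 1/n)
delta_hat = empirical_rip_delta(Phi, k)
```

This is only a lower bound (a certificate over sampled supports, not all of them), but it illustrates why the N(0, 1/n) ensemble on the following slides behaves like a near-isometry on sparse vectors.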
50 Performance of ℓ1 diversity measure minimization algorithms [Candes 08]. Assume y = Φx + ε with ||ε||_2 ≤ β, x is k-sparse, and δ_2k < √2 − 1. Then the solution x* = argmin_d Σ_{i=1..m} |d_i| subject to ||y − Φd||_2 ≤ β satisfies ||x* − x||_2 ≤ Cβ, where C only depends on δ_2k.
51 Performance of ℓ1 diversity measure minimization algorithms [Candes 08]. Assume y = Φx + ε with ||ε||_2 ≤ β, and δ_2k < √2 − 1. Then the solution x* = argmin_d Σ_{i=1..m} |d_i| subject to ||y − Φd||_2 ≤ β satisfies ||x* − x||_2 ≤ C_1 k^(−1/2) ||x − x_k||_1 + C_2 β, where C_1, C_2 only depend on δ_2k, and x_k is the vector x with all but the k largest entries set to zero.
52 Matrices with Good RIC. It turns out that random matrices can satisfy the requirement (say, δ_2k < √2 − 1) with high probability. For an n×m matrix Φ, generate each element Φ_ij ~ N(0, 1/n), i.i.d. [Candes et al.]: if n = O(k log(m/k)), then P(δ_2k < √2 − 1) → 1. Observations: a large Gaussian random matrix will have good RIC with high probability. Similar results can be obtained using other probability ensembles: Bernoulli, random Fourier, etc. For ℓ1-based procedures, the number of measurements required is n > k log m (order-wise).
53 More General Question. What are the limits of recovery in the presence of noise, with no constraint on the recovery algorithm? Information-theoretic approaches are useful in this case (Wainwright, Fletcher, Akcakaya, ...). One can connect the problem of support recovery to channel coding over the Gaussian multiple access channel; capacity regions become useful (Jin).
54 Performance for Support Recovery. Noisy measurements: y = Φx + ε, ε_i ~ N(0, σ²) i.i.d.; random matrix Φ with Φ_ij ~ N(0, σ_a²) i.i.d. Performance metric: exact support recovery. Consider a sequence of problems with increasing sizes. For k = 2 (two-user MAC capacity): c(x) = min over nonempty subsets T ⊆ {1, 2} of the nonzero indices of (1/(2|T|)) log(1 + (σ_a²/σ²) Σ_{i∈T} x_i²). Sufficient condition: if limsup_m (log m)/n < c(x), then there exists a sequence of support recovery methods (for the different problems, respectively) such that lim P(supp(x̂) ≠ supp(x)) = 0. Necessary condition: if there exists a sequence of support recovery methods such that lim_m P(supp(x̂) ≠ supp(x)) = 0, then limsup_m (log m)/n ≤ c(x). The approach can deal with general scaling among (m, n, k).
55 Network information theory perspective. Connection to channel coding (k = 1): y = Φx + ε with a single nonzero entry x_i, so y = x_i φ_i + ε; the i-th column is the one selected.
56 y = x_i φ_i + ε: the measurement is a scaled, noisy copy of the selected column φ_i.
57 Additive White Gaussian Noise (AWGN) Channel. b = ha + e, where a is the channel input, h the channel gain, e the additive Gaussian noise, and b the channel output. Recall y = x_i φ_i + ε.
58-62 [Figure, built up over five slides: the correspondence between y = x_i φ_i + ε and the AWGN channel b = ha + e, with φ_i playing the role of the input a, x_i the gain h, ε the noise e, and y the output b.]
63 Connection to channel coding (k > 1): y = x_{s1} φ_{s1} + ... + x_{sk} φ_{sk} + ε.
64 Multiple Access Channel (MAC). y = x_{s1} φ_{s1} + ... + x_{sk} φ_{sk} + ε corresponds to the K-user MAC output b = h_1 a_1 + ... + h_K a_K + e.
65 Connection between the two problems. Φ (dictionary, measurement matrix) ↔ codebook; φ_i (a column) ↔ a codeword; the m different positions in x ↔ the message set {1, 2, ..., m}; s_1, ..., s_k (indices of nonzeros) ↔ the messages selected; x_{s_i} (source activity) ↔ channel gain h_i.
66 Differences between the two problems. Channel coding: each sender works with a codebook designed only for that sender; capacity results are available when the channel gain h_i is known at the receiver. Support recovery: all senders share the same codebook Φ, with different senders selecting different codewords (common codebook); we don't know the nonzero signal activities x_i (unknown channel). Proposed approach [Jin, Kim and Rao]: leverage MAC capacity results to shed light on the performance limits of exact support recovery; conventional methods plus customized methods: distance decoding, nonzero signal value estimation, Fano's inequality.
67
68 Outline 1. Motivation: Limitations of popular inverse methods 2. Maximum a posteriori (MAP) estimation 3. Bayesian Inference 4. Analysis of Bayesian inference and connections with MAP 5. Applications to neuroimaging
69 Section I: Motivation
70 Limitation I. Most sparse recovery results, using either greedy methods (e.g., OMP) or convex ℓ1 minimization (e.g., BP), place heavy restrictions on the form of the dictionary Φ. While in some situations we can satisfy these restrictions (e.g., compressive sensing), for many applications we cannot (e.g., source localization, feature selection). When the assumptions on the dictionary are violated, performance may be very suboptimal.
71 Limitation II The distribution of nonzero magnitudes in the maximally sparse solution x 0 can greatly affect the difficulty of canonical sparse recovery problems. This effect is strongly algorithm-dependent
72 Examples: ℓ1-norm Solutions and OMP. With ℓ1-norm solutions, performance is independent of the magnitudes of the nonzero coefficients [Malioutov et al., 2004]. Problem: performance does not improve when the situation is favorable. OMP is highly sensitive to these magnitudes. Problem: performance degrades heavily with unit magnitudes.
73 In General. If the magnitudes of the non-zero elements in x_0 are highly scaled, then the canonical sparse recovery problem should be easier. [Figure: x_0 with scaled coefficients (easy) vs. uniform coefficients (hard).] The (approximate) Jeffreys distribution produces sufficiently scaled coefficients such that the best solution can always be easily computed.
74 Jeffreys Distribution. Density: p(x) ∝ 1/x. [Figure: p(x) over intervals A, B, C on a log-spaced axis; all have equal area.]
75 Empirical Example. For each test case: 1. Generate a random dictionary Φ with 50 rows and 100 columns. 2. Generate a sparse coefficient vector x_0. 3. Compute the signal via y = Φx_0 (noiseless). 4. Run BP and OMP, as well as a competing Bayesian method called SBL (more on this later), to try and correctly estimate x_0. 5. Average over 1000 trials to compute the empirical probability of failure. Repeat with different sparsity values, i.e., ||x_0||_0 ranging from 10 to 30.
76 Sample Results (n = 50, m = 100). [Figure: error rate vs. ||x_0||_0 for approximate Jeffreys coefficients and for unit coefficients.]
77 Limitation III. It is not immediately clear how to use these methods to assess uncertainty in coefficient estimates (e.g., covariances). Such estimates can be useful for designing an optimal (non-random) dictionary Φ. For example, it is well known in the machine learning and image processing communities that under-sampled random projections of natural scenes are very suboptimal.
78 Section II: MAP Estimation Using the Sparse Linear Model
79 Overview. Can be viewed as sparse penalized regression with general sparsity-promoting penalties (the ℓ1 penalty is a special case). These penalties can be chosen to overcome some previous limitations when minimized with simple, efficient algorithms. Theory is somewhat harder to come by, but practical results are promising. In some cases we can guarantee improvement over the ℓ1 solution on sparse recovery problems.
80 Sparse Linear Model. Linear generative model: the observed n-dimensional data vector is y = Φx + ε, where Φ is a matrix of m basis vectors, x the unknown coefficients, and ε Gaussian noise with variance λ. Objective: estimate the unknown x given the following assumptions: 1. Φ is overcomplete, meaning the number of columns m is greater than the signal dimension n. 2. x is maximally sparse, i.e., many elements equal zero.
81 Sparse Inverse Problem. Noiseless case (ε = 0): x_0 = argmin_x ||x||_0 s.t. y = Φx, where the ℓ0 norm ||x||_0 is the number of nonzeros in x. Noisy case (ε > 0): x_0(λ) = argmin_x ||y − Φx||²_2 + λ||x||_0 = argmax_x exp(−(1/2λ)||y − Φx||²_2) · exp(−(1/2)||x||_0), i.e., likelihood × prior.
82 Difficulties. 1. Combinatorial number of local minima. 2. The objective is discontinuous. A variety of existing approximate methods can be viewed as MAP estimation using a flexible class of sparse priors.
83 Sparse Priors: 2-D Example [Tipping, 2001]. [Figure: a Gaussian distribution vs. a sparse distribution.]
84 Basic MAP Estimation. x̂ = argmax_x p(x|y) = argmin_x [−log p(y|x) − log p(x)] = argmin_x ||y − Φx||²_2 + λ Σ_i g(x_i), i.e., data fit plus penalty, where g(x_i) = h(x_i²) and h is a nondecreasing, concave function. Note: the Bayesian interpretation will be useful later.
85 Example Sparsity Penalties. With g(x_i) = I[x_i ≠ 0] we have the canonical sparsity penalty and its associated problems. Practical selections: g(x_i) = log(x_i² + ε) [Chartrand and Yin, 2008]; g(x_i) = log(|x_i| + ε) [Candes et al., 2008]; g(x_i) = |x_i|^p [Rao et al., 2003]. Different choices favor different levels of sparsity.
86 Example 2-D Contours, g(x_i) = |x_i|^p. [Figure: contours of the penalty in the (x_1, x_2) plane for p = 0.2, p = 0.5 and p = 1.]
87 Which Penalty Should We Choose? Two competing issues: 1. If the prior is too sparse (e.g., p → 0), then we may get stuck in a local minimum: convergence error. 2. If the prior is not sparse enough (e.g., p → 1), then the global minimum may be found, but it might not equal x_0: structural error. The answer is ultimately application- and algorithm-dependent.
88 Convergence Errors vs. Structural Errors. [Figure: −log p(x|y) plotted over x; for p → 0 the iterate x' is trapped at a local minimum away from x_0 (convergence error), while for p → 1 the global minimum x' differs from x_0 (structural error).] x' = solution we have converged to; x_0 = maximally sparse solution.
89 Desirable Properties of Algorithms. Can be implemented via relatively simple primitives already in use, e.g., ℓ1 solvers, etc. Improved performance over OMP, BP, etc. Naturally extends to more general problems: 1. Constrained sparse estimation (e.g., finding non-negative sparse solutions). 2. Group sparsity problems.
90 Example Extension: Group Sparsity. The simultaneous sparse estimation problem: the goal is to recover a matrix X with maximal row sparsity [Cotter et al., 2005; Tropp, 2006], given an observation matrix Y produced via Y = ΦX + E. Optimization problem: X_0(λ) = argmin_X ||Y − ΦX||²_F + λ Σ_{i=1..m} I[||x_{i·}|| ≠ 0], where the sum counts the number of nonzero rows in X. Can be efficiently solved/approximated by replacing the indicator function with an alternative function g.
91 Reweighted ℓ2 Optimization. Assume: g(x_i) = h(x_i²), h concave. Updates: x^(k+1) = argmin_x ||y − Φx||²_2 + λ Σ_i w_i^(k) x_i² = W^(k) Φ^T (λI + Φ W^(k) Φ^T)^(−1) y, with W^(k) = diag[w^(k)]^(−1); w_i^(k+1) = [∂g(x_i)/∂x_i] / (2x_i) evaluated at x_i = x_i^(k+1). Based on a simple 1st-order approximation to g(x_i) [Palmer et al., 2006]. Guaranteed not to increase the objective function. Global convergence assured given additional assumptions.
92 Examples. 1. FOCUSS algorithm [Rao et al., 2003]. Penalty: g(x_i) = |x_i|^p, 0 ≤ p ≤ 2. Weight update: w_i^(k+1) ∝ |x_i^(k+1)|^(p−2). Properties: well-characterized convergence rates; very susceptible to local minima when p is small. 2. Chartrand and Yin (2008) algorithm. Penalty: g(x_i) = log(x_i² + ε), ε ≥ 0. Weight update: w_i^(k+1) = (|x_i^(k+1)|² + ε)^(−1). Properties: slowly reducing ε to zero smooths out local minima, initially allowing better solutions to be found; very useful for recovering scaled coefficients.
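A compact sketch of the reweighted ℓ2 iteration for the ℓp diversity measure, with FOCUSS-style weights w_i ∝ |x_i|^(p−2); the regularization floor eps, the ridge parameter, and all problem sizes are my own illustrative choices rather than values from the slides:

```python
import numpy as np

def focuss(Phi, y, p=1.0, lam=1e-6, n_iter=50, eps=1e-8):
    """Reweighted-l2 iteration for the diversity measure sum_i |x_i|^p.
    Each pass solves the weighted ridge problem in closed form:
    x = W Phi^T (lam I + Phi W Phi^T)^{-1} y, with W_ii = |x_i|^(2-p)."""
    n, m = Phi.shape
    x = np.ones(m)                            # neutral start: all weights equal
    for _ in range(n_iter):
        W = np.abs(x) ** (2.0 - p) + eps      # W = diag(w)^{-1} from current x
        PhiW = Phi * W                        # Phi @ diag(W), via broadcasting
        x = W * (Phi.T @ np.linalg.solve(lam * np.eye(n) + PhiW @ Phi.T, y))
    return x

rng = np.random.default_rng(6)
Phi = rng.standard_normal((15, 40))
Phi /= np.linalg.norm(Phi, axis=0)            # unit-norm columns
x_true = np.zeros(40)
x_true[[4, 22]] = [2.0, -1.0]
y = Phi @ x_true                              # noiseless measurements
x_hat = focuss(Phi, y, p=1.0)
```

With p = 1 the underlying objective is convex (ℓ1), so the iteration is safe from local minima; taking p toward 0 promotes sparsity more aggressively but brings back the local-minima risk noted on this slide.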
93 Empirical Comparison. For each test case: 1. Generate a random dictionary Φ with 50 rows and 250 columns. 2. Generate a sparse coefficient vector x_0. 3. Compute the signal via y = Φx_0 (noiseless). 4. Compare Chartrand and Yin's reweighted ℓ2 method with the ℓ1-norm solution with regard to estimating x_0. 5. Average over 1000 independent trials. Repeat with different sparsity levels and different nonzero coefficient distributions.
94 Results: Unit Nonzeros. [Figure: probability of success vs. ||x_0||_0.]
95 Results: Gaussian Nonzeros. [Figure: probability of success vs. ||x_0||_0.]
96 Reweighted ℓ1 Optimization. Assume: g(x_i) = h(|x_i|), h concave. Updates: x^(k+1) = argmin_x ||y − Φx||²_2 + λ Σ_i w_i^(k) |x_i|; w_i^(k+1) = ∂g(x_i)/∂|x_i| evaluated at x_i = x_i^(k+1). Based on a simple 1st-order approximation to g(x_i) [Fazel et al., 2003]. Global convergence given minimal assumptions [Zangwill, 1969]. The per-iteration cost is expensive, but few iterations are needed (and each is sparse). Easy to incorporate alternative data-fit terms or constraints on x.
97 Example [Candes et al., 2008]. Penalty: g(x_i) = log(|x_i| + ε), ε ≥ 0. Updates: x^(k+1) = argmin_x ||y − Φx||²_2 + λ Σ_i w_i^(k) |x_i|; w_i^(k+1) = (|x_i^(k+1)| + ε)^(−1). When the nonzeros in x_0 are scaled, this works much better than regular ℓ1, depending on how ε is chosen. Local minima exist, but since each iteration is sparse, local solutions are not so bad (no worse than the regular ℓ1 solution).
98 Empirical Comparison. For each test case: 1. Generate a random dictionary Φ with 50 rows and 100 columns. 2. Generate a sparse coefficient vector x_0 with 30 truncated Gaussian, strictly positive nonzero coefficients. 3. Compute the signal via y = Φx_0 (noiseless). 4. Compare Candes et al.'s reweighted ℓ1 method (10 iterations) with the ℓ1-norm solution, both constrained to be non-negative, to try and estimate x_0. 5. Average over 1000 independent trials. Repeat with different values of the parameter ε.
99 Empirical Comparison. [Figure: probability of success vs. ε.]
100 Conclusions In practice, MAP estimation addresses some limitations of standard methods (although not Limitation III, assessing uncertainty). Simple updates are possible using either iterative reweighted 1 or 2 minimization. More generally, iterative reweighted f minimization, where f is a convex function, is possible.
101 Section III: Bayesian Inference Using the Sparse Linear Model
102 Note MAP estimation is really just standard/classical penalized regression. So the Bayesian interpretation has not really contributed much as of yet
103 Posterior Modes vs. Posterior Mass. Previous methods focus on finding the implicit mode of p(x|y) by maximizing the joint distribution p(x, y) = p(y|x) p(x). Bayesian inference uses posterior information beyond the mode, i.e., posterior mass. Examples: 1. Posterior mean: can have attractive properties when used as a sparse point estimate (more on this later). 2. Posterior covariance: useful for assessing uncertainty in estimates, e.g., experimental design, learning new projection directions for compressive sensing measurements.
104 Posterior Modes vs. Posterior Mass. [Figure: a posterior p(x|y) whose mode sits away from the bulk of the probability mass.]
105 Problem. For essentially all sparse priors, we cannot compute the normalized posterior p(x|y) = p(y|x) p(x) / ∫ p(y|x) p(x) dx. We also cannot compute posterior moments, e.g., µ_x = E[x|y], Σ_x = Cov[x|y]. So efficient approximations are needed.
106 Approximating Posterior Mass. Goal: approximate p(x, y) with some distribution p̂(x, y) that 1. reflects the significant mass in p(x, y), 2. can be normalized to get the posterior p(x|y), and 3. has easily computable moments, e.g., E[x|y] or Cov[x|y]. Optimization problem: find the p̂(x, y) that minimizes the sum of the misaligned mass: ∫ |p(x, y) − p̂(x, y)| dx = ∫ p(y|x) |p(x) − p̂(x)| dx.
107 Recipe. 1. Start with a Gaussian likelihood p(y|x) = (2πλ)^(−n/2) exp(−(1/2λ)||y − Φx||²_2). 2. Pick an appropriate prior p(x) that encourages sparsity. 3. Choose a convenient parameterized class of approximate priors p̂(x) = p(x; Γ). 4. Solve: Γ̂ = argmin_Γ ∫ p(y|x) |p(x) − p(x; Γ)| dx. 5. Normalize: p(x|y; Γ̂) = p(y|x) p(x; Γ̂) / ∫ p(y|x) p(x; Γ̂) dx.
108 Step 2: Prior Selection. Assume a sparse prior distribution on each coefficient: −log p(x_i) ≡ g(x_i) = h(x_i²), h concave, non-decreasing. [2-D example figure, Tipping 2001: a Gaussian distribution vs. a sparse distribution.]
109 Step 3: Forming Approximate Priors p(x; γ). Any sparse prior can be expressed via the dual form [Palmer et al., 2006]: p(x_i) = max_{γ_i ≥ 0} (2πγ_i)^(−1/2) exp(−x_i²/(2γ_i)) φ(γ_i). Two options: 1. start with p(x_i) and then compute φ(γ_i) via convexity results, or 2. choose φ(γ_i) directly and then compute p(x_i); this procedure will always produce a sparse prior [Palmer et al. 2006]. Dropping the maximization gives the strict variational lower bound p(x_i) ≥ p(x_i; γ_i) = (2πγ_i)^(−1/2) exp(−x_i²/(2γ_i)) φ(γ_i). This yields a convenient class of scaled Gaussian approximations: p(x; Γ) = Π_i p(x_i; γ_i).
110 Example: Approximations to a Sparse Prior. [Figure: a sparse prior p(x_i) together with several scaled Gaussian lower bounds p(x_i; γ_i).]
111 Step 4: Solving for the Optimal γ. To find the best approximation, we must solve Γ̂ = argmin_{Γ≥0} ∫ p(y|x) |p(x) − p(x; Γ)| dx. By virtue of the strict lower bound, this is equivalent to Γ̂ = argmax_{Γ≥0} ∫ p(y|x) p(x; Γ) dx = argmin_{Γ≥0} [log|Σ_y| + y^T Σ_y^(−1) y − 2 Σ_{i=1..m} log φ(γ_i)], where Σ_y = λI + ΦΓΦ^T and Γ = diag(γ).
112 How difficult is finding the optimal parameters γ? If the original MAP estimation problem is convex, then so is the Bayesian inference cost function [Nickisch and Seeger, 2009]. In other situations, the Bayesian inference cost is generally much smoother than the associated MAP estimation (more on this later).
113 Step 5: Posterior Approximation. We have found the approximation p(y|x) p(x; Γ̂) = p(x, y; Γ̂) ≈ p(x, y). By design, this approximation reflects significant mass in the full distribution p(x, y). It is also easily normalized to form p(x|y; Γ̂) = N(µ_x, Σ_x), with µ_x = E[x|y; Γ̂] = Γ̂Φ^T(λI + ΦΓ̂Φ^T)^(−1) y and Σ_x = Cov[x|y; Γ̂] = Γ̂ − Γ̂Φ^T(λI + ΦΓ̂Φ^T)^(−1) ΦΓ̂.
114 Applications of Bayesian Inference. 1. Finding maximally sparse representations: replace the MAP estimate with the posterior mean estimator; for certain prior selections this can be very effective (next section). 2. Active learning, experimental design.
115 Experimental Design. Basic idea [Shihao Ji et al., 2007, Seeger and Nickisch, 2008]: use the approximate posterior p(x|y; Γ̂) = N(µ_x, Σ_x) to learn new rows of the design matrix Φ such that uncertainty about x is reduced. Choose each additional row to minimize the differential entropy H = (1/2) log|Σ_x|, with Σ_x = Γ̂ − Γ̂Φ^T(λI + ΦΓ̂Φ^T)^(−1) ΦΓ̂.
116 Experimental Design Cont. Drastic improvement over random projections is possible in a variety of domains. Examples: Reconstructing natural scenes [Seeger and Nickisch, 2008] Undersampled MRI reconstruction [Seeger et al., 2009]
117 Section IV: Analysis of Bayesian Inference and Connections with MAP
118 Overview. Bayesian inference can be recast as a general form of MAP estimation in x-space. This is useful for several reasons: 1. It allows us to leverage the same algorithmic formulations as with iterative reweighted methods for MAP estimation. 2. It reveals that Bayesian inference can actually be an easier computational task than searching for the mode as with MAP. 3. It provides theoretical motivation for the posterior mean estimator when searching for maximally sparse solutions. 4. It allows modifications to the Bayesian inference cost (e.g., adding constraints), and inspires new non-Bayesian sparse estimators.
119 Reformulation of the Posterior Mean Estimator. Theorem [Wipf and Nagarajan, 2008]: E[x|y; Γ̂] = argmin_x ||y − Φx||²_2 + λ g_infer(x), with Bayesian inference penalty function g_infer(x) = min_{γ≥0} [x^T Γ^(−1) x + log|λI + ΦΓΦ^T| − 2 Σ_i log φ(γ_i)]. So the posterior mean can be obtained by minimizing a penalized regression problem, just like MAP. The posterior covariance is a natural byproduct as well.
120 Property I of Penalty g_infer(x). The penalty g_infer(x) is formed from a minimum of upper-bounding hyperplanes with respect to each x_i². This implies: 1. concavity in x_i² for all i [Boyd 2004]; 2. it can be implemented via iterative reweighted ℓ2 minimization (multiple possibilities using various bounding techniques) [Seeger, 2009; Wipf and Nagarajan, 2009]; 3. note: the posterior covariance can be obtained easily too, therefore the entire approximation can be computed for full Bayesian inference.
121 Student's t Example. Assume the following sparse distribution for each unknown coefficient: p(x_i) ∝ (b + x_i²/2)^(−(a + 1/2)). Note: when a = b → ∞, the prior approaches a Gaussian (not sparse); when a = b → 0, the prior approaches a Jeffreys prior (highly sparse). Using convex bounding techniques to approximate the required derivatives leads to simple reweighted ℓ2 update rules [Seeger, 2009; Wipf and Nagarajan, 2009]. The algorithm can also be derived via EM [Tipping, 2001].
122 Reweighted $\ell_2$ Implementation Example
$x^{(k+1)} = \arg\min_x \|y - \Phi x\|_2^2 + \lambda \sum_i w_i^{(k)} x_i^2 = W^{(k)} \Phi^T \left(\lambda I + \Phi W^{(k)} \Phi^T\right)^{-1} y, \quad W^{(k)} = \text{diag}\big[w^{(k)}\big]^{-1}$
$w_i^{(k+1)} \leftarrow \dfrac{1 + 2a}{\left(x_i^{(k+1)}\right)^2 + \left(w_i^{(k)}\right)^{-1} - \left(w_i^{(k)}\right)^{-2} \phi_i^T \left(\lambda I + \Phi W^{(k)} \Phi^T\right)^{-1} \phi_i + 2b}$
Guaranteed to reduce or leave unchanged the objective function at each iteration.
Other variants are possible using different bounding techniques.
Upon convergence, the posterior covariance is given by $\Sigma_x = \left(\lambda^{-1} \Phi^T \Phi + W^{(k)}\right)^{-1}$, $W^{(k)} = \text{diag}\big[w^{(k)}\big]$.
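As a concrete illustration, here is a minimal NumPy sketch of the reweighted-$\ell_2$ update above. The function name and the demo problem are my own; $\lambda$, a, and b follow the slide's notation, with a = b = 0 giving the basic SBL/EM variant.

```python
import numpy as np

def sbl_reweighted_l2(Phi, y, lam=1e-2, a=0.0, b=0.0, iters=100):
    """Reweighted-l2 posterior-mean updates for a Student's t / SBL prior.

    Each iteration solves a weighted ridge problem in closed form, then
    refreshes the weights w_i = 1/gamma_i with the EM-style rule from the
    slide (a = b = 0 recovers plain SBL)."""
    n, m = Phi.shape
    w = np.ones(m)                                  # w_i = 1 / gamma_i
    for _ in range(iters):
        gamma = 1.0 / w
        W = np.diag(gamma)                          # W = diag[w]^{-1}
        B = lam * np.eye(n) + Phi @ W @ Phi.T
        x = W @ Phi.T @ np.linalg.solve(B, y)       # weighted ridge solution
        # Diagonal of posterior covariance: Sigma_ii = g_i - g_i^2 phi_i^T B^{-1} phi_i
        Binv_Phi = np.linalg.solve(B, Phi)
        Sigma_diag = gamma - gamma**2 * np.einsum('ij,ij->j', Phi, Binv_Phi)
        w = (1.0 + 2.0 * a) / np.maximum(x**2 + Sigma_diag + 2.0 * b, 1e-12)
    return x

# Demo: noiseless 3-sparse recovery with a 10 x 20 Gaussian dictionary
rng = np.random.default_rng(0)
n, m = 10, 20
Phi = rng.standard_normal((n, m))
x_true = np.zeros(m)
x_true[[0, 3, 7]] = [2.0, -1.5, 1.0]
y = Phi @ x_true
x_hat = sbl_reweighted_l2(Phi, y)
```

Upon convergence the full posterior covariance can be formed as `np.linalg.inv(Phi.T @ Phi / lam + np.diag(w))`, matching the expression on the slide.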
123 Property II of Penalty g_infer(x)
If $-2 \log \varphi(\gamma_i)$ is concave in $\gamma_i$, then:
1. $g_{\text{infer}}(x)$ is concave in $|x_i|$ for all i, and hence sparsity-inducing.
2. This implies the posterior mean will always have at least $m - n$ elements equal to exactly zero, as occurs with MAP.
3. Can be useful for canonical sparse recovery problems.
4. Can be implemented via reweighted $\ell_1$ minimization [Wipf, 2006; Wipf and Nagarajan, 2009].
124 Reweighted $\ell_1$ Implementation Example
Assume $-2 \log \varphi(\gamma_i) = a \gamma_i$, $a \ge 0$.
$x^{(k+1)} = \arg\min_x \|y - \Phi x\|_2^2 + \lambda \sum_i w_i^{(k)} |x_i|$
$w_i^{(k+1)} \leftarrow \left[\phi_i^T \left(\lambda I + \Phi \tilde{W}^{(k)} \Phi^T\right)^{-1} \phi_i + a\right]^{1/2}, \quad \tilde{W}^{(k)} = \text{diag}\big[|x^{(k+1)}|\big] \, \text{diag}\big[w^{(k)}\big]^{-1}$
Guaranteed to converge to a minimum of the Bayesian inference objective function.
Easy to outfit with additional constraints.
In the noiseless case, sparsity will not increase, i.e., $\|x^{(k+1)}\|_0 \le \|x^{(k)}\|_0$.
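A sketch of the corresponding reweighted-$\ell_1$ loop. The helper names and the inner ISTA solver are my own choices; the weight update follows the slide's rule, with the regularizer $\lambda$ playing the role of the noise variance.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding (prox operator of the weighted l1 norm)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso_ista(Phi, y, w, lam, iters=500):
    """Solve min_x ||y - Phi x||^2 + lam * sum_i w_i |x_i| by ISTA."""
    L = 2.0 * np.linalg.norm(Phi, 2) ** 2           # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        grad = 2.0 * Phi.T @ (Phi @ x - y)
        x = soft_threshold(x - grad / L, lam * w / L)
    return x

def sbl_reweighted_l1(Phi, y, lam=1e-3, a=0.0, outer=8):
    n, m = Phi.shape
    w = np.ones(m)
    for _ in range(outer):
        x = weighted_lasso_ista(Phi, y, w, lam)
        # tilde W = diag[|x|] diag[w]^{-1}
        Wt = np.diag(np.abs(x) / np.maximum(w, 1e-12))
        B = lam * np.eye(n) + Phi @ Wt @ Phi.T
        Binv_Phi = np.linalg.solve(B, Phi)
        w = np.sqrt(np.einsum('ij,ij->j', Phi, Binv_Phi) + a)
    return x

# Demo on the same kind of noiseless 3-sparse problem
rng = np.random.default_rng(0)
Phi = rng.standard_normal((10, 20))
x_true = np.zeros(20)
x_true[[0, 3, 7]] = [2.0, -1.5, 1.0]
y = Phi @ x_true
x_hat = sbl_reweighted_l1(Phi, y)
```

Any off-the-shelf weighted-lasso solver could replace the simple ISTA loop; only the outer reweighting step is specific to the Bayesian inference objective.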
125 Property III of Penalty g_infer(x)
Bayesian inference is most sensitive to posterior mass; therefore it is less sensitive to spurious local peaks than is MAP estimation.
Consequently, in x-space, the Bayesian inference penalized regression problem
$\min_x \|y - \Phi x\|_2^2 + \lambda g_{\text{infer}}(x)$
is generally smoother than the associated MAP problem.
126 Student's t Example
Assume the Student's t distribution for each unknown coefficient:
$p(x_i) \propto \left(b + x_i^2/2\right)^{-(a + 1/2)}$
Note:
When $a = b \to \infty$, the prior approaches a Gaussian (not sparse).
When $a = b \to 0$, the prior approaches a Jeffreys prior (highly sparse).
Goal: Compare Bayesian inference and MAP for different sparsity levels to show the smoothing effect.
127 Visualizing Local Minima Smoothing
Consider when $y = \Phi x$ has a 1-D feasible region, i.e., $m = n + 1$.
Any feasible solution x will satisfy:
$x = x_{\text{true}} + \alpha v$, where $v \in \text{Null}(\Phi)$, $\alpha$ is a scalar, and $x_{\text{true}}$ is the true generative coefficient vector.
We can plot penalty functions vs. $\alpha$ to view the local minima profile over the 1-D feasible region.
128 Empirical Example
Generate an iid Gaussian random dictionary $\Phi$ with 10 rows and 11 columns.
Generate a sparse coefficient vector $x_{\text{true}}$ with 9 nonzeros and iid Gaussian amplitudes.
Compute the signal $y = \Phi x_{\text{true}}$.
Assume a Student's t prior on x with varying degrees of sparsity.
Plot the MAP/Bayesian inference penalties vs. $\alpha$ to compare local minima profiles over the 1-D feasible region.
129 Local Minima Smoothing Example #1 [Plot: normalized penalty vs. $\alpha$, low-sparsity Student's t prior]
130 Local Minima Smoothing Example #2 [Plot: normalized penalty vs. $\alpha$, medium-sparsity Student's t prior]
131 Local Minima Smoothing Example #3 [Plot: normalized penalty vs. $\alpha$, high-sparsity Student's t prior]
132 Property IV of Penalty g_infer(x)
Non-separable, meaning $g_{\text{infer}}(x) \ne \sum_i g_i(x_i)$.
Non-separable penalty functions can have an advantage over separable penalties (i.e., MAP) when it comes to canonical sparse recovery problems [Wipf and Nagarajan, 2010].
133 Example
Consider the original sparse estimation problem:
$x_0 = \arg\min_x \|x\|_0 \ \text{s.t.} \ y = \Phi x$
Problem: a combinatorial number of local minima:
$\binom{m-1}{n} + 1 \;\le\; \text{number of local minima} \;\le\; \binom{m}{n}$
Local minima occur at each basic feasible solution (BFS): $\|x\|_0 \le n$ s.t. $y = \Phi x$.
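To make the local-minima count concrete, the following sketch enumerates the basic feasible solutions of a small noiseless system y = Φx (the dimensions and variable names are chosen for illustration); each full-rank column subset of size n yields one BFS:

```python
import numpy as np
from itertools import combinations
from math import comb

rng = np.random.default_rng(0)
n, m = 4, 6
Phi = rng.standard_normal((n, m))
x_gen = np.zeros(m)
x_gen[:2] = [1.0, -2.0]          # 2-sparse generating vector
y = Phi @ x_gen

# Every full-rank subset of n columns determines one basic feasible solution.
bfs = []
for S in combinations(range(m), n):
    cols = list(S)
    sub = Phi[:, cols]
    if np.linalg.matrix_rank(sub) == n:     # generic Gaussian subsets are full rank
        xs = np.zeros(m)
        xs[cols] = np.linalg.solve(sub, y)  # exact fit with <= n nonzeros
        bfs.append(xs)

# Each BFS is a local minimum of ||x||_0 over the feasible set.
print(len(bfs), comb(m, n))
```

For a generic Gaussian dictionary every one of the $\binom{6}{4} = 15$ subsets is full rank, so the enumeration realizes the combinatorial upper bound on the slide.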
134 Visualization of Local Minima in $\ell_0$ Norm
Generate an iid Gaussian random dictionary $\Phi$ with 10 rows and 11 columns.
Generate a sparse coefficient vector $x_{\text{true}}$ with 9 nonzeros and iid Gaussian amplitudes.
Compute the signal via $y = \Phi x_{\text{true}}$.
Plot $\|x\|_0$ vs. $\alpha$ (the 1-D null space dimension) to view the local minima profile of the $\ell_0$ norm over the 1-D feasible region.
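A sketch of this experiment (the grid and the numerical tolerance for "nonzero" are my own choices): sweep α along the one-dimensional null space and count nonzeros at each point.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 10, 11
Phi = rng.standard_normal((n, m))          # 10 x 11 Gaussian dictionary
x_true = np.zeros(m)
x_true[:9] = rng.standard_normal(9)        # 9 nonzeros
y = Phi @ x_true

# The null space of Phi is one-dimensional; take a basis vector from the SVD.
v = np.linalg.svd(Phi)[2][-1]
assert np.allclose(Phi @ v, 0.0, atol=1e-10)

# Sweep the feasible line x = x_true + alpha * v and count nonzeros.
alphas = np.linspace(-3.0, 3.0, 601)       # grid includes alpha = 0
l0 = np.array([np.sum(np.abs(x_true + a * v) > 1e-8) for a in alphas])
print(l0.min(), l0.max())                  # sparsest point sits at alpha = 0
```

Plotting `l0` against `alphas` reproduces the profile on the next slide: the count drops from 11 to 10 at each basic feasible solution, with the global minimum of 9 at α = 0.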
135 $\ell_0$ Norm in 1-D Feasible Region [Plot: number of nonzeros vs. $\alpha$]
136 Non-Separable Penalty Example
We would like to smooth local minima while retaining the same global solution as $\ell_0$ at all times (unlike the $\ell_1$ norm).
This can be accomplished by a simple modification of the $\ell_0$ penalty.
Truncated $\ell_0$ penalty:
$\hat{x} = \arg\min_x \|\tilde{x}\|_0 \ \text{s.t.} \ y = \Phi x$, where $\tilde{x}$ = the k largest-magnitude elements of x.
If $k < m$, then there will necessarily be fewer local minima; however, the implicit prior/penalty function is non-separable.
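The truncated penalty is easy to state in code: count nonzeros only among the k largest-magnitude entries, which equals min(||x||_0, k). A small sketch (helper names are my own):

```python
import numpy as np

def l0_norm(x, tol=1e-8):
    """Standard l0 'norm': number of entries with magnitude above tol."""
    return int(np.sum(np.abs(x) > tol))

def truncated_l0(x, k, tol=1e-8):
    """Truncated l0 penalty: count nonzeros among the k largest-magnitude entries."""
    top_k = np.argsort(np.abs(x))[-k:]
    return int(np.sum(np.abs(x[top_k]) > tol))

x = np.array([3.0, -2.0, 1.0, 0.5, 0.0, 0.0])
print(l0_norm(x), truncated_l0(x, 3))      # prints: 4 3

# The plateau at k is what removes local minima: for any x,
# truncated_l0(x, k) == min(l0_norm(x), k).
z = np.random.default_rng(2).standard_normal(8)
assert truncated_l0(z, 5) == min(l0_norm(z), 5)
```

Because the penalty is constant (equal to k) whenever ||x||_0 ≥ k, the denser basic feasible solutions no longer create separate local minima, while any solution sparser than k keeps its exact ℓ0 value, so the global minimum is unchanged.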
137 Truncated $\ell_0$ Norm in 1-D Feasible Region [Plot: number of nonzeros vs. $\alpha$, with k = 10]
138 Using the Posterior Mean Estimator for Finding Maximally Sparse Solutions
Summary of why this could be a good idea:
1. If $-2 \log \varphi(\gamma_i)$ is concave, then the posterior mean will be sparse (local minima of the Bayesian inference cost will also be sparse).
2. The implicit Bayesian inference cost function can be much smoother than the associated MAP objective.
3. Potential advantages of non-separable penalty functions.
139 Choosing the Function $\varphi$
For sparsity, require that $-2 \log \varphi(\gamma_i)$ is concave.
To avoid adding extra local minima (i.e., to maximally exploit the smoothing effect), require that $-2 \log \varphi(\gamma_i)$ is convex.
So $-2 \log \varphi(\gamma_i) = a \gamma_i$, $a \ge 0$, is well-motivated [Wipf et al., 2007].
Assume the simplest case, $-2 \log \varphi(\gamma_i) = 0$, sometimes referred to as sparse Bayesian learning (SBL) [Tipping, 2001].
We denote the penalty function in this case $g_{\text{SBL}}(x)$.
140 Advantages of Posterior Mean Estimator
Theorem: In the low noise limit ($\lambda \to 0$), and assuming $\|x_0\|_0 < \text{spark}[\Phi] - 1$, the SBL penalty satisfies:
$\arg\min_{x : y = \Phi x} g_{\text{SBL}}(x) = \arg\min_{x : y = \Phi x} \|x\|_0$
No separable penalty $g(x) = \sum_i g_i(x_i)$ satisfies this condition and has fewer minima than the SBL penalty in the feasible region.
[Wipf and Nagarajan, 2008]
141 Conditions For a Single Minimum
Theorem: Assume $\|x_0\|_0 < \text{spark}[\Phi] - 1$. If the magnitudes of the nonzero elements in $x_0$ are sufficiently scaled, then the SBL cost ($\lambda \to 0$) has a single minimum, which is located at $x_0$.
[Illustration: $x_0$ with scaled coefficients (easy) vs. $x_0$ with uniform coefficients (hard)]
No possible separable penalty (standard MAP) satisfies this condition.
[Wipf and Nagarajan, 2009]
142 Empirical Example
Generate an iid Gaussian random dictionary $\Phi$ with 10 rows and 11 columns.
Generate a maximally sparse coefficient vector $x_0$ with 9 nonzeros and either
1. amplitudes of similar scales, or
2. amplitudes with very different scales.
Compute the signal via $y = \Phi x_0$.
Plot the MAP/Bayesian inference penalty functions vs. $\alpha$ to compare local minima profiles over the 1-D feasible region and see the effect of coefficient scaling.
143 Smoothing Example: Similar Scales [Plot: normalized penalty vs. $\alpha$]
144 Smoothing Example: Different Scales [Plot: normalized penalty vs. $\alpha$]
145 Always Room for Improvement
Theorem: Consider the noiseless sparse recovery problem
$x_0 = \arg\min_x \|x\|_0 \ \text{s.t.} \ y = \Phi x$
Under very mild conditions, SBL with a reweighted $\ell_1$ implementation will:
1. Never do worse than the regular $\ell_1$-norm solution.
2. For any dictionary and sparsity profile, there will always be cases where it does better.
[Wipf and Nagarajan, 2010]
146 Empirical Example: Simultaneous Sparse Approximation
Generate a data matrix via $Y = \Phi X_0$ (noiseless):
$X_0$ is 100-by-5 with random nonzero rows.
$\Phi$ is 50-by-100 with iid Gaussian entries.
Check whether $X_0$ is recovered using various algorithms:
1. Generalized SBL, reweighted $\ell_2$ implementation [Wipf and Nagarajan, 2010]
2. Candes et al. (2008) reweighted $\ell_1$ method
3. Chartrand and Yin (2008) reweighted $\ell_2$ method
4. $\ell_1$ solution via Group Lasso [Yuan and Lin, 2006]
147 Empirical Results (1000 Trials) [Plot: probability of success vs. row sparsity; curves for SBL and the competing algorithms]
148 Conclusions
Posterior information beyond the mode can be very useful in a wide variety of applications.
The variational approximation provides useful estimates of posterior means and covariances, which can be computed efficiently using standard iterative reweighting algorithms.
In certain situations, the posterior mean estimate can be an effective substitute for $\ell_0$ norm minimization.
In simulation tests, it out-performs a wide variety of MAP-based algorithms [Wipf and Nagarajan, 2010].
149 Section V: Application Examples in Neuroimaging
150 Applications of Sparse Bayesian Methods
1. Recovering fiber track geometry from diffusion weighted MR images [Ramirez-Manzanares et al., 2007].
2. Multivariate autoregressive modeling of fMRI time series for functional connectivity analyses [Harrison et al., 2003].
3. Compressive sensing for rapid MRI [Lustig et al., 2007].
4. MEG/EEG source localization [Sato et al., 2004; Friston et al., 2008].
151 MEG/EEG Source Localization [Diagram: source space (X) mapped to sensor space (Y) via Maxwell's equations]
152 The Dictionary $\Phi$
Can be computed using a boundary element brain model and Maxwell's equations.
Will depend on the location of the sensors and on whether we are doing MEG, EEG, or both.
Unlike in the compressive sensing domain, the columns of $\Phi$ will be highly correlated regardless of where the sensors are placed.
153 Source Localization
Given multiple measurement vectors Y, MAP or Bayesian inference algorithms can be run to estimate X.
The estimated nonzero rows should correspond to the locations of active brain areas (also called sources).
As in compressive sensing, we may apply the algorithms in an appropriate transform domain where the row-sparsity assumption holds.
154 Empirical Results
1. Simulations with real brain noise/interference:
Generate damped sinusoidal sources.
Map them to the sensors using $\Phi$ and add real brain noise and artifacts.
2. Data from real-world experiments:
Auditory evoked fields from binaurally presented tones (which produce correlated, bilateral activations).
Compare localization results using MAP estimation and the SBL posterior mean from Bayesian inference.
155 MEG Source Reconstruction Example [Brain images: Ground Truth vs. SBL vs. Group Lasso reconstructions]
156 Real Data: Auditory Evoked Field (AEF) [Brain images: SBL, Beamformer, sLORETA, and Group Lasso localizations]
157 Conclusion
MEG/EEG source localization demonstrates the effectiveness of Bayesian inference on problems where the dictionary is:
Highly overcomplete, meaning $m \gg n$, e.g., n = 275 and m = 100,000.
Very ill-conditioned and coherent, i.e., columns are highly correlated.
158 Final Thoughts
Sparse signal recovery is an interesting area with many potential applications.
Methods developed for solving the sparse signal recovery problem can be valuable tools for signal processing practitioners.
Rich set of computational algorithms, e.g.,
Greedy search (OMP)
$\ell_1$ norm minimization (Basis Pursuit, Lasso)
MAP methods (reweighted $\ell_1$ and $\ell_2$ methods)
Bayesian inference methods like SBL (show great promise)
Potential for great theory in support of performance guarantees for algorithms.
The expectation is that there will be continued growth in the application domain as well as in algorithm development.
159 Thank You
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationSparse representation classification and positive L1 minimization
Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng
More informationDetecting Sparse Structures in Data in Sub-Linear Time: A group testing approach
Detecting Sparse Structures in Data in Sub-Linear Time: A group testing approach Boaz Nadler The Weizmann Institute of Science Israel Joint works with Inbal Horev, Ronen Basri, Meirav Galun and Ery Arias-Castro
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Introduction. Basic Probability and Bayes Volkan Cevher, Matthias Seeger Ecole Polytechnique Fédérale de Lausanne 26/9/2011 (EPFL) Graphical Models 26/9/2011 1 / 28 Outline
More informationPre-weighted Matching Pursuit Algorithms for Sparse Recovery
Journal of Information & Computational Science 11:9 (214) 2933 2939 June 1, 214 Available at http://www.joics.com Pre-weighted Matching Pursuit Algorithms for Sparse Recovery Jingfei He, Guiling Sun, Jie
More informationA hierarchical Krylov-Bayes iterative inverse solver for MEG with physiological preconditioning
A hierarchical Krylov-Bayes iterative inverse solver for MEG with physiological preconditioning D Calvetti 1 A Pascarella 3 F Pitolli 2 E Somersalo 1 B Vantaggi 2 1 Case Western Reserve University Department
More informationSignal Recovery from Permuted Observations
EE381V Course Project Signal Recovery from Permuted Observations 1 Problem Shanshan Wu (sw33323) May 8th, 2015 We start with the following problem: let s R n be an unknown n-dimensional real-valued signal,
More informationLecture 3: Pattern Classification
EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures
More informationSensing systems limited by constraints: physical size, time, cost, energy
Rebecca Willett Sensing systems limited by constraints: physical size, time, cost, energy Reduce the number of measurements needed for reconstruction Higher accuracy data subject to constraints Original
More informationNonconvex penalties: Signal-to-noise ratio and algorithms
Nonconvex penalties: Signal-to-noise ratio and algorithms Patrick Breheny March 21 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/22 Introduction In today s lecture, we will return to nonconvex
More informationUniform Uncertainty Principle and signal recovery via Regularized Orthogonal Matching Pursuit
Uniform Uncertainty Principle and signal recovery via Regularized Orthogonal Matching Pursuit arxiv:0707.4203v2 [math.na] 14 Aug 2007 Deanna Needell Department of Mathematics University of California,
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More informationThe Pros and Cons of Compressive Sensing
The Pros and Cons of Compressive Sensing Mark A. Davenport Stanford University Department of Statistics Compressive Sensing Replace samples with general linear measurements measurements sampled signal
More informationLearning Multiple Tasks with a Sparse Matrix-Normal Penalty
Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1
More informationTTIC 31230, Fundamentals of Deep Learning David McAllester, Winter Generalization and Regularization
TTIC 31230, Fundamentals of Deep Learning David McAllester, Winter 2019 Generalization and Regularization 1 Chomsky vs. Kolmogorov and Hinton Noam Chomsky: Natural language grammar cannot be learned by
More informationIEOR 265 Lecture 3 Sparse Linear Regression
IOR 65 Lecture 3 Sparse Linear Regression 1 M Bound Recall from last lecture that the reason we are interested in complexity measures of sets is because of the following result, which is known as the M
More informationType II variational methods in Bayesian estimation
Type II variational methods in Bayesian estimation J. A. Palmer, D. P. Wipf, and K. Kreutz-Delgado Department of Electrical and Computer Engineering University of California San Diego, La Jolla, CA 9093
More information