Bayesian Nonparametric Dictionary Learning for Compressed Sensing MRI


1 1 Bayesian Nonparametric Dictionary Learning for Compressed Sensing MRI Yue Huang, John Paisley, Qin Lin, Xinghao Ding, Xueyang Fu and Xiao-ping Zhang arxiv: v2 [cs.cv] 9 Oct 2013 Abstract We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRI) from highly undersampled k-space data. Our model uses the beta process as a nonparametric prior for dictionary learning, in which an image patch is a sparse combination of dictionary elements. The size of the dictionary and the patch-specific sparsity pattern is inferred from the data, in addition to all dictionary learning variables. Dictionary learning is performed as part of the image reconstruction process, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model. We derive a stochastic optimization algorithm based on Markov Chain Monte Carlo (MCMC) sampling for the Bayesian model, and use the alternating direction method of multipliers (ADMM) for efficiently performing total variation minimization. We present empirical results on several MRI, which show that the proposed regularization framework can improve reconstruction accuracy over other methods. Index Terms Bayesian nonparametrics, dictionary learning, compressed sensing, magnetic resonance imaging I. INTRODUCTION Magnetic resonance imaging (MRI) is a widely used technique for visualizing the structure and functioning of the body. A limitation of MRI is its slow scan speed during data acquisition. Therefore, methods for accelerating the MRI process have received much research attention. Recent advances in signal reconstruction from measurements sampled below the Nyquist rate, called compressed sensing (CS) [1][2], have had a major impact on MRI [3]. CS-MRI allows for significant undersampling in the Fourier measurement domain of MR images (called k-space), while still outputting a high-quality image reconstruction. While image reconstruction using this undersampled data is a case of an ill-posed inverse problem, compressed sensing theory has shown that it is possible to reconstruct a signal from significantly fewer measurements than mandated by traditional Nyquist sampling if the signal is sparse in a particular transform domain. Motivated by the need to find a sparse domain for signal representation, a large body of literature now exists on reconstructing MRI from significantly undersampled k-space Yue Huang, Qin Lin, Xinghao Ding and Xueyang Fu are with the Department of Communications Engineering at Xiamen University in Xiamen, Fujian, China. John Paisley is with the Department of Electrical Engineering at Columbia University in New York, NY, USA. Xiao-ping Zhang is with the Department of Electrical and Computer Engineering at Ryerson University in Toronto, Canada. This work supported by the National Natural Science Foundation of China (Nos , , , ), the Fundamental Research Funds for the Central Universities (Nos , ) and the Natural Science Foundation of Fujian Province of China (No. 2012J05160). Equal contributions. Corresponding author: dxh@xmu.edu.cn data. 
Existing improvements in CS-MRI mostly focus on (i) seeking sparse domains for the image, such as contourlets [5][6]; (ii) using approximations of the l 0 norm for better reconstruction performance with fewer measurements, for example l 1, FOCUSS, l p quasi-norms with 0 < p < 1, or using smooth functions to approximate the l 0 norm [7] [10]; and (iii) accelerating image reconstruction through more efficient optimization techniques []. In this paper we present a modeling framework that is similarly motivated. CS-MRI reconstruction algorithms tend to fall into two categories: Those which enforce sparsity directly within some image transform domain [3] [16], and those which enforce sparsity in some underlying latent representation of the image, such as a dictionary learning representation [17] [20]. Most CS-MRI reconstruction algorithms belong to the first category. For example Sparse MRI [3], the leading study in CS-MRI, performs MR image reconstruction by enforcing sparsity in both the wavelet domain and the total variation (TV) of the reconstructed image. Algorithms with image-level sparsity constraints such as Sparse MRI typically employ an off-theshelf basis, which can usually capture only one feature of the image. For example, wavelets recover point-like features, while contourlets recover curve-like features. Since MR images contain a variety of underlying features, such as edges and textures, using a basis not adapted to the image can be considered a drawback of the algorithms in this group. Finding a sparse basis that is suited to the image at hand can benefit MR image reconstruction, since CS theory shows that the required number of measurements is linked to the sparsity of the signal in the selected transform domain. Using a standard basis not adapted to the image under consideration will likely not provide a representation that can compete in sparsity with an adapted basis. To this end, dictionary learning, which falls in the second group of algorithms, learns a sparse basis on image subregions called patches that is adapted to the image class of interest. Recent studies in the image processing literature have shown that dictionary learning is an effective means for finding a sparse representation of an image on the patch-level [22] [24], [29]. These algorithms learn a patchlevel basis (i.e., dictionary) by exploiting structural similarities between patches extracted from images within a class of interest (for example BM3D [22], MOD [23] and K-SVD [24]). Among these approaches, adaptive dictionary learning where the dictionary is learned directly on the image being considered based on patch-level sparsity constraints usually outperforms analytical dictionary approaches in denoising, super-resolution reconstruction, interpolation, inpainting, classification and other applications, since the adaptively learned

dictionary suits the signal of interest [23]-[26]. Dictionary learning has been applied to CS-MRI as a sparse basis for reconstruction (e.g., LOST [18] and DLMRI [19]). With these methods, parameters such as the dictionary size and patch sparsity are preset, and the algorithms considered are non-Bayesian.

In this paper, we consider a new dictionary learning algorithm for CS-MRI that is based on Bayesian nonparametric statistics. Specifically, we consider the beta process as a nonparametric prior for a dictionary learning model that provides the sparse representation necessary for CS-MRI reconstruction. The beta process is a method for generating measures on infinite parameter spaces that can be employed in latent factor models [30][31]; in this case the latent factors are the dictionary elements and the measure is a value in (0, 1] that gives the corresponding activation probability. While the dictionary is theoretically infinite in size, through posterior inference the beta process learns a representation that is both sparse in dictionary size and in the dictionary usage for any given patch. The proposed Bayesian nonparametric model gives an alternative approach to dictionary learning for CS-MRI reconstruction to those previously considered. We derive a Markov Chain Monte Carlo (MCMC) sampling algorithm for stochastic optimization of the dictionary learning variables in the objective function. In addition, we consider including a sparse total variation (TV) penalty, for which we perform efficient optimization using the alternating direction method of multipliers (ADMM).

We organize the paper as follows. In Section II we review CS-MRI inversion methods and the beta process for dictionary learning. In Section III we describe the proposed regularization framework and optimization algorithm. We then show the advantages of the proposed Bayesian nonparametric regularization framework on several CS-MRI problems in Section IV.

II. BACKGROUND AND RELATED WORK

We use the following notation. Let x ∈ R^N be a √N × √N MR image in vectorized form. Let F_u ∈ C^{u×N}, u < N, be the undersampled Fourier encoding matrix and y = F_u x represent the sub-sampled set of k-space measurements. The goal is to estimate x from the small fraction of k-space measurements y. For dictionary learning, let R_i be the ith patch extraction matrix. That is, R_i is a P × N matrix of all zeros except for a one in each row that extracts a vectorized √P × √P patch from the image, R_i x ∈ R^P for i = 1, ..., N. We work with overlapping image patches with a shift of one pixel and allow a patch to wrap around the image at the boundaries for mathematical convenience [19][26].

A. Two approaches to CS-MRI inversion

We focus on CS-MRI inversion via optimizing an unconstrained function of the form

arg min_x h(x) + (λ/2)||F_u x − y||_2^2,    (1)

where ||F_u x − y||_2^2 is a data fidelity term, λ > 0 is a parameter and h(x) is a regularization function that controls properties of the image we want to reconstruct. As discussed in the introduction, the function h can take several forms, but tends to fall into one of two categories according to whether image-level or patch-level information is considered. We next review these two approaches.
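To make the measurement model concrete, the following NumPy sketch (our own illustration, not code from the paper; the boolean sampling mask and the unitary FFT convention are assumptions) forms the sub-sampled k-space data y = F_u x and evaluates the data fidelity term in (1).

```python
import numpy as np

def undersample_kspace(x, mask):
    """Return y = F_u x: the unitary 2D FFT of image x, kept only at sampled locations.

    x    : 2D image array (real or complex)
    mask : boolean array of the same shape, True where k-space is measured
    """
    k = np.fft.fft2(x, norm="ortho")
    return k[mask]

def data_fidelity(x, y, mask, lam):
    """Evaluate the data fidelity term (lam/2) * ||F_u x - y||_2^2 from (1)."""
    resid = undersample_kspace(x, mask) - y
    return 0.5 * lam * np.sum(np.abs(resid) ** 2)
```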
1) Image-level sparse regularization: CS-MRI with an image-level, or global, regularization function h_g(x) is one in which sparsity is enforced within a transform domain defined on the entire image. For example, in Sparse MRI [3] the regularization function is

h_g(x) = ||Wx||_1 + μ TV(x),    (2)

where W is the wavelet basis and TV(x) is the total variation (spatial finite differences) of the image. Regularizing with this function requires that the image be sparse in the wavelet domain, as measured by the ℓ_1 norm of the wavelet coefficients ||Wx||_1, which acts as a surrogate for ℓ_0 [1][2]. The total variation term enforces homogeneity within the image by encouraging neighboring pixels to have similar values while allowing for sudden high frequency jumps at edges. The parameter μ > 0 controls the trade-off between the two terms.

Various other definitions of h_g(x) have also been proposed for MRI reconstruction, which we briefly summarize. Examples are over-complete contourlets [5], a combination of wavelets, contourlets and TV [6], and regularization of wavelet coefficient correlations based on Gaussian scale mixtures [4]. Other methods replace the ℓ_1 norm with approximations of the ℓ_0 norm, for example FOCUSS [9][10], ℓ_p norms [8], and homotopic ℓ_0 minimization [7]. Numerical algorithms for optimizing (1) with an image-level h_g(x) include nonlinear conjugate gradient descent with backtracking line search [3], an operator-splitting algorithm (TVCMRI) [11] and a variable splitting method (RecPF) [21]. Both TVCMRI and RecPF can replace iterative linear solvers with Fourier domain computations, with substantial time savings. Other methods in the literature include a combination of variable and operator splitting techniques [13], a fast composite splitting algorithm (FCSA) [], a contourlet transform with iterative soft thresholding [5], a combination of a Gaussian scale mixture model with iterative hard thresholding [4], a variation on Bregman operator splitting (BOS) [15] and alternating proximal minimization applied to the TV-based SENSE problem [16]. The above algorithms generally employ variable and operator splitting techniques with the FFT and alternating minimization to simplify the objective function. In this work, we follow a similar approach for total variation minimization.

2) Patch-level sparse regularization: An alternative to the image-level sparsity constraint h_g(x) is a patch-level, or local, regularization function h_l(x), which enforces sparsity in a transform domain defined on patches (square sub-regions of the image) extracted from the full image. An example of such a regularization function is

h_l(x) = Σ_i (γ/2)||R_i x − Dα_i||_2^2 + f(α_i, D),    (3)

where the dictionary matrix is D ∈ R^{P×K} and α_i is a K-dimensional vector.

An important difference between h_l(x) and h_g(x) is the additional function f(α_i, D). While image-level sparsity constraints fall within a predefined transform domain, such as the wavelet basis, the sparse transform domain can be unknown for patch-level regularization and learned from data. The function f enforces sparsity by learning a D for which α_i is sparse.¹ For example, [19] uses K-SVD to learn D off-line, and then approximately optimizes the objective function

arg min_{α_1:N} Σ_i ||R_i x − Dα_i||_2^2  subject to ||α_i||_0 ≤ T, ∀i,    (4)

using orthogonal matching pursuits (OMP) [25]. (Note that this objective can be written using f(α_i, D) = κ_i ||α_i||_0 for some κ_i > 0.) In this case, the extra parameters α_i are included in the objective function (1), and so the problem is no longer convex. Using this definition of h_l(x) in (1), a local optimal solution can be found by an alternating minimization procedure: first solve the least squares solution for x using the current values of α_i and D, and then update α_i and D, or only α_i if D is learned off-line.

¹We have suppressed this dependence on α and D in h_l(x).

The dictionary learning step can be thought of as a denoising procedure. That is, the combination of each Dα_i in effect produces a denoised proposal reconstruction for x, after which the reconstruction takes into account the squared error from this smooth proposal and from the sub-sampled k-space, with weight determined by the regularization parameters.

Aside from sparse dictionary learning, other patch-level algorithms have been reported, for example regularization of patches in a spatial region with a robust distance metric [17], and patch clustering followed by de-aliasing and artifact removal for reconstruction using 3DFFT (LOST) [18] or directional wavelets [20]. These methods each take into account similarities between image patches in determining the dictionary. Next, we review our method for dictionary learning by using a Bayesian nonparametric prior called the beta process.

B. Dictionary learning with beta process factor analysis

Typical dictionary learning approaches require a predefined dictionary size and, for each patch, the setting of either a sparsity level T or an error threshold ε to determine how many dictionary elements are used. In both cases, if the settings do not agree with ground truth, the performance can significantly degrade. Instead, we consider a Bayesian nonparametric method called beta process factor analysis (BPFA) [27], which has been shown to successfully infer both of these values, as well as have competitive performance with algorithms in several application areas [27]-[29]; see []-[39] for related algorithms. The beta process is driven by an underlying Poisson process, and so its properties as a stochastic process for Bayesian modeling are well understood [30]. Originally used for survival analysis in the statistics literature, its use for latent factor modeling has been significantly increasing within the machine learning field [27]-[29], [31], [37]-[39].

Being a Bayesian method, the prior definition of our proposed model gives a way (in principle) of generating images. Writing the generative method for BPFA gives an informative picture of what the algorithm is doing and what assumptions are being made.² To construct an image with the proposed model, we use the generative structure given in Algorithm 1.

²The model has an equivalent representation as an optimization procedure over an analytical objective function, but the result is less informative.

Algorithm 1. Generating an image with BPFA
1) Construct a dictionary D = [d_1, ..., d_K]: d_k ~ N(0, P^{−1} I_P), k = 1, ..., K.
2) Draw a probability π_k ∈ [0, 1] for each d_k: π_k ~ Beta(cγ/K, c(1 − γ/K)), k = 1, ..., K.
3) Draw precision values for the noise and each weight: γ_ε ~ Gamma(g_0, h_0), γ_{s,k} ~ Gamma(e_0, f_0).
4) For the ith patch in x:
   a) Draw the vector s_i ~ N(0, diag(γ_{s,k}^{−1})).
   b) Draw the binary vector z_i with z_ik ~ Bernoulli(π_k).
   c) Define α_i = s_i ∘ z_i by an element-wise product.
   d) Construct the patch R_i x = Dα_i + ε_i with noise ε_i ~ N(0, γ_ε^{−1} I_P).
5) Construct the image x as the average of all R_i x that overlap on a given pixel.

With this approach, the model constructs a dictionary matrix D ∈ R^{P×K} of i.i.d. random variables, and assigns probability π_k to vector d_k. The parameters for these probabilities are set such that most of the π_k are expected to be small, with a few large. In Algorithm 1 we use an approximation to the beta process; for a fixed c > 0 and γ > 0, convergence is guaranteed as K → ∞ [30][28]. Under this parameterization, each patch R_i x extracted from the image x is modeled as a sparse weighted combination of the dictionary elements, as determined by the element-wise product of z_i ∈ {0, 1}^K with the Gaussian vector s_i. What makes the model nonparametric is that for many values of k, the values of z_ik will equal zero for all i; the model learns the number of these unused dictionary elements and their index values from the data. The independent Bernoulli random variables ensure values of zero for the kth element of each z_i when π_k is very small, and thereby eliminate d_k from the model. Therefore, the value of K should be set to a large number that is more than the expected size of the dictionary. It can be shown that under the assumptions of this prior, in the limit K → ∞, the number of dictionary elements used by a patch is Poisson(γ) distributed and the total number of dictionary elements used by the data grows like cγ ln(c + N), where N is the number of patches [31].
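As a concrete companion to Algorithm 1, the following NumPy sketch (ours, not the authors' code) draws the BPFA variables from the prior and assembles the noisy patches of step 4. The hyperparameter defaults are illustrative placeholders rather than the settings used in the experiments, the Gamma rate parameters are converted to NumPy's scale convention, and the averaging of overlapping patches in step 5 is omitted.

```python
import numpy as np

def bpfa_generate_patches(N, P, K, c=1.0, gamma=5.0,
                          e0=1.0, f0=1.0, g0=1.0, h0=1.0, seed=None):
    """Draw the BPFA variables of Algorithm 1 from the prior and return noisy patches.

    N: number of patches, P: patch dimension (e.g. 36 for 6x6 patches), K: truncation level.
    """
    rng = np.random.default_rng(seed)
    # Step 1: dictionary columns d_k ~ N(0, P^{-1} I_P)
    D = rng.normal(0.0, np.sqrt(1.0 / P), size=(P, K))
    # Step 2: element probabilities pi_k ~ Beta(c*gamma/K, c*(1 - gamma/K)); requires K > gamma
    pi = rng.beta(c * gamma / K, c * (1.0 - gamma / K), size=K)
    # Step 3: precisions; the paper's Gamma(shape, rate) becomes NumPy's scale = 1/rate
    gam_eps = rng.gamma(g0, 1.0 / h0)
    gam_s = rng.gamma(e0, 1.0 / f0, size=K)
    # Step 4: per-patch weights s_i, supports z_i, and patches R_i x = D alpha_i + eps_i
    S = rng.normal(0.0, 1.0 / np.sqrt(gam_s), size=(N, K))
    Z = (rng.random((N, K)) < pi).astype(float)
    alpha = S * Z                                   # element-wise product s_i o z_i
    patches = alpha @ D.T + rng.normal(0.0, 1.0 / np.sqrt(gam_eps), size=(N, P))
    # Step 5 (averaging overlapping patches into an image) is omitted in this sketch.
    return patches, D, alpha, pi
```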

1) Relationship to K-SVD: Another widely used dictionary learning method is K-SVD [24]. Though they are models for the same problem, BPFA and K-SVD have some significant differences that we briefly discuss. K-SVD learns the sparsity pattern of the coding vector α_i using the OMP algorithm [25] for each i. Holding the sparsity fixed, it then updates each dictionary element and dimension of α jointly by a rank-one approximation to the residual. BPFA on the other hand updates the sparsity pattern by generating from a beta posterior distribution and generates the weights and the dictionary from Gaussian posterior distributions using Bayes' rule. Because of this probabilistic structure, we derive a sampling algorithm for these variables that takes advantage of marginalization, and naturally learns the auxiliary variables γ_ε and γ_{s,k}.

2) Example denoising problem: We briefly illustrate BPFA on a denoising problem using 6 × 6 patches extracted from an image and setting K = 108. In Figures 1(a) and 1(b) we show the noisy and denoised images. In Figures 1(c) and 1(d) we show some statistics from dictionary learning. For example, Figure 1(c) shows the sorted values of π_k, where we see that fewer than 100 elements are used by the data. Figure 1(d) shows the empirical distribution of the number of elements per patch, where we see the ability of the model to adapt the sparsity to the patch. In Table I we show PSNR results for three noise variance levels. For K-SVD, we consider the case when the error parameter matches the ground truth, and when it mismatches it by a magnitude of five. As expected, when K-SVD does not have an appropriate setting of this value the performance suffers. BPFA on the other hand can adaptively infer the noise variance, which leads to an improvement in denoising.

Fig. 1. An example of denoising by BPFA: (a) noisy image; (b) denoising by BPFA; (c) the final probabilities of the dictionary elements; (d) a distribution on the number of dictionary elements per patch.

TABLE I
Peak signal-to-noise ratio (PSNR) for an image denoised by BPFA and K-SVD. Performance is comparable when the noise parameter of K-SVD is correct (match); BPFA outperforms K-SVD when this setting is wrong (mismatch).
Columns: σ² | K-SVD PSNR (match) | K-SVD PSNR (mismatch) | BPFA PSNR (results) | BPFA learned noise.

III. CS-MRI WITH BPFA AND TV PENALTY

We next present our regularization scheme for reconstructing MR images from highly undersampled k-space data. In reference to the discussion in Section II, we consider a sparsity constraint of the form

arg min_{x,φ} λ_g h_g(x) + h_l(x) + (λ/2)||F_u x − y||_2^2,    (5)

with h_g(x) := TV(x) and h_l(x) := Σ_i (γ_ε/2)||R_i x − Dα_i||_2^2 + f(φ_i).

For the local regularization function h_l(x) we use BPFA as given in Algorithm 1 in Section II-B. The parameters to be optimized for this penalty are contained in the set φ_i = {D, s_i, z_i, γ_ε, γ_s, π}, and are defined in Algorithm 1. The regularization term γ_ε is a model variable that corresponds to an inverse variance parameter of the multivariate Gaussian likelihood. This likelihood is equivalently viewed as the squared error penalty term in (5). This term acts as the sparse basis for the image and also aids in producing a denoised reconstruction, as discussed in Section II-B. (We indicate how to construct the analytical form of f in the appendix.)

For the global regularization function h_g(x) we use the total variation of the image. This term encourages homogeneity within contiguous regions of the image, while still allowing for sharp jumps in pixel value at edges due to the underlying ℓ_1 penalty. The regularization parameters λ_g, γ_ε and λ control the trade-off between the terms in this optimization, which is adaptively learned since γ_ε changes with each iteration.
For the total variation penalty TV(x) we use the isotropic TV model. Let ψ_i be the 2 × N difference operator for pixel i. Each row of ψ_i contains a 1 centered on pixel i, a −1 on the pixel directly above pixel i (for the first row of ψ_i) or to its right (for the second row of ψ_i), and zeros elsewhere. Let Ψ = [ψ_1^T, ..., ψ_N^T]^T be the resulting 2N × N difference matrix for the entire image. The TV coefficients are β = Ψx ∈ R^{2N}, and the isotropic TV penalty is TV(x) = Σ_i ||ψ_i x||_2 = Σ_i sqrt(β_{2i−1}² + β_{2i}²), where i ranges over the pixels in the MR image. For optimization we use the alternating direction method of multipliers (ADMM) [][33]. ADMM works by performing dual ascent on the augmented Lagrangian objective function introduced for the total variation coefficients. For completeness, we give a brief review of ADMM in the appendix.

A. Algorithm

We present an algorithm for finding a local optimal solution to the non-convex objective function given in (5). We can write this objective as

L(x, φ) = λ_g Σ_i ||ψ_i x||_2 + Σ_i [(γ_ε/2)||R_i x − Dα_i||_2^2 + f(φ_i)] + (λ/2)||F_u x − y||_2^2.    (6)
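The difference operator and isotropic penalty defined above can be computed directly; the sketch below is ours, assumes the same wrap-around boundary convention used for the patches, and stores β as an N × 2 array whose ith row holds the two entries of ψ_i x.

```python
import numpy as np

def tv_coefficients(x):
    """Return beta = Psi x as an (N, 2) array of per-pixel [vertical, horizontal] differences.

    Wrap-around (periodic) boundaries are assumed, matching the patch extraction.
    """
    dv = x - np.roll(x, 1, axis=0)    # pixel i minus the pixel directly above it
    dh = x - np.roll(x, -1, axis=1)   # pixel i minus the pixel to its right
    return np.stack([dv.ravel(), dh.ravel()], axis=1)

def isotropic_tv(x):
    """TV(x) = sum_i ||psi_i x||_2 = sum_i sqrt(beta_{2i-1}^2 + beta_{2i}^2)."""
    beta = tv_coefficients(x)
    return np.sum(np.sqrt(np.sum(beta ** 2, axis=1)))
```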

We seek to minimize this function with respect to x and the dictionary learning variables φ_i = {D, s_i, z_i, γ_ε, γ_s, π}. Our first step is to put the objective into a more suitable form. We begin by defining the TV coefficients for the ith pixel as β_i := [β_{2i−1}, β_{2i}]^T = ψ_i x. We introduce the vector of Lagrange multipliers η_i, and then split β_i from ψ_i x by relaxing the equality via an augmented Lagrangian. This results in the objective function

L(x, β, η, φ) = Σ_i [λ_g||β_i||_2 + η_i^T(ψ_i x − β_i) + (ρ/2)||ψ_i x − β_i||_2^2] + Σ_i [(γ_ε/2)||R_i x − Dα_i||_2^2 + f(φ_i)] + (λ/2)||F_u x − y||_2^2.    (7)

From the ADMM theory, this objective will have (local) optimal values β_i' and x' with β_i' = ψ_i x', and so the equality constraints will be satisfied.³ Optimizing this function can be split into three separate sub-problems: one for TV, one for BPFA and one for updating the reconstruction x. Following the discussion of ADMM in the appendix, we define u_i = (1/ρ)η_i and complete the square in the first line of (7). We then cycle through the following three sub-problems,

(P1)  β_i' = arg min_β λ_g||β||_2 + (ρ/2)||ψ_i x − β + u_i||_2^2,  ∀i,
(P2)  φ' = arg min_φ Σ_i (γ_ε/2)||R_i x − Dα_i||_2^2 + f(φ_i),
(P3)  x' = arg min_x Σ_i (ρ/2)||ψ_i x − β_i' + u_i||_2^2 + Σ_i (γ_ε/2)||R_i x − D'α_i'||_2^2 + (λ/2)||F_u x − y||_2^2,
      u_i ← u_i + ψ_i x' − β_i',  i = 1, ..., N.

³We note that for a fixed D and α_{1:N}, the solution is also globally optimal.

For each sub-problem, we use the most recent values of all other parameters. Solutions for P1 and P3 are globally optimal and in closed form, while the update for u_i follows from ADMM. Since P2 is non-convex, we cannot perform the desired minimization, and so an approximation is required. Furthermore, this problem requires iterating through the several dictionary learning variables of BPFA, and so a local optimal solution cannot be given either. Our approach is to use stochastic optimization for problem P2 by Gibbs sampling each variable in BPFA conditioned on current values of all other variables. We next present the updates for each sub-problem, and give an outline in Algorithm 2.

Algorithm 2. Outline of the algorithm
Input: y, the undersampled k-space data.
Output: x, the reconstructed MR image.
Step 1. Initialize x = F_u^H y (zero filling) and u = 0. Initialize the BPFA variables using x.
Step 2. Solve the P1 sub-problem by optimizing β via shrinkage.
Step 3. Update the P2 sub-problem by Gibbs sampling the BPFA variables.
Step 4. Solve the P3 sub-problem in the Fourier domain, followed by an inverse transform.
Step 5. Update the Lagrange multiplier vector u. If not converged, return to Step 2.

1) Algorithm for P1 (total variation): We can solve for β_i exactly for each pixel i = 1, ..., N by using a generalized shrinkage operation [],

β_i = max{ ||ψ_i x + u_i||_2 − λ_g/ρ, 0 } · (ψ_i x + u_i)/||ψ_i x + u_i||_2.    (8)

We recall that β_i corresponds to the 2-dimensional TV coefficients for pixel i, with differences in one direction vertically and horizontally. These coefficients have been split from ψ_i x using ADMM, but gradually converge to one another and become equal in the limit. We recall that after updating x, we update the Lagrange multiplier u_i = u_i + ψ_i x − β_i.

2) Algorithm for P2 (BPFA): We update the parameters of BPFA using Gibbs sampling. We are therefore stochastically optimizing (7), but only for this sub-problem.
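(As an aside before the P2 sampling steps: the P1 update in (8) reduces to a row-wise vector shrinkage of the stacked TV coefficients. The minimal sketch below is ours rather than the authors' code, and it uses the N × 2 layout of the TV coefficients introduced in the earlier TV sketch.)

```python
import numpy as np

def shrink_tv(g, lam_g, rho):
    """Generalized shrinkage of Eq. (8), applied row-wise.

    g : (N, 2) array whose ith row is psi_i x + u_i.
    Returns the (N, 2) array of updated TV coefficients beta_i.
    """
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    scale = np.maximum(norms - lam_g / rho, 0.0) / np.maximum(norms, np.finfo(float).tiny)
    return scale * g
```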
With reference to Algorithm 1, the P2 sub-problem entails sampling new values for the dictionary D, the binary vectors z_i and weights s_i, with which we construct α_i = s_i ∘ z_i through the element-wise product, the precisions γ_ε and γ_sk, and the beta probabilities π_{1:K}, which give the probability that z_ik = 1. In principle, there is no limit to the number of samples that can be made, with the final sample giving the updates used in the other sub-problems. We found that a single sample is sufficient in practice and leads to a faster algorithm. The samples we make are given below.

a) Sample the dictionary D: We define the P × N matrix X = [R_1 x, ..., R_N x], which is the matrix of all vectorized patches extracted from the image x. We also define the K × N matrix α = [α_1, ..., α_N] containing the dictionary weight coefficients for the corresponding columns in X, such that Dα is an approximation of X prior to additive Gaussian noise. The update for the dictionary D is

D = Xα^T(αα^T + (P/γ_ε)I_K)^{−1} + E,  where E_{p,:} ~ N(0, (γ_ε αα^T + P I_K)^{−1}) independently for p = 1, ..., P.    (9)

We note that the first term in Equation (9) is the ℓ_2-regularized least squares solution for D. Correlated Gaussian noise is then added to generate a sample from the conditional posterior of D. Since both the number of pixels and γ_ε will tend to be very large, the variance of the noise is small and the mean term dominates the update for D.

b) Sample the sparse coding α_i: Sampling α_i entails sampling s_ik and z_ik for each k. We sample these values using block sampling. We recall that to block sample two variables from their joint conditional posterior distribution, (s, z) ~ p(s, z | −), one can first sample z from the marginal distribution, z ~ p(z | −), and then sample s | z ~ p(s | z, −) from the conditional distribution. The other sampling direction is possible as well, but for our problem sampling z followed by s | z is more efficient in finding a mode of the objective function. We define r_{i,−k} to be the residual error in approximating the ith patch with the current values from BPFA minus the kth dictionary element, r_{i,−k} = R_i x − Σ_{j≠k}(s_ij z_ij)d_j. We then sample z_ik from its conditional posterior Bernoulli distribution

z_ik ~ p_ik δ_1 + (1 − p_ik)δ_0, where following a simplification,

p_ik ∝ π_k (1 + (γ_ε/γ_sk) d_k^T d_k)^{−1/2} exp{ (γ_ε/2)(d_k^T r_{i,−k})² / (γ_sk/γ_ε + d_k^T d_k) },    (10)

1 − p_ik ∝ 1 − π_k.    (11)

We observe that the probability that z_ik = 1 takes into account how well dictionary element d_k correlates with the residual r_{i,−k}. After sampling z_ik we sample the corresponding weight s_ik from its conditional posterior Gaussian distribution,

s_ik | z_ik ~ N( z_ik d_k^T r_{i,−k} / (γ_sk/γ_ε + d_k^T d_k),  (γ_sk + γ_ε z_ik d_k^T d_k)^{−1} ).    (12)

When z_ik = 1, the mean of s_ik is the regularized least squares solution and the variance will be small if γ_ε is large. When z_ik = 0, s_ik is sampled from the prior.⁴

⁴We note that the value of s_ik does not factor into the model in this case, since s_ik z_ik = 0 and s_ik is integrated out the next time z_ik is sampled.

c) Sample γ_ε and γ_sk: We next sample from the conditional gamma posterior distributions of the noise precision and weight precisions,

γ_ε ~ Gamma( g_0 + PN/2,  h_0 + (1/2)Σ_i ||R_i x − Dα_i||_2^2 ),    (13)

γ_sk ~ Gamma( e_0 + (1/2)Σ_i z_ik,  f_0 + (1/2)Σ_i z_ik s_ik² ).    (14)

The expected value of each variable is the first term of the distribution divided by the second, which is close to the inverse of the average empirical error for γ_ε.

d) Sample π_k: The conditional posterior of π_k is a beta distribution sampled as follows,

π_k ~ Beta( a_0 + Σ_i z_ik,  b_0 + Σ_i (1 − z_ik) ).    (15)

The parameters of the beta distribution include counts of how many times dictionary element d_k was used by a patch.

Fig. 2. Example masks used to undersample k-space: (left) Cartesian mask, (right) radial mask.

3) Algorithm for P3 (MRI reconstruction): The final sub-problem is to reconstruct the image x. Our approach takes advantage of the Fourier domain, similar to other methods, e.g. [33]. The corresponding objective function is

x = arg min_x Σ_i (ρ/2)||ψ_i x − β_i + u_i||_2^2 + Σ_i (γ_ε/2)||R_i x − Dα_i||_2^2 + (λ/2)||F_u x − y||_2^2.

Since this is a least squares problem, x has a closed form solution that satisfies

(ρΨ^TΨ + γ_ε Σ_i R_i^T R_i + λF_u^H F_u) x = ρΨ^T(β − u) + γ_ε P x_BPFA + λF_u^H y.    (16)

We recall that Ψ is the matrix of stacked ψ_i. The vector β is also obtained by stacking each β_i, and similarly u is the vector formed by stacking u_i. The vector x_BPFA is the proposed reconstructed image from BPFA using the current D and α_{1:N}, which results from the equality P x_BPFA = Σ_i R_i^T Dα_i. We observe that inverting the N × N matrix on the left is computationally prohibitive, since N is the number of pixels in the image. Fortunately, given the form of the matrix in Equation (16), we can simplify the problem by working in the Fourier domain, which allows for element-wise updates in k-space, followed by an inverse Fourier transform. We represent x as x = F^Hθ, where θ is the Fourier transform of x and H denotes the conjugate transpose. We then take the Fourier transform of each side of Equation (16) to give

F(ρΨ^TΨ + γ_ε Σ_i R_i^T R_i + λF_u^H F_u)F^H θ = ρFΨ^T(β − u) + γ_ε F P x_BPFA + λF F_u^H y.    (17)

The left-hand matrix simplifies to a diagonal matrix,

F(ρΨ^TΨ + γ_ε Σ_i R_i^T R_i + λF_u^H F_u)F^H = ρΛ + γ_ε P I_N + λ I_N^u.    (18)

Term by term, this results as follows. The product of the finite difference operator matrix Ψ with itself yields a circulant matrix, which has the rows of the Fourier matrix F as its eigenvectors and eigenvalues Λ = FΨ^TΨF^H. The matrix R_i^T R_i is a matrix of all zeros, except for ones on the diagonal entries that correspond to the indices of x associated with the ith patch.
Since each pixel appears in P patches, the sum over i gives P I_N, and the Fourier product cancels. The final diagonal matrix I_N^u also contains all zeros, except for ones along the diagonal corresponding to the indices in k-space that are measured, which results from F F_u^H F_u F^H. Since the left matrix is diagonal, we can perform element-wise updating of the Fourier coefficients θ,

θ_i = [ρ F_iΨ^T(β − u) + γ_ε P F_i x_BPFA + λ F_i F_u^H y] / [ρΛ_ii + γ_ε P + λ(I_N^u)_ii].    (19)

We observe that the rightmost term in the numerator and denominator equals zero if i is not a measured k-space location. We invert θ via the inverse Fourier transform F^H to obtain the reconstructed MR image x.

B. Discussion on λ

We note that a feature of dictionary learning approaches is that λ can be allowed to go to infinity, and so parameter selection isn't necessary here. This is because a denoised reconstruction of the image is obtained through the dictionary learning reconstruction. In reference to Equation (19), we observe that in this case we are fixing the measured k-space values and using the k-space projection of BPFA and TV to fill in the missing values.
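For completeness, a minimal NumPy sketch of the element-wise update (19) is given below. It is our own illustration rather than the authors' implementation: it assumes unitary FFTs, a precomputed array of the eigenvalues Λ_ii, the measured k-space values placed on the full grid, and a boolean sampling mask, and all variable names are ours.

```python
import numpy as np

def p3_update(x_bpfa, psiT_beta_minus_u, y_full, mask, Lam, rho, gam_eps, lam, P):
    """Solve the P3 sub-problem via the element-wise k-space update of Eq. (19).

    x_bpfa            : BPFA proposal image, (1/P) * sum_i R_i^T D alpha_i, as a 2D array
    psiT_beta_minus_u : the image-domain quantity Psi^T (beta - u), reshaped to 2D
    y_full            : measured k-space values placed on the full grid (zeros elsewhere)
    mask              : boolean k-space sampling mask (True where measured)
    Lam               : eigenvalues Lambda_ii of Psi^T Psi in the Fourier basis, same shape as x
    """
    # Numerator of (19): Fourier transforms of the three right-hand-side terms of (16)
    num = (rho * np.fft.fft2(psiT_beta_minus_u, norm="ortho")
           + gam_eps * P * np.fft.fft2(x_bpfa, norm="ortho")
           + lam * y_full)
    # Denominator of (19): diagonal entries; the lambda term is zero off the sampling mask
    den = rho * Lam + gam_eps * P + lam * mask
    theta = num / den
    x = np.fft.ifft2(theta, norm="ortho")
    return np.real(x)   # real part taken here; keep theta complex if the image is complex
```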

Fig. 3. Ground truth images considered in the experiments: (a) Circle of Willis, (b) Lumbar, (c) Shoulder, (d) Brain.

Fig. 4. GE data with noise (σ = ) and 30% Cartesian sampling: (a) zero filling. The BPFA dictionary learning model (b) reconstructs the original noisy image and (c) denoises the reconstruction in unison, using two versions of the image in the reconstruction, while (d) total variation minimization reconstructs and denoises one image. Also shown are the dictionary learning variables sorted by π_k: (e) the dictionary (magnitude), (f) the distribution on the dictionary, π_k, and (g) the normalized histogram of the number of dictionary elements used per patch.

IV. EXPERIMENTS AND DISCUSSION

We present experimental results on synthetic data and the MRI shown in Figure 3. We consider a variety of sampling rates and masks, and compare with four other algorithms: SparseMRI [3], PBDW [19], TV [33] and DLMRI [18]. We use the publicly available code for these algorithms and tried several parameter settings, selecting the best ones for comparison. We also compare with BPFA without using total variation, which is a special case of our algorithm with λ_g = 0.

A. Set-up

We consider two sampling trajectories in k-space corresponding to the two practical approaches to CS-MRI: Cartesian sampling with random phase encodes and radial sampling. We also considered random sampling and found comparable results, with reconstruction improved for each algorithm, as expected from CS theory. Since this is not a practical sampling method we omit these results. In the first scheme, measurement trajectories are sampled from a variable density Cartesian grid, and in the second we measure along radial lines uniformly spaced in angle. We show examples of these trajectories in Figure 2. We considered several subsampling rates for each trajectory, measuring 10%, 20%, 25%, 30%, and 35% of k-space. As a performance measure we use the peak signal-to-noise ratio (PSNR) to the ground truth image, in addition to showing qualitative performance comparisons. For all images, we extract 6 × 6 patches, where each pixel defines the upper left corner of a patch, and wrap around the image at the boundaries.
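As a reference for the quantitative comparisons that follow, PSNR can be computed as in this sketch (ours; the peak-value convention is an assumption, not a normalization stated in the paper).

```python
import numpy as np

def psnr(x_rec, x_ref, peak=None):
    """Peak signal-to-noise ratio of a reconstruction against the ground truth image.

    The peak defaults to the maximum magnitude of the reference image; this
    normalization is our assumption, not a convention taken from the paper.
    """
    peak = np.abs(x_ref).max() if peak is None else peak
    mse = np.mean(np.abs(x_rec - x_ref) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```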

Fig. 5. Circle of Willis MRI: (a)-(f) example reconstructions for Cartesian sampling of 25% of k-space with BPFA+TV, BPFA, TV, DLMRI, PBDW and SparseMRI, and (g) zero-filling. (h) PSNR results for Cartesian sampling. (i) PSNR results for radial sampling.

Fig. 6. Lumbar MRI: (a)-(f) example reconstructions for radial sampling of 20% of k-space with BPFA+TV, BPFA, TV, DLMRI, PBDW and SparseMRI, and (g) zero-filling. (h) PSNR results for Cartesian sampling. (i) PSNR results for radial sampling.

Fig. 7. Shoulder MRI: (a)-(f) example reconstructions for Cartesian sampling of 20% of k-space with BPFA+TV, BPFA, TV, DLMRI, PBDW and SparseMRI, and (g) zero-filling. (h) PSNR results for Cartesian sampling. (i) PSNR results for radial sampling.

Fig. 8. Brain MRI: (a)-(f) example reconstructions for radial sampling of 10% of k-space with BPFA+TV, BPFA, TV, DLMRI, PBDW and SparseMRI, and (g) zero-filling. (h) PSNR results for Cartesian sampling. (i) PSNR results for radial sampling.

For the synthetic data we learn complex-valued dictionaries, while for the MRI data we restrict the model to real-valued dictionaries. We initialize x by zero-filling in k-space. We use a dictionary with K = 108 initial dictionary elements, recalling that the final number of dictionary elements will be smaller due to the sparse BPFA prior. If 108 is found to be too small, K can be increased, with the result being a slower inference algorithm. (In principle K can be infinitely large.) We ran 1000 iterations of the algorithm and show the results of the last iteration. For the regularization parameters of our model, we set the data fidelity regularization λ to a very large value, therefore treating λ as effectively being infinity and allowing BPFA to fill in the missing k-space and denoise, as discussed in Section III-B. We also set λ_g = 10 and ρ = . For BPFA we set c = 1, e_0 = f_0 = 1, γ = 5, g_0 = 0.5N²/10, and h_0 = g_0 v/8, where v is the empirical variance of the initialization.

B. Experiments on simulated data

In Figure 4 we show results on the GE phantom with additive noise having standard deviation σ = . In this experiment we use BPFA without TV to reconstruct the original image using 30% Cartesian sampling. We show the reconstruction using zero-filling in Figure 4(a). Since λ = 10100, we see in Figure 4(b) that BPFA essentially helps reconstruct the underlying noisy image for x. However, using the denoising property of the BPFA model shown in Figure 1, we obtain the denoised reconstruction of Figure 4(c) by focusing on x_BPFA from Equation (16). This is in contrast with the best result we could obtain with TV in Figure 4(d), which places the sparse penalty directly on the reconstructed image. For TV the value of λ relative to the regularization parameter becomes significant. We set λ = 1 and swept through values in (0, 5) for the TV regularization parameter. Similar to Figure 1, we show some statistics from the BPFA model in Figures 4(e)-(g). Roughly 80 dictionary elements were used, and an average of 2.28 elements were used by a patch, given that at least one was used (which discounts the black regions).

Fig. 9. Absolute errors for 20% radial sampling of the shoulder MRI: (a) BPFA+TV, (b) PBDW, (c) DLMRI, (d) Sparse MRI.

C. Experiments on MRI

We next evaluate the performance of our algorithm using the MRI shown in Figure 3. As mentioned, we compare our algorithm with Sparse MRI [3], which is a combination of wavelets and total variation; TV [33] using the isotropic model; DLMRI [18], which is a dictionary learning model based on K-SVD; and PBDW [19], which is a patch-based method that uses directional wavelets and therefore places greater restrictions on the dictionary. In all algorithms, we considered several parameter settings and picked the best results for comparison. In addition, we consider our algorithm with and without the total variation penalty, denoted BPFA+TV and BPFA, respectively.

1) Reconstruction results: We present quantitative and qualitative results for the reconstruction algorithms in Figures 5-8. In these figures, we show the peak signal-to-noise ratio (PSNR) for Cartesian and radial sampling as a function of the percentage sampled in k-space. We see that the proposed Bayesian nonparametric dictionary learning method gives an improved reconstruction. We also see additional slight improvement when a TV penalty is added, though this is not always the case. Given the denoising property of dictionary learning, this is perhaps not surprising.
We also observe that radial sampling performed better than Cartesian sampling in all experiments. We again note that we performed similar experiments using random sampling and observed similar relative results with an overall improvement compared with radial sampling, but we omit these results for space, and because random sampling is not practical for MRI. In each figure we also show example reconstructions for the algorithms considered, including zero-filling in k-space. In some MRI, such as the Circle of Willis in Figure 5, the improvement is less in the structural information and more in image quality. In other MRI, the proposed method is able to capture structure in the image that is missed by the other algorithms. In Figure 6(a) we indicate one of these regions for the shoulder MRI. In Figure 9 we show the residual errors (in absolute value) for several algorithms on the shoulder MRI. (Note that these images correspond to a different sampling pattern than in Figure 7.) In this example we see that the errors for BPFA are more noise-like than for the other algorithms. The proposed method has several advantages, which we believe leads to the improvement in performance. A significant advantage is the adaptive learning of the dictionary size and per-patch sparsity level using a nonparametric stochastic process that is naturally suited for this problem. In addition to this, several other parameters such as the noise variance and the variances of the score weights are adjusted through a natural MCMC sampling approach. Also, the regularization introduced by the prior helps prevent over-fitting, which is important since in the first several iterations BPFA is modeling an MRI reconstruction that is significantly distorted.

11 11 Sum of probabilities Dictionary element index (a) Dictionary for 10% sampling (b) Dictionary for 20% sampling (c) Dictionary for 30% sampling 10% sampling 20% sampling 30% sampling (d) BPFA weights (cumulative) empirical probability % sampling 20% sampling 30% sampling # dictionary elements used by patch (e) Dictionary elements per patch Fig. 10. Radial sampling for the Circle of Willis. (a)-(c) The learned dictionary for various sampling rates. (d) The cumulative function of the sorted π k from BPFA for each sampling rate. This gives information on sparsity and average usage of the dictionary. (e) The distribution on the number of elements used per patch for each sampling rate. Another advantage of our model is the Markov Chain Monte Carlo inference algorithm. In highly non-convex Bayesian models (or similar models with a Bayesian interpretation), it is generally observed by the statistics community that MCMC sampling outperforms deterministic methods. Given that BPFA is a Bayesian model, such inference/optimization techniques are readily derived, as we showed in Section III-A. A drawback of MCMC is that more iterations are required than deterministic methods (we used 1000 iterations requiring approximately 1.5 hours, whereas the other algorithms required under 100). However, we note that inference for the BPFA model is easily parallelizable, which can mitigate this problem. 2) Dictionary learning: We next investigate the model learned by BPFA. In Figure 10 we show dictionary learning results learned by BPFA+TV for radial sampling of the Circle of Willis. In the top portion, we show the dictionaries learned for 10%, 20% and 30% sampling. We see that they are consistent, but the number of elements increases as the sampling percentage increases, since more complex information is contained in the k-space measurements of the image. This is also shown in Figure 10(d). In this plot we show the cumulative sum of the ordered π k from BPFA. We can read off the average number of elements used per patch by looking at the right-most value. We see that more elements are used per patch as the fraction of observed k-space increases. We also see that for 10%, 20% and 30% sampling, roughly 60, 80 and 95, respectively, of the 108 total dictionary elements were significantly used, as indicated by the leveling off of these functions. This highlights the adaptive property of the nonparametric beta process prior. In Figure 10(e) we show the empirical distribution on the number of dictionary elements used per patch for each sampling rate. We see that there are two modes, one for the empty background and one for the foreground, and the second mode tends to increase as the sampling rate increases. The adaptability of this value to each patch is another characteristic of the beta process model. We note that these results are typical of what we observed in the other experiments. 3) Discussion: We initialized the image using zero-filling and initialized the first dictionary elements using the singular vectors of the patches from this image. We then randomly sampled the remaining dictionary elements from the prior. We initialized z ik = 0 for all i and k, and π k =. As mentioned, for the Gamma(g 0, h 0 ) prior on the inverse noise variance of the patch, we set g 0 = 0.5N 2 /10 and h 0 = g 0 v/8, where v is the empirical noise variance of the zero-filled image. This gives a prior expected noise of v/8. 
Here, 10 indicates that the prior is 1/10 the strength of the likelihood, and 8 indicates that the prior expects a SNR of 8 to 1 with respect to the zero-filled image. The purpose of this is that the MRI we consider have very little noise and so using a non-informative prior (where g 0 = h 0 = ) would cause dictionary learning to fit the early reconstructions tightly by correctly learning that there is very little noise. While we still observed good results, the convergence was very slow. Strengthening the prior enforces a more smooth reconstruction in the early stages of inference. We note that for more significant levels of noise, such as our examples in Sections II-B2 and IV-B, this issue did not arise and noninformative priors could be used. We note that the added computation time for the TV penalty is very small compared with dictionary learning; the total amount of time required for one iteration was between 5 and 6 seconds for the BPFA+TV model. This is significantly faster than DLMRI, since our sampling approach is much less computationally intensive than the OMP algorithm, which requires matrix inversions, but slower than the other algorithms we compare with. V. CONCLUSION We have presented an algorithm for CS-MRI reconstruction that uses Bayesian nonparametric dictionary learning. Our Bayesian approach uses a model called beta process factor analysis (BPFA) for in situ dictionary learning. Through this hierarchical generative structure, we can learn the dictionary size, sparsity pattern and additional regularization parameters. We also considered a total variation penalty term for additional constraints on image smoothness. We presented an optimization algorithm using the alternating direction method

12 12 of multipliers (ADMM) and MCMC Gibbs sampling for all BPFA variables. Experimental results on several MR images showed that our proposed regularization framework compares favorably with other algorithms for various sampling trajectories and rates. VI. APPENDIX A. Constructing the Bayesian part of the objective function We give some additional details of the Bayesian structure of our dictionary learning approach. The unknown variables of the model are D = {d 1,..., d K }, π = {π 1,..., π K }, {s i } i=1:n, {z i } i=1:n, γ ε, γ s = {γ s,1,..., γ s,k }. The data from the perspective of BPFA is the set of patches extracted from the current reconstruction, {R i x} i=1:n. The joint likelihood of these variables and data is p({r i x}, D, π, {s i }, {z i }, γ ε, γ s ) = [ N p(r i x D, z i, s i, γ ε )p(s i γ s ) ] k p(z ik π k ) i=1 [ K ] p(π k )p(d k ) p(γ ε ) k p(γ s,k). (20) k=1 The first bracketed group constitutes the patch-specific part of the likelihood. The second group contains the dictionary elements and their probabilities and the remaining distributions are for inverse variances. The specific distributions used are given in Algorithm 1. By writing out these distributions explicitly, the functional form of the joint likelihood can be obtained. The dictionary learning part of the objective function, which corresponds to sub-problem P2, is γ ε 2 R ix Dα i f(ϕ i ) = i ln p({r i x}, D, π, {s i }, {z i }, γ ε, γ s ). Optimizing this non-convex function is equivalent to finding a mode of the joint likelihood. Rather than use a deterministic gradient-based method, we use the MCMC Gibbs sampling to stochastically find a mode. The functional form is unnecessary for deriving the Gibbs sampling algorithm. We note that many of the updates are essentially noisy versions of regularized least squares solutions. B. Alternating Direction Method of Multipliers To review the general form of ADMM [35] we are interested in, we start with the convex optimization problem min x Ax b h(x), (21) where h is a non-smooth convex function, such as an l 1 penalty. ADMM decouples the smooth squared error term from this penalty by introducing a second vector v such that min x Ax b h(v) subject to v = x. (22) This is followed by a relaxation of the equality v = x via an augmented Lagrangian term L(x, v, η) = Ax b h(v) + η T (x v) + ρ 2 x v 2 2. (23) A minimax saddle point is found with the minimization taking place over both x and v and dual ascent for η. Another way to write the objective in (23) is to define u = (1/ρ)η and combine the last two terms. The result is an objective that can be optimized by cycling through the following updates for x, v and u, x = arg min x Ax b ρ 2 x v + u 2 2, (24) v = arg min v h(v) + ρ 2 x v + u 2 2, (25) u = u + x v. (26) This algorithm simplifies the optimization since the objective for x is quadratic and thus has a simple analytic solution, while the update for v is a proximity operator of h with penalty ρ, the difference being that v is not pre-multiplied by a matrix as x is in (21). Such optimization problems tend to be much easier to solve; for example when h is the TV penalty the solution for v is analytical. REFERENCES [1] E. Candés, J. Romberg, and T. Tao, Robust Uncertainty Principles: Exact Signal Reconstruction From Highly Incomplete Frequency Information, IEEE Trans. on Information Theory, vol. 52, no. 2, pp , [2] D. Donoho, Compressed sensing, IEEE Trans. on Information Theory, vol. 52, no. 4, pp , [3] M. Lustig, D. Donoho, and J. M. 
Pauly, Sparse MRI: The Application of Compressed Sensing for Rapid MR Imaging, Magnetic Resonance in Medicine, vol. 58, no. 6, pp , [4] Y. Kim, M. S. Nadar, and A. Bilgin, Wavelet-Based Compressed Sensing Using Gaussian Scale Mixtures, IEEE Trans. on Image Processing, vol. 21, no. 6, pp , [5] X. Qu, W. Zhang, D. Guo, C. Cai, S. Cai, and Z. Chen, Iterative Thresholding Compressed Sensing MRI Based on Contourlet Transform, Inverse Problems Sci. Eng., Jun [6] X. Qu, X. Cao, D. Guo, C. Hu, and Z. Chen, Combined Sparsifying Transforms for Compressed Sensing MRI, Electronics Letters, vol. 46, no. 2, pp , [7] J. Trzasko and A. Manduca, Highly Undersampled Magnetic Resonance Image Reconstruction via Homotopic L0-Minimization, IEEE Trans. on Medical Imaging, vol. 28, no. 1, pp , [8] R. Chartrand, Fast Algorithms for Nonconvex Compressive Sensing: MRI Reconstruction from Very Few Data, in Proc. IEEE Int. Symp. on Biomedical Imaging, pp , [9] J. C. Ye, S. Tak, Y. Han, and H. W. Park, Projection Reconstruction MR Imaging Using FOCUSS, Magnetic Resonance in Medicine, vol. 57, no. 4, pp , [10] H. Jung, K. Sung, K. S. Nayak, E. Y. Kim, and J. C. Ye, k-t FOCUSS: A General Compressed Sensing Framework for High Resolution Dynamic MRI, Magnetic Resonance in Medicine, vol. 61, pp , [11] J. Yang, Y. Zhang, and W. Yin, A Fast Alternating Direction Method for TVL1-L2 Signal Reconstruction from Partial Fourier Data, IEEE J. Sel. Topics in Signal Processing, vol. 4, no. 2, pp , [12] Y. Chen and X. Ye, A Novel Method and Fast Algorithm for MR Image Reconstruction with Significantly Under- sampled Data, Inverse Problems and Imaging, vol. 4, no. 2, pp , [13] J. Huang, S. Zhang, and D. Metaxas, Efficient MR Image Reconstruction for Compressed MR Imaging, Medical Image Analysis, vol. 15, no. 5, pp , [14] S. Ji, Y. Xue and L. Carin, Bayesian compressive sensing, IEEE Trans. on Signal Processing, vol. 56, no. 6, pp , [15] X. Ye, Y. Chen, and F. Huang, Computational Acceleration for MR Image Reconstruction in Partially Parallel Imaging, IEEE Trans. on Medical Imaging, vol. 30, no. 5, pp , [16] X. Ye, Y. Chen, W. Lin, and F. Huang, Fast MR Image Reconstruction for Partially Parallel Imaging with Arbitrary k-space Trajectories, IEEE Trans. on Medical Imaging, vol. 30, no. 3, pp , 2011.


More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

New Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit

New Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence

More information

Nonparametric Bayesian Dictionary Learning for Machine Listening

Nonparametric Bayesian Dictionary Learning for Machine Listening Nonparametric Bayesian Dictionary Learning for Machine Listening Dawen Liang Electrical Engineering dl2771@columbia.edu 1 Introduction Machine listening, i.e., giving machines the ability to extract useful

More information

Sparse & Redundant Signal Representation, and its Role in Image Processing

Sparse & Redundant Signal Representation, and its Role in Image Processing Sparse & Redundant Signal Representation, and its Role in Michael Elad The CS Department The Technion Israel Institute of technology Haifa 3000, Israel Wave 006 Wavelet and Applications Ecole Polytechnique

More information

MLCC 2018 Variable Selection and Sparsity. Lorenzo Rosasco UNIGE-MIT-IIT

MLCC 2018 Variable Selection and Sparsity. Lorenzo Rosasco UNIGE-MIT-IIT MLCC 2018 Variable Selection and Sparsity Lorenzo Rosasco UNIGE-MIT-IIT Outline Variable Selection Subset Selection Greedy Methods: (Orthogonal) Matching Pursuit Convex Relaxation: LASSO & Elastic Net

More information

Machine Learning for Signal Processing Sparse and Overcomplete Representations

Machine Learning for Signal Processing Sparse and Overcomplete Representations Machine Learning for Signal Processing Sparse and Overcomplete Representations Abelino Jimenez (slides from Bhiksha Raj and Sourish Chaudhuri) Oct 1, 217 1 So far Weights Data Basis Data Independent ICA

More information

arxiv: v1 [cs.it] 26 Oct 2018

arxiv: v1 [cs.it] 26 Oct 2018 Outlier Detection using Generative Models with Theoretical Performance Guarantees arxiv:1810.11335v1 [cs.it] 6 Oct 018 Jirong Yi Anh Duc Le Tianming Wang Xiaodong Wu Weiyu Xu October 9, 018 Abstract This

More information

Recovery of Sparse Signals from Noisy Measurements Using an l p -Regularized Least-Squares Algorithm

Recovery of Sparse Signals from Noisy Measurements Using an l p -Regularized Least-Squares Algorithm Recovery of Sparse Signals from Noisy Measurements Using an l p -Regularized Least-Squares Algorithm J. K. Pant, W.-S. Lu, and A. Antoniou University of Victoria August 25, 2011 Compressive Sensing 1 University

More information

Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise

Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise Minru Bai(x T) College of Mathematics and Econometrics Hunan University Joint work with Xiongjun Zhang, Qianqian Shao June 30,

More information

Structured matrix factorizations. Example: Eigenfaces

Structured matrix factorizations. Example: Eigenfaces Structured matrix factorizations Example: Eigenfaces An extremely large variety of interesting and important problems in machine learning can be formulated as: Given a matrix, find a matrix and a matrix

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

More information

Machine Learning for Signal Processing Sparse and Overcomplete Representations. Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013

Machine Learning for Signal Processing Sparse and Overcomplete Representations. Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013 Machine Learning for Signal Processing Sparse and Overcomplete Representations Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013 1 Key Topics in this Lecture Basics Component-based representations

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

Compressive Imaging by Generalized Total Variation Minimization

Compressive Imaging by Generalized Total Variation Minimization 1 / 23 Compressive Imaging by Generalized Total Variation Minimization Jie Yan and Wu-Sheng Lu Department of Electrical and Computer Engineering University of Victoria, Victoria, BC, Canada APCCAS 2014,

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

SOS Boosting of Image Denoising Algorithms

SOS Boosting of Image Denoising Algorithms SOS Boosting of Image Denoising Algorithms Yaniv Romano and Michael Elad The Technion Israel Institute of technology Haifa 32000, Israel The research leading to these results has received funding from

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles Or: the equation Ax = b, revisited University of California, Los Angeles Mahler Lecture Series Acquiring signals Many types of real-world signals (e.g. sound, images, video) can be viewed as an n-dimensional

More information

Study Notes on the Latent Dirichlet Allocation

Study Notes on the Latent Dirichlet Allocation Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Solving DC Programs that Promote Group 1-Sparsity

Solving DC Programs that Promote Group 1-Sparsity Solving DC Programs that Promote Group 1-Sparsity Ernie Esser Contains joint work with Xiaoqun Zhang, Yifei Lou and Jack Xin SIAM Conference on Imaging Science Hong Kong Baptist University May 14 2014

More information

1 Sparsity and l 1 relaxation

1 Sparsity and l 1 relaxation 6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the

More information

Stochastic Proximal Gradient Algorithm

Stochastic Proximal Gradient Algorithm Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind

More information

Pre-weighted Matching Pursuit Algorithms for Sparse Recovery

Pre-weighted Matching Pursuit Algorithms for Sparse Recovery Journal of Information & Computational Science 11:9 (214) 2933 2939 June 1, 214 Available at http://www.joics.com Pre-weighted Matching Pursuit Algorithms for Sparse Recovery Jingfei He, Guiling Sun, Jie

More information

Introduction How it works Theory behind Compressed Sensing. Compressed Sensing. Huichao Xue. CS3750 Fall 2011

Introduction How it works Theory behind Compressed Sensing. Compressed Sensing. Huichao Xue. CS3750 Fall 2011 Compressed Sensing Huichao Xue CS3750 Fall 2011 Table of Contents Introduction From News Reports Abstract Definition How it works A review of L 1 norm The Algorithm Backgrounds for underdetermined linear

More information

sparse and low-rank tensor recovery Cubic-Sketching

sparse and low-rank tensor recovery Cubic-Sketching Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru

More information

Sparse Solutions of an Undetermined Linear System

Sparse Solutions of an Undetermined Linear System 1 Sparse Solutions of an Undetermined Linear System Maddullah Almerdasy New York University Tandon School of Engineering arxiv:1702.07096v1 [math.oc] 23 Feb 2017 Abstract This work proposes a research

More information

Bayesian non parametric approaches: an introduction

Bayesian non parametric approaches: an introduction Introduction Latent class models Latent feature models Conclusion & Perspectives Bayesian non parametric approaches: an introduction Pierre CHAINAIS Bordeaux - nov. 2012 Trajectory 1 Bayesian non parametric

More information

SPARSE SIGNAL RESTORATION. 1. Introduction

SPARSE SIGNAL RESTORATION. 1. Introduction SPARSE SIGNAL RESTORATION IVAN W. SELESNICK 1. Introduction These notes describe an approach for the restoration of degraded signals using sparsity. This approach, which has become quite popular, is useful

More information

Rigorous Dynamics and Consistent Estimation in Arbitrarily Conditioned Linear Systems

Rigorous Dynamics and Consistent Estimation in Arbitrarily Conditioned Linear Systems 1 Rigorous Dynamics and Consistent Estimation in Arbitrarily Conditioned Linear Systems Alyson K. Fletcher, Mojtaba Sahraee-Ardakan, Philip Schniter, and Sundeep Rangan Abstract arxiv:1706.06054v1 cs.it

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Compressive Sensing (CS)

Compressive Sensing (CS) Compressive Sensing (CS) Luminita Vese & Ming Yan lvese@math.ucla.edu yanm@math.ucla.edu Department of Mathematics University of California, Los Angeles The UCLA Advanced Neuroimaging Summer Program (2014)

More information

Compressive Sensing and Beyond

Compressive Sensing and Beyond Compressive Sensing and Beyond Sohail Bahmani Gerorgia Tech. Signal Processing Compressed Sensing Signal Models Classics: bandlimited The Sampling Theorem Any signal with bandwidth B can be recovered

More information

A Riemannian Framework for Denoising Diffusion Tensor Images

A Riemannian Framework for Denoising Diffusion Tensor Images A Riemannian Framework for Denoising Diffusion Tensor Images Manasi Datar No Institute Given Abstract. Diffusion Tensor Imaging (DTI) is a relatively new imaging modality that has been extensively used

More information

Greedy Dictionary Selection for Sparse Representation

Greedy Dictionary Selection for Sparse Representation Greedy Dictionary Selection for Sparse Representation Volkan Cevher Rice University volkan@rice.edu Andreas Krause Caltech krausea@caltech.edu Abstract We discuss how to construct a dictionary by selecting

More information

Forecasting Wind Ramps

Forecasting Wind Ramps Forecasting Wind Ramps Erin Summers and Anand Subramanian Jan 5, 20 Introduction The recent increase in the number of wind power producers has necessitated changes in the methods power system operators

More information

Scale Mixture Modeling of Priors for Sparse Signal Recovery

Scale Mixture Modeling of Priors for Sparse Signal Recovery Scale Mixture Modeling of Priors for Sparse Signal Recovery Bhaskar D Rao 1 University of California, San Diego 1 Thanks to David Wipf, Jason Palmer, Zhilin Zhang and Ritwik Giri Outline Outline Sparse

More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Gauge optimization and duality

Gauge optimization and duality 1 / 54 Gauge optimization and duality Junfeng Yang Department of Mathematics Nanjing University Joint with Shiqian Ma, CUHK September, 2015 2 / 54 Outline Introduction Duality Lagrange duality Fenchel

More information

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that

More information

Compressed Sensing and Neural Networks

Compressed Sensing and Neural Networks and Jan Vybíral (Charles University & Czech Technical University Prague, Czech Republic) NOMAD Summer Berlin, September 25-29, 2017 1 / 31 Outline Lasso & Introduction Notation Training the network Applications

More information

Applied Machine Learning for Biomedical Engineering. Enrico Grisan

Applied Machine Learning for Biomedical Engineering. Enrico Grisan Applied Machine Learning for Biomedical Engineering Enrico Grisan enrico.grisan@dei.unipd.it Data representation To find a representation that approximates elements of a signal class with a linear combination

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Supplementary Note on Bayesian analysis

Supplementary Note on Bayesian analysis Supplementary Note on Bayesian analysis Structured variability of muscle activations supports the minimal intervention principle of motor control Francisco J. Valero-Cuevas 1,2,3, Madhusudhan Venkadesan

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Sparse Gaussian conditional random fields

Sparse Gaussian conditional random fields Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian

More information

Joint Bayesian Compressed Sensing with Prior Estimate

Joint Bayesian Compressed Sensing with Prior Estimate Joint Bayesian Compressed Sensing with Prior Estimate Berkin Bilgic 1, Tobias Kober 2,3, Gunnar Krueger 2,3, Elfar Adalsteinsson 1,4 1 Electrical Engineering and Computer Science, MIT 2 Laboratory for

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Using ADMM and Soft Shrinkage for 2D signal reconstruction

Using ADMM and Soft Shrinkage for 2D signal reconstruction Using ADMM and Soft Shrinkage for 2D signal reconstruction Yijia Zhang Advisor: Anne Gelb, Weihong Guo August 16, 2017 Abstract ADMM, the alternating direction method of multipliers is a useful algorithm

More information

Variational Bayesian Inference Techniques

Variational Bayesian Inference Techniques Advanced Signal Processing 2, SE Variational Bayesian Inference Techniques Johann Steiner 1 Outline Introduction Sparse Signal Reconstruction Sparsity Priors Benefits of Sparse Bayesian Inference Variational

More information

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms François Caron Department of Statistics, Oxford STATLEARN 2014, Paris April 7, 2014 Joint work with Adrien Todeschini,

More information

Compressed Sensing via Partial l 1 Minimization

Compressed Sensing via Partial l 1 Minimization WORCESTER POLYTECHNIC INSTITUTE Compressed Sensing via Partial l 1 Minimization by Lu Zhong A thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Design of Projection Matrix for Compressive Sensing by Nonsmooth Optimization

Design of Projection Matrix for Compressive Sensing by Nonsmooth Optimization Design of Proection Matrix for Compressive Sensing by Nonsmooth Optimization W.-S. Lu T. Hinamoto Dept. of Electrical & Computer Engineering Graduate School of Engineering University of Victoria Hiroshima

More information

EE 381V: Large Scale Optimization Fall Lecture 24 April 11

EE 381V: Large Scale Optimization Fall Lecture 24 April 11 EE 381V: Large Scale Optimization Fall 2012 Lecture 24 April 11 Lecturer: Caramanis & Sanghavi Scribe: Tao Huang 24.1 Review In past classes, we studied the problem of sparsity. Sparsity problem is that

More information

MIT 9.520/6.860, Fall 2017 Statistical Learning Theory and Applications. Class 19: Data Representation by Design

MIT 9.520/6.860, Fall 2017 Statistical Learning Theory and Applications. Class 19: Data Representation by Design MIT 9.520/6.860, Fall 2017 Statistical Learning Theory and Applications Class 19: Data Representation by Design What is data representation? Let X be a data-space X M (M) F (M) X A data representation

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Relaxed linearized algorithms for faster X-ray CT image reconstruction

Relaxed linearized algorithms for faster X-ray CT image reconstruction Relaxed linearized algorithms for faster X-ray CT image reconstruction Hung Nien and Jeffrey A. Fessler University of Michigan, Ann Arbor The 13th Fully 3D Meeting June 2, 2015 1/20 Statistical image reconstruction

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

19 : Bayesian Nonparametrics: The Indian Buffet Process. 1 Latent Variable Models and the Indian Buffet Process

19 : Bayesian Nonparametrics: The Indian Buffet Process. 1 Latent Variable Models and the Indian Buffet Process 10-708: Probabilistic Graphical Models, Spring 2015 19 : Bayesian Nonparametrics: The Indian Buffet Process Lecturer: Avinava Dubey Scribes: Rishav Das, Adam Brodie, and Hemank Lamba 1 Latent Variable

More information

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric

More information

arxiv: v1 [cs.na] 29 Nov 2017

arxiv: v1 [cs.na] 29 Nov 2017 A fast nonconvex Compressed Sensing algorithm for highly low-sampled MR images reconstruction D. Lazzaro 1, E. Loli Piccolomini 1 and F. Zama 1 1 Department of Mathematics, University of Bologna, arxiv:1711.11075v1

More information

Reconstruction from Anisotropic Random Measurements

Reconstruction from Anisotropic Random Measurements Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013

More information

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS

More information

Lecture 16 Deep Neural Generative Models

Lecture 16 Deep Neural Generative Models Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed

More information

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION A Thesis by MELTEM APAYDIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the

More information