Statistical Methods for Image Reconstruction


1 Statistical Methods for Image Reconstruction Jeffrey A. Fessler EECS Department The University of Michigan NSS-MIC Oct. 19,

2 Image Reconstruction Methods (Simplified View) Analytical (FBP) Iterative (OSEM?) 0.1

3 Image Reconstruction Methods / Algorithms
ANALYTICAL: FBP, BPF, Gridding, ...
ITERATIVE:
  Algebraic (y = Ax): ART, MART, SMART, ...
  (Weighted) Least Squares: CG, CD, ISRA, ...
  Statistical Likelihood (e.g., Poisson): EM (etc.), OSEM, SAGE, CG, Int. Point, GCA, PSCD, FSCD

4 Outline
Part 0: Introduction / Overview / Examples
Part 1: From Physics to Statistics (Emission tomography)
  Assumptions underlying Poisson statistical model
  Emission reconstruction problem statement
Part 2: Four of Five Choices for Statistical Image Reconstruction
  Object parameterization
  System physical modeling
  Statistical modeling of measurements
  Cost functions and regularization
Part 3: Fifth Choice: Iterative algorithms
  Classical optimization methods
  Considerations: nonnegativity, convergence rate, ...
  Optimization transfer: EM etc.
  Ordered subsets / block iterative / incremental gradient methods
Part 4: Performance Analysis
  Spatial resolution properties
  Noise properties
  Detection performance
Part 5: Miscellaneous topics (?)

5 History
Successive substitution method vs direct Fourier (Bracewell, 1956)
Iterative method for X-ray CT (Hounsfield, 1968)
ART for tomography (Gordon, Bender, Herman, JTB, 1970)
Richardson/Lucy iteration for image restoration (1972, 1974)
Weighted least squares for 3D SPECT (Goitein, NIM, 1972)
Proposals to use Poisson likelihood for emission and transmission tomography
  Emission: (Rockmore and Macovski, TNS, 1976)
  Transmission: (Rockmore and Macovski, TNS, 1977)
Expectation-maximization (EM) algorithms for Poisson model
  Emission: (Shepp and Vardi, TMI, 1982)
  Transmission: (Lange and Carson, JCAT, 1984)
Regularized (aka Bayesian) Poisson emission reconstruction (Geman and McClure, ASA, 1985)
Ordered-subsets EM algorithm (Hudson and Larkin, TMI, 1994)
Commercial introduction of OSEM for PET scanners circa

6 Why Statistical Methods? Object constraints (e.g., nonnegativity, object support) Accurate physical models (less bias = improved quantitative accuracy) (e.g., nonuniform attenuation in SPECT) improved spatial resolution? Appropriate statistical models (less variance = lower image noise) (FBP treats all rays equally) Side information (e.g., MRI or CT boundaries) Nonstandard geometries (e.g., irregular sampling or missing data) Disadvantages? Computation time Model complexity Software complexity Analytical methods (a different short course!) Idealized mathematical model Usually geometry only, greatly over-simplified physics Continuum measurements (discretize/sample after solving) No statistical model Easier analysis of properties (due to linearity) e.g., Huesman (1984) FBP ROI variance for kinetic fitting 0.5

7 What about Moore s Law? 0.6

8 Benefit Example: Statistical Models
[Figure: True, FBP, PWLS, and PL reconstructions of soft tissue and cortical bone slices.]
NRMS Error:
  Method    Soft Tissue    Cortical Bone
  FBP       22.7%          29.6%
  PWLS      13.6%          16.2%
  PL        11.8%          15.8%
0.7

9 Benefit Example: Physical Models
[Figure, two columns: a. True object / a. Soft tissue corrected FBP; b. Uncorrected FBP / b. JS corrected FBP; c. Monoenergetic statistical reconstruction / c. Polyenergetic statistical reconstruction]

10 Benefit Example: Nonstandard Geometries 0.9 Photon Source Detector Bins

11 Truncated Fan-Beam SPECT Transmission Scan Truncated Truncated Untruncated FBP PWLS FBP 0.10

12 One Final Advertisement: Iterative MR Reconstruction 0.11

13 Part 1: From Physics to Statistics or What quantity is reconstructed? (in emission tomography) Outline Decay phenomena and fundamental assumptions Idealized detectors Random phenomena Poisson measurement statistics State emission tomography reconstruction problem 1.1

14 What Object is Reconstructed? In emission imaging, our aim is to image the radiotracer distribution. The what? At time t = 0 we inject the patient with some radiotracer, containing a large number N of metastable atoms of some radionuclide. Let X k (t) R 3 denote the position of the kth tracer atom at time t. These positions are influenced by blood flow, patient physiology, and other unpredictable phenomena such as Brownian motion. The ultimate imaging device would provide an exact list of the spatial locations X 1 (t),..., X N (t) of all tracer atoms for the entire scan. Would this be enough? 1.2

15 Atom Positions or Statistical Distribution?
Repeating a scan would yield different tracer atom sample paths {X_k(t)}_{k=1}^N ⇒ statistical formulation
Assumption 1. The spatial locations of individual tracer atoms at any time t ≥ 0 are independent random variables that are all identically distributed according to a common probability density function (pdf) p_t(x).
This pdf is determined by patient physiology and tracer properties.
Larger values of p_t(x) correspond to "hot spots" where the tracer atoms tend to be located at time t. Units: inverse volume, e.g., atoms per cubic centimeter.
The radiotracer distribution p_t(x) is the quantity of interest. (Not {X_k(t)}_{k=1}^N!) 1.3

16 Example: Perfect Detector
[Figure: true radiotracer distribution p_t(x) at some time t (left); a realization of N = 2000 i.i.d. atom positions recorded exactly (right, dots). Little similarity!] 1.4

17 Binning/Histogram Density Estimator
[Figure: estimate of p_t(x) formed by histogram binning of N = 2000 points. Ramp remains difficult to visualize.] 1.5

18 Kernel Density Estimator
[Figure: Gaussian kernel density estimator for p_t(x) from N = 2000 points; horizontal profiles at x_2 = 3 through the true density, the binned estimate, and the kernel estimate.] 1.6

19 Poisson Spatial Point Process
Assumption 2. The number of administered tracer atoms N has a Poisson distribution with some mean
  µ_N ≜ E[N] = Σ_{n=0}^∞ n P{N = n}.
Let N_t(B) denote the number of tracer atoms that have spatial locations in any set B ⊂ R^3 (VOI) at time t after injection. N_t(·) is called a Poisson spatial point process.
Fact. For any set B, N_t(B) is Poisson distributed with mean:
  E[N_t(B)] = E[N] P{X_k(t) ∈ B} = µ_N ∫_B p_t(x) dx.
Poisson N injected atoms + i.i.d. locations ⇒ Poisson point process 1.7

20 Illustration of Point Process (µ N = 200) 25 points within ROI 15 points within ROI x x x 1 20 points within ROI x 1 26 points within ROI x x x x 1 1.8

21 Radionuclide Decay
Preceding quantities are all unobservable. We observe a tracer atom only when it decays and emits photon(s).
The time that the kth tracer atom decays is a random variable T_k.
Assumption 3. The T_k's are statistically independent random variables, and are independent of the (random) spatial location.
Assumption 4. Each T_k has an exponential distribution with mean µ_T = t_{1/2} / ln 2.
Cumulative distribution function: P{T_k ≤ t} = 1 − exp(−t / µ_T) 1.9

22 Statistics of an Ideal Decay Counter
Let K_t(B) denote the number of tracer atoms that decay by time t, and that were located in the VOI B ⊂ R^3 at the time of decay.
Fact. K_t(B) is a Poisson counting process with mean
  E[K_t(B)] = ∫_0^t ∫_B λ(x, τ) dx dτ,
where the (nonuniform) emission rate density is given by
  λ(x, t) ≜ µ_N (e^{−t/µ_T} / µ_T) p_t(x).    Ingredients: dose, decay, distribution
Units: counts per unit time per unit volume, e.g., µCi/cc.
Photon emission is a Poisson process. What about the actual measurement statistics? 1.10

23 Idealized Detector Units
A nuclear imaging system consists of n_d conceptual detector units.
Assumption 5. Each decay of a tracer atom produces a recorded count in at most one detector unit.
Let S_k ∈ {0, 1, ..., n_d} denote the index of the incremented detector unit for decay of the kth tracer atom. (S_k = 0 if decay is undetected.)
Assumption 6. The S_k's satisfy the following conditional independence:
  P{S_1, ..., S_N | N, T_1, ..., T_N, X_1(·), ..., X_N(·)} = Π_{k=1}^N P{S_k | X_k(T_k)}.
The recorded bin for the kth tracer atom's decay depends only on its position when it decays, and is independent of all other tracer atoms. (No event pileup; no deadtime losses.) 1.11

24 PET Example i = 1 Sinogram Ray i Angular Positions Radial Positions i = n d n d (n crystals 1) n crystals /2 1.12

25 SPECT Example i = 1 Sinogram Radial Positions 1.13 Angular Positions nd = nradial bins nangular steps i = nd Collimator / Detector

26 Detector Unit Sensitivity Patterns
Spatial localization: s_i(x) ≜ probability that a decay at x is recorded by the ith detector unit.
Idealized Example. Shift-invariant PSF: s_i(x) = h(⟨k_i, x⟩ − r_i)
  r_i is the radial position of the ith ray
  k_i is the unit vector orthogonal to the ith parallel ray
  h(·) is the shift-invariant radial PSF (e.g., Gaussian bell or rectangular function)

27 Example: SPECT Detector-Unit Sensitivity Patterns s 1 ( x) s 2 ( x) x 2 x 1 Two representative s i ( x) functions for a collimated Anger camera. 1.15

28 Example: PET Detector-Unit Sensitivity Patterns

29 Detector Unit Sensitivity Patterns
s_i(x) can include the effects of: geometry / solid angle, collimation, scatter, attenuation, detector response / scan geometry, duty cycle (dwell time at each angle), detector efficiency, positron range, noncollinearity, ...
System sensitivity pattern:
  s(x) ≜ Σ_{i=1}^{n_d} s_i(x) = 1 − s_0(x) ≤ 1
(probability that a decay at location x will be detected at all by the system) 1.17

30 (Overall) System Sensitivity Pattern: s( x) = n d i=1 s i( x) x 2 x 1 Example: collimated 180 SPECT system with uniform attenuation. 1.18

31 Detection Probabilities s i ( x 0 ) (vs det. unit index i) s i ( x 0 ) θ x 2 x 0 x 1 Image domain r Sinogram domain 1.19

32 Summary of Random Phenomena
Number of tracer atoms injected: N
Spatial locations of tracer atoms: {X_k(t)}_{k=1}^N
Time of decay of tracer atoms: {T_k}_{k=1}^N
Detection of photon: [S_k ≠ 0]
Recording detector unit: {S_k}_{k=1}^N 1.20

33 Emission Scan
Record events in each detector unit for t_1 ≤ t ≤ t_2.
Y_i ≜ number of events recorded by the ith detector unit during the scan, for i = 1, ..., n_d:
  Y_i ≜ Σ_{k=1}^N 1{S_k = i, T_k ∈ [t_1, t_2]}.
The collection {Y_i : i = 1, ..., n_d} is our sinogram. Note 0 ≤ Y_i ≤ N.
Fact. Under Assumptions 1-6 above,
  Y_i ~ Poisson{ ∫ s_i(x) λ(x) dx }    (cf. "line integral")
and the Y_i's are statistically independent random variables, where the emission density is given by
  λ(x) ≜ µ_N ∫_{t_1}^{t_2} (1/µ_T) e^{−t/µ_T} p_t(x) dt.
(Local number of decays per unit volume during the scan.)
Ingredients: dose (injected), duration of scan, decay of radionuclide, distribution of radiotracer 1.21

34 Poisson Statistical Model (Emission)
Actual measured counts = foreground counts + background counts.
Sources of background counts: cosmic radiation / room background, random coincidences (PET), scatter not accounted for in s_i(x), crosstalk from transmission sources in simultaneous T/E scans, anything else not accounted for by ∫ s_i(x) λ(x) dx
Assumption 7. The background counts also have independent Poisson distributions.
Statistical model (continuous to discrete):
  Y_i ~ Poisson{ ∫ s_i(x) λ(x) dx + r_i },  i = 1, ..., n_d
r_i : mean number of background counts recorded by the ith detector unit. 1.22

35 Emission Reconstruction Problem
Estimate the emission density λ(x) using (something like) this model:
  Y_i ~ Poisson{ ∫ s_i(x) λ(x) dx + r_i },  i = 1, ..., n_d.
Knowns:
  {Y_i = y_i}_{i=1}^{n_d} : observed counts from each detector unit
  s_i(x) : sensitivity patterns (determined by system models)
  r_i's : background contributions (determined separately)
Unknown: λ(x) 1.23

36 List-mode acquisitions
Recall that the conventional sinogram is temporally binned:
  Y_i ≜ Σ_{k=1}^N 1{S_k = i, T_k ∈ [t_1, t_2]}.
This binning discards temporal information.
List-mode measurements: record all (detector, time) pairs in a list, i.e., {(S_k, T_k) : k = 1, ..., N}.
List-mode dynamic reconstruction problem: estimate λ(x, t) given {(S_k, T_k)}. 1.24

37 Emission Reconstruction Problem - Illustration λ( x) {Y i } x 2 θ x 1 r 1.25

38 Example: MRI Sensitivity Pattern
Each k-space sample corresponds to a sinusoidal pattern weighted by: RF receive coil sensitivity pattern, phase effects of field inhomogeneity, spin relaxation effects.
  y_i = ∫ f(x) s_i(x) dx + ε_i,  s_i(x) = c_RF(x) exp(−ı ω(x) t_i) exp(−t_i / T2(x)) exp(−ı 2π k(t_i) · x) 1.26

39 Continuous-Discrete Models
Emission tomography: y_i ~ Poisson{ ∫ λ(x) s_i(x) dx + r_i }
Transmission tomography (monoenergetic): y_i ~ Poisson{ b_i exp(−∫_{L_i} µ(x) dl) + r_i }
Transmission (polyenergetic): y_i ~ Poisson{ ∫ I_i(E) exp(−∫_{L_i} µ(x, E) dl) dE + r_i }
Magnetic resonance imaging: y_i = ∫ f(x) s_i(x) dx + ε_i
Discrete measurements y = (y_1, ..., y_{n_d}). Continuous-space unknowns: λ(x), µ(x), f(x). Goal: estimate f(x) given y.
Solution options:
  Continuous-continuous formulations ("analytical")
  Continuous-discrete formulations: usually f̂(x) = Σ_{i=1}^{n_d} c_i s_i(x)
  Discrete-discrete formulations: f(x) ≈ Σ_{j=1}^{n_p} x_j b_j(x) 1.27

40 Part 2: Five Categories of Choices Object parameterization: function f( r) vs finite coefficient vector x System physical model: {s i ( r)} Measurement statistical model y i? Cost function: data-mismatch and regularization Algorithm / initialization No perfect choices - one can critique all approaches! 2.1

41 Choice 1. Object Parameterization
Finite measurements: {y_i}_{i=1}^{n_d}. Continuous object: f(r). Hopeless?
"All models are wrong but some models are useful."
Linear series expansion approach. Replace f(r) by x = (x_1, ..., x_{n_p}) where
  f(r) ≈ f̃(r) = Σ_{j=1}^{n_p} x_j b_j(r)    ("basis functions")
Forward projection:
  ∫ s_i(r) f̃(r) dr = ∫ s_i(r) [Σ_{j=1}^{n_p} x_j b_j(r)] dr = Σ_{j=1}^{n_p} [∫ s_i(r) b_j(r) dr] x_j
                    = Σ_{j=1}^{n_p} a_ij x_j = [Ax]_i,  where a_ij ≜ ∫ s_i(r) b_j(r) dr
Projection integrals become finite summations.
a_ij is the contribution of the jth basis function (e.g., voxel) to the ith detector unit.
The units of a_ij and x_j depend on the user-selected units of b_j(r).
The n_d × n_p matrix A = {a_ij} is called the system matrix. 2.2

42 (Linear) Basis Function Choices
Fourier series (complex / not sparse); circular harmonics (complex / not sparse); wavelets (negative values / not sparse); Kaiser-Bessel window functions (blobs); overlapping circles (disks) or spheres (balls); polar grids, logarithmic polar grids; "natural pixels" {s_i(r)}; B-splines (pyramids); rectangular pixels / voxels (rect functions); point masses / bed-of-nails / lattice of points / comb function; organ-based voxels (e.g., from CT), ...
Considerations
  Represent f(r) well with moderate n_p
  Orthogonality? (not essential)
  Easy to compute a_ij's and/or Ax
  Rotational symmetry
  If stored, the system matrix A should be sparse (mostly zeros).
  Easy to represent nonnegative functions, e.g., if x_j ≥ 0, then f(r) ≥ 0. A sufficient condition is b_j(r) ≥ 0. 2.3

43 Nonlinear Object Parameterizations Estimation of intensity and shape (e.g., location, radius, etc.) Surface-based (homogeneous) models Circles / spheres Ellipses / ellipsoids Superquadrics Polygons Bi-quadratic triangular Bezier patches,... Other models Generalized series f( r) = j x j b j ( r, θ) Deformable templates f( r) = b(t θ ( r))... Considerations Can be considerably more parsimonious If correct, yield greatly reduced estimation error Particularly compelling in limited-data problems Often oversimplified (all models are wrong but...) Nonlinear dependence on location induces non-convex cost functions, complicating optimization 2.4

44 Example Basis Functions - 1D 4 Continuous object f( r) Piecewise Constant Approximation Quadratic B Spline Approximation x 2.5

45 Pixel Basis Functions - 2D µ 0 (x,y) x x Continuous image f( r) Pixel basis approximation n p j=1 x j b j ( r) 2.6

46 Blobs in SPECT: Qualitative Post filt. OSEM (3 pix. FWHM) blob based α= Post filt. OSEM (3 pix. FWHM) rotation based Post filt. OSEM (3 pix. FWHM) blob based α= ˆxR ˆx ˆx B0 B1 x mm (2D SPECT thorax phantom simulations) 2.7

47 Blobs in SPECT: Quantitative Standard deviation vs. bias in reconstructed phantom images Per iteration, rotation based Per iteration, blob based α=10.4 Per iteration, blob based α=0 Per FWHM, rotation based Per FWHM, blob based α=10.4 Per FWHM, blob based α=0 FBP 2 Standard deviation (%) Bias (%) 2.8

48 Discrete-Discrete Emission Reconstruction Problem
Having chosen a basis and linearly parameterized the emission density ...
Estimate the emission density coefficient vector x = (x_1, ..., x_{n_p}) (aka "image") using (something like) this statistical model:
  y_i ~ Poisson{ Σ_{j=1}^{n_p} a_ij x_j + r_i },  i = 1, ..., n_d.
  {y_i}_{i=1}^{n_d} : observed counts from each detector unit
  A = {a_ij} : system matrix (determined by system models)
  r_i's : background contributions (determined separately)
Many image reconstruction problems are "find x given y" where
  y_i = g_i([Ax]_i) + ε_i,  i = 1, ..., n_d. 2.9
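To make the discrete-discrete model concrete, here is a minimal NumPy sketch (not from the course) that builds a small random sparse stand-in for the system matrix A, a nonnegative coefficient vector x, and draws a Poisson sinogram y_i ~ Poisson{[Ax]_i + r_i}. The sizes, the way A is generated, and the background level are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

n_p, n_d = 64, 100           # toy sizes (assumption, not from the course)
# Toy nonnegative sparse system matrix A (stand-in for real geometric a_ij)
A = rng.random((n_d, n_p)) * (rng.random((n_d, n_p)) < 0.1)
x_true = rng.gamma(shape=2.0, scale=1.0, size=n_p)   # nonnegative "image"
r = np.full(n_d, 0.5)        # mean background counts per detector unit

ybar = A @ x_true + r        # noiseless mean sinogram [Ax]_i + r_i
y = rng.poisson(ybar)        # measured counts y_i ~ Poisson{ybar_i}

print("total mean counts:", ybar.sum(), " total measured:", y.sum())
```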

49 Choice 2. System Model
System matrix elements: a_ij = ∫ s_i(r) b_j(r) dr, which can include scan geometry, collimator/detector response, attenuation, scatter (object, collimator, scintillator), duty cycle (dwell time at each angle), detector efficiency / dead-time losses, positron range, noncollinearity, crystal penetration, ...
Considerations
  Improving the system model can improve quantitative accuracy, spatial resolution, contrast, SNR, detectability
  Computation time (and storage vs compute-on-the-fly)
  Model uncertainties (e.g., calculated scatter probabilities based on noisy attenuation map)
  Artifacts due to over-simplifications 2.10

50 Measured System Model? Determine a i j s by scanning a voxel-sized cube source over the imaging volume and recording counts in all detector units (separately for each voxel). Avoids mathematical model approximations Scatter / attenuation added later (object dependent), approximately Small probabilities = long scan times Storage Repeat for every voxel size of interest Repeat if detectors change 2.11

51 Line Length System Model ith ray x 1 x 2 a i j length of intersection 2.12

52 Strip Area System Model ith ray x 1 x j 1 a i j area 2.13

53 (Implicit) System Sensitivity Patterns
  Σ_{i=1}^{n_d} a_ij ≈ s(r_j) = Σ_{i=1}^{n_d} s_i(r_j)
[Figure: implied sensitivity patterns for the line-length and strip-area models.] 2.14

54 Point-Lattice Projector/Backprojector x 1 x 2 ith ray a i j s determined by linear interpolation 2.15

55 Point-Lattice Artifacts Projections (sinograms) of uniform disk object: 0 45 θ r r Point Lattice Strip Area 2.16

56 Forward- / Back-projector Pairs
Forward projection (image domain to projection domain):
  ȳ_i = ∫ s_i(r) f̃(r) dr = Σ_{j=1}^{n_p} a_ij x_j = [Ax]_i,  or  ȳ = Ax
Backprojection (projection domain to image domain):
  A′y = { Σ_{i=1}^{n_d} a_ij y_i }_{j=1}^{n_p}
The term "forward/backprojection pair" often corresponds to an implicit choice for the object basis and the system model.
Often A′y is implemented as By for some backprojector B ≠ A′.
Least-squares solutions (for example):
  x̂ = [A′A]^{−1} A′y ≠ [BA]^{−1} By 2.17

57 Mismatched Backprojector B A x ˆx (PWLS-CG) ˆx (PWLS-CG) Matched Mismatched 2.18

58 Horizontal Profiles ˆf(x1,32) Matched Mismatched x

59 System Model Tricks
Factorize (e.g., PET Gaussian detector response): A ≈ SG (geometric projection followed by Gaussian smoothing)
Symmetry
Rotate and sum
Gaussian diffusion for SPECT Gaussian detector response
Correlated Monte Carlo (Beekman et al.)
In all cases, consistency of the backprojector with A requires care. 2.20

60 SPECT System Model Collimator / Detector Complications: nonuniform attenuation, depth-dependent PSF, Compton scatter 2.21

61 Choice 3. Statistical Models
After modeling the system physics, we have a deterministic "model": y_i ≈ g_i([Ax]_i) for some functions g_i, e.g., g_i(l) = l + r_i for emission tomography.
Statistical modeling is concerned with the "≈" aspect.
Considerations
  More accurate models: can lead to lower variance images, may incur additional computation, may involve additional algorithm complexity (e.g., proper transmission Poisson model has nonconcave log-likelihood)
  Statistical model errors (e.g., deadtime)
  Incorrect models (e.g., log-processed transmission data) 2.22

62 Statistical Model Choices for Emission Tomography
"None." Assume y − r = Ax. Solve algebraically to find x.
White Gaussian noise. Ordinary least squares: minimize ‖y − Ax‖²
Non-white Gaussian noise. Weighted least squares: minimize
  ‖y − Ax‖²_W = Σ_{i=1}^{n_d} w_i (y_i − [Ax]_i)²,  where [Ax]_i ≜ Σ_{j=1}^{n_p} a_ij x_j
  (e.g., for Fourier rebinned (FORE) PET data)
Ordinary Poisson model (ignoring or precorrecting for background): y_i ~ Poisson{[Ax]_i}
Poisson model: y_i ~ Poisson{[Ax]_i + r_i}
Shifted Poisson model (for randoms-precorrected PET): y_i = y_i^prompt − y_i^delay, modeled via y_i + 2r_i ~ Poisson{[Ax]_i + 2r_i} 2.23

63 Shifted Poisson Model for PET
Precorrected random coincidences: y_i = y_i^prompt − y_i^delay
  y_i^prompt ~ Poisson{[Ax]_i + r_i},  y_i^delay ~ Poisson{r_i}
  E[y_i] = [Ax]_i,  Var{y_i} = [Ax]_i + 2r_i
Mean ≠ Variance ⇒ not Poisson!
Statistical model choices
  Ordinary Poisson model (ignore randoms): [y_i]_+ ~ Poisson{[Ax]_i}. Causes bias due to truncated negatives.
  Data-weighted least-squares (Gaussian model): y_i ~ N([Ax]_i, σ̂_i²), σ̂_i² = max(y_i + 2r̂_i, σ²_min). Causes bias due to data-weighting.
  Shifted Poisson model (matches 2 moments): [y_i + 2r̂_i]_+ ~ Poisson{[Ax]_i + 2r̂_i}. Insensitive to inaccuracies in r̂_i.
One can further reduce bias by retaining negative values of y_i + 2r̂_i. 2.24
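A small Monte Carlo sketch of the moment matching behind the shifted Poisson model; the per-ray means ax_i and r_i and the number of trials are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy per-ray means (assumptions for illustration)
ax_i, r_i, n_trials = 8.0, 3.0, 200_000

y_prompt = rng.poisson(ax_i + r_i, n_trials)
y_delay  = rng.poisson(r_i, n_trials)
y = y_prompt - y_delay                   # randoms-precorrected data

print("mean(y) =", y.mean(), " (model predicts [Ax]_i =", ax_i, ")")
print("var(y)  =", y.var(),  " (model predicts [Ax]_i + 2 r_i =", ax_i + 2*r_i, ")")

# Shifted-Poisson surrogate: y + 2r has matching first two moments
z = y + 2 * r_i
print("mean(z), var(z) =", z.mean(), z.var(), " vs Poisson mean", ax_i + 2*r_i)
```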

64 Shifted-Poisson Model for X-ray CT
A model that includes both photon variability and electronic readout noise:
  y_i ~ Poisson{ȳ_i(µ)} + N(0, σ²)
Shifted Poisson approximation: [y_i + σ²]_+ ~ Poisson{ȳ_i(µ) + σ²}, or just use WLS ...
Complications: intractability of the likelihood for Poisson + Gaussian; compound Poisson distribution due to photon-energy-dependent detector signal.
X-ray statistical models are a current research area in several groups! 2.25

65 Choice 4. Cost Functions
Components: data-mismatch term; regularization term (and regularization parameter β); constraints (e.g., nonnegativity).
  Ψ(x) = DataMismatch(y, Ax) + β Roughness(x)
  x̂ ≜ argmin_{x ≥ 0} Ψ(x)
Actually several sub-choices to make for Choice 4 ...
Distinguishes "statistical methods" from "algebraic methods" for "y = Ax". 2.26

66 Why Cost Functions? (vs procedure e.g., adaptive neural net with wavelet denoising) Theoretical reasons ML is based on minimizing a cost function: the negative log-likelihood ML is asymptotically consistent ML is asymptotically unbiased ML is asymptotically efficient (under true statistical model...) Estimation: Penalized-likelihood achieves uniform CR bound asymptotically Detection: Qi and Huesman showed analytically that MAP reconstruction outperforms FBP for SKE/BKE lesion detection (T-MI, Aug. 2001) Practical reasons Stability of estimates (if Ψ and algorithm chosen properly) Predictability of properties (despite nonlinearities) Empirical evidence (?) 2.27

67 Bayesian Framework
Given a prior distribution p(x) for image vectors x, by Bayes' rule:
  posterior: p(x|y) = p(y|x) p(x) / p(y)
so
  log p(x|y) = log p(y|x) + log p(x) − log p(y)
log p(y|x) corresponds to the data-mismatch term (likelihood); log p(x) corresponds to the regularizing penalty function.
Maximum a posteriori (MAP) estimator: x̂ = argmax_x log p(x|y)
Has certain optimality properties (provided p(y|x) and p(x) are correct). Same form as Ψ. 2.28

68 Choice 4.1: Data-Mismatch Term
Options (for emission tomography):
  Negative log-likelihood of statistical model. Poisson emission case:
    −L(x; y) = −log p(y|x) = Σ_{i=1}^{n_d} ([Ax]_i + r_i) − y_i log([Ax]_i + r_i) + log y_i!
  Ordinary (unweighted) least squares: Σ_{i=1}^{n_d} (1/2)(y_i − r̂_i − [Ax]_i)²
  Data-weighted least squares: Σ_{i=1}^{n_d} (1/2)(y_i − r̂_i − [Ax]_i)² / σ̂_i²,  σ̂_i² = max(y_i + r̂_i, σ²_min)  (causes bias due to data-weighting).
  Reweighted least-squares: σ̂_i² = [Ax̂]_i + r̂_i
  Model-weighted least-squares (nonquadratic, but convex!): Σ_{i=1}^{n_d} (1/2)(y_i − r̂_i − [Ax]_i)² / ([Ax]_i + r̂_i)
  Nonquadratic cost-functions that are robust to outliers
  ...
Considerations
  Faithfulness to statistical model vs computation
  Ease of optimization (convex?, quadratic?)
  Effect of statistical modeling errors 2.29
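A minimal sketch of evaluating the Poisson negative log-likelihood data-mismatch term for the discrete model, dropping the constant log y_i! term; the function name and the tiny made-up data are assumptions for illustration.

```python
import numpy as np

def poisson_nll(x, A, y, r):
    """Negative Poisson log-likelihood sum_i (ybar_i - y_i log ybar_i),
    with ybar = A x + r; the constant log(y_i!) term is dropped."""
    ybar = A @ x + r
    return float(np.sum(ybar - y * np.log(ybar)))

# Tiny usage example with made-up numbers (illustrative only)
A = np.array([[1.0, 0.5], [0.2, 1.0], [0.7, 0.7]])
x = np.array([2.0, 3.0])
r = np.array([0.1, 0.1, 0.1])
y = np.array([4, 3, 5])
print(poisson_nll(x, A, y, r))
```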

69 Choice 4.2: Regularization Forcing too much data fit gives noisy images Ill-conditioned problems: small data noise causes large image noise Solutions: Noise-reduction methods True regularization methods Noise-reduction methods Modify the data Prefilter or denoise the sinogram measurements Extrapolate missing (e.g., truncated) data Modify an algorithm derived for an ill-conditioned problem Stop algorithm before convergence Run to convergence, post-filter Toss in a filtering step every iteration or couple iterations Modify update to dampen high-spatial frequencies 2.30

70 Noise-Reduction vs True Regularization Advantages of noise-reduction methods Simplicity (?) Familiarity Appear less subjective than using penalty functions or priors Only fiddle factors are # of iterations, or amount of smoothing Resolution/noise tradeoff usually varies with iteration (stop when image looks good - in principle) Changing post-smoothing does not require re-iterating Advantages of true regularization methods Stability (unique minimizer & convergence = initialization independence) Faster convergence Predictability Resolution can be made object independent Controlled resolution (e.g., spatially uniform, edge preserving) Start with decent image (e.g., FBP) = reach solution faster. 2.31

71 True Regularization Methods Redefine the problem to eliminate ill-conditioning, rather than patching the data or algorithm! Options Use bigger pixels (fewer basis functions) Visually unappealing Can only preserve edges coincident with pixel edges Results become even less invariant to translations Method of sieves (constrain image roughness) Condition number for pre-emission space can be even worse Lots of iterations Commutability condition rarely holds exactly in practice Degenerates to post-filtering in some cases Change cost function by adding a roughness penalty / prior Disadvantage: apparently subjective choice of penalty Apparent difficulty in choosing penalty parameters (cf. apodizing filter / cutoff frequency in FBP) 2.32

72 Penalty Function Considerations Computation Algorithm complexity Uniqueness of minimizer of Ψ(x) Resolution properties (edge preserving?) # of adjustable parameters Predictability of properties (resolution and noise) Choices separable vs nonseparable quadratic vs nonquadratic convex vs nonconvex 2.33

73 Penalty Functions: Separable vs Nonseparable
Separable penalty functions:
  Identity norm: R(x) = (1/2) x′Ix = Σ_{j=1}^{n_p} x_j²/2  (penalizes large values of x, but causes "squashing bias")
  Entropy: R(x) = Σ_{j=1}^{n_p} x_j log x_j
  Gaussian prior with mean µ_j, variance σ_j²: R(x) = Σ_{j=1}^{n_p} (x_j − µ_j)² / (2σ_j²)
  Gamma prior: R(x) = Σ_{j=1}^{n_p} p(x_j, µ_j, σ_j), where p(x, µ, σ) is a Gamma pdf
The first two basically keep pixel values from "blowing up." The last two encourage pixel values to be close to prior means µ_j.
General separable form: R(x) = Σ_{j=1}^{n_p} f_j(x_j)
Slightly simpler for minimization, but these do not explicitly enforce smoothness. The simplicity advantage has been overcome in newer algorithms. 2.34

74 Penalty Functions: Separable vs Nonseparable
Nonseparable (partially couple pixel values) to penalize roughness. Example (5 pixels x_1, ..., x_5):
  R(x) = (x_2 − x_1)² + (x_3 − x_2)² + (x_5 − x_4)² + (x_4 − x_1)² + (x_5 − x_2)²
[Figure: three example images with R(x) = 1, R(x) = 6, R(x) = 10.]
Rougher images ⇒ greater R(x) 2.35

75 Roughness Penalty Functions
First-order neighborhood and pairwise pixel differences:
  R(x) = Σ_{j=1}^{n_p} (1/2) Σ_{k ∈ N_j} ψ(x_j − x_k)
  N_j ≜ neighborhood of the jth pixel (e.g., left, right, up, down)
  ψ is called the potential function
Finite-difference approximation to a continuous roughness measure:
  R(f(·)) = ∫ ‖∇f(r)‖² dr = ∫ |∂f(r)/∂x|² + |∂f(r)/∂y|² + |∂f(r)/∂z|² dr.
Second derivatives also useful: (More choices!)
  ∂²f(r)/∂x² |_{r = r_j} ≈ f(r_{j+1}) − 2 f(r_j) + f(r_{j−1})
  R(x) = Σ_{j=1}^{n_p} ψ(x_{j+1} − 2x_j + x_{j−1}) + ... 2.36

76 Penalty Functions: General Form
  R(x) = Σ_k ψ_k([Cx]_k),  where [Cx]_k = Σ_{j=1}^{n_p} c_kj x_j
Example (5 pixels x_1, ..., x_5):
  Cx = [ -1  1  0  0  0
          0 -1  1  0  0
          0  0  0 -1  1
         -1  0  0  1  0
          0 -1  0  0  1 ] [x_1; x_2; x_3; x_4; x_5] = [x_2 − x_1; x_3 − x_2; x_5 − x_4; x_4 − x_1; x_5 − x_2]
  R(x) = Σ_{k=1}^5 ψ_k([Cx]_k) = ψ_1(x_2 − x_1) + ψ_2(x_3 − x_2) + ψ_3(x_5 − x_4) + ψ_4(x_4 − x_1) + ψ_5(x_5 − x_2) 2.37

77 Penalty Functions: Quadratic vs Nonquadratic
  R(x) = Σ_k ψ_k([Cx]_k)
Quadratic ψ_k
  If ψ_k(t) = t²/2, then R(x) = (1/2) x′C′Cx, a quadratic form.
  Simpler optimization; global smoothing
Nonquadratic ψ_k
  Edge preserving
  More complicated optimization. (This is essentially solved in the convex case.)
  Unusual noise properties
  Analysis/prediction of resolution and noise properties is difficult
  More adjustable parameters (e.g., δ)
Example: Huber function. ψ(t) ≜ t²/2 for |t| ≤ δ;  δ|t| − δ²/2 for |t| > δ
Example: Hyperbola function. ψ(t) ≜ δ² [√(1 + (t/δ)²) − 1] 2.38
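A minimal sketch of the general-form penalty R(x) = Σ_k ψ([Cx]_k) with quadratic and Huber potentials. The 1D first-difference matrix C, the value δ = 0.1, and the two test signals are illustrative assumptions, not the course's choices.

```python
import numpy as np

def huber(t, delta):
    """Huber potential: quadratic near 0, linear for |t| > delta."""
    a = np.abs(t)
    return np.where(a <= delta, t**2 / 2, delta * a - delta**2 / 2)

def roughness(x, C, psi=lambda t: t**2 / 2):
    """General-form penalty R(x) = sum_k psi([Cx]_k)."""
    return float(np.sum(psi(C @ x)))

# Illustrative 1D first-difference matrix C for a 6-pixel signal (an assumption)
n = 6
C = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)   # rows compute x_{j+1} - x_j
x_smooth = np.linspace(0, 1, n)
x_edge = np.array([0, 0, 0, 1, 1, 1.0])

print("quadratic R:", roughness(x_smooth, C), roughness(x_edge, C))
print("Huber R    :", roughness(x_smooth, C, lambda t: huber(t, 0.1)),
      roughness(x_edge, C, lambda t: huber(t, 0.1)))
```

The printout illustrates the edge-preservation point above: the quadratic potential charges the step edge heavily, while the Huber potential charges it much less.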

78 Quadratic vs Non quadratic Potential Functions Parabola (quadratic) Huber, δ=1 Hyperbola, δ=1 ψ(t) t Lower cost for large differences = edge preservation 2.39

79 Edge-Preserving Reconstruction Example Phantom Quadratic Penalty Huber Penalty 2.40

80 More Edge Preserving Regularization Chlewicki et al., PMB, Oct. 2004: Noise reduction and convergence of Bayesian algorithms with blobs based on the Huber function and median root prior 2.41

81 Piecewise Constant "Cartoon" Objects
[Figure: reconstructions from 400 k-space samples: x_true, x_"conj phase", x_pcg quad (quadratic penalty), and x_pcg edge (edge-preserving penalty).] 2.42

82 Total Variation Regularization
Non-quadratic roughness penalty:
  ∫ ‖∇f(r)‖ dr ≈ Σ_k |[Cx]_k|
Uses the magnitude instead of the squared magnitude of the gradient.
Problem: |·| is not differentiable.
Practical solution: approximate |t| by a hyperbola such as ψ(t) = δ² [√(1 + (t/δ)²) − 1].
[Figure: potential functions ψ(t) vs t: total variation, hyperbola with δ = 0.2, hyperbola with δ = 1.] 2.43

83 Penalty Functions: Convex vs Nonconvex
Convex
  Easier to optimize
  Guaranteed unique minimizer of Ψ (for convex negative log-likelihood)
Nonconvex
  Greater degree of edge preservation
  Nice images for piecewise-constant phantoms!
  Even more unusual noise properties
  Multiple extrema; more complicated optimization (simulated / deterministic annealing)
  Estimator x̂ becomes a discontinuous function of the data Y
Nonconvex examples
  "broken parabola": ψ(t) = min(t², t²_max)
  "true" median root prior: R(x) = Σ_{j=1}^{n_p} (x_j − median_j(x))² / median_j(x), where median_j(x) is a local median
Exception: orthonormal wavelet threshold denoising via nonconvex potentials! 2.44

84 Potential Function ψ(t) Potential Functions Parabola (quadratic) Huber (convex) Broken parabola (non convex) δ= t = x x j k 2.45

85 Local Extrema and Discontinuous Estimators Ψ(x) x Small change in data = large change in minimizer ˆx. Using convex penalty functions obviates this problem. ˆx 2.46

86 Augmented Regularization Functions
Replace roughness penalty R(x) with R(x|b) + αR(b), where the elements of b (often binary) indicate boundary locations.
  Line-site methods; level-set methods
Joint estimation problem:
  (x̂, b̂) = argmin_{x, b} Ψ(x, b),  Ψ(x, b) = L(x; y) + βR(x|b) + αR(b).
Example: b_jk indicates the presence of an edge between pixels j and k:
  R(x|b) = Σ_{j=1}^{n_p} Σ_{k ∈ N_j} (1 − b_jk) (1/2)(x_j − x_k)²
Penalty to discourage too many edges (e.g.): R(b) = Σ_{jk} b_jk.
  Can encourage local edge continuity
  May require annealing methods for minimization 2.47

87 Modified Penalty Functions
  R(x) = Σ_{j=1}^{n_p} (1/2) Σ_{k ∈ N_j} w_jk ψ(x_j − x_k)
Adjust the weights {w_jk} to
  Control resolution properties
  Incorporate anatomical side information (MR/CT) (avoid smoothing across anatomical boundaries)
Recommendations
  Emission tomography:
    Begin with quadratic (nonseparable) penalty functions
    Consider modified penalty for resolution control and choice of β
    Use modest regularization and post-filter more if desired
  Transmission tomography (attenuation maps), X-ray CT:
    consider convex nonquadratic (e.g., Huber) penalty functions
    choose δ based on attenuation map units (water, bone, etc.)
    choice of regularization parameter β remains nontrivial; learn appropriate values by experience for a given study type 2.48

88 Choice 4.3: Constraints Nonnegativity Known support Count preserving Upper bounds on values e.g., maximum µ of attenuation map in transmission case Considerations Algorithm complexity Computation Convergence rate Bias (in low-count regions)

89 Open Problems Modeling Noise in a i j s (system model errors) Noise in ˆr i s (estimates of scatter / randoms) Statistics of corrected measurements Statistics of measurements with deadtime losses For PL or MAP reconstruction, Qi (MIC 2004) has derived a bound on system model errors relative to data noise. Cost functions Performance prediction for nonquadratic penalties Effect of nonquadratic penalties on detection tasks Choice of regularization parameters for nonquadratic regularization 2.50

90 Summary 1. Object parameterization: function f( r) vs vector x 2. System physical model: s i ( r) 3. Measurement statistical model Y i? 4. Cost function: data-mismatch / regularization / constraints Reconstruction Method Cost Function + Algorithm Naming convention criterion - algorithm : ML-EM, MAP-OSL, PL-SAGE, PWLS+SOR, PWLS-CG,

91 Part 3. Algorithms Method = Cost Function + Algorithm Outline Ideal algorithm Classical general-purpose algorithms Considerations: nonnegativity parallelization convergence rate monotonicity Algorithms tailored to cost functions for imaging Optimization transfer EM-type methods Poisson emission problem Poisson transmission problem Ordered-subsets / block-iterative algorithms Recent convergent versions 3.1

92 Why iterative algorithms?
For nonquadratic Ψ, no closed-form solution for the minimizer.
For quadratic Ψ with nonnegativity constraints, no closed-form solution.
For quadratic Ψ without constraints, closed-form solutions:
  PWLS: x̂ = argmin_x ‖y − Ax‖²_{W^{1/2}} + x′Rx = [A′WA + R]^{−1} A′Wy
  OLS:  x̂ = argmin_x ‖y − Ax‖² = [A′A]^{−1} A′y
Impractical (memory and computation) for realistic problem sizes. A is sparse, but A′A is not.
All algorithms are imperfect. No single best solution. 3.2

93 General Iteration Projection Measurements Calibration... x (n) Iteration System Model Ψ x (n+1) Parameters Deterministic iterative mapping: x (n+1) = M (x (n) ) 3.3

94 Ideal Algorithm x argmin Ψ(x) x 0 (global minimizer) Properties stable and convergent {x (n) } converges to x if run indefinitely converges quickly {x (n) } gets close to x in just a few iterations globally convergent lim n x (n) independent of starting image x (0) fast requires minimal computation per iteration robust insensitive to finite numerical precision user friendly nothing to adjust (e.g., acceleration factors) parallelizable (when necessary) simple easy to program and debug flexible accommodates any type of system model (matrix stored by row or column, or factored, or projector/backprojector) Choices: forgo one or more of the above 3.4

95 Classic Algorithms
Non-gradient based
  Exhaustive search; Nelder-Mead simplex (amoeba)
  Converge very slowly, but work with nondifferentiable cost functions.
Gradient based
  Gradient descent: x^(n+1) ≜ x^(n) − α ∇Ψ(x^(n))
  Choosing α to ensure convergence is nontrivial.
  Steepest descent: x^(n+1) ≜ x^(n) − α_n ∇Ψ(x^(n)),  where α_n ≜ argmin_α Ψ(x^(n) − α ∇Ψ(x^(n)))
  Computing α_n can be expensive.
Limitations
  Converge slowly.
  Do not easily accommodate nonnegativity constraint. 3.5
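A minimal sketch of steepest descent for a quadratic weighted least-squares cost, where the line-search step α_n has a closed form; the function name, sizes, and toy data are assumptions, and no nonnegativity constraint or regularization is included.

```python
import numpy as np

def steepest_descent_wls(A, W, y, n_iter=50):
    """Steepest descent for the quadratic cost Psi(x) = 1/2 ||y - Ax||^2_W,
    using the exact line search available in the quadratic case.
    (Unconstrained sketch; real reconstructions also need regularization.)"""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (W @ (A @ x - y))            # gradient of Psi
        denom = grad @ (A.T @ (W @ (A @ grad)))   # curvature along -grad
        if denom <= 0:
            break
        alpha = (grad @ grad) / denom             # exact step for quadratic Psi
        x = x - alpha * grad
    return x

# Tiny usage example with made-up data (illustrative only)
rng = np.random.default_rng(2)
A = rng.random((30, 10))
x_true = rng.random(10)
y = A @ x_true + 0.01 * rng.standard_normal(30)
W = np.eye(30)
print(np.round(steepest_descent_wls(A, W, y, 200) - x_true, 3))
```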

96 Gradients & Nonnegativity - A Mixed Blessing
Unconstrained optimization of differentiable cost functions:
  ∇Ψ(x) = 0 when x = x*
  A necessary condition always. A sufficient condition for strictly convex cost functions.
  Iterations search for the zero of the gradient.
Nonnegativity-constrained minimization: Karush-Kuhn-Tucker conditions
  ∂Ψ(x)/∂x_j |_{x = x*} is = 0 if x_j* > 0, and ≥ 0 if x_j* = 0
  A necessary condition always. A sufficient condition for strictly convex cost functions.
  Iterations search for ???
  0 = x_j* (∂/∂x_j) Ψ(x*) is a necessary condition, but never a sufficient condition. 3.6

97 Karush-Kuhn-Tucker Illustrated Ψ(x) Inactive constraint Active constraint x 3.7

98 Why Not Clip Negatives? 3 2 WLS with Clipped Newton Raphson Nonnegative Orthant 1 x Newton-Raphson with negatives set to zero each iteration. Fixed-point of iteration is not the constrained minimizer! x 1 3.8

99 Newton-Raphson Algorithm
  x^(n+1) = x^(n) − [∇²Ψ(x^(n))]^{−1} ∇Ψ(x^(n))
Advantage: super-linear convergence rate (if convergent)
Disadvantages:
  Requires twice-differentiable Ψ
  Not guaranteed to converge
  Not guaranteed to monotonically decrease Ψ
  Does not enforce nonnegativity constraint
  Computing the Hessian ∇²Ψ often expensive
  Impractical for image recovery due to matrix inverse
General purpose remedy: bound-constrained quasi-Newton algorithms 3.9

100 Newton's Quadratic Approximation
2nd-order Taylor series:
  Ψ(x) ≈ φ(x; x^(n)) ≜ Ψ(x^(n)) + ∇Ψ(x^(n)) (x − x^(n)) + (1/2)(x − x^(n))′ ∇²Ψ(x^(n)) (x − x^(n))
Set x^(n+1) to the ("easily" found) minimizer of this quadratic approximation:
  x^(n+1) ≜ argmin_x φ(x; x^(n)) = x^(n) − [∇²Ψ(x^(n))]^{−1} ∇Ψ(x^(n))
Can be nonmonotone for the Poisson emission tomography log-likelihood, even for a single pixel and single ray:
  Ψ(x) = (x + r) − y log(x + r). 3.10

101 Nonmonotonicity of Newton-Raphson Ψ(x) 0.5 new 1 old 1.5 Log Likelihood Newton Parabola x 3.11

102 Consideration: Monotonicity
An algorithm is monotonic if Ψ(x^(n+1)) ≤ Ψ(x^(n)), ∀ x^(n).
Three categories of algorithms:
  Nonmonotonic (or unknown)
  Forced monotonic (e.g., by line search)
  Intrinsically monotonic (by design, simplest to implement)
Forced monotonicity
Most nonmonotonic algorithms can be converted to forced monotonic algorithms by adding a line-search step:
  x^temp ≜ M(x^(n)),  d^(n) = x^temp − x^(n)
  x^(n+1) ≜ x^(n) + α_n d^(n),  where α_n ≜ argmin_α Ψ(x^(n) + α d^(n))
Inconvenient, sometimes expensive, nonnegativity problematic. 3.12

103 Conjugate Gradient Algorithm Advantages: Fast converging (if suitably preconditioned) (in unconstrained case) Monotonic (forced by line search in nonquadratic case) Global convergence (unconstrained case) Flexible use of system matrix A and tricks Easy to implement in unconstrained quadratic case Highly parallelizable Disadvantages: Nonnegativity constraint awkward (slows convergence?) Line-search awkward in nonquadratic cases Highly recommended for unconstrained quadratic problems (e.g., PWLS without nonnegativity). Useful (but perhaps not ideal) for Poisson case too. 3.13

104 Consideration: Parallelization Simultaneous (fully parallelizable) update all pixels simultaneously using all data EM, Conjugate gradient, ISRA, OSL, SIRT, MART,... Block iterative (ordered subsets) update (nearly) all pixels using one subset of the data at a time OSEM, RBBI,... Row action update many pixels using a single ray at a time ART, RAMLA Pixel grouped (multiple column action) update some (but not all) pixels simultaneously a time, using all data Grouped coordinate descent, multi-pixel SAGE (Perhaps the most nontrivial to implement) Sequential (column action) update one pixel at a time, using all (relevant) data Coordinate descent, SAGE 3.14

105 Coordinate Descent Algorithm
aka Gauss-Seidel, successive over-relaxation (SOR), iterated conditional modes (ICM)
Update one pixel at a time, holding others fixed to their most recent values:
  x_j^new = argmin_{x_j ≥ 0} Ψ(x_1^new, ..., x_{j−1}^new, x_j, x_{j+1}^old, ..., x_{n_p}^old),  j = 1, ..., n_p
Advantages:
  Intrinsically monotonic
  Fast converging (from good initial image)
  Global convergence
  Nonnegativity constraint trivial
Disadvantages:
  Requires column access of system matrix A
  Cannot exploit some "tricks" for A, e.g., factorizations
  Expensive "arg min" for nonquadratic problems
  Poorly parallelizable 3.15
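A minimal sketch of nonnegativity-constrained coordinate descent for an unregularized least-squares cost, where each 1D update has a closed form that is clipped at zero exactly; the function name, residual bookkeeping, and toy data are assumptions for illustration.

```python
import numpy as np

def coord_descent_nnls(A, y, n_iter=20):
    """Nonnegative coordinate descent for Psi(x) = 1/2 ||y - Ax||^2.
    Each 1D update is exact for this quadratic cost (unlike clipping a
    full Newton step, cf. slide 98)."""
    n_p = A.shape[1]
    x = np.zeros(n_p)
    resid = y - A @ x                      # maintained residual y - Ax
    col_sq = np.sum(A**2, axis=0)          # ||a_j||^2 for each column
    for _ in range(n_iter):
        for j in range(n_p):
            aj = A[:, j]
            # unconstrained 1D minimizer, then project onto x_j >= 0
            xj_new = max(0.0, x[j] + aj @ resid / col_sq[j])
            resid += aj * (x[j] - xj_new)  # update residual for the change
            x[j] = xj_new
    return x

# Tiny usage example (illustrative data)
rng = np.random.default_rng(3)
A = rng.random((40, 8))
x_true = np.abs(rng.standard_normal(8))
y = A @ x_true + 0.01 * rng.standard_normal(40)
print(np.round(coord_descent_nnls(A, y) - x_true, 3))
```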

106 Constrained Coordinate Descent Illustrated 2 Clipped Coordinate Descent Algorithm x x

107 Coordinate Descent - Unconstrained 2 Unconstrained Coordinate Descent Algorithm x x

108 Coordinate-Descent Algorithm Summary
Recommended when all of the following apply:
  quadratic or nearly-quadratic convex cost function
  nonnegativity constraint desired
  precomputed and stored system matrix A with column access
  parallelization not needed (standard workstation)
Cautions:
  Good initialization (e.g., properly scaled FBP) essential. (Uniform image or zero image cause slow initial convergence.)
  Must be programmed carefully to be efficient. (Standard Gauss-Seidel implementation is suboptimal.)
  Updates high frequencies fastest ⇒ poorly suited to unregularized case
Used daily in UM clinic for 2D SPECT / PWLS / nonuniform attenuation 3.18

109 Summary of General-Purpose Algorithms Gradient-based Fully parallelizable Inconvenient line-searches for nonquadratic cost functions Fast converging in unconstrained case Nonnegativity constraint inconvenient Coordinate-descent Very fast converging Nonnegativity constraint trivial Poorly parallelizable Requires precomputed/stored system matrix CD is well-suited to moderate-sized 2D problem (e.g., 2D PET), but poorly suited to large 2D problems (X-ray CT) and fully 3D problems Neither is ideal.... need special-purpose algorithms for image reconstruction! 3.19

110 Data-Mismatch Functions Revisited
For fast converging, intrinsically monotone algorithms, consider the form of Ψ.
WLS:
  Ł(x) = Σ_{i=1}^{n_d} (1/2) w_i (y_i − [Ax]_i)² = Σ_{i=1}^{n_d} h_i([Ax]_i),  where h_i(l) ≜ (1/2) w_i (y_i − l)².
Emission Poisson (negative) log-likelihood:
  Ł(x) = Σ_{i=1}^{n_d} ([Ax]_i + r_i) − y_i log([Ax]_i + r_i) = Σ_{i=1}^{n_d} h_i([Ax]_i),  where h_i(l) ≜ (l + r_i) − y_i log(l + r_i).
Transmission Poisson log-likelihood:
  Ł(x) = Σ_{i=1}^{n_d} (b_i e^{−[Ax]_i} + r_i) − y_i log(b_i e^{−[Ax]_i} + r_i) = Σ_{i=1}^{n_d} h_i([Ax]_i),  where h_i(l) ≜ (b_i e^{−l} + r_i) − y_i log(b_i e^{−l} + r_i).
MRI, polyenergetic X-ray CT, confocal microscopy, image restoration, ...
All have the same partially separable form. 3.20

111 General Imaging Cost Function
General form for the data-mismatch function: Ł(x) = Σ_{i=1}^{n_d} h_i([Ax]_i)
General form for the regularizing penalty function: R(x) = Σ_k ψ_k([Cx]_k)
General form for the cost function:
  Ψ(x) = Ł(x) + βR(x) = Σ_{i=1}^{n_d} h_i([Ax]_i) + β Σ_k ψ_k([Cx]_k)
Properties of Ψ we can exploit:
  summation form (due to independence of measurements)
  convexity of the h_i functions (usually)
  summation argument (inner product of x with the ith row of A)
Most methods that use these properties are forms of optimization transfer. 3.21

112 Optimization Transfer Illustrated Ψ(x) and φ (n) (x) Surrogate function Cost function x (n) x (n+1) 3.22

113 Optimization Transfer
General iteration:
  x^(n+1) = argmin_{x ≥ 0} φ(x; x^(n))
Monotonicity conditions (cost function Ψ decreases provided these hold):
  φ(x^(n); x^(n)) = Ψ(x^(n))  (matched current value)
  ∇_x φ(x; x^(n)) |_{x = x^(n)} = ∇Ψ(x) |_{x = x^(n)}  (matched gradient)
  φ(x; x^(n)) ≥ Ψ(x) ∀ x ≥ 0  (lies above)
These 3 (sufficient) conditions are satisfied by the Q function of the EM algorithm (and SAGE).
The 3rd condition is not satisfied by the Newton-Raphson quadratic approximation, which leads to its nonmonotonicity. 3.23

114 Optimization Transfer in 2d 3.24

115 Optimization Transfer cf EM Algorithm E-step: choose surrogate function φ(x; x (n) ) M-step: minimize surrogate function Designing surrogate functions Easy to compute Easy to minimize Fast convergence rate x (n+1) = argmin φ ( x; x (n)) x 0 Often mutually incompatible goals... compromises 3.25

116 Convergence Rate: Slow High Curvature Small Steps Slow Convergence φ Φ Old New x 3.26

117 Convergence Rate: Fast Low Curvature Large Steps Fast Convergence φ Φ Old New x 3.27

118 Tool: Convexity Inequality
[Figure: a convex function g and the chord between x_1 and x_2.]
g convex ⇒ g(αx_1 + (1 − α)x_2) ≤ αg(x_1) + (1 − α)g(x_2) for α ∈ [0, 1]
More generally: α_k ≥ 0 and Σ_k α_k = 1 ⇒ g(Σ_k α_k x_k) ≤ Σ_k α_k g(x_k). Sum outside! 3.28

119 Example 1: Classical ML-EM Algorithm
Negative Poisson log-likelihood cost function (unregularized):
  Ψ(x) = Σ_{i=1}^{n_d} h_i([Ax]_i),  h_i(l) = (l + r_i) − y_i log(l + r_i).
Intractable to minimize directly due to the summation within the logarithm.
Clever trick due to De Pierro (let ȳ_i^(n) = [Ax^(n)]_i + r_i):
  [Ax]_i = Σ_{j=1}^{n_p} a_ij x_j = Σ_{j=1}^{n_p} [a_ij x_j^(n) / ȳ_i^(n)] (x_j / x_j^(n)) ȳ_i^(n).
Since the h_i's are convex in the Poisson emission model:
  h_i([Ax]_i) = h_i( Σ_{j=1}^{n_p} [a_ij x_j^(n) / ȳ_i^(n)] (x_j / x_j^(n)) ȳ_i^(n) ) ≤ Σ_{j=1}^{n_p} [a_ij x_j^(n) / ȳ_i^(n)] h_i( (x_j / x_j^(n)) ȳ_i^(n) )
  Ψ(x) = Σ_{i=1}^{n_d} h_i([Ax]_i) ≤ φ(x; x^(n)) ≜ Σ_{i=1}^{n_d} Σ_{j=1}^{n_p} [a_ij x_j^(n) / ȳ_i^(n)] h_i( (x_j / x_j^(n)) ȳ_i^(n) )
Replace the convex cost function Ψ(x) with the separable surrogate function φ(x; x^(n)). 3.29

120 ML-EM Algorithm M-step
The E-step gave a separable surrogate function:
  φ(x; x^(n)) = Σ_{j=1}^{n_p} φ_j(x_j; x^(n)),  where φ_j(x_j; x^(n)) ≜ Σ_{i=1}^{n_d} [a_ij x_j^(n) / ȳ_i^(n)] h_i( (x_j / x_j^(n)) ȳ_i^(n) ).
The M-step separates:
  x^(n+1) = argmin_{x ≥ 0} φ(x; x^(n))  ⇒  x_j^(n+1) = argmin_{x_j ≥ 0} φ_j(x_j; x^(n)),  j = 1, ..., n_p
Minimizing:
  (∂/∂x_j) φ_j(x_j; x^(n)) = Σ_{i=1}^{n_d} a_ij ḣ_i( (x_j / x_j^(n)) ȳ_i^(n) ) = Σ_{i=1}^{n_d} a_ij [1 − y_i x_j^(n) / (x_j ȳ_i^(n))] |_{x_j = x_j^(n+1)} = 0.
Solving (in the case r_i = 0):
  x_j^(n+1) = x_j^(n) [ Σ_{i=1}^{n_d} a_ij y_i / [Ax^(n)]_i ] / ( Σ_{i=1}^{n_d} a_ij ),  j = 1, ..., n_p
Derived without any statistical considerations, unlike the classical EM formulation.
Uses only convexity and algebra. Guaranteed monotonic: the surrogate φ satisfies the 3 required properties.
M-step trivial due to the separable surrogate. 3.30
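A minimal NumPy sketch of the resulting multiplicative ML-EM update x_j^(n+1) = x_j^(n) (Σ_i a_ij y_i/ȳ_i^(n)) / (Σ_i a_ij); the function name, the toy system matrix, and the data generation are illustrative assumptions.

```python
import numpy as np

def ml_em(A, y, r, n_iter=50):
    """Classical ML-EM iterations for y_i ~ Poisson{[Ax]_i + r_i}.
    The multiplicative update preserves nonnegativity and monotonically
    increases the Poisson likelihood."""
    x = np.ones(A.shape[1])             # strictly positive start
    sens = A.sum(axis=0)                # "sensitivity" image: sum_i a_ij
    for _ in range(n_iter):
        ybar = A @ x + r                # predicted means
        x *= (A.T @ (y / ybar)) / sens  # EM update
    return x

# Tiny illustrative example
rng = np.random.default_rng(6)
A = rng.random((60, 8)) * (rng.random((60, 8)) < 0.4)
x_true = rng.gamma(2.0, 1.0, 8)
r = np.full(60, 0.1)
y = rng.poisson(A @ x_true + r)
print(np.round(ml_em(A, y, r, 200), 2))
```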

121 ML-EM is Scaled Gradient Descent
  x_j^(n+1) = x_j^(n) [ Σ_{i=1}^{n_d} a_ij y_i / ȳ_i^(n) ] / ( Σ_{i=1}^{n_d} a_ij )
            = x_j^(n) + x_j^(n) [ Σ_{i=1}^{n_d} a_ij ( y_i / ȳ_i^(n) − 1 ) ] / ( Σ_{i=1}^{n_d} a_ij )
            = x_j^(n) − ( x_j^(n) / Σ_{i=1}^{n_d} a_ij ) (∂/∂x_j) Ψ(x^(n)),  j = 1, ..., n_p
  x^(n+1) = x^(n) − D(x^(n)) ∇Ψ(x^(n)),  D(x) = diag{ x_j / Σ_{i=1}^{n_d} a_ij }
This particular diagonal scaling matrix remarkably ensures monotonicity and ensures nonnegativity. 3.31

122 Consideration: Separable vs Nonseparable 2 Separable 2 Nonseparable 1 1 x2 0 x x x 1 Contour plots: loci of equal function values. Uncoupled vs coupled minimization. 3.32

123 Separable Surrogate Functions (Easy M-step)
The preceding EM derivation structure applies to any cost function of the form
  Ψ(x) = Σ_{i=1}^{n_d} h_i([Ax]_i).
cf. ISRA (for nonnegative LS), "convex algorithm" for transmission reconstruction
The derivation yields a separable surrogate function
  Ψ(x) ≤ φ(x; x^(n)),  where φ(x; x^(n)) = Σ_{j=1}^{n_p} φ_j(x_j; x^(n))
The M-step separates into 1D minimization problems (fully parallelizable):
  x^(n+1) = argmin_{x ≥ 0} φ(x; x^(n))  ⇒  x_j^(n+1) = argmin_{x_j ≥ 0} φ_j(x_j; x^(n)),  j = 1, ..., n_p
Why do EM / ISRA / convex-algorithm / etc. converge so slowly? 3.33

124 Separable vs Nonseparable Ψ Ψ φ φ Separable Nonseparable Separable surrogates (e.g., EM) have high curvature... slow convergence. Nonseparable surrogates can have lower curvature... faster convergence. Harder to minimize? Use paraboloids (quadratic surrogates). 3.34

125 High Curvature of EM Surrogate 2 Log Likelihood EM Surrogates hi(l) and Q(l;l n ) l 3.35

126 1D Parabola Surrogate Function
Find a parabola q_i^(n)(l) of the form:
  q_i^(n)(l) = h_i(l_i^(n)) + ḣ_i(l_i^(n)) (l − l_i^(n)) + c_i^(n) (1/2)(l − l_i^(n))²,  where l_i^(n) ≜ [Ax^(n)]_i
Satisfies the tangent condition. Choose the curvature to ensure the "lies above" condition:
  c_i^(n) ≜ min{ c ≥ 0 : q_i^(n)(l) ≥ h_i(l), ∀ l ≥ 0 }.
[Figure: surrogate functions for emission Poisson: negative log-likelihood, parabola surrogate, and EM surrogate versus l.]
Lower curvature! 3.36

127 Paraboloidal Surrogate
Combining the 1D parabola surrogates yields the paraboloidal surrogate:
  Ψ(x) = Σ_{i=1}^{n_d} h_i([Ax]_i) ≤ φ(x; x^(n)) = Σ_{i=1}^{n_d} q_i^(n)([Ax]_i)
Rewriting:
  φ(δ + x^(n); x^(n)) = Ψ(x^(n)) + ∇Ψ(x^(n)) δ + (1/2) δ′ A′ diag{c_i^(n)} A δ
Advantages
  The surrogate φ(x; x^(n)) is quadratic, unlike the Poisson log-likelihood ⇒ easier to minimize
  Not separable (unlike EM surrogate); not self-similar (unlike EM surrogate)
  Small curvatures ⇒ fast convergence
  Intrinsically monotone global convergence
  Fairly simple to derive / implement
Quadratic minimization options:
  Coordinate descent: + fast converging, + nonnegativity easy, − precomputed column-stored system matrix
  Gradient-based quadratic minimization methods: − nonnegativity inconvenient 3.37

128 Example: PSCD for PET Transmission Scans square-pixel basis strip-integral system model shifted-poisson statistical model edge-preserving convex regularization (Huber) nonnegativity constraint inscribed circle support constraint paraboloidal surrogate coordinate descent (PSCD) algorithm 3.38

129 Separable Paraboloidal Surrogate
To derive a parallelizable algorithm apply another De Pierro trick:
  [Ax]_i = Σ_{j=1}^{n_p} π_ij [ (a_ij / π_ij)(x_j − x_j^(n)) + l_i^(n) ],  l_i^(n) = [Ax^(n)]_i.
Provided π_ij ≥ 0 and Σ_{j=1}^{n_p} π_ij = 1, since the parabola q_i is convex:
  q_i^(n)([Ax]_i) = q_i^(n)( Σ_{j=1}^{n_p} π_ij [ (a_ij / π_ij)(x_j − x_j^(n)) + l_i^(n) ] ) ≤ Σ_{j=1}^{n_p} π_ij q_i^(n)( (a_ij / π_ij)(x_j − x_j^(n)) + l_i^(n) )
  ... φ(x; x^(n)) = Σ_{i=1}^{n_d} q_i^(n)([Ax]_i) ≤ φ̂(x; x^(n))
Separable Paraboloidal Surrogate:
  φ̂(x; x^(n)) = Σ_{j=1}^{n_p} φ_j(x_j; x^(n)),  φ_j(x_j; x^(n)) ≜ Σ_{i=1}^{n_d} π_ij q_i^(n)( (a_ij / π_ij)(x_j − x_j^(n)) + l_i^(n) )
Parallelizable M-step (cf. gradient descent!):
  x_j^(n+1) = argmin_{x_j ≥ 0} φ_j(x_j; x^(n)) = [ x_j^(n) − (1/d_j^(n)) (∂/∂x_j) Ψ(x^(n)) ]_+,  d_j^(n) = Σ_{i=1}^{n_d} (a_ij² / π_ij) c_i^(n)
Natural choice is π_ij = a_ij / a_i, with a_i ≜ Σ_{j=1}^{n_p} a_ij, giving d_j^(n) = Σ_{i=1}^{n_d} a_ij a_i c_i^(n). 3.39

130 Example: Poisson ML Transmission Problem
Transmission negative log-likelihood (for the ith ray): h_i(l) = (b_i e^{−l} + r_i) − y_i log(b_i e^{−l} + r_i).
Optimal (smallest) parabola surrogate curvature (Erdoğan, T-MI, Sep. 1999):
  c_i^(n) = c(l_i^(n), h_i),  c(l, h) = [ 2 (h(0) − h(l) + ḣ(l) l) / l² ]_+ for l > 0;  [ ḧ(l) ]_+ for l = 0.
Separable Paraboloidal Surrogate (SPS) Algorithm:
  Precompute a_i = Σ_{j=1}^{n_p} a_ij,  i = 1, ..., n_d
  Each iteration:
    l_i^(n) = [Ax^(n)]_i  (forward projection)
    ȳ_i^(n) = b_i e^{−l_i^(n)} + r_i  (predicted means)
    ḣ_i^(n) = (y_i / ȳ_i^(n) − 1)(ȳ_i^(n) − r_i)  (slopes)
    c_i^(n) = c(l_i^(n), h_i)  (curvatures)
    x_j^(n+1) = [ x_j^(n) − ( Σ_{i=1}^{n_d} a_ij ḣ_i^(n) ) / ( Σ_{i=1}^{n_d} a_ij a_i c_i^(n) ) ]_+,  j = 1, ..., n_p
Monotonically decreases the cost function each iteration. No logarithm! 3.40
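A compact NumPy sketch of one flavor of this transmission SPS update using the optimal-curvature formula above. The toy blank-scan values b_i, background r_i, the small dense A, and the numerical safeguards are illustrative assumptions, not the course's implementation.

```python
import numpy as np

def sps_transmission(A, y, b, r, n_iter=200):
    """Separable paraboloidal surrogate (SPS) iterations for the Poisson
    transmission model y_i ~ Poisson{b_i exp(-[Ax]_i) + r_i} (a sketch)."""
    a_row = A.sum(axis=1)                       # a_i = sum_j a_ij
    x = np.zeros(A.shape[1])
    h = lambda t: (b*np.exp(-t) + r) - y*np.log(b*np.exp(-t) + r)
    hddot0 = b * (1.0 - y * r / (b + r)**2)     # d^2 h / dl^2 at l = 0
    for _ in range(n_iter):
        l = A @ x                               # forward projection l_i
        ybar = b * np.exp(-l) + r               # predicted means
        hdot = (y / ybar - 1.0) * (ybar - r)    # dh_i/dl at l_i
        with np.errstate(divide="ignore", invalid="ignore"):
            c = np.where(l > 0, 2*(h(0*l) - h(l) + hdot*l)/l**2, hddot0)
        c = np.maximum(c, 1e-12)                # curvature floor ([.]_+)
        denom = A.T @ (a_row * c)               # d_j = sum_i a_ij a_i c_i
        x = np.maximum(x - (A.T @ hdot) / denom, 0.0)
    return x

# Tiny illustrative example
rng = np.random.default_rng(4)
A = rng.random((50, 6)) * 0.2
mu = np.array([0.1, 0.2, 0.05, 0.15, 0.1, 0.3])
b = np.full(50, 1e4); r = np.full(50, 5.0)
y = rng.poisson(b * np.exp(-A @ mu) + r)
print(np.round(sps_transmission(A, y, b, r), 3))
```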

131 The MAP-EM M-step "Problem"
Add a penalty function to our surrogate for the negative log-likelihood:
  Ψ(x) = Ł(x) + βR(x)
  φ(x; x^(n)) = Σ_{j=1}^{n_p} φ_j(x_j; x^(n)) + βR(x)
M-step: x^(n+1) = argmin_{x ≥ 0} φ(x; x^(n)) = argmin_{x ≥ 0} Σ_{j=1}^{n_p} φ_j(x_j; x^(n)) + βR(x) = ?
For nonseparable penalty functions, the M-step is coupled ... difficult.
Suboptimal solutions
  Generalized EM (GEM) algorithm (coordinate descent on φ). Monotonic, but inherits the slow convergence of EM.
  One-step-late (OSL) algorithm (use outdated gradients) (Green, T-MI, 1990):
    (∂/∂x_j) φ(x; x^(n)) = (∂/∂x_j) φ_j(x_j; x^(n)) + β (∂/∂x_j) R(x) ≈ (∂/∂x_j) φ_j(x_j; x^(n)) + β (∂/∂x_j) R(x^(n))
    Nonmonotonic. Known to diverge, depending on β. Temptingly simple, but avoid!
Contemporary solution
  Use a separable surrogate for the penalty function too (De Pierro, T-MI, Dec. 1995)
  Ensures monotonicity. Obviates all reasons for using OSL! 3.41

132 De Pierro's MAP-EM Algorithm
Apply separable paraboloidal surrogates to the penalty function:
  R(x) ≤ R_SPS(x; x^(n)) = Σ_{j=1}^{n_p} R_j(x_j; x^(n))
Overall separable surrogate: φ(x; x^(n)) = Σ_{j=1}^{n_p} φ_j(x_j; x^(n)) + β Σ_{j=1}^{n_p} R_j(x_j; x^(n))
The M-step becomes fully parallelizable:
  x_j^(n+1) = argmin_{x_j ≥ 0} φ_j(x_j; x^(n)) + βR_j(x_j; x^(n)),  j = 1, ..., n_p.
Consider a quadratic penalty R(x) = Σ_k ψ([Cx]_k), where ψ(t) = t²/2. If γ_kj ≥ 0 and Σ_{j=1}^{n_p} γ_kj = 1 then
  [Cx]_k = Σ_{j=1}^{n_p} γ_kj [ (c_kj / γ_kj)(x_j − x_j^(n)) + [Cx^(n)]_k ].
Since ψ is convex:
  ψ([Cx]_k) = ψ( Σ_{j=1}^{n_p} γ_kj [ (c_kj / γ_kj)(x_j − x_j^(n)) + [Cx^(n)]_k ] ) ≤ Σ_{j=1}^{n_p} γ_kj ψ( (c_kj / γ_kj)(x_j − x_j^(n)) + [Cx^(n)]_k ) 3.42

133 De Pierro's Algorithm Continued
So R(x) ≤ R̄(x; x^(n)) ≜ Σ_{j=1}^{n_p} R_j(x_j; x^(n)), where
  R_j(x_j; x^(n)) ≜ Σ_k γ_kj ψ( (c_kj / γ_kj)(x_j − x_j^(n)) + [Cx^(n)]_k )
M-step: minimizing φ_j(x_j; x^(n)) + βR_j(x_j; x^(n)) yields the iteration:
  x_j^(n+1) = [ −B_j + √( B_j² + ( β Σ_k c_kj² / γ_kj ) x_j^(n) Σ_{i=1}^{n_d} a_ij y_i / ȳ_i^(n) ) ] / ( β Σ_k c_kj² / γ_kj ),  j = 1, ..., n_p
where B_j ≜ (1/2) [ Σ_{i=1}^{n_d} a_ij + β Σ_k ( c_kj [Cx^(n)]_k − (c_kj² / γ_kj) x_j^(n) ) ]  and ȳ_i^(n) = [Ax^(n)]_i + r_i.
Advantages: intrinsically monotone, nonnegativity, fully parallelizable. Requires only a couple % more computation per iteration than ML-EM.
Disadvantages: slow convergence (like EM) due to the separable surrogate. 3.43

134 Ordered Subsets Algorithms
aka block iterative or incremental gradient algorithms
The gradient appears in essentially every algorithm:
  Ł(x) = Σ_{i=1}^{n_d} h_i([Ax]_i)  ⇒  (∂/∂x_j) Ł(x) = Σ_{i=1}^{n_d} a_ij ḣ_i([Ax]_i).
This is a backprojection of a sinogram of the derivatives { ḣ_i([Ax]_i) }.
Intuition: with half the angular sampling, this backprojection would be fairly similar:
  (1/n_d) Σ_{i=1}^{n_d} a_ij ḣ_i(·) ≈ (1/|S|) Σ_{i ∈ S} a_ij ḣ_i(·),
where S is a subset of the rays.
To "OS-ize" an algorithm, replace all backprojections with partial sums.
Recall the typical iteration: x^(n+1) = x^(n) − D(x^(n)) ∇Ψ(x^(n)). 3.44
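A minimal OS-EM sketch built on the ML-EM update of slide 120, cycling over interleaved subsets of rays and using the partial backprojections in place of the full ones. The subset partition, sizes, data, and the small denominator guard are assumptions for illustration.

```python
import numpy as np

def os_em(A, y, r, n_subsets=4, n_iter=25):
    """OS-EM sketch: apply the multiplicative EM update using one subset
    of rays at a time (no relaxation, so it may cycle rather than converge)."""
    x = np.ones(A.shape[1])
    subsets = [np.arange(m, A.shape[0], n_subsets) for m in range(n_subsets)]
    for _ in range(n_iter):
        for idx in subsets:
            As, ys, rs = A[idx], y[idx], r[idx]
            ybar = As @ x + rs
            sens = np.maximum(As.sum(axis=0), 1e-12)  # subset sensitivity
            x = x * (As.T @ (ys / ybar)) / sens       # subset-scaled EM update
    return x

# Tiny illustrative example with made-up data
rng = np.random.default_rng(5)
A = rng.random((80, 10)) * (rng.random((80, 10)) < 0.5)
x_true = rng.gamma(2.0, 1.0, 10)
r = np.full(80, 0.2)
y = rng.poisson(A @ x_true + r)
print(np.round(os_em(A, y, r), 2))
```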

135 Geometric View of Ordered Subsets n x f x 1 ( k ) Φ ( x k ) f2( x k ) f1( x n ) argmax f 1( x ) k x Φ ( x n ) f2( x n ) * x argmax f 2( x ) Two subset case: Ψ(x) = f 1 (x)+ f 2 (x) (e.g., odd and even projection views). For x (n) far from x, even partial gradients should point roughly towards x. For x (n) near x, however, Ψ(x) 0, so f 1 (x) f 2 (x) = cycles! Issues. Subset gradient balance : Ψ(x) M f k (x). Choice of ordering. 3.45

136 Incremental Gradients (WLS, 2 Subsets) f WLS (x) 10 2 f even (x) 10 2 f odd (x) 10 x true x 0 1 difference 5 difference 5 M=2 0 5 (full subset)

137 Subset Gradient Imbalance f WLS (x) 10 2 f 0 90 (x) 10 2 f (x) x 0 1 difference 5 difference 5 M=2 0 5 (full subset)

138 Non-monotone Does not converge (may cycle) Problems with OS-EM Byrne s rescaled block iterative (RBI) approach converges only for consistent (noiseless) data... unpredictable What resolution after n iterations? Object-dependent, spatially nonuniform What variance after n iterations? ROI variance? (e.g., for Huesman s WLS kinetics) 3.48

139 OSEM vs Penalized Likelihood image sinogram 10 6 counts 15% randoms/scatter uniform attenuation contrast in cold region within-region σ opposite side 3.49

140 Contrast-Noise Results OSEM 1 subset OSEM 4 subset OSEM 16 subset PL PSCA Noise (64 angles) Uniform image Contrast 3.50

141 1.5 Horizontal Profile OSEM 4 subsets, 5 iterations PL PSCA 10 iterations Relative Activity x

142 Making OS Methods Converge
Two routes: relaxation; incrementalism.
Relaxed block-iterative methods: Ψ(x) = Σ_{m=1}^M Ψ_m(x)
  x^(n+(m+1)/M) = x^(n+m/M) − α_n D(x^(n+m/M)) ∇Ψ_m(x^(n+m/M)),  m = 0, ..., M − 1
Relaxation of step sizes: α_n → 0 as n → ∞,  Σ_n α_n = ∞,  Σ_n α_n² < ∞
  ART; RAMLA, BSREM (De Pierro, T-MI, 1997, 2001); Ahn and Fessler, NSS/MIC 2001, T-MI 2003
Considerations
  Proper relaxation can induce convergence, but still lacks monotonicity.
  Choice of relaxation schedule requires experimentation.
  Ψ_m(x) = Ł_m(x) + (1/M) R(x), so each Ψ_m includes part of the likelihood yet all of R. 3.52

143 Relaxed OS-SPS x Penalized likelihood increase Original OS SPS Modified BSREM Relaxed OS SPS Iteration 3.53

144 Incremental Methods
Incremental EM applied to emission tomography by Hsiao et al. as C-OSEM
Incremental optimization transfer (Ahn & Fessler, MIC 2004)
Find a majorizing surrogate for each sub-objective function:
  φ_m(x̄; x̄) = Ψ_m(x̄),  φ_m(x; x̄) ≥ Ψ_m(x),  ∀ x, x̄
Define the following augmented cost function:
  F(x; x̄_1, ..., x̄_M) = Σ_{m=1}^M φ_m(x; x̄_m).
Fact: by construction, x̂ = argmin_x Ψ(x) = argmin_x min_{x̄_1, ..., x̄_M} F(x; x̄_1, ..., x̄_M).
Alternating minimization: for m = 1, ..., M:
  x^new = argmin_x F(x; x̄_1^(n+1), ..., x̄_{m−1}^(n+1), x̄_m^(n), x̄_{m+1}^(n), ..., x̄_M^(n))
  x̄_m^(n+1) = argmin_{x̄_m} F(x^new; x̄_1^(n+1), ..., x̄_{m−1}^(n+1), x̄_m, x̄_{m+1}^(n), ..., x̄_M^(n)) = x^new.
Use all current information, but increment the surrogate for only one subset.
Monotone in F, converges under reasonable assumptions on Ψ.
In contrast, OS-EM uses the information from only one subset at a time. 3.54

145 TRIOT Example: Convergence Rate Transmission incremental optimization transfer (TRIOT) normalized Φ difference subsets, initialized with uniform image SPS MC SPS PC TRIOT MC TRIOT PC OS SPS 2 iterations of OS SPS included iteration 3.55

146 TRIOT Example: Attenuation Map Images FBP PL optimal image OS-SPS TRIOT-PC OS-SPS: 64 subsets, 20 iterations, one point of the limit cycle TRIOT-PC: 64 subsets, 20 iterations, after 2 iterations of OS-SPS) 3.56

147 OSTR aka Transmission OS-SPS Ordered subsets version of separable paraboloidal surrogates for PET transmission problem with nonquadratic convex regularization Matlab m-file fessler /code/transmission/tpl osps.m 3.57

148 Precomputed curvatures for OS-SPS
Separable Paraboloidal Surrogate (SPS) Algorithm:
  x_j^(n+1) = [ x_j^(n) − ( Σ_{i=1}^{n_d} a_ij ḣ_i([Ax^(n)]_i) ) / ( Σ_{i=1}^{n_d} a_ij a_i c_i^(n) ) ]_+,  j = 1, ..., n_p
Ordered subsets abandons monotonicity, so why use the optimal curvatures c_i^(n)?
Precomputed curvature: c_i = ḧ_i(l̂_i),  l̂_i = argmin_l h_i(l)
Precomputed denominator (saves one backprojection each iteration!):
  d_j = Σ_{i=1}^{n_d} a_ij a_i c_i,  j = 1, ..., n_p.
OS-SPS algorithm with M subsets:
  x_j^(n+1) = [ x_j^(n) − ( Σ_{i ∈ S(n)} a_ij ḣ_i([Ax^(n)]_i) ) / ( d_j / M ) ]_+,  j = 1, ..., n_p 3.58

149 Summary of Algorithms General-purpose optimization algorithms Optimization transfer for image reconstruction algorithms Separable surrogates = high curvatures = slow convergence Ordered subsets accelerate initial convergence require relaxation or incrementalism for true convergence Principles apply to emission and transmission reconstruction Still work to be done... An Open Problem Still no algorithm with all of the following properties: Nonnegativity easy Fast converging Intrinsically monotone global convergence Accepts any type of system matrix Parallelizable 3.59

150 Part 4. Performance Characteristics Spatial resolution properties Noise properties Detection properties 4.1

151 Spatial Resolution Properties
Choosing β can be painful, so...
For true minimization methods, $\hat{x} = \arg\min_x \Psi(x)$, the local impulse response is approximately (Fessler and Rogers, T-MI, 1996):
$l_j(x) = \lim_{\delta \to 0} \frac{E[\hat{x} \mid x + \delta e_j] - E[\hat{x} \mid x]}{\delta} \approx \left[ \nabla^{20} \Psi \right]^{-1} \nabla^{11} \Psi \; \frac{\partial}{\partial x_j} \bar{y}(x)$.
Depends only on the chosen cost function and statistical model.
Independent of optimization algorithm (if iterated to convergence).
Enables prediction of resolution properties (provided Ψ is minimized).
Useful for designing regularization penalty functions with desired resolution properties.
For penalized likelihood: $l_j(x^{\text{true}}) \approx [A'WA + \beta R]^{-1} A'WA\, e_j$. Helps choose β for a desired spatial resolution. 4.2
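For a small problem with explicit matrices, the penalized-likelihood local impulse response above can be evaluated directly; the following NumPy sketch (matrix names and the β sweep are illustrative assumptions, and large problems would use circulant/FFT approximations or iterative solves instead) shows how β could be tuned against a target local PSF.

```python
import numpy as np

def local_impulse_response(A, W, R, beta, j):
    """Predicted local impulse response l_j = [A'WA + beta*R]^{-1} A'WA e_j.

    A : (n_d, n_p) system matrix,  W : (n_d, n_d) weighting matrix,
    R : (n_p, n_p) penalty Hessian, j : index of the impulse pixel.
    """
    F = A.T @ W @ A                       # Fisher-information-like term A'WA
    e_j = np.zeros(A.shape[1])
    e_j[j] = 1.0
    return np.linalg.solve(F + beta * R, F @ e_j)

# Possible usage: sweep beta and inspect the width of the predicted PSF
# for beta in (1.0, 10.0, 100.0):
#     psf = local_impulse_response(A, W, R, beta, j=center_pixel)
```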

152 Modified Penalty Example, PET [Panels: a) filtered backprojection; b) penalized unweighted least-squares; c) PWLS with conventional regularization; d) PWLS with certainty-based penalty [36]; e) PWLS with modified penalty [183].] 4.3

153 Modified Penalty Example, SPECT - Noiseless Target filtered object FBP Conventional PWLS Truncated EM Post-filtered EM Modified Regularization 4.4

154 Modified Penalty Example, SPECT - Noisy Target filtered object FBP Conventional PWLS Truncated EM Post-filtered EM Modified Regularization 4.5

155 Regularized vs Post-filtered, with Matched PSF [Plot: noise comparisons at the center pixel; pixel standard deviation (%) vs. target image resolution (mm) for uniformity-corrected FBP, penalized likelihood, and post-smoothed ML.] 4.6

156 Reconstruction Noise Properties
For unconstrained (converged) minimization methods, the estimator is implicit:
$\hat{x} = \hat{x}(y) = \arg\min_x \Psi(x, y)$. What is $\mathrm{Cov}\{\hat{x}\}$?
New simpler derivation. Denote the column gradient by $g(x, y) \triangleq \nabla_x \Psi(x, y)$.
Ignoring constraints, the gradient is zero at the minimizer: $g(\hat{x}(y), y) = 0$.
First-order Taylor series expansion:
$g(\hat{x}, y) \approx g(x^{\text{true}}, y) + \nabla_x g(x^{\text{true}}, y)\,(\hat{x} - x^{\text{true}}) = g(x^{\text{true}}, y) + \nabla^2_x \Psi(x^{\text{true}}, y)\,(\hat{x} - x^{\text{true}})$.
Equating to zero:
$\hat{x} - x^{\text{true}} \approx -\left[ \nabla^2_x \Psi(x^{\text{true}}, y) \right]^{-1} \nabla_x \Psi(x^{\text{true}}, y)$.
If the Hessian $\nabla^2 \Psi$ is weakly dependent on $y$, then
$\mathrm{Cov}\{\hat{x}\} \approx \left[ \nabla^2_x \Psi(x^{\text{true}}, \bar{y}) \right]^{-1} \mathrm{Cov}\{\nabla_x \Psi(x^{\text{true}}, y)\} \left[ \nabla^2_x \Psi(x^{\text{true}}, \bar{y}) \right]^{-1}$.
If we further linearize w.r.t. the data, $g(x, y) \approx g(x, \bar{y}) + \nabla_y g(x, \bar{y})\,(y - \bar{y})$, then
$\mathrm{Cov}\{\hat{x}\} \approx \left[ \nabla^2_x \Psi \right]^{-1} (\nabla_x \nabla_y \Psi)\, \mathrm{Cov}\{y\}\, (\nabla_x \nabla_y \Psi)' \left[ \nabla^2_x \Psi \right]^{-1}$.

157 Covariance Continued
Covariance approximation:
$\mathrm{Cov}\{\hat{x}\} \approx \left[ \nabla^2_x \Psi(x^{\text{true}}, \bar{y}) \right]^{-1} \mathrm{Cov}\{\nabla_x \Psi(x^{\text{true}}, y)\} \left[ \nabla^2_x \Psi(x^{\text{true}}, \bar{y}) \right]^{-1}$
Depends only on the chosen cost function and statistical model. Independent of optimization algorithm. Enables prediction of noise properties: can make variance images; useful for computing ROI variance (e.g., for weighted kinetic fitting). Good variance prediction for quadratic regularization in nonzero regions; inaccurate for nonquadratic penalties or in nearly-zero regions. 4.8
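As one concrete instance of the sandwich formula above (an assumption for illustration, not the general Poisson case of the slides): for a PWLS cost Ψ(x, y) = ½(y − Ax)'W(y − Ax) + (β/2) x'Rx, the outer factors equal A'WA + βR and the middle factor is A'W Cov{y} WA. A small-problem NumPy sketch:

```python
import numpy as np

def pwls_covariance(A, W, R, beta, cov_y):
    """Sandwich covariance approximation for a PWLS estimator.

    Cov{xhat} ~= H^{-1} (A'W Cov{y} W A) H^{-1},  with H = A'WA + beta*R.
    Returns the full covariance matrix; its diagonal is a variance image.
    Only practical for small problems with explicit matrices.
    """
    H = A.T @ W @ A + beta * R
    mid = A.T @ W @ cov_y @ W @ A
    Hinv = np.linalg.inv(H)          # fine for small n_p; use solves for larger problems
    return Hinv @ mid @ Hinv

# variance image: np.diag(pwls_covariance(A, W, R, beta, cov_y))
```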

158 Qi and Huesman's Detection Analysis SNR of MAP reconstruction > SNR of FBP reconstruction (T-MI, Aug. 2001), assuming quadratic regularization, an SKE/BKE task, and either a prewhitened or non-prewhitened observer. Open issues: choice of regularizer to optimize detectability? Active work in several groups (e.g., 2004 MIC poster by Yendiki & Fessler). 4.9

159 Part 5. Miscellaneous Topics (Pet peeves and more-or-less recent favorites) Short transmission scans 3D PET options OSEM of transmission data (ugh!) Precorrected PET data Transmission scan problems List-mode EM List of other topics I wish I had time to cover

160 PET Attenuation Correction (J. Nuyts) 5.2

161 Iterative reconstruction for 3D PET Fully 3D iterative reconstruction Rebinning / 2.5D iterative reconstruction Rebinning / 2D iterative reconstruction PWLS OSEM with attenuation weighting 3D FBP Rebinning / FBP 5.3

162 OSEM of Transmission Data? Bai and Kinahan et al., "Post-injection single photon transmission tomography with ordered-subset algorithms for whole-body PET imaging": a 3D penalty is better than a 2D penalty; OSTR with a 3D penalty is better than FBP and OSEM; using the standard deviation from a single realization to estimate noise can be misleading. Using OSEM for transmission data requires taking a logarithm, whereas OSTR does not. 5.4

163 Precorrected PET data C. Michel examined the shifted-Poisson model and weighted OSEM of various flavors, and concluded that attenuation weighting matters especially. 5.5

164 Transmission Scan Challenges Overlapping-beam transmission scans Polyenergetic X-ray CT scans Sourceless attenuation correction All can be tackled with optimization transfer methods. 5.6

165 List-mode EM
$x_j^{(n+1)} = \frac{x_j^{(n)}}{\sum_{i=1}^{n_d} a_{ij}} \left[ \sum_{i=1}^{n_d} a_{ij}\, \frac{y_i}{\bar{y}_i^{(n)}} \right] = \frac{x_j^{(n)}}{\sum_{i=1}^{n_d} a_{ij}} \sum_{i:\, y_i \neq 0} a_{ij}\, \frac{y_i}{\bar{y}_i^{(n)}}$
Useful when $\sum_{i=1}^{n_d} y_i \ll \sum_{i=1}^{n_d} 1 = n_d$.
Attenuation and scatter non-trivial. Computing $a_{ij}$ on-the-fly. Computing the sensitivity $\sum_{i=1}^{n_d} a_{ij}$ is still painful. List-mode ordered-subsets is naturally balanced. 5.7
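A minimal sketch of the list-mode EM update above for an idealized Poisson model without attenuation, scatter, or randoms; the event list is assumed to store the LOR index of each detected event, and a_row(i) is assumed to return that system-matrix row on the fly. Both interfaces are illustrative, not a particular scanner's list-mode format.

```python
import numpy as np

def list_mode_em(event_lors, a_row, sens, x0, n_iters):
    """List-mode EM: loop over detected events only (rays with y_i = 0 are skipped).

    event_lors : sequence of LOR indices, one entry per detected event
                 (an LOR recorded k times plays the role of y_i = k)
    a_row(i)   : (n_p,) row a_i. of the system matrix, computed on the fly
    sens       : (n_p,) sensitivity image s_j = sum_i a_ij (precomputed; still painful)
    Assumes every pixel has nonzero sensitivity.
    """
    x = x0.copy()
    for _ in range(n_iters):
        backproj = np.zeros_like(x)
        for i in event_lors:                 # only nonzero-count rays contribute
            a_i = a_row(i)
            ybar_i = a_i @ x                 # predicted mean for this LOR
            backproj += a_i / ybar_i         # accumulate a_ij / ybar_i per event
        x = x * backproj / sens              # multiplicative EM update
    return x
```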

166 Misc 4D regularization (reconstruction of dynamic image sequences) Sourceless attenuation-map estimation Post-injection transmission/emission reconstruction µ-value priors for transmission reconstruction Local errors in ˆµ propagate into emission image (PET and SPECT) 5.8

167 Summary Predictability of resolution / noise and the ability to control spatial resolution argue for a regularized cost function. Still work to be done...
