Information-Theoretic Imaging
1 Information-Theoretic Imaging with Applications in Hyperspectral Imaging and Transmission Tomography. Joseph A. O'Sullivan, Electronic Systems and Signals Research Laboratory, Department of Electrical Engineering, Washington University, jao@essrl.wustl.edu. Supported by: ONR, ARO, NIH
2 Collaborators. Faculty: Donald L. Snyder, William H. Smith, Daniel R. Fuhrmann, Jeffrey F. Williamson, Bruce R. Whiting, Richard E. Blahut, Michael I. Miller, Chrysanthe Preza, David G. Politte, James G. Blaine. Students and Post-Docs: Michael D. DeVore, Natalia A. Schmid, Metin Oz, Ryan Murphy, Jasenka Benac, Adam Cataldo
3 Outline: Information Theory and Imaging Hyperspectral Imaging - Information Value Decomposition Transmission Tomography - Estimate in Exponential Family - Object-Constrained Computerized Tomography Alternating Minimization Algorithms Applications: HSI, CT Conclusions 3
4 Some Roles of Information Theory In Imaging Problems Some Important Ideas: Roles depend on problem studied Key problems are in detection, estimation, and classification Information is quantifiable SNR as a measure has limitations Information theory often provides performance bounds 4
5 How to Measure Information. QUESTIONS: Can we measure the information in an image? Does one sensor provide more information than another? Does resolution measure information? Do more pixels in an image give more information? ANSWER: MAYBE. Information for what? Information relative to what? Ill-defined questions → clearly defined problems
6 Measuring Information Information for what? Detection Estimation Classification Image formation Information relative to what? Noise only Clutter plus noise Natural Scenery 6
7 Information Theory and Imaging. Center for Imaging Science established 1995. Brown MURI on Performance Metrics established 1997. Invited paper: J. A. O'Sullivan, R. E. Blahut, and D. L. Snyder, "Information-Theoretic Image Formation," IEEE Transactions on Information Theory, Oct. 1998. IEEE 1998 Information Theory Workshop on Detection, Estimation, Classification, and Imaging. IEEE Transactions on Information Theory Special Issue on Information-Theoretic Imaging, Aug. Complexity regularization, large deviations
8 IEEE Transactions on Information Theory, October 1998: Special Issue to commemorate the 50th anniversary of Claude E. Shannon's "A Mathematical Theory of Communication." Problem Definition. Optimality Criterion. Algorithm Development. Performance Quantification
9 Hyperspectral Imaging Team at Washington University: Donald L. Snyder, William H. Smith, Daniel R. Fuhrmann, Joseph A. O'Sullivan, Chrysanthe Preza. Digital Array Scanning Interferometer (DASI). [Photos: Snyder, Smith, Fuhrmann, O'Sullivan]
10 Hyperspectral Imaging Scene Cube Data Cube Drink from a fire hose Filter wheel, interferometer, tunable FPAs Modeling and processing: - data models - optimal algorithms - efficient algorithms 10
11 Hyperspectral Imaging Likelihood Models. Ideal data: r(y) = ∫ h(y|x) s(x) dx, where h(y|x) is the wavelength-dependent amplitude PSF of DASI, a sum of two shifted copies of h_a offset by the shear vector of the Wollaston prism, and s(x) is the scene intensity for incoherent radiation at x. Nonideal (more realistic) data: r(y) is Poisson with mean λ(y) plus Gaussian read noise; the data likelihood is the corresponding product over y of Poisson terms
12 Idealized Data Model. Data spectrum for each pixel: s_j is a linear combination of constituent spectra, s_j = Σ_{k=1}^K σ_k a_kj, i.e., S = ΣA. Problem: estimate constituents Σ and proportions A subject to nonnegativity; positivity of S assumed. Ambiguity: the factorization S = ΣA is not unique. Comments: radiometric calibration; constraints fundamental
13 Idealized Problem Statement: Maximum Likelihood ↔ Minimum I-Divergence. For Poisson-distributed data S, the loglikelihood function is
l(Σ, A) = Σ_{i=1}^I Σ_{j=1}^J [ s_ij ln( Σ_{k=1}^K σ_ik a_kj ) − Σ_{k=1}^K σ_ik a_kj ].
Maximization over Σ and A is equivalent to minimization of the I-divergence
I(S ‖ ΣA) = Σ_{i=1}^I Σ_{j=1}^J [ s_ij ln( s_ij / Σ_k σ_ik a_kj ) − s_ij + Σ_k σ_ik a_kj ].
The Information Value Decomposition problem
14 Markov Approximations. X and Y are random variables on finite sets; p(x,y) unknown. Data: N i.i.d. pairs {X_i, Y_i}. Unconstrained ML estimate of p(x,y): S = n(x,y)/N. Lower-rank Markov approximation Ŝ = ΣA (X → M → Y, M in a set of cardinality K): min_{Σ,A} I(S ‖ ΣA). Arises in factor analysis, contingency tables, economics. Problem: approximation of one matrix by another of lower rank (C. Eckart and G. Young, Psychometrika, vol. 1). SVD → IVD
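As a concrete illustration: the minimization min_{Σ,A} I(S ‖ ΣA) on the slides above has the same form as nonnegative matrix factorization under the generalized Kullback-Leibler (I-divergence) objective, for which Lee-Seung-style multiplicative updates are one standard solver. The sketch below is not the authors' IVD algorithm, just a minimal way to compute the objective and a low-rank nonnegative approximation; the function names are invented.

```python
import numpy as np

def i_div(S, M, eps=1e-12):
    """Generalized KL / I-divergence: sum_ij [ s ln(s/m) - s + m ]."""
    return float(np.sum(S * np.log((S + eps) / (M + eps)) - S + M))

def ivd_factorize(S, K, iters=2000, seed=0, eps=1e-12):
    """Approximately minimize I(S || Sigma @ A) over nonnegative
    Sigma (IxK) and A (KxJ) via multiplicative updates, which are
    monotone for the generalized KL objective."""
    I, J = S.shape
    rng = np.random.default_rng(seed)
    Sigma = rng.random((I, K)) + eps
    A = rng.random((K, J)) + eps
    for _ in range(iters):
        R = S / (Sigma @ A + eps)                      # elementwise ratios
        Sigma *= (R @ A.T) / (A.sum(axis=1) + eps)     # update constituents
        R = S / (Sigma @ A + eps)
        A *= (Sigma.T @ R) / (Sigma.sum(axis=0)[:, None] + eps)
    return Sigma, A
```

For a nonnegative matrix of rank at most K the divergence is driven toward zero; in general the updates only decrease it monotonically.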
15 CT Imaging in Presence of High Density Attenuators Brachytherapy applicators After-loading colpostats for radiation oncology Cervical cancer: 50% survival rate Dose prediction important Object-Constrained Computed Tomography (OCCT) 15
16 Filtered Back Projection. FBP: inverse Radon transform. [Figure: Truth vs. FBP reconstruction]
17 Transmission Tomography. Source-detector pairs indexed by y; pixels indexed by x. Data d(y) Poisson with means g(y : μ); loglikelihood function
l(d : μ) = Σ_{y∈Y} [ d(y) ln g(y : μ) − g(y : μ) ],
g(y : μ) = Σ_E I_0(y, E) exp[ −Σ_x h(y|x) μ(x, E) ] + β(y),
with mean unattenuated counts I_0, mean background β, attenuation function μ(x, E), E indexing energies, and
μ(x, E) = Σ_{i=1}^I μ_i(E) c_i(x).
Maximize over μ or the c_i; equivalently, minimize the I-divergence. Comment: pose search, c(x) = c_a(x : θ) + c_b(x)
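To make the forward model concrete, here is a small hypothetical discretization of g(y : μ) and the Poisson loglikelihood; the array shapes and names are invented for illustration.

```python
import numpy as np

def forward_means(I0, h, mu, beta):
    """g[y] = sum_E I0[y,E] * exp(-sum_x h[y,x] * mu[x,E]) + beta[y].
    I0: (Y,E) unattenuated counts; h: (Y,X) system matrix;
    mu: (X,E) attenuation map per energy; beta: (Y,) mean background."""
    line_integrals = h @ mu                       # (Y,E) path integrals
    return (I0 * np.exp(-line_integrals)).sum(axis=1) + beta

def poisson_loglik(d, g):
    """l(d : mu) = sum_y [ d(y) ln g(y) - g(y) ], constants dropped."""
    return float(np.sum(d * np.log(g) - g))
```

Since t ↦ d ln t − t is maximized at t = d, the loglikelihood is largest when the predicted means g match the data, which is the equivalence with minimizing the I-divergence noted on the slide.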
18 Alternating Minimization Algorithms. Define the problem as min_q Φ(q). Derive a variational representation Φ(q) = min_{p∈P} J(p, q), where J is an auxiliary function and p ranges over an auxiliary set P. Result: the double minimization min_q min_p J(p, q). Alternating minimization algorithm:
p^(l+1) = argmin_{p∈P} J(p, q^(l)),
q^(l+1) = argmin_{q∈Q} J(p^(l+1), q).
Comments: guaranteed monotonicity; J selected carefully
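The double-minimization structure can be sketched generically. The quadratic J below is an invented toy instance, used only to show the monotonicity the slide claims: alternating the two exact partial minimizers makes J(p, q) nonincreasing.

```python
def alternating_minimization(J, p_step, q_step, q0, iters=50):
    """p^(l+1) = argmin_p J(p, q^(l)); q^(l+1) = argmin_q J(p^(l+1), q).
    Returns the final iterates and the (nonincreasing) history of J values."""
    q, history = q0, []
    for _ in range(iters):
        p = p_step(q)
        q = q_step(p)
        history.append(J(p, q))
    return p, q, history

# Toy instance: J(p, q) = (p - q)^2 + (p - 3)^2, jointly convex in (p, q).
J = lambda p, q: (p - q) ** 2 + (p - 3.0) ** 2
p_step = lambda q: (q + 3.0) / 2.0      # exact minimizer over p for fixed q
q_step = lambda p: p                    # exact minimizer over q for fixed p
p, q, hist = alternating_minimization(J, p_step, q_step, q0=0.0)
```

The iterates contract toward the joint minimizer p = q = 3, with J decreasing to 0 at every step.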
19 Alternating Minimization Algorithms: I-Divergence, Linear, Exponential Families. Special case of interest: J is the I-divergence. Families of interest: linear family L(A, b) = {p : Ap = b}; exponential family E(π, B) = {q : q_i = π_i exp[ Σ_j b_ij λ_j ]}.
p^(l+1) = argmin_{p∈L} I(p ‖ q^(l)),
q^(l+1) = argmin_{q∈E} I(p^(l+1) ‖ q).
Csiszár and Tusnády; Dempster, Laird, Rubin; Blahut; Richardson; Lucy; Vardi, Shepp, and Kaufman; Cover; Miller and Snyder; O'Sullivan
20 Alternating Minimization Example. Linear family: p_1 + p_2 = 2. Exponential family: q_1 = exp(v), q_2 = exp(−v). min_{p∈L} min_{q∈E} I(p ‖ q). [Figure: alternating projections between the two families in the (p_1, p_2) plane]
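The example above can be worked numerically. For the generalized I-divergence I(p ‖ q) = Σ_i [ p_i ln(p_i/q_i) − p_i + q_i ], the minimizer over the linear family {p : p_1 + p_2 = 2} is the rescaling p = 2q/(q_1 + q_2), and the minimizer over the exponential family q = (exp(v), exp(−v)) solves the stationarity condition 2 sinh v = p_1 − p_2. These two closed forms are my own derivation of the partial minimizations, not taken verbatim from the slides.

```python
import numpy as np

def i_div(p, q):
    """Generalized I-divergence for positive vectors."""
    return float(np.sum(p * np.log(p / q) - p + q))

def p_step(q):
    """I-projection of q onto the linear family {p : p1 + p2 = 2}."""
    return 2.0 * q / q.sum()

def q_step(p):
    """Minimize I(p || (exp(v), exp(-v))) over v: 2 sinh v = p1 - p2."""
    v = np.arcsinh((p[0] - p[1]) / 2.0)
    return np.array([np.exp(v), np.exp(-v)])

q = np.array([np.exp(1.0), np.exp(-1.0)])   # start at v = 1
hist = []
for _ in range(100):
    p = p_step(q)
    q = q_step(p)
    hist.append(i_div(p, q))
```

The divergences decrease monotonically and the iterates approach v = 0, i.e. p = q = (1, 1), where the two families intersect.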
21 Information Geometry. The I-divergence is nonnegative and convex in the pair (p, q); it generalizes relative entropy and is an example of an f-divergence. First triangle equality: for p* = argmin_{p∈L} I(p ‖ q) and any p in L,
I(p ‖ q) = I(p ‖ p*) + I(p* ‖ q).
Second triangle equality: for q* = argmin_{q∈E} I(p ‖ q) and any q in E,
I(p ‖ q) = I(p ‖ q*) + I(q* ‖ q)
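The first triangle equality can be checked numerically for the projection onto a sum-constrained linear family, where the I-projection is a simple rescaling; this concrete family and the closed-form projection are my additions for illustration.

```python
import numpy as np

def i_div(p, q):
    """Generalized I-divergence for positive vectors."""
    return float(np.sum(p * np.log(p / q) - p + q))

q = np.array([1.0, 3.0])                 # arbitrary positive point
L_sum = 2.0                              # linear family {p : p1 + p2 = 2}
p_star = L_sum * q / q.sum()             # argmin over L of I(p || q)
p = np.array([1.2, 0.8])                 # any other point of L

lhs = i_div(p, q)
rhs = i_div(p, p_star) + i_div(p_star, q)
```

Here lhs equals rhs to machine precision: the projection p* splits the divergence Pythagorean-style, exactly as the triangle equality states.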
22 Variational Representations. Convex decomposition lemma: let f be convex and r_i ≥ 0 with Σ_i r_i = 1; then f( Σ_i r_i x_i ) ≤ Σ_i r_i f(x_i). Special case, f = −ln:
−ln( Σ_i q_i ) = min_{p∈P} Σ_i p_i ln( p_i / q_i ), with P = { p : p_i ≥ 0, Σ_i p_i = 1 }.
Basis for EM; see also De Pierro, Lange, Fessler
23 Shrink-Wrap Algorithm for Endmembers. S = ΣA: Σ = endmembers, K columns; A = pixel mixture proportions. SVD, then simplex volume minimization. Fuhrmann, WU
24 Alternating Minimization Algorithms for Hyperspectral Imaging. [Block diagram: initial estimated coefficients → Poisson process model → alternating minimization algorithm → estimated coefficients.] S = ΣA: given Σ and S, estimate A. Uses the I-divergence discrepancy measure. Snyder, WU; O'Sullivan, WU
25 Information Value Decomposition Applied to Hyperspectral Data Downloaded spectra from USGS website 470 Spectral components Randomly generated A with 2000 columns Ran IVD on result 25
26 Information Value Decomposition Applied to Hyperspectral Data 26
27 Information Theoretic Imaging: Application to Hyperspectral Imaging Problem Definition Optimality Criterion Algorithm Development Performance Quantification Likelihood Models for Sensors Likelihood Models for Scenes Spectrum Estimation and Decomposition Performance Quantification Applications to Available Sensors 27
28 Hyperspectral Imaging Ongoing Efforts Scene Models Including Spatial and Wavelength Characteristics Sensor Models Including Apodization Orientation Dependence of Hyperspectral Signatures Expanded Complementary Efforts Atmospheric Propagation Models Performance Bounds Measures of Added Information 28
29 CT Minimum I-Divergence Formulation.
l(d : μ) = Σ_{y∈Y} [ d(y) ln g(y : μ) − g(y : μ) ],
g(y : μ) = Σ_E I_0(y, E) exp[ −Σ_x h(y|x) μ(x, E) ] + β(y).
Maximize the loglikelihood function ↔ minimize the I-divergence. Use the convex decomposition lemma: g is a marginal over
L(d) = { p(y, E) ≥ 0 : Σ_E p(y, E) = d(y) },
I( d ‖ g(· : c) ) = min_{p∈L(d)} I(p ‖ q)
30 New Alternating Minimization Algorithm for Transmission Tomography.
min_{q∈E(I_0, H, μ)} min_{p∈L(d)} I(p ‖ q), with iterates
q̂^(k)(y, E) = I_0(y, E) exp[ −Σ_x Σ_i μ_i(E) h(y|x) ĉ_i^(k)(x) ],
p̂^(k)(y, E) = q̂^(k)(y, E) d(y) / Σ_{E'} q̂^(k)(y, E'),
b̃_i^(k)(x) = Σ_{y∈Y} Σ_E μ_i(E) h(y|x) p̂^(k)(y, E),
b̂_i^(k)(x) = Σ_{y∈Y} Σ_E μ_i(E) h(y|x) q̂^(k)(y, E)
31 New Alternating Minimization Algorithm for Transmission Tomography (continued).
ĉ_i^(k+1)(x) = ĉ_i^(k)(x) − (1 / Z_i(x)) ln( b̃_i^(k)(x) / b̂_i^(k)(x) ).
Interpretation: compare predicted data to measured data via a ratio of backprojections; update the estimate using a normalization constant Z_i(x). Comments: choice of constants; monotonic convergence; linear convergence; constraints easily incorporated
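A single-material (I = 1) toy version of the iteration on the last two slides can be sketched as follows. The geometry, the conservative scalar choice of Z, and the omission of the background term β are my simplifications; this is a sketch of the update structure, not the authors' implementation.

```python
import numpy as np

def am_ct(d, I0, h, mu_E, iters=500):
    """Single-material alternating minimization for transmission CT.
    d: (Y,) measured counts; I0: (Y,E) unattenuated counts;
    h: (Y,X) system matrix; mu_E: (E,) attenuation spectrum.
    Returns the image c and the history of I(d || g) values."""
    c = np.zeros(h.shape[1])
    Z = mu_E.max() * h.sum(axis=1).max()          # conservative normalization
    hist = []
    for _ in range(iters):
        q = I0 * np.exp(-np.outer(h @ c, mu_E))   # predicted counts per energy
        p = q * (d / q.sum(axis=1))[:, None]      # match the measured marginal
        btilde = h.T @ (p @ mu_E)                 # backprojection of p
        bhat = h.T @ (q @ mu_E)                   # backprojection of q
        c = c - np.log(btilde / bhat) / Z         # update via log of the ratio
        g = q.sum(axis=1)
        hist.append(float(np.sum(d * np.log(d / g) - d + g)))
    return c, hist
```

On noiseless data generated from a known image, the I-divergence decreases toward zero and the recovered c approaches the truth, matching the monotone-convergence claim on the slide.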
32 Iterative Algorithm with Known Applicator Pose Truth Known Pose 32
33 OCCT Iterations Known Pose OCCT 33
34 [Figure: Truth; Known-Pose reconstruction; Standard FBP; OCCT]
35 Magnified views around brachytherapy applicator FBP Truth OCCT 35
36 Conclusions Information-Theoretic Imaging Alternating Minimization Algorithms Hyperspectral Imaging - Applying Formal Methodology - Data Collections - Sensor Modeling, Likelihoods - Broad, Ongoing Efforts CT Imaging - New Image Formation Algorithm - Simulations and Scanner Data - Modeling Issues: Physics Crucial 36
38 [Diagram] Physical phenomena: physical scene, physical sensors. Mathematical abstractions: scene model c ∈ C ⊂ L², sensor models. Reproduction space Ĉ. Performance criteria: loglikelihood l(c), I-divergence, distortions d(c, ĉ)
39 Problem Definition. Physical phenomena: electromagnetic waves, physical objects. Mathematical abstraction: function spaces (L²), dynamical models. Sensor models: projection models, statistics. Reproduction space: finite-dimensional approximation. Optimality criterion. Deterministic: least squares, minimum discrimination. Stochastic: maximum likelihood, MAP, MMSE
40 Algorithm Development. Iterative algorithms: alternating minimizations, expectation-maximization. Random search: simulated annealing, diffusion, jump-diffusion. Performance Quantification. Discrepancy measures: squared error, discrimination. Performance: bias, variance, mean discrimination. Performance bounds: Cramér-Rao, Hilbert-Schmidt
41 Target Classification Problem. Given a SAR image r, determine a corresponding target class â (e.g., â = T72). Target Orientation Estimation. Given a SAR image r and a target class a, estimate the target orientation θ̂ (e.g., θ̂ = 135°). Likelihood model approach: p_{R|Θ,A}(r | θ, a), the conditional data model; p_{Θ|A}(θ | a), the prior on orientation (known or simply uniform); P(a), the prior on target class (known or simply uniform)
42 Training and Testing Problem: Function Estimation and Classification. Training data plus scene and sensor physics → function estimation; raw data → processing → image → L(r | a, θ) → inference (â = T72). Labeled training data: target type and pose. Log-likelihood parameterized by a function: mean image, variance image, etc.
43 Function Estimation and Classification Functions are estimated from - sample data sets only - physical model datasets only (PRISM, XPATCH, etc.) - combination of these Training sets are finite Computational and likelihood models have a finite number of parameters Estimation error Approximation error Some regularization is needed 43
44 Function Estimation. Let S and T be two target types, with f_S and f_T the corresponding functions, f_S : D → R or R_+. Spatial domain D: surface of the object, or a rectangular region in R²; continuous. Pose space Θ: typically SE(3); simpler to consider SE(2) or SO(2); Θ is continuous. Assume a null function f_0
45 Function Estimation: Sieve. Function space F; family of subsets F_λ ⊂ F such that a maximum-likelihood estimate of f exists for each F_λ (Ulf Grenander). Spline sieves: λ = 1/Q, where Q = number of coefficients. Nonnegative functions, nonnegative expansions (Pierre Moulin, et al.): f(θ) = Σ_k a(k) φ_k(θ), a(k) ≥ 0
46 Sieve Geometry. [Figure: target function f_S in F, with estimates f_1 ∈ F_1 and f_2 ∈ F_2 from nested sieve subsets]
47 Discrepancy Measures. Estimation: relative entropy between parameterized distributions, d(f_S, f_λ) — the approximation error; expected relative entropy for estimated functions, E{d(f_λ, f̂)} — the estimation error. Classification: probability of error, rate functions; decompose into approximation and estimation error. Examine performance as a function of λ; optimize
48 Motivation Many reported approaches to ATR from SAR Performance and database complexity are interrelated We seek to provide a framework for comparison that: - Allows direct comparison under identical conditions - Removes dependency on implementation details 2S1 T62 BTR 60 D7 ZIL 131 ZSU 23/4 Publicly available SAR data from the MSTAR program 48
49 Approaches: Conditionally Gaussian. Model each pixel as complex Gaussian plus uncorrelated noise:
p_{R|Θ,A}(r | θ, a) = Π_i [ 1 / (π (K_i(θ, a) + N_0)) ] exp[ −|r_i|² / (K_i(θ, a) + N_0) ].
GLRT classification and MAP estimation:
â_Bayes(r) = argmax_a max_k p̂(r | θ_k, a),
θ̂_HS(r, a) = argmax_{θ_k} p̂(r | θ_k, a).
J. A. O'Sullivan and S. Jacobs, IEEE-AES 2000; J. A. O'Sullivan, M. D. DeVore, V. Kedia, and M. Miller, IEEE-AES, to appear
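A minimal sketch of the conditionally Gaussian GLRT above, assuming zero-mean complex pixels with variance K_i(θ, a) + N_0; the template construction and function names are invented for illustration.

```python
import numpy as np

def cg_loglik(r, K, N0):
    """log p(r | theta, a) for independent complex pixels r_i ~ CN(0, K_i + N0)."""
    v = K + N0
    return float(np.sum(-np.log(np.pi * v) - np.abs(r) ** 2 / v))

def glrt_classify(r, templates, N0):
    """GLRT: maximize the loglikelihood over orientation windows, then over class.
    templates: dict mapping class label -> list of variance images K(theta_k, a)."""
    scores = {a: max(cg_loglik(r, K, N0) for K in windows)
              for a, windows in templates.items()}
    return max(scores, key=scores.get)
```

Drawing an image from one class's variance template and classifying it recovers that class with overwhelming probability when the templates are well separated.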
50 Approach Select 240 combinations of implementation parameters Execute algorithms at each parameterization Scatter plot the performance-complexity pairs Determine the best achievable performance at any complexity BMP2 Variance Image at 6 Sizes ZIL131 Variance Image at 6 Sizes Six different image sizes from 128x128 to 48x48 50
51 System Parameters and Complexity. Approximate μ(θ, a) and σ²(θ, a) as piecewise constant in θ. Implementations parameterized by: w — number of constant intervals in θ; d — width of training intervals in θ; N² — number of pixels in an image. Database complexity = log₁₀(# floating point values / target type)
52 Performance and Complexity. Forty combinations of angular resolution and training interval width. [Figures: variance image of a T62 tank, one window trained over 360°; variance images of a T72 tank, 72 windows trained over 10° each]
53 Problem Statement Directly compare conditionally Gaussian, log-magnitude MSE, and quarter power MSE ATR Algorithms - identical training and testing data - identical spatial and orientation windows Plot performance vs. complexity - probability of classification error - orientation estimation error - log-database size as complexity Use 10 class MSTAR SAR images Approach Select 240 combinations of implementation parameters Execute algorithms at each parameterization Scatter plot the performance-complexity pairs Determine the best achievable performance at any complexity 53
54 Approaches: Log-Magnitude. Minimize the distance between r_dB = 20 log₁₀ |r| and dB templates μ_LM. Make decisions according to:
â_LM(r) = argmin_a min_k d²( r_dB, μ_LM(θ_k, a) ),
θ̂_LM(r, a) = argmin_k d²( r_dB, μ_LM(θ_k, a) ).
Alternatively, use a form of normalization:
d²( r_dB − r̄_dB, μ_LM(θ_k, a) − μ̄_LM(θ_k, a) ).
G. Owirka and L. Novak, SPIE 2230, 1994; L. Novak, et al., IEEE-AES, Jan.
55 Approaches: Quarter Power. Minimize the distance between r_QP = |r|^{1/2} and quarter-power templates μ_QP. Make decisions according to:
â_QP(r) = argmin_a min_k d²( r_QP, μ_QP(θ_k, a) ),
θ̂_QP(r, a) = argmin_k d²( r_QP, μ_QP(θ_k, a) ).
Or, normalized by vector magnitude:
d²( r_QP / ‖r_QP‖, μ_QP(θ_k, a) / ‖μ_QP(θ_k, a)‖ ).
S. W. Worrell, et al., SPIE 3070, 1997; discussions with M. Bryant of Wright Laboratory
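The two template matchers on the last two slides share one structure, differing only in the feature map applied to the complex image; a minimal sketch, where the function names and toy templates are my own:

```python
import numpy as np

def db_features(r, eps=1e-12):
    """Log-magnitude features: r_dB = 20 log10 |r|."""
    return 20.0 * np.log10(np.abs(r) + eps)

def qp_features(r):
    """Quarter-power features: square root of the pixel magnitude."""
    return np.sqrt(np.abs(r))

def mse_classify(feat, templates):
    """Pick the class whose best orientation template minimizes squared distance.
    templates: dict mapping class label -> list of feature templates."""
    d2 = {a: min(float(np.sum((feat - t) ** 2)) for t in ts)
          for a, ts in templates.items()}
    return min(d2, key=d2.get)
```

An image matching one class's template exactly is assigned to that class, since its squared distance is zero; the normalized variants on the slides subtract the mean (log-magnitude) or divide by the vector norm (quarter power) before the same comparison.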
56 Conditionally Gaussian Results 56
57 Performance-Complexity Legend Forty combinations of number of piecewise constant intervals and training window width 57
58 Log-Magnitude Results Recognition without normalization Arithmetic mean normalized 58
59 Quarter Power Results Recognition without normalization Recognition with normalization 59
60 Normalized Conditionally Gaussian Results 60
61 Side-by-Side Results. Comparison in terms of: performance achievable at a given complexity; complexity required to achieve a given performance
62 Target Classification Results. Probability of correct classification: 97.2%. [Confusion matrix over the ten MSTAR classes — 2S1, BMP2, BRDM2, BTR60, BTR70, D7, T62, T72, ZIL131, ZSU23/4 — with per-class percentages; numerical entries lost in transcription]
63 Conclusions General problem in training/testing posed as estimation/classification Method of sieves (polynomial splines chosen) Comprehensive performance-complexity study for - Ten class MSTAR problem - Conditionally Gaussian model - Log-magnitude MSE - Quarter power MSE Provided a framework for direct comparison of alternatives and selection of implementation parameters Analysis ongoing 63
Introduction to Machine Learning Lecture 14 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Density Estimation Maxent Models 2 Entropy Definition: the entropy of a random variable
More informationZ. Hradil, J. Řeháček
Likelihood and entropy for quantum tomography Z. Hradil, J. Řeháček Department of Optics Palacký University,Olomouc Czech Republic Work was supported by the Czech Ministry of Education. Collaboration SLO
More informationVariable selection for model-based clustering
Variable selection for model-based clustering Matthieu Marbac (Ensai - Crest) Joint works with: M. Sedki (Univ. Paris-sud) and V. Vandewalle (Univ. Lille 2) The problem Objective: Estimation of a partition
More informationAdaptive Multi-Modal Sensing of General Concealed Targets
Adaptive Multi-Modal Sensing of General Concealed argets Lawrence Carin Balaji Krishnapuram, David Williams, Xuejun Liao and Ya Xue Department of Electrical & Computer Engineering Duke University Durham,
More informationNaïve Bayes Introduction to Machine Learning. Matt Gormley Lecture 18 Oct. 31, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Naïve Bayes Matt Gormley Lecture 18 Oct. 31, 2018 1 Reminders Homework 6: PAC Learning
More informationActive and Semi-supervised Kernel Classification
Active and Semi-supervised Kernel Classification Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London Work done in collaboration with Xiaojin Zhu (CMU), John Lafferty (CMU),
More informationQUANTIZATION FOR DISTRIBUTED ESTIMATION IN LARGE SCALE SENSOR NETWORKS
QUANTIZATION FOR DISTRIBUTED ESTIMATION IN LARGE SCALE SENSOR NETWORKS Parvathinathan Venkitasubramaniam, Gökhan Mergen, Lang Tong and Ananthram Swami ABSTRACT We study the problem of quantization for
More informationSubmodular Functions Properties Algorithms Machine Learning
Submodular Functions Properties Algorithms Machine Learning Rémi Gilleron Inria Lille - Nord Europe & LIFL & Univ Lille Jan. 12 revised Aug. 14 Rémi Gilleron (Mostrare) Submodular Functions Jan. 12 revised
More informationp(d θ ) l(θ ) 1.2 x x x
p(d θ ).2 x 0-7 0.8 x 0-7 0.4 x 0-7 l(θ ) -20-40 -60-80 -00 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ θ x FIGURE 3.. The top graph shows several training points in one dimension, known or assumed to
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationAN NONNEGATIVELY CONSTRAINED ITERATIVE METHOD FOR POSITRON EMISSION TOMOGRAPHY. Johnathan M. Bardsley
Volume X, No. 0X, 0X, X XX Web site: http://www.aimsciences.org AN NONNEGATIVELY CONSTRAINED ITERATIVE METHOD FOR POSITRON EMISSION TOMOGRAPHY Johnathan M. Bardsley Department of Mathematical Sciences
More informationStatistical Learning Theory and the C-Loss cost function
Statistical Learning Theory and the C-Loss cost function Jose Principe, Ph.D. Distinguished Professor ECE, BME Computational NeuroEngineering Laboratory and principe@cnel.ufl.edu Statistical Learning Theory
More informationOutline of Today s Lecture
University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Jeff A. Bilmes Lecture 12 Slides Feb 23 rd, 2005 Outline of Today s
More informationMarkov Chains and Hidden Markov Models
Chapter 1 Markov Chains and Hidden Markov Models In this chapter, we will introduce the concept of Markov chains, and show how Markov chains can be used to model signals using structures such as hidden
More informationREAL-TIME TIME-FREQUENCY BASED BLIND SOURCE SEPARATION. Scott Rickard, Radu Balan, Justinian Rosca. Siemens Corporate Research Princeton, NJ 08540
REAL-TIME TIME-FREQUENCY BASED BLIND SOURCE SEPARATION Scott Rickard, Radu Balan, Justinian Rosca Siemens Corporate Research Princeton, NJ 84 fscott.rickard,radu.balan,justinian.roscag@scr.siemens.com
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 20: Expectation Maximization Algorithm EM for Mixture Models Many figures courtesy Kevin Murphy s
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION
COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:
More informationLogistic Regression. William Cohen
Logistic Regression William Cohen 1 Outline Quick review classi5ication, naïve Bayes, perceptrons new result for naïve Bayes Learning as optimization Logistic regression via gradient ascent Over5itting
More informationPART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics
Table of Preface page xi PART I INTRODUCTION 1 1 The meaning of probability 3 1.1 Classical definition of probability 3 1.2 Statistical definition of probability 9 1.3 Bayesian understanding of probability
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationBayesian Modeling and Classification of Neural Signals
Bayesian Modeling and Classification of Neural Signals Michael S. Lewicki Computation and Neural Systems Program California Institute of Technology 216-76 Pasadena, CA 91125 lewickiocns.caltech.edu Abstract
More informationMAP Reconstruction From Spatially Correlated PET Data
IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL 50, NO 5, OCTOBER 2003 1445 MAP Reconstruction From Spatially Correlated PET Data Adam Alessio, Student Member, IEEE, Ken Sauer, Member, IEEE, and Charles A Bouman,
More informationDetection and Estimation Theory
ESE 54 Detection and Estiation Theory Joseph A. O Sullivan Sauel C. Sachs Professor Electronic Systes and Signals Research Laboratory Electrical and Systes Engineering Washington University 11 Urbauer
More informationLearning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014
Learning with Noisy Labels Kate Niehaus Reading group 11-Feb-2014 Outline Motivations Generative model approach: Lawrence, N. & Scho lkopf, B. Estimating a Kernel Fisher Discriminant in the Presence of
More informationThe Method of Types and Its Application to Information Hiding
The Method of Types and Its Application to Information Hiding Pierre Moulin University of Illinois at Urbana-Champaign www.ifp.uiuc.edu/ moulin/talks/eusipco05-slides.pdf EUSIPCO Antalya, September 7,
More informationScan Time Optimization for Post-injection PET Scans
Presented at 998 IEEE Nuc. Sci. Symp. Med. Im. Conf. Scan Time Optimization for Post-injection PET Scans Hakan Erdoğan Jeffrey A. Fessler 445 EECS Bldg., University of Michigan, Ann Arbor, MI 4809-222
More informationCS340 Winter 2010: HW3 Out Wed. 2nd February, due Friday 11th February
CS340 Winter 2010: HW3 Out Wed. 2nd February, due Friday 11th February 1 PageRank You are given in the file adjency.mat a matrix G of size n n where n = 1000 such that { 1 if outbound link from i to j,
More informationExpectation propagation for signal detection in flat-fading channels
Expectation propagation for signal detection in flat-fading channels Yuan Qi MIT Media Lab Cambridge, MA, 02139 USA yuanqi@media.mit.edu Thomas Minka CMU Statistics Department Pittsburgh, PA 15213 USA
More informationMachine Learning Lecture Notes
Machine Learning Lecture Notes Predrag Radivojac January 25, 205 Basic Principles of Parameter Estimation In probabilistic modeling, we are typically presented with a set of observations and the objective
More informationLINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
LINEAR CLASSIFIERS Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification, the input
More informationStatistical Filters for Crowd Image Analysis
Statistical Filters for Crowd Image Analysis Ákos Utasi, Ákos Kiss and Tamás Szirányi Distributed Events Analysis Research Group, Computer and Automation Research Institute H-1111 Budapest, Kende utca
More informationNo. of dimensions 1. No. of centers
Contents 8.6 Course of dimensionality............................ 15 8.7 Computational aspects of linear estimators.................. 15 8.7.1 Diagonalization of circulant andblock-circulant matrices......
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More informationPart 4: Conditional Random Fields
Part 4: Conditional Random Fields Sebastian Nowozin and Christoph H. Lampert Colorado Springs, 25th June 2011 1 / 39 Problem (Probabilistic Learning) Let d(y x) be the (unknown) true conditional distribution.
More informationMultivariate Regression
Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the
More informationBayesian Learning. Two Roles for Bayesian Methods. Bayes Theorem. Choosing Hypotheses
Bayesian Learning Two Roles for Bayesian Methods Probabilistic approach to inference. Quantities of interest are governed by prob. dist. and optimal decisions can be made by reasoning about these prob.
More informationHidden Markov Models
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Hidden Markov Models Matt Gormley Lecture 22 April 2, 2018 1 Reminders Homework
More information