Understanding Statistical and Entropy-Based Image Similarity Measures (ISM)
Sarju Vaz, Georgia Institute of Technology
[This document is a formalization of personal notes and was created for internal use]

Excerpts from Elements of Information Theory [Cover & Thomas]:

Entropy is originally defined in thermodynamics.

H(X) = Σ_x p(x) * log2(1/p(x)) = -Σ_x p(x) * log2 p(x)

It arises as the answer to a number of natural questions, such as: what is the average length of the shortest description of the random variable? The entropy of a random variable is a measure of the uncertainty of the random variable; it is a measure of the amount of information required on average to describe the random variable, i.e. the average self-information.

Lemma 2.1.1: H(X) >= 0
Proof: 0 <= p(x) <= 1 implies log2(1/p(x)) >= 0

Theorem 2.6.4 [C&T]: The uniform distribution over a given alphabet yields the maximal entropy for that alphabet. The entropy of a uniformly distributed random variable is log2(alphabet_size).

The naturalness of the definition of joint entropy and conditional entropy is exhibited by the fact that the entropy of a pair of random variables is the entropy of one plus the conditional entropy of the other.

Chain Rule: H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)   [C&T 2.45]
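
A minimal MATLAB sketch of the entropy computation for an 8-bit image, in the spirit of the MATLAB snippet later in these notes; the variable gray_im (a 0-255 grayscale image) is an assumed input, not something defined in the notes:

% Sketch: entropy (in bits) of an 8-bit grayscale image, from its normalized histogram.
% gray_im: assumed input image with integer gray levels 0..255.
counts = accumarray(double(gray_im(:)) + 1, 1, [256 1]);   % 256-bin gray-level histogram
p = counts / sum(counts);                                  % empirical pmf p(x)
p = p(p > 0);                                              % drop empty bins (0*log2(0) -> 0)
H = -sum(p .* log2(p));                                    % H(X) = -sum_x p(x) log2 p(x), so 0 <= H <= 8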

Figure 1. Entropies for the green plane from the Kodak PCD0992 standard test set.
Note the gray levels are 0~255; number of levels = 256 = 2^8.
For this example: 0 <= H <= log2(256) = 8 bits.
High entropy in sub-image 23 may be attributed to its evident high-frequency content, but in sub-image 24 the high entropy is due to shading. Low-entropy sub-images here (ex: 2, 17) are relatively bland. One can't infer entropy from the frequency spectrum, although there is high coincidence.
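
The per-sub-image entropies in Figure 1 can be reproduced along these lines. A sketch only: the 128x128 block size and the synthetic stand-in for the green plane are assumptions, not values from the notes:

% Sketch: entropy of each non-overlapping B-by-B block of a green plane.
green = randi([0 255], 512, 768);      % stand-in for the actual green plane (assumed)
B = 128;                               % assumed sub-image size
[nRows, nCols] = size(green);
blockH = zeros(floor(nRows/B), floor(nCols/B));
for r = 1:size(blockH, 1)
    for c = 1:size(blockH, 2)
        blk = green((r-1)*B+1 : r*B, (c-1)*B+1 : c*B);
        p = accumarray(blk(:) + 1, 1, [256 1]) / numel(blk);   % pmf of the block
        p = p(p > 0);
        blockH(r, c) = -sum(p .* log2(p));                     % block entropy, 0..8 bits
    end
end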

Mutual Information:

Def: Mutual information is the relative entropy between the joint distribution p(x,y) and the product distribution p(x)p(y) [C&T 18].

Theorem [C&T 20]:
I(X;Y) = H(X) - H(X|Y)
I(X;Y) = H(Y) - H(Y|X)
I(X;Y) = H(X) + H(Y) - H(X,Y)
I(X;Y) = I(Y;X)
I(X;X) = H(X)

[Figure 2.2 in C&T]
Figure 2. Relationship between entropy and mutual information

Mutual information is used as an ISM. Maximal image similarity maximizes this quantity.

Theorem 2.6.3 (Information inequality): D(p||q) >= 0, where D denotes relative entropy. Proof in C&T 26.

Corollary to this theorem (non-negativity of mutual information): I(X;Y) >= 0, with equality iff X and Y are independent.
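
These identities can be evaluated for a pair of same-size 8-bit images directly from their joint gray-level histogram. A minimal MATLAB sketch, where im1 and im2 (same-size images with integer gray levels 0-255) are assumed inputs:

% Sketch: I(X;Y) = H(X) + H(Y) - H(X,Y) from the joint histogram of two images.
% im1, im2: assumed same-size input images with integer gray levels 0..255.
x = double(im1(:)) + 1;  y = double(im2(:)) + 1;           % gray levels shifted to 1..256
Pxy = accumarray([x y], 1, [256 256]) / numel(x);          % joint pmf p(x,y)
Px = sum(Pxy, 2);  Py = sum(Pxy, 1);                       % marginal pmfs p(x), p(y)
Hf = @(p) -sum(p(p > 0) .* log2(p(p > 0)));                % entropy of a pmf, in bits
MI = Hf(Px) + Hf(Py) - Hf(Pxy(:));                         % mutual information I(X;Y)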

Proof: I(X;Y) = D(p(x,y) || p(x)p(y)) >= 0, with equality iff p(x,y) = p(x)p(y), i.e., X and Y are independent.

Since relative entropy is not immediately intuitive to most of us, let's approach the lower bound via educated deductive reasoning.

Case 1: X and Y are independent -> no mutual information -> lower bound
Then, as a result of independence, H(X,Y) = H(X) + H(Y). Note the logarithmic relationship with respect to the joint probability when the variables are independent.
Now, I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X) + H(Y) - [H(X) + H(Y)] = 0
** Refer to the Venn diagram

Case 2 (refer the theorem above): Y = X -> all the uncertainty in one variable is resolved by observing the other -> maximal mutual information -> upper bound
** Refer to the Venn diagram
Here, p(x,y) = p(x,x) = p(x).
Now, I(X;Y) = I(X;X) = H(X).
So, the upper bound is the entropy of the target image. According to theorem 2.6.4, this value is bounded by log2(alphabet_size). Note that this is not directly the upper bound, but an upper bound on the upper bound.

Ex: For a binary image, the entropy bounds are [0, 1]. For a 0~255 gray-scale image, the entropy bounds are [0, 8]. For a 24-bit color image, the entropy bounds are [0, 24].

Again, we can establish the upper entropy bound for the image, but the mutual information upper bound, while it will not exceed the entropy bound, is strictly dependent upon the image in question. We need to be mindful of this if we normalize this quantity.
** Normalizing this quantity in a mathematical sense DOES NOT yield what is commonly referred to as NMI below.
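
A quick numeric check of the two cases, using the joint-histogram estimate of mutual information from the sketch above; the random test images here are assumptions:

% Sketch: numeric check of the MI lower/upper bounds on synthetic 8-bit images.
A = randi([0 255], 256, 256);          % assumed test image X
B = randi([0 255], 256, 256);          % second image Y, generated independently of X
Hf  = @(p) -sum(p(p > 0) .* log2(p(p > 0)));                 % entropy of a pmf, in bits
MIf = @(Pxy) Hf(sum(Pxy, 2)) + Hf(sum(Pxy, 1)) - Hf(Pxy(:)); % MI from a joint pmf
Pab = accumarray([A(:)+1, B(:)+1], 1, [256 256]) / numel(A);
Paa = accumarray([A(:)+1, A(:)+1], 1, [256 256]) / numel(A);
MIf(Pab)          % Case 1 (independent): near 0, apart from finite-sample bias
MIf(Paa)          % Case 2 (Y = X): equals H(A), the entropy of the image
Hf(sum(Paa, 2))   % H(A), for comparison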

Normalized Mutual Information: a measure of "1-to-1 ness"

NMI(X,Y) = [H(X) + H(Y)] / H(X,Y)

Entropy is NON-NEGATIVE (Lemma 2.1.1). No practical image has an entropy of 0:
If entropy = 0 -> p(x) = 1 for a single gray level -> monochrome image -> no "imageness"/structures in the considered canvas.

Lower bound of NMI:
From theorem 2.6.3 and the corollary to that theorem:
I(X;Y) = H(X) + H(Y) - H(X,Y) >= 0
=> H(X,Y) <= H(X) + H(Y), i.e., denominator of NMI <= numerator of NMI
Also, both numerator and denominator are non-negative
=> NMI >= 1
Equality occurs when H(X,Y) = H(X) + H(Y), i.e., X and Y are independent.
** Refer to the Venn diagram

The reverse is an intuitive reasoning. X and Y are independent -> lower bound. Then, as a result of independence, H(X,Y) = H(X) + H(Y), so
NMI(X,Y) = [H(X) + H(Y)] / [H(X) + H(Y)] = 1
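
A minimal MATLAB sketch of this quantity, reusing the joint histogram from the earlier sketch; im1 and im2 are again assumed same-size images with integer gray levels 0-255:

% Sketch: NMI(X,Y) = (H(X) + H(Y)) / H(X,Y) from the joint gray-level histogram.
% im1, im2: assumed same-size input images with integer gray levels 0..255.
Pxy = accumarray([double(im1(:))+1, double(im2(:))+1], 1, [256 256]) / numel(im1);
Hf  = @(p) -sum(p(p > 0) .* log2(p(p > 0)));
NMI = (Hf(sum(Pxy, 2)) + Hf(sum(Pxy, 1))) / Hf(Pxy(:));    % lies in [1, 2]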

Upper bound of NMI:
From the chain rule: H(X,Y) = H(X) + H(Y|X)   [C&T 2.45]
                            = H(Y) + H(X|Y)
=> 2 * H(X,Y) = H(X) + H(Y) + H(X|Y) + H(Y|X)   (this will be used in the proof below)
** Also from the Venn diagram
=> H(X) + H(Y) = 2 * H(X,Y) - [H(X|Y) + H(Y|X)]
=> NMI(X,Y) = [H(X) + H(Y)] / H(X,Y) = 2 - [H(X|Y) + H(Y|X)] / H(X,Y)
=> NMI(X,Y) <= 2, since the conditional entropies and H(X,Y) are non-negative.

The deductive reasoning approach: If Y = X -> all the uncertainty in one variable is resolved by observing the other -> maximal mutual information -> upper bound.
Here p(x,y) = p(x,x) = p(x). This means that H(X,Y) = H(X) and H(Y) = H(X).
Now, NMI(X,Y) = NMI(X,X) = [H(X) + H(X)] / H(X) = 2

CAUTION: This is not the whole picture!!! We have shown that if X = Y then NMI(X,Y) = 2, but this is not necessarily true the other way around. As an ISM, NMI = 2 implies a 1-to-1 relationship in the scatter plot of X and Y only. This does not necessarily guarantee maximal image similarity, depending on the definition of image similarity. Ex: apply a random permutation to the graymap.

Figure 3. Scatter plot of a random permutation applied to the graymap of the parrots image (next page). Perfect 1-to-1 ness is maintained.

MATLAB code:
[ignore, my_map] = sort(rand(1, 256));
mapped_img = my_map(gray_im + 1) - 1;
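
Continuing that snippet, one can check numerically that a random 1-to-1 remapping of the gray levels leaves NMI at its maximum, even though the remapped image looks nothing like the original. A sketch only; gray_im is the assumed 0-255 graymap, held as a double array:

% Sketch: NMI between an image and a randomly remapped (1-to-1) copy of it.
% gray_im: assumed 0..255 graymap, stored as a double array.
[ignore, my_map] = sort(rand(1, 256));                 % random permutation of the 256 levels
mapped_img = my_map(gray_im + 1) - 1;                  % remapped image, still 0..255
Pxy = accumarray([gray_im(:)+1, mapped_img(:)+1], 1, [256 256]) / numel(gray_im);
Hf  = @(p) -sum(p(p > 0) .* log2(p(p > 0)));
NMI = (Hf(sum(Pxy, 2)) + Hf(sum(Pxy, 1))) / Hf(Pxy(:)) % a 1-to-1 remap keeps NMI at 2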

Figure 4. Random 1-to-1 mapping of gray levels. Notice that maximal NMI is maintained, but image similarity is questionable. Structures may be recognizable based more on the extent of monochrome regions than on the entropy of the image.

Normalized Cross Correlation, a.k.a. Correlation Coefficient, ρ: a measure of "straight-line ness" in the scatter plot of X and Y.
Note: this is not the cross-correlation which is then normalized.

NCC(X,Y) = Σ_i (x_i - µ_X)(y_i - µ_Y) / sqrt( Σ_i (x_i - µ_X)^2 * Σ_i (y_i - µ_Y)^2 )

As an ISM, maximal image similarity is achieved when the absolute value of this quantity is maximized.

Def: A correlation coefficient is a quantitative assessment of the strength of the linear relationship between x and y values in a set of (x, y) pairs [Devore & Farnum]. A lot can be understood by gazing at the equation.

Note: the equations below pertain to random variables. While the relationships are indicative of the relationships when considering deterministic images, they need to be formally adapted for the deterministic case.

NCC(X,Y) = E[ ((X - µ_X)/σ_X) * ((Y - µ_Y)/σ_Y) ] = cov(X,Y) / (σ_X * σ_Y), where E is the expectation

cov(X,Y) = E[XY] - µ_X * µ_Y, where E[XY] is the correlation between X and Y.
If E[XY] = 0, X and Y are said to be orthogonal.

Inferences between covariance, correlation and independence:
* If X and Y are either independent OR uncorrelated, then E[XY] = E[X]*E[Y].
* E[XY] = E[X]*E[Y] <=> cov(X,Y) = 0 <=> X and Y are uncorrelated. The existence of this condition alone does NOT guarantee that X and Y are independent.
* Independent => uncorrelated, but not necessarily the other way around.
* If Y = X, cov(X,Y) = cov(X,X) = var(X).
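
A minimal MATLAB sketch of the deterministic-image form of NCC; im1 and im2 are assumed same-size input images:

% Sketch: normalized cross correlation (correlation coefficient) of two images.
% im1, im2: assumed same-size input images, treated as deterministic pixel arrays.
a = double(im1(:)) - mean(double(im1(:)));       % zero-mean pixel values of image 1
b = double(im2(:)) - mean(double(im2(:)));       % zero-mean pixel values of image 2
NCC = sum(a .* b) / sqrt(sum(a.^2) * sum(b.^2)); % lies in [-1, 1]
% Equivalently: C = corrcoef(double(im1(:)), double(im2(:))); NCC = C(1, 2);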

Standard proof of bounds [G.T. Zhou, ECE, Georgia Tech]:
Note: this is a random-variable proof and needs to be adapted for deterministic images.

Let ε = (X - µ_X)/σ_X and η = (Y - µ_Y)/σ_Y, so that ρ = E[εη]. We need bounds on E[εη].
E[(ε ± η)^2] >= 0
E[ε^2 ± 2εη + η^2] >= 0
E[ε^2] ± 2E[εη] + E[η^2] >= 0
1 ± 2ρ + 1 >= 0
2 ± 2ρ >= 0
=> ρ >= -1 and ρ <= 1, i.e., |ρ| <= 1

Properties of ρ [paraphrased from D&F]:
1. -1 <= ρ <= 1, and ρ does not depend on the unit of measurement of either x or y, or on which variable is labeled x and which is labeled y.
2. ρ = +1 or ρ = -1 iff all (x, y) pairs lie exactly on a straight line in the scatter plot, so ρ measures the extent to which there is a linear relationship between the two images. If ρ = +1, the scatter plot forms a straight line with a positive slope, while if ρ = -1, the scatter plot forms a straight line with a negative slope. So the sign of ρ suggests the direction of the slope, while the magnitude of ρ is indicative of the straight-line ness of the scatter plot. Therefore a negative ρ suggests just as strong a linear relationship as a positive ρ of the same magnitude.

** A low magnitude of ρ does NOT rule out a strong relationship between x and y: there may still be a strong non-linear relationship. Ex: when some gamma correction (histogram equalization?) is needed before image registration. NMI is robust to this!!!
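
The starred remark can be checked numerically: a strong but non-linear (here non-monotone) dependence between the gray levels drives ρ toward 0 while NMI remains well above its lower bound. A sketch on assumed synthetic data:

% Sketch: |rho| collapses under a strong non-linear dependence, NMI does not.
x = randi([0 255], 256, 256);                     % assumed test image
y = round((x - 128).^2 / 64);                     % deterministic, non-linear function of x
a = x(:) - mean(x(:));  b = y(:) - mean(y(:));
rho = sum(a .* b) / sqrt(sum(a.^2) * sum(b.^2))   % near 0: symmetry cancels the linear part
Pxy = accumarray([x(:)+1, y(:)+1], 1) / numel(x); % joint histogram (bin count set automatically)
Hf  = @(p) -sum(p(p > 0) .* log2(p(p > 0)));
NMI = (Hf(sum(Pxy, 2)) + Hf(sum(Pxy, 1))) / Hf(Pxy(:))   % well above 1: the dependence is strong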

Figure 5. Random 1-to-1 mapping of gray levels

Figure 6. 1-to-1 and 1-to-2 mapping of gray levels, disallowing any spread in the scatter plot

Figure 7. Constant scaling and shifting of gray levels with increasing random spread (the spread is limited to 16 & 64 in mappings 3 & 4 respectively; all 4 mappings are 1-to-1)

Figure 8. Negative scaling of gray levels with increasing random spread (the spread is limited to 16 & 64 in mappings 3 & 4 respectively; all 4 mappings are 1-to-1)

Figure 9. Quadratic scaling of gray levels with increasing random spread (the spread is limited to 16 & 64 in mappings 3 & 4 respectively; all 4 mappings are 1-to-1)

Figure 10. Cubic scaling of gray levels with increasing random spread (the spread is limited to 16 & 64 in mappings 3 & 4 respectively; all 4 mappings are 1-to-1)

Figure 11. 1-to-2 linear mapping of gray levels with increasing random spread (the spread is limited to 16 & 64 in mappings 3 & 4 respectively)

Figure 12. 1-to-2 quadratic mapping of gray levels with increasing random spread (the spread is limited to 16 & 64 in mappings 3 & 4 respectively)

Figure 13. 1-to-2 mappings of gray levels with 0 and 64 spread. NMI appears to decrease as the spread increases if the mapping is not 1-to-1.

So, what does this mean for medical imaging non-rigid registration?

Figure 14. Deformation pattern overlaid on the undeformed image, and relative performance of MSD, NCC and NMI as the degree of deformation λ is varied

NMI is the ISM of choice if the maximum deviation is within 5 voxel side-lengths. NCC is more robust than MSD, and can be used for automatic registration for maximum deviations up to ~15 voxel side-lengths.

Figure 15. Image similarity is??? [image source: ]

The definition of image similarity is still open.

Pros, cons & comparisons of ISMs:
Both NMI and NCC are good because they have inherent bounds; this gives us a sense of how close the images are without any normalization, as is the case with MSE.
NMI is a measure of 1-to-1 ness, while NCC is a measure of straight-line ness of the scatter plot.
High NCC -> high NMI, but not necessarily the other way around.
Where NCC = -1 suggests the highest image similarity in its context, MSD is more likely to yield a high value (suggesting low similarity), compared to the MSD in a situation where NCC ~ 0.
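
The last point can be made concrete with a contrast-inverted copy of an image, Y = 255 - X: NCC = -1 and NMI = 2 (a perfect straight-line, 1-to-1 relationship), yet MSD is large. A MATLAB sketch on assumed synthetic data; the MSD, NCC and NMI computations below are the ad-hoc inline versions used throughout these sketches, not library routines:

% Sketch: MSD, NCC and NMI for a contrast-inverted copy of an image.
x = randi([0 255], 256, 256);              % assumed test image
y = 255 - x;                               % contrast inversion: 1-to-1, slope -1
MSD = mean((x(:) - y(:)).^2)                                   % large -> "dissimilar" by MSD
a = x(:) - mean(x(:));  b = y(:) - mean(y(:));
NCC = sum(a .* b) / sqrt(sum(a.^2) * sum(b.^2))                % exactly -1
Pxy = accumarray([x(:)+1, y(:)+1], 1, [256 256]) / numel(x);
Hf  = @(p) -sum(p(p > 0) .* log2(p(p > 0)));
NMI = (Hf(sum(Pxy, 2)) + Hf(sum(Pxy, 1))) / Hf(Pxy(:))         % exactly 2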
