Quantization. Robert M. Haralick. Computer Science, Graduate Center, City University of New York
1 Quantization Robert M. Haralick Computer Science, Graduate Center, City University of New York
2 Outline
1. Quantizing
3 Quantizing
- Data is real-valued, or integer-valued with a large maximum value
- To use a discrete Bayes rule, the data has to be quantized
- Quantize each dimension to 10 or fewer quantizing intervals
4 The Problem
Assume the input data in each dimension is discretely valued and must now be quantized:
- Determine the number of quantizing levels for each dimension
- Determine the quantizing interval boundaries
- Determine the probability associated with each quantizing bin
5 Quantizer and Bins
- $J$ dimensions
- $L$ quantized values per dimension
- $L^J$ bins in the discrete measurement space
- Each bin has a class conditional probability
6 The Quantizer
Definition: A quantizer $q$ is a monotonically increasing function that takes a real number and produces a non-negative integer between $0$ and $K-1$, where $K$ is the number of quantizing levels.
7 Quantizing Interval
Definition: The quantizing interval $Q_k$ associated with the integer $k$ is defined by
$Q_k = \{x \mid q(x) = k\}$
(Figure: the real line partitioned into intervals $Q_0, Q_1, Q_2, \ldots$)
8 Table Lookup
- Let $Z = (z_1, \ldots, z_J)$ be a measurement tuple
- Let $q_j$ be the quantizing function for the $j$th dimension
- The quantized tuple for $Z$ is $(q_1(z_1), \ldots, q_J(z_J))$
- The address for quantized $Z$ is $a(q_1(z_1), \ldots, q_J(z_J))$
- $P(a(q_1(z_1), \ldots, q_J(z_J)) \mid c) = P(q_1(z_1), \ldots, q_J(z_J) \mid c)$
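To make the table lookup concrete, here is a minimal Python sketch. The slides leave the quantizing functions $q_j$ and the address function $a(\cdot)$ unspecified; the boundaries below and the mixed-radix addressing scheme are illustrative assumptions, not the author's definitions.

```python
import numpy as np

def make_quantizer(boundaries):
    """Return q(x): the index of the quantizing interval containing x.

    `boundaries` are the interior boundaries b_1 < ... < b_{K-1}; values
    below b_1 map to level 0, values >= b_{K-1} map to level K-1.
    """
    boundaries = np.asarray(boundaries)
    return lambda x: int(np.searchsorted(boundaries, x, side="right"))

def address(levels, L):
    """Mixed-radix address a(k_1, ..., k_J) of a quantized tuple.

    `levels` is the quantized tuple, `L` the per-dimension level counts.
    One standard addressing choice; the slides do not specify a().
    """
    a = 0
    for k, l in zip(levels, L):
        a = a * l + k
    return a

# Two dimensions, 3 and 4 levels respectively (illustrative boundaries).
q1 = make_quantizer([0.5, 1.5])      # 3 levels
q2 = make_quantizer([10, 20, 30])    # 4 levels
z = (0.7, 25.0)
levels = (q1(z[0]), q2(z[1]))
print(levels, address(levels, (3, 4)))   # (1, 2) -> 1*4 + 2 = 6
```

The address then indexes directly into a table of class conditional bin probabilities.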
9 Entropy Definition
Definition: If $p_1, \ldots, p_K$ is a discrete probability function, its entropy $H$ is
$H = -\sum_{k=1}^{K} p_k \log_2 p_k$
10 Entropy Meaning
- Person A chooses an index from $\{1, \ldots, K\}$ in accordance with the probabilities $p_1, \ldots, p_K$
- Person B is to determine the index chosen by Person A by guessing
- Person B can ask any question that can be answered Yes or No
- Assuming that Person B is clever in formulating the questions, it will take on average $H$ questions to correctly determine the index Person A chose
11 Entropy Quantizing
$H = -\sum_{k=1}^{K} p_k \log_2 p_k$
If a message is sent composed of index choices sampled from a distribution with probabilities $p_1, \ldots, p_K$, the average information in the message is $H$ bits per symbol.
12 Entropy Estimation
- Data is discrete
- Observe $x_1, \ldots, x_N$, where each $x_n \in \{1, \ldots, K\}$
- Count the number of occurrences $m_k = \#\{n \mid x_n = k\}$
- Estimate the probability $\hat{p}_k = m_k / N$
- Number of zero counts: $n_0 = \#\{k \mid m_k = 0\}$
- Bias-corrected estimate of entropy:
$\hat{H} = -\sum_{k=1}^{K} \hat{p}_k \log_2 \hat{p}_k + \frac{(K - n_0) - 1}{2N \log_e 2}$

G. Miller, "Note on the Bias of Information Estimates," in H. Quastler (Ed.), Information Theory in Psychology II-B, Free Press, Glencoe, IL, 1955.
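A small sketch of this estimator. The garbled correction term is reconstructed here as the Miller bias correction $((K - n_0) - 1)/(2N \log_e 2)$, where $K - n_0$ is the number of occupied bins; that reading is an assumption.

```python
import numpy as np

def entropy_miller(x, K):
    """Plug-in entropy in bits plus the Miller bias correction.

    x: observations in {1, ..., K}.
    """
    x = np.asarray(x)
    N = len(x)
    m = np.bincount(x - 1, minlength=K)          # counts m_k
    p = m / N
    nz = p > 0
    h_mle = -np.sum(p[nz] * np.log2(p[nz]))      # plug-in estimate
    n0 = np.sum(m == 0)                          # zero-count bins
    return h_mle + ((K - n0) - 1) / (2 * N * np.log(2))

x = [1, 1, 2, 3, 3, 3, 4, 1, 2, 2]
print(entropy_miller(x, K=5))
```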
13 Number of Quantizing Levels
- $J$ dimensions
- Each observation is a tuple $Z = (z_1, \ldots, z_J)$
- Each $z_j$ is discretely valued
- Let $M$ be the total number of quantizing bins over the $J$ dimensions:
$M = \prod_{j=1}^{J} L_j$
- How to determine the number $L_j$ of bins for the $j$th dimension?
14 The Entropy Solution
Let $\hat{H}_j$ be the entropy of the $j$th component of $Z$, and let $L_j$ be the number of bins for the $j$th component of $Z$. Define
$f_j = \frac{\hat{H}_j}{\sum_{i=1}^{J} \hat{H}_i}, \qquad L_j = M^{f_j}$
Then
$\prod_{j=1}^{J} L_j = \prod_{j=1}^{J} M^{f_j} = M^{\sum_{j=1}^{J} f_j} = M$
15 The Number of Quantizing Bins
Let $\hat{H}_j$ be the entropy of the $j$th component of $Z$, and let $L_j$ be the number of bins for the $j$th component. With
$f_j = \frac{\hat{H}_j}{\sum_{i=1}^{J} \hat{H}_i}, \qquad L_j = M^{f_j}$
we have now solved for the number of quantizing bins for each dimension.
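A minimal sketch of the entropy-proportional allocation; the entropy values are illustrative. Because the fractions $f_j$ sum to 1, the product of the $L_j$ equals $M$ exactly (before any rounding to integer level counts, which the slides do not discuss).

```python
import numpy as np

def allocate_levels(H, M):
    """Levels per dimension: L_j = M^{f_j}, f_j = H_j / sum_i H_i.

    Returns real-valued L_j; in practice these would be rounded
    while keeping the product near the bin budget M.
    """
    H = np.asarray(H, dtype=float)
    f = H / H.sum()
    return M ** f

H = [3.2, 1.1, 2.4]          # estimated component entropies (made up)
L = allocate_levels(H, M=1000)
print(L, L.prod())           # product of the L_j equals M
```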
16 How to Determine the Probabilities
- Memory size for each class conditional probability is $M$
- Real-valued data: if the number of observations is $N$, equal-interval quantize each component to $N/10$ levels
- Digitized data: if each data item is $I$ bits, set the number of quantized levels to $2^I$
- Determine the probability for each quantized level for each component
- Determine the entropy $\hat{H}_j$ for each component $j$
- Set the number of quantized levels for component $j$ to $L_j = M^{f_j}$, where $f_j = \frac{\hat{H}_j}{\sum_{i=1}^{J} \hat{H}_i}$
17 Initial Quantizing Interval Boundary
- The sample is $Z_1, \ldots, Z_N$; each tuple has $J$ components; component $j$ has $L_j$ quantized levels
- The $n$th observed tuple: $Z_n = (z_{n1}, \ldots, z_{nJ})$
- Let $z_{(1)j}, \ldots, z_{(N)j}$ be the $N$ values of the $j$th component of the observed tuples, in ascending order
- The left quantizing interval boundaries are:
$z_{(1)j},\ z_{(\frac{N}{L_j}+1)j},\ \ldots,\ z_{(\frac{kN}{L_j}+1)j},\ \ldots,\ z_{(\frac{(L_j-1)N}{L_j}+1)j}$
with $z_{(N)j}$ the right end of the last interval
18 Initial Quantizing Interval Boundary
Same setup as the previous slide, but with the outer boundaries fixed at the range of the (10-bit) data rather than at the extreme order statistics:
$0,\ z_{(\frac{N}{L_j}+1)j},\ \ldots,\ z_{(\frac{kN}{L_j}+1)j},\ \ldots,\ z_{(\frac{(L_j-1)N}{L_j}+1)j},\ 1023$
19 Example
Suppose $N = 12$, the data is 10 bits, and $L_j = 4$.
- $j$th component: $z_{1j}, \ldots, z_{12j}$
- Ordered values of the $j$th component: $z_{(1)j}, \ldots, z_{(12)j}$
- $\frac{N}{L_j} + 1 = 4$, $\frac{2N}{L_j} + 1 = 7$, $\frac{3N}{L_j} + 1 = 10$
- The quantizing intervals are:
$[0, z_{(4)j}),\ [z_{(4)j}, z_{(7)j}),\ [z_{(7)j}, z_{(10)j}),\ [z_{(10)j}, 1023)$
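The example can be reproduced with a few lines of Python. The data values below are made up; `zs[k * N // L]` is the 1-based order statistic $z_{(\frac{kN}{L}+1)j}$ in 0-based array indexing.

```python
import numpy as np

def left_boundaries(z, L, lo, hi):
    """Left boundaries of L equal-frequency intervals for one component.

    z: the N observed values of the component; lo/hi: the data range
    (0 and 1023 for 10-bit data). Interval k covers [b_k, b_{k+1}).
    """
    zs = np.sort(z)
    N = len(zs)
    return [lo] + [zs[k * N // L] for k in range(1, L)] + [hi]

z = [512, 3, 77, 900, 251, 630, 402, 5, 814, 333, 120, 999]
print(left_boundaries(z, L=4, lo=0, hi=1023))
# With N = 12, L = 4 this picks z_(4), z_(7), z_(10), as on the slide.
```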
20 Initial Quantizing Interval Boundary
- The sample is $Z_1, \ldots, Z_N$; the $n$th observed tuple is $Z_n = (z_{n1}, \ldots, z_{nJ})$
- Let $z_{(1)j}, \ldots, z_{(N)j}$ be the $N$ values of the $j$th component of the observed tuples, in ascending order
- $k$ indexes the quantizing interval: $k = 1, \ldots, L_j$
- The $k$th quantizing interval $[c_{kj}, d_{kj})$ for the $j$th component is defined, for $k \in \{2, \ldots, L_j - 1\}$, by
$c_{kj} = z_{(\frac{(k-1)N}{L_j}+1)j}, \qquad d_{kj} = z_{(\frac{kN}{L_j}+1)j}$
21 Non-uniform Quantization
22 Maximum Likelihood Probability Estimation
- The sample is $Z_1, \ldots, Z_N$; the $n$th observed tuple is $Z_n = (z_{n1}, \ldots, z_{nJ})$
- The quantized tuple for $Z_n$ is $(q_1(z_{n1}), \ldots, q_J(z_{nJ}))$ and its address is $a(q_1(z_{n1}), \ldots, q_J(z_{nJ}))$
- The bins are numbered $1, \ldots, M$
- The number of observations falling into bin $m$ is
$t_m = \#\{n \mid a(q_1(z_{n1}), \ldots, q_J(z_{nJ})) = m\}$
- The maximum likelihood estimate of the probability for bin $m$ is $p_m = t_m / N$
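A sketch of the count-and-normalize estimate, reusing the mixed-radix addressing assumed earlier; the quantizers and samples here are illustrative stand-ins.

```python
from collections import Counter

def ml_bin_probabilities(samples, quantizers, L):
    """Maximum likelihood estimate p_m = t_m / N for every occupied bin.

    samples: list of J-tuples; quantizers: one q_j per dimension;
    L: levels per dimension (for mixed-radix addressing, as before).
    """
    N = len(samples)
    counts = Counter()
    for z in samples:
        levels = tuple(q(x) for q, x in zip(quantizers, z))
        a = 0
        for k, l in zip(levels, L):
            a = a * l + k                  # bin address
        counts[a] += 1
    return {m: t / N for m, t in counts.items()}

q = [lambda x: int(x >= 0.5), lambda x: min(int(x // 10), 2)]
samples = [(0.2, 5), (0.7, 15), (0.9, 25), (0.7, 14)]
print(ml_bin_probabilities(samples, q, L=(2, 3)))
```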
23 Density Estimation Using Fixed Volumes
- Total count $N$
- Fix a volume $v$
- Count the number $k$ of observations in the volume $v$
- Density is mass divided by volume: estimate the density at each point in the volume by $k/(Nv)$
24 Density Estimation Using Fixed Counts
- Total count $N$
- Fix a count $k$
- Find the smallest volume $v$ around the point whose count just reaches $k$
- Density is mass divided by volume: estimate the density at each point in the volume by $k/(Nv)$
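A one-dimensional sketch of the fixed-count idea. The slides state it for general volumes; here the "volume" is a symmetric interval around the query point, and all names are illustrative.

```python
import numpy as np

def knn_density(x0, data, k):
    """Fixed-count density estimate at x0 (one-dimensional sketch).

    Grow a symmetric interval around x0 until it captures k points,
    then estimate the density as k / (N * v), v = interval length.
    """
    data = np.asarray(data)
    N = len(data)
    r = np.sort(np.abs(data - x0))[k - 1]   # distance to k-th neighbor
    v = 2 * r                               # length of [x0 - r, x0 + r]
    return k / (N * v)

data = np.random.default_rng(0).normal(size=500)
print(knn_density(0.0, data, k=25))         # near 1/sqrt(2*pi) ~ 0.40
```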
25 Smoothed Estimates
If the sample size is not large enough, the MLE probability estimates may not be representative, so we smooth across bins.
- Let bin $m$ have volume $v_m$ and count $t_m$
- Let $m_1, \ldots, m_I$ be the indexes of the $I$ closest bins to bin $m$ satisfying
$\sum_{i=1}^{I-1} t_{m_i} < k \le \sum_{i=1}^{I} t_{m_i}$
- Define $b_m = \sum_{i=1}^{I} t_{m_i}$ and $V_m = \sum_{i=1}^{I} v_{m_i}$
- Density of each point in bin $m$: $\alpha b_m / V_m$
- Set $\alpha$ so that the density integrates to 1
26 Smoothed Estimates
- Density of each point in bin $m$: $\alpha b_m / V_m$
- Volume of bin $m$: $v_m$
- Probability of bin $m$: $p_m = (\alpha b_m / V_m) v_m$
- Total probability: $1 = \sum_{m=1}^{M} \alpha b_m v_m / V_m$, so
$\alpha = \frac{1}{\sum_{m=1}^{M} b_m v_m / V_m}, \qquad p_m = \frac{b_m v_m / V_m}{\sum_{k=1}^{M} b_k v_k / V_k}$
27 Smoothed Estimates
$p_m = \frac{b_m v_m / V_m}{\sum_{k=1}^{M} b_k v_k / V_k}$
If $v_m = v$ for $m = 1, \ldots, M$, then $V_m = I_m v$ and
$p_m = \frac{b_m v / (I_m v)}{\sum_{k=1}^{M} b_k v / (I_k v)} = \frac{b_m / I_m}{\sum_{k=1}^{M} b_k / I_k}$
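A sketch of the equal-volume smoothing rule $p_m \propto b_m / I_m$ for one-dimensional bins, where "closest bins" is taken to mean nearest by bin index. That neighborhood choice is an assumption for illustration; the slides define closeness over bins in $J$ dimensions.

```python
import numpy as np

def smoothed_probabilities(t, k):
    """Equal-volume bin smoothing: p_m proportional to b_m / I_m.

    t: counts per bin (1-D). For each bin, accumulate the nearest bins
    (by index distance) until their counts just reach k; b_m is that
    total and I_m the number of bins used. Assumes sum(t) >= k.
    """
    t = np.asarray(t)
    M = len(t)
    score = np.empty(M)
    for m in range(M):
        order = np.argsort(np.abs(np.arange(M) - m), kind="stable")
        csum = np.cumsum(t[order])
        I_m = int(np.searchsorted(csum, k)) + 1   # bins needed to reach k
        score[m] = csum[I_m - 1] / I_m            # b_m / I_m
    return score / score.sum()

t = [0, 1, 7, 12, 4, 0, 0, 2]
print(smoothed_probabilities(t, k=5))   # empty bins get nonzero mass
```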
28 Fixed Sample Size
- Sample $Z_1, \ldots, Z_N$; total number of bins $M$
- Calculate the quantizer
- Determine the probability for each bin
- Smooth the bin probabilities using smoothing parameter $k$
- Calculate a decision rule maximizing expected gain
- Everything depends on $M$ and $k$
29 Memorization and Generalization
- $k$ too small: memorization, over-fitting
- $k$ too large: over-generalization, under-fitting
- $M$ too large: memorization, over-fitting
- $M$ too small: over-generalization, under-fitting
30 Optimize the Probability Estimation Parameters
- Split the ground-truthed sample into three parts
- Use the first part to calculate the quantizer and bin probabilities
- Calculate the discrete Bayes decision rule
- Apply the decision rule to the second part to obtain an unbiased estimate of the expected economic gain of the decision rule
- Brute-force optimize to find the values of $M$ and $k$ that maximize the estimated expected gain
- With $M$ and $k$ fixed, use the third part to determine an estimate of the expected gain of the optimized rule
31 Optimize the Probability Estimation Parameters
Once the parameters $M$ and $k$ have been optimized, the quantizer boundaries can be optimized. Repeat until no change:
- Using training data part 1:
  - Choose a dimension; choose a boundary; change the boundary
  - Determine the adjusted probabilities
  - Determine the discrete Bayes decision rule
- Using training data part 2:
  - Calculate the expected gain
32 Optimize the Quantizer Boundaries
(Figure: boundaries $b_{1j}, b_{2j}, \ldots, b_{kj}, \ldots, b_{Kj}$ placed at the order statistics along the data range from 0 to 1023.)
Repeat until no change:
- Randomly choose a component $j$ and quantizing interval $k$
- Randomly choose a small perturbation $\delta$ ($\delta$ can be positive or negative)
- Randomly choose a small integer $M$ (no collision with neighboring boundaries)
- Set $b_{kj}^{new} = b_{kj} - \delta(M+1)$
- For $m = 0, \ldots, 2M+1$:
  - $b_{kj}^{new} \leftarrow b_{kj}^{new} + \delta$
  - Compute the new probabilities; recompute the Bayes rule; save the expected gain
- Replace $b_{kj}$ by the boundary position associated with the highest gain
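A sketch of one step of this randomized search. The `expected_gain` callback stands in for the full quantize / re-estimate / Bayes-rule pipeline, which the slides describe but do not spell out; everything else follows the slide's sweep of $2M+2$ candidate positions around the current boundary.

```python
import random

def perturb_boundary_step(boundaries, expected_gain, delta_max=5.0, M_max=4):
    """One step of the randomized boundary search (a sketch).

    boundaries: sorted list of interior boundaries for one component.
    expected_gain: callback scoring a candidate boundary list on held-out
    data (a stand-in for re-estimating probabilities and the Bayes rule).
    """
    k = random.randrange(len(boundaries))
    delta = random.uniform(-delta_max, delta_max)   # small perturbation
    M = random.randint(1, M_max)                    # small sweep half-width
    best_b, best_gain = boundaries[k], expected_gain(boundaries)
    b = boundaries[k] - delta * (M + 1)
    lo = boundaries[k - 1] if k > 0 else float("-inf")
    hi = boundaries[k + 1] if k + 1 < len(boundaries) else float("inf")
    for _ in range(2 * M + 2):                      # m = 0, ..., 2M+1
        b += delta
        if lo < b < hi:                             # no boundary collision
            g = expected_gain(boundaries[:k] + [b] + boundaries[k + 1:])
            if g > best_gain:
                best_b, best_gain = b, g
    boundaries[k] = best_b                          # keep the best position
    return boundaries

# Toy gain that prefers boundaries near multiples of 100 (illustrative only).
gain = lambda bs: -sum(min(b % 100, 100 - b % 100) for b in bs)
print(perturb_boundary_step([120.0, 402.0, 814.0], gain))
```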
33 Optimize the Quantizer Boundaries
(Figure: boundaries $b_{1j}, \ldots, b_{Kj}$ at the order statistics $z_{(1)j}, z_{(\frac{N}{K}+1)j}, \ldots, z_{(\frac{(K-1)N}{K}+1)j}, z_{(N)j}$.)
- The greedy algorithm has a random component, so multiple runs will produce different answers
- Repeat the greedy algorithm $T$ times, keeping track of the best result so far
- After $T$ runs, use the best result
34 Bayesian Perspective
- MLE: start bin counters from 0
- Bayesian: start bin counters from $\beta$, $\beta > 0$
35 The Observation
There are $K$ bins. Each time an observation is made, it falls into exactly one of the $K$ bins; the probability that an observation falls into bin $k$ is $p_k$. To estimate the bin probabilities $p_1, \ldots, p_K$, we take a random sample of $I$ observations and find that
- $I_1$ observations fall into bin 1
- $I_2$ observations fall into bin 2
- ...
- $I_K$ observations fall into bin $K$
36 Multinomial
Under the protocol of the random sampling, the probability of observing counts $I_1, \ldots, I_K$ given the bin probabilities $p_1, \ldots, p_K$ is given by the multinomial
$P(I_1, \ldots, I_K \mid p_1, \ldots, p_K) = \frac{I!}{I_1! \cdots I_K!}\, p_1^{I_1} \cdots p_K^{I_K}$
37 Bayesian
Having observed $I_1, \ldots, I_K$, we would like to determine the probability that an observation falls into bin $k$. Denote by $d_k$ the event that an observation falls into bin $k$. We wish to determine $P(d_k \mid I_1, \ldots, I_K)$.
38 K−1 Simplex
To do this we will need to evaluate two integrals over the $(K-1)$-simplex
$S = \{(q_1, \ldots, q_K) \mid 0 \le q_k \le 1,\ k = 1, \ldots, K,\ \text{and}\ q_1 + q_2 + \cdots + q_K = 1\}$
- 0-simplex: point
- 1-simplex: line segment
- 2-simplex: triangle
- 3-simplex: tetrahedron
- 4-simplex: pentachoron
39 K−1 Simplex
Definition: A $(K-1)$-simplex is a $(K-1)$-dimensional polytope which is the convex hull of its $K$ vertices.
$S = \{(q_1, \ldots, q_K) \mid 0 \le q_k \le 1,\ k = 1, \ldots, K,\ \text{and}\ q_1 + q_2 + \cdots + q_K = 1\}$
The $K$ vertices of $S$ are the $K$ tuples $(1, 0, \ldots, 0), (0, 1, 0, \ldots, 0), (0, 0, 1, 0, \ldots, 0), \ldots, (0, \ldots, 0, 1)$.
40 The K−1 Simplex
$S = \{(q_1, \ldots, q_K) \mid 0 \le q_k \le 1,\ k = 1, \ldots, K,\ \text{and}\ q_1 + q_2 + \cdots + q_K = 1\}$
41 Two Integrals
They are:
$\int_{(q_1, \ldots, q_K) \in S} dq_1 \cdots dq_K = \frac{1}{(K-1)!}$
$\int_{(q_1, \ldots, q_K) \in S} \prod_{k=1}^{K} q_k^{I_k}\, dq_1 \cdots dq_K = \frac{\prod_{k=1}^{K} I_k!}{(I + K - 1)!}$
where $\sum_{k=1}^{K} I_k = I$.
42 Gamma Function
$\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\, dt, \qquad \Gamma(n) = (n-1)!$
43 Derivation
The derivation goes as follows:
$\mathrm{Prob}(d_k \mid I_1, \ldots, I_K) = \frac{\mathrm{Prob}(d_k, I_1, \ldots, I_K)}{\mathrm{Prob}(I_1, \ldots, I_K)}$
$= \frac{\int_{(p_1, \ldots, p_K) \in S} \mathrm{Prob}(d_k, I_1, \ldots, I_K, p_1, \ldots, p_K)\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(I_1, \ldots, I_K, q_1, \ldots, q_K)\, dq_1 \cdots dq_K}$
$= \frac{\int_{(p_1, \ldots, p_K) \in S} \mathrm{Prob}(d_k, I_1, \ldots, I_K \mid p_1, \ldots, p_K)\, P(p_1, \ldots, p_K)\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(I_1, \ldots, I_K \mid q_1, \ldots, q_K)\, P(q_1, \ldots, q_K)\, dq_1 \cdots dq_K}$
$= \frac{\int_{(p_1, \ldots, p_K) \in S} \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} p_m^{I_m}\, p_k\, (K-1)!\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} q_m^{I_m}\, (K-1)!\, dq_1 \cdots dq_K}$
44 Derivation
$\mathrm{Prob}(d_k \mid I_1, \ldots, I_K) = \frac{\int_{(p_1, \ldots, p_K) \in S} p_1^{I_1} p_2^{I_2} \cdots p_{k-1}^{I_{k-1}}\, p_k^{I_k + 1}\, p_{k+1}^{I_{k+1}} \cdots p_K^{I_K}\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} q_1^{I_1} q_2^{I_2} \cdots q_K^{I_K}\, dq_1 \cdots dq_K}$
$= \frac{I_1! I_2! \cdots I_{k-1}!\, (I_k + 1)!\, I_{k+1}! \cdots I_K! \,/\, (I+K)!}{I_1! I_2! \cdots I_K! \,/\, (I+K-1)!} = \frac{I_k + 1}{I + K}$
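This is the familiar Laplace "add-one" rule; a one-liner makes the result concrete:

```python
def bayes_bin_estimate(counts):
    """Posterior probability of each bin under a uniform prior:
    P(d_k | I_1, ..., I_K) = (I_k + 1) / (I + K)."""
    I, K = sum(counts), len(counts)
    return [(Ik + 1) / (I + K) for Ik in counts]

print(bayes_bin_estimate([0, 3, 7]))   # no bin gets probability zero
```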
45 Prior Distribution
The prior distribution over the simplex does not have to be taken as uniform. The natural prior distribution over the simplex is the Dirichlet distribution:
$P(p_1, \ldots, p_K \mid \alpha_1, \ldots, \alpha_K) = \frac{\Gamma(\sum_{k=1}^{K} \alpha_k)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} p_k^{\alpha_k - 1}$
where $\alpha_k > 0$, $0 < p_k < 1$ for $k = 1, \ldots, K$, $\sum_{k=1}^{K-1} p_k < 1$, and $p_K = 1 - \sum_{k=1}^{K-1} p_k$.
46 Dirichlet Distribution Properties
$E[p_k] = \frac{\alpha_k}{\sum_{j=1}^{K} \alpha_j}$
$V[p_k] = \frac{E[p_k](1 - E[p_k])}{1 + \sum_{j=1}^{K} \alpha_j}$
$C[p_i, p_j] = \frac{-E[p_i]\, E[p_j]}{1 + \sum_{k=1}^{K} \alpha_k}$
If $\alpha_k > 1$ for $k = 1, \ldots, K$, the maximum density occurs at
$p_k = \frac{\alpha_k - 1}{(\sum_{j=1}^{K} \alpha_j) - K}$
47 The Uniform
The uniform distribution on the $(K-1)$-simplex is the special case of the Dirichlet distribution with $\alpha_k = 1$, $k = 1, \ldots, K$:
$P(p_1, \ldots, p_K \mid 1, \ldots, 1) = \frac{\Gamma(K)}{\Gamma(1)^K} \prod_{k=1}^{K} p_k^0 = (K-1)!$
48 The Beta Distribution
The Beta distribution is the special case of the Dirichlet distribution for $K = 2$:
$P(y) = \frac{1}{B(p, q)}\, y^{p-1} (1-y)^{q-1}$
where $p > 0$, $q > 0$, $0 \le y \le 1$, and
$B(p, q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}$
49 Dirichlet Prior
In the case of the Dirichlet prior distribution, the derivation goes in a similar manner:
$\mathrm{Prob}(d_k \mid I_1, \ldots, I_K) = \frac{\mathrm{Prob}(d_k, I_1, \ldots, I_K)}{\mathrm{Prob}(I_1, \ldots, I_K)}$
$= \frac{\int_{(p_1, \ldots, p_K) \in S} \mathrm{Prob}(d_k, I_1, \ldots, I_K, p_1, \ldots, p_K)\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(I_1, \ldots, I_K, q_1, \ldots, q_K)\, dq_1 \cdots dq_K}$
$= \frac{\int_{(p_1, \ldots, p_K) \in S} \mathrm{Prob}(I_1, \ldots, I_{k-1}, I_k + 1, I_{k+1}, \ldots, I_K \mid p_1, \ldots, p_K)\, P(p_1, \ldots, p_K)\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(I_1, \ldots, I_K \mid q_1, \ldots, q_K)\, P(q_1, \ldots, q_K)\, dq_1 \cdots dq_K}$
50 Dirichlet Prior
$\mathrm{Prob}(I_1, \ldots, I_K) = \int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(I_1, \ldots, I_K, q_1, \ldots, q_K)\, dq_1 \cdots dq_K$
$= \int_{(q_1, \ldots, q_K) \in S} \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} q_m^{I_m} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \prod_{m=1}^{K} q_m^{\alpha_m - 1}\, dq_1 \cdots dq_K$
$= \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \int_{(q_1, \ldots, q_K) \in S} \prod_{m=1}^{K} q_m^{I_m + \alpha_m - 1}\, dq_1 \cdots dq_K$
$= \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \cdot \frac{\prod_{k=1}^{K} (I_k + \alpha_k - 1)!}{(I - 1 + \sum_{k=1}^{K} \alpha_k)!}$
51 Dirichlet Prior
$\mathrm{Prob}(d_k, I_1, \ldots, I_K) = \int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(d_k, I_1, \ldots, I_K, q_1, \ldots, q_K)\, dq_1 \cdots dq_K$
$= \int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(d_k)\, \mathrm{Prob}(I_1, \ldots, I_K \mid q_1, \ldots, q_K)\, \mathrm{Prob}(q_1, \ldots, q_K)\, dq_1 \cdots dq_K$
$= \int_{(q_1, \ldots, q_K) \in S} q_k\, \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} q_m^{I_m} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \prod_{m=1}^{K} q_m^{\alpha_m - 1}\, dq_1 \cdots dq_K$
$= \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \int_{(q_1, \ldots, q_K) \in S} q_k \prod_{m=1}^{K} q_m^{I_m + \alpha_m - 1}\, dq_1 \cdots dq_K$
$= \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \cdot \frac{(I_k + \alpha_k) \prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I + \sum_{n=1}^{K} \alpha_n)(I - 1 + \sum_{n=1}^{K} \alpha_n)!}$
52 Dirichlet Prior
$\mathrm{Prob}(d_k, I_1, \ldots, I_K) = \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \cdot \frac{(I_k + \alpha_k) \prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I + \sum_{n=1}^{K} \alpha_n)(I - 1 + \sum_{n=1}^{K} \alpha_n)!}$
$\mathrm{Prob}(I_1, \ldots, I_K) = \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \cdot \frac{\prod_{k=1}^{K} (I_k + \alpha_k - 1)!}{(I - 1 + \sum_{k=1}^{K} \alpha_k)!}$
$\mathrm{Prob}(d_k \mid I_1, \ldots, I_K) = \frac{\dfrac{(I_k + \alpha_k) \prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I + \sum_{k=1}^{K} \alpha_k)(I - 1 + \sum_{k=1}^{K} \alpha_k)!}}{\dfrac{\prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I - 1 + \sum_{k=1}^{K} \alpha_k)!}} = \frac{I_k + \alpha_k}{I + \sum_{n=1}^{K} \alpha_n}$
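The final formula is easy to implement directly; $\alpha_k = 1$ recovers the uniform-prior result above, and $\alpha_k = \beta > 0$ is the "start bin counters from $\beta$" rule of slide 34:

```python
def dirichlet_bin_estimate(counts, alpha):
    """Posterior probability of each bin under a Dirichlet(alpha) prior:
    P(d_k | I_1, ..., I_K) = (I_k + alpha_k) / (I + sum_n alpha_n)."""
    I, A = sum(counts), sum(alpha)
    return [(Ik + ak) / (I + A) for Ik, ak in zip(counts, alpha)]

print(dirichlet_bin_estimate([0, 3, 7], alpha=[0.5, 0.5, 0.5]))
```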