Quantization. Robert M. Haralick. Computer Science, Graduate Center City University of New York


Outline: Quantizing

Quantizing
The data may be real-valued, or integer-valued with a large maximum value. To use a discrete Bayes rule the data has to be quantized: quantize each dimension to 10 or fewer quantizing intervals.

The Problem
Assume the input data in each dimension is discretely valued, e.g. 0-255 or 0-1023, and must now be quantized. We must determine: the number of quantizing levels for each dimension, the quantizing interval boundaries, and the probability associated with each quantizing bin.

Quantizer and Bins
With J dimensions and L quantized values per dimension, there are L^J bins in the discrete measurement space. Each bin has a class conditional probability.

The Quantizer
Definition. A quantizer q is a monotonically increasing function that takes in a real number and produces a non-negative integer between 0 and K - 1, where K is the number of quantizing levels.

Quantizing Interval
Definition. The quantizing interval Q_k associated with the integer k is defined by
Q_k = \{ x \mid q(x) = k \}
Example: with boundaries at 3.0, 4.0, and 8.0 on the range [0.0, 10.0], the intervals Q_0 = [0.0, 3.0), Q_1 = [3.0, 4.0), Q_2 = [4.0, 8.0), Q_3 = [8.0, 10.0] map to the quantized values 0, 1, 2, 3.
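To make the definition concrete, here is a minimal Python sketch (not from the slides) of a quantizer built from interval boundaries; the boundary values 3.0, 4.0, 8.0 are the example numbers above.

import bisect

def make_quantizer(boundaries):
    """Return q(x): the index of the quantizing interval containing x.

    boundaries holds the interior left boundaries in ascending order; values
    below the first boundary map to 0, values at or above the last map to
    len(boundaries).
    """
    def q(x):
        return bisect.bisect_right(boundaries, x)
    return q

q = make_quantizer([3.0, 4.0, 8.0])              # intervals [0,3), [3,4), [4,8), [8,10]
assert [q(0.5), q(3.5), q(6.0), q(9.9)] == [0, 1, 2, 3]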

Table Lookup
Let Z = (z_1, ..., z_J) be a measurement tuple and let q_j be the quantizing function for the j-th dimension. The quantized tuple for Z is (q_1(z_1), ..., q_J(z_J)). The address for quantized Z is a(q_1(z_1), ..., q_J(z_J)), and
P(a(q_1(z_1), \ldots, q_J(z_J)) \mid c) = P(q_1(z_1), \ldots, q_J(z_J) \mid c)
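The slides do not spell out the addressing function a; a common choice (an assumption here, not stated above) is a mixed-radix address over the per-dimension level counts L_1, ..., L_J, sketched below.

def address(quantized, levels):
    """Mixed-radix address of a quantized tuple.

    quantized: (q_1(z_1), ..., q_J(z_J)), with quantized[j] in range(levels[j])
    levels:    (L_1, ..., L_J)
    Returns an integer in range(prod(levels)).
    """
    a = 0
    for qj, Lj in zip(quantized, levels):
        a = a * Lj + qj
    return a

# Example: J = 3 dimensions with (L_1, L_2, L_3) = (4, 2, 3), i.e. 24 addressable bins.
print(address((1, 0, 2), (4, 2, 3)))             # 8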

Entropy Definition
Definition. If p_1, ..., p_K is a discrete probability function, its entropy H is
H = -\sum_{k=1}^{K} p_k \log_2 p_k

Entropy Meaning
Person A chooses an index from {1, ..., K} in accordance with probabilities p_1, ..., p_K. Person B is to determine the index chosen by Person A by guessing; Person B can ask any question that can be answered Yes or No. Assuming that Person B is clever in formulating the questions, it will take on the average H questions to correctly determine the index Person A chose.

Entropy Quantizing
H = -\sum_{k=1}^{K} p_k \log_2 p_k
If a message is sent composed of index choices sampled from a distribution with probabilities p_1, ..., p_K, the average information in the message is H bits per symbol.

Entropy Estimation
The data is discrete: observe x_1, ..., x_N, where each x_n \in \{1, \ldots, K\}. Count the number of occurrences m_k = \#\{n \mid x_n = k\} and estimate the probability \hat p_k = m_k / N. Let n_0 = \#\{k \mid m_k = 0\} be the number of zero counts. A bias-corrected estimate of the entropy is
\hat H = -\sum_{k=1}^{K} \hat p_k \log_2 \hat p_k + \frac{K - n_0 - 1}{2 N \log_e 2}
G. Miller, Note On The Bias Of Information Estimates, in H. Quastler (Ed.), Information Theory in Psychology II-B, Free Press, Glencoe, IL, 1955, pp. 95-100.
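A small Python sketch of this estimate. The exact numerator of the correction term is partly garbled in the transcription, so the form below, which uses the number of nonempty bins K - n_0 in a Miller-style correction, is an assumption.

import numpy as np

def entropy_estimate(x, K):
    """Plug-in entropy (in bits) with a Miller-style bias correction."""
    x = np.asarray(x)
    N = x.size
    m = np.bincount(x - 1, minlength=K)          # counts m_1, ..., m_K for x_n in {1, ..., K}
    p = m / N
    nz = p > 0
    H_plugin = -np.sum(p[nz] * np.log2(p[nz]))
    n0 = np.sum(m == 0)                           # number of empty bins
    return H_plugin + (K - n0 - 1) / (2 * N * np.log(2))

print(entropy_estimate([1, 1, 2, 3, 3, 3, 4, 4], K=5))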

Number of Quantizing Levels
There are J dimensions, each observation is a tuple Z = (z_1, ..., z_J), and each z_j is discretely valued. Let M be the total number of quantizing bins over the J dimensions, so
M = \prod_{j=1}^{J} L_j
How do we determine the number L_j of bins for the j-th dimension?

The Entropy Solution
Let \hat H_j be the entropy of the j-th component of Z and let L_j be the number of bins for the j-th component of Z. Define
f_j = \frac{\hat H_j}{\sum_{i=1}^{J} \hat H_i}, \qquad L_j = M^{f_j}
Then
\prod_{j=1}^{J} L_j = \prod_{j=1}^{J} M^{f_j} = M^{\sum_{j=1}^{J} f_j} = M

The Number of Quantizing Bins
Let \hat H_j be the entropy of the j-th component of Z and let L_j be the number of bins for the j-th component of Z. Define
f_j = \frac{\hat H_j}{\sum_{i=1}^{J} \hat H_i}, \qquad L_j = M^{f_j}
Now we have solved for the number of quantizing bins for each dimension.
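A sketch of this entropy-proportional allocation in Python; rounding to a whole number of levels is an implementation choice, not something specified on the slides.

import numpy as np

def allocate_levels(H, M):
    """Allocate bins per dimension: L_j = M ** f_j with f_j = H_j / sum(H)."""
    H = np.asarray(H, dtype=float)
    f = H / H.sum()
    L = M ** f
    return np.maximum(1, np.round(L).astype(int))   # round to whole levels, at least 1

# Example: three components with entropies 4, 2, 2 bits and a budget of M = 1000 bins.
print(allocate_levels([4.0, 2.0, 2.0], M=1000))      # approximately [32, 6, 6]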

How to Determine The Probabilities
The memory size for each class conditional probability is M.
Real-valued data: if the number of observations is N, equal-interval quantize each component to N/10 levels.
Digitized data: if each data item is I bits, set the number of quantized levels to 2^I.
Determine the probability for each quantized level for each component, determine the entropy \hat H_j for each component j = 1, ..., J, and set the number of quantized levels for component j to
L_j = M^{f_j}, \qquad f_j = \frac{\hat H_j}{\sum_{i=1}^{J} \hat H_i}

Initial Quantizing Interval Boundary
The sample is Z_1, ..., Z_N. Each tuple has J components, and component j has L_j quantized levels. The n-th observed tuple is Z_n = (z_{n1}, ..., z_{nJ}). Let z_{(1)j}, ..., z_{(N)j} be the N values of the j-th component of the observed tuples, ordered ascending. The left quantizing interval boundaries are
z_{(1)j}, \; z_{(N/L_j + 1)j}, \; \ldots, \; z_{(kN/L_j + 1)j}, \; \ldots, \; z_{((L_j - 1)N/L_j + 1)j}
with z_{(N)j} the right end of the last interval.

Initial Quantizing Interval Boundary
The sample is Z_1, ..., Z_N. Each tuple has J components, and component j has L_j quantized levels. The n-th observed tuple is Z_n = (z_{n1}, ..., z_{nJ}). Let z_{(1)j}, ..., z_{(N)j} be the N values of the j-th component of the observed tuples, ordered ascending. When the data range is 0 to 1023 (10-bit data), the left quantizing interval boundaries are
0, \; z_{(N/L_j + 1)j}, \; \ldots, \; z_{(kN/L_j + 1)j}, \; \ldots, \; z_{((L_j - 1)N/L_j + 1)j}
with 1023 the right end of the last interval.

Example
Suppose N = 12, the data is 10 bits, and L_j = 4. The j-th component values z_{1j}, ..., z_{12j} have ordered values z_{(1)j}, ..., z_{(12)j}, and
N/L_j + 1 = 4, \quad 2N/L_j + 1 = 7, \quad 3N/L_j + 1 = 10
The quantizing intervals are:
[0, z_{(4)j}), \; [z_{(4)j}, z_{(7)j}), \; [z_{(7)j}, z_{(10)j}), \; [z_{(10)j}, 1023)

Initial Quantizing Interval Boundary
The sample is Z_1, ..., Z_N, with n-th observed tuple Z_n = (z_{n1}, ..., z_{nJ}). Let z_{(1)j}, ..., z_{(N)j} be the N values of the j-th component of the observed tuples, ordered ascending, and let k index the quantizing intervals, k = 1, ..., L_j. The k-th quantizing interval [c_{kj}, d_{kj}) for the j-th component is defined, for k \in \{2, \ldots, L_j - 1\}, by
c_{kj} = z_{((k-1)N/L_j + 1)j}, \qquad d_{kj} = z_{(kN/L_j + 1)j}
The outer boundaries of the first and last intervals are the endpoints of the data range (0 and 1023 in the example above).
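A Python sketch of these initial equal-count boundaries for one component j; with 0-based indexing the order statistic z_{(kN/L_j + 1)j} becomes index kN/L_j (integer division when N is not a multiple of L_j). The data values below are made-up example numbers.

import numpy as np

def initial_boundaries(z_j, L_j):
    """Interior left boundaries c_{kj}, k = 2, ..., L_j, from the order statistics of component j."""
    z_sorted = np.sort(np.asarray(z_j))
    N = z_sorted.size
    return z_sorted[[k * N // L_j for k in range(1, L_j)]]

# Example matching the slides: N = 12 observations of a 10-bit component, L_j = 4 levels.
z_j = [907, 12, 55, 300, 620, 44, 701, 230, 488, 866, 150, 999]
print(initial_boundaries(z_j, L_j=4))            # [150 488 866], i.e. z_(4), z_(7), z_(10)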

Non-uniform Quantization

Maximum Likelihood Probability Estimation
The sample is Z_1, ..., Z_N, with n-th observed tuple Z_n = (z_{n1}, ..., z_{nJ}). The quantized tuple for Z_n is (q_1(z_{n1}), ..., q_J(z_{nJ})) and its address is a(q_1(z_{n1}), ..., q_J(z_{nJ})). The bins are numbered 1, ..., M. The number of observations falling into bin m is
t_m = \#\{ n \mid a(q_1(z_{n1}), \ldots, q_J(z_{nJ})) = m \}
and the maximum likelihood estimate of the probability for bin m is
\hat p_m = \frac{t_m}{N}
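Given the bin address of each observation, the maximum likelihood estimate is just normalized counts; a minimal sketch (the example addresses are made-up numbers):

import numpy as np

def ml_bin_probabilities(addresses, M):
    """MLE p_m = t_m / N from the bin address of each observation (addresses in range(M))."""
    t = np.bincount(np.asarray(addresses), minlength=M)   # t_m = number of observations in bin m
    return t / t.sum()

# Example: N = 8 observations scattered over M = 4 bins.
print(ml_bin_probabilities([0, 1, 1, 2, 2, 2, 3, 0], M=4))   # [0.25 0.25 0.375 0.125]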

Density Estimation Using Fixed Volumes
The total count is N. Fix a volume v and count the number k of observations falling in the volume v. Density is mass divided by volume, so estimate the density at each point in the volume by k/(Nv).

Density Estimation Using Fixed Counts
The total count is N. Fix a count k. Around the point, find the smallest volume v whose count k' is just greater than k. Density is mass divided by volume, so estimate the density at each point in the volume by k'/(Nv).
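A one-dimensional sketch of this fixed-count (k-nearest-neighbor style) estimate; the symmetric interval around the point is an illustrative choice, not a procedure specified on the slides.

import numpy as np

def fixed_count_density(x0, data, k):
    """Estimate the density at x0 from the smallest interval around x0 reaching k observations."""
    d = np.sort(np.abs(np.asarray(data) - x0))
    r = d[k - 1]                     # distance to the k-th nearest observation
    v = 2 * r                        # length ("volume") of the interval [x0 - r, x0 + r]
    return k / (len(data) * v)

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)
print(fixed_count_density(0.0, sample, k=50))    # roughly 0.4, the standard normal density at 0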

Smoothed Estimates
If the sample size is not large enough, the MLE probability estimates may not be representative. Bin smoothing: let bin m have volume v_m and count t_m, and let m_1, ..., m_I be the indexes of the I closest bins to bin m satisfying
\sum_{i=1}^{I-1} t_{m_i} \le k < \sum_{i=1}^{I} t_{m_i}
Define
b_m = \sum_{i=1}^{I} t_{m_i}, \qquad V_m = \sum_{i=1}^{I} v_{m_i}
The density at each point in bin m is \alpha b_m / V_m; set \alpha so that the density integrates to 1.

Smoothed Estimates
The density at each point in bin m is \alpha b_m / V_m and the volume of bin m is v_m, so the probability of bin m is p_m = (\alpha b_m / V_m) v_m. The total probability is
1 = \sum_{m=1}^{M} \alpha b_m v_m / V_m
so
\alpha = \frac{1}{\sum_{m=1}^{M} b_m v_m / V_m}, \qquad p_m = \frac{b_m v_m / V_m}{\sum_{k=1}^{M} b_k v_k / V_k}

Smoothed Estimates
p_m = \frac{b_m v_m / V_m}{\sum_{k=1}^{M} b_k v_k / V_k}
If v_m = v for m = 1, ..., M, then V_m = I_m v and
p_m = \frac{b_m v / (I_m v)}{\sum_{k=1}^{M} b_k v / (I_k v)} = \frac{b_m / I_m}{\sum_{k=1}^{M} b_k / I_k}
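A sketch of this equal-volume smoothing for a one-dimensional row of bins; taking "closest bins" to mean nearest by bin index is an assumption made for the example, and the counts are made-up numbers.

import numpy as np

def smoothed_probabilities(t, k):
    """Smoothed p_m = (b_m / I_m) / sum_j (b_j / I_j) for equal-volume bins laid out in 1-D."""
    t = np.asarray(t)
    M = t.size
    b = np.zeros(M)                  # pooled count b_m for each bin
    I = np.zeros(M, dtype=int)       # number of pooled bins I_m for each bin
    for m in range(M):
        order = np.argsort(np.abs(np.arange(M) - m), kind="stable")   # bins nearest to m first
        total, used = 0, 0
        for idx in order:
            total += t[idx]
            used += 1
            if total > k:            # stop once the pooled count just exceeds k
                break
        b[m], I[m] = total, used
    w = b / I
    return w / w.sum()

print(smoothed_probabilities([5, 0, 1, 7, 3], k=3))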

Fixed Sample Size
Given a sample Z_1, ..., Z_N of size N and a total number of bins M: calculate the quantizer, determine the probability for each bin, smooth the bin probabilities using the smoothing count k, and calculate a decision rule maximizing expected gain. Everything depends on M and k.

Memorization and Generalization
k too small: memorization, over-fitting. k too large: over-generalization, under-fitting.
M too large: memorization, over-fitting. M too small: over-generalization, under-fitting.

Optimize The Probability Estimation Parameters
Split the ground-truthed sample into three parts. Use the first part to calculate the quantizer and bin probabilities and to calculate the discrete Bayes decision rule. Apply the decision rule to the second part so that an unbiased estimate of the expected economic gain given the decision rule can be computed. Use brute-force optimization to find the values of M and k that maximize this estimated expected gain. With M and k fixed, use the third part to determine an estimate of the expected gain of the optimized rule (the second part can no longer give an unbiased estimate once M and k have been tuned on it).

Optimize The Probability Estimation Parameters
Once the parameters M and k have been optimized, the quantizer boundaries can be optimized. Repeat until no change:
Using training data part 1: choose a dimension, choose a boundary, change the boundary, determine the adjusted probabilities, and determine the discrete Bayes decision rule.
Using training data part 2: calculate the expected gain.

Optimize The Quantizer Boundaries
The boundaries b_{1j}, b_{2j}, ..., b_{kj}, ..., b_{Kj} for component j are initialized at the order-statistic positions between 0 and 1023, as on the earlier slides. Repeat until no change:
Randomly choose a component j and quantizing interval k.
Randomly choose a small perturbation \delta (\delta can be positive or negative).
Randomly choose a small integer M (with no collision with neighboring boundaries).
Set b_{kj}^{new} = b_{kj} - \delta(M + 1).
For m = 0, ..., 2M + 1: set b_{kj}^{new} \leftarrow b_{kj}^{new} + \delta, compute the new probabilities, recompute the Bayes rule, and save the expected gain.
Replace b_{kj} by the boundary position associated with the highest gain.
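A simplified Python sketch of one perturbation step. Here evaluate_gain stands in for "compute the new probabilities, recompute the Bayes rule, and estimate the expected gain on training part 2"; it is a hypothetical callback, not something defined on the slides.

def perturb_boundary(boundaries, j, k, delta, M_steps, evaluate_gain):
    """One greedy step: sweep b_{kj} over 2*M_steps + 2 nearby positions and keep the best.

    boundaries:    per-dimension lists of boundary positions, modified in place
    evaluate_gain: callback returning the estimated expected gain for a boundary setting
    """
    best_b = boundaries[j][k]
    best_gain = evaluate_gain(boundaries)
    b_new = boundaries[j][k] - delta * (M_steps + 1)
    for _ in range(2 * M_steps + 2):
        b_new += delta
        candidate = [row[:] for row in boundaries]
        candidate[j][k] = b_new
        gain = evaluate_gain(candidate)
        if gain > best_gain:
            best_b, best_gain = b_new, gain
    boundaries[j][k] = best_b
    return best_gain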

Optimize The Quantizer Boundaries
The greedy algorithm has a random component, so multiple runs will produce different answers. Repeat the greedy algorithm T times, keeping track of the best result so far; after T runs, use the best result.

Bayesian Perspective MLE: start bin counters from 0 Bayesian: start bin counters from β, β > 0

The Observation
There are K bins. Each time an observation is made, it falls into exactly one of the K bins; the probability that an observation falls into bin k is p_k. To estimate the bin probabilities p_1, ..., p_K, we take a random sample of I observations and find that I_1 observations fall into bin 1, I_2 observations fall into bin 2, ..., and I_K observations fall into bin K.

Multinomial
Under the protocol of the random sampling, the probability of observing counts I_1, ..., I_K given the bin probabilities p_1, ..., p_K is given by the multinomial
P(I_1, \ldots, I_K \mid p_1, \ldots, p_K) = \frac{I!}{I_1! \cdots I_K!} \, p_1^{I_1} \cdots p_K^{I_K}

Bayesian
Having observed I_1, ..., I_K, we would like to determine the probability that an observation falls into bin k. Denote by d_k the event that an observation falls into bin k. We wish to determine P(d_k \mid I_1, \ldots, I_K).

K - 1 Simplex
To do this we will need to evaluate two integrals over the (K - 1)-simplex
S = \{ (q_1, \ldots, q_K) \mid 0 \le q_k \le 1, \; k = 1, \ldots, K, \text{ and } q_1 + q_2 + \cdots + q_K = 1 \}
A 0-simplex is a point, a 1-simplex a line segment, a 2-simplex a triangle, a 3-simplex a tetrahedron, and a 4-simplex a pentachoron.

K - 1 Simplex
Definition. A (K - 1)-simplex is a (K - 1)-dimensional polytope which is the convex hull of its K vertices.
S = \{ (q_1, \ldots, q_K) \mid 0 \le q_k \le 1, \; k = 1, \ldots, K, \text{ and } q_1 + q_2 + \cdots + q_K = 1 \}
The K vertices of S are the K tuples (1, 0, \ldots, 0), (0, 1, 0, \ldots, 0), (0, 0, 1, 0, \ldots, 0), \ldots, (0, \ldots, 0, 1).

The K - 1 Simplex
S = \{ (q_1, \ldots, q_K) \mid 0 \le q_k \le 1, \; k = 1, \ldots, K, \text{ and } q_1 + q_2 + \cdots + q_K = 1 \}

Two Integrals
They are:
\int_{(q_1, \ldots, q_K) \in S} dq_1 \cdots dq_K = \frac{1}{(K-1)!}
\int_{(q_1, \ldots, q_K) \in S} \prod_{k=1}^{K} q_k^{I_k} \, dq_1 \cdots dq_K = \frac{\prod_{k=1}^{K} I_k!}{(I + K - 1)!}
where \sum_{k=1}^{K} I_k = I.

Gamma Function
\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t} \, dt, \qquad \Gamma(n) = (n-1)!

Derivation
The derivation goes as follows:
Prob(d_k \mid I_1, \ldots, I_K) = \frac{Prob(d_k, I_1, \ldots, I_K)}{Prob(I_1, \ldots, I_K)}
= \frac{\int_{(p_1, \ldots, p_K) \in S} Prob(d_k, I_1, \ldots, I_K, p_1, \ldots, p_K) \, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} Prob(I_1, \ldots, I_K, q_1, \ldots, q_K) \, dq_1 \cdots dq_K}
= \frac{\int_{(p_1, \ldots, p_K) \in S} Prob(d_k, I_1, \ldots, I_K \mid p_1, \ldots, p_K) \, P(p_1, \ldots, p_K) \, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} Prob(I_1, \ldots, I_K \mid q_1, \ldots, q_K) \, P(q_1, \ldots, q_K) \, dq_1 \cdots dq_K}
= \frac{\int_{(p_1, \ldots, p_K) \in S} \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} p_m^{I_m} \, p_k \, (K-1)! \, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} q_m^{I_m} \, (K-1)! \, dq_1 \cdots dq_K}

Derivation
Prob(d_k \mid I_1, \ldots, I_K) = \frac{\int_{(p_1, \ldots, p_K) \in S} p_1^{I_1} p_2^{I_2} \cdots p_{k-1}^{I_{k-1}} \, p_k^{I_k + 1} \, p_{k+1}^{I_{k+1}} \cdots p_K^{I_K} \, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} q_1^{I_1} q_2^{I_2} \cdots q_K^{I_K} \, dq_1 \cdots dq_K}
= \frac{I_1! I_2! \cdots I_{k-1}! \, (I_k + 1)! \, I_{k+1}! \cdots I_K! \, / \, (I + K)!}{I_1! I_2! \cdots I_K! \, / \, (I + K - 1)!}
= \frac{I_k + 1}{I + K}
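This is Laplace's rule of succession: under the uniform prior, every bin behaves as if it had one extra prior count, so even an empty bin gets positive probability. A two-line check in Python (the counts are made-up example numbers):

I_counts = [3, 0, 5, 2]                            # I_1, ..., I_K with K = 4 and I = 10
K, I = len(I_counts), sum(I_counts)
posterior = [(Ik + 1) / (I + K) for Ik in I_counts]
print(posterior)                                   # [4/14, 1/14, 6/14, 3/14]; the empty bin gets 1/14, not 0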

Prior Distribution
The prior distribution over the (K - 1)-simplex does not have to be taken as uniform. The natural prior distribution over the simplex is the Dirichlet distribution
P(p_1, \ldots, p_K \mid \alpha_1, \ldots, \alpha_K) = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} p_k^{\alpha_k - 1}
where \alpha_k > 0, \; 0 < p_k < 1 for k = 1, \ldots, K, \; \sum_{k=1}^{K-1} p_k < 1, and p_K = 1 - \sum_{k=1}^{K-1} p_k.

Dirichlet Distribution Properties
E[p_k] = \frac{\alpha_k}{\sum_{j=1}^{K} \alpha_j}, \qquad V[p_k] = \frac{E[p_k](1 - E[p_k])}{1 + \sum_{j=1}^{K} \alpha_j}, \qquad C[p_i, p_j] = \frac{-E[p_i] E[p_j]}{1 + \sum_{k=1}^{K} \alpha_k} \; (i \ne j)
If \alpha_k > 1 for k = 1, \ldots, K, the maximum density occurs at
p_k = \frac{\alpha_k - 1}{\left(\sum_{j=1}^{K} \alpha_j\right) - K}

The Uniform
The uniform distribution on the (K - 1)-simplex is a special case of the Dirichlet distribution where \alpha_k = 1, k = 1, \ldots, K:
P(p_1, \ldots, p_K \mid \alpha_1, \ldots, \alpha_K) = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} p_k^{\alpha_k - 1}
P(p_1, \ldots, p_K \mid 1, \ldots, 1) = \frac{\Gamma(K)}{\Gamma(1)^K} \prod_{k=1}^{K} p_k^{0} = (K - 1)!

The Beta Distribution
The Beta distribution is the special case of the Dirichlet distribution for K = 2:
P(y) = \frac{1}{B(p, q)} y^{p-1} (1 - y)^{q-1}
where p > 0, q > 0, 0 \le y \le 1, and
B(p, q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p + q)}

Dirichlet Prior
In the case of the Dirichlet prior distribution, the derivation goes in a similar manner:
Prob(d_k \mid I_1, \ldots, I_K) = \frac{Prob(d_k, I_1, \ldots, I_K)}{Prob(I_1, \ldots, I_K)}
= \frac{\int_{(p_1, \ldots, p_K) \in S} Prob(d_k, I_1, \ldots, I_K, p_1, \ldots, p_K) \, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} Prob(I_1, \ldots, I_K, q_1, \ldots, q_K) \, dq_1 \cdots dq_K}
= \frac{\int_{(p_1, \ldots, p_K) \in S} Prob(I_1, \ldots, I_{k-1}, I_k + 1, I_{k+1}, \ldots, I_K \mid p_1, \ldots, p_K) \, P(p_1, \ldots, p_K) \, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} Prob(I_1, \ldots, I_K \mid q_1, \ldots, q_K) \, P(q_1, \ldots, q_K) \, dq_1 \cdots dq_K}

Dirichlet Prior
Prob(I_1, \ldots, I_K) = \int_{(q_1, \ldots, q_K) \in S} Prob(I_1, \ldots, I_K, q_1, \ldots, q_K) \, dq_1 \cdots dq_K
= \int_{(q_1, \ldots, q_K) \in S} \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} q_m^{I_m} \cdot \frac{\Gamma\!\left(\sum_{n=1}^{K} \alpha_n\right)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \prod_{m=1}^{K} q_m^{\alpha_m - 1} \, dq_1 \cdots dq_K
= \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma\!\left(\sum_{n=1}^{K} \alpha_n\right)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \int_{(q_1, \ldots, q_K) \in S} \prod_{m=1}^{K} q_m^{I_m + \alpha_m - 1} \, dq_1 \cdots dq_K
= \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma\!\left(\sum_{n=1}^{K} \alpha_n\right)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \cdot \frac{\prod_{k=1}^{K} (I_k + \alpha_k - 1)!}{(I - 1 + \sum_{k=1}^{K} \alpha_k)!}

Dirichlet Prior
Prob(d_k, I_1, \ldots, I_K) = \int_{(q_1, \ldots, q_K) \in S} Prob(d_k, I_1, \ldots, I_K, q_1, \ldots, q_K) \, dq_1 \cdots dq_K
= \int_{(q_1, \ldots, q_K) \in S} Prob(d_k \mid q_1, \ldots, q_K) \, Prob(I_1, \ldots, I_K \mid q_1, \ldots, q_K) \, Prob(q_1, \ldots, q_K) \, dq_1 \cdots dq_K
= \int_{(q_1, \ldots, q_K) \in S} q_k \cdot \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} q_m^{I_m} \cdot \frac{\Gamma\!\left(\sum_{n=1}^{K} \alpha_n\right)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \prod_{m=1}^{K} q_m^{\alpha_m - 1} \, dq_1 \cdots dq_K
= \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma\!\left(\sum_{n=1}^{K} \alpha_n\right)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \cdot \frac{(I_k + \alpha_k) \prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I + \sum_{n=1}^{K} \alpha_n)(I - 1 + \sum_{n=1}^{K} \alpha_n)!}

Dirichlet Prior
Prob(d_k, I_1, \ldots, I_K) = \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma\!\left(\sum_{n=1}^{K} \alpha_n\right)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \cdot \frac{(I_k + \alpha_k) \prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I + \sum_{n=1}^{K} \alpha_n)(I - 1 + \sum_{n=1}^{K} \alpha_n)!}
Prob(I_1, \ldots, I_K) = \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma\!\left(\sum_{n=1}^{K} \alpha_n\right)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \cdot \frac{\prod_{k=1}^{K} (I_k + \alpha_k - 1)!}{(I - 1 + \sum_{k=1}^{K} \alpha_k)!}
Prob(d_k \mid I_1, \ldots, I_K) = \frac{\dfrac{(I_k + \alpha_k) \prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I + \sum_{k=1}^{K} \alpha_k)(I - 1 + \sum_{k=1}^{K} \alpha_k)!}}{\dfrac{\prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I - 1 + \sum_{k=1}^{K} \alpha_k)!}} = \frac{I_k + \alpha_k}{I + \sum_{n=1}^{K} \alpha_n}
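A small sketch of this result in Python. With all α_k equal to some β it is exactly the "start the bin counters from β" recipe of the Bayesian-perspective slide, and α_k = 1 recovers the uniform-prior result (I_k + 1)/(I + K).

import numpy as np

def dirichlet_predictive(counts, alpha):
    """Posterior predictive P(d_k | I_1, ..., I_K) = (I_k + alpha_k) / (I + sum(alpha))."""
    counts = np.asarray(counts, dtype=float)
    alpha = np.broadcast_to(np.asarray(alpha, dtype=float), counts.shape)
    return (counts + alpha) / (counts.sum() + alpha.sum())

print(dirichlet_predictive([3, 0, 5, 2], alpha=1.0))   # Laplace smoothing: [4, 1, 6, 3] / 14
print(dirichlet_predictive([3, 0, 5, 2], alpha=0.5))   # [3.5, 0.5, 5.5, 2.5] / 12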