Quantization. Robert M. Haralick. Computer Science, Graduate Center, City University of New York
1 Quantization Robert M. Haralick Computer Science, Graduate Center, City University of New York
2 Outline
1. Quantizing
3 Quantizing
- Data is real-valued, or integer-valued with a large maximum value
- To use a discrete Bayes rule, the data has to be quantized
- Quantize each dimension to 10 or fewer quantizing intervals
4 The Problem
Assume the input data in each dimension is discretely valued and must now be quantized:
- Determine the number of quantizing levels for each dimension
- Determine the quantizing interval boundaries
- Determine the probability associated with each quantizing bin
5 Quantizer and Bins
- $J$ dimensions
- $L$ quantized values per dimension
- $L^J$ bins in the discrete measurement space
- Each bin has a class conditional probability
6 The Quantizer
Definition: A quantizer $q$ is a monotonically increasing function that takes a real number and produces a non-negative integer between $0$ and $K-1$, where $K$ is the number of quantizing levels.
7 Quantizing Interval
Definition: The quantizing interval $Q_k$ associated with the integer $k$ is defined by
$Q_k = \{x \mid q(x) = k\}$
(Figure: the real line partitioned into intervals $Q_0, Q_1, Q_2, \ldots$)
8 Table Lookup
- Let $Z = (z_1, \ldots, z_J)$ be a measurement tuple
- Let $q_j$ be the quantizing function for the $j$th dimension
- The quantized tuple for $Z$ is $(q_1(z_1), \ldots, q_J(z_J))$
- The address for quantized $Z$ is $a(q_1(z_1), \ldots, q_J(z_J))$
- $P(a(q_1(z_1), \ldots, q_J(z_J)) \mid c) = P(q_1(z_1), \ldots, q_J(z_J) \mid c)$
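To make the table lookup concrete, here is a minimal Python sketch. The slides leave the quantizing functions $q_j$ and the address function $a(\cdot)$ unspecified; the boundaries below and the mixed-radix addressing scheme are illustrative assumptions, not the author's definitions.

```python
import numpy as np

def make_quantizer(boundaries):
    """Return q(x): the index of the quantizing interval containing x.

    `boundaries` are the interior boundaries b_1 < ... < b_{K-1}; values
    below b_1 map to level 0, values >= b_{K-1} map to level K-1.
    """
    boundaries = np.asarray(boundaries)
    return lambda x: int(np.searchsorted(boundaries, x, side="right"))

def address(levels, L):
    """Mixed-radix address a(k_1, ..., k_J) of a quantized tuple.

    `levels` is the quantized tuple, `L` the per-dimension level counts.
    One standard addressing choice; the slides do not specify a().
    """
    a = 0
    for k, l in zip(levels, L):
        a = a * l + k
    return a

# Two dimensions, 3 and 4 levels respectively (illustrative boundaries).
q1 = make_quantizer([0.5, 1.5])      # 3 levels
q2 = make_quantizer([10, 20, 30])    # 4 levels
z = (0.7, 25.0)
levels = (q1(z[0]), q2(z[1]))
print(levels, address(levels, (3, 4)))   # (1, 2) -> 1*4 + 2 = 6
```

The address then indexes directly into a table of class conditional bin probabilities.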
9 Entropy Definition
Definition: If $p_1, \ldots, p_K$ is a discrete probability function, its entropy $H$ is
$H = -\sum_{k=1}^{K} p_k \log_2 p_k$
10 Entropy Meaning
- Person A chooses an index from $\{1, \ldots, K\}$ in accordance with the probabilities $p_1, \ldots, p_K$
- Person B is to determine the index chosen by Person A by guessing
- Person B can ask any question that can be answered Yes or No
- Assuming that Person B is clever in formulating the questions, it will take on average $H$ questions to correctly determine the index Person A chose
11 Entropy Quantizing
$H = -\sum_{k=1}^{K} p_k \log_2 p_k$
If a message is sent composed of index choices sampled from a distribution with probabilities $p_1, \ldots, p_K$, the average information in the message is $H$ bits per symbol.
12 Entropy Estimation
- Data is discrete
- Observe $x_1, \ldots, x_N$, where each $x_n \in \{1, \ldots, K\}$
- Count the number of occurrences $m_k = \#\{n \mid x_n = k\}$
- Estimate the probability $\hat{p}_k = m_k / N$
- Number of zero counts: $n_0 = \#\{k \mid m_k = 0\}$
- Bias-corrected estimate of entropy:
$\hat{H} = -\sum_{k=1}^{K} \hat{p}_k \log_2 \hat{p}_k + \frac{(K - n_0) - 1}{2N \log_e 2}$

G. Miller, "Note on the Bias of Information Estimates," in H. Quastler (Ed.), Information Theory in Psychology II-B, Free Press, Glencoe, IL, 1955.
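A small sketch of this estimator. The garbled correction term is reconstructed here as the Miller bias correction $((K - n_0) - 1)/(2N \log_e 2)$, where $K - n_0$ is the number of occupied bins; that reading is an assumption.

```python
import numpy as np

def entropy_miller(x, K):
    """Plug-in entropy in bits plus the Miller bias correction.

    x: observations in {1, ..., K}.
    """
    x = np.asarray(x)
    N = len(x)
    m = np.bincount(x - 1, minlength=K)          # counts m_k
    p = m / N
    nz = p > 0
    h_mle = -np.sum(p[nz] * np.log2(p[nz]))      # plug-in estimate
    n0 = np.sum(m == 0)                          # zero-count bins
    return h_mle + ((K - n0) - 1) / (2 * N * np.log(2))

x = [1, 1, 2, 3, 3, 3, 4, 1, 2, 2]
print(entropy_miller(x, K=5))
```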
13 Number of Quantizing Levels
- $J$ dimensions
- Each observation is a tuple $Z = (z_1, \ldots, z_J)$
- Each $z_j$ is discretely valued
- Let $M$ be the total number of quantizing bins over the $J$ dimensions:
$M = \prod_{j=1}^{J} L_j$
- How to determine the number $L_j$ of bins for the $j$th dimension?
14 The Entropy Solution
Let $\hat{H}_j$ be the entropy of the $j$th component of $Z$, and let $L_j$ be the number of bins for the $j$th component of $Z$. Define
$f_j = \frac{\hat{H}_j}{\sum_{i=1}^{J} \hat{H}_i}, \qquad L_j = M^{f_j}$
Then
$\prod_{j=1}^{J} L_j = \prod_{j=1}^{J} M^{f_j} = M^{\sum_{j=1}^{J} f_j} = M$
15 The Number of Quantizing Bins
Let $\hat{H}_j$ be the entropy of the $j$th component of $Z$, and let $L_j$ be the number of bins for the $j$th component. With
$f_j = \frac{\hat{H}_j}{\sum_{i=1}^{J} \hat{H}_i}, \qquad L_j = M^{f_j}$
we have now solved for the number of quantizing bins for each dimension.
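A minimal sketch of the entropy-proportional allocation; the entropy values are illustrative. Because the fractions $f_j$ sum to 1, the product of the $L_j$ equals $M$ exactly (before any rounding to integer level counts, which the slides do not discuss).

```python
import numpy as np

def allocate_levels(H, M):
    """Levels per dimension: L_j = M^{f_j}, f_j = H_j / sum_i H_i.

    Returns real-valued L_j; in practice these would be rounded
    while keeping the product near the bin budget M.
    """
    H = np.asarray(H, dtype=float)
    f = H / H.sum()
    return M ** f

H = [3.2, 1.1, 2.4]          # estimated component entropies (made up)
L = allocate_levels(H, M=1000)
print(L, L.prod())           # product of the L_j equals M
```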
16 How to Determine the Probabilities
- Memory size for each class conditional probability is $M$
- Real-valued data: if the number of observations is $N$, equal-interval quantize each component to $N/10$ levels
- Digitized data: if each data item is $I$ bits, set the number of quantized levels to $2^I$
- Determine the probability for each quantized level for each component
- Determine the entropy $\hat{H}_j$ for each component $j$
- Set the number of quantized levels for component $j$ to $L_j = M^{f_j}$, where $f_j = \frac{\hat{H}_j}{\sum_{i=1}^{J} \hat{H}_i}$
17 Initial Quantizing Interval Boundary
- The sample is $Z_1, \ldots, Z_N$; each tuple has $J$ components; component $j$ has $L_j$ quantized levels
- The $n$th observed tuple: $Z_n = (z_{n1}, \ldots, z_{nJ})$
- Let $z_{(1)j}, \ldots, z_{(N)j}$ be the $N$ values of the $j$th component of the observed tuples, in ascending order
- The left quantizing interval boundaries are:
$z_{(1)j},\ z_{(\frac{N}{L_j}+1)j},\ \ldots,\ z_{(\frac{kN}{L_j}+1)j},\ \ldots,\ z_{(\frac{(L_j-1)N}{L_j}+1)j}$
with $z_{(N)j}$ the right end of the last interval
18 Initial Quantizing Interval Boundary
Same setup as the previous slide, but with the outer boundaries fixed at the range of the (10-bit) data rather than at the extreme order statistics:
$0,\ z_{(\frac{N}{L_j}+1)j},\ \ldots,\ z_{(\frac{kN}{L_j}+1)j},\ \ldots,\ z_{(\frac{(L_j-1)N}{L_j}+1)j},\ 1023$
19 Example
Suppose $N = 12$, the data is 10 bits, and $L_j = 4$.
- $j$th component: $z_{1j}, \ldots, z_{12j}$
- Ordered values of the $j$th component: $z_{(1)j}, \ldots, z_{(12)j}$
- $\frac{N}{L_j} + 1 = 4$, $\frac{2N}{L_j} + 1 = 7$, $\frac{3N}{L_j} + 1 = 10$
- The quantizing intervals are:
$[0, z_{(4)j}),\ [z_{(4)j}, z_{(7)j}),\ [z_{(7)j}, z_{(10)j}),\ [z_{(10)j}, 1023)$
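The example can be reproduced with a few lines of Python. The data values below are made up; `zs[k * N // L]` is the 1-based order statistic $z_{(\frac{kN}{L}+1)j}$ in 0-based array indexing.

```python
import numpy as np

def left_boundaries(z, L, lo, hi):
    """Left boundaries of L equal-frequency intervals for one component.

    z: the N observed values of the component; lo/hi: the data range
    (0 and 1023 for 10-bit data). Interval k covers [b_k, b_{k+1}).
    """
    zs = np.sort(z)
    N = len(zs)
    return [lo] + [zs[k * N // L] for k in range(1, L)] + [hi]

z = [512, 3, 77, 900, 251, 630, 402, 5, 814, 333, 120, 999]
print(left_boundaries(z, L=4, lo=0, hi=1023))
# With N = 12, L = 4 this picks z_(4), z_(7), z_(10), as on the slide.
```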
20 Initial Quantizing Interval Boundary
- The sample is $Z_1, \ldots, Z_N$; the $n$th observed tuple is $Z_n = (z_{n1}, \ldots, z_{nJ})$
- Let $z_{(1)j}, \ldots, z_{(N)j}$ be the $N$ values of the $j$th component of the observed tuples, in ascending order
- $k$ indexes the quantizing interval: $k = 1, \ldots, L_j$
- The $k$th quantizing interval $[c_{kj}, d_{kj})$ for the $j$th component is defined, for $k \in \{2, \ldots, L_j - 1\}$, by
$c_{kj} = z_{(\frac{(k-1)N}{L_j}+1)j}, \qquad d_{kj} = z_{(\frac{kN}{L_j}+1)j}$
21 Non-uniform Quantization
22 Maximum Likelihood Probability Estimation
- The sample is $Z_1, \ldots, Z_N$; the $n$th observed tuple is $Z_n = (z_{n1}, \ldots, z_{nJ})$
- The quantized tuple for $Z_n$ is $(q_1(z_{n1}), \ldots, q_J(z_{nJ}))$ and its address is $a(q_1(z_{n1}), \ldots, q_J(z_{nJ}))$
- The bins are numbered $1, \ldots, M$
- The number of observations falling into bin $m$ is
$t_m = \#\{n \mid a(q_1(z_{n1}), \ldots, q_J(z_{nJ})) = m\}$
- The maximum likelihood estimate of the probability for bin $m$ is $p_m = t_m / N$
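A sketch of the count-and-normalize estimate, reusing the mixed-radix addressing assumed earlier; the quantizers and samples here are illustrative stand-ins.

```python
from collections import Counter

def ml_bin_probabilities(samples, quantizers, L):
    """Maximum likelihood estimate p_m = t_m / N for every occupied bin.

    samples: list of J-tuples; quantizers: one q_j per dimension;
    L: levels per dimension (for mixed-radix addressing, as before).
    """
    N = len(samples)
    counts = Counter()
    for z in samples:
        levels = tuple(q(x) for q, x in zip(quantizers, z))
        a = 0
        for k, l in zip(levels, L):
            a = a * l + k                  # bin address
        counts[a] += 1
    return {m: t / N for m, t in counts.items()}

q = [lambda x: int(x >= 0.5), lambda x: min(int(x // 10), 2)]
samples = [(0.2, 5), (0.7, 15), (0.9, 25), (0.7, 14)]
print(ml_bin_probabilities(samples, q, L=(2, 3)))
```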
23 Density Estimation Using Fixed Volumes
- Total count $N$
- Fix a volume $v$
- Count the number $k$ of observations in the volume $v$
- Density is mass divided by volume: estimate the density at each point in the volume by $k/(Nv)$
24 Density Estimation Using Fixed Counts
- Total count $N$
- Fix a count $k$
- Find the smallest volume $v$ around the point whose count just reaches $k$
- Density is mass divided by volume: estimate the density at each point in the volume by $k/(Nv)$
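A one-dimensional sketch of the fixed-count idea. The slides state it for general volumes; here the "volume" is a symmetric interval around the query point, and all names are illustrative.

```python
import numpy as np

def knn_density(x0, data, k):
    """Fixed-count density estimate at x0 (one-dimensional sketch).

    Grow a symmetric interval around x0 until it captures k points,
    then estimate the density as k / (N * v), v = interval length.
    """
    data = np.asarray(data)
    N = len(data)
    r = np.sort(np.abs(data - x0))[k - 1]   # distance to k-th neighbor
    v = 2 * r                               # length of [x0 - r, x0 + r]
    return k / (N * v)

data = np.random.default_rng(0).normal(size=500)
print(knn_density(0.0, data, k=25))         # near 1/sqrt(2*pi) ~ 0.40
```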
25 Smoothed Estimates
If the sample size is not large enough, the MLE probability estimates may not be representative, so we smooth across bins.
- Let bin $m$ have volume $v_m$ and count $t_m$
- Let $m_1, \ldots, m_I$ be the indexes of the $I$ closest bins to bin $m$ satisfying
$\sum_{i=1}^{I-1} t_{m_i} < k \le \sum_{i=1}^{I} t_{m_i}$
- Define $b_m = \sum_{i=1}^{I} t_{m_i}$ and $V_m = \sum_{i=1}^{I} v_{m_i}$
- Density of each point in bin $m$: $\alpha b_m / V_m$
- Set $\alpha$ so that the density integrates to 1
26 Smoothed Estimates
- Density of each point in bin $m$: $\alpha b_m / V_m$
- Volume of bin $m$: $v_m$
- Probability of bin $m$: $p_m = (\alpha b_m / V_m) v_m$
- Total probability: $1 = \sum_{m=1}^{M} \alpha b_m v_m / V_m$, so
$\alpha = \frac{1}{\sum_{m=1}^{M} b_m v_m / V_m}, \qquad p_m = \frac{b_m v_m / V_m}{\sum_{k=1}^{M} b_k v_k / V_k}$
27 Smoothed Estimates
$p_m = \frac{b_m v_m / V_m}{\sum_{k=1}^{M} b_k v_k / V_k}$
If $v_m = v$ for $m = 1, \ldots, M$, then $V_m = I_m v$ and
$p_m = \frac{b_m v / (I_m v)}{\sum_{k=1}^{M} b_k v / (I_k v)} = \frac{b_m / I_m}{\sum_{k=1}^{M} b_k / I_k}$
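A sketch of the equal-volume smoothing rule $p_m \propto b_m / I_m$ for one-dimensional bins, where "closest bins" is taken to mean nearest by bin index. That neighborhood choice is an assumption for illustration; the slides define closeness over bins in $J$ dimensions.

```python
import numpy as np

def smoothed_probabilities(t, k):
    """Equal-volume bin smoothing: p_m proportional to b_m / I_m.

    t: counts per bin (1-D). For each bin, accumulate the nearest bins
    (by index distance) until their counts just reach k; b_m is that
    total and I_m the number of bins used. Assumes sum(t) >= k.
    """
    t = np.asarray(t)
    M = len(t)
    score = np.empty(M)
    for m in range(M):
        order = np.argsort(np.abs(np.arange(M) - m), kind="stable")
        csum = np.cumsum(t[order])
        I_m = int(np.searchsorted(csum, k)) + 1   # bins needed to reach k
        score[m] = csum[I_m - 1] / I_m            # b_m / I_m
    return score / score.sum()

t = [0, 1, 7, 12, 4, 0, 0, 2]
print(smoothed_probabilities(t, k=5))   # empty bins get nonzero mass
```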
28 Fixed Sample Size
- Sample $Z_1, \ldots, Z_N$; total number of bins $M$
- Calculate the quantizer
- Determine the probability for each bin
- Smooth the bin probabilities using smoothing parameter $k$
- Calculate a decision rule maximizing expected gain
- Everything depends on $M$ and $k$
29 Memorization and Generalization
- $k$ too small: memorization, over-fitting
- $k$ too large: over-generalization, under-fitting
- $M$ too large: memorization, over-fitting
- $M$ too small: over-generalization, under-fitting
30 Optimize the Probability Estimation Parameters
- Split the ground-truthed sample into three parts
- Use the first part to calculate the quantizer and bin probabilities
- Calculate the discrete Bayes decision rule
- Apply the decision rule to the second part to obtain an unbiased estimate of the expected economic gain of the decision rule
- Brute-force optimize to find the values of $M$ and $k$ that maximize the estimated expected gain
- With $M$ and $k$ fixed, use the third part to determine an estimate of the expected gain of the optimized rule
31 Optimize the Probability Estimation Parameters
Once the parameters $M$ and $k$ have been optimized, the quantizer boundaries can be optimized. Repeat until no change:
- Using training data part 1:
  - Choose a dimension; choose a boundary; change the boundary
  - Determine the adjusted probabilities
  - Determine the discrete Bayes decision rule
- Using training data part 2:
  - Calculate the expected gain
32 Optimize the Quantizer Boundaries
(Figure: boundaries $b_{1j}, b_{2j}, \ldots, b_{kj}, \ldots, b_{Kj}$ placed at the order statistics along the data range from 0 to 1023.)
Repeat until no change:
- Randomly choose a component $j$ and quantizing interval $k$
- Randomly choose a small perturbation $\delta$ ($\delta$ can be positive or negative)
- Randomly choose a small integer $M$ (no collision with neighboring boundaries)
- Set $b_{kj}^{new} = b_{kj} - \delta(M+1)$
- For $m = 0, \ldots, 2M+1$:
  - $b_{kj}^{new} \leftarrow b_{kj}^{new} + \delta$
  - Compute the new probabilities; recompute the Bayes rule; save the expected gain
- Replace $b_{kj}$ by the boundary position associated with the highest gain
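A sketch of one step of this randomized search. The `expected_gain` callback stands in for the full quantize / re-estimate / Bayes-rule pipeline, which the slides describe but do not spell out; everything else follows the slide's sweep of $2M+2$ candidate positions around the current boundary.

```python
import random

def perturb_boundary_step(boundaries, expected_gain, delta_max=5.0, M_max=4):
    """One step of the randomized boundary search (a sketch).

    boundaries: sorted list of interior boundaries for one component.
    expected_gain: callback scoring a candidate boundary list on held-out
    data (a stand-in for re-estimating probabilities and the Bayes rule).
    """
    k = random.randrange(len(boundaries))
    delta = random.uniform(-delta_max, delta_max)   # small perturbation
    M = random.randint(1, M_max)                    # small sweep half-width
    best_b, best_gain = boundaries[k], expected_gain(boundaries)
    b = boundaries[k] - delta * (M + 1)
    lo = boundaries[k - 1] if k > 0 else float("-inf")
    hi = boundaries[k + 1] if k + 1 < len(boundaries) else float("inf")
    for _ in range(2 * M + 2):                      # m = 0, ..., 2M+1
        b += delta
        if lo < b < hi:                             # no boundary collision
            g = expected_gain(boundaries[:k] + [b] + boundaries[k + 1:])
            if g > best_gain:
                best_b, best_gain = b, g
    boundaries[k] = best_b                          # keep the best position
    return boundaries

# Toy gain that prefers boundaries near multiples of 100 (illustrative only).
gain = lambda bs: -sum(min(b % 100, 100 - b % 100) for b in bs)
print(perturb_boundary_step([120.0, 402.0, 814.0], gain))
```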
33 Optimize the Quantizer Boundaries
(Figure: boundaries $b_{1j}, \ldots, b_{Kj}$ at the order statistics $z_{(1)j}, z_{(\frac{N}{K}+1)j}, \ldots, z_{(\frac{(K-1)N}{K}+1)j}, z_{(N)j}$.)
- The greedy algorithm has a random component, so multiple runs will produce different answers
- Repeat the greedy algorithm $T$ times, keeping track of the best result so far
- After $T$ runs, use the best result
34 Bayesian Perspective
- MLE: start bin counters from 0
- Bayesian: start bin counters from $\beta$, $\beta > 0$
35 The Observation
There are $K$ bins. Each time an observation is made, it falls into exactly one of the $K$ bins; the probability that an observation falls into bin $k$ is $p_k$. To estimate the bin probabilities $p_1, \ldots, p_K$, we take a random sample of $I$ observations and find that
- $I_1$ observations fall into bin 1
- $I_2$ observations fall into bin 2
- ...
- $I_K$ observations fall into bin $K$
36 Multinomial
Under the protocol of the random sampling, the probability of observing counts $I_1, \ldots, I_K$ given the bin probabilities $p_1, \ldots, p_K$ is given by the multinomial
$P(I_1, \ldots, I_K \mid p_1, \ldots, p_K) = \frac{I!}{I_1! \cdots I_K!}\, p_1^{I_1} \cdots p_K^{I_K}$
37 Bayesian
Having observed $I_1, \ldots, I_K$, we would like to determine the probability that an observation falls into bin $k$. Denote by $d_k$ the event that an observation falls into bin $k$. We wish to determine $P(d_k \mid I_1, \ldots, I_K)$.
38 K−1 Simplex
To do this we will need to evaluate two integrals over the $(K-1)$-simplex
$S = \{(q_1, \ldots, q_K) \mid 0 \le q_k \le 1,\ k = 1, \ldots, K,\ \text{and}\ q_1 + q_2 + \cdots + q_K = 1\}$
- 0-simplex: point
- 1-simplex: line segment
- 2-simplex: triangle
- 3-simplex: tetrahedron
- 4-simplex: pentachoron
39 K−1 Simplex
Definition: A $(K-1)$-simplex is a $(K-1)$-dimensional polytope which is the convex hull of its $K$ vertices.
$S = \{(q_1, \ldots, q_K) \mid 0 \le q_k \le 1,\ k = 1, \ldots, K,\ \text{and}\ q_1 + q_2 + \cdots + q_K = 1\}$
The $K$ vertices of $S$ are the $K$ tuples $(1, 0, \ldots, 0), (0, 1, 0, \ldots, 0), (0, 0, 1, 0, \ldots, 0), \ldots, (0, \ldots, 0, 1)$.
40 The K−1 Simplex
$S = \{(q_1, \ldots, q_K) \mid 0 \le q_k \le 1,\ k = 1, \ldots, K,\ \text{and}\ q_1 + q_2 + \cdots + q_K = 1\}$
41 Two Integrals
They are:
$\int_{(q_1, \ldots, q_K) \in S} dq_1 \cdots dq_K = \frac{1}{(K-1)!}$
$\int_{(q_1, \ldots, q_K) \in S} \prod_{k=1}^{K} q_k^{I_k}\, dq_1 \cdots dq_K = \frac{\prod_{k=1}^{K} I_k!}{(I + K - 1)!}$
where $\sum_{k=1}^{K} I_k = I$.
42 Gamma Function
$\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\, dt, \qquad \Gamma(n) = (n-1)!$
43 Derivation
The derivation goes as follows:
$\mathrm{Prob}(d_k \mid I_1, \ldots, I_K) = \frac{\mathrm{Prob}(d_k, I_1, \ldots, I_K)}{\mathrm{Prob}(I_1, \ldots, I_K)}$
$= \frac{\int_{(p_1, \ldots, p_K) \in S} \mathrm{Prob}(d_k, I_1, \ldots, I_K, p_1, \ldots, p_K)\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(I_1, \ldots, I_K, q_1, \ldots, q_K)\, dq_1 \cdots dq_K}$
$= \frac{\int_{(p_1, \ldots, p_K) \in S} \mathrm{Prob}(d_k, I_1, \ldots, I_K \mid p_1, \ldots, p_K)\, P(p_1, \ldots, p_K)\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(I_1, \ldots, I_K \mid q_1, \ldots, q_K)\, P(q_1, \ldots, q_K)\, dq_1 \cdots dq_K}$
$= \frac{\int_{(p_1, \ldots, p_K) \in S} \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} p_m^{I_m}\, p_k\, (K-1)!\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} q_m^{I_m}\, (K-1)!\, dq_1 \cdots dq_K}$
44 Derivation
$\mathrm{Prob}(d_k \mid I_1, \ldots, I_K) = \frac{\int_{(p_1, \ldots, p_K) \in S} p_1^{I_1} p_2^{I_2} \cdots p_{k-1}^{I_{k-1}}\, p_k^{I_k + 1}\, p_{k+1}^{I_{k+1}} \cdots p_K^{I_K}\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} q_1^{I_1} q_2^{I_2} \cdots q_K^{I_K}\, dq_1 \cdots dq_K}$
$= \frac{I_1! I_2! \cdots I_{k-1}!\, (I_k + 1)!\, I_{k+1}! \cdots I_K! \,/\, (I+K)!}{I_1! I_2! \cdots I_K! \,/\, (I+K-1)!} = \frac{I_k + 1}{I + K}$
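This is the familiar Laplace "add-one" rule; a one-liner makes the result concrete:

```python
def bayes_bin_estimate(counts):
    """Posterior probability of each bin under a uniform prior:
    P(d_k | I_1, ..., I_K) = (I_k + 1) / (I + K)."""
    I, K = sum(counts), len(counts)
    return [(Ik + 1) / (I + K) for Ik in counts]

print(bayes_bin_estimate([0, 3, 7]))   # no bin gets probability zero
```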
45 Prior Distribution
The prior distribution over the simplex does not have to be taken as uniform. The natural prior distribution over the simplex is the Dirichlet distribution:
$P(p_1, \ldots, p_K \mid \alpha_1, \ldots, \alpha_K) = \frac{\Gamma(\sum_{k=1}^{K} \alpha_k)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} p_k^{\alpha_k - 1}$
where $\alpha_k > 0$, $0 < p_k < 1$ for $k = 1, \ldots, K$, $\sum_{k=1}^{K-1} p_k < 1$, and $p_K = 1 - \sum_{k=1}^{K-1} p_k$.
46 Dirichlet Distribution Properties
$E[p_k] = \frac{\alpha_k}{\sum_{j=1}^{K} \alpha_j}$
$V[p_k] = \frac{E[p_k](1 - E[p_k])}{1 + \sum_{j=1}^{K} \alpha_j}$
$C[p_i, p_j] = \frac{-E[p_i]\, E[p_j]}{1 + \sum_{k=1}^{K} \alpha_k}$
If $\alpha_k > 1$ for $k = 1, \ldots, K$, the maximum density occurs at
$p_k = \frac{\alpha_k - 1}{(\sum_{j=1}^{K} \alpha_j) - K}$
47 The Uniform
The uniform distribution on the $(K-1)$-simplex is the special case of the Dirichlet distribution with $\alpha_k = 1$, $k = 1, \ldots, K$:
$P(p_1, \ldots, p_K \mid 1, \ldots, 1) = \frac{\Gamma(K)}{\Gamma(1)^K} \prod_{k=1}^{K} p_k^0 = (K-1)!$
48 The Beta Distribution
The Beta distribution is the special case of the Dirichlet distribution for $K = 2$:
$P(y) = \frac{1}{B(p, q)}\, y^{p-1} (1-y)^{q-1}$
where $p > 0$, $q > 0$, $0 \le y \le 1$, and
$B(p, q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}$
49 Dirichlet Prior
In the case of the Dirichlet prior distribution, the derivation goes in a similar manner:
$\mathrm{Prob}(d_k \mid I_1, \ldots, I_K) = \frac{\mathrm{Prob}(d_k, I_1, \ldots, I_K)}{\mathrm{Prob}(I_1, \ldots, I_K)}$
$= \frac{\int_{(p_1, \ldots, p_K) \in S} \mathrm{Prob}(d_k, I_1, \ldots, I_K, p_1, \ldots, p_K)\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(I_1, \ldots, I_K, q_1, \ldots, q_K)\, dq_1 \cdots dq_K}$
$= \frac{\int_{(p_1, \ldots, p_K) \in S} \mathrm{Prob}(I_1, \ldots, I_{k-1}, I_k + 1, I_{k+1}, \ldots, I_K \mid p_1, \ldots, p_K)\, P(p_1, \ldots, p_K)\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(I_1, \ldots, I_K \mid q_1, \ldots, q_K)\, P(q_1, \ldots, q_K)\, dq_1 \cdots dq_K}$
50 Dirichlet Prior
$\mathrm{Prob}(I_1, \ldots, I_K) = \int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(I_1, \ldots, I_K, q_1, \ldots, q_K)\, dq_1 \cdots dq_K$
$= \int_{(q_1, \ldots, q_K) \in S} \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} q_m^{I_m} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \prod_{m=1}^{K} q_m^{\alpha_m - 1}\, dq_1 \cdots dq_K$
$= \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \int_{(q_1, \ldots, q_K) \in S} \prod_{m=1}^{K} q_m^{I_m + \alpha_m - 1}\, dq_1 \cdots dq_K$
$= \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \cdot \frac{\prod_{k=1}^{K} (I_k + \alpha_k - 1)!}{(I - 1 + \sum_{k=1}^{K} \alpha_k)!}$
51 Dirichlet Prior
$\mathrm{Prob}(d_k, I_1, \ldots, I_K) = \int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(d_k, I_1, \ldots, I_K, q_1, \ldots, q_K)\, dq_1 \cdots dq_K$
$= \int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(d_k)\, \mathrm{Prob}(I_1, \ldots, I_K \mid q_1, \ldots, q_K)\, \mathrm{Prob}(q_1, \ldots, q_K)\, dq_1 \cdots dq_K$
$= \int_{(q_1, \ldots, q_K) \in S} q_k\, \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} q_m^{I_m} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \prod_{m=1}^{K} q_m^{\alpha_m - 1}\, dq_1 \cdots dq_K$
$= \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \int_{(q_1, \ldots, q_K) \in S} q_k \prod_{m=1}^{K} q_m^{I_m + \alpha_m - 1}\, dq_1 \cdots dq_K$
$= \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \cdot \frac{(I_k + \alpha_k) \prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I + \sum_{n=1}^{K} \alpha_n)(I - 1 + \sum_{n=1}^{K} \alpha_n)!}$
52 Dirichlet Prior
$\mathrm{Prob}(d_k, I_1, \ldots, I_K) = \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \cdot \frac{(I_k + \alpha_k) \prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I + \sum_{n=1}^{K} \alpha_n)(I - 1 + \sum_{n=1}^{K} \alpha_n)!}$
$\mathrm{Prob}(I_1, \ldots, I_K) = \frac{I!}{\prod_{n=1}^{K} I_n!} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \cdot \frac{\prod_{k=1}^{K} (I_k + \alpha_k - 1)!}{(I - 1 + \sum_{k=1}^{K} \alpha_k)!}$
$\mathrm{Prob}(d_k \mid I_1, \ldots, I_K) = \frac{\dfrac{(I_k + \alpha_k) \prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I + \sum_{k=1}^{K} \alpha_k)(I - 1 + \sum_{k=1}^{K} \alpha_k)!}}{\dfrac{\prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I - 1 + \sum_{k=1}^{K} \alpha_k)!}} = \frac{I_k + \alpha_k}{I + \sum_{n=1}^{K} \alpha_n}$
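The final formula is easy to implement directly; $\alpha_k = 1$ recovers the uniform-prior result above, and $\alpha_k = \beta > 0$ is the "start bin counters from $\beta$" rule of slide 34:

```python
def dirichlet_bin_estimate(counts, alpha):
    """Posterior probability of each bin under a Dirichlet(alpha) prior:
    P(d_k | I_1, ..., I_K) = (I_k + alpha_k) / (I + sum_n alpha_n)."""
    I, A = sum(counts), sum(alpha)
    return [(Ik + ak) / (I + A) for Ik, ak in zip(counts, alpha)]

print(dirichlet_bin_estimate([0, 3, 7], alpha=[0.5, 0.5, 0.5]))
```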