1 Quantization Robert M. Haralick Computer Science, Graduate Center City University of New York

2 Outline 1. Quantizing

3 Quantizing
The data may be real-valued, or integer-valued with a large maximum value. To use a discrete Bayes rule, the data has to be quantized. Quantize each dimension to 10 or fewer quantizing intervals.

4 The Problem
Assume the input data in each dimension is discretely valued, and it must now be quantized. Determine the number of quantizing levels for each dimension, the quantizing interval boundaries, and the probability associated with each quantizing bin.

5 Quantizer and Bins
With J dimensions and L quantized values per dimension there are $L^J$ bins in the discrete measurement space. Each bin has a class conditional probability.

6 The Quantizer
Definition: A quantizer q is a monotonically increasing function that takes in a real number and produces a non-negative integer between 0 and $K-1$, where K is the number of quantizing levels.

7 Quantizing Interval
Definition: The quantizing interval $Q_k$ associated with the integer k is defined by $Q_k = \{x \mid q(x) = k\}$. (Diagram: the real line partitioned into intervals $Q_0, Q_1, Q_2, \ldots$)
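A minimal sketch of such a quantizer, assuming the interior interval boundaries are already known; the helper name `make_quantizer` is illustrative, not something defined on the slides:

```python
import bisect

def make_quantizer(boundaries):
    """Return a quantizing function q mapping a real value to an integer level
    in {0, ..., K-1}, where K = len(boundaries) + 1 and boundaries holds the
    interior interval boundaries in ascending order."""
    def q(x):
        # q(x) = k exactly when x lies in the quantizing interval Q_k
        return bisect.bisect_right(boundaries, x)
    return q

# Example: K = 4 levels with interior boundaries at 2.5, 5.0, 7.5
q = make_quantizer([2.5, 5.0, 7.5])
print([q(x) for x in (1.0, 3.2, 6.9, 9.4)])   # [0, 1, 2, 3]
```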

8 Table Lookup
Let $Z = (z_1, \ldots, z_J)$ be a measurement tuple and let $q_j$ be the quantizing function for the j-th dimension. The quantized tuple for Z is $(q_1(z_1), \ldots, q_J(z_J))$, and the address for quantized Z is $a(q_1(z_1), \ldots, q_J(z_J))$.
$P(a(q_1(z_1), \ldots, q_J(z_J)) \mid c) = P(q_1(z_1), \ldots, q_J(z_J) \mid c)$
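One reasonable choice of the address function a(.) is a mixed-radix index; the slides only require that a map each quantized tuple to a distinct bin, so this sketch is illustrative:

```python
def bin_address(quantized, levels):
    """Mixed-radix address of a quantized tuple (k_1, ..., k_J), where
    levels[j] is the number of quantizing levels L_j of dimension j.
    The address ranges over {0, ..., M-1} with M = prod(levels)."""
    addr = 0
    for k, L in zip(quantized, levels):
        addr = addr * L + k
    return addr

# Example: J = 3 dimensions with (4, 3, 2) levels, so M = 24 bins
print(bin_address((2, 1, 0), (4, 3, 2)))   # 14
```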

9 Entropy Definition
Definition: If $p_1, \ldots, p_K$ is a discrete probability function, its entropy H is
$H = -\sum_{k=1}^{K} p_k \log_2 p_k$

10 Entropy Meaning
Person A chooses an index from $\{1, \ldots, K\}$ in accordance with the probabilities $p_1, \ldots, p_K$. Person B is to determine the index chosen by Person A by guessing, and can ask any question that can be answered Yes or No. Assuming that Person B is clever in formulating the questions, it will take on average H questions to correctly determine the index Person A chose.

11 Entropy Quantizing
$H = -\sum_{k=1}^{K} p_k \log_2 p_k$
If a message is sent composed of index choices sampled from a distribution with probabilities $p_1, \ldots, p_K$, the average information in the message is H bits per symbol.

12 Entropy Estimation
The data is discrete. Observe $x_1, \ldots, x_N$, where each $x_n \in \{1, \ldots, K\}$. Count the number of occurrences $m_k = \#\{n \mid x_n = k\}$ and estimate the probabilities $\hat p_k = m_k / N$. Let $n_0 = \#\{k \mid m_k = 0\}$ be the number of zero counts. The bias-corrected estimate of the entropy is
$\hat H = -\sum_{k=1}^{K} \hat p_k \log_2 \hat p_k + \frac{K - n_0 - 1}{2N \log_e 2}$
G. Miller, "Note on the Bias of Information Estimates," in H. Quastler (Ed.), Information Theory in Psychology II-B, Free Press, Glencoe, IL, 1955.
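A sketch of this estimate; the bias-correction term follows the formula as reconstructed above, and the function name is illustrative:

```python
import math
from collections import Counter

def entropy_estimate(samples, K):
    """Plug-in entropy estimate in bits with the Miller bias correction:
    Hhat = -sum p_k log2 p_k + (K - n0 - 1) / (2 N ln 2)."""
    N = len(samples)
    counts = Counter(samples)
    plug_in = -sum((c / N) * math.log2(c / N) for c in counts.values())
    n0 = K - len(counts)                       # number of zero-count bins
    return plug_in + (K - n0 - 1) / (2 * N * math.log(2))

print(round(entropy_estimate([1, 1, 2, 3, 3, 3, 4, 4], K=5), 3))
```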

13 Number of Quantizing Levels
There are J dimensions, each observation is a tuple $Z = (z_1, \ldots, z_J)$, and each $z_j$ is discretely valued. Let M be the total number of quantizing bins over the J dimensions, so that $M = \prod_{j=1}^{J} L_j$. How do we determine the number $L_j$ of bins for the j-th dimension?

14 The Entropy Solution
Let $\hat H_j$ be the entropy of the j-th component of Z and let $L_j$ be the number of bins for the j-th component of Z. Define
$f_j = \frac{\hat H_j}{\sum_{i=1}^{J} \hat H_i}$ and $L_j = M^{f_j}$
Then
$\prod_{j=1}^{J} L_j = \prod_{j=1}^{J} M^{f_j} = M^{\sum_{j=1}^{J} f_j} = M$

15 The Number of Quantizing Bins
Let $\hat H_j$ be the entropy of the j-th component of Z and let $L_j$ be the number of bins for the j-th component of Z. With $f_j = \hat H_j / \sum_{i=1}^{J} \hat H_i$ and $L_j = M^{f_j}$, we have solved for the number of quantizing bins for each dimension.
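A sketch of this entropy-proportional allocation; rounding $M^{f_j}$ to an integer is an extra practical step not specified on the slides:

```python
def allocate_levels(entropies, M):
    """Allocate quantizing levels across dimensions: f_j = H_j / sum_i H_i
    and L_j = M ** f_j, rounded to at least one level per dimension."""
    total = sum(entropies)
    return [max(1, round(M ** (h / total))) for h in entropies]

# Example: J = 3 components with entropies 3, 2, 1 bits and a bin budget M = 1000
print(allocate_levels([3.0, 2.0, 1.0], M=1000))   # [32, 10, 3]
```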

16 How to Determine The Probabilities
The memory size for each class conditional probability table is M.
Real-valued data: if the number of observations is N, equal-interval quantize each component to N/10 levels.
Digitized data: if each data item is I bits, set the number of quantized levels to $2^I$.
Determine the probability for each quantized level for each component, and determine the entropy $\hat H_j$ for each component j. Then set the number of quantized levels for component j to $L_j = M^{f_j}$, where $f_j = \hat H_j / \sum_{i=1}^{J} \hat H_i$.

17 Initial Quantizing Interval Boundary
The sample is $Z_1, \ldots, Z_N$, each tuple has J components, and component j has $L_j$ quantized levels. The n-th observed tuple is $Z_n = (z_{n1}, \ldots, z_{nJ})$. Let $z_{(1)j}, \ldots, z_{(N)j}$ be the N values of the j-th component of the observed tuples, ordered in ascending order. The left quantizing interval boundaries are
$z_{(1)j},\ z_{(N/L_j+1)j},\ \ldots,\ z_{(kN/L_j+1)j},\ \ldots,\ z_{((L_j-1)N/L_j+1)j}$
with the last interval ending at $z_{(N)j}$.

18 Initial Quantizing Interval Boundary
The sample is $Z_1, \ldots, Z_N$, each tuple has J components, and component j has $L_j$ quantized levels. The n-th observed tuple is $Z_n = (z_{n1}, \ldots, z_{nJ})$. Let $z_{(1)j}, \ldots, z_{(N)j}$ be the N values of the j-th component of the observed tuples, ordered in ascending order. For data ranging over [0, 1023], the left quantizing interval boundaries are
$0,\ z_{(N/L_j+1)j},\ \ldots,\ z_{(kN/L_j+1)j},\ \ldots,\ z_{((L_j-1)N/L_j+1)j}$
with the last interval ending at 1023.

19 Example
Suppose N = 12, the data is 10 bits, and $L_j = 4$. For the j-th component $z_{1j}, \ldots, z_{12j}$, let $z_{(1)j}, \ldots, z_{(12)j}$ be the ordered values. Then
$N/L_j + 1 = 4, \quad 2N/L_j + 1 = 7, \quad 3N/L_j + 1 = 10$
and the quantizing intervals are
$[0, z_{(4)j}),\ [z_{(4)j}, z_{(7)j}),\ [z_{(7)j}, z_{(10)j}),\ [z_{(10)j}, 1023)$

20 Initial Quantizing Interval Boundary
The sample is $Z_1, \ldots, Z_N$ and the n-th observed tuple is $Z_n = (z_{n1}, \ldots, z_{nJ})$. Let $z_{(1)j}, \ldots, z_{(N)j}$ be the N values of the j-th component of the observed tuples, ordered in ascending order, and let k index the quantizing intervals, $k = 1, \ldots, L_j$. The k-th quantizing interval $[c_{kj}, d_{kj})$ for the j-th component is defined, for $k \in \{2, \ldots, L_j - 1\}$, by
$c_{kj} = z_{((k-1)N/L_j + 1)j}, \qquad d_{kj} = z_{(kN/L_j + 1)j}$
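A sketch of this order-statistic rule for one component, assuming the data range is [0, 1023] as in the 10-bit example; the function name, the use of NumPy, and the sample values are illustrative:

```python
import numpy as np

def initial_boundaries(values, L, lo=0, hi=1023):
    """Equal-count initial boundaries: the k-th interior left boundary is the
    (kN/L + 1)-th order statistic of the component's observed values."""
    z = np.sort(np.asarray(values))
    N = len(z)
    # 0-based index kN/L corresponds to the (kN/L + 1)-th order statistic
    interior = [int(z[k * N // L]) for k in range(1, L)]
    return [lo] + interior + [hi]

# The slide 19 setting: N = 12 observations, L = 4 levels
vals = [5, 17, 33, 48, 120, 250, 300, 410, 555, 600, 777, 901]
print(initial_boundaries(vals, L=4))   # [0, 48, 300, 600, 1023]
```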

21 Non-uniform Quantization

22 Maximum Likelihood Probability Estimation
The sample is $Z_1, \ldots, Z_N$ and the n-th observed tuple is $Z_n = (z_{n1}, \ldots, z_{nJ})$. The quantized tuple for $Z_n$ is $(q_1(z_{n1}), \ldots, q_J(z_{nJ}))$ and its address is $a(q_1(z_{n1}), \ldots, q_J(z_{nJ}))$. The bins are numbered $1, \ldots, M$ and the number of observations falling into bin m is
$t_m = \#\{n \mid a(q_1(z_{n1}), \ldots, q_J(z_{nJ})) = m\}$
The maximum likelihood estimate of the probability for bin m is $\hat p_m = t_m / N$.
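A sketch of the maximum likelihood bin estimates, reusing the illustrative make_quantizer and bin_address helpers from the earlier sketches:

```python
from collections import Counter

def ml_bin_probabilities(samples, quantizers, levels):
    """Quantize each observed tuple, map it to its bin address, count the
    observations per bin, and normalize by N: p_m = t_m / N."""
    N = len(samples)
    counts = Counter(
        bin_address([q(z) for q, z in zip(quantizers, Z)], levels)
        for Z in samples
    )
    M = 1
    for L in levels:
        M *= L
    return [counts.get(m, 0) / N for m in range(M)]
```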

23 Density Estimation Using Fixed Volumes
The total count is N. Fix a volume v and count the number k of observations in the volume v. Density is mass divided by volume, so estimate the density at each point in the volume by $k/(Nv)$.

24 Density Estimation Using Fixed Counts
The total count is N. Fix a count k and find the smallest volume v around the point whose count just exceeds k. Density is mass divided by volume, so estimate the density at each point in the volume by $k/(Nv)$.

25 Smoothed Estimates
If the sample size is not large enough, the MLE probability estimates may not be representative, so we smooth across bins. Let bin m have volume $v_m$ and count $t_m$. Let $m_1, \ldots, m_I$ be the indexes of the I closest bins to bin m satisfying
$\sum_{i=1}^{I-1} t_{m_i} < k \le \sum_{i=1}^{I} t_{m_i}$
Define $b_m = \sum_{i=1}^{I} t_{m_i}$ and $V_m = \sum_{i=1}^{I} v_{m_i}$. The density of each point in bin m is $\alpha b_m / V_m$, with $\alpha$ set so that the density integrates to 1.

26 Smoothed Estimates
The density of each point in bin m is $\alpha b_m / V_m$ and the volume of bin m is $v_m$, so the probability of bin m is $p_m = (\alpha b_m / V_m) v_m$. The total probability must satisfy
$1 = \sum_{m=1}^{M} \alpha b_m v_m / V_m$
so
$\alpha = \frac{1}{\sum_{m=1}^{M} b_m v_m / V_m} \quad\text{and}\quad p_m = \frac{b_m v_m / V_m}{\sum_{k=1}^{M} b_k v_k / V_k}$

27 Smoothed Estimates
$p_m = \frac{b_m v_m / V_m}{\sum_{k=1}^{M} b_k v_k / V_k}$
If $v_m = v$ for $m = 1, \ldots, M$, then $V_m = I_m v$ and
$p_m = \frac{b_m v / (I_m v)}{\sum_{k=1}^{M} b_k v / (I_k v)} = \frac{b_m / I_m}{\sum_{k=1}^{M} b_k / I_k}$
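A sketch of the equal-volume smoothing rule above; the neighbor ordering (closest bins first, starting with the bin itself) is assumed to be precomputed, which the slides do not spell out:

```python
def smoothed_probabilities(counts, neighbors, k):
    """For each bin m, accumulate counts b_m over its I_m closest bins until
    the total just reaches the smoothing count k, then set p_m proportional
    to b_m / I_m (the equal-volume case of slide 27)."""
    ratios = []
    for m in range(len(counts)):
        b, I = 0, 0
        for idx in neighbors[m]:     # neighbors[m][0] should be m itself
            b += counts[idx]
            I += 1
            if b >= k:
                break
        ratios.append(b / I)
    total = sum(ratios)
    return [r / total for r in ratios]

# Tiny example: 4 bins on a line, each bin's neighbors listed by distance
counts = [5, 0, 1, 6]
neighbors = [[0, 1, 2, 3], [1, 0, 2, 3], [2, 1, 3, 0], [3, 2, 1, 0]]
print(smoothed_probabilities(counts, neighbors, k=3))
```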

28 Fixed Sample Size N
Given the sample $Z_1, \ldots, Z_N$ and a total number of bins M: calculate the quantizer, determine the probability for each bin, smooth the bin probabilities using the smoothing count k, and calculate a decision rule maximizing the expected gain. Everything depends on M and k.

29 Memorization and Generalization
If k is too small: memorization, over-fitting. If k is too large: over-generalization, under-fitting. If M is too large: memorization, over-fitting. If M is too small: over-generalization, under-fitting.

30 Optimize The Probability Estimation Parameters
Split the ground-truthed sample into three parts. Use the first part to calculate the quantizer and the bin probabilities and to calculate the discrete Bayes decision rule. Apply the decision rule to the second part so that an unbiased estimate of the expected economic gain given the decision rule can be computed. Use brute-force optimization to find the values of M and k that maximize the estimated expected gain. With M and k fixed, use the third part to determine an estimate of the expected gain of the optimized rule.

31 Optimize The Probability Estimation Parameters
Once the parameters M and k have been optimized, the quantizer boundaries can be optimized. Repeat until no change: using training data part 1, choose a dimension, choose a boundary, change the boundary, determine the adjusted probabilities, and determine the discrete Bayes decision rule; then, using training data part 2, calculate the expected gain.

32 Optimize The Quantizer Boundaries
The boundaries $b_{1j}, b_{2j}, \ldots, b_{kj}, \ldots, b_{K-1,j}, b_{Kj}$ are initialized at the positions $0,\ z_{(N/K+1)j},\ \ldots,\ z_{(kN/K+1)j},\ \ldots,\ z_{((K-1)N/K+1)j},\ 1023$.
Repeat until no change:
Randomly choose a component j and a quantizing interval k.
Randomly choose a small perturbation $\delta$ ($\delta$ can be positive or negative).
Randomly choose a small integer M (so that there is no collision with the neighboring boundaries).
Set $b_{kj}^{new} = b_{kj} - \delta(M+1)$.
For $m = 0, \ldots, 2M+1$: set $b_{kj}^{new} \leftarrow b_{kj}^{new} + \delta$, compute the new probabilities, recompute the Bayes rule, and save the expected gain.
Replace $b_{kj}$ by the boundary position associated with the highest gain.

33 Optimize The Quantizer Boundaries
(Diagram: the boundaries $b_{1j}, b_{2j}, \ldots, b_{kj}, \ldots, b_{K-1,j}, b_{Kj}$ at the positions $z_{(1)j},\ z_{(N/K+1)j},\ \ldots,\ z_{(kN/K+1)j},\ \ldots,\ z_{((K-1)N/K+1)j},\ z_{(N)j}$.)
The greedy algorithm has a random component, so multiple runs will produce different answers. Repeat the greedy algorithm T times, keeping track of the best result so far; after the T runs, use the best result.
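A sketch of the randomized boundary search of slides 32 and 33. The evaluate_gain callback is assumed to recompute the bin probabilities, the discrete Bayes rule, and the expected gain on the held-out part of the training data; it is not defined on the slides, and the T restarts of slide 33 can be wrapped around this function:

```python
import random

def optimize_boundaries(b, evaluate_gain, iterations=200, delta_max=5.0, M_max=3):
    """Randomized greedy boundary optimization.  b[j] is the list of boundaries
    for component j (first and last entries are the fixed range limits)."""
    best_gain = evaluate_gain(b)
    for _ in range(iterations):
        j = random.randrange(len(b))
        k = random.randrange(1, len(b[j]) - 1)         # an interior boundary b_kj
        delta = random.uniform(-delta_max, delta_max)  # small +/- perturbation
        M = random.randint(1, M_max)
        # sweep b_kj over b_kj - M*delta, ..., b_kj + (M+1)*delta
        candidates = [b[j][k] + delta * (m - M) for m in range(2 * M + 2)]
        best_pos = b[j][k]
        for cand in candidates:
            if not (b[j][k - 1] < cand < b[j][k + 1]):  # avoid collisions with neighbors
                continue
            b[j][k] = cand
            gain = evaluate_gain(b)
            if gain > best_gain:
                best_gain, best_pos = gain, cand
        b[j][k] = best_pos      # keep the position with the highest gain
    return b, best_gain
```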

34 Bayesian Perspective MLE: start bin counters from 0 Bayesian: start bin counters from β, β > 0

35 The Observation
There are K bins. Each time an observation is made, it falls into exactly one of the K bins, and the probability that an observation falls into bin k is $p_k$. To estimate the bin probabilities $p_1, \ldots, p_K$, we take a random sample of I observations. We find that of the I observations, $I_1$ fall into bin 1, $I_2$ fall into bin 2, ..., $I_K$ fall into bin K.

36 Multinomial
Under the protocol of the random sampling, the probability of observing counts $I_1, \ldots, I_K$ given the bin probabilities $p_1, \ldots, p_K$ is given by the multinomial
$P(I_1, \ldots, I_K \mid p_1, \ldots, p_K) = \frac{I!}{I_1! \cdots I_K!}\, p_1^{I_1} \cdots p_K^{I_K}$
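A small check of this formula, computing the multinomial probability directly; the helper is illustrative and uses only the standard library:

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(I_1, ..., I_K | p_1, ..., p_K) = I!/(I_1! ... I_K!) * prod p_k^I_k."""
    I = sum(counts)
    coefficient = factorial(I) // prod(factorial(c) for c in counts)
    return coefficient * prod(p ** c for p, c in zip(probs, counts))

# Example: I = 4 observations over K = 3 bins
print(multinomial_pmf([2, 1, 1], [0.5, 0.25, 0.25]))   # 12 * 0.015625 = 0.1875
```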

37 Bayesian
Having observed $I_1, \ldots, I_K$, we would like to determine the probability that an observation falls into bin k. Denote by $d_k$ the event that an observation falls into bin k. We wish to determine $P(d_k \mid I_1, \ldots, I_K)$.

38 The (K-1)-Simplex
To do this we will need to evaluate two integrals over the (K-1)-simplex
$S = \{ (q_1, \ldots, q_K) \mid 0 \le q_k \le 1,\ k = 1, \ldots, K,\ \text{and}\ q_1 + q_2 + \cdots + q_K = 1 \}$
0-simplex: point; 1-simplex: line segment; 2-simplex: triangle; 3-simplex: tetrahedron; 4-simplex: pentachoron.

39 The (K-1)-Simplex
Definition: A (K-1)-simplex is a (K-1)-dimensional polytope which is the convex hull of its K vertices.
$S = \{ (q_1, \ldots, q_K) \mid 0 \le q_k \le 1,\ k = 1, \ldots, K,\ \text{and}\ q_1 + q_2 + \cdots + q_K = 1 \}$
The K vertices of S are the K tuples $(1, 0, \ldots, 0), (0, 1, 0, \ldots, 0), (0, 0, 1, 0, \ldots, 0), \ldots, (0, \ldots, 0, 1)$.

40 The (K-1)-Simplex
$S = \{ (q_1, \ldots, q_K) \mid 0 \le q_k \le 1,\ k = 1, \ldots, K,\ \text{and}\ q_1 + q_2 + \cdots + q_K = 1 \}$

41 Two Integrals
They are:
$\int_{(q_1, \ldots, q_K) \in S} dq_1 \cdots dq_K = \frac{1}{(K-1)!}$
$\int_{(q_1, \ldots, q_K) \in S} \prod_{k=1}^{K} q_k^{I_k}\, dq_1 \cdots dq_K = \frac{\prod_{k=1}^{K} I_k!}{(I + K - 1)!}$
where $\sum_{k=1}^{K} I_k = I$.

42 Gamma Function
$\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\, dt, \qquad \Gamma(n) = (n-1)!$ for positive integers n.

43 Derivation
The derivation goes as follows:
$\mathrm{Prob}(d_k \mid I_1, \ldots, I_K) = \frac{\mathrm{Prob}(d_k, I_1, \ldots, I_K)}{\mathrm{Prob}(I_1, \ldots, I_K)}$
$= \frac{\int_{(p_1, \ldots, p_K) \in S} \mathrm{Prob}(d_k, I_1, \ldots, I_K, p_1, \ldots, p_K)\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(I_1, \ldots, I_K, q_1, \ldots, q_K)\, dq_1 \cdots dq_K}$
$= \frac{\int_{(p_1, \ldots, p_K) \in S} \mathrm{Prob}(d_k, I_1, \ldots, I_K \mid p_1, \ldots, p_K)\, P(p_1, \ldots, p_K)\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(I_1, \ldots, I_K \mid q_1, \ldots, q_K)\, P(q_1, \ldots, q_K)\, dq_1 \cdots dq_K}$
$= \frac{\int_{(p_1, \ldots, p_K) \in S} \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} p_m^{I_m}\, p_k\, (K-1)!\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} q_m^{I_m}\, (K-1)!\, dq_1 \cdots dq_K}$

44 Derivation
$\mathrm{Prob}(d_k \mid I_1, \ldots, I_K) = \frac{\int_{(p_1, \ldots, p_K) \in S} p_1^{I_1} p_2^{I_2} \cdots p_{k-1}^{I_{k-1}}\, p_k^{I_k + 1}\, p_{k+1}^{I_{k+1}} \cdots p_K^{I_K}\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} q_1^{I_1} q_2^{I_2} \cdots q_K^{I_K}\, dq_1 \cdots dq_K}$
$= \frac{I_1! I_2! \cdots I_{k-1}!\, (I_k + 1)!\, I_{k+1}! \cdots I_K! \,/\, (I + K)!}{I_1! I_2! \cdots I_K! \,/\, (I + K - 1)!} = \frac{I_k + 1}{I + K}$
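A one-line implementation of this result and a small numerical check; the function name is illustrative:

```python
def posterior_bin_probability(counts, k):
    """Uniform-prior posterior probability that an observation falls into
    bin k, given counts I_1, ..., I_K: (I_k + 1) / (I + K)."""
    return (counts[k] + 1) / (sum(counts) + len(counts))

# Example: K = 3 bins with counts (3, 0, 1); the empty bin still gets mass
print([round(posterior_bin_probability([3, 0, 1], k), 3) for k in range(3)])
# [0.571, 0.143, 0.286]
```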

45 Prior Distribution
The prior distribution over the (K-1)-simplex does not have to be taken as uniform. The natural prior distribution over the simplex is the Dirichlet distribution
$P(p_1, \ldots, p_K \mid \alpha_1, \ldots, \alpha_K) = \frac{\Gamma(\sum_{k=1}^{K} \alpha_k)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} p_k^{\alpha_k - 1}$
where $\alpha_k > 0$, $0 < p_k < 1$ for $k = 1, \ldots, K$, $\sum_{k=1}^{K-1} p_k < 1$, and $p_K = 1 - \sum_{k=1}^{K-1} p_k$.

46 Dirichlet Distribution Properties
$E[p_k] = \frac{\alpha_k}{\sum_{j=1}^{K} \alpha_j}, \qquad V[p_k] = \frac{E[p_k](1 - E[p_k])}{1 + \sum_{j=1}^{K} \alpha_j}, \qquad C[p_i, p_j] = -\frac{E[p_i]\, E[p_j]}{1 + \sum_{k=1}^{K} \alpha_k}$
If $\alpha_k > 1$ for $k = 1, \ldots, K$, the maximum density occurs at
$p_k = \frac{\alpha_k - 1}{(\sum_{j=1}^{K} \alpha_j) - K}$

47 The Uniform
The uniform distribution on the (K-1)-simplex is the special case of the Dirichlet distribution where $\alpha_k = 1$ for $k = 1, \ldots, K$:
$P(p_1, \ldots, p_K \mid 1, \ldots, 1) = \frac{\Gamma(K)}{\prod_{k=1}^{K} \Gamma(1)} \prod_{k=1}^{K} p_k^{0} = (K-1)!$

48 The Beta Distribution
The Beta distribution is the special case of the Dirichlet distribution for K = 2:
$P(y) = \frac{1}{B(p, q)}\, y^{p-1} (1-y)^{q-1}$
where $p > 0$, $q > 0$, $0 \le y \le 1$, and
$B(p, q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p + q)}$

49 Dirichlet Prior
In the case of the Dirichlet prior distribution, the derivation goes in a similar manner:
$\mathrm{Prob}(d_k \mid I_1, \ldots, I_K) = \frac{\mathrm{Prob}(d_k, I_1, \ldots, I_K)}{\mathrm{Prob}(I_1, \ldots, I_K)}$
$= \frac{\int_{(p_1, \ldots, p_K) \in S} \mathrm{Prob}(d_k, I_1, \ldots, I_K, p_1, \ldots, p_K)\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(I_1, \ldots, I_K, q_1, \ldots, q_K)\, dq_1 \cdots dq_K}$
$= \frac{\int_{(p_1, \ldots, p_K) \in S} \mathrm{Prob}(I_1, \ldots, I_{k-1}, I_k + 1, I_{k+1}, \ldots, I_K \mid p_1, \ldots, p_K)\, P(p_1, \ldots, p_K)\, dp_1 \cdots dp_K}{\int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(I_1, \ldots, I_K \mid q_1, \ldots, q_K)\, P(q_1, \ldots, q_K)\, dq_1 \cdots dq_K}$

50 Dirichlet Prior
$\mathrm{Prob}(I_1, \ldots, I_K) = \int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(I_1, \ldots, I_K, q_1, \ldots, q_K)\, dq_1 \cdots dq_K$
$= \int_{(q_1, \ldots, q_K) \in S} \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} q_m^{I_m} \cdot \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \prod_{m=1}^{K} q_m^{\alpha_m - 1}\, dq_1 \cdots dq_K$
$= \frac{I!}{\prod_{n=1}^{K} I_n!}\, \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \int_{(q_1, \ldots, q_K) \in S} \prod_{m=1}^{K} q_m^{I_m + \alpha_m - 1}\, dq_1 \cdots dq_K$
$= \frac{I!}{\prod_{n=1}^{K} I_n!}\, \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)}\, \frac{\prod_{k=1}^{K} (I_k + \alpha_k - 1)!}{(I - 1 + \sum_{k=1}^{K} \alpha_k)!}$

51 Dirichlet Prior
$\mathrm{Prob}(d_k, I_1, \ldots, I_K) = \int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(d_k, I_1, \ldots, I_K, q_1, \ldots, q_K)\, dq_1 \cdots dq_K$
$= \int_{(q_1, \ldots, q_K) \in S} \mathrm{Prob}(d_k \mid q_1, \ldots, q_K)\, \mathrm{Prob}(I_1, \ldots, I_K \mid q_1, \ldots, q_K)\, \mathrm{Prob}(q_1, \ldots, q_K)\, dq_1 \cdots dq_K$
$= \int_{(q_1, \ldots, q_K) \in S} q_k\, \frac{I!}{\prod_{n=1}^{K} I_n!} \prod_{m=1}^{K} q_m^{I_m}\, \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \prod_{m=1}^{K} q_m^{\alpha_m - 1}\, dq_1 \cdots dq_K$
$= \frac{I!}{\prod_{n=1}^{K} I_n!}\, \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)} \int_{(q_1, \ldots, q_K) \in S} q_k \prod_{m=1}^{K} q_m^{I_m + \alpha_m - 1}\, dq_1 \cdots dq_K$
$= \frac{I!}{\prod_{n=1}^{K} I_n!}\, \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)}\, \frac{(I_k + \alpha_k) \prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I + \sum_{n=1}^{K} \alpha_n)(I - 1 + \sum_{n=1}^{K} \alpha_n)!}$

52 Dirichlet Prior
$\mathrm{Prob}(d_k, I_1, \ldots, I_K) = \frac{I!}{\prod_{n=1}^{K} I_n!}\, \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)}\, \frac{(I_k + \alpha_k) \prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I + \sum_{n=1}^{K} \alpha_n)(I - 1 + \sum_{n=1}^{K} \alpha_n)!}$
$\mathrm{Prob}(I_1, \ldots, I_K) = \frac{I!}{\prod_{n=1}^{K} I_n!}\, \frac{\Gamma(\sum_{n=1}^{K} \alpha_n)}{\prod_{n=1}^{K} \Gamma(\alpha_n)}\, \frac{\prod_{k=1}^{K} (I_k + \alpha_k - 1)!}{(I - 1 + \sum_{k=1}^{K} \alpha_k)!}$
$\mathrm{Prob}(d_k \mid I_1, \ldots, I_K) = \frac{\dfrac{(I_k + \alpha_k) \prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I + \sum_{k=1}^{K} \alpha_k)(I - 1 + \sum_{k=1}^{K} \alpha_k)!}}{\dfrac{\prod_{n=1}^{K} (I_n + \alpha_n - 1)!}{(I - 1 + \sum_{k=1}^{K} \alpha_k)!}} = \frac{I_k + \alpha_k}{I + \sum_{n=1}^{K} \alpha_n}$
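A sketch of the Dirichlet-prior result, which also matches slide 34's view of starting the bin counters from a pseudo-count greater than 0; the function name is illustrative:

```python
def dirichlet_posterior_probability(counts, alphas, k):
    """Posterior probability that an observation falls into bin k under a
    Dirichlet(alpha_1, ..., alpha_K) prior: (I_k + alpha_k) / (I + sum alpha)."""
    return (counts[k] + alphas[k]) / (sum(counts) + sum(alphas))

# With alpha_k = 1 for every k this reduces to the uniform-prior result (I_k + 1)/(I + K)
print(dirichlet_posterior_probability([3, 0, 1], [1, 1, 1], k=0))   # 4/7 = 0.571...
print(dirichlet_posterior_probability([3, 0, 1], [2, 2, 2], k=1))   # 2/10 = 0.2
```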
