Overview of Course. Nevin L. Zhang (HKUST), Bayesian Networks, Fall 2008
1 Overview of Course
So far, we have studied:
- The concept of Bayesian networks
- Independence and separation in Bayesian networks
- Inference in Bayesian networks
The rest of the course: data analysis using Bayesian networks.
- Parameter learning: learn parameters for a given structure.
- Structure learning: learn both structures and parameters.
- Learning latent structures: discover latent variables behind observed variables and determine their relationships.
2 COMP538: Introduction to Bayesian Networks
Lecture 6: Parameter Learning in Bayesian Networks
Nevin L. Zhang, Department of Computer Science and Engineering,
Hong Kong University of Science and Technology, Fall 2008
3 Objective
Objective: principles for parameter learning in Bayesian networks, and algorithms for the case of complete data.
Reading: Zhang and Guo (2007), Chapter 7.
References: Heckerman (1996), first half; Cowell et al. (1999), Chapter 9.
4 Outline
1 Problem Statement
2 Principles of Parameter Learning
  - Maximum likelihood estimation
  - Bayesian estimation
  - Variables with multiple values
3 Parameter Estimation in General Bayesian Networks
  - The parameters
  - Maximum likelihood estimation
  - Properties of MLE
  - Bayesian estimation
5 Problem Statement: Parameter Learning
Given:
- A Bayesian network structure over variables X_1, ..., X_5 (shown as a DAG on the slide).
- A data set with one column per variable X_1, ..., X_5.
Estimate the conditional probabilities:
P(X_1), P(X_2), P(X_3 | X_1, X_2), P(X_4 | X_1), P(X_5 | X_1, X_3, X_4).
6 Outline
(Outline repeated; we are now in Part 2: Principles of Parameter Learning: maximum likelihood estimation, Bayesian estimation, variables with multiple values.)
7 Single-Node Bayesian Network
Consider a Bayesian network with one node X, where X is the result of tossing a thumbtack and Ω_X = {H, T}.
Data cases: D_1 = H, D_2 = T, D_3 = H, ..., D_m = H
Data set: D = {D_1, D_2, D_3, ..., D_m}
Estimate the parameter: θ = P(X = H).
8 Likelihood
Data: D = {H, T, H, T, T, H, T}
As possible values of θ, which of the following is the most likely, and why?
θ = 0, θ = 0.01, θ = 0.5
- θ = 0 contradicts the data because P(D | θ = 0) = 0. It cannot explain the data at all.
- θ = 0.01 almost contradicts the data; it does not explain the data well. However, it is more consistent with the data than θ = 0 because P(D | θ = 0.01) > P(D | θ = 0).
- θ = 0.5 is more consistent with the data than θ = 0.01 because P(D | θ = 0.5) > P(D | θ = 0.01). It explains the data best among the three and is hence the most likely.
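As a quick check of this comparison, here is a small sketch (added for this transcription, not from the original slides) that evaluates P(D | θ) on the data above for the three candidate values; the function name is made up.

```python
# Evaluate the likelihood P(D | theta) of the thumbtack data for candidate values of theta.
data = ["H", "T", "H", "T", "T", "H", "T"]

def likelihood(theta, data):
    """P(D | theta) for i.i.d. tosses with P(H) = theta."""
    p = 1.0
    for d in data:
        p *= theta if d == "H" else (1.0 - theta)
    return p

for theta in [0.0, 0.01, 0.5]:
    print(theta, likelihood(theta, data))
# theta = 0 gives likelihood 0; theta = 0.5 gives the largest value of the three.
```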
9 Maximum Likelihood Estimation
In general, the larger P(D | θ = v) is, the more likely θ = v is.
Likelihood of parameter θ given the data set: L(θ | D) = P(D | θ).
The maximum likelihood estimate (MLE) θ* of θ is a possible value of θ (with 0 ≤ θ* ≤ 1) such that
L(θ* | D) = sup_θ L(θ | D).
The MLE best explains, or best fits, the data.
10 i.i.d. and Likelihood
Assume the data cases D_1, ..., D_m are independent given θ:
P(D_1, ..., D_m | θ) = ∏_{i=1}^m P(D_i | θ)
Assume the data cases are identically distributed:
P(D_i = H | θ) = θ,  P(D_i = T | θ) = 1 − θ  for all i
(Note: i.i.d. means independent and identically distributed.)
Then
L(θ | D) = P(D | θ) = P(D_1, ..., D_m | θ) = ∏_{i=1}^m P(D_i | θ) = θ^{m_h} (1 − θ)^{m_t}   (1)
where m_h is the number of heads and m_t is the number of tails. This is the binomial likelihood.
11 Example of a Likelihood Function
Example: D = {D_1 = H, D_2 = T, D_3 = H, D_4 = H, D_5 = T}
L(θ | D) = P(D | θ)
         = P(D_1 = H | θ) P(D_2 = T | θ) P(D_3 = H | θ) P(D_4 = H | θ) P(D_5 = T | θ)
         = θ (1 − θ) θ θ (1 − θ)
         = θ^3 (1 − θ)^2.
12 Sufficient Statistic
A sufficient statistic is a function s(D) of the data that summarizes the relevant information for computing the likelihood. That is,
s(D) = s(D′) implies L(θ | D) = L(θ | D′).
Sufficient statistics tell us all there is to know about the data.
Since L(θ | D) = θ^{m_h} (1 − θ)^{m_t}, the pair (m_h, m_t) is a sufficient statistic.
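To make the definition concrete, the following small sketch (an added illustration) shows that two data sets with the same counts (m_h, m_t) have exactly the same likelihood, even though the order of the tosses differs.

```python
def likelihood(theta, data):
    """Binomial likelihood theta^m_h * (1 - theta)^m_t."""
    m_h, m_t = data.count("H"), data.count("T")
    return theta ** m_h * (1.0 - theta) ** m_t

D1 = ["H", "T", "H", "T", "T", "H", "T"]   # (m_h, m_t) = (3, 4)
D2 = ["T", "T", "T", "H", "H", "T", "H"]   # same counts, different order
print(likelihood(0.3, D1) == likelihood(0.3, D2))  # True: (m_h, m_t) is sufficient
```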
13 Loglikelihood
Loglikelihood:
ℓ(θ | D) = log L(θ | D) = log [θ^{m_h} (1 − θ)^{m_t}] = m_h log θ + m_t log(1 − θ)
Maximizing the likelihood is the same as maximizing the loglikelihood; the latter is easier.
By Corollary 1.1 of Lecture 1, the following value maximizes ℓ(θ | D):
θ* = m_h / (m_h + m_t) = m_h / m
The MLE is intuitive. It also has nice properties, e.g. consistency: θ* approaches the true value of θ with probability 1 as m goes to infinity.
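A minimal sketch of the closed-form MLE (hypothetical counts, added for illustration), together with a crude grid search confirming that the loglikelihood peaks at m_h / m:

```python
import math

m_h, m_t = 3, 4                      # hypothetical counts of heads and tails
m = m_h + m_t
theta_mle = m_h / m                  # closed-form MLE: m_h / m

def loglik(theta):
    return m_h * math.log(theta) + m_t * math.log(1.0 - theta)

grid = [i / 1000 for i in range(1, 1000)]
best_on_grid = max(grid, key=loglik)
print(theta_mle, best_on_grid)       # ~0.4286 and 0.429 (the closest grid point)
```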
14 Drawback of MLE
Thumbtack tossing: (m_h, m_t) = (3, 7). MLE: θ* = 0.3. Reasonable: the data suggest that the thumbtack is biased toward tails.
Coin tossing:
- Case 1: (m_h, m_t) = (3, 7). MLE: θ* = 0.3. Not reasonable: our experience (prior) strongly suggests that coins are fair, hence θ = 1/2. The data set is too small to convince us that this particular coin is biased; getting (3, 7) instead of (5, 5) is probably due to randomness.
- Case 2: (m_h, m_t) = (30,000, 70,000). MLE: θ* = 0.3. Reasonable: the data suggest that the coin is after all biased, overshadowing our prior.
MLE does not differentiate between these two cases. It does not take prior information into account.
15 Two Views on Parameter Estimation
MLE:
- Assumes that θ is an unknown but fixed parameter.
- Estimates it using θ*, the value that maximizes the likelihood function.
- Makes predictions based on the estimate: P(D_{m+1} = H | D) = θ*.
Bayesian estimation:
- Treats θ as a random variable.
- Assumes a prior probability distribution over θ: p(θ).
- Uses the data to obtain the posterior distribution of θ: p(θ | D).
16 Two Views on Parameter Estimation
Bayesian estimation: predicting D_{m+1}
P(D_{m+1} = H | D) = ∫ P(D_{m+1} = H, θ | D) dθ
                   = ∫ P(D_{m+1} = H | θ, D) p(θ | D) dθ
                   = ∫ P(D_{m+1} = H | θ) p(θ | D) dθ
                   = ∫ θ p(θ | D) dθ.
Full Bayesian: take the expectation over θ.
Bayesian MAP: P(D_{m+1} = H | D) = θ̂, where θ̂ = arg max_θ p(θ | D).
17 Calculating Bayesian Estimation
Posterior distribution:
p(θ | D) ∝ p(θ) L(θ | D) = θ^{m_h} (1 − θ)^{m_t} p(θ)
where the second equality follows from (1).
To facilitate analysis, assume the prior is a Beta distribution B(α_h, α_t). Then
p(θ) ∝ θ^{α_h − 1} (1 − θ)^{α_t − 1}
p(θ | D) ∝ θ^{m_h + α_h − 1} (1 − θ)^{m_t + α_t − 1}   (2)
18 Beta Distribution
Density function of the prior Beta distribution B(α_h, α_t):
p(θ) = [Γ(α_h + α_t) / (Γ(α_h) Γ(α_t))] θ^{α_h − 1} (1 − θ)^{α_t − 1}
The normalization constant of B(α_h, α_t) is Γ(α_h + α_t) / (Γ(α_h) Γ(α_t)), where Γ(.) is the Gamma function. For any integer α, Γ(α) = (α − 1)!; it is also defined for non-integers.
The hyperparameters α_h and α_t can be thought of as imaginary counts from our prior experience. Their sum α = α_h + α_t is called the equivalent sample size. The larger the equivalent sample size, the more confident we are in our prior.
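The sketch below (added, standard library only) checks numerically that the Beta density with this normalization constant integrates to 1, and that its mean is α_h / (α_h + α_t), consistent with reading α_h and α_t as imaginary counts; the hyperparameter values are arbitrary.

```python
import math

alpha_h, alpha_t = 2.0, 5.0
const = math.gamma(alpha_h + alpha_t) / (math.gamma(alpha_h) * math.gamma(alpha_t))

def beta_pdf(theta):
    return const * theta ** (alpha_h - 1) * (1.0 - theta) ** (alpha_t - 1)

# Midpoint-rule integration on a fine grid.
n = 100_000
thetas = [(i + 0.5) / n for i in range(n)]
mass = sum(beta_pdf(t) for t in thetas) / n
mean = sum(t * beta_pdf(t) for t in thetas) / n
print(round(mass, 4), round(mean, 4), alpha_h / (alpha_h + alpha_t))  # ~1.0, ~0.2857, 0.2857
```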
19 Conjugate Families
Binomial likelihood: θ^{m_h} (1 − θ)^{m_t}
Beta prior: θ^{α_h − 1} (1 − θ)^{α_t − 1}
Beta posterior: θ^{m_h + α_h − 1} (1 − θ)^{m_t + α_t − 1}
Beta distributions are hence called a conjugate family for the binomial likelihood.
Conjugate families allow a closed form for the posterior distribution of the parameters and a closed-form solution for prediction.
20 Calculating Prediction
We have
P(D_{m+1} = H | D) = ∫ θ p(θ | D) dθ
                   = c ∫ θ · θ^{m_h + α_h − 1} (1 − θ)^{m_t + α_t − 1} dθ
                   = (m_h + α_h) / (m + α)
where c is the normalization constant, m = m_h + m_t, and α = α_h + α_t. Consequently,
P(D_{m+1} = T | D) = (m_t + α_t) / (m + α)
After taking the data D into consideration, our updated belief in X = T is (m_t + α_t) / (m + α).
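A minimal sketch of this closed-form prediction (added for illustration; the counts and hyperparameters are made up):

```python
def bayes_predict_heads(m_h, m_t, alpha_h, alpha_t):
    """P(D_{m+1} = H | D) under a Beta(alpha_h, alpha_t) prior: (m_h + alpha_h) / (m + alpha)."""
    return (m_h + alpha_h) / (m_h + m_t + alpha_h + alpha_t)

# The posterior is Beta(m_h + alpha_h, m_t + alpha_t); the prediction is its mean.
print(bayes_predict_heads(m_h=3, m_t=7, alpha_h=1, alpha_t=1))  # 4/12 = 0.333...
```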
21 MLE and Bayesian Estimation
As m goes to infinity, P(D_{m+1} = H | D) approaches the MLE m_h / (m_h + m_t), which approaches the true value of θ with probability 1.
Coin tossing example revisited: suppose α_h = α_t = 100, so the equivalent sample size is 200.
- In case 1, P(D_{m+1} = H | D) = (3 + 100) / (10 + 200) ≈ 0.49. Our prior prevails.
- In case 2, P(D_{m+1} = H | D) = (30,000 + 100) / (100,000 + 200) ≈ 0.30. The data prevail.
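Both cases can be verified directly from the formula (a small added check, reusing the predictive formula sketched above):

```python
def bayes_predict_heads(m_h, m_t, alpha_h, alpha_t):
    return (m_h + alpha_h) / (m_h + m_t + alpha_h + alpha_t)

# Case 1: small sample, the prior (equivalent sample size 200) dominates.
print(bayes_predict_heads(3, 7, 100, 100))            # 103/210 ~ 0.49
# Case 2: large sample, the data dominate.
print(bayes_predict_heads(30_000, 70_000, 100, 100))  # 30100/100200 ~ 0.30
```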
22 Variables with Multiple Values
Bayesian networks with a single multi-valued variable: Ω_X = {x_1, x_2, ..., x_r}.
Let θ_i = P(X = x_i) and θ = (θ_1, θ_2, ..., θ_r). Note that θ_i ≥ 0 and Σ_i θ_i = 1.
Suppose that in a data set D there are m_i data cases where X takes value x_i. Then
L(θ | D) = P(D | θ) = ∏_{j=1}^N P(D_j | θ) = ∏_{i=1}^r θ_i^{m_i}
This is the multinomial likelihood.
23 Dirichlet Distributions
The conjugate family for the multinomial likelihood: Dirichlet distributions.
A Dirichlet distribution is parameterized by r parameters α_1, α_2, ..., α_r, with density
[Γ(α) / ∏_{i=1}^r Γ(α_i)] ∏_{i=1}^r θ_i^{α_i − 1}
where α = α_1 + α_2 + ... + α_r. It is the same as a Beta distribution when r = 2.
Fact: for any i,
∫ θ_i [Γ(α) / ∏_{k=1}^r Γ(α_k)] ∏_{k=1}^r θ_k^{α_k − 1} dθ_1 dθ_2 ... dθ_r = α_i / α
24 Calculating Parameter Estimations
If the prior is a Dirichlet distribution Dir(α_1, α_2, ..., α_r), then the posterior p(θ | D) is given by
p(θ | D) ∝ ∏_{i=1}^r θ_i^{m_i + α_i − 1}
So it is the Dirichlet distribution Dir(α_1 + m_1, α_2 + m_2, ..., α_r + m_r).
The Bayesian estimate has the following closed form:
P(D_{m+1} = x_i | D) = ∫ θ_i p(θ | D) dθ = (α_i + m_i) / (α + m)
MLE: θ_i* = m_i / m. (Exercise: prove this.)
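The sketch below (added; the counts and prior are hypothetical) compares the MLE m_i / m with the Bayesian estimate (α_i + m_i) / (α + m) for a three-valued variable:

```python
counts = [6, 3, 1]                 # m_i: observed counts for x_1, x_2, x_3
alphas = [1.0, 1.0, 1.0]           # Dirichlet prior Dir(1, 1, 1)
m, alpha = sum(counts), sum(alphas)

mle      = [m_i / m for m_i in counts]
bayesian = [(a_i + m_i) / (alpha + m) for a_i, m_i in zip(alphas, counts)]
print(mle)       # [0.6, 0.3, 0.1]
print(bayesian)  # [7/13, 4/13, 2/13] ~ [0.538, 0.308, 0.154]
```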
25 Outline
(Outline repeated; we are now in Part 3: Parameter Estimation in General Bayesian Networks: the parameters, maximum likelihood estimation, properties of MLE, Bayesian estimation.)
26 The Parameters
n variables: X_1, X_2, ..., X_n.
States of X_i: 1, 2, ..., r_i = |Ω_{X_i}|.
Configurations of the parents of X_i: 1, 2, ..., q_i = |Ω_{pa(X_i)}|.
Parameters to be estimated:
θ_ijk = P(X_i = j | pa(X_i) = k),  i = 1, ..., n; j = 1, ..., r_i; k = 1, ..., q_i
Parameter vector: θ = {θ_ijk : i = 1, ..., n; j = 1, ..., r_i; k = 1, ..., q_i}. Note that Σ_j θ_ijk = 1 for all i, k.
θ_i.. : the vector of parameters for P(X_i | pa(X_i)): θ_i.. = {θ_ijk : j = 1, ..., r_i; k = 1, ..., q_i}
θ_i.k : the vector of parameters for P(X_i | pa(X_i) = k): θ_i.k = {θ_ijk : j = 1, ..., r_i}
27 The Parameters
Example: consider the Bayesian network in which X_1 and X_2 are the parents of X_3. Assume all variables are binary, taking values 1 and 2.
θ_111 = P(X_1 = 1), θ_121 = P(X_1 = 2)
θ_211 = P(X_2 = 1), θ_221 = P(X_2 = 2)
pa(X_3) = 1: θ_311 = P(X_3 = 1 | X_1 = 1, X_2 = 1), θ_321 = P(X_3 = 2 | X_1 = 1, X_2 = 1)
pa(X_3) = 2: θ_312 = P(X_3 = 1 | X_1 = 1, X_2 = 2), θ_322 = P(X_3 = 2 | X_1 = 1, X_2 = 2)
pa(X_3) = 3: θ_313 = P(X_3 = 1 | X_1 = 2, X_2 = 1), θ_323 = P(X_3 = 2 | X_1 = 2, X_2 = 1)
pa(X_3) = 4: θ_314 = P(X_3 = 1 | X_1 = 2, X_2 = 2), θ_324 = P(X_3 = 2 | X_1 = 2, X_2 = 2)
28 Data
A complete case D_l: a vector of values, one for each variable. Example: D_l = (X_1 = 1, X_2 = 2, X_3 = 2).
Given: a set of complete cases D = {D_1, D_2, ..., D_m}. (The slide shows a data table with columns X_1, X_2, X_3.)
Find: the ML estimates of the parameters θ.
29 The Loglikelihood Function
Loglikelihood:
ℓ(θ | D) = log L(θ | D) = log P(D | θ) = log ∏_l P(D_l | θ) = Σ_l log P(D_l | θ).
The term log P(D_l | θ): for D_4 = (1, 2, 2),
log P(D_4 | θ) = log P(X_1 = 1, X_2 = 2, X_3 = 2 | θ)
              = log P(X_1 = 1 | θ) P(X_2 = 2 | θ) P(X_3 = 2 | X_1 = 1, X_2 = 2, θ)
              = log θ_111 + log θ_221 + log θ_322.
Recall: θ = {θ_111, θ_121; θ_211, θ_221; θ_311, θ_312, θ_313, θ_314, θ_321, θ_322, θ_323, θ_324}
30 The Loglikelihood Function
Define the characteristic function of case D_l:
χ(i, j, k : D_l) = 1 if X_i = j and pa(X_i) = k in D_l, and 0 otherwise.
For l = 4, D_4 = (1, 2, 2):
χ(1, 1, 1 : D_4) = χ(2, 2, 1 : D_4) = χ(3, 2, 2 : D_4) = 1
χ(i, j, k : D_4) = 0 for all other i, j, k
So log P(D_4 | θ) = Σ_{ijk} χ(i, j, k : D_4) log θ_ijk.
In general, log P(D_l | θ) = Σ_{ijk} χ(i, j, k : D_l) log θ_ijk.
31 The Loglikelihood Function
Define m_ijk = Σ_l χ(i, j, k : D_l). It is the number of data cases where X_i = j and pa(X_i) = k. Then
ℓ(θ | D) = Σ_l log P(D_l | θ)
        = Σ_l Σ_{i,j,k} χ(i, j, k : D_l) log θ_ijk
        = Σ_{i,j,k} [Σ_l χ(i, j, k : D_l)] log θ_ijk
        = Σ_{i,j,k} m_ijk log θ_ijk
        = Σ_{i,k} Σ_j m_ijk log θ_ijk.   (4)
Sufficient statistics: the counts m_ijk.
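The counts m_ijk are easy to compute from complete data. Below is a small sketch (added; the data set is made up, not the table from the slides) for the three-node example in which X_1 and X_2 are the parents of X_3:

```python
from collections import Counter

# Complete cases (x1, x2, x3), all values in {1, 2}; hypothetical data.
data = [(1, 2, 2), (1, 1, 1), (2, 2, 1), (1, 1, 2), (2, 1, 1), (1, 2, 2)]

# Parent-configuration index k for X3: (X1, X2) = (1,1)->1, (1,2)->2, (2,1)->3, (2,2)->4.
def pa3_config(x1, x2):
    return {(1, 1): 1, (1, 2): 2, (2, 1): 3, (2, 2): 4}[(x1, x2)]

m = Counter()  # m[(i, j, k)] = number of cases with X_i = j and pa(X_i) = k
for x1, x2, x3 in data:
    m[(1, x1, 1)] += 1                    # X1 has no parents: a single configuration k = 1
    m[(2, x2, 1)] += 1                    # X2 has no parents
    m[(3, x3, pa3_config(x1, x2))] += 1
print(m[(3, 2, 2)])  # cases with X3 = 2 and (X1, X2) = (1, 2): here 2
```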
32 MLE
Want: arg max_θ ℓ(θ | D) = arg max_θ Σ_{i,k} Σ_j m_ijk log θ_ijk
Note that θ_ijk = P(X_i = j | pa(X_i) = k) and θ_i'j'k' = P(X_i' = j' | pa(X_i') = k') are not related if either i ≠ i' or k ≠ k'. Consequently, we can separately maximize each term in the summation Σ_{i,k} [...]:
arg max_{θ_i.k} Σ_j m_ijk log θ_ijk
33 MLE
By Corollary 1.1, we get
θ*_ijk = m_ijk / Σ_j m_ijk
In words, the MLE estimate of θ_ijk = P(X_i = j | pa(X_i) = k) is:
θ*_ijk = (number of cases where X_i = j and pa(X_i) = k) / (number of cases where pa(X_i) = k)
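Continuing that counting sketch, the MLE is just a ratio of counts; here is a self-contained illustration with the same (hypothetical) counts and indexing:

```python
from collections import Counter

# Counts m_ijk obtained from the hypothetical data in the previous sketch.
m = Counter({(1, 1, 1): 4, (1, 2, 1): 2,
             (2, 1, 1): 3, (2, 2, 1): 3,
             (3, 1, 1): 1, (3, 2, 1): 1, (3, 2, 2): 2, (3, 1, 3): 1, (3, 1, 4): 1})

def mle(m, i, j, k, r_i):
    """theta*_ijk = m_ijk / (sum over j' of m_ij'k); r_i is the number of states of X_i."""
    denom = sum(m[(i, jp, k)] for jp in range(1, r_i + 1))
    return m[(i, j, k)] / denom if denom else float("nan")

print(mle(m, 1, 1, 1, 2))  # P(X1 = 1): 4/6 ~ 0.667
print(mle(m, 3, 2, 2, 2))  # P(X3 = 2 | X1 = 1, X2 = 2): 2/2 = 1.0
```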
34 Example
Example: the network in which X_1 and X_2 are the parents of X_3, with the data table shown on the slide (16 cases).
MLE for P(X_1 = 1): 6/16
MLE for P(X_2 = 1): 7/16
MLE for P(X_3 = 1 | X_1 = 2, X_2 = 2): 2/6
...
35 A Question
Start from a joint distribution P(X) (the generative distribution).
D: a collection of data sampled from P(X).
Let S be a BN structure (DAG) over the variables X.
Learn parameters θ* for the BN structure S from D.
Let P*(X) be the joint probability of the BN (S, θ*). Note: θ*_ijk = P*(X_i = j | pa_S(X_i) = k).
How is P* related to P?
36 MLE in General Bayesian Networks with Complete Data
(The slide shows a diagram: the set of distributions that factorize according to S, with P* inside it and P possibly outside.)
We will show that, with probability 1, P* converges to the distribution that
- factorizes according to S, and
- is closest to P under KL divergence among all distributions that factorize according to S.
P* factorizes according to S. P does not necessarily factorize according to S.
If P factorizes according to S, then P* converges to P with probability 1. (MLE is consistent.)
37 The Target Distribution
Define θ^S_ijk = P(X_i = j | pa_S(X_i) = k).
Let P^S(X) be the joint distribution of the BN (S, θ^S).
P^S factorizes according to S, and for any variable X in X, P^S(X | pa(X)) = P(X | pa(X)).
If P factorizes according to S, then P and P^S are identical.
If P does not factorize according to S, then P and P^S are different.
38 First Theorem
Theorem (6.1): Among all distributions Q that factorize according to S, the KL divergence KL(P, Q) is minimized by Q = P^S.
That is, P^S is the closest to P among all distributions that factorize according to S.
Proof: Since
KL(P, Q) = Σ_X P(X) log [P(X) / Q(X)],
it suffices to show the following proposition: Q = P^S maximizes Σ_X P(X) log Q(X).
We show the claim by induction on the number of nodes. When there is only one node, the proposition follows from the properties of KL divergence (Corollary 1.1).
39 First Theorem
Suppose the proposition is true for the case of n nodes. Consider the case of n + 1 nodes.
Let X be a leaf node, let X′ = X \ {X}, and let S′ be the structure obtained from S by removing X. Then
Σ_X P(X) log Q(X) = Σ_{X′} P(X′) log Q(X′) + Σ_{pa(X)} P(pa(X)) Σ_X P(X | pa(X)) log Q(X | pa(X))
By the induction hypothesis, the first term is maximized by P^{S′}. By Corollary 1.1, the second term is maximized if Q(X | pa(X)) = P(X | pa(X)). Hence the sum is maximized by P^S.
40 Second Theorem
Theorem (6.2): lim_{N→∞} P*(X = x) = P^S(X = x) with probability 1, where N is the sample size, i.e. the number of cases in D.
Proof: Let P̂(X) be the empirical distribution: P̂(X = x) = the fraction of cases in D where X = x.
It is clear that P*(X_i = j | pa_S(X_i) = k) = θ*_ijk = P̂(X_i = j | pa_S(X_i) = k).
41 Second Theorem
On the other hand, by the law of large numbers, we have
lim_{N→∞} P̂(X = x) = P(X = x) with probability 1.
Hence
lim_{N→∞} P*(X_i = j | pa_S(X_i) = k) = lim_{N→∞} P̂(X_i = j | pa_S(X_i) = k)
                                      = P(X_i = j | pa_S(X_i) = k) with probability 1
                                      = P^S(X_i = j | pa_S(X_i) = k).
Because both P* and P^S factorize according to S, the theorem follows. Q.E.D.
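As an informal check of Theorem 6.2 (an added sketch, not part of the slides), one can sample data from a known two-node network X_1 → X_2, compute the MLE of a conditional probability, and watch it approach the generative value as the sample size grows:

```python
import random

random.seed(0)
p_x1 = 0.3                      # P(X1 = 1) in the generative distribution
p_x2_given = {1: 0.9, 2: 0.2}   # P(X2 = 1 | X1)

def sample(n):
    data = []
    for _ in range(n):
        x1 = 1 if random.random() < p_x1 else 2
        x2 = 1 if random.random() < p_x2_given[x1] else 2
        data.append((x1, x2))
    return data

for n in [100, 10_000, 1_000_000]:
    data = sample(n)
    num = sum(1 for x1, x2 in data if x1 == 1 and x2 == 1)
    den = sum(1 for x1, _ in data if x1 == 1)
    print(n, num / den)   # MLE of P(X2 = 1 | X1 = 1); approaches 0.9 as n grows
```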
42 A Corollary
Corollary: If P factorizes according to S, then lim_{N→∞} P*(X = x) = P(X = x) with probability 1.
43 Bayesian Estimation
View θ as a vector of random variables with prior distribution p(θ).
Posterior:
p(θ | D) ∝ p(θ) L(θ | D) = p(θ) ∏_{i,k} ∏_j θ_ijk^{m_ijk}
where the equation follows from (4).
Assumptions need to be made about the prior distribution.
44 Assumptions
Global independence in the prior distribution: p(θ) = ∏_i p(θ_i..)
Local independence in the prior distribution: for each i, p(θ_i..) = ∏_k p(θ_i.k)
Parameter independence = global independence + local independence:
p(θ) = ∏_{i,k} p(θ_i.k)
45 Assumptions
Further assume that p(θ_i.k) is a Dirichlet distribution Dir(α_i1k, α_i2k, ..., α_{i r_i k}):
p(θ_i.k) ∝ ∏_j θ_ijk^{α_ijk − 1}
Then
p(θ) ∝ ∏_{i,k} ∏_j θ_ijk^{α_ijk − 1}
a product Dirichlet distribution.
46 Bayesian Estimation
Posterior:
p(θ | D) ∝ p(θ) ∏_{i,k} ∏_j θ_ijk^{m_ijk}
         = [∏_{i,k} ∏_j θ_ijk^{α_ijk − 1}] ∏_{i,k} ∏_j θ_ijk^{m_ijk}
         = ∏_{i,k} ∏_j θ_ijk^{m_ijk + α_ijk − 1}
It is also a product Dirichlet distribution. (Think: what does this mean?)
47 Prediction
Predicting D_{m+1} = {X_1^{m+1}, X_2^{m+1}, ..., X_n^{m+1}}: these are random variables. For notational simplicity, we simply write D_{m+1} = {X_1, X_2, ..., X_n}.
First, we have:
P(D_{m+1} | D) = P(X_1, X_2, ..., X_n | D) = ∏_i P(X_i | pa(X_i), D)
48 Proof
P(D_{m+1} | D) = ∫ P(D_{m+1} | θ) p(θ | D) dθ
P(D_{m+1} | θ) = P(X_1, X_2, ..., X_n | θ) = ∏_i P(X_i | pa(X_i), θ) = ∏_i P(X_i | pa(X_i), θ_i..)
p(θ | D) = ∏_i p(θ_i.. | D)
Hence
P(D_{m+1} | D) = ∏_i ∫ P(X_i | pa(X_i), θ_i..) p(θ_i.. | D) dθ_i.. = ∏_i P(X_i | pa(X_i), D)
49 Prediction
Further, we have
P(X_i = j | pa(X_i) = k, D) = ∫ P(X_i = j | pa(X_i) = k, θ_ijk) p(θ_ijk | D) dθ_ijk = ∫ θ_ijk p(θ_ijk | D) dθ_ijk
Because
p(θ_i.k | D) ∝ ∏_j θ_ijk^{m_ijk + α_ijk − 1}
we have
∫ θ_ijk p(θ_ijk | D) dθ_ijk = (m_ijk + α_ijk) / Σ_j (m_ijk + α_ijk)
50 Prediction
Conclusion:
P(X_1, X_2, ..., X_n | D) = ∏_i P(X_i | pa(X_i), D)
where
P(X_i = j | pa(X_i) = k, D) = (m_ijk + α_ijk) / (m_i.k + α_i.k)
with m_i.k = Σ_j m_ijk and α_i.k = Σ_j α_ijk.
Notes:
- Conditional independence (the structure) is preserved after absorbing D. This is an important property for sequential learning, where we process one case at a time; the final result is independent of the order in which cases are processed.
- Comparison with the MLE estimate: θ*_ijk = m_ijk / Σ_j m_ijk.
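A minimal sketch of this predictive formula (added; it reuses the counting scheme of the earlier sketches, and the counts and prior are hypothetical):

```python
from collections import Counter

def bayes_predict(m, alpha, i, j, k, r_i):
    """(m_ijk + alpha_ijk) / (m_i.k + alpha_i.k), summing over the r_i states of X_i."""
    num = m[(i, j, k)] + alpha[(i, j, k)]
    den = sum(m[(i, jp, k)] + alpha[(i, jp, k)] for jp in range(1, r_i + 1))
    return num / den

# Hypothetical counts for X3 under parent configuration k = 2, with a uniform prior alpha_ijk = 1.
m = Counter({(3, 1, 2): 2, (3, 2, 2): 6})
alpha = Counter({(3, j, k): 1 for j in (1, 2) for k in (1, 2, 3, 4)})
print(bayes_predict(m, alpha, 3, 1, 2, 2))  # (2 + 1) / (8 + 2) = 0.3
```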
51 Summary
θ: a vector of random variables.
Prior p(θ): a product Dirichlet distribution
p(θ) = ∏_{i,k} p(θ_i.k) ∝ ∏_{i,k} ∏_j θ_ijk^{α_ijk − 1}
Posterior p(θ | D): also a product Dirichlet distribution
p(θ | D) ∝ ∏_{i,k} ∏_j θ_ijk^{m_ijk + α_ijk − 1}
Prediction:
P(D_{m+1} | D) = P(X_1, X_2, ..., X_n | D) = ∏_i P(X_i | pa(X_i), D)
where P(X_i = j | pa(X_i) = k, D) = (m_ijk + α_ijk) / (m_i.k + α_i.k)