Estimating the Variance of Query Responses in Hybrid Bayesian Nets
Yasin Abbasi-Yadkori, Russ Greiner, Bret Hoehn
Dept of Computing Science, University of Alberta

Peter Hooper
Dept of Mathematical and Statistical Sciences, University of Alberta

Abstract

A Bayesian network is a model of a distribution, encoded using a network structure S augmented with conditional distribution parameters (CDP) Θ that specify the conditional probability of a variable, given each assignment to its parents. Given a fixed structure S and CDP Θ, we can compute the response to a fixed query Q(Θ) = P_{S,Θ}(C = c | E = e), which is a real number. However, in many situations the CDParameters Θ themselves can be uncertain, e.g., when they are estimated from a (random) datasample. Here, this response Q(Θ) will be a random variable. Earlier results provided a way to estimate the variance of this response when all variables are discrete. This paper extends that analysis to deal with Bayesian networks that can also include normally distributed continuous variables. (We consider essentially arbitrary Bayesian net structures, assuming only that discrete variables have no continuous parents.) In particular, we show how to compute posterior distributions of each independent CDP, and then how to use the Delta method to approximate the variance of any query. We also derive a compact form for the variance in the case of Naive Bayes structures. Finally, we provide empirical studies that demonstrate that our system works effectively, even when the parameters correspond to a small sample.

1 Introduction

In general, a Bayesian network is a model of a distribution, represented as a directed acyclic graph S whose nodes represent variables and whose arcs represent the dependencies between them, together with conditional distribution parameters (CDP) Θ that specify the conditional probability of each variable, given each assignment to its parents.
Given a fixed structure S and parameters Θ, we can compute the response to a fixed query Q(Θ) = P_{S,Θ}(C = c | E = e), which is a real number. However, in many situations, the CDP parameters Θ themselves can be uncertain, e.g., when they are estimated from a (random) datasample. Here, this response Q(Θ) will be a random variable. For discrete variables whose parents are discrete, these CDPs correspond to CPtables (Pearl, 1988). Consider, for example, the variable D in Figure 1, which has only the parent B. Its parameter θ_{D|+b} = ⟨θ_{+d|+b}, θ_{-d|+b}⟩ corresponds to the distribution of D given that B is true, and θ_{D|-b} = ⟨θ_{+d|-b}, θ_{-d|-b}⟩ corresponds to the D distribution when B is false. If these values were known with certainty, we could view them as constants, e.g., θ_{D|+b} = ⟨0.3, 0.7⟩. However, if they were only based on an expert's not-necessarily-perfect assessment, or if they were learned from a datasample, they would not be known with certainty. Here, we would represent the parameter as a random variable; perhaps θ_{D|+b} ~ Dir(3, 7), meaning this parameter is drawn from a Dirichlet distribution with parameters 3 and 7. Similarly, the normally distributed variable A ~ N(θ_A, σ²_A) depends on parameters that are drawn from some distribution; here from a Normal-Inverse-χ² distribution (see below). Now consider computing the response to a fixed query from this fixed structure, say

P(-b | +d) = θ_{-b} θ_{+d|-b} / (θ_{-b} θ_{+d|-b} + θ_{+b} θ_{+d|+b})

or

P(+b | 0 < A < 1, 1 < C < 2) = P(+b, 0 < A < 1, 1 < C < 2) / P(0 < A < 1, 1 < C < 2)

whose numerator is

P(+b, 0 < A < 1, 1 < C < 2) = [θ_{+b} / (2π σ_A σ_{C,+b})] ∫₀¹ ∫₁² exp(-(y - θ_A)² / (2σ²_A)) exp(-(x - θ_{C|+b} - θ_{CA,+b} y)² / (2σ²_{C,+b})) dx dy

Clearly these responses depend on the parameters, here the θ's and σ²'s. As those are random variables, the responses to these queries are random variables as well. Van Allen, et al. (2008) earlier dealt with discrete variables in arbitrary graph structures, proving that this response is
Figure 1: Simple example of a hybrid network: discrete variables B (θ_B = ⟨θ_{+b}, θ_{-b}⟩) and D (θ_{D|+b} = ⟨θ_{+d|+b}, θ_{-d|+b}⟩, θ_{D|-b} = ⟨θ_{+d|-b}, θ_{-d|-b}⟩), and continuous variables A ~ N(θ_A, σ²_A), C | a, +b ~ N(θ_{C|+b} + θ_{CA,+b} a, σ²_{C,+b}), C | a, -b ~ N(θ_{C|-b} + θ_{CA,-b} a, σ²_{C,-b}), and E | a, c ~ N(θ_E + θ_{EA} a + θ_{EC} c, σ²_E).

asymptotically normal, and providing both the expected value of the response and the asymptotic variance. This paper extends that analysis to deal with Bayesian networks that can also include continuous, normally-distributed variables, whose CDPs are drawn from a Normal-Inverse-χ² distribution. (We consider essentially arbitrary Bayesian net structures, requiring only that discrete variables have no continuous parents.) In particular, Section 2 provides the foundations, showing how to compute posterior distributions of each independent CDP. Section 3 then shows how to use the Delta method to approximate the variance of any query for general Bayesian network structures. Sections 4 and 5 derive compact forms for the variance in the case of Naive Bayes structures, with discrete vs continuous root nodes, and Section 6 presents empirical results showing that our approach works effectively. The website ualberta.ca/~greiner/research/hybrid provides additional details about this process, including proofs and detailed examples; we indicate this using the notation [Web:x] below.

2 Foundations

Following standard convention, we represent the distribution of each discrete variable by a Conditional Probability Table, whose rows each correspond to a specific assignment to that variable's parents. (Recall we require that all parents of discrete variables be discrete.) Each of these row-parameters is drawn from a Dirichlet distribution (Heckerman, 1998; Van Allen et al., 2008). We will assume all continuous variables are normally distributed. For hybrid Bayesian nets, which include both discrete and continuous variables, we use Conditional Linear Gaussian models to represent the conditional distribution (Koller & Friedman, 2007).
Their parameters are themselves random variables, with mean and variance drawn from a Normal-Inverse-χ² distribution (Gelman, Carlin, Stern, & Rubin, 2003); see below. To be more concrete, consider again Figure 1. To simplify our description, we will assume that all discrete variables are binary (although all of our analysis applies if they range over any finite set of values).¹ Here, B's parameters are θ_B = ⟨θ_{+b}, θ_{-b}⟩, and D has two parameters: θ_{D|+b} = ⟨θ_{+d|+b}, θ_{-d|+b}⟩, associated with B being true, and θ_{D|-b} = ⟨θ_{+d|-b}, θ_{-d|-b}⟩ for B being false. We assume each parameter is drawn from a Dirichlet distribution; here θ_B = ⟨θ_{+b}, θ_{-b}⟩ ~ Dir(Э_{+b}, Э_{-b}).² We initialize all parameters to be "uniform"; that is, each Э_i = 1. The continuous variable A has no parents; its CDP is simply A ~ N(θ_A, σ²_A), where in general N(µ, σ²) is a Gaussian distribution with mean µ and variance σ². These parameters are also random variables:

θ_A | σ²_A ~ N(Ж_A, σ²_A / Л_A),  σ²_A ~ Ч_A Д_A / χ²(Д_A)   (1)

where χ²(ν) refers to a Chi-squared distribution with ν degrees of freedom. Here, Ж_A, Л_A, Ч_A, and Д_A are all hyperparameters, each initialized to a (positive) real number. By convention, we will typically set Ж_A = 0, Л_A = Ч_A = 1, and Д_A = 5. Next, the continuous variable C is a child of both the continuous A and the discrete (binary) B. Its CDP is

C | a, +b ~ N(θ_{C|+b} + θ_{CA,+b} a, σ²_{C,+b})
C | a, -b ~ N(θ_{C|-b} + θ_{CA,-b} a, σ²_{C,-b})

Note we have a different set of parameters for each instantiation of the discrete parent. To specify the distributions over these parameters:

(θ_{C|+b}, θ_{CA,+b}) | σ²_{C,+b} ~ N₂(⟨Ж_{C|+b}, Ж_{CA,+b}⟩, σ²_{C,+b} Л⁻¹_{C,+b}),  σ²_{C,+b} ~ Ч_{C,+b} Д_{C,+b} / χ²(Д_{C,+b})   (2)

(There is a similar set of equations associated with C, -b.) The variable E has two continuous parents, with the CDP E | a, c ~ N(θ_E + θ_{EA} a + θ_{EC} c, σ²_E), where

(θ_E, θ_{EA}, θ_{EC}) | σ²_E ~ N₃(⟨Ж_E, Ж_{EA}, Ж_{EC}⟩, σ²_E Л⁻¹_E),  σ²_E ~ Ч_E Д_E / χ²(Д_E)

¹ For binary variables X, we will let +x abbreviate X = 1 and -x abbreviate X = 0.
² We use normal Roman letters (e.g., A, a) for a base variable and its associated value, Greek letters (e.g., Θ, θ) for parameters and their values, and Cyrillic letters (e.g., Ж, Л, Ч, Д, Э, pronounced Zhe, El, Che, De, E) for hyperparameters.
In general, consider the continuous variable U with r discrete parents {D_1, ..., D_r} and t continuous parents {C_1, ..., C_t}. Let d = ⟨d_1, ..., d_r⟩ be the values of the discrete variables, and c = ⟨c_1, ..., c_t⟩ be the values of the continuous variables. Then

U | d, c ~ N( θ_{U|d} + Σ_i θ_{U|d,i} c_i, σ²_{U|d} )   (3)

Notice there are 2^r such equations (assuming each D_i is binary), and for each, we need to specify t different θ_{U|d,i} parameters, as well as a constant term θ_{U|d} and a variance term σ²_{U|d}, for a total of 2^r (t + 2) parameters. Now to specify the distribution over these parameters. The variables associated with each d assignment are independent. However, the t + 2 parameters for a single d are interdependent. (To simplify our notation, below we omit the U|d part of the subscripts.)

θ | σ² ~ N_{t+1}(Ж, σ² Л⁻¹),  σ² ~ Ч Д / χ²(Д)   (4)

Notice this requires O(t²) hyperparameters: {Ж_i : i = 1..t+1} and {Л_{i,j} : i, j = 1..t+1}, as well as Ч and Д. We will sometimes abbreviate Equation 4 as θ, σ² ~ Norm/χ²(Ж, Л, Ч, Д), where Norm/χ²(·) refers to the Normal-Inverse-χ² distribution. (When there are no continuous parents, t = 0, and both Ж and Л are scalars.) In general, we initialize the (t+1)-ary vector Ж to be all 0's, and Л to be the (t+1)×(t+1) identity matrix I_{t+1}.

2.1 Computing Posterior Distributions

We assume the parameters for one row are independent of the parameters for the others; e.g., θ_{D|+b} ⊥ θ_B, θ_{D|+b} ⊥ θ_A, and θ_{D|+b} ⊥ σ²_A. Moreover, each of these distributions is conjugate. That is, if we initialize the parameter for each row of each discrete variable as Dir(1, 1), and the parameters for each (conditional) continuous variable as Norm/χ²(0, I, 1, 5), and then observe the datasample S of five complete instances over ⟨A, B, C, D, E⟩, the posterior distribution is

θ_B = ⟨θ_{+b}, θ_{-b}⟩ | S ~ Dir(5, 2)
θ_{D|+b} = ⟨θ_{+d|+b}, θ_{-d|+b}⟩ | S ~ Dir(3, 3)
θ_{D|-b} = ⟨θ_{+d|-b}, θ_{-d|-b}⟩ | S ~ Dir(2, 1)

(Here, we compute the posterior hyperparameters by simply adding to the prior the number of examples that match each condition. So as the hyperparameters for θ_B were initially ⟨1, 1⟩, after seeing 4 B = +b and 1 B = -b instances in S, the posterior is Dir(1 + 4, 1 + 1).)
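The discrete-variable update just described (add to each prior hyperparameter the count of matching examples) can be sketched directly; the function name and the 2/2 split for D are illustrative, consistent with the Dir(3, 3) posterior in the text:

```python
# Conjugate update for one CPtable row:
# Dirichlet(prior) + observed counts -> Dirichlet(posterior).
def dirichlet_posterior(prior, counts):
    """prior: per-value hyperparameters; counts: matching examples per value."""
    return [a + n for a, n in zip(prior, counts)]

# theta_B starts uniform, Dir(1, 1); the datasample has 4 instances with B=+b
# and 1 with B=-b, so the posterior is Dir(5, 2), as in the text.
posterior_B = dirichlet_posterior([1, 1], [4, 1])          # [5, 2]
# theta_{D|+b}: of the 4 instances with B=+b, 2 have D=+d and 2 have D=-d.
posterior_D_plus_b = dirichlet_posterior([1, 1], [2, 2])   # [3, 3]
```

The posterior mean of each row is then just the normalized hyperparameter vector, e.g., E[θ_{+b}] = 5/7.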
Now to compute the posterior for the continuous variables (Gelman et al., 2003): Let n be the effective sample size (here n = 5), Ā = 0.36 the sample mean of A, and (n-1)s² = Σ_i (a_i - Ā)² the sum of squares. We use these update rules to produce the posterior distribution:

Ж ← (Л Ж + n Ā) / (Л + n)
Л ← Л + n
Д ← Д + n
Ч ← [ Ч Д + (n-1) s² + (Л n / (Л + n)) (Ā - Ж)² ] / (Д + n)

Hence, for variable A,

θ_A, σ²_A | S ~ Norm/χ²(0.3, 6, 0.644, 10)   (5)

(This is described in more detail in [Web:App. A]; for the general update rules, see [Web:App. A].) For variable C when B = +b,

(θ_{C|+b}, θ_{CA,+b}) | σ²_{C,+b} ~ N₂(⟨0.4, 0.009⟩, σ²_{C,+b} Л⁻¹_{C,+b}),  σ²_{C,+b} ~ Ч_{C,+b} Д_{C,+b} / χ²(9)

and

(θ_E, θ_{EA}, θ_{EC}) | σ²_E ~ N₃(⟨0.4, 0.065, 0.6⟩, σ²_E Л⁻¹_E),  σ²_E ~ Ч_E Д_E / χ²(10)

3 Estimating Variance

To define our task, we assume we are given: (i) a Bayesian net structure S; (ii) the (posterior) distribution over the parameters for each variable, Θ, which correspond to Dirichlet parameters for each discrete variable and Normal-Inverse-χ² parameters for each Gaussian (these are denoted using the Cyrillic letters); in the above example (Figure 1), Θ = ⟨θ_B, θ_{D|+b}, θ_{D|-b}, ⟨θ_A, σ²_A⟩, ⟨θ_{C|+b}, θ_{CA,+b}, σ²_{C,+b}⟩, ⟨θ_{C|-b}, θ_{CA,-b}, σ²_{C,-b}⟩, ⟨θ_E, θ_{EA}, θ_{EC}, σ²_E⟩⟩; and (iii) a specific query over some variables within the network, Q(Θ) = P_{S,Θ}(C = c | E = e). (This notation emphasizes its dependence on the parameters.) We will consider queries whose query variables are each either an assignment to a discrete variable (e.g., +b) or a range for a
continuous variable (e.g., 1 < C < 2), and whose evidence variables are each a specific assignment to either a discrete or a continuous variable, e.g., D = -d or E = 3. Given this, the response will be a random variable in the interval [0, 1]; we want to return a good estimate of both its mean and its variance. We estimate the variance using the Delta method (Oehlert, 1992; Casella & Berger, 2002). Let ˆΘ = E[Θ] be the expected value of the parameter values, and ∇_Θ Q(ˆΘ) = ⟨∂Q/∂θ_i⟩_i be the vector of the derivatives of Q wrt each of the parameters θ_i, evaluated at ˆΘ. Using a Taylor expansion,

Q(Θ) = Q(ˆΘ) + ∇_Θ Q(ˆΘ)ᵀ (Θ - ˆΘ) + R   (6)

where R collects the terms of degree 2 and higher. Assuming this R is negligible,

Q(Θ) - Q(ˆΘ) ≈ ∇_Θ Q(ˆΘ)ᵀ (Θ - ˆΘ)

which means, assuming E[Q(Θ)] ≈ Q(ˆΘ),³

E[(Q(Θ) - E[Q(Θ)])²] ≈ E[(Q(Θ) - Q(ˆΘ))²] ≈ ∇_Θ Q(ˆΘ)ᵀ Cov(Θ) ∇_Θ Q(ˆΘ)

where Cov(Θ) is the variance-covariance matrix. Note the left hand side is the variance of the response, V(Q(Θ)), which suggests we can approximate this variance using

V(Q(Θ)) ≈ ∇_Θ Q(ˆΘ)ᵀ Cov(Θ) ∇_Θ Q(ˆΘ)   (7)

The two challenges, therefore, are (1) computing the covariance matrix Cov(Θ), and (2) computing the derivatives ∂Q/∂θ_i. Fortunately, given standard assumptions about the independence of the different parameters, the parameters associated with different variables are uncorrelated; i.e., for each pair of distinct variables X and Y, we have Θ(X) ⊥ Θ(Y), where Θ(X) are the parameters associated with the variable X. This means the covariance matrix will be block diagonal, and so

V(Q(Θ)) ≈ Σ_X V_Q(Θ)(X)   (8)

where

V_Q(Θ)(X) = ∇_{Θ(X)} Q(ˆΘ(X))ᵀ Cov(Θ(X)) ∇_{Θ(X)} Q(ˆΘ(X))   (9)

The rest of this section describes how to compute the covariance terms for each type of node: a discrete child of discrete parents, and a continuous child of both discrete and continuous parents. The next two sections show simpler versions for Naive Bayes structures.

³ While this claim holds for discrete networks (Cooper & Herskovits, 1992), it does not apply to continuous networks; see [Web:CounterEx].
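Equation 7 is mechanical once the gradient and covariance are in hand. A minimal numeric sketch, using finite differences and an illustrative two-parameter query (not one of the paper's networks):

```python
def delta_variance(q, theta_hat, cov, eps=1e-6):
    """First-order (Delta-method) variance of q(theta):
    grad(q)^T Cov grad(q), with the gradient taken at theta_hat (Equation 7)."""
    k = len(theta_hat)
    grad = []
    for i in range(k):
        hi = list(theta_hat); lo = list(theta_hat)
        hi[i] += eps; lo[i] -= eps
        grad.append((q(hi) - q(lo)) / (2 * eps))   # central difference
    return sum(grad[i] * cov[i][j] * grad[j]
               for i in range(k) for j in range(k))

# Illustrative query q = t0/(t0+t1) with uncorrelated parameters:
# grad = (0.7, -0.3) at (0.3, 0.7), so the estimate is
# 0.7^2 * 0.01 + 0.3^2 * 0.02 = 0.0067.
v = delta_variance(lambda t: t[0] / (t[0] + t[1]),
                   [0.3, 0.7],
                   [[0.01, 0.0], [0.0, 0.02]])
```

With a block-diagonal Cov, the same computation decomposes into the per-variable sum of Equations 8 and 9.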
Consider the network shown in Figure 1 and the query

Q = P(1 < C < 2 | E = 3, +b)   (10)

In general, we consider queries of the form Q = P(R ∈ I | E = e), which allows us to partition the variables into 3 sets: the ones that have some specific instantiation in E (the evidence component of the query), the ones restricted to some range in R (the query component of the query), and the remaining variables T that do not appear anywhere in the query. (Here, R = {C}, I = {[1, 2]}, E = {E, B}, e = {3, +b}, and T = {A, D}.) Let P(u) be the probability density function of the joint distribution of the variables u. For example,

P(a, c, e) = N(a; θ_A, σ²_A) N(c; θ_{C|+b} + θ_{CA,+b} a, σ²_{C,+b}) N(e; θ_E + θ_{EA} a + θ_{EC} c, σ²_E)

Now let P_{E=e}(u) be the value of P(u) when all evidence variables are substituted with their values in the query. For example, using

P_{E=3,+b}(a, c) = θ_{+b} N(a; θ_A, σ²_A) N(c; θ_{C|+b} + θ_{CA,+b} a, σ²_{C,+b}) N(3; θ_E + θ_{EA} a + θ_{EC} c, σ²_E)

we have

Q(Θ) = [ Σ_d ∫₁² ∫ P_{E=3,+b,d}(a, c) da dc ] / [ Σ_d ∫ ∫ P_{E=3,+b,d}(a, c) da dc ] = Σ_d f(+b, d) / Σ_d g(+b, d)

for the obvious f(·) and g(·) functions. To simplify our notation, we will let U refer to the continuous variables in T ∪ R, ∫_n refer to the integral over the numerator's bounds and ∫_d the integral over the denominator's bounds, and Σ_z the sum over the uninstantiated discrete variables. So, in general,

Q = [ Σ_z ∫_n P_{E=e,z}(U) dU ] / [ Σ_z ∫_d P_{E=e,z}(U) dU ] = Σ_z f(z) / Σ_z g(z)   (11)

In order to compute the variance of Q, we need to compute its derivatives wrt the parameters. Letting γ be an arbitrary parameter,

∂Q/∂γ = [ 1 / Σ_z g(z) ] ( Σ_z ∂f(z)/∂γ - Q Σ_z ∂g(z)/∂γ )   (12)

Now recall the Delta method, Equation 7. To estimate the variance of a query, we only need to compute the derivatives of the query wrt the parameters of the network, then use the covariance matrix, which is described in [Web:App. A]. In [Web:App. B], we show that all those derivatives are functions of a small set of integrals, involving every single Gaussian variable u_i and every pair of (not necessarily distinct) Gaussian variables u_i, u_j, over both the numerator
bounds and also the denominator bounds:

∫_n u_i P_{E=e}(U) dU,  ∫_n u_i u_j P_{E=e}(U) dU,  ∫_d u_i P_{E=e}(U) dU,  ∫_d u_i u_j P_{E=e}(U) dU

where u_i, u_j ∈ (R ∪ T) are continuous variables of the network that do not appear in the evidence set. So, in our example, we only need to compute the integrals

∫₁² ∫ χ P_{E=3,+b,d}(a, c) da dc,  ∫ ∫ χ P_{E=3,+b,d}(a, c) da dc

where χ refers to a, c, a², ac, or c², and d iterates over the different values that D can take. (This corresponds to 20 different integrals.) In [Web:App. B], we present an algorithm to compute these integrals, CompIntegrals(N, Θ, Q), which takes as input the Bayesian network N, the parameters Θ, and the query Q. It returns all the integrals of the above forms. Let ˆQ and ˆP be the values of the corresponding functions based on the parameter values Θ = ˆΘ. Given those integrals, we can compute the derivative of the query wrt each parameter χ, which could be θ_{U_i}, θ_{U_i C_j}, or σ²_{U_i} (see [Web:App. B]):

∂Q/∂χ |_ˆΘ = [ Σ_z ∫_n (∂P_{E=e,z}(U)/∂χ)|_ˆΘ dU - ˆQ Σ_z ∫_d (∂P_{E=e,z}(U)/∂χ)|_ˆΘ dU ] / [ Σ_z ∫_d ˆP_{E=e,z}(U) dU ]   (13)

where each partial derivative ∂P_{E=e,z}(U)/∂χ is a polynomial in the u_i times ˆP_{E=e,z}(U), so the required integrals are exactly those listed above. Given these derivatives, as well as the covariance matrix (defined above), we can then use Equation 9 to compute V_Q(Θ)(X) for each variable X, which can then be added together to form our approximation to the variance (via Equation 8). See [Web:App. B] for the proof, and [Web:Ex. 3] for a specific worked-out example. The following algorithm computes all derivatives of the form ∂/∂γ Σ_z f(z) in a general network, using f(z) = ∫_s P_{E=e,z}(U) dU from Equation 11. Here, to compute ∂Q/∂γ over all γ's, we would first call CompDerivativesHybrid(S, Θ, S_f), then CompDerivativesHybrid(S, Θ, S_g), i.e., with different bounds for the integral. (If the query variable were discrete, e.g., Q = P(+b | +d, E = 3), then the integrals would be the same, but the variables z would be different on different calls.)
ompderivativeshybrid(n: BayesianNetwork, Θ: parameters, S z s P E e, z ( U du : returns S γ over all parameters γ : r : 0; r : 0; associated with : t {discrete variables that do not appear in the query} 3: Let f(z refer to s P E e, z ( U du 4: for each w assignment to t do 5: for θ i : Dirichlet parameters associated with w do 6: r (θ i + f(w 7: end for 8: Θ w : parameters of continuous variables when discrete variables are instantiated to w 9: s OMPDERIVTIVES(N, Θ w, f(w % using Equation 3 0: r (Θ w + s : end for : for each Dirichlet parameters of the network, θ i do 3: r (θ i : r (θ i / ˆθ i 4: end for 5: return [r, r ] Lines 5 7 then 4 of OMPDERIVTIVESHYBRID is a brute force procedure to compute γ z f(z when γ is a Dirichlet parameter associated with a discrete variable. Van llen et al. (008 produced a more efficient algorithm for computing γ z f(z when γ is a Dirichlet parameter. For an example, see [Web:Ex. 4]. 4 Naive Bayes with Discrete lass Variable Naive Bayes structure is a simple tree, with a single variable serving as the only parent to the remaining vari- v i ˆθ Ui P ˆθ r Ui r c r ˆP (U c j u i ˆθ Ui P ˆθ ables; for notation, we let {F i } refer to the continuous child r Ui r c r ˆP (U variables and {G j } to the discrete child variables; see Figure (a. The discrete variable can take n different val- + ( u i ˆθ Ui P m i ˆθ «r Ui r c r ues, according to the Dirichlet distribution θ θ,..., θ n Dir(,..., n. The effective sample size is m j i i, and e j m is the expected value of the j th value, which corresponds to P( j. Each G i is a discrete child, which takes n i possible values, according to a Dirichlet distribution. Given the value of its parent, j, its parameters are θ Gi j θ Gi j,..., θ Gin i j Dir( G i j,..., G in i j. and its effective sample size (for this parental assignment is m Gi j k G G ik j. Here e ijr i r j m Gi is the expected j value, corresponding to P( G i r j. 
Each F_i is a continuous random variable, distributed as F_i | C = j ~ N(θ_{F_i|j}, σ²_{F_i|j}), where θ_{F_i|j} and σ²_{F_i|j} are jointly distributed according to a Normal-Inverse-χ² distribution:

θ_{F_i|j} | σ²_{F_i|j} ~ N(Ж_{F_i|j}, σ²_{F_i|j} / Л_{F_i|j}),  σ²_{F_i|j} ~ Ч_{F_i|j} Д_{F_i|j} / χ²(Д_{F_i|j})

Figure 2: Two examples of Naive Bayes systems (structure + parameters): (a) a discrete parent C (θ_{+c}, θ_{-c}) with a discrete child G_1 (θ_{±g|±c}) and continuous children F_1, F_2, F_3, each with F_i | ±c ~ N(θ_{F_i|±c}, σ²_{F_i|±c}); (b) a continuous parent A ~ N(θ_A, σ²_A) with continuous children F_1, ..., F_4, each with F_i | A ~ N(θ_{F_i} + θ_{F_i A} A, σ²_{F_i}).

We want to compute the variance of

Q(Θ) = P(C = q | F_1 = f_1, ..., F_k = f_k, G_1 = g_1, ..., G_l = g_l, D, Θ) = P(C = q | FF, GG, D, Θ)

where FF = {F_1 = f_1, ..., F_k = f_k}, GG = {G_1 = g_1, ..., G_l = g_l}, and D is the dataset. We also set p_i := ˆP(C = i | FF, GG, D, Θ). (Recall ˆP is the probability P(· | θ) computed at the mean value ˆΘ of the parameter vector.)

Theorem 1 (Proof in [Web:App. D]) Given the above conditions:

For the root C:
V_Q(C) = [ p_q² / (m + 1) ] [ (1 - p_q)² / e_q + Σ_{j≠q} p_j² / e_j ]

For each discrete child G_i in the evidence set:
V_Q(G_i) = p_q² [ (1 - p_q)² (1 - e_{iqg_i}) / (e_{iqg_i} (1 + m_{iq})) + Σ_{j≠q} p_j² (1 - e_{ijg_i}) / (e_{ijg_i} (1 + m_{ij})) ]

For each continuous child F_j in the evidence set:
V_Q(F_j) = p_q² [ (1 - p_q)² h_{jq} + Σ_{k≠q} p_k² h_{jk} ]

where h_{ij} is a function of the observed value f_i and the hyperparameters Ж_{F_i|j}, Л_{F_i|j}, Ч_{F_i|j}, and Д_{F_i|j}; its explicit form is given in [Web:App. D].

Example: Consider the Bayesian network in Figure 2(a), where C is a binary variable that takes two values {+c, -c} according to a Dirichlet distribution ⟨θ_{+c}, θ_{-c}⟩ ~ Dir(·, ·). G_1 is also binary, drawn according to a Dirichlet distribution (conditioned on the value of C): ⟨θ_{+g|+c}, θ_{-g|+c}⟩ ~ Dir(·, ·), ⟨θ_{+g|-c}, θ_{-g|-c}⟩ ~ Dir(3, ·). The distributions of two of the continuous children are given by:

F_1 | +c ~ N(θ_{F_1|+c}, σ²_{F_1|+c}),  θ_{F_1|+c}, σ²_{F_1|+c} ~ Norm/χ²(·, 5, ·, 6)
F_1 | -c ~ N(θ_{F_1|-c}, σ²_{F_1|-c}),  θ_{F_1|-c}, σ²_{F_1|-c} ~ Norm/χ²(0, ·, ·, ·)
F_3 | +c ~ N(θ_{F_3|+c}, σ²_{F_3|+c}),  θ_{F_3|+c}, σ²_{F_3|+c} ~ Norm/χ²(·, ·, 3, 7)
F_3 | -c ~ N(θ_{F_3|-c}, σ²_{F_3|-c}),  θ_{F_3|-c}, σ²_{F_3|-c} ~ Norm/χ²(·, ·, ·, ·0)

We want to compute the variance of Q = P(+c | F_1 = f_1, F_3 = f_3, +g). (Notice this does not involve the other two child nodes.) Using results from [Web:App.
C], the posterior means are

E[θ̂_{F_1|+c}] = ·,  E[θ̂_{F_1|-c}] = 0,  E[θ̂_{F_3|+c}] = ·,  E[θ̂_{F_3|-c}] = ·
E[σ̂²_{F_1|+c}] = ·.5,  E[σ̂²_{F_1|-c}] = 4.8,  E[σ̂²_{F_3|+c}] = ·.6,  E[σ̂²_{F_3|-c}] = ·.5

This yields p_1 = 0.189 and p_2 = 0.811, and so (since C is binary, 1 - p_1 = p_2)

V_Q(C) = [ p_1² p_2² / (m + 1) ] ( 1/e_1 + 1/e_2 )

After substitutions, we can show that h_{11} = 0.063, and similarly for h_{12}, h_{31}, and h_{32}, so

V_Q(F_1) + V_Q(F_3) = p_1² p_2² ( (h_{11} + h_{12}) + (h_{31} + h_{32}) )

with V_Q(G_1) computed from the discrete-child case. Hence, using Theorem 1,

V(Q) = V(Q(Θ)) ≈ V_Q(C) + V_Q(F_1) + V_Q(F_3) + V_Q(G_1)
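The root term of Theorem 1 can be sanity-checked by sampling the root's Dirichlet parameter directly and recomputing the query response; the class priors, effective sample size m, and evidence likelihoods w_j below are illustrative stand-ins (w_j plays the role of the fixed P(evidence | C = j)), not the example's values:

```python
import random

def root_variance(e, m, w, q):
    """Theorem 1's V_Q(C): e = expected class probabilities, m = effective
    sample size, w[j] = likelihood of the evidence given C=j, q = query class."""
    s = sum(ej * wj for ej, wj in zip(e, w))
    p = [ej * wj / s for ej, wj in zip(e, w)]
    return p[q] ** 2 / (m + 1) * (
        (1 - p[q]) ** 2 / e[q]
        + sum(p[j] ** 2 / e[j] for j in range(len(e)) if j != q))

def empirical_variance(e, m, w, q, draws=20000, seed=0):
    """Variance of Q(theta) over theta ~ Dir(m*e), sampled via Gamma draws."""
    rng = random.Random(seed)
    vals = []
    for _ in range(draws):
        g = [rng.gammavariate(m * ej, 1.0) for ej in e]
        theta = [gi / sum(g) for gi in g]
        s = sum(ti * wi for ti, wi in zip(theta, w))
        vals.append(theta[q] * w[q] / s)
    mu = sum(vals) / draws
    return sum((v - mu) ** 2 for v in vals) / (draws - 1)

v_delta = root_variance([0.3, 0.7], 50, [2.0, 1.0], 0)   # ~0.00577
v_mc = empirical_variance([0.3, 0.7], 50, [2.0, 1.0], 0)
```

For a moderate effective sample size the two estimates agree to within a few percent, as the Delta method's first-order error shrinks with m.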
5 All Continuous Naive Bayes

Now consider a Naive Bayes model in which all nodes correspond to continuous variables, both the root A and the children F_i. The distribution of the parent is given by

A ~ N(θ_A, σ²_A),  θ_A | σ²_A ~ N(Ж_A, σ²_A / Л_A),  σ²_A ~ Ч_A Д_A / χ²(Д_A)

For each child F_i,

F_i | A ~ N(θ_{F_i} + θ_{F_i A} A, σ²_{F_i})
(θ_{F_i}, θ_{F_i A}) | σ²_{F_i} ~ N₂(⟨Ж_{F_i}, Ж_{F_i A}⟩, σ²_{F_i} Л⁻¹_{F_i}),  σ²_{F_i} ~ Ч_{F_i} Д_{F_i} / χ²(Д_{F_i})

We would like to compute the variance of the query

Q(Θ) = P(c_1 < A < c_2 | F_1 = f_1, ..., F_n = f_n, Θ, D) = P(c_1 < A < c_2 | FF, Θ, D)

Let P_{FF}(c) be the probability density function of the above distribution, and ˆP_{FF}(c) its value at Θ = ˆΘ.

Theorem 2 (Proof in [Web:App. F]) Given the above conditions, V_Q(A) has a closed form in the hyperparameters Ж_A, Л_A, Ч_A, Д_A, the bounds c_1 and c_2, and ˆP_{FF}(·), and, for each evidence variable F_i,

V_Q(F_i) = u_i M_i u_iᵀ

where M_i = Cov(θ_{F_i}, θ_{F_i A}, σ²_{F_i}), and u_i = ⟨u_{i1}, u_{i2}, u_{i3}⟩ is the vector of derivatives of Q wrt θ_{F_i}, θ_{F_i A}, and σ²_{F_i}, evaluated at ˆΘ; the explicit expressions for V_Q(A) and the u_{ik} are given in [Web:App. F]. See [Web:Section 5] for an example.

6 Empirical Studies

Given that the parameters for different variables are independent (e.g., Θ_A is independent of Θ_E, etc.), and the distributions for each individual variable are conjugate, the posterior distribution, given a complete datasample, is unambiguous and straightforward to compute; see Section 2.1. This is why we focus on the challenge of computing the variance of the response to a specific query, given this posterior distribution. As noted earlier, our estimation technique for computing V(Q(Θ)) (Equation 7) makes several assumptions, including the assumption that the mean of the response is the response of the mean of the parameters (E[Q(Θ)] ≈ Q(ˆΘ)) and that the first-order approximation will work effectively. Following Van Allen et al. (2008), we therefore ran a number of studies to explore whether our approximations are sufficiently close, at least within a factor of 2.
In each study, we first identified a particular structure S (for space reasons, we consider only Naive Bayes here; see [Web:Studies]) and a specific query, which here is of the form P(C = c | F_1 = f_1, ..., F_n = f_n). We then considered various settings of the hyperparameters (i.e., the Cyrillic variables). For example, perhaps C's parameters were ⟨θ_{+c}, θ_{-c}⟩ ~ Dir(4, 6), and F_1's parameters were θ_{F_1}, θ_{F_1 C}, σ²_{F_1} ~ Norm/χ²(⟨0, 0⟩, I, 1, 5), etc.⁴ For each set of hyperparameters, we could then use Equation 7 to produce an analytic estimate of the variance of the response, V = V(Q(Θ)). We can also obtain a (presumably more accurate) empirical estimate σ̃², as follows: We first draw a number of parameter values from the posterior distribution over the parameters (as encoded by the hyperparameters). For example, given the hyperparameters shown above, on one draw we might get ⟨θ^(1)_{+c}, θ^(1)_{-c}⟩ = ⟨0.42, 0.58⟩ and ⟨θ^(1)_{F_1}, θ^(1)_{F_1 C}, σ²^(1)_{F_1}⟩ = ⟨0.1, 0.04, 0.9⟩; the next draw might yield ⟨θ^(2)_{+c}, θ^(2)_{-c}⟩ = ⟨0.39, 0.61⟩ and ⟨θ^(2)_{F_1}, θ^(2)_{F_1 C}, σ²^(2)_{F_1}⟩ = ⟨0.09, 0.03, 1.0⟩.⁵ For each particular assignment to the parameters, call it Θ^(i) = ⟨θ^(i)_j⟩_j, we can then compute the associated response to the query,

⁴ To simplify the notation, we will deal with σ² rather than σ.
⁵ Note each is a sampling of the parameters; n.b., not of the domain variables, i.e., this is not over values for C nor values for F_1.
Figure 3: (a) Relative error vs training-set size; (b) relative error vs number of children (all continuous); (c) relative error vs number of children (both continuous and discrete). (Each panel plots the mean relative difference.)

r^(i) = Q(Θ^(i)). After m = 1,000 draws, we obtain m responses, from which we can compute the empirical variance σ̃². Given the V and σ̃² values computed for each network structure, query, and set of hyperparameters, we can then compute the relative error, |V - σ̃²| / σ̃². To investigate the quality of our approximation, we explore two scaling questions:

(1) How does the relative error scale with training size? Here, we considered a NaiveBayes network with a continuous parent and 4 continuous children (like Figure 2[b]). We then initialized the hyperparameters as shown above, computed posterior parameters by training this structure on data sets of size {10, 50, 100, 250, 1,000, 5,000, 10,000}, and computed both V and σ̃² (over m = 1,000 draws) for each of 100 different queries. Figure 3(a) shows that the average (over 100 queries) relative error decreases as we increase the training set size.

(2) How does the relative error vary with the number of children? Here, we consider a discrete parent and r ∈ {1, 2, 4, 8, 16} continuous children, trained on 1,000 instances. Figure 3(b) shows the average over 100 queries. We see that the relative error grows with the number of children. We also considered both discrete and continuous children: again r ∈ {1, 2, 4, 8, 16} children, but now half are discrete and the other half are continuous. (For r = 1, the only child was discrete.) Figure 3(c) shows that, while the relative error again grows with the number of children, this growth is slower than in the all-continuous case shown above.

In all cases, we see that the error is small; in all cases within the desired factor of 2 of the correct answer.
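The empirical check described above (draw parameters from the posterior, recompute the response, compare the empirical variance with the analytic V) can be sketched generically; the Dir(5, 2) posterior and the identity query below are illustrative placeholders, chosen because the exact variance is known in closed form:

```python
import random

def relative_error(v_analytic, draw_params, query, m=1000, seed=1):
    """|V - sigma~^2| / sigma~^2, where sigma~^2 is the empirical variance of
    the query response over m draws of the *parameters* (not domain values)."""
    rng = random.Random(seed)
    rs = [query(draw_params(rng)) for _ in range(m)]
    mu = sum(rs) / m
    emp = sum((r - mu) ** 2 for r in rs) / (m - 1)
    return abs(v_analytic - emp) / emp

def draw_dirichlet(rng, alphas=(5, 2)):
    # Dirichlet sampling via normalized Gamma draws.
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    return [x / sum(g) for x in g]

# For Q(theta) = theta_1 with theta ~ Dir(a, b), the exact variance is
# a*b / ((a+b)^2 * (a+b+1)); here 10/392.
v_exact = 5 * 2 / ((5 + 2) ** 2 * (5 + 2 + 1))
err = relative_error(v_exact, draw_dirichlet, lambda t: t[0], m=2000)
```

For a correct analytic V, the relative error is dominated by Monte Carlo noise, on the order of sqrt(2/m).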
Moreover, this is very efficient to compute (as it is just a straight-line computation), much faster than the sampling approach, which involved 1,000s of inferences. See [Web:Studies] for more extensive studies and analyses, wrt Naive Bayes and also more complicated structures.

7 Conclusion

Van Allen et al. (2008) earlier motivated the task of computing the variance of the response to a query wrt a given Bayesian network, as this can help us (1) estimate the bias² + variance of each given Bayesian network, which can help us select the best discriminative model (Guo & Greiner, 2005), and (2) combine the responses of various independent belief net classifiers by weighting their respective (mean) probabilities by 1/variance (Lee, Greiner, & Wang, 2006). That earlier paper, however, considered only discrete variables. This current paper extends that earlier one by showing how to deal with continuous (Gaussian) variables. We show how to use the Delta method to obtain an approximation for arbitrary networks (insisting only that discrete variables have only discrete parents). We also provide simpler forms that apply to simple NaiveBayes models: one for a discrete root and arbitrary children, and another for a continuous parent and continuous children. We also provide empirical evidence to demonstrate that this approach works effectively.

References

Casella, G., & Berger, R. L. (2002). Statistical Inference.
Cooper, G., & Herskovits, E. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9, 309-347.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2003). Bayesian Data Analysis. Chapman and Hall.
Guo, Y., & Greiner, R. (2005). Discriminative model selection for belief net structures. In AAAI.
Heckerman, D. E. (1998). A tutorial on learning with Bayesian networks. In Learning in Graphical Models.
Koller, D., & Friedman, N. (2007). Graphical Models. To appear.
Lee, C., Greiner, R., & Wang, S. (2006). Using variance estimates to combine Bayesian classifiers. In ICML.
Oehlert, G. W. (1992). A note on the delta method. The American Statistician, 46(1), 27-29.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
Van Allen, T., Singh, A., Greiner, R., & Hooper, P. (2008). Quantifying the uncertainty of a belief net response: Bayesian error-bars for belief net inference. Artificial Intelligence.
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationSampling Algorithms for Probabilistic Graphical models
Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir
More informationStatistics: Learning models from data
DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial
More informationExact model averaging with naive Bayesian classifiers
Exact model averaging with naive Bayesian classifiers Denver Dash ddash@sispittedu Decision Systems Laboratory, Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15213 USA Gregory F
More informationPhysics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester
Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationDirected Graphical Models
CS 2750: Machine Learning Directed Graphical Models Prof. Adriana Kovashka University of Pittsburgh March 28, 2017 Graphical Models If no assumption of independence is made, must estimate an exponential
More informationLecture 3: Machine learning, classification, and generative models
EE E6820: Speech & Audio Processing & Recognition Lecture 3: Machine learning, classification, and generative models 1 Classification 2 Generative models 3 Gaussian models Michael Mandel
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationAn Introduction to Bayesian Machine Learning
1 An Introduction to Bayesian Machine Learning José Miguel Hernández-Lobato Department of Engineering, Cambridge University April 8, 2013 2 What is Machine Learning? The design of computational systems
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationIntelligent Systems:
Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition
More informationIntroduction. Chapter 1
Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Choosing Hypotheses Generally want the most probable hypothesis given the training data Maximum a posteriori
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationNonparameteric Regression:
Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,
More informationT Machine Learning: Basic Principles
Machine Learning: Basic Principles Bayesian Networks Laboratory of Computer and Information Science (CIS) Department of Computer Science and Engineering Helsinki University of Technology (TKK) Autumn 2007
More informationBayesian Error-Bars for Belief Net Inference
$ To appear in to the Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01), Seattle, Aug 001. Bayesian Error-Bars for Belief Net Inference Tim Van Allen digimine,
More informationProbabilistic Graphical Models
2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector
More informationECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4
ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian
More informationDiscrete Mathematics and Probability Theory Fall 2015 Lecture 21
CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about
More informationDependent Dirichlet Priors and Optimal Linear Estimators for Belief Net Parameters
Dependent Dirichlet Priors and Optimal Linear Estimators for Belief Net Parameters Peter M. Hooper Dept. of Mathematical & Statistical Sciences University of Alberta Edmonton, AB T6G 2G1 Canada Abstract
More information10708 Graphical Models: Homework 2
10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves
More informationIntroduction to Probabilistic Graphical Models
Introduction to Probabilistic Graphical Models Sargur Srihari srihari@cedar.buffalo.edu 1 Topics 1. What are probabilistic graphical models (PGMs) 2. Use of PGMs Engineering and AI 3. Directionality in
More informationLecture 6: Graphical Models: Learning
Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER. The Kalman Filter. We will be concerned with state space systems of the form
ECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER KRISTOFFER P. NIMARK The Kalman Filter We will be concerned with state space systems of the form X t = A t X t 1 + C t u t 0.1 Z t
More informationA Tutorial on Learning with Bayesian Networks
A utorial on Learning with Bayesian Networks David Heckerman Presented by: Krishna V Chengavalli April 21 2003 Outline Introduction Different Approaches Bayesian Networks Learning Probabilities and Structure
More informationQuantifying the uncertainty of a belief net response: Bayesian error-bars for belief net inference
Artificial Intelligence 72 (2008) 483 53 www.elsevier.com/locate/artint Quantifying the uncertainty of a belief net response: Bayesian error-bars for belief net inference Tim Van Allen a, Ajit Singh b,
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More informationMassachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,
More informationBayesian Networks. Motivation
Bayesian Networks Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Motivation Assume we have five Boolean variables,,,, The joint probability is,,,, How many state configurations
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationHierarchical Models & Bayesian Model Selection
Hierarchical Models & Bayesian Model Selection Geoffrey Roeder Departments of Computer Science and Statistics University of British Columbia Jan. 20, 2016 Contact information Please report any typos or
More informationBayesian Approaches Data Mining Selected Technique
Bayesian Approaches Data Mining Selected Technique Henry Xiao xiao@cs.queensu.ca School of Computing Queen s University Henry Xiao CISC 873 Data Mining p. 1/17 Probabilistic Bases Review the fundamentals
More informationStochastic Processes, Kernel Regression, Infinite Mixture Models
Stochastic Processes, Kernel Regression, Infinite Mixture Models Gabriel Huang (TA for Simon Lacoste-Julien) IFT 6269 : Probabilistic Graphical Models - Fall 2018 Stochastic Process = Random Function 2
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationWhen Discriminative Learning of Bayesian Network Parameters Is Easy
Pp. 491 496 in Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI 2003), edited by G. Gottlob and T. Walsh. Morgan Kaufmann, 2003. When Discriminative Learning
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 2. Statistical Schools Adapted from slides by Ben Rubinstein Statistical Schools of Thought Remainder of lecture is to provide
More informationMachine Learning Summer School
Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,
More informationIntroduction to Bayesian inference
Introduction to Bayesian inference Thomas Alexander Brouwer University of Cambridge tab43@cam.ac.uk 17 November 2015 Probabilistic models Describe how data was generated using probability distributions
More informationAlgorithmisches Lernen/Machine Learning
Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines
More informationCSE 150. Assignment 6 Summer Maximum likelihood estimation. Out: Thu Jul 14 Due: Tue Jul 19
SE 150. Assignment 6 Summer 2016 Out: Thu Jul 14 ue: Tue Jul 19 6.1 Maximum likelihood estimation A (a) omplete data onsider a complete data set of i.i.d. examples {a t, b t, c t, d t } T t=1 drawn from
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationBayesian Learning (II)
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP
More informationBasic Sampling Methods
Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationLecture 16 Deep Neural Generative Models
Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed
More informationBelief Update in CLG Bayesian Networks With Lazy Propagation
Belief Update in CLG Bayesian Networks With Lazy Propagation Anders L Madsen HUGIN Expert A/S Gasværksvej 5 9000 Aalborg, Denmark Anders.L.Madsen@hugin.com Abstract In recent years Bayesian networks (BNs)
More informationProbability and Information Theory. Sargur N. Srihari
Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal
More informationDirected Graphical Models
Directed Graphical Models Instructor: Alan Ritter Many Slides from Tom Mitchell Graphical Models Key Idea: Conditional independence assumptions useful but Naïve Bayes is extreme! Graphical models express
More informationLecture 9: Bayesian Learning
Lecture 9: Bayesian Learning Cognitive Systems II - Machine Learning Part II: Special Aspects of Concept Learning Bayes Theorem, MAL / ML hypotheses, Brute-force MAP LEARNING, MDL principle, Bayes Optimal
More informationAn Empirical-Bayes Score for Discrete Bayesian Networks
JMLR: Workshop and Conference Proceedings vol 52, 438-448, 2016 PGM 2016 An Empirical-Bayes Score for Discrete Bayesian Networks Marco Scutari Department of Statistics University of Oxford Oxford, United
More informationCS340 Winter 2010: HW3 Out Wed. 2nd February, due Friday 11th February
CS340 Winter 2010: HW3 Out Wed. 2nd February, due Friday 11th February 1 PageRank You are given in the file adjency.mat a matrix G of size n n where n = 1000 such that { 1 if outbound link from i to j,
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationHierarchical Multinomial-Dirichlet model for the estimation of conditional probability tables
Hierarchical Multinomial-Dirichlet model for the estimation of conditional probability tables Laura Azzimonti IDSIA - SUPSI/USI Manno, Switzerland laura@idsia.ch Giorgio Corani IDSIA - SUPSI/USI Manno,
More informationUnsupervised Learning with Permuted Data
Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More informationAn Introduction to Bayesian Networks in Systems and Control
1 n Introduction to ayesian Networks in Systems and Control Dr Michael shcroft Computer Science Department Uppsala University Uppsala, Sweden mikeashcroft@inatas.com bstract ayesian networks are a popular
More informationIntroduction to continuous and hybrid. Bayesian networks
Introduction to continuous and hybrid Bayesian networks Joanna Ficek Supervisor: Paul Fink, M.Sc. Department of Statistics LMU January 16, 2016 Outline Introduction Gaussians Hybrid BNs Continuous children
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More informationBayesian Linear Regression [DRAFT - In Progress]
Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,
More informationOn the errors introduced by the naive Bayes independence assumption
On the errors introduced by the naive Bayes independence assumption Author Matthijs de Wachter 3671100 Utrecht University Master Thesis Artificial Intelligence Supervisor Dr. Silja Renooij Department of
More informationGraphical Models 359
8 Graphical Models Probabilities play a central role in modern pattern recognition. We have seen in Chapter 1 that probability theory can be expressed in terms of two simple equations corresponding to
More informationVariational Inference. Sargur Srihari
Variational Inference Sargur srihari@cedar.buffalo.edu 1 Plan of discussion We first describe inference with PGMs and the intractability of exact inference Then give a taxonomy of inference algorithms
More informationSum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017
Sum-Product Networks STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017 Introduction Outline What is a Sum-Product Network? Inference Applications In more depth
More informationMinimum Free Energies with Data Temperature for Parameter Learning of Bayesian Networks
28 2th IEEE International Conference on Tools with Artificial Intelligence Minimum Free Energies with Data Temperature for Parameter Learning of Bayesian Networks Takashi Isozaki 1,2, Noriji Kato 2, Maomi
More informationInference and estimation in probabilistic time series models
1 Inference and estimation in probabilistic time series models David Barber, A Taylan Cemgil and Silvia Chiappa 11 Time series The term time series refers to data that can be represented as a sequence
More informationCS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016
CS 2750: Machine Learning Bayesian Networks Prof. Adriana Kovashka University of Pittsburgh March 14, 2016 Plan for today and next week Today and next time: Bayesian networks (Bishop Sec. 8.1) Conditional
More informationLECTURE NOTE #3 PROF. ALAN YUILLE
LECTURE NOTE #3 PROF. ALAN YUILLE 1. Three Topics (1) Precision and Recall Curves. Receiver Operating Characteristic Curves (ROC). What to do if we do not fix the loss function? (2) The Curse of Dimensionality.
More informationProbabilistic Graphical Networks: Definitions and Basic Results
This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical
More informationIllustration of the K2 Algorithm for Learning Bayes Net Structures
Illustration of the K2 Algorithm for Learning Bayes Net Structures Prof. Carolina Ruiz Department of Computer Science, WPI ruiz@cs.wpi.edu http://www.cs.wpi.edu/ ruiz The purpose of this handout is to
More information10-701/15-781, Machine Learning: Homework 4
10-701/15-781, Machine Learning: Homewor 4 Aarti Singh Carnegie Mellon University ˆ The assignment is due at 10:30 am beginning of class on Mon, Nov 15, 2010. ˆ Separate you answers into five parts, one
More informationLearning Gaussian Process Models from Uncertain Data
Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada
More information