EDML: A Method for Learning Parameters in Bayesian Networks


Arthur Choi, Khaled S. Refaat and Adnan Darwiche
Computer Science Department
University of California, Los Angeles
{aychoi, krefaat, darwiche}@cs.ucla.edu

Abstract

We propose a method called EDML for learning MAP parameters in binary Bayesian networks under incomplete data. The method assumes Beta priors and can be used to learn maximum likelihood parameters when the priors are uninformative. EDML exhibits interesting behaviors, especially when compared to EM. We introduce EDML, explain its origin, and study some of its properties both analytically and empirically.

1 INTRODUCTION

We consider in this paper the problem of learning Bayesian network parameters given incomplete data, while assuming that all network variables are binary. We propose a specific method, EDML,¹ which has a similar structure and complexity to the EM algorithm (Dempster, Laird, & Rubin, 1977; Lauritzen, 1995). EDML assumes Beta priors on network parameters, allowing one to compute MAP parameters. When using uninformative priors, EDML reduces to computing maximum likelihood (ML) parameters. EDML originated from applying an approximate inference algorithm (Choi & Darwiche, 2006) to a meta network in which parameters are explicated as variables, and on which data is asserted as evidence. The update equations of EDML resemble the ones for EM, yet EDML appears to have different convergence properties, which stem from its being an inference method as opposed to a local search method. For example, we will identify a class of incomplete datasets on which EDML is guaranteed to converge immediately to an optimal solution, by simply reasoning about the behavior of its underlying inference method.

¹ EDML stands for Edge-Deletion MAP-Learning or Edge-Deletion Maximum-Likelihood, as it is based on an edge-deletion approximate inference algorithm that can compute MAP or maximum likelihood parameters.

Even though EDML originates in a rather involved approximate inference scheme, its update equations can be intuitively justified independently. We therefore present EDML initially in Section 3 before delving into the details of how it was originally derived in Section 5. Intuitively, EDML can be thought of as relying on two key concepts. The first concept is that of estimating the parameters of a single random variable given soft observations, i.e., observations that provide soft evidence on the values of a random variable. The second key concept behind EDML is that of interpreting the examples of an incomplete dataset as providing soft observations on the random variables of a Bayesian network. As to the first concept, we also show that MAP and ML parameter estimates are unique in this case, therefore generalizing the fundamental result which says that these estimates are unique for hard observations. This result is interesting and fundamental enough that we treat it separately in Section 4, before we move on and discuss the origin of EDML in Section 5. We discuss some theoretical properties of EDML in Section 6, where we identify situations in which it is guaranteed to converge immediately to optimal estimates. We present some preliminary empirical results in Section 7 that corroborate some of the predicted convergence behaviors. In Section 8, we close with some concluding remarks on related and future work. We note that while we focus on binary variables here, our approach generalizes to multivalued variables as well. We will comment later on this and the reason we restricted our focus here.

2 TECHNICAL PRELIMINARIES

We use upper case letters (X) to denote variables and lower case letters (x) to denote their values. Variable

sets are denoted by bold-face upper case letters (X) and their instantiations by bold-face lower case letters (x). Since our focus is on binary variables, we use x (positive) and x̄ (negative) to denote the two values of binary variable X. Generally, we will use X to denote a variable in a Bayesian network and U to denote its parents. A network parameter will therefore have the general form θ_{x|u}, representing the probability Pr(X=x | U=u). Note that variable X can be thought of as inducing a number of conditional random variables, denoted X|u, where the values of variable X|u are drawn based on the conditional distribution Pr(X | u). In fact, parameter estimation in Bayesian networks can be thought of as a process of estimating the distributions of these conditional random variables. Since we assume binary variables, each of these distributions can be characterized by the single parameter θ_{x|u}, since θ_{x̄|u} = 1 − θ_{x|u}. We will use θ to denote the set of all network parameters.

Given a network structure G in which all variables are binary, our goal is to learn its parameters from an incomplete dataset, such as:

  example   X    Y    Z
  1         x    ȳ    ?
  2         ?    ȳ    ?
  3         x̄    ?    z

We use D to denote a dataset, and d_i to denote an example. The dataset above has three examples, with d_3 being the instantiation X=x̄, Z=z. A commonly used measure for the quality of parameter estimates θ is their likelihood, defined as:

  L(\theta \mid D) = \prod_{i=1}^{N} \mathrm{Pr}_\theta(d_i),

where Pr_θ is the distribution induced by network structure G and parameters θ. In the case of complete data (each example fixes the value of each variable), the ML parameters are unique. Learning ML parameters is harder when the data is incomplete, where EM is typically employed. EM starts with some initial parameters θ⁰, called a seed, and successively improves on them via iteration. EM uses the update equation:

  \theta^{k+1}_{x|u} = \frac{\sum_{i=1}^{N} \mathrm{Pr}_{\theta^k}(xu \mid d_i)}{\sum_{i=1}^{N} \mathrm{Pr}_{\theta^k}(u \mid d_i)},

which requires inference on a Bayesian network parameterized by θ^k, in order to compute Pr_{θ^k}(xu | d_i) and Pr_{θ^k}(u | d_i). In fact, one run of the jointree algorithm on each distinct example is sufficient to implement an iteration of EM, which is guaranteed to never decrease the likelihood of its estimates across iterations. EM can also converge to any local maximum, given that it starts with an appropriate seed. It is common to run EM with multiple seeds, keeping the best local maximum it finds. See (Darwiche, 2009; Koller & Friedman, 2009) for recent treatments of parameter learning in Bayesian networks via EM and related methods.

EM can also be used to find MAP parameters, assuming one has some priors on network parameters. The Beta distribution is commonly used as a prior on the probability of a binary random variable. In particular, the Beta for random variable X|u is specified by two exponents, α_{X|u} and β_{X|u}, leading to a density proportional to [θ_{x|u}]^{α_{X|u}−1} [1 − θ_{x|u}]^{β_{X|u}−1}. It is common to assume that the exponents are > 1 (the density is then unimodal). For MAP parameters, EM uses the update equation (see, e.g., (Darwiche, 2009)):

  \theta^{k+1}_{x|u} = \frac{\alpha_{X|u} - 1 + \sum_{i=1}^{N} \mathrm{Pr}_{\theta^k}(xu \mid d_i)}{\alpha_{X|u} + \beta_{X|u} - 2 + \sum_{i=1}^{N} \mathrm{Pr}_{\theta^k}(u \mid d_i)}.

When α_{X|u} = β_{X|u} = 1 (uninformative prior), the equation reduces to the one for computing ML parameters. When computing ML parameters, using α_{X|u} = β_{X|u} = 2 leads to what is usually known as Laplace smoothing. This is a common technique to deal with the problem of insufficient counts (i.e., instantiations that never appear in the dataset, leading to zero probabilities and division by zero). We will indeed use Laplace smoothing in our experiments.
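To make the update above concrete, the following Python sketch (our own illustration, not code from the paper) applies the MAP update for a single parameter θ_{x|u}, given the per-example marginals Pr_{θ^k}(xu | d_i) and Pr_{θ^k}(u | d_i) that inference (e.g., a jointree run) would supply; the function name and its inputs are hypothetical.

def em_map_update(pr_xu_given_d, pr_u_given_d, alpha=2.0, beta=2.0):
    """One EM update for a single parameter theta_{x|u}.

    pr_xu_given_d[i] is Pr_{theta^k}(xu | d_i) and pr_u_given_d[i] is
    Pr_{theta^k}(u | d_i), computed by inference under the current
    estimates theta^k.  With alpha = beta = 1 this is the ML update;
    alpha = beta = 2 corresponds to Laplace smoothing."""
    numerator = (alpha - 1.0) + sum(pr_xu_given_d)
    denominator = (alpha + beta - 2.0) + sum(pr_u_given_d)
    return numerator / denominator

# Three examples, with expected counts computed under some current estimate.
print(em_map_update([0.9, 0.2, 0.5], [1.0, 0.6, 0.5]))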
Our method for learning MAP and ML parameters makes heavy use of two notions: (1) the odds of an event, which is the probability of the event over the probability of its negation, and (2) the Bayes factor (Good, 1950), which is the relative change in the odds of one event, say, X=x, due to observing some other event, say, η. In this case, we have the odds O(x) and O(x | η), where the Bayes factor is κ = O(x | η)/O(x), which is viewed as quantifying the strength of soft evidence η on X=x. It is known that κ = Pr(η | x)/Pr(η | x̄) and κ ∈ [0, ∞]. When κ = 0, the soft evidence reduces to hard evidence asserting X=x̄. When κ = ∞, the soft evidence reduces to hard evidence asserting X=x. When κ = 1, the soft evidence is neutral and bears no information on X=x. A detailed discussion on the use of Bayes factors for soft evidence is given in (Chan & Darwiche, 2005).

3 AN OVERVIEW OF EDML

Consider Algorithm 1, which provides pseudocode for EM. EM typically starts with some initial parameter estimates, called a seed, and then iterates to monotonically improve on these estimates. Each iteration consists of two steps. The first step, Line 3, computes marginals over the families of a Bayesian network that is parameterized by the current estimates. The second step, Line 4, uses the computed probabilities to update the network parameters.

Algorithm 1 (EM)
input:
  G: A Bayesian network structure
  D: An incomplete dataset d_1, ..., d_N
  θ: An initial parameterization of structure G
  α_{X|u}, β_{X|u}: Beta prior for each random variable X|u
1: while not converged do
2:   Pr ← the distribution induced by θ and G
3:   Compute probabilities Pr(xu | d_i) and Pr(u | d_i), for each family instantiation xu and example d_i
4:   Update parameters:
       θ_{x|u} ← (α_{X|u} − 1 + Σ_{i=1}^N Pr(xu | d_i)) / (α_{X|u} + β_{X|u} − 2 + Σ_{i=1}^N Pr(u | d_i))
5: return parameterization θ

Algorithm 2 (EDML)
input: same as Algorithm 1
1: while not converged do
2:   Pr ← the distribution induced by θ and G
3:   Compute Bayes factors, for each family instantiation xu and example d_i:
       κ^i_{x|u} ← (Pr(xu | d_i)/Pr(x | u) − Pr(u | d_i) + 1) / (Pr(x̄u | d_i)/Pr(x̄ | u) − Pr(u | d_i) + 1)   (1)
4:   Update parameters:
       θ_{x|u} ← argmax_p [p]^{α_{X|u}−1} [1 − p]^{β_{X|u}−1} ∏_{i=1}^N [κ^i_{x|u} p − p + 1]   (2)
5: return parameterization θ

The process continues until some convergence criterion is met. The main point here is that the computation on Line 3 can be implemented by a single run of the jointree algorithm, while the update on Line 4 is immediate.

Consider now Algorithm 2, which provides pseudocode for EDML, to be contrasted with the one for EM. The two algorithms clearly have the same overall structure. That is, EDML also starts with some initial parameter estimates, called a seed, and then iterates to update these estimates. Each iteration consists of two steps. The first step, Line 3, computes Bayes factors using a Bayesian network that is parameterized by the current estimates. The second step, Line 4, uses the computed Bayes factors to update the network parameters. The process continues until some convergence criterion is met. Much like EM, the computation on Line 3 can be implemented by a single run of the jointree algorithm. Unlike EM, however, the update on Line 4 is not immediate as it involves solving an optimization problem, albeit a simple one. Aside from this optimization task, EM and EDML have the same computational complexity. We next explain the two concepts underlying EDML and how they lead to the equations of Algorithm 2.

3.1 ESTIMATION FROM SOFT OBSERVATIONS

Consider a random variable X with values x and x̄, and suppose that we have N > 0 independent observations of X, with N_x as the number of positive observations. It is well known that the ML parameter estimates for random variable X are unique in this case and characterized by θ_x = N_x/N. If one further assumes a Beta prior with exponents α and β that are ≥ 1, it is also known that the MAP parameter estimates are unique and characterized by θ_x = (N_x + α − 1)/(N + α + β − 2). Consider now a more general problem in which the observations are soft, in that they only provide soft evidence on the values of random variable X. That is, each soft observation η_i is associated with a Bayes factor κ^i_x = O(x | η_i)/O(x), which quantifies the evidence that η_i provides on having observed the value x of variable X. We will show later that the ML estimates remain unique in this more general case, if at least one of the soft observations is not trivial (i.e., has Bayes factor κ^i_x ≠ 1). Moreover, we will show that the MAP estimates are also unique assuming a Beta prior with exponents ≥ 1. In particular, we will show that the unique MAP estimates are characterized by Equation 2 of Algorithm 2. Further, we will show that the unique ML estimates are characterized by the same equation while using a Beta prior with exponents equal to 1.
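To make the Line 4 update of Algorithm 2 concrete, here is a Python sketch (our own, written under the assumption, established in Section 4, that the objective of Equation 2 is strictly log-concave) that maximizes Equation 2 over p in (0, 1) by bisection on the derivative of its logarithm; the function name is hypothetical, and an infinite Bayes factor is handled by the convention, discussed in Section 4, that its factor contributes p.

import math

def edml_update(kappas, alpha=2.0, beta=2.0, tol=1e-10):
    """Line 4 of Algorithm 2: maximize, over p in (0, 1),

        p^(alpha-1) * (1-p)^(beta-1) * prod_i [kappa_i * p - p + 1].

    The objective is strictly log-concave when some kappa_i != 1 or the
    prior is informative, so the derivative of its log crosses zero at
    most once and bisection suffices.  kappa_i = inf contributes a
    factor p, by convention."""

    def slope(p):  # derivative of the log objective at p
        g = (alpha - 1.0) / p - (beta - 1.0) / (1.0 - p)
        for k in kappas:
            g += 1.0 / p if math.isinf(k) else (k - 1.0) / ((k - 1.0) * p + 1.0)
        return g

    lo, hi = tol, 1.0 - tol
    if slope(lo) <= 0.0:   # mode at (or squeezed against) 0
        return 0.0
    if slope(hi) >= 0.0:   # mode at (or squeezed against) 1
        return 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if slope(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

# Neutral evidence leaves the estimate at the prior mode; evidence for x
# (large or infinite Bayes factors) pushes the estimate toward 1.
print(edml_update([1.0, 1.0]))            # approximately 0.5
print(edml_update([4.0, math.inf, 1.0]))  # greater than 0.5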
Estimation from soft observations is the first key concept that underlies our proposed EDML algorithm for estimating ML and MAP parameters in a binary Bayesian network.

3.2 EXAMPLES AS SOFT OBSERVATIONS

The second key concept underlying EDML is to interpret each example d_i in a dataset as providing a soft observation on each random variable X|u. As mentioned earlier, soft observations are specified by Bayes factors and, hence, one needs to specify the Bayes factor κ^i_{x|u} that example d_i induces on random variable

X|u. EDML uses Equation 1 for this purpose, which will be derived in Section 5. We next consider a few special cases of this equation to highlight its behavior.

Consider first the case in which example d_i implies parent instantiation u (i.e., the parents U of variable X are instantiated to u in example d_i). In this case, Equation 1 reduces to κ^i_{x|u} = O(x | u, d_i)/O(x | u), which is the relative change in the odds of x given u due to conditioning on example d_i. Note that for root variables X, which have no parents U, Equation 1 further reduces to κ^i_x = O(x | d_i)/O(x).

The second case we consider is when example d_i is inconsistent with parent instantiation u. In this case, Equation 1 reduces to κ^i_{x|u} = 1, which amounts to neutral evidence. Hence, example d_i is irrelevant to estimating the distribution of variable X|u in this case, and will be ignored by EDML.

The last special case of Equation 1 we shall consider is when the example d_i is complete; that is, it fixes the value of each variable. In this case, one can verify that κ^i_{x|u} ∈ {0, 1, ∞} and, hence, the example can be viewed as providing either neutral or hard evidence on each random variable X|u. Thus, an example will provide soft observations on variables only when it is incomplete (i.e., missing some values). Otherwise, it is either irrelevant to, or provides a hard observation on, each variable X|u.

In the next section, we prove Equation 2 of Algorithm 2. In Section 5, we discuss the origin of EDML, where we go on and derive Equation 1 of Algorithm 2.

4 ESTIMATION FROM SOFT OBSERVATIONS

Figure 1: Estimation given independent observations (parameter θ_x is the root of a network with children X_1, X_2, ..., X_N).

Consider a binary variable X. Figure 1 depicts a network where θ_x is a parameter representing Pr(X=x) and X_1, ..., X_N are independent observations of X. Suppose further that we have a Beta prior on parameter θ_x with exponents α ≥ 1 and β ≥ 1. A standard estimation problem is to assume that we know the values of these observations and then estimate the parameter θ_x. We now consider a variant of this problem, in which we only have soft evidence η_i about each observation, whose strength is quantified by a Bayes factor κ^i_x = O(x | η_i)/O(x). Here, κ^i_x represents the change in odds that the i-th observation is positive due to evidence η_i. We will refer to η_i as a soft observation on variable X, and our goal in this section is to compute (and optimize) the posterior density on parameter θ_x given these soft observations η_1, ..., η_N.

We first consider the likelihood:

  \mathrm{Pr}(\eta_1, \ldots, \eta_N \mid \theta_x)
    = \prod_{i=1}^{N} \mathrm{Pr}(\eta_i \mid \theta_x)
    = \prod_{i=1}^{N} [\mathrm{Pr}(\eta_i \mid x, \theta_x)\mathrm{Pr}(x \mid \theta_x) + \mathrm{Pr}(\eta_i \mid \bar{x}, \theta_x)\mathrm{Pr}(\bar{x} \mid \theta_x)]
    = \prod_{i=1}^{N} [\mathrm{Pr}(\eta_i \mid x)\,\theta_x + \mathrm{Pr}(\eta_i \mid \bar{x})(1 - \theta_x)]
    \propto \prod_{i=1}^{N} [\kappa^i_x\,\theta_x - \theta_x + 1].

The last step follows because κ^i_x = O(x | η_i)/O(x) = Pr(η_i | x)/Pr(η_i | x̄). The posterior density is then:

  \rho(\theta_x \mid \eta_1, \ldots, \eta_N) \propto \rho(\theta_x)\,\mathrm{Pr}(\eta_1, \ldots, \eta_N \mid \theta_x)
    \propto [\theta_x]^{\alpha-1} [1 - \theta_x]^{\beta-1} \prod_{i=1}^{N} [\kappa^i_x\,\theta_x - \theta_x + 1].

This is exactly Equation 2 of Algorithm 2, assuming we replace the random variable X with the conditional random variable X|u.² The second derivative of the log posterior is

  -\frac{\alpha - 1}{[\theta_x]^2} - \frac{\beta - 1}{[1 - \theta_x]^2} - \sum_i \left[ \frac{\kappa^i_x - 1}{(\kappa^i_x - 1)\theta_x + 1} \right]^2,

which is strictly negative when κ^i_x ≠ 1 for at least one i. This remains true when α = β = 1. Hence, both the likelihood function and the posterior density are strictly log-concave and therefore have unique modes. This means that both ML and MAP parameter estimates are unique in the case of soft, independent observations, which generalizes the uniqueness result for hard, independent observations on a variable X.
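As a quick numerical sanity check of this concavity claim (our own illustration, not from the paper), the following Python sketch evaluates the second derivative above on a grid of values of θ_x, for an arbitrary set of Bayes factors containing at least one non-trivial κ^i_x, and confirms it stays negative with and without an informative prior:

def log_posterior_second_derivative(p, kappas, alpha, beta):
    """Second derivative of the log of Equation 2's objective at p."""
    d2 = -(alpha - 1.0) / p**2 - (beta - 1.0) / (1.0 - p)**2
    for k in kappas:
        d2 -= ((k - 1.0) / ((k - 1.0) * p + 1.0)) ** 2
    return d2

kappas = [0.2, 1.0, 3.5]   # hypothetical soft observations, two non-trivial
grid = [i / 100.0 for i in range(1, 100)]
assert all(log_posterior_second_derivative(p, kappas, 2.0, 3.0) < 0.0 for p in grid)
assert all(log_posterior_second_derivative(p, kappas, 1.0, 1.0) < 0.0 for p in grid)
print("log posterior is strictly concave on the grid")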
5 THE ORIGIN OF EDML

This section reveals the technical origin of EDML, showing how Equation 1 of Algorithm 2 is derived, and providing the basis for the overall structure of EDML as spelled out in Algorithm 2. EDML originated from an approximation algorithm for computing MAP parameters in a meta network.

² The case of κ^i_x = ∞ needs to be handled carefully in Equation 2. First note that κ^i_x = ∞ iff Pr(η_i | x̄) = 0 in the derivation of this equation. In this case, the term Pr(η_i | x)θ_x + Pr(η_i | x̄)(1 − θ_x) equals c·θ_x for some constant c ∈ (0, 1]. Since the value of Equation 2 does not depend on constant c, we will assume c = 1. Hence, when κ^i_x = ∞, the term [κ^i_x θ_x − θ_x + 1] evaluates to θ_x by convention.

Figure 2 depicts an example meta network in which

parameters are represented explicitly as nodes (Darwiche, 2009). In particular, for each conditional random variable X|u in the original Bayesian network, called the base network, we have a node θ_{x|u} in the meta network which represents a parameter that characterizes the distribution of this random variable. Moreover, the meta network includes enough instances of the base network to allow the assertion of each example d_i as evidence on one of these instances. Assuming that θ is an instantiation of all parameter variables, and D is a dataset, MAP estimates are then:

  \theta^* = \arg\max_\theta \rho(\theta \mid D),

where ρ is the density induced by the meta network. Computing MAP estimates exactly is usually prohibitive due to the structure of the meta network. We therefore use the technique of edge deletion (Choi & Darwiche, 2006), which formulates approximate inference as exact inference on a simplified network that is obtained by deleting edges from the original network. The technique compensates for these deletions by introducing auxiliary parameters whose values must be chosen carefully (and usually iteratively) in order to improve the quality of approximations obtained from the simplified network. EDML is the result of making a few specific choices for deleting edges and for choosing values for the auxiliary parameters introduced, which we explain next.

Figure 2: A meta network induced from a base network S ← H → E. The CPTs here are based on standard semantics; see, e.g., (Darwiche, 2009, Ch. 18).

Figure 3: Introducing generators into a meta network (a) and then deleting copy edges from the resulting meta network (b), which leads to introducing clones.

5.1 INTRODUCING GENERATORS

Let X^i denote the instance of variable X in the base network corresponding to example d_i. The first choice of EDML is that for each edge θ_{x|u} → X^i in the meta network, we introduce a generator variable X^i_u, leading to the pair of edges θ_{x|u} → X^i_u → X^i. Figure 3(a) depicts a fragment of the meta network in Figure 2, in which we introduced two generator variables for the edges into E^3, leading to θ_{e|h} → E^3_h → E^3 and θ_{e|h̄} → E^3_h̄ → E^3. Variable X^i_u is meant to generate values of variable X^i according to the distribution specified by parameter θ_{x|u}. Hence, the conditional distribution of a generator X^i_u is such that Pr(x^i_u | θ_{x|u}) = θ_{x|u}. Moreover, the CPT of variable X^i is set to ensure that variable X^i copies the value of generator X^i_u if and only if the parents of X^i take on the value u. That is, the CPT of variable X^i acts as a selector that chooses a particular generator X^i_u to copy from, depending on the values of its parents U. For example, in Figure 3(a), when parent H^3 takes on its positive value, variable E^3 copies the value of generator E^3_h. When parent H^3 takes on its negative value, variable E^3 copies the value of generator E^3_h̄. Adding generator variables does not change the meta network as it continues to have the same density over the original variables. Yet, generators are essential to the derivation of EDML as they will be used for interpreting data examples as soft observations.

5.2 DELETING COPY EDGES

The second choice made by EDML is that we only delete edges of the form X^i_u → X^i from the augmented meta network, which we shall call copy edges. Figure 3(b) depicts an example in which we have deleted

the two copy edges from Figure 3(a). Note here the addition of another auxiliary variable X̂^i_u, called a clone, for each generator X^i_u. The addition of clones is mandated by the edge deletion framework. Moreover, if the CPT of clone X̂^i_u is chosen carefully, it can compensate for the parent-to-child information lost when deleting edge X^i_u → X^i. We will later see how EDML sets these CPTs. The other aspect of compensating for a deleted edge is to specify soft evidence on each generator X^i_u. This is also mandated by the edge deletion framework, and is meant to compensate for the child-to-parent information lost when deleting edge X^i_u → X^i. We will later see how EDML sets this soft evidence as well, which effectively completes the specification of the algorithm. We prelude this specification, however, by making some further observations about the structure of the meta network after edge deletion.

Figure 4: An edge-deleted network obtained from the meta network in Figure 2, found by: (1) adding generator variables, (2) deleting copy edges, and (3) adding cloned generators. The figure highlights the island for example d_2, and the island for parameter θ_{s|h}.

5.3 PARAMETER & EXAMPLE ISLANDS

Consider the network in Figure 4, which is obtained from the meta network in Figure 2 according to the edge-deletion process indicated earlier. The edge-deleted network contains a set of disconnected structures, called islands. Each island belongs to one of two classes: a parameter island for each network parameter θ_{x|u}, and an example island for each example d_i in the dataset. Figure 4 provides the full details for one example island and one parameter island. Note that each parameter island corresponds to a naive Bayes structure, with parameter θ_{x|u} as the root and generators X^i_u as children. When soft evidence is asserted on these generators, we get the estimation problem we treated in Section 4. EDML can now be fully described by specifying (1) the soft evidence on each generator X^i_u in a parameter island, and (2) the CPT of each clone X̂^i_u in an example island. These specifications are given next.

5.4 CHILD-TO-PARENT COMPENSATION

The edge deletion approach suggests the following soft evidence on generators X^i_u, specified as Bayes factors:

  \kappa^i_{x|u} = \frac{O(\hat{x}^i_u \mid d_i)}{O(\hat{x}^i_u)} = \frac{\mathrm{Pr}^i(d_i \mid \hat{x}^i_u)}{\mathrm{Pr}^i(d_i \mid \hat{\bar{x}}^i_u)},   (3)

where Pr^i is the distribution induced by the island of example d_i. We will now show that this equation simplifies to Equation 1 of Algorithm 2. Suppose that we marginalize all clones X̂^i_u from the island of example d_i, leading to a network that induces a distribution Pr. The new network has the following properties. First, it has the same structure as the base network. Second, Pr(x | u) = Pr^i(x̂^i_u), which means that the CPTs of clones in example islands correspond to parameters in the base network. Finally, if we use ū to denote the disjunction of all parent instantiations excluding u, we get:

  \kappa^i_{x|u} = \frac{\mathrm{Pr}^i(d_i \mid \hat{x}^i_u)}{\mathrm{Pr}^i(d_i \mid \hat{\bar{x}}^i_u)}
    = \frac{\mathrm{Pr}(d_i \mid xu)\mathrm{Pr}(u) + \mathrm{Pr}(d_i \mid \bar{u})\mathrm{Pr}(\bar{u})}{\mathrm{Pr}(d_i \mid \bar{x}u)\mathrm{Pr}(u) + \mathrm{Pr}(d_i \mid \bar{u})\mathrm{Pr}(\bar{u})}
    = \frac{\mathrm{Pr}(xu \mid d_i)/\mathrm{Pr}(x \mid u) - \mathrm{Pr}(u \mid d_i) + 1}{\mathrm{Pr}(\bar{x}u \mid d_i)/\mathrm{Pr}(\bar{x} \mid u) - \mathrm{Pr}(u \mid d_i) + 1}.

This is exactly Equation 1 of Algorithm 2. Hence, we can evaluate Equation 3 by evaluating Equation 1 on the base network, as long as we seed the base network with parameters that correspond to the CPTs of clones in an example island.
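For concreteness, the Python sketch below (our own; the function name and inputs are hypothetical) evaluates Equation 1 from the quantities that a jointree run on the base network would supply, namely Pr(xu | d_i), Pr(x̄u | d_i), Pr(u | d_i) and the current parameter Pr(x | u):

def edml_bayes_factor(pr_xu_d, pr_xbar_u_d, pr_u_d, theta_x_u):
    """Equation 1 of Algorithm 2: the Bayes factor kappa^i_{x|u} that
    example d_i induces on the conditional random variable X|u.

    pr_xu_d     = Pr(xu | d_i)
    pr_xbar_u_d = Pr(x̄u | d_i)
    pr_u_d      = Pr(u | d_i) = pr_xu_d + pr_xbar_u_d
    theta_x_u   = current parameter Pr(x | u), so Pr(x̄ | u) = 1 - theta_x_u"""
    numerator = pr_xu_d / theta_x_u - pr_u_d + 1.0
    denominator = pr_xbar_u_d / (1.0 - theta_x_u) - pr_u_d + 1.0
    return float('inf') if denominator == 0.0 else numerator / denominator

# An example inconsistent with u yields neutral evidence (kappa = 1), while a
# complete example containing xu yields hard evidence (kappa = infinity).
print(edml_bayes_factor(0.0, 0.0, 0.0, 0.3))   # 1.0
print(edml_bayes_factor(1.0, 0.0, 1.0, 0.3))   # inf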
5.5 PARENT-TO-CHILD COMPENSATION

We now complete the derivation of EDML by showing how it specifies the CPTs of clones in example islands, which are needed for computing soft evidence as in the previous section. In a nutshell, EDML assumes an initial value of these CPTs, typically chosen randomly. Given these CPTs, example islands will be fully specified and EDML will compute soft evidence as given by Equation 3. The

computed soft evidence is then injected on the generators of parameter islands, leading to a full specification of these islands. EDML will then estimate parameters by solving an exact optimization problem on each parameter island, as shown in Section 4. The estimated parameters are then used as the new values of the CPTs for clones in example islands. This process repeats until convergence.

We have shown in the previous section that the CPTs of clones are in one-to-one correspondence with the parameters of the base network. We have also shown that soft evidence, as given by Equation 3, can be computed by evaluating Equation 1 of Algorithm 2 (with parameters θ corresponding to the CPTs of clones in an example island). EDML takes advantage of this correspondence, leading to the simplified statement spelled out in Algorithm 2.

6 SOME PROPERTIES OF EDML

Being an approximate inference method, one can sometimes identify good behaviors of EDML by identifying situations under which the underlying inference algorithm will produce high quality approximations. We provide a result in this section that illustrates this point in the extreme, where EDML is guaranteed to return optimal estimates, and in only one iteration.

Our result relies on the following observation about parameter estimation via inference on a meta network. When the parents U of a variable X are observed to u′ in an example d_i, all edges θ_{x|u} → X^i in the meta network become superfluous and can be pruned, except for the one edge that satisfies u = u′. Moreover, edges outgoing from observed nodes can also be pruned from a meta network. Suppose now that the parents of each variable are observed in a dataset. After pruning edges as indicated earlier, each parameter variable θ_{x|u} will end up being the root of an isolated naive Bayes structure that has some variables X^i as its children (those whose parents are instantiated to u in example d_i). Figure 5 depicts the result of such pruning in the meta network of Figure 2, given a dataset in which the values of H^1, H^2 and H^3 are all observed.

Figure 5: A pruning of the meta network in Figure 2 given observed values for H^1, H^2 and H^3.

The above observation implies that when the parents of each variable are observed in a dataset, parameters can be estimated independently. This leads to the following well known result.

Proposition 1 When the dataset is complete, the ML estimate for parameter θ_{x|u} is unique and given by D#(xu)/D#(u), where D#(xu) is the number of examples containing xu and D#(u) is the number of examples containing u.

It is well known that EM returns such estimates, and in only one iteration (i.e., independently of its seed). The following more general result is also implied by our earlier observation.

Proposition 2 When only leaf variables have missing values in a dataset, the ML estimate for each parameter θ_{x|u} is unique and given by D#(xu)/D⁺#(u). Here, D⁺#(u) is the number of examples containing u and in which X is observed.

We can now prove the following property of EDML, which is not satisfied by EM, as we show next.

Theorem 1 When only leaf variables have missing values in a dataset, EDML returns the unique ML estimates given by Proposition 2, and in only one iteration.

Proof Consider an example d_i that fixes the values of the parents U of variable X, and consider Equation 1. First, κ^i_{x|u} = 1 iff example d_i is inconsistent with u or does not set the value of X. Next, κ^i_{x|u} = 0 iff example d_i contains x̄u. Finally, κ^i_{x|u} = ∞ iff example d_i contains xu. Moreover, these values are independent of the seed, so the algorithm converges in one iteration. Given these values of the Bayes factors, Equation 2 leads to the estimate of Proposition 2.
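To illustrate Proposition 2 and Theorem 1 on a small worked example (our own, not from the paper), the sketch below computes the closed-form estimate D#(xu)/D⁺#(u) for a leaf variable X with a single parent U, from a toy dataset in which only X is ever missing; this is the estimate EDML would return after a single iteration:

def leaf_ml_estimate(data, x='x', u='u'):
    """Proposition 2: ML estimate of theta_{x|u} when only the leaf X may
    be missing.  Each example is a dict with keys 'U' and 'X', where 'X'
    may be None (missing).  Returns D#(xu) / D+#(u)."""
    d_xu = sum(1 for d in data if d['U'] == u and d['X'] == x)
    d_plus_u = sum(1 for d in data if d['U'] == u and d['X'] is not None)
    return d_xu / d_plus_u

data = [
    {'U': 'u', 'X': 'x'},
    {'U': 'u', 'X': 'xbar'},
    {'U': 'u', 'X': None},     # X missing: neutral evidence, ignored here
    {'U': 'u', 'X': 'x'},
    {'U': 'ubar', 'X': 'x'},   # inconsistent with u: ignored
]
print(leaf_ml_estimate(data))  # 2/3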
We have a number of observations about this result. First, since Proposition 1 is implied by Proposition 2, EDML returns the unique ML estimates in only one iteration when the dataset is complete (just like EM). Next, when only the values of leaf variables are missing in a dataset, Proposition 2 says that there is a unique ML estimate for each network parameter. Moreover,

Theorem 1 says that EDML returns these unique estimates, and in only one iteration. Finally, Theorem 1 does not hold for EM. In particular, one can show that under the conditions of this theorem, an EM iteration will update its current parameter estimates θ and return the following estimates for θ_{x|u}:

  ( D#(xu) + D̃#(u) · Pr_θ(x | u) ) / D#(u).

Here, D̃#(u) is the number of examples that contain u and in which the value of X is missing. This EM estimate clearly depends on the current parameter estimates. As a result, the behavior of EM will depend on its initial seed, unlike EDML. When only the values of leaf variables are missing, there is a unique optimal solution as shown by Proposition 2. Since EM is known to converge to a local optimum, it will eventually return the optimal estimates as well, but possibly after some number of iterations. In this case, the difference between EM and EDML is simply in the speed of convergence. Theorem 1 clearly suggests better convergence behavior of EDML over EM in some situations. We next present initial experiments supporting this suggestion.

7 MORE ON CONVERGENCE

We highlight now a few empirical properties of EDML. In particular, we show how EDML can sometimes find higher quality estimates than EM, in fewer iterations and also in less time. We highlight different types of relative convergence behavior in Figure 6, which depicts example runs on a selection of networks: spect, win95pts, emdec6g, and tcc4e. Network spect is a naive Bayes network induced from a dataset in the UCI ML repository, with 1 class variable and 22 attributes. Network win95pts (76 variables) is an expert system for printer troubleshooting in Windows 95. Networks emdec6g (168 variables) and tcc4e (98 variables) are noisy-or networks for diagnosis (courtesy of HRL Laboratories). We simulated datasets of size 2^k, using the original CPT parameters of the respective networks, and then used EM and EDML to learn new parameters for a network with the same structure. We assumed that certain variables were hidden (latent); in Figure 6, we randomly chose 1/4 of the variables to be hidden. Hidden nodes are of particular interest to EM, because it has been observed that local extrema and convergence rates can be problematic for EM here; see, for example, (Elidan & Friedman, 2005; Salakhutdinov, Roweis, & Ghahramani, 2003).

Figure 6: Quality of parameter estimates over iterations (left column) and time (right column), for networks spect, win95pts, tcc4e, and emdec6g. Going right on the x-axis, we have increasing iterations and time. Going up on the y-axis, we have increasing quality of parameter estimates. EDML is depicted with a solid red line, and EM with a dashed black line.

In Figure 6, each plot represents a simulated data set of size 2^10, where EM and EDML have been initialized with the same random parameter seeds. Both algorithms were run for a fixed number of iterations, 1024 in this case, and we observed the quality of the parameter estimates found, with respect to the log posterior probability (which has been normalized by the maximum log probability observed). We assumed a Beta prior with exponents 2. EDML damped its parameter updates by a factor of 1/2, which is typical for (loopy) belief propagation algorithms.³

³ The simple bisection method suffices for the optimization sub-problem in EDML for binary Bayesian networks. In our current implementation, we used the conjugate gradient method, with a convergence threshold of 10^-8.
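The damping mentioned above can be sketched as follows (a minimal illustration of our own; the paper states only that EDML's updates were damped by a factor of 1/2, so the exact form used there may differ):

def damp(theta_old, theta_new, factor=0.5):
    """Damped update: keep a (1 - factor) share of the old estimate and
    move only partway toward the newly computed one."""
    return {key: (1.0 - factor) * theta_old[key] + factor * theta_new[key]
            for key in theta_old}

print(damp({'theta_x|u': 0.30}, {'theta_x|u': 0.90}))  # midway: about 0.6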

In the left column of Figure 6, we evaluated the quality of estimates over iterations of EDML and EM. In these examples, EDML (represented by a solid red line) tended to have better quality estimates from iteration to iteration (curves that are higher are better), and further managed to find them in fewer iterations (curves to the left are faster).⁴ This is most dramatic in network spect, where EDML appears to have converged almost immediately, whereas EM spent a significant number of iterations to reach estimates of comparable quality. As most nodes hidden in network spect were leaf nodes, this may be expected due to the considerations from the previous section.

In the right column of Figure 6, we evaluated the quality of estimates, now in terms of time. We remark again that procedurally, EDML and EM are very similar, and each algorithm needs only one evaluation of the jointree algorithm per distinct example in the data set (per iteration). EDML solves an optimization problem per distinct example, whereas EM has a closed-form update equation in the corresponding step (Line 4 in Algorithms 1 and 2). Although this optimization problem is a simple one, EDML does require more time per iteration than EM. The right column of Figure 6 suggests that EDML can still find better estimates faster, especially in the cases where EDML has converged in significantly fewer iterations. In network emdec6g, we find that although EDML appeared to converge in fewer iterations, EM was able to find better estimates in less time. We anticipate that in larger networks with higher treewidth, the time spent in the simple optimization sub-problem will be dominated by the time to perform jointree propagation.

We also performed experiments on networks learned from binary haplotype data (Elidan & Gould, 2008), which are networks with bounded treewidth. Here, we simulated data sets of size 2^10, where we again randomly selected 1/4 of the variables to be hidden. We further ran EDML and EM for a fixed number of iterations (512, here). For each of the 74 networks available, we ran EDML and EM with 3 random seeds, for a total of 222 cases. In Figure 7, we highlight a selection of the runs we performed, to illustrate examples of relative convergence behaviors. Again, in the first row, we see a case where EDML identifies better estimates in fewer iterations and less time. In the next two rows, we highlight two cases where EDML appears to converge to a superior fixed point than the one that EM appears to converge to. In the last row, we highlight an instance where EM instead converges to a superior estimate.

⁴ We omit the results of the first 10 iterations as initial parameter estimates are relatively poor, which makes the plots difficult to read.

Figure 7: Quality of parameter estimates over iterations (left column) and time (right column), for networks induced from binary haplotype data. Going right on the x-axis, we have increasing iterations and time. Going up the y-axis, we have increasing quality of parameter estimates. EDML is depicted with a solid red line, and EM with a dashed black line.

In Figure 8, we compare the estimates of EDML and EM at each iteration, computing the percentage of the 74 × 3 = 222 cases considered where EDML had estimates no worse than those found by EM. In this set of experiments, the estimates identified by EDML are clearly superior (or at least, no worse in most cases), when compared to EM. We remark, however, that when both algorithms are given enough iterations to converge, we have observed that the quality of the estimates found by both algorithms is often comparable. This is evident in Figure 6, for example. The analysis from the previous section indicates, however, that there are (very specialized) situations where EDML would be clearly preferred over EM. One subject of future study is the identification of situations and applications where

EDML would be preferred in practice as well.

Figure 8: Quality of estimates over 74 networks (3 cases each) induced from binary haplotype data. Going right on the x-axis, we have increasing iterations. Going up the y-axis, we have an increasing percentage of the 222 cases in which EDML's estimates were no worse than those given by EM.

8 FUTURE AND RELATED WORK

EM has played a critical role in learning probabilistic graphical models and Bayesian networks (Dempster et al., 1977; Lauritzen, 1995; Heckerman, 1998). However, learning (and Bayesian learning in particular) remains challenging in a variety of situations, particularly when there are hidden (latent) variables; see, e.g., (Elidan, Ninio, Friedman, & Schuurmans, 2002; Elidan & Friedman, 2005). Slow convergence of EM has also been recognized, particularly in the presence of hidden variables. A variety of techniques, some incorporating more traditional approaches to optimization, have been proposed in the literature; see, e.g., (Thiesson, Meek, & Heckerman, 2001). Variational approaches are an increasingly popular formalism for learning tasks as well, and for topic models in particular, where variational alternatives to EM are used to maximize a lower bound on the log likelihood (Blei, Ng, & Jordan, 2003). Expectation Propagation also provides variations of EM (Minka & Lafferty, 2002) and is closely related to (loopy) belief propagation (Minka, 2001).

Our empirical results have been restricted to a preliminary investigation of the convergence of EDML, in contrast to EM. A more comprehensive evaluation is called for, in relation to both EM and other approaches based on Bayesian inference. We have also focused this paper on binary variables. EDML, however, generalizes to multivalued variables, since edge deletion does not require a restriction to binary variables and the key result of Section 4 also generalizes to multivalued variables. The resulting formulation is less transparent though when compared to the binary case, since Bayes factors no longer apply directly and one must appeal to a more complex method for quantifying soft evidence; see (Chan & Darwiche, 2005). We expect our future work to focus on a more comprehensive empirical evaluation of EDML, in the context of an implementation that uses multivalued variables. Moreover, we seek to identify additional properties of EDML that go beyond convergence.

References

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. JMLR, 3.

Chan, H., & Darwiche, A. (2005). On the revision of probabilistic beliefs using uncertain evidence. Artificial Intelligence, 163.

Choi, A., & Darwiche, A. (2006). An edge deletion semantics for belief propagation and its practical impact on approximation quality. In AAAI.

Darwiche, A. (2009). Modeling and Reasoning with Bayesian Networks. Cambridge University Press.

Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39.

Elidan, G., & Friedman, N. (2005). Learning hidden variable networks: The information bottleneck approach. JMLR, 6.

Elidan, G., & Gould, S. (2008). Learning bounded treewidth Bayesian networks. JMLR, 9.

Elidan, G., Ninio, M., Friedman, N., & Schuurmans, D. (2002). Data perturbation for escaping local maxima in learning. In AAAI/IAAI.

Good, I. J. (1950). Probability and the Weighing of Evidence. Charles Griffin, London.

Heckerman, D. (1998). A tutorial on learning with Bayesian networks. In Jordan, M. I. (Ed.), Learning in Graphical Models. MIT Press.

Koller, D., & Friedman, N. (2009).
Probabilistic Graphical Models: Principles and Techniques. MIT Press.

Lauritzen, S. (1995). The EM algorithm for graphical association models with missing data. Computational Statistics and Data Analysis, 19.

Minka, T. P. (2001). Expectation propagation for approximate Bayesian inference. In UAI.

Minka, T. P., & Lafferty, J. D. (2002). Expectation propagation for the generative aspect model. In UAI.

Salakhutdinov, R., Roweis, S. T., & Ghahramani, Z. (2003). Optimization with EM and expectation-conjugate-gradient. In ICML.

Thiesson, B., Meek, C., & Heckerman, D. (2001). Accelerating EM for large databases. Machine Learning, 45(3).


More information

ERROR BOUNDS FOR THE METHODS OF GLIMM, GODUNOV AND LEVEQUE BRADLEY J. LUCIER*

ERROR BOUNDS FOR THE METHODS OF GLIMM, GODUNOV AND LEVEQUE BRADLEY J. LUCIER* EO BOUNDS FO THE METHODS OF GLIMM, GODUNOV AND LEVEQUE BADLEY J. LUCIE* Abstract. Te expected error in L ) attimet for Glimm s sceme wen applied to a scalar conservation law is bounded by + 2 ) ) /2 T

More information

Polynomial Interpolation

Polynomial Interpolation Capter 4 Polynomial Interpolation In tis capter, we consider te important problem of approximating a function f(x, wose values at a set of distinct points x, x, x 2,,x n are known, by a polynomial P (x

More information

Continuity and Differentiability of the Trigonometric Functions

Continuity and Differentiability of the Trigonometric Functions [Te basis for te following work will be te definition of te trigonometric functions as ratios of te sides of a triangle inscribed in a circle; in particular, te sine of an angle will be defined to be te

More information

arxiv: v1 [physics.flu-dyn] 3 Jun 2015

arxiv: v1 [physics.flu-dyn] 3 Jun 2015 A Convective-like Energy-Stable Open Boundary Condition for Simulations of Incompressible Flows arxiv:156.132v1 [pysics.flu-dyn] 3 Jun 215 S. Dong Center for Computational & Applied Matematics Department

More information

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER /2019

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER /2019 ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS MATH00030 SEMESTER 208/209 DR. ANTHONY BROWN 6. Differential Calculus 6.. Differentiation from First Principles. In tis capter, we will introduce

More information

Differentiation in higher dimensions

Differentiation in higher dimensions Capter 2 Differentiation in iger dimensions 2.1 Te Total Derivative Recall tat if f : R R is a 1-variable function, and a R, we say tat f is differentiable at x = a if and only if te ratio f(a+) f(a) tends

More information

REVIEW LAB ANSWER KEY

REVIEW LAB ANSWER KEY REVIEW LAB ANSWER KEY. Witout using SN, find te derivative of eac of te following (you do not need to simplify your answers): a. f x 3x 3 5x x 6 f x 3 3x 5 x 0 b. g x 4 x x x notice te trick ere! x x g

More information

1 Proving the Fundamental Theorem of Statistical Learning

1 Proving the Fundamental Theorem of Statistical Learning THEORETICAL MACHINE LEARNING COS 5 LECTURE #7 APRIL 5, 6 LECTURER: ELAD HAZAN NAME: FERMI MA ANDDANIEL SUO oving te Fundaental Teore of Statistical Learning In tis section, we prove te following: Teore.

More information

Bob Brown Math 251 Calculus 1 Chapter 3, Section 1 Completed 1 CCBC Dundalk

Bob Brown Math 251 Calculus 1 Chapter 3, Section 1 Completed 1 CCBC Dundalk Bob Brown Mat 251 Calculus 1 Capter 3, Section 1 Completed 1 Te Tangent Line Problem Te idea of a tangent line first arises in geometry in te context of a circle. But before we jump into a discussion of

More information

Quantum Mechanics Chapter 1.5: An illustration using measurements of particle spin.

Quantum Mechanics Chapter 1.5: An illustration using measurements of particle spin. I Introduction. Quantum Mecanics Capter.5: An illustration using measurements of particle spin. Quantum mecanics is a teory of pysics tat as been very successful in explaining and predicting many pysical

More information

Pre-Calculus Review Preemptive Strike

Pre-Calculus Review Preemptive Strike Pre-Calculus Review Preemptive Strike Attaced are some notes and one assignment wit tree parts. Tese are due on te day tat we start te pre-calculus review. I strongly suggest reading troug te notes torougly

More information

Precalculus Test 2 Practice Questions Page 1. Note: You can expect other types of questions on the test than the ones presented here!

Precalculus Test 2 Practice Questions Page 1. Note: You can expect other types of questions on the test than the ones presented here! Precalculus Test 2 Practice Questions Page Note: You can expect oter types of questions on te test tan te ones presented ere! Questions Example. Find te vertex of te quadratic f(x) = 4x 2 x. Example 2.

More information

Long Term Time Series Prediction with Multi-Input Multi-Output Local Learning

Long Term Time Series Prediction with Multi-Input Multi-Output Local Learning Long Term Time Series Prediction wit Multi-Input Multi-Output Local Learning Gianluca Bontempi Macine Learning Group, Département d Informatique Faculté des Sciences, ULB, Université Libre de Bruxelles

More information

Quantum Numbers and Rules

Quantum Numbers and Rules OpenStax-CNX module: m42614 1 Quantum Numbers and Rules OpenStax College Tis work is produced by OpenStax-CNX and licensed under te Creative Commons Attribution License 3.0 Abstract Dene quantum number.

More information

3.1 Extreme Values of a Function

3.1 Extreme Values of a Function .1 Etreme Values of a Function Section.1 Notes Page 1 One application of te derivative is finding minimum and maimum values off a grap. In precalculus we were only able to do tis wit quadratics by find

More information

Mathematics 5 Worksheet 11 Geometry, Tangency, and the Derivative

Mathematics 5 Worksheet 11 Geometry, Tangency, and the Derivative Matematics 5 Workseet 11 Geometry, Tangency, and te Derivative Problem 1. Find te equation of a line wit slope m tat intersects te point (3, 9). Solution. Te equation for a line passing troug a point (x

More information

Average Rate of Change

Average Rate of Change Te Derivative Tis can be tougt of as an attempt to draw a parallel (pysically and metaporically) between a line and a curve, applying te concept of slope to someting tat isn't actually straigt. Te slope

More information

Impact of Lightning Strikes on National Airspace System (NAS) Outages

Impact of Lightning Strikes on National Airspace System (NAS) Outages Impact of Ligtning Strikes on National Airspace System (NAS) Outages A Statistical Approac Aurélien Vidal University of California at Berkeley NEXTOR Berkeley, CA, USA aurelien.vidal@berkeley.edu Jasenka

More information

Deep Belief Network Training Improvement Using Elite Samples Minimizing Free Energy

Deep Belief Network Training Improvement Using Elite Samples Minimizing Free Energy Deep Belief Network Training Improvement Using Elite Samples Minimizing Free Energy Moammad Ali Keyvanrad a, Moammad Medi Homayounpour a a Laboratory for Intelligent Multimedia Processing (LIMP), Computer

More information

IEOR 165 Lecture 10 Distribution Estimation

IEOR 165 Lecture 10 Distribution Estimation IEOR 165 Lecture 10 Distribution Estimation 1 Motivating Problem Consider a situation were we ave iid data x i from some unknown distribution. One problem of interest is estimating te distribution tat

More information

Poisson Equation in Sobolev Spaces

Poisson Equation in Sobolev Spaces Poisson Equation in Sobolev Spaces OcMountain Dayligt Time. 6, 011 Today we discuss te Poisson equation in Sobolev spaces. It s existence, uniqueness, and regularity. Weak Solution. u = f in, u = g on

More information

The Dynamic Range of Bursting in a Model Respiratory Pacemaker Network

The Dynamic Range of Bursting in a Model Respiratory Pacemaker Network SIAM J. APPLIED DYNAMICAL SYSTEMS Vol. 4, No. 4, pp. 117 1139 c 25 Society for Industrial and Applied Matematics Te Dynamic Range of Bursting in a Model Respiratory Pacemaker Network Janet Best, Alla Borisyuk,

More information

Functions of the Complex Variable z

Functions of the Complex Variable z Capter 2 Functions of te Complex Variable z Introduction We wis to examine te notion of a function of z were z is a complex variable. To be sure, a complex variable can be viewed as noting but a pair of

More information

Equilibrium and Pareto Efficiency in an exchange economy

Equilibrium and Pareto Efficiency in an exchange economy Microeconomic Teory -1- Equilibrium and efficiency Equilibrium and Pareto Efficiency in an excange economy 1. Efficient economies 2 2. Gains from excange 6 3. Edgewort-ox analysis 15 4. Properties of a

More information

Introduction to Derivatives

Introduction to Derivatives Introduction to Derivatives 5-Minute Review: Instantaneous Rates and Tangent Slope Recall te analogy tat we developed earlier First we saw tat te secant slope of te line troug te two points (a, f (a))

More information

Discriminate Modelling of Peak and Off-Peak Motorway Capacity

Discriminate Modelling of Peak and Off-Peak Motorway Capacity International Journal of Integrated Engineering - Special Issue on ICONCEES Vol. 4 No. 3 (2012) p. 53-58 Discriminate Modelling of Peak and Off-Peak Motorway Capacity Hasim Moammed Alassan 1,*, Sundara

More information

CHAPTER 3: Derivatives

CHAPTER 3: Derivatives CHAPTER 3: Derivatives 3.1: Derivatives, Tangent Lines, and Rates of Cange 3.2: Derivative Functions and Differentiability 3.3: Tecniques of Differentiation 3.4: Derivatives of Trigonometric Functions

More information

232 Calculus and Structures

232 Calculus and Structures 3 Calculus and Structures CHAPTER 17 JUSTIFICATION OF THE AREA AND SLOPE METHODS FOR EVALUATING BEAMS Calculus and Structures 33 Copyrigt Capter 17 JUSTIFICATION OF THE AREA AND SLOPE METHODS 17.1 THE

More information

Boosting Kernel Density Estimates: a Bias Reduction. Technique?

Boosting Kernel Density Estimates: a Bias Reduction. Technique? Boosting Kernel Density Estimates: a Bias Reduction Tecnique? Marco Di Marzio Dipartimento di Metodi Quantitativi e Teoria Economica, Università di Cieti-Pescara, Viale Pindaro 42, 65127 Pescara, Italy

More information

These errors are made from replacing an infinite process by finite one.

These errors are made from replacing an infinite process by finite one. Introduction :- Tis course examines problems tat can be solved by metods of approximation, tecniques we call numerical metods. We begin by considering some of te matematical and computational topics tat

More information

Symmetry Labeling of Molecular Energies

Symmetry Labeling of Molecular Energies Capter 7. Symmetry Labeling of Molecular Energies Notes: Most of te material presented in tis capter is taken from Bunker and Jensen 1998, Cap. 6, and Bunker and Jensen 2005, Cap. 7. 7.1 Hamiltonian Symmetry

More information

Chapter 2 Limits and Continuity

Chapter 2 Limits and Continuity 4 Section. Capter Limits and Continuity Section. Rates of Cange and Limits (pp. 6) Quick Review.. f () ( ) () 4 0. f () 4( ) 4. f () sin sin 0 4. f (). 4 4 4 6. c c c 7. 8. c d d c d d c d c 9. 8 ( )(

More information

Time (hours) Morphine sulfate (mg)

Time (hours) Morphine sulfate (mg) Mat Xa Fall 2002 Review Notes Limits and Definition of Derivative Important Information: 1 According to te most recent information from te Registrar, te Xa final exam will be eld from 9:15 am to 12:15

More information

The Complexity of Computing the MCD-Estimator

The Complexity of Computing the MCD-Estimator Te Complexity of Computing te MCD-Estimator Torsten Bernolt Lerstul Informatik 2 Universität Dortmund, Germany torstenbernolt@uni-dortmundde Paul Fiscer IMM, Danisc Tecnical University Kongens Lyngby,

More information