Learning to Reject Sequential Importance Steps for Continuous-Time Bayesian Networks


Learning to Reject Sequential Importance Steps for Continuous-Time Bayesian Networks

Jeremy C. Weiss (University of Wisconsin-Madison, Madison, WI, US), Sriraam Natarajan (Indiana University, Bloomington, IN, US), C. David Page (University of Wisconsin-Madison, Madison, WI, US)

Abstract

Applications of graphical models often require the use of approximate inference, such as sequential importance sampling (SIS), for estimation of the model distribution given partial evidence, i.e., the target distribution. However, when SIS proposal and target distributions are dissimilar, such procedures lead to biased estimates or require a prohibitive number of samples. We introduce ReBaSIS, a method that better approximates the target distribution by sampling variable by variable from existing importance samplers and accepting or rejecting each proposed assignment in the sequence: a choice made based on anticipating upcoming evidence. We relate the per-variable proposal and model distributions by expected weight ratios of sequence completions and show that we can learn accurate models of optimal acceptance probabilities from local samples. In a continuous-time domain, our method improves upon previous importance samplers by transforming an SIS problem into a machine learning one.

1 Introduction

Sequential importance sampling (SIS) is a method for approximating an intractable target distribution by sampling from a proposal distribution and weighting the sample by the ratio of target to proposal distribution probabilities at each step of the sequence. It provides the basis for many distribution approximations with applications including robotic environment mapping and speech recognition (Montemerlo et al. 2003; Wolfel and Faubel 2007). The characteristic shortcoming of importance sampling stems from the potentially high weight variance that results from large differences in the target and proposal densities. SIS compounds this problem by iteratively sampling over the steps of the sequence, resulting in sequence weights equal to the product of step weights. The sequence weight distribution is exponential, so only the high-weight tail of the sample distribution contributes substantially to the approximation. Two approaches to mitigate this problem are filtering, e.g., (Doucet, Godsill, and Andrieu 2000; Fan, Xu, and Shelton 2010), and adaptive importance sampling, e.g., (Cornebise, Moulines, and Olsson 2008; Yuan and Druzdzel 2003; 2007a).

Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

One drawback of filtering is that it does not efficiently account for future evidence, and in cases of severe proposal-evidence mismatch, many resampling steps are required, leading to sample impoverishment. Akin to adaptive importance sampling, our method addresses proposal-evidence mismatch by developing foresight, i.e., adaptation to approaching evidence, to guide its proposals. It develops foresight by learning a binary classifier, dependent on approaching evidence, to accept or reject each proposal. Our procedure can be viewed as the construction of a second proposal distribution, learned to account for evidence and to better approximate the target distribution, and is a new form of adaptive importance sampling.

In greater detail, our task is to recover a target distribution $f^*$, which can be factored variable by variable into component conditional distributions $f_i^*$ for $i \in 1 \ldots k$. The SIS framework provides a suboptimal surrogate distribution $g$, which likewise can be factored into a set of conditional distributions $g_i$. We propose a second surrogate distribution $h$ closer to $f^*$ based on learning conditional acceptance probabilities $a_i$ of rejection samplers relating $f_i^*$ and $g_i$. That is, to sample from $h$, we iteratively (re-)sample from proposals $g_i$ and accept with probability $a_i$. Our key idea is to relate the proposal $g_i$ and target $f_i^*$ distributions by the ratio of expected weights of sequence completions, i.e., a setting for each variable from $i$ to $k$, given acceptance and rejection of the sample from $g_i$. Given the expected weight ratio, we can recover the optimal acceptance probability $a_i^*$ and thus $f_i^*$. Unfortunately, the calculation of the expected weights, and thus the ratio, is typically intractable because of the exponential number of sequence completions. Instead, we can approximate it using machine learning. First, we show that the expected weight ratio equals the odds of accepting a proposal from $g_i$ under the $f_i^*$ distribution. Then, transforming the odds to a probability, we can learn a binary classifier for the probability of acceptance under $f_i^*$ given the sample proposal from $g_i$. Finally, we show how to generate examples to train a classifier to make the optimal accept/reject decision. We specifically examine the application of our rejection-based SIS (ReBaSIS) algorithm to continuous-time Bayesian networks (CTBNs) (Nodelman, Shelton, and Koller 2002), where inference (even approximate) is NP-hard (Sturlaugson and Sheppard 2014).

We proceed as follows. First we present an example and related work, and we outline the problem of SIS and the identification of good proposal distributions. Then, we define rejection sampling within SIS and show how to approximate the target distribution via binary classification. Finally, we extend our analysis to CTBNs and describe experiments that show the empirical advantages of our method over previous CTBN importance samplers.

1.1 An Illustrative Example

Figure 1 describes our method in the simplest relevant example: a binary-state Markov chain. For our example, let $k = 3$: then we have evidence that $z_3 = z_3^1$. One possible sample procedure could be: S, accept $z_1^2$, reject $z_2^2$, accept $z_2^1$, reject $z_3^2$, accept $z_3^1$, T, giving us the path: S, $z_1^2$, $z_2^1$, $z_3^1$, T. Note that if the proposal $g_3: z_2^1 \to z_3^1$ were very improbable under $g$ but not $f$ (i.e., proposal-evidence mismatch), all samples running through $z_2^1$ would have very large weight. By introducing the possibility of rejection at each step, our procedure can learn to reject samples to $z_3^2$, reducing the importance sampling weight, and learn to enter states $z_2^1$ and $z_2^2$ proportionally to $f(\cdot \mid e)$, i.e., develop foresight.

[Figure 1: A source-to-sink representation of a binary-state Markov chain with evidence at $z_k^1$ (red). Distributions $f$ and $g$ are defined over paths from source S to sink T and are composed of element-wise distributions $f_i$ and $g_i$. For a sample at state $z_1^2$ (dark blue), an assignment to $z_2^2$ is proposed (light blue) according to $g_2$. To mimic sampling from $f_2^* = f_2(\cdot \mid e)$, the proposed assignment is accepted with probability proportional to the ratio of expected weights of path completions from $z_2^2$ and $z_1^2$ to T.]
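To make the mismatch in this example concrete, the following small sketch (all transition probabilities are hypothetical, not taken from the paper) samples a $k = 3$ binary-state chain from the evidence-blind prior and checks how often the evidence $z_3 = 1$ is hit; the surviving fraction of samples is all that an SIS run without foresight has to work with.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical numbers for the Figure 1 setup: a k=3 binary-state Markov chain
# with evidence z_3 = 1, and a proposal g equal to the prior (it ignores the
# evidence). Samples that miss the evidence get weight 0, so only a small
# fraction of samples carries all of the weight.
k = 3
p1 = np.array([0.5, 0.5])                    # initial distribution over z_1
T = np.array([[0.95, 0.05],                  # T[a, b] = p(z_{i+1} = b | z_i = a)
              [0.50, 0.50]])

def sample_prior():
    z = [rng.choice(2, p=p1)]
    for _ in range(k - 1):
        z.append(rng.choice(2, p=T[z[-1]]))
    return z

n = 50_000
hits = np.array([sample_prior()[-1] == 1 for _ in range(n)])
print("fraction of prior samples consistent with the evidence:", hits.mean())
```

A proposal that anticipates the upcoming evidence, by rejecting steps that make $z_3 = 1$ unlikely, would keep far more of its samples useful; that is the behavior ReBaSIS learns.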
1.2 Related Work

As mentioned above, batch resampling techniques based on rejection control (Liu, Chen, and Wong 1998; Yuan and Druzdzel 2007b) or sequential Monte Carlo (SMC) (Doucet, Godsill, and Andrieu 2000; Fan, Xu, and Shelton 2010), i.e., particle filtering, can mitigate the SIS weight variance problem, but they can lead to reduced particle diversity, especially when many resampling iterations are required. Particle smoothing (Fan, Xu, and Shelton 2010) combats particle impoverishment, but the exponentially-large state spaces used in CTBNs limit its ability to find alternative, probable sample histories. Previous adaptive importance sampling methods rely on structural knowledge and other inference methods, e.g., (Cheng and Druzdzel 2000; Yuan and Druzdzel 2003), to develop improved proposals, whereas our method learns a classifier to help guide samples through regions of proposal-evidence mismatch. One interesting idea combining work in filtering and adaptive importance sampling is the SMC² algorithm (Chopin et al. 2011), which maintains a sample distribution over both particles and parameters determining the proposal distribution, resampling along either dimension as necessary. The method does not anticipate future evidence, so it may complement our work, which can similarly be used in the SMC framework. Other MCMC (Rao and Teh 2011) or particle MCMC (Andrieu, Doucet, and Holenstein 2010) methods may have trouble in large state spaces (e.g., CTBNs) with multiple modes and low density regions in between, especially if there is proposal-evidence mismatch.

2 Background

We begin with a review of importance sampling and then introduce our surrogate distribution. Let $f$ be a p.d.f. defined on an ordered set of random variables $Z = \{Z_1, \ldots, Z_k\}$ over an event space $\Omega$. We are interested in the conditional distribution $f(z \mid e)$, where evidence $e$ is a set of observations about a subset $\kappa$ of values $\{Z_i = z_i\}_{i \in \kappa}$. For fixed $e$, we define our target p.d.f. $f^*(z) = f(z \mid e)$. Let $g(z)$ be a surrogate distribution from which we can sample such that if $f^*(z) > 0$ then $g(z) > 0$. Then for any subset $Z \subseteq \Omega$, we can approximate $f^*$ with $n$ weighted samples from $g$:

$$\int_Z f^*(z)\,dz = \int_Z \frac{f^*(z)}{g(z)}\, g(z)\,dz \approx \frac{1}{n}\sum_{i=1}^{n} [z^i \in Z]\,\frac{f^*(z^i)}{g(z^i)} = \frac{1}{n}\sum_{i=1}^{n} [z^i \in Z]\, w^i$$

where $[z^i \in Z]$ is the indicator function with value 1 if $z^i \in Z$ and 0 otherwise, and $w^i$ is the importance sample weight. We design a second surrogate $h(z)$ with the density corresponding to accepting a sample from $g$:

$$h(z) = g(z)\,a(z)\left(\int_\Omega g(\zeta)\,a(\zeta)\,d\zeta\right)^{-1} \quad (1)$$

where $a(z)$ is the sample acceptance probability from $g$. The last term in Equation 1 is a normalizing functional (of $g$ and $a$) to ensure that $h(z)$ is a density. Procedurally, we sample from $h$ by (re-)sampling from $g$ and accepting with probability $a$. The approximation of $f^*$ with $h$ is given by:

$$\int_Z f^*(z)\,dz = \int_Z \frac{f^*(z)}{g(z)}\,\frac{g(z)}{h(z)}\, h(z)\,dz \approx \frac{1}{n}\sum_{i=1}^{n} [z^i \in Z]\, w_{fg}^i\, w_{gh}^i$$

with weights $w_{fh}^i = w_{fg}^i\, w_{gh}^i$. To ensure that $h$ has the support of $f^*$, we require that both $a$ and $g$ are non-zero everywhere $f^*$ is non-zero.
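As a quick numeric check of Equation 1, the following sketch (a toy discrete example with made-up densities, not from the paper) builds $h$ from a mismatched proposal $g$ and an acceptance function $a$, samples from $h$ by repeated proposal and acceptance, and verifies that the weights $f^*/h$ recover the target probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete instance of Equation 1; all numbers are illustrative.
states = np.arange(4)
f_star = np.array([0.70, 0.15, 0.10, 0.05])    # target density f*
g      = np.array([0.25, 0.25, 0.25, 0.25])    # mismatched proposal g
a      = np.minimum(1.0, f_star / (2.0 * g))   # acceptance a(z) = min(1, f*/(alpha g)), alpha = 2

norm = np.sum(g * a)                            # integral inside the normalizing functional
h = g * a / norm                                # the second surrogate h of Equation 1

def sample_h():
    """Sample from h by (re-)sampling from g and accepting with probability a."""
    while True:
        z = rng.choice(states, p=g)
        if rng.random() < a[z]:
            return z

n = 100_000
draws = np.array([sample_h() for _ in range(n)])
weights = f_star[draws] / h[draws]              # w_fg * w_gh = (f*/g)(g/h) = f*/h
for z in states:
    est = np.mean(weights * (draws == z))       # estimate of the f* mass on {z}
    print(z, round(est, 3), f_star[z])
```

Note that the envelope $\alpha g$ is deliberately too tight for state 0 here, so $h \neq f^*$; the importance weights still correct for the difference, which is exactly why an imperfect learned acceptance probability remains usable.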

Now we can define our optimal resampling density $h^*(z)$ using the optimal choice of acceptance probability $a^*(z) = \min(1, f^*(z)/(\alpha g(z)))$, where $\alpha$ is a constant determining the familiar rejection sampler envelope: $\alpha g(z)$. The density $h^*(z)$ is optimal in the sense that, for appropriate choice of $\alpha$ such that $f^*(z) < \alpha g(z)$ for all $z$, $h^*(z) = f^*(z)$. When $h(z) = f^*(z)$, the importance weights are exactly 1, and the effective sample size is $n$. In many applications the direct calculation of $f^*(z)$ is intractable or impossible, and thus we cannot directly recover $a^*(z)$ or $h^*(z)$. However, we can still use these ideas to find $h(z)$ through sequential importance sampling (Liu, Chen, and Wong 1998), which we describe next.

2.1 Sequential Importance Sampling

Sequential importance sampling (SIS) is used when estimating the distribution $f$ over the factorization of $Z$. In time-series models, $Z_i$ is a random variable over the joint state corresponding to a time step; in continuous-time models, $Z_i$ is the random variable corresponding to an interval. Defining $z_j^i = \{z_j, z_{j+1}, \ldots, z_i\}$ for $i, j \in \{1, \ldots, k\}$ and $j \leq i$, we have the decomposition:

$$f^*(z) = p(z_1, \ldots, z_k \mid e) = g_1(z_1 \mid e)\, w_1(z_1 \mid e) \prod_{i=2}^{k} g_i(z_i \mid z_1^{(i-1)}, e)\, w_i(z_i \mid z_1^{(i-1)}, e) \quad (2)$$

where $p(\cdot)$ is the probability distribution under $f$. Equation 2 substitutes $p$ with p.d.f. $g_i$ by defining functions $g_i$ and $w_i(\cdot) = p(\cdot)/g_i(\cdot)$ and requiring $g_i$ to have the same support as $p$. Then we define $g$ by the composition of the $g_i$: $g(z \mid e) = g_1(z_1 \mid e) \prod_{i=2}^{k} g_i(z_i \mid z_1^{(i-1)}, e)$, and likewise for $w$. To generate a sample $z^j$ from proposal distribution $g(z \mid e)$, SIS samples each $z_i$ in order from 1 to $k$.

3 Methods

In this section we relate the target and proposal decompositions. Recall that we are interested in sampling directly from the conditional distribution $f^*(z) = p(z \mid e)$ for fixed $e$. We define interval distributions $f_1^*(z_1)$ and $f_i^*(z_i \mid z_1^{(i-1)})$ such that $f^*(z)$ can be factored into the interval distributions: $f^*(z) = f_1^*(z_1) \prod_{i=2}^{k} f_i^*(z_i \mid z_1^{(i-1)})$. We define the interval distributions:

$$f_i^*(z_i \mid z_1^{(i-1)}) = \frac{p(e \mid z_1^i)}{p(e \mid z_1^{(i-1)})}\, p(z_i \mid z_1^{(i-1)})$$

for $i > 1$ and $p(e \mid z_1^{(i-1)}) > 0$, and $f_1^*(z_1) = p(e \mid z_1)\, p(z_1)/p(e)$ for $i = 1$ and $p(e) > 0$. Then, by the law of total probability, we have:

$$f_i^*(z_i \mid z_1^{(i-1)}) = \frac{E_f[[e, z] \mid z_1^i]}{E_f[[e, z] \mid z_1^{(i-1)}]}\, p(z_i \mid z_1^{(i-1)}). \quad (3)$$

The indicator function $[e, z]$ is shorthand for $[\bigwedge_{l \in \kappa} \{z_l = e_l\} \mid z]$ and takes value 1 if the evidence matches $z$ and 0 otherwise. Note that in the sampling framework, Equation 3 corresponds to sampling from the unconditioned proposal distribution $p(z_i \mid z_1^{(i-1)})$ and calculating the expected weight of sample completions $z_{i+1}^k$ and $z_i^k$, given by the indicator functions. This procedure describes the expected outcome obtained by forward sampling with rejection. However, when $f$ and $f^*$ are highly dissimilar, the vast majority of samples from $f$ will be rejected, i.e., $[e, z] = 0$ for most $z$. It may be better to sample from a proposal $g$ with weight function $w = f/g$ so that sampling leads to fewer rejections. Substituting $g$ in Equation 3, we get:

$$f_i^*(z_i \mid z_1^{(i-1)}) = \frac{E_g[w^a_{i:k}]}{E_g[w^r_{i:k}]}\, g_i(z_i \mid z_1^{(i-1)}). \quad (4)$$

The terms $E_g[w^a_{i:k}]$ and $E_g[w^r_{i:k}]$ are the expected forward importance sampling weights of $z_i^k$ given acceptance ($a$) or rejection ($r$) of proposed assignment $z_i$. The derivation of Equation 4 is provided in the Appendix. Equation 4 provides the relationship between $f_i^*$ and $g_i$, given by the ratio of expected weights of completion of sample $z$ under acceptance or rejection of $z_i$. This allows us to further improve $g$ and gives us the sampling distribution $h$.
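To make Equation 3 concrete, here is a small self-contained sketch (the chain and its numbers are hypothetical) that recovers the evidence-aware interval distribution $f_i^*$ of a binary-state Markov chain by brute-force enumeration of sequence completions. The exponential cost of this enumeration is precisely what Equation 4 and the learned classifier are meant to avoid.

```python
import itertools
import numpy as np

# Toy binary-state Markov chain with k steps and evidence z_k = 1 (cf. Figure 1).
k = 4
p1 = np.array([0.5, 0.5])              # initial distribution over z_1
T = np.array([[0.9, 0.1],              # T[a, b] = p(z_{i+1} = b | z_i = a)
              [0.2, 0.8]])

def prior_prob(path):
    """p(z_1, ..., z_m) under the unconditioned chain."""
    p = p1[path[0]]
    for a, b in zip(path, path[1:]):
        p *= T[a, b]
    return p

def expected_indicator(prefix):
    """E_f[[e, z] | prefix]: the probability, over completions of the prefix,
    that the evidence z_k = 1 is matched."""
    total = 0.0
    for tail in itertools.product([0, 1], repeat=k - len(prefix)):
        path = list(prefix) + list(tail)
        if path[-1] == 1:
            total += prior_prob(path) / prior_prob(prefix)
    return total

# Equation 3 for i = 2, given that z_1 = 0 has already been sampled.
prefix = [0]
for z_i in (0, 1):
    ratio = expected_indicator(prefix + [z_i]) / expected_indicator(prefix)
    f_star = ratio * T[prefix[-1], z_i]
    print(f"f*_2(z_2={z_i} | z_1=0) = {f_star:.4f}")
# The two printed values sum to 1: f*_2 is the evidence-aware transition distribution.
```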
3.1 Rejection Sampling to Recover $f_i^*$

Because Equation 4 relates the two distributions, we can generate samples from $f_i^*$ by conducting rejection sampling from $g_i$. Selecting constant $\alpha$ such that $f_i^* \leq \alpha g_i$, i.e., $\alpha g_i$ is the rejection envelope, we define the optimal interval acceptance probability $a_i^*$ by:

$$a_i^*(z_i \mid z_1^{(i-1)}) = \frac{f_i^*(z_i \mid z_1^{(i-1)})}{\alpha\, g_i(z_i \mid z_1^{(i-1)})} = \frac{E_g[w^a_{i:k}]}{\alpha\, E_g[w^r_{i:k}]}. \quad (5)$$

By defining $a_i^*$ for all $i$, we can generate an unweighted sample from $f^*(z)$ in $O(k \max_i(f_i^*(\cdot)/g_i(\cdot)))$ steps given the weight ratio expectations and an appropriate choice of $\alpha$. Thus, if we can recover $a_i^*$ for all $i$, we get $h = f^*$ as desired and our procedure generates unweighted samples.

3.2 Estimating the Weight Ratio

The procedure of sampling intervals to completion depends on the expected weight ratio in Equation 4. Unfortunately, exact calculation of the ratio is impractical because the expectations involved require summing over an exponential number of terms. We could resort to estimating it from weighted importance samples: completions of $z$ given $z_1^i$ and $z$ given $z_1^{(i-1)}$. While possible, this is inefficient because (1) it would require weight estimations for every $z_i$ given $z_1^{(i-1)}$, and (2) the estimation of the expected weights itself relies on importance sampling.

However, we can cast the estimation of the weight ratio as a machine learning problem of binary classification. We recognize that similar situations, in terms of state $z_1^i$, evidence $e$, model (providing $f$) and proposal $g$, result in similar values of $a_i^*$. Thus, we can learn a binary classifier $\Phi_i(z_1^i, e, f, g)$ to represent the probability of {acceptance, rejection} = $\{\phi_i(\cdot), 1 - \phi_i(\cdot)\}$ as a function of the situation. In particular, the expected weight ratio in Equation 4 is proportional to the odds under $f^*$ of accepting the $z_i$ sampled from $g_i$. The binary classifier provides an estimate of the probability of acceptance $\phi_i(z_1^i, e, f, g)$, from which we can derive the odds of acceptance. Substituting into Equation 5, we have:

$$a_i(z_i \mid z_1^{(i-1)}) = \frac{1}{\alpha}\left(\frac{\phi_i(z_1^i, e, f, g)}{1 - \phi_i(z_1^i, e, f, g)}\right) \approx a_i^*(z_i \mid z_1^{(i-1)}),$$

denoting the approximations as $a_i$ for all $i$. Then our empirical proposal density is:

$$h(z) = h_1(z_1) \prod_{i=2}^{k} h_i(z_i) = g_1(z_1)\, a_1(z_1)\, c_1[g_1, a_1] \prod_{i=2}^{k} g_i(z_i \mid z_1^{(i-1)})\, a_i(z_i \mid z_1^{(i-1)})\, c_i[g_i, a_i],$$

where the $c_i$ are the normalizing functionals as in Equation 1. We provide pseudocode for the rejection-based SIS (ReBaSIS) procedure in Algorithm 1.

3.3 Training the Classifier $\Phi_i$

Conceptually, generating examples for the classifier $\Phi_i$ is straightforward. Given some $z_1^{(i-1)}$, we sample $z_i$ from $g_i$, accept with probability $\rho = 1/2$, and sample to completion using $g$ to get the importance weight. Then, an example is $(y, x, w)$: $y \in$ {accept, reject}, $x$ is a set of features encoding the situation, and $w$ is the importance weight. The training procedure works because the mean weight of the positive examples estimates $E_g[w^a_{i:k}]$, and likewise the mean weight of the negative examples estimates $E_g[w^r_{i:k}]$. By sampling with training acceptance probability $\rho$, a calibrated classifier $\Phi_i^\rho$ (i.e., one that minimizes the $L_2$ loss) estimates the probability $\rho E_g[w^a_{i:k}] / (\rho E_g[w^a_{i:k}] + (1 - \rho) E_g[w^r_{i:k}])$. The estimated probability can then be used to recover the expected weight ratio:

$$\frac{E_g[w^a_{i:k}]}{E_g[w^r_{i:k}]} = \left(\frac{1 - \rho}{\rho}\right)\left(\frac{\phi_i^\rho(z_1^i, e, f, g)}{1 - \phi_i^\rho(z_1^i, e, f, g)}\right),$$

and thus the optimal classifier can be used to recover the rejection-based acceptance probability $a_i^*$. We get our particular estimator $\phi_i/(1 - \phi_i)$ by setting $\rho = 1/2$, though in principle we could use other values of $\rho$. In practice, we sample trajectories alternating acceptance and rejection of samples $z_i$ to get a proportion $\rho = 1/2$. Then, we continue sampling the same trajectory to produce $2k$ training examples for a sequence of length $k$. We adopt this procedure for efficiency at the cost of generating related training examples. Pseudocode for the generation of examples to train the classifier is provided in the Supplement.

Inevitably, there is some cost to pay to construct $h$, including time for feature construction, classifier training, and classifier use. However, learning only needs to be done once, while inference may be performed many times. Also, in many challenging problems $g$ will not produce an ESS of any appreciable size, in our case because of a mismatch with $f$, and in MCMC because of mode-hopping difficulties. Our method adopts an approach complementary to particle methods to help tackle such problems, and we show its utility in the CTBN application.

Algorithm 1 Rejection-based SIS (ReBaSIS)
Input: conditional distributions $\{f_j\}$ and $\{g_j\}$, evidence $e$; constants $\alpha$, $k$; $i = 1$, $z = \{\}$, $w = 1$; classifiers $\{\phi_j\}$
Output: sample $z$ with weight $w$
1: while $i \leq k$ do
2:   accept = false
3:   while not accept do
4:     Sample $z_i \sim g_i$, $r \sim U[0, 1]$
5:     $a = \frac{1}{\alpha} \cdot \frac{\phi_i(z_i, z, e, f, g)}{1 - \phi_i(z_i, z, e, f, g)}$
6:     if $r < a$ then
7:       accept = true
8:     end if
9:   end while
10:  $z = \{z_i, z\}$, $w = w \cdot f_i(z_i) / (g_i(z_i)\, a\, c_i[g_i, a_i])$
11:  $i = i + 1$
12: end while
13: return $(z, w)$
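The following is a minimal Python rendering of Algorithm 1 for a chain with small discrete per-step domains, so that the normalizing functional $c_i$ can be summed exactly. The callables standing in for $f_i$, $g_i$, and the classifier $\phi_i$ are placeholders, and the acceptance probability is clipped at 1 for safety, an implementation choice rather than part of the paper's pseudocode.

```python
import numpy as np

rng = np.random.default_rng(1)

def rebasis_sample(k, domain, f_i, g_i, phi_i, alpha=2.0):
    """Sketch of Algorithm 1 (ReBaSIS) for discrete per-step domains.
    f_i, g_i: callables (i, value, history) -> density; g_i must sum to 1 over domain.
    phi_i: callable (i, value, history) -> learned acceptance probability estimate."""
    z, w = [], 1.0
    for i in range(k):
        def a_i(z_i):  # a_i = min(1, phi / (alpha * (1 - phi)))
            p = phi_i(i, z_i, z)
            return min(1.0, p / (alpha * (1.0 - p)))
        # acceptance mass: integral of g_i * a_i, i.e. the inverse of c_i[g_i, a_i]
        mass = sum(g_i(i, x, z) * a_i(x) for x in domain)
        while True:                                    # lines 3-9: propose until accepted
            probs = [g_i(i, x, z) for x in domain]
            z_i = domain[rng.choice(len(domain), p=probs)]
            if rng.random() < a_i(z_i):
                break
        # line 10: w *= f_i(z_i) / h_i(z_i), where h_i = g_i * a_i / mass
        w *= f_i(i, z_i, z) * mass / (g_i(i, z_i, z) * a_i(z_i))
        z.append(z_i)
    return z, w

# Smoke test with uniform placeholder distributions and a constant classifier:
# here f_i = h_i, so the returned weight is exactly 1.
z, w = rebasis_sample(k=3, domain=[0, 1],
                      f_i=lambda i, x, z: 0.5,
                      g_i=lambda i, x, z: 0.5,
                      phi_i=lambda i, x, z: 0.5)
print(z, w)
```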
4 Continuous-Time Bayesian Networks

We extend our analysis to a continuous-time model: the continuous-time Bayesian network (CTBN) (Nodelman, Shelton, and Koller 2002), which has applications, for example, in anomaly detection (Xu and Shelton 2010) and medicine (Weiss, Natarajan, and Page 2012). We review the CTBN model and sampling methods and extend our analysis. CTBNs are a model of discrete random variables $X_1, X_2, \ldots, X_d = \mathbf{X}$ over time. The model specifies a directed graph over $\mathbf{X}$ that determines the parents of each variable. The parents setting $u_X$ is the joint state of the parents of $X$. Then, CTBNs assume that the probability of transition for variable $X$ out of state $x$ at time $t$ is given by the exponential distribution $q_{x|u} e^{-q_{x|u} t}$ with rate parameter (intensity) $q_{x|u}$ for parents setting $u$. The variable $X$ transitions from state $x$ to $x'$ with probability $\Theta_{xx'|u}$, where $\Theta_{xx'|u}$ is an entry in a state transition matrix $\Theta_u$. A complete CTBN model is described by two components: a distribution $B$ over the initial joint state $\mathbf{X}$, typically represented by a Bayesian network, and a directed graph over nodes representing $\mathbf{X}$ with corresponding conditional intensity matrices (CIMs). The CIMs hold the intensities $q_{x|u}$ and state transition probability matrices $\Theta_u$.

The likelihood of a CTBN model given data is computed as follows. A trajectory is a sequence of intervals of fixed state. For each interval $[t_0, t_1)$, the duration $t = t_1 - t_0$ passes, and a variable $X$ transitions at $t_1$ from state $x$ to $x'$. During the interval all other variables $X_i \in \mathbf{X} \setminus X$ remain in their current states $x_i$. The interval likelihood is given by:

$$\underbrace{q_{x|u}\, e^{-q_{x|u} t}}_{X \text{ transitions}}\;\underbrace{\Theta_{xx'|u}}_{\text{to state } x'}\;\underbrace{\prod_{x_i : X_i \neq X} e^{-q_{x_i|u} t}}_{\text{while the } X_i \text{ rest}} \quad (6)$$

Then, the sequence likelihood is given by the product of interval likelihoods:

$$\prod_{X \in \mathbf{X}} \prod_{x \in X} \prod_{u \in U_X} q_{x|u}^{M_{x|u}}\, e^{-q_{x|u} T_{x|u}} \prod_{x' \neq x} \Theta_{xx'|u}^{M_{xx'|u}}$$

where the $M_{x|u}$ (and $M_{xx'|u}$) are the numbers of transitions out of state $x$ (to state $x'$), and where the $T_{x|u}$ are the amounts of time spent in $x$ given parents settings $u$.
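A direct transcription of the interval likelihood in Equation 6 is sketched below; the dictionary-based lookup structures for the intensities and transition matrices are an assumed encoding for illustration, not part of the paper.

```python
import math

def interval_likelihood(t, trans_var, x, x_next, resting, q, theta, parents):
    """Likelihood of one interval of duration t in which `trans_var` moves from
    state x to x_next while every variable in `resting` keeps its state.

    q[(var, state, u)]            : intensity q_{x|u}
    theta[(var, state, next, u)]  : transition probability Theta_{x x'|u}
    parents[var]                  : current parents setting u for var
    resting                       : dict {var: state} for the non-transitioning variables
    """
    u = parents[trans_var]
    lik = q[(trans_var, x, u)] * math.exp(-q[(trans_var, x, u)] * t)   # X transitions
    lik *= theta[(trans_var, x, x_next, u)]                            # ... to state x'
    for var, xi in resting.items():                                    # while the X_i rest
        lik *= math.exp(-q[(var, xi, parents[var])] * t)
    return lik
```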

4.1 Sampling in CTBNs

Evidence provided in data is typically incomplete, i.e., the joint state is partially or fully unobserved over time. Thus, inference is performed to probabilistically complete the unobserved regions. CTBNs are generative models and provide a sampling framework to complete such regions. Let a trajectory $z$ be a sequence of (state, time) pairs $(z_i = \{x_{1i}, x_{2i}, \ldots, x_{di}\}, t_i)$ for $i = \{0, \ldots, k\}$, where $x_{ji}$ is the $j$th CTBN variable at the $i$th time, such that the sequence of $t_i$ are in $[t_{start}, t_{end})$. Given an initial state $z_0 = \{x_{10}, x_{20}, \ldots, x_{d0}\}$, transition times are sampled for each variable $X$ according to $q_{x|u} e^{-q_{x|u} t}$, where $x$ is the active state of $X$. The variable $X_i$ that transitions in the interval is selected based on the shortest sampled transition time. The state to which $X_i$ transitions is sampled from $\Theta_{x_i x_i'|u}$. Then the transition times are resampled as necessary according to intensities $q_{x|u}$, noting that these intensities may be different because of potential changes in the parents setting $u$. The trajectory terminates when all sampled transition times exceed a specified ending time.

Previous work by Fan et al. describes a framework for importance sampling, particle filtering, and particle smoothing in CTBNs (Fan, Xu, and Shelton 2010). By sampling from truncated exponentials, they incur importance sample downweights on their samples. Recall that Equation 6 is broken into three components. The weights $f_i/g_i$ are decomposed likewise: (1) a downweight for the variable $x$ that transitions, according to the ratio of the exponential to the truncated exponential: $(1 - e^{-q_{x|u}\tau})$; (2) a downweight corresponding to a lookahead point-estimate of $\Theta_u$, assuming no other variables change until the evidence (we leave this unmodified in our implementation); and (3) a downweight for each resting variable $x_i$ given by the ratio of not sampling a transition in time $t$ from an exponential and a truncated exponential: $(1 - e^{-q_{x_i|u}\tau_i})/(1 - e^{-q_{x_i|u}(\tau_i - t)})$. Finally, the product of all interval downweights provides the trajectory importance sample weight.

4.2 Rejection-Based Importance Sampling in CTBNs

There is an equivalence between a fixed continuous-time trajectory and the discrete sequences described above. In particular, the rejection-based importance sampling method requires that the number of intervals $k$ be fixed, while CTBNs produce trajectories with varying numbers of intervals. Nevertheless, for any set of trajectories, we can define $\epsilon$-width intervals small enough that at most one transition occurs per interval and that such transitions occur at the end of the interval. Then for any set of trajectories over the duration $[t_{start}, t_{end})$, we set $k = (t_{end} - t_{start})/\epsilon$. Using the memoryless property of exponential distributions, the density of a single, one-transition interval is equal to the density of the product of a sequence of $\epsilon$-width, zero-transition intervals and one $\epsilon$-width, one-transition interval. In practice, it is simpler to use the CTBN sampling framework so that each interval is of appreciable size. We denote the evidence $e$ as a sequence of tuples of type (state, start time, duration): $(e_i, t_{i,0}, \tau_i)$, allowing for point evidence with zero duration, $\tau_i = 0$, and $[e, z]$ checks whether $z$ agrees with each $e_i$ over its duration.
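The forward-sampling step of Section 4.1 can be sketched as competing exponential clocks; the version below resamples every variable's clock after each transition, which is distributionally equivalent to the keep-and-resample-as-necessary scheme by memorylessness, and the q/theta containers follow the same hypothetical encoding used in the previous sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_next_transition(state, q, theta, parents, t_now, t_end):
    """Return (variable, new_state, time) for the next CTBN transition, or None
    if every sampled transition time falls beyond t_end.

    state  : dict {var: current state}
    theta[(var, state, u)] : dict {next_state: probability} (a row of Theta_u)
    """
    times = {}
    for var, x in state.items():
        u = parents[var]
        times[var] = t_now + rng.exponential(1.0 / q[(var, x, u)])
    var = min(times, key=times.get)            # shortest sampled transition time wins
    if times[var] >= t_end:
        return None
    x, u = state[var], parents[var]
    next_states, probs = zip(*theta[(var, x, u)].items())
    new_state = next_states[rng.choice(len(probs), p=list(probs))]
    return var, new_state, times[var]
```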
Unlike discrete time, where we can enumerate the states $z_i$, in continuous time the calculation of $w_{gh}$ can be time-consuming because of the normalizing integrals $c_i[g_i, a_i]$. From Equation 1, we have, omitting the conditioning:

$$\frac{g_i(z_i)}{h_i(z_i)} = \frac{\int_{\Omega_i} a_i(\zeta)\, g_i(\zeta)\, d\zeta}{a_i(z_i)},$$

which can either be approximated by $(1 - \phi_i(\cdot))/\phi_i(\cdot)$, or calculated piecewise. The approximation method assumes the learned acceptance probability produces a proper conditional probability distribution for each situation. In our experiments, we adopt the approximation method and compare the results to show it does not introduce significant bias. We also compare using a restricted feature set where an exact, piecewise computation is possible.

Several properties of CTBNs make our approach appealing. First, CTBNs possess the Markov property; namely, the next state is independent of previous states given the current one. Second, CTBNs are homogeneous processes, so the model rate parameters are shared across intervals. We leverage these facts when learning each acceptance probability $a_i$. The Markov property simplifies the learned probability of acceptance $\phi_i(z_i \mid z_1^{(i-1)}, e, f, g)$ to $\phi_i(z_i \mid z_{i-1}, e, f, g)$. Homogeneity simplifies the learning process because, if $z_{i-1} = z_{j-1}$ and $t_{i,0} = t_{j,0}$ for $j \neq i$, then $\phi_i(z_i \mid z_{i-1}, e, f, g) = \phi_j(z_j \mid z_{j-1}, e, f, g)$. The degeneracy of these two cases indicates that the probability of acceptance is situation-dependent and interval-index-independent, so a single classifier can be learned in place of $k$ classifiers.

5 Experiments

We compare our learning-based rejection method (setting $\alpha = 2$) with the Fan et al. sampler (2010). We learn a logistic regression (LR) model using online, stochastic gradient descent for each CTBN variable state. An LR data example is $(y, x, w)$, where $y$ is one of {accept, reject}, $x$ is a set of features, and $w$ is the sequence completion weight. For our experiments we use per-variable features encoding the state (as indicator variables), the time from the current time to next evidence, the time from the proposed sample to next evidence, and the time from the proposed sample to next matching evidence. Except in the restricted feature set, the times are mapped to the interval $[0, 1]$ by using a $1 - e^{-t/\lambda}$ transformation, with $\lambda \in \{10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}\}$ to capture effects at different time scales. The restricted feature set uses linear times truncated at a maximum time of 10. This allows for piecewise calculation of the normalizing functional. Because of the high variance in weights for samples of full sequence completions, we instead choose a local weight: the weight of the sequence through the next $m = 10$ evidence times. This biases the learner to have low variance weights within the local window, but it does not bias the proposal.
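A compact sketch of this experimental setup's classifier follows: times squashed with $1 - e^{-t/\lambda}$ at several scales and a logistic regression fit by weighted online SGD. The exact feature layout, learning rate, and epoch count are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

LAMBDAS = [1e-2, 1e-1, 1e0, 1e1, 1e2]

def time_features(times):
    """Map raw times (e.g. time to next evidence, time from the proposed sample
    to the next matching evidence) into [0, 1] at several time scales."""
    return np.array([1.0 - np.exp(-t / lam) for t in times for lam in LAMBDAS])

def sgd_logreg(examples, dim, lr=0.05, epochs=5):
    """Weighted online logistic regression.
    examples: list of (y, x, w) with y in {0, 1} = {reject, accept},
    x a feature vector, w the local sequence-completion weight."""
    beta = np.zeros(dim)
    for _ in range(epochs):
        for y, x, w in examples:
            p = 1.0 / (1.0 + np.exp(-(beta @ x)))
            beta += lr * w * (y - p) * x        # gradient step on the weighted log-loss
    return beta

def acceptance_prob(beta, x, alpha=2.0):
    """Turn the classifier output phi into the acceptance probability
    a = min(1, phi / (alpha * (1 - phi))), as in Section 3.2."""
    phi = 1.0 / (1.0 + np.exp(-(beta @ x)))
    return min(1.0, phi / (alpha * (1.0 - phi)))
```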

[Figure 2: Approximate transition densities (top) of $f^*$ (target), $f$ (target without evidence), $g$ (surrogate), and $h$ (learned rejection surrogate) in a one-variable, binary-state CTBN with uniform transition rates of 0.1 and matching evidence at t=5. The learned distribution $h$ closely mimics $f^*$, the target distribution with evidence, while $g$ was constructed to mimic $f$ (exactly, in this situation). All methods recover the weighted transition densities (bottom); for 20 evidence points with one at t=5 and 19 after t=20, $h$ recovers the target distribution more precisely than $g$ per $10^6$ samples.]

We analyze the performance of the rejection-based sampler by inspection of learned transition probability densities and the effective sample size (ESS) (Kong, Liu, and Wong 1994). ESS is an indicator of the quality of the samples, and a larger value is better: ESS $= 1/\sum_{i=1}^{n} (W^i)^2$, where $W^i = w^i / \sum_{j=1}^{n} w^j$. We test our method in several models: one-, two- and three-variable, strong-cycle binary-state CTBNs, and the part-binary, 8-variable drug model presented in the original CTBN paper (Nodelman, Shelton, and Koller 2002). The strong-cycle models encode an intensity path for particular joint states by shifting bit registers and filling in the emptied register with a 0 or 1, as in the following example. For the 3-variable strong cycle, intensities involved in the path are 1, and intensities are 0.1 for all other transitions. We generate sequences from ground truth models and censor each to retain only 100 point evidences with times $t_i$ drawn randomly, uniformly over the duration [0, 20). The code is provided at http://cs.wisc.edu/~jcweiss/aaai15.

Figure 2 (top) illustrates the ability of $h$ to mimic $f^*$, the target distribution, in a one-node binary-state CTBN with matching evidence at t = 5. The Fan et al. proposal $g$ matches the target density in the absence of evidence, $f$. However, when approaching evidence (at t = 5), the transition probability given evidence goes to 0, as the next transition must also occur before t = 5 to be a viable sequence. Only $f^*$ and $h$ exhibit this behavior. Figure 2 (bottom) shows the density approximations after weighting the samples, given a trajectory with evidence at t = 5 and 19 evidence points after t = 20. Each method recovers the target distribution, but $h$ does so more precisely than $g$, given a fixed number of samples (one million). The proposal acceptance rate for $h$ was measured to be 45 percent. The proposal based on the restricted feature set for unbiased inference is provided in the Supplement.

[Table 1: Geometric mean of effective sample size (ESS) over 100 sequences, each with 100 observations; ESS is per $10^5$ samples, for the Fan et al. proposal ($g$) and Rejection SIS ($h$) on the strong cycle models (n = 1, 2, 3) and the drug model. The proposal $h$ was learned with 1000 sequences.]

Table 1 shows that the learned, rejection-based proposal $h$ outperforms the other CTBN importance sampler $g$ across all 4 models, resulting in an ESS 2 to 10 times larger. Generally, as the number of variables increases, the ESS decreases because of the increasing mismatch between $f^*$ and $g$. With an average ESS of only 29 in an 8-variable model, as we increase model sizes, we expect that $g$ would fail to produce a meaningful sample distribution more quickly than would $h$. Figure 3 shows that the weight distribution from $h$ is narrower than that from $g$ on the log scale, using the drug model and 10 evidence points. The interpretation is that a larger fraction of examples from $h$ contribute substantially to the total weight, resulting in a lower variance sample distribution. For example, any sample with weight below $e^{-10}$ has negligible relative weight and does not substantially affect the sample distribution. There are many fewer such samples generated from $h$ than from $g$.

[Figure 3: Distribution of log weights. For sample completions of a trajectory with 10 evidence points in the drug model, the distribution of log weights using $h$ is much narrower than the distribution of log weights using $g$ (bottom).]
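The ESS used in Table 1 and the log-weight comparison behind Figure 3 can be reproduced from any set of importance weights with a few lines; the synthetic lognormal weights below are only an illustration of how a narrower log-weight distribution translates into a larger ESS.

```python
import numpy as np

def effective_sample_size(w):
    """ESS = 1 / sum_i (W^i)^2, where W^i are the normalized importance weights."""
    w = np.asarray(w, dtype=float)
    W = w / w.sum()
    return 1.0 / np.sum(W ** 2)

rng = np.random.default_rng(4)
w_wide   = np.exp(rng.normal(0.0, 5.0, size=10_000))   # heavy-tailed log-weights (mismatched proposal)
w_narrow = np.exp(rng.normal(0.0, 1.0, size=10_000))   # narrow log-weights (learned proposal)
print(effective_sample_size(w_wide), effective_sample_size(w_narrow))
```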

6 Conclusion

Our work has demonstrated that machine learning can be used to improve sequential importance sampling via a rejection sampling framework. We showed that the proposal and target distributions are related by an expected weight ratio, and that the weight ratio can be estimated by the probabilistic output of a binary classifier learned from weighted, local importance samples. We extended the algorithm to CTBNs, where we found experimentally that using our learning algorithm produces a sampling distribution closer to the target and generates more effective samples. Continued investigations are warranted, including the use of non-parametric learning algorithms, a procedure for expanding local approximations, and extensions to other graphical models, where conjugacy of the acceptance probability with the proposal distribution could lead to improved performance.

7 Acknowledgments

We gratefully acknowledge the CIBM Training Program grant 5T15LM007359, the NIGMS grant R01GM097618, the NLM grant R01LM011028, and the NSF IIS grant.

A Appendix

To derive Equation 4, we relate the target densities $f_i^*(z_i \mid z_1^{(i-1)})$ with the proposal densities $g_i(z_i \mid z_1^{(i-1)})$ via the (standard) derivation of Equation 3 using Bayes' theorem, the law of total probability, and substitution:

$$f_i^*(z_i \mid z_1^{(i-1)}) = \frac{p(e \mid z_1^i)}{p(e \mid z_1^{(i-1)})}\, p(z_i \mid z_1^{(i-1)})$$
$$= \frac{\sum_{z_{i+1}^k} p(e \mid z_{i+1}^k, z_1^i)\, p(z_{i+1}^k \mid z_1^i)}{\sum_{z_i^k} p(e \mid z_i^k, z_1^{(i-1)})\, p(z_i^k \mid z_1^{(i-1)})}\, p(z_i \mid z_1^{(i-1)})$$
$$= \frac{\sum_{z_{i+1}^k} [\bigwedge_{l \in \kappa} \{z_l = e_l\} \mid z]\, p(z_{i+1}^k \mid z_1^i)}{\sum_{z_i^k} [\bigwedge_{l \in \kappa} \{z_l = e_l\} \mid z]\, p(z_i^k \mid z_1^{(i-1)})}\, p(z_i \mid z_1^{(i-1)})$$
$$= \frac{E_g[[e, z] \prod_{j=i+1}^{k} w_j(z_j \mid z_1^{(j-1)}) \mid z_1^i]}{E_g[[e, z] \prod_{j=i}^{k} w_j(z_j \mid z_1^{(j-1)}) \mid z_1^{(i-1)}]}\, p(z_i \mid z_1^{(i-1)})$$
$$= \frac{E_g[[e, z] \prod_{j=i}^{k} w_j(z_j \mid z_1^{(j-1)}) \mid z_1^i]}{E_g[[e, z] \prod_{j=i}^{k} w_j(z_j \mid z_1^{(j-1)}) \mid z_1^{(i-1)}]}\, g_i(z_i \mid z_1^{(i-1)})$$
$$= \frac{E_g[w^a_{i:k}]}{E_g[w^r_{i:k}]}\, g_i(z_i \mid z_1^{(i-1)}).$$

References

Andrieu, C.; Doucet, A.; and Holenstein, R. 2010. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72(3).

Cheng, J., and Druzdzel, M. J. 2000. AIS-BN: An adaptive importance sampling algorithm for evidential reasoning in large Bayesian networks. Journal of Artificial Intelligence Research 13(1).

Chopin, N.; Jacob, P.; Papaspiliopoulos, O.; et al. 2011. SMC²: A sequential Monte Carlo algorithm with particle Markov chain Monte Carlo updates. J. R. Stat. Soc. B (2012, to appear).

Cornebise, J.; Moulines, E.; and Olsson, J. 2008. Adaptive methods for sequential importance sampling with application to state space models. Statistics and Computing 18(4).

Doucet, A.; Godsill, S.; and Andrieu, C. 2000. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing 10(3).

Fan, Y.; Xu, J.; and Shelton, C. R. 2010. Importance sampling for continuous time Bayesian networks. The Journal of Machine Learning Research 11.

Kong, A.; Liu, J. S.; and Wong, W. H. 1994. Sequential imputations and Bayesian missing data problems. Journal of the American Statistical Association 89(425).

Liu, J. S.; Chen, R.; and Wong, W. H. 1998. Rejection control and sequential importance sampling. Journal of the American Statistical Association 93(443).

Montemerlo, M.; Thrun, S.; Koller, D.; and Wegbreit, B. 2003. FastSLAM 2.0: An improved particle filtering algorithm for simultaneous localization and mapping that provably converges. In International Joint Conference on Artificial Intelligence, volume 18. Lawrence Erlbaum Associates Ltd.

Nodelman, U.; Shelton, C.; and Koller, D. 2002. Continuous time Bayesian networks. In Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc.

Rao, V., and Teh, Y. W. 2011. Fast MCMC sampling for Markov jump processes and continuous time Bayesian networks. In Uncertainty in Artificial Intelligence.

Sturlaugson, L., and Sheppard, J. W. 2014. Inference complexity in continuous time Bayesian networks. In Uncertainty in Artificial Intelligence.

Weiss, J.; Natarajan, S.; and Page, D. 2012. Multiplicative forests for continuous-time processes. In Advances in Neural Information Processing Systems 25.

Wolfel, M., and Faubel, F. 2007. Considering uncertainty by particle filter enhanced speech features in large vocabulary continuous speech recognition. In Acoustics, Speech and Signal Processing, ICASSP 2007, IEEE International Conference on, volume 4, IV-1049. IEEE.

Xu, J., and Shelton, C. R. 2010. Intrusion detection using continuous time Bayesian networks. Journal of Artificial Intelligence Research 39(1).

Yuan, C., and Druzdzel, M. J. 2003. An importance sampling algorithm based on evidence pre-propagation. In Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc.

Yuan, C., and Druzdzel, M. J. 2007a. Importance sampling for general hybrid Bayesian networks. In Artificial Intelligence and Statistics.

Yuan, C., and Druzdzel, M. J. 2007b. Improving importance sampling by adaptive split-rejection control in Bayesian networks. In Advances in Artificial Intelligence. Springer.


More information

1. Consider the trigonometric function f(t) whose graph is shown below. Write down a possible formula for f(t).

1. Consider the trigonometric function f(t) whose graph is shown below. Write down a possible formula for f(t). . Consider te trigonometric function f(t) wose grap is sown below. Write down a possible formula for f(t). Tis function appears to be an odd, periodic function tat as been sifted upwards, so we will use

More information

1 1. Rationalize the denominator and fully simplify the radical expression 3 3. Solution: = 1 = 3 3 = 2

1 1. Rationalize the denominator and fully simplify the radical expression 3 3. Solution: = 1 = 3 3 = 2 MTH - Spring 04 Exam Review (Solutions) Exam : February 5t 6:00-7:0 Tis exam review contains questions similar to tose you sould expect to see on Exam. Te questions included in tis review, owever, are

More information

Bounds on the Moments for an Ensemble of Random Decision Trees

Bounds on the Moments for an Ensemble of Random Decision Trees Noname manuscript No. (will be inserted by te editor) Bounds on te Moments for an Ensemble of Random Decision Trees Amit Durandar Received: / Accepted: Abstract An ensemble of random decision trees is

More information

Pre-Calculus Review Preemptive Strike

Pre-Calculus Review Preemptive Strike Pre-Calculus Review Preemptive Strike Attaced are some notes and one assignment wit tree parts. Tese are due on te day tat we start te pre-calculus review. I strongly suggest reading troug te notes torougly

More information

An Empirical Bayesian interpretation and generalization of NL-means

An Empirical Bayesian interpretation and generalization of NL-means Computer Science Tecnical Report TR2010-934, October 2010 Courant Institute of Matematical Sciences, New York University ttp://cs.nyu.edu/web/researc/tecreports/reports.tml An Empirical Bayesian interpretation

More information

LATTICE EXIT MODELS S. GILL WILLIAMSON

LATTICE EXIT MODELS S. GILL WILLIAMSON LATTICE EXIT MODELS S. GILL WILLIAMSON ABSTRACT. We discuss a class of problems wic we call lattice exit models. At one level, tese problems provide undergraduate level exercises in labeling te vertices

More information

3.4 Algebraic Limits. Ex 1) lim. Ex 2)

3.4 Algebraic Limits. Ex 1) lim. Ex 2) Calculus Maimus.4 Algebraic Limits At tis point, you sould be very comfortable finding its bot grapically and numerically wit te elp of your graping calculator. Now it s time to practice finding its witout

More information

Excursions in Computing Science: Week v Milli-micro-nano-..math Part II

Excursions in Computing Science: Week v Milli-micro-nano-..math Part II Excursions in Computing Science: Week v Milli-micro-nano-..mat Part II T. H. Merrett McGill University, Montreal, Canada June, 5 I. Prefatory Notes. Cube root of 8. Almost every calculator as a square-root

More information

Differentiation in higher dimensions

Differentiation in higher dimensions Capter 2 Differentiation in iger dimensions 2.1 Te Total Derivative Recall tat if f : R R is a 1-variable function, and a R, we say tat f is differentiable at x = a if and only if te ratio f(a+) f(a) tends

More information

Experimental Validation of Cooperative Formation Control with Collision Avoidance for a Multi-UAV System

Experimental Validation of Cooperative Formation Control with Collision Avoidance for a Multi-UAV System Proceedings of te 6t International Conference on Automation, Robotics and Applications, Feb 17-19, 2015, Queenstown, New Zealand Experimental Validation of Cooperative Formation Control wit Collision Avoidance

More information

REVIEW LAB ANSWER KEY

REVIEW LAB ANSWER KEY REVIEW LAB ANSWER KEY. Witout using SN, find te derivative of eac of te following (you do not need to simplify your answers): a. f x 3x 3 5x x 6 f x 3 3x 5 x 0 b. g x 4 x x x notice te trick ere! x x g

More information

Numerical Analysis MTH603. dy dt = = (0) , y n+1. We obtain yn. Therefore. and. Copyright Virtual University of Pakistan 1

Numerical Analysis MTH603. dy dt = = (0) , y n+1. We obtain yn. Therefore. and. Copyright Virtual University of Pakistan 1 Numerical Analysis MTH60 PREDICTOR CORRECTOR METHOD Te metods presented so far are called single-step metods, were we ave seen tat te computation of y at t n+ tat is y n+ requires te knowledge of y n only.

More information

1. Questions (a) through (e) refer to the graph of the function f given below. (A) 0 (B) 1 (C) 2 (D) 4 (E) does not exist

1. Questions (a) through (e) refer to the graph of the function f given below. (A) 0 (B) 1 (C) 2 (D) 4 (E) does not exist Mat 1120 Calculus Test 2. October 18, 2001 Your name Te multiple coice problems count 4 points eac. In te multiple coice section, circle te correct coice (or coices). You must sow your work on te oter

More information

Artificial Neural Network Model Based Estimation of Finite Population Total

Artificial Neural Network Model Based Estimation of Finite Population Total International Journal of Science and Researc (IJSR), India Online ISSN: 2319-7064 Artificial Neural Network Model Based Estimation of Finite Population Total Robert Kasisi 1, Romanus O. Odiambo 2, Antony

More information

A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES

A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES Ronald Ainswort Hart Scientific, American Fork UT, USA ABSTRACT Reports of calibration typically provide total combined uncertainties

More information

Blueprint End-of-Course Algebra II Test

Blueprint End-of-Course Algebra II Test Blueprint End-of-Course Algebra II Test for te 2001 Matematics Standards of Learning Revised July 2005 Tis revised blueprint will be effective wit te fall 2005 administration of te Standards of Learning

More information

4. The slope of the line 2x 7y = 8 is (a) 2/7 (b) 7/2 (c) 2 (d) 2/7 (e) None of these.

4. The slope of the line 2x 7y = 8 is (a) 2/7 (b) 7/2 (c) 2 (d) 2/7 (e) None of these. Mat 11. Test Form N Fall 016 Name. Instructions. Te first eleven problems are wort points eac. Te last six problems are wort 5 points eac. For te last six problems, you must use relevant metods of algebra

More information

Combining functions: algebraic methods

Combining functions: algebraic methods Combining functions: algebraic metods Functions can be added, subtracted, multiplied, divided, and raised to a power, just like numbers or algebra expressions. If f(x) = x 2 and g(x) = x + 2, clearly f(x)

More information

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Statistica Sinica 24 2014, 395-414 doi:ttp://dx.doi.org/10.5705/ss.2012.064 EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Jun Sao 1,2 and Seng Wang 3 1 East Cina Normal University,

More information

Chapter 2 Limits and Continuity

Chapter 2 Limits and Continuity 4 Section. Capter Limits and Continuity Section. Rates of Cange and Limits (pp. 6) Quick Review.. f () ( ) () 4 0. f () 4( ) 4. f () sin sin 0 4. f (). 4 4 4 6. c c c 7. 8. c d d c d d c d c 9. 8 ( )(

More information

Material for Difference Quotient

Material for Difference Quotient Material for Difference Quotient Prepared by Stepanie Quintal, graduate student and Marvin Stick, professor Dept. of Matematical Sciences, UMass Lowell Summer 05 Preface Te following difference quotient

More information

Derivatives of Exponentials

Derivatives of Exponentials mat 0 more on derivatives: day 0 Derivatives of Eponentials Recall tat DEFINITION... An eponential function as te form f () =a, were te base is a real number a > 0. Te domain of an eponential function

More information

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA EXAMINATION MODULE 5

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA EXAMINATION MODULE 5 THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA EXAMINATION NEW MODULAR SCHEME introduced from te examinations in 009 MODULE 5 SOLUTIONS FOR SPECIMEN PAPER B THE QUESTIONS ARE CONTAINED IN A SEPARATE FILE

More information

Continuous Stochastic Processes

Continuous Stochastic Processes Continuous Stocastic Processes Te term stocastic is often applied to penomena tat vary in time, wile te word random is reserved for penomena tat vary in space. Apart from tis distinction, te modelling

More information

Near-Optimal conversion of Hardness into Pseudo-Randomness

Near-Optimal conversion of Hardness into Pseudo-Randomness Near-Optimal conversion of Hardness into Pseudo-Randomness Russell Impagliazzo Computer Science and Engineering UC, San Diego 9500 Gilman Drive La Jolla, CA 92093-0114 russell@cs.ucsd.edu Ronen Saltiel

More information