Learning to Reject Sequential Importance Steps for Continuous-Time Bayesian Networks
Jeremy C. Weiss, University of Wisconsin-Madison, Madison, WI, US
Sriraam Natarajan, Indiana University, Bloomington, IN, US
C. David Page, University of Wisconsin-Madison, Madison, WI, US

Abstract

Applications of graphical models often require the use of approximate inference, such as sequential importance sampling (SIS), for estimation of the model distribution given partial evidence, i.e., the target distribution. However, when SIS proposal and target distributions are dissimilar, such procedures lead to biased estimates or require a prohibitive number of samples. We introduce ReBaSIS, a method that better approximates the target distribution by sampling variable by variable from existing importance samplers and accepting or rejecting each proposed assignment in the sequence: a choice made based on anticipating upcoming evidence. We relate the per-variable proposal and model distributions by expected weight ratios of sequence completions and show that we can learn accurate models of optimal acceptance probabilities from local samples. In a continuous-time domain, our method improves upon previous importance samplers by transforming an SIS problem into a machine learning one.

1 Introduction

Sequential importance sampling (SIS) is a method for approximating an intractable target distribution by sampling from a proposal distribution and weighting the sample by the ratio of target to proposal distribution probabilities at each step of the sequence. It provides the basis for many distribution approximations with applications including robotic environment mapping and speech recognition (Montemerlo et al. 2003; Wölfel and Faubel 2007). The characteristic shortcoming of importance sampling stems from the potentially high weight variance that results from large differences in the target and proposal densities.
SIS compounds this problem by iteratively sampling over the steps of the sequence, resulting in sequence weights equal to the product of step weights. The sequence weight distribution is exponential, so only the high-weight tail of the sample distribution contributes substantially to the approximation. Two approaches to mitigate this problem are filtering, e.g., (Doucet, Godsill, and Andrieu 2000; Fan, Xu, and Shelton 2010), and adaptive importance sampling, e.g., (Cornebise, Moulines, and Olsson 2008; Yuan and Druzdzel 2003; 2007a).

One drawback of filtering is that it does not efficiently account for future evidence, and in cases of severe proposal-evidence mismatch, many resampling steps are required, leading to sample impoverishment. Akin to adaptive importance sampling, our method addresses proposal-evidence mismatch by developing foresight, i.e., adaptation to approaching evidence, to guide its proposals. It develops foresight by learning a binary classifier, dependent on approaching evidence, to accept or reject each proposal. Our procedure can be viewed as the construction of a second proposal distribution, learned to account for evidence and to better approximate the target distribution, and is a new form of adaptive importance sampling.

In greater detail, our task is to recover a target distribution f*, which can be factored variable by variable into component conditional distributions f*_i for i in 1...k. The SIS framework provides a suboptimal surrogate distribution g, which likewise can be factored into a set of conditional distributions g_i. We propose a second surrogate distribution h closer to f* based on learning conditional acceptance probabilities a_i of rejection samplers relating f*_i and g_i. That is, to sample from h, we iteratively (re-)sample from proposals g_i and accept with probability a_i.

(Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.)
Our key idea is to relate the proposal g_i and target f*_i distributions by the ratio of expected weights of sequence completions, i.e., a setting for each variable from i to k, given acceptance and rejection of the sample from g_i. Given the expected weight ratio, we can recover the optimal acceptance probability a*_i and thus f*_i. Unfortunately, the calculation of the expected weights, and thus the ratio, is typically intractable because of the exponential number of sequence completions. Instead, we can approximate it using machine learning. First, we show that the expected weight ratio equals the odds of accepting a proposal from g_i under the f*_i distribution. Then, transforming the odds to a probability, we can learn a binary classifier for the probability of acceptance under f*_i given the sample proposal from g_i. Finally, we show how to generate examples to train a classifier to make the optimal accept/reject decision. We specifically examine the application of our rejection-based SIS (ReBaSIS) algorithm to continuous-time Bayesian networks (CTBNs) (Nodelman, Shelton, and Koller 2002), where inference (even approximate) is NP-hard (Sturlaugson and Sheppard 2014).
We proceed as follows. First we present an example and related work, and we outline the problem of SIS and the identification of good proposal distributions. Then, we define rejection sampling within SIS and show how to approximate the target distribution via binary classification. Finally, we extend our analysis to CTBNs and describe experiments that show the empirical advantages of our method over previous CTBN importance samplers.

1.1 An Illustrative Example

Figure 1 describes our method in the simplest relevant example: a binary-state Markov chain. For our example, let k = 3; then we have evidence that z_3 = 1. One possible sample procedure could be: S, accept z_1^2, reject z_2^2, accept z_2^1, reject z_3^2, accept z_3^1, T, giving us the path: S, z_1^2, z_2^1, z_3^1, T. Note that if the proposal g_3: z_2^1 → z_3^1 were very improbable under g but not f (i.e., proposal-evidence mismatch), all samples running through z_2^1 would have very large weight. By introducing the possibility of rejection at each step, our procedure can learn to reject samples to z_3^2, reducing the importance sampling weight, and learn to enter states z_2^1 and z_2^2 proportionally to f(·|e), i.e., develop foresight.

1.2 Related Work

As mentioned above, batch resampling techniques based on rejection control (Liu, Chen, and Wong 1998; Yuan and Druzdzel 2007b) or sequential Monte Carlo (SMC) (Doucet, Godsill, and Andrieu 2000; Fan, Xu, and Shelton 2010), i.e., particle filtering, can mitigate the SIS weight variance problem, but they can lead to reduced particle diversity, especially when many resampling iterations are required. Particle smoothing (Fan, Xu, and Shelton 2010) combats particle impoverishment, but the exponentially-large state spaces used in CTBNs limit its ability to find alternative, probable sample histories.
Previous adaptive importance sampling methods rely on structural knowledge and other inference methods, e.g., (Cheng and Druzdzel 2000; Yuan and Druzdzel 2003), to develop improved proposals, whereas our method learns a classifier to help guide samples through regions of proposal-evidence mismatch. One interesting idea combining work in filtering and adaptive importance sampling is the SMC² algorithm (Chopin et al. 2011), which maintains a sample distribution over both particles and parameters determining the proposal distribution, resampling along either dimension as necessary. The method does not anticipate future evidence, so it may complement our work, which can similarly be used in the SMC framework. Other MCMC (Rao and Teh 2011) or particle MCMC (Andrieu, Doucet, and Holenstein 2010) methods may have trouble in large state spaces (e.g., CTBNs) with multiple modes and low density regions in between, especially if there is proposal-evidence mismatch.

2 Background

We begin with a review of importance sampling and then introduce our surrogate distribution. Let f be a p.d.f. defined on an ordered set of random variables Z = {Z_1, ..., Z_k} over an event space Ω. We are interested in the conditional distribution f(z|e), where evidence e is a set of observations of values {Z_i = z_i} for a subset of indices i ∈ κ. For fixed e, we define our target p.d.f. f*(z) = f(z|e).

Figure 1: A source-to-sink representation of a binary-state Markov chain with evidence at z_k (red). Distributions f and g are defined over paths from source S to sink T and are composed of element-wise distributions f_i and g_i. For a sample at state z_1^2 (dark blue), an assignment to z_2^2 is proposed (light blue) according to g_2. To mimic sampling from f*_2 = f_2(·|e), the proposed assignment is accepted with probability proportional to the ratio of expected weights of path completions from z_2^2 and z_1^2 to T.
Let g(z) be a surrogate distribution from which we can sample, such that if f*(z) > 0 then g(z) > 0. Then for any subset Z' ⊆ Ω, we can approximate f* with n weighted samples from g:

∫_{Z'} f*(z) dz = ∫_{Z'} (f*(z)/g(z)) g(z) dz
              ≈ (1/n) Σ_{i=1}^n 1[z^i ∈ Z'] f*(z^i)/g(z^i)
              = (1/n) Σ_{i=1}^n 1[z^i ∈ Z'] w^i

where 1[z^i ∈ Z'] is the indicator function with value 1 if z^i ∈ Z' and 0 otherwise, and w^i is the importance sample weight. We design a second surrogate h(z) with the density corresponding to accepting a sample from g:

h(z) = g(z) a(z) ( ∫_Ω g(ζ) a(ζ) dζ )^(-1)    (1)

where a(z) is the sample acceptance probability from g. The last term in Equation 1 is a normalizing functional (of g and a) to ensure that h(z) is a density. Procedurally, we sample from h by (re-)sampling from g and accepting with probability a. The approximation of f* with h is given by:

∫_{Z'} f*(z) dz = ∫_{Z'} (f*(z)/g(z)) (g(z)/h(z)) h(z) dz
              ≈ (1/n) Σ_{i=1}^n 1[z^i ∈ Z'] w_f^i w_g^i

with weights w_f^i = f*(z^i)/g(z^i) and w_g^i = g(z^i)/h(z^i). To ensure that h has the support of f*, we require that both a and g are non-zero everywhere f* is non-zero. Now we can define our optimal resampling density h*(z) using the optimal choice of acceptance probability a*(z) = min(1, f*(z)/(α g(z))), where α is a constant determining the familiar rejection sampler envelope: α g(z). The density h*(z) is optimal in the sense that, for appropriate choice of α such that f*(z) ≤ α g(z) for all z, h*(z) = f*(z). When h(z) = f*(z), the importance weights are exactly 1, and the effective sample size is n. In many applications the direct calculation of f*(z) is intractable or impossible, and thus we cannot directly recover a*(z) or h*(z). However, we can still use these ideas to find h(z) through sequential importance sampling (Liu, Chen, and Wong 1998), which we describe next.

2.1 Sequential Importance Sampling

Sequential importance sampling (SIS) is used when estimating the distribution f over the factorization of Z. In time-series models, Z_i is a random variable over the joint state corresponding to a time step; in continuous-time models, Z_i is the random variable corresponding to an interval. Defining z_{j:i} = {z_j, z_{j+1}, ..., z_i} for i, j ∈ {1, ..., k} and j ≤ i, we have the decomposition:

f*(z) = p(z_1, ..., z_k | e)
      = g_1(z_1 | e) w_1(z_1 | e) ∏_{i=2}^k g_i(z_i | z_{1:i-1}, e) w_i(z_i | z_{1:i-1}, e)    (2)

where p(·) is the probability distribution under f. Equation 2 substitutes p with p.d.f. g by defining functions g_i and w_i(·) = p(·)/g_i(·) and requiring g_i to have the same support as p. Then we define g by the composition of the g_i: g(z|e) = g_1(z_1|e) ∏_{i=2}^k g_i(z_i | z_{1:i-1}, e), and likewise for w. To generate a sample z^j from proposal distribution g(z|e), SIS samples each z_i in order from 1 to k.

3 Methods

In this section we relate the target and proposal decompositions. Recall that we are interested in sampling directly from the conditional distribution f*(z) = p(z|e) for fixed e. We define interval distributions f*_1(z_1) and f*_i(z_i | z_{1:i-1}) such that f*(z) can be factored into the interval distributions: f*(z) = f*_1(z_1) ∏_{i=2}^k f*_i(z_i | z_{1:i-1}). We define the interval distributions:

f*_i(z_i | z_{1:i-1}) = (p(e | z_{1:i}) / p(e | z_{1:i-1})) p(z_i | z_{1:i-1})

for i > 1 and p(e | z_{1:i-1}) > 0, and f*_1(z_1) = p(e | z_1) p(z_1) / p(e) for i = 1 and p(e) > 0.
Then, by the law of total probability, we have:

f*_i(z_i | z_{1:i-1}) = (E_f[1[e, z] | z_{1:i}] / E_f[1[e, z] | z_{1:i-1}]) p(z_i | z_{1:i-1}).    (3)

The indicator function 1[e, z] is shorthand for 1[⋀_{l∈κ} {z_l = e_l} | z] and takes value 1 if the evidence matches z and 0 otherwise. Note that in the sampling framework, Equation 3 corresponds to sampling from the unconditioned proposal distribution p(z_i | z_{1:i-1}) and calculating the expected weight of sample completions z_{i+1:k} and z_{i:k}, given by the indicator functions. This procedure describes the expected outcome obtained by forward sampling with rejection. However, when f and f* are highly dissimilar, the vast majority of samples from f will be rejected, i.e., 1[e, z] = 0 for most z. It may be better to sample from a proposal g with weight function w = f/g so that sampling leads to fewer rejections. Substituting g in Equation 3, we get:

f*_i(z_i | z_{1:i-1}) = (E_g[w_{i:k}^a] / E_g[w_{i:k}^r]) g_i(z_i | z_{1:i-1}).    (4)

The terms E_g[w_{i:k}^a] and E_g[w_{i:k}^r] are the expected forward importance sampling weights of z_{i:k} given acceptance (a) or rejection (r) of proposed assignment z_i. The derivation of Equation 4 is provided in the Appendix. Equation 4 provides the relationship between f*_i and g_i, given by the ratio of expected weights of completion of sample z under acceptance or rejection of z_i. This allows us to further improve g and gives us the sampling distribution h.

3.1 Rejection Sampling to Recover f*_i

Because Equation 4 relates the two distributions, we can generate samples from f*_i by conducting rejection sampling from g_i. Selecting constant α such that f*_i ≤ α g_i, i.e., α g_i is the rejection envelope, we define the optimal interval acceptance probability a*_i by:

a*_i(z_i | z_{1:i-1}) = f*_i(z_i | z_{1:i-1}) / (α g_i(z_i | z_{1:i-1})) = E_g[w_{i:k}^a] / (α E_g[w_{i:k}^r]).    (5)

By defining a*_i for all i, we can generate an unweighted sample from f*(z) in O(k max_i (f*_i(·)/g_i(·))) steps given the weight ratio expectations and appropriate choice of α.
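The per-step propose/accept loop implied by Equation 5 can be sketched in a few lines. The following is a minimal, self-contained Python illustration, not the paper's implementation: the binary chain, the proposal, and the acceptance function are made-up stand-ins for g_i and a*_i, chosen so that acceptance favors ending in state 1 (mimicking evidence z_k = 1). Weight bookkeeping is omitted for brevity.

```python
import random

def propose_g(prev):
    # Illustrative proposal g_i: stay in the current state with prob 0.5.
    return prev if random.random() < 0.5 else 1 - prev

def accept_prob(state, step, k):
    # Stand-in for a_i = f*_i / (alpha * g_i): near the final step,
    # strongly prefer state 1 to anticipate the evidence z_k = 1.
    return 0.9 if (state == 1 or step < k - 1) else 0.1

def rebasis_step(prev, step, k):
    """Resample from g_i until a proposal is accepted."""
    while True:
        z = propose_g(prev)
        if random.random() < accept_prob(z, step, k):
            return z

def sample_sequence(k, z0=0):
    z, seq = z0, []
    for i in range(k):
        z = rebasis_step(z, i, k)
        seq.append(z)
    return seq

random.seed(0)
seqs = [sample_sequence(3) for _ in range(2000)]
# With acceptance biased toward z_3 = 1, most sequences end in state 1.
frac_end_1 = sum(s[-1] == 1 for s in seqs) / len(seqs)
```

With a 50/50 proposal and acceptance probabilities 0.9 vs. 0.1 at the final step, the accepted final state is 1 about 90% of the time, which is the foresight behavior the method aims to learn.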
Thus, if we can recover a*_i for all i, we get h = f* as desired and our procedure generates unweighted samples.

3.2 Estimating the Weight Ratio

The procedure of sampling intervals to completion depends on the expected weight ratio in Equation 4. Unfortunately, exact calculation of the ratio is impractical because the expectations involved require summing over an exponential number of terms. We could resort to estimating it from weighted importance samples: completions of z given z_{1:i} and z given z_{1:i-1}. While possible, this is inefficient because (1) it would require weight estimations for every z_i given z_{1:i-1}, and (2) the estimation of the expected weights itself relies on importance sampling. However, we can cast the estimation of the weight ratio as a machine learning problem of binary classification. We recognize that similar situations, in terms of state z_{1:i}, evidence e, model (providing f), and proposal g, result in similar values of a*_i. Thus, we can learn a binary classifier Φ_i(z_{1:i}, e, f, g) to represent the probability of {acceptance, rejection} = {φ_i(·), 1 − φ_i(·)} as a function of the situation. In particular, the expected weight ratio in Equation 4 is proportional to the odds under f* of accepting the z_i sampled from g_i. The binary classifier provides an estimate of the probability of acceptance φ_i(z_{1:i}, e, f, g), from which we can derive the odds of acceptance. Substituting into Equation 5, we have:

a*_i(z_i | z_{1:i-1}) ≈ (1/α) (φ_i(z_{1:i}, e, f, g) / (1 − φ_i(z_{1:i}, e, f, g))) = â_i(z_i | z_{1:i-1}),

denoting the approximations as â_i for all i. Then our empirical proposal density is:

ĥ(z) = ĥ_1(z_1) ∏_{i=2}^k ĥ_i(z_i)
     = g_1(z_1) â_1(z_1) c_1[g_1, â_1] ∏_{i=2}^k g_i(z_i | z_{1:i-1}) â_i(z_i | z_{1:i-1}) c_i[g_i, â_i],

where the c_i are the normalizing functionals as in Equation 1. We provide pseudocode for the rejection-based SIS (ReBaSIS) procedure in Algorithm 1.

3.3 Training the Classifier Φ_i

Conceptually, generating examples for the classifier Φ_i is straightforward. Given some z_{1:i-1}, we sample z_i from g_i, accept with probability ρ = 1/2, and sample to completion using g to get the importance weight. Then, an example is (y, x, w): y ∈ {accept, reject}, x is a set of features encoding the situation, and w is the importance weight. The training procedure works because the mean weight of the positive examples estimates E_g[w_{i:k}^a], and likewise the mean weight of the negative examples estimates E_g[w_{i:k}^r]. By sampling with training acceptance probability ρ, a calibrated classifier Φ_i^ρ (i.e., one that minimizes the L2 loss) estimates the probability:

ρ E_g[w_{i:k}^a] / (ρ E_g[w_{i:k}^a] + (1 − ρ) E_g[w_{i:k}^r]).

The estimated probability can then be used to recover the expected weight ratio:

E_g[w_{i:k}^a] / E_g[w_{i:k}^r] ≈ ((1 − ρ)/ρ) (φ_i^ρ(z_{1:i}, e, f, g) / (1 − φ_i^ρ(z_{1:i}, e, f, g))),

and thus, the optimal classifier can be used to recover the rejection-based acceptance probability a*_i. We get our particular estimator φ_i/(1 − φ_i) by setting ρ = 1/2, though in principle we could use other values of ρ. In practice, we sample trajectories alternating acceptance and rejection of samples z_i to get a proportion ρ = 1/2. Then, we continue sampling the same trajectory to produce 2k training examples for a sequence of length k. We adopt this procedure for efficiency at the cost of generating related training examples.
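The recovery of the weight ratio from a calibrated classifier can be checked numerically. In the sketch below (illustrative only: the completion-weight distributions and their means are invented), the "classifier" sees a single constant situation, so its calibrated output reduces to the weighted fraction of positive examples; the odds transform then recovers E_g[w^a]/E_g[w^r].

```python
import random

random.seed(1)
rho = 0.5
# Hypothetical completion-weight means: accepting leads to mean weight
# 2.0, rejecting to mean weight 0.5 (made-up numbers for the demo).
E_wa, E_wr = 2.0, 0.5

examples = []
for _ in range(50000):
    accept = random.random() < rho            # training acceptance prob rho
    mean_w = E_wa if accept else E_wr
    w = random.expovariate(1.0 / mean_w)      # noisy completion weight
    examples.append((accept, w))

# A calibrated classifier over a single constant feature reduces to the
# weighted fraction of positives:
w_pos = sum(w for y, w in examples if y)
w_neg = sum(w for y, w in examples if not y)
phi = w_pos / (w_pos + w_neg)
# phi estimates rho*E[w^a] / (rho*E[w^a] + (1-rho)*E[w^r]) = 0.8 here.

# Recover the expected weight ratio via the odds transform:
ratio = ((1 - rho) / rho) * phi / (1 - phi)   # should be near E_wa/E_wr = 4
```

Here the true ratio is 2.0/0.5 = 4, and the odds of the calibrated probability recover it up to Monte Carlo noise, matching the estimator φ/(1 − φ) obtained at ρ = 1/2.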
Pseudocode for the generation of examples to train the classifier is provided in the Supplement. Inevitably, there is some cost to pay to construct ĥ, including time for feature construction, classifier training, and classifier use. However, learning only needs to be done once, while inference may be performed many times. Also, in many challenging problems g will not produce an ESS of any appreciable size, in our case because of a mismatch with f, and in MCMC because of mode-hopping difficulties. Our method adopts an approach complementary to particle methods to help tackle such problems, and we show its utility in the CTBN application.

Algorithm 1 Rejection-based SIS (ReBaSIS)
Input: conditional distributions {f_j} and {g_j}, evidence e; constants α, k; i = 1, z = {}, w = 1; classifiers {φ_j}
Output: sample z with weight w
1:  while i ≤ k do
2:    accept = false
3:    while not accept do
4:      Sample z_i ~ g_i, r ~ U[0, 1]
5:      a = (1/α) φ_i(z_i, z, e, f, g) / (1 − φ_i(z_i, z, e, f, g))
6:      if r < a then
7:        accept = true
8:      end if
9:    end while
10:   z = {z_i, z}, w = w · f_i(z_i) c_i[g_i, a] / (g_i(z_i) a(z_i))
11:   i = i + 1
12: end while
13: return (z, w)

4 Continuous-Time Bayesian Networks

We extend our analysis to a continuous-time model: the continuous-time Bayesian network (CTBN) (Nodelman, Shelton, and Koller 2002), which has applications, for example, in anomaly detection (Xu and Shelton 2010) and medicine (Weiss, Natarajan, and Page 2012). We review the CTBN model and sampling methods and extend our analysis. CTBNs are a model of discrete random variables X_1, X_2, ..., X_d = X over time. The model specifies a directed graph over X that determines the parents of each variable. The parents setting u_X is the joint state of the parents of X. Then, CTBNs assume that the probability of transition for variable X out of state x at time t is given by the exponential distribution q_{x|u} e^{-q_{x|u} t} with rate parameter (intensity) q_{x|u} for parents setting u.
The variable X transitions from state x to x′ with probability Θ_{xx′|u}, where Θ_{xx′|u} is an entry in a state transition matrix Θ_u. A complete CTBN model is described by two components: a distribution B over the initial joint state X, typically represented by a Bayesian network, and a directed graph over nodes representing X with corresponding conditional intensity matrices (CIMs). The CIMs hold the intensities q_{x|u} and state transition probability matrices Θ_u. The likelihood of a CTBN model given data is computed as follows. A trajectory is a sequence of intervals of fixed state. For each interval [t_0, t_1), the duration Δt = t_1 − t_0 passes, and a variable X transitions at t_1 from state x to x′. During the interval all other variables X_i ≠ X remain in their current states x_i. The interval likelihood is given by:

( q_{x|u} e^{-q_{x|u} Δt} ) ( Θ_{xx′|u} ) ∏_{x_i : X_i ≠ X} e^{-q_{x_i|u} Δt}    (6)

where the first factor corresponds to X transitioning, the second to the transition landing in state x′, and the product to the remaining variables X_i resting. Then, the sequence likelihood is given by the product of interval likelihoods:

∏_X ∏_{u ∈ U_X} ∏_x q_{x|u}^{M_{x|u}} e^{-q_{x|u} T_{x|u}} ∏_{x′ ≠ x} Θ_{xx′|u}^{M_{xx′|u}}

where the M_{x|u} (and M_{xx′|u}) are the numbers of transitions out of state x (to state x′), and where the T_{x|u} are the amounts of time spent in x given parents setting u.

4.1 Sampling in CTBNs

Evidence provided in data is typically incomplete, i.e., the joint state is partially or fully unobserved over time. Thus, inference is performed to probabilistically complete the unobserved regions. CTBNs are generative models and provide a sampling framework to complete such regions. Let a trajectory z be a sequence of (state, time) pairs (z_i = {x_{1i}, x_{2i}, ..., x_{di}}, t_i) for i ∈ {0, ..., k}, where x_{ji} is the jth CTBN variable at the ith time, such that the sequence of t_i are in [t_start, t_end). Given an initial state z_0 = {x_{10}, x_{20}, ..., x_{d0}}, transition times are sampled for each variable X according to q_{x|u} e^{-q_{x|u} t}, where x is the active state of X. The variable X_i that transitions in the interval is selected based on the shortest sampled transition time. The state to which X_i transitions is sampled from Θ_{x_i x_i′|u}. Then the transition times are resampled as necessary according to intensities q_{x|u}, noting that these intensities may be different because of potential changes in the parents setting u. The trajectory terminates when all sampled transition times exceed a specified ending time. Previous work by Fan et al. describes a framework for importance sampling, particle filtering, and particle smoothing in CTBNs (Fan, Xu, and Shelton 2010). By sampling from truncated exponentials, they incur importance sample downweights on their samples. Recall that Equation 6 is broken into three components.
The weights f_i/g_i are decomposed likewise: (1) a downweight for the variable x that transitions, according to the ratio of the exponential to the truncated exponential: (1 − e^{-q_{x|u} τ}); (2) a downweight corresponding to a lookahead point-estimate of Θ_u assuming no other variables change until the evidence (we leave this unmodified in our implementation); and (3) a downweight for each resting variable x_i given by the ratio of not sampling a transition in time Δt from an exponential and a truncated exponential: (1 − e^{-q_{x_i|u} τ_i}) / (1 − e^{-q_{x_i|u} (τ_i − Δt)}). Finally, the product of all interval downweights provides the trajectory importance sample weight.

4.2 Rejection-Based Importance Sampling in CTBNs

There is an equivalence between a fixed continuous-time trajectory and the discrete sequences described above. In particular, the rejection-based importance sampling method requires that the number of intervals k be fixed, while CTBNs produce trajectories with varying numbers of intervals. Nevertheless, for any set of trajectories, we can define ε-width intervals small enough that at most one transition occurs per interval and that such transitions occur at the end of the interval. Then for any set of trajectories over the duration [t_start, t_end), we set k = (t_end − t_start)/ε. Using the memoryless property of exponential distributions, the density of a single, one-transition interval is equal to the density of the product of a sequence of ε-width, zero-transition intervals and one ε-width, one-transition interval. In practice, it is simpler to use the CTBN sampling framework so that each interval is of appreciable size. We denote the evidence e as a sequence of tuples of type (state, start time, duration): (e_i, t_{i,0}, τ_i), allowing for point evidence with zero duration, τ_i = 0, and 1[e, z] checks whether z agrees with each e_i over its duration. Unlike discrete time, where we can enumerate the states z_i, in continuous time the calculation of w_g can be time-consuming because of the normalizing integrals c_i[g_i, â_i].
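The CTBN forward-sampling scheme of Section 4.1 (competing exponential transition times, with clocks refreshed after a parent changes) can be sketched for a toy two-variable binary CTBN. This is an illustrative sketch, not the paper's code: the graph (X1 is X2's parent) and the intensity values are invented.

```python
import random

# Intensities q_{x|u} for a toy model: X1 has no parents; X2's rate of
# leaving its state depends on its parent X1 (values are made up).
Q1 = {0: 1.0, 1: 1.0}
Q2 = {(0, 0): 0.2, (1, 0): 0.2,   # keyed by (x2, x1)
      (0, 1): 2.0, (1, 1): 2.0}

def forward_sample(t_end):
    """Forward-sample a trajectory of (time, x1, x2) tuples up to t_end."""
    x1, x2, t = 0, 0, 0.0
    traj = [(t, x1, x2)]
    while True:
        # Competing exponential clocks; the shortest time transitions first.
        dt1 = random.expovariate(Q1[x1])
        dt2 = random.expovariate(Q2[(x2, x1)])
        dt = min(dt1, dt2)
        if t + dt >= t_end:
            return traj
        t += dt
        if dt1 < dt2:
            x1 = 1 - x1   # binary states, so a transition flips the state
        else:
            x2 = 1 - x2
        traj.append((t, x1, x2))
        # Resampling both clocks on the next iteration is valid by the
        # memoryless property, and handles X2's rate change when X1 flips.

random.seed(2)
traj = forward_sample(10.0)
```

Resampling every clock each iteration is a simplification of "resampled as necessary" in the text; by memorylessness it draws from the same distribution while automatically picking up the new parents setting u.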
From Equation, we ave, omitting te conditioning: g i (z i ) i (z i ) = Ω i a i (ζ)g i (ζ)dζ. a i (z i ) wic can eiter be approximated by ( φ i ( ))/φ i ( ), or calculated piecewise. Te approximation metod assumes te learned acceptance probability produces a proper conditional probability distribution for eac situation. In our experiments, we adopt te approximation metod and compare te results to sow it does not introduce significant bias. We also compare using a restricted feature set were an exact, piecewise computation is possible. Several properties of CTBNs make our approac appealing. First, CTBNs possess te Markov property; namely, te next state is independent of previous states given te current one. Second, CTBNs are omogeneous processes, so te model rate parameters are sared across intervals. We leverage tese facts wen learning eac acceptance probability a i. Te Markov property simplifies te learned probability of acceptance φ i (z i z (i ), e, f, g) to φ i (z i z (i ), e, f, g). Homogeneity simplifies te learning process because, if z (i ) = z (j ) and t i,0 = t j,0 for j i, ten φ i (z i z (i ), e, f, g) = φ i (z j z (j ), e, f, g). Te degeneracy of tese two cases indicates tat te probability of acceptance is situation-dependent and interval indexindependent, so a single classifier can be learned in place of k classifiers. 5 Experiments We compare our learning-based rejection metod (setting α = 2) wit te Fan et al. sampler (200). We learn a logistic regression (LR) model using online, stocastic gradient descent for eac CTBN variable state. An LR data example is (y, x, w), were y is one of {accept, reject}, x is a set of features, and w is te sequence completion weigt. For our experiments we use per-variable features encoding te state (as indicator variables), te time from te current time to next evidence, te time from te proposed sample to next evidence, and te time from te proposed sample to next matcing evidence. 
Except in the restricted feature set, the times are mapped to intervals [0, 1] by using a 1 − e^{-t/λ} transformation, with λ ∈ {10^-2, 10^-1, 10^0, 10^1, 10^2} to capture effects at different time scales. The restricted feature set uses linear times truncated at a maximum time of 10. This allows for piecewise calculation of the normalizing functional. Because of the high variance in weights for samples of full sequence completions, we instead choose a local weight: the weight of the sequence through the next m = 10 evidence times. This biases the learner to have low-variance weights within the local window, but it does not bias the proposal.
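The multi-scale time mapping is simple to write down. A minimal sketch (the function name is ours; the λ values follow the text):

```python
import math

# Time-to-evidence features mapped into [0, 1] with 1 - exp(-t/lam) at
# several scales, following the lambda set used in the experiments.
LAMBDAS = [1e-2, 1e-1, 1e0, 1e1, 1e2]

def time_features(t):
    """Map a nonnegative time-to-evidence t to five bounded features."""
    return [1.0 - math.exp(-t / lam) for lam in LAMBDAS]

feats = time_features(1.0)
# Small lambdas saturate toward 1 quickly; large lambdas stay near 0,
# so together the features resolve effects at different time scales.
```

Each feature is bounded in [0, 1], which keeps the logistic regression inputs well-scaled regardless of how far away the next evidence lies.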
Figure 2: Approximate transition densities (top) of f* (target), f (target without evidence), g (surrogate), and ĥ (learned rejection surrogate) in a one-variable, binary-state CTBN with uniform transition rates of 0.1 and matching evidence at t = 5. The learned distribution ĥ closely mimics f*, the target distribution with evidence, while g was constructed to mimic f (exactly, in this situation). All methods recover the weighted transition densities (bottom); for 20 evidence points with one at t = 5 and 19 after t = 20, ĥ recovers the target distribution more precisely than g per 10^6 samples.

We analyze the performance of the rejection-based sampler by inspection of learned transition probability densities and the effective sample size (ESS) (Kong, Liu, and Wong 1994). ESS is an indicator of the quality of the samples, and a larger value is better: ESS = 1 / Σ_{i=1}^n (W^i)^2, where W^i = w^i / Σ_{j=1}^n w^j. We test our method in several models: one-, two-, and three-variable, strong-cycle, binary-state CTBNs, and the part-binary, 8-variable drug model presented in the original CTBN paper (Nodelman, Shelton, and Koller 2002). The strong-cycle models encode an intensity path for particular joint states by shifting bit registers and filling in an empty register with a 0 or 1, as in the following example. For the 3-variable strong cycle, intensities involved in the path are 1, and intensities are 0.1 for all other transitions. We generate sequences from ground truth models and censor each to retain only 100 point evidences with times t_i drawn randomly, uniformly over the duration [0, 20). The code is provided at http://cs.wisc.edu/~jcweiss/aaai15.

Figure 2 (top) illustrates the ability of ĥ to mimic f*, the target distribution, in a one-node binary-state CTBN with matching evidence at t = 5. The Fan et al. proposal g matches the target density in the absence of evidence, f. However, when approaching evidence (at t = 5), the transition probability given evidence goes to 0, as the next transition must also occur before t = 5 to be a viable sequence. Only f* and ĥ exhibit this behavior. Figure 2 (bottom) shows the density approximations after weighting the samples, given a trajectory with evidence at t = 5 and 19 evidence points after t = 20. Each method recovers the target distribution, but ĥ does so more precisely than g, given a fixed number of samples (one million). The proposal acceptance rate for ĥ was measured to be 45 percent. The proposal based on the restricted feature set for unbiased inference is provided in the Supplement.

Table 1: Geometric mean of effective sample size (ESS) over 100 sequences, each with 100 observations; ESS is per 10^5 samples. The proposal ĥ was learned with 1000 sequences.

Model               Fan et al. (g)   Rejection SIS (ĥ)
Strong cycle, n=1         …                 …
Strong cycle, n=2         …                 …
Strong cycle, n=3         …                 …
Drug                      …                 …

Table 1 shows that the learned, rejection-based proposal ĥ outperforms the other CTBN importance sampler g across all 4 models, resulting in an ESS 2 to 10 times larger. Generally, as the number of variables increases, the ESS decreases because of the increasing mismatch between f and g. With an average ESS of only 29 in an 8-variable model, as we increase model sizes, we expect that g would fail to produce a meaningful sample distribution more quickly than would ĥ.

Figure 3 shows that the weight distribution from ĥ is narrower than that from g on the log scale, using the drug model and 10 evidence points. The interpretation is that a larger fraction of examples from ĥ contribute substantially to the total weight, resulting in a lower-variance sample distribution. For example, any sample with weight below e^{-10} has negligible relative weight and does not substantially affect the sample distribution. There are many fewer such samples generated from ĥ than from g.

Figure 3: Distribution of log weights. For sample completions of a trajectory with 10 evidence points in the drug model, the distribution of log weights using ĥ is much narrower than the distribution of log weights using g (bottom).
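The ESS formula used in the evaluation is direct to compute from raw weights; a minimal sketch:

```python
# Effective sample size from importance weights (Kong, Liu, and Wong 1994):
# ESS = 1 / sum_i (W_i)^2, with normalized weights W_i = w_i / sum_j w_j.
def effective_sample_size(weights):
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)

# Uniform weights give ESS = n; one dominant weight drives ESS toward 1,
# which is the low-effective-sample regime that motivates ReBaSIS.
ess_uniform = effective_sample_size([1.0] * 100)
ess_skewed = effective_sample_size([100.0] + [1e-6] * 99)
```

The skewed case mirrors the heavy-tailed weight distributions of Figure 3: a single dominant sample makes the other 99 contribute almost nothing.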
6 Conclusion

Our work has demonstrated that machine learning can be used to improve sequential importance sampling via a rejection sampling framework. We showed that the proposal and target distributions are related by an expected weight ratio, and that the weight ratio can be estimated by the probabilistic output of a binary classifier learned from weighted, local importance samples. We extended the algorithm to CTBNs, where we found experimentally that using our learning algorithm produces a sampling distribution closer to the target and generates more effective samples. Continued investigations are warranted, including the use of non-parametric learning algorithms, a procedure for expanding local approximations, and extensions to other graphical models, where conjugacy of the acceptance probability with the proposal distribution could lead to improved performance.

7 Acknowledgments

We gratefully acknowledge the CIBM Training Program grant 5T15LM007359, the NIGMS grant R0GM09768, the NLM grant R0LM0028, and the NSF grant IIS.

A Appendix

To derive Equation 4, we relate the target densities $f_i(z_i \mid z_{(i-1)})$ with the proposal densities $g_i(z_i \mid z_{(i-1)})$ via the (standard) derivation of Equation 3 using Bayes' theorem, the law of total probability, and substitution:

$$
\begin{aligned}
f_i(z_i \mid z_{(i-1)})
&= \frac{p(e \mid z_{(i)})}{p(e \mid z_{(i-1)})}\, p(z_i \mid z_{(i-1)}) \\
&= \frac{\sum_{z_{i+1}^{k}} p(e \mid z_{i+1}^{k}, z_{(i)})\, p(z_{i+1}^{k} \mid z_{(i)})}{\sum_{z_{i}^{k}} p(e \mid z_{i}^{k}, z_{(i-1)})\, p(z_{i}^{k} \mid z_{(i-1)})}\, p(z_i \mid z_{(i-1)}) \\
&= \frac{\sum_{z_{i+1}^{k}} \big[\prod_{l} \mathbb{1}\{z_l = e_l\}\big]\, p(z_{i+1}^{k} \mid z_{(i)})}{\sum_{z_{i}^{k}} \big[\prod_{l} \mathbb{1}\{z_l = e_l\}\big]\, p(z_{i}^{k} \mid z_{(i-1)})}\, p(z_i \mid z_{(i-1)}) \\
&= \frac{\mathbb{E}_g\!\left[[e, z] \prod_{j=i+1}^{k} w_j(z_j \mid z_{(j-1)}) \,\middle|\, z_{(i)}\right]}{\mathbb{E}_g\!\left[[e, z] \prod_{j=i}^{k} w_j(z_j \mid z_{(j-1)}) \,\middle|\, z_{(i-1)}\right]}\, p(z_i \mid z_{(i-1)}) \\
&= \frac{\mathbb{E}_g\!\left[[e, z] \prod_{j=i}^{k} w_j(z_j \mid z_{(j-1)}) \,\middle|\, z_{(i)}\right]}{\mathbb{E}_g\!\left[[e, z] \prod_{j=i}^{k} w_j(z_j \mid z_{(j-1)}) \,\middle|\, z_{(i-1)}\right]}\, g_i(z_i \mid z_{(i-1)}) \\
&= \frac{\mathbb{E}_g[w^{a}]}{\mathbb{E}_g[w^{r}]}\, g_i(z_i \mid z_{(i-1)}),
\end{aligned}
$$

where $[e, z] = \prod_{l} \mathbb{1}\{z_l = e_l\}$ indicates agreement of the completed sequence $z$ with the evidence $e$, the weights are $w_j(z_j \mid z_{(j-1)}) = p(z_j \mid z_{(j-1)}) / g_j(z_j \mid z_{(j-1)})$, the last substitution uses $p(z_i \mid z_{(i-1)}) = w_i(z_i \mid z_{(i-1)})\, g_i(z_i \mid z_{(i-1)})$, and $w^{a}$ and $w^{r}$ denote the completion weights of accepted and rejected sequences through step $k$.

References

Andrieu, C.; Doucet, A.; and Holenstein, R. 2010. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72(3).

Cheng, J., and Druzdzel, M. J. 2000. AIS-BN: An adaptive importance sampling algorithm for evidential reasoning in large Bayesian networks. Journal of Artificial Intelligence Research 13(1).

Chopin, N.; Jacob, P.; Papaspiliopoulos, O.; et al. SMC²: A sequential Monte Carlo algorithm with particle Markov chain Monte Carlo updates. J. R. Stat. Soc. B (2012, to appear).

Cornebise, J.; Moulines, E.; and Olsson, J. Adaptive methods for sequential importance sampling with application to state space models. Statistics and Computing 18(4).

Doucet, A.; Godsill, S.; and Andrieu, C. 2000. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing 10(3).

Fan, Y.; Xu, J.; and Shelton, C. R. Importance sampling for continuous time Bayesian networks. The Journal of Machine Learning Research 11.

Kong, A.; Liu, J. S.; and Wong, W. H. Sequential imputations and Bayesian missing data problems. Journal of the American Statistical Association 89(425).

Liu, J. S.; Chen, R.; and Wong, W. H. Rejection control and sequential importance sampling. Journal of the American Statistical Association 93(443).

Montemerlo, M.; Thrun, S.; Koller, D.; and Wegbreit, B. 2003. FastSLAM 2.0: An improved particle filtering algorithm for simultaneous localization and mapping that provably converges. In International Joint Conference on Artificial Intelligence, volume 18. Lawrence Erlbaum Associates Ltd.

Nodelman, U.; Shelton, C.; and Koller, D. Continuous time Bayesian networks. In Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc.

Rao, V., and Teh, Y. W. 2011. Fast MCMC sampling for Markov jump processes and continuous time Bayesian networks. In Uncertainty in Artificial Intelligence.

Sturlaugson, L., and Sheppard, J. W. Inference complexity in continuous time Bayesian networks. In Uncertainty in Artificial Intelligence.

Weiss, J.; Natarajan, S.; and Page, D. Multiplicative forests for continuous-time processes. In Advances in Neural Information Processing Systems 25.

Wolfel, M., and Faubel, F. 2007. Considering uncertainty by particle filter enhanced speech features in large vocabulary continuous speech recognition. In Acoustics, Speech and Signal Processing, ICASSP 2007, IEEE International Conference on, volume 4. IEEE.

Xu, J., and Shelton, C. R. Intrusion detection using continuous time Bayesian networks. Journal of Artificial Intelligence Research 39(1).

Yuan, C., and Druzdzel, M. J. An importance sampling algorithm based on evidence pre-propagation. In Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc.

Yuan, C., and Druzdzel, M. J. 2007a. Importance sampling for general hybrid Bayesian networks. In Artificial Intelligence and Statistics.

Yuan, C., and Druzdzel, M. J. 2007b. Improving importance sampling by adaptive split-rejection control in Bayesian networks. In Advances in Artificial Intelligence. Springer.
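The accept/reject relationship derived in the appendix can be illustrated with a minimal sketch. This is not the paper's CTBN implementation: the toy model, its probabilities, and all function names are invented for illustration. The idea it demonstrates is the core one: set the per-variable acceptance probability proportional to an estimated expected completion weight (here, the probability that forward sampling reaches the evidence), and rejection control then reshapes the proposal toward the target posterior.

```python
import random

random.seed(0)

# Toy model (hypothetical): z1 ~ Bernoulli(0.5),
# z2 | z1 ~ Bernoulli(0.9 if z1 == 1 else 0.2).  Evidence: z2 = 1.
# Target distribution: p(z1 | z2 = 1).
P_Z1 = 0.5
P_Z2_GIVEN = {1: 0.9, 0: 0.2}

def completion_weight(z1, n=2000):
    # Estimate E_g[[e, z] | z1] from local forward samples: the fraction
    # of completions that agree with the evidence z2 = 1.
    hits = sum(random.random() < P_Z2_GIVEN[z1] for _ in range(n))
    return hits / n

# Acceptance probabilities proportional to expected completion weights,
# normalized so the largest is 1 (a valid rejection-control scheme).
w = {v: completion_weight(v) for v in (0, 1)}
accept = {v: w[v] / max(w.values()) for v in (0, 1)}

def sample_z1():
    # Rejection-controlled proposal: propose from the prior, then accept
    # with probability proportional to the anticipated evidence weight.
    while True:
        v = 1 if random.random() < P_Z1 else 0
        if random.random() < accept[v]:
            return v

draws = [sample_z1() for _ in range(20000)]
est = sum(draws) / len(draws)
true_post = 0.9 * 0.5 / (0.9 * 0.5 + 0.2 * 0.5)  # exact p(z1=1 | z2=1)
print(est, true_post)
```

Because acceptance is proportional to the expected completion weight, the accepted samples are distributed (up to Monte Carlo error) according to the posterior given the upcoming evidence, which is exactly the target-to-proposal relationship the appendix derives.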