Estimating Causal Direction and Confounding of Two Discrete Variables

Krzysztof Chalupka (Computation and Neural Systems, Caltech), Frederick Eberhardt (Humanities and Social Sciences, Caltech), Pietro Perona (Electrical Engineering, Caltech)

Abstract

We propose a method to classify the causal relationship between two discrete variables given only the joint distribution of the variables, acknowledging that the method is subject to an inherent baseline error. We assume that the causal system is acyclic, but we do allow for hidden common causes. Our algorithm presupposes that the probability distribution P(C) of a cause C is independent of the probability distribution P(E|C) of the cause-effect mechanism. While our classifier is trained with a Bayesian assumption of flat hyperpriors, we do not make this assumption about our test data. This work connects to recent developments on the identifiability of causal models over continuous variables under the assumption of independent mechanisms. Carefully-commented Python notebooks that reproduce all our experiments are available online at vision.caltech.edu/~kchalupk/code.html.

1 Introduction

Take two discrete variables X and Y that are probabilistically dependent. Assume there is no feedback between the variables: it is not the case that both X causes Y and Y causes X. Further, assume that any probabilistic dependence between variables always arises due to a causal connection between them (Reichenbach, 1991). The fundamental causal problem is then to answer two questions: 1) Does X cause Y, or does Y cause X? And 2) Do X and Y have a common cause H? Since we assume no feedback in the system, the options in 1) are mutually exclusive. Each of them, however, can occur together with a confounder. Fig. 1 enumerates the set of six possible hypotheses.

In this article we present a method to distinguish between these six hypotheses on the basis only of data from the observational joint probability P(X, Y), that is, without resorting to experimental intervention. Within the causal graphical models framework (Pearl, 2000; Spirtes et al., 2000), differentiating between any two of the causally interesting possibilities (shown in Fig. 1B-F) is in general only possible if one has the ability to intervene on the system. For example, to differentiate between the pure-confounding and the direct-causal case (Fig. 1B and C), one can intervene on X and observe whether that has an effect on the distribution of Y. Given only observations of X and Y, and no ability to intervene on the system, the problem is in general not identifiable. Roughly speaking, the reason is simply that without any further assumptions about the form of the distribution, any joint P(X, Y) can be factorized both as P(X)P(Y|X) and as P(Y)P(X|Y), and a hidden confounder H can easily be endowed with a distribution that gives the marginal \sum_H P(X, Y, H) any desired form.

2 Related Work

There are two common remedies to the fundamental unidentifiability of the two-variable causal system: 1) resort to interventions, or 2) introduce additional assumptions about the system and derive a solution that works under these assumptions. Whereas the first solution is straightforward, research in the second direction is a more recent and exciting enterprise.
2.1 Additive Noise Models

A recent body of work attacks the problem of establishing whether x → y or y → x when specific assumptions about the functional form of the causal relationship hold. Shimizu et al. (2006) showed for continuous variables that when the effect is a linear function of the cause, with non-Gaussian noise, the causal direction can be identified in the limit of infinite sample size. This inspired further work on the so-called additive noise models. Hoyer et al. (2009) extended Shimizu's identifiability

results to the case when the effect is any (except for a measure-theoretically small set of) nonlinear function of the cause and the noise is additive, even Gaussian. Zhang and Hyvärinen (2009) showed that a post-nonlinear model, that is, y = f(g(x) + ε) where f is an invertible function and ε a noise term, is identifiable. The additive noise model framework was applied to discrete variables by Peters et al. (2011). Most of this work has focused on distinguishing the causal orientation in the absence of confounding (i.e. distinguishing hypotheses A, C and D in Fig. 1), although Hoyer et al. (2008) have extended the linear non-Gaussian methods to the general hypothesis space of Fig. 1, and Janzing et al. (2009) showed that the additive noise assumption can be used to detect pure confounding with some success, i.e. to distinguish hypothesis B from hypotheses C and D.

The assumption of additive noise supplies remarkable identifiability results, and has a natural application in many cases where the variation in the data is thought to derive from the measurement process on an otherwise deterministic functional relation. With respect to the more general space of possible probabilistic causal relations, however, it constitutes a very substantive assumption. In particular, in the discrete case its application is no longer so natural.

2.2 Bayesian Causal Model Selection

From a Bayesian perspective, the question of causal model identification is softened to the question of model selection based on the posterior probability of a causal structure given the data. In classic work on Bayesian network learning (then called belief net learning), Heckerman and Chickering (Heckerman et al., 1995; Chickering, 2002a,b) developed a Bayesian scoring criterion that allowed them to define the posterior probability of each possible Bayesian network given a dataset. Motivated by the results of Geiger and Pearl (1988) and Meek (1995), which showed that for linear Gaussian and for multinomial parameterizations, causal models with identical (conditional) independence structure cannot in principle be distinguished given only the observational joint distribution over the observed variables, their score had the property that it assigned the same score to graphs with the same independence structure.

But the results of Geiger and Pearl (1988) and Meek (1995) only make an existential claim: that for any joint distribution, a parameterization of the appropriate form exists for any causal structure in the Markov equivalence class. One need not conclude from such an existential claim, as Heckerman and Chickering did, that there are no reasons in the data that could suggest that one causal structure in a Markov equivalence class is more probable than another. Thus, the crucial difference between the work of Heckerman and ours is that their goal was to find the Markov equivalence class of the true causal model. This, however, renders our task impossible: all the hypotheses enumerated in Fig. 1B-F are Markov-equivalent.

[Figure 1: Possible causal structures linking X and Y. X, Y and H are all discrete, but H is unobserved. In principle, it is impossible to identify the correct causal structure given only samples of X and Y. In this report we tackle this problem using a minimalistic set of assumptions; our final result is a classifier that differentiates between these six cases (the confusion matrices are shown in Fig. 9).]
Our contribution is to use assumptions formally identical to theirs (when restricted to the causally sufficient case) to assess the likelihood of causal structures over two observed variables, bearing in mind that even in the limit of infinitely many samples the true model cannot be determined; the structures can only be deemed more or less likely.

Our approach is most similar in spirit to the work of Sun et al. (2006). Sun puts an explicit Bayesian prior on what a likely causal system is: if X causes Y, then the conditionals P(Y|X = x) are less complex than the reverse conditionals P(X|Y = y), where complexity is measured by the Hilbert space norm of the conditional density functions. This formulation is plausible and easily applicable to discrete systems (by defining the complexity of discrete probability tables by their entropy).

3 Assumptions

Our contribution is to create an algorithm with the following properties:

1. It is applicable to discrete variables X and Y with finitely many states.
2. It decides between all six possible graphs shown in Fig. 1.
3. It does not make assumptions about the functional form of the discrete parameterization (in contrast to, e.g., an additive noise assumption).

In a recent review, Mooij et al. (2014) compare a range of methods that decide the causal direction between two variables, including the methods discussed above. To our knowledge, none of these methods attempts to distinguish between the pure-causal (A, C, D), the confounded (B), and the causal-plus-confounded (E, F) cases.

We take an approach inspired by the Bayesian methods discussed in Sec. 2. Consider the Bayesian model in which P(X, Y) is sampled from an uninformative hyperprior with the property that the distribution of the cause is independent of the distribution of the effect conditioned on the cause:

1. Assume that P(effect|cause) ⊥ P(cause).
2. Assume that P(effect|cause = c) is sampled from the uninformative hyperprior for each c.
3. Assume that P(cause) is sampled from the uninformative hyperprior.

Since all the distributions under consideration are multinomial, the uninformative hyperprior is the Dirichlet distribution with parameters all equal to 1 (which we will denote Dir(1), remembering that 1 is actually a vector whose dimensionality will be clear from context). Which variable is a cause and which the effect, or whether there is confounding, depends on which of the causal systems in Fig. 1 is sampled. For example, if X → Y and there is also confounding X ← H → Y (Fig. 1D), then our assumptions set

    P(X) ~ Dir(1)
    P(Y | X = x) ~ Dir(1) for all x
    P(H) ~ Dir(1)
    P(X | H = h) ~ Dir(1) for all h
    P(Y | H = h) ~ Dir(1) for all h
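To make the generative model concrete, here is a minimal NumPy sketch of the sampling procedure for the unconfounded X → Y case. This is our own illustration, not the authors' code; their implementation is in the notebooks at vision.caltech.edu/~kchalupk/code.html, and the function name here is ours.

```python
import numpy as np

def sample_causal_system(k_x, k_y, rng):
    """Sample an unconfounded X -> Y system under the Sec. 3 assumptions:
    P(cause) ~ Dir(1) and, independently, P(effect|cause=c) ~ Dir(1) for each c."""
    p_x = rng.dirichlet(np.ones(k_x))                    # P(X) ~ Dir(1)
    p_y_given_x = rng.dirichlet(np.ones(k_y), size=k_x)  # row x is P(Y | X = x)
    p_xy = p_x[:, None] * p_y_given_x                    # joint table P(X, Y)
    return p_x, p_y_given_x, p_xy

rng = np.random.default_rng(0)
p_x, p_y_given_x, p_xy = sample_causal_system(3, 4, rng)
assert np.isclose(p_xy.sum(), 1.0)  # the joint is a valid probability table
```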
4 An Analytical Solution: Causal Direction

Consider first the problem of identifying the causal direction. That is, assume that either X → Y or Y → X, and that there is no confounding. The assumptions of Sec. 3 then allow us to compute, for any given joint P(X, Y) (which we will from now on denote P_XY to simplify notation), the likelihood p(X → Y | P_XY) and the likelihood p(Y → X | P_XY). The likelihood ratio allows us to decide which causal direction P_XY more likely represents. We first derive and visualize the likelihood for the case of X and Y both binary. Next, we generalize the result to arbitrary X and Y. Finally, we analyze experimentally how sensitive such a causal direction classifier is to breaking the assumption of uninformative Dirichlet hyperpriors (while keeping the independent mechanisms assumption).

4.1 Optimal Classifier for Binary X and Y

Consider first the binary case. Let

    P_X = [a, 1-a],    P_{Y|X} = [[b, 1-b], [c, 1-c]].

Assume P_X is sampled independently of P_{Y|X}, and that the densities (parameterized by a and by b, c) are F_a, F_b, F_c : (0, 1) → R. This defines a density over (a, b, c), the three-dimensional parameterization of an x → y system, as F(a, b, c) = F_a(a) F_b(b) F_c(c) : (0, 1)^3 → R.

Now consider

    P_XY = [[d, e], [f, 1-(d+e+f)]],

a three-dimensional parameterization of the joint. If we assume that P_XY is sampled according to the X → Y sampling procedure, we can compute its density H_XY : (0, 1)^3 → R as a function of F using the multivariate change-of-variables formula. We have

    d = ab,    e = a(1-b),    f = (1-a)c,

and the inverse transformation is

    a = d + e,    b = d/(d+e),    c = f/(1-d-e).    (1)

The Jacobian of the inverse transformation is

    ∂(a, b, c)/∂(d, e, f) =
        [ 1              1              0          ]
        [ e/(d+e)^2     -d/(d+e)^2      0          ]
        [ f/(1-d-e)^2    f/(1-d-e)^2    1/(1-d-e)  ]

and its determinant satisfies |det ∂(a, b, c)/∂(d, e, f)| = 1 / ((d+e)(1-d-e)). The change-of-variables formula then gives us

    H_XY(d, e, f) = F(d+e, d/(d+e), f/(1-d-e)) / ((d+e)(1-d-e)),

where a, b, c are obtained from Eq. (1). Repeating the same reasoning for the inverse causal direction, Y → X, we obtain

    H_YX(d, e, f) = F(d+f, d/(d+f), e/(1-d-f)) / ((d+f)(1-d-f)).

Given P_XY and the hyperpriors F, we can now test which causal direction P_XY most likely corresponds to. Assuming equal priors on both causal directions, we have

    p(X → Y | (d, e, f)) / p(Y → X | (d, e, f)) = H_XY(d, e, f) / H_YX(d, e, f)
        = [ F(d+e, d/(d+e), f/(1-d-e)) / F(d+f, d/(d+f), e/(1-d-f)) ]
          × [ (d+f)(1-d-f) ] / [ (d+e)(1-d-e) ].    (2)

Only the first factor in the likelihood ratio depends on the hyperprior F. If we fix F_a, F_b, F_c all to be Dir(1), that factor reduces to 1 and the likelihood ratio becomes

    p(X → Y | (d, e, f)) / p(Y → X | (d, e, f)) = [ (d+f)(1-d-f) ] / [ (d+e)(1-d-e) ].

Denote the uninformative-hyperprior likelihood-ratio function

    LR: P_XY = (d, e, f) ↦ [ (d+f)(1-d-f) ] / [ (d+e)(1-d-e) ].

The classifier that assigns the X → Y class to any P_XY with LR(P_XY) > 1, and the Y → X class otherwise, is the optimal classifier under our assumptions. Fig. 2 shows LR across the three-dimensional P_XY simplex; the figure shows nine slices of this simplex for different values of the d coordinate.

[Figure 2: Log likelihood-ratio log(p(X → Y | (d, e, f)) / p(Y → X | (d, e, f))) as a function of e and f, for nine different values of d. Red corresponds to values larger than 0, that is, regions where X → Y is more likely than the opposite causal direction; blue signifies the opposite. The decision boundary, shown in black, is a union of two orthogonal planes that cut the (d, e, f) simplex into four connected components along a skewed axis.]
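As an illustration only (our sketch, not the authors' published code), the flat-hyperprior decision rule for binary variables fits in a few lines:

```python
import numpy as np

def lr_binary(p_xy):
    """Uninformative-hyperprior likelihood ratio LR(P_XY) for binary X and Y.
    p_xy is the 2x2 joint table [[d, e], [f, 1-d-e-f]]; LR > 1 favors X -> Y."""
    d, e = p_xy[0, 0], p_xy[0, 1]
    f = p_xy[1, 0]
    return ((d + f) * (1 - d - f)) / ((d + e) * (1 - d - e))

def classify_direction(p_xy):
    """The LR classifier: a forced choice between the two causal directions."""
    return 'X->Y' if lr_binary(p_xy) > 1 else 'Y->X'
```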

4.2 Optimal Classifier for Arbitrary X and Y

Deriving the optimal classifier for the case where X and Y are not binary is analogous to the binary derivation. The resulting likelihood ratio is

    p(X → Y | P_XY) / p(Y → X | P_XY)
        = [ F(P_X, P_{Y|X}) / F(P_Y, P_{X|Y}) ] × |det J_XY|^{-1} / |det J_YX|^{-1},    (3)

where J_XY is the Jacobian of the transformation (P_X, P_{Y|X}) → P_XY and J_YX is the Jacobian of the transformation (P_Y, P_{X|Y}) → P_XY. The transformation, its Jacobian and its determinant are readily computable on paper or using computer algebra systems; in our implementation, we use Theano (Theano Development Team, 2016) to perform the computation. Note that if X has cardinality k_X and Y has cardinality k_Y, the Jacobians have (k_X k_Y - 1)^2 entries. Computing their determinants has complexity O((k_X k_Y - 1)^6) or, if we assume k_X = k_Y = k, O(k^12): it grows rather quickly with the cardinality.

If F is flat, that is, if all the priors are Dir(1), we call the causal direction classifier that follows Eq. (3) the LR classifier. That is, the LR classifier outputs X → Y if the uninformative-hyperprior likelihood ratio is larger than 1, and outputs Y → X otherwise.

Note that the optimal classifier is not perfect: there is a baseline error that the optimal classifier incurs even under the assumptions it is built on. This error is

    E_LR = ∫ ( p(Y → X | P_XY) I[LR(P_XY) > 1] + p(X → Y | P_XY) I[LR(P_XY) < 1] ) dP_XY,

where the integral ranges over all possible joints P_XY with uniform measure, and I[·] is the indicator function that evaluates to 1 if its bracketed condition holds, and to 0 otherwise. That is, assuming that each P_XY is sampled from the uninformative Dirichlet prior, given that either X → Y or Y → X with equal probability, in the limit of infinitely many classification trials the error rate of the LR classifier is E_LR. While this integral is not analytically computable (at least neither by the authors nor by the computer algebra systems available to them), we can estimate it using Monte Carlo methods, as we do in the following sections. In Fig. 6, the leftmost entry on each curve corresponds to E_LR for various cardinalities of X and Y. For example, for |X| = |Y| = 2 we have E_LR ≈ 0.4, but already for |X| = |Y| = 10 the baseline error is close to zero.
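A Monte Carlo estimate of E_LR for the binary case can be sketched as follows; again this is our own illustration, reusing lr_binary from the previous sketch:

```python
import numpy as np

def estimate_baseline_error(n_trials=100_000, rng=None):
    """Monte Carlo estimate of E_LR for binary X, Y: sample a direction label
    and a flat-hyperprior system, then score the LR rule against the label."""
    rng = rng or np.random.default_rng(0)
    errors = 0
    for _ in range(n_trials):
        p_c = rng.dirichlet(np.ones(2))                  # P(cause) ~ Dir(1)
        p_e_given_c = rng.dirichlet(np.ones(2), size=2)  # P(effect|cause) ~ Dir(1)
        p_xy = p_c[:, None] * p_e_given_c                # joint under cause -> effect
        label = rng.integers(2)                          # 0: X -> Y, 1: Y -> X
        if label == 1:
            p_xy = p_xy.T                                # relabel so Y is the cause
        pred = 0 if lr_binary(p_xy) > 1 else 1
        errors += pred != label
    return errors / n_trials

# For |X| = |Y| = 2 this should land near the E_LR of about 0.4 quoted above.
```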

4.3 Robustness: Changing the Hyperprior F

What if we use the LR classifier, but our assumptions do not match reality? Namely, what if F is not Dir(1)? For example, what if F is a mixture of ten Dirichlet distributions? (A mixture of Dirichlet distributions with arbitrarily many components can approximate any distribution over the simplex.)

We draw F from mixtures with fixed log2(α_max). Let the k-th component of the mixture have parameter α^k = (α^k_1, ..., α^k_N), where N is the cardinality of X or Y. A fixed α_max then means that we draw each α^k_i uniformly at random from the interval [2^{-α_max}, 2^{α_max}]. Fig. 3 shows samples from such mixtures with growing α_max; increasing the parameter allows the distributions to grow in complexity. Note that if α_max = 0, we recover the noninformative prior case.

[Figure 3: Samples from Dirichlet mixtures. Each plot shows three random samples from a ten-component mixture of Dirichlet distributions over the 1D simplex. Each mixture component has a different, random parameter α. For each plot we fixed a different log2(α_max), a parameter which limits both the smallest and the largest value of any of the two α coordinates that define each mixture component.]

How do the likelihood ratio and the causal direction decision boundary change as we allow α_max to depart from 0? For binary X and Y, Fig. 4 illustrates the change. Comparing with Fig. 2, we see that as α_max grows, the likelihood ratios become more extreme and the decision boundaries become more complex. Fig. 5 makes it clear that even a fixed α_max allows the decision boundary to vary significantly.

[Figure 4: Log-likelihood ratios for the causal direction when F is a mixture of ten Dirichlet distributions with growing α_max (see Fig. 3).]

That the independent mechanisms assumption as we frame it is not sufficient to provide identifiability of the causal direction was clear from the outset (since each joint can be factorized both as P(X)P(Y|X) and as P(Y)P(X|Y)). However, the above considerations suggest that the assumption of noninformative hyperpriors is rather strong: in fact, it is possible to show that the decision surface can be precisely flipped with an appropriate adjustment of F, making the LR classifier's error precisely 100%. Our experiments, however, suggest that the LR classifier is a reasonable choice in a wide range of circumstances, especially as the cardinality of X and Y grows.

In our experiments, we checked how the error changes as we allow the α_max parameter of all the hyperpriors to grow. Our experimental procedure, whose hyperprior-sampling step is sketched in code below, is as follows:

1. Fix the dimensionality of X and Y, and fix α_max.
2. Sample 100 hyperpriors for each dimensionality and α_max: sample the α parameters of F within the given α_max bounds, where F consists of Dirichlet mixtures (with 10 components), as described above.
3. Sample 100 priors for each hyperprior: sample P(cause) and P(effect|cause) 100 times for each hyperprior (that is, for each α setting).
4. Sample the causal label uniformly. If X → Y is chosen, let P_XY = P(cause)P(effect|cause). If Y → X is chosen, let P_XY = transpose[P(cause)P(effect|cause)].
5. Classify: use the LR classifier to classify P_XY's causal direction, and record an error if the causal label disagrees with the classifier's output.

Figure 6 shows the results.
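A minimal sketch of step 2's hyperprior sampling, assuming uniform mixture weights (the text does not spell the weights out, so that detail is our assumption):

```python
import numpy as np

def sample_from_dirichlet_mixture(dim, alpha_max, n_components=10, rng=None):
    """Draw one probability vector from a random ten-component Dirichlet mixture.
    Each alpha entry is uniform in [2**-alpha_max, 2**alpha_max]; alpha_max = 0
    recovers the flat Dir(1) hyperprior. Uniform mixture weights are assumed."""
    rng = rng or np.random.default_rng(0)
    alphas = rng.uniform(2.0**-alpha_max, 2.0**alpha_max, size=(n_components, dim))
    k = rng.integers(n_components)   # pick a mixture component uniformly at random
    return rng.dirichlet(alphas[k])
```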

[Figure 6: Results of the direction-classification experiment. We varied the cardinality of X and Y as well as the α_max of the mixture of Dirichlets F. For each setting, we sampled 100 P_XY distributions according to our causal model and recorded the classification error of the simple LR classifier. The results show that, as the cardinality of X and Y grows, the LR classifier's accuracy increases.]

[Figure 5: Log-likelihood ratios for the causal direction when F is a mixture of ten Dirichlet distributions with α_max = 2^8 (see Fig. 3); each plot corresponds to a different, randomly sampled α.]

As the cardinality of the system grows, the LR classifier's decision boundary approximates the decision boundary for most Dirichlet mixtures. Another trend is that as α_max grows, the variance of the error grows, but there is only a small growing trend in the error itself. In addition, Fig. 7 shows that the error does not increase as we allow more mixture components, up to 128 components, while holding α_max at the large value of 7. Thus, the LR classifier performs well even for extremely complex hyperpriors, at least on average.

[Figure 7: Results of the direction-classification experiment when the number of Dirichlet mixture hyperprior components varies. We fixed the α to vary between 2^{-7} and 2^{7}. The results show that the maximum-likelihood classifier that assumes the noninformative priors is not sensitive to the number of Dirichlet mixture components the test data is sampled from.]

5 A Black-box Solution: Detecting Confounding

Consider now the question of whether X → Y or X ← H → Y, where H is a latent variable (a confounder). In this section we present a solution to this problem under the assumptions of Sec. 3. Unfortunately, deriving the optimal classifier for this case is difficult without additional assumptions on the latent H. Instead, we propose a black-box classifier. We create a dataset of distributions from both the direct-causal and the confounded case, using the uninformative Dirichlet prior on either P(X) and P(Y|X) (the direct-causal case) or on P(H), P(X|H) and P(Y|H) (the confounded case). For each confounded distribution, we choose the cardinality of H, the hidden confounder, uniformly at random between 2 and 100. Next, we train a neural network to classify the causal structure (Python code that reproduces the experiment is available at vision.caltech.edu/~kchalupk/code.html).

We then checked how well this classifier performs as we vary the cardinality of the variables, and as we allow the true hyperprior to be a mixture of 10 Dirichlets, analogously to the experiment of Sec. 4. Fig. 8 shows the results. Note that the classification errors are much lower than in the causal direction case. Both problems (deciding the causal direction and detecting confounding) are in principle unidentifiable, but it appears the latter is inherently easier. The neural net classifier seems little bothered by growing α_max. The largest source of error, for cardinalities of X and Y larger than 3, seems to be neural network training rather than anything else.
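The training-set generation and the classifier can be sketched as follows. This is our own minimal reconstruction: the sampler follows Sec. 5 and reuses sample_causal_system from the Sec. 3 sketch, while the network is scikit-learn's MLPClassifier with an arbitrary architecture, not the authors' published one.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def sample_confounded_joint(k_x, k_y, rng):
    """Pure confounding X <- H -> Y: P(H), P(X|H=h), P(Y|H=h) all Dir(1),
    with the cardinality of H drawn uniformly from {2, ..., 100}."""
    k_h = rng.integers(2, 101)
    p_h = rng.dirichlet(np.ones(k_h))
    p_x_given_h = rng.dirichlet(np.ones(k_x), size=k_h)
    p_y_given_h = rng.dirichlet(np.ones(k_y), size=k_h)
    # Marginalize out H: P(x, y) = sum_h P(h) P(x|h) P(y|h).
    return np.einsum('h,hx,hy->xy', p_h, p_x_given_h, p_y_given_h)

def make_dataset(n_per_class, k, rng):
    """Flattened joint tables, label 0 for X -> Y and 1 for X <- H -> Y."""
    tables, labels = [], []
    for _ in range(n_per_class):
        tables.append(sample_causal_system(k, k, rng)[2].ravel()); labels.append(0)
        tables.append(sample_confounded_joint(k, k, rng).ravel()); labels.append(1)
    return np.array(tables), np.array(labels)

rng = np.random.default_rng(0)
X_train, y_train = make_dataset(5000, 10, rng)
clf = MLPClassifier(hidden_layer_sizes=(256, 256)).fit(X_train, y_train)
```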

[Figure 8: Results of the black-box confounding detector. We varied the cardinality of X and Y as well as the α_max of the mixture of Dirichlets F. For each setting, we sampled 1000 P_XY distributions according to our causal model and recorded the classification error of a neural net classifier trained on noninformative-Dirichlet-hyperprior data. The results show that, as the cardinality of X and Y grows, the classifier's accuracy increases.]

6 A Black-Box Solution to the General Problem

Finally, we present a solution to the general causal discovery problem over the two variables X and Y: deciding between the six alternatives shown in Fig. 1. The idea is a natural extension of the black-box classifier from Sec. 5. We create a dataset containing all six cases, sampled under the assumptions of Sec. 3, and train a neural network on this dataset (the neural network architecture, as well as the details of the training procedure, are available in the accompanying Python code).

Figure 9 shows the results of applying the classifier to distributions sampled from flat hyperpriors (that is, from a test set with statistics identical to the training set), for cardinalities |X| = |Y| = 2 and |X| = |Y| = 10. As expected, the number of errors is much lower for the higher cardinality. For cardinality 2, the confusion matrix shows that the neural networks: 1) easily learn to classify independent vs. dependent variables, 2) confuse the X → Y and Y → X cases, and 3) confuse the two directed-causal-plus-confounding cases (Fig. 1E, F). However, all of these are insignificant issues when |X| = 10, where the total error is 85 over the whole test set; for |X| = 2, the error is 25.7%. We remark again that the problem is not identifiable; that is, there is no true causal class for any point in our training or test dataset. Each distribution could arise from any of the five possible causal systems in which X and Y are not independent.

[Figure 9: Confusion matrices for the all-causal-classes classification task. The test set consists of distributions sampled from uniform hyperpriors, that is, sampled from the same statistics as the training data (equivalent to α_max = 0 in the previous sections). A) Results for |X| = |Y| = 2; total number of errors = 2477. B) Results for |X| = |Y| = 10; total errors = 85. C) Average results for |X| = |Y| = 10, with the same classifier as in B) but the test set sampled from non-uniform hyperpriors with α_max = 7 (see text); 201 errors on average. In each case the test set contains distributions from all the classes, sampled with equal probability.]
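Extending the two-class sketch above to all six classes only requires one sampler per graph in Fig. 1. For instance, the independence case (Fig. 1A) is an outer product of two Dir(1) marginals; again, this is our illustration, not the authors' code:

```python
import numpy as np

def sample_independent_joint(k_x, k_y, rng):
    """Fig. 1A: X and Y causally unrelated, so P(x, y) = P(x) P(y)."""
    p_x = rng.dirichlet(np.ones(k_x))
    p_y = rng.dirichlet(np.ones(k_y))
    return np.outer(p_x, p_y)
```

The remaining graphs are sampled analogously from their Dir(1) factorizations, and the network is trained exactly as before, with six labels instead of two.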

The fact that the error nears 0 in the high-cardinality case indicates that the likelihoods under our assumptions grow very peaked as the cardinality grows. Thus, the optimal decision can quite safely be called the true decision. In addition, Fig. 9C shows the average confusion table over a hundred trials in which our classifier was applied to distributions over X and Y with cardinality 10, corresponding to all six possible causal structures, but sampled from non-uniform hyperpriors with α_max = 7. The performance drop is not drastic compared to Fig. 9B.

7 Discussion

We developed a neural network that determines the causal structure linking two discrete variables. We allow for confounding between the two variables, but assume acyclicity. The classifier takes as input a joint probability table P_XY over the two variables and outputs the most likely causal graph corresponding to this joint. The possible causal graphs span the range shown in Fig. 1, from independence to confounding co-occurring with direct causation. We emphasize two limitations of the classifier:

1. Since the classifier makes a forced choice between the six acyclic alternatives, it will necessarily produce 100% error on P_XY's generated from cyclic systems.
2. Our goal was not, and cannot be, to achieve 100% accuracy. For example, the error in Fig. 9A is about 25%. However, this is not necessarily a bad result: our considerations in Secs. 3 and 4 show that even when all our assumptions hold, the optimal classifier has a non-zero error. The latter is a consequence of the non-identifiability of the problem: it is not possible, in general, to identify the causal structure between two variables by looking at the joint distribution alone, without intervention. Our goal was to introduce a minimal set of assumptions that, while acknowledging the non-identifiability, enable us to make useful inferences.

We note that as the cardinality of the variables rises, the task becomes more and more identifiable in the sense that, for each given P_XY, one of the six possible causal graphs strongly dominates the others with respect to its likelihood. In this situation the most likely causal structure becomes essentially the only possible one, barring a small error, and the problem becomes practically identifiable.

All of the above applies assuming that our generative model corresponds to reality. The assumptions, discussed in Sec. 3, boil down to two ideas: 1) the world creates causes independently of causal mechanisms, and 2) causes are random variables whose distributions are sampled from flat Dirichlet hyperpriors; causal mechanisms are conditional distributions of effects given causes, and are also sampled from flat Dirichlet hyperpriors. Whether these assumptions are realistic is an undecidable question. Nevertheless, through a series of simple experiments (Figs. 6, 7, 8 and 9) we showed that the assumption of flat hyperpriors is not essential: our classifiers' average performance does not decrease significantly as we allow the hyperpriors to vary, although the variance of the performance grows. In future work we will carefully analyze under what conditions the flat-hyperprior classifier performs well even when the hyperpriors are not flat. The current working hypothesis is that as long as the hyperprior on P(cause) is the same as the hyperprior on P(effect|cause), the classification performance does not change significantly on average, though, as seen in our experiments, it will have increased variance.
Shohei Shimizu explained our task (for the case of continuous variables) as: "Under what circumstances and in what way can one determine causal structure based on data which is not obtained by controlled experiments but by passive observation only?" (Shimizu et al., 2006). Our answer is: for high-cardinality discrete variables, it seems enough to assume independence of P(cause) from P(effect|cause), and to train a neural network that learns the black-box mapping between observations and their causal generative mechanisms.

References

David Maxwell Chickering. Learning equivalence classes of Bayesian-network structures. Journal of Machine Learning Research, 2(Feb), 2002a.

David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3(Nov), 2002b.

D. Geiger and J. Pearl. On the logic of causal models. In Proceedings of UAI, 1988.

David Heckerman, Dan Geiger, and David M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3), 1995.

Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jonas Peters, and Bernhard Schölkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems, 2009.

P. O. Hoyer, S. Shimizu, A. J. Kerminen, and M. Palviainen. Estimation of causal effects using linear non-Gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49, 2008.

Dominik Janzing, Jonas Peters, Joris Mooij, and Bernhard Schölkopf. Identifying confounders using additive noise models. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2009.

C. Meek. Strong completeness and faithfulness in Bayesian networks. In Eleventh Conference on Uncertainty in Artificial Intelligence, 1995.

Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Schölkopf. Distinguishing cause from effect using observational data: methods and benchmarks. arXiv preprint, 2014.

J. Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2000.

Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Causal inference on discrete data using additive noise models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12), 2011.

Hans Reichenbach. The Direction of Time, volume 65. University of California Press, 1991.

Shohei Shimizu, Patrik O. Hoyer, Aapo Hyvärinen, and Antti Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7(Oct), 2006.

P. Spirtes, C. N. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, 2nd edition, 2000.

Xiaohai Sun, Dominik Janzing, and Bernhard Schölkopf. Causal inference by choosing graphs with most plausible Markov kernels. In ISAIM, 2006.

Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, May 2016.

Kun Zhang and Aapo Hyvärinen. On the identifiability of the post-nonlinear causal model. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2009.


More information

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS ALINA BUCUR, CHANTAL DAVID, BROOKE FEIGON, MATILDE LALÍN 1 Introuction In this note, we stuy the fluctuations in the number

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Keywors: multi-view learning, clustering, canonical correlation analysis Abstract Clustering ata in high-imensions is believe to be a har problem in general. A number of efficient clustering algorithms

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri ITA, UC San Diego, 9500 Gilman Drive, La Jolla, CA Sham M. Kakae Karen Livescu Karthik Sriharan Toyota Technological Institute at Chicago, 6045 S. Kenwoo Ave., Chicago, IL kamalika@soe.ucs.eu

More information

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback Journal of Machine Learning Research 8 07) - Submitte /6; Publishe 5/7 An Optimal Algorithm for Banit an Zero-Orer Convex Optimization with wo-point Feeback Oha Shamir Department of Computer Science an

More information

The new concepts of measurement error s regularities and effect characteristics

The new concepts of measurement error s regularities and effect characteristics The new concepts of measurement error s regularities an effect characteristics Ye Xiaoming[1,] Liu Haibo [3,,] Ling Mo[3] Xiao Xuebin [5] [1] School of Geoesy an Geomatics, Wuhan University, Wuhan, Hubei,

More information

An Analytical Expression of the Probability of Error for Relaying with Decode-and-forward

An Analytical Expression of the Probability of Error for Relaying with Decode-and-forward An Analytical Expression of the Probability of Error for Relaying with Decoe-an-forwar Alexanre Graell i Amat an Ingmar Lan Department of Electronics, Institut TELECOM-TELECOM Bretagne, Brest, France Email:

More information

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential Avances in Applie Mathematics an Mechanics Av. Appl. Math. Mech. Vol. 1 No. 4 pp. 573-580 DOI: 10.4208/aamm.09-m0946 August 2009 A Note on Exact Solutions to Linear Differential Equations by the Matrix

More information

arxiv: v3 [cs.lg] 3 Dec 2017

arxiv: v3 [cs.lg] 3 Dec 2017 Context-Aware Generative Aversarial Privacy Chong Huang, Peter Kairouz, Xiao Chen, Lalitha Sankar, an Ram Rajagopal arxiv:1710.09549v3 [cs.lg] 3 Dec 2017 Abstract Preserving the utility of publishe atasets

More information

Lagrangian and Hamiltonian Mechanics

Lagrangian and Hamiltonian Mechanics Lagrangian an Hamiltonian Mechanics.G. Simpson, Ph.. epartment of Physical Sciences an Engineering Prince George s Community College ecember 5, 007 Introuction In this course we have been stuying classical

More information

Schrödinger s equation.

Schrödinger s equation. Physics 342 Lecture 5 Schröinger s Equation Lecture 5 Physics 342 Quantum Mechanics I Wenesay, February 3r, 2010 Toay we iscuss Schröinger s equation an show that it supports the basic interpretation of

More information

Arrowhead completeness from minimal conditional independencies

Arrowhead completeness from minimal conditional independencies Arrowhead completeness from minimal conditional independencies Tom Claassen, Tom Heskes Radboud University Nijmegen The Netherlands {tomc,tomh}@cs.ru.nl Abstract We present two inference rules, based on

More information

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing

More information

Permanent vs. Determinant

Permanent vs. Determinant Permanent vs. Determinant Frank Ban Introuction A major problem in theoretical computer science is the Permanent vs. Determinant problem. It asks: given an n by n matrix of ineterminates A = (a i,j ) an

More information

Entanglement is not very useful for estimating multiple phases

Entanglement is not very useful for estimating multiple phases PHYSICAL REVIEW A 70, 032310 (2004) Entanglement is not very useful for estimating multiple phases Manuel A. Ballester* Department of Mathematics, University of Utrecht, Box 80010, 3508 TA Utrecht, The

More information

A New Minimum Description Length

A New Minimum Description Length A New Minimum Description Length Soosan Beheshti, Munther A. Dahleh Laboratory for Information an Decision Systems Massachusetts Institute of Technology soosan@mit.eu,ahleh@lis.mit.eu Abstract The minimum

More information

ELECTRON DIFFRACTION

ELECTRON DIFFRACTION ELECTRON DIFFRACTION Electrons : wave or quanta? Measurement of wavelength an momentum of electrons. Introuction Electrons isplay both wave an particle properties. What is the relationship between the

More information

Proof of SPNs as Mixture of Trees

Proof of SPNs as Mixture of Trees A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a

More information