Revisiting Uncertainty in Graph Cut Solutions


Revisiting Uncertainty in Graph Cut Solutions

Daniel Tarlow
Dept. of Computer Science, University of Toronto
dtarlow@cs.toronto.edu

Ryan P. Adams
School of Engineering and Applied Sciences, Harvard University

Abstract

Graph cuts is a popular algorithm for finding the MAP assignment of many large-scale graphical models that are common in computer vision. While graph cuts is powerful, it does not provide information about the marginal probabilities associated with the solution it finds. To assess uncertainty, we are forced to fall back on less efficient and inexact inference algorithms such as loopy belief propagation, or use less principled surrogate representations of uncertainty such as the min-marginal approach of Kohli & Torr [8]. In this work, we give new justification for using min-marginals to compute the uncertainty in conditional random fields, framing the min-marginal outputs as exact marginals under a specially-chosen generative probabilistic model. We leverage this view to learn properly calibrated marginal probabilities as the result of straightforward maximization of the training likelihood, showing that the necessary subgradients can be computed efficiently using dynamic graph cut operations. We also show how this approach can be extended to compute multi-label marginal distributions, where again dynamic graph cuts enable efficient marginal inference and maximum likelihood learning. We demonstrate empirically that after proper training, uncertainties based on min-marginals provide better-calibrated probabilities than baselines, and that these distributions can be exploited in a decision-theoretic way for improved segmentation in low-level vision.

1. Introduction

Queries on random fields can be broadly classified into two types: queries for an optimum (finding a mode), and queries for a sum or integral (marginalization). In the first case, one might ask for the most likely joint configuration of the entire field. In the second class, one might ask for the marginal probability of a single variable taking some assignment. At first glance, these two types of queries may appear computationally similar; indeed, on a tree-structured graphical model they take the same amount of time. However, for some model classes there is a large discrepancy between the computational complexities of these queries. For example, when a graphical model is constrained to have binary variables and submodular interactions, the mode can be found in polynomial time using the graph cuts algorithm, while marginalization is #P-complete [7].

In computer vision, this discrepancy has contributed to a proliferation of optimization procedures centered around the graph cuts algorithm. Graph cuts are used both as a stand-alone procedure and as a subroutine for algorithms such as alpha expansion [2], the min-marginal uncertainty of [8], the message passing of [5], and the Quadratic Pseudo-Boolean Optimization algorithm [9]. Particularly given the efficient, freely available implementation of [1], graph cuts could be considered one of the most practical and powerful algorithms for inference in graphical models that is available to the computer vision practitioner. Despite the successes of graph cuts, the algorithm is of limited applicability to queries of the second broad type.
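To make the two query types concrete, the following toy sketch (not part of the paper; the energy values are made up) contrasts a mode query with a marginal query on a tiny binary chain by brute-force enumeration. Graph cuts answers the first query in polynomial time for submodular energies, while the second requires the full sum over states.

```python
import itertools
import numpy as np

# Toy energy on a 4-variable binary chain: made-up unary costs plus a
# submodular smoothness term that penalizes disagreeing neighbours.
unary = np.array([[0.0, 1.2], [0.4, 0.3], [0.9, 0.1], [0.2, 0.8]])  # unary[d, label]
smoothness = 0.5

def energy(y):
    e = sum(unary[d, y[d]] for d in range(len(y)))
    e += smoothness * sum(y[d] != y[d + 1] for d in range(len(y) - 1))
    return e

states = list(itertools.product([0, 1], repeat=4))

# Mode query: what graph cuts answers in polynomial time for submodular energies.
map_assignment = min(states, key=energy)

# Marginal query: a sum over all 2^D states, #P-hard in general.
Z = sum(np.exp(-energy(y)) for y in states)
p_y0_equals_1 = sum(np.exp(-energy(y)) for y in states if y[0] == 1) / Z

print("MAP assignment:", map_assignment)
print("p(y_0 = 1) =", p_y0_equals_1)
```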
When viewed as a method for approximating the marginal probability of a variable in a graphical model, we show in the supplementary material that the min-marginal uncertainty of [8] can be off by a factor that is exponentially large in the number of variables in the model, and we show empirically that learning using this method as approximate inference can lead to poorly calibrated estimates of marginal probabilities.

Marginal probabilities are important in many applications, including interactive segmentation, active learning, and multilabel image segmentation. They are perhaps even more important in low-level vision tasks, as random field models are often only the first component of a larger computer vision system. In this respect, it is desirable to be able to provide higher-level modules with properly calibrated probabilities, so that informed decisions can be made in a well-founded decision-theoretic framework.

To our knowledge, the only method for using graph cuts to produce probabilistic marginals is based on the work of Kohli & Torr (KT) [8]. In this paper, we hope to provide additional insight into the practice of using graph cuts to construct probabilistic models, by framing the method of KT as exact marginal inference in a model that we will elaborate on in later sections. Practically, our goal in this work is to revisit the question of how graph cuts can be used to produce proper uncertainty in random field models.

Perhaps surprisingly, we will leave the test-time inference procedure of KT unchanged, and instead develop a new training procedure that directly considers the question of how to set parameters so that the method of KT produces well-calibrated test-time marginal probabilities. We will show that with this new training procedure, graph cuts can be made to produce very good measures of uncertainty. We then show how this same concept enables us to generalize the binary graph cuts model to multi-label data.

We make several contributions. We develop theoretical underpinnings for the inference procedure of KT, showing that there is a generative probabilistic model for which their inference procedure produces exact probabilistic marginals. We show how to efficiently train this new generative model under the maximum likelihood objective, and develop an algorithm for efficiently computing subgradients using dynamic graph cuts. We develop a new model of multilabel data, where exact marginals and subgradients can be computed efficiently using dynamic graph cuts. We show empirically that our approach produces better measures of uncertainty than the method of KT and loopy belief propagation-based learning. Finally, we show that our properly calibrated marginal probabilities can be used in a decision-theoretic framework to approximately optimize test performance on the intersection-over-union (IoU) loss function, and we show empirically that this improves test performance.

2. Background

Our task is to produce a distribution over a D-dimensional space Y = {1, ..., K}^D in which each component takes one of K discrete values. In particular, this distribution should be conditioned upon a feature vector x, which takes values in X. This is known as a conditional random field (CRF) model. Our training data are N feature/label pairs, D = {x^(n), y^(n)}_{n=1}^N, with y^(n) in Y. We will proceed by constructing a model p(y | x, w) parameterized by weights w. As is typical, we will assume that the y^(n) are independent of each other, given the x^(n) and w. The classical formulation of the CRF likelihood function in this setting is to construct an energy function E(y; x, w) and use the Gibbs distribution:

p(y | x, w) = exp{-E(y; x, w)} / Z(x, w),    (1)
Z(x, w) = Σ_{y ∈ Y} exp{-E(y; x, w)}.    (2)

One natural way to formulate the problem of learning an appropriate w from the data is to maximize the log likelihood of the training data,

L(w; D) = (1/N) Σ_{n=1}^N log p(y^(n) | x^(n), w).    (3)

Optimization of Eq. 3 is often difficult due to the fact that the gradients require computing expectations which are sums over an exponentially large set of states. Various approximation schemes (e.g., [12, 3]) have been developed to attempt to grapple with this difficulty.

Given parameters, we then need to perform inference. Even restricted to the case of graph-structured submodular interactions over binary variables, computing exact probabilistic marginals is intractable due to the difficulty of computing the partition function [7]; however, MAP inference can be performed exactly in low-order polynomial time using the graph cuts algorithm, which reduces the problem to the computation of maximum flow in a network [6].

In addition to the solution to the MAP inference problem, we will also make use of quantities known as min-marginals. Whereas the value of the MAP solution is min_{ŷ ∈ Y} E(ŷ; x, w), min-marginals are defined as the value of a constrained minimization problem where a single variable y_d is clamped to take on label k, then all other variables are minimized out:

Φ_d(k) = min_{ŷ ∈ Y, ŷ_d = k} E(ŷ; x, w).    (4)
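As a concrete reference point, here is a brute-force sketch of Eq. (4) that computes every min-marginal by enumeration on a tiny made-up energy. The paper obtains the same quantities far more efficiently with dynamic graph cuts [8]; the toy energy and array layout here are illustrative assumptions only.

```python
import itertools
import numpy as np

def min_marginals(energy, D, K):
    """Brute-force Eq. (4): Phi[d, k] = min over assignments with y_d = k of energy(y).

    Enumeration is for illustration only; for binary submodular energies the same
    table is obtained far more efficiently with dynamic graph cuts [8].
    """
    Phi = np.full((D, K), np.inf)
    for y in itertools.product(range(K), repeat=D):
        e = energy(y)
        for d in range(D):
            Phi[d, y[d]] = min(Phi[d, y[d]], e)
    return Phi

# Tiny made-up 3-pixel binary example.
unary = np.array([[0.0, 0.7], [0.5, 0.2], [0.6, 0.1]])
toy_energy = lambda y: (sum(unary[d, y[d]] for d in range(3))
                        + 0.3 * sum(y[d] != y[d + 1] for d in range(2)))
print(min_marginals(toy_energy, D=3, K=2))
```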
This constrained minimization problem can also be solved efficiently using graph cuts, and the set of all min-marginals {Φ_d(k)}_{d=1:D, k=1:K} can be computed in only slightly more time than is required to solve a single graph cuts problem, by using the dynamic graph cuts approach of [8]. Kohli and Torr [8] further suggest that min-marginals can be used to produce a measure of uncertainty q by taking a softmax over the negative min-marginals:

q_d(k) = exp{-Φ_d(k)} / Σ_{k'} exp{-Φ_d(k')}.

Given these marginals, they further suggest that a CRF model can be trained by replacing the exact marginals needed for the gradient with these approximate marginals. We will evaluate this learning method in the experiments section. Finally, we will make use of assignments that we will term argmin-marginals,

η_d(k) = argmin_{ŷ ∈ Y, ŷ_d = k} E(ŷ; x, w),    (5)

which simply replaces the min in min-marginals with argmin. These also can be computed efficiently using dynamic graph cuts.
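The softmax over negative min-marginals above is straightforward to transcribe. The sketch below assumes the min-marginals have already been collected into a D x K array, which is an assumed layout rather than anything specified in the paper.

```python
import numpy as np

def min_marginal_uncertainty(Phi):
    """Per-variable label distributions q_d(k) from min-marginals (Kohli & Torr [8]).

    Phi is a (D, K) array with Phi[d, k] = Phi_d(k); returns a (D, K) array of
    probabilities, each row proportional to exp(-Phi_d(k)).
    """
    neg = -Phi
    neg = neg - neg.max(axis=1, keepdims=True)   # stabilize the softmax
    q = np.exp(neg)
    return q / q.sum(axis=1, keepdims=True)
```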

Limitations of Kohli-Torr. In the supplementary material, we discuss the worst-case behavior of KT, showing that even for pairwise graphical models, the KT-estimated marginal can differ from the Gibbs distribution marginal by a factor that has exponential dependence on the number of variables in the model.

3. Our Model

In this paper, we avoid the problem of approximating CRF marginals, and in fact avoid the problem of a complicated partition function altogether. We do this by defining the following generative model:

Φ_d(k; x, w) = min_{ŷ ∈ Y, ŷ_d = k} E(ŷ; x, w),
p(y_d = k | {Φ_d(k'; x, w)}_{k'=1}^K) = exp{-Φ_d(k; x, w)} / Σ_{k'=1}^K exp{-Φ_d(k'; x, w)}.

We are here denoting the d-th component of y and ŷ by y_d and ŷ_d, respectively. We then interpret the min-marginals as providing a fully-factorized distribution on y given x. In contrast to Gibbs-based energy models, this procedure is truly generative: we compute the min-marginals, and this gives rise to local distributions over labels. The likelihood for w is then given by

p({y^(n)}_{n=1}^N | w, {x^(n)}_{n=1}^N) = Π_{n=1}^N Π_{d=1}^D Π_{k=1}^K q_{ndk}^{δ(y_d^(n), k)},    (6)

where q_{ndk} = exp{-Φ_d^(n)(k; x, w)} / Σ_{k'} exp{-Φ_d^(n)(k'; x, w)}, and δ(·, ·) is the Kronecker delta function. This likelihood makes the nature of the model clear: we are parameterizing a large set of multinomial distributions with x and w. It simply happens that the parameters of these multinomials are the result of a set of constrained energy minima. Importantly, we can compute the q's, and thus compute these marginals efficiently, when E(y; x, w) is a binary submodular energy function, using the approach of Kohli & Torr.

For the binary model we use in much of this paper, we will assume that the weights w parameterize the energy via a sum of weighted unary and pairwise potentials:

E(y; x, w) = Σ_{f ∈ U} w_f ψ_f(y; x) + Σ_{f ∈ P} w_f ψ_f(y; x),    (7)

where U and P are the sets of unary and pairwise features, respectively. The potentials are sums over all local configurations: ψ_f(y; x) = Σ_d ψ_{f,d}(y; x) for f ∈ U and ψ_f(y; x) = Σ_{(d,d')} ψ_{f,dd'}(y; x) for f ∈ P; the local configurations have the form

ψ_{f,d}(y; x) = α_{f,d}(x) if y_d = 1, and 0 otherwise;    (8)
ψ_{f,dd'}(y; x) = β_{f,dd'}(x) if y_d ≠ y_{d'}, and 0 otherwise.    (9)

Here, α_{f,d}(x) (or β_{f,dd'}(x)) is the result at location d (or edge (d, d')) of running a predefined filter f on input x.
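A minimal sketch of the linear parameterization in Eqs. (7)-(9), assuming the filter responses have been precomputed into arrays (the names and shapes below are assumptions): each unary weight scales a per-pixel response, and each pairwise weight scales a per-edge response that is paid only when the two endpoints disagree.

```python
import numpy as np

def assemble_potentials(alpha, beta, w_unary, w_pair):
    """Fold the weighted filter responses of Eqs. (7)-(9) into per-pixel and per-edge costs.

    alpha:   (F_u, D) unary filter responses alpha_{f,d}(x)
    beta:    (F_p, E) pairwise filter responses beta_{f,dd'}(x), one column per edge
    w_unary: (F_u,) unary weights; w_pair: (F_p,) pairwise weights
    Returns theta_unary[d], the cost added when y_d = 1, and theta_pair[e],
    the cost added when the endpoints of edge e take different labels.
    """
    return w_unary @ alpha, w_pair @ beta

def binary_energy(y, edges, theta_unary, theta_pair):
    """E(y; x, w) for a 0/1 labelling y (array of length D); edges is a list of (d, d') pairs."""
    disagree = sum(t for (i, j), t in zip(edges, theta_pair) if y[i] != y[j])
    return float(theta_unary @ y) + disagree
```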
4. Maximum Likelihood Learning

As our goal is to produce well-calibrated conditional probabilities for test data, the natural training objective is to maximize the (possibly penalized) likelihood. That is, given a set of observations D = {x^(n), y^(n)}_{n=1}^N, we wish to find the MLE (or MAP estimate) of the parameters w. In this section, we show that subgradients of this objective can be computed efficiently for any model where we have efficient procedures for computing min-marginals.

In reality, images may be of different sizes. To remove the bias that larger images have a larger effect on the learning than smaller images, we rescale likelihoods and instead sum the average log likelihood of each instance. Note that if all images are of the same size, optimizing this objective is equivalent to optimizing the earlier objective Eq. 6. The objective for the n-th data instance can then be written as

L^(n)(w) = -(1/D^(n)) Σ_{d=1}^{D^(n)} [ Φ_d(y_d^(n); w, x^(n)) + log Σ_{k=1}^K exp{-Φ_d(k; w, x^(n))} ].    (10)

We are interested in the partial derivative with respect to one parameter, say w_f. Dropping superscripts n to reduce notational clutter,

∂L(w)/∂w_f = (1/D) Σ_{d=1}^D Σ_{k=1}^K [∂L(w)/∂Φ_d(k; w, x)] [∂Φ_d(k; w, x)/∂w_f].    (11)

The first term is a standard softmax derivative:

∂L(w)/∂Φ_d(k) = -δ(y_d, k) + exp{-Φ_d(k; w, x)} / Σ_{k'} exp{-Φ_d(k'; w, x)}.    (12)

To compute the second term, first expand the definition of Φ_d(k; w, x), then compute a subgradient:

∂Φ_d(k; w, x)/∂w_f = ∂/∂w_f [ min_{ŷ ∈ Y, ŷ_d = k} Σ_f w_f ψ_f(ŷ; x) ] = ψ_f(η_d(k); x),    (13)

where recall that η_d(k) = argmin_{ŷ ∈ Y, ŷ_d = k} E(ŷ; x, w) is the argmin-marginal for y_d = k. The total subgradient for one instance is then

∂L(w)/∂w_f = (1/D) Σ_{d=1}^D Σ_{k=1}^K ψ_f(η_d(k); x) [q_dk - δ(y_d, k)].    (14)

Using these gradients we can train the model to optimize the likelihood of the training data.
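Given the argmin-marginals and the model marginals, Eq. (14) reduces to a weighted feature sum. The following sketch assumes the feature values at the argmin-marginals have been precomputed into an array; the array names and shapes are assumptions for illustration.

```python
import numpy as np

def loglik_subgradient(psi_at_argmin, q, y_true):
    """Subgradient of the average log likelihood for one instance, Eq. (14).

    psi_at_argmin[f, d, k] = psi_f(eta_d(k); x), the feature value at the argmin-marginal
    q[d, k]                = softmaxed negative min-marginals (the model's marginals)
    y_true[d]              = observed label of pixel d
    Returns a vector with one entry per feature f.
    """
    D, K = q.shape
    delta = np.zeros_like(q)
    delta[np.arange(D), y_true] = 1.0
    return np.einsum('fdk,dk->f', psi_at_argmin, q - delta) / D
```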

5. Faster Computation of Subgradients

The subgradients in Eq. 14 naively take O(D^2) time to compute, which can be expensive for large images. In this section, we show how to significantly reduce this time by (a) leveraging the locality of changes within the dynamic graph cuts procedure used to compute min-marginals; (b) reordering the computation of min-marginals; and (c) distributing computation across many CPUs. The result of (a) is that computation of gradients is only a constant factor slower than computing min-marginals; (b) speeds up the computation of min-marginals and thus subgradients; and (c) allows us to easily scale to large data sets, assuming we have access to a large cluster of machines.

(a) Locality of Changes in Argmin-Marginals. The maxflow algorithm of [1] caches search trees from iteration to iteration. The only nodes that can change are ones that are orphaned (that is, their connection to the root of the search tree is severed) after an edge capacity modification or subsequent path augmentations. This list of potentially changed nodes can be stored during the graph cuts procedure (this option is available in the code of Kolmogorov [1]), and it is typically much smaller than D. So in the inner loop, we look only at potentially changed nodes. This modification makes the subgradient computations equivalent in computational cost to computing min-marginals, up to a constant factor, because the subgradient computation only considers nodes that are processed in the min-marginal computation. In Section 7, we compare the time taken using our method to the time taken using only min-marginals and confirm that this holds empirically.

(b) Ordering Min-marginal Computations. Computing min-marginals requires solving D + 1 graph cuts problems. The cost is greatly reduced by using dynamic graph cuts, but we have found experimentally that the order of problems can make a large difference in the time it takes to compute min-marginals. The strategy we use is as follows: first, compute the MAP; next, compute min-marginals for variables that take on value 0 in the MAP assignment, iterating over the variables in scanline ordering; finally, compute min-marginals for variables that take on value 1 in the MAP assignment, iterating over the variables in scanline ordering. The intuition for this order is that dynamic graph cuts is more efficient when the initial solution is closer to the final solution. If after clamping a variable y_d = 0, the neighboring variable y_{d'} is also clamped to 0, solutions will tend to be more similar than if y_{d'} = 1. This effect tends to increase as pairwise potentials become stronger.
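A small sketch of the schedule in (b), under the additional observation (not stated explicitly above, but implied by the D + 1 problem count) that the MAP solve already supplies each variable's min-marginal at its MAP label, so every remaining problem clamps one variable to the label it does not take in the MAP.

```python
def clamping_schedule(map_labels):
    """Order of the D extra graph cut problems used to collect all min-marginals (Sec. 5(b)).

    Variables whose MAP label is 0 are processed first, then those whose MAP label is 1,
    each group in scanline (row-major) order, which keeps consecutive dynamic graph cut
    problems similar to each other.  map_labels is a flat 0/1 MAP assignment in scanline
    order; returns a list of (pixel, clamp_label) pairs.
    """
    zeros_first = [d for d, lab in enumerate(map_labels) if lab == 0]
    ones_after = [d for d, lab in enumerate(map_labels) if lab == 1]
    return [(d, 1 - map_labels[d]) for d in zeros_first + ones_after]
```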
(c) Distributed Computation. Gradients can be computed for each image in parallel, enabling distribution of the learning algorithm over multiple cores. In our implementation, we use C++ to build a distributed learning system in which one master process communicates with the workers via RPC or MPI. The master sends the workers a current setting of weights, and each worker returns a vector of gradients. The master accumulates the gradients, updates weights, then sends out a new request. This process repeats until termination. This parallelization resulted in an almost linear speedup with the number of cores.

6. Tractable Multilabel Model

In the multilabel setting, MAP inference becomes NP-hard in most cases [2], so we cannot compute exact min-marginals Φ_d(k; x, θ); thus, it appears that the model presented above cannot be applied. Notice, however, that there is no requirement in our generative model that the Φ_d(k; x, θ) values correspond to exact min-marginals. We require only that they be a deterministic function of parameters, that they be efficiently computable, and that we can compute subgradients of them with respect to model parameters. In this section, we replace the intractable multilabel min-marginal calculations with a tractable surrogate.

For multilabel models, as is typical, we let there be a separate set of weights for each feature f and class k, defining, e.g., the unary potential for pixel d taking on label k as θ_d(k) = Σ_f w_f^k ψ_{f,d}(k; x). In this section, to represent a multilabel assignment for pixel d, we will use K binary variables, y_d^1, ..., y_d^K. We then define separate energy functions for each k ∈ {1, ..., K}:

E^k(y; x, θ) = Σ_{d ∈ V} θ_d^k(y_d^k) + Σ_{(d,d') ∈ E} θ_{dd'}^k(y_d^k, y_{d'}^k),    (15)

where θ_d^k(0) = 0, θ_d^k(1) = θ_d(k), and θ_{dd'}^k(y_d^k, y_{d'}^k) are pairwise potentials with different parameters per k. We can then define separate min-marginals Φ_d^k(y_d^k) = min_{ŷ: ŷ_d^k = y_d^k} E^k(ŷ; x, θ). These can be computed exactly using a graph cuts min-marginals computation for each k. Finally, we define multilabel surrogate min-marginals to be Φ_d(k) = Φ_d^k(1) - Φ_d^k(0), and then let

q_dk = exp{-Φ_d(k)} / Σ_{k̂} exp{-Φ_d(k̂)}

be the multilabel probability of pixel d taking label k. These surrogate min-marginals then have the properties that we desire: they are deterministic, efficiently computable, and we can (sub)differentiate through them. They do not correspond to min-marginals for a CRF model, but we can think of them as coming from some other generative process where exact maximum likelihood learning and marginal inference are tractable via graph cuts.

We have seen in the previous section how to derive subgradients of Φ_d^k. To derive subgradients for the multilabel model, we simply need to observe that Φ_d(k) = Φ_d^k(1) - Φ_d^k(0). Focusing on a single instance,

∂L(w)/∂w_f^k = (1/D) Σ_{d,k'} [∂L/∂Φ_d(k')] ( ∂Φ_d^{k'}(1)/∂w_f^k - ∂Φ_d^{k'}(0)/∂w_f^k ).    (16)

The first term is a standard softmax derivative, just as before.
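A sketch of the surrogate construction, assuming the K per-class binary min-marginal tables have already been computed (one graph cuts min-marginals computation per class); the input layout below is an assumption.

```python
import numpy as np

def multilabel_surrogate_marginals(phi_per_class):
    """Surrogate multilabel marginals q_dk from K one-vs-rest binary min-marginal tables.

    phi_per_class[k] is a (D, 2) array holding Phi_d^k(0) and Phi_d^k(1) for the k-th
    binary energy E^k.  The surrogate Phi_d(k) = Phi_d^k(1) - Phi_d^k(0) is softmaxed
    across k for every pixel d, as in Section 6.
    """
    phi = np.stack([p[:, 1] - p[:, 0] for p in phi_per_class], axis=1)  # (D, K)
    neg = -phi
    neg = neg - neg.max(axis=1, keepdims=True)
    q = np.exp(neg)
    return q / q.sum(axis=1, keepdims=True)
```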

For both unary and pairwise features f,

∂Φ_d^{k'}(y_d^{k'})/∂w_f^k = 1_{k=k'} ψ_f(η_d^{k'}(y_d^{k'}); x),    (17)

where η_d^{k'}(y_d^{k'}) = argmin_{ŷ: ŷ_d^{k'} = y_d^{k'}} E^{k'}(ŷ; x, θ). These subgradients can also be computed efficiently using the methods described in Section 5.

7. Experiments

Experimentally, we apply our models to image segmentation tasks and investigate three main questions. The first is how well our method can optimize the maximum likelihood objective. We compare against the learning method suggested by Kohli & Torr (KT), and against a logistic regression baseline. Second, we look at the generalization capabilities of our models, both the binary and multilabel variants. Our main evaluation measure is the probability assigned to held-out test examples, but we also look at hard predictive performance, measured in terms of test accuracy and area under the ROC curve. Third, we investigate the suitability of the marginals for driving decision-theoretic predictions in terms of expected loss.

We use 84 unary and 4 pairwise features. The unary features are simple color-based and texture-based filters, run on patches surrounding the pixel. One pairwise feature is uniformly set to 1, while the others are based on thresholded responses of the Pb boundary detector [10]. We emphasize that these features include only low-level cues. For our experiments, we use a subset of the PASCAL VOC image segmentation data. We build binary datasets by considering only images containing a given object class (e.g., airplane); the task is then to label the given object pixels as figure and all other pixels as ground. We build multilabel datasets by taking a subset of classes and only considering images that have at least one of the selected classes present. Images are scaled so the minimum dimension is 100 pixels. We focused on the Aeroplane, Car, Cow, and Dog classes, but expect results to be representative of the case where unary information is fairly weak, due to the simplicity of our input features. We believe this to be a common and important case to consider for low-level vision systems.

7.1. Evaluation of Binary Model Optimization

Recall that the test-time procedures for our method and KT are identical. Consequently, we can compare the effectiveness of training the model described in Section 3 using softmaxed negative min-marginals as approximate gradients (as in [8]) versus exact subgradients (our method). We also consider a baseline with no pairwise potentials. The likelihood evaluations of these models are focused on the case where the goal at test time is to produce a pixel-wise measure of uncertainty, as would be appropriate in, e.g., interactive image segmentation, multiscale segmentation, and the decision-theoretic prediction setting of Section 7.2.

Figure 1. Comparison of training negative log likelihoods achieved by our method (y-axis) versus (a) logistic regression and (b) the KT method (x-axis). There is one marker for each of 30 images, which were optimized independently. In all cases, we achieve better training likelihoods than the alternative methods.

Single Image Datasets. For the first experiment, we considered 30 data sets, each with a single aeroplane instance. We optimized the logistic regression model to convergence using gradient ascent, then we initialized the other two methods with the result, initially setting all pairwise weights to zero. We then ran gradient-based optimization using the (sub)gradients computed by the two methods and recorded the best objective achieved. For KT, we followed [8] and used a fixed step size that was tuned by hand but left fixed across experiments.
For our method, we used a dynamic step size decay schedule, which we found in practice to outperform various static decay schedules: we maintain a quantity f_best^(t) = min_{t' ∈ {1,...,t}} f(θ_{t'}), where f is the negative average log likelihood objective function. We then use λ f_best^(t) as an estimate of the optimal value f* at iteration t and perform a Polyak-like update, setting the step size φ_t = (f(θ_t) - λ f_best^(t)) / ||g||^2, where g is the subgradient. We chose λ = 0.95 and left it fixed across experiments. (We also experimented with dynamic step size decay schedules for the KT gradients, but we could not get them to outperform the fixed update schedule.)

Results are shown in Fig. 1. While KT always produces better likelihoods than a unary-only (logistic regression) model, its gradients are not directly optimizing this quantity (and indeed, it is unclear that there is any quantity being exactly optimized with the KT approach). When the correct gradients are used (our method), we achieve much better training likelihoods.

Full Datasets. Next, we focused on the comparison to KT and experimented with larger data sets. For each class, we constructed a training set with 48 images, and parallelized the optimizations over 17 CPUs (1 master, 16 workers). In Fig. 2, we show the best training objective achieved as a function of wall-clock time. An iteration of KT is faster than an iteration of our method (due to the fact that we need to compute argmin-marginals in addition to min-marginals), but within 1000 seconds our method overtakes KT, and then always leads to better training likelihoods.
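For reference, the Polyak-like step-size rule described above can be written as a one-line function; the surrounding optimization loop is not specified in the paper, so only the step-size computation is shown.

```python
import numpy as np

def polyak_like_step(f_current, f_best, subgrad, lam=0.95):
    """Step size phi_t = (f(theta_t) - lam * f_best) / ||g||^2 from Section 7.1.

    f_current: negative average log likelihood at the current weights
    f_best:    smallest objective value seen so far, f_best^(t)
    subgrad:   current subgradient vector g
    lam * f_best plays the role of an estimate of the optimal value f*.
    """
    return (f_current - lam * f_best) / float(np.dot(subgrad, subgrad))
```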

Figure 2. f_best versus time for learning on the full training sets.

Figure 3. Results for binary models on the Aeroplane, Car, Cow, and Dog classes: log likelihood, pixel accuracy, and AUC for each method, reported in Train (Test) format.

7.2. Evaluation of Binary Model

We then compare the results of our optimization to that of KT in terms of performance as a model of segmentation data. We constructed a test set with the remaining images (roughly 50 per class) not used for training. We observed that KT tended towards a set of weights that were different from the weights that achieved the best performance under the maximum likelihood objective. To give a better representation of the behavior, we report results for the KT model at two points: first, the weights that achieve the best training likelihood objective; second, the weights at the point to which it seemed to converge after running for longer.

Train and Test Performance. In the first set of evaluations, we report training and test performance according to three measures: average pixel likelihood, 0-1 pixel accuracy, and area under the ROC curve (AUC). Our approach is consistently best on the likelihood and AUC measures, which are the ones where a good measure of uncertainty is required, and it is competitive on pixel accuracy in all cases. In the Car data, all methods experience some overfitting, but otherwise training performance was indicative of test performance, showing that better optimization of the maximum likelihood objective led to models with better test performance. Quantitative results are shown in Fig. 3, and illustrative qualitative results are shown in Fig. 4.

Figure 4. Estimated marginal probabilities for test examples. On many examples (e.g., left), the three methods behave similarly. When they behave differently (middle and right), the baselines often become over- or under-confident, while our method is better able to produce well-calibrated probabilities.

Decision-Theoretic Predictions for IoU Score. Given properly calibrated probabilities, we can make predictions that seek to maximize expected score on the test set. Here, we take this approach and seek to optimize the intersection-over-union (IoU) score that is commonly used to evaluate image segmentations. Given a true labeling y', the score is defined as

IoU(y, y') = (1/K) Σ_{k=1}^K [ Σ_d 1{y_d = k ∧ y'_d = k} / Σ_d 1{y_d = k ∨ y'_d = k} ].

In the case of a binary model, K = 2 and the classes are foreground and background. Given our predictive distribution Q(y) = Π_{d=1}^D Π_{k=1}^K q_{dk}^{δ(y_d, k)}, the expected score for making prediction ŷ is e(ŷ) = Σ_{y'} Q(y') IoU(ŷ, y'). Since IoU does not decompose, even evaluating e(ŷ) requires a sum over exponentially many joint configurations. Instead, we define a smoothed surrogate expected score that is tractable to evaluate given a prediction ŷ:

ê(ŷ) = (1/K) Σ_{k=1}^K E_{Q(y')}[ Σ_d 1{y'_d = k ∧ ŷ_d = k} ] / E_{Q(y')}[ Σ_d 1{y'_d = k ∨ ŷ_d = k} ].
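One way to evaluate the surrogate ê(ŷ): under the fully factorized Q, the expected per-class intersection and union counts are sums of per-pixel probabilities, so each class contributes a ratio of two linear quantities. The sketch below assumes the marginals are stored as a D x K array, which is an assumed layout.

```python
import numpy as np

def surrogate_expected_iou(q, y_pred):
    """Smoothed surrogate expected IoU of a prediction under the factorized Q (Section 7.2).

    q[d, k] are the per-pixel marginals; y_pred is a length-D integer array with the
    candidate labelling.  Under a fully factorized Q the expected intersection and union
    counts for each class are closed-form sums, so the surrogate averages
    E[intersection] / E[union] over classes.
    """
    D, K = q.shape
    per_class = []
    for k in range(K):
        in_pred = (y_pred == k).astype(float)
        exp_inter = np.sum(q[:, k] * in_pred)                      # E[sum_d 1{y'_d=k and yhat_d=k}]
        exp_union = np.sum(q[:, k] + in_pred - q[:, k] * in_pred)  # E[sum_d 1{y'_d=k or  yhat_d=k}]
        per_class.append(exp_inter / exp_union if exp_union > 0 else 0.0)
    return float(np.mean(per_class))
```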

Our strategy is to initialize the prediction ŷ at the mode of Q, then to greedily hill climb in terms of ê(ŷ) until we reach a local maximum of expected score. At each step, we iterate through classes k, proposing to flip pixel d to label k, where d is the pixel that has the largest probability q_dk amongst pixels not currently labeled k. When we cycle through all k but do not make a flip, we terminate. This yields our prediction ŷ, which we evaluate under IoU(ŷ, y'). Quantitative results for this approach are shown in Fig. 5. Interestingly, even though the KT-trained model gives higher IoU scores for the initial mode prediction, our method surpasses it in all cases after running the expected score optimization. Because the KT-trained model does not produce well-calibrated probabilities, the expected loss optimization either provides little win or hurts its predictions.

Figure 5. Test results for maximizing the surrogate expected IoU score. "Before" corresponds to predicting the mode of Q; "After" is the prediction from our expected score maximization routine.

In Fig. 6, we illustrate the trajectories that the expected score optimizer takes as it performs the local ascent. Each line is for a different image, and the left-most endpoint of the line corresponds to the initialization of the optimizer. As the line moves right, the expected score increases, and ideally the true score will also increase, which would correspond to the line moving upwards. In Fig. 7, we show the change in predictions from before and after running the optimizer for three images, under our predictive distributions.

Figure 6. Trajectories as the local optimizer moves from the mode prediction (left-most point of each line segment) to the prediction that locally maximizes the surrogate expected score (right-most point). The surrogate expected score is on the x-axis, and the true score (which uses the ground truth to compute) is on the y-axis. Panels (a) and (b) show the two compared models on dog test data.

Figure 7. Expected loss optimization results on test images.

Statistical Significance. For all of the experiments in this section, we ran a bootstrap experiment, where we resampled instances with replacement, and computed the mean of each evaluation measure on each resampled set of instances. We repeated the resampling procedure 1000 times and computed the standard deviations across the resampled datasets. Tables with these error bars appear in the Supplementary Material.

7.3. Evaluation of Multilabel Model

Finally, we ran experiments on the multilabel model, and compared it to learning a CRF with loopy belief propagation (LBP) for approximate inference. We used the publicly available libDAI implementation of LBP [11], setting damping to 0.3 and using a maximum of 100 iterations. We constructed a dataset of 5 classes (the four from previously plus background), and chose 80 images evenly from the 4 foreground classes. We similarly constructed a test set of a separate 80 images. To optimize, we parallelized computation across 41 CPUs (1 master, 40 slaves), and let each algorithm run for 12 hours (nearly 500 hours of CPU time). The models produced similar test performance: LBP gave an average test IoU score of 17.2, while ours produced a score of ...; after the expected score optimization, LBP performance increased to 17.3, while ours increased to .... However, the most striking difference between the approaches was the speed and reliability of the inference routines.

While LBP was consistently more than two orders of magnitude slower, its performance got even worse as learning progressed, and we later had problems with non-convergence. Conversely, the graph cuts based inference was uniformly fast and reliable. Quantitative results are shown in Fig. 8.

                            Time (sec)    % Not Conv.
  Loopy BP (first iter.)    11.3 ± 1.7    0%
  Graph cuts (first iter.)  0.14 ± .02
  Loopy BP (last iter.)     59.1 ±        %
  Graph cuts (last iter.)   0.16 ± .02

Figure 8. Time taken for a single inference call on multilabel models, reported early and late in learning. As pairwise potentials get stronger, loopy BP gets slower and less reliable; the graph cuts inference is uniformly reliable and two orders of magnitude faster.

8. Discussion and Related Work

Our approach is a deviation from the standard strategy of defining an intractable model, then devising efficient but approximate inference routines. Instead, we take an efficient inference routine and treat it as a model. Specifically, we asked: for what model is the method of [8] an exact marginal inference routine? The answer is the model that we presented in Section 3. The power of this approach is that we can efficiently compute exact gradients of this model under the maximum likelihood objective (Section 4), so we are directly training the graph cuts inference to produce well-calibrated marginal probabilities at test time. In Section 6, we showed how to extend the ideas to multilabel problems where MAP inference is NP-hard. The key idea is to build a model composed of tractable subcomponent modules, which are as expressive as possible while still admitting efficient exact inference. We showed experimentally that this approach gives strong empirical performance.

If we look at a high level and consider works that define models around efficient computational procedures, there is some related work. [13] defines a generative probabilistic model that includes a discrete optimization procedure as the final step. [4] defines probability models around a fixed number of iterations of belief propagation. Neither of these involves a min-marginal computation, and thus the specifics are quite different, but the general spirits are similar.

At a broader level, we are addressing a low-level vision problem in this work. While low-level vision has received considerable attention in computer vision, there has not been a strong emphasis on producing properly calibrated probabilistic outputs. Our approach maintains the computational efficiency of previous surrogate measures of uncertainty, but it does so within a proper probabilistic framework. We believe this direction to be of importance going forward when building large probabilistic vision systems. There are also direct applications to multiscale image labeling, interactive image segmentation, and active learning.

Finally, our formulation is quite general, and applies to any model that can be assembled from components where min-marginals can be computed efficiently. Our multilabel model is one example of how to assemble graph cuts components. A similar approach may also be attractive in other structured output domains, such as those with bipartite matching and shortest path structures, where min-marginals can be computed efficiently [5] but where exact marginal inference in the standard CRF formulation is NP-hard.

References

[1] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE TPAMI, 26.
[2] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. In ICCV.
[3] M. A. Carreira-Perpinan and G. E. Hinton. On contrastive divergence learning. In International Conference on Artificial Intelligence and Statistics (AISTATS).
[4] J. Domke. Parameter learning with truncated message-passing. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] J. Duchi, D. Tarlow, G. Elidan, and D. Koller. Using combinatorial optimization within max-product belief propagation. In NIPS.
[6] D. Greig, B. Porteous, and A. Seheult. Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society, 51:271-279.
[7] M. Jerrum and A. Sinclair. Polynomial-time approximation algorithms for the Ising model. SIAM Journal on Computing, 22.
[8] P. Kohli and P. H. S. Torr. Measuring uncertainty in graph cut solutions. Computer Vision and Image Understanding, 112(1):30-38.
[9] V. Kolmogorov and C. Rother. Minimizing nonsubmodular functions with graph cuts: a review. PAMI, 29(7).
[10] D. Martin, C. Fowlkes, and J. Malik. Learning to detect natural image boundaries using brightness and texture. In NIPS.
[11] J. M. Mooij. libDAI: A free and open source C++ library for discrete approximate inference in graphical models. JMLR, 11.
[12] I. Murray and Z. Ghahramani. Bayesian learning in undirected graphical models: Approximate MCMC algorithms. In Uncertainty in Artificial Intelligence (UAI).
[13] G. Papandreou and A. Yuille. Perturb-and-MAP random fields: Using discrete optimization to learn and sample from energy models. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).


More information

arxiv: v2 [cs.ds] 11 May 2016

arxiv: v2 [cs.ds] 11 May 2016 Optimizing Star-Convex Functions Jasper C.H. Lee Paul Valiant arxiv:5.04466v2 [cs.ds] May 206 Department of Computer Science Brown University {jasperchlee,paul_valiant}@brown.eu May 3, 206 Abstract We

More information

TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS. Yannick DEVILLE

TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS. Yannick DEVILLE TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS Yannick DEVILLE Université Paul Sabatier Laboratoire Acoustique, Métrologie, Instrumentation Bât. 3RB2, 8 Route e Narbonne,

More information

Lecture 2 Lagrangian formulation of classical mechanics Mechanics

Lecture 2 Lagrangian formulation of classical mechanics Mechanics Lecture Lagrangian formulation of classical mechanics 70.00 Mechanics Principle of stationary action MATH-GA To specify a motion uniquely in classical mechanics, it suffices to give, at some time t 0,

More information

arxiv:hep-th/ v1 3 Feb 1993

arxiv:hep-th/ v1 3 Feb 1993 NBI-HE-9-89 PAR LPTHE 9-49 FTUAM 9-44 November 99 Matrix moel calculations beyon the spherical limit arxiv:hep-th/93004v 3 Feb 993 J. Ambjørn The Niels Bohr Institute Blegamsvej 7, DK-00 Copenhagen Ø,

More information

TIME-DELAY ESTIMATION USING FARROW-BASED FRACTIONAL-DELAY FIR FILTERS: FILTER APPROXIMATION VS. ESTIMATION ERRORS

TIME-DELAY ESTIMATION USING FARROW-BASED FRACTIONAL-DELAY FIR FILTERS: FILTER APPROXIMATION VS. ESTIMATION ERRORS TIME-DEAY ESTIMATION USING FARROW-BASED FRACTIONA-DEAY FIR FITERS: FITER APPROXIMATION VS. ESTIMATION ERRORS Mattias Olsson, Håkan Johansson, an Per öwenborg Div. of Electronic Systems, Dept. of Electrical

More information

Equilibrium in Queues Under Unknown Service Times and Service Value

Equilibrium in Queues Under Unknown Service Times and Service Value University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University

More information

arxiv: v5 [cs.lg] 28 Mar 2017

arxiv: v5 [cs.lg] 28 Mar 2017 Equilibrium Propagation: Briging the Gap Between Energy-Base Moels an Backpropagation Benjamin Scellier an Yoshua Bengio * Université e Montréal, Montreal Institute for Learning Algorithms March 3, 217

More information

State observers and recursive filters in classical feedback control theory

State observers and recursive filters in classical feedback control theory State observers an recursive filters in classical feeback control theory State-feeback control example: secon-orer system Consier the riven secon-orer system q q q u x q x q x x x x Here u coul represent

More information

Calculus of Variations

Calculus of Variations 16.323 Lecture 5 Calculus of Variations Calculus of Variations Most books cover this material well, but Kirk Chapter 4 oes a particularly nice job. x(t) x* x*+ αδx (1) x*- αδx (1) αδx (1) αδx (1) t f t

More information

Generalizing Kronecker Graphs in order to Model Searchable Networks

Generalizing Kronecker Graphs in order to Model Searchable Networks Generalizing Kronecker Graphs in orer to Moel Searchable Networks Elizabeth Boine, Babak Hassibi, Aam Wierman California Institute of Technology Pasaena, CA 925 Email: {eaboine, hassibi, aamw}@caltecheu

More information

Lecture 2: Correlated Topic Model

Lecture 2: Correlated Topic Model Probabilistic Moels for Unsupervise Learning Spring 203 Lecture 2: Correlate Topic Moel Inference for Correlate Topic Moel Yuan Yuan First of all, let us make some claims about the parameters an variables

More information

Optimal Signal Detection for False Track Discrimination

Optimal Signal Detection for False Track Discrimination Optimal Signal Detection for False Track Discrimination Thomas Hanselmann Darko Mušicki Dept. of Electrical an Electronic Eng. Dept. of Electrical an Electronic Eng. The University of Melbourne The University

More information

TAYLOR S POLYNOMIAL APPROXIMATION FOR FUNCTIONS

TAYLOR S POLYNOMIAL APPROXIMATION FOR FUNCTIONS MISN-0-4 TAYLOR S POLYNOMIAL APPROXIMATION FOR FUNCTIONS f(x ± ) = f(x) ± f ' (x) + f '' (x) 2 ±... 1! 2! = 1.000 ± 0.100 + 0.005 ±... TAYLOR S POLYNOMIAL APPROXIMATION FOR FUNCTIONS by Peter Signell 1.

More information

ON THE OPTIMALITY SYSTEM FOR A 1 D EULER FLOW PROBLEM

ON THE OPTIMALITY SYSTEM FOR A 1 D EULER FLOW PROBLEM ON THE OPTIMALITY SYSTEM FOR A D EULER FLOW PROBLEM Eugene M. Cliff Matthias Heinkenschloss y Ajit R. Shenoy z Interisciplinary Center for Applie Mathematics Virginia Tech Blacksburg, Virginia 46 Abstract

More information

Gaussian processes with monotonicity information

Gaussian processes with monotonicity information Gaussian processes with monotonicity information Anonymous Author Anonymous Author Unknown Institution Unknown Institution Abstract A metho for using monotonicity information in multivariate Gaussian process

More information

Role of parameters in the stochastic dynamics of a stick-slip oscillator

Role of parameters in the stochastic dynamics of a stick-slip oscillator Proceeing Series of the Brazilian Society of Applie an Computational Mathematics, v. 6, n. 1, 218. Trabalho apresentao no XXXVII CNMAC, S.J. os Campos - SP, 217. Proceeing Series of the Brazilian Society

More information

Chapter 4. Electrostatics of Macroscopic Media

Chapter 4. Electrostatics of Macroscopic Media Chapter 4. Electrostatics of Macroscopic Meia 4.1 Multipole Expansion Approximate potentials at large istances 3 x' x' (x') x x' x x Fig 4.1 We consier the potential in the far-fiel region (see Fig. 4.1

More information

A Unified Approach for Learning the Parameters of Sum-Product Networks

A Unified Approach for Learning the Parameters of Sum-Product Networks A Unifie Approach for Learning the Parameters of Sum-Prouct Networks Han Zhao Machine Learning Dept. Carnegie Mellon University han.zhao@cs.cmu.eu Pascal Poupart School of Computer Science University of

More information

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems Construction of the Electronic Raial Wave Functions an Probability Distributions of Hyrogen-like Systems Thomas S. Kuntzleman, Department of Chemistry Spring Arbor University, Spring Arbor MI 498 tkuntzle@arbor.eu

More information

Qubit channels that achieve capacity with two states

Qubit channels that achieve capacity with two states Qubit channels that achieve capacity with two states Dominic W. Berry Department of Physics, The University of Queenslan, Brisbane, Queenslan 4072, Australia Receive 22 December 2004; publishe 22 March

More information

Optimization of a point-mass walking model using direct collocation and sequential quadratic programming

Optimization of a point-mass walking model using direct collocation and sequential quadratic programming Optimization of a point-mass walking moel using irect collocation an sequential quaratic programming Chris Dembia June 5, 5 Telescoping actuator y Stance leg Point-mass boy m (x,y) Swing leg x Leg uring

More information

Chapter 9 Method of Weighted Residuals

Chapter 9 Method of Weighted Residuals Chapter 9 Metho of Weighte Resiuals 9- Introuction Metho of Weighte Resiuals (MWR) is an approimate technique for solving bounary value problems. It utilizes a trial functions satisfying the prescribe

More information

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics This moule is part of the Memobust Hanbook on Methoology of Moern Business Statistics 26 March 2014 Metho: Balance Sampling for Multi-Way Stratification Contents General section... 3 1. Summary... 3 2.

More information

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1 Lecture 5 Some ifferentiation rules Trigonometric functions (Relevant section from Stewart, Seventh Eition: Section 3.3) You all know that sin = cos cos = sin. () But have you ever seen a erivation of

More information

Calculus in the AP Physics C Course The Derivative

Calculus in the AP Physics C Course The Derivative Limits an Derivatives Calculus in the AP Physics C Course The Derivative In physics, the ieas of the rate change of a quantity (along with the slope of a tangent line) an the area uner a curve are essential.

More information

Optimal Variable-Structure Control Tracking of Spacecraft Maneuvers

Optimal Variable-Structure Control Tracking of Spacecraft Maneuvers Optimal Variable-Structure Control racking of Spacecraft Maneuvers John L. Crassiis 1 Srinivas R. Vaali F. Lanis Markley 3 Introuction In recent years, much effort has been evote to the close-loop esign

More information

Separation of Variables

Separation of Variables Physics 342 Lecture 1 Separation of Variables Lecture 1 Physics 342 Quantum Mechanics I Monay, January 25th, 2010 There are three basic mathematical tools we nee, an then we can begin working on the physical

More information

State estimation for predictive maintenance using Kalman filter

State estimation for predictive maintenance using Kalman filter Reliability Engineering an System Safety 66 (1999) 29 39 www.elsevier.com/locate/ress State estimation for preictive maintenance using Kalman filter S.K. Yang, T.S. Liu* Department of Mechanical Engineering,

More information

On Characterizing the Delay-Performance of Wireless Scheduling Algorithms

On Characterizing the Delay-Performance of Wireless Scheduling Algorithms On Characterizing the Delay-Performance of Wireless Scheuling Algorithms Xiaojun Lin Center for Wireless Systems an Applications School of Electrical an Computer Engineering, Purue University West Lafayette,

More information

Lyapunov Functions. V. J. Venkataramanan and Xiaojun Lin. Center for Wireless Systems and Applications. School of Electrical and Computer Engineering,

Lyapunov Functions. V. J. Venkataramanan and Xiaojun Lin. Center for Wireless Systems and Applications. School of Electrical and Computer Engineering, On the Queue-Overflow Probability of Wireless Systems : A New Approach Combining Large Deviations with Lyapunov Functions V. J. Venkataramanan an Xiaojun Lin Center for Wireless Systems an Applications

More information