arxiv: v2 [cs.lg] 30 Mar 2017

Size: px
Start display at page:

Download "arxiv: v2 [cs.lg] 30 Mar 2017"

Transcription

1 Batch Renoralization: Towards Reducing Minibatch Dependence in Batch-Noralized Models Sergey Ioffe Google Inc., arxiv: v2 [cs.lg] 30 Mar 2017 Abstract Batch Noralization is quite effective at accelerating and iproving the training of deep odels. However, its effectiveness diinishes when the training inibatches are sall, or do not consist of independent saples. We hypothesize that this is due to the dependence of odel layer inputs on all the exaples in the inibatch, and different activations being produced between training and inference. We propose Batch Renoralization, a siple and effective extension to ensure that the training and inference odels generate the sae outputs that depend on individual exaples rather than the entire inibatch. Models trained with Batch Renoralization perfor substantially better than batchnor when training with sall or non-i.i.d. inibatches. At the sae tie, Batch Renoralization retains the benefits of batchnor such as insensitivity to initialization and training efficiency. 1 Introduction Batch Noralization ( batchnor [6]) has recently becoe a part of the standard toolkit for training deep networks. By noralizing activations, batch noralization helps stabilize the distributions of internal activations as the odel trains. Batch noralization also akes it possible to use significantly higher learning rates, and reduces the sensitivity to initialization. These effects help accelerate the training, soeties draatically so. Batchnor has been successfully used to enable state-of-the-art architectures such as residual networks [5]. Batchnor works on inibatches in stochastic gradient training, and uses the ean and variance of the inibatch to noralize the activations. Specifically, consider a particular node in the deep network, producing a scalar value for each input exaple. Given a inibatch B of exaples, consider the values of this node,x 1...x. Then batchnor takes the for: x i x i µ B B where µ B is the saple ean of x 1...x, and B 2 is the saple variance (in practice, a sall ǫ is added to it for nuerical stability). It is clear that the noralized activations corresponding to an input exaple will depend on the other exaples in the inibatch. This is undesirable during inference, and therefore the ean and variance coputed over all training data can be used instead. In practice, the odel usually aintains oving averages of inibatch eans and variances, and during inference uses those in place of the inibatch statistics. While it appears to ake sense to replace the inibatch statistics with whole-data ones during inference, this changes the activations in the network. In particular, this eans that the upper layers (whose inputs are noralized using the inibatch) are trained on representations different fro those coputed in inference (when the inputs are noralized using the population statistics). When the inibatch size is large and its eleents are i.i.d. saples fro the training distribution, this difference is sall, and can in fact aid generalization. However, inibatchwise noralization ay have significant drawbacks: For sall inibatches, the estiates of the ean and variance becoe less accurate. These inaccuracies are copounded with depth, and reduce the quality of resulting odels. Moreover, as each exaple is used to copute the variance used in its own noralization, the noralization operation is less well approxiated by an affine transfor, which is what is used in inference. Non-i.i.d. inibatches can have a detriental effect on odels with batchnor. For exaple, in a etric learning scenario (e.g. [4]), it is coon to bias the inibatch sapling to include sets of exaples that are known to be related. For instance, for a inibatch of size 32, we ay randoly select 16 labels, then choose 2 exaples for each of those labels. Without batchnor, the loss coputed for the inibatch decouples over the exaples, and the intra-batch dependence introduced by our sapling echanis ay, at worst, increase the variance of the inibatch gradient. With batchnor, however, the exaples interact at every layer, which ay cause the odel to overfit to the specific distribution of inibatches and suffer when used on individual exaples. The dependence of the batch-noralized activations on the entire inibatch akes batchnor powerful, but it is also the source of its drawbacks. Several approaches have been proposed to alleviate this. However, unlike batchnor which can be easily applied to an existing odel, these ethods ay require careful analysis of nonlinearities [1] and ay change the class of functions representable by the odel [2]. Weight noralization [10] 1

2 presents an alternative, but does not offer guarantees about the activations and gradients when the odel contains arbitrary nonlinearities, or contains layers without such noralization. Furtherore, weight noralization has been shown to benefit fro ean-only batch noralization, which, like batchnor, results in different outputs during training and inference. Another alternative [9] is to use a separate and fixed inibatch to copute the noralization paraeters, but this akes the training ore expensive, and does not guarantee that the activations outside the fixed inibatch are noralized. In this paper we propose Batch Renoralization, a new extension to batchnor. Our ethod ensures that the activations coputed in the forward pass of the training step depend only on a single exaple and are identical to the activations coputed in inference. This significantly iproves the training on non-i.i.d. or sall inibatches, copared to batchnor, without incurring extra cost. 2 Prior Work: Batch Noralization We are interested in stochastic gradient optiization of deep networks. The task is to iniize the loss, which decoposes over training exaples: Θ = argin Θ 1 N N l i (Θ) where l i is the loss incurred on the ith training exaple, and Θ is the vector of odel weights. At each training step, a inibatch of exaples is used to copute the gradient 1 i (Θ) Θ which the optiizer uses to adjustθ. Consider a particular node x in a deep network. We observe that x depends on all the odel paraeters that are used for its coputation, and when those change, the distribution of x also changes. Since x itself affects the loss through all the layers above it, this change in distribution coplicates the training of the layers above. This has been referred to as internal covariate shift. Batch Noralization [6] addresses it by considering the values of x in a inibatch B = {x 1... }. It then noralizes the as follows: µ B 1 x i B 1 (x i µ B ) 2 +ǫ x i x i µ B B y i γ x i +β BN(x i ) Here γ and β are trainable paraeters (learned using the sae procedure, such as stochastic gradient descent, as all the other odel weights), and ǫ is a sall constant. Crucially, the coputation of the saple eanµ B and saple standard deviation B are part of the odel architecture, are theselves functions of the odel paraeters, and as such participate in backpropagation. The backpropagation forulas for batchnor are easy to derive by chain rule and are given in [6]. When applying batchnor to a layer of activations x, the noralization takes place independently for each diension (or, in the convolutional case, for each channel or feature ap). Whenxis itself a result of applying a linear transfor W to the previous layer, batchnor akes the odel invariant to the scale of W (ignoring the sall ǫ). This invariance akes it possible to not be picky about weight initialization, and to use larger learning rates. Besides the reduction of internal covariate shift, an intuition for another effect of batchnor can be obtained by considering the gradients with respect to different layers. Consider the noralized layer x, whose eleents all have zero ean and unit variance. For a thought experient, let us assue that the diensions of x are independent. Further, let us approxiate the loss l( x) as its first-order Taylor expansion: l l 0 + g T x, where g = x. It then follows that Var[l] g 2 in which the left-hand side does not depend on the layer we picked. This eans that the nor of the gradient w.r.t. a noralized layer x is approxiately the sae for different noralized layers. Therefore the gradients, as they flow through the network, do not explode nor vanish, thus facilitating the training. While the assuptions of independence and linearity do not hold in practice, the gradient flow is in fact significantly iproved in batch-noralized odels. During inference, the standard practice is to noralize the activations using the oving averagesµ, 2 instead of inibatch eanµ B and variance 2 B : y inference = x µ γ +β which depends only on a single input exaple rather than requiring a whole inibatch. It is natural to ask whether we could siply use the oving averages µ, to perfor the noralization during training, since this would reove the dependence of the noralized activations on the other exaple in the inibatch. This, however, has been observed to lead to the odel blowing up. As argued in [6], such use of oving averages would cause the gradient optiization and the noralization to counteract each other. For exaple, the gradient step ay increase a bias or scale the convolutional weights, in spite of the fact that the noralization would cancel the effect of these changes on the loss. This would result in unbounded growth of odel paraeters without actually iproving the loss. It is thus crucial to use the inibatch oents, and to backpropagate through the. 2

3 3 Batch Renoralization With batchnor, the activities in the network differ between training and inference, since the noralization is done differently between the two odels. Here, we ai to rectify this, while retaining the benefits of batchnor. Let us observe that if we have a inibatch and noralize a particular node x using either the inibatch statistics or their oving averages, then the results of these two noralizations are related by an affine transfor. Specifically, let µ be an estiate of the ean of x, and be an estiate of its standard deviation, coputed perhaps as a oving average over the last several inibatches. Then, we have: x i µ = x i µ B r+d, wherer = B B, d = µ B µ If = E[ B ] andµ = E[µ B ], thene[r] = 1 ande[d] = 0 (the expectations are w.r.t. a inibatch B). Batch Noralization, in fact, siply sets r = 1,d = 0. We propose to retain r and d, but treat the as constants for the purposes of gradient coputation. In other words, we augent a network, which contains batch noralization layers, with a per-diension affine transforation applied to the noralized activations. We treat the paraeters r and d of this affine transfor as fixed, even though they were coputed fro the inibatch itself. It is iportant to note that this transfor is identity in expectation, as long as = E[ B ] andµ = E[µ B ]. We refer to batch noralization augented with this affine transfor as Batch Renoralization: the fixed (for the given inibatch) r and d correct for the fact that the inibatch statistics differ fro the population ones. This allows the above layers to observe the correct activations naely, the ones that would be generated by the inference odel. In practice, it is beneficial to train the odel for a certain nuber of iterations with batchnor alone, without the correction, then rap up the aount of allowed correction. We do this by iposing bounds onr andd, which initially constrain the to 1 and 0, respectively, and then are gradually relaxed. Algorith 1 presents Batch Renoralization. Unlike batchnor, where the oving averages are coputed during training but used only for inference, Batch Renor does use µ and during training to perfor the correction. We use a fairly high rate of update α for these averages, to ensure that they benefit fro averaging ultiple batches but do not becoe stale relative to the odel paraeters. We explicitly update the exponentially-decayed oving averages µ and, and optiize the rest of the odel using gradient optiization, with the gradients cal- Input: Values of x over a training ini-batch B = {x 1... }; paraetersγ, β; current oving ean µ and standard deviation ; oving average update rate α; axiu allowed correctionr ax,d ax. Output: {y i = BatchRenor(x i )}; updatedµ,. µ B 1 x i B 1 ǫ+ (x i µ B ) 2 ( ( B )) r stop gradient clip [1/rax,r ax] ( ( )) µb µ d stop gradient clip [ dax,d ax] x i x i µ B B r +d y i γ x i +β µ := µ+α(µ B µ) // Update oving averages := +α( B ) Inference: y γ x µ +β Algorith 1: Training (top) and inference (botto) with Batch Renoralization, applied to activation x over a ini-batch. During backpropagation, standard chain rule is used. The values arked with stop gradient are treated as constant for a given training step, and the gradient is not propagated through the. culated via backpropagation: = γ x i y i B = µ B = x i (x i µ B ) r 2 B x i r B = r + xi µ B + 1 x i B B B µ B γ = x i y i β = y i These gradient equations reveal another interpretation of Batch Renoralization. Because the loss l is unaffected when all x i are shifted or scaled by the sae aount, 3

4 the functions l({x i + t}) and l({x i (1 + t)}) are constant in t, and coputing their derivatives at t = 0 gives = 0 and x i = 0. Therefore, if we consider the -diensional vector { } (with one eleent per exaple in the inibatch), and further consider { two vectorsp 0 = (1,...,1) andp 1 = (x 1,...,x ), then } lies in the null-space ofp0 andp 1. In fact, it is easy to see fro the Batch Renor backprop forulas that to copute the gradient { } { fro } x i, we need to first scale the latter byr/ B, then project it onto the null-space of p 0 and p 1. For r = B, this is equivalent to the backprop for the transforation x µ, but cobined with the null-space projection. In other words, Batch Renoralization allows us to noralize using oving averages µ, in training, and akes it work using the extra projection step in backprop. Batch Renoralization shares any of the beneficial properties of batchnor, such as insensitivity to initialization and ability to train efficiently with large learning rates. Unlike batchnor, our ethod ensures that that all layers are trained on internal representations that will be actually used during inference. 4 Results To evaluate Batch Renoralization, we applied it to the proble of iage classification. Our baseline odel is Inception v3 [12], trained on 1000 classes fro IageNet training set [8], and evaluated on the IageNet validation data. In the baseline odel, batchnor was used after convolution and before the ReLU [7]. To apply Batch Renor, we siply swapped it into the odel in place of batchnor. Both ethods noralize each feature ap over exaples as well as over spatial locations. We fix the scale γ = 1, since it could be propagated through the ReLU and absorbed into the next layer. The training used 50 synchronized workers [3]. Each worker processed a inibatch of 32 exaples per training step. The gradients coputed for all 50 inibatches were aggregated and then used by the RMSProp optiizer [13]. As is coon practice, the inference odel used exponentially-decayed oving averages of all odel paraeters, including the µ and coputed by both batchnor and Batch Renor. For Batch Renor, we used r ax = 1, d ax = 0 (i.e. siply batchnor) for the first 5000 training steps, after which these were gradually relaxed to reach r ax = 3 at 40k steps, and d ax = 5 at 25k steps. These final values resulted in clipping a sall fraction of rs, and none of ds. However, at the beginning of training, when the learning rate was larger, it proved iportant to increase r ax slowly: otherwise, occasional large gradients were observed to suddenly and severely increase the loss. To account for the fact that the eans and variances change as the odel trains, we used relatively fast updates to the oving statistics µ and, with α = Because of Accuracy (%) Baseline (78.3%) Batch Renoralization (78.5%) 30 0k 20k 40k 60k 80k 100k 120k 140k Training steps Figure 1: Validation top-1 accuracy of Inception-v3 odel with batchnor and its Batch Renor version, trained on 50 synchronized workers, each processing inibatches of size 32. The Batch Renor odel achieves a arginally higher validation accuracy. this and keepingr ax = 1 for a relatively large nuber of steps, we did not need to apply initialization bias correction [? ]. All the hyperparaeters other than those related to noralization were fixed between the odels and across experients. 4.1 Baseline As a baseline, we trained the batchnor odel using the inibatch size of 32. More specifically, batchnor was applied to each of the 50 inibatches; each exaple was noralized using 32 exaples, but the resulting gradients were aggregated over 50 inibatches. This odel achieved the top-1 validation accuracy of 78.3% after 130k training steps. To verify that Batch Renor does not diinish perforance on such inibatches, we also trained the odel with Batch Renor, see Figure 1. The test accuracy of this odel closely tracked the baseline, achieving a slightly higher test accuracy (78.5%) after the sae nuber of steps. 4.2 Sall inibatches To investigate the effectiveness of Batch Renor when training on sall inibatches, we reduced the nuber of exaples used for noralization to 4. Each inibatch of size 32 was thus broken into icrobatches each having 4 exaples; each icrobatch was noralized independently, but the loss for each inibatch was coputed as before. In other words, the gradient was still aggregated over 1600 exaples per step, but the noralization 4

5 Accuracy (%) Accuracy (%) Batchnor (67.0%) Batchnor, train accuracy (72.8%) Non-i.i.d. in test, using µ B, B (76.5%) Batchnor (74.2%) Batchnor on half-inibatches (77.4%) Batch Renoralization (76.5%) 30 0k 20k 40k 60k 80k 100k 120k 140k 160k Training steps Batch Renoralization (78.6%) 30 0k 20k 40k 60k 80k 100k 120k 140k Training steps Figure 2: Validation accuracy for odels trained with either batchnor or Batch Renor, where noralization is perfored for sets of 4 exaples (but with the gradients aggregated over all50 32 exaples processed by the 50 workers). Batch Renor allows the odel to train faster and achieve a higher accuracy, although noralizing sets of 32 exaples perfors better. involved groups of 4 exaples rather than 32 as in the baseline. Figure 2 shows the results. The validation accuracy of the batchnor odel is significantly lower than the baseline that noralized over inibatches of size 32, and training is slow, achieving 74.2% at 210k steps. We obtain a substantial iproveent uch faster (76.5% at 130k steps) by replacing batchnor with Batch Renor, However, the resulting test accuracy is still below what we get when applying either batchnor or Batch Renor to size 32 inibatches. Although Batch Renor iproves the training with sall inibatches, it does not eliinate the benefit of having larger ones. 4.3 Non-i.i.d. inibatches When exaples in a inibatch are not sapled independently, batchnor can perfor rather poorly. However, sapling with dependencies ay be necessary for tasks such as for etric learning [4, 11]. We ay want to ensure that iages with the sae label have ore siilar representations than otherwise, and to learn this we require that a reasonable nuber of sae-label iage pairs can be found within the sae inibatch. In this experient (Figure 3), we selected each inibatch of size 32 by randoly sapling 16 labels (out of the total 1000) with replaceent, then randoly selecting 2 iages for each of those labels. When training with batchnor, the test accuracy is uch lower than for i.i.d. inibatches, achieving only 67%. Surprisingly, even the training accuracy is uch lower (72.8%) than the test accuracy in the i.i.d. case, and in fact exhibits a drop that is Figure 3: Validation accuracy when training on non-i.i.d. inibatches, obtained by sapling 2 iages for each of 16 (out of total 1000) rando labels. This distribution bias results not only in a low test accuracy, but also low accuracy on the training set, with an eventual drop. This indicates overfitting to the particular inibatch distribution, which is confired by the iproveent when the test inibatches also contain 2 iages per label, and batchnor uses inibatch statisticsµ B, B during inference. It iproves further if batchnor is applied separately to 2 halves of a training inibatch, aking each of the ore i.i.d. Finally, by using Batch Renor, we are able to just train and evaluate norally, and achieve the sae validation accuracy as we get for i.i.d. inibatches in Fig. 1. consistent with overfitting. We suspect that this is in fact what happens: the odel learns to predict labels for iages that coe in a set, where each iage has a counterpart with the sae label. This does not directly translate to classifying iages individually, thus producing a drop in the accuracy coputed on the training data. To verify this, we also evaluated the odel in the training ode, i.e. using inibatch statistics µ B, B instead of oving averagesµ,, where each test inibatch had size 50 and was obtained using the sae procedure as the training inibatches 25 labels, with 2 iages per label. As expected, this does uch better, achieving 76.5%, though still below the baseline accuracy. Of course, this evaluation scenario is usually infeasible, as we want the iage representation to be a deterinistic function of that iage alone. We can iprove the accuracy for this proble by splitting each inibatch into two halves of size 16 each, so that for every pair of iages belonging to the sae class, one iage is assigned to the first half-inibatch, and the other to the second. Each half is then ore i.i.d., and this achieves a uch better test accuracy (77.4% at 140k steps), but still below the baseline. This ethod is only applicable when the nuber of exaples per label is sall (since this deterines the nuber of icrobatches that a 5

6 inibatch needs to be split into). With Batch Renor, we siply trained the odel with inibatch size of 32. The odel achieved the sae test accuracy (78.5% at 120k steps) as the equivalent odel on i.i.d. inibatches, vs. 67% obtained with batchnor. By replacing batchnor with Batch Renor, we ensured that the inference odel can effectively classify individual iages. This has copletely eliinated the effect of overfitting the odel to iage sets with a biased label distribution. 5 Conclusions We have deonstrated that Batch Noralization, while effective, is not well suited to sall or non-i.i.d. training inibatches. We hypothesized that these drawbacks are due to the fact that the activations in the odel, which are in turn used by other layers as inputs, are coputed differently during training than during inference. We address this with Batch Renoralization, which replaces batchnor and ensures that the outputs coputed by the odel are dependent only on the individual exaples and not the entire inibatch, during both training and inference. Batch Renoralization extends batchnor with a perdiension correction to ensure that the activations atch between the training and inference networks. This correction is identity in expectation; its paraeters are coputed fro the inibatch but are treated as constant by the optiizer. Unlike batchnor, where the eans and variances used during inference do not need to be coputed until the training has copleted, Batch Renoralization benefits fro having these statistics directly participate in the training. Batch Renoralization is as easy to ipleent as batchnor itself, runs at the sae speed during both training and inference, and significantly iproves training on sall or non-i.i.d. inibatches. Our ethod does have extra hyperparaeters: the update rate α for the oving averages, and the schedules for correction liits d ax, r ax. A ore extensive investigation of the effect of these is a part of future work. Batch Renoralization offers a proise of iproving the perforance of any odel that would norally use batchnor. This includes Residual Networks [5]. Another application is Generative Adversarial Networks [9], where the non-deterinis introduced by batchnor has been found to be an issue, and Batch Renor ay provide a solution. Finally, Batch Renoralization ay benefit applications where applying batch noralization has been difficult such as recurrent networks. There, batchnor would require each tiestep to be noralized independently, but Batch Renoralization ay ake it possible to use the sae running averages to noralize all tiesteps, and then update those averages using all tiesteps. This reains one of the areas that warrants further exploration. References [1] D. Arpit, Y. Zhou, B. U. Kota, and V. Govindaraju. Noralization propagation: A paraetric technique for reoving internal covariate shift in deep networks. arxiv preprint arxiv: , [2] J. L. Ba, J. R. Kiros, and G. E. Hinton. Layer noralization. arxiv preprint arxiv: , [3] J. Chen, R. Monga, S. Bengio, and R. Jozefowicz. Revisiting distributed synchronous sgd. arxiv preprint arxiv: , [4] J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood coponents analysis. In Advances in Neural Inforation Processing Systes 17, [5] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for iage recognition. In Proceedings of the IEEE Conference on Coputer Vision and Pattern Recognition, pages , [6] S. Ioffe and C. Szegedy. Batch noralization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages , [7] V. Nair and G. E. Hinton. Rectified linear units iprove restricted boltzann achines. In ICML, pages Onipress, [8] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. IageNet Large Scale Visual Recognition Challenge, [9] T. Salians, I. Goodfellow, W. Zareba, V. Cheung, A. Radford, and X. Chen. Iproved techniques for training gans. In Advances in Neural Inforation Processing Systes, pages , [10] T. Salians and D. P. Kinga. Weight noralization: A siple reparaeterization to accelerate training of deep neural networks. In Advances in Neural Inforation Processing Systes, pages , [11] F. Schroff, D. Kalenichenko, and J. Philbin. Facenet: A unified ebedding for face recognition and clustering. CoRR, abs/ , [12] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for coputer vision. In Proceedings of the IEEE Conference on Coputer Vision and Pattern Recognition, pages , [13] T. Tielean and G. Hinton. Lecture rsprop. COURSERA: Neural Networks for Machine Learning,

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes

More information

Intelligent Systems: Reasoning and Recognition. Artificial Neural Networks

Intelligent Systems: Reasoning and Recognition. Artificial Neural Networks Intelligent Systes: Reasoning and Recognition Jaes L. Crowley MOSIG M1 Winter Seester 2018 Lesson 7 1 March 2018 Outline Artificial Neural Networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

Ch 12: Variations on Backpropagation

Ch 12: Variations on Backpropagation Ch 2: Variations on Backpropagation The basic backpropagation algorith is too slow for ost practical applications. It ay take days or weeks of coputer tie. We deonstrate why the backpropagation algorith

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley ENSIAG 2 / osig 1 Second Seester 2012/2013 Lesson 20 2 ay 2013 Kernel ethods and Support Vector achines Contents Kernel Functions...2 Quadratic

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

A Simple Regression Problem

A Simple Regression Problem A Siple Regression Proble R. M. Castro March 23, 2 In this brief note a siple regression proble will be introduced, illustrating clearly the bias-variance tradeoff. Let Y i f(x i ) + W i, i,..., n, where

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee227c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee227c@berkeley.edu October

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016 Lessons 7 14 Dec 2016 Outline Artificial Neural networks Notation...2 1. Introduction...3... 3 The Artificial

More information

Stochastic Subgradient Methods

Stochastic Subgradient Methods Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair Proceedings of the 6th SEAS International Conference on Siulation, Modelling and Optiization, Lisbon, Portugal, Septeber -4, 006 0 A Siplified Analytical Approach for Efficiency Evaluation of the eaving

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

Combining Classifiers

Combining Classifiers Cobining Classifiers Generic ethods of generating and cobining ultiple classifiers Bagging Boosting References: Duda, Hart & Stork, pg 475-480. Hastie, Tibsharini, Friedan, pg 246-256 and Chapter 10. http://www.boosting.org/

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016/2017 Lessons 9 11 Jan 2017 Outline Artificial Neural networks Notation...2 Convolutional Neural Networks...3

More information

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices CS71 Randoness & Coputation Spring 018 Instructor: Alistair Sinclair Lecture 13: February 7 Disclaier: These notes have not been subjected to the usual scrutiny accorded to foral publications. They ay

More information

1 Bounding the Margin

1 Bounding the Margin COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

A Theoretical Analysis of a Warm Start Technique

A Theoretical Analysis of a Warm Start Technique A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful

More information

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

A Smoothed Boosting Algorithm Using Probabilistic Output Codes

A Smoothed Boosting Algorithm Using Probabilistic Output Codes A Soothed Boosting Algorith Using Probabilistic Output Codes Rong Jin rongjin@cse.su.edu Dept. of Coputer Science and Engineering, Michigan State University, MI 48824, USA Jian Zhang jian.zhang@cs.cu.edu

More information

Tracking using CONDENSATION: Conditional Density Propagation

Tracking using CONDENSATION: Conditional Density Propagation Tracking using CONDENSATION: Conditional Density Propagation Goal Model-based visual tracking in dense clutter at near video frae rates M. Isard and A. Blake, CONDENSATION Conditional density propagation

More information

2 Q 10. Likewise, in case of multiple particles, the corresponding density in 2 must be averaged over all

2 Q 10. Likewise, in case of multiple particles, the corresponding density in 2 must be averaged over all Lecture 6 Introduction to kinetic theory of plasa waves Introduction to kinetic theory So far we have been odeling plasa dynaics using fluid equations. The assuption has been that the pressure can be either

More information

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels Extension of CSRSM for the Paraetric Study of the Face Stability of Pressurized Tunnels Guilhe Mollon 1, Daniel Dias 2, and Abdul-Haid Soubra 3, M.ASCE 1 LGCIE, INSA Lyon, Université de Lyon, Doaine scientifique

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

The Weierstrass Approximation Theorem

The Weierstrass Approximation Theorem 36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined

More information

Analyzing Simulation Results

Analyzing Simulation Results Analyzing Siulation Results Dr. John Mellor-Cruey Departent of Coputer Science Rice University johnc@cs.rice.edu COMP 528 Lecture 20 31 March 2005 Topics for Today Model verification Model validation Transient

More information

Block designs and statistics

Block designs and statistics Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent

More information

A method to determine relative stroke detection efficiencies from multiplicity distributions

A method to determine relative stroke detection efficiencies from multiplicity distributions A ethod to deterine relative stroke detection eiciencies ro ultiplicity distributions Schulz W. and Cuins K. 2. Austrian Lightning Detection and Inoration Syste (ALDIS), Kahlenberger Str.2A, 90 Vienna,

More information

1 Proof of learning bounds

1 Proof of learning bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a

More information

Training an RBM: Contrastive Divergence. Sargur N. Srihari

Training an RBM: Contrastive Divergence. Sargur N. Srihari Training an RBM: Contrastive Divergence Sargur N. srihari@cedar.buffalo.edu Topics in Partition Function Definition of Partition Function 1. The log-likelihood gradient 2. Stochastic axiu likelihood and

More information

Soft-margin SVM can address linearly separable problems with outliers

Soft-margin SVM can address linearly separable problems with outliers Non-linear Support Vector Machines Non-linearly separable probles Hard-argin SVM can address linearly separable probles Soft-argin SVM can address linearly separable probles with outliers Non-linearly

More information

lecture 36: Linear Multistep Mehods: Zero Stability

lecture 36: Linear Multistep Mehods: Zero Stability 95 lecture 36: Linear Multistep Mehods: Zero Stability 5.6 Linear ultistep ethods: zero stability Does consistency iply convergence for linear ultistep ethods? This is always the case for one-step ethods,

More information

Variational Adaptive-Newton Method

Variational Adaptive-Newton Method Variational Adaptive-Newton Method Mohaad Etiyaz Khan Wu Lin Voot Tangkaratt Zuozhu Liu Didrik Nielsen AIP, RIKEN, Tokyo Abstract We present a black-box learning ethod called the Variational Adaptive-Newton

More information

Synthetic Generation of Local Minima and Saddle Points for Neural Networks

Synthetic Generation of Local Minima and Saddle Points for Neural Networks uigi Malagò 1 Diitri Marinelli 1 Abstract In this work-in-progress paper, we study the landscape of the epirical loss function of a feed-forward neural network, fro the perspective of the existence and

More information

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t. CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when

More information

Probability Distributions

Probability Distributions Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples

More information

arxiv: v3 [cs.cv] 10 Apr 2017

arxiv: v3 [cs.cv] 10 Apr 2017 All You Need is Beyond a Good Init: Exploring Better Solution for Training Extreely Deep Convolutional Neural Networks with Orthonorality and Modulation arxiv:73.87v3 [cs.cv] Apr 7 Di Xie xiedi@hikvision.co

More information

E. Alpaydın AERFAISS

E. Alpaydın AERFAISS E. Alpaydın AERFAISS 00 Introduction Questions: Is the error rate of y classifier less than %? Is k-nn ore accurate than MLP? Does having PCA before iprove accuracy? Which kernel leads to highest accuracy

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,

More information

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis E0 370 tatistical Learning Theory Lecture 6 (Aug 30, 20) Margin Analysis Lecturer: hivani Agarwal cribe: Narasihan R Introduction In the last few lectures we have seen how to obtain high confidence bounds

More information

When Short Runs Beat Long Runs

When Short Runs Beat Long Runs When Short Runs Beat Long Runs Sean Luke George Mason University http://www.cs.gu.edu/ sean/ Abstract What will yield the best results: doing one run n generations long or doing runs n/ generations long

More information

e-companion ONLY AVAILABLE IN ELECTRONIC FORM

e-companion ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer

More information

Chaotic Coupled Map Lattices

Chaotic Coupled Map Lattices Chaotic Coupled Map Lattices Author: Dustin Keys Advisors: Dr. Robert Indik, Dr. Kevin Lin 1 Introduction When a syste of chaotic aps is coupled in a way that allows the to share inforation about each

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 2: PAC Learning and VC Theory I Fro Adversarial Online to Statistical Three reasons to ove fro worst-case deterinistic

More information

Sharp Time Data Tradeoffs for Linear Inverse Problems

Sharp Time Data Tradeoffs for Linear Inverse Problems Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

COS 424: Interacting with Data. Written Exercises

COS 424: Interacting with Data. Written Exercises COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well

More information

Physically Based Modeling CS Notes Spring 1997 Particle Collision and Contact

Physically Based Modeling CS Notes Spring 1997 Particle Collision and Contact Physically Based Modeling CS 15-863 Notes Spring 1997 Particle Collision and Contact 1 Collisions with Springs Suppose we wanted to ipleent a particle siulator with a floor : a solid horizontal plane which

More information

ma x = -bv x + F rod.

ma x = -bv x + F rod. Notes on Dynaical Systes Dynaics is the study of change. The priary ingredients of a dynaical syste are its state and its rule of change (also soeties called the dynaic). Dynaical systes can be continuous

More information

Department of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China

Department of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China 6th International Conference on Machinery, Materials, Environent, Biotechnology and Coputer (MMEBC 06) Solving Multi-Sensor Multi-Target Assignent Proble Based on Copositive Cobat Efficiency and QPSO Algorith

More information

lecture 37: Linear Multistep Methods: Absolute Stability, Part I lecture 38: Linear Multistep Methods: Absolute Stability, Part II

lecture 37: Linear Multistep Methods: Absolute Stability, Part I lecture 38: Linear Multistep Methods: Absolute Stability, Part II lecture 37: Linear Multistep Methods: Absolute Stability, Part I lecture 3: Linear Multistep Methods: Absolute Stability, Part II 5.7 Linear ultistep ethods: absolute stability At this point, it ay well

More information

Lecture 12: Ensemble Methods. Introduction. Weighted Majority. Mixture of Experts/Committee. Σ k α k =1. Isabelle Guyon

Lecture 12: Ensemble Methods. Introduction. Weighted Majority. Mixture of Experts/Committee. Σ k α k =1. Isabelle Guyon Lecture 2: Enseble Methods Isabelle Guyon guyoni@inf.ethz.ch Introduction Book Chapter 7 Weighted Majority Mixture of Experts/Coittee Assue K experts f, f 2, f K (base learners) x f (x) Each expert akes

More information

A remark on a success rate model for DPA and CPA

A remark on a success rate model for DPA and CPA A reark on a success rate odel for DPA and CPA A. Wieers, BSI Version 0.5 andreas.wieers@bsi.bund.de Septeber 5, 2018 Abstract The success rate is the ost coon evaluation etric for easuring the perforance

More information

A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION

A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION A eshsize boosting algorith in kernel density estiation A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION C.C. Ishiekwene, S.M. Ogbonwan and J.E. Osewenkhae Departent of Matheatics, University

More information

Multi-Scale/Multi-Resolution: Wavelet Transform

Multi-Scale/Multi-Resolution: Wavelet Transform Multi-Scale/Multi-Resolution: Wavelet Transfor Proble with Fourier Fourier analysis -- breaks down a signal into constituent sinusoids of different frequencies. A serious drawback in transforing to the

More information

RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION

RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION GUANGHUI LAN AND YI ZHOU Abstract. In this paper, we consider a class of finite-su convex optiization probles defined over a distributed

More information

Detection and Estimation Theory

Detection and Estimation Theory ESE 54 Detection and Estiation Theory Joseph A. O Sullivan Sauel C. Sachs Professor Electronic Systes and Signals Research Laboratory Electrical and Systes Engineering Washington University 11 Urbauer

More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information

Data-Driven Imaging in Anisotropic Media

Data-Driven Imaging in Anisotropic Media 18 th World Conference on Non destructive Testing, 16- April 1, Durban, South Africa Data-Driven Iaging in Anisotropic Media Arno VOLKER 1 and Alan HUNTER 1 TNO Stieltjesweg 1, 6 AD, Delft, The Netherlands

More information

Principal Components Analysis

Principal Components Analysis Principal Coponents Analysis Cheng Li, Bingyu Wang Noveber 3, 204 What s PCA Principal coponent analysis (PCA) is a statistical procedure that uses an orthogonal transforation to convert a set of observations

More information

Support Vector Machines MIT Course Notes Cynthia Rudin

Support Vector Machines MIT Course Notes Cynthia Rudin Support Vector Machines MIT 5.097 Course Notes Cynthia Rudin Credit: Ng, Hastie, Tibshirani, Friedan Thanks: Şeyda Ertekin Let s start with soe intuition about argins. The argin of an exaple x i = distance

More information

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis City University of New York (CUNY) CUNY Acadeic Works International Conference on Hydroinforatics 8-1-2014 Experiental Design For Model Discriination And Precise Paraeter Estiation In WDS Analysis Giovanna

More information

Lower Bounds for Quantized Matrix Completion

Lower Bounds for Quantized Matrix Completion Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &

More information

Estimating Parameters for a Gaussian pdf

Estimating Parameters for a Gaussian pdf Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3

More information

Domain-Adversarial Neural Networks

Domain-Adversarial Neural Networks Doain-Adversarial Neural Networks Hana Ajakan, Pascal Gerain 2, Hugo Larochelle 3, François Laviolette 2, Mario Marchand 2,2 Départeent d inforatique et de génie logiciel, Université Laval, Québec, Canada

More information

Multi-view Discriminative Manifold Embedding for Pattern Classification

Multi-view Discriminative Manifold Embedding for Pattern Classification Multi-view Discriinative Manifold Ebedding for Pattern Classification X. Wang Departen of Inforation Zhenghzou 450053, China Y. Guo Departent of Digestive Zhengzhou 450053, China Z. Wang Henan University

More information

Some Perspective. Forces and Newton s Laws

Some Perspective. Forces and Newton s Laws Soe Perspective The language of Kineatics provides us with an efficient ethod for describing the otion of aterial objects, and we ll continue to ake refineents to it as we introduce additional types of

More information

Randomized Accuracy-Aware Program Transformations For Efficient Approximate Computations

Randomized Accuracy-Aware Program Transformations For Efficient Approximate Computations Randoized Accuracy-Aware Progra Transforations For Efficient Approxiate Coputations Zeyuan Allen Zhu Sasa Misailovic Jonathan A. Kelner Martin Rinard MIT CSAIL zeyuan@csail.it.edu isailo@it.edu kelner@it.edu

More information

PHY 171. Lecture 14. (February 16, 2012)

PHY 171. Lecture 14. (February 16, 2012) PHY 171 Lecture 14 (February 16, 212) In the last lecture, we looked at a quantitative connection between acroscopic and icroscopic quantities by deriving an expression for pressure based on the assuptions

More information

Figure 1: Equivalent electric (RC) circuit of a neurons membrane

Figure 1: Equivalent electric (RC) circuit of a neurons membrane Exercise: Leaky integrate and fire odel of neural spike generation This exercise investigates a siplified odel of how neurons spike in response to current inputs, one of the ost fundaental properties of

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

Normalization Techniques

Normalization Techniques Normalization Techniques Devansh Arpit Normalization Techniques 1 / 39 Table of Contents 1 Introduction 2 Motivation 3 Batch Normalization 4 Normalization Propagation 5 Weight Normalization 6 Layer Normalization

More information

Fairness via priority scheduling

Fairness via priority scheduling Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation

More information

C na (1) a=l. c = CO + Clm + CZ TWO-STAGE SAMPLE DESIGN WITH SMALL CLUSTERS. 1. Introduction

C na (1) a=l. c = CO + Clm + CZ TWO-STAGE SAMPLE DESIGN WITH SMALL CLUSTERS. 1. Introduction TWO-STGE SMPLE DESIGN WITH SMLL CLUSTERS Robert G. Clark and David G. Steel School of Matheatics and pplied Statistics, University of Wollongong, NSW 5 ustralia. (robert.clark@abs.gov.au) Key Words: saple

More information

PAC-Bayes Analysis Of Maximum Entropy Learning

PAC-Bayes Analysis Of Maximum Entropy Learning PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E

More information

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Proc. of the IEEE/OES Seventh Working Conference on Current Measureent Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Belinda Lipa Codar Ocean Sensors 15 La Sandra Way, Portola Valley, CA 98 blipa@pogo.co

More information

Feedforward Networks. Gradient Descent Learning and Backpropagation. Christian Jacob. CPSC 533 Winter 2004

Feedforward Networks. Gradient Descent Learning and Backpropagation. Christian Jacob. CPSC 533 Winter 2004 Feedforward Networks Gradient Descent Learning and Backpropagation Christian Jacob CPSC 533 Winter 2004 Christian Jacob Dept.of Coputer Science,University of Calgary 2 05-2-Backprop-print.nb Adaptive "Prograing"

More information

Using a De-Convolution Window for Operating Modal Analysis

Using a De-Convolution Window for Operating Modal Analysis Using a De-Convolution Window for Operating Modal Analysis Brian Schwarz Vibrant Technology, Inc. Scotts Valley, CA Mark Richardson Vibrant Technology, Inc. Scotts Valley, CA Abstract Operating Modal Analysis

More information

1 Generalization bounds based on Rademacher complexity

1 Generalization bounds based on Rademacher complexity COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #0 Scribe: Suqi Liu March 07, 08 Last tie we started proving this very general result about how quickly the epirical average converges

More information

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials Fast Montgoery-like Square Root Coputation over GF( ) for All Trinoials Yin Li a, Yu Zhang a, a Departent of Coputer Science and Technology, Xinyang Noral University, Henan, P.R.China Abstract This letter

More information

Bayes Decision Rule and Naïve Bayes Classifier

Bayes Decision Rule and Naïve Bayes Classifier Bayes Decision Rule and Naïve Bayes Classifier Le Song Machine Learning I CSE 6740, Fall 2013 Gaussian Mixture odel A density odel p(x) ay be ulti-odal: odel it as a ixture of uni-odal distributions (e.g.

More information

HIGH RESOLUTION NEAR-FIELD MULTIPLE TARGET DETECTION AND LOCALIZATION USING SUPPORT VECTOR MACHINES

HIGH RESOLUTION NEAR-FIELD MULTIPLE TARGET DETECTION AND LOCALIZATION USING SUPPORT VECTOR MACHINES ICONIC 2007 St. Louis, O, USA June 27-29, 2007 HIGH RESOLUTION NEAR-FIELD ULTIPLE TARGET DETECTION AND LOCALIZATION USING SUPPORT VECTOR ACHINES A. Randazzo,. A. Abou-Khousa 2,.Pastorino, and R. Zoughi

More information

Ensemble Based on Data Envelopment Analysis

Ensemble Based on Data Envelopment Analysis Enseble Based on Data Envelopent Analysis So Young Sohn & Hong Choi Departent of Coputer Science & Industrial Systes Engineering, Yonsei University, Seoul, Korea Tel) 82-2-223-404, Fax) 82-2- 364-7807

More information

Fast Structural Similarity Search of Noncoding RNAs Based on Matched Filtering of Stem Patterns

Fast Structural Similarity Search of Noncoding RNAs Based on Matched Filtering of Stem Patterns Fast Structural Siilarity Search of Noncoding RNs Based on Matched Filtering of Ste Patterns Byung-Jun Yoon Dept. of Electrical Engineering alifornia Institute of Technology Pasadena, 91125, S Eail: bjyoon@caltech.edu

More information

Homework 3 Solutions CSE 101 Summer 2017

Homework 3 Solutions CSE 101 Summer 2017 Hoework 3 Solutions CSE 0 Suer 207. Scheduling algoriths The following n = 2 jobs with given processing ties have to be scheduled on = 3 parallel and identical processors with the objective of iniizing

More information

Support Vector Machines. Machine Learning Series Jerry Jeychandra Blohm Lab

Support Vector Machines. Machine Learning Series Jerry Jeychandra Blohm Lab Support Vector Machines Machine Learning Series Jerry Jeychandra Bloh Lab Outline Main goal: To understand how support vector achines (SVMs) perfor optial classification for labelled data sets, also a

More information

ANALYSIS OF HALL-EFFECT THRUSTERS AND ION ENGINES FOR EARTH-TO-MOON TRANSFER

ANALYSIS OF HALL-EFFECT THRUSTERS AND ION ENGINES FOR EARTH-TO-MOON TRANSFER IEPC 003-0034 ANALYSIS OF HALL-EFFECT THRUSTERS AND ION ENGINES FOR EARTH-TO-MOON TRANSFER A. Bober, M. Guelan Asher Space Research Institute, Technion-Israel Institute of Technology, 3000 Haifa, Israel

More information

Bootstrapping Dependent Data

Bootstrapping Dependent Data Bootstrapping Dependent Data One of the key issues confronting bootstrap resapling approxiations is how to deal with dependent data. Consider a sequence fx t g n t= of dependent rando variables. Clearly

More information

Statistical Logic Cell Delay Analysis Using a Current-based Model

Statistical Logic Cell Delay Analysis Using a Current-based Model Statistical Logic Cell Delay Analysis Using a Current-based Model Hanif Fatei Shahin Nazarian Massoud Pedra Dept. of EE-Systes, University of Southern California, Los Angeles, CA 90089 {fatei, shahin,

More information

Grafting: Fast, Incremental Feature Selection by Gradient Descent in Function Space

Grafting: Fast, Incremental Feature Selection by Gradient Descent in Function Space Journal of Machine Learning Research 3 (2003) 1333-1356 Subitted 5/02; Published 3/03 Grafting: Fast, Increental Feature Selection by Gradient Descent in Function Space Sion Perkins Space and Reote Sensing

More information

Bayesian Approach for Fatigue Life Prediction from Field Inspection

Bayesian Approach for Fatigue Life Prediction from Field Inspection Bayesian Approach for Fatigue Life Prediction fro Field Inspection Dawn An and Jooho Choi School of Aerospace & Mechanical Engineering, Korea Aerospace University, Goyang, Seoul, Korea Srira Pattabhiraan

More information

arxiv: v1 [cs.lg] 8 Jan 2019

arxiv: v1 [cs.lg] 8 Jan 2019 Data Masking with Privacy Guarantees Anh T. Pha Oregon State University phatheanhbka@gail.co Shalini Ghosh Sasung Research shalini.ghosh@gail.co Vinod Yegneswaran SRI international vinod@csl.sri.co arxiv:90.085v

More information

Multiscale Entropy Analysis: A New Method to Detect Determinism in a Time. Series. A. Sarkar and P. Barat. Variable Energy Cyclotron Centre

Multiscale Entropy Analysis: A New Method to Detect Determinism in a Time. Series. A. Sarkar and P. Barat. Variable Energy Cyclotron Centre Multiscale Entropy Analysis: A New Method to Detect Deterinis in a Tie Series A. Sarkar and P. Barat Variable Energy Cyclotron Centre /AF Bidhan Nagar, Kolkata 700064, India PACS nubers: 05.45.Tp, 89.75.-k,

More information

2.003 Engineering Dynamics Problem Set 2 Solutions

2.003 Engineering Dynamics Problem Set 2 Solutions .003 Engineering Dynaics Proble Set Solutions This proble set is priarily eant to give the student practice in describing otion. This is the subject of kineatics. It is strongly recoended that you study

More information

General Properties of Radiation Detectors Supplements

General Properties of Radiation Detectors Supplements Phys. 649: Nuclear Techniques Physics Departent Yarouk University Chapter 4: General Properties of Radiation Detectors Suppleents Dr. Nidal M. Ershaidat Overview Phys. 649: Nuclear Techniques Physics Departent

More information