Domain-Adversarial Neural Networks


Hana Ajakan 1, Pascal Germain 2, Hugo Larochelle 3, François Laviolette 2, Mario Marchand 2
2 Département d'informatique et de génie logiciel, Université Laval, Québec, Canada
3 Département d'informatique, Université de Sherbrooke, Québec, Canada
1 hana.ajakan.1@ulaval.ca, 2 firstname.lastname@ift.ulaval.ca, 3 hugo.larochelle@usherbrooke.ca

Abstract

We introduce a new neural network learning algorithm suited to the context of domain adaptation, in which data at training and test time come from similar but different distributions. Our algorithm is inspired by theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on a data representation that cannot discriminate between the training (source) and test (target) domains. We propose a training objective that implements this idea in the context of a neural network, whose hidden layer is trained to be predictive of the classification target, but uninformative as to the domain of the input. Our experiments on a sentiment analysis classification benchmark, where the target data available at training time is unlabeled, show that our neural network algorithm for domain adaptation has better performance than either a standard neural network or an SVM, even when these are trained on input features extracted with the state-of-the-art marginalized stacked denoising autoencoders of Chen et al. (2012).

1 Introduction

The cost of generating labeled data for a new machine learning task is often an obstacle to applying machine learning methods. There is thus great incentive to develop ways of exploiting data from one problem in order to generalize to another. Domain adaptation focuses on the situation where we have data generated from two different, but somehow similar, distributions. One example arises in sentiment analysis of written reviews, where we might want to distinguish positive reviews from negative ones. While we might have labeled data for reviews of one type of product (e.g., movies), we might want to be able to generalize to reviews of other products (e.g., books). Domain adaptation achieves such transfer by exploiting an extra set of unlabeled training data for the new problem to which we wish to generalize (e.g., unlabeled reviews of books).

One of the main approaches to achieving such transfer is to learn not only a classifier but also the representation of the data, in a way that favours transfer. A large body of work exists on jointly training a classifier and a representation that are both linear [1, 2, 3]. However, recent research has shown that non-linear neural networks can also be successful [4]. Specifically, a variant of the denoising autoencoder [5], known as the marginalized stacked denoising autoencoder (mSDA) [6], has demonstrated state-of-the-art performance on this problem. By learning a representation that is robust to input corruption noise, mSDA obtains a representation that is also more stable across changes of domain and can thus enable cross-domain transfer.

In this paper, we propose to encourage stability of the representation across domains explicitly in the learning algorithm of a neural network. This approach is motivated by theory on domain adaptation [7, 8] suggesting that a good representation for cross-domain transfer is one under which the domain of origin of an input observation cannot be identified.
We show that this principle can be implemented in a neural network learning objective that includes a term in which the network's hidden layer works adversarially against output connections that predict domain membership. The neural network is then simply trained by gradient descent on this objective. The success of this domain-adversarial neural network (DANN) is confirmed by extensive experiments on a sentiment analysis classification benchmark, showing that it achieves better performance than a regular neural network and an SVM. Moreover, by training these models on top of the mSDA representations, our experiments also confirm that explicitly minimizing domain discriminability works better than relying only on representations that are robust to noise.

2 Domain Adaptation

We consider binary classification tasks where X ⊆ R^n is the input space and Y = {0, 1} is the label set. We have two different distributions over X × Y, called the source domain D_S and the target domain D_T. A domain adaptation learning algorithm is provided with a labeled source sample S = {(x_i^s, y_i^s)}_{i=1}^m drawn i.i.d. from D_S, and an unlabeled target sample T = {x_i^t}_{i=1}^m drawn i.i.d. from D_T. The goal of the learning algorithm is to build a classifier η : X → Y with a low target risk

R_{D_T}(\eta) \overset{\text{def}}{=} \Pr_{(x^t, y^t) \sim D_T}\big[\eta(x^t) \neq y^t\big],

while having no information about the labels of D_T.

To tackle this challenging task, many domain adaptation approaches bound the target error by the sum of the source error and a notion of distance between the source and the target distributions. These methods are intuitively justified by a simple assumption: the source risk is expected to be a good indicator of the target risk when both distributions are similar. Several notions of distance have been proposed for domain adaptation [1, 2, 7, 8, 9, 10]. In this paper, we focus on the H-divergence used by Ben-David et al. [7, 8] (and based on the earlier work of Kifer et al. [11]), defined below.

Definition 1 ([7, 8, 11]). Given two domain distributions D_S and D_T over X, and a hypothesis class H, the H-divergence between D_S and D_T is

d_H(D_S, D_T) \overset{\text{def}}{=} 2 \sup_{\eta \in H} \Big| \Pr_{x^s \sim D_S}\big[\eta(x^s) = 1\big] - \Pr_{x^t \sim D_T}\big[\eta(x^t) = 1\big] \Big|.

That is, the H-divergence relies on the capacity of the hypothesis class H to distinguish examples generated by D_S from examples generated by D_T. Ben-David et al. [7, 8] proved that, for a symmetric hypothesis class H, one can compute the empirical H-divergence between two samples S ∼ (D_S)^m and T ∼ (D_T)^m by computing

\hat{d}_H(S, T) \overset{\text{def}}{=} 2\left(1 - \min_{\eta \in H}\left[\frac{1}{m}\sum_{i=1}^{m} I\big[\eta(x_i^s) = 1\big] + \frac{1}{m}\sum_{i=1}^{m} I\big[\eta(x_i^t) = 0\big]\right]\right), \qquad (1)

where I[a] is the indicator function, which is 1 if predicate a is true and 0 otherwise.

Ben-David et al. [7, 8] suggested that, even if it is generally hard to compute \hat{d}_H(S, T) exactly (e.g., when H is the space of linear classifiers on X), we can easily approximate it by running a learning algorithm on the problem of discriminating between source and target examples. To do so, we construct a new dataset U = {(x_i^s, 1)}_{i=1}^m ∪ {(x_i^t, 0)}_{i=1}^m, where the examples of the source sample are labeled 1 and the examples of the target sample are labeled 0. Then, the risk of the classifier trained on the dataset U approximates the "min" part of Eq. (1).
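As an illustration of this proxy estimate, the sketch below (not from the paper; the function name proxy_h_divergence and the use of scikit-learn's LogisticRegression are assumptions) trains a linear domain classifier on U and converts its held-out error ε into the commonly used quantity 2(1 − 2ε), which is large when the domains are easily separable and close to zero when they are indistinguishable.

```python
# Illustrative sketch (not from the paper): approximate the empirical H-divergence
# by training a linear classifier to discriminate source from target examples,
# following the construction of the dataset U described above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_h_divergence(Xs, Xt, seed=0):
    """Xs, Xt: arrays of shape (m, n) holding source / target inputs
    (or their hidden representations h(x))."""
    X = np.vstack([Xs, Xt])
    z = np.concatenate([np.ones(len(Xs)), np.zeros(len(Xt))])  # source = 1, target = 0
    X_tr, X_te, z_tr, z_te = train_test_split(X, z, test_size=0.5,
                                              random_state=seed, stratify=z)
    domain_clf = LogisticRegression(max_iter=1000).fit(X_tr, z_tr)
    err = np.mean(domain_clf.predict(X_te) != z_te)  # risk of the domain classifier on U
    return 2.0 * (1.0 - 2.0 * err)

# Example: two Gaussian "domains" with shifted means yield a large estimate.
rng = np.random.RandomState(0)
Xs, Xt = rng.randn(500, 20), rng.randn(500, 20) + 1.0
print(proxy_h_divergence(Xs, Xt))
```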
Ben-David et al. [7, 8] also showed that d_H(D_S, D_T) is upper bounded by its empirical estimate \hat{d}_H(S, T) plus a constant complexity term that depends on the VC dimension of H and on the sizes of the samples S and T. By combining this result with a similar bound on the source risk, the following theorem is obtained.

Theorem 2 (Ben-David et al. [7, 8]). Let H be a hypothesis class of VC dimension d. With probability 1 − δ over the choice of samples S ∼ (D_S)^m and T ∼ (D_T)^m, for every η ∈ H:

R_{D_T}(\eta) \;\leq\; R_S(\eta) + \sqrt{\frac{4}{m}\left(d \log\frac{2em}{d} + \log\frac{4}{\delta}\right)} + \hat{d}_H(S, T) + 4\sqrt{\frac{2d\log(2m) + \log\frac{4}{\delta}}{m}} + \beta,

with β ≥ inf_{η* ∈ H} [R_{D_S}(η*) + R_{D_T}(η*)], and where R_S(η) = \frac{1}{m}\sum_{i=1}^{m} I[\eta(x_i^s) \neq y_i^s] is the empirical source risk.

For simplicity, we assume throughout this paper that the source sample S and the target sample T are of equal size m. It is easy to generalize the results to the case |S| ≠ |T|.
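To get a sense of the magnitude of the two complexity terms in Theorem 2, the short sketch below evaluates them for a few illustrative choices of VC dimension d, sample size m and confidence δ; these numbers are assumptions for illustration only, not values used in the paper.

```python
# Illustrative only: plug example values into the two complexity terms of Theorem 2.
# The choices of d, m and delta below are assumptions, not values from the paper.
import math

def complexity_terms(d, m, delta=0.05):
    t1 = math.sqrt(4.0 / m * (d * math.log(2 * math.e * m / d) + math.log(4.0 / delta)))
    t2 = 4.0 * math.sqrt((2.0 * d * math.log(2.0 * m) + math.log(4.0 / delta)) / m)
    return t1, t2

for d, m in [(10, 2000), (100, 2000), (100, 100000)]:
    t1, t2 = complexity_terms(d, m)
    print(f"d={d:>4}, m={m:>6}: first term = {t1:.3f}, second term = {t2:.3f}")
```

Both terms shrink as m grows relative to d, so for large samples the bound is dominated by the source risk, the empirical H-divergence and β.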

The previous result tells us that R_{D_T}(η) can be low only when the β term is low, i.e., only when there exists a classifier that achieves a low risk on both distributions. It also tells us that, to find a classifier with a small R_{D_T}(η) in a given class of fixed VC dimension, the learning algorithm should minimize (within that class) a trade-off between the source risk R_S(η) and the H-divergence \hat{d}_H(S, T). As pointed out by Ben-David et al. [7], a strategy to control the H-divergence is to find a representation of the examples under which the source and the target domains are as indistinguishable as possible. Under such a representation, a hypothesis with a low source risk will, according to Theorem 2, perform well on the target data. We now present a learning algorithm based on this idea.

3 A Domain-Adversarial Neural Network (DANN)

The originality of our approach is to explicitly implement the idea exhibited by Theorem 2 in a neural network classifier (note that the HMM representation learning method for domain adaptation of Huang and Yates (2012) [12] is also inspired by the H-divergence of Ben-David et al.). That is, to learn a model that generalizes well from one domain to another, we ensure that the internal representation of the neural network contains no discriminative information about the origin of the input (source or target), while preserving a low risk on the source (labeled) examples.

Let us consider the following standard neural network architecture with one hidden layer:

h(x) = \mathrm{sigm}(b + Wx), \qquad f(x) = \mathrm{softmax}(c + V h(x)), \qquad (2)

with \mathrm{sigm}(a) \overset{\text{def}}{=} \left[\frac{1}{1+\exp(-a_i)}\right]_i and \mathrm{softmax}(a) \overset{\text{def}}{=} \left[\frac{\exp(a_i)}{\sum_{j} \exp(a_j)}\right]_i.

Given a training source sample S = {(x_i^s, y_i^s)}_{i=1}^m drawn from D_S, the natural classification loss to use is the negative log-probability of the correct label. This leads to the following learning problem:

\min_{W, V, b, c} \; \frac{1}{m}\sum_{i=1}^{m} -\log f_{y_i^s}(x_i^s), \qquad (3)

where f_y(x) denotes the conditional probability that the neural network assigns x to class y.

Given W and b obtained by solving Eq. (3), we view the output of the hidden layer h(·) of Eq. (2) as the internal representation of the neural network. We denote the source sample representations by h(S) = {h(x_i^s)}_{i=1}^m. Now, consider an unlabeled sample T = {x_i^t}_{i=1}^m drawn from D_T and the corresponding representations h(T) = {h(x_i^t)}_{i=1}^m. Based on Eq. (1), the empirical H-divergence of a symmetric hypothesis class H between the samples h(S) and h(T) is given by

\hat{d}_H(h(S), h(T)) = 2\left(1 - \min_{\eta \in H}\left[\frac{1}{m}\sum_{i=1}^{m} I\big[\eta(h(x_i^s)) = 1\big] + \frac{1}{m}\sum_{i=1}^{m} I\big[\eta(h(x_i^t)) = 0\big]\right]\right). \qquad (4)

Let us consider H as the class of hyperplanes in the representation space. We suggest estimating the "min" part of Eq. (4) with a logistic regressor that models the probability that a given input (either x^s or x^t) comes from the source domain D_S (denoted z = 1) rather than from the target domain D_T (denoted z = 0):

p(z = 1 \mid \phi) = o(\phi) \overset{\text{def}}{=} \mathrm{sigm}(d + w^\top \phi),

where φ is either an output h(x^s) or h(x^t). This enables us to add a domain adaptation term to the objective of Eq. (3), giving the following problem to solve:

\min_{W, V, b, c} \left[\frac{1}{m}\sum_{i=1}^{m} -\log f_{y_i^s}(x_i^s) \;+\; \lambda \max_{w, d}\left(\frac{1}{m}\sum_{i=1}^{m} \log o(h(x_i^s)) + \frac{1}{m}\sum_{i=1}^{m} \log\big(1 - o(h(x_i^t))\big)\right)\right], \qquad (5)

where the parameter λ > 0 weights the domain adaptation regularization term. This optimization problem is motivated by Theorem 2, as it implements a trade-off between the minimization of the source risk R_S(·) and of the divergence \hat{d}_H(·, ·). The parameter λ tunes this trade-off during the learning process.
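To make the objective concrete, the sketch below (a minimal illustration, not the authors' code; the function dann_objective, the toy dimensions, and the random data are assumptions) evaluates the forward pass of Eq. (2), the domain regressor o(·), and the value of the bracketed expression in Eq. (5) for fixed parameters, i.e., the quantity over which the max with respect to (w, d) and the min with respect to (W, V, b, c) are taken.

```python
# Minimal illustration (not the authors' code): forward pass of Eq. (2) and the
# value of the DANN objective of Eq. (5) for fixed parameters.
import numpy as np

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dann_objective(Xs, ys, Xt, W, V, b, c, w, d, lam):
    """Source classification loss plus lambda times the domain log-likelihood term."""
    Hs = sigm(Xs @ W.T + b)                     # hidden representations h(x^s), Eq. (2)
    Ht = sigm(Xt @ W.T + b)                     # hidden representations h(x^t)
    F = softmax(Hs @ V.T + c)                   # class probabilities f(x^s), Eq. (2)
    class_loss = -np.mean(np.log(F[np.arange(len(ys)), ys]))
    o_s, o_t = sigm(Hs @ w + d), sigm(Ht @ w + d)   # domain regressor o(.)
    domain_loglik = np.mean(np.log(o_s)) + np.mean(np.log(1.0 - o_t))
    return class_loss + lam * domain_loglik

# Toy example with random data and parameters (assumptions for illustration).
rng = np.random.RandomState(0)
n, l, m = 20, 10, 100                           # input size, hidden size, sample size
Xs, Xt = rng.randn(m, n), rng.randn(m, n) + 0.5
ys = rng.randint(0, 2, size=m)
W, V, b, c = rng.randn(l, n), rng.randn(2, l), np.zeros(l), np.zeros(2)
w, d, lam = rng.randn(l), 0.0, 0.1
print(dann_objective(Xs, ys, Xt, W, V, b, c, w, d, lam))
```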
We see that Eq. (5) involves a maximization operation. Hence, the neural network (parametrized by {W, V, b, c}) and the domain classifier (parametrized by {w, d}) compete against each other, in an adversarial way, over that term. In other words, the hidden layer h(·) maps an example (either source or target) to a representation in which the output layer f(·) accurately classifies the source sample, while the adaptation component o(·) is unable to detect whether an example belongs to the source sample or to the target sample. To optimize Eq. (5), we perform stochastic gradient descent, as detailed in Appendix A.
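The sketch below (an illustration under the same notation, not the authors' implementation; the dimensions and the function dann_update are assumptions) performs one stochastic update in the spirit of Algorithm 1 of Appendix A: the network parameters {W, V, b, c} take a gradient descent step on the full objective, while the domain classifier parameters {w, d} take a gradient ascent step on the domain log-likelihood term.

```python
# Illustration (not the authors' implementation) of one stochastic DANN update:
# gradient descent on {W, V, b, c}, gradient ascent on the domain classifier {w, d}.
import numpy as np

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def dann_update(xs, ys, xt, params, lam=0.1, alpha=1e-3):
    W, V, b, c, w, d = params
    # Forward propagation on the source example
    hs = sigm(b + W @ xs)
    f = softmax(c + V @ hs)
    # Backpropagation of the classification loss -log f_{ys}(xs)
    e_y = np.zeros_like(f); e_y[ys] = 1.0       # one-hot encoding of the source label
    g_c = -(e_y - f)
    g_V = np.outer(g_c, hs)
    g_b = (V.T @ g_c) * hs * (1.0 - hs)
    g_W = np.outer(g_b, xs)
    # Domain adaptation regularizer, current (source) example
    o_s = sigm(d + w @ hs)
    g_d = lam * (1.0 - o_s)
    g_w = lam * (1.0 - o_s) * hs
    tmp = lam * (1.0 - o_s) * w * hs * (1.0 - hs)
    g_b += tmp
    g_W += np.outer(tmp, xs)
    # Domain adaptation regularizer, other (target) example
    ht = sigm(b + W @ xt)
    o_t = sigm(d + w @ ht)
    g_d -= lam * o_t
    g_w -= lam * o_t * ht
    tmp = -lam * o_t * w * ht * (1.0 - ht)
    g_b += tmp
    g_W += np.outer(tmp, xt)
    # Descent for the network, ascent for the domain classifier
    W -= alpha * g_W; V -= alpha * g_V; b -= alpha * g_b; c -= alpha * g_c
    w += alpha * g_w; d += alpha * g_d
    return W, V, b, c, w, d

# Toy usage: one update on a random source/target pair of examples.
rng = np.random.RandomState(0)
n, l = 20, 10
params = (rng.randn(l, n), rng.randn(2, l), np.zeros(l), np.zeros(2), rng.randn(l), 0.0)
params = dann_update(rng.randn(n), 1, rng.randn(n), params)
```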

Table 1: Error rates on the Amazon reviews dataset (left) and pairwise Poisson binomial test (right). Rows correspond to the twelve domain adaptation tasks (books→dvd, books→electronics, books→kitchen, dvd→books, dvd→electronics, dvd→kitchen, electronics→books, electronics→dvd, electronics→kitchen, kitchen→books, kitchen→dvd, kitchen→electronics); columns report DANN, NN and SVM, both on the original data and on the mSDA representations. [Numerical entries omitted.]

4 Experiments

In this section, we compare the performance of our proposed DANN algorithm to that of a standard neural network with one hidden layer (NN), described by Eq. (3), and of a Support Vector Machine (SVM) with a linear kernel. To select the hyper-parameters of each of these algorithms, we train them over a parameter grid and use a very small validation set consisting of 100 labeled examples from the target domain. We then select the classifiers having the lowest target validation risk. The training procedure of each algorithm is detailed in Appendix B.

Sentiment analysis dataset. We compare our algorithms on the Amazon reviews dataset, as preprocessed by Chen et al. (2012) [6]. This dataset includes four domains, each composed of reviews of a specific kind of product (books, dvd disks, electronics, and kitchen appliances). Reviews are encoded as 5,000-dimensional feature vectors of unigrams and bigrams, and labels are binary: 0 if the product is rated up to 3 stars, and 1 if the product is rated 4 or 5 stars. We perform twelve domain adaptation tasks. For example, "books→dvd" corresponds to the task for which books is the source domain and dvd disks the target one. All learning algorithms are given 2,000 labeled source examples and 2,000 unlabeled target examples. We then evaluate them on separate target test sets (containing between 3,000 and 6,000 examples). The "Original data" part of Table 1 (left) shows the test risk of all algorithms, and Table 1 (right) reports the probability that one algorithm is significantly better than another according to the Poisson binomial test [13]. We note that DANN performs significantly better than NN (with probability 0.90) and than SVM. As the only difference between DANN and NN is the domain adaptation regularizer, we conclude that our approach successfully helps to find a representation suitable for the target domain.

Combining DANN with autoencoders. We now investigate whether our DANN algorithm can improve on the representation learned by the state-of-the-art marginalized Stacked Denoising Autoencoders (mSDA) [6]. In brief, mSDA is an unsupervised algorithm that learns a new, robust feature representation of the training samples. It takes the unlabeled parts of the union of the source set S and the target set T to learn a feature map from the input space X to a new representation space. As a denoising autoencoder algorithm, it finds a feature representation from which one can (approximately) reconstruct the original features of an example from its noisy counterpart. Chen et al. (2012) [6] showed that training a linear SVM on the new mSDA representation of the source sample performs well on the Amazon reviews dataset. As an alternative to the SVM, we propose to apply our DANN algorithm on the same representations generated by mSDA (using the representations of both the source and the target samples). Note that, even if mSDA and DANN are both representation learning approaches, they pursue different strategies that can be complementary.
In this experiment, we generate the mSDA representations using a corruption probability of 50% and 5 layers, for each source-target pair of domains, using the same Amazon reviews data described earlier. We then execute the three learning algorithms (DANN, NN, and SVM) on top of these representations. The "mSDA representations" part of Table 1 (left and right) confirms that combining mSDA and DANN is a sound approach. Indeed, the Poisson binomial test shows that DANN performs better than NN and SVM, with probabilities 0.82 and 0.88 respectively.

References

[1] L. Bruzzone and M. Marconcini. Domain adaptation problems: A DASVM classification technique and a circular validation strategy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5), 2010.
[2] P. Germain, A. Habrard, F. Laviolette, and E. Morvant. A PAC-Bayesian approach for domain adaptation with specialization to linear classifiers. In ICML, 2013.
[3] C. Cortes and M. Mohri. Domain adaptation and sample bias correction theory and algorithm for regression. Theoretical Computer Science, 519:103–126, 2014.
[4] X. Glorot, A. Bordes, and Y. Bengio. Domain adaptation for large-scale sentiment classification: A deep learning approach. In ICML, volume 27, pages 97–110, 2011.
[5] P. Vincent, H. Larochelle, Y. Bengio, and P. A. Manzagol. Extracting and composing robust features with denoising autoencoders. In ICML, 2008.
[6] M. Chen, Z. E. Xu, K. Q. Weinberger, and F. Sha. Marginalized denoising autoencoders for domain adaptation. In ICML, 2012.
[7] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In NIPS, pages 137–144, 2006.
[8] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan. A theory of learning from different domains. Machine Learning, 79(1-2):151–175, 2010.
[9] Y. Mansour, M. Mohri, and A. Rostamizadeh. Domain adaptation: Learning bounds and algorithms. In COLT, pages 19–30, 2009.
[10] Y. Mansour, M. Mohri, and A. Rostamizadeh. Multiple source adaptation and the Rényi divergence. In UAI, 2009.
[11] D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. In Very Large Data Bases, 2004.
[12] F. Huang and A. Yates. Biased representation learning for domain adaptation. In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2012.
[13] A. Lacoste, F. Laviolette, and M. Marchand. Bayesian comparison of machine learning algorithms on single and multiple datasets. In AISTATS, 2012.

A Learning Algorithm

Algorithm 1 DANN stochastic training update

Input: source sample S = {(x_i^s, y_i^s)}_{i=1}^m, target sample T = {x_i^t}_{i=1}^m, hidden layer size l, adaptation parameter λ, learning rate α.
Output: neural network {W, V, b, c}

W, V ← random_init(l)
b, c, w, d ← 0
while stopping criterion is not met do
    for i from 1 to m do
        # Forward propagation
        h(x_i^s) ← sigm(b + W x_i^s)
        f(x_i^s) ← softmax(c + V h(x_i^s))
        # Backpropagation
        Δ_c ← −(e(y_i^s) − f(x_i^s))        # e(y) is a one-hot vector, that is, all 0s but with a 1 at position y
        Δ_V ← Δ_c h(x_i^s)^T
        Δ_b ← (V^T Δ_c) ⊙ h(x_i^s) ⊙ (1 − h(x_i^s))        # ⊙ is the element-wise product
        Δ_W ← Δ_b (x_i^s)^T
        # Add domain adaptation regularizer...
        # ... from current domain
        o(x_i^s) ← sigm(d + w^T h(x_i^s))
        Δ_d ← λ (1 − o(x_i^s))
        Δ_w ← λ (1 − o(x_i^s)) h(x_i^s)
        tmp ← λ (1 − o(x_i^s)) w ⊙ h(x_i^s) ⊙ (1 − h(x_i^s))
        Δ_b ← Δ_b + tmp
        Δ_W ← Δ_W + tmp (x_i^s)^T
        # ... from other domain
        j ← uniform_integer(1, ..., m)
        h(x_j^t) ← sigm(b + W x_j^t)
        o(x_j^t) ← sigm(d + w^T h(x_j^t))
        Δ_d ← Δ_d − λ o(x_j^t)
        Δ_w ← Δ_w − λ o(x_j^t) h(x_j^t)
        tmp ← −λ o(x_j^t) w ⊙ h(x_j^t) ⊙ (1 − h(x_j^t))
        Δ_b ← Δ_b + tmp
        Δ_W ← Δ_W + tmp (x_j^t)^T
        # Update neural network parameters
        W ← W − α Δ_W        # α is the learning rate hyper-parameter
        V ← V − α Δ_V
        b ← b − α Δ_b
        c ← c − α Δ_c
        # Update domain classifier parameters
        w ← w + α Δ_w        # notice the + instead of the −
        d ← d + α Δ_d
    end for
end while
return {W, V, b, c}

B Empirical Experiments Details

Here are some details about the procedure we used to train each learning algorithm.

DANN. The adaptation parameter λ is chosen among 9 values between 10^-2 and 1 on a logarithmic scale. The hidden layer size l is either 1, 5, 12, 25, 50, 75, 100, 150, or 200. Finally, the learning rate α is fixed at 10^-3 when learning on the original data and at 10^-4 when learning on the mSDA representations. For each learning task, we split the source training set to use 90% as the training set S and the remaining 10% as a validation set S_V. We stop the learning process when the risk on S_V is minimal.

NN. We use exactly the same hyper-parameters and training procedure as for DANN above, except that we do not need an adaptation parameter. Note that one can train NN using the DANN implementation (Algorithm 1) with λ = 0.

SVM. The hyper-parameter C of the SVM is chosen among 10 values between 10^-5 and 1 on a logarithmic scale. Note that this range of values is the same as the one used by Chen et al. (2012) [6] in their experiments.
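For convenience, the hyper-parameter grids described in this appendix can be written down as a small configuration; the sketch below (the dictionary layout and variable names are illustrative assumptions, not part of the paper) enumerates those values.

```python
# Illustrative configuration of the hyper-parameter grids described above
# (the dictionary layout is an assumption, not part of the paper).
import numpy as np

grids = {
    "DANN": {
        "lambda": np.logspace(-2, 0, 9),           # 9 values between 1e-2 and 1
        "hidden_size": [1, 5, 12, 25, 50, 75, 100, 150, 200],
        "learning_rate": {"original": 1e-3, "msda": 1e-4},
    },
    "NN": {                                         # same as DANN, without lambda
        "hidden_size": [1, 5, 12, 25, 50, 75, 100, 150, 200],
        "learning_rate": {"original": 1e-3, "msda": 1e-4},
    },
    "SVM": {
        "C": np.logspace(-5, 0, 10),                # 10 values between 1e-5 and 1
    },
}

for algo, grid in grids.items():
    print(algo, {k: (v.tolist() if hasattr(v, "tolist") else v) for k, v in grid.items()})
```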
