Robustness in GANs and in Black-box Optimization
1 Robustness in GANs and in Black-box Optimization Stefanie Jegelka MIT CSAIL joint work with Zhi Xu, Chengtao Li, Ilija Bogunovic, Jonathan Scarlett and Volkan Cevher
2 Robustness in ML

[diagram: noise → Generator → Critic; "One unit is enough!"]

- Robustness in GANs
- Representational Power in Deep Learning
- Robust Optimization, Generalization, Discrete & Nonconvex Optimization
- Robust Black-Box Optimization
3 Generative Adversarial Networks

[diagram: random z → Generator → G(z); real data x and G(z) → Discriminator → D(x), D(G(z))]

[figure: DCGAN and DAN-S samples trained on CelebA, with and without batch norm; both DAN-S variants show much more diversity than DCGAN]

Attack: with probability p < .5, the discriminator's output is manipulated; the generator doesn't know which feedback is honest.

Discriminator: max_D V(G, D)
Generator: min_G V(G, A(D))
4 Generative Adversarial Networks

[diagram: random noise z → Generator → G(z); real data x and G(z) → Discriminator → D(x), D(G(z))]

Attack: with probability p < .5, the discriminator's output is manipulated; the generator doesn't know which feedback is honest:

    A(D(x)) = −D(x) with probability p,  D(x) otherwise.

Theorem: If the adversary does a simple sign flip, then the standard GAN no longer learns the right distribution.
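The attack model above fits in a few lines (a minimal sketch; `attacked_feedback` is a hypothetical name, not from the talk):

```python
import random

def attacked_feedback(d_out: float, p: float) -> float:
    """Adversarial channel between discriminator and generator:
    with probability p (< .5) the discriminator's output is
    sign-flipped; otherwise it is passed through unchanged.
    The generator cannot tell which feedback is honest."""
    assert 0.0 <= p < 0.5
    return -d_out if random.random() < p else d_out
```

The generator trains on `attacked_feedback(D(G(z)), p)` instead of the honest score, which is exactly the setting in which the theorem applies.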
5 What makes GANs more robust?

[diagram: random noise z → Generator → x → Discriminator]

    min_G max_D V(G, D)
Model properties: transformation function f

    min_G max_D  E_{x∼P_data}[ f(D(x)) ]  +  E_{z∼P_z}[ f(−D(G(z))) ]
                 (score for real data)       (inverse score for fake data)

If f is strictly increasing and differentiable, with symmetry f(a) = −f(−a) for a ∈ [−1, 1], then the model is robust.
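The symmetry condition can be checked numerically (a toy sketch; `is_odd_symmetric` is a hypothetical helper). For instance, tanh is odd and satisfies the condition, while the log-sigmoid score of the standard GAN does not:

```python
import math

def is_odd_symmetric(f, tol=1e-9):
    """Numerically check the symmetry condition f(a) = -f(-a)
    on a grid of points a in [-1, 1]."""
    grid = [i / 10 for i in range(-10, 11)]
    return all(abs(f(a) + f(-a)) < tol for a in grid)

print(is_odd_symmetric(math.tanh))                                    # True
print(is_odd_symmetric(lambda a: math.log(1 / (1 + math.exp(-a)))))   # False
```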
6 What makes GANs more robust?

[diagram: random noise z → Generator → x → Discriminator; min_G max_D V(G, D)]

Model properties: symmetric transformation function.
Training algorithm: weight clipping (regularization) helps robustness.
Generalization: includes Wasserstein GANs!
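Weight clipping as a regularizer is a one-liner per update step (a generic sketch, not the talk's code; the threshold c = 0.01 mirrors the common WGAN default):

```python
import numpy as np

def clip_weights(params, c=0.01):
    """Clip every parameter tensor of the discriminator to [-c, c]
    after each update (WGAN-style regularization)."""
    return [np.clip(w, -c, c) for w in params]
```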
7 Empirical Results

Similar analysis with the MNIST data, using a CNN for both G and D. Without any help in robustness, the GAN is also still more brittle towards the choice of algorithm (regularization). Extending the theory to include both aspects, model and algorithm, is an avenue of future research.

[Figure 5: Example results on MNIST (no clipping); GAN, stable GAN, and Piece_GAN at several error probabilities]
8 Empirical Results

[figure: success/failure rates of GAN, Linear, Tanh, Erf, and Piece variants at several error probabilities, with no clipping vs. strict weight clipping]

- Weight clipping (regularization) helps robustness.
- Stable GANs are more robust to the clipping threshold.
9 Black-Box Optimization

Goal: optimize a complex function f(x) that is only accessible via expensive queries.

Example: automatic gait optimization. "The search for appropriate parameters of a controller can be formulated as an optimization problem, such as the minimization

    minimize_{θ ∈ R^D} f(θ)    (1)

of an objective function f(θ) with respect to the parameters θ. In the context of gait optimization, this optimization problem is characterized as a global optimization of a zero-order stochastic objective function. Therefore, the use of Bayesian optimization well suits this challenging optimization task. Bayesian optimization is an iterative model-based global optimization method [7, 5, …]. In Bayesian optimization, for each iteration (i.e., evaluation of the objective function f), a GP model θ ↦ f(θ) is learned from the data set T = {θ, f(θ)} composed of the past parameters and the corresponding measurements f(θ) of the objective function. This model is used …"

[Figure: the bio-inspired dynamical bipedal walker Fox used for the experiments]
10 Black-Box Optimization

Sequential optimization: build an internal model of f(x). In each time step:
- select a query point x
- observe (noisy) f(x)
- update the model

Often the model is a Gaussian Process. Queries are selected by trading off uncertainty against expected function value, e.g. by maximizing

    ucb(x) = f̂(x) + β^{1/2} σ(x)
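The loop above can be sketched with a small hand-rolled GP on a 1-D grid (a minimal illustration, not the implementation from the talk; `rbf`, `gp_posterior`, and `gp_ucb` are hypothetical names, and the lengthscale and β are arbitrary choices):

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """GP posterior mean and std at query points Xs given data (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(X, Xs)
    v = np.linalg.solve(L, Ks)
    mu = Ks.T @ alpha
    sd = np.sqrt(np.maximum(1.0 - np.sum(v ** 2, axis=0), 0.0))
    return mu, sd

def gp_ucb(f, domain, T=15, beta=4.0, noise_std=0.01, seed=0):
    """Sequential optimization: in each step query the maximizer of
    ucb(x) = mu(x) + beta^{1/2} * sigma(x), observe noisy f(x), update."""
    rng = np.random.default_rng(seed)
    X = np.array([domain[len(domain) // 2]])          # arbitrary first query
    y = np.array([f(X[0]) + noise_std * rng.standard_normal()])
    for _ in range(T - 1):
        mu, sd = gp_posterior(X, y, domain)
        x_next = domain[np.argmax(mu + np.sqrt(beta) * sd)]
        X = np.append(X, x_next)
        y = np.append(y, f(x_next) + noise_std * rng.standard_normal())
    return X[np.argmax(y)]

best = gp_ucb(lambda x: -(x - 0.3) ** 2, np.linspace(0.0, 1.0, 101))
```

On this smooth toy objective the best observed query lands near the true maximizer x = 0.3 after a handful of evaluations.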
11 Robust Optimization with GPs

Observed f(x) may be (adversarially) perturbed:
- Robotics: simulations vs. real data
- Parameter tuning: estimation with limited data
- Time-varying functions

Problem:

    max_{x ∈ D} min_{δ ∈ Δ_ε(x)} f(x + δ),    Δ_ε(x) = {x − x′ : x′ ∈ D and d(x, x′) ≤ ε}

Standard methods can fail: they can commit to a suboptimal decision!
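The gap between the standard and the robust maximizer can be seen on a toy 1-D function with a narrow spike and a broad bump (a hypothetical example, not from the talk):

```python
import numpy as np

# Grid over D = [0, 1]; Delta_eps(x) = perturbations that keep x + delta in D.
xs = np.linspace(0.0, 1.0, 1001)
f = lambda x: np.exp(-200 * (x - 0.15) ** 2) + 0.7 * np.exp(-8 * (x - 0.7) ** 2)
eps = 0.08

# Worst-case value of f within the eps-ball around each candidate x.
robust = np.array([f(xs[np.abs(xs - x) <= eps]).min() for x in xs])

x_std = xs[np.argmax(f(xs))]     # standard maximizer: the narrow spike at ~0.15
x_rob = xs[np.argmax(robust)]    # robust maximizer: the broad bump at ~0.7
```

A standard method commits to the spike, whose value collapses under an ε-perturbation; the max–min objective prefers the flat bump instead.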
12 StableOpt Algorithm

In every round:
- select x̄_t = argmax_{x ∈ D} min_{δ ∈ Δ_ε(x)} ucb_t(x + δ)    (upper confidence bound)
- select δ_t = argmin_{δ ∈ Δ_ε(x̄_t)} lcb_t(x̄_t + δ)    (lower confidence bound)
- observe y_t = f(x̄_t + δ_t) + z_t and update with {(x̄_t + δ_t, y_t)}

Theorem (RBF kernel): sample complexity of O*(η⁻² (log η⁻¹)^p) steps for η-suboptimality w.h.p., closely matching a lower bound. The algorithm generalizes to many other settings!
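The selection rules can be sketched on a 1-D grid with a small GP (a toy illustration under arbitrary kernel and β choices, not the authors' implementation; `rbf`, `posterior`, and `stableopt` are hypothetical names):

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def posterior(X, y, Xs, noise=1e-4):
    """GP posterior mean and std at query points Xs given data (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, rbf(X, Xs))
    mu = rbf(X, Xs).T @ alpha
    sd = np.sqrt(np.maximum(1.0 - np.sum(v ** 2, axis=0), 0.0))
    return mu, sd

def stableopt(f, domain, eps, T=20, beta=4.0, seed=0):
    """StableOpt selection rules on a 1-D grid:
      x_t = argmax_x  min_{|d| <= eps} ucb_t(x + d)
      d_t = argmin_{|d| <= eps} lcb_t(x_t + d)
    then observe y_t = f(x_t + d_t) + noise and update the GP."""
    rng = np.random.default_rng(seed)
    nbrs = [np.flatnonzero(np.abs(domain - x) <= eps) for x in domain]
    X = np.array([domain[0]])
    y = np.array([f(X[0]) + 0.01 * rng.standard_normal()])
    for _ in range(T):
        mu, sd = posterior(X, y, domain)
        ucb, lcb = mu + np.sqrt(beta) * sd, mu - np.sqrt(beta) * sd
        robust_ucb = np.array([ucb[idx].min() for idx in nbrs])
        i = int(np.argmax(robust_ucb))                  # select x_t
        j = nbrs[i][int(np.argmin(lcb[nbrs[i]]))]      # select x_t + d_t
        X = np.append(X, domain[j])
        y = np.append(y, f(domain[j]) + 0.01 * rng.standard_normal())
    mu, sd = posterior(X, y, domain)
    lcb = mu - np.sqrt(beta) * sd
    return domain[int(np.argmax([lcb[idx].min() for idx in nbrs]))]

# Sharp global peak at 0.2 vs. broad (robust) bump at 0.7:
f = lambda x: np.exp(-80 * (x - 0.2) ** 2) + 0.8 * np.exp(-5 * (x - 0.7) ** 2)
x_rob = stableopt(f, np.linspace(0.0, 1.0, 101), eps=0.1)
```

The reported point is the robust lower-confidence-bound maximizer, which here is the broad bump rather than the taller but brittle peak.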
13 Variations

Robustness to unknown parameters: f : D × Θ → R is smooth w.r.t. the input x and the parameters θ:

    max_{x ∈ D} min_{θ ∈ Θ} f(x, θ)

Robust estimation: θ̂ is an estimate of the true parameters:

    max_{x ∈ D} min_{δ_θ ∈ Δ_ε(θ̂)} f(x, θ̂ + δ_θ)

Robust group identification: the input space is partitioned into groups G = {G_1, …, G_k}; find the group with the highest worst-case value:

    max_{G ∈ G} min_{x ∈ G} f(x)
14 Empirical Results

[figures: ε-regret on a synthetic benchmark [Bertsimas et al.] and average minimum objective value on a robot pushing task, over iterations, for StableOpt, GP-UCB, MaxiMin-GP-UCB, Stable-GP-UCB, and Stable-GP-Random; lower is better in both plots]
15 Robustness in ML

[diagram: noise → Generator → Critic; "One unit is enough!"]

- Robustness in GANs (Z. Xu, C. Li, S. Jegelka, arXiv)
- Representational Power in Deep Learning
- Robust Optimization, Generalization, Discrete & Nonconvex Optimization
- Robust Black-Box Optimization (I. Bogunovic, J. Scarlett, S. Jegelka, V. Cevher, NIPS 2018)
Wasserstein GAN Juho Lee Jan 23, 2017 Wasserstein GAN (WGAN) Arxiv submission Martin Arjovsky, Soumith Chintala, and Léon Bottou A new GAN model minimizing the Earth-Mover s distance (Wasserstein-1 distance)
More informationExpectation Propagation in Dynamical Systems
Expectation Propagation in Dynamical Systems Marc Peter Deisenroth Joint Work with Shakir Mohamed (UBC) August 10, 2012 Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1 Motivation Figure : Complex
More informationTutorial on Machine Learning for Advanced Electronics
Tutorial on Machine Learning for Advanced Electronics Maxim Raginsky March 2017 Part I (Some) Theory and Principles Machine Learning: estimation of dependencies from empirical data (V. Vapnik) enabling
More informationMulti-Attribute Bayesian Optimization under Utility Uncertainty
Multi-Attribute Bayesian Optimization under Utility Uncertainty Raul Astudillo Cornell University Ithaca, NY 14853 ra598@cornell.edu Peter I. Frazier Cornell University Ithaca, NY 14853 pf98@cornell.edu
More informationBlind Attacks on Machine Learners
Blind Attacks on Machine Learners Alex Beatson 1 Zhaoran Wang 1 Han Liu 1 1 Princeton University NIPS, 2016/ Presenter: Anant Kharkar NIPS, 2016/ Presenter: Anant Kharkar 1 Outline 1 Introduction Motivation
More informationA QUANTITATIVE MEASURE OF GENERATIVE ADVERSARIAL NETWORK DISTRIBUTIONS
A QUANTITATIVE MEASURE OF GENERATIVE ADVERSARIAL NETWORK DISTRIBUTIONS Dan Hendrycks University of Chicago dan@ttic.edu Steven Basart University of Chicago xksteven@uchicago.edu ABSTRACT We introduce a
More informationShould all Machine Learning be Bayesian? Should all Bayesian models be non-parametric?
Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric? Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/
More informationPreference Elicitation for Sequential Decision Problems
Preference Elicitation for Sequential Decision Problems Kevin Regan University of Toronto Introduction 2 Motivation Focus: Computational approaches to sequential decision making under uncertainty These
More informationThe role of dimensionality reduction in classification
The role of dimensionality reduction in classification Weiran Wang and Miguel Á. Carreira-Perpiñán Electrical Engineering and Computer Science University of California, Merced http://eecs.ucmerced.edu
More informationIntroduction to Reinforcement Learning. CMPT 882 Mar. 18
Introduction to Reinforcement Learning CMPT 882 Mar. 18 Outline for the week Basic ideas in RL Value functions and value iteration Policy evaluation and policy improvement Model-free RL Monte-Carlo and
More informationCS 340 Lec. 16: Logistic Regression
CS 34 Lec. 6: Logistic Regression AD March AD ) March / 6 Introduction Assume you are given some training data { x i, y i } i= where xi R d and y i can take C different values. Given an input test data
More informationTalk on Bayesian Optimization
Talk on Bayesian Optimization Jungtaek Kim (jtkim@postech.ac.kr) Machine Learning Group, Department of Computer Science and Engineering, POSTECH, 77-Cheongam-ro, Nam-gu, Pohang-si 37673, Gyungsangbuk-do,
More informationThe Numerics of GANs
The Numerics of GANs Lars Mescheder 1, Sebastian Nowozin 2, Andreas Geiger 1 1 Autonomous Vision Group, MPI Tübingen 2 Machine Intelligence and Perception Group, Microsoft Research Presented by: Dinghan
More informationBackpropagation Introduction to Machine Learning. Matt Gormley Lecture 12 Feb 23, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Backpropagation Matt Gormley Lecture 12 Feb 23, 2018 1 Neural Networks Outline
More informationThe power of two samples in generative adversarial networks
The power of two samples in generative adversarial networks Sewoong Oh Department of Industrial and Enterprise Systems Engineering University of Illinois at Urbana-Champaign / 2 Generative models learn
More informationDeep Learning Srihari. Deep Belief Nets. Sargur N. Srihari
Deep Belief Nets Sargur N. Srihari srihari@cedar.buffalo.edu Topics 1. Boltzmann machines 2. Restricted Boltzmann machines 3. Deep Belief Networks 4. Deep Boltzmann machines 5. Boltzmann machines for continuous
More informationStratégies bayésiennes et fréquentistes dans un modèle de bandit
Stratégies bayésiennes et fréquentistes dans un modèle de bandit thèse effectuée à Telecom ParisTech, co-dirigée par Olivier Cappé, Aurélien Garivier et Rémi Munos Journées MAS, Grenoble, 30 août 2016
More informationStatistical learning. Chapter 20, Sections 1 4 1
Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationComposite Functional Gradient Learning of Generative Adversarial Models. Appendix
A. Main theorem and its proof Appendix Theorem A.1 below, our main theorem, analyzes the extended KL-divergence for some β (0.5, 1] defined as follows: L β (p) := (βp (x) + (1 β)p(x)) ln βp (x) + (1 β)p(x)
More informationEfficient and Principled Online Classification Algorithms for Lifelon
Efficient and Principled Online Classification Algorithms for Lifelong Learning Toyota Technological Institute at Chicago Chicago, IL USA Talk @ Lifelong Learning for Mobile Robotics Applications Workshop,
More information