Learning features to compare distributions
1 Learning features to compare distributions. Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London. NIPS 2016 Workshop on Adversarial Learning, Barcelona, Spain.
2–3 Goal of this talk. Have: two collections of samples X and Y from unknown distributions P and Q. Goal: learn distinguishing features that indicate how P and Q differ.
4–8 Divergences. (Figure-only slides illustrating divergences between distributions.) Sriperumbudur, Fukumizu, G., Schoelkopf, Lanckriet (2012).
9 Overview. The maximum mean discrepancy: how to compute and interpret the MMD; how to train the MMD; application to troubleshooting GANs. The ME test statistic: informative, linear-time features for comparing distributions; how to learn these features. TL;DR: variance matters.
10 The maximum mean discrepancy. Are P and Q different? (Figure: densities $P(x)$ and $Q(y)$.)
11–13 Maximum mean discrepancy (on sample). Observe $X = \{x_1, \ldots, x_n\} \sim P$ and $Y = \{y_1, \ldots, y_n\} \sim Q$; place a Gaussian kernel on each $x_i$ and each $y_i$.
14 Maximum mean discrepancy (on sample). $\hat\mu_P(v)$: mean embedding of P; $\hat\mu_Q(v)$: mean embedding of Q, where
$$\hat\mu_P(v) = \frac{1}{m} \sum_{i=1}^{m} k(x_i, v).$$
15 Maximum mean discrepancy (on sample). The witness function is the difference of the mean embeddings:
$$\mathrm{witness}(v) = \hat\mu_P(v) - \hat\mu_Q(v).$$
16 Maximum mean discrepancy (on sample). The squared MMD is the squared norm of the witness function, estimated by
$$\widehat{\mathrm{MMD}}^2 = \frac{1}{n(n-1)} \sum_{i \neq j} k(x_i, x_j) + \frac{1}{n(n-1)} \sum_{i \neq j} k(y_i, y_j) - \frac{2}{n^2} \sum_{i,j} k(x_i, y_j).$$
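To make the estimator concrete, here is a minimal numpy sketch of the unbiased $\widehat{\mathrm{MMD}}^2$ above, with a Gaussian kernel; the helper names and the bandwidth parameter `sigma` are illustrative choices, not from the slides.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of MMD^2 between samples X ~ P and Y ~ Q."""
    n, m = len(X), len(Y)
    Kxx = rbf_kernel(X, X, sigma)
    Kyy = rbf_kernel(Y, Y, sigma)
    Kxy = rbf_kernel(X, Y, sigma)
    # Drop the diagonals so the within-sample averages are unbiased.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2 * Kxy.mean()
```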
17 Overview. Dogs (P) and fish (Q) example revisited. Each kernel-matrix entry is one of $k(\mathrm{dog}_i, \mathrm{dog}_j)$, $k(\mathrm{dog}_i, \mathrm{fish}_j)$, or $k(\mathrm{fish}_i, \mathrm{fish}_j)$.
18 Overview. The maximum mean discrepancy on this example:
$$\widehat{\mathrm{MMD}}^2 = \frac{1}{n(n-1)} \sum_{i \neq j} k(\mathrm{dog}_i, \mathrm{dog}_j) + \frac{1}{n(n-1)} \sum_{i \neq j} k(\mathrm{fish}_i, \mathrm{fish}_j) - \frac{2}{n^2} \sum_{i,j} k(\mathrm{dog}_i, \mathrm{fish}_j).$$
19–20 Asymptotics of MMD. The MMD:
$$\widehat{\mathrm{MMD}}^2 = \frac{1}{n(n-1)} \sum_{i \neq j} k(x_i, x_j) + \frac{1}{n(n-1)} \sum_{i \neq j} k(y_i, y_j) - \frac{2}{n^2} \sum_{i,j} k(x_i, y_j)$$
…but how to choose the kernel? Perspective from statistical hypothesis testing: when $P = Q$, $\widehat{\mathrm{MMD}}^2$ is close to zero; when $P \neq Q$, $\widehat{\mathrm{MMD}}^2$ is far from zero. A threshold $c_\alpha$ for $\widehat{\mathrm{MMD}}^2$ gives false positive rate $\alpha$.
21–23 A statistical test. (Figure: densities of $n \cdot \widehat{\mathrm{MMD}}^2$ when $P = Q$ and when $P \neq Q$; the threshold $c_\alpha$ is the $1 - \alpha$ quantile of the $P = Q$ distribution, and the probability mass of the $P \neq Q$ density falling below $c_\alpha$ gives the false negatives.) The best kernel gives the lowest false negative rate (= highest power)… but can you train for this?
24 Asymptotics of MMD. When $P \neq Q$, the statistic is asymptotically normal:
$$\frac{\widehat{\mathrm{MMD}}^2 - \mathrm{MMD}^2(P, Q)}{\sqrt{V_n(P, Q)}} \xrightarrow{D} \mathcal{N}(0, 1),$$
where $\mathrm{MMD}(P, Q)$ is the population MMD and $V_n(P, Q) = O(n^{-1})$. (Figure: empirical PDF of the MMD distribution under $H_1$, with a Gaussian fit.)
25 Asymptotics of MMD. When $P = Q$, the statistic has the asymptotic distribution
$$n \, \widehat{\mathrm{MMD}}^2 \xrightarrow{D} \sum_{l=1}^{\infty} \lambda_l z_l^2,$$
where the $\lambda_i$ solve the eigenvalue equation $\lambda_i \psi_i(x') = \int \tilde{k}(x, x') \, \psi_i(x) \, dP(x)$ for the centred kernel $\tilde{k}$, and $z_l \sim \mathcal{N}(0, 2)$ i.i.d. (Figure: empirical PDF of the MMD density under $H_0$, with the $\chi^2$ sum overlaid.)
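In practice the threshold is typically obtained by permutation rather than from the eigenvalue expression: pool the two samples, reshuffle, and recompute the statistic many times. A sketch of this standard recipe (not spelled out on the slides), reusing `mmd2_unbiased` from above:

```python
def permutation_threshold(X, Y, sigma=1.0, alpha=0.05, n_perms=200, seed=0):
    """Estimate the (1 - alpha) quantile of MMD^2 under H0: P = Q by permutation."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X, Y])
    n = len(X)
    stats = []
    for _ in range(n_perms):
        idx = rng.permutation(len(pooled))
        stats.append(mmd2_unbiased(pooled[idx[:n]], pooled[idx[n:]], sigma))
    return np.quantile(stats, 1 - alpha)

# Reject H0 when the observed statistic exceeds the estimated threshold:
# reject = mmd2_unbiased(X, Y, sigma) > permutation_threshold(X, Y, sigma)
```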
26–29 Optimizing test power. The power of our test ($\Pr_1$ denotes probability under $P \neq Q$):
$$\Pr_1\!\left( n \, \widehat{\mathrm{MMD}}^2 > \hat{c}_\alpha \right) \to 1 - \Phi\!\left( \frac{\hat{c}_\alpha}{n \sqrt{V_n(P, Q)}} - \frac{\mathrm{MMD}^2(P, Q)}{\sqrt{V_n(P, Q)}} \right),$$
where $\Phi$ is the CDF of the standard normal distribution and $\hat{c}_\alpha$ is an estimate of the test threshold $c_\alpha$. The first (threshold) term is asymptotically negligible: it vanishes as $n$ grows, while the second term grows with $n$. So to maximize test power, maximize
$$\frac{\mathrm{MMD}^2(P, Q)}{\sqrt{V_n(P, Q)}}.$$
(Sutherland, Tung, Strathmann, De, Ramdas, Smola, G., in review for ICLR 2017.) Code: github.com/dougalsutherland/opt-mmd
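A simplified sketch of this criterion, assuming a train/test split and a finite grid of candidate bandwidths in place of the gradient-based optimization in the opt-mmd code, and a crude bootstrap in place of the paper's closed-form variance estimator:

```python
def mmd2_variance_bootstrap(X, Y, sigma, n_boot=100, seed=0):
    """Crude bootstrap estimate of Var[MMD^2]; opt-mmd uses a closed form."""
    rng = np.random.default_rng(seed)
    n, m = len(X), len(Y)
    vals = [mmd2_unbiased(X[rng.integers(0, n, n)],
                          Y[rng.integers(0, m, m)], sigma)
            for _ in range(n_boot)]
    return np.var(vals)

def select_bandwidth(X_tr, Y_tr, sigmas, eps=1e-8):
    """Pick the bandwidth maximizing the power proxy MMD^2 / sqrt(V_n)."""
    def power_proxy(s):
        return mmd2_unbiased(X_tr, Y_tr, s) / np.sqrt(
            mmd2_variance_bootstrap(X_tr, Y_tr, s) + eps)
    return max(sigmas, key=power_proxy)
```

The kernel must be chosen on data disjoint from the data used for the final test; otherwise the permutation threshold is no longer valid.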
30 Troubleshooting for generative adversarial networks. (Figures: MNIST samples; samples from a GAN.)
31 Troubleshooting for generative adversarial networks. (Figures: MNIST samples; ARD map; samples from a GAN.) Power for the optimized ARD kernel: 1.00 at $\alpha = 0.01$; power for the optimized RBF kernel: 0.57 at $\alpha = 0.01$.
32 Benchmarking generative adversarial networks.
33 The ME statistic and test.
34 Distinguishing feature(s). The witness is the difference of the mean embeddings of P and Q: $\mathrm{witness}(v) = \hat\mu_P(v) - \hat\mu_Q(v)$.
35 Distinguishing feature(s). Take the square of the witness, $\mathrm{witness}^2(v)$ (we only worry about amplitude).
36 Distinguishing feature(s). New test statistic: $\mathrm{witness}^2$ at a single $v$; linear time in the number $n$ of samples… but how to choose the best feature $v$?
37 Distinguishing feature(s). Best feature = the $v$ that maximizes $\mathrm{witness}^2(v)$??
38–40 Distinguishing feature(s). (Figures: empirical $\mathrm{witness}^2(v)$ at sample size $n = 3$, at $n = 50$, and at a larger $n$.)
41–42 Distinguishing feature(s). (Figure: $P(x)$, $Q(y)$, and the population $\mathrm{witness}^2(v)$ function, with candidate locations marked "v?".)
43–48 Variance of witness function. Variance at $v$ = variance of $X$ at $v$ + variance of $Y$ at $v$. ME statistic:
$$\hat\lambda_n(v) = n \, \frac{\mathrm{witness}^2(v)}{\text{variance at } v}.$$
(Figures: $P(x)$, $Q(y)$, $\mathrm{witness}^2(v)$, the variance of $X$ at $v$, the variance of $Y$ at $v$, and their sum.)
49 Variance of witness function. The best location is the $v^*$ that maximizes $\hat\lambda_n(v)$. Improve performance using multiple locations $\{v_j\}_{j=1}^{J}$.
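A numpy sketch of the single-location version of this statistic, reusing `rbf_kernel` from above; it assumes "variance at $v$" means the empirical variance of the features $k(x_i, v)$ plus that of $k(y_i, v)$, so it is only the scalar special case of the published ME test (which uses $J$ locations and a regularized covariance):

```python
def me_statistic(X, Y, v, sigma=1.0, reg=1e-5):
    """Linear-time ME statistic at a single test location v (shape (d,))."""
    fx = rbf_kernel(X, v[None, :], sigma).ravel()  # k(x_i, v): O(n)
    fy = rbf_kernel(Y, v[None, :], sigma).ravel()  # k(y_i, v): O(n)
    witness = fx.mean() - fy.mean()
    variance = fx.var() + fy.var()  # variance of X at v + variance of Y at v
    return len(X) * witness**2 / (variance + reg)  # reg stabilizes tiny variances

# Choose v on held-out data to maximize me_statistic, then test on fresh data.
```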
50–55 Distinguishing positive/negative emotions. Karolinska faces: 35 females and 35 males (Lundqvist et al., 1998); pixel features; sample size 402. Positive emotions (happy, neutral, surprised) vs. negative emotions (afraid, angry, disgusted). (Figures: test power on the + vs. − problem for a random feature, the proposed learned feature, and the quadratic-time MMD; the learned feature itself.) The proposed test achieves maximum test power in time $O(n)$. Informative features: differences at the nose, and smile lines. Code:
56 Final thoughts. Witness function approaches: (1) diversity of samples: the MMD test uses pairwise similarities between all samples, while the ME test uses similarities to $J$ reference features; (2) disjoint support of generator/data distributions: the witness function is smooth. Other discriminator heuristics: diversity of samples via the minibatch heuristic (add distances to neighbour samples as features), Salimans et al. (2016); disjoint support treated by adding noise to blur images, Arjovsky and Bottou (2016), Huszar (2016).
57 Co-authors. Students and postdocs: Kacper Chwialkowski (at Voleon), Wittawat Jitkrittum, Heiko Strathmann, Dougal Sutherland. Collaborators: Kenji Fukumizu, Krikamol Muandet, Bernhard Schoelkopf, Bharath Sriperumbudur, Zoltan Szabo. Questions?
58 Testing against a probabilistic model.
59 Statistical model criticism.
$$\mathrm{MMD}(P, Q; \mathcal{F}) = \sup_{\|f\| \leq 1} \left[ \mathbf{E}_Q f - \mathbf{E}_P f \right],$$
where $f^*(x)$ is the witness function. (Figure: $p(x)$, $q(x)$, $f^*(x)$.) Can we compute the MMD with samples from $Q$ and a model $P$? Problem: we usually can't compute $\mathbf{E}_P f$ in closed form.
60 Stein idea. To get rid of $\mathbf{E}_P f$ in $\sup_{\|f\| \leq 1} \left[ \mathbf{E}_Q f - \mathbf{E}_P f \right]$, we define the Stein operator
$$[T_p f](x) = \nabla f(x) + f(x) \, \nabla \log p(x).$$
Then $\mathbf{E}_P [T_P f] = 0$, subject to appropriate boundary conditions. (Oates, Girolami, Chopin, 2016)
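A quick Monte Carlo sanity check of $\mathbf{E}_P [T_P f] = 0$ in one dimension, with $p$ a standard normal and a smooth test function $f$, both picked purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # samples from p = N(0, 1)

f = lambda t: np.sin(t) + 0.5 * t**2 * np.exp(-t**2)    # a smooth test function
df = lambda t: np.cos(t) + (t - t**3) * np.exp(-t**2)   # its derivative
dlogp = lambda t: -t                                    # grad log p for N(0, 1)

stein = df(x) + f(x) * dlogp(x)  # [T_p f](x) evaluated at the samples
print(stein.mean())              # approximately 0, up to Monte Carlo error
```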
61–65 Maximum Stein discrepancy. Stein operator: $[T_p f](x) = \nabla f(x) + f(x) \, \nabla \log p(x)$. Maximum Stein Discrepancy (MSD):
$$\mathrm{MSD}(p, q) = \sup_{\|g\| \leq 1} \left[ \mathbf{E}_q T_p g - \mathbf{E}_p T_p g \right] = \sup_{\|g\| \leq 1} \mathbf{E}_q T_p g,$$
since $\mathbf{E}_p T_p g = 0$. (Figure: $p(x)$, $q(x)$, and the optimal $g^*(x)$.)
66 Maximum Stein discrepancy. Closed-form expression for the MSD: given $Z, \tilde{Z} \sim q$, then
$$\mathrm{MSD}(p, q) = \mathbf{E}_q \, h_p(Z, \tilde{Z}),$$
where
$$h_p(x, y) := \nabla \log p(x)^\top \nabla \log p(y) \, k(x, y) + \nabla \log p(y)^\top \nabla_x k(x, y) + \nabla \log p(x)^\top \nabla_y k(x, y) + \langle \nabla_x, \nabla_y \rangle k(x, y)$$
and $k$ is the RKHS kernel (Chwialkowski, Strathmann, G., 2016; Liu, Lee, Jordan, 2016). This only depends on the kernel and on $\nabla_x \log p(x)$: we do not need to normalize $p$, or sample from it.
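A numpy sketch of this closed-form statistic, averaged over all sample pairs (a V-statistic), for a Gaussian kernel: the score function `dlogp` and the bandwidth are example assumptions, and the derivative formulas below are specific to the Gaussian kernel.

```python
def stein_discrepancy_vstat(Z, dlogp, sigma=1.0):
    """V-statistic estimate of E_q h_p(Z, Z~) with a Gaussian kernel.
    Z: (n, d) samples from q; dlogp: score function of the model p."""
    n, d = Z.shape
    S = dlogp(Z)                                    # (n, d) scores at samples
    diff = Z[:, None, :] - Z[None, :, :]            # (n, n, d): x - y
    r2 = (diff**2).sum(-1)
    K = np.exp(-r2 / (2 * sigma**2))
    gx = -diff / sigma**2 * K[..., None]            # grad_x k(x, y)
    term1 = (S @ S.T) * K                           # dlogp(x)^T dlogp(y) k(x,y)
    term2 = np.einsum('jd,ijd->ij', S, gx)          # dlogp(y)^T grad_x k(x,y)
    term3 = np.einsum('id,ijd->ij', S, -gx)         # dlogp(x)^T grad_y k(x,y)
    term4 = (d / sigma**2 - r2 / sigma**4) * K      # <grad_x, grad_y> k(x,y)
    return (term1 + term2 + term3 + term4).mean()

# Example: criticize a standard-normal model using samples Z from q.
# stein_discrepancy_vstat(Z, dlogp=lambda z: -z, sigma=1.0)
```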
67 Statistical model criticism. (Figure: normalised solar activity vs. year.) Test the hypothesis that a Gaussian process model, learned from data, is a good fit for the test data (example from Lloyd and Ghahramani, 2015). Code:
68 Statistical model criticism. (Figure: histogram of the bootstrapped statistic $B_n$, with the observed test statistic $V_n$ marked.) Test the hypothesis that a Gaussian process model, learned from data, is a good fit for the test data.