
1 Learning Bounds for Importance Weighting Tamas Madarasz & Michael Rabadi April 15, 2015

2 Introduction Often, the training distribution does not match the testing distribution. We want to use information about the test distribution to correct the bias, or discrepancy, between the training and testing distributions.

3 Importance Weighting Labeled training data is drawn from a source distribution Q; unlabeled test data is drawn from a target distribution P. Weight the cost of errors on training instances. Common definition of the weight for a point x: w(x) = P(x)/Q(x).
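
To make the definition concrete, here is a minimal sketch (not from the slides) of the importance-weighted empirical risk R_hat_w(h) = (1/m) sum_i w(x_i) L_h(x_i); the Gaussian densities and the placeholder loss below are illustrative assumptions.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), used here as a stand-in for P and Q."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def weighted_empirical_risk(losses, weights):
    """R_hat_w(h) = (1/m) * sum_i w(x_i) * L_h(x_i)."""
    return np.mean(weights * losses)

# Toy setup (hypothetical): train on Q = N(0, 1), test on P = N(0.5, 1).
rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=1000)                       # labeled training sample, drawn from Q
w = normal_pdf(xs, 0.5, 1.0) / normal_pdf(xs, 0.0, 1.0)    # w(x) = P(x) / Q(x)
losses = (xs > 0.0).astype(float)                          # placeholder 0/1 loss of some fixed hypothesis h
print(weighted_empirical_risk(losses, w))                  # importance-weighted estimate of R(h)
```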

4 Importance Weighting A reasonable method, but it sometimes doesn't work. Can we give generalization bounds for this method? When does domain adaptation work, and when does it not? How should we weight the costs?

5 Overview Preliminaries; Learning guarantee when loss is bounded; Learning guarantee when loss is unbounded, but second moment is bounded; Algorithm

6 Preliminaries: Rényi divergence For \alpha \ge 0, the Rényi divergence D_\alpha(P \| Q) between distributions P and Q is defined by
D_\alpha(P \| Q) = \frac{1}{\alpha - 1} \log_2 \sum_x P(x) \left[ \frac{P(x)}{Q(x)} \right]^{\alpha - 1},
d_\alpha(P \| Q) = 2^{D_\alpha(P \| Q)} = \left[ \sum_x \frac{P^\alpha(x)}{Q^{\alpha - 1}(x)} \right]^{\frac{1}{\alpha - 1}}.
It measures the information lost when Q is used to approximate P, and D_\alpha(P \| Q) = 0 iff P = Q.
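
A small sketch (assuming discrete distributions given as probability vectors, which are made-up examples) of the exponentiated Rényi divergence d_alpha(P‖Q) defined above:

```python
import numpy as np

def d_alpha(P, Q, alpha):
    """d_alpha(P||Q) = [ sum_x P(x)^alpha / Q(x)^(alpha - 1) ]^(1 / (alpha - 1)), for alpha != 1."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    return np.sum(P ** alpha / Q ** (alpha - 1)) ** (1.0 / (alpha - 1))

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.4, 0.4, 0.2])
print(d_alpha(P, Q, 2.0))   # d_2(P||Q) >= 1, with equality iff P = Q
print(d_alpha(P, P, 2.0))   # 1.0, i.e. D_alpha(P||P) = log2(1) = 0
```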

7 Preliminaries: Importance weights Lemma 1: E_Q[w] = 1, E_Q[w^2] = d_2(P \| Q), and \sigma^2(w) = d_2(P \| Q) - 1.
Proof: E_Q[w^2] = \sum_{x \in X} w^2(x) Q(x) = \sum_{x \in X} \left( \frac{P(x)}{Q(x)} \right)^2 Q(x) = d_2(P \| Q).
Lemma 2: For all \alpha > 0 and all h \in H, E_Q[w^2(x) L_h^2(x)] \le d_{\alpha+1}(P \| Q) \, R(h)^{1 - \frac{1}{\alpha}}.
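
A quick numerical check of the identities in Lemma 1 on an arbitrary discrete pair (P, Q) (a sketch; it only verifies the identities on one example, it is not the proof):

```python
import numpy as np

P = np.array([0.5, 0.3, 0.2])   # target distribution
Q = np.array([0.4, 0.4, 0.2])   # source distribution
w = P / Q                       # importance weights

E_w   = np.sum(Q * w)           # E_Q[w]      -> 1
E_w2  = np.sum(Q * w ** 2)      # E_Q[w^2]    -> d_2(P||Q) = sum_x P(x)^2 / Q(x)
var_w = E_w2 - E_w ** 2         # sigma^2(w)  -> d_2(P||Q) - 1

print(E_w, E_w2, var_w)         # 1.0, 1.05, 0.05 for this example
```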

8 Preliminaries: Importance weights Hölder's inequality (Jin, Wilson, and Nobel, 2014): let \frac{1}{p} + \frac{1}{q} = 1; then
\sum_x a_x b_x \le \left( \sum_x |a_x|^p \right)^{1/p} \left( \sum_x |b_x|^q \right)^{1/q}.

9 Preliminaries: Importance weights Proof of Lemma 2: Let the loss be bounded by B = 1. Then
E_{x \sim Q}[w^2(x) L_h^2(x)] = \sum_x \left[ \frac{P(x)}{Q(x)} \right]^2 L_h^2(x) Q(x) = \sum_x \left[ P(x)^{\frac{1}{\alpha}} \frac{P(x)}{Q(x)} \right] \left[ P(x)^{\frac{\alpha - 1}{\alpha}} L_h^2(x) \right]
\le \left[ \sum_x \left[ \frac{P(x)}{Q(x)} \right]^\alpha P(x) \right]^{\frac{1}{\alpha}} \left[ \sum_x P(x) L_h^{\frac{2\alpha}{\alpha - 1}}(x) \right]^{\frac{\alpha - 1}{\alpha}}
= d_{\alpha+1}(P \| Q) \left[ \sum_x P(x) L_h(x) L_h^{\frac{\alpha + 1}{\alpha - 1}}(x) \right]^{\frac{\alpha - 1}{\alpha}}
\le d_{\alpha+1}(P \| Q) R(h)^{1 - \frac{1}{\alpha}} B^{1 + \frac{1}{\alpha}} = d_{\alpha+1}(P \| Q) R(h)^{1 - \frac{1}{\alpha}},
where the first inequality is Hölder's inequality with p = \alpha and q = \frac{\alpha}{\alpha - 1}.

10 Learning Guarantees: Bounded case Let \sup_x w(x) = \sup_x \frac{P(x)}{Q(x)} = d_\infty(P \| Q) = M and assume d_\infty(P \| Q) < +\infty. Fix h \in H. Then, by Hoeffding's inequality, for any \delta > 0, with probability at least 1 - \delta,
\left| R(h) - \hat{R}_w(h) \right| \le M \sqrt{\frac{\log \frac{2}{\delta}}{2m}}.
M can be very large, so we naturally want a more favorable bound.
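
A sketch of evaluating this deviation term for assumed values of M, m, and delta (all hypothetical), illustrating how quickly it degrades as M grows:

```python
import numpy as np

def bounded_iw_deviation(M, m, delta):
    """Hoeffding-style deviation term  M * sqrt(log(2/delta) / (2m))  for weights bounded by M = d_inf(P||Q)."""
    return M * np.sqrt(np.log(2.0 / delta) / (2.0 * m))

for M in (2.0, 10.0, 100.0):
    print(M, bounded_iw_deviation(M, m=1000, delta=0.05))   # grows linearly with M, hence often loose
```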

11 Overview Preliminaries; Learning guarantee when loss is bounded; Learning guarantee when loss is unbounded, but second moment is bounded; Algorithm

12 Learning Guarantees: Bounded case Theorem 1: Fix h \in H. For any \alpha \ge 1 and any \delta > 0, with probability at least 1 - \delta, the following bound holds:
R(h) \le \hat{R}_w(h) + \frac{2M \log \frac{1}{\delta}}{3m} + \sqrt{\frac{2 \left[ d_{\alpha+1}(P \| Q) R(h)^{1 - \frac{1}{\alpha}} - R(h)^2 \right] \log \frac{1}{\delta}}{m}}.
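
For comparison, a sketch that evaluates the right-hand side of Theorem 1 for assumed values of d_{alpha+1}(P‖Q), R(h), M, m, and delta (all hypothetical):

```python
import numpy as np

def theorem1_rhs(R_hat_w, M, d_alpha1, R, alpha, m, delta):
    """R_hat_w(h) + 2M log(1/d)/(3m) + sqrt(2 [d_{a+1}(P||Q) R(h)^(1-1/a) - R(h)^2] log(1/d) / m)."""
    t = np.log(1.0 / delta)
    variance = d_alpha1 * R ** (1.0 - 1.0 / alpha) - R ** 2
    return R_hat_w + 2.0 * M * t / (3.0 * m) + np.sqrt(2.0 * variance * t / m)

# With alpha = 1, the variance term is controlled by d_2(P||Q).
print(theorem1_rhs(R_hat_w=0.10, M=10.0, d_alpha1=1.5, R=0.12, alpha=1.0, m=1000, delta=0.05))
```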

13 Learning Guarantees: Bounded case Bernstein's inequality (Bernstein, 1946): for independent zero-mean random variables x_i with |x_i| \le M and average variance \sigma^2,
\Pr\left[ \frac{1}{n} \sum_{i=1}^n x_i \ge \epsilon \right] \le \exp\left( - \frac{n \epsilon^2}{2\sigma^2 + 2M\epsilon / 3} \right).

14 Learning Guarantees: Bounded case Proof of Theorem 1: Let Z be the random variable w(x) L_h(x) - R(h); then Z \le M. By Lemma 2, the variance of Z can be bounded in terms of d_{\alpha+1}(P \| Q):
\sigma^2(Z) = E_Q[w^2(x) L_h^2(x)] - R(h)^2 \le d_{\alpha+1}(P \| Q) R(h)^{1 - \frac{1}{\alpha}} - R(h)^2.
Bernstein's inequality then gives
\Pr\left[ R(h) - \hat{R}_w(h) > \epsilon \right] \le \exp\left( - \frac{m \epsilon^2 / 2}{\sigma^2(Z) + \epsilon M / 3} \right).

15 Learning Guarantees: Bounded case Setting \delta equal to this upper bound and solving the resulting quadratic for \epsilon, with probability at least 1 - \delta,
R(h) \le \hat{R}_w(h) + \frac{M \log \frac{1}{\delta}}{3m} + \sqrt{\frac{M^2 \log^2 \frac{1}{\delta}}{9m^2} + \frac{2\sigma^2(Z) \log \frac{1}{\delta}}{m}}
\le \hat{R}_w(h) + \frac{2M \log \frac{1}{\delta}}{3m} + \sqrt{\frac{2\sigma^2(Z) \log \frac{1}{\delta}}{m}},
using \sqrt{a + b} \le \sqrt{a} + \sqrt{b}.

16 Learning Guarantees: Bounded case Theorem 2: Let H be a finite hypothesis set. Then for any \delta > 0, with probability at least 1 - \delta, the following holds for the importance weighting method:
R(h) \le \hat{R}_w(h) + \frac{2M \left( \log|H| + \log \frac{1}{\delta} \right)}{3m} + \sqrt{\frac{2 d_2(P \| Q) \left( \log|H| + \log \frac{1}{\delta} \right)}{m}}.
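
A sketch evaluating Theorem 2's right-hand side; |H|, M, d_2(P‖Q), and the other inputs below are made-up values:

```python
import numpy as np

def theorem2_rhs(R_hat_w, M, d2, m, delta, H_size):
    """R_hat_w(h) + 2M(log|H| + log(1/d))/(3m) + sqrt(2 d_2(P||Q)(log|H| + log(1/d)) / m)."""
    t = np.log(H_size) + np.log(1.0 / delta)
    return R_hat_w + 2.0 * M * t / (3.0 * m) + np.sqrt(2.0 * d2 * t / m)

print(theorem2_rhs(R_hat_w=0.10, M=10.0, d2=1.5, m=1000, delta=0.05, H_size=100))
```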

17 Learning Guarantees: Bounded case Theorem 2 corresponds to the case \alpha = 1. Note that Theorem 1 simplifies when \alpha = 1:
R(h) \le \hat{R}_w(h) + \frac{2M \log \frac{1}{\delta}}{3m} + \sqrt{\frac{2 d_2(P \| Q) \log \frac{1}{\delta}}{m}}.
Theorem 2 then follows by a union bound over the hypotheses in H, which adds the \log|H| terms.

18 Learning Guarantees: Bounded case Proposition 2 (lower bound): Assume M < \infty and \sigma^2(w)/M^2 \ge 1/m. Assume there exists h_0 \in H such that L_{h_0}(x) = 1 for all x. Then there exists an absolute constant c, c = 2/41^2, such that
\Pr\left[ \sup_{h \in H} \left| R(h) - \hat{R}_w(h) \right| \ge c \sqrt{\frac{d_2(P \| Q) - 1}{4m}} \right] > 0.
Proof: theorem 9 of Cortes, Mansour, and Mohri (2010).

19 Overview Preliminaries; Learning guarantee when loss is bounded; Learning guarantee when loss is unbounded, but second moment is bounded; Algorithm

20 Learning Guarantees: Unbounded case d_\infty(P \| Q) < \infty does not always hold. Assume P and Q are Gaussian with standard deviations \sigma_P and \sigma_Q and means \mu and \mu', respectively. Then
\frac{P(x)}{Q(x)} = \frac{\sigma_Q}{\sigma_P} \exp\left( \frac{\sigma_P^2 (x - \mu')^2 - \sigma_Q^2 (x - \mu)^2}{2 \sigma_P^2 \sigma_Q^2} \right).
Thus, even if \sigma_P = \sigma_Q but \mu \ne \mu', we have d_\infty(P \| Q) = \sup_x \frac{P(x)}{Q(x)} = \infty, so Theorem 1 is not informative.

21 Learning Guarantees: Unbounded case However, the second moment of the importance weights can still be finite:
d_2(P \| Q) = E_Q[w^2] = \frac{\sigma_Q}{\sigma_P^2 \sqrt{2\pi}} \int \exp\left( \frac{\sigma_P^2 (x - \mu')^2 - 2\sigma_Q^2 (x - \mu)^2}{2 \sigma_P^2 \sigma_Q^2} \right) dx,
which converges whenever \sigma_P^2 < 2\sigma_Q^2, in particular when \sigma_P = \sigma_Q.
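
A sketch illustrating this Gaussian example with equal variances and shifted means: individual weights are unbounded (so the bounded-case analysis does not apply), but E_Q[w^2] = d_2(P‖Q) is finite. The closed form exp((mu - mu')^2 / sigma^2) used below for the equal-variance case follows from completing the square and is my own derivation, worth double-checking.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
mu_p, mu_q, sigma = 1.0, 0.0, 1.0                        # sigma_P = sigma_Q, different means

xs = rng.normal(mu_q, sigma, size=200_000)               # training sample from Q
w = normal_pdf(xs, mu_p, sigma) / normal_pdf(xs, mu_q, sigma)

print(w.max())                                           # largest observed weight; sup_x w(x) is infinite
print(np.mean(w ** 2))                                   # Monte Carlo estimate of d_2(P||Q) = E_Q[w^2]
print(np.exp((mu_p - mu_q) ** 2 / sigma ** 2))           # closed form for equal variances: exp((mu - mu')^2 / sigma^2)
```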

22 Learning Guarantees: Unbounded case Intuition: if \mu = \mu' and \sigma_P \gg \sigma_Q, then Q provides some useful information about P, but a sample from Q contains only a few points far from \mu, and those few extreme sample points receive very large weights. Likewise, if \sigma_P = \sigma_Q but |\mu - \mu'| is large, the weights of most training points are negligible.

23 Learning Guarantees: Unbounded case Theorem 3: Let H be a hypothesis set such that \mathrm{Pdim}(\{L_h(x) : h \in H\}) = p < \infty. Assume that d_2(P \| Q) < +\infty and w(x) \ne 0 for all x. Then for any \delta > 0, with probability at least 1 - \delta, the following holds:
R(h) \le \hat{R}_w(h) + 2^{5/4} \sqrt{d_2(P \| Q)} \left[ \frac{p \log \frac{2me}{p} + \log \frac{4}{\delta}}{m} \right]^{3/8}.

24 Learning Guarantees: Unbounded case Proof outline (full proof in Cortes, Mansour, Mohri, 2010). Since the loss is non-negative, write the deviation in terms of tail probabilities, E[L_h] - \hat{E}[L_h] = \int_0^\infty \left( \Pr[L_h > t] - \widehat{\Pr}[L_h > t] \right) dt, and apply Vapnik-type relative deviation bounds to the indicator functions (h, t) \mapsto 1_{L_h(x) > t}:
\Pr\left[ \sup_{h \in H,\, t \in \mathbb{R}} \frac{\Pr[L_h > t] - \widehat{\Pr}[L_h > t]}{\sqrt{\Pr[L_h > t]}} > \epsilon \right] \le 4 \Pi_H(2m) \exp\left( -\frac{m\epsilon^2}{4} \right) \le 4 \exp\left( p \log \frac{2em}{p} - \frac{m\epsilon^2}{4} \right).
Integrating the tail bound and normalizing by the second moment then yields
\Pr\left[ \sup_{h \in H} \frac{E[L_h(x)] - \hat{E}[L_h(x)]}{\sqrt{E[L_h^2(x)]}} > \epsilon \right] \le 4 \exp\left( p \log \frac{2em}{p} - \frac{m\epsilon^{8/3}}{4^{5/3}} \right),
and solving for \epsilon gives, with probability at least 1 - \delta, for all h \in H,
E[L_h(x)] - \hat{E}[L_h(x)] \le 2^{5/4} \max\left\{ \sqrt{E[L_h^2(x)]}, \sqrt{\hat{E}[L_h^2(x)]} \right\} \left[ \frac{p \log \frac{2me}{p} + \log \frac{8}{\delta}}{m} \right]^{3/8}.

25 Learning Guarantees: Unbounded case Applying this to the weighted loss w(x) L_h(x), whose second moment is bounded by d_2(P \| Q), we can show the following:
\Pr\left[ \sup_{h \in H} \frac{R(h) - \hat{R}_w(h)}{\sqrt{d_2(P \| Q)}} > \epsilon \right] \le 4 \exp\left( p \log \frac{2em}{p} - \frac{m \epsilon^{8/3}}{4^{5/3}} \right),
where p = \mathrm{Pdim}(\{L_h(x) : h \in H\}) is also the pseudo-dimension of H' = \{w(x) L_h(x) : h \in H\}. Indeed, any set shattered by H' is shattered by \{L_h : h \in H\}: if H' shatters a set A with witnesses r_i, then \{L_h\} shatters A with witnesses s_i = r_i / w(x_i).

26 Overview Preliminaries; Learning guarantee when loss is bounded; Learning guarantee when loss is unbounded, but second moment is bounded; Algorithm

27 Alternative algorithms We can generalize this analysis to an arbitrary function u : X \to \mathbb{R}, u > 0. Let \hat{R}_u(h) = \frac{1}{m} \sum_{i=1}^m u(x_i) L_h(x_i) and let \hat{Q} be the empirical distribution.
Theorem 4: Let H be a hypothesis set such that \mathrm{Pdim}(\{L_h(x) : h \in H\}) = p < \infty. Assume that 0 < E_Q[u^2(x)] < +\infty and u(x) \ne 0 for all x. Then for any \delta > 0, with probability at least 1 - \delta,
R(h) \le \hat{R}_u(h) + E_Q\left[ (w(x) - u(x)) L_h(x) \right] + 2^{5/4} \max\left( \sqrt{E_Q[u^2(x) L_h^2(x)]}, \sqrt{E_{\hat{Q}}[u^2(x) L_h^2(x)]} \right) \left[ \frac{p \log \frac{2me}{p} + \log \frac{4}{\delta}}{m} \right]^{3/8}.

28 Alternative algorithms Functions u other than w can be used to reweight the cost of errors. Idea: minimize the upper bound of Theorem 4. Since
\max\left( \sqrt{E_Q[u^2]}, \sqrt{E_{\hat{Q}}[u^2]} \right) \le \sqrt{E_Q[u^2]} \left( 1 + O(1/\sqrt{m}) \right),
this suggests solving
\min_{u \in U} \; E_Q\left[ |w(x) - u(x)| \right] + \gamma \sqrt{E_Q[u^2(x)]},
a trade-off between bias and variance minimization.
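
One simple way to instantiate this trade-off (a sketch, not the algorithm from the paper) is to restrict u to capped weights u_theta(x) = min(w(x), theta) and choose theta by minimizing an empirical version of the objective above; the family U, the grid of thresholds, and gamma are all illustrative choices.

```python
import numpy as np

def select_capped_weights(w, gamma=0.1, n_thresholds=50):
    """Choose u = min(w, theta) minimizing  mean|w - u| + gamma * sqrt(mean(u^2))  over a threshold grid."""
    thetas = np.quantile(w, np.linspace(0.5, 1.0, n_thresholds))
    best_theta, best_obj = None, np.inf
    for theta in thetas:
        u = np.minimum(w, theta)
        obj = np.mean(np.abs(w - u)) + gamma * np.sqrt(np.mean(u ** 2))   # bias term + variance term
        if obj < best_obj:
            best_theta, best_obj = theta, obj
    return best_theta

rng = np.random.default_rng(0)
w = np.exp(rng.normal(0.0, 1.0, size=1000))          # heavy-tailed toy weights
theta = select_capped_weights(w)
print(theta, w.max(), np.minimum(w, theta).max())    # capping the largest weights trades a little bias for lower variance
```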

29 Alternative algorithms

30 Alternative algorithms

Reference: Corinna Cortes, Yishay Mansour, and Mehryar Mohri. Learning Bounds for Importance Weighting. Advances in Neural Information Processing Systems (NIPS), 2010.
