Learning Bounds for Importance Weighting
Tamas Madarasz & Michael Rabadi
April 15, 2015
2 Introduction
- Often, the training distribution does not match the testing distribution
- We want to utilize information about the test distribution
- Goal: correct the bias or discrepancy between the training and testing distributions
3 Importance Weighting
- Labeled training data from source distribution Q
- Unlabeled test data from target distribution P
- Weight the cost of errors on training instances
- Common definition of the weight for a point x: $w(x) = P(x)/Q(x)$
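To make the estimator concrete, here is a minimal sketch (our addition; the toy distributions and hypothesis are hypothetical) of the importance-weighted empirical risk $\hat R_w(h) = \frac{1}{m}\sum_{i=1}^m w(x_i)L_h(x_i)$, the quantity the bounds below control:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy source (Q) and target (P) distributions over a 4-point domain.
P = np.array([0.1, 0.2, 0.3, 0.4])
Q = np.array([0.4, 0.3, 0.2, 0.1])
w = P / Q  # importance weights w(x) = P(x)/Q(x)

# Draw a training sample from Q; L_h(x) is a toy 0/1 loss of some hypothesis h.
m = 100_000
xs = rng.choice(len(Q), size=m, p=Q)
losses = (xs % 2 == 0).astype(float)

# Importance-weighted empirical risk: (1/m) * sum_i w(x_i) * L_h(x_i).
print(np.mean(w[xs] * losses))  # ~0.4 = E_P[L_h], the target risk R(h)
print(np.mean(losses))          # ~0.6 = E_Q[L_h], the biased unweighted estimate
```

The weighted average estimates $E_P[L_h]$ using only samples from Q, which is the point of the method.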
4 Importance Weighting
- A reasonable method, but it sometimes doesn't work
- Can we give generalization bounds for this method?
- When does domain adaptation (DA) work? When does it not work?
- How should we weight the costs?
5 Overview
- Preliminaries
- Learning guarantee when the loss is bounded
- Learning guarantee when the loss is unbounded, but its second moment is bounded
- Algorithm
6 Preliminaries: Rényi divergence
For $\alpha \ge 0$, the Rényi divergence $D_\alpha(P\|Q)$ between distributions P and Q is
$$D_\alpha(P\|Q) = \frac{1}{\alpha-1}\log_2 \sum_x P(x)\left[\frac{P(x)}{Q(x)}\right]^{\alpha-1},$$
with exponential
$$d_\alpha(P\|Q) = 2^{D_\alpha(P\|Q)} = \left[\sum_x \frac{P^\alpha(x)}{Q^{\alpha-1}(x)}\right]^{\frac{1}{\alpha-1}}.$$
- A measure of the information lost when Q is used to approximate P
- $D_\alpha(P\|Q) = 0$ iff $P = Q$
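A small sketch (ours) of these definitions for discrete distributions, assuming full support so the ratios are well defined:

```python
import numpy as np

def d_alpha(P, Q, alpha):
    """Exponential Renyi divergence:
    d_alpha(P||Q) = [sum_x P(x)^alpha / Q(x)^(alpha-1)]^(1/(alpha-1)), alpha != 1."""
    return np.sum(P**alpha / Q**(alpha - 1)) ** (1.0 / (alpha - 1))

def D_alpha(P, Q, alpha):
    """Renyi divergence in bits: D_alpha(P||Q) = log2(d_alpha(P||Q))."""
    return np.log2(d_alpha(P, Q, alpha))

P = np.array([0.1, 0.2, 0.3, 0.4])
Q = np.array([0.4, 0.3, 0.2, 0.1])
print(d_alpha(P, Q, 2.0))  # d_2(P||Q) = sum_x P(x)^2 / Q(x) ~= 2.21
print(D_alpha(P, P, 2.0))  # 0.0, since D_alpha(P||Q) = 0 iff P = Q
```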
7 Preliminaries: Importance weights
Lemma 1: $E_Q[w] = 1$, $E_Q[w^2] = d_2(P\|Q)$, and $\sigma^2(w) = d_2(P\|Q) - 1$.
Proof:
$$E_Q[w^2] = \sum_{x\in X} w^2(x)\,Q(x) = \sum_{x\in X}\left(\frac{P(x)}{Q(x)}\right)^2 Q(x) = d_2(P\|Q).$$
Lemma 2: For all $\alpha > 0$ (with loss bounded by $B = 1$),
$$E_Q\big[w^2(x)L_h^2(x)\big] \le d_{\alpha+1}(P\|Q)\,R(h)^{1-\frac{1}{\alpha}}.$$
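A quick Monte Carlo check of Lemma 1 (ours), comparing sample moments of w under Q against the closed forms:

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([0.1, 0.2, 0.3, 0.4])
Q = np.array([0.4, 0.3, 0.2, 0.1])
w = P / Q
d2 = np.sum(P**2 / Q)  # d_2(P||Q)

xs = rng.choice(len(Q), size=500_000, p=Q)  # sample from the source Q
print(np.mean(w[xs]), 1.0)       # E_Q[w] = 1
print(np.mean(w[xs]**2), d2)     # E_Q[w^2] = d_2(P||Q)
print(np.var(w[xs]), d2 - 1.0)   # sigma^2(w) = d_2(P||Q) - 1
```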
8 Preliminaries: Importance weights
Hölder's inequality (as stated in Jin, Wilson, and Nobel, 2014): let $\frac{1}{p} + \frac{1}{q} = 1$; then
$$\sum_x |a_x b_x| \le \left(\sum_x |a_x|^p\right)^{1/p}\left(\sum_x |b_x|^q\right)^{1/q}.$$
9 Preliminaries: Importance weights
Proof of Lemma 2: let the loss be bounded by $B = 1$. Then
$$E_{x\sim Q}\big[w^2(x)L_h^2(x)\big] = \sum_x \left[\frac{P(x)}{Q(x)}\right]^2 L_h^2(x)\,Q(x) = \sum_x \left[\frac{P(x)}{Q(x)}\,P(x)^{\frac{1}{\alpha}}\right]\left[P(x)^{\frac{\alpha-1}{\alpha}} L_h^2(x)\right]$$
$$\le \left[\sum_x \left(\frac{P(x)}{Q(x)}\right)^{\alpha} P(x)\right]^{\frac{1}{\alpha}}\left[\sum_x P(x)\,L_h^{\frac{2\alpha}{\alpha-1}}(x)\right]^{\frac{\alpha-1}{\alpha}} = d_{\alpha+1}(P\|Q)\left[\sum_x P(x)\,L_h(x)\,L_h^{\frac{\alpha+1}{\alpha-1}}(x)\right]^{\frac{\alpha-1}{\alpha}}$$
$$\le d_{\alpha+1}(P\|Q)\,R(h)^{1-\frac{1}{\alpha}}\, B^{1+\frac{1}{\alpha}} = d_{\alpha+1}(P\|Q)\,R(h)^{1-\frac{1}{\alpha}},$$
where the first inequality is Hölder's inequality with exponents $\alpha$ and $\frac{\alpha}{\alpha-1}$.
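The inequality can be sanity-checked numerically; a sketch (ours) on random discrete distributions:

```python
import numpy as np

rng = np.random.default_rng(2)

def d_alpha(P, Q, alpha):
    """d_alpha(P||Q) = [sum_x P(x)^alpha / Q(x)^(alpha-1)]^(1/(alpha-1))."""
    return np.sum(P**alpha / Q**(alpha - 1)) ** (1.0 / (alpha - 1))

# Random discrete P, Q and a loss L_h bounded by B = 1.
n = 8
P = rng.dirichlet(np.ones(n))
Q = rng.dirichlet(np.ones(n))
L = rng.uniform(0.0, 1.0, size=n)
w = P / Q

R = np.sum(P * L)                # R(h) = E_P[L_h]
lhs = np.sum(Q * (w * L) ** 2)   # E_Q[w^2 L_h^2]
for alpha in (1.5, 2.0, 4.0, 8.0):
    rhs = d_alpha(P, Q, alpha + 1) * R ** (1.0 - 1.0 / alpha)
    assert lhs <= rhs            # Lemma 2 holds for every alpha > 0
    print(alpha, lhs, rhs)
```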
10 Learning Guarantees: Bounded case
Let $\sup_x w(x) = \sup_x \frac{P(x)}{Q(x)} = d_\infty(P\|Q) = M$, and assume $d_\infty(P\|Q) < +\infty$. Fix $h \in H$. Then, for any $\delta > 0$, with probability at least $1-\delta$,
$$R(h) \le \hat R_w(h) + M\sqrt{\frac{\log\frac{2}{\delta}}{2m}}.$$
M can be very large, so we naturally want a more favorable bound...
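To see the dependence on M, a tiny evaluation (ours, with illustrative numbers):

```python
import numpy as np

def hoeffding_term(M, m, delta):
    """Deviation term of the bounded-case (Hoeffding-style) bound:
    M * sqrt(log(2/delta) / (2m))."""
    return M * np.sqrt(np.log(2.0 / delta) / (2.0 * m))

m, delta = 10_000, 0.05
for M in (2.0, 10.0, 100.0):  # M = d_inf(P||Q), the largest importance weight
    print(M, hoeffding_term(M, m, delta))  # grows linearly in M;
# for M = 100 the term exceeds 1 and the bound is vacuous for a [0,1] risk.
```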
11 Overview
- Preliminaries
- Learning guarantee when the loss is bounded
- Learning guarantee when the loss is unbounded, but its second moment is bounded
- Algorithm
12 Learning Guarantees: Bounded case
Theorem 1: fix $h \in H$. For any $\alpha \ge 1$ and any $\delta > 0$, with probability at least $1-\delta$, the following bound holds:
$$R(h) \le \hat R_w(h) + \frac{2M\log\frac{1}{\delta}}{3m} + \sqrt{\frac{2\big[d_{\alpha+1}(P\|Q)\,R(h)^{1-\frac{1}{\alpha}} - R(h)^2\big]\log\frac{1}{\delta}}{m}}.$$
13 Learning Guarantees: Bounded case
Bernstein's inequality (Bernstein, 1946): for independent zero-mean random variables with $|x_i| \le M$,
$$\Pr\left[\frac{1}{n}\sum_{i=1}^n x_i \ge \epsilon\right] \le \exp\left(\frac{-n\epsilon^2}{2\sigma^2 + 2M\epsilon/3}\right).$$
14 Learning Guarantees: Bounded case
Proof of Theorem 1: let Z be the random variable $w(x)L_h(x) - R(h)$; then $|Z| \le M$. By Lemma 2, the variance of Z can be bounded in terms of $d_{\alpha+1}(P\|Q)$:
$$\sigma^2(Z) = E_Q\big[w^2(x)L_h^2(x)\big] - R(h)^2 \le d_{\alpha+1}(P\|Q)\,R(h)^{1-\frac{1}{\alpha}} - R(h)^2,$$
so Bernstein's inequality gives
$$\Pr\big[R(h) - \hat R_w(h) > \epsilon\big] \le \exp\left(\frac{-m\epsilon^2/2}{\sigma^2(Z) + \epsilon M/3}\right).$$
15 Learning Guarantees: Bounded case
Setting $\delta$ equal to this upper bound and solving for $\epsilon$, with probability at least $1-\delta$:
$$R(h) \le \hat R_w(h) + \frac{M\log\frac{1}{\delta}}{3m} + \sqrt{\frac{M^2\log^2\frac{1}{\delta}}{9m^2} + \frac{2\sigma^2(Z)\log\frac{1}{\delta}}{m}} \le \hat R_w(h) + \frac{2M\log\frac{1}{\delta}}{3m} + \sqrt{\frac{2\sigma^2(Z)\log\frac{1}{\delta}}{m}},$$
using $\sqrt{a+b} \le \sqrt{a} + \sqrt{b}$ in the last step.
16 Learning Guarantees: Bounded case
Theorem 2: let H be a finite hypothesis set. Then, for any $\delta > 0$, with probability at least $1-\delta$, the following holds for the importance weighting method:
$$R(h) \le \hat R_w(h) + \frac{2M\big(\log|H| + \log\frac{1}{\delta}\big)}{3m} + \sqrt{\frac{2\,d_2(P\|Q)\big(\log|H| + \log\frac{1}{\delta}\big)}{m}}.$$
17 Learning Guarantees: Bounded case
Theorem 2 corresponds to the case $\alpha = 1$. Note that Theorem 1 simplifies when $\alpha = 1$, since $R(h)^{1-\frac{1}{\alpha}} = 1$ and $d_2(P\|Q) - R(h)^2 \le d_2(P\|Q)$:
$$R(h) \le \hat R_w(h) + \frac{2M\log\frac{1}{\delta}}{3m} + \sqrt{\frac{2\,d_2(P\|Q)\log\frac{1}{\delta}}{m}}.$$
Theorem 2 then follows by a union bound over the $|H|$ hypotheses.
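To see why the Bernstein-style bound is more favorable, a quick comparison (ours; the values of M and $d_2$ are illustrative): M now enters only the fast O(1/m) term, while the slow O(1/√m) term is driven by $d_2(P\|Q)$, which is never larger than M:

```python
import numpy as np

def bernstein_bound_term(M, d2, m, delta, H_size=1):
    """Theorem 2 deviation term:
    2M(log|H| + log(1/delta))/(3m) + sqrt(2 d_2 (log|H| + log(1/delta))/m)."""
    t = np.log(H_size) + np.log(1.0 / delta)
    return 2.0 * M * t / (3.0 * m) + np.sqrt(2.0 * d2 * t / m)

def hoeffding_bound_term(M, m, delta):
    return M * np.sqrt(np.log(2.0 / delta) / (2.0 * m))

m, delta = 10_000, 0.05
M, d2 = 100.0, 5.0  # d_2(P||Q) can be far smaller than M = d_inf(P||Q)
print(bernstein_bound_term(M, d2, m, delta))  # ~0.07
print(hoeffding_bound_term(M, m, delta))      # ~1.36
```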
18 Learning Guarantees: Bounded case
Proposition 2 (lower bound): assume $M < \infty$, $\sigma^2(w)/M^2 \ge 1/m$, and that there exists $h_0 \in H$ such that $L_{h_0}(x) = 1$ for all x. Then there exists an absolute constant c, $c = 2/41^2$, such that
$$\Pr\left[\sup_{h\in H}\big(R(h) - \hat R_w(h)\big) \ge c\,\sqrt{\frac{d_2(P\|Q)}{4m}}\right] > 0.$$
Proof: from Theorem 9 of Cortes, Mansour, and Mohri, 2010.
19 Overview
- Preliminaries
- Learning guarantee when the loss is bounded
- Learning guarantee when the loss is unbounded, but its second moment is bounded
- Algorithm
20 Learning Guarantees: Unbounded case
$d_\infty(P\|Q) < \infty$ does not always hold... Assume P and Q are Gaussians with standard deviations $\sigma_P$ and $\sigma_Q$ and means $\mu$ and $\mu'$:
$$\frac{P(x)}{Q(x)} = \frac{\sigma_Q}{\sigma_P}\exp\left(\frac{\sigma_P^2(x-\mu')^2 - \sigma_Q^2(x-\mu)^2}{2\sigma_P^2\sigma_Q^2}\right).$$
Thus, even if $\sigma_P = \sigma_Q$ and $\mu \ne \mu'$, we get $d_\infty(P\|Q) = \sup_x \frac{P(x)}{Q(x)} = \infty$, so Theorem 1 is not informative.
21 Learning Guarantees: Unbounded case
However, the second moment of the importance weights can still be bounded:
$$d_2(P\|Q) = \frac{\sigma_Q}{\sigma_P^2\sqrt{2\pi}}\int \exp\left(-\frac{2\sigma_Q^2(x-\mu)^2 - \sigma_P^2(x-\mu')^2}{2\sigma_P^2\sigma_Q^2}\right)dx,$$
which is finite whenever $2\sigma_Q^2 > \sigma_P^2$.
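A numerical check (ours, using scipy for quadrature) of this integral and of the finiteness condition $2\sigma_Q^2 > \sigma_P^2$:

```python
import numpy as np
from scipy import integrate, stats

def d2_gaussian(mu_p, sig_p, mu_q, sig_q):
    """d_2(P||Q) = E_Q[w^2] = integral of P(x)^2 / Q(x) dx for Gaussian P, Q.
    Computed in log space to avoid 0/0 in the tails."""
    f = lambda x: np.exp(2.0 * stats.norm.logpdf(x, mu_p, sig_p)
                         - stats.norm.logpdf(x, mu_q, sig_q))
    val, _err = integrate.quad(f, -np.inf, np.inf)
    return val

print(d2_gaussian(0.0, 1.0, 0.0, 1.0))  # 1.0: P = Q, constant weights w(x) = 1
print(d2_gaussian(0.0, 1.0, 1.0, 1.0))  # ~2.718 (= e): finite although d_inf = inf
print(d2_gaussian(0.0, 1.0, 0.0, 0.8))  # finite: 2*sig_q^2 = 1.28 > sig_p^2 = 1
# With sig_q = 0.6, 2*sig_q^2 = 0.72 < sig_p^2 and the integral diverges.
```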
22 Learning Guarantees: Unbounded case
Intuition: if $\mu = \mu'$ and $\sigma_P \gg \sigma_Q$:
- Q provides some useful information about P
- But a sample from Q has only a few points far from $\mu$
- A few extreme sample points would receive very large weights
Likewise, if $\sigma_P = \sigma_Q$ but $\mu' \gg \mu$, most weights would be negligible, with a few extreme ones dominating.
23 Learning Guarantees: Unbounded case
Theorem 3: let H be a hypothesis set such that $\mathrm{Pdim}(\{L_h(x) : h \in H\}) = p < \infty$. Assume that $d_2(P\|Q) < +\infty$ and $w(x) \ne 0$ for all x. Then, for any $\delta > 0$, with probability at least $1-\delta$, the following holds:
$$R(h) \le \hat R_w(h) + 2^{5/4}\sqrt{d_2(P\|Q)}\left(\frac{p\log\frac{2me}{p} + \log\frac{4}{\delta}}{m}\right)^{3/8}.$$
24 Learning Guarantees: Unbounded case
Proof outline (full proof in Cortes, Mansour, Mohri, 2010):
- Relate the deviation of the empirical average to the deviation of the empirical tail probabilities
$$\sup_{h\in H,\,t\in\mathbb{R}}\frac{\widehat{\Pr}[L_h>t]-\Pr[L_h>t]}{\sqrt{\widehat{\Pr}[L_h>t]}},$$
using $E[L_h] = \int_0^\infty \Pr[L_h>t]\,dt$.
- Apply the relative-deviation bound for bounded losses:
$$\Pr\left[\sup_{h\in H}\frac{R(h)-\hat R(h)}{\sqrt{R(h)}} > \epsilon\sqrt{2+\log\tfrac{1}{\epsilon}}\right] \le 4\,\Pi_H(2m)\exp\left(-\frac{m\epsilon^2}{4}\right).$$
- Normalize by the second moment and bound the growth function via the pseudo-dimension:
$$\Pr\left[\sup_{h\in H}\frac{E[L_h(x)]-\hat E[L_h(x)]}{\sqrt{E[L_h^2(x)]}} > \epsilon\sqrt{2+\log\tfrac{1}{\epsilon}}\right] \le 4\exp\left(p\log\frac{2em}{p}-\frac{m\epsilon^2}{4}\right),$$
$$\Pr\left[\sup_{h\in H}\frac{E[L_h(x)]-\hat E[L_h(x)]}{\sqrt{E[L_h^2(x)]}} > \epsilon\right] \le 4\exp\left(p\log\frac{2em}{p}-\frac{m\epsilon^{8/3}}{4^{5/3}}\right).$$
- Solve for $\epsilon$: with probability at least $1-\delta$,
$$E[L_h(x)]-\hat E[L_h(x)] \le 2^{5/4}\max\left\{\sqrt{E[L_h^2(x)]},\sqrt{\hat E[L_h^2(x)]}\right\}\left(\frac{p\log\frac{2me}{p}+\log\frac{8}{\delta}}{m}\right)^{3/8}.$$
25 Learning Guarantees: Unbounded case
Thus, we can show the following:
$$\Pr\left[\sup_{h\in H}\frac{R(h)-\hat R_w(h)}{\sqrt{d_2(P\|Q)}} > \epsilon\right] \le 4\exp\left(p\log\frac{2em}{p}-\frac{m\epsilon^{8/3}}{4^{5/3}}\right),$$
where $p = \mathrm{Pdim}(\{L_h(x) : h \in H\})$ is also the pseudo-dimension of $H' = \{w(x)L_h(x) : h \in H\}$. Note that any set shattered by $H'$ is shattered by $\{L_h : h \in H\}$: if $H'$ shatters a set A with witnesses $r_i$, then $\{L_h\}$ shatters A with witnesses $s_i = r_i/w(x_i)$ (this uses $w(x) \ne 0$).
26 Overview
- Preliminaries
- Learning guarantee when the loss is bounded
- Learning guarantee when the loss is unbounded, but its second moment is bounded
- Algorithm
27 Alternative algorithms
We can generalize this analysis to an arbitrary function $u : X \to \mathbb{R}$, $u > 0$. Let $\hat R_u(h) = \frac{1}{m}\sum_{i=1}^m u(x_i)L_h(x_i)$ and let $\hat Q$ be the empirical distribution.
Theorem 4: let H be a hypothesis set such that $\mathrm{Pdim}(\{L_h(x) : h \in H\}) = p < \infty$. Assume that $0 < E_Q[u^2(x)] < +\infty$ and $u(x) \ne 0$ for all x. Then, for any $\delta > 0$, with probability at least $1-\delta$,
$$R(h) \le \hat R_u(h) + E_Q\big[(w(x)-u(x))L_h(x)\big] + 2^{5/4}\max\left(\sqrt{E_Q[u^2(x)L_h^2(x)]},\sqrt{E_{\hat Q}[u^2(x)L_h^2(x)]}\right)\left(\frac{p\log\frac{2me}{p}+\log\frac{4}{\delta}}{m}\right)^{3/8}.$$
28 Alternative algorithms
Functions u other than w can be used to reweight the cost of errors. To (approximately) minimize the upper bound of Theorem 4, note that
$$\max\left(\sqrt{E_Q[u^2]},\sqrt{E_{\hat Q}[u^2]}\right) \le \sqrt{E_Q[u^2]}\,\big(1 + O(1/\sqrt{m})\big),$$
and, for losses bounded by 1, the bias term satisfies $E_Q[(w-u)L_h] \le E_Q[|w-u|]$. This suggests the algorithm
$$\min_{u\in U}\; E_Q\big[|w(x)-u(x)|\big] + \gamma\sqrt{E_Q[u^2]},$$
a trade-off between bias and variance minimization.
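One simple way to instantiate this trade-off (the family U below is our illustrative choice, not necessarily the authors' algorithm) is to search over truncated weights $u_\theta(x) = \min(w(x), \theta)$ and pick the threshold minimizing the empirical objective:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for heavy-tailed importance weights w(x_i) on a sample drawn from Q.
w = np.exp(rng.normal(0.0, 1.5, size=5000))

def objective(theta, w, gamma):
    """Empirical surrogate of the trade-off: E[|w - u|] + gamma * sqrt(E[u^2]),
    for truncated weights u_theta(x) = min(w(x), theta)."""
    u = np.minimum(w, theta)
    return np.mean(np.abs(w - u)) + gamma * np.sqrt(np.mean(u**2))

gamma = 0.05
thetas = np.quantile(w, np.linspace(0.5, 1.0, 51))  # candidate truncation levels
best = min(thetas, key=lambda t: objective(t, w, gamma))
print(best, objective(best, w, gamma))
```

A small θ shrinks the variance term at the price of bias; θ → ∞ recovers u = w.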
29 Alternative algorithms [figure omitted in transcription]
30 Alternative algorithms [figure omitted in transcription]