Selective Prediction
|
|
- Blaise Hunt
- 5 years ago
- Views:
Transcription
1 COMS Fall 2017 November 8, 2017 Selective Predictio Preseter: Rog Zhou Scribe: Wexi Che 1 Itroductio I our previous discussio o a variatio o the Valiat Model [3], the described learer has the ability to output I do t kow i additio to the regular biary outputs This is the behavior of a selective classifier I other words, a selective classifier is allowed to reject decisio makig without pealty Such classifiers are useful i makig medical predictios, because the cost of makig a wrog predictio is usually much higher tha refusig to make ay decisio i these situatios 11 Ideal Selective Classifier What would a ideal selective classifier looks like? Ituitively, misclassificatio rate should ot be the oly measuremet for selective classifiers Imagie a selective classifier that refuses makig ay predictio, such classifier is useless, but it has a misclassificatio rate of 0 Therefore, it makes sese to evaluate a selective classifier ot oly by its misclassificatio rate, but also the probability it will refuse makig predictio Let C be a selective classifier i biary classificatio settig, we defie the followig: Coverage (cover(c)): the probability that C predicts a label istead of refusig makig predictio Erorr (err(c)): the probability that the true label is differet from C s predictio whe C makes predictio Risk: risk(c) = err(c) cover(c) We seek to boud both error ad coverage of a ideal classifier with high probability (1 δ) where 0 δ 1 2 Realizable Settig Let X deote the feature space, D over X { 1, 1} be the uderlyig ukow data distributio Set S = {{x 1, y 1 }, {x 2, y 2 },, {x, y }} is a set of labelled examples, ad U = {x +1, x +2,, x +m } is a set of m ulabelled examples, where x i X ad y j { 1, 1} for 1 i + m, 1 j Let H be a set of hypotheses, ad h H be the target hypothesis such that the true label of x is the same as the predictio h (x) I additio, we 1
2 defie versio space V with respect to S to be the set of hypotheses that are cosistet with the examples i S We itroduce 3 selective classifiers: Cofidece-rated Predictor, CZ Selective Classifier ad Selective Classifier Strategy 21 Cofidece-rated Predictor A cofidece-rated predictor C maps from U to a set of m distributios over { 1, 0, 1} If the j-th distributio is [β j, 1 β j α j, α j ], the Pr{C(x +j ) = 1} = β j, Pr{C(x +j ) = 1} = α j ad Pr{C(x +j ) = 0} = 1 β j α j Algorithm 1 Cofidece-rated Predictor iput Labelled data S, ulabelled data U, error boud ɛ 1: Compute versio space V with respect to S 2: Solve the liear program: m max (α i + β i ) subject to: h V, i, α i + β i 1 i, α i, β i 0 β i + i:h(x +i )=1 i:h(x +i )= 1 α i ɛm 3: retur the cofidece-rated predictor {[β i, 1 β i α i, α i ], i = 1, 2,, m} Theorem 1 A cofidece-rated predictor produced by Algorithm 1 has a error guaratee ɛ with optimal coverage for the ulabelled examples i U Proof Accordig the the costraits i the liear program, 0 α i 1, 0 β i 1, 0 1 α i β i 1 ad the probability the predictor makes a wrog decisio with respect to the uiform distributio over U is less tha ɛm/m = ɛ Moreover, the liear program maximizes m α i + β i, which is equivalet as miimizig that is the coverage of the predictor 1 m m 1 α i β i Remark 1 The feasible regio of the liear program i Algorithm 1 is always o-empty Proof The cadidate solutio {[0, 1, 0], i = 1, 2,, 3} will always be i the feasible solutio set 2
3 22 CZ Selective Classifier A CZ selective classifier C is defied by a tuple (h, (γ 1, γ 2,, γ m )) where h H ad 0 γ i 1 for all i = 1, 2,, m Give ulabelled example x +i U, C predicts 0 with probability 1 γ i ad predicts C(x +i ) = h(x +i ) with probability γ i Algorithm 2 CZ Selective Classifier iput Labelled data S, ulabelled data U, error boud ɛ 1: Compute versio space V with respect to S 2: Radomly choose h 0 V 3: Solve the liear program: m max subject to: γ i i, 0 γ i 1 h V, γ i ɛm i:h(x +i ) h 0 (x +i ) 4: retur the CZ selective classifier (h 0, (γ 1, γ 2,, γ m )) Theorem 2 Let P be a cofidece-rated predictor produced by Algorithm 1 A CZ selective classifier C produced by Algorithm 2 has a error guaratees ɛ with cover(c) cover(p) ɛ Ituitively, there exists h 1 V that produces the most differet predictios o U from h 0 If h 1 ad h 0 lies i two opposite eds of the versio space, cover(c) would be worse tha the average case 23 Selective Classifier Strategy There are a few drawbacks usig cofidece-rated predictors ad CZ selective classifiers First, the umber of costraits ca be ifiite Secod, we eed to iput ulabelled dataset U ad these two methods oly produce predictio of examples i U However, we wat to geeralize the problem ad provide a solutio uder iductive settig Thus, we are goig to remove U from the iput to the algorithm I order to use selective classifier strategy, we itroduce a few additioal defiitios: Let d be the VC dimesio of H True error rate of hypothesis h is: err P (h) = Pr {h(x) Y } (X,Y ) D 3
4 empirical error rate of hypothesis h is: err S (h) = 1 1 {h(xi ) y i } For ay hypothesis class H, distributio D over X, ad real umber r > 0, ad V(h, r) = {h H : err P (h ) err P (h) + r} ˆV(h, r) = {h H : err S (h ) err S (h) + r} B(h, r) = {h H : Pr X D {h (X) h(x)} r} Disagreemet regio of hypothesis class H is: DIS(H) = {x X : h 1, h 2 H st h 1 (x) h 2 (x)} for G H, let G the volume of the disagreemet regio: G = Pr{DIS(G)} disagreemet coefficiet is: θ = sup r>0 B(h, r) r A ew type of selective classifier C st C(x) = (h, g)(x) = { h(x) if g(x) = 0 0 if g(x) = 0 ad cover(h, g) = E[g(X)] Algorithm 3 Selective Classifier Strategy iput labelled data S, d, δ Output a selective classifier (h, g) st risk(h, g) = risk(h, g) 1: Compute versio space V with respect to S 2: Radomly choose h 0 V 3: Set G = V 4: Costruct g st g(x) = 1 if ad oly if x {X \ DIS(G)} 5: Set h = h 0 4
5 Theorem 3 (Cosistet Hypothesis error rate boud i terms of VC dimesio, theorem 215 from lecture otes) For ay ad δ (0, 1), with probability at least 1 δ, every hypothesis h V has error rate err P (h) 4d l(2 + 1) + 4 l 4 δ Theorem 4 Let C = (h, g) be a selective classifier output by Algorithm 3, the h achieves 0 error rate whe it makes predictio, ad C achieves ear optimal coverage with probability 1 δ Proof Give g(x) = 1 if ad oly if x is ot i the disagreemet regio of the versio space V, whe h makes a predictio, it will always agree with all hypotheses i V icludig h Thus, h achieves 0 error rate, moreover risk(h, g) = risk(h, g) Now, we will show that C achieves ear optimal coverage with probability 1 δ Let r = 4d l(2+1)+4 l 4 δ, the for ay h V, h V(h, r) ad V V(h, r) Furthermore, if h V(h, r), h B(h, r) i the realizable settig Also, we kow that r (0, 1), B(h, r) θr Therefore, with probability at least 1 δ, V B(h, r) θr ad cover(h, g) = 1 V 1 θr = 1 θ 4d l(2 + 1) + 4 l 4 δ 3 The Noisy Settig I the oisy settig, the labels are correspodig to the predictio of target hypothesis h with oises We will show that with high probability, the selective classifier produced by Algorithm 4 achieves same error rate as h whe it makes predictio, ad its coverage is ear optimal Algorithm 4 Selective Classifier Strategy - Noisy iput labelled data S, d, δ Output a selective classifier (h, g) st risk(h, g) = risk(h, g) with probability 1 δ 1: Set ĥ = ERM(H, S) so that ĥ is ay empirical risk miimizer from H 2: Set G = ˆV(ĥ, 4 2e d l( d 2 )+l 8 δ ) 3: Costruct g st g(x) = 1 if ad oly if x {X \ DIS(G)} 4: Set h = ĥ Before we prove the error boud ad coverage of Algorithm 4, let s itroduce some ew defiitios: 5
6 We would like to geeralize the defiitio of risk usig a loss fuctio L(Y, Y), the risk(h, g) = E[L(h(X), Y )g(x)] cover(h, g) Excess loss class is defied as F = {L(h(x), y) L(h (x), y) : h H} Give 0 β 1 ad B 1, class F is said to be a (β, B)-Berstei class with respect to D, if every f F satisfies E[f 2 ] B(E[f]) β Lemma 1 If F is a (β, B)-Berstei class with respect to D, the for ay r > 0, Proof If h V(h, r), by defiitio of V Also V(h, r) B(h, Br β ) E[1 {h(x) Y } ] E[1 {h (X) Y }] + r E[1 {h(x) Y } 1 {h (X) Y }] r E[1 {h(x) h (X)}] = E[ 1 {h(x) Y } 1 {h (X) Y } ] = E[(1 {h(x) Y } 1 {h (X) Y }) 2 ] (1) = E[(L(h(X), Y ) L(h (X), Y )) 2 ] (2) = E[f 2 ] B(E[f]) β = B(E[1 {h(x) Y } 1 {h (X) Y }]) β Br β From (1) to (2) we assume usig 0-1 loss fuctio Ituitively, we ca imagie that B(h, Br β ) as havig a bigger tolerace such that the ball icludes V(h, r) as the 2D visualizatio Figure 1 show below Theorem 5 (Lemma 1 from [1]) For ay δ > 0, if H has VC dimesio d, with probability at least 1 δ 2e d l( d h H, err P (h) err S (h) ) + l 2 δ Similarly, uder the same coditio, h H, err S (h) err P (h) e d l( ) + l 2 d δ
7 Figure 1: 2D visualizatio of ituitio behid Lemma 1 Left: The yellow regio depicts the hypothesis class H The yellow regio that lies iside the gree dotted circle represets the hypotheses i V(h, r) Right: The yellow regio iside the red dotted circle represets the hypotheses i B(h, Br β ) By Lemma 1 we show that the red dotted circle fully cotais the gree dotted circle Also, because the differece i defiitio, the red dotted circle ad gree dotted circle have differet ceters Lemma 2 For ay r > 0, 0 < δ < 1, with probability at least 1 δ, ˆV(ĥ, r) V(h, 2σ(, δ/2, d) + r) where σ(, δ, d) = 2 2 2e d l( ) + l 2 d δ Proof If h ˆV(ĥ, r), the err S (h) err S (ĥ) + r (*) Accordig to Theorem 5, with probability at least 1 δ, err S (ĥ) err S(h ) (**) err P (h) err S (h) + σ(, δ/2, d) err S (h ) err P (h ) + σ(, δ/2, d) So with probability at least 1 δ, err P (h) + err S (h ) err S (h) + err P (h ) + 2σ(, δ/2, d) err P (h) + err S (h ) err S (ĥ) + r + err P (h ) + 2σ(, δ/2, d) [applyig (*)] err P (h) + err S (ĥ) err S(ĥ) + r + err P (h ) + 2σ(, δ/2, d) [applyig (**)] err P (h) err P (h ) + 2σ(, δ/2, d) + r err P (h) B(h, 2σ(, δ/2, d) + r) 7
8 Similarly, we visualize the ituitio below i Figure 2 Usig Theorem 5, we costruct a slightly bigger ball aroud h that icludes the empirical error ball Figure 2: 2D visualizatio of ituitio behid Lemma 2 Left: The blue regio, which is the itersectio of the gree dotted circle ad H, represets the hypotheses i ˆV(ĥ, r) Right: The itersectio of H ad the red dotted circle represets the hypotheses i V(h, r + ) I Lemma 2, we show that whe we set = 2σ(, δ/2, d), with probability at least 1 δ, hypotheses i ˆV(ĥ, r) are also i V(h, r + ) Lemma 3 Assume that H has disagreemet coefficiet θ, ad F is a (β, B)-Berstei class with respect to D, the for ay r > 0 ad 0 < δ < 1, with probability at least 1 δ, ˆV(ĥ, r) Bθ(2σ(, δ/2, d) + r)β Proof Combie Lemma 1 ad Lemma 2, we get with probability at least 1 δ ˆV(ĥ, r) V(h, 2σ(, δ/2, d) + r) B(h, B(2σ(, δ/2, d) + r) β ) Thus, ˆV(ĥ, r) B(h, B(2σ(, δ/2, d) + r) β ) Bθ(2σ(, δ/2, d) + r) β Theorem 6 Assume that H has disagreemet coefficiet θ ad that F is said to be a (β, B)- Berstei class with respect to D Let (h, g) be selective classifier output by Algorithm 4 the for ay r > 0 ad 0 < δ < 1, with probability at least 1 δ, cover(h, g) 1 θb(4σ(, δ/4, d) + r) β err P (h ) = err P (h) 8
9 Proof By Theorem 5, with probability at least 1 δ/2, err S (h ) err P (h ) + σ(, δ/4, d) err P (ĥ) errs(ĥ) + σ(, δ/4, d) The, with probability at least 1 δ/2, err S (h ) err S (ĥ) + 2σ(, δ/4, d), which implies h ˆV(ĥ, 2σ(, δ/4, d)) So, for ay x X, if g(x) = 1, ĥ(x) = h (x) Therefore, err P (h ) = err P (h) Fially, usig a uio boud, we ca applyig Lemma 3, we get that with probability at least 1 δ, cover(ĥ, g) = 1 G 1 θb(4σ(, δ/4, d))β err P (h ) = err P (h) Oe of the cocer with Algorithm 4 is that it is o so clear how to set β or B We kow that if the excess loss class is o-egative, the it is a (1, B)-Berstei class for ay B However, it is urealistic to assume o-egative excess loss class for arbitrary datasets Aother area may be iterestig for future research is to challege the idea of usig h for compariso i reliable learig uder the oisy settig Give that h still make mistakes, ca we do better tha h whe we decide to make a decisio give a reasoable coverage? Bibliographic otes Besides explicitly marked refereces, the cofidece-rated predictor ad CZ selective classifier are proposed by [2], ad the selective classifier strategy i both realizable settig ad oisy settig are proposed by [4] Refereces [1] Olivier Bousquet, Stéphae Bouchero, ad Gábor Lugosi Itroductio to statistical learig theory I Advaced lectures o machie learig, pages Spriger, 2004 [2] Kamalika Chaudhuri ad Chicheg Zhag Improved algorithms for cofidece-rated predictio with error guaratees 2013 [3] RL Rivest ad R Sloa A formal model of hierarchical cocept-learig If Comput, 114(1):88 114, October 1994 [4] Yair Wieer ad Ra El-Yaiv Agostic selective classificatio I Advaces i eural iformatio processig systems, pages ,
Selective Prediction. Binary classifications. Rong Zhou November 8, 2017
Selective Prediction Binary classifications Rong Zhou November 8, 2017 Table of contents 1. What are selective classifiers? 2. The Realizable Setting 3. The Noisy Setting 1 What are selective classifiers?
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 3
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture 3 Tolstikhi Ilya Abstract I this lecture we will prove the VC-boud, which provides a high-probability excess risk boud for the ERM algorithm whe
More informationIntro to Learning Theory
Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationEmpirical Process Theory and Oracle Inequalities
Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi
More information1 Review and Overview
DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,
More informationECE 901 Lecture 12: Complexity Regularization and the Squared Loss
ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality
More information10-701/ Machine Learning Mid-term Exam Solution
0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it
More informationMachine Learning Theory (CS 6783)
Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT
More information1 Review and Overview
CS9T/STATS3: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #6 Scribe: Jay Whag ad Patrick Cho October 0, 08 Review ad Overview Recall i the last lecture that for ay family of scalar fuctios F, we
More informationRademacher Complexity
EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for
More informationTopics Machine learning: lecture 2. Review: the learning problem. Hypotheses and estimation. Estimation criterion cont d. Estimation criterion
.87 Machie learig: lecture Tommi S. Jaakkola MIT CSAIL tommi@csail.mit.edu Topics The learig problem hypothesis class, estimatio algorithm loss ad estimatio criterio samplig, empirical ad epected losses
More informationOptimally Sparse SVMs
A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but
More informationECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015
ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],
More informationLecture 15: Learning Theory: Concentration Inequalities
STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that
More informationAda Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities
CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We
More informationSieve Estimators: Consistency and Rates of Convergence
EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes
More informationLecture 10: Universal coding and prediction
0-704: Iformatio Processig ad Learig Sprig 0 Lecture 0: Uiversal codig ad predictio Lecturer: Aarti Sigh Scribes: Georg M. Goerg Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved
More informationResampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.
Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator
More informationLecture 12: February 28
10-716: Advaced Machie Learig Sprig 2019 Lecture 12: February 28 Lecturer: Pradeep Ravikumar Scribes: Jacob Tyo, Rishub Jai, Ojash Neopae Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More information18.657: Mathematics of Machine Learning
8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 4 Scribe: Cheg Mao Sep., 05 I this lecture, we cotiue to discuss the effect of oise o the rate of the excess risk E(h) = R(h) R(h
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio
More informationBinary classification, Part 1
Biary classificatio, Part 1 Maxim Ragisky September 25, 2014 The problem of biary classificatio ca be stated as follows. We have a radom couple Z = (X,Y ), where X R d is called the feature vector ad Y
More informationEstimation for Complete Data
Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of
More informationRegression with quadratic loss
Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,
More informationREGRESSION WITH QUADRATIC LOSS
REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d
More information1 Review of Probability & Statistics
1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5
More informationRandom Variables, Sampling and Estimation
Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig
More informationLecture 11: Pseudorandom functions
COM S 6830 Cryptography Oct 1, 2009 Istructor: Rafael Pass 1 Recap Lecture 11: Pseudoradom fuctios Scribe: Stefao Ermo Defiitio 1 (Ge, Ec, Dec) is a sigle message secure ecryptio scheme if for all uppt
More informationLecture 7: October 18, 2017
Iformatio ad Codig Theory Autum 207 Lecturer: Madhur Tulsiai Lecture 7: October 8, 207 Biary hypothesis testig I this lecture, we apply the tools developed i the past few lectures to uderstad the problem
More informationStatistical Machine Learning II Spring 2017, Learning Theory, Lecture 7
Statistical Machie Learig II Sprig 2017, Learig Theory, Lecture 7 1 Itroductio Jea Hoorio jhoorio@purdue.edu So far we have see some techiques for provig geeralizatio for coutably fiite hypothesis classes
More informationDiscrete-Time Systems, LTI Systems, and Discrete-Time Convolution
EEL5: Discrete-Time Sigals ad Systems. Itroductio I this set of otes, we begi our mathematical treatmet of discrete-time s. As show i Figure, a discrete-time operates or trasforms some iput sequece x [
More informationFrequentist Inference
Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for
More informationSummary. Recap ... Last Lecture. Summary. Theorem
Last Lecture Biostatistics 602 - Statistical Iferece Lecture 23 Hyu Mi Kag April 11th, 2013 What is p-value? What is the advatage of p-value compared to hypothesis testig procedure with size α? How ca
More informationBoosting. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 1, / 32
Boostig Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machie Learig Algorithms March 1, 2017 1 / 32 Outlie 1 Admiistratio 2 Review of last lecture 3 Boostig Professor Ameet Talwalkar CS260
More informationProblem Set 4 Due Oct, 12
EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios
More informationIntroductory statistics
CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key
More informationEfficient GMM LECTURE 12 GMM II
DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet
More informationOn the Theory of Learning with Privileged Information
O the Theory of Learig with Privileged Iformatio Dmitry Pechyoy NEC Laboratories Priceto, NJ 08540, USA pechyoy@ec-labs.com Vladimir Vapik NEC Laboratories Priceto, NJ 08540, USA vlad@ec-labs.com Abstract
More information18.657: Mathematics of Machine Learning
8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 0 Scribe: Ade Forrow Oct. 3, 05 Recall the followig defiitios from last time: Defiitio: A fuctio K : X X R is called a positive symmetric
More information6.867 Machine learning
6.867 Machie learig Mid-term exam October, ( poits) Your ame ad MIT ID: Problem We are iterested here i a particular -dimesioal liear regressio problem. The dataset correspodig to this problem has examples
More information6.3 Testing Series With Positive Terms
6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial
More informationA survey on penalized empirical risk minimization Sara A. van de Geer
A survey o pealized empirical risk miimizatio Sara A. va de Geer We address the questio how to choose the pealty i empirical risk miimizatio. Roughly speakig, this pealty should be a good boud for the
More informationLesson 10: Limits and Continuity
www.scimsacademy.com Lesso 10: Limits ad Cotiuity SCIMS Academy 1 Limit of a fuctio The cocept of limit of a fuctio is cetral to all other cocepts i calculus (like cotiuity, derivative, defiite itegrals
More informationLecture 14: Graph Entropy
15-859: Iformatio Theory ad Applicatios i TCS Sprig 2013 Lecture 14: Graph Etropy March 19, 2013 Lecturer: Mahdi Cheraghchi Scribe: Euiwoog Lee 1 Recap Bergma s boud o the permaet Shearer s Lemma Number
More informationInformation-based Feature Selection
Iformatio-based Feature Selectio Farza Faria, Abbas Kazeroui, Afshi Babveyh Email: {faria,abbask,afshib}@staford.edu 1 Itroductio Feature selectio is a topic of great iterest i applicatios dealig with
More informationSection 11.8: Power Series
Sectio 11.8: Power Series 1. Power Series I this sectio, we cosider geeralizig the cocept of a series. Recall that a series is a ifiite sum of umbers a. We ca talk about whether or ot it coverges ad i
More informationProperties and Hypothesis Testing
Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.
More informationLecture 15: Strong, Conditional, & Joint Typicality
EE376A/STATS376A Iformatio Theory Lecture 15-02/27/2018 Lecture 15: Strog, Coditioal, & Joit Typicality Lecturer: Tsachy Weissma Scribe: Nimit Sohoi, William McCloskey, Halwest Mohammad I this lecture,
More informationLecture 9: Boosting. Akshay Krishnamurthy October 3, 2017
Lecture 9: Boostig Akshay Krishamurthy akshay@csumassedu October 3, 07 Recap Last week we discussed some algorithmic aspects of machie learig We saw oe very powerful family of learig algorithms, amely
More information6.895 Essential Coding Theory October 20, Lecture 11. This lecture is focused in comparisons of the following properties/parameters of a code:
6.895 Essetial Codig Theory October 0, 004 Lecture 11 Lecturer: Madhu Suda Scribe: Aastasios Sidiropoulos 1 Overview This lecture is focused i comparisos of the followig properties/parameters of a code:
More informationTests of Hypotheses Based on a Single Sample (Devore Chapter Eight)
Tests of Hypotheses Based o a Sigle Sample Devore Chapter Eight MATH-252-01: Probability ad Statistics II Sprig 2018 Cotets 1 Hypothesis Tests illustrated with z-tests 1 1.1 Overview of Hypothesis Testig..........
More informationLecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More informationAn Introduction to Randomized Algorithms
A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis
More informationMachine Learning. Ilya Narsky, Caltech
Machie Learig Ilya Narsky, Caltech Lecture 4 Multi-class problems. Multi-class versios of Neural Networks, Decisio Trees, Support Vector Machies ad AdaBoost. Reductio of a multi-class problem to a set
More informationw (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.
2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For
More information1 Inferential Methods for Correlation and Regression Analysis
1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet
More informationPattern recognition systems Lab 10 Linear Classifiers and the Perceptron Algorithm
Patter recogitio systems Lab 10 Liear Classifiers ad the Perceptro Algorithm 1. Objectives his lab sessio presets the perceptro learig algorithm for the liear classifier. We will apply gradiet descet ad
More informationLet us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.
Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,
More informationClustering. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar.
Clusterig CM226: Machie Learig for Bioiformatics. Fall 216 Sriram Sakararama Ackowledgmets: Fei Sha, Ameet Talwalkar Clusterig 1 / 42 Admiistratio HW 1 due o Moday. Email/post o CCLE if you have questios.
More informationHypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance
Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?
More informationLecture 3: August 31
36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,
More informationLinear Regression Demystified
Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More information62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +
62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of
More informationPattern recognition systems Laboratory 10 Linear Classifiers and the Perceptron Algorithm
Patter recogitio systems Laboratory 10 Liear Classifiers ad the Perceptro Algorithm 1. Objectives his laboratory sessio presets the perceptro learig algorithm for the liear classifier. We will apply gradiet
More informationAgnostic Learning and Concentration Inequalities
ECE901 Sprig 2004 Statistical Regularizatio ad Learig Theory Lecture: 7 Agostic Learig ad Cocetratio Iequalities Lecturer: Rob Nowak Scribe: Aravid Kailas 1 Itroductio 1.1 Motivatio I the last lecture
More informationSupport Vector Machines and Kernel Methods
Support Vector Machies ad Kerel Methods Daiel Khashabi Fall 202 Last Update: September 26, 206 Itroductio I Support Vector Machies the goal is to fid a separator betwee data which has the largest margi,
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationEcon 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara
Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio
More informationLecture 2 Clustering Part II
COMS 4995: Usupervised Learig (Summer 8) May 24, 208 Lecture 2 Clusterig Part II Istructor: Nakul Verma Scribes: Jie Li, Yadi Rozov Today, we will be talkig about the hardess results for k-meas. More specifically,
More informationCS 2750 Machine Learning. Lecture 23. Concept learning. CS 2750 Machine Learning. Concept Learning
Lecture 3 Cocept learig Milos Hauskrecht milos@cs.pitt.edu Cocept Learig Outlie: Learig boolea fuctios Most geeral ad most specific cosistet hypothesis. Mitchell s versio space algorithm Probably approximately
More informationThe Choquet Integral with Respect to Fuzzy-Valued Set Functions
The Choquet Itegral with Respect to Fuzzy-Valued Set Fuctios Weiwei Zhag Abstract The Choquet itegral with respect to real-valued oadditive set fuctios, such as siged efficiecy measures, has bee used i
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 11
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple
More informationSupport vector machine revisited
6.867 Machie learig, lecture 8 (Jaakkola) 1 Lecture topics: Support vector machie ad kerels Kerel optimizatio, selectio Support vector machie revisited Our task here is to first tur the support vector
More information6.003 Homework #3 Solutions
6.00 Homework # Solutios Problems. Complex umbers a. Evaluate the real ad imagiary parts of j j. π/ Real part = Imagiary part = 0 e Euler s formula says that j = e jπ/, so jπ/ j π/ j j = e = e. Thus the
More informationThe picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled
1 Lecture : Area Area ad distace traveled Approximatig area by rectagles Summatio The area uder a parabola 1.1 Area ad distace Suppose we have the followig iformatio about the velocity of a particle, how
More informationThis is an introductory course in Analysis of Variance and Design of Experiments.
1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class
More information6.867 Machine learning, lecture 7 (Jaakkola) 1
6.867 Machie learig, lecture 7 (Jaakkola) 1 Lecture topics: Kerel form of liear regressio Kerels, examples, costructio, properties Liear regressio ad kerels Cosider a slightly simpler model where we omit
More informationMATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4
MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.
More informationLecture 11: Channel Coding Theorem: Converse Part
EE376A/STATS376A Iformatio Theory Lecture - 02/3/208 Lecture : Chael Codig Theorem: Coverse Part Lecturer: Tsachy Weissma Scribe: Erdem Bıyık I this lecture, we will cotiue our discussio o chael codig
More informationECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization
ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where
More informationConvergence of random variables. (telegram style notes) P.J.C. Spreij
Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space
More information6.883: Online Methods in Machine Learning Alexander Rakhlin
6.883: Olie Methods i Machie Learig Alexader Rakhli LECTURES 5 AND 6. THE EXPERTS SETTING. EXPONENTIAL WEIGHTS All the algorithms preseted so far halluciate the future values as radom draws ad the perform
More informationRecursive Algorithm for Generating Partitions of an Integer. 1 Preliminary
Recursive Algorithm for Geeratig Partitios of a Iteger Sug-Hyuk Cha Computer Sciece Departmet, Pace Uiversity 1 Pace Plaza, New York, NY 10038 USA scha@pace.edu Abstract. This article first reviews the
More informationSupplemental Material: Proofs
Proof to Theorem Supplemetal Material: Proofs Proof. Let be the miimal umber of traiig items to esure a uique solutio θ. First cosider the case. It happes if ad oly if θ ad Rak(A) d, which is a special
More informationVector Quantization: a Limiting Case of EM
. Itroductio & defiitios Assume that you are give a data set X = { x j }, j { 2,,, }, of d -dimesioal vectors. The vector quatizatio (VQ) problem requires that we fid a set of prototype vectors Z = { z
More informationCS 2750 Machine Learning. Lecture 22. Concept learning. CS 2750 Machine Learning. Concept Learning
Lecture 22 Cocept learig Milos Hauskrecht milos@cs.pitt.edu 5329 Seott Square Cocept Learig Outlie: Learig boolea fuctios Most geeral ad most specific cosistet hypothesis. Mitchell s versio space algorithm
More informationECE-S352 Introduction to Digital Signal Processing Lecture 3A Direct Solution of Difference Equations
ECE-S352 Itroductio to Digital Sigal Processig Lecture 3A Direct Solutio of Differece Equatios Discrete Time Systems Described by Differece Equatios Uit impulse (sample) respose h() of a DT system allows
More informationSupplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate
Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We
More informationJacob Hays Amit Pillay James DeFelice 4.1, 4.2, 4.3
No-Parametric Techiques Jacob Hays Amit Pillay James DeFelice 4.1, 4.2, 4.3 Parametric vs. No-Parametric Parametric Based o Fuctios (e.g Normal Distributio) Uimodal Oly oe peak Ulikely real data cofies
More informationSection 14. Simple linear regression.
Sectio 14 Simple liear regressio. Let us look at the cigarette dataset from [1] (available to dowload from joural s website) ad []. The cigarette dataset cotais measuremets of tar, icotie, weight ad carbo
More informationn=1 a n is the sequence (s n ) n 1 n=1 a n converges to s. We write a n = s, n=1 n=1 a n
Series. Defiitios ad first properties A series is a ifiite sum a + a + a +..., deoted i short by a. The sequece of partial sums of the series a is the sequece s ) defied by s = a k = a +... + a,. k= Defiitio
More informationLecture 2. The Lovász Local Lemma
Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio
More informationLecture 13: Maximum Likelihood Estimation
ECE90 Sprig 007 Statistical Learig Theory Istructor: R. Nowak Lecture 3: Maximum Likelihood Estimatio Summary of Lecture I the last lecture we derived a risk (MSE) boud for regressio problems; i.e., select
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS
MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak
More informationIntroduction to Machine Learning DIS10
CS 189 Fall 017 Itroductio to Machie Learig DIS10 1 Fu with Lagrage Multipliers (a) Miimize the fuctio such that f (x,y) = x + y x + y = 3. Solutio: The Lagragia is: L(x,y,λ) = x + y + λ(x + y 3) Takig
More informationChapter 3. Strong convergence. 3.1 Definition of almost sure convergence
Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i
More informationSequences and Series of Functions
Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges
More information