Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Jonas Snow
5 years ago

Hypothesis Testing

Empirically evaluating the accuracy of hypotheses is an important activity in ML. Three questions:
Given the observed accuracy over a sample set, how well does this estimate apply to additional samples?
Given one hypothesis outperforming another, how probable is it that this hypothesis is more accurate in general?
With limited data, how can we learn a hypothesis and also estimate its accuracy?

Evaluation of Performance of Learned h

Want to decide whether to use $h$ or not.
Want to understand the accuracy of the hypothesis learned from a limited-size training set.
Evaluation may be part of the ML algorithm itself.
Use statistical methods to put a bound on the error between the estimated and the true accuracy.

Issues

Learn a hypothesis on limited data, and estimate its future accuracy:
Bias in the estimate: the training data is a subset of the instance space and may introduce bias, so the estimated error may differ from the true error.
Variance in the estimate: even though the estimate may be unbiased, there can be a large variance in the accuracy over different test sets.

Trade-off Between Bias and Variance

[Figure: fits $f(x)$ to the same DATA. Panels: "High bias, low variance" and "Low bias, high variance".]

Fewer parameters: less accurate, but the variance over different test sets is reduced.
More parameters: more accurate, but the variance over different test sets is increased.
Usually, smaller training sets lead to larger variance.
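The claim that an unbiased estimate can still vary a lot from one test set to another can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the true error rate (0.3), the two test-set sizes, the number of trials, and the helper name `sample_error_spread` are all made up for this example and do not come from the slides.

```python
import random
import statistics

def sample_error_spread(true_error, test_size, trials=2000, seed=0):
    """Mean and std. dev. of the measured error rate over simulated test sets.

    Assumption: each test example is misclassified independently with
    probability `true_error` (a hypothetical fixed hypothesis).
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(trials):
        mistakes = sum(rng.random() < true_error for _ in range(test_size))
        rates.append(mistakes / test_size)
    return statistics.mean(rates), statistics.stdev(rates)

# Hypothetical settings: same true error, small vs. large test sets.
mean_small, spread_small = sample_error_spread(0.3, test_size=20)
mean_large, spread_large = sample_error_spread(0.3, test_size=500)
print(f"n=20:  mean={mean_small:.3f}  std={spread_small:.3f}")
print(f"n=500: mean={mean_large:.3f}  std={spread_large:.3f}")
```

Both averages stay close to the true error (the estimate is unbiased), while the spread of the individual estimates is much larger for the small test sets.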

Topics

Evaluating hypotheses (estimating the accuracy of a hypothesis).
Comparing the accuracy of two hypotheses.
Comparing the accuracy of two algorithms when the data set is limited.

Estimating Hypothesis Accuracy

General setup:
$X$: instance space.
$D$: probability distribution of encountering instances $x \in X$.
Task: given a hypothesis $h$ and a data set of size $n$ drawn from distribution $D$, what is the best estimate of the accuracy of $h$ on future instances from the same distribution? What is the probable error in this accuracy estimate?

Probability Distribution of Sample Mean

[Figure: instance space $X$ with sample sets $S_1$, $S_2$, and the distribution $P(\bar{x})$ of their means $\bar{x}_1$, $\bar{x}_2$.]

From the instance space $X$, draw a small sample set $S_i$ of size $n$. For different sample sets $S_i$, the mean will differ:
$$\bar{x}_i \equiv \frac{1}{n} \sum_{x \in S_i} x$$
The questions are:
Is $\bar{x}_i = \bar{X}$ (where $\bar{X}$ is the true mean over $X$)?
How is $\bar{x}_i$ distributed ($P(\bar{x})$, for $\bar{x} \in \{\bar{x}_1, \bar{x}_2, \ldots\}$)?

Example of Sampling Distribution of the Mean

$X = \{1, 2, 3, 4\}$, and each number is equally likely to occur (i.e., $D$ is a uniform distribution). Let's sample with $n = 2$. (Example from Kachigan, 1991.)

Samples of size 2 (rows: 1st observation, columns: 2nd observation)

1st\2nd   1     2     3     4
1         1,1   1,2   1,3   1,4
2         2,1   2,2   2,3   2,4
3         3,1   3,2   3,3   3,4
4         4,1   4,2   4,3   4,4

Sample means (rows: 1st observation, columns: 2nd observation)

1st\2nd   1     2     3     4
1         1.0   1.5   2.0   2.5
2         1.5   2.0   2.5   3.0
3         2.0   2.5   3.0   3.5
4         2.5   3.0   3.5   4.0
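The sampling-distribution example ($X = \{1, 2, 3, 4\}$, samples of size 2, drawn with replacement) is small enough to enumerate exactly. The following sketch does that; the variable names are chosen for this illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

X = [1, 2, 3, 4]   # instance space, uniform distribution
n = 2              # sample size

# All 16 ordered samples (1st observation, 2nd observation).
samples = list(product(X, repeat=n))

# Mean of each sample, kept as exact fractions.
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution of the mean: P(mean = m).
sampling_dist = {
    m: Fraction(count, len(samples))
    for m, count in sorted(Counter(means).items())
}

for m, prob in sampling_dist.items():
    print(f"sample mean {float(m):.1f}: probability {prob}")

# The distribution of sample means is centered at the population mean.
mean_of_means = sum(m * prob for m, prob in sampling_dist.items())
print("mean of sample means:", float(mean_of_means))
```

The most likely sample mean is 2.5, which is also the true mean of $X$; individual samples can still give any value from 1.0 to 4.0.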

Sample Distribution vs. Sampling Distribution of the Mean

[Figure: the sample distribution $P(x)$ next to the sampling distribution of the mean, $\mathrm{Freq}(\bar{x})$.]

Depending on how you sample your data, your sample mean can end up taking different values.
The sample mean has a distribution of its own, centered at the actual population mean ($\sum_{x \in \{1,2,3,4\}} x \cdot \frac{1}{4} = 2.5$).

Sampling Distribution of the Mean

Underlying distribution with mean $\mu$ and standard deviation $\sigma$.
The distribution of sample means has mean $\mu_s = \mu$ and standard deviation
$$\sigma_s = \frac{\sigma}{\sqrt{n}},$$
and tends to the normal distribution as $n$ grows.
Interpretation: when you get a particular sample mean $\bar{x}_s$, you know it is distributed like $N(\mu, \sigma_s)$. With more samples, $\sigma_s$ shrinks, so you are more confident that your particular $\bar{x}_s$ is close to the true mean.

True Mean and Sample Mean

[Figure: distribution $P(\bar{x})$ around the true mean $\mu$, showing the probability $p$ that a sample mean $\bar{x}_s$ lies within range $r$.]

With a particular probability $p$, $\bar{x}_s$ is within a particular range $r$ of the true mean $\mu$.
In other words, if you pick any sample mean $\bar{x}_s$, then with probability $p$ the true mean is within the range $r$ of it.
Given a fixed probability $p = 0.95$, the range $r$ is determined by the standard deviation $\sigma_s$.

Sample Error and True Error

Sample error: the sample error of hypothesis $h$ based on a sample set $S$ of size $n$ is
$$\mathrm{error}_S(h) \equiv \frac{1}{n} \sum_{x \in S} \delta(f(x), h(x)),$$
where $f(\cdot)$ is the target function, and $\delta(a, b) = 1$ if $a \neq b$ and $0$ if $a = b$. In other words, $\mathrm{error}_S(h)$ is the mean error of hypothesis $h$ on $S$.

True error: the true error of hypothesis $h$ is the probability that $h$ will misclassify a single example drawn from the distribution $D$:
$$\mathrm{error}_D(h) \equiv \Pr_{x \in D}[f(x) \neq h(x)]$$
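The relation $\sigma_s = \sigma / \sqrt{n}$ can be verified exactly on the four-value population $\{1, 2, 3, 4\}$ by enumerating every sample of size $n$ (drawn with replacement). This is a minimal sketch; the function name `std_of_sample_means` and the choice of sample sizes are assumptions made for the illustration.

```python
from itertools import product
from math import isclose, sqrt
from statistics import pstdev

X = [1, 2, 3, 4]
mu = sum(X) / len(X)   # population mean
sigma = pstdev(X)      # population standard deviation

def std_of_sample_means(population, n):
    """Std. dev. of the mean over all equally likely samples of size n."""
    means = [sum(s) / n for s in product(population, repeat=n)]
    return pstdev(means)

for n in (1, 2, 4):
    enumerated = std_of_sample_means(X, n)
    predicted = sigma / sqrt(n)
    print(f"n={n}: enumerated={enumerated:.4f}  sigma/sqrt(n)={predicted:.4f}")
```

The enumerated and predicted values agree for every $n$, and both shrink as the sample size grows.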

Confidence Interval

How good an estimator of $\mathrm{error}_D(h)$ is provided by $\mathrm{error}_S(h)$?
Want to estimate the true error based on a sample $S$ of $n$ examples drawn according to distribution $D$.
$h$ commits $r$ errors: $\mathrm{error}_S(h) = r/n$.
With approximately 95% probability, the true error is within the interval
$$\mathrm{error}_S(h) \pm 1.96 \sqrt{\frac{\mathrm{error}_S(h)\,(1 - \mathrm{error}_S(h))}{n}}$$

Confidence Interval (95%)

[Figure: Normal density $P(X)$ with the central 95% of the area shaded.]

Normal distribution with mean $\mu$ and standard deviation $\sigma$.
95% of the area lies within $\mu \pm 1.96\sigma$.
Different constant factors apply for 99%, etc.

Confidence Interval Example

$S$ of size $n = 40$; $h$ commits $r = 12$ errors.
$\mathrm{error}_S(h) = 12/40 = 0.30$ (mean error, or error rate).
95% confidence interval:
$$0.30 \pm 1.96 \sqrt{\frac{0.30\,(1 - 0.30)}{40}} = 0.30 \pm 0.14$$
Note: if $n$ is larger, the interval shrinks even when $r/n$ stays the same.

Sampling Theory Basics: Summary

Random variable: a variable that can take on values with certain probabilities.
Probability distribution: $\Pr(Y = y_i)$.
Expected value: $E[Y] = \sum_i y_i \Pr(Y = y_i)$.
Variance: $Var(Y) = E[(Y - E[Y])^2] = E[Y^2] - E[Y]^2$.
Standard deviation: $\sqrt{Var(Y)}$.
Binomial distribution: binary outcome, with probability $p$ of 1 and $(1 - p)$ of 0; gives the probability of $r$ 1's in $n$ samples.
Normal distribution.
Central limit theorem: the sum of iid random variables tends to the normal distribution.
Estimator: a random variable $Y$ that estimates a parameter $p$.
Estimation bias: $E[Y] - p$.
$N\%$ confidence interval estimate of $p$: an interval that includes the true $p$ with $N\%$ probability.
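The confidence interval example (12 errors on 40 examples) can be reproduced in a few lines. This is a sketch of the normal-approximation interval stated in the text; the function name `error_confidence_interval` is made up for this illustration, and the second call (120 errors on 400 examples) is an added hypothetical case with the same error rate.

```python
from math import sqrt

def error_confidence_interval(r, n, z=1.96):
    """Return (sample error, half-width) of the approximate N% interval.

    r: number of errors, n: sample size, z: constant z_N (1.96 for 95%).
    Uses the normal approximation, so n should be reasonably large.
    """
    sample_error = r / n
    half_width = z * sqrt(sample_error * (1.0 - sample_error) / n)
    return sample_error, half_width

err_40, half_40 = error_confidence_interval(r=12, n=40)
print(f"n=40:  {err_40:.2f} ± {half_40:.2f}")

# Same error rate, ten times as many examples (hypothetical case).
err_400, half_400 = error_confidence_interval(r=120, n=400)
print(f"n=400: {err_400:.2f} ± {half_400:.3f}")
```

The first call gives the interval from the example, 0.30 ± 0.14; the second shows the interval narrowing when $n$ grows while $r/n$ stays fixed.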

Binomial Distribution: e.g., Coin Toss

[Figure: binomial distribution $P(r)$ for $n = 40$.]

The outcome itself is described by a random variable $Y \in \{Head, Tail\}$.
$P(Y = Head) = p$ and $P(Y = Tail) = 1 - p$.
Probability of observing $r$ heads out of $n$ coin tosses (this value corresponds to a random variable $R$):
$$\Pr(R = r) = \frac{n!}{r!\,(n - r)!}\, p^r (1 - p)^{n - r}$$
$\Pr(R = r)$ can be seen as the probability of observing $r$ errors in a sample of size $n$ (for binary target categories).

Mean and Variance in Binomial Distributions

$$E[Y] \equiv \sum_{i=1}^{n} y_i \Pr(Y = y_i) = np$$
$$Var[Y] \equiv E[(Y - E[Y])^2] = np(1 - p)$$

Errors, in Terms of the Binomial Distribution

$$\mathrm{error}_S(h) = \frac{r}{n}, \qquad \mathrm{error}_D(h) = p$$

Estimation Bias

The estimation bias of an estimator $Y$ for a parameter $p$ is $E[Y] - p$.

Variance in Estimation

$$\mathrm{error}_S(h) = \frac{r}{n}, \qquad Std[r] = \sqrt{np(1 - p)}$$
$$Std[\mathrm{error}_S(h)] = Std\!\left[\frac{r}{n}\right] = \frac{Std[r]}{n} = \frac{\sqrt{np(1 - p)}}{n} = \sqrt{\frac{p(1 - p)}{n}} \approx \sqrt{\frac{\mathrm{error}_S(h)\,(1 - \mathrm{error}_S(h))}{n}}$$

Normal Distribution

Normal distribution with mean $\mu$ and standard deviation $\sigma$.
Mean $E[X] = \mu$, and variance $Var[X] = \sigma^2$.
Probability density:
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$
Probability of falling in the interval $[a, b]$:
$$\int_a^b p(x)\,dx$$
Central limit theorem: the sum of a large number of iid random variables (the sum itself is a random variable) tends to the Normal distribution.
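The binomial formulas above (probability of $r$ errors, mean $np$, variance $np(1-p)$, and the standard deviation of the error rate) can be checked numerically. The sketch below assumes $n = 40$ and $p = 0.3$ purely as illustrative values; `binomial_pmf` is a helper written for this example.

```python
from math import comb, isclose, sqrt

def binomial_pmf(r, n, p):
    """Pr(R = r): probability of exactly r errors in n independent trials."""
    return comb(n, r) * p**r * (1.0 - p) ** (n - r)

n, p = 40, 0.3   # assumed illustrative values

pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

# Mean and variance computed directly from the distribution.
mean_r = sum(r * prob for r, prob in enumerate(pmf))
var_r = sum((r - mean_r) ** 2 * prob for r, prob in enumerate(pmf))

# Standard deviation of the error rate r/n.
std_error_rate = sqrt(var_r) / n

print(f"total probability  = {sum(pmf):.6f}")
print(f"E[r]   = {mean_r:.4f}   (n*p       = {n * p:.4f})")
print(f"Var[r] = {var_r:.4f}   (n*p*(1-p) = {n * p * (1 - p):.4f})")
print(f"Std[r/n] = {std_error_rate:.4f}   (sqrt(p(1-p)/n) = {sqrt(p * (1 - p) / n):.4f})")
```

The directly computed moments match the closed forms, and the last line is the standard deviation that appears inside the confidence interval for $\mathrm{error}_S(h)$.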

Confidence Interval in Normal Distributions

$N\%$ of the probability mass of a Normal distribution lies within $\mu \pm z_N \sigma$.
That means a randomly drawn value $y$ will be within the above interval with an $N\%$ chance.
In other words, if you pick any value $y$, then with an $N\%$ chance the mean $\mu$ will be within the interval $y \pm z_N \sigma$.

Confidence Intervals for Different %

80% of the area (probability) lies in $\mu \pm 1.28\sigma$.
$N\%$ of the area (probability) lies in $\mu \pm z_N \sigma$.

N%:    50%   68%   80%   90%   95%   98%   99%
z_N:   0.67  1.00  1.28  1.64  1.96  2.33  2.58

Calculating Confidence Intervals

1. Pick the parameter $p$ to estimate: $\mathrm{error}_D(h)$.
2. Choose an estimator: $\mathrm{error}_S(h)$.
3. Determine the probability distribution that governs the estimator: the distribution of $\mathrm{error}_S(h)$ can be approximated by a Normal distribution when $n$ is large.
4. Find the interval $(L, U)$ such that $N\%$ of the probability mass falls in the interval. Use the table of $z_N$ values.

Two-Sided vs. One-Sided Bounds

Two-sided: lower and upper bound with $100(1 - \alpha)\%$ confidence.
One-sided: lower bound only (or upper bound only) with $100(1 - \alpha/2)\%$ confidence, because only one tail of mass $\alpha/2$ is excluded.
Example question for a one-sided bound: what is the probability that $\mathrm{error}_D(h)$ is at most $U$?
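The $z_N$ constants (for example 1.28 for 80% and 1.96 for 95%) are quantiles of the standard normal distribution, so they can be computed instead of looked up. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names `z_two_sided` and `z_one_sided` are assumptions for this illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def z_two_sided(confidence):
    """z such that `confidence` of the mass lies within mean ± z * sigma."""
    return STANDARD_NORMAL.inv_cdf(0.5 + confidence / 2.0)

def z_one_sided(confidence):
    """z such that `confidence` of the mass lies below mean + z * sigma."""
    return STANDARD_NORMAL.inv_cdf(confidence)

for level in (0.50, 0.68, 0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z_N = {z_two_sided(level):.2f}")

# A two-sided 90% interval and a one-sided 95% bound share the same constant.
print(f"two-sided 90%: {z_two_sided(0.90):.3f}")
print(f"one-sided 95%: {z_one_sided(0.95):.3f}")
```

Printed tables commonly round the 68% entry to 1.00; the computed value is slightly below that. The last two lines illustrate the two-sided versus one-sided relationship: excluding one tail of a 90% two-sided interval leaves a 95% one-sided bound.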

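Difference in Error of Two Hypotheses

Test $h_1$ on sample $S_1$ (of size $n_1$), and test $h_2$ on sample $S_2$ (of size $n_2$).
1. Pick the parameter to estimate:
$$d \equiv \mathrm{error}_D(h_1) - \mathrm{error}_D(h_2)$$
2. Choose an estimator:
$$\hat{d} \equiv \mathrm{error}_{S_1}(h_1) - \mathrm{error}_{S_2}(h_2)$$
3. Determine the probability distribution that governs the estimator:
$$\sigma_{\hat{d}} \approx \sqrt{\frac{\mathrm{error}_{S_1}(h_1)\,(1 - \mathrm{error}_{S_1}(h_1))}{n_1} + \frac{\mathrm{error}_{S_2}(h_2)\,(1 - \mathrm{error}_{S_2}(h_2))}{n_2}}$$
4. Find the interval $(L, U)$ such that $N\%$ of the probability mass falls in the interval:
$$\hat{d} \pm z_N \sqrt{\frac{\mathrm{error}_{S_1}(h_1)\,(1 - \mathrm{error}_{S_1}(h_1))}{n_1} + \frac{\mathrm{error}_{S_2}(h_2)\,(1 - \mathrm{error}_{S_2}(h_2))}{n_2}}$$

Hypothesis Testing

What is the probability that $\mathrm{error}_D(h_1) > \mathrm{error}_D(h_2)$?
Even if $\mathrm{error}_{S_1}(h_1) > \mathrm{error}_{S_2}(h_2)$, there is a chance that $\mathrm{error}_D(h_1) < \mathrm{error}_D(h_2)$.
E.g., what is the chance of $d > 0$ when $\hat{d} = 0.1$ ($\mathrm{error}_{S_1}(h_1) = 0.3$ and $\mathrm{error}_{S_2}(h_2) = 0.2$)?
Since $\hat{d} = 0.1$, $d > 0$ holds exactly when
$$\hat{d} < d + 0.1 = E[\hat{d}] + 0.1 = \mu_{\hat{d}} + 0.1$$
In units of the standard deviation this reads $\hat{d} < \mu_{\hat{d}} + z\,\sigma_{\hat{d}}$ with $z = 0.1 / \sigma_{\hat{d}}$. In this example $0.1 \approx 1.64\,\sigma_{\hat{d}}$, and $z_{90\%} = 1.64$ for the two-sided interval, so the (one-sided) chance is 95%.
Better to think about how to reject the null hypothesis:
Null hypothesis $H_0$: $d = 0$.
Alternative hypothesis $H_1$: $d > 0$ (must ensure $P(d < 0) = 0$).

Paired t-test for Comparing $h_A$ and $h_B$

1. Partition the data into $k$ disjoint test sets $T_1, T_2, \ldots, T_k$ of equal size, where this size is at least 30.
2. For $i$ from 1 to $k$, do
$$\delta_i \leftarrow \mathrm{error}_{T_i}(h_A) - \mathrm{error}_{T_i}(h_B)$$
3. Return the value $\bar{\delta}$, where
$$\bar{\delta} \equiv \frac{1}{k} \sum_{i=1}^{k} \delta_i$$
$N\%$ confidence interval estimate for $d$:
$$\bar{\delta} \pm t_{N,k-1}\, s_{\bar{\delta}}, \qquad s_{\bar{\delta}} \equiv \sqrt{\frac{1}{k(k-1)} \sum_{i=1}^{k} (\delta_i - \bar{\delta})^2}$$
Note: the $\delta_i$ are approximately Normally distributed, and the $t$ value differs for different numbers of test sets $k$ as well as for different $N\%$.

Comparing Learning Algorithms $L_A$ and $L_B$

What we would like to estimate:
$$E_{S \subset D}[\mathrm{error}_D(L_A(S)) - \mathrm{error}_D(L_B(S))]$$
where $L(S)$ is the hypothesis output by learner $L$ using training set $S$, i.e., the expected difference in true error between the hypotheses output by learners $L_A$ and $L_B$ when trained using randomly selected training sets $S$ drawn according to distribution $D$.
But, given limited data $D_0$, what is a good estimator?
We could partition $D_0$ into a training set $S_0$ and a test set $T_0$, and measure
$$\mathrm{error}_{T_0}(L_A(S_0)) - \mathrm{error}_{T_0}(L_B(S_0))$$
Even better, repeat this many times and average the results (next slide).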
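The hypothesis-testing example ($\hat{d} = 0.1$ from sample errors 0.3 and 0.2) can be worked through numerically. The text does not give the sample sizes, so the sketch below assumes $n_1 = n_2 = 100$; with that assumption the numbers line up with the 1.64 and 95% quoted above. The function name `difference_interval` is also made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def difference_interval(e1, n1, e2, n2, z=1.96):
    """Estimate d = error_D(h1) - error_D(h2) from two independent samples.

    Returns (d_hat, sigma_d_hat, half_width of the z-based interval).
    """
    d_hat = e1 - e2
    sigma = sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return d_hat, sigma, z * sigma

# Assumed sample sizes (not stated in the text): n1 = n2 = 100.
d_hat, sigma, half_width = difference_interval(0.30, 100, 0.20, 100)
print(f"d_hat = {d_hat:.2f}, sigma = {sigma:.4f}")
print(f"95% two-sided interval: {d_hat:.2f} ± {half_width:.2f}")

# How many standard deviations is d_hat away from zero?
z_observed = d_hat / sigma

# One-sided probability that d > 0 under the normal approximation.
prob_d_positive = NormalDist().cdf(z_observed)
print(f"z = {z_observed:.2f}, P(d > 0) ≈ {prob_d_positive:.3f}")
```

Under these assumed sample sizes, the two-sided 95% interval still includes zero, while the one-sided statement "d > 0" holds with roughly 95% confidence, which is the distinction the slide draws.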

Comparing Learning Algorithms $L_A$ and $L_B$

1. Partition the data $D_0$ into $k$ disjoint test sets $T_1, T_2, \ldots, T_k$ of equal size, where this size is at least 30.
2. For $i$ from 1 to $k$, do: use $T_i$ for the test set, and the remaining data for the training set $S_i$.
   $S_i \leftarrow \{D_0 - T_i\}$
   $h_A \leftarrow L_A(S_i)$
   $h_B \leftarrow L_B(S_i)$
   $\delta_i \leftarrow \mathrm{error}_{T_i}(h_A) - \mathrm{error}_{T_i}(h_B)$
3. Return the value $\bar{\delta}$, where
$$\bar{\delta} \equiv \frac{1}{k} \sum_{i=1}^{k} \delta_i$$

Comparing Learning Algorithms $L_A$ and $L_B$ (continued)

Notice that we would like to use the paired t-test on $\bar{\delta}$ to obtain a confidence interval, but this is not really correct, because the training sets in this algorithm are not independent (they overlap!).
It is more correct to view the algorithm as producing an estimate of
$$E_{S \subset D_0}[\mathrm{error}_D(L_A(S)) - \mathrm{error}_D(L_B(S))]$$
instead of
$$E_{S \subset D}[\mathrm{error}_D(L_A(S)) - \mathrm{error}_D(L_B(S))]$$
but even this approximation is better than no comparison.
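The k-fold comparison procedure can be sketched as follows. Everything concrete here is an assumption for illustration: the toy data set (300 integers labeled by whether they are at least 150), the two toy learners, the random seed, and the function names. The value 2.262 is the standard two-sided 95% Student t value for 9 degrees of freedom, taken from a t table. As the text notes, the resulting interval is only approximate because the training sets overlap.

```python
import random
from collections import Counter
from math import sqrt

def k_fold_differences(data, learner_a, learner_b, k):
    """delta_i = error_Ti(h_A) - error_Ti(h_B) for k disjoint test sets."""
    fold = len(data) // k   # leftover examples, if any, stay in training
    deltas = []
    for i in range(k):
        test = data[i * fold:(i + 1) * fold]                # T_i
        train = data[:i * fold] + data[(i + 1) * fold:]     # S_i = D_0 - T_i
        h_a, h_b = learner_a(train), learner_b(train)
        err_a = sum(h_a(x) != y for x, y in test) / len(test)
        err_b = sum(h_b(x) != y for x, y in test) / len(test)
        deltas.append(err_a - err_b)
    return deltas

def paired_interval(deltas, t_value):
    """Return (mean difference, half-width t * s) for the paired test."""
    k = len(deltas)
    d_bar = sum(deltas) / k
    s = sqrt(sum((d - d_bar) ** 2 for d in deltas) / (k * (k - 1)))
    return d_bar, t_value * s

# Hypothetical learner A: always predict the majority training label.
def majority_learner(train):
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

# Hypothetical learner B: threshold at the smallest positive training example.
def threshold_learner(train):
    threshold = min(x for x, y in train if y)
    return lambda x: x >= threshold

# Toy data set D_0 (assumed): 300 examples, label is True when x >= 150.
data = [(x, x >= 150) for x in range(300)]
random.Random(0).shuffle(data)

deltas = k_fold_differences(data, majority_learner, threshold_learner, k=10)
d_bar, half_width = paired_interval(deltas, t_value=2.262)
print(f"mean difference = {d_bar:.3f} ± {half_width:.3f}")
```

With 300 examples and $k = 10$, each test set has 30 examples, matching the minimum size stated in step 1. A positive mean difference indicates that learner A has the higher error on this toy problem.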
