Solution for Assignment 1: Intro to Probability and Statistics, PAC Learning
10-701/15-781: Machine Learning (Fall 2004)
Due: Sept. 30th 2004, Thursday, start of class

Question 1. Basic Probability (18 pts)

1.1 (2 pts) Suppose that A is an event such that $\Pr(A) = 0$ and that B is any other event. Prove that A and B are independent events.

Since the event $A \cap B$ is a subset of the event A, and $\Pr(A) = 0$, we have $\Pr(A \cap B) = 0$. Hence $\Pr(A \cap B) = 0 = \Pr(A)\Pr(B)$.

1.2 (3 pts) Prove: Let $a_1, a_2, \ldots, a_n$ be the possible values of A. Then for any event B,
$P(B = b) = \sum_{i=1}^{n} P(B = b \mid A = a_i) P(A = a_i)$

$P(B = b) = P(B = b \wedge \text{TRUE})$
$= P(B = b \wedge (A = a_1 \vee A = a_2 \vee \ldots \vee A = a_n))$
$= P((B = b \wedge A = a_1) \vee (B = b \wedge A = a_2) \vee \ldots \vee (B = b \wedge A = a_n))$
$= P(B = b \wedge A = a_1) + P((B = b \wedge A = a_2) \vee \ldots \vee (B = b \wedge A = a_n)) - P((B = b \wedge A = a_1) \wedge ((B = b \wedge A = a_2) \vee \ldots \vee (B = b \wedge A = a_n)))$
$= P(B = b \wedge A = a_1) + P((B = b \wedge A = a_2) \vee \ldots \vee (B = b \wedge A = a_n))$
$= \ldots$
$= \sum_{i=1}^{n} P(B = b \wedge A = a_i)$
$= \sum_{i=1}^{n} P(B = b \mid A = a_i) P(A = a_i)$

The subtracted term vanishes because A cannot take the value $a_1$ and a different value at the same time.

1.3 (5 pts) Soldier A and Soldier B are practicing shooting. The probability that A would miss the target is 0.2 and the probability that B would miss the target is 0.5. The probability that both A and B would miss their targets is 0.1. Below, A and B also denote the events that the respective soldier misses.

- What is the probability that at least one of the two will miss the target?
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.2 + 0.5 - 0.1 = 0.6$

- What is the probability that exactly one of the two soldiers will miss the target?
$P(A \cap \bar{B}) + P(B \cap \bar{A}) = 0.1 + 0.4 = 0.5$

1.4 (4 pts) A box contains three cards. One card is red on both sides, one card is green on both sides, and one card is red on one side and green on the other. We randomly select one card from this box and see the color of its upper side. If this side is green, what is the probability that the other side of the card is also green?

$P(\text{other side green} \mid \text{this side green}) = \frac{P(\text{both sides green})}{P(\text{this side green})} = \frac{1/3}{1/2} = \frac{2}{3}$
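As an optional sanity check, the answers to 1.3 and 1.4 can be reproduced with exact arithmetic. This is only a sketch and not part of the original solution: the variable names are illustrative, and it assumes a miss probability of 0.2 for soldier A, the value consistent with the stated answer of 0.6.

```python
from fractions import Fraction

# 1.3: p_a and p_b are the miss probabilities (p_a = 0.2 is an assumption).
p_a, p_b, p_both = Fraction(2, 10), Fraction(5, 10), Fraction(1, 10)
p_at_least_one = p_a + p_b - p_both              # inclusion-exclusion
p_exactly_one = (p_a - p_both) + (p_b - p_both)  # only A misses, or only B misses

# 1.4: each card is a pair of side colors. Every (card, upper side) choice
# is equally likely, which gives six outcomes of the form (top, bottom).
cards = [("R", "R"), ("G", "G"), ("R", "G")]
outcomes = [(card[up], card[1 - up]) for card in cards for up in (0, 1)]
green_up = [(top, bottom) for top, bottom in outcomes if top == "G"]
p_other_green = Fraction(
    sum(bottom == "G" for _, bottom in green_up), len(green_up)
)

print(p_at_least_one, p_exactly_one, p_other_green)
```

Enumerating (card, side) pairs rather than cards is the design choice that matters here: three of the six equally likely outcomes show green on top, and two of those three belong to the green-green card.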
1.5 (4 pts) Suppose that the p.d.f. of a random variable X is:
$f(x) = \begin{cases} c x^2, & \text{for } 1 \le x \le 2 \\ 0, & \text{otherwise} \end{cases}$   (1)

- What is the value of the constant c?
$\int_1^2 c x^2 \, dx = \frac{7}{3} c = 1 \Rightarrow c = \frac{3}{7}$

- Sketch the p.d.f.

- $\Pr(X > 3/2) = ?$
$\int_{3/2}^{2} c x^2 \, dx = 37/56$

Question 2. Expectation (18 pts)

2.1 (4 pts) If an integer between 100 and 200 is to be chosen at random, what is the expected value?

$E(X) = \frac{1}{101}(100 + 101 + \ldots + 200) = 150$

2.2 (5 pts) A rabbit is playing a jumping game with friends. She starts from the origin of a real line and moves along the line in jumps of one step. For each jump, she flips a coin. If heads, she jumps one step to the left (i.e., in the negative direction). Otherwise, she jumps one step to the right. The chance of heads is p ($0 \le p \le 1$). What is the expected value of her position after n jumps? (Assume each step is of equal length and that one step is one unit on the real line.)

For the i-th jump, $E(X_i) = (-1)p + (1)(1 - p) = 1 - 2p$. So the expected position after n jumps is:
$E(X_1 + X_2 + \ldots + X_n) = E(X_1) + E(X_2) + \ldots + E(X_n) = n(1 - 2p)$

2.3 (4 pts) Suppose that the random variable X has a uniform distribution on the interval [0, 1]. The random variable Y has a uniform distribution on the interval [4, 10]. X and Y are independent. Suppose a rectangle is to be constructed for which the lengths of two adjacent sides are X and Y. What is the expected value of the area of this rectangle?

Since X and Y are independent, $E(XY) = E(X)E(Y) = 0.5 \times 7 = 3.5$

2.4 (5 pts) Suppose that X is a random variable with $E(X) = \mu$ and $Var(X) = \sigma^2$. What is the value of $E[X(X - 1)]$?

$E[X(X - 1)] = E[X^2] - E[X] = Var(X) + (E[X])^2 - \mu = \sigma^2 + \mu^2 - \mu$

Question 3. Normal Distribution (6 pts)

Suppose X has a normal distribution with mean 1 and variance 4. Find the value of the following:

a. $\Pr(X \le 3)$
$\Pr(X \le 3) = \Pr\left(Z \le \frac{3 - 1}{\sqrt{4}}\right) = \Phi(1) = 0.8413$

b. $\Pr(|X| \le 2)$
$\Pr(|X| \le 2) = \Phi\left(\frac{2 - 1}{2}\right) - \Phi\left(\frac{-2 - 1}{2}\right) = \Phi(0.5) - (1 - \Phi(1.5)) = 0.6247$
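The two normal probabilities of Question 3 can be checked numerically instead of with a $\Phi$ table. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; it assumes that part b asks for the probability of the interval [-2, 2], which is the interval implied by the terms $\Phi(0.5)$ and $\Phi(1.5)$ in the solution.

```python
from statistics import NormalDist

# X is normal with mean 1 and variance 4, so the standard deviation is 2.
x = NormalDist(mu=1, sigma=2)

# a. Pr(X <= 3)
p_a = x.cdf(3)

# b. Pr(-2 <= X <= 2), the interval assumed for part b
p_b = x.cdf(2) - x.cdf(-2)

print(round(p_a, 4), round(p_b, 4))
```

Working with the distribution of X directly avoids the manual standardization step, so it serves as an independent check on the $Z = (X - 1)/2$ substitution used in the written answer.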
Question 4. Bayes Theorem (8 pts)

In a certain day care class, 30 percent of the children have grey eyes, 50 percent have blue eyes, and the other 20 percent have eyes of other colors. One day they play a game together. In the first run, 65 percent of the grey-eyed kids were selected into the game, 82 percent of the blue-eyed kids were selected, and 50 percent of the kids with other eye colors were chosen. If a child is selected at random from the class, and we know that he was not in the first-run game, what is the probability that he has blue eyes?

Assume
B: blue eyes
O: other color eyes
G: grey eyes
NF: not in the first-run game

$P(B \mid NF) = \frac{P(B) P(NF \mid B)}{P(B) P(NF \mid B) + P(O) P(NF \mid O) + P(G) P(NF \mid G)} = \frac{0.5 \times 0.18}{0.5 \times 0.18 + 0.2 \times 0.5 + 0.3 \times 0.35} = 0.3051$

Question 5. Probabilistic Inference (15 pts)

Imagine there are three boxes labelled A, B and C. Two of them are empty, and one contains a prize. Unfortunately, they are all closed and you don't know where the prize is. You first pick a box at random, say box A. However, before you open it, box B is opened by someone, and you see that it is empty. You now have to make your final choice as to which box to open: A or C.

Question: For each of the cases below, answer which box you would open so as to maximize the chance that the box you open contains the prize. Support your arguments by computing the probability of the prize being in box A and in box C. Here are the three strategies according to which box B was chosen to be opened:

1. (5 pts) In this strategy, if you first pick a box (in this case A) with the prize, then one of the other two boxes is opened at random. On the other hand, if you first choose a box that has no prize, then the empty box that you did not pick is chosen.

2. (5 pts) In this strategy, one of the two boxes that you did not pick is chosen at random (in this case it is a random choice between B and C).

3. (5 pts) In this strategy, one of the empty boxes is chosen at random (independently of whether you initially picked the box with the prize or not).

Let SpB stand for the random event that someone picks box B.
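In all the cases the prior (before box B was opened) probabilities that the prize is in box A, B, or C are $P(A) = P(B) = P(C) = 1/3$. The differences are in the conditional probabilities $P(SpB \mid A)$, $P(SpB \mid B)$, $P(SpB \mid C)$. In all three cases we compute the posterior (after box B was opened) probabilities. We then pick the box with the highest probability of containing the prize.

1. $P(SpB \mid A) = 1/2$, $P(SpB \mid B) = 0$, $P(SpB \mid C) = 1$

$P(A \mid SpB) = \frac{P(SpB \mid A) P(A)}{P(SpB)}$ and $P(C \mid SpB) = \frac{P(SpB \mid C) P(C)}{P(SpB)}$

$P(SpB) = P(SpB \mid A) P(A) + P(SpB \mid B) P(B) + P(SpB \mid C) P(C) = \frac{1}{2} \cdot \frac{1}{3} + 0 \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} = \frac{1}{2}$

$P(A \mid SpB) = \frac{1/2 \cdot 1/3}{1/2} = \frac{1}{3}$ and $P(C \mid SpB) = \frac{1 \cdot 1/3}{1/2} = \frac{2}{3}$

2. $P(SpB \mid A) = 1/2$, $P(SpB \mid B) = 1/2$, $P(SpB \mid C) = 1/2$

Unlike in the previous sub-question, here the box that was opened by someone (namely, box B) could have contained the prize. Therefore, the posterior probabilities we are interested in are $P(A \mid SpB \wedge \neg B)$ and $P(C \mid SpB \wedge \neg B)$.

$P(A \mid SpB \wedge \neg B) = \frac{P(SpB \wedge \neg B \mid A) P(A)}{P(SpB \wedge \neg B)}$ and $P(C \mid SpB \wedge \neg B) = \frac{P(SpB \wedge \neg B \mid C) P(C)}{P(SpB \wedge \neg B)}$

$P(SpB \wedge \neg B \mid A) = P(SpB \mid A) P(\neg B \mid A) = \frac{1}{2} \cdot 1 = \frac{1}{2}$
$P(SpB \wedge \neg B \mid B) = 0$
$P(SpB \wedge \neg B \mid C) = P(SpB \mid C) P(\neg B \mid C) = \frac{1}{2} \cdot 1 = \frac{1}{2}$

$P(SpB \wedge \neg B) = P(SpB \wedge \neg B \mid A) P(A) + P(SpB \wedge \neg B \mid B) P(B) + P(SpB \wedge \neg B \mid C) P(C) = \frac{1}{2} \cdot \frac{1}{3} + 0 \cdot \frac{1}{3} + \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{3}$

$P(A \mid SpB \wedge \neg B) = \frac{1/2 \cdot 1/3}{1/3} = \frac{1}{2}$ and $P(C \mid SpB \wedge \neg B) = \frac{1/2 \cdot 1/3}{1/3} = \frac{1}{2}$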
3. $P(SpB \mid A) = 1/2$, $P(SpB \mid B) = 0$, $P(SpB \mid C) = 1/2$

$P(A \mid SpB) = \frac{P(SpB \mid A) P(A)}{P(SpB)}$ and $P(C \mid SpB) = \frac{P(SpB \mid C) P(C)}{P(SpB)}$

$P(SpB) = P(SpB \mid A) P(A) + P(SpB \mid B) P(B) + P(SpB \mid C) P(C) = \frac{1}{2} \cdot \frac{1}{3} + 0 \cdot \frac{1}{3} + \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{3}$

$P(A \mid SpB) = \frac{1/2 \cdot 1/3}{1/3} = \frac{1}{2}$ and $P(C \mid SpB) = \frac{1/2 \cdot 1/3}{1/3} = \frac{1}{2}$

Question 6. PAC-learning I (15 pts)

Consider an image classification problem. Suppose an algorithm first splits each image into n = 4 blocks (the blocks are non-overlapping, and each block is at the same location and of constant size across all images) and computes some scalar feature value for each of the blocks (e.g., the average intensity of the pixels within the block). Suppose that this feature is discrete and can take m = 10 values. The classification function classifies an image as 1 whenever each of the n feature values lies within some interval that is specific to this feature (i.e., the value of the first feature is between $a_1$ and $b_1$, the value of the second feature is between $a_2$ and $b_2$, and so on), and 0 otherwise. We would like to learn these intervals (the a and b values for each interval) automatically based on a training set of images. All the other parameters, such as the locations and sizes of the blocks, are not being learned. The following questions are helpful in understanding the requirements on the size of the training set.

1. (7 pts) What is the size of the hypothesis space H? Assume that only intervals with $a_i \le b_i$ are considered for learning.

2. (4 pts) Assuming noiseless data and that the function we are trying to learn is capable of perfect classification, give an upper bound on the size of the training set required to be sure with 99% probability that the learned function will have a true error rate of at most 5%.

3. (4 pts) Compare $|H|$ (the answer to question 1) and the required training dataset size R (the answer to question 2). Why does R not seem to be very affected by the number of possible hypotheses? What parameter does make R increase quickly, and why?

(Please provide only a few sentences for each question.)

1. The number of possible intervals for any particular feature value is computed as follows: for a given $a_i$ the possible $b_i$ values are from $a_i$ to m (that is, $m - a_i + 1$ values). Hence, the number of possible intervals is $m + (m - 1) + (m - 2) + \ldots + 1 = \frac{m(m+1)}{2}$. Since there are n features and we use a boolean conjunction function, $|H| = \left(\frac{m(m+1)}{2}\right)^n = 55^4$.

2. $R \le \frac{0.69}{\epsilon}\left(\log_2(55^4) + \log_2(1/\delta)\right) = 410.8$, with $\epsilon = 0.05$ and $\delta = 0.01$.

3. In terms of the formula, R is logarithmically related to the number of possible hypotheses and inversely proportional to $\epsilon$. Thus, $\epsilon$ affects R much more strongly. Intuitively speaking, if a learned hypothesis is consistent with a large number of i.i.d. data points, then chances are it will classify the test data points correctly as well. This holds independently of how many hypotheses we have. On the other hand, in order to guarantee a very small misclassification rate ($\epsilon$), the hypothesis needs to be trained on a very large number of samples (so that they cover almost all of the input space).

Question 7. PAC-learning II (20 pts)

Consider a learning problem in which the input datapoints are real numbers distributed uniformly between a and b, and the output is binary. The true function we are trying to learn is $x < c^*$ for some $a \le c^* \le b$ (that is, output 1 whenever $x < c^*$ and 0 otherwise). The set of hypotheses is therefore $H = \{(x < c) \mid a \le c \le b\}$ (the hypothesis space is therefore infinite: all real values of c between a and b). Assuming that we have m datapoints for training, derive an upper bound on the probability of learning a hypothesis that will have
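true classification error larger than $\epsilon$. The derivation should be done in the same spirit as the one used to derive the PAC bound on the probability of learning a bad h for the case when the hypothesis space H is finite. Do not use bounds that are based on VC-dim (please ignore this sentence if you do not know VC-dim anyway). Give the bound in terms of a, b, $c^*$, m and $\epsilon$. Then evaluate it numerically for the following values: a = 0, b = 2, $c^*$ = 1, m = 20 and $\epsilon$ = 0.1. (Hint: you may need to use integrals.)

There are a few possible answers to this. Here are some (others are also possible):

1.
$P(\text{we learn } h \text{ such that } \mathrm{trueerror}(h) > \epsilon)$
$\le P(\text{the set } H \text{ contains } h \text{ such that } \mathrm{trueerror}(h) > \epsilon)$
$= P(\exists h,\ h \text{ is consistent with } m \text{ examples and } \mathrm{trueerror}(h) > \epsilon)$
$= P(\exists c,\ c \text{ is consistent with } x_1 \ldots x_m \text{ and } \mathrm{trueerror}(h = (x < c)) > \epsilon)$
$\le P(x_1 \ldots x_m \notin [\max(c^* - \epsilon(b - a), a),\ c^*] \text{ or } x_1 \ldots x_m \notin [c^*,\ \min(c^* + \epsilon(b - a), b)])$
$= P(x_1 \ldots x_m \notin [\max(c^* - \epsilon(b - a), a),\ c^*]) + P(x_1 \ldots x_m \notin [c^*,\ \min(c^* + \epsilon(b - a), b)]) - P(x_1 \ldots x_m \notin [\max(c^* - \epsilon(b - a), a),\ c^*] \text{ and } x_1 \ldots x_m \notin [c^*,\ \min(c^* + \epsilon(b - a), b)])$
$= \left(1 - \frac{c^* - \max(c^* - \epsilon(b - a), a)}{b - a}\right)^m + \left(1 - \frac{\min(c^* + \epsilon(b - a), b) - c^*}{b - a}\right)^m - \left(1 - \frac{\min(c^* + \epsilon(b - a), b) - \max(c^* - \epsilon(b - a), a)}{b - a}\right)^m$

For well-behaved $c^*$ (that is, when $c^* \pm \epsilon(b - a)$ stays inside $[a, b]$) we have: $\delta \le 2(1 - \epsilon)^m - (1 - 2\epsilon)^m$.

2. This version is almost correct and deserves full credit if given.
$P(\text{we learn } h \text{ such that } \mathrm{trueerror}(h) > \epsilon)$
$\le P(\text{the set } H \text{ contains } h \text{ such that } \mathrm{trueerror}(h) > \epsilon)$
$= P(\exists h,\ h \text{ is consistent with } m \text{ examples and } \mathrm{trueerror}(h) > \epsilon)$
$= P(\exists c,\ c \text{ is consistent with } x_1 \ldots x_m \text{ and } \mathrm{trueerror}(h = (x < c)) > \epsilon)$
$\le \int P(c \text{ is consistent with } x_1 \ldots x_m \text{ and } \mathrm{trueerror}(h = (x < c)) > \epsilon) \, dc$
$= \int P(x_1 \ldots x_m \notin [c, c^*] \text{ if } c < c^* \text{ or } x_1 \ldots x_m \notin [c^*, c] \text{ if } c > c^*, \text{ and } |c - c^*| > \epsilon(b - a)) \, dc$
$= \int_a^{\max(c^* - \epsilon(b - a), a)} P(x_1 \ldots x_m \notin [c, c^*]) \, dc + \int_{\min(c^* + \epsilon(b - a), b)}^{b} P(x_1 \ldots x_m \notin [c^*, c]) \, dc$
$= \int_a^{\max(c^* - \epsilon(b - a), a)} \left(1 - \frac{c^* - c}{b - a}\right)^m dc + \int_{\min(c^* + \epsilon(b - a), b)}^{b} \left(1 - \frac{c - c^*}{b - a}\right)^m dc$
$= \frac{1}{(b - a)^m} \left( \int_a^{\max(c^* - \epsilon(b - a), a)} (b - a - c^* + c)^m \, dc + \int_{\min(c^* + \epsilon(b - a), b)}^{b} (b - a - c + c^*)^m \, dc \right)$
$= \frac{1}{(m + 1)(b - a)^m} \left( (b - a - c^* + \max(c^* - \epsilon(b - a), a))^{m+1} - (b - a - c^* + a)^{m+1} - (b - a - b + c^*)^{m+1} + (b - a - \min(c^* + \epsilon(b - a), b) + c^*)^{m+1} \right)$
$= \frac{1}{(m + 1)(b - a)^m} \left( (b - a - c^* + \max(c^* - \epsilon(b - a), a))^{m+1} - (b - c^*)^{m+1} - (-a + c^*)^{m+1} + (b - a - \min(c^* + \epsilon(b - a), b) + c^*)^{m+1} \right)$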
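The question also asks for a numerical evaluation of the bound. The sketch below evaluates both closed-form answers, taking a = 0, b = 2, c* = 1, m = 20 and epsilon = 0.1 as the inputs; these values, and the function names `union_bound` and `integral_bound`, are assumptions made for illustration rather than part of the original solution.

```python
def union_bound(a, b, c_star, m, eps):
    """Answer 1: all m uniform samples miss the interval just left of c*,
    or all of them miss the interval just right of c*."""
    width = b - a
    left = max(c_star - eps * width, a)
    right = min(c_star + eps * width, b)
    miss_left = (1 - (c_star - left) / width) ** m
    miss_right = (1 - (right - c_star) / width) ** m
    miss_both = (1 - (right - left) / width) ** m
    return miss_left + miss_right - miss_both


def integral_bound(a, b, c_star, m, eps):
    """Answer 2: closed form of the two integrals over bad thresholds c."""
    width = b - a
    left = max(c_star - eps * width, a)
    right = min(c_star + eps * width, b)
    total = (
        (width - c_star + left) ** (m + 1)
        - (b - c_star) ** (m + 1)
        - (c_star - a) ** (m + 1)
        + (width - right + c_star) ** (m + 1)
    )
    return total / ((m + 1) * width ** m)


# Assumed input values for the numerical evaluation.
params = dict(a=0.0, b=2.0, c_star=1.0, m=20, eps=0.1)
bound_1 = union_bound(**params)
bound_2 = integral_bound(**params)
print(bound_1, bound_2)
```

With these inputs the first bound reduces to $2(0.9)^{20} - (0.8)^{20}$, roughly 0.23, while the second expression comes out at roughly 0.02. The second value is an integral over c rather than a probability, which is one way to see why that version is described as only almost correct.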