Solution for Assignment 1 : Intro to Probability and Statistics, PAC learning


10-701/15-781: Machine Learning (Fall 2004)
Due: Sept. 30th 2004, Thursday, start of class

Question 1. Basic Probability (18 pts)

1.1 (2 pts) Suppose that A is an event such that $\Pr(A) = 0$ and that B is any other event. Prove that A and B are independent events.

Since the event $A \cap B$ is a subset of the event A and $\Pr(A) = 0$, we have $\Pr(A \cap B) = 0$. Hence $\Pr(A \cap B) = 0 = \Pr(A)\Pr(B)$.

1.2 (3 pts) Prove: Let $a_1, a_2, \ldots, a_n$ be the possible values of A. Then for any event B,

$$P(B = b) = \sum_{i=1}^{n} P(B = b \mid A = a_i)\, P(A = a_i).$$

$$
\begin{aligned}
P(B = b) &= P(B = b \wedge \text{TRUE}) \\
&= P(B = b \wedge (A = a_1 \vee A = a_2 \vee \ldots \vee A = a_n)) \\
&= P((B = b \wedge A = a_1) \vee (B = b \wedge A = a_2) \vee \ldots \vee (B = b \wedge A = a_n)) \\
&= P(B = b \wedge A = a_1) + P((B = b \wedge A = a_2) \vee \ldots \vee (B = b \wedge A = a_n)) \\
&\quad - P((B = b \wedge A = a_1) \wedge ((B = b \wedge A = a_2) \vee \ldots \vee (B = b \wedge A = a_n))) \\
&= P(B = b \wedge A = a_1) + P((B = b \wedge A = a_2) \vee \ldots \vee (B = b \wedge A = a_n)) \\
&= \ldots \\
&= \sum_{i=1}^{n} P(B = b \wedge A = a_i) \\
&= \sum_{i=1}^{n} P(B = b \mid A = a_i)\, P(A = a_i)
\end{aligned}
$$

The subtracted term is zero because the events $A = a_i$ are mutually exclusive, so A cannot equal $a_1$ and another value at the same time.

1.3 (5 pts) Soldier A and Soldier B are practicing shooting. The probability that A would miss the target is 0.2 and the probability that B would miss the target is 0.5. The probability that both A and B would miss their targets is 0.1.

- What is the probability that at least one of the two will miss the target?

$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.2 + 0.5 - 0.1 = 0.6$

- What is the probability that exactly one of the two soldiers will miss the target?

$P(A \cap \bar{B}) + P(B \cap \bar{A}) = (0.2 - 0.1) + (0.5 - 0.1) = 0.5$

1.4 (4 pts) A box contains three cards. One card is red on both sides, one card is green on both sides, and one card is red on one side and green on the other. We randomly select one card from this box and observe the color of its upper side. If this side is green, what is the probability that the other side of the card is also green?

$$P(\text{the other side green} \mid \text{this side green}) = \frac{P(\text{both sides green})}{P(\text{this side green})} = \frac{1/3}{1/2} = \frac{2}{3}$$
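The answer to 1.4 can be double-checked by listing every equally likely (card, face-up side) outcome. The following is a minimal sketch; the variable names are illustrative and not part of the original solution.

```python
from fractions import Fraction

# Each card is described by the colours of its two sides.
cards = [("red", "red"), ("green", "green"), ("red", "green")]

# Picking a card and an orientation uniformly gives 3 x 2 equally likely
# outcomes, each recorded as (upper side, lower side).
outcomes = []
for side_1, side_2 in cards:
    outcomes.append((side_1, side_2))
    outcomes.append((side_2, side_1))

# Condition on the upper side being green, then count green lower sides.
green_up = [o for o in outcomes if o[0] == "green"]
both_green = [o for o in green_up if o[1] == "green"]

p_other_green = Fraction(len(both_green), len(green_up))
print(p_other_green)
```

Counting outcomes rather than cards is the point of the exercise: the green/green card contributes two of the three green-up outcomes.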

1.5 (4 pts) Suppose that the p.d.f. of a random variable X is:

$$
f(x) = \begin{cases} c x^2, & \text{for } 1 \le x \le 2 \\ 0, & \text{otherwise} \end{cases} \tag{1}
$$

- What is the value of the constant c?

$\int_1^2 c x^2 \, dx = \frac{7}{3} c = 1 \;\Rightarrow\; c = \frac{3}{7}$

- Sketch the p.d.f.

- $\Pr(X > 3/2) = ?$

$\int_{3/2}^{2} c x^2 \, dx = \frac{37}{56}$

Question 2. Expectation (18 pts)

2.1 (4 pts) If an integer between 100 and 200 is to be chosen at random, what is the expected value?

$E(X) = \frac{1}{101}(100 + 101 + \ldots + 200) = 150$

2.2 (5 pts) A rabbit is playing a jumping game with friends. She starts from the origin of a real line and moves along the line in jumps of one step. For each jump, she flips a coin. If heads, she jumps one step to the left (i.e., in the negative direction). Otherwise, she jumps one step to the right. The chance of heads is p ($0 \le p \le 1$). What is the expected value of her position after n jumps? (Assume each step is of equal length and that one step is one unit on the real line.)

For the i-th jump, $E(X_i) = (-1)p + (1)(1 - p) = 1 - 2p$. So the expected position after n jumps is:

$E(X_1 + X_2 + \ldots + X_n) = E(X_1) + E(X_2) + \ldots + E(X_n) = n(1 - 2p)$

2.3 (4 pts) Suppose that the random variable X has a uniform distribution on the interval [0, 1]. Random variable Y has a uniform distribution on the interval [4, 10]. X and Y are independent. Suppose a rectangle is to be constructed for which the lengths of two adjacent sides are X and Y. What is the expected value of the area of this rectangle?

Since X and Y are independent, $E(XY) = E(X)\,E(Y) = 0.5 \times 7 = 3.5$.

2.4 (5 pts) Suppose that X is a random variable with $E(X) = \mu$ and $Var(X) = \sigma^2$. What is the value of $E[X(X - 1)]$?

$E[X(X - 1)] = E[X^2] - E[X] = Var(X) + (E[X])^2 - \mu = \sigma^2 + \mu^2 - \mu$

Question 3. Normal Distribution (6 pts)

Suppose X has a normal distribution with mean 1 and variance 4. Find the value of the following:

a. $\Pr(X \le 3)$

$\Pr(X \le 3) = \Pr\left(Z \le \frac{3 - 1}{\sqrt{4}}\right) = \Phi(1) = 0.8413$

b. $\Pr(|X| \le 2)$

$\Pr(|X| \le 2) = \Phi\left(\frac{2 - 1}{2}\right) - \Phi\left(\frac{-2 - 1}{2}\right) = \Phi(0.5) - (1 - \Phi(1.5)) = 0.6247$
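The two table lookups in Question 3 can be reproduced without a normal table by writing $\Phi$ in terms of the error function. This is a small sketch using only the standard library; the helper name `phi` is an illustrative choice, not something from the assignment.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mean, variance = 1.0, 4.0
sd = math.sqrt(variance)

# a. Pr(X <= 3): standardise, then evaluate the CDF.
p_a = phi((3.0 - mean) / sd)

# b. Pr(-2 <= X <= 2): difference of the CDF at the two standardised ends.
p_b = phi((2.0 - mean) / sd) - phi((-2.0 - mean) / sd)

print(round(p_a, 4), round(p_b, 4))
```

The results should agree with the table-based values 0.8413 and 0.6247 up to the rounding of the table entries.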

Question 4. Bayes Theorem (8 pts)

In a certain day care class, 30 percent of the children have grey eyes, 50 percent of the children have blue eyes, and the other 20 percent have eyes of other colors. One day they play a game together. In the first run, 65 percent of the grey-eyed kids were selected into the game, 82 percent of the blue-eyed kids were selected, and 50 percent of the kids with other eye colors were chosen. If a child is selected at random from the class, and we know that he was not in the first run of the game, what is the probability that he has blue eyes?

Let B: blue eyes, O: other color eyes, G: grey eyes, NF: not in the first run of the game.

$$
P(B \mid NF) = \frac{P(B)P(NF \mid B)}{P(B)P(NF \mid B) + P(O)P(NF \mid O) + P(G)P(NF \mid G)} = \frac{0.5 \times 0.18}{0.5 \times 0.18 + 0.2 \times 0.5 + 0.3 \times 0.35} = 0.3051
$$

Question 5. Probabilistic Inference (15 pts)

Imagine there are three boxes labelled A, B and C. Two of them are empty, and one contains a prize. Unfortunately, they are all closed and you don't know where the prize is. You first pick a box at random, say box A. However, before you open it, box B is opened by someone, and you see that it is empty. You now have to make your final choice as to which box to open: A or C.

Question: For each of the cases below, answer which box you would open so as to maximize the chance that the box you open contains the prize. Support your arguments by computing the probability of the prize being in box A and in box C. Here are the three strategies according to which box B was chosen to be opened:

1. (5 pts) In this strategy, if you first pick a box (in this case A) with the prize, then one of the other two boxes is opened at random. On the other hand, if you first choose a box that has no prize, then the empty box that you did not pick is chosen.

2. (5 pts) In this strategy, one of the two boxes that you did not pick is chosen at random (in this case it is a random choice between B and C).

3. (5 pts) In this strategy, one of the empty boxes is chosen at random (independently of whether you initially picked the box with the prize or not).

Let SpB stand for the random event that someone picks box B to open.
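In all the cases the prior probabilities (before box B was opened) that the prize is in box A, B, or C are $P(A) = P(B) = P(C) = 1/3$. The differences are in the conditional probabilities $P(SpB \mid A)$, $P(SpB \mid B)$, $P(SpB \mid C)$. In all three cases we compute the posterior probabilities (after box B was opened). We then pick the box with the highest probability of containing the prize.

1. $P(SpB \mid A) = 1/2$, $P(SpB \mid B) = 0$, $P(SpB \mid C) = 1$.

$$
P(A \mid SpB) = \frac{P(SpB \mid A)P(A)}{P(SpB)} \quad \text{and} \quad P(C \mid SpB) = \frac{P(SpB \mid C)P(C)}{P(SpB)}
$$

$$
P(SpB) = P(SpB \mid A)P(A) + P(SpB \mid B)P(B) + P(SpB \mid C)P(C) = \tfrac{1}{2} \cdot \tfrac{1}{3} + 0 \cdot \tfrac{1}{3} + 1 \cdot \tfrac{1}{3} = \tfrac{1}{2}
$$

$$
P(A \mid SpB) = \frac{\tfrac{1}{2} \cdot \tfrac{1}{3}}{\tfrac{1}{2}} = \frac{1}{3} \quad \text{and} \quad P(C \mid SpB) = \frac{1 \cdot \tfrac{1}{3}}{\tfrac{1}{2}} = \frac{2}{3}
$$

So opening box C gives the better chance.

2. $P(SpB \mid A) = 1/2$, $P(SpB \mid B) = 1/2$, $P(SpB \mid C) = 1/2$.

Unlike in the previous sub-question, here the box that was opened by someone (namely, box B) could have contained the prize. Therefore, the posterior probabilities we are interested in are $P(A \mid SpB \wedge \neg B)$ and $P(C \mid SpB \wedge \neg B)$.

$$
P(A \mid SpB \wedge \neg B) = \frac{P(SpB \wedge \neg B \mid A)P(A)}{P(SpB \wedge \neg B)} \quad \text{and} \quad P(C \mid SpB \wedge \neg B) = \frac{P(SpB \wedge \neg B \mid C)P(C)}{P(SpB \wedge \neg B)}
$$

$$
\begin{aligned}
P(SpB \wedge \neg B \mid A) &= P(SpB \mid A)P(\neg B \mid A) = \tfrac{1}{2} \cdot 1 = \tfrac{1}{2} \\
P(SpB \wedge \neg B \mid B) &= 0 \\
P(SpB \wedge \neg B \mid C) &= P(SpB \mid C)P(\neg B \mid C) = \tfrac{1}{2} \cdot 1 = \tfrac{1}{2}
\end{aligned}
$$

$$
P(SpB \wedge \neg B) = P(SpB \wedge \neg B \mid A)P(A) + P(SpB \wedge \neg B \mid B)P(B) + P(SpB \wedge \neg B \mid C)P(C) = \tfrac{1}{2} \cdot \tfrac{1}{3} + 0 \cdot \tfrac{1}{3} + \tfrac{1}{2} \cdot \tfrac{1}{3} = \tfrac{1}{3}
$$

$$
P(A \mid SpB \wedge \neg B) = \frac{\tfrac{1}{2} \cdot \tfrac{1}{3}}{\tfrac{1}{3}} = \frac{1}{2} \quad \text{and} \quad P(C \mid SpB \wedge \neg B) = \frac{\tfrac{1}{2} \cdot \tfrac{1}{3}}{\tfrac{1}{3}} = \frac{1}{2}
$$

So boxes A and C are equally good choices.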
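The posteriors for the first two strategies above follow the same recipe: multiply each likelihood by the uniform prior and normalise. A minimal sketch with exact fractions is given here; the helper `posterior` and the dictionary layout are illustrative assumptions, not part of the original solution.

```python
from fractions import Fraction


def posterior(likelihoods):
    """Bayes rule with a uniform prior of 1/3 over the boxes.

    `likelihoods` maps each possible prize location to the probability
    of the observed event given that the prize is there.
    """
    prior = Fraction(1, 3)
    joint = {box: prior * lik for box, lik in likelihoods.items()}
    evidence = sum(joint.values())
    return {box: p / evidence for box, p in joint.items()}


half = Fraction(1, 2)

# Strategy 1: observed event is SpB.
strategy_1 = posterior({"A": half, "B": Fraction(0), "C": Fraction(1)})

# Strategy 2: observed event is SpB together with box B being empty.
strategy_2 = posterior({"A": half, "B": Fraction(0), "C": half})

print(strategy_1["A"], strategy_1["C"])
print(strategy_2["A"], strategy_2["C"])
```

Only the likelihoods change between strategies, which is why the same opened box can lead to different decisions.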

3. $P(SpB \mid A) = 1/2$, $P(SpB \mid B) = 0$, $P(SpB \mid C) = 1/2$.

$$
P(A \mid SpB) = \frac{P(SpB \mid A)P(A)}{P(SpB)} \quad \text{and} \quad P(C \mid SpB) = \frac{P(SpB \mid C)P(C)}{P(SpB)}
$$

$$
P(SpB) = P(SpB \mid A)P(A) + P(SpB \mid B)P(B) + P(SpB \mid C)P(C) = \tfrac{1}{2} \cdot \tfrac{1}{3} + 0 \cdot \tfrac{1}{3} + \tfrac{1}{2} \cdot \tfrac{1}{3} = \tfrac{1}{3}
$$

$$
P(A \mid SpB) = \frac{\tfrac{1}{2} \cdot \tfrac{1}{3}}{\tfrac{1}{3}} = \frac{1}{2} \quad \text{and} \quad P(C \mid SpB) = \frac{\tfrac{1}{2} \cdot \tfrac{1}{3}}{\tfrac{1}{3}} = \frac{1}{2}
$$

So boxes A and C are again equally good choices.

Question 6. PAC-learning I (15 pts)

Consider an image classification problem. Suppose an algorithm first splits each image into n = 4 blocks (the blocks are non-overlapping, and each block is at the same location and of constant size across all images) and computes some scalar feature value for each of the blocks (e.g., the average intensity of the pixels within the block). Suppose that this feature is discrete and can take m = 10 values. The classification function classifies an image as 1 whenever each of the n feature values lies within some interval that is specific to this feature (i.e., the value of the first feature is between $a_1$ and $b_1$, the value of the second feature is between $a_2$ and $b_2$, and so on), and 0 otherwise. We would like to learn these intervals (the a and b values for each interval) automatically based on a training set of images. All the other parameters, such as the locations and sizes of the blocks, are not being learned. The following questions are helpful in understanding the requirements on the size of the training set.

1. (7 pts) What is the size of the hypothesis space H? Assume that only intervals with $a_i \le b_i$ are considered for learning.

2. (4 pts) Assuming noiseless data and that the function we are trying to learn is capable of perfect classification, give an upper bound on the size of the training set required to be sure with 99% probability that the learned function will have a true error rate of at most 5%.

3. (4 pts) Compare $|H|$ (the answer to question 1) and the required training dataset size R (the answer to question 2). Why does R not seem to be very affected by the number of possible hypotheses? What parameter does make R increase quickly, and why? (Please provide only a few sentences for each question.)

1. The number of possible intervals for any particular feature is computed as follows: for a given $a_i$ the possible $b_i$ values are from $a_i$ to m (that is, $m - a_i + 1$ values). Hence, the number of possible intervals is $m + (m - 1) + (m - 2) + \ldots + 1 = \frac{m(m+1)}{2}$. Since there are n features and we use a boolean conjunction function,

$$|H| = \left(\frac{m(m+1)}{2}\right)^n = 55^4$$

2. $R \ge \frac{0.69}{\epsilon}\left(\log_2(55^4) + \log_2(1/\delta)\right) = 410.8$, with $\epsilon = 0.05$ and $\delta = 0.01$.

3. In terms of the formula, R is logarithmically related to the number of possible hypotheses and inversely proportional to $\epsilon$. Thus, $\epsilon$ affects R much more strongly. Intuitively speaking, if a learned hypothesis is consistent with a large number of i.i.d. data points, then chances are it will classify the test data points correctly as well. This holds independently of how many hypotheses we have. On the other hand, in order to guarantee a very small misclassification rate ($\epsilon$), the hypothesis needs to be trained on a very large number of samples (so that they cover almost all of the input space).

Question 7. PAC-learning II (20 pts)

Consider a learning problem in which the input datapoints are real numbers distributed uniformly between a and b, and the output is binary. The true function we are trying to learn is $x < c^*$ for some $a \le c^* \le b$ (that is, output 1 whenever $x < c^*$ and 0 otherwise). The set of hypotheses is therefore $H = \{(x < c) \mid a \le c \le b\}$ (the hypothesis space is therefore infinite: all real values of c between a and b). Assuming that we have m datapoints for training, derive an upper bound on the probability of learning a hypothesis that will have true classification error larger than $\epsilon$. The derivation should be done in the same spirit as the one used to derive the PAC bound on the probability of learning a bad h for the case when the hypothesis space H is finite. Do not use bounds that are based on VC-dim (please ignore this sentence if you do not know VC-dim anyway). Give the bound in terms of a, b, $c^*$, m and $\epsilon$. Then evaluate it numerically for the following values: a = 0, b = 2, $c^*$ = 1, m = 20 and $\epsilon$ = 0.1. (Hint: you may need to use integrals.)

There are a few possible answers to this. Here are some (others are also possible):

1. Below, $x_1 \ldots x_m \notin I$ means that none of the m training points falls in the interval I.

$$
\begin{aligned}
&P(\text{we learn } h' \text{ such that } trueerror(h') > \epsilon) \\
&\le P(\text{the set } H \text{ contains } h' \text{ such that } trueerror(h') > \epsilon) \\
&= P(\exists h',\ h' \text{ is consistent with } m \text{ examples and } trueerror(h') > \epsilon) \\
&= P(\exists c,\ c \text{ is consistent with } x_1 \ldots x_m \text{ and } trueerror(h = (x < c)) > \epsilon) \\
&\le P(x_1 \ldots x_m \notin [\max(c^* - \epsilon(b - a), a),\ c^*] \text{ or } x_1 \ldots x_m \notin [c^*,\ \min(c^* + \epsilon(b - a), b)]) \\
&= P(x_1 \ldots x_m \notin [\max(c^* - \epsilon(b - a), a),\ c^*]) + P(x_1 \ldots x_m \notin [c^*,\ \min(c^* + \epsilon(b - a), b)]) \\
&\quad - P(x_1 \ldots x_m \notin [\max(c^* - \epsilon(b - a), a),\ c^*] \text{ and } x_1 \ldots x_m \notin [c^*,\ \min(c^* + \epsilon(b - a), b)]) \\
&= \left(1 - \frac{c^* - \max(c^* - \epsilon(b - a), a)}{b - a}\right)^m + \left(1 - \frac{\min(c^* + \epsilon(b - a), b) - c^*}{b - a}\right)^m \\
&\quad - \left(1 - \frac{\min(c^* + \epsilon(b - a), b) - \max(c^* - \epsilon(b - a), a)}{b - a}\right)^m
\end{aligned}
$$

For a well-behaved $c^*$ (one at least $\epsilon(b - a)$ away from both a and b) we have: $\delta \le 2(1 - \epsilon)^m - (1 - 2\epsilon)^m$.

2. This version is almost correct and deserves full credit if given.

$$
\begin{aligned}
&P(\text{we learn } h' \text{ such that } trueerror(h') > \epsilon) \\
&\le P(\text{the set } H \text{ contains } h' \text{ such that } trueerror(h') > \epsilon) \\
&= P(\exists h',\ h' \text{ is consistent with } m \text{ examples and } trueerror(h') > \epsilon) \\
&= P(\exists c,\ c \text{ is consistent with } x_1 \ldots x_m \text{ and } trueerror(h = (x < c)) > \epsilon) \\
&\le \int P(c \text{ is consistent with } x_1 \ldots x_m \text{ and } trueerror(h = (x < c)) > \epsilon)\, dc \\
&= \int P(x_1 \ldots x_m \notin [c, c^*] \text{ if } c < c^* \text{ or } x_1 \ldots x_m \notin [c^*, c] \text{ if } c > c^*, \text{ and } |c - c^*| > \epsilon(b - a))\, dc \\
&= \int_a^{\max(c^* - \epsilon(b - a), a)} P(x_1 \ldots x_m \notin [c, c^*])\, dc + \int_{\min(c^* + \epsilon(b - a), b)}^{b} P(x_1 \ldots x_m \notin [c^*, c])\, dc \\
&= \int_a^{\max(c^* - \epsilon(b - a), a)} \left(1 - \frac{c^* - c}{b - a}\right)^m dc + \int_{\min(c^* + \epsilon(b - a), b)}^{b} \left(1 - \frac{c - c^*}{b - a}\right)^m dc \\
&= \frac{1}{(b - a)^m}\left(\int_a^{\max(c^* - \epsilon(b - a), a)} (b - a - c^* + c)^m\, dc + \int_{\min(c^* + \epsilon(b - a), b)}^{b} (b - a - c + c^*)^m\, dc\right) \\
&= \frac{1}{(m + 1)(b - a)^m}\Big((b - a - c^* + \max(c^* - \epsilon(b - a), a))^{m+1} - (b - a - c^* + a)^{m+1} \\
&\qquad - (b - a - b + c^*)^{m+1} + (b - a - \min(c^* + \epsilon(b - a), b) + c^*)^{m+1}\Big) \\
&= \frac{1}{(m + 1)(b - a)^m}\Big((b - a - c^* + \max(c^* - \epsilon(b - a), a))^{m+1} - (b - c^*)^{m+1} \\
&\qquad - (-a + c^*)^{m+1} + (b - a - \min(c^* + \epsilon(b - a), b) + c^*)^{m+1}\Big)
\end{aligned}
$$
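The numerical evaluation requested in Question 7 can be sketched for the first bound as follows. The function name `pac_threshold_bound` is hypothetical, and the inputs a = 0, b = 2, c* = 1, m = 20, ε = 0.1 are taken as the values given in the problem statement; treat the printed number as a check of the formula rather than an official answer.

```python
def pac_threshold_bound(a, b, c_star, m, eps):
    """Inclusion-exclusion bound on P(no sample lands in the left
    eps-band of c_star, or no sample lands in the right eps-band)."""
    width = b - a
    left = max(c_star - eps * width, a)    # lower end of the left band
    right = min(c_star + eps * width, b)   # upper end of the right band

    miss_left = (1 - (c_star - left) / width) ** m
    miss_right = (1 - (right - c_star) / width) ** m
    miss_both = (1 - (right - left) / width) ** m
    return miss_left + miss_right - miss_both


# Values assumed from the problem statement.
delta = pac_threshold_bound(a=0.0, b=2.0, c_star=1.0, m=20, eps=0.1)

# Simplified form for a c_star far enough from both endpoints.
simplified = 2 * (1 - 0.1) ** 20 - (1 - 2 * 0.1) ** 20

print(delta, simplified)
```

For these inputs neither band is clipped by the interval ends, so the general expression and the simplified form $2(1 - \epsilon)^m - (1 - 2\epsilon)^m$ coincide.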